Backup exit 1 when ZFSbackup #509

Closed
pando85 opened this issue Mar 9, 2024 · 2 comments

Contributor

pando85 commented Mar 9, 2024

What steps did you take and what happened:

  • Have a PVC with a large amount of data (>100GB).
  • Set up a backup using Velero with the OpenEBS plugin.

What did you expect to happen:

  • The backup completes, as it does for PVCs with small amounts of data.

The output of the following commands will help us better understand what's going on:

zfs_util.go:814] zfs: could not backup the volume XXXX cmd [-c zfs send XXXX@YYYY | nc -w 3 10.42.12.159 9011] error: 
backup.go:93] backup XXXX.YYYY failed XXXX@YYYY err exit status 1

Anything else you would like to add:
I checked everything and this happens repeatedly. The backup fails after:

  • 64m49s
  • 68m19s
  • 68m43s
  • ...

Environment:

  • ZFS-LocalPV version: 2.4.0
  • Kubernetes version (use kubectl version):
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.6+k3s2
  • Kubernetes installer & version: k3s
  • Cloud provider or hardware configuration: hardware
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Contributor Author

pando85 commented Mar 9, 2024

Digging a bit deeper

I ran strace on the ZFS host with strace -p ${PID} -t, attaching in turn to each of these processes:

  • nc -w 3 ${IP} ${PORT} process:
08:41:36 read(0, "8\250aE<\374^\235T\244\201\314\367x\230%e\23\377H$\276K\373\255h\205\33\310!*I"..., 8192) = 8192
08:41:36 write(3, "8\250aE<\374^\235T\244\201\314\367x\230%e\23\377H$\276K\373\255h\205\33\310!*I"..., 8192) = -1 ECONNRESET (Connection reset by peer)
08:41:36 write(3, "8\250aE<\374^\235T\244\201\314\367x\230%e\23\377H$\276K\373\255h\205\33\310!*I"..., 8192) = -1 EPIPE (Broken pipe)
...
08:41:38 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=23401, si_uid=0} ---
08:41:38 write(3, "8\250aE<\374^\235T\244\201\314\367x\230%e\23\377H$\276K\373\255h\205\33\310!*I"..., 8192) = -1 EPIPE (Broken pipe)
08:41:38 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=23401, si_uid=0} ---
08:41:38 close(3)                       = 0
08:41:38 exit_group(1)                  = ?
08:41:38 +++ exited with 1 +++
  • /bin/sh /sbin/zfs send XXXX@YYYY (for /sbin/zfs send XXXX@YYYY itself I receive an exit 1 error instantly when executed):
strace: Process ${PID} attached
06:57:42 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGPIPE}], 0, NULL) = 23402
08:41:38 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=23402, si_uid=0, si_status=SIGPIPE, si_utime=0, si_stime=12737} ---
08:41:38 rt_sigreturn({mask=[]})        = 23402
08:41:38 read(10, "", 8192)             = 0
08:41:38 exit_group(141)                = ?
08:41:38 +++ exited with 141 +++
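
To read the two exit statuses above: by shell convention, a status greater than 128 means the process was terminated by signal (status - 128), so 141 corresponds to signal 13, SIGPIPE, while nc's plain exit 1 is an ordinary error code. A minimal sketch of that decoding follows; describe_exit_status is a hypothetical helper written for illustration, not part of ZFS-LocalPV or the plugin, and it assumes a Linux host where SIGPIPE is signal 13.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    Shells report a child killed by signal N as status 128 + N.
    """
    if status > 128:
        try:
            name = signal.Signals(status - 128).name
            return f"killed by {name}"
        except ValueError:
            # Not a known signal number on this platform.
            pass
    return f"exit code {status}"


if __name__ == "__main__":
    # The /bin/sh wrapper around zfs send exited with 141.
    print(describe_exit_status(141))
    # nc exited with 1 after the peer reset the connection.
    print(describe_exit_status(1))
```

Read this way, the trace suggests that the receiving side closed the connection first, and the sender only failed as a consequence.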

Contributor Author

pando85 commented Mar 12, 2024

From the logs:

time="2024-03-12T11:23:48Z" level=warning msg="Failed to close file interface : blob (code=Unknown): MultipartUpload: upload multipart failed\n\tupload id: ZjVkOWZjNTQtYzcwMi00OTJiLWIzYzctZGQ0ZDUwNTk2NzRlLjM1MDU3NmFiLWQyM2ItNGY1MC1iNjU0LTljYzA3ZjhmMWZhMg\ncaused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit" backup=velero/ZZZZ-YYYY cmd=/plugins/velero-blockstore-openebs logSource="/go/src/github.com/openebs/velero-plugin/pkg/clouduploader/conn.go:322" pluginName=velero-blockstore-openebs

The default part size is set to 5Mi for AWS: https://github.com/openebs/velero-plugin/blob/cea57783e3ed887d2b7b0e7bafc436ff26bd9a7b/pkg/clouduploader/conn.go#L110
Default MaxUploadParts: 10000
Maximum upload size with the defaults: 5Mi * 10000 = 50000Mi (about 48.8Gi)
An alternative is to set the part size to 0 so that a calculated size is used:
https://github.com/openebs/velero-plugin/blob/cea57783e3ed887d2b7b0e7bafc436ff26bd9a7b/pkg/clouduploader/operation.go#L54
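
The limit can be checked with a few lines of arithmetic. The sketch below is only an illustration of the numbers quoted in this comment, not the plugin's actual calculation: max_upload_bytes and min_part_size are hypothetical helpers, and the 5 MiB floor is an assumption based on the default part size mentioned here.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MAX_UPLOAD_PARTS = 10_000    # limit quoted in the error message
DEFAULT_PART_SIZE = 5 * MIB  # default part size cited in this comment


def max_upload_bytes(part_size: int = DEFAULT_PART_SIZE,
                     max_parts: int = MAX_UPLOAD_PARTS) -> int:
    """Largest stream that fits in max_parts parts of part_size bytes."""
    return part_size * max_parts


def min_part_size(total_bytes: int,
                  max_parts: int = MAX_UPLOAD_PARTS,
                  floor: int = DEFAULT_PART_SIZE) -> int:
    """Smallest part size (in bytes) that keeps the upload within max_parts.

    The floor is an assumption for this sketch, not taken from the plugin.
    """
    needed = (total_bytes + max_parts - 1) // max_parts  # ceiling division
    return max(floor, needed)


if __name__ == "__main__":
    print(f"default ceiling: {max_upload_bytes() / GIB:.2f} GiB")
    print(f"part size for 100 GiB: {min_part_size(100 * GIB)} bytes")
```

With the defaults the ceiling is just under 49 GiB, which is consistent with a >100GB PVC failing while smaller ones succeed. A 100 GiB stream needs parts slightly larger than 10 MiB.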

This turned out to be a velero-plugin issue, but I am pasting the solution here and closing the ticket. Sorry for the noise.

@pando85 pando85 closed this as completed Mar 12, 2024