New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
restic 0.14.0 (and current master branch) hangs indefinitely after approx. 3.8 million files on Solaris 11 x86_64 #4003
Comments
Thanks for the detailed problem report! I think I've found the issue:
Can you test whether reverting commit |
The last line printed by the archiver in the debug log is probably the file which caused restic to become stuck: To expand a bit on my previous comment, in #3628 (included in restic 0.14.0) we've added support to also backup xattrs on Solaris. Based on the filename I wonder whether trying to backup a file of type |
Hi @MichaelEischer, thank you for looking at this. Your theory about the On the Solaris box, I can:
With the official 0.14.0 release, this hangs on the single file. After reverting commit dd30083 as you suggested, the backup completes successfully. I've kicked off my large backup with the reverted binary; I suspect it will complete successfully now in a couple of hours. Thanks again! |
...success! Backup completed successfully in 20 minutes, which is back to how quick it was pre-0.14.0. I'm guessing 0.14.0 was taking a couple hours because it was backing up the new xattr data for files for the first time and would return to being fast on future backups? I don't know... maybe the xattr calls themselves are just really slow? Regardless, the immediate problem of not getting through the backup is solved! |
pkg/xattr opens the file to get an fd, then uses openat on the fd to get to the attribute namespace. I think the first open should use O_NONBLOCK and/or O_NDELAY so as not to wait for the write end to be opened, but I'm not sure what that will do to other processes that use the FIFO. |
On Solaris, files need to be opened to obtain or change their attributes. But opening a FIFO file causes the calling thread to block until the write end it opened. Files are now opened using O_NONBLOCK to prevent this from happening. Should fix restic/restic#4003. Also fix comment on stringsFromByteSlice.
On Solaris, files need to be opened to obtain or change their attributes. But opening a FIFO file causes the calling thread to block until the write end is opened. Files are now opened using O_NONBLOCK to prevent this from happening. Should fix restic/restic#4003. Also fix comment on stringsFromByteSlice.
On Solaris, files need to be opened to obtain or change their attributes. But opening a FIFO file causes the calling thread to block until the write end is opened. Files are now opened using O_NONBLOCK to prevent this from happening. Should fix restic/restic#4003. Also fix comment on stringsFromByteSlice.
On Solaris, files need to be opened to obtain or change their attributes. But opening a FIFO file causes the calling thread to block until the write end is opened. Files are now opened using O_NONBLOCK to prevent this from happening. Should fix restic/restic#4003. Also fix comment on stringsFromByteSlice.
* Fix FIFO handling on Solaris On Solaris, files need to be opened to obtain or change their attributes. But opening a FIFO file causes the calling thread to block until the write end is opened. Files are now opened using O_NONBLOCK to prevent this from happening. Should fix restic/restic#4003. Also fix comment on stringsFromByteSlice. * (Solaris) Open O_NONBLOCK with unix.Open, not os.Open Passing this flag to os.Open works, but this relies on undocumented implementation details of package os. * (Solaris) Fix removexattr/hanging test
This version doesn't have a release tag yet, but it's 0.4.9 + one patch. Fixes restic#4003.
This version doesn't have a release tag yet, but it's 0.4.9 + one patch. Fixes restic#4003.
This version doesn't have a release tag yet, but it's 0.4.9 + one patch. Fixes restic#4003.
This version doesn't have a release tag yet, but it's 0.4.9 + one patch. Fixes restic#4003.
Output of
restic version
restic 0.14.0-dev (compiled manually) compiled with go1.19.2 on solaris/amd64
This issue originally occurred for me in the 0.14.0 release downloaded from the Restic web site.
My current build is from the git master branch as of earlier today, commit 92df039, built with
go build -tags debug ./cmd/restic
, usinggo version go1.19.2 solaris/amd64
How did you run restic exactly?
What backend/server/service did you use to store the repository?
sftp to a FreeBSD 13.1-RELEASE-p3 system. Repository originally created with restic 0.12.1 and has been working fine for about a year in this setup. I upgraded to 0.14.0 and upgraded the repository to v2 and the first attempt to backup hit this problem.
I have also tried using the REST/http server on the FreeBSD target server, but the behavior is the same. So appears to be a problem in reading the files on the Solaris box, not the backend implementation.
Expected behavior
I expect the backup to complete successfully as it did on the same system with restic 0.12.1.
Actual behavior
After approximately 3.8 million files, the restic process appears to hang indefinitely. I have let it run over 24 hours, and the timer keeps increasing but no further progress is made.
The output from restic was:
The timer continues going up, but the file/size/etc counts freeze where they are.
Steps to reproduce the behavior
Unfortunately I don't know anything other than "use restic with a lot of files that you're backing up."
I have tried deleting the restic cache directory, have tried both the sftp and rest backend, and I've built with several different versions of Go (both from Solaris itself and cross-compiling from Linux).. Restic 0.12.1 was working fine for me, but I can't revert to that since I already upgraded the repository version with 0.14.0.
Do you have any idea what may have caused this?
I do not. However, I can provide some additional detail...
It doesn't appear to be related to the specific files it's trying to back up. In an earlier attempt to debug, I excluded the directory that had files the debug log was last referencing. That just causes it to freeze somewhere else with a slightly different file number count.
After restic stops making progress, here's the dump after I send a SIGABRT:
https://paste.sr.ht/blob/67bcbf3b097934f7222dcef8c036461539ba23dc
And here are the last several hundred lines from the debug log:
https://paste.sr.ht/blob/b8c24053c1c929d1f646049af226552aa089c251
Do you have an idea how to solve the issue?
No. Ideally I'd try creating a new repository from scratch and see if that works, but I don't have storage space locally to try that and the system I'm backing up to currently is at a remote location and a new full initial backup would be too big to practically do over the internet.
Did restic help you today? Did it make you happy in any way?
In general restic makes me very happy :-). This last week as I've been troubleshooting this and trying to figure out how to get my backups working again (I'm basically stuck since I can't downgrade to 0.12.1 that was working since the repo version has already been upgraded!) has, I must admit, been frustrating. Once I can get past this I'm sure I'll be loving restic again, though!
The text was updated successfully, but these errors were encountered: