Sporadically getting: read <filename>: input/output error #3694
Comments
Can you try whether setting the environment variable |
Thanks for looking into the problem! I have set |
Did you try setting the environment variable for the Go program you use to read from the fuse mount? |
I had not, but have now. When running my Go program, that is reading the mounted files, with |
By now I suspect that the problem is related to the performance of the Minio backend. I have noticed that the problem occurs almost exclusively in a repository that is ca. 2TB in size and runs on a virtualized server. With a repository that is only ca. 300GB in size and runs on bare metal, the problem almost never occurs. I am not familiar with the code of |
it looks like restic is receiving lots of interrupts, e.g.:
That particular one seems to be the one responsible for causing the canceled context. So the main question is probably where that interrupt originates from. From looking at the fuse library code, that interrupt is sent from the kernel. So maybe the application reading the files is canceling its read requests too frequently?
Thanks for pointing out the interrupts, I didn't recognise them as a problem source. Unfortunately I'm not familiar with bazil.org/fuse, so I'm unable to interpret them. If the interrupt is sent from the kernel, does that mean it's a UNIX signal? If so, is there a way to tell which one it is? Judging by fuse's logging code, the logged ID 0x1a isn't it (although if it were, it would be
I'm not quite sure what you mean by "canceling read requests", but I'm not usually closing unread files or killing goroutines while they are reading. I'm just opening two (gzipped) files in quick succession and creating a new gzip reader for each using |
My guess would be that what we're seeing here is that a program executes a
I've noticed something interesting: the error log reports that restic returned an "EIO" == "input/output error" after canceling the request, so that's the source of the errors. I've dug a bit into the fuse library and it looks like we're returning the wrong error in case of an interrupted syscall. Could you try whether the following patch fixes the problem in your use case? (The patch is far from complete.)
diff --git a/internal/fuse/file.go b/internal/fuse/file.go
index 571d5a865..d54f58f00 100644
--- a/internal/fuse/file.go
+++ b/internal/fuse/file.go
@@ -105,6 +105,9 @@ func (f *openFile) getBlobAt(ctx context.Context, i int) (blob []byte, err error
 	blob, err = f.root.repo.LoadBlob(ctx, restic.DataBlob, f.node.Content[i], nil)
 	if err != nil {
 		debug.Log("LoadBlob(%v, %v) failed: %v", f.node.Name, f.node.Content[i], err)
+		if errors.Is(err, context.Canceled) {
+			return nil, context.Canceled
+		}
 		return nil, err
 	}
I have A/B tested without (cc8a03b) and with your fix (dummyalias@d555008) a few times and it seems to work consistently with your fix while not working without it. Awesome, thanks a lot!
I have run into transient input/output errors recently on reliable server-grade bare metal, with a healthy v2 repo on a local filesystem mounted for consumption by another restic process. Prior to this proposed patch each read pass over a TB+ repo would yield hundreds of transient input/output errors, errors in
@MichaelEischer - any suggestions on how to eliminate problems with Readdirnames?
I stand corrected. Quick tests were incremental and did not use
|
Simple retry after a brief pause seems to work around errors in |
As mentioned above, the patch snippet is far from complete. We'll need similar checks throughout the whole fuse implementation.
I've opened PR #3875 which should also fix the |
Output of restic version
How did you run restic exactly?
Mounting works without a problem. The issue is that sometimes (seemingly at random) reading a large file (25.7MiB in my case; read by a Go program within a Docker container) from the mounted backup fails. I have not had this happen with small files (2.5KiB in my case). I activated debugging while the problem occurred; this is what I got (I hope I chose the right segment from the file; I have changed confidential information to <secret>):
What backend/server/service did you use to store the repository?
I'm not sure.
Expected behavior
The file should be readable without errors.
Actual behavior
I'm getting read <filename>: input/output error.
Do you have an idea how to solve the issue?
I can read a small amount of data from the problematic file like this:
head -c1 /path/to/my/file
This does not give me any error. After doing this, the problem is gone: my Go program no longer has any problem reading the whole file.