Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sporadically getting: read <filename>: input/output error #3694

Closed
dummyalias opened this issue Apr 1, 2022 · 14 comments · Fixed by #3875
Closed

Sporadically getting: read <filename>: input/output error #3694

dummyalias opened this issue Apr 1, 2022 · 14 comments · Fixed by #3875

Comments

@dummyalias
Copy link

Output of restic version

restic 0.13.0 (v0.13.0-39-g305cd1e7) compiled with go1.17.4 on linux/amd64

How did you run restic exactly?

export DEBUG_LOG=<private>
export AWS_ACCESS_KEY_ID=<private>
export AWS_SECRET_ACCESS_KEY=<private>
export RESTIC_PASSWORD=<private>
export RESTIC_REPOSITORY=<private>
restic mount /backups/foo

Mounting works without a problem. The issue is, that sometimes (seems random) when reading a large file (25.7MiB in my case; reading with a Go program within a Docker container) from the mounted backup. I have not had this happen with small files (2.5KiB in my case). I have activated debugging during the problem; this is what I got (I hope I chose the right segment from the file; I have changed confidential information to <secret>):

2022/04/01 12:18:25 debug/round_tripper_debug.go:83	debug.loggingRoundTripper.RoundTrip	308	------------  HTTP REQUEST -----------
GET <secret> HTTP/1.1
Host: <secret>
User-Agent: MinIO (linux; amd64) minio-go/v7.0.23
Authorization: <secret>
X-Amz-Content-Sha256: <secret>
X-Amz-Date: 20220401T121825Z
Accept-Encoding: gzip


2022/04/01 12:18:26 debug/round_tripper_debug.go:96	debug.loggingRoundTripper.RoundTrip	308	------------  HTTP RESPONSE ----------
HTTP/2.0 200 OK
Content-Length: 71383
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Content-Type: application/octet-stream
Date: Fri, 01 Apr 2022 12:18:26 GMT
Etag: "a110ce563f2673d9ccd883ee6c463e0d-1"
Last-Modified: Fri, 01 Apr 2022 01:00:44 GMT
Server: nginx/1.19.2
Strict-Transport-Security: max-age=15724800; includeSubDomains
Vary: Origin
X-Amz-Request-Id: 16E1C4F7876C4CC6
X-Xss-Protection: 1; mode=block


2022/04/01 12:18:26 cache/file.go:90	cache.(*Cache).Save	308	Save to cache: <data/83b1e11de7>
2022/04/01 12:18:26 s3/s3.go:363	s3.(*Backend).openReader.func1	308	Close()
2022/04/01 12:18:26 cache/file.go:45	cache.(*Cache).load	308	Load from cache: <data/83b1e11de7>
2022/04/01 12:18:26 restic/readerat.go:41	restic.ReadAt	308	ReadAt(<data/83b1e11de7>) ReadFull returned 402 bytes
2022/04/01 12:18:26 fuse/dir.go:37	fuse.newDir	308	new dir for backup (6174c84c)
2022/04/01 12:18:26 fuse/dir.go:112	fuse.(*dir).Attr	308	Attr()
2022/04/01 12:18:26 restic/cmd_mount.go:153	main.runMount.func2	308	fuse: -> [ID=0x8] Lookup 0x4 gen=0 valid=1m0s attr={valid=1m0s ino=15339770122085529968 size=0 mode=drwxrwxrwx}
2022/04/01 12:18:26 restic/cmd_mount.go:153	main.runMount.func2	309	fuse: <- Lookup [ID=0xa Node=0x4 Uid=0 Gid=0 Pid=8] "mysqldump"
2022/04/01 12:18:26 fuse/dir.go:184	fuse.(*dir).Lookup	309	Lookup(mysqldump)
2022/04/01 12:18:26 fuse/dir.go:89	fuse.(*dir).open	309	open dir backup (6174c84c)
2022/04/01 12:18:26 repository/repository.go:709	repository.(*Repository).LoadTree	309	load tree 6174c84c7bd74ca5cf73ea490c0582b2d2b36135ea5687fa405a3261bdc6461d
2022/04/01 12:18:26 repository/repository.go:164	repository.(*Repository).LoadBlob	309	load tree with id 6174c84c7bd74ca5cf73ea490c0582b2d2b36135ea5687fa405a3261bdc6461d (buf len 0, cap 0)
2022/04/01 12:18:26 repository/repository.go:178	repository.(*Repository).LoadBlob	309	blob tree/6174c84c7bd74ca5cf73ea490c0582b2d2b36135ea5687fa405a3261bdc6461d found: <Blob (tree) 6174c84c, offset 67954, length 771>
2022/04/01 12:18:26 restic/readerat.go:30	restic.ReadAt	309	ReadAt(<data/83b1e11de7>) at 67954, len 771
2022/04/01 12:18:26 cache/backend.go:164	cache.(*Backend).Load	309	Load(<data/83b1e11de7>, 771, 67954) from cache
2022/04/01 12:18:26 cache/file.go:45	cache.(*Cache).load	309	Load from cache: <data/83b1e11de7>
2022/04/01 12:18:26 restic/readerat.go:41	restic.ReadAt	309	ReadAt(<data/83b1e11de7>) ReadFull returned 771 bytes
2022/04/01 12:18:26 fuse/dir.go:37	fuse.newDir	309	new dir for mysqldump (2edd7c7f)
2022/04/01 12:18:26 fuse/dir.go:112	fuse.(*dir).Attr	309	Attr()
2022/04/01 12:18:26 restic/cmd_mount.go:153	main.runMount.func2	309	fuse: -> [ID=0xa] Lookup 0x5 gen=0 valid=1m0s attr={valid=1m0s ino=14961760787736121718 size=0 mode=drwxr-xr-x}
2022/04/01 12:18:26 restic/cmd_mount.go:153	main.runMount.func2	310	fuse: <- Lookup [ID=0xc Node=0x5 Uid=0 Gid=0 Pid=8] "myfile"
2022/04/01 12:18:26 fuse/dir.go:184	fuse.(*dir).Lookup	310	Lookup(myfile)
2022/04/01 12:18:26 fuse/dir.go:89	fuse.(*dir).open	310	open dir mysqldump (2edd7c7f)
2022/04/01 12:18:26 repository/repository.go:709	repository.(*Repository).LoadTree	310	load tree 2edd7c7f5870fd3558e26c40772ec2b98d537f8e942c2c3f249f71a0c21fed7e
2022/04/01 12:18:26 repository/repository.go:164	repository.(*Repository).LoadBlob	310	load tree with id 2edd7c7f5870fd3558e26c40772ec2b98d537f8e942c2c3f249f71a0c21fed7e (buf len 0, cap 0)
2022/04/01 12:18:26 repository/repository.go:178	repository.(*Repository).LoadBlob	310	blob tree/2edd7c7f5870fd3558e26c40772ec2b98d537f8e942c2c3f249f71a0c21fed7e found: <Blob (tree) 2edd7c7f, offset 2308, length 3393>
2022/04/01 12:18:26 restic/readerat.go:30	restic.ReadAt	310	ReadAt(<data/263f301f30>) at 2308, len 3393
2022/04/01 12:18:26 cache/backend.go:183	cache.(*Backend).Load	310	auto-store <data/263f301f30> in the cache
2022/04/01 12:18:26 s3/s3.go:323	s3.(*Backend).openReader	310	Load <data/263f301f30>, length 0, offset 0 from data/26/263f301f30482f532c8772cb9cb9adca55be168b6051ce115efc6cbb767c4a4c
2022/04/01 12:18:26 debug/round_tripper_debug.go:83	debug.loggingRoundTripper.RoundTrip	310	------------  HTTP REQUEST -----------
GET <secret> HTTP/1.1
Host: <secret>
User-Agent: MinIO (linux; amd64) minio-go/v7.0.23
Authorization: <secret>
X-Amz-Content-Sha256: <secret>
X-Amz-Date: 20220401T121826Z
Accept-Encoding: gzip


2022/04/01 12:18:26 debug/round_tripper_debug.go:96	debug.loggingRoundTripper.RoundTrip	310	------------  HTTP RESPONSE ----------
HTTP/2.0 200 OK
Content-Length: 4324577
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Content-Type: application/octet-stream
Date: Fri, 01 Apr 2022 12:18:26 GMT
Etag: "472f3f7b740117e7b1c313a239fe758b-1"
Last-Modified: Fri, 01 Apr 2022 01:00:44 GMT
Server: nginx/1.19.2
Strict-Transport-Security: max-age=15724800; includeSubDomains
Vary: Origin
X-Amz-Request-Id: 16E1C4F78E217890
X-Xss-Protection: 1; mode=block


2022/04/01 12:18:26 cache/file.go:90	cache.(*Cache).Save	310	Save to cache: <data/263f301f30>
2022/04/01 12:18:28 s3/s3.go:363	s3.(*Backend).openReader.func1	310	Close()
2022/04/01 12:18:28 cache/file.go:45	cache.(*Cache).load	310	Load from cache: <data/263f301f30>
2022/04/01 12:18:28 restic/readerat.go:41	restic.ReadAt	310	ReadAt(<data/263f301f30>) ReadFull returned 3393 bytes
2022/04/01 12:18:28 fuse/file.go:40	fuse.newFile	310	create new file for myfile with 1 blobs
2022/04/01 12:18:28 fuse/file.go:49	fuse.(*file).Attr	310	Attr(myfile)
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	310	fuse: -> [ID=0xc] Lookup 0x6 gen=0 valid=1m0s attr={valid=1m0s ino=12416953139642217449 size=2439 mode=-rw-r--r--}
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	311	fuse: <- Open [ID=0xe Node=0x6 Uid=0 Gid=0 Pid=8] dir=false fl=OpenReadOnly
2022/04/01 12:18:28 fuse/file.go:70	fuse.(*file).Open	311	open file myfile with 1 blobs
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	311	fuse: -> [ID=0xe] Open 0x1 fl=0
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	1	fuse: No opcode 40
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	312	fuse: <- ID=0x10 Node=0x6 Uid=0 Gid=0 Pid=8
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	313	fuse: <- Interrupt [ID=0x11 Node=0x0 Uid=0 Gid=0 Pid=0] ID 0x10
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	312	fuse: -> [ID=0x10] Header error=ENOSYS
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	313	fuse: -> [ID=0x11] Interrupt
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	314	fuse: <- Read [ID=0x12 Node=0x6 Uid=0 Gid=0 Pid=8] 0x1 4096 @0x0 dir=false fl=0 lock=0 ffl=OpenReadOnly+OpenNonblock
2022/04/01 12:18:28 fuse/file.go:117	fuse.(*openFile).Read	314	Read(myfile, 4096, 0), file size 2439
2022/04/01 12:18:28 bloblru/cache.go:86	bloblru.(*Cache).Get	314	bloblru.Cache: get c9abc12077400f2dbe1c1a51daec3267dd26ca4a1086a9ae06d823ba46731485, hit false
2022/04/01 12:18:28 repository/repository.go:164	repository.(*Repository).LoadBlob	314	load data with id c9abc12077400f2dbe1c1a51daec3267dd26ca4a1086a9ae06d823ba46731485 (buf len 0, cap 0)
2022/04/01 12:18:28 repository/repository.go:178	repository.(*Repository).LoadBlob	314	blob data/c9abc12077400f2dbe1c1a51daec3267dd26ca4a1086a9ae06d823ba46731485 found: <Blob (data) c9abc120, offset 191561, length 2471>
2022/04/01 12:18:28 restic/readerat.go:30	restic.ReadAt	314	ReadAt(<data/abedc70a9e>) at 191561, len 2471
2022/04/01 12:18:28 cache/backend.go:179	cache.(*Backend).Load	314	Load(<data/abedc70a9e>, 2471, 191561): delegating to backend
2022/04/01 12:18:28 s3/s3.go:323	s3.(*Backend).openReader	314	Load <data/abedc70a9e>, length 2471, offset 191561 from data/ab/abedc70a9e56925869afacb4d1a67f295875eb4db7a69b015130a1a2e4a342bb
2022/04/01 12:18:28 s3/s3.go:341	s3.(*Backend).openReader	314	range: 191561-194031
2022/04/01 12:18:28 debug/round_tripper_debug.go:83	debug.loggingRoundTripper.RoundTrip	314	------------  HTTP REQUEST -----------
GET <secret> HTTP/1.1
Host: <secret>
User-Agent: MinIO (linux; amd64) minio-go/v7.0.23
Authorization: <secret>
Range: bytes=191561-194031
X-Amz-Content-Sha256: <secret>
X-Amz-Date: 20220401T121828Z


2022/04/01 12:18:28 debug/round_tripper_debug.go:96	debug.loggingRoundTripper.RoundTrip	314	------------  HTTP RESPONSE ----------
HTTP/2.0 206 Partial Content
Content-Length: 2471
Accept-Ranges: bytes
Content-Range: bytes 191561-194031/5794098
Content-Security-Policy: block-all-mixed-content
Content-Type: application/octet-stream
Date: Fri, 01 Apr 2022 12:18:28 GMT
Etag: "3bc427a7e4f1b854811297fc730a68e9-1"
Last-Modified: Fri, 01 Apr 2022 01:00:26 GMT
Server: nginx/1.19.2
Strict-Transport-Security: max-age=15724800; includeSubDomains
Vary: Origin
X-Amz-Request-Id: 16E1C4F81DAAE07C
X-Xss-Protection: 1; mode=block


2022/04/01 12:18:28 debug/round_tripper_debug.go:50	debug.(*eofDetectReader).Close	314	body not drained, 0 bytes not read: Close()
github.com/restic/restic/internal/debug.(*eofDetectReader).Close
	github.com/restic/restic/internal/debug/round_tripper_debug.go:50
github.com/restic/restic/internal/backend/s3.wrapReader.Close
	github.com/restic/restic/internal/backend/s3/s3.go:311
github.com/restic/restic/internal/backend.DefaultLoad
	github.com/restic/restic/internal/backend/utils.go:58
github.com/restic/restic/internal/backend/s3.(*Backend).Load
	github.com/restic/restic/internal/backend/s3/s3.go:319
github.com/restic/restic/internal/backend.(*RetryBackend).Load.func1
	github.com/restic/restic/internal/backend/backend_retry.go:89
github.com/cenkalti/backoff/v4.RetryNotifyWithTimer
	github.com/cenkalti/backoff/v4@v4.1.2/retry.go:55
github.com/cenkalti/backoff/v4.RetryNotify
	github.com/cenkalti/backoff/v4@v4.1.2/retry.go:34
github.com/restic/restic/internal/backend.(*RetryBackend).retry
	github.com/restic/restic/internal/backend/backend_retry.go:46
github.com/restic/restic/internal/backend.(*RetryBackend).Load
	github.com/restic/restic/internal/backend/backend_retry.go:87
github.com/restic/restic/internal/cache.(*Backend).Load
	github.com/restic/restic/internal/cache/backend.go:180
github.com/restic/restic/internal/restic.ReadAt
	github.com/restic/restic/internal/restic/readerat.go:32
github.com/restic/restic/internal/repository.(*Repository).LoadBlob
	github.com/restic/restic/internal/repository/repository.go:199
github.com/restic/restic/internal/fuse.(*openFile).getBlobAt
	github.com/restic/restic/internal/fuse/file.go:105
github.com/restic/restic/internal/fuse.(*openFile).Read
	github.com/restic/restic/internal/fuse/file.go:148
bazil.org/fuse/fs.(*Server).handleRequest
	bazil.org/fuse@v0.0.0-20200407214033-5883e5a4b512/fs/serve.go:1242
bazil.org/fuse/fs.(*Server).serve
	bazil.org/fuse@v0.0.0-20200407214033-5883e5a4b512/fs/serve.go:871
bazil.org/fuse/fs.(*Server).Serve.func1
	bazil.org/fuse@v0.0.0-20200407214033-5883e5a4b512/fs/serve.go:425
runtime.goexit
	runtime/asm_amd64.s:1581
2022/04/01 12:18:28 s3/s3.go:363	s3.(*Backend).openReader.func1	314	Close()
2022/04/01 12:18:28 restic/readerat.go:41	restic.ReadAt	314	ReadAt(<data/abedc70a9e>) ReadFull returned 2471 bytes
2022/04/01 12:18:28 bloblru/cache.go:48	bloblru.(*Cache).Add	314	bloblru.Cache: add c9abc12077400f2dbe1c1a51daec3267dd26ca4a1086a9ae06d823ba46731485
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	314	fuse: -> [ID=0x12] Read 2439
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	332	fuse: <- Flush [ID=0x14 Node=0x6 Uid=0 Gid=0 Pid=34] 0x1 fl=0x0 lk=0x11f3faf30d4db593
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	332	fuse: -> [ID=0x14] Flush
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	333	fuse: <- Lookup [ID=0x16 Node=0x5 Uid=0 Gid=0 Pid=8] "myfile"
2022/04/01 12:18:28 fuse/dir.go:184	fuse.(*dir).Lookup	333	Lookup(myfile)
2022/04/01 12:18:28 fuse/file.go:40	fuse.newFile	333	create new file for myfile with 16 blobs
2022/04/01 12:18:28 fuse/file.go:49	fuse.(*file).Attr	333	Attr(myfile)
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	333	fuse: -> [ID=0x16] Lookup 0x7 gen=0 valid=1m0s attr={valid=1m0s ino=3559809760280939452 size=26998690 mode=-rw-r--r--}
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	334	fuse: <- Open [ID=0x18 Node=0x7 Uid=0 Gid=0 Pid=8] dir=false fl=OpenReadOnly
2022/04/01 12:18:28 fuse/file.go:70	fuse.(*file).Open	334	open file myfile with 16 blobs
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	334	fuse: -> [ID=0x18] Open 0x2 fl=0
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	335	fuse: <- Read [ID=0x1a Node=0x7 Uid=0 Gid=0 Pid=8] 0x2 16384 @0x0 dir=false fl=0 lock=0 ffl=OpenReadOnly+OpenNonblock
2022/04/01 12:18:28 fuse/file.go:117	fuse.(*openFile).Read	335	Read(myfile, 16384, 0), file size 26998690
2022/04/01 12:18:28 bloblru/cache.go:86	bloblru.(*Cache).Get	335	bloblru.Cache: get 7553984f4919eefa37b2bc7ce3e35b113b041550de2fa1a45dcfd8846ba82bf9, hit false
2022/04/01 12:18:28 repository/repository.go:164	repository.(*Repository).LoadBlob	335	load data with id 7553984f4919eefa37b2bc7ce3e35b113b041550de2fa1a45dcfd8846ba82bf9 (buf len 0, cap 0)
2022/04/01 12:18:28 repository/repository.go:178	repository.(*Repository).LoadBlob	335	blob data/7553984f4919eefa37b2bc7ce3e35b113b041550de2fa1a45dcfd8846ba82bf9 found: <Blob (data) 7553984f, offset 440528, length 989358>
2022/04/01 12:18:28 restic/readerat.go:30	restic.ReadAt	335	ReadAt(<data/abedc70a9e>) at 440528, len 989358
2022/04/01 12:18:28 cache/backend.go:179	cache.(*Backend).Load	335	Load(<data/abedc70a9e>, 989358, 440528): delegating to backend
2022/04/01 12:18:28 s3/s3.go:323	s3.(*Backend).openReader	335	Load <data/abedc70a9e>, length 989358, offset 440528 from data/ab/abedc70a9e56925869afacb4d1a67f295875eb4db7a69b015130a1a2e4a342bb
2022/04/01 12:18:28 s3/s3.go:341	s3.(*Backend).openReader	335	range: 440528-1429885
2022/04/01 12:18:28 debug/round_tripper_debug.go:83	debug.loggingRoundTripper.RoundTrip	335	------------  HTTP REQUEST -----------
GET <secret> HTTP/1.1
Host: <secret>
User-Agent: MinIO (linux; amd64) minio-go/v7.0.23
Authorization: <secret>
Range: bytes=440528-1429885
X-Amz-Content-Sha256: <secret>
X-Amz-Date: 20220401T121828Z


2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	336	fuse: <- Interrupt [ID=0x1b Node=0x0 Uid=0 Gid=0 Pid=0] ID 0x1a
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	336	fuse: -> [ID=0x1b] Interrupt
2022/04/01 12:18:28 debug/round_tripper_debug.go:88	debug.loggingRoundTripper.RoundTrip	335	RoundTrip() returned error: context canceled
2022/04/01 12:18:28 repository/repository.go:201	repository.(*Repository).LoadBlob	335	error loading blob <Blob (data) 7553984f, offset 440528, length 989358>: ReadFull(<data/abedc70a9e>): context canceled
2022/04/01 12:18:28 fuse/file.go:107	fuse.(*openFile).getBlobAt	335	LoadBlob(myfile, 7553984f4919eefa37b2bc7ce3e35b113b041550de2fa1a45dcfd8846ba82bf9) failed: ReadFull(<data/abedc70a9e>): context canceled
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	335	fuse: -> [ID=0x1a] Read error=EIO: ReadFull(<data/abedc70a9e>): context canceled
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	353	fuse: <- Read [ID=0x1c Node=0x7 Uid=0 Gid=0 Pid=8] 0x2 4096 @0x0 dir=false fl=0 lock=0 ffl=OpenReadOnly+OpenNonblock
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	354	fuse: <- Interrupt [ID=0x1d Node=0x0 Uid=0 Gid=0 Pid=0] ID 0x1c
2022/04/01 12:18:28 fuse/file.go:117	fuse.(*openFile).Read	353	Read(myfile, 4096, 0), file size 26998690
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	354	fuse: -> [ID=0x1d] Interrupt
2022/04/01 12:18:28 bloblru/cache.go:86	bloblru.(*Cache).Get	353	bloblru.Cache: get 7553984f4919eefa37b2bc7ce3e35b113b041550de2fa1a45dcfd8846ba82bf9, hit false
2022/04/01 12:18:28 repository/repository.go:164	repository.(*Repository).LoadBlob	353	load data with id 7553984f4919eefa37b2bc7ce3e35b113b041550de2fa1a45dcfd8846ba82bf9 (buf len 0, cap 0)
2022/04/01 12:18:28 repository/repository.go:178	repository.(*Repository).LoadBlob	353	blob data/7553984f4919eefa37b2bc7ce3e35b113b041550de2fa1a45dcfd8846ba82bf9 found: <Blob (data) 7553984f, offset 440528, length 989358>
2022/04/01 12:18:28 restic/readerat.go:30	restic.ReadAt	353	ReadAt(<data/abedc70a9e>) at 440528, len 989358
2022/04/01 12:18:28 cache/backend.go:179	cache.(*Backend).Load	353	Load(<data/abedc70a9e>, 989358, 440528): delegating to backend
2022/04/01 12:18:28 repository/repository.go:201	repository.(*Repository).LoadBlob	353	error loading blob <Blob (data) 7553984f, offset 440528, length 989358>: ReadFull(<data/abedc70a9e>): context canceled
2022/04/01 12:18:28 fuse/file.go:107	fuse.(*openFile).getBlobAt	353	LoadBlob(myfile, 7553984f4919eefa37b2bc7ce3e35b113b041550de2fa1a45dcfd8846ba82bf9) failed: ReadFull(<data/abedc70a9e>): context canceled
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	353	fuse: -> [ID=0x1c] Read error=EIO: ReadFull(<data/abedc70a9e>): context canceled
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	355	fuse: <- Flush [ID=0x1e Node=0x6 Uid=0 Gid=0 Pid=8] 0x1 fl=0x0 lk=0x145173f8a364c372
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	355	fuse: -> [ID=0x1e] Flush
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	356	fuse: <- Release [ID=0x20 Node=0x6 Uid=0 Gid=0 Pid=0] 0x1 fl=OpenReadOnly+OpenNonblock rfl=0 owner=0x0
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	356	fuse: -> [ID=0x20] Release
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	357	fuse: <- Flush [ID=0x22 Node=0x7 Uid=0 Gid=0 Pid=8] 0x2 fl=0x0 lk=0x145173f8a364c372
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	357	fuse: -> [ID=0x22] Flush
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	358	fuse: <- Release [ID=0x24 Node=0x7 Uid=0 Gid=0 Pid=0] 0x2 fl=OpenReadOnly+OpenNonblock rfl=0 owner=0x0
2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	358	fuse: -> [ID=0x24] Release

What backend/server/service did you use to store the repository?

I'm not sure.

Expected behavior

The file should be readable without errors.

Actual behavior

I'm getting read <filename>: input/output error.

Do you have an idea how to solve the issue?

I can read a small amount of data from the problematic file like this: head -c1 /path/to/my/file. This does not give me any error. After doing this, the problem is gone - my Go program now has no problem anymore with reading the whole file.

@MichaelEischer
Copy link
Member

Can you try whether setting the environment variable GODEBUG=asyncpreemptoff=1 fixes the problem?

@MichaelEischer MichaelEischer added state: need feedback waiting for feedback, e.g. from the submitter state: need investigating cause unknown, need investigating/troubleshooting labels Apr 3, 2022
@dummyalias
Copy link
Author

Thanks for looking into the problem! I have set GODEBUG=asyncpreemptoff=1 for the restic mount command, but the error still occurs.

@MichaelEischer
Copy link
Member

Did you try setting the environment variable for the Go program you use to read from the fuse mount?

@dummyalias
Copy link
Author

I had not, but have now. When running my Go program, that is reading the mounted files, with GODEBUG=asyncpreemptoff=1 the error still occurs.

@dummyalias
Copy link
Author

By now I have a suspicion, that the problem is about the performance of the Minio backend. I have noticed that the problem almost exclusively occurs in a repository that is ca. 2TB in size and runs on a virtualized server. With a repository that is only ca. 300GB in size and runs on bare metal, the problem almost never occurs.

I am not familiar with the code of restic, but in the debug log I posted above I can see that a context was canceled. Is it possible that a timeout is in place here, which is too short? The input/output error occurs rather quickly, so if a timeout is the problem, I think it couldn't be more than one or two seconds.

@MichaelEischer
Copy link
Member

it looks like restic is receiving lots of interrupts, e.g.:

2022/04/01 12:18:28 restic/cmd_mount.go:153	main.runMount.func2	336	fuse: <- Interrupt [ID=0x1b Node=0x0 Uid=0 Gid=0 Pid=0] ID 0x1a

That particular one seems to be the one responsible for causing the canceled context. So the main question is probably where that interrupt originates from. From looking at the fuse library code, that interrupt is sent from the kernel. So maybe the application reading the files is canceling its read requests too frequently?

@dummyalias
Copy link
Author

Thanks for pointing out the interrupts, I didn't recognise them as a problem source. Unfortunately I'm not familiar with bazil.org/fuse, so I'm unable to interpret them. If the interrupt is sent from the kernel, does that mean it's a UNIX-Signal? If so: is there a way to tell which one it is? Judging by fuse's logging code, the logged ID 0x1a isn't it (altough if it were, it would be SIGVTALRM which doesn't sound too implausible to me).

I'm not quite sure what you mean by "canceling read requests", but I'm not usually closing unread files or killing Go routines while they are reading. I'm just opening two (gzipped) files in quick succession and creating a new gzip reader for each using gzip.NewReader (which I think starts reading the files already). The error occurs when creating the gzip reader for the second file, which is much larger than the first.

@MichaelEischer
Copy link
Member

My guess would be that what we're seeing here is that a program executes a read syscall for a file from the fuse mount and then blocks waiting for data. Then a signal arrives at that program which interrupts that syscall, causing the kernel to cancel the read request sent to restic.

I've noticed something interesting: the error log reports that restic returned an "EIO" == "input/output error" after canceling the request, so that's the source of the errors. I've dug a bit into the fuse library and it looks, like we're returning the wrong error in case of an interrupted syscall. Could you try whether the following patch fixes the problem in your use case? (The patch is far from complete)

diff --git a/internal/fuse/file.go b/internal/fuse/file.go
index 571d5a865..d54f58f00 100644
--- a/internal/fuse/file.go
+++ b/internal/fuse/file.go
@@ -105,6 +105,9 @@ func (f *openFile) getBlobAt(ctx context.Context, i int) (blob []byte, err error
        blob, err = f.root.repo.LoadBlob(ctx, restic.DataBlob, f.node.Content[i], nil)
        if err != nil {
                debug.Log("LoadBlob(%v, %v) failed: %v", f.node.Name, f.node.Content[i], err)
+               if errors.Is(err, context.Canceled) {
+                       return nil, context.Canceled
+               }
                return nil, err
        }

dummyalias pushed a commit to dummyalias/restic that referenced this issue May 9, 2022
@dummyalias
Copy link
Author

dummyalias commented May 9, 2022

I have A/B tested without (cc8a03b) and with your fix (dummyalias@d555008) a few times and it seems to work consistently with your fix while not working without it. Awesome, thanks a lot!

@MichaelEischer MichaelEischer removed the state: need feedback waiting for feedback, e.g. from the submitter label May 14, 2022
@borkd
Copy link

borkd commented Jul 3, 2022

I have ran into transient input/output errors recently on a reliable server-grade bare metal with a healthy v2 repo on local filesystem mounted for consumption by another restic process.

Prior to this proposed patch each read pass over a TB+ repo would yield hundreds of transient input/output errors, errors in read and Readdirnames. Readdirnames erros were roughly 3x more frequent than errors in read. With this patch, however incomplete the fix might be, quick tests show the read errors gone but errors in Readdirnames are still present.

@MichaelEischer - any suggestions on how to eliminate problems with Readdirnames?

@borkd
Copy link

borkd commented Jul 4, 2022

I stand corrected. Quick tests were incremental and did not use --force

  • Full backups without this patch had both Readdirnames and read errors, none of which were on the same exact files on subsequent runs.
  • Patched unforced incremental backups had only Readdirnames errors. Looking at the final log, lack of read errors is likely from fuse mount not being stressed enough due to small change set (less than 1% of source data was changed vs parent snapshot).
  • Patched backup --force on the same restic mountpoints results in Readdirnames and read errors for files which have otherwise no issues. List of affected files changes is never the same

@borkd
Copy link

borkd commented Jul 4, 2022

Simple retry after a brief pause seems to work around errors in Readdirnames.

@MichaelEischer
Copy link
Member

@MichaelEischer - any suggestions on how to eliminate problems with Readdirnames?

As mentioned above, the patch snippet is far from complete. We'll need similar checks throughout the whole fuse implementation.

@MichaelEischer
Copy link
Member

I've opened PR #3875 which should also fix the Readdirnames problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants