IO corruption on 2.86 #2609
Comments
Thanks for the detailed report with clean, reproducible steps! I verified that it works well with the latest dev version: https://github.com/chrislusf/seaweedfs/releases/tag/dev
|
I've previously experienced a similar issue, as reported in #3384. In my experience it happens because Plex periodically vacuums its SQLite DB. I tried this sequence with v3.18, using the default configuration to keep it simple:
I can't see how your test would have worked. Did you definitely run it on a SeaweedFS mount? |
@nickb937 works well with the dev version. Please help to confirm. |
Alright, I just rebuilt directly from master rather than using the downloaded tar files:
|
I noticed the problem occurs only on Linux. My Mac is fine. The sqlite3 version on my Linux machine is:
On the Mac it is:
Not sure whether the sqlite3 version is the problem. |
I have compared the contents of the pread64/pwrite64 calls between sqlite running on seaweedfs and on a different filesystem. From the application's perspective they are exactly the same. I am now trying to produce a small Python script that reproduces the problem by copying the IO sequence that sqlite performs, so that I can view the IO patterns in seaweedfs. The only thing I've noticed is that append mode on seaweedfs behaves differently from Linux: seek operations don't work on seaweedfs in append mode, whereas they do on Linux. However, sqlite doesn't use this mode, and the Python docs mention that some filesystems behave this way. |
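The append-mode observation can be probed with a few raw file-descriptor calls. The following is only a sketch under assumptions: `probe_append_mode` is a hypothetical helper name, the file contents are arbitrary, and it has not been run against a SeaweedFS mount. On a typical local Linux filesystem the seek affects the read, while the write still lands at the end of the file.

```python
import os
import tempfile


def probe_append_mode(path):
    """Return (bytes read after seeking to 0, final file contents).

    Hypothetical helper for illustration; not part of SeaweedFS or sqlite.
    """
    # Start from a known six-byte file.
    with open(path, "wb") as f:
        f.write(b"abcdef")

    fd = os.open(path, os.O_RDWR | os.O_APPEND)
    try:
        # Seek back to the start and read: does the seek take effect?
        os.lseek(fd, 0, os.SEEK_SET)
        head = os.read(fd, 3)
        # Seek back again and write: with O_APPEND the data should go to
        # the end of the file regardless of the current offset.
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, b"XYZ")
    finally:
        os.close(fd)

    with open(path, "rb") as f:
        return head, f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(probe_append_mode(os.path.join(tmp, "append-probe.bin")))
```

Running it once on a local filesystem and once inside the mount, then comparing the two tuples, would show whether the mount treats seeks differently in append mode.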
FWIW, |
The SQLite VACUUM does the following sequence:
Starting from fresh:
We can see this sequence marking the beginning of the third state of re-writing to FD3 (around line 4081):
What I've noticed is that none of these pwrite64() calls are modifying the file, despite it being in mode
Whereas if you perform the same calls on some other filesystem:
Here's a Python script that replicates the problem. Critical to reproducing the issue are the os.fdatasync() calls and os.ftruncate(), and the file must actually need truncating. If you comment out either of the fdatasync() calls or the ftruncate(), the test will not fail.
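As a rough illustration of that kind of sequence (a minimal sketch, not the script originally posted): the 5120 and 4096 byte sizes are borrowed from the log output elsewhere in this thread, while the function name `replay_truncate_sequence` and the fill bytes are assumptions. Note that `os.fdatasync()` is not available on macOS.

```python
import os
import tempfile


def replay_truncate_sequence(path):
    """Write, sync, overwrite, sync, truncate, then verify the contents.

    Hypothetical reproduction sketch; returns True when the file holds
    exactly the second write after truncation.
    """
    first = b"A" * 5120   # initial contents, longer than the final file
    second = b"B" * 4096  # overwrite that should fully replace the kept range

    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.pwrite(fd, first, 0)
        os.fdatasync(fd)               # first sync: flush the initial data
        os.pwrite(fd, second, 0)
        os.fdatasync(fd)               # second sync: flush the overwrite
        os.ftruncate(fd, len(second))  # shrink the file to 4096 bytes
    finally:
        os.close(fd)

    # Re-open the file so the check sees what was actually stored.
    with open(path, "rb") as f:
        return f.read() == second


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ok = replay_truncate_sequence(os.path.join(tmp, "file"))
        print("contents match" if ok else "contents differ")
```

Pointing `path` at a file inside the mount instead of a temporary directory turns this into a pass/fail check for the filesystem under test.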
|
First write:
Second write creates a new chunk, 3,060d51e12c, but file should presumably consist of only 1 chunk:
os.ftruncate() truncates chunk 1,05ea52ca24 (the original data) rather than chunk 3,060d51e12c (the new data).
Looks like
|
Fixes: seaweedfs#3384
Fixes: seaweedfs#2609

Before: Chunk `3,577c823253` is dropped when truncate sets file length to 4096:

```
I0806 19:42:26.236780 weedfs_file_sync.go:151 /file set chunks: 2
I0806 19:42:26.236790 weedfs_file_sync.go:153 /file chunks 0: 3,577c823253 [0,5120)
I0806 19:42:26.236798 weedfs_file_sync.go:153 /file chunks 1: 6,58be291f0e [0,4096)
I0806 19:42:26.237150 weedfs_attr.go:53 /file setattr set size=4096 filersize=5120 chunks=2
I0806 19:42:26.237171 weedfs_attr.go:66 truncated chunk 3,577c823253 from 5120 to 4096
I0806 19:42:26.237411 weedfs_file_sync.go:102 doFlush /file fh 0
I0806 19:42:26.237430 weedfs_file_sync.go:151 /file set chunks: 1
I0806 19:42:26.237439 weedfs_file_sync.go:153 /file chunks 0: 3,577c823253 [0,4096)
```

After, secondary chunk is retained:

```
I0806 19:40:51.755588 weedfs_file_sync.go:151 /file set chunks: 2
I0806 19:40:51.755598 weedfs_file_sync.go:153 /file chunks 0: 6,54a83ba23d [0,5120)
I0806 19:40:51.755605 weedfs_file_sync.go:153 /file chunks 1: 7,556020bbeb [0,4096)
I0806 19:40:51.756049 weedfs_attr.go:53 /file setattr set size=4096 chunks=2
I0806 19:40:51.756066 weedfs_attr.go:65 truncated chunk 6,54a83ba23d from 5120 to 4096
I0806 19:40:51.756077 weedfs_attr.go:65 truncated chunk 7,556020bbeb from 4096 to 4096
I0806 19:40:51.756109 weedfs_file_sync.go:102 doFlush /file fh 0
I0806 19:40:51.756127 weedfs_file_sync.go:151 /file set chunks: 2
I0806 19:40:51.756137 weedfs_file_sync.go:153 /file chunks 0: 6,54a83ba23d [0,4096)
I0806 19:40:51.756145 weedfs_file_sync.go:153 /file chunks 1: 7,556020bbeb [0,4096)
```
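The "After" log suggests that truncation clips every chunk to the new length and keeps all of them. A simplified Python model of that behaviour follows; it is not SeaweedFS's actual Go code. `Chunk` and `truncate_chunks` are hypothetical names, and dropping chunks that start at or beyond the new size is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    """Hypothetical model of a file chunk covering [offset, offset + size)."""
    file_id: str
    offset: int
    size: int


def truncate_chunks(chunks: List[Chunk], new_size: int) -> List[Chunk]:
    """Clip every chunk to new_size, keeping all chunks that still overlap."""
    kept = []
    for chunk in chunks:
        if chunk.offset >= new_size:
            # Assumption: a chunk entirely past the new end is no longer needed.
            continue
        clipped = min(chunk.size, new_size - chunk.offset)
        kept.append(Chunk(chunk.file_id, chunk.offset, clipped))
    return kept


if __name__ == "__main__":
    # Chunk ids and ranges taken from the "After" log above.
    before = [Chunk("6,54a83ba23d", 0, 5120), Chunk("7,556020bbeb", 0, 4096)]
    for chunk in truncate_chunks(before, 4096):
        print(chunk.file_id, f"[{chunk.offset},{chunk.offset + chunk.size})")
```

The point of the model is that the newer chunk survives truncation, so the data from the second write remains visible.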
The test passed for me with the latest build from master (version 30GB 3.20 986daec linux arm). |
Still seems to break databases where PRAGMA journal_mode=WAL (e.g. Sonarr)
|
I just had a chance to test it with Plex and it's the same error |
With the 3.25 release, Sonarr has been running a few days without issues for me |
Wooot! Awesome, just deployed 3.26 here and Plex is running stable! Thanks tons for fixing this!!!! 👏🏻 |
Describe the bug
Applying a 'vacuum' to a sqlite database file is enough to corrupt it. Comparing the pread64() and pwrite64() syscalls between the same command run on ext4 and seaweedfs shows differences, particularly in the last byte of a 1024-byte read.
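The same check can be scripted with Python's built-in `sqlite3` module. This is only a sketch: `vacuum_and_check` is a hypothetical helper name, and the database path in the usage example is a placeholder for a file on the filesystem under test.

```python
import sqlite3


def vacuum_and_check(db_path):
    """Run VACUUM, then return the rows reported by PRAGMA integrity_check.

    Hypothetical helper for illustration. A healthy database yields ["ok"];
    a corrupted one yields error descriptions or raises sqlite3.DatabaseError.
    """
    # isolation_level=None selects autocommit, so VACUUM is never issued
    # inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Placeholder path: point this at a copy of chinook.db on the mount.
    print(vacuum_and_check("chinook.db"))
```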
System Setup
OS: archlinux
sqlite: v3.37.0
Expected behavior
To replicate:
https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip
unzip chinook.zip
On weedfs:
On ext4:
If you unpack chinook.db from the zip file on a non-weedfs and a weedfs filesystem and compare the syscall output, you'll see the byte variations:
If you perform the operation twice (unzip, then vacuum), the corruption is consistent: cmp shows no byte differences between the two results:
I would expect that comparing the chinook.db between the vacuum performed on ext4 and weedfs should have no byte differences either.
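For reference, a byte-level comparison similar to `cmp -l` can be sketched in Python as follows. `differing_offsets` is a hypothetical helper name, the file paths in the usage example are placeholders, and offsets here are 0-based (`cmp -l` reports them 1-based).

```python
def differing_offsets(path_a, path_b, limit=20):
    """Compare two files byte by byte.

    Hypothetical helper for illustration. Returns (same_length, diffs),
    where diffs holds up to `limit` tuples of (offset, byte_a, byte_b).
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a = fa.read()
        b = fb.read()
    # Only the common prefix length is compared; a length mismatch is
    # reported separately through the first element of the result.
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return len(a) == len(b), diffs[:limit]


if __name__ == "__main__":
    # Placeholder paths: the vacuumed database from each filesystem.
    same_length, diffs = differing_offsets("ext4/chinook.db", "weedfs/chinook.db")
    print("same length:", same_length)
    for offset, byte_a, byte_b in diffs:
        print(f"{offset}: {byte_a:#04x} != {byte_b:#04x}")
```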