New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/db: Don't put truncated files (ref #6855, ref #6501) #6888
Conversation
This also needs a migration, as existing brokenness (files with non-nil I'd like @calmh to have a look at this before merging, as I want to be as sure as possible that I don't just introduce a new kind of truncate-hell here and he dealt (had to deal) with truncate-messes before. |
I'm on mobile and away until tomorrow but will take a look then. I've suggested before that we we should treat entries with missing parts (blocks, version vectors) as simply missing themselves. Conflict resolution should take over and repair it from there? |
(Still away, back Monday...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable, yet I'm not sure I see the need for the migration. After the fixup the file will be in the database with no blocks (I guess?) and the MustRescan flag. We'll hash it on the next scan, or trigger a rescan when pulling it.
If we didn't do the migration and returned NotFound for the file, we'd hash it on next scan or mark it for rescan when pulling (because present on disk but not in database), which is simpler and safer should the problem resurface randomly (database corruption dropping blocks or whatnot)
lib/db/backend/backend.go
Outdated
e := &errClosed{} | ||
return errors.As(err, &e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think var e *errClosed
feels more descriptive than the struct literal thing, here it feels like i need to count the ampersands while stating the type explicitly is less thinking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weirdly enought, that change causes test failures but not the panics that are documented on https://golang.org/pkg/errors/#As:
As panics if target is not a non-nil pointer to either a type that implements error, or to any interface type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I mean
var e *errClosed
return errors.As(err, &e)
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that breaks our unit tests, see below. I don't really get it, as we actually pass a non-nil pointer to a typed nil-pointer implementing error, which to my understanding should fullfill this:
An error matches target if the error's concrete value is assignable to the value pointed to by target [...]
$ go test -race -short ./lib/db/
--- FAIL: TestIgnoredFiles (0.00s)
db_test.go:40: key not found
--- FAIL: TestUpdate0to3 (0.00s)
db_test.go:182: key not found
--- FAIL: TestRepairSequence (0.00s)
db_test.go:327: key not found
--- FAIL: TestDowngrade (0.00s)
db_test.go:472: key not found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that's not possible... https://play.golang.org/p/Em_-8XFeVJj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the failures happen on exactly this change though. I pushed it so you can see what the change and error are, maybe there's a typo I fail to see? I can't investigate right now, will do so later if the mystery hasn't been lifted until then.
lib/db/db_test.go
Outdated
@@ -305,7 +305,7 @@ func TestRepairSequence(t *testing.T) { | |||
{Name: "duplicate", Blocks: genBlocks(2)}, | |||
{Name: "missing", Blocks: genBlocks(3)}, | |||
{Name: "overwriting", Blocks: genBlocks(4)}, | |||
{Name: "inconsistent", Blocks: genBlocks(5)}, | |||
{Name: "inconsistent", Blocks: genBlocks(2)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment on why it's important to have this specific number of blocks? (I guess to get the same blocks as another file?)
We do return a not found error, which makes this work except that when we encounter an err not found for entries in the global list, we panic. For one just dropping the global list entry in this case, possibly leaving other things inconsistent, would also be bad and I think we need at least discoverable failure reports in such cases - I wouldn't want a bug like this to just fly under the radar. And in this case there might even be a catch with just dropping the fileinfo: If only the device in question ever bumped the file version, the new file info created will have e.g. version {A: 1} while everyone else has e.g. {A: 10}, thus everyone elses, possibly older version overrides the current one. I really want to do failure handling right around the db, meaning try to recover gracefully from everything possible as you say, but also have means to know about failures. And recovering correctly often means that the caller needs to know whether there's been a failure or the file is genuinely not existing. |
Fair enough |
lib/db/backend/backend.go
Outdated
} | ||
|
||
func IsNotFound(err error) bool { | ||
_, ok := err.(*errNotFound) | ||
return ok | ||
var e *errClosed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...notFound :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It had to be something like that... Sorry for the wasted time.
* main: (368 commits) build: We now target Go 1.14 lib/fs: Disable ioctl on ppc (fixes syncthing#6898) (syncthing#6901) gui, man, authors: Update docs, translations, and contributors lib/dialer: Try dialing without reuse in parallel (fixes syncthing#6892) (syncthing#6893) cmd/stcrashreceiver: Don't crash on nil err all: Remove need to restart syncthing (syncthing#6883) lib/db: Don't put truncated files (ref syncthing#6855, ref syncthing#6501) (syncthing#6888) lib/osutil: Check returned error instead of info (ref syncthing#6885) (syncthing#6887) gui, man, authors: Update docs, translations, and contributors lib/osutil: Preserve perms in AtomicWriter (fixes #tbd) (syncthing#6885) lib/fs: Fix WatchRename test for FreeBSD (fixes syncthing#6613) lib/fs: Unwrap mtimeFile, get fd the "correct" way (ref syncthing#6875) (syncthing#6877) lib/model: Don't close file early (fixes syncthing#6875) (syncthing#6876) lib/fs: Unwrap mtimeFile, get fd the "correct" way (ref syncthing#6875) (syncthing#6877) gui, man, authors: Update docs, translations, and contributors lib/fs: Fix WatchRename test for FreeBSD (fixes syncthing#6613) lib/model: Don't close file early (fixes syncthing#6875) (syncthing#6876) lib/db: Log context on panic (syncthing#6872) gui: Don't show pull order on SO folders (ref syncthing#6807) (syncthing#6871) lib/model: Check folder error before sync-waiting (fixes syncthing#6793) (syncthing#6847) ...
#6855
The tiny change in
TestRepairSequence
triggers the problem described above. To prevent it, don't ever write truncated files infos. If we wanted to write truncated files, we'd need to always redirect blocklists and check that a block list for the givenBlocksHash
of the truncated file already exists. Given we only write truncated files when repairing sequences and it's just an irrelevant "optimisation", I don't want to go there and instead just write full file infos.I also added context to errors in
fillFileInfo
and adjusted theIsClosed
etc. checks to useerrors.As
, to make the "no file info at all" distinguishable from "blocks/version not found".I really need to pay more attention to numbers in sentry. The "missing-entry-panic" already affects 30 users on v1.8.0 (https://sentry.syncthing.net/syncthing/syncthing/?query=is%3Aunresolved+release%3Av1.8.0+device+present&statsPeriod=14d) and over 300 in total (https://sentry.syncthing.net/syncthing/syncthing/?query=is%3Aunresolved+device+present&statsPeriod=14d). I usually just notice newly occurring events and then either take action or shelf them...