Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/db: Don't put truncated files (ref #6855, ref #6501) #6888

Merged
merged 5 commits into from Aug 18, 2020

Conversation

imsodin
Copy link
Member

@imsodin imsodin commented Aug 11, 2020

Calling .putFile with truncate == true and a file with less blocks than blocksIndirectionCutoff puts a fileinfo without blocks. truncate == true is only the case when repairing sequence indexes, so this "shoudn't happen"™. However it does, so it's a real practical issue.

#6855

The tiny change in TestRepairSequence triggers the problem described above. To prevent it, don't ever write truncated files infos. If we wanted to write truncated files, we'd need to always redirect blocklists and check that a block list for the given BlocksHash of the truncated file already exists. Given we only write truncated files when repairing sequences and it's just an irrelevant "optimisation", I don't want to go there and instead just write full file infos.

I also added context to errors in fillFileInfo and adjusted the IsClosed etc. checks to use errors.As, to make the "no file info at all" distinguishable from "blocks/version not found".

I really need to pay more attention to numbers in sentry. The "missing-entry-panic" already affects 30 users on v1.8.0 (https://sentry.syncthing.net/syncthing/syncthing/?query=is%3Aunresolved+release%3Av1.8.0+device+present&statsPeriod=14d) and over 300 in total (https://sentry.syncthing.net/syncthing/syncthing/?query=is%3Aunresolved+device+present&statsPeriod=14d). I usually just notice newly occurring events and then either take action or shelf them...

@imsodin
Copy link
Member Author

imsodin commented Aug 12, 2020

This also needs a migration, as existing brokenness (files with non-nil BlocksHash but nil Blocks) will not solve itself. I added a test for that.

I'd like @calmh to have a look at this before merging, as I want to be as sure as possible that I don't just introduce a new kind of truncate-hell here and he dealt (had to deal) with truncate-messes before.

@calmh
Copy link
Member

calmh commented Aug 12, 2020

I'm on mobile and away until tomorrow but will take a look then.

I've suggested before that we we should treat entries with missing parts (blocks, version vectors) as simply missing themselves. Conflict resolution should take over and repair it from there?

@calmh
Copy link
Member

calmh commented Aug 15, 2020

(Still away, back Monday...)

Copy link
Member

@calmh calmh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable, yet I'm not sure I see the need for the migration. After the fixup the file will be in the database with no blocks (I guess?) and the MustRescan flag. We'll hash it on the next scan, or trigger a rescan when pulling it.

If we didn't do the migration and returned NotFound for the file, we'd hash it on next scan or mark it for rescan when pulling (because present on disk but not in database), which is simpler and safer should the problem resurface randomly (database corruption dropping blocks or whatnot)

Comment on lines 161 to 162
e := &errClosed{}
return errors.As(err, &e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think var e *errClosed feels more descriptive than the struct literal thing, here it feels like i need to count the ampersands while stating the type explicitly is less thinking

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weirdly enought, that change causes test failures but not the panics that are documented on https://golang.org/pkg/errors/#As:

As panics if target is not a non-nil pointer to either a type that implements error, or to any interface type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I mean

	var e *errClosed
	return errors.As(err, &e)

right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that breaks our unit tests, see below. I don't really get it, as we actually pass a non-nil pointer to a typed nil-pointer implementing error, which to my understanding should fullfill this:

An error matches target if the error's concrete value is assignable to the value pointed to by target [...]

$ go test -race -short ./lib/db/
--- FAIL: TestIgnoredFiles (0.00s)
    db_test.go:40: key not found
--- FAIL: TestUpdate0to3 (0.00s)
    db_test.go:182: key not found
--- FAIL: TestRepairSequence (0.00s)
    db_test.go:327: key not found
--- FAIL: TestDowngrade (0.00s)
    db_test.go:472: key not found

Copy link
Member

@calmh calmh Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's not possible... https://play.golang.org/p/Em_-8XFeVJj

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the failures happen on exactly this change though. I pushed it so you can see what the change and error are, maybe there's a typo I fail to see? I can't investigate right now, will do so later if the mystery hasn't been lifted until then.

@@ -305,7 +305,7 @@ func TestRepairSequence(t *testing.T) {
{Name: "duplicate", Blocks: genBlocks(2)},
{Name: "missing", Blocks: genBlocks(3)},
{Name: "overwriting", Blocks: genBlocks(4)},
{Name: "inconsistent", Blocks: genBlocks(5)},
{Name: "inconsistent", Blocks: genBlocks(2)},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on why it's important to have this specific number of blocks? (I guess to get the same blocks as another file?)

@imsodin
Copy link
Member Author

imsodin commented Aug 17, 2020

We do return a not found error, which makes this work except that when we encounter an err not found for entries in the global list, we panic. For one just dropping the global list entry in this case, possibly leaving other things inconsistent, would also be bad and I think we need at least discoverable failure reports in such cases - I wouldn't want a bug like this to just fly under the radar. And in this case there might even be a catch with just dropping the fileinfo: If only the device in question ever bumped the file version, the new file info created will have e.g. version {A: 1} while everyone else has e.g. {A: 10}, thus everyone elses, possibly older version overrides the current one.

I really want to do failure handling right around the db, meaning try to recover gracefully from everything possible as you say, but also have means to know about failures. And recovering correctly often means that the caller needs to know whether there's been a failure or the file is genuinely not existing.

@calmh
Copy link
Member

calmh commented Aug 17, 2020

Fair enough

}

func IsNotFound(err error) bool {
_, ok := err.(*errNotFound)
return ok
var e *errClosed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...notFound :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It had to be something like that... Sorry for the wasted time.

@calmh calmh merged commit 8f52158 into syncthing:main Aug 18, 2020
@imsodin imsodin deleted the db/killBlocksOnSeqRep branch August 18, 2020 08:40
@calmh calmh added this to the v1.9.0 milestone Aug 18, 2020
calmh added a commit to calmh/syncthing that referenced this pull request Aug 19, 2020
* main: (368 commits)
  build: We now target Go 1.14
  lib/fs: Disable ioctl on ppc (fixes syncthing#6898) (syncthing#6901)
  gui, man, authors: Update docs, translations, and contributors
  lib/dialer: Try dialing without reuse in parallel (fixes syncthing#6892) (syncthing#6893)
  cmd/stcrashreceiver: Don't crash on nil err
  all: Remove need to restart syncthing (syncthing#6883)
  lib/db: Don't put truncated files (ref syncthing#6855, ref syncthing#6501) (syncthing#6888)
  lib/osutil: Check returned error instead of info (ref syncthing#6885) (syncthing#6887)
  gui, man, authors: Update docs, translations, and contributors
  lib/osutil: Preserve perms in AtomicWriter (fixes #tbd) (syncthing#6885)
  lib/fs: Fix WatchRename test for FreeBSD (fixes syncthing#6613)
  lib/fs: Unwrap mtimeFile, get fd the "correct" way (ref syncthing#6875) (syncthing#6877)
  lib/model: Don't close file early (fixes syncthing#6875) (syncthing#6876)
  lib/fs: Unwrap mtimeFile, get fd the "correct" way (ref syncthing#6875) (syncthing#6877)
  gui, man, authors: Update docs, translations, and contributors
  lib/fs: Fix WatchRename test for FreeBSD (fixes syncthing#6613)
  lib/model: Don't close file early (fixes syncthing#6875) (syncthing#6876)
  lib/db: Log context on panic (syncthing#6872)
  gui: Don't show pull order on SO folders (ref syncthing#6807) (syncthing#6871)
  lib/model: Check folder error before sync-waiting (fixes syncthing#6793) (syncthing#6847)
  ...
@st-review st-review added the frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion label Aug 19, 2021
@syncthing syncthing locked and limited conversation to collaborators Aug 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants