lib/db: Slightly improve indirection (ref #6372) #6373

calmh · 2020-02-27T09:00:31Z

Purpose

I was working on indirecting version vectors, and that resulted in some
refactoring and improving the existing block indirection stuff. We may
or may not end up doing the version vector indirection, but I think
these changes are reasonable anyhow and will simplify the diff
significantly if we do go there. The main points are:

- A bunch of renaming to make the indirection and GC not about "blocks"
  but about "indirection".
- Adding a cutoff so that we don't actually indirect for small block
  lists. This gets us better performance when handling small files as it
  cuts out the indirection for quite small loss in space efficiency.
- Being paranoid and always recalculating the hash on put. This costs
  some CPU, but the consequences if a buggy or malicious implementation
  silently substituted the block list by lying about the hash would be bad.
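
To make the cutoff and the recalculate-on-put points concrete, here is a minimal sketch in Go. It is not the lib/db code: the `store` type, `hashBlocks`, the simplified `Block`/`FileInfo` structs, and the cutoff value of 3 are all assumptions made for illustration, and the hash scheme (SHA-256 over the concatenated block hashes) is likewise assumed. The sketch keeps the hash set for small lists too, which matches the "always set the hash" change made later in this review.

```go
package main

import "crypto/sha256"

// Block and FileInfo are simplified, hypothetical stand-ins for the
// protocol types; only the fields needed for the sketch are present.
type Block struct {
	Hash []byte
}

type FileInfo struct {
	Name       string
	Blocks     []Block
	BlocksHash []byte
}

// indirectionCutoff is an illustrative value, not necessarily the one
// used in lib/db.
const indirectionCutoff = 3

// hashBlocks computes a hash over the concatenated block hashes.
// The exact scheme is an assumption for this sketch.
func hashBlocks(bs []Block) []byte {
	h := sha256.New()
	for _, b := range bs {
		h.Write(b.Hash)
	}
	return h.Sum(nil)
}

// store is a toy in-memory stand-in for the database.
type store struct {
	files map[string]FileInfo // file name -> stored FileInfo
	lists map[string][]Block  // blocks hash -> indirected block list
}

func newStore() *store {
	return &store{
		files: make(map[string]FileInfo),
		lists: make(map[string][]Block),
	}
}

// putFile stores fi. The hash is always recalculated from the block
// list, so a hash supplied by the sender is never trusted. Only block
// lists longer than the cutoff are stored separately (indirected).
func (s *store) putFile(fi FileInfo) {
	fi.BlocksHash = hashBlocks(fi.Blocks)
	if len(fi.Blocks) > indirectionCutoff {
		s.lists[string(fi.BlocksHash)] = fi.Blocks
		fi.Blocks = nil // the list now lives under its hash
	}
	s.files[fi.Name] = fi
}
```

With this shape, a file with two blocks is stored inline in a single record, while a file with four blocks costs one extra lookup on read but shares its block list with any other file that has identical contents.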

This may or may not be RC material depending on how we think about the change to indirecting small block lists and letting the old env var into the wild...

AudriusButkevicius · 2020-02-27T09:27:52Z

lib/db/transactions.go

 				return err
 			}
 		} else if err != nil {
 			return err
 		}
+		fi.Blocks = nil
+	} else {
+		fi.BlocksHash = nil


I don't like this. I think this is a generally useful piece of information to have. Perhaps we should check if the block list is empty when loading instead to decide if we need to take an indirection or not.

Also, avoid adding to bloom filter if it has a blockhash but also a blocklist in that case?

We could. You mean that we should always calculate, set and send the hash, even when we don't need it internally? It's not expensive to do it for the small lists that this avoids, so sure we could do that.

I sort of agree, even though the review comment below suggests the opposite: I feel like BlocksHash should either be a db internal implementation detail or always be there if exposed towards protocol.
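
The suggestion in this thread (always keep the hash, and decide on load whether an indirection is needed by looking at the block list) could look roughly like the sketch below. All names here (`isIndirected`, `fillBlocks`, `hashesInUse`) and the simplified types are hypothetical, and a plain Go map stands in for the bloom filter used by GC. It also assumes that a file with no blocks at all carries no hash.

```go
package main

// Simplified, hypothetical stand-ins for the protocol types.
type Block struct {
	Hash []byte
}

type FileInfo struct {
	Blocks     []Block
	BlocksHash []byte
}

// isIndirected reports whether the stored record refers to a block
// list kept elsewhere: it has a hash but no inline blocks.
func isIndirected(fi FileInfo) bool {
	return len(fi.Blocks) == 0 && len(fi.BlocksHash) > 0
}

// fillBlocks resolves the indirection on load, if there is one.
// The boolean is false when the referenced block list is missing.
func fillBlocks(fi FileInfo, lists map[string][]Block) (FileInfo, bool) {
	if !isIndirected(fi) {
		return fi, true // inline list (or no blocks); nothing to load
	}
	bs, ok := lists[string(fi.BlocksHash)]
	if !ok {
		return fi, false
	}
	fi.Blocks = bs
	return fi, true
}

// hashesInUse collects the hashes GC must keep. Records that carry an
// inline block list are skipped even though they have a hash, since
// nothing is stored under that hash for them.
func hashesInUse(files []FileInfo) map[string]struct{} {
	used := make(map[string]struct{})
	for _, fi := range files {
		if isIndirected(fi) {
			used[string(fi.BlocksHash)] = struct{}{}
		}
	}
	return used
}
```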

AudriusButkevicius · 2020-02-27T09:29:07Z

lib/db/lowlevel.go


 func init() {
+	// deprecated


Lost the plot a bit, but is this actually released already?

It's in the RC. So if we delay the 1.4.0 release and put this into another RC this can go away, otherwise it kinda needs to be there.

AudriusButkevicius · 2020-02-27T09:29:38Z

lib/db/transactions.go

+func (t readWriteTransaction) putFile(fkey []byte, fi protocol.FileInfo) error {
+	var bkey []byte
+
+	if len(fi.Blocks) > blocksIndirectionCutoff {


Does idxchk need handling of this?

Not really. It would, if we did your change to always have the hash, but as it is now idxchk checks the hash if it's present and otherwise not.

imsodin

In hindsight it may have been better to not expose BlocksHash on the protocol and consider it an implementation detail, but I guess that ship has sailed at this point (too big a refactor for the RC).

imsodin · 2020-02-27T09:07:28Z

lib/db/transactions.go

-		blocksKey := t.keyer.GenerateBlockListKey(nil, fi.BlocksHash)
-		if _, err := t.Get(blocksKey); backend.IsNotFound(err) {
+func (t readWriteTransaction) putFile(fkey []byte, fi protocol.FileInfo) error {
+	var bkey []byte


Why is this declared out here? It isn't (re-)used outside of the if scope below, as far as I can see.

It was in my original which also indirected the version vector... I figured it could stay there for clarity, but it has no practical reason now, no.

imsodin · 2020-02-27T09:10:31Z

lib/db/transactions.go

-		// we need to copy it.
-		err := f.Unmarshal(append([]byte{}, dbi.Value()...))
+
+		intf, err := t.unmarshalTrunc(dbi.Value(), true)


Isn't the comment above valid (anymore)? unmarshalTrunc doesn't do any buffer copying.

It wasn't valid at all since we switched to protobuf, which doesn't keep references (or we'd be fucked in so many places...).

This change in general isn't strictly needed right now, but I think all unmarshalling should use the unmarshalTrunc function. (Again, when indirecting version vectors that also affects the truncated version.)

calmh · 2020-02-27T09:39:39Z

Changed to always set the hash regardless

imsodin

What's left to address: To RC, or not to RC?

I am in favor of RC: Gets rid of the additional env var and if there is something wrong with it (don't think so), I'd rather have it in the same release cycle as the original change (same for not having a huge diff in the next release cycle concerning code from this one). That seems worth delaying the release (or even skipping the next one, letting the current cycle take 2 months).

AudriusButkevicius · 2020-02-27T09:56:29Z

I'd say delay it, or release the rc with the env var changed so we're not releasing legacy straight away.

calmh added 2 commits February 27, 2020 09:57

wip

5434773

AudriusButkevicius previously approved these changes Feb 27, 2020

AudriusButkevicius reviewed Feb 27, 2020

imsodin reviewed Feb 27, 2020

wip

9b787c5

calmh dismissed AudriusButkevicius’s stale review via 9b787c5 February 27, 2020 09:39

imsodin approved these changes Feb 27, 2020

AudriusButkevicius approved these changes Feb 27, 2020

calmh merged commit 4f7a775 into syncthing:master Feb 27, 2020

calmh deleted the imprindirect branch February 27, 2020 10:34

calmh added this to the v1.4.0 milestone Feb 27, 2020

calmh added a commit that referenced this pull request Feb 27, 2020

Merge branch 'release'

daf05c6

* release: lib/db: Remove reference to env var that never existed lib/db: Slightly improve indirection (ref #6372) (#6373)

st-review added the frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion label Feb 27, 2021

syncthing locked and limited conversation to collaborators Feb 27, 2021
