Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/db: Slightly improve indirection (ref #6372) #6373

Merged
merged 3 commits into from
Feb 27, 2020

Conversation

calmh
Copy link
Member

@calmh calmh commented Feb 27, 2020

Purpose

I was working on indirecting version vectors, and that resulted in some
refactoring and improving the existing block indirection stuff. We may
or may not end up doing the version vector indirection, but I think
these changes are reasonable anyhow and will simplify the diff
significantly if we do go there. The main points are:

  • A bunch of renaming to make the indirection and GC not about "blocks"
    but about "indirection".

  • Adding a cutoff so that we don't actually indirect for small block
    lists. This gets us better performance when handling small files as it
    cuts out the indirection for quite small loss in space efficiency.

  • Being paranoid and always recalculating the hash on put. This costs
    some CPU, but the consequences if a buggy or malicious implementation
    silently substituted the block list by lying about the hash would be bad.

This may or may not be RC material depending on how we think about the change to indirecting small block lists and letting the old env var into the wild...

I was working on indirecting version vectors, and that resulted in some
refactoring and improving the existing block indirection stuff. We may
or may not end up doing the version vector indirection, but I think
these changes are reasonable anyhow and will simplify the diff
significantly if we do go there. The main points are:

- A bunch of renaming to make the indirection and GC not about "blocks"
  but about "indirection".

- Adding a cutoff so that we don't actually indirect for small block
  lists. This gets us better performance when handling small files as it
  cuts out the indirection for quite small loss in space efficiency.

- Being paranoid and always recalculating the hash on put. This costs
  some CPU, but the consequences if a buggy or malicious implementation
  silently substituted the block list by lying about the hash would be bad.
return err
}
} else if err != nil {
return err
}
fi.Blocks = nil
} else {
fi.BlocksHash = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this. I think this is a generally useful piece of information to have. Perhaps we should check if the block list is empty when loading instead to decide if we need to take an indirection or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, avoid adding to bloom filter if it has a blockhash but also a blocklist in that case?

Copy link
Member Author

@calmh calmh Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. You mean that we should always calculate, set and send the hash, even when we don't need it internally? It's not expensive to do it for the small lists that this avoids, so sure we could do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sort of agree, even though the review comment below suggests the opposite: I feel like BlocksHash should either be a db internal implementation detail or always be there if exposed towards protocol.


func init() {
// deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lost the plot a bit, but is this actually released already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the RC. So if we delay the 1.4.0 release and put this into another RC this can go away, otherwise it kinda needs to be there.

func (t readWriteTransaction) putFile(fkey []byte, fi protocol.FileInfo) error {
var bkey []byte

if len(fi.Blocks) > blocksIndirectionCutoff {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does idxchk need handling of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. It would, if we did your change to always have the hash, but as is now the idxck checks the hash if it's present and otherwise not.

Copy link
Member

@imsodin imsodin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hindsight it may have been better to not expose BlocksHash on the protocol and consider it an implementation detail, but I guess that ship has sailed at this point (too big a refactor for the RC).

blocksKey := t.keyer.GenerateBlockListKey(nil, fi.BlocksHash)
if _, err := t.Get(blocksKey); backend.IsNotFound(err) {
func (t readWriteTransaction) putFile(fkey []byte, fi protocol.FileInfo) error {
var bkey []byte
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this declared out here? Isn't (re-) used outside of the if scope below as far as I can see.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was in my original which also indirected the version vector... I figured it could stay there for clarity, but it has no practical reason now, no.

// we need to copy it.
err := f.Unmarshal(append([]byte{}, dbi.Value()...))

intf, err := t.unmarshalTrunc(dbi.Value(), true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the commend above valid (anymore)? unmarshalTrunc doesn't do any buffer copying.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't valid at all since we switched to protobuf, which doesn't keep references (or we'd be fucked in so many places...).

This change in general isn't strictly needed right now, but I think all unmarshalling should use the unmarshalTrunc function. (Again, when indirecting version vectors that also affects the truncated version.)

@calmh
Copy link
Member Author

calmh commented Feb 27, 2020

Changed to always set the hash regardless

Copy link
Member

@imsodin imsodin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's left to address: To RC, or not to RC?

I am in favor of RC: Gets rid of the additional env var and if there is something wrong with it (don't think so), I'd rather have it in the same release cycle as the original change (same for not having a huge diff in the next release cycle concerning code from this one). That seems worth delaying the release (or even skipping the next one, letting the current cycle take 2 months).

@AudriusButkevicius
Copy link
Member

I'd say delay it, or release the rc with the env var changed so we're not releasing legacy straight away.

@calmh calmh merged commit 4f7a775 into syncthing:master Feb 27, 2020
@calmh calmh deleted the imprindirect branch February 27, 2020 10:34
@calmh calmh added this to the v1.4.0 milestone Feb 27, 2020
calmh added a commit that referenced this pull request Feb 27, 2020
I was working on indirecting version vectors, and that resulted in some
refactoring and improving the existing block indirection stuff. We may
or may not end up doing the version vector indirection, but I think
these changes are reasonable anyhow and will simplify the diff
significantly if we do go there. The main points are:

- A bunch of renaming to make the indirection and GC not about "blocks"
  but about "indirection".

- Adding a cutoff so that we don't actually indirect for small block
  lists. This gets us better performance when handling small files as it
  cuts out the indirection for quite small loss in space efficiency.

- Being paranoid and always recalculating the hash on put. This costs
  some CPU, but the consequences if a buggy or malicious implementation
  silently substituted the block list by lying about the hash would be bad.
calmh added a commit that referenced this pull request Feb 27, 2020
* release:
  lib/db: Remove reference to env var that never existed
  lib/db: Slightly improve indirection (ref #6372) (#6373)
calmh added a commit to calmh/syncthing that referenced this pull request Feb 27, 2020
* master:
  lib/db: Remove reference to env var that never existed
  lib/db: Slightly improve indirection (ref syncthing#6372) (syncthing#6373)
  lib/db: Remove reference to env var that never existed
  lib/db: Slightly improve indirection (ref syncthing#6372) (syncthing#6373)
calmh added a commit to calmh/syncthing that referenced this pull request Mar 7, 2020
* master: (64 commits)
  lib/db: Be more lenient during migration (fixes syncthing#6397) (syncthing#6398)
  lib/db: Be more lenient during migration (fixes syncthing#6397) (syncthing#6398)
  cmd/ursrv: Analytics for Synology dist
  build: Build image should use Go 1.13 for now
  gui, lib/api: Remove CPU & RAM measurements (fixes syncthing#6249) (syncthing#6393)
  gui, man, authors: Update docs, translations, and contributors
  all: Tweak error creation (syncthing#6391)
  authors: Cleanup on request
  build: We can now use Go 1.13
  lib/db: Prevent GC concurrently with migration (fixes syncthing#6389) (syncthing#6390)
  lib/db: Prevent GC concurrently with migration (fixes syncthing#6389) (syncthing#6390)
  build: Fix syso creation (fixes syncthing#6386) (syncthing#6387)
  build: Fix syso creation (fixes syncthing#6386) (syncthing#6387)
  lib/db: Correct metadata recalculation (fixes syncthing#6381) (syncthing#6382)
  lib/db: Correct metadata recalculation (fixes syncthing#6381) (syncthing#6382)
  lib/db: Remove reference to env var that never existed
  lib/db: Slightly improve indirection (ref syncthing#6372) (syncthing#6373)
  lib/db: Remove reference to env var that never existed
  lib/db: Slightly improve indirection (ref syncthing#6372) (syncthing#6373)
  build: Forked github.com/spaolacci/murmur3 for unsafe (ref syncthing#6371)
  ...
calmh added a commit to calmh/syncthing that referenced this pull request Mar 7, 2020
* master: (64 commits)
  lib/db: Be more lenient during migration (fixes syncthing#6397) (syncthing#6398)
  lib/db: Be more lenient during migration (fixes syncthing#6397) (syncthing#6398)
  cmd/ursrv: Analytics for Synology dist
  build: Build image should use Go 1.13 for now
  gui, lib/api: Remove CPU & RAM measurements (fixes syncthing#6249) (syncthing#6393)
  gui, man, authors: Update docs, translations, and contributors
  all: Tweak error creation (syncthing#6391)
  authors: Cleanup on request
  build: We can now use Go 1.13
  lib/db: Prevent GC concurrently with migration (fixes syncthing#6389) (syncthing#6390)
  lib/db: Prevent GC concurrently with migration (fixes syncthing#6389) (syncthing#6390)
  build: Fix syso creation (fixes syncthing#6386) (syncthing#6387)
  build: Fix syso creation (fixes syncthing#6386) (syncthing#6387)
  lib/db: Correct metadata recalculation (fixes syncthing#6381) (syncthing#6382)
  lib/db: Correct metadata recalculation (fixes syncthing#6381) (syncthing#6382)
  lib/db: Remove reference to env var that never existed
  lib/db: Slightly improve indirection (ref syncthing#6372) (syncthing#6373)
  lib/db: Remove reference to env var that never existed
  lib/db: Slightly improve indirection (ref syncthing#6372) (syncthing#6373)
  build: Forked github.com/spaolacci/murmur3 for unsafe (ref syncthing#6371)
  ...
@st-review st-review added the frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion label Feb 27, 2021
@syncthing syncthing locked and limited conversation to collaborators Feb 27, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants