
Pluggable hash algos #2314

Closed

wants to merge 1 commit

Conversation

@calmh (Member) commented Sep 26, 2015

Remaining issues:

  • Stop folder on mismatch
  • Move generation of FolderRejected to ClusterConfig and add received hash algo to the event so the GUI can do the right thing
  • Drop and rescan folder if hash algo at startup differs from what is in the index
  • Hash algo usage should be reported to usage data thingy
  • Some tests on the behavior on mismatches

@calmh calmh added the pr-WIP label Sep 26, 2015
@calmh (Member, Author) commented Sep 26, 2015

Do we even want this configurable? If not, things become easier, otherwise we need to handle the config change, rescan, etc.

@calmh (Member, Author) commented Sep 26, 2015

If we believe it's good enough (and we should make sure it is, as well as evaluate options), we don't need it configurable imho.

m.fmut.RLock()
for _, folder := range cm.Folders {
var algo scanner.HashAlgorithm
for _, opt := range folder.Options {
Review comment (Member):

Should add a convenience method (*Options) Get(key string) (value string, ok bool)
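A minimal sketch of such a helper, assuming Option/Options are a simple key/value slice (the real types live in Syncthing's protocol package and may differ):

```go
package main

import "fmt"

// Option/Options shapes are assumed here for illustration; the real
// protocol types may differ.
type Option struct {
	Key   string
	Value string
}

type Options []Option

// Get returns the value for key and whether the key was present,
// replacing the open-coded loop at each call site.
func (o Options) Get(key string) (string, bool) {
	for _, opt := range o {
		if opt.Key == key {
			return opt.Value, true
		}
	}
	return "", false
}

func main() {
	opts := Options{{Key: "hashAlgo", Value: "murmur3"}}
	if v, ok := opts.Get("hashAlgo"); ok {
		fmt.Println(v)
	}
}
```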

@AudriusButkevicius (Member):

Can't you just use: https://golang.org/pkg/hash/#Hash ?
And then just have map[string]hash.Hash availableHashingAlgos.
Why do we need the type def?

@AudriusButkevicius (Member):

Well my worry about murmur is collisions.
Also, blockmap probably needs to be aware of this too.

@calmh (Member, Author) commented Sep 26, 2015

Yeah, the database stuff may need some tweaking around the hash lengths. Murmur3 sounds collision resistant from what I can find on the internet, and I'm going to write some empirical tests later on (checking that we don't get collisions on a few hundred gigs of data, with very small block sizes, and so on).

As for this actual implementation, I'm more and more convinced that we should not have it configurable at all and just make the right choice for everyone. :)

@AudriusButkevicius (Member):

I think having it configurable is nice, but the implementation becomes much harder.
It allows people who feel adventurous to use CRC32, though the blockmap needs to know which hashes are collision resistant and which ones are just fast.

@calmh (Member, Author) commented Sep 26, 2015

Also: cityhash https://github.com/surge/cityhash


@Zillode (Contributor) commented Sep 26, 2015

I suggested murmur3 based on http://blog.reverberate.org/2012/01/state-of-hash-functions-2012.html

We should focus our benchmarks for the slow platforms imo. My pi is ready for some action:)

@calmh calmh modified the milestone: v0.13 "Copper Cockroach" Oct 16, 2015
@imraro commented Nov 3, 2015

Hi, it's me again )

> Well my worry about murmur is collisions...

> Also: cityhash https://github.com/surge/cityhash

> I suggested murmur3 based on http://blog.reverberate.org/2012/01/state-of-hash-functions-2012.html

Please note the SMHasher test suite "which evaluates collision, dispersion and randomness qualities of hash functions."

Some results from xxHash home page:

Name           Speed      Quality  Author
xxHash         5.4 GB/s   10       Y.C.
MurmurHash 3a  2.7 GB/s   10       Austin Appleby
SBox           1.4 GB/s   9        Bret Mulvey
Lookup3        1.2 GB/s   9        Bob Jenkins
CityHash64     1.05 GB/s  10       Pike & Alakuijala
FNV            0.55 GB/s  5        Fowler, Noll, Vo
CRC32          0.43 GB/s  9
MD5-32         0.33 GB/s  10       Ronald L. Rivest
SHA1-32        0.28 GB/s  10

@calmh (Member, Author) commented Nov 12, 2015

This should be pretty much mergeable, barring a few more tests maybe. The short version of what this does, in its latest iteration:

  • Adds a type scanner.HashAlgorithm that needs to be passed to all things that do hashing. I do this instead of passing a hash.Hash directly because I want to only use known algos, and we want those decorated with some extra info - such as whether it's a secure algorithm or not.
  • Refactors the block copying code to break out some methods and re-add the old method of only reusing the same block from the old version of the file. If the hash algorithm is secure (SHA256) we do exactly like we do today and use the block map to find a matching block anywhere. If the hash algorithm is not secure (Murmur3) we will reuse a block from the same file with the same offset, size and hash only.

The last point means that we don't need to worry about hash collisions between random blocks in general, we only need to worry about a given block ending up with the same hash as the previous version of the exact same block which is astronomically unlikely with at least Murmur3.
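The secure/insecure reuse rule described above can be sketched roughly like this (the type and function names are my own illustration of the idea, not necessarily what the branch uses):

```go
package main

import "fmt"

// HashAlgorithm is an enum decorated with metadata, rather than a bare
// hash.Hash, so callers can ask questions like "is this secure?".
type HashAlgorithm int

const (
	SHA256 HashAlgorithm = iota
	Murmur3
)

// Secure reports whether the algorithm is collision resistant enough
// that blocks may be matched across arbitrary files via the block map.
func (h HashAlgorithm) Secure() bool {
	return h == SHA256
}

// blockReuseScope captures the puller policy described above: a secure
// algorithm may reuse any matching block via the block map; an insecure
// one only reuses the block at the same offset and size in the old
// version of the same file.
func blockReuseScope(h HashAlgorithm) string {
	if h.Secure() {
		return "any matching block, via the block map"
	}
	return "same file, same offset, same size, same hash only"
}

func main() {
	fmt.Println(blockReuseScope(SHA256))
	fmt.Println(blockReuseScope(Murmur3))
}
```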

At some point - but maybe not immediately? - we need to handle how to change the hash algorithm. It needs to clear out the folder and reindex from start, possibly requiring a restart. So the existing hash algorithm per folder should be stashed in the database, and if it doesn't match we drop the folder and rescan.

@AudriusButkevicius (Member):

So two things:

  1. We should handle algo changing now, as I bet people will otherwise blow up in their own faces.
  2. We only warn about incompatible algos, but we still accept updates, which means we might be comparing hashes of different algos. How does that even work?

@calmh (Member, Author) commented Nov 12, 2015

Ack on #1. For #2, what should we do? I was considering dropping the connection as it's a serious problem. We could also attempt to just drop that folder from the clusterconfig and so on but it's more intricate. (I guess what happens now is the dreaded "peer went away or file changed" thing.)

@calmh (Member, Author) commented Nov 13, 2015

For changing algorithm, I think the easiest way to handle it is the same way we handle changing the folder path. That is, we don't allow it at all after the folder has been created. To change it, remove the folder (which we should handle better, causing it to actually be removed immediately, from the config and database) and then re-add it.

This is of course intrusive and annoying, but it should be - because it also requires doing the same on other devices and so on, so just updating it and keeping it shared with the devices it was shared with before doesn't make sense.

@AudriusButkevicius (Member):

So for #2, I'd just error out the folder when my algo doesn't match some other peer's algo.

@alex2108 (Contributor):

Just not allowing a change of algorithm in the GUI will lead to problems: people will change it in the config.xml and restart Syncthing, like they do for folder paths (at least I did that several times after moving the actual folder manually).

@calmh (Member, Author) commented Nov 14, 2015

Yeah

@calmh (Member, Author) commented Nov 17, 2015

This is starting to look about done. I think this is a v0.13 change as well. It's backwards compatible with v0.12 as long as SHA256 is used...

@calmh calmh added the protocol Issues that require a change to the protocol (BEP, relay, discovery) label Nov 17, 2015
@calmh calmh modified the milestones: v0.13 "Copper Cockroach", v1.0 Nov 17, 2015
@calmh calmh added the enhancement New features or improvements of some kind, as opposed to a problem (bug) label Nov 17, 2015
@AudriusButkevicius (Member):

So I'd say fuck it, let's cut the mark, and start working on v0.13, which means screw the backwards compatibility, add HashAlgo as a normal field, etc.

I'll go through the changes now.

flags |= protocol.FolderHashMurmur3 << protocol.FolderHashShiftBits
default:
panic("unknown hash algorithm")
}
Review comment (Member):

Let's have it like:

flags |= hashAlgo.Flags() or Bits()
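A hedged sketch of what such a method could look like, reusing the constant names from the diff above (their concrete values and the shift position are assumptions here, not the PR's actual definitions):

```go
package main

import "fmt"

// Constant names mirror those referenced in the diff; the values are
// illustrative assumptions.
const (
	FolderHashSHA256    uint32 = 0
	FolderHashMurmur3   uint32 = 1
	FolderHashShiftBits        = 12
)

type HashAlgorithm int

const (
	SHA256 HashAlgorithm = iota
	Murmur3
)

// FlagBits replaces the open-coded switch at each call site with a
// single method, per the review suggestion.
func (h HashAlgorithm) FlagBits() uint32 {
	switch h {
	case SHA256:
		return FolderHashSHA256 << FolderHashShiftBits
	case Murmur3:
		return FolderHashMurmur3 << FolderHashShiftBits
	default:
		panic("unknown hash algorithm")
	}
}

func main() {
	var flags uint32
	flags |= Murmur3.FlagBits()
	fmt.Printf("%#x\n", flags) // 1 << 12 == 0x1000
}
```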

flag.Parse()

var algo scanner.HashAlgorithm
if err := algo.UnmarshalText([]byte(*hashAlgo)); err != nil {
Review comment (Member):

Let's have scanner.HashAlgorithmFromString()
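The suggested constructor might look like the following, replacing the UnmarshalText dance at the call site above; the accepted spellings are guesses:

```go
package main

import "fmt"

type HashAlgorithm int

const (
	SHA256 HashAlgorithm = iota
	Murmur3
)

// HashAlgorithmFromString maps a name to an algorithm, or errors for
// anything unknown. The accepted names here are illustrative.
func HashAlgorithmFromString(s string) (HashAlgorithm, error) {
	switch s {
	case "sha256":
		return SHA256, nil
	case "murmur3":
		return Murmur3, nil
	default:
		return 0, fmt.Errorf("unknown hash algorithm %q", s)
	}
}

func main() {
	algo, err := HashAlgorithmFromString("murmur3")
	fmt.Println(algo == Murmur3, err)
}
```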

@calmh (Member, Author) commented Nov 17, 2015

> So I'd say fuck it, let's cut the mark, and start working on v0.13, which means screw the backwards compatibility

Agreed. v0.12 can live for a while in maintenance anyway. And I'll probably view 0.13 as 1.0beta...

> add HashAlgo as a normal field, etc.

Strongly disagree here though, as I think this is exactly what the flag field is for. I'm even thinking of moving the compression field in here in the same manner.

Fully agree on the HashAlgorithmFromBits and so on methods though.

@AudriusButkevicius (Member):

So I'd say that the hash algorithm is quite an integral part.
Furthermore, we should not use option fields; we should embed this in the core protocol while we are still pre-v1.0.
My view of options and flags was something like SSL, where you have extensions after the protocol is already standardized, in an attempt to keep backwards compatibility between releases in the beta.

@calmh (Member, Author) commented Nov 17, 2015

Yes on the options field, but no on the flags field. The flags field already covers the introducer, read only (master) and so on bits, which are exactly like the compression algorithm. The fact that it is integral means that it must be a part of the protocol and have a certain set of defined values and so on.
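To make the symmetry with the existing flag bits concrete, decoding would mask the algorithm field back out of the flags word; the field width and shift here are illustrative assumptions, not the PR's actual layout:

```go
package main

import "fmt"

// Assumed layout: the hash algorithm occupies a small field inside the
// 32-bit folder flags, alongside bits like introducer and read only.
const (
	FolderHashShiftBits = 12
	FolderHashMask      = 0xF // assumed 4-bit field
)

// hashAlgoFromFlags extracts the algorithm field from a flags word,
// the counterpart of packing it in on the sending side.
func hashAlgoFromFlags(flags uint32) uint32 {
	return (flags >> FolderHashShiftBits) & FolderHashMask
}

func main() {
	flags := uint32(1 << FolderHashShiftBits) // Murmur3, in this sketch
	fmt.Println(hashAlgoFromFlags(flags))
}
```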

@AudriusButkevicius (Member):

Yes, makes sense, I still thought it was part of the options since the first iteration of this.

@calmh (Member, Author) commented Nov 17, 2015

Ah 👍

@calmh (Member, Author) commented Nov 18, 2015

Should add the hash algos in use to the usage report.

My feeling here is that this is not a new usage reporting version and doesn't require revoking the usage reporting permission. My reasoning is that the user accepted to report features in use, the preview contained a number of per-folder features, and the hash algorithm is just one more feature in that export.

Am I wrong here?

@AudriusButkevicius (Member):

Agree, and it does not reveal anything private.

@AudriusButkevicius (Member):

Apart from the commented-out benchmark block, and this not currently being mergeable, I am happy to merge.

@calmh (Member, Author) commented Jan 1, 2016

Mismatches are handled pseudo-gracefully. In the new code, it detects the mismatch and stops the folder:

[screenshot: folder stopped with a hash algorithm mismatch error]

The error message is kind of ugly as it's very long and gets cut off. I'm not sure how to improve that.

The old code doesn't understand that there is a mismatch, as it doesn't look at the flags. It instead gets hash mismatches for everything:

[I6KAH] 17:42:40 INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later

[screenshot: old code reporting hash mismatches while syncing]

So, do we still want this at all - does it have tangible benefits somewhere, on mobile?

If we do, do we want it in a patch release or in v0.13? We could make it v0.13 just to make the change visible, even if we're not "hard" incompatible with previous versions.

@AudriusButkevicius (Member):

I think we should get more stuff into 0.13, get it merged there, and class it as incompatible.

Also, we might want to add something to 0.12 to make the incompatibility more obvious (such as checking ClusterConfig, at least while we are not at 1.0) so that there would be less support burden.

I think this is cool as it's modular, and it's a nice addition.
Should we branch v0.12 and v0.13 now?
Also we should perhaps start building for PPC and other architectures, given they are now available with the beta.

AudriusButkevicius added a commit that referenced this pull request Jan 1, 2016
Detect nonstandard hash algo and stop folder (ref #2314)
@calmh (Member, Author) commented Jan 1, 2016

Let's open for v0.13 additions, whatever they may be, yes.

> Also we should perhaps start building for PPC and other architectures, given they are now available with the beta.

Interesting! Indeed, even with 1.5.2:

jb@syno:~/s/g/s/syncthing $ go run build.go -goos linux -goarch ppc64 build
...
jb@syno:~/s/g/s/syncthing $ file syncthing 
syncthing: ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, not stripped

Absolutely no way whatsoever to test it, but maybe someone has boxes like that out there.

@alex2108 (Contributor) commented Jan 1, 2016

Finally had the time and motivation to do a test on my "calculator" (Raspberry Pi 1) where this could be useful:

Murmur3-128: 3min31sec
SHA256: 5min01sec
SHA256-noBlockMap: 4min17sec

My test was adding a 200MiB file on a fast device (so the CPU of the pi is the limitation), sending it to the pi, and measuring the time for the transfer to complete (the pi was idle at the start of the transfer). SHA256-noBlockMap is a self-compiled version that has Secure=false also for SHA256, to compare how that affects performance.

Sending from pi to the fast device is always ~3min30sec (tested before thinking about it... there is no hashing in that direction.)

Not a huge increase in performance, but still nice to have. I suggest adding the possibility of deactivating the blockmap for SHA256, to get some additional performance on slow devices without losing the benefit of efficient copies on faster devices, because this option does not need to be set on all devices.

edit: scanning the file itself:

Murmur3-128: 0m12.082s
SHA256: 0m45.327s

Here the difference is a lot more obvious, but at least for me this is the least important part since it's really rare that I actually add a file to a synced folder on the pi itself.

@calmh (Member, Author) commented Jan 1, 2016

@alex2108 ❤️ Thanks!

@burkemw3 (Contributor) commented Jan 2, 2016

I think providing configurable hash algos is opening up a new complication that will require maintenance and support. I predict I'll be annoyed seeing those messages in the support forum, but I can keep that to myself. If the benefit of configuration outweighs that indefinite time sink, then let's do it!

If it is merged, I recommend modifying the UI message. How does the user know what action to take when a "Hash algorithm mismatch ..." message is displayed?

I'm not opposed to only changing the hashing algorithm. I haven't been following the discussion closely enough to see if there is an algo that performs fast enough, including on our "calculator" platforms, and satisfies our collision-risk requirements.

@AudriusButkevicius (Member):

I think it's such an advanced feature that only people who know how to fix/use it will be the ones that fiddle with it, keeping the support overhead low.

@calmh (Member, Author) commented Jan 12, 2016

Here's how I see it. There are advantages and disadvantages to this;

👍 A lower overhead algorithm improves performance on low powered devices. However, I'm not sure by how much. Probably quite a lot when scanning large files; but when syncing small or medium size files I suspect our database shuffling and stuff dominates. The test by @alex2108 above is probably the best case (maximum performance difference due to syncing a single large file) and there the difference is one between 4:17 and 3:31 (a ~21% performance increase).

👎 Code complexity increases. We need to handle the mismatches, do conversions or rescans, and take into account the differences between hash algos in a lot of places, from here on and forever into the future. Once we add this there is probably no going back.

👎 Usage complexity increases. It's one more thing to choose from, and one more thing that can go wrong. This doesn't necessarily need to be a big deal, but handling it super smoothly in all cases (more than just displaying a cryptic error message and failing to do what the user wanted) will take some engineering effort.

All in all, this feature is currently on the minus side in my mind. It's a 20% performance improvement, but a negative in all other aspects. On faster hardware it's a net zero - we use slightly fewer CPU cycles doing the same work, but we weren't running short of CPU cycles to start with.

It's an interesting experiment, but to me it feels like it costs more than it's worth in maintenance and complexity. Taking the long view, I still think that crypto primitives like SHA256 are going to get faster, both due to hardware implementations in otherwise low power devices and potentially due to optimizations in the standard library etc.

I propose we shelve this and refocus on other stuff that makes a bigger difference, like temp indexes and variable block size.

@AudriusButkevicius (Member):

I guess I am sort of with you here.

@alex2108 (Contributor):

How about just having the option to disable the block map?
👍 Still a bit of a performance improvement, around the same as the difference in hash algorithm alone.
👍 No disadvantages for faster devices (copying blocks from other files still works there), also no hash mismatches, and it can probably be changed without problems...
👎 Code complexity is probably still a negative point, but it seems to be less.

@calmh (Member, Author) commented Jan 12, 2016

Possibly yes. TBD after the new puller.

@calmh calmh removed this from the v1.0 milestone Jan 12, 2016
@calmh calmh closed this Jan 12, 2016
@calmh (Member, Author) commented Jan 12, 2016

This is on ice. The code is out there, for interested parties.

@st-review st-review added the frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion label Jun 16, 2017
@syncthing syncthing locked and limited conversation to collaborators Jun 16, 2017