Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add non-serializing XDR hashers for SHA256 and ShortHash. #2093

Merged
merged 1 commit into from
Oct 10, 2019

Conversation

graydon
Copy link
Contributor

@graydon graydon commented May 7, 2019

Description

This adds a new implementation of SipHash (borrowed from the internet -- it's quite a standard function and there are zillions of implementations) that operates in incremental mode, allowing us to stream data through it rather than only hashing ByteSlices.

Additionally, it adds an XDR archiver class that streams the bytes of an XDR object's serial representation through such a SipHasher, without allocating/serializing/freeing a byte buffer.

This should let us replace many of the cases we're using allocation-hungry hash functions in hashtables keyed by XDR values. Saw another of these in @jonjove's recent PR, figured I'd fix the problem at the source!

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v5.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • N/A If change impacts performance, include supporting evidence per the performance document

@MonsieurNicolas
Copy link
Contributor

Cool.

A few ideas/comments:

  • the high impact hasher that we use all over the place is sha256, where we often do things like sha256(xdr::xdr_to_opaque(X)), I think the "hashing xdr archiver" should target sha256 from the get go
  • I would prefer to not take another dependency just for SipHash
    • libsodium already has an implementation in the form of crypto_shorthash but doesn't expose an API for multi part hashing (like it does for SHA256 and "generic hash"). https://download.libsodium.org/doc/hashing
    • we should contribute the multi-part API to short hash to libsodium (it's one of the key libraries that we use, and it's the occasion to contribute back)
  • for sipHash specifically, in the context of hash tables (unlike HMAC use cases), we don't actually need to compute sha256(xdr::xdr_to_opaque(X)). All we need is something good enough to be used as a hash function (performance is a big factor):
    • we don't need to include the length for variable size elements (so we can avoid including many 4 bytes length), maybe we also skip discriminant values
    • we don't need to perform endian conversion
    • we don't need to hash the entire content (we can randomize this as needed)
    • in general, we may be able to feed data to the hash function in 32 bit chunks (truncating down)
  • as this has (potentially large) performance implications, we'll have to, at a minimum, perform some side-by-side performance comparison

@graydon
Copy link
Contributor Author

graydon commented May 8, 2019

Ok.

Don't merge this until our sodium repo has absorbed the sodium change in some form.

I don't think I agree that stopping short of hashing the full input is wise; there's no generic way of knowing which bytes in the input are likely stable vs. likely to change between different keys, and guessing wrong means only hashing the stable keys and then the hashtable degrades to linear search (everything hashes to the same bucket). I also think the cost of byteswapping integer fields is negligible relative to the other costs of hashing here, and keeping it in place makes the code easy to test by comparison to serialization; removing it seems to me not worth it.

Perf measurements follow:

SipHash2,4:

 Performance counter stats for './src/stellar-core test [sh-bytes-bench]':

       3877.593153      task-clock (msec)         #    1.000 CPUs utilized
    10,818,972,497      cycles                    #    2.790 GHz
    23,829,260,022      instructions              #    2.20  insn per cycle

 Performance counter stats for './src/stellar-core test [sh-xdr-bench]':

       2997.756165      task-clock (msec)         #    1.000 CPUs utilized
     8,359,917,933      cycles                    #    2.789 GHz
    19,216,317,074      instructions              #    2.30  insn per cycle


SHA256:

 Performance counter stats for './src/stellar-core test [sha-bytes-bench]':

      16397.686427      task-clock (msec)         #    1.000 CPUs utilized
    55,691,849,848      cycles                    #    3.396 GHz
   165,904,792,137      instructions              #    2.98  insn per cycle

 Performance counter stats for './src/stellar-core test [sha-xdr-bench]':

      16208.132806      task-clock (msec)         #    1.000 CPUs utilized
    54,294,671,822      cycles                    #    3.350 GHz
   163,289,596,956      instructions              #    3.01  insn per cycle

@graydon graydon changed the title Add incremental SipHash and XDR archiver that uses it. Add non-serializing XDR hashers for SHA256 and ShortHash. May 8, 2019
@MonsieurNicolas
Copy link
Contributor

We'll revive this for 12.1.0

@graydon graydon force-pushed the xdr-siphash branch 2 times, most recently from 94851b7 to 47cb92a Compare September 19, 2019 23:45
@graydon
Copy link
Contributor Author

graydon commented Sep 19, 2019

Ok, it looks like libsodium is declining to take the patch, so that puts us back to the first approach: integrate a small incremental-able 3rd-party implementation of siphash ourselves. Updated and rebased patch to do that.

@graydon
Copy link
Contributor Author

graydon commented Sep 22, 2019

Finally pushed variant that combines small-buffer use as in the libsodium attempt with the 3rd party incremental version of siphash: both are required to beat (performance-wise) the serialize-to-a-buffer-then-hash approach, and even then we only beat by a small margin.

I still think this is probably worth doing at least because it gets another heavy user of the allocator out of the picture, which makes profiling allocations easier / less noisy, and means we're less performance-coupled to allocator speed. But it's not a slam dunk, if you don't like the look of this patch we can close it out.


 Performance counter stats for './src/stellar-core test [sh-bytes-bench]' (5 runs):

       3261.538190      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.20% )
                 5      context-switches          #    0.001 K/sec                    ( +- 16.27% )
                 0      cpu-migrations            #    0.000 K/sec                  
               972      page-faults               #    0.298 K/sec                    ( +-  0.08% )
    10,689,814,393      cycles                    #    3.278 GHz                      ( +-  0.45% )
     3,307,221,261      stalled-cycles-frontend   #   30.94% frontend cycles idle     ( +-  1.11% )
    23,808,704,363      instructions              #    2.23  insn per cycle         
                                                  #    0.14  stalled cycles per insn  ( +-  0.00% )
     2,820,044,672      branches                  #  864.636 M/sec                    ( +-  0.00% )
        47,846,095      branch-misses             #    1.70% of all branches          ( +-  3.15% )

       3.262230169 seconds time elapsed                                          ( +-  0.20% )


 Performance counter stats for './src/stellar-core test [sh-xdr-bench]' (5 runs):

       3200.670976      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.81% )
                 4      context-switches          #    0.001 K/sec                    ( +- 13.69% )
                 0      cpu-migrations            #    0.000 K/sec                  
               973      page-faults               #    0.304 K/sec                    ( +-  0.08% )
    10,473,582,261      cycles                    #    3.272 GHz                      ( +-  0.50% )
     2,394,832,735      stalled-cycles-frontend   #   22.87% frontend cycles idle     ( +-  2.20% )
    27,565,040,226      instructions              #    2.63  insn per cycle         
                                                  #    0.09  stalled cycles per insn  ( +-  0.00% )
     2,237,366,954      branches                  #  699.031 M/sec                    ( +-  0.00% )
        37,054,101      branch-misses             #    1.66% of all branches          ( +-  0.68% )

       3.201269370 seconds time elapsed                                          ( +-  0.81% )


 Performance counter stats for './src/stellar-core test [sha-bytes-bench]' (5 runs):

      17201.839200      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.56% )
                14      context-switches          #    0.001 K/sec                    ( +- 21.55% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 40.82% )
             1,170      page-faults               #    0.068 K/sec                    ( +- 10.45% )
    56,389,896,362      cycles                    #    3.278 GHz                      ( +-  0.62% )
     3,556,322,508      stalled-cycles-frontend   #    6.31% frontend cycles idle     ( +-  7.09% )
   166,974,728,788      instructions              #    2.96  insn per cycle         
                                                  #    0.02  stalled cycles per insn  ( +-  0.01% )
     3,786,760,790      branches                  #  220.137 M/sec                    ( +-  0.06% )
        52,650,936      branch-misses             #    1.39% of all branches          ( +-  3.88% )

      17.202929930 seconds time elapsed                                          ( +-  0.56% )


 Performance counter stats for './src/stellar-core test [sha-xdr-bench]' (5 runs):

      16564.852926      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.42% )
                14      context-switches          #    0.001 K/sec                    ( +- 19.82% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 40.82% )
             1,173      page-faults               #    0.071 K/sec                    ( +- 10.47% )
    54,771,921,730      cycles                    #    3.307 GHz                      ( +-  0.15% )
     3,062,216,304      stalled-cycles-frontend   #    5.59% frontend cycles idle     ( +-  3.20% )
   164,375,820,787      instructions              #    3.00  insn per cycle         
                                                  #    0.02  stalled cycles per insn  ( +-  0.01% )
     3,104,093,771      branches                  #  187.390 M/sec                    ( +-  0.06% )
        45,769,801      branch-misses             #    1.47% of all branches          ( +-  1.09% )

      16.567129726 seconds time elapsed                                          ( +-  0.42% )

@MonsieurNicolas
Copy link
Contributor

r+ 9c1c9f0

latobarita added a commit that referenced this pull request Oct 10, 2019
Add non-serializing XDR hashers for SHA256 and ShortHash.

Reviewed-by: MonsieurNicolas
@latobarita latobarita merged commit 9c1c9f0 into stellar:master Oct 10, 2019
@graydon graydon deleted the xdr-siphash branch January 3, 2020 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants