chore: Use hashbrown in most workspace members #4389
Conversation
LGTM. I'm generally in favor of minimizing `std`.
Force-pushed from 9d74f4e to 93efa44
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff            @@
##             next    #4389      +/- ##
===========================================
- Coverage   77.00%   48.39%   -28.61%
===========================================
  Files         448      448
  Lines      321363   321363
===========================================
- Hits       247454   155530    -91924
- Misses      73909   165833    +91924
```

View full report in Codecov by Sentry.
lgtm 👍

If we got 3% in the peripheral libs (+ `clarity`), then there are more than likely some nice gains to be had in `stackslib`, especially in the chainstate index & MARF code (`src/chainstate/stacks`) when it comes to block/transaction processing times.
This shouldn't be an issue, as we're not directly using any arbitrary user-provided input as keys for hashmaps or values for hashsets, short of information in contracts such as qualified contract identifiers, variable and function names, etc., which have been validated for both content and length. There is some documentation from AHash regarding DoS resistance for anyone interested:
- https://github.com/tkaitchuck/aHash/blob/master/FAQ.md
- https://github.com/tkaitchuck/aHash/wiki/How-aHash-is-resists-DOS-attacks
> This shouldn't be an issue as we're not directly using any arbitrary user-provided input as keys for hashmaps or values for hashsets,

Yes we are. Block IDs, txids, and clarity DB keys (among others) are user-chosen.
On Sat, Feb 17, 2024, Cyle Witruk wrote:

> Uses AHash <https://github.com/tkaitchuck/aHash> as the default hasher, which is much faster than SipHash. However, AHash does not provide the same level of HashDoS resistance as SipHash, so if that is important to you, you might want to consider using a different hasher
The only place where I can find any actual risk of putting arbitrary "dirty" user input in them is potentially in the HTTP server modules.
The user can't use any arbitrary data for these values, right? That would make finding collisions much harder. Even in the unlikely event that someone does manage to find hash collisions, it doesn't "break" the …
We wouldn't even need to switch back to …: https://doc.rust-lang.org/nightly/core/hash/struct.BuildHasherDefault.html#examples
The only place allowing `HashMap` keys or `HashSet` values of arbitrary length+content (i.e. …
@jbencin @cylewitruk The user can choose the hash pre-image, which in turn lets them influence the structure of the hash table itself (to detrimental ends).

Speaking of, according to the …

Regarding collision resistance, I'm trying to figure out whether or not this is going to be a problem for us (or rather, already is a problem). The AHash authors speak about it here [3] and here [4].

[1] https://github.com/tkaitchuck/aHash?tab=readme-ov-file#ahash-------
[2] https://github.com/rust-lang/hashbrown?tab=readme-ov-file#hashbrown
[3] https://github.com/tkaitchuck/aHash/blob/master/FAQ.md
[4] https://github.com/tkaitchuck/aHash/wiki/How-aHash-is-resists-DOS-attacks
I don't think a cryptographic hash is necessary for a hash table. If someone finds a single collision, it's not a problem. An attacker would have to find many collisions to noticeably degrade performance. And even if they do, it's only a performance issue. It's not like we're dealing with block data, where there could be serious consequences for even a single collision. Is there any cryptanalysis on AHash suggesting this is feasible or even possible?
My guess is that it uses a different feature set when used via the …
Also, from the AHash README:
It looks like AHash is initialized with a random seed, which affects the value of the hashes produced. This would make any attempt at a DoS attack pointless, as hashes would not be consistent across nodes and would not persist through a restart. It also means there is no way for an attacker to predict the hashing behavior of a node in order to find collisions, as it cannot observe the internal state.
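The per-instance random seeding described above can be illustrated with `std`'s own `RandomState` (SipHash keyed with random data) rather than AHash itself — this is an analogy, not the crate under discussion. Two independently created hasher states almost surely produce different hash values for the same key, while one state is stable with itself:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Hash `key` with the given randomly-seeded state.
fn hash_with(state: &RandomState, key: &str) -> u64 {
    let mut h = state.build_hasher();
    key.hash(&mut h);
    h.finish()
}

fn main() {
    let (a, b) = (RandomState::new(), RandomState::new());
    let key = "block_id_0123"; // arbitrary example key

    // Same state => stable hash value.
    assert_eq!(hash_with(&a, key), hash_with(&a, key));

    // Different states => different seeds, so the hashes disagree
    // (with overwhelming probability; equality is possible but ~2^-64).
    assert_ne!(hash_with(&a, key), hash_with(&b, key));
}
```

This is why precomputed collisions against one node don't transfer to another node, or to the same node after a restart, as long as the seed is never observable.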
I doubt both of these claims. First, the attacker can measure (through many samples, if needed) the amount of time the node takes to handle a p2p or HTTP message. If there's a hashing operation on these code paths, then the attacker can infer the state of the hash table based on how quickly or slowly the target node processes the operation. AHash doesn't claim constant-time hash operations, so I very much doubt that this information cannot be learned. Second, armed with this information, the attacker can construct a sequence of queries against the node that leads to the DoS. The random factor raises the difficulty of this attack only by increasing the number of samples that must be acquired.

However, the real issue here is that there's a DoS-able hash table on a network I/O path. What we really want to do is make all such hash tables bounded in state, so that the worst-case performance for an insert is O(max-size) (which is O(1)).
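A minimal sketch of the bounded-state idea (not the codebase's actual type — names like `BoundedMap` are made up for illustration): cap the number of entries, so no sequence of network inputs can grow the table, and the worst-case insert cost is a function of the fixed cap rather than attacker-controlled input volume.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Illustrative size-bounded map: once `cap` distinct keys are present,
/// further inserts of new keys are refused instead of growing the table.
struct BoundedMap<K, V> {
    cap: usize,
    inner: HashMap<K, V>,
}

impl<K: Hash + Eq, V> BoundedMap<K, V> {
    fn new(cap: usize) -> Self {
        Self { cap, inner: HashMap::with_capacity(cap) }
    }

    /// Returns `false` (dropping the entry) when the map is full and the
    /// key is new; updating an existing key is always allowed.
    fn insert(&mut self, k: K, v: V) -> bool {
        if self.inner.len() >= self.cap && !self.inner.contains_key(&k) {
            return false;
        }
        self.inner.insert(k, v);
        true
    }
}

fn main() {
    let mut m = BoundedMap::new(2);
    assert!(m.insert("a", 1));
    assert!(m.insert("b", 2));
    assert!(!m.insert("c", 3)); // bounded: third distinct key rejected
    assert!(m.insert("a", 9));  // existing key can still be updated
}
```

A real implementation would likely evict (LRU-style) rather than refuse, but either way the state, and therefore the worst-case probe length, is bounded regardless of what an attacker sends.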
Is collecting enough data for a timing attack really feasible here? Would it require paying transaction fees? Shouldn't we ban clients that make excessive requests anyway? If not, isn't that a much simpler DoS attack vector than any …
Anyway, a few comments ago I pointed out that …

Back on topic, regarding merging this PR -- since …
`hashbrown` is not in `std`. The code for the `hashbrown` hashmap is essentially the same as the code for the `std` hashmap, but the default hasher is different (`ahash` vs. SipHash). The performance gains we see in this PR are due to the different hashers, not the hashmaps. Anywhere we are worried about hash collisions, we can either use `std` or use a different hasher for `hashbrown`. Personally, I think the odds of anyone actually managing to DoS a node with HashDoS are essentially zero.
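Pinning a specific hasher is a one-line type change either way. A sketch using only `std` (the same `BuildHasherDefault<...>` type parameter works with `hashbrown::HashMap` too, since both maps take the hasher as their third generic parameter): here we force `DefaultHasher`, which is currently a SipHash variant, instead of whatever the map's default is. Note that `BuildHasherDefault` yields a fixed (unseeded) hasher, so a real deployment wanting random seeding would keep `RandomState` or a seeded builder instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

fn main() {
    // Explicitly choose the hasher via the map's third type parameter.
    // With hashbrown this would read hashbrown::HashMap<_, _, BuildHasherDefault<DefaultHasher>>.
    let mut map: HashMap<String, u64, BuildHasherDefault<DefaultHasher>> =
        HashMap::default();

    map.insert("block_id".to_string(), 4389);
    assert_eq!(map.get("block_id"), Some(&4389));
}
```

So collision-sensitive maps could keep a SipHash-family hasher while the rest of the workspace takes the faster default.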
Right. If an attacker can insert enough entries into hash tables to cause problems without incurring significant transaction costs, that should be fixed whether we have collisions or not.
I don't think so. If there were, I don't think the …
Even if they did manage to make a single node slow and unresponsive, I assume we'd detect it and restart the node. I'm not a DevOps expert, but I'd assume our k8s config has healthchecks for that. It would be a difficult attack to pull off, and I don't see the attacker getting anything out of it.
Force-pushed (…`./stackslib` and `./testnet/stacks-node`) from 93efa44 to 610f2da
Agree -- we're in extreme-hypothetical-land here, and it's very unclear what such an attack would yield for a potential attacker. I'm pro including this change in …
Description

This PR replaces `std::collections::{HashMap, HashSet}` with the `hashbrown` equivalents in most of the workspace members (excluding `stackslib` and `stacks-node`). By doing this, we are able to validate blocks around 3% faster.

Thanks @cylewitruk for the suggestion!
Applicable issues
Additional info (benefits, drawbacks, caveats)
From the Hashbrown README, not sure if this matters in the locations where it is used: `hashbrown` is not quite a drop-in replacement for the `std::collections` equivalents. It uses different trait bounds on certain functions, which breaks type inference in some places, and this is why I've skipped making this change in `stackslib` and `stacks-node` for now.

Checklist
- `docs/rpc/openapi.yaml` and `rpc-endpoints.md` (for v2 endpoints), `event-dispatcher.md` (for new events)
- `clarity-benchmarking` repo
- `bitcoin-tests.yml`