-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: enforce canonicity on HashSet/BTreeSet/HashMap/BTreeMap
#162
Conversation
18d078f
to
bf30615
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only tests here, but no actual implementation that was in #160. Is it on purpose?
Nope, it's just draft. Pushed it to periodically see feedback from ci. |
e3db33a
to
a4e7039
Compare
HashSet/BTreeSet/HashMap/BTreeMap
HashSet/BTreeSet/HashMap/BTreeMap
#[cfg(not(feature = "de_strict_order"))] | ||
{ | ||
let result = result.unwrap(); | ||
assert_eq!(result.len(), arr_key.len()); | ||
for key in &arr_key { | ||
assert!(result.contains(key)); | ||
|
||
} | ||
|
||
} | ||
|
||
#[cfg(feature = "de_strict_order")] | ||
{ | ||
assert!(result.is_err()); | ||
|
||
assert_eq!(result.unwrap_err().to_string(), ERROR_WRONG_ORDER_OF_KEYS); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You implemented it well, but semantically this feels wrong to me. This test ensures that if canonicity is broken, we still read the data just fine.
Let me think out loud here:
Currently, we ask people to opt-in for strong consistency, and if any crate in the dependencies tree opts-in (when borsh is used in several crates using different features), the strong consistency checks will be enabled everywhere, and potentially erroring on incorrectly constructed payloads and affecting performance (due to validation on deserialization).
Alternatively, we may enable the checks by default and create a feature to opt-out ("non-strict-mode"), and the risk will be the opposite that someone enabling the feature in one crate will disable canonicity checks (on deserialization), and potentially speed things up.
I cannot say I like either of the approaches more than the other, but this test case looks like we endorse broken canonicity. Also, I feel that to be on the safe side of things, I would enable this feature by default (when making a breaking change release), but at the same time, it would introduce a performance hit onto the users who always construct and read the data with borsh-rs (e.g. near-sdk-rs smart contracts, nearcore internals), and serialization is canonical there.
@dj8yfo What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test just creates a piece of runnable documentation of actual behavior and doesn't attempt to endorse or denounce canonicity.
A feature by itself doesn't nudge crate's users in any particular direction, enabling/disabling it by default does. I think that decision at some point is up to you @frol.
To better understand performance impact, a benchmark with some data type(s) used in practice is due as part of this task/another pr.
My intuition is that performance impact must be negligible for most practical types, which don't attempt to allocate memory on comparisons. But, as it often happens with intuition, it may be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔆
c0f3e1c
to
6061892
Compare
…p` impl [✓] (re)write `HashSet/BTreeSet/HashMap/BTreeMap` implementations ├── [✓] `HashSet` │ ├── [✓] rewrite to no extra `Vec` allocation │ └── [✓] `#cfg(...)` addon ├── [✓] `HashMap` │ ├── [✓] add `length_hint` in capacity allocation │ └── [✓] `#cfg(...)` addon ├── [✓] `BTreeSet` │ ├── [✓] leave it as is, as `BTreeSet` │ │ └── [✓] has optimisation for initialization from Iterator (with sorting) │ │ └── [✓] [set source](https://doc.rust-lang.org/src/alloc/collections/btree/set.rs.html#1210) │ └── [✓] `#cfg(...)` addon └── [✓] `BTreeMap` ├── [✓] replace with parsing to `Vec` first │ └── [✓] as `BTreeMap` has optimisation for init from Iterator (with sorting) │ └── [✓] [map source](https://doc.rust-lang.org/src/alloc/collections/btree/map.rs.html#2237) └── [✓] `#cfg(...)` addon
6061892
to
c52c2fa
Compare
|
I don't see any other references to
Do I read it correctly that performance degraded even without What is the current state? It sounds like you are still working on it, and we are not ready to merge, so I will turn this PR into draft (please, do it yourself next time to communicate this state). Turn it back "ready for review" once you are ready. |
This is a renewed summary of current pr.
Below is an example of expected errors in measurements ( # compare master with itself; error up to 7%
❯ critcmp master master2 -t 6
group master master2
----- ------ -------
alloc::collections::btree::set::BTreeSet<alloc::string::String>_1000/borsh_de/ 1.07 80.7±2.65µs ? ?/sec 1.00 75.4±1.36µs ? ?/sec Next, is difference between ❯ critcmp master baseline_canonicity
group baseline_canonicity master
----- ------------------- ------
alloc::collections::btree::map::BTreeMap<alloc::string::String, alloc::string::String>_10/borsh_de/ 1.08 1254.0±2.68ns ? ?/sec 1.00 1156.9±3.08ns ? ?/sec
alloc::collections::btree::map::BTreeMap<alloc::string::String, alloc::string::String>_1000/borsh_de/ 1.00 154.2±0.29µs ? ?/sec 1.46 224.8±0.71µs ? ?/sec
alloc::collections::btree::map::BTreeMap<alloc::string::String, alloc::string::String>_10000/borsh_de/ 1.00 1654.9±5.81µs ? ?/sec 1.65 2.7±0.01ms ? ?/sec
alloc::collections::btree::map::BTreeMap<benchmarks::PublicKey, benchmarks::PublicKey>_10/borsh_de/ 1.00 369.2±1.90ns ? ?/sec 1.12 414.9±1.39ns ? ?/sec
alloc::collections::btree::map::BTreeMap<benchmarks::PublicKey, benchmarks::PublicKey>_1000/borsh_de/ 1.00 39.0±0.09µs ? ?/sec 2.16 84.4±0.54µs ? ?/sec
alloc::collections::btree::map::BTreeMap<benchmarks::PublicKey, benchmarks::PublicKey>_10000/borsh_de/ 1.00 413.6±0.72µs ? ?/sec 2.52 1042.4±7.39µs ? ?/sec
alloc::collections::btree::set::BTreeSet<alloc::string::String>_10/borsh_de/ 1.11 718.2±8.19ns ? ?/sec 1.00 645.1±2.90ns ? ?/sec
alloc::collections::btree::set::BTreeSet<alloc::string::String>_1000/borsh_de/ 1.00 77.3±0.32µs ? ?/sec 1.04 80.7±2.65µs ? ?/sec
alloc::collections::btree::set::BTreeSet<alloc::string::String>_10000/borsh_de/ 1.03 887.7±1.92µs ? ?/sec 1.00 863.7±4.21µs ? ?/sec
alloc::collections::btree::set::BTreeSet<benchmarks::PublicKey>_10/borsh_de/ 1.00 172.2±3.60ns ? ?/sec 1.02 176.0±3.02ns ? ?/sec
alloc::collections::btree::set::BTreeSet<benchmarks::PublicKey>_1000/borsh_de/ 1.02 18.3±0.22µs ? ?/sec 1.00 18.0±0.05µs ? ?/sec
alloc::collections::btree::set::BTreeSet<benchmarks::PublicKey>_10000/borsh_de/ 1.02 188.1±0.65µs ? ?/sec 1.00 184.0±0.62µs ? ?/sec
std::collections::hash::map::HashMap<alloc::string::String, alloc::string::String>_10/borsh_de/ 1.00 1590.3±6.73ns ? ?/sec 1.30 2.1±0.05µs ? ?/sec
std::collections::hash::map::HashMap<alloc::string::String, alloc::string::String>_1000/borsh_de/ 1.00 198.2±5.08µs ? ?/sec 1.41 279.9±0.86µs ? ?/sec
std::collections::hash::map::HashMap<alloc::string::String, alloc::string::String>_10000/borsh_de/ 1.00 2.2±0.05ms ? ?/sec 1.31 2.8±0.01ms ? ?/sec
std::collections::hash::map::HashMap<benchmarks::PublicKey, benchmarks::PublicKey>_10/borsh_de/ 1.00 521.8±9.18ns ? ?/sec 1.36 707.5±1.65ns ? ?/sec
std::collections::hash::map::HashMap<benchmarks::PublicKey, benchmarks::PublicKey>_1000/borsh_de/ 1.00 47.3±0.13µs ? ?/sec 1.70 80.4±0.48µs ? ?/sec
std::collections::hash::map::HashMap<benchmarks::PublicKey, benchmarks::PublicKey>_10000/borsh_de/ 1.00 505.3±1.18µs ? ?/sec 1.49 751.3±2.40µs ? ?/sec
std::collections::hash::set::HashSet<alloc::string::String>_10/borsh_de/ 1.04 1066.7±1.40ns ? ?/sec 1.00 1024.6±2.85ns ? ?/sec
std::collections::hash::set::HashSet<alloc::string::String>_1000/borsh_de/ 1.03 119.9±2.29µs ? ?/sec 1.00 116.1±1.33µs ? ?/sec
std::collections::hash::set::HashSet<alloc::string::String>_10000/borsh_de/ 1.00 1255.9±14.32µs ? ?/sec 1.01 1272.7±40.84µs ? ?/sec
std::collections::hash::set::HashSet<benchmarks::PublicKey>_10/borsh_de/ 1.13 381.0±30.18ns ? ?/sec 1.00 335.7±2.65ns ? ?/sec
std::collections::hash::set::HashSet<benchmarks::PublicKey>_1000/borsh_de/ 1.00 29.5±0.89µs ? ?/sec 1.00 29.6±0.27µs ? ?/sec
std::collections::hash::set::HashSet<benchmarks::PublicKey>_10000/borsh_de/ 1.00 306.3±1.98µs ? ?/sec 1.02 312.9±1.26µs ? ?/sec
This list has 2 anomalies, which must be due to some intermittent error on specific hardware, which i couldn't get rid of even with prolonged measument times. group baseline_canonicity master
----- ------------------- ------
alloc::collections::btree::set::BTreeSet<alloc::string::String>_10/borsh_de/ 1.11 718.2±8.19ns ? ?/sec 1.00 645.1±2.90ns ? ?/sec
std::collections::hash::set::HashSet<benchmarks::PublicKey>_10/borsh_de/ 1.13 381.0±30.18ns ? ?/sec 1.00 335.7±2.65ns ? ?/sec
Finally, raw unadjusted data of difference between ❯ critcmp baseline_canonicity canonicity_de_strict_order
group baseline_canonicity canonicity_de_strict_order
----- ------------------- --------------------------
alloc::collections::btree::map::BTreeMap<alloc::string::String, alloc::string::String>_10/borsh_de/ 1.00 1254.0±2.68ns ? ?/sec 1.01 1268.7±3.63ns ? ?/sec
alloc::collections::btree::map::BTreeMap<alloc::string::String, alloc::string::String>_1000/borsh_de/ 1.00 154.2±0.29µs ? ?/sec 1.00 154.9±5.27µs ? ?/sec
alloc::collections::btree::map::BTreeMap<alloc::string::String, alloc::string::String>_10000/borsh_de/ 1.00 1654.9±5.81µs ? ?/sec 1.02 1689.3±4.42µs ? ?/sec
alloc::collections::btree::map::BTreeMap<benchmarks::PublicKey, benchmarks::PublicKey>_10/borsh_de/ 1.01 369.2±1.90ns ? ?/sec 1.00 366.5±1.37ns ? ?/sec
alloc::collections::btree::map::BTreeMap<benchmarks::PublicKey, benchmarks::PublicKey>_1000/borsh_de/ 1.00 39.0±0.09µs ? ?/sec 1.00 39.1±0.07µs ? ?/sec
alloc::collections::btree::map::BTreeMap<benchmarks::PublicKey, benchmarks::PublicKey>_10000/borsh_de/ 1.00 413.6±0.72µs ? ?/sec 1.01 415.9±0.78µs ? ?/sec
alloc::collections::btree::set::BTreeSet<alloc::string::String>_10/borsh_de/ 1.00 718.2±8.19ns ? ?/sec 1.02 734.4±5.56ns ? ?/sec
alloc::collections::btree::set::BTreeSet<alloc::string::String>_1000/borsh_de/ 1.00 77.3±0.32µs ? ?/sec 1.08 83.5±0.32µs ? ?/sec
alloc::collections::btree::set::BTreeSet<alloc::string::String>_10000/borsh_de/ 1.00 887.7±1.92µs ? ?/sec 1.06 936.8±5.02µs ? ?/sec
alloc::collections::btree::set::BTreeSet<benchmarks::PublicKey>_10/borsh_de/ 1.00 172.2±3.60ns ? ?/sec 1.07 184.7±1.41ns ? ?/sec
alloc::collections::btree::set::BTreeSet<benchmarks::PublicKey>_1000/borsh_de/ 1.00 18.3±0.22µs ? ?/sec 1.07 19.6±0.34µs ? ?/sec
alloc::collections::btree::set::BTreeSet<benchmarks::PublicKey>_10000/borsh_de/ 1.00 188.1±0.65µs ? ?/sec 1.06 198.5±0.44µs ? ?/sec
std::collections::hash::map::HashMap<alloc::string::String, alloc::string::String>_10/borsh_de/ 1.01 1590.3±6.73ns ? ?/sec 1.00 1569.7±9.70ns ? ?/sec
std::collections::hash::map::HashMap<alloc::string::String, alloc::string::String>_1000/borsh_de/ 1.00 198.2±5.08µs ? ?/sec 1.00 197.9±0.43µs ? ?/sec
std::collections::hash::map::HashMap<alloc::string::String, alloc::string::String>_10000/borsh_de/ 1.00 2.2±0.05ms ? ?/sec 1.05 2.3±0.05ms ? ?/sec
std::collections::hash::map::HashMap<benchmarks::PublicKey, benchmarks::PublicKey>_10/borsh_de/ 1.00 521.8±9.18ns ? ?/sec 1.05 548.0±25.73ns ? ?/sec
std::collections::hash::map::HashMap<benchmarks::PublicKey, benchmarks::PublicKey>_1000/borsh_de/ 1.00 47.3±0.13µs ? ?/sec 1.13 53.3±2.53µs ? ?/sec
std::collections::hash::map::HashMap<benchmarks::PublicKey, benchmarks::PublicKey>_10000/borsh_de/ 1.00 505.3±1.18µs ? ?/sec 1.01 511.3±1.67µs ? ?/sec
std::collections::hash::set::HashSet<alloc::string::String>_10/borsh_de/ 1.02 1066.7±1.40ns ? ?/sec 1.00 1043.3±2.58ns ? ?/sec
std::collections::hash::set::HashSet<alloc::string::String>_1000/borsh_de/ 1.01 119.9±2.29µs ? ?/sec 1.00 119.2±1.13µs ? ?/sec
std::collections::hash::set::HashSet<alloc::string::String>_10000/borsh_de/ 1.00 1255.9±14.32µs ? ?/sec 1.02 1275.2±1.75µs ? ?/sec
std::collections::hash::set::HashSet<benchmarks::PublicKey>_10/borsh_de/ 1.14 381.0±30.18ns ? ?/sec 1.00 334.2±2.69ns ? ?/sec
std::collections::hash::set::HashSet<benchmarks::PublicKey>_1000/borsh_de/ 1.00 29.5±0.89µs ? ?/sec 1.04 30.7±0.36µs ? ?/sec
std::collections::hash::set::HashSet<benchmarks::PublicKey>_10000/borsh_de/ 1.00 306.3±1.98µs ? ?/sec 1.26 385.9±0.37µs ? ?/sec
|
@dj8yfo Do I read it right, that:
This is really nice! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR not only brought canonicity checks (opt-in for now), but also x1.5-2 performance improvements to the deserialization of BTreeMap and HashMap collections!
Resolves #63