-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tree: fix add_tree to be constant-time #1267
Conversation
I'll try to highlight the complexity problem using #1237 's large mode, and the see if your PR fixes it |
Since
I have then modified
I conclude that Example of a path collapse: |
Thanks for the analysis. Indeed,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I retested https://github.com/tarides/support/issues/10 on top of this PR and it fixes the issue 👍
src/irmin/tree.ml
Outdated
|| | ||
match (x.v, y.v) with | ||
| Hash (_, x), Hash (_, y) -> hash_equal x y | ||
| Value (_, x, None), Value (_, y, None) -> equal_node x y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can trop the 2 first cases as they are redundant with the ones below using cached_value
and cached_hash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not totally redundant: we have to decide which equality function we will be using first depending on their underlying representation. I don't know if that makes a big difference in practice (probably not) so yes can simplify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see what you mean.
I tried with and without and I couldn't see any difference.
|
||
(* same as [equal] but do not compare in-memory maps | ||
recursively. *) | ||
let maybe_equal (x : t) (y : t) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is ambiguous, it returns false
when they still might be equal.
What about shallow_equal
?
We don't want to recursively explore the parameters to check if add_tree is a no-op (to try to preserve physical equality). The cost is too high in this case. Spotted by @camlspotter and profiled by @craigfe and @Ngoguey42
Updated with a simplification of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Merge when ready, and I'll cut a release containing this.
… irmin-chunk, irmin-pack, irmin-test, irmin-http, irmin-unix, ppx_irmin, irmin-bench, irmin-graphql, irmin-containers, irmin-mirage-git and irmin-mirage-graphql (2.4.0) CHANGES: ### Fixed - **irmin-pack** - Fix a bug in `inode` where the `remove` function could cause hashing instabilities. No user-facing change since this function is not being used yet. (mirage/irmin#1247, @Ngoguey42, @icristescu) - **irmin** - Ensure that `Tree.add_tree t k v` complexity does not depend on `v` size. (mirage/irmin#1267, @samoht @Ngoguey42 and @craigfe) ### Added - **irmin** - Added a `Perms` module containing helper types for using phantom-typed capabilities as used by the store backends. (mirage/irmin#1262, @craigfe) - Added an `Exported_for_stores` module containing miscellaneous helper types for building backends. (mirage/irmin#1262, @craigfe) - Added new operations `Tree.update` and `Tree.update_tree` for efficient read-and-set on trees. (mirage/irmin#1274, @craigfe) - **irmin-pack**: - Added `integrity-check-inodes` command to `irmin-fsck` for checking the integrity of inodes. (mirage/irmin#1253, @icristescu, @Ngoguey42) - **irmin-bench** - Added benchmarks for tree operations. (mirage/irmin#1237, @icristescu, @Ngoguey42, @craigfe) #### Changed - The `irmin-mem` package is now included with the `irmin` package under the library name `irmin.mem`. It keeps the same top-level module name of `Irmin_mem`. (mirage/irmin#1276, @craigfe) #### Removed - `Irmin_mem` no longer provides the layered in-memory store `Make_layered`. This can be constructed manually via `Irmin_layers.Make`. (mirage/irmin#1276, @craigfe)
… irmin-chunk, irmin-pack, irmin-test, irmin-http, irmin-unix, ppx_irmin, irmin-bench, irmin-graphql, irmin-containers, irmin-mirage-git and irmin-mirage-graphql (2.4.0) CHANGES: ### Fixed - **irmin-pack** - Fix a bug in `inode` where the `remove` function could cause hashing instabilities. No user-facing change since this function is not being used yet. (mirage/irmin#1247, @Ngoguey42, @icristescu) - **irmin** - Ensure that `Tree.add_tree t k v` complexity does not depend on `v` size. (mirage/irmin#1267, @samoht @Ngoguey42 and @craigfe) ### Added - **irmin** - Added a `Perms` module containing helper types for using phantom-typed capabilities as used by the store backends. (mirage/irmin#1262, @craigfe) - Added an `Exported_for_stores` module containing miscellaneous helper types for building backends. (mirage/irmin#1262, @craigfe) - Added new operations `Tree.update` and `Tree.update_tree` for efficient read-and-set on trees. (mirage/irmin#1274, @craigfe) - **irmin-pack**: - Added `integrity-check-inodes` command to `irmin-fsck` for checking the integrity of inodes. (mirage/irmin#1253, @icristescu, @Ngoguey42) - **irmin-bench** - Added benchmarks for tree operations. (mirage/irmin#1237, @icristescu, @Ngoguey42, @craigfe) #### Changed - The `irmin-mem` package is now included with the `irmin` package under the library name `irmin.mem`. It keeps the same top-level module name of `Irmin_mem`. (mirage/irmin#1276, @craigfe) #### Removed - `Irmin_mem` no longer provides the layered in-memory store `Make_layered`. This can be constructed manually via `Irmin_layers.Make`. (mirage/irmin#1276, @craigfe)
We don't want to recursively explore the parameters to check if
add_tree is a no-op (to try to preserve physical equality). The
cost is too high in this case.
Spotted by @camlspotter and profiled by @craigfe and @Ngoguey42