-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test and fix representation uniqueness of inode #1247
Conversation
I have updated the code and the introduction above to show the case with a failing root hash |
Great job finding that bug! Can you describe a bit more in which situation that error could appear (e.g. what sequence of operations is needed)? Also we should probably fuzz test that implementation (in an other PR)! :-) |
It's Ioana's finding :) I already wrote some documentation about it in the |
73030fd
to
c47fc6a
Compare
Before merging this PR, it would be nice to add more information in the commit message of the fix too. It's nice to see these explanations in |
I can copy the text I used in the changelog to the commit message. |
CHANGES.md
Outdated
|
||
#### Fixed | ||
- **irmin-pack** | ||
- Bug fix in inode, irmin-pack's tree-like data structure speeding up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we only add small explanations in the changelog. I would only say that "it fixes a bug that lead to an inconsistent hash when removing values from an inode larger than Conf.stable_hash", or something similar, and add more explanations in the commit message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should be more explicit about the user-facing impact: e.g. no external facing changes if the number of entries is smaller than Conf.
, otherwise remove
s can lead to non-deterministic hashes. This is very important to be explicit about this kind of impact to our user to point them why they need to upgrade. It's ok to keep the explanation in the CHANGELOG long (but precise), as that should be the first place that our users should look for information about the release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworded the entry in the changelog. What do you thing?
@icristescu Let's try again. I've factorised everything and doubled the number of comments. |
CHANGES.md
Outdated
|
||
For directories containing at most `Conf.stable_hash` (256) entries, only a | ||
few hashes internal to `inode` could have been inconsistent, but not the | ||
directory's hash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
users will just see the root hashes, so it's not an issue if the internal hashes are not deterministic. So I suggest something like this:
For directories containing at most `Conf.stable_hash` entries (by default 256), no bug occurred. In this case, this change does not induce any user-facing change.
For directories containing more than `Conf.stable_hash` entries, the directory's hash might depends on the past sequence of operations and is thus not deterministic, which breaks various Irmin invariants and could cause data inconsistencies.
and remove the rests as it's too much internal implementation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Ioana suggested, you can keep the details in the commit message (as the next developper doing git blame
will like to know about the gory details).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okey, i'll do exactly that. Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, the tests are much clearer now.
After some investigation I got to the conclusion that I'm going to update the changelog entry to a one liner and move all the gory details for the merge commit. I'll try to merge this afternoon |
66a8cbc
to
b607679
Compare
Fix a bug in `inode` where the `remove` function could cause hashing instabilities. No user-facing change since this function is not being used yet. -- Size thresholds to trigger that bug For directories containing at most `Conf.stable_hash` entries (by default 256), no bug occurred. In this case, this change does not induce any user-facing change. For directories containing more than `Conf.stable_hash` entries, the directory's hash might depends on the past sequence of operations and is thus not deterministic, which breaks various Irmin invariants and could cause data inconsistencies. For directories containing more than `Conf.entries` (32) entries but at most `Conf.stable_hash` (256) a few hashes internal to `inode` could have been inconsistent, but not the directory's hash. Adding more entries to such an inode would most likely bring it to the stage where even the root hash is affected. -- Technical description In an inode tree, whenever a `remove` operation emptied a leaf, the hash of all the ancestors of that leaf calculated a hash that was not the same as if that leaf never contained any values in the first place. The root hash is not affected on buggy inodes containing less than `Conf.stable_hash` entries because in that case the root hash is computed without relying on the structure if the tree, only the leaves are taken into account.
b607679
to
5657365
Compare
… irmin-chunk, irmin-pack, irmin-test, irmin-http, irmin-unix, ppx_irmin, irmin-bench, irmin-graphql, irmin-containers, irmin-mirage-git and irmin-mirage-graphql (2.4.0) CHANGES: ### Fixed - **irmin-pack** - Fix a bug in `inode` where the `remove` function could cause hashing instabilities. No user-facing change since this function is not being used yet. (mirage/irmin#1247, @Ngoguey42, @icristescu) - **irmin** - Ensure that `Tree.add_tree t k v` complexity does not depend on `v` size. (mirage/irmin#1267, @samoht @Ngoguey42 and @craigfe) ### Added - **irmin** - Added a `Perms` module containing helper types for using phantom-typed capabilities as used by the store backends. (mirage/irmin#1262, @craigfe) - Added an `Exported_for_stores` module containing miscellaneous helper types for building backends. (mirage/irmin#1262, @craigfe) - Added new operations `Tree.update` and `Tree.update_tree` for efficient read-and-set on trees. (mirage/irmin#1274, @craigfe) - **irmin-pack**: - Added `integrity-check-inodes` command to `irmin-fsck` for checking the integrity of inodes. (mirage/irmin#1253, @icristescu, @Ngoguey42) - **irmin-bench** - Added benchmarks for tree operations. (mirage/irmin#1237, @icristescu, @Ngoguey42, @craigfe) #### Changed - The `irmin-mem` package is now included with the `irmin` package under the library name `irmin.mem`. It keeps the same top-level module name of `Irmin_mem`. (mirage/irmin#1276, @craigfe) #### Removed - `Irmin_mem` no longer provides the layered in-memory store `Make_layered`. This can be constructed manually via `Irmin_layers.Make`. (mirage/irmin#1276, @craigfe)
… irmin-chunk, irmin-pack, irmin-test, irmin-http, irmin-unix, ppx_irmin, irmin-bench, irmin-graphql, irmin-containers, irmin-mirage-git and irmin-mirage-graphql (2.4.0) CHANGES: ### Fixed - **irmin-pack** - Fix a bug in `inode` where the `remove` function could cause hashing instabilities. No user-facing change since this function is not being used yet. (mirage/irmin#1247, @Ngoguey42, @icristescu) - **irmin** - Ensure that `Tree.add_tree t k v` complexity does not depend on `v` size. (mirage/irmin#1267, @samoht @Ngoguey42 and @craigfe) ### Added - **irmin** - Added a `Perms` module containing helper types for using phantom-typed capabilities as used by the store backends. (mirage/irmin#1262, @craigfe) - Added an `Exported_for_stores` module containing miscellaneous helper types for building backends. (mirage/irmin#1262, @craigfe) - Added new operations `Tree.update` and `Tree.update_tree` for efficient read-and-set on trees. (mirage/irmin#1274, @craigfe) - **irmin-pack**: - Added `integrity-check-inodes` command to `irmin-fsck` for checking the integrity of inodes. (mirage/irmin#1253, @icristescu, @Ngoguey42) - **irmin-bench** - Added benchmarks for tree operations. (mirage/irmin#1237, @icristescu, @Ngoguey42, @craigfe) #### Changed - The `irmin-mem` package is now included with the `irmin` package under the library name `irmin.mem`. It keeps the same top-level module name of `Irmin_mem`. (mirage/irmin#1276, @craigfe) #### Removed - `Irmin_mem` no longer provides the layered in-memory store `Make_layered`. This can be constructed manually via `Irmin_layers.Make`. (mirage/irmin#1276, @craigfe)
Add a test to assert representation uniqueness after an [add]/[remove] for all trees up to depth=3.
Without the fix on
inode.ml
, the test fails with:Without the fix on
inode.ml
and without the check on representation, the test fails with:Which is the test with the failing root hash we were looking for