Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test and fix representation uniqueness of inode #1247

Merged
merged 1 commit into from
Jan 20, 2021

Conversation

Ngoguey42
Copy link
Contributor

@Ngoguey42 Ngoguey42 commented Jan 14, 2021

Add a test to assert representation uniqueness after an [add]/[remove] for all trees up to depth=3.

Without the fix on inode.ml, the test fails with:

FAIL The representation of a tree obtained through [remove] differs from the one obtained through [v].

   Expected: `
{'hash': 'Samuel~~',
 'stable': True,
 'v': {'Inodes': {'entries': [{'Inode': {'hash': 'Darcy~~~',
                                         'tree': {'hash': 'Darcy~~~',
                                                  'stable': False,
                                                  'v': {'Inodes': {'entries': [{'Inode': {'hash': 'Robert~~',
                                                                                          'tree': {'hash': 'Robert~~',
                                                                                                   'stable': False,
                                                                                                   'v': {'Values': [['e', {'contents': 'Ruth~~~~'}],
                                                                                                                    ['n', {'contents': 'Michael~'}]]}}}},
                                                                               {'Inode': {'hash': 'Brad~~~~',
                                                                                          'tree': {'hash': 'Brad~~~~',
                                                                                                   'stable': False,
                                                                                                   'v': {'Values': [['j', {'contents': 'Stephen~'}]]}}}}],
                                                                   'length': 3,
                                                                   'seed': 1}}}}},
                              'Empty'],
                  'length': 3,
                  'seed': 0}}}
'

   Received: `
{'hash': 'Samuel~~',
 'stable': True,
 'v': {'Inodes': {'entries': [{'Inode': {'hash': 'Darcy~~~',
                                         'tree': {'hash': 'Darcy~~~',
                                                  'stable': False,
                                                  'v': {'Inodes': {'entries': [{'Inode': {'hash': 'Robert~~',
                                                                                          'tree': {'hash': 'Robert~~',
                                                                                                   'stable': False,
                                                                                                   'v': {'Values': [['e', {'contents': 'Ruth~~~~'}],
                                                                                                                    ['n', {'contents': 'Michael~'}]]}}}},
                                                                               {'Inode': {'hash': 'Brad~~~~',
                                                                                          'tree': {'hash': 'Brad~~~~',
                                                                                                   'stable': False,
                                                                                                   'v': {'Values': [['j', {'contents': 'Stephen~'}]]}}}}],
                                                                   'length': 3,
                                                                   'seed': 1}}}}},
                              {'Inode': {'hash': 'Janet~~~', 'tree': {'hash': 'Janet~~~', 'stable': True, 'v': {'Values': []}}}}],
                  'length': 3,
                  'seed': 0}}}
'

Without the fix on inode.ml and without the check on representation, the test fails with:

FAIL The root hash of obtained through [remove] differs from the one obtained through [v].
 Expected tree: `
{'hash': 'Rick~~~~',
 'stable': False,
 'v': {'Inodes': {'entries': [{'Inode': {'hash': 'Christa~',
                                         'tree': {'hash': 'Christa~',
                                                  'stable': False,
                                                  'v': {'Inodes': {'entries': [{'Inode': {'hash': 'Damion~~',
                                                                                          'tree': {'hash': 'Damion~~',
                                                                                                   'stable': False,
                                                                                                   'v': {'Values': [['e', {'contents': 'James~~~'}],
                                                                                                                    ['n', {'contents': 'Mary~~~~'}]]}}}},
                                                                               {'Inode': {'hash': 'Patricia',
                                                                                          'tree': {'hash': 'Patricia',
                                                                                                   'stable': False,
                                                                                                   'v': {'Values': [['c', {'contents': 'Justin~~'}],
                                                                                                                    ['j', {'contents': 'Christie'}]]}}}}],
                                                                   'length': 4,
                                                                   'seed': 1}}}}},
                              'Empty'],
                  'length': 4,
                  'seed': 0}}}
'
 Received tree: `
{'hash': 'Scott~~~',
 'stable': False,
 'v': {'Inodes': {'entries': [{'Inode': {'hash': 'Christa~',
                                         'tree': {'hash': 'Christa~',
                                                  'stable': False,
                                                  'v': {'Inodes': {'entries': [{'Inode': {'hash': 'Damion~~',
                                                                                          'tree': {'hash': 'Damion~~',
                                                                                                   'stable': False,
                                                                                                   'v': {'Values': [['e', {'contents': 'James~~~'}],
                                                                                                                    ['n', {'contents': 'Mary~~~~'}]]}}}},
                                                                               {'Inode': {'hash': 'Patricia',
                                                                                          'tree': {'hash': 'Patricia',
                                                                                                   'stable': False,
                                                                                                   'v': {'Values': [['c', {'contents': 'Justin~~'}],
                                                                                                                    ['j', {'contents': 'Christie'}]]}}}}],
                                                                   'length': 4,
                                                                   'seed': 1}}}}},
                              {'Inode': {'hash': 'Joseph~~', 'tree': {'hash': 'Joseph~~', 'stable': True, 'v': {'Values': []}}}}],
                  'length': 4,
                  'seed': 0}}}
'

Which is the test with the failing root hash we were looking for

@Ngoguey42 Ngoguey42 added the no-changelog-needed No changelog is needed here label Jan 14, 2021
@Ngoguey42
Copy link
Contributor Author

I have updated the code and the introduction above to show the case with a failing root hash

@samoht
Copy link
Member

samoht commented Jan 15, 2021

Great job finding that bug! Can you describe a bit more in which situation that error could appear (e.g. what sequence of operations is needed)?

Also we should probably fuzz test that implementation (in an other PR)! :-)

@Ngoguey42
Copy link
Contributor Author

It's Ioana's finding :) I already wrote some documentation about it in the iNode 2 hackmd

@Ngoguey42 Ngoguey42 removed the no-changelog-needed No changelog is needed here label Jan 15, 2021
@Ngoguey42 Ngoguey42 force-pushed the fix-remove-instability branch 2 times, most recently from 73030fd to c47fc6a Compare January 15, 2021 11:06
@samoht
Copy link
Member

samoht commented Jan 16, 2021

Before merging this PR, it would be nice to add more information in the commit message of the fix too. It's nice to see these explanations in git blame too.

@Ngoguey42
Copy link
Contributor Author

I can copy the text I used in the changelog to the commit message.

CHANGES.md Outdated

#### Fixed
- **irmin-pack**
- Bug fix in inode, irmin-pack's tree-like data structure speeding up
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we only add small explanations in the changelog. I would only say that "it fixes a bug that lead to an inconsistent hash when removing values from an inode larger than Conf.stable_hash", or something similar, and add more explanations in the commit message?

Copy link
Member

@samoht samoht Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should be more explicit about the user-facing impact: e.g. no external facing changes if the number of entries is smaller than Conf., otherwise removes can lead to non-deterministic hashes. This is very important to be explicit about this kind of impact to our user to point them why they need to upgrade. It's ok to keep the explanation in the CHANGELOG long (but precise), as that should be the first place that our users should look for information about the release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworded the entry in the changelog. What do you thing?

@Ngoguey42
Copy link
Contributor Author

@icristescu Let's try again. I've factorised everything and doubled the number of comments.

CHANGES.md Outdated Show resolved Hide resolved
CHANGES.md Outdated

For directories containing at most `Conf.stable_hash` (256) entries, only a
few hashes internal to `inode` could have been inconsistent, but not the
directory's hash.
Copy link
Member

@samoht samoht Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

users will just see the root hashes, so it's not an issue if the internal hashes are not deterministic. So I suggest something like this:

For directories containing at most `Conf.stable_hash` entries (by default 256), no bug occurred. In this case, this change does not induce any user-facing change.

For directories containing more than `Conf.stable_hash` entries, the directory's hash might depends on the past sequence of operations and is thus not deterministic, which breaks various Irmin invariants and could cause data inconsistencies.

and remove the rests as it's too much internal implementation details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Ioana suggested, you can keep the details in the commit message (as the next developper doing git blame will like to know about the gory details).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey, i'll do exactly that. Thank you

Copy link
Contributor

@icristescu icristescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, the tests are much clearer now.

@Ngoguey42
Copy link
Contributor Author

After some investigation I got to the conclusion that Val.remove is unused (it can even be removed from the signatures ATM). We don't need to worry about the aftermaths 🎉

I'm going to update the changelog entry to a one liner and move all the gory details for the merge commit.

I'll try to merge this afternoon

@Ngoguey42 Ngoguey42 force-pushed the fix-remove-instability branch 3 times, most recently from 66a8cbc to b607679 Compare January 20, 2021 10:17
Fix a bug in `inode` where the `remove` function could cause hashing instabilities. No user-facing change since this function is not being used yet.

-- Size thresholds to trigger that bug
For directories containing at most `Conf.stable_hash` entries (by default 256), no bug occurred. In this case, this change does not induce any user-facing change.

For directories containing more than `Conf.stable_hash` entries, the directory's hash might depends on the past sequence of operations and is thus not deterministic, which breaks various Irmin invariants and could cause data inconsistencies.

For directories containing more than `Conf.entries` (32) entries but at most `Conf.stable_hash` (256) a few hashes internal to `inode` could have been inconsistent, but not the directory's hash. Adding more entries to such an inode would most likely bring it to the stage where even the root hash is affected.

-- Technical description
In an inode tree, whenever a `remove` operation emptied a leaf, the hash of all the ancestors of that leaf calculated a hash that was not the same as if that leaf never contained any values in the first place. The root hash is not affected on buggy inodes containing less than `Conf.stable_hash` entries because in that case the root hash is computed without relying on the structure if the tree, only the leaves are taken into account.
@Ngoguey42 Ngoguey42 merged commit a64aaa2 into mirage:master Jan 20, 2021
craigfe added a commit to craigfe/opam-repository that referenced this pull request Feb 2, 2021
… irmin-chunk, irmin-pack, irmin-test, irmin-http, irmin-unix, ppx_irmin, irmin-bench, irmin-graphql, irmin-containers, irmin-mirage-git and irmin-mirage-graphql (2.4.0)

CHANGES:

### Fixed
- **irmin-pack**
  - Fix a bug in `inode` where the `remove` function could cause hashing
    instabilities. No user-facing change since this function is not being used
    yet. (mirage/irmin#1247, @Ngoguey42, @icristescu)

- **irmin**
  - Ensure that `Tree.add_tree t k v` complexity does not depend on `v` size.
    (mirage/irmin#1267, @samoht @Ngoguey42 and @craigfe)

### Added

- **irmin**
  - Added a `Perms` module containing helper types for using phantom-typed
    capabilities as used by the store backends. (mirage/irmin#1262, @craigfe)

  - Added an `Exported_for_stores` module containing miscellaneous helper types
    for building backends. (mirage/irmin#1262, @craigfe)

  - Added new operations `Tree.update` and `Tree.update_tree` for efficient
    read-and-set on trees. (mirage/irmin#1274, @craigfe)

- **irmin-pack**:
  - Added `integrity-check-inodes` command to `irmin-fsck` for checking the
    integrity of inodes. (mirage/irmin#1253, @icristescu, @Ngoguey42)

- **irmin-bench**
  - Added benchmarks for tree operations. (mirage/irmin#1237, @icristescu, @Ngoguey42,
    @craigfe)

#### Changed

- The `irmin-mem` package is now included with the `irmin` package under the
  library name `irmin.mem`. It keeps the same top-level module name of
  `Irmin_mem`. (mirage/irmin#1276, @craigfe)

#### Removed

- `Irmin_mem` no longer provides the layered in-memory store `Make_layered`.
  This can be constructed manually via `Irmin_layers.Make`. (mirage/irmin#1276, @craigfe)
craigfe added a commit to craigfe/opam-repository that referenced this pull request Feb 2, 2021
… irmin-chunk, irmin-pack, irmin-test, irmin-http, irmin-unix, ppx_irmin, irmin-bench, irmin-graphql, irmin-containers, irmin-mirage-git and irmin-mirage-graphql (2.4.0)

CHANGES:

### Fixed
- **irmin-pack**
  - Fix a bug in `inode` where the `remove` function could cause hashing
    instabilities. No user-facing change since this function is not being used
    yet. (mirage/irmin#1247, @Ngoguey42, @icristescu)

- **irmin**
  - Ensure that `Tree.add_tree t k v` complexity does not depend on `v` size.
    (mirage/irmin#1267, @samoht @Ngoguey42 and @craigfe)

### Added

- **irmin**
  - Added a `Perms` module containing helper types for using phantom-typed
    capabilities as used by the store backends. (mirage/irmin#1262, @craigfe)

  - Added an `Exported_for_stores` module containing miscellaneous helper types
    for building backends. (mirage/irmin#1262, @craigfe)

  - Added new operations `Tree.update` and `Tree.update_tree` for efficient
    read-and-set on trees. (mirage/irmin#1274, @craigfe)

- **irmin-pack**:
  - Added `integrity-check-inodes` command to `irmin-fsck` for checking the
    integrity of inodes. (mirage/irmin#1253, @icristescu, @Ngoguey42)

- **irmin-bench**
  - Added benchmarks for tree operations. (mirage/irmin#1237, @icristescu, @Ngoguey42,
    @craigfe)

#### Changed

- The `irmin-mem` package is now included with the `irmin` package under the
  library name `irmin.mem`. It keeps the same top-level module name of
  `Irmin_mem`. (mirage/irmin#1276, @craigfe)

#### Removed

- `Irmin_mem` no longer provides the layered in-memory store `Make_layered`.
  This can be constructed manually via `Irmin_layers.Make`. (mirage/irmin#1276, @craigfe)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants