Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some consistency checks to inodes #1253

Merged
merged 5 commits into from
Feb 1, 2021
Merged

Conversation

icristescu
Copy link
Contributor

No description provided.

@icristescu icristescu force-pushed the test_inodes branch 2 times, most recently from f35b098 to e7dd922 Compare January 19, 2021 14:04
src/irmin-pack/inode.ml Outdated Show resolved Hide resolved
src/irmin-pack/inode.ml Outdated Show resolved Hide resolved
src/irmin-pack/inode.ml Outdated Show resolved Hide resolved
@icristescu icristescu changed the title Add some consistency checks to inodes Some consistency checks to inodes Jan 19, 2021
@icristescu icristescu marked this pull request as ready for review January 20, 2021 09:14
let check ~kind ~offset ~length k =
match kind with
| `Contents -> Ok ()
| `Node -> X.Node.CA.integrity_check_inodes ~offset ~length k nodes
Copy link
Contributor

@Ngoguey42 Ngoguey42 Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i'm not mistaken, this is the source of the problem for your slow check on tezos.

X.Node.CA.integrity_check_inodes is called for all objects in the pack file with the magic 'I' or 'N', but X.Node.CA.integrity_check_inodes itself recursively loads all the inode tree below.

For each inode tree made of N objects, the number of calls to Pack.unsafe_find is not O(N) as we would expect, but O(N*N)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, I modified this and we do a graph traversal instead to ensure that we check only the "root" nodes for inodes instead of checking all the intermediate nodes.

I'm running it on the 42G store and after around 3 hours I'm at
1252049k nodes / 3k commits
maybe I missed something again, and we can be faster? (the integrity check is also very slow, around 10 hours for 220k commits).

Copy link
Contributor

@Ngoguey42 Ngoguey42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your traversal algorithms seem fine to me, I guess 3 hours is a normal duration to check a 42GB store...

test/irmin-pack/test_inode.ml Outdated Show resolved Hide resolved
src/irmin-pack/inode.ml Outdated Show resolved Hide resolved
src/irmin-pack/inode.ml Outdated Show resolved Hide resolved
test/irmin-pack/test_existing_stores.ml Show resolved Hide resolved
@icristescu icristescu force-pushed the test_inodes branch 2 times, most recently from c939dc5 to c1d303c Compare January 29, 2021 11:41
icristescu and others added 3 commits February 1, 2021 16:14
@icristescu icristescu merged commit 766b881 into mirage:master Feb 1, 2021
@icristescu icristescu deleted the test_inodes branch February 1, 2021 15:21
craigfe added a commit to craigfe/opam-repository that referenced this pull request Feb 2, 2021
… irmin-chunk, irmin-pack, irmin-test, irmin-http, irmin-unix, ppx_irmin, irmin-bench, irmin-graphql, irmin-containers, irmin-mirage-git and irmin-mirage-graphql (2.4.0)

CHANGES:

### Fixed
- **irmin-pack**
  - Fix a bug in `inode` where the `remove` function could cause hashing
    instabilities. No user-facing change since this function is not being used
    yet. (mirage/irmin#1247, @Ngoguey42, @icristescu)

- **irmin**
  - Ensure that `Tree.add_tree t k v` complexity does not depend on `v` size.
    (mirage/irmin#1267, @samoht @Ngoguey42 and @craigfe)

### Added

- **irmin**
  - Added a `Perms` module containing helper types for using phantom-typed
    capabilities as used by the store backends. (mirage/irmin#1262, @craigfe)

  - Added an `Exported_for_stores` module containing miscellaneous helper types
    for building backends. (mirage/irmin#1262, @craigfe)

  - Added new operations `Tree.update` and `Tree.update_tree` for efficient
    read-and-set on trees. (mirage/irmin#1274, @craigfe)

- **irmin-pack**:
  - Added `integrity-check-inodes` command to `irmin-fsck` for checking the
    integrity of inodes. (mirage/irmin#1253, @icristescu, @Ngoguey42)

- **irmin-bench**
  - Added benchmarks for tree operations. (mirage/irmin#1237, @icristescu, @Ngoguey42,
    @craigfe)

#### Changed

- The `irmin-mem` package is now included with the `irmin` package under the
  library name `irmin.mem`. It keeps the same top-level module name of
  `Irmin_mem`. (mirage/irmin#1276, @craigfe)

#### Removed

- `Irmin_mem` no longer provides the layered in-memory store `Make_layered`.
  This can be constructed manually via `Irmin_layers.Make`. (mirage/irmin#1276, @craigfe)
craigfe added a commit to craigfe/opam-repository that referenced this pull request Feb 2, 2021
… irmin-chunk, irmin-pack, irmin-test, irmin-http, irmin-unix, ppx_irmin, irmin-bench, irmin-graphql, irmin-containers, irmin-mirage-git and irmin-mirage-graphql (2.4.0)

CHANGES:

### Fixed
- **irmin-pack**
  - Fix a bug in `inode` where the `remove` function could cause hashing
    instabilities. No user-facing change since this function is not being used
    yet. (mirage/irmin#1247, @Ngoguey42, @icristescu)

- **irmin**
  - Ensure that `Tree.add_tree t k v` complexity does not depend on `v` size.
    (mirage/irmin#1267, @samoht @Ngoguey42 and @craigfe)

### Added

- **irmin**
  - Added a `Perms` module containing helper types for using phantom-typed
    capabilities as used by the store backends. (mirage/irmin#1262, @craigfe)

  - Added an `Exported_for_stores` module containing miscellaneous helper types
    for building backends. (mirage/irmin#1262, @craigfe)

  - Added new operations `Tree.update` and `Tree.update_tree` for efficient
    read-and-set on trees. (mirage/irmin#1274, @craigfe)

- **irmin-pack**:
  - Added `integrity-check-inodes` command to `irmin-fsck` for checking the
    integrity of inodes. (mirage/irmin#1253, @icristescu, @Ngoguey42)

- **irmin-bench**
  - Added benchmarks for tree operations. (mirage/irmin#1237, @icristescu, @Ngoguey42,
    @craigfe)

#### Changed

- The `irmin-mem` package is now included with the `irmin` package under the
  library name `irmin.mem`. It keeps the same top-level module name of
  `Irmin_mem`. (mirage/irmin#1276, @craigfe)

#### Removed

- `Irmin_mem` no longer provides the layered in-memory store `Make_layered`.
  This can be constructed manually via `Irmin_layers.Make`. (mirage/irmin#1276, @craigfe)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants