Fix integrity check command for minimal indexing stores #2160

Merged
merged 1 commit into from
Jan 2, 2023

Conversation

icristescu
Contributor

Extracted from #2138 to contain only the fix to the integrity-check command.

@codecov-commenter

codecov-commenter commented Dec 23, 2022

Codecov Report

Merging #2160 (cd447cc) into main (2f55642) will decrease coverage by 0.16%.
The diff coverage is 55.74%.

@@            Coverage Diff             @@
##             main    #2160      +/-   ##
==========================================
- Coverage   68.23%   68.06%   -0.17%     
==========================================
  Files         134      134              
  Lines       16096    16146      +50     
==========================================
+ Hits        10983    10990       +7     
- Misses       5113     5156      +43     
| Impacted Files | Coverage Δ |
|---|---|
| src/irmin-pack/unix/inode.ml | 12.50% <0.00%> (-20.84%) ⬇️ |
| src/irmin-pack/unix/gc_worker.ml | 4.34% <2.77%> (-1.63%) ⬇️ |
| src/irmin-pack/unix/pack_store.ml | 82.72% <50.00%> (-0.26%) ⬇️ |
| src/irmin-pack/unix/checks.ml | 23.12% <54.63%> (+13.04%) ⬆️ |
| src/irmin-pack/inode.ml | 78.96% <57.14%> (+0.25%) ⬆️ |
| src/irmin-pack/unix/pack_key.ml | 70.83% <58.33%> (-3.53%) ⬇️ |
| src/irmin-pack/unix/mapping_file.ml | 89.88% <92.85%> (-3.99%) ⬇️ |
| src/irmin-pack/unix/store.ml | 63.00% <94.44%> (-1.03%) ⬇️ |

... and 8 more

@icristescu icristescu force-pushed the integrity_only_command branch 2 times, most recently from a68d70a to 07a956d Compare December 23, 2022 16:07
Member

@metanivek metanivek left a comment

LGTM. Just one question and a small change needed.

CHANGES.md (outdated, resolved)
@@ -22,11 +22,13 @@ module type S = sig

val integrity_check :
?ppf:Format.formatter ->
?heads:commit list ->
Member

Out of curiosity, is this added just to have more flexibility in what is integrity checked?

Contributor Author

For stores created with the minimal indexing strategy, the integrity check needs a reference to a commit, which it then traverses. After a snapshot import there is indeed only one commit in the store, but because octez does not use branches, we cannot reach it through a branch. We need its hash to look it up in the index. (There should be only one commit in the index, so I guess we could read it directly from there.)

For stores that do use branches, the command is indeed somewhat flexible: if no commit is specified the main branch is traversed.

Also, outside of the snapshot import use case, it can be used to check the integrity of any commit available in the store.
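
To make the head selection described above concrete, here is a minimal sketch. It is not the code in this PR: the `commit` record, the `heads_to_check` function, the `main_head` argument and the `No_head_available` error are all hypothetical names, and returning an error when neither a head nor a main branch exists is my assumption about what a caller would want.

```ocaml
(* Hypothetical sketch, not the irmin API: decide which commits an
   integrity check should traverse. *)
type commit = { hash : string }

(* [heads] mirrors the optional argument added to [integrity_check];
   [main_head] stands in for the head of the main branch, if the store
   uses branches at all. *)
let heads_to_check ?heads ~main_head () =
  match heads with
  | Some (_ :: _ as hs) -> Ok hs (* explicit heads take precedence *)
  | Some [] | None -> (
      match main_head with
      | Some c -> Ok [ c ] (* fall back to the main branch *)
      | None -> Error `No_head_available (* e.g. no branches, no --heads *))
```

With a store that has no branches, this sketch only succeeds when a head is passed explicitly, which is the situation described for octez after a snapshot import.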

@icristescu
Contributor Author

This PR adds a test that runs an integrity check on a corrupted store, and the corruption is caught. But I also started looking at some stats to see whether I missed something. I don't yet completely understand these stats, so I am linking my branch here in case someone else wants to have a look: https://github.com/icristescu/irmin/tree/integrity_only_command_stats.

These stats are generated by running

dune exec -- ./test/irmin-tezos/irmin_fsck.exe integrity-check --heads="CoVUgFRhZKVdFGEGDGh1PYr7cKz4St7vRYJdg7WsH3JL3yd1SP1d" store_imported/context

from a snapshot for block 2905810.

number of blobs and nodes checked by the contents and nodes functions in Repo.iter:

counters blobs 14875352; nodes 20955334

number of times hash is recomputed for stable and non-stable nodes:

counters stable 13437401; non stable 7517933

However, Repo.iter is probably reading the same node several times to detect whether it should continue traversing it. So if we look at the number of distinct node hashes for which the hash is recomputed, it is smaller: 20644306.
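
The gap between visits and distinct hashes can be illustrated with a small sketch. This is not the code from the stats branch: `count_visits` is a hypothetical helper, and hashes are plain strings here purely for illustration.

```ocaml
(* Sketch: count total node visits versus distinct node hashes.
   Assumption: hashes are represented as strings. *)
let count_visits (visited : string list) =
  let seen = Hashtbl.create 16 in
  let total = ref 0 in
  List.iter
    (fun h ->
      incr total;
      (* [replace] keeps a single binding per hash, so the table length
         is the number of distinct hashes seen so far. *)
      Hashtbl.replace seen h ())
    visited;
  (!total, Hashtbl.length seen)
```

For example, `count_visits [ "a"; "b"; "a"; "c"; "b" ]` gives 5 visits but only 3 distinct hashes.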

When the hash is recomputed we have access to the length of the node (https://github.com/icristescu/irmin/blob/integrity_only_command_stats/src/irmin-pack/inode.ml#L1243), so here are the numbers of nodes with lengths <= 32, between 32 and 256, and > 256:

<= 32: 20674368; 32 to 256: 251383; > 256: 29583
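
As a sketch of the bucketing, assuming the boundaries are inclusive at 32 and 256 (the exact boundary handling in the stats branch may differ), with `bucket_of_length` and `tally` as hypothetical names:

```ocaml
(* Sketch: classify node lengths into the three buckets reported above.
   Assumption: 32 and 256 belong to the lower bucket. *)
type bucket = Small | Medium | Large

let bucket_of_length n =
  if n <= 32 then Small else if n <= 256 then Medium else Large

(* Returns (small, medium, large) counts for a list of lengths. *)
let tally lengths =
  List.fold_left
    (fun (s, m, l) n ->
      match bucket_of_length n with
      | Small -> (s + 1, m, l)
      | Medium -> (s, m + 1, l)
      | Large -> (s, m, l + 1))
    (0, 0, 0) lengths

(* The three reported counts add up to the "nodes" counter (20955334). *)
let buckets_match_node_counter = 20674368 + 251383 + 29583 = 20955334
```

Since the three counts sum to the total node counter rather than to the number of distinct hashes, the buckets appear to count every recomputation, including repeated ones.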

However, when I try to print the paths to the nodes that have between 32 and 256 predecessors, I don't get any (https://github.com/icristescu/irmin/blob/integrity_only_command_stats/src/irmin-pack/unix/checks.ml#L528).

So I'm not sure what is going on. It could be useful to run it on a smaller store which contains inodes with length > 32 and > 256.

@metanivek
Member

@icristescu I spent some time today looking into your stats branch and playing with the output to get a better understanding of what is going on.

Here is output from modified counting. It definitely appears that nodes are being visited multiple times (in pred_node) but then coalesced by the object graph traversal (in check_node).

Compare the nodes (inode, node pred) sum of 45286295 with the nodes (non, root check) sum of 23191335, for example. I think this is okay. When you look at the node stats, you see hash_tbl length = 22863073, which is less than the count (23191335) from check_nodes and so indicates that some of the nodes that are read (either in pred or check) have the same hash.

I was able to observe a lower "Reads in pack" value by increasing the LRU size.

16293k contents / 23191k nodes / 1 commits
counters blob 16293640; nodes 23191335

nodes (inode, node pred): 8443372 + 36842923 = 45286295
nodes (0-32, 32-256, 256+ pred): 36812040 + 26739 + 4144 = 36842923

nodes (non, root check) 8352022 + 14839313 = 23191335
nodes (non, stable check): 8356166 + 14835169 = 23191335
nodes (0-32, 32-256, 256+ check): 14808433 + 26736 + 4144 = 14839313

commit stats =
hash_tbl length = 1
Reads in pack 1

node stats =
hash_tbl length = 22863073
Reads in pack 23150711

blob stats =
hash_tbl length = 15848823
Reads in pack 16293640
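
The sums in the output above can be re-checked mechanically. This is only a sanity check on the printed numbers, with `checks`, `all_consistent` and `repeated_hash_checks` as names I made up for the sketch:

```ocaml
(* Sketch: verify that each reported breakdown adds up to its total. *)
let checks =
  [
    ("nodes (inode, node pred)", [ 8443372; 36842923 ], 45286295);
    ("nodes (0-32, 32-256, 256+ pred)", [ 36812040; 26739; 4144 ], 36842923);
    ("nodes (non, root check)", [ 8352022; 14839313 ], 23191335);
    ("nodes (non, stable check)", [ 8356166; 14835169 ], 23191335);
    ("nodes (0-32, 32-256, 256+ check)", [ 14808433; 26736; 4144 ], 14839313);
  ]

let sum = List.fold_left ( + ) 0

let all_consistent =
  List.for_all (fun (_, parts, total) -> sum parts = total) checks

(* Node checks minus distinct node hashes: checks that hit a hash
   already present in the table. *)
let repeated_hash_checks = 23191335 - 22863073
```

All five breakdowns are consistent, and `repeated_hash_checks` comes out at 328262.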

As for looking at which steps correspond to the three different buckets (the last item that didn't make sense yet), I was able to do this (looking only at root node lengths, not predecessor lengths) and get the following (cleaned up output):

# step - length
commitments - 4574
ed25519 - 355
index - 350341
contents - 84
secp256k1 - 35
p256 - 53
delegated - 1297
nonces - 128
roots - 120

This is only for length > 32. It is also deduplicated based on step name, so the length is merely demonstrative. For <= 32, you get a lot of noise with the contracts/accounts which have 2 nodes.

If it is of interest, here is a comparison of my code: icristescu/irmin@integrity_only_command_stats...metanivek:irmin:integrity_only_command_stats

Overall, I think this code is good for 3.5.1, so I will move towards that next week.

@metanivek metanivek merged commit cf5d98f into mirage:main Jan 2, 2023
metanivek added a commit to metanivek/opam-repository that referenced this pull request Jan 4, 2023
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.1)

CHANGES:

### Fixed

- **irmin-pack**
  - Integrity check of a commit works on stores using the minimal indexing
    strategy. (mirage/irmin#2160, @icristescu)
metanivek added a commit to metanivek/opam-repository that referenced this pull request Jan 5, 2023
metanivek added a commit to metanivek/opam-repository that referenced this pull request Jan 5, 2023
@irmaTS irmaTS added tezos-support Support for bugs related to Tezos and removed tezos-support Support for bugs related to Tezos labels Feb 24, 2023