Rebuild index in prune by using in-memory index #2842
Note that currently, new index files are checked for "fullness" and saved only after a complete index has been processed. This is not entirely correct and will be a problem if this is used on one big merged index (see #2818).
After #2818 was merged, I rebased this PR. The newly generated index is now also checked for "fullness" after writing the contents of each pack.
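As a rough illustration of the per-pack "fullness" check described above, here is a minimal sketch. The `Blob` type, the `maxEntries` threshold, and `splitIndex` are hypothetical names chosen for this example and are not restic's actual API; the real criterion for a full index may differ.

```go
package main

import "fmt"

// Blob is a hypothetical stand-in for an index entry; it is not restic's type.
type Blob struct {
	ID     string
	PackID string
}

// maxEntries is an assumed "fullness" threshold, chosen only for illustration.
const maxEntries = 3

// splitIndex writes the blobs of one pack at a time into the current index
// file and checks for fullness after each complete pack, so that the entries
// of a single pack are never split across two index files.
func splitIndex(packs [][]Blob) [][]Blob {
	var files [][]Blob
	var cur []Blob
	for _, pack := range packs {
		cur = append(cur, pack...)
		if len(cur) >= maxEntries {
			files = append(files, cur)
			cur = nil
		}
	}
	// Save whatever is left, even if the last index file is not full.
	if len(cur) > 0 {
		files = append(files, cur)
	}
	return files
}

func main() {
	packs := [][]Blob{
		{{"a", "p1"}, {"b", "p1"}},
		{{"c", "p2"}, {"d", "p2"}},
		{{"e", "p3"}},
	}
	for i, f := range splitIndex(packs) {
		fmt.Printf("index file %d: %d blobs\n", i, len(f))
	}
}
```

Checking after every pack rather than once per source index keeps the output files bounded even when the input is a single large merged index.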
Rebased after #2840 was merged.
As commented below, the current implementation can forget blobs in rewritten packs and will cause check to warn about packs which are listed in multiple indexes. Please fix this.
This PR also needs tests. There is currently no test coverage for RebuildIndex. I think we also need a changelog entry, nothing too fancy, as it will eventually be merged with the one for the main prune PR.
@MichaelEischer Thanks for your feedback! I'm also a bit short on time at the moment and would like to spend it on making #2718 merge-ready.
There's no need to hurry (too much). I actually hope to merge the VSS support (which shouldn't take much longer) before the prune rewrite PRs. That way we could release the VSS support soon (hopefully) and give the prune changes several weeks on master before getting them into a release.
OK, I just realized that this PR should really only be merged after #2718 (as you can see in the failing test, one of my "edge case repos"). The reason is that blobs which do not exist in the index are processed by the current prune (as its first step is to rebuild the index) but are then not saved in the rebuilt index (as this only saves the contents loaded from the index files). This will change with #2718, which would simply fail in this situation and request a manual rebuild-index. So I'll build this PR on top of #2718.
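The problem described above can be illustrated with a small sketch. It assumes a simplified model in which the index is just a list of blob/pack pairs; `entry` and `rebuildFromMemory` are hypothetical names for this example, not restic functions.

```go
package main

import "fmt"

// entry is a hypothetical in-memory index entry: one blob stored in one pack.
type entry struct{ Blob, Pack string }

// rebuildFromMemory keeps every in-memory entry whose pack is not scheduled
// for removal. It never looks at pack headers, so packs that exist in the
// repository but are missing from the in-memory index cannot appear here.
func rebuildFromMemory(inMemory []entry, removed map[string]bool) []entry {
	var out []entry
	for _, e := range inMemory {
		if !removed[e.Pack] {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// p1 and p2 are indexed; p2 is removed by prune.
	inMemory := []entry{{"b1", "p1"}, {"b2", "p2"}}
	rebuilt := rebuildFromMemory(inMemory, map[string]bool{"p2": true})

	indexed := make(map[string]bool)
	for _, e := range rebuilt {
		indexed[e.Pack] = true
	}
	// p3 is assumed to exist in the repository without ever being indexed.
	for _, pack := range []string{"p1", "p3"} {
		fmt.Printf("pack %s indexed: %v\n", pack, indexed[pack])
	}
}
```

In this model the unindexed pack stays invisible after the rebuild, which is why a prune that relies on the in-memory index has to refuse to run on such a repository instead of silently continuing.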
I rebased this onto #2718.
I added the progress bar. Using packs as counter needs information from
I'm struggling a bit with the test case. Of course, the code is used in the integration tests and works there. As this saves the index to the repository, I cannot think of another good test except some kind of integration test. Which extra test case did you have in mind?
It would be possible to recalculate the number of packs by iterating over the index to collect all packs and then removing the rewritten ones. I haven't tested the blob counter yet, so I can't judge whether that's too noisy or not. For the test case I have something like
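The pack recount suggested at the start of this comment (not the test case) could look roughly like the following sketch. The `entry` type and `remainingPacks` are hypothetical and only model the idea; they are not taken from the restic code base.

```go
package main

import "fmt"

// entry is a hypothetical index entry: one blob stored in one pack.
type entry struct{ Blob, Pack string }

// remainingPacks iterates over the index to collect the set of all packs and
// then removes the rewritten ones, yielding the number of packs that the
// rebuilt index will reference and that a progress bar could count.
func remainingPacks(index []entry, rewritten map[string]bool) int {
	packs := make(map[string]struct{})
	for _, e := range index {
		packs[e.Pack] = struct{}{}
	}
	for id := range rewritten {
		delete(packs, id)
	}
	return len(packs)
}

func main() {
	index := []entry{{"b1", "p1"}, {"b2", "p1"}, {"b3", "p2"}, {"b4", "p3"}}
	n := remainingPacks(index, map[string]bool{"p2": true})
	fmt.Println("packs to index:", n)
}
```

The cost is one extra pass over all index entries, which is the iteration the alternative of counting blobs tries to avoid.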
I wanted to avoid the iteration and now count blobs before and after repacking to get the exact number of blobs that
I boldly copied (and adapted) that test function into
@aawsome I added a hook some time ago that allows injecting / wrapping a backend into OpenRepository. Search for
Great, that worked for me! The test is added (I also verified that the test fails without this PR;
@MichaelEischer Struggling with the test yesterday, I forgot to mention another change:
Rebased this onto the new version of #2718 (only documentation and the
Rebased after #2718 was merged.
I've added a comment about the computation of packs added, apart from that I'm happy. Thanks for your work!
LGTM.
What is the purpose of this change? What does it change?
Optimize the rebuilding of the index during prune by using the in-memory index instead of reading all pack headers.
Was the change discussed in an issue or in the forum before?
This was part of #2718 and has been extracted so that it can be merged after #2718.
To avoid saving exact duplicates multiple times in the rebuilt index, either #2839 or #2863 should also be merged.
closes #1599
closes #3049
Checklist
changelog/unreleased/ that describes the changes for our users (template here)
gofmt on the code in all commits