Skip to content

Conversation

@derrickstolee
Copy link

The parse_commit_no_graph() method was added in 43d3561 ("commit-graph
write: don't die if the existing graph is corrupt" 2019-03-25) as a way
to avoid persisting bad data across commit-graph files. That is, if the
commit-graph file has undetected corrupt data -- such as a flipped bit
in a parent int-id value -- then that data will persist to the next
commit-graph file. The parse_commit_no_graph() method was used to always
use the pack data directly instead.

Unfortunately, this comes at a significant performance cost. In both
time and memory, parsing from pack files is much slower than parsing
from the commit-graph file. In a repository with 4.5 million commits,
this can lead to Git taking up to 11gb of memory to rewrite the file.

Now that the incremental commit-graph file format exists, we can rely
on the quality of the commit-graph file if we follow the two-step
pattern of (1) write a commit-graph with "--split" and (2) run "git
commit-graph verify --shallow" to verify the tip file.


@jrbriggs, @jeffhostetler, @jamill and others: this change should revert
the performance problems we are seeing with the M153 release. I will
test this carefully after generating a Windows installer. I'm not sure if
this change would be something we can send upstream or not, but I
will start a conversation about it on the list.

@derrickstolee
Copy link
Author

derrickstolee commented Aug 6, 2019

(I notice of course that this will fail the test that he added for this case... I will probably disable it for now.)

Edit: I just did a revert (and adjusted the conflicts around changes on top of that commit).

The parse_commit_no_graph() method was added in 43d3561 ("commit-graph
write: don't die if the existing graph is corrupt" 2019-03-25) as a way
to avoid persisting bad data across commit-graph files. That is, if the
commit-graph file has undetected corrupt data -- such as a flipped bit
in a parent int-id value -- then that data will persist to the next
commit-graph file. The parse_commit_no_graph() method was used to always
use the pack data directly instead.

Unfortunately, this comes at a significant performance cost. In both
time and memory, parsing from pack files is much slower than parsing
from the commit-graph file. In a repository with 4.5 million commits,
this can lead to Git taking up to 11gb of memory to rewrite the file.

Now that the incremental commit-graph file format exists, we can rely
on the quality of the commit-graph file if we follow the two-step
pattern of (1) write a commit-graph with "--split" and (2) run "git
commit-graph verify --shallow" to verify the tip file.

This reverts commit 43d3561.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee derrickstolee merged commit baf16c8 into microsoft:vfs-2.22.0 Aug 6, 2019
derrickstolee added a commit to derrickstolee/VFSForGit that referenced this pull request Aug 6, 2019
See microsoft/git#168 for full details.

v2.22.0 had a change that made writing commit-graph files much
slower by always parsing from pack-files instead of from the
commit-graph file. For some users of the OS repo, this caused
super-slow performance and high memory usage.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit to derrickstolee/VFSForGit that referenced this pull request Aug 6, 2019
See microsoft/git#168 for full details.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit to derrickstolee/VFSForGit that referenced this pull request Aug 6, 2019
See microsoft/git#168 for full details.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit to derrickstolee/VFSForGit that referenced this pull request Aug 6, 2019
See microsoft/git#168 for full details.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit to microsoft/VFSForGit that referenced this pull request Aug 6, 2019
See microsoft/git#168 for full details.

v2.22.0 had a change that made writing commit-graph files much
slower by always parsing from pack-files instead of from the
commit-graph file. For some users of the OS repo, this caused
super-slow performance and high memory usage.
derrickstolee added a commit to microsoft/VFSForGit that referenced this pull request Aug 6, 2019
This PR does two things:

1. Updates Git to include the slow commit-graph write fix from microsoft/git#168. Also see #1420 for the M153 hotfix.

2. Removes the multi-pack-index writes from the `PostFetchStep` and instead only writes the multi-pack-index during the `PackfileMaintenanceStep`. In order to get the most out of the step, we need to ensure we have a multi-pack-index before running the `expire` and `repack` steps. Also, we can only expire the packs from that day if they are contained in the multi-pack-index.

If we agree that we should send (2) as a hotfix to the M155 release (in addition to (1), which is necessary), then I'll create a hotfix PR to that branch.
derrickstolee added a commit to microsoft/VFSForGit that referenced this pull request Aug 7, 2019
This PR does two things:

Updates Git to include the slow commit-graph write fix from microsoft/git#168. Also see #1420 for the M153 hotfix.

Removes the multi-pack-index writes from the PostFetchStep and instead only writes the multi-pack-index during the PackfileMaintenanceStep. In order to get the most out of the step, we need to ensure we have a multi-pack-index before running the expire and repack steps. Also, we can only expire the packs from that day if they are contained in the multi-pack-index.

If we agree that we should send (2) as a hotfix to the M155 release (in addition to (1), which is necessary), then I'll create a hotfix PR to that branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants