git log -L is very slow #166

Closed
derrickstolee opened this issue Jul 31, 2019 · 3 comments

@derrickstolee
Collaborator

A user reported that git log -L N,M:file took a long time (~7 minutes, even after cleaning up the object directory). The same command without -L (git log file) returned almost immediately.

Check if there are easy performance wins to reduce the computation time when using -L.

@derrickstolee
Collaborator Author

Some helpful progress is being made on-list: https://public-inbox.org/git/20190819130323.GU20404@szeder.dev/T/#m79ee9ae1d2696dc4c57f0d409d72949403ab84dc

Here are the results using a random path I picked out from the Windows
repo (it was only changed ~10 times in the 4.5 million commits):

Before:

real    2m7.308s
real    2m8.572s

With Patch 4:

real    0m38.628s
real    0m38.477s

With Patch 5:

real    0m24.685s
real    0m24.310s

For the specific file in the bug report from a real user, I got
these numbers:

real    0m32.293s (patch 4)
real    0m19.362s (patch 5)
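To compare these timings as ratios, here is a small sketch that converts the `real` values printed by `time` into seconds and divides them. The helper names `parse_real_seconds` and `speedup` are made up for illustration and are not part of Git.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Convert a `time`-style value such as "2m7.308s" into seconds.
 * Returns -1.0 if the text does not match the "<min>m<sec>s" shape.
 * (Hypothetical helper, for illustration only.)
 */
double parse_real_seconds(const char *text)
{
	int minutes;
	double seconds;

	assert(text != NULL);
	if (sscanf(text, "%dm%lfs", &minutes, &seconds) != 2)
		return -1.0;
	return minutes * 60.0 + seconds;
}

/*
 * Ratio of the "before" time to the "after" time.
 * Returns -1.0 if either value cannot be parsed or "after" is zero.
 */
double speedup(const char *before, const char *after)
{
	double b = parse_real_seconds(before);
	double a = parse_real_seconds(after);

	if (b < 0.0 || a <= 0.0)
		return -1.0;
	return b / a;
}
```

Applied to the first run of each group above, `speedup("2m7.308s", "0m38.628s")` comes out at roughly 3.3 and `speedup("2m7.308s", "0m24.685s")` at roughly 5.2.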

When running without the patch, I had to kill the process after 55 minutes of waiting (and 20,000+ blob downloads). It appears that this is somehow triggering rename detection, so the blob contents are being checked! A PerfView trace shows the following stack as the interesting one:

line_log_filter
+ queue_diffs
  + diffcore_std
    + diffcore_rename
      + diff_populate_filespec

The changes on-list involve not forcing the entire graph to be read, so those changes are orthogonal to #175.

@derrickstolee
Collaborator Author

I just realized that the following code in revision.c guarantees a minimum amount of slowness:

	if (revs->line_level_traverse) {
		revs->limited = 1;
		revs->topo_order = 1;
	}

And the code in line-log.c for int line_log_filter(struct rev_info *rev) expects the commit list to be complete before scanning the contents.
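To make the cost of a complete commit list concrete, here is a toy model. It is not Git's code: `struct toy_commit`, `first_match_limited`, and `first_match_streaming` are hypothetical names, and the history is a flat array ordered newest first. It only counts how many commits each strategy visits before the first result is known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a commit; this is not Git's struct commit. */
struct toy_commit {
	int touches_range; /* 1 if this commit changed the tracked lines */
};

/*
 * Models a "limited" walk: the whole history is visited before any
 * result is reported. Returns the index of the newest matching commit,
 * or -1 if none matches. *visited receives the number of commits walked.
 */
int first_match_limited(const struct toy_commit *history, size_t n,
			size_t *visited)
{
	int first = -1;
	size_t i;

	assert(history != NULL && visited != NULL);
	*visited = 0;
	for (i = 0; i < n; i++) {
		(*visited)++;
		if (history[i].touches_range && first < 0)
			first = (int)i;
	}
	return first;
}

/*
 * Models an iterative walk: stop as soon as the first match is found.
 * Same return value as above, but usually far fewer commits visited.
 */
int first_match_streaming(const struct toy_commit *history, size_t n,
			  size_t *visited)
{
	size_t i;

	assert(history != NULL && visited != NULL);
	*visited = 0;
	for (i = 0; i < n; i++) {
		(*visited)++;
		if (history[i].touches_range)
			return (int)i;
	}
	return -1;
}
```

With a 10-commit history whose newest match sits at index 2, the limited version visits all 10 commits while the streaming version visits 3. The gap scales with history size, which is why it matters in a repository with millions of commits.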

So, taking stock of this problem, we have a few things to think about:

  1. How can Bloom filters optimally interact with -L?
  2. Can we make the algorithm iterative instead of needing a full commit walk?
  3. Can we remove the rename detection by default? It will change some results (when there is a rename on the given file), but that is probably worth it to avoid downloading the contents of EVERY changed blob in the history!

@derrickstolee
Collaborator Author

Making progress here! See gitgitgadget#622 for more details.
