Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve blame performance #10840

Merged
merged 1 commit into from Mar 31, 2023

Conversation

pmiossec
Copy link
Member

@pmiossec pmiossec commented Mar 27, 2023

Proposed changes

Some small improvements on blame data generation:
* cache author lines instead of building it every times
(because it happens often that multiple lines belong to same commit)
* Use TryGetValue to get the commit and avatar image
* better heuristic for initial line capacity

Test methodology

  • Manual

Test environment(s)

  • Git Extensions 33.33.33
  • Build 25c1314 (Dirty)
  • Git 2.38.1.windows.1
  • Microsoft Windows NT 10.0.19045.0
  • .NET 6.0.14
  • DPI 96dpi (no scaling)

Merge strategy

Merge commit


✒️ I contribute this code under The Developer Certificate of Origin.

@ghost ghost assigned pmiossec Mar 27, 2023
@pmiossec pmiossec requested a review from gerhardol March 27, 2023 22:42
@pmiossec
Copy link
Member Author

Future work: During my investigation, I saw that
https://github.com/gitextensions/gitextensions/blob/master/GitUI/Avatars/AvatarDownloader.cs#L74 is really slow.

For example, like I said in PR description when no avatar is in cache, blame on GitModule.cs takes:

  • 6m40s when downloading avatars
  • 6s when generating the initials!!!

But a quick test replacing the use of WebClient by httpclient (⚠️ without proxy configuration!) reduce the time to around 50s (instead of more than 6 min). I 🤞 that it's not the proxy configuration that introduce the perf cost but there is some investigation to do...

I have not replaced the code because I don't know how to reproduce the same proxy configuration than with using WebClient.
If someone knows how to do it....

@gerhardol
Copy link
Member

I have been using this commit for some months and did some measurements with the change.
This improves the load time for a bigger module like GitModule.cs with a few ms, so not big compared to the Git command that requires seconds. Still a relevant change.

Copy link
Member

@mstv mstv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+2 LGTM

I have been using this commit for some months

Me, too.

GitCommands/Git/GitModule.cs Outdated Show resolved Hide resolved
GitUI/UserControls/BlameControl.cs Outdated Show resolved Hide resolved
GitUI/UserControls/BlameControl.cs Outdated Show resolved Hide resolved
Copy link
Member

@gerhardol gerhardol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(approved but agree with mstv, should be changed)

GitCommands/Settings/AppSettings.cs Outdated Show resolved Hide resolved
@pmiossec
Copy link
Member Author

Done.

@RussKie
Copy link
Member

RussKie commented Mar 29, 2023

  • Disable fetching avatar by default (because it really too slow: 6m40s with avatars / 6s without)

Can you please point me to where we're doing this?

But a quick test replacing the use of WebClient by httpclient (⚠️ without proxy configuration!) reduce the time to around 50s (instead of more than 6 min). I 🤞 that it's not the proxy configuration that introduce the perf cost but there is some investigation to do...

I have not replaced the code because I don't know how to reproduce the same proxy configuration than with using WebClient. If someone knows how to do it....

I think as a first step we can provide an implementation without a proxy, and then bolt on a proxy support. Unless, of course, you're behind proxy and need it :)
I don't remember when was the last time I was behind a proxy, so I won't be much help here.

A quick search for how to configure proxy with HttpClient yields these: https://stackoverflow.com/a/58560252/2338036 and https://stackoverflow.com/a/55242207/2338036.

Copy link
Member

@RussKie RussKie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -3378,7 +3378,7 @@ internal GitBlame ParseGitBlame(string output, Encoding encoding)
// is a blank line, and third is an introductory paragraph about the project.

Dictionary<ObjectId, GitBlameCommit> commitByObjectId = new();
List<GitBlameLine> lines = new(capacity: 256);
List<GitBlameLine> lines = new(capacity: Math.Min(Math.Max(256, output.Length / 120), 1000));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these magic numbers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a heuristic to size the list (to prevent multiple reallocation) depending on the size of the file content to blame with some limits to not allocate a too small list (256 as before) or too big (1000) one.
To 'estimate' the number of lines, I took a number bigger than what nearly all projects will have as mean source code length to guarantee that we will not over-allocate (because we blame nearly 90% of the time source code, right!?!) which is a length of 120 characters...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add some hints about this.
I remember looking at this line some minutes when I tried this commit out to understand it.
(Not affecting approval.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment...

@RussKie RussKie added the 📭 needs: author feedback More info/confirmation awaited from OP; issues typically get closed after 30 days of inactivity label Mar 30, 2023
@ghost ghost removed the 📭 needs: author feedback More info/confirmation awaited from OP; issues typically get closed after 30 days of inactivity label Mar 30, 2023
@pmiossec
Copy link
Member Author

  • Disable fetching avatar by default (because it really too slow: 6m40s with avatars / 6s without)

Can you please point me to where we're doing this?

I have remove the code for another PR as @gerhardol suggestion and updated the description just a few minutes before your comment...

But a quick test replacing the use of WebClient by httpclient (⚠️ without proxy configuration!) reduce the time to around 50s (instead of more than 6 min). I 🤞 that it's not the proxy configuration that introduce the perf cost but there is some investigation to do...
I have not replaced the code because I don't know how to reproduce the same proxy configuration than with using WebClient. If someone knows how to do it....

I think as a first step we can provide an implementation without a proxy, and then bolt on a proxy support. Unless, of course, you're behind proxy and need it :) I don't remember when was the last time I was behind a proxy, so I won't be much help here.

A quick search for how to configure proxy with HttpClient yields these: https://stackoverflow.com/a/58560252/2338036 and https://stackoverflow.com/a/55242207/2338036.

I will have a look but it will be part of the other PR on avatar handling where I already fix or improve some other things...

* cache author lines instead of building it every times
(because it happens often that multiple lines belong to same commit)
* Use TryGetValue() to get the commit and avatar image
* better heuristic for initial line capacity based on
 evaluation of multiple repositories on how many characters are needed on average
in a git blame to "describe" a line of a text file (surprisingly consistent around 120 characters!)
@RussKie RussKie merged commit 9647b99 into gitextensions:master Mar 31, 2023
2 checks passed
@ghost ghost added this to the vNext milestone Mar 31, 2023
@pmiossec pmiossec deleted the improve_blame_performance branch April 1, 2023 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants