New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize ObjectId parsing #11208
Optimize ObjectId parsing #11208
Conversation
Use optimized Utf8Parser(Span<>) methods to parse Git commit hashes. Remove unused methods.
I see it is utf8 specific. Are we forcing utf8 by using i18n.logOutputEncoding and other related i18n.* config. https://git-scm.com/docs/git-config#Documentation/git-config.txt-i18nlogOutputEncoding Git default and/or dedacto may be utf8 but are we sure every single one is set to utf8? One of the many reasons I keep harping on system git confif not being read. We need to at the least read config values from system and then global and so on. |
This is a potential issue. This PR has the same assumption as the current code, that 0-9,a-f is encoded using ASCII. Similar apply to the timestamp with unix time stamp, also assumed to be ASCII 0-9 For author names, email, subject etc the data is decoded with the encoding reported by Git, config is read. Edit: hash, path etc are in utf-8 https://git-scm.com/docs/git-log#_discussion For the hash, timestamps I propose to keep it as is though also if utf-8 is not forced, there will be a performance impact if all is decoded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Seems to work well.
I have just taken a quick look at the new code (for now).
if (!uint.TryParse(array[..8], NumberStyles.AllowHexSpecifier, provider: null, out uint i1) | ||
|| !uint.TryParse(array.Slice(8, 8), NumberStyles.AllowHexSpecifier, provider: null, out uint i2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[..]
vs. .Slice(,)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can keep it consistent, but there is a suggestion to change the code Slice(0,x) to [..x], so I followed that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔
if (!uint.TryParse(array[..8], NumberStyles.AllowHexSpecifier, provider: null, out uint i1) | |
|| !uint.TryParse(array.Slice(8, 8), NumberStyles.AllowHexSpecifier, provider: null, out uint i2) | |
if (!uint.TryParse(array[..8], NumberStyles.AllowHexSpecifier, provider: null, out uint i1) | |
|| !uint.TryParse(array[8, 16], NumberStyles.AllowHexSpecifier, provider: null, out uint i2) |
(Although Slice(..., 8)
was a little clearer in this usecase.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer Slice here, even if it differs from the first.
In general, just accepting the codeanalyzer settings is the easiest.
In this case I did a small test, running slice/range .Length 100K times. Slice() took around 89 ms and [[..8] around 91 ms. This is about 6 ms improvement for 100K (and 30 ms changing all). I did not expect to see a difference...
Added SuppressMessage[] attribute.
With this, I plan to merge this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just nits
if (!ObjectId.TryParse(array[..ObjectId.Sha1CharCount], out ObjectId? objectId) || | ||
!ObjectId.TryParse(array.Slice(ObjectId.Sha1CharCount, ObjectId.Sha1CharCount), out ObjectId? treeId)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ .Slice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a follow up.
Some planned changes:
- Optimize ObjectId compare and ToString(). Could be done in an addition to this PR, but I feel it is better to scope creep separately, even if it is separate.
The performance improvement is not major, but it is used in many operations on the UI why sub ms improvements are welcome.
RevisionReader improvements (really done prior to this PR, but this PR was easier to create and had bigger effects. One or two PR as it seems:
- Reduce allocation reading revisions, (maybe including changes like this too, depends on the diff). Trying to find if email and names are coded in utf8 or not.
- Use git-log "log size" to simplify log reading. This adds about 6% run time to git log (1.9s to 2.0s) for the linux-100K revisions use case, but decreases copying and allows \0 in the pattern (the latter makes Utf8Parser.TryParse() faster than ParseUnixDateTime() by 35 ms so total is likely faster - hard to measure). May just be a draft if I am not sure. (.NET7 may improve performance too.)
if (array.Length != Sha1CharCount) | ||
{ | ||
objectId = default; | ||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (array.Length != Sha1CharCount) | |
{ | |
objectId = default; | |
return false; | |
} | |
if (array.Length != Sha1CharCount | |
|| ... |
The positive condtion would be clearer yet.
if (array.Length != Sha1CharCount) | |
{ | |
objectId = default; | |
return false; | |
} | |
if (array.Length == Sha1CharCount | |
&& ... | |
{ | |
objectId = new ObjectId(i1, i2, i3, i4, i5); | |
return true; | |
} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to have the error exit in the branch, normal action continue.
Two separate error checks.
|
||
public static bool TryParseAsciiHexReadOnlySpan(in ReadOnlySpan<byte> array, [NotNullWhen(returnValue: true)] out ObjectId? objectId) | ||
[SuppressMessage("Style", "IDE0057:Use range operator", Justification = "Performance")] | ||
public static bool TryParse(in ReadOnlySpan<byte> array, [NotNullWhen(returnValue: true)] out ObjectId? objectId) | ||
{ | ||
if (array.Length != Sha1CharCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above
Proposed changes
Use optimized Utf8Parser(Span<>) methods to parse Git commit hashes.
Remove unused methods.
Benchmark for
TryParse(in ReadOnlySpan<byte> array)
that run in average 3.1 times for each commit (for linux repo).To parse 100000 commits (default in GE) was before 290 ms, now it is 110 ms
The total load time therefore reduced with 550 ms. For GE this is about 160 ms.
There is a similar performance improvement for other methods but they are not performance critical.
I may add that reducing checks and additional method calls is critical in these scenarios.
Using a loop instead of five separate calls with fixed parameters was 160 ms.
Using multiple wrapper calls instead of calling the Span methods directly could add similar delay.
As the grid is displayed before all data is parsed, the visible difference is smaller.
Test methodology
Tests are updated
Note that some test cases were removed when unused methods were removed.
Merge strategy
I agree that the maintainer squash merge this PR (if the commit message is clear).
✒️ I contribute this code under The Developer Certificate of Origin.