Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ObjectId parsing #11208

Merged
merged 3 commits into from Sep 21, 2023

Conversation

gerhardol
Copy link
Member

@gerhardol gerhardol commented Sep 12, 2023

Proposed changes

Use optimized Utf8Parser(Span<>) methods to parse Git commit hashes.

Remove unused methods.

Benchmark for TryParse(in ReadOnlySpan<byte> array) that run in average 3.1 times for each commit (for linux repo).
To parse 100000 commits (default in GE) was before 290 ms, now it is 110 ms
The total load time therefore reduced with 550 ms. For GE this is about 160 ms.

There is a similar performance improvement for other methods but they are not performance critical.

I may add that reducing checks and additional method calls is critical in these scenarios.
Using a loop instead of five separate calls with fixed parameters was 160 ms.
Using multiple wrapper calls instead of calling the Span methods directly could add similar delay.

As the grid is displayed before all data is parsed, the visible difference is smaller.

Test methodology

Tests are updated
Note that some test cases were removed when unused methods were removed.

Merge strategy

I agree that the maintainer squash merge this PR (if the commit message is clear).


✒️ I contribute this code under The Developer Certificate of Origin.

Use optimized Utf8Parser(Span<>) methods to parse Git commit hashes.

Remove unused methods.
@ghost ghost assigned gerhardol Sep 12, 2023
@vbjay
Copy link
Contributor

vbjay commented Sep 13, 2023

I see it is utf8 specific. Are we forcing utf8 by using i18n.logOutputEncoding and other related i18n.* config. https://git-scm.com/docs/git-config#Documentation/git-config.txt-i18nlogOutputEncoding

Git default and/or dedacto may be utf8 but are we sure every single one is set to utf8? One of the many reasons I keep harping on system git confif not being read. We need to at the least read config values from system and then global and so on.

@gerhardol
Copy link
Member Author

gerhardol commented Sep 13, 2023

I see it is utf8 specific. Are we forcing utf8

This is a potential issue. This PR has the same assumption as the current code, that 0-9,a-f is encoded using ASCII.
Using UTF8parser slightly improves the compatibility...

Similar apply to the timestamp with unix time stamp, also assumed to be ASCII 0-9
(This PR is really a spin-off from a change in RevisionReader, but other changes are needed there for UTF8parser to be faster.

For author names, email, subject etc the data is decoded with the encoding reported by Git, config is read.
(I am adding a comment about the potential issue).

Edit: hash, path etc are in utf-8 https://git-scm.com/docs/git-log#_discussion
utf-8 should probably be forced for git-log. There are some optimized encoders in .NET8 I believe.
Log message can be forced, but the encoding is handled there so force is maybe not needed https://git-scm.com/docs/git-log#Documentation/git-log.txt---encodingltencodinggt

For the hash, timestamps I propose to keep it as is though also if utf-8 is not forced, there will be a performance impact if all is decoded.
(uint.TryParse(Span) is slower than UTF8Parser.TryParse(Span<byte for some reason too).

Copy link
Member

@mstv mstv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Seems to work well.
I have just taken a quick look at the new code (for now).

GitCommands/Git/GitModule.cs Outdated Show resolved Hide resolved
Comment on lines 164 to 165
if (!uint.TryParse(array[..8], NumberStyles.AllowHexSpecifier, provider: null, out uint i1)
|| !uint.TryParse(array.Slice(8, 8), NumberStyles.AllowHexSpecifier, provider: null, out uint i2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[..] vs. .Slice(,)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can keep it consistent, but there is a suggestion to change the code Slice(0,x) to [..x], so I followed that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!uint.TryParse(array[..8], NumberStyles.AllowHexSpecifier, provider: null, out uint i1)
|| !uint.TryParse(array.Slice(8, 8), NumberStyles.AllowHexSpecifier, provider: null, out uint i2)
if (!uint.TryParse(array[..8], NumberStyles.AllowHexSpecifier, provider: null, out uint i1)
|| !uint.TryParse(array[8, 16], NumberStyles.AllowHexSpecifier, provider: null, out uint i2)

(Although Slice(..., 8) was a little clearer in this usecase.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer Slice here, even if it differs from the first.

In general, just accepting the codeanalyzer settings is the easiest.
In this case I did a small test, running slice/range .Length 100K times. Slice() took around 89 ms and [[..8] around 91 ms. This is about 6 ms improvement for 100K (and 30 ms changing all). I did not expect to see a difference...
Added SuppressMessage[] attribute.

With this, I plan to merge this PR

Copy link
Member

@mstv mstv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just nits

Comment on lines +221 to +222
if (!ObjectId.TryParse(array[..ObjectId.Sha1CharCount], out ObjectId? objectId) ||
!ObjectId.TryParse(array.Slice(ObjectId.Sha1CharCount, ObjectId.Sha1CharCount), out ObjectId? treeId))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.Slice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a follow up.

Some planned changes:

  • Optimize ObjectId compare and ToString(). Could be done in an addition to this PR, but I feel it is better to scope creep separately, even if it is separate.
    The performance improvement is not major, but it is used in many operations on the UI why sub ms improvements are welcome.

RevisionReader improvements (really done prior to this PR, but this PR was easier to create and had bigger effects. One or two PR as it seems:

  • Reduce allocation reading revisions, (maybe including changes like this too, depends on the diff). Trying to find if email and names are coded in utf8 or not.
  • Use git-log "log size" to simplify log reading. This adds about 6% run time to git log (1.9s to 2.0s) for the linux-100K revisions use case, but decreases copying and allows \0 in the pattern (the latter makes Utf8Parser.TryParse() faster than ParseUnixDateTime() by 35 ms so total is likely faster - hard to measure). May just be a draft if I am not sure. (.NET7 may improve performance too.)

Comment on lines +160 to 165
if (array.Length != Sha1CharCount)
{
objectId = default;
return false;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (array.Length != Sha1CharCount)
{
objectId = default;
return false;
}
if (array.Length != Sha1CharCount
|| ...

The positive condtion would be clearer yet.

Suggested change
if (array.Length != Sha1CharCount)
{
objectId = default;
return false;
}
if (array.Length == Sha1CharCount
&& ...
{
objectId = new ObjectId(i1, i2, i3, i4, i5);
return true;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to have the error exit in the branch, normal action continue.
Two separate error checks.


public static bool TryParseAsciiHexReadOnlySpan(in ReadOnlySpan<byte> array, [NotNullWhen(returnValue: true)] out ObjectId? objectId)
[SuppressMessage("Style", "IDE0057:Use range operator", Justification = "Performance")]
public static bool TryParse(in ReadOnlySpan<byte> array, [NotNullWhen(returnValue: true)] out ObjectId? objectId)
{
if (array.Length != Sha1CharCount)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

@gerhardol gerhardol merged commit 4acc29c into gitextensions:master Sep 21, 2023
3 of 4 checks passed
@ghost ghost added this to the vNext milestone Sep 21, 2023
@gerhardol gerhardol deleted the feature/optimize-objectid branch September 21, 2023 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants