Optimize ObjectId parsing #11208

gerhardol · 2023-09-12T22:05:20Z

Proposed changes

Use optimized Utf8Parser(Span<>) methods to parse Git commit hashes.

Remove unused methods.

Benchmark for TryParse(in ReadOnlySpan<byte> array), which runs on average 3.1 times per commit (for the linux repo).
Parsing 100000 commits (the default in GE) took 290 ms before; it now takes 110 ms.
The total load time is therefore reduced by 550 ms. For GE this is about 160 ms.

There is a similar performance improvement for other methods but they are not performance critical.

I may add that reducing checks and additional method calls is critical in these scenarios.
Using a loop instead of five separate calls with fixed parameters took 160 ms.
Using multiple wrapper calls instead of calling the Span methods directly could add a similar delay.

As the grid is displayed before all data is parsed, the visible difference is smaller.
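
For readers unfamiliar with the approach, here is a minimal sketch of parsing a 40-character hex hash from ASCII/UTF-8 bytes with `System.Buffers.Text.Utf8Parser`, using five separate calls with fixed offsets as described above. The class and method names (`HashParseSketch`, `TryParseSha1`) and the tuple result are hypothetical and only for illustration; this is not the code in the PR.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

public static class HashParseSketch
{
    private const int Sha1CharCount = 40;

    // Hypothetical helper: parses 40 ASCII hex bytes into five 32-bit parts.
    public static bool TryParseSha1(ReadOnlySpan<byte> hex, out (uint, uint, uint, uint, uint) parts)
    {
        parts = default;
        if (hex.Length != Sha1CharCount)
        {
            return false;
        }

        // Five calls with fixed offsets instead of a loop.
        // The 'x' format parses hex digits; bytesConsumed must be 8,
        // otherwise the slice contained a non-hex byte.
        if (!Utf8Parser.TryParse(hex.Slice(0, 8), out uint i1, out int c1, 'x') || c1 != 8
            || !Utf8Parser.TryParse(hex.Slice(8, 8), out uint i2, out int c2, 'x') || c2 != 8
            || !Utf8Parser.TryParse(hex.Slice(16, 8), out uint i3, out int c3, 'x') || c3 != 8
            || !Utf8Parser.TryParse(hex.Slice(24, 8), out uint i4, out int c4, 'x') || c4 != 8
            || !Utf8Parser.TryParse(hex.Slice(32, 8), out uint i5, out int c5, 'x') || c5 != 8)
        {
            return false;
        }

        parts = (i1, i2, i3, i4, i5);
        return true;
    }
}
```

A caller would pass the raw bytes from the Git output, for example `HashParseSketch.TryParseSha1(Encoding.ASCII.GetBytes(hashText), out var parts)`, without first decoding them into a string.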

Test methodology

Tests are updated
Note that some test cases were removed when unused methods were removed.

Merge strategy

I agree that the maintainer squash merge this PR (if the commit message is clear).

✒️ I contribute this code under The Developer Certificate of Origin.


vbjay · 2023-09-13T11:23:54Z

I see it is utf8 specific. Are we forcing utf8 by using i18n.logOutputEncoding and other related i18n.* config? https://git-scm.com/docs/git-config#Documentation/git-config.txt-i18nlogOutputEncoding

Git default and/or de facto may be utf8 but are we sure every single one is set to utf8? One of the many reasons I keep harping on system git config not being read. We need to at least read config values from system and then global and so on.

gerhardol · 2023-09-13T12:06:42Z

I see it is utf8 specific. Are we forcing utf8

This is a potential issue. This PR makes the same assumption as the current code: that 0-9,a-f is encoded using ASCII.
Using Utf8Parser slightly improves the compatibility...

The same applies to the Unix timestamp, which is also assumed to be ASCII 0-9.
(This PR is really a spin-off from a change in RevisionReader, but other changes are needed there for Utf8Parser to be faster.)
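
To illustrate the timestamp case, here is a small sketch that parses ASCII decimal digits straight from bytes with `Utf8Parser.TryParse` and converts the result with `DateTimeOffset.FromUnixTimeSeconds`. The class and method names are hypothetical, and the sketch assumes the whole span is the timestamp; it is not the RevisionReader code.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

public static class TimestampParseSketch
{
    // Hypothetical helper: the span is assumed to hold only ASCII digits 0-9.
    public static bool TryParseUnixSeconds(ReadOnlySpan<byte> digits, out DateTimeOffset time)
    {
        time = default;

        // Require that every byte was consumed, so trailing garbage is rejected.
        if (!Utf8Parser.TryParse(digits, out long seconds, out int consumed)
            || consumed != digits.Length)
        {
            return false;
        }

        time = DateTimeOffset.FromUnixTimeSeconds(seconds);
        return true;
    }
}
```

As with the hash, the bytes are never decoded into a string, which is where the assumption about ASCII digits comes in.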

For author names, email, subject etc. the data is decoded with the encoding reported by Git; the config is read.
(I am adding a comment about the potential issue.)

Edit: hash, path etc are in utf-8 https://git-scm.com/docs/git-log#_discussion
utf-8 should probably be forced for git-log. There are some optimized encoders in .NET8 I believe.
Log message can be forced, but the encoding is handled there so force is maybe not needed https://git-scm.com/docs/git-log#Documentation/git-log.txt---encodingltencodinggt

For the hash and timestamps I propose to keep it as is, even if utf-8 is not forced; there will be a performance impact if everything is decoded.
(uint.TryParse(Span) is slower than Utf8Parser.TryParse(Span<byte>) for some reason too.)

mstv

👍 Seems to work well.
I have just taken a quick look at the new code (for now).

GitCommands/Git/GitModule.cs

mstv · 2023-09-13T21:20:23Z

Plugins/GitUIPluginInterfaces/ObjectId.cs

+            if (!uint.TryParse(array[..8], NumberStyles.AllowHexSpecifier, provider: null, out uint i1)
+                || !uint.TryParse(array.Slice(8, 8), NumberStyles.AllowHexSpecifier, provider: null, out uint i2)


[..] vs. .Slice(,)?

I can keep it consistent, but there is a suggestion to change the code Slice(0,x) to [..x], so I followed that

❔

Suggested change

-            if (!uint.TryParse(array[..8], NumberStyles.AllowHexSpecifier, provider: null, out uint i1)
-                || !uint.TryParse(array.Slice(8, 8), NumberStyles.AllowHexSpecifier, provider: null, out uint i2)
+            if (!uint.TryParse(array[..8], NumberStyles.AllowHexSpecifier, provider: null, out uint i1)
+                || !uint.TryParse(array[8..16], NumberStyles.AllowHexSpecifier, provider: null, out uint i2)

(Although Slice(..., 8) was a little clearer in this use case.)

I prefer Slice here, even if it differs from the first.

In general, just accepting the code analyzer settings is the easiest.
In this case I did a small test, running slice/range .Length 100K times. Slice() took around 89 ms and [..8] around 91 ms. This is about a 6 ms improvement for 100K (and 30 ms changing all). I did not expect to see a difference...
Added a [SuppressMessage] attribute.
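
For reference, the two spellings discussed here select the same bytes: `Slice(start, length)` takes a length, while the range operator takes an exclusive end index. A minimal sketch with hypothetical names follows; it only shows the equivalence and makes no claim about timing.

```csharp
using System;

public static class SliceVersusRangeSketch
{
    // Second group of 8 bytes, written with Slice(start, length).
    public static ReadOnlySpan<byte> SecondPartWithSlice(ReadOnlySpan<byte> array)
        => array.Slice(8, 8);

    // The same bytes written with the range operator (start..end, end exclusive).
    public static ReadOnlySpan<byte> SecondPartWithRange(ReadOnlySpan<byte> array)
        => array[8..16];
}
```

Both methods return a view over the same memory, so the choice between them is about readability and, per the measurement above, possibly a small difference in generated code.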

With this, I plan to merge this PR

mstv

just nits

mstv · 2023-09-15T21:04:30Z

GitCommands/RevisionReader.cs

+            if (!ObjectId.TryParse(array[..ObjectId.Sha1CharCount], out ObjectId? objectId) ||
+                !ObjectId.TryParse(array.Slice(ObjectId.Sha1CharCount, ObjectId.Sha1CharCount), out ObjectId? treeId))


❔ .Slice

In a follow up.

Some planned changes:

Optimize ObjectId compare and ToString(). Could be done as an addition to this PR, but I feel it is better to handle the scope creep separately.
The performance improvement is not major, but it is used in many UI operations, so sub-ms improvements are welcome.

RevisionReader improvements (really done prior to this PR, but this PR was easier to create and had bigger effects). One or two PRs, it seems:

Reduce allocations when reading revisions (maybe including changes like this too, depending on the diff). Trying to find out whether email and names are encoded in utf8 or not.

Use git-log "log size" to simplify log reading. This adds about 6% run time to git log (1.9s to 2.0s) for the linux-100K revisions use case, but decreases copying and allows \0 in the pattern (the latter makes Utf8Parser.TryParse() faster than ParseUnixDateTime() by 35 ms so total is likely faster - hard to measure). May just be a draft if I am not sure. (.NET7 may improve performance too.)

mstv · 2023-09-15T21:32:20Z

Plugins/GitUIPluginInterfaces/ObjectId.cs

+            if (array.Length != Sha1CharCount)
            {
                objectId = default;
                return false;
            }



Suggested change

-            if (array.Length != Sha1CharCount)
-            {
-                objectId = default;
-                return false;
-            }
+            if (array.Length != Sha1CharCount
+                || ...

The positive condition would be clearer yet.

Suggested change

-            if (array.Length != Sha1CharCount)
-            {
-                objectId = default;
-                return false;
-            }
+            if (array.Length == Sha1CharCount
+                && ...
+            {
+                objectId = new ObjectId(i1, i2, i3, i4, i5);
+                return true;
+            }

I prefer to have the error exit in the branch and let the normal action continue.
These are two separate error checks.

mstv · 2023-09-15T21:32:43Z

Plugins/GitUIPluginInterfaces/ObjectId.cs

-
-        public static bool TryParseAsciiHexReadOnlySpan(in ReadOnlySpan<byte> array, [NotNullWhen(returnValue: true)] out ObjectId? objectId)
+        [SuppressMessage("Style", "IDE0057:Use range operator", Justification = "Performance")]
+        public static bool TryParse(in ReadOnlySpan<byte> array, [NotNullWhen(returnValue: true)] out ObjectId? objectId)
        {
            if (array.Length != Sha1CharCount)


Optimize ObjectId parsing

4f93a43

Use optimized Utf8Parser(Span<>) methods to parse Git commit hashes. Remove unused methods.

ghost assigned gerhardol Sep 12, 2023

RussKie approved these changes Sep 13, 2023


mstv reviewed Sep 13, 2023


gerhardol added 2 commits September 14, 2023 23:17

fixup! consistently use Slice

cb2934a

fixup! currentCommitId

9979917

mstv reviewed Sep 15, 2023


gerhardol merged commit 4acc29c into gitextensions:master Sep 21, 2023
3 of 4 checks passed

ghost added this to the vNext milestone Sep 21, 2023

gerhardol deleted the feature/optimize-objectid branch September 21, 2023 19:50
