-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Formalise type for object IDs #4556
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4556 +/- ##
=========================================
+ Coverage 30.67% 30.9% +0.22%
=========================================
Files 520 522 +2
Lines 42026 42212 +186
Branches 5908 5920 +12
=========================================
+ Hits 12890 13044 +154
- Misses 28612 28639 +27
- Partials 524 529 +5
Continue to review full report at Codecov.
|
@@ -147,7 +147,7 @@ public CommitData CreateFromFormatedData(string data) | |||
|
|||
var lines = data.Split('\n'); | |||
|
|||
var guid = lines[0]; | |||
var guid = ObjectId.Parse(lines[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where needed, we call Parse
to convert from string
to ObjectId
.
GitCommands/Git/GitModule.cs
Outdated
@@ -3082,7 +3086,7 @@ public SubmoduleStatus CheckSubmoduleStatus(string commit, string oldCommit, Com | |||
if (commit == null || commit == oldCommit) | |||
return SubmoduleStatus.Unknown; | |||
|
|||
string baseCommit = GetMergeBase(commit, oldCommit); | |||
string baseCommit = GetMergeBase(commit, oldCommit).ToString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call ToString
to convert from ObjectId
to string
.
GitCommands/Git/GitModule.cs
Outdated
{ | ||
return RunGitCmd("merge-base " + a + " " + b).TrimEnd(); | ||
return ObjectId.Parse(RunGitCmd("merge-base " + a + " " + b), offset: 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parse
overloads that take an offset allow parsing from within an existing string without having to allocate a substring. Hence we don't need to trim.
GitCommands/Git/GitModule.cs
Outdated
@@ -3141,7 +3145,7 @@ public string FormatBranchName([NotNull] string branchName) | |||
throw new ArgumentNullException(nameof(branchName)); | |||
|
|||
string fullBranchName = GitCommandHelpers.GetFullBranchName(branchName); | |||
if (String.IsNullOrEmpty(RevParse(fullBranchName))) | |||
if (RevParse(fullBranchName) == null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RevParse
returns null
if no hash exists. The 'nil' value for ObjectId
is null
. I originally made ObjectId
a struct, but the utility of null
outweighed the performance benefit.
GitCommands/Git/GitRevision.cs
Outdated
@@ -25,14 +25,16 @@ public sealed class GitRevision : IGitItem, INotifyPropertyChanged | |||
public string[] ParentGuids; | |||
private BuildInfo _buildStatus; | |||
|
|||
public GitRevision(string guid) | |||
public GitRevision([CanBeNull] ObjectId objectId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added annotation indicating null is allowed.
? "merge base" | ||
: GitRevision.ToShortSha(mergeBaseGuid); | ||
: mergeBaseGuid.ToShortString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use the instance ToShortString
instead of the static GitRevision.ToShortSha
function. The output is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I've been reworking GitRevision
for quite some time now to remove all behaviours from it.
My refactor is not finished, my work and incoming PRs here blocked the progress.
The end GitRevision
will be just a POCO
@@ -60,6 +60,9 @@ | |||
<DocumentationFile>bin\Release\GitUIPluginInterfaces.xml</DocumentationFile> | |||
</PropertyGroup> | |||
<ItemGroup> | |||
<Reference Include="JetBrains.Annotations, Version=11.1.0.0, Culture=neutral, PublicKeyToken=1010a0d8d6380325, processorArchitecture=MSIL"> | |||
<HintPath>..\..\packages\JetBrains.Annotations.11.1.0\lib\net20\JetBrains.Annotations.dll</HintPath> | |||
</Reference> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a reference to JetBrains.Annotations
. This assembly is only used at design time and the compiler won't copy it to the output folder. The attributes themselves are conditional.
In future I'll make another PR that removes annotations from the project's source code and uses this package throughout.
@@ -73,7 +73,7 @@ public string Render(CommitData commitData, bool showRevisionsAsLinks) | |||
throw new ArgumentNullException(nameof(commitData)); | |||
} | |||
|
|||
bool isArtificial = commitData.Guid.IsArtificial(); | |||
bool isArtificial = commitData.Guid.IsArtificial; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsArtificial
becomes an instance member.
@@ -33,7 +33,7 @@ public void Setup() | |||
[TestCase(CommitStatus.NoSignature, "N")] | |||
public async Task Validate_GetRevisionCommitSignatureStatusAsync(CommitStatus expected, string gitCmdReturn) | |||
{ | |||
var guid = Guid.NewGuid().ToString("N"); | |||
var guid = ObjectId.Random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a Random
factory method for use in unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Random
is used only in unit tests then it shouldn't be added to the production code.
Please move this behaviour into the CommonTestUtils
test project
The revision is occasionally uses other revision parameters like HEAD. The test cases using that was now changed. I wish it had been possible to use GitRevision more consistently than mixing GitRevisions and (currently) strings, but that would make the GitRevision object a linked list. FileStatusList uses "Combined diff" (localised) string to handle the combined diff. I need to change storing a string to a GitRevision for that. With this PR the CombinedDiff must be a full string I assume. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea in general.
Some xml-doc would be nice to have to help in understanding the proposed API
@@ -33,7 +33,7 @@ public void Setup() | |||
[TestCase(CommitStatus.NoSignature, "N")] | |||
public async Task Validate_GetRevisionCommitSignatureStatusAsync(CommitStatus expected, string gitCmdReturn) | |||
{ | |||
var guid = Guid.NewGuid().ToString("N"); | |||
var guid = ObjectId.Random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Random
is used only in unit tests then it shouldn't be added to the production code.
Please move this behaviour into the CommonTestUtils
test project
/// <remarks> | ||
/// Instances are immutable and are guaranteed to contain valid, 160-bit (20-byte) SHA1 hashes. | ||
/// </remarks> | ||
public sealed class ObjectId : IEquatable<ObjectId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please separate state and behaviours
ObjectId
must be a POCO.ToString
,GetHashCode
etc are accepted since they are part ofObject
type definitions.ToShortString
is ok too as it is complimentary toToString
- All behavioural aspects, e.g. parsing must be in SOLID classes. Since all of them related to parsing it is probably natural to move them into
ObjectIdParser
type (or something along these lines).
IsValid
bothers me as well, but I'm having troubles articulating where/how it should be moved.
I couldn't see - does it used anywhere besides tests?
And whilst you there, could you please group all fields, properties and methods together within respective groups. It makes the code much easier to review and reason about.
Thank you
} | ||
|
||
[Test] public void UnstagedId_is_artificial() => Assert.IsTrue(ObjectId.UnstagedId.IsArtificial); | ||
[Test] public void IndexId_is_artificial() => Assert.IsTrue(ObjectId.IndexId.IsArtificial); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please turn these into full-body methods to be consistent with the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
|
||
public static bool IsValid(string hex) | ||
{ | ||
if (hex.Length != Sha1CharCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Git can take variable length SHAs.
Why are forcing 40 chars here?
Same applies to TryParse
Some responses in no particular order:
I'll definitely add some if it looks like this approach would work and a PR would be merged. Some of what I'm about to write will potentially end up in the docs.
Could do. I don't see it as a biggie, but I will move it out if you prefer. I like having simple, uncontroversial factory methods on the type itself. They're more discoverable.
I get where you're coming from, but this is in contradiction to the rest of the .NET framework. Why go against a tried and tested pattern Static factories on a type are also resolved during type-targeted autocomplete by some tooling. For example, with ReSharper you can use smart-complete on I think this is especially true because parsing is the opposite of formatting ( Existing types in GE that have static
In my earlier experiments with using this type I applied it in many places and had it replace methods from
It'd be helpful if you explained what you wanted, and ideally if we could get this into the StyleCop.Analyzers stuff. For example, my style is to order as follows:
I guess you have a different preference, but it's not clear what you mean and the existing code is inconsistent throughout so I can't use it as a reference to mimic.
In an early version of this type I supported variable length hashes, but after a while decided they were a bad idea.
Of course this means that we can only use |
@gerhardol sorry, I missed your comment.
I looked for invocations of
The linked list observation is an interesting one. I guess you end up with a DAG. Pulling more metadata about a revision is going to necessitate that structure, which likely wouldn't be fully loaded at any point in time. By going for The problem with identity here is that while a git object's true identity is its hash, it may also have various aliases. From an API perspective, sometimes you may only provide the hash, other times other forms are permitted. Further down the line it'd be nice if APIs that accept more than one type of ref enforced that via the API's signature. This is quite straightforward in TypeScript via union types. There's some talk about adding discriminated unions to C#, but that wouldn't be until after v8. Such utilities exist as libraries though (I have a You can model abstraction between types of references via polymorphism (i.e. Anyway, that's just an interesting idea for the future. Maybe. For now, I think there's some value in modelling
I'm sorry but could you provide a little more context here? What do you mean by a localised combined diff? |
Sorry, I was very tired when writing this. I tried to point out that there are situations where we go between GitRevision and string. In most of the occurrences manipulation and comparison is done for the exported strings. HEAD is used in several places but in master only where the GitRevision GUID is exported to a string. You changed the existing GitRevision("HEAD") in the tests it seems. However, I have added some situations in a PR. For CombinedDiff, that is a special artificial commit that in master is not converted to a GitRevision, there it is only a guid. Changed in my PR too. For another change still in my head, I want to add stashes to the revision tree. 503add8#diff-16d3bf20df9b2330e5146f334a6e7f04 I assume it will be possible to solve this by handling CombinedDiff as a formal artificial revision, the usage will be global though. For creating revisions, maybe RevisionGrid.GetRevision() should be used. I do not want to block the proposal, just raise the question that some manipulations gets a lot more complicated. It may not be a bad thing to force typing, I just want to raise the question. |
Thank you @drewnoakes for a detailed response. Unfortunately my current workload doesn't allow me a significant time to sit down and respond in full, and my commutes aren't that long.
Xml-doc enable the intellisense, which is foremost used by developers. End-users don't want to know these details. |
I follow the principle - test code must never go to production. |
I am also a big proponent of SOLID principles - small stateless classes with clear purpose that can be tested and easily reason about - good. Everything else - not so much. .Net Fx has been designed over a decade ago when OOP was all the rage. Since then many proponents of OOP come to realise that OOP makes things harder, often leading to tighly coupled and incoherent code. I am finding all the same problems in the GE's codebase. One can argue that GE is not a framework and behaviors substitution is no a concern. In fact, we had these discussion some time last year. I don't agree with this statement as we are constantly faced with alternate behaviors - .NET Framework vs Mono, local settings vs distributed settings, different UI presentations and rendering requirements etc... |
I pretty much follow the MS internal guidelines: https://blogs.msdn.microsoft.com/brada/2005/01/26/internal-coding-guidelines/ At work I have a far greater powers to enforce file organisations and styles guidelines, here I am more relaxed, yet some basic guidelines I'd like to see followed. |
|
||
[NotNull] private readonly byte[] _bytes; | ||
|
||
private ObjectId([NotNull] byte[] bytes) => _bytes = bytes ?? throw new ArgumentNullException(nameof(bytes)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tinkering with your proposal (nothing concrete yet, once there is something to share I will push it in) and having few thoughts.
I'd like to get your opinion.
Do you think we could optimise it by performing ToString
implementation in the constructor and caching the result in a field, then return the value of the field in ToString
method?
Does it make sense to calculate the hashcode in constructor as well?
private readonly byte[] _bytes;
private readonly string _sha1;
private readonly int _hashCode;
public ObjectId(IEnumerable<byte> bytes)
{
_bytes = bytes?.ToArray() ?? throw new ArgumentNullException(nameof(bytes));
// If performance here becomes a problem, review https://stackoverflow.com/q/311165/24874
var hex = new StringBuilder(ObjectIdParser.Sha1CharCount);
foreach (var b in _bytes)
hex.AppendFormat("{0:x2}", b);
_sha1 = hex.ToString();
// calculate hashcode
....
}
Furthermore, having extracted parsing logic into ObjectIdParser
we can probably cache all known ObjectIds and reuse them, if necessary. This requires further exploration, just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe, brain dump inbound...
Only way to decide on optimisations is to measure. If you want to cache the string then there's no point in storing the byte[]
too. The string will be roughly four times the size in memory (40 characters, UTF-16), so you'd need context to make tradeoffs between memory/CPU. If you wanted to run such measurements I would look at optimising the existing ToString
and GetHashCode
methods first, as they're not the fastest they could be right now. They're already quite snappy, but I wouldn't be surprised if they could be made many times faster still.
I considered caching all object IDs. I've tried/seen attempts at this kind of pooling before and apart from some cases where there are many long-lived instances from a small distinct set of values, there's little benefit and it often tends to come out slower. I would imagine many of the ObjectId
values would never get promoted from gen0, being collected very soon after construction. Keeping them small and light can help the GC do its job by reducing heap fragmentation and cache pressure. I don't think there's much perf to gain here and there's some potential problems, as well as slight complexity in managing the caches that you avoid if you don't have them. But again, it should be measured. The Linux kernel has ~750k commits. With the 20-byte array, plus 16 bytes overhead for the wrapper class and 12 bytes overhead for the array (minimums on x64), you're looking at ~63MB before you've even begun to store them in something you can look them up from. I just don't think it's going to be a saving. Plus the lookup collection will be in one part of memory/cache, and the wrapper/array in another, so you end up blowing away your CPU caches which tend to be helpful for the work you're actually trying to do.
Any any benefit you'd get on that scale would be offset by representing the byte array as IEnumerable<byte>
then calling ToArray
on it again. You want a fast path from parse to object readiness, with minimal allocations. Arrays are about as lean as you'll get without storing the 20 bytes of the SHA-1 in two ulong
s and an uint
. The representation is an internal consideration, not a place for abstraction, which is why I made the constructor private.
That's a lot of speculation on my part, especially given I said you one has to measure these things, but it's the kind stuff that would go through my head and lead me towards probably not even doing such comparative measurements until I knew there was a problem they might actually solve.
There are many opportunities to improve the performance and responsivity of GE that would yield orders-of-magnitude better results than such micro-optimisations. For example, have a look how much string concatenation goes on all over the place. These push the GC much harder than needed. And the async stuff I looked at in #4501 is an early (and largely incomplete) attempt at reducing the complete profligacy of creating new threads left and right to deal with blocking code. Running GE in a timeline profiler shows results that look completely different from most other applications because of this. Tonnes of threads doing almost no work.
I'll respond soon to the other comments here that I neglected. Apologies for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will decrease the size of GitRevision, but how much part is really the sha?
The Subject, Author etc should use more space already now.
Every byte could count and there may be other reasons for this PR.
For performance I find the unnecessary query of submodules, stash etc to be the most annoying. I have planned to do something about that but have been stuck on in my opinion meaningless test cases for other issues for some time.
No problem. If I were more familiar with the project it'd have made sense I'm sure. Your explanation was helpful. I have some questions/observations. If
Would such a method be able to What is the role of
The functionality/requirements should direct the type(s) and their usage, not the other way around. I think the exercise of introducing such types to replace One challenge for a newcomer like me is terminology. In my research I noted many names for string variables that could hold an ObjectId, including |
Complete agree. I would find more documentation on key types in GE helpful as a newcomer. I added a lot of docs to
Agree the names can be improved. The built-in type
Good point. But one thing the team did well in my opinion is make a core library that offers lots of useful functionality built in, with clear and consistent idioms. Before starting with .NET I did a fair bit of Java, and occasionally still do. Java's idioms favour much more abstraction and flexibility, but frankly I find I rarely need it and the rest of the time it's a pain. Classes such as
To be fair, you can do this in any language/paradigm. I worked on a monstrous Lisp codebase for a while. Monoliths form more because of mismanagement of complexity than because of any coding style.
I don't disagree there's a lot that can be done to tidy the codebase up. I don't believe that a single programming doctrine can solve everything. Problems need to be identified, analysed, discussed and addressed pragmatically.
The fact that it's not a framework does allow for a more relaxed approach to API changes, which is nice. There can be more fluidity, and it's possible to remove APIs that are no longer used without upsetting users. Not sure what you mean by behaviour substitution in this context. Could you elaborate please?
It is impressive just how much functionality is packaged into this one exe. It's maturity and breadth of scope is significant. I can see where rigidity may hamper evolution. This is often the case with code, unless you design for every eventuality. Part of the issue I see is the coupling of view with model. But, that's WinForms's default mode unfortunately. Classes with thousands of lines of (non-generated) code.
That looks pretty close to what I used in 2005, and not too far off what I use now. I think the biggest win on member ordering across the codebase now would be to get all fields (and by extension, auto-properties) together in one place, which is probably at the top of the file. Right now fields are often declared close to where they were first used, it seems. Challenging my ideas about programming is one of the things I love about working on open source. So thanks for taking the time to share your history and perspective. GE is a very interesting codebase to work through as it spans almost a decade and has had so many developers work on it that there are many different ideas throughout. Reading through the code there's a sense of forensic analysis via which that story is told. |
Sure. Such revisions are only used in modal forms like FormCommit and FormStash.
Either the real guid must be used (like resolving HEAD or the stash "commit-ish sha"). Somehow doable.
Very likely...
If the item a Blob (file), Tree (folder) or a Commit (submodule).
Next cleanup? |
Went through and rebased this.
I removed the commit that converted
Added XML docs too. Added some more unit tests and fixed a bug that they turned up :) |
Thank you Drew. Much appreciate your long and thoughtful response, I've had a ball of time reading it.
I think this is common pain for many. I'm pretty I caused some pain to @gerhardol and other in my reviews seeking clarity.
In a nutshell - same interface different implementations.
All of revision grid rendering can me tremendously simplified by creating specific renderers for each available layout: gitextensions/GitUI/UserControls/RevisionGrid.cs Lines 1566 to 1988 in 7e3516b
Commit information tab has two different layouts as well and these are as well good candidates for alternate implementations. Hope this makes sense.
Yes, this makes it close to impossible to use a different UI framework. Hence I've started moving certain parts of logic into controller classes, which should dumb-down UI controls.
Indeed, some parts of the codebase are fascinating and some parts feel like crawling through an abandoned mine about to cave it.... 😆 |
e08ae10
to
73963ef
Compare
72244cb
to
5553691
Compare
2cc9017
to
73fae02
Compare
I had some USB device plugged in which took D: and ran very slowly, causing tests to stall here. C: is safer.
In this commit we start to see calls to ObjectId.Parse being replaced with the pre-parsed value. This pattern will continue as more APIs are migrated to ObjectId.
Let's revisit this once all major transformations are complete.
…On Wed, 11 Apr 2018 3:34 am Drew Noakes ***@***.***> wrote:
Closed #4556 <#4556>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4556 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEMyXuTPDgTHrXyy9PZ2GZMD_mEl9s16ks5tnO0fgaJpZM4Sao6Z>
.
|
I need this for another PR so have folded it into that. |
I think a better use libgit2sharp and their class |
That class uses a tonne more memory than just a 40-character string. We can do better. I'll create a PR once a few more things are sorted. |
I don't consider it wise to bring another dependency for something we can
implement ourselves
…On Thu, 12 Apr 2018 7:25 pm Arkady Shapkin ***@***.***> wrote:
That class uses a tonne more memory than just a 40-character string.
Only 1.5 times
<https://github.com/libgit2/libgit2sharp/blob/c72b740259883fc0e961cd9ff2a6f5d2a35a87e3/LibGit2Sharp/ObjectId.cs#L13-L15>
more :)
You can use GitOid
<https://github.com/libgit2/libgit2sharp/blob/3f9b415fa1edfc31ce1ec2b4b3d18441c34adfff/LibGit2Sharp/Core/GitOid.cs>
it uses 2 times less memory than string
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4556 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEMyXgAW8kD9Oxt5c1cIcbWjKSKxs1Hpks5tnx1_gaJpZM4Sao6Z>
.
|
I think we just need libgit2sharp for revision graph |
I will submit a PR soon that reworks the revision graph data loading to use much less memory and CPU. |
There is a fixed cost associated with each additional object, so the number is slightly higher. The child objects also hurt in terms of indirection and locality of reference. I am working with a solution that's lighter and faster. I looked at libgit2sharp before and, at least for this type, believe a custom solution will be better. I'll include a lot more detail in the coming PR and we can discuss there. |
Git object IDs are (currently) SHA1 hashes.
In GE, these are modelled as strings.
Strings are convenient, but they have some downsides:
HEAD
, a branch name, or something more exoticnull
, empty string, whitespace)IsArtificial()
extension method)byte[]
)The git codebase underwent a transition from
unsigned char[20]
to a newobject_id
type. This PR is a mirror of that initiative.This PR proposes a new class,
ObjectId
that holds an immutable SHA-1 hash value. It explores using this type in a few APIs such as:CommitData.Guid
GitModule.RevParse
andGitModele.GetMergeBase
return valuesThis type can be introduced incrementally. One of the challenges of its use is that certain
string
values accept non-SHA-1 values such as named refs.Furthermore, now that SHA-1 has been proven insecure, an initiative is underway to introduce new hash functions to git:
ObjectId
can be extended in future to support multiple hash functions.