Formalise type for object IDs #4556

Closed
wants to merge 5 commits into from

@drewnoakes drewnoakes commented Mar 2, 2018

Git object IDs are (currently) SHA1 hashes.

In GE, these are modelled as strings.

Strings are convenient, but they have some downsides:

  • There's no guarantee that a string is a valid hash and errors from invalid values likely occur long after the invalid value was introduced
  • They don't carry much semantic power, and when you see a string id in code it's not clear whether it might also be HEAD, a branch name, or something more exotic
  • There are multiple nil values (null, empty string, whitespace)
  • We're not able to add members to values without polluting all strings (i.e. the IsArtificial() extension method)
  • They take up more memory than needed (80 bytes vs 20 bytes for a byte[])

The git codebase underwent a transition from unsigned char[20] to a new object_id type. This PR is a mirror of that initiative.

This PR proposes a new class, ObjectId that holds an immutable SHA-1 hash value. It explores using this type in a few APIs such as:

  • CommitData.Guid
  • GitModule.RevParse and GitModule.GetMergeBase return values

This type can be introduced incrementally. One of the challenges of its use is that certain string values accept non-SHA-1 values such as named refs.

Furthermore, now that SHA-1 has been proven insecure, an initiative is underway to introduce new hash functions to git.

ObjectId can be extended in future to support multiple hash functions.
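
To make the proposal concrete, here is a minimal sketch of what such a type could look like. It is an illustration under stated assumptions, not the code in this PR: the member bodies, the `HexValue` helper and the two-argument `TryParse` shape are assumptions, and artificial IDs, offset-based parsing and `ToShortString` are left out.

```csharp
using System;

// Illustrative sketch only; member bodies are assumptions, not the PR's implementation.
public sealed class ObjectId : IEquatable<ObjectId>
{
    private const int Sha1ByteCount = 20;
    private const int Sha1CharCount = 40;

    private readonly byte[] _bytes;

    // Private so that every instance is known to hold exactly 20 validated bytes.
    private ObjectId(byte[] bytes) => _bytes = bytes;

    public static bool TryParse(string hex, out ObjectId objectId)
    {
        objectId = null;
        if (hex == null || hex.Length != Sha1CharCount)
            return false; // wrong length: rejects "HEAD", short hashes, empty strings

        var bytes = new byte[Sha1ByteCount];
        for (int i = 0; i < Sha1ByteCount; i++)
        {
            int hi = HexValue(hex[2 * i]);
            int lo = HexValue(hex[2 * i + 1]);
            if (hi < 0 || lo < 0)
                return false; // non-hex character
            bytes[i] = (byte)((hi << 4) | lo);
        }

        objectId = new ObjectId(bytes);
        return true;
    }

    public static ObjectId Parse(string hex) =>
        TryParse(hex, out var id) ? id : throw new FormatException("Not a 40-character SHA-1 hash.");

    private static int HexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public bool Equals(ObjectId other)
    {
        if (other is null) return false;
        for (int i = 0; i < Sha1ByteCount; i++)
            if (_bytes[i] != other._bytes[i]) return false;
        return true;
    }

    public override bool Equals(object obj) => Equals(obj as ObjectId);

    // The leading bytes of a hash are already well distributed.
    public override int GetHashCode() => BitConverter.ToInt32(_bytes, 0);

    // Always formats as 40 lowercase hex characters.
    public override string ToString()
    {
        const string digits = "0123456789abcdef";
        var chars = new char[Sha1CharCount];
        for (int i = 0; i < Sha1ByteCount; i++)
        {
            chars[2 * i] = digits[_bytes[i] >> 4];
            chars[2 * i + 1] = digits[_bytes[i] & 0xF];
        }
        return new string(chars);
    }
}
```

Because the only way to obtain an instance in this sketch is through `Parse` or `TryParse`, an invalid value fails at the point where it is introduced rather than long afterwards.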

codecov bot commented Mar 2, 2018

Codecov Report

Merging #4556 into master will increase coverage by 0.22%.
The diff coverage is 76.17%.

@@            Coverage Diff            @@
##           master   #4556      +/-   ##
=========================================
+ Coverage   30.67%   30.9%   +0.22%     
=========================================
  Files         520     522       +2     
  Lines       42026   42212     +186     
  Branches     5908    5920      +12     
=========================================
+ Hits        12890   13044     +154     
- Misses      28612   28639      +27     
- Partials      524     529       +5
Impacted Files Coverage Δ
GitUI/UserControls/CommitPickerSmallControl.cs 11.42% <0%> (ø) ⬆️
GitUI/HelperDialogs/FormChooseCommit.cs 13.04% <0%> (ø) ⬆️
GitUI/CommandsDialogs/FormFileHistory.cs 15.2% <0%> (ø) ⬆️
GitUI/CommandsDialogs/FormCommit.cs 9.33% <0%> (ø) ⬆️
...ls/RevisionGridClasses/RevisionGridMenuCommands.cs 91.15% <0%> (ø) ⬆️
GitUI/CommandsDialogs/FormBrowse.cs 5.03% <0%> (ø) ⬆️
GitCommands/RevisionGraph.cs 0% <0%> (ø) ⬆️
GitUI/UserControls/RevisionGrid.cs 8.79% <0%> (ø) ⬆️
...tUI/CommandsDialogs/BrowseDialog/FormGoToCommit.cs 7.61% <0%> (+0.21%) ⬆️
GitUI/CommandsDialogs/FormCheckoutBranch.cs 19.37% <0%> (ø) ⬆️
... and 21 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9620ed1...331ebe3.

@@ -147,7 +147,7 @@ public CommitData CreateFromFormatedData(string data)

var lines = data.Split('\n');

- var guid = lines[0];
+ var guid = ObjectId.Parse(lines[0]);
Member Author

Where needed, we call Parse to convert from string to ObjectId.

@@ -3082,7 +3086,7 @@ public SubmoduleStatus CheckSubmoduleStatus(string commit, string oldCommit, Com
if (commit == null || commit == oldCommit)
return SubmoduleStatus.Unknown;

- string baseCommit = GetMergeBase(commit, oldCommit);
+ string baseCommit = GetMergeBase(commit, oldCommit).ToString();
Member Author

We call ToString to convert from ObjectId to string.

{
- return RunGitCmd("merge-base " + a + " " + b).TrimEnd();
+ return ObjectId.Parse(RunGitCmd("merge-base " + a + " " + b), offset: 0);
Member Author

Parse overloads that take an offset allow parsing from within an existing string without having to allocate a substring. Hence we don't need to trim.
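
As a rough illustration of the offset idea described above, the sketch below decodes 40 hex characters starting at a given position and never looks at anything after them, which is why trailing newlines need no trimming. The `Sha1Hex` class and `TryReadAt` method are hypothetical names, not members of the PR's `ObjectId`.

```csharp
using System;

// Hypothetical helper showing offset-based parsing without allocating a substring.
public static class Sha1Hex
{
    private const int Sha1CharCount = 40;

    /// <summary>Reads a full SHA-1 hash from within a larger string.</summary>
    /// <param name="s">The string containing the hash, for example raw git output.</param>
    /// <param name="offset">The position within <paramref name="s"/> at which parsing should commence.</param>
    /// <param name="bytes">The 20 decoded bytes on success, otherwise <c>null</c>.</param>
    public static bool TryReadAt(string s, int offset, out byte[] bytes)
    {
        bytes = null;
        if (s == null || offset < 0 || s.Length - offset < Sha1CharCount)
            return false;

        var result = new byte[Sha1CharCount / 2];
        for (int i = 0; i < result.Length; i++)
        {
            int hi = HexValue(s[offset + 2 * i]);
            int lo = HexValue(s[offset + 2 * i + 1]);
            if (hi < 0 || lo < 0)
                return false; // non-hex character inside the 40-character window
            result[i] = (byte)((hi << 4) | lo);
        }

        bytes = result;
        return true;
    }

    private static int HexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

With this shape, output such as a hash followed by `\n` can be passed with `offset: 0`, and a line such as `commit <hash>` can be passed with `offset: 7`.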

@@ -3141,7 +3145,7 @@ public string FormatBranchName([NotNull] string branchName)
throw new ArgumentNullException(nameof(branchName));

string fullBranchName = GitCommandHelpers.GetFullBranchName(branchName);
- if (String.IsNullOrEmpty(RevParse(fullBranchName)))
+ if (RevParse(fullBranchName) == null)
Member Author

RevParse returns null if no hash exists. The 'nil' value for ObjectId is null. I originally made ObjectId a struct, but the utility of null outweighed the performance benefit.

@@ -25,14 +25,16 @@ public sealed class GitRevision : IGitItem, INotifyPropertyChanged
public string[] ParentGuids;
private BuildInfo _buildStatus;

- public GitRevision(string guid)
+ public GitRevision([CanBeNull] ObjectId objectId)
Member Author

Added annotation indicating null is allowed.

? "merge base"
- : GitRevision.ToShortSha(mergeBaseGuid);
+ : mergeBaseGuid.ToShortString();
Member Author

Can use the instance ToShortString instead of the static GitRevision.ToShortSha function. The output is the same.

Member

FYI: I've been reworking GitRevision for quite some time now to remove all behaviours from it.
My refactor is not finished; my other work and the incoming PRs here have blocked progress.
In the end, GitRevision will be just a POCO.

@@ -60,6 +60,9 @@
<DocumentationFile>bin\Release\GitUIPluginInterfaces.xml</DocumentationFile>
</PropertyGroup>
<ItemGroup>
<Reference Include="JetBrains.Annotations, Version=11.1.0.0, Culture=neutral, PublicKeyToken=1010a0d8d6380325, processorArchitecture=MSIL">
<HintPath>..\..\packages\JetBrains.Annotations.11.1.0\lib\net20\JetBrains.Annotations.dll</HintPath>
</Reference>
Member Author

Added a reference to JetBrains.Annotations. This assembly is only used at design time and the compiler won't copy it to the output folder. The attributes themselves are conditional.

In future I'll make another PR that removes annotations from the project's source code and uses this package throughout.

@@ -73,7 +73,7 @@ public string Render(CommitData commitData, bool showRevisionsAsLinks)
throw new ArgumentNullException(nameof(commitData));
}

- bool isArtificial = commitData.Guid.IsArtificial();
+ bool isArtificial = commitData.Guid.IsArtificial;
Member Author

IsArtificial becomes an instance member.

@@ -33,7 +33,7 @@ public void Setup()
[TestCase(CommitStatus.NoSignature, "N")]
public async Task Validate_GetRevisionCommitSignatureStatusAsync(CommitStatus expected, string gitCmdReturn)
{
- var guid = Guid.NewGuid().ToString("N");
+ var guid = ObjectId.Random();
Member Author

Added a Random factory method for use in unit tests.

Member

If Random is used only in unit tests then it shouldn't be added to the production code.
Please move this behaviour into the CommonTestUtils test project
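
One way the suggestion above could look is a small helper in a test-only project that produces random hash text, leaving the production type free of test code. This is only a sketch: the `ObjectIdTestData` class and `RandomSha1Hex` method are hypothetical names and not part of CommonTestUtils.

```csharp
using System;

// Hypothetical test-only helper; the class and method names are assumptions.
public static class ObjectIdTestData
{
    private static readonly Random Rng = new Random();

    // Returns 40 lowercase hex characters, the textual form of a random 20-byte hash.
    public static string RandomSha1Hex()
    {
        var bytes = new byte[20];
        lock (Rng) // System.Random is not thread-safe and test runners may run in parallel
        {
            Rng.NextBytes(bytes);
        }

        // BitConverter.ToString yields "AB-CD-...", so strip the dashes and lowercase it.
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```

A test could then build its value with `ObjectId.Parse(ObjectIdTestData.RandomSha1Hex())`, which goes through the same validation as production input.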

@gerhardol
Member

The revision occasionally uses other revision parameters like HEAD. The test cases using that have now been changed.
That handling needs to be reviewed; some of these might be inserted into GitRevision.

I wish it had been possible to use GitRevision more consistently than mixing GitRevisions and (currently) strings, but that would make the GitRevision object a linked list.

FileStatusList uses "Combined diff" (localised) string to handle the combined diff. I need to change storing a string to a GitRevision for that. With this PR the CombinedDiff must be a full string I assume.

Member

@RussKie RussKie left a comment

I like the idea in general.

Some xml-doc would be nice to have to help in understanding the proposed API

/// <remarks>
/// Instances are immutable and are guaranteed to contain valid, 160-bit (20-byte) SHA1 hashes.
/// </remarks>
public sealed class ObjectId : IEquatable<ObjectId>
Member

Please separate state and behaviours

  • ObjectId must be a POCO. ToString, GetHashCode etc. are accepted since they are part of the Object type definition. ToShortString is OK too as it is complementary to ToString
  • All behavioural aspects, e.g. parsing, must be in SOLID classes. Since all of them relate to parsing it is probably natural to move them into an ObjectIdParser type (or something along these lines).

IsValid bothers me as well, but I'm having trouble articulating where/how it should be moved.
I couldn't see: is it used anywhere besides tests?

And whilst you're there, could you please group all fields, properties and methods together within their respective groups. It makes the code much easier to review and reason about.

Thank you

}

[Test] public void UnstagedId_is_artificial() => Assert.IsTrue(ObjectId.UnstagedId.IsArtificial);
[Test] public void IndexId_is_artificial() => Assert.IsTrue(ObjectId.IndexId.IsArtificial);
Member

Please turn these into full-body methods to be consistent with the rest.

Member Author

Sure.


public static bool IsValid(string hex)
{
if (hex.Length != Sha1CharCount)
Member

Git can take variable length SHAs.
Why are we forcing 40 chars here?

Same applies to TryParse

@drewnoakes
Member Author

Some responses in no particular order:

> Some xml-doc would be nice to have to help in understanding the proposed API

I'll definitely add some if it looks like this approach would work and a PR would be merged. Some of what I'm about to write will potentially end up in the docs.

> If Random is used only in unit tests then it shouldn't be added to the production code

Could do. I don't see it as a biggie, but I will move it out if you prefer. I like having simple, uncontroversial factory methods on the type itself. They're more discoverable.

> All behavioural aspects, e.g. parsing, must be in SOLID classes. Since all of them relate to parsing it is probably natural to move them into an ObjectIdParser type (or something along these lines).

I get where you're coming from, but this is in contradiction to the rest of the .NET framework. Why go against a tried and tested pattern: double, float, long, ulong, int, uint, short, ushort, byte, sbyte, decimal, Url, Guid, Enum, Version, DateTime, TimeSpan, ...

Static factories on a type are also resolved during type-targeted autocomplete by some tooling. For example, with ReSharper you can use smart-complete on ObjectId id = and you'll see ObjectId.Parse in the results.

I think this is especially true because parsing is the opposite of formatting (ToString in this case) and moving parsing logic to another type just moves two related pieces of code that must maintain round-trip compatibility away from one another.

Existing types in GE that have static Parse methods include LostObject, Commit, PullRequest, Repository, Settings, ReleaseVersion.

> IsValid bothers me as well, but I'm having trouble articulating where/how it should be moved

In my earlier experiments with using this type I applied it in many places and had it replace methods from GitRevision.ToShortSha with ObjectId.ToShortString, and GitRevision.IsFullSha1Hash with ObjectId.IsValid. Again, I prefer having all formatting/parsing/validation in one place. This type is basically just a simple value and those operations are standard. For example, the formatting methods wouldn't take a culture or encoding as there is exactly one expected UTF-16 string for a given SHA-1 hash. ToShortString is only there because it will be possible to make it more efficient than calling ToString().Substring.

> And whilst you're there, could you please group all fields, properties and methods together within their respective groups.

It'd be helpful if you explained what you wanted, and ideally if we could get this into the StyleCop.Analyzers stuff.

For example, my style is to order as follows:

  1. constants
  2. static members
  3. static methods
  4. instance state
  5. constructors
  6. computed properties
  7. methods grouped logically by function, potentially with regions for Equals/GetHashCode/operator, interface implementations, etc.
  8. clean up functions such as Close, Dispose, whatever
  9. nested types

I guess you have a different preference, but it's not clear what you mean and the existing code is inconsistent throughout so I can't use it as a reference to mimic.

> Git can take variable length SHAs. Why are we forcing 40 chars here?

In an early version of this type I supported variable length hashes, but after a while decided they were a bad idea.

  • A SHA-1 hash is always 160 bits (20 bytes, 40 characters)
  • Abbreviated hashes are only presented to the user as a convenience at the "UI" level -- internally they're always full-width
  • Operations against an ObjectId (e.g. equality testing) only work for shortened hashes if the context of the entire repository is available. For example, in one repo, 123abc might be equal to 123abcd, but in another repo 123abc might be ambiguous. Any time you want to test equality, you need to check with the repo. Similarly, hash codes don't work. In the same way, adding new objects to the repo can require existing IDs to be lengthened by one or more characters. It is far cleaner and more reliable to just store the whole hash. If shortening is to be done for the user, it can be done at the UI level, as is done for many other values such as numbers (1,234.56 in gb/us, 1.234,56 in de/fr), dates (think time zones), etc.

Of course this means that we can only use ObjectId in places where the full hash is available, but seeing as GE is mostly using plumbing/non-porcelain commands, the full hash appears to be returned very broadly. These cases also tend to be the ones where you would also not expect a different type of ref such as HEAD, master, etc.
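
The ambiguity argument above can be sketched in a few lines: resolving an abbreviated hash needs the set of full hashes in the repository, and the same prefix can resolve in one repository yet be ambiguous in another. The `AbbreviatedHashes` class and `ResolveUnique` method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: shows why a short hash only has meaning relative to a repository.
public static class AbbreviatedHashes
{
    // Returns the single full hash starting with the prefix,
    // or null when there is no match or more than one match.
    public static string ResolveUnique(IEnumerable<string> fullHashes, string prefix)
    {
        var matches = fullHashes
            .Where(h => h.StartsWith(prefix, StringComparison.Ordinal))
            .Take(2) // two matches are enough to know the prefix is ambiguous
            .ToList();

        return matches.Count == 1 ? matches[0] : null;
    }
}
```

In a repository containing only a hash beginning `123abcd`, the prefix `123abc` resolves; once an object beginning `123abce` is added, the same prefix no longer identifies anything on its own, which is the situation a full-width ObjectId avoids.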

@drewnoakes
Member Author

@gerhardol sorry, I missed your comment.

> The revision occasionally uses other revision parameters like HEAD. The test cases using that have now been changed. That handling needs to be reviewed; some of these might be inserted into GitRevision.

I looked for invocations of GitRevision that could pass something other than a hash and didn't find any examples. Can you think of a feature that would do this? I'm happy to try it out further.

> I wish it had been possible to use GitRevision more consistently than mixing GitRevisions and (currently) strings, but that would make the GitRevision object a linked list.

The linked list observation is an interesting one. I guess you end up with a DAG. Pulling more metadata about a revision is going to necessitate that structure, which likely wouldn't be fully loaded at any point in time. By going for ObjectId we're targeting identity only, which avoids these problems.

The problem with identity here is that while a git object's true identity is its hash, it may also have various aliases.

From an API perspective, sometimes you may only provide the hash, other times other forms are permitted. string makes this simpler, but without being self-validating and self-documenting.

Further down the line it'd be nice if APIs that accept more than one type of ref enforced that via the API's signature. This is quite straightforward in TypeScript via union types. There's some talk about adding discriminated unions to C#, but that wouldn't be until after v8. Such utilities exist as libraries though (I have a Union<...> type in this library, for example).

You can model abstraction between types of references via polymorphism (i.e. IGitItem) but that doesn't allow constraint at the API level. You'd need something like Union<ObjectId,Ref>.
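
For readers unfamiliar with the idea, a two-case union can be sketched in C# as below. This is a hypothetical, minimal type written for this illustration; it is not the Union<...> type from the library mentioned above, and the member names are assumptions.

```csharp
using System;

// Hypothetical two-case union: holds either a T1 or a T2, never both.
public sealed class Union<T1, T2>
{
    private readonly T1 _first;
    private readonly T2 _second;
    private readonly bool _isFirst;

    private Union(T1 first, T2 second, bool isFirst)
    {
        _first = first;
        _second = second;
        _isFirst = isFirst;
    }

    public static Union<T1, T2> FromFirst(T1 value) =>
        new Union<T1, T2>(value, default(T2), true);

    public static Union<T1, T2> FromSecond(T2 value) =>
        new Union<T1, T2>(default(T1), value, false);

    // Callers must supply a handler for both cases, so neither alternative can be forgotten.
    public TResult Match<TResult>(Func<T1, TResult> first, Func<T2, TResult> second)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        return _isFirst ? first(_first) : second(_second);
    }
}
```

An API that accepts either a hash or a named ref could then take a parameter of type `Union<ObjectId, Ref>` (with `Ref` standing for whatever type models named refs), so the permitted inputs are visible in the signature rather than implied by a string.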

Anyway, that's just an interesting idea for the future. Maybe. For now, I think there's some value in modelling ObjectId even if just at the lower levels.

> FileStatusList uses "Combined diff" (localised) string to handle the combined diff. I need to change storing a string to a GitRevision for that. With this PR the CombinedDiff must be a full string I assume.

I'm sorry but could you provide a little more context here? What do you mean by a localised combined diff?

@gerhardol
Member

> I'm sorry but could you provide a little more context here? What do you mean by a localised combined diff?

Sorry, I was very tired when writing this. I tried to point out that there are situations where we go between GitRevision and string. In most of the occurrences manipulation and comparison is done for the exported strings.

HEAD is used in several places but in master only where the GitRevision GUID is exported to a string. You changed the existing GitRevision("HEAD") in the tests it seems. However, I have added some situations in a PR.

For CombinedDiff, that is a special artificial commit that in master is not converted to a GitRevision, there it is only a guid. Changed in my PR too.

For another change still in my head, I want to add stashes to the revision tree.

503add8#diff-16d3bf20df9b2330e5146f334a6e7f04
503add8#diff-7aca6c90354549c4c40587e6d82baf10R1088

I assume it will be possible to solve this by handling CombinedDiff as a formal artificial revision, the usage will be global though. For creating revisions, maybe RevisionGrid.GetRevision() should be used.
Stashes have the TreeGuids that should work as a commit too.

I do not want to block the proposal, just raise the question that some manipulations get a lot more complicated. It may not be a bad thing to force typing, I just want to raise the question.

@RussKie
Member

RussKie commented Mar 5, 2018

Thank you @drewnoakes for a detailed response. Unfortunately my current workload doesn't allow me significant time to sit down and respond in full, and my commutes aren't that long.
I will be doing a number of smaller responses over the coming days, apologies.

> > Some xml-doc would be nice to have to help in understanding the proposed API

> I'll definitely add some if it looks like this approach would work and a PR would be merged. Some of what I'm about to write will potentially end up in the docs.

Xml-doc enables IntelliSense, which is used first and foremost by developers. End-users don't want to know these details.
For example, TryParse([CanBeNull] string hex, int offset, out ObjectId objectId) has two input parameters. I don't find their names particularly informative and hence am having difficulty reasoning about them. Xml-doc could help in this case.
It may well also be that the parameter names need changing.

@RussKie
Member

RussKie commented Mar 5, 2018

> > If Random is used only in unit tests then it shouldn't be added to the production code

> Could do. I don't see it as a biggie, but I will move it out if you prefer. I like having simple, uncontroversial factory methods on the type itself. They're more discoverable.

I follow the principle - test code must never go to production.

@RussKie
Member

RussKie commented Mar 5, 2018

> > All behavioural aspects, e.g. parsing, must be in SOLID classes. Since all of them relate to parsing it is probably natural to move them into an ObjectIdParser type (or something along these lines).

> I get where you're coming from, but this is in contradiction to the rest of the .NET framework. Why go against a tried and tested pattern: double, float, long, ulong, int, uint, short, ushort, byte, sbyte, decimal, Url, Guid, Enum, Version, DateTime, TimeSpan, ...

I am also a big proponent of SOLID principles: small stateless classes with a clear purpose that can be tested and easily reasoned about are good. Everything else, not so much.

.Net Fx was designed over a decade ago when OOP was all the rage. Since then many proponents of OOP have come to realise that OOP makes things harder, often leading to tightly coupled and incoherent code.
I was myself a big OOP user up until about 5 or 6 years ago.
For a number of years I was (and still am) part of a team which built a RAD framework. Our user base is large and users have a great degree of flexibility (including implementation language) in selecting implementations that suit their needs.
We started off with an OOP approach, but quickly found ourselves building an inflexible monolith which started becoming incoherent and buggy.
You could not substitute a behavior, you could not serialise DTOs because they had behaviors attached, thread-safety became a major concern...

I am finding all the same problems in the GE's codebase.
For past few months I've been trying to decouple state from the behaviors in GitRevision (#4573 #4504 #4502). Even to be able to start this work I had to perform a number of refactors of other implementations (#4480 #4425 #4342).
So to me, undoing all this work and re-introducing behaviors is a major step backwards.

One can argue that GE is not a framework and behavior substitution is not a concern. In fact, we had this discussion some time last year. I don't agree with this statement as we are constantly faced with alternate behaviors: .NET Framework vs Mono, local settings vs distributed settings, different UI presentations and rendering requirements etc...
GE is greatly configurable and extensible but it is our downfall as well - our current implementations are often too rigid or too coupled to allow for new or changed requirements.

@RussKie
Member

RussKie commented Mar 5, 2018

> > And whilst you're there, could you please group all fields, properties and methods together within their respective groups.

> It'd be helpful if you explained what you wanted, and ideally if we could get this into the StyleCop.Analyzers stuff.

I pretty much follow the MS internal guidelines: https://blogs.msdn.microsoft.com/brada/2005/01/26/internal-coding-guidelines/
Oldie but goodie. Some new code written by MS chaps is nothing short of abomination...

At work I have far greater powers to enforce file organisation and style guidelines; here I am more relaxed, yet there are some basic guidelines I'd like to see followed.
This makes it easy for all contributors to deal with the code.


[NotNull] private readonly byte[] _bytes;

private ObjectId([NotNull] byte[] bytes) => _bytes = bytes ?? throw new ArgumentNullException(nameof(bytes));
Member

I'm tinkering with your proposal (nothing concrete yet; once there is something to share I will push it) and have a few thoughts.
I'd like to get your opinion.

Do you think we could optimise it by performing ToString implementation in the constructor and caching the result in a field, then return the value of the field in ToString method?
Does it make sense to calculate the hashcode in constructor as well?

        private readonly byte[] _bytes;
        private readonly string _sha1;
        private readonly int _hashCode;

        public ObjectId(IEnumerable<byte> bytes)
        {
            _bytes = bytes?.ToArray() ?? throw new ArgumentNullException(nameof(bytes));

            // If performance here becomes a problem, review https://stackoverflow.com/q/311165/24874
            var hex = new StringBuilder(ObjectIdParser.Sha1CharCount);
            foreach (var b in _bytes)
                hex.AppendFormat("{0:x2}", b);
            _sha1 = hex.ToString();

            // calculate hashcode
            ....
        }

Furthermore, having extracted parsing logic into ObjectIdParser we can probably cache all known ObjectIds and reuse them, if necessary. This requires further exploration, just a thought.

Member Author

Hehe, brain dump inbound...

Only way to decide on optimisations is to measure. If you want to cache the string then there's no point in storing the byte[] too. The string will be roughly four times the size in memory (40 characters, UTF-16), so you'd need context to make tradeoffs between memory/CPU. If you wanted to run such measurements I would look at optimising the existing ToString and GetHashCode methods first, as they're not the fastest they could be right now. They're already quite snappy, but I wouldn't be surprised if they could be made many times faster still.

I considered caching all object IDs. I've tried/seen attempts at this kind of pooling before and apart from some cases where there are many long-lived instances from a small distinct set of values, there's little benefit and it often tends to come out slower. I would imagine many of the ObjectId values would never get promoted from gen0, being collected very soon after construction. Keeping them small and light can help the GC do its job by reducing heap fragmentation and cache pressure. I don't think there's much perf to gain here and there are some potential problems, as well as slight complexity in managing the caches that you avoid if you don't have them. But again, it should be measured. The Linux kernel has ~750k commits. With the 20-byte array, plus 16 bytes overhead for the wrapper class and 12 bytes overhead for the array (minimums on x64), you're looking at ~36MB before you've even begun to store them in something you can look them up from. I just don't think it's going to be a saving. Plus the lookup collection will be in one part of memory/cache, and the wrapper/array in another, so you end up blowing away your CPU caches which tend to be helpful for the work you're actually trying to do.
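
Taking the per-object figures quoted above at face value (20 bytes of hash, 16 bytes of wrapper overhead, 12 bytes of array overhead, roughly 750k commits), the estimate can be worked through as below. The class and member names are made up for this illustration, and real CLR object layouts add padding and alignment that this ignores.

```csharp
// Back-of-envelope arithmetic only; the figures are the ones quoted in the comment above.
public static class ObjectIdMemoryEstimate
{
    public const long CommitCount = 750_000;
    public const int HashBytes = 20;
    public const int WrapperOverheadBytes = 16;
    public const int ArrayOverheadBytes = 12;

    // Bytes held per cached ID: payload plus both object overheads.
    public static long BytesPerId => HashBytes + WrapperOverheadBytes + ArrayOverheadBytes;

    // Total for every commit, before any lookup structure is added on top.
    public static long TotalBytes => CommitCount * BytesPerId;
}
```

That works out to 48 bytes per ID and 36,000,000 bytes in total, which is the cost of merely holding the IDs before any dictionary or set is built to find them again.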

Any benefit you'd get on that scale would be offset by representing the byte array as IEnumerable<byte> then calling ToArray on it again. You want a fast path from parse to object readiness, with minimal allocations. Arrays are about as lean as you'll get without storing the 20 bytes of the SHA-1 in two ulongs and a uint. The representation is an internal consideration, not a place for abstraction, which is why I made the constructor private.

That's a lot of speculation on my part, especially given I said one has to measure these things, but it's the kind of stuff that would go through my head and lead me towards probably not even doing such comparative measurements until I knew there was a problem they might actually solve.

There are many opportunities to improve the performance and responsivity of GE that would yield orders-of-magnitude better results than such micro-optimisations. For example, have a look how much string concatenation goes on all over the place. These push the GC much harder than needed. And the async stuff I looked at in #4501 is an early (and largely incomplete) attempt at reducing the complete profligacy of creating new threads left and right to deal with blocking code. Running GE in a timeline profiler shows results that look completely different from most other applications because of this. Tonnes of threads doing almost no work.

I'll respond soon to the other comments here that I neglected. Apologies for that.

Member

This will decrease the size of GitRevision, but how large a part of it is really the SHA?
The Subject, Author etc should use more space already now.
Every byte could count and there may be other reasons for this PR.

For performance I find the unnecessary query of submodules, stash etc to be the most annoying. I have planned to do something about that but have been stuck on in my opinion meaningless test cases for other issues for some time.

@drewnoakes
Member Author

@gerhardol

> Sorry, I was very tired when writing this.

No problem. If I were more familiar with the project it'd have made sense I'm sure.

Your explanation was helpful. I have some questions/observations.

If new GitRevision("HEAD") is created, and the Parents populated with guids, then if HEAD moves isn't the object in an invalid state? Maybe in practice that's not currently a problem anywhere due to the large amounts of refreshing GE does, but it still feels unsettling to me.

> For creating revisions, maybe RevisionGrid.GetRevision() should be used.

Would such a method be able to rev-parse any non-guid refs? Then we'd always have a guid in GitRevision.

What is the role of IGitItem? Here we have a pairing of a guid with a name.

> I do not want to block the proposal, just raise the question that some manipulations get a lot more complicated. It may not be a bad thing to force typing, I just want to raise the question.

The functionality/requirements should direct the type(s) and their usage, not the other way around. I think the exercise of introducing such types to replace string will raise some questions whose answers are currently only implied in the current code.

One challenge for a newcomer like me is terminology. In my research I noted many names for string variables that could hold an ObjectId, including sha, guid, revision, ref, commit. The mismatching names would be less of a problem if the variable's type carried more information. Of course, consolidating the names would be helpful too.

@drewnoakes
Member Author

@RussKie

> Xml-doc enables IntelliSense, which is used first and foremost by developers. End-users don't want to know these details.

Completely agree. I would find more documentation on key types in GE helpful as a newcomer. I added a lot of docs to AsyncLoader in #4518 and in general document types I work on heavily. ObjectId is likely to evolve a lot before it gets merged, if ever. I'll document it once it stabilises.

For example, TryParse([CanBeNull] string hex, int offset, out ObjectId objectId) has two input parameters. I don't find their names particularly informative and hence am having difficulty reasoning about them.

Agree the names can be improved. The built-in type Parse methods call the string s.

offset in this case would be described as "The position within the string at which parsing should commence." I'm less clear on how to improve that name, but with the description hopefully it makes sense. Also, this overload of Parse only exists for performance reasons (to avoid needing to allocate a substring; see current corefx initiatives around Span<char> for what this might look like in future). The 'default' overload doesn't take an offset, and differs in that it fails if the string is not exactly 40 characters long. This overload could be renamed ParseSubstring or something, but that's not idiomatic with framework code in my mind. Often when receiving an offset, a method will also receive a length/count, but in this case that value is redundant as it must be 40 (at least, according to my assumption that we only deal with full hashes).
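To make the two overloads concrete, here is a minimal sketch of how they could relate. It is not the code from this PR: the type name ObjectIdSketch, the byte[] storage and the decision to accept only lowercase hex are all assumptions made for illustration.

```csharp
using System;

// Hypothetical illustration only; the real ObjectId in this PR may differ.
public sealed class ObjectIdSketch
{
    private const int Sha1CharCount = 40;

    private readonly byte[] _bytes; // the 20 raw bytes of the hash

    private ObjectIdSketch(byte[] bytes) => _bytes = bytes;

    // 'Default' overload: the whole string must be exactly 40 characters.
    public static bool TryParse(string hex, out ObjectIdSketch objectId)
    {
        if (hex == null || hex.Length != Sha1CharCount)
        {
            objectId = null;
            return false;
        }

        return TryParse(hex, 0, out objectId);
    }

    // Offset overload: reads 40 characters starting at 'offset' without
    // allocating a substring. No length parameter is needed because the
    // length is always 40 under the full-hash assumption.
    public static bool TryParse(string hex, int offset, out ObjectIdSketch objectId)
    {
        objectId = null;

        if (hex == null || offset < 0 || hex.Length - offset < Sha1CharCount)
        {
            return false;
        }

        var bytes = new byte[Sha1CharCount / 2];

        for (int i = 0; i < bytes.Length; i++)
        {
            int hi = HexValue(hex[offset + (2 * i)]);
            int lo = HexValue(hex[offset + (2 * i) + 1]);

            if (hi < 0 || lo < 0)
            {
                return false;
            }

            bytes[i] = (byte)((hi << 4) | lo);
        }

        objectId = new ObjectIdSketch(bytes);
        return true;
    }

    // Assumption: only lowercase hex digits are accepted.
    private static int HexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    public override string ToString() =>
        BitConverter.ToString(_bytes).Replace("-", string.Empty).ToLowerInvariant();
}
```

With something like this, a caller holding a line such as "commit " followed by a hash could call TryParse(line, 7, out var id) and avoid the intermediate substring.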

.NET Framework was designed over a decade ago when OOP was all the rage.

Good point. But one thing the team did well in my opinion is make a core library that offers lots of useful functionality built in, with clear and consistent idioms.

Before starting with .NET I did a fair bit of Java, and occasionally still do. Java's idioms favour much more abstraction and flexibility, but frankly I find I rarely need it and the rest of the time it's a pain. Classes such as AbstractSingletonProxyFactoryBean are the embodiment of this abstraction taken too far. Many OOP developers now favour composition over inheritance (à la FP), and the C# language team have given us tools to express code in increasingly functional ways, without sacrificing performance.

We started off with OOP approach, but quickly found ourselves building an inflexible monolith which started becoming incoherent and buggy.

To be fair, you can do this in any language/paradigm. I worked on a monstrous Lisp codebase for a while. Monoliths form more because of mismanagement of complexity than because of any coding style.

I am finding all the same problems in the GE's codebase.

I don't disagree there's a lot that can be done to tidy the codebase up. I don't believe that a single programming doctrine can solve everything. Problems need to be identified, analysed, discussed and addressed pragmatically.

One can argue that GE is not a framework and behavior substitution is not a concern

The fact that it's not a framework does allow for a more relaxed approach to API changes, which is nice. There can be more fluidity, and it's possible to remove APIs that are no longer used without upsetting users.

Not sure what you mean by behaviour substitution in this context. Could you elaborate please?

GE is greatly configurable and extensible but it is our downfall as well - our current implementations are often too rigid or too coupled to allow for new or changed requirements.

It is impressive just how much functionality is packaged into this one exe. Its maturity and breadth of scope are significant. I can see where rigidity may hamper evolution. This is often the case with code, unless you design for every eventuality.

Part of the issue I see is the coupling of view with model. But, that's WinForms's default mode unfortunately. Classes with thousands of lines of (non-generated) code.

I pretty much follow the MS internal guidelines: https://blogs.msdn.microsoft.com/brada/2005/01/26/internal-coding-guidelines/

That looks pretty close to what I used in 2005, and not too far off what I use now.

I think the biggest win on member ordering across the codebase now would be to get all fields (and by extension, auto-properties) together in one place, which is probably at the top of the file. Right now fields are often declared close to where they were first used, it seems.

Challenging my ideas about programming is one of the things I love about working on open source. So thanks for taking the time to share your history and perspective. GE is a very interesting codebase to work through as it spans almost a decade and has had so many developers work on it that there are many different ideas throughout. Reading through the code there's a sense of forensic analysis via which that story is told.

@drewnoakes drewnoakes changed the title Formalise type for object IDs [wip] Formalise type for object IDs Mar 9, 2018
@gerhardol
Member

If new GitRevision("HEAD") is created, and the Parents populated with guids, then if HEAD moves isn't the object in an invalid state?

Sure. Such revisions are only used in modal forms like FormCommit and FormStash.

Would such a method be able to rev-parse any non-guid refs? Then we'd always have a guid in GitRevision.

The real guid would have to be used (for example by resolving HEAD or the stash "commit-ish sha"). It is doable somehow.

I think the exercise of introducing such types to replace string will raise some questions whose answers are currently only implied in the current code.

Very likely...

What is the role of IGitItem? Here we have a pairing of a guid with a name.
In my research I noted many names for string variables that could hold an ObjectId, including sha, guid, revision, ref, commit.

It tells whether the item is a Blob (file), a Tree (folder) or a Commit (submodule).
Either there is something in Git revisions I have not fully understood, or it just happened to be confusing.

Of course, consolidating the names would be helpful too.

Next cleanup?

@drewnoakes
Member Author

Went through and rebased this.

The revision occasionally uses other revision parameters like HEAD

I removed the commit that converted GitRevision to ObjectId. Thanks again.

Some xml-doc would be nice to have to help in understanding the proposed API

Added XML docs too.

Added some more unit tests and fixed a bug that they turned up :)

@drewnoakes drewnoakes changed the title [wip] Formalise type for object IDs Formalise type for object IDs Mar 16, 2018
@drewnoakes
Member Author

drewnoakes commented Mar 16, 2018

Some future opportunities for ObjectId include:

  • CommitData.ParentGuids
  • CommitData.ChildrenGuids
  • ILinkFactory.CreateCommitLink
  • GitSubmoduleInfo.CurrentCommitGuid

CommitData changes are best made after #4641.

GitSubmoduleInfo changes are best made after #4639.

@RussKie
Member

RussKie commented Mar 17, 2018

Hehe, brain dump inbound...

Thank you Drew. I much appreciate your long and thoughtful response; I had a ball reading it.

One challenge for a newcomer <...> many names for string variables that could hold an ObjectId, including sha, guid, revision, ref, commit.

I think this is a common pain for many. I'm pretty sure I caused some pain to @gerhardol and others in my reviews seeking clarity.

Not sure what you mean by behaviour substitution in this context. Could you elaborate please?

In a nutshell - same interface different implementations.
E.g.

public interface IHeaderLabelFormatter
{
    string FormatLabel(string label, int desiredLength);
    string FormatLabelPlain(string label, int desiredLength);
}

public sealed class TabbedHeaderLabelFormatter : IHeaderLabelFormatter

public sealed class MonospacedHeaderLabelFormatter : IHeaderLabelFormatter
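As a rough illustration of the substitution idea, the sketch below shows a consumer that depends only on an interface, so either implementation can be swapped in. The names ILabelFormatter, TabbedLabelFormatter, MonospacedLabelFormatter and HeaderWriter, and the method bodies, are hypothetical; the bodies of the real formatter classes are not shown in this thread.

```csharp
using System;

// Hypothetical, simplified stand-in for the interface quoted above.
public interface ILabelFormatter
{
    string FormatLabel(string label, int desiredLength);
}

// Assumed behaviour: separate label and value with a tab.
public sealed class TabbedLabelFormatter : ILabelFormatter
{
    public string FormatLabel(string label, int desiredLength) => label + ":\t";
}

// Assumed behaviour: pad with spaces so values line up in a monospaced font.
public sealed class MonospacedLabelFormatter : ILabelFormatter
{
    public string FormatLabel(string label, int desiredLength) =>
        (label + ":").PadRight(desiredLength);
}

// The consumer knows only the interface, never the concrete formatter.
public sealed class HeaderWriter
{
    private readonly ILabelFormatter _formatter;

    public HeaderWriter(ILabelFormatter formatter)
    {
        _formatter = formatter ?? throw new ArgumentNullException(nameof(formatter));
    }

    public string Write(string label, string value, int desiredLength) =>
        _formatter.FormatLabel(label, desiredLength) + value;
}
```

Switching layouts then means constructing HeaderWriter with a different formatter; the calling code does not branch on the layout.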

All of the revision grid rendering can be tremendously simplified by creating specific renderers for each available layout:

private void RevisionsCellPainting(object sender, DataGridViewCellPaintingEventArgs e)
{
    // If our loading state has changed since the last paint, update it.
    if (Loading != null)
    {
        if (Loading.Visible != _isLoading)
        {
            Loading.Visible = _isLoading;
        }
    }
    var columnIndex = e.ColumnIndex;
    int graphColIndex = GraphDataGridViewColumn.Index;
    int messageColIndex = MessageDataGridViewColumn.Index;
    int authorColIndex = AuthorDataGridViewColumn.Index;
    int dateColIndex = DateDataGridViewColumn.Index;
    int idColIndex = IdDataGridViewColumn.Index;
    int isMsgMultilineColIndex = IsMessageMultilineDataGridViewColumn.Index;
    if (e.RowIndex < 0 || (e.State & DataGridViewElementStates.Visible) == 0)
    {
        return;
    }
    if (Revisions.RowCount <= e.RowIndex)
    {
        return;
    }
    var revision = GetRevision(e.RowIndex);
    if (revision == null)
    {
        return;
    }
    var spi = SuperprojectCurrentCheckout.IsCompleted ? SuperprojectCurrentCheckout.Result : null;
    var superprojectRefs = new List<IGitRef>();
    if (spi?.Refs != null && spi.Refs.ContainsKey(revision.Guid))
    {
        superprojectRefs.AddRange(spi.Refs[revision.Guid].Where(ShowRemoteRef));
    }
    e.Handled = true;
    var drawRefArgs = new DrawRefArgs
    {
        Graphics = e.Graphics,
        CellBounds = e.CellBounds,
        IsRowSelected = (e.State & DataGridViewElementStates.Selected) == DataGridViewElementStates.Selected
    };
    // Determine background colour for cell
    Brush cellBackgroundBrush;
    if (drawRefArgs.IsRowSelected /*&& !showRevisionCards*/)
    {
        cellBackgroundBrush = _selectedItemBrush;
    }
    else if (ShouldHighlightRevisionByAuthor(revision))
    {
        cellBackgroundBrush = _authoredRevisionsBrush;
    }
    else if (ShouldRenderAlternateBackColor(e.RowIndex))
    {
        cellBackgroundBrush = new SolidBrush(ColorHelper.MakeColorDarker(e.CellStyle.BackColor));
        // TODO if default background is nearly black, we should make it lighter instead
    }
    else
    {
        cellBackgroundBrush = new SolidBrush(e.CellStyle.BackColor);
    }
    // Draw cell background
    e.Graphics.FillRectangle(cellBackgroundBrush, e.CellBounds);
    Color? backColor = null;
    if (cellBackgroundBrush is SolidBrush)
    {
        backColor = (cellBackgroundBrush as SolidBrush).Color;
    }
    // Draw graphics column
    if (e.ColumnIndex == graphColIndex)
    {
        Revisions.dataGrid_CellPainting(sender, e);
        return;
    }
    // Determine cell foreground (text) colour for other columns
    Color foreColor;
    if (drawRefArgs.IsRowSelected)
    {
        foreColor = SystemColors.HighlightText;
    }
    else if (AppSettings.RevisionGraphDrawNonRelativesTextGray && !Revisions.RowIsRelative(e.RowIndex))
    {
        Debug.Assert(backColor != null, "backColor != null");
        foreColor = Color.Gray;
        // TODO: If the background colour is close to being Gray, we should adjust the gray until there is a bit more contrast.
        while (ColorHelper.GetColorBrightnessDifference(foreColor, backColor.Value) < 125)
        {
            foreColor = ColorHelper.IsLightColor(backColor.Value) ? ColorHelper.MakeColorDarker(foreColor) : ColorHelper.MakeColorLighter(foreColor);
        }
    }
    else
    {
        Debug.Assert(backColor != null, "backColor != null");
        foreColor = ColorHelper.GetForeColorForBackColor(backColor.Value);
    }
    /*
    if (!AppSettings.RevisionGraphDrawNonRelativesTextGray || Revisions.RowIsRelative(e.RowIndex))
    {
        foreColor = drawRefArgs.IsRowSelected && IsFilledBranchesLayout()
            ? SystemColors.HighlightText
            : e.CellStyle.ForeColor;
    }
    else
    {
        foreColor = drawRefArgs.IsRowSelected ? SystemColors.HighlightText : Color.Gray;
    }
    */
    using (Brush foreBrush = new SolidBrush(foreColor))
    {
        var rowFont = NormalFont;
        if (revision.Guid == CurrentCheckout /*&& !showRevisionCards*/)
        {
            rowFont = HeadFont;
        }
        else if (spi != null && spi.CurrentBranch == revision.Guid)
        {
            rowFont = SuperprojectFont;
        }
        if (columnIndex == messageColIndex)
        {
            int baseOffset = 0;
            if (IsCardLayout())
            {
                baseOffset = 5;
                Rectangle cellRectangle = new Rectangle(e.CellBounds.Left + baseOffset, e.CellBounds.Top + 1, e.CellBounds.Width - (baseOffset * 2), e.CellBounds.Height - 4);
                if (!AppSettings.RevisionGraphDrawNonRelativesGray || Revisions.RowIsRelative(e.RowIndex))
                {
                    e.Graphics.FillRectangle(
                        new LinearGradientBrush(cellRectangle,
                            Color.FromArgb(255, 220, 220, 231),
                            Color.FromArgb(255, 240, 240, 250), 90, false), cellRectangle);
                    using (var pen = new Pen(Color.FromArgb(255, 200, 200, 200), 1))
                    {
                        e.Graphics.DrawRectangle(pen, cellRectangle);
                    }
                }
                else
                {
                    e.Graphics.FillRectangle(
                        new LinearGradientBrush(cellRectangle,
                            Color.FromArgb(255, 240, 240, 240),
                            Color.FromArgb(255, 250, 250, 250), 90, false), cellRectangle);
                }
                if ((e.State & DataGridViewElementStates.Selected) == DataGridViewElementStates.Selected)
                {
                    using (var penSelectionBackColor = new Pen(Revisions.RowTemplate.DefaultCellStyle.SelectionBackColor, 1))
                    {
                        e.Graphics.DrawRectangle(penSelectionBackColor, cellRectangle);
                    }
                }
            }
            float offset = baseOffset;
            var gitRefs = revision.Refs;
            drawRefArgs.RefsFont = IsFilledBranchesLayout() ? rowFont : RefsFont;
            if (spi != null)
            {
                if (spi.Conflict_Base == revision.Guid)
                {
                    offset = DrawRef(drawRefArgs, offset, "Base", Color.OrangeRed, ArrowType.NotFilled);
                }
                if (spi.Conflict_Local == revision.Guid)
                {
                    offset = DrawRef(drawRefArgs, offset, "Local", Color.OrangeRed, ArrowType.NotFilled);
                }
                if (spi.Conflict_Remote == revision.Guid)
                {
                    offset = DrawRef(drawRefArgs, offset, "Remote", Color.OrangeRed, ArrowType.NotFilled);
                }
            }
            if (gitRefs.Any())
            {
                gitRefs.Sort((left, right) =>
                {
                    if (left.IsTag != right.IsTag)
                    {
                        return right.IsTag.CompareTo(left.IsTag);
                    }
                    if (left.IsRemote != right.IsRemote)
                    {
                        return left.IsRemote.CompareTo(right.IsRemote);
                    }
                    if (left.Selected != right.Selected)
                    {
                        return right.Selected.CompareTo(left.Selected);
                    }
                    return left.Name.CompareTo(right.Name);
                });
                foreach (var gitRef in gitRefs.Where(head => (!head.IsRemote || AppSettings.ShowRemoteBranches)))
                {
                    if (gitRef.IsTag)
                    {
                        if (!AppSettings.ShowTags)
                        {
                            continue;
                        }
                    }
                    Color headColor = GetHeadColor(gitRef);
                    ArrowType arrowType = gitRef.Selected ? ArrowType.Filled :
                        gitRef.SelectedHeadMergeSource ? ArrowType.NotFilled : ArrowType.None;
                    drawRefArgs.RefsFont = gitRef.Selected ? rowFont : RefsFont;
                    var superprojectRef = superprojectRefs.FirstOrDefault(superGitRef => gitRef.CompleteName == superGitRef.CompleteName);
                    if (superprojectRef != null)
                    {
                        superprojectRefs.Remove(superprojectRef);
                    }
                    string name = gitRef.Name;
                    if (gitRef.IsTag
                        && gitRef.IsDereference // see note on using IsDereference in CommitInfo class.
                        && AppSettings.ShowAnnotatedTagsMessages
                        && AppSettings.ShowIndicatorForMultilineMessage)
                    {
                        name = name + " " + MultilineMessageIndicator;
                    }
                    offset = DrawRef(drawRefArgs, offset, name, headColor, arrowType, superprojectRef != null, true);
                }
            }
            for (int i = 0; i < Math.Min(MaxSuperprojectRefs, superprojectRefs.Count); i++)
            {
                var gitRef = superprojectRefs[i];
                Color headColor = GetHeadColor(gitRef);
                var gitRefName = i < (MaxSuperprojectRefs - 1) ? gitRef.Name : "";
                ArrowType arrowType = gitRef.Selected ? ArrowType.Filled :
                    gitRef.SelectedHeadMergeSource ? ArrowType.NotFilled : ArrowType.None;
                drawRefArgs.RefsFont = gitRef.Selected ? rowFont : RefsFont;
                offset = DrawRef(drawRefArgs, offset, gitRefName, headColor, arrowType, true, false);
            }
            if (IsCardLayout())
            {
                offset = baseOffset;
            }
            var text = (string)e.FormattedValue;
            var bounds = AdjustCellBounds(e.CellBounds, offset);
            RevisionGridUtils.DrawColumnText(e.Graphics, text, rowFont, foreColor, bounds);
            if (IsCardLayout())
            {
                int textHeight = (int)e.Graphics.MeasureString(text, rowFont).Height;
                int gravatarSize = _rowHeigth - textHeight - 12;
                int gravatarTop = e.CellBounds.Top + textHeight + 6;
                int gravatarLeft = e.CellBounds.Left + baseOffset + 2;
                var imageName = _avatarImageNameProvider.Get(revision.AuthorEmail);
                var gravatar = _avatarCache.GetImage(imageName, null);
                if (gravatar == null)
                {
                    gravatar = Resources.User;
                    // kick off download operation, will likely display the avatar during the next round of repaint
                    _gravatarService.GetAvatarAsync(revision.AuthorEmail, AppSettings.AuthorImageSize, AppSettings.GravatarDefaultImageType);
                }
                e.Graphics.DrawImage(gravatar, gravatarLeft + 1, gravatarTop + 1, gravatarSize, gravatarSize);
                e.Graphics.DrawRectangle(Pens.Black, gravatarLeft, gravatarTop, gravatarSize + 1, gravatarSize + 1);
                string authorText;
                string timeText;
                if (_rowHeigth >= 60)
                {
                    authorText = revision.Author;
                    timeText = TimeToString(AppSettings.ShowAuthorDate ? revision.AuthorDate : revision.CommitDate);
                }
                else
                {
                    timeText = string.Concat(revision.Author, " (", TimeToString(AppSettings.ShowAuthorDate ? revision.AuthorDate : revision.CommitDate), ")");
                    authorText = string.Empty;
                }
                e.Graphics.DrawString(authorText, rowFont, foreBrush,
                    new PointF(gravatarLeft + gravatarSize + 5, gravatarTop + 6));
                e.Graphics.DrawString(timeText, rowFont, foreBrush,
                    new PointF(gravatarLeft + gravatarSize + 5, e.CellBounds.Bottom - textHeight - 4));
            }
        }
        else if (columnIndex == authorColIndex)
        {
            var text = (string)e.FormattedValue;
            e.Graphics.DrawString(text, rowFont, foreBrush,
                new PointF(e.CellBounds.Left, e.CellBounds.Top + 4));
        }
        else if (columnIndex == dateColIndex)
        {
            var time = AppSettings.ShowAuthorDate ? revision.AuthorDate : revision.CommitDate;
            var text = TimeToString(time);
            e.Graphics.DrawString(text, rowFont, foreBrush,
                new PointF(e.CellBounds.Left, e.CellBounds.Top + 4));
        }
        else if (columnIndex == idColIndex)
        {
            if (!revision.IsArtificial)
            {
                // do not show artificial GUID
                var text = revision.Guid;
                var rect = RevisionGridUtils.GetCellRectangle(e);
                RevisionGridUtils.DrawColumnText(e.Graphics, text, _fontOfSHAColumn,
                    foreColor, rect);
            }
        }
        else if (columnIndex == BuildServerWatcher.BuildStatusImageColumnIndex)
        {
            BuildInfoDrawingLogic.BuildStatusImageColumnCellPainting(e, revision);
        }
        else if (columnIndex == BuildServerWatcher.BuildStatusMessageColumnIndex)
        {
            BuildInfoDrawingLogic.BuildStatusMessageCellPainting(e, revision, foreColor, rowFont);
        }
        else if (AppSettings.ShowIndicatorForMultilineMessage && columnIndex == isMsgMultilineColIndex)
        {
            var text = (string)e.FormattedValue;
            e.Graphics.DrawString(text, rowFont, foreBrush,
                new PointF(e.CellBounds.Left, e.CellBounds.Top + 4));
        }
    }
}

private bool ShouldHighlightRevisionByAuthor(GitRevision revision)
{
    return AppSettings.HighlightAuthoredRevisions &&
           AuthorEmailEqualityComparer.Instance.Equals(revision.AuthorEmail,
               _revisionHighlighting.AuthorEmailToHighlight);
}

private static bool ShouldRenderAlternateBackColor(int rowIndex)
{
    return AppSettings.RevisionGraphDrawAlternateBackColor && rowIndex % 2 == 0;
}

private float DrawRef(DrawRefArgs drawRefArgs, float offset, string name, Color headColor, ArrowType arrowType, bool dashedLine = false, bool fill = false)
{
    var textColor = fill ? headColor : Lerp(headColor, Color.White, 0.5f);
    if (IsCardLayout())
    {
        using (Brush textBrush = new SolidBrush(textColor))
        {
            string headName = name;
            offset += drawRefArgs.Graphics.MeasureString(headName, drawRefArgs.RefsFont).Width + 6;
            var location = new PointF(drawRefArgs.CellBounds.Right - offset, drawRefArgs.CellBounds.Top + 4);
            var size = new SizeF(drawRefArgs.Graphics.MeasureString(headName, drawRefArgs.RefsFont).Width,
                drawRefArgs.Graphics.MeasureString(headName, drawRefArgs.RefsFont).Height);
            if (fill)
            {
                drawRefArgs.Graphics.FillRectangle(SystemBrushes.Info, location.X - 1,
                    location.Y - 1, size.Width + 3, size.Height + 2);
            }
            drawRefArgs.Graphics.DrawRectangle(SystemPens.InfoText, location.X - 1,
                location.Y - 1, size.Width + 3, size.Height + 2);
            drawRefArgs.Graphics.DrawString(headName, drawRefArgs.RefsFont, textBrush, location);
        }
    }
    else
    {
        string headName = IsFilledBranchesLayout()
            ? name
            : string.Concat("[", name, "] ");
        var headBounds = AdjustCellBounds(drawRefArgs.CellBounds, offset);
        SizeF textSize = drawRefArgs.Graphics.MeasureString(headName, drawRefArgs.RefsFont);
        offset += textSize.Width;
        if (IsFilledBranchesLayout())
        {
            offset += 9;
            float extraOffset = DrawHeadBackground(drawRefArgs.IsRowSelected, drawRefArgs.Graphics,
                headColor, headBounds.X,
                headBounds.Y,
                RoundToEven(textSize.Width + 3),
                RoundToEven(textSize.Height), 3,
                arrowType, dashedLine, fill);
            offset += extraOffset;
            headBounds.Offset((int)(extraOffset + 1), 0);
        }
        RevisionGridUtils.DrawColumnText(drawRefArgs.Graphics, headName, drawRefArgs.RefsFont, textColor, headBounds);
    }
    return offset;
}

The commit information tab has two different layouts as well, and these are also good candidates for alternate implementations.

Hope this makes sense.

Part of the issue I see is the coupling of view with model. But, that's WinForms's default mode unfortunately. Classes with thousands of lines of (non-generated) code.

Yes, this makes it close to impossible to use a different UI framework. Hence I've started moving certain parts of logic into controller classes, which should dumb-down UI controls.
The controller classes also make testing easier.
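To illustrate why this helps with testing, here is a small sketch that lifts one decision out of a paint handler into a class with no WinForms dependency. The names RowBackground and RowStyleController are hypothetical and not existing GE types; the decision order simply mirrors the background selection in the cell painting code quoted earlier.

```csharp
using System;

// Hypothetical names throughout; this is an illustration, not GE code.
public enum RowBackground
{
    Selected,
    Authored,
    Alternate,
    Default
}

public sealed class RowStyleController
{
    private readonly bool _drawAlternateBackColor;

    public RowStyleController(bool drawAlternateBackColor)
    {
        _drawAlternateBackColor = drawAlternateBackColor;
    }

    // Pure decision logic: the UI control maps the result to a brush.
    public RowBackground GetBackground(int rowIndex, bool isSelected, bool isAuthoredByHighlightedUser)
    {
        if (isSelected)
        {
            return RowBackground.Selected;
        }

        if (isAuthoredByHighlightedUser)
        {
            return RowBackground.Authored;
        }

        if (_drawAlternateBackColor && rowIndex % 2 == 0)
        {
            return RowBackground.Alternate;
        }

        return RowBackground.Default;
    }
}
```

A unit test can exercise every branch by passing plain values, without creating a DataGridView or a Graphics object.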

So thanks for taking the time to share your history and perspective.... Reading through the code there's a sense of forensic analysis via which that story is told.

Indeed, some parts of the codebase are fascinating and some parts feel like crawling through an abandoned mine about to cave in.... 😆
I've only been part of this project for the past few years. @spdr870, @KindDragon, @jbialobr and the community did a tremendous job building the app.

@drewnoakes drewnoakes force-pushed the object-id branch 3 times, most recently from e08ae10 to 73963ef Compare March 21, 2018 16:50
@drewnoakes drewnoakes force-pushed the object-id branch 2 times, most recently from 72244cb to 5553691 Compare March 28, 2018 09:16
@drewnoakes drewnoakes force-pushed the object-id branch 2 times, most recently from 2cc9017 to 73fae02 Compare April 9, 2018 12:36
I had some USB device plugged in which took D: and ran very slowly,
causing tests to stall here. C: is safer.
In this commit we start to see calls to ObjectId.Parse being replaced
with the pre-parsed value. This pattern will continue as more APIs
are migrated to ObjectId.
@RussKie
Member

RussKie commented Apr 10, 2018 via email

@drewnoakes
Member Author

I need this for another PR so have folded it into that.

@KindDragon
Contributor

I think it would be better to use libgit2sharp and its ObjectId class

@drewnoakes
Member Author

I think it would be better to use libgit2sharp and its ObjectId class

That class uses a tonne more memory than just a 40-character string. We can do better.

I'll create a PR once a few more things are sorted.

@KindDragon
Contributor

That class uses a tonne more memory than just a 40-character string.

Only 1.5 times more :)
You can use GitOid; it uses half as much memory as a string

@RussKie
Member

RussKie commented Apr 12, 2018 via email

@KindDragon
Contributor

I think we just need libgit2sharp for the revision graph

@drewnoakes
Member Author

I will submit a PR soon that reworks the revision graph data loading to use much less memory and CPU.

@drewnoakes
Member Author

Only 1.5 times more :)
You can use GitOid; it uses half as much memory as a string

There is a fixed cost associated with each additional object, so the number is slightly higher. The child objects also hurt in terms of indirection and locality of reference.
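One way to avoid both the per-object cost and the child object is to keep the 160 bits of the hash directly in the fields of a value type. The sketch below is only an illustration under that assumption; the name Sha1Value and the five-field layout are hypothetical and not necessarily what the coming PR will contain.

```csharp
using System;

// Hypothetical sketch: five 32-bit fields hold exactly the 160 bits of a
// SHA-1 hash, with no separately allocated byte[] to follow.
public readonly struct Sha1Value : IEquatable<Sha1Value>
{
    private readonly uint _i1;
    private readonly uint _i2;
    private readonly uint _i3;
    private readonly uint _i4;
    private readonly uint _i5;

    public Sha1Value(byte[] bytes)
    {
        if (bytes == null || bytes.Length != 20)
        {
            throw new ArgumentException("Expected exactly 20 bytes.", nameof(bytes));
        }

        _i1 = ReadBigEndian(bytes, 0);
        _i2 = ReadBigEndian(bytes, 4);
        _i3 = ReadBigEndian(bytes, 8);
        _i4 = ReadBigEndian(bytes, 12);
        _i5 = ReadBigEndian(bytes, 16);
    }

    // Big-endian so that ToString reproduces the bytes in their original order.
    private static uint ReadBigEndian(byte[] b, int i) =>
        ((uint)b[i] << 24) | ((uint)b[i + 1] << 16) | ((uint)b[i + 2] << 8) | (uint)b[i + 3];

    public bool Equals(Sha1Value other) =>
        _i1 == other._i1 && _i2 == other._i2 && _i3 == other._i3 &&
        _i4 == other._i4 && _i5 == other._i5;

    public override bool Equals(object obj) => obj is Sha1Value other && Equals(other);

    // Any 32 bits of a cryptographic hash already make a reasonable hash code.
    public override int GetHashCode() => unchecked((int)_i1);

    public override string ToString() => $"{_i1:x8}{_i2:x8}{_i3:x8}{_i4:x8}{_i5:x8}";
}
```

Equality here is five integer comparisons on data stored inline, which is where the locality-of-reference point comes in: there is no pointer to chase to reach the hash bytes.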

I am working with a solution that's lighter and faster. I looked at libgit2sharp before and, at least for this type, believe a custom solution will be better.

I'll include a lot more detail in the coming PR and we can discuss there.

@drewnoakes drewnoakes deleted the object-id branch June 2, 2018 16:36
4 participants