Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commit data #4641

Merged
merged 12 commits into from Mar 19, 2018
Merged

Commit data #4641

merged 12 commits into from Mar 19, 2018

Conversation

drewnoakes
Copy link
Member

Review of CommitData and CommitDataManager classes.

Changes proposed in this pull request:

  • Introduce GitModule.IsGitErrorMessage method for standard detection of error messages in output strings
  • Extract common code into helper methods
  • Fix git response validation
  • Annotate API
  • Minor perf and style fixes
    • Remove redundant parameter
    • Use 'out' over 'ref'
    • Reorder members

What did I do to test the code and ensure quality:

  • Manual testing
  • Unit testing

Has been tested on:

  • GIT 2.16
  • Windows 10

@drewnoakes
Copy link
Member Author

I was wondering if it makes sense to split the commit message from any notes. Currently they're both merged into the Body property. Really the commit message is immutable for a given SHA-1. It's only the notes that can change. The code could be modified to reflect this more clearly.

@codecov
Copy link

codecov bot commented Mar 16, 2018

Codecov Report

Merging #4641 into master will increase coverage by 0.01%.
The diff coverage is 20%.

@@            Coverage Diff             @@
##           master    #4641      +/-   ##
==========================================
+ Coverage   26.68%   26.69%   +0.01%     
==========================================
  Files         501      501              
  Lines       40752    40713      -39     
  Branches     5961     5957       -4     
==========================================
- Hits        10873    10869       -4     
+ Misses      29379    29344      -35     
  Partials      500      500
Impacted Files Coverage Δ
GitCommands/Git/GitModule.cs 8.54% <0%> (ø) ⬆️
GitUI/UserControls/RevisionGrid.cs 8.7% <0%> (ø) ⬆️
GitUI/CommandsDialogs/FormBrowse.cs 5% <0%> (ø) ⬆️
GitUI/CommandsDialogs/FormFileHistory.cs 13.4% <0%> (+0.04%) ⬆️
ResourceManager/LocalizationHelpers.cs 16.8% <0%> (+0.41%) ⬆️
GitUI/CommitInfo/CommitInfo.cs 11.66% <0%> (+0.03%) ⬆️
GitCommands/CommitData.cs 100% <100%> (ø) ⬆️
...nitTests/GitCommandsTests/CommitDataManagerTest.cs 100% <100%> (ø) ⬆️
GitCommands/CommitDataManager.cs 42.37% <17.39%> (+11.26%) ⬆️
UnitTests/CommonTestUtils/GitModuleTestHelper.cs 70% <0%> (-2.5%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9949704...f2f9389. Read the comment docs.

/// <returns><c>true</c> if the command detailed an error, otherwise <c>false</c>.</returns>
public static bool IsGitErrorMessage(string gitOutput)
{
return Regex.IsMatch(gitOutput, @"^\s*(error:|fatal)");
Copy link
Contributor

@mdonatas mdonatas Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's more efficient to have static readonly instances of Regex created with .Compiled option whenever possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's times like this I wish C# had static locals, like C++ does. Seems a shame to introduce a type-level field for something that's only used in one member. We now have nested functions. Why not nested static fields too?

I'd be interested in some benchmark results here, as well as info on runtime compilation times (consider it might cache DynamicMethod internally or similar). I suspect the difference is negligible, especially for this regex pattern with little seeking about. And there's a trade-off: the compilation of such static Regex objects would happen when their containing type first loads, meaning more latency when GE starts up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the wins are probably low since I guess this method is called rarely but in a tight loop this would cause a measurable GC pressure since it appears each static call creates 208B of objects to clean up.

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up. Given we're talking nanos (not even micros) I don't think there's much to do here. Interesting to see the difference all the same. Allocation stats are interesting too. Will keep that in mind for future.

Are there any tight loops in GE? I can't think of any off the top of my head from what I've seen so far.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.. I was glad and disappointed at the same time that execution took only nanos :D CPU perf wise this is absolutely not an issue, so only mem might be of a concern (not in this case though).
At some future time I'll do some benchmarking with proper tools to see where actual perf-hogs are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RussKie, sorry I missed your comment.

Dealing with any code base is like playing chess. There is always a move :) One must think both short and long term.

Now that I've spent a bit of time with GE I feel like my first proper contribution will be around consolidating git process execution, and the strategy for GitModule refactoring (#4643). I see a series of moves that get us from where we are now to a point where the code is clearer, with better abstractions and is more testable. The design and coding is the easy part :) The effort on my part is to communicate effectively and ensure the solution meets the goals of the team, including the goals I don't yet see.

I have a working implementation of the git process execution code and would like to get it in front of you all. It's blocked by the few open PRs here (#4641, #4639 and especially #4662), so I want to do whatever I can to get these merged or closed and move forward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to get to them as soon as can.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effort on my part is to communicate effectively and ensure the solution meets the goals of the team, including the goals I don't yet see.

👍
There are few things which are important to me in any project, applicable to both UI and API - consistency, usability, readability, testability and maintainability. These are interconnected and compliment one another.
SOLID principles seem to help a lot in achieving the above, hence I'm so obsessed with them :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dealing with any code base is like playing chess. There is always a move :) One must think both short and long term.

Yes, however OS has its caveats - if at (paid) work one has a greater degree of commitment and hence can afford greater degree of trust in promises of future delivery, in the OS world the degree of commitment is lower - family, paid work and other real life commitments have higher priorities (understandably so). People also lose interest...
In this case a promise of future refactor or rework may never be fulfilled, and the codebase would left with a rot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People also lose interest...

I've run open source projects for a long while and know how it goes. GitHub is filled with well intentioned half-promises to look into things and, well, life gets in the way or it turns out to be harder than expected, or whatever.

In my case I hope you can tell that I'm at least a bit invested in and committed to this project. How long that will last, I don't know. But for now I have energy and time.

But I don't think that matters. These PRs all improve the code along at least one of the dimensions you mentioned, leaving things in better shape than they were beforehand. I don't see any unfulfilled promise that would lead to rot, but maybe I suffer from a biased perspective.

GE isn't very testable right now. I'm happy to help kick things in that direction.

Oh and thanks for your kind words earlier. GE is also extremely lucky to have someone as committed as you taking the lead and driving things forward. I know how much time it takes.

public CommitData CreateFromRevision(GitRevision revision)
{
if (revision == null)
{
throw new ArgumentNullException(nameof(revision));
}

CommitData data = new CommitData(revision.Guid, revision.TreeGuid, revision.ParentGuids.ToList().AsReadOnly(),
return new CommitData(revision.Guid, revision.TreeGuid, revision.ParentGuids.ToList().AsReadOnly(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are optimizing.. string.Format could be replaced with string.Concat. string.Format has a rather high overhead of parsing a template. In this particular case strings can simply be concatenated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some BenchmarkDotNet measurements and the differences aren't as high as I thought. string.Concat is faster but by only ~20%. 254ms vs 308ms in my specific test case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Differences as percentages for this kind of thing seem more significant than they are in reality. A handful of milliseconds difference for a million operations isn't worth worrying about IMO. For that many operations, you're churning so much memory that you have other problems. Go for whatever is easiest to read. In this case, I would convert the lines just beneath your comment to string interpolation. There are many places in the current code that I'd use interpolation just for the clarity it brings, but that's another PR for another day.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your position for high level code / code which is executed relatively rarely. But for code which is executed 1000s of times every single optimization counts because these small seemingly unimportant details add up.
In this specific case it seems I've made a mistake assuming this would be executed on all parsed history revisions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go for whatever is easiest to read

Amen. I choose readability and maintainability over optimisation unless it is proven to be a bottle neck

@RussKie RussKie added this to the 3.00 milestone Mar 16, 2018
@RussKie
Copy link
Member

RussKie commented Mar 16, 2018 via email

Copy link
Member

@RussKie RussKie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do partial squash(es) of benign/non functional/uninteresting changes (e.g. typos, decorations etc) to reduce the visual noise on the trunk?

@@ -3,7 +3,7 @@

namespace GitCommands
{
public class CommitData
public sealed class CommitData
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious as to why you chose to seal DTO/POCO?
I typically seal behavioural types since I don't like inheritance and can't guarantee that subclasses would not break the expected behaviours. However I leave POCOs open for extension.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to seal everything that doesn't have subclasses, even in library code. Once someone uses your API, you can unseal without breaking code, but not the other way around.

I rarely use inheritance when extending capabilities. Composition tends to be more flexible and powerful. When I do use inheritance the base class would have something virtual/abstract most of the time.

While there's no agreed upon definition of POCO, I generally think of a type that doesn't have any real behaviour, nor participate in any inheritance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
My experience is different. When you have a single version of software - it may be acceptable to rework the API. However it is a completely different story when you have multiple versions of API consumed by numerous projects...

@@ -15,17 +16,20 @@ public interface ICommitDataManager
/// <param name="data">Data produced by a <c>git log</c> or <c>git show</c> command where <c>--format</c>
/// was provided the string <see cref="CommitDataManager.LogFormat"/>.</param>
/// <returns>CommitData object populated with parsed info from git string.</returns>
CommitData CreateFromFormatedData(string data);
[NotNull]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of expect to be laughed at for a noob question, but - what do we get with these attribute decorations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a minimum, these attributes serve as documentation of intent around whether values can or can not be null.

If you use R#, then they enable warnings about potential NullReferenceExceptions, (eg. passing null to a method that says it cannot be null, or using a nullable return value without checking first), and redundant code (eg. checking something for null that can never be null).

The act of annotating methods also makes you think a little more about a method's behaviour, or a field's invariants.

Hopefully in C# 8 support for nullable reference types will land, then the attributes can be converted to actual language constructs.

That is:

// before

[Nullable] private Bar _bar;
[NotNull] private Baz _baz;

[Nullable] public string Foo([NotNull] string s) { ... }

// after

private Bar? _bar;
private Baz _baz;

public string? Foo(string s) { ... }

Annotating now will help with this migration later.

The above covers [Nullable] and [NotNull] annotations, which are the most common. There are also:

  • [Pure] for a method with no observable side effects (can be hoisted, and return value should not be discarded)
  • [InstantHandle] for IEnumerable<T> and delegate parameters that are fully evaluated before the method returns
  • [ItemNotNull] and [ItemNullable] for collection and Task<T> types
  • [ContractAnnotation] that specifies how inputs and outputs relate to each other (eg. if the input is null the method will return false, allowing R# to flag something like if (!FalseIfNull(notNullValue)) { /* unreachable code */ } and so forth)

You can read more about them here.

All these attributes are defined conditionally meaning they do not end up in the compiled binary, so there's no overhead to using them.

@@ -206,6 +179,25 @@ public CommitData CreateFromRevision(GitRevision revision)
revision.Body ?? revision.Subject);
}

[ContractAnnotation("=>false,error:notnull,data:null")]
[ContractAnnotation("=>true,error:null,data:notnull")]
private bool TryGetCommitLog([NotNull] string commitId, [NotNull] string format, out string error, out string data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would move it after GetModule() to help with A-Z order and over maintainability

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll move it.

My rationale for that positioning: It's a private method only used by the two methods immediately before it. They're logically related to one another. GetModule is in turn used by that, so introduced later still. When there are only a few members to a type, I tend to order them top to bottom. I've seen other codebases (three.js comes to mind) where members are ordered bottom to top, so as you read down the class you see the definitions before they're used.

@vbjay
Copy link
Contributor

vbjay commented Mar 17, 2018 via email

@drewnoakes
Copy link
Member Author

try profiling the load of the settings dialog - VS shows half a dozen of GC runs when the dialog is being opened...

I'll try profiling the settings panel in dotMemory and see what comes back.

@drewnoakes
Copy link
Member Author

Is it possible to do partial squash(es) of benign/non functional/uninteresting changes (e.g. typos, decorations etc) to reduce the visual noise on the trunk?

Sure, I'll squash some stuff. Would you be against just squashing the whole PR?

This is a combination of 9 commits:
- Use params overload
- Preconditions first
- Inline temporaries
- Use IReadOnlyList<T> over ReadOnlyCollection<T>
- Avoid temporary string and repeat lookup
- Reorder properties
- Seal DTO
- Replace consts with named arguments
- Reorder members to place consts before fields
Old test (IndexOf + 40) meant (index < 0) could never trigger.
Also (IndexOf + 40 > Length) makes little sense.

Just need to verify that output string contains the requested SHA-1.
The sha1 parameter must match CommitData.Guid.
There's no point passing it. It only creates opportunity for bugs.
Note this shows how many callers ignore any error message.
Also rename UpdateCommitMessage to UpdateBody, as the actual
property updated on CommitInfo is Body.
@drewnoakes
Copy link
Member Author

Changes done.

@RussKie
Copy link
Member

RussKie commented Mar 18, 2018

Sure, I'll squash some stuff. Would you be against just squashing the whole PR?

The PR contains a number of changes - some functional, some aren't. Squashing them all into the same commit doesn't feel right.
I'd like to see functional changes separate from non-functional.

@drewnoakes
Copy link
Member Author

@RussKie I did actually squash quite a few of the commits here down as far as I could go without hitting conflicts big enough to discourage me from spending a lot of time on it. I get where you're coming from and will endeavour to make fewer commits in future. For my own projects I don't worry, and usually appreciate small commits when discovering bugs and bisecting. But if you're concerned about noise in the history, I'll go coarser.

try profiling the load of the settings dialog - VS shows half a dozen of GC runs when the dialog is being opened

I ran a memory profile while opening settings and the majority of the heap allocations were strings related to XML parsing.

If there's anything else you need from this PR, please let me know.

@RussKie RussKie merged commit 6800ecf into gitextensions:master Mar 19, 2018
@drewnoakes drewnoakes deleted the commit-data branch March 19, 2018 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants