Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitModule Encoding lazy encoding #6359

Merged

Conversation

gerhardol
Copy link
Member

@gerhardol gerhardol commented Mar 10, 2019

Part of #6357

Proposed changes

This fixes the performance problem I see in my "real" repos even if the core problem should be fixed too.

Test methodology

See #6357 for a test setup
This is verified by git-rev-parse calls for only the modules with changes

Test environment(s)

See #6357 , a submodule with changes required

✒️ I contribute this code under The Developer Certificate of Origin.

@ghost ghost assigned gerhardol Mar 10, 2019
@ghost ghost added the status: ready label Mar 10, 2019
@gerhardol
Copy link
Member Author

CodeFactor reports complexity on methods where the signature was changed only, no logic changed.
I propose those are handled separately.

@codecov
Copy link

codecov bot commented Mar 10, 2019

Codecov Report

Merging #6359 into master will increase coverage by 0.05%.
The diff coverage is 65%.

@@            Coverage Diff             @@
##           master    #6359      +/-   ##
==========================================
+ Coverage   46.62%   46.67%   +0.05%     
==========================================
  Files         687      687              
  Lines       51666    51670       +4     
  Branches     6807     6807              
==========================================
+ Hits        24087    24118      +31     
+ Misses      26268    26222      -46     
- Partials     1311     1330      +19
Flag Coverage Δ
#production 36.14% <12.5%> (+0.07%) ⬆️
#tests 97.53% <100%> (-0.03%) ⬇️

Copy link
Member

@RussKie RussKie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think the proposed API changes are safe, and in the long run they may become very problematic.

If we wish to stop re-querying the encoding, how about we pass encoding as an optional parameter when constructing a GitModule instance?

GitCommands/Git/GitModule.cs Outdated Show resolved Hide resolved
GitCommands/Submodules/SubmoduleStatusProvider.cs Outdated Show resolved Hide resolved
@gerhardol
Copy link
Member Author

@RussKie @amaiorano
Any comments? I would like to merge this and base the changes @jbialobr is doing on top of it
(this change basically solves the problem for me with a little under 20 submodules in total (the problem itself can easily be reproduced with more submodules, but this helps).

@amaiorano
Copy link
Contributor

amaiorano commented Mar 19, 2019

@RussKie @amaiorano
Any comments? I would like to merge this and base the changes @jbialobr is doing on top of it
(this change basically solves the problem for me with a little under 20 submodules in total (the problem itself can easily be reproduced with more submodules, but this helps).

I've looked over the diffs here in the browser, and I don't really understand this change. For instance, I see that GitModule.GetSingleDiff accepts an Encoding, and from that point on, it passes down a Lazy<Encoding> to PatchProcessor.CreatePatchesFromString, which eventually gets down to PatchProcessor.CreatePatchFromString, which might invoke filesContentEncoding.Value. What I don't understand is how this helps performance; isn't Encoding just a reference type? Passing it down the call stack shouldn't have any extra cost, right? I'm sure I'm just not seeing it...

@gerhardol
Copy link
Member Author

What I don't understand is how this helps performance

When Encoding is passed as a parameter, it is evaluated directly when calling, which requires a git-rev-parse call. However, Encoding is only needed when there are actual differences, so most Encoding calculations are not needed.

If Lazy is used, it is evaluated when it is first used.

@amaiorano
Copy link
Contributor

When Encoding is passed as a parameter, it is evaluated directly when calling, which requires a git-rev-parse call. However, Encoding is only needed when there are actual differences, so most Encoding calculations are not needed.

If Lazy is used, it is evaluated when it is first used.

You mean that for every function call where Encoding was being passed, it gets evaluated? Like when it first gets passed to GitModule.GetSingleDiff, is it evaluated then? And when it then gets passed to PatchProcessor.CreatePatchesFromString (before your change), it gets evaluated again? And what does evaluation mean? What gets called on Encoding? I don't understand because Encoding is a class, therefore a reference type, so passing an instance just means passing a reference around, right?

I feel like I'm missing something super obvious... sorry.

@vbjay
Copy link
Contributor

vbjay commented Mar 19, 2019

@amaiorano
Copy link
Contributor

amaiorano commented Mar 19, 2019

https://docs.microsoft.com/en-us/dotnet/api/system.lazy-1

Yes, I know what Lazy does, but in the specific use case I outline above, we create a Lazy around an instance that already exists and has been passed in:

            var patches = PatchProcessor.CreatePatchesFromString(patch, new Lazy<Encoding>(() => encoding)).ToList();

Normally, Lazy is provided with a factory function to create an instance lazily. That's what I don't understand here. If the instance has already been created, then what are we saving on? Passing a reference type as argument doesn't cost anything.

EDIT: So looking over the change again, I can see that in most places, it makes sense to use Lazy. I think the problem is that the very first diff that GitHub shows is for one where it doesn't help at all: the one in GitModule.GetSingleDiff, since the Lazy that gets created simply returns the input parameter. Okay, so it's all good! 👍 👍 👍

EDIT 2: And to clarify, the problem that this patch fixes is the instantiation of Encoding instances via the static Encoding.UTF8 property.

@gerhardol
Copy link
Member Author

gerhardol commented Mar 19, 2019

@amaiorano

public Encoding FilesEncoding => EffectiveConfigFile.FilesEncoding ?? new UTF8Encoding(false);

The EffectiveConfigFile getter will require git-rev-parse to get the current config.

As the original code passed FilesEncoding reference, EffectiveConfigFile had to be evaluated.

https://github.com/gitextensions/gitextensions/pull/6359/files#diff-05956b8e9b35344894a0ffe609cf32b9L2619

If the value is needed, not much other than to calculate it. So if all submodules are changed, this PR will not improve anything. However, the value is only required when there are changes (and the normal scenario with submodules is few changes, otherwise submodules are a pain):

https://github.com/gitextensions/gitextensions/pull/6359/files#diff-d832ec87de5903d3b8ffd40176d06050L273

For other than submodules, GE normally knows when there are changes. For submodules we cannot know (without calling other commands first).

@gerhardol
Copy link
Member Author

Unless there are protests, I will merge this tomorrow

The patch formatting may evaluate that no changes have been done
As the encoding is an input to patch formatting, speculative evaluation of
FileEncoding would require a separate git-rev-parse call.
Especially for evaluating submodule status changes (where there are normally no
 changes), this would slow down the I/O.
@RussKie
Copy link
Member

RussKie commented Mar 21, 2019 via email

@gerhardol gerhardol merged commit 7a5680d into gitextensions:master Mar 21, 2019
@ghost ghost removed the status: ready label Mar 21, 2019
@gerhardol gerhardol deleted the feature/i6357-lazy-encoding branch March 21, 2019 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants