Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 140 #142

Merged
merged 2 commits into from
Mar 1, 2020
Merged

Issue 140 #142

merged 2 commits into from
Mar 1, 2020

Conversation

bgreen1005
Copy link
Contributor

@bgreen1005 bgreen1005 commented Feb 21, 2020

140: This commit fixes issue 140 (#140) by adding back in the code that iterates through the file path and finds a file that matches. I have also optimized the lookup by getting all of the file paths in one go. This drastically improves performance for jacoco. One thing to note, I've added some code to log duplicates. This is due to the fact that Jacoco only reports the class name, not the directory and so duplicates are possible.

I noticed a massive performance improvement with these changes. Generating our code coverage report has dropped from about 8 minutes to 5 or 6 seconds.

I faced a few issues with the POM file, and had to update some dependencies. I wasn't sure if that was just a local problem, so haven't checked in the changes but I can do if need be.

I've also added a few fixmes due to unnecessary code in DefaultSourceFileResolver. Happy to fix if you'd like :)

…ates through the file path and finds a file that matches.
@bgreen1005 bgreen1005 changed the title 140: This commit fixes issue 140: https://github.com/jenkinsci/code-coverage-api-plugin/issues/140 Issue 140 Feb 21, 2020
Copy link
Contributor

@cizezsy cizezsy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your fix, I have left some comments :P

private Map<String, FilePath> createSourceFileMapping(FilePath workspace, TaskListener listener) {
try {
return Arrays
.stream(workspace.list("**/*.java"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plugin also supports other programming languages like JavaScript, C#, C++, so this might not work for these languages.

Maybe we can use another optimization method here. We used FilePath(workspace).list("**/" + sourceFilePath) to search source file for each source file before, it is really slow and does a lot of repeated work.

But, most of the source files have the same parent directory, we can use the wildcard to search the source file if we are unable to find it directly under the workspace and possiblePaths. Once we successfully find the source file by using the wildcard, we will add its parent path to possiblePaths so that we can reuse it to avoid search next time.

What do you think of this change? Thanks~

Copy link
Contributor Author

@bgreen1005 bgreen1005 Feb 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cizezsy Ahh sorry, I totally mean to change that from **/*.java to **/*, but i'm happy to make the changes you've suggested if you think that's preferable? Just to verify what it is you're looking for there: in tryFindSourceFile, instead of looking at the wildcard every time, we check to find the file in our list of possible paths. If we can't find it there, we load it with the wildcard and add the parent path to the list of possible paths?

Does doing that also resolve your other comment about doing the calculating in the agent instead of master? I guess it does?

It feels like that wouldn't give us anywhere near the performance benefits that the above change does (assuming I change **/*.java to **/*. What are your thoughts on that?

Copy link
Contributor

@cizezsy cizezsy Feb 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to verify what it is you're looking for there: in tryFindSourceFile, instead of looking at the wildcard every time, we check to find the file in our list of possible paths. If we can't find it there, we load it with the wildcard and add the parent path to the list of possible paths?

Yes.

Does doing that also resolve your other comment about doing the calculating in the agent instead of master? I guess it does?

Yes :)

It feels like that wouldn't give us anywhere near the performance benefits that the above change does (assuming I change **/.java to **/. What are your thoughts on that?

Yeah, you are right, that will not increase the performance compare with your changes. But, in your implementation, you used the file name as the key of sourceFileMapping. And sourceFilePath might not be a file name instead of a path of the source file(like Cobertura). You won't get a correct source file form the map in this situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cizezsy This change should have no impact on Corbertura at all. Previously at the end of tryFindSourceFile, we were just returning null. That will be exactly the same if the source file mapping isn't found in our map. Also, tryFindSourceFile has lots of code to handle the case where the source file is absolute, so Corbertura shouldn't need the fallback anyway I don't think. We have about 7000 classes, and this change has improved performance from about 10 minutes to 5 seconds. We'd really like it if we could keep those performance benefits 🤞

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgreen1005 Sounds reasonable to me :p

I will test the changes on my computer. If there are no other problems, we can get this merged. Thanks~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cizezsy Perfect! Thank you 🙂 Hopefully no issues 🤞

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cizezsy How's it looking? Have you had chances to check this? 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgreen1005 I will try it this weekend. Sorry for the delay, I am very busy this week :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cizezsy no problem 🙂 Thanks for the heads up.

@@ -78,8 +80,10 @@ public void resolveSourceFiles(Run<?, ?> run, FilePath workspace, TaskListener l

listener.getLogger().printf("%d source files need to be copied.%n", paints.size());

final Map<String, FilePath> sourceFileMapping = createSourceFileMapping(workspace, listener);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to do calculating in the agent to reduce the Jenkins master workload.

Copy link
Contributor Author

@bgreen1005 bgreen1005 Feb 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cizezsy Sorry for the stupid question, but would you mind explaining how you can tell which of this is run on master and which is run on agents?

Copy link
Contributor

@cizezsy cizezsy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your contribution~

(I think we need another PR to improve the case which two file have the same name, but we can get this merged now :P

@cizezsy cizezsy merged commit a29a9d5 into jenkinsci:dev Mar 1, 2020
@bgreen1005
Copy link
Contributor Author

Great thanks! Yeah I agree. We could do with being able to differentiate between potential duplicates. How often do you release?

@cizezsy
Copy link
Contributor

cizezsy commented Mar 9, 2020

@bgreen1005 I will perform a release this week :)

private Map<String, FilePath> createSourceFileMapping(FilePath workspace, TaskListener listener) {
try {
return Arrays
.stream(workspace.list("**/*"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why list all files in the workspace (which would include compiled binaries, node_modules, etc.) rather than only those referenced by the coverage report(s) being processed? See #150 for something that seems to be caused by this change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at this more closely, you will see that it is executed if the source file we are looking for was not found in the usual way.

And this code resolves source files not only for Cobertura, but also for formats like Jacoco, which does not have such a direct path to the source files.

I'll remove the warnings to don't disturb people

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as a fallback mechanism specifically -- and one that seems to be deliberately using just the filenames without paths -- it doesn't seem worth the noise to have all these duplicates get logged. Otherwise you're logging warnings for a map that might not even get used (and ideally would not, if all went well).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking in the code and don't see this as a fallback, but directly called. Am I missing something.
I'm investigating why this plugin is taking over 6 hours in CoverageProcessor.convertToResults, while the Cobertura plugin runs in only 21 seconds. ( I know it's this part of the code per the log out time start of "Publishing Coverage report..." vs the next output of "A total of X reports were found".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that mapping is always constructed -- which may be the cause of your problem -- but is only used if it cannot find a match from the file names/paths in the coverage reports. I believe that's why it was being described as a "fallback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants