New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements #249

Open
bhollis opened this Issue Jul 7, 2017 · 12 comments

Comments

Projects
None yet
8 participants
@bhollis

bhollis commented Jul 7, 2017

Hi there - I'm opening this in hopes I (and others) can be pointed towards areas of the codebase or concrete tasks that can be done to improve Spotbugs' performance going forward. Today (using FindBugs) on large codebases analysis can take in excess of 15 minutes on a powerful machine, so any improvement there would be a huge benefit.

@KengoTODA

This comment has been minimized.

Show comment
Hide comment
@KengoTODA

KengoTODA Jul 11, 2017

Member

Note that several ideas are listed in #128

Member

KengoTODA commented Jul 11, 2017

Note that several ideas are listed in #128

@KengoTODA

This comment has been minimized.

Show comment
Hide comment
@KengoTODA

KengoTODA Jul 11, 2017

Member

I personally believe that multi-threading should be the most effective way, because SpotBugs is CPU bound but it uses only one thread.

Member

KengoTODA commented Jul 11, 2017

I personally believe that multi-threading should be the most effective way, because SpotBugs is CPU bound but it uses only one thread.

@iloveeclipse

This comment has been minimized.

Show comment
Hide comment
@iloveeclipse

iloveeclipse Jul 11, 2017

Member

Please note that BCEL Repository class is static and AFAIK not MT safe, so multi-threading will most likely lead to unpredictable issues by design.

The best way to get performance boost is to get rid of BCEL, but this is nearly impossible.

Member

iloveeclipse commented Jul 11, 2017

Please note that BCEL Repository class is static and AFAIK not MT safe, so multi-threading will most likely lead to unpredictable issues by design.

The best way to get performance boost is to get rid of BCEL, but this is nearly impossible.

@KengoTODA

This comment has been minimized.

Show comment
Hide comment
@KengoTODA

KengoTODA Jul 11, 2017

Member

Ya we need many tasks to handle, including designing detector API which doesn't depend on BCEL & redesign SpotBugs core...

Member

KengoTODA commented Jul 11, 2017

Ya we need many tasks to handle, including designing detector API which doesn't depend on BCEL & redesign SpotBugs core...

@piderman314

This comment has been minimized.

Show comment
Hide comment
@piderman314

piderman314 Jul 11, 2017

Contributor

Project.addSourceDir is very inefficient if called multiple times in a row, since it creates a SourceFinder every call which in turn checks for all files in the source if they exist. This is quadratic complexity. This also happens when reading a findbugs result file for example in the Jenkins plugin. One of our projects had 10,000 sources files (listed separately, this is standard gradle Findbugs behaviour) resulting in Findbugs checking if files existed 50 million times :(

Contributor

piderman314 commented Jul 11, 2017

Project.addSourceDir is very inefficient if called multiple times in a row, since it creates a SourceFinder every call which in turn checks for all files in the source if they exist. This is quadratic complexity. This also happens when reading a findbugs result file for example in the Jenkins plugin. One of our projects had 10,000 sources files (listed separately, this is standard gradle Findbugs behaviour) resulting in Findbugs checking if files existed 50 million times :(

@iloveeclipse

This comment has been minimized.

Show comment
Hide comment
@iloveeclipse

iloveeclipse Jul 11, 2017

Member

Project.addSourceDir is very inefficient if called multiple times in a row

Please create a dedicated issue for that, and if you can, please contribute a patch :-)

Member

iloveeclipse commented Jul 11, 2017

Project.addSourceDir is very inefficient if called multiple times in a row

Please create a dedicated issue for that, and if you can, please contribute a patch :-)

@sewe

This comment has been minimized.

Show comment
Hide comment
@sewe

sewe Jul 11, 2017

Contributor

Ya we need many tasks to handle, including designing detector API which doesn't depend on BCEL & redesign SpotBugs core...

We already have a detector API that does not depend on BCEL: Detector2.

I’ve written some (unpublished) WALA-based detectors using it and so far am quite happy. WALA itself is thread-safe, but of course SpotBugs doesn‘t yet run multiple detectors simultaneously. Also, I haven’t yet thought very much about the thread-safety properties of the IAnalysisCache.

Contributor

sewe commented Jul 11, 2017

Ya we need many tasks to handle, including designing detector API which doesn't depend on BCEL & redesign SpotBugs core...

We already have a detector API that does not depend on BCEL: Detector2.

I’ve written some (unpublished) WALA-based detectors using it and so far am quite happy. WALA itself is thread-safe, but of course SpotBugs doesn‘t yet run multiple detectors simultaneously. Also, I haven’t yet thought very much about the thread-safety properties of the IAnalysisCache.

@iloveeclipse

This comment has been minimized.

Show comment
Hide comment
@iloveeclipse

iloveeclipse Jul 11, 2017

Member

Long time ago I've tried to parallelize the analysis but as said, the problem was BCEL static code cache. So I've tried to port BCEL parts to ASM, and this was a complete disaster (my attempt was very naive, without changing existing algorithms, just to do a naive replacement for BCEL classes via ASM Tree API).

So yes, Ideally one would detectors ony by one to use ASM but this is HUGE amount of work for existing detectors. Also removal of BCEL will break compatibility with almost every 3rd party FB detector.

Member

iloveeclipse commented Jul 11, 2017

Long time ago I've tried to parallelize the analysis but as said, the problem was BCEL static code cache. So I've tried to port BCEL parts to ASM, and this was a complete disaster (my attempt was very naive, without changing existing algorithms, just to do a naive replacement for BCEL classes via ASM Tree API).

So yes, Ideally one would detectors ony by one to use ASM but this is HUGE amount of work for existing detectors. Also removal of BCEL will break compatibility with almost every 3rd party FB detector.

@jsotuyod

This comment has been minimized.

Show comment
Hide comment
@jsotuyod

jsotuyod Jul 11, 2017

Member

Please note that BCEL Repository class is static and AFAIK not MT safe, so multi-threading will most likely lead to unpredictable issues by design.

org.apache.bcel.Repository just delegates to the implementation of org.apache.bcel.util.Repository (set through org.apache.bcel.Repository.setRepository), so we could theoretically make it thread-safe. We actually use a custom Repository, URLClassPathRepository, which we could make thread-safe.

The only non-atomic operations performed on the repository seem to be:

final JavaClass old = repository.findClass(clazz.getClassName());
repository.storeClass(clazz);

and

final ClassPath path = repository.getClassPath();
if (path == null) {
    return null;
}
return path.getClassFile(class_name);

Both seem manageable with proper implementations of our repository.

Given we are read-only on class files, this should be ok.

All databases are also shared and would therefore need to be thread-safe.

Moving SpotBugs to a multi-threaded environment will certainly be hard and challenging, but I don't foresee any technical impossibilities.

Member

jsotuyod commented Jul 11, 2017

Please note that BCEL Repository class is static and AFAIK not MT safe, so multi-threading will most likely lead to unpredictable issues by design.

org.apache.bcel.Repository just delegates to the implementation of org.apache.bcel.util.Repository (set through org.apache.bcel.Repository.setRepository), so we could theoretically make it thread-safe. We actually use a custom Repository, URLClassPathRepository, which we could make thread-safe.

The only non-atomic operations performed on the repository seem to be:

final JavaClass old = repository.findClass(clazz.getClassName());
repository.storeClass(clazz);

and

final ClassPath path = repository.getClassPath();
if (path == null) {
    return null;
}
return path.getClassFile(class_name);

Both seem manageable with proper implementations of our repository.

Given we are read-only on class files, this should be ok.

All databases are also shared and would therefore need to be thread-safe.

Moving SpotBugs to a multi-threaded environment will certainly be hard and challenging, but I don't foresee any technical impossibilities.

@patric-r

This comment has been minimized.

Show comment
Hide comment
@patric-r

patric-r Sep 29, 2017

Not sure about the state regarding the old findbugs issues and if this has already been addressed here but I'd like to point out an easy win regarding performance improvements in the eclipse plugin:

findbugsproject/findbugs#84

patric-r commented Sep 29, 2017

Not sure about the state regarding the old findbugs issues and if this has already been addressed here but I'd like to point out an easy win regarding performance improvements in the eclipse plugin:

findbugsproject/findbugs#84

@iloveeclipse

This comment has been minimized.

Show comment
Hide comment
@iloveeclipse

iloveeclipse Sep 29, 2017

Member

@patric-r : feel free to provide a PR. I already commented on original issue what need to be done.

Member

iloveeclipse commented Sep 29, 2017

@patric-r : feel free to provide a PR. I already commented on original issue what need to be done.

@shevek

This comment has been minimized.

Show comment
Hide comment
@shevek

shevek May 15, 2018

We are losing about 40 minutes to spotbugs in our gradle build. Threading would help, as we're on Xeon and about to move to Epyc.

Also, we find a very high penalty for accessing tree-like data structures because it looks to the Intel architecture like a random access order in memory, which can carry a 200:1 performance penalty, but is closer to 5:1 in our tree algorithm tests. The trick is to write detectors as coroutines and have a single tree-walk driver.

shevek commented May 15, 2018

We are losing about 40 minutes to spotbugs in our gradle build. Threading would help, as we're on Xeon and about to move to Epyc.

Also, we find a very high penalty for accessing tree-like data structures because it looks to the Intel architecture like a random access order in memory, which can carry a 200:1 performance penalty, but is closer to 5:1 in our tree algorithm tests. The trick is to write detectors as coroutines and have a single tree-walk driver.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment