Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression fixes #174

Merged
merged 15 commits into from
Jun 24, 2016

Conversation

johnou
Copy link

@johnou johnou commented Jun 21, 2016

No description provided.

@johnou
Copy link
Author

johnou commented Jun 22, 2016

@lukehutch seeing pretty bad performance with fast-classpath-scanner in our integration environment when everything is packaged into jars etc. scanning is taking up to a minute, when run from the IDE it's maybe a few seconds.. maybe I am using the API incorrectly or potentially some recursive scanning bug?

@johnou johnou force-pushed the feature/performance_regression2 branch from 59f7457 to 0364ff2 Compare June 22, 2016 14:48
@lukehutch
Copy link

Interesting, scanning should never take that long in any reasonable classpath. Please call .verbose() before scanning, and compare the output. There is scan time info in the log. Also you can compare the classpath output in the log by the different scans. Also please let me know if the performance regression happened between 1.9.18 and 1.9.19, there were some major internal changes in 1.9.19.

@lukehutch
Copy link

N.B. the change between 1.9.18 and 1.9.19 could cause slowdown in cases where you have a lot of implemented interfaces/superinterfaces and a lot of subclasses of a given class. If you find that downgrading to 1.9.18 fixes the performance regression, I'll optimize that part of the code.

@johnou
Copy link
Author

johnou commented Jun 23, 2016

1.9.20 FastClasspathScanner: *** Time taken by .scan(): 78806 ms ***
1.9.18 FastClasspathScanner: *** Time taken by .scan(): 121116 ms ***

@lukehutch
Copy link

@johnou Strange, do you have those numbers backwards? I thought you were saying that 1.9.20 was slower than older versions? But the numbers show it's 35% faster.

I just released 1.9.21, which should be significantly faster than 1.9.20 in cases where there are classes with many subclasses, some or all of which implement many interfaces.

@johnou
Copy link
Author

johnou commented Jun 23, 2016

One curious thing I have noticed is that somewhere it must be resolving a symlink but then double scanning jars..

FastClasspathScanner: Scanned jarfile /home/xxx/builds/develop-SNAPSHOT/libs/spring-tx-4.2.6.RELEASE.jar in 1369 msec
FastClasspathScanner: Scanned jarfile /home/xxx/builds/latest/libs/spring-tx-4.2.6.RELEASE.jar in 2 msec

/home/xxx/builds/latest -> /home/xxx/builds/develop-SNAPSHOT

@lukehutch
Copy link

@johnou if classes with the same name are listed twice on the classpath, the second and subsequent instance of that class name is shadowed by the first (meaning the class is not scanned, in keeping with the Java classpath spec). This is why the second jar in your example is scanned in 2 msec -- all the classfiles in the zipfile directory listing were already scanned the first time the jar was listed in the classpath. (This is why it took 1369 msec the first time, the actual classfile contents were read, not just the zipfile header.)

@johnou
Copy link
Author

johnou commented Jun 23, 2016

In response to "I thought you were saying that 1.9.20 was slower than older versions? But the numbers show it's 35% faster."

Eugh, my apologies, bad wording on my part. Regression with Orbit bootstrap after swapping over to fast class path scanner.

@lukehutch
Copy link

lukehutch commented Jun 23, 2016

@johnou: Several things:

  1. I don't understand your last comment -- please elaborate? I meant that the time you listed for 1.9.20 (78806 ms) is 35% faster than the time you listed for 1.9.18 (121116 ms). So you're not saying FastClasspathScanner got slower than it used to be, you're only saying that since you switched to using it (from something else), you noticed that there was a big difference in speed between running in the IDE and running in the integration environment?
  2. Please get version 1.9.21 (or the latest version in git), and let me know the total runtime for this version too, without .verbose() enabled. I introduced some performance enhancements in 1.9.21.
  3. Then please get the latest version in git (which has some big improvements to logging), and enable .verbose(), then save the stderr output from both the IDE environment and your integration environment. Either post the output here, or you could compare the two outputs yourself to see where the time is being taken up.

Thanks!

@johnou
Copy link
Author

johnou commented Jun 23, 2016

  1. My first message to you was worded badly, sounded like there was a regression in fastclasspath library but what I meant is bootstrap time for Orbit.
  2. No, I am pretty sure 1.9.20 scanned the classpath faster but I can compare that again.
  3. 1.9.21 FastClasspathScanner: *** Time taken by .scan(): 70711 ms ***
  4. Looking into this now.

@johnou
Copy link
Author

johnou commented Jun 23, 2016

@lukehutch comparing IDE and integration verbose output I can see that in integration it is logging lines like the following which are not present in my IDE..

FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSMWW has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.WriteOrderDeque
FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSMWW has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.WriteOrderDeque
FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSMWW has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.Node
FastClasspathScanner: Found class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSMWWR with superclass cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSMWW
FastClasspathScanner: Found class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSR with superclass cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiS
FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSR has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.Ticker
FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSR has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.WriteBuffer
FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSR has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.WriteBuffer
FastClasspathScanner: Found class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSW with superclass cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiS
FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSW has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.Ticker
FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSW has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.WriteOrderDeque
FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSW has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.WriteOrderDeque
FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSW has a field with type or type parameter

@lukehutch
Copy link

I see, thanks for explaining, and for testing 1.9.21.

Is there anything you can change about the scanning spec (whitelist / blacklist specification) to speed up scanning in Orbit? You're probably scanning a lot of stuff you don't need to.

It still concerns me though that the scanning time is so different between your IDE and your integration environment. How is the classpath different between those two scenarios?

It also concerns me that the total scanning time is 70 seconds. That's a lot of time even if you're trying to read a few hundred jarfiles. Your classpath must be enormous! What's the approximate number of jarfiles on your classpath, and the total compressed size of the jars?

The log entries you pasted are from 1.9.21, not from git master. If you use the version in git master, and produce verbose logs from the IDE and the integration environment, you should be able to more easily spot the difference between the two environments.

@johnou
Copy link
Author

johnou commented Jun 23, 2016

Yes configurable scanning spec is possible but out of the box configuration is entire classpath and I am pretty sure it was not taking this long with Guava, I will build a snapshot artifact of fastclasspath now.

@lukehutch
Copy link

I'd be very interested to see the runtimes for Guava, if you can run that again. Do you know if Guava is searching just a subset of the classpath? Is there a way to generate debug output there, to compare? I don't have experience with Guava.

It's possible Guava is using a faster unzip library than the default Java one -- that is known to be very slow, and it is not thread-safe. I have considered replacing the library, but haven't done it yet. I'll create a bug for that, and will take a look at what Guava uses.

What specific methods are you using in FastClasspathScanner? Are you looking just for annotations, or something else?

@johnou
Copy link
Author

johnou commented Jun 23, 2016

Deploying snapshot artifact to integration environment now, use case is that we need to find all interfaces extending cloud.orbit.actors.Actor and then find all concrete classes implementing those interfaces. Current source I am running for that can be found here in the class constructor https://github.com/johnou/orbit/blob/0364ff25d147588d5cc5495135797af03a9e615c/actors/runtime/src/main/java/cloud/orbit/actors/runtime/DefaultActorClassFinder.java I will try and find some time to reimplement the ActorClassFinder with guava and compare.

@johnou
Copy link
Author

johnou commented Jun 23, 2016

From IDE:
2016-06-23T10:33Z FastClasspathScanner - Built class graph in 0.129220 sec
2016-06-23T10:33Z FastClasspathScanner Finished .scan() in 7.662825 sec
2016-06-23T10:33Z FastClasspathScanner Parsed 28776 classfiles on classpath

@johnou
Copy link
Author

johnou commented Jun 23, 2016

Integration environment:
2016-06-23T10:35Z FastClasspathScanner - Built class graph in 5.819232 sec
2016-06-23T10:35Z FastClasspathScanner Finished .scan() in 87.235730 sec
2016-06-23T10:35Z FastClasspathScanner Parsed 34628 classfiles on classpath

@lukehutch
Copy link

Wow, big difference. Can you please include a pastebin of the logs from these two environments?

@johnou
Copy link
Author

johnou commented Jun 23, 2016

I need to head out for lunch now but is there an email I can send the contents? May be some information I don't want everyone to see :)

@lukehutch
Copy link

Yes, my gmail address is luke.hutch. I'm about to send a comment with a probable fix though (couple of mins)

@lukehutch
Copy link

lukehutch commented Jun 23, 2016

OK, I see a couple of problems with the implementation that will be causing a lot of the problems:

(1): MatchProcessors are all run in the same pass. So you can't add one MatchProcessor (here, the one that populates clazzInterfaces), and then depend on its result in a second MatchProcessor within the same scan operation (here, the one with the loop "for (Class<?> clazzInterface : clazzInterfaces)").

Consider what this is doing: every time the first MatchProcessor adds an Actor to clazzInterfaces, the second MatchProcessor has one extra item to iterate through when it hits an Actor. So you actually created an O(N^2) algorithm. Additionally, it will fail to find some subclasses, because clazzInterfaces won't be fully populated before the first time the second MatchProcessor is triggered.

The quick solution to this would be to create two FastClasspathScanner instances, and run the first scan with the first MatchProcessor, then when it has finished, run the second scan with the second MatchProcessor. However, there is a much easier way!:

(2): You don't need to find all subinterfaces of an interface before searching for each of the subclasses of those subinterfaces. FastClasspathScanner already does that for you. You only need to match classes that implement Actor. If a class implements a subinterface of Actor, then it still implements Actor. Here you can see the code that finds all subinterfaces of the given interface you're trying to match, then takes a union of all the subclasses of each class that implements the interface: https://github.com/lukehutch/fast-classpath-scanner/blob/master/src/main/java/io/github/lukehutch/fastclasspathscanner/classgraph/ClassGraphBuilder.java#L243

So all you need to do is use .matchClassesImplementing(Actor.class, c -> /*...*/).

One more thing though: I'm still concerned about the difference in "Built class graph in" times in the two instances -- they should be approximately the same, given that the number of classfiles parsed is approximately the same. So I'd still be curious to see those logs. Thanks!

@lukehutch
Copy link

So based on a quick cut of the unnecessary code, it looks like here's all you need to do:

        FastClasspathScanner scanner = new FastClasspathScanner(scanSpec).matchClassesImplementing(Actor.class, clazzImplementation -> {
            if (clazzImplementation.getSimpleName().toLowerCase(Locale.ENGLISH).startsWith("abstract"))
            {
                return;
            }
            Class<?>[] implementationInterfaces = clazzImplementation.getInterfaces(); // check interfaces directly to support inheritance
            if (implementationInterfaces.length == 0)
            {
                return;
            }
            Class<?> old = concreteImplementations.put(clazzInterface, clazzImplementation);
            if (old != null)
            {
                logger.warn("Multiple actor implementations found for " + clazzInterface);
            }
        });
        scanner.scan();

Using the above code, please show me the scanning times again for the git snapshot you built, and for 1.9.18. And please send these two logs via pastebin (not the logs generated from the original O(N^2) code). I'll use this to verify that there's not something funky going on that is adding a slowdown in the classgraph building stage. Thank you!

@johnou
Copy link
Author

johnou commented Jun 23, 2016

So after lunch I took your advice and revisited the class finder..

    public DefaultActorClassFinder(final String... actorBasePackages)
    {
        String[] scanSpec = extractScanSpec(actorBasePackages);

        List<Class<?>> clazzImplementations = new ArrayList<>();
        long start = System.currentTimeMillis();
        FastClasspathScanner scanner = new FastClasspathScanner(scanSpec).matchClassesImplementing(Actor.class, candidate -> {
            if (candidate.getSimpleName().toLowerCase(Locale.ENGLISH).startsWith("abstract"))
            {
                return;
            }
            clazzImplementations.add(candidate);
        });
        scanner.verbose().scan();
        for (Class<?> clazzImplementation : clazzImplementations)
        {
            Class<?>[] implementationInterfaces = clazzImplementation.getInterfaces(); // check interfaces directly to support inheritance
            if (implementationInterfaces.length == 0)
            {
                return;
            }
            for (Class<?> implementationInterface : implementationInterfaces)
            {
                if (implementationInterface == Remindable.class)
                {
                    continue;
                }
                if (Actor.class.isAssignableFrom(implementationInterface))
                {
                    Class<?> old = concreteImplementations.put(implementationInterface, clazzImplementation);
                    if (old != null)
                    {
                        logger.warn("Multiple actor implementations found for " + implementationInterface);
                    }
                }
            }
        }
        if (logger.isDebugEnabled())
        {
            logger.debug("Took " + (System.currentTimeMillis() - start) + "ms to scan for Actor implementations.");
        }
    }

@JoeHegarty JoeHegarty merged commit b80819e into orbit:master Jun 24, 2016
@JoeHegarty
Copy link
Member

JoeHegarty commented Jun 24, 2016

Thanks for all the efforts @johnou and @lukehutch.
I've merged the change as it is a vast improvement over what we had in 0.9.3.

I know you still plan to try out the latest snapshot etc, please feel free to just open a new PR, any further improvement would be greatly appreciated. Just wanted to get out the great improvements you've already made to our users.

@lukehutch
Copy link

lukehutch commented Jun 24, 2016

The main remaining optimization I can implement is by switching to something that can unzip faster than java.util.zip (I'm hoping NIO will do this). But don't get too hopeful ;) I think FastClasspathScanner is now running within a small constant factor of the fastest theoretical limit.

It's conceivable there's something else funky going on though, so I'll check the next set of logs. The evidence points to a heavily-loaded integration server.

The main way you can speed up scanning is by proper use of whitelists and blacklists.

@johnou
Copy link
Author

johnou commented Jun 25, 2016

Must be hitting some IO limit on the integration environment..

2016-06-25T00:12Z FastClasspathScanner Finished .scan() in 32.100488 sec
2016-06-25T00:12Z FastClasspathScanner Parsed 34705 classfiles on classpath

vs my rig
2016-06-25T00:26Z FastClasspathScanner Finished .scan() in 9.521939 sec
2016-06-25T00:26Z FastClasspathScanner Parsed 34705 classfiles on classpath

@johnou
Copy link
Author

johnou commented Jun 25, 2016

specifying package on my rig
2016-06-25T00:36Z FastClasspathScanner Finished .scan() in 1.212041 sec
2016-06-25T00:36Z FastClasspathScanner Parsed 2950 classfiles on classpath

@johnou
Copy link
Author

johnou commented Jun 25, 2016

That's quite surprising (is it Guava that does that, or the code that calls Guava?)

code that calls Guava.. https://github.com/orbit/orbit/pull/175/files

@lukehutch
Copy link

OK, so only considering your rig, is it true that the ratio in time taken to scan on the IDE to the time taken in the integration environment is roughly the same as the ratio in the number of classes scanned in each environment? i.e. is the average time taken to scan one class roughly the same in both environments on your rig?

@johnou
Copy link
Author

johnou commented Jun 25, 2016

From IDE to cli on the same machine there was only a few seconds difference but it has an SSD drive so cannot be compared directly to the integration environment. I'd still like to try it on Linux with symlinks and a "normal" drive to really replicate the integration environment.

@johnou johnou deleted the feature/performance_regression2 branch June 26, 2016 08:53
@johnou
Copy link
Author

johnou commented Jun 29, 2016

@lukehutch apologies for not getting back to you sooner. I managed to setup a new environment identical to the integration, there is definitely something fishy with classpaths which include symlinks..

direct with verbose
2016-06-29T09:27Z FastClasspathScanner Finished .scan() in 8.249713 sec
2016-06-29T09:27Z FastClasspathScanner Parsed 34736 classfiles on classpath

symlink with verbose
2016-06-29T09:27Z FastClasspathScanner Finished .scan() in 14.809781 sec
2016-06-29T09:27Z FastClasspathScanner Parsed 34736 classfiles on classpath

[johno@dev-1 fs]$ java -cp "build/libs/:build/server/test-service.jar" OrbitStarter
2016-06-29 12:12:19,610 DEBUG ForkJoinPool.commonPool-worker-1 Took 4524ms to scan for Actor implementations.
[johno@dev-1 fs]$ java -cp "build/libs/
:build/server/test-service.jar" OrbitStarter
2016-06-29 12:12:27,188 DEBUG ForkJoinPool.commonPool-worker-1 Took 4414ms to scan for Actor implementations.
[johno@dev-1 fs]$ java -cp "build/libs/:build/server/test-service.jar" OrbitStarter
2016-06-29 12:12:33,058 DEBUG ForkJoinPool.commonPool-worker-1 Took 4326ms to scan for Actor implementations.
[johno@dev-1 fs]$ java -cp "latest/libs/
:latest/server/test-service.jar" OrbitStarter
2016-06-29 12:12:50,408 DEBUG ForkJoinPool.commonPool-worker-1 Took 6983ms to scan for Actor implementations.
[johno@dev-1 fs]$ java -cp "latest/libs/:latest/server/test-service.jar" OrbitStarter
2016-06-29 12:12:58,971 DEBUG ForkJoinPool.commonPool-worker-1 Took 6832ms to scan for Actor implementations.
[johno@dev-1 fs]$ java -cp "latest/libs/
:latest/server/test-service.jar" OrbitStarter
2016-06-29 12:13:07,376 DEBUG ForkJoinPool.commonPool-worker-1 Took 6897ms to scan for Actor implementations.
[johno@dev-1 fs]$ java -cp "build/libs/*:build/server/test-service.jar" OrbitStarter
2016-06-29 12:13:15,998 DEBUG ForkJoinPool.commonPool-worker-1 Took 4210ms to scan for Actor implementations.
[johno@dev-1 fs]$ ls -lah
total 52M
drwxr-xr-x 3 johno johno 4.0K Jun 29 12:11 .
drwxr-xr-x 323 johno root 20K Jun 29 12:11 ..
drwxr-xr-x 4 johno johno 4.0K Jun 29 10:01 build
lrwxrwxrwx 1 johno johno 5 Jun 29 10:01 latest -> build

@johnou
Copy link
Author

johnou commented Jun 29, 2016

updated timetaken when piping output directly into file when in verbose mode.

@johnou
Copy link
Author

johnou commented Jun 29, 2016

[johno@dev-1 fs]$ wc -l symlink-verbose.log
329568 symlink-verbose.log
[johno@dev-1 fs]$ wc -l direct-verbose.log
190003 direct-verbose.log

@johnou
Copy link
Author

johnou commented Jun 29, 2016

Classpath elements contains both symlinks and the resolved path, must be scanning every artifact twice.

io.github.lukehutch.fastclasspathscanner.classpath.ClasspathFinder#urlToPath looks interesting but I haven't had time to investigate further.

@lukehutch
Copy link

OK, I just pushed a change to FastClasspathScanner that resolves symlinks, and then ensures that files, directories and jars are only read once. This should cause the time for ide vs. integration to match on your machine. Can you please verify this? (I haven't pushed out a new release yet, so you'll need to build FastClasspathScanner from source.)

(The reason I wasn't originally resolving symlinks is that the directory structure has to match the package structure in Java, and this can break after resolving symlinks. Now I resolve them once, just to check for duplicates, which should fix your usecase.)

The next release will hopefully scan zipfiles faster, assuming that the NIO unzip code is faster than java.util.zip (I haven't finished this change yet though, so I can't test if it's faster).

@johnou
Copy link
Author

johnou commented Jul 2, 2016

wrt NIO unzip code, might be a good idea to keep an eye on memory too?

@johnou
Copy link
Author

johnou commented Jul 2, 2016

@lukehutch that's it! Latest snapshot of FastClasspathScanner no longer takes double~ the scan time when classpath uses symlinks.

@lukehutch
Copy link

Good call. You have a much larger classpath than I will likely ever work
with though, so it would be great if you could please test this once I make
the NIO commit.

I made changes a few commits ago that dramatically decrease the memory
requirements of FastClasspathScanner, but the Java unzip methods have their
own memory requirements (I assume these are not more than the zip directory
tree structure plus some buffers though).

There will be two separate NIO commits: one to use NIO FileSystems (and a
TreeWalker), which will greatly simplify the RecursiveScanner code. Then
one to multithread the NIO scanning (which will increase memory overhead
but decrease scan time.) Should get these done in the next 2-3 weeks or so.

On Sat, Jul 2, 2016 at 3:50 AM, Johno Crawford notifications@github.com
wrote:

wrt NIO unzip code, might be a good idea to keep an eye on memory too?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#174 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAxhKWCTWxS_Uz6FYU_2YC5N09ZrPzu5ks5qRkJ1gaJpZM4I6kYT
.

@lukehutch
Copy link

@johnou Excellent! Thanks for reporting this, and thanks for all the testing, sending me logs etc.

@lukehutch
Copy link

I'll push out a new release with these changes and the single-threaded NIO unzip rework in the next couple of weeks, then the multithreaded scanning support will go into version 2.0, since that will require MatchProcessors to add "synchronized" if they are not threadsafe (which is a slightly dangerous change to make -- or maybe I'll require a separate ".parallel()" call like the Java 8 Streams API to explicitly enable parallel scanning).

@johnou
Copy link
Author

johnou commented Jul 2, 2016

Absolutely, just ping me when you make the NIO commit. Really appreciate the effort you are putting into the library.

@lukehutch
Copy link

I just published FastClasspathScanner 1.10.0, which switches to (single-threaded) NIO scanning of jarfiles. In my own testing (with a small classpath), it seems a bit faster than the java.util.zip scanner, but it would be great if you could please let me know if it's faster for you. Please also let me know if you see any regressions, since this was a pretty major change.

@lukehutch
Copy link

lukehutch commented Jul 4, 2016

Sorry, make that 1.10.3 :-)

@johnou
Copy link
Author

johnou commented Jul 4, 2016

To my surprise single-threaded NIO seems to be performing worse than the previous releases.

prenio without package
2016-07-04 10:42:18,620 DEBUG ForkJoinPool.commonPool-worker-1 Took 3040ms to scan for Actor implementations.
2016-07-04 10:42:20,376 DEBUG ForkJoinPool.commonPool-worker-1 Took 1754ms to scan for Actor implementations.
2016-07-04 10:42:22,144 DEBUG ForkJoinPool.commonPool-worker-1 Took 1768ms to scan for Actor implementations.
2016-07-04 10:42:23,793 DEBUG ForkJoinPool.commonPool-worker-1 Took 1649ms to scan for Actor implementations.
2016-07-04 10:42:25,488 DEBUG ForkJoinPool.commonPool-worker-1 Took 1695ms to scan for Actor implementations.
2016-07-04 10:42:27,138 DEBUG ForkJoinPool.commonPool-worker-1 Took 1650ms to scan for Actor implementations.
2016-07-04 10:42:28,714 DEBUG ForkJoinPool.commonPool-worker-1 Took 1576ms to scan for Actor implementations.
2016-07-04 10:42:30,685 DEBUG ForkJoinPool.commonPool-worker-1 Took 1971ms to scan for Actor implementations.
2016-07-04 10:42:32,271 DEBUG ForkJoinPool.commonPool-worker-1 Took 1585ms to scan for Actor implementations.
2016-07-04 10:42:33,837 DEBUG ForkJoinPool.commonPool-worker-1 Took 1565ms to scan for Actor implementations.
2016-07-04 10:42:35,561 DEBUG ForkJoinPool.commonPool-worker-1 Took 1724ms to scan for Actor implementations.
2016-07-04 10:42:37,097 DEBUG ForkJoinPool.commonPool-worker-1 Took 1536ms to scan for Actor implementations.
2016-07-04 10:42:38,732 DEBUG ForkJoinPool.commonPool-worker-1 Took 1635ms to scan for Actor implementations.
2016-07-04 10:42:40,460 DEBUG ForkJoinPool.commonPool-worker-1 Took 1728ms to scan for Actor implementations.
2016-07-04 10:42:41,916 DEBUG ForkJoinPool.commonPool-worker-1 Took 1456ms to scan for Actor implementations.
2016-07-04 10:42:43,411 DEBUG ForkJoinPool.commonPool-worker-1 Took 1495ms to scan for Actor implementations.
2016-07-04 10:42:45,154 DEBUG ForkJoinPool.commonPool-worker-1 Took 1743ms to scan for Actor implementations.
2016-07-04 10:42:46,593 DEBUG ForkJoinPool.commonPool-worker-1 Took 1439ms to scan for Actor implementations.
2016-07-04 10:42:48,230 DEBUG ForkJoinPool.commonPool-worker-1 Took 1637ms to scan for Actor implementations.
2016-07-04 10:42:49,712 DEBUG ForkJoinPool.commonPool-worker-1 Took 1482ms to scan for Actor implementations.

nio without package
2016-07-04 10:39:23,973 DEBUG ForkJoinPool.commonPool-worker-1 Took 5027ms to scan for Actor implementations.
2016-07-04 10:39:26,993 DEBUG ForkJoinPool.commonPool-worker-1 Took 3018ms to scan for Actor implementations.
2016-07-04 10:39:29,873 DEBUG ForkJoinPool.commonPool-worker-1 Took 2880ms to scan for Actor implementations.
2016-07-04 10:39:32,666 DEBUG ForkJoinPool.commonPool-worker-1 Took 2793ms to scan for Actor implementations.
2016-07-04 10:39:35,361 DEBUG ForkJoinPool.commonPool-worker-1 Took 2695ms to scan for Actor implementations.
2016-07-04 10:39:38,073 DEBUG ForkJoinPool.commonPool-worker-1 Took 2712ms to scan for Actor implementations.
2016-07-04 10:39:40,913 DEBUG ForkJoinPool.commonPool-worker-1 Took 2840ms to scan for Actor implementations.
2016-07-04 10:39:43,687 DEBUG ForkJoinPool.commonPool-worker-1 Took 2774ms to scan for Actor implementations.
2016-07-04 10:39:46,446 DEBUG ForkJoinPool.commonPool-worker-1 Took 2758ms to scan for Actor implementations.
2016-07-04 10:39:49,301 DEBUG ForkJoinPool.commonPool-worker-1 Took 2855ms to scan for Actor implementations.
2016-07-04 10:39:52,189 DEBUG ForkJoinPool.commonPool-worker-1 Took 2888ms to scan for Actor implementations.
2016-07-04 10:39:55,034 DEBUG ForkJoinPool.commonPool-worker-1 Took 2845ms to scan for Actor implementations.
2016-07-04 10:39:57,857 DEBUG ForkJoinPool.commonPool-worker-1 Took 2823ms to scan for Actor implementations.
2016-07-04 10:40:00,710 DEBUG ForkJoinPool.commonPool-worker-1 Took 2853ms to scan for Actor implementations.
2016-07-04 10:40:03,575 DEBUG ForkJoinPool.commonPool-worker-1 Took 2865ms to scan for Actor implementations.
2016-07-04 10:40:06,447 DEBUG ForkJoinPool.commonPool-worker-1 Took 2872ms to scan for Actor implementations.
2016-07-04 10:40:09,360 DEBUG ForkJoinPool.commonPool-worker-1 Took 2913ms to scan for Actor implementations.
2016-07-04 10:40:12,010 DEBUG ForkJoinPool.commonPool-worker-1 Took 2650ms to scan for Actor implementations.
2016-07-04 10:40:14,720 DEBUG ForkJoinPool.commonPool-worker-1 Took 2710ms to scan for Actor implementations.
2016-07-04 10:40:17,455 DEBUG ForkJoinPool.commonPool-worker-1 Took 2735ms to scan for Actor implementations.

@lukehutch
Copy link

lukehutch commented Jul 5, 2016

I spent many hours trying to figure out why the NIO version was slower, and as far as I can tell, the NIO code triggers some sort of slow path in the Linux kernel when NativeThread.current() calls pthread_self(). This can add 40% overhead per file unzipped. I filed an OpenJDK bug on this.

While debugging this though, I realized that the NIO code for ZipFileSystem actually uses java.util.zip under the hood, so there is actually no benefit to using NIO. Additionally, I found that some of the newer Java 7 APIs, such as Path.toUri(), and some of the static Files methods, are horribly inefficient when used with ZipFileSystem. Consequently, I reverted many of my recent changes, and then optimized what remained using a profiler. Scanning should be fast again now, as of 1.10.4 (just released).

Please test 1.10.4, and let me know how it works for you. It should be no slower than 1.9.21, and may be a little faster on average.

(The only additional thing that can be done is to implement parallel scanning. This will be complicated, but I think I have figured out a way to do this, so hopefully that will be in a future release.)

@lukehutch
Copy link

lukehutch commented Jul 7, 2016

@johnou FastClasspathScanner now supports parallel scanning (in the latest version in git, and also released in 1.90.0). This new version is probably operating at close to the theoretical limit for how fast classpath scanning can go, given that we're overlapping reading from storage with decompression (i.e. I believe we're limited by disk read speeds beyond this point). I would appreciate your testing of this code -- you should see some good speedups.

In my own testing, 1.90.0 is 30-60% faster than the older single-threaded version (1.9.21) for a moderately large classpath (generally it's 30% faster for the first scan, and 60% faster for subsequent scans once the files are all cached). This is achieved using 5 threads, which gave the best speedup on my machine (my machine should be fairly typical of a modern laptop or desktop). Increasing the number of threads causes slowdown due to filing system resource contention. The threads are all about 50% busy decompressing data, and spend 50% of their time waiting for the next chunk to arrive to be decompressed.

(You can tweak the number of threads if you want by changing io.github.lukehutch.fastclasspathscanner.scanner.RecursiveScanner.NUM_THREADS, though I don't expect you'll get dramatically better performance for values other than the default of 5.)

@lukehutch
Copy link

lukehutch commented Jul 8, 2016

@johnou I have spent a lot of time profiling FastClasspathScanner, and fixed all the profiling hotspots I could find. Performance in 1.90.5 should work well for you. Please let me know how it works in your environment.

@johnou
Copy link
Author

johnou commented Jul 8, 2016

@lukehutch smashed it! I think the library really lives up to its name now. Did you try different executors? At the moment number of threads can be configured but maybe custom executor service would be a nice touch, I would be interested in knowing how fork join pool compares, in fact fork join pool common pool might be a suitable default? Assuming it's not slow compared to fixed thread pool of cpu count plus 2..

@lukehutch
Copy link

lukehutch commented Jul 8, 2016

Thanks for testing, what times (or speedups) are you seeing now?

I do see a large overhead in the profiler from using the executor service, but I don't know that switching to a different executor will help -- hard to say if the overhead is thread overhead or service overhead. The threading is simple enough though that maybe I'll just switch to using bare threads.

Using a dynamic number of threads won't probably help though -- for a given architecture and a large number of typical jars and classfiles, there will by a best value for NUM_THREADS, and it won't vary much with workload. (Work stealing etc. won't help either.)

@johnou
Copy link
Author

johnou commented Jul 8, 2016

I will run more tests this afternoon, travelling at the moment and development from a tablet sucks.

@lukehutch
Copy link

I found a way to replace the ExecutorService with a custom multithreading approach that increases scanning speed by a further 8% or so. It is in 1.90.6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants