-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance regression fixes #174
Performance regression fixes #174
Conversation
…ce_regression2 Conflicts: actors/core/src/main/java/cloud/orbit/actors/runtime/DefaultClassDictionary.java
@lukehutch seeing pretty bad performance with fast-classpath-scanner in our integration environment when everything is packaged into jars etc. scanning is taking up to a minute, when run from the IDE it's maybe a few seconds.. maybe I am using the API incorrectly or potentially some recursive scanning bug? |
59f7457
to
0364ff2
Compare
Interesting, scanning should never take that long in any reasonable classpath. Please call .verbose() before scanning, and compare the output. There is scan time info in the log. Also you can compare the classpath output in the log by the different scans. Also please let me know if the performance regression happened between 1.9.18 and 1.9.19, there were some major internal changes in 1.9.19. |
N.B. the change between 1.9.18 and 1.9.19 could cause slowdown in cases where you have a lot of implemented interfaces/superinterfaces and a lot of subclasses of a given class. If you find that downgrading to 1.9.18 fixes the performance regression, I'll optimize that part of the code. |
1.9.20 FastClasspathScanner: *** Time taken by .scan(): 78806 ms *** |
@johnou Strange, do you have those numbers backwards? I thought you were saying that 1.9.20 was slower than older versions? But the numbers show it's 35% faster. I just released 1.9.21, which should be significantly faster than 1.9.20 in cases where there are classes with many subclasses, some or all of which implement many interfaces. |
One curious thing I have noticed is that somewhere it must be resolving a symlink but then double scanning jars.. FastClasspathScanner: Scanned jarfile /home/xxx/builds/develop-SNAPSHOT/libs/spring-tx-4.2.6.RELEASE.jar in 1369 msec /home/xxx/builds/latest -> /home/xxx/builds/develop-SNAPSHOT |
@johnou if classes with the same name are listed twice on the classpath, the second and subsequent instance of that class name is shadowed by the first (meaning the class is not scanned, in keeping with the Java classpath spec). This is why the second jar in your example is scanned in 2 msec -- all the classfiles in the zipfile directory listing were already scanned the first time the jar was listed in the classpath. (This is why it took 1369 msec the first time, the actual classfile contents were read, not just the zipfile header.) |
In response to "I thought you were saying that 1.9.20 was slower than older versions? But the numbers show it's 35% faster." Eugh, my apologies, bad wording on my part. Regression with Orbit bootstrap after swapping over to fast class path scanner. |
@johnou: Several things:
Thanks! |
|
@lukehutch comparing IDE and integration verbose output I can see that in integration it is logging lines like the following which are not present in my IDE.. FastClasspathScanner: Class cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.LocalCacheFactory$SILiSMWW has a field with type or type parameter cloud.orbit.runtime.shaded.com.github.benmanes.caffeine.cache.WriteOrderDeque |
I see, thanks for explaining, and for testing 1.9.21. Is there anything you can change about the scanning spec (whitelist / blacklist specification) to speed up scanning in Orbit? You're probably scanning a lot of stuff you don't need to. It still concerns me though that the scanning time is so different between your IDE and your integration environment. How is the classpath different between those two scenarios? It also concerns me that the total scanning time is 70 seconds. That's a lot of time even if you're trying to read a few hundred jarfiles. Your classpath must be enormous! What's the approximate number of jarfiles on your classpath, and the total compressed size of the jars? The log entries you pasted are from 1.9.21, not from git master. If you use the version in git master, and produce verbose logs from the IDE and the integration environment, you should be able to more easily spot the difference between the two environments. |
Yes configurable scanning spec is possible but out of the box configuration is entire classpath and I am pretty sure it was not taking this long with Guava, I will build a snapshot artifact of fastclasspath now. |
I'd be very interested to see the runtimes for Guava, if you can run that again. Do you know if Guava is searching just a subset of the classpath? Is there a way to generate debug output there, to compare? I don't have experience with Guava. It's possible Guava is using a faster unzip library than the default Java one -- that is known to be very slow, and it is not thread-safe. I have considered replacing the library, but haven't done it yet. I'll create a bug for that, and will take a look at what Guava uses. What specific methods are you using in FastClasspathScanner? Are you looking just for annotations, or something else? |
Deploying snapshot artifact to integration environment now, use case is that we need to find all interfaces extending cloud.orbit.actors.Actor and then find all concrete classes implementing those interfaces. Current source I am running for that can be found here in the class constructor https://github.com/johnou/orbit/blob/0364ff25d147588d5cc5495135797af03a9e615c/actors/runtime/src/main/java/cloud/orbit/actors/runtime/DefaultActorClassFinder.java I will try and find some time to reimplement the ActorClassFinder with guava and compare. |
From IDE: |
Integration environment: |
Wow, big difference. Can you please include a pastebin of the logs from these two environments? |
I need to head out for lunch now but is there an email I can send the contents? May be some information I don't want everyone to see :) |
Yes, my gmail address is luke.hutch. I'm about to send a comment with a probable fix though (couple of mins) |
OK, I see a couple of problems with the implementation that will be causing a lot of the problems: (1): MatchProcessors are all run in the same pass. So you can't add one MatchProcessor (here, the one that populates clazzInterfaces), and then depend on its result in a second MatchProcessor within the same scan operation (here, the one with the loop "for (Class<?> clazzInterface : clazzInterfaces)"). Consider what this is doing: every time the first MatchProcessor adds an Actor to clazzInterfaces, the second MatchProcessor has one extra item to iterate through when it hits an Actor. So you actually created an O(N^2) algorithm. Additionally, it will fail to find some subclasses, because clazzInterfaces won't be fully populated before the first time the second MatchProcessor is triggered. The quick solution to this would be to create two FastClasspathScanner instances, and run the first scan with the first MatchProcessor, then when it has finished, run the second scan with the second MatchProcessor. However, there is a much easier way!: (2): You don't need to find all subinterfaces of an interface before searching for each of the subclasses of those subinterfaces. FastClasspathScanner already does that for you. You only need to match classes that implement Actor. If a class implements a subinterface of Actor, then it still implements Actor. Here you can see the code that finds all subinterfaces of the given interface you're trying to match, then takes a union of all the subclasses of each class that implements the interface: https://github.com/lukehutch/fast-classpath-scanner/blob/master/src/main/java/io/github/lukehutch/fastclasspathscanner/classgraph/ClassGraphBuilder.java#L243 So all you need to do is use One more thing though: I'm still concerned about the difference in "Built class graph in" times in the two instances -- they should be approximately the same, given that the number of classfiles parsed is approximately the same. So I'd still be curious to see those logs. Thanks! |
So based on a quick cut of the unnecessary code, it looks like here's all you need to do:
Using the above code, please show me the scanning times again for the git snapshot you built, and for 1.9.18. And please send these two logs via pastebin (not the logs generated from the original O(N^2) code). I'll use this to verify that there's not something funky going on that is adding a slowdown in the classgraph building stage. Thank you! |
So after lunch I took your advice and revisited the class finder.. public DefaultActorClassFinder(final String... actorBasePackages)
{
String[] scanSpec = extractScanSpec(actorBasePackages);
List<Class<?>> clazzImplementations = new ArrayList<>();
long start = System.currentTimeMillis();
FastClasspathScanner scanner = new FastClasspathScanner(scanSpec).matchClassesImplementing(Actor.class, candidate -> {
if (candidate.getSimpleName().toLowerCase(Locale.ENGLISH).startsWith("abstract"))
{
return;
}
clazzImplementations.add(candidate);
});
scanner.verbose().scan();
for (Class<?> clazzImplementation : clazzImplementations)
{
Class<?>[] implementationInterfaces = clazzImplementation.getInterfaces(); // check interfaces directly to support inheritance
if (implementationInterfaces.length == 0)
{
return;
}
for (Class<?> implementationInterface : implementationInterfaces)
{
if (implementationInterface == Remindable.class)
{
continue;
}
if (Actor.class.isAssignableFrom(implementationInterface))
{
Class<?> old = concreteImplementations.put(implementationInterface, clazzImplementation);
if (old != null)
{
logger.warn("Multiple actor implementations found for " + implementationInterface);
}
}
}
}
if (logger.isDebugEnabled())
{
logger.debug("Took " + (System.currentTimeMillis() - start) + "ms to scan for Actor implementations.");
}
} |
Thanks for all the efforts @johnou and @lukehutch. I know you still plan to try out the latest snapshot etc, please feel free to just open a new PR, any further improvement would be greatly appreciated. Just wanted to get out the great improvements you've already made to our users. |
The main remaining optimization I can implement is by switching to something that can unzip faster than java.util.zip (I'm hoping NIO will do this). But don't get too hopeful ;) I think FastClasspathScanner is now running within a small constant factor of the fastest theoretical limit. It's conceivable there's something else funky going on though, so I'll check the next set of logs. The evidence points to a heavily-loaded integration server. The main way you can speed up scanning is by proper use of whitelists and blacklists. |
Must be hitting some IO limit on the integration environment.. 2016-06-25T00:12Z FastClasspathScanner Finished .scan() in 32.100488 sec vs my rig |
specifying package on my rig |
code that calls Guava.. https://github.com/orbit/orbit/pull/175/files |
OK, so only considering your rig, is it true that the ratio in time taken to scan on the IDE to the time taken in the integration environment is roughly the same as the ratio in the number of classes scanned in each environment? i.e. is the average time taken to scan one class roughly the same in both environments on your rig? |
From IDE to cli on the same machine there was only a few seconds difference but it has an SSD drive so cannot be compared directly to the integration environment. I'd still like to try it on Linux with symlinks and a "normal" drive to really replicate the integration environment. |
@lukehutch apologies for not getting back to you sooner. I managed to setup a new environment identical to the integration, there is definitely something fishy with classpaths which include symlinks.. direct with verbose symlink with verbose [johno@dev-1 fs]$ java -cp "build/libs/:build/server/test-service.jar" OrbitStarter |
updated timetaken when piping output directly into file when in verbose mode. |
[johno@dev-1 fs]$ wc -l symlink-verbose.log |
Classpath elements contains both symlinks and the resolved path, must be scanning every artifact twice. io.github.lukehutch.fastclasspathscanner.classpath.ClasspathFinder#urlToPath looks interesting but I haven't had time to investigate further. |
OK, I just pushed a change to FastClasspathScanner that resolves symlinks, and then ensures that files, directories and jars are only read once. This should cause the time for ide vs. integration to match on your machine. Can you please verify this? (I haven't pushed out a new release yet, so you'll need to build FastClasspathScanner from source.) (The reason I wasn't originally resolving symlinks is that the directory structure has to match the package structure in Java, and this can break after resolving symlinks. Now I resolve them once, just to check for duplicates, which should fix your usecase.) The next release will hopefully scan zipfiles faster, assuming that the NIO unzip code is faster than java.util.zip (I haven't finished this change yet though, so I can't test if it's faster). |
wrt NIO unzip code, might be a good idea to keep an eye on memory too? |
@lukehutch that's it! Latest snapshot of FastClasspathScanner no longer takes double~ the scan time when classpath uses symlinks. |
Good call. You have a much larger classpath than I will likely ever work I made changes a few commits ago that dramatically decrease the memory There will be two separate NIO commits: one to use NIO FileSystems (and a On Sat, Jul 2, 2016 at 3:50 AM, Johno Crawford notifications@github.com
|
@johnou Excellent! Thanks for reporting this, and thanks for all the testing, sending me logs etc. |
I'll push out a new release with these changes and the single-threaded NIO unzip rework in the next couple of weeks, then the multithreaded scanning support will go into version 2.0, since that will require MatchProcessors to add "synchronized" if they are not threadsafe (which is a slightly dangerous change to make -- or maybe I'll require a separate ".parallel()" call like the Java 8 Streams API to explicitly enable parallel scanning). |
Absolutely, just ping me when you make the NIO commit. Really appreciate the effort you are putting into the library. |
I just published FastClasspathScanner 1.10.0, which switches to (single-threaded) NIO scanning of jarfiles. In my own testing (with a small classpath), it seems a bit faster than the java.util.zip scanner, but it would be great if you could please let me know if it's faster for you. Please also let me know if you see any regressions, since this was a pretty major change. |
Sorry, make that 1.10.3 :-) |
To my surprise single-threaded NIO seems to be performing worse than the previous releases. prenio without package nio without package |
I spent many hours trying to figure out why the NIO version was slower, and as far as I can tell, the NIO code triggers some sort of slow path in the Linux kernel when NativeThread.current() calls pthread_self(). This can add 40% overhead per file unzipped. I filed an OpenJDK bug on this. While debugging this though, I realized that the NIO code for ZipFileSystem actually uses java.util.zip under the hood, so there is actually no benefit to using NIO. Additionally, I found that some of the newer Java 7 APIs, such as Path.toUri(), and some of the static Files methods, are horribly inefficient when used with ZipFileSystem. Consequently, I reverted many of my recent changes, and then optimized what remained using a profiler. Scanning should be fast again now, as of 1.10.4 (just released). Please test 1.10.4, and let me know how it works for you. It should be no slower than 1.9.21, and may be a little faster on average. (The only additional thing that can be done is to implement parallel scanning. This will be complicated, but I think I have figured out a way to do this, so hopefully that will be in a future release.) |
@johnou FastClasspathScanner now supports parallel scanning (in the latest version in git, and also released in 1.90.0). This new version is probably operating at close to the theoretical limit for how fast classpath scanning can go, given that we're overlapping reading from storage with decompression (i.e. I believe we're limited by disk read speeds beyond this point). I would appreciate your testing of this code -- you should see some good speedups. In my own testing, 1.90.0 is 30-60% faster than the older single-threaded version (1.9.21) for a moderately large classpath (generally it's 30% faster for the first scan, and 60% faster for subsequent scans once the files are all cached). This is achieved using 5 threads, which gave the best speedup on my machine (my machine should be fairly typical of a modern laptop or desktop). Increasing the number of threads causes slowdown due to filing system resource contention. The threads are all about 50% busy decompressing data, and spend 50% of their time waiting for the next chunk to arrive to be decompressed. (You can tweak the number of threads if you want by changing |
@johnou I have spent a lot of time profiling FastClasspathScanner, and fixed all the profiling hotspots I could find. Performance in 1.90.5 should work well for you. Please let me know how it works in your environment. |
@lukehutch smashed it! I think the library really lives up to its name now. Did you try different executors? At the moment number of threads can be configured but maybe custom executor service would be a nice touch, I would be interested in knowing how fork join pool compares, in fact fork join pool common pool might be a suitable default? Assuming it's not slow compared to fixed thread pool of cpu count plus 2.. |
Thanks for testing, what times (or speedups) are you seeing now? I do see a large overhead in the profiler from using the executor service, but I don't know that switching to a different executor will help -- hard to say if the overhead is thread overhead or service overhead. The threading is simple enough though that maybe I'll just switch to using bare threads. Using a dynamic number of threads won't probably help though -- for a given architecture and a large number of typical jars and classfiles, there will by a best value for NUM_THREADS, and it won't vary much with workload. (Work stealing etc. won't help either.) |
I will run more tests this afternoon, travelling at the moment and development from a tablet sucks. |
I found a way to replace the ExecutorService with a custom multithreading approach that increases scanning speed by a further 8% or so. It is in 1.90.6. |
No description provided.