Description
Iterating over org.springframework.boot.loader.jar.JarFile#entries() is very slow. The successive calls to JarFileEntries#getEntry made by EntryIterator#next cause the underlying RandomAccessFile to jump back and forth in order to re-read the file headers from the central directory of the jar file. This could be much faster if JarFileEntries#visitFileHeader stored the complete FileHeader instead of just its offset and the hash of its name.
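To illustrate the trade-off, here is a minimal toy model of the two strategies. It is not the spring-boot-loader code: the class HeaderIndexSketch, its method names, and the record format (one UTF-encoded name per "header") are all hypothetical and exist only for this sketch. The first strategy keeps only offsets and has to seek and re-read for every entry during iteration. The second keeps the parsed data from the initial scan and never touches the file again, at the cost of holding every header in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names exist in spring-boot-loader.
class HeaderIndexSketch {

    /** Writes `count` toy "headers" (one UTF-encoded name each) to a temp file. */
    static Path writeToyDirectory(int count) {
        try {
            Path file = Files.createTempFile("toy-directory", ".bin");
            file.toFile().deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                for (int i = 0; i < count; i++) {
                    raf.writeUTF("entry-" + i + ".class");
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Offsets only: one scan to build the index, then a seek and re-read per entry. */
    static List<String> iterateWithOffsets(Path file, int count) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long[] offsets = new long[count];
            for (int i = 0; i < count; i++) {
                offsets[i] = raf.getFilePointer();
                raf.readUTF(); // parsed once here, then thrown away
            }
            List<String> names = new ArrayList<>();
            for (long offset : offsets) {
                raf.seek(offset);         // jump back into the file
                names.add(raf.readUTF()); // parse the same header a second time
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Cached headers: the single scan keeps what it parsed; iteration needs no I/O. */
    static List<String> iterateWithCachedHeaders(Path file, int count) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            List<String> headers = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                headers.add(raf.readUTF());
            }
            return headers;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeToyDirectory(1_000);
        System.out.println(iterateWithOffsets(file, 1_000)
                .equals(iterateWithCachedHeaders(file, 1_000)));
    }
}
```

Both methods return the same names; the difference is only how often the file is read, and how much is kept in memory between the scan and the iteration.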
Some Background
While working on joinfaces/joinfaces#565 I noticed that ClassGraph is about 500ms slower when scanning a repackaged Spring Boot application than when scanning the same application in its unpacked form. So I had a look at the code of ClassGraph and saw that it extracts all the nested jars in order to scan them. Hoping to improve the performance of scanning nested jars, I prepared the following patch, which uses the JarFile implementation of spring-boot-loader in order to avoid the extra cost of extracting all the nested jars:
https://github.com/larsgrefer/classgraph/compare/ba4c69347eaf915571e9f5142e09f7a481471570...cfb317aff4d6949afbddcc1fd0ad78b118ef52ec?expand=1
Surprisingly, this approach is even a bit slower, so I dug deeper into the code and traced the performance difference down to the iteration done here: https://github.com/classgraph/classgraph/blob/b170d2bebb871824f7d53d54aa7a9b6939f25cf0/src/main/java/io/github/classgraph/utils/JarfileMetadataReader.java#L148
In my tests, iterating over an org.springframework.boot.loader.jar.JarFile is about 5 to 10 times slower than iterating over a java.util.zip.ZipFile. This performance impact is so severe that it's even faster to extract the nested jar first, to be able to use java.util.zip.ZipFile.
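For reference, a rough sketch of how such an iteration could be timed on the java.util.zip.ZipFile side. This is not the benchmark behind the numbers above: the class EntryIterationTiming, its method names, and the generated sample archive are assumptions made for illustration. The measured time includes opening the archive, and a single run without warm-up is only a coarse indication.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of ClassGraph or Spring Boot.
class EntryIterationTiming {

    /** Creates a temporary archive containing `entryCount` tiny entries. */
    static Path createSampleZip(int entryCount) {
        try {
            Path zip = Files.createTempFile("sample", ".jar");
            zip.toFile().deleteOnExit();
            try (OutputStream out = Files.newOutputStream(zip);
                 ZipOutputStream zos = new ZipOutputStream(out)) {
                for (int i = 0; i < entryCount; i++) {
                    zos.putNextEntry(new ZipEntry("com/example/Class" + i + ".class"));
                    zos.write(new byte[] {(byte) i});
                    zos.closeEntry();
                }
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens the archive and walks all entries once, returning how many were seen. */
    static int countEntries(Path zip) {
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            int count = 0;
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                entries.nextElement();
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path zip = createSampleZip(10_000);
        long start = System.nanoTime();
        int count = countEntries(zip);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(count + " entries iterated in " + elapsedMicros + " us");
    }
}
```

With spring-boot-loader on the classpath, the same loop could be run against org.springframework.boot.loader.jar.JarFile to compare the two.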
Conclusion
After reading the commit message of e2368b9, it seems to me that memory efficiency is more important to you than iteration performance. So my question is whether you would accept a pull request that either changes the current behavior or allows ClassGraph to change the behavior of the JarFileEntries implementation, and what such a PR should look like.