Iterating over JarFile.entries() is very slow. #15096

@larsgrefer

Iterating over org.springframework.boot.loader.jar.JarFile#entries() is very slow. Each call to EntryIterator#next invokes JarFileEntries#getEntry, which makes the underlying RandomAccessFile jump back and forth to re-read the file headers from the central directory of the jar file. Iteration could be much faster if JarFileEntries#visitFileHeader stored the complete FileHeader instead of just its offset and the hash of its name.
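
To illustrate the trade-off, here is a minimal sketch that does not use any spring-boot-loader code. Everything in it is hypothetical: `HeaderIndexSketch`, `Header`, `readAt` and `readAll` are made-up names, and the length-prefixed record layout is a stand-in for the real central directory format. It only contrasts keeping offsets (seek and re-parse on every lookup) with keeping the parsed headers in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HeaderIndexSketch {

    /** Hypothetical stand-in for a parsed central directory file header. */
    static final class Header {
        final String name;

        Header(String name) {
            this.name = name;
        }
    }

    /** Writes length-prefixed names to the file and returns each record's offset. */
    static long[] writeRecords(Path file, List<String> names) throws IOException {
        long[] offsets = new long[names.size()];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            for (int i = 0; i < names.size(); i++) {
                offsets[i] = raf.getFilePointer();
                byte[] bytes = names.get(i).getBytes(StandardCharsets.UTF_8);
                raf.writeShort(bytes.length);
                raf.write(bytes);
            }
        }
        return offsets;
    }

    /** Offset-only strategy: every lookup seeks to the record and parses it again. */
    static Header readAt(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset);
        byte[] bytes = new byte[raf.readUnsignedShort()];
        raf.readFully(bytes);
        return new Header(new String(bytes, StandardCharsets.UTF_8));
    }

    /** Caching strategy: parse each record once, in file order, and keep the results. */
    static List<Header> readAll(RandomAccessFile raf, long[] offsets) throws IOException {
        List<Header> headers = new ArrayList<>(offsets.length);
        for (long offset : offsets) {
            headers.add(readAt(raf, offset));
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("headers", ".bin");
        List<String> names = Arrays.asList("a/A.class", "b/B.class", "c/C.class");
        long[] offsets = writeRecords(file, names);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            // Offset-only: one seek plus one parse per entry, repeated on every iteration.
            for (long offset : offsets) {
                System.out.println(readAt(raf, offset).name);
            }
            // Cached: later iterations touch only memory, at the cost of holding all headers.
            List<Header> cached = readAll(raf, offsets);
            for (Header header : cached) {
                System.out.println(header.name);
            }
        }
        Files.delete(file);
    }
}
```

The cached variant trades memory for speed, which is exactly the tension with the current design: the more entries a jar has, the more headers stay resident.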

Some Background

While working on joinfaces/joinfaces#565, I noticed that ClassGraph is about 500 ms slower when scanning a repackaged Spring Boot application than when scanning the same application in its unpacked form. I had a look at the ClassGraph code and saw that it extracts all the nested jars in order to scan them. Hoping to improve the performance of scanning nested jars, I prepared the following patch, which uses the JarFile implementation of spring-boot-loader to avoid the extra cost of extracting all the nested jars:
https://github.com/larsgrefer/classgraph/compare/ba4c69347eaf915571e9f5142e09f7a481471570...cfb317aff4d6949afbddcc1fd0ad78b118ef52ec?expand=1

Surprisingly, this approach is even a bit slower, so I dug deeper into the code and traced the performance difference to the iteration done here: https://github.com/classgraph/classgraph/blob/b170d2bebb871824f7d53d54aa7a9b6939f25cf0/src/main/java/io/github/classgraph/utils/JarfileMetadataReader.java#L148

In my tests, iterating over an org.springframework.boot.loader.jar.JarFile is about 5 to 10 times slower than iterating over a java.util.zip.ZipFile. This performance impact is so severe that it's even faster to extract the nested jar first in order to use java.util.zip.ZipFile.
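
For reference, a measurement of this kind can be sketched roughly as follows. This is not the code I used for the numbers above; `ZipIterationTiming`, `createZip` and `countEntries` are hypothetical names, the archive is a synthetic one with empty entries, and a single un-warmed run like this is only a rough indication (a proper benchmark harness would be more reliable). It covers only the java.util.zip.ZipFile side; the comparison would run the same loop against the loader's JarFile.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipIterationTiming {

    /** Creates a throwaway zip containing the given number of empty entries. */
    static Path createZip(int entryCount) throws IOException {
        Path zip = Files.createTempFile("iteration-timing", ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (int i = 0; i < entryCount; i++) {
                zos.putNextEntry(new ZipEntry("pkg/Class" + i + ".class"));
                zos.closeEntry();
            }
        }
        return zip;
    }

    /** Opens the zip, walks every entry once and returns how many were seen. */
    static int countEntries(Path zip) throws IOException {
        int count = 0;
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                entries.nextElement();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path zip = createZip(10_000);
        long start = System.nanoTime();
        int count = countEntries(zip); // timing includes opening the file
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(count + " entries in " + elapsedMicros + " microseconds");
        Files.delete(zip);
    }
}
```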

Conclusion

After reading the commit message of e2368b9, it seems to me that memory efficiency is more important to you than iteration performance. So my question is whether you would accept a pull request that either changes the current behavior or allows ClassGraph to change the behavior of the JarFileEntries implementation, and what such a PR should look like.
