8x speed-up by buffering of InputStream during reading of uncompressed files #49

Merged
josemduarte merged 1 commit into rcsb:master from JonStargaryen:master on Sep 3, 2019

JonStargaryen
Member

I ran some benchmarks for the BinaryCIF project/format, and in comparison the Java implementation of the MMTF codec was surprisingly slow, especially when processing uncompressed (non-gzipped) files. Benchmark details are in the RCSB-internal ciftools-performance repo.

By employing a BufferedInputStream with a buffer size of 65536 bytes, performance improves drastically: traversing the current 154k structures takes 70 s, compared to 10 minutes with the current code.
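
A minimal sketch of the buffering idea, not the code from this PR: the class and method names (`BufferedReadSketch`, `openBuffered`, `readAll`) are hypothetical, and the byte-wise loop only stands in for a decoder that issues many small reads. The assumption is that such small reads are what make an unbuffered stream slow, since each one may reach the underlying file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class, not part of the MMTF library.
class BufferedReadSketch {
    // Buffer size mentioned in the PR description.
    static final int BUFFER_SIZE = 65536;

    // Wrap the raw file stream so that small reads are served from memory.
    static InputStream openBuffered(Path file) throws IOException {
        return new BufferedInputStream(Files.newInputStream(file), BUFFER_SIZE);
    }

    // Read the whole file one byte at a time, mimicking a decoder that
    // performs many tiny reads. With the buffer in place, the underlying
    // stream is asked for up to 64 KiB at a time instead of once per byte.
    static byte[] readAll(Path file) throws IOException {
        try (InputStream in = openBuffered(file)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write some bytes to a temporary file and read them back.
        Path tmp = Files.createTempFile("buffered-sketch", ".bin");
        try {
            Files.write(tmp, new byte[200_000]);
            System.out.println(readAll(tmp).length + " bytes read");
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The only change relative to an unbuffered read is the `BufferedInputStream` wrapper in `openBuffered`; the reading code itself stays the same.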

For comparison, read times for BinaryCIF and mmCIF parsing are also given; both should be slower due to their higher overhead. A performance increase for gzipped files can be expected from using a GZIPInputStream with an equally sized buffer of 65536 bytes (in contrast to its default buffer of 512 bytes).
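
For the gzipped case, a sketch of what passing a larger buffer to GZIPInputStream could look like. This is an illustration under assumptions, not code from this PR: `GzipBufferSketch`, `openGzip`, `gunzip`, and `gzip` are hypothetical names, and the in-memory round trip only exists to make the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class, not part of the MMTF library.
class GzipBufferSketch {
    // Same size as the buffer used for uncompressed files in the PR text.
    static final int BUFFER_SIZE = 65536;

    // The two-argument constructor sets the internal input buffer size;
    // the one-argument constructor falls back to 512 bytes.
    static InputStream openGzip(InputStream compressed) throws IOException {
        return new GZIPInputStream(compressed, BUFFER_SIZE);
    }

    // Decompress a gzip byte array completely.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = openGzip(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Helper that produces gzip data for the demo below.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[100_000];
        byte[] restored = gunzip(gzip(original));
        System.out.println(restored.length + " bytes decompressed");
    }
}
```

Whether this helps as much as buffering the uncompressed stream would need to be benchmarked; the PR text only states that an improvement can be expected.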

performance plot

@coveralls

Coverage Status

Coverage increased (+0.02%) to 81.926% when pulling 7960934 on JonStargaryen:master into 59287c5 on rcsb:master.

Member

@josemduarte josemduarte left a comment

Fantastic, thank you!

@pwrose
Collaborator

pwrose commented Jul 30, 2019 via email

@josemduarte
Member

@JonStargaryen do you know if this is still relevant when running under a Java 11+ JRE?

If so I'd like to merge and release a new bugfix as soon as possible.

@JonStargaryen
Member Author

@josemduarte Yeah, it's still an issue on Java 11. I didn't run it with warm-up iterations or repeated measurements, but the trend is clear.

Here are the times to read the archive:

Benchmark                   Time
MMTF explicitly buffered     71.851 s/op
MMTF current impl           547.395 s/op

Probably a good idea to release a new version on Maven before releasing BioJava.

@josemduarte
Member

Thanks, @JonStargaryen!

I'll go ahead and make a new release today.

@josemduarte josemduarte merged commit 8c59b29 into rcsb:master Sep 3, 2019

4 participants