Build custom buffered input stream that can reuse tmp read buffers across indexing ops #302
I'll add more commits addressing other allocation hot paths, but those allocations were probably happening because of these big byte[] allocations, which leave scarce bandwidth for further allocations, which then end up in a new TLAB (or outside the TLAB). I'll provide a JMH benchmark and/or end-to-end results if required/possible.
The relevant stack traces I've found are: that seems to match the assumption of the same nope. |
I'd like to build jandex locally with this change to see if I can see improvements on deployment time of https://github.com/scottmarlow/tribe-krd-quarkus/tree/wildfly. I tried building with different JDK versions (JDK 1.6/1.8/11) but still get a failure like:
Any suggestions on how to avoid ^ failure? Thanks! |
Thanks @jamezp for pointing out that Maven 3.9 has ^ problem (3.8.1 works fine now). Also thanks for suggesting other solution of |
Yeah, I need to attend to #298 some day, this is pretty unfortunate :-/ When it comes to We could have an |
Today I am on PTO but yes, that would be the perfect deal - and would help the hibernate case as well. |
Enjoy your PTO! :-) I'm on vacation most of next week (except Mon), but feel free to ping me the week after that. And thanks for looking into this -- I didn't have much time (and I lack the experience 😆), so the only thing I noticed is that a lot of time is spent on |
@Ladicek Not a big fan of what I've done here, but let me provide some context: 74f33e9 allows users of With this solution:
The reason why it will use |
Okay, this is pretty nice actually! On the API front, I think we can hide the "memento object" by providing an If you're fine with the improvements now, I can take it from here and massage the API a little bit per above. |
I would like to run a few benchmarks and see whether, together with a modified version of hibernate, I'm getting the expected speedup.
Okay, just let me know when you think this is ready. For my testing, I assembled a somewhat bigger JAR (160K classes, 250 MB) and indexing that shows |
I believe the reason for that is due to the many intermediate obj (with their buffers) created to decode in what (very likely) is just a latin |
Force-pushed from 9ee451e to ba1bea2.
I've further extended the pool concept, but it is tightly tied to the way we use the tmp buffers for indexing, and some uses are not as safe as I'd wish. The easier solution is to clean up everything when a buffer is borrowed again, but that will cause some perf hit (to be measured).
Thank you! As mentioned above, I'm on PTO starting tomorrow until end of week, so I'll get to this next week. I'd also like to learn more about performance work, so I'll be happy to help with this (once I'm back). |
return new DataInputStream(new ByteArrayInputStream(pool, pos, len + 2)).readUTF();
}

private static String tryDecodeAsciiEntry(byte[] chars, int pos, int len) {
@Ladicek
This has been benchmarked in undertow-io/undertow#1424 against a simple copy loop and across different JDK versions, BUT both wildfly and quarkus may well run non-warmed-up code (i.e. C1-compiled or interpreted), so having a microbenchmark here seems a good idea.
Once JDK 11 becomes the main version we can use VarHandle to batch-read from the byte[] in one go (that will be a hell of a lot faster!)
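To make the VarHandle idea concrete, here is a minimal sketch of what a batched ASCII check could look like. It is not the code in this PR: `AsciiBatchDecoder`, `isAscii` and `tryDecodeAscii` are hypothetical names, and the sketch assumes the input is a modified UTF-8 constant pool entry, where every byte below 0x80 maps to exactly one char. It reads 8 bytes at a time as a `long` and tests all the high bits with a single mask.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jandex: batched ASCII detection (JDK 9+).
final class AsciiBatchDecoder {
    // Views a byte[] as longs; plain get() tolerates unaligned offsets.
    // The byte order does not matter for the high-bit mask test below.
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);
    // One high bit per byte lane: any bit set means a non-ASCII byte.
    private static final long HIGH_BITS = 0x8080808080808080L;

    static boolean isAscii(byte[] bytes, int pos, int len) {
        int i = pos;
        int end = pos + len;
        // Check 8 bytes per iteration while a full long still fits.
        for (; i + Long.BYTES <= end; i += Long.BYTES) {
            long batch = (long) LONG_VIEW.get(bytes, i);
            if ((batch & HIGH_BITS) != 0) {
                return false;
            }
        }
        // Tail: fewer than 8 bytes left, check them one by one.
        for (; i < end; i++) {
            if (bytes[i] < 0) {
                return false;
            }
        }
        return true;
    }

    // Returns null when the range is not pure ASCII, so that the caller
    // can fall back to a full modified UTF-8 decode.
    static String tryDecodeAscii(byte[] bytes, int pos, int len) {
        return isAscii(bytes, pos, len)
                ? new String(bytes, pos, len, StandardCharsets.ISO_8859_1)
                : null;
    }
}
```

Whether this beats a simple loop on cold (interpreted or C1) code is exactly the kind of thing a microbenchmark would have to confirm.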
Force-pushed from 9e8f0ae to aef0444.
@scottmarlow I should add a few tests, but this could already be used as it is, improving the wildfly use case at while should have a new field
private ClassDescriptor toClassDescriptor(ArchiveEntry entry) {
    try (InputStream inputStream = entry.getStreamAccess().accessInputStream()) {
        // DELETED -> Indexer indexer = new Indexer();
        // REPLACED WITH AN INDEXER REUSING BUFFERS
        Indexer indexer = indexerFactory.newIndexer();
        ClassSummary classSummary = indexer.indexWithSummary( inputStream );
        Index index = indexer.complete();
        return toClassDescriptor( classSummary, index, entry );
    }
    catch (IOException e) {
        throw new ArchiveException( "Could not build ClassInfo", e );
    }
}
@Ladicek can you confirm whether that's the right way to do it? @scottmarlow please remember that a
Force-pushed from 1bd32fc to 3971a5f.
I've sadly added 2 In some subsequent commits I would like to remove all the field loads for the fields (by passing them as method parameters):
private byte[] constantPool;
private int[] constantPoolOffsets;
private byte[] constantPoolAnnoAttrributes;
in many tiny methods, because it:
Force-pushed from 3971a5f to 3a94312.
Force-pushed from 3a94312 to 2dec148.
Good news! The allocation profiling data for this PR are very promising: the last addressable thing is that I was hoping was addressed by reusing the same |
Force-pushed from ed54c4c to 7b01188.
Force-pushed from 1ecf650 to 469e447.
The latest version of the changes (which includes a different growth strategy that mimics what ArrayList does) reports, for wildfly startup in a dummy project, an allocation reduction of 1.2 GB from 3.9 GB, i.e. new allocations are 2.68 GB, and almost all the out-of-TLAB allocations are gone as well. CPU-wise I think that if @Ladicek got something that can be turned into a micro/smoke perf test for this that heavily makes use of
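For readers unfamiliar with the ArrayList growth rule mentioned here, a minimal sketch follows. `GrowableBuffer` is a hypothetical class, not the one in this PR; the only thing it borrows from `ArrayList` is the 1.5x rule (`old + (old >> 1)`), and it ignores integer overflow for very large buffers.

```java
import java.util.Arrays;

// Hypothetical reusable buffer that grows like ArrayList (by about 1.5x).
final class GrowableBuffer {
    private byte[] data;

    GrowableBuffer(int initialCapacity) {
        this.data = new byte[initialCapacity];
    }

    // Returns a buffer with at least `required` bytes, reusing the current
    // array whenever it is already big enough.
    byte[] ensureCapacity(int required) {
        if (required > data.length) {
            int old = data.length;
            int grown = old + (old >> 1); // same rule ArrayList uses
            // Take the larger of the 1.5x step and the actual requirement.
            data = Arrays.copyOf(data, Math.max(grown, required));
        }
        return data;
    }

    int capacity() {
        return data.length;
    }
}
```

Growing geometrically keeps the number of reallocations logarithmic in the largest size seen, while overshooting less than plain doubling would.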
I've prepared a microbenchmark that loads a big jar (~25 MB) and places it into a with the changes in the PR:
before the changes in the PR:
The microbenchmark is using What I've discovered so far is that:
Below is the flamegraph of a benchmark run: it shows the The suggestion would be to have separate factory methods for intern pools, hiding an enum field that can be used as a hint to always apply (per type) a specific equals/hash method that expects the right type(s) upfront, hence saving the type checks needed to discover it each time.
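A rough sketch of the factory-method idea, to show its shape only. `InternPool`, `Kind`, `forByteArrays` and `forStrings` are hypothetical names and this is not the pool Jandex ships; the point is that the enum chosen at construction time selects the equals/hash pair, so `intern` never has to probe the element type with a chain of `instanceof` checks.

```java
import java.util.Arrays;

// Hypothetical open-addressing intern pool with a per-type hint.
final class InternPool<T> {
    private enum Kind { BYTES, STRING } // hidden behind the factory methods

    private final Kind kind;
    private Object[] table = new Object[16]; // length is always a power of two
    private int size;

    private InternPool(Kind kind) {
        this.kind = kind;
    }

    static InternPool<byte[]> forByteArrays() {
        return new InternPool<>(Kind.BYTES);
    }

    static InternPool<String> forStrings() {
        return new InternPool<>(Kind.STRING);
    }

    // The hint decides upfront which hash/equals to apply.
    private int hash(Object v) {
        int h = kind == Kind.BYTES ? Arrays.hashCode((byte[]) v) : v.hashCode();
        return h ^ (h >>> 16); // spread high bits into the index range
    }

    private boolean same(Object a, Object b) {
        return kind == Kind.BYTES ? Arrays.equals((byte[]) a, (byte[]) b) : a.equals(b);
    }

    // Returns the canonical instance equal to `value`, adding it if absent.
    @SuppressWarnings("unchecked")
    T intern(T value) {
        if (size * 2 >= table.length) {
            resize(); // keeps the load factor below 0.5
        }
        int mask = table.length - 1;
        int i = hash(value) & mask;
        for (Object e; (e = table[i]) != null; i = (i + 1) & mask) {
            if (same(value, e)) {
                return (T) e;
            }
        }
        table[i] = value;
        size++;
        return value;
    }

    private void resize() {
        Object[] old = table;
        table = new Object[old.length * 2];
        int mask = table.length - 1;
        for (Object e : old) {
            if (e == null) {
                continue;
            }
            int i = hash(e) & mask;
            while (table[i] != null) {
                i = (i + 1) & mask;
            }
            table[i] = e;
        }
    }
}
```

A cast to the expected type still costs one check, but it is a single predictable one instead of a per-call discovery of the type.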
I'm closing this, because I just submitted #303, which uses most of the commits from here, except:
I also added a few commits of my own, created with a lot of @franz1981's help -- thanks! Together, I've observed roughly 25% speedup on indexing a 256 MB JAR containing more than 160K classes. Decent improvement for now, and I need to focus on other things as well :-) |
Deploying a wildfly instance with many JPA models shows that buffered file reads allocate, by default, an 8K byte[] buffer for each indexed file, causing many out-of-TLAB allocations.
Ideally we could reuse such a buffer across different indexing ops, saving zeroing and TLAB bandwidth that could be used elsewhere.
I'm assuming 2 things:
- Indexer to be reused
- Indexer is a single-threaded class that can just borrow/release the same buffer again and again without being bothered by concurrency issues

@Ladicek are these valid assumptions?
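To illustrate the idea under the two assumptions above, here is a minimal sketch of a buffered stream that keeps its buffer across sources. `ReusableBufferedInputStream` and `wrap` are hypothetical names, not the classes added by this PR, and the sketch is deliberately single-threaded with no mark/reset support.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: one buffer, allocated once, re-pointed at each new
// source instead of allocating a fresh 8K byte[] per indexed file.
final class ReusableBufferedInputStream extends InputStream {
    private final byte[] buf; // reused across all indexing operations
    private InputStream in;   // current source, null when released
    private int pos;          // next byte to hand out
    private int limit;        // number of valid bytes in buf

    ReusableBufferedInputStream(int capacity) {
        this.buf = new byte[capacity];
    }

    // Binds the stream to a new source and discards stale buffered bytes.
    ReusableBufferedInputStream wrap(InputStream source) {
        this.in = source;
        this.pos = 0;
        this.limit = 0;
        return this;
    }

    // Refills the buffer; returns false at end of stream.
    private boolean fill() throws IOException {
        if (in == null) {
            throw new IOException("stream is not bound to a source");
        }
        int n;
        do {
            n = in.read(buf, 0, buf.length);
        } while (n == 0); // tolerate sources that return 0 bytes
        if (n < 0) {
            return false;
        }
        pos = 0;
        limit = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        if (pos == limit && !fill()) {
            return -1;
        }
        return buf[pos++] & 0xFF;
    }

    @Override
    public int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos == limit && !fill()) {
            return -1;
        }
        int n = Math.min(len, limit - pos);
        System.arraycopy(buf, pos, dst, off, n);
        pos += n;
        return n;
    }

    // Releases the source but keeps the buffer for the next wrap() call.
    @Override
    public void close() throws IOException {
        InputStream source = in;
        in = null;
        pos = 0;
        limit = 0;
        if (source != null) {
            source.close();
        }
    }
}
```

An Indexer that owns one such instance would call `wrap(...)` per class file and `close()` afterwards, so the buffer is zeroed and allocated only once.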