
Conversation

@maffelbaffel

@maffelbaffel maffelbaffel commented Feb 2, 2019

  • Created static default fields for DocumentBuilderFactory, DocumentBuilder and LSParser.
    These will be used if no other factory or resolver is specified.

While doing some performance tests in our codebase, I noticed that DOMHandle#receiveContent takes up about 50% of a request to MarkLogic. In particular, the calls to createLSParser and newDocumentBuilder took most of that time. This PR tries to reuse static fields of DocumentBuilderFactory, DocumentBuilder and LSParser to speed things up.
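For context, a minimal sketch of the static-reuse idea described above; the class and field names (DomDefaults, DEFAULT_FACTORY, etc.) are illustrative assumptions, not the actual code in the attached patch:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

class DomDefaults {
    // Built once instead of on every receiveContent call.
    static final DocumentBuilderFactory DEFAULT_FACTORY = DocumentBuilderFactory.newInstance();
    static final DocumentBuilder DEFAULT_BUILDER = newBuilder();
    static final LSParser DEFAULT_PARSER = newParser(DEFAULT_BUILDER);

    private static DocumentBuilder newBuilder() {
        try {
            DEFAULT_FACTORY.setNamespaceAware(true);
            return DEFAULT_FACTORY.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static LSParser newParser(DocumentBuilder builder) {
        // same cast as used later in the thread: the DOM implementation also implements DOMImplementationLS
        DOMImplementationLS domImpl = (DOMImplementationLS) builder.getDOMImplementation();
        return domImpl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    }
}
```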

patched.zip

@ehennum
Contributor

ehennum commented Feb 4, 2019

Thanks for isolating a potential hotspot.

One concern -- as I understand it, DOM implementations are not required to be thread-safe. I'm wondering whether it might be best to wrap the DocumentBuilder and LSParser in a ThreadLocal.
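One possible shape of such a ThreadLocal wrapper, as an illustrative sketch only (not the code that ended up in the pull request):

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

final class ThreadLocalDocumentBuilder {
    private static final ThreadLocal<DocumentBuilder> BUILDER =
            ThreadLocal.withInitial(ThreadLocalDocumentBuilder::create);

    static DocumentBuilder get() {
        DocumentBuilder builder = BUILDER.get();
        builder.reset(); // clear any state left over from the previous parse on this thread
        return builder;
    }

    private static DocumentBuilder create() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```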

One footnote: unless it's necessary to buffer an in-memory document, other XML representations (such as StAX or a JAXP StreamSource) might be more efficient than DOM.

By the way, have we asked for a contributor agreement?

http://developer.marklogic.com/products/cla

@robsman
Contributor

robsman commented Feb 5, 2019

Maybe #477 can be leveraged here. It introduced an XmlFactories class that handles a per-thread singleton for the StAX XMLOutputFactory factory class.

Potentially, you could add similar suppliers for the DOM factories as well.
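A hypothetical supplier in the spirit of XmlFactories from #477 might look like the sketch below; it assumes the common ThreadLocal-plus-SoftReference pattern mentioned later in this thread, and the class name PerThreadSoftCache is an invention for illustration:

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

final class PerThreadSoftCache<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private final ThreadLocal<SoftReference<T>> cache = new ThreadLocal<>();

    PerThreadSoftCache(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        SoftReference<T> ref = cache.get();
        T value = (ref == null) ? null : ref.get();
        if (value == null) {
            // (re)create if never built on this thread or if the GC cleared the soft reference
            value = factory.get();
            cache.set(new SoftReference<>(value));
        }
        return value;
    }
}
```

A caller could then hold, for example, `new PerThreadSoftCache<>(XMLOutputFactory::newFactory)` for StAX, or an equivalent supplier for a DocumentBuilderFactory.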

@maffelbaffel
Author

I agree, a ThreadLocal definitely makes sense. I will look into this over the next couple of days.

@maffelbaffel
Author

I added a per-thread caching factory for DocumentBuilder similar to XmlFactories. DOMHandle now uses a DocumentBuilder obtained through this factory. I'm not sure whether an LSParser also needs to be in a ThreadLocal? In my current implementation it gets created the first time receiveContent is called and is then reused, so each DOMHandle has its own LSParser. Should that be enough? Correct me if I'm wrong 👍
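A rough sketch of the per-handle lazy LSParser described above; the class and method names are assumptions, not the actual DOMHandle internals:

```java
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class LazyParserHolder {
    private LSParser parser; // created on first use, then reused by this handle

    Document parse(DocumentBuilder cachedBuilder, LSInput input) {
        if (parser == null) {
            DOMImplementationLS domImpl =
                    (DOMImplementationLS) cachedBuilder.getDOMImplementation();
            parser = domImpl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        }
        return parser.parse(input);
    }
}
```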

Creating a new DocumentBuilder and LSParser each time
DOMHandle#receiveContent gets called is very time-consuming.
Fix this by using a per-thread cached DocumentBuilder in
DOMHandle. The LSParser is created the first time receiveContent is called
and then reused.

Extracted CachedInstancePerThreadSupplier to be able to reuse it in
DocumentBuilderFactories.

Fixes #1054.

Signed-off-by: Wagner Michael <maffelbaffel@posteo.de>
Contributor

@ehennum ehennum left a comment


This change looks good to me -- especially refactoring out CachedInstancePerThreadSupplier as a generally reusable class.

My preference would be to put the DOM factories in the same class as the other XML factories for easier comparison and maintenance in parallel, but that's just a preference.

Regarding whether LSParser should be thread local, I would think that's only necessary if constructing the LSParser was also a performance hotspot in your testing, making caching advisable. Otherwise, caching the DocumentBuilder would be enough.

If I read the history correctly, @robsman did the initial work on the thread local with soft references, so it would be good to get his review as well if he has time.

Finally, thanks for identifying and solving this problem -- we'll all get the benefit.

@maffelbaffel
Author

Regarding whether LSParser should be thread local, I would think that's only necessary if constructing the LSParser was also a performance hotspot in your testing, making caching advisable. Otherwise, caching the DocumentBuilder would be enough.

Caching it would also speed up queries which do not reuse a DOMHandle or which use .getAs(class) instead of .get(handle).get(). This might actually speed up queries I have not looked at yet. I will look into caching LSParser again next week (as I am quite busy and low on free time right now).

@maffelbaffel
Author

maffelbaffel commented Feb 11, 2019

Alright, I added another commit which adds thread-local caching for LSParser. Because an LSParser depends on and is created through a DocumentBuilder, I could not use the existing CachedInstancePerThreadSupplier class.

There is now a new CachedInstancePerThreadFunction class which provides cached values similar to CachedInstancePerThreadSupplier. The difference is that a Map is used internally, with the hash code of the Function's parameter as the key. This way, every call to cachedLsParser.get(docBuilder) returns the cached LSParser specific to the passed docBuilder.
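An approximation of that idea, for readers following along; the real class in the commit may differ in detail, and CachedPerThreadFunction here is only an illustrative name:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class CachedPerThreadFunction<T, R> implements Function<T, R> {
    private final Function<T, R> factory;
    private final ThreadLocal<Map<Integer, R>> cache = ThreadLocal.withInitial(HashMap::new);

    CachedPerThreadFunction(Function<T, R> factory) {
        this.factory = factory;
    }

    @Override
    public R apply(T argument) {
        // one cached value per argument hash code, per thread
        return cache.get().computeIfAbsent(argument.hashCode(), key -> factory.apply(argument));
    }
}
```

With this shape, a call such as `cachedLsParser.apply(docBuilder)` would create the LSParser once per DocumentBuilder on the current thread and reuse it afterwards.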

Also, I merged the factories into XmlFactories; I think it makes more sense to keep them together.

This second commit again speeds up various queries in our codebase where we did not re-use a DOMHandle.

Hope that makes sense 👍
I did not squash the two commits yet so you can review them more easily. I will squash them if you want.

* Moved DocumentBuilderFactories to existing XmlFactories.
* Created class CachedInstancePerThreadFunction similar to CachedInstancePerThreadSupplier.
* Re-using thread-local cached instances of LSParser in DOMHandle provided by XmlFactories.

This results in faster receiveContent calls which do not re-use a DOMHandle, but
rather use their own instance in each API call.

Fixes #1054.

Signed-off-by: Wagner Michael <maffelbaffel@posteo.de>
@ehennum
Contributor

ehennum commented Feb 11, 2019

@maffelbaffel , thanks for the thoughtfulness invested in this pull request.

A belated question occurred to me: did your performance testing indicate how much of the DOM initialization cost is consumed in the creation of the DOMImplementationLS object?

In other words, if we had a threadsafe cache for the DOMImplementationLS object, would that improve the efficiency of both reading and writing of Document objects without requiring a separate cache for the LSParser?

@maffelbaffel
Author

By DOM initialization cost do you mean this line?
DOMImplementationLS domImpl = (DOMImplementationLS) documentBuilder.getDOMImplementation();
At least in the Java runtime this returns a static singleton, so there is not much to gain by caching this object, I guess.

The most time-consuming calls by far were creating a new DocumentBuilder (factory.newDocumentBuilder() - line) and creating an LSParser (domImpl.createLSParser - line). Also really time-consuming is the actual parsing through parser.parse(domInput), but I don't think we can make that faster without switching to a streaming StAX implementation. I actually don't know whether that would make a huge difference, though.

@maffelbaffel
Author

maffelbaffel commented Feb 11, 2019

I added another patched.zip which shows the current performance flamegraph (just drop the file in your browser to navigate through it). DOMHandle.receiveContent now consists only of parsing.

I have not yet tried writing with a DOMHandle, but we might also have the possibility to improve it by caching domImpl.createLSSerializer().

Should I check that too?

@ehennum
Contributor

ehennum commented Feb 12, 2019

I should have been clearer -- I was wondering whether it would make sense to cache DOMImplementationLS instead of DocumentBuilder, because both the LSParser and LSSerializer are created from the DOMImplementationLS object. I would expect the LSSerializer to be pretty lightweight.

Anyway, I'll compare and contrast the two flamegraphs.

@ehennum
Contributor

ehennum commented Feb 12, 2019

In comparing the first and second flame graphs, I didn't see a big difference in the overall profile. However, I'm not a performance engineering specialist.

As usual, the engineering concern is the balance between optimization and maintainability. If the LSParser caching doesn't make a big difference, maybe it would be better to cache just one object. If we do cache one object, maybe that object should be the DOMImplementationLS.

By the way, are you using the search summary provided by SearchHandle or simply using the query to retrieve documents? If the latter, DocumentManager.search() is likely to be faster because it transports documents as parts of a multipart response instead of extracting them from an XML payload.

+1 to your earlier observation about the DOM having a reputation for being slow. If you don't need to keep the document in memory, SAX or StAX might be faster when you need to extract values, and Reader or InputStream faster when you just need to send the bits somewhere.
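For illustration of the StAX alternative mentioned here, a small sketch of pulling a single element's text without materializing a DOM tree; the element name and helper class are assumptions:

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxValueExtractor {
    static String firstElementText(Reader source, String localName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(source);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    return reader.getElementText(); // stops as soon as the value is found
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```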

@maffelbaffel
Author

Yes, I can understand that especially the second commit adds complexity for only a small performance improvement. I definitely want to get this right in a way both of us are happy with 👍
Caching the DOMImplementationLS might be optimization enough for now.

I wonder, because documentBuilder.getDOMImplementation() returns a static singleton, do we even need to cache it in a ThreadLocal? If DOMImplementationLS were not thread-safe, returning a static singleton would already be quite an error in the Java API.

@ehennum
Contributor

ehennum commented Feb 13, 2019

I think I see what you're saying.

The DOM implementation is strange in that the application pays for initializing the DocumentBuilder object (which is itself not thread-safe) even though the DOMImplementationLS object must be thread-safe.

So, if we get the DOMImplementationLS on first use and cache it in a static, we should be okay (and only pay the initialization cost of the DocumentBuilder once even though we don't cache it).
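A sketch of that approach, assuming an illustrative holder class (not the actual code): fetch the DOMImplementationLS lazily on first use, pay the DocumentBuilder construction cost a single time, and keep the result in a static field.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.ls.DOMImplementationLS;

final class DomImplementationHolder {
    private static volatile DOMImplementationLS domImplLS;

    static DOMImplementationLS get() {
        DOMImplementationLS impl = domImplLS;
        if (impl == null) {
            synchronized (DomImplementationHolder.class) {
                impl = domImplLS;
                if (impl == null) {
                    impl = create();
                    domImplLS = impl;
                }
            }
        }
        return impl;
    }

    private static DOMImplementationLS create() {
        try {
            // the builder itself is not cached; it is only needed to reach the DOM implementation
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```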

While that depends on the current JDK implementation, the DOM implementation seems very stable. If it is ever revised, perhaps the revisers will take thread safety into consideration.

Okay, I'm persuaded.

@rjrudin
Contributor

rjrudin commented Oct 24, 2022

@maffelbaffel Apologies for what has obviously been a long delay in any response. Is this PR still of value to you? And if so, do you happen to have any test for verifying the performance improvement? Or could that simply be done by e.g. reading 1k rows via the old DOMHandle and via the DOMHandle in your PR?

@rjrudin
Contributor

rjrudin commented Nov 1, 2022

Closing this, as the source repository is no longer accessible. Will revisit if we hear of performance issues with DOMHandle in the future.

@rjrudin rjrudin closed this Nov 1, 2022