XMLReaders creates XSDVALIDATING unnecessarily #136

Closed
ilm-informatique opened this Issue Jun 26, 2014 · 4 comments

Comments

Projects
None yet
4 participants
@ghost

ghost commented Jun 26, 2014

Hi,

I investigated a performance regression in our application when upgrading to JDOM2 : calling new SAXBuilder() for the first time went from 15ms to 700ms. It turns out that it calls XMLReaders.NONVALIDATING and since XMLReaders is an enum it also constructs the other members. And XSDVALIDATING takes a long time when xerces 2.9.1 is in the classpath. So even though we're never going to validate anything using XSD we're forced to take a 700ms hit (short of patching Xerces).
Other problems with XMLReaders :

  1. A complex lookup is used for XSDVALIDATING (see SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)) what if when the first call to SAXBuilder is made the lookup isn't right ? The VM needs to be restarted since XMLReaders is immutable.
  2. What if one wants to switch validation engines (by modifying the lookup) ?
  3. Conceptually XMLReaders isn't a defined closed set, there could be other validations (e.g. RelaxNG). So it shouldn't be an enum.
  4. SAXParserFactory is not deemed thread-safe, so it should be synchronized

I've patched XMLReaders today (June, 26th 2014) to try to address the aforementioned problems. ant junit only reports 3 errors for TestStAXOutputter2Writer and TestStAXReader2Writer.

Thougths about the patch : perhaps the NONVALIDATING and DTDVALIDATING fields should also be encapsulated. XMLReaders shouldn't be plural.

https://gist.github.com/ilm-informatique/57ba6c76bf332c3409e1

@rolfl

This comment has been minimized.

Show comment
Hide comment
@rolfl

rolfl Jun 26, 2014

Collaborator

There is a lot to go through here. Changing the ENUM to remove the memeber is not likely a viable option (backward compatibilty breaks). But, there are other alternatives. I am going to have to study your code and understand how you measured the 700ms, etc.

Collaborator

rolfl commented Jun 26, 2014

There is a lot to go through here. Changing the ENUM to remove the memeber is not likely a viable option (backward compatibilty breaks). But, there are other alternatives. I am going to have to study your code and understand how you measured the 700ms, etc.

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Jun 27, 2014

I only use XMLReaders indirectly through SAXBuilder, so another solution is to create new constants (for no validation and DTD validation) to use in SAXBuilder constructors. My main point is to only initialize XSD validation if needed. That being said you should at least synchronize access to jaxpfactory.
The 700ms are almost wholly in Apache code, but even if it was less or by another provider, I don't want to spend time and memory on functionality not needed in my app.

ghost commented Jun 27, 2014

I only use XMLReaders indirectly through SAXBuilder, so another solution is to create new constants (for no validation and DTD validation) to use in SAXBuilder constructors. My main point is to only initialize XSD validation if needed. That being said you should at least synchronize access to jaxpfactory.
The 700ms are almost wholly in Apache code, but even if it was less or by another provider, I don't want to spend time and memory on functionality not needed in my app.

@openconcerto

This comment has been minimized.

Show comment
Hide comment
@openconcerto

openconcerto Jun 30, 2014

Backward compatibilty or not, the performance issue is huge. I saw the exact same issue on our application.
To be able to switch the validation mode without restarting could be a neat progress.

Thanks for the patch!
Hoping this patch will be merged soon.

Backward compatibilty or not, the performance issue is huge. I saw the exact same issue on our application.
To be able to switch the validation mode without restarting could be a neat progress.

Thanks for the patch!
Hoping this patch will be merged soon.

@ArnaudC

This comment has been minimized.

Show comment
Hide comment
@ArnaudC

ArnaudC Aug 9, 2014

Hi,
I think SAXBuilder shouldn't block all if the host is not found !
If you are disconnected, it makes an exception and even with :

SAXBuilder builder = new SAXBuilder(XMLReaders.NONVALIDATING);
Document doc = builder.build(new File("f.xml"));

ArnaudC commented Aug 9, 2014

Hi,
I think SAXBuilder shouldn't block all if the host is not found !
If you are disconnected, it makes an exception and even with :

SAXBuilder builder = new SAXBuilder(XMLReaders.NONVALIDATING);
Document doc = builder.build(new File("f.xml"));

@rolfl rolfl added the Target 2.0.x label Feb 13, 2015

rolfl added a commit that referenced this issue Feb 15, 2015

Fixes #136 - Make the XMLReader implementations Lazy-initialize.
Originally the enum would initialize all three factories on firs use.
Now the initialization will happen on a per-factory basis. It is still
a lazy singleton concept. On my computer this saves as much as 200ms on
JDOM startup

@rolfl rolfl closed this in d7a0ec2 Feb 15, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment