Slow schema validation #327
Comments
What perplexes me is that, when running unit tests, such validation is indeed fast. The invoked method is the same ( |
I intercepted the HTTP requests and these are the remote requests that I see are being made when validating the SP metadata from my consuming application:
The DTDs are requested because they are referenced by the online version of
<!DOCTYPE schema PUBLIC "-//W3C//DTD XMLSchema 200102//EN"
"http://www.w3.org/2001/XMLSchema.dtd"
[
<!ATTLIST schema
xmlns:xenc CDATA #FIXED 'http://www.w3.org/2001/04/xmlenc#'
xmlns:ds CDATA #FIXED 'http://www.w3.org/2000/09/xmldsig#'>
<!ENTITY xenc 'http://www.w3.org/2001/04/xmlenc#'>
<!ENTITY % p ''>
<!ENTITY % s ''>
]>
I confirm that such remote calls are not being made when executing unit tests locally and indeed, debugging the code, I see that DTDs are not requested in this case, as a consequence of the local
I'm experimenting now with a mechanism that should ensure a local resolution of all schemas and DTDs (and fall back to the current behavior in the unlikely case local resolution fails) even when java-saml is packaged as a JAR in a consuming application.
Schemas in main/resources were not correctly used on schema loading when java-saml was used as a JAR in a consuming application. It seems that the local XSD files for imported schemas were used only when running unit tests, while remote HTTP lookups to the W3C website were made when using java-saml as a JAR.

Now an LSResourceResolver is set on the schema factory so that any known XSD or DTD is loaded from the classpath, even when inside a JAR. Any other (unknown) schema is resolved in the standard way (and may involve a remote call). Also, in the unlikely event that the local copy of the XSD/DTD cannot be retrieved, a fallback mechanism ensures the standard resolution is performed.

Please note that the online version of xenc-schema.xsd contains a reference to the XML Schema DTD. Now that we can resolve resources locally, I decided to keep the DTD and include it in the /schemas package (along with the datatypes DTD). This should provide an even more comprehensive schema validation.

Closes SAML-Toolkits#327.
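The resolver described in the commit message above could look roughly like the following. This is only a minimal sketch, not the code actually merged into java-saml: the class name `ClasspathResourceResolver`, the `knownResources` map and the resource paths passed to it are all hypothetical. It shows the two fallbacks mentioned above: returning `null` for an unknown resource, and returning `null` when the bundled copy cannot be found, which in both cases leaves resolution to the factory's standard behavior.

```java
import java.io.InputStream;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical class: illustrates the idea, not the code merged into java-saml.
class ClasspathResourceResolver implements LSResourceResolver {

    // Assumed mapping: remote system ID -> classpath location of the bundled copy.
    private final Map<String, String> knownResources;
    private final DOMImplementationLS ls;

    ClasspathResourceResolver(Map<String, String> knownResources) {
        this.knownResources = knownResources;
        try {
            // Standard DOM bootstrap used here only as a factory for LSInput objects.
            this.ls = (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM Load/Save implementation available", e);
        }
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        String path = (systemId == null) ? null : knownResources.get(systemId);
        if (path == null) {
            return null; // unknown resource: the factory resolves it the standard way
        }
        // getResourceAsStream also works when the resource sits inside a JAR.
        InputStream in = ClasspathResourceResolver.class.getResourceAsStream(path);
        if (in == null) {
            return null; // local copy missing: fall back to standard resolution
        }
        LSInput input = ls.createLSInput();
        input.setPublicId(publicId);
        input.setSystemId(systemId);
        input.setBaseURI(baseURI);
        input.setByteStream(in);
        return input;
    }

    // Builds a W3C XML Schema factory that consults the resolver first.
    static SchemaFactory newSchemaFactory(Map<String, String> knownResources) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver(new ClasspathResourceResolver(knownResources));
        return factory;
    }
}
```

A caller would pass something like `Map.of("http://www.w3.org/2001/XMLSchema.dtd", "/schemas/XMLSchema.dtd")`, where the file name under /schemas is an assumption on my part.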
I'm experiencing very slow schema validation these days by com.onelogin.saml2.util.Util.validateXML(Document, URL). I tried to debug and I see the validator is waiting on a socket read.
If I'm not missing something, it seems to me that the following files are downloaded from the remote host on every validation request:
All these files are in java-saml-core/src/main/resources/schemas, along with the SAML schemas. However, while the local SAML schemas seem to be actually used on schema validation, the ones above seem to be downloaded from the remote host, as I said.
I do not have experience in this area, but I think this could be solved by using a proper XMLCatalogResolver as the XMLEntityResolver (but I don't know where it should be set) or an LSResourceResolver set on the javax.xml.validation.SchemaFactory used to parse the Schema instance (the difference is not clear to me, nor whether this would be the right approach), so that those schemas are loaded from the local resources rather than from the remote host.
What do you think?
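On the second option: going by the JAXP Javadoc, the LSResourceResolver set on a SchemaFactory is what gets consulted when the factory needs an external resource while parsing a schema (imported or included files, and DTDs or entities referenced from schema files). The sketch below tries to make that visible with two tiny in-memory schemas. Everything in it (the class name `ImportResolutionDemo`, the `urn:example:*` namespaces, the `http://example.invalid/dep.xsd` location) is made up for illustration and has nothing to do with the actual SAML schemas.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;

// Hypothetical demo class; schemas and URLs are invented for illustration.
class ImportResolutionDemo {

    static final String DEP_LOCATION = "http://example.invalid/dep.xsd";

    // Main schema: imports another namespace from a "remote" location.
    static final String MAIN_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " xmlns:dep='urn:example:dep' targetNamespace='urn:example:main'>"
      + "<xs:import namespace='urn:example:dep' schemaLocation='" + DEP_LOCATION + "'/>"
      + "<xs:element name='root' type='dep:code'/>"
      + "</xs:schema>";

    // Imported schema, served from memory instead of the network.
    static final String DEP_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:dep'>"
      + "<xs:simpleType name='code'><xs:restriction base='xs:string'/></xs:simpleType>"
      + "</xs:schema>";

    // Compiles MAIN_XSD and returns the system IDs the factory asked the resolver for.
    static List<String> compileAndRecordLookups() {
        List<String> requested = new ArrayList<>();
        try {
            DOMImplementationLS ls = (DOMImplementationLS) DOMImplementationRegistry
                    .newInstance().getDOMImplementation("LS");
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
                requested.add(systemId);
                if (!DEP_LOCATION.equals(systemId)) {
                    return null; // not ours: leave it to the default resolution
                }
                LSInput input = ls.createLSInput();
                input.setSystemId(systemId);
                input.setStringData(DEP_XSD); // local content replaces the remote fetch
                return input;
            });
            factory.newSchema(new StreamSource(new StringReader(MAIN_XSD)));
        } catch (Exception e) {
            throw new IllegalStateException("Schema compilation failed", e);
        }
        return requested;
    }
}
```

If the resolver were not consulted, compiling the main schema would need to fetch the imported one over the network, which is the behavior I'd like to avoid. The entity-resolver route presumably hooks in at the parser level instead, but I have not verified how that interacts with SchemaFactory.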