Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow schema validation #327

Closed
mauromol opened this issue Apr 7, 2021 · 2 comments · Fixed by #332
Closed

Slow schema validation #327

mauromol opened this issue Apr 7, 2021 · 2 comments · Fixed by #332

Comments

@mauromol
Copy link
Contributor

mauromol commented Apr 7, 2021

I'm experiencing very slow schema validation these days by com.onelogin.saml2.util.Util.validateXML(Document, URL).
I tried to debug and I see the validator is waiting on a socket read.
If I'm not missing something, it seems like to me the following files are downloaded from the remote host on every validation request:

All these files are in java-saml-core/src/main/resources/schemas, along with SAML schemas. However, while local SAML schemas seems to be actually used on schema validation, the above ones seem to be downloaded from the remote host, as I said.

I do not have experience in this area, but I think this could be solved by using a proper XMLCatalogResolver as the XMLEntityResolver (but I don't know where it should be set) or a LSResourceResolver set on the javax.xml.validation.SchemaFactory used to parse the Schema instance (it's not clear to me the difference and if this would be the right approach), so that those schemas are loaded from the local resources rather than from the remote host.

What do you think?

@mauromol
Copy link
Contributor Author

mauromol commented Apr 9, 2021

What perplexes me is that, when running unit tests, such validation is indeed fast. The invoked method is the same (com.onelogin.saml2.settings.Saml2Settings.validateMetadata(String), for instance, when I validate metadata), but it seems like that, when I invoke it from a consuming application, validation becomes slow because imported schemas are downloaded from the Internet.
Just a guess: perhaps some kind of "default" resolution of imported schemas works with local files when they are all located in the same file system directory (i.e.: when invoked by unit tests), while it fails when such schemas should be loaded from the java-saml JAR, although being located in the same package (i.e.: when invoked from my consuming application).

@mauromol mauromol changed the title Slow metadata validation Slow schema validation Apr 13, 2021
@mauromol
Copy link
Contributor Author

mauromol commented Apr 14, 2021

I intercepted the HTTP requests and these are the remote requests that I see are being made when validating the SP metadata from my consuming application:

The DTDs are requested because they are referenced by the online version of xenc-schema.xsd, which has the following header section that seems to be stripped out from the local xenc-schema.xsd in java-saml classpath:

<!DOCTYPE schema  PUBLIC "-//W3C//DTD XMLSchema 200102//EN"
 "http://www.w3.org/2001/XMLSchema.dtd"
 [
   <!ATTLIST schema
     xmlns:xenc CDATA #FIXED 'http://www.w3.org/2001/04/xmlenc#'
     xmlns:ds CDATA #FIXED 'http://www.w3.org/2000/09/xmldsig#'>
   <!ENTITY xenc 'http://www.w3.org/2001/04/xmlenc#'>
   <!ENTITY % p ''>
   <!ENTITY % s ''>
  ]>

I confirm that such remote calls are not being made when executing unit tests locally and indeed, debugging the code, I see that DTDs are not requested in this case, as a consequence of the local xenc-schema.xsd being used.

I'm experimenting now with a mechanism that should ensure a local resolution of all schemas and DTDs (and fall back to the current behavior in the unlikely case local resolution fails) even when java-saml is packaged as a JAR in a consuming application.

mauromol added a commit to mauromol/java-saml that referenced this issue Apr 14, 2021
Schemas in main/resources were not correctly used on schema loading
when java-saml was used as a JAR in a consuming application. It seems
like the local XSD files for imported schemas were used only when
running unit tests, while remote HTTP lookups from the W3C website were
made when using java-saml as a JAR. Now a LSResoureResolver is set on
the schema factory so that any known XSD or DTD is loaded from the
classpath, even when inside a JAR. Any other (unknown) schema is
resolved in the standard way (and may involve a remote call). Also, in
the unlikely event that retrieving the local copy of the XSD/DTD is
impossible, a fallback mechanism ensures the standard resolution is
performed.

Please note that the online version of xenc-schema.xsd contains a
reference to the XML Schema DTD. Now that we can resolve resources
locally, I decided to keep the DTD and include it in /schemas package
(along with the datatypes DTD). This should provide an even more
comprehensive schema validation.

Closes SAML-Toolkits#327.
mauromol added a commit to mauromol/java-saml that referenced this issue Apr 14, 2021
Schemas in main/resources were not correctly used on schema loading
when java-saml was used as a JAR in a consuming application. It seems
like the local XSD files for imported schemas were used only when
running unit tests, while remote HTTP lookups from the W3C website were
made when using java-saml as a JAR. Now a LSResoureResolver is set on
the schema factory so that any known XSD or DTD is loaded from the
classpath, even when inside a JAR. Any other (unknown) schema is
resolved in the standard way (and may involve a remote call). Also, in
the unlikely event that retrieving the local copy of the XSD/DTD is
impossible, a fallback mechanism ensures the standard resolution is
performed.

Please note that the online version of xenc-schema.xsd contains a
reference to the XML Schema DTD. Now that we can resolve resources
locally, I decided to keep the DTD and include it in /schemas package
(along with the datatypes DTD). This should provide an even more
comprehensive schema validation.

Closes SAML-Toolkits#327.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant