Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly escape text to produce valid XML #315

Merged
merged 4 commits into from Jul 23, 2021

Conversation

mauromol
Copy link
Contributor

The generation methods for metadata, AuthnRequest, LogoutRequest and
LogoutResponse XML documents and messages have been changed so that any
text that could be specified by the user is properly escaped in order to
produce a valid XML output: characters like &, ", < and > are then
replaced with the corresponding XML entities (& ", <, >).

The commons-lang StringEscapeUtils class has been used, although being
deprecated: the deprecation warning suggests to switch to an equivalent
class of commons-text, which java-saml currently does not depend on. I
leave the decision to possibly switch to it for future evaluation:
the escaping logic was indeed implemented in the Util class, so any
implementation change of it just needs to change Util.toXml(String)
method.

Closes #309 and #305.

The generation methods for metadata, AuthnRequest, LogoutRequest and
LogoutResponse XML documents and messages have been changed so that any
text that could be specified by the user is properly escaped in order to
produce a valid XML output: characters like &, ", < and > are then
replaced with the corresponding XML entities (&amp; &quot;, &lt;, &gt;).

The commons-lang StringEscapeUtils class has been used, although being
deprecated: the deprecation warning suggests to switch to an equivalent
class of commons-text, which java-saml currently does not depend on. I
leave the decision to possibly switch to it for future evaluation:
the escaping logic was indeed implemented in the Util class, so any
implementation change of it just needs to change Util.toXml(String)
method.

Closes SAML-Toolkits#309 and SAML-Toolkits#305.
@pitbulk
Copy link
Contributor

pitbulk commented Jun 25, 2021

I don't think you need to escape objects like:
certString, nameIDFormat, requestedAuthnContextCmp, requestedAuthnContext, id, statusCode, generateUniqueID because we control the value of those attributes.

Related to the escape method, escapeXml is deprecated but it only tries to escaped the five basic XML entities (gt, lt, quot, amp, apos), which is what we need. Not sure if the use of escapeXml10 or escapeXml11, that escaped mor chars, can carry side effects.

@haavar
Copy link

haavar commented Jun 30, 2021

Any chance you could do a release once this is merged?

@mauromol
Copy link
Contributor Author

I don't think you need to escape objects like:
certString, nameIDFormat, requestedAuthnContextCmp, requestedAuthnContext, id, statusCode, generateUniqueID because we control the value of those attributes.

I can't check the code right now, but are we really sure that those settings are fully under our control? I'm thinking about the dynamic settings use case. I went for the safe way, wherever there are strings involved I escape them. In this way we should protect ourselves against malicious settings.

Related to the escape method, escapeXml is deprecated but it only tries to escaped the five basic XML entities (gt, lt, quot, amp, apos), which is what we need. Not sure if the use of escapeXml10 or escapeXml11, that escaped mor chars, can carry side effects.

Perhaps minimal escaping would be enough, but I can't think of how a more comprehensive escaping could harm. This is why I opted for the escapeXml10. I can't remember the details right now but I remember that I had problems in the past with more essential escaping when non printable characters were involved: that case was handled well by escapeXml10 instead.

@mauromol
Copy link
Contributor Author

I don't think you need to escape objects like:
certString, nameIDFormat, requestedAuthnContextCmp, requestedAuthnContext, id, statusCode, generateUniqueID because we control the value of those attributes.

I just checked:

  • certString in metadata: since it's always the result of a Base64 encoding, we may probably omit escaping; if you prefer, I can remove it
  • nameIDFormat is a generic string extracted from settings.getSpNameIDFormat(): you may input anything into Saml2Settings.setSpNameIDFormat(String) and corrupt the resulting XML; of course, something which is not a legal SAML format will cause a validation error somewhere down along the SAML processing, but I still think that a broken XML is much worse (and possibly a security risk?).
  • requestedAuthnContextCmp: same for nameIDFormat, it comes from settings as a generic String, I don't see any validation that restricts the set of valid context comparison values (when loading properties files, for instance) and, in any case, in a dynamic settings scenario one can input anything to Saml2Settings.setRequestedAuthnContextComparison(String)
  • requestedAuthnContext is even more volatile, because, if I remember well, SAML specification leaves freedom to specify contexts (SPID has its own custom contexts, for instance); so I think it's important to properly escape contexts
  • id is computed by Util.generateUniqueID(String) as the concatenation of a prefix (which usually comes from Saml2Settings.getUniqueIDPrefix(), which is again a generic string) and a UUID; so the id as a whole may contain characters which need escaping
  • statusCode is passed to LogoutResponse.generateSubstitutor(Saml2Settings, String) by LogoutResponse.build(String, String), which is public API: once again, one can input anything as the status code to put into the generated XML, even a malicious string which may break the resulting XML

To sum it up, I think that the only place where escaping may be removed is in certString output in metadata generation.

@mauromol
Copy link
Contributor Author

Perhaps minimal escaping would be enough, but I can't think of how a more comprehensive escaping could harm. This is why I opted for the escapeXml10. I can't remember the details right now but I remember that I had problems in the past with more essential escaping when non printable characters were involved: that case was handled well by escapeXml10 instead.

Just to provide more details: I can remember now that those non printable characters were making the JSF parser crazy and produce some fancy exceptions. As I said, I could solve the problem with escapeXml10, since those non printable characters were simply filtered out, which was exactly what I needed. More basic XML escaping provided by Spring, or the basic escapeXml you mention, were failing at doing that.

In java-saml context I think that basic escaping could help to make legitimate use cases (which are currently broken) work, while a more comprehensive escaping can help to prevent malicious XML to be generated and cause any sort of unexpected problems to consumers. IMHO, escapeXml10 provides an easy solution to the problem, unless you have some other alternative and/or you want to fine tune the characters to escape.

Anyway, I think this issue is quite severe, so we may probably start with escapeXml10 now and then subsequently change just Util.toXml() implementation.

@mauromol
Copy link
Contributor Author

@pitbulk I removed the useless XML escaping of certificate data.

@pitbulk pitbulk merged commit 38f3480 into SAML-Toolkits:master Jul 23, 2021
@mauromol mauromol deleted the escape-strings-in-xml branch July 24, 2021 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Text in produced XML is not escaped
3 participants