Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim values obtained with getTextContent() on any XML node #340

Merged
merged 4 commits into from Jul 23, 2021

Conversation

mauromol
Copy link
Contributor

This change ensures that surrounding whitespace is removed for any value obtained from a XML element where this is indeed the expected behaviour (like in issuers, audiences, status messages, name ids).

@pitbulk
Copy link
Contributor

pitbulk commented Apr 30, 2021

Those changes could break current integrations so them should be applied in a major release.

That said, I don't get why the toolkit may "clean" the values.

If there are not expected spaces in the XML, is IdP faults...

@mauromol
Copy link
Contributor Author

In some places the trim was already in place (see the audiences validation, for instance).
In my tests with SPID I encountered some cases in which surrounding spaces where introduced in <NameID> elements, for instance (perhaps just to format the resulting XML).
For instance, if you have this:

    <saml:NameID
      Format="urn:oasis:names:tc:SAML:2.0:nameid-format:transient"
      NameQualifier= "http://spidIdp.spididpProvider.it">
             _06e983facd7cd554cfe067e
    </saml:NameID>

what do you think the NameID value should be? Without trimming it will be something like

             _06e983facd7cd554cfe067e     | <= here it ends

while the expected value in this case would be just _06e983facd7cd554cfe067e.
I agree with you that since the SAML schema states that NameID derives from string (not from normalizedString or token) it should probably not be processed. However, in this case the processing should be coherent everywhere (in audiences as well, I would say). Perhaps the trimming could be enabled on demand in settings, with a default value of false?

@pitbulk
Copy link
Contributor

pitbulk commented May 3, 2021

Ok, lets do it configurable, by default disabled

@mauromol
Copy link
Contributor Author

mauromol commented May 5, 2021

I'll update this PR as soon as possible. I plan to add two properties (both defaulting to false):

  • trim name IDs: this will apply to subject and issuer name IDs
  • trim attribute values: this will apply to attribute values

The sessionIndex is an attribute, so I will revert the change made in the PR and perform no trimming at all.
<Audience> is of type anyUri, which has a whitespace policy of collapse, so it's correct that is trimmed.

Meanwhile it's interesting to see that the SAML specification (in the core specification document, section 1.3.1) says:

Unless otherwise noted in this specification or particular profiles, all elements in SAML documents that have the XML Schema xs:string type, or a type derived from that, MUST be compared using an exact binary comparison. In particular, SAML implementations and deployments MUST NOT depend on case-insensitive string comparisons, normalization or trimming of whitespace, or conversion of locale-specific.

However, the same specification document shows an example in which a NameID contains whitespace that is likely to be removed, see on page 72 of the SAML core specification document:

<NameID
    Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">
    scott@example.org
</NameID>

We can argue that it might be just due to PDF pagination, but indeed it leads to confusion.

@mauromol
Copy link
Contributor Author

mauromol commented May 5, 2021

Hi @pitbulk , here is the update to make trimming an opt-in feature.
It ended up that SessionIndex is an element in LogoutRequest, but it's not used for request processing, so I just provided an overloading of LogoutRequest.getSessionIndexes(...) with a boolean value specifying whether trimming should be performed or not. The same was done for all the other static methods of LogoutRequest that deal with the name ID and issuer: unfortunately the LogoutRequest API is not very elegant. Anyway, LogoutRequest.isValid() passes the right settings value when checking for request validity.
I removed any trimming when extracting the status message of a response status: I think it's not worth the effort and I expect the status message to be solely informative, so whitespace shouldn't hurt too much if present.

Please note that AuthnResponseTest.testGetIssuersTrimming() is disabled: it fails because of a bug I fixed in PR #333, so please merge one of these two PRs (if you agree with the changes) and then I will adapt the other one.

@pitbulk
Copy link
Contributor

pitbulk commented Jun 24, 2021

Sorry for the delay, #333 merged

@mauromol
Copy link
Contributor Author

Hi @pitbulk, thanks for merging! I will then adapt the other open pull requests (including this one), but please note that I'm leaving for a couple of weeks of holidays, I will be back around 12th July, when I will be able to proceed. Thanks for your patience.

@pitbulk
Copy link
Contributor

pitbulk commented Jun 24, 2021

We need to take care while applying this one...

An example of a security issue related to the use of "trim":
https://hackerone.com/reports/976603

@mauromol
Copy link
Contributor Author

Thanks for the link, I will study it and let you know!

If the subject name id is not trimmed, its value may contain
surrounding whitespace characters depending on XML formatting.

This change also avoids a double trim of audiences (which indeed were
already trimmed).
This change extends the previous one, made for SamlResponse name id,
so that surrounding whitespace is removed for any value obtained from
a XML element where this is indeed the expected behaviour (like in
issuers, audiences, status messages, name ids).
Name IDs (including issuers) are by default left untouched, as well as
attribute values, like it was before. However two new settings have
been introduced (whose default value is false) which allow to enable
trimming for such values, which is probably the desired behaviour in
practice, although SAML specification says that no whitespace processing
should be performed on strings.
Another place where trimming may be desirable is in SessionIndex
extraction from LogoutRequests: this is not performed at any point of
the LogoutRequest processing, but an overloading has been provided so
that the API consumer may still request trimming.
AuthnResponseTest.testGetIssuersTrimming() is disabled by now because it
fails due to a bug in SamlResponse.getIssuers() which is addressed by
another PR.
Now that fixes to the getIssuers() and the new getResponseIssuer() and
getAssertionIssuer() are available on master, proper test cases can be
provided with regards to trimming.
@mauromol
Copy link
Contributor Author

mauromol commented Jul 14, 2021

Hi @pitbulk, I rebased the whole work done here against current master and completed the AuthnResponseTest changes. Let me know if you have more objections.
The failures reported by GitHub seem to be related to the dependency-check plugin: all tests should pass.

I read the issue at https://hackerone.com/reports/976603: said that I don't know what Grammarly really is, if I understand it correctly the issue is related to the use of different trimming strategies in a scenario where there's the ability for a user to register to an IdP and there's the ability for a new IdP to join the federation exposing a malicious entity id.
IMHO (but for sure you're more experienced than me), since we left the default behaviour to NOT perform any trimming, if a SP wants to opt-in for the trimming behaviour may do that for a specific reason. In SPID, for instance, a user CANNOT join an IdP (users are pre-registered on IdPs with an out-of-band process, which requires a de visu identification of each subject) and IdPs cannot join the federation, unless they are approved by a central body (AgID) which will most likely reject any attempt to use a malicious entity id by any new IdP. So I don't think there is any harm related to issuer trimming performed by the SP on such a scenario: the trimmed issuer retrieved from the response may or may not match one of a very limited set of non-ambiguous entity ids. Let me know if you think differently.

@pitbulk pitbulk merged commit 3125450 into SAML-Toolkits:master Jul 23, 2021
@mauromol mauromol deleted the trim-subject-name-id branch July 24, 2021 11:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants