Trim values obtained with getTextContent() on any XML node #340

mauromol · 2021-04-19T10:36:04Z

This change ensures that surrounding whitespace is removed for any value obtained from a XML element where this is indeed the expected behaviour (like in issuers, audiences, status messages, name ids).

pitbulk · 2021-04-30T08:51:34Z

Those changes could break current integrations so them should be applied in a major release.

That said, I don't get why the toolkit may "clean" the values.

If there are not expected spaces in the XML, is IdP faults...

mauromol · 2021-04-30T09:39:37Z

In some places the trim was already in place (see the audiences validation, for instance).
In my tests with SPID I encountered some cases in which surrounding spaces where introduced in <NameID> elements, for instance (perhaps just to format the resulting XML).
For instance, if you have this:

    <saml:NameID
      Format="urn:oasis:names:tc:SAML:2.0:nameid-format:transient"
      NameQualifier= "http://spidIdp.spididpProvider.it">
             _06e983facd7cd554cfe067e
    </saml:NameID>

what do you think the NameID value should be? Without trimming it will be something like

             _06e983facd7cd554cfe067e     | <= here it ends

while the expected value in this case would be just _06e983facd7cd554cfe067e.
I agree with you that since the SAML schema states that NameID derives from string (not from normalizedString or token) it should probably not be processed. However, in this case the processing should be coherent everywhere (in audiences as well, I would say). Perhaps the trimming could be enabled on demand in settings, with a default value of false?

pitbulk · 2021-05-03T18:11:26Z

Ok, lets do it configurable, by default disabled

mauromol · 2021-05-05T10:46:22Z

I'll update this PR as soon as possible. I plan to add two properties (both defaulting to false):

trim name IDs: this will apply to subject and issuer name IDs
trim attribute values: this will apply to attribute values

The sessionIndex is an attribute, so I will revert the change made in the PR and perform no trimming at all.
<Audience> is of type anyUri, which has a whitespace policy of collapse, so it's correct that is trimmed.

Meanwhile it's interesting to see that the SAML specification (in the core specification document, section 1.3.1) says:

Unless otherwise noted in this specification or particular profiles, all elements in SAML documents that have the XML Schema xs:string type, or a type derived from that, MUST be compared using an exact binary comparison. In particular, SAML implementations and deployments MUST NOT depend on case-insensitive string comparisons, normalization or trimming of whitespace, or conversion of locale-specific.

However, the same specification document shows an example in which a NameID contains whitespace that is likely to be removed, see on page 72 of the SAML core specification document:

<NameID
    Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">
    scott@example.org
</NameID>

We can argue that it might be just due to PDF pagination, but indeed it leads to confusion.

mauromol · 2021-05-05T16:21:07Z

Hi @pitbulk , here is the update to make trimming an opt-in feature.
It ended up that SessionIndex is an element in LogoutRequest, but it's not used for request processing, so I just provided an overloading of LogoutRequest.getSessionIndexes(...) with a boolean value specifying whether trimming should be performed or not. The same was done for all the other static methods of LogoutRequest that deal with the name ID and issuer: unfortunately the LogoutRequest API is not very elegant. Anyway, LogoutRequest.isValid() passes the right settings value when checking for request validity.
I removed any trimming when extracting the status message of a response status: I think it's not worth the effort and I expect the status message to be solely informative, so whitespace shouldn't hurt too much if present.

Please note that AuthnResponseTest.testGetIssuersTrimming() is disabled: it fails because of a bug I fixed in PR #333, so please merge one of these two PRs (if you agree with the changes) and then I will adapt the other one.

pitbulk · 2021-06-24T14:54:20Z

Sorry for the delay, #333 merged

mauromol · 2021-06-24T14:58:56Z

Hi @pitbulk, thanks for merging! I will then adapt the other open pull requests (including this one), but please note that I'm leaving for a couple of weeks of holidays, I will be back around 12th July, when I will be able to proceed. Thanks for your patience.

pitbulk · 2021-06-24T15:12:30Z

We need to take care while applying this one...

An example of a security issue related to the use of "trim":
https://hackerone.com/reports/976603

mauromol · 2021-06-24T15:14:41Z

Thanks for the link, I will study it and let you know!

If the subject name id is not trimmed, its value may contain surrounding whitespace characters depending on XML formatting. This change also avoids a double trim of audiences (which indeed were already trimmed).

This change extends the previous one, made for SamlResponse name id, so that surrounding whitespace is removed for any value obtained from a XML element where this is indeed the expected behaviour (like in issuers, audiences, status messages, name ids).

Name IDs (including issuers) are by default left untouched, as well as attribute values, like it was before. However two new settings have been introduced (whose default value is false) which allow to enable trimming for such values, which is probably the desired behaviour in practice, although SAML specification says that no whitespace processing should be performed on strings. Another place where trimming may be desirable is in SessionIndex extraction from LogoutRequests: this is not performed at any point of the LogoutRequest processing, but an overloading has been provided so that the API consumer may still request trimming. AuthnResponseTest.testGetIssuersTrimming() is disabled by now because it fails due to a bug in SamlResponse.getIssuers() which is addressed by another PR.

Now that fixes to the getIssuers() and the new getResponseIssuer() and getAssertionIssuer() are available on master, proper test cases can be provided with regards to trimming.

mauromol · 2021-07-14T13:39:27Z

Hi @pitbulk, I rebased the whole work done here against current master and completed the AuthnResponseTest changes. Let me know if you have more objections.
The failures reported by GitHub seem to be related to the dependency-check plugin: all tests should pass.

I read the issue at https://hackerone.com/reports/976603: said that I don't know what Grammarly really is, if I understand it correctly the issue is related to the use of different trimming strategies in a scenario where there's the ability for a user to register to an IdP and there's the ability for a new IdP to join the federation exposing a malicious entity id.
IMHO (but for sure you're more experienced than me), since we left the default behaviour to NOT perform any trimming, if a SP wants to opt-in for the trimming behaviour may do that for a specific reason. In SPID, for instance, a user CANNOT join an IdP (users are pre-registered on IdPs with an out-of-band process, which requires a de visu identification of each subject) and IdPs cannot join the federation, unless they are approved by a central body (AgID) which will most likely reject any attempt to use a malicious entity id by any new IdP. So I don't think there is any harm related to issuer trimming performed by the SP on such a scenario: the trimmed issuer retrieved from the response may or may not match one of a very limited set of non-ambiguous entity ids. Let me know if you think differently.

pitbulk force-pushed the master branch 3 times, most recently from ab7e4d7 to 3c79c8c Compare May 18, 2021 00:32

mauromol mentioned this pull request Jun 1, 2021

Presenza di whitespace negli elementi XML italia/spid-regole-tecniche#46

Open

mauromol added 4 commits July 14, 2021 14:39

Trim the extracted subject name id

c4907d4

If the subject name id is not trimmed, its value may contain surrounding whitespace characters depending on XML formatting. This change also avoids a double trim of audiences (which indeed were already trimmed).

Adjust AuthnResponseTest with regards to issuers trimming

1562784

Now that fixes to the getIssuers() and the new getResponseIssuer() and getAssertionIssuer() are available on master, proper test cases can be provided with regards to trimming.

mauromol force-pushed the trim-subject-name-id branch from 1ae96e3 to 1562784 Compare July 14, 2021 13:19

pitbulk approved these changes Jul 23, 2021

View reviewed changes

pitbulk merged commit 3125450 into SAML-Toolkits:master Jul 23, 2021

mauromol deleted the trim-subject-name-id branch July 24, 2021 11:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trim values obtained with getTextContent() on any XML node #340

Trim values obtained with getTextContent() on any XML node #340

mauromol commented Apr 19, 2021

pitbulk commented Apr 30, 2021

mauromol commented Apr 30, 2021

pitbulk commented May 3, 2021

mauromol commented May 5, 2021 •

edited

mauromol commented May 5, 2021

pitbulk commented Jun 24, 2021

mauromol commented Jun 24, 2021

pitbulk commented Jun 24, 2021 •

edited

mauromol commented Jun 24, 2021

mauromol commented Jul 14, 2021 •

edited

Trim values obtained with getTextContent() on any XML node #340

Trim values obtained with getTextContent() on any XML node #340

Conversation

mauromol commented Apr 19, 2021

pitbulk commented Apr 30, 2021

mauromol commented Apr 30, 2021

pitbulk commented May 3, 2021

mauromol commented May 5, 2021 • edited

mauromol commented May 5, 2021

pitbulk commented Jun 24, 2021

mauromol commented Jun 24, 2021

pitbulk commented Jun 24, 2021 • edited

mauromol commented Jun 24, 2021

mauromol commented Jul 14, 2021 • edited

mauromol commented May 5, 2021 •

edited

pitbulk commented Jun 24, 2021 •

edited

mauromol commented Jul 14, 2021 •

edited