Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct Attribute Value Parsing #60

Merged
merged 5 commits into from
Jan 27, 2016

Conversation

DRvanR
Copy link
Contributor

@DRvanR DRvanR commented Jan 22, 2016

Adds complex type attribute value support as well as ensures that basic simple typed attributes (string or integer) are supported as well. This was already the case when generating XML, however this was not yet supported when parsing XML.

DRvanR added 3 commits January 22, 2016 12:04
This commit adds complex type attribute value support
as well as ensures that basic simple typed attributes (string
or integer) are supported as well. This was already the case
when generating XML, however this was not yet supported when
parsing XML
@DRvanR
Copy link
Contributor Author

DRvanR commented Jan 22, 2016

@thijskh @jaimeperez Thoughts, ideas, comments? This stems from an issue where attribute values were not copied correctly by a SAML proxy when the attributes contained complex types (in this case, a NameID) as attribute value. In the current situation, the textContent is used which means the "The text content of this node and its descendants." and thus disregards the childnodes. This fix keeps the childnodes in case the childnode is not a Text node and if it is a Text node, supports both string and integer types (defaulting to string is was the case currently).

if ($type === 'xs:integer') {
$this->attributes[$name][] = (int) $value->textContent;
} else {
$this->attributes[$name][] = trim($value->textContent);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you trim the value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BC 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh, I see it now. Alright, we'll have to live with that.

@thijskh
Copy link
Member

thijskh commented Jan 22, 2016

👍

@thijskh
Copy link
Member

thijskh commented Jan 22, 2016

@jaimeperez would this be ok from a simpleSAMLphp pov?

@rjkip
Copy link

rjkip commented Jan 22, 2016

The attributes field docblock has always said this field to be of type DOMElement[], even though this has been string[] for as far as SAML2_Assertion's history goes. Perhaps this is a good time to document it to have the type (DOMNodeList|string|int)[]?

I'm not very familiar with encrypted attributes, but seeing as these contain AttributeValues as well, don't encrypted attributes also support complex types and shouldn't the same change perhaps be applied to decryptAttributes()?

@DRvanR
Copy link
Contributor Author

DRvanR commented Jan 22, 2016

Wouldn't the correct type for attributes be just array, as it a multi-dimensional array?

With respect to decrypting encrypted attributes, I agree, will add that.

@rjkip
Copy link

rjkip commented Jan 22, 2016

Oops, I do mean (DOMNodeList|string|int)[][], but that's a bit too much. array sounds fine. I'm just looking for a bit of documentation as to what I can expect from getAttributes.

@thijskh
Copy link
Member

thijskh commented Jan 22, 2016

:+2:

@jaimeperez
Copy link
Member

Thanks a lot for this @DRvanR, and sorry so much for the delay offering a response.

I've taken a look into the code and it looks perfectly good, and something we definitely need to fix as the way it is today should be considered a bug. I was a bit confused as part of the PR is actually fixing a different problem, that being some code parsing documents without using the DOMDocumentFactory, but I think it's ok and that shouldn't delay merging the PR any longer.

Regarding SSP, I don't know if this will have any direct impact (though it definitely shouldn't), but it will take a while anyway until we migrate to the current library, so that's something we can sort out if it causes trouble.

👍 from me.

jaimeperez added a commit that referenced this pull request Jan 27, 2016
…es-correctly

Correct Attribute Value Parsing
@jaimeperez jaimeperez merged commit a992844 into master Jan 27, 2016
@thijskh thijskh deleted the bugfix/parse-attribute-values-correctly branch June 6, 2016 16:25
@jaimeperez
Copy link
Member

Hi all!

Six months later, here I am again to take my last statement back and confirm that this change has an impact on SimpleSAMLphp, and actually broke all SSP service providers trying to use eduPersonTargetedID encoded as a NameID (which I suspect is how Shibboleth encodes it by default).

The problem is that returning objects instead of strings or integers as the attribute values is dangerous, as those objects might not be serializable, as it is the case with the DOM* objects.

The consequence: when we receive an eduPersonTargetedID encoded as a SAML NameID, the new code returns its value as a DOMNodeList object, which is traversable. That's ok. However, I'd say we always end up redirecting somewhere else after successful authentication. When we redirect, the session is serialized, and since DOMNodeList is not serializable, the value is lost. There's actually a nasty poltergeist the goes way beyond my comprehension, as the DOMNodeList object is wiped out when the SimpleSAML_Session::__destruct() method is invoked (when redirecting or finishing execution). At that point, it doesn't matter the session object is the same as the one where we stored the DOMNodeList successfully, the list is empty then.

I've tried to reproduce the problem with a simple class, and the DOMNodeList object survives perfectly well when the __destruct() method of that class is called, so I guess it must be some issue with the singleton we use in SimpleSAML_Session.

In any case, I've managed to resolve the issue by iterating over the attribute values and dumping back to XML those that are not plain strings or integers (i.e. DOMNodeList objects). That way, we can later recreate the DOMNodeList from the XML string.

I thought it was a good idea to avoid modifying the SAML2 library again, but we need then to make people aware of the issue. @DRvanR, could you please add a comment about this in the code at least?

If you already noticed this problem and have some mitigation in place, you can just disregard my message 😉

@jaimeperez
Copy link
Member

Here's the bug report, by the way.

jaimeperez added a commit to simplesamlphp/simplesamlphp that referenced this pull request Jul 28, 2016
A recent change in simplesamlphp/saml2#60 made the library return a DOMNodeList object when the contents of the AttributeValue element are not text. This lead to a bug, since the returned value is not serializable, and when storing it in the session it will go away as soon as we serialize the session to store it in the backend (whatever that is). This is always, as the SP will always redirect to the URL originating authentication. The result was an empty DOMNodeList object where there should be some value.

This commit makes the SimpleSAML_Session to implement the Serializable interface. When obtaining the attributes during login (doLogin() method), the code will now look for DOMNodeList objects, and dump them as a string with the XML representation of their contents in the 'RawAttributes' array inside $this->authData[$authority]. This allows us to parse the XML back when unserializing, and restore the original DOMNodeList object as the value of the attribute.

The issue was reported originally in the mailing list by Enrico Cavalli, affecting eduPersonTargetedID. This resolves #424.
@DRvanR
Copy link
Contributor Author

DRvanR commented Aug 2, 2016

Hi Jaime,

that is indeed a quite nasty issue. The contents are however listed in the docblock, what additional documentation would you suggest?

As an aside, we've encountered this problem not once but twice ourselves. What we've been thinking about is to allow nested attribute detection and parsing. I've not yet been able to actually play around with that idea (busy eh 😉), but it would solve not only this, but other possible cases as well. It however will remain an issue that a nested attribute will always be the odd one out and that it will always be needed to be handled intelligently somehow.

@jaimeperez
Copy link
Member

Hi Daan!

I was thinking basically about a big comment in the docblock warning about the returned value not being serializable —or put differently, that you can't store it in any way—, in case a DOMNodeList is returned. That way, at least people should know about this and have some hint when things go wrong.

I love the idea of having some logic in place to detect what's inside the attribute value, and parse it automatically for you. It would be great if we could return a SAML2\XML\saml\NameID instead of the DOMNodeList for eduPersonTargetedID (or any other attribute that may contain a NameID, of course). But at the same time, I guess it would be complicated to implement something that's generic enough and covers most cases (and it will be impossible to cover all of them indeed).

Did you discuss how to approach such implementation?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants