<a href="https://colab.research.google.com/github/guanwee-loo/Notebooks/blob/master/SAML2_0_Vulnerability.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SAML 2.0 Authentication Bypass Vulnerability


# What is SAML 2.0?

Security Assertion Markup Language 2.0 (SAML 2.0) is a version of the SAML standard for exchanging authentication and authorization data between security domains. SAML 2.0 is an XML-based protocol that uses security tokens containing assertions to pass information about a **principal** (usually an end user) between a SAML authority, named an **Identity Provider**, and a SAML consumer, named a **Service Provider**. SAML 2.0 enables web-based, cross-domain single sign-on (SSO), which helps reduce the administrative overhead of distributing multiple authentication tokens to the user.


**When a user is authenticating to a website using SAML, there are always three parties involved:**

1.   A user in a web browser
2.   A service provider (SP) running a website that the user is trying to access (e.g., Salesforce)
3.   An identity provider (IdP) that stores and manages the user’s account and credentials (e.g., Okta, OneLogin)






# Web SSO Normal Flow

![alt text](https://raw.githubusercontent.com/guanwee-loo/Notebooks/master/WebSSOFlow_normal.PNG)

The important concept to grasp is what a SAML Response means to a Service Provider (SP), and how it is processed. 

Response processing has a lot of subtleties, but a simplified version often looks like:

* The user authenticates to an Identity Provider (IdP), which generates a signed SAML Response. The user’s browser then forwards this response to an SP such as Slack or GitHub.

* The SP validates the SAML Response's signature.

* If the signature is valid, a string identifier within the SAML Response (e.g. the NameID) will identify which user to authenticate.



# What is the SAML Authentication Bypass Vulnerability?

It is a vulnerability class that affects SAML-based single sign-on (SSO) systems. It can allow an attacker who already has authenticated access to trick a SAML system into authenticating them as a different user, without knowing the victim user’s password.


Three ingredients enable this vulnerability:

1. SAML Responses contain strings that identify the authenticating user.

2. XML canonicalization will remove comments (depending on configuration) as part of signature validation, so adding comments to a SAML Response will not invalidate the signature.

3. XML text extraction may only return a substring of the text within an XML element when comments are present.

# A simplified SAML Response



```
<SAMLResponse>
    <Issuer>https://idp.com/</Issuer>
    <Assertion ID="_id1234">
        <Subject>
            <NameID>user@user.com</NameID>
        </Subject>
    </Assertion>
    <Signature>
        <SignedInfo>
            <CanonicalizationMethod Algorithm="xml-c14n11"/>
            <Reference URI="#_id1234"/>
        </SignedInfo>
        <SignatureValue>
            some base64 data that represents the signature of the assertion
        </SignatureValue>
    </Signature>
</SAMLResponse>
```

The two essential elements in the above XML blob are the **Assertion** and the **Signature** elements.

The Assertion element contains the *NameID* element, a string the Identity Provider (IdP) uses to identify the user who is about to be logged in. A signature is generated over the Assertion element and stored in the Signature element; the Service Provider (SP) uses it to verify data integrity and to detect any modification of the NameID.
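
To make the relationship between the two elements concrete, here is a minimal sketch that parses the simplified response above with Python's standard `xml.etree.ElementTree` and checks that the signature's `Reference` points at the assertion holding the NameID. The helper name `describe_response` and the placeholder signature value are illustrative assumptions; a real SAML Response uses XML namespaces and a real signature, neither of which is handled here.

```python
import xml.etree.ElementTree as ET

# The simplified response from above; the signature value is a placeholder.
SIMPLIFIED_RESPONSE = """\
<SAMLResponse>
    <Issuer>https://idp.com/</Issuer>
    <Assertion ID="_id1234">
        <Subject>
            <NameID>user@user.com</NameID>
        </Subject>
    </Assertion>
    <Signature>
        <SignedInfo>
            <CanonicalizationMethod Algorithm="xml-c14n11"/>
            <Reference URI="#_id1234"/>
        </SignedInfo>
        <SignatureValue>placeholder</SignatureValue>
    </Signature>
</SAMLResponse>
"""


def describe_response(xml_text):
    """Hypothetical helper: report which element is signed and who it names."""
    root = ET.fromstring(xml_text)
    assertion = root.find("Assertion")
    reference = root.find("Signature/SignedInfo/Reference")
    name_id = assertion.find("Subject/NameID")
    return {
        "assertion_id": assertion.get("ID"),
        "signed_reference": reference.get("URI"),
        # The Reference URI is "#" followed by the ID of the signed element.
        "signature_covers_assertion": reference.get("URI") == "#" + assertion.get("ID"),
        "name_id": name_id.text,
    }


info = describe_response(SIMPLIFIED_RESPONSE)
print(info)
```

The point of the sketch is only that the identity the SP acts on (the NameID) lives inside the element the signature refers to, which is why the rest of this notebook focuses on how that element is canonicalized and read.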


# XML Canonicalization (C14N)

XML canonicalization allows two logically equivalent XML documents to have the same byte representation. For example:



```
<NameID>user@user.com.evil.com</NameID>
```

and


```
<NameID>user@user.com<!-- this is a comment -->.evil.com</NameID>
```

These two documents have different byte representations (the second contains a comment) but convey the same information (i.e. they are logically equivalent).

Canonicalization is applied to XML elements **prior to signing**. This prevents meaningless differences in the XML document from leading to different digital signatures. 
In the SAML Response above, the Canonicalization Method specifies which canonicalization method to apply prior to signing the document. The most common algorithm in practice seems to be http://www.w3.org/2001/10/xml-exc-c14n#. 
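
Because a signature is computed over the canonical bytes, two documents with the same canonical form produce the same digest. The following sketch illustrates this with the two NameID elements above. It relies on two assumptions: it uses `xml.etree.ElementTree.canonicalize` from the standard library (Python 3.8+), which implements C14N 2.0 rather than exc-c14n, and it uses a plain SHA-256 digest as a stand-in for a real XML signature. The helper name `c14n_digest` is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

ORIGINAL = "<NameID>user@user.com.evil.com</NameID>"
TAMPERED = "<NameID>user@user.com<!-- this is a comment -->.evil.com</NameID>"


def c14n_digest(xml_text, with_comments):
    """Hypothetical helper: SHA-256 over the canonical form of xml_text."""
    canonical = ET.canonicalize(xml_text, with_comments=with_comments)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Comments stripped during canonicalization: both documents hash identically.
same_without = c14n_digest(ORIGINAL, False) == c14n_digest(TAMPERED, False)

# Comments kept during canonicalization: the inserted comment changes the digest.
same_with = c14n_digest(ORIGINAL, True) == c14n_digest(TAMPERED, True)

print("digests match, comments stripped:", same_without)
print("digests match, comments kept:", same_with)
```

The first comparison is expected to hold and the second to fail, which is the property the attack described later depends on.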


**The comment-stripping behavior can be demonstrated with Python’s lxml XML library (used by the open source SAML Python toolkit "python3-saml"):**


In [0]:
from io import BytesIO, StringIO

import lxml.etree as ET

# Parse a NameID element whose text is split by a comment.
NameID = StringIO("<NameID>user@user.com<!-- this is a comment -->.evil.com</NameID>")
tree = ET.parse(NameID)

# Canonicalize WITHOUT comments, as the comment-free C14N variants do.
buffer = BytesIO()
tree.write_c14n(buffer, with_comments=False)
print(buffer.getvalue().decode("utf-8"))

# The comment is gone, so a signature over this canonical form is identical to the
# signature over the comment-free element: inserting the comment does not invalidate it.

<NameID>user@user.com.evil.com</NameID>


The workaround is to use another variant of exc-c14n, identified by http://www.w3.org/2001/10/xml-exc-c14n#WithComments. This variant does not omit comments, so the two XML documents above no longer share a canonical representation even though they are logically equivalent.

In [0]:
from io import BytesIO, StringIO

import lxml.etree as ET

# Parse the same NameID element, with its text split by a comment.
NameID = StringIO("<NameID>user@user.com<!-- this is a comment -->.evil.com</NameID>")
tree = ET.parse(NameID)

# Canonicalize WITH comments, as the #WithComments variant does.
buffer = BytesIO()
tree.write_c14n(buffer, with_comments=True)
print(buffer.getvalue().decode("utf-8"))

# The comment survives canonicalization, so a signature over this form differs from
# the signature over the comment-free element: the tampering is detected.

<NameID>user@user.com<!-- this is a comment -->.evil.com</NameID>


# XML Text Extraction

Another cause of this vulnerability is a subtle and unexpected behavior of XML parsing.

Consider the following XML element, NameID:



```
<NameID>user@user.com<!-- this is a comment -->.evil.com</NameID>
```

In Python, the NameID text might be extracted from that element as follows:

In [0]:
from defusedxml.lxml import fromstring

resp = "<NameID>user@user.com<!-- this is a comment -->.evil.com</NameID>"
data = fromstring(resp)

# .text returns only the text that precedes the first child node (here, the comment).
# One might expect 'user@user.com.evil.com', but the result is 'user@user.com'.
print("Parsed NameID = " + data.text)

Parsed NameID = user@user.com
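
To show where the rest of the text goes, the sketch below reproduces the behavior with the standard library instead of lxml. It assumes Python 3.8+ so that `TreeBuilder(insert_comments=True)` is available (by default the standard-library parser drops comments and would not show the problem). The helper names are hypothetical, and `all_direct_text` only handles the simple case of an element with no nested child elements.

```python
import xml.etree.ElementTree as ET

RESP = "<NameID>user@user.com<!-- this is a comment -->.evil.com</NameID>"


def parse_keeping_comments(xml_text):
    """Parse xml_text so that comments stay in the tree as child nodes."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(xml_text, parser=parser)


def first_text_node(elem):
    """Naive extraction: only the text before the first child node."""
    return elem.text or ""


def all_direct_text(elem):
    """Text before the first child plus the tail text after each child."""
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    return "".join(parts)


name_id = parse_keeping_comments(RESP)
naive = first_text_node(name_id)
complete = all_direct_text(name_id)

print("naive:   ", naive)
print("complete:", complete)
```

The comment becomes a child node of the NameID element, and the text after it is stored as that node's tail, so code that reads only `.text` sees the truncated value.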


# Given the above 3 conditions, an attacker can do the following to log in as another user:




1.  Log in with a registered or compromised account (NameID = user@user.com.evil.com).

2.  The IdP signs the assertion containing NameID = user@user.com.evil.com.

3.  Intercept the SAML Response and modify the NameID in the Assertion so that it appears to be "user@user.com". To avoid invalidating the signature, exploit the XML C14N behavior by changing the NameID to:

        user@user.com<!-- this is a comment -->.evil.com

    When the signature is validated, XML C14N strips the comment, so the canonical form is still user@user.com.evil.com, the same value the IdP originally signed.

4.  The SP verifies the signature as correct but, because of the unexpected XML text extraction behavior, mistakenly identifies the attacker as "user@user.com".
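
The four steps above can be simulated end to end in a few lines. This is a toy model under loud assumptions: an HMAC with a demo key stands in for the IdP's XML signature, `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+) stands in for exc-c14n, the "assertion" is reduced to a bare NameID element, and all function names are hypothetical.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Demo key for illustration only; real SAML uses asymmetric XML signatures.
IDP_KEY = b"demo-key-for-illustration-only"


def sign(xml_text):
    """Stand-in for the IdP signature: HMAC over the comment-free canonical form."""
    canonical = ET.canonicalize(xml_text, with_comments=False)
    return hmac.new(IDP_KEY, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def signature_is_valid(xml_text, signature):
    """Stand-in for SP validation: re-canonicalize and compare."""
    return hmac.compare_digest(sign(xml_text), signature)


def naive_name_id(xml_text):
    """Vulnerable extraction: keeps comments as nodes and reads only .text."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(xml_text, parser=parser).text


# Steps 1-2: the attacker logs in and the IdP signs the attacker's own NameID.
attacker_assertion = "<NameID>user@user.com.evil.com</NameID>"
signature = sign(attacker_assertion)

# Step 3: the attacker inserts a comment right after the victim's address.
tampered = attacker_assertion.replace(
    "user@user.com", "user@user.com<!-- this is a comment -->", 1
)

# Step 4: the SP accepts the signature but reads the victim's identifier.
accepted = signature_is_valid(tampered, signature)
logged_in_as = naive_name_id(tampered)

print("signature accepted:", accepted)
print("SP logs in:", logged_in_as)
```

In this model the tampered element passes validation while the naive extraction returns the victim's address, which mirrors the bypass described in the list.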



# Web SSO Attack Flow

![alt text](https://raw.githubusercontent.com/guanwee-loo/Notebooks/master/WebSSOFlow_attacker.PNG)

# Mitigation Measures


* Disable public registration of user accounts on sensitive networks and vet each user manually, so that attackers cannot register an account on internal networks in the first place.

* If this is not possible, network admins can configure a whitelist of accepted email address domain names to limit who can register on the network.

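* Accounts protected by two-factor authentication (2FA) are much harder to attack this way, provided the second factor is verified independently of the SAML assertion rather than asserted by the same IdP.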

* Other possible remediations are to update libraries so that they use a C14N method that keeps comments **prior** to signing, or to use the canonicalized XML document **after** signature validation for any processing such as text extraction. Either approach could prevent this vulnerability as well as other vulnerabilities introduced by XML canonicalization issues.
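
As a rough illustration of the last remediation, the sketch below reads the NameID from the canonicalized form (the bytes the signature actually covers) instead of from the raw document. It is a sketch under assumptions, not a patch for any particular library: it uses the standard library's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+) as a stand-in for the configured canonicalization method, and the function name `name_id_from_canonical` is hypothetical.

```python
import xml.etree.ElementTree as ET


def name_id_from_canonical(xml_text):
    """Extract the NameID text from the comment-free canonical form."""
    # Canonicalize first, so extraction sees exactly what was signed.
    canonical = ET.canonicalize(xml_text, with_comments=False)

    # Keep comments as nodes when re-parsing so any leftover child is visible.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    elem = ET.fromstring(canonical, parser=parser)

    # Defensive check: a NameID should contain text only.
    if len(elem) != 0:
        raise ValueError("unexpected child nodes in NameID")
    return elem.text


tampered = "<NameID>user@user.com<!-- this is a comment -->.evil.com</NameID>"
identity = name_id_from_canonical(tampered)
print("SP logs in:", identity)
```

With this approach the tampered element resolves to the attacker's own identifier, so the inserted comment no longer changes who the SP authenticates.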



