### SSO with Python requests

### Overview

The issue: download information from a page that sits behind a single sign-on (SSO) login.

An SSO login redirects the login request to a secondary page and sets a cookie, so you can't just submit the form on the page that you're accessing.

Luckily, the Python requests module can open a session that follows redirects and keeps track of cookies, which takes care of much of this.
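To make the cookie handling concrete, here is a minimal offline sketch of what a session does with a cookie once it has one. The cookie name `SMSESSION`, its value, and the `example.org` URLs are placeholders chosen for illustration, not values taken from the real login flow.

```python
import requests

sess = requests.Session()

# Pretend an earlier login response stored this cookie in the session's jar
# (placeholder name, value, and domain).
sess.cookies.set('SMSESSION', 'example-token', domain='example.org', path='/')

# Prepare (but do not send) a later request to the same domain.
req = requests.Request('GET', 'https://example.org/data')
prepared = sess.prepare_request(req)

# The session attaches the stored cookie to the outgoing request.
print(prepared.headers.get('Cookie'))
```

Nothing is sent over the network here; `prepare_request` only builds the request, which is enough to see the cookie being carried along.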

JWE

References:
https://brennan.io/2016/03/02/logging-in-with-requests/

In [18]:
import requests, sys
import lxml.html
from getpass import getpass
from requests.packages.urllib3.exceptions import InsecureRequestWarning

As an example, let's use the DICOM server oxygen to download a test scan session tgz file.

We need two URLs. The first is the server URL we want to reach. The second is the login page that the server redirects to, which we can find by searching the source of the page we're logging into for the form action:

document.getElementById('CredSelectorNotice').action = "/siteminderagent/forms/login.fcc";

You find that the username and password option is submitted using submitForm(1), which reads the USER and PASSWORD fields:
      var pwd = document.CredSelectorNotice.PASSWORD.value;
      var user = document.CredSelectorNotice.USER.value;


function submitForm(option)
{
   var nextyear = new Date();
   nextyear.setFullYear(nextyear.getFullYear() + 1);
   document.cookie = "newuser="+document.getElementById('CredSelectorNotice').USER.value+"; expires="+nextyear.toGMTString()+"; path=/; domain=.nih.gov";


    //alert(document.CredSelectorNotice.loginradio[1].checked);
    var browserName = ""; 
    var ua = navigator.userAgent.toLowerCase(); 
    var aspsessioncookie = getCookie('ASP.NET_SessionId');
    
    if(option == 1)
    {
      document.getElementById('CredSelectorNotice').method = 'POST';
      document.getElementById('CredSelectorNotice').action = "/siteminderagent/forms/login.fcc";
      var pwd = document.CredSelectorNotice.PASSWORD.value;
      var user = document.CredSelectorNotice.USER.value;
		if( pwd == "" || user == "" )
		{
          alert("Either provide userid and password or Login with PIV");
          return false;
		}
    }
	
    if(option == 2)
    {
        document.getElementById('CredSelectorNotice').method = "GET";
		window.location.href = 'https://pivauth.nih.gov/CertAuthV2/forms/NIHPIVRedirector.aspx?TARGET='+document.CredSelectorNotice.target.value;
		return false;        
    }
    document.getElementById('CredSelectorNotice').submit();  
}
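The search through the page source described above can be sketched with the standard library. This is a rough illustration, not a general JavaScript parser: it assumes the form action is assigned as a plain string literal, as it is in `submitForm`, and it uses the login page address that the server redirects to in this notebook as the base for resolving the relative path.

```python
import re
from urllib.parse import urljoin

# Address of the login page (the page whose source we are searching).
page_url = 'https://auth.nih.gov/CertAuthV2/forms/NIHPivOrFormLogin.aspx'

# Excerpt of the page's JavaScript, copied from submitForm above.
page_source = '''
document.getElementById('CredSelectorNotice').method = 'POST';
document.getElementById('CredSelectorNotice').action = "/siteminderagent/forms/login.fcc";
'''

# Match assignments of the form `.action = "..."` with either quote style.
pattern = re.compile(r'''\.action\s*=\s*["']([^"']+)["']''')
actions = sorted(set(pattern.findall(page_source)))

# Resolve each relative action against the login page's own URL.
login_urls = [urljoin(page_url, action) for action in actions]
print(login_urls)
```

The resolved address is the one used as `login_url` in the next cell.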


Set up the URLs and session

In [19]:
server_url = 'https://oxygen.nimh.nih.gov'
login_url = 'https://auth.nih.gov/siteminderagent/forms/login.fcc'
sess =  requests.session()
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

Start the session by requesting the expected server URL. The redirect lands on the login page, which holds the information needed to submit the form.

In [20]:
login_pg = sess.get(server_url,verify=False)
print(login_pg)

<Response [200]>


In [21]:
login_pg.url

'https://auth.nih.gov/CertAuthV2/forms/NIHPivOrFormLogin.aspx?TYPE=33554433&REALMOID=06-81e8087e-dc32-4a5b-b4d2-20c273621995&GUID=&SMAUTHREASON=0&METHOD=GET&SMAGENTNAME=-SM-XiCuuyDGKMMOOUnaJKYIe6yL2UoEhRiPJuN%2bcaFdEcyXo2cZd6hanhu%2fsVjPLLsz&TARGET=-SM-https%3a%2f%2foxygen.nimh.nih.gov%2f'

Notice that the URL we actually landed on carries a long query string after the base server name.
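To see what the extra material in that URL is, the query string can be decoded with the standard library. This is only an inspection aid; the notebook itself does not need these values from the URL.

```python
from urllib.parse import urlsplit, parse_qs

# The redirected URL printed by login_pg.url above.
redirected = (
    'https://auth.nih.gov/CertAuthV2/forms/NIHPivOrFormLogin.aspx'
    '?TYPE=33554433&REALMOID=06-81e8087e-dc32-4a5b-b4d2-20c273621995'
    '&GUID=&SMAUTHREASON=0&METHOD=GET'
    '&SMAGENTNAME=-SM-XiCuuyDGKMMOOUnaJKYIe6yL2UoEhRiPJuN%2bcaFdEcyXo2cZd6hanhu%2fsVjPLLsz'
    '&TARGET=-SM-https%3a%2f%2foxygen.nimh.nih.gov%2f'
)

parts = urlsplit(redirected)
# parse_qs percent-decodes the values; blank ones (GUID here) are dropped by default.
params = parse_qs(parts.query)

print(parts.netloc)
for name, values in params.items():
    print(f'{name} = {values[0]}')
```

Once decoded, `SMAGENTNAME` is the same string as the `smagentname` hidden input in the login form, and `TARGET` points back at the server we originally asked for.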

Some of these values (the agent name and the target, for example) also appear as "hidden elements" in the page's form.  Parse the hidden elements out using lxml, then add the username and password to the form data before posting it.

    <form method="post" name="CredSelectorNotice" id="CredSelectorNotice">
          <input type="hidden" name="SMLOCALE"             value="US-EN"/>
          <input type="hidden" name="SMENC"                value="ISO-8859-1"/>
          <input type="hidden" name="smquerydata"          value=""/>
          <input type="hidden" name="smagentname"          value="-SM-XiCuuyDGKMMOOUnaJKYIe6yL2UoEhRiPJuN+caFdEcyXo2cZd6hanhu/sVjPLLsz"/>
          <input type="hidden" name="postpreservationdata" value=""/>
          <input type="hidden" name="target"	       value="-SM-https%3a%2f%2foxygen.nimh.nih.gov%2f"/>
          <input type="hidden" name="minloa" value="NIHIssuedLOA4" />  
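If lxml is not available, the same extraction can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not what the notebook uses. The `HiddenInputParser` class name is made up for this example, the HTML is a shortened copy of the form above with a closing tag added, and unlike the XPath query it does not check that the inputs sit inside a `<form>`.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> are routed here as well.
        attributes = dict(attrs)
        if tag == 'input' and attributes.get('type') == 'hidden' and 'name' in attributes:
            # A missing or empty value attribute becomes an empty string.
            self.fields[attributes['name']] = attributes.get('value') or ''


sample_html = '''
<form method="post" name="CredSelectorNotice" id="CredSelectorNotice">
  <input type="hidden" name="SMLOCALE" value="US-EN"/>
  <input type="hidden" name="smquerydata" value=""/>
  <input type="hidden" name="target" value="-SM-https%3a%2f%2foxygen.nimh.nih.gov%2f"/>
</form>
'''

parser = HiddenInputParser()
parser.feed(sample_html)
fields = parser.fields
print(fields)
```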

In [22]:
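# Collect the hidden inputs from the login form into a dict
login_html = lxml.html.fromstring(login_pg.text)
hidden_elements = login_html.xpath('//form//input[@type="hidden"]')
print(hidden_elements)
form = {x.attrib['name']: x.attrib.get('value', '') for x in hidden_elements}
# Add the credentials; getpass keeps the password off the screen
creds = {'USER': input('Username: '), 'PASSWORD': getpass()}
form.update(creds)
tmp_pg = sess.post(login_url, data=form)
# Remove the names holding the credentials once they have been sent
del form
del creds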

[<InputElement 7fbf285f10e8 name='SMLOCALE' type='hidden'>, <InputElement 7fbf2862f6d8 name='SMENC' type='hidden'>, <InputElement 7fbf2862f728 name='smquerydata' type='hidden'>, <InputElement 7fbf2862f318 name='smagentname' type='hidden'>, <InputElement 7fbf2862f4f8 name='postpreservationdata' type='hidden'>, <InputElement 7fbf2862f408 name='target' type='hidden'>, <InputElement 7fbf2862f7c8 name='minloa' type='hidden'>]
Username: evansjw
········


Once the form posts, the session cookie is set and you're ready to get what you were looking for.
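The notebook does not check whether the login actually worked. One rough heuristic is sketched below, under the assumption (not confirmed by anything above) that a successful login redirects back to the target server while a failed one stays on the authentication host. The helper name `landed_on_login_host` is made up for this example.

```python
from urllib.parse import urlsplit


def landed_on_login_host(final_url, login_url):
    """Return True when final_url is on the same host as login_url."""
    return urlsplit(final_url).hostname == urlsplit(login_url).hostname


login_url = 'https://auth.nih.gov/siteminderagent/forms/login.fcc'

# Back on the data server: the login probably went through.
print(landed_on_login_host('https://oxygen.nimh.nih.gov/', login_url))

# Still on the authentication host: the login probably failed.
print(landed_on_login_host('https://auth.nih.gov/CertAuthV2/forms/NIHPivOrFormLogin.aspx', login_url))
```

In the notebook this would be called as `landed_on_login_host(tmp_pg.url, login_url)` right after the POST.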

As an example, we can get the scan from a canary melon and its onion friend.

In [23]:
scanner = 'fmrif3td'
scan_date = '2019_08_26' # format yyyy_mm_dd
sub_name = 'MELON_CANARY' # format LAST_FIRST_MIDDLE
sub_MRN = '00000000' # sys.argv[1]
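Because the date and subject name must match the server's naming exactly, it can help to build the directory URL from typed values instead of hand-written strings. The sketch below follows the directory layout used in this notebook; the function name `session_dir_url` is made up for this example.

```python
from datetime import date


def session_dir_url(server_url, scanner, scan_day, sub_name, sub_mrn):
    """Build the scan session directory URL from its parts."""
    # The server expects dates as yyyy_mm_dd.
    scan_date = scan_day.strftime('%Y_%m_%d')
    return '/'.join([server_url, 'dicomData/userView', scanner,
                     scan_date, f'{sub_name}-{sub_mrn}'])


url = session_dir_url('https://oxygen.nimh.nih.gov', 'fmrif3td',
                      date(2019, 8, 26), 'MELON_CANARY', '00000000')
print(url)
```

Passing a `date` object means an impossible date such as month 13 fails immediately instead of producing a URL that quietly returns nothing.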

In [24]:
# look for tarFileSizes.txt
dir_path = '/'.join([server_url,'dicomData/userView',scanner,scan_date,sub_name + '-' + sub_MRN])
print(dir_path)
try:
   page = sess.get('/'.join([dir_path,'tarFileSizes.txt']),verify=False)
   page.raise_for_status()
   # first token is the size in bytes, third token is the tar file name
   tmp = page.content.split()
   tar_size = tmp[0].decode("utf-8")
   tar_fn = tmp[2].decode("utf-8")
   print(f'Getting {tar_size} bytes of data in {tar_fn}')
   # get data
   tar_dat = sess.get('/'.join([dir_path,tar_fn]),verify=False)
   tar_dat.raise_for_status()

   print(f'Saving {tar_fn}')
   # save data; the with block closes the file once it is written
   with open(tar_fn, 'wb') as f:
      f.write(tar_dat.content)

except requests.exceptions.HTTPError as err:
   #print(err)
   print('No data for subject')

https://oxygen.nimh.nih.gov/dicomData/userView/fmrif3td/2019_08_26/MELON_CANARY-00000000
Getting 143861926 bytes of data in MELON_CANARY-00000000-20190826-00001-DICOM.tgz
Saving MELON_CANARY-00000000-20190826-00001-DICOM.tgz
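The download above holds the whole archive (about 144 MB here) in memory before writing it. A possible alternative, sketched below, writes the body to disk in chunks and returns the byte count so it can be compared with the size listed in tarFileSizes.txt. The `save_stream` helper and the `FakeResponse` stand-in are made up for this example; the stand-in only exists so the sketch runs without a network connection.

```python
import os
import tempfile


def save_stream(resp, path, chunk_size=1 << 20):
    """Write a streamed response body to path; return the bytes written."""
    written = 0
    with open(path, 'wb') as fh:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                fh.write(chunk)
                written += len(chunk)
    return written


class FakeResponse:
    """Offline stand-in that mimics a response's iter_content method."""

    def __init__(self, payload):
        self._payload = payload

    def iter_content(self, chunk_size=1):
        for start in range(0, len(self._payload), chunk_size):
            yield self._payload[start:start + chunk_size]


payload = b'x' * 2500
out_path = os.path.join(tempfile.mkdtemp(), 'example.tgz')
n_written = save_stream(FakeResponse(payload), out_path, chunk_size=1024)

# The byte count and the size on disk should both match the expected size.
print(n_written == len(payload), os.path.getsize(out_path) == len(payload))
```

With a real session, the response would come from `sess.get(url, stream=True, verify=False)`, and the returned count could be checked against `int(tar_size)`.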
