URS not working for NSIDC OPeNDAP #188

Open
wallinb opened this issue Apr 4, 2019 · 6 comments

Comments

@wallinb

wallinb commented Apr 4, 2019

I am attempting to access a granule in the NSIDC ECS OPeNDAP instance using pydap, but I am unable to authenticate with URS. Judging from PR #57, I would expect this to work with pydap v3.2.2:

import os

from pydap.client import open_url
from pydap.cas.urs import setup_session

url = 'https://n5eil02u.ecs.nsidc.org/opendap/OTHR/NISE.004/2012.10.02/NISE_SSMISF17_20121002.HDFEOS'
session = setup_session(os.environ['EARTHDATA_USER'], os.environ['EARTHDATA_PASS'], check_url=url)
open_url(url, session=session)

This raises an HTTPError, which suggests that a redirect is not being followed:

HTTPError                                 Traceback (most recent call last)
<ipython-input-7-4a425d17a509> in <module>
----> 1 open_url('https://n5eil02u.ecs.nsidc.org/opendap/OTHR/NISE.004/2012.10.02/NISE_SSMISF17_20121002.HDFEOS.html')

~/.pyenv/versions/miniconda3-latest/envs/test/lib/python3.7/site-packages/pydap/client.py in open_url(url, application, session, output_grid, timeout)
     65     """
     66     dataset = DAPHandler(url, application, session, output_grid,
---> 67                          timeout).dataset
     68
     69     # attach server-side functions

~/.pyenv/versions/miniconda3-latest/envs/test/lib/python3.7/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid, timeout)
     52         ddsurl = urlunsplit((scheme, netloc, path + '.dds', query, fragment))
     53         r = GET(ddsurl, application, session, timeout=timeout)
---> 54         raise_for_status(r)
     55         if not r.charset:
     56             r.charset = 'ascii'

~/.pyenv/versions/miniconda3-latest/envs/test/lib/python3.7/site-packages/pydap/net.py in raise_for_status(response)
     37             detail=response.status+'\n'+response.text,
     38             headers=response.headers,
---> 39             comment=response.body
     40         )
     41

HTTPError: 302 Found
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://urs.earthdata.nasa.gov/oauth/authorize?client_id=PGVMJ5nUzSnQkI5o23gMxA&amp;response_type=code&amp;redirect_uri=https%3A%2F%2Fn5eil02u.ecs.nsidc.org%2FOPS%2Fredirect&amp;state=aHR0cHM6Ly9uNWVpbDAydS5lY3MubnNpZGMub3JnL29wZW5kYXAvT1RIUi9OSVNFLjAwNC8yMDEyLjEwLjAyL05JU0VfU1NNSVNGMTdfMjAxMjEwMDIuSERGRU9TLmh0bWwuZGRz">here</a>.</p>
</body></html>

I traced the execution down to webob.Request, but I am not sure what I should see at that point for this to work. Any help is much appreciated!
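The `state` parameter in the redirect link of the 302 body looks like base64. As a minimal sketch, the snippet below decodes it to see which request the server bounced to URS. `decode_state` is a hypothetical helper, and treating `state` as the plain base64 of the requested URL is an assumption drawn from this one sample, not documented server behavior.

```python
import base64
import html
from urllib.parse import parse_qs, urlsplit

# Link copied from the 302 body in the traceback (still HTML-escaped).
location = (
    "https://urs.earthdata.nasa.gov/oauth/authorize?client_id=PGVMJ5nUzSnQkI5o23gMxA"
    "&amp;response_type=code"
    "&amp;redirect_uri=https%3A%2F%2Fn5eil02u.ecs.nsidc.org%2FOPS%2Fredirect"
    "&amp;state=aHR0cHM6Ly9uNWVpbDAydS5lY3MubnNpZGMub3JnL29wZW5kYXAvT1RIUi9OSVNF"
    "LjAwNC8yMDEyLjEwLjAyL05JU0VfU1NNSVNGMTdfMjAxMjEwMDIuSERGRU9TLmh0bWwuZGRz"
)


def decode_state(location_html):
    """Return (redirect_uri, decoded state) for an HTML-escaped authorize URL."""
    query = parse_qs(urlsplit(html.unescape(location_html)).query)
    state = query["state"][0]
    # Restore any '=' padding that may have been stripped (assumption: plain base64).
    padded = state + "=" * (-len(state) % 4)
    return query["redirect_uri"][0], base64.b64decode(padded).decode("ascii")


redirect_uri, requested = decode_state(location)
print(redirect_uri)
print(requested)
```

In this sample the decoded value is the `.dds` request built from the URL shown in the traceback, which is consistent with the request reaching the opendap server without a logged-in session there.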

@wallinb
Author

wallinb commented Apr 4, 2019

With requests, this works:

import requests

# auth is assumed here to be an Earthdata (username, password) tuple
with requests.Session() as session:
    session.auth = auth
    auth_resp = session.get(url)
    resp = session.get(auth_resp.url)

so I hoped this might work:

with requests.Session() as session:
    session.auth = auth
    dataset = open_url(url, session=session)

but the error is the same.

@hyoklee

hyoklee commented May 4, 2019

I have the same issue with NSIDC OPeNDAP server.

@wallinb
Author

wallinb commented May 9, 2019

Thanks to Peter L. Smith at Raytheon for this workaround and the following explanation:

import os
from urllib.parse import urlsplit  # Python 3; on Python 2: from urlparse import urlsplit

import requests
from requests.packages.urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

sessions = {}

class URSSession(requests.Session):
    def __init__(self, username=None, password=None):
        super(URSSession, self).__init__()
        self.username = username
        self.password = password
        self.original_url = None

    def authenticate(self, url):
        self.original_url = url
        super(URSSession, self).get(url)
        self.original_url = None

    def get_redirect_target(self, resp):
        if resp.is_redirect:
            if resp.headers['location'] == self.original_url:
                # Redirected back to original URL, so OAuth2 complete. Exit here
                return None
        return super(URSSession, self).get_redirect_target(resp)

    def rebuild_auth(self, prepared_request, response):
        # If being redirected to URS and we have credentials, add them in
        # otherwise default session code will look to pull from .netrc
        if "https://urs.earthdata.nasa.gov" in prepared_request.url \
                and self.username and self.password:
            prepared_request.prepare_auth((self.username, self.password))
        else:
            super(URSSession, self).rebuild_auth(prepared_request, response)
        return


def get_session(url):
    """ Get existing session for host or create it
    """
    global sessions
    host = urlsplit(url).netloc

    if host not in sessions:
        session = requests.Session()
        if 'urs' in session.get(url).url:
            session = URSSession(os.environ['EARTHDATA_USER'], os.environ['EARTHDATA_PASS'])
            session.authenticate(url)

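        # Note: urllib3 1.26 renamed method_whitelist to allowed_methods
        # (method_whitelist was removed in urllib3 2.0).
        retries = Retry(total=5, connect=3, backoff_factor=1, method_whitelist=False,
                        status_forcelist=[400, 401, 403, 404, 408, 500, 502, 503, 504])
        session.mount('http', HTTPAdapter(max_retries=retries))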

        sessions[host] = session

    return sessions[host]

Pydap's flow starts with the client (the setup_session function in pydap.cas.urs) calling the URS server at urs.earthdata.nasa.gov and supplying credentials in a Basic Authorization header. This establishes a logged-in session with the URS service, but not with the opendap service. Next, pydap uses this session to issue a HEAD request to the opendap server for the given resource. Under normal circumstances, the opendap server triggers the URS OAuth2 process and redirects pydap to URS. Because pydap is using a session that is already logged in to URS, URS responds immediately (no additional credentials needed) with another redirect back to the opendap server. The opendap server then establishes its own logged-in session before finally redirecting pydap back to the original resource, completing the OAuth2 process. At this point, pydap issues a GET request for the resource and successfully downloads the data.

The issue with the NSIDC opendap server is that HEAD requests are permitted without authentication, so the OAuth2 process is never triggered and a logged-in session with the opendap server is never established. This causes the subsequent GET request for the resource to fail.
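To make the HEAD/GET asymmetry concrete, here is a toy model using only the standard library. It is not NSIDC's server: `ToyHandler`, the `session=ok` cookie, and the `/login` target are all hypothetical stand-ins. The sketch only shows that a HEAD probe can succeed while an unauthenticated GET is still redirected to a login flow.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyHandler(BaseHTTPRequestHandler):
    """Toy server (hypothetical): HEAD is open, GET needs a session cookie."""

    def do_HEAD(self):
        self.send_response(200)  # no authentication check on HEAD
        self.end_headers()

    def do_GET(self):
        if "session=ok" in self.headers.get("Cookie", ""):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"data")
        else:
            self.send_response(302)  # a real server would start OAuth2 here
            self.send_header("Location", "/login")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # report the 302 instead of following it


def status_of(url, method, cookie=None):
    """Return the HTTP status for one request, without following redirects."""
    request = urllib.request.Request(url, method=method)
    if cookie:
        request.add_header("Cookie", cookie)
    # An empty ProxyHandler keeps the request on localhost.
    opener = urllib.request.build_opener(NoRedirect, urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), ToyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/granule" % server.server_address[1]
    try:
        return {
            "HEAD, no login": status_of(url, "HEAD"),
            "GET, no login": status_of(url, "GET"),
            "GET, logged in": status_of(url, "GET", cookie="session=ok"),
        }
    finally:
        server.shutdown()
        server.server_close()


print(run_demo())
```

A client that treats the successful HEAD as proof of access never runs the login flow, so its later GET is answered with the 302 seen in the traceback.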

The sample code above avoids this by creating a session and issuing a GET request for the resource up front. The OAuth2 process is invoked and a URS session is established, followed by an opendap server session. At the tail end of this process, the final redirect back to the original resource is intercepted and halted (we don't want to download the resource at this point; that is left to pydap). By then, however, the session has established logins with both URS and the opendap server. This session is then passed to pydap, which uses it for the HEAD and subsequent GET requests as normal.

If the NSIDC opendap server were configured to require authentication for HEAD requests, I believe pydap would work with it out of the box. However, it would not be very efficient: at the end of the OAuth2 process, the final redirect back to the original resource is converted from a HEAD request to a GET request, so the entire file is transmitted (or at least partially transmitted or buffered before the client disconnects).
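One detail of rebuild_auth in the workaround may be worth tightening. The test `"https://urs.earthdata.nasa.gov" in prepared_request.url` is true for any URL that contains that string anywhere, including inside a query parameter, so credentials could be attached to a request for another host. A stricter comparison can be sketched as follows; `is_urs_url` and `URS_HOST` are hypothetical names, not part of pydap or requests.

```python
from urllib.parse import urlsplit

URS_HOST = "urs.earthdata.nasa.gov"


def is_urs_url(url):
    """True only when the URL uses https and its host is exactly the URS host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname == URS_HOST


print(is_urs_url("https://urs.earthdata.nasa.gov/oauth/authorize?client_id=x"))
print(is_urs_url("https://example.org/?next=https://urs.earthdata.nasa.gov"))
```

Under this assumption, the condition in rebuild_auth would become `is_urs_url(prepared_request.url) and self.username and self.password`.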

@weiji14

weiji14 commented Oct 20, 2019

Hi there,

I've been trying to access the NSIDC OPeNDAP server to programmatically pull down some ICESat-2 data via intake-xarray for the past week, and I just stumbled on this issue.

Is there a way to incorporate the code snippet provided by @wallinb into pydap itself, and if so, where would it go in the codebase (e.g. as a new module under the cas folder, or as a patch to the urs.py module)? Happy to submit a pull request for this.

@loganbyers

The issue persists as of this post.

The workaround posted by @wallinb works nearly verbatim. The one thing to watch is the urlsplit function used in get_session: it comes from urlparse on Python 2 and from urllib.parse on Python 3.
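A version-agnostic import is one way to cover both cases. This is a small sketch; the URL is the granule from the original report, and `host` mirrors the cache key that get_session computes.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2

url = 'https://n5eil02u.ecs.nsidc.org/opendap/OTHR/NISE.004/2012.10.02/NISE_SSMISF17_20121002.HDFEOS'
host = urlsplit(url).netloc  # the key get_session uses for its session cache
print(host)
```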

@simonrp84

Likewise, the issue persists.

I think the fix described by @wallinb works, but it also seems to download the entire file. Calling session = get_session(url) gives this debug message:
2020-09-28 13:57:22,176 - urllib3.connectionpool - DEBUG - https://ladsweb.modaps.eosdis.nasa.gov:443 "GET /archive/allData/5200/VJ102IMG/2020/225/VJ102IMG.A2020225.0418.002.2020225101640.nc HTTP/1.1" 200 268321100

There is also a long pause before Python responds again, which makes me think it is downloading the file.
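One possible contributor is the probe in get_session, `session.get(url).url`, which reads the whole response body just to learn the final URL. Whether that is the request behind the 268 MB response in the log is an assumption. In requests, passing `stream=True` to `get` defers the body download. The same idea in the standard library, with `final_url` as a hypothetical helper:

```python
import urllib.request


def final_url(url, timeout=30):
    """Follow redirects and return the final URL without reading the body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # urlopen returns once the headers have arrived; leaving the block
        # closes the connection before the payload is read.
        return response.geturl()


print(final_url("data:text/plain,hello"))
```

This only covers the probing step. It does not perform the URS login, and some bytes of the body may still be buffered before the connection closes.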
