<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Before-you-start" data-toc-modified-id="Before-you-start-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Before you start</a></span><ul class="toc-item"><li><span><a href="#Requirements" data-toc-modified-id="Requirements-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Requirements</a></span></li><li><span><a href="#Inputs" data-toc-modified-id="Inputs-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Inputs</a></span></li></ul></li><li><span><a href="#Subscriptions" data-toc-modified-id="Subscriptions-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Subscriptions</a></span><ul class="toc-item"><li><span><a href="#Subscriptions-through-CMR-Search" data-toc-modified-id="Subscriptions-through-CMR-Search-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Subscriptions through CMR Search</a></span></li><li><span><a href="#Temporary/alternative-approach" data-toc-modified-id="Temporary/alternative-approach-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Temporary/alternative approach</a></span></li></ul></li><li><span><a href="#Echo-Tokens" data-toc-modified-id="Echo-Tokens-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Echo Tokens</a></span></li></ul></div>

# Access Sentinel-6 NRT Data Tutorial

This notebook demonstrates a simple solution and starting point for more complex use cases that require routine access to Sentinel-6 NRT data products.

## Before you start

A few housekeeping items before our simple demo.

### Requirements

Excluding the steps to retrieve an echo token (Earthdata login required), this notebook requires three imports. 

One is a community-developed module called [`requests`](https://requests.readthedocs.io/en/master/), which provides convenient HTTP request methods that are a little more user-friendly than `urllib` or similar. You'll need to install it with pip, conda, or otherwise.

In [1]:
from json import dumps, loads
from datetime import datetime, timedelta
from requests import get as GET

The base URLs for CMR and CMR UAT instances are shown in the cell below.

In [2]:
cmr = "cmr.earthdata.nasa.gov"
#cmr = "cmr.uat.earthdata.nasa.gov"

### Inputs

Start by defining some inputs that fit our requirements. They're covered in more detail in [the Subscriptions section](#Subscriptions), where the core content of the notebook is demonstrated. A brief explanation of each of the two inputs to this workflow is sufficient for now.

**The target CMR Collection's unique *concept-id***

New granules are ingested routinely at a rate of approximately `1` per `day` to the [GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis (v4.1)](https://podaac.jpl.nasa.gov/dataset/MUR-JPL-L4-GLOB-v4.1) collection in CMR.

We use its `concept-id` as one input to our search queries in the next section. Set it as a string to variable `input_ccid`.

In [3]:
input_ccid = "C1664741463-PODAAC"
input_ccid

'C1664741463-PODAAC'

**A timestamp corresponding to exactly `24 hours` ago.**

The generic input could be described as a period/interval over which to search. Whether you'll need one timestamp or two depends on the query; there are a few good options for this use case. A good input timestamp also depends on the production cycle of the target collection. For our simple use case, the query should show that about `1` granule was ingested in the last `24 hours`.

Finding an appropriate timestamp is fairly simple in Python. This approach relies on the `datetime` module.

1. Call `datetime.now` to get a datetime object corresponding to the current date and time.

In [4]:
now = datetime.now()
now

datetime.datetime(2020, 7, 25, 3, 53, 43, 900692)

The dictionary below lists the time components that can define the period covered by a `datetime.timedelta` object.

We only need `days`:

In [5]:
input_datetime = dict(
    microseconds=0,
    milliseconds=0,
    seconds=0,
    minutes=0,
    hours=0,
    days=1,
    weeks=0,
)

input_datetime

{'microseconds': 0,
 'milliseconds': 0,
 'seconds': 0,
 'minutes': 0,
 'hours': 0,
 'days': 1,
 'weeks': 0}

2. Call `timedelta` to create an object that represents a period of `24 hours/1 day`.

In [6]:
period = timedelta(**input_datetime)
period

datetime.timedelta(days=1)

Now simple arithmetic gives us the period/interval over which to search for new granules. 

3. Get a timestamp corresponding to one day ago.

In [7]:
yesterday = now - period
yesterday

datetime.datetime(2020, 7, 24, 3, 53, 43, 900692)

Finally, get the timestamp as a string.

In [8]:
input_timestamp = yesterday.strftime("%Y-%m-%dT%H:%M:%SZ")
input_timestamp

'2020-07-24T03:53:43Z'

We'll pass this timestamp with the query demonstrated in the next section.
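One caveat: `datetime.now()` returns local time, while the `Z` suffix in the format string denotes UTC. On a machine whose clock is not set to UTC, the search window shifts by the local offset. A minimal sketch of a timezone-aware variant follows; the helper name `utc_timestamp_ago` is hypothetical and not part of the original notebook.

```python
from datetime import datetime, timedelta, timezone


def utc_timestamp_ago(period: timedelta, now: datetime = None) -> str:
    """Return the UTC time `period` ago as a 'YYYY-MM-DDTHH:MM:SSZ' string."""
    # Default to an aware UTC datetime so the trailing 'Z' is accurate.
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert an aware datetime to UTC before subtracting and formatting.
    return (now.astimezone(timezone.utc) - period).strftime("%Y-%m-%dT%H:%M:%SZ")


# Fixed reference time so the example is reproducible.
reference = datetime(2020, 7, 25, 3, 53, 43, tzinfo=timezone.utc)
example_timestamp = utc_timestamp_ago(timedelta(days=1), now=reference)
print(example_timestamp)
```

Passing an explicit `now` is only for reproducibility; in routine use the function can be called with the period alone.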

## Subscriptions

### Subscriptions through CMR Search

*This feature of CMR is still in development so this notebook does not provide a demo.*

A Subscription can trigger events when new metadata records are ingested or updated in CMR. This feature is described in [the CMR Search API documentation](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#subscription). An ECHO token is required to use CMR Subscriptions. See [the Echo Tokens section](#Echo-Tokens) for an explanation of ECHO tokens and how to retrieve one (valid for 24 hours).

For now, just check CMR for available Subscription records at the corresponding endpoint.

In [9]:
subscriptions = f"https://{cmr}/search/subscriptions.umm_json?pretty=true"
print(subscriptions)
!curl $subscriptions

https://cmr.earthdata.nasa.gov/search/subscriptions.umm_json?pretty=true
{
  "hits" : 0,
  "took" : 3,
  "items" : [ ]
}

### Temporary/alternative approach

This section implements a comparable "Subscription" using the CMR Search API. As we touched on above, the approach is to request the granule records ingested to CMR in the last `24 hours`. The collection should have grown by about `1` granule over that period.

There are several ways to query for CMR updates that occurred during a given timeframe. See the CMR Search documentation for more:

* https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-with-new-granules (Collections)
* https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-with-revised-granules (Collections)
* https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#g-production-date (Granules)
* https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#g-created-at (Granules)

We'll use the `created_at` parameter for granule search to retrieve the granule records ingested since the timestamp derived in the first section. Requisite parameters:

In [10]:
params = {'collection_concept_id': input_ccid, 'created_at': input_timestamp}
params

{'collection_concept_id': 'C1664741463-PODAAC',
 'created_at': '2020-07-24T03:53:43Z'}

Build the query string from the parameters dictionary: join each parameter to its value with `=`, then join the `parameter=value` pairs to each other with `&`.

In [11]:
query = "&".join([f"{p}={v}" for p,v in params.items()])
query

'collection_concept_id=C1664741463-PODAAC&created_at=2020-07-24T03:53:43Z'
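Joining with `=` and `&` works here because none of the values contain reserved characters. As a hedged alternative, the standard library's `urllib.parse.urlencode` builds the same string and percent-encodes values when needed; the `safe=":"` argument keeps the colons in the timestamp readable. The `example_` names are illustrative only.

```python
from urllib.parse import urlencode

example_params = {
    "collection_concept_id": "C1664741463-PODAAC",
    "created_at": "2020-07-24T03:53:43Z",
}

# urlencode joins key=value pairs with '&' and escapes reserved characters.
example_query = urlencode(example_params, safe=":")
print(example_query)
```

`requests` can also take the dictionary directly through its `params` argument, in which case it performs the encoding itself.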

Append the query string to the CMR Search endpoint for granules.

In [12]:
url = f"https://{cmr}/search/granules.umm_json?{query}"
print(url)

https://cmr.earthdata.nasa.gov/search/granules.umm_json?collection_concept_id=C1664741463-PODAAC&created_at=2020-07-24T03:53:43Z


Download the granule records that match our search parameters.

In [13]:
results = GET(url).json()
print(f"{results['hits']} new granules ingested for '{input_ccid}' since '{input_timestamp}'.")

1 new granules ingested for 'C1664741463-PODAAC' since '2020-07-24T03:53:43Z'.
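CMR returns search results in pages, so `hits` (the total number of matches) can be larger than the number of records in `items`. The sketch below checks for that case. The function name `is_truncated` and the sample dictionary are hypothetical, and the remedy mentioned in the comment (raising the `page_size` query parameter) is an assumption to verify against the CMR Search documentation.

```python
def is_truncated(results: dict) -> bool:
    """Return True if the response holds fewer records than CMR matched."""
    return results.get("hits", 0) > len(results.get("items", []))


# Hypothetical response shaped like the umm_json output used in this notebook.
sample_page = {"hits": 12, "items": [{"umm": {}}] * 10}

if is_truncated(sample_page):
    # Assumption: more records can be requested with a larger `page_size`.
    print(f"Only {len(sample_page['items'])} of {sample_page['hits']} records returned.")
```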


Since it's only one record, display it in its entirety:

In [14]:
print(dumps(results['items'][0]['umm'], indent=2))

{
  "TemporalExtent": {
    "RangeDateTime": {
      "BeginningDateTime": "2020-07-23T09:00:00.000Z",
      "EndingDateTime": "2020-07-23T09:00:00.000Z"
    }
  },
  "OrbitCalculatedSpatialDomains": [
    {}
  ],
  "GranuleUR": "20200723090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc",
  "SpatialExtent": {
    "HorizontalSpatialDomain": {
      "Geometry": {
        "BoundingRectangles": [
          {
            "WestBoundingCoordinate": -179.99,
            "EastBoundingCoordinate": 180.0,
            "NorthBoundingCoordinate": 89.99,
            "SouthBoundingCoordinate": -89.99
          }
        ]
      }
    }
  },
  "ProviderDates": [
    {
      "Date": "2020-07-24T09:15:21.849Z",
      "Type": "Insert"
    },
    {
      "Date": "2020-07-24T12:58:55.028Z",
      "Type": "Update"
    }
  ],
  "CollectionReference": {
    "ShortName": "MUR-JPL-L4-GLOB-v4.1",
    "Version": "4.1"
  },
  "RelatedUrls": [
    {
      "URL": "https://podaac-tools.jpl.nasa.gov/drive/files/allDa

The UMM-G record shown above provides a wealth of information about the new granule. Among the related web resources listed in the `RelatedUrls` section is a link for HTTP access, denoted by `"Type": "GET DATA"`.

Grab the download URL, but do it in a way that'll work for search results returning any number of granule records:

In [15]:
downloads = [r['umm']['RelatedUrls'][0]['URL'] for r in results['items']]
downloads

['https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2020/205/20200723090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc']
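Taking the first entry of `RelatedUrls` relies on the download link being listed first. A more defensive sketch selects entries by their `Type` instead. The helper name `get_data_urls` and the sample record are hypothetical, and both sample URLs are made-up placeholders.

```python
def get_data_urls(results: dict, url_type: str = "GET DATA") -> list:
    """Collect URLs of the given type from every granule record in `results`."""
    urls = []
    for item in results.get("items", []):
        for related in item.get("umm", {}).get("RelatedUrls", []):
            if related.get("Type") == url_type:
                urls.append(related["URL"])
    return urls


# Hypothetical search result with two related URLs of different types.
sample_search = {
    "items": [
        {
            "umm": {
                "RelatedUrls": [
                    {"URL": "https://example.com/docs.html", "Type": "VIEW RELATED INFORMATION"},
                    {"URL": "https://example.com/granule.nc", "Type": "GET DATA"},
                ]
            }
        }
    ]
}

print(get_data_urls(sample_search))
```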

If a list of past records exists in the repo's resources directory (at [resources/nrt-granules.txt](resources/nrt-granules.txt)), read it and update it by replacing the timestamp in the first line and appending the new URLs to the file. Otherwise, create the file for the first time.

In [16]:
from os.path import isfile

urls_file = "resources/nrt-granules.txt"
urls_hist = [input_timestamp]

# If nrt-granules.txt exists, read its lines without trailing newlines.
if isfile(urls_file):
    with open(urls_file, "r") as f:
        # Fall back to a fresh list if the file is empty.
        urls_hist = f.read().splitlines() or [input_timestamp]

    # Replace first row's timestamp with the new one.
    urls_hist[0] = input_timestamp

# Open file for writing and write merged list of URLs.
with open(urls_file, "w") as f:
    f.write("\n".join(urls_hist + downloads))
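Re-running the cell above within the same 24-hour window would append the same URLs again. A minimal sketch of a merge step that skips URLs already on file follows; `merge_history` is a hypothetical helper, not part of the original notebook, and the URLs are placeholders.

```python
def merge_history(history: list, timestamp: str, new_urls: list) -> list:
    """Return [timestamp, *urls] with each URL listed once, in first-seen order."""
    merged = []
    # Everything after the first line of the history is a URL.
    for url in history[1:] + new_urls:
        if url and url not in merged:
            merged.append(url)
    return [timestamp] + merged


old = ["2020-07-23T03:53:43Z", "https://example.com/a.nc"]
new = ["https://example.com/a.nc", "https://example.com/b.nc"]
print(merge_history(old, "2020-07-24T03:53:43Z", new))
```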

## Echo Tokens

A few user notes about this section and echo tokens in general:

* You need an Earthdata Login account to request a token. Register at [https://uat.urs.earthdata.nasa.gov](https://uat.urs.earthdata.nasa.gov).
* Tokens expire after 24 hours.

Some of the DAAC-facing capabilities of the Search API require an Echo Token. The Echo Token allows CMR to know who is making a request. Once the requester has the token, it can be placed in the HTTP header of the necessary API calls. The Subscription service has this requirement.

**Retrieve token by POSTing Earthdata credential as XML:**

The format of the XML data that we POST to CMR Search to retrieve a temporary Echo Token:

```xml
<token>
  <username>{USER}</username>
  <password>{PASSWORD}</password>
  <client_id>{CLIENT ID}</client_id>   
  <user_ip_address>{IP ADDRESS}</user_ip_address>
</token>
```

The XML data are POSTed to this service endpoint: [https://cmr.earthdata.nasa.gov/legacy-services/rest/tokens](https://cmr.earthdata.nasa.gov/legacy-services/rest/tokens). For example, if the XML is assigned to the bash variable `$DATA`:

```shell
curl -X POST --header "Content-Type: application/xml" -d "$DATA" https://cmr.earthdata.nasa.gov/legacy-services/rest/tokens
```

Retrieve a new Echo Token below:

In [17]:
# USER NOTE:
#
#  This function really isn't meant for use outside the notebook. 
#  There are simpler/safer approaches in the shell/Python interpreter.
#
from IPython.display import clear_output
from socket import gethostname, gethostbyname
from requests import post as POST
from getpass import getpass


def get_echo_token(client_ip: str = None, client_name: str = None):
    """Returns a temporary Echo-Token following URS authentication.

    The token endpoint is built from the notebook-level `cmr` variable,
    which selects either the CMR or the CMR UAT instance.

    Parameters
    ----------
    client_ip (str, optional): The host's IP address.
    client_name (str, optional): A name for the NRT client/app.

    Returns
    -------
    str: the XML response containing a temporary Echo-Token
         (valid for 24 hours)

    """

    # Get client's IP if not given in 'client_ip' argument (local).
    ip = gethostbyname(gethostname()) if client_ip is None else client_ip
    
    # Use a fake client/application name if None was given.
    name = "PodaacTutorial" if client_name is None else client_name

    # Prompt user for URS credentials; POST as xml data.
    response = POST(

        # Format the end point for the ECHO-TOKENS.
        url=f"https://{cmr}/legacy-services/rest/tokens",

        # Data are in XML format.
        headers={'Content-Type': "application/xml"}, 

        # Prompt for credentials, format XML string for the data argument.
        data=("<token>"

            # Prompt user for their URS/Earthdata username:
            f"<username>{ input('Username: ') }</username>"

            # Prompt user for their password (Python stdlib 'getpass'):
            f"<password>{ getpass('Password: ') }</password>"

            # Provide a string identifier for client/application:
            f"<client_id>{ name }</client_id>"

            # Provide the host's IP address (resolved above from the hostname):
            f"<user_ip_address>{ ip }</user_ip_address>"

        "</token>")

    )

    # Clear prompts from output, return response text.
    clear_output()

    return response.text

Call the function, enter credentials, get token xml response:

In [18]:
xml_response = get_echo_token()

print(xml_response)

<?xml version="1.0" encoding="UTF-8"?>
<token>
  <id>CA7D3FE8-D029-09BB-361E-489DCE1FCC6E</id>
  <username>jmcnelis</username>
  <client_id>PodaacTutorial</client_id>
  <user_ip_address>192.168.1.189</user_ip_address>
</token>



Select just the ECHO token string by splitting the block of text at the newline symbol (`\n`) and slicing the third line in the resulting list:

In [19]:
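try:
    # Third line is '  <id>TOKEN</id>': drop the 6-character prefix and 5-character suffix.
    token = xml_response.splitlines()[2][6:-5]
except IndexError as e:
    # Fewer than three lines means this is not the expected token XML.
    raise ValueError(f"Unexpected token response: {xml_response!r}") from e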
    
token

'CA7D3FE8-D029-09BB-361E-489DCE1FCC6E'
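Slicing by line and character position depends on the exact layout of the response. As a hedged alternative, the standard library's `xml.etree.ElementTree` can read the `<id>` element directly. The sample response, username, IP address, and token value below are placeholders, and the `Echo-Token` header name is an assumption to check against the CMR Search API documentation.

```python
import xml.etree.ElementTree as ET

# Placeholder response with the same structure as the output above.
sample_response = """<?xml version="1.0" encoding="UTF-8"?>
<token>
  <id>EXAMPLE-TOKEN-0000</id>
  <username>someuser</username>
  <client_id>PodaacTutorial</client_id>
  <user_ip_address>192.0.2.1</user_ip_address>
</token>
"""

# Parse from bytes so the XML encoding declaration is honored.
root = ET.fromstring(sample_response.encode("utf-8"))
example_token = root.findtext("id")

# Assumed header name for passing the token with later requests.
example_headers = {"Echo-Token": example_token}
print(example_headers)
```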