<font size="1">Copyright 2021, by the California Institute of Technology. ALL RIGHTS RESERVED. United States Government sponsorship acknowledged. Any commercial use must be negotiated with the Office of Technology Transfer at the California Institute of Technology.</font>
    
<font size="1">This software may be subject to U.S. export control laws and regulations. By accepting this document, the user agrees to comply with all applicable U.S. export laws and regulations. User has the responsibility to obtain export licenses, or other export authority as may be required, before exporting such information to foreign countries or providing access to foreign persons.</font>

In [2]:
!echo $AWS_ROLE_ARN
!echo $AWS_WEB_IDENTITY_TOKEN_FILE

arn:aws:iam::788584227175:role/am-eks-notebook-role
/var/run/secrets/eks.amazonaws.com/serviceaccount/token


# Datasets - geospatial and time filtered queries
This notebook will use the Pele API to issue a geospatial query to find available datasets and download one from S3 based on the metadata returned by the query.

#### Kernel: isce, plant or mintpy

## Setup

This notebook assumes you've already gone through the first notebook, registered a user and password, and populated your .netrc file. Let's set things up so that we can use the Pele client library to query our datasets.

In [None]:
import os
import requests, json, getpass
from requests.auth import HTTPBasicAuth
import urllib3
from urllib.parse import urlparse

urllib3.disable_warnings()

# this block makes sure the directory set-up/change is only done once and relative to the notebook's directory
try:
    start_dir
except NameError:
    start_dir = os.getcwd()
    !mkdir -p ./notebook_output/02-Datasets-geospatial
    os.chdir('notebook_output/02-Datasets-geospatial')
    
# set the base url to interact with the goddess, Pele
base_url = input("Enter Pele REST API base url (e.g. https://<mozart_ip>/pele/api/v0.1) then press <Enter>: ")
print("Using base url {}.".format(base_url))

Let's validate that we can interact with Pele:

In [None]:
from pele_client.client import PeleRequests

# instantiate PeleRequests object
pr = PeleRequests(base_url, verify=False)

# now use like requests module (`request()`, `get()`, `head()`, `post()`, `put()`, `delete()`, `patch()`)
r = pr.get(base_url + '/test/echo', params={'echo_str': 'hello world'})

# expect 200
print("status code: {}".format(r.status_code))
print(json.dumps(r.json(), indent=2))
assert r.status_code == 200

## Querying the dataset types

Let's see what datasets we have. Note that the page_size parameter is passed to increase the size of the result set beyond the default of 10.

In [None]:
# get datasets
r = pr.get(base_url + '/pele/datasets', params={'page_size' : 50})

# expect 200
print("status code: {}".format(r.status_code))
res = r.json()
print(json.dumps(res, indent=2))
assert r.status_code == 200

#### Here we list the available L2_L_GSLC datasets, just for reference

In [None]:
# query for all dataset IDs of the `L2_L_GSLC` dataset
r = pr.get(base_url + '/pele/dataset/L2_L_GSLC/dataset_ids')
           
# expect 200
print("status code: {}".format(r.status_code))
res = r.json()
print(json.dumps(res, indent=2))
assert r.status_code == 200

datasets = res['dataset_ids']

## Refine the search

FYI, the target datasets cover the following polygon (coordinates in longitude, latitude order):

```[[[-118,34],[-117,34],[-117,35.5],[-118,35.5],[-118,34]]]```

The search polygon defined below covers this same area.
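
As a quick sanity check before querying, a polygon like the one above can be validated locally. The sketch below assumes GeoJSON-style coordinates (a list of rings, each position in `[lon, lat]` order, first and last position identical). The helper name `check_polygon` is hypothetical and not part of the Pele client.

```python
def check_polygon(coords):
    """Return a list of problems found in GeoJSON-style polygon coordinates."""
    problems = []
    for n, ring in enumerate(coords):
        # a closed ring needs at least 4 positions (3 corners + repeated start)
        if len(ring) < 4:
            problems.append("ring {} has fewer than 4 positions".format(n))
        # the first and last positions must match to close the ring
        if ring and ring[0] != ring[-1]:
            problems.append("ring {} is not closed".format(n))
        # assumed ordering: [longitude, latitude]
        for lon, lat in ring:
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(
                    "ring {} has out-of-range position [{}, {}]".format(n, lon, lat)
                )
    return problems


target_poly = [[[-118, 34], [-117, 34], [-117, 35.5], [-118, 35.5], [-118, 34]]]
print(check_polygon(target_poly))
```

An empty list means no problems were found. A swapped `[lat, lon]` pair such as `[34, -118]` would be flagged because -118 is not a valid latitude.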

We will also add start and end time parameters to further refine the results. These 'time' parameters can have any of the following formats:

```YYYY-MM-DDTHH:MI:SSZ```

```YYYY-MM-DD``` (the time portion is considered all zero, i.e. midnight)

```nnnnnnnnnnnnn``` (milliseconds since epoch)

For start_time, the dataset start_time must be greater than or equal to the search value. For end_time, the dataset end_time must be less than the search value.

*polygon*, *start_time* and *end_time* are all optional.
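
To make the three accepted formats concrete, here is a minimal sketch that converts each of them to a Python `datetime`. It is only an illustration of the formats listed above, not how Pele parses them. The function name `parse_search_time` is hypothetical, and treating date-only values as UTC is an assumption.

```python
from datetime import datetime, timezone


def parse_search_time(value):
    """Convert one of the three accepted time formats to a UTC datetime."""
    text = str(value)
    if text.isdigit():
        # nnnnnnnnnnnnn: milliseconds since the Unix epoch
        return datetime.fromtimestamp(int(text) / 1000.0, tz=timezone.utc)
    # YYYY-MM-DDTHH:MI:SSZ, then YYYY-MM-DD (time portion defaults to midnight)
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            # assumption: values without an explicit zone are UTC
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognized time format: {!r}".format(value))


for example in ("2008-02-18", "2008-02-18T23:59:59Z", 1203292800000):
    print(example, "->", parse_search_time(example).isoformat())
```

Under these assumptions, `'2008-02-18'` and `1203292800000` denote the same instant.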

In [None]:
search_poly=[[[-118,34],[-117,34],[-117,35.5],[-118,35.5],[-118,34]]]
search_start_time = '2008-02-18'
search_end_time = '2008-02-18T23:59:59Z'

## Submit the query

The polygon and time parameters are passed in the JSON body of an HTTP POST. We should end up with one result dataset.

In [None]:
# query for the dataset ID of the qualifying `L2_L_GSLC` dataset(s)
r = pr.post(base_url + '/pele/dataset/L2_L_GSLC/dataset_ids', json = { 'polygon' : search_poly, 'start_time' : search_start_time, 'end_time' : search_end_time })
           
# expect 200
print("status code: {}".format(r.status_code))
res = r.json()
print(json.dumps(res, indent=2))
assert r.status_code == 200

datasets = res['dataset_ids']
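
Because *polygon*, *start_time* and *end_time* are all optional, the request body can be assembled from whichever filters are set. The sketch below assumes that leaving a key out of the JSON body is how a filter is skipped, which this notebook does not confirm. The helper `build_query` is hypothetical and not part of the Pele client.

```python
def build_query(polygon=None, start_time=None, end_time=None):
    """Build the JSON body, leaving out any filter that was not supplied."""
    candidates = {
        "polygon": polygon,
        "start_time": start_time,
        "end_time": end_time,
    }
    # assumption: an omitted key means "do not filter on this field"
    return {key: value for key, value in candidates.items() if value is not None}


# time-only query: no polygon key is sent
print(build_query(start_time="2008-02-18", end_time="2008-02-18T23:59:59Z"))
```

The resulting dictionary could then be passed as the `json` argument of `pr.post(...)` in the same way as the cell above.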

Let's take a look at the metadata for the first matching dataset:

In [None]:
r = pr.get(base_url + '/pele/dataset/{}'.format(datasets[0]))

# expect 200
print("status code: {}".format(r.status_code))
res = r.json()
print(json.dumps(res, indent=2))
assert r.status_code == 200

This pulls the granule's entire JSON metadata. Let's focus on the URLs so that we can download the granule:

In [None]:
# pull the urls
urls = res['result']['urls']
print("urls: {}".format(urls))

We want the S3 URL so that we can use the S3 API for faster downloads:

In [None]:
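# keep the s3:// URL from the list (if several match, the last one wins)
s3_url = None
for candidate in urls:
    if candidate.startswith('s3://'):
        s3_url = candidate
assert s3_url is not None, "no s3:// URL found in the dataset metadata"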

### Download the dataset from S3

Before we can download the dataset, we need to populate the .aws/credentials file with the access key information. Use a terminal to execute aws-login:

    aws-login -pub -p default -r us-west-2
    

In [None]:
import os

# get the S3 url format that awscli requires
url = 's3://{}'.format(urlparse(s3_url).path[1:])
print(url)
local_dir = os.path.basename(url)
print(local_dir)
!aws s3 sync {url} {local_dir}
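
The URL rewrite in the cell above drops the host part of the metadata URL and keeps only the path, so that the bucket name comes directly after `s3://`. The sketch below shows the effect on a made-up URL. The endpoint, bucket and key are hypothetical, and the assumption that Pele's S3 URLs carry an endpoint host before the bucket is inferred from the code rather than stated in this notebook.

```python
import os
from urllib.parse import urlparse


def to_awscli_url(s3_url):
    """Drop the endpoint host so the bucket follows 's3://' directly."""
    # urlparse puts the endpoint in .netloc and '/bucket/key' in .path
    return "s3://{}".format(urlparse(s3_url).path[1:])


# hypothetical metadata URL: s3://<endpoint>:<port>/<bucket>/<key>
example = "s3://s3-us-west-2.amazonaws.com:80/example-bucket/products/granule_0001"
url = to_awscli_url(example)
print(url)                    # bucket/key form for `aws s3 sync`
print(os.path.basename(url))  # local directory name
```

If a metadata URL were already in `s3://<bucket>/<key>` form, this rewrite would drop the bucket name, so it is worth printing the result before syncing.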

In [None]:
!pwd
!ls -al {local_dir}

<font size="1">This notebook is compatible with NISAR Jupyter Server Stack v1.7.1 and above</font>