<font size="1">Copyright 2021, by the California Institute of Technology. ALL RIGHTS RESERVED. United States Government sponsorship acknowledged. Any commercial use must be negotiated with the Office of Technology Transfer at the California Institute of Technology.</font>
    
<font size="1">This software may be subject to U.S. export control laws and regulations. By accepting this document, the user agrees to comply with all applicable U.S. export laws and regulations. User has the responsibility to obtain export licenses, or other export authority as may be required, before exporting such information to foreign countries or providing access to foreign persons.</font>

# Datasets
This notebook will use the Pele API to query available datasets and download one from S3 based on the metadata returned by the query.

## Setup

This notebook assumes you've already gone through the first notebook, registered a username and password, and populated your `.netrc` file. Let's set things up so that we can use the Pele client library to query our datasets.
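Because everything below relies on the `.netrc` entry created in the first notebook, it can help to confirm that entry exists before making any requests. This is a minimal sketch using Python's standard `netrc` module; `netrc_credentials` is a hypothetical helper name, and it assumes the `.netrc` `machine` name matches the host in the Pele base URL.

```python
import netrc
from urllib.parse import urlparse


def netrc_credentials(base_url, netrc_path=None):
    """Return (login, password) for the host in base_url from a .netrc file.

    netrc_path=None reads the default ~/.netrc; pass a path to check another file.
    Note: authenticators() falls back to a 'default' entry if the file has one.
    Raises LookupError if no entry applies to the host.
    """
    host = urlparse(base_url).hostname
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        raise LookupError("no .netrc entry for host {}".format(host))
    login, _account, password = entry
    return login, password
```

For example, `netrc_credentials("https://172.31.29.154/pele/api/v0.1")` should return your Pele login and password if the first notebook was completed.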

In [None]:
import os
import requests, json, getpass
from requests.auth import HTTPBasicAuth
import urllib3
from urllib.parse import urlparse

urllib3.disable_warnings()

# this block makes sure the directory set-up/change is only done once and relative to the notebook's directory
try:
    start_dir
except NameError:
    start_dir = os.getcwd()
    !mkdir -p ./notebook_output/L1_L_RSLC-Analysis
    os.chdir('notebook_output/L1_L_RSLC-Analysis')
    
# set the base url to interact with the goddess, Pele
#base_url = input("Enter Pele REST API base url (e.g. https://<mozart_ip>/pele/api/v0.1) then press <Enter>: ")
base_url = "https://172.31.29.154/pele/api/v0.1"
print("Using base url {}.".format(base_url))

Let's validate that we can interact with Pele:

In [None]:
from pele_client.client import PeleRequests

# instantiate PeleRequests object
pr = PeleRequests(base_url, verify=False, auth=False)

# now use like requests module (`request()`, `get()`, `head()`, `post()`, `put()`, `delete()`, `patch()`)
r = pr.get(base_url + '/test/echo', params={'echo_str': 'hello world'})

# expect 200
print("status code: {}".format(r.status_code))
print(json.dumps(r.json(), indent=2))
assert r.status_code == 200
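
Each request in this notebook follows the same steps: issue a GET, check for status 200, and decode the JSON body. A small helper can bundle those steps. This is a sketch, not part of `pele_client`: `get_json` is a hypothetical name, and it assumes only that the session object has a requests-style `get(url, params=...)` returning an object with `status_code` and `json()`, which is how `pr` is used in this notebook.

```python
def get_json(session, url, params=None, expected_status=200):
    """GET url through a requests-like session, check the status, and return the decoded JSON."""
    r = session.get(url, params=params)
    if r.status_code != expected_status:
        # raise instead of assert so the check survives `python -O`
        raise RuntimeError("GET {} returned status {}, expected {}".format(
            url, r.status_code, expected_status))
    return r.json()
```

For example, `res = get_json(pr, base_url + '/pele/datasets')` would cover the request, status check, and decode steps of the next cell in one call.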

## Querying

Let's see what datasets we have:

In [None]:
# get datasets
r = pr.get(base_url + '/pele/datasets')

# expect 200
print("status code: {}".format(r.status_code))
res = r.json()
print(json.dumps(res, indent=2))
assert r.status_code == 200
assert "L1_L_RSLC" in res['datasets']

Let's look at the `L1_L_RSLC` dataset type and query for its dataset IDs:

In [None]:
# query for all dataset IDs of the `L1_L_RSLC` dataset
r = pr.get(base_url + '/pele/dataset/L1_L_RSLC/dataset_ids')

# expect 200
print("status code: {}".format(r.status_code))
res = r.json()
print(json.dumps(res, indent=2))
assert r.status_code == 200

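There are more `L1_L_RSLC` granules than fit in one response: the API returns results in pages of 10 IDs (reported as `page_size`), together with the page's `offset` and the `total` number of matching IDs. Let's collect all of them by requesting successive pages until we have `total` IDs: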

In [None]:
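# start from the first page returned by the previous cell
rslc_ids = list(res['dataset_ids'])
while len(rslc_ids) < res['total']:
    # request the page that starts right after the current one
    r = pr.get(base_url + '/pele/dataset/L1_L_RSLC/dataset_ids',
               params={'offset': res['offset'] + res['page_size']})
    assert r.status_code == 200
    res = r.json()
    page = res.get('dataset_ids', [])
    if not page:
        # an empty or missing page means there is nothing more to fetch;
        # stopping here also avoids looping forever
        break
    rslc_ids.extend(page)

print("All L1_L_RSLC ids: {}".format(rslc_ids))
print("number of ids: {}".format(len(rslc_ids)))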
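
The offset/page_size/total pattern can also be factored into a generator that hides the paging. In this sketch, `iter_dataset_ids` and its `fetch_page` callback are hypothetical names; the callback is assumed to return a dict with the same `dataset_ids`, `offset`, `page_size`, and `total` keys seen in the responses above.

```python
def iter_dataset_ids(fetch_page):
    """Yield every dataset ID from an offset-paged listing.

    fetch_page(offset) must return a dict like
    {'dataset_ids': [...], 'offset': int, 'page_size': int, 'total': int}.
    """
    offset, seen = 0, 0
    while True:
        page = fetch_page(offset)
        ids = page.get('dataset_ids', [])
        if not ids:
            # empty or malformed page: stop rather than loop forever
            return
        yield from ids
        seen += len(ids)
        if seen >= page['total']:
            return
        # the next page starts right after this one
        offset = page['offset'] + page['page_size']
```

Against Pele this could be used as `list(iter_dataset_ids(lambda off: pr.get(base_url + '/pele/dataset/L1_L_RSLC/dataset_ids', params={'offset': off}).json()))`, assuming the endpoint treats `offset=0` the same as passing no offset.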

Let's take a look at the metadata for one of those granules:

In [None]:
# query for metadata of a specific `L1_L_RSLC` dataset
r = pr.get(base_url + '/pele/dataset/NISAR_L1_PR_RSLC_007_147_D_144_2800_HH_20070101T053038_20070101T053045_D00200_P_F_001')

# expect 200
print("status code: {}".format(r.status_code))
res = r.json()
print(json.dumps(res, indent=2))
assert r.status_code == 200

# get hdf5 file name
h5_file = res['result']['metadata']['FileName']
print(h5_file)

The response contains the granule's full JSON metadata. Let's pull out its URLs so that we can download the granule:

In [None]:
# pull the urls
urls = res['result']['urls']
print("urls: {}".format(urls))

We'll use the S3 URL so that we can take advantage of the S3 API for faster downloads:

In [None]:
# pick the first s3:// URL among the granule's URLs
s3_url = next((u for u in urls if u.startswith('s3://')), None)
assert s3_url is not None, "no s3:// URL found for this granule"

### Download the dataset from S3

Before we can download, we need to populate the `~/.aws/credentials` file with the access key information. Run `aws-login` in a terminal:

    aws-login -pub -p default -r us-west-2
    

In [None]:
import os

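# convert the URL into the s3://<bucket>/<key> form the AWS CLI expects by keeping
# only the URL path (this drops the endpoint host from s3://<endpoint>/<bucket>/<key> URLs)
url = 's3://{}'.format(urlparse(s3_url).path[1:])
print(url)

# sync into a local directory named after the last component of the URL
local_dir = os.path.basename(url.rstrip('/'))
print(local_dir)
!aws s3 sync {url} {local_dir}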
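
The conversion above keeps only the path of the granule's S3 URL. If you later want the bucket and key separately (for example for boto3's `download_file(Bucket, Key, Filename)`), a sketch like the following splits them. `split_s3_url` is a hypothetical helper, and treating any host containing `amazonaws.com` as an endpoint rather than a bucket is an assumption about how these URLs are formed.

```python
from urllib.parse import urlparse


def split_s3_url(s3_url):
    """Split an s3:// URL into (bucket, key).

    Handles s3://<bucket>/<key> and the endpoint style
    s3://<endpoint-host>[:port]/<bucket>/<key>.
    ASSUMPTION: a host containing 'amazonaws.com' is an endpoint, not a bucket.
    """
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3":
        raise ValueError("not an s3:// URL: {}".format(s3_url))
    path = parsed.path.lstrip("/")
    if "amazonaws.com" in parsed.netloc:
        # endpoint style: the bucket is the first path component
        bucket, _, key = path.partition("/")
    else:
        bucket, key = parsed.netloc, path
    return bucket, key
```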

In [None]:
!pwd
!ls -al {local_dir}

In [None]:
!h5ls -r $local_dir/$h5_file

In [None]:
!gdalinfo HDF5:"$local_dir/$h5_file"://science/LSAR/RSLC/swaths/frequencyA/HH
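
The GDAL subdataset name in the `gdalinfo` cell combines the local file path with the HDF5 path of the image layer, and the same string is needed again for `gdal_translate`. A hypothetical helper can build it in one place. The swath path is copied from this notebook; building names for other frequencies or polarizations (for example `frequencyB` or `HV`) assumes the product contains those layers with the same layout, which `h5ls -r` can confirm.

```python
def rslc_subdataset(h5_path, frequency="A", polarization="HH"):
    """Build a GDAL HDF5 subdataset name for one RSLC image layer.

    ASSUMPTION: layers live at /science/LSAR/RSLC/swaths/frequency<X>/<POL>,
    as for frequencyA/HH in this notebook.
    """
    dataset = "science/LSAR/RSLC/swaths/frequency{}/{}".format(frequency, polarization)
    return 'HDF5:"{}"://{}'.format(h5_path, dataset)
```

In Python, the result can be passed straight to `gdal.Open(rslc_subdataset(os.path.join(local_dir, h5_file)))`.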

### Next, let's translate the SLC to ENVI format so that it can be read with GDAL and visualized with matplotlib

In [None]:
!gdal_translate -of ENVI HDF5:"$local_dir/$h5_file"://science/LSAR/RSLC/swaths/frequencyA/HH HH.slc
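
`gdal_translate -of ENVI` writes the raw samples to `HH.slc` plus a small text header (typically `HH.hdr`) giving the raster size, data type, and byte order. To read the raw file without GDAL, a minimal parser for that header could look like the sketch below. It assumes a single-band file with `header offset = 0`, maps only the float and complex ENVI data type codes (6 is single-precision complex), and `parse_envi_header` / `envi_shape_and_dtype` are hypothetical names.

```python
def parse_envi_header(text):
    """Parse an ENVI .hdr file's 'key = value' lines into a dict (keys lower-cased)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "ENVI":
        raise ValueError("not an ENVI header")
    fields, key, parts = {}, None, []
    for line in lines[1:]:
        if key is not None:
            # inside a multi-line {...} value: collect until the closing brace
            parts.append(line.strip())
            if line.rstrip().endswith("}"):
                fields[key] = " ".join(parts)
                key, parts = None, []
            continue
        if "=" not in line:
            continue
        k, v = (s.strip() for s in line.split("=", 1))
        if v.startswith("{") and not v.endswith("}"):
            key, parts = k.lower(), [v]
        else:
            fields[k.lower()] = v
    return fields


# ENVI "data type" codes for float/complex samples -> numpy-style type codes
_ENVI_TYPES = {4: "f4", 5: "f8", 6: "c8", 9: "c16"}


def envi_shape_and_dtype(fields):
    """Return ((lines, samples), dtype string) for a single-band ENVI file."""
    code = _ENVI_TYPES[int(fields["data type"])]
    # byte order 0 = little-endian, 1 = big-endian
    order = ">" if fields.get("byte order", "0") == "1" else "<"
    return (int(fields["lines"]), int(fields["samples"])), order + code
```

With `fields = parse_envi_header(open('HH.hdr').read())` and `shape, dtype = envi_shape_and_dtype(fields)`, `np.fromfile('HH.slc', dtype=dtype).reshape(shape)` should give the same array as GDAL's `ReadAsArray()`.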

In [None]:
!gdalinfo HH.slc
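
`gdalinfo` reports the raster dimensions (`Size is <width>, <height>`). GDAL rejects a `ReadAsArray` window that extends past the raster edge, so a fixed 1000 x 1000 request can fail on a smaller granule. A hypothetical `clamp_window` helper trims the window to fit:

```python
def clamp_window(x0, y0, x_size, y_size, raster_x_size, raster_y_size):
    """Clip a (x0, y0, x_size, y_size) pixel window to the raster bounds.

    The result is safe to pass to ReadAsArray(xoff, yoff, win_xsize, win_ysize).
    """
    if not (0 <= x0 < raster_x_size and 0 <= y0 < raster_y_size):
        raise ValueError("window origin ({}, {}) is outside the raster".format(x0, y0))
    return x0, y0, min(x_size, raster_x_size - x0), min(y_size, raster_y_size - y0)
```

For example, `band.ReadAsArray(*clamp_window(x0, y0, 1000, 1000, ds.RasterXSize, ds.RasterYSize))`.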

In [None]:
import numpy as np
from osgeo import gdal
import matplotlib.pyplot as plt

ds = gdal.Open("HH.slc", gdal.GA_ReadOnly)

# extract a subset of the SLC to display:
# (x0, y0) is the pixel offset of the window's upper-left corner,
# (x_size, y_size) is the window size in pixels
x0 = 0
y0 = 10
x_size = 1000
y_size = 1000

# ReadAsArray(xoff, yoff, win_xsize, win_ysize) returns a complex array for SLC data
slc = ds.GetRasterBand(1).ReadAsArray(x0, y0, x_size, y_size)
ds = None  # close the dataset

fig = plt.figure(figsize=(20, 30))

# display amplitude of the SLC; amplitude is non-negative, so the color scale starts at 0
ax = fig.add_subplot(2, 1, 1)
ax.imshow(np.abs(slc), vmin=0, vmax=2, cmap='gray')
ax.set_title("amplitude")

# display phase of the SLC (radians, in [-pi, pi])
ax = fig.add_subplot(2, 1, 2)
ax.imshow(np.angle(slc))
ax.set_title("phase")

plt.show()

slc = None  # release the array
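
A fixed linear amplitude range can leave most of a SAR image very dark or saturated, because amplitude spans a large dynamic range. A common alternative is to display amplitude in decibels with a percentile stretch. This is a sketch; `amplitude_db` and `display_limits` are hypothetical helpers, and the 2nd/98th percentiles are an arbitrary but typical choice.

```python
import numpy as np


def amplitude_db(slc, eps=1e-10):
    """Convert complex SLC samples to amplitude in decibels, 20*log10(|slc|).

    eps keeps zero-valued (e.g. no-data) pixels finite instead of -inf.
    """
    return 20.0 * np.log10(np.abs(slc) + eps)


def display_limits(img, low=2.0, high=98.0):
    """Percentile-based (vmin, vmax) for imshow, ignoring NaN/inf pixels."""
    finite = img[np.isfinite(img)]
    vmin, vmax = np.percentile(finite, [low, high])
    return float(vmin), float(vmax)
```

Apply these to `slc` before the final `slc = None` line releases it, for example `db = amplitude_db(slc)`, `vmin, vmax = display_limits(db)`, then `ax.imshow(db, vmin=vmin, vmax=vmax, cmap='gray')`.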

<font size="1">This notebook is compatible with NISAR Jupyter Server Stack v1.4 and above</font>