## HarmonyPy Introduction

This notebook demonstrates several basic examples highlighting how to query and access customized data outputs from NASA Earthdata Harmony. See https://harmony-py.readthedocs.io/ for detailed documentation on HarmonyPy.

### Import packages

First we import the packages needed to request and visualize our data, as well as the `harmony-py` library itself. Make sure to install `harmony-py` and its dependencies into your current Python environment before executing the notebook:

```
$ pip install -U harmony-py
```


In [None]:
import sys; sys.path.append('..')
import datetime as dt
from IPython.display import display, JSON
import rasterio
import rasterio.plot

from harmony import BBox, Client, Collection, Request

### Quick start

You can request data using `harmony-py` in just a few lines. More advanced subsetting and transformation options may be supported for your data product of interest, but the example below demonstrates a basic request with a spatial bounding box and a temporal range:

```
harmony_client = Client(auth=('EDL_username', 'EDL_password'))
request = Request(
    collection=Collection(id=dataset_short_name),
    spatial=BBox(w, s, e, n),
    temporal={
        'start': dt.datetime(yyyy, mm, dd),
        'stop': dt.datetime(yyyy, mm, dd)
    }
)
job_id = harmony_client.submit(request)
results = harmony_client.download_all(job_id, directory='/tmp', overwrite=True)
```

The guidance below offers more detailed examples highlighting many of the helpful features provided by the `harmony-py` library.

### Create Harmony Client object

First, you will need to instantiate your Harmony Client, which is what you will interact with to submit and inspect a data request to Harmony, as well as to retrieve your results. 

When creating the Client, you need to provide your [Earthdata Login](https://urs.earthdata.nasa.gov) credentials, which are required to access data from NASA EOSDIS. There are three options for providing your Earthdata Login username and password: 

1. Provide your username and password directly when creating the client:
```
harmony_client = Client(auth=('captainmarvel', 'marve10u5'))
```

2. Set your credentials using environment variables:

```
$ export EDL_USERNAME='captainmarvel'
$ export EDL_PASSWORD='marve10u5'
```

3. Use a .netrc file:

Create a .netrc file in your home directory, using the example below:

```
machine urs.earthdata.nasa.gov
login captainmarvel
password marve10u5
```
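
To make the three options concrete, here is a minimal, stdlib-only sketch of how credentials could be looked up from each source in turn. The helper `resolve_edl_credentials` and its lookup order are hypothetical and are not taken from `harmony-py`; only the variable names `EDL_USERNAME` and `EDL_PASSWORD` and the host `urs.earthdata.nasa.gov` come from the options above.

```python
import netrc
import os
from typing import Mapping, Optional, Tuple

EDL_HOST = "urs.earthdata.nasa.gov"


def resolve_edl_credentials(
    auth: Optional[Tuple[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
    netrc_path: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Return (username, password) from the first source that provides both.

    Hypothetical helper: the lookup order is an assumption, not necessarily
    the order harmony-py uses internally.
    """
    # Option 1: credentials passed in directly.
    if auth is not None:
        return auth

    # Option 2: EDL_USERNAME / EDL_PASSWORD environment variables.
    env = os.environ if env is None else env
    username, password = env.get("EDL_USERNAME"), env.get("EDL_PASSWORD")
    if username and password:
        return (username, password)

    # Option 3: an entry for the Earthdata Login host in a .netrc file.
    path = netrc_path or os.path.join(os.path.expanduser("~"), ".netrc")
    if not os.path.exists(path):
        return None
    entry = netrc.netrc(path).authenticators(EDL_HOST)
    if entry is None:
        return None
    login, _account, secret = entry
    return (login, secret)
```

Calling `resolve_edl_credentials()` with no arguments checks the real environment and `~/.netrc`; passing `env` or `netrc_path` explicitly makes the lookup easy to try out without touching your actual credentials.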

In [None]:
# Client signature:
#   harmony.harmony.Client(*, auth: Optional[Tuple[str, str]] = None,
#                          should_validate_auth: bool = True,
#                          env: harmony.config.Environment = Environment.UAT)
# To target production instead of the default UAT environment:
#   from harmony.config import Environment
#   harmony_client = Client(env=Environment.PROD)  # assumes .netrc usage
harmony_client = Client() # assumes .netrc usage


### Create Harmony Request


Key parameters:

* collection: The CMR collection that should be queried
    * Short name or CMR collection ID
* spatial: Bounding box spatial constraints on the data
* temporal: Date/time constraints on the data

Other parameters that may be of interest (note that reformatting or advanced projection options may not be available for your requested dataset):

* crs: reproject the output coverage to the given CRS. Recognizes CRS types that can be inferred by GDAL, including EPSG codes, Proj4 strings, and OGC URLs (http://www.opengis.net/def/crs/…)
* interpolation: specify the interpolation method used during reprojection and scaling
* scale_extent: scale the resulting coverage along one axis to a given extent
* scale_size: scale the resulting coverage along one axis to a given size
* granule_id: The CMR Granule ID for the granule which should be retrieved
* width: number of columns to return in the output coverage
* height: number of rows to return in the output coverage
* format: the output mime type to return
* max_results: limits the number of input granules processed in the request

In [None]:
request = Request(
#    collection=Collection(id='C1234088182-EEDTEST'),
    collection=Collection(id='harmony_example'),
#    collection=Collection(id='SENTINEL-1_INTERFEROGRAMS'),
    spatial=BBox(-165, 52, -140, 77),
    temporal={
        'start': dt.datetime(2010, 1, 1),
        'stop': dt.datetime(2020, 12, 30)
    },
    variables=['blue_var'],
    max_results=1,
)

### Check Request validity

In [None]:
print(f"Request valid? {request.is_valid()}")
for m in request.error_messages():
    print(" * " + m)
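
For intuition about the kind of problems a validity check can catch, here is a small stdlib-only pre-check of a bounding box and temporal range. It is an independent sketch: the function `precheck_request` is hypothetical and does not reproduce whatever rules `request.is_valid()` applies internally.

```python
import datetime as dt
from typing import Dict, List, Tuple


def precheck_request(
    bbox: Tuple[float, float, float, float],
    temporal: Dict[str, dt.datetime],
) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    w, s, e, n = bbox  # same west, south, east, north order as BBox(w, s, e, n)
    problems = []
    if not (-180 <= w <= 180 and -180 <= e <= 180):
        problems.append("Longitudes must be within [-180, 180].")
    if not (-90 <= s <= 90 and -90 <= n <= 90):
        problems.append("Latitudes must be within [-90, 90].")
    if s > n:
        problems.append("South latitude must not exceed north latitude.")
    # West may legitimately exceed east when a box crosses the antimeridian,
    # so that ordering is deliberately not checked here.
    start, stop = temporal.get("start"), temporal.get("stop")
    if start is not None and stop is not None and start >= stop:
        problems.append("Temporal 'start' must be earlier than 'stop'.")
    return problems


messages = precheck_request(
    (-165, 52, -140, 77),
    {"start": dt.datetime(2010, 1, 1), "stop": dt.datetime(2020, 12, 30)},
)
print(f"Pre-check passed? {not messages}")
for m in messages:
    print(" * " + m)
```

Returning a list of messages, rather than raising on the first problem, mirrors the pattern used in the cell above, where every error message is printed together.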

### Submit Request

In [None]:
job_id = harmony_client.submit(request)
job_id

### Check Request status

* We can check on the progress of a processing job with `status()`. The `progress` field of the response shows the percentage complete.
* This method blocks while communicating with the server but returns quickly.
* The `JSON` helper from IPython displays the response in a more readable format.

In [None]:
JSON(harmony_client.status(job_id))

* `wait_for_processing()` waits until the job has finished processing.
* Optionally shows a progress bar.
* Blocking.
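
The relationship between a quick status check and a blocking wait can be sketched as a simple polling loop. This is an illustration only, not the library's implementation: the function `wait_until_done`, the status names (`running`, `successful`, `failed`, `canceled`), and the shape of the status dictionary are all assumptions, and canned responses stand in for the server.

```python
import time
from typing import Callable, Dict, Iterable


def wait_until_done(
    get_status: Callable[[], Dict],
    interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    terminal_states: Iterable[str] = ("successful", "failed", "canceled"),
) -> Dict:
    """Call get_status() repeatedly until it reports a terminal state."""
    terminal = set(terminal_states)
    while True:
        status = get_status()  # one quick, blocking round trip
        print(f"{status['status']}: {status['progress']}%")
        if status["status"] in terminal:
            return status
        sleep(interval_seconds)  # pause before asking again


# Stand-in for a server: canned responses instead of real HTTP calls.
canned = iter([
    {"status": "running", "progress": 0},
    {"status": "running", "progress": 50},
    {"status": "successful", "progress": 100},
])
final = wait_until_done(lambda: next(canned), sleep=lambda seconds: None)
```

Passing the `sleep` function as a parameter keeps the loop easy to exercise without real delays; in ordinary use the default `time.sleep` would apply.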

In [None]:
harmony_client.wait_for_processing(job_id, show_progress=True)

### View Harmony job response and output URLs

* `result_json()` calls `wait_for_processing()` and returns the complete job JSON once processing is complete.
* Optionally shows a progress bar.
* Blocking.

In [None]:
data = harmony_client.result_json(job_id)
JSON(data)

### Retrieve Harmony output files

Harmony returns output data as HTTPS URLs and can optionally provide them as a STAC catalog with S3 URLs.

#### First, HTTPS URL inspection and retrieval:

* `result_urls()` calls `wait_for_processing()` and returns the job's data URLs once processing is complete.
* Optionally shows a progress bar.
* Blocking.


In [None]:
urls = harmony_client.result_urls(job_id, show_progress=True)
urls

* `download_all()` downloads all data URLs and returns immediately with a list of `concurrent.futures.Future` objects.
* Optionally shows a progress bar for processing only.
* Non-blocking during download but blocking while waiting for job processing to finish.
* Call `result()` on a future to realize it. A call to `result()` blocks until that particular future finishes downloading. Other futures continue downloading in the background, in parallel, up to the number of workers assigned to the thread pool (the thread pool is not publicly accessible).
* Downloads for any unfinished futures can be cancelled early.
* When a download completes, its future returns the path of the downloaded file as a string. This path can then be passed to other libraries that read the data files and perform further operations.
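
The futures-based pattern described above can be tried out with the standard library alone. In this sketch, `fake_download` and `download_all_sketch` are hypothetical stand-ins that write small local files instead of fetching anything, and the `example.com` URLs are placeholders; the sketch shows the general `concurrent.futures` behavior, not how `harmony-py` is implemented.

```python
import concurrent.futures
import os
import tempfile
from typing import List


def fake_download(url: str, directory: str) -> str:
    """Stand-in for a real download: writes a small file and returns its path."""
    path = os.path.join(directory, url.rsplit("/", 1)[-1])
    with open(path, "w") as f:
        f.write(f"contents of {url}\n")
    return path


def download_all_sketch(
    urls: List[str],
    directory: str,
    executor: concurrent.futures.Executor,
) -> List[concurrent.futures.Future]:
    """Queue every download and return immediately with one future per URL."""
    return [executor.submit(fake_download, url, directory) for url in urls]


urls = [f"https://example.com/data/file_{i}.tif" for i in range(4)]
with tempfile.TemporaryDirectory() as tmp_dir:
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = download_all_sketch(urls, tmp_dir, pool)
        # result() blocks only until that particular future has finished.
        file_names = [f.result() for f in futures]
    all_exist = all(os.path.exists(name) for name in file_names)

print(file_names)
```

Because the results are collected in the order the futures were created, `file_names` lines up with `urls` even if the downloads finish out of order. Note that `Future.cancel()` only succeeds for work that has not yet started running.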

In [None]:
futures = harmony_client.download_all(job_id)
file_names = [f.result() for f in futures]
file_names

* `download()` downloads only the URL specified, for cases where you want more control over individual files.
* Returns a future containing the path of the downloaded file as a string.
* Blocks when `result()` is called.

In [None]:
file_name = (harmony_client.download(urls[0], overwrite=True)).result()
file_name

#### Now, STAC inspection and S3 retrieval:

As a user in the cloud, I want to access those data seamlessly using the AWS credentials provided from the Harmony job response

Coming soon: As a user, I want the Harmony library to provide me with the appropriate output URL depending on whether I'm working within or outside of AWS us-west-2

In [None]:
stac_catalog_url = harmony_client.stac_catalog_url(job_id, show_progress=True)
stac_catalog_url

Following the directions for PySTAC (https://pystac.readthedocs.io/en/latest/quickstart.html), we can hook our harmony-py client into STAC_IO.

In [None]:
from urllib.parse import urlparse
import requests
from pystac import STAC_IO

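def requests_read_method(uri):
    # Route Harmony-hosted URLs through the authenticated harmony-py client;
    # everything else (including local paths, which have no hostname) falls
    # back to PySTAC's default reader.
    parsed = urlparse(uri)
    if parsed.hostname and parsed.hostname.startswith('harmony.'):
        return harmony_client.read_text(uri)
    else:
        return STAC_IO.default_read_text_method(uri)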

STAC_IO.read_text_method = requests_read_method

In [None]:
from pystac import Catalog

cat = Catalog.from_file(stac_catalog_url)

print(cat.title)
for item in cat.get_all_items():
    print(item.datetime, [asset.href for asset in item.assets.values()])
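
To see roughly what a STAC client walks through when listing items, the sketch below resolves the `item` links of a catalog document by hand using only the standard library. The catalog JSON, its URL, and the helper `item_urls` are hypothetical and written for illustration; they are not actual Harmony output.

```python
import json
from typing import Dict, List
from urllib.parse import urljoin


def item_urls(catalog: Dict, catalog_url: str) -> List[str]:
    """Resolve every link with rel == 'item' against the catalog's own URL."""
    return [
        urljoin(catalog_url, link["href"])
        for link in catalog.get("links", [])
        if link.get("rel") == "item"
    ]


# Hypothetical catalog document, for illustration only.
catalog_json = """
{
  "stac_version": "1.0.0",
  "id": "example-job",
  "description": "Example output catalog",
  "links": [
    {"rel": "root", "href": "./"},
    {"rel": "item", "href": "./0"},
    {"rel": "item", "href": "./1"}
  ]
}
"""
resolved = item_urls(json.loads(catalog_json), "https://example.com/stac/example-job/")
print(resolved)
```

Each resolved URL would point to an item document whose `assets` hold the data file links, which is what the loop over `item.assets` in the cell above prints.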

#### Alternatively, we can use intake-stac:

In [None]:
!{sys.executable} -m pip install intake-stac # if you don't already have intake-stac
import intake

In [None]:
# stac_cat = intake.open_stac_catalog(stac_root.format(jobID=job,item=''),name='Harmony output')
cat = intake.open_stac_catalog(stac_catalog_url)
display(list(cat))

In [None]:
entries = []
for entry_id, entry in cat.search('type').items():  # avoid shadowing the built-in id()
    display(entry)
    entries.append(entry)

In [None]:
da = cat[list(cat)[0]][entries[0].describe()['name']].to_dask()
da

#### AWS credential retrieval

In [None]:
creds = harmony_client.aws_credentials()
creds

In [None]:
#
# NOTE: Execution of this cell will only succeed within the AWS us-west-2 region. 
#

import boto3

s3 = boto3.client('s3', **creds)
uri = 's3://harmony-uat-staging/public/harmony/netcdf-to-zarr/817a3e99-d53c-4169-b9f1-82cc947793be/2020_01_01_7f00ff_global.zarr'
bucket_name = uri.split('/')[2]
object_name = '/'.join(uri.split('/')[3:])
file_name = uri.split('/')[-1]

with open(file_name, 'wb') as f:
    # should return a 403 Forbidden if run outside of us-west-2
    s3.download_fileobj(bucket_name, object_name, f)
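
The bucket and object key in the cell above are extracted by splitting the URI on `/`. An alternative sketch using `urllib.parse` is shown here; the helper `split_s3_uri` is hypothetical, the URI is a made-up placeholder, and nothing is fetched from S3.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str, str]:
    """Split 's3://bucket/key/parts' into (bucket, key, final path segment)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    key = parsed.path.lstrip("/")
    file_name = key.rstrip("/").rsplit("/", 1)[-1]
    return parsed.netloc, key, file_name


bucket_name, object_name, file_name = split_s3_uri(
    "s3://example-bucket/public/example/job-id/output_file.nc"
)
print(bucket_name, object_name, file_name)
```

Checking the scheme up front means that an HTTPS URL passed in by mistake fails with a clear error instead of producing a wrong bucket name.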



#### Test examples

In [None]:
harmony_client = Client()
request = Request(
    collection=Collection(id='C1234088182-EEDTEST'),
#    collection=Collection(id='SENTINEL-1_INTERFEROGRAMS'),
    spatial=BBox(-165, 52, -140, 77),
    temporal={
        'start': dt.datetime(2010, 1, 1),
        'stop': dt.datetime(2020, 12, 30)
    },
    variables=['blue_var'],
    max_results=1,
)
job_id = harmony_client.submit(request)
results = harmony_client.download_all(job_id, directory='/tmp', overwrite=True)
for r in results:
    rasterio.plot.show(rasterio.open(r.result()))