# Harmony Service Chaining: PI 20.4 Demo

In PI 20.4, Harmony added service chaining capabilities to support requests that require functionality beyond that of a single service.
This notebook provides a basic workflow to demonstrate service chaining. For a more general introduction and tutorial, see [Harmony API Introduction](./Harmony%20Api%20Introduction.ipynb). Helpers for making the calls in this notebook are in the [docs/notebook-helpers](./notebook-helpers) folder.

## Prerequisites

1. Install Python 3. This notebook is tested with Python 3.8 but should work with most recent 3.x versions.
2. Install Jupyter: `pip install jupyterlab`
3. Set up your `~/.netrc` for Earthdata Login as described in [Harmony API Introduction](./Harmony%20Api%20Introduction.ipynb).
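
Before running the notebook, it can help to confirm that the `.netrc` entry is readable. The following is a minimal sketch using Python's standard `netrc` module. The helper name `has_netrc_entry` and its `path` parameter are hypothetical and are not part of Harmony or the notebook helpers.

```python
import netrc


def has_netrc_entry(host, path=None):
    """Return True if the netrc file holds credentials for `host`.

    Hypothetical helper for illustration only. With path=None the
    standard library reads ~/.netrc.
    """
    try:
        entry = netrc.netrc(path).authenticators(host)
    except (FileNotFoundError, netrc.NetrcParseError):
        # No netrc file, or the file could not be parsed
        return False
    # authenticators() returns None when the host has no entry
    return entry is not None
```

Calling `has_netrc_entry('uat.urs.earthdata.nasa.gov')` checks the Earthdata Login UAT host used later in this notebook. If it returns `False`, the notebook prompts for credentials instead.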

## Set Up AWS

Once you have Zarr links, you can access them using AWS credentials for the Harmony account. Obtain the credentials and make sure your default AWS profile uses them. One way to do this is to edit `~/.aws/credentials` so that it contains the following section:
```
[default]
aws_access_key_id = YOUR_HARMONY_ACCESS_KEY_ID
aws_secret_access_key = YOUR_HARMONY_SECRET_ACCESS_KEY
```
Restart your Jupyter kernel after completing this step.
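
To catch a missing or incomplete `[default]` section early, a quick check with the standard library might look like the sketch below. The function name `default_profile_is_configured` is hypothetical. It only inspects the file and does not verify the credentials with AWS.

```python
import configparser
import os


def default_profile_is_configured(path=None):
    """Check that the [default] section contains both credential keys.

    Hypothetical helper for illustration; it does not contact AWS.
    """
    if path is None:
        path = os.path.expanduser("~/.aws/credentials")
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist
    parser.read(path)
    required = ("aws_access_key_id", "aws_secret_access_key")
    return parser.has_section("default") and all(
        parser.has_option("default", key) for key in required
    )
```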

## Set up imports and Earthdata Login

We need to set up general-purpose imports and authentication.

In [None]:
%load_ext autoreload
%autoreload
%matplotlib inline
import sys
# Install dependencies into the Jupyter Kernel
!{sys.executable} -m pip install -q -r notebook_helpers/requirements.txt
!{sys.executable} -m pip install rasterio boto3 s3fs zarr

# Import libraries used throughout the notebook
from urllib import request, parse
from http.cookiejar import CookieJar
import getpass
import netrc
import os
import requests
import json
import pprint
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import rasterio
from rasterio.plot import show
import numpy as np
import time
from notebook_helpers import get, post, show, get_data_urls, show_async, show_async_condensed, show_shape, print_async_status, check_bbox_subset, check_stac

In [None]:
def setup_earthdata_login_auth(endpoint):
    """
    Set up the request library so that it authenticates against the given Earthdata Login
    endpoint and is able to track cookies between requests.  This looks in the .netrc file 
    first and if no credentials are found, it prompts for them.

    Valid endpoints include:
        uat.urs.earthdata.nasa.gov - Earthdata Login UAT (Harmony's current default)
        urs.earthdata.nasa.gov - Earthdata Login production
    """
    try:
        username, _, password = netrc.netrc().authenticators(endpoint)
    except (FileNotFoundError, TypeError):
        # FileNotFound = There's no .netrc file
        # TypeError = The endpoint isn't in the netrc file, causing the above to try unpacking None
        print('Please provide your Earthdata Login credentials to allow data access')
        print('Your credentials will only be passed to %s and will not be exposed in Jupyter' % (endpoint))
        username = input('Username:')
        password = getpass.getpass()

    manager = request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, endpoint, username, password)
    auth = request.HTTPBasicAuthHandler(manager)

    jar = CookieJar()
    processor = request.HTTPCookieProcessor(jar)
    opener = request.build_opener(auth, processor)
    request.install_opener(opener)

In [None]:
setup_earthdata_login_auth('uat.urs.earthdata.nasa.gov')

## Chained Services - PODAAC L1 Subsetter -> Harmony NetCDF to Zarr

This request asks for variable subsetting of L1 data with output in the Zarr format. This requires chaining two services together, the PODAAC L1 Subsetter and the Harmony NetCDF to Zarr service.
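
The request is an ordinary OGC API Coverages URL with repeated `subset` parameters. As an alternative to hand-writing the percent-encoded query string, the sketch below builds the same kind of URL with `urllib.parse.urlencode`. The function name `build_rangeset_url` and its parameters are hypothetical; the path layout, the API version `1.0.0`, and the parameter names are taken from the request in this notebook.

```python
from urllib import parse


def build_rangeset_url(root, collection_id, variable, subsets, **params):
    """Build a coverages rangeset URL (hypothetical helper for illustration).

    `subsets` is a list such as ['lon(-160:-160)', 'lat(-80:80)'];
    each one becomes its own `subset` query parameter.
    """
    path = "/{}/ogc-api-coverages/1.0.0/collections/{}/coverage/rangeset".format(
        collection_id, parse.quote(variable, safe="")
    )
    # A list of pairs lets the same key ('subset') appear more than once
    query = [("subset", s) for s in subsets] + list(params.items())
    # Keep parentheses readable; ':' and '/' are still percent-encoded
    return root + path + "?" + parse.urlencode(query, safe="()")
```

For example, `build_rangeset_url('https://harmony.sit.earthdata.nasa.gov', 'C1234208438-POCLOUD', 'mean_sea_surface', ['lon(-160:-160)', 'lat(-80:80)'], maxResults=2, format='application/x-zarr')` produces a URL whose query decodes to the same parameters used in this notebook.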

In [None]:
harmony_collection_id = 'C1234208438-POCLOUD'
coverages_root = 'https://harmony.sit.earthdata.nasa.gov/{collection}/ogc-api-coverages/1.0.0/collections/{variable}/coverage/rangeset'


### Variable and spatial subsetting with output reformatted to Zarr

In [None]:
harmony_root = 'https://harmony.sit.earthdata.nasa.gov'
asyncConfig = {
    'collection_id': harmony_collection_id,
    'ogc-api-coverages_version': '1.0.0',
    'variable': 'mean_sea_surface',
    'maxResults': '2',
    'format': 'application/x-zarr'
}

async_url = harmony_root+'/{collection_id}/ogc-api-coverages/{ogc-api-coverages_version}/collections/{variable}/coverage/rangeset?subset=lon(-160%3A-160)&subset=lat(-80%3A80)&maxResults={maxResults}&format={format}'.format(**asyncConfig)
print('Request URL', async_url)
async_response = request.urlopen(async_url)
async_results = async_response.read()
async_json = json.loads(async_results)
pprint.pprint(async_json)


### Wait for results
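
The request runs asynchronously, so the job status has to be polled until it is no longer `running`. The sketch below shows one way to wrap that loop with a timeout so a stuck job does not block the notebook indefinitely. The names `wait_for_job` and `data_links`, and the `fetch_job` callable, are hypothetical; the `status`, `links`, `rel`, and `href` fields are the ones this notebook reads from the job response.

```python
import time


def wait_for_job(fetch_job, interval=5.0, timeout=600.0):
    """Poll `fetch_job()` until the job leaves the 'running' state.

    `fetch_job` is any zero-argument callable returning the job status
    as a dict with a 'status' key (hypothetical interface for this sketch).
    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job["status"] != "running":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job still running after {} seconds".format(timeout))
        time.sleep(interval)


def data_links(job):
    """Return the hrefs of links whose 'rel' is 'data' or missing."""
    return [
        link["href"]
        for link in job.get("links", [])
        if link.get("rel", "data") == "data"
    ]
```

With the names used in this notebook, the call could look like `wait_for_job(lambda: json.loads(request.urlopen(job_url).read()))`.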

In [None]:
job_url = harmony_root + '/jobs/' + async_json['jobID']

# Keep polling while the request is still processing
while True:
    loop_response = request.urlopen(job_url)
    loop_results = loop_response.read()
    job_json = json.loads(loop_results)
    if job_json['status'] != 'running':
        break
    print('Job status is running. Progress is {}%. Trying again.'.format(job_json['progress']))
    time.sleep(5)

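links = []
if job_json['status'] == 'successful' and job_json['progress'] == 100:
    print('Job progress is 100%. Output links printed below:')
    links = [link['href'] for link in job_json['links'] if link.get('rel', 'data') == 'data']
    print('\n'.join(links))
else:
    # The job stopped without succeeding, so there are no output links to open
    print('Job ended with status:', job_json['status'])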

### Open the Zarr file
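
Opening the output through `s3fs` only works if a result link exists and points at S3. The sketch below validates a link before it is passed on, assuming the Zarr links use the `s3://` scheme. The function name `split_s3_url` is hypothetical, and the bucket name in the usage note is made up.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// URL into (bucket, key); hypothetical helper.

    Raises ValueError for anything that is not an s3:// URL.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3:// URL, got {!r}".format(url))
    return parsed.netloc, parsed.path.lstrip("/")
```

For instance, `split_s3_url('s3://example-bucket/path/to/output.zarr')` separates the bucket from the object key, while an `https://` link raises `ValueError` before any AWS call is made.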

In [None]:
import s3fs
import zarr

# For older versions of s3fs:
# fs = s3fs.S3FileSystem(region_name='us-west-2')

# To use a non-default AWS profile:
# import botocore
# client_session = botocore.session.Session(profile='NON-DEFAULT-PROFILE')
# fs = s3fs.S3FileSystem(session=client_session, client_kwargs={'region_name':'us-west-2'})

# Default: use the credentials from the default AWS profile
fs = s3fs.S3FileSystem(client_kwargs={'region_name':'us-west-2'})

store = fs.get_mapper(root=links[0], check=False)
zarr_file = zarr.open(store)

### Explore the Zarr file

In [None]:
print(zarr_file.tree())

In [None]:
plt.plot(zarr_file['mean_sea_surface']);