# Harmony Zarr Reformatter

Harmony provides a service that can reformat NetCDF files as [Zarr](https://zarr.readthedocs.io/en/stable/) files accessible via the S3 API. Reading the results requires S3 credentials for the Harmony AWS account, so the service is not yet widely usable.

This notebook is a continuation of the [Harmony API Introduction](./Harmony%20Api%20Introduction.ipynb) and assumes familiarity with Harmony concepts, asynchronous processing in particular.

## Set Up AWS

The Zarr links that Harmony returns later in this notebook can only be read with AWS credentials for the Harmony account. Obtain the credentials and make sure your default AWS profile uses them. One way to do this is to edit `~/.aws/credentials` so that it contains the following section:
```
[default]
aws_access_key_id = YOUR_HARMONY_ACCESS_KEY_ID
aws_secret_access_key = YOUR_HARMONY_SECRET_ACCESS_KEY
```
Restart your Jupyter kernel after completing this step.
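
Before restarting, you may want to confirm that the file is in place. The following is a minimal sketch using Python's `configparser`. The helper name `has_default_credentials` is hypothetical, and it only checks that both keys are present and non-empty, not that AWS accepts them.

```python
import configparser
import os


def has_default_credentials(path=os.path.expanduser('~/.aws/credentials')):
    """Return True if `path` has a [default] section with both AWS keys set.

    Hypothetical helper for illustration only; it never contacts AWS.
    """
    # Disable interpolation so a '%' in a key value is read literally
    parser = configparser.ConfigParser(interpolation=None)
    # ConfigParser.read() silently skips files that do not exist
    parser.read(path)
    if not parser.has_section('default'):
        return False
    section = parser['default']
    required = ('aws_access_key_id', 'aws_secret_access_key')
    return all(section.get(key, '').strip() for key in required)
```

Calling `has_default_credentials()` with no argument checks the default location shown above.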

## Set up imports and Earthdata Login

As with the prior notebook, we need to set up general-purpose imports and authentication.

In [None]:
# Install prerequisite packages
import sys
!{sys.executable} -m pip install rasterio GDAL matplotlib s3fs zarr

In [None]:
from urllib import request, parse
from http.cookiejar import CookieJar
import getpass
import netrc
import os
import requests
import json
import pprint
from osgeo import gdal
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import rasterio
from rasterio.plot import show
import numpy as np
import time
%matplotlib inline

In [None]:
def setup_earthdata_login_auth(endpoint):
    """
    Set up the request library so that it authenticates against the given Earthdata Login
    endpoint and is able to track cookies between requests.  This looks in the .netrc file 
    first and if no credentials are found, it prompts for them.

    Valid endpoints include:
        uat.urs.earthdata.nasa.gov - Earthdata Login UAT (Harmony's current default)
        urs.earthdata.nasa.gov - Earthdata Login production
    """
    try:
        username, _, password = netrc.netrc().authenticators(endpoint)
    except (FileNotFoundError, TypeError):
        # FileNotFound = There's no .netrc file
        # TypeError = The endpoint isn't in the netrc file, causing the above to try unpacking None
        print('Please provide your Earthdata Login credentials to allow data access')
        print('Your credentials will only be passed to %s and will not be exposed in Jupyter' % (endpoint))
        username = input('Username:')
        password = getpass.getpass()

    manager = request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, endpoint, username, password)
    auth = request.HTTPBasicAuthHandler(manager)

    jar = CookieJar()
    processor = request.HTTPCookieProcessor(jar)
    opener = request.build_opener(auth, processor)
    request.install_opener(opener)

In [None]:
setup_earthdata_login_auth('uat.urs.earthdata.nasa.gov')

## Access files as Zarr

All Zarr reformatting requests produce asynchronous results that point to S3 locations in the Harmony account.

To request Zarr, pass `format=application/x-zarr` as a parameter to the coverages service. The following fetches data from a test collection as Zarr.

In [None]:
harmony_root = 'https://harmony.uat.earthdata.nasa.gov'
harmony_collection_id='C1233860183-EEDTEST'
asyncConfig = {
    'collection_id': harmony_collection_id,
    'ogc-api-coverages_version': '1.0.0',
    'variable': 'all',
    'format': 'application/x-zarr',
    'granuleId': 'G1233860471-EEDTEST' # CMR ID for a single example file
}

async_url = harmony_root+'/{collection_id}/ogc-api-coverages/{ogc-api-coverages_version}/collections/{variable}/coverage/rangeset?granuleId={granuleId}&format={format}'.format(**asyncConfig)
print('Request URL', async_url)
async_response = request.urlopen(async_url)
async_results = async_response.read()
async_json = json.loads(async_results)
pprint.pprint(async_json)
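
The request URL above is assembled with plain string formatting. As an alternative sketch, the query string can be built with `urllib.parse.urlencode`, which percent-encodes values such as the `/` in `application/x-zarr`. The function name `build_coverage_url` and its defaults are assumptions made for illustration; the path layout is copied from the cell above.

```python
from urllib import parse


def build_coverage_url(root, collection_id, granule_id,
                       variable='all', version='1.0.0',
                       output_format='application/x-zarr'):
    """Build a coverages rangeset URL with a percent-encoded query string.

    Hypothetical helper; the path mirrors the request made in this notebook.
    """
    path = '/{}/ogc-api-coverages/{}/collections/{}/coverage/rangeset'.format(
        collection_id, version, parse.quote(variable, safe=''))
    # urlencode escapes reserved characters in each value
    query = parse.urlencode({'granuleId': granule_id, 'format': output_format})
    return root + path + '?' + query


print(build_coverage_url('https://harmony.uat.earthdata.nasa.gov',
                         'C1233860183-EEDTEST', 'G1233860471-EEDTEST'))
```

The only difference from the hand-built URL is that the format value is sent as `application%2Fx-zarr`, which a server decodes back to the same string.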

Wait for the job to finish using the loop we demonstrated in the API Introduction notebook.

In [None]:
job_url = harmony_root + '/jobs/' + async_json['jobID']

# Keep polling while the request is still processing
while True:
    loop_response = request.urlopen(job_url)
    loop_results = loop_response.read()
    job_json = json.loads(loop_results)
    if job_json['status'] != 'running':
        break
    print('Job status is running. Progress is {}%. Trying again.'.format(job_json['progress']))
    time.sleep(5)

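links = []
if job_json['status'] == 'successful' and job_json['progress'] == 100:
    print('Job progress is 100%. Output links printed below:')
    # Keep only data links; links without a 'rel' are treated as data
    links = [link['href'] for link in job_json['links'] if link.get('rel', 'data') == 'data']
    print('\n'.join(links))
else:
    # Without this branch a failed job leaves `links` empty and later cells fail on links[0]
    print('Job did not complete successfully. Status:', job_json['status'])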
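
The loop above polls until the job stops running, however long that takes. A minimal sketch of the same idea with a timeout is shown below. The names `wait_for_job` and `data_links` are hypothetical, and `fetch_job` stands for any zero-argument callable that returns the parsed job JSON as a dict.

```python
import time


def wait_for_job(fetch_job, poll_seconds=5, timeout_seconds=600, sleep=time.sleep):
    """Poll `fetch_job()` until the job status is no longer 'running'.

    Raises TimeoutError if the job is still running once roughly
    `timeout_seconds` have elapsed. Hypothetical helper for illustration.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job()
        if job['status'] != 'running':
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError('Job still running after %s seconds' % timeout_seconds)
        sleep(poll_seconds)


def data_links(job):
    """Return hrefs of links whose 'rel' is 'data' or missing, as the loop above does."""
    return [link['href'] for link in job.get('links', [])
            if link.get('rel', 'data') == 'data']
```

In this notebook, `fetch_job` could be `lambda: json.loads(request.urlopen(job_url).read())`, with `links = data_links(job)` applied to the returned job once its status is `'successful'`.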

## Open and explore the Zarr file

In [None]:
import s3fs
import zarr

# older versions of s3fs
# fs = s3fs.S3FileSystem(region_name='us-west-2')

# import botocore
# client_session = botocore.session.Session(profile='NON-DEFAULT-PROFILE')
# fs = s3fs.S3FileSystem(session=client_session, client_kwargs={'region_name':'us-west-2'})

fs = s3fs.S3FileSystem(client_kwargs={'region_name':'us-west-2'})

store = fs.get_mapper(root=links[0], check=False)
zarr_file = zarr.open(store)
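
The `get_mapper` call above takes `links[0]` as its root, so an empty list or an unexpected link type fails at that point. Assuming the data links use the `s3://` scheme, a small check like the sketch below can make such failures easier to diagnose. The helper name `split_s3_url` and the example bucket and key are made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// URL into (bucket, key); raise ValueError for anything else.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('Not an s3:// URL: %r' % url)
    # The bucket is the network location; the key is the path without its leading slash
    return parsed.netloc, parsed.path.lstrip('/')


bucket, key = split_s3_url('s3://example-bucket/path/to/output.zarr')
print(bucket, key)
```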

Explore the contents of the Zarr file.

In [None]:
print(zarr_file.tree())

In [None]:
plt.imshow(zarr_file['green_var'][0], cmap='Greens');