## Harmony Py Library

### Basic Workflow Example

This notebook shows three basic examples of Harmony jobs, each using a Harmony test Collection. The first example requests a spatial subset of Alaska, the second a temporal subset (a single-month timespan), and the third shows a combination of both spatial and temporal subsetting.

First, we import a helper module for the notebook, then import the Harmony Py classes we need to make a request.

In [None]:
# Install notebook requirements
import sys
import helper
# Install the project and 'examples' dependencies
helper.install_project_and_dependencies('..', libs=['examples'])

In [None]:
import datetime as dt
from harmony import BBox, Client, Collection, Request, Environment

First, let's prompt for your CMR credentials (UAT). The values are stored as you type, so there is no need to press Enter in either field.

In [None]:
username = helper.Text(placeholder='captainmarvel', description='Username')
helper.display(username)
password = helper.Password(placeholder='Password', description='Password')
helper.display(password)

Now we create a Harmony Client object, passing in the `auth` tuple containing the username and password entered above.

In [None]:
harmony_client = Client(auth=(username.value, password.value), env=Environment.UAT)

Next, we create a Collection object with the CMR collection id for our test collection. We then create a Request which specifies the collection, and a `spatial` `BBox` describing the bounding box for the area we're interested in. We'll see later in the notebook how to make sure the request we have is valid.

In [None]:
collection = Collection(id='C1234088182-EEDTEST')

request = Request(
    collection=collection,
    spatial=BBox(-165, 52, -140, 77),
    format='image/tiff'
)

Now that we have a request, we can submit it to Harmony using the Harmony Client object we created earlier. We'll get back a job id belonging to our Harmony request.

In [None]:
job1_id = harmony_client.submit(request)

If we want to, we can retrieve the job's status which includes information about the processing Harmony job.

In [None]:
helper.JSON(harmony_client.status(job1_id))

There are a number of options available for downloading results. We'll start with the `download_all()` method, which uses a multithreaded downloader and returns quickly with one "future" per file (specifically, a Python `concurrent.futures` future).

If you're unfamiliar with futures, at their most basic level they represent an eventual value. In our case, once a file is downloaded its future will contain the name of the local file. We can then hand the name off to other functions which open files based on their name to perform further operations. Work performed on behalf of each future takes place in a "thread pool" created for each Client instantiation.

To extract the eventual value of a future, call its `result()` method. By using futures we can process downloaded files as soon as they're ready while the rest of the files are still downloading in the background. Because we iterate over the futures in the order they were returned, the order of our results is maintained even though the files will likely finish downloading out of order.
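
The ordering behavior described above can be seen with nothing but the standard library. The following is a minimal sketch, not Harmony Py's implementation: `fake_download` is a hypothetical stand-in for a real download, and the delays are chosen so that the first file finishes last.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, delay):
    """Hypothetical stand-in for a download: wait, then return a filename."""
    time.sleep(delay)
    return f'{name}.tif'


# (name, delay) pairs; the first file is deliberately the slowest.
jobs = [('a', 0.06), ('b', 0.02), ('c', 0.04)]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fake_download, name, delay) for name, delay in jobs]
    # result() blocks until that particular future is done, so iterating in
    # submission order yields results in submission order.
    filenames = [f.result() for f in futures]

print(filenames)
```

The files complete in the order `b`, `c`, `a`, but the list comes back in the order the work was submitted.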

In [None]:
print(f'\nHarmony job ID: {job1_id}')

print('\nWaiting for the job to finish')
results = harmony_client.result_json(job1_id, show_progress=True)

print('\nDownloading results:')
futures = harmony_client.download_all(job1_id)

for f in futures:
    print(f.result())  # f.result() is a filename, in this case

print('\nDone downloading.')

Now, using our helper module, we can view the files. Note that we're calling `download_all()` again here. Because the `overwrite` option is set to False (the default value), the method will see that each of the files has already been downloaded and will not download it again. It returns quickly because it avoids the unnecessary work.

In [None]:
futures = harmony_client.download_all(job1_id, overwrite=False)
filenames = [f.result() for f in futures]

for filename in filenames:
    helper.show_result(filename)
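
The skip-if-present behavior can be sketched in a few lines. This is an illustration only, under the assumption that the check is a simple "does the file exist" test; `fetch_if_missing` and its `fetch` callback are hypothetical names, not part of Harmony Py.

```python
import os
import tempfile


def fetch_if_missing(path, fetch, overwrite=False):
    """Write fetch() to path unless the file exists and overwrite is False.

    Returns True if the file was written, False if the work was skipped.
    Hypothetical helper, not the Harmony Py API.
    """
    if os.path.exists(path) and not overwrite:
        return False
    with open(path, 'wb') as f:
        f.write(fetch())
    return True


def fake_fetch():
    """Stand-in for the network transfer."""
    return b'fake bytes'


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'result.tif')
    first = fetch_if_missing(target, fake_fetch)                   # file is new
    second = fetch_if_missing(target, fake_fetch)                  # skipped
    forced = fetch_if_missing(target, fake_fetch, overwrite=True)  # rewritten

print(first, second, forced)
```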

Now we show a Harmony request for a temporal range: one month in 2020. As before, we create a Request, and submit it with the same Harmony Client we used above.

In [None]:
request = Request(
    collection=collection,
    temporal={
        'start': dt.datetime(2020, 6, 1),
        'stop': dt.datetime(2020, 6, 30)
    },
    format='image/tiff'
)

job2_id = harmony_client.submit(request)

With our second request, we've chosen to call `wait_for_processing()`. This is optional, because the other results-oriented methods (such as downloading) implicitly wait for processing to finish. What this method adds is visual feedback that lets us know whether Harmony is still working on our submitted job.

In [None]:
harmony_client.wait_for_processing(job2_id, show_progress=True)

for filename in [f.result() for f in harmony_client.download_all(job2_id)]:
    helper.show_result(filename)
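
Conceptually, waiting for a job means polling its status until it reaches a finished state. The sketch below shows that loop in isolation. Everything in it is an assumption made for illustration: the `get_status` callback, the `'status'` and `'progress'` keys, and the set of terminal state names are hypothetical and may not match what Harmony actually returns.

```python
import time


def wait_until_done(get_status, interval=0.01, max_polls=100):
    """Poll get_status() until it reports a terminal state, then return it.

    get_status is a hypothetical callable returning a dict with 'status'
    and 'progress' keys; the terminal state names are assumed.
    """
    terminal = {'successful', 'failed', 'canceled'}
    for _ in range(max_polls):
        status = get_status()
        print(f"progress: {status['progress']}%")
        if status['status'] in terminal:
            return status
        time.sleep(interval)
    raise TimeoutError('job did not finish within the polling budget')


# Canned responses standing in for successive status requests.
fake_responses = iter([
    {'status': 'running', 'progress': 0},
    {'status': 'running', 'progress': 50},
    {'status': 'successful', 'progress': 100},
])

final = wait_until_done(lambda: next(fake_responses))
print(final['status'])
```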

Finally, we show a Harmony request for both a spatial and a temporal range. We create the Request, specifying both `spatial` bounds and a `temporal` range, and submit it with the Harmony Client.

In [None]:
request = Request(
    collection=collection,
    spatial=BBox(-165, 52, -140, 77),
    temporal={
        'start': dt.datetime(2010, 1, 1),
        'stop': dt.datetime(2020, 12, 30)
    },
    format='image/tiff'
)

job3_id = harmony_client.submit(request)

In [None]:
for filename in [f.result() for f in harmony_client.download_all(job3_id)]:
    helper.show_result(filename)

If we're just interested in the JSON that Harmony produces, we can retrieve that too.

In [None]:
helper.JSON(harmony_client.result_json(job3_id))

Now that we know how to make a request, let's investigate how the Harmony Py library can help us make sure we have a valid request. Recall that we used the Harmony `BBox` type to provide a spatial constraint in our request. If we investigate its help text, we see that we create a `BBox` by providing the western, southern, eastern, and northern latitude/longitude bounds for a bounding box.

In [None]:
help(BBox)

Now let's create an invalid bounding box by specifying a longitude less than -180 and a northern latitude less than its southern bounds:

In [None]:
collection = Collection(id='C1234088182-EEDTEST')

request = Request(
    collection=collection,
    spatial=BBox(-183, 40, 10, 30),
    format='image/tiff'
)

print(f'Request valid? {request.is_valid()}')

for m in request.error_messages():
    print(f' * {m}')
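
To make the rules concrete, here is a rough sketch of the kind of checks a bounding-box validator performs. It is not Harmony Py's code, and the messages it produces are not the library's: `bbox_errors` is a hypothetical function. It assumes that west greater than east is allowed, since a box may cross the antimeridian.

```python
def bbox_errors(west, south, east, north):
    """Return a list of problems with a (west, south, east, north) box.

    Hypothetical sketch; the real library's checks and wording may differ.
    """
    errors = []
    if not -180 <= west <= 180:
        errors.append('western longitude must be between -180 and 180')
    if not -180 <= east <= 180:
        errors.append('eastern longitude must be between -180 and 180')
    if not -90 <= south <= 90:
        errors.append('southern latitude must be between -90 and 90')
    if not -90 <= north <= 90:
        errors.append('northern latitude must be between -90 and 90')
    if south > north:
        errors.append('southern latitude must not exceed northern latitude')
    return errors


bad = bbox_errors(-183, 40, 10, 30)    # the invalid box used above
good = bbox_errors(-165, 52, -140, 77)  # the Alaska box used earlier

for message in bad:
    print(f' * {message}')
print(f'Alaska box errors: {good}')
```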

Similarly, we can see errors in the temporal parameter:

In [None]:
collection = Collection(id='C1234088182-EEDTEST')

request = Request(
    collection=collection,
    temporal={
        'start': dt.datetime(2020, 12, 30),
        'stop': dt.datetime(2010, 1, 1)
    },
    format='image/tiff'
)

print(f'Request valid? {request.is_valid()}')

for m in request.error_messages():
    print(f' * {m}')
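
The temporal check is simpler still. A minimal sketch, assuming the only rules are that at least one bound is present and that `start` precedes `stop` when both are given; `temporal_errors` is a hypothetical name and the messages are illustrative.

```python
import datetime as dt


def temporal_errors(temporal):
    """Return a list of problems with a {'start': ..., 'stop': ...} range.

    Hypothetical sketch; either bound may be omitted, but not both.
    """
    start = temporal.get('start')
    stop = temporal.get('stop')
    errors = []
    if start is None and stop is None:
        errors.append("temporal range needs a 'start' or a 'stop'")
    if start is not None and stop is not None and start >= stop:
        errors.append("'start' must be earlier than 'stop'")
    return errors


reversed_range = temporal_errors({
    'start': dt.datetime(2020, 12, 30),
    'stop': dt.datetime(2010, 1, 1),
})
open_ended = temporal_errors({'start': dt.datetime(2020, 6, 1)})

print(reversed_range)
print(open_ended)
```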

So before submitting a Harmony Request, you can check whether it's valid and, if it isn't, see what needs to be fixed:

In [None]:
collection = Collection(id='C1234088182-EEDTEST')

request = Request(
    collection=collection,
    spatial=BBox(-183, 40, 10, 30),
    temporal={
        'start': dt.datetime(2020, 12, 30),
        'stop': dt.datetime(2010, 1, 1)
    },
    format='image/tiff'
)

print(f'Request valid? {request.is_valid()}')

for m in request.error_messages():
    print(f' * {m}')

If we don't validate the request first, Harmony Py will validate it automatically and raise an exception with a message indicating the errors that need to be fixed:

In [None]:
try:
    harmony_client.submit(request)
except Exception as e:
    print('Harmony Py raised an exception:\n')
    print(e)

Now let's look at some of the other parameters that you can use when submitting a Harmony request.

First, let's look at a couple of ways to limit how many granules of data we process. When creating the Request, you can add the `max_results` argument. This is useful if we eventually want to run a bigger request, but we're experimenting and would like to get some sample results first:

In [None]:
collection = Collection(id='C1234088182-EEDTEST')

request = Request(
    collection=collection,
    spatial=BBox(-10, 0, 10, 10),
    temporal={
        'start': dt.datetime(2021, 1, 1),
        'stop': dt.datetime(2021, 1, 10)
    },
    max_results=2,
    format='image/tiff'
)
request.is_valid()

Or maybe you'd like to operate on some specific granules. In that case, the `granule_id` argument lets you list the granule IDs (one or more) to operate on. Let's try this in combination with another parameter: `crs`, the coordinate reference system we'd like to reproject our results into. We also show options that specify the output format and the height and width of the resulting image.

In [None]:
collection = Collection(id='C1234088182-EEDTEST')

request = Request(
    collection=collection,
    spatial=BBox(-140, 20, -50, 60),
    granule_id=['G1234088196-EEDTEST'],
    crs='EPSG:3995',
    format='image/tiff',
    height=400,
    width=900
)
request.is_valid()

In [None]:
job_id = harmony_client.submit(request)

for filename in [f.result() for f in harmony_client.download_all(job_id)]:
    helper.show_result(filename)

Now we'll craft the same request, but this time instead of getting all the variables in the granule--the default--we'll select just the red, green, and blue variables.

In [None]:
collection = Collection(id='C1234088182-EEDTEST')

request = Request(
    collection=collection,
    spatial=BBox(-140, 20, -50, 60),
    granule_id=['G1234088196-EEDTEST'],
    crs='EPSG:3995',
    format='image/tiff',
    height=400,
    width=900,
    variables=['red_var', 'green_var', 'blue_var']
)
request.is_valid()

In [None]:
job_id = harmony_client.submit(request)

for filename in [f.result() for f in harmony_client.download_all(job_id)]:
    helper.show_result(filename)

We can also use the `granule_name` parameter to select one or more granules. This corresponds to the CMR `readable_granule_name` parameter and matches either the granule UR or the producer granule ID.

In [None]:
collection = Collection(id='C1233800302-EEDTEST')

request = Request(
    collection=collection,
    spatial=BBox(-140, 20, -50, 60),
    granule_name=['001_00_7f00ff_global'],
    crs='EPSG:3995',
    format='image/tiff',
    height=400,
    width=900,
    variables=['red_var', 'green_var', 'blue_var']
)
request.is_valid()

In [None]:
job_id = harmony_client.submit(request)

for filename in [f.result() for f in harmony_client.download_all(job_id)]:
    helper.show_result(filename)

We can pass multiple values to `granule_name`, or use the wildcards `*` (matches any number of characters) and `?` (matches a single character).

In [None]:
collection = Collection(id='C1233800302-EEDTEST')

request = Request(
    collection=collection,
    spatial=BBox(-180, -90, 180, 90),
    granule_name=['001_08*', '001_05_7f00ff_?ustralia'],
    crs='EPSG:3995',
    format='image/tiff',
    height=400,
    width=900,
    variables=['red_var', 'green_var', 'blue_var']
)
request.is_valid()

In [None]:
job_id = harmony_client.submit(request)

for filename in [f.result() for f in harmony_client.download_all(job_id)]:
    helper.show_result(filename)
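
To get a feel for what the two wildcards select, the patterns can be tried locally with Python's `fnmatch` module. This is only an approximation: the real matching is done by CMR on the server and may differ in details such as case sensitivity, and `fnmatch` also gives special meaning to `[...]`. Apart from `001_00_7f00ff_global`, the granule names below are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical granule names, except the last, which appears earlier.
names = [
    '001_08_7f00ff_africa',
    '001_05_7f00ff_australia',
    '001_05_7f00ff_Australia',
    '001_05_7f00ff_south_australia',
    '001_00_7f00ff_global',
]

patterns = ['001_08*', '001_05_7f00ff_?ustralia']

# A name is selected if it matches any one of the patterns.
matched = [n for n in names if any(fnmatchcase(n, p) for p in patterns)]

for name in matched:
    print(name)
```

Here `*` picks up everything beginning with `001_08`, while `?` stands for exactly one character, so both capitalizations of "australia" match but `south_australia` does not.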