# NetCDF to Zarr (N2Z) regression tests

This Jupyter notebook runs a suite of regression tests for the NetCDF-to-Zarr service (N2Z) against a sample collection: *GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V06* ([GPM_3IMERGHH](https://search.uat.earthdata.nasa.gov/search?q=C1245618475-EEDTEST)) at GES DISC. This collection was chosen for its granule size of roughly 10 MB.


## Set the Harmony environment:

The cell below sets the `harmony_host_url` to one of the following valid values:

* Production: <https://harmony.earthdata.nasa.gov>
* UAT: <https://harmony.uat.earthdata.nasa.gov>
* SIT: <https://harmony.sit.earthdata.nasa.gov>
* Local: <http://localhost:3000>

The default value is for the UAT environment. There are two ways to run this notebook against a non-default environment:

* Run this notebook in a local Jupyter notebook server and change the value of `harmony_host_url` in the cell below to the value for the environment you require from the above list.
* Use the `run_notebooks.sh` script, which requires you to declare an environment variable `HARMONY_HOST_URL`. Set that environment variable to the value above that corresponds to the environment you want to test. That environment variable will take precedence over the default value in the cell below.
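
The precedence rule in the second option can be sketched in plain Python. This is only an illustration: `resolve_harmony_host_url` and `DEFAULT_HARMONY_HOST_URL` are hypothetical names, and `run_notebooks.sh` may pass the value to the notebook by a different mechanism (for example, by injecting a parameter).

```python
import os

# Same default as the notebook cell: the UAT environment.
DEFAULT_HARMONY_HOST_URL = 'https://harmony.uat.earthdata.nasa.gov'


def resolve_harmony_host_url(environ=os.environ):
    """Return HARMONY_HOST_URL when it is set and non-empty, else the default."""
    # An unset or empty variable falls back to the default value.
    return environ.get('HARMONY_HOST_URL') or DEFAULT_HARMONY_HOST_URL


print(resolve_harmony_host_url())
```

Passing a mapping explicitly, as in `resolve_harmony_host_url({'HARMONY_HOST_URL': 'http://localhost:3000'})`, makes the rule easy to check without touching the real process environment.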

In [None]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'

### Import required packages:

In [None]:
from harmony import Collection, Environment, Client, Request
from utility import assert_result_has_correct_number_of_stores, print_success

### Set up environment-dependent variables:

This includes the Harmony `Client` object and `Collection` objects for each of the collections for which there are regression tests. The local, SIT and UAT Harmony instances all utilise resources from CMR UAT, meaning any non-production environment will use the same resources.

When adding a production entry to the dictionary below, the collection instances can be included directly in the production dictionary entry, as they do not need to be shared.
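
As a rough sketch of that layout, the dictionary could look like the following once a production entry exists. Plain strings stand in for the harmony-py `Collection` and `Environment` objects so the example runs on its own, and the production concept ID is a made-up placeholder, not a real collection.

```python
# Shared by every non-production environment, which all read from CMR UAT.
non_production_collection = {
    'imerg_collection': 'C1245618475-EEDTEST',
}

collection_data = {
    'https://harmony.uat.earthdata.nasa.gov': {
        **non_production_collection,  # reuse the shared UAT collection
        'env': 'UAT',
    },
    # Hypothetical production entry: collections are listed inline because no
    # other environment shares them. The concept ID is a placeholder.
    'https://harmony.earthdata.nasa.gov': {
        'imerg_collection': 'C0000000000-PLACEHOLDER',
        'env': 'PROD',
    },
}

# Unconfigured hosts return None, which is what the test cells check for.
print(collection_data.get('https://harmony.sit.earthdata.nasa.gov'))
```

In the real cell the string values would be `Collection(id=...)` instances and members of the `Environment` enumeration.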

In [None]:
non_production_collection = {
    'imerg_collection': Collection(id='C1245618475-EEDTEST'),
}

collection_data = {
    'https://harmony.uat.earthdata.nasa.gov': {
        **non_production_collection,
        'env': Environment.UAT
    },
    'https://harmony.sit.earthdata.nasa.gov': {
        **non_production_collection,
        'env': Environment.SIT
    },
    'http://localhost:3000': {
        **non_production_collection,
        'env': Environment.LOCAL
    },
}

environment_information = collection_data.get(harmony_host_url)

if environment_information is not None:
    harmony_client = Client(env=environment_information['env'])

## Test for a single, non-aggregated granule input

Makes a request, limiting the results to a single granule. Because the results aren't concatenated, we expect to find 1 Zarr store in the returned results.
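
The check itself is delegated to `assert_result_has_correct_number_of_stores`, which lives in the `utility` module and is not shown in this notebook. The sketch below shows one way such a count could be made. It assumes the result JSON carries a `links` list in which each output is marked with `rel` set to `data` and `type` set to `application/x-zarr`. The function name `count_zarr_stores` and the example payload are hypothetical.

```python
def count_zarr_stores(result_json):
    """Count links that are data outputs with the Zarr media type."""
    return sum(
        1
        for link in result_json.get('links', [])
        if link.get('rel') == 'data' and link.get('type') == 'application/x-zarr'
    )


# Made-up payload for illustration only; not real Harmony output.
example_result = {
    'links': [
        {'rel': 'self', 'href': 'https://example.com/jobs/1', 'type': 'application/json'},
        {'rel': 'data', 'href': 's3://example-bucket/a.zarr', 'type': 'application/x-zarr'},
    ]
}

# Only the second link is a Zarr data output.
assert count_zarr_stores(example_result) == 1
```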

In [None]:
if environment_information is not None:
    imrg_request1 = Request(collection=environment_information['imerg_collection'],
                            max_results=1,
                            concatenate=False,
                            format='application/x-zarr')

    job_id = harmony_client.submit(imrg_request1)
    harmony_client.wait_for_processing(job_id, show_progress=True)
    results1 = harmony_client.result_json(job_id)
    assert_result_has_correct_number_of_stores(results1, 1)
    print_success('One granule, not aggregated, creates a single Zarr store.')
else:
    print('Skipping test: N2Z regression tests not configured for this environment.')

## Test three non-aggregated granules input

Makes a request, limiting the results to three granules. Because the results aren't concatenated, we expect to find 3 Zarr stores in the returned results.

In [None]:
if environment_information is not None:
    imrg_request2 = Request(collection=environment_information['imerg_collection'],
                            max_results=3,
                            concatenate=False,
                            format='application/x-zarr')

    job_id2 = harmony_client.submit(imrg_request2)
    harmony_client.wait_for_processing(job_id2, show_progress=True)
    results2 = harmony_client.result_json(job_id2)
    assert_result_has_correct_number_of_stores(results2, 3)
    print_success('Three granules, not aggregated, create three Zarr stores.')

## Test two aggregated granules input

Makes a request, limiting the results to two granules. Because the results are concatenated, we expect to find just 1 Zarr store in the returned results.

In [None]:
if environment_information is not None:
    imrg_request3 = Request(collection=environment_information['imerg_collection'],
                            max_results=2,
                            concatenate=True,
                            format='application/x-zarr')

    job_id3 = harmony_client.submit(imrg_request3)
    harmony_client.wait_for_processing(job_id3, show_progress=True)
    results3 = harmony_client.result_json(job_id3)
    assert_result_has_correct_number_of_stores(results3, 1)
    print_success('Two granules aggregated create a single Zarr store.')

## Test two granules input with default concatenation

Makes a request, limiting the results to two granules. Because the default is not to concatenate, we expect to find 2 Zarr stores in the returned results.
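
The expectations across the four tests in this notebook reduce to one rule, sketched below. The helper name `expected_store_count` is hypothetical and is not part of the `utility` module. It only restates the behaviour these tests assert.

```python
def expected_store_count(granule_count, concatenate=False):
    """Return how many Zarr stores the tests expect for a request.

    Concatenated requests yield one store; otherwise one store per granule.
    """
    if concatenate and granule_count > 0:
        return 1
    return granule_count


# The four cases covered by this notebook.
assert expected_store_count(1, concatenate=False) == 1
assert expected_store_count(3, concatenate=False) == 3
assert expected_store_count(2, concatenate=True) == 1
assert expected_store_count(2) == 2  # default: no concatenation
```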

In [None]:
if environment_information is not None:
    imrg_request4 = Request(collection=environment_information['imerg_collection'],
                            max_results=2,
                            format='application/x-zarr')

    job_id4 = harmony_client.submit(imrg_request4)
    harmony_client.wait_for_processing(job_id4, show_progress=True)
    results4 = harmony_client.result_json(job_id4)
    assert_result_has_correct_number_of_stores(results4, 2)
    print_success('Two granules create two Zarr stores.')