## Regression tests for the Harmony SMAP L2 Gridder 


This Jupyter notebook runs a suite of regression tests against the Harmony SMAP L2 gridding service. 

The tests use SMAP Enhanced L2 Radiometer Half-Orbit 9 km EASE-Grid Soil Moisture V006 ([SPL2SMP_E](https://mmt.uat.earthdata.nasa.gov/collections/C1268429712-EEDTEST)) trajectory data as input to the gridder, along with the SPL2SMAP and SPL2SMA collections configured below.

Each test submits a request to the service, checks that the output data file has the expected "shape", and then compares a subset of the output against regression reference data.


## Set the Harmony environment:

The cell below sets the `harmony_host_url` to one of the following valid values:

* Production: <https://harmony.earthdata.nasa.gov>
* UAT: <https://harmony.uat.earthdata.nasa.gov>
* SIT: <https://harmony.sit.earthdata.nasa.gov>
* Local: <http://localhost:3000>

The default value is for the UAT environment. There are two ways to run this notebook against a non-default environment:

* Run this notebook in a local Jupyter notebook server and change the value of `harmony_host_url` in the cell below to the value for the environment you require from the above list.

* Use the `run_notebooks.sh` script, which requires you to declare an environment variable `HARMONY_HOST_URL`. Set that environment variable to the value above that corresponds to the environment you want to test. That environment variable will take precedence over the default value in the cell below.
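
The precedence described above can be pictured with a short sketch. It assumes the override is read directly from the process environment; the actual mechanism used by `run_notebooks.sh` may differ (for example, it may inject the value as a notebook parameter), and `resolve_harmony_host_url` is a hypothetical helper, not part of this repository.

```python
import os

# Valid Harmony hosts, copied from the list above.
VALID_HARMONY_HOSTS = (
    'https://harmony.earthdata.nasa.gov',
    'https://harmony.uat.earthdata.nasa.gov',
    'https://harmony.sit.earthdata.nasa.gov',
    'http://localhost:3000',
)
DEFAULT_HARMONY_HOST = 'https://harmony.uat.earthdata.nasa.gov'


def resolve_harmony_host_url(environ=None, default=DEFAULT_HARMONY_HOST):
    """Hypothetical helper: prefer HARMONY_HOST_URL over the default."""
    environ = os.environ if environ is None else environ
    host_url = environ.get('HARMONY_HOST_URL', default)
    if host_url not in VALID_HARMONY_HOSTS:
        # Fail early rather than silently skipping every test.
        raise ValueError(f'Unrecognised Harmony host: {host_url}')
    return host_url
```

Rejecting an unrecognised host up front is a design choice in this sketch; the notebook itself instead skips the tests when no configuration matches the host.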

In [None]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'

## Prerequisites

The dependencies for this notebook are listed in [environment.yaml](./environment.yaml). To test or install locally, create the papermill environment used in the automated regression testing suite:

`conda env create -f ./environment.yaml && conda activate papermill-smap-l2-gridder`

A `.netrc` file must also be located in the `test` directory of this repository.
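
As a quick sanity check before running the tests, the standard-library `netrc` module can confirm that the file contains credentials for the Earthdata Login host. This is a sketch only: the host name `uat.urs.earthdata.nasa.gov` is an assumption for the UAT environment (production uses `urs.earthdata.nasa.gov`), and `has_earthdata_credentials` is a hypothetical helper.

```python
from netrc import netrc
from pathlib import Path


def has_earthdata_credentials(netrc_path, host='uat.urs.earthdata.nasa.gov'):
    """Hypothetical check: True if `netrc_path` has an entry for `host`."""
    netrc_path = Path(netrc_path)
    if not netrc_path.is_file():
        return False
    # authenticators() returns None when no entry (and no default) matches.
    return netrc(str(netrc_path)).authenticators(host) is not None
```

For example, `has_earthdata_credentials('.netrc')` could be called from the `test` directory before the first request is submitted.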

### Import required packages:

In [None]:
from harmony import Client, Collection, Environment, Request
from pathlib import Path
from tempfile import TemporaryDirectory

#### Import shared utility functions:

In [None]:
import sys

sys.path.append('../shared_utils')
from utilities import print_success, submit_and_download
from compare import compare_results_to_reference_file

### Set up test configuration

The next cell organizes the collections and granules to be tested in each environment.

Additional configuration selects a subset of the output for comparison against the reference data. Each selector dictionary is keyed by a top-level group of the output dataset, and each value maps that group's dimension names to the slice to compare. This allows small parts of very large datasets to be compared.
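
To illustrate how a selector narrows the data, here is a minimal sketch using plain Python lists. The shared `compare` utility is not shown in this notebook and most likely operates on labelled datasets rather than lists, so `apply_selector`, the toy grid, and the assumed `(y-dim, x-dim)` ordering are all hypothetical.

```python
def apply_selector(groups, selector):
    """Return only the selected region of each group named in `selector`.

    `groups` maps a group name to a 2-D grid (a list of rows), assumed to be
    ordered as (y-dim, x-dim). Groups not named in `selector` are skipped.
    """
    subset = {}
    for group_name, dim_slices in selector.items():
        rows = groups[group_name][dim_slices['y-dim']]
        subset[group_name] = [row[dim_slices['x-dim']] for row in rows]
    return subset


# A toy 4 x 6 grid standing in for a much larger gridded variable.
toy_groups = {
    'Soil_Moisture_Retrieval_Data': [
        [10 * y + x for x in range(6)] for y in range(4)
    ],
}
toy_selector = {
    'Soil_Moisture_Retrieval_Data': {
        'y-dim': slice(0, 2),
        'x-dim': slice(3, 5),
    },
}

print(apply_selector(toy_groups, toy_selector))
# {'Soil_Moisture_Retrieval_Data': [[3, 4], [13, 14]]}
```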

In [None]:
spl2smp_e_selector = {
    "Soil_Moisture_Retrieval_Data": {
        "y-dim": slice(0, 1000),
        "x-dim": slice(1500, 2500),
    },
    "Soil_Moisture_Retrieval_Data_Polar": {
        "y-dim": slice(1000, 2000),
        "x-dim": slice(1000, 2000),
    },
}

spl2smap_selector = {
    "Soil_Moisture_Retrieval_Data_3km": {
        "y-dim": slice(1000, 2000),
        "x-dim": slice(8800, 9800),
    },
    "Soil_Moisture_Retrieval_Data": {
        "y-dim": slice(0, 1000),
        "x-dim": slice(2500, 3500),
    },
}


spl2sma_slice = {
    "y-dim": slice(0, 1000),
    "x-dim": slice(9500, 10500),
}

spl2sma_selector = {
    "Soil_Moisture_Retrieval_Data": spl2sma_slice,
    "Radar_Data": spl2sma_slice,
    "Ancillary_Data": spl2sma_slice,
}


non_production_configuration = {
    'SPL2SMP_E': {
        'collection_concept_id': Collection(id='C1268429712-EEDTEST'),
        'granule_id': 'G1268429718-EEDTEST',
        'selector': spl2smp_e_selector,
    },
    'SPL2SMAP': {
        'collection_concept_id': Collection(id='C1268429748-EEDTEST'),
        'granule_id': 'G1268429753-EEDTEST',
        'selector': spl2smap_selector,
    },
    'SPL2SMA': {
        'collection_concept_id': Collection(id='C1268429729-EEDTEST'),
        'granule_id': 'G1268429743-EEDTEST',
        'selector': spl2sma_selector,
    },
}


production_configuration = {
    'SPL2SMP_E': {
        'collection_concept_id': Collection(id='TBD'),
        'granule_id': '',
        'selector': spl2smp_e_selector,
    },
    'SPL2SMAP': {
        'collection_concept_id': Collection(id='TBD'),
        'granule_id': '',
        'selector': spl2smap_selector,
    },
    'SPL2SMA': {
        'collection_concept_id': Collection(id='C1236303826-NSIDC_ECS'),
        'granule_id': '',
        'selector': spl2sma_selector,
    },
}

In [None]:
environment_configuration = {
    'https://harmony.earthdata.nasa.gov': {
        'config': production_configuration,
        'env': Environment.PROD,
    },
    'https://harmony.uat.earthdata.nasa.gov': {
        'config': non_production_configuration,
        'env': Environment.UAT,
    },
    'https://harmony.sit.earthdata.nasa.gov': {
        'config': non_production_configuration,
        'env': Environment.SIT,
    },
    'http://localhost:3000': {
        'config': non_production_configuration,
        'env': Environment.LOCAL,
    },
}

configuration = environment_configuration.get(harmony_host_url)

if configuration is not None:
    harmony_client = Client(env=configuration['env'])

### Run Tests

* The cell below loops through the SMAP L2 collections for the configured environment.
* A request for gridding the L2 data is submitted to Harmony and the results downloaded.
* Verification compares the results to a previously generated, subsetted reference data file. To limit memory use, the comparison covers only the region defined by each collection's selector (a 1000 × 1000 block of cells per group).
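
The reference file for each collection is located by name. The sketch below mirrors the naming pattern used in the test cell (`reference_files/<collection>_reference.nc`); `reference_path_for` is a hypothetical helper introduced for illustration, and `PurePosixPath` is used only so the printed result is the same on every platform.

```python
from pathlib import PurePosixPath


def reference_path_for(output_path):
    """Build the reference file path for a downloaded output file."""
    output_path = PurePosixPath(output_path)
    return PurePosixPath(
        f'reference_files/{output_path.stem}_reference{output_path.suffix}'
    )


print(reference_path_for('/tmp/example/SPL2SMP_E.nc'))
# reference_files/SPL2SMP_E_reference.nc
```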


In [None]:
if configuration is not None:
    for collection, test_config in configuration['config'].items():
        with TemporaryDirectory() as tmp_dir:
            test_request = Request(
                collection=test_config['collection_concept_id'],
                granule_id=[test_config['granule_id']],
                crs='EPSG:4326',
                format='application/x-netcdf4',
            )
            test_output = Path(tmp_dir) / f'{collection}.nc'
            test_reference = Path(
                f'reference_files/{test_output.stem}_reference{test_output.suffix}'
            )

            submit_and_download(harmony_client, test_request, test_output)

            # Call exists(): the bare method object is always truthy.
            assert (
                test_output.exists()
            ), f'Unsuccessful Harmony Request: {collection}'
            compare_results_to_reference_file(
                test_output,
                test_reference,
                identical=False,
                subset_selector=test_config.get('selector', None),
            )
        print_success(f'{collection} Test.')

    print_success('Entire Test Suite.')
else:
    print(
        # configuration is None in this branch, so report the host URL instead.
        f'SMAP L2 Gridder tests not configured for environment: {harmony_host_url} - skipping tests'
    )