# net2cog regression tests

This Jupyter notebook runs a suite of regression tests against the net2cog Harmony Service.

These tests evaluate the net2cog service using NetCDF input data from the following collections:
* [SMAP_RSS_L3_SSS_SMI_8DAY-RUNNINGMEAN_V4](https://search.uat.earthdata.nasa.gov/search?q=C1234410736) 
* [SMAP L4 Global 9 km EASE-Grid Surface and Root Zone Soil Moisture Land Model Constants](https://search.uat.earthdata.nasa.gov/search/granules?p=C1256108792-EEDTEST&pg[0][v]=f&pg[0][gsk]=-start_date&q=SPL4SMLM%20EEDTEST&tl=1588197503.679!5!!)

## Set the Harmony environment:

The cell below sets the `harmony_host_url` to one of the following valid values:

* Production: <https://harmony.earthdata.nasa.gov>
* UAT: <https://harmony.uat.earthdata.nasa.gov>
* SIT: <https://harmony.sit.earthdata.nasa.gov>
* Local: <http://localhost:3000>

The default value is for the UAT environment. There are two ways to run this notebook against a different environment:

* Run this notebook in a local Jupyter notebook server and change the value of `harmony_host_url` in the cell below to the value for the environment you require from the above list.

* Use the `run_notebooks.sh` script, which requires you to declare an environment variable `HARMONY_HOST_URL`. Set that environment variable to the value above that corresponds to the environment you want to test. That environment variable will take precedence over the default value in the cell below.
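
A minimal sketch of that precedence rule, assuming the override is read from the process environment with `os.environ` (`run_notebooks.sh` may instead pass it to papermill as a notebook parameter). The helper name `resolve_harmony_host_url` is hypothetical and not part of this repository; it also rejects any URL that is not in the list above.

```python
import os
from typing import Mapping, Optional

# Valid Harmony host URLs, copied from the list above.
VALID_HOST_URLS = (
    'https://harmony.earthdata.nasa.gov',
    'https://harmony.uat.earthdata.nasa.gov',
    'https://harmony.sit.earthdata.nasa.gov',
    'http://localhost:3000',
)
DEFAULT_HOST_URL = 'https://harmony.uat.earthdata.nasa.gov'


def resolve_harmony_host_url(environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return HARMONY_HOST_URL if set, otherwise the UAT default."""
    if environ is None:
        environ = os.environ
    # An unset or empty variable falls back to the notebook default.
    host_url = environ.get('HARMONY_HOST_URL') or DEFAULT_HOST_URL
    # Drop a trailing slash so 'http://localhost:3000/' still matches the list.
    host_url = host_url.rstrip('/')
    if host_url not in VALID_HOST_URLS:
        raise ValueError(f'Unsupported Harmony host URL: {host_url!r}')
    return host_url
```

In the cell below this would read `harmony_host_url = resolve_harmony_host_url()`. Failing early on an unknown URL avoids the silent "not configured" skip at the end of the notebook.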

In [None]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'

## Prerequisites

The dependencies for this notebook are listed in [environment.yaml](./environment.yaml). To test or install locally, create the papermill environment used in the automated regression testing suite:

`conda env create -f ./environment.yaml && conda activate papermill-net2cog`

A `.netrc` file must also be located in the `test` directory of this repository.
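
Missing credentials only show up once a request is submitted, so checking the `.netrc` file up front can save a wasted run. The sketch below uses the standard-library `netrc` module. The helper name `has_earthdata_credentials` is hypothetical, and the machine names (`urs.earthdata.nasa.gov` for production, `uat.urs.earthdata.nasa.gov` for UAT, SIT and local) are assumptions about which Earthdata Login host each environment authenticates against.

```python
import netrc
from pathlib import Path

# Assumed Earthdata Login hosts: production and UAT.
EDL_MACHINES = ('urs.earthdata.nasa.gov', 'uat.urs.earthdata.nasa.gov')


def has_earthdata_credentials(netrc_path: Path, machine: str) -> bool:
    """Hypothetical helper: True if netrc_path holds a login and password for machine."""
    if not netrc_path.is_file():
        return False
    try:
        # authenticators() returns (login, account, password) or None.
        entry = netrc.netrc(str(netrc_path)).authenticators(machine)
    except netrc.NetrcParseError:
        return False
    if entry is None:
        return False
    login, _account, password = entry
    return bool(login) and bool(password)
```

For example, `has_earthdata_credentials(Path('test/.netrc'), 'uat.urs.earthdata.nasa.gov')` could be asserted before any request is built.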

### Import required packages:

In [None]:
from harmony import Collection, Environment, Client, Request
from tempfile import TemporaryDirectory

from utility import validate_smap_outputs

### Set up environment dependent variables:

This includes the Harmony `Client` object and `Collection` objects for each of the collections for which there are regression tests. The local, SIT and UAT Harmony instances all utilise resources from CMR UAT, meaning any non-production environment will use the same resources.

When adding a production entry to the dictionary below, the collection instances can be included directly in the production dictionary entry, as they do not need to be shared.
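
The production dictionary in the next cell still holds placeholder concept IDs (`TBD`) and empty granule IDs. A small check like the hypothetical `find_incomplete_entries` below could flag such entries before they are enabled. It assumes that a harmony-py `Collection` exposes its concept ID as `.id`; plain strings are also accepted so the sketch runs without harmony-py installed.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ('collection_concept_id', 'granule_id')


def find_incomplete_entries(config: Dict[str, Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: names of tests whose IDs are missing or placeholders."""
    incomplete = []
    for test_name, entry in config.items():
        if any(key not in entry for key in REQUIRED_KEYS):
            incomplete.append(test_name)
            continue
        # Accept a Collection-like object (assumed `.id` attribute) or a plain string.
        collection = entry['collection_concept_id']
        concept_id = getattr(collection, 'id', collection)
        if concept_id in ('', 'TBD') or not entry['granule_id']:
            incomplete.append(test_name)
    return incomplete
```

Run against `production_configuration` as it stands, this would list all four tests. Against `non_production_configuration` it would return an empty list.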

In [None]:
non_production_configuration = {
    'RSS_SPL3_SSS_single_variable': {
        'collection_concept_id': Collection(id='C1272962474-EEDTEST'),
        'granule_id': 'G1272962521-EEDTEST',
        'variables_to_subset': ['sss_smap'],
    },
    'RSS_SPL3_SSS_multi_variables': {
        'collection_concept_id': Collection(id='C1272962474-EEDTEST'),
        'granule_id': 'G1272962521-EEDTEST',
        'variables_to_subset': ['sss_smap', 'sss_smap_40km'],
    },
    'RSS_SPL3_SSS_all_variables': {
        'collection_concept_id': Collection(id='C1272962474-EEDTEST'),
        'granule_id': 'G1272962521-EEDTEST',
    },
    'SPL4_SMLM': {
        'collection_concept_id': Collection(id='C1256108792-EEDTEST'),
        'granule_id': 'G1256108793-EEDTEST',
        'variables_to_subset': ["/Land-Model-Constants_Data/cell_land_fraction"],
    },
}

production_configuration = {
    'RSS_SPL3_SSS_single_variable': {
        'collection_concept_id': Collection(id='TBD'),
        'granule_id': '',
        'variables_to_subset': ['sss_smap'],
    },
    'RSS_SPL3_SSS_multi_variables': {
        'collection_concept_id': Collection(id='TBD'),
        'granule_id': '',
        'variables_to_subset': ['sss_smap', 'sss_smap_40km'],
    },
    'RSS_SPL3_SSS_all_variables': {
        'collection_concept_id': Collection(id='TBD'),
        'granule_id': '',
    },
    'SPL4_SMLM': {
        'collection_concept_id': Collection(id='TBD'),
        'granule_id': '',
        'variables_to_subset': ["/Land-Model-Constants_Data/cell_land_fraction"],
    },
}

In [None]:
environment_configuration = {
    # 'https://harmony.earthdata.nasa.gov': {
    #     'config': production_configuration,
    #     'env': Environment.PROD,
    # },
    'https://harmony.uat.earthdata.nasa.gov': {
        'config': non_production_configuration,
        'env': Environment.UAT,
    },
    'https://harmony.sit.earthdata.nasa.gov': {
        'config': non_production_configuration,
        'env': Environment.SIT,
    },
    'http://localhost:3000': {
        'config': non_production_configuration,
        'env': Environment.LOCAL,
    },
}

configuration = environment_configuration.get(harmony_host_url)

if configuration is not None:
    harmony_client = Client(env=configuration['env'])

### Expected Results:

* Expected value of the Coordinate Reference System (CRS).
* Expected number of files returned.
* Expected bounding box.
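
The actual checks live in `utility.validate_smap_outputs`, which is not shown in this notebook. As a hedged illustration only, the hypothetical function below compares an already-extracted summary (CRS string, file count, one bounding box per output file) against one entry of the `expected_results` dictionary in the next cell. It assumes every output file should match one of the expected boxes, and it uses `math.isclose` so that floating-point round-off in the coordinates does not cause spurious failures. Extracting those values from the downloaded GeoTIFFs (for example with rasterio) is left out.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

BoundingBox = Tuple[float, float, float, float]


def compare_to_expected(
    crs: str,
    file_count: int,
    bounding_boxes: Sequence[BoundingBox],
    expected: Dict[str, Any],
    abs_tol: float = 1e-6,
) -> List[str]:
    """Hypothetical check: return mismatch messages (an empty list means all matched)."""
    problems = []
    if crs != expected['expected_crs']:
        problems.append(f"CRS {crs!r} != {expected['expected_crs']!r}")
    if file_count != expected['expected_file_count']:
        problems.append(f"file count {file_count} != {expected['expected_file_count']}")
    for actual_box in bounding_boxes:
        # Each output box must match some expected box, coordinate by coordinate.
        if not any(
            all(math.isclose(a, e, abs_tol=abs_tol) for a, e in zip(actual_box, expected_box))
            for expected_box in expected['expected_bounding_box']
        ):
            problems.append(f'unexpected bounding box {actual_box}')
    return problems
```

Returning a list of messages rather than asserting on the first mismatch reports every failed expectation for a collection at once.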

In [None]:
expected_results = {
    'RSS_SPL3_SSS_single_variable': {
        'expected_crs': 'EPSG:4326',
        'expected_file_count': 1,
        'expected_bounding_box': [(0.0, 90.0, 360.0, -90.0)],
    },
    'RSS_SPL3_SSS_multi_variables': {
        'expected_crs': 'EPSG:4326',
        'expected_file_count': 2,
        'expected_bounding_box': [(0.0, 90.0, 360.0, -90.0)],
    },
    'RSS_SPL3_SSS_all_variables': {
        'expected_crs': 'EPSG:4326',
        'expected_file_count': 10,
        'expected_bounding_box': [(0.0, 90.0, 360.0, -90.0)],
    },
    'SPL4_SMLM': {
        'expected_crs': 'EPSG:6933',
        'expected_file_count': 1,
        'expected_bounding_box': [
            (-17367531.3203125, -7314541.19921875, 17367531.3203125, 7314541.19921875)
        ],
    },
}

### Run Tests

For each collection configured for the selected environment, this cell:

* submits a request for the collection data to Harmony and downloads the returned results;
* compares the results to a previously generated and subsetted reference data file;
* validates the generated COG file;
* validates the Coordinate Reference System (CRS);
* visualizes each variable with a graph.

List of tests:
* SPL4_SMLM: conversion of the nested single variable `/Land-Model-Constants_Data/cell_land_fraction`
* RSS_SPL3_SSS: conversion of the single variable `sss_smap`
* RSS_SPL3_SSS: multiple variables (`sss_smap` and `sss_smap_40km`).  
    This test ensures that a request naming more than one variable succeeds and returns one output for each requested variable.
* RSS_SPL3_SSS: all variables.  
    This should result in 10 output files, one for each of the following variables:
    * `fland`
    * `gice`
    * `gland`
    * `nobs`
    * `nobs_40km`
    * `sss_ref`
    * `sss_smap`
    * `sss_smap_40km`
    * `sss_smap_uncertainty`
    * `surtep`

    The input NetCDF-4 file has three additional variables that should not be converted to a GeoTIFF:
    * `lat`
    * `lon`
    * `time`
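
The expected file count of 10 for `RSS_SPL3_SSS_all_variables` follows from this list: 13 variables in the input file minus the 3 coordinate variables. A small sketch of that arithmetic, using the variable names above. The helper `expected_geotiff_variables` is hypothetical.

```python
from typing import AbstractSet, Iterable, List

# Variables in the RSS SMAP L3 SSS input file, as listed above.
ALL_VARIABLES = [
    'fland', 'gice', 'gland', 'lat', 'lon', 'nobs', 'nobs_40km', 'sss_ref',
    'sss_smap', 'sss_smap_40km', 'sss_smap_uncertainty', 'surtep', 'time',
]

# Coordinate variables that should not be converted to GeoTIFFs.
NON_CONVERTED = frozenset({'lat', 'lon', 'time'})


def expected_geotiff_variables(
    variables: Iterable[str], excluded: AbstractSet[str] = NON_CONVERTED
) -> List[str]:
    """Hypothetical helper: data variables expected to produce one GeoTIFF each."""
    return sorted(name for name in variables if name not in excluded)


converted = expected_geotiff_variables(ALL_VARIABLES)
print(len(converted))  # 10
```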

In [None]:
if configuration is not None:
    for collection, test_config in configuration['config'].items():
        with TemporaryDirectory() as tmp_dir:

            print(f'Testing collection: {collection}')
            test_request = Request(
                collection=test_config['collection_concept_id'],
                granule_id=[test_config['granule_id']],
                max_results=1,
                format='image/tiff',
            )

            if 'variables_to_subset' in test_config:
                test_request.variables = test_config['variables_to_subset']

            job_id = harmony_client.submit(test_request)
            harmony_client.wait_for_processing(job_id, show_progress=True)

            if collection in expected_results:
                validate_smap_outputs(
                    harmony_client, job_id, expected_results[collection]
                )
            else:
                print(
                    f'Skipping test: {collection} is not configured in the expected_results dictionary'
                )
else:
    print(f'Skipping tests: net2cog is not configured for {harmony_host_url}')