# NSIDC SMAP Regression tests

### This Jupyter notebook runs and verifies a series of test requests against NSIDC's SMAP data.

Requests are submitted and the retrieved data compared to a set of verified results.

Sample requests include:

- Subset by Bounding Box
- Subset by GeoJSON
- Subset by Shapefile
- Subset by Variable
- Subset by KML
- Reprojection to Geographic
- GeoTIFF Reformatting


## Prerequisites

The dependencies for running this notebook are listed in the
[environment.yaml](https://github.com/nasa/harmony-regression-tests/blob/main/test/nsidc-smap/environment.yaml).

In order to test locally, run the following commands from the `test/nsidc-smap/` directory to create and activate the conda environment necessary to run the regression testing notebook.

```sh
conda env create -f ./environment.yaml && conda activate papermill-nsidc-smap
```

To use this environment within a shared Jupyter Hub, see [instructions](https://nasa-openscapes.github.io/earthdata-cloud-cookbook/contributing/workflow.html#create-a-jupyter-kernel-to-run-notebooks) in the NASA Earthdata Cloud Cookbook for how to create a new kernel based on this environment. 

## Authentication

To provide your credentials to Harmony, a `.netrc` file must be located in the `test` directory of this repository.
Ensure the credentials in this `.netrc` belong to a user who can access the NSIDC data, which is protected by ACLs in UAT and SIT.
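
A quick way to confirm the file is in place before running the notebook is to parse it with Python's standard `netrc` module. The sketch below is not part of the test suite: the helper name `has_netrc_entry` is made up for illustration, and the host name assumes Earthdata Login for UAT (`uat.urs.earthdata.nasa.gov`). The relative path assumes the working directory is `test/nsidc-smap/`.

```python
from netrc import NetrcParseError, netrc
from pathlib import Path


def has_netrc_entry(netrc_path, host):
    """Return True if the .netrc file at netrc_path has credentials for host."""
    path = Path(netrc_path)
    if not path.is_file():
        return False
    try:
        # authenticators() returns (login, account, password), or None when
        # the file has neither a matching machine nor a default entry.
        return netrc(str(path)).authenticators(host) is not None
    except NetrcParseError:
        return False


# Assumed host: Earthdata Login for UAT. Adjust for the environment under test.
print(has_netrc_entry("../.netrc", "uat.urs.earthdata.nasa.gov"))
```

This only checks that an entry exists; it does not verify that the account is on the ACLs that protect the NSIDC data.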


## Set the Harmony environment:

The next code cell sets the `harmony_host_url` to one of the following valid values:

* Production: <https://harmony.earthdata.nasa.gov>
* UAT: <https://harmony.uat.earthdata.nasa.gov>
* SIT: <https://harmony.sit.earthdata.nasa.gov>
* Local: <http://localhost:3000>

By default, the value is set to use Harmony's UAT environment. You can modify the target environment in two ways when using this notebook.

* Run this notebook in a local Jupyter notebook server and simply edit the value of `harmony_host_url` in the cell below to be the desired value for your environment.

* Run the `run_notebooks.sh` script, which uses the papermill library to parameterize and run notebooks. Before running, set the environment variable `HARMONY_HOST_URL` to the desired environment's URL from the list above. This variable will override the default value in the cell below, allowing papermill to inject the correct URL into the notebook at runtime.
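
The precedence described above (environment variable first, UAT default otherwise) can be sketched in plain Python. This is an illustration only: `run_notebooks.sh` relies on papermill to inject the value, and the helper `resolve_harmony_host_url` and the two constants are hypothetical names that do not exist in this repository.

```python
import os

# The four URLs listed in this notebook.
VALID_HARMONY_HOST_URLS = (
    "https://harmony.earthdata.nasa.gov",
    "https://harmony.uat.earthdata.nasa.gov",
    "https://harmony.sit.earthdata.nasa.gov",
    "http://localhost:3000",
)
DEFAULT_HARMONY_HOST_URL = "https://harmony.uat.earthdata.nasa.gov"


def resolve_harmony_host_url(environ=None):
    """Pick the Harmony URL from HARMONY_HOST_URL, falling back to UAT."""
    environ = os.environ if environ is None else environ
    # An unset or empty variable falls back to the default.
    url = environ.get("HARMONY_HOST_URL") or DEFAULT_HARMONY_HOST_URL
    url = url.rstrip("/")
    if url not in VALID_HARMONY_HOST_URLS:
        raise ValueError(f"Unsupported Harmony host URL: {url}")
    return url


print(resolve_harmony_host_url({}))
```

Rejecting unknown URLs early is a design choice of this sketch; the notebook itself skips the tests when the URL has no matching configuration.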

## Test reference files:

The reference files stored in the harmony-regression-tests repository are JSON files containing hashed values derived from the groups, variables and metadata in each file. The raw netCDF4, HDF5 or GeoTIFF files are hosted in the Harmony UAT AWS account, in the `nsidc-smap/reference_files/X.Y.Z` folder of the `harmony-uat-regression-tests` S3 bucket.
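
To illustrate why hashed references are enough to detect a change, here is a minimal sketch using only `hashlib` and `json`. It is not the algorithm used by `earthdata_hashdiff`; the function names, the variable path and the attributes are all invented for the example.

```python
import hashlib
import json


def hash_metadata(metadata):
    """Return a SHA-256 hex digest of a JSON-serialisable metadata dict."""
    # sort_keys gives a stable serialisation, so key order does not matter.
    serialised = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialised).hexdigest()


def matches_reference(actual, reference_hashes):
    """Compare {variable path: metadata} against {variable path: digest}."""
    actual_hashes = {path: hash_metadata(meta) for path, meta in actual.items()}
    return actual_hashes == reference_hashes


# Made-up variable path and attributes, for illustration only.
output = {"/group/variable": {"units": "K", "shape": [2, 3]}}
reference = {path: hash_metadata(meta) for path, meta in output.items()}
print(matches_reference(output, reference))
```

Storing only digests keeps the repository small while still flagging any difference in the hashed content.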

In [None]:
harmony_host_url = "https://harmony.uat.earthdata.nasa.gov"

### Import required packages

In [None]:
from os.path import exists
from pathlib import Path
from tempfile import TemporaryDirectory
import json

from harmony import Client, Environment, Request
from earthdata_hashdiff import (
    geotiff_matches_reference_hash_file,
)

#### Import shared utility functions:

In [None]:
import sys

sys.path.append("../shared_utils")
from utilities import print_success, submit_and_download
from smap_utils import file_for_variable, comparison_function_by_extension

### Set up test information

The tests are configured as JSON-style objects in the `test_configuration.py` module.
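
The layout the test loops in this notebook expect can be sketched as nested dictionaries. Only the key names (`single_output_tests`, `multiple_output_tests`, `request_params`, `test_params`, `ext`) are taken from how the notebook reads the configuration; every value below, including the `.h5` extension, is a placeholder, and `list_test_cases` is a hypothetical helper.

```python
# Hypothetical configuration: key names mirror the notebook, values are
# placeholders and do not describe any real request.
example_configuration = {
    "single_output_tests": {
        "subset_bounding_box": {
            "SPL2SMA": {
                "request_params": {"placeholder": "keyword arguments for harmony.Request"},
                "test_params": {"ext": ".h5"},
            },
        },
    },
    "multiple_output_tests": {},
}


def list_test_cases(configuration, section):
    """Return (test_name, shortname, ext) for every request in a section."""
    return [
        (test_name, shortname, test_config["test_params"]["ext"])
        for test_name, test_configs in configuration[section].items()
        for shortname, test_config in test_configs.items()
    ]


for case in list_test_cases(example_configuration, "single_output_tests"):
    print(case)
```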




In [None]:
from test_configuration import non_production_configuration, production_configuration

In [None]:
environment_configuration = {
    "https://harmony.earthdata.nasa.gov": {
        **production_configuration,
        "env": Environment.PROD,
    },
    "https://harmony.uat.earthdata.nasa.gov": {
        **non_production_configuration,
        "env": Environment.UAT,
    },
    "https://harmony.sit.earthdata.nasa.gov": {
        **non_production_configuration,
        "env": Environment.SIT,
    },
    "http://localhost:3000": {
        **non_production_configuration,
        "env": Environment.LOCAL,
    },
}

configuration = environment_configuration.get(harmony_host_url)

if configuration is not None:
    harmony_client = Client(env=configuration["env"])

### Run Tests

The next cell runs each test: it builds a request, submits it to Harmony, and compares the downloaded result against a verified reference file. This ensures that Harmony continues to return the expected output for a known request.


The tests include:

- subset_bounding_box: SPL2SMA, SPL3FTP_E
- subset_by_geojson: SPL2SMA
- subset_by_shapefile: SPL3FTP
- subset_by_variable: SPL2SMP_E, SPL2SMP_E_2
- subset_by_kml: SPL3SMP
- reprojection_to_geographic: SPL4CMDL
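
Each test derives its reference file name from the collection short name, the test name and the output extension, then picks a comparison function based on that extension. The sketch below mirrors the naming used in the test cell; the dispatch part is an assumption about how a helper such as `comparison_function_by_extension` could work, and `pick_comparison`, `always_true` and the registry contents are hypothetical.

```python
from pathlib import Path


def reference_path_for(shortname, test_name, ext):
    """Build the reference path the same way the test loop does."""
    output_name = Path(f"{shortname}_{test_name}{ext}")
    return Path("reference_files") / f"{output_name.stem}_reference.json"


def pick_comparison(ext, registry):
    """Return the comparison function registered for a file extension."""
    if ext not in registry:
        raise ValueError(f"No comparison function registered for {ext!r}")
    return registry[ext]


def always_true(output_file, reference_file):
    """Stand-in for a real hash comparison; it performs no check."""
    return True


# Hypothetical registry: the extensions and the stand-in are assumptions.
registry = {".h5": always_true, ".tif": always_true}

reference = reference_path_for("SPL2SMA", "subset_bounding_box", ".h5")
compare = pick_comparison(".h5", registry)
print(reference, compare.__name__)
```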
            



In [None]:
if configuration is not None:
    for test_name, test_configs in configuration["single_output_tests"].items():
        print(f"Running Tests: {test_name}")

        with TemporaryDirectory() as tmp_dir:

            for shortname, test_config in test_configs.items():
                print(f"running request: {shortname}")

                test_request = Request(**test_config["request_params"])
                ext = test_config["test_params"]["ext"]
                # The reference file path is built further down, once the
                # comparison function for this extension is known.
                test_output = Path(tmp_dir) / f"{shortname}_{test_name}{ext}"

                submit_and_download(harmony_client, test_request, test_output)

                assert exists(
                    test_output
                ), f"Unsuccessful Harmony Request: {shortname}: {test_name}"

                # Grab correct comparison function and call it.
                compare_fxn = comparison_function_by_extension(ext)
                reference_file = (
                    Path("reference_files") / f"{test_output.stem}_reference.json"
                )
                print(
                    f"assert: {compare_fxn.__name__}({test_output.name}, {reference_file}) "
                )
                assert compare_fxn(
                    test_output, reference_file
                ), f'Failed comparison for {shortname}:{test_name}'

                print_success(f"{shortname} {test_name} test request complete.")
            print_success(f"{test_name} test suite complete.")
    print_success("Entire SMAP Single Output Regression Test suite complete.")
else:
    # `configuration` is None here, so report the URL rather than indexing it.
    print(
        f'Single output tests not configured for environment: {harmony_host_url} - skipping tests'
    )

## Multiple output file tests

The following cell runs each request that generates multiple output files. These are GeoTIFF reformatting requests, typically for multiple variables, which produce one GeoTIFF per requested variable. Each test submits a request, downloads all of the resulting files, and compares them to the reference hash values to verify that the output has not changed.

The tests include:

- SPL2SMP with multiple variable subsetting.
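
Matching each variable to its downloaded GeoTIFF comes down to a glob lookup that should return exactly one file. The sketch below shows that idea with `pathlib`; it is not the implementation of `file_for_variable` in `smap_utils`, and the helper name `find_single_file` and the file names are invented.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def find_single_file(directory, pattern):
    """Return the one file in directory matching a glob pattern."""
    matches = sorted(Path(directory).glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"Expected exactly one match for {pattern!r} in {directory}, "
            f"found {len(matches)}"
        )
    return matches[0]


# File names below are invented to show the matching only.
with TemporaryDirectory() as tmp_dir:
    for name in ("granule_Data_albedo_reformatted.tif", "granule_Data_tb_reformatted.tif"):
        (Path(tmp_dir) / name).touch()
    print(find_single_file(tmp_dir, "granule*Data_albedo_reformatted*").name)
```

Failing when zero or several files match is a choice made for this sketch, so that an ambiguous pattern cannot silently validate the wrong file.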


In [None]:
if configuration is not None:
    for test_name, test_configs in configuration["multiple_output_tests"].items():
        with TemporaryDirectory() as tmp_dir:

            for shortname, test_config in test_configs.items():
                print(
                    f"testing {shortname} with {json.dumps(test_config, indent=2, default=str)}"
                )
                test_request = Request(**test_config["request_params"])
                ext = test_config["test_params"]["ext"]

                # Generate and download all of the output files
                job_id = harmony_client.submit(test_request)

                for filename in [
                    file_future.result()
                    for file_future in harmony_client.download_all(
                        job_id, overwrite=True, directory=tmp_dir
                    )
                ]:
                    prefix = Path(filename).name.split("_")[0]
                    print(f"Downloaded: {Path(filename).name}")

                # Check each of the expected output files against its reference file.
                for test_var in test_config["request_params"]["variables"]:
                    var_base = test_var.split("/")[-1]
                    output_file = file_for_variable(
                        Path(tmp_dir), f"{prefix}*Data_{var_base}_reformatted*"
                    )
                    reference_file = file_for_variable(
                        Path("reference_files"), f"{var_base}_reference*"
                    )
                    print(f"validating {output_file.name}")
                    assert geotiff_matches_reference_hash_file(
                        output_file, reference_file
                    ), f'Failed {shortname}:{test_name}'

                print_success(f"{shortname} {test_name} test request complete.")
            print_success(f"{test_name} test suite complete.")
    print_success("multiple_output_tests: SMAP Regression Test suite complete.")
else:
    # `configuration` is None here, so report the URL rather than indexing it.
    print(
        f'Multiple output tests not configured for environment: {harmony_host_url} - skipping tests'
    )