# Regression test suite for SAMBAH:

This notebook provides condensed examples of using Harmony to make requests against the [Subsetter And Multi-dimensional Batched Aggregation in Harmony (SAMBAH)](https://stitchee.readthedocs.io/en/latest/sambah_readme/) service developed to process Level 2 data from the [Tropospheric Emissions: Monitoring of Pollution (TEMPO)](https://asdc.larc.nasa.gov/project/TEMPO) instrument. 

### Features of SAMBAH include:

* Variable subsetting, including required variables.
* Temporal subsetting.
* Bounding box spatial subsetting.
* Concatenation within TEMPO east-west scans.
* Concatenation across scans.

### Prerequisites

The dependencies for this notebook are listed in the [environment.yaml](./environment.yaml). To test or install locally, create the papermill environment used in the automated regression testing suite:

`conda env create -f ./environment.yaml && conda activate papermill-sambah`

A `.netrc` file must also be located in the `test` directory of this repository.
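
Before running the notebook, it can help to confirm that the credentials file parses and holds an entry for the expected login host. The sketch below uses Python's standard `netrc` module. It is illustrative only: the helper name `has_login_for` is hypothetical, and the Earthdata Login host names in the comments (`urs.earthdata.nasa.gov` for production, `uat.urs.earthdata.nasa.gov` for UAT) are assumptions about which entries the file should contain.

```python
from netrc import netrc
from pathlib import Path


def has_login_for(netrc_path, host):
    """Return True if `netrc_path` exists and has an entry for `host`."""
    path = Path(netrc_path)
    if not path.is_file():
        return False
    # authenticators() returns a (login, account, password) tuple, or None
    # when neither a matching entry nor a default entry is present.
    return netrc(str(path)).authenticators(host) is not None


# Assumed hosts: 'urs.earthdata.nasa.gov' (production) and
# 'uat.urs.earthdata.nasa.gov' (UAT). Example call:
# has_login_for('.netrc', 'uat.urs.earthdata.nasa.gov')
```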

# Import required packages:

In [None]:
from datetime import datetime
from os.path import exists
from pathlib import Path
from tempfile import TemporaryDirectory

from earthdata_hashdiff import nc4_matches_reference_hash_file
from harmony import BBox, Client, Collection, Environment, Request

## Import shared utility functions:

In [None]:
import sys

sys.path.append('../shared_utils')
from utilities import print_success, submit_and_download
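
The `submit_and_download` helper lives in `shared_utils` and is not reproduced in this notebook. As a rough guide to what such a helper might do, the sketch below submits a request, waits for the job to finish, and moves the first downloaded file to the requested path. It is a stand-in written under assumptions (and named `submit_and_download_sketch` to avoid confusion), not the repository's implementation. It relies only on the `harmony-py` client methods `submit`, `wait_for_processing` and `download_all`.

```python
from pathlib import Path
from shutil import move


def submit_and_download_sketch(client, request, output_path):
    """Run a Harmony request and save its first output to `output_path`."""
    job_id = client.submit(request)
    client.wait_for_processing(job_id)
    # download_all yields futures; each one resolves to a local file name.
    futures = list(client.download_all(job_id, overwrite=True))
    if not futures:
        raise RuntimeError(f'Harmony job {job_id} produced no output files.')
    downloaded_file = futures[0].result()
    # Rename the download so the caller finds it at a predictable path.
    move(str(downloaded_file), str(output_path))
    return Path(output_path)
```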

## Set default parameters:

`papermill` requires default values for the parameters used in the workflow. In this case, the only parameter is `harmony_host_url`.

In [None]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'
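
When the suite runs under `papermill`, the default above is overridden by a parameter supplied at execution time. As an illustration, the sketch below builds the equivalent command line using papermill's `-p name value` option. The notebook file names are hypothetical placeholders, and the helper `papermill_command` is not part of this repository.

```python
def papermill_command(input_notebook, output_notebook, harmony_host_url):
    """Build a papermill invocation that overrides `harmony_host_url`."""
    return [
        'papermill',
        input_notebook,
        output_notebook,
        '-p', 'harmony_host_url', harmony_host_url,
    ]


# Hypothetical notebook names, shown only to illustrate the call.
command = papermill_command(
    'SAMBAH_Regression.ipynb',
    'SAMBAH_Regression_output.ipynb',
    'https://harmony.earthdata.nasa.gov',
)
print(' '.join(command))
```

The resulting list can be passed to `subprocess.run` in an environment where papermill is installed.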

### Identify Harmony environment (for easier reference):

In [None]:
host_environment = {
    'http://localhost:3000': Environment.LOCAL,
    'https://harmony.sit.earthdata.nasa.gov': Environment.SIT,
    'https://harmony.uat.earthdata.nasa.gov': Environment.UAT,
    'https://harmony.earthdata.nasa.gov': Environment.PROD,
}

harmony_environment = host_environment.get(harmony_host_url)

if harmony_environment is not None:
    harmony_client = Client(env=harmony_environment)
else:
    # Unrecognised host: no Harmony client can be created.
    harmony_client = None

The request collection and granules differ between the non-production environments (local, SIT and UAT, which all use the UAT collection) and PROD:

In [None]:
sambah_non_prod_information = {
    # TEMPO NO2 tropospheric, stratospheric, and total columns V03
    # https://cmr.uat.earthdata.nasa.gov/search/concepts/C1262899916-LARC_CLOUD.html
    'collection': Collection(id='C1262899916-LARC_CLOUD'),
    'granule_id': [
        'G1269044486-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T153258Z_S007G07.nc
        'G1269044632-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T153935Z_S007G08.nc
        'G1269044623-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T154612Z_S007G09.nc
        'G1269044612-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T155308Z_S008G01.nc
        'G1269044756-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T155948Z_S008G02.nc
    ],
    'full_subset_reference_file': 'reference_files/temp_var_bbox_uat.json',
    'variable_subset_reference_file': 'reference_files/var_only_uat.json',
    'spatial_subset_reference_file': 'reference_files/spatial_only_uat.json',
    'no_subset_reference_file': 'reference_files/all_data_uat.json',
}

sambah_prod_information = {
    # TEMPO NO2 tropospheric and stratospheric columns V03 (BETA)
    # https://cmr.earthdata.nasa.gov/search/concepts/C2930725014-LARC_CLOUD.html
    'collection': Collection(id='C2930725014-LARC_CLOUD'),
    'granule_id': [
        'G3181300053-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T153258Z_S007G07.nc
        'G3181300108-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T153935Z_S007G08.nc
        'G3181299889-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T154612Z_S007G09.nc
        'G3181345515-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T155308Z_S008G01.nc
        'G3181345531-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240801T155948Z_S008G02.nc
    ],
    'full_subset_reference_file': 'reference_files/temp_var_bbox_prod.json',
    'variable_subset_reference_file': 'reference_files/var_only_prod.json',
    'spatial_subset_reference_file': 'reference_files/spatial_only_prod.json',
    'no_subset_reference_file': 'reference_files/all_data_prod.json',
}

sambah_request_env = {
    Environment.LOCAL: sambah_non_prod_information,
    Environment.SIT: sambah_non_prod_information,
    Environment.UAT: sambah_non_prod_information,
    Environment.PROD: sambah_prod_information,
}

# Environments without a SAMBAH configuration resolve to None.
sambah_info = sambah_request_env.get(harmony_environment)

In [None]:
if sambah_info is not None:
    request_info = {
        **sambah_info,
        'temporal': {
            'start': datetime(2024, 8, 1, 15, 34, 0),
            'stop': datetime(2024, 8, 1, 16, 0, 0),
        },
        'spatial': BBox(-170, 33, -10, 38),
        # Chosen variables include one variable from each group;
        # support_data/scattering_weights is a 3D variable.
        'variables': [
            'product/vertical_column_stratosphere',
            'qa_statistics/fit_rms_residual',
            'support_data/scattering_weights',
        ],
    }
else:
    # Unpacking None would raise a TypeError, so leave request_info unset
    # and let each test below skip itself.
    request_info = None

# Begin regression tests:

SAMBAH is currently deployed to Sandbox, SIT, UAT and production. Requests will be made against the TEMPO NO2 L2 V03 collection. Each request will:

* Define a request to Harmony using `harmony-py`.
* Execute that request.
* Download the output file within a temporary directory.
* Ensure that the expected output file is downloaded (i.e. that the request was successful).
* Convert the output to a JSON file that hashes the variables and groups in the output file.
* Compare that JSON file to a reference JSON file, which should be identical (except for the `history` and `history_json` attributes, which contain timestamps).
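
The last two steps can be pictured with a small, library-free sketch. It hashes each variable's bytes and attributes into a JSON-serialisable dictionary, dropping the timestamp-bearing `history` and `history_json` attributes first, and then compares that dictionary with a reference. This only illustrates the idea under simplified assumptions: variables are represented as plain `bytes` plus an attribute dictionary, the sample values are made up, and the helper names are hypothetical. It is not how `earthdata_hashdiff` is implemented.

```python
import hashlib
import json

# Attributes that contain timestamps and so differ between runs.
SKIPPED_ATTRIBUTES = {'history', 'history_json'}


def hash_variable(data, attributes):
    """Hash a variable's raw bytes together with its stable attributes."""
    stable = {k: v for k, v in attributes.items() if k not in SKIPPED_ATTRIBUTES}
    digest = hashlib.sha256(data)
    digest.update(json.dumps(stable, sort_keys=True).encode('utf-8'))
    return digest.hexdigest()


def hash_file_contents(variables):
    """Map each full variable path to a hash of its contents."""
    return {
        name: hash_variable(data, attributes)
        for name, (data, attributes) in variables.items()
    }


# Made-up contents: same data and units, different history timestamps.
reference = hash_file_contents(
    {'product/example': (b'\x00\x01', {'units': 'example', 'history': 'run A'})}
)
output = hash_file_contents(
    {'product/example': (b'\x00\x01', {'units': 'example', 'history': 'run B'})}
)
print(output == reference)  # True: only a skipped attribute differs
```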

### SAMBAH: temporal, variable and bounding box subset request

This is a request that exercises the full range of SAMBAH options: 
- time range
- spatial bounding box
- variable subsetting
- concatenating within (i.e., `extend`) and across scans

In [None]:
if request_info is not None:
    temp_var_bbox_request = Request(
        collection=request_info['collection'],
        extend=['mirror_step'],
        concatenate=True,
        granule_id=request_info['granule_id'],
        temporal=request_info['temporal'],
        variables=request_info['variables'],
        spatial=request_info['spatial'],
    )

    request_name = 'SAMBAH temporal, variable, bounding box request'

    with TemporaryDirectory() as tmp_dir:
        full_output_path = tmp_dir / Path('temp_var_bbox.nc4')
        submit_and_download(harmony_client, temp_var_bbox_request, full_output_path)
        assert exists(full_output_path), f'Unsuccessful {request_name}.'
        assert nc4_matches_reference_hash_file(
            full_output_path,
            request_info['full_subset_reference_file'],
        ), f'{request_name}: Output and reference files do not match'

    print_success(request_name)
else:
    print(
        f'SAMBAH is not configured for environment: "{harmony_environment}" - skipping test.'
    )

### SAMBAH: variable subset request, two files

This is a request that includes:
- variable subsetting
- concatenating two granules within (i.e., `extend`) a scan
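
To see why the first two granules fall within a single scan, the file names listed earlier can be grouped by their trailing `S###G##` token. The sketch below assumes that `S` is the scan number and `G` the granule number within that scan. That is an interpretation of the naming pattern rather than something this notebook states, and the helper name `group_by_scan` is hypothetical.

```python
import re
from collections import defaultdict

# File names copied from the granule comments earlier in this notebook.
GRANULE_FILES = [
    'TEMPO_NO2_L2_V03_20240801T153258Z_S007G07.nc',
    'TEMPO_NO2_L2_V03_20240801T153935Z_S007G08.nc',
    'TEMPO_NO2_L2_V03_20240801T154612Z_S007G09.nc',
    'TEMPO_NO2_L2_V03_20240801T155308Z_S008G01.nc',
    'TEMPO_NO2_L2_V03_20240801T155948Z_S008G02.nc',
]

# Assumed pattern: S<scan number>G<granule number> just before the extension.
SCAN_GRANULE = re.compile(r'_S(\d{3})G(\d{2})\.nc$')


def group_by_scan(file_names):
    """Group granule numbers by the scan number parsed from each file name."""
    scans = defaultdict(list)
    for name in file_names:
        match = SCAN_GRANULE.search(name)
        if match is None:
            raise ValueError(f'Unrecognised file name: {name}')
        scans[int(match.group(1))].append(int(match.group(2)))
    return dict(scans)


print(group_by_scan(GRANULE_FILES))
# {7: [7, 8, 9], 8: [1, 2]}
print(group_by_scan(GRANULE_FILES[:2]))
# {7: [7, 8]}
```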

In [None]:
if request_info is not None:
    var_only_request = Request(
        collection=request_info['collection'],
        extend=['mirror_step'],
        concatenate=True,
        granule_id=request_info['granule_id'][:2],
        variables=request_info['variables'],
    )

    request_name = 'SAMBAH variable request'

    with TemporaryDirectory() as tmp_dir:
        var_output_path = tmp_dir / Path('var_only.nc4')
        submit_and_download(harmony_client, var_only_request, var_output_path)
        assert exists(var_output_path), f'Unsuccessful {request_name}.'
        assert nc4_matches_reference_hash_file(
            var_output_path,
            request_info['variable_subset_reference_file'],
        ), f'{request_name}: Output and reference files do not match'

    print_success(request_name)
else:
    print(
        f'SAMBAH is not configured for environment: "{harmony_environment}" - skipping test.'
    )

### SAMBAH: spatial request

This is a request that includes:
- spatial bounding box subsetting
- concatenating granules within (i.e., `extend`) and across scans

In [None]:
if request_info is not None:
    spatial_only_request = Request(
        collection=request_info['collection'],
        extend=['mirror_step'],
        concatenate=True,
        granule_id=request_info['granule_id'],
        spatial=request_info['spatial'],
    )

    request_name = 'SAMBAH spatial request'

    with TemporaryDirectory() as tmp_dir:
        spatial_output_path = tmp_dir / Path('spatial_only.nc4')

        submit_and_download(harmony_client, spatial_only_request, spatial_output_path)
        assert exists(spatial_output_path), f'Unsuccessful {request_name}.'
        assert nc4_matches_reference_hash_file(
            spatial_output_path,
            request_info['spatial_subset_reference_file'],
        ), f'{request_name}: Output and reference files do not match'

    print_success(request_name)
else:
    print(
        f'SAMBAH is not configured for environment: "{harmony_environment}" - skipping test.'
    )

### SAMBAH: no subsetting required, single file

This is a request that includes:
- a single granule
- no subsetting

In [None]:
if request_info is not None:
    all_data_request = Request(
        collection=request_info['collection'],
        extend=['mirror_step'],
        concatenate=True,
        granule_id=request_info['granule_id'][:1],
    )

    request_name = 'SAMBAH no subset single file request'

    with TemporaryDirectory() as tmp_dir:
        all_data_output_path = tmp_dir / Path('all_data.nc4')

        submit_and_download(harmony_client, all_data_request, all_data_output_path)
        assert exists(all_data_output_path), f'Unsuccessful {request_name}.'
        assert nc4_matches_reference_hash_file(
            all_data_output_path,
            request_info['no_subset_reference_file'],
        ), f'{request_name}: Output and reference files do not match'

    print_success(request_name)
else:
    print(
        f'SAMBAH is not configured for environment: "{harmony_environment}" - skipping test.'
    )