# Regression test suite for SAMBAH:

This notebook provides condensed examples of using Harmony to make requests against the [Subsetter And Multi-dimensional Batched Aggregation in Harmony (SAMBAH)](https://stitchee.readthedocs.io/en/latest/sambah_readme/) service, which was developed to process Level 2 data from the [Tropospheric Emissions: Monitoring of Pollution (TEMPO)](https://asdc.larc.nasa.gov/project/TEMPO) instrument.

### Features of SAMBAH include:

* Variable subsetting, including required variables.
* Temporal subsetting.
* Bounding box spatial subsetting.
* Concatenation within TEMPO east-west scans.
* Concatenation across scans.

### Prerequisites

The dependencies for this notebook are listed in [environment.yaml](./environment.yaml). To test or install locally, create the papermill environment used by the automated regression test suite:

`conda env create -f ./environment.yaml && conda activate papermill-sambah`

A `.netrc` file must also be located in the `test` directory of this repository.
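
As a quick sanity check before running the notebook, the sketch below uses Python's standard `netrc` module to confirm that a `.netrc` file contains an entry for a given host. The helper name `netrc_has_entry` is hypothetical, and the host names in the usage comment are assumptions based on the public Earthdata Login hosts; adjust them to match the environment being tested.

```python
from netrc import netrc
from pathlib import Path


def netrc_has_entry(netrc_path, host):
    """Hypothetical helper: True if the .netrc file has credentials for `host`."""
    path = Path(netrc_path)
    if not path.is_file():
        return False
    # authenticators() returns (login, account, password), or None when the
    # host is missing and the file has no `default` entry.
    return netrc(str(path)).authenticators(host) is not None


# Example usage (path and hosts are assumptions):
# netrc_has_entry('.netrc', 'uat.urs.earthdata.nasa.gov')  # UAT
# netrc_has_entry('.netrc', 'urs.earthdata.nasa.gov')      # production
```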

# Import required packages:

In [None]:
import sys

sys.path.append('../shared_utils')
from utilities import (
    print_success,
    submit_and_download,
)

from datetime import datetime
from os.path import exists

from harmony import BBox, Client, Collection, Environment, Request

from local_utilities import (
    compare_results_to_reference_file,
    remove_results_files,
)

## Set default parameters:

`papermill` requires default values for the parameters used in the workflow. In this case, the only parameter is `harmony_host_url`.

In [None]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'

### Identify Harmony environment (for easier reference):

In [None]:
host_environment = {
    'http://localhost:3000': Environment.LOCAL,
    'https://harmony.sit.earthdata.nasa.gov': Environment.SIT,
    'https://harmony.uat.earthdata.nasa.gov': Environment.UAT,
    'https://harmony.earthdata.nasa.gov': Environment.PROD,
}

harmony_environment = host_environment.get(harmony_host_url)

if harmony_environment is not None:
    harmony_client = Client(env=harmony_environment)
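
Because the lookup above is an exact string match, a host URL that differs only by a trailing slash resolves to `None`, and no client is created. The sketch below illustrates this with a stand-in `Env` enum (used so the example runs without `harmony` installed) and a hypothetical `resolve_environment` helper that normalizes the URL first. Neither name is part of this notebook or of `harmony-py`.

```python
from enum import Enum


class Env(Enum):
    """Stand-in for harmony.Environment, for illustration only."""

    LOCAL = 'local'
    SIT = 'sit'
    UAT = 'uat'
    PROD = 'prod'


HOSTS = {
    'http://localhost:3000': Env.LOCAL,
    'https://harmony.sit.earthdata.nasa.gov': Env.SIT,
    'https://harmony.uat.earthdata.nasa.gov': Env.UAT,
    'https://harmony.earthdata.nasa.gov': Env.PROD,
}


def resolve_environment(host_url):
    """Hypothetical helper: look up a host URL, ignoring trailing slashes."""
    return HOSTS.get(host_url.strip().rstrip('/'))


# An exact-match lookup misses a URL with a trailing slash...
print(HOSTS.get('https://harmony.uat.earthdata.nasa.gov/'))
# ...while the normalized lookup still finds the UAT environment.
print(resolve_environment('https://harmony.uat.earthdata.nasa.gov/'))
```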

The collection and granules used in the requests differ between the non-production environments (local, SIT and UAT) and production:

In [None]:
sambah_non_prod_information = {
    # TEMPO NO2 tropospheric, stratospheric, and total columns V03
    # https://cmr.uat.earthdata.nasa.gov/search/concepts/C1262899916-LARC_CLOUD.html
    'collection': Collection(id='C1262899916-LARC_CLOUD'),
    'granule_id': [
        'G1263137623-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240328T153020Z_S007G08.nc
        'G1263137394-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240328T153657Z_S007G09.nc
        'G1263137387-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240328T154353Z_S008G01.nc
        'G1263137388-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240328T155033Z_S008G02.nc
        'G1263137378-LARC_CLOUD',  # TEMPO_NO2_L2_V03_20240328T155713Z_S008G03.nc
    ],
}

sambah_prod_information = {
    # TEMPO NO2 tropospheric and stratospheric columns V03 (BETA)
    # https://cmr.earthdata.nasa.gov/search/concepts/C2930725014-LARC_CLOUD.html
    'collection': Collection(id='C2930725014-LARC_CLOUD'),
    'granule_id': [
        'G3205017917-LARC_CLOUD',
        'G3205017777-LARC_CLOUD',
        'G3205052670-LARC_CLOUD',
        'G3205053541-LARC_CLOUD',
        'G3205052319-LARC_CLOUD',
    ],
}

sambah_request_env = {
    Environment.LOCAL: sambah_non_prod_information,
    Environment.SIT: sambah_non_prod_information,
    Environment.UAT: sambah_non_prod_information,
    Environment.PROD: sambah_prod_information,
}

# Environments without SAMBAH test data resolve to None.
sambah_info = sambah_request_env.get(harmony_environment)

In [None]:
# Only build the request parameters when the environment is configured for
# SAMBAH; otherwise request_info is None and the tests below are skipped.
if sambah_info is not None:
    request_info = {
        'collection': sambah_info['collection'],
        'temporal': {
            'start': datetime(2024, 3, 28, 15, 34, 0),
            'stop': datetime(2024, 3, 28, 16, 0, 0),
        },
        'spatial': BBox(-170, 33, -10, 38),
        'granule_id': sambah_info['granule_id'],
        # The chosen variables include one variable from each group.
        # support_data/scattering_weights is a 3D variable.
        'variables': [
            'product/vertical_column_stratosphere',
            'qa_statistics/fit_rms_residual',
            'support_data/scattering_weights',
        ],
    }
else:
    request_info = None

# Begin regression tests:

SAMBAH is currently deployed to Sandbox, SIT, UAT and production.
Requests will be made against the TEMPO NO2 L2 V03 collection.

### SAMBAH: temporal, variable and bounding box subset request

This is a request that exercises the full range of SAMBAH options: 
- time range
- spatial bounding box
- variable subsetting
- concatenating within (i.e., `extend`) and across scans
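
To see how the time range interacts with the chosen granules, the sketch below parses the timestamp embedded in each non-production granule file name and checks whether it falls inside the requested window. It assumes that the timestamp in the file name marks the start of the granule; the helper name `granule_start` is hypothetical.

```python
from datetime import datetime

# File names taken from the comments in the non-production granule list.
GRANULE_FILES = [
    'TEMPO_NO2_L2_V03_20240328T153020Z_S007G08.nc',
    'TEMPO_NO2_L2_V03_20240328T153657Z_S007G09.nc',
    'TEMPO_NO2_L2_V03_20240328T154353Z_S008G01.nc',
    'TEMPO_NO2_L2_V03_20240328T155033Z_S008G02.nc',
    'TEMPO_NO2_L2_V03_20240328T155713Z_S008G03.nc',
]

# Temporal range used by the request.
START = datetime(2024, 3, 28, 15, 34, 0)
STOP = datetime(2024, 3, 28, 16, 0, 0)


def granule_start(file_name):
    """Parse the timestamp field, e.g. '20240328T153020Z', from a file name."""
    timestamp = file_name.split('_')[4]
    return datetime.strptime(timestamp, '%Y%m%dT%H%M%SZ')


# True where the file-name timestamp lies inside the requested window.
starts_in_window = [START <= granule_start(name) <= STOP for name in GRANULE_FILES]

for name, inside in zip(GRANULE_FILES, starts_in_window):
    print(f'{name}: starts inside window = {inside}')
```

Under that assumption, only the first granule begins before the window opens (15:30:20 versus 15:34:00), so it is the one the temporal subset would be expected to trim.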

In [None]:
if request_info is not None:
    temp_var_bbox_request = Request(
        collection=request_info['collection'],
        extend=['mirror_step'],
        concatenate=True,
        granule_id=request_info['granule_id'],
        temporal=request_info['temporal'],
        variables=request_info['variables'],
        spatial=request_info['spatial'],
    )

    request_name = 'SAMBAH temporal, variable, bounding box request'
    output_filename = 'temp_var_bbox.nc4'

    submit_and_download(harmony_client, temp_var_bbox_request, output_filename)
    assert exists(output_filename), f'Unsuccessful {request_name}.'

    compare_results_to_reference_file(output_filename)
    print_success(request_name)
else:
    print(
        f'SAMBAH is not configured for environment: "{harmony_environment}" - skipping test.'
    )

### SAMBAH: variable subset request, two files

This is a request that includes:
- variable subsetting
- concatenating two granules within (i.e., `extend`) a scan
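
The difference between concatenating within a scan and across scans can be read from the granule file names: a suffix such as `S007G08` appears to encode scan 7, granule 8. The sketch below assumes that naming convention and uses a hypothetical `scan_and_granule` helper to group the non-production file names by scan. The first two entries both fall in scan 7, which is consistent with the `[:2]` slice in this test producing a within-scan request.

```python
import re
from collections import defaultdict

# File names taken from the comments in the non-production granule list.
GRANULE_FILES = [
    'TEMPO_NO2_L2_V03_20240328T153020Z_S007G08.nc',
    'TEMPO_NO2_L2_V03_20240328T153657Z_S007G09.nc',
    'TEMPO_NO2_L2_V03_20240328T154353Z_S008G01.nc',
    'TEMPO_NO2_L2_V03_20240328T155033Z_S008G02.nc',
    'TEMPO_NO2_L2_V03_20240328T155713Z_S008G03.nc',
]

# Assumed convention: '_S<scan>G<granule>.nc' at the end of the file name.
SUFFIX = re.compile(r'_S(\d{3})G(\d{2})\.nc$')


def scan_and_granule(file_name):
    """Hypothetical helper: return (scan, granule) numbers from a file name."""
    match = SUFFIX.search(file_name)
    if match is None:
        raise ValueError(f'Unexpected file name: {file_name}')
    return int(match.group(1)), int(match.group(2))


granules_by_scan = defaultdict(list)
for name in GRANULE_FILES:
    scan, granule = scan_and_granule(name)
    granules_by_scan[scan].append(granule)

print(dict(granules_by_scan))
```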

In [None]:
if request_info is not None:
    var_only_request = Request(
        collection=request_info['collection'],
        extend=['mirror_step'],
        concatenate=True,
        granule_id=request_info['granule_id'][:2],
        variables=request_info['variables'],
    )

    request_name = 'SAMBAH variable request'
    output_filename = 'var_only.nc4'

    submit_and_download(harmony_client, var_only_request, output_filename)
    assert exists(output_filename), f'Unsuccessful {request_name}.'

    compare_results_to_reference_file(output_filename)
    print_success(request_name)
else:
    print(
        f'SAMBAH is not configured for environment: "{harmony_environment}" - skipping test.'
    )

### SAMBAH: spatial request

This is a request that includes:
- spatial bounding box subsetting
- concatenating granules within (i.e., `extend`) and across scans
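
The bounding box is given in west, south, east, north order, so `BBox(-170, 33, -10, 38)` describes a band from 33°N to 38°N that spans 170°W to 10°W. The sketch below uses a stand-in named tuple (so it runs without `harmony` installed) and a hypothetical `contains` helper to show which coordinates the box covers. It assumes the box does not cross the antimeridian, and it says nothing about how SAMBAH itself selects pixels.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Stand-in for harmony.BBox, using the same west/south/east/north order."""

    w: float
    s: float
    e: float
    n: float


def contains(box, lon, lat):
    """Hypothetical helper: True if (lon, lat) lies inside or on the box edge."""
    return box.w <= lon <= box.e and box.s <= lat <= box.n


request_box = Box(-170, 33, -10, 38)

print(contains(request_box, -77.0, 35.0))  # longitude and latitude both in range
print(contains(request_box, -77.0, 40.0))  # latitude is north of the box
```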

In [None]:
if request_info is not None:
    spatial_only_request = Request(
        collection=request_info['collection'],
        extend=['mirror_step'],
        concatenate=True,
        granule_id=request_info['granule_id'],
        spatial=request_info['spatial'],
    )

    request_name = 'SAMBAH spatial request'
    output_filename = 'spatial_only.nc4'

    submit_and_download(harmony_client, spatial_only_request, output_filename)
    assert exists(output_filename), f'Unsuccessful {request_name}.'

    compare_results_to_reference_file(output_filename)
    print_success(request_name)
else:
    print(
        f'SAMBAH is not configured for environment: "{harmony_environment}" - skipping test.'
    )

### SAMBAH: no subsetting required, single file

This is a request that includes:
- a single granule
- no subsetting

In [None]:
if request_info is not None:
    all_data_request = Request(
        collection=request_info['collection'],
        extend=['mirror_step'],
        concatenate=True,
        granule_id=request_info['granule_id'][:1],
    )

    request_name = 'SAMBAH no subset single file request'
    output_filename = 'all_data.nc4'

    submit_and_download(harmony_client, all_data_request, output_filename)
    assert exists(output_filename), f'Unsuccessful {request_name}.'

    compare_results_to_reference_file(output_filename)
    print_success(request_name)
else:
    print(
        f'SAMBAH is not configured for environment: "{harmony_environment}" - skipping test.'
    )

# Clean up test outputs:

In [None]:
remove_results_files()
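
The implementation of `remove_results_files` lives in `local_utilities` and is not shown in this notebook. As a rough sketch of one way such a cleanup could work, the hypothetical helper below deletes files matching a glob pattern; the `*.nc4` default is an assumption based on the output file names used by the tests above.

```python
from pathlib import Path


def remove_files_matching(directory, pattern='*.nc4'):
    """Hypothetical helper: delete matching files and return their names."""
    removed = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


# Example usage (the directory is an assumption):
# remove_files_matching('.')
```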