# Regression test suite for CASPER:

This notebook runs a suite of regression tests against the Harmony CASPER Service. These tests use sample TEMPO NO₂ granules to verify that CASPER logic works as expected and that output matches reference data in `reference_files/`.


### Features of CASPER include:

* Converting NetCDF files to CSV files. CASPER groups the NetCDF variables by their dimensional schema and writes each schema to a separate CSV file.
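
To illustrate what grouping by dimensional schema means, here is a minimal sketch. It is not CASPER's implementation: the variable and dimension names are hypothetical, and a plain dictionary stands in for the metadata a NetCDF reader would provide.

```python
from collections import defaultdict

# Hypothetical variable-to-dimensions mapping; a real granule would supply
# this information through a NetCDF reader.
variable_dimensions = {
    'latitude': ('mirror_step', 'xtrack'),
    'longitude': ('mirror_step', 'xtrack'),
    'time': ('mirror_step',),
    'vertical_column': ('mirror_step', 'xtrack'),
}


def group_by_dimensions(variable_dims):
    """Group variable names by their ordered tuple of dimension names."""
    groups = defaultdict(list)
    for name, dims in variable_dims.items():
        groups[dims].append(name)
    return dict(groups)


for dims, names in group_by_dimensions(variable_dimensions).items():
    # Under this scheme, each group would become one CSV file.
    print('_'.join(dims), '->', names)
```

In this sketch, variables sharing `('mirror_step', 'xtrack')` land in one group and `time` lands in another, so two CSV files would result.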

### Prerequisites

The dependencies for this notebook are listed in [environment.yaml](./environment.yaml). To run the tests locally, create the papermill environment used by the automated regression testing suite:

`conda env create -f ./environment.yaml && conda activate papermill-casper`

A `.netrc` file must also be located in the `test` directory of this repository.
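
Before running the notebook, it can help to confirm that the `.netrc` file contains credentials for the expected login host. The sketch below uses only the standard library. The helper name `has_netrc_entry` is hypothetical, and the host `uat.urs.earthdata.nasa.gov` is an assumption about which Earthdata Login host the UAT environment authenticates against.

```python
from netrc import netrc
from pathlib import Path


def has_netrc_entry(netrc_path, host):
    """Return True if the .netrc file at netrc_path has credentials for host."""
    path = Path(netrc_path)
    if not path.is_file():
        return False
    # authenticators() returns None when the host has no entry
    # (and the file has no `default` entry).
    return netrc(str(path)).authenticators(host) is not None


# Assumed host name; adjust to the environment under test.
print(has_netrc_entry('.netrc', 'uat.urs.earthdata.nasa.gov'))
```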


# Begin regression tests


## Import required packages:

In [39]:
from os import listdir, remove
from hashlib import sha256
from pathlib import Path
from zipfile import ZipFile
from shutil import rmtree
from tempfile import TemporaryDirectory
from json import load
from harmony import Client, Collection, Request, Environment

### Import shared utility functions:

In [40]:
import sys

sys.path.append('../shared_utils')
from utilities import print_success, submit_and_download

## Set default parameters:

`papermill` requires default values for the parameters used in the workflow. In this case, the only parameter is `harmony_host_url`.
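
As a rough illustration of how the default can be overridden at run time, the sketch below builds a `papermill` command line that passes `harmony_host_url` with the `-p` option. The notebook file names and the helper `build_papermill_command` are hypothetical; the command is only printed, not executed.

```python
import shlex


def build_papermill_command(notebook, output_notebook, harmony_host_url):
    """Build a papermill CLI invocation that overrides harmony_host_url."""
    return [
        'papermill',
        notebook,
        output_notebook,
        '-p', 'harmony_host_url', harmony_host_url,
    ]


# Hypothetical notebook names, shown for illustration only.
command = build_papermill_command(
    'CASPER_Regression.ipynb',
    'CASPER_Regression_output.ipynb',
    'https://harmony.sit.earthdata.nasa.gov',
)
print(shlex.join(command))
```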

In [41]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'

### Identify Harmony environment (for easier reference):

In [42]:
host_environment = {
    'http://localhost:3000': Environment.LOCAL,
    'https://harmony.sit.earthdata.nasa.gov': Environment.SIT,
    'https://harmony.uat.earthdata.nasa.gov': Environment.UAT,
    'https://harmony.earthdata.nasa.gov': Environment.PROD,
}

harmony_environment = host_environment.get(harmony_host_url)
if harmony_environment is not None:
    harmony_client = Client(env=harmony_environment)

# Begin regression tests:

CASPER is currently deployed to Sandbox, SIT, and UAT.

### Set up environment-dependent variables for CASPER:

Define the collections and granules used for testing in non-production environments:

In [43]:
casper_non_prod_test_data = {
    'C1274178436-LARC_CLOUD': {
        'collection': Collection(id='C1274178436-LARC_CLOUD'),
        # granule_id: 'G1276518102-LARC_CLOUD'
        'granule_name': 'TEMPO_NO2_L2_V04_20250913T000441Z_S015G04.nc',
    },
    'C1262899916-LARC_CLOUD': {
        'collection': Collection(id='C1262899916-LARC_CLOUD'),
        # granule_id: 'G1273153873-LARC_CLOUD'
        'granule_name': 'TEMPO_NO2_L2_V03_20250309T000257Z_S014G06.nc',
    },
}

casper_test_data_by_environment = {
    Environment.LOCAL: casper_non_prod_test_data,
    Environment.SIT: casper_non_prod_test_data,
    Environment.UAT: casper_non_prod_test_data,
}

if harmony_environment in casper_test_data_by_environment:
    casper_test_data = casper_test_data_by_environment[harmony_environment]
else:
    casper_test_data = None

## Test: CASPER TEMPO NO2 conversions

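Submit one granule from each of the two test collections, then validate every CSV file in the output against the SHA-256 hashes stored in `reference_files/`.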
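
The test below expects each reference JSON file to map an output CSV file name to its SHA-256 hex digest. As a sketch of how such a file could be regenerated from a directory of verified CSV output, under the assumption that this is the full format of the reference files (the helper names are hypothetical):

```python
from hashlib import sha256
from json import dump
from pathlib import Path


def build_reference_hashes(csv_directory):
    """Map each file name in csv_directory to its SHA-256 hex digest."""
    hashes = {}
    for path in sorted(Path(csv_directory).iterdir()):
        if path.is_file():
            hashes[path.name] = sha256(path.read_bytes()).hexdigest()
    return hashes


def write_reference_file(csv_directory, reference_path):
    """Write the name-to-hash mapping for csv_directory as a JSON file."""
    with open(reference_path, 'w') as file:
        dump(build_reference_hashes(csv_directory), file, indent=2)
```

Reading each file fully with `read_bytes()` keeps the sketch short; for very large CSV files, the chunked approach used in the test is the safer choice.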

In [44]:
def calculate_hash(filename):
    """Return the SHA-256 hex digest of a file, read in 8 KiB chunks."""
    sha256_hash = sha256()
    with open(filename, 'rb') as file:
        while chunk := file.read(8192):
            sha256_hash.update(chunk)
    return sha256_hash.hexdigest()


if casper_test_data is not None:
    # Iterate over the test data selected for the current environment
    for test_data in casper_test_data.values():
        request = Request(
            collection=test_data['collection'],
            granule_name=test_data['granule_name'],
            format='text/csv',
        )
        assert request.is_valid()
        job_id = harmony_client.submit(request)

        with TemporaryDirectory() as temp_dir:
            results = harmony_client.download_all(job_id, directory=temp_dir, overwrite=True)
            # The first downloaded result is expected to be a zip archive of CSV files
            zip_file_name = [f.result() for f in results][0]

            # The reference JSON maps each expected CSV file name to its SHA-256 hash
            reference_name = test_data['granule_name'].split('.')[0]
            with open(f'reference_files/{reference_name}.json', 'r') as file:
                ref_hashes = load(file)

            # Extract next to the archive, in a directory named after it
            extract_dir = Path(zip_file_name).with_suffix('')
            with ZipFile(zip_file_name, 'r') as zip_ref:
                zip_ref.extractall(extract_dir)

            output_files = sorted(listdir(extract_dir))
            assert len(output_files) == len(ref_hashes)
            for output_file in output_files:
                assert ref_hashes[output_file] == calculate_hash(extract_dir / output_file)

            # Remove directory and zip file for next test
            rmtree(extract_dir)
            remove(zip_file_name)
    print_success('CASPER NetCDF conversion requests')
else:
    print('Skipping test: CASPER NetCDF conversion requests')

/var/folders/nw/12mj43fx61v95q0s0f808hs00000gp/T/tmp6zsk4d6p/9157744_TEMPO_NO2_L2_V04_20250913T000441Z_S015G04.zip
/var/folders/nw/12mj43fx61v95q0s0f808hs00000gp/T/tmpew6f4uwc/9157749_TEMPO_NO2_L2_V03_20250309T000257Z_S014G06.zip
Success: CASPER NetCDF conversion requests
