# Regression test suite for the Harmony Subsetter with Multi-dimensional Concatenator:

<!-- This notebook provides condensed examples of using Harmony to make requests against the Variable Subsetter services developed and managed by the Data Services team on the Transformation Train. This service makes use of CF-Conventions to retrieve all requested variable from OPeNDAP, along with all those other variables required to make the output product usable in downstream processing (e.g., coordinate and dimension variables). This service can be used with any OPeNDAP-enabled collection that adheres to the Climate and Forecast metadata conventions. -->

Data retrieved from the service chain are returned in NetCDF-4 format.

## Prerequisites

<!-- The dependencies for this notebook are listed in the environment.yaml. To test or install locally, create the papermill environment used in the automated regression testing suite:

conda env create -f ./environment.yaml && conda activate papermill-variable-subsetter -->

A `.netrc` file containing Earthdata Login credentials must also be located in the test directory of this repository.
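Before running the suite, it can help to confirm that the `.netrc` file is present and parseable. The following is a minimal sketch using only the Python standard library. The helper name `has_netrc_entry` is hypothetical, and the host `uat.urs.earthdata.nasa.gov` is an assumption about which Earthdata Login host serves the UAT environment; adjust it for the environment under test.

```python
from netrc import netrc
from pathlib import Path

# Assumption: Earthdata Login host used by the Harmony UAT environment.
EDL_HOST = 'uat.urs.earthdata.nasa.gov'


def has_netrc_entry(netrc_path, host):
    """Return True when `netrc_path` exists and holds credentials for `host`."""
    path = Path(netrc_path)
    if not path.is_file():
        return False

    # `authenticators` returns a (login, account, password) tuple, or None
    # when neither the host nor a `default` entry is present in the file.
    return netrc(str(path)).authenticators(host) is not None
```

For example, `has_netrc_entry('.netrc', EDL_HOST)` could be checked at the top of the notebook so that a missing credential fails early with a clear message rather than during the first request.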

## Import requirements:

In [1]:
from os.path import exists

from harmony import Client, Collection, Environment, Request
import numpy as np

from utilities import (compare_results_to_reference_file, print_success,
                       remove_results_files, submit_and_download)

In [2]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'

## Identify Harmony environment (for easier reference):

In [3]:
host_environment = {'http://localhost:3000': Environment.LOCAL,
                    'https://harmony.sit.earthdata.nasa.gov': Environment.SIT,
                    'https://harmony.uat.earthdata.nasa.gov': Environment.UAT,
                    'https://harmony.earthdata.nasa.gov': Environment.PROD}

harmony_environment = host_environment.get(harmony_host_url)

if harmony_environment is not None:
    harmony_client = Client(env=harmony_environment)
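The dictionary lookup above matches the host URL exactly, so a URL supplied with a trailing slash would resolve to `None` and every test would be skipped. The sketch below shows one way to normalise the URL first. It is an illustration only: the `Environment` enum here is a stand-in for `harmony.Environment` so that the block runs without harmony-py installed, and `resolve_environment` is a hypothetical helper name.

```python
from enum import Enum


class Environment(Enum):
    """Stand-in for `harmony.Environment` (assumption, for illustration)."""
    LOCAL = 'local'
    SIT = 'sit'
    UAT = 'uat'
    PROD = 'prod'


HOST_ENVIRONMENT = {
    'http://localhost:3000': Environment.LOCAL,
    'https://harmony.sit.earthdata.nasa.gov': Environment.SIT,
    'https://harmony.uat.earthdata.nasa.gov': Environment.UAT,
    'https://harmony.earthdata.nasa.gov': Environment.PROD,
}


def resolve_environment(host_url):
    """Return the environment for `host_url`, or None for an unknown host."""
    # Strip trailing slashes so 'https://host/' and 'https://host' match.
    return HOST_ENVIRONMENT.get(host_url.rstrip('/'))
```

With this helper, `resolve_environment('https://harmony.uat.earthdata.nasa.gov/')` resolves to the UAT environment, while an unrecognised host still yields `None`.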

# Begin regression tests:

## Concatenation Service Chain

The Subsetter with Multi-dimensional Concatenator is currently activated only for collections in the UAT environment. Requests are made against granules from the TEMPO L2 $NO_2$ and Formaldehyde collections, the only two collections currently associated with the service chain.

In [5]:
concatenator_chain_non_prod_information = {'collection': 'C1254854453-LARC_CLOUD'}

concatenator_chain_env = {Environment.UAT: concatenator_chain_non_prod_information}

# `dict.get` returns None when the environment has no configured collection.
concatenator_chain_info = concatenator_chain_env.get(harmony_environment)

## Concatenator request:

This request retrieves a single concatenated file from 12 TEMPO granules. It uses Batchee, Stitchee, and CONCISE to extend the existing `mirror_step` dimension and to concatenate along a new `subset_index` dimension.
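The difference between extending an existing dimension and concatenating along a new one can be illustrated with plain NumPy arrays. This is only an analogy under stated assumptions: the array sizes and the second dimension are made up for illustration, and the real services also handle groups, coordinates, and metadata that this sketch ignores.

```python
import numpy as np

# Two hypothetical granules, each with shape (mirror_step=2, other=3).
granule_a = np.zeros((2, 3))
granule_b = np.ones((2, 3))

# Extending an existing dimension: the arrays are joined end to end, so the
# first dimension grows from 2 to 4 and the number of dimensions is unchanged.
extended = np.concatenate([granule_a, granule_b], axis=0)

# Concatenating along a new dimension: a new leading axis (analogous to
# `subset_index`) is created with one entry per input.
stacked = np.stack([granule_a, granule_b], axis=0)

print(extended.shape)  # (4, 3)
print(stacked.shape)   # (2, 2, 3)
```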

In [6]:
if concatenator_chain_info is not None:
    concatenator_chain_output_file_name = 'Concatenation_Result.nc4'
    concatenator_chain_request = Request(
        collection=Collection(id=concatenator_chain_info['collection']),
        concatenate=True,
        extend="mirror_step",
        max_results=12
    )

    assert concatenator_chain_request.is_valid()

    submit_and_download(harmony_client, concatenator_chain_request, concatenator_chain_output_file_name)
    assert exists(concatenator_chain_output_file_name), 'Unsuccessful Subsetter-Concatenation request.'

    # compare_results_to_reference_file(
    #     single_var_file_name,
    #     'reference_files/var_subsetter_single_var_reference.nc4'
    # )

    print_success('Subsetter-Concatenation request.')
else:
    print(f'The Subsetter-Concatenation service chain is not configured for environment: "{harmony_environment}" - skipping test.')

job-id: 91ad64fa-c6c0-4385-bc74-2c28ca555fbc
C1254854453-LARC_CLOUD_merged.nc4
Downloaded: C1254854453-LARC_CLOUD_merged.nc4
Saved output to: Concatenation_Result.nc4
Success: Subsetter-Concatenation request.


### Inspect results

In [7]:
from netCDF4 import Dataset, Group, Variable
import xarray as xr

In [8]:
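# Only inspect the output when the request above actually ran; otherwise
# the output file name is undefined and this cell would raise a NameError.
if concatenator_chain_info is not None:
    print(xr.open_dataset(concatenator_chain_output_file_name))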

## Concatenator with variable subsetting request:

This request combines the concatenation with subsetting for two variables.

In [9]:
if concatenator_chain_info is not None:
    concatenator_chain_output_file_name = 'Subset-Concatenation_Result.nc4'
    concatenator_chain_request = Request(
        collection=Collection(id=concatenator_chain_info['collection']),
        variables=["/product/vertical_column_total", "/product/vertical_column_troposphere"],
        concatenate=True,
        extend="mirror_step",
        max_results=12
    )

    assert concatenator_chain_request.is_valid()

    submit_and_download(harmony_client, concatenator_chain_request, concatenator_chain_output_file_name)
    assert exists(concatenator_chain_output_file_name), 'Unsuccessful Subsetter-Concatenation request.'

    # compare_results_to_reference_file(
    #     single_var_file_name,
    #     'reference_files/var_subsetter_single_var_reference.nc4'
    # )

    print_success('Subsetter-Concatenation request.')
else:
    print(f'The Subsetter-Concatenation service chain is not configured for environment: "{harmony_environment}" - skipping test.')

job-id: 437c0d40-3ec5-45e2-b5c4-7bc8569af366
Downloaded: C1254854453-LARC_CLOUD_merged.nc4
Saved output to: Subset-Concatenation_Result.nc4
Success: Subsetter-Concatenation request.


In [None]:
# with Dataset(concatenator_chain_output_file_name) as results_ds:
#     # with Dataset(ref_file) as ref_ds:
#     #     compare_group_to_reference(results_ds, ref_ds)
#     # print(results_ds.ncattrs())
#     print(results_ds.dimensions)

In [11]:
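# Only inspect the output when the request above actually ran; otherwise
# the output file name is undefined and this cell would raise a NameError.
if concatenator_chain_info is not None:
    print(xr.open_dataset(concatenator_chain_output_file_name, group="product"))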
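The `group` argument used when opening the output corresponds to the group portion of the requested variable paths, such as `/product/vertical_column_total`. The sketch below shows one way to derive the group and variable name from a full path. The helper name `split_variable_path` is hypothetical and is not part of the regression test utilities.

```python
from posixpath import basename, dirname


def split_variable_path(full_path):
    """Split '/group/name' into ('group', 'name').

    Variables in the root group return an empty string for the group.
    """
    # `dirname` keeps the leading slash, so strip it to get a group name
    # in the form 'product' rather than '/product'.
    group = dirname(full_path).strip('/')
    return group, basename(full_path)


print(split_variable_path('/product/vertical_column_total'))
# ('product', 'vertical_column_total')
```

Each requested variable could then be checked by opening its group and confirming that the variable name is present in the resulting dataset.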