# Regression test suite for the Variable Subsetter backend Harmony service:

This notebook provides condensed examples of using Harmony to make requests against the Variable Subsetter service developed and managed by the Data Services team on the Transformation Train. The service uses the CF-Conventions to retrieve all requested variables from OPeNDAP, along with any other variables required to make the output product usable in downstream processing (e.g., coordinate and dimension variables). It can be used with any OPeNDAP-enabled collection that adheres to the Climate and Forecast (CF) metadata conventions.

The data retrieved from OPeNDAP will be in NetCDF-4 format.
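As a rough illustration of the CF-Conventions lookup described above, the sketch below collects the variables named in a requested variable's `coordinates` attribute. The `find_required_variables` function and the attribute dictionary are hypothetical stand-ins, not the service's actual implementation, and the attribute value shown is illustrative rather than read from a granule.

```python
def find_required_variables(requested, attributes):
    """Return the requested variables plus every variable they reference.

    `attributes` maps a variable path to its metadata attributes. Only the
    CF `coordinates` attribute (a space-separated list of variable names)
    is followed in this sketch.
    """
    required = set()
    to_visit = list(requested)

    while to_visit:
        variable = to_visit.pop()
        if variable in required:
            continue
        required.add(variable)

        group = variable.rsplit('/', 1)[0]
        references = attributes.get(variable, {}).get('coordinates', '').split()
        for reference in references:
            # Treat relative names as siblings in the same group.
            path = reference if reference.startswith('/') else f'{group}/{reference}'
            to_visit.append(path)

    return required


# Illustrative metadata only (an assumption, not taken from a real file).
example_attributes = {
    '/gt1l/land_segments/dem_h': {'coordinates': 'delta_time latitude longitude'},
}

print(sorted(find_required_variables(['/gt1l/land_segments/dem_h'],
                                     example_attributes)))
```

A fuller treatment would likely also follow dimension names and other CF attributes; this sketch follows `coordinates` only to keep the idea visible.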

Note: several configuration tips were taken from [this blog post](https://towardsdatascience.com/introduction-to-papermill-2c61f66bea30).

## Prerequisites

The dependencies for this notebook are listed in the [environment.yaml](./environment.yaml). To test or install locally, create the papermill environment used in the automated regression testing suite:

`conda env create -f ./environment.yaml && conda activate papermill`

A `.netrc` file must also be located in the `test` directory of this repository.
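Before running the notebook, it can help to confirm that the `.netrc` file contains an entry for the expected login host. The sketch below uses only the Python standard library; the `has_netrc_entry` helper is hypothetical, and the host you check depends on your Harmony environment (an assumption to verify for your setup).

```python
from netrc import netrc


def has_netrc_entry(netrc_path, host):
    """Return True if the `.netrc` file at `netrc_path` has an entry for `host`."""
    try:
        credentials = netrc(netrc_path).authenticators(host)
    except FileNotFoundError:
        # A missing file means there are no credentials to use.
        return False
    return credentials is not None
```

For example, `has_netrc_entry('.netrc', 'uat.urs.earthdata.nasa.gov')` would check for a UAT Earthdata Login entry, assuming that is the host your environment authenticates against. Note that a `default` entry in the file matches any host.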

## Import requirements:

In [None]:
from os.path import exists

from harmony import Client, Collection, Environment, Request
import numpy as np

from utilities import (compare_results_to_reference_file, print_success,
                       remove_results_files, submit_and_download)

## Set default parameters:

`papermill` requires default values for the parameters used in the workflow. In this case, the only parameter is `harmony_host_url`.

In [None]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'
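When the notebook is run through `papermill`, this default can be overridden with the `-p` option of the command line interface. The sketch below only builds and prints such a command; the `build_papermill_command` helper and both notebook file names are hypothetical.

```python
import shlex


def build_papermill_command(input_notebook, output_notebook, parameters):
    """Build a `papermill` command that overrides notebook parameters."""
    command = ['papermill', input_notebook, output_notebook]
    for name, value in parameters.items():
        # `-p NAME VALUE` sets a single parameter for the run.
        command.extend(['-p', name, str(value)])
    return command


command = build_papermill_command(
    'VariableSubsetter_Regression.ipynb',  # hypothetical input notebook name
    'Results.ipynb',                       # hypothetical output notebook name
    {'harmony_host_url': 'https://harmony.sit.earthdata.nasa.gov'},
)
print(shlex.join(command))
```

The printed command can then be run in a shell from the `papermill` environment.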

### Identify Harmony environment (for easier reference):

In [None]:
host_environment = {'http://localhost:3000': Environment.LOCAL,
                    'https://harmony.sit.earthdata.nasa.gov': Environment.SIT,
                    'https://harmony.uat.earthdata.nasa.gov': Environment.UAT,
                    'https://harmony.earthdata.nasa.gov': Environment.PROD}

harmony_environment = host_environment.get(harmony_host_url)

if harmony_environment is not None:
    harmony_client = Client(env=harmony_environment)

## Variable Subsetter

The variable subsetter is currently only configured for collections in UAT.

The granule selected is the smallest in the ATL08 collection, to improve performance of tests.

In [None]:
var_subsetter_non_prod_information = {'collection': Collection(id='C1234714698-EEDTEST'),
                                      'granule_id': 'G1238479209-EEDTEST'}

var_subsetter_env = {Environment.LOCAL: var_subsetter_non_prod_information,
                     Environment.SIT: var_subsetter_non_prod_information,
                     Environment.UAT: var_subsetter_non_prod_information}

# `dict.get` returns `None` for environments without a configured collection
var_subsetter_info = var_subsetter_env.get(harmony_environment)

### Variable Subsetter request, no Int64 variables

This request should retrieve the requested `/gt1l/land_segments/dem_h` variable alongside the following supporting variables:

* `/gt1l/land_segments/delta_time`
* `/gt1l/land_segments/latitude`
* `/gt1l/land_segments/longitude`

In [None]:
if var_subsetter_info is not None:
    no_int64_file_name = 'var_subsetter_no_int64.nc4'
    no_int64_request = Request(collection=var_subsetter_info['collection'],
                               granule_id=[var_subsetter_info['granule_id']],
                               variables=['/gt1l/land_segments/dem_h'])

    submit_and_download(harmony_client, no_int64_request, no_int64_file_name)
    assert exists(no_int64_file_name), 'Unsuccessful non-Int64 Variable Subsetter request.'

    compare_results_to_reference_file(no_int64_file_name,
                                      'reference_files/var_subsetter_no_int64_reference.nc4')

    print_success('Variable Subsetter non-Int64 request.')
else:
    print(f'The Variable Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')
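The test above relies on `compare_results_to_reference_file`, whose implementation lives in the `utilities` module and is not shown here. As a lighter, hypothetical complement, the expected contents listed for this request can be expressed as a set comparison. The `find_missing_variables` helper is an assumption for illustration, and the `found` list is hard-coded rather than read from the output file.

```python
def find_missing_variables(expected, found):
    """Return the expected variable paths that are absent from `found`."""
    return sorted(set(expected) - set(found))


expected = [
    '/gt1l/land_segments/dem_h',
    '/gt1l/land_segments/delta_time',
    '/gt1l/land_segments/latitude',
    '/gt1l/land_segments/longitude',
]

# Hard-coded stand-in for the variable paths found in an output file.
found = [
    '/gt1l/land_segments/dem_h',
    '/gt1l/land_segments/delta_time',
    '/gt1l/land_segments/latitude',
]

print(find_missing_variables(expected, found))
```

An empty list would indicate that every expected variable is present.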

### Variable Subsetter request with Int64 variables

This request should retrieve the `/gt1l/signal_photons/classed_pc_flag` variable and the 6 supporting variables listed here. Some of these are Int64, a data type that the DAP2 protocol does not support:

* `/gt1l/signal_photons/delta_time`
* `/gt1l/land_segments/ph_ndx`
* `/gt1l/land_segments/n_seg_ph`
* `/gt1l/land_segments/delta_time`
* `/gt1l/land_segments/latitude`
* `/gt1l/land_segments/longitude`

In [None]:
if var_subsetter_info is not None:
    int64_file_name = 'var_subsetter_int64.nc4'
    int64_request = Request(collection=var_subsetter_info['collection'],
                            granule_id=[var_subsetter_info['granule_id']],
                            variables=['/gt1l/signal_photons/classed_pc_flag'])

    submit_and_download(harmony_client, int64_request, int64_file_name)
    assert exists(int64_file_name), 'Unsuccessful Int64 Variable Subsetter request.'

    compare_results_to_reference_file(int64_file_name,
                                      'reference_files/var_subsetter_int64_reference.nc4')

    print_success('Variable Subsetter Int64 request.')
else:
    print(f'The Variable Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')
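DAP2 defines no 64-bit integer type; its widest integer types are 32 bits. The sketch below is illustrative only and is not part of the service: it checks whether a set of values would survive being stored as signed 32-bit integers. The `fits_in_int32` helper and the sample values are made up for this example.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_in_int32(values):
    """Return True if every value is within the signed 32-bit integer range."""
    return all(INT32_MIN <= value <= INT32_MAX for value in values)


print(fits_in_int32([0, 1_000, 2_147_483_647]))  # True
print(fits_in_int32([0, 3_000_000_000]))         # False
```

Values outside this range need a 64-bit type, which is why Int64 variables require special handling when the data are retrieved over DAP2.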

### Variable Subsetter, all variables

Make a request for "all" variables. This should retrieve the entire file, with all the variables from the original source granule.

**2023-01-12: This test is currently disabled as the output contains a variable type for `/ancillary_data/control` that is not currently handled by OPeNDAP (an array of string types).**

In [None]:
# if var_subsetter_info is not None:
#     all_variables_file_name = 'var_subsetter_all_vars.nc4'
#     all_variables_request = Request(collection=var_subsetter_info['collection'],
#                                     granule_id=[var_subsetter_info['granule_id']])
#
#     submit_and_download(harmony_client, all_variables_request, all_variables_file_name)
#     assert exists(all_variables_file_name), 'Unsuccessful Variable Subsetter all variable request.'
#
#     compare_results_to_reference_file(all_variables_file_name, 'reference_files/to_be_retrieved...')
#
#     print_success('Variable Subsetter all variable request.')
# else:
#     print(f'The Variable Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')

## Remove results files:

In [None]:
remove_results_files()