# Regression test suite for services managed by the Data Services team:

This notebook provides condensed examples of using Harmony to make requests against the services developed and managed by the Data Services team on the Transformation Train. These services currently include:

* Swath Projector (a.k.a. SWOT Reprojector): `sds/swot-reproject`. A service that projects L2 swath data to a grid.
* Variable Subsetter: `sds/variable-subsetter`. A service that extracts a subset of granule variables from OPeNDAP to provide a smaller, specific output product.
* Harmony OPeNDAP SubSetter (HOSS): `sds/HOSS`. A service for geographically-gridded collections, allowing variable and bounding-box spatial subsetting.
* MaskFill: `sds/maskfill`. A service that sets values outside of a user-defined GeoJSON shape to a fill value.

Note: several configuration tips were taken from [this blog post](https://towardsdatascience.com/introduction-to-papermill-2c61f66bea30).

## Prerequisites

The dependencies for this notebook are listed in [environment.yaml](./environment.yaml). To test or install locally, create the `papermill` environment used by the automated regression test suite:

`conda env create -f ./environment.yaml && conda activate papermill`

A `.netrc` file must also be located in the `test` directory of this repository.
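
As a rough sketch of what such a file might contain, the snippet below writes a throwaway `.netrc`-style file and parses it with Python's standard `netrc` module. The host name `uat.urs.earthdata.nasa.gov` and the credentials are assumptions made for illustration only; substitute the login host and account that apply to the environment being tested.

```python
from netrc import netrc
from os import remove
from tempfile import NamedTemporaryFile

# Assumed host and placeholder credentials, for illustration only.
example_host = 'uat.urs.earthdata.nasa.gov'
example_entry = f'machine {example_host} login example_user password example_password\n'

# Write the entry to a temporary file rather than touching a real `.netrc`.
with NamedTemporaryFile('w', suffix='.netrc', delete=False) as temporary_file:
    temporary_file.write(example_entry)
    temporary_path = temporary_file.name

# `authenticators` returns a (login, account, password) tuple for a known host.
login, _, password = netrc(temporary_path).authenticators(example_host)
remove(temporary_path)

print(f'Parsed login "{login}" for {example_host}')
```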

## Import requirements:

In [None]:
from io import BytesIO
from os import listdir, remove
from time import sleep
from typing import Dict, List, Optional, Tuple
from uuid import uuid4
import json

from h5py import File as H5File
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as np
import requests

## Set default parameters:

`papermill` requires a default value for every parameter used in the workflow. Here the only parameter is `harmony_host_url`.

In [None]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'

### Identify Harmony environment (for easier reference):

In [None]:
host_environment = {'https://harmony.sit.earthdata.nasa.gov': 'sit',
                    'https://harmony.uat.earthdata.nasa.gov': 'uat',
                    'https://harmony.earthdata.nasa.gov': 'prod'}

harmony_environment = host_environment.get(harmony_host_url, 'sandbox')

# Test data in SIT and UAT are hosted in a Harmony S3 bucket:
uat_staging_bucket = 'https://harmony.uat.earthdata.nasa.gov/service-results/harmony-uat-staging'
sit_staging_bucket = 'https://harmony.sit.earthdata.nasa.gov/service-results/harmony-sit-staging'

## Helper functions:

In [None]:
def print_error(error_string: str) -> None:
    """Print an error, with formatting for red text. """
    print(f'\033[91m{error_string}\033[0m')

In [None]:
def get_harmony_request_url(host_url: str, collection_concept_id: str,
                            variables: Optional[List[str]] = None) -> str:
    """ Construct a Harmony URL to make a request against the correct host, collection and variables. The parameters for
        the request will be specified using the `params` keyword argument of the `requests.get` function.
        If no variables are specified, all variables are requested.

    """
    # Avoid a mutable default argument; default to requesting all variables.
    if variables is None:
        variables = ['all']

    return f'{host_url}/{collection_concept_id}/ogc-api-coverages/1.0.0/collections/{",".join(variables)}/coverage/rangeset'

### Download helper functions:

The helper functions below could be removed in favour of `harmony-py` if the Sandbox environment (with a host name containing the Harmony Elastic Load Balancer) was compatible with the `harmony-py` `Client` class.
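
The asynchronous helpers below poll the Harmony `/jobs` endpoint and look for a link with `rel` equal to `data` once the job has succeeded. The following is a minimal offline sketch of that parsing step. The function name `parse_job_status` and the example status documents are hypothetical; they contain only the fields the helpers read (`status`, `links`, `rel`, `href`) and are not real Harmony responses.

```python
from typing import Dict, Tuple


def parse_job_status(status_json: Dict) -> Tuple[bool, str]:
    """ Return (completed, file_url) for a job status document. """
    status = status_json.get('status')

    if status == 'successful':
        # Take the first link marked as output data, or '' if there is none.
        file_url = next((link.get('href')
                         for link in status_json.get('links', [])
                         if link.get('rel') == 'data'), '')
        return True, file_url

    # A failed job is complete but has no output; anything else is still running.
    return status == 'failed', ''


# Hypothetical status documents, for illustration only.
example_running = {'status': 'running', 'links': []}
example_successful = {'status': 'successful',
                      'links': [{'rel': 'self', 'href': 'https://example.com/jobs/1234'},
                                {'rel': 'data', 'href': 'https://example.com/output.nc4'}]}

print(parse_job_status(example_running))
print(parse_job_status(example_successful))
```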

In [None]:
def request(url: str, extension: str = 'nc4', request_parameters: Optional[Dict] = None,
            files: Optional[Dict] = None) -> Tuple[str, bool]:
    """ Make a request to the specified URL, and save the response content to a file with the given extension. """
    if files is None:
        response = requests.get(url, params=request_parameters)
    else:
        response = requests.post(url, data=request_parameters, files=files)

    if response.ok:
        file_name = f'{uuid4()}.{extension}'
        with open(file_name, 'wb') as file_handler:
            file_handler.write(response.content)

    else:
        print_error(f'Request failed with status: {response.status_code}')
        print_error(response.text)
        file_name = ''

    return file_name, response.ok


def get_job_id(response: requests.Response) -> Optional[str]:
    """ Parse the job ID from the output of an asynchronous request. """
    response_json = response.json()
    return response_json.get('jobID')


def check_job_status(job_id: str) -> Tuple[bool, str]:
    """ Query the Harmony /jobs endpoint to check the status of the job.
        If completed, and successful, return the path to the output file.
        If completed, and unsuccessful, return an empty file path.
        If not completed, return False, and an empty file path.

    """
    empty_file_url = ''
    status_response = requests.get(f'{harmony_host_url}/jobs/{job_id}')
    
    if status_response.ok:
        status_json = status_response.json()
        status = status_json.get('status')

        if status == 'successful':
            completed = True
            file_url = next((link.get('href')
                             for link in status_json.get('links', [])
                             if link.get('rel') == 'data'), empty_file_url)
        elif status == 'failed':
            completed = True
            file_url = empty_file_url
        else:
            completed = False
            file_url = empty_file_url
    else:
        # Problem retrieving the status, avoid an infinite loop.
        completed = True
        file_url = empty_file_url
    
    return completed, file_url

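def request_async(url: str, extension: str = 'nc4', request_parameters: Optional[Dict] = None,
                  files: Optional[Dict] = None) -> Tuple[str, bool]:
    """ Make an asynchronous request to the specified URL and retrieve the output file that is produced. """
    success = True
    file_name = ''
    # Copy the parameters so the caller's dictionary is not mutated, and so
    # that omitting `request_parameters` does not raise a TypeError.
    request_parameters = {**(request_parameters or {}), 'forceAsync': True}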

    if files is None:
        job_response = requests.get(url, params=request_parameters)
    else:
        job_response = requests.post(url, data=request_parameters, files=files)
    
    if job_response.ok:
        job_id = get_job_id(job_response)

        if job_id is not None:
            print(f'Asynchronous job submitted, ID: {job_id}')
            job_completed = False
            while not job_completed:
                print('Job not completed, waiting 5 seconds...')
                sleep(5.0)
                job_completed, file_url = check_job_status(job_id)

            if file_url == '':
                print_error(f'Request failed: {job_id}')
                success = False
            else:
                print('Request successful, retrieving output file.')
                file_name, success = request(file_url, extension=extension)
        else:
            print_error('Failed to submit Harmony request')
            print_error(job_response.text)
            success = False
    else:
        print_error(f'Request failed with status: {job_response.status_code}')
        print_error(job_response.text)
        success = False

    return file_name, success

### Helper functions to check variables in output file:
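
The group traversal used by `variable_in_dataset` can be illustrated without a NetCDF-4 file. The sketch below uses nested dictionaries as a stand-in for groups; this is an assumption made purely for illustration, as a `netCDF4.Dataset` is not a dictionary. The names `split_variable_path`, `variable_in_tree` and `example_tree` are hypothetical.

```python
from typing import Dict, List, Tuple


def split_variable_path(variable_name: str) -> Tuple[List[str], str]:
    """ Split a full variable path into its group names and base name. """
    parts = variable_name.lstrip('/').split('/')
    return parts[:-1], parts[-1]


def variable_in_tree(tree: Dict, variable_name: str) -> bool:
    """ Walk down the nested groups, then check for the variable itself. """
    groups, base_name = split_variable_path(variable_name)
    node = tree

    for group in groups:
        if group not in node.get('groups', {}):
            return False
        node = node['groups'][group]

    return base_name in node.get('variables', set())


# Hypothetical file layout, loosely shaped like the ATL08 paths used later.
example_tree = {
    'variables': set(),
    'groups': {'gt1l': {'variables': set(),
                        'groups': {'land_segments': {'variables': {'dem_h', 'latitude'},
                                                     'groups': {}}}}}
}

print(split_variable_path('/gt1l/land_segments/dem_h'))
print(variable_in_tree(example_tree, '/gt1l/land_segments/dem_h'))
```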

In [None]:
def variable_in_dataset(dataset: Dataset, variable_name: str) -> bool:
    """ Check if a variable is present in a dataset. The variable name must
        be the full path, including groups it is nested in.

    """
    variable_bits = variable_name.lstrip('/').split('/')
    working_group = dataset
    
    while len(variable_bits) > 1:
        group = variable_bits.pop(0)
        if group in working_group.groups:
            working_group = working_group[group]
        else:
            return False

    variable_base_name = variable_bits.pop(0)
    return variable_base_name in working_group.variables
    

def all_variables_present(file_name: str, variable_list: List[str]) -> bool:
    """ Take a list of variables and ensure that all of them are present in the
        downloaded NetCDF-4 file.

    """
    with Dataset(file_name, 'r') as dataset:
        return all(variable_in_dataset(dataset, variable) for variable in variable_list)

### Plotting helper functions

In [None]:
def create_plot(variable_data, x_values, y_values, title=None, colourbar_units=None,
                x_label=None, y_label=None, levels=20, fill_value=None):
    """ This helper function displays a contour plot of the requested data. It assumes
        the variable data are gridded in two or three dimensions, with
        dimensions: (latitude, longitude) or (time, latitude, longitude).
        
        For 3-dimensional data, the first slice in time is extracted for plotting.

    """
    masked_variable = np.ma.masked_where(variable_data[:] == fill_value, variable_data)

    fig = plt.figure(figsize=(10, 10))

    if title is not None:
        fig.suptitle(title, fontsize=20)

    ax = plt.axes(xlabel=x_label, ylabel=y_label)

    if len(variable_data.shape) == 3:
        variable_slice = masked_variable[0][:]
    else:
        variable_slice = masked_variable

    # Plot masked data:
    colour_scale = ax.contourf(x_values[:], y_values[:], variable_slice, levels=levels)
    
    # Add colour bar for scaling
    colour_bar = plt.colorbar(colour_scale, ax=ax, orientation='horizontal', pad=0.05)

    if colourbar_units is not None:
        colour_bar.set_label(colourbar_units, fontsize=14)

    plt.tight_layout()
    plt.show()


def plot_variable(file_name, variable, x_variable, y_variable, title, colourbar_units,
                  x_label, y_label, levels=20, fill_value=None):
    """ Open the requested NetCDF-4 file and pass the variables through to the `create_plot`
        function.

    """
    if file_name.endswith('.nc4'):
        # Swath Projector, Variable Subsetter and HOSS
        with Dataset(file_name, 'r') as dataset:
            create_plot(dataset[variable], dataset[x_variable], dataset[y_variable],
                        title=title, colourbar_units=colourbar_units, x_label=x_label,
                        y_label=y_label, levels=levels, fill_value=fill_value)
    elif file_name.endswith('.h5'):
        # MaskFill
        with H5File(file_name, 'r') as h5_file:
            create_plot(h5_file[variable], h5_file[x_variable], h5_file[y_variable],
                        title=title, colourbar_units=colourbar_units, x_label=x_label,
                        y_label=y_label, levels=levels, fill_value=fill_value)
    else:
        print_error('Problem with request, not able to plot output.')

# Begin regression tests:

## Swath Projector:

The Swath Projector is currently only configured for collections in UAT.

In [None]:
swath_projector_env = {'uat': {'collection_id': 'C1233860183-EEDTEST',
                               'granule_id': 'G1233860549-EEDTEST',
                               'original_data_url': f'{uat_staging_bucket}/public/harmony_example_l2/nc/015_00_210_africa.nc'}}


if harmony_environment in swath_projector_env:
    swath_projector_info = swath_projector_env[harmony_environment]
    swath_projector_request_url = get_harmony_request_url(harmony_host_url,
                                                          swath_projector_info['collection_id'])

### Retrieve original granule for visual comparison:

In [None]:
if harmony_environment in swath_projector_env:
    original_file_name, request_success = request(swath_projector_info['original_data_url'])

    plot_variable(original_file_name, 'alpha_var', 'lon', 'lat', title='Input Africa granule',
                  colourbar_units='Land mask', x_label='Longitude (degrees east)', y_label='Latitude (degrees north)')
    
    assert request_success, 'Unsuccessful download of Swath Projector source data.'
    
    expected_variables = ['/lat', '/lon', '/time', '/alpha_var', '/blue_var', '/green_var', '/red_var']
    assert all_variables_present(original_file_name, expected_variables), 'Missing variables in downloaded output'
else:
    print(f'The Swath Projector is not configured for environment: "{harmony_environment}" - skipping download.')

### Swath Projector request with defaults:

Make a request that only specifies the collection and an appropriate granule. This should rely on the default target Coordinate Reference System (CRS) and interpolation method.

In [None]:
if harmony_environment in swath_projector_env:
    defaults_file_name, defaults_success = request(swath_projector_request_url,
                                                   request_parameters={'granuleId': swath_projector_info['granule_id']})

    plot_variable(defaults_file_name, 'alpha_var', 'lon', 'lat', title='Default parameters Africa granule',
                  colourbar_units='Land mask', x_label='Longitude (degrees east)', y_label='Latitude (degrees north)')
    
    assert defaults_success, 'Unsuccessful Swath Projector default request'
    
    expected_variables = ['/lat', '/lon', '/latitude_longitude', '/time', '/alpha_var', '/blue_var', '/green_var', '/red_var']
    assert all_variables_present(defaults_file_name, expected_variables), 'Missing variables in downloaded output'
else:
    print(f'The Swath Projector is not configured for environment: "{harmony_environment}" - skipping test.')

### Swath Projector request for Madagascar:

Make a request to the Swath Projector specifying a target CRS using an EPSG code, and requesting that the target grid covers only the area surrounding Madagascar, using the `scaleExtent` parameter.

In [None]:
if harmony_environment in swath_projector_env:
    epsg_file_name, epsg_success = request(
        swath_projector_request_url,
        request_parameters={'granuleId': swath_projector_info['granule_id'],
                            'outputCrs': 'EPSG:4326',
                            'scaleExtent': '42,-27,52,-10', # W, S, E, N
                            'subset': 'time("2020-01-15T00:00:00Z":"2020-01-16T00:00:00Z")'}
    )

    plot_variable(epsg_file_name, 'alpha_var', 'lon', 'lat', title='EPSG:4326 output Africa granule',
                  colourbar_units='Land mask', x_label='Longitude (degrees east)', y_label='Latitude (degrees north)')
    
    assert epsg_success, 'Unsuccessful Swath Projector EPSG code request.'
    
    expected_variables = ['/lat', '/lon', '/latitude_longitude', '/time', '/alpha_var', '/blue_var', '/green_var', '/red_var']
    assert all_variables_present(epsg_file_name, expected_variables), 'Missing variables in downloaded output'
else:
    print(f'The Swath Projector is not configured for environment: "{harmony_environment}" - skipping test.')

### Swath Projector, interpolation type and Proj4:

Use the `interpolation` and `outputCrs` parameters to ensure a raw Proj4 string is valid input and that the user can select a non-default interpolation type.

In [None]:
if harmony_environment in swath_projector_env:
    proj4_string_file_name, proj4_string_success = request(
        swath_projector_request_url,
        request_parameters={'outputCrs': '+proj=lcc +lat_1=43 +lat_2=62 +lat_0=30 +lon_0=10 +x_0=0 +y_0=0 +ellps=intl +units=m +no_defs',
                            'interpolation': 'near',
                            'subset': 'time("2020-01-15T00:00:00Z":"2020-01-16T00:00:00Z")',
                            'granuleId': swath_projector_info['granule_id']}
    )

    plot_variable(proj4_string_file_name, 'alpha_var', 'x', 'y', title='Lambert Conformal Conic CRS, Africa granule',
                  colourbar_units='Land mask', x_label='Projected x coordinate (m)', y_label='Projected y coordinate (m)')

    assert proj4_string_success, 'Unsuccessful Swath Projector interpolation and Proj4 request.'
    
    expected_variables = ['/x', '/y', '/lambert_conformal_conic', '/time', '/alpha_var', '/blue_var', '/green_var', '/red_var']
    assert all_variables_present(proj4_string_file_name, expected_variables), 'Missing variables in downloaded output'
else:
    print(f'The Swath Projector is not configured for environment: "{harmony_environment}" - skipping test.')

### Swath Projector asynchronous request:

In [None]:
if harmony_environment in swath_projector_env:
    async_file_name, async_success = request_async(
        swath_projector_request_url,
        request_parameters={'forceAsync': True,
                            'granuleId': swath_projector_info['granule_id']}
    )

    plot_variable(async_file_name, 'alpha_var', 'lon', 'lat', title='Asynchronous request output Africa granule',
                  colourbar_units='Land mask', x_label='Longitude (degrees east)', y_label='Latitude (degrees north)')

    assert async_success, 'Unsuccessful Swath Projector asynchronous request'

    expected_variables = ['/lat', '/lon', '/latitude_longitude', '/time', '/alpha_var', '/blue_var', '/green_var', '/red_var']
    assert all_variables_present(async_file_name, expected_variables), 'Missing variables in downloaded output'
else:
    print(f'The Swath Projector is not configured for environment: "{harmony_environment}" - skipping test.')

## Variable Subsetter

The Variable Subsetter is currently only configured for collections in UAT.

The granule selected is the smallest in the ATL08 collection, to improve performance of tests.

In [None]:
var_subsetter_env = {'uat': {'collection_id': 'C1234714698-EEDTEST',
                             'granule_id': 'G1238479209-EEDTEST',
                             'original_data_url': f'{uat_staging_bucket}/public/sds/staged/ATL08_20200711232648_02530814_003_01.h5'}}

if harmony_environment in var_subsetter_env:
    var_subsetter_info = var_subsetter_env[harmony_environment]

### Variable Subsetter synchronous request, no Int64 variables

This request should retrieve the requested `/gt1l/land_segments/dem_h` variable alongside the following supporting variables:

* `/gt1l/land_segments/delta_time`
* `/gt1l/land_segments/latitude`
* `/gt1l/land_segments/longitude`

Note: The variable paths in the URL for Variable Subsetter and HOSS requests must be URL-encoded, so that the slashes in a variable path are not treated as URL path separators.
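
As a small sketch of that encoding, the standard library `urllib.parse.quote` function can produce the encoded form when `/` is removed from its set of safe characters. The notebook itself hard-codes the encoded strings; this snippet only shows one way they could be derived.

```python
from urllib.parse import quote, unquote

variable_path = '/gt1l/land_segments/dem_h'

# By default `quote` leaves '/' unescaped, so pass an empty `safe` string.
encoded_path = quote(variable_path, safe='')

print(encoded_path)           # %2Fgt1l%2Fland_segments%2Fdem_h
print(unquote(encoded_path))  # /gt1l/land_segments/dem_h
```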

In [None]:
if harmony_environment in var_subsetter_env:
    sync_request_url = get_harmony_request_url(harmony_host_url,
                                               var_subsetter_info['collection_id'],
                                               ['%2Fgt1l%2Fland_segments%2Fdem_h'])

    sync_file_name, sync_success = request(sync_request_url,
                                           request_parameters={'granuleId': var_subsetter_info['granule_id']})

    assert sync_success, 'Unsuccessful synchronous Variable Subsetter request.'

    expected_variables = ['/gt1l/land_segments/dem_h', '/gt1l/land_segments/delta_time',
                          '/gt1l/land_segments/latitude', '/gt1l/land_segments/longitude']
    
    assert all_variables_present(sync_file_name, expected_variables), 'Missing variables in downloaded output'
else:
    print(f'The Variable Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')

### Variable Subsetter asynchronous request and Int64 variables

This request should retrieve the `/gt1l/signal_photons/classed_pc_flag` variable and 6 supporting variables, some of which are Int64, which is not supported by the DAP2 protocol:

* `/gt1l/signal_photons/delta_time`
* `/gt1l/land_segments/ph_ndx_beg`
* `/gt1l/land_segments/n_seg_ph`
* `/gt1l/land_segments/delta_time`
* `/gt1l/land_segments/latitude`
* `/gt1l/land_segments/longitude`

In [None]:
if harmony_environment in var_subsetter_env:
    async_request_url = get_harmony_request_url(harmony_host_url,
                                                var_subsetter_info['collection_id'],
                                                ['%2Fgt1l%2Fsignal_photons%2Fclassed_pc_flag'])

    async_file_name, async_success = request_async(
        async_request_url,
        request_parameters={'granuleId': var_subsetter_info['granule_id'], 'forceAsync': True}
    )

    assert async_success, 'Unsuccessful asynchronous Variable Subsetter request'

    expected_variables = ['/gt1l/signal_photons/classed_pc_flag', '/gt1l/signal_photons/delta_time',
                          '/gt1l/land_segments/ph_ndx_beg', '/gt1l/land_segments/n_seg_ph',
                          '/gt1l/land_segments/delta_time', '/gt1l/land_segments/latitude',
                          '/gt1l/land_segments/longitude']
    
    assert all_variables_present(async_file_name, expected_variables), 'Missing variables in downloaded output'
else:
    print(f'The Variable Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')

### Variable Subsetter, all variables

Make a request for "all" variables. This should retrieve the entire file, with all the variables from the original source granule.

**This test is commented out as the request from Harmony to OPeNDAP often times out. The same branch of code is exercised by the HOSS all-variable, no bounding box regression test.**

In [None]:
"""
if harmony_environment in var_subsetter_env:
    all_request_url = get_harmony_request_url(harmony_host_url,
                                              var_subsetter_info['collection_id'])

    all_file_name, all_success = request_async(
        all_request_url,
        request_parameters={'granuleId': var_subsetter_info['granule_id'], 'forceAsync': True}
    )

    assert all_success, 'Unsuccessful Variable Subsetter request, all variables'

    # Recursive check of all variables and groups from input granule.
else:
    print(f'The Variable Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')
"""

## Harmony OPeNDAP SubSetter (HOSS):

HOSS is currently only activated for collections in UAT. Requests will be made against the GHRC RSSMIF16D collection.

In [None]:
hoss_env = {'uat': {'collection_id': 'C1222931739-GHRC_CLOUD',
                    'bounding_box': [-150, 0, -105, 15],  # W, S, E, N
                    'granule_id': 'G1240561151-GHRC_CLOUD',
                    'original_data_url': ('https://data.ghrc.uat.earthdata.nasa.gov/'
                                          'ghrcwuat-protected/rss/rssmif16d__7/f16_ssmis_20210425v7.nc'),
                    'subset_param': ['lat(0:15)', 'lon(-150:-105)']}}

if harmony_environment in hoss_env:
    hoss_info = hoss_env[harmony_environment]

### Retrieve original data for visual comparison

In [None]:
if harmony_environment in hoss_env:
    original_file_name, request_success = request(hoss_info['original_data_url'])
    assert request_success, 'Unsuccessful download of HOSS input granule.'
    
    plot_variable(original_file_name, '/atmosphere_cloud_liquid_water_content', '/longitude', '/latitude',
                  title='HOSS input granule.', colourbar_units='Columnar cloud liquid water (kg.m-2)',
                  x_label='Longitude (degrees east)', y_label='Latitude (degrees north)',
                  levels=np.linspace(-0.05, 2.45, 51))
else:
    print(f'HOSS is not configured for environment: "{harmony_environment}" - skipping download.')

### HOSS synchronous request

This is a request that exercises the full range of HOSS options: bounding box and variable subsetting.

Requested variable:

* `/atmosphere_cloud_liquid_water_content`

Additional required variables (grid dimensions):

* `/latitude`
* `/longitude`
* `/time`
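
The spatial constraint is passed as a list of `subset` values, which `requests` sends as a repeated query parameter. The sketch below derives that list from a `[W, S, E, N]` bounding box and uses the standard library `urlencode` function (with `doseq=True`) to show an equivalent query string. The helper name `bounding_box_to_subset` is hypothetical, and the exact character escaping chosen by `requests` may differ from `urlencode`.

```python
from typing import List
from urllib.parse import parse_qs, urlencode


def bounding_box_to_subset(bounding_box: List[float]) -> List[str]:
    """ Convert a [W, S, E, N] bounding box into latitude and longitude subsets. """
    west, south, east, north = bounding_box
    return [f'lat({south}:{north})', f'lon({west}:{east})']


subset = bounding_box_to_subset([-150, 0, -105, 15])

# `doseq=True` expands the list into one `subset=` entry per element.
query_string = urlencode({'granuleId': 'G1240561151-GHRC_CLOUD', 'subset': subset},
                         doseq=True)

print(subset)
print(query_string)
```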

In [None]:
if harmony_environment in hoss_env:
    sync_request_url = get_harmony_request_url(harmony_host_url,
                                               hoss_info['collection_id'],
                                               ['atmosphere_cloud_liquid_water_content'])

    sync_file_name, sync_success = request(
        sync_request_url,
        request_parameters={'granuleId': hoss_info['granule_id'], 'subset': hoss_info['subset_param']}
    )

    assert sync_success, 'Unsuccessful HOSS synchronous request.'

    plot_variable(sync_file_name, '/atmosphere_cloud_liquid_water_content', '/longitude', '/latitude',
                  title='HOSS synchronous results.', colourbar_units='Columnar cloud liquid water (kg.m-2)',
                  x_label='Longitude (degrees east)', y_label='Latitude (degrees north)',
                  levels=np.linspace(-0.05, 2.45, 51))

    expected_variables = ['/atmosphere_cloud_liquid_water_content', '/latitude', '/longitude', '/time']
    assert all_variables_present(sync_file_name, expected_variables), 'Missing variables in HOSS synchronous output'
else:
    print(f'HOSS is not configured for environment: "{harmony_environment}" - skipping test.')

### HOSS asynchronous request

In [None]:
if harmony_environment in hoss_env:
    async_request_url = get_harmony_request_url(harmony_host_url,
                                                hoss_info['collection_id'],
                                                ['atmosphere_cloud_liquid_water_content'])

    async_file_name, async_success = request_async(
        async_request_url,
        request_parameters={'granuleId': hoss_info['granule_id'],
                            'subset': hoss_info['subset_param'],
                            'forceAsync': True}
    )

    assert async_success, 'Unsuccessful HOSS asynchronous request.'

    plot_variable(async_file_name, '/atmosphere_cloud_liquid_water_content', '/longitude', '/latitude',
                  title='HOSS asynchronous results.', colourbar_units='Columnar cloud liquid water (kg.m-2)',
                  x_label='Longitude (degrees east)', y_label='Latitude (degrees north)',
                  levels=np.linspace(-0.05, 2.45, 51))

    expected_variables = ['/atmosphere_cloud_liquid_water_content', '/latitude', '/longitude', '/time']
    assert all_variables_present(async_file_name, expected_variables), 'Missing variables in HOSS asynchronous output'
else:
    print(f'HOSS is not configured for environment: "{harmony_environment}" - skipping test.')

### HOSS bounding box crosses grid edge:

For collections where the grid edge is the Prime Meridian (0 degrees east) rather than the Antimeridian (180 degrees east), HOSS needs to be able to function when a user requests a region crossing the Prime Meridian (for example, a box containing the UK). It currently retrieves the specified latitude range, but the full longitude range, and fills outside the bounding box region.

The expected output will look like two vertical stripes of data, one each at the left-hand and right-hand edges of the plot.
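
To see why two stripes appear, consider a grid whose longitudes run from 0 to 360 degrees east. The sketch below is an illustration under that assumption and is not the HOSS implementation; the function name `split_longitude_range` is hypothetical. A requested range that crosses the Prime Meridian maps onto two separate intervals, one at each edge of the grid.

```python
from typing import List, Tuple


def split_longitude_range(west: float, east: float) -> List[Tuple[float, float]]:
    """ Express a (west, east) longitude range, given in -180 to 180 degrees,
        as one or two intervals on a grid spanning 0 to 360 degrees east.

    """
    west_360 = float(west) % 360.0
    east_360 = float(east) % 360.0

    if west_360 <= east_360:
        # The range does not cross the grid edge: a single interval.
        return [(west_360, east_360)]

    # The range wraps around the grid edge: one interval at each side.
    return [(west_360, 360.0), (0.0, east_360)]


print(split_longitude_range(-15, 15))     # crosses the Prime Meridian
print(split_longitude_range(-150, -105))  # does not cross the grid edge
```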

In [None]:
if harmony_environment in hoss_env:
    grid_edge_request_url = get_harmony_request_url(harmony_host_url,
                                                    hoss_info['collection_id'],
                                                    ['atmosphere_cloud_liquid_water_content'])

    grid_edge_file_name, grid_edge_success = request_async(
        grid_edge_request_url,
        request_parameters={'granuleId': hoss_info['granule_id'], 'subset': ['lat(-60:-30)', 'lon(-15:15)']}
    )

    assert grid_edge_success, 'Unsuccessful HOSS request crossing longitudinal edge.'

    plot_variable(grid_edge_file_name, '/atmosphere_cloud_liquid_water_content', '/longitude', '/latitude',
                  title='HOSS request crossing grid edge.', colourbar_units='Columnar cloud liquid water (kg.m-2)',
                  x_label='Longitude (degrees east)', y_label='Latitude (degrees north)',
                  levels=np.linspace(-0.05, 2.45, 51))

    expected_variables = ['/atmosphere_cloud_liquid_water_content', '/latitude', '/longitude', '/time']
    assert all_variables_present(grid_edge_file_name, expected_variables), 'Missing variables in grid-edge-crossing output'
else:
    print(f'HOSS is not configured for environment: "{harmony_environment}" - skipping test.')

### HOSS request no bounding box

If a bounding box is not specified for a HOSS-activated collection, a variable subset will still be performed. The requested variables will be returned, with their full original data.

In [None]:
if harmony_environment in hoss_env:
    no_bbox_request_url = get_harmony_request_url(harmony_host_url,
                                                  hoss_info['collection_id'],
                                                  ['sst_dtime', 'wind_speed'])

    no_bbox_file_name, no_bbox_success = request_async(
        no_bbox_request_url,
        request_parameters={'granuleId': hoss_info['granule_id']}
    )

    assert no_bbox_success, 'Unsuccessful HOSS request without bounding box.'

    plot_variable(no_bbox_file_name, '/wind_speed', '/longitude', '/latitude',
                  title='HOSS request no bounding box.', colourbar_units='Wind speed (m/s)',
                  x_label='Longitude (degrees east)', y_label='Latitude (degrees north)',
                  levels=np.linspace(0, 50, 51))

    expected_variables = ['/sst_dtime', '/wind_speed', '/latitude', '/longitude', '/time']
    assert all_variables_present(no_bbox_file_name, expected_variables), 'Missing variables in no bounding box output'
else:
    print(f'HOSS is not configured for environment: "{harmony_environment}" - skipping test.')

### HOSS request all variables

If there are no variables specified, HOSS should retrieve all variables. If the bounding box is specified, all gridded variables should still be constrained to the requested spatial region.

**This regression test cannot be deployed until Harmony v0.0.389 is deployed to UAT.**

In [None]:
"""
if harmony_environment in hoss_env:
    all_request_url = get_harmony_request_url(harmony_host_url, hoss_info['collection_id'])

    all_file_name, all_success = request_async(
        all_request_url,
        request_parameters={'granuleId': hoss_info['granule_id'],
                            'subset': hoss_info['subset_param']}
    )

    assert all_success, 'Unsuccessful HOSS all-variable request.'

    plot_variable(all_file_name, '/atmosphere_cloud_liquid_water_content', '/longitude', '/latitude',
                  title='HOSS all variable results.', colourbar_units='Columnar cloud liquid water (kg.m-2)',
                  x_label='Longitude (degrees east)', y_label='Latitude (degrees north)',
                  levels=np.linspace(-0.05, 2.45, 51))

    expected_variables = ['/atmosphere_cloud_liquid_water_content', '/atmosphere_water_vapor_content',
                          '/latitude', '/longitude', '/rainfall_rate', '/sst_dtime', '/time', '/wind_speed']
    assert all_variables_present(all_file_name, expected_variables), 'Missing variables in HOSS all-variable output'
else:
    print(f'HOSS is not configured for environment: "{harmony_environment}" - skipping test.')
"""

### HOSS request all variables, no bounding box

If no variables and no bounding box are specified, the entire original granule should be retrieved. (This will run the Variable Subsetter branch of the `sds/variable-subsetter` Docker image, skipping the spatial subsetting portion of the service.)

The plotted image should match the original source data, plotted above as "HOSS input granule".

In [None]:
if harmony_environment in hoss_env:
    all_no_bbox_request_url = get_harmony_request_url(harmony_host_url, hoss_info['collection_id'])

    all_no_bbox_file_name, all_no_bbox_success = request_async(
        all_no_bbox_request_url,
        request_parameters={'granuleId': hoss_info['granule_id']}
    )

    assert all_no_bbox_success, 'Unsuccessful HOSS all-variable, no bounding box request.'

    plot_variable(all_no_bbox_file_name, '/atmosphere_cloud_liquid_water_content', '/longitude', '/latitude',
                  title='HOSS all variables, no bounding box results.',
                  colourbar_units='Columnar cloud liquid water (kg.m-2)',
                  x_label='Longitude (degrees east)', y_label='Latitude (degrees north)',
                  levels=np.linspace(-0.05, 2.45, 51))

    expected_variables = ['/atmosphere_cloud_liquid_water_content', '/atmosphere_water_vapor_content',
                          '/latitude', '/longitude', '/rainfall_rate', '/sst_dtime', '/time', '/wind_speed']
    assert all_variables_present(all_no_bbox_file_name, expected_variables), 'Missing variables in HOSS all-variable, no bbox output'
else:
    print(f'HOSS is not configured for environment: "{harmony_environment}" - skipping test.')

## MaskFill:

MaskFill is currently only activated for collections in the UAT environment. Requests will be made against granules in the SPL4CMDL collection, as this is the only currently active collection. The download of these granules may be slow, as they are over 100 MB in size.

In [None]:
maskfill_env = {'uat': {'collection_id': 'C1240150677-EEDTEST',
                        'shape_file_path': 'amazon_basin.geo.json',
                        'granule_id': 'G1240154805-EEDTEST',
                        'original_data_url': f'{uat_staging_bucket}/public/sds/staged/SPL4CMDL/SMAP_L4_C_mdl_20210223T000000_Vv5024_001.h5'}}

if harmony_environment in maskfill_env:
    maskfill_info = maskfill_env[harmony_environment]
    maskfill_request_url = get_harmony_request_url(harmony_host_url, maskfill_info['collection_id'])

### MaskFill synchronous request:

This request uses a GeoJSON shape file of the Amazon River basin.
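
The shape file is uploaded through the `files` argument of `requests.post`, as a `(file name, file object, content type)` tuple. The sketch below builds the same structure from an in-memory GeoJSON document. The rectangle coordinates and the file name `example.geo.json` are hypothetical placeholders, not the real Amazon River basin outline.

```python
from io import BytesIO
import json

# Hypothetical rectangle (longitude, latitude), NOT the real Amazon basin shape.
# The exterior ring is closed: the last position repeats the first.
example_shape = {
    'type': 'FeatureCollection',
    'features': [{
        'type': 'Feature',
        'properties': {},
        'geometry': {
            'type': 'Polygon',
            'coordinates': [[[-70.0, -15.0], [-50.0, -15.0], [-50.0, 0.0],
                             [-70.0, 0.0], [-70.0, -15.0]]]
        }
    }]
}

shape_bytes = json.dumps(example_shape).encode('utf-8')

# Same tuple layout as the `files` argument used in the MaskFill requests.
example_files = {'shapefile': ('example.geo.json', BytesIO(shape_bytes), 'application/geo+json')}

exterior_ring = example_shape['features'][0]['geometry']['coordinates'][0]
print(f'Exterior ring has {len(exterior_ring)} positions.')
```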

In [None]:
if harmony_environment in maskfill_env:
    sync_file_name, sync_success = request(
        maskfill_request_url,
        extension='h5',
        request_parameters={'granuleId': maskfill_info['granule_id']},
        files={'shapefile': (maskfill_info['shape_file_path'], open(maskfill_info['shape_file_path'], 'r'), 'application/geo+json')}
    )
    assert sync_success, 'Unsuccessful MaskFill synchronous request.'

    plot_variable(sync_file_name, '/GPP/gpp_mean', '/x', '/y',
                  title='MaskFill synchronous results.',
                  colourbar_units=r'Gross Primary Productivity ($\mathrm{g.cm}^{-2}.\mathrm{day}^{-1}$)',
                  x_label='EASE-2 grid x coordinate (m)', y_label='EASE-2 grid y coordinate (m)',
                  levels=np.linspace(0, 30, 31))
else:
    print(f'MaskFill is not configured for environment: "{harmony_environment}" - skipping test.')

### MaskFill asynchronous request:

For the SPL4CMDL collection, an asynchronous request is preferable for normal operations due to the size of each granule.

In [None]:
if harmony_environment in maskfill_env:
    async_file_name, async_success = request_async(
        maskfill_request_url,
        extension='h5',
        request_parameters={'granuleId': maskfill_info['granule_id']},
        files={'shapefile': (maskfill_info['shape_file_path'], open(maskfill_info['shape_file_path'], 'r'), 'application/geo+json')}
    )

    assert async_success, 'Unsuccessful MaskFill asynchronous request.'

    plot_variable(async_file_name, '/GPP/gpp_mean', '/x', '/y',
                  title='MaskFill asynchronous results.',
                  colourbar_units=r'Gross Primary Productivity ($\mathrm{g.cm}^{-2}.\mathrm{day}^{-1}$)',
                  x_label='EASE-2 grid x coordinate (m)', y_label='EASE-2 grid y coordinate (m)',
                  levels=np.linspace(0, 30, 31))
else:
    print(f'MaskFill is not configured for environment: "{harmony_environment}" - skipping test.')

# Clean up test outputs:

In [None]:
directory_files = listdir()

for directory_file in directory_files:
    if directory_file.endswith(('.nc4', '.h5')):
        remove(directory_file)