# Regression test suite for the Segmented Trajectory Subsetter:

This notebook provides condensed examples of using Harmony to make requests using the Segmented Trajectory Subsetter service developed and managed by the Data Services team on the Transformation Train. This subsetter is designed for use against L2 data, and includes the following capabilities:

* Variable subsetting.
* Temporal subsetting.
* Bounding box spatial subsetting.
* Polygon spatial subsetting.
* Preservation of photon segment indices.

Several configuration tips were taken from [this blog post](https://towardsdatascience.com/introduction-to-papermill-2c61f66bea30).

## Prerequisites

The dependencies for this notebook are listed in the [environment.yaml](./environment.yaml). To test or install locally, create the papermill environment used in the automated regression testing suite:

`conda env create -f ./environment.yaml && conda activate papermill`

A `.netrc` file must also be located in the `test` directory of this repository.
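
Before running the suite, it can help to confirm that the `.netrc` file actually contains credentials. The sketch below is illustrative only: the function name is hypothetical, and the two Earthdata Login host names are assumptions about which entries the tests need, not something stated in this notebook.

```python
from netrc import NetrcParseError, netrc
from pathlib import Path

# Assumption: Earthdata Login hosts for production and UAT credentials.
EARTHDATA_LOGIN_HOSTS = ('urs.earthdata.nasa.gov', 'uat.urs.earthdata.nasa.gov')


def hosts_missing_from_netrc(netrc_path, hosts=EARTHDATA_LOGIN_HOSTS):
    """Return the hosts that have no usable entry in the given .netrc file.

    A missing or unparsable file is treated as having no entries at all.
    (Hypothetical helper, not part of `utilities`.)
    """
    path = Path(netrc_path)
    if not path.is_file():
        return list(hosts)

    try:
        credentials = netrc(str(path))
    except NetrcParseError:
        return list(hosts)

    # `authenticators` returns None when no entry (or default) matches.
    return [host for host in hosts if credentials.authenticators(host) is None]
```

For example, `hosts_missing_from_netrc('test/.netrc')` returning an empty list would indicate that both assumed hosts have an entry.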

## Import requirements:

In [None]:
from datetime import datetime
from os.path import exists

from harmony import BBox, Client, Collection, Environment, Request

from utilities import (compare_results_to_reference_file, print_success,
                       remove_results_files, submit_and_download)

## Set default parameters:

`papermill` requires default values for the parameters used in the workflow. In this case, that parameter is `harmony_host_url`.

In [None]:
harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'
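
The default above can be overridden when the notebook is executed through the `papermill` command line, which accepts `-p <name> <value>` pairs. The following is a minimal sketch that only builds the argument list; the function name and both notebook file names are hypothetical placeholders.

```python
def build_papermill_command(notebook_path, output_path, harmony_host_url):
    """Build the argument list for running a notebook via the papermill CLI.

    The `-p` option injects a parameter value, replacing the default that is
    set in the notebook's parameters cell. (Hypothetical helper.)
    """
    return ['papermill', notebook_path, output_path,
            '-p', 'harmony_host_url', harmony_host_url]


# Hypothetical file names, used purely for illustration:
command = build_papermill_command('regression_tests.ipynb',
                                  'regression_tests_output.ipynb',
                                  'https://harmony.sit.earthdata.nasa.gov')
```

The resulting list could then be passed to `subprocess.run(command, check=True)` from within the `papermill` environment.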

### Identify Harmony environment (for easier reference):

In [None]:
host_environment = {'http://localhost:3000': Environment.LOCAL,
                    'https://harmony.sit.earthdata.nasa.gov': Environment.SIT,
                    'https://harmony.uat.earthdata.nasa.gov': Environment.UAT,
                    'https://harmony.earthdata.nasa.gov': Environment.PROD}


harmony_environment = host_environment.get(harmony_host_url)

if harmony_environment is not None:
    harmony_client = Client(env=harmony_environment)

# Begin regression tests:

## Segmented Trajectory Subsetter:

The Segmented Trajectory Subsetter is currently only activated for collections in the UAT environment. Requests will be made against granules in the GEDI L4A collection, as this is the only currently active collection. To minimize the size of the output, all requests will use a variable subset - the original granules are > 1 GB in size!

The specific granule used in the requests below was selected to have a trajectory that crosses the Amazon river basin GeoJSON shape used in the MaskFill regression tests.

In [None]:
traj_sub_non_prod_information = {'collection': Collection(id='C1242267295-EEDTEST'),
                                 'granule_id': 'G1242274836-EEDTEST',
                                 'shape_file_path': 'amazon_basin.geo.json',
                                 'requested_variables': ['/BEAM0000/agbd'],
                                 'retrieved_variables': ['/BEAM0000/agbd', '/BEAM0000/delta_time',
                                                         '/BEAM0000/lat_lowestmode',
                                                         '/BEAM0000/lon_lowestmode',
                                                         '/BEAM0000/shot_number']}

trajectory_subsetter_env = {Environment.LOCAL: traj_sub_non_prod_information,
                            Environment.SIT: traj_sub_non_prod_information,
                            Environment.UAT: traj_sub_non_prod_information}

if harmony_environment in trajectory_subsetter_env:
    trajectory_subsetter_info = trajectory_subsetter_env[harmony_environment]
else:
    trajectory_subsetter_info = None

### Trajectory Subsetter variable subset request:

This is a request to retrieve a variable subset of a GEDI L4A granule. The request will ask for a single variable `/BEAM0000/agbd`, but will retrieve an additional four variables that are required to make the output viable for downstream processing. The five expected output variables are:

* `/BEAM0000/agbd` (above ground biomass density)
* `/BEAM0000/delta_time` (from the `coordinates` metadata attribute of `/BEAM0000/agbd`)
* `/BEAM0000/lat_lowestmode` (from the `coordinates` metadata attribute of `/BEAM0000/agbd`)
* `/BEAM0000/lon_lowestmode` (from the `coordinates` metadata attribute of `/BEAM0000/agbd`)
* `/BEAM0000/shot_number` (from the `ancillary_variables` metadata attribute of `/BEAM0000/agbd`, as configured by `sds-varinfo`)
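
The way one requested variable expands to five can be sketched as a lookup of the references held in its metadata attributes. This is not how `sds-varinfo` is implemented: the function name is hypothetical, the attribute values in `AGBD_METADATA` are illustrative stand-ins for the real file metadata and configuration, and only one level of references is followed.

```python
def required_variables(requested, metadata):
    """Return the requested variables plus those they reference.

    References are read from the `coordinates` and `ancillary_variables`
    attributes; relative names are resolved against the group of the
    referring variable. (Hypothetical helper, single level only.)
    """
    required = set()

    for variable in requested:
        required.add(variable)
        group = variable.rsplit('/', 1)[0]
        attributes = metadata.get(variable, {})

        for attribute in ('coordinates', 'ancillary_variables'):
            for name in attributes.get(attribute, '').split():
                required.add(name if name.startswith('/') else f'{group}/{name}')

    return sorted(required)


# Illustrative metadata only (assumed values, not read from a granule):
AGBD_METADATA = {
    '/BEAM0000/agbd': {
        'coordinates': 'delta_time lat_lowestmode lon_lowestmode',
        'ancillary_variables': 'shot_number',
    }
}

expected_variables = required_variables(['/BEAM0000/agbd'], AGBD_METADATA)
```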

In [None]:
if trajectory_subsetter_info is not None:
    ts_variable_file_name = 'trajectory_subsetter_variable.h5'
    ts_variable_request = Request(collection=trajectory_subsetter_info['collection'],
                                  granule_id=[trajectory_subsetter_info['granule_id']],
                                  variables=trajectory_subsetter_info['requested_variables'])

    submit_and_download(harmony_client, ts_variable_request, ts_variable_file_name)
    assert exists(ts_variable_file_name), 'Unsuccessful Trajectory Subsetter variable subset request.'

    compare_results_to_reference_file(ts_variable_file_name,
                                      'reference_files/trajectory_subsetter_variable_reference.h5',
                                      '/BEAM0000')

    print_success('Trajectory Subsetter variable subset request.')
else:
    print(f'Trajectory Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')

### Trajectory Subsetter temporal subset request:

This request will combine a variable subset with a temporal range - as defined via the `subset` request parameter. The requested data should fall between 1am and 2am on the 8th of July 2020.
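
A spot check of the temporal subset could compare the `delta_time` values in the output against the requested range. The sketch below assumes that `delta_time` is stored as seconds since 2018-01-01T00:00:00, which should be confirmed against the variable's `units` attribute; the function names are hypothetical.

```python
from datetime import datetime

# Assumption: reference epoch for `delta_time`; check the `units` attribute.
ASSUMED_EPOCH = datetime(2018, 1, 1)


def to_delta_time(moment, epoch=ASSUMED_EPOCH):
    """Convert a naive datetime to seconds elapsed since the epoch."""
    return (moment - epoch).total_seconds()


def all_within_temporal_range(delta_times, start, stop, epoch=ASSUMED_EPOCH):
    """Check that every delta_time value lies within [start, stop]."""
    lower = to_delta_time(start, epoch)
    upper = to_delta_time(stop, epoch)
    return all(lower <= value <= upper for value in delta_times)
```

With the values read from `/BEAM0000/delta_time` in the output file, `all_within_temporal_range(values, datetime(2020, 7, 8, 1), datetime(2020, 7, 8, 2))` would be expected to return `True` under the stated assumption.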

In [None]:
if trajectory_subsetter_info is not None:
    ts_temporal_file_name = 'trajectory_subsetter_temporal.h5'
    ts_temporal_request = Request(collection=trajectory_subsetter_info['collection'],
                                  granule_id=[trajectory_subsetter_info['granule_id']],
                                  variables=trajectory_subsetter_info['requested_variables'],
                                  temporal={'start': datetime(2020, 7, 8, 1, 0, 0),
                                            'stop': datetime(2020, 7, 8, 2, 0, 0)})

    submit_and_download(harmony_client, ts_temporal_request, ts_temporal_file_name)
    assert exists(ts_temporal_file_name), 'Unsuccessful Trajectory Subsetter temporal subset request.'

    compare_results_to_reference_file(ts_temporal_file_name,
                                      'reference_files/trajectory_subsetter_temporal_reference.h5',
                                      '/BEAM0000')

    print_success('Trajectory Subsetter temporal subset request.')
else:
    print(f'Trajectory Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')

### Trajectory Subsetter bounding box spatial subset request:

This request combines the variable subset (for output size purposes) with a bounding box spatial subset. The bounding box has been selected to approximately encompass Brazil:

* -74 ≤ longitude (degrees east) ≤ -35
* -34 ≤ latitude (degrees north) ≤ 5
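
The two inequalities above translate directly into a containment check on the output coordinates. This is a minimal sketch with a hypothetical function name; it assumes the box does not cross the antimeridian, which holds for the Brazil box used here.

```python
def all_within_bbox(longitudes, latitudes, west=-74, south=-34, east=-35, north=5):
    """Check that every (longitude, latitude) pair lies inside the box.

    Defaults match the bounding box described above. Assumes west <= east,
    so boxes crossing the antimeridian are not handled.
    """
    return all(west <= longitude <= east and south <= latitude <= north
               for longitude, latitude in zip(longitudes, latitudes))
```

The inputs would be the values of `/BEAM0000/lon_lowestmode` and `/BEAM0000/lat_lowestmode` from the subsetted output.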

In [None]:
if trajectory_subsetter_info is not None:
    ts_bbox_file_name = 'trajectory_subsetter_bbox.h5'
    ts_bbox_bbox = BBox(w=-74, s=-34, e=-35, n=5)
    ts_bbox_request = Request(collection=trajectory_subsetter_info['collection'],
                              granule_id=[trajectory_subsetter_info['granule_id']],
                              variables=trajectory_subsetter_info['requested_variables'],
                              spatial=ts_bbox_bbox)

    submit_and_download(harmony_client, ts_bbox_request, ts_bbox_file_name)
    assert exists(ts_bbox_file_name), 'Unsuccessful Trajectory Subsetter bounding box subset request.'

    compare_results_to_reference_file(ts_bbox_file_name,
                                      'reference_files/trajectory_subsetter_bbox_reference.h5',
                                      '/BEAM0000')

    print_success('Trajectory Subsetter bounding box spatial subset request.')
else:
    print(f'Trajectory Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')

### Trajectory Subsetter polygon spatial subset request:

The request below combines a variable subset with the Amazon river basin polygon. The output should be constrained to the extent of this polygon.
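
To illustrate what "inside the polygon" means for an individual point, the sketch below uses the generic ray-casting test on planar longitude and latitude values. It is not the subsetter's own algorithm, the function name is hypothetical, and it handles a single exterior ring without holes.

```python
def point_in_polygon(longitude, latitude, ring):
    """Ray-casting test for a point against one polygon ring.

    `ring` is a sequence of (longitude, latitude) vertices; the closing
    vertex may be omitted. Coordinates are treated as planar.
    """
    inside = False
    vertex_count = len(ring)

    for index in range(vertex_count):
        x1, y1 = ring[index]
        x2, y2 = ring[(index + 1) % vertex_count]

        # Only edges that straddle the point's latitude can be crossed by a
        # ray cast from the point; this also avoids division by zero.
        if (y1 > latitude) != (y2 > latitude):
            crossing = x1 + (latitude - y1) * (x2 - x1) / (y2 - y1)
            if longitude < crossing:
                inside = not inside

    return inside
```

The ring could be taken from the coordinates of a feature in `amazon_basin.geo.json`, then applied to each pair of output coordinates.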

In [None]:
if trajectory_subsetter_info is not None:
    ts_polygon_file_name = 'trajectory_subsetter_polygon.h5'
    ts_polygon_request = Request(collection=trajectory_subsetter_info['collection'],
                                 granule_id=[trajectory_subsetter_info['granule_id']],
                                 variables=trajectory_subsetter_info['requested_variables'],
                                 shape=trajectory_subsetter_info['shape_file_path'])

    submit_and_download(harmony_client, ts_polygon_request, ts_polygon_file_name)
    assert exists(ts_polygon_file_name), 'Unsuccessful Trajectory Subsetter polygon spatial subset request.'

    compare_results_to_reference_file(ts_polygon_file_name,
                                      'reference_files/trajectory_subsetter_polygon_reference.h5',
                                      '/BEAM0000')

    print_success('Trajectory Subsetter polygon spatial subset request.')
else:
    print(f'Trajectory Subsetter is not configured for environment: "{harmony_environment}" - skipping test.')

### Segmented Trajectory Subsetter additional tests:

Ideally, we should test that photon segment indices are correctly handled (e.g., they are all consecutive integers, even if a middle segment is excluded by a subset: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, ...]). Currently (2021-12-02), there are no Cloud-hosted collections with photon segment indices associated with the Segmented Trajectory Subsetter.
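
Once a suitable collection is available, the consecutive-index property described above could be checked with something like the following sketch. The function name is hypothetical, and the rule it encodes (each index either repeats the previous one or increases by exactly one) is an interpretation of the example sequence.

```python
def has_consecutive_segment_indices(indices):
    """Check that indices never decrease and never skip an integer.

    Each value must equal the previous value or exceed it by exactly one,
    as in [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]. An empty sequence passes.
    """
    indices = list(indices)

    for previous, current in zip(indices, indices[1:]):
        if current - previous not in (0, 1):
            return False

    return True
```

A gap such as `[0, 0, 2, 2]`, which would arise if a middle segment were dropped without re-indexing, fails this check.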

# Clean up test outputs:

In [None]:
remove_results_files()