# LAADS DAAC Geoloco Regression Tests

This notebook contains a suite of regression tests for the LAADS DAAC Geoloco Harmony Service, comparing its outputs against truth data generated on premises.

Geoloco works best with Level 3 and Level 4 data. Level 1B and Level 2 data can also be processed, but if no `proj4` string is supplied, the outputs are automatically reprojected to `Geographic` and regridded to their raster resolution.

Although Geoloco can perform variable and location subsetting, reprojection, resampling and regridding, this regression test suite focuses on reprojection, resampling and regridding.

This notebook must be run from within the AWS `us-west-2` region to access data in S3.

## Prerequisites

The dependencies for this notebook are listed in `environment.yaml`. To test or install locally, create the `papermill` environment used by the automated regression testing suite:

`conda env create -f ./environment.yaml && conda activate papermill-geoloco`

A `.netrc` file must also be located in the test directory of this repository.
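The `.netrc` file supplies the Earthdata Login credentials the Harmony client uses. A minimal pre-flight check, sketched below with Python's standard `netrc` module, could confirm the file exists and holds a login and password before any requests are submitted. The helper name `has_edl_credentials` and the default machine name `urs.earthdata.nasa.gov` are assumptions for illustration, not part of this repository; pass `host=` to check a different Earthdata Login host.

```python
import netrc
from os.path import exists

# Assumption: credentials are stored under the production Earthdata Login host.
EDL_HOST = 'urs.earthdata.nasa.gov'


def has_edl_credentials(netrc_path: str, host: str = EDL_HOST) -> bool:
    """Hypothetical helper: True if `netrc_path` exists and defines a login and password for `host`."""
    if not exists(netrc_path):
        return False
    # authenticators() returns (login, account, password), or None if the host is absent
    auth = netrc.netrc(netrc_path).authenticators(host)
    return auth is not None and bool(auth[0]) and bool(auth[2])
```

For example, `has_edl_credentials('./.netrc')` returns `False` when the file is missing, which is easier to diagnose than an authentication failure partway through a test.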

In [None]:
from os.path import exists

from harmony import Client, Collection, Environment, Request

from utilities import (submit_and_download, get_dim_sizes,
                       remove_results_files, print_error, print_success)

## Set Default Parameters

`papermill` requires default values for parameters used in the workflow. In this case, that parameter is `harmony_host_url`.

The valid values are:
- Production: https://harmony.earthdata.nasa.gov
- UAT: https://harmony.uat.earthdata.nasa.gov
- SIT: https://harmony.sit.earthdata.nasa.gov
- Local: http://localhost:3000
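Because `papermill` injects `harmony_host_url` as a plain string, a typo or a trailing slash would make the environment lookup below return `None`, and every test would be skipped without an obvious error. A minimal validation sketch is shown here; `normalize_host_url` and `VALID_HARMONY_HOSTS` are hypothetical names, with the valid values copied from the list above.

```python
# Valid values copied from the list above (no trailing slashes).
VALID_HARMONY_HOSTS = {
    'https://harmony.earthdata.nasa.gov',
    'https://harmony.uat.earthdata.nasa.gov',
    'https://harmony.sit.earthdata.nasa.gov',
    'http://localhost:3000',
}


def normalize_host_url(url: str) -> str:
    """Hypothetical helper: strip a trailing slash and reject unrecognised hosts."""
    candidate = url.rstrip('/')
    if candidate not in VALID_HARMONY_HOSTS:
        raise ValueError(f'Unrecognised harmony_host_url: {url!r}')
    return candidate
```

Failing fast here, rather than skipping every test, makes a misconfigured run visible in the automated suite.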

In [None]:
# harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'
harmony_host_url = 'http://localhost:3000'

## Identify Harmony Environment

In [None]:
host_environment = {'http://localhost:3000': Environment.LOCAL,
                    'https://harmony.sit.earthdata.nasa.gov': Environment.SIT,
                    'https://harmony.uat.earthdata.nasa.gov': Environment.UAT,
                    'https://harmony.earthdata.nasa.gov': Environment.PROD}


harmony_environment = host_environment.get(harmony_host_url)

if harmony_environment is not None:
    harmony_client = Client(env=harmony_environment)

## Setting up Collection Environment Variables

The cell below sets up the collection, granule and other necessary variables for each tested dataset. The datasets provided are in the `UAT` environment. There is one dataset each for Level 1B, Level 2 and Level 3:

- Level 1B: MOD021KM
- Level 2: MOD35_L2
- Level 3: MOD08_D3

Also provided are `proj4` strings for the UTM and Geographic transformations used in reprojection, and the downscale sizes used in regridding.
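A `proj4` string is a space-separated list of `+key=value` parameters, plus bare flags such as `+no_defs`. The sketch below splits one into a dictionary so the parameters (for example the UTM `zone`) can be inspected or asserted on; `parse_proj4` is a hypothetical helper and does not validate the parameters against PROJ.

```python
def parse_proj4(proj4: str) -> dict:
    """Hypothetical helper: map each +key=value token to {key: value}; bare flags map to True."""
    params = {}
    for token in proj4.split():
        key, sep, value = token.lstrip('+').partition('=')
        params[key] = value if sep else True
    return params


utm = parse_proj4('+a=6378137.0 +b=6356752.3142451793 +no_defs +proj=utm +unit=meters +zone=59')
geo = parse_proj4('+a=6378137.0 +b=6356752.3142451793 +no_defs +proj=latlong')
```

Here `utm['zone']` is `'59'` and `geo['proj']` is `'latlong'`; values are kept as strings, so convert them explicitly if numeric comparisons are needed.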

In [None]:
mod021km_non_production_info = {'collection' : Collection(id='C1256826282-LAADSCDUAT'),
                               'granule_id' : 'G1259320275-LAADSCDUAT',
                               'variable' : ['EV_1KM_RefSB'],
                               'downscale_size' : [0.004505, 0.004505],
                               'interpolation_string': 'Nearest'}

mod35l2_non_production_info = {'collection' : Collection(id='C1257437479-LAADSCDUAT'),
                               'granule_id' : 'G1259231639-LAADSCDUAT',
                               'variable' : ['Cloud_Mask'],
                               'downscale_size' : [0.004505, 0.004505],
                               'interpolation_string': 'Nearest'}

mod08d3_non_production_info = {'collection' : Collection(id='C1257773477-LAADSCDUAT'),
                               'granule_id' : 'G1259320277-LAADSCDUAT',
                               'variable' : ['Aerosol_Optical_Depth_Land_Ocean_Mean'],
                               'downscale_size' : [.5, .5],
                               'interpolation_string': 'Nearest'}

geo_proj4_string = '+a=6378137.0 +b=6356752.3142451793 +no_defs +proj=latlong'
utm_proj4_string = '+a=6378137.0 +b=6356752.3142451793 +no_defs +proj=utm +unit=meters +zone=59'

resampling_string = 'NN'

file_indicators = {'MOD021KM': 'EV_1KM_RefSB_1.hdf',
                   'MOD35_L2': 'Cloud_Mask_1.hdf',
                   'MOD08_D3': 'Aerosol_Optical_Depth_Land_Ocean_Mean.hdf'}

reprojection_truth_data = {'MOD021KM': 'truth_data/MOD021KM.A2023001.0020.061.psrpcs_001678984001.EV_1KM_RefSB_1.hdf',
                           'MOD35_L2': 'truth_data/MOD35_L2.A2023001.0020.061.psrpcs_001678984540.Cloud_Mask_1.hdf',
                           'MOD08_D3': 'truth_data/MOD08_D3.A2023001.061.psrpcs_001678986883.Aerosol_Optical_Depth_Land_Ocean_Mean.hdf'}

regridding_truth_data = {'MOD021KM': 'truth_data/MOD021KM.A2023001.0020.061.psrpcs_001679406616.EV_1KM_RefSB_1.hdf',
                         'MOD35_L2': 'truth_data/MOD35_L2.A2023001.0020.061.psrpcs_001680293530.Cloud_Mask_1.hdf',
                         'MOD08_D3': 'truth_data/MOD08_D3.A2023001.061.psrpcs_001679410851.Aerosol_Optical_Depth_Land_Ocean_Mean.hdf'}

resampling_truth_data = {'MOD021KM': 'truth_data/MOD021KM.A2023001.0020.061.psrpcs_001679674560.EV_1KM_RefSB_1.hdf',
                         'MOD35_L2': 'truth_data/MOD35_L2.A2023001.0020.061.psrpcs_001679677203.Cloud_Mask_1.hdf',
                         'MOD08_D3': 'truth_data/MOD08_D3.A2023001.061.psrpcs_001679677149.Aerosol_Optical_Depth_Land_Ocean_Mean.hdf'}

These collections and granules are only available in the UAT environment. To minimize output size, all requests use variable subsetting.

In [None]:
mod021km_geoloco_env = {Environment.LOCAL: mod021km_non_production_info,
                       Environment.UAT: mod021km_non_production_info,
                       Environment.SIT: mod021km_non_production_info}
mod35l2_geoloco_env = {Environment.LOCAL: mod35l2_non_production_info,
                       Environment.UAT: mod35l2_non_production_info,
                       Environment.SIT: mod35l2_non_production_info}
mod08d3_geoloco_env = {Environment.LOCAL: mod08d3_non_production_info,
                       Environment.UAT: mod08d3_non_production_info,
                       Environment.SIT: mod08d3_non_production_info}

# dict.get returns None when a collection is not configured for this environment
mod021km_geoloco_info = mod021km_geoloco_env.get(harmony_environment)
mod35l2_geoloco_info = mod35l2_geoloco_env.get(harmony_environment)
mod08d3_geoloco_info = mod08d3_geoloco_env.get(harmony_environment)

## Reprojection Regression Tests

In the cell below, reprojection is tested using a UTM `proj4` string for the Level 1B, Level 2 and Level 3 Collections/Granules. Outputs are considerably reduced using variable subsetting. Dimension sizes are checked between the truth data and the output data.
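The test cells only report that a mismatch occurred. A sketch of a more descriptive comparison is below; it assumes, without confirmation from `utilities`, that `get_dim_sizes` returns a mapping of dimension name to size. `diff_dim_sizes` is a hypothetical helper.

```python
def diff_dim_sizes(truth: dict, output: dict) -> dict:
    """Hypothetical helper: return {dimension: (truth_size, output_size)} for each dimension that differs.

    A dimension present in only one mapping is reported with None on the other side.
    """
    return {name: (truth.get(name), output.get(name))
            for name in sorted(set(truth) | set(output))
            if truth.get(name) != output.get(name)}
```

An empty result means the dimensions match; otherwise the returned pairs could be passed to `print_error` so the log shows which dimension drifted and by how much.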

In [None]:
if (mod021km_geoloco_info is not None and 
    mod35l2_geoloco_info is not None and
    mod08d3_geoloco_info is not None):

    mod021km_reprojection_request = Request(collection=mod021km_geoloco_info['collection'],
                                            granule_id=mod021km_geoloco_info['granule_id'],
                                            variables=mod021km_geoloco_info['variable'],
                                            crs=utm_proj4_string)

    mod35l2_reprojection_request = Request(collection=mod35l2_geoloco_info['collection'],
                                           granule_id=mod35l2_geoloco_info['granule_id'],
                                           variables=mod35l2_geoloco_info['variable'],
                                           crs=utm_proj4_string)

    mod08d3_reprojection_request = Request(collection=mod08d3_geoloco_info['collection'],
                                           granule_id=mod08d3_geoloco_info['granule_id'],
                                           variables=mod08d3_geoloco_info['variable'],
                                           crs=utm_proj4_string)

    mod021km_reprojection_compare_file = submit_and_download(harmony_client, mod021km_reprojection_request, file_indicators['MOD021KM'])
    mod35l2_reprojection_compare_file = submit_and_download(harmony_client, mod35l2_reprojection_request, file_indicators['MOD35_L2'])
    mod08d3_reprojection_compare_file = submit_and_download(harmony_client, mod08d3_reprojection_request, file_indicators['MOD08_D3'])

    geoloco_reprojection_tests = 1
    
    mod021km_truth_dims = get_dim_sizes(reprojection_truth_data['MOD021KM'])
    mod021km_dims = get_dim_sizes(mod021km_reprojection_compare_file)
    if mod021km_truth_dims != mod021km_dims:
        print_error('MOD021KM Reprojection mismatch.')
        geoloco_reprojection_tests = 0
    
    mod35l2_truth_dims = get_dim_sizes(reprojection_truth_data['MOD35_L2'])
    mod35l2_dims = get_dim_sizes(mod35l2_reprojection_compare_file)
    if mod35l2_truth_dims != mod35l2_dims:
        print_error('MOD35_L2 Reprojection mismatch.')
        geoloco_reprojection_tests = 0

    mod08d3_truth_dims = get_dim_sizes(reprojection_truth_data['MOD08_D3'])
    mod08d3_dims = get_dim_sizes(mod08d3_reprojection_compare_file)
    if mod08d3_truth_dims != mod08d3_dims:
        print_error('MOD08_D3 Reprojection file mismatch.')
        geoloco_reprojection_tests = 0

    if geoloco_reprojection_tests == 1:
        print_success('Geoloco Reprojection requests.')
    
    remove_results_files()
else:
    print(f'Geoloco is not configured for this environment: "{harmony_environment}" - skipping test.')

## Regridding Regression Tests

In the cell below, regridding is tested using the Geographic `proj4` string, together with each collection's downscale size and interpolation string, for the Level 1B, Level 2 and Level 3 collections/granules. Outputs are considerably reduced using variable subsetting. Dimension sizes are checked between the truth data and the output data.
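The expected output dimensions follow from simple arithmetic: the number of cells along each axis is the spatial extent divided by the cell size. The sketch below assumes `scale_size` is ordered `[x, y]` in degrees, that the extent is an exact multiple of the cell size, and that the output keeps the input extent; Geoloco's actual rounding and extent handling may differ. MOD08_D3 is a global 1° × 1° product (360 × 180 degrees), so a 0.5° cell size would give 360 rows and 720 columns under these assumptions. The `0.004505` degree size used for the Level 1B and Level 2 granules is roughly 500 m at the equator.

```python
# Approximate length of one degree of longitude at the equator, in metres.
METERS_PER_DEGREE = 111_320


def grid_shape(lon_extent_deg: float, lat_extent_deg: float, scale_size: list) -> tuple:
    """Hypothetical helper: (rows, cols) of a geographic grid with [x, y] cell sizes in degrees.

    Assumes each extent is an exact multiple of the cell size; round() absorbs float noise.
    """
    x_res, y_res = scale_size
    return round(lat_extent_deg / y_res), round(lon_extent_deg / x_res)


mod08d3_shape = grid_shape(360, 180, [0.5, 0.5])           # global 1-degree product regridded to 0.5 degrees
approx_cell_meters = round(0.004505 * METERS_PER_DEGREE)  # MOD021KM / MOD35_L2 downscale size
```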

In [None]:
if (mod021km_geoloco_info is not None and 
    mod35l2_geoloco_info is not None and
    mod08d3_geoloco_info is not None):

    mod021km_regridding_request = Request(collection=mod021km_geoloco_info['collection'],
                                          granule_id=mod021km_geoloco_info['granule_id'],
                                          variables=mod021km_geoloco_info['variable'],
                                          crs=geo_proj4_string,
                                          scale_size=mod021km_geoloco_info['downscale_size'],
                                          interpolation=mod021km_geoloco_info['interpolation_string'])

    mod35l2_regridding_request = Request(collection=mod35l2_geoloco_info['collection'],
                                         granule_id=mod35l2_geoloco_info['granule_id'],
                                         variables=mod35l2_geoloco_info['variable'],
                                         crs=geo_proj4_string,
                                         scale_size=mod35l2_geoloco_info['downscale_size'],
                                         interpolation=mod35l2_geoloco_info['interpolation_string'])

    mod08d3_regridding_request = Request(collection=mod08d3_geoloco_info['collection'],
                                         granule_id=mod08d3_geoloco_info['granule_id'],
                                         variables=mod08d3_geoloco_info['variable'],
                                         crs=geo_proj4_string,
                                         scale_size=mod08d3_geoloco_info['downscale_size'],
                                         interpolation=mod08d3_geoloco_info['interpolation_string'])

    mod021km_regridding_compare_file = submit_and_download(harmony_client, mod021km_regridding_request, file_indicators['MOD021KM'])
    mod35l2_regridding_compare_file = submit_and_download(harmony_client, mod35l2_regridding_request, file_indicators['MOD35_L2'])
    mod08d3_regridding_compare_file = submit_and_download(harmony_client, mod08d3_regridding_request, file_indicators['MOD08_D3'])

    geoloco_regridding_tests = 1
    
    mod021km_regridding_truth_dims = get_dim_sizes(regridding_truth_data['MOD021KM'])
    mod021km_regridding_dims = get_dim_sizes(mod021km_regridding_compare_file)
    if mod021km_regridding_truth_dims != mod021km_regridding_dims:
        print_error('MOD021KM Regridding mismatch.')
        geoloco_regridding_tests = 0
    
    mod35l2_regridding_truth_dims = get_dim_sizes(regridding_truth_data['MOD35_L2'])
    mod35l2_regridding_dims = get_dim_sizes(mod35l2_regridding_compare_file)
    if mod35l2_regridding_truth_dims != mod35l2_regridding_dims:
        print_error('MOD35_L2 Regridding mismatch.')
        geoloco_regridding_tests = 0

    mod08d3_regridding_truth_dims = get_dim_sizes(regridding_truth_data['MOD08_D3'])
    mod08d3_regridding_dims = get_dim_sizes(mod08d3_regridding_compare_file)
    if mod08d3_regridding_truth_dims != mod08d3_regridding_dims:
        print_error('MOD08_D3 Regridding file mismatch.')
        geoloco_regridding_tests = 0

    if geoloco_regridding_tests == 1:
        print_success('Geoloco Regridding requests.')
    
    remove_results_files()
else:
    print(f'Geoloco is not configured for this environment: "{harmony_environment}" - skipping test.')

## Resampling Regression Tests

In the cell below, resampling is tested using the Geographic `proj4` string and the resampling string for the Level 1B, Level 2 and Level 3 collections/granules. Outputs are considerably reduced using variable subsetting. Dimension sizes are checked between the truth data and the output data.
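The resampling string `'NN'` presumably selects nearest-neighbour resampling (an assumption; the notebook does not define it). The pure-Python sketch below illustrates the technique itself, not Geoloco's implementation: each output cell centre is mapped back onto the input grid and copies the value of the input cell containing it, so no new values are invented. That property matters for bit-flag fields such as `Cloud_Mask`, where averaging neighbouring values would produce meaningless codes.

```python
def nearest_neighbour_resample(grid: list, out_rows: int, out_cols: int) -> list:
    """Resample a 2-D list to (out_rows, out_cols) by nearest-neighbour lookup."""
    in_rows, in_cols = len(grid), len(grid[0])
    result = []
    for r in range(out_rows):
        # Map the centre of output row r into input row coordinates and clamp to the grid.
        src_r = min(int((r + 0.5) * in_rows / out_rows), in_rows - 1)
        row = []
        for c in range(out_cols):
            src_c = min(int((c + 0.5) * in_cols / out_cols), in_cols - 1)
            row.append(grid[src_r][src_c])
        result.append(row)
    return result


upsampled = nearest_neighbour_resample([[1, 2], [3, 4]], 4, 4)
```

Upsampling `[[1, 2], [3, 4]]` to 4 × 4 repeats each value in a 2 × 2 block, and resampling that result back to 2 × 2 recovers the original grid.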

In [None]:
if (mod021km_geoloco_info is not None and 
    mod35l2_geoloco_info is not None and
    mod08d3_geoloco_info is not None):

    mod021km_resampling_request = Request(collection=mod021km_geoloco_info['collection'],
                                          granule_id=mod021km_geoloco_info['granule_id'],
                                          variables=mod021km_geoloco_info['variable'],
                                          crs=geo_proj4_string,
                                          interpolation=resampling_string)

    mod35l2_resampling_request = Request(collection=mod35l2_geoloco_info['collection'],
                                         granule_id=mod35l2_geoloco_info['granule_id'],
                                         variables=mod35l2_geoloco_info['variable'],
                                         crs=geo_proj4_string,
                                         interpolation=resampling_string)

    mod08d3_resampling_request = Request(collection=mod08d3_geoloco_info['collection'],
                                         granule_id=mod08d3_geoloco_info['granule_id'],
                                         variables=mod08d3_geoloco_info['variable'],
                                         crs=geo_proj4_string,
                                         interpolation=resampling_string)

    mod021km_resampling_compare_file = submit_and_download(harmony_client, mod021km_resampling_request, file_indicators['MOD021KM'])
    mod35l2_resampling_compare_file = submit_and_download(harmony_client, mod35l2_resampling_request, file_indicators['MOD35_L2'])
    mod08d3_resampling_compare_file = submit_and_download(harmony_client, mod08d3_resampling_request, file_indicators['MOD08_D3'])

    geoloco_resampling_tests = 1
    
    mod021km_resampling_truth_dims = get_dim_sizes(resampling_truth_data['MOD021KM'])
    mod021km_resampling_dims = get_dim_sizes(mod021km_resampling_compare_file)
    if mod021km_resampling_truth_dims != mod021km_resampling_dims:
        print_error('MOD021KM Resampling mismatch.')
        geoloco_resampling_tests = 0
    
    mod35l2_resampling_truth_dims = get_dim_sizes(resampling_truth_data['MOD35_L2'])
    mod35l2_resampling_dims = get_dim_sizes(mod35l2_resampling_compare_file)
    if mod35l2_resampling_truth_dims != mod35l2_resampling_dims:
        print_error('MOD35_L2 Resampling mismatch.')
        geoloco_resampling_tests = 0

    mod08d3_resampling_truth_dims = get_dim_sizes(resampling_truth_data['MOD08_D3'])
    mod08d3_resampling_dims = get_dim_sizes(mod08d3_resampling_compare_file)
    if mod08d3_resampling_truth_dims != mod08d3_resampling_dims:
        print_error('MOD08_D3 Resampling file mismatch.')
        geoloco_resampling_tests = 0

    if geoloco_resampling_tests == 1:
        print_success('Geoloco Resampling requests.')
    
    remove_results_files()
else:
    print(f'Geoloco is not configured for this environment: "{harmony_environment}" - skipping test.')