# Access GEDI L2A with Harmony API

## Overview
This tutorial will demonstrate how to directly access and subset the GEDI L2A canopy height metrics dataset using [NASA’s Harmony Services](https://harmony.earthdata.nasa.gov/). The Harmony API allows seamless access and production of analysis-ready Earth observation data across different DAACs by enabling cloud-based spatial, temporal, and variable subsetting and data conversions. The GEDI datasets are available from the Harmony API.

## Learning Objectives
- Use [NASA’s Harmony Services](https://harmony.earthdata.nasa.gov/) to retrieve the GEDI L2A datasets. The Harmony API allows access to selected variables for the dataset within the spatial-temporal bounds without having to download the whole data file.

## Dataset
The GEDI Level 2A Geolocated Elevation and Height Metrics product (GEDI02_A) provides waveform interpretation and extracted products from each received waveform, including ground elevation, canopy top height, and relative height (RH) metrics. GEDI datasets are available for the period starting 2019-04-17 and cover latitudes from 52° N to 52° S. GEDI L2A data files are natively in HDF5 format.

In [1]:
import h5py
import requests as re
import pandas as pd
from datetime import datetime
from glob import glob
from harmony import Client, Collection, Request, BBox

## Authentication
The NASA Harmony API requires a [NASA Earthdata Login (EDL)](https://urs.earthdata.nasa.gov/). You can log in directly from the Jupyter Notebook by passing your EDL credentials to the Harmony `Client` through the `auth` parameter:
```python
harmony_client = Client(auth=("your EDL username", "your EDL password"))
```
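
To keep credentials out of the notebook itself, they can be collected at run time rather than typed into a cell. The following is a minimal sketch and not part of the original tutorial: the helper `get_edl_credentials` and the environment variable names `EDL_USERNAME` and `EDL_PASSWORD` are assumptions chosen for illustration. The tuple it returns is what would be passed to the `auth` parameter shown above.

```python
import os
from getpass import getpass


def get_edl_credentials(env=None, ask=getpass):
    """Return (username, password), reading environment variables first.

    EDL_USERNAME and EDL_PASSWORD are hypothetical names used for this sketch.
    Missing values are prompted for interactively.
    """
    env = os.environ if env is None else env
    username = env.get("EDL_USERNAME") or input("EDL username: ")
    password = env.get("EDL_PASSWORD") or ask("EDL password: ")
    return username, password


# Example with an explicit mapping, so nothing is prompted for
auth = get_edl_credentials({"EDL_USERNAME": "jdoe",
                            "EDL_PASSWORD": "not-a-real-password"})
# harmony_client = Client(auth=auth)  # same `auth` parameter as above
```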

## Create Harmony Client Object
First, we create a Harmony Client object. If you are passing the EDL authentication, please do as shown above with the `auth` parameter.

In [2]:
harmony_client = Client()

## Retrieve Concept ID
Now, let’s retrieve the `Concept ID` of the GEDI L2A dataset. The `Concept ID` is the unique identifier that NASA Earthdata assigns to each dataset.

In [3]:
def get_concept_id(doi):
    """get concept id from DOI using CMR API"""
    doisearch = f'https://cmr.earthdata.nasa.gov/search/collections.json?doi={doi}' 
    return re.get(doisearch).json()['feed']['entry'][0]['id']

concept = get_concept_id('10.5067/GEDI/GEDI02_A.002') # GEDI L2A DOI
# printing concept_id
print(f"{concept} ")

C2142771958-LPCLOUD 
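
`get_concept_id` indexes `['feed']['entry'][0]` directly, so a DOI that matches no collection would fail with an `IndexError`. A minimal sketch of a more defensive parse is shown below. The helper name `extract_concept_id` is hypothetical, and `sample` is a hand-written stand-in that mimics only the fields used above, not a real CMR response.

```python
def extract_concept_id(payload):
    """Return the first collection's concept ID from a CMR collections.json payload."""
    entries = payload.get("feed", {}).get("entry", [])
    if not entries:
        raise ValueError("CMR returned no collections for this DOI")
    return entries[0]["id"]


# Hand-written stand-in using the concept ID printed above
sample = {"feed": {"entry": [{"id": "C2142771958-LPCLOUD"}]}}

concept_id = extract_concept_id(sample)
# The part after the first hyphen names the provider that hosts the collection
provider = concept_id.split("-", 1)[1]
print(concept_id, provider)  # C2142771958-LPCLOUD LPCLOUD
```

In the tutorial's function, the same check could be applied to `re.get(doisearch).json()` before indexing into it.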


## Define Request Parameters

Let’s create a Harmony Collection object with the concept_id retrieved above. We will also define the GEDI L2A variables of interest, the bounding box, and the temporal range.

In [4]:
# harmony collection
collection = Collection(id=concept)

def create_var_names(variables):
    # gedi beams
    beams = ['BEAM0000', 'BEAM0001', 'BEAM0010', 
             'BEAM0011', 'BEAM0101', 'BEAM0110', 
             'BEAM1000', 'BEAM1011']
    # combine variables and beam names
    return [f'/{b}/{v}' for b in beams for v in variables]

# gedi variables
variables = create_var_names(['rh', 'shot_number'])

# bbox
bounding_box = BBox(w=-44.0654, s=-13.76913, 
                    e=-44.17246, n=-13.67646)

# time range
temporal_range = {'start': datetime(2019, 4, 17), 
                  'stop': datetime(2023, 3, 31)}
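
As a quick check on the parameters defined above, the sketch below expands the variable list and inspects the orientation of the bounding box. `create_var_names` is repeated from the cell above so that the block runs on its own. `wraps_antimeridian` is a hypothetical helper, and it assumes the common convention that a box whose west edge lies east of its east edge is read as wrapping across the 180° meridian. Whether Harmony applies exactly this convention is an assumption here.

```python
def create_var_names(variables):
    """Prefix each variable with every GEDI beam group name."""
    beams = ['BEAM0000', 'BEAM0001', 'BEAM0010',
             'BEAM0011', 'BEAM0101', 'BEAM0110',
             'BEAM1000', 'BEAM1011']
    return [f'/{b}/{v}' for b in beams for v in variables]


paths = create_var_names(['rh', 'shot_number'])
print(len(paths))   # 16 (8 beams x 2 variables)
print(paths[:3])    # ['/BEAM0000/rh', '/BEAM0000/shot_number', '/BEAM0001/rh']


def wraps_antimeridian(w, e):
    """Hypothetical check: True when the west edge lies east of the east edge."""
    return w > e


# Longitudes taken from the bounding box defined above
print(wraps_antimeridian(w=-44.0654, e=-44.17246))  # True
```

With the values used in this tutorial the check returns `True`. If only the narrow strip between the two longitudes is wanted, `w` and `e` would be swapped.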

## Create and Submit Harmony Request
Now, we can create a Harmony request with the variables, temporal range, and bounding box, and submit it using the Harmony client object. We will use the `download_all` method, which downloads files on multiple threads and returns one [concurrent future](https://docs.python.org/3/library/concurrent.futures.html) per file. Calling `result()` on a future blocks until that file has been downloaded and then returns its local filename, so each file can be used as soon as it is ready while the others are still downloading.

In [5]:
request = Request(collection=collection, 
              variables=variables, 
              temporal=temporal_range,
              spatial=bounding_box,
              ignore_errors=True)

# submit harmony request, will return job id
subset_job_id = harmony_client.submit(request)
print(f'Processing job: {subset_job_id}')
print('Waiting for the job to finish')
results = harmony_client.result_json(subset_job_id, show_progress=True)
print('Downloading subset files...')
futures = harmony_client.download_all(subset_job_id, overwrite=True)
for f in futures:
    # result() blocks until this file has finished downloading
    filename = f.result()
    # all subsetted files have this suffix
    if filename.endswith('subsetted.h5'):
        print(f'Downloaded: {filename}')
print('Done downloading files.')

Processing job: bf0998a8-cec8-4a1e-8519-f15027d47411
Waiting for the job to finish


 [ Processing:   1% ] |                                                   | [\]
Job has been paused. Call `resume()` to resume.
 [ Processing: 100% ] |###################################################| [|]


Downloading subset files...


Job has been paused. Call `resume()` to resume.


77422564_GEDI02_A_2019108063057_O01963_04_T01067_02_003_01_V002_subsetted.h5
Downloaded: 77422564_GEDI02_A_2019108063057_O01963_04_T01067_02_003_01_V002_subsetted.h5
Done downloading files.
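
The loop above calls `result()` on the futures in the order they were returned, so a slow first file delays the handling of files that finished earlier. The standard library's `as_completed` is one way to handle futures in completion order instead. The sketch below illustrates the idea with simulated downloads only: `fake_download`, the filenames, and the durations are all hypothetical, and no Harmony call is made.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(filename, seconds):
    """Stand-in for a file download: sleep, then return the local filename."""
    time.sleep(seconds)
    return filename


# Hypothetical filenames and durations, for illustration only
jobs = {"a_subsetted.h5": 0.2, "b_subsetted.h5": 0.05, "c_subsetted.h5": 0.1}

finished = []
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fake_download, name, secs)
               for name, secs in jobs.items()]
    # as_completed yields each future as soon as it is done,
    # regardless of the order in which the work was submitted
    for future in as_completed(futures):
        finished.append(future.result())
        print(f"Ready to use: {finished[-1]}")
```

The order in which the filenames are printed depends on timing, so it can differ from the submission order.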


## Read Subset files
All the subsetted files are saved with the suffix `_subsetted.h5`. Let’s read these `h5` files into a single pandas DataFrame, with one row per shot and one column per variable.


In [6]:
subset_df = pd.DataFrame()
for subfile in glob('*GEDI02_A_*_subsetted.h5'):
    with h5py.File(subfile, 'r') as hf_in:
        for v in list(hf_in.keys()):
            if v.startswith('BEAM'):
                beam = hf_in[v]
                col_names = []
                col_val = []
                # read all variables
                for key, value in beam.items():
                    col_names.append(key)
                    col_val.append(value[:].tolist())
            
                # Appending to the subset_df dataframe
                beam_df = pd.DataFrame(map(list, zip(*col_val)), columns=col_names)
                subset_df = pd.concat([subset_df, beam_df])
        
# print head of dataframe
subset_df.head()

|   | delta_time | lat_lowestmode | lon_lowestmode | rh | shot_number |
|---|---|---|---|---|---|
| 0 | 40808680.0 | -13.676622 | 167.594001 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 19630000400366077 |
| 1 | 40808680.0 | -13.677037 | 167.594318 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 19630000400366078 |
| 2 | 40808680.0 | -13.677452 | 167.594635 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 19630000400366079 |
| 3 | 40808680.0 | -13.677866 | 167.594952 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 19630000400366080 |
| 4 | 40808680.0 | -13.678281 | 167.595269 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | 19630000400366081 |
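
Each `rh` cell holds a list of relative height values rather than a single number, so a common next step is to pull one percentile into its own column. The sketch below assumes each `rh` entry contains 101 values, indexed RH0 through RH100, so that position 98 corresponds to RH98. `demo_df` is a synthetic stand-in for `subset_df`, and `rh_percentile` is a hypothetical helper.

```python
import pandas as pd

# Synthetic stand-in for subset_df: each `rh` cell is a list of 101 values
demo_df = pd.DataFrame({
    "shot_number": [1, 2],
    "rh": [list(range(101)), [2 * i for i in range(101)]],
})


def rh_percentile(df, p):
    """Return the p-th relative height value of every shot as a Series."""
    return df["rh"].apply(lambda values: values[p])


demo_df["rh98"] = rh_percentile(demo_df, 98)
demo_df["rh50"] = rh_percentile(demo_df, 50)
print(demo_df[["shot_number", "rh50", "rh98"]])
```

Applied to the tutorial's data, the same helper would be called as `rh_percentile(subset_df, 98)`.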
