# Searching and Downloading GEDI L4A Dataset

This tutorial will demonstrate how to search and download [Global Ecosystem Dynamics Investigation (GEDI) L4A Footprint Level Aboveground Biomass Density (AGBD)](https://doi.org/10.3334/ORNLDAAC/2056) dataset. GEDI L4A dataset is available for the period starting 2019-04-17 and covers latitudes of 52 North to 52 South. GEDI L4A data files are natively in HDF5 format, and each file represents one International Space Station (ISS) orbit. 

We will use NASA's Earthdata [Common Metadata Repository (CMR) Application Programming Interface (API)](https://cmr.earthdata.nasa.gov/search) to search for GEDI L4A files or granules. CMR catalogs metadata records of NASA Earth Science data and make them available for easy programmatic access.

This tutorial requires the following Python modules installed in your system: `requests`, `datetime`, `pandas`. Requirements are also in [requirements.txt](requirements.txt). To install the necessary Python modules, you can copy the requirements.txt from this repository and run:
```bash
pip install -r requirements.txt
```

In [1]:
%matplotlib inline
import requests
import datetime as dt 
import pandas as pd

## 1. Building granule url

NASA EarthData's unique ID for this dataset (called `Concept ID`) is needed for searching the dataset. The dataset Digital Object Identifier or DOI can be used to obtain the `Concept ID`.

In [2]:
doi = '10.3334/ORNLDAAC/2056'# GEDI L4A DOI 

# CMR API base url
cmrurl='https://cmr.earthdata.nasa.gov/search/' 

doisearch = cmrurl + 'collections.json?doi=' + doi
response = requests.get(doisearch)
response.raise_for_status()
concept_id = response.json()['feed']['entry'][0]['id']

print(concept_id)

C2237824918-ORNL_CLOUD


This is the unique NASA-given concept ID for the GEDI 4A dataset, which can be used to retrieve relevant files (or granules) for GEDI L4A. 

The CMR API only allows searching 2000 files at a time. Using `page_num` parameter allows to loop through the search result pages. We will use [pandas dataframe](https://pandas.pydata.org/) to store the download URLs of each file and their bounding geometries.

In [3]:
page_num = 1
page_size = 2000 # CMR page size limit

granule_arr = []

while True:
    
    # defining parameters
    cmr_param = {
        "collection_concept_id": concept_id, 
        "page_size": page_size,
        "page_num": page_num
    }
    
    granulesearch = cmrurl + 'granules.json'

    response = requests.get(granulesearch, params=cmr_param)
    response.raise_for_status()
    granules = response.json()['feed']['entry']
    
    if granules:
        for g in granules:
            # Get URL to HDF5 files
            for links in g['links']:
                if 'title' in links and links['title'].startswith('Download') \
                and links['title'].endswith('.h5'):
                    granule_url = links['href']
                    granule_arr.append(granule_url)
               
        page_num += 1
    else: 
        break

# creating a pandas dataframe
l4adf = pd.DataFrame(granule_arr, columns=["granule_url"])
print ("Total granules found: ", len(l4adf.index)-1)

Total granules found:  70694


Now, we have stored the granule URLs into the pandas dataframe `l4adf`. The first few rows of the table look like the following.

In [4]:
l4adf.head()

Unnamed: 0,granule_url
0,https://data.ornldaac.earthdata.nasa.gov/prote...
1,https://data.ornldaac.earthdata.nasa.gov/prote...
2,https://data.ornldaac.earthdata.nasa.gov/prote...
3,https://data.ornldaac.earthdata.nasa.gov/prote...
4,https://data.ornldaac.earthdata.nasa.gov/prote...


## 2. Downloading the files

We recommend using utilities such as `cURL` or `wget` to download the files. You will first need to set up NASA Earthdata Login authentication using `.netrc` file. Please refer to this page for details on setting up such authentication: https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+cURL+And+Wget.

Once the authentication has been setup, the GEDI L4a files can be downloaded as follows.

First, save the granule URLs to a file `granules.txt`. 

In [6]:
l4adf.to_csv('granules.txt', columns = ['granule_url'], index=False, header = False)

Either of the following commands can then be issued from the terminal to download the files.

#### wget
```
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies --content-disposition -nc -i granules.txt
```
#### curl
```
cat granules.txt | tr -d '\r' | xargs -n 1 curl -LJO -n -c ~/.urs_cookies -b ~/.urs_cookies
```