# How to Search and Load GES DISC Cloud OPeNDAP Collections
### Author: Chris Battisto
### Date Authored: 05-20-2025

### Timing

Exercise: 5 minutes

### Overview

This notebook demonstrates how to search for and load Cloud OPeNDAP-enabled collections and granules using the `earthaccess` and `requests` libraries. It first finds GES DISC Cloud OPeNDAP-enabled collections by querying the [Common Metadata Repository (CMR) API](https://cmr.earthdata.nasa.gov/search/), and then loads a single granule from one of those collections.

### Prerequisites

This notebook was written using Python 3.10, and requires:
- Valid [Earthdata Login credentials](https://urs.earthdata.nasa.gov), and the generation of [Earthdata Prerequisite Files](https://disc.gsfc.nasa.gov/information/howto?title=How%20to%20Generate%20Earthdata%20Prerequisite%20Files) including the `.netrc`, `.dodsrc`, and `.edl_token` files.
- [Xarray](https://docs.xarray.dev/en/stable/)
- [certifi](https://pypi.org/project/certifi/)
- [requests](https://pypi.org/project/requests/)
- [earthaccess](https://earthaccess.readthedocs.io/en/latest/)
- [Pydap >=v3.5](https://pydap.github.io/pydap/en/intro.html)
- [tabulate](https://pypi.org/project/tabulate/)

#### Optional Anaconda Environment YAML:

This notebook can be run using the ['opendap' YAML file](https://github.com/nasa/gesdisc-tutorials/tree/main/environments/opendap.yml) provided in the 'environments' subfolder.

Please follow the [conda instructions for creating an environment from a YAML file](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) to install and activate this environment.

### Import Libraries

In [1]:
import certifi
from datetime import datetime
import json
import os
import requests
import urllib3
import earthaccess
import xarray as xr
from pydap.cas.urs import setup_session
from tabulate import tabulate
from IPython.display import display, HTML, Markdown

### Create Functions to Search for Cloud OPeNDAP Collections

In [2]:
def getCollections(env, token, page_size, page_num):
    ''' CMR Search for all GESDISC Collections '''
    daac='GES_DISC'
    params = {'provider':daac,
              'page_size':page_size, # Must be 2000 or less
              'page_num':page_num,  
             }
    searchUrl = getUrl(env,'collections.umm_json',params)
    collRecords = getResult(searchUrl)
    return collRecords

def getSvcAssoc():
    ''' 
    Returns a dictionary mapping collection concept-ids to native-ids for the
    collections associated with the Cloud Hyrax UMM-S record, plus the service's LongName
    '''
    env = 'PROD'
    umms = 'S2874702816-XYZ_PROV'
    coll_names = {}
    LongName = None
    collections = []  # Stays empty if the service record cannot be retrieved
    response = getResult(getUrl(env,'services.umm_json',{'concept_id':umms}))
    # getResult returns None on failure, so check before indexing
    if response is not None and len(response['items']) == 1:
        LongName = response['items'][0]['umm']['LongName']
        collections = response['items'][0]['meta']['associations']['collections']
    for cid in collections:
        if 'GES_DISC' in cid:
            nid = concept2native(env,cid)
            if nid is not None:
                coll_names[cid] = nid
    return coll_names, LongName

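def getResult(url):
    ''' Returns the parsed JSON response from an input URL, or None if the request fails '''
    try:
        result = requests.get(url)
        result.raise_for_status()
        return result.json()
    except (requests.exceptions.RequestException, ValueError):
        # Covers connection errors, HTTP error statuses, and invalid JSON
        print('failed URL:',url)
        return None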
        
def getUrl(env, search_type, queryD):
    ''' Constructs a CMR search URL (env is currently unused; the production endpoint is always used) '''

    # The CMR search API lives under the /search path
    base = 'https://cmr.earthdata.nasa.gov/search'

    # Expand the query's key:value pairs into search parameters
    params = '&'.join('{}={}'.format(key, value) for key, value in queryD.items())

    # Build the URL
    url = '{}/{}?{}'.format(base, search_type, params)
    return url

def concept2native(env,cID):
    '''
    Returns a collection's native-id when given a concept-id and a CMR environment
    '''
    # Construct the search URL and get the response
    url = getUrl(env,'collections.umm_json',{'concept_id':cID})
    response = getResult(url)
    try:
        return response['items'][0]['meta']['native-id']
    except (TypeError, KeyError, IndexError):
        # response is None (failed request) or contains no matching collection
        return None
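
`getCollections` takes `page_size` and `page_num`, so retrieving every collection means requesting successive pages until one comes back short. The following is a minimal sketch of that loop, not part of the original notebook: `iter_all_items` and `fake_fetch` are hypothetical names, and `fake_fetch` is an offline stand-in for CMR so the example runs without network access.

```python
def iter_all_items(fetch_page, page_size=2000):
    """Yield every item by calling fetch_page(page_size, page_num) page by page.

    fetch_page is a hypothetical callable returning a CMR-style dict with an
    'items' list, or None on failure (as getResult does).
    """
    page_num = 1  # CMR page numbers start at 1
    while True:
        response = fetch_page(page_size, page_num)
        if response is None:
            break  # Request failed; stop rather than loop forever
        items = response.get('items', [])
        yield from items
        if len(items) < page_size:
            break  # A short (or empty) page means this was the last one
        page_num += 1


def fake_fetch(page_size, page_num):
    """Offline stand-in for a CMR query: serves five made-up records in pages."""
    records = [{'meta': {'concept-id': f'C{i}-GES_DISC'}} for i in range(5)]
    start = (page_num - 1) * page_size
    return {'items': records[start:start + page_size]}


all_items = list(iter_all_items(fake_fetch, page_size=2))
print(len(all_items))
```

With the notebook's own functions, the same loop could be driven by `iter_all_items(lambda size, num: getCollections('PROD', None, size, num))`.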


### Table of Cloud OPeNDAP-Enabled GES DISC Collections

In [3]:
# Get a dictionary of concept-ids and nativeids for associated collections
colls_opendap_prod, LongName = getSvcAssoc() 

In [4]:
# Prepare headers and rows
headers = ["Collection Short Name", "CMR Virtual Directory Page"]
rows = []

for concept_id, short_name in sorted(colls_opendap_prod.items(), key=lambda x: x[1]):
    url = f"https://cmr.earthdata.nasa.gov/virtual-directory/collections/{concept_id}/temporal"
    clickable_url = f'<a href="{url}" target="_blank">{url}</a>'
    rows.append([short_name, clickable_url])

# Generate HTML table (don't escape the <a> tags)
html_table = tabulate(rows, headers=headers, tablefmt="unsafehtml", stralign="left")

# Wrap in scrollable container
scrollable_html = f"""
<div id="enabled-collections-table" style="height: 300px; overflow: auto; border: 1px solid #ccc; padding: 10px;">
    {html_table}
</div>
"""

# Display in Jupyter
display(HTML(scrollable_html))

| Collection Short Name | CMR Virtual Directory Page |
| --- | --- |
| FluxSatMGPP_L3_Daily_p05deg_2.2 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3464727944-GES_DISC/temporal |
| FluxSatMGPP_L3_Daily_p5deg_2.2 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3464727089-GES_DISC/temporal |
| GLDAS_CLSM10_3H_2.0 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1933574500-GES_DISC/temporal |
| GLDAS_CLSM10_3H_2.1 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1690022359-GES_DISC/temporal |
| GLDAS_CLSM10_3H_EP_2.1 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1700900748-GES_DISC/temporal |
| GLDAS_CLSM10_M_2.0 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1933574465-GES_DISC/temporal |
| GLDAS_CLSM10_M_2.1 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1690022314-GES_DISC/temporal |
| GLDAS_CLSM10_M_EP_2.1 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1700900777-GES_DISC/temporal |
| GLDAS_NOAH10_3H_2.0 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1233767548-GES_DISC/temporal |
| GLDAS_NOAH10_3H_2.1 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1288937556-GES_DISC/temporal |
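
Outside of Jupyter, the same listing can be printed as plain text. The sketch below assumes a dictionary shaped like `colls_opendap_prod` (concept-id keys, native-id values); `virtual_directory_rows` is a hypothetical helper, and `sample` holds two entries copied from the table above.

```python
def virtual_directory_rows(collections):
    """Return (native_id, url) pairs sorted by native-id."""
    base = 'https://cmr.earthdata.nasa.gov/virtual-directory/collections'
    ordered = sorted(collections.items(), key=lambda kv: kv[1])
    return [(native_id, f'{base}/{concept_id}/temporal')
            for concept_id, native_id in ordered]


# Two entries taken from the table of enabled collections
sample = {
    'C1233767548-GES_DISC': 'GLDAS_NOAH10_3H_2.0',
    'C3464727944-GES_DISC': 'FluxSatMGPP_L3_Daily_p05deg_2.2',
}

rows = virtual_directory_rows(sample)
for native_id, url in rows:
    # Left-align the name in a fixed-width column
    print(f'{native_id:<35} {url}')
```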


### Query a Day of `GLDAS_NOAH10_3H_2.0` Cloud OPeNDAP Granules

Note that the host of each URL is `opendap.earthdata.nasa.gov`, indicating that it is served by Cloud OPeNDAP. These methods are adapted from the OPeNDAP steps in [How to Access GES DISC Data Using Python](https://disc.gsfc.nasa.gov/information/howto?keywords=python&title=How%20to%20Access%20GES%20DISC%20Data%20Using%20Python).

In [8]:
# Search for GLDAS_NOAH10_3H v2.0 granules on 2000-01-01 to obtain their Cloud OPeNDAP URLs
results = earthaccess.search_data(
    short_name="GLDAS_NOAH10_3H",
    version='2.0',
    temporal=('2000-01-01', '2000-01-01'), 
    bounding_box=(-180, 0, 180, 90)
)

# Extract the OPeNDAP URL from each granule record and append a constraint expression,
# collecting the results in a list so that multiple granules can be queried
opendap_urls = []
for item in results:
    for urls in item['umm']['RelatedUrls']:  # Iterate over RelatedUrls in each granule record
        if 'OPENDAP' in urls.get('Description', '').upper():  # Check if 'OPENDAP' is in the Description
            # Extract OPeNDAP URL
            url = urls['URL']

            # Subset Tair_f_inst, lat, lon, and time
            # To view all variables, comment out these two lines
            ce = "?dap4.ce=/{}%3B/{}%3B/{}%3B/{}".format("Tair_f_inst", "lat", "lon", "time")
            url = url + ce

            # Add URL to list
            opendap_urls.append(url)

opendap_urls

['https://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0000.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'https://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0300.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'https://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0600.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'https://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0900.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'https://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.1200.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'https://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granu
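
The constraint expression appended above is a list of root-level variable paths (`/name`) joined by semicolons, with each semicolon percent-encoded as `%3B`. As a minimal sketch of that construction for an arbitrary variable list, the hypothetical helpers below (`build_dap4_ce` and `is_cloud_opendap` are not part of the notebook or of any library) use only the standard library:

```python
from urllib.parse import quote, urlsplit


def build_dap4_ce(variables):
    """Build a DAP4 constraint-expression query string selecting the named variables."""
    # Each variable becomes a root-level path; ';' separators are encoded as %3B,
    # while '/' is left unescaped
    paths = ['/' + name for name in variables]
    return '?dap4.ce=' + quote(';'.join(paths), safe='/')


def is_cloud_opendap(url):
    """Return True if the URL's host is the Cloud OPeNDAP server."""
    return urlsplit(url).hostname == 'opendap.earthdata.nasa.gov'


ce = build_dap4_ce(['Tair_f_inst', 'lat', 'lon', 'time'])
print(ce)  # ?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time
```

The result matches the `ce` string built with `str.format` in the cell above, so it can be appended to a granule URL in the same way.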

Open a single URL into a single `Xarray` dataset, using the `.edl_token` authentication method:

In [10]:
# Build the path to the .edl_token file in the home directory
token_file_path = os.path.join(os.path.expanduser("~"), ".edl_token")

# Read the token from the .edl_token file
with open(token_file_path, 'r') as token_file:
    token = token_file.read().strip()  # Ensure to strip any newlines or extra spaces

# Add the token to the session's request headers (update keeps the default headers intact)
my_session = requests.Session()
my_session.headers.update({
    'Authorization': f'Bearer {token}'
})

try:
    # Load dataset object and metadata, but don't open the values yet
    # NOTE: When opening HDF files, the group to be accessed must be specified with the "group=" parameter. 
    #       E.g., for GPM IMERG, group="Grid" must be entered or an error will occur
    # Remove the session parameter if you are just using a .netrc file to authenticate
    # Change to open_mfdataset if you want to open and concatenate all URLs
    ds = xr.open_dataset(opendap_urls[0], engine="pydap", session=my_session)
except OSError as e:
    print('Error', e)
    print('Please check that your .edl_token file exists and is valid, or that your username/password were entered correctly.')
    raise

print(ds)

<xarray.Dataset> Size: 218kB
Dimensions:      (/lon: 360, /time: 1, /lat: 150)
Dimensions without coordinates: /lon, /time, /lat
Data variables:
    lon          (/lon) float32 1kB ...
    time         (/time) datetime64[ns] 8B ...
    lat          (/lat) float32 600B ...
    Tair_f_inst  (/time, /lat, /lon) float32 216kB ...
Attributes: (12/24)
    CDI:                    Climate Data Interface version 1.9.8 (https://mpi...
    Conventions:            CF-1.6
    history:                created on date: 2019-09-28T13:25:22.668
    source:                 Noah_v3.6 forced with Princeton_V2.2
    institution:            NASA GSFC
    missing_value:          -9999.0
    ...                     ...
    CDO:                    Climate Data Operators version 1.9.8 (https://mpi...
    build_dmrpp:            3.21.0-272
    bes:                    3.21.0-272
    libdap:                 libdap-3.21.0-70
    configuration:          \n# TheBESKeys::get_as_config()\nAllowedHosts=^ht...
    invocat
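
A missing or empty `.edl_token` file otherwise surfaces only later as an authentication error, so it can help to validate the file before building the session. The following is a minimal sketch under that assumption: `bearer_headers` is a hypothetical helper, and the demonstration writes a placeholder string (not a real token) to a temporary file.

```python
import os
import tempfile


def bearer_headers(token_path):
    """Read an Earthdata Login token file and return an Authorization header dict."""
    if not os.path.isfile(token_path):
        raise FileNotFoundError(f'Token file not found: {token_path}')
    with open(token_path, 'r') as token_file:
        token = token_file.read().strip()  # Drop trailing newlines and spaces
    if not token:
        raise ValueError(f'Token file is empty: {token_path}')
    return {'Authorization': f'Bearer {token}'}


# Demonstrate with a throwaway file holding a placeholder value
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, '.edl_token')
    with open(demo_path, 'w') as demo_file:
        demo_file.write('PLACEHOLDER_TOKEN\n')
    headers = bearer_headers(demo_path)

print(headers)  # {'Authorization': 'Bearer PLACEHOLDER_TOKEN'}
```

In the notebook, the returned dictionary could be passed to `my_session.headers.update(...)` with `token_file_path` as the argument.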