# How to Search and Load GES DISC Cloud OPeNDAP Collections
### Author: Chris Battisto
### Date Authored: 05-20-2025

### Timing

Exercise: 5 minutes

### Overview

This notebook shows how to find GES DISC Cloud OPeNDAP-enabled collections by querying the [Common Metadata Repository (CMR) API](https://cmr.earthdata.nasa.gov/search/) with the `requests` library, and then how to search for and load a single granule from one of those collections using `earthaccess`, `pydap`, and `Xarray`.

### Prerequisites

This notebook was written using Python 3.10, and requires:
- Valid [Earthdata Login credentials](https://urs.earthdata.nasa.gov), and the generation of [Earthdata Prerequisite Files](https://disc.gsfc.nasa.gov/information/howto?title=How%20to%20Generate%20Earthdata%20Prerequisite%20Files) including the `.netrc`, `.dodsrc`, and `.edl_token` files.
- [Xarray >=2025.4.0](https://docs.xarray.dev/en/stable/)
- [requests](https://pypi.org/project/requests/)
- [earthaccess](https://earthaccess.readthedocs.io/en/latest/)
- [Pydap >=v3.5](https://pydap.github.io/pydap/en/intro.html)
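
Before running the notebook, it can be useful to confirm that the prerequisite files are in place. The following is a minimal sketch, not part of the original notebook: the helper name `find_missing_prerequisites` is hypothetical, and it assumes the three files live directly in your home directory (on some systems they may be named or located differently).

```python
from pathlib import Path

# File names taken from the prerequisites list above; their location in the
# home directory is an assumption of this sketch.
PREREQUISITE_FILES = (".netrc", ".dodsrc", ".edl_token")


def find_missing_prerequisites(home=None):
    """Return the prerequisite file names that are not found under `home`."""
    home = Path(home) if home is not None else Path.home()
    return [name for name in PREREQUISITE_FILES if not (home / name).is_file()]


missing = find_missing_prerequisites()
if missing:
    print("Missing prerequisite files:", ", ".join(missing))
else:
    print("All prerequisite files were found.")
```

An empty list means all three files exist; it does not verify that the credentials inside them are valid.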

#### Optional Anaconda Environment YAML:

This notebook can be run using the ['nasa-gesdisc-opendap' YAML file](https://github.com/nasa/gesdisc-tutorials/tree/main/environments/nasa-gesdisc-opendap.yml) provided in the 'environments' subfolder.

Please follow the instructions [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) to install and activate this environment. 

### Import Libraries

In [1]:
from datetime import datetime
import json
import os
import requests
import earthaccess
import xarray as xr
from pydap.net import create_session
from tabulate import tabulate
from IPython.display import display, HTML, Markdown
import sys
sys.path.append(os.path.abspath(".."))
from utils.cloud_opendap_nb_utils import display_filtered_table

### Create Functions to Search for Cloud OPeNDAP Collections

In [2]:
def getCollections(env, token, page_size, page_num):
    ''' CMR Search for all GESDISC Collections '''
    daac='GES_DISC'
    params = {'provider':daac,
              'page_size':page_size, # Must be 2000 or less
              'page_num':page_num,  
             }
    searchUrl = getUrl(env,'collections.umm_json',params)
    collRecords = getResult(searchUrl)
    return collRecords

def getSvcAssoc():
    ''' 
    Returns a dictionary of collection concept-ids and native-ids associated with the Cloud Hyrax UMM-S record,
    along with the LongName of that service record
    '''
    env = 'PROD'
    umms = 'S2874702816-XYZ_PROV'
    coll_names = {}
    LongName = None
    collections = []  # Stays empty if the service record cannot be retrieved
    response = getResult(getUrl(env,'services'+'.umm_json',{'concept_id':umms}))
    if response is not None and len(response['items']) == 1 :
        LongName = response['items'][0]['umm']['LongName']
        collections = response['items'][0]['meta']['associations']['collections']
    for cid in collections:
        if 'GES_DISC' in cid:
            nid = concept2native(env,cid)
            if nid is not None:
                coll_names[cid] = nid
    return coll_names, LongName

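def getResult(url):
    ''' Returns json-formatted response from an input URL, or None if the request fails '''
    try:
        result = requests.get(url)
        result.raise_for_status()
        return result.json()
    except (requests.RequestException, ValueError):
        # Covers connection errors, HTTP error codes, and invalid JSON
        print('failed URL:',url)
        return None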
        
def getUrl(env, search_type, queryD):
    ''' Constructs a CMR search URL (env is currently unused; the production CMR is always queried) '''

    # The CMR search API lives under the /search path
    base = 'https://cmr.earthdata.nasa.gov/search'

    # Expand the query's key:value pairs into search parameters
    params = '&'.join('{}={}'.format(key, value) for key, value in queryD.items())

    # Build the URL
    url = '{}/{}?{}'.format(base, search_type, params)
    return url

def concept2native(env,cID):
    '''
    Returns a collection's native-id when given a concept-id and a CMR environment
    '''
    # construct the search URL and get the response
    url = getUrl(env,'collections.umm_json',{'concept_id':cID})
    response = getResult(url)
    try:
        return response['items'][0]['meta']['native-id']
    except (KeyError, IndexError, TypeError):
        # No matching record, or the request failed and response is None
        return None
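
As an alternative to assembling the query string by hand, the standard library can build and percent-encode it. The following is a minimal sketch and not the notebook's own method: `build_cmr_url` is a hypothetical helper, and the search root is assumed from the CMR API link in the Overview.

```python
from urllib.parse import urlencode

# Assumed root of the CMR search API (taken from the link in the Overview)
CMR_SEARCH_ROOT = "https://cmr.earthdata.nasa.gov/search"


def build_cmr_url(search_type, query):
    """Return a CMR search URL for `search_type` with an encoded query string."""
    return "{}/{}?{}".format(CMR_SEARCH_ROOT, search_type, urlencode(query))


# Same parameters that getCollections passes, with a small page size
url = build_cmr_url(
    "collections.umm_json",
    {"provider": "GES_DISC", "page_size": 10, "page_num": 1},
)
print(url)
```

`urlencode` also escapes characters such as spaces or `&` in parameter values, which plain string concatenation would pass through unchanged.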


### Table of Cloud OPeNDAP-Enabled GES DISC Collections

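First, we retrieve a dictionary of concept-ids and native-ids for the collections associated with the Cloud OPeNDAP service record.
This cell may take a few minutes to run, because each collection's native-id is looked up with a separate CMR request.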

In [3]:
colls_opendap_prod, LongName = getSvcAssoc() 

Display a table of all Cloud OPeNDAP-enabled collections. Entering a short name in the "Search collections..." bar filters the table (the search bar does not work when viewing this notebook on GitHub).

The source code containing the `display_filtered_table` function can be found in utils/cloud_opendap_nb_utils.py.

<a id="#enabled-collections-table"></a>

In [4]:
display_filtered_table(colls_opendap_prod)

| Collection Short Name | CMR Virtual Directory Page |
|---|---|
| ACOSMonthlyGriddedXCO2_3 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C2219374316-GES_DISC/temporal |
| AER_DBDT_D10KM_L3_MODIS_AQUA_001 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3618504061-GES_DISC/temporal |
| AER_DBDT_D10KM_L3_MODIS_TERRA_001 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3618500076-GES_DISC/temporal |
| AER_DBDT_D10KM_L3_VIIRS_NOAA20_001 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3618498508-GES_DISC/temporal |
| AER_DBDT_D10KM_L3_VIIRS_SNPP_001 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3616854935-GES_DISC/temporal |
| AER_DBDT_M10KM_L3_MODIS_AQUA_001 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3618495995-GES_DISC/temporal |
| AER_DBDT_M10KM_L3_MODIS_TERRA_001 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3618492375-GES_DISC/temporal |
| AER_DBDT_M10KM_L3_VIIRS_NOAA20_001 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3618490432-GES_DISC/temporal |
| AER_DBDT_M10KM_L3_VIIRS_SNPP_001 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3616941450-GES_DISC/temporal |
| AIRH2CCF_006 | https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1243477316-GES_DISC/temporal |
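
The links in the table follow a simple pattern built from each collection's concept-id. The sketch below illustrates that pattern and a plain-Python version of the search-bar filter; it is not the implementation of `display_filtered_table`. The helper names are hypothetical, and it assumes the dictionary values are the names shown in the first column.

```python
# URL pattern inferred from the table output above
VIRTUAL_DIRECTORY_ROOT = "https://cmr.earthdata.nasa.gov/virtual-directory/collections"


def virtual_directory_url(concept_id):
    """Return the CMR virtual directory page for a collection concept-id."""
    return "{}/{}/temporal".format(VIRTUAL_DIRECTORY_ROOT, concept_id)


def filter_collections(colls, query=""):
    """Return (name, url) rows whose name contains `query`, ignoring case.

    `colls` maps concept-id -> displayed name, like the dictionary
    returned by getSvcAssoc (assumed here).
    """
    rows = [(name, virtual_directory_url(cid)) for cid, name in colls.items()]
    return sorted(row for row in rows if query.lower() in row[0].lower())


# Two entries copied from the table above
sample = {
    "C2219374316-GES_DISC": "ACOSMonthlyGriddedXCO2_3",
    "C1243477316-GES_DISC": "AIRH2CCF_006",
}
for name, url in filter_collections(sample, "airh"):
    print(name, url)
```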


### Query a Day of `GLDAS_NOAH10_3H_2.0` Cloud OPeNDAP Granules

The cell below searches for one day of granules and extracts each granule's OPeNDAP URL. Each URL points to `opendap.earthdata.nasa.gov`, indicating that it is served by Cloud OPeNDAP. These methods are adapted from the OPeNDAP steps in [How to Access GES DISC Data with Python](https://disc.gsfc.nasa.gov/information/howto?keywords=python&title=How%20to%20Access%20GES%20DISC%20Data%20Using%20Python). Because we append a DAP4 constraint expression to subset variables, the `https://` scheme must be replaced with `dap4://`; otherwise, a 500 error will occur.
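
The URL transformation can be isolated in a small function. This is a minimal sketch under the conventions described above: `build_dap4_url` is a hypothetical helper name, and the constraint expression joins variable paths with `%3B` (a URL-encoded semicolon), as in the query cell.

```python
def build_dap4_url(opendap_url, variables=None):
    """Convert an https OPeNDAP URL to dap4 and append an optional subset.

    `variables` is a list of variable names to keep; if empty or None,
    no constraint expression is added and all variables are returned.
    """
    if not opendap_url.startswith("https://"):
        raise ValueError("Expected an https:// URL, got: " + opendap_url)

    # Swap only the scheme, leaving the rest of the URL untouched
    url = "dap4://" + opendap_url[len("https://"):]

    if variables:
        # Each variable is written as /name, separated by encoded semicolons
        url += "?dap4.ce=" + "%3B".join("/" + name for name in variables)
    return url


example = build_dap4_url(
    "https://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/"
    "GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0000.020.nc4",
    ["Tair_f_inst", "lat", "lon", "time"],
)
print(example)
```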

In [5]:
# Create search query for 2000-01-01 Cloud OPeNDAP URL
results = earthaccess.search_data(
    short_name="GLDAS_NOAH10_3H",
    version='2.0',
    temporal=('2000-01-01', '2000-01-01'), 
    bounding_box=(-180, 0, 180, 90)
)

# Parse out URL from request, add to OPeNDAP URLs list for querying multiple granules with constraint expressions
opendap_urls = []
for item in results:
    for urls in item['umm']['RelatedUrls']:  # Iterate over RelatedUrls in each granule record
        if 'OPENDAP' in urls.get('Description', '').upper():  # Check if 'OPENDAP' is in the Description
            # Extract OPeNDAP URL, replace only the leading "https://" scheme with "dap4://"
            url = urls['URL'].replace('https://', 'dap4://', 1)

            # Subset Tair_f_inst, lat, lon, and time
            # To view all variables, comment out these two lines
            ce = "?dap4.ce=/{}%3B/{}%3B/{}%3B/{}".format("Tair_f_inst", "lat", "lon", "time")
            url = url + ce

            # Add URL to list
            opendap_urls.append(url)

opendap_urls

['dap4://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0000.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'dap4://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0300.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'dap4://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0600.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'dap4://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0900.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'dap4://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.1200.020.nc4?dap4.ce=/Tair_f_inst%3B/lat%3B/lon%3B/time',
 'dap4://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/GL

Open the first URL as an `Xarray` dataset, authenticating with the credentials in your `.netrc` file:

In [6]:
my_session = create_session()

try:
    # Load dataset object and metadata, but don't open the values yet
    # NOTE: When opening HDF files, the group to be accessed must be specified with the "group=" parameter. 
    #       E.g., for GPM IMERG, group="Grid" must be entered or an error will occur
    # Remove the session parameter if you are just using a .netrc file to authenticate
    # Change to open_mfdataset if you want to open and concatenate all URLs
    ds = xr.open_dataset(opendap_urls[0], engine="pydap", session=my_session)
except OSError as e:
    print('Error', e)
    print('Please check that your .edl_token file exists and is valid, or that your username/password were entered correctly.')
    raise

print(ds)

<xarray.Dataset> Size: 218kB
Dimensions:      (lon: 360, time: 1, lat: 150)
Coordinates:
  * lon          (lon) float32 1kB -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
  * time         (time) datetime64[ns] 8B 2000-01-01
  * lat          (lat) float32 600B -59.5 -58.5 -57.5 -56.5 ... 87.5 88.5 89.5
Data variables:
    Tair_f_inst  (time, lat, lon) float32 216kB ...
Attributes: (12/19)
    CDI:                    Climate Data Interface version 1.9.8 (https://mpi...
    Conventions:            CF-1.6
    history:                created on date: 2019-09-28T13:25:22.668
    source:                 Noah_v3.6 forced with Princeton_V2.2
    institution:            NASA GSFC
    missing_value:          -9999.0
    ...                     ...
    MAP_PROJECTION:         EQUIDISTANT CYLINDRICAL
    SOUTH_WEST_CORNER_LAT:  -59.5
    SOUTH_WEST_CORNER_LON:  -179.5
    DX:                     1.0
    DY:                     1.0
    CDO:                    Climate Data Operators version 1.9.8 (https:
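
When working with more than one granule, it can help to confirm which time step each URL covers before opening or concatenating them. The sketch below reads the timestamp from the granule file name. It assumes the `.AYYYYMMDD.HHMM.` pattern visible in the URLs listed earlier holds for this collection, and `granule_time` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches the ".AYYYYMMDD.HHMM." portion of a granule file name (assumed pattern)
_TIMESTAMP = re.compile(r"\.A(\d{8})\.(\d{4})\.")


def granule_time(url):
    """Return the datetime encoded in a granule URL's file name."""
    match = _TIMESTAMP.search(url)
    if match is None:
        raise ValueError("No timestamp found in: " + url)
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M")


urls = [
    "dap4://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/"
    "GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0300.020.nc4",
    "dap4://opendap.earthdata.nasa.gov/collections/C1233767548-GES_DISC/granules/"
    "GLDAS_NOAH10_3H.2.0%3AGLDAS_NOAH10_3H.A20000101.0000.020.nc4",
]

# Sort URLs chronologically and print the time step of each
ordered = sorted(urls, key=granule_time)
for url in ordered:
    print(granule_time(url).isoformat())
```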