# Accessing MERRA-2 Data via OPeNDAP 

**Requirements to run this notebook**

1. An Earthdata Login (EDL) account.
2. A preferred method of authentication (this notebook uses token-based authentication via `earthaccess`).

**Objectives**
 
Use best practices from OPeNDAP, [pydap](https://pydap.github.io/pydap/), and xarray to:

- Discover all OPeNDAP URLs associated with a MERRA-2 collection.
- Authenticate via EDL (token-based).
- Explore the MERRA-2 collection and filter variables.
- Consolidate metadata at the collection level.
- Download/stream a subset of interest.



`Author`: Miguel Jimenez-Urias, '25

In [None]:
from pydap.net import create_session
from pydap.client import get_cmr_urls, consolidate_metadata, open_url
import xarray as xr
import datetime as dt
import earthaccess
import json
import matplotlib.pyplot as plt

## Explore metadata of the MERRA-2 Collection

In [None]:
# DOI of the collection, as listed e.g. on the GES DISC MERRA-2 dataset page:
# https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_5.12.4/summary?keywords=MERRA-2
merra2_doi = "10.5067/VJAFPLI1CSIV"
# Two months of data (January and February 2023)
time_range = [dt.datetime(2023, 1, 1), dt.datetime(2023, 2, 28)]

In [None]:
urls = get_cmr_urls(doi=merra2_doi, time_range=time_range, limit=100) # you can increase the limit of results
len(urls)
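
As a rough sanity check on `len(urls)`, the expected number of granules can be estimated from the time range alone. The sketch below assumes that this collection provides one granule per day and that both endpoints of the range are included; neither assumption is checked against CMR here.

```python
import datetime as dt

# Same time range as above.
start = dt.datetime(2023, 1, 1)
end = dt.datetime(2023, 2, 28)

# Assumption: one granule per calendar day, endpoints inclusive.
expected_granules = (end.date() - start.date()).days + 1
print(expected_granules)  # 59
```

If `len(urls)` comes back equal to `limit`, the list may have been truncated and the limit should be raised.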

## Authenticate

To keep the authentication details out of the way, we use earthaccess to authenticate, and then create a cached session that will be used to consolidate all metadata.

In [None]:
auth = earthaccess.login(strategy="interactive", persist=True) # you will be prompted to enter your EDL credentials

# Pass the token authorization to a new cached session.
cache_kwargs = {'cache_name': 'database/MERRA2'}
my_session = create_session(use_cache=True, session=auth.get_session(), cache_kwargs=cache_kwargs)
my_session.cache.clear() # start from an empty cache

In [None]:
my_session.cache.cache_name

In [None]:
my_session

## Explore variables in the collection and keep only those of interest

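We do this by asking the NASA OPeNDAP server to process requests via the DAP4 protocol: replacing the `https` scheme with `dap4` selects the protocol, and a DAP4 constraint expression (`dap4.ce`) appended to each URL lists the variables to keep.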


In [None]:
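# Variables to keep (coordinates plus 2-meter fields)
variables = ['lon', 'lat', 'time', 'T2M', 'U2M', 'V2M']
# DAP4 constraint expression: "?dap4.ce=/lon;/lat;/time;/T2M;/U2M;/V2M"
CE = "?dap4.ce=" + "/" + ";/".join(variables)
# Switch only the URL scheme to dap4 and append the constraint expression
new_urls = [url.replace("https", "dap4", 1) + CE for url in urls]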
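
To make the resulting URL format concrete, here is a minimal, self-contained sketch of the same string manipulation applied to a single URL. The helper name `build_dap4_url` and the example URL are hypothetical and only serve as an illustration; real granule URLs come from `get_cmr_urls`.

```python
def build_dap4_url(url, variables):
    """Switch the scheme to dap4 and append a DAP4 constraint expression."""
    # Each variable is addressed by its full path, separated by ';'.
    ce = "?dap4.ce=" + ";".join("/" + name for name in variables)
    # Replace only the first occurrence, i.e. the URL scheme.
    return url.replace("https", "dap4", 1) + ce


# Hypothetical URL, used only to show the output format.
example_url = "https://example.org/opendap/granule.nc4"
print(build_dap4_url(example_url, ["lon", "lat", "time", "T2M"]))
# dap4://example.org/opendap/granule.nc4?dap4.ce=/lon;/lat;/time;/T2M
```

Restricting the request to a handful of variables means that only their metadata and data are requested from the server, rather than everything in each granule.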

## Consolidate Metadata

Aggregating multiple remote files at the collection level requires a persistent internet connection. The pydap backend can download the metadata required by xarray and store it locally as a sqlite3 database, and this database can then act as the session manager for further downloads. The database only needs to be created once; afterwards it can be reused, version controlled, etc. Reusing it can cut the time to generate the aggregated dataset view from minutes to seconds.


In [None]:
consolidate_metadata(new_urls, session=my_session, concat_dim='time')
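
Because the consolidated metadata lives in a file on disk, a later run can check for that file before deciding whether to rebuild it. The sketch below is a minimal illustration and not part of pydap: `cache_file_exists` is a hypothetical helper, and it assumes that the SQLite cache is stored at `<cache_name>.sqlite`, which may differ depending on the cache backend and its version.

```python
from pathlib import Path


def cache_file_exists(cache_name):
    """Return True if a SQLite cache file for `cache_name` is already on disk.

    Assumption: the cache is stored at '<cache_name>.sqlite'.
    """
    return Path(cache_name + ".sqlite").is_file()


# Same cache name as used when creating the session above.
if cache_file_exists("database/MERRA2"):
    print("Cache found: the stored metadata can be reused.")
else:
    print("No cache found: metadata needs to be consolidated first.")
```

When the file already exists, skipping `my_session.cache.clear()` and the `consolidate_metadata` call should allow the stored metadata to be reused.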

In [None]:
%%time
ds = xr.open_mfdataset(new_urls, engine='pydap', session=my_session, combine='nested', concat_dim="time")
ds
