# Download Data using OPeNDAP
### Written by Minh Phan

Open-source Project for a Network Data Access Protocol (OPeNDAP) is the developer of the client/server software of the same name, which enables scientists to share data easily over the internet ([EarthData](https://www.earthdata.nasa.gov/engage/open-data-services-and-software/api/opendap)). Given the OPeNDAP URL of a dataset on an OPeNDAP server, we can open it remotely and stream only the parts we need directly to our local machine. This tutorial is based on [this notebook from Copernicus Marine](https://help.marine.copernicus.eu/en/articles/4854800-how-to-open-and-visualize-copernicus-marine-data-using-python), with some modifications to reflect our workflow and the requirements outlined in the previous notebooks.
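
Under the hood, OPeNDAP lets a client request only a subset of a remote variable by appending a constraint expression to the dataset URL. The sketch below builds such a URL with the DAP2 hyperslab syntax (`[start:stop]`, zero-based and inclusive on both ends). The function name and the example URL are hypothetical; PyDAP essentially builds requests of this kind for you when you slice a variable, so you never need to write them by hand in this workflow.

```python
def dap2_subset_url(base_url, variable, index_ranges, response="ascii"):
    """Build a DAP2 constraint-expression URL requesting a hyperslab of one variable.

    index_ranges: one (start, stop) pair per dimension, zero-based and inclusive,
    given in the variable's dimension order.
    response: DAP2 response suffix, e.g. 'dds' (structure), 'das' (attributes),
    'ascii' (values as text) or 'dods' (binary values).
    """
    hyperslab = "".join(f"[{start}:{stop}]" for start, stop in index_ranges)
    return f"{base_url}.{response}?{variable}{hyperslab}"


# Hypothetical dataset URL: first 31 time steps, first depth level, an 11 x 11 lat/lon box
url = dap2_subset_url(
    "https://example.com/thredds/dodsC/some_dataset",
    "so",
    [(0, 30), (0, 0), (0, 10), (0, 10)],
)
print(url)  # https://example.com/thredds/dodsC/some_dataset.ascii?so[0:30][0:0][0:10][0:10]
```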

## Import necessary libraries

```python
import xarray as xr
import getpass
from pydap.client import open_url
from pydap.cas.get_cookies import setup_session
```

```python
USERNAME = 'mphan'  # replace with your own Copernicus Marine username
PASSWORD = getpass.getpass('Enter your password: ')  # prompted interactively so it is not saved in the notebook
```

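For this notebook, we stream data from the [Copernicus Marine Environment Monitoring Service's Global Ocean Physics Analysis](https://nrt.cmems-du.eu/thredds/dodsC/cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m.html). Note that the dataset ID below (`..._my_...`) selects the multi-year reanalysis version of the global physics product, which covers the 2003 period used here.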

```python
# change your Dataset ID accordingly
DATASET_ID = 'cmems_mod_glo_phy_my_0.083_P1D-m'
```

In the function below, we use a [PydapDataStore](https://docs.xarray.dev/en/latest/generated/xarray.backends.PydapDataStore.html), an xarray store object for accessing OPeNDAP datasets through PyDAP. This dataset requires authentication, so the function first logs in to the Copernicus Marine CAS (Central Authentication Service) server with the username and password entered above and then passes the authenticated session to PyDAP. Not all OPeNDAP servers require the same steps (many need no login at all), so check the access instructions of the server you are using.

```python
def copernicusmarine_datastore(dataset, username, password):
    """Log in to the CMEMS CAS server and return an xarray PydapDataStore for `dataset`.

    Tries the multi-year ('my') THREDDS server first and falls back to the
    near-real-time ('nrt') server if the dataset cannot be opened there.
    """
    cas_url = 'https://cmems-cas.cls.fr/cas/login'
    session = setup_session(cas_url, username, password)
    # Re-set the CAS ticket-granting cookie (CASTGC) on the session, as in the Copernicus Marine example
    session.cookies.set("CASTGC", session.cookies.get_dict()['CASTGC'])
    database = ['my', 'nrt']
    url = f'https://{database[0]}.cmems-du.eu/thredds/dodsC/{dataset}'
    try:
        # user_charset needs PyDAP >= v3.3.0, see https://github.com/pydap/pydap/pull/223/commits
        data_store = xr.backends.PydapDataStore(open_url(url, session=session, user_charset='utf-8'))
    except Exception:
        # Not available on the multi-year server: try the near-real-time server instead
        url = f'https://{database[1]}.cmems-du.eu/thredds/dodsC/{dataset}'
        data_store = xr.backends.PydapDataStore(open_url(url, session=session, user_charset='utf-8'))
    return data_store
```
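
The `try`/`except` in `copernicusmarine_datastore` tries the multi-year (`my`) server first and falls back to the near-real-time (`nrt`) server. Below is a minimal sketch of that same pattern, generalized to any number of servers and made testable offline by passing the opening function in as a parameter. `open_first_available` and `fake_opener` are hypothetical helpers, not part of PyDAP or xarray; unlike a bare `except:`, the sketch records why each server failed so the final error is informative.

```python
def open_first_available(dataset, servers, opener):
    """Return opener(url) for the first server URL that opens successfully.

    servers: URL templates containing a '{dataset}' placeholder, tried in order.
    opener: any callable that takes a URL and raises an exception on failure.
    """
    errors = []
    for template in servers:
        url = template.format(dataset=dataset)
        try:
            return opener(url)
        except Exception as exc:  # remember why this server failed, then try the next one
            errors.append(f"{url}: {exc}")
    raise RuntimeError("No server could open the dataset:\n" + "\n".join(errors))


SERVERS = [
    "https://my.cmems-du.eu/thredds/dodsC/{dataset}",
    "https://nrt.cmems-du.eu/thredds/dodsC/{dataset}",
]


def fake_opener(url):
    # Stand-in for the real network call: pretend only the 'nrt' server has the dataset
    if "nrt." not in url:
        raise OSError("404 Not Found")
    return url


print(open_first_available("some_dataset_id", SERVERS, fake_opener))
# https://nrt.cmems-du.eu/thredds/dodsC/some_dataset_id
```

With the real servers, `opener` would be something like `lambda url: xr.backends.PydapDataStore(open_url(url, session=session, user_charset='utf-8'))`, using an authenticated `session` created as in the function above.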

```python
data_store = copernicusmarine_datastore(DATASET_ID, USERNAME, PASSWORD)
```

When you open a dataset through PyDAP's data store, only its "shell" (dimensions, dimension coordinates, variable names, and attributes) is transferred to your local machine. The data values stay on the remote server until something actually needs them, such as a computation, a plot, `.load()`, or `.to_netcdf()`; at that point they are downloaded and held in memory, which can take up a lot of space for a large selection. Downloads over OPeNDAP can also be slow, so request only what you need.
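
Because the shape and dtype of each variable are part of the metadata, you can estimate how much memory a selection would need before downloading it. A minimal sketch, assuming uncompressed values; `estimated_megabytes` is a hypothetical helper, and the grid size in the example is made up:

```python
import numpy as np


def estimated_megabytes(shape, dtype):
    """Uncompressed size in megabytes (1 MB = 10**6 bytes) of an array with this shape and dtype."""
    n_values = int(np.prod(shape, dtype=np.int64))
    return n_values * np.dtype(dtype).itemsize / 1e6


# Hypothetical example: 90 daily fields on a 241 x 241 grid of float32 values
print(round(estimated_megabytes((90, 241, 241), "float32"), 2))  # 20.91
```

With the real data you could call `estimated_megabytes(DS['so'].shape, DS['so'].dtype)` on a variable or a slice of it; neither attribute triggers a download. Because xarray reports the decoded dtype, this estimates the in-memory size; if a variable is packed on the server (stored as integers with a `scale_factor`), the bytes actually transferred can be smaller.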

```python
DS = xr.open_dataset(data_store)
```

```python
DS
```

We can now slice out the portion of data we need. For consistency with the other notebooks, let's select salinity (`so`) at the shallowest depth level for January-March 2003 within our region of interest (60-80°E, 5-25°N).

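**DISCLAIMER: DO NOT LOAD THE DATASET IN ITS ENTIRETY BEFORE SLICING IF YOU DO NOT WANT TO OVERFLOW YOUR MEMORY.** Slicing itself is lazy: the actual values are downloaded only when they are needed (for example by `.load()`, `.values`, plotting, or `.to_netcdf()`), and by then only the selected subset is requested from the server.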
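
To see what the selection described above returns without contacting the server, here is a sketch on a small synthetic array that mimics the dimensions of the remote `so` variable (the coarse 1° grid, depth levels, and date range are made up). It also shows that label-based `.sel` slices include both end points, and that a partial date such as `'2003-03'` covers the whole month.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the remote variable: a 1-degree grid instead of the real 1/12-degree one
time = pd.date_range("2002-12-01", "2003-04-30", freq="D")
depth = np.array([0.5, 10.0, 50.0])
latitude = np.arange(-10.0, 31.0, 1.0)
longitude = np.arange(50.0, 91.0, 1.0)
so = xr.DataArray(
    np.zeros((time.size, depth.size, latitude.size, longitude.size), dtype="float32"),
    coords={"time": time, "depth": depth, "latitude": latitude, "longitude": longitude},
    dims=("time", "depth", "latitude", "longitude"),
    name="so",
)

subset = so.isel(depth=0).sel(
    longitude=slice(60, 80),  # label-based slices include both end points
    latitude=slice(5, 25),
    time=slice("2003-01", "2003-03"),  # partial dates: all of January through all of March
)
print(dict(subset.sizes))  # {'time': 90, 'latitude': 21, 'longitude': 21}
print(str(subset.time.values[0])[:10], str(subset.time.values[-1])[:10])  # 2003-01-01 2003-03-31
```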

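```python
# Shallowest level, region of interest, and January-March 2003
# (label-based slices include both end points, so '2003-03' covers all of March)
DS_sliced = DS['so'].isel(depth=0).sel(longitude=slice(60, 80), latitude=slice(5, 25), time=slice('2003-01', '2003-03'))
```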

```python
DS_sliced
```

After `isel(depth=0)`, `depth` remains only as a scalar (zero-dimensional) coordinate, so you can drop it. Make sure to record the sampled depth in your attributes when you work with the data, though!
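
One way to keep that information is to copy the depth into the variable's attributes before dropping the coordinate. A minimal sketch on a tiny synthetic array standing in for `DS_sliced`; the depth value and the attribute names `depth` and `depth_units` are hypothetical choices, and the code assumes metres when the coordinate has no `units` attribute.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for DS_sliced: a tiny field that still carries a scalar 'depth' coordinate
da = xr.DataArray(
    np.zeros((2, 2), dtype="float32"),
    coords={"latitude": [5.0, 6.0], "longitude": [60.0, 61.0], "depth": 0.494},
    dims=("latitude", "longitude"),
    name="so",
)

depth_value = float(da["depth"])  # 0-d coordinate -> plain Python float
depth_units = da["depth"].attrs.get("units", "m")  # assumption: fall back to metres
da = da.drop_vars("depth")  # drop_vars replaces the deprecated .drop()
da.attrs["depth"] = depth_value  # hypothetical attribute names
da.attrs["depth_units"] = depth_units
print(da.attrs)  # {'depth': 0.494, 'depth_units': 'm'}
```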

```python
DS_sliced = DS_sliced.drop_vars('depth')  # .drop() is deprecated in recent xarray versions
```

```python
DS_sliced
```

```python
# Quick look at the first time step of the full dataset
DS.isel(time=0)
```

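Finally, we can export the sliced data to a netCDF file so that we can combine it with the rest of the data later. This is the step that actually downloads the selected values from the server.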

```python
# The 'demonstrated data' folder must already exist; to_netcdf does not create it
DS_sliced.to_netcdf('demonstrated data/salinity_at_0_49m.nc')
```
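
If, for instance, you later download another period for the same region in the same way, the pieces can be joined along `time`. A minimal sketch with two small synthetic arrays standing in for such exports (in practice you would read each file back, e.g. with `xr.open_dataarray`); `make_piece` and the April-June period are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_piece(start, end):
    """Hypothetical stand-in for one exported salinity file covering [start, end]."""
    time = pd.date_range(start, end, freq="D")
    return xr.DataArray(
        np.full((time.size, 2, 2), 35.0, dtype="float32"),
        coords={"time": time, "latitude": [5.0, 6.0], "longitude": [60.0, 61.0]},
        dims=("time", "latitude", "longitude"),
        name="so",
    )


jan_mar = make_piece("2003-01-01", "2003-03-31")
apr_jun = make_piece("2003-04-01", "2003-06-30")

# Concatenate along the shared time dimension and make sure the result is in time order
combined = xr.concat([jan_mar, apr_jun], dim="time").sortby("time")
print(combined.sizes["time"])  # 181 (90 + 91 days)
```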