# Quick introduction


erddapy can be installed with conda


```shell
conda install --channel conda-forge erddapy
```

or pip

```shell
pip install erddapy
```

First we need to instantiate the ERDDAP URL constructor for a server.
In this example we will use [https://gliders.ioos.us/erddap](https://gliders.ioos.us/erddap/index.html).

In [None]:
from erddapy import ERDDAP


e = ERDDAP(
    server="https://gliders.ioos.us/erddap",
    protocol="tabledap",
    response="csv",
)

Now we can populate the object with a dataset id, variables of interest, and
its constraints. We can download the csvp response with the `.to_pandas` method.

In [None]:
e.dataset_id = "whoi_406-20160902T1700"

e.variables = [
    "depth",
    "latitude",
    "longitude",
    "salinity",
    "temperature",
    "time",
]

e.constraints = {
    "time>=": "2016-07-10T00:00:00Z",
    "time<=": "2017-02-10T00:00:00Z",
    "latitude>=": 38.0,
    "latitude<=": 41.0,
    "longitude>=": -72.0,
    "longitude<=": -69.0,
}


df = e.to_pandas(
    index_col="time (UTC)",
    parse_dates=True,
).dropna()

df.head()
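
As a rough illustration of what the URL constructor assembles from these attributes, the sketch below builds a tabledap request by hand. `build_tabledap_url` is a hypothetical helper, not part of erddapy, and it deliberately skips the percent-encoding and value conversions a real client would apply.

```python
def build_tabledap_url(server, dataset_id, variables, constraints, response="csv"):
    """Hypothetical helper: assemble a tabledap request URL by hand.

    Layout: <server>/tabledap/<dataset_id>.<response>?<variables>&<constraints>
    """
    base = f"{server}/tabledap/{dataset_id}.{response}"
    # Requested variables are a comma-separated list.
    query = ",".join(variables)
    # Each constraint is appended as "&<variable><operator><value>".
    for key, value in constraints.items():
        query += f"&{key}{value}"
    return f"{base}?{query}"


sketch_url = build_tabledap_url(
    server="https://gliders.ioos.us/erddap",
    dataset_id="whoi_406-20160902T1700",
    variables=["depth", "temperature", "time"],
    constraints={"time>=": "2016-07-10T00:00:00Z", "latitude<=": 41.0},
)
print(sketch_url)
```

Comparing this string with the output of `e.get_download_url()` is a quick way to see what erddapy handles on top of the plain layout.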

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

fig, ax = plt.subplots(figsize=(17, 2))
cs = ax.scatter(
    df.index,
    df["depth (m)"],
    s=15,
    c=df["temperature (Celsius)"],
    marker="o",
    edgecolor="none"
)

ax.invert_yaxis()
ax.set_xlim(df.index[0], df.index[-1])
xfmt = mdates.DateFormatter("%H:%Mh\n%d-%b")
ax.xaxis.set_major_formatter(xfmt)

cbar = fig.colorbar(cs, orientation="vertical", extend="both")
cbar.ax.set_ylabel(r"Temperature ($^\circ$C)")
ax.set_ylabel("Depth (m)");

# Longer introduction

Let's explore the methods and attributes available in the ERDDAP object.

In [None]:
from erddapy import ERDDAP


e = ERDDAP(server="https://gliders.ioos.us/erddap")

[method for method in dir(e) if not method.startswith("_")]

All the *get_<methods>* will return a valid ERDDAP URL for the requested response and options. For example, here is a search for all available datasets.

In [None]:
url = e.get_search_url(search_for="all", response="csv")

print(url)

There are many responses available; see the docs for [griddap](https://coastwatch.pfeg.noaa.gov/erddap/griddap/documentation.html) and
[tabledap](https://coastwatch.pfeg.noaa.gov/erddap/tabledap/documentation.html).
The most useful ones for Pythonistas are .csv and .nc, which can be read with pandas and netCDF4-python, respectively.

Let's load the csv response directly with pandas.

In [None]:
import pandas as pd


df = pd.read_csv(url)
print(
    f'We have {len(set(df["tabledap"].dropna()))} '
    f'tabledap, {len(set(df["griddap"].dropna()))} '
    f'griddap, and {len(set(df["wms"].dropna()))} wms endpoints.'
)
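
To make the counting step concrete without a network call, here is a minimal sketch that applies the same idea with the standard library. The table is hand-written for illustration (it is not real server output and keeps only a few of the columns), and `count_endpoints` is a hypothetical helper.

```python
import csv
import io

# Hand-written stand-in for a search response; NOT real server output.
sample_csv = """griddap,tabledap,wms,Dataset ID
,https://example.org/erddap/tabledap/glider_a,,glider_a
,https://example.org/erddap/tabledap/glider_b,,glider_b
https://example.org/erddap/griddap/grid_c,,https://example.org/erddap/wms/grid_c/request,grid_c
"""

sample_rows = list(csv.DictReader(io.StringIO(sample_csv)))


def count_endpoints(rows, column):
    """Count the distinct non-empty values in one column."""
    return len({row[column] for row in rows if row[column]})


sample_counts = {
    name: count_endpoints(sample_rows, name)
    for name in ("tabledap", "griddap", "wms")
}
print(sample_counts)
```

An empty cell plays the role of the missing value that `dropna()` removes in the pandas version.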

We can refine our search by providing some constraints.

Let's narrow the search area, time span, and look for **sea_water_temperature**.

In [None]:
from erddapy.utilities import show_iframe


kw = {
    "standard_name": "sea_water_temperature",
    "min_lon": -72.0,
    "max_lon": -69.0,
    "min_lat": 38.0,
    "max_lat": 41.0,
    "min_time": "2016-07-10T00:00:00Z",
    "max_time": "2017-02-10T00:00:00Z",
    "cdm_data_type": "trajectoryprofile"
}


search_url = e.get_search_url(response="html", **kw)
show_iframe(search_url)

Note that the search form was populated with the constraints we provided.

Changing the response from html to csv lets us load the results into a data frame.

In [None]:
search_url = e.get_search_url(response="csv", **kw)
search = pd.read_csv(search_url)
gliders = search["Dataset ID"].values

gliders_list = "\n".join(gliders)
print(f"Found {len(gliders)} Glider Datasets:\n{gliders_list}")

Now that we know the Dataset IDs we can explore their metadata with the *get_info_url* method.

In [None]:
glider = gliders[-1]

info_url = e.get_info_url(dataset_id=glider, response="html")

show_iframe(src=info_url)

We can manipulate the metadata and find the variables that have the *cdm_profile_variables* attribute using the csv response.

In [None]:
info_url = e.get_info_url(dataset_id=glider, response='csv')

info = pd.read_csv(info_url)
info.head()

In [None]:
"".join(info.loc[info["Attribute Name"] == "cdm_profile_variables", "Value"])

Selecting variables by their attributes is such a common operation that erddapy brings its own method to simplify this task.

The *get_var_by_attr* method was inspired by netCDF4-python's *get_variables_by_attributes*. However, because erddapy is operating on remote servers, it will return the variable names instead of the actual variables.
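
The idea of selecting variable names by attribute can be sketched on a tiny, hand-written table shaped like the info response. This is an assumption-laden illustration, not erddapy's implementation: the rows are made up, `variables_with_attr` is a hypothetical helper, and it only supports exact matches.

```python
# Hand-written rows shaped like an ERDDAP info table; NOT real server output.
sample_info = [
    {"Variable Name": "temperature", "Attribute Name": "standard_name",
     "Value": "sea_water_temperature"},
    {"Variable Name": "temperature", "Attribute Name": "units",
     "Value": "Celsius"},
    {"Variable Name": "salinity", "Attribute Name": "standard_name",
     "Value": "sea_water_practical_salinity"},
]


def variables_with_attr(info_rows, **attrs):
    """Return the names of variables whose attributes match every key/value."""
    # Group the attribute rows by the variable they describe.
    by_variable = {}
    for row in info_rows:
        found = by_variable.setdefault(row["Variable Name"], {})
        found[row["Attribute Name"]] = row["Value"]
    # Keep only the variables for which all requested attributes match.
    return [
        name
        for name, found in by_variable.items()
        if all(found.get(key) == value for key, value in attrs.items())
    ]


matches = variables_with_attr(sample_info, standard_name="sea_water_temperature")
print(matches)
```

Only names come back, which mirrors the point above: the data itself stays on the server until it is requested.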

Here we check what is/are the variable(s) associated with the *standard_name* used in the search.

Note that *get_var_by_attr* caches the last response in case the user needs to make multiple requests.
(See the execution times below.)

In [None]:
%%time

# First one, slow.
e.get_var_by_attr(
    dataset_id="whoi_406-20160902T1700",
    standard_name="sea_water_temperature"
)

In [None]:
%%time

# Second one on the same glider, a little bit faster.
e.get_var_by_attr(
    dataset_id="whoi_406-20160902T1700",
    standard_name="sea_water_practical_salinity"
)

In [None]:
%%time

# New one, slow again.
e.get_var_by_attr(
    dataset_id="cp_336-20170116T1254",
    standard_name="sea_water_practical_salinity"
)
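
The slow, fast, slow pattern in the three cells above is what a single-entry cache produces. The sketch below reproduces it with `functools.lru_cache(maxsize=1)`; this is a stand-in chosen for illustration and not necessarily how erddapy caches internally, and `fetch_info` is a hypothetical function that fakes the download.

```python
from functools import lru_cache

fetch_count = 0  # counts how many times the "slow" path actually runs


@lru_cache(maxsize=1)
def fetch_info(dataset_id):
    """Pretend to download metadata; only the last dataset stays cached."""
    global fetch_count
    fetch_count += 1
    return f"metadata for {dataset_id}"


fetch_info("whoi_406-20160902T1700")  # first request: runs the slow path
fetch_info("whoi_406-20160902T1700")  # same dataset: served from the cache
fetch_info("cp_336-20170116T1254")    # new dataset: slow path again
print(fetch_count)
```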

Another way to browse datasets is via the *categorize* URL. In the example below we can get all the *standard_names* available on the server with a single request.

In [None]:
url = e.get_categorize_url(
    categorize_by="standard_name",
    response="csv"
)

pd.read_csv(url)["Category"]

We can also pass a **value** to filter the categorize results.

In [None]:
url = e.get_categorize_url(
    categorize_by="institution",
    value="woods_hole_oceanographic_institution",
    response="csv"
)

df = pd.read_csv(url)
whoi_gliders = df.loc[~df["tabledap"].isnull(), "Dataset ID"].tolist()
whoi_gliders
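
For reference, the two categorize requests above differ only in whether a value segment appears in the path. The sketch below shows that layout with a hypothetical helper, `build_categorize_url`; it is not erddapy's code and it leaves out any paging query parameters that a real URL may carry.

```python
def build_categorize_url(server, categorize_by, value=None, response="csv"):
    """Hypothetical helper: assemble a categorize URL by hand."""
    parts = [server, "categorize", categorize_by]
    # The optional value narrows the listing to one category.
    if value is not None:
        parts.append(value)
    return "/".join(parts) + f"/index.{response}"


all_names_url = build_categorize_url(
    "https://gliders.ioos.us/erddap",
    categorize_by="standard_name",
)
whoi_url = build_categorize_url(
    "https://gliders.ioos.us/erddap",
    categorize_by="institution",
    value="woods_hole_oceanographic_institution",
)
print(all_names_url)
print(whoi_url)
```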

Let's create a map of all the glider tracks from WHOI.

(We are downloading a lot of data! Note that we will use [joblib](https://joblib.readthedocs.io/en/latest/) to parallelize the for loop and get the data faster.)

In [None]:
from joblib import Parallel, delayed
import multiprocessing


def request_whoi(dataset_id):
    e.constraints = None
    e.protocol = "tabledap"
    e.variables = ["longitude", "latitude", "temperature", "salinity"]
    e.dataset_id = dataset_id
    # Drop units in the first line and NaNs.
    df = e.to_pandas(response="csv", skiprows=(1,)).dropna()
    return (dataset_id, df)
        

num_cores = multiprocessing.cpu_count()
downloads = Parallel(n_jobs=num_cores)(
    delayed(request_whoi)(dataset_id) for dataset_id in whoi_gliders
)

dfs = {glider: df for (glider, df) in downloads}
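
The fan-out and collect pattern used here does not depend on joblib. As a minimal sketch, the same shape can be written with the standard library's `concurrent.futures`, which is a swapped-in technique and not what the cell above uses. `fake_request` and the dataset ids are placeholders so the example runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_request(dataset_id):
    """Stand-in for a real download: return the id and a placeholder payload."""
    return dataset_id, len(dataset_id)


sketch_ids = ["glider-a", "glider-b", "glider-c"]  # hypothetical dataset ids

# Run one task per dataset; map() yields results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    sketch_downloads = list(pool.map(fake_request, sketch_ids))

# Collect the (id, payload) pairs into a dictionary keyed by dataset id.
sketch_results = dict(sketch_downloads)
print(sketch_results)
```

Returning the id together with the payload is what makes the final dictionary easy to build, whichever executor runs the tasks.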

Finally let's see some figures!

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter


def make_map():
    fig, ax = plt.subplots(
        figsize=(9, 9),
        subplot_kw=dict(projection=ccrs.PlateCarree())
    )
    ax.coastlines(resolution="10m")
    lon_formatter = LongitudeFormatter(zero_direction_label=True)
    lat_formatter = LatitudeFormatter()
    ax.xaxis.set_major_formatter(lon_formatter)
    ax.yaxis.set_major_formatter(lat_formatter)

    return fig, ax


fig, ax = make_map()
lons, lats = [], []
for glider, df in dfs.items():
    lon, lat = df["longitude"], df["latitude"]
    lons.extend(lon.array)
    lats.extend(lat.array)
    ax.plot(lon, lat)

dx = dy = 0.25
extent = min(lons)-dx, max(lons)+dx, min(lats)-dy, max(lats)+dy
ax.set_extent(extent)

ax.set_xticks([extent[0], extent[1]], crs=ccrs.PlateCarree())
ax.set_yticks([extent[2], extent[3]], crs=ccrs.PlateCarree());
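
A padded map extent pushes every edge outward: the padding is subtracted from the minima and added to the maxima. The sketch below isolates that arithmetic in a hypothetical helper, `padded_extent`, using made-up coordinates.

```python
def padded_extent(lons, lats, pad):
    """Return (lon_min, lon_max, lat_min, lat_max) widened by pad on each side."""
    return (
        min(lons) - pad,  # western edge moves west
        max(lons) + pad,  # eastern edge moves east
        min(lats) - pad,  # southern edge moves south
        max(lats) + pad,  # northern edge moves north
    )


# Made-up coordinates for illustration only.
sketch_extent = padded_extent([-71.0, -70.0], [39.0, 40.0], pad=0.25)
print(sketch_extent)
```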

In [None]:
def glider_scatter(df, ax):
    ax.scatter(df["temperature"], df["salinity"],
               s=10, alpha=0.25)

fig, ax = plt.subplots(figsize=(9, 9))
ax.set_ylabel("salinity")
ax.set_xlabel("temperature")
ax.grid(True)

for glider, df in dfs.items():
    glider_scatter(df, ax)

ax.axis([5.5, 30, 30, 38]);

## Extra convenience methods for common responses

### OPeNDAP

In [None]:
from netCDF4 import Dataset


e.constraints = None
e.protocol = "tabledap"
e.dataset_id = "whoi_406-20160902T1700"

opendap_url = e.get_download_url(
    response="opendap",
)

print(opendap_url)
with Dataset(opendap_url) as nc:
    print(nc.summary)

### netCDF Climate and Forecast

In [None]:
e.response = "nc"
e.variables = ["longitude", "latitude", "temperature", "salinity"]

nc = e.to_ncCF()

print(nc.Conventions)
print(nc["temperature"])

### xarray

In [None]:
ds = e.to_xarray(decode_times=False)

ds

Tabledap represents all data in tabular form and the next steps, while a bit awkward, are necessary to match the dimensions properly. The griddap response (unsupported at the moment) does not have this limitation.

In [None]:
row_size = ds["rowSize"].values
lon = ds["longitude"].values
lat = ds["latitude"].values

lons, lats = [], []
for x, y, r in zip(lon, lat, row_size):
    lons.extend([x]*r)
    lats.extend([y]*r)
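
The loop above repeats each per-profile position once for every observation in that profile. A minimal sketch with made-up numbers shows the effect; `expand_per_profile` is a hypothetical helper that does the same thing as the loop.

```python
def expand_per_profile(values, row_size):
    """Repeat each per-profile value once per observation in that profile."""
    expanded = []
    for value, count in zip(values, row_size):
        expanded.extend([value] * count)
    return expanded


profile_lons = [-70.5, -70.4]  # made up: one longitude per profile
profile_sizes = [3, 2]         # made up: observations in each profile

obs_lons = expand_per_profile(profile_lons, profile_sizes)
print(obs_lons)
```

The expanded list has as many entries as there are observations, so it lines up with the temperature and depth arrays. `numpy.repeat(values, row_size)` should give the same result in one call.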

In [None]:
import numpy as np


data = ds["temperature"].values
depth = ds["depth"].values

mask = ~np.ma.masked_invalid(depth).mask

data = data[mask]
depth = depth[mask]
lons = np.array(lons)[mask]
lats = np.array(lats)[mask]

In [None]:
mask = depth <= 5

data = data[mask]
depth = depth[mask]
lons = lons[mask]
lats = lats[mask]

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import cartopy.crs as ccrs


dx = dy = 1.5
extent = (
    ds.geospatial_lon_min-dx, ds.geospatial_lon_max+dx,
    ds.geospatial_lat_min-dy, ds.geospatial_lat_max+dy
)
fig, ax = make_map()

cs = ax.scatter(lons, lats, c=data, s=50, alpha=0.5, edgecolor="none")
cbar = fig.colorbar(cs, orientation="vertical",
                    fraction=0.1, shrink=0.9, extend="both")
ax.set_extent(extent)
ax.coastlines("10m");

### iris

In [None]:
import warnings

# Iris warnings are quite verbose!
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    cubes = e.to_iris()

print(cubes)

In [None]:
cubes.extract_strict("sea_water_temperature")

This example is written in a Jupyter Notebook.
[Click here](https://raw.githubusercontent.com/ioos/erddapy/master/notebooks/quick_intro.ipynb)
to download the notebook so you can run it locally, or [click here](https://binder.pangeo.io/v2/gh/ioos/erddapy/master?filepath=notebooks/quick_intro.ipynb) to run a live instance of this notebook.