# Longer introduction

Let's explore the methods and attributes available in the ERDDAP object. Note
that we can use either the short server key (e.g., NGDAC) or the full URL. For a
list of the short keys check _erddapy.servers_.


In [None]:
from erddapy import ERDDAP

server = "https://gliders.ioos.us/erddap"
e = ERDDAP(server=server)

[method for method in dir(e) if not method.startswith("_")]

All the methods prefixed with get\_ return a valid ERDDAP URL for the
requested response and options. For example, here is the URL for a search of
all available datasets.


In [None]:
url = e.get_search_url(search_for="all", response="html")

print(url)

Many response formats are available; see the docs for
[griddap](https://erddap.ioos.us/erddap/griddap/documentation.html)
and
[tabledap](https://erddap.ioos.us/erddap/tabledap/documentation.html).
The most useful ones for Pythonistas are .csv and .nc, which can be read with
pandas and netCDF4-python, respectively.

Let's load the csv response directly with pandas.


In [None]:
import pandas as pd

url = e.get_search_url(search_for="whoi", response="csv")

df = pd.read_csv(url)
print(
    f'We have {len(set(df["tabledap"].dropna()))} '
    f'tabledap, {len(set(df["griddap"].dropna()))} '
    f'griddap, and {len(set(df["wms"].dropna()))} wms endpoints.',
)

We can refine our search by providing some constraints.

Let's narrow the search area and time span, and look for **sea_water_temperature**.

In [None]:
from doc_helpers import show_iframe

kw = {
    "standard_name": "sea_water_temperature",
    "min_lon": -72.0,
    "max_lon": -69.0,
    "min_lat": 38.0,
    "max_lat": 41.0,
    "min_time": "2016-07-10T00:00:00Z",
    "max_time": "2017-02-10T00:00:00Z",
    "cdm_data_type": "trajectoryprofile",
}


search_url = e.get_search_url(response="html", **kw)
show_iframe(search_url)

The search form was populated with the constraints we provided.
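
For intuition, the mapping from keyword constraints to a search URL can be sketched with the standard library alone. This is a simplified illustration, not erddapy's implementation: the helper `build_search_url` is hypothetical, and the query parameter names (`searchFor`, `minLon`, `minTime`, and so on) are assumed to follow ERDDAP's advanced search form. The URL produced by erddapy carries more parameters and may encode values, times in particular, differently.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed mapping from erddapy-style keywords to advanced search parameters.
PARAM_NAMES = {
    "search_for": "searchFor",
    "min_lon": "minLon",
    "max_lon": "maxLon",
    "min_lat": "minLat",
    "max_lat": "maxLat",
    "min_time": "minTime",
    "max_time": "maxTime",
}


def build_search_url(server, response="html", **constraints):
    """Hypothetical helper: assemble a search URL from keyword constraints."""
    query = {PARAM_NAMES.get(key, key): value for key, value in constraints.items()}
    return f"{server}/search/advanced.{response}?{urlencode(query)}"


sketch_url = build_search_url(
    "https://gliders.ioos.us/erddap",
    response="csv",
    standard_name="sea_water_temperature",
    min_lon=-72.0,
    max_lon=-69.0,
    min_time="2016-07-10T00:00:00Z",
)
print(sketch_url)

# Parsing the query string back recovers the constraints we passed in.
params = parse_qs(urlsplit(sketch_url).query)
print(params["minLon"], params["minTime"])
```

Only the file extension in the URL changes between the html and csv responses; the constraints stay the same.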

Changing the response from html to csv lets us load the results into a data frame.


In [None]:
search_url = e.get_search_url(response="csv", **kw)
search = pd.read_csv(search_url)
gliders = search["Dataset ID"].to_numpy()

gliders_list = "\n".join(gliders)
print(f"Found {len(gliders)} Glider Datasets:\n{gliders_list}")

Now that we know the Dataset IDs we can explore their metadata with the
_get_info_url_ method.


In [None]:
glider = gliders[-1]

info_url = e.get_info_url(dataset_id=glider, response="html")

show_iframe(src=info_url)

We can manipulate the metadata and find the variables that have the
_cdm_profile_variables_ attribute using the csv response.


In [None]:
info_url = e.get_info_url(dataset_id=glider, response="csv")

info = pd.read_csv(info_url)
info.head()

In [None]:
"".join(info.loc[info["Attribute Name"] == "cdm_profile_variables", "Value"])

Selecting variables by their attributes is such a common operation that erddapy
brings its own method to simplify this task.

The _get_var_by_attr_ method was inspired by netCDF4-python's
_get_variables_by_attributes_. However, because erddapy operates on remote
servers, it will return the variable names instead of the actual data.

We can check which variable(s) are associated with the _standard_name_ used
in the search.

Note that _get_var_by_attr_ caches the last response in case the user needs to
make multiple requests. (See the execution times below.)


In [None]:
%%time

# First one, slow.
e.get_var_by_attr(
    dataset_id="whoi_406-20160902T1700",
    standard_name="sea_water_temperature",
)

In [None]:
%%time

# Second one on the same glider, a little bit faster.
e.get_var_by_attr(
    dataset_id="whoi_406-20160902T1700",
    standard_name="sea_water_practical_salinity",
)
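
The idea of filtering variables by attribute and caching the metadata can be sketched without any network access. Everything below is an assumption made for illustration: the sample table is made up (it only mimics the column layout of the info csv response), `fetch_info` and `var_by_attr` are hypothetical names, and `functools.lru_cache` stands in for whatever caching erddapy actually does.

```python
import csv
import io
from functools import lru_cache

# Made-up sample mimicking the column layout of an ERDDAP info csv response.
SAMPLE_INFO_CSV = """Row Type,Variable Name,Attribute Name,Data Type,Value
attribute,NC_GLOBAL,cdm_profile_variables,String,"time,latitude,longitude"
variable,temperature,,float,
attribute,temperature,standard_name,String,sea_water_temperature
attribute,temperature,units,String,Celsius
variable,salinity,,float,
attribute,salinity,standard_name,String,sea_water_practical_salinity
"""

fetch_count = 0  # Counts how often the (pretend) remote request happens.


@lru_cache(maxsize=1)
def fetch_info(dataset_id):
    """Stand-in for the remote request; returns the parsed rows."""
    global fetch_count
    fetch_count += 1
    return tuple(csv.DictReader(io.StringIO(SAMPLE_INFO_CSV)))


def var_by_attr(dataset_id, **attrs):
    """Return the names of the variables whose attributes match all of `attrs`."""
    rows = fetch_info(dataset_id)
    variables = [r["Variable Name"] for r in rows if r["Row Type"] == "variable"]
    matches = []
    for var in variables:
        var_attrs = {
            r["Attribute Name"]: r["Value"]
            for r in rows
            if r["Row Type"] == "attribute" and r["Variable Name"] == var
        }
        if all(var_attrs.get(key) == value for key, value in attrs.items()):
            matches.append(var)
    return matches


first = var_by_attr("some_dataset", standard_name="sea_water_temperature")
second = var_by_attr("some_dataset", standard_name="sea_water_practical_salinity")
print(first, second, fetch_count)
```

The second lookup reuses the cached rows, so the metadata is fetched only once for the same dataset.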

Another way to browse datasets is via the _categorize_ URL. In the example below
we can get all the _standard_names_ available on the server with a single
request.


In [None]:
url = e.get_categorize_url(categorize_by="standard_name", response="csv")

pd.read_csv(url)["Category"]

We can also pass a **value** to filter the categorize results.


In [None]:
url = e.get_categorize_url(
    categorize_by="institution",
    value="woods_hole_oceanographic_institution",
    response="csv",
)

df = pd.read_csv(url)
whoi_gliders = df.loc[~df["tabledap"].isna(), "Dataset ID"].tolist()
whoi_gliders
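
The two categorize requests above differ only in whether a value is appended to the path. A minimal sketch of that layout follows, assuming the `categorize/<attribute>[/<value>]/index.<response>` pattern; `build_categorize_url` is a hypothetical helper, not part of erddapy.

```python
def build_categorize_url(server, categorize_by, value=None, response="html"):
    """Hypothetical helper mirroring the assumed categorize URL layout."""
    parts = [server, "categorize", categorize_by]
    if value is not None:
        # The optional value narrows the listing to a single category.
        parts.append(value)
    return "/".join(parts) + f"/index.{response}"


all_names_url = build_categorize_url(
    "https://gliders.ioos.us/erddap",
    "standard_name",
    response="csv",
)
whoi_url = build_categorize_url(
    "https://gliders.ioos.us/erddap",
    "institution",
    value="woods_hole_oceanographic_institution",
    response="csv",
)
print(all_names_url)
print(whoi_url)
```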

Let's create a map of some WHOI glider tracks.

We are downloading a lot of data! We will use
[joblib](https://joblib.readthedocs.io/en/latest/) to parallelize the for loop
and get the data faster, and we will limit the download to the first 5 gliders.

In [None]:
import multiprocessing

from joblib import Parallel, delayed

from erddapy.core import get_download_url, to_pandas


def request_whoi(dataset_id):
    variables = ["longitude", "latitude", "temperature", "salinity"]
    url = get_download_url(
        server,
        dataset_id,
        protocol="tabledap",
        variables=variables,
        response="csv",
    )
    # Skip the units row (the line right after the header) and drop NaNs.
    df = to_pandas(url, pandas_kwargs={"skiprows": (1,)}).dropna()
    return (dataset_id, df)


num_cores = multiprocessing.cpu_count()
downloads = Parallel(n_jobs=num_cores)(
    delayed(request_whoi)(dataset_id) for dataset_id in whoi_gliders[:5]
)

dfs = dict(downloads)
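
To see what `skiprows=(1,)` and `.dropna()` do in the cell above, here is a standard-library sketch on a made-up sample. It assumes the csv response puts the column names on the first line and the units on the second; the values are invented for illustration.

```python
import csv
import io
import math

# Made-up sample: header line, units line, then the data.
SAMPLE_CSV = """longitude,latitude,temperature,salinity
degrees_east,degrees_north,Celsius,1
-70.5,39.9,12.3,35.1
-70.4,39.8,NaN,35.0
"""

lines = SAMPLE_CSV.splitlines()
units = dict(zip(lines[0].split(","), lines[1].split(",")))

# Drop line index 1 (the units row), which is what skiprows=(1,) does.
without_units = [lines[0], *lines[2:]]
reader = csv.DictReader(io.StringIO("\n".join(without_units)))

rows = []
for record in reader:
    values = {name: float(text) for name, text in record.items()}
    # Keep only complete rows, similar to DataFrame.dropna().
    if not any(math.isnan(value) for value in values.values()):
        rows.append(values)

print(units)
print(rows)
```

Without skipping the units row, every column would be read as text because its first entry is a unit string rather than a number.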

Finally, let's see some figures!

In [None]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
from cartopy.mpl.ticker import LatitudeFormatter, LongitudeFormatter


def make_map():
    fig, ax = plt.subplots(
        figsize=(9, 9),
        subplot_kw={"projection": ccrs.PlateCarree()},
    )
    ax.coastlines(resolution="10m")
    lon_formatter = LongitudeFormatter(zero_direction_label=True)
    lat_formatter = LatitudeFormatter()
    ax.xaxis.set_major_formatter(lon_formatter)
    ax.yaxis.set_major_formatter(lat_formatter)

    return fig, ax


fig, ax = make_map()
lons, lats = [], []
for df in dfs.values():
    lon, lat = df["longitude"], df["latitude"]
    lons.extend(lon.array)
    lats.extend(lat.array)
    ax.plot(lon, lat)

dx = dy = 0.25
extent = min(lons) - dx, max(lons) + dx, min(lats) - dy, max(lats) + dy
ax.set_extent(extent)

ax.set_xticks([extent[0], extent[1]], crs=ccrs.PlateCarree())
ax.set_yticks([extent[2], extent[3]], crs=ccrs.PlateCarree());

In [None]:
def glider_scatter(df, ax):
    ax.scatter(df["temperature"], df["salinity"], s=10, alpha=0.25)


fig, ax = plt.subplots(figsize=(9, 9))
ax.set_ylabel("salinity")
ax.set_xlabel("temperature")
ax.grid(True)

for df in dfs.values():
    glider_scatter(df, ax)

ax.axis([5.5, 30, 30, 38])

In [None]:
e.dataset_id = "whoi_406-20160902T1700"
e.protocol = "tabledap"
e.variables = [
    "depth",
    "latitude",
    "longitude",
    "salinity",
    "temperature",
    "time",
]

e.constraints = {
    "time>=": "2016-09-03T00:00:00Z",
    "time<=": "2017-02-10T00:00:00Z",
    "latitude>=": 38.0,
    "latitude<=": 41.0,
    "longitude>=": -72.0,
    "longitude<=": -69.0,
}


df = e.to_pandas(
    index_col="time (UTC)",
    parse_dates=True,
).dropna()
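
The attributes set on the object above end up in a single tabledap request. As a rough sketch, a tabledap query is assumed to be the comma-separated variable list followed by one `&`-prefixed term per constraint. `build_tabledap_url` is a hypothetical helper, and the query is left unencoded for readability; a real client would percent-encode it.

```python
def build_tabledap_url(server, dataset_id, variables, constraints, response="csv"):
    """Hypothetical helper: variables joined by commas, then one `&` per constraint."""
    query = ",".join(variables)
    for key, value in constraints.items():
        # Each key already carries its operator, e.g. "time>=".
        query += f"&{key}{value}"
    return f"{server}/tabledap/{dataset_id}.{response}?{query}"


tabledap_url = build_tabledap_url(
    "https://gliders.ioos.us/erddap",
    "whoi_406-20160902T1700",
    variables=["depth", "temperature", "time"],
    constraints={"time>=": "2016-09-03T00:00:00Z", "latitude<=": 41.0},
)
print(tabledap_url)
```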

In [None]:
import matplotlib.dates as mdates

fig, ax = plt.subplots(figsize=(17, 2))
cs = ax.scatter(
    df.index,
    df["depth (m)"],
    s=15,
    c=df["temperature (Celsius)"],
    marker="o",
    edgecolor="none",
)

ax.invert_yaxis()
ax.set_xlim(df.index[0], df.index[-1])
xfmt = mdates.DateFormatter("%H:%Mh\n%d-%b")
ax.xaxis.set_major_formatter(xfmt)

cbar = fig.colorbar(cs, orientation="vertical", extend="both")
cbar.ax.set_ylabel(r"Temperature ($^\circ$C)")
ax.set_ylabel("Depth (m)")