# Subsetting Cloud OPeNDAP Example

One of OPeNDAP's key features is its subsetting capability. OPeNDAP can subset by time, space, and variable, which makes it an effective tool for reducing gridded data to the specific region, time range, or set of variables needed for research or analysis. Subsetting therefore reduces the amount of data that has to be transferred and processed, and lets users focus on areas of interest.

This example requires Cloud OPeNDAP credentials to be set up before the cells below are run. For instructions, see credentials.ipynb in this repository.

In [None]:
import netCDF4 as nc
OPENDAP_URL = "https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Atmosphere%20Surface%20Temperature%2C%20Humidity%2C%20Wind%2C%20and%20Pressure%20-%20Daily%20Mean%200.5%20Degree%20(Version%204%20Release%204)/granules/ATM_SURFACE_TEMP_HUM_WIND_PRES_day_mean_2017-12-31_ECCO_V4r4_latlon_0p50deg"
ds = nc.Dataset(OPENDAP_URL, mode="r")

print(ds)

### Dataset Set Up
Subsetting is based on variable selection. To view the variables in the dataset, run the following command, which will output the variables' names, types, and ranges:

In [None]:
ds.variables

### Install Libraries
These examples use xarray and matplotlib, which are helpful tools for data analysis and plotting.
In your local environment, install ```xarray``` and ```matplotlib``` to run the subsetting and plotting cells that follow.

In [None]:
import matplotlib.pyplot as plt
import xarray as xr

### Method 1: Subsetting with xarray
Subsetting begins by opening the cloud-enabled OPeNDAP URL with xarray's ```open_dataset``` function. Create an xarray ```Dataset``` named ```ecco_ds```. This dataset will later be subsetted into smaller defined tiles.

In [None]:
ecco_ds = xr.open_dataset(OPENDAP_URL)

In [None]:
print(ecco_ds)

Print out the variables in the dataset:

In [None]:
ecco_ds.data_vars

Select one of the variables listed above. In this example, `EXFwspee` (wind speed) is selected.
#### Assess the variable:
View `EXFwspee` in more detail and without the global attributes by running:

In [None]:
ecco_ds['EXFwspee']

#### Plot the entire variable:

In [None]:
ecco_ds.EXFwspee.plot()

#### Subset with defined ranges:
To subset, we will use xarray's ```isel``` method. This returns a new object in which each array is indexed along the specified dimension(s). The arguments are read as integer positions along each dimension, not as coordinate values in degrees. For more information, see https://docs.xarray.dev/en/stable/generated/xarray.Dataset.isel.html.

In this example, the following index ranges are used: latitude (50,100), longitude (300,700).

In [None]:
ecco_ds.EXFwspee.isel(latitude=slice(50,100),longitude=slice(300,700)).plot()
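Because ```isel``` works on positions, it can help to see which coordinate values a given index range covers. The following is a minimal sketch on a synthetic array, not on the ECCO granule itself: the name `demo`, the zero-filled data, and the assumption that the 0.5-degree grid has cell centers from -89.75 to 89.75 (latitude) and -179.75 to 179.75 (longitude) are all illustrative. It compares the index-based subset with an equivalent label-based subset made with xarray's ```sel```.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the (time, latitude, longitude) grid.
# Assumption: 0.5-degree cell centers; check ecco_ds.coords for the real values.
lat = np.arange(-89.75, 90.0, 0.5)    # 360 values
lon = np.arange(-179.75, 180.0, 0.5)  # 720 values
demo = xr.DataArray(
    np.zeros((1, lat.size, lon.size)),
    dims=("time", "latitude", "longitude"),
    coords={"latitude": lat, "longitude": lon},
    name="demo_wspee",  # hypothetical name
)

# Position-based subset, using the same index ranges as the cell above.
by_index = demo.isel(latitude=slice(50, 100), longitude=slice(300, 700))

# Coordinate values at the edges of that subset.
lat_min, lat_max = float(by_index.latitude[0]), float(by_index.latitude[-1])
lon_min, lon_max = float(by_index.longitude[0]), float(by_index.longitude[-1])

# Label-based subset over the same cells; sel() slices include both endpoints.
by_label = demo.sel(
    latitude=slice(lat_min, lat_max),
    longitude=slice(lon_min, lon_max),
)

print(by_index.shape, by_label.shape)
print(lat_min, lat_max, lon_min, lon_max)
```

On the real dataset, the same comparison can be made by replacing `demo` with ```ecco_ds.EXFwspee```.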

In [None]:
output_tile = ecco_ds.isel(latitude=300, time=0).load()
output_tile.data_vars

Use `coords` to view which coordinate values are available to use for our subset plot.

In [None]:
ecco_ds.coords
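To relate an index such as 300 to an actual latitude, the coordinate array can be queried in both directions. This is a small sketch under the same assumption about the grid (0.5-degree cell centers starting at -89.75); `demo` and the target value 60.2 are made up for illustration.

```python
import numpy as np
import xarray as xr

# Assumed latitude grid; the data values are simply the positions 0..359.
lat = np.arange(-89.75, 90.0, 0.5)
demo = xr.DataArray(
    np.arange(lat.size),
    dims=("latitude",),
    coords={"latitude": lat},
)

# Index -> coordinate: which latitude sits at position 300?
lat_at_300 = float(demo.latitude[300])

# Coordinate -> index: which position is nearest to a chosen latitude?
target = 60.2  # hypothetical latitude in degrees north
nearest_idx = int(np.abs(lat - target).argmin())

# xarray can do the nearest-neighbor lookup directly by label.
nearest_val = int(demo.sel(latitude=target, method="nearest"))

print(lat_at_300, nearest_idx, nearest_val)
```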

The following plots the subsetted wind speed along latitude index 300 at time index 0 (see the ```output_tile``` definition above).

In [None]:
f, ax = plt.subplots(1, figsize=(8, 8))
# x-axis: longitude values; y-axis: wind speed along latitude index 300
ax.plot(ecco_ds['longitude'], output_tile.EXFwspee, color='b')
ax.set_xlabel('Longitude')
ax.set_ylabel('Wind Speed: m s-1')
ax.set_title('Wind Speed (m s-1) vs Longitude, Latitude Index=300')
ax.grid()

### Method 2: Subsetting with NumPy Syntax
An alternative method is to subset using NumPy's [ ] syntax on a ```DataArray```.

In [None]:
ecco_ds.data_vars

In [None]:
wind_speed_arr = ecco_ds.EXFwspee.values  # loads the variable as a NumPy array
print(type(wind_speed_arr))
wind_speed_arr.shape

The shape above (1, 360, 720) aligns with the variable definition (time, latitude, longitude). To plot this using NumPy, the variable has to be reduced to a 2-D array. In this example, the time dimension is dropped because it has length 1: every latitude, longitude value shares the same time.

In [None]:
wind_speed_2d = wind_speed_arr[0,:,:]
wind_speed_2d.shape
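The effect of an integer index versus a slice can be checked on a synthetic array of the same shape. This is an illustrative sketch only: `arr` is filled with made-up values rather than ECCO data.

```python
import numpy as np

# Synthetic array with the same (time, latitude, longitude) shape as above.
arr = np.arange(1 * 360 * 720, dtype=np.float32).reshape(1, 360, 720)

# An integer index removes the dimension it is applied to.
dropped = arr[0, :, :]

# A slice keeps the dimension, even when it has length 1.
kept = arr[0:1, :, :]

# squeeze() is another way to drop a length-1 axis.
squeezed = arr.squeeze(axis=0)

# Further slicing of the 2-D array gives a rectangular tile.
tile = dropped[50:100, 300:700]

print(dropped.shape, kept.shape, squeezed.shape, tile.shape)
```

Basic slicing like this returns views, so no data is copied until the result is modified or explicitly copied.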

In [None]:
fig = plt.figure(figsize=(10, 8))
# imshow draws rows (latitude) on the y-axis and columns (longitude) on the x-axis
plt.imshow(wind_speed_2d, origin='lower',cmap='jet')
plt.colorbar()
time = ecco_ds.time.values[0]
plt.title(f'Wind Speed m s-1 for {time}')
plt.xlabel('longitude index')
plt.ylabel('latitude index')

To subset, create a 2-D variable to plot with defined ranges. In this example, ```EXFwspee``` is subsetted with the following index ranges: latitude (50,100), longitude (300,700).

In [None]:
wind_speed_2d_subset = ecco_ds.EXFwspee[0,50:100,300:700]
fig = plt.figure(figsize=(10, 8))
wind_speed_2d_subset.plot()
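Bracket indexing on a ```DataArray``` is positional, so it should select the same cells as the ```isel``` call in Method 1. The sketch below checks this on a synthetic array; `demo` and its random values are placeholders, not ECCO data.

```python
import numpy as np
import xarray as xr

# Synthetic DataArray with the same dimension names and shape as EXFwspee.
demo = xr.DataArray(
    np.random.default_rng(0).random((1, 360, 720)),
    dims=("time", "latitude", "longitude"),
)

# NumPy-style positional indexing: time index 0, then latitude and longitude ranges.
by_brackets = demo[0, 50:100, 300:700]

# The same selection written with named dimensions.
by_isel = demo.isel(time=0, latitude=slice(50, 100), longitude=slice(300, 700))

print(by_brackets.dims, by_brackets.shape)
print(by_brackets.equals(by_isel))
```

The bracket form is shorter, while the ```isel``` form does not depend on remembering the dimension order.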

In [None]:
wind_speed_1d_subset = ecco_ds.EXFwspee[0,300,:]

The cell above trims ```EXFwspee``` down to a 1-D array named ```wind_speed_1d_subset```. In this example, time index 0 and latitude index 300 are fixed, and every longitude is kept.

In [None]:
wind_speed_1d_subset.shape

In [None]:
print(wind_speed_1d_subset)

In [None]:
f, ax = plt.subplots(1, figsize=(8, 8))
# x-axis: longitude values; y-axis: wind speed along latitude index 300
ax.plot(ecco_ds['longitude'], wind_speed_1d_subset, color='b')
ax.set_xlabel('Longitude')
ax.set_ylabel('Wind Speed: m s-1')
ax.set_title('Wind Speed (m s-1) vs Longitude, Latitude Index=300')
ax.grid()