# Exploring COAWST coupled circulation/wave forecast data

In [None]:
import fsspec
import xarray as xr
import hvplot.xarray

In [None]:
fs = fsspec.filesystem('s3', requester_pays=True)
fs.ls('s3://pangeo-data-uswest2/esip/COAWST/surface_vars')

Xarray uses Dask behind the scenes to read and process chunked data in parallel, so we first spin up a Dask cluster.

In [None]:
import sys, os
sys.path.append(os.path.join(os.environ['HOME'],'shared','users','lib'))
import ebdpy as ebd
# Set AWS credentials, region, and S3 endpoint for the ESIP QHub profile
aws_profile = 'esip-qhub'
aws_region = 'us-west-2'
endpoint = f's3.{aws_region}.amazonaws.com'
ebd.set_credentials(profile=aws_profile, region=aws_region, endpoint=endpoint)
worker_max = 30
client,cluster = ebd.start_dask_cluster(profile=aws_profile, worker_max=worker_max, 
                                      region=aws_region, use_existing_cluster=True,
                                      adaptive_scaling=False, wait_for_cluster=False, 
                                      environment='pangeo', worker_profile='Pangeo Worker', 
                                      propagate_env=True)

#### Method 1: Access data via THREDDS Data Server

In [None]:
%%time
url = 'http://geoport.usgs.esipfed.org/thredds/dodsC/coawst_4/use/fmrc/coawst_4_use_best.ncd'
# Open lazily over OPeNDAP; drop_vars replaces the deprecated Dataset.drop
ds_dap = xr.open_dataset(url, chunks={'time':48}).drop_vars('time_run')

In [None]:
# Total uncompressed dataset size in TB
ds_dap.nbytes/1e12
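
To make the size figure concrete, here is a minimal sketch of how an uncompressed size like `nbytes` adds up: each variable contributes its number of elements times the bytes per element. The shapes and variable list below are hypothetical placeholders, not the real COAWST dimensions.

```python
import math

def variable_nbytes(shape, itemsize):
    """Uncompressed size in bytes: number of elements times bytes per element."""
    return math.prod(shape) * itemsize

# Hypothetical variables as (shape, bytes per element).
# These are NOT the real COAWST dimensions; they only illustrate the arithmetic.
variables = {
    "Hwave": ((1000, 300, 900), 4),  # float32 with dims (time, eta, xi)
    "zeta": ((1000, 300, 900), 4),
}

total_bytes = sum(variable_nbytes(shape, size) for shape, size in variables.values())
total_tb = total_bytes / 1e12
print(f"{total_tb:.5f} TB")
```

Because this counts uncompressed values, the bytes actually transferred or stored can be smaller when the data is compressed.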

In [None]:
ds_dap.Hwave

In [None]:
len(ds_dap.data_vars)

#### Extract one week time series from THREDDS

In [None]:
%%time
h = ds_dap['Hwave'].sel(time=slice('2012-10-25','2012-10-31'))[:,288,610].load()  # New York Bight

In [None]:
h.hvplot(grid=True)
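
The grid indices `288, 610` above are hard-coded. As a rough sketch of how such indices could be found for a point of interest, the function below picks the grid cell nearest to a target longitude and latitude on a 2D coordinate grid (such as the `lon_rho` and `lat_rho` arrays used for plotting in this notebook). The synthetic grid and the target point are hypothetical and stand in for the real model grid.

```python
import numpy as np

def nearest_grid_index(lon2d, lat2d, lon0, lat0):
    """Return (row, col) of the grid cell closest to (lon0, lat0).

    Uses a flat-earth approximation: longitude differences are scaled by
    cos(latitude), which is adequate for picking a nearby cell.
    """
    dlon = (lon2d - lon0) * np.cos(np.deg2rad(lat0))
    dlat = lat2d - lat0
    dist2 = dlon**2 + dlat**2
    return np.unravel_index(np.argmin(dist2), dist2.shape)

# Synthetic 0.25-degree grid standing in for the model's 2D coordinates
lon1d = np.linspace(-80.0, -60.0, 81)
lat1d = np.linspace(30.0, 45.0, 61)
lon2d, lat2d = np.meshgrid(lon1d, lat1d)

# Hypothetical target point, chosen only for illustration
j, i = nearest_grid_index(lon2d, lat2d, lon0=-73.1, lat0=40.1)
print(j, i, lon2d[j, i], lat2d[j, i])
```

With the real dataset, the 2D coordinate values would be passed in place of the synthetic grid, and the result used as `ds['Hwave'][:, j, i]`.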

In [None]:
# So how many hours would it take to read the full time series at this rate?
9*52*20/3600
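
The one-line estimate above can be spelled out as a small function. The interpretation of the three factors is an assumption here: roughly 9 years of data, 52 weeks per year, and about 20 seconds to read one week of data from THREDDS.

```python
def estimate_read_hours(n_years, weeks_per_year, seconds_per_week):
    """Estimate total read time in hours, assuming read time scales linearly with weeks read."""
    total_seconds = n_years * weeks_per_year * seconds_per_week
    return total_seconds / 3600

# Assumed meaning of the factors in 9*52*20/3600
hours = estimate_read_hours(n_years=9, weeks_per_year=52, seconds_per_week=20)
print(f"{hours:.1f} hours")
```

This covers only a single grid point. Reading every point of the field this way would take far longer.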

#### Method 2: Read rechunked Zarr data from AWS Cloud 

In [None]:
ds = xr.open_zarr(fsspec.get_mapper('s3://pangeo-data-uswest2/esip/COAWST/surface_vars', 
                  requester_pays=True), consolidated=True)

In [None]:
ds

In [None]:
ds.Hwave

#### Extract entire time series 

In [None]:
%%time
h = ds['Hwave'][:,288,610].load()  # New York Bight
h.hvplot(grid=True)

Plot the entire field at a fixed time (here, during Hurricane Sandy)

In [None]:
%%time
h = ds['Hwave'].sel(ocean_time='2012-10-29 22:00').load()

In [None]:
h.hvplot.quadmesh(x='lon_rho', y='lat_rho', geo=True, frame_height=400,
                  rasterize=True, cmap='turbo', tiles='OSM')

The computationally expensive step: taking the mean of the entire wave height field over time

In [None]:
%%time
hwave_mean = ds['Hwave'].mean(dim='ocean_time').compute()
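
To give a feel for why a chunked dataset suits this reduction, here is a simplified, serial sketch of a time mean accumulated one chunk at a time. It is not Dask's actual implementation (Dask processes the chunks in parallel across workers and combines the partial results), and the array below is synthetic.

```python
import numpy as np

def chunked_time_mean(data, chunk_len):
    """Mean over axis 0, accumulating partial sums one time chunk at a time."""
    total = np.zeros(data.shape[1:], dtype=np.float64)
    count = 0
    for start in range(0, data.shape[0], chunk_len):
        block = data[start:start + chunk_len]  # only one chunk is needed at a time
        total += block.sum(axis=0)
        count += block.shape[0]
    return total / count

# Synthetic (time, eta, xi) array standing in for the wave height field
rng = np.random.default_rng(0)
field = rng.random((100, 4, 5))

mean_chunked = chunked_time_mean(field, chunk_len=24)
print(np.allclose(mean_chunked, field.mean(axis=0)))
```

Each chunk contributes only a partial sum and a count, so the full time series never has to fit in memory at once, and independent chunks can be handled by different workers.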

Computing the mean on our Dask cluster, reading the Zarr data directly from AWS, took only about 1 minute!

The previous workflow for this calculation, which obtained the data via THREDDS and computed the mean on a local desktop computer, took 2 weeks. That is roughly 20,000 times slower.

In [None]:
hwave_mean.where(hwave_mean>0.0).hvplot.quadmesh(x='lon_rho', y='lat_rho', 
                    rasterize=True, geo=True, cmap='turbo', tiles='OSM')

In the figure above, we can see locally enhanced waves in the Gulf Stream region, caused by the coupling between currents and waves in COAWST. This enhancement does not appear in non-coupled models like the NOAA WaveWatch III model.

In [None]:
# Disconnect the client first, then shut down the cluster
client.close(); cluster.close()