# Exploring COAWST coupled circulation/wave forecast data

In [None]:
import fsspec
import xarray as xr
import hvplot.xarray

Open the COAWST dataset, stored in Zarr format in an Amazon S3 bucket

In [None]:
ds = xr.open_zarr(fsspec.get_mapper('s3://pangeo-data-uswest2/esip/COAWST/surface_vars',
                                    requester_pays=True), consolidated=True)

In [None]:
ds.Hwave

How many GB is this dataset?

In [None]:
ds.nbytes/1e9
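
Note that `nbytes` reports the uncompressed, in-memory size of the arrays, so the compressed store in S3 is typically smaller. As a rough cross-check, the same figure for a single variable can be estimated from its shape and dtype. The sketch below uses a hypothetical shape and dtype chosen for illustration only; they are not the actual dimensions of this dataset.

```python
import math
import numpy as np

def uncompressed_gb(shape, dtype):
    """Return the in-memory size in GB (1e9 bytes) of an array with this shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize / 1e9

# Hypothetical (time, eta_rho, xi_rho) dimensions, for illustration only.
example_shape = (1000, 300, 900)
size_gb = uncompressed_gb(example_shape, "float32")
print(f"{size_gb:.2f} GB")
```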

Examine the wave height variable

In [None]:
ds['Hwave']

Read the first 10 time steps at a single grid cell as a quick check

In [None]:
ds.Hwave[:10,180,400].values

Start up a dask cluster on Kubernetes

In [None]:
import sys, os
# ebdpy is a helper module kept in the shared users library directory
sys.path.append(os.path.join(os.environ['HOME'], 'shared', 'users', 'lib'))
import ebdpy as ebd

worker_max = 20  # maximum number of dask workers
client, cluster = ebd.start_dask_cluster(profile='esip-qhub', worker_max=worker_max,
                                         worker_profile='Pangeo Worker',
                                         propagate_env=True, adaptive_scaling=False)

In [None]:
#client.close(); cluster.shutdown();

In [None]:
#cluster.adapt(minimum=1, maximum=30)
#cluster.scale(20)

In [None]:
#client

In [None]:
#from dask.distributed import Client; client = Client(cluster)

#### Load a one-week time series from THREDDS

In [None]:
%%time
url = 'http://geoport.usgs.esipfed.org/thredds/dodsC/coawst_4/use/fmrc/coawst_4_use_best.ncd'
ds_dap = xr.open_dataset(url).drop_vars('time_run')  # Dataset.drop is deprecated in favor of drop_vars

In [None]:
%%time
h = ds_dap['Hwave'].sel(time=slice('2017-01-01','2017-01-07'))[:,288,610].load()  # New York Bight

In [None]:
h.hvplot(grid=True)
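
The grid indices `[:,288,610]` used above are hard-coded. On a curvilinear grid with 2D coordinate arrays such as `lon_rho` and `lat_rho`, indices like these can be found by searching for the cell nearest to a target longitude and latitude. The following is a minimal sketch of that lookup, not the method used to derive the indices in this notebook; the regular grid and the target point are hypothetical stand-ins.

```python
import numpy as np

def nearest_grid_index(lon2d, lat2d, lon0, lat0):
    """Return (row, col) of the grid cell closest to (lon0, lat0)."""
    # Scale longitude differences by cos(latitude) so that degrees of
    # longitude and latitude are roughly comparable distances.
    dx = (lon2d - lon0) * np.cos(np.deg2rad(lat0))
    dy = lat2d - lat0
    flat = np.argmin(dx**2 + dy**2)
    return np.unravel_index(flat, lon2d.shape)

# Hypothetical regular grid standing in for lon_rho / lat_rho.
lon2d, lat2d = np.meshgrid(np.linspace(-80.0, -60.0, 41),
                           np.linspace(30.0, 45.0, 31))

# Hypothetical target point, for illustration only.
j, i = nearest_grid_index(lon2d, lat2d, lon0=-73.5, lat0=40.0)
print(j, i, lon2d[j, i], lat2d[j, i])
```

With the real dataset, the 2D arrays would be taken from `ds['lon_rho'].values` and `ds['lat_rho'].values`, and the resulting pair used in place of the hard-coded indices.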

#### Load the entire 9-year time series from the AWS bucket

In [None]:
%%time
h = ds['Hwave'][:,288,610].load() # New York Bight
h.hvplot(grid=True) 

Plot the entire field at a fixed time (here, during Hurricane Sandy)

In [None]:
%%time
h = ds['Hwave'].sel(ocean_time='2012-10-29 22:00').load()

In [None]:
h.hvplot.quadmesh(x='lon_rho', y='lat_rho', geo=True, rasterize=True, cmap='turbo', tiles='OSM')

The computationally expensive step: taking the mean of the entire wave height field over time

In [None]:
%%time
hwave_mean = ds['Hwave'].mean(dim='ocean_time').compute()

Taking the mean using 60 CPUs took only 1 minute! The previous workflow for this calculation, which obtained the data via web services and computed the mean on a local desktop computer, took 2 weeks!
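
The speedup comes from the fact that a mean over time can be split into independent pieces: each chunk of time steps is reduced to a partial sum and a count, and the partial results are then combined. The sketch below illustrates the idea serially with NumPy on a small synthetic array. It is a conceptual illustration rather than dask's actual implementation, and it assumes missing values (for example, land cells) are stored as NaN.

```python
import numpy as np

def chunked_time_mean(field, chunk_len):
    """Mean over axis 0, computed chunk by chunk and skipping NaNs."""
    total = np.zeros(field.shape[1:], dtype="float64")
    count = np.zeros(field.shape[1:], dtype="int64")
    for start in range(0, field.shape[0], chunk_len):
        block = field[start:start + chunk_len]
        valid = ~np.isnan(block)
        # Partial sum and count for this chunk; on a cluster, each chunk
        # could be reduced by a different worker.
        total += np.where(valid, block, 0.0).sum(axis=0)
        count += valid.sum(axis=0)
    # Cells with no valid samples stay NaN.
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

# Synthetic (time, y, x) data, for illustration only.
rng = np.random.default_rng(0)
field = rng.random((50, 4, 5))
field[:, 0, 0] = np.nan  # pretend this cell is land

result = chunked_time_mean(field, chunk_len=7)
print(result.shape)
```

Because each chunk only needs its own slice of the data, the work scales with the number of workers as long as reading from object storage keeps up.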

In [None]:
hwave_mean.where(hwave_mean>0.0).hvplot.quadmesh(x='lon_rho', y='lat_rho', 
                    rasterize=True, geo=True, cmap='turbo', tiles='OSM')

In the figure above, we can see locally enhanced waves in the Gulf Stream region, caused by the coupling between currents and waves in COAWST. This enhancement does not appear in uncoupled models such as NOAA's WAVEWATCH III.