# 3 ways to read COAWST output
This notebook compares three ways of reading output from the COAWST Matanzas Inlet breach simulation:
* NetCDF4 file on S3 with h5netcdf library (HDF5 library)
* NetCDF4 file on S3 with zarr library, using fsspec's [ReferenceFileSystem](https://medium.com/pangeo/cloud-performant-netcdf4-hdf5-with-zarr-fsspec-and-intake-3d3a3e7cb935)
* Zarr dataset on S3 with consolidated metadata

In [None]:
import xarray as xr
import fsspec

#### Reading NetCDF4 data from USGS ScienceBase AWS Bucket
[ScienceBase Item here](https://www.sciencebase.gov/catalog/item/609bf69ed34ea221ce39b261)
(requires login and permission)

In [None]:
fs0 = fsspec.filesystem('s3', anon=True)  # USGS ScienceBase Bucket, us-west-2
fs1 = fsspec.filesystem('s3', requester_pays=True, profile='esip-qhub') # ESIP Bucket, us-west-2

#### NetCDF4 file on S3 with h5netcdf library

In [None]:
%%time
fo = 's3://prod-is-usgs-sb-prod-publish/609bf69ed34ea221ce39b261/breach_matanzas.nc'
ds = xr.open_dataset(fs0.open(fo), decode_timedelta=False, engine='h5netcdf', 
                     chunks={'ocean_time':200, 'eta_rho':300, 'xi_rho':400})

In [None]:
%%time
ds.zeta[:,200,200].mean().values

#### Read NetCDF file on S3 using ReferenceFileSystem metadata with Zarr library

In [None]:
%%time
so = {"anon": True}
fo = "s3://prod-is-usgs-sb-prod-publish/609bf69ed34ea221ce39b261/breach_matanzas_ReferenceFileSystem.json"
fs = fsspec.filesystem("reference", fo=fo, target_options=so,
                       remote_protocol="s3", remote_options=so)
m = fs.get_mapper("")
ds = xr.open_dataset(m, engine="zarr", decode_timedelta=False, 
                     chunks={'ocean_time':200, 'eta_rho':300, 'xi_rho':400})

In [None]:
%%time
ds.zeta[:,200,200].mean().values
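
To illustrate why the ReferenceFileSystem approach avoids the HDF5 library, here is a minimal sketch of what a reference file contains. It assumes the version 1 layout of the reference JSON, in which each Zarr key maps either to an inline metadata string or to a `[url, offset, length]` triple pointing into the original NetCDF4 file. The bucket name, offsets, lengths, and the `resolve` helper are all hypothetical and hand-written for illustration; they are not taken from `breach_matanzas_ReferenceFileSystem.json`, and real files may also use base64-encoded inline values and templates, which this sketch ignores.

```python
import json

# Hypothetical reference set in the version 1 layout.
# The URL, offsets and lengths below are made up for illustration.
refs = {
    "version": 1,
    "refs": {
        # Zarr metadata is stored inline as JSON strings
        ".zgroup": json.dumps({"zarr_format": 2}),
        "zeta/.zarray": json.dumps({
            "shape": [2, 1, 1], "chunks": [1, 1, 1], "dtype": "<f4",
            "compressor": None, "fill_value": None, "filters": None,
            "order": "C", "zarr_format": 2,
        }),
        # Each chunk key points at a byte range inside the NetCDF4 file
        "zeta/0.0.0": ["s3://example-bucket/example.nc", 4096, 1200],
        "zeta/1.0.0": ["s3://example-bucket/example.nc", 5296, 1180],
    },
}


def resolve(key, refs):
    """Return ('inline', bytes) or ('range', url, start, end) for a Zarr key."""
    entry = refs["refs"][key]
    if isinstance(entry, str):
        # Inline value: no remote request is needed
        return ("inline", entry.encode())
    url, offset, length = entry
    # Remote value: a single ranged read of bytes [offset, offset + length)
    return ("range", url, offset, offset + length)


print(resolve("zeta/0.0.0", refs))
```

Because every chunk resolves to one ranged read and all metadata is already in the JSON, a reader built on this mapping does not need to walk the HDF5 file structure over the network to find the data.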

#### Zarr Dataset on S3 with Zarr library 

In [None]:
%%time
ds = xr.open_zarr(fs1.get_mapper('esip-qhub/usgs/rsignell/zarr/breach_matanzas'),
                  decode_timedelta=False, consolidated=True, chunks={})

In [None]:
%%time
ds.zeta[:,200,200].mean().values

### Conclusions 
**Based on this dataset:**
* Opening the netCDF4 file on S3 with Zarr/ReferenceFileSystem is 10x faster than with the HDF5 library
* Reading the netCDF4 file on S3 with Zarr/ReferenceFileSystem is 3-4x faster than with the HDF5 library
* With ReferenceFileSystem metadata, opening and reading the netCDF4 file can be as fast as opening and reading the Zarr dataset
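
The speedup factors above come from comparing the `%%time` output of the cells by hand. A minimal sketch of doing the same comparison programmatically is shown below; `time_call` and `speedup` are hypothetical helper names, not part of xarray or fsspec, and the example call uses a stand-in function rather than the actual dataset reads.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Stand-in workload; in the notebook this would be an open or read step,
# e.g. lambda: ds.zeta[:, 200, 200].mean().values
result, elapsed = time_call(sum, range(1000))
print(f"result={result}, elapsed={elapsed:.6f} s")
```

A single run is sensitive to network conditions and caching, so repeating each measurement a few times before computing a ratio would give a fairer comparison.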