# Compare four ways to read COAWST coastal model output
The test dataset is output from a COAWST simulation of the Matanzas Inlet breach.
We try reading it as:
* NetCDF4 file on S3 with the h5netcdf library (HDF5 library)
* NetCDF4 file on S3 with the zarr library, using fsspec's ReferenceFileSystem
* Zarr dataset on S3 with consolidated metadata
* Zarr dataset on S3 with unconsolidated metadata

In [None]:
import xarray as xr
import fsspec

In [None]:
fs = fsspec.filesystem('s3', requester_pays=True, profile='esip-qhub')

#### NetCDF4 file on S3 with h5netcdf library

In [None]:
%%time
ds = xr.open_dataset(fs.open('esip-qhub/usgs/rsignell/breach_matanzas_GTM_inwave.nc'),
                     decode_timedelta=False, engine='h5netcdf')

In [None]:
%%time
ds.zeta[:,200,200].mean().compute()

#### NetCDF4 file on S3 with Zarr library, using ReferenceFileSystem metadata

In [None]:
%%time
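# Pass the same S3 options for fetching the reference JSON (target_options)
# and for fetching the NetCDF byte ranges it points to (remote_options).
so = {"requester_pays": True}
fs1 = fsspec.filesystem("reference", fo="s3://esip-qhub/usgs/rsignell/breach_matanzas_GTM_inwave.json",
                        target_options=so,
                        remote_protocol="s3", remote_options=so)
# An empty root path maps the whole reference set as a Zarr-style key/value store.
m = fs1.get_mapper("")
ds = xr.open_dataset(m, engine="zarr", decode_timedelta=False)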

In [None]:
%%time
ds.zeta[:,200,200].mean().compute()
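
To make the ReferenceFileSystem idea more concrete, here is a minimal sketch of how a reference set can map Zarr-style keys onto byte ranges inside another file. It uses only the standard library and a small temporary local file instead of S3. The key names, offsets, and the `read_key` helper are hypothetical and chosen for illustration; this is a toy model of the layout (version 1 reference format: inline strings or `[url, offset, length]` entries), not fsspec's implementation.

```python
import json
import os
import tempfile

# Stand-in for a NetCDF4/HDF5 file: a header followed by two "chunks".
payload = b"HEADERchunk-0chunk-1"
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(payload)
    path = handle.name

# Hypothetical reference set: each key is either inline text
# or a [url, offset, length] pointer into the original file.
references = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),  # small metadata stored inline
        "zeta/0": [path, 6, 7],                     # 7 bytes starting at offset 6
        "zeta/1": [path, 13, 7],                    # 7 bytes starting at offset 13
    },
}


def read_key(refs, key):
    """Resolve one key: return inline text as bytes, or read the byte range."""
    entry = refs["refs"][key]
    if isinstance(entry, str):  # plain inline text only in this sketch
        return entry.encode()
    url, offset, length = entry
    with open(url, "rb") as source:
        source.seek(offset)
        return source.read(length)


first_chunk = read_key(references, "zeta/0")
second_chunk = read_key(references, "zeta/1")
os.remove(path)  # clean up the temporary file
print(first_chunk, second_chunk)
```

Because every chunk location is known up front, a reader can request exactly the bytes it needs without first walking the HDF5 file structure, which is one plausible reason the open step is cheaper here.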

#### Zarr Dataset on S3 with Zarr library (consolidated metadata)

In [None]:
%%time
ds = xr.open_zarr(fs.get_mapper('esip-qhub/usgs/rsignell/zarr/breach_matanzas_GTM_inwave_b'),
                  decode_timedelta=False, consolidated=True)

In [None]:
%%time
ds.zeta[:,200,200].mean().compute()

#### Zarr Dataset on S3 with Zarr library (unconsolidated metadata)

In [None]:
%%time
ds = xr.open_zarr(fs.get_mapper('esip-qhub/usgs/rsignell/zarr/breach_matanzas_GTM_inwave_b'),
                  decode_timedelta=False, consolidated=False)

In [None]:
%%time
ds.zeta[:,200,200].mean().compute()
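
The difference between the consolidated and unconsolidated cases comes down to how many metadata objects must be fetched before the dataset can be opened. The following is a toy model, assuming a Zarr v2 layout (`.zgroup`, `.zarray`, `.zattrs`, and a single `.zmetadata` object) and using a dict in place of S3. The `CountingStore` class, the variable names, and the two `open_*` functions are hypothetical, and real Zarr also needs listing requests that are not modeled here.

```python
import json


class CountingStore(dict):
    """Dict-backed stand-in for an object store that counts reads (GET requests)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


variables = ["zeta", "u", "v"]  # hypothetical variable names

# Per-variable metadata, as it would be spread across separate objects.
metadata = {".zgroup": {"zarr_format": 2}}
for name in variables:
    metadata[f"{name}/.zarray"] = {"shape": [10], "chunks": [10]}
    metadata[f"{name}/.zattrs"] = {}

store = CountingStore({key: json.dumps(value) for key, value in metadata.items()})
# Consolidated metadata: every entry above copied into one extra object.
store[".zmetadata"] = json.dumps({"zarr_consolidated_format": 1, "metadata": metadata})


def open_unconsolidated(store, variables):
    """Fetch each metadata object separately: one read per key."""
    store.reads = 0
    meta = {".zgroup": json.loads(store[".zgroup"])}
    for name in variables:
        for suffix in (".zarray", ".zattrs"):
            key = f"{name}/{suffix}"
            meta[key] = json.loads(store[key])
    return meta, store.reads


def open_consolidated(store):
    """Fetch the single consolidated object: one read in total."""
    store.reads = 0
    meta = json.loads(store[".zmetadata"])["metadata"]
    return meta, store.reads


meta_a, reads_a = open_unconsolidated(store, variables)
meta_b, reads_b = open_consolidated(store)
print(reads_a, reads_b)
```

In this sketch the unconsolidated open costs one read for the group plus two per variable, while the consolidated open costs a single read. On S3 each read is a network round trip, so the gap grows with the number of variables in the dataset.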

### Conclusions (**Valid for this dataset only!**)
* Opening the NetCDF4 file on S3 with Zarr/ReferenceFileSystem is 10x faster than with the HDF5 library.
* Reading the NetCDF4 file on S3 with Zarr/ReferenceFileSystem is 3-4x faster than with the HDF5 library.
* Opening and reading the same dataset written in Zarr format is faster still.
* Opening the Zarr dataset with consolidated metadata is an order of magnitude faster than with unconsolidated metadata.
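
Speedup ratios like these can be computed from the wall times reported by `%%time`. The sketch below shows one way to collect timings programmatically and express them relative to a baseline. The `timed` and `speedups` helpers are hypothetical, and the numbers in the example dict are placeholders for illustration, not measurements from this notebook.

```python
import time


def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def speedups(timings, baseline):
    """Return how many times faster each method is than the baseline method."""
    base = timings[baseline]
    return {name: base / seconds for name, seconds in timings.items()}


# Placeholder values only; replace with timings measured on your own system.
example_times = {"method_a": 8.0, "method_b": 2.0}
print(speedups(example_times, baseline="method_a"))
```

In the notebook, a call such as `timed(xr.open_dataset, m, engine="zarr", decode_timedelta=False)` would return both the dataset and the elapsed time. A single run is sensitive to caching and network variability, so repeating each measurement a few times gives a more reliable comparison.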

### Future Work
* Try reading both the NetCDF4 file and the Zarr dataset with the new S3 reading capability of the NetCDF C library (used by the netCDF4 Python package)
