# Open NWM 1km dataset as DFReferenceFileSystem 

Open the dataset as an fsspec `DFReferenceFileSystem` by reading references from a collection of Parquet files: one file containing the global metadata and coordinate variable references, and one file for each data variable.

The main benefits are lazy loading of the references for each variable and faster construction of the virtual fsspec filesystem from the Parquet files (JSON is slow to decode).

In [None]:
import fsspec
from fsspec.implementations.reference import DFReferenceFileSystem
import xarray as xr

In [None]:
fs = fsspec.filesystem('s3', anon=True, 
                        client_kwargs={'endpoint_url':'https://ncsa.osn.xsede.org'})

In [None]:
s3_lazy_refs = 's3://esip/noaa/nwm/lazy_refs'

In [None]:
print(f'Number of reference files: {len(fs.ls(s3_lazy_refs))}')
print(f'Total size of references: {fs.du(s3_lazy_refs)/1e9} GB')

In [None]:
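# Options for reading the data chunks that the references point to
r_opts = {'anon': True}
# Options for reading the Parquet reference files from the OSN endpoint
t_opts = {'anon': True, 'client_kwargs':{'endpoint_url':'https://ncsa.osn.xsede.org'}}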

In [None]:
%%time
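# Build the reference filesystem; lazy=True defers loading each variable's references
fs2 = DFReferenceFileSystem(s3_lazy_refs, lazy=True, target_options=t_opts,
                        remote_protocol='s3', remote_options=r_opts)
# Expose the filesystem as a key-value mapper that the Zarr engine can read
m = fs2.get_mapper("")
ds = xr.open_dataset(m, engine="zarr", chunks={}, backend_kwargs=dict(consolidated=False))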

In [None]:
ds

Examine a specific variable:

In [None]:
ds.TRAD

How large would the whole dataset be if it were uncompressed?

In [None]:
ds.nbytes/1e12  #TB
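
As a rough illustration of what this number represents: the uncompressed size is the sum, over all variables, of the element count times the bytes per element. The sketch below uses hypothetical variable names, shapes, and dtypes (not the actual NWM dimensions), so only the arithmetic carries over.

```python
from math import prod

# Hypothetical variables: name -> (shape, bytes per element).
# These are NOT the real NWM dimensions; they only illustrate the arithmetic.
variables = {
    "var_a": ((1000, 3840, 4608), 8),  # e.g. float64
    "var_b": ((1000, 3840, 4608), 4),  # e.g. float32
}


def uncompressed_nbytes(shape, itemsize):
    """Bytes needed to hold an array of this shape in memory, uncompressed."""
    return prod(shape) * itemsize


total = sum(uncompressed_nbytes(shape, size) for shape, size in variables.values())
total_tb = total / 1e12  # decimal terabytes, matching the 1e12 divisor used above
print(f"{total_tb:.3f} TB")
```

Because the dataset is opened lazily, computing this size only needs the shapes and dtypes from the metadata; no data chunks are read.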

Load data for a specific time step. The first access to a variable takes longer because its references must be loaded first.

In [None]:
%%time 
da = ds.TRAD.sel(time='1990-01-01 00:00').load()

Loading data for another time step is much faster as the references are already loaded:

In [None]:
%%time
da = ds.TRAD.sel(time='2015-01-01 00:00').load()
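
The difference between the two timings is what you would expect from per-variable caching of references. The toy sketch below illustrates the general idea only; it is not how fsspec implements it, and `LazyRefs`, `slow_loader`, and the reference entry are hypothetical names and values invented for this example.

```python
import time


class LazyRefs:
    """Toy stand-in for a lazily loaded reference store (not fsspec's code)."""

    def __init__(self, loader):
        self._loader = loader  # callable: variable name -> dict of references
        self._cache = {}       # references already held in memory, keyed by variable
        self.loads = 0         # how many times the loader actually ran

    def refs_for(self, var):
        # Load a variable's references on first access, then reuse the cached copy.
        if var not in self._cache:
            self._cache[var] = self._loader(var)
            self.loads += 1
        return self._cache[var]


def slow_loader(var):
    # Hypothetical loader: the sleep stands in for fetching and decoding a Parquet file.
    time.sleep(0.05)
    return {f"{var}/0.0.0": ["s3://bucket/file.nc", 0, 100]}


store = LazyRefs(slow_loader)

t0 = time.perf_counter()
store.refs_for("TRAD")  # first access: the loader runs
first = time.perf_counter() - t0

t0 = time.perf_counter()
store.refs_for("TRAD")  # second access: served from the cache
second = time.perf_counter() - t0

print(f"first: {first:.3f}s, second: {second:.6f}s, loads: {store.loads}")
```

Accessing a different variable would trigger one more load, which is why the first read of each variable in the dataset is slower than later reads.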

Compute the mean over the domain:

In [None]:
da.mean().data

In [None]:
da.plot()