# Open NWM 1km dataset as ReferenceFileSystem 
Create a `ReferenceFileSystem` object by reading references from a 9.8GB combined JSON file. 

Opening the dataset in Xarray can take more than 10 minutes (7 min 44 s wall time in the run below), mostly due to decoding the giant JSON file. It also requires more than 50 GB of RAM, far more than the 8 GB or 16 GB typically available to users.
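
To get a feel for why decoding dominates, here is a minimal sketch that builds a small synthetic reference set and times `json.loads` on it. It assumes the combined file follows the kerchunk version 1 layout (inline metadata strings plus one `[url, offset, length]` entry per chunk), which this notebook does not confirm; `make_refs`, the bucket URL, and the offsets are hypothetical.

```python
import json
import time

def make_refs(n_chunks):
    """Build a hypothetical reference set in the assumed kerchunk v1 layout."""
    # Metadata keys hold inline JSON strings
    refs = {".zgroup": json.dumps({"zarr_format": 2})}
    for i in range(n_chunks):
        # Chunk keys hold [url, offset, length]; all three values are made up
        refs[f"TRAD/{i}.0.0"] = ["s3://example-bucket/file.nc", 4096 * i, 4096]
    return {"version": 1, "refs": refs}

text = json.dumps(make_refs(100_000))

# Time only the decode step, which is what dominates for a multi-GB file
start = time.perf_counter()
decoded = json.loads(text)
elapsed = time.perf_counter() - start

print(f"{len(text) / 1e6:.1f} MB of JSON, {len(decoded['refs'])} references, "
      f"decoded in {elapsed:.3f} s")
```

Every chunk of every variable needs one such entry, so the number of references, and with it the decode time and memory footprint, grows with the total chunk count.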

In [1]:
import fsspec
import xarray as xr
from fsspec.implementations.reference import ReferenceFileSystem

In [2]:
fs = fsspec.filesystem('s3', anon=True, 
                        client_kwargs={'endpoint_url':'https://ncsa.osn.xsede.org'})

In [3]:
url = 's3://esip/noaa/nwm/grid1km/LDAS_combined.json'

In [4]:
fs.size(url)/1e9  # JSON size in GB

9.780943801

In [5]:
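%%time
# Options for reading the reference JSON itself (hosted on the OSN endpoint)
s_opts = {'anon':True, 'client_kwargs':{'endpoint_url':'https://ncsa.osn.xsede.org'}}
# Options for reading the data files that the references point to
r_opts = {'anon':True}
fs = ReferenceFileSystem(url, ref_storage_args=s_opts,
                       remote_protocol='s3', remote_options=r_opts)
m = fs.get_mapper("")
# chunks={} loads the variables lazily with Dask, using the stored chunking
ds = xr.open_dataset(m, engine="zarr", chunks={}, backend_kwargs=dict(consolidated=False))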

CPU times: user 4min 34s, sys: 40.4 s, total: 5min 14s
Wall time: 7min 44s


In [6]:
ds

[Dataset HTML repr; variable names were not preserved. The data variables shown are Dask-backed float64 arrays with 2 graph layers each, in one of three layouts:]

Shape,Chunk shape,Bytes,Chunk bytes,Chunks
"(116631, 3840, 4608)","(1, 768, 922)",15.02 TiB,5.40 MiB,2915775
"(116631, 3840, 2, 4608)","(1, 960, 1, 1152)",30.03 TiB,8.44 MiB,3732192
"(116631, 3840, 4, 4608)","(1, 768, 1, 922)",60.06 TiB,5.40 MiB,11663100


Examine a specific variable:

In [7]:
ds.TRAD

,Array,Chunk
Bytes,15.02 TiB,5.40 MiB
Shape,"(116631, 3840, 4608)","(1, 768, 922)"
Dask graph,2915775 chunks in 2 graph layers,2915775 chunks in 2 graph layers
Data type,float64 numpy.ndarray,float64 numpy.ndarray
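
The figures reported for `TRAD` can be reproduced from the shape and chunk shape alone. The following is a small worked example using only the numbers shown in the output; the variable names are illustrative.

```python
import math

shape = (116631, 3840, 4608)   # array shape reported for TRAD
chunks = (1, 768, 922)         # chunk shape reported for TRAD
itemsize = 8                   # bytes per float64 value

# Chunks along each axis; the last chunk on an axis may be partial
chunks_per_axis = [math.ceil(s / c) for s, c in zip(shape, chunks)]
n_chunks = math.prod(chunks_per_axis)

chunk_mib = math.prod(chunks) * itemsize / 2**20   # size of one full chunk
array_tib = math.prod(shape) * itemsize / 2**40    # uncompressed array size

print(chunks_per_axis, n_chunks)   # [116631, 5, 5] 2915775
print(f"{chunk_mib:.2f} MiB per chunk, {array_tib:.2f} TiB total")   # 5.40 MiB, 15.02 TiB
```

Each time step is split into 5 x 5 = 25 chunks, so one variable alone accounts for about 2.9 million chunks.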


Compute the uncompressed size of this dataset in TB:

In [8]:
ds.nbytes/1e12   

462.28064798432
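
Note that the value above is in decimal terabytes, while the Dask repr reports binary units (TiB). As a quick sketch of the conversion, using only the two sizes printed in this notebook:

```python
nbytes = 462.28064798432e12    # ds.nbytes, from the output above
json_bytes = 9.780943801e9     # size of the combined JSON, from fs.size(url)

size_tb = nbytes / 1e12        # decimal terabytes (10**12 bytes)
size_tib = nbytes / 2**40      # binary tebibytes, the unit used in the Dask repr

print(f"{size_tb:.2f} TB = {size_tib:.2f} TiB")
print(f"reference JSON is {json_bytes / nbytes:.1e} of the uncompressed data size")
```

The reference file is tiny relative to the data it describes, yet at 9.8 GB it is still large enough to make decoding expensive.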

Loading data for a particular time step is fast because the references are already in memory:

In [9]:
%%time
da = ds.TRAD.sel(time='1990-01-01 00:00').load()

CPU times: user 4.36 s, sys: 652 ms, total: 5.01 s
Wall time: 5.96 s


Loading data for another time step takes about the same amount of time:

In [10]:
%%time
da = ds.TRAD.sel(time='2015-01-01 00:00').load()

CPU times: user 4.17 s, sys: 436 ms, total: 4.6 s
Wall time: 4.67 s
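
For a rough sense of scale, the two wall times can be turned into an effective read rate. This is only a back-of-the-envelope sketch: it uses the uncompressed size of one `TRAD` time step, so the bytes actually transferred are likely smaller if the stored chunks are compressed.

```python
slice_bytes = 3840 * 4608 * 8   # one float64 time step of TRAD, from its shape
wall_times = {"1990-01-01 00:00": 5.96, "2015-01-01 00:00": 4.67}  # seconds, from %%time

# Effective rate in MB/s of uncompressed data delivered to memory
rates = {t: slice_bytes / 1e6 / s for t, s in wall_times.items()}
for t, rate in rates.items():
    print(f"{t}: {slice_bytes / 1e6:.1f} MB uncompressed at {rate:.1f} MB/s")
```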


Compute the mean over the domain:

In [11]:
da.mean().data

array(266.92635398)