# LLC4320 data access demonstration using `xarray`

Miguel Jimenez-Urias May '25


### The environment requires:

```bash
pip install --upgrade fsspec
pip install --upgrade xarray
pip install --upgrade dask
pip install --upgrade distributed
```


In [1]:
import gc

import fsspec
import xarray as xr

# for benchmarking using dask
import dask
from dask import delayed, compute
from dask.distributed import Client

In [2]:
print("fsspec version: ", fsspec.__version__)
print("xarray: ", xr.__version__)
print("dask version: ", dask.__version__)


fsspec version:  2022.11.0
xarray:  2022.11.0
dask version:  2024.8.0


In [3]:
client = Client(n_workers=4, threads_per_worker=2)  # Basic limited cpu example
client

Client (Cluster type: distributed.LocalCluster)
Dashboard: http://127.0.0.1:8787/status
Workers: 4, Total threads: 8, Total memory: 238.42 GiB
Status: running, Using processes: True
Each worker: 2 threads, 59.60 GiB memory

# JSON Kerchunk references

Each LLC4320 field variable has a Kerchunk reference file. The file points to all the zarr stores associated with that variable, which are distributed across the Ceph cluster, and it speeds up the creation of the dataset metadata.
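
For orientation, the sketch below builds a tiny reference set in the Kerchunk version 1 JSON layout and separates metadata keys from chunk pointers. The variable name, file paths, and byte sizes are made up for illustration; the actual LLC4320 reference files are not inspected here and may differ in detail.

```python
import json

# Hypothetical, hand-written reference set in the Kerchunk version 1 layout.
# Keys are zarr store keys; values are either inline strings or
# [path, offset, length] pointers to bytes stored elsewhere.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "Theta/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["j", "i"]}),
        "Theta/0.0": ["/hypothetical/store_a/Theta/0.0", 0, 2073600],
        "Theta/0.1": ["/hypothetical/store_b/Theta/0.1", 0, 2073600],
    },
}

def split_refs(reference):
    """Return (inline_keys, pointer_keys) of a version 1 reference dict."""
    inline, pointers = [], []
    for key, value in reference["refs"].items():
        # Strings hold inline content (here only metadata); lists are pointers.
        (pointers if isinstance(value, list) else inline).append(key)
    return sorted(inline), sorted(pointers)

metadata_keys, chunk_keys = split_refs(refs)
print(metadata_keys)
print(chunk_keys)
```

Because only the small JSON file has to be read to learn where every chunk lives, no directory listing of the underlying stores is needed when the dataset is opened.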

In [4]:
ceph_path = "/home/idies/workspace/poseidon_ceph/LLC4320/"
Ker_path = ceph_path + "Kerchunks/"

# Individual 4D variables:
Salt_json = Ker_path + "LLC4320_SALT.json"
Theta_json = Ker_path + "LLC4320_THETA.json"
U_json = Ker_path + "LLC4320_U.json"
V_json = Ker_path + "LLC4320_V.json"
W_json = Ker_path + "LLC4320_W.json"

# Individual 3D (surface) variables:
Eta_json = Ker_path + "LLC4320_Eta.json"
KPPhbl_json = Ker_path + "LLC4320_KPPhbl.json"
PhiBot_json = Ker_path + "LLC4320_PhiBot.json"
SIarea_json = Ker_path + "LLC4320_SIarea.json"
SIheff_json = Ker_path + "LLC4320_SIheff.json"
SIhsalt_json = Ker_path + "LLC4320_SIhsalt.json"
SIhsnow_json = Ker_path + "LLC4320_SIhsnow.json"
SIuice_json = Ker_path + "LLC4320_SIuice.json"
SIvice_json = Ker_path + "LLC4320_SIvice.json"
oceFWflx_json = Ker_path + "LLC4320_oceFWflx.json"
oceQnet_json = Ker_path + "LLC4320_oceQnet.json"
oceQsw_json = Ker_path + "LLC4320_oceQsw.json"
oceSflux_json = Ker_path + "LLC4320_oceSflux.json"
oceTAUX_json = Ker_path + "LLC4320_oceTAUX.json"
oceTAUY_json = Ker_path + "LLC4320_oceTAUY.json"

# All 3D (surface) variables:
Surface_json = Ker_path + "LLC4320_Surface.json"  # get all surface variables at once
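
The per-variable assignments above all follow one pattern, so the paths can also be generated from a list of variable names. This is a minimal sketch, assuming every reference file is named `LLC4320_<variable>.json` as in the cell above; the `kerchunk_path` helper and the shortened variable lists are illustrative only.

```python
ceph_path = "/home/idies/workspace/poseidon_ceph/LLC4320/"
Ker_path = ceph_path + "Kerchunks/"

def kerchunk_path(variable, base=Ker_path):
    """Hypothetical helper: path to the reference file of one variable."""
    return f"{base}LLC4320_{variable}.json"

variables_4d = ["SALT", "THETA", "U", "V", "W"]
variables_surface = ["Eta", "KPPhbl", "PhiBot"]  # shortened for illustration

# Map each variable name to the path of its reference file
json_by_variable = {
    name: kerchunk_path(name) for name in variables_4d + variables_surface
}
print(json_by_variable["THETA"])
```

A list such as `json_paths` can then be built with `list(json_by_variable.values())`, or from any subset of the keys.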

# Open the zarr stores

### Select the variables to include in your dataset by listing the JSON paths of their Kerchunk references

```python
json_paths = [Kerchunk_json1, Kerchunk_json2, ...]
```



In [5]:
# Here, add the json files of the variables you want to include in your dataset
json_paths = [Salt_json, Theta_json, U_json, V_json, W_json, Eta_json, KPPhbl_json, 
              PhiBot_json,  SIarea_json, SIheff_json, SIhsalt_json, SIhsnow_json, 
              SIuice_json, SIvice_json, oceFWflx_json, oceQnet_json, oceQsw_json,
              oceSflux_json, oceTAUX_json, oceTAUY_json,
             ]

# Load the LLC4320 grid


In [6]:
grid_path = ceph_path
ds_grid = xr.open_zarr(grid_path)
ds_grid = ds_grid.set_coords([var for var in ds_grid.data_vars])
ds_grid

[Output: `xarray.Dataset` repr of the LLC4320 grid, with all variables set as coordinates. The arrays shown fall into four groups:
- float32 arrays of shape (13, 4320, 4320), chunked as (1, 720, 720): 0.90 GiB each, 468 chunks of 1.98 MiB
- float32 1D arrays of shape (90,) or (91,): a single chunk of 360 B or 364 B
- float32 arrays of shape (90, 13, 4320, 4320), chunked as (90, 1, 720, 720): 81.34 GiB each, 468 chunks of 177.98 MiB
- bool arrays of shape (90, 13, 4320, 4320), chunked as (90, 1, 720, 720): 20.34 GiB each, 468 chunks of 44.49 MiB]
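
The sizes reported in the grid output above follow directly from the array shapes and data types. The short check below reproduces them, assuming binary units (1 GiB = 2**30 bytes, 1 MiB = 2**20 bytes); the `nbytes` helper is illustrative and not part of xarray.

```python
import math

def nbytes(shape, itemsize):
    """Uncompressed size in bytes of an array with this shape and item size."""
    return math.prod(shape) * itemsize

GiB, MiB = 2**30, 2**20

# float32 arrays (4 bytes per value), shapes and chunks as in the repr
grid_3d = nbytes((13, 4320, 4320), 4)
chunk_3d = nbytes((1, 720, 720), 4)

# Arrays with an extra leading dimension of length 90
grid_4d = nbytes((90, 13, 4320, 4320), 4)
mask_4d = nbytes((90, 13, 4320, 4320), 1)  # bool, 1 byte per value

# Number of chunks: 13 x (4320 / 720)**2 tiles of 720 x 720 points
n_chunks = 13 * (4320 // 720) ** 2

print(f"{grid_3d / GiB:.2f} GiB, {chunk_3d / MiB:.2f} MiB, {n_chunks} chunks")
print(f"{grid_4d / GiB:.2f} GiB, {mask_4d / GiB:.2f} GiB")
```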


# Create xarray datasets

Using dask to read and create the mapper objects in parallel gives no significant performance gain: the code effectively runs sequentially. It does, however, provide a clean way to create the xarray dataset in a few lines without repeating code.

In [7]:
@delayed
def delayed_open_mapper(json_file):
    fs = fsspec.filesystem("reference", fo=json_file)
    mapper = fs.get_mapper("")
    return mapper

def open_mapper(json_file):
    fs = fsspec.filesystem("reference", fo=json_file)
    mapper = fs.get_mapper("")
    return mapper

In [8]:
%%time
# Building the delayed objects returns instantly, since nothing is computed yet:
# delayed_mappers = [delayed_open_mapper(json_path) for json_path in json_paths]

# ===========
# Submitting to the cluster may take almost as long as running sequentially.
# It takes ~ 10*N secs, with N the number of variables
# ===========

# Distribute the JSON paths to the workers first
scattered_paths = client.scatter(json_paths, broadcast=False)
# Then submit open_mapper for each path
futures = [client.submit(open_mapper, path) for path in scattered_paths]

# Block until all tasks finish and collect the resulting mappers on the client
mappers = client.gather(futures)

# Sequential alternative:
# mappers = [open_mapper(path) for path in json_paths]


CPU times: user 3min 38s, sys: 20.1 s, total: 3min 59s
Wall time: 5min 7s


## Dataset creation

In [None]:
%%time
ds = xr.merge([xr.open_zarr(mapper, consolidated=False) for mapper in mappers]+[ds_grid])
ds

In [None]:
print("Size of dataset (uncompressed) : ", ds.nbytes/1e15, "Petabytes")

## Some plotting examples

Closing the client before plotting, as done in the next cell, is much faster than plotting with the client running (see the timing section at the end of the notebook).


In [None]:
client.close()

In [None]:
%%time
ds['Salt'].isel(time=1000, k=0, face=2, i=slice(0, 720), j=slice(0, 720)).plot();

In [None]:
%%time
ds['Theta'].isel(time=1000, k=0, face=2, i=slice(0, 720), j=slice(0, 720)).plot(cmap='RdBu_r', vmin=23, vmax=28.5);

In [None]:
%%time
ds['Eta'].isel(time=1000,face=2, i=slice(0, 720), j=slice(0, 720)).plot(cmap='seismic');
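
Each plot above selects a 720 x 720 tile of one face. If the data variables are stored in 720 x 720 horizontal chunks (an assumption here, based on the grid chunking shown earlier; the chunking of the data variables is not displayed), such a tile maps onto exactly one chunk, whereas a shifted window of the same size touches four. The helper `chunks_touched` is illustrative and not part of xarray or dask.

```python
def chunks_touched(start, stop, chunk_size):
    """Indices of the chunks along one axis that overlap [start, stop)."""
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    return list(range(first, last + 1))

chunk = 720  # assumed horizontal chunk length

# Aligned window, as in isel(i=slice(0, 720), j=slice(0, 720))
aligned = len(chunks_touched(0, 720, chunk)) ** 2

# Same window size shifted by 100 points in both i and j
shifted = len(chunks_touched(100, 820, chunk)) ** 2

print(aligned, shifted)  # 1 4
```

Keeping selections aligned with chunk boundaries is one way to limit how much data has to be read for a quick plot.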

# Load all LLC4320 surface variables

A single JSON Kerchunk file is available that references all surface variables.

You can use it to create a dataset containing all surface variables at once, although I do not see any performance gain from doing so.


```python
%%time
mapper_surface = open_mapper(Surface_json)
ds_surface = xr.open_zarr(mapper_surface, consolidated=False)
```



# Timing of plots

This section times the same plot with and without an initialized `Client` object. I find that plotting a chunk of data is about $4\times$ slower when a client has been initialized than when it has not.


I will look only at Theta.


In [None]:
%%time
mapper_theta = open_mapper(Theta_json)
ds_theta = xr.open_zarr(mapper_theta, consolidated=False)

In [None]:
%%time
ds_theta['Theta'].isel(time=0, k=0, face=2, i=slice(0, 720), j=slice(0, 720)).plot(cmap='RdBu_r', vmin=23, vmax=28.5);

In [None]:
client = Client()
client

In [None]:
%%time
ds_theta['Theta'].isel(time=0, k=0, face=2, i=slice(0, 720), j=slice(0, 720)).plot(cmap='RdBu_r', vmin=23, vmax=28.5);

In [None]:
client.close()

In [None]:
%%time
ds_theta['Theta'].isel(time=0, k=0, face=2, i=slice(0, 720), j=slice(0, 720)).plot(cmap='RdBu_r', vmin=23, vmax=28.5);

In [None]:
%%time
ds_theta['Theta'].isel(time=0, k=0, face=2, i=slice(0, 720), j=slice(0, 720)).plot(cmap='RdBu_r', vmin=23, vmax=28.5);

In [None]:
client = Client()
client

In [None]:
%%time
ds_theta['Theta'].isel(time=0, k=0, face=2, i=slice(0, 720), j=slice(0, 720)).plot(cmap='RdBu_r', vmin=23, vmax=28.5);

In [None]:
%%time
ds_theta['Theta'].isel(time=0, k=0, face=2, i=slice(0, 720), j=slice(0, 720)).plot(cmap='RdBu_r', vmin=23, vmax=28.5);
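
The repeated `%%time` cells above can be condensed into a small helper that runs a callable several times and reports the first run separately from the later ones. This is a minimal sketch using only the standard library; `time_repeated` and the stand-in workload are hypothetical, and in the notebook the callable would wrap the `.plot()` call.

```python
import statistics
import time

def time_repeated(func, repeats=3):
    """Run func `repeats` times; return (first_run, median_of_later_runs) in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    # With a single repeat there are no later runs, so fall back to the first
    later = durations[1:] or durations
    return durations[0], statistics.median(later)

# Stand-in workload; replace with a function that performs the plot
def workload():
    return sum(range(100_000))

first, later_median = time_repeated(workload, repeats=3)
print(f"first run: {first:.4f} s, median of later runs: {later_median:.4f} s")
```

Running the helper once without a client and once after `Client()` has been created would give two pairs of numbers that can be compared directly.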