# “analysis ready data” (ARD) workflow for ensemble ACCESS-ESM1.5 data

## GitHub Issues:
- https://github.com/shared-climate-data-problems/CMIP-data-problems/issues/8
- https://github.com/COSIMA/cosima-recipes/issues/444

Date: 19 September, 2024

Author = {"name": "Thomas Moore", "affiliation": "CSIRO", "email": "thomas.moore@csiro.au", "orcid": "0000-0003-3930-1946"}

### Goals:
- Write a `Jupyter` notebook example that shows and explores best practice for building ARD from ensemble model output
- Importantly, maintain the original names of the 40 ensemble members in the resulting `xarray` object, for example `r1i1p1f1`

#### Setting up your ARE session - be sure to dial up your `JobFS` resources

### Bookmark this information on NCI Queue limits
https://opus.nci.org.au/pages/viewpage.action?pageId=236881198

**See JobFS limits for your chosen queue**

# Set up your Dask cluster

NB: it looks like a workaround in `netcdf4-python` for `netcdf-c` not being thread safe was removed in version 1.6.1. The solution (for now) is to make sure your cluster uses only 1 thread per worker.

Discussion:
https://forum.access-hive.org.au/t/netcdf-not-a-valid-id-errors/389

![open source dependency XKCD](https://www.explainxkcd.com/wiki/images/d/d7/dependency.png)

Credit: XKCD - http://www.xkcd.com

In [1]:
### setup dask cluster
from dask.distributed import Client
client = Client(threads_per_worker=1) # for loading a dataset object from many NetCDF file paths
#client = Client() # possibly better for analysis after loading from an ARD collection / intermediate file
client

Cluster summary (from the `client` widget):

- Connection method: Cluster object
- Cluster type: `distributed.LocalCluster`
- Dashboard: /proxy/8787/status
- Status: running, using processes: True
- Scheduler comm: tcp://127.0.0.1:34729
- Workers: 28
- Total threads: 28
- Total memory: 251.19 GiB
- Each worker: 1 thread, 8.97 GiB memory, local directory under /jobfs/125018176.gadi-pbs/dask-scratch-space/
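
As a quick sanity check on the cluster output above, the per-worker memory is consistent with the total memory being shared evenly across the workers. A minimal arithmetic sketch using only the figures reported in that output (the variable names are illustrative, not part of the Dask API):

```python
# Figures copied from the cluster output above.
total_memory_gib = 251.19
n_workers = 28
threads_per_worker = 1

# Even split of memory across workers, and the resulting thread count.
memory_per_worker_gib = total_memory_gib / n_workers
total_threads = n_workers * threads_per_worker

print(f"{memory_per_worker_gib:.2f} GiB per worker, {total_threads} threads in total")
```

If a single task needs more than this per-worker share, fewer workers with more memory each may be a better fit for that step.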


# utilise CMIP6 data catalogs for NCI holdings

The global CMIP6 (Coupled Model Intercomparison Project Phase 6) dataset is enormous, reflecting the extensive range of climate simulations and variables it contains. As of recent estimates, the CMIP6 archive is expected to surpass **20 petabytes (PB)** in total size. This includes data from various modeling centers around the world, covering different experiments, variables, and temporal resolutions.

The archive is spread across a correspondingly large number of NetCDF files, so a searchable catalog is the practical way to locate the ones you need.

##### Information on climate data catalogs across Australian HPC

**ACCESS-NRI** https://access-nri-intake-catalog.readthedocs.io/en/latest/usage/how.html <br>
**NCI** https://opus.nci.org.au/pages/viewpage.action?pageId=213713098


##### Earth System Grid Federation (ESGF) Australian CMIP6-era Datasets
https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f3154_9976_7262_7595
##### $\bigstar$ Get inspiration from ACCESS-NRI intake catalog docs: ACCESS-ESM1-5 CMIP6 example
https://access-nri-intake-catalog.readthedocs.io/en/latest/usage/quickstart.html

### import some packages

In [3]:
import intake
import xarray as xr
import numpy as np
import gc

### import the ACCESS-NRI catalog

In [4]:
catalog = intake.cat.access_nri

### search the catalog for ACCESS-ESM1-5 CMIP6 data

In [5]:
cmip6_datastore = catalog.search(name="cmip6.*", model="ACCESS-ESM1-5").to_source()

In [8]:
cmip6_datastore_filtered = cmip6_datastore.search(
    source_id="ACCESS-ESM1-5", 
    table_id="Omon", 
    variable_id="tos", 
    experiment_id="historical", 
    file_type="l"
)

cmip6_datastore_filtered

| | unique |
|---|---|
| path | 40 |
| file_type | 1 |
| realm | 1 |
| frequency | 1 |
| table_id | 1 |
| project_id | 1 |
| institution_id | 1 |
| source_id | 1 |
| experiment_id | 1 |
| member_id | 40 |


In [11]:
cmip6_datastore_filtered.keys()

['l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r10i1p1f1.mon.ocean.Omon.tos.gn.v20200605',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r11i1p1f1.mon.ocean.Omon.tos.gn.v20200803',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r12i1p1f1.mon.ocean.Omon.tos.gn.v20200803',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r13i1p1f1.mon.ocean.Omon.tos.gn.v20200803',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r14i1p1f1.mon.ocean.Omon.tos.gn.v20200803',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r15i1p1f1.mon.ocean.Omon.tos.gn.v20200803',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r16i1p1f1.mon.ocean.Omon.tos.gn.v20200803',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r17i1p1f1.mon.ocean.Omon.tos.gn.v20200803',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r18i1p1f1.mon.ocean.Omon.tos.gn.v20200803',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r19i1p1f1.mon.ocean.Omon.tos.gn.v20200803',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r1i1p1f1.mon.ocean.Omon.tos.gn.v20191115',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r20i1p1f1.mon.ocean.Omon.tos.gn.v2020

In [9]:
%%time
ds = xr.concat(
    cmip6_datastore_filtered.to_dataset_dict(progressbar=False).values(), 
    dim="member"
).rename({"tos": "sst"})

CPU times: user 10.6 s, sys: 1.87 s, total: 12.5 s
Wall time: 28.9 s


In [10]:
ds

Dask-backed arrays in `ds` (one row per array in the output):

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (40, 1980, 2) | (1, 1, 2) | 1.21 MiB | 16 B | 79200 chunks in 121 graph layers | datetime64[ns] |
| (40, 300, 360, 4) | (1, 300, 360, 2) | 131.84 MiB | 1.65 MiB | 80 chunks in 121 graph layers | float64 |
| (40, 300, 360, 4) | (1, 300, 360, 2) | 131.84 MiB | 1.65 MiB | 80 chunks in 121 graph layers | float64 |
| (40, 1980, 300, 360) | (1, 1, 300, 360) | 31.86 GiB | 421.88 kiB | 79200 chunks in 121 graph layers | float32 |
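
The sizes reported for the largest array above follow directly from its shape, chunk shape and data type. A minimal sketch of that arithmetic, assuming the dimension order is (member, time, j, i) and 4 bytes per `float32` value:

```python
import math

shape = (40, 1980, 300, 360)      # assumed order: (member, time, j, i)
chunk_shape = (1, 1, 300, 360)    # one time step of one member per chunk
itemsize = 4                      # bytes per float32 value

total_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunk_shape) * itemsize
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunk_shape))

print(f"total : {total_bytes / 2**30:.2f} GiB")
print(f"chunk : {chunk_bytes / 2**10:.2f} KiB")
print(f"chunks: {n_chunks}")
```

Tens of thousands of sub-megabyte chunks mean a large task graph for very little data per task, which is the motivation for choosing larger chunks later in the notebook.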


In [12]:
# Convert the catalog to a dictionary
datasets_dict = cmip6_datastore_filtered.to_dataset_dict(progressbar=False)

# Extract the member names from the keys
member_names = [key.split('.')[5] for key in datasets_dict.keys()]

# Concatenate the datasets along the new 'member' dimension
ds = xr.concat(datasets_dict.values(), dim="member")

# Assign the member names to the 'member' coordinate
ds = ds.assign_coords(member=("member", member_names))

# Now, 'ds' will have the 'member' coordinate with the correct member names
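
The member label is the sixth dot-separated field of each datastore key, which is why the cell above uses index 5. The keys come back in string order, so `r10i1p1f1` precedes `r1i1p1f1`. The sketch below checks the index on two keys copied from the catalog output and shows one way to sort the labels numerically; `variant_indices` is a hypothetical helper, not part of `intake` or `xarray`.

```python
import re

# Two keys copied from the catalog output; the rest are omitted for brevity.
keys = [
    "l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r10i1p1f1.mon.ocean.Omon.tos.gn.v20200605",
    "l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r1i1p1f1.mon.ocean.Omon.tos.gn.v20191115",
]

# The member label is the sixth dot-separated field (index 5).
member_names = [key.split(".")[5] for key in keys]


def variant_indices(member):
    """Return (r, i, p, f) as integers; a hypothetical helper for sorting."""
    match = re.fullmatch(r"r(\d+)i(\d+)p(\d+)f(\d+)", member)
    if match is None:
        raise ValueError(f"unexpected member label: {member!r}")
    return tuple(int(group) for group in match.groups())


sorted_members = sorted(member_names, key=variant_indices)
print(member_names)
print(sorted_members)
```

Once `member` is a labelled coordinate, a list like `sorted_members` could be passed to `ds.sel(member=sorted_members)` to reorder the dataset.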

In [13]:
ds

Dask-backed arrays in `ds` (one row per array in the output):

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (40, 1980, 2) | (1, 1, 2) | 1.21 MiB | 16 B | 79200 chunks in 121 graph layers | datetime64[ns] |
| (40, 300, 360, 4) | (1, 300, 360, 2) | 131.84 MiB | 1.65 MiB | 80 chunks in 121 graph layers | float64 |
| (40, 300, 360, 4) | (1, 300, 360, 2) | 131.84 MiB | 1.65 MiB | 80 chunks in 121 graph layers | float64 |
| (40, 1980, 300, 360) | (1, 1, 300, 360) | 31.86 GiB | 421.88 kiB | 79200 chunks in 121 graph layers | float32 |


In [None]:
cmip6_fs38_catalog = intake.open_esm_datastore("/g/data/dk92/catalog/v2/esm/cmip6-fs38/catalog.json")

In [None]:
cmip6_fs38_catalog

In [None]:
cmip6_fs38_catalog.df

### What is the name for SST? Variable name search here: https://pcmdi.llnl.gov/mips/cmip3/variableList.html

In [None]:
my_cmip6_search = cmip6_fs38_catalog.search(
    experiment_id="historical",
    realm='ocean',
    source_id='ACCESS-CM2',
    variable_id='tos',
    frequency='mon',
    file_type='f'
)
my_cmip6_search

In [None]:
my_cmip6_search.unique().path

In [None]:
%%time
dataset_dict = my_cmip6_search.to_dataset_dict(progressbar=False)
SST_ds = xr.concat(
    dataset_dict.values(),
    dim='member'
)

## ARD task - "Cleaning up" inconsistent files 

In [None]:
%%time
dataset_dict = my_cmip6_search.to_dataset_dict(progressbar=False)
dataset_dict_clean = {key: ds.drop_vars(['vertices_longitude', 'vertices_latitude', 'time_bnds'], errors='ignore') 
           for key, ds in dataset_dict.items()}
SST_ds = xr.concat(dataset_dict_clean.values(), dim='member')
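
To see what the clean-up step does without touching the real files, here is a small sketch on synthetic data. It assumes `numpy` and `xarray` are installed; `make_member` and the tiny arrays are hypothetical stand-ins for the catalog datasets. One member carries a `time_bnds` variable and the other does not, and `errors="ignore"` lets the same `drop_vars` call work on both.

```python
import numpy as np
import xarray as xr


def make_member(value, with_bounds):
    """Build a tiny synthetic dataset standing in for one ensemble member."""
    data_vars = {
        "tos": (("time", "j", "i"), np.full((3, 2, 2), value, dtype="float32"))
    }
    if with_bounds:
        data_vars["time_bnds"] = (("time", "bnds"), np.zeros((3, 2)))
    return xr.Dataset(data_vars, coords={"time": np.arange(3)})


members = {
    "r1i1p1f1": make_member(1.0, with_bounds=True),
    "r2i1p1f1": make_member(2.0, with_bounds=False),
}

# Drop the auxiliary variables wherever they exist; missing ones are skipped.
extras = ["vertices_longitude", "vertices_latitude", "time_bnds"]
cleaned = {name: ds.drop_vars(extras, errors="ignore") for name, ds in members.items()}

# Stack along a new 'member' dimension and label it with the member names.
combined = xr.concat(list(cleaned.values()), dim="member")
combined = combined.assign_coords(member=("member", list(cleaned)))
print(combined)
```

After the clean-up every member has the same set of variables, so the concatenation does not have to reconcile variables that exist in only some files.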

In [None]:
SST_ds

## what is the default NetCDF chunking?
`ncdump -hs /g/data/fs38/publications/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r5i1p1f1/Omon/tos/gn/files/d20210607/tos_Omon_ACCESS-CM2_historical_r5i1p1f1_gn_185001-201412.nc`

**tos:_ChunkSizes = 1, 300, 360 ;**

In [None]:
%%time
xarray_open_kwargs = {'chunks':{'member':1,'time':220,'j':300,'i':360}}
dataset_dict = my_cmip6_search.to_dataset_dict(progressbar=False,xarray_open_kwargs=xarray_open_kwargs)
dataset_dict_clean = {key: ds.drop_vars(['vertices_longitude', 'vertices_latitude', 'time_bnds'], errors='ignore') 
           for key, ds in dataset_dict.items()}
SST_ds_chunked_on_open = xr.concat(dataset_dict_clean.values(), dim='member')

In [None]:
SST_ds_chunked_on_open
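
The effect of asking for `time: 220` at open time can be estimated before running anything. A minimal sketch of the arithmetic for a single member, assuming `tos` is stored as `float32` (4 bytes per value) and using the 1980 monthly steps of the 185001-201412 file on the 300 x 360 grid; `chunk_report` is a hypothetical helper:

```python
import math

n_time, n_j, n_i = 1980, 300, 360   # 185001-201412 monthly, 300 x 360 grid
itemsize = 4                         # assumption: tos stored as float32


def chunk_report(time_chunk):
    """Return (chunks per member, chunk size in MiB) for a given time chunk."""
    n_chunks = math.ceil(n_time / time_chunk)
    size_mib = time_chunk * n_j * n_i * itemsize / 2**20
    return n_chunks, size_mib


native = chunk_report(1)      # the on-disk layout reported by ncdump
on_open = chunk_report(220)   # the layout requested through xarray_open_kwargs

print(f"native : {native[0]} chunks per member, {native[1]:.2f} MiB each")
print(f"on open: {on_open[0]} chunks per member, {on_open[1]:.2f} MiB each")
```

Because 220 divides 1980 exactly, every chunk along time has the same size, and each Dask chunk covers a whole number of the native one-step NetCDF chunks.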

# Calculate monthly climatology - ARD makes it faster

In [None]:
SST_climatology = SST_ds.groupby('time.month').mean('time')
SST_climatology

### Wall time with the default chunks (`SST_ds`): 3min 25s

In [None]:
%%time
SST_climatology = SST_climatology.compute()

In [None]:
gc.collect()

### Wall time with chunks set on open (`SST_ds_chunked_on_open`): 32.3 s

In [None]:
SST_ARD_climatology = SST_ds_chunked_on_open.groupby('time.month').mean('time')
SST_ARD_climatology

In [None]:
%%time
SST_ARD_climatology = SST_ARD_climatology.compute()
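
For readers new to `groupby('time.month').mean('time')`: it groups every time step by its calendar month and averages within each group, leaving 12 values per grid cell and member. A plain-Python sketch of the same logic on a synthetic two-year series (no `xarray` involved; all names and values are illustrative):

```python
from collections import defaultdict
from statistics import mean

# Synthetic series: two years of monthly values, purely for illustration.
months = [m for _ in range(2) for m in range(1, 13)]
values = [float(v) for v in range(24)]

# Group the values by calendar month, then average each group.
by_month = defaultdict(list)
for month, value in zip(months, values):
    by_month[month].append(value)

climatology = {month: mean(group) for month, group in sorted(by_month.items())}
print(climatology[1], climatology[12])
```

Each monthly mean draws on time steps spread through the whole record, so chunks that span many time steps let one task contribute to every month at once.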

# Explore SST climatology over the historical record for ACCESS-CM2

In [None]:
(SST_ARD_climatology.tos.isel({'month':0,'member':0}) - SST_ARD_climatology.tos.isel({'month':0,'member':5})).plot()

# Ensemble calculations - "what's the spread in SST" - ARD makes it POSSIBLE

# Rechunking for ARD ensemble calculations - XL Cluster (14 cpu / 63 GB) used - Wall time: 1min 2s

In [None]:
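def remove_encoding(DS):
    """Clear the encoding on every data variable and coordinate of DS, in place.

    Encoding inherited from the source NetCDF files (for example the on-disk
    chunk sizes) can conflict with the new Dask chunks when writing to Zarr.
    """
    for var in DS:
        DS[var].encoding = {}

    for coord in DS.coords:
        DS[coord].encoding = {}
    return DS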

In [None]:
SST_ds_rechunked = SST_ds_chunked_on_open.chunk({'member':10,'time':220,'j':300,'i':36})
SST_ds_rechunked
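
The two layouts can be compared with a little arithmetic before writing anything to disk. In the sketch below, `layout_cost` is a hypothetical helper, `float32` data is assumed, and the ensemble size of 40 is only an illustrative value; substitute the size of your own ensemble.

```python
import math


def layout_cost(n_members, chunks, itemsize=4):
    """Return (chunk size in MiB, chunks along member) for a chunk layout."""
    size_mib = math.prod(chunks.values()) * itemsize / 2**20
    member_chunks = math.ceil(n_members / chunks["member"])
    return size_mib, member_chunks


n_members = 40  # assumption for illustration; use the size of your own ensemble

on_open = {"member": 1, "time": 220, "j": 300, "i": 360}
rechunked = {"member": 10, "time": 220, "j": 300, "i": 36}

for name, chunks in [("on open", on_open), ("rechunked", rechunked)]:
    size_mib, member_chunks = layout_cost(n_members, chunks)
    print(f"{name:9s}: {size_mib:.2f} MiB per chunk, {member_chunks} chunks along member")
```

Both layouts give chunks of the same size in memory. The rechunked layout trades extent along `i` for extent along `member`, so a reduction over `member` has ten times fewer chunks to combine.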

In [None]:
SST_ds_rechunked.tos.encoding

In [None]:
remove_encoding(SST_ds_rechunked)
SST_ds_rechunked.tos.encoding

In [None]:
%%time
SST_ds_rechunked.to_zarr('/scratch/nf33/moore_tutorial/SST_ARD_rechunked.zarr',consolidated=True)

# open ARD Zarr collection for SST

In [None]:
SST_ds_rechunked = xr.open_zarr('/scratch/nf33/moore_tutorial/SST_ARD_rechunked.zarr',consolidated=True)

In [None]:
SST_ds_rechunked

In [None]:
SST_ds_chunked_on_open

In [None]:
SST_spread = SST_ds_chunked_on_open.std(dim='member')
SST_spread_rechunked = SST_ds_rechunked.std(dim='member')

In [None]:
%%time
SST_spread.tos.mean('time').plot()

In [None]:
%%time
SST_spread_rechunked.tos.mean('time').plot(robust=True)
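
The quantity plotted above is the standard deviation across members at each time step, averaged over time. A plain-Python sketch of that calculation at a single grid cell, using made-up values for three members (`pstdev` matches the `ddof=0` default of `std` in `xarray`):

```python
from statistics import mean, pstdev

# Synthetic SST (degrees C) at one grid cell: three members, four time steps.
sst = {
    "r1i1p1f1": [20.0, 21.0, 22.0, 23.0],
    "r2i1p1f1": [20.0, 22.0, 22.0, 25.0],
    "r3i1p1f1": [20.0, 23.0, 22.0, 27.0],
}

# Standard deviation across members at each time step (population form, ddof=0).
spread = [pstdev(step) for step in zip(*sst.values())]

# Time mean of the spread, the quantity mapped for every grid cell above.
mean_spread = mean(spread)
print(spread, mean_spread)
```

Every value of the spread needs all members at once, which is why chunks that hold several members together suit this calculation.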

# $The$ $End$

In [None]:
client.shutdown()