# 01 - Compute Long Term Averages
For each case, compute the long-term (time) mean of each variable, keeping the spatial dimensions (`nlat`, `nlon`, `z_t`).

## Imports
The `autoreload` lines at the top reload imported Python modules whenever they change. The `analysis_config.yml` file is read from disk each time its cell runs, so re-running that cell picks up any edits to the configuration.

In [1]:
%load_ext autoreload
%autoreload 2

import intake
import ast
import yaml
from distributed import Client
from ncar_jobqueue import NCARCluster
import xarray as xr

## Spin up a Dask Cluster

In [2]:
cluster = NCARCluster()
cluster.scale(20)
client = Client(cluster)



In [3]:
client

Connection method: Cluster object
Cluster type: PBSCluster
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/8787/status
Scheduler comm: tcp://10.12.206.42:46308
Started: Just now
Workers: 9
Total threads: 18
Total memory: 209.52 GiB
Per worker: 2 threads, 23.28 GiB memory, local directory under /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/


## Open the Intake-ESM Catalog
In the first notebook, we created an `intake-esm` catalog, which gives us access to our data.

In [4]:
col = intake.open_esm_datastore(
    "/glade/work/mgrover/cesm-validation-catalog.json",
    csv_kwargs={"converters": {"variables": ast.literal_eval}},
    sep="/",
)
col

| | unique |
|---|---|
| component | 1 |
| stream | 4 |
| date | 2501 |
| case | 3 |
| member_id | 2 |
| frequency | 4 |
| variables | 545 |
| path | 11103 |


In [5]:
cat = col.search(
    stream='pop.h',
)
cat

| | unique |
|---|---|
| component | 1 |
| stream | 1 |
| date | 1200 |
| case | 3 |
| member_id | 2 |
| frequency | 1 |
| variables | 434 |
| path | 3600 |


### Subset the last 20 years of data

In [6]:
dates = sorted(cat.df.date.unique())

In [7]:
sub = cat.search(date=dates[-240:])
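
The slice `dates[-240:]` works because the output is monthly: 20 years × 12 months = 240 entries. The sketch below illustrates this with made-up date labels. The `"YYYY-MM"` format is an assumption for illustration only and may not match the strings stored in the catalog. Zero-padded labels sort correctly as plain strings, which is what `sorted()` relies on here.

```python
# Hypothetical monthly date labels for a 100-year run: "0001-01" ... "0100-12".
# The label format is an assumption, not taken from the catalog.
dates = sorted(
    f"{year:04d}-{month:02d}" for year in range(1, 101) for month in range(1, 13)
)

months_per_year = 12
n_years = 20

# Keep only the final 20 years of monthly entries
last_20_years = dates[-n_years * months_per_year:]

print(len(dates), len(last_20_years))
print(last_20_years[0], last_20_years[-1])
```

With 1200 monthly entries, the last 240 cover years 81 through 100, which matches the year range in the output file name used at the end of this notebook.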

### Read in our dataset using `to_dataset_dict()`

In [11]:
dsets = sub.to_dataset_dict(cdf_kwargs={'use_cftime': True, 'chunks': {'time': 10}})


--> The keys in the returned dictionary of datasets are constructed as follows:
	'component/stream/case'


In [12]:
dsets.keys()

dict_keys(['ocn/pop.h/b1850.f19_g17.validation_mct.002', 'ocn/pop.h/b1850.f19_g17.validation_mct.004', 'ocn/pop.h/b1850.f19_g17.validation_nuopc.004'])
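
Since the keys follow the `component/stream/case` pattern with the `/` separator chosen above, they can be split back into their parts if the case name is needed on its own. A minimal sketch, where `parse_key` is a hypothetical helper and not part of `intake-esm`:

```python
# Keys copied from the output above
keys = [
    "ocn/pop.h/b1850.f19_g17.validation_mct.002",
    "ocn/pop.h/b1850.f19_g17.validation_mct.004",
    "ocn/pop.h/b1850.f19_g17.validation_nuopc.004",
]


def parse_key(key, sep="/"):
    """Split a 'component/stream/case' key into a dict (hypothetical helper)."""
    component, stream, case = key.split(sep, 2)
    return {"component": component, "stream": stream, "case": case}


parsed = [parse_key(key) for key in keys]
case_names = [entry["case"] for entry in parsed]
print(case_names)
```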

## Loop through the data and compute!
We compute the time average for each case, subset to the variables specified in the `analysis_config.yml` file, and merge the results into a single dataset.

In [77]:
with open("analysis_config.yml", mode="r") as fptr:
    analysis_config = yaml.safe_load(fptr)
variables = analysis_config['variables']
variables

['TEMP', 'SALT', 'FG_CO2', 'NH4', 'NO3', 'SiO3']
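
It can help to check the requested variables before starting an expensive computation. The sketch below assumes the configuration has already been loaded into a dictionary (as `yaml.safe_load` returns) and that the set of available variable names is known. `validate_variables` and the `available` set are hypothetical names introduced for illustration.

```python
def validate_variables(config, available):
    """Return the requested variable names, raising if any are unusable.

    Hypothetical helper: `config` stands in for the parsed YAML file and
    `available` for the variable names present in a dataset.
    """
    variables = config.get("variables")
    if not isinstance(variables, list) or not all(
        isinstance(name, str) for name in variables
    ):
        raise ValueError("'variables' must be a list of strings")
    missing = [name for name in variables if name not in available]
    if missing:
        raise KeyError(f"variables not found in dataset: {missing}")
    return variables


# Stand-in for the parsed analysis_config.yml shown above
config = {"variables": ["TEMP", "SALT", "FG_CO2", "NH4", "NO3", "SiO3"]}
# Assumed set of variable names in a dataset (illustrative only)
available = {"TEMP", "SALT", "FG_CO2", "NH4", "NO3", "SiO3", "UVEL"}

checked = validate_variables(config, available)
print(checked)
```

In the notebook, the `available` set could be built from `ds.data_vars` for one of the datasets in `dsets`.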

In [65]:
ds_list = []
for key, ds in dsets.items():
    # Average over time for the requested variables only, then load into memory
    out = ds[variables].mean(dim='time').compute()
    # Reductions can drop attributes, so copy them from the original dataset
    for var in variables:
        out[var].attrs = ds[var].attrs
    out.attrs = ds.attrs
    out.attrs['intake_esm_varname'] = variables
    ds_list.append(out)

In [66]:
merged_ds = xr.concat(ds_list, dim='case')

### Make sure our variables keep their attributes

In [71]:
ref_ds = list(dsets.values())[-1]  # original (pre-average) dataset holding the attributes
for var in variables:
    merged_ds[var].attrs = ref_ds[var].attrs

We also want to keep each dataset's `title` attribute, which holds the case name, and use it to label the `case` dimension.

In [73]:
cases = [ds.attrs['title'] for ds in ds_list]

In [74]:
merged_ds['case'] = cases
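
The averaging and merging steps above can be tried on small synthetic data without a cluster or catalog. This is a sketch under stated assumptions: `make_case`, the array sizes, the values, and the case names are all made up for illustration. Whether `mean()` keeps attributes by default depends on the xarray version, so the sketch copies them explicitly.

```python
import numpy as np
import xarray as xr


def make_case(title, offset):
    """Build a tiny synthetic dataset standing in for one case (hypothetical data)."""
    # Shape is (time, z_t, nlat, nlon); values are offset + 0, 1, 2, 3 along time
    data = np.full((4, 2, 3, 3), offset, dtype="float64")
    data += np.arange(4).reshape(4, 1, 1, 1)
    return xr.Dataset(
        {"TEMP": (("time", "z_t", "nlat", "nlon"), data, {"units": "degC"})},
        attrs={"title": title},
    )


variables = ["TEMP"]
dsets = {"case_a": make_case("case_a", 10.0), "case_b": make_case("case_b", 20.0)}

ds_list = []
for key, ds in dsets.items():
    out = ds[variables].mean(dim="time")  # time mean; spatial dims are kept
    for var in variables:
        out[var].attrs = ds[var].attrs  # copy variable attributes explicitly
    out.attrs = ds.attrs  # copy global attributes explicitly
    ds_list.append(out)

# Stack the per-case averages along a new "case" dimension and label it
merged = xr.concat(ds_list, dim="case")
merged["case"] = [ds.attrs["title"] for ds in ds_list]

print(merged["TEMP"].dims)
print(float(merged["TEMP"].sel(case="case_a").mean()))
```

Labelling the `case` dimension with the titles makes it possible to select a single run by name with `.sel(case=...)`.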

### Export our data
We output our dataset to zarr!

In [76]:
merged_ds.to_zarr('cached_output/averages_year_081_100.zarr')

<xarray.backends.zarr.ZarrStore at 0x2b891b9364a0>