# Generate Model Climatology

---

This notebook reads the raw S2S `.zarr` array created in `1_download_data.ipynb` and generates a daily climatology (smoothed with a running-mean filter). That climatology is later subtracted from the raw S2S output to create anomalies.
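
To make the end goal concrete, here is a minimal NumPy sketch of how a day-of-year climatology is removed from raw values to form anomalies. The toy arrays and the `remove_climatology` helper are hypothetical stand-ins, not part of this workflow; the real data also carries `lead`, `lat`, and `lon` dimensions.

```python
import numpy as np

def remove_climatology(raw, dayofyear, clim):
    """Subtract the climatology value matching each sample's day of year.

    raw:       1-D array of raw values, one per initialization
    dayofyear: 1-D integer array (1..366), same length as raw
    clim:      1-D array of length 366, indexed by dayofyear - 1
    """
    return raw - clim[dayofyear - 1]

# Toy data: three initializations falling on days 1, 2 and 1 of the year.
raw = np.array([10.0, 12.0, 14.0])
dayofyear = np.array([1, 2, 1])

clim = np.zeros(366)
clim[0] = 12.0  # mean of the two day-1 samples (10 and 14)
clim[1] = 12.0  # the single day-2 sample

anom = remove_climatology(raw, dayofyear, clim)
print(anom)
```

By construction, the anomalies for any day of year that has several samples average to zero.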

In [1]:
import numpy as np
import xarray as xr
xr.set_options(keep_attrs=True)
import dask.array as da
import cftime
from datetime import datetime

from dask.distributed import Client

In [2]:
client = Client("tcp://128.117.208.71:38607")

Choose the model and variables to include in the climatology `zarr` file.

In [3]:
model = "CESM2_climoATM"  # other options: ECMWF, NCEP, or ECCC
#var = ["t2m","tp","gh_500","psl"]
var = ["t2m"]
#var = ["psl"]
if model == "ECMWF" or model == "ECCC":
    days = 97
elif model == "NCEP":
    days = 94

First, we need to calculate the daily climatology following the SubX protocol. We'll load in our big `zarr`, take the ensemble mean, and then generate the climatology.

In [4]:
ds = xr.open_zarr("/glade/campaign/mmm/c3we/jaye/S2S_zarr/"+model+".raw.daily.geospatial.zarr",consolidated=True)
display(ds)

| | Array | Chunk |
|---|---|---|
| Bytes | 141.00 GiB | 125.77 MiB |
| Shape | (1148, 11, 46, 181, 360) | (1, 11, 46, 181, 360) |
| Count | 2 Graph Layers | 1148 Chunks |
| Type | float32 | numpy.ndarray |


Here I "persist" the ensemble mean. This runs the calculation in the background but keeps the result as dask arrays distributed across the cluster, rather than loading it into memory on the main node.

In [5]:
ds = ds.mean("member").persist()
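
As a rough illustration of the difference between a lazy result, a persisted result, and a computed one, here is a small sketch on a toy dask array. The shapes are made up. Without a distributed client, as in this sketch, the persisted chunks simply live in local memory instead of on cluster workers.

```python
import dask.array as da
import numpy as np

# Toy stand-in for the raw array: (init, member, lat), one chunk per init.
x = da.ones((8, 4, 100), chunks=(1, 4, 100), dtype="float32")

lazy_mean = x.mean(axis=1)       # only builds a task graph, nothing runs yet
persisted = lazy_mean.persist()  # runs the graph; the result is still a dask array
in_memory = persisted.compute()  # gathers all chunks into a single NumPy array

print(type(persisted).__name__, type(in_memory).__name__)
print(persisted.chunks)
```

Later operations on `persisted` start from the already-computed chunks instead of recomputing the mean.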

## Generate Climatology

We'll be averaging over the `dayofyear` of the `init` dimension, so it's best to rechunk so that `init` is a single chunk, which speeds things up. I'll use `lead` chunks of 1 since we are computing a lead-dependent climatology.

I usually run `.persist()` after rechunking so that the data is reloaded with its new chunks.

In [6]:
ds = ds.chunk({"init": -1, "lead": 1, "lat": "auto", "lon": "auto"}).persist()
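
The effect of this kind of rechunking can be seen on a small dask array. This is only a sketch with made-up shapes; it uses `dask.array.rechunk` with axis numbers, whereas the cell above uses xarray's `.chunk` with dimension names.

```python
import dask.array as da

# Toy stand-in with dims (init, lead, lat, lon), one chunk per init.
x = da.zeros((20, 6, 10, 12), chunks=(1, 6, 10, 12), dtype="float32")

# -1 means "one chunk spanning the whole axis"; lead is split into chunks of 1.
y = x.rechunk({0: -1, 1: 1})

print(x.numblocks)  # number of chunks along each axis before rechunking
print(y.numblocks)  # ... and after
print(y.chunks)
```

With `init` in one chunk, each reduction along `init` can work within a single block per `lead` instead of combining pieces from many small chunks.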

All of the following is from Ray Bell's climatology script for SubX that Kathy uses. Direct questions about the methodology to them! It applies a rolling-window smoother to the daily climatology computed across initializations.

In [7]:
ds.groupby("init.dayofyear").mean("init")

| | Array | Chunk |
|---|---|---|
| Bytes | 4.09 GiB | 113.91 kiB |
| Shape | (366, 46, 181, 360) | (1, 1, 121, 241) |
| Count | 1466 Graph Layers | 67344 Chunks |
| Type | float32 | numpy.ndarray |
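
The grouped mean can be checked by hand on a toy array. The sketch below assumes a small hypothetical hindcast with only `init` and `lead` dimensions and made-up values; the real dataset also has `lat` and `lon`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy hindcast: Jan 1 and Jan 2 initializations for three years, two leads.
init = pd.to_datetime(
    ["2001-01-01", "2001-01-02", "2002-01-01",
     "2002-01-02", "2003-01-01", "2003-01-02"]
)
values = np.arange(12, dtype="float64").reshape(6, 2)
toy = xr.DataArray(
    values,
    dims=["init", "lead"],
    coords={"init": init, "lead": [0, 1]},
    name="t2m",
)

# Average all initializations that share a day of year, separately for each lead.
clim = toy.groupby("init.dayofyear").mean("init")

print(clim.sel(dayofyear=1).values)  # mean over the three Jan 1 inits
print(clim.sel(dayofyear=2).values)  # mean over the three Jan 2 inits
```

The `init` dimension is replaced by `dayofyear`, while `lead` is kept, which is what makes the climatology lead-dependent.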


In [8]:
#days = 153
days = 366

## Loop through all vars to make climatology

In [9]:
da_day_clim_smooth_vars = []
for i in var:
    print(i)
    da_day_clim = ds[i].groupby("init.dayofyear").mean("init")
    # Rechunk so the dayofyear climatology is a single chunk.
    da_day_clim = da_day_clim.chunk({"dayofyear": days}).persist()
    # Build an all-NaN array on the climatology's grid, mimicking its chunk sizes.
    x = da.full((days, da_day_clim.lead.size, da_day_clim.lat.size, da_day_clim.lon.size),
        np.nan,dtype="float32",chunks=(days, 1, 181, 360),)
    # Wrap the NaN array in a DataArray that shares the climatology's coordinates.
    _da = xr.DataArray(x,dims=["dayofyear", "lead", "lat", "lon"],
        coords=[da_day_clim.dayofyear, da_day_clim.lead, da_day_clim.lat, da_day_clim.lon],name=i,)
    # Pad the daily climatology with NaNs.
    da_day_clim_wnan = da_day_clim.combine_first(_da)
    # Apply a periodic rolling mean twice, which gives triangular smoothing.
    da_day_clim_smooth = da_day_clim_wnan.copy()
    
    for j in range(2):
        # Extend the DataArray so the rolling window wraps around the year boundary.
        da_day_clim_smooth = xr.concat([da_day_clim_smooth[-15:], da_day_clim_smooth, da_day_clim_smooth[:15]],"dayofyear",)
        # Rolling mean
        da_day_clim_smooth = da_day_clim_smooth.rolling(dayofyear=31, center=True, min_periods=1).mean()
        # Drop the periodic boundaries
        da_day_clim_smooth = da_day_clim_smooth.isel(dayofyear=slice(15, -15))
        
    # Extract the original days
    da_day_clim_smooth = da_day_clim_smooth.sel(dayofyear=da_day_clim.dayofyear)
    da_day_clim_smooth.name = i
    
    da_day_clim_smooth_vars.append(da_day_clim_smooth)

t2m
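
To see why two passes of the 31-day rolling mean amount to triangular smoothing, here is a NumPy-only sketch on a hypothetical noisy 1-D climatology. It assumes there are no missing days; the loop above uses `min_periods=1`, so its rolling mean also skips over NaNs, which this sketch does not reproduce. The `periodic_running_mean` helper is illustrative and not part of the original script.

```python
import numpy as np

def periodic_running_mean(x, window=31):
    """Centered running mean that wraps around the ends of the year."""
    half = window // 2
    # Wrap the last/first `half` days around, like the xr.concat step.
    padded = np.concatenate([x[-half:], x, x[:half]])
    # "valid" keeps only full windows, so the output has the length of x.
    return np.convolve(padded, np.ones(window) / window, mode="valid")

rng = np.random.default_rng(0)
noisy_clim = rng.normal(size=366)  # hypothetical noisy daily climatology

# Two passes of the 31-day boxcar ...
twice = periodic_running_mean(periodic_running_mean(noisy_clim))

# ... match one pass of a 61-day triangular kernel (boxcar convolved with itself).
box = np.ones(31) / 31
triangle = np.convolve(box, box)  # 61 weights peaked at the center, summing to 1
padded = np.concatenate([noisy_clim[-30:], noisy_clim, noisy_clim[:30]])
once = np.convolve(padded, triangle, mode="valid")

print(np.allclose(twice, once))  # True
```

The triangular weights give the most influence to nearby days and taper to zero 30 days out, which suppresses noise more smoothly than a single boxcar pass.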


- Next we need to merge the variables together into one xarray dataset.

In [10]:
ds_day_clim_smooth_vars = xr.merge(da_day_clim_smooth_vars)
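
As a small sketch of what the merge does, the toy example below combines two named `DataArray`s on a made-up grid. `xr.merge` needs each `DataArray` to have a name, which is why the loop sets `da_day_clim_smooth.name = i`.

```python
import numpy as np
import xarray as xr

# Two hypothetical variables on the same small (dayofyear, lead) grid.
coords = {"dayofyear": [1, 2, 3], "lead": [0, 1]}
t2m = xr.DataArray(np.zeros((3, 2)), dims=["dayofyear", "lead"],
                   coords=coords, name="t2m")
psl = xr.DataArray(np.ones((3, 2)), dims=["dayofyear", "lead"],
                   coords=coords, name="psl")

# Each named DataArray becomes one data variable in the merged Dataset.
merged = xr.merge([t2m, psl])
print(sorted(merged.data_vars))  # ['psl', 't2m']
```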

In [11]:
ds_day_clim_smooth_vars

| | Array | Chunk |
|---|---|---|
| Bytes | 4.09 GiB | 38.93 MiB |
| Shape | (366, 46, 181, 360) | (350, 1, 121, 241) |
| Count | 60 Graph Layers | 368 Chunks |
| Type | float32 | numpy.ndarray |


In [12]:
ds_day_clim_smooth_vars = ds_day_clim_smooth_vars.chunk({"dayofyear": -1, "lead": 1, "lat": 181, "lon": 360}).persist()

In [13]:
ds_day_clim_smooth_vars

| | Array | Chunk |
|---|---|---|
| Bytes | 4.09 GiB | 90.98 MiB |
| Shape | (366, 46, 181, 360) | (366, 1, 181, 360) |
| Count | 1 Graph Layer | 46 Chunks |
| Type | float32 | numpy.ndarray |


- Now we write out to `zarr`!

In [14]:
%time ds_day_clim_smooth_vars.to_zarr("/glade/campaign/mmm/c3we/jaye/S2S_zarr/"+model+".climatology.daily.geospatial.zarr", mode="w",consolidated=True)

CPU times: user 26.9 ms, sys: 7.57 ms, total: 34.5 ms
Wall time: 1 s


<xarray.backends.zarr.ZarrStore at 0x1494cdcbde40>

Just a quick check that things look right. It will look a bit stair-steppy because we only have initializations every seven days.

In [14]:
ds_day_clim_smooth_vars.t2m.sel(lead=1,lat=slice(31,60)).mean(("lat","lon")).plot()

CancelledError: ('mean_agg-aggregate-c68ae187f62507dd91551cf94cd7747e', 0)