# Demo of `create_climpred_data`

This demo shows how to set up raw output from a climate model so that it matches `climpred`'s expectations.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

import climpred

In [2]:
from climpred.create_climpred_data import load_hindcast, climpred_preprocess_internal, get_path

Assuming your raw model output is stored in multiple files per member and initialization, `load_hindcast` is a wrapper function built on `get_path` and designed for the output format of `MPI-ESM`. It aggregates all hindcast output into one dataset, as expected by `climpred`.

The basic idea is to loop over the output of all members and concatenate, then loop over all initializations and concatenate. Before concatenation, it is important to align the `time` dimension so that every initialization shares the same axis.
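
The nested concatenation can be sketched with synthetic data. This is not the source of `load_hindcast`; the helpers `fake_member_output` and `align_time` and the variable name `var` are made up for illustration, and only standard `xarray` and `numpy` calls are used.

```python
import numpy as np
import xarray as xr


def fake_member_output(init, member, n_years=3):
    """Hypothetical stand-in for one member's raw output with calendar years as time."""
    time = np.arange(init + 1, init + 1 + n_years)
    values = np.full(n_years, init * 10 + member, dtype=float)
    return xr.Dataset({"var": ("time", values)}, coords={"time": time})


def align_time(ds, lead_offset=1, time_dim="time"):
    """Replace calendar time by integer leads so that all inits share one axis."""
    leads = np.arange(lead_offset, lead_offset + ds[time_dim].size)
    return ds.assign_coords({time_dim: leads}).rename({time_dim: "lead"})


inits = range(1961, 1965)
members = range(1, 3)

per_init = []
for init in inits:
    # inner loop: align each member, then concatenate along a new member dimension
    per_member = [align_time(fake_member_output(init, m)) for m in members]
    ds_init = xr.concat(per_member, dim="member").assign_coords(member=list(members))
    per_init.append(ds_init)

# outer step: concatenate all initializations along a new init dimension
hindcast = xr.concat(per_init, dim="init").assign_coords(init=list(inits))
print(hindcast.coords)
```

Without the alignment step, each initialization would carry its own calendar years, and concatenating along `init` would produce a mostly empty union of all time stamps instead of a compact `lead` axis.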

To reduce the data size, combine a custom `preprocess` function with `climpred_preprocess_internal`, e.g. to extract only a certain region or only a few variables from a multi-variable input file, as in MPI-ESM standard output.
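
A minimal sketch of such a subsetting function, run on a synthetic multi-variable dataset. The function name `preprocess_region_1var`, the variable names `tas` and `pr`, and the box bounds are assumptions chosen for illustration, not part of `climpred`.

```python
import numpy as np
import xarray as xr


def preprocess_region_1var(ds, v="tas", lat_bounds=(-10, 10), lon_bounds=(0, 20)):
    """Keep a single variable inside a lat/lon box (hypothetical helper)."""
    ds = ds[[v]]  # double brackets return a Dataset rather than a DataArray
    # label-based slicing assumes ascending lat/lon; reverse the bounds otherwise
    return ds.sel(lat=slice(*lat_bounds), lon=slice(*lon_bounds))


# synthetic stand-in for a multi-variable model output file
shape = (4, 7, 6)
raw = xr.Dataset(
    {
        "tas": (("time", "lat", "lon"), np.zeros(shape)),
        "pr": (("time", "lat", "lon"), np.ones(shape)),
    },
    coords={
        "time": np.arange(4),
        "lat": np.arange(-30, 31, 10),
        "lon": np.arange(0, 60, 10),
    },
)

small = preprocess_region_1var(raw)
print(small.sizes)
```

Subsetting each file before concatenation keeps memory use low, because the full fields are never held for all members and initializations at once.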

In [3]:
get_path??

Signature:
get_path(
    dir_base_experiment='/work/bm1124/m300086/CMIP6/experiments',
    member=1,
    init=1960,
    model='hamocc',
    output_stream='monitoring_ym',
    timestr='*1231',
    ending='nc',
)
Source:
def get_path(
    dir_base_experiment='/work/bm1124/m300086/CMIP6/experiments',
    member=1

In [4]:
climpred_preprocess_internal??

Signature: climpred_preprocess_internal(ds, lead_offset=1, time_dim='time')
Source:
def climpred_preprocess_internal(ds, lead_offset=1, time_dim='time'):
    """CMIP6 DCPP preprocessing before the aggreatations of intake-esm happen."""
    # set time_dim to integers starting at lead_offset
    ds[time_dim] = np.arange(lead_offset, lead_offset + ds[time_dim].size)
    return

In [5]:
load_hindcast??

Signature:
load_hindcast(
    inits=range(1961, 1965),
    members=range(1, 3),
    preprocess=None,
    lead_offset=1,
    parallel=True,
    engine=None,
    **get_path_kwargs,
)
Source:
def load_hindcast(
    inits=range(1961, 1965

In [6]:
def preprocess_1var(ds, v="global_primary_production"):
    # keep only variable `v` as a Dataset and drop length-one dimensions
    return ds[v].to_dataset(name=v).squeeze()

In [7]:
ds = load_hindcast(preprocess=preprocess_1var).load()
ds.coords

Processing init 1961 ...
Processing init 1962 ...
Processing init 1963 ...
Processing init 1964 ...


Coordinates:
    lat      float64 0.0
    lon      float64 0.0
    depth    float64 0.0
  * lead     (lead) int64 1 2 3 4 5 6 7 8 9 10
  * member   (member) int64 1 2
  * init     (init) int64 1961 1962 1963 1964

In [8]:
climpred.prediction.compute_perfect_model(ds, ds.rename({'lead':'time'}))

# `intake-esm` for cmorized output

If you have access to cmorized output of CMIP experiments, consider using `intake-esm`. With the `preprocess` function you can align the `time` dimension of the output. Finally, `climpred_preprocess_post` only renames the dimensions to the names `climpred` expects (`init`, `lead` and `member`).

In [9]:
from climpred.create_climpred_data import climpred_preprocess_post, climpred_preprocess_internal

In [10]:
import intake

In [11]:
col_url = "/home/mpim/m300524/intake-esm-datastore/catalogs/mistral-cmip6.json"
col = intake.open_esm_datastore(col_url)

In [18]:
col.df.head()

Unnamed: 0,activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,dcpp_init_year,version,time_range,path
0,AerChemMIP,HAMMOZ-Consortium,MPI-ESM-1-2-HAM,ssp370-lowNTCF,r1i1p1f1,Lmon,npp,gn,,v20190627,203501-205412,/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...
1,AerChemMIP,HAMMOZ-Consortium,MPI-ESM-1-2-HAM,ssp370-lowNTCF,r1i1p1f1,Lmon,npp,gn,,v20190627,201501-203412,/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...
2,AerChemMIP,HAMMOZ-Consortium,MPI-ESM-1-2-HAM,ssp370-lowNTCF,r1i1p1f1,Lmon,npp,gn,,v20190627,205501-205512,/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...
3,AerChemMIP,HAMMOZ-Consortium,MPI-ESM-1-2-HAM,ssp370-lowNTCF,r1i1p1f1,Lmon,tsl,gn,,v20190627,205501-205512,/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...
4,AerChemMIP,HAMMOZ-Consortium,MPI-ESM-1-2-HAM,ssp370-lowNTCF,r1i1p1f1,Lmon,tsl,gn,,v20190627,201501-203412,/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...


In [19]:
col.df.columns

Index(['activity_id', 'institution_id', 'source_id', 'experiment_id',
       'member_id', 'table_id', 'variable_id', 'grid_label', 'dcpp_init_year',
       'version', 'time_range', 'path'],
      dtype='object')

In [20]:
# load 2 members for 2 inits from one model
query = dict(
    experiment_id=['dcppA-hindcast'], table_id='Amon',
    member_id=['r1i1p1f1', 'r2i1p1f1'], dcpp_init_year=[1970, 1971],
    variable_id='tas', source_id='MPI-ESM1-2-HR')
cat = col.search(**query)
cdf_kwargs = {'chunks': {'time': 12}, 'decode_times': False}

In [13]:
def preprocess(ds):
    # extract tiny spatial and temporal subset
    ds = ds.isel(lon=[50, 51, 52], lat=[50, 51, 52],
                 time=np.arange(12 * 2))
    # make time dim identical
    ds = climpred_preprocess_internal(ds)
    return ds

In [14]:
dset_dict = cat.to_dataset_dict(
    cdf_kwargs=cdf_kwargs, preprocess=preprocess)
# get first dict value
ds = dset_dict[list(dset_dict.keys())[0]].load()
ds.coords

Progress: |███████████████████████████████████████████████████████████████████████████████| 100.0% 

--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
             
--> There are 1 group(s)


Coordinates:
    height          float64 2.0
  * lon             (lon) float64 46.88 47.81 48.75
  * dcpp_init_year  (dcpp_init_year) float64 1.97e+03 1.971e+03
  * time            (time) int64 1 2 3 4 5 6 7 8 9 ... 17 18 19 20 21 22 23 24
  * lat             (lat) float64 -42.55 -41.61 -40.68
  * member_id       (member_id) <U8 'r1i1p1f1' 'r2i1p1f1'

In [15]:
ds = climpred_preprocess_post(ds)
ds.coords

Coordinates:
    height   float64 2.0
  * lon      (lon) float64 46.88 47.81 48.75
  * init     (init) float64 1.97e+03 1.971e+03
  * lead     (lead) int64 1 2 3 4 5 6 7 8 9 10 ... 15 16 17 18 19 20 21 22 23 24
  * lat      (lat) float64 -42.55 -41.61 -40.68
  * member   (member) <U8 'r1i1p1f1' 'r2i1p1f1'
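
The renaming visible in the two coordinate listings above can be reproduced by hand with `xarray`'s `rename`. The sketch below is not the source of `climpred_preprocess_post`; it mimics the observed change on a synthetic dataset. The final cast of `init` to integers is an optional extra step assumed here for tidiness, since the catalog delivers `dcpp_init_year` as floats.

```python
import numpy as np
import xarray as xr

# synthetic dataset with the dimension names produced by intake-esm in this demo
ds_raw = xr.Dataset(
    {"tas": (("dcpp_init_year", "member_id", "time"), np.zeros((2, 2, 24)))},
    coords={
        "dcpp_init_year": [1970.0, 1971.0],
        "member_id": ["r1i1p1f1", "r2i1p1f1"],
        "time": np.arange(1, 25),
    },
)

# rename to the dimension names seen after climpred_preprocess_post
renamed = ds_raw.rename(
    {"dcpp_init_year": "init", "member_id": "member", "time": "lead"}
)

# optional (assumption, not done in the demo output): store init years as integers
renamed = renamed.assign_coords(init=renamed["init"].values.astype(int))
print(renamed.coords)
```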

In [16]:
climpred.prediction.compute_perfect_model(ds, ds.rename({'lead':'time'}))

  r = r_num / r_den
