# Step 1: Pre-processing model and reanalysis data

---

## Instructions for activating the Jupyter kernel for the `cmip6hack-multigen` conda environment

In a JupyterLab terminal, navigate to the `/cmip6hack-multigen/` folder and run the command:
```bash
source spinup_env.sh
```
which will create the `cmip6hack-multigen` conda environment and install it as a Python kernel for Jupyter.

Then, switch the kernel (drop-down menu in the top right-hand corner) to `cmip6hack-multigen` and restart the notebook.

### Pre-process climate model output in Google Cloud Storage (GCS)

This notebook uses [`intake-esm`](https://intake-esm.readthedocs.io/en/latest/) to ingest and organize climate model output from several model generations, then saves their time-mean fields locally.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import sys
import numpy as np
import pandas as pd
import xarray as xr
import xskillscore as xs
import xesmf as xe
from tqdm.autonotebook import tqdm  # Fancy progress bars for our loops!
import intake
# util.py is in the local directory
# it contains code that is common across project notebooks
# or routines that are too extensive and might otherwise clutter
# the notebook design
import util
import preprocess as pp
import qc

In [3]:
varnames = ['tas', 'pr', 'psl']
timeslice = slice('1981', '2010')
coarsen_size = 2

In [4]:
ens_dict = pp.load_ensembles(varnames, timeslice=timeslice)


Loaded: variable_id ` tas ` from activity_id ` far `
Loaded: variable_id ` pr ` from activity_id ` far `
Loaded: variable_id ` psl ` from activity_id ` far `
Loaded: variable_id ` tas ` from activity_id ` sar `
Loaded: variable_id ` pr ` from activity_id ` sar `
Loaded: variable_id ` psl ` from activity_id ` sar `
Loaded: variable_id ` tas ` from activity_id ` tar `
Skip TAR.MPIfM.MPIfM.historical.r1i1p1f1.Amon.tas.gn before datetime conflict.
Loaded: variable_id ` pr ` from activity_id ` tar `
Skip TAR.MPIfM.MPIfM.historical.r1i1p1f1.Amon.pr.gn before datetime conflict.
Loaded: variable_id ` psl ` from activity_id ` tar `
Skip TAR.MPIfM.MPIfM.historical.r1i1p1f1.Amon.psl.gn before datetime conflict.
Loaded: variable_id ` tas ` from activity_id ` cmip3 `
Skip CMIP3.CSIRO-QCCCE.csiro_mk3_5.historical.r1i1p1f1.Amon.tas.gn before datetime conflict.
Loaded: variable_id ` pr ` from activity_id ` cmip3 `
Skip CMIP3.CSIRO-QCCCE.csiro_mk3_5.historical.r1i1p1f1.Amon.pr.gn before datetime confli


#### Extracting time-mean

In [6]:
ens_dict = util.dict_func(ens_dict, xr.Dataset.mean, on_self=True, dim=['time'], keep_attrs=True)
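
The source of `util.dict_func` is not shown in this notebook, so the following is only a minimal sketch of the pattern it appears to implement: applying one function to every value of a dictionary and collecting the results under the same keys. The name `apply_to_values` and the toy data are hypothetical, and plain lists with `statistics.fmean` stand in for xarray Datasets and `xr.Dataset.mean`.

```python
from statistics import fmean
from typing import Any, Callable, Dict


def apply_to_values(d: Dict[str, Any], func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Return a new dict holding func(value, **kwargs) for every value in d.

    Hypothetical stand-in for util.dict_func; the real helper may differ.
    """
    return {key: func(value, **kwargs) for key, value in d.items()}


# Toy stand-in for ens_dict: plain lists instead of xarray Datasets.
toy_ens_dict = {"far": [1.0, 2.0, 3.0], "sar": [10.0, 20.0]}

# Each value is passed to the function as its first argument.
toy_means = apply_to_values(toy_ens_dict, fmean)
print(toy_means)
```

When the function is an unbound method such as `xr.Dataset.mean`, the Dataset itself becomes the first argument, so each call is equivalent to `ds.mean(dim=['time'], keep_attrs=True)`.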


In [7]:
ens_dict = util.dict_func(ens_dict, xr.Dataset.compute, on_self=True)


#### Adding ensemble-mean to the ensemble

Computing the ensemble mean is now more complicated and should probably be deferred to the post-processing step. The call below is commented out until further notice.


In [8]:
# ens_dict = util.add_ens_mean(ens_dict)

### Pre-process observational data products

In [9]:
era5 = pp.load_era("../data/raw/reanalysis/ERA5_mon_2d.nc", timeslice=timeslice, coarsen_size=coarsen_size)

### Save interim files

In [10]:
interim_path = "../data/interim/"
era5.to_zarr(interim_path + "era5_timemean", mode="w")

<xarray.backends.zarr.ZarrStore at 0x7f6fe9bbd950>

In [11]:
for key, ens in ens_dict.items():
    for data_var in ens.data_vars:
        # Remove empty attribute that messes up to_zarr method
        if 'intake_esm_varname' in ens[data_var].attrs:
            del ens[data_var].attrs['intake_esm_varname']
    ens.to_zarr(interim_path + f"{key}_timemean", mode="w")
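
The loop above deletes the problematic attribute in place. As a hedged alternative, the same clean-up can be written as a small non-mutating helper that returns a filtered copy of an attribute dictionary. The helper name `drop_attrs` and the toy attribute values are hypothetical; in particular, the `None` value is an assumption based on the comment calling the attribute "empty".

```python
from typing import Any, Dict, Iterable


def drop_attrs(attrs: Dict[str, Any], unwanted: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of attrs without the keys listed in unwanted."""
    unwanted = set(unwanted)
    return {key: value for key, value in attrs.items() if key not in unwanted}


# Toy attribute dict; real attrs would come from ens[data_var].attrs.
toy_attrs = {"units": "K", "intake_esm_varname": None}
clean_attrs = drop_attrs(toy_attrs, ["intake_esm_varname"])
print(clean_attrs)
```

Because the helper returns a new dictionary, the original attributes stay untouched, which can make it easier to check what was removed before the files are written.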