# How to use custom preprocessing functions with `intake_esm`

For this example, `intake_esm` was installed manually with `pip install git+https://github.com/NCAR/intake-esm.git` to get the latest version.

In [2]:
import intake
import intake_esm
intake_esm.__version__

'2019.10.15.post18'

A common annoyance with CMIP6 data is that not all institutions followed a uniform naming convention, which causes problems when you loop over several models to compare certain fields.

`cmip6_preprocessing` provides a renaming function which we can pass directly to `intake_esm` to get consistently named datasets.

First consider a basic model selection (potential temperature, `thetao`, from the historical run for three models on their native grids):

In [12]:
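# `col` is assumed to be an intake-esm collection opened beforehand
# (e.g. with `intake.open_esm_datastore(...)`); the catalog itself is not shown here.
query = dict(experiment_id='historical',  # table_id='Oyr',
             variable_id='thetao', grid_label='gn', member_id='r1i1p1f1',
             source_id=['MCM-UA-1-0', 'BCC-CSM2-MR', 'GFDL-ESM4'])
cat = col.search(**query)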

dsets_pp = cat.to_dataset_dict()
for k, ds in dsets_pp.items():
    print(k)
    print(list(ds.dims))

--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'

--> There will be 3 group(s)
CMIP.BCC.BCC-CSM2-MR.historical.Omon.gn
['bnds', 'lat', 'lev', 'lon', 'member_id', 'time']
CMIP.NOAA-GFDL.GFDL-ESM4.historical.Omon.gn
['bnds', 'lev', 'member_id', 'time', 'vertex', 'x', 'y']
CMIP.UA.MCM-UA-1-0.historical.Omon.gn
['bnds', 'latitude', 'lev', 'longitude', 'member_id', 'time']


**Note the inhomogeneous dimension names**: the horizontal dimensions are called `lat`/`lon`, `x`/`y`, or `latitude`/`longitude` depending on the model. To compare models we ideally want every dataset to have the same dimension names. We can achieve this by passing `rename_cmip6` as the `preprocess` argument.
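To make the idea concrete, here is a minimal sketch of what a renaming-style `preprocess` function could look like. It is not the implementation of `rename_cmip6`, which handles many more cases. The alias table `DIM_ALIASES` and the helpers `build_rename_map` and `simple_rename` are hypothetical names, and the mapping (`lon`/`longitude` to `x`, `lat`/`latitude` to `y`) is an assumption inferred from the dimension names printed above.

```python
# Hypothetical alias table inferred from the dimension names printed above.
# Assumption: longitude-like names map to 'x', latitude-like names to 'y'.
DIM_ALIASES = {
    "lon": "x",
    "longitude": "x",
    "lat": "y",
    "latitude": "y",
}


def build_rename_map(dims):
    """Return {old_name: new_name} for every dimension with a known alias."""
    return {d: DIM_ALIASES[d] for d in dims if d in DIM_ALIASES}


def simple_rename(ds):
    """Preprocess-style function: takes a dataset, returns a renamed dataset.

    Duck-typed: works with any object exposing `.dims` and `.rename(mapping)`,
    such as an xarray.Dataset.
    """
    mapping = build_rename_map(ds.dims)
    return ds.rename(mapping) if mapping else ds


# Dimension names as printed for the three models above
model_dims = {
    "BCC-CSM2-MR": ["bnds", "lat", "lev", "lon", "member_id", "time"],
    "GFDL-ESM4": ["bnds", "lev", "member_id", "time", "vertex", "x", "y"],
    "MCM-UA-1-0": ["bnds", "latitude", "lev", "longitude", "member_id", "time"],
}

for name, dims in model_dims.items():
    mapping = build_rename_map(dims)
    print(name, sorted(mapping.get(d, d) for d in dims))
```

A function of this shape could be passed as `preprocess=simple_rename` in the same way `rename_cmip6` is passed in the next cell, since the `preprocess` argument expects a callable that takes a dataset and returns a dataset.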

In [13]:
# Import custom renaming function
from cmip6_preprocessing.preprocessing import rename_cmip6

# pass to intake_esm
dsets_pp_renamed = cat.to_dataset_dict(preprocess=rename_cmip6)
for k, ds in dsets_pp_renamed.items():
    print(k)
    print(list(ds.dims))

--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'

--> There will be 3 group(s)
CMIP.BCC.BCC-CSM2-MR.historical.Omon.gn
['bnds', 'lev', 'member_id', 'time', 'x', 'y']
CMIP.NOAA-GFDL.GFDL-ESM4.historical.Omon.gn
['bnds', 'lev', 'member_id', 'time', 'vertex', 'x', 'y']
CMIP.UA.MCM-UA-1-0.historical.Omon.gn
['bnds', 'lev', 'member_id', 'time', 'x', 'y']


Now they all share consistent horizontal dimension names (`x` and `y`)!

![](https://media.giphy.com/media/142UITjG5GjIRi/giphy.gif)
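Rather than comparing the printed lists by eye, the check can be automated. The sketch below uses a hypothetical helper, `compare_dims`, and hard-codes the dimension names from the renamed output above as a stand-in for `{k: list(ds.dims) for k, ds in dsets_pp_renamed.items()}`. It reports which dimensions are shared by all datasets and which are left over per dataset.

```python
# Dimension names copied from the renamed output above; with real data this
# would be {k: list(ds.dims) for k, ds in dsets_pp_renamed.items()}.
renamed_dims = {
    "CMIP.BCC.BCC-CSM2-MR.historical.Omon.gn":
        ["bnds", "lev", "member_id", "time", "x", "y"],
    "CMIP.NOAA-GFDL.GFDL-ESM4.historical.Omon.gn":
        ["bnds", "lev", "member_id", "time", "vertex", "x", "y"],
    "CMIP.UA.MCM-UA-1-0.historical.Omon.gn":
        ["bnds", "lev", "member_id", "time", "x", "y"],
}


def compare_dims(dims_by_key):
    """Return (shared, extras).

    shared: sorted list of dimension names present in every dataset.
    extras: {key: sorted leftover names} for datasets with additional dimensions.
    """
    sets = {k: set(v) for k, v in dims_by_key.items()}
    shared = set.intersection(*sets.values())
    extras = {k: sorted(s - shared) for k, s in sets.items() if s - shared}
    return sorted(shared), extras


shared, extras = compare_dims(renamed_dims)
print("shared:", shared)
print("extras:", extras)
```

For the output shown above, the only leftover is the `vertex` dimension of the GFDL-ESM4 dataset; all other dimension names are shared by the three models.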