# CMIP6 Diagnostics with Intake-ESGF
In this example, we use intake-esgf to search for CMIP6 data, load the results into xarray datasets, and visualize them with matplotlib.

In [None]:
# The `datatree` module is distributed on PyPI as `xarray-datatree`
!pip install xarray-datatree

## Imports

In [1]:
import intake_esgf
import matplotlib as mpl
from datatree import DataTree

# Set the matplotlib default font size
mpl.rcParams['font.size'] = 16

ModuleNotFoundError: No module named 'datatree'

## Set Up the Catalog

In [2]:
cat = intake_esgf.catalog.ESGFCatalog()
cat

Perform a search() to populate the catalog.

### Search for CMIP6 Experiments

In [3]:
subset = cat.search(
    activity_id="CMIP",
    experiment_id="historical",
    source_id="CESM2",
    variable_id=["gpp", "areacella", "sftlf"],
    member_id=["r1i1p1f1"],
)
subset


Summary information for 3 results:
mip_era                               [CMIP6]
experiment_id                    [historical]
source_id                             [CESM2]
institution_id                         [NCAR]
table_id                           [fx, Lmon]
member_id                          [r1i1p1f1]
grid_label                               [gn]
variable_id           [areacella, sftlf, gpp]
activity_drs                           [CMIP]
project                               [CMIP6]
datetime_stop     [nan, 2014-12-15T12:00:00Z]
datetime_start    [nan, 1850-01-15T11:44:59Z]
dtype: object

## Load Our Datasets in Xarray
We use xarray to access the datasets. The `.to_dataset_dict()` method loads the search results into a dictionary of datasets keyed by the facets that distinguish them, caching the data in the `~/.esgf` directory on your local filesystem by default.

In [4]:
dsets = subset.to_dataset_dict()


KeyError: 'areacella'

### Extract the GPP Dataset
We are interested in gross primary production (`gpp`), whose CMIP6 long name is "Carbon Mass Flux out of Atmosphere Due to Gross Primary Production on Land". Its key in the dictionary of datasets, `Lmon.gpp`, combines the table ID and the variable ID:

In [5]:
gpp_ds = dsets['Lmon.gpp']

NameError: name 'dsets' is not defined

## Apply a Computation
Let's calculate a monthly climatology, that is, the mean for each calendar month across a span of the historical record. We first select the years we are interested in (e.g., 1980-2010), then group by calendar month and average.

In [None]:
gpp_monthly_ds = gpp_ds.sel(time=slice("1980", "2010")).groupby("time.month").mean()
gpp_monthly_ds
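
To see what the select-then-group step does without downloading anything, here is a minimal sketch on synthetic data. The `toy` dataset and its values are made up for illustration (each value is set to its own calendar month), so it only stands in for the real `gpp_ds`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly time axis (assumption: a stand-in for the real gpp data)
time = pd.date_range("1979-01-01", "2011-12-01", freq="MS")

# Each value equals its calendar month, so the climatology is easy to check
values = time.month.to_numpy().astype(float)
toy = xr.Dataset({"gpp": ("time", values)}, coords={"time": time})

# Same pattern as above: restrict the years, then average per calendar month
clim = toy.sel(time=slice("1980", "2010")).groupby("time.month").mean()

print(clim.sizes)       # the "time" dimension is replaced by "month"
print(clim.gpp.values)  # one mean value per calendar month
```

The result has a 12-element `month` dimension with integer labels 1 through 12 in place of `time`.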

### Rename our Months
After the groupby, the `month` coordinate holds the integers 1 through 12. We replace them with three-letter month abbreviations so the plot panels get readable titles.

In [None]:
gpp_monthly_ds["month"] = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
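
As an alternative to typing the labels by hand, the standard library's `calendar` module can generate them. This is only a sketch: `month_labels` is a name introduced here, and the abbreviations follow the current locale (English unless the locale has been changed).

```python
import calendar

# calendar.month_abbr[0] is an empty string, so skip the first entry
month_labels = list(calendar.month_abbr)[1:]
print(month_labels)
```

The resulting list can be assigned in the same way, with `gpp_monthly_ds["month"] = month_labels`.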

## Visualize the Output
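We facet the plot by month with `col='month'` and place four panels in each row with `col_wrap=4`. Setting `robust=True` keeps outliers from dominating the color scale.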

In [None]:
gpp_monthly_ds.gpp.plot(col='month',
                        col_wrap=4,
                        robust=True);
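
The xarray documentation describes `robust=True` as computing the color range from the 2nd and 98th percentiles instead of the extreme values. The sketch below illustrates that idea with NumPy on made-up data; `field` is a synthetic array, not the GPP output, and the code is not xarray's internal implementation.

```python
import numpy as np

# Synthetic field with a single extreme outlier (illustrative data only)
rng = np.random.default_rng(0)
field = rng.normal(size=1000)
field[0] = 1e6

# Percentile-based limits ignore the outlier; min/max would not
vmin, vmax = np.nanpercentile(field, [2, 98])

print("min/max limits:   ", field.min(), field.max())
print("percentile limits:", vmin, vmax)
```

Limits computed this way could also be passed explicitly as `vmin` and `vmax` to the plot call if you want the same color range across several figures.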