# CMIP6 object store and `intake-esm`

Using a datastore and the `intake-esm` package can greatly simplify searching for available CMIP6 data (e.g. from different models, experiments, variables). 

## JASMIN CMIP6 object store

First we try using the [CMIP6 object store for JASMIN](https://github.com/cedadev/cmip6-object-store). I don't think this object store is very complete at the moment, but it does have the advantage of pointing directly to the CMIP6 data on JASMIN, which is close to the computation.

In [None]:
import xarray as xr
import intake
import intake_esm
import fsspec
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
%config InlineBackend.figure_format = 'retina'
import warnings

warnings.filterwarnings("ignore")

In [None]:
col_url = "https://raw.githubusercontent.com/cedadev/" \
          "cmip6-object-store/master/catalogs/ceda-zarr-cmip6.json"
col = intake.open_esm_datastore(col_url)
f'There are {len(col.df)} datasets'

In [None]:
cat = col.search(
    source_id="UKESM1-0-LL",
    experiment_id=["historical", "ssp585-bgc"],
    member_id=["r4i1p1f2", "r12i1p1f2"],
    table_id="Amon",
    variable_id="tas",
)

# Split the results into subsets for the historical and future experiments
hist_cat = cat.search(experiment_id="historical")
ssp_cat = cat.search(experiment_id="ssp585-bgc")
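
The search above combines its arguments in a specific way: a row is kept only if every queried column matches, and a list-valued argument matches any of its values. The following is a minimal sketch of that logic on a toy in-memory table. The `search` helper and the catalog rows are hypothetical and purely illustrative; this is not how `intake-esm` is implemented.

```python
def search(rows, **query):
    """Keep rows where every queried column equals, or is one of, the requested values."""
    def matches(row):
        for column, wanted in query.items():
            # A scalar is treated as a one-element list of allowed values
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(column) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


# Hypothetical catalog rows, for illustration only
rows = [
    {"experiment_id": "historical", "member_id": "r4i1p1f2", "variable_id": "tas"},
    {"experiment_id": "ssp585-bgc", "member_id": "r4i1p1f2", "variable_id": "tas"},
    {"experiment_id": "historical", "member_id": "r1i1p1f2", "variable_id": "tas"},
    {"experiment_id": "historical", "member_id": "r4i1p1f2", "variable_id": "pr"},
]

subset = search(
    rows,
    experiment_id=["historical", "ssp585-bgc"],
    member_id=["r4i1p1f2", "r12i1p1f2"],
    variable_id="tas",
)
hist_rows = search(subset, experiment_id="historical")
print(len(subset), len(hist_rows))
```

A requested value that is absent from the table (here `r12i1p1f2`) simply contributes no rows, so it is worth checking the size of a search result before relying on it.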

In [None]:
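def cat_to_ds(cat):
    """Open the first dataset listed in an intake-esm catalog subset."""
    # .iloc selects by position, so this works whatever the DataFrame index labels are
    zarr_path = cat.df["zarr_path"].iloc[0]  # read the first ensemble member
    fsmap = fsspec.get_mapper(zarr_path)
    return xr.open_zarr(fsmap, consolidated=True, use_cftime=True)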

In [None]:
hist_tas = cat_to_ds(hist_cat)['tas']
ssp_tas = cat_to_ds(ssp_cat)['tas']
print(hist_tas)

In [None]:
# Difference between the time means of the future and historical runs
diff = ssp_tas.mean(dim="time") - hist_tas.mean(dim="time")

# Plot a map of the difference in time means
plt.figure(figsize=(12,6))
ax = plt.axes(projection=ccrs.Robinson())
ax.coastlines()
ax.gridlines()
diff.plot(ax=ax, transform=ccrs.PlateCarree())
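
When taking a time mean with xarray, reducing over the named dimension does not depend on the order in which the dimensions happen to be stored, whereas a positional axis does. Here is a minimal sketch using small synthetic fields; the `make_field` helper, the grid and the values are all made up for illustration.

```python
import numpy as np
import xarray as xr


def make_field(offset, n_time=24):
    """Build a small synthetic (time, lat, lon) field with a constant value."""
    lat = np.linspace(-60.0, 60.0, 4)
    lon = np.linspace(0.0, 270.0, 4)
    data = np.full((n_time, lat.size, lon.size), 280.0 + offset)
    return xr.DataArray(
        data,
        dims=("time", "lat", "lon"),
        coords={"time": np.arange(n_time), "lat": lat, "lon": lon},
        name="tas",
    )


hist = make_field(0.0)
future = make_field(3.5)

# With time as the leading dimension, both forms give the same answer
diff_by_name = future.mean(dim="time") - hist.mean(dim="time")
diff_by_axis = future.mean(axis=0) - hist.mean(axis=0)

# After reordering the dimensions, only the named reduction still averages over time
reordered = future.transpose("lat", "lon", "time")
print(reordered.mean(dim="time").dims)
print(reordered.mean(axis=0).dims)
```

The last two lines show which dimensions survive each reduction on the reordered array.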

## Pangeo CMIP6 store on Google Cloud

Next we'll try reading from the Pangeo store of CMIP6 data, which is hosted on Google Cloud. This store is physically further from the computation, so reads are likely to be slower.

In [None]:
col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")
col

In [None]:
cat = col.search(
    experiment_id=["historical", "ssp585"],
    table_id="Oyr",
    variable_id="o2",
    grid_label="gn",
)

cat

In [None]:
dset_dict = cat.to_dataset_dict(
    zarr_kwargs={"consolidated": True, "decode_times": True, "use_cftime": True}
)

In [None]:
ds = dset_dict["CMIP.CCCma.CanESM5.historical.Oyr.gn"]
print(ds)
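
The dictionary key used above is a dot-separated string. A minimal sketch of splitting such a key into labelled parts follows; the `parse_key` helper is hypothetical, and the field names in `KEY_FIELDS` are an assumption inferred from the shape of the key, so they should be checked against the catalog itself. Running `list(dset_dict)` shows which keys are actually available.

```python
# Assumed field order, inferred from the key "CMIP.CCCma.CanESM5.historical.Oyr.gn"
KEY_FIELDS = (
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "table_id",
    "grid_label",
)


def parse_key(key, fields=KEY_FIELDS, sep="."):
    """Split a dataset key into a dict of named parts (hypothetical helper)."""
    parts = key.split(sep)
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} parts, got {len(parts)}: {key!r}")
    return dict(zip(fields, parts))


info = parse_key("CMIP.CCCma.CanESM5.historical.Oyr.gn")
print(info["source_id"], info["experiment_id"])
```

This simple split would break if any individual part contained the separator, which is why the helper checks the number of parts.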

In [None]:
ds.o2.isel(time=0, lev=0, member_id=range(1, 24, 4)).plot(col="member_id", col_wrap=3, robust=True)
