# Building a pCO2 testbed from CMIP data?

## Variables needed (and corresponding CMIP names)
See [this spreadsheet](https://docs.google.com/spreadsheets/d/1UUtoz6Ofyjlpx5LdqhKcwHFz2SGoTQV2_yekHyMfL9Y/edit#gid=1221485271) for the variable names.

- pCO2 = `'spco2'`
- SST = `'tos'`
- SSS = `'sos'`
- Chl = `'chl'`
- MLD = `'mlotst'` (Defined by sigma T criterion)

In [1]:
from xmip.utils import google_cmip_col
from xmip.preprocessing import combined_preprocessing
col = google_cmip_col()

In [8]:
# filter the full catalog for data we could use
cat = col.search(
    variable_id=['tos', 'sos', 'chl', 'mlotst', 'spco2'],
    table_id='Omon', # monthly ocean output only
    experiment_id=['historical', 'ssp245'],
    # ssp245 is only an example; we should probably use the scenario that is closest to the data from 2014-2023
    require_all_on=['source_id', 'member_id', 'grid_label'] # this ensures that results will have all variables and experiments available
)
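
To make the effect of `require_all_on` concrete, here is a minimal pure-Python sketch of the filtering idea on toy records. It is an illustration only, not intake-esm's actual implementation: the function `filter_complete_groups`, the model names, and the records are all hypothetical. The assumption is that a group (for example one model/member pair) is kept only if it contains every requested combination of variable and experiment.

```python
from collections import defaultdict
from itertools import product


def filter_complete_groups(records, group_keys, required):
    """Keep only records whose group contains every required combination.

    records    : iterable of dicts (e.g. rows of a catalog table)
    group_keys : keys that define a group (e.g. source_id, member_id)
    required   : dict mapping a column name to the values that must all
                 be present within each group
    """
    records = list(records)
    cols = list(required)
    # every (variable, experiment, ...) combination a group must provide
    needed = set(product(*(required[c] for c in cols)))

    # collect the combinations that each group actually provides
    seen = defaultdict(set)
    for r in records:
        group = tuple(r[k] for k in group_keys)
        seen[group].add(tuple(r[c] for c in cols))

    complete = {g for g, combos in seen.items() if needed <= combos}
    return [r for r in records if tuple(r[k] for k in group_keys) in complete]


# Toy records (made up for illustration, not real catalog content)
toy = [
    {"source_id": "ModelA", "member_id": "r1", "variable_id": v, "experiment_id": e}
    for v in ("tos", "spco2")
    for e in ("historical", "ssp245")
] + [
    # ModelB lacks spco2 for ssp245, so the whole group is dropped
    {"source_id": "ModelB", "member_id": "r1", "variable_id": "tos", "experiment_id": "historical"},
    {"source_id": "ModelB", "member_id": "r1", "variable_id": "tos", "experiment_id": "ssp245"},
    {"source_id": "ModelB", "member_id": "r1", "variable_id": "spco2", "experiment_id": "historical"},
]

kept = filter_complete_groups(
    toy,
    group_keys=["source_id", "member_id"],
    required={
        "variable_id": ["tos", "spco2"],
        "experiment_id": ["historical", "ssp245"],
    },
)
print(sorted({r["source_id"] for r in kept}))  # ['ModelA']
```

A single missing variable/experiment pair removes the whole member, which is why adding more variables to the search can shrink the member count quickly.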

In [9]:
# show how many members we have available for each model and grid label
cat.df.groupby(['source_id', 'grid_label'])[['member_id']].nunique()

| source_id | grid_label | member_id |
|---|---|---|
| CESM2 | gn | 3 |
| CESM2-WACCM | gn | 1 |
| CESM2-WACCM | gr | 3 |
| CanESM5 | gn | 18 |
| CanESM5-CanOE | gn | 3 |
| GFDL-ESM4 | gr | 1 |
| UKESM1-0-LL | gn | 5 |


This is not many members. I suspect the cause is that some of the variables (probably `spco2`) are missing for many of the members, and that some members are missing for the scenario experiments.
Based on this result, I think `CanESM5` is the logical starting point for this project, with data for 18 members available.
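
One way to check the suspicion that a single variable limits the member count is to count distinct members per model, experiment, and variable in an unfiltered search (one without `require_all_on`). The sketch below does this in plain Python on toy records. The helper `members_per_variable` and the records are hypothetical; with the real catalog, rows of this shape could be obtained with something like `cat.df.to_dict('records')`.

```python
from collections import defaultdict


def members_per_variable(records):
    """Count distinct members per (source_id, experiment_id, variable_id)."""
    members = defaultdict(set)
    for r in records:
        key = (r["source_id"], r["experiment_id"], r["variable_id"])
        members[key].add(r["member_id"])
    return {key: len(ids) for key, ids in members.items()}


# Toy records (made up for illustration, not real catalog content):
# three members provide tos, but only one of them provides spco2
toy = [
    {"source_id": "ModelA", "experiment_id": "historical",
     "variable_id": "tos", "member_id": m}
    for m in ("r1", "r2", "r3")
] + [
    {"source_id": "ModelA", "experiment_id": "historical",
     "variable_id": "spco2", "member_id": "r1"},
]

counts = members_per_variable(toy)
for key in sorted(counts):
    print(key, counts[key])
```

The variable with the smallest count for a given model and experiment is the one that caps the number of usable members.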

I will work on adding more of the data, but first let's take a look at more common variables and how many members are available.

In [12]:
cat = col.search(
    variable_id=['spco2'],
    table_id='Omon',
    experiment_id=['historical'],
    require_all_on=['source_id', 'member_id', 'grid_label']
)
cat.df.groupby(['source_id', 'grid_label'])[['member_id']].nunique()

| source_id | grid_label | member_id |
|---|---|---|
| ACCESS-ESM1-5 | gn | 9 |
| CESM2 | gn | 11 |
| CESM2 | gr | 11 |
| CESM2-FV2 | gn | 3 |
| CESM2-FV2 | gr | 3 |
| CESM2-WACCM | gn | 3 |
| CESM2-WACCM | gr | 3 |
| CESM2-WACCM-FV2 | gn | 3 |
| CESM2-WACCM-FV2 | gr | 3 |
| CNRM-ESM2-1 | gn | 11 |


This looks promising. There are 10 models that have 9+ members. I am confident that we can find and upload the data from ESGF for this.