# Labrador Sea oxygen CMIP6 model score

## Background
This notebook provides code to accompany the manuscript *"Decadal variability of oxygen uptake, export, and storage in the Labrador Sea from observations and CMIP6 models"* by J. Koelling, D. Atamanchuk, J. Karstensen, and DWR Wallace, accepted for publication in Frontiers in Marine Science on October 9, 2023. For questions please contact Jannes Koelling (jannes@uw.edu). The paper will be available at https://www.frontiersin.org/articles/10.3389/fmars.2023.1202299/

The code contained herein provides an example of the calculation of a "model score" as defined in the paper. This score is designed to assess the degree to which each model reproduces the observed mean and variability in oxygen content, as well as the mean air-sea gas exchange of oxygen. The example given here is for two of the nine CMIP6 models used in the paper, but the code can easily be adapted to output a score for any other model.

This example directly accesses CMIP6 data that is hosted online using OpenDAP. This notebook can therefore be run without downloading additional data after forking the repository, unlike [cmip6_score_local](./cmip6_score_local.ipynb) which uses a local file structure.

## Models used

In the current version, this file shows the process for the **MIROC E2SL** and **NOAA GFDL** models, which include gridded data with dimensions `time, lev, lat, lon` for data variable `o2` and `time, lat, lon` for data variable `fgo2`.

Other models used in the paper are **NCAR CESM2**, **CMCC ESM2**, **NCC NorESM**, **CCC CanESM2**, **MRI ESM2**, **CNRM ESM2**, and **IPSL CM6A**, which are slow to run either because of their model grid or large file size. This notebook is meant to be a reduced example that allows users to reconstruct the calculation from the paper with relatively little effort. For a calculation including more models, see [cmip6_score_local](./cmip6_score_local.ipynb).

## Reading data

### Import Python packages, and initialize data frames with observational data

In [1]:
import numpy as np
import pandas as pd
import xarray as xr
import warnings
warnings.filterwarnings('ignore')

Create dictionaries with OpenDAP file paths for each model. OpenDAP URLs were found using the example notebooks by Ryan Abernathey provided at https://medium.com/pangeo/cmip6-in-the-cloud-five-ways-96b177abe396

In [2]:
files_o2 = {}
files_fgo2 = {}

# MIROC E2SL files
infl_miroc = 'https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/OMIP/MIROC/MIROC-ES2L/omip1/r1i1p1f2/Omon/'
files_o2['miroc_e2sl'] = [infl_miroc + 'o2/gr1/v20200911/o2_Omon_MIROC-ES2L_omip1_r1i1p1f2_gr1_190001-200912.nc']
files_fgo2['miroc_e2sl'] = [infl_miroc + 'fgo2/gr1/v20200911/fgo2_Omon_MIROC-ES2L_omip1_r1i1p1f2_gr1_190001-200912.nc']

# NOAA GFDL files
infl_noaa = 'http://esgdata.gfdl.noaa.gov/thredds/dodsC/gfdl_dataroot4/OMIP/NOAA-GFDL/GFDL-CM4/omip1/r1i1p1f1/Omon/'
noaa_o2 = ['o2/gr/v20180701/o2_Omon_GFDL-CM4_omip1_r1i1p1f1_gr_194801-196712.nc',
    'o2/gr/v20180701/o2_Omon_GFDL-CM4_omip1_r1i1p1f1_gr_196801-198712.nc',
    'o2/gr/v20180701/o2_Omon_GFDL-CM4_omip1_r1i1p1f1_gr_198801-200712.nc']
noaa_fgo2 = ['fgo2/gr/v20180701/fgo2_Omon_GFDL-CM4_omip1_r1i1p1f1_gr_194801-196712.nc',
    'fgo2/gr/v20180701/fgo2_Omon_GFDL-CM4_omip1_r1i1p1f1_gr_196801-198712.nc',
    'fgo2/gr/v20180701/fgo2_Omon_GFDL-CM4_omip1_r1i1p1f1_gr_198801-200712.nc']

files_o2['noaa_gfdl'] = [infl_noaa + s for s in noaa_o2]
files_fgo2['noaa_gfdl'] = [infl_noaa + s for s in noaa_fgo2]

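The data frames are `df_o2inv_lsw` for the time series of oxygen inventory in the LSW layer (0-2200 m), `df_o2inv_bot` for the oxygen inventory in the lower layer (2200 m to bottom), `df_o2_prof` for the mean oxygen profile, and `df_gex` for the mean gas exchange. Each starts with an `Obs` column of observational values; one column per model is added in the loop further down.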

In [3]:
df_o2inv_lsw = pd.read_csv('data/O2_ref_ts.csv', index_col=0, parse_dates=True)
df_o2inv_bot = pd.read_csv('data/O2_ref_ts_lwr.csv', index_col=0, parse_dates=True)
df_o2_prof = pd.read_csv('data/O2_mean_prof.csv', index_col=0).rename(columns={"Oxygen [muM]": "Obs"})
df_gex = pd.DataFrame({'Obs': [22.66]})

In [4]:
# Central latitude; longitude can differ depending on whether model uses (0, 360) or (-180, 180) for lon
lat0 = 56.823
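The longitude convention mentioned in the comment above can be illustrated with a small helper. This is only a sketch: `target_lon` is a hypothetical name that does not appear in the notebook, and the threshold of 200 simply mirrors the `lon.max() < 200` test used in the model loop.

```python
import numpy as np

def target_lon(model_lon, lon_deg=-52.22):
    """Return lon_deg expressed in the convention used by model_lon.

    Hypothetical helper: grids whose longitudes stay below 200 are assumed
    to use (-180, 180); all other grids are assumed to use (0, 360).
    """
    model_lon = np.asarray(model_lon)
    if model_lon.max() < 200:
        return lon_deg
    return lon_deg + 360

lon_pm180 = np.arange(-179.5, 180, 1.0)  # synthetic (-180, 180) grid
lon_0_360 = np.arange(0.5, 360, 1.0)     # synthetic (0, 360) grid
print(target_lon(lon_pm180), target_lon(lon_0_360))
```

With `method='nearest'` in `.sel`, passing the wrong convention would not raise an error; it would silently select the grid point closest to the edge of the longitude axis, which is why the check matters.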

### Read data for each model, and write into data frames

In [5]:
infl = ['miroc_e2sl', 'noaa_gfdl']
for nn in range(0, 2):
    ds_o2 = xr.open_mfdataset(files_o2[infl[nn]], data_vars=['o2'])
    ds_fgo2 = xr.open_mfdataset(files_fgo2[infl[nn]], data_vars=['fgo2'])
    
    # Convert from CF time to datetime to make compatible with observational data
    ds_o2['time'] = ds_o2.indexes['time'].to_datetimeindex()
    ds_fgo2['time'] = ds_fgo2.indexes['time'].to_datetimeindex()
    
    # Set target longitude either in range (-180, 180) or (0, 360)
    lon = ds_o2.lon.data
    if lon.max() < 200:
        lon0 = -52.22
    else:
        lon0 = -52.22 + 360
        
    # Extract data in central Lab Sea
    ds_o2_cls = ds_o2.sel(lat=lat0, lon=lon0, method='nearest')
    ds_fgo2_cls = ds_fgo2.sel(lat=lat0, lon=lon0, method='nearest')
    ds_o2_cls.o2.load()
    ds_fgo2_cls.fgo2.load()
    
    # Calculate inventory for the two layers
    dz = np.diff(ds_o2_cls.lev_bnds, axis=1)
    o2_inv0 = ds_o2_cls.o2.data*dz.T
    o2_inv = o2_inv0[:, ds_o2_cls.lev <= 2200].sum(axis=1)
    o2_inv_bot = np.nansum(o2_inv0[:, ds_o2_cls.lev > 2200], axis=1)
    
    # Calculate mean O2 profile and mean gas exchange. Oxygen values converted from mol/m3 to umol/L
    o2_prof = ds_o2_cls.o2.sel(time=slice("1950-01", "2009-12")).mean(axis=0)
    df_o2_prof[infl[nn]] = np.interp(df_o2_prof.index, ds_o2_cls.lev, o2_prof*1e3)
    df_gex[infl[nn]] = ds_fgo2_cls.fgo2.sel(time=slice("1950-01", "2009-12")).values.mean()*365*86400
    
    # Calculate annual means, then use bfill to match time grid of observations because observational data
    # are listed as July, while python reports the date for the annual mean for Jan 1 - Dec 31 as Dec 31
    # Finally merge into dataframe
    df_mod_lsw = pd.DataFrame({infl[nn]: o2_inv}, index = ds_o2_cls.time).resample("Y").mean()
    df_mod_lsw = df_mod_lsw.reindex(df_o2inv_lsw.index, method='bfill')
    df_o2inv_lsw = df_o2inv_lsw.merge(df_mod_lsw, left_index=True, right_index=True)
    
    df_mod_bot = pd.DataFrame({infl[nn]: o2_inv_bot}, index = ds_o2_cls.time).resample("Y").mean()
    df_mod_bot = df_mod_bot.reindex(df_o2inv_bot.index, method='bfill')
    df_o2inv_bot = df_o2inv_bot.merge(df_mod_bot, left_index=True, right_index=True)
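The two less obvious steps in the loop above, the layer inventory and the alignment of annual means with the July-dated observations, can be tried on synthetic data without any remote access. The sketch below uses made-up values and level bounds. It also builds the year-end dates by hand instead of calling `resample("Y")`, which is an assumption made here only to stay independent of how different pandas versions name the year-end frequency.

```python
import numpy as np
import pandas as pd

# Synthetic monthly profiles: 24 months x 4 depth levels (all values made up)
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lev = np.array([500.0, 1500.0, 2500.0, 3500.0])   # level mid-depths [m]
lev_bnds = np.array([[0, 1000], [1000, 2000],
                     [2000, 3000], [3000, 4000]], dtype=float)
o2 = np.full((24, 4), 0.30)                       # mol/m3 in the first year
o2[12:, :] = 0.31                                 # mol/m3 in the second year
o2[:, 3] = np.nan                                 # level below the sea floor

# Layer inventory: concentration times layer thickness, summed over levels
dz = np.diff(lev_bnds, axis=1)                    # thickness [m], shape (4, 1)
o2_layers = o2 * dz.T                             # mol/m2 per level
inv_lsw = o2_layers[:, lev <= 2200].sum(axis=1)
inv_bot = np.nansum(o2_layers[:, lev > 2200], axis=1)

# Annual means stamped Dec 31, as described in the comment in the loop
monthly = pd.Series(inv_lsw, index=time)
annual = monthly.groupby(monthly.index.year).mean()
annual.index = pd.to_datetime([f"{y}-12-31" for y in annual.index])

# Observations are dated July 1; bfill picks the next model date (Dec 31)
obs_index = pd.to_datetime(["2000-07-01", "2001-07-01"])
aligned = annual.reindex(obs_index, method="bfill")
print(aligned)
```

Because backfilling takes the first model value at or after each observation date, every July observation is paired with the annual mean of the same calendar year.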

## Calculation of model scores
The total model score is based on five factors comparing the models to observations: correlation of LSW inventory, maximum LSW inventory anomaly, standard deviation of lower layer inventory anomaly, mean oxygen profile bias, and mean gas exchange. Models can earn between 0 and 20 points for each, for a total of 100 points. More detail on the calculation is provided in the methods section of the paper.

Drop rows (years) where any of the models has no data to ensure a consistent time frame, then convert each time series to anomalies relative to its mean over that common period.

In [6]:
df_o2inv_lsw.dropna(inplace=True)
df_o2inv_lsw = df_o2inv_lsw-df_o2inv_lsw.mean()

df_o2inv_bot.dropna(inplace=True)
df_o2inv_bot = df_o2inv_bot-df_o2inv_bot.mean()
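As a minimal illustration of this step on made-up numbers, `dropna` with its default settings removes every row (year) in which at least one column is missing, so all remaining columns share the same time frame before the mean is subtracted. The column name `model_a` is hypothetical.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2000-07-01", "2001-07-01", "2002-07-01", "2003-07-01"])
df = pd.DataFrame({"Obs": [10.0, 12.0, 14.0, 16.0],
                   "model_a": [np.nan, 11.0, 13.0, 18.0]}, index=idx)

common = df.dropna()           # removes the 2000 row for every column
anom = common - common.mean()  # anomalies relative to the common-period mean
print(anom)
```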

### Correlation score
Correlation between modeled and observational time series of oxygen content anomalies in the LSW layer

In [7]:
lsw_corr = df_o2inv_lsw.corrwith(df_o2inv_lsw['Obs'])
scr_corr = (20*lsw_corr**2/0.81).round()
scr_corr[scr_corr < 0] = 0
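To see how the scaling in the cell above behaves, the same expression can be applied to a few hand-picked correlation values (the labels are hypothetical). Dividing by 0.81 means that a correlation of 0.9 earns the full 20 points.

```python
import pandas as pd

# Hand-picked correlation coefficients, not taken from any model
r = pd.Series({"r_0.9": 0.9, "r_0.45": 0.45, "r_neg": -0.45})
score = (20 * r**2 / 0.81).round()
print(score)
```

Because the coefficient is squared, the sign of the correlation does not enter this expression, and as written it is not capped at 20 for correlations above 0.9.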

### Maximum anomaly in LSW layer (0-2200m)
Ratio between model and observations of the change in LSW oxygen inventory, defined as the maximum during 1990-1995 minus the minimum during 2002-2007

In [8]:
lsw_max = df_o2inv_lsw["1990-01-01":"1995-01-01"].max()-df_o2inv_lsw["2002-01-01":"2007-01-01"].min()


scr_max = 20*(lsw_max/lsw_max['Obs'])
# Set to 0 if less than 0, or use inverse ratio if lsw_max(model) > lsw_max(Obs)
scr_max[lsw_max > lsw_max['Obs']] = 20*(lsw_max['Obs']/lsw_max[lsw_max > lsw_max['Obs']])
scr_max = scr_max.round()
scr_max[scr_max < 0] = 0
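The ratio logic in the cell above can be summarized in one small function. This is a sketch with made-up amplitudes, assuming a positive observational value; `ratio_score` is a hypothetical name and not part of the notebook.

```python
import pandas as pd

def ratio_score(values, ref="Obs", points=20):
    """Score each entry by its ratio to the reference entry.

    Hypothetical helper: ratios above 1 are inverted so that over- and
    underestimates are penalized alike, and negative scores are set to 0.
    """
    ratio = values / values[ref]
    ratio = ratio.where(ratio <= 1, 1 / ratio)  # keep ratios <= 1, invert the rest
    return (points * ratio).round().clip(lower=0)

amp = pd.Series({"Obs": 10.0, "too_small": 5.0,
                 "too_large": 20.0, "wrong_sign": -4.0})
scores = ratio_score(amp)
print(scores)
```

An amplitude half as large as observed and one twice as large both receive half the points, while an amplitude of the wrong sign receives none.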

### Standard deviation for layer 2 (2200m-bottom)
Ratio of standard deviations in bottom layer between models and observations

In [9]:
bot_std = df_o2inv_bot.std()

scr_std = 20*(bot_std/bot_std['Obs'])
# use inverse ratio if bot_std(model) > bot_std(Obs)
scr_std[bot_std > bot_std['Obs']] = 20*(bot_std['Obs']/bot_std[bot_std > bot_std['Obs']])
scr_std = scr_std.round()

### Mean Oxygen profile score
For this score, 10 points are based on the absolute value of the mean bias in layer 1, and 10 on the absolute value of the mean bias in layer 2. Note that this is different from calculating a bias over the whole water column, because in our calculation a positive bias in one layer and a negative bias in the other do not offset each other.

In [10]:
o2_diff = df_o2_prof.sub(df_o2_prof['Obs'], axis=0)
scr_prof_1 = 10 - abs(o2_diff[o2_diff.index < 2200].mean(axis=0))/2
scr_prof_2 = 10 - abs(o2_diff[o2_diff.index >= 2200].mean(axis=0))/2
scr_prof_1[scr_prof_1 < 0] = 0
scr_prof_2[scr_prof_2 < 0] = 0
scr_prof = (scr_prof_1 + scr_prof_2).round()
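The difference between the layer-wise score and a whole-column bias can be shown with a made-up profile in which the model is 6 µmol/L too high in the upper layer and 6 µmol/L too low in the lower layer. The depths, values, and the column name `model_a` are all hypothetical.

```python
import numpy as np
import pandas as pd

depth = np.array([500.0, 1500.0, 2500.0, 3500.0])  # [m]
prof = pd.DataFrame({"Obs": [290.0, 285.0, 280.0, 275.0],
                     "model_a": [296.0, 291.0, 274.0, 269.0]}, index=depth)

diff = prof.sub(prof["Obs"], axis=0)
# One point is lost for every 2 umol/L of absolute mean bias in each layer
upper = (10 - diff[diff.index < 2200].mean().abs() / 2).clip(lower=0)
lower = (10 - diff[diff.index >= 2200].mean().abs() / 2).clip(lower=0)
layer_score = (upper + lower).round()

whole_column_bias = diff.mean()  # the two biases cancel here
print(layer_score, whole_column_bias, sep="\n")
```

In this example the whole-column bias of `model_a` is zero, yet the layer-wise score deducts points in both layers.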

### Gas exchange score
Based on a comparison of the mean gas exchange over the study period with the mean and standard deviation of values taken from the literature (Wolf et al., 2018 and Atamanchuk et al., 2020; see paper for references).

In [11]:
std_gex = 5.2

scr_gex = (20-(abs(df_gex['Obs'][0]-df_gex)-std_gex/2)/(4*std_gex)*20).round()
scr_gex[scr_gex < 0] = 0
scr_gex[scr_gex > 20] = 20
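Written as a function, the expression in the cell above gives full marks when the model lies within half a standard deviation of the observational mean and then decreases linearly, reaching zero at a difference of 4.5 standard deviations. The sketch below applies it to made-up gas exchange values in the same units as `df_gex`; `gex_score` and the labels are hypothetical.

```python
import pandas as pd

def gex_score(model, obs=22.66, std=5.2):
    """Hypothetical wrapper around the gas exchange score expression."""
    diff = (model - obs).abs()
    raw = 20 - (diff - std / 2) / (4 * std) * 20
    return raw.round().clip(lower=0, upper=20)

# Made-up mean gas exchange values
gex = pd.Series({"close": 24.0, "moderate": 22.66 + 2.5 * 5.2, "far": 50.0})
scores = gex_score(gex)
print(scores)
```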

### Calculate and print total score

Note that these scores can be slightly different from those reported in the paper because of small differences in how MATLAB and Python handle some of the operations.

In [12]:
scr = scr_corr + scr_max + scr_std + scr_prof + scr_gex
scr.drop(['Obs'], axis=1)

|   | miroc_e2sl | noaa_gfdl |
|---|------------|-----------|
| 0 | 51.0       | 45.0      |
