## Generating a Table of Global Fluxes

This notebook reads in the CESM1 preindustrial control and historical runs (for CMIP5),
the ensemble of 11 CESM2 historical runs (for CMIP6),
and the four CESM2 SSP ensembles (for CMIP6).
It generates a table containing the values listed in [issue #6](https://github.com/marbl-ecosys/cesm2-marbl/issues/6):


> * Net primary production (PgC/yr) (`photoC_TOT_zint`)
> * Diatom primary production (%)   (`photoC_diat_zint`)
> * Sinking POC at 100 m (PgC/yr)   (`POC_FLUX_100m`)
> * Sinking CaCO3 at 100 m (PgC/yr) (`CaCO3_FLUX_100m`)
> * Rain ratio (CaCO3/POC) 100 m    (ratio of two above)
> * Nitrogen fixation (TgN/yr)      (`diaz_Nfix`)
> * Nitrogen deposition (TgN/yr)    (`NOx_FLUX` + `NHy_FLUX`)
> * Denitrification (TgN/yr)        (`DENITRIF`)
> * N cycle imbalance = deposition + fixation - denitrification (TgN/yr) # deposition = N* [see Kristen's notebook -- Biological Diagnostics?]
> * Air–sea CO2 flux (PgC/yr)       (`FG_CO2`)
> * Mean ocean oxygen (uM = umol/L = mmol/m^3)    (`O2`)
> * Volume where O2 <80 mmol/m^3 (10^15 m^3) # based on others
> * Volume where O2 <60 mmol/m^3 (10^15 m^3) # based on others
> * Volume where O2 <5 mmol/m^3 (10^15 m^3)  # based on others
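Two rows in the list above are derived from other diagnostics rather than read directly. As a minimal sketch with hypothetical helper names (the notebook computes both inline on pint quantities), the arithmetic is:

```python
def rain_ratio(caco3_flux, poc_flux):
    """CaCO3/POC rain ratio at 100 m; both fluxes must share units (e.g. PgC/yr)."""
    return caco3_flux / poc_flux


def n_cycle_imbalance(deposition, fixation, denitrification):
    """N cycle imbalance = deposition + fixation - denitrification (e.g. TgN/yr)."""
    return deposition + fixation - denitrification


# Hypothetical inputs, for illustration only
print(rain_ratio(0.8, 8.0))
print(n_cycle_imbalance(30.0, 170.0, 190.0))
```

As a sanity check against the table at the end of this notebook, the 1990s CESM2 column gives 35.2 + 241 - 193 = 83.2 TgN/yr, consistent with the reported imbalance of 83.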

Variables are read one at a time because of an issue with `xr.merge` when reading multiple variables at once.

### This notebook uses several Python packages

The `watermark` package prints the version number of each package to help others recreate this environment.

In [1]:
import os

import cftime

import xarray as xr
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.colors as colors
import cmocean

import cartopy
import cartopy.crs as ccrs

import esmlab

import intake
import intake_esm
import ncar_jobqueue
from dask.distributed import Client
from pint import UnitRegistry

# Add new units to UnitRegistry
units = UnitRegistry()
units.define('gram N = mol / 14.007 = gN')
units.define('gram C = mol / 12.01 = gC')

%load_ext watermark
%watermark -a "Mike Levy" -d -iv -m -g -h

cftime        1.0.3.4
pandas        0.25.3
numpy         1.17.3
cartopy       0.17.0
matplotlib    3.1.2
cmocean       2.0
xarray        0.14.0
esmlab        2019.4.27.post55
ncar_jobqueue 2019.10.16.1
intake        0.5.3
intake_esm    2019.10.15.post40
Mike Levy 2019-11-22 

compiler   : GCC 7.3.0
system     : Linux
release    : 3.10.0-693.21.1.el7.x86_64
machine    : x86_64
processor  : x86_64
CPU cores  : 72
interpreter: 64bit
host name  : casper04
Git hash   : 2bb68a7ac3b83fa5a0ef044b6c02df697849ac34
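The custom `gC`/`gN` definitions above are what let pint convert model output to PgC/yr and TgN/yr. To make the conversion explicit, here is the same arithmetic by hand. It assumes (check the `units` attributes) that the flux is in mmol/m^3 cm/s, as for POP's vertically integrated terms such as `photoC_TOT_zint`, and that `TAREA` is in cm^2; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 86400  # noleap calendar: every year has 365 days
G_C_PER_MOL = 12.01             # matches units.define('gram C = mol / 12.01 = gC')


def mmol_m3_cm3_per_s_to_PgC_per_yr(total):
    """Convert sum(flux * TAREA), in (mmol/m^3) * cm^3 / s, to PgC/yr.

    (mmol/m^3) * cm^3 = 1e-3 mol * 1e-6 m^3 / m^3 = 1e-9 mol
    """
    mol_per_s = total * 1e-9
    grams_per_yr = mol_per_s * G_C_PER_MOL * SECONDS_PER_YEAR
    return grams_per_yr / 1e15  # 1 Pg = 1e15 g


# Work backwards from 50 PgC/yr to the raw model-unit total, then convert forward
raw_total = 50e15 / (G_C_PER_MOL * SECONDS_PER_YEAR * 1e-9)
print(mmol_m3_cm3_per_s_to_PgC_per_yr(raw_total))
```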


#### Spin up a dask cluster

Some of these computations take a while, so the work is distributed across a dask cluster.

In [2]:
cluster = ncar_jobqueue.NCARCluster(project='P93300606')
client = Client(cluster)
client

Client: Scheduler: tcp://128.117.181.210:38815, Dashboard: https://jupyterhub.ucar.edu/dav/user/mlevy/proxy/8787/status
Cluster: Workers: 0, Cores: 0, Memory: 0 B


In [3]:
cluster.scale(4)

### Read the intake_esm datastores

The `intake_esm` package is used to identify which files belong to each experiment.
The `get_var_from_catalogs()` function is a wrapper that reads a single variable for a list of experiments, handling the variable-name differences between CESM1 and CESM2.

In [4]:
catalogs = dict()
catalogs['cesm2'] = intake.open_esm_datastore('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/campaign-cesm2-cmip6-timeseries.json')

#cesm1 = intake.open_esm_datastore('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip5_NOT_CMORIZED.json')
catalogs['cesm1'] = intake.open_esm_datastore('/glade/work/mlevy/intake-esm-collection/json/glade-cesm1-cmip5-timeseries.json')

In [5]:
# NOTE: 1990-01-01 0:00:00 is the time stamp on the Dec 1989 monthly average,
#       so slice("1990", "1999") would actually return Dec 1989 - Nov 1999.
#       Specifying a day mid-month gets us Jan 1990 - Dec 1999
#       (this can be verified by looking at time bounds)
time_slices_hist = slice("1990-01-15", "2000-01-15")
time_slices_SSP = slice("2090-01-15", "2100-01-15")
time_slices = dict()

time_slices['cesm1_PI'] = slice(290*12, 300*12) # cfunits doesn't handle years this far in the past, so select by month index;
                                                # this is 291-01-15 - 301-01-15
                                                # (chosen to correspond with 1990 in historical run, which branched from 151)
time_slices['cesm1_hist'] = time_slices_hist
time_slices['cesm1_hist_esm'] = time_slices_hist
time_slices['cesm2_hist'] = time_slices_hist
time_slices['cesm2_SSP1-2.6'] = time_slices_SSP
time_slices['cesm2_SSP2-4.5'] = time_slices_SSP
time_slices['cesm2_SSP3-7.0'] = time_slices_SSP
time_slices['cesm2_SSP5-8.5'] = time_slices_SSP

def get_var_from_catalogs(catalogs, variable, experiments):
    print(f'Reading {variable} from catalog...\n')
    if not isinstance(experiments, list):
        experiments = [experiments]

    datasets = dict()
    experiment_dict = {'cesm1_PI' : ('cesm1', 'piControl'),
                       'cesm1_hist' : ('cesm1', 'historical'),
                       'cesm1_hist_esm' : ('cesm1', 'esm-hist'),
                       'cesm2_hist' : ('cesm2', 'historical'),
                       'cesm2_SSP1-2.6' : ('cesm2', 'SSP1-2.6'),
                       'cesm2_SSP2-4.5' : ('cesm2', 'SSP2-4.5'),
                       'cesm2_SSP3-7.0' : ('cesm2', 'SSP3-7.0'),
                       'cesm2_SSP5-8.5' : ('cesm2', 'SSP5-8.5')
                      }
    cesm1_exps = []
    cesm2_exps = []
    for exp in experiments:
        if experiment_dict[exp][0] == 'cesm1':
            cesm1_exps.append(experiment_dict[exp][1])
        elif experiment_dict[exp][0] == 'cesm2':
            cesm2_exps.append(experiment_dict[exp][1])
        else:
            print(f'WARNING: cannot determine model version from {exp}')

    # CESM1 has only piControl and historical runs; CESM2 also has SSPs
    # Note some variable rename shenanigans to account for changes between CESM1 and CESM2
    if variable == 'CaCO3_FLUX_100m':
        cesm1_var = 'CaCO3_FLUX_IN'
    elif variable == 'POC_FLUX_100m':
        cesm1_var = 'POC_FLUX_IN'
    elif variable == 'photoC_diat_zint':
        cesm1_var = 'photoC_diat'
    elif variable == 'photoC_TOT_zint':
        cesm1_var = ['photoC_sp', 'photoC_diat', 'photoC_diaz']
    else:
        cesm1_var = variable

    dq = dict()
    if cesm1_exps:
        if type(cesm1_var) != list:
            dq['cesm1'] = catalogs['cesm1'].search(experiment=cesm1_exps, variable=cesm1_var).to_dataset_dict(cdf_kwargs={'chunks':{'time': 60}})
    if cesm2_exps:
        dq['cesm2'] = catalogs['cesm2'].search(experiment=cesm2_exps, variable=variable).to_dataset_dict(cdf_kwargs={'chunks':{'time': 48}})
    print('\n----\n')
    
    for exp in experiments:
        if type(cesm1_var) == list:
            if experiment_dict[exp][0] == 'cesm1':
                tmp_dataset = dict()
                for var_from_list in cesm1_var:
                    dq['cesm1'] = catalogs['cesm1'].search(experiment=experiment_dict[exp][1], variable=var_from_list).to_dataset_dict(cdf_kwargs={'chunks':{'time': 60}})
                    tmp_dataset[var_from_list] = _read_var_from_exp(dq['cesm1'], exp, f'ocn.{experiment_dict[exp][1]}.pop.h',
                                                                    time_slices[exp], variable, var_from_list)
                datasets[exp] = tmp_dataset[cesm1_var[0]]
                for var_from_list in cesm1_var[1:]:
                    datasets[exp][variable].data = datasets[exp][variable].data + tmp_dataset[var_from_list][variable].data
            else:
                datasets[exp] = _read_var_from_exp(dq['cesm2'], exp, f'ocn.{experiment_dict[exp][1]}.pop.h',
                                                   time_slices[exp], variable, variable)
        else:
            datasets[exp] = _read_var_from_exp(dq[experiment_dict[exp][0]], exp, f'ocn.{experiment_dict[exp][1]}.pop.h',
                                               time_slices[exp], variable, cesm1_var)
    return(datasets)

def _read_var_from_exp(dq, exp, stream, time_slice, variable, cesm1_var):
    # Select the dataset for this output stream
    dataset = dq[stream]

    keep_vars = ['REGION_MASK', 'z_t', 'z_t_150m', 'dz', 'TAREA', 'TLONG', 'TLAT', 'time', 'time_bound', 'member_id', 'ctrl_member_id'] + [variable, cesm1_var]
    if exp == 'cesm1_PI':
        # Need isel instead of sel since PI slice is in index space rather than years
        dataset = dataset.drop([v for v in dataset.variables if v not in keep_vars]).isel(time=time_slice)
    else:
        dataset = dataset.drop([v for v in dataset.variables if v not in keep_vars]).sel(time=time_slice)
    if variable not in dataset:
        dataset = dataset.rename({cesm1_var : variable})
        if variable in ['POC_FLUX_100m', 'CaCO3_FLUX_100m']:
            dataset = dataset.isel(z_t=10)  # 100m is top of 11th level, or z_t = 10 counting from 0
    if (cesm1_var != variable) and (cesm1_var in dataset):
        dataset = dataset.drop(cesm1_var)
    dataset[variable] = dataset[variable].where(dataset['REGION_MASK'] > 0)

    return(dataset)

def add_o2_below_thres(datasets, thres):
    new_datasets = datasets.copy()
    varname = f'O2_under_{thres}uM'
    new_datasets[varname] = dict()
    for exp in new_datasets['O2']:
        new_datasets[varname][exp] = new_datasets['O2'][exp].rename({'O2' : varname})
        new_datasets[varname][exp][varname] = xr.where(new_datasets['O2'][exp]['O2'] < thres, 1, 0)
        new_datasets[varname][exp][varname].attrs['units'] = ''
    return(new_datasets)
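The time-stamp note at the top of the cell above can be checked without any model output. This sketch assumes, as that note states, that each monthly mean is stamped at 0:00 on the first day of the following month, and it mimics xarray's label-based slicing, which includes both endpoints. The helper name is hypothetical.

```python
from datetime import date


def month_end_stamps(start_year, end_year):
    """(year, month) of each monthly mean paired with its time stamp.

    Assumes the mean for a month is stamped on the first day of the next month.
    """
    stamps = []
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
            stamps.append(((year, month), date(next_year, next_month, 1)))
    return stamps


stamps = month_end_stamps(1989, 2000)

# Year-based bounds, like slice("1990", "1999")
naive = [ym for ym, stamp in stamps if date(1990, 1, 1) <= stamp <= date(1999, 12, 31)]
# Mid-month bounds, like slice("1990-01-15", "2000-01-15")
selected = [ym for ym, stamp in stamps if date(1990, 1, 15) <= stamp <= date(2000, 1, 15)]

print(naive[0], naive[-1], len(naive))
print(selected[0], selected[-1], len(selected))
```

The year-based selection picks up Dec 1989 and stops at Nov 1999, while the mid-month bounds select exactly Jan 1990 - Dec 1999.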

In [6]:
%%time

experiments = ['cesm1_PI',
               'cesm1_hist',
               #'cesm1_hist_esm',
               'cesm2_hist',
               'cesm2_SSP1-2.6',
               'cesm2_SSP2-4.5',
               'cesm2_SSP3-7.0',
               'cesm2_SSP5-8.5'
              ]

# Set up units
integral_units = dict()
integral_units['area'] = dict()
integral_units['volume'] = dict()

# Read in any 3D variable to get dataset containing TAREA and z_t
# And set up a DataArray to compute total volume
tmp_var = 'DENITRIF'
tmp_data = get_var_from_catalogs(catalogs, tmp_var, experiments)

total_volume = dict()
for exp in tmp_data:
    integral_units['area'][exp] = units[tmp_data[exp]['TAREA'].attrs['units']]
    integral_units['volume'][exp] = integral_units['area'][exp] * units[tmp_data[exp]['dz'].attrs['units']]

    # Compute total volume of ocean
    # Sum wgt over active ocean cells
    tmp_data2 = tmp_data[exp][tmp_var].isel(time=0, member_id=0).copy()
    tmp_data2.data = np.logical_not(np.isnan(tmp_data[exp][tmp_var].isel(time=0, member_id=0).data))
    total_volume[exp] = (tmp_data2 * tmp_data[exp]['TAREA'].isel(time=0) * tmp_data[exp]['dz'].isel(time=0)).sum().values
    print(f'Ocean volume in {exp}: {(total_volume[exp] * integral_units["volume"][exp]).to("L")}')

vol_exp = experiments[0]
# Estimate total volume in ocean
# (surface of earth is 71% water, avg depth is 3.7 km)
est_volume = 4*np.pi*0.71*((6371.22*units['km'])**2)*(3.7*units['km'])
print(f'Estimated ocean volume: {est_volume.to("L")}')

Reading DENITRIF from catalog...

--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 2 group(s)
--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 5 group(s)

----

Ocean volume in cesm1_PI: 1.3243939779683552e+21 liter
Ocean volume in cesm1_hist: 1.3243939779683552e+21 liter
Ocean volume in cesm2_hist: 1.3243939779683552e+21 liter
Ocean volume in cesm2_SSP1-2.6: 1.3243939779683552e+21 liter
Ocean volume in cesm2_SSP2-4.5: 1.3243939779683552e+21 liter
Ocean volume in cesm2_SSP3-7.0: 1.3243939779683552e+21 liter
Ocean volume in cesm2_SSP5-8.5: 1.3243939779683552e+21 liter
Estimated ocean volume: 1.3400319094588907e+21 liter
CPU times: user 5.74 s, sys: 652 ms, total: 6.39 s
Wall time: 1min 33s
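The estimated volume printed above is a back-of-the-envelope check on the model's ocean volume. The same arithmetic in plain Python, using the inputs from the cell above, is:

```python
import math

EARTH_RADIUS_KM = 6371.22   # same radius used in the cell above
OCEAN_FRACTION = 0.71       # fraction of Earth's surface covered by ocean
MEAN_DEPTH_KM = 3.7         # mean ocean depth
LITERS_PER_KM3 = 1e12       # 1 km^3 = 1e9 m^3 = 1e12 L

est_volume_km3 = 4 * math.pi * OCEAN_FRACTION * EARTH_RADIUS_KM**2 * MEAN_DEPTH_KM
est_volume_L = est_volume_km3 * LITERS_PER_KM3

model_volume_L = 1.3243939779683552e+21  # printed above for every experiment
print(f'{est_volume_L:.4e} L; model / estimate = {model_volume_L / est_volume_L:.3f}')
```

The model volume is roughly 1% smaller than this rough estimate.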


### Individual Table Computations

In this section, we compute each of the requested values for each dataset.

#### Net primary production (PgC/yr)

CESM1 doesn't have `photoC_TOT_zint`, so `get_var_from_catalogs()` builds it by summing `photoC_sp`, `photoC_diat`, and `photoC_diaz`.

#### Diatom primary production (%)

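CESM1 doesn't have `photoC_diat_zint`, so `photoC_diat` is read instead and integrated over the top 150 m (`z_t_150m`) in `compute_global_averages()`.
The percentage is diatom production divided by total net primary production.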
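Because integration is linear, summing the three CESM1 production terms before or after the global integral gives the same total. A sketch of the NPP total and diatom percentage on toy cells follows; the arrays and function name are hypothetical, and the notebook additionally integrates over depth.

```python
import numpy as np


def npp_and_diatom_percent(photoC_sp, photoC_diat, photoC_diaz, cell_area):
    """Area-weighted NPP total and the diatom share of it (%)."""
    total_field = photoC_sp + photoC_diat + photoC_diaz  # CESM1 has no total field
    npp = (total_field * cell_area).sum()
    diat = (photoC_diat * cell_area).sum()
    return npp, 100 * diat / npp


# Two toy cells with different areas
npp, pct = npp_and_diatom_percent(np.array([2.5, 2.5]),   # small phytoplankton
                                  np.array([1.0, 1.0]),   # diatoms
                                  np.array([0.5, 0.5]),   # diazotrophs
                                  np.array([1.0, 3.0]))   # cell areas
print(npp, pct)
```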

#### Sinking POC at 100 m (PgC/yr)

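CESM1 doesn't have `POC_FLUX_100m`, so `POC_FLUX_IN` is read and the value at `z_t` index 10 (the top of the 11th level, at 100 m) is used.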

#### Sinking CaCO3 at 100 m (PgC/yr)

CESM1 doesn't have `CaCO3_FLUX_100m`, so `CaCO3_FLUX_IN` is used in the same way.
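Both CESM1 100 m fluxes come from the `*_FLUX_IN` fields at `z_t` index 10. Assuming the top 15 levels are each 10 m thick (consistent with the 15-level `z_t_150m` slice used elsewhere in this notebook), the top interface of that level is indeed at 100 m:

```python
import numpy as np

# Assumption: top 15 levels are 10 m (1000 cm) thick; in the notebook the
# real values are available in dataset['dz'].
dz_cm = np.full(15, 1000.0)

# Depth (m) of the *top* interface of each level: 0 for level 0, then cumulative thickness
top_interface_m = np.concatenate(([0.0], np.cumsum(dz_cm)[:-1])) / 100.0

k = int(np.argmin(np.abs(top_interface_m - 100.0)))
print(k, top_interface_m[k])
```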

#### Rain ratio (CaCO3/POC) 100 m

Computed as the ratio of the two fluxes above (CaCO3/POC).

#### Nitrogen deposition (TgN/yr)

#### Denitrification (TgN/yr)

In [7]:
%%time

all_data = dict()

# Process for updating intake-esm catalog
#       1. download all data from HPSS via get_ocn_cmip5_files.sh
#       2. rm /glade/u/home/mlevy/.intake_esm/collections/CESM1-CMIP5.nc
#       3. regenerate it via Anderson's legacy intake-esm
#       4. re-run build intake collections notebook
#       5. commit change to .csv.gz in /glade/work/mlevy/intake-esm-collection/csv.gz/
# NOTE: steps 2-5 can be done with notebooks/intake-esm-collection-defs/rebuild.sh

vars = ['photoC_TOT_zint', 'photoC_diat_zint',
        'POC_FLUX_100m', 'CaCO3_FLUX_100m',
        'diaz_Nfix', 'NOx_FLUX', 'NHy_FLUX', 'DENITRIF',
        'FG_CO2', 'O2']
for var in vars:
    all_data[var] = get_var_from_catalogs(catalogs, var, experiments)

Reading photoC_TOT_zint from catalog...

--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 5 group(s)

----

--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 1 group(s)
--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 1 group(s)
--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 1 group(s)
--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 1 group(s)
--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 1 group(s)
--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.str

In [8]:
# Add variables for computing volume where O2 is below a threshold
if 'O2' in all_data:
    all_data = add_o2_below_thres(all_data, 5)
    all_data = add_o2_below_thres(all_data, 20)
    all_data = add_o2_below_thres(all_data, 60)
    all_data = add_o2_below_thres(all_data, 80)
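Each `O2_under_{thres}uM` variable is a 0/1 indicator, so its volume-weighted global sum is the volume of water below the threshold. A NumPy analogue on toy arrays (names and values hypothetical) is:

```python
import numpy as np


def volume_below_threshold(o2, cell_volume, thres):
    """Total volume of cells where o2 < thres.

    NaN (land) cells compare as False and contribute nothing, like
    xr.where(O2 < thres, 1, 0) in add_o2_below_thres().
    """
    with np.errstate(invalid='ignore'):
        mask = np.where(o2 < thres, 1, 0)
    return (mask * cell_volume).sum()


o2 = np.array([[2.0, 50.0, np.nan],
               [70.0, 100.0, 4.0]])
cell_volume = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])
for thres in [5, 60, 80]:
    print(thres, volume_below_threshold(o2, cell_volume, thres))
```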

In [9]:
# Verify time bounds for each experiment
for exp in all_data[vars[0]]:
    bounds = list(all_data[vars[0]][exp].time_bound.values[ind] for ind in [(0,0), (-1,1)])
    print(f'Experiment: {exp}\nBounds\n----\n{bounds}\n\n')

Experiment: cesm1_PI
Bounds
----
[cftime.DatetimeNoLeap(291, 1, 1, 0, 0, 0, 0, 4, 1), cftime.DatetimeNoLeap(301, 1, 1, 0, 0, 0, 0, 0, 1)]


Experiment: cesm1_hist
Bounds
----
[cftime.DatetimeNoLeap(1990, 1, 1, 0, 0, 0, 0, 2, 1), cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 0, 5, 1)]


Experiment: cesm2_hist
Bounds
----
[cftime.DatetimeNoLeap(1990, 1, 1, 0, 0, 0, 0, 2, 1), cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 0, 5, 1)]


Experiment: cesm2_SSP1-2.6
Bounds
----
[cftime.DatetimeNoLeap(2090, 1, 1, 0, 0, 0, 0, 4, 1), cftime.DatetimeNoLeap(2100, 1, 1, 0, 0, 0, 0, 0, 1)]


Experiment: cesm2_SSP2-4.5
Bounds
----
[cftime.DatetimeNoLeap(2090, 1, 1, 0, 0, 0, 0, 4, 1), cftime.DatetimeNoLeap(2100, 1, 1, 0, 0, 0, 0, 0, 1)]


Experiment: cesm2_SSP3-7.0
Bounds
----
[cftime.DatetimeNoLeap(2090, 1, 1, 0, 0, 0, 0, 4, 1), cftime.DatetimeNoLeap(2100, 1, 1, 0, 0, 0, 0, 0, 1)]


Experiment: cesm2_SSP5-8.5
Bounds
----
[cftime.DatetimeNoLeap(2090, 1, 1, 0, 0, 0, 0, 4, 1), cftime.DatetimeNoLeap(2100, 1, 1, 0

In [10]:
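def compute_global_averages(datasets, integral_units, variable):
    """Return area- (or volume-) weighted global sums of `variable` and their units.

    Despite the name, these are global integrals; the O2 mean is obtained later
    by dividing by the total ocean volume.
    """
    experiments = list(datasets[variable].keys())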
    glb_avg = dict()
    new_units = dict()
    for exp in experiments:
        wgts = datasets[variable][exp]['TAREA'].isel(time=0)
        # Note that CESM2 has vertical integrals where CESM1 doesn't, so
        # same variable might have different units in different experiments
        unit_key = 'area'
        dims = ['nlat', 'nlon']
        if 'z_t_150m' in datasets[variable][exp][variable].dims:
            wgts = wgts * datasets[variable][exp]['dz'].isel(time=0, z_t=slice(0,15))
            wgts = wgts.rename({'z_t' : 'z_t_150m'})
            dims.append('z_t_150m')
            unit_key = 'volume'
        elif 'z_t' in datasets[variable][exp][variable].dims:
            wgts = wgts * datasets[variable][exp]['dz'].isel(time=0)
            dims.append('z_t')
            unit_key = 'volume'
        glb_avg[exp] = esmlab.weighted_sum(datasets[variable][exp][variable], dim=dims, weights=wgts).to_dataset(name=variable)
        old_units = units[datasets[variable][exp][variable].attrs['units']]
        new_units[exp] = old_units*integral_units[unit_key][exp]
    return glb_avg, new_units
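In NumPy terms, the weighted sums above and the O2 mean computed later (integral divided by total ocean volume) reduce to the sketch below. It skips NaN (land) cells explicitly with `np.nansum`; whether `esmlab.weighted_sum` does the same should be checked against its documentation. The function names are hypothetical.

```python
import numpy as np


def global_integral(field, weights):
    """Weighted sum over all cells, skipping NaN (land) cells."""
    return np.nansum(field * weights)


def volume_weighted_mean(field, cell_volume):
    """Global mean = integral / total volume of ocean (non-NaN) cells."""
    ocean = ~np.isnan(field)
    return global_integral(field, cell_volume) / cell_volume[ocean].sum()


o2 = np.array([[200.0, np.nan],
               [100.0, 50.0]])
cell_volume = np.array([[1.0, 5.0],
                        [2.0, 1.0]])
print(global_integral(o2, cell_volume), volume_weighted_mean(o2, cell_volume))
```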

In [11]:
%%time

ann_avg = dict()
new_units = dict()
for variable in all_data:
    glb_avgs, new_units[variable] = compute_global_averages(all_data, integral_units, variable)
    ann_avg[variable] = dict()
    for exp in glb_avgs:
        glb_avgs[exp]['time_bound'] = all_data[variable][exp]['time_bound']
        ann_avg[variable][exp] = esmlab.resample(glb_avgs[exp], freq='ann')

CPU times: user 46.1 s, sys: 2.31 s, total: 48.4 s
Wall time: 1min 49s
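The annual means come from `esmlab.resample(..., freq='ann')`. Assuming it weights each month by its length (derived from `time_bound`), on CESM's 365-day noleap calendar that amounts to the following hypothetical helper in plain NumPy:

```python
import numpy as np

# Days per month in the noleap (365-day) calendar used by CESM
DAYS_NOLEAP = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], dtype=float)


def annual_means(monthly):
    """Day-weighted annual means of a 1-D monthly series (Jan..Dec, whole years only)."""
    years = np.asarray(monthly, dtype=float).reshape(-1, 12)
    return (years * DAYS_NOLEAP).sum(axis=1) / DAYS_NOLEAP.sum()


# A constant series keeps its value; a single February spike is weighted by 28/365
feb_only = np.zeros(12)
feb_only[1] = 365.0
print(annual_means(np.full(24, 3.0)), annual_means(feb_only))
```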


### Reduce Data Sets

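The following table shows global totals (for oxygen, the volume-weighted global mean and the volumes below each threshold), averaged over the specified time slices and across ensemble members.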

In [13]:
vars = ['photoC_TOT_zint', 'photoC_diat_zint',
        'POC_FLUX_100m', 'CaCO3_FLUX_100m',
        'diaz_Nfix', 'NOx_FLUX', 'NHy_FLUX', 'DENITRIF',
        'FG_CO2', 'O2']

# Define final units
final_units = dict()
final_units['photoC_TOT_zint'] = 'PgC/year'
final_units['photoC_diat_zint'] = 'PgC/year'
final_units['POC_FLUX_100m'] = 'PgC/year'
final_units['CaCO3_FLUX_100m'] = 'PgC/year'
final_units['diaz_Nfix'] = 'TgN/year'
final_units['NOx_FLUX'] = 'TgN/year'
final_units['NHy_FLUX'] = 'TgN/year'
final_units['DENITRIF'] = 'TgN/year'
final_units['FG_CO2'] = 'PgC/year'
final_units['O2'] = 'uM'
# 'Pm * m^2' (petameter times square meter) is 10^15 m^3, the volume unit used in the table
final_units['O2_under_5uM'] = 'Pm * m^2'
final_units['O2_under_20uM'] = 'Pm * m^2'
final_units['O2_under_60uM'] = 'Pm * m^2'
final_units['O2_under_80uM'] = 'Pm * m^2'

In [14]:
def get_time_and_ensemble_mean(variable, ann_avg, exp, new_units, final_units):
    return((ann_avg[variable][exp][variable].mean(['time', 'member_id']).values *
            new_units[variable][exp]
           ).to(final_units[variable]))

In [15]:
# Define keys that will go into table columns

# SETTING UP NAMES FOR ALL TABLE KEYS
POC_key = 'Sinking POC at 100 m (PgC yr$^{-1}$)'
CaCO3_key = 'Sinking CaCO$_3$ at 100 m (PgC yr$^{-1}$)'
rain_key = 'Rain ratio (CaCO$_3$/POC) at 100 m'
NPP_key = 'Net primary production (PgC yr$^{-1}$)'
NPP_diat_key = 'Diatom primary production (%)'
Nfix_key = 'Nitrogen fixation (TgN yr$^{-1}$)'
Ndep_key = 'Nitrogen deposition (TgN yr$^{-1}$)'
denitrif_key = 'Denitrification (TgN yr$^{-1}$)'
Ncycle_key = 'N cycle imbalance* (TgN yr$^{-1}$)'
CO2_key = 'Air–sea CO2 flux (PgC yr$^{-1}$)'
O2_key = r'Mean ocean oxygen ($\mu$M)'
OMZ_key = r'OMZ volume (10$^{15}$ m$^3$; <20 $\mu$M)'
O2_vol_keys = dict()
for thres in [5, 60, 80]:
    O2_vol_keys[thres] = rf'Volume (10$^{{15}}$ m$^3$) where O$_2$ <{thres} $\mu$M'

diagnostic_columns = [NPP_key,
                      POC_key,
                      CaCO3_key,
                      rain_key,
                      Nfix_key,
                      Ndep_key,
                      denitrif_key,
                      Ncycle_key,
                      CO2_key,
                      NPP_diat_key,
                      O2_key,
                      OMZ_key,
                      O2_vol_keys[5],
                      O2_vol_keys[60],
                      O2_vol_keys[80]
                     ]

# Define rounding digit count here
rounding = dict()
rounding[POC_key] = 2
rounding[CaCO3_key] = 3
rounding[rain_key] = 3
rounding[NPP_key] = 1
rounding[NPP_diat_key] = 0
rounding[Nfix_key] = 0
rounding[Ndep_key] = 1
rounding[denitrif_key] = 0
rounding[Ncycle_key] = 0
rounding[CO2_key] = 2
rounding[O2_key] = 0
rounding[OMZ_key] = 0
for _,O2_vol_key in O2_vol_keys.items():
    rounding[O2_vol_key] = 0

In [16]:
%%time

assert experiments == list(ann_avg[vars[0]].keys()),'ann_avg not available for all experiments'

diagnostic_values = dict()
for exp in experiments:
    diagnostic_values[exp] = dict()
    # Compute each value by hand
    print(f'Computing 100m POC flux for {exp}')
    diagnostic_values[exp][POC_key] = get_time_and_ensemble_mean('POC_FLUX_100m', ann_avg, exp, new_units, final_units)

    print(f'Computing 100m CaCO3 flux for {exp}')
    diagnostic_values[exp][CaCO3_key] = get_time_and_ensemble_mean('CaCO3_FLUX_100m', ann_avg, exp, new_units, final_units)

    print(f'Computing 100m rain rate for {exp}')
    diagnostic_values[exp][rain_key] = (diagnostic_values[exp][CaCO3_key] /
                                         diagnostic_values[exp][POC_key])

    print(f'Computing net primary production for {exp}')
    diagnostic_values[exp][NPP_key] = get_time_and_ensemble_mean('photoC_TOT_zint', ann_avg, exp, new_units, final_units)

    print(f'Computing primary production from diatoms for {exp}')
    diagnostic_values[exp][NPP_diat_key] = 100*(get_time_and_ensemble_mean('photoC_diat_zint', ann_avg, exp, new_units, final_units) /
                                                diagnostic_values[exp][NPP_key])

    print(f'Computing Nfixation for {exp}')
    diagnostic_values[exp][Nfix_key] = get_time_and_ensemble_mean('diaz_Nfix', ann_avg, exp, new_units, final_units)
    
    print(f'Computing Ndep for {exp}')
    diagnostic_values[exp][Ndep_key] = (get_time_and_ensemble_mean('NOx_FLUX', ann_avg, exp, new_units, final_units) +
                                        get_time_and_ensemble_mean('NHy_FLUX', ann_avg, exp, new_units, final_units))

    print(f'Computing Denitrif for {exp}')
    diagnostic_values[exp][denitrif_key] = get_time_and_ensemble_mean('DENITRIF', ann_avg, exp, new_units, final_units)
    
    print(f'Computing Nitrogen Cycle imbalance for {exp}')
    diagnostic_values[exp][Ncycle_key] = (diagnostic_values[exp][Ndep_key] +
                                          diagnostic_values[exp][Nfix_key] -
                                          diagnostic_values[exp][denitrif_key])

    print(f'Computing air-sea CO2 Flux for {exp}')
    diagnostic_values[exp][CO2_key] = get_time_and_ensemble_mean('FG_CO2', ann_avg, exp, new_units, final_units)

    # Update O2 units to account for the fact that we are dividing by total volume
    # (this modifies new_units in place, so re-running this cell without re-running
    #  the cell that builds new_units divides the units a second time)
    print(f'Computing O2 concentration for {exp}')
    new_units['O2'][exp] = new_units['O2'][exp] / integral_units["volume"][vol_exp]
    diagnostic_values[exp][O2_key] = get_time_and_ensemble_mean('O2', ann_avg, exp, new_units, final_units)/(total_volume[vol_exp])
        
    print(f'Computing OMZ volume for {exp}')
    diagnostic_values[exp][OMZ_key] = get_time_and_ensemble_mean('O2_under_20uM', ann_avg, exp, new_units, final_units)
    for thres in [5, 60, 80]:
        print(f'Computing volume where O2 < {thres} uM for {exp}')
        diagnostic_values[exp][O2_vol_keys[thres]] = get_time_and_ensemble_mean(f'O2_under_{thres}uM', ann_avg, exp, new_units, final_units)

    if exp != experiments[-1]:
        print('\n----\n')


Computing 100m POC flux for cesm1_PI
Computing 100m CaCO3 flux for cesm1_PI
Computing 100m rain rate for cesm1_PI
Computing net primary production for cesm1_PI
Computing primary production from diatoms for cesm1_PI
Computing Nfixation for cesm1_PI
Computing Ndep for cesm1_PI
Computing Denitrif for cesm1_PI
Computing Nitrogen Cycle imbalance for cesm1_PI
Computing air-sea CO2 Flux for cesm1_PI
Computing O2 concentration for cesm1_PI
Computing OMZ volume for cesm1_PI
Computing volume where O2 < 5 uM for cesm1_PI
Computing volume where O2 < 60 uM for cesm1_PI
Computing volume where O2 < 80 uM for cesm1_PI

----

Computing 100m POC flux for cesm1_hist
Computing 100m CaCO3 flux for cesm1_hist
Computing 100m rain rate for cesm1_hist
Computing net primary production for cesm1_hist
Computing primary production from diatoms for cesm1_hist
Computing Nfixation for cesm1_hist
Computing Ndep for cesm1_hist
Computing Denitrif for cesm1_hist
Computing Nitrogen Cycle imbalance for cesm1_hist
Computing

In [17]:
# Fill a dict of table columns with rounded values (units were already converted above)
table_dict = dict()
experiment_longnames={'cesm1_hist' : '1990s (CESM1)',
                      'cesm1_hist_esm' : '1990s (CESM1, BPRP)',
                      'cesm1_PI' : 'preindustrial (CESM1)',
                      'cesm2_hist' : '1990s (CESM2)',
                      'cesm2_SSP1-2.6' : 'RCP26 2090s (CESM2)',
                      'cesm2_SSP2-4.5' : 'RCP45 2090s (CESM2)',
                      'cesm2_SSP3-7.0' : 'RCP70 2090s (CESM2)',
                      'cesm2_SSP5-8.5' : 'RCP85 2090s (CESM2)'}

table_dict['Flux or Concentration'] = []
for table_key in diagnostic_columns:
    table_dict['Flux or Concentration'].append(table_key)
    for exp in experiments:
        if experiment_longnames[exp] not in table_dict:
            table_dict[experiment_longnames[exp]] = []
        try:
            # Workaround to drop decimal place when rounding to nearest integer
            rounded_val = np.round(diagnostic_values[exp][table_key].magnitude, rounding[table_key])
            if rounding[table_key] == 0:
                # np.int was only an alias for the builtin int (removed in NumPy 1.24)
                rounded_val = int(rounded_val)
            table_dict[experiment_longnames[exp]].append(f'{rounded_val}')
        except Exception:
            # Diagnostic not available for this experiment
            table_dict[experiment_longnames[exp]].append('-')

pd.DataFrame(table_dict)

| | Flux or Concentration | preindustrial (CESM1) | 1990s (CESM1) | 1990s (CESM2) | RCP26 2090s (CESM2) | RCP45 2090s (CESM2) | RCP70 2090s (CESM2) | RCP85 2090s (CESM2) |
|---|---|---|---|---|---|---|---|---|
| 0 | Net primary production (PgC yr$^{-1}$) | 55.8 | 56.2 | 48.7 | 48.8 | 49.2 | 49.6 | 50.1 |
| 1 | Sinking POC at 100 m (PgC yr$^{-1}$) | 8.07 | 8.02 | 7.07 | 6.84 | 6.86 | 6.89 | 6.73 |
| 2 | Sinking CaCO$_3$ at 100 m (PgC yr$^{-1}$) | 0.753 | 0.754 | 0.767 | 0.804 | 0.806 | 0.804 | 0.813 |
| 3 | Rain ratio (CaCO$_3$/POC) at 100 m | 0.093 | 0.094 | 0.109 | 0.118 | 0.117 | 0.117 | 0.121 |
| 4 | Nitrogen fixation (TgN yr$^{-1}$) | 175.0 | 169.0 | 241.0 | 263.0 | 270.0 | 280.0 | 286.0 |
| 5 | Nitrogen deposition (TgN yr$^{-1}$) | 6.6 | 29.2 | 35.2 | 25.2 | 33.2 | 41.1 | 38.3 |
| 6 | Denitrification (TgN yr$^{-1}$) | 190.0 | 194.0 | 193.0 | 234.0 | 247.0 | 255.0 | 264.0 |
| 7 | N cycle imbalance* (TgN yr$^{-1}$) | -9.0 | 5.0 | 83.0 | 54.0 | 56.0 | 66.0 | 61.0 |
| 8 | Air–sea CO2 flux (PgC yr$^{-1}$) | 0.0 | 1.76 | 1.83 | 0.61 | 2.24 | 4.47 | 5.31 |
| 9 | Diatom primary production (%) | 35.0 | 34.0 | 37.0 | 33.0 | 33.0 | 34.0 | 31.0 |
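The rounding step in the cell that builds `table_dict` can be isolated as a small helper. This is a sketch on plain floats (the notebook works on pint magnitudes); the name is hypothetical, and it treats missing or NaN values as '-':

```python
import math


def format_table_value(value, digits):
    """Round to `digits` decimals; drop the trailing '.0' when digits == 0.

    Returns '-' for a missing (None) or NaN value, the same placeholder the
    table uses when a diagnostic is unavailable.
    """
    if value is None or math.isnan(value):
        return '-'
    rounded = round(value, digits)
    return str(int(rounded)) if digits == 0 else str(rounded)


print(format_table_value(241.4, 0), format_table_value(0.7534, 3), format_table_value(None, 2))
```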
