## Generating a Table of Global Fluxes

This notebook reads in the CESM1 historical run (for CMIP5),
the ensemble of 11 CESM2 historical runs (for CMIP6),
and also the four SSP CESM2 ensembles (for CMIP6).
A table is generated containing the values listed in [issue #6](https://github.com/marbl-ecosys/cesm2-marbl/issues/6):


> * Net primary production (PgC/yr) (`photoC_TOT_zint`)
> * Diatom primary production (%)   (`photoC_diat_zint`)
> * Sinking POC at 100 m (PgC/yr)   (`POC_FLUX_100m`)
> * Sinking CaCO3 at 100 m (PgC/yr) (`CaCO3_FLUX_100m`)
> * Rain ratio (CaCO3/POC) 100 m    (ratio of two above)
> * Nitrogen fixation (TgN/yr)      (`diaz_Nfix`)
> * Nitrogen deposition (TgN/yr)    (`NOx_FLUX` + `NHy_FLUX`)
> * Denitrification (TgN/yr)        (`DENITRIF`)
> * N cycle imbalance = deposition + fixation - denitrification (TgN/yr) # deposition = N* [see Kristen's notebook -- Biological Diagnostics?]
> * Air–sea CO2 flux (PgC/yr)       (`FG_CO2`)
> * Mean ocean oxygen (uM = umol/L = mmol/m^3)    (`O2`)
> * Volume where O2 <80 mmol/m^3 (10^15 m^3) # based on others
> * Volume where O2 <60 mmol/m^3 (10^15 m^3) # based on others
> * Volume where O2 <5 mmol/m^3 (10^15 m^3)  # based on others

Values will be computed one at a time, due to an issue with `xr.merge` and trying to read multiple variables at once.
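
The two derived rows in the list above (rain ratio and N cycle imbalance) are simple combinations of other entries. A minimal sketch of that arithmetic follows; the function names are hypothetical and the inputs are placeholder numbers, not model output.

```python
def rain_ratio(caco3_flux_100m, poc_flux_100m):
    """Ratio of sinking CaCO3 to sinking POC at 100 m (both in PgC/yr)."""
    return caco3_flux_100m / poc_flux_100m


def n_cycle_imbalance(deposition, fixation, denitrification):
    """N cycle imbalance = deposition + fixation - denitrification (TgN/yr)."""
    return deposition + fixation - denitrification


# Placeholder inputs for illustration only; these are NOT model results.
print(rain_ratio(caco3_flux_100m=1.0, poc_flux_100m=8.0))
print(n_cycle_imbalance(deposition=30.0, fixation=150.0, denitrification=170.0))
```

Because the rain ratio is a ratio of two fluxes with the same units, it is dimensionless, which is why its table row carries no unit.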

### This notebook uses several Python packages

The `watermark` package records the version of each package used, to help others recreate this environment.

In [1]:
import os

import cftime

import xarray as xr
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.colors as colors
import cmocean

import cartopy
import cartopy.crs as ccrs

import esmlab

import intake
import intake_esm
import ncar_jobqueue
from dask.distributed import Client
from pint import UnitRegistry

# Add new units to UnitRegistry
units = UnitRegistry()
units.define('gram N = mol / 14 = gN')
units.define('gram C = mol / 12 = gC')
units.define('year = 365 day = yr')

%load_ext watermark
%watermark -a "Mike Levy" -d -iv -m -g -h

xarray        0.14.0
intake_esm    2019.10.15.post40
pandas        0.25.3
cmocean       2.0
intake        0.5.3
matplotlib    3.1.2
cftime        1.0.3.4
cartopy       0.17.0
ncar_jobqueue 2019.10.16.1
numpy         1.17.3
esmlab        2019.4.27.post55
Mike Levy 2019-12-26 

compiler   : GCC 7.3.0
system     : Linux
release    : 3.10.0-693.21.1.el7.x86_64
machine    : x86_64
processor  : x86_64
CPU cores  : 72
interpreter: 64bit
host name  : casper05
Git hash   : 2c068922ad64f0fecb97509b0a7bef3d7d035d87


#### Spin up a dask cluster

Some of these computations take a while, so they are distributed across dask workers.

In [2]:
cluster = ncar_jobqueue.NCARCluster(project='P93300606')
client = Client(cluster)
client

Client: Scheduler: tcp://128.117.181.211:35904, Dashboard: https://jupyterhub.ucar.edu/dav/user/mlevy/proxy/8787/status
Cluster: Workers: 0, Cores: 0, Memory: 0 B


In [3]:
# Start with just 2 tasks

cluster.scale(2)

### Read the intake_esm datastores

The `intake_esm` package is used to help identify which files belong in each experiment.
The `get_var_from_catalogs()` function is a wrapper to read specific files.

In [4]:
catalogs = dict()
catalogs['cesm2'] = intake.open_esm_datastore('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/campaign-cesm2-cmip6-timeseries.json')

#cesm1 = intake.open_esm_datastore('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip5_NOT_CMORIZED.json')
catalogs['cesm1'] = intake.open_esm_datastore('/glade/work/mlevy/intake-esm-collection/json/glade-cesm1-cmip5-timeseries.json')

### Define our experiments and the time periods we will average over

In [5]:
# Process for updating intake-esm catalog
#       1. download all data from HPSS via get_ocn_cmip5_files.sh
#       2. rm /glade/u/home/mlevy/.intake_esm/collections/CESM1-CMIP5.nc
#       3. regenerate it via Anderson's legacy intake-esm
#       4. re-run build intake collections notebook
#       5. commit change to .csv.gz in /glade/work/mlevy/intake-esm-collection/csv.gz/
# NOTE: steps 2-5 can be done with notebooks/intake-esm-collection-defs/rebuild.sh

vars = ['photoC_TOT_zint_100m', 'photoC_diat_zint_100m',
        'photoC_TOT_zint', 'photoC_diat_zint',
        'POC_FLUX_100m', 'CaCO3_FLUX_100m',
        'diaz_Nfix', 'NOx_FLUX', 'NHy_FLUX', 'DENITRIF',
        'SedDenitrif', 'DON_RIV_FLUX', 'DONr_RIV_FLUX',
        'FG_CO2', 'O2'
       ]
# experiments is a list of experiments to compute values for
experiments = [# CESM 1
               'cesm1_PI',
               'cesm1_PI_esm',
               'cesm1_hist',
               'cesm1_hist_esm',
               # CESM 2
#                'cesm2_PI',
#                'cesm2_hist',
#                'cesm2_SSP1-2.6',
#                'cesm2_SSP2-4.5',
#                'cesm2_SSP3-7.0',
#                'cesm2_SSP5-8.5',
              ]

# experiment_longnames defines the table headers
experiment_longnames={'cesm1_PI' : 'preindustrial (CESM1)',
                      'cesm1_PI_esm' : 'preindustrial (CESM1, BPRP)',
                      'cesm1_hist' : '1981-2005 (CESM1)',
                      'cesm1_hist_esm' : '1990s (CESM1)',
                      'cesm1_RCP45' : 'RCP 4.5 2090s (CESM1)', # not available yet
                      'cesm1_RCP85' : 'RCP 8.5 2090s (CESM1)', # not available yet
                      'cesm2_PI' : 'preindustrial (CESM2)',
                      'cesm2_hist' : '1990-2014 (CESM2)',
                      'cesm2_SSP1-2.6' : 'RCP26 2090s (CESM2)',
                      'cesm2_SSP2-4.5' : 'RCP45 2090s (CESM2)',
                      'cesm2_SSP3-7.0' : 'RCP70 2090s (CESM2)',
                      'cesm2_SSP5-8.5' : 'RCP85 2090s (CESM2)'}

# experiment_dict determines which model version & intake data each experiment uses
experiment_dict = {'cesm1_PI' : ('cesm1', 'piControl'),
                   'cesm1_PI_esm' : ('cesm1', 'esm-piControl'),
                   'cesm1_hist' : ('cesm1', 'historical'),
                   'cesm1_hist_esm' : ('cesm1', 'esm-hist'),
                   'cesm2_PI' : ('cesm2', 'piControl'),
                   'cesm2_hist' : ('cesm2', 'historical'),
                   'cesm2_SSP1-2.6' : ('cesm2', 'SSP1-2.6'),
                   'cesm2_SSP2-4.5' : ('cesm2', 'SSP2-4.5'),
                   'cesm2_SSP3-7.0' : ('cesm2', 'SSP3-7.0'),
                   'cesm2_SSP5-8.5' : ('cesm2', 'SSP5-8.5')
                  }

# NOTE: 2090-01-01 0:00:00 is the time stamp on the Dec 2089 monthly average
#       So slice("2090", "2100") would be shifted a month early (starting with the Dec 2089 average)
#       Specifying a day mid-month gets us to Jan 2090 - Dec 2099 (the 2090s)
#       (this can be verified by looking at time bounds)
time_slices_SSP = slice("2090-01-15", "2100-01-15")

time_slices = dict()

# 200 year averages for CESM1 PI runs, per Lindsay et al 2014
# (He starts 30 years prior to branch point, so I will too)
time_slices['cesm1_PI'] = slice(120*12, 320*12) # cfunits doesn't handle years too far in the past; this is 121-01-15 - 321-01-15
time_slices['cesm1_PI_esm'] = slice(320*12, 520*12) # cfunits doesn't handle years too far in the past; this is 321-01-15 - 521-01-15
# For CESM2, going from 50 years prior to first historical branch point
#                  to 50 years after end of last historical member
# TODO: These dates should be computed automatically based on intake metadata!
time_slices['cesm2_PI'] = slice(550*12, 1070*12) # cfunits doesn't handle years too far in the past; this is 551-01-15 - 1071-01-15

# Historical runs all use slightly different time periods
time_slices['cesm1_hist'] = slice("1981-01-15", "2006-01-15") # per Lindsay et al 2014
time_slices['cesm1_hist_esm'] = slice("1990-01-15", "2000-01-15") # per Moore et al 2013
time_slices['cesm2_hist'] = slice("1990-01-15", "2015-01-15") # For our paper

# RCP runs use 2090s
time_slices['cesm1_RCP45'] = time_slices_SSP
time_slices['cesm1_RCP85'] = time_slices_SSP
time_slices['cesm2_SSP1-2.6'] = time_slices_SSP
time_slices['cesm2_SSP2-4.5'] = time_slices_SSP
time_slices['cesm2_SSP3-7.0'] = time_slices_SSP
time_slices['cesm2_SSP5-8.5'] = time_slices_SSP
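
The note about time stamps in the cell above can be checked with a small stand-alone sketch. It assumes, as the note states, that each monthly mean is stamped at 00:00 on the first day of the following month; `month_end_stamps` is a hypothetical helper and uses `datetime.date` rather than the no-leap `cftime` calendar, which does not matter here because only first-of-month dates are involved.

```python
from datetime import date


def month_end_stamps(first_year, last_year):
    """Time stamps for monthly means, placed at the END of each averaging interval."""
    stamps = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # The mean over (year, month) is stamped on the first day
            # of the following month.
            if month == 12:
                stamps.append(date(year + 1, 1, 1))
            else:
                stamps.append(date(year, month + 1, 1))
    return stamps


stamps = month_end_stamps(2089, 2100)

# Mid-month endpoints, mirroring slice("2090-01-15", "2100-01-15")
selected = [t for t in stamps if date(2090, 1, 15) <= t <= date(2100, 1, 15)]

print(len(selected))   # number of monthly means selected
print(selected[0])     # stamp of the January 2090 mean
print(selected[-1])    # stamp of the December 2099 mean
```

Choosing endpoints in the middle of a month avoids any ambiguity about whether a stamp that falls exactly on a boundary is included.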

In [6]:
def get_var_from_catalogs(catalogs, variable, experiments):
    print(f'Reading {variable} from catalog...\n')
    if type(experiments) != list:
        experiments = [experiments]

    datasets = dict()
    cesm1_exps = []
    cesm2_exps = []
    for exp in experiments:
        if experiment_dict[exp][0] == 'cesm1':
            cesm1_exps.append(experiment_dict[exp][1])
        elif experiment_dict[exp][0] == 'cesm2':
            cesm2_exps.append(experiment_dict[exp][1])
        else:
            print(f'WARNING: can not determine model version from {exp}')
    
    # CESM1 is historical only, CESM2 also has SSPs
    # Note some variable rename shenanigans to account for changes between CESM1 and CESM2
    depth_100m = False
    if variable == 'CaCO3_FLUX_100m':
        cesm1_var = 'CaCO3_FLUX_IN'
    elif variable == 'POC_FLUX_100m':
        cesm1_var = 'POC_FLUX_IN'
    elif variable == 'photoC_diat_zint_100m':
        cesm1_var = 'photoC_diat'
        depth_100m = True
    elif variable == 'photoC_TOT_zint_100m':
        cesm1_var = ['photoC_sp', 'photoC_diat', 'photoC_diaz']
        depth_100m = True
    elif variable in ['SedDenitrif', 'DON_RIV_FLUX', 'DONr_RIV_FLUX', 'photoC_TOT_zint', 'photoC_diat_zint']:
        cesm1_var = None
    else:
        cesm1_var = variable

    dq = dict()
    if cesm1_exps and (cesm1_var is not None):
        if type(cesm1_var) != list:
            dq['cesm1'] = catalogs['cesm1'].search(experiment=cesm1_exps, variable=cesm1_var).to_dataset_dict(cdf_kwargs={'chunks':{'time': 1}})
    if cesm2_exps:
        dq['cesm2'] = catalogs['cesm2'].search(experiment=cesm2_exps, variable=variable).to_dataset_dict(cdf_kwargs={'chunks':{'time': 1}})
    print('\n----\n')
    
    for exp in experiments:
        if type(cesm1_var) == list:
            if experiment_dict[exp][0] == 'cesm1':
                tmp_dataset = dict()
                for var_from_list in cesm1_var:
                    dq['cesm1'] = catalogs['cesm1'].search(experiment=experiment_dict[exp][1], variable=var_from_list).to_dataset_dict(cdf_kwargs={'chunks':{'time': 1}})
                    tmp_dataset[var_from_list] = _read_var_from_exp(dq['cesm1'], exp, f'ocn.{experiment_dict[exp][1]}.pop.h',
                                                                    time_slices[exp], variable, var_from_list)
                    if depth_100m:
                        tmp_da = tmp_dataset[var_from_list][variable].isel(z_t_150m=slice(0,10)).rename({'z_t_150m' : 'z_t_100m'})
                        tmp_dataset[var_from_list][variable] = tmp_da
                datasets[exp] = tmp_dataset[cesm1_var[0]]
                for var_from_list in cesm1_var[1:]:
                    datasets[exp][variable].data = datasets[exp][variable].data + tmp_dataset[var_from_list][variable].data
            else:
                datasets[exp] = _read_var_from_exp(dq['cesm2'], exp, f'ocn.{experiment_dict[exp][1]}.pop.h',
                                                   time_slices[exp], variable, variable)
        else:
            if (cesm1_var is None) and (experiment_dict[exp][0] == 'cesm1'):
                print(f'Skipping {variable} for {exp} because it is not available in CESM1 output')
            else:
                datasets[exp] = _read_var_from_exp(dq[experiment_dict[exp][0]], exp, f'ocn.{experiment_dict[exp][1]}.pop.h',
                                                   time_slices[exp], variable, cesm1_var)
                if depth_100m and 'z_t_150m' in datasets[exp][variable].dims:
                    tmp_da = datasets[exp][variable].isel(z_t_150m=slice(0,10)).rename({'z_t_150m' : 'z_t_100m'})
                    datasets[exp][variable] = tmp_da

    return(datasets)

def _read_var_from_exp(dq, exp, stream, time_slice, variable, cesm1_var):
    # Define datasets
    dataset_full = dq[stream]

    # Initialize dataset with only time-invariant fields
    keep_vars_no_time = ['REGION_MASK', 'z_t', 'z_t_150m', 'dz', 'TAREA', 'TLONG', 'TLAT', 'member_id', 'ctrl_member_id']
    dataset = dataset_full.drop([v for v in dataset_full.variables if v not in keep_vars_no_time]).isel(time=0)

    # Then add variable / cesm1_var with full time dimension
    keep_vars_with_time = ['time', 'time_bound'] + [variable, cesm1_var]
    if exp in ['cesm1_PI', 'cesm1_PI_esm', 'cesm2_PI']:
        # Need isel instead of sel since PI slices are in index space rather than years
        dataset_full = dataset_full.isel(time=time_slice)
    else:
        dataset_full = dataset_full.sel(time=time_slice)
    for var in keep_vars_with_time:
        if var in dataset_full and var not in dataset:
            dataset[var] = dataset_full[var]
    del(dataset_full)
    if variable not in dataset:
        dataset = dataset.rename({cesm1_var : variable})
        if variable in ['POC_FLUX_100m', 'CaCO3_FLUX_100m']:
            dataset = dataset.isel(z_t=10)  # 100m is top of 11th level, or z_t = 10 counting from 0
    if (cesm1_var != variable) and (cesm1_var in dataset):
        dataset = dataset.drop(cesm1_var)
    # Limit output to open ocean
#     dataset[variable] = dataset[variable].where(dataset['REGION_MASK'] > 0)
    # Include marginal seas
    dataset[variable] = dataset[variable].where(dataset['REGION_MASK'] != 0)

    return(dataset)

def add_o2_below_thres(datasets, thres):
    new_datasets = datasets.copy()
    varname = f'O2_under_{thres}uM'
    new_datasets[varname] = dict()
    for exp in new_datasets['O2']:
        new_datasets[varname][exp] = new_datasets['O2'][exp].rename({'O2' : varname})
        new_datasets[varname][exp][varname] = xr.where(new_datasets['O2'][exp]['O2'] < thres, 1, 0)
        new_datasets[varname][exp][varname].attrs['units'] = ''
    return(new_datasets)
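
The CESM1/CESM2 variable renaming inside `get_var_from_catalogs()` can also be read as a lookup table. The sketch below restates the `if`/`elif` chain as a dictionary; `CESM1_NAMES`, `NEEDS_100M_INTEGRAL`, and `cesm1_lookup` are hypothetical names introduced here for illustration and are not part of the notebook.

```python
# CESM2 name -> CESM1 name(s); None means "not available in CESM1 output".
# The entries are copied from the if/elif chain in get_var_from_catalogs().
CESM1_NAMES = {
    'CaCO3_FLUX_100m': 'CaCO3_FLUX_IN',
    'POC_FLUX_100m': 'POC_FLUX_IN',
    'photoC_diat_zint_100m': 'photoC_diat',
    'photoC_TOT_zint_100m': ['photoC_sp', 'photoC_diat', 'photoC_diaz'],
    'SedDenitrif': None,
    'DON_RIV_FLUX': None,
    'DONr_RIV_FLUX': None,
    'photoC_TOT_zint': None,
    'photoC_diat_zint': None,
}

# Variables that must be integrated over the top 100 m from 3D CESM1 fields
NEEDS_100M_INTEGRAL = {'photoC_diat_zint_100m', 'photoC_TOT_zint_100m'}


def cesm1_lookup(variable):
    """Return (cesm1_var, depth_100m) for a CESM2 variable name."""
    # Names that are unchanged between model versions map to themselves
    cesm1_var = CESM1_NAMES.get(variable, variable)
    return cesm1_var, variable in NEEDS_100M_INTEGRAL


print(cesm1_lookup('FG_CO2'))
print(cesm1_lookup('photoC_TOT_zint_100m'))
```

A table like this keeps the naming differences in one place, so adding a variable does not require touching the control flow.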

In [7]:
%%time

# Set up units
integral_units = dict()
integral_units['area'] = dict()
integral_units['volume'] = dict()

# Read in any 3D variable to get dataset containing TAREA and z_t
# And set up a DataArray to compute total volume
tmp_var = 'DENITRIF'
tmp_data = get_var_from_catalogs(catalogs, tmp_var, experiments)

total_volume = dict()
for exp in tmp_data:
    integral_units['area'][exp] = units[tmp_data[exp]['TAREA'].attrs['units']]
    integral_units['volume'][exp] = integral_units['area'][exp] * units[tmp_data[exp]['dz'].attrs['units']]

    # Compute total volume of ocean
    # Sum wgt over active ocean cells
    tmp_data2 = tmp_data[exp][tmp_var].isel(time=0, member_id=0).copy()
    tmp_data2.data = np.logical_not(np.isnan(tmp_data[exp][tmp_var].isel(time=0, member_id=0).data))
    total_volume[exp] = (tmp_data2 * tmp_data[exp]['TAREA'] * tmp_data[exp]['dz']).sum().values
    print(f'Ocean volume in {exp}: {(total_volume[exp] * integral_units["volume"][exp]).to("L")}')

vol_exp = experiments[0]
# Estimate total volume in ocean as a sanity check on the model grid
# (surface of earth is 71% water, avg depth is 3.7 km)
est_volume = 4*np.pi*0.71*((6371.22*units['km'])**2)*(3.7*units['km'])
print(f'Estimated ocean volume: {est_volume.to("L")}')

Reading DENITRIF from catalog...

--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 4 group(s)

----

Ocean volume in cesm1_PI: 1.3251402862315105e+21 liter
Ocean volume in cesm1_PI_esm: 1.3251402862315105e+21 liter
Ocean volume in cesm1_hist: 1.3251402862315105e+21 liter
Ocean volume in cesm1_hist_esm: 1.3251402862315105e+21 liter
Estimated ocean volume: 1.3400319094588907e+21 liter
CPU times: user 3.87 s, sys: 388 ms, total: 4.26 s
Wall time: 1min 43s
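
The back-of-the-envelope estimate printed above can be reproduced without `pint`. This sketch repeats the same arithmetic with plain floats and compares it with the model-grid volume copied from the output; the constant names are introduced here for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.22    # radius used in the cell above
OCEAN_FRACTION = 0.71        # fraction of the surface covered by water
MEAN_DEPTH_KM = 3.7          # average ocean depth
LITERS_PER_KM3 = 1.0e12      # 1 km^3 = 1e9 m^3 = 1e12 L

surface_area_km2 = 4 * math.pi * EARTH_RADIUS_KM**2
est_volume_l = surface_area_km2 * OCEAN_FRACTION * MEAN_DEPTH_KM * LITERS_PER_KM3

model_volume_l = 1.3251402862315105e+21   # printed above for the CESM1 grid
rel_diff = (est_volume_l - model_volume_l) / model_volume_l

print(f'estimate: {est_volume_l:.4e} L')
print(f'relative difference from model grid: {rel_diff:.1%}')
```

The two volumes agree to within about one percent, which suggests the `TAREA * dz` sum over active cells is counting the right set of grid cells.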


### Individual Table Computations

In this section, we compute each of the requested values for each dataset

#### Net primary production (PgC/yr)

CESM1 doesn't have `photoC_TOT_zint`

#### Diatom primary production (%)

CESM1 doesn't have `photoC_diat_zint`

#### Sinking POC at 100 m (PgC/yr)

CESM1 doesn't have `POC_FLUX_100m`

#### Sinking CaCO3 at 100 m (PgC/yr)

CESM1 doesn't have `CaCO3_FLUX_100m`

#### Rain ratio (CaCO3/POC) 100 m

Missing necessary vars to compute

#### Nitrogen deposition (TgN/yr)

#### Denitrification (TgN/yr)

In [8]:
## jump to 4 tasks for better performance defining datasets

cluster.scale(4)

In [9]:
%%time

all_data = dict()
for var in vars:
    all_data[var] = get_var_from_catalogs(catalogs, var, experiments)

Reading photoC_TOT_zint_100m from catalog...


----

--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

--> There will be 1 group(s)

In [10]:
# Add another four tasks to speed things up even more

cluster.scale(8)

In [11]:
# Add variables for computing volume where O2 is below a threshold
if 'O2' in all_data:
    all_data = add_o2_below_thres(all_data, 5)
    all_data = add_o2_below_thres(all_data, 20)
    all_data = add_o2_below_thres(all_data, 60)
    all_data = add_o2_below_thres(all_data, 80)
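
`add_o2_below_thres()` replaces the O2 field with a 0/1 indicator, so integrating that indicator with volume weights yields the volume of water below the threshold. A minimal sketch of the idea on a toy five-cell "ocean" follows; `volume_below_threshold` is a hypothetical helper and the values are illustrative, not model output.

```python
def volume_below_threshold(o2, cell_volume, thres):
    """Total volume of cells whose O2 concentration is below `thres`.

    `o2` and `cell_volume` are parallel lists; None marks a land cell.
    """
    total = 0.0
    for conc, vol in zip(o2, cell_volume):
        if conc is None:
            continue  # land cells contribute no volume
        indicator = 1 if conc < thres else 0  # same 0/1 mask idea as xr.where(O2 < thres, 1, 0)
        total += indicator * vol
    return total


# Toy 5-cell "ocean" (illustrative values, not model output)
o2 = [250.0, 70.0, 4.0, None, 55.0]        # mmol/m^3
cell_volume = [2.0, 1.0, 0.5, 3.0, 1.5]    # arbitrary volume units

for thres in [5, 20, 60, 80]:
    print(thres, volume_below_threshold(o2, cell_volume, thres))
```

Since the indicator is dimensionless, the integrated result carries pure volume units, matching the empty `units` attribute set in `add_o2_below_thres()`.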

In [12]:
# Verify time bounds for each experiment
for exp in all_data[vars[0]]:
    bounds = list(all_data[vars[0]][exp].time_bound.values[ind] for ind in [(0,0), (-1,1)])
    print(f'Experiment: {exp}\nBounds\n----\n{bounds}\n\n')

Experiment: cesm1_PI
Bounds
----
[cftime.DatetimeNoLeap(121, 1, 1, 0, 0, 0, 0, 2, 1), cftime.DatetimeNoLeap(321, 1, 1, 0, 0, 0, 0, 6, 1)]


Experiment: cesm1_PI_esm
Bounds
----
[cftime.DatetimeNoLeap(321, 1, 1, 0, 0, 0, 0, 6, 1), cftime.DatetimeNoLeap(521, 1, 1, 0, 0, 0, 0, 3, 1)]


Experiment: cesm1_hist
Bounds
----
[cftime.DatetimeNoLeap(1981, 1, 1, 0, 0, 0, 0, 0, 1), cftime.DatetimeNoLeap(2006, 1, 1, 0, 0, 0, 0, 4, 1)]


Experiment: cesm1_hist_esm
Bounds
----
[cftime.DatetimeNoLeap(1990, 1, 1, 0, 0, 0, 0, 2, 1), cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 0, 5, 1)]
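
The preindustrial bounds printed above are consistent with the index-space slices defined earlier. The sketch below translates a monthly index slice into calendar years, assuming index 0 is the January mean of year 1 and the record has no gaps; `index_slice_to_years` is a hypothetical helper.

```python
def index_slice_to_years(start, stop, first_year=1):
    """Translate a monthly index slice into (first_year, last_year, n_years).

    Assumes index 0 is the January mean of `first_year` and that the record
    has no gaps; both are assumptions about the input files.
    """
    n_months = stop - start
    year_start = first_year + start // 12        # year containing the first month
    year_end = first_year + (stop - 1) // 12     # year containing the last month
    return year_start, year_end, n_months // 12


pi_slices = {'cesm1_PI': (120 * 12, 320 * 12),
             'cesm1_PI_esm': (320 * 12, 520 * 12),
             'cesm2_PI': (550 * 12, 1070 * 12)}

for name, (start, stop) in pi_slices.items():
    print(name, index_slice_to_years(start, stop))
```

For `cesm1_PI` this gives years 121 through 320, which matches time bounds running from the start of year 121 to the start of year 321.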




In [13]:
def compute_global_averages(datasets, integral_units, variable):
    experiments = list(datasets[variable].keys())
    glb_avg = dict()
    new_units = dict()
    for exp in experiments:
        wgts = datasets[variable][exp]['TAREA']
        # Note that CESM2 has vertical integrals where CESM1 doesn't, so
        # same variable might have different units in different experiments
        unit_key = 'area'
        dims = ['nlat', 'nlon']
        if 'z_t_100m' in datasets[variable][exp][variable].dims:
            wgts = wgts * datasets[variable][exp]['dz'].isel(z_t=slice(0,10))
            wgts = wgts.rename({'z_t' : 'z_t_100m'})
            dims.append('z_t_100m')
            unit_key = 'volume'
        elif 'z_t_150m' in datasets[variable][exp][variable].dims:
            wgts = wgts * datasets[variable][exp]['dz'].isel(z_t=slice(0,15))
            wgts = wgts.rename({'z_t' : 'z_t_150m'})
            dims.append('z_t_150m')
            unit_key = 'volume'
        elif 'z_t' in datasets[variable][exp][variable].dims:
            wgts = wgts * datasets[variable][exp]['dz']
            dims.append('z_t')
            unit_key = 'volume'
        glb_avg[exp] = esmlab.weighted_sum(datasets[variable][exp][variable], dim=dims, weights=wgts).to_dataset(name=variable)
        old_units = units[datasets[variable][exp][variable].attrs['units']]
        new_units[exp] = old_units*integral_units[unit_key][exp]
    return glb_avg, new_units
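
Despite its name, `compute_global_averages()` returns area- or volume-weighted sums (global integrals), which is why the units are multiplied by the area or volume units. A minimal sketch of a weighted sum that skips masked cells follows; it illustrates the idea only and is not `esmlab`'s implementation.

```python
import math


def weighted_sum(values, weights):
    """Sum of values * weights over cells, skipping NaN (masked) values."""
    total = 0.0
    for val, wgt in zip(values, weights):
        if math.isnan(val):
            continue  # land / masked cell
        total += val * wgt
    return total


# Toy 2D example: a flux per unit area on four cells (illustrative values)
flux = [1.0, 2.0, float('nan'), 4.0]   # flux per unit area
tarea = [10.0, 10.0, 5.0, 2.5]         # cell areas

# Result has units of (flux units) * (area units)
print(weighted_sum(flux, tarea))
```

For 3D fields the same pattern applies with `TAREA * dz` as the weights, which is what the `z_t`, `z_t_100m`, and `z_t_150m` branches set up.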

In [14]:
%%time

ann_avg = dict()
new_units = dict()
for variable in all_data:
    print(f'Computing global average of {variable}...')
    glb_avgs, new_units[variable] = compute_global_averages(all_data, integral_units, variable)
    ann_avg[variable] = dict()
    for exp in glb_avgs:
        print(f'   ... computing for {exp}')
        glb_avgs[exp]['time_bound'] = all_data[variable][exp]['time_bound']
        ann_avg[variable][exp] = esmlab.resample(glb_avgs[exp], freq='ann')

if 'O2' in new_units:
    new_units['O2_orig'] = new_units['O2'].copy()

Computing global average of photoC_TOT_zint_100m...
   ... computing for cesm1_PI
   ... computing for cesm1_PI_esm
   ... computing for cesm1_hist
   ... computing for cesm1_hist_esm
Computing global average of photoC_diat_zint_100m...
   ... computing for cesm1_PI
   ... computing for cesm1_PI_esm
   ... computing for cesm1_hist
   ... computing for cesm1_hist_esm
Computing global average of photoC_TOT_zint...
Computing global average of photoC_diat_zint...
Computing global average of POC_FLUX_100m...
   ... computing for cesm1_PI
   ... computing for cesm1_PI_esm
   ... computing for cesm1_hist
   ... computing for cesm1_hist_esm
Computing global average of CaCO3_FLUX_100m...
   ... computing for cesm1_PI
   ... computing for cesm1_PI_esm
   ... computing for cesm1_hist
   ... computing for cesm1_hist_esm
Computing global average of diaz_Nfix...
   ... computing for cesm1_PI
   ... computing for cesm1_PI_esm
   ... computing for cesm1_hist
   ... computing for cesm1_hist_esm
Computi
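
Going from monthly to annual means involves a choice of weights. The sketch below contrasts weighting each month by its number of days with weighting all months equally, in a no-leap calendar. Which of the two `esmlab.resample` applies should be checked against its documentation; the values here are illustrative only.

```python
# Days per month in a no-leap (365-day) calendar
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def annual_mean_day_weighted(monthly):
    """Annual mean with each month weighted by its number of days."""
    return sum(v * d for v, d in zip(monthly, DAYS_PER_MONTH)) / sum(DAYS_PER_MONTH)


def annual_mean_equal_weighted(monthly):
    """Annual mean with every month weighted equally."""
    return sum(monthly) / len(monthly)


# Toy seasonal cycle (illustrative values only)
monthly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(annual_mean_day_weighted(monthly))
print(annual_mean_equal_weighted(monthly))
```

The difference is small for a smooth seasonal cycle, but it can show up in the last reported digit when comparing against tables computed with the other convention.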

## Reduce Data Sets

This section reduces the annual global values to one number per experiment by averaging over ensemble members and over the specified time slices.

In [15]:
# use 4 tasks for final computation

cluster.scale(4)

In [16]:
# Define final units
PgC_per_year = 'PgC/yr'
TgN_per_year = 'TgN/yr'
uM = 'uM'

final_units = dict()
final_units['photoC_TOT_zint'] = PgC_per_year
final_units['photoC_diat_zint'] = PgC_per_year
final_units['photoC_TOT_zint_100m'] = PgC_per_year
final_units['photoC_diat_zint_100m'] = PgC_per_year
final_units['POC_FLUX_100m'] = PgC_per_year
final_units['CaCO3_FLUX_100m'] = PgC_per_year
final_units['diaz_Nfix'] = TgN_per_year
final_units['NOx_FLUX'] = TgN_per_year
final_units['NHy_FLUX'] = TgN_per_year
final_units['DENITRIF'] = TgN_per_year
final_units['SedDenitrif'] = TgN_per_year
final_units['DON_RIV_FLUX'] = TgN_per_year
final_units['DONr_RIV_FLUX'] = TgN_per_year
final_units['FG_CO2'] = PgC_per_year
final_units['O2'] = 'uM'
final_units['O2_under_5uM'] = 'Pm * m^2'
final_units['O2_under_20uM'] = 'Pm * m^2'
final_units['O2_under_60uM'] = 'Pm * m^2'
final_units['O2_under_80uM'] = 'Pm * m^2'

In [17]:
def get_time_and_ensemble_mean(variable, ann_avg, exp, new_units, final_units):
    try:
        ens_time_mean = (ann_avg[variable][exp][variable].mean('member_id')).mean('time').values
    except:
        print(f'   * Can not compute {variable} for {exp}')
        return('-')
    return((ens_time_mean * new_units[variable][exp]).to(final_units[variable]))
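
`get_time_and_ensemble_mean()` averages over `member_id` first and then over `time`. A plain-Python sketch of that order of operations on nested lists follows; `ensemble_then_time_mean` is a hypothetical helper and the numbers are illustrative.

```python
def ensemble_then_time_mean(annual):
    """Average over members first, then over years.

    `annual[m][t]` is the annual value for ensemble member m in year t.
    """
    n_members = len(annual)
    n_years = len(annual[0])
    ens_mean = [sum(annual[m][t] for m in range(n_members)) / n_members
                for t in range(n_years)]
    return sum(ens_mean) / n_years


# Two members, three years (illustrative values only)
annual = [[1.0, 2.0, 3.0],
          [3.0, 4.0, 5.0]]
print(ensemble_then_time_mean(annual))
```

When every member covers the same years with no missing values, the order of the two means does not change the result.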

In [18]:
# Define keys that will go into table columns

# SETTING UP NAMES FOR ALL TABLE KEYS
POC_key = f'Sinking POC at 100 m ({PgC_per_year})'
CaCO3_key = f'Sinking CaCO$_3$ at 100 m ({PgC_per_year})'
rain_key = f'Rain ratio (CaCO$_3$/POC) at 100 m'
NPP_key = f'Net primary production, full depth ({PgC_per_year})'
NPP_diat_key = f'Diatom primary production, full depth (%)'
NPP_100m_key = f'Net primary production, top 100m ({PgC_per_year})'
NPP_diat_100m_key = f'Diatom primary production, top 100m (%)'
Nfix_key = f'Nitrogen fixation ({TgN_per_year})'
Ndep_key = f'Nitrogen deposition ({TgN_per_year})'
denitrif_key = f'Water Column Denitrification ({TgN_per_year})'
denitrif2_key = f'Sediment Denitrification ({TgN_per_year})'
rivflux_key = f'Nitrogen River Flux ({TgN_per_year})'
Ncycle_key = f'N cycle imbalance* ({TgN_per_year})'
CO2_key = f'Air–sea CO2 flux ({PgC_per_year})'
O2_key = r'Mean ocean oxygen ($\mu$M)'
OMZ_key = r'OMZ volume (10$^1$$^5$ m$^3$; <20 $\mu$M)'
O2_vol_keys = dict()
for thres in [5, 60, 80]:
    O2_vol_keys[thres] = rf'Volume (10$^1$$^5$ m$^3$) where O$_2$ <{thres} $\mu$M'

# Define rounding digit count here
rounding = dict()
rounding[POC_key] = 2
rounding[CaCO3_key] = 3
rounding[rain_key] = 3
rounding[NPP_key] = 1
rounding[NPP_diat_key] = 0
rounding[NPP_100m_key] = 1
rounding[NPP_diat_100m_key] = 0
rounding[Nfix_key] = 0
rounding[Ndep_key] = 1
rounding[denitrif_key] = 0
rounding[denitrif2_key] = 0
rounding[rivflux_key] = 0
rounding[Ncycle_key] = 0
rounding[CO2_key] = 2
rounding[O2_key] = 0
rounding[OMZ_key] = 0
for _,O2_vol_key in O2_vol_keys.items():
    rounding[O2_vol_key] = 0

In [19]:
%%time

diagnostic_values = dict()
for exp in experiments:
    diagnostic_values[exp] = dict()
    # Compute each value by hand
    print(f'Computing 100m POC flux for {exp}')
    diagnostic_values[exp][POC_key] = get_time_and_ensemble_mean('POC_FLUX_100m', ann_avg, exp, new_units, final_units)

    print(f'Computing 100m CaCO3 flux for {exp}')
    diagnostic_values[exp][CaCO3_key] = get_time_and_ensemble_mean('CaCO3_FLUX_100m', ann_avg, exp, new_units, final_units)

    print(f'Computing 100m rain rate for {exp}')
    try:
        diagnostic_values[exp][rain_key] = (diagnostic_values[exp][CaCO3_key] /
                                             diagnostic_values[exp][POC_key])
    except:
        print(f'   * Can not compute rain rate for {exp}')

    print(f'Computing full depth net primary production for {exp}')
    diagnostic_values[exp][NPP_key] = get_time_and_ensemble_mean('photoC_TOT_zint', ann_avg, exp, new_units, final_units)

    print(f'Computing full depth primary production from diatoms for {exp}')
    try:
        diagnostic_values[exp][NPP_diat_key] = 100*(get_time_and_ensemble_mean('photoC_diat_zint', ann_avg, exp, new_units, final_units) /
                                                    diagnostic_values[exp][NPP_key])
    except:
        print(f'   * Can not compute primary production from diatoms for {exp}')

    print(f'Computing top 100m net primary production for {exp}')
    diagnostic_values[exp][NPP_100m_key] = get_time_and_ensemble_mean('photoC_TOT_zint_100m', ann_avg, exp, new_units, final_units)

    print(f'Computing top 100m primary production from diatoms for {exp}')
    try:
        diagnostic_values[exp][NPP_diat_100m_key] = 100*(get_time_and_ensemble_mean('photoC_diat_zint_100m', ann_avg, exp, new_units, final_units) /
                                                    diagnostic_values[exp][NPP_100m_key])
    except:
        print(f'   * Can not compute primary production from diatoms for {exp}')

    print(f'Computing Nfixation for {exp}')
    diagnostic_values[exp][Nfix_key] = get_time_and_ensemble_mean('diaz_Nfix', ann_avg, exp, new_units, final_units)
    
    print(f'Computing Ndep for {exp}')
    diagnostic_values[exp][Ndep_key] = (get_time_and_ensemble_mean('NOx_FLUX', ann_avg, exp, new_units, final_units) +
                                        get_time_and_ensemble_mean('NHy_FLUX', ann_avg, exp, new_units, final_units))

    print(f'Computing Water Column Denitrif for {exp}')
    diagnostic_values[exp][denitrif_key] = get_time_and_ensemble_mean('DENITRIF', ann_avg, exp, new_units, final_units)
    
    print(f'Computing Sediment Denitrif for {exp}')
    diagnostic_values[exp][denitrif2_key] = get_time_and_ensemble_mean('SedDenitrif', ann_avg, exp, new_units, final_units)

    print(f'Computing Nitrogen River Flux for {exp}')
    diagnostic_values[exp][rivflux_key] = (get_time_and_ensemble_mean('DON_RIV_FLUX', ann_avg, exp, new_units, final_units) +
                                           get_time_and_ensemble_mean('DONr_RIV_FLUX', ann_avg, exp, new_units, final_units))

    print(f'Computing Nitrogen Cycle imbalance for {exp}')
    try:
        diagnostic_values[exp][Ncycle_key] = (diagnostic_values[exp][Ndep_key] +
                                              diagnostic_values[exp][Nfix_key] -
                                              diagnostic_values[exp][denitrif_key])
        try:
            diagnostic_values[exp][Ncycle_key] = (diagnostic_values[exp][Ncycle_key] -
                                                  diagnostic_values[exp][denitrif2_key] - 
                                                  diagnostic_values[exp][rivflux_key])
        except:
            print(f'   * No additional denitrification terms for {exp}')
            pass
    except:
        print(f'   * Can not compute Ncycle imbalance for {exp}')

    print(f'Computing air-sea CO2 Flux for {exp}')
    diagnostic_values[exp][CO2_key] = get_time_and_ensemble_mean('FG_CO2', ann_avg, exp, new_units, final_units)

    # Update O2 units to account for the fact that we are dividing by total volume
    print(f'Computing O2 concentration for {exp}')
    try:
        new_units['O2'][exp] = new_units['O2_orig'][exp] / integral_units["volume"][vol_exp]
        diagnostic_values[exp][O2_key] = get_time_and_ensemble_mean('O2', ann_avg, exp, new_units, final_units)/(total_volume[vol_exp])
    except:
        print(f'   * Can not compute O2 concentration for {exp}')
        
    print(f'Computing OMZ volume for {exp}')
    diagnostic_values[exp][OMZ_key] = get_time_and_ensemble_mean('O2_under_20uM', ann_avg, exp, new_units, final_units)
    for thres in [5, 60, 80]:
        print(f'Computing volume where O2 < {thres} uM for {exp}')
        diagnostic_values[exp][O2_vol_keys[thres]] = get_time_and_ensemble_mean(f'O2_under_{thres}uM', ann_avg, exp, new_units, final_units)

    if exp != experiments[-1]:
        print('\n----\n')


Computing 100m POC flux for cesm1_PI
Computing 100m CaCO3 flux for cesm1_PI
Computing 100m rain rate for cesm1_PI
Computing full depth net primary production for cesm1_PI
   * Can not compute photoC_TOT_zint for cesm1_PI
Computing full depth primary production from diatoms for cesm1_PI
   * Can not compute photoC_diat_zint for cesm1_PI
   * Can not compute primary production from diatoms for cesm1_PI
Computing top 100m net primary production for cesm1_PI
Computing top 100m primary production from diatoms for cesm1_PI
Computing Nfixation for cesm1_PI
Computing Ndep for cesm1_PI
Computing Water Column Denitrif for cesm1_PI
Computing Sediment Denitrif for cesm1_PI
   * Can not compute SedDenitrif for cesm1_PI
Computing Nitrogen River Flux for cesm1_PI
   * Can not compute DON_RIV_FLUX for cesm1_PI
   * Can not compute DONr_RIV_FLUX for cesm1_PI
Computing Nitrogen Cycle imbalance for cesm1_PI
   * No additional denitrification terms for cesm1_PI
Computing air-sea CO2 Flux for cesm1_PI
Comp

In [20]:
def make_table(diag_columns, test_exps):
    table_dict = dict()
    table_dict['Flux or Concentration'] = []
    for table_key in diag_columns:
        table_dict['Flux or Concentration'].append(table_key)
        for exp in test_exps:
            if experiment_longnames[exp] not in table_dict:
                table_dict[experiment_longnames[exp]] = []
            try:
                # Workaround to drop decimal place when rounding to nearest integer
                rounded_val = np.round(diagnostic_values[exp][table_key].magnitude, rounding[table_key])
                if rounding[table_key] == 0:
                    rounded_val = int(rounded_val)
                table_dict[experiment_longnames[exp]].append(f'{rounded_val}')
            except:
                table_dict[experiment_longnames[exp]].append('-')
    return(table_dict)
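
The rounding workaround in `make_table()` can be isolated into a small helper. This sketch uses the built-in `round` in place of `np.round` and feeds it the comparison values printed later in the notebook; `format_value` is a hypothetical name.

```python
def format_value(value, digits):
    """Round to `digits` decimals; drop the trailing '.0' when digits == 0."""
    rounded = round(value, digits)
    if digits == 0:
        # round(x, 0) still returns a float, so convert to int for display
        rounded = int(rounded)
    return f'{rounded}'


print(format_value(8.064900172824604, 2))
print(format_value(55.4877664992607, 1))
print(format_value(55.4877664992607, 0))
```

Returning strings keeps integer-rounded and decimal-rounded rows from being coerced to a common float format when the table is built with `pd.DataFrame`.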

In [21]:
if 'cesm1_PI_esm' in diagnostic_values:
    print('Comparison of cesm1_PI_esm')
#     let var = fg_co2[d=2]
#     show var var
#      VAR = FG_CO2[D=2]
#     list var_integral_PgC_year
#                  VARIABLE : (1E-9 * 12 * 1E-15 * 86400 * 365) * VAR_MUL_AREA[I=@SUM,J=@SUM]
#                  X        : 0.5 to 320.5
#                  Y        : 0.5 to 384.5
#                  ENSEMBLE : 0421
#              -0.02491
    print(f'{diagnostic_values["cesm1_PI_esm"][CO2_key].magnitude} (should be -0.02491)')

#     let var = POC_FLUX_IN_100m[d=2]
#     show var var
#      VAR = POC_FLUX_IN_100M[D=2]
#     list/prec=6 var_integral_PgC_year
#                  VARIABLE : (1E-9 * 12 * 1E-15 * 86400 * 365) * VAR_MUL_AREA[I=@SUM,J=@SUM]
#                  X        : 0.5 to 320.5
#                  Y        : 0.5 to 384.5
#                  ENSEMBLE : 0421
#               8.06490
    print(f'{diagnostic_values["cesm1_PI_esm"][POC_key]} (should be 8.06490)')

#     let var = photoC_diat_zint_100m[d=2]+photoC_sp_zint_100m[d=2]+photoC_diaz_zint_100m[d=2]
#     show var var
#      VAR = PHOTOC_DIAT_ZINT_100M[D=2]+PHOTOC_SP_ZINT_100M[D=2]+PHOTOC_DIAZ_ZINT_100M[D=2]
#     list/prec=6 var_integral_PgC_year
#                  VARIABLE : (1E-9 * 12 * 1E-15 * 86400 * 365) * VAR_MUL_AREA[I=@SUM,J=@SUM]
#                  X        : 0.5 to 320.5
#                  Y        : 0.5 to 384.5
#                  ENSEMBLE : 0421
#               55.4878
    print(f'{diagnostic_values["cesm1_PI_esm"][NPP_100m_key]} (should be 55.4878)')
else:
    print('No comparisons done, since cesm1_PI_esm experiment not included')

Comparison of cesm1_PI_esm
-0.02490595251150479 (should be -0.02491)
8.064900172824604 petagram C / year (should be 8.06490)
55.4877664992607 petagram C / year (should be 55.4878)
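
The comparison above is checked by eye. One way to automate it, sketched here as an assumption rather than something the notebook does, is to accept a value when it agrees with the reference to the number of decimals the reference was printed with. The helper `matches_reference` is hypothetical; the numbers are copied from the output above.

```python
def matches_reference(value, reference, ndigits):
    """Return True if value agrees with reference to ndigits decimal places.

    The tolerance is half a unit in the last printed digit of the reference.
    """
    tolerance = 0.5 * 10 ** (-ndigits)
    return abs(value - reference) <= tolerance


# (label, computed value, reference value, decimals printed in the reference)
checks = [
    ('Air-sea CO2 flux', -0.02490595251150479, -0.02491, 5),
    ('Sinking POC at 100 m', 8.064900172824604, 8.06490, 5),
    ('Net primary production, top 100m', 55.4877664992607, 55.4878, 4),
]

for label, value, reference, ndigits in checks:
    status = 'ok' if matches_reference(value, reference, ndigits) else 'MISMATCH'
    print(f'{label}: {value} vs {reference} -> {status}')
```

Tying the tolerance to the printed precision avoids picking an arbitrary relative tolerance for quantities that differ by several orders of magnitude.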


In [22]:
# Keith L's table

rounding[CO2_key] = 3
rounding[NPP_100m_key] = 2
pd.DataFrame(make_table([CO2_key, NPP_100m_key, POC_key], ['cesm1_PI', 'cesm1_hist']))

Unnamed: 0,Flux or Concentration,preindustrial (CESM1),1981-2005 (CESM1)
0,Air–sea CO2 flux (PgC/yr),-0.024,1.774
1,"Net primary production, top 100m (PgC/yr)",55.55,55.73
2,Sinking POC at 100 m (PgC/yr),8.08,8.01


In [23]:
# Keith M's original table
# Differences from his values are expected: we use a different set of preindustrial years.
# He may also weight each month equally when averaging month -> year,
# whereas we weight by the number of days per month.

rounding[CO2_key] = 2
rounding[NPP_100m_key] = 1

test_exps = ['cesm1_PI_esm', 'cesm1_hist_esm', 'cesm1_RCP45', 'cesm1_RCP85']
diagnostic_columns = [NPP_key,
                      POC_key,
                      CaCO3_key,
                      rain_key,
                      Nfix_key,
                      Ndep_key,
                      denitrif_key,
                      Ncycle_key,
                      CO2_key,
                      NPP_diat_key,
                      O2_key,
                      OMZ_key
                     ]
pd.DataFrame(make_table(diagnostic_columns, test_exps))

Unnamed: 0,Flux or Concentration,"preindustrial (CESM1, BPRP)",1990s (CESM1),RCP 4.5 2090s (CESM1),RCP 8.5 2090s (CESM1)
0,"Net primary production, full depth (PgC/yr)",-,-,-,-
1,Sinking POC at 100 m (PgC/yr),8.06,8.06,-,-
2,Sinking CaCO$_3$ at 100 m (PgC/yr),0.758,0.751,-,-
3,Rain ratio (CaCO$_3$/POC) at 100 m,0.094,0.093,-,-
4,Nitrogen fixation (TgN/yr),177,174,-,-
5,Nitrogen deposition (TgN/yr),6.7,30.0,-,-
6,Water Column Denitrification (TgN/yr),190,193,-,-
7,N cycle imbalance* (TgN/yr),-6,10,-,-
8,Air–sea CO2 flux (PgC/yr),-0.02,2.19,-,-
9,"Diatom primary production, full depth (%)",-,-,-,-
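
The cell comment above raises the possibility that equal month weighting, rather than weighting by days per month, explains part of the difference. A small sketch of how the two annual means differ follows. The monthly values are made up for illustration, and a 365-day no-leap calendar is assumed (consistent with the `year = 365 day` unit defined at the top of the notebook).

```python
import numpy as np

# Days per month in a no-leap (365-day) calendar
days_per_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

# Made-up monthly means with a seasonal cycle (illustrative only)
monthly_means = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0,
                          6.0, 5.0, 4.0, 3.0, 2.0, 1.0])

# Every month counts the same
equal_weight_mean = monthly_means.mean()

# Each month counts in proportion to its length
day_weight_mean = np.average(monthly_means, weights=days_per_month)

print(f'equal weighting: {equal_weight_mean}')
print(f'day weighting:   {day_weight_mean}')
print(f'difference:      {day_weight_mean - equal_weight_mean}')
```

The two means differ slightly whenever the field has a seasonal cycle, which is the kind of small offset that can show up in the last digit of a table entry.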


In [24]:
# Updated table for our paper
rounding[CO2_key] = 2
rounding[NPP_100m_key] = 1

test_exps = ['cesm2_PI', 'cesm2_hist', 'cesm2_SSP1-2.6', 'cesm2_SSP2-4.5', 'cesm2_SSP3-7.0', 'cesm2_SSP5-8.5']
diagnostic_columns = [NPP_key,
                      NPP_100m_key,
                      POC_key,
                      CaCO3_key,
                      rain_key,
                      Nfix_key,
                      Ndep_key,
                      denitrif_key,
                      denitrif2_key,
                      rivflux_key,
                      Ncycle_key,
                      CO2_key,
                      NPP_diat_key,
                      NPP_diat_100m_key,
                      O2_key,
                      OMZ_key,
                      O2_vol_keys[5],
                      O2_vol_keys[60],
                      O2_vol_keys[80]
                     ]
pd.DataFrame(make_table(diagnostic_columns, test_exps))

Unnamed: 0,Flux or Concentration,preindustrial (CESM2),1990-2014 (CESM2),RCP26 2090s (CESM2),RCP45 2090s (CESM2),RCP70 2090s (CESM2),RCP85 2090s (CESM2)
0,"Net primary production, full depth (PgC/yr)",-,-,-,-,-,-
1,"Net primary production, top 100m (PgC/yr)",-,-,-,-,-,-
2,Sinking POC at 100 m (PgC/yr),-,-,-,-,-,-
3,Sinking CaCO$_3$ at 100 m (PgC/yr),-,-,-,-,-,-
4,Rain ratio (CaCO$_3$/POC) at 100 m,-,-,-,-,-,-
5,Nitrogen fixation (TgN/yr),-,-,-,-,-,-
6,Nitrogen deposition (TgN/yr),-,-,-,-,-,-
7,Water Column Denitrification (TgN/yr),-,-,-,-,-,-
8,Sediment Denitrification (TgN/yr),-,-,-,-,-,-
9,Nitrogen River Flux (TgN/yr),-,-,-,-,-,-
