## Generating a Table of Global Fluxes

This notebook reads in the CESM1 historical run (for CMIP5),
the ensemble of 11 CESM2 historical runs (for CMIP6),
and also the four SSP CESM2 ensembles (for CMIP6).
A table is generated containing values listed [issue #6](https://github.com/marbl-ecosys/cesm2-marbl/issues/6)


> * Net primary production (PgC/yr) (`photoC_TOT_zint`)
> * Diatom primary production (%)   (`photoC_diat_zint`)
> * Sinking POC at 100 m (PgC/yr)   (`POC_FLUX_100m`)
> * Sinking CaCO3 at 100 m (PgC/yr) (`CaCO3_FLUX_100m`)
> * Rain ratio (CaCO3/POC) 100 m    (ratio of two above)
> * Nitrogen fixation (TgN/yr)      (`diaz_Nfix`)
> * Nitrogen deposition (TgN/yr)    (`NOx_FLUX` + `NHy_FLUX`)
> * Denitrification (TgN/yr)        (`DENITRIF`)
> * N cycle imbalance = deposition + fixation - denitrification (TgN/yr) # deposition = N* [see Kristen's notebook -- Biological Diagnostics?]
> * Air–sea CO2 flux (PgC yr21)     (`FG_CO2`)
> * Mean ocean oxygen (uM = umol/L = mmol/m^3)    (`O2`)
> * Volume where O2 <80 mmol/m^3 (10^15 m^3) # based on others
> * Volume where O2 <60 mmol/m^3 (10^15 m^3) # based on others
> * Volume where O2 <5 mmol/m^3 (10^15 m^3)  # based on others

Values will be computed one at a time, due to an issue with `xr.merge` and trying to read multiple variables at once.

### This notebook uses several python packages

The watermark package shows the version number used to help others recreate this environment.

In [1]:
import os
import time # Want finer control than %time allows

import cftime

import xarray as xr
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.colors as colors
import cmocean

import cartopy
import cartopy.crs as ccrs

import esmlab
import xpersist as xp

import intake
import intake_esm
import ncar_jobqueue
from dask.distributed import Client
from pint import UnitRegistry

# Add new units to UnitRegistry
units = UnitRegistry()
units.define('gram N = mol / 14 = gN')
units.define('gram C = mol / 12 = gC')
units.define('year = 365 day = yr')

%load_ext watermark
%watermark -a "Mike Levy" -d -iv -m -g -h

xarray        0.14.0
matplotlib    3.1.2
cmocean       2.0
esmlab        2019.4.27.post55
intake        0.5.3
cftime        1.0.3.4
numpy         1.17.3
xpersist      0.0.post25
intake_esm    2019.10.15.post40
ncar_jobqueue 2019.10.16.1
pandas        0.25.3
cartopy       0.17.0
Mike Levy 2020-01-17 

compiler   : GCC 7.3.0
system     : Linux
release    : 3.10.0-693.21.1.el7.x86_64
machine    : x86_64
processor  : x86_64
CPU cores  : 72
interpreter: 64bit
host name  : casper02
Git hash   : a0586fe3e0aba173c3539f55a58d7a273030c3e7


#### Spin up a dask cluster

Some of these computations take a while

In [2]:
cluster = ncar_jobqueue.NCARCluster(project='P93300606')
client = Client(cluster)
client

0,1
Client  Scheduler: tcp://128.117.181.208:43143  Dashboard: https://jupyterhub.ucar.edu/dav/user/mlevy/proxy/8787/status,Cluster  Workers: 0  Cores: 0  Memory: 0 B


In [3]:
# Start with just 2 tasks

cluster.scale(2)

### Read the intake_esm datastores

The `intake_esm` package is used to help identify which files belong in each experiment.
The `get_var_from_catalog()` function is a wrapper to read specific files.

In [4]:
catalogs = dict()
catalogs['cesm2'] = intake.open_esm_datastore('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/campaign-cesm2-cmip6-timeseries.json')

#cesm1 = intake.open_esm_datastore('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip5_NOT_CMORIZED.json')
catalogs['cesm1'] = intake.open_esm_datastore('/glade/work/mlevy/intake-esm-collection/json/glade-cesm1-cmip5-timeseries.json')

### Define our experiments

In [5]:
# Process for updating intake-esm catalog
#       1. download all data from HPSS via get_ocn_cmip5_files.sh
#       2. rm /glade/u/home/mlevy/.intake_esm/collections/CESM1-CMIP5.nc
#       3. regenerate it via Anderson's legacy intake-esm
#       4. re-run build intake collections notebook
#       5. commit change to .csv.gz in /glade/work/mlevy/intake-esm-collection/csv.gz/
# NOTE: steps 2-5 can be done with notebooks/intake-esm-collection-defs/rebuild.sh

vars = [
        'photoC_TOT_zint_100m', 'photoC_diat_zint_100m',
        'photoC_TOT_zint', 'photoC_diat_zint',
        'POC_FLUX_100m', 'CaCO3_FLUX_100m',
        'diaz_Nfix', 'NOx_FLUX', 'NHy_FLUX', 'DENITRIF',
        'SedDenitrif', 'DON_RIV_FLUX', 'DONr_RIV_FLUX',
        'FG_CO2', 'O2' ,
        'O2_under_thres' # add a thres dimension corresponding to limits
       ]
# experiments is a list of experiments to compute values for
experiments = dict()
experiments['cesm1'] = ['cesm1_PI',
                        'cesm1_PI_esm',
                        'cesm1_hist',
                        'cesm1_hist_esm',
                       ]
               # CESM 2
experiments['cesm2'] = ['cesm2_PI',
                        'cesm2_hist',
                        'cesm2_SSP1-2.6',
                        'cesm2_SSP2-4.5',
                        'cesm2_SSP3-7.0',
                        'cesm2_SSP5-8.5',
                       ]

# experiment_longnames defines the table headers
experiment_longnames={'cesm1_PI' : 'preindustrial (CESM1)',
                      'cesm1_PI_esm' : 'preindustrial (CESM1, BPRP)',
                      'cesm1_hist' : '1981-2005 (CESM1)',
                      'cesm1_hist_esm' : '1990s (CESM1)',
                      'cesm1_RCP45' : 'RCP 4.5 2090s (CESM1)', # not available yet
                      'cesm1_RCP85' : 'RCP 8.5 2090s (CESM1)', # not available yet
                      'cesm2_PI' : 'preindustrial (CESM2)',
                      'cesm2_hist' : '1990-2014 (CESM2)',
                      'cesm2_SSP1-2.6' : 'RCP26 2090s (CESM2)',
                      'cesm2_SSP2-4.5' : 'RCP45 2090s (CESM2)',
                      'cesm2_SSP3-7.0' : 'RCP70 2090s (CESM2)',
                      'cesm2_SSP5-8.5' : 'RCP85 2090s (CESM2)'}

# experiment_dict determines which module version & intake data each experiment uses
experiment_dict = {'cesm1_PI' : ('cesm1', 'piControl'),
                   'cesm1_PI_esm' : ('cesm1', 'esm-piControl'),
                   'cesm1_hist' : ('cesm1', 'historical'),
                   'cesm1_hist_esm' : ('cesm1', 'esm-hist'),
                   'cesm2_PI' : ('cesm2', 'piControl'),
                   'cesm2_hist' : ('cesm2', 'historical'),
                   'cesm2_SSP1-2.6' : ('cesm2', 'SSP1-2.6'),
                   'cesm2_SSP2-4.5' : ('cesm2', 'SSP2-4.5'),
                   'cesm2_SSP3-7.0' : ('cesm2', 'SSP3-7.0'),
                   'cesm2_SSP5-8.5' : ('cesm2', 'SSP5-8.5')
                  }

In [6]:
def get_var_from_catalog(catalog, variable, exp):

    print(f'Reading {variable} from {exp}\n----\n')
    start = time.time()
    cesm1_exp = None
    cesm1_var = None
    cesm2_exp = None
    if experiment_dict[exp][0] == 'cesm1':
        cesm1_exp = experiment_dict[exp][1]
    elif experiment_dict[exp][0] == 'cesm2':
        cesm2_exp = experiment_dict[exp][1]
    else:
        print(f'WARNING: can not determine model version from {exp}')
        return(None)
    
    # Note some variable rename shenanigans to account for changes between CESM1 and CESM2
    if cesm1_exp:
        depth_100m = False
        if variable == 'CaCO3_FLUX_100m':
            cesm1_var = 'CaCO3_FLUX_IN'
        elif variable == 'POC_FLUX_100m':
            cesm1_var = 'POC_FLUX_IN'
        elif variable in ['photoC_diat_zint_100m', 'photoC_diat_zint']:
            cesm1_var = 'photoC_diat'
            depth_100m = (variable == 'photoC_diat_zint_100m') # Will use 150m for "full depth" integral
        elif variable in ['photoC_TOT_zint_100m', 'photoC_TOT_zint']:
            cesm1_var = ['photoC_sp', 'photoC_diat', 'photoC_diaz']
            depth_100m = (variable == 'photoC_TOT_zint_100m') # Will use 150m for "full depth" integral
        elif variable not in ['SedDenitrif', 'DON_RIV_FLUX', 'DONr_RIV_FLUX']:
            # for variables in above list, cesm1_var is None
            cesm1_var = variable

    if cesm1_exp and (cesm1_var is None):
        print(f'{variable} is not available in {exp}')
        return(None)

    if cesm1_exp and (cesm1_var is not None):
        if type(cesm1_var) == list:
            tmp_dataset = dict()
            for var_from_list in cesm1_var:
                print(f'Reading {var_from_list} to compute {variable}')
                dq = catalog.search(experiment=cesm1_exp, variable=var_from_list).to_dataset_dict(cdf_kwargs={'chunks':{'time': 1}})
                tmp_dataset[var_from_list] = _read_var_from_exp(dq, exp, f'ocn.{cesm1_exp}.pop.h', variable, var_from_list)
                if depth_100m and 'z_t_150m' in tmp_dataset[var_from_list][variable].dims:
                    tmp_da = tmp_dataset[var_from_list][variable].isel(z_t_150m=slice(0,10)).rename({'z_t_150m' : 'z_t_100m'})
                    tmp_dataset[var_from_list][variable] = tmp_da
            dataset = tmp_dataset[cesm1_var[0]]
            for var_from_list in cesm1_var[1:]:
                dataset[variable].data = dataset[variable].data + tmp_dataset[var_from_list][variable].data
        else:
            dq = catalog.search(experiment=cesm1_exp, variable=cesm1_var).to_dataset_dict(cdf_kwargs={'chunks':{'time': 1}})
            dataset = _read_var_from_exp(dq, exp, f'ocn.{cesm1_exp}.pop.h', variable, cesm1_var)
            if depth_100m and 'z_t_150m' in dataset[variable].dims:
                tmp_da = dataset[variable].isel(z_t_150m=slice(0,10)).rename({'z_t_150m' : 'z_t_100m'})
                dataset[variable] = tmp_da
    if cesm2_exp:
        dq = catalog.search(experiment=cesm2_exp, variable=variable).to_dataset_dict(cdf_kwargs={'chunks':{'time': 1}})
        dataset = _read_var_from_exp(dq, exp, f'ocn.{cesm2_exp}.pop.h', variable, cesm1_var)
    
    end = time.time()
    print(f'\nDone reading {variable} from {exp} in {np.round(end - start, 1)}s\n')
    return(dataset)

def _read_var_from_exp(dq, exp, stream, variable, cesm1_var):
    # Define dataset
    dataset_full = dq[stream]

    # Initialize dataset with only time-invariant fields
    keep_vars_no_time = ['REGION_MASK', 'z_t', 'z_t_150m', 'dz', 'TAREA', 'TLONG', 'TLAT', 'member_id', 'ctrl_member_id']
    dataset = dataset_full.drop([v for v in dataset_full.variables if v not in keep_vars_no_time]).isel(time=0)

    # Then add variable / cesm1_var with full time dimension
    keep_vars_with_time = ['time', 'time_bound'] + [variable, cesm1_var]
    dataset_full = dataset_full
    for var in keep_vars_with_time:
        if var in dataset_full and var not in dataset:
            dataset[var] = dataset_full[var]
    del(dataset_full)
    if variable not in dataset:
        dataset = dataset.rename({cesm1_var : variable})
        if variable in ['POC_FLUX_100m', 'CaCO3_FLUX_100m']:
            dataset = dataset.isel(z_t=10)  # 100m is top of 11th level, or z_t = 10 counting from 0
    if (cesm1_var != variable) and (cesm1_var in dataset):
        dataset = dataset.drop(cesm1_var)
    # Limit output to open ocean
#     dataset[variable] = dataset[variable].where(dataset['REGION_MASK'] > 0)
    # Include marginal seas
    dataset[variable] = dataset[variable].where(dataset['REGION_MASK'] != 0)

    return(dataset)

In [7]:
def _get_TAREA_and_dz(catalog, exp):
    cluster.scale(2)

    # Read in any 3D variable to get dataset containing TAREA and dz
    var = 'DENITRIF'

    full_ds = get_var_from_catalog(catalog, var, exp)
    ds = full_ds['dz'].to_dataset(name='dz')
    ds['TAREA'] = full_ds['TAREA']
    ds[var] = full_ds[var].isel(time=0, member_id=0)
    ds['active'] = ds[var].copy()
    ds['active'].data = np.logical_not(np.isnan(ds[var].data))
    del(ds[var])

    return(ds.compute())

In [8]:
%%time

# Set up units
integral_units = dict()
integral_units['area'] = dict()
integral_units['volume'] = dict()

total_volume = dict()
for model_version in experiments:
    for exp in experiments[model_version]:
        xp_func = xp.persist_ds(_get_TAREA_and_dz, name=f'{exp}_active', trust_cache=True)
        ds = xp_func(catalogs[model_version], exp)

        integral_units['area'][exp] = units[ds['TAREA'].attrs['units']]
        integral_units['volume'][exp] = integral_units['area'][exp] * units[ds['dz'].attrs['units']]

        # Compute total volume of ocean
        # Sum wgt over active ocean cells
        total_volume[exp] = (ds['active'] * ds['TAREA'] * ds['dz']).sum().values

cluster.scale(0)
print('\n----\n')

# Estimate total volume in ocean
# (surface of earth is 71% water, avg depth is 3.7 km)
est_depth = 4*np.pi*0.71*((6371.22*units['km'])**2)*(3.7*units['km'])
for exp in total_volume:
    print(f'Ocean volume in {exp}: {(total_volume[exp] * integral_units["volume"][exp]).to("L")}')
print(f'Estimated ocean volume: {est_depth.to("L")}')

assuming cache is correct
reading cached file: xpersist_cache/cesm1_PI_active.nc
assuming cache is correct
reading cached file: xpersist_cache/cesm1_PI_esm_active.nc
assuming cache is correct
reading cached file: xpersist_cache/cesm1_hist_active.nc
assuming cache is correct
reading cached file: xpersist_cache/cesm1_hist_esm_active.nc
assuming cache is correct
reading cached file: xpersist_cache/cesm2_PI_active.nc
assuming cache is correct
reading cached file: xpersist_cache/cesm2_hist_active.nc
assuming cache is correct
reading cached file: xpersist_cache/cesm2_SSP1-2.6_active.nc
assuming cache is correct
reading cached file: xpersist_cache/cesm2_SSP2-4.5_active.nc
assuming cache is correct
reading cached file: xpersist_cache/cesm2_SSP3-7.0_active.nc
assuming cache is correct
reading cached file: xpersist_cache/cesm2_SSP5-8.5_active.nc

----

Ocean volume in cesm1_PI: 1.3251402862315105e+21 liter
Ocean volume in cesm1_PI_esm: 1.3251402862315105e+21 liter
Ocean volume in cesm1_hist: 1.3

### Individual Table Computations

In this section, we compute each of the requested values for each dataset

#### Net primary production (PgC/yr)

CESM1 doesn't have `photoC_TOT_zint`

#### Diatom primary production (%)

CESM1 doesn't have `photoC_diat_zint`

#### Sinking POC at 100 m (PgC/yr)

CESM1 doesn't have `POC_FLUX_100m`

#### Sinking CaCO3 at 100 m (PgC/yr)

CESM1 doesn't have `CaCO3_FLUX_100m`

#### Rain ratio (CaCO3/POC) 100 m

Missing necessary vars to compute

#### Nitrogen deposition (TgN/yr)

#### Denitrification (TgN/yr)

In [9]:
import datetime
def _debug_print(message):
    print(f'{str(datetime.datetime.now())}: {message}')

def _compute_global_average_and_resample(dataset, exp, variable, integral_units):
    unit_key = 'volume' if any(zdim in dataset[variable].dims for zdim in ['z_t_100m', 'z_t_150m', 'z_t']) else 'area'

    # 1) Compute global averages
    _debug_print('Calling esmlab.weighted_sum')
    wgts = dataset['TAREA']
    dims = ['nlat', 'nlon']
    if 'z_t_100m' in dataset[variable].dims:
        wgts = wgts * dataset['dz'].isel(z_t=slice(0,10))
        wgts = wgts.rename({'z_t' : 'z_t_100m'})
        dims.append('z_t_100m')
    elif 'z_t_150m' in dataset[variable].dims:
        wgts = wgts * dataset['dz'].isel(z_t=slice(0,15))
        wgts = wgts.rename({'z_t' : 'z_t_150m'})
        dims.append('z_t_150m')
    elif 'z_t' in dataset[variable].dims:
        wgts = wgts * dataset['dz']
        dims.append('z_t')
    glb_avg = esmlab.weighted_sum(dataset[variable], dim=dims, weights=wgts).to_dataset(name=variable)

    # 2) Resample to annual means
    _debug_print('Calling esmlab.resample')
    print(f'   ... computing for {exp}')
    glb_avg['time_bound'] = dataset['time_bound']
    ann_avg = esmlab.resample(glb_avg, freq='ann').compute()
    
    # store some unit metadata
    _debug_print('Determining units and returning')
    new_units = (units[dataset[variable].attrs['units']] * integral_units[unit_key][exp]).units
    ann_avg[variable].attrs['units'] = str(new_units)
    return ann_avg

def _compute_global_time_series_single_exp(catalog, exp, variable, integral_units):

    _debug_print('Getting data via intake-esm')
    cluster.scale(2)
    if variable == 'O2_under_thres':
        o2_thres = [5, 20, 60, 80]
        dataset = get_var_from_catalog(catalog, 'O2', exp)
        tmp_ann_avg = []
        for threshold in o2_thres:
            cluster.scale(12)
            dataset[variable] = xr.where(dataset['O2'] < threshold, 1, 0)
            dataset[variable].attrs['units'] = ''
            tmp_ann_avg.append(_compute_global_average_and_resample(dataset, exp, variable, integral_units))
        ann_avg = xr.concat(tmp_ann_avg, dim='o2_thres')
        ann_avg['o2_thres'] = o2_thres
    else:
        dataset = get_var_from_catalog(catalog, variable, exp)
        cluster.scale(12)
        ann_avg = _compute_global_average_and_resample(dataset, exp, variable, integral_units)

    return ann_avg

def compute_global_time_series(integral_units, variable, experiments, catalogs):
    new_units = dict()
    ann_avg = dict()

    print(f'Computing global average of {variable}...')
    for model_version in experiments:
        for exp in experiments[model_version]:
            # Compute global average (use xpersist to read from disk if available)
            xp_func = xp.persist_ds(_compute_global_time_series_single_exp, name=f'{exp}_{variable}', trust_cache=True)
            if (variable in ['SedDenitrif', 'DON_RIV_FLUX', 'DONr_RIV_FLUX']) and (model_version == 'cesm1'):
                continue
#             if (variable == 'O2_under_thres') and ('_PI' in exp):
#                 continue
            ann_avg[exp] = xp_func(catalogs[model_version], exp, variable, integral_units)

            # get new units right
            new_units[exp] = units[ann_avg[exp][variable].attrs['units']]

            print('')
    return ann_avg, new_units

In [10]:
%%time

new_units = dict()
ann_avg = dict()

for variable in vars:
    # I think this is ~60 minutes for 3D vars and 45 min for 2D vars?
    # (when using all 10 datasets)
    ann_avg[variable], new_units[variable] = compute_global_time_series(integral_units, variable, experiments, catalogs)

if 'O2' in new_units:
    new_units['O2_orig'] = new_units['O2'].copy()

Computing global average of photoC_TOT_zint_100m...
assuming cache is correct
reading cached file: xpersist_cache/cesm1_PI_photoC_TOT_zint_100m.nc

assuming cache is correct
reading cached file: xpersist_cache/cesm1_PI_esm_photoC_TOT_zint_100m.nc

assuming cache is correct
reading cached file: xpersist_cache/cesm1_hist_photoC_TOT_zint_100m.nc

assuming cache is correct
reading cached file: xpersist_cache/cesm1_hist_esm_photoC_TOT_zint_100m.nc

assuming cache is correct
reading cached file: xpersist_cache/cesm2_PI_photoC_TOT_zint_100m.nc

assuming cache is correct
reading cached file: xpersist_cache/cesm2_hist_photoC_TOT_zint_100m.nc

assuming cache is correct
reading cached file: xpersist_cache/cesm2_SSP1-2.6_photoC_TOT_zint_100m.nc

assuming cache is correct
reading cached file: xpersist_cache/cesm2_SSP2-4.5_photoC_TOT_zint_100m.nc

assuming cache is correct
reading cached file: xpersist_cache/cesm2_SSP3-7.0_photoC_TOT_zint_100m.nc

assuming cache is correct
reading cached file: xpers

## Reduce Data Sets

Data has been reduced to annual means, but the netcdf files contain every year in the dataset.
For generating tables, we want to look at specific time periods.

In [11]:
# shut down cluster (everything done locally from here on out)
cluster.scale(0)

####  Define the time periods we will average over

This could be done earlier in the notebook, but I think it makes sense to wait until we have annual / global means.

In [12]:
# NOTE: 2090-01-01 0:00:00 is the time stamp on the Dec 2089 monthly average
#       So slice("2090", "2100") would actually return Dec 2090 - Nov 2099
#       Specifying a day mid-month gets us to Jan 2090 - Dec 2099 (the 2090s)
#       (this can be verified by looking at time bounds)
time_slices_SSP = slice("2090-01-15", "2100-01-15")

time_slices = dict()

# 200 year averages for CESM1 PI runs, per Lindsay et al 2014
# (He starts 30 years prior to branch point, so I will too)
time_slices['cesm1_PI'] = slice(120, 320) # cfunits doesn't years too far in past; this is 121-07-01 - 320-07-01
time_slices['cesm1_PI_esm'] = slice(320, 520) # cfunits doesn't years too far in past; this is 321-07-01 - 520-07-01
# For CESM2, going from 50 years prior to first historical branch point
#                  to 50 years after end of last historical member
# TODO: These dates should be computed automatically based on intake metadata!
time_slices['cesm2_PI'] = slice(550, 1070) # cfunits doesn't years too far in past; this is 551-07-01 - 1070-07-01

# Historical runs all use slightly different time periods
# Note: that the annual mean data is actually running from July 1st to June 30th
#       these slices were defined to work with monthly data, but pick up the correct years as well
time_slices['cesm1_hist'] = slice("1981-01-15", "2006-01-15") # per Lindsay et al 2014
time_slices['cesm1_hist_esm'] = slice("1990-01-15", "2000-01-15") # per Moore et al 2013
time_slices['cesm2_hist'] = slice("1990-01-15", "2015-01-15") # For our paper

# RCP runs use 2090s
time_slices['cesm1_RCP45'] = time_slices_SSP
time_slices['cesm1_RCP85'] = time_slices_SSP
time_slices['cesm2_SSP1-2.6'] = time_slices_SSP
time_slices['cesm2_SSP2-4.5'] = time_slices_SSP
time_slices['cesm2_SSP3-7.0'] = time_slices_SSP
time_slices['cesm2_SSP5-8.5'] = time_slices_SSP

In [13]:
# Verify time bounds for each experiment
for exp in ann_avg[vars[0]]:
    try:
        bounds = list(ann_avg[vars[0]][exp].sel(time=time_slices[exp]).time_bound.values[ind] for ind in [(0,0), (-1,-1)])
    except:
        bounds = list(ann_avg[vars[0]][exp].isel(time=time_slices[exp]).time_bound.values[ind] for ind in [(0,0), (-1,-1)])
    print(f'Experiment: {exp}\nRequested time bounds\n----\n{bounds}\n\n')

Experiment: cesm1_PI
Requested time bounds
----
[cftime.DatetimeNoLeap(121, 1, 1, 0, 0, 0, 0, 2, 1), cftime.DatetimeNoLeap(321, 1, 1, 0, 0, 0, 0, 6, 1)]


Experiment: cesm1_PI_esm
Requested time bounds
----
[cftime.DatetimeNoLeap(321, 1, 1, 0, 0, 0, 0, 6, 1), cftime.DatetimeNoLeap(521, 1, 1, 0, 0, 0, 0, 3, 1)]


Experiment: cesm1_hist
Requested time bounds
----
[cftime.DatetimeNoLeap(1981, 1, 1, 0, 0, 0, 0, 0, 1), cftime.DatetimeNoLeap(2006, 1, 1, 0, 0, 0, 0, 4, 1)]


Experiment: cesm1_hist_esm
Requested time bounds
----
[cftime.DatetimeNoLeap(1990, 1, 1, 0, 0, 0, 0, 2, 1), cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 0, 5, 1)]


Experiment: cesm2_PI
Requested time bounds
----
[cftime.DatetimeNoLeap(551, 1, 1, 0, 0, 0, 0, 5, 1), cftime.DatetimeNoLeap(1071, 1, 1, 0, 0, 0, 0, 0, 1)]


Experiment: cesm2_hist
Requested time bounds
----
[cftime.DatetimeNoLeap(1990, 1, 1, 0, 0, 0, 0, 2, 1), cftime.DatetimeNoLeap(2015, 1, 1, 0, 0, 0, 0, 6, 1)]


Experiment: cesm2_SSP1-2.6
Requested time bounds


#### Define the units to use in final table

Note that in the first cell of the notebook, we defined a year to be 365 days as well as `PgC` and `TgN` units.

In [14]:
# Define final units
PgC_per_year = 'PgC/yr'
TgN_per_year = 'TgN/yr'
uM = 'uM'

final_units = dict()
final_units['photoC_TOT_zint'] = PgC_per_year
final_units['photoC_diat_zint'] = PgC_per_year
final_units['photoC_TOT_zint_100m'] = PgC_per_year
final_units['photoC_diat_zint_100m'] = PgC_per_year
final_units['POC_FLUX_100m'] = PgC_per_year
final_units['CaCO3_FLUX_100m'] = PgC_per_year
final_units['diaz_Nfix'] = TgN_per_year
final_units['NOx_FLUX'] = TgN_per_year
final_units['NHy_FLUX'] = TgN_per_year
final_units['DENITRIF'] = TgN_per_year
final_units['SedDenitrif'] = TgN_per_year
final_units['DON_RIV_FLUX'] = TgN_per_year
final_units['DONr_RIV_FLUX'] = TgN_per_year
final_units['FG_CO2'] = PgC_per_year
final_units['O2'] = 'uM'
final_units['O2_under_thres'] = 'Pm * m^2'

#### Define labels for rows in each table

Also determine correct number of digits to write each value out to

In [15]:
# Define keys that will go into table columns

def O2_vol_keys(o2_thres):
    if o2_thres == 20:
        return f'OMZ volume (10$^1$$^5$ m$^3$; <20 $\mu$M)'
    return f'Volume (10$^1$$^5$ m$^3$) where O$_2$ <{o2_thres} $\mu$M)'

# SETTING UP NAMES FOR ALL TABLE KEYS
POC_key = f'Sinking POC at 100 m ({PgC_per_year})'
CaCO3_key = f'Sinking CaCO$_3$ at 100 m ({PgC_per_year})'
rain_key = f'Rain ratio (CaCO$_3$/POC) at 100 m'
NPP_key = f'Net primary production, full depth ({PgC_per_year})'
NPP_diat_key = f'Diatom primary production, full depth (%)'
NPP_100m_key = f'Net primary production, top 100m ({PgC_per_year})'
NPP_diat_100m_key = f'Diatom primary production, top 100m (%)'
Nfix_key = f'Nitrogen fixation ({TgN_per_year})'
Ndep_key = f'Nitrogen deposition ({TgN_per_year})'
denitrif_key = f'Water Column Denitrification ({TgN_per_year})'
denitrif2_key = f'Sediment Denitrification ({TgN_per_year})'
rivflux_key = f'Nitrogen River Flux ({TgN_per_year})'
Ncycle_key = f'N cycle imbalance* ({TgN_per_year})'
CO2_key = f'Air–sea CO2 flux ({PgC_per_year})'
O2_key = f'Mean ocean oxygen ($\mu$M)'

# Define rounding digit count here
rounding = dict()
rounding[POC_key] = 2
rounding[CaCO3_key] = 3
rounding[rain_key] = 3
rounding[NPP_key] = 1
rounding[NPP_diat_key] = 0
rounding[NPP_100m_key] = 1
rounding[NPP_diat_100m_key] = 0
rounding[Nfix_key] = 0
rounding[Ndep_key] = 1
rounding[denitrif_key] = 0
rounding[denitrif2_key] = 0
rounding[rivflux_key] = 0
rounding[Ncycle_key] = 0
rounding[CO2_key] = 2
rounding[O2_key] = 0
for o2_thres in [5, 20, 60, 80]:
    rounding[O2_vol_keys(o2_thres)] = 0

#### Average over all ensemble members and time (for proper time period)

In [16]:
def get_time_and_ensemble_mean(variable, ann_avg, exp, new_units, final_units):
    try:
        if exp in ['cesm1_PI', 'cesm1_PI_esm', 'cesm2_PI']:
            # Need isel instead of sel since PI slices are in index space rather than years
            ens_time_mean = (ann_avg[variable][exp][variable].isel(time=time_slices[exp]).mean('member_id')).mean('time').values
        else:
            ens_time_mean = (ann_avg[variable][exp][variable].sel(time=time_slices[exp]).mean('member_id')).mean('time').values
    except:
        print(f'   * Can not compute {variable} for {exp}')
        return('-')
    return((ens_time_mean * new_units[variable][exp]).to(final_units[variable]))

In [17]:
%%time

diagnostic_values = dict()
for model_version in experiments:
    for exp in experiments[model_version]:
        diagnostic_values[exp] = dict()
        # Compute each value by hand
        print(f'Computing 100m POC flux for {exp}')
        diagnostic_values[exp][POC_key] = get_time_and_ensemble_mean('POC_FLUX_100m', ann_avg, exp, new_units, final_units)

        print(f'Computing 100m CaCO3 flux for {exp}')
        diagnostic_values[exp][CaCO3_key] = get_time_and_ensemble_mean('CaCO3_FLUX_100m', ann_avg, exp, new_units, final_units)

        print(f'Computing 100m rain rate for {exp}')
        try:
            diagnostic_values[exp][rain_key] = (diagnostic_values[exp][CaCO3_key] /
                                                 diagnostic_values[exp][POC_key])
        except:
            print(f'   * Can not compute rain rate for {exp}')

        print(f'Computing full depth net primary production for {exp}')
        diagnostic_values[exp][NPP_key] = get_time_and_ensemble_mean('photoC_TOT_zint', ann_avg, exp, new_units, final_units)

        print(f'Computing full depth primary production from diatoms for {exp}')
        try:
            diagnostic_values[exp][NPP_diat_key] = 100*(get_time_and_ensemble_mean('photoC_diat_zint', ann_avg, exp, new_units, final_units) /
                                                        diagnostic_values[exp][NPP_key])
        except:
            print(f'   * Can not compute primary production from diatoms for {exp}')

        print(f'Computing top 100m net primary production for {exp}')
        diagnostic_values[exp][NPP_100m_key] = get_time_and_ensemble_mean('photoC_TOT_zint_100m', ann_avg, exp, new_units, final_units)

        print(f'Computing top 100m primary production from diatoms for {exp}')
        try:
            diagnostic_values[exp][NPP_diat_100m_key] = 100*(get_time_and_ensemble_mean('photoC_diat_zint_100m', ann_avg, exp, new_units, final_units) /
                                                        diagnostic_values[exp][NPP_100m_key])
        except:
            print(f'   * Can not compute primary production from diatoms for {exp}')

        print(f'Computing Nfixation for {exp}')
        diagnostic_values[exp][Nfix_key] = get_time_and_ensemble_mean('diaz_Nfix', ann_avg, exp, new_units, final_units)

        print(f'Computing Ndep for {exp}')
        diagnostic_values[exp][Ndep_key] = (get_time_and_ensemble_mean('NOx_FLUX', ann_avg, exp, new_units, final_units) +
                                            get_time_and_ensemble_mean('NHy_FLUX', ann_avg, exp, new_units, final_units))

        print(f'Computing Water Column Denitrif for {exp}')
        diagnostic_values[exp][denitrif_key] = get_time_and_ensemble_mean('DENITRIF', ann_avg, exp, new_units, final_units)

        print(f'Computing Sediment Denitrif for {exp}')
        diagnostic_values[exp][denitrif2_key] = get_time_and_ensemble_mean('SedDenitrif', ann_avg, exp, new_units, final_units)

        print(f'Computing Nitrogen River Flux for {exp}')
        diagnostic_values[exp][rivflux_key] = (get_time_and_ensemble_mean('DON_RIV_FLUX', ann_avg, exp, new_units, final_units) +
                                               get_time_and_ensemble_mean('DONr_RIV_FLUX', ann_avg, exp, new_units, final_units))

        print(f'Computing Nitrogen Cycle imbalance for {exp}')
        table_key = 'N cycle imbalance* (TgN yr$^{-1}$)'
        try:
            diagnostic_values[exp][Ncycle_key] = (diagnostic_values[exp][Ndep_key] +
                                                  diagnostic_values[exp][Nfix_key] -
                                                  diagnostic_values[exp][denitrif_key])
            try:
                diagnostic_values[exp][Ncycle_key] = (diagnostic_values[exp][Ncycle_key] -
                                                      diagnostic_values[exp][denitrif2_key] - 
                                                      diagnostic_values[exp][rivflux_key])
            except:
                print(f'   * No additional denitrification terms for {exp}')
                pass
        except:
            print(f'   * Can not compute Ncycle imbalance for {exp}')

        print(f'Computing air-sea CO2 Flux for {exp}')
        diagnostic_values[exp][CO2_key] = get_time_and_ensemble_mean('FG_CO2', ann_avg, exp, new_units, final_units)

        # Update O2 units to account for fact that we are dividing my total volume
        print(f'Computing O2 concentration for {exp}')
        try:
            new_units['O2'][exp] = new_units['O2_orig'][exp] / integral_units["volume"][exp]
            diagnostic_values[exp][O2_key] = get_time_and_ensemble_mean('O2', ann_avg, exp, new_units, final_units)/(total_volume[exp])
        except:
            print(f'   * Can not compute O2 concentration for {exp}')

        try:
            if exp in ann_avg['O2_under_thres']:
                for n, o2_thres in enumerate(ann_avg['O2_under_thres'][exp]['o2_thres'].data):
                    print(f'Computing volume where O2 < {o2_thres} uM for {exp}')
                    diagnostic_values[exp][O2_vol_keys(o2_thres)] = get_time_and_ensemble_mean('O2_under_thres', ann_avg, exp, new_units, final_units)[n]
        except:
            print(f'   * Can not compute O2 volumes under thresholds for {exp}')

        if exp != experiments[model_version][-1]:
            print('\n----\n')


Computing 100m POC flux for cesm1_PI
Computing 100m CaCO3 flux for cesm1_PI
Computing 100m rain rate for cesm1_PI
Computing full depth net primary production for cesm1_PI
Computing full depth primary production from diatoms for cesm1_PI
Computing top 100m net primary production for cesm1_PI
Computing top 100m primary production from diatoms for cesm1_PI
Computing Nfixation for cesm1_PI
Computing Ndep for cesm1_PI
Computing Water Column Denitrif for cesm1_PI
Computing Sediment Denitrif for cesm1_PI
   * Can not compute SedDenitrif for cesm1_PI
Computing Nitrogen River Flux for cesm1_PI
   * Can not compute DON_RIV_FLUX for cesm1_PI
   * Can not compute DONr_RIV_FLUX for cesm1_PI
Computing Nitrogen Cycle imbalance for cesm1_PI
   * No additional denitrification terms for cesm1_PI
Computing air-sea CO2 Flux for cesm1_PI
Computing O2 concentration for cesm1_PI
Computing volume where O2 < 5 uM for cesm1_PI
Computing volume where O2 < 20 uM for cesm1_PI
Computing volume where O2 < 60 uM for 

#### Actually make the tables

In [18]:
def make_table(diag_columns, test_exps):
    table_dict = dict()
    table_dict['Flux or Concentration'] = []
    for table_key in diag_columns:
        table_dict['Flux or Concentration'].append(table_key)
        for exp in test_exps:
            if experiment_longnames[exp] not in table_dict:
                table_dict[experiment_longnames[exp]] = []
            try:
                # Workaround to drop decimal place when rounding to nearest integer
                if exp != 'diff':
                    round_to = rounding[table_key]
                else:
                    round_to = 3
                rounded_val = np.round(diagnostic_values[exp][table_key].magnitude, round_to)
                if rounding[table_key] == 0:
                    rounded_val = np.int(rounded_val)
                table_dict[experiment_longnames[exp]].append(str(rounded_val))
                # Add asterisk denoting CESM1 integrals are 150m, not full depth
                if ('cesm1' in exp) and (table_key in [NPP_key, NPP_diat_key]):
                    table_dict[experiment_longnames[exp]][-1] = table_dict[experiment_longnames[exp]][-1] + '*'
            except:
                table_dict[experiment_longnames[exp]].append('-')
    return(table_dict)

In [19]:
if 'cesm1_PI_esm' in diagnostic_values:
    print('Comparison of cesm1_PI_esm')
#     let var = fg_co2[d=2]
#     show var var
#      VAR = FG_CO2[D=2]
#     list var_integral_PgC_year
#                  VARIABLE : (1E-9 * 12 * 1E-15 * 86400 * 365) * VAR_MUL_AREA[I=@SUM,J=@SUM]
#                  X        : 0.5 to 320.5
#                  Y        : 0.5 to 384.5
#                  ENSEMBLE : 0421
#              -0.02491
    print(f'{diagnostic_values["cesm1_PI_esm"][CO2_key].magnitude} (should be -0.02491)')

#     let var = POC_FLUX_IN_100m[d=2]
#     show var var
#      VAR = POC_FLUX_IN_100M[D=2]
#     list/prec=6 var_integral_PgC_year
#                  VARIABLE : (1E-9 * 12 * 1E-15 * 86400 * 365) * VAR_MUL_AREA[I=@SUM,J=@SUM]
#                  X        : 0.5 to 320.5
#                  Y        : 0.5 to 384.5
#                  ENSEMBLE : 0421
#               8.06490
    print(f'{diagnostic_values["cesm1_PI_esm"][POC_key]} (should be 8.06490)')

# let var = photoC_diat_zint_100m[d=2]+photoC_sp_zint_100m[d=2]+photoC_diaz_zint_100m[d=2]
# show var var
#  VAR = PHOTOC_DIAT_ZINT_100M[D=2]+PHOTOC_SP_ZINT_100M[D=2]+PHOTOC_DIAZ_ZINT_100M[D=2]
# list/prec=6 var_integral_PgC_year
#              VARIABLE : (1E-9 * 12 * 1E-15 * 86400 * 365) * VAR_MUL_AREA[I=@SUM,J=@SUM]
#              X        : 0.5 to 320.5
#              Y        : 0.5 to 384.5
#              ENSEMBLE : 0421
#           55.4878
    print(f'{diagnostic_values["cesm1_PI_esm"][NPP_100m_key]} (should be 55.4878)')
else:
    print('No comparisons done, since cesm1_PI_esm experiment not included')

Comparison of cesm1_PI_esm
-0.02490595251150479 (should be -0.02491)
8.064900172824604 petagram C / year (should be 8.06490)
55.4877664992607 petagram C / year (should be 55.4878)


In [20]:
# Keith L's table
rows = [CO2_key, NPP_100m_key, POC_key]
# Match number of digits in orginal paper
rounding[CO2_key] = 3
rounding[NPP_100m_key] = 2

# Add difference column
new_exp = 'diff'
diagnostic_values[new_exp] = dict()
experiment_longnames[new_exp] = 'Difference'
for key in rows:
    if ('cesm1_hist' in diagnostic_values) and ('cesm1_PI' in diagnostic_values):
        try:
            # If key has not been populated, we want a dash here
            diagnostic_values[new_exp][key] = diagnostic_values['cesm1_hist'][key] - diagnostic_values['cesm1_PI'][key]
        except:
            diagnostic_values[new_exp][key] = '-'
    else:
        diagnostic_values[new_exp][key] = '-'

pd.DataFrame(make_table(rows, ['cesm1_PI', 'cesm1_hist', new_exp]))

Unnamed: 0,Flux or Concentration,preindustrial (CESM1),1981-2005 (CESM1),Difference
0,Air–sea CO2 flux (PgC/yr),-0.024,1.774,1.798
1,"Net primary production, top 100m (PgC/yr)",55.55,55.73,0.173
2,Sinking POC at 100 m (PgC/yr),8.08,8.01,-0.075


In [21]:
# Keith M's original table
# We use a different set of preindustrial years
# Also, maybe he uses equal weighting for month -> year instead of number of days per month?

rounding[CO2_key] = 2
rounding[NPP_100m_key] = 1

test_exps = ['cesm1_PI_esm', 'cesm1_hist_esm', 'cesm1_RCP45', 'cesm1_RCP85']
diagnostic_columns = [NPP_key,
                      NPP_100m_key,
                      POC_key,
                      CaCO3_key,
                      rain_key,
                      Nfix_key,
                      Ndep_key,
                      denitrif_key,
                      Ncycle_key,
                      CO2_key,
                      NPP_diat_key,
                      NPP_diat_100m_key,
                      O2_key,
                      O2_vol_keys(20)
                     ]
pd.DataFrame(make_table(diagnostic_columns, test_exps))

Unnamed: 0,Flux or Concentration,"preindustrial (CESM1, BPRP)",1990s (CESM1),RCP 4.5 2090s (CESM1),RCP 8.5 2090s (CESM1)
0,"Net primary production, full depth (PgC/yr)",56.0*,56.5*,-,-
1,"Net primary production, top 100m (PgC/yr)",55.5,56.0,-,-
2,Sinking POC at 100 m (PgC/yr),8.06,8.06,-,-
3,Sinking CaCO$_3$ at 100 m (PgC/yr),0.758,0.751,-,-
4,Rain ratio (CaCO$_3$/POC) at 100 m,0.094,0.093,-,-
5,Nitrogen fixation (TgN/yr),177,174,-,-
6,Nitrogen deposition (TgN/yr),6.7,30.0,-,-
7,Water Column Denitrification (TgN/yr),190,193,-,-
8,N cycle imbalance* (TgN/yr),-6,10,-,-
9,Air–sea CO2 flux (PgC/yr),-0.02,2.19,-,-


In [22]:
# Updated table for our paper
rounding[CO2_key] = 2
rounding[NPP_100m_key] = 1

test_exps = ['cesm2_PI', 'cesm2_hist', 'cesm2_SSP1-2.6', 'cesm2_SSP2-4.5', 'cesm2_SSP3-7.0', 'cesm2_SSP5-8.5']
diagnostic_columns = [NPP_key,
                      NPP_100m_key,
                      POC_key,
                      CaCO3_key,
                      rain_key,
                      Nfix_key,
                      Ndep_key,
                      denitrif_key,
                      denitrif2_key,
                      rivflux_key,
                      Ncycle_key,
                      CO2_key,
                      NPP_diat_key,
                      NPP_diat_100m_key,
                      O2_key,
                      O2_vol_keys(20),
                      O2_vol_keys(5),
                      O2_vol_keys(60),
                      O2_vol_keys(80)
                     ]
pd.DataFrame(make_table(diagnostic_columns, test_exps))

Unnamed: 0,Flux or Concentration,preindustrial (CESM2),1990-2014 (CESM2),RCP26 2090s (CESM2),RCP45 2090s (CESM2),RCP70 2090s (CESM2),RCP85 2090s (CESM2)
0,"Net primary production, full depth (PgC/yr)",48.4,48.9,48.9,49.3,49.7,50.2
1,"Net primary production, top 100m (PgC/yr)",48.0,48.5,48.5,48.9,49.3,49.8
2,Sinking POC at 100 m (PgC/yr),7.0,7.07,6.84,6.86,6.89,6.73
3,Sinking CaCO$_3$ at 100 m (PgC/yr),0.769,0.769,0.805,0.806,0.804,0.813
4,Rain ratio (CaCO$_3$/POC) at 100 m,0.11,0.109,0.118,0.117,0.117,0.121
5,Nitrogen fixation (TgN/yr),242.0,244.0,263.0,271.0,281.0,287.0
6,Nitrogen deposition (TgN/yr),13.4,37.8,25.5,33.5,41.5,38.8
7,Water Column Denitrification (TgN/yr),185.0,192.0,234.0,248.0,256.0,265.0
8,Sediment Denitrification (TgN/yr),68.0,72.0,71.0,70.0,72.0,70.0
9,Nitrogen River Flux (TgN/yr),5.0,9.0,9.0,9.0,9.0,9.0
