## Generating a Table of Global Fluxes

This notebook reads in the CESM1 historical run (for CMIP5),
the ensemble of 11 CESM2 historical runs (for CMIP6),
and also the four SSP CESM2 ensembles (for CMIP6).
The resulting table contains the values listed in [issue #6](https://github.com/marbl-ecosys/cesm2-marbl/issues/6):


> * Net primary production (PgC/yr) (`photoC_TOT_zint`)
> * Diatom primary production (%)   (`photoC_diat_zint`)
> * Sinking POC at 100 m (PgC/yr)   (`POC_FLUX_100m`)
> * Sinking CaCO3 at 100 m (PgC/yr) (`CaCO3_FLUX_100m`)
> * Rain ratio (CaCO3/POC) 100 m    (ratio of two above)
> * Nitrogen fixation (TgN/yr)      (`diaz_Nfix`)
> * Nitrogen deposition (TgN/yr)    (`NOx_FLUX` + `NHy_FLUX`)
> * Denitrification (TgN/yr)        (`DENITRIF`)
> * N cycle imbalance = deposition + fixation - denitrification (TgN/yr) # deposition = N* [see Kristen's notebook -- Biological Diagnostics?]
> * Air–sea CO2 flux (PgC/yr)       (`FG_CO2`)
> * Mean ocean oxygen (mmol/m^3)    (`O2`)
> * Volume where O2 <80 mmol/m^3 (10^15 m^3) # based on others
> * Volume where O2 <60 mmol/m^3 (10^15 m^3) # based on others
> * Volume where O2 <5 mmol/m^3 (10^15 m^3)  # based on others

Values will be computed one at a time, due to an issue with `xr.merge` and trying to read multiple variables at once.
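
Several entries in the list above are derived from other entries rather than read directly from model output. The following is a minimal sketch of those derivations in plain Python. The function names and all input numbers are hypothetical and chosen only for illustration; they are not results from CESM.

```python
def rain_ratio(caco3_flux, poc_flux):
    """Ratio of sinking CaCO3 to sinking POC (both in the same units)."""
    return caco3_flux / poc_flux


def n_cycle_imbalance(deposition, fixation, denitrification):
    """N cycle imbalance = deposition + fixation - denitrification."""
    return deposition + fixation - denitrification


def volume_below_threshold(o2, cell_volume, threshold):
    """Total volume of the cells whose O2 concentration is below `threshold`."""
    return sum(vol for conc, vol in zip(o2, cell_volume) if conc < threshold)


# Hypothetical inputs, for illustration only
example_o2 = [3.0, 50.0, 70.0, 200.0]    # mmol/m^3
example_volume = [1.0, 2.0, 3.0, 4.0]    # arbitrary volume units

ratio = rain_ratio(1.0, 8.0)
imbalance = n_cycle_imbalance(30.0, 100.0, 120.0)
low_o2_volume = volume_below_threshold(example_o2, example_volume, 80.0)
```

In the notebook itself these quantities would be computed from the globally integrated model fields rather than from flat Python lists.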

### This notebook uses several Python packages

The `watermark` extension prints the version of each imported package to help others recreate this environment.

In [None]:
%matplotlib inline
import os

import cftime

import xarray as xr
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.colors as colors
import cmocean

import cartopy
import cartopy.crs as ccrs

import esmlab

import intake
import intake_esm
import ncar_jobqueue
from dask.distributed import Client
import pint

%load_ext watermark
%watermark -a "Mike Levy" -d -iv -m -g -h

#### Spin up a dask cluster

Some of these computations take a while.

In [None]:
cluster = ncar_jobqueue.NCARCluster(project='P93300606')
client = Client(cluster)
client

In [None]:
cluster.scale(4)

### Read the intake_esm datastores

The `intake_esm` package is used to help identify which files belong in each experiment.
The `get_var_from_catalogs()` function is a wrapper to read specific files.

In [None]:
catalogs = dict()
#cesm2 = intake.open_esm_datastore('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/campaign_via_glade-cmip6_NOT_CMORIZED.json')
catalogs['cesm2'] = intake.open_esm_datastore('/glade/work/mlevy/intake-esm-collection/json/campaign-cesm2-cmip6-timeseries.json')

#cesm1 = intake.open_esm_datastore('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip5_NOT_CMORIZED.json')
catalogs['cesm1'] = intake.open_esm_datastore('/glade/work/mlevy/intake-esm-collection/json/glade-cesm1-cmip5-timeseries.json')

In [None]:
# NOTE: each monthly average is stamped at the end of its averaging interval,
#       e.g. 1990-01-01 0:00:00 is the time stamp on the Dec 1989 monthly average.
#       So slice("1990-01-01", "1999-12-31") would actually return Dec 1989 - Nov 1999.
#       Specifying a day mid-month gets us Jan 1990 - Dec 1999
#       (this can be verified by looking at time bounds)
time_slices_hist = slice("1990-01-15", "2000-01-15")
time_slices_SSP = slice("2090-01-15", "2100-01-15")
time_slices = dict()

time_slices['cesm1_hist'] = time_slices_hist
time_slices['cesm2_hist'] = time_slices_hist
time_slices['cesm2_SSP1-2.6'] = time_slices_SSP
time_slices['cesm2_SSP2-4.5'] = time_slices_SSP
time_slices['cesm2_SSP3-7.0'] = time_slices_SSP
time_slices['cesm2_SSP5-8.5'] = time_slices_SSP

def get_var_from_catalogs(catalogs, variable):
    datasets = dict()
    
    dq = dict()
    # CESM1 is historical only, CESM2 also has SSPs
    dq['cesm1'] = catalogs['cesm1'].search(experiment=['historical'], variable=variable).to_dataset_dict(cdf_kwargs={'chunks':{'time': 48}})
    dq['cesm2'] = catalogs['cesm2'].search(experiment=['historical', 'SSP1-2.6', 'SSP2-4.5', 'SSP3-7.0', 'SSP5-8.5'], variable=variable).to_dataset_dict(cdf_kwargs={'chunks':{'time': 48}})

    # Define datasets
    datasets['cesm1_hist'] = dq['cesm1']['ocn.historical.pop.h']
    # UNCOMMENT LINES BELOW WHEN HAPPY WITH FULL TABLE
#     datasets['cesm2_hist'] = dq['cesm2']['ocn.historical.pop.h']
#     datasets['cesm2_SSP1-2.6'] = dq['cesm2']['ocn.SSP1-2.6.pop.h']
#     datasets['cesm2_SSP2-4.5'] = dq['cesm2']['ocn.SSP2-4.5.pop.h']
#     datasets['cesm2_SSP3-7.0'] = dq['cesm2']['ocn.SSP3-7.0.pop.h']
    datasets['cesm2_SSP5-8.5'] = dq['cesm2']['ocn.SSP5-8.5.pop.h']

    keep_vars = ['z_t', 'z_t_150m', 'dz', 'TAREA', 'TLONG', 'TLAT', 'time', 'time_bound', 'member_id', 'ctrl_member_id'] + [variable]
    for exp in datasets:
        datasets[exp] = datasets[exp].drop([v for v in datasets[exp].variables if v not in keep_vars]).sel(time=time_slices[exp])

    return datasets
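
The comment at the top of the cell above explains that monthly averages carry a time stamp at the end of their averaging interval. As a rough illustration of why mid-month slice bounds help, the sketch below rebuilds such time stamps with the standard `datetime` module (a stand-in for the model's `cftime` calendar) and compares year-boundary bounds with mid-month bounds. The helper names are hypothetical.

```python
from datetime import datetime


def end_of_month_stamps(first_year, last_year):
    """Time stamps for monthly means, placed at the END of each averaging interval."""
    stamps = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # The mean for (year, month) is stamped at 00:00 on the 1st of the next month
            if month == 12:
                stamps.append(datetime(year + 1, 1, 1))
            else:
                stamps.append(datetime(year, month + 1, 1))
    return stamps


def averaging_month(stamp):
    """Recover the (year, month) that an end-of-interval stamp actually covers."""
    if stamp.month == 1:
        return (stamp.year - 1, 12)
    return (stamp.year, stamp.month - 1)


stamps = end_of_month_stamps(1989, 2000)

# Bounds on year boundaries are shifted one month early
naive = [s for s in stamps if datetime(1990, 1, 1) <= s <= datetime(1999, 12, 31)]

# Mid-month bounds line up with calendar years
shifted = [s for s in stamps if datetime(1990, 1, 15) <= s <= datetime(2000, 1, 15)]

print(averaging_month(naive[0]), averaging_month(naive[-1]))
print(averaging_month(shifted[0]), averaging_month(shifted[-1]))
```

Both selections contain 120 months, but only the mid-month bounds cover January of the first year through December of the last.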


In [None]:
from pint import UnitRegistry
units = UnitRegistry()

# Set up units
integral_units = dict()
integral_units['area'] = dict()
integral_units['volume'] = dict()

# Read in any variable to get dataset containing TAREA and z_t
tmp_data = get_var_from_catalogs(catalogs, 'IRON_FLUX')

for exp in tmp_data:
    integral_units['area'][exp] = units[tmp_data[exp]['TAREA'].attrs['units']]
    integral_units['volume'][exp] = integral_units['area'][exp] * units[tmp_data[exp]['z_t'].attrs['units']]

### Individual Table Computations

In this section, we compute each of the requested values for each dataset

#### Net primary production (PgC/yr)

CESM1 doesn't have `photoC_TOT_zint`

#### Diatom primary production (%)

CESM1 doesn't have `photoC_diat_zint`

#### Sinking POC at 100 m (PgC/yr)

CESM1 doesn't have `POC_FLUX_100m`

#### Sinking CaCO3 at 100 m (PgC/yr)

CESM1 doesn't have `CaCO3_FLUX_100m`

#### Rain ratio (CaCO3/POC) 100 m

Missing necessary vars to compute

#### Nitrogen deposition (TgN/yr)

#### Denitrification (TgN/yr)

In [None]:
all_data = dict()

# Process for updating intake-esm catalog
#       1. download all data from HPSS via get_ocn_cmip5_files.sh
#       2. rm /glade/u/home/mlevy/.intake_esm/collections/CESM1-CMIP5.nc
#       3. regenerate it via Anderson's legacy intake-esm
#       4. re-run build intake collections notebook
#       5. commit change to .csv.gz in /glade/work/mlevy/intake-esm-collection/csv.gz/

vars = ['diaz_Nfix', 'NOx_FLUX', 'NHy_FLUX', 'DENITRIF']
for var in vars:
    all_data[var] = get_var_from_catalogs(catalogs, var)

# Verify time bounds for each experiment
for exp in all_data[vars[0]]:
    bounds = list(all_data[vars[0]][exp].time_bound.values[ind] for ind in [(0,0), (-1,1)])
    print(f'Experiment: {exp}\nBounds\n----\n{bounds}\n\n')

In [None]:
def compute_global_averages(datasets, integral_units, variable):
    """Integrate `variable` over the horizontal grid (and over depth for 3D fields).

    Despite the name, this returns area- or volume-weighted sums rather than
    averages, along with the units of the integrated quantity per experiment.
    """
    experiments = list(datasets[variable].keys())
    # Weights are taken from the first experiment and reused for all of them
    first = datasets[variable][experiments[0]]
    wgts = first['TAREA'].isel(time=0)
    dims = ['nlat', 'nlon']
    unit_key = 'area'
    if 'z_t_150m' in first[variable].dims:
        # Fields defined on the top 150 m only use the first 15 levels of dz
        wgts = wgts * first['dz'].isel(time=0, z_t=slice(0,15))
        wgts = wgts.rename({'z_t' : 'z_t_150m'})
        dims.append('z_t_150m')
        unit_key = 'volume'
    elif 'z_t' in first[variable].dims:
        wgts = wgts * first['dz'].isel(time=0)
        dims.append('z_t')
        unit_key = 'volume'
    glb_avg = dict()
    new_units = dict()
    for exp in experiments:
        glb_avg[exp] = esmlab.weighted_sum(datasets[variable][exp][variable], dim=dims, weights=wgts).to_dataset(name=variable)
        old_units = units[datasets[variable][exp][variable].attrs['units']]
        new_units[exp] = old_units*integral_units[unit_key][exp]
    return glb_avg, new_units
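
The function above multiplies each grid cell by its area (or volume) and sums over the grid. As a simplified sketch of that idea, the plain-Python version below integrates a small 2D field. It is not the `esmlab.weighted_sum` implementation. The function name, the tiny grids, and the choice to skip NaN cells (standing in for land points) are assumptions made for illustration.

```python
import math


def area_weighted_sum(field, area):
    """Sum field[j][i] * area[j][i] over all cells, skipping NaN cells."""
    total = 0.0
    for field_row, area_row in zip(field, area):
        for value, cell_area in zip(field_row, area_row):
            if math.isnan(value):
                continue  # assumption: NaN marks a cell with no data (e.g. land)
            total += value * cell_area
    return total


# Hypothetical 2x2 grid: a flux per unit area and the matching cell areas
flux = [[1.0, 2.0], [float("nan"), 4.0]]
tarea = [[10.0, 10.0], [20.0, 20.0]]

global_integral = area_weighted_sum(flux, tarea)
```

The result carries the units of the field multiplied by the units of the weights, which is why the notebook tracks `new_units` alongside the integrated values.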

In [None]:
ann_avg = dict()
new_units = dict()
for variable in all_data:
    glb_avgs, new_units[variable] = compute_global_averages(all_data, integral_units, variable)
    ann_avg[variable] = dict()
    for exp in glb_avgs:
        glb_avgs[exp]['time_bound'] = all_data[variable][exp]['time_bound']
        ann_avg[variable][exp] = esmlab.resample(glb_avgs[exp], freq='ann')
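
The cell above reduces monthly values to annual values with `esmlab.resample(..., freq='ann')`. One common way to form an annual mean from monthly means is to weight each month by its length. The sketch below shows that approach under the assumption of a 365-day (no-leap) calendar. Whether `esmlab` weights in exactly this way is not confirmed here, and the names are hypothetical.

```python
# Month lengths for a 365-day (no-leap) calendar; this calendar is an assumption
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def annual_mean(monthly_values):
    """Average 12 monthly means, weighting each month by its number of days."""
    if len(monthly_values) != 12:
        raise ValueError("expected 12 monthly values")
    weighted_total = sum(v * d for v, d in zip(monthly_values, DAYS_PER_MONTH))
    return weighted_total / sum(DAYS_PER_MONTH)


# A constant series keeps its value
constant_year = annual_mean([2.0] * 12)

# A series that is nonzero only in January is scaled by 31/365
january_only = annual_mean([365.0] + [0.0] * 11)
```

Compared with a simple average of the 12 values, the weighted form gives February slightly less influence than the 31-day months.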

## Reduce Data Sets

The table below reports globally integrated values, averaged over the time slice and the ensemble members of each experiment.

In [None]:
# Define final units
final_units = dict()
final_units['DENITRIF'] = 'Tg/year'
final_units['NOx_FLUX'] = 'Tg/year'
final_units['NHy_FLUX'] = 'Tg/year'
final_units['diaz_Nfix'] = 'Tg/year'

# Define conversion factors
conversions = dict()

### TODO: RUN CONVERSIONS BY MATT
# Final units are TgN / yr
gN = 14 * units['g'] / units['mol']
conversions['DENITRIF'] = gN
conversions['NOx_FLUX'] = gN
conversions['NHy_FLUX'] = gN
conversions['diaz_Nfix'] = gN
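
The conversion factors above turn moles of nitrogen into grams so that `pint` can express the result in Tg/year. As a worked example of the arithmetic, the sketch below converts a rate by hand, assuming the integrated flux is in nmol/s and assuming a 365-day year. The year length used by the unit library may differ, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 86400   # assumes a 365-day (no-leap) year
GRAMS_N_PER_MOL = 14.0           # same molar mass of N used in the notebook
GRAMS_PER_TG = 1.0e12
MOL_PER_NMOL = 1.0e-9


def nmol_per_s_to_TgN_per_yr(rate_nmol_per_s):
    """Convert a nitrogen flux from nmol/s to TgN/yr."""
    mol_per_yr = rate_nmol_per_s * MOL_PER_NMOL * SECONDS_PER_YEAR
    return mol_per_yr * GRAMS_N_PER_MOL / GRAMS_PER_TG


# Rate in nmol/s that corresponds to exactly 1 TgN/yr under these assumptions
one_tg_rate = GRAMS_PER_TG / GRAMS_N_PER_MOL / SECONDS_PER_YEAR / MOL_PER_NMOL
round_trip = nmol_per_s_to_TgN_per_yr(one_tg_rate)
```

Writing the factors out like this is mainly useful as a cross-check on the unit-aware result.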

In [None]:
ann_avg['diaz_Nfix'][exp]['diaz_Nfix'].mean(['time', 'member_id']).values

In [None]:
%%time

experiments = list(ann_avg[vars[0]].keys())
diagnostic_values = dict()
for exp in experiments:
    diagnostic_values[exp] = dict()
    # Compute each value by hand
    print(f'Computing Nfixation for {exp}')
    diagnostic_values[exp]['Nitrogen fixation (TgN yr$^{-1}$)'] =  \
        (ann_avg['diaz_Nfix'][exp]['diaz_Nfix'].mean(['time', 'member_id']).values *
         new_units['diaz_Nfix'][exp] *
         conversions['diaz_Nfix']
        ).to(final_units['diaz_Nfix'])
    
    print(f'Computing Ndep for {exp}')
    diagnostic_values[exp]['Nitrogen deposition (TgN yr$^{-1}$)'] = \
        (ann_avg['NOx_FLUX'][exp]['NOx_FLUX'].mean(['time', 'member_id']).values *
         new_units['NOx_FLUX'][exp] *
         conversions['NOx_FLUX']
        ).to(final_units['NOx_FLUX']) + \
        (ann_avg['NHy_FLUX'][exp]['NHy_FLUX'].mean(['time', 'member_id']).values *
         new_units['NHy_FLUX'][exp] *
         conversions['NHy_FLUX']
        ).to(final_units['NHy_FLUX'])

    print(f'Computing Denitrif for {exp}')
    diagnostic_values[exp]['Denitrification (TgN yr$^{-1}$)'] = \
        (ann_avg['DENITRIF'][exp]['DENITRIF'].mean(['time', 'member_id']).values *
         new_units['DENITRIF'][exp] *
         conversions['DENITRIF']
        ).to(final_units['DENITRIF'])
    
    print(f'Computing Nitrogen Cycle imbalance for {exp}')
    diagnostic_values[exp]['N cycle imbalance* (TgN yr$^{-1}$)'] = diagnostic_values[exp]['Nitrogen deposition (TgN yr$^{-1}$)'] + \
                                                                   diagnostic_values[exp]['Nitrogen fixation (TgN yr$^{-1}$)'] - \
                                                                   diagnostic_values[exp]['Denitrification (TgN yr$^{-1}$)']

In [None]:
# Fill a dict with (data, units) tuple [unit conversion comes later]
table_dict = dict()
diagnostic_columns = ['Gross primary production (PgC yr$^{-1}$)',
                      'Sinking POC at 100 m (PgC yr$^{-1}$)',
                      'Sinking CaCO$_3$ at 100 m (PgC yr$^{-1}$)',
                      'Rain ratio (CaCO$_3$/POC) at 100 m',
                      'Nitrogen fixation (TgN yr$^{-1}$)',
                      'Nitrogen deposition (TgN yr$^{-1}$)',
                      'Denitrification (TgN yr$^{-1}$)',
                      'N cycle imbalance* (TgN yr$^{-1}$)',
                      'Air–sea CO2 flux (PgC yr$^{-1}$)',
                      'Diatom primary production (%)',
                      # raw strings avoid the invalid '\m' escape sequence
                      r'Mean ocean oxygen ($\mu$M)',
                      r'OMZ volume (10$^{16}$ m$^3$; <20 $\mu$M)'
                     ]
experiment_longnames={'cesm1_hist' : '1990s (CESM1)', 'cesm2_SSP5-8.5' : 'SSP5-8.5 2090s (CESM2)'}

table_dict['Diagnostic'] = []
for variable in diagnostic_columns:
    table_dict['Diagnostic'].append(variable)
    for exp in experiments:
        if experiment_longnames[exp] not in table_dict:
            table_dict[experiment_longnames[exp]] = []
        try:
            table_dict[experiment_longnames[exp]].append(diagnostic_values[exp][variable].magnitude)
        except KeyError:
            # Diagnostic has not been computed for this experiment
            table_dict[experiment_longnames[exp]].append('-')

pd.DataFrame(table_dict)