# Compare two GEOS-Chem Classic datasets

## Overview of this Notebook

* Import dependencies
* Define functions for plotting (eventually will be in gcpy)
* Define data directories and global variables
* Compare restart files (if needed)
* Quick plots: species concentrations
* Create PDFs: species concentrations
* Create PDFs: other diagnostic collections (example: Emissions)
* Create PDFs: new species categories and lumped variables
* Include bookmarks in PDFs for easier navigation
* Read bpch diagnostics

To Do: 
* Update plotting functions to handle a missing time dimension so that bpch dataset variables can be plotted
* Convert bpch dataset names to netcdf names using the external dictionary we developed
* Discuss species lists with Matt Evans
* Fill in the rest of the category and species dictionaries
* Determine additional categories and their constituents to include in benchmark (e.g. emissions, AOD)
* Create dictionary mapping between species and fixed range to plot for differences
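
The last item could take roughly the following shape. This is only a sketch: `diff_limits`, `DEFAULT_LIMIT`, and `fixed_diff_range` are hypothetical names, and the single dictionary entry is a placeholder rather than a recommended plotting limit.

```python
# Hypothetical lookup: variable name -> half-width of the fixed difference range.
# The entry below is a placeholder value, not a recommendation.
DEFAULT_LIMIT = 1e-9   # same placeholder the plotting functions currently hardcode
diff_limits = {
    'SpeciesConc_O3': 5e-9,
}

def fixed_diff_range(varname, limits=diff_limits, default=DEFAULT_LIMIT):
    """Return [vmin, vmax] for the fixed-range difference panel."""
    lim = limits.get(varname, default)
    return [-lim, lim]

o3_range = fixed_diff_range('SpeciesConc_O3')   # uses the dictionary entry
co_range = fixed_diff_range('SpeciesConc_CO')   # falls back to the default
```

Falling back to a default keeps the plotting loop working for species that have no entry yet.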

## Import dependencies

In [12]:
import os
import numpy as np
import xarray as xr
import cubedsphere as cs
import xesmf as xe
import xbpch
import shutil
import gcpy as gc
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import ticker
from matplotlib.backends.backend_pdf import PdfPages
from matplotlib.colors import ListedColormap
from cartopy import crs
from cartopy.mpl.geoaxes import GeoAxes
from PyPDF2 import PdfFileWriter, PdfFileReader

# Enable auto-reloading modules
%load_ext autoreload
%autoreload 2

%matplotlib inline
import warnings; warnings.filterwarnings("ignore")

ModuleNotFoundError: No module named 'cubedsphere'

## Define functions for plotting (eventually will be in gcpy)

### Single level, GC Classic only

Notes: The two GC files must be on the same grid resolution. Use this function to plot interactively or to generate a multi-page PDF of plots. It takes a list of variable names from a single collection and plots one level (ilev) and one time slice (itime) from the file. By default the Ref and Dev colorbars share the same range; pass match_cbar=False to turn this off. The fixed-range colorbar for the fractional difference is limited to +/-2; this limit is currently set in the function body rather than through a parameter.
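
The difference panels rest on two element-wise quantities, sketched below without dependencies. The helpers `abs_and_frac_diff` and `symmetric_limit` are hypothetical and not part of gcpy, and they work on flat lists of floats rather than the xarray slices used in the notebook.

```python
import math

def abs_and_frac_diff(ref, dev):
    """Return (absolute, fractional) differences for two equal-length lists.

    The fractional difference is (dev - ref) / ref. Cells where both values
    are zero are set to NaN so that 0/0 shows up as missing data.
    """
    absdiff = [d - r for r, d in zip(ref, dev)]
    fracdiff = []
    for r, d in zip(ref, dev):
        if r == 0:
            # 0/0 -> NaN; nonzero/0 -> signed infinity (as NumPy division gives)
            fracdiff.append(math.nan if d == 0 else math.copysign(math.inf, d))
        else:
            fracdiff.append((d - r) / r)
    return absdiff, fracdiff

def symmetric_limit(values):
    """Largest finite absolute value, for a symmetric range [-lim, lim]."""
    finite = [abs(v) for v in values if math.isfinite(v)]
    return max(finite) if finite else 0.0

ref = [2.0, 0.0, 4.0]
dev = [3.0, 0.0, 3.0]
absdiff, fracdiff = abs_and_frac_diff(ref, dev)
lim = symmetric_limit(fracdiff)
```

Unlike the nanmin/nanmax calls in the function below, `symmetric_limit` also ignores infinities. That is a choice made for this sketch so that a single zero in Ref cannot blow up the dynamic range.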

In [13]:
def compare_single_level(refdata, refstr, devdata, devstr, llres, varlist=None, ilev=0, itime=0, savepdf=False, 
                         pdfname='map.pdf', match_cbar=True, normalize_by_area=False, check_units=True):
    
    # If no varlist is passed, plot all (surface only for 3D)
    if varlist is None:
        [varlist, commonvars1D, commonvars2D, commonvars3D] = gc.core.compare_varnames(refdata, refstr, devdata, devstr)
        print('Plotting all common variables (surface only if 3D)')
    n_var = len(varlist)

    # Get lat-lon grid
    llgrid = gc.core.make_grid_LL(llres)

    # Get lat/lon extents
    [minlon, maxlon] = [min(llgrid['lon_b']), max(llgrid['lon_b'])]
    [minlat, maxlat] = [min(llgrid['lat_b']), max(llgrid['lat_b'])]
    
    # Create pdf (if saving)
    if savepdf:
        print('\nCreating {} for {} variables'.format(pdfname,n_var))
        pdf = PdfPages(pdfname)

    # Loop over variables
    for ivar in range(n_var):
        if savepdf: print('{} '.format(ivar), end='')
        varname = varlist[ivar]
        
        # Do some checks: dimensions and units
        varndim_ref = refdata[varname].ndim
        varndim_dev = devdata[varname].ndim
        if check_units: 
            assert varndim_ref == varndim_dev, 'Dimensions do not agree for {}!'.format(varname)
        units_ref = refdata[varname].units
        units_dev = devdata[varname].units
        if check_units: 
            assert units_ref == units_dev, 'Units do not match for {}!'.format(varname)
            
        # Set default units, dimension count, and subtitle suffix (adjusted below if normalizing by area)
        units = units_ref
        varndim = varndim_ref
        subtitle_extra = ''
                    
        # Slice the data
        if varndim == 4: 
            ds_ref = refdata[varname].isel(time=itime,lev=ilev)
            ds_dev = devdata[varname].isel(time=itime,lev=ilev)
        elif varndim == 3: 
            ds_ref = refdata[varname].isel(time=itime)
            ds_dev = devdata[varname].isel(time=itime)
            
        # if normalizing by area, transform on the native grid and adjust units and subtitle string
        exclude_list = ['WetLossConvFrac','Prod_','Loss_']
        if normalize_by_area and not any(s in varname for s in exclude_list):
            ds_ref.values = ds_ref.values / refdata['AREAM2'].values
            ds_dev.values = ds_dev.values / devdata['AREAM2'].values
            units = '{} m-2'.format(units)
            subtitle_extra = ', Normalized by Area'
            
        # Get min and max for colorbar limits
        vmin_ref = ds_ref.min()
        vmin_dev = ds_dev.min()
        vmin_cmn = np.min([vmin_ref, vmin_dev])
        vmax_ref = ds_ref.max()
        vmax_dev = ds_dev.max()
        vmax_cmn = np.max([vmax_ref, vmax_dev])
        if match_cbar: [vmin, vmax] = [vmin_cmn, vmax_cmn]
        
        # Create 3x2 figure
        figs, ((ax0, ax1), (ax2, ax3), (ax4, ax5)) = plt.subplots(3, 2, figsize=[12,14], 
                                                      subplot_kw={'projection': crs.PlateCarree()})
        # Give the figure a title
        offset = 0.96
        fontsize=25
        if varndim == 4:
            if ilev == 0: levstr = 'Surface'
            elif ilev == 22: levstr = '500 hPa'
            else: levstr = 'Level ' +  str(ilev+1)
            figs.suptitle('{}, {}'.format(varname,levstr), fontsize=fontsize, y=offset)
        elif varndim == 3: 
            figs.suptitle('{}'.format(varname), fontsize=fontsize, y=offset)
        else:
            print('varndim is 2 for {}! Must be 3 or 4.'.format(varname))   
        
        # Subplot (0,0): Ref
        ax0.coastlines()
        if not match_cbar: [vmin, vmax] = [vmin_ref, vmax_ref]
        plot0 = ax0.imshow(ds_ref, extent=(minlon, maxlon, minlat, maxlat), 
                           cmap=gc.plot.WhGrYlRd,vmin=vmin, vmax=vmax)
        ax0.set_title('{} (Ref){}\n{}'.format(refstr,subtitle_extra,llres)) 
        cb = plt.colorbar(plot0, ax=ax0, orientation='horizontal', pad=0.10)
        if (vmax-vmin) < 0.1 or (vmax-vmin) > 100:
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
        cb.set_label(units)
        
        # Subplot (0,1): Dev
        ax1.coastlines()
        if not match_cbar: [vmin, vmax] = [vmin_dev, vmax_dev]
        plot1 = ax1.imshow(ds_dev, extent=(minlon, maxlon, minlat, maxlat), 
                           cmap=gc.plot.WhGrYlRd,vmin=vmin, vmax=vmax)
        ax1.set_title('{} (Dev){}\n{}'.format(devstr,subtitle_extra,llres)) 
        cb = plt.colorbar(plot1, ax=ax1, orientation='horizontal', pad=0.10)
        if (vmax-vmin) < 0.1 or (vmax-vmin) > 100:
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
        cb.set_label(units)
            
        # Calculate difference, get dynamic range, configure colorbar, use gray for NaNs
        absdiff = np.array(ds_dev) - np.array(ds_ref)
        diffabsmax = max([np.abs(np.nanmin(absdiff)), np.abs(np.nanmax(absdiff))])        
        cmap = mpl.cm.RdBu_r
        cmap.set_bad(color='gray')
            
        # Subplot (1,0): Difference, dynamic range
        [vmin, vmax] = [-diffabsmax, diffabsmax]
        ax2.coastlines()
        plot2 = ax2.imshow(absdiff, extent=(minlon, maxlon, minlat, maxlat), 
                           cmap=cmap,vmin=vmin, vmax=vmax)
        ax2.set_title('Difference\nDev - Ref, Dynamic Range') 
        cb = plt.colorbar(plot2, ax=ax2, orientation='horizontal', pad=0.10)
        if (vmax-vmin) < 0.1 or (vmax-vmin) > 100:
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
        cb.set_label(units)
        if np.all(absdiff==0): 
            cb.ax.set_xticklabels(['0.0', '0.0', '0.0', '0.0', '0.0']) 
        
        # Subplot (1,1): Difference, restricted range
        [vmin, vmax] = [-1e-9, 1e-9] # Placeholder; need a dictionary for this (species: limit)
        ax3.coastlines()
        plot3 = ax3.imshow(absdiff, extent=(minlon, maxlon, minlat, maxlat), 
                           cmap=cmap,vmin=vmin, vmax=vmax)
        ax3.set_title('Difference\nDev - Ref, Fixed Range') 
        cb = plt.colorbar(plot3, ax=ax3, orientation='horizontal', pad=0.10)
        if (vmax-vmin) < 0.1 or (vmax-vmin) > 100:
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
        cb.set_label(units)
        if np.all(absdiff==0): 
            cb.ax.set_xticklabels(['0.0', '0.0', '0.0', '0.0', '0.0'])

        # Calculate fractional difference, get dynamic range, set 0/0 to Nan
        fracdiff = (np.array(ds_dev) - np.array(ds_ref)) / np.array(ds_ref)
        fracdiff[(ds_dev==0) & (ds_ref==0)] = np.nan
        fracdiffabsmax = max([np.abs(np.nanmin(fracdiff)), np.abs(np.nanmax(fracdiff))])
        
        # Subplot (2,0): Fractional Difference, full dynamic range
        [vmin, vmax] = [-fracdiffabsmax, fracdiffabsmax]
        ax4.coastlines()
        plot4 = ax4.imshow(fracdiff, extent=(minlon, maxlon, minlat, maxlat), vmin=vmin, vmax=vmax, cmap=cmap)
        ax4.set_title('Fractional Difference\n(Dev-Ref)/Ref, Dynamic Range') 
        cb = plt.colorbar(plot4, ax=ax4, orientation='horizontal', pad=0.10)
        if (vmax-vmin) < 0.1 or (vmax-vmin) > 100:
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
            if np.all(absdiff==0): 
                cb.ax.set_xticklabels(['0.0', '0.0', '0.0', '0.0', '0.0'])
        cb.set_label('unitless')
        
        # Subplot (2,1): Fractional Difference, restricted
        [vmin, vmax] = [-2, 2]
        #[vmin, vmax] = [-0.5, 2] # doesn't work with this colorbar. Need to customize one. Already in gamap?
                                  # Switch to this if change to ratios (dev/ref)
        ax5.coastlines()
        plot5 = ax5.imshow(fracdiff, extent=(minlon, maxlon, minlat, maxlat),
                           cmap=cmap,vmin=vmin, vmax=vmax)
        ax5.set_title('Fractional Difference\n(Dev-Ref)/Ref, Fixed Range') 
        cb = plt.colorbar(plot5, ax=ax5, orientation='horizontal', pad=0.10)
        if np.all(absdiff==0): 
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
            cb.ax.set_xticklabels(['0.0', '0.0', '0.0', '0.0', '0.0'])
        cb.set_label('unitless')
            
        if savepdf:    
            pdf.savefig(figs)
            plt.close(figs)
            
    if savepdf: pdf.close()

### Zonal mean, GC Classic only

Many of the same features available for plotting a single level above are also available for this function.
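
After the time slice is taken, each variable has dimensions (lev, lat, lon), and the zonal mean is the average over the last (longitude) axis. A minimal sketch with nested lists follows; `zonal_mean` is a hypothetical stand-in for the xarray `.mean(axis=2)` call used in the function.

```python
def zonal_mean(field):
    """Average a [lev][lat][lon] nested list over its longitude axis.

    Returns a [lev][lat] nested list.
    """
    return [[sum(row) / len(row) for row in level] for level in field]

# Tiny example: 2 levels, 2 latitudes, 3 longitudes
field = [
    [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]],
    [[0.0, 0.0, 6.0], [1.0, 3.0, 5.0]],
]
zm = zonal_mean(field)
```

The result has one value per (level, latitude) pair, which is what the zonal mean panels display.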

In [14]:
def compare_zonal_mean(refdata, refstr, devdata, devstr, llres, varlist=None, itime=0, savepdf=False, 
                       pdfname='zonalmean.pdf', match_cbar=True, normalize_by_area=False, check_units=True ):

    # If no varlist is passed, plot all 3D variables in the dataset
    if varlist is None:
        [commonvars, commonvars1D, commonvars2D, varlist] = gc.core.compare_varnames(refdata, refstr, devdata, devstr)
        print('Plotting all 3D variables')
    n_var = len(varlist)
    
    # Get lat-lon grid
    llgrid = gc.core.make_grid_LL(llres)
    
    # Universal plot setup
    xtick_positions = np.arange(-90,91,30)
    xticklabels = [r'{}$\degree$'.format(x) for x in xtick_positions]
    ytick_positions = np.arange(0,61,20)
    yticklabels = [str(y) for y in ytick_positions]
    
    # Create pdf (if saving)
    if savepdf:
        print('\nCreating {} for {} variables'.format(pdfname, n_var))
        pdf = PdfPages(pdfname)

    # Loop over variables
    for ivar in range(n_var):
        if savepdf: print('{} '.format(ivar), end='')
        varname = varlist[ivar]
        
        # Do some checks: dimensions and units
        varndim_ref = refdata[varname].ndim
        varndim_dev = devdata[varname].ndim
        nlev = 72
        assert varndim_ref == varndim_dev, 'Dimensions do not agree for {}!'.format(varname)
        units_ref = refdata[varname].units
        units_dev = devdata[varname].units
        assert units_ref == units_dev, 'Units do not match for {}!'.format(varname)
        
        # Set plot extent
        extent=(-90,90,0,nlev)
        
        # Set default units, dimension count, and subtitle suffix (adjusted below if normalizing by area)
        units = units_ref
        varndim = varndim_ref
        subtitle_extra = ''
        
        # Slice the data
        ds_ref = refdata[varname].isel(time=itime)
        ds_dev = devdata[varname].isel(time=itime)

        # if normalizing by area, transform on the native grid and adjust units and subtitle string
        exclude_list = ['WetLossConvFrac','Prod_','Loss_']
        if normalize_by_area and not any(s in varname for s in exclude_list):
            ds_ref.values = ds_ref.values / refdata['AREAM2'].values[np.newaxis,:,:]
            ds_dev.values = ds_dev.values / devdata['AREAM2'].values[np.newaxis,:,:]
            units = '{} m-2'.format(units)
            subtitle_extra = ', Normalized by Area'
            
        # Calculate zonal mean (average over the longitude axis)
        zm_ref = ds_ref.mean(axis=2)
        zm_dev = ds_dev.mean(axis=2)
            
        # Get min and max for colorbar limits
        [vmin_ref, vmax_ref] = [zm_ref.min(), zm_ref.max()]
        [vmin_dev, vmax_dev] = [zm_dev.min(), zm_dev.max()]
        vmin_cmn = np.min([vmin_ref, vmin_dev])
        vmax_cmn = np.max([vmax_ref, vmax_dev])
        if match_cbar: [vmin, vmax] = [vmin_cmn, vmax_cmn]
        
        # Create 3x2 figure
        figs, ((ax0, ax1), (ax2, ax3), (ax4, ax5)) = plt.subplots(3, 2, figsize=[12,15.3], 
                                                      subplot_kw={'projection': crs.PlateCarree()})
        # Give the page a title
        offset = 0.96
        fontsize=25
        figs.suptitle('{}, Zonal Mean'.format(varname), fontsize=fontsize, y=offset)

        # Subplot 0: Ref
        if not match_cbar: [vmin, vmax] = [vmin_ref, vmax_ref]
        plot0 = ax0.imshow(zm_ref, cmap=gc.plot.WhGrYlRd, extent=extent, vmin=vmin, vmax=vmax)
        ax0.set_title('{} (Ref){}\n{}'.format(refstr, subtitle_extra, llres ))
        ax0.set_aspect('auto')
        ax0.set_xticks(xtick_positions)
        ax0.set_xticklabels(xticklabels)
        ax0.set_yticks(ytick_positions)
        ax0.set_yticklabels(yticklabels)
        cb = plt.colorbar(plot0, ax=ax0, orientation='horizontal', pad=0.10)
        if (vmax-vmin) < 0.001 or (vmax-vmin) > 1000:
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
        cb.set_label(units)
        
        # Subplot 1: Dev
        if not match_cbar: [vmin, vmax] = [vmin_dev, vmax_dev]
        plot1 = ax1.imshow(zm_dev, cmap=gc.plot.WhGrYlRd, extent=extent, vmin=vmin, vmax=vmax)
        ax1.set_title('{} (Dev){}\n{}'.format(devstr, subtitle_extra, llres))
        ax1.set_aspect('auto')
        ax1.set_xticks(xtick_positions)
        ax1.set_xticklabels(xticklabels)
        ax1.set_yticks(ytick_positions)
        ax1.set_yticklabels(yticklabels)
        cb = plt.colorbar(plot1, ax=ax1, orientation='horizontal', pad=0.10)
        if (vmax-vmin) < 0.001 or (vmax-vmin) > 1000:
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
        cb.set_label(units)
        
        # Calculate zonal mean difference
        zm_diff = zm_dev - zm_ref
                
        # Subplot 2: Difference, dynamic range
        diffabsmax = max([np.abs(zm_diff.min()), np.abs(zm_diff.max())])
        [vmin, vmax] = [-diffabsmax, diffabsmax]
        plot2 = ax2.imshow(zm_diff, cmap='RdBu_r', extent=extent, vmin=vmin, vmax=vmax)
        ax2.set_title('Difference\nDev - Ref, Dynamic Range')
        ax2.set_aspect('auto')
        ax2.set_xticks(xtick_positions)
        ax2.set_xticklabels(xticklabels)
        ax2.set_yticks(ytick_positions)
        ax2.set_yticklabels(yticklabels)
        cb = plt.colorbar(plot2, ax=ax2, orientation='horizontal', pad=0.10)
        if (vmax-vmin) < 0.001 or (vmax-vmin) > 1000:
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
        cb.set_label(units)
        
        # Subplot 3: Difference, restricted range
        [vmin, vmax] = [-1e-9, 1e-9] # need a dictionary for this (species: limit)
        plot3 = ax3.imshow(zm_diff, cmap='RdBu_r', extent=extent, vmin=vmin, vmax=vmax)
        ax3.set_title('Difference\nDev - Ref, Fixed Range')
        ax3.set_aspect('auto')
        ax3.set_xticks(xtick_positions)
        ax3.set_xticklabels(xticklabels)
        ax3.set_yticks(ytick_positions)
        ax3.set_yticklabels(yticklabels)
        cb = plt.colorbar(plot3, ax=ax3, orientation='horizontal', pad=0.10)
        if (vmax-vmin) < 0.001 or (vmax-vmin) > 1000:
            cb.locator = ticker.MaxNLocator(nbins=4)
            cb.update_ticks()
        cb.set_label(units)
        
        # Zonal mean fractional difference
        zm_fracdiff = (zm_dev - zm_ref) / zm_ref
            
        # Subplot 4: Fractional Difference, dynamic range
        fracdiffabsmax = max([np.abs(zm_fracdiff.min()), np.abs(zm_fracdiff.max())])
        [vmin, vmax] = [-fracdiffabsmax, fracdiffabsmax]
        plot4 = ax4.imshow(zm_fracdiff, vmin=vmin, vmax=vmax, cmap='RdBu_r', extent=extent)
        ax4.set_title('Fractional Difference\n(Dev-Ref)/Ref, Dynamic Range')
        ax4.set_aspect('auto')
        ax4.set_xticks(xtick_positions)
        ax4.set_xticklabels(xticklabels)
        ax4.set_yticks(ytick_positions)
        ax4.set_yticklabels(yticklabels)
        cb = plt.colorbar(plot4, ax=ax4, orientation='horizontal', pad=0.10)
        plot4.set_clim(vmin=vmin, vmax=vmax)
        cb.set_label('unitless')   
        
        # Subplot 5: Fractional Difference, restricted range
        [vmin, vmax] = [-2, 2]
        plot5 = ax5.imshow(zm_fracdiff, vmin=vmin, vmax=vmax, cmap='RdBu_r', extent=extent)
        ax5.set_title('Fractional Difference\n(Dev-Ref)/Ref, Fixed Range')
        ax5.set_aspect('auto')
        ax5.set_xticks(xtick_positions)
        ax5.set_xticklabels(xticklabels)
        ax5.set_yticks(ytick_positions)
        ax5.set_yticklabels(yticklabels)
        cb = plt.colorbar(plot5, ax=ax5, orientation='horizontal', pad=0.10)
        plot5.set_clim(vmin=vmin, vmax=vmax)
        cb.set_label('unitless') 
            
        if savepdf:    
            pdf.savefig(figs)
            plt.close(figs)
            
    if savepdf: pdf.close()

## Define directories and global variables

In [15]:
# Shared high-level directory
testdir = '/Users/lizzielundgren/gc/data/benchmarking/1mo_prototype/1day_test'

# Ref and dev run output directories
refdir = os.path.join(testdir,'12.0.3_nc')
devdir = os.path.join(testdir,'12.1.0_nc')

# Ref and dev strings (for use in plots and messages)
refstr='12.0.3'
devstr='12.1.0'

# Set directory to store plots
plotsdir = os.path.join(testdir,'plots')

# Define lat-lon grid resolution (passed to the plotting functions)
llres = '4x5'

# Check that paths exist
gc.core.check_paths(refdir, devdir)

NameError: name 'gc' is not defined

In [None]:
# Print Ref netcdf filenames
reffiles = [k for k in os.listdir(refdir) if '.nc' in k]
for k in reffiles:
    print(k)

In [None]:
# Print Dev netcdf filenames
devfiles = [k for k in os.listdir(devdir) if '.nc' in k]
for k in devfiles:
    print(k)

## Compare restart files (if needed)

In [None]:
#reffile_rst = os.path.join(refdir,'GEOSChem_restart.201607010000.nc')
#devfile_rst = os.path.join(devdir,'GEOSChem_restart.201607010000.nc')
#gc.core.check_paths(reffile_rst, devfile_rst)

In [None]:
#refdata_rst = xr.open_dataset(reffile_rst)
#devdata_rst = xr.open_dataset(devfile_rst)
#[commonvars, commonvars2D, commonvars3D] = gc.core.compare_varnames(refdata_rst, refstr, devdata_rst, devstr)

In [None]:
#gc.core.get_stats(refdata_rst, refstr, devdata_rst, devstr, 'SPC_CFC11')

In [None]:
#compare_zonal_mean( refdata_rst, refstr, devdata_rst, devstr, llres, varlist=['SPC_O3'] )

In [None]:
#compare_single_level( refdata_rst, refstr, devdata_rst, devstr, llres, varlist=['SPC_O3'] )

## Quick plots: species concentration

In [None]:
day = '20160701'
time = '0000'
collection = 'SpeciesConc'
[refdata, devdata] = gc.core.get_collection_data(refdir, devdir, collection, day, time)

In [None]:
[commonvars, commonvars1D, commonvars2D, commonvars3D] = gc.core.compare_varnames(refdata, refstr, devdata, devstr)

In [None]:
CFCs=[k for k in commonvars if 'CFC' in k]
print(CFCs)

In [None]:
varname='SpeciesConc_O3'
gc.core.compare_stats(refdata, refstr, devdata, devstr, varname)

In [None]:
compare_single_level( refdata, refstr, devdata, devstr, llres, varlist=[varname] )

In [None]:
compare_zonal_mean( refdata, refstr, devdata, devstr, llres, varlist=[varname] )

## Create PDFs: species concentrations

In [None]:
# Define subset of variables in files
#varlist = commonvars3D
varlist = ['SpeciesConc_O3']
#varlist = [k for k in commonvars3D if 'CFC' in k]

In [None]:
# Surface
#pdfname = os.path.join(plotsdir,'{}_Surface.pdf'.format(ref_collection))
pdfname = os.path.join(plotsdir,'{}_Surface.pdf'.format(varname))
#pdfname = os.path.join(plotsdir,'{}_Surface.pdf'.format('CFCs'))
compare_single_level( refdata, refstr, devdata, devstr, llres, varlist=varlist, ilev=0,
                     savepdf=True, pdfname=pdfname )

In [None]:
# Zonal mean
#pdfname = os.path.join(plotsdir,'{}_ZonalMean.pdf'.format(ref_collection))
pdfname = os.path.join(plotsdir,'{}_ZonalMean.pdf'.format(varname))
#pdfname = os.path.join(plotsdir,'{}_ZonalMean.pdf'.format('CFCs'))
compare_zonal_mean(refdata, refstr, devdata, devstr, llres, varlist=varlist, 
                   savepdf=True, pdfname=pdfname )

## Create PDFs: other diagnostic collections (example: Emissions)

In [None]:
# Inspect the available files again, but only show those that match
commonfiles = [k for k in reffiles if k in devfiles]
commonfiles

In [None]:
# Template for inspecting any collection:

# To specify file:
#day = {edit}
#time = {edit}
#collection = {edit}

# For retrieving the data:
#[refdata, devdata] = gc.core.get_collection_data(refdir, devdir, collection, day, time)

# For inspecting interactively:
#refdata
#devdata
#[commonvars, commonvars1D, commonvars2D, commonvars3D] = gc.core.compare_varnames(refdata, refstr, devdata, devstr)
# varname = {edit}
#gc.core.get_stats(refdata, refstr, devdata, devstr, varname)
# varlist = {edit}
#compare_single_level(refdata, refstr, devdata, devstr, llres, varlist=[varname] )
#compare_zonal_mean(refdata, refstr, devdata, devstr, llres, varlist=[varname] )

# For creating PDFs:
#prefix = {edit}
#pdfname = os.path.join(plotsdir,'{}_Surface.pdf'.format(prefix))
#compare_single_level(refdata, refstr, devdata, devstr, llres, varlist=varlist, ilev=0,
#                     savepdf=True, pdfname=pdfname )
#pdfname = os.path.join(plotsdir,'{}_ZonalMean.pdf'.format(prefix))
#compare_zonal_mean(refdata, refstr, devdata, devstr, llres, varlist=varlist, 
#                   savepdf=True, pdfname=pdfname )

# For other function options use shift-tab while hovering over the function name in a cell that calls it

In [None]:
# Example: Emissions
day = '20160701'
time = '0000'
collection = 'Emissions'
[refdata, devdata] = gc.core.get_collection_data(refdir, devdir, collection, day, time)

In [None]:
[commonvars, commonvars1D, commonvars2D, commonvars3D] = gc.core.compare_varnames(refdata, refstr, devdata, devstr)

In [None]:
varname = 'EmisACET_BioBurn'
compare_single_level(refdata, refstr, devdata, devstr, llres, varlist=[varname] )

## Create PDFs: new species categories and lumped variables

NOTE: More discussion is needed to finalize these lists. Several species listed here are not currently output, and several species that are output are not listed here.

Primary checks: O3, NOx (NO+NO2), CO, OH, AOD

Aerosols: SO4, NIT, NH4, DST1, DST2, DST3, DST4, OCPI, OCPO, BCPI, BCPO, SALC, SALA, SO2s, SOA

Nitrogen: NO, NO2, NO3, N2O5, HNO2, HNO4, CH3O2NO2, HNO3, PAN, PPN, IPMN, N2O, NH4, NH3, NIT, 
NOx, NOy (gas), NOy (gas+aerosol), NHx

ROy: OH, HO2, RO2 (sum all peroxy), O(1D), H2O2, H, H2, H2O, O(3P)

Primary Organics
  HCs: CH4, C2H6, C2H4, C3H8, PRPE, ALK4, BENZ, TOL, XYLE, ISOP, Monoterpenes
  Others: GLYX, HCOOH, MAP, MBO, MOH, RCHO

Secondary Organics
  Ketone: ACET, MEK, MVK
  Aldehydes: ALD2, CH2O, HPALD
  Peroxides: CH3OOH, ISOPOOH
  Nitrates: ISOPN
  Epoxides: IEPOX
  Acids: ACTA
  Alcohols: MOH, EtOH 

Sulfur: SO2, DMS, OCS, SO4, SOx (SO2+SO4)

Iodine: I, I2, ICl, IBr, IO, HOI, HI, INO2, INO3, OIO, IxOy, AERI+ISALA+ISALC
CH3I, CH2I2, CH2ICl, CH2IBr, IOy (I+IO+OIO+x*IxOy), 
Iy (I+ 2*I2+ ICl+ IBr+ HOI+ HI+ INO2+ INO3+ OIO+ x*IxOy+AERI+ISALA+ISALC)

Bromine: Br, BrO, Br2, BrCl, HOBr, HBr, BrNO2, BrNO3, CH3Br, CH2Br2, CHBr3,
BrOx(Br+BrO), Bry(Br, BrO, Br2, BrCl, HOBr, HBr, BrNO2, BrNO3)

Chlorine: Cl, ClO, Cl2, HOCl, HCl, Cl2O2, ClOO, ClNO2, ClNO3, OClO,
CCl4, CH3Cl, CH2Cl2, CH3CCl3, SUM(CFC11, CFC12, CFC113, CFC114, CFC115),
SUM(H1211, H1302, H2402), SUM(HCFC123, HCFC141b, HCFC142b),
ClOx (Cl+ClO), Cly (Cl, ClO, 2*Cl2, HOCl, HCl, Cl2O2, ClOO, ClNO2, ClNO3, OClO)
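
Each lumped variable above is a weighted sum of existing species, for example NOx = NO + NO2, or a factor of 2 on Cl2 in Cly. The sketch below shows that arithmetic on plain lists. `lumped_sum` is a hypothetical helper, the `SpeciesConc_` prefix is assumed for all names, and the numbers are placeholders, not model output.

```python
def lumped_sum(fields, constituents):
    """Weighted element-wise sum of named fields.

    fields       : dict mapping variable name -> list of values
    constituents : iterable of (variable name, weight) pairs
    """
    total = None
    for name, weight in constituents:
        scaled = [weight * v for v in fields[name]]
        total = scaled if total is None else [t + s for t, s in zip(total, scaled)]
    return total

# Placeholder numbers, not model output
fields = {
    'SpeciesConc_NO':  [1.0, 2.0],
    'SpeciesConc_NO2': [0.5, 0.5],
    'SpeciesConc_Cl':  [1.0, 0.0],
    'SpeciesConc_Cl2': [2.0, 1.0],
}
nox = lumped_sum(fields, (('SpeciesConc_NO', 1), ('SpeciesConc_NO2', 1)))
# Cl2 carries two chlorine atoms, hence the weight of 2
cl_partial = lumped_sum(fields, (('SpeciesConc_Cl', 1), ('SpeciesConc_Cl2', 2)))
```

The (name, weight) tuples have the same shape as the entries of the species dictionary defined in the cells that follow.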


In [None]:
# Load species conc collection
day = '20160701'
time = '0000'
collection = 'SpeciesConc'
[refdata, devdata] = gc.core.get_collection_data(refdir, devdir, collection, day, time)
spc_units = refdata['SpeciesConc_O3'].units 

In [None]:
[commonvars, commonvars1D, commonvars2D, commonvars3D] = gc.core.compare_varnames(refdata, refstr, devdata, devstr)

In [None]:
# Print list of species in file (skip non-species variables)
spcvars = [k for k in commonvars if 'SpeciesConc' in k]
print('{} species:'.format(len(spcvars)))
print([k.replace('SpeciesConc_','') for k in spcvars])

In [None]:
# Define function to add a new variable to a data set using known units, name, and existing vars to sum.
def add_species_to_dataset(ds, name, varstup, units, verbose=False, overwrite=False):
    # Drop an existing variable of the same name only if overwriting is allowed
    if name in ds.data_vars and overwrite:
        ds = ds.drop(name)
    else:
        assert name not in ds.data_vars, '{} already in dataset!'.format(name)
    if verbose: print('Creating {}'.format(name))
    # Weighted sum over the (variable name, scale factor) pairs
    darr = None
    for varname, scale in varstup:
        if verbose: print(' -> {}'.format(varname))
        term = ds[varname] * scale
        darr = term if darr is None else darr + term
    darr.name = name
    darr.attrs['units'] = units
    ds = xr.merge([ds, darr])
    return ds

In [None]:
# Create dictionary defining constituents of each new species group to sum values for
spcdict = {'NOx': (('SpeciesConc_NO',1),('SpeciesConc_NO2',1))}
# Add more later

In [None]:
# Add the summed species variables to the ref and dev datasets
for key in spcdict:
    refdata = add_species_to_dataset(refdata, key, spcdict[key], spc_units, overwrite=True)
    devdata = add_species_to_dataset(devdata, key, spcdict[key], spc_units, overwrite=True)

In [None]:
# Create dictionary defining constituents of each category to create files for
categorydict = {'Primary': ('SpeciesConc_O3', 'SpeciesConc_CO','NOx'),
               'PrimaryOrganicsHCs': ('SpeciesConc_ALK4', 'SpeciesConc_ISOP', 'SpeciesConc_PRPE',
                                    'SpeciesConc_C3H8', 'SpeciesConc_C2H6', 'SpeciesConc_BENZ',
                                    'SpeciesConc_TOLU', 'SpeciesConc_XYLE', 'SpeciesConc_CH4')
               }
# Add more later

In [None]:
# Make a PDF for each category
for catkey in categorydict:
    pdfname = os.path.join(plotsdir,'{}_Surface.pdf'.format(catkey))
    compare_single_level(refdata, refstr, devdata, devstr, llres, varlist=categorydict[catkey], ilev=0,
                     savepdf=True, pdfname=pdfname )

## Include bookmarks in PDFs for easier navigation

In [None]:
# Create a new file with bookmarks for each of the category files just created
for catkey in categorydict:

    # Existing pdf and new filename
    pdfname_in = os.path.join(plotsdir,'{}_Surface.pdf'.format(catkey))
    pdfname_out = os.path.join(plotsdir,'{}_Surface_with_bookmarks.pdf'.format(catkey))

    # Keep the input file open until the output is written (the reader may still need it)
    with open(pdfname_in,'rb') as pdfobj:
        pdfreader = PdfFileReader(pdfobj)
        output = PdfFileWriter()

        # Loop over variables (pages) in the file, removing the diagnostic prefix
        varlist = [k.replace('SpeciesConc_','') for k in categorydict[catkey]]
        for i, varname in enumerate(varlist):
            output.addPage(pdfreader.getPage(i))
            output.addBookmark(varname,i)

        # Write to new file
        with open(pdfname_out,'wb') as outputstream:
            output.write(outputstream)

    # Replace the old file with the new (os.replace overwrites an existing file)
    os.replace(pdfname_out, pdfname_in)

## Read in bpch diagnostics

In [None]:
datafile = 'trac_avg.geosfp_4x5_benchmark.201607010000'

refstr = '12.0.3_bpch'
devstr = '12.1.0_bpch'

refdir_bp = os.path.join(testdir,refstr)
devdir_bp = os.path.join(testdir,devstr)
reftracerinfo_file = os.path.join(testdir,refstr,'tracerinfo.dat') 
refdiaginfo_file = os.path.join(testdir,refstr,'diaginfo.dat')
devtracerinfo_file = os.path.join(testdir,devstr,'tracerinfo.dat') 
devdiaginfo_file = os.path.join(testdir,devstr,'diaginfo.dat')

refdata = os.path.join(refdir_bp, datafile)
devdata = os.path.join(devdir_bp, datafile)
print('Checking bpch data files')
gc.core.check_paths(refdata, devdata)

print('\nChecking tracerinfo.dat files')
gc.core.check_paths(reftracerinfo_file, devtracerinfo_file)

print('\nChecking diaginfo.dat files')
gc.core.check_paths(refdiaginfo_file, devdiaginfo_file)


In [None]:
refbp = xbpch.open_bpchdataset(refdata, tracerinfo_file=reftracerinfo_file, diaginfo_file=refdiaginfo_file)
devbp = xbpch.open_bpchdataset(devdata, tracerinfo_file=devtracerinfo_file, diaginfo_file=devdiaginfo_file)
[commonvars, commonvars1D, commonvars2D, commonvars3D] = gc.core.compare_varnames(refbp, refstr, devbp, devstr)

## Convert the GAMAP-style bpch diagnostic names to netCDF diagnostic names

We have created a function to convert the old bpch diagnostic names to the names used in the new GEOS-Chem netCDF diagnostic output.  The function uses a dictionary to specify each old name and the new name that should replace it.

NOTE: Only the diagnostic names used for the 1-month benchmark output are included in this function.  At a later date we can add other diagnostic names (e.g. from the GEOS-Chem specialty simulations) to make the function more general.
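
The dictionary-driven renaming can be illustrated on bare strings. This is a sketch under stated assumptions, not the function defined below: `convert_name` is hypothetical, and the meaning given to each action ('replace', 'append', 'skip') is a guess made for illustration. In particular, the real 'append' rule may differ from one diagnostic to another.

```python
def convert_name(old, names):
    """Map one bpch-style name to a netCDF-style name.

    names maps bpch id -> [netcdf id, action]. The action semantics here are
    assumptions made for this sketch:
      'replace' : the whole name becomes the netcdf id
      'append'  : the bpch id is a prefix; the remainder of the old name is
                  appended to the netcdf id with an underscore
      'skip'    : the name is left unchanged
    """
    # Exact matches first, so a short id cannot capture a longer name
    if old in names:
        new_id, action = names[old]
        if action == 'replace':
            return new_id
        if action == 'skip':
            return old
    # Then prefix matches for 'append' entries
    for bpch_id, (new_id, action) in names.items():
        if action == 'append' and old.startswith(bpch_id):
            return '{}_{}'.format(new_id, old[len(bpch_id):])
    return old  # no rule found: keep the original name

names = {
    'IJ_AVG_S_':        ['SpeciesConc',       'append'],
    'OD_MAP_S_OPD1550': ['AODDust550nm_bin1', 'replace'],
    'BC_ANTH_BLKC':     ['EmisBC_Anthro',     'skip'],
}
converted = [convert_name(n, names) for n in
             ('IJ_AVG_S_O3', 'OD_MAP_S_OPD1550', 'BC_ANTH_BLKC', 'PEDGE_S_PSURF')]
```

Names with no matching rule (the last one here) pass through unchanged, so a partial dictionary still yields a usable dataset.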

In [None]:
def convert_bpch_names_to_netcdf_names(ds, inplace=True, verbose=False):

    '''
    Function to convert the non-standard bpch diagnostic names
    to names used in the GEOS-Chem netCDF diagnostic outputs.
    
    Arguments:
    ds      : The xarray Dataset object whose names are to be replaced.
    
    inplace : If inplace=True (which is the default setting), then 
              the variable names in ds will be renamed in-place.
              Otherwise a copy of ds will be created.

    verbose : Turns on verbose print output

    NOTE: Only the diagnostic names needed for the 1-month benchmark
    plots have been added at this time.  To make this a truly general
    tool, we can consider adding the diagnostic names for the GEOS-Chem
    specialty simulations later on.
    '''

    # Names dictionary (key = bpch id, value[0] = netcdf id,
    # value[1] = action to create full name using id)
    names = {'IJ_AVG_S_':         ['SpeciesConc',                   'append' ],
             'OD_MAP_S_OPD1550':  ['AODDust550nm_bin1',             'replace'],
             'OD_MAP_S_OPD2550':  ['AODDust550nm_bin2',             'replace'],
             'OD_MAP_S_OPD3550':  ['AODDust550nm_bin3',             'replace'],
             'OD_MAP_S_OPD4550':  ['AODDust550nm_bin4',             'replace'],
             'OD_MAP_S_OPD5550':  ['AODDust550nm_bin5',             'replace'],
             'OD_MAP_S_OPD6550':  ['AODDust550nm_bin6',             'replace'],
             'OD_MAP_S_OPD7550':  ['AODDust550nm_bin7',             'replace'],
             'OD_MAP_S_OPSO4550': ['AODHyg550nm_SO4',               'replace'],
             'OD_MAP_S_OPBC550':  ['AODHyg550nm_BCPI',              'replace'],
             'OD_MAP_S_OPOC550':  ['AODHyg550nm_OCPI',              'replace'],
             'OD_MAP_S_OPSSa550': ['AODHyg550nm_SALA',              'replace'],
             'OD_MAP_S_OPSSc550': ['AODHyg550nm_SALC',              'replace'],
             'OD_MAP_S_ODSLA':    ['AODStratLiquidAer550nm',        'replace'],
             'ACETSRCE_ACETbg':   ['EmisACET_DirectBio',            'replace'],
             'ACETSRCE_ACETmb':   ['EmisACET_MethylBut',            'replace'],
             'ACETSRCE_ACETmo':   ['EmisACET_Monoterp',             'replace'],
             'ACETSRCE_ACETop':   ['EmisACET_Ocean',                'replace'],
             'ANTHSRCE_':         ['Anthro',                        'append' ],
             'BC_ANTH_BLKC':      ['EmisBC_Anthro',                 'skip'   ],
             'BC_BIOB_BLKC':      ['EmisBC_BioBurn',                'skip'   ],
             'BC_BIOF_BLKC':      ['EmisBC_Biofuel',                'skip'   ],
             'BIOBSRCE_':         ['BioBurn',                       'append' ],
             'BIOFSRCE_':         ['Biofuel',                       'append' ],
             'BIOGSRCE_':         ['Biogenic',                      'append' ],
             'BXHGHT_S_':         ['Met',                           'append' ],
             'CHEM_L_S_OH':       ['OHconcAfterChem',               'replace'],
             'CHEM_L_S_HO2':      ['HO2concAfterChem',              'replace'],
             'CHEM_L_S_O1D':      ['O1DconcAfterChem',              'replace'],
             'CHEM_L_S_O':        ['O3PconcAfterChem',              'replace'],
             'CO__SRCE_COanth':   ['EmisCO_Anthro',                 'skip'   ],
             'CO__SRCE_CObb':     ['EmisCO_BioBurn',                'skip'   ],
             'CO__SRCE_CObf':     ['EmisCO_Biofuel',                'skip'   ],
             'CO__SRCE_COmono':   ['EmisCO_Monoterp',               'replace'],
             'CO__SRCE_COship':   ['EmisCO_Ship',                   'replace'],
             'CV_FLX_S_':         ['CloudConvFlux',                 'append' ],
             'DAO_3D_S_':         ['Met',                           'append' ],
             'DAO_FLDS_PS_PBL':   ['Met_PBLH',                      'skip'   ],
             'DAO_FLDS_':         ['Met',                           'append' ],
             'DMS_BIOG_DMS':      ['EmisDMS_Ocean',                 'replace'],
             'DRYD_FLX_':         ['DryDep',                        'append' ],
             'DRYD_VEL_':         ['DryDepVel',                     'append' ],
             'DUST_SRC_':         ['TBD',                           'skip'   ],
             'DUSTSRCE_DST1':     ['EmisDST1_Natural',              'replace'],
             'DUSTSRCE_DST2':     ['EmisDST2_Natural',              'replace'],
             'DUSTSRCE_DST3':     ['EmisDST3_Natural',              'replace'],
             'DUSTSRCE_DST4':     ['EmisDST4_Natural',              'replace'],
             'DXYP_DXYP':         ['Met_AREAM2',                    'replace'],
             'ECIL_SRC_':         ['TBD',                           'skip'   ],
             'ECOB_SRC_':         ['TBD',                           'skip'   ],
             'EW_FLX_S_':         ['AdvFluxZonal',                  'append' ],
             'FJX_FLUX_':         ['TBD',                           'skip'   ],
             'IJ_24H_S_':         ['TBD',                           'skip'   ],
             'IJ_MAX_S_':         ['TBD',                           'skip'   ],
             'IJ_SOA_S_':         ['AerMass',                       'append' ],
             'INST_MAP_':         ['TBD',                           'skip'   ],
             'ISRPIA_S_ISORPH':   ['Chem_PHSAV',                    'replace'],
             'ISRPIA_S_ISORH+':   ['Chem_HPLUSSAV',                 'replace'],
             'ISRPIA_S_ISORH2O':  ['Chem_WATERSAV',                 'replace'],
             'JV_MAP_S_':         ['JNoon',                         'append' ],
             'LFLASH_':           ['TBD',                           'skip'   ],
             'NH3_ANTH_NH3':      ['EmisNH3_Anthro',                'skip'   ],
             'NH3_NATU_NH3':      ['EmisNH3_Natural',               'replace'],
             'NK_EMISS_':         ['TBD',                           'skip'   ],
             'MC_FRC_S_':         ['WetLossConvFrac',               'append' ],
             'NO_AC_S_NO':        ['EmisNO_Aircraft',               'replace'],
             'NO_AN_S_NO':        ['EmisNO_Anthro',                 'skip'   ],
             'NO_FERT_NO':        ['EmisNO_Fert',                   'replace'],
             'NO_LI_S_NO':        ['EmisNO_Lightning',              'replace'],
             'NO_SOIL_NO':        ['EmisNO_Soil',                   'replace'],
             'NS_FLX_S_':         ['AdvFluxMerid',                  'append' ],
             'OC_ANTH_ORGC':      ['EmisOC_Anthro',                 'replace'],
             'OC_LIMO_LIMO':      ['TBD',                           'skip'   ],
             'OC_MTPA_MTPA':      ['TBD',                           'skip'   ],
             'OC_MTPO_MTPO':      ['TBD',                           'skip'   ],
             'OC_SESQ_SESQ':      ['TBD',                           'skip'   ],
             'OCIL_SRC_':         ['TBD',                           'skip'   ],
             'OCOB_SRC_':         ['TBD',                           'skip'   ],
             'OD_MAP_S_OPD':      ['Met_OPTD',                      'replace'],
             'OD_MAP_S_CLDTOT':   ['Met_CLDF',                      'replace'],
             'OD_MAP_S_OPTD':     ['AODDust',                       'replace'],
             'OD_MAP_S_SD':       ['AerSurfAreaDust',               'replace'],
             'OD_MAP_S_HGSO4':    ['AerHygroscopicGrowth_SO4',      'replace'],
             'OD_MAP_S_HGBC':     ['AerHygroscopicGrowth_BCPI',     'replace'],
             'OD_MAP_S_HGOC':     ['AerHygroscopicGrowth_OCPI',     'replace'],
             'OD_MAP_S_HGSSa':    ['AerHygroscopicGrowth_SALA',     'replace'],
             'OD_MAP_S_HGSSc':    ['AerHygroscopicGrowth_SALC',     'replace'],
             'OD_MAP_S_SSO4':     ['AerSurfAreaHyg_SO4',            'replace'],
             'OD_MAP_S_SBC':      ['AerSurfAreaHyg_BCPI',           'replace'],
             'OD_MAP_S_SOC':      ['AerSurfAreaHyg_OCPI',           'replace'],
             'OD_MAP_S_SSSa':     ['AerSurfAreaHyg_SALA',           'replace'],
             'OD_MAP_S_SSSc':     ['AerSurfAreaHyg_SALC',           'replace'],
             'OD_MAP_S_SASLA':    ['AerSurfAreaStratLiquid',        'replace'],
             'OD_MAP_S_NDSLA':    ['AerNumDensityStratLiquid',      'replace'],
             'OD_MAP_S_ODSPA':    ['AODPolarStratCloud550nm',       'replace'],
             'OD_MAP_S_SASPA':    ['AerSurfAreaPolarStratCloud',    'replace'],
             'OD_MAP_S_NDSPA':    ['AerNumDensityStratParticulate', 'replace'],
             'OD_MAP_S_ISOPAOD':  ['AODSOAfromAqIsoprene550nm',     'replace'],
             'OD_MAP_S_AQAVOL':   ['AerAqueousVolume',              'replace'],
             'PBLDEPTH_':         ['TBD',                           'skip'   ],
             'PEDGE_S_PSURF':     ['TBD',                           'skip'   ],
             'PG_SRCE_':          ['TBD',                           'skip'   ],
             'PG_PP_':            ['TBD',                           'skip'   ],
             'PL_BC_S_BLKC':      ['ProdBCPIfromBCPO',              'replace'],
             'PL_OC_S_ORGC' :     ['ProdOCPIfromOCPO',              'replace'],
             'PL_OC_S_ASOA' :     ['Prodfrom',                      'skip'   ],
             'PL_OC_S_ISOA' :     ['Prodfrom',                      'skip'   ],
             'PL_OC_S_TSOA' :     ['Prodfrom',                      'skip'   ],
             'PL_SUL_S_SO2dms':   ['ProdSO2fromDMSandOH',           'replace'],
             'PL_SUL_S_SO2no3':   ['ProdSO2fromDMSandNO3',          'replace'],
             'PL_SUL_S_SO2tot':   ['ProdSO2fromDMS',                'replace'],
             'PL_SUL_S_MSAdms':   ['ProdMSAfromDMS',                'replace'],
             'PL_SUL_S_SO4gas':   ['ProdSO4fromGasPhase',           'replace'],
             'PL_SUL_S_SO4h2o2':  ['ProdSO4fromH2O2inCloud',        'replace'],
             'PL_SUL_S_SO4o3s':   ['ProdSO4fromO3s',                'replace'],
             'PL_SUL_S_SO4o3':    ['ProdSO4fromO3inCloud',          'replace'],
             'PL_SUL_S_SO4ss':    ['ProdSO4fromO3inSeaSalt',        'replace'],
             'PL_SUL_S_SO4dust':  ['ProdSO4fromOxidationOnDust',    'replace'],
             'PL_SUL_S_NITdust':  ['ProdNITfromHNO3uptakeOnDust',   'replace'],
             'PL_SUL_S_H2SO4dus': ['ProdSO4fromUptakeOfH2SO4g',     'replace'],
             'PL_SUL_S_HNO3ss':   ['LossHNO3onSeaSalt',             'replace'],
             'PL_SUL_S_SO4hobr':  ['ProdSO4fromHOBrInCloud',        'replace'],
             'PL_SUL_S_SO4sro3':  ['ProdSO4fromSRO3',               'replace'],
             'PL_SUL_S_SO4srhob': ['ProdSO4fromSRHObr',             'replace'],
             'PORL_L_S_PCO_CH4':  ['ProdCObyCH4',                   'skip'   ],
             'PORL_L_S_PCO_NMVO': ['ProdCObyNMVOC',                 'skip'   ],
             'PORL_L_S_PO3':      ['Prod_O3',                       'replace'],
             'PORL_L_S_PCO':      ['Prod_CO',                       'replace'],
             'PORL_L_S_PSO4':     ['Prod_SO4',                      'replace'],
             'PORL_L_S_POx':      ['Prod_Ox',                       'replace'],
             'PORL_L_S_LO3':      ['Loss_O3',                       'replace'],
             'PORL_L_S_LCO':      ['Loss_CO',                       'replace'],
             'PORL_L_S_LOx':      ['Loss_Ox',                       'replace'],
             'RADMAP_':           ['TBD',                           'skip'   ],
             'RN_SRCE_Rn':        ['EmisRn_Soil',                   'replace'],
             'RN_SRCE_Pb':        ['PbFromRnDecay',                 'replace'],
             'RN_SRCE_Be7':       ['EmisBe_Cosmic',                 'replace'],
             'RN_DECAY_':         ['RadDecay',                      'append' ],
             'SALTSRCE_SALA':     ['EmisSALA_Natural',              'replace'],
             'SALTSRCE_SALC':     ['EmisSALC_Natural',              'replace'],
             'SHIP_SSS_':         ['TBD',                           'skip'   ],
             'SO2_AC_S_SO2':      ['EmisSO2_Aircraft',              'replace'],
             'SO2_AN_S_SO2':      ['EmisSO2_Anthro',                'skip'   ],
             'SO2_EV_S_SO2':      ['EmisSO2_EVOL',                  'replace'],
             'SO2_NV_S_SO2':      ['EmisSO2_NVOL',                  'replace'],
             'SO2_SHIP_SO2':      ['EmisSO2_Ship',                  'replace'],
             'SO4_AN_S_SO4':      ['EmisSO4_Anthro',                'skip'   ],
             'SO4_BIOF_SO4':      ['EmisSO4_Biofuel',               'replace'],
             'SF_EMIS_':          ['TBD',                           'skip'   ],
             'SS_EMIS_':          ['TBD',                           'skip'   ],
             'THETA_S_THETA':     ['Met_THETA',                     'replace'],
             'TIME_TPS_TIMETROP': ['TBD',                           'skip'   ],
             'TMS_COND_':         ['TBD',                           'skip'   ],
             'TMS_COAG_':         ['TBD',                           'skip'   ],
             'TMS_NUCL_':         ['TBD',                           'skip'   ],
             'TMS_AQOX_':         ['TBD',                           'skip'   ],
             'AERO_FIX_':         ['TBD',                           'skip'   ],
             'TMS_SOA_':          ['TBD',                           'skip'   ],
             'TR_PAUSE_TP_HGHT':  ['Met_TropHt',                    'replace'],
             'TR_PAUSE_TP_LEVEL': ['Met_TropLev',                   'replace'],
             'TR_PAUSE_TP_PRESS': ['Met_TROPP',                     'replace'],
             'UP_FLX_S_':         ['AdvFluxVert',                   'append' ],
             'WETDCV_S_':         ['WetLossConv',                   'append' ],
             'WETDLS_S_':         ['WetLossLS',                     'append' ],
             'PG-PP_S_':          ['POPS',                          'append' ],
             'NH3_BIOB_NH3':      ['EmisNH3_BioBurn',               'skip'   ],
             'NH3_BIOF_NH3':      ['EmisNH3_Biofuel',               'skip'   ],
             'NO_BIOB_NO':        ['EmisNO_BioBurn',                'skip'   ],
             'NO_BIOF_NO':        ['EmisNO_Biofuel',                'skip'   ],
             'OC_BIOB_ORGC':      ['EmisOC_BioBurn',                'skip'   ],
             'OC_BIOF_ORGC':      ['EmisOC_Biofuel',                'skip'   ],
             'OC_BIOG_ORGC':      ['EmisOC_Biogenic',               'skip'   ],
             'SO2_BIOB_SO2':      ['EmisSO2_BioBurn',               'skip'   ],
             'SO2_BIOF_SO2':      ['EmisSO2_Biofuel',               'skip'   ]}

    # Special cases: final names that override the names constructed above
    special_vars = {'AerMassPM25'  : 'PM25',
                    'AerMassbiogOA': 'TotalBiogenicOA',
                    'AerMasssumOA' : 'TotalOA',
                    'AerMasssumOC' : 'TotalOC',
                    'AerMassBNO'   : 'BetaNO',
                    'AerMassOC'    : 'OC',
                    'Met_AIRNUMDE' : 'Met_AIRNUMDEN',
                    'Met_UWND'     : 'Met_U',
                    'Met_VWND'     : 'Met_V',
                    'Met_CLDTOP'   : 'Met_CLDTOPS',
                    'Met_GWET'     : 'Met_GWETTOP',
                    'Met_PRECON'   : 'Met_PRECCON',
                    'Met_PREACC'   : 'Met_PRECTOT',
                    'Met_PBL'      : 'Met_PBLH' }

    # Python dictionary for variable name replacement
    old_to_new = {}

    # Loop over all variable names in the data set
    for variable_name in ds.data_vars:

        # Check if name matches anything in dictionary. Give warning if not.
        oldid = ''
        newid = ''
        idaction = ''
        for key in names:
            if key in variable_name:
                if names[key][1] == 'skip':
                    # Verbose output
                    if verbose:
                        print("WARNING: skipping {}".format(key))
                else:
                    oldid = key
                    newid = names[key][0]
                    idaction = names[key][1]
                break

        # Go to the next variable if no definition was found
        if oldid == '' or newid == '' or idaction == '':
            continue

        # If fullname replacement:
        if idaction == 'replace':
            oldvar = oldid
            newvar = newid

            # Update the dictionary of names with this pair
            old_to_new.update({variable_name : newvar})

        # For all the rest:
        else:
            linearr = variable_name.split("_")
            varstr = linearr[-1]
            oldvar = oldid + varstr

            # Most append cases just append with underscore
            if oldid in ['IJ_AVG_S_', 'RN_DECAY_', 'WETDCV_S_', 'WETDLS_S_', 'BXHGHT_S_', 'DAO_FLDS_', 
                         'DAO_3D_S_', 'PL_SUL_',   'CV_FLX_S_', 'EW_FLX_S_', 'NS_FLX_S_', 'UP_FLX_S_',
                         'MC_FRC_S_', 'JV_MAP_S_' ]:
     
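                # Skip certain fields that will cause conflicts w/ netCDF.
                # Test the full variable name here: oldid only ever holds
                # a dictionary key, so it can never match these names.
                if variable_name in ['DAO_FLDS_PS_PBL', 'DAO_FLDS_TROPPRAW']:

                    # Verbose output
                    if verbose:
                        print('Skipping: {}'.format(variable_name))
                    continue

                newvar = newid + '_' + varstr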

            # IJ_SOA_S_
            elif oldid == 'IJ_SOA_S_':
                newvar = newid + varstr
            
            # DRYD_FLX_, DRYD_VEL_
            elif 'DRYD_' in oldid:
                newvar = newid + '_' + varstr[:-2]

            # BIOBSRCE_, BIOFSRCE_, BIOGSRCE_, ANTHSRCE_
            elif oldid in ['BIOBSRCE_', 'BIOFSRCE_', 'BIOGSRCE_', 
                           'ANTHSRCE_' ]:
                newvar = 'Emis' + varstr +'_' + newid
            
            # If nothing found...
            else:
                
                # Verbose output
                if verbose:
                    print("WARNING: Nothing defined for: {}".format(variable_name))
                continue

            # Overwrite certain variable names
            if newvar in special_vars:
                newvar = special_vars[newvar]

            # Update the dictionary of names with this pair
            old_to_new.update({variable_name : newvar})

    # Verbose output
    if verbose:
        print("\nList of bpch names and netCDF names")
        for key in old_to_new:
            print("{} ==> {}".format(key.ljust(25),old_to_new[key].ljust(40)))

    # Rename the variables in the dataset
    if verbose:
        print( "\nRenaming variables in the data...")
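    # NOTE: the inplace keyword is only accepted by older xarray releases.
    # Newer releases dropped it; there, omit it and use the returned Dataset.
    ds = ds.rename(name_dict=old_to_new, inplace=inplace)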
    
    # Return the dataset
    return ds
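
The lookup loop in the function above stops at the first dictionary key that occurs inside the variable name, so a specific key (e.g. `DAO_FLDS_PS_PBL`) only takes effect if it is listed before a shorter key that is also contained in the name (e.g. `DAO_FLDS_`). The sketch below isolates that behavior; `first_match` is a hypothetical helper written for this illustration, not part of the function or of gcpy.

```python
def first_match(variable_name, names):
    """Return the first key of names that occurs inside variable_name."""
    # Dictionaries iterate in insertion order (guaranteed in Python 3.7+)
    for key in names:
        if key in variable_name:
            return key
    return None


# Same two entries, listed in opposite orders
specific_first = {"DAO_FLDS_PS_PBL": "skip", "DAO_FLDS_": "append"}
generic_first = {"DAO_FLDS_": "append", "DAO_FLDS_PS_PBL": "skip"}

print(first_match("DAO_FLDS_PS_PBL", specific_first))  # DAO_FLDS_PS_PBL
print(first_match("DAO_FLDS_PS_PBL", generic_first))   # DAO_FLDS_
```

When adding entries to the `names` dictionary, it therefore seems safest to place longer, more specific keys ahead of any shorter key they contain.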


NOTE: Temporarily point to Bob Yantosca's data files on Odyssey for testing.  This notebook appears to point to a data folder on Lizzie's Mac.  @Lizzie, you can remove this later once we merge the notebook.


In [None]:
# Bob Y. has the bpch files and *.dat files stored in this path
testdir = '/n/home/09/ryantosca/GC/python/data/bpch_examples'

# Ref data
refstr = 'Ref'
reftracerinfo_file = os.path.join(testdir, refstr, 'tracerinfo.dat') 
refdiaginfo_file = os.path.join(testdir, refstr, 'diaginfo.dat')
refdata = os.path.join(testdir, refstr, 'trac_avg.GC_12.0.0')

# Dev data
devstr = 'Dev'
devtracerinfo_file = os.path.join(testdir, devstr, 'tracerinfo.dat') 
devdiaginfo_file = os.path.join(testdir, devstr, 'diaginfo.dat')
devdata = os.path.join(testdir, devstr, 'trac_avg.GC_12.1.0')

# Make sure the bpch files exist
print('Checking bpch data files')
gc.core.check_paths(refdata, devdata)

# Make sure the tracerinfo.dat files exist
print('\nChecking tracerinfo.dat files')
gc.core.check_paths(reftracerinfo_file, devtracerinfo_file)

# Make sure the diaginfo.dat files exist
print('\nChecking diaginfo.dat files')
gc.core.check_paths(refdiaginfo_file, devdiaginfo_file)
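
For readers who want to run a similar check without gcpy, a rough stand-in can be written with the standard library. The helper name `missing_paths` is hypothetical and is not part of gcpy; it only reports which paths are absent and makes no claim about how `gc.core.check_paths` behaves internally.

```python
import os


def missing_paths(*paths):
    """Return the subset of the given paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# Example: the current directory exists, the second path should not
here = os.getcwd()
absent = os.path.join(here, "no_such_dir_for_this_example", "trac_avg.bpch")
print(missing_paths(here, absent))
```

An empty list means every path was found; otherwise the list names the files to track down before reading the data.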


Read the data files. (@Lizzie, you can delete this cell; it is only for testing.)


In [None]:
# Read the "Ref" data
print("\nReading the 'Ref' file: {}".format(refdata))
refbp = xbpch.open_bpchdataset(filename=refdata,
                               tracerinfo_file=reftracerinfo_file,
                               diaginfo_file=refdiaginfo_file)

# Print contents
refbp

In [None]:
# Read the "Dev" data
print("\nReading the 'Dev' file: {}".format(devdata))
devbp = xbpch.open_bpchdataset(filename=devdata,
                               tracerinfo_file=devtracerinfo_file,
                               diaginfo_file=devdiaginfo_file)

# Print contents
devbp

Convert the diagnostic names to the new netCDF names

In [17]:
refbp = convert_bpch_names_to_netcdf_names(refbp)

# Show contents
refbp

NameError: name 'refbp' is not defined