# By-band _k_-Distribution Files

## Dependencies

`numpy` is installed in the Python environment at NERSC (`module load python`), but `xarray` is not, so users must install it themselves with `pip install xarray`. `PIPPATH` is the assumed install location and may need to be adjusted for a different machine or Python version. This notebook depends heavily on `xarray`.

In [None]:
import os, sys

# "standard" install
import numpy as np

# directory in which libraries installed with pip are saved
PIPPATH = '{}/.local/'.format(os.path.expanduser('~')) + \
    'cori/3.7-anaconda-2019.10/lib/python3.7/site-packages'
PATHS = ['common', PIPPATH]
for path in PATHS: sys.path.append(path)

# user must do `pip install xarray` on cori (or other NERSC machines)
import xarray as XA

# common submodule
import utils
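
Because the `xarray` import only succeeds if the package was installed where `PIPPATH` points, it can help to check for it before running the rest of the notebook. The following is a minimal sketch, not part of the original workflow; `is_installed` is a hypothetical helper name, and the check only looks at whatever is on `sys.path` when it runs.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


# report which of the notebook's dependencies can be found
for name in ('numpy', 'xarray'):
    if is_installed(name):
        print('{}: found'.format(name))
    else:
        print('{}: missing, try `pip install {}`'.format(name, name))
```

Run this after `PIPPATH` has been appended to `sys.path`; otherwise a user-level install may be reported as missing.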

## Paths

`kFileNC` can point to any netCDF file that contains absorption coefficients for all _g_-points. We start with the reference _k_-distribution file from 4-Dec-2018, available in the [RTE-RRTMGP GitHub repository](https://github.com/earth-system-radiation/rte-rrtmgp/blob/master/rrtmgp/data/rrtmgp-data-lw-g256-2018-12-04.nc). A copy of this file was placed in the E3SM project space at NERSC.

In [None]:
PROJECT = '/global/project/projectdirs/e3sm/pernak18/inputs/g-point-reduce'
kFileNC = '{}/rrtmgp-data-lw-g256-2018-12-04.nc'.format(PROJECT)
utils.file_check(kFileNC)
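
`utils.file_check` comes from the `common` submodule and is not shown in this notebook. For readers without that submodule, a hypothetical stand-in might look like the sketch below. The name `check_file_exists` and the choice to raise `FileNotFoundError` are assumptions; the real function may behave differently (for example, by exiting).

```python
import os


def check_file_exists(path):
    """Hypothetical stand-in for utils.file_check.

    Raise FileNotFoundError if `path` is not an existing file,
    otherwise return the path unchanged.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError('Could not find {}'.format(path))
    return path
```

Failing early here avoids a less obvious error later when `xarray` tries to open the file.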

## Bandsplitting

1. Open the netCDF file that contains the entire _k_-distribution
2. Loop over all bands and determine which _g_-point indices correspond to each band
3. Loop over all netCDF variables and determine which ones have the `gpt` dimension, which is the only dimension that is modified
4. "Slice" the variables with the `gpt` dimension so they only contain the portions of the _k_-distribution corresponding to a given band
5. Write modified **and** unmodified variables to a new output netCDF file whose name specifies the band number

The end result is a netCDF for each band that contains only the parts of the variables that depend on _g_-points that correspond to the given band.
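
The index bookkeeping in steps 2 and 4 is the easiest part to get wrong, so here is a small sketch of it on synthetic data. It assumes, as the notebook code does, that `bnd_limits_gpt` holds 1-based, inclusive `[first, last]` _g_-point limits for each band. The band sizes and the array `k_all` are made up for illustration.

```python
import numpy as np

# hypothetical band limits, 1-based and inclusive:
# 3 bands with 4 g-points each (made-up sizes)
bnd_limits_gpt = np.array([[1, 4], [5, 8], [9, 12]])

# synthetic variable whose last axis plays the role of `gpt`
n_gpt = int(bnd_limits_gpt.max())
k_all = np.arange(2 * n_gpt).reshape(2, n_gpt)

band_slices = []
for i_band, (first, last) in enumerate(bnd_limits_gpt):
    # 1-based inclusive [first, last] -> 0-based inclusive [i1, i2]
    i1, i2 = first - 1, last - 1
    # Python slices exclude the end point, hence i2 + 1
    k_band = k_all[:, i1:i2 + 1]
    band_slices.append(k_band)
    print('band {:02d}: g-points {}-{}, shape {}'.format(
        i_band + 1, first, last, k_band.shape))

# stitching the bands back together should recover the original array
assert np.array_equal(np.concatenate(band_slices, axis=1), k_all)
```

The final check is a cheap way to confirm that no _g_-point is dropped or duplicated at a band boundary.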

**Note**: if files of the same name for a given band already exist, they are deleted before the new files are written. This is so we do not append newer data to older files. If the files exist and the user wants to retain them, it is recommended that the files be moved to a subdirectory or somewhere else on the disk.

We also want to store the weights for each _g_-point in these files. I am not sure where these originated, but I got them from Menno's optimization code. The weights are the same for each band.

In [None]:
weights = [
    0.1527534276, 0.1491729617, 0.1420961469, 0.1316886544, 
    0.1181945205, 0.1019300893, 0.0832767040, 0.0626720116, 
    0.0424925000, 0.0046269894, 0.0038279891, 0.0030260086, 
    0.0022199750, 0.0014140010, 0.0005330000, 0.0000750000
]
# one weight per g-point in a band; `dims` takes the dimension names
xaWeights = XA.DataArray(weights, dims=['gpt'], name='gpt_weights')

with XA.open_dataset(kFileNC) as kAllObj:
    gLims = kAllObj.bnd_limits_gpt
    ncVars = list(kAllObj.keys())
    dimStr = 'gpt'

    for iBand in kAllObj.bnd.values:
        # make a separate netCDF for each band
        outNC = 'coefficients_lw_band{:02d}.nc'.format(iBand+1)

        # make sure we don't keep appending to an older file
        if os.path.exists(outNC): os.remove(outNC)

        # determine which variables need to be parsed
        for ncVar in ncVars:
            # append to netCDF if it already exists, start the file if not
            modeNC = 'a' if os.path.exists(outNC) else 'w'

            ncDat = kAllObj[ncVar]

            if dimStr in kAllObj[ncVar].dims:
                # grab only the g-point information for this band
                # and convert to zero-offset
                i1, i2 = gLims[iBand].values-1
                ncDat = ncDat.isel(gpt=slice(i1, i2+1))
            # endif

            # write variable to output file
            ncDat.to_netcdf(outNC, mode=modeNC)
        # end ncVar loop

        # write weights to output file
        xaWeights.to_netcdf(outNC, mode='a')

        print('Completed {}'.format(outNC))
    # end band loop
# endwith
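
Since the origin of the weights is not documented, a quick sanity check may be worthwhile. The sketch below assumes the weights act as quadrature weights over the _g_-points of one band, in which case they should sum to approximately 1. That interpretation is an assumption, not something the source confirms. The list is copied from the cell above so that the block runs on its own.

```python
import math

# copied from the cell above so this block is self-contained
weights = [
    0.1527534276, 0.1491729617, 0.1420961469, 0.1316886544,
    0.1181945205, 0.1019300893, 0.0832767040, 0.0626720116,
    0.0424925000, 0.0046269894, 0.0038279891, 0.0030260086,
    0.0022199750, 0.0014140010, 0.0005330000, 0.0000750000
]

# fsum limits floating-point round-off in the accumulated sum
total = math.fsum(weights)
print('number of weights: {}'.format(len(weights)))
print('sum of weights: {:.10f}'.format(total))

# assumption: quadrature weights should be normalized to ~1
assert math.isclose(total, 1.0, abs_tol=1e-6)
```

If the number of weights ever differs from the number of _g_-points in a band, writing `gpt_weights` alongside the sliced variables would produce a `gpt` dimension of inconsistent length, so the count is worth checking as well.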

## Validation

Check that the `outNC` file for each band contains what it is supposed to by comparing its arrays with the corresponding slices of the original arrays. We deliberately use a different technique for extracting the data here (heavier usage of `numpy` rather than `xarray` machinery). To validate, we compute a simple difference (error) and print its range; we do not plot anything because some of the arrays have too many dimensions for comprehensive plotting to be practical. A range of (0, 0) means the band file matches the original.

In [None]:
import glob
bandFiles = sorted(glob.glob('coefficients_lw_band??.nc'))
gptVars = ['kmajor', 'plank_fraction']

for iBand, bFile in enumerate(bandFiles):
    print(os.path.basename(bFile))

    with XA.open_dataset(kFileNC) as broad, XA.open_dataset(bFile) as band:
        for gVar in gptVars:
            i1, i2 = np.array(broad.bnd_limits_gpt)[iBand]-1
            kBroad = np.array(broad[gVar][:,:,:,i1:i2+1])
            kBand = np.array(band[gVar])

            # diff won't work if arrays do not have consistent dimensions
            diff = kBand-kBroad

            # print min and max of diff
            dMin, dMax = utils.pmm(diff)
            print('{} difference range: ({}, {})'.format(
                gVar, dMin, dMax))
    # end with
    print()
# end band loop
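
`utils.pmm` is part of the `common` submodule and is not shown here. The sketch below assumes it returns the minimum and maximum of an array and uses a hypothetical stand-in, `min_max_sketch`, on synthetic data. It also adds an explicit shape check, because `numpy` broadcasting can silently accept some mismatched shapes instead of raising an error.

```python
import numpy as np


def min_max_sketch(arr):
    """Hypothetical stand-in for utils.pmm: return (min, max) of `arr`."""
    arr = np.asarray(arr)
    return arr.min(), arr.max()


# synthetic 4-D array with `gpt` as the last axis; sizes are made up
k_broad = np.arange(2 * 3 * 2 * 8, dtype=float).reshape(2, 3, 2, 8)

# pretend the second band covers zero-based g-points 4 through 7
i1, i2 = 4, 7
k_band = k_broad[:, :, :, i1:i2 + 1].copy()
k_ref = k_broad[..., i1:i2 + 1]

# guard against silent broadcasting of mismatched shapes
assert k_band.shape == k_ref.shape

diff = k_band - k_ref
d_min, d_max = min_max_sketch(diff)
print('difference range: ({}, {})'.format(d_min, d_max))
```

For data that has gone through a netCDF round trip, `np.array_equal(k_band, k_ref)` gives the same information as a single boolean.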