# By-band _k_-Distribution Files

## Dependencies

`numpy` is available in the Python environment at NERSC (`module load python`), but `xarray` is not, so users must install it themselves (e.g., with `pip install xarray`). `PIPPATH` is the assumed install location and may need to be adjusted for other users or Python versions. This notebook depends heavily on `xarray`.

In [None]:
import os, sys

# "standard" install
import numpy as np

# directory in which libraries installed with pip are saved
PIPPATH = '{}/.local/'.format(os.path.expanduser('~')) + \
    'cori/3.7-anaconda-2019.10/lib/python3.7/site-packages'
sys.path.append(PIPPATH)

# user must do `pip install xarray` on cori (or other NERSC machines)
import xarray as XA
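
If `PIPPATH` is wrong or `xarray` has not been installed yet, the import above fails with a bare `ModuleNotFoundError`. The following is a minimal sketch of a guarded setup using only the standard library. The helper names `add_site_packages` and `module_available` are hypothetical and not part of this notebook.

```python
import importlib.util
import os
import sys


def add_site_packages(path):
    """Append `path` to sys.path if it is an existing directory not already listed.

    Returns True if the path was appended, False otherwise.
    """
    if os.path.isdir(path) and path not in sys.path:
        sys.path.append(path)
        return True
    return False


def module_available(name):
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# Possible usage in the cell above (PIPPATH as defined there):
#   add_site_packages(PIPPATH)
#   if not module_available('xarray'):
#       raise ImportError('xarray not found; run `pip install xarray` first')
```

Checking before importing makes it easier to tell a missing package apart from a mistyped path.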

## Paths

`kFileNC` can point to any netCDF file that contains absorption coefficients for all _g_-points. We start with the reference _k_-distribution file from 4-Dec-2018, available in the [RTE-RRTMGP GitHub repository](https://github.com/earth-system-radiation/rte-rrtmgp/blob/master/rrtmgp/data/rrtmgp-data-lw-g256-2018-12-04.nc). A copy was placed in the E3SM project space at NERSC.

In [None]:
PROJECT = '/global/project/projectdirs/e3sm/pernak18/inputs/g-point-reduce'
kFileNC = '{}/rrtmgp-data-lw-g256-2018-12-04.nc'.format(PROJECT)

## Bandsplitting

1. Open the netCDF with the entire _k_-distribution in it
2. Loop over all bands and determine which _g_-point indices correspond to each band
3. Loop over all netCDF variables and determine which ones have the `gpt` dimension, which is the only dimension that is modified
4. "Slice" the variables with the `gpt` dimensions so they only contain the portions of the _k_-distribution corresponding to a given band
5. Write modified **and** unmodified variables to a new output netCDF that specifies the band number

The end result is a netCDF for each band that contains only the parts of the variables that depend on _g_-points that correspond to the given band.
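
The index arithmetic in steps 2 and 4 can be illustrated without any netCDF file. This is a minimal sketch that assumes the band limits are stored as 1-based, inclusive `(first, last)` pairs, which is what the `-1` and `+1` offsets in the code cell of this section imply. The function `band_slices` and the toy limits are hypothetical and are not taken from the real _k_-distribution file.

```python
def band_slices(limits_1based):
    """Convert 1-based inclusive (first, last) g-point limits to 0-based slices."""
    return [slice(first - 1, last) for first, last in limits_1based]


# Hypothetical limits for three bands of 4 g-points each
toy_limits = [(1, 4), (5, 8), (9, 12)]

# Stand-in for the `gpt` axis of a variable (0-based positions)
toy_gpts = list(range(12))

for i_band, band_slice in enumerate(band_slices(toy_limits)):
    # Each band keeps only its own contiguous chunk of the g-point axis
    print('band {:02d}: {}'.format(i_band + 1, toy_gpts[band_slice]))
```

Because Python slices exclude their upper bound, the 1-based inclusive upper limit can be used directly as the slice stop.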

Note: if a file of the same name already exists for a given band, it is deleted before the new file is written. This is so we do not append newer data to older files. If the files exist and the user wants to retain them, it is recommended that the files be moved to a subdirectory or somewhere else on the disk.
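
One way to retain older per-band files is to move them aside before running the bandsplitting cell. This is a minimal sketch using only the standard library. The helper `archive_existing` and the subdirectory name in the usage comment are hypothetical.

```python
import glob
import os
import shutil


def archive_existing(pattern, subdir):
    """Move files matching the glob `pattern` into `subdir`.

    Returns the list of new paths (empty if nothing matched).
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        return []

    # Only create the subdirectory when there is something to move
    os.makedirs(subdir, exist_ok=True)

    moved = []
    for path in matches:
        dest = os.path.join(subdir, os.path.basename(path))
        shutil.move(path, dest)
        moved.append(dest)
    return moved


# Possible usage before the bandsplitting cell:
#   archive_existing('coefficients_lw_band*.nc', 'previous_run')
```

A fresh subdirectory name should be chosen for each run so that previously archived files are not themselves replaced.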

In [None]:
with XA.open_dataset(kFileNC) as kAllObj:
    gLims = kAllObj.bnd_limits_gpt
    ncVars = list(kAllObj.keys())
    dimStr = 'gpt'

    for iBand in kAllObj.bnd.values:
        # make a separate netCDF for each band
        outNC = 'coefficients_lw_band{:02d}.nc'.format(iBand+1)

        # make sure we don't keep appending to an older file
        if os.path.exists(outNC): os.remove(outNC)

        # determine which variables need to be parsed
        for ncVar in ncVars:
            # append to netCDF if it already exists, start the file if not
            modeNC = 'a' if os.path.exists(outNC) else 'w'

            ncDat = kAllObj[ncVar]

            if dimStr in ncDat.dims:
                # grab only the g-point information for this band;
                # limits are 1-based and inclusive, so shift to 0-based
                i1, i2 = gLims[iBand].values - 1
                ncDat = ncDat.isel(gpt=slice(i1, i2 + 1))
            # endif

            ncDat.to_netcdf(outNC, mode=modeNC)
        # end ncVar loop

        print('Completed {}'.format(outNC))
    # end band loop
# endwith read