# Aggregate Regional and Monthly Statistics

This notebook aggregates statistics for both 1-D ($B_L$) and 2-D (joint $\mathrm{SUBSAT_L}$-$\mathrm{CAPE_L}$) binning schemes.

## Import Necessary Packages

In [1]:
import warnings
import numpy as np
import xarray as xr
from numba import jit
from datetime import datetime
warnings.filterwarnings('ignore')

## User-Defined Configurations

Define the user's name/email, specify the directory containing the $P$-$B_L$ datasets, and set the directory where the binned statistics datasets will be saved. Define regions of interest with their latitude/longitude bounds, and set the binning parameters for $B_L$/$\mathrm{CAPE_L}$/$\mathrm{SUBSAT_L}$, along with the precipitation threshold (in mm/day) that separates precipitating from non-precipitating regimes.

In [2]:
AUTHOR     = 'Savannah L. Ferretti'
EMAIL      = 'savannah.ferretti@uci.edu'
FILEDIR    = '/global/cfs/cdirs/m4334/sferrett/monsoon-pod/data/processed'
SAVEDIR    = '/global/cfs/cdirs/m4334/sferrett/monsoon-pod/data/processed'
REGIONS    = {'Eastern Arabian Sea':{'latmin':9.,'latmax':19.5,'lonmin':64.,'lonmax':72.}, 
              'Central India':{'latmin':18.,'latmax':24.,'lonmin':76.,'lonmax':83.},
              'Central Bay of Bengal':{'latmin':9.,'latmax':14.5,'lonmin':86.5,'lonmax':90.},
              'Equatorial Indian Ocean':{'latmin':5.,'latmax':10.,'lonmin':62.,'lonmax':67.5},
              'Konkan Coast':{'latmin':15.,'latmax':19.5,'lonmin':69.,'lonmax':72.5}} 
BINPARAMS  = {'bl':{'min':-0.6,'max':0.1,'width':0.0025},
              'cape':{'min':-70.,'max':20.,'width':1.},
              'subsat':{'min':-20.,'max':70.,'width':1.}}
PRTHRESH   = 0.25
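
Before running the rest of the notebook, it can help to sanity-check the configuration. The sketch below is a hypothetical helper (`check_config()` is not part of the notebook) applied to small excerpts of the dictionaries above. It only checks that bounds are not inverted and that bin widths are positive, and it reports the nominal number of bin values implied by each `min`/`max`/`width` triplet.

```python
# Hypothetical excerpts of the configuration above, copied for illustration
regions   = {'Central India': {'latmin': 18., 'latmax': 24., 'lonmin': 76., 'lonmax': 83.}}
binparams = {'cape': {'min': -70., 'max': 20., 'width': 1.}}

def check_config(regions, binparams):
    """Raise ValueError if any bounds are inverted or any bin width is non-positive."""
    for name, r in regions.items():
        if not (r['latmin'] < r['latmax'] and r['lonmin'] < r['lonmax']):
            raise ValueError(f'Inverted bounds for region {name!r}')
    nbins = {}
    for name, b in binparams.items():
        if b['width'] <= 0 or b['min'] >= b['max']:
            raise ValueError(f'Invalid bin parameters for {name!r}')
        # Nominal number of bin values from min to max inclusive
        nbins[name] = int(round((b['max'] - b['min']) / b['width'])) + 1
    return nbins

nbins = check_config(regions, binparams)
print(nbins)
```

The count is nominal because the notebook builds its bins with `np.arange` and a floating-point step, which can occasionally yield one element more or fewer than expected.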

## Load $P$-$B_L$ Datasets

Load all three $P$-$B_L$ datasets from `FILEDIR` into memory.

In [7]:
def load(filename,filedir=FILEDIR):
    filepath = f'{filedir}/{filename}'
    # Read the data into memory, then release the file handle
    with xr.open_dataset(filepath) as ds:
        return ds.load()

In [8]:
hrimergprbl = load('HR_ERA5_IMERG_pr_bl_terms.nc')
lrimergprbl = load('LR_ERA5_IMERG_pr_bl_terms.nc')
lrgpcpprbl  = load('LR_ERA5_GPCP_pr_bl_terms.nc')

## Functions for Calculating Binned Statistics

The `get_region()` and `get_month()` functions subset a given $P$-$B_L$ dataset by the region and month(s) of interest, respectively.

In [6]:
def get_region(data,key,regions=REGIONS):
    region = regions[key]
    return data.sel(lat=slice(region['latmin'],region['latmax']),lon=slice(region['lonmin'],region['lonmax']))

def get_month(data,months):
    if not isinstance(months,(list,tuple)):
        months = [months]
    monthmask = data.time.dt.month.isin(months)
    return data.sel(time=monthmask)
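
As an illustration of the two selections above, here is a minimal sketch on a synthetic dataset. The grid, the variable values, and the names `toy`, `sub`, and `jja` are assumptions made for the example; they are not the notebook's data. The region bounds are those of 'Central India' in `REGIONS`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset: daily values for one year on a 1-degree grid
time = pd.date_range('2001-01-01', '2001-12-31', freq='D')
lat  = np.arange(0., 31., 1.)
lon  = np.arange(60., 96., 1.)
rng  = np.random.default_rng(0)
toy  = xr.Dataset(
    {'pr': (('time', 'lat', 'lon'), rng.random((time.size, lat.size, lon.size)))},
    coords={'time': time, 'lat': lat, 'lon': lon})

# Same logic as get_region(): label-based slices include both end points
# and assume the coordinate is stored in ascending order
region = {'latmin': 18., 'latmax': 24., 'lonmin': 76., 'lonmax': 83.}
sub = toy.sel(lat=slice(region['latmin'], region['latmax']),
              lon=slice(region['lonmin'], region['lonmax']))

# Same logic as get_month(): keep only the time steps in the requested months
months = [6, 7, 8]
jja = sub.sel(time=sub.time.dt.month.isin(months))

print(dict(jja.sizes))
```

If a dataset stores latitude in descending order, `slice(latmin,latmax)` returns an empty selection, so the coordinate would need to be sorted (or the slice reversed) first.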

The `calc_binned_stats()` function computes statistics using the 1-D ($B_L$) scheme, the 2-D (joint $\mathrm{SUBSAT_L}$-$\mathrm{CAPE_L}$) scheme, or both, as specified by the `bintype` parameter. For each bin, it calculates three statistics: the total count of data points (Q0 for 1-D, P0 for 2-D), the count of data points exceeding `PRTHRESH` (QE for 1-D, PE for 2-D), and the sum of precipitation values (Q1 for 1-D, P1 for 2-D). To speed up the per-point loops, the function uses [Numba's just-in-time](https://numba.readthedocs.io/en/stable/user/jit.html) compilation. The resulting statistics are stored in an `xarray.Dataset`.

In [25]:
def get_bin_edges(key,binparams=BINPARAMS):
    varname  = binparams[key]
    return np.arange(varname['min'],varname['max']+varname['width'],varname['width'])
     
@jit(nopython=True)
def fast_1D_binned_stats(blidxs,prdata,nblbins,prthresh=PRTHRESH):
    Q0 = np.zeros(nblbins)
    QE = np.zeros(nblbins)
    Q1 = np.zeros(nblbins)
    for i in range(prdata.size):
        blidx = blidxs.flat[i]
        prval = prdata.flat[i]
        if 0<=blidx<nblbins and np.isfinite(prval):
            Q0[blidx] += 1
            Q1[blidx] += prval
            if prval>prthresh:
                QE[blidx] += 1
    return Q0,QE,Q1

@jit(nopython=True)
def fast_2D_binned_stats(capeidxs,subsatidxs,prdata,ncapebins,nsubsatbins,prthresh=PRTHRESH):
    P0 = np.zeros((nsubsatbins,ncapebins))
    PE = np.zeros((nsubsatbins,ncapebins))
    P1 = np.zeros((nsubsatbins,ncapebins))
    for i in range(prdata.size):
        capeidx = capeidxs.flat[i]
        subsatidx = subsatidxs.flat[i]
        prval = prdata.flat[i]
        if 0<=subsatidx<nsubsatbins and 0<=capeidx<ncapebins and np.isfinite(prval):
            P0[subsatidx,capeidx] += 1
            P1[subsatidx,capeidx] += prval
            if prval>prthresh:
                PE[subsatidx,capeidx] += 1
    return P0,PE,P1

def calc_binned_stats(data,bintype,binparams=BINPARAMS,prthresh=PRTHRESH,author=AUTHOR,email=EMAIL):
    if bintype not in ['1D','2D','both']:
        raise ValueError("Bin type must be '1D', '2D', or 'both'")
    ds = xr.Dataset()
    if bintype in ['1D','both']:
        blbins = get_bin_edges('bl',binparams)
        blidxs = ((data.bl.values-binparams['bl']['min'])/binparams['bl']['width']+0.5).astype(np.int32)
        Q0,QE,Q1 = fast_1D_binned_stats(blidxs,data.pr.values,blbins.size,prthresh)
        ds['Q0'] = ('bl',Q0)
        ds['QE'] = ('bl',QE)
        ds['Q1'] = ('bl',Q1)
        ds['bl'] = blbins
        ds.Q0.attrs = dict(long_name='Count of points in each bin')
        ds.QE.attrs = dict(long_name=f'Count of precipitating ( > {prthresh} mm/day) points in each bin')
        ds.Q1.attrs = dict(long_name='Sum of precipitation in each bin', units='mm/day')
        ds.bl.attrs = dict(long_name='Average buoyancy in the lower troposphere', units='m/s²')
    if bintype in ['2D','both']:
        capebins   = get_bin_edges('cape',binparams)
        subsatbins = get_bin_edges('subsat',binparams)
        capeidxs   = ((data.cape.values-binparams['cape']['min'])/binparams['cape']['width']-0.5).astype(np.int32)
        subsatidxs = ((data.subsat.values-binparams['subsat']['min'])/binparams['subsat']['width']-0.5).astype(np.int32)
        P0,PE,P1 = fast_2D_binned_stats(capeidxs,subsatidxs,data.pr.values,capebins.size,subsatbins.size,prthresh)
        ds['P0']     = (('subsat','cape'),P0)
        ds['PE']     = (('subsat','cape'),PE)
        ds['P1']     = (('subsat','cape'),P1)
        ds['cape']   = capebins
        ds['subsat'] = subsatbins
        ds.P0.attrs     = dict(long_name='Count of points in each bin')
        ds.PE.attrs     = dict(long_name=f'Count of precipitating ( > {prthresh} mm/day) points in each bin')
        ds.P1.attrs     = dict(long_name='Sum of precipitation in each bin', units='mm/day')
        ds.cape.attrs   = dict(long_name='Undilute plume buoyancy', units='K')
        ds.subsat.attrs = dict(long_name='Subsaturation in the lower free-troposphere', units='K')
    ds.attrs = dict(history=f'Created on {datetime.today().strftime("%Y-%m-%d")} by {author} ({email})')
    return ds
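
To make the 1-D index formula and the three statistics concrete, the sketch below reproduces them with plain NumPy on a handful of made-up values. The toy bin parameters and the arrays `x` and `pr` are assumptions for illustration only, and `np.bincount` is swapped in for the Numba loop as an independent cross-check rather than as the notebook's method.

```python
import numpy as np

# Hypothetical toy binning parameters (the notebook's BINPARAMS use finer widths)
binmin, binmax, width = 0., 4., 1.
bins = np.arange(binmin, binmax + width, width)          # bin values 0, 1, 2, 3, 4

x  = np.array([0.2, 0.6, 1.49, 3.9, 7.3, -0.7, -2.0])   # made-up binning variable
pr = np.array([0.0, 1.0, 3.0, 0.1, 9.0, 2.0, 5.0])       # made-up precipitation
prthresh = 0.25

# Same index formula as the 1-D branch of calc_binned_stats()
idxs = ((x - binmin) / width + 0.5).astype(np.int32)     # [0, 1, 1, 4, 7, 0, -1]

# Discard out-of-range indices and non-finite precipitation, as the Numba loop does
valid = (idxs >= 0) & (idxs < bins.size) & np.isfinite(pr)

# Count, precipitating count, and precipitation sum per bin
Q0 = np.bincount(idxs[valid], minlength=bins.size).astype(float)
QE = np.bincount(idxs[valid], weights=(pr[valid] > prthresh).astype(float),
                 minlength=bins.size)
Q1 = np.bincount(idxs[valid], weights=pr[valid], minlength=bins.size)

print(Q0, QE, Q1)
```

Note that the integer cast truncates toward zero, so the value `-0.7` (slightly below the first bin) still lands in bin 0, while `7.3` and `-2.0` fall outside the index range and are skipped.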

## Execute Binned Statistics Calculation

We run the workflow above region by region. For each region, the `process_by_region()` function creates monthly binned statistics datasets (for both 1-D and 2-D schemes), concatenates them along a `month` dimension, and then concatenates the regional results along a `region` dimension into a single `xarray.Dataset`.

In [26]:
def process_by_region(ds,regions=REGIONS,binparams=BINPARAMS,prthresh=PRTHRESH,author=AUTHOR,email=EMAIL):
    regionstatslist = []
    for region in regions:
        regiondata     = get_region(ds,region,regions)
        monthstatslist = []
        for month in np.unique(ds.time.dt.month.values):
            monthdata  = get_month(regiondata,month)
            monthstats = calc_binned_stats(monthdata,'both',binparams,prthresh,author,email)
            monthstatslist.append(monthstats.expand_dims({'month':[month]}))
        regionstats = xr.concat(monthstatslist,dim='month')
        regionstatslist.append(regionstats.expand_dims({'region':[region]}))
    return xr.concat(regionstatslist,dim='region')
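
The nested `expand_dims`/`concat` pattern can be seen in isolation in the sketch below. Here `toy_stats()` is a hypothetical stand-in for `calc_binned_stats()` that returns random counts in three bins, and the region names 'A' and 'B' are placeholders.

```python
import numpy as np
import xarray as xr

def toy_stats(seed):
    # Hypothetical stand-in for calc_binned_stats(): random counts in three 'bl' bins
    rng = np.random.default_rng(seed)
    return xr.Dataset({'Q0': ('bl', rng.integers(0, 10, 3).astype(float))},
                      coords={'bl': [-0.1, 0.0, 0.1]})

regionlist = []
for r, region in enumerate(['A', 'B']):
    monthlist = []
    for month in [6, 7, 8]:
        stats = toy_stats(10 * r + month)
        # Add a length-1 'month' dimension so the datasets can be concatenated
        monthlist.append(stats.expand_dims({'month': [month]}))
    regionstats = xr.concat(monthlist, dim='month')
    # Add a length-1 'region' dimension for the outer concatenation
    regionlist.append(regionstats.expand_dims({'region': [region]}))
merged = xr.concat(regionlist, dim='region')

print(merged.Q0.dims)
```

With this layout, a single region and month can be retrieved by label, for example `merged.sel(region='B',month=7)`.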

In [28]:
hrimergstats = process_by_region(hrimergprbl)
lrimergstats = process_by_region(lrimergprbl)
lrgpcpstats  = process_by_region(lrgpcpprbl)
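
Because the datasets store counts and sums rather than averages, they can be summed over months (or regions) before any ratio is taken. The sketch below shows one possible downstream use, which is an assumption about how the statistics might be consumed and not something this notebook does: a mean precipitation per bin (`Q1/Q0`) and a precipitating fraction per bin (`QE/Q0`). All numbers are made up.

```python
import numpy as np

# Hypothetical binned statistics with shape (month, bin); not real results
Q0 = np.array([[0., 10., 20.,  5.], [0., 30., 20.,  15.]])
QE = np.array([[0.,  1., 12.,  5.], [0.,  7.,  8.,  15.]])
Q1 = np.array([[0.,  2., 90., 60.], [0., 18., 70., 140.]])

# Aggregate over months first, then form the ratios
Q0tot, QEtot, Q1tot = Q0.sum(axis=0), QE.sum(axis=0), Q1.sum(axis=0)

# Bins with no samples are set to NaN instead of dividing by zero
with np.errstate(divide='ignore', invalid='ignore'):
    binmean  = np.where(Q0tot > 0, Q1tot / Q0tot, np.nan)   # mean precipitation per bin
    precfrac = np.where(Q0tot > 0, QEtot / Q0tot, np.nan)   # precipitating fraction per bin

print(binmean, precfrac)
```

Averaging the per-month ratios instead would weight each month equally regardless of how many samples it contributed, which is generally not the same quantity.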

## Save Statistics Datasets

Save each binned statistics dataset as a netCDF file in `SAVEDIR`.

In [29]:
def save(ds,filename,savedir=SAVEDIR):
    filepath = f'{savedir}/{filename}'
    ds.to_netcdf(filepath)

In [30]:
save(hrimergstats,'HR_ERA5_IMERG_binned_stats.nc')
save(lrimergstats,'LR_ERA5_IMERG_binned_stats.nc')
save(lrgpcpstats,'LR_ERA5_GPCP_binned_stats.nc')