# Connection between variability in PW and expected precipitation

Let $P$ be the precipitation and $Q$ be the precipitable water (PW). Then, the expected value of precipitation is
$$ E[P] = E_Q E_{P|Q} P.$$

Here, $E_{P|Q} P = f(Q)$ is a deterministic function of $Q$, which we will assume remains unchanged when the distribution of $Q$ changes. This would be the case for an ML parameterization trained on one dataset and applied to another. There is a large literature showing that mean precipitation depends exponentially on PW (or column relative humidity).
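To make the formula above concrete, here is a minimal sketch on synthetic data, not the NG-Aqua output. The Gaussian PW samples, the exponential form of the conditional mean, and the helper name `binned_expectation` are all assumptions chosen for illustration. It checks that weighting the bin-mean precipitation by the bin counts recovers the plain sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): Gaussian PW, precip exponential in PW times noise
q = rng.normal(loc=40.0, scale=8.0, size=50_000)
p = np.exp(0.1 * (q - 40.0)) * rng.gamma(shape=2.0, scale=0.5, size=q.size)

# Interior quantile edges define 50 bins that cover the whole real line
n_bins = 50
interior_edges = np.quantile(q, np.linspace(0, 1, n_bins + 1))[1:-1]

# Tabulate f(Q) = E[P | Q] as the mean precip in each PW bin
idx = np.digitize(q, interior_edges)
counts = np.bincount(idx, minlength=n_bins)
f_of_bin = np.bincount(idx, weights=p, minlength=n_bins) / counts


def binned_expectation(q_samples, f_of_bin, interior_edges):
    """Estimate E_Q[f(Q)] from samples of Q and a per-bin table of f."""
    idx = np.digitize(q_samples, interior_edges)
    counts = np.bincount(idx, minlength=len(f_of_bin))
    return (counts * f_of_bin).sum() / counts.sum()


estimate = binned_expectation(q, f_of_bin, interior_edges)
direct = p.mean()
print(estimate, direct)
```

When the PW samples passed to `binned_expectation` are the same ones used to build the table, the two numbers agree up to floating-point error. Passing samples from a different PW distribution gives the prediction under the assumption that $f$ is unchanged.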



## Load data

In [None]:
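import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import xarray as xr
from torch.autograd import grad

from src.data import open_data, runs
from uwnet.xarray_interface import dataset_to_torch_dict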

def select_tropics(x):
    return x.isel(y=slice(28,34))


model = torch.load("../models/268/5.pkl")
ds = open_data('training')
pw_ng = (ds.layer_mass * ds.QT).sum('z')/1000

## Compute relationship

In [None]:
ds_with_pw = select_tropics(ds.assign(PW=pw_ng))
bins = pw_ng.quantile(np.linspace(0, 1, 200)).values

In [None]:
prec_df = ds_with_pw[['Prec', 'PW']].to_dataframe()
cut = pd.cut(prec_df.PW, list(bins))
bin_means = prec_df.groupby(by=cut).mean()

In [None]:
plt.plot(bin_means.PW, bin_means.Prec)

Let's compute the expected precipitation using the conditional expectation formula above.

In [None]:
def expected_precip(count):
    return (count * bin_means.Prec).sum()/count.sum()


def get_bin_counts(pw):
    return pw.groupby(pd.cut(pw, bins)).count()

print("Mean precipitation from bin calculation", expected_precip(get_bin_counts(prec_df.PW)))
print("Mean precipitation from average", float(prec_df.Prec.mean()))

We can see that the estimates of mean precip are nearly identical.

## Nudged Run

Now let's load the nudged run.

In [None]:
nudge_2d = runs['nudge'].data_2d[['PW', 'NPNN']].pipe(select_tropics).rename({'NPNN': 'Prec'}).to_dataframe()

In [None]:
nudge_2d.PW.plot(kind='kde')
prec_df.PW.plot(kind='kde')

plt.legend(['Nudge', 'NG-Aqua'])

We can see that the distribution of PW in the nudged run is shifted to the left and has much less variance.

In [None]:
print("Predicted Mean precipitation from nudged PW", expected_precip(get_bin_counts(nudge_2d.PW)))
print("Mean precipitation from average", float(nudge_2d.Prec.mean()))

We see that most of the change in precipitation between the two runs is due to changes in the distribution of PW. Intriguingly, there is only a small difference in the mean PW of the two runs:

In [None]:
print("Mean PW in nudging", nudge_2d.PW.mean())
print("Mean PW in NG-Aqua", prec_df.PW.mean())

This is a difference of less than 2%.

In [None]:
print("Variance of PW in nudging", nudge_2d.PW.var())
print("Variance of PW in NG-Aqua", prec_df.PW.var())

On the other hand, there is a much larger relative decrease in the variance.
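One possible reading of why variance matters more than the mean here: if $f$ is convex (for instance exponential, as the literature mentioned at the top suggests), Jensen's inequality implies that narrowing the PW distribution at fixed mean lowers $E[f(Q)]$. The sketch below assumes a Gaussian PW distribution and a hypothetical sensitivity `a`; neither is fitted to the runs in this notebook. For Gaussian $Q$, $E[e^{aQ}] = \exp(a\mu + a^2\sigma^2/2)$.

```python
import numpy as np


def mean_exp_gaussian(mu, sigma, a):
    """Closed form for E[exp(a * Q)] when Q ~ Normal(mu, sigma**2)."""
    return np.exp(a * mu + 0.5 * (a * sigma) ** 2)


a = 0.2     # hypothetical exponential sensitivity, not fitted to any data
mu = 40.0   # same mean PW in both cases (illustrative value)

wide = mean_exp_gaussian(mu, 10.0, a)    # larger PW spread
narrow = mean_exp_gaussian(mu, 5.0, a)   # half the spread, same mean
ratio = narrow / wide

# Monte Carlo check of the closed form for the narrow case
rng = np.random.default_rng(1)
mc = np.exp(a * rng.normal(mu, 5.0, size=1_000_000)).mean()

print("ratio of expected precip (narrow / wide):", ratio)
print("closed form vs Monte Carlo:", narrow, mc)
```

Under these assumptions the ratio depends only on `a` and the two variances, not on the mean, so a modest change in variance can produce a large change in mean precipitation even when mean PW barely moves.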

Tropical variance as a function of time in the NG-Aqua simulation

# Evolution of variance in nudged and debiased runs

In [None]:
pw_d = runs['debias'].data_2d.PW.sel(time=slice(100,110))
pw_n = runs['nudge'].data_2d.PW.sel(time=slice(100,110))

ds = xr.concat([pw_d, pw_n], ['debias', 'nudge'])


df = select_tropics(ds).mean(['x', 'y']).to_dataframe().reset_index()
df_var = select_tropics(ds).var(['x', 'y']).to_dataframe().reset_index()


sns.FacetGrid(df, hue="concat_dim").map(plt.plot, "time", "PW").add_legend()

plt.figure()
sns.FacetGrid(df_var, hue="concat_dim").map(plt.plot, "time", "PW").add_legend()

In the nudged run, the mean decreases quickly, probably owing to the biased neural network scheme. In both runs, the variance is drastically lower.

In [None]:
from scipy import signal

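def periodogram_x(pw):
    # Periodogram along x (the last axis). fs=1/160 gives frequencies in 1/km,
    # which assumes a grid spacing of 160 km.
    f, pxx = signal.periodogram(pw.values, axis=-1, fs=1/160)
    # Average over the two leading axes (assumed to be time and y).
    pxx_mean = pxx.mean(axis=(0, 1))
    plt.loglog(f, pxx_mean)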


# Redefine select_tropics so that it also keeps only time indices 100-109.
def select_tropics(x):
    return x.isel(y=slice(28,34)).isel(time=slice(100, 110))

periodogram_x(ds_with_pw.PW)
periodogram_x(select_tropics(runs['nudge'].data_2d.PW))
periodogram_x(select_tropics(runs['debias'].data_2d.PW))
plt.gca().set_ylim(bottom=1e0, top=1e6)
plt.xlabel('f (1/km)')
plt.legend(['NG-Aqua', 'Nudge', 'Debias'])

This shows that the variability is higher at all scales in the NG-Aqua simulation, but especially at small scales.
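As a rough illustration of what such a spectrum measures, the sketch below computes an unnormalized periodogram with NumPy's FFT on a synthetic field. It uses `np.fft.rfft` rather than `scipy.signal.periodogram`, so the normalization differs from the plot above. The 160 km spacing is inferred from `fs=1/160`, and the 1600 km test wave is purely hypothetical.

```python
import numpy as np

dx = 160.0   # assumed grid spacing in km
n = 120      # number of grid points along x (illustrative)
x_km = np.arange(n) * dx

rng = np.random.default_rng(2)
# Synthetic rows of shape (rows, x): a 1600 km wave plus weak white noise
field = np.sin(2 * np.pi * x_km / 1600.0) + 0.1 * rng.standard_normal((6, n))

# Remove each row's mean, then average the squared FFT amplitude over rows
anom = field - field.mean(axis=-1, keepdims=True)
power = (np.abs(np.fft.rfft(anom, axis=-1)) ** 2).mean(axis=0)
freq = np.fft.rfftfreq(n, d=dx)   # spatial frequency in cycles per km

peak_freq = freq[np.argmax(power)]
print("wavelength of spectral peak (km):", 1 / peak_freq)
```

The spectral peak should fall at the wavelength of the imposed wave, while the noise contributes a roughly flat background at all frequencies.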