## Seasonal cycle analysis
We suspect that the seasonality of runoff will shift over time in many basins, with the peak arriving earlier in the hydrological year.  Such a shift would also affect the 3-month SPEI we have been computing.

This notebook aims to plot the seasonal cycle of runoff in a case study basin, here the INDUS.  We will read in the runoff aggregated to basin scale by Finn Wimberly.

13 Oct 2023 | EHU
- Update 15 Mar 2024: attempt a grouped box plot for the three models' cycles, for all basins

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from datetime import date
import collections
import datetime
import itertools
import json
import os


## Generic filepath to the main data folder
# fpath = '/Users/lizz/Documents/GitHub/Data_unsynced/Runoff-intercomparison/BasinAggregated-FW/RGI 13/'
fpath = '/Users/lizz/Documents/GitHub/Data_unsynced/Runoff-intercomparison/BasinAggregated-FW/RGI 11/' ## check an unaffected basin
fpath = '/Volumes/GoogleDrive/.shortcut-targets-by-id/1M3W4MT2CRgIZULUT5TXC3gDeFyKkA9DX/Runoff/RGI 13/' ## check updated outputs


#All of the climate models used
modelnames_short = ['BCC-CSM2-MR',
                    'MPI-ESM1-2-HR',
                    'MRI-ESM2-0',
                    'CESM2-WACCM',
                    'NorESM2-MM'] ## these are the ones for which we have GCM data as of Oct 2023

SSPpaths = ['ssp126','ssp245','ssp370','ssp585']   # Specifying the SSP

In [None]:
this_GCM = modelnames_short[0]
this_basin = 'INDUS'
scen=SSPpaths[1]

fname = fpath +'runoff_AlignedMonthly_{}_{}_{}.csv'.format(this_GCM, scen, this_basin)
temp_df = pd.read_csv(fname, index_col=0)
temp_df.index = pd.to_datetime(temp_df.index)

In [None]:
temp_df.index

In [None]:
td = temp_df.loc[(temp_df.index.month==1) & (temp_df.index.year>2000) & (temp_df.index.year<2011)]

In [None]:
td.mean(axis=0)

In [None]:
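def decadal_mean(df, year_lower, year_upper):
    """Mean of each calendar month over years strictly between the bounds.

    Both bounds are exclusive: (2000, 2011) selects 2001-2010.
    Returns a DataFrame with months 1-12 as rows and df's columns as columns.
    """
    months=np.arange(1,13)
    # monthly_mean = {m: for m in months}
    monthly_df = pd.DataFrame()
    for m in months:
        monthly_vals = df.loc[(df.index.month==m)
                              & (df.index.year>year_lower)
                              & (df.index.year<year_upper)]
        monthly_mean = monthly_vals.mean(axis=0)
        monthly_df[m] = monthly_mean
    # monthly_df.index = months
    return monthly_df.transpose()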

In [None]:
new_df = decadal_mean(temp_df, 2000,2011)

In [None]:
new_df

In [None]:
fig,ax=plt.subplots()
ax.plot(new_df['GloGEM'], color='Green', label='GloGEM')
ax.plot(new_df['PyGEM'], color='Purple', label='PyGEM')
ax.plot(new_df['OGGM'], color='Blue', label='OGGM')
ax.set(xlabel='Month', 
       ylabel='Runoff [km$^{3}$, TBC]', 
       title='Seasonal cycle in {}, years 2001-2010, {}, {}'.format(this_basin, this_GCM, scen),
      xticks=(1,3,6,9,12))
ax.legend(loc='best')
plt.show()

Cool.  We have produced an example for the early-21st-century case, with one GCM.  Now we want to compare the end of the 21st century for the same model.  Eventually, we want to show all 5 GCMs together...but this will require a little more computation.
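
Since the same monthly climatology is needed for more than one period, it may be convenient to compute it with a `groupby` on the calendar month and loop over a dictionary of periods. The sketch below is a minimal illustration on synthetic data, not the notebook's own method: `timeslice_monthly_mean`, `demo_df` and `periods` are hypothetical names, and the year bounds are treated as exclusive to match `decadal_mean`.

```python
import numpy as np
import pandas as pd

def timeslice_monthly_mean(df, year_lower, year_upper):
    """Mean of each calendar month over years strictly between the bounds.

    Hypothetical helper; the exclusive bounds mirror decadal_mean.
    """
    in_slice = (df.index.year > year_lower) & (df.index.year < year_upper)
    sliced = df.loc[in_slice]
    monthly = sliced.groupby(sliced.index.month).mean()
    monthly.index.name = 'month'
    return monthly

# Synthetic monthly data standing in for one basin's runoff file
idx = pd.date_range('2000-01-01', '2100-12-01', freq='MS')
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({'GloGEM': rng.random(len(idx)),
                        'PyGEM': rng.random(len(idx)),
                        'OGGM': rng.random(len(idx))}, index=idx)

# Exclusive bounds: (2000, 2011) covers 2001-2010, (2090, 2101) covers 2091-2100
periods = {'early': (2000, 2011), 'late': (2090, 2101)}
cycles = {label: timeslice_monthly_mean(demo_df, lo, hi)
          for label, (lo, hi) in periods.items()}
```

Each entry of `cycles` has months 1-12 as rows and the three glacier models as columns, the same layout that `decadal_mean` returns.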

In [None]:
new_df_end21C = decadal_mean(temp_df, 2090,2101)

In [None]:
fig,ax=plt.subplots()
ax.plot(new_df_end21C['GloGEM'], color='Green', label='GloGEM')
ax.plot(new_df_end21C['PyGEM'], color='Purple', label='PyGEM')
ax.plot(new_df_end21C['OGGM'], color='Blue', label='OGGM')
ax.set(xlabel='Month', 
       ylabel='Runoff [km$^{3}$, TBC]', 
       title='Seasonal cycle in {}, years 2091-2100, {}, {}'.format(this_basin, this_GCM, scen),
      xticks=(1,3,6,9,12))
ax.legend(loc='best')
plt.show()

Plot the two together to see the difference.

In [None]:
fig,(ax1,ax2)=plt.subplots(2, sharex=True, sharey=True)
ax1.plot(new_df['GloGEM'], color='Green', label='GloGEM')
ax1.plot(new_df['PyGEM'], color='Purple', label='PyGEM')
ax1.plot(new_df['OGGM'], color='Blue', label='OGGM')
ax1.annotate('2001-2010', xy=(10,15))
ax1.legend(loc='upper left')

ax2.plot(new_df_end21C['GloGEM'], color='Green', label='GloGEM')
ax2.plot(new_df_end21C['PyGEM'], color='Purple', label='PyGEM')
ax2.plot(new_df_end21C['OGGM'], color='Blue', label='OGGM')
ax2.annotate('2091-2100', xy=(10,15))

fig.supxlabel('Month')
fig.supylabel('Runoff [km$^{3}$, TBC]')
fig.suptitle('Seasonal cycle in {}, {}, {}'.format(this_basin, this_GCM, scen))
# ax2.set(xlabel='Month', 
#        ylabel='Runoff [km$^{3}$, TBC]', 
#        title='Seasonal cycle in {}, years 2091-2100, {}, {}'.format(this_basin, this_GCM, scen),
#       xticks=(1,3,6,9,12))

### Compare with another GCM
In our SPEI analysis (Oct 2023) BCC-CSM2-MR showed GloGEM positively buffering, OGGM negatively buffering, and PyGEM not doing much buffering.  The GCM MRI-ESM2-0 instead showed all three models negatively buffering, with OGGM the least negative.  What does the seasonal cycle look like for that example?

In [None]:
this_GCM = modelnames_short[2]
this_basin = 'INDUS'
scen=SSPpaths[1]

fname = fpath +'runoff_AlignedMonthly_{}_{}_{}.csv'.format(this_GCM, scen, this_basin)
temp_df = pd.read_csv(fname, index_col=0)
temp_df.index = pd.to_datetime(temp_df.index)

In [None]:
new_stack_early21C = decadal_mean(temp_df, 2000, 2011)
new_stack_late21C = decadal_mean(temp_df, 2090, 2101)

In [None]:
fig,(ax1,ax2)=plt.subplots(2, sharex=True, sharey=True)
ax1.plot(new_stack_early21C['GloGEM'], color='Green', label='GloGEM')
ax1.plot(new_stack_early21C['PyGEM'], color='Purple', label='PyGEM')
ax1.plot(new_stack_early21C['OGGM'], color='Blue', label='OGGM')
ax1.annotate('2001-2010', xy=(10,15))
ax1.legend(loc='upper left')

ax2.plot(new_stack_late21C['GloGEM'], color='Green', label='GloGEM')
ax2.plot(new_stack_late21C['PyGEM'], color='Purple', label='PyGEM')
ax2.plot(new_stack_late21C['OGGM'], color='Blue', label='OGGM')
ax2.annotate('2091-2100', xy=(10,15))

fig.supxlabel('Month')
fig.supylabel('Runoff [km$^{3}$, TBC]')
fig.suptitle('Seasonal cycle in {}, {}, {}'.format(this_basin, this_GCM, scen))
# ax2.set(xlabel='Month', 
#        ylabel='Runoff [km$^{3}$, TBC]', 
#        title='Seasonal cycle in {}, years 2091-2100, {}, {}'.format(this_basin, this_GCM, scen),
#       xticks=(1,3,6,9,12))

In [None]:
fig,ax = plt.subplots()
ax.plot(new_df['PyGEM'])

Something strange is going on with PyGEM.  Finn is looking into it (13 Oct 2023 13:45 ET).  Meanwhile, let's visualise the _change_ in the seasonal cycle from beginning to end of century.
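
Alongside the plots, one way to put a number on a shift in timing is to compare the peak month and a runoff-weighted mean month between the two periods. The sketch below uses made-up monthly values, not model output. `peak_month` and `weighted_mean_month` are hypothetical helper names, and the calendar-year weighting (January = 1) is an assumption that would need adapting for a hydrological year.

```python
import numpy as np
import pandas as pd

def peak_month(cycle):
    """Month (1-12) with the largest value in a 12-month seasonal cycle."""
    return int(cycle.idxmax())

def weighted_mean_month(cycle):
    """Runoff-weighted mean month; assumes a calendar-year cycle, Jan = 1."""
    months = np.asarray(cycle.index, dtype=float)
    return float((months * cycle.values).sum() / cycle.values.sum())

months = np.arange(1, 13)
# Made-up cycles: a July peak early in the century, a June peak later
early_cycle = pd.Series([1, 1, 2, 4, 8, 12, 15, 12, 7, 3, 1, 1],
                        index=months, dtype=float)
late_cycle = pd.Series([1, 2, 4, 8, 12, 15, 12, 8, 4, 2, 1, 1],
                       index=months, dtype=float)

# Negative values indicate a shift toward earlier in the year
shift_peak = peak_month(late_cycle) - peak_month(early_cycle)
shift_centre = weighted_mean_month(late_cycle) - weighted_mean_month(early_cycle)
```

In this notebook the same functions could presumably be applied to a single column of the decadal means, for example `new_stack_early21C['OGGM']` and `new_stack_late21C['OGGM']`.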

## Change in seasonal cycle

In [None]:
mvals = np.arange(1,13)

fig,(ax1,ax2,ax3)=plt.subplots(1,3, sharey=True, sharex=True)

for ax in (ax1,ax2,ax3):
    ax.axhline(y=0, ls=':', lw=0.5, color='k')
    ax.set(xticks=(2,4,6,8,10,12))

ax1.bar(mvals,new_stack_late21C['GloGEM']-new_stack_early21C['GloGEM'], 
       color='Green', label='GloGEM')
ax2.bar(mvals,new_stack_late21C['PyGEM']-new_stack_early21C['PyGEM'], 
       color='Purple', label='PyGEM')
ax3.bar(mvals,new_stack_late21C['OGGM']-new_stack_early21C['OGGM'], 
       color='Blue', label='OGGM')

fig.supxlabel('Month')
fig.supylabel('Runoff [km$^{3}$]')
fig.suptitle('Change in seasonal runoff, {}, late 21st C. versus early 21st C.'.format(this_basin))

## Plot all GCMs together

In [None]:
multiGCM_df_glo = {m: [] for m in modelnames_short}
multiGCM_df_py = {m: [] for m in modelnames_short}
multiGCM_df_og = {m: [] for m in modelnames_short}

for m in modelnames_short:
    fname = fpath +'runoff_AlignedMonthly_{}_{}_{}.csv'.format(m, scen, this_basin)
    temp_df = pd.read_csv(fname, index_col=0)
    temp_df.index = pd.to_datetime(temp_df.index)
    multiGCM_df_glo[m] = temp_df['GloGEM']
    multiGCM_df_py[m] = temp_df['PyGEM']
    multiGCM_df_og[m] = temp_df['OGGM']

multiGCM_df_glo = pd.DataFrame.from_dict(multiGCM_df_glo)
multiGCM_df_py = pd.DataFrame.from_dict(multiGCM_df_py)
multiGCM_df_og = pd.DataFrame.from_dict(multiGCM_df_og)

In [None]:
multiGCM_df_og

In [None]:
dmean_og_early = decadal_mean(multiGCM_df_og, year_lower=2000, year_upper=2011)
dmean_py_early = decadal_mean(multiGCM_df_py, year_lower=2000, year_upper=2011)
dmean_glo_early = decadal_mean(multiGCM_df_glo, year_lower=2000, year_upper=2011)

dmean_og_late = decadal_mean(multiGCM_df_og, year_lower=2090, year_upper=2101)
dmean_py_late = decadal_mean(multiGCM_df_py, year_lower=2090, year_upper=2101)
dmean_glo_late = decadal_mean(multiGCM_df_glo, year_lower=2090, year_upper=2101)

In [None]:
fig,(ax1,ax2) = plt.subplots(2, sharex=True, sharey=True)
for m in modelnames_short:
    ax1.plot(dmean_glo_early[m], color='Green', alpha=0.5)
    ax1.plot(dmean_py_early[m], color='Purple', alpha=0.5)
    ax1.plot(dmean_og_early[m], color='Blue', alpha=0.5)
    
    ax2.plot(dmean_glo_late[m], color='Green', alpha=0.5)
    ax2.plot(dmean_py_late[m], color='Purple', alpha=0.5)
    ax2.plot(dmean_og_late[m], color='Blue', alpha=0.5)
    
ax1.plot(dmean_glo_early.mean(axis=1), color='Green', label='GloGEM')
ax1.plot(dmean_py_early.mean(axis=1), color='Purple', label='PyGEM')
ax1.plot(dmean_og_early.mean(axis=1), color='Blue', label='OGGM')
ax1.annotate('2001-2010', xy=(10,15))
ax1.legend(loc='upper left')

ax2.plot(dmean_glo_late.mean(axis=1), color='Green')
ax2.plot(dmean_py_late.mean(axis=1), color='Purple')
ax2.plot(dmean_og_late.mean(axis=1), color='Blue')
ax2.annotate('2091-2100', xy=(10,15))

fig.supxlabel('Month')
fig.supylabel('Runoff [km$^{3}$, TBC]')
fig.suptitle('Seasonal cycle in {}, all GCMs, {}'.format(this_basin, scen))

Check mid-century, at Finn's request (20 Oct 2023).

In [None]:
dmean_og_early = decadal_mean(multiGCM_df_og, year_lower=2000, year_upper=2011)
dmean_py_early = decadal_mean(multiGCM_df_py, year_lower=2000, year_upper=2011)
dmean_glo_early = decadal_mean(multiGCM_df_glo, year_lower=2000, year_upper=2011)

dmean_og_mid = decadal_mean(multiGCM_df_og, year_lower=2050, year_upper=2061)
dmean_py_mid = decadal_mean(multiGCM_df_py, year_lower=2050, year_upper=2061)
dmean_glo_mid = decadal_mean(multiGCM_df_glo, year_lower=2050, year_upper=2061)

dmean_og_late = decadal_mean(multiGCM_df_og, year_lower=2090, year_upper=2101)
dmean_py_late = decadal_mean(multiGCM_df_py, year_lower=2090, year_upper=2101)
dmean_glo_late = decadal_mean(multiGCM_df_glo, year_lower=2090, year_upper=2101)

In [None]:
fig,(ax1,ax2, ax3) = plt.subplots(3, sharex=True, sharey=True)
for m in modelnames_short:
    ax1.plot(dmean_glo_early[m], color='Green', alpha=0.5)
    ax1.plot(dmean_py_early[m], color='Purple', alpha=0.5)
    ax1.plot(dmean_og_early[m], color='Blue', alpha=0.5)
    
    ax2.plot(dmean_glo_mid[m], color='Green', alpha=0.5)
    ax2.plot(dmean_py_mid[m], color='Purple', alpha=0.5)
    ax2.plot(dmean_og_mid[m], color='Blue', alpha=0.5)
    
    ax3.plot(dmean_glo_late[m], color='Green', alpha=0.5)
    ax3.plot(dmean_py_late[m], color='Purple', alpha=0.5)
    ax3.plot(dmean_og_late[m], color='Blue', alpha=0.5)
    
ax1.plot(dmean_glo_early.mean(axis=1), color='Green', label='GloGEM')
ax1.plot(dmean_py_early.mean(axis=1), color='Purple', label='PyGEM')
ax1.plot(dmean_og_early.mean(axis=1), color='Blue', label='OGGM')
ax1.annotate('2001-2010', xy=(10,15))
ax1.legend(loc='upper left')

ax2.plot(dmean_glo_mid.mean(axis=1), color='Green')
ax2.plot(dmean_py_mid.mean(axis=1), color='Purple')
ax2.plot(dmean_og_mid.mean(axis=1), color='Blue')
ax2.annotate('2051-2060', xy=(10,15))

ax3.plot(dmean_glo_late.mean(axis=1), color='Green')
ax3.plot(dmean_py_late.mean(axis=1), color='Purple')
ax3.plot(dmean_og_late.mean(axis=1), color='Blue')
ax3.annotate('2091-2100', xy=(10,15))

fig.supxlabel('Month')
fig.supylabel('Runoff [km$^{3}$]')
fig.suptitle('Seasonal cycle in {}, all GCMs, {}'.format(this_basin, scen))
fig.savefig('/Users/Lizz/Documents/Research/Runoff-intercomparison/Figures/{}-seasonal_cycle_{}-{}'.format(datetime.date.today(), this_basin, scen))

## Grouped box plot

Let's try a grouped box plot for each decade, just to see if we can make it?

In [None]:
dmean_glo_early

In [None]:
fig, ax = plt.subplots()

box =ax.boxplot(x=dmean_glo_early.T)

In [None]:
monthly_glo = multiGCM_df_glo.groupby(by=[multiGCM_df_glo.index.month]).mean()


In [None]:
monthly_glo

In [None]:
fig,ax = plt.subplots()
box1 = ax.boxplot(monthly_glo.T)

In [None]:
relative_fraction = monthly_glo / monthly_glo.max()

In [None]:
relative_fraction

In [None]:
fig,ax = plt.subplots()
box1 = ax.boxplot(relative_fraction.T)

In [None]:
fig,ax = plt.subplots()

gg = ax.boxplot((dmean_glo_early).T, patch_artist=True) ## GloGEM
pg = ax.boxplot((dmean_py_early).T, patch_artist=True) ## PyGEM
og = ax.boxplot((dmean_og_early).T, patch_artist=True) ## OGGM

colors=['Green', 'Purple', 'Blue']
gems = [gg, pg, og]
for gem, c in zip(gems, colors):
    for b in gem['boxes']:
        b.set_facecolor(c)
    for caps in gem['caps']:
        caps.set_color(c)
    for f in gem['fliers']:
        f.set_color(c)


This is kind of ugly and doesn't seem to show much.  Could group side by side but the cycle is still hard to read.  Dividing by the max to show the relative seasonal cycle -- how close this month's value is to the max -- produces a bit nicer plot, but still not something usable.
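
For reference, grouping side by side could be sketched by giving each glacier model its own `positions` offset around the month. The example below uses synthetic data in the same layout as `dmean_glo_early` (months as rows, GCMs as columns); `demo_cycles`, the placeholder GCM names, the offsets and the box width are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

months = np.arange(1, 13)
gcms = ['GCM-A', 'GCM-B', 'GCM-C', 'GCM-D', 'GCM-E']  # placeholder names
rng = np.random.default_rng(1)

# Synthetic seasonal cycles: rows are months, columns are GCMs
base = np.exp(-0.5 * ((months - 7) / 1.5) ** 2)
demo_cycles = {
    name: pd.DataFrame(scale * base[:, None] * rng.uniform(0.8, 1.2, (12, len(gcms))),
                       index=months, columns=gcms)
    for name, scale in [('GloGEM', 10), ('PyGEM', 12), ('OGGM', 9)]
}
colors = {'GloGEM': 'Green', 'PyGEM': 'Purple', 'OGGM': 'Blue'}
offsets = {'GloGEM': -0.25, 'PyGEM': 0.0, 'OGGM': 0.25}

fig, ax = plt.subplots()
for name, df in demo_cycles.items():
    # One box per month, each summarizing the spread across GCMs
    bp = ax.boxplot(df.T.values, positions=months + offsets[name],
                    widths=0.2, patch_artist=True, manage_ticks=False)
    for b in bp['boxes']:
        b.set_facecolor(colors[name])
ax.set(xticks=months, xlabel='Month', ylabel='Runoff (synthetic units)')
```

With only five GCMs per box the spread is coarse, so this layout may not read any better than the overlaid version.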

In [None]:
fig,ax = plt.subplots()

gg = ax.boxplot((dmean_glo_early/dmean_glo_early.max()).T, patch_artist=True) ## GloGEM
pg = ax.boxplot((dmean_py_early/dmean_py_early.max()).T, patch_artist=True) ## PyGEM
og = ax.boxplot((dmean_og_early/dmean_og_early.max()).T, patch_artist=True) ## OGGM

colors=['Green', 'Purple', 'Blue']
gems = [gg, pg, og]
for gem, c in zip(gems, colors):
    for b in gem['boxes']:
        b.set_facecolor(c)
    for caps in gem['caps']:
        caps.set_color(c)
    for f in gem['fliers']:
        f.set_color(c)


### New grouping of late-century values

In [None]:
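def timeslice_median(df, year_lower, year_upper):
    """Median of each calendar month over years strictly between the bounds.

    Both bounds are exclusive: (2070, 2101) selects 2071-2100.
    Returns a DataFrame with months 1-12 as rows and df's columns as columns.
    """
    months=np.arange(1,13)
    # monthly_mean = {m: for m in months}
    monthly_df = pd.DataFrame()
    for m in months:
        monthly_vals = df.loc[(df.index.month==m)
                              & (df.index.year>year_lower)
                              & (df.index.year<year_upper)]
        monthly_median = monthly_vals.median(axis=0)
        monthly_df[m] = monthly_median
    # monthly_df.index = months
    return monthly_df.transpose()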

In [None]:
median30yr_og_late = timeslice_median(multiGCM_df_og, year_lower=2070, year_upper=2101)
median30yr_py_late = timeslice_median(multiGCM_df_py, year_lower=2070, year_upper=2101)
median30yr_glo_late = timeslice_median(multiGCM_df_glo, year_lower=2070, year_upper=2101)

In [None]:
median30yr_og_late

In [None]:
fig, ax = plt.subplots()
ax.plot((median30yr_glo_late.median(axis=1))/(median30yr_glo_late.median(axis=1).max()), color='Green', label='GloGEM')
ax.plot((median30yr_py_late.median(axis=1))/(median30yr_py_late.median(axis=1).max()), color='Purple', label='PyGEM')
ax.plot((median30yr_og_late.median(axis=1))/(median30yr_og_late.median(axis=1).max()), color='Blue', label='OGGM')

In [None]:
## Try dividing by absolute max of the dataset for that month?
monthly_max_glo = multiGCM_df_glo.groupby(by=[multiGCM_df_glo.index.month]).max()
monthly_max_glo

In [None]:
fig, ax = plt.subplots()
ax.plot((median30yr_glo_late.median(axis=1))/(monthly_max_glo.mean(axis=1)), color='Green', label='GloGEM')
# ax.plot((median30yr_py_late.median(axis=1))/(median30yr_py_late.median(axis=1).max()), color='Purple', label='PyGEM')
# ax.plot((median30yr_og_late.median(axis=1))/(median30yr_og_late.median(axis=1).max()), color='Blue', label='OGGM')

In [None]:
monthly_max_glo.mean(axis=1)

## Boxplot with 30 years of data, taking multi-GCM mean first?

In [None]:
multiGCM_median_glo = multiGCM_df_glo.median(axis=1)
multiGCM_median_glo

In [None]:
ggvals = multiGCM_df_glo.loc[(multiGCM_df_glo.index.year>2070)
                              & (multiGCM_df_glo.index.year<2101)]
ggvals = ggvals.melt(ignore_index=False)

In [None]:
ggvals['month'] = ggvals.index.month
ggvals

In [None]:
ggvals.boxplot(by='month')

In [None]:
fig, ax = plt.subplots()
bp = ggvals.boxplot(by='month', ax=ax)
ax.set(title='Monthly runoff {}'.format(this_basin))

Okay...having used "melt" to put all the values from the different GCMs in line, we can get this to show up with a late-century plot of the seasonal cycle.  I still do not think this looks cleaner than the single line, and we still can't control it using any other boxplot method (must be df.boxplot, not df.plot.box or plt.boxplot(df)).
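
One possible route to more control, not tested against the real data here, is to split the melted values by month and pass the resulting list of arrays to `ax.boxplot`, which accepts a sequence of 1D arrays. The sketch below uses a synthetic stand-in for `multiGCM_df_glo`; `demo_multiGCM`, `long_vals` and `by_month` are hypothetical names.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for multiGCM_df_glo: monthly index, one column per GCM
idx = pd.date_range('2071-01-01', '2100-12-01', freq='MS')
rng = np.random.default_rng(2)
demo_multiGCM = pd.DataFrame(rng.random((len(idx), 5)), index=idx,
                             columns=['GCM-{}'.format(i) for i in range(5)])

# Long format: one row per (date, GCM) pair, keeping the date index
long_vals = demo_multiGCM.melt(ignore_index=False)
long_vals['month'] = long_vals.index.month

# One array per calendar month, pooling all years and GCMs
by_month = [grp['value'].to_numpy() for _, grp in long_vals.groupby('month')]

fig, ax = plt.subplots()
bp = ax.boxplot(by_month, positions=np.arange(1, 13), patch_artist=True)
for b in bp['boxes']:
    b.set_facecolor('Green')
ax.set(xlabel='Month', ylabel='Runoff (synthetic units)')
```

Because the boxes come back from `ax.boxplot` directly, they can be colored and positioned in the same way as in the earlier overlaid box plots.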

## Back to normalized lines

In [None]:
late_C_gg = multiGCM_df_glo.loc[(multiGCM_df_glo.index.year>2070)
                              & (multiGCM_df_glo.index.year<2101)]

late_C_gg_monthmeans = late_C_gg.groupby(by=[late_C_gg.index.month]).mean()
late_C_gg_monthmeans/late_C_gg_monthmeans.max()

In [None]:
fig, ax = plt.subplots()
ax.plot((late_C_gg_monthmeans/late_C_gg_monthmeans.max()).mean(axis=1), color='Green', label='GloGEM')
# # ax.plot((median30yr_py_late.median(axis=1))/(median30yr_py_late.median(axis=1).max()), color='Purple', label='PyGEM')
# # ax.plot((median30yr_og_late.median(axis=1))/(median30yr_og_late.median(axis=1).max()), color='Blue', label='OGGM')

Try and show for all 3 GEMs.  Sigh...
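
The three near-identical selections could also be folded into one helper applied in a loop. A minimal sketch on synthetic data follows; `normalized_cycle` and `demo_models` are hypothetical names, and the exclusive year bounds follow the selections used in this notebook.

```python
import numpy as np
import pandas as pd

def normalized_cycle(df, year_lower, year_upper):
    """Monthly means for years strictly between the bounds, with each
    column (GCM) divided by its own maximum monthly value."""
    sel = df.loc[(df.index.year > year_lower) & (df.index.year < year_upper)]
    monthmeans = sel.groupby(sel.index.month).mean()
    return monthmeans / monthmeans.max()

idx = pd.date_range('2000-01-01', '2100-12-01', freq='MS')
rng = np.random.default_rng(3)
# Synthetic seasonal signal peaking in July
season = 1 + np.sin(2 * np.pi * (idx.month.to_numpy() - 4) / 12)
demo_models = {
    name: pd.DataFrame({g: season * rng.uniform(0.5, 1.5, len(idx))
                        for g in ['GCM-A', 'GCM-B']}, index=idx)
    for name in ['GloGEM', 'PyGEM', 'OGGM']
}

# Multi-GCM mean of the normalized cycle for each glacier model, 2071-2100
late_cycles = {name: normalized_cycle(df, 2070, 2101).mean(axis=1)
               for name, df in demo_models.items()}
```

Each entry of `late_cycles` is a 12-value Series that could be passed straight to `ax.plot` with the matching color.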

In [None]:
late_C_gg = multiGCM_df_glo.loc[(multiGCM_df_glo.index.year>2070)
                              & (multiGCM_df_glo.index.year<2101)]
late_C_gg_monthmeans = late_C_gg.groupby(by=[late_C_gg.index.month]).mean()

late_C_py = multiGCM_df_py.loc[(multiGCM_df_py.index.year>2070)
                              & (multiGCM_df_py.index.year<2101)]
late_C_py_monthmeans = late_C_py.groupby(by=[late_C_py.index.month]).mean()

late_C_og = multiGCM_df_og.loc[(multiGCM_df_og.index.year>2070)
                              & (multiGCM_df_og.index.year<2101)]
late_C_og_monthmeans = late_C_og.groupby(by=[late_C_og.index.month]).mean()

In [None]:
fig, ax = plt.subplots()
ax.plot((late_C_gg_monthmeans/late_C_gg_monthmeans.max()).mean(axis=1), color='Green', label='GloGEM')
ax.plot((late_C_py_monthmeans/late_C_py_monthmeans.max()).mean(axis=1), color='Purple', label='PyGEM')
ax.plot((late_C_og_monthmeans/late_C_og_monthmeans.max()).mean(axis=1), color='Blue', label='OGGM')

ax.set(title='Seasonal cycle {}, 2071-2100'.format(this_basin),
       xlabel='Month',
       ylabel='Fraction of max monthly runoff')

In [None]:
## version showing all GCMs -- less readable

fig, ax = plt.subplots()
ax.plot((late_C_gg_monthmeans/late_C_gg_monthmeans.max()), color='Green', label='GloGEM')
ax.plot((late_C_py_monthmeans/late_C_py_monthmeans.max()), color='Purple', label='PyGEM')
ax.plot((late_C_og_monthmeans/late_C_og_monthmeans.max()), color='Blue', label='OGGM')

ax.set(title='Seasonal cycle {}, 2071-2100'.format(this_basin),
       xlabel='Month',
       ylabel='Fraction of max monthly runoff')