**INPUT:**
- topUSlakes.csv: list of top 100 US lakes
- us_lakes_ts_minimal.csv: ice phenology data for US lakes
- top100_extra_weather_all_deltaElev_100m.csv: meteorological time series built in `2 - Build continuous weather time series` notebook.

- US_lakes_climateNA_1901-2020MP.csv: ClimateNA monthly data from 1901-2020 for each lake coordinate and elevation

**OUTPUT:**
- model_input_v10.csv OR
- model_input_v11.csv and model_input_v11b.csv OR
- model_input_v12.csv and model_input_v12b.csv


**DEPENDENCIES**:
- ``kpmb_weather.py`` is found in `modules` directory
- ``phenology_temperature`` is found in `modules` directory

**NOTEBOOK SUMMARY**
- Smooths the cumulative temperature time series using a Chebyshev polynomial fit (degree 13)
- Calculates two zero crossing dates (positive to negative temperatures and negative to positive temperatures)
- Adjusts all meteorological time series so they are relative to occurrence of ice-on or ice-off event
- Calculates **Freezing Degree Days** and **Positive Degree Days** in period before ice-on and ice-off (2 ways):
    1. Look only at temperatures between zero-crossing and subsequent ice-on or ice-off
    2. Look at all temperatures before ice-on; and all temperatures between ice-on and ice-off

This file was modified on September 1, 2022 to include ClimateNA solar radiation data (MJ m-2 d-1), where available.
- calculate seasonal average (MJ m-2 d-1)

In [None]:
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
from pathlib import Path
from numpy.polynomial import Chebyshev

from scipy.signal import find_peaks

from scipy.stats import pearsonr

from IPython.display import clear_output


import re
import sys
sys.path.append('../modules')
from kpmb_weather import get_date

# for visualization of temperature record and ice on and ice off dates
from phenology_temperature import iceon_off_summary

Parameters matching those set in `2 - Build continuous weather time series` (connected to file names)

In [None]:
nlakes = 100
delta_elev_m = 100

### Customize these directories and file names

Define ICEMODELS_DIR, the directory where the ice phenology data are saved.

Define location of us_lakes_csv, file containing ice phenology data for all lakes. Columns must include:
- lakecode : a unique code for the given lake
- start_year : beginning of winter season (e.g., 1990 for winter of 1990-1991)
- ice_on_1 : first ice on of season
- ice_off_1, ice_off_2, ... , ice_off_6 : first, second, .. sixth ice-off event of season (if there is only one ice-off, the rest of the columns can be NaN)
- froze : 'Y' or 'N' (whether the lake froze)

Define location of climateNA_csv, file containing output from ClimateNA.
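A quick check of the column list above can catch a mis-formatted phenology file before the rest of the notebook runs. The sketch below is illustrative only: `REQUIRED_COLUMNS`, `missing_columns` and the toy dataframe are hypothetical names that do not appear elsewhere in this notebook.

```python
import pandas as pd

# Columns expected in the ice phenology file, taken from the list above
REQUIRED_COLUMNS = (['lakecode', 'start_year', 'ice_on_1', 'froze'] +
                    [f'ice_off_{n}' for n in range(1, 7)])

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [c for c in required if c not in df.columns]

# toy example (assumed data): a file that lacks the ice_off_6 column
toy = pd.DataFrame(columns=['lakecode', 'start_year', 'ice_on_1', 'froze'] +
                           [f'ice_off_{n}' for n in range(1, 6)])
print(missing_columns(toy))
```

An empty list means every expected column is present.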



In [None]:
volume = 'My Passport for Mac'
ICEMODELS_DIR = Path(f'/Volumes/{volume}/IceModels/')

us_lakes_csv = Path('us_lakes_ts_minimal.csv')

climateNA_csv= Path('climateNA.csv')


Read in CSV files

In [None]:
# file contains the list of US lakes used in this study
dftopN = pd.read_csv(ICEMODELS_DIR/f'topUSlakes.csv',low_memory=False)

# ice phenology for all US lakes
dfice = pd.read_csv(us_lakes_csv,low_memory=False)


# added 'extra' weather for snow and precipitation, etc.
#   and removed dates with "quality" flags
# no leading slash: an absolute right-hand path would discard ICEMODELS_DIR
filename = ICEMODELS_DIR/f'top{nlakes}_extra_weather_all_deltaElev_{delta_elev_m}m.csv'

dfworkingFilled = pd.read_csv(filename, low_memory=False)
dfworkingFilled['DATE'] = pd.to_datetime(dfworkingFilled.DATE)

#### Additional QAQC.
Check for snowfall recorded on warm days (TMIN>5). These values match the original weather record and carry no quality flags, so they are kept.

In [None]:
indmin = (dfworkingFilled['SNOW'] > 0) & (dfworkingFilled['TMIN']>5)
display(dfworkingFilled[indmin].dropna(how='all',axis=1))
plt.plot(dfworkingFilled.loc[indmin,'TMIN'],
         dfworkingFilled.loc[indmin,'SNOW'],ls='none',marker='o')


Create a list of leap years for easier calculation below.

In [None]:
list_of_leap_years = [y for y in range(1800, 2021)
                      if (y % 400 == 0) or ((y % 100 != 0) and (y % 4 == 0))]
#list_of_leap_years

### Reorganize and filter meteorological time series
Convert all dates to DOY, relative to Dec 31 = 0 (Dec 30 = -1, Jan 1 = 1, etc.)

Pivot table to produce four tables
- columns are DOY
- rows are (lakecode, year)
- values are one of TMINMAX, SNOW, PRCP or SNWD 

#### 1. Reorganize meteorological time series
- Organize by ice season (e.g., 2002-2003)
- Convert dates to day of year relative to Dec 31=0.
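The day-of-year convention can be illustrated on a handful of dates. This is a minimal sketch under the convention stated above (Dec 31 = 0, dates from July 1 onward counted backward from Dec 31); the function name `relative_doy` is hypothetical and is not used by the notebook.

```python
import pandas as pd

def relative_doy(dates):
    """Day of year relative to Dec 31 = 0; July 1 and later become negative."""
    dates = pd.to_datetime(pd.Series(dates))
    doy = dates.dt.dayofyear
    days_in_year = 365 + dates.dt.is_leap_year.astype(int)
    second_half = dates.dt.month >= 7
    # keep Jan-Jun as is; subtract the year length for Jul-Dec
    return doy.where(~second_half, doy - days_in_year)

example = relative_doy(['2002-12-31', '2002-12-30', '2003-01-01',
                        '2002-07-01', '2004-07-01', '2004-06-30'])
print(example.tolist())
```

July 1 maps to -183 in both leap and non-leap years, while the last positive day (June 30) is 181 in a non-leap year and 182 in a leap year.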

In [None]:
# determine day of year
dfworkingFilled['doy'] = dfworkingFilled.DATE.dt.day_of_year

# determine start_year; i.e., the year before time series
dfworkingFilled['start_year'] = dfworkingFilled.DATE.dt.year-1
#  e.g., 2002-01-01 to 2002-12-31 has a start_year of 2001 at this point

# convert DOY to negative if July 1st or later (182 or 183 in leap year)

# find all rows that are in a leap year
ly = (dfworkingFilled.start_year+1).isin(list_of_leap_years)
print('Number of rows in a leap year:', ly.sum())
ind = ((dfworkingFilled.doy>181) & ~ly) | ((dfworkingFilled.doy>182) & ly)

dfworkingFilled.loc[ind,'start_year'] = dfworkingFilled.loc[ind,'start_year']+1
# e.g., 2002-07-01 to 2002-12-31 have a start_year of 2002 (other 2002 dates have a start_year of 2001)

# adjust so day of year is symmetric (both negative and positive) around Dec 31.
#   e.g., Dec 31 is 0, Dec 30 is -1, Jan 1 is 1, Jan 2 is 2.
dfworkingFilled.loc[ind,'doy'] = dfworkingFilled.loc[ind,'doy']-(365+ly)

# Create a dictionary of meteorological dataframes
#   Dataframes have Lake and year as row indices; and columns are day of year. (-183 to 182)
dfdaily = {}
for var in ['TMINMAX','SNOW','PRCP','SNWD']:
    dfdaily[var] = dfworkingFilled.pivot_table(index=['lakecode','start_year'],
                                               dropna=False,columns='doy',values=var)
    print(var, dfdaily[var].shape)

# Get index (i.e., lakecode and start_year) used to identify each of the 19600 rows of the dataframe

index_order = dfdaily['TMINMAX'].index

#### 2. Aligning ice phenology data and meteorological data

a) Calculate ice on day of year, ice off day of year, and duration

b) Reorganize so ice index matches weather index.

c) Select only those indices (i.e., (lakecode, start_year)) that have both an ice phenology record and a meteorological time series.
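Steps a) to c) can be sketched on toy data as follows. The dataframes and lake codes here are invented for illustration, and `MultiIndex.intersection` is used as a compact alternative to filtering tuples one at a time; it is not necessarily how the notebook performs the match.

```python
import pandas as pd

# toy ice phenology records (assumed values)
ice = pd.DataFrame({'lakecode': ['A', 'A', 'B'],
                    'start_year': [1990, 1991, 1990],
                    'ice_on_1': ['1990-12-15', '1992-01-05', None]})

# a) days between ice-on and Dec 31 of the season's start year
ice['ice_on_doy'] = (pd.to_datetime(ice.ice_on_1) -
                     pd.to_datetime(ice.start_year.astype(str) + '-12-31')).dt.days

# b) index the ice records the same way as the weather tables
ice = ice.set_index(['lakecode', 'start_year'])
weather_index = pd.MultiIndex.from_tuples(
    [('A', 1990), ('A', 1991), ('C', 1990)], names=['lakecode', 'start_year'])

# c) keep only (lakecode, start_year) pairs present in both records
common = weather_index.intersection(ice.index)
ice_matched = ice.loc[common]
print(ice_matched)
```

Lake B (no weather) and lake C (no ice record) both drop out, and an ice-on date in early January yields a positive `ice_on_doy`.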

In [None]:
# a)
# ice on is first ice on event of year
dfice['ice_on_doy'] = (pd.to_datetime(dfice.ice_on_1) - pd.to_datetime(dfice.start_year.astype(str)+'-12-31')).dt.days

# ice off is last ice off event of year (fall back from ice_off_6 to ice_off_1)
last_ice_off = (dfice.ice_off_6
                .fillna(dfice.ice_off_5)
                .fillna(dfice.ice_off_4)
                .fillna(dfice.ice_off_3)
                .fillna(dfice.ice_off_2)
                .fillna(dfice.ice_off_1))
dfice['ice_off_doy'] = (pd.to_datetime(last_ice_off) - pd.to_datetime(dfice.start_year.astype(str)+'-12-31')).dt.days

# duration is difference between ice on and ice off
dfice['ice_duration'] = dfice['ice_off_doy'] - dfice['ice_on_doy']

# ice duration is 0 if the lake did not freeze
ind_nofreeze = dfice.froze=='N'
dfice.loc[ind_nofreeze,'ice_duration'] = 0

# b)
ice_index = dfice.set_index(['lakecode','start_year']).index

# extract relevant indices (i.e., (lakecode,start_year)) from reorganized weather time series dataframe 
#   choosing only those indices that have a match in the ice phenology record
weather_index = pd.MultiIndex.from_tuples([i for i in index_order if i in ice_index],names=['lakecode','start_year'])

## c)
# select only those ice phenology records that also have a meteorological time series
df_lakeice = dfice.set_index(['lakecode','start_year']).loc[weather_index,:]

# select only those meteorological time series that also have ice phenology records
df_lakeweather = {}
for var in ['TMINMAX','SNOW','PRCP','SNWD']:
    df_lakeweather[var] = dfdaily[var].loc[weather_index,:]

#### QAQC.
Look at rows that have missing TMINMAX. May not be able to use these rows for FDD and PDD calculations

In [None]:
ind = df_lakeweather['TMINMAX'].loc[:,-183:181].isnull().any(axis=1)
print(ind.sum(),'out of', df_lakeweather['TMINMAX'].shape[0])

Look at rows that have some NAs, but not all NAs.

In [None]:
df_lakeweather['TMINMAX'].dropna(how='all',axis=0)

Find length of time series that is needed for complete ice season, once shifted to ice-on as Day Zero. (i.e., maximum duration)

In [None]:
max_duration = df_lakeice.ice_duration.max()
print("Latest day (after ice on), i.e., longest duration:",max_duration)

### Find zero-crossing day based on air temperature time series (as DOY)
- using `df_lakeweather['TMINMAX']` (the unshifted record, indexed by DOY)
- find the earliest DOY with a temperature at or below zero anywhere in the record, **min_day**
- find latest ice off DOY, **max_day**
- calculate cumulative degree days time series starting 14 days before **min_day** and ending 14 days after **max_day**
- smooth this record using a 13th-degree Chebyshev polynomial
- the first maximum of the smoothed record is the freezing zero-crossing (temperatures change from positive to negative)
- the last minimum of the smoothed record is the thawing zero-crossing (temperatures change from negative to positive)
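The idea can be tried on a synthetic record before running it on real data. The sketch below assumes an idealized cosine temperature cycle (coldest on day 20) and follows the peak/trough selection used in the notebook's code: the first maximum of the smoothed cumulative sum marks the positive-to-negative crossing and the last minimum marks the negative-to-positive crossing. All variable names and the synthetic series are illustrative.

```python
import numpy as np
from numpy.polynomial import Chebyshev
from scipy.signal import find_peaks

# synthetic daily temperature (assumed idealized seasonal cycle)
days = np.arange(-150, 151)
temperature = 5 - 20 * np.cos(2 * np.pi * (days - 20) / 365)

# cumulative degree days rise while temperatures are positive, fall while negative
cumulative = np.cumsum(temperature)

# smooth with a degree-13 Chebyshev fit and evaluate on a fine grid
fit = Chebyshev.fit(days, cumulative, deg=13)
fine_days = np.linspace(days[0], days[-1], 1000)
smoothed = fit(fine_days)

peaks = fine_days[find_peaks(smoothed)[0]]     # local maxima
troughs = fine_days[find_peaks(-smoothed)[0]]  # local minima

freeze_doy = peaks[0] if len(peaks) > 0 else np.nan     # positive -> negative
thaw_doy = troughs[-1] if len(troughs) > 0 else np.nan  # negative -> positive
print(round(freeze_doy), round(thaw_doy))
```

Because the extrema of a cumulative sum occur where the summed quantity changes sign, locating peaks and troughs of the smoothed curve avoids reacting to every short warm or cold spell in the daily record.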


Find zero-crossing based on DOY from original data, independent of ice-on and ice-off knowledge

In [None]:
min_day = (df_lakeweather['TMINMAX']<=0).any(axis=0).replace(False,np.nan).dropna().index[0]
max_day = df_lakeice.ice_off_doy.max()
print(min_day, max_day)
# start a couple of weeks before min_day
delta_days = 14

tcusum = df_lakeweather['TMINMAX'].loc[:,min_day - delta_days:max_day + delta_days].cumsum(axis=1)
print(tcusum.shape)

In [None]:
tcusum.shape

Testing algorithm.

In [None]:
win = 7
df_lakeweather['TMINMAX'].loc[:,min_day - delta_days:max_day + delta_days].loc[('JGL01',1997),:].plot(lw=0.5)
df_lakeweather['TMINMAX'].loc[:,min_day - delta_days:max_day + delta_days].loc[('JGL01',1997),:].rolling(win).mean().plot()
plt.gca().axhline(0,color='k')
ax2 = plt.gca().twinx()
tcusum.loc[('JGL01',1997),:].plot(ax=ax2)
df_lakeweather['TMINMAX'].loc[:,min_day - delta_days:max_day + delta_days].loc[('JGL01',1997),:].rolling(win).mean().cumsum().plot()

List of "problem" lakes/years to check manually, confirming that the algorithm handles them correctly

In [None]:
check_i = [('DMR2', 1998.0),
 ('DMR2', 2001.0),
 ('JD01', 1995.0),
 ('JD01', 1998.0),
 ('JGL01', 1997.0),
 ('JJM1', 2000.0),
 ('JJM1', 2010.0),
 ('JJM1', 2012.0),
 ('JJM1', 2017.0),
 ('JJM18', 1998.0),
 ('JJM18', 2003.0),
 ('JJM2', 2010.0),
 ('JJM2', 2012.0),
 ('JJM2', 2017.0),
 ('JJM27', 1998.0),
 ('JJM28', 1981.0),
 ('JJM4', 2010.0),
 ('JJM4', 2012.0),
 ('JJM4', 2017.0),
 ('JJM6', 2010.0),
 ('JJM6', 2012.0),
 ('JJM6', 2017.0),
 ('JJM9', 1995.0),
 ('KMS14', 1992.0),
 ('KMS19', 1973.0),
 ('KMS25', 1976.0),
 ('KMS25', 1997.0),
 ('KMS25', 2003.0),
 ('LR1', 2005.0),
 ('LR2', 2001.0),
 ('MICH03', 1988.0),
 ('MICH03', 1989.0),
 ('MICH03', 2005.0),
 ('MICH03', 2011.0),
 ('MICH06', 2008.0),
 ('MINN34', 1987.0),
 ('MINN4', 2001.0),
 ('xKB0014', 1994.0),
 ('xKB0019', 1990.0),
 ('xKB0263', 1980.0),
 ('xKB0269', 1990.0),
 ('xKB0269', 1997.0),
 ('xKB0269', 2011.0),
 ('xKB0364', 1977.0),
 ('xKB1162', 1997.0),
 ('xKB1370', 1991.0),
 ('xKB1746', 1986.0),
 ('xKB1746', 1998.0),
 ('xKB1921', 1981.0),
 ('xKB1921', 1998.0),
 ('xKB1921', 1980.0)]

In [None]:
chebyshev_smooth = False
window = 7 # days 

df_zero_cross = pd.DataFrame(index = df_lakeweather['TMINMAX'].index)

dfsmoothed_tminmax = pd.DataFrame(index = df_lakeweather['TMINMAX'].index)
# degree of polynomial fit to cumulative sum of temperature time series
ndeg = 13
ii =0
for i,row in tcusum.iterrows():
    clear_output(wait=True)
    ii+=1
    print(ii)
    x = row.index.astype(float)
    y = row.astype(float).values
    ind = row.isnull()
    x = x[~ind]
    y = y[~ind]
    if len(x)==0:
        continue
        
    ice_on_doy = df_lakeice.loc[i,'ice_on_doy']
    ice_off_doy = df_lakeice.loc[i,'ice_off_doy']
    
    # find zero-crossings based on different rolling mean temperatures 
    for win in [1,3,5,7,14]:
        newx = np.arange(x[0],x[-1])
        # first make sure series is continuous
        newy = np.interp(newx, x, y)
        newy = pd.Series(newy,index=newx).rolling(win).mean()
        
        # find peaks (maxima);
        peaks = newx[find_peaks(newy)[0]]

        # find troughs (minima); 
        troughs = newx[find_peaks(-newy)[0]]

        # choose first peak; i.e. first time mean temp crosses below zero
        if len(peaks)>0:
            freeze_doy = peaks[0]
        else:
            freeze_doy = np.nan

        # choose first trough, but must be after ice_on_doy and after freeze_doy
    
        troughs = [t for t in troughs if (t > ice_on_doy) & (t > freeze_doy)]
        
        if len(troughs)>0:
            thaw_doy = troughs[0]
        else:
            thaw_doy = np.nan
        df_zero_cross.loc[i,f'ZC{win}FreezeDOY'] =freeze_doy
        df_zero_cross.loc[i,f'ZC{win}ThawDOY'] = thaw_doy
        
    c = Chebyshev.fit(x,y,deg=ndeg)
    newx = np.linspace(x[0],x[-1],1000)
    newy = c(newx)

    # save Chebyshev smoothed time series to new dataframe
    dfsmoothed_tminmax.loc[i,range(int(x[0]),int(x[-1])+1)] = c(range(int(x[0]),int(x[-1])+1))
    
    # find peaks (maxima)
    peaks = newx[find_peaks(newy)[0]]
    
    # find troughs (minima)
    troughs = newx[find_peaks(-newy)[0]]
    
    # choose first peak
    if len(peaks)>0:
        freeze_doy = peaks[0]
    else:
        freeze_doy = np.nan
    
    # choose last trough
    if len(troughs)>0:
        thaw_doy = troughs[-1]
    else:
        thaw_doy = np.nan
        
    df_zero_cross.loc[i,'ZCFreezeDOY'] =freeze_doy
    df_zero_cross.loc[i,'ZCThawDOY'] = thaw_doy
    
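    if i in check_i:
        # NOTE: this `continue` makes the diagnostic plotting below unreachable;
        #   remove it to step through the lakes/years in check_i interactively
        continue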
        tmp_ts = df_lakeweather['TMINMAX'].loc[:,min_day - delta_days:max_day + delta_days].loc[i,:]
        print(i)
        plt.plot(x,y)
        plt.plot(newx,newy)
        plt.axvline(freeze_doy,color='C0',lw=2)
        plt.axvline(thaw_doy,color='C3',lw=2)
        yy = np.max(plt.gca().get_ylim())
        # Series.iteritems was removed in pandas 2.0; items() is the equivalent
        for iii,xx in df_zero_cross.loc[i,:].items():
            if 'Thaw' in iii:
                color='C3'
            else:
                color='C0'
            plt.axvline(xx,ls=':',c=color)
            plt.gca().text(xx,yy, ''.join(re.findall(r'\d+',iii)),ha='center',rotation=0,fontsize=20,color=color)
        plt.axvline(ice_on_doy,lw=8,color='C0',alpha=0.5)
        plt.axvline(ice_off_doy,lw=8,color='C3',alpha=0.5)
        ax = plt.gca()
        ax.set_ylabel('Cumulative degree days')
        ax2 = ax.twinx()
        tmp_ts.plot(ax=ax2,color='k',lw=0.5)
        ax2.axhline(0,lw=0.5,color='k')
        ax2.set_ylabel('Temperature')
        plt.show()
        display(df_zero_cross.loc[i,:])
        input('continue?')
        
    #if not row.isnull().all():
    #    break

In [None]:
df_DOY = df_zero_cross.merge(df_lakeice[['lake','froze','ice_on_doy','ice_off_doy','ice_duration']],
                    validate='one_to_one',
                    how='outer',left_index=True,right_index=True)
display(df_DOY.head())
print(df_DOY.shape)

Look at Pearson r coefficients of ice_on_doy and all ZC?FreezeDOY columns. Which offers the tightest correlation?

Look at Pearson r coefficients of ice_off_doy and all ZC?ThawDOY columns. Which offers the tightest correlation?
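Because both the zero-crossing columns and the ice dates contain missing values, each correlation has to be computed on the rows where both are present. A minimal sketch with invented numbers (the helper name `masked_pearsonr` is hypothetical):

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

def masked_pearsonr(x, y):
    """Pearson r and p-value using only rows where both series are non-null."""
    valid = x.notnull() & y.notnull()
    return pearsonr(x[valid], y[valid])

# toy data (assumed): y = 2x wherever both values exist
zc_freeze = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
ice_on = pd.Series([2.0, 4.0, 6.0, np.nan, 10.0])
r, p = masked_pearsonr(zc_freeze, ice_on)
print(r)
```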



In [None]:
df_corr = pd.DataFrame(columns=['r','p'])

for cc in [cc for cc in df_DOY.columns if ('Thaw' in cc) | ('Freeze' in cc)]:
    x = df_DOY[cc].copy()
    y = df_DOY['ice_on_doy'].copy()
    if 'Thaw' in cc:
        y = df_DOY['ice_off_doy'].copy()
    ind = x.isnull() | y.isnull()
    x = x[~ind]
    y = y[~ind]
    r,p = pearsonr(x,y)
    df_corr.loc[cc,'r'] = r
    df_corr.loc[cc,'p'] = p
    

In [None]:
df_corr.sort_values('r',ascending=False)

### Best correlation is between ice-on/off and the zero-crossing day determined from polynomial fit.

### Histograms of zero-crossing and ice-on/ice-off delay
- negative delay values could be a result of:
    - local air temperatures being different than those reported at the nearest weather station
    - zero-crossing is looking at the general trend of temperature
        - there could still be negative (or positive) temperatures preceding zero-crossing
    - wind events prior to ice off could result in ice breakup, even when temperatures are hovering slightly below zero

In [None]:
bins = np.linspace(-50,125,80)

fig,ax = plt.subplots()
(df_DOY.ice_on_doy - df_DOY.ZCFreezeDOY).hist(ax=ax,bins=bins,label='Ice On delay')
(df_DOY.ice_off_doy- df_DOY.ZCThawDOY).hist(ax=ax,alpha=0.5,bins=bins, label="Ice Off delay")
ax.legend()
ax.set_xlabel('Time since zero-crossing (days)')
ax.set_yscale('log')


## FDD and PDD: Antecedent conditions
Reorganize so all weather variables are relative to ice-on or ice-off date, removing weather data from rows with no ice-on or ice-off information.

We will keep 366 days. 

ICEON:
- 210 days after ice-on (to cover all ice durations)
- and thus 155 days before ice-on.

ICEOFF:
- 10 days after ice-off
- 355 days before ice-off


#### 1. Shift meteorological time series using the ice_on_doy column in df_lakeice.

Need to add columns. Otherwise shifting would shift some values out of the dataframe.
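The effect of padding and shifting can be seen on a two-lake toy table. This is a sketch with invented values and a much narrower column range than the notebook uses; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# toy temperatures, columns are DOY relative to Dec 31 (assumed values)
temps = pd.DataFrame([[5, 3, 1, -1, -3, -5, -7],
                      [6, 4, 2, 0, -2, -4, -6]],
                     index=['lakeA', 'lakeB'],
                     columns=np.arange(-3, 4), dtype=float)
ice_on_doy = pd.Series({'lakeA': 1, 'lakeB': -2})

# pad with empty columns so shifted values are not pushed out of the frame
padded = temps.reindex(columns=np.arange(-6, 7))

aligned = padded.copy()
for doy in ice_on_doy.dropna().unique():
    rows = ice_on_doy == doy
    # shifting by -doy moves the ice-on day into column 0
    aligned.loc[rows, :] = padded.shift(-int(doy), axis=1).loc[rows, :]

print(aligned)
```

After the shift, column 0 holds each lake's temperature on its own ice-on day, and columns that fall outside the original record are NaN.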

In [None]:
max_day_column = 210
min_day_column = -355
#min_day_column = -183

In [None]:
df_lakeweather_offset_iceon = {}
df_lakeweather_offset_iceoff = {}


# create dataframe with additional columns in it (-355 to -184 and 183 to 210)
dftmp = pd.DataFrame(index = df_lakeweather['TMINMAX'].index, 
                     columns = 
                     list(np.arange(-184,min_day_column-1,-1))+
                     list(np.arange(183,max_day_column+1)))


for var in ['TMINMAX','SNOW','PRCP','SNWD']:
    
    dfworking = df_lakeweather[var].copy()
    
    # add additional columns
    dfworking = pd.concat([dfworking,dftmp],axis=1) 
    
    # sort columns
    dfworking = dfworking.loc[:,
                list(np.arange(min_day_column,max_day_column+1))].copy()

    # start with complete meteorological time series
    dfon_ = dfworking.copy()
    dfoff_ = dfworking.copy()
    
    for ii in range(-183,183):
        # Find all lakes/years that have ice-on on day -ii
        ind1 = df_lakeice.ice_on_doy == -ii
        
        # Find all lakes/years that have ice-off on day -ii
        ind2 = df_lakeice.ice_off_doy == -ii
        
        # continue if there are no lakes/years with ice on or ice off on day -ii
        if (ind1.sum()==0) & (ind2.sum()==0):
            continue        
        
        clear_output(wait=True)
        print(var, ii)
        
        # shift the entire time series so it will be relative to ice on date
        #df_shift = df_lakeweather[var].loc[:,-183:182].shift(ii,axis=1).copy()
        df_shift = dfworking.loc[:,:].shift(ii,axis=1).copy()
        
        # replace timeseries with values from shifted dataframe
        # select only those rows that actually have lakes/years with ice-on on day -ii
        dfon_.loc[ind1,:] = df_shift[ind1].copy()
        # select only those rows that actually have lakes/years with ice-off on day -ii
        dfoff_.loc[ind2,:] = df_shift[ind2].copy()
        
        
    # go back and remove weather info if there is no info for ice_on_doy or ice_off_doy
    # df_lakeice and df_lakeweather dataframes have the same row indices
    ind1 = df_lakeice.ice_on_doy.isnull()
    ind2 = df_lakeice.ice_off_doy.isnull()
    #dfon_.loc[ind1,-183:182] = np.nan
    #dfoff_.loc[ind2,-183:182] = np.nan
    
    dfon_.loc[ind1,:] = np.nan
    dfoff_.loc[ind2,:] = np.nan
    
    # write these dataframes to their respective dictionary fields
    df_lakeweather_offset_iceon[var] = dfon_
    df_lakeweather_offset_iceoff[var] = dfoff_
        

In [None]:
#df_zero_cross.dropna().ZCFreeze.hist(bins = np.linspace(-130,30,80),alpha=0.5)
#df_zero_cross.dropna().ZCThaw.hist()
#(df_zero_cross.dropna().ZCThaw - df_zero_cross.dropna().IceDuration).hist(bins = np.linspace(-130,30,80),alpha=0.5)

### Calculate FDD and PDD in two ways
1. Use zero-crossing date to calculate a FDD and PDD prior to ice-on and ice-off respectively; only include days after zero-crossing

2. Calculate total FDD prior to ice-on; and total PDD between ice-on and ice-off
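The two calculations differ only in the window of days they sum over. A minimal sketch on one invented season, with temperatures indexed by days relative to ice-on (the helper name `degree_days` and all values are assumptions for illustration):

```python
import pandas as pd

def degree_days(temps, start, end):
    """Return (FDD, PDD) for days start..end inclusive.

    FDD is the magnitude of the summed negative temperatures;
    PDD is the sum of the positive temperatures.
    """
    window = temps.loc[start:end]
    fdd = -window[window < 0].sum()
    pdd = window[window > 0].sum()
    return fdd, pdd

# toy season: day 0 is ice-on, ice duration is 3 days
temps = pd.Series([-1.0, 4.0, 1.0, -2.0, -5.0, -3.0, 2.0, 6.0],
                  index=range(-4, 4))

fdd_total, _ = degree_days(temps, -4, 0)   # method 2: all days before ice-on
fdd_zc, _ = degree_days(temps, -1, 0)      # method 1: only after a zero-crossing at day -1
_, pdd_total = degree_days(temps, 0, 3)    # method 2: ice-on to ice-off
print(fdd_total, fdd_zc, pdd_total)
```

The early cold day at -4 counts toward the total FDD but not toward the zero-crossing version, which is the practical difference between the two methods.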

In [None]:
#df_zero_cross

In [None]:
df_DOY.columns

In [None]:
df_temp = df_lakeweather_offset_iceon['TMINMAX'].copy() # this is entire time series shifted so iceon is Day Zero

x0 = (df_temp<0).any(axis=0).replace(False,np.nan).dropna().index[0]
print(x0)

for i,row in df_DOY.iterrows():
    if np.isnan(row.ice_duration) | (row.ice_duration==0):
        continue
    
    #  1. FDD and PDD based on zero-crossing
    #
    if np.isnan(row.ZCFreezeDOY):
        fdd = np.nan
    else:
        zc = int(round(row.ZCFreezeDOY - row.ice_on_doy))
        if zc > 0:
            zc = 0
        pre_iceon = df_temp.loc[i,zc:0]
        fdd = np.abs(pre_iceon[pre_iceon<0].sum())
    if np.isnan(row.ZCThawDOY):
        pdd = np.nan
    else:
        zc = int(round(row.ZCThawDOY - row.ice_on_doy))
        if zc > row.ice_duration:
            zc = row.ice_duration
        pre_iceoff = df_temp.loc[i,zc:row.ice_duration]
        pdd = pre_iceoff[pre_iceoff>0].sum()
    
    #  2. TOTAL FDD and PDD
    # look at all days before ice on; x0 is -140, earliest below zero day
    pre_iceon2 = df_temp.loc[i,x0:0]
    fdd2 = np.abs(pre_iceon2[pre_iceon2<0].sum())
    
    # look at all days between ice on and ice off
    pre_iceoff2 = df_temp.loc[i,0:row.ice_duration]
    pdd2 = pre_iceoff2[pre_iceoff2>0].sum()
    
    df_DOY.loc[i,'FDD_ZC'] = fdd
    df_DOY.loc[i,'PDD_ZC'] = pdd
    
    df_DOY.loc[i,'FDD'] = fdd2
    df_DOY.loc[i,'PDD'] = pdd2


In [None]:
df_DOY

What is the earliest day that temp drops below 0?

In [None]:
print("Earliest day (before ice on):",
      (df_temp.iloc[:,2:]<0).any(axis=0).replace(False,np.nan).dropna().index[0])
print("Latest day (after ice on), i.e., longest duration:",df_lakeice.ice_duration.max())
      

### Display example temperature and ice phenology record

- Look for years and lakes where PDD and FDD are zero (from 4a - Data exploration notebook)
- There are 51 total (22 FDD are zero, 29 PDD are zero)
- Display these here to see what is going on


In [None]:
dfsmoothed_tminmax = dfsmoothed_tminmax.loc[:,dfsmoothed_tminmax.columns.sort_values()]

In [None]:
#lkcode, year = 'DMR2', 2003
#lkcode, year = 'xKB1045',1991
for lkcode,year in zip(['DMR1','DMR2'],[2003,2003]):
    #lakename = dfice.loc[dfice.lakecode==lkcode,'lake'].drop_duplicates().values[0].title()
    duration = dfice[(dfice.lakecode==lkcode) & (dfice.start_year==year)].ice_duration.values[0]
    if np.isnan(duration):
        continue
    iceon_off_summary(lkcode, year, df_lakeweather_offset_iceon, dfsmoothed_tminmax,
                      df_DOY, zc = True, date_range=(-140, 198))



### Seasonal weather patterns
- Calculate seasonal weather patterns from daily NOAA meteorological time series
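The season and winter-year assignment used in the cell below can be checked on a few dates. The dates here are invented; the season formula and the June cutoff are the ones the notebook applies, and the label mapping mirrors its `season_dict`.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2002-05-31', '2002-06-15', '2002-10-01',
                                  '2002-12-10', '2003-02-10', '2003-04-20']))
month = dates.dt.month

# January-May belong to the winter that started the previous calendar year
winter_year = dates.dt.year.where(month >= 6, dates.dt.year - 1)

# 0 = JJA, 1 = SON, 2 = DJF, 3 = MAM
season = ((month + 6) // 3) % 4
labels = season.map({0: 'lagJJA', 1: 'lagSON', 2: 'DJF', 3: 'MAM'})

print(pd.DataFrame({'date': dates, 'WinterYear': winter_year, 'Season': labels}))
```

December and the following January and February share the same winter year and season code, so a DJF average never mixes two different winters.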

In [None]:
#dfworkingFilled.loc[(dfworkingFilled.lakecode=='DMR1')& (dfworkingFilled.start_year==1852),'SNOW'].dropna()

In [None]:
dfworkingFilled['Year'] = dfworkingFilled.DATE.dt.year
dfworkingFilled['Month'] = dfworkingFilled.DATE.dt.month
dfworkingFilled['Year_lag1'] = dfworkingFilled.Year-1

ind = dfworkingFilled.Month<6
dfworkingFilled.loc[ind,'WinterYear'] = dfworkingFilled.loc[ind,'Year_lag1']
dfworkingFilled.loc[~ind,'WinterYear'] = dfworkingFilled.loc[~ind,'Year']
dfworkingFilled['Season'] = ((dfworkingFilled.Month+6) // 3) % 4

complete_seasons = dfworkingFilled.groupby(['lakecode','WinterYear','Season']).Month.apply(lambda x: len(x.unique())==3).reset_index().rename({'Month':'CompleteSeason'},axis=1)

print('Status: first merger')
dfworkingFilled_extra = dfworkingFilled.merge(complete_seasons,left_on=['lakecode','WinterYear','Season'],
                      right_on=['lakecode','WinterYear','Season'],validate='many_to_one',how='left')

print('Status: entering for loop')
for var in ['TMINMAX','SNOW','SNWD','PRCP']:
    dfworkingFilled_extra.loc[~dfworkingFilled_extra.CompleteSeason,var] = np.nan

    season_dict = {0:f'{var}_lagJJA',1:f'{var}_lagSON',2:f'{var}_DJF',3:f'{var}_MAM'}
    print(var)
    if var in ['TMINMAX','SNWD']:
        dfseasonal_ = dfworkingFilled_extra.groupby(['lakecode','WinterYear','Season'])[var].mean().reset_index().pivot_table(index = ['lakecode','WinterYear'],columns='Season',values=var).rename(season_dict,axis=1)
    else:
        dfseasonal_ = dfworkingFilled_extra.groupby(['lakecode','WinterYear','Season'])[var].sum(min_count=1).reset_index().pivot_table(index = ['lakecode','WinterYear'],columns='Season',values=var).rename(season_dict,axis=1)
    if var=='TMINMAX':
        dfseasonal_all = dfseasonal_.copy()
    else:
        dfseasonal_all = dfseasonal_all.merge(dfseasonal_,left_index=True,right_index=True,how='outer',validate='one_to_one')


In [None]:
#dfresult2.columns

**26 APRIL 2023 UPDATE**
- added fdd_zc and hdd_zc columns

What about zero duration no freeze seasons?

In [None]:
#df_DOY

In [None]:
#dftopN

In [None]:
#dfseasonal_all

In [None]:
df_final = dfseasonal_all.merge(df_DOY.drop('lake',axis=1).rename_axis(('lakecode','WinterYear')).rename(
    {'FDD':'FDD_on','PDD':'PDD_off','FDD_ZC':'FDD_on_zc','PDD_ZC':'PDD_off_zc'},axis=1),
                                left_index=True,right_index=True, how='outer')

#df_final = dfseasonal_all.merge(df_fdd.rename_axis(('lakecode','WinterYear')).rename('FDD_on'),
#                                left_index=True,right_index=True,how='outer').merge(
#                                df_hdd.rename_axis(('lakecode','WinterYear')).rename('HDD_off'),
#                                left_index=True,right_index=True,how='outer').merge(
#                                df_fdd_smooth.rename_axis(('lakecode','WinterYear')).rename('FDD_on_smooth'),
#                                left_index=True,right_index=True,how='outer').merge(
#                                df_hdd_smooth.rename_axis(('lakecode','WinterYear')).rename('HDD_off_smooth'),
#                                left_index=True,right_index=True,how='outer')

display(df_final.columns)

df_final = df_final.join(dftopN.set_index('lakecode'),how='outer').rename_axis(('lakecode','start_year')).reset_index()

#df_final = df_final.merge(dfice[['lakecode','start_year','ice_on_doy','ice_off_doy','ice_duration']], left_index=True,
#               right_on= ['lakecode','start_year'],how='left')
first_columns= ['lakecode','lake','start_year','ice_on_doy','ice_off_doy','ice_duration','lat','lon']
next_columns = [c for c in df_final.columns if c not in first_columns]
df_final = df_final[first_columns+next_columns].drop(['start_date','end_date'],axis=1).sort_values(
    ['lakecode','start_year']).reset_index(drop=True)
df_final.columns
#df_final.HDD_off.hist()
df_final.PDD_off.hist() # blue
df_final.PDD_off_zc.hist(alpha=0.5) # orange

Version 10 excludes HydroLAKES match for Peltier Lake, MN

Version 11 is completely rebuilt weather time series based on TMINMAX values and including snow and precipitation
- changed column headings
    - level_0 -> lakecode
    - GDD_off -> HDD_off
    - duration -> ice_duration
- new column headings
    - SNOW seasonal (total snow fall)
    - SNWD seasonal (average snow depth)
    - PRCP seasonal (total precipitation)
- removed column headings
    - GDD_on
    - FDD_off
    - FDD_offseason
    - GDD_off
    - FDD_year
    - start_date
    - end_date
    
Version 12 adds:
- new column headings
    - FDD_on_zc (FDD calculation only since zero-crossing)
    - PDD_off_zc (PDD calculation only since zero-crossing)
    - HDD_off renamed to PDD_off

In [None]:
df_final.shape

In [None]:
# change to v12 on 28 APRIL 2023
df_final.to_csv(ICEMODELS_DIR/f'model_input_v12.csv')

#df_lakeweather['SNOW'].loc[('LR1',1895),:].replace(0,np.nan).dropna()

Add ClimateNA solar radiation data based on lat,lon and Elevation columns
- lagJJA, lagSON, DJF and MAM

In [None]:
df_final = pd.read_csv(ICEMODELS_DIR/f'model_input_v12.csv')
df_final.columns

In [None]:
dfclimatena = pd.read_csv(climateNA_csv)

Look at ClimateNA data

In [None]:
dfclimatena[dfclimatena.Rad05!=-9999].Rad05.hist()
dfclimatena.head()

In [None]:
df_final.columns

In [None]:
dfclimatena[dfclimatena.Rad05!=-9999].plot.scatter('Rad04','Tmax04',marker='.')

Display solar radiation to confirm it makes sense. I.e., it should be spatially correlated.

In [None]:
dfclimatena[(dfclimatena.Rad07!=-9999) & 
            (dfclimatena.Year.isin([1990]))].plot.scatter(
    x='Longitude',y='Latitude',c='Rad07')

In [None]:
#dfclimatena[(dfclimatena.Rad01 < 0) & (dfclimatena.Rad01!=-9999)]

Look at variation in solar radiation

In [None]:
ax  = dfclimatena.loc[(dfclimatena.Rad05!=-9999) & dfclimatena.ID2.isin(['DMR1']),:].plot('Year','Rad03',label='DMR1')
dfclimatena.loc[(dfclimatena.Rad05!=-9999) & dfclimatena.ID2.isin(['MINN25']),:].plot('Year','Rad03',ax=ax,label='MINN25')


### Prepare ClimateNA data.
- Shift so "year" runs from June-Dec then Jan-May

In [None]:
df_final = pd.read_csv(ICEMODELS_DIR/f'model_input_v12.csv',index_col= 0)

dfclimatena = pd.read_csv(climateNA_csv).replace(-9999,np.nan)
dfclimatena = dfclimatena[['Year','ID2']+[f'Tave{i:02d}' for i in range(1,13)]+
                          [f'PPT{i:02d}' for i in range(1,13)]+
                         [f'Rad{i:02d}' for i in range(1,13)]]
dfclimatena = dfclimatena.set_index(['ID2','Year']).stack().reset_index()

dfclimatena['Month'] = dfclimatena.level_2.str[-2:].astype(int)

dfclimatena['Season'] = dfclimatena['Year'].apply(lambda x: f"{x}-{x+1}")
ind = dfclimatena['Month'] < 6
dfclimatena.loc[ind,'Season'] = dfclimatena.loc[ind,'Year'].apply(lambda x: f"{x-1}-{x}")
# order months June-December, then January-May, to match the shifted "year"
month_order = list(range(6,13)) + list(range(1,6))
ordered_columns = ([f"PPT{i:02d}" for i in month_order] +
                   [f"Tave{i:02d}" for i in month_order] +
                   [f"Rad{i:02d}" for i in month_order])
dfclimate = dfclimatena.pivot_table(index = ['ID2','Season'],columns='level_2',values=0)[ordered_columns]

dfclimate

### ClimateNA (continued)
Create seasonal temperature, precipitation and solar radiation columns
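Passing `skipna=False` means a season with any missing month is reported as missing rather than as a partial total or average. A small sketch with invented monthly values for one season type (DJF):

```python
import numpy as np
import pandas as pd

# toy monthly values for two winters (assumed numbers)
monthly = pd.DataFrame(
    {'PPT12': [30.0, 25.0], 'PPT01': [20.0, np.nan], 'PPT02': [10.0, 15.0],
     'Tave12': [-2.0, -4.0], 'Tave01': [-6.0, -8.0], 'Tave02': [-4.0, -3.0]},
    index=['1990-1991', '1991-1992'])

# precipitation is summed; temperature (and radiation) are averaged
monthly['PPT_DJF'] = monthly[['PPT12', 'PPT01', 'PPT02']].sum(axis=1, skipna=False)
monthly['Tave_DJF'] = monthly[['Tave12', 'Tave01', 'Tave02']].mean(axis=1, skipna=False)

print(monthly[['PPT_DJF', 'Tave_DJF']])
```

The 1991-1992 precipitation total is NaN because January is missing, whereas the default `skipna=True` would have returned 40.0 for that winter.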

In [None]:
for new, m in {'lagJJA':[6,7,8],'lagSON':[9,10,11],'DJF':[12,1,2],'MAM':[3,4,5]}.items():
    for v in ['PPT','Tave','Rad']:
        columns = [f'{v}{mm:02d}' for mm in m]
        newv  = f'{v}_{new}'
        if v in ['Tave','Rad']:
            dfclimate[newv] = dfclimate[columns].mean(axis=1,skipna=False)
        else:
            dfclimate[newv] = dfclimate[columns].sum(axis=1,skipna=False)
            
dfclimate['Year'] = dfclimate.index.get_level_values(1).str[:4].astype(int).values

In [None]:
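# last 13 columns: the 12 seasonal columns created above plus Year (the merge key);
#   iloc[:,-12:] would omit the first seasonal column, PPT_lagJJA
df_final2 = df_final.merge(dfclimate.iloc[:,-13:].reset_index(),
                           right_on=['ID2','Year'],left_on=['lakecode','start_year'],how='left').drop(['ID2','Year'],axis=1)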

In [None]:
df_final2.to_csv(ICEMODELS_DIR/f'model_input_v12b.csv')

Check that this new version (v12b) has the same indices and columns as the previous version (v11b)

New columns in v12b:
- 'ZCFreezeDOY', 'ZCThawDOY', 'froze', 'FDD_on_zc', 'PDD_off_zc', 'PDD_off'
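The same comparison can be written with pandas set operations on the indices and column labels. This sketch uses two invented tables standing in for the old and new versions; `Index.difference` is offered as a compact alternative to the list comprehensions in the cell below, not as the notebook's own method.

```python
import pandas as pd

keys = ['lakecode', 'start_year']

# toy stand-ins for the previous and new model input tables (assumed values)
old = pd.DataFrame({'lakecode': ['A', 'A', 'B'], 'start_year': [1990, 1991, 1990],
                    'HDD_off': [1.0, 2.0, 3.0]})
new = pd.DataFrame({'lakecode': ['A', 'A', 'C'], 'start_year': [1990, 1991, 1990],
                    'PDD_off': [1.0, 2.0, 4.0]})

old_index = old.set_index(keys).index
new_index = new.set_index(keys).index

only_in_old = old_index.difference(new_index)   # rows dropped in the new version
only_in_new = new_index.difference(old_index)   # rows added in the new version
added_columns = new.columns.difference(old.columns).tolist()
removed_columns = old.columns.difference(new.columns).tolist()

print(only_in_old.tolist(), only_in_new.tolist(), added_columns, removed_columns)
```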

In [None]:
dd = pd.read_csv(ICEMODELS_DIR/f'model_input_v11b.csv')

In [None]:
index1 = dd.set_index(['lakecode','start_year']).index
index2 = df_final2.set_index(['lakecode','start_year']).index
missing_index = [i for i in index1
 if i not in index2]
display(dd.set_index(['lakecode','start_year']).loc[missing_index,:])
missing_index = [i for i in index2
 if i not in index1]
display(df_final2.set_index(['lakecode','start_year']).loc[missing_index,:])

print([c for c in df_final2.columns if c not in dd.columns])
print([c for c in dd.columns if c not in df_final2.columns])
