# IRRmodel\*WCM calibration with PSO on KGE of $\sigma^0$

The v5 code is a final, cleaned and commented version.

## Temporal resolution of input parameters and calibration
The input datasets can have different temporal resolutions:
- hourly
- daily
- multi-daily

The Platinum tables (`platinum_df`) with Budrio's SWC, rain and irrigation data are hourly.
Climate data such as ET and PET are daily. 
Sigma0 values are daily and multi-daily.
NDVI, LAI and other satellite products are multi-daily.

In order to be self-consistent, the model must be calibrated taking
into account the temporal scale over which each quantity varies.
In particular, the soil water balance (SWB) model requires inputs
that can be hourly or daily. Since ET and PET are only available as daily values,
hourly data have to be resampled to daily ones: SWC is taken as the daily mean,
while rain and irrigation are taken as daily sums. In this way consistency
with ET and PET is ensured.
Once the SWB model has provided daily values, they can be used as inputs to
the water cloud model (WCM), which is calibrated against sigma0 values.

An optimal calibration would require an hourly SWB estimate, so that simulated
values could be matched to the actual hours of the satellite passes. On the other
hand, this would require hourly interpolation of ET and PET data. Moreover, the
computational cost of such a procedure cannot be estimated at the moment.

In [2]:
import sys
sys.path.append('../')

from modules.funcs import *
from modules.funcs_pso import *
# from modules.pyeto import *

# KEEP YOUR MODELS IN THE NOTEBOOK UNTIL THEY ARE PERFECT
# CAUSE EXTERNAL IMPORT IS AWFUL IN JUPYTER
# from IRRI_WCM.IRRI_WCM_model import *

In [3]:
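def pso_calib_irri(PAR):
    """Auxiliary cost function for PSO optimization.
    
    PAR has shape (n_particles, n_params); returns 1-KGE for each particle.
    Uses the global variables `inputs` and `irri`.
    """
    global inputs
    global irri
    n_particles = PAR.shape[0]
    err = np.zeros(n_particles)
    for i in range(n_particles):
        WW,IRR,sig0,KGE = IRR_WCM(PAR[i], inputs, irri)
        err[i] = 1 - KGE[0] # KGE is a 1-element array from hydroeval
    return err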
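`pso_calib_irri` follows the usual convention of vectorized PSO cost functions: it takes a `(n_particles, n_params)` array and returns one cost (1-KGE) per particle. The sketch below is a minimal global-best PSO in plain NumPy, written only to show how such a cost function is consumed. It is not the optimizer in `modules.funcs_pso`, whose interface is not shown here; the name `gbest_pso`, the coefficients `w`, `c1`, `c2` and the toy cost are illustrative assumptions.

```python
import numpy as np

def gbest_pso(cost, lower, upper, n_particles=30, iters=200,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Hypothetical minimal global-best PSO (illustration only).

    cost maps an (n_particles, n_params) array to an (n_particles,)
    array of costs, like pso_calib_irri.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Random initial positions inside the bounds, zero initial velocities
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()                          # personal best positions
    pbest_cost = cost(x)                      # personal best costs
    g = pbest[np.argmin(pbest_cost)].copy()   # global best position
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + attraction towards personal and global bests
        v = w*v + c1*r1*(pbest - x) + c2*r2*(g - x)
        x = np.clip(x + v, lower, upper)      # keep particles inside the bounds
        c = cost(x)
        improved = c < pbest_cost
        pbest[improved] = x[improved]
        pbest_cost[improved] = c[improved]
        g = pbest[np.argmin(pbest_cost)].copy()
    return pbest_cost.min(), g

# Toy cost with the same signature as pso_calib_irri (minimum at [1, -2])
def toy_cost(P):
    return np.sum((P - np.array([1.0, -2.0]))**2, axis=1)

best_cost, best_par = gbest_pso(toy_cost, lower=[-5, -5], upper=[5, 5])
```

In the notebook, `toy_cost` would be replaced by `pso_calib_irri` and the bounds by physically meaningful ranges for `[A, B, C, D, W_0, W_max, S_fc, S_w, rho_st, Kc]`.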

# WCM
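The `'lin'` branch of `WCM` (and the $\sigma^0$ simulation in `IRR_WCM`) implements the following relations, where $V$ is the vegetation descriptor (`LAI` or `veg`), $\theta$ the incidence angle and $SM$ the soil moisture (`WW` in `IRR_WCM`). The soil term is linear in dB and is converted with `db_lin` before the sum in linear units; the total is then converted back to dB with `lin_db`.

$$
\begin{aligned}
\sigma^0_{soil}\,[\mathrm{dB}] &= C + D\,SM \\
\tau^2 &= \exp\left(-\frac{2\,B\,V}{\cos\theta}\right) \\
\sigma^0_{veg} &= A\,V\cos\theta\,\left(1-\tau^2\right) \\
\sigma^0_{tot} &= \tau^2\,\sigma^0_{soil} + \sigma^0_{veg}
\end{aligned}
$$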

In [4]:
def WCM(PAR, data_in, units='lin'):
    """Water Cloud Model.
    
    This function simulates backscattering with the WCM and returns the
    simulated sigma0 together with the KGE index, so that 1-KGE can be
    minimized to calibrate the parameters A, B, C, D.
    The WCM is parametrized with a single vegetation descriptor (named
    LAI, but it can be any descriptor).
    Fitting can be performed in linear or dB scale.
    
    
    Inputs
    ------
    - PAR: list
        Values of the parameters to calibrate, [A, B, C, D].
    - data_in: list
        List of observable inputs, in the form
        [SM,LAI,t_deg,obs], where SM = soil moisture,
        LAI = Leaf Area Index, t_deg = angle of observation [deg],
        obs = observed total sigma0 [dB]
    - units: str, default 'lin'
        choose whether to calibrate the model's parameters in 'lin' or 'db' scale
        
    Return
    ------
    [sig0, KGE]: simulated sigma0 [dB] and KGE between simulated and
    observed backscattering.
    
    """

    A,B,C,D = PAR # parameters to fit
    SM,LAI,t_deg,obs = data_in # input data
    
    theta = t_deg*np.pi/180. # angle of observation [rad]
    sig0s_dB = C+D*SM # sigma0_soil [dB]
    T2 = np.exp((-2*B*LAI)/np.cos(theta)) # two-way attenuation
    
    if units=='lin':
        sig0s = db_lin(sig0s_dB) # sigma0_soil [lin]
        sig0v = A*LAI*np.cos(theta)*(1-T2) # sigma0_veg [lin]
        sig0_lin = T2*sig0s+sig0v # sigma0_tot [lin]
        sig0=lin_db(sig0_lin) # sigma0_tot [dB]
    elif units=='db':
        sig0v = A*LAI*np.cos(theta)*(1-T2) # sigma0_veg [dB]
        sig0 = T2*sig0s_dB+sig0v # sigma0_tot [dB]
    else:
        raise ValueError("units must be one of: 'lin', 'db'")
        
    OUT=he.evaluator(he.kge, sig0, obs) # OUT rows: kge, r, alpha, beta
    KGE=OUT[0,:]

    return [sig0,KGE]

#----------------------------------------------------------------------------

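def SM_fromWCM(PAR, data_in, units='lin'):
    """Inverted WCM for SM estimation.
    
    Retrieves soil moisture from the observed sigma0 using already fitted
    parameters [A, B, C, D]. data_in = [SM, LAI, t_deg, obs] as in WCM,
    where SM is used only to compute the KGE of the retrieval.
    Returns [SMretr, KGE].
    """

    A,B,C,D = PAR # fitted parameters
    SM,LAI,t_deg,obs = data_in # input data
    
    theta = t_deg*np.pi/180. # angle of observation [rad]
    T2 = np.exp((-2*B*LAI)/np.cos(theta)) # two-way attenuation
    sig0v = A*LAI*np.cos(theta)*(1-T2) # sigma0_veg
    
    if units=='lin':
        sig0s_lin = (db_lin(obs)-sig0v)/T2 # sigma0_soil [lin]
        SMretr = (lin_db(sig0s_lin)-C)/D
    elif units=='db':
        sig0s = (obs-sig0v)/T2 # sigma0_soil [dB]
        SMretr = (sig0s-C)/D
    else:
        raise ValueError("units must be one of: 'lin', 'db'")
    
    OUT=he.evaluator(he.kge, SMretr, SM) # OUT rows: kge, r, alpha, beta
    KGE=OUT[0,:]

    return [SMretr,KGE]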
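Since the `'lin'` branch of `SM_fromWCM` inverts the `'lin'` branch of `WCM` algebraically, a forward simulation followed by the inversion should return the original soil moisture. The self-contained check below assumes that `db_lin` and `lin_db` are the usual $10^{x/10}$ and $10\log_{10}x$ conversions (they are redefined locally as hypothetical stand-ins), and it uses made-up parameter values.

```python
import numpy as np

# Hypothetical stand-ins for the dB/linear helpers in modules.funcs
def db_lin(x):
    return 10**(np.asarray(x)/10)

def lin_db(x):
    return 10*np.log10(x)

def wcm_forward(A, B, C, D, SM, V, t_deg):
    """Forward WCM ('lin' branch): total sigma0 in dB."""
    theta = np.deg2rad(t_deg)
    T2 = np.exp(-2*B*V/np.cos(theta))          # two-way attenuation
    sig0v = A*V*np.cos(theta)*(1 - T2)         # vegetation term [lin]
    return lin_db(T2*db_lin(C + D*SM) + sig0v)

def wcm_inverse(A, B, C, D, sig0_db, V, t_deg):
    """Inverted WCM ('lin' branch): soil moisture from sigma0 [dB]."""
    theta = np.deg2rad(t_deg)
    T2 = np.exp(-2*B*V/np.cos(theta))
    sig0v = A*V*np.cos(theta)*(1 - T2)
    # Remove the vegetation contribution, then undo the attenuation
    sig0s_lin = (db_lin(sig0_db) - sig0v)/T2
    return (lin_db(sig0s_lin) - C)/D

# Illustrative (made-up) parameters and inputs
A, B, C, D = 0.1, 0.3, -20.0, 25.0
SM = np.array([0.15, 0.25, 0.35])
V = np.array([0.5, 2.0, 3.5])
t_deg = np.array([38.0, 40.0, 42.0])

sig0 = wcm_forward(A, B, C, D, SM, V, t_deg)
SM_back = wcm_inverse(A, B, C, D, sig0, V, t_deg)
```

With real observations the difference `db_lin(obs) - sig0v` can become negative when the vegetation term exceeds the observed signal, so the retrieval is only defined where the soil contribution remains positive.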

# IRRI+WCM

In [5]:
def IRR_WCM(PAR, inputs, user_in):
    r"""Irrigation model and WCM integration.
    
    Based on minimization of 1-KGE between observed and simulated
    $\sigma^0$ values via PSO (pyswarm) optimization.
    The soil water balance model (IRR) produces an estimate of the relative
    soil water content WW (fraction of W_max) that is used to simulate
    $\sigma^0$ by a water cloud model (WCM).
    
    Inputs
    ------
    - PAR: parameter values to calibrate
        PAR = [A, B, C, D, W_0, W_max, S_fc, S_w, rho_st, Kc]
    - inputs: input quantities for calibration,
        [d, d_sat, P, IRRobs, EPOT, WWobs, veg, t_deg, obs]
        (d = daily dates, d_sat = dates of the satellite passes;
        veg, t_deg and obs refer to d_sat)
    - user_in: user-defined option `irri`:
        if True, irrigation is estimated by the model and not taken
        as an input; otherwise the observed irrigation IRRobs is used
        in the soil water balance
    
    Return
    ------
    [WW, IRR, sig0, KGE]: relative water content, irrigation, simulated
    sigma0 [dB] and KGE (hydroeval) between observed and simulated sigma0.
    
    """

    # User input
    irri = user_in
    
    # Unpack parameters and inputs
    A, B, C, D, W_0, W_max, S_fc, S_w, rho_st, Kc = PAR
    d, d_sat, P, IRRobs, EPOT, WWobs, veg, t_deg, obs = inputs
    
    W_fc = S_fc*W_max # water content at field capacity
    W_w  = S_w*W_max # water content at wilting point
    theta = t_deg*np.pi/180. # angle of incidence [rad]
    
    if irri: IRR = [0]*len(d) # daily, estimated irrigation
    else: IRR = IRRobs # daily, observed irrigation
    
    Ks = [0]*len(d) # daily, water stress coefficient
    rho = [0]*len(d) # daily, depletion fraction
    PS = [0]*len(d) # daily, deep percolation
    W = [0]*len(d) # daily, water content
    
    W[0] = W_0*W_max # initial water content
    
    for t in range(1, len(d)):
        rho[t]=rho_st+0.04*(5-Kc*EPOT[t])
        if W[t-1]>=(1-rho[t])*W_fc:
            Ks[t]=1
        elif W_w < W[t-1] < (1-rho[t])*W_fc:
            Ks[t]=float(W[t-1]-W_w)/((1-rho[t])*(W_fc-W_w))
        else: Ks[t]=0
        
        DOY=d[t].dayofyear
        
        # Irrigation estimate (summer season only, 134 < DOY < 235):
        # irrigation is the amount of water needed to bring the previous
        # day's water content up to field capacity
        if irri and 134 < DOY < 235:
            if W[t-1]<=(1-rho[t])*W_fc: IRR[t]=W_fc-W[t-1]
        
        # Water balance
        W[t]=W[t-1]+P[t]+IRR[t]-EPOT[t]*Kc*Ks[t]
        
        # Deep percolation (water above field capacity is removed)
        if W[t]>W_fc:
            PS[t]=W[t]-W_fc
            W[t]=W_fc
            
    WW=np.array(W)/W_max # relative water content
    WWsat = pd.DataFrame(timeseries(d,WW)).set_index(0).loc[d_sat][1].values # WW at d_sat
    
    T2 = np.exp((-2*B*veg)/np.cos(theta)) # two-way attenuation from the vegetation layer
    sig0s = db_lin(C+D*WWsat) # bare soil backscatter [fit in dB, then in lin]
    sig0v = A*veg*np.cos(theta)*(1-T2) # backscatter from the vegetation [lin]
    sig0_lin = T2*sig0s+sig0v # total backscatter [lin]
    sig0=lin_db(sig0_lin) # from linear scale to dB
        
    OUT=he.evaluator(he.kge, sig0, obs) # OUT rows: kge, r, alpha, beta
    KGE=OUT[0,:]

    return [WW,IRR,sig0,KGE]
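The daily update inside `IRR_WCM` can be isolated as a single step, which makes the water stress coefficient and the percolation cap easy to check by hand. The function `swb_step` below is a hypothetical extraction of the loop body (without the irrigation estimate), not part of the original modules; the numbers in the example are made up.

```python
def swb_step(W_prev, P, IRR, EPOT, Kc, rho_st, W_fc, W_w):
    """One daily step of the soil water balance used in IRR_WCM.

    Returns the new water content W, the deep percolation PS and
    the water stress coefficient Ks.
    """
    rho = rho_st + 0.04*(5 - Kc*EPOT)          # depletion fraction
    threshold = (1 - rho)*W_fc
    if W_prev >= threshold:
        Ks = 1.0                                # no water stress
    elif W_w < W_prev < threshold:
        Ks = (W_prev - W_w)/((1 - rho)*(W_fc - W_w))
    else:
        Ks = 0.0                                # at or below wilting point
    W = W_prev + P + IRR - EPOT*Kc*Ks           # water balance
    PS = 0.0
    if W > W_fc:                                # water above field capacity percolates
        PS = W - W_fc
        W = W_fc
    return W, PS, Ks

# Made-up example: W_fc = 30, W_w = 10, rho = 0.5 because Kc*EPOT = 5
W, PS, Ks = swb_step(W_prev=12, P=0, IRR=0, EPOT=5, Kc=1,
                     rho_st=0.5, W_fc=30, W_w=10)
# Ks = (12-10)/(0.5*20) = 0.2, so W = 12 - 5*0.2 = 11
W_wet, PS_wet, _ = swb_step(W_prev=12, P=25, IRR=0, EPOT=5, Kc=1,
                            rho_st=0.5, W_fc=30, W_w=10)
# W = 12 + 25 - 1 = 36 > 30, so PS = 6 and W is capped at 30
```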

# Global

In [6]:
filename = 'irr_obs_' # prefix for output files

# Input data

Input data formatting convention:
- auxiliary variables for data extraction (directory name, file name, etc.)
- extraction into a pd.DataFrame
- cleaning and resampling: drop unnecessary columns, set the index to a daily DatetimeIndex

## Time resampling consistency

- $\sigma^0$ values are extracted with a timestamp yyyy-mm-dd hh:mm:ss at hourly resolution (frequency 'H'), then rounded with `.round(freq='D')` to the midnight of the nearest day to obtain daily frequency 'D'. E.g., a passage at 7 am on 1st July is rounded to 0 am of 1st July, while a passage at 7 pm is rounded to 0 am of 2nd July.
- Many quantities need to be resampled from hourly to daily datasets: use the `.resample()` method on a dataframe with an hourly DatetimeIndex and pass the argument `origin='end_day'`. In this way, for each day X, data between hours 1 and 24 (= 0 of day X+1) are aggregated, and the timestamp assigned is that of day X+1 (on which the aggregation window ends).
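Both conventions can be checked on synthetic timestamps (made up for illustration). With `origin='end_day'`, pandas defaults both `closed` and `label` to `'right'`, which is what produces the day X+1 labels described above; passing them explicitly would make the intent independent of the defaults.

```python
import pandas as pd

# Rounding of satellite passage times to the nearest midnight
morning = pd.Timestamp('2021-07-01 07:00').round('D')   # nearest midnight: 1st July
evening = pd.Timestamp('2021-07-01 19:00').round('D')   # nearest midnight: 2nd July

# Two days of synthetic hourly data, from 01:00 of 1st July to 00:00 of 3rd July
idx = pd.date_range('2021-07-01 01:00', periods=48,
                    freq=pd.Timedelta(hours=1))
hourly = pd.Series(1.0, index=idx)

# Each daily bin covers hours 1..24 and is labelled with the following day
daily = hourly.resample('D', origin='end_day').sum()
```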

In [7]:
print('Starting...\n'+
      '#-------------------------------------------------------------\n'+
      'Use of satellite-derived SM is provided for comparison, not calibration.\n')
verbose = input("Verbose data extraction? (Describe datasets/files) [y/n]") == 'y'

Starting...
#-------------------------------------------------------------
Use of satellite-derived SM is provided for comparison, not calibration.



Verbose data extraction? (Describe datasets/files) [y/n] y


In [8]:
#----------------------------------------------------------------------------
# Field data from TEST_SITE
# Daily data from 2015 to 2017, various gaps

# Data extracted:
# - rain (as input SWB)
# - EPOT = potential evapotranspiration (as input SWB)

namesite = 'ITALY_BUDRIO'
siteID = '5'
namefig = namesite+'_'+siteID

site_df = xr.open_dataset(rf'Inputs\TEST_SITE\TEST_SITE_{namesite}.nc',
                          engine='netcdf4').to_dataframe()
site_df = site_df.rename(columns={'Time_days':'Date'})
site_df = site_df.set_index('Date')
site_df = site_df.loc[:,[col for col in site_df.columns if col.endswith(siteID)]]
if verbose: site_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1037 entries, 2015-01-01 to 2017-11-02
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Irrigation_5        214 non-null    float64
 1   Rainfall_5          1037 non-null   float64
 2   ET_5                1037 non-null   float64
 3   PET_5               1037 non-null   float64
 4   SSM_ASCAT_5         1009 non-null   float64
 5   SSM_CCI_combined_5  998 non-null    float64
 6   SSM_CCI_active_5    972 non-null    float64
 7   SSM_CCI_passive_5   925 non-null    float64
 8   SSM_SMAP_5          609 non-null    float64
 9   SSM_SMOS_5          598 non-null    float64
 10  SSM_THEIA_5         179 non-null    float64
 11  SSM_RT1_5           241 non-null    float64
dtypes: float64(12)
memory usage: 105.3 KB


In [9]:
#----------------------------------------------------------------------------
# Sigma0 values

# Freq: D
# Daily values of backscattering from 2014 to 2022 (complete S1 series)
# Data extracted:
# - sigma0 values, VV and VH
# - angle of incidence of reference orbit (nearest to 40°)

sigma_df = pd.read_csv('Data\\budrio-half.csv', delimiter='\t')
sigma_df['Datetime'] = pd.to_datetime(sigma_df['Date']) # exact time of passage
sigma_df['Date'] = sigma_df['Datetime'].dt.round(freq='D') # rounded to the nearest midnight
sigma_df = sigma_df.set_index('Date')
if verbose: sigma_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1150 entries, 2014-10-12 to 2022-11-28
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Angle[°]     1150 non-null   float64       
 1   Geometry     1150 non-null   object        
 2   Orb          1150 non-null   int64         
 3   Pass         1150 non-null   object        
 4   VV_norm[dB]  1150 non-null   float64       
 5   VH_norm[dB]  1150 non-null   float64       
 6   CR           1150 non-null   float64       
 7   Datetime     1150 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(4), int64(1), object(2)
memory usage: 80.9+ KB
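The cell comment mentions a reference orbit whose incidence angle is nearest to 40°, but the selection itself is not shown. One possible way to do it, assuming that the relative orbit is identified by the `Orb` column and that the mean `Angle[°]` per orbit is the criterion, is sketched below on a made-up table; `reference_orbit` is a hypothetical helper.

```python
import pandas as pd

def reference_orbit(df, target_deg=40.0):
    """Hypothetical helper: relative orbit whose mean incidence angle
    is closest to target_deg."""
    mean_angle = df.groupby('Orb')['Angle[°]'].mean()
    return (mean_angle - target_deg).abs().idxmin()

# Made-up example with three relative orbits
demo = pd.DataFrame({'Orb': [15, 15, 117, 117, 168],
                     'Angle[°]': [44.1, 44.3, 39.6, 39.8, 34.0]})
ref = reference_orbit(demo)
demo_ref = demo[demo['Orb'] == ref]   # keep only the reference orbit
```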


In [97]:
#----------------------------------------------------------------------------
# Budrio field data from platinum_df tables

# Freq: H
# Data extracted:
# - SWC (as input SWB/comparison)
# - rain (as input SWB)
# - irrigation (as input SWB)
# - temperature (as input SWB, ET0 estimate)

platinum_xl = pd.ExcelFile(r'Inputs\Platinum_Budrio.xlsx', engine='openpyxl')
platinum_df = pd.concat([platinum_xl.parse('2017_1h'), platinum_xl.parse('2020_1h')])

# Column 'Ora' contains the hour of each record (hourly information),
# column 'Data' contains only the date (daily information):
# they are combined into a single 'Datetime' column
platinum_df['Ora_1'] = pd.to_datetime(platinum_df['Ora'].astype('str')).apply(lambda x: x.time())
platinum_df['Data_1'] = pd.to_datetime(platinum_df['Data'].astype('str')).apply(lambda x: x.date())
platinum_df['Datetime'] = platinum_df.apply(lambda r : dtt.datetime.combine(r['Data_1'],r['Ora_1']), axis=1)
platinum_df = platinum_df.drop(['ID', 'Data', 'Ora', 'Data_1', 'Ora_1', '214Pb[cps]'],axis=1)
platinum_df = platinum_df.set_index('Datetime')
platinum_resampled = platinum_df.resample('D', origin='end_day')
if verbose: platinum_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 9474 entries, 2017-04-03 11:00:00 to 2020-09-01 20:00:00
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   SWC[m3/m3]       8888 non-null   float64
 1   Pioggia[mm]      9394 non-null   float64
 2   Irrigazione[mm]  9474 non-null   float64
 3   Temperatura[°C]  9442 non-null   float64
dtypes: float64(4)
memory usage: 370.1 KB


In [12]:
#----------------------------------------------------------------------------
# Budrio field data from meteo tables

# Freq: H
# Data extracted:
# none (needed for ET0 estimation with FAO Penman-Monteith)

meteo_xl = pd.ExcelFile(r'Inputs\Budrio_Meteo.xlsx', engine='openpyxl')

meteo_df = pd.concat([meteo_xl.parse('2017'), meteo_xl.parse('2020')]).set_index('ID')

# Columns 'Data' (date) and 'Ora' (hour) are combined into 'Datetime' = hourly information
meteo_df['Datetime'] = meteo_df.apply(lambda r : dtt.datetime.combine(r['Data'],r['Ora']), axis=1)
meteo_df = meteo_df.set_index('Datetime')
meteo_df = meteo_df.drop(['Data', 'Ora'],axis=1)

if verbose: meteo_df.info()
# meteo_df.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 9475 entries, 2017-04-03 11:00:00 to 2020-09-01 20:00:00
Data columns (total 12 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Raffica di vento [m/s]          9474 non-null   float64
 1   Radiazione solare [W/m2]        9474 non-null   float64
 2   Direzione del vento [°]         9474 non-null   float64
 3   Radiazione UV [MED]             9474 non-null   float64
 4   Velocità del vento [m/s]        9474 non-null   float64
 5   Pressione atmosferica [hPa]     9474 non-null   float64
 6   Punto di rugiada [C°]           9474 non-null   float64
 7   Temperatura Aria Netsens [C°]   5125 non-null   float64
 8   Umidità aria Netsens [%]        5125 non-null   float64
 9   Temperatura Aria Supporto [C°]  5125 non-null   float64
 10  Temperatura Aria [C°]           4349 non-null   float64
 11  Umidità aria [%]                4349 non-null   float64
dty

In [14]:
# Budrio database
# Freq: H
# Merging of Platinum+Meteo

meteo_h = pd.merge(right=platinum_df, left=meteo_df, on='Datetime')
if verbose: meteo_h.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 9474 entries, 2017-04-03 11:00:00 to 2020-09-01 20:00:00
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Raffica di vento [m/s]          9473 non-null   float64
 1   Radiazione solare [W/m2]        9473 non-null   float64
 2   Direzione del vento [°]         9473 non-null   float64
 3   Radiazione UV [MED]             9473 non-null   float64
 4   Velocità del vento [m/s]        9473 non-null   float64
 5   Pressione atmosferica [hPa]     9473 non-null   float64
 6   Punto di rugiada [C°]           9473 non-null   float64
 7   Temperatura Aria Netsens [C°]   5125 non-null   float64
 8   Umidità aria Netsens [%]        5125 non-null   float64
 9   Temperatura Aria Supporto [C°]  5125 non-null   float64
 10  Temperatura Aria [C°]           4348 non-null   float64
 11  Umidità aria [%]                4348 non-null   float64
 12
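`pd.merge` performs an inner join by default, so only the hours present in both tables are kept (9474 rows, against 9475 in the meteo table). Since `Datetime` is the index of both frames, an index-based `join` with `how='inner'` should keep the same rows. A toy example with made-up column names:

```python
import pandas as pd

# Made-up hourly tables sharing a 'Datetime' index
t = pd.date_range('2020-08-01 00:00', periods=4,
                  freq=pd.Timedelta(hours=1), name='Datetime')
meteo = pd.DataFrame({'wind': [1.0, 2.0, 3.0, 4.0]}, index=t)
plat = pd.DataFrame({'rain': [0.0, 0.5, 0.0]}, index=t[1:])  # first hour missing

merged = pd.merge(right=plat, left=meteo, on='Datetime')  # as in the notebook
joined = meteo.join(plat, how='inner')                    # index-based alternative
```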

After long and deep thinking
and at least ten coffee drinking
I have reached the conclusion
that the best possible solution
for the inputs' database 
is in fact the easiest case:
take each column as it is,
resample, sum or take the mean,
give a name to every one
and please take it easy for once!
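In pandas terms, the recipe above can be written as a single `.agg` call on the daily resampler: mean for SWC, sum for rain and irrigation (as discussed in the section on temporal resolution), followed by a renaming. The short names `SWC`, `P`, `IRR` are hypothetical, and the demo frame is made up with the same column names as `platinum_df`.

```python
import pandas as pd

# Made-up hourly frame with the same column names as platinum_df
idx = pd.date_range('2020-08-01 01:00', periods=48,
                    freq=pd.Timedelta(hours=1), name='Datetime')
hourly = pd.DataFrame({'SWC[m3/m3]': 0.30,
                       'Pioggia[mm]': 0.5,
                       'Irrigazione[mm]': 0.0}, index=idx)

# Daily aggregation: mean for the state variable, sums for the fluxes
daily = (hourly.resample('D', origin='end_day')
               .agg({'SWC[m3/m3]': 'mean',
                     'Pioggia[mm]': 'sum',
                     'Irrigazione[mm]': 'sum'})
               .rename(columns={'SWC[m3/m3]': 'SWC',
                                'Pioggia[mm]': 'P',
                                'Irrigazione[mm]': 'IRR'}))
```

Note that `sum` returns 0 for a day without any valid hourly value; if such gaps should stay NaN, `.sum(min_count=1)` on the resampler is an option.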

In [None]:
# Build inputs as timeseries (using the `timeseries` helper)
# timeseries(dates, data) -> matrix[columns={dates, data}]
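The `timeseries` helper is imported from the project modules and is not shown here. From the comment above and from its use in `IRR_WCM`, it appears to pair each date with its value. The hypothetical stand-in below reproduces that behaviour, and compares the lookup pattern used in `IRR_WCM` with a more direct `pd.Series` lookup; dates and values are made up.

```python
import numpy as np
import pandas as pd

def timeseries_sketch(dates, data):
    """Hypothetical stand-in for timeseries: rows of [date, value]."""
    return [[d, v] for d, v in zip(dates, data)]

dates = pd.date_range('2020-07-01', periods=3, freq='D')
WW = np.array([0.20, 0.25, 0.22])
d_sat = pd.DatetimeIndex(['2020-07-01', '2020-07-03'])

# Lookup pattern used in IRR_WCM
via_helper = pd.DataFrame(timeseries_sketch(dates, WW)).set_index(0).loc[d_sat][1].values
# Equivalent lookup with a Series indexed by date
via_series = pd.Series(WW, index=dates).loc[d_sat].values
```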

In [47]:
# ET0 calculation

import datetime
import sys

sys.path.append('../')
from modules.pyeto.pyeto import *

def hargre(lat_deg, dates, temp_min, temp_max, temp_mean):
    """Hargreaves-Samani model for ET0 estimation from temperature input.
    
    Params
    ------
    - lat_deg: float
        latitude [deg]
    - dates: pd.Timestamp
        single day of the estimate (its day of year is used)
    - temp_min, temp_max, temp_mean: float
        daily minimum, maximum and mean air temperature [°C]
    
    Return
    ------
    ET0 [mm/day], from pyeto's `hargreaves`.
    """
    lat = deg2rad(lat_deg)  # convert latitude from degrees to radians
    day_of_year = dates.dayofyear
    sol_decli = sol_dec(day_of_year) # solar declination
    sha = sunset_hour_angle(lat, sol_decli) # sunset hour angle
    ird = inv_rel_dist_earth_sun(day_of_year) # inverse relative Earth-Sun distance
    et_radia = et_rad(lat, sol_decli, sha, ird) # extraterrestrial radiation
    return hargreaves(temp_min, temp_max, temp_mean, et_radia)
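`hargre` chains pyeto's FAO-56 helpers to obtain the extraterrestrial radiation $R_a$ and then applies the Hargreaves equation $ET_0 = 0.0023\,(T_{mean}+17.8)\sqrt{T_{max}-T_{min}}\;0.408\,R_a$. The plain-Python sketch below re-implements the same chain from the FAO-56 formulas as an independent check, not as a replacement for pyeto; the function names are hypothetical and the temperatures in the example are made up.

```python
import math

GSC = 0.0820  # solar constant [MJ m-2 min-1]

def et_rad_fao56(lat_deg, doy):
    """Extraterrestrial radiation Ra [MJ m-2 day-1] from FAO-56 formulas."""
    lat = math.radians(lat_deg)
    b = 2*math.pi*doy/365
    dec = 0.409*math.sin(b - 1.39)                  # solar declination [rad]
    cos_ws = -math.tan(lat)*math.tan(dec)
    ws = math.acos(min(max(cos_ws, -1.0), 1.0))     # sunset hour angle [rad]
    dr = 1 + 0.033*math.cos(b)                      # inverse relative Earth-Sun distance
    return (24*60/math.pi)*GSC*dr*(ws*math.sin(lat)*math.sin(dec)
                                   + math.cos(lat)*math.cos(dec)*math.sin(ws))

def hargreaves_et0(tmin, tmax, tmean, ra):
    """Hargreaves ET0 [mm day-1]; 0.408 converts MJ m-2 day-1 to mm day-1."""
    return 0.0023*(tmean + 17.8)*math.sqrt(tmax - tmin)*0.408*ra

# FAO-56 worked example: 3 September (DOY 246) at 20°S gives Ra of about 32.2
ra_check = et_rad_fao56(-20.0, 246)

# Made-up summer day (15 July, DOY 196) at the Budrio latitude used below
ra_budrio = et_rad_fao56(44.570842547510622, 196)
et0 = hargreaves_et0(tmin=18.0, tmax=32.0, tmean=25.0, ra=ra_budrio)
```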

In [98]:
from IRRI_WCM.EPOT_Hargreaves_pyeto import *

lat_deg = 44.570842547510622 # latitude of Budrio (deg)
temp_min = platinum_resampled.min()['Temperatura[°C]'].values
temp_max = platinum_resampled.max()['Temperatura[°C]'].values
temp_mean = platinum_resampled.mean()['Temperatura[°C]'].values
dates = platinum_resampled.asfreq().index
eto = timeseries( dates,
                 [ hargre(lat_deg, dates[i] , temp_min[i], temp_max[i], temp_mean[i])
                  for i in range(len(dates)) ] )
eto_df = pd.DataFrame(eto).rename(columns={0:'Date',1:'ET0'}).set_index('Date')
if verbose: eto_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1248 entries, 2017-04-04 to 2020-09-02
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   ET0     396 non-null    float64
dtypes: float64(1)
memory usage: 19.5 KB


# [WIP]