# Modeling Soil Moisture

The storage and movement of soil water are governed by the interaction of multiple forces and processes: the Earth's gravitational field, interactions with porous and non-porous soil particles, the presence of solutes in the soil solution, plant roots, microorganisms, and environmental factors such as precipitation.

While modeling soil water flow usually requires sophisticated models, the temporal dynamics of soil water storage can be captured with simple Markov-type models, at least for simple applications in agriculture and hydrology. A common model for estimating soil water storage over a given depth through time is:

$$ S_t = \alpha S_{t-1} + P_t$$

where $S_t$ is the soil water storage in a given soil profile at time $t$ (typically daily time steps for these simple models) and $P_t$ is total precipitation during time $t$. The parameter $\alpha$ is called the loss parameter and encapsulates losses due to evaporation, transpiration (if vegetation is present), deep drainage, and surface runoff. In the absence of precipitation ($P_t = 0$), the soil water storage at time $t$ is a fraction of the soil water storage at time $t-1$. In this case, the change in soil water storage is dictated by the magnitude of $\alpha$, which is usually constant in simple models but can change with space and time in more advanced models. The parameter $\alpha$ is an empirical parameter that needs to be derived from observations, or at least inferred from studies conducted under similar environmental conditions.
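To make the role of $\alpha$ concrete, the sketch below iterates the simple model over a rain-free period. The values of `alpha`, `S0`, and `n_days` are illustrative assumptions, not parameters fitted to any dataset.

```python
import numpy as np

# Illustrative values (assumptions, not fitted to observations)
alpha = 0.95          # constant loss parameter (unitless)
S0 = 60.0             # initial soil water storage (mm)
n_days = 30           # length of the dry-down period (days)

P = np.zeros(n_days)  # no precipitation during the dry-down
S = np.empty(n_days)
S[0] = S0

# Simple model: S_t = alpha * S_{t-1} + P_t
for t in range(1, n_days):
    S[t] = alpha * S[t-1] + P[t]

# Without rain the recursion reduces to S_t = alpha**t * S_0,
# so the time for storage to halve is ln(0.5) / ln(alpha)
half_life = np.log(0.5) / np.log(alpha)

print(f"Storage after {n_days - 1} dry days: {S[-1]:.1f} mm")
print(f"Half-life of stored water: {half_life:.1f} days")
```

With these assumed values the storage halves roughly every 13.5 days. Because nothing in this form stops the decay, storage tends toward zero during long dry periods.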

In this notebook we will use a refined version of the previous equation, in which we add a few parameters to constrain the time series of soil water storage within physically plausible upper and lower limits. We will also use a time-varying $\alpha$ parameter to account for the changing atmospheric demand throughout the year. The new equation is:

$$ S_t = S_{min} + \alpha (S_{t-1} - S_{min}) + P_t $$

$$ S_t = \min(S_t, S_{max}) $$


$S_{min}$ is the minimum soil water storage and $S_{max}$ is the maximum soil water storage at saturated conditions. The $S_{max}$ constraint automatically discards precipitation inputs that exceed the storage capacity of the soil (over the considered depth). These losses would be due to runoff, drainage, or a combination of both, and would be in addition to the losses accounted for by the loss coefficient. One way to estimate the loss coefficient $\alpha$ is:

$$ \alpha = c + (1-c) \sin \Bigg[ 2 \pi \frac{DOY-\phi}{365} + \frac{\pi}{2} \Bigg] $$

$\phi$ is a phase constant (in days) that shifts the sine curve along the year, and $c$ is an empirical constant that sets the annual mean loss coefficient. The curve peaks ($\alpha = 1$) at DOY = $\phi$ and reaches its minimum ($\alpha = 2c - 1$) half a year later, so $\phi$ can also be read as the offset, in days from DOY=182, of the day with maximum atmospheric demand. During winter the loss coefficient is assumed to reach a value of nearly 1, suppressing almost all water loss while $\alpha$ is at or near its maximum value.
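The two pieces described above, the seasonal loss coefficient and the bounded storage update, can be sketched as small functions. The names `loss_coefficient` and `update_storage`, as well as the parameter values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def loss_coefficient(doy, c, phi):
    """Seasonal loss coefficient from the sine expression above."""
    return c + (1 - c) * np.sin(2 * np.pi * (doy - phi) / 365 + np.pi / 2)

def update_storage(S_prev, P, alpha, S_min, S_max):
    """One daily step of the bounded storage model (all storages in mm)."""
    S = S_min + alpha * (S_prev - S_min) + P
    return min(S, S_max)  # water above S_max is lost to runoff/drainage

# Illustrative parameter values (assumptions)
c, phi = 0.95, 15
doy = np.arange(1, 366)
alpha = loss_coefficient(doy, c, phi)

# alpha is largest at DOY = phi and smallest about half a year later
print("Max alpha:", round(alpha.max(), 4), "on DOY", doy[alpha.argmax()])
print("Min alpha:", round(alpha.min(), 4), "near DOY", doy[alpha.argmin()])

# Dry day: only the water above S_min is subject to losses
S_dry = update_storage(S_prev=40.0, P=0.0, alpha=0.9, S_min=20.0, S_max=80.0)

# Large rain event: storage is capped at S_max
S_wet = update_storage(S_prev=70.0, P=30.0, alpha=0.9, S_min=20.0, S_max=80.0)
```

With these assumed values `S_dry` is 38 mm (20 + 0.9 × 20) and `S_wet` is capped at 80 mm, since the uncapped result would be 95 mm.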

In [1]:
# Import modules
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import column
output_notebook()

In [2]:
# Load data
df = pd.read_csv('../datasets/gypsum_ks_daily_2018.csv')
df = df[['TIMESTAMP','PRECIP','VWC5CM','VWC10CM','VWC20CM']]
df.head()

,TIMESTAMP,PRECIP,VWC5CM,VWC10CM,VWC20CM
0,1/1/18 0:00,0.0,0.1377,0.1167,0.2665
1,1/2/18 0:00,0.0,0.1234,0.1021,0.2642
2,1/3/18 0:00,0.0,0.1206,0.0965,0.2353
3,1/4/18 0:00,0.0,0.1235,0.0973,0.2094
4,1/5/18 0:00,0.0,0.1249,0.0976,0.2047


In [3]:
# Convert date strings to Pandas datetime
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%y %H:%M')

# Compute day of the year
df['DOY'] = df['TIMESTAMP'].dt.dayofyear
df.head()

,TIMESTAMP,PRECIP,VWC5CM,VWC10CM,VWC20CM,DOY
0,2018-01-01,0.0,0.1377,0.1167,0.2665,1
1,2018-01-02,0.0,0.1234,0.1021,0.2642,2
2,2018-01-03,0.0,0.1206,0.0965,0.2353,3
3,2018-01-04,0.0,0.1235,0.0973,0.2094,4
4,2018-01-05,0.0,0.1249,0.0976,0.2047,5


In [4]:
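# Compute storage (mm) in the top 20 cm: volumetric water content
# times layer thickness (50, 50, and 100 mm)
S_obs = df['VWC5CM']*50 + (df['VWC5CM']+df['VWC10CM'])/2*50 + (df['VWC10CM']+df['VWC20CM'])/2*100

# Define function for loss coefficient
def alpha_fn(x, c, phi):
    """Seasonal loss coefficient for day of year x."""
    return c + (1-c) * np.sin(2*np.pi*(x-phi)/365 + np.pi/2)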


In [10]:
# Define independent variables
P = df['PRECIP']
DOY = df['DOY']

# Pre-allocate variables
S_pred = np.ones(df.shape[0]) * np.nan
alpha = np.ones(df.shape[0]) * np.nan

# Model parameters
c = 0.95
phi = 15
S_min = S_obs.min()
S_max = S_obs.max()

# Initial conditions
S_pred[0] = S_obs[0]
alpha[0] = alpha_fn(DOY[0],c,phi)


# Implement Markov chain
for t in range(1,df.shape[0]):
    alpha[t] = alpha_fn(DOY[t],c,phi)
    S_pred[t] = S_min + alpha[t] * (S_pred[t-1] - S_min) + P[t]
    S_pred[t] = np.min([S_pred[t],S_max])
    
    
# Plots
f1 = figure(width=700, height=300, x_axis_type='datetime')
f1.line(df['TIMESTAMP'], S_obs, legend_label='Observed')
f1.line(df['TIMESTAMP'], S_pred, line_color='tomato', legend_label='Predicted')
f1.yaxis.axis_label = 'Soil water storage in top 20 cm (mm)'
f1.legend.location = 'top_left'

f2 = figure(width=700, height=300, x_axis_type='datetime')
f2.line(df['TIMESTAMP'], alpha)
f2.yaxis.axis_label = 'Loss coefficient (unitless)'

show(column(f1,f2))


In [13]:
# Compute error

MAE = np.mean(np.abs(S_obs - S_pred))
print('Mean Absolute Error', round(MAE,1), 'mm')

Mean Absolute Error 7.1 mm
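The mean absolute error does not show whether the model tends to over- or under-predict. As a complement, the sketch below also computes the root mean squared error and the mean bias. The function name `error_metrics` and the short arrays are assumptions made to keep the block self-contained; in this notebook the inputs would be `S_obs` and `S_pred`.

```python
import numpy as np

def error_metrics(obs, pred):
    """Return MAE, RMSE, and mean bias (pred - obs), ignoring NaN values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    resid = pred - obs
    return {
        'MAE': np.nanmean(np.abs(resid)),
        'RMSE': np.sqrt(np.nanmean(resid**2)),
        'bias': np.nanmean(resid),   # positive means over-prediction
    }

# Made-up storage values (mm) used only to exercise the function
obs = [40.0, 42.0, 39.0, 45.0]
pred = [41.0, 40.0, 39.0, 48.0]

metrics = error_metrics(obs, pred)
for name, value in metrics.items():
    print(name, round(value, 2), 'mm')
```

For the made-up values the residuals are 1, -2, 0, and 3 mm, which gives an MAE of 1.5 mm and a bias of 0.5 mm.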


## Practice

- What other changes would you make to this simple Markov chain model to improve the modeling of surface and rootzone soil moisture?

- What happens to the predictions of soil water storage if we change the phase constant to a value of 90? Where does the maximum loss occur? Create a figure showing the error ($S - S_{obs}$) between the observed soil water storage and the storage predicted with the shifted loss coefficient.

## References

Crow, W.T. and Ryu, D., 2009. A new data assimilation approach for improving runoff prediction using remotely-sensed soil moisture retrievals. Hydrology and Earth System Sciences, 13(1), pp.1-16.