# SWMM Simulated Streamflow in Fall Creek, NY

This notebook reads, plots, and performs sensitivity analyses on the observations, parameters, and simulated streamflow from 24,000 SWMM model runs.

To generate parameter sets to run through SWMM, we employed the Saltelli method (described in more detail in Section 3). The code is available in the "Sampling.py" script. The parameter sets were then run through SWMM to simulate streamflow in the domain.
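
As a rough illustration of how a Saltelli-style design grows with the number of parameters, the sketch below builds the cross-sampled parameter sets from two base samples using only the standard library. It is not the project's "Sampling.py": the function name `saltelli_design`, the pseudo-random draws (SALib uses a quasi-random Sobol' sequence instead), and the unit bounds are all assumptions made for illustration. When second-order indices are requested, the design has N(2D + 2) rows, so 11 parameters and a base sample of 1,000 would give 24,000 runs. That matches the run count used here, although the base sample size is our inference and is not stated in this notebook.

```python
import random

def saltelli_design(n_base, bounds, seed=0):
    """Return a list of parameter sets with n_base * (2 * D + 2) rows.

    bounds is a list of (low, high) tuples, one per parameter.
    """
    rng = random.Random(seed)
    d = len(bounds)

    def draw():
        # One pseudo-random point scaled to the parameter bounds
        return [lo + rng.random() * (hi - lo) for lo, hi in bounds]

    A = [draw() for _ in range(n_base)]
    B = [draw() for _ in range(n_base)]

    design = []
    for a, b in zip(A, B):
        design.append(a)                                # row from A
        for i in range(d):                              # A with column i from B
            design.append(a[:i] + [b[i]] + a[i + 1:])
        for i in range(d):                              # B with column i from A
            design.append(b[:i] + [a[i]] + b[i + 1:])
        design.append(b)                                # row from B
    return design

# Hypothetical unit bounds for 11 parameters, small base sample for speed
demo = saltelli_design(n_base=10, bounds=[(0.0, 1.0)] * 11)
print(len(demo))  # 10 * (2 * 11 + 2) = 240 rows
```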

To run and visualize a comprehensive sensitivity analysis of the SWMM simulations, we use packages from both R and Python. The following command loads the rpy2 IPython extension, which lets us run R code in this Python Jupyter notebook.

In [None]:
%load_ext rpy2.ipython

To calculate the objective functions for the SWMM simulated streamflow, we use the [hydroGOF](https://cran.r-project.org/web/packages/hydroGOF/hydroGOF.pdf) package in R. For the SWMM streamflow simulations in the Fall Creek, NY watershed, we calculate mean error, mean absolute error, mean squared error, Nash-Sutcliffe efficiency, percent bias, and root mean squared error. The objective function values are saved in the "input/" folder so that they can be loaded into the sensitivity analysis.

In [None]:
%%R 
# the rpy2 '%%R' cell magic runs the entire cell as R code

# load in the R script that calculates six objective functions on SWMM model simulations versus USGS streamgage data
# %load Calculate_Objective_Functions.R

library(dplyr)
library(hydroGOF)

# load in observation and simulation data
obs <- read.csv("input/observation_ts.csv", header = TRUE)
  # "time_steps" row, "index" and "value" columns 

sim <- read.csv("input/simulation_ts.csv", header = TRUE) %>%
    dplyr::select(-1)
  # "model_runs" rows, "time_steps" columns

model_runs <- nrow(sim)
time_steps <- ncol(sim)
# note: the parameter sets are not loaded in this R cell, so no parameter count is computed here

mean_error <- array(NA, model_runs)
mean_abs_error <- array(NA, model_runs)
mean_sq_error <- array(NA, model_runs)
root_mse <- array(NA, model_runs)
p_bias <- array(NA, model_runs)
nse <- array(NA, model_runs)

for (i in 1:model_runs) {

  print(i)

  mean_error[i] <- me(sim = as.numeric(sim[i, 1:time_steps]), obs = as.numeric(obs[, 2]))
  mean_abs_error[i] <- mae(sim = as.numeric(sim[i, 1:time_steps]), obs = as.numeric(obs[, 2]))
  mean_sq_error[i] <- mse(sim = as.numeric(sim[i, 1:time_steps]), obs = as.numeric(obs[, 2]))
  root_mse[i] <- rmse(sim = as.numeric(sim[i, 1:time_steps]), obs = as.numeric(obs[, 2]))
  p_bias[i] <- pbias(sim = as.numeric(sim[i, 1:time_steps]), obs = as.numeric(obs[, 2]))
  nse[i] <- NSE(sim = as.numeric(sim[i, 1:time_steps]), obs = as.numeric(obs[, 2]))

}

OF <- as.data.frame(mean_error) %>%
  setNames("me") %>%
  dplyr::mutate(mae = mean_abs_error,
                mse = mean_sq_error,
                rmse = root_mse,
                pbias = p_bias,
                nse = nse)

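# write a comma-separated file with a header row into the relative "input/" folder,
# matching the pd.read_csv("input/OF_values.csv") call in the Python cell below
write.csv(OF, "input/OF_values.csv", row.names = FALSE)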
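
For readers who want to see what the six metrics compute, the sketch below writes them out in plain Python. These are the textbook definitions, which we believe correspond to the hydroGOF functions called above, but the function name `objective_functions` and the demo values are hypothetical, and missing-value handling is omitted.

```python
import math

def objective_functions(sim, obs):
    """Return the six goodness-of-fit metrics for one simulated series."""
    n = len(obs)
    errors = [s - o for s, o in zip(sim, obs)]
    obs_mean = sum(obs) / n
    sq_err_sum = sum(e * e for e in errors)
    mse = sq_err_sum / n
    return {
        "me": sum(errors) / n,                      # mean error
        "mae": sum(abs(e) for e in errors) / n,     # mean absolute error
        "mse": mse,                                 # mean squared error
        "rmse": math.sqrt(mse),                     # root mean squared error
        "pbias": 100.0 * sum(errors) / sum(obs),    # percent bias
        # Nash-Sutcliffe efficiency: 1 minus error variance over observed variance
        "nse": 1.0 - sq_err_sum / sum((o - obs_mean) ** 2 for o in obs),
    }

# Made-up values for illustration only
obs_demo = [1.0, 2.0, 3.0, 4.0]
sim_demo = [1.5, 2.0, 2.5, 5.0]
metrics = objective_functions(sim_demo, obs_demo)
print(metrics)
```

An NSE of 1 indicates a perfect fit, while values below 0 indicate that the simulation performs worse than simply using the observed mean.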

Now we can load the model simulation data, observation/truth data, parameter sets, time stamps, and objective function values into Python.

In [None]:
import pandas as pd

sim = pd.read_csv("input/simulation_ts.csv", index_col = 0)
obs = pd.read_csv("input/observation_ts.csv")
pars = pd.read_csv("input/params.csv", header = 0)
pars.columns = ['w', 'n_imperv', 'n_perv', 's_imperv', 's_perv', 'k_sat', 'per_routed', 'cmelt', 'Tb', 'A1', 'B1']
timestamps = pd.read_csv("input/timestamps.csv")
OF = pd.read_csv("input/OF_values.csv")

### 1. Graph Observed and Modeled Output

The following code produces time series plots of the observed (truth) data. Because we have several thousand simulated time series, overlaying them on a conventional time series figure conveys little. Instead, we rank the values to see how often given levels are exceeded and create [magnitude percentile plots](https://waterprogramming.wordpress.com/2019/02/26/magnitude-varying-sensitivity-analysis-and-visualization-part-1/). The plots are saved to the "output/plots/magnitude_perc" directory.

In [None]:
%load Magnitude_Perc.py
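
The ranking idea behind these plots can be sketched in a few lines. This is not the contents of "Magnitude_Perc.py": the function name `exceedance_curve`, the Weibull plotting position, and the demo flows are assumptions chosen for illustration.

```python
def exceedance_curve(flows):
    """Return (exceedance_probability, flow) pairs, largest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position: rank / (n + 1), with rank 1 for the largest flow
    return [((rank + 1) / (n + 1), q) for rank, q in enumerate(ranked)]

# Made-up flows for illustration only
flows_demo = [3.2, 0.8, 12.5, 1.1, 5.0, 0.9, 2.4, 7.3, 1.6]
curve = exceedance_curve(flows_demo)
for p, q in curve:
    print(f"{p:.1f}  {q}")
```

Applying the same ranking to each simulated series allows thousands of runs to be compared at a given exceedance level rather than at a given time step.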

### 2. Approximate Bayesian Computation
Approximate Bayesian computation (ABC) identifies the combinations of model parameter values that are most likely to reproduce the observed data. ABC bypasses direct evaluation of the likelihood function and approximates it instead by comparing simulations with the observed data. For more information on ABC, see Engeland and Gottschalk (2002), Neuman (2003), and Sunnåker et al. (2013).

The steps and requirements for approximate Bayesian computation are:

1. The observed data have a known mean and standard deviation, and the user defines a summary statistic (i.e., an objective function).
2. Nothing is assumed to be known about the parameters, so a uniform prior on the interval [0, 1] is used.
3. A total of n parameter points are drawn from the prior and the model is simulated for each one, which yields n sequences of simulated data.
4. The summary statistic is calculated for each sequence of simulated data.
5. The distance between the observed and simulated summary statistics is calculated for every parameter point. The user specifies a tolerance and keeps the parameter points whose distance is less than or equal to that tolerance as approximate samples from the posterior.
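
The steps above can be sketched as a simple rejection sampler. The toy model below is a hypothetical stand-in for SWMM, and the names `toy_model` and `abc_rejection`, the noise level, and the observed summary value are all assumptions made for illustration.

```python
import random
from statistics import fmean

def toy_model(theta, rng, n_obs=50):
    """Hypothetical stand-in for SWMM: noisy data centered on theta."""
    return [rng.gauss(theta, 0.2) for _ in range(n_obs)]

def abc_rejection(observed_summary, n_draws, tolerance, seed=0):
    """Keep the prior draws whose simulated summary is close to the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 1.0)                # steps 2-3: draw from the uniform prior
        simulated = toy_model(theta, rng)            # step 3: simulate the model
        summary = fmean(simulated)                   # step 4: summary statistic
        distance = abs(summary - observed_summary)   # step 5: distance to the observation
        if distance <= tolerance:
            accepted.append(theta)                   # approximate posterior sample
    return accepted

posterior = abc_rejection(observed_summary=0.6, n_draws=5000, tolerance=0.05)
print(len(posterior), fmean(posterior))
```

A smaller tolerance gives a sharper approximation of the posterior but accepts fewer parameter points, so the tolerance is a trade-off between accuracy and sample size.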

For the Python script below, specify the number of model runs, the tolerance for each objective function, the number of histogram bins, and the figure colors. Here we use predefined objective functions, but the code can be modified to calculate others. The resulting plots are histograms of the individual parameters that contrast the original model runs with the ABC-constrained parameter sets.

In [None]:
%load approx_bayes_calc_of_defined.py 

In [None]:
# Specify tolerance for objective functions (OF)
tolerance_rmse = 6.0   # OF < tolerance
tolerance_pbias = 15.0 # -tolerance < OF < tolerance
tolerance_nse = 0.0  # OF >= tolerance

runs = 24000 # specify number of model runs
bins = 100   # specify number of histogram bins
color1 = 'b' # color of original model output
color2 = 'k' # color of 1st ABC applied to OF (NSE)
color3 = 'r' # color of 2nd ABC applied to OF (p-bias)
color4 = 'g' # color of 3rd ABC applied to OF (RMSE)

# Run the function that evaluates model outputs with approximate Bayesian computation
runABC(pars, OF, runs, bins, color1, color2)
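
The internals of `runABC` are not shown in this notebook, so the sketch below only illustrates how the three tolerance criteria could be applied to a table of objective function values. The function name `passes_tolerances` and the demo values are hypothetical.

```python
def passes_tolerances(of_row, tol_nse=0.0, tol_pbias=15.0, tol_rmse=6.0):
    """Apply the three criteria in the order NSE, percent bias, RMSE."""
    return (
        of_row["nse"] >= tol_nse
        and -tol_pbias < of_row["pbias"] < tol_pbias
        and of_row["rmse"] < tol_rmse
    )

# Made-up objective function values for four runs
of_demo = [
    {"nse": 0.62, "pbias": 4.0, "rmse": 3.1},    # passes all three
    {"nse": -0.20, "pbias": 2.0, "rmse": 2.5},   # fails NSE
    {"nse": 0.40, "pbias": -22.0, "rmse": 4.0},  # fails percent bias
    {"nse": 0.10, "pbias": 9.0, "rmse": 7.5},    # fails RMSE
]
kept = [i for i, row in enumerate(of_demo) if passes_tolerances(row)]
print(kept)  # [0]
```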

### 3. Sensitivity Analysis

Three sensitivity analyses are incorporated into the workflow: a variance-based sensitivity analysis, a moment-independent sensitivity analysis, and an ordinary least squares regression. The Sobol method ([Sobol, 2001](https://doi.org/10.1016/S0378-4754(00)00270-6)) is a variance-based global sensitivity analysis that yields first-order, second-order, and total-order sensitivity indices. Sobol’s method can effectively handle nonlinear responses and measures the effects of interactions within non-additive systems. It decomposes the variance of the model output into fractions that can be attributed to inputs or sets of inputs. The first-order sensitivity index (i.e. main effect index) quantifies the impact of a parameter on model output variance, averaged over the variations in the other input parameters. The second-order sensitivity index quantifies the share of model output variance that is due to the interaction between a pair of parameters. The total-order sensitivity index (i.e. total effect index) measures the contribution of each parameter to model output variance, combining its first-order effect with all higher-order interactions. Our workflow employs the Saltelli scheme ([Saltelli, 2002](https://doi.org/10.1016/S0010-4655(02)00280-1); [Saltelli et al., 2010](https://doi.org/10.1016/j.cpc.2009.09.018)), which allows for the calculation of the first-order, second-order, and total-order sensitivity indices with fewer model runs than the traditional approach.
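
To make the first-order and total-order indices concrete, the sketch below estimates them for a toy additive model using estimators commonly associated with the Saltelli scheme. The notebook itself relies on SALib, so this is only an illustration: the names `toy_model` and `sobol_first_total`, the pseudo-random sampling, and the choice of model are assumptions. For this model the analytical indices are 0.2, 0.8, and 0 for the three inputs, and the first-order and total-order values coincide because there are no interactions.

```python
import random
from statistics import fmean, pvariance

def toy_model(x):
    """Hypothetical additive model: x[2] has no influence on the output."""
    return x[0] + 2.0 * x[1]

def sobol_first_total(model, n_params, n_base, seed=0):
    """Estimate first-order and total-order Sobol indices on the unit hypercube."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_params)] for _ in range(n_base)]
    B = [[rng.random() for _ in range(n_params)] for _ in range(n_base)]
    f_A = [model(row) for row in A]
    f_B = [model(row) for row in B]
    mean_y = fmean(f_A + f_B)
    var_y = pvariance(f_A + f_B)

    first, total = [], []
    for i in range(n_params):
        # Evaluate the model on A with column i taken from B
        f_ABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # First-order estimator; centering f_B reduces the sampling noise
        first.append(
            fmean((yb - mean_y) * (yab - ya)
                  for ya, yb, yab in zip(f_A, f_B, f_ABi)) / var_y
        )
        # Total-order estimator
        total.append(
            fmean((ya - yab) ** 2 for ya, yab in zip(f_A, f_ABi)) / (2.0 * var_y)
        )
    return first, total

S1, ST = sobol_first_total(toy_model, n_params=3, n_base=20000)
print([round(s, 2) for s in S1])
print([round(s, 2) for s in ST])
```

A total-order index that is clearly larger than the corresponding first-order index would indicate that the parameter acts mainly through interactions with other parameters.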

The delta index ([Borgonovo, 2007](https://doi.org/10.1016/j.ress.2006.04.015); [Plischke et al., 2013](https://doi.org/10.1016/j.ejor.2012.11.047)) is a moment-independent global sensitivity measure. The delta sensitivity analysis identifies the parameters with the greatest impact on the probability density function of the model output.
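
For reference, the delta index of parameter $X_i$ as defined by Borgonovo (2007) can be written as follows, where the symbols are our notation: $f_Y$ is the unconditional density of the model output $Y$ and $f_{Y \mid X_i}$ is its density conditional on a fixed value of $X_i$.

$$
\delta_i = \frac{1}{2}\, \mathbb{E}_{X_i}\!\left[ \int \left| f_Y(y) - f_{Y \mid X_i}(y) \right| \, dy \right]
$$

The index lies between 0 and 1 and equals 0 when $Y$ is independent of $X_i$, which is why it captures effects on the whole output distribution rather than on its variance alone.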

Lastly, the ordinary least squares regression yields an R² coefficient of determination, which quantifies the linear effects of model input parameters on model output variance.
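
As a minimal sketch of what R² does and does not capture, the example below fits a least-squares line to a single parameter. The workflow's regression may involve several parameters at once, so the single-predictor form, the function name `ols_r_squared`, and the demo values are simplifying assumptions.

```python
def ols_r_squared(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Made-up parameter values and two hypothetical responses
param_demo = [0.1, 0.2, 0.3, 0.4, 0.5]
of_linear = [2.0 * p + 1.0 for p in param_demo]    # purely linear response
of_curved = [(p - 0.3) ** 2 for p in param_demo]   # symmetric nonlinear response

r2_linear = ols_r_squared(param_demo, of_linear)
r2_curved = ols_r_squared(param_demo, of_curved)
print(r2_linear, r2_curved)
```

The linear response gives an R² close to 1, while the symmetric nonlinear response gives an R² close to 0 even though the parameter fully determines the output. This is why the regression is used alongside the Sobol and delta analyses and not in place of them.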

To visualize objective function sensitivity to model input parameters, the following code produces radial convergence plots, scatter plots, portrait plots, and spider plots based on the outputs from the sensitivity analyses.

The Sobol and delta sensitivity indices are calculated using a modified version of the Python sensitivity analysis library ([SALib](https://salib.readthedocs.io/en/latest/index.html)).

In [None]:
%load SensIndices_RCPlots.py

From the sensitivity analysis results (calculated and exported from Python), we can create portrait plots, scatter plots, and spider plots for various objective functions and parameter values.

First, the data are loaded, reformatted, and exported to a .csv file. The script then creates additional plots to help visualize and convey parameter sensitivity.

In [None]:
%load Portrait_Scatter_Spider.R

### XX. Conclusion

We did awesome stuff and this is how we feel about it...

### XX. References
- Engeland, K., Gottschalk, L. Bayesian estimation of parameters in a regional hydrological model, Hydrol. Earth Syst. Sci., **2002**, *6*(5), 883-898. https://doi.org/10.5194/hess-6-883-2002
- Neuman, S. Maximum likelihood Bayesian averaging of uncertain model predictions, Stoch. Environ. Res. Risk Assess., **2003**, *17*, 291. https://doi.org/10.1007/s00477-003-0151-7
- Sunnåker, M., Busetto, A.G., Numminen, E., Corander, J., Foll, M., Dessimoz, C. Approximate Bayesian Computation, PLoS Comput. Biol., **2013**, *9*(1), e1002803. https://doi.org/10.1371/journal.pcbi.1002803
