# Writing a sum-of-squares function in R, for SIR model fitting


## Introduction

When you are trying to fit a model to data, you use a *distance function* to assess quantitatively how "far" the model is from the data. By calculating the output of this function for a range of parameter sets, you can find which of those parameter sets generates the model that fits the data most closely.

You could calculate the value of the distance function by hand for each model you try, but because this means repeating the same calculation with different inputs, it is a good candidate for a function.

So here, you will:

* create a function which calculates and returns the *sum-of-squares* value (SSQ) for the fit of a simple SIR model, parameterised with given values of $\beta$ and $\gamma$, to any dataset.

* use this function to find the sum-of-squares of models fitted to a simple dataset (the flu dataset you may have used in other etivities), with the following parameters:

     * $\beta$ = 1.15, $\gamma$ = 0.02  
     * $\beta$ = 1.7, $\gamma$ = 0.45 

Put very simply, your function will have this structure:

In [None]:
SIR_SSQ <- function(arguments) { ### fill in your arguments
    
    # Calculate model output
    ### YOUR CODE HERE ###
    
    # Calculate sum-of-squares (SSQ) of model fit
    ### YOUR CODE HERE ###

    return(SSQ)

}

Calculating the SSQ gives a quantitative measure of the distance from your model output to the data. 
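
As a minimal illustration of the calculation itself, the SSQ is the sum of the squared differences between each model prediction and the corresponding data point, $\sum_i (\text{model}_i - \text{data}_i)^2$. The vectors below are invented toy values, not taken from the flu dataset:

```r
# Toy values, invented purely for illustration
observed  <- c(3, 8, 26, 76)    # "data"
predicted <- c(2, 10, 25, 80)   # "model output" at the same timepoints

# Square each difference, then add the squares together
SSQ <- sum((predicted - observed)^2)
SSQ
```

A smaller SSQ means the model output lies closer to the data.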

Last week's etivities demonstrated manual calibration of a model, visually checking the model outputs from a small number of parameter combinations. Testing enough parameter combinations in this way to find the one producing the best available model fit would quickly become infeasible. Automating the process of simulating each model and calculating the SSQ, which gives a quantitative distance measure in place of a visual check, makes this more efficient.

So in this etivity we will create a function which can simulate the model for a given combination of parameters, compare the output to data and calculate the SSQ.

Later in this course you will learn how to use optimisation functions to take this a step further and automatically find the combination of parameters giving the best fit, as defined by your chosen distance function. One of these functions, ```optim()```, is introduced in the etivity following this one in this module.  

With this in mind, think about  
 * what else you will need to put within the body of the function, apart from the sum-of-squares calculation  
 * what inputs you will give the function and how to arrange them into arguments.
 
 
*Tips:*  

* *When using ```ode()```, you have an example of a function as an argument of another function.* 
*The function that you feed into ```ode()``` gets the values for its arguments from the ```ode()``` function.*

* *Keep track of the names of your arguments, so you can see that they match with what the "inner" function is expecting.*

* *To make your function applicable to any dataset and combination of parameters you choose, the function will need to read in the data as one of its arguments.*

* *To calculate the SSQ, you will have to calculate the difference between model output and data, at only the timepoints for which data is available. How can you achieve this?*

* *Write your function so that it uses sensible variable names internally, perhaps referring to particular columns in the data. Then, when you use the function, make sure the data you test it on has the expected column names!* 

(NB: regarding the last hint, there is more than one way of writing a working, widely applicable function. Our solution is written this way, so you will always have to check the column names in the data and rename them as necessary. You could instead write your function so that it takes the relevant column name as one of its arguments.)
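
To illustrate the tips about ```ode()``` and about matching timepoints, here is a minimal sketch, not the course solution. It assumes the ```deSolve``` package is installed, and that the "simple SIR model" has a frequency-dependent force of infection, $\lambda = \beta I / N$. The function name ```sir_model```, the parameter values and the toy observation times are all invented for the example. It shows ```ode()``` supplying the arguments to the inner function, and one way of keeping only the model rows at the timepoints where data exist:

```r
library(deSolve)

# Model function in the form ode() expects: (time, state, parameters).
# The SIR equations below are an assumed, frequency-dependent form.
sir_model <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    N <- S + I + R
    lambda <- beta * I / N            # force of infection
    dS <- -lambda * S
    dI <- lambda * S - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))               # derivatives, in the same order as state
  })
}

# Illustrative inputs (these parameter values are arbitrary)
initial_state_values <- c(S = 762, I = 1, R = 0)
parameters <- c(beta = 1.5, gamma = 0.4)
times <- seq(from = 0, to = 14, by = 0.5)

# You never call sir_model() directly: ode() calls it at each step,
# passing in the current time, the current state and `parms`
output <- as.data.frame(ode(y = initial_state_values,
                            times = times,
                            func = sir_model,
                            parms = parameters))

# Toy observation times (invented); keep only the model rows at those times
toy_data <- data.frame(time = c(1, 2, 3, 4))
model_at_data <- output[output$time %in% toy_data$time, ]
```

Matching with ```%in%``` relies on the model times and the data times being exactly equal, so it is worth choosing a time sequence for ```ode()``` that contains the data timepoints. Calculating the SSQ from these rows is left for your own function.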


### Task:

* **create a function which calculates and returns the *sum-of-squares* value (SSQ) for the fit of a simple SIR model, parameterised with given values of $\beta$ and $\gamma$, to any dataset.**

* **use this function to find the sum-of-squares of models fitted to a simple dataset (the flu dataset you may have used in other etivities), with the following parameters:**

     * $\beta$ = 1.15, $\gamma$ = 0.02  
     * $\beta$ = 1.7, $\gamma$ = 0.45 

In [4]:
SIR_SSQ <- function() { ### fill in your arguments
    
    # Calculate model output
    ### YOUR CODE HERE ###
    
    # Calculate sum-of-squares (SSQ) of model fit
    ### YOUR CODE HERE ###

    return(SSQ)

}

## load data 
flu_data <- read.csv("Graphics and Data/idm2_sir_data.csv")
initial_state_values <- c(S = 762, I = 1, R = 0)

## Check whether your loaded data has the column names which your function is expecting,
## and replace the column names if necessary
### YOUR CODE HERE ### 
# (hint: search the internet for a function which will show you the column names of a data frame or matrix)

## calculate the sum-of-squares for an SIR model fit to these data where:
### beta = 1.15, gamma = 0.02  
### beta = 1.7,  gamma = 0.45  
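
For the column-name check, base R's ```colnames()``` both reports and sets the names of a data frame. A small sketch on a made-up data frame follows; the names ```day```, ```cases```, ```time``` and ```number_infected``` are hypothetical, so substitute whatever your own data contain and your own function expects:

```r
# Toy data frame with made-up column names and values
toy_data <- data.frame(day = c(1, 2, 3), cases = c(3, 8, 26))

# Inspect the current column names
colnames(toy_data)

# Rename the columns to the (hypothetical) names a function might expect
colnames(toy_data) <- c("time", "number_infected")
colnames(toy_data)
```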


### Further work:

How would you modify your SSQ function so that it can pass any set of model equations to the ```ode()``` function?