---
title: "Spatial Leave One Out"
author: "David Leslie"
date: "Wednesday, March 25, 2015"
output: pdf_document
---
The purpose of this code is to evaluate the spatial leave-one-out (SLOO) method for assessing the predictive power of a given model. To accomplish this, the model is trained on the data with **one sample** removed; the held-out sample is then used as a new data point to be predicted by the fitted model. Let's first start by generating some spatially autocorrelated data.
```{r message=FALSE, warning=FALSE}
# Import statement(s)
library(RandomFields)
# Grid dimension (n x n cells)
n = 100
# Spatial range of the autocorrelation (in practice, estimated from the data)
spat_range = 5
# Define an exponential covariance model with spatial dependence
mod_spat_dep = RMexp(var=1, scale=spat_range)
# Simulate spatially autocorrelated predictor variables
x1 = RFsimulate(mod_spat_dep, x=1:n, y=1:n, grid=TRUE)
x2 = RFsimulate(mod_spat_dep, x=1:n, y=1:n, grid=TRUE)
x3 = RFsimulate(mod_spat_dep, x=1:n, y=1:n, grid=TRUE)
# Simulate a spatially autocorrelated error term
spat_err = RFsimulate(mod_spat_dep, x=1:n, y=1:n, grid=TRUE)
# Convert the simulation objects to vectors
spat_err = as.vector(spat_err)
x1 = as.vector(x1)
x2 = as.vector(x2)
x3 = as.vector(x3)
# Build the response from known coefficients (x1 = 2, x2 = 1, x3 = 3)
y = 2*x1 + x2 + 3*x3 + spat_err
# Create the cell coordinates
coords = cbind(rep(x=1:n, times=n), rep(x=1:n, each=n))
# Organize coordinates, predictors, and response in a data frame
dataset = data.frame(y, x1, x2, x3, coords)
head(dataset)
```
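The `RMexp` model used above has exponential covariance `C(d) = var * exp(-d/scale)`, so the correlation between two cells falls to about `exp(-1)` (roughly 0.37) at a separation equal to `spat_range`. A quick base-R check of that decay (a standalone sketch; the `exp_cov` helper is illustrative, not part of RandomFields):

```{r}
# Exponential covariance: var * exp(-d / scale)
exp_cov = function(d, var=1, scale=5) var * exp(-d / scale)
# Correlation at separations 0, 5 (= scale), and 25 (= 5 * scale):
# drops to ~0.37 at d = scale and is near zero by d = 5 * scale
exp_cov(c(0, 5, 25)) / exp_cov(0)
```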
Now that we have our spatially autocorrelated data, let's take a sample of it and verify that the fitted coefficients are reasonable estimates of the true model's coefficients (remembering that they should be close to the values used above: x1 = 2, x2 = 1, and x3 = 3).
```{r}
# Draw a sample of rows from the full grid
n_samp = 2000
samp = sample(nrow(dataset), size=n_samp)
# Fit the model on the sampled rows
mods = list()
mods$x123_mod = glm(y ~ x1 + x2 + x3, data = dataset[samp, ])
model = mods$x123_mod
summary(model)
```
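As a sanity check that the fitting step itself recovers known coefficients, the same `glm` on independent (non-spatial) data with the same true coefficients behaves as expected. This is a self-contained sketch; the `n_iid`, `d_iid`, and `fit_iid` names are illustrative:

```{r}
set.seed(42)
# Independent predictors and noise, same true coefficients as above
n_iid = 5000
d_iid = data.frame(x1=rnorm(n_iid), x2=rnorm(n_iid), x3=rnorm(n_iid))
d_iid$y = 2*d_iid$x1 + d_iid$x2 + 3*d_iid$x3 + rnorm(n_iid)
fit_iid = glm(y ~ x1 + x2 + x3, data=d_iid)
# Estimates should be close to (intercept = 0, x1 = 2, x2 = 1, x3 = 3)
round(coef(fit_iid), 2)
```

With spatially autocorrelated data, the point estimates remain close to the truth, but their naive standard errors are too optimistic, which is why a spatial cross-validation scheme is needed.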
Looking at the model summary, the estimated coefficients are fairly close to the true ones. Now that we feel confident about the sample taken, let's see how well the model predicts new data by applying the spatial leave-one-out method.
```{r warning=FALSE}
# Function(s)
get_training_rows = function(coords, dist_thres, longlat=FALSE) {
  ## Compute the distance matrix from the coordinates
  if (longlat) {
    require(sp)
    dist_matrix = spDists(coords, longlat=TRUE)
  } else {
    dist_matrix = as.matrix(dist(coords))
  }
  ## Initialize the list of row indices to be used in training
  training_rows = list()
  ## For each observation i, keep only the observations farther away
  ## than the threshold distance (the spatial buffer)
  for (i in 1:nrow(dist_matrix)) {
    training_rows[[i]] = which(dist_matrix[i, ] > dist_thres)
  }
  return(training_rows)
}

sloo_simple = function(model, training) {
  n_obs = nrow(model$data)
  intercept = numeric(n_obs)
  coef_x1 = numeric(n_obs)
  coef_x2 = numeric(n_obs)
  coef_x3 = numeric(n_obs)
  predicted = numeric(n_obs)
  observed = numeric(n_obs)
  logLik = numeric(n_obs)
  # Name of the response variable
  y = as.character(formula(model)[2])
  # Compute the SLOO log-likelihood for each observation i
  for (i in 1:n_obs) {
    # Extract the i-th training set (all points outside the buffer around i)
    training_data = model$data[training[[i]], ]
    # Refit the model on the i-th training set
    m = glm(formula=formula(model), data=training_data, family=model$family)
    intercept[i] = coef(m)[1]
    coef_x1[i] = coef(m)[2]
    coef_x2[i] = coef(m)[3]
    coef_x3[i] = coef(m)[4]
    # Predict the i-th observation from the i-th training set
    predicted[i] = predict(object=m, newdata=model$data[i, ], type="response")
    observed[i] = model$data[i, y]
    # Log-density of the i-th observed value under the i-th prediction,
    # using the ML estimate of the residual standard deviation
    logLik[i] = dnorm(x=observed[i], mean=predicted[i],
                      sd=sqrt(sum(residuals(m)^2)/nrow(training_data)),
                      log=TRUE)
  }
  finalMatrix = data.frame(intercept, x1=coef_x1, x2=coef_x2, x3=coef_x3,
                           predicted, observed, logLik)
  return(finalMatrix)
}

training = get_training_rows(coords[samp, ], dist_thres = spat_range)
# Evaluate the model's log-likelihood with the SLOO approach
sloo_ll = sloo_simple(model, training)
head(sloo_ll)
```
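To make the buffer concrete, here is a self-contained toy version of the thresholding step inside `get_training_rows`: on a small grid, the training set for each point excludes every point within `dist_thres` of it (including the point itself, at distance 0, which is the "leave one out" part). The `toy_` names are illustrative:

```{r}
# Toy 5 x 5 grid of coordinates
toy_coords = expand.grid(x=1:5, y=1:5)
toy_dist = as.matrix(dist(toy_coords))
dist_thres = 2
# Training rows for observation 1 (the corner cell at (1, 1))
train_1 = which(toy_dist[1, ] > dist_thres)
# Every retained point is farther than dist_thres from point 1;
# the smallest retained distance is sqrt(5), about 2.24
min(toy_dist[1, train_1])
```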