---
title: "Spatial Leave One Out"
author: "David Leslie"
date: "Wednesday, March 25, 2015"
output: pdf_document
---
The purpose of this code is to evaluate the spatial leave-one-out (SLOO) method for assessing the predictive power of a given model. To accomplish this, the model is trained on the data with **one sample** removed; the held-out sample is then used as a new data point to be predicted by the fitted model. Let's first start by generating some spatially autocorrelated data.
```{r message=FALSE, warning=FALSE}
# Import statement(s)
library(RandomFields)
# Grid dimension (n x n cells)
n = 100
# Spatial range of the autocorrelation (in practice, estimated from the data)
spat_range = 5
# Define an exponential covariance model with spatial dependence
mod_spat_dep = RMexp(var=1, scale=spat_range)
# Simulate spatially autocorrelated predictor variables
x1 = RFsimulate(mod_spat_dep, x=1:n, y=1:n, grid=TRUE)
x2 = RFsimulate(mod_spat_dep, x=1:n, y=1:n, grid=TRUE)
x3 = RFsimulate(mod_spat_dep, x=1:n, y=1:n, grid=TRUE)
# Simulate a spatially autocorrelated error term
spat_err = RFsimulate(mod_spat_dep, x=1:n, y=1:n, grid=TRUE)
# Convert the simulation objects to vectors
spat_err = as.vector(spat_err)
x1 = as.vector(x1)
x2 = as.vector(x2)
x3 = as.vector(x3)
# Build the response from known coefficients (x1 = 2, x2 = 1, x3 = 3)
y = 2*x1 + x2 + 3*x3 + spat_err
# Create the cell coordinates
coords = cbind(rep(x=1:n, times=n), rep(x=1:n, each=n))
# Organize coordinates, predictors, and response in a data frame
dataset = data.frame(y, x1, x2, x3, coords)
head(dataset)
```
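The `RMexp` model used above has exponential covariance `C(d) = var * exp(-d/scale)`, so the correlation between two cells falls to about `exp(-1)` (roughly 0.37) at a separation equal to `spat_range`. A quick base-R check of that decay (a standalone sketch; the `exp_cov` helper is illustrative, not part of RandomFields):

```{r}
# Exponential covariance: var * exp(-d / scale)
exp_cov = function(d, var=1, scale=5) var * exp(-d / scale)
# Correlation at separations 0, 5 (= scale), and 25 (= 5 * scale):
# drops to ~0.37 at d = scale and is near zero by d = 5 * scale
exp_cov(c(0, 5, 25)) / exp_cov(0)
```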
Now that we have our spatially autocorrelated data, let's take a sample of it and verify that the fitted coefficients are reasonable estimates of the true model's coefficients (remembering that they should be close to the values used above: x1 = 2, x2 = 1, and x3 = 3).
```{r}
# Draw a sample of rows from the full grid
n_samp = 2000
samp = sample(nrow(dataset), size=n_samp)
# Fit the model on the sampled rows
mods = list()
mods$x123_mod = glm(y ~ x1 + x2 + x3, data = dataset[samp, ])
model = mods$x123_mod
summary(model)
```
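As a sanity check that the fitting step itself recovers known coefficients, the same `glm` on independent (non-spatial) data with the same true coefficients behaves as expected. This is a self-contained sketch; the `n_iid`, `d_iid`, and `fit_iid` names are illustrative:

```{r}
set.seed(42)
# Independent predictors and noise, same true coefficients as above
n_iid = 5000
d_iid = data.frame(x1=rnorm(n_iid), x2=rnorm(n_iid), x3=rnorm(n_iid))
d_iid$y = 2*d_iid$x1 + d_iid$x2 + 3*d_iid$x3 + rnorm(n_iid)
fit_iid = glm(y ~ x1 + x2 + x3, data=d_iid)
# Estimates should be close to (intercept = 0, x1 = 2, x2 = 1, x3 = 3)
round(coef(fit_iid), 2)
```

With spatially autocorrelated data, the point estimates remain close to the truth, but their naive standard errors are too optimistic, which is why a spatial cross-validation scheme is needed.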
Looking at the model summary, the estimated coefficients are fairly close to the true ones. Now that we feel confident about the sample taken, let's see how well the model predicts new data by applying the spatial leave-one-out method.
```{r warning=FALSE}
# Function(s)
get_training_rows = function(coords, dist_thres, longlat=FALSE) {
  ## Compute the distance matrix from the coordinates
  if (longlat) {
    require(sp)
    dist_matrix = spDists(coords, longlat=TRUE)
  } else {
    dist_matrix = as.matrix(dist(coords))
  }
  ## Initialize the list of row indices to be used in training
  training_rows = list()
  ## For each observation i, keep only the observations farther away
  ## than the threshold distance (the spatial buffer)
  for (i in 1:nrow(dist_matrix)) {
    training_rows[[i]] = which(dist_matrix[i, ] > dist_thres)
  }
  return(training_rows)
}

sloo_simple = function(model, training) {
  n_obs = nrow(model$data)
  intercept = numeric(n_obs)
  coef_x1 = numeric(n_obs)
  coef_x2 = numeric(n_obs)
  coef_x3 = numeric(n_obs)
  predicted = numeric(n_obs)
  observed = numeric(n_obs)
  logLik = numeric(n_obs)
  # Name of the response variable
  y = as.character(formula(model)[2])
  # Compute the SLOO log-likelihood for each observation i
  for (i in 1:n_obs) {
    # Extract the i-th training set (all points outside the buffer around i)
    training_data = model$data[training[[i]], ]
    # Refit the model on the i-th training set
    m = glm(formula=formula(model), data=training_data, family=model$family)
    intercept[i] = coef(m)[1]
    coef_x1[i] = coef(m)[2]
    coef_x2[i] = coef(m)[3]
    coef_x3[i] = coef(m)[4]
    # Predict the i-th observation from the i-th training set
    predicted[i] = predict(object=m, newdata=model$data[i, ], type="response")
    observed[i] = model$data[i, y]
    # Log-density of the i-th observed value under the i-th prediction,
    # using the ML estimate of the residual standard deviation
    logLik[i] = dnorm(x=observed[i], mean=predicted[i],
                      sd=sqrt(sum(residuals(m)^2)/nrow(training_data)),
                      log=TRUE)
  }
  finalMatrix = data.frame(intercept, x1=coef_x1, x2=coef_x2, x3=coef_x3,
                           predicted, observed, logLik)
  return(finalMatrix)
}

training = get_training_rows(coords[samp, ], dist_thres = spat_range)
# Evaluate the model's log-likelihood with the SLOO approach
sloo_ll = sloo_simple(model, training)
head(sloo_ll)
```
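To make the buffer concrete, here is a self-contained toy version of the thresholding step inside `get_training_rows`: on a small grid, the training set for each point excludes every point within `dist_thres` of it (including the point itself, at distance 0, which is the "leave one out" part). The `toy_` names are illustrative:

```{r}
# Toy 5 x 5 grid of coordinates
toy_coords = expand.grid(x=1:5, y=1:5)
toy_dist = as.matrix(dist(toy_coords))
dist_thres = 2
# Training rows for observation 1 (the corner cell at (1, 1))
train_1 = which(toy_dist[1, ] > dist_thres)
# Every retained point is farther than dist_thres from point 1;
# the smallest retained distance is sqrt(5), about 2.24
min(toy_dist[1, train_1])
```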