# Spatial autocorrelation

(material taken from https://mgimond.github.io/Spatial/)


Suppose we have a collection of points on the earth's surface, and each point has a (numerical) feature value. We may ask the question: 

**Is the distribution of feature values random, or is there spatial structure?**

Technically, this amounts to asking whether the **spatial autocorrelation** is 0.


### Temporal autocorrelation

<img src=_img/Acf_new.svg width=500>
(image from Wikipedia)

<img src=_img/Random_maps.png>

**Moran's I** is a statistic used to test for spatial autocorrelation.

Let $N$ be the number of points and $w_{ij}$ the *weight* (strength of influence) between points $i$ and $j$. $w_{ii} =0$ for all $i$. 

$$
\begin{aligned}
I &= \frac{N}{W} \frac{\sum_{i,j} w_{ij}(x_i-\overline{x})(x_j-\overline{x})}{\sum_i(x_i-\overline{x})^2}\\
W &= \sum_{ij} w_{ij}\\
\overline x &= \frac{\sum_i x_i}{N}
\end{aligned}
$$

$I$ typically ranges from $-1$ to $1$. In the absence of spatial autocorrelation, its expected value is
$$
\frac{-1}{N-1}
$$
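
To make the formula concrete, here is a minimal base-R sketch that evaluates $I$ term by term on a hypothetical toy data set: four points in a chain (1 - 2 - 3 - 4) with binary weights. The names `x_toy` and `w_mat` are made up for illustration and are not part of the data used below.

```r
# Hypothetical toy data: four points in a chain 1 - 2 - 3 - 4
x_toy <- c(1, 2, 3, 4)

# Binary weights: 1 for adjacent points, 0 otherwise (diagonal stays 0)
w_mat <- matrix(0, nrow = 4, ncol = 4)
w_mat[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1

N <- length(x_toy)
W <- sum(w_mat)               # sum of all weights
d <- x_toy - mean(x_toy)      # deviations from the mean

# Moran's I: (N / W) * sum_ij w_ij d_i d_j / sum_i d_i^2
I_toy <- (N / W) * sum(w_mat * outer(d, d)) / sum(d^2)
I_toy                         # 1/3: neighbouring values are similar

-1 / (N - 1)                  # expected value without autocorrelation
```

Because the toy values increase steadily along the chain, $I$ lies above its expected value of $-1/3$.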

In [None]:
library(sf)
library(tidyverse)
library(tmap)
library(spdep)

In [None]:
load(url("https://github.com/mgimond/Spatial/raw/main/Data/moransI.RData"))


In [None]:
st_as_sf(s1)

In [None]:
library(tmap)
tm_shape(s1) + tm_polygons(style="quantile", col = "Income") +
  tm_legend(outside = TRUE, text.size = .8) 

First we build the neighborhood structure: for each polygon, the list of polygons adjacent to it.

In [None]:
nb <- poly2nb(s1, queen=TRUE)
nb
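
With `queen=TRUE`, polygons that share even a single boundary point count as neighbors, whereas rook contiguity requires more than one shared point (a common edge). As a rough illustration on a hypothetical 3 x 3 grid of cells (not the county data), the base-R sketch below counts the neighbors of the centre cell under both rules.

```r
# Hypothetical 3 x 3 grid of cells, identified by row and column index
grid <- expand.grid(row = 1:3, col = 1:3)

# Absolute row and column distances between every pair of cells
dr <- abs(outer(grid$row, grid$row, "-"))
dc <- abs(outer(grid$col, grid$col, "-"))

queen <- pmax(dr, dc) == 1   # touching at an edge or a corner
rook  <- (dr + dc) == 1      # sharing an edge only

centre <- which(grid$row == 2 & grid$col == 2)
sum(queen[centre, ])         # 8 neighbours under the queen rule
sum(rook[centre, ])          # 4 neighbours under the rook rule
```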

In [None]:
nb[[1]]


In [None]:
s1$NAME[1]

In [None]:
s1$NAME[c(2,3,4,5)]

Next we assign weights to the edges of the neighborhood graph. For simplicity's sake, we assume equal weight for each neighbor.

In [None]:
lw <- nb2listw(nb, style="W", zero.policy=TRUE)

In [None]:
lw$weights[1]
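
The `style="W"` option row-standardizes the weights: a region with $k$ neighbors assigns weight $1/k$ to each of them, so every row of weights sums to 1. A minimal base-R sketch of that idea, using a hypothetical neighbor list `nb_toy` for four regions in a chain, could look like this:

```r
# Hypothetical neighbour list for four regions in a chain 1 - 2 - 3 - 4
nb_toy <- list(2L, c(1L, 3L), c(2L, 4L), 3L)

# Row-standardised weights: each of a region's k neighbours gets weight 1/k
w_list <- lapply(nb_toy, function(idx) rep(1 / length(idx), length(idx)))

w_list[[2]]           # two neighbours, so each weight is 0.5
sapply(w_list, sum)   # every set of weights sums to 1
```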

Then we compute the weighted average of the incomes of neighboring counties for each county.

In [None]:
Inc.lag <- lag.listw(lw, s1$Income)

In [None]:
Inc.lag
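
With equal weights per neighbor, the spatial lag of a region is simply the mean of its neighbors' values. The following base-R sketch reproduces that computation by hand on hypothetical toy values (`nb_toy` and `inc_toy` are illustrative names, not the county data):

```r
# Hypothetical neighbour list for four regions in a chain 1 - 2 - 3 - 4
nb_toy  <- list(2L, c(1L, 3L), c(2L, 4L), 3L)
inc_toy <- c(10, 20, 40, 80)

# Spatial lag: the mean of the neighbouring values for each region
lag_toy <- sapply(nb_toy, function(idx) mean(inc_toy[idx]))
lag_toy               # 20 25 50 40
```

For example, region 2 has neighbors 1 and 3, so its lag is the mean of 10 and 40.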

In [None]:
st_as_sf(s1) %>%
    st_drop_geometry() %>%
    select(NAME, Income) %>%
    mutate(Inc.lag = Inc.lag)

Doing some exploratory data analysis:

In [None]:
st_as_sf(s1) %>%
    st_drop_geometry() %>%
    select(NAME, Income) %>%
    mutate(Inc.lag = Inc.lag) %>%
    ggplot() +
    geom_point(aes(x=Income, y=Inc.lag)) +
    geom_smooth(aes(x=Income, y=Inc.lag), method=lm)

Because the weights are row-standardized (`style="W"`), the slope of the regression line equals Moran's I.

In [None]:
M <- lm(Inc.lag ~ s1$Income)
coef(M)[2]
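
The equality between the regression slope and Moran's I relies on each row of weights summing to 1, so that $W = N$. A small base-R check on hypothetical toy values (the names `inc_toy` and `w_std` are illustrative) compares the slope with the formula directly:

```r
# Hypothetical toy data: four regions in a chain 1 - 2 - 3 - 4
inc_toy <- c(10, 20, 40, 80)
w_std <- matrix(0, nrow = 4, ncol = 4)
w_std[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1
w_std <- w_std / rowSums(w_std)          # row-standardise the weights

# Slope of the regression of the lagged values on the original values
lag_toy   <- as.vector(w_std %*% inc_toy)
slope_toy <- unname(coef(lm(lag_toy ~ inc_toy))[2])

# Moran's I computed directly from its definition
d <- inc_toy - mean(inc_toy)
I_formula <- (length(inc_toy) / sum(w_std)) *
  sum(w_std * outer(d, d)) / sum(d^2)

all.equal(slope_toy, I_formula)          # TRUE
```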

To assess significance, we can use a random permutation test.

In [None]:
n <- 599L   # Define the number of simulations
I.r <- numeric(n)  # Pre-allocate a numeric vector for the simulated slopes

for (i in 1:n){
  # Randomly shuffle income values
  x <- sample(s1$Income, replace=FALSE)
  # Compute new set of lagged values
  x.lag <- lag.listw(lw, x)
  # Compute the regression slope and store its value
  M.r    <- lm(x.lag ~ x)
  I.r[i] <- coef(M.r)[2]
}

In [None]:
data.frame(I.r = I.r) %>%
    ggplot() +
    geom_histogram(aes(x=I.r)) +
    geom_vline(xintercept=coef(M)[2], col='red')

Pseudo-$p$-value:

In [None]:
mean(I.r > coef(M)[2])
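
The plain proportion above can come out as exactly 0 when no permuted value exceeds the observed one. A common convention, sketched here in base R on hypothetical toy data, counts the observed statistic as one of the permutations and reports $(k+1)/(n+1)$, where $k$ is the number of permuted values at least as large as the observed one. The names `moran_i`, `x_toy`, `w_std`, and `n_sim` are made up for this illustration.

```r
set.seed(42)  # make the sketch reproducible

# Hypothetical toy data: 20 points in a chain with a steady trend
x_toy <- 1:20
k <- length(x_toy)
w_std <- matrix(0, k, k)
w_std[cbind(1:(k - 1), 2:k)] <- 1   # link each point to its right neighbour
w_std <- w_std + t(w_std)           # make the links symmetric
w_std <- w_std / rowSums(w_std)     # row-standardise

# Moran's I for row-standardised weights (N / W equals 1)
moran_i <- function(x, w) {
  d <- x - mean(x)
  sum(w * outer(d, d)) / sum(d^2)
}

I_obs  <- moran_i(x_toy, w_std)
n_sim  <- 199L
I_perm <- replicate(n_sim, moran_i(sample(x_toy), w_std))

# One-sided pseudo p-value with the +1 correction
p_pseudo <- (sum(I_perm >= I_obs) + 1) / (n_sim + 1)
p_pseudo
```

With this correction the smallest attainable pseudo-$p$-value is $1/(n+1)$, which reflects the finite number of permutations.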

The Moran test implemented in `moran.test` does not use simulations; it computes the $p$-value analytically.

In [None]:
moran.test(s1$Income,lw)

There is also a version of Moran's I test using simulations:


In [None]:
MC <- moran.mc(s1$Income, lw, nsim=599)

# View results (including p-value)
MC

In [None]:
# Plot the distribution (note that this is a density plot instead of a histogram)
plot(MC, main="", las=1)