# Spatial Autocorrelation
<img src="images/spatial_auto.png" width=700 />

Autocorrelation is a measure of similarity (correlation) between nearby observations.

**The first law of geography**: “Everything is related to everything else, but near things are more related than distant things.” Waldo R. Tobler (Tobler 1970)

The idea is to investigate whether spatial objects with similar values are clustered, randomly distributed, or dispersed. But why is autocorrelation important? Many statistical methods rely on observations being independent of one another. If autocorrelation exists in time or space, this assumption of independence is violated. On the other hand, it also suggests that there is structure in the data distribution that may be worth investigating.

## Spatial Autocorrelation

Spatial autocorrelation describes the degree to which one object is similar to other nearby objects and is an extension of temporal autocorrelation.

The idea is that where adjacent observations have similar data values the map shows positive spatial autocorrelation. Where adjacent observations tend to have very contrasting values then the map shows negative spatial autocorrelation. There are several statistical techniques for detecting its presence. 

In contrast to temporal autocorrelation, which has only one dimension, spatial autocorrelation is a little more complicated because it has at least two dimensions.

The presence of spatial autocorrelation is interesting for spatial analysis because it can help us, for example, to investigate and understand associations between different features in our data (e.g. land cover and land surface, health care and survival ...). The presence of spatial autocorrelation also implies information redundancy and has important implications for the methodology of spatial data analysis.

<img src="images/correlation.png" width=500 />

In [None]:
from libpysal.weights import lat2W, lag_spatial
from spreg import OLS
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import inv

In [None]:
def draw_map(lamb):
    s = 20
    n = s**2
    w = lat2W(s, s, rook=False)
    w.transform = 'R'
    e = np.random.random((n, 1))
    u = inv(np.eye(n) - lamb * w.full()[0])
    u = np.dot(u, e)
    ul = lag_spatial(w, u)
    u = (u - u.mean()) / np.std(u)
    ul = (ul - ul.mean()) / np.std(ul)
    gu = u.reshape((s, s))
    # Figure
    f = plt.figure(figsize=(9, 4))
    ax1 = f.add_subplot(121)
    ax1.matshow(gu, cmap=plt.cm.YlGn)
    ax1.set_frame_on(False)
    ax1.axes.get_xaxis().set_visible(False)
    ax1.axes.get_yaxis().set_visible(False)
    #---
    ax2 = f.add_subplot(122)
    sc = ax2.scatter(u, ul, linewidth=0)
    ols = OLS(ul, u)
    tag = "b = %.3f"%ols.betas[1][0]
    ax2.plot(u, ols.predy, c='red', label=tag)
    ax2.axvline(0, c='0.5')
    ax2.axhline(0, c='0.5')
    ax2.legend()
    plt.xlabel('u')
    plt.ylabel('Wu')
    plt.suptitle(r"$\lambda$ = %.2f" % lamb)  # raw string avoids the invalid "\l" escape
    plt.show()

In [None]:
%matplotlib inline

In [None]:
draw_map(0.95)

Let’s say we are interested in spatial autocorrelation of the Plasmodium falciparum parasite rate (PfPR)  in the different departments of Burkina Faso. If there were spatial autocorrelation, regions of a similar PfPR would be spatially clustered.

In [None]:
import geopandas as gpd

In [None]:
bfa = gpd.read_file('Data/vector/burkina/bfa.shp')
bfa

In [None]:
fig, ax = plt.subplots(figsize=(12,10), subplot_kw={'aspect':'equal'})
bfa.plot(column='_pfprmean', scheme='Quantiles', k=5, cmap='GnBu', legend=True, ax=ax)

## Spatial weights

Spatial weights are mathematical structures used to represent spatial relationships.

A spatial weight $w_{i,j}$ expresses the notion of a geographical relationship between locations $i$ and $j$. 

These relationships can be based on a number of criteria including contiguity, geospatial distance and general distances.

**Contiguity weights**

These weights are symmetric, in that when polygon $A$ neighbors polygon $B$, both $w_{AB} = 1$ and $w_{BA} = 1$.

<img src="images/rook_queen.png" />

- **rook criterion**: spatial units are neighbors when they share a common edge 
- **queen criterion**: defines neighbors as spatial units sharing at least a common vertex
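
The difference between the two criteria is easiest to see on a regular grid. The following is a minimal sketch in plain Python, not PySAL's implementation; the helper `lattice_neighbors` is a hypothetical name introduced only for this illustration, and cells are assumed to be numbered row by row starting at 0.

```python
def lattice_neighbors(n_rows, n_cols, criterion="rook"):
    """Return {cell_id: sorted neighbor ids} for a regular grid.

    Hypothetical helper for illustration; cells are numbered row by row.
    """
    if criterion not in ("rook", "queen"):
        raise ValueError("criterion must be 'rook' or 'queen'")
    # Rook: shared edge only. Queen: shared edge or shared corner.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if criterion == "queen":
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            found = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    found.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = sorted(found)
    return neighbors


rook_grid = lattice_neighbors(3, 3, "rook")
queen_grid = lattice_neighbors(3, 3, "queen")
print(rook_grid[4])   # centre cell: edge neighbors only
print(queen_grid[4])  # centre cell: edge and corner neighbors
```

On this 3 x 3 grid the centre cell has four rook neighbors and eight queen neighbors, and in both cases the relation is symmetric.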

In [None]:
import libpysal as lps
gdf = bfa
wq =  lps.weights.Queen.from_dataframe(gdf)
wq

To get the neighbors & weights around an observation, use the observation's index on the weights object, like a dictionary:

In [None]:
wq[4]

In [None]:
self_and_neighbors = [4]
self_and_neighbors.extend(wq.neighbors[4])
print(self_and_neighbors)

In [None]:
neigbours = gdf.loc[self_and_neighbors][1:]
neigbours

In [None]:
fig, ax = plt.subplots(figsize = (10,10)) 
gdf.loc[self_and_neighbors].plot(color='red',ax=ax)
neigbours.plot(ax=ax)

In [None]:
from splot.libpysal import plot_spatial_weights

plot_spatial_weights(wq, gdf)
plt.show()

**Distance**

There are many other kinds of weighting functions in PySAL. Another type uses a continuous measure of distance to define neighborhoods.
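
One distance-based rule is to take the k nearest neighbors of each observation. The following is a rough sketch of that idea in plain Python, assuming Euclidean distance between points; `knn_neighbors` and the toy coordinates are hypothetical and not part of PySAL.

```python
import math


def knn_neighbors(points, k):
    """Return {i: indices of the k nearest other points} (Euclidean distance)."""
    neighbors = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, nearest first
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors[i] = [j for _, j in others[:k]]
    return neighbors


# Hypothetical centroid coordinates, for illustration only
toy_points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
toy_knn = knn_neighbors(toy_points, k=1)
print(toy_knn)
```

Unlike contiguity weights, nearest-neighbor relations are not necessarily symmetric: here point 2 has point 1 as its nearest neighbor, but the nearest neighbor of point 1 is point 0.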

In [None]:
def getXY(pt):
    return (pt.x, pt.y)
centroidseries = gdf['geometry'].centroid
x,y = [list(t) for t in zip(*map(getXY, centroidseries))]
plt.plot(x,y,'.')

In [None]:
data=np.column_stack((x, y))
kd = lps.cg.KDTree(data)
wnn2 = lps.weights.KNN(kd, 2)

In [None]:
plot_spatial_weights(wnn2, gdf)
plt.show()

**Kernel weights**

Kernel Weights are continuous distance-based weights that use kernel densities to define the neighbor relationship.
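
To make the idea concrete, here is a small sketch of kernel weights with a triangular kernel and a fixed bandwidth, where the weight decays linearly from 1 at distance 0 to 0 at the bandwidth. The function name, the bandwidth value and the coordinates are assumptions chosen for illustration; PySAL's own defaults and bandwidth selection may differ.

```python
import math


def triangular_kernel_weights(points, bandwidth):
    """Return {i: {j: weight}} with weight = 1 - d_ij / bandwidth for d_ij < bandwidth."""
    weights = {}
    for i, p in enumerate(points):
        row = {}
        for j, q in enumerate(points):
            z = math.dist(p, q) / bandwidth  # distance scaled by the bandwidth
            if z < 1:
                row[j] = 1 - z  # closer points get larger weights
        weights[i] = row
    return weights


# Hypothetical coordinates and bandwidth, for illustration only
toy_coords = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
toy_kernel = triangular_kernel_weights(toy_coords, bandwidth=2.0)
print(toy_kernel)
```

In this sketch each observation receives weight 1 with respect to itself, and points further apart than the bandwidth are not neighbors at all.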

In [None]:
kw=lps.weights.Kernel(data)
plot_spatial_weights(kw, gdf)
plt.show()

**Similarity**

Once the data and the spatial weights matrix are ready, we can compute the spatial lag. The spatial weight between two regions indicates whether they are neighbors (i.e., geographically close). What we also need is a measure of similarity between the attribute values we want to investigate. The spatial lag provides this: it summarizes the values observed in the neighborhood of each region. With row-standardized weights, it is the average of the neighbors' values.

For region $i$ the spatial lag is defined as: $$ylag_i = \sum_j w_{i,j} y_j$$
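
The definition can be checked by hand on a tiny example. The sketch below uses plain dictionaries rather than a PySAL weights object; the helper names and the four-region chain are hypothetical and only serve to illustrate the formula with row-standardized weights.

```python
def row_standardize(neighbors):
    """Give each neighbor of i the weight 1 / (number of neighbors of i)."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items()}


def spatial_lag(weights, values):
    """Compute ylag_i = sum_j w_ij * y_j for every region i."""
    return {
        i: sum(w_ij * values[j] for j, w_ij in row.items())
        for i, row in weights.items()
    }


# Hypothetical example: four regions in a chain 0 - 1 - 2 - 3
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
toy_values = {0: 10.0, 1: 20.0, 2: 40.0, 3: 80.0}

toy_lag = spatial_lag(row_standardize(toy_neighbors), toy_values)
print(toy_lag)
```

Region 1, for example, has the neighbors 0 and 2, so its spatial lag is the mean of 10 and 40.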

In [None]:
import mapclassify as mc
df = gdf
wq =  lps.weights.Queen.from_dataframe(df)
wq.transform = 'r'

In [None]:
y = df['_pfprmean']
ylag = lps.weights.lag_spatial(wq, y)

In [None]:
ylagq5 = mc.Quantiles(ylag, k=5)

In [None]:
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=ylagq5.yb).plot(column='cl', categorical=True, \
        k=5, cmap='GnBu', linewidth=0.1, ax=ax, \
        edgecolor='white', legend=True)
ax.set_axis_off()
plt.title("Spatial Lag pfprmean (Quantiles)")

plt.show()



In [None]:
df['lag_pfprmean'] = ylag
f,ax = plt.subplots(1,2,figsize=(2.16*5,5))
df.plot(column='_pfprmean', ax=ax[0], edgecolor='k',
        scheme="quantiles",  k=5, cmap='GnBu')
ax[0].axis(df.total_bounds[np.asarray([0,2,1,3])])
ax[0].set_title("pfprmean")
df.plot(column='lag_pfprmean', ax=ax[1], edgecolor='k',
        scheme='quantiles', cmap='GnBu', k=5)
ax[1].axis(df.total_bounds[np.asarray([0,2,1,3])])
ax[1].set_title("Spatial Lag pfprmean")
ax[0].axis('off')
ax[1].axis('off')
plt.show()

## Global Spatial Autocorrelation

To complement the geovisualization of these associations we can turn to formal statistical measures of spatial autocorrelation. Let's start simple and think of the problem as a binary case (high and low values).

In [None]:
y.median()
yb = y > y.median()
sum(yb)

In [None]:
yb = y > y.median()
labels = ["0 Low", "1 High"]
yb = [labels[i] for i in 1*yb] 
df['yb'] = yb

In [None]:
fig, ax = plt.subplots(figsize=(12,10), subplot_kw={'aspect':'equal'})
df.plot(column='yb', cmap='binary', edgecolor='grey', legend=True, ax=ax)

In the next step we can look at so-called join counts. A join exists for each neighbor pair of observations, and the joins are reflected in our binary spatial weights object `wq`. If we pair each region with its neighbors, we get three different types of joins:

- Low Low (white white)
- High High (black black)
- High Low (black white)
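
Counting the three join types is simple once the neighbor pairs are known. The following sketch counts them on a small hypothetical 2 x 3 grid with rook neighbors; the function and the data are illustrative assumptions, not the `esda` implementation.

```python
def join_counts(neighbors, is_high):
    """Count High-High (BB), Low-Low (WW) and High-Low (BW) joins."""
    bb = ww = bw = 0
    for i, js in neighbors.items():
        for j in js:
            if i >= j:
                continue  # count every neighbor pair only once
            if is_high[i] and is_high[j]:
                bb += 1
            elif not is_high[i] and not is_high[j]:
                ww += 1
            else:
                bw += 1
    return bb, ww, bw


# Hypothetical 2 x 3 grid with rook neighbors; cells are numbered
#   0 1 2
#   3 4 5
toy_joins = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
toy_is_high = {0: True, 1: True, 2: False, 3: True, 4: False, 5: False}

toy_bb, toy_ww, toy_bw = join_counts(toy_joins, toy_is_high)
print(toy_bb, toy_ww, toy_bw)
```

The three counts always add up to the total number of joins, which is seven on this grid.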

In [None]:
import esda 

yb = 1 * (y > y.median()) # convert back to binary
wq =  lps.weights.Queen.from_dataframe(df)
wq.transform = 'b'
np.random.seed(12345)
jc = esda.join_counts.Join_Counts(yb, wq)



In [None]:
jc.bb
jc.ww
jc.bw

But what can we do with this result? We want to know whether this pattern shows spatial autocorrelation. To find out, we ask whether we would expect the same pattern if the process generating it were completely random.

PySAL uses random spatial permutations of the observed attribute values to generate a realization under the null of complete spatial randomness (CSR). This is repeated a large number of times (999 by default) to construct a reference distribution to evaluate the statistical significance of our observed counts.
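
The permutation idea can be sketched in a few lines of plain Python. The example below is an illustration under stated assumptions, not the `esda` code: it uses a hypothetical 4 x 4 grid with rook neighbors, shuffles the observed values over the cells, and computes a pseudo p-value with the common convention (number of simulated counts at least as large as the observed one, plus 1) divided by (number of permutations plus 1).

```python
import random


def rook_lattice(n_rows, n_cols):
    """Rook neighbors for a regular grid, cells numbered row by row."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[r * n_cols + c] = [
                rr * n_cols + cc
                for rr, cc in candidates
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbors


def count_bb(neighbors, is_high):
    """Number of joins whose two ends are both high (each pair counted once)."""
    return sum(
        1
        for i, js in neighbors.items()
        for j in js
        if i < j and is_high[i] and is_high[j]
    )


def bb_permutation_test(neighbors, is_high, permutations=999, seed=12345):
    """Return the observed BB count, the simulated counts and a pseudo p-value."""
    rng = random.Random(seed)
    ids = list(neighbors)
    values = [is_high[i] for i in ids]
    observed = count_bb(neighbors, is_high)
    simulated = []
    for _ in range(permutations):
        rng.shuffle(values)  # reassign the same values to random locations
        simulated.append(count_bb(neighbors, dict(zip(ids, values))))
    at_least_as_large = sum(1 for s in simulated if s >= observed)
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, simulated, p_value


# Hypothetical 4 x 4 map: the top two rows are "high", the rest "low"
toy_w = rook_lattice(4, 4)
toy_high = {cell: cell < 8 for cell in toy_w}
bb_obs, bb_sim, bb_p = bb_permutation_test(toy_w, toy_high)
print(bb_obs, sum(bb_sim) / len(bb_sim), bb_p)
```

Because the high values form one compact block, the observed BB count lies far in the upper tail of the simulated counts, which is what a small pseudo p-value expresses.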

In [None]:
jc.sim_bb

In [None]:
jc.mean_bb

In [None]:
import seaborn as sbn
sbn.kdeplot(jc.sim_bb, fill=True)  # `shade` is deprecated in recent seaborn versions
plt.vlines(jc.bb, 0, 0.075, color='r')
plt.vlines(jc.mean_bb, 0,0.075)
plt.xlabel('BB Counts')

The density plot shows the distribution of the BB counts, with the black vertical line indicating the mean BB count from the synthetic realizations and the red line the observed BB count. Clearly our observed value is extremely high relative to this reference distribution. Since the corresponding pseudo p-value (`jc.p_sim_bb`) is below conventional significance levels, we reject the null of complete spatial randomness in favor of spatial autocorrelation.

### Moran's I

Another way to investigate spatial autocorrelation is Moran's I, a test for global autocorrelation for a continuous attribute.

$$I = \frac{n}{\sum_{i=1}^n (y_i - \bar{y})^2} \frac{\sum_{i=1}^n \sum_{j=1}^n w_{ij}(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^n \sum_{j=1}^n w_{ij}}$$

- $n$ = number of observations
- $y$ = the variable of interest
- $\bar{y}$ = the mean value of $y$
- $w_{ij}$ = the spatial weight between observations $i$ and $j$
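
The formula can be evaluated directly on a small example. The sketch below is a plain-Python illustration, assuming binary weights ($w_{ij} = 1$ for neighbors, 0 otherwise) on a hypothetical 4 x 4 grid with rook neighbors; the helper names are made up for this example and it is not the `esda` implementation.

```python
def rook_lattice(n_rows, n_cols):
    """Rook neighbors for a regular grid, cells numbered row by row."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[r * n_cols + c] = [
                rr * n_cols + cc
                for rr, cc in candidates
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbors


def morans_i(neighbors, values):
    """Moran's I with binary weights: w_ij = 1 if j is a neighbor of i."""
    n = len(values)
    mean = sum(values.values()) / n
    z = {i: v - mean for i, v in values.items()}  # deviations from the mean
    s0 = sum(len(js) for js in neighbors.values())  # sum of all w_ij
    cross = sum(z[i] * z[j] for i, js in neighbors.items() for j in js)
    ss = sum(d * d for d in z.values())  # sum of squared deviations
    return (n / ss) * (cross / s0)


grid = rook_lattice(4, 4)
# Clustered map: left two columns are 1, right two columns are 0
clustered = {cell: 1.0 if cell % 4 < 2 else 0.0 for cell in grid}
# Checkerboard map: every cell differs from all of its rook neighbors
checkerboard = {cell: float((cell // 4 + cell % 4) % 2) for cell in grid}

i_clustered = morans_i(grid, clustered)
i_checkerboard = morans_i(grid, checkerboard)
print(i_clustered, i_checkerboard)
```

The clustered map gives a positive value and the checkerboard a negative one, matching the notions of positive and negative spatial autocorrelation introduced earlier.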

First, we transform our weights to be row-standardized, from the current binary state:

In [None]:
wq.transform = 'r'
y = df['_pfprmean']

In [None]:
from esda.moran import Moran

moran = Moran(y, wq)
moran.I

In [None]:
from splot.esda import moran_scatterplot
fig, ax = moran_scatterplot(moran, aspect_equal=True)
plt.show()

In [None]:
from splot.esda import plot_moran

plot_moran(moran, zstandard=True, figsize=(10,4))
plt.show()



In [None]:
moran.p_sim

In [None]:
y = df['_pfprmean']

## Local Autocorrelation: Hot Spots, Cold Spots, and Spatial Outliers

We can also look at the local autocorrelation, which enables us to detect hot spots, cold spots, and spatial outliers.

In [None]:
from splot.esda import moran_scatterplot
from esda.moran import Moran_Local

# calculate Moran_Local and plot
moran_loc = Moran_Local(y, wq)
fig, ax = moran_scatterplot(moran_loc)
ax.set_ylabel('Spatial Lag')
plt.show()

In [None]:
fig, ax = moran_scatterplot(moran_loc, p=0.05)
ax.set_ylabel('Spatial Lag')
plt.show()





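We can now distinguish between different types of local spatial autocorrelation.

These types describe similarities or dissimilarities between a specific polygon and its neighboring polygons. The upper right quadrant contains polygons with high values surrounded by neighbors with high values (hot spots), and the lower left quadrant contains polygons with low values surrounded by neighbors with low values (cold spots). Both indicate an association of similar values. The upper left quadrant, in contrast, indicates that polygons with low values are surrounded by polygons with high values. The lower right quadrant shows polygons with high values surrounded by neighbors with low values. These two quadrants indicate an association of dissimilar values (spatial outliers).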
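
The quadrant of each observation follows from two signs: the sign of its own deviation from the mean and the sign of the spatial lag of its neighbors' deviations. The sketch below illustrates this classification in plain Python on a hypothetical chain of six regions; the helper name and the data are assumptions, and the sketch does not assess statistical significance the way `Moran_Local` does.

```python
def moran_quadrants(neighbors, values):
    """Label each region HH, LH, LL or HL from its deviation and its spatial lag."""
    mean = sum(values.values()) / len(values)
    z = {i: v - mean for i, v in values.items()}
    labels = {}
    for i, js in neighbors.items():
        # Row-standardized spatial lag: mean deviation of the neighbors
        lag = sum(z[j] for j in js) / len(js)
        if z[i] > 0:
            labels[i] = "HH" if lag > 0 else "HL"
        else:
            labels[i] = "LH" if lag > 0 else "LL"
    return labels


# Hypothetical chain of six regions: 0 - 1 - 2 - 3 - 4 - 5
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
chain_values = {0: 10.0, 1: 8.0, 2: 1.0, 3: 9.0, 4: 2.0, 5: 0.0}

quadrants = moran_quadrants(chain, chain_values)
print(quadrants)
```

Region 2 is a low value between two high values (LH, upper left quadrant), while region 3 is a high value between two low values (HL, lower right quadrant).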

In [None]:
from splot.esda import lisa_cluster

lisa_cluster(moran_loc, gdf, p=0.05)
plt.show()

In [None]:
from splot.esda import plot_local_autocorrelation

plot_local_autocorrelation(moran_loc, gdf, '_pfprmean', figsize=(30,15))
plt.show()



# Literature

Bivand, Roger S., et al. Applied Spatial Data Analysis with R. New York: Springer, 2008.

https://geocompr.robinlovelace.net/spatial-class.html

https://pysal.org/libpysal/index.html

https://github.com/pysal/splot/blob/master/notebooks/esda_morans_viz.ipynb

http://darribas.org/gds_scipy16/ipynb_md/04_esda.html

https://github.com/pysal/esda/blob/master/notebooks/Spatial%20Autocorrelation%20for%20Areal%20Unit%20Data.ipynb

https://splot.readthedocs.io/en/stable/users/tutorials/weights.html#distance-band-weights