# 3. Exploratory Spatial Data Analysis (ESDA)
## 3.1 Python Spatial Analysis Library (PySAL)

#### 3.1.1 Overview
[PySAL](http://pysal.readthedocs.io/en/latest/) is a Python library that lets users incorporate a range of spatial analytical methods into their applications, including:
* Creating spatial weights matrices
* Assessing spatial autocorrelation
* Spatial econometric modeling

#### 3.1.2 Installation and Packaging

`conda install pysal`

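`pip install -U pysal`

The examples in this section use the legacy PySAL 1.x API (`pysal.open`, `pysal.rook_from_shapefile`, `pysal.Moran`, and so on). Starting with PySAL 2.0 these functions were reorganized into subpackages, so with a current release the equivalents live in `libpysal` (file I/O and spatial weights) and `esda` (Moran's I and LISA).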

#### 3.1.3 Basic Usage
PySAL has its own geospatial I/O tools that can read and write many of the formats GDAL/OGR supports, and it adds support for several spatial weights file formats (such as GAL and GWT).

In [2]:
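# Read a shapefile with PySAL's file I/O (PySAL 1.x API)
import pysal
# pysal.open.check()  # uncomment to print the file formats PySAL can open
shp = pysal.open('./data/stpete_cenacs_2014.shp')
len(shp)  # number of records (polygons) in the shapefile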

231

> *_Demonstrate opening a shapefile in GeoDa_*

## 3.2 Spatial Weights Matrix
#### 3.2.1 Introduction
Spatial weights matrices form the basis for a number of spatial analytical calculations. They come in a variety of types including:
* Contiguity-based weights
* Distance-based weights
* k-nearest neighbor weights
* Distance band weights
* Kernel weights

For this workshop we will focus on contiguity-based weights, which are typically used with polygon data and can be constructed using several topological criteria:
* Queen contiguity (share boundary and/or vertex)
* Rook contiguity (share boundary)
* Bishop contiguity (share vertex only)
* Lagged contiguity

<img src="./img/rook.png" width="300" height="300"/>

<img src="./img/queen.png" width="300" height="300"/>
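
To make the three criteria concrete, here is a minimal pure-Python sketch that applies them to a regular grid of square cells rather than real polygons. The grid is an assumption for illustration: real tract boundaries need geometric tests for shared edges and vertices, which PySAL performs for you. The helper `grid_neighbors` is hypothetical and not part of PySAL.

```python
def grid_neighbors(nrows, ncols, criterion="rook"):
    """Return {cell_id: [neighbor ids]} for a regular nrows x ncols grid.

    Cells are numbered row by row: id = row * ncols + col.
    rook   -> cells sharing an edge (N, S, E, W)
    bishop -> cells sharing only a corner (the four diagonals)
    queen  -> cells sharing an edge or a corner (rook + bishop)
    """
    rook_offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    bishop_offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    offsets = {
        "rook": rook_offsets,
        "bishop": bishop_offsets,
        "queen": rook_offsets + bishop_offsets,
    }[criterion]
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            ids = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:  # stay inside the grid
                    ids.append(rr * ncols + cc)
            neighbors[r * ncols + c] = sorted(ids)
    return neighbors


# A 3 x 3 grid numbered
#   0 1 2
#   3 4 5
#   6 7 8
rook = grid_neighbors(3, 3, "rook")
bishop = grid_neighbors(3, 3, "bishop")
queen = grid_neighbors(3, 3, "queen")
print(rook[4])            # [1, 3, 5, 7]
print(bishop[4])          # [0, 2, 6, 8]
print(queen[4])           # [0, 1, 2, 3, 5, 6, 7, 8]
print(rook[0], queen[0])  # [1, 3] [1, 3, 4]
```

On this grid every queen neighbor set is the union of the rook and bishop sets, so queen contiguity always gives a cell at least as many neighbors as rook contiguity.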

#### 3.2.2 Creating the weights matrix (W)
The easiest way to create a weights matrix in PySAL is to build it directly from a shapefile. The resulting W object has attributes that characterize contiguity in the data, such as `neighbors` (the neighbor ids of each observation), `weights` (the corresponding weights), `cardinalities` (the number of neighbors of each observation), and `histogram` (how many observations have each number of neighbors).

In [None]:
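# Create a weights matrix with rook contiguity from a shapefile
w = pysal.rook_from_shapefile('./data/stpete_cenacs_2014.shp')
print(w.weights[0])    # weights of observation 0's neighbors (binary, so all 1.0)
print(w.neighbors[5])  # ids of observation 5's neighbors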

In [None]:
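# Create a weights matrix with queen contiguity from a shapefile
# (this queen W is reused in the cells below)
w = pysal.queen_from_shapefile('./data/stpete_cenacs_2014.shp')
# w.weights[0]
# w.neighbors[5]
w.histogram  # (number of neighbors, number of observations with that many)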
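
A W object stores contiguity as dictionaries keyed by observation id, so the attributes used above can be reproduced by hand. The sketch below uses hypothetical rook neighbors for a 2 x 3 grid of cells (not the St. Petersburg data) to show how `weights`, `cardinalities`, and a histogram of (number of neighbors, frequency) pairs follow from `neighbors`. The histogram layout, which runs from the smallest to the largest neighbor count, mirrors our understanding of `w.histogram`; treat it as an assumption.

```python
from collections import Counter

# Hypothetical rook neighbors for a 2 x 3 grid of cells numbered
#   0 1 2
#   3 4 5
toy_neighbors = {
    0: [1, 3],
    1: [0, 2, 4],
    2: [1, 5],
    3: [0, 4],
    4: [1, 3, 5],
    5: [2, 4],
}

# Binary contiguity: every neighbor gets weight 1.0
toy_weights = {i: [1.0] * len(js) for i, js in toy_neighbors.items()}

# Cardinality: number of neighbors of each observation
cardinalities = {i: len(js) for i, js in toy_neighbors.items()}

# Histogram: (number of neighbors, number of observations with that many),
# listing every count from the minimum to the maximum
counts = Counter(cardinalities.values())
histogram = [(k, counts.get(k, 0)) for k in range(min(counts), max(counts) + 1)]

print(cardinalities)  # {0: 2, 1: 3, 2: 2, 3: 2, 4: 3, 5: 2}
print(histogram)      # [(2, 4), (3, 2)]
```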

In [None]:
# Export a W as a .gal file next to the shapefile
gal = pysal.open('./data/stpete_cenacs_2014.gal', 'w')
gal.write(w)
gal.close()
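
A .gal file is a plain-text neighbor list. The following sketch writes and reads a simplified GAL layout: a first line with the number of observations, then, for each observation, a line holding its id and neighbor count followed by a line listing the neighbor ids. The functions `write_gal` and `read_gal` are hypothetical, not PySAL's implementation; files written by GeoDa may carry a longer header line, which this reader does not handle.

```python
import io


def write_gal(neighbors, stream):
    """Write {id: [neighbor ids]} in a simplified GAL layout."""
    stream.write(f"{len(neighbors)}\n")               # header: number of observations
    for i in sorted(neighbors):
        ids = neighbors[i]
        stream.write(f"{i} {len(ids)}\n")             # id and number of neighbors
        stream.write(" ".join(map(str, ids)) + "\n")  # neighbor ids (empty line for islands)


def read_gal(stream):
    """Parse the simplified layout produced by write_gal."""
    rows = [line.split() for line in stream.read().splitlines()]
    n = int(rows[0][0])
    neighbors = {}
    for k in range(n):
        i, count = int(rows[1 + 2 * k][0]), int(rows[1 + 2 * k][1])
        ids = [int(t) for t in rows[2 + 2 * k]]
        if len(ids) != count:
            raise ValueError(f"observation {i}: expected {count} neighbors, got {len(ids)}")
        neighbors[i] = ids
    return neighbors


# Hypothetical rook neighbors for a 2 x 3 grid of cells
toy_neighbors = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
buffer = io.StringIO()
write_gal(toy_neighbors, buffer)
text = buffer.getvalue()
print(text)
print(read_gal(io.StringIO(text)) == toy_neighbors)  # True
```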

> Demonstrate creating a W matrix in GeoDa

#### 3.2.3 Higher Order Contiguity and Spatial Lag

Some use cases call for higher order contiguity. For example, pollution from a smokestack may largely pass over the immediately adjacent neighborhoods and instead affect neighborhoods farther away as particulates settle. PySAL makes it easy to construct a higher order weights matrix from an existing one.

In [None]:
# Create a second-order contiguity matrix from the queen W above
w2 = pysal.higher_order(w, 2)
print(w2.neighbors[0])  # observations two contiguity steps away from observation 0
w2
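
Higher order neighbors can be computed from first-order neighbors with a breadth-first search. The sketch below uses one common definition, which matches our understanding of `pysal.higher_order`: an area is a k-th order neighbor when the shortest path to it through the contiguity graph has exactly k steps, so first-order neighbors (and the area itself) are excluded from the second-order set. The function name and the five-area example are hypothetical.

```python
from collections import deque


def higher_order_neighbors(neighbors, k):
    """Return {i: ids whose shortest contiguity path from i has exactly k steps}."""
    result = {}
    for start in neighbors:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == k:
                continue  # nothing beyond order k is needed
            for nb in neighbors[node]:
                if nb not in dist:  # first visit is the shortest path in BFS
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        result[start] = sorted(j for j, d in dist.items() if d == k)
    return result


# Hypothetical first-order neighbors along a line of five areas: 0-1-2-3-4
first_order = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
second_order = higher_order_neighbors(first_order, 2)
print(second_order)  # {0: [2], 1: [3], 2: [0, 4], 3: [1], 4: [2]}
```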

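The spatial lag of a variable is a new variable in which each observation's value is the weighted sum of its neighbors' values. With binary weights this is the sum of the neighboring values; with a row-standardized W (each row of weights sums to 1) it is the average of the neighboring values.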

In [None]:
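# Calculate the spatial lag for median household income
import numpy as np
f = pysal.open('./data/stpete_cenacs_2014.dbf')
print(f.header)                     # column names in the attribute table
y = np.array(f.by_col['MEDHHINC'])  # median household income
yl = pysal.lag_spatial(w, y)        # binary W: sum of the neighbors' incomes
yl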

In [None]:
# Calculate the spatial lag with a row-standardized W
w.transform = 'r'             # each row now sums to 1 (changes w in place)
yr = pysal.lag_spatial(w, y)  # average of the neighbors' incomes
yr
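
The difference between the two lags is easy to see on a tiny example. This pure-Python sketch assumes hypothetical incomes for four areas along a line and hypothetical helpers `row_standardize` and `spatial_lag`; it computes each lag as the sum of `w_ij * y_j` over the neighbors `j` of `i`, first with binary weights and then with row-standardized ones.

```python
def row_standardize(weights):
    """Scale each row of weights so it sums to 1 (rows with no weights are left as-is)."""
    out = {}
    for i, ws in weights.items():
        total = sum(ws)
        out[i] = [w / total for w in ws] if total else list(ws)
    return out


def spatial_lag(neighbors, weights, y):
    """For each observation i, return the sum over neighbors j of w_ij * y[j]."""
    return [sum(w * y[j] for j, w in zip(neighbors[i], weights[i]))
            for i in range(len(y))]


# Hypothetical incomes (in thousands) for four areas along a line: 0-1-2-3
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
binary = {i: [1.0] * len(js) for i, js in toy_neighbors.items()}
incomes = [10.0, 20.0, 30.0, 40.0]

print(spatial_lag(toy_neighbors, binary, incomes))                   # [20.0, 40.0, 60.0, 30.0]
print(spatial_lag(toy_neighbors, row_standardize(binary), incomes))  # [20.0, 20.0, 30.0, 30.0]
```

With binary weights the lag grows with the number of neighbors (area 2 gets 60 because it has two neighbors), while the row-standardized lag is an average on the same scale as the original variable.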

## 3.3 Spatial Autocorrelation
#### 3.3.1 Overview
Spatial autocorrelation describes the degree to which values at nearby locations resemble (or differ from) each other more than they would if the values were distributed over space by a random process. It is often summarized by Waldo Tobler's oft-quoted First Law of Geography:

>everything is related to everything else, but near things are more related than distant things.

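Spatial autocorrelation can be measured globally, as a single statistic for the whole dataset, or locally, as a statistic for each observation. It can be positive (similar values cluster together) or negative (dissimilar values sit next to each other, as in a checkerboard).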

<img src="./img/sa.jpg" width="600" height="300"/>


#### 3.3.2 Measures of Global Spatial Autocorrelation
PySAL provides a number of global spatial autocorrelation measures. We'll focus on arguably the most widely used one, Moran's I, named after Pat Moran. The statistic is calculated using the following formula:

<img src="./img/moransI.png" width="300" height="300"/>

where ...

Let's calculate Moran's I for median household income in St. Petersburg, FL using PySAL.

In [None]:
# Calculate Moran's I using a column and weights matrix
mi = pysal.Moran(y, w, two_tailed=False)
print(mi.I)  # observed Moran's I
mi.EI        # expected I under no spatial autocorrelation: -1/(n-1)
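
Moran's I is commonly written as `I = (n / S0) * sum_i sum_j w_ij z_i z_j / sum_i z_i^2`, where `n` is the number of observations, `z_i = y_i - mean(y)` is the deviation from the mean, `w_ij` are the spatial weights, and `S0` is the sum of all weights; its expected value under no spatial autocorrelation is `-1/(n-1)`. The pure-Python sketch below computes I directly from that formula for two hypothetical 4 x 4 grids with binary rook weights: a checkerboard (negative autocorrelation) and a half-and-half split (positive autocorrelation). The helper names are hypothetical. PySAL's `Moran` row-standardizes W by default, so its values are comparable only when it is given the same weights.

```python
def rook_grid(nrows, ncols):
    """Rook neighbors for a regular grid; cells numbered row by row."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[r * ncols + c] = [rr * ncols + cc for rr, cc in candidates
                                        if 0 <= rr < nrows and 0 <= cc < ncols]
    return neighbors


def morans_i(neighbors, weights, y):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]                     # deviations from the mean
    s0 = sum(sum(ws) for ws in weights.values())  # sum of all weights
    cross = sum(w * z[i] * z[j]
                for i in range(n)
                for j, w in zip(neighbors[i], weights[i]))
    return (n / s0) * cross / sum(v * v for v in z)


nb = rook_grid(4, 4)
binary = {i: [1.0] * len(js) for i, js in nb.items()}

checkerboard = [float((r + c) % 2) for r in range(4) for c in range(4)]
clustered = [1.0 if c < 2 else 0.0 for r in range(4) for c in range(4)]

i_checker = morans_i(nb, binary, checkerboard)
i_clustered = morans_i(nb, binary, clustered)
expected = -1 / (len(checkerboard) - 1)  # E[I] = -1/(n-1)

print(round(i_checker, 6))    # -1.0 (perfect negative autocorrelation)
print(round(i_clustered, 4))  # 0.6667 (positive autocorrelation)
print(round(expected, 4))     # -0.0667
```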

In [None]:
# Calculate the pseudo p-value for Moran's I from random permutations
np.random.seed(12345)  # make the permutations reproducible
mir = pysal.Moran(y, w, permutations=9999)
mir.p_sim
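
The pseudo p-value compares the observed I with the values obtained after randomly shuffling the observed values across locations. The sketch below implements that idea in pure Python on hypothetical 4 x 4 grids with binary rook weights. The folding step (counting whichever tail the observed value falls in, then using `(count + 1) / (permutations + 1)`) follows our understanding of how PySAL 1.x computes `p_sim`, so treat that detail as an assumption. Python's `random` module draws a different stream than `np.random.seed(12345)`, so the numbers will not match PySAL's exactly.

```python
import random


def rook_grid(nrows, ncols):
    """Rook neighbors for a regular grid; cells numbered row by row."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[r * ncols + c] = [rr * ncols + cc for rr, cc in candidates
                                        if 0 <= rr < nrows and 0 <= cc < ncols]
    return neighbors


def morans_i(neighbors, weights, y):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(ws) for ws in weights.values())
    cross = sum(w * z[i] * z[j] for i in range(n) for j, w in zip(neighbors[i], weights[i]))
    return (n / s0) * cross / sum(v * v for v in z)


def permutation_p_value(neighbors, weights, y, permutations=999, seed=12345):
    """Observed Moran's I and a folded one-sided pseudo p-value."""
    rng = random.Random(seed)
    observed = morans_i(neighbors, weights, y)
    values = list(y)
    larger = 0
    for _ in range(permutations):
        rng.shuffle(values)  # random relabelling of locations
        if morans_i(neighbors, weights, values) >= observed:
            larger += 1
    # Use the tail the observed value falls in (high or low); the +1 terms
    # count the observed arrangement as one of the possible outcomes.
    larger = min(larger, permutations - larger)
    return observed, (larger + 1) / (permutations + 1)


nb = rook_grid(4, 4)
binary = {i: [1.0] * len(js) for i, js in nb.items()}
clustered = [1.0 if c < 2 else 0.0 for r in range(4) for c in range(4)]
checkerboard = [float((r + c) % 2) for r in range(4) for c in range(4)]

i_obs, p_clustered = permutation_p_value(nb, binary, clustered)
_, p_checker = permutation_p_value(nb, binary, checkerboard)
print(i_obs, p_clustered, p_checker)
```

Both patterns are far from random, so both pseudo p-values come out small; the smallest value possible with 999 permutations is 1/1000.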

> *Show Moran's I in GeoDa*

#### 3.3.3 Local Spatial Autocorrelation
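Local indicators of spatial autocorrelation (LISA) measure spatial autocorrelation at each observation, which makes it possible to locate the clusters and spatial outliers that a single global statistic summarizes away.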

In [4]:
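# Calculate LISA (local Moran's I) for each observation
# (uses y and w from the cells above, so run those cells first)
lm = pysal.Moran_Local(y, w)
print(lm.n)        # number of observations
print(len(lm.Is))  # one local statistic per observation
lm.p_sim           # pseudo p-value for each observation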
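
For observation i, local Moran's I is commonly written as `I_i = (z_i / m2) * sum_j w_ij z_j`, where `z_i` is the deviation from the mean and `m2 = sum_i z_i^2 / n`. The pure-Python sketch below computes it with row-standardized rook weights on a hypothetical 4 x 4 grid and labels each observation by its Moran scatterplot quadrant (for example, HL means a high value surrounded by low values). PySAL may apply a slightly different scaling constant, so compare signs and rankings rather than exact values. With row-standardized weights the mean of the local statistics equals the global Moran's I, which the last lines check.

```python
def rook_grid(nrows, ncols):
    """Rook neighbors for a regular grid; cells numbered row by row."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[r * ncols + c] = [rr * ncols + cc for rr, cc in candidates
                                        if 0 <= rr < nrows and 0 <= cc < ncols]
    return neighbors


def local_morans_i(neighbors, y):
    """Local Moran's I and quadrant labels using row-standardized weights."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    m2 = sum(v * v for v in z) / n
    local_i, quadrants = [], []
    for i in range(n):
        nbs = neighbors[i]
        # Row-standardized lag: average deviation among i's neighbors (0 for islands)
        lag = sum(z[j] for j in nbs) / len(nbs) if nbs else 0.0
        local_i.append(z[i] / m2 * lag)
        # Quadrant: own value high/low, then neighbors' average high/low
        quadrants.append(("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L"))
    return local_i, quadrants


nb = rook_grid(4, 4)
clustered = [1.0 if c < 2 else 0.0 for r in range(4) for c in range(4)]
local_i, quadrants = local_morans_i(nb, clustered)

print([round(v, 3) for v in local_i[:4]])  # first row: [1.0, 0.333, 0.333, 1.0]
print(quadrants[:4])                       # ['HH', 'HH', 'LL', 'LL']

# Cross-check: the mean of local I equals global I for row-standardized weights
mean = sum(clustered) / len(clustered)
z = [v - mean for v in clustered]
cross = sum(z[i] * sum(z[j] for j in nb[i]) / len(nb[i]) for i in range(len(z)))
global_i = cross / sum(v * v for v in z)
print(abs(sum(local_i) / len(local_i) - global_i) < 1e-12)  # True
```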