# 3. Exploratory Spatial Data Analysis (ESDA)
## 3.1 Python Spatial Analysis Library (PySAL)

#### 3.1.1 Overview
[PySAL](http://pysal.readthedocs.io/en/latest/) is an open-source Python library of spatial analysis methods that users can incorporate into their own applications, including:
* Creating spatial weights matrices
* Assessing spatial autocorrelation
* Spatial econometric modeling

#### 3.1.2 Installation and Packaging

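With conda:

`conda install pysal`

Or with pip:

`pip install -U pysal`

The examples in this section use the PySAL 1.x API (for example `pysal.open`, `pysal.rook_from_shapefile`, and `pysal.Moran`). PySAL 2.0 reorganized the library into subpackages such as `libpysal` and `esda`, so these calls may not work as written with a current release; running them unchanged may require installing a 1.x release (for example `pip install "pysal<2"`).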

#### 3.1.3 Basic Usage
PySAL has its own geospatial I/O tools that can read and write several of the formats GDAL/OGR supports, such as shapefiles (.shp) and dBase tables (.dbf). It also supports a number of spatial weights formats, such as GeoDa's .gal files.

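```python
# Read a shapefile
import pysal

# Uncomment to list the file formats that pysal.open supports
# pysal.open.check()
shp = pysal.open('./data/census/stpete_cenacs_2015.shp')
len(shp)  # number of shapes (census polygons) in the file
```

```
231
```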

> *Demonstrate opening a shapefile in GeoDa*

## 3.2 Spatial Weights Matrix
#### 3.2.1 Introduction
Spatial weights matrices form the basis for a number of spatial analytical calculations. They come in a variety of types including:
* Contiguity-based weights
* Distance-based weights
* k-nearest neighbor weights
* Distance band weights
* Kernel weights

For this workshop we will focus on contiguity-based weights, which are typically used with polygon data and can be constructed using a variety of topological criteria:
* Queen contiguity (share boundary and/or vertex)
* Rook contiguity (share boundary)
* Bishop contiguity (share vertex only)
* Lagged contiguity

<img src="./img/rook.png" width="300" height="300"/>
<p><center><em>Old Northeast St. Pete census blocks illustrating Rook contiguity</em></center></p>

<img src="./img/queen.png" width="300" height="300"/>
<p><center><em>Old Northeast St. Pete census blocks illustrating Queen contiguity</em></center></p>
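Real census polygons need geometric tests for shared edges and vertices, but the rook, queen, and bishop criteria are easiest to compare on a regular grid of square cells. The sketch below uses a hypothetical helper, `grid_contiguity` (not a PySAL function), and assumes cells are numbered row by row.

```python
def grid_contiguity(nrows, ncols, criterion="rook"):
    """Return a dict mapping each cell id to a sorted list of neighbor ids.

    Cells are numbered row by row: id = row * ncols + col (an assumption
    made for this illustration).
    """
    if criterion == "rook":
        # Shared edge: the cells above, below, left, and right
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif criterion == "queen":
        # Shared edge or shared corner: all 8 surrounding cells
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    elif criterion == "bishop":
        # Shared corner only: the 4 diagonal cells
        offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    else:
        raise ValueError(f"unknown criterion: {criterion}")

    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            ids = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                # Keep only offsets that stay inside the grid
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    ids.append(rr * ncols + cc)
            neighbors[r * ncols + c] = sorted(ids)
    return neighbors


# 3 x 3 grid:  0 1 2 / 3 4 5 / 6 7 8
rook = grid_contiguity(3, 3, "rook")
queen = grid_contiguity(3, 3, "queen")
print(rook[4])   # [1, 3, 5, 7]
print(queen[4])  # [0, 1, 2, 3, 5, 6, 7, 8]
print(rook[0])   # [1, 3]
```

Corner and edge cells have fewer neighbors than interior cells, which is one reason the number of neighbors varies across a dataset.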

#### 3.2.2 Creating the weights matrix (W)
The easiest way to create a weights matrix in PySAL is directly from a shapefile. The resulting W object has attributes that make it easy to characterize contiguity within the data, including `neighbors` (the IDs of each observation's neighbors), `weights` (the corresponding weight values), and `histogram` (how many observations have each number of neighbors).

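```python
# Create a weights matrix with rook contiguity from a shapefile
w = pysal.rook_from_shapefile("./data/census/stpete_cenacs_2015.shp")
w.weights[0]    # weights of observation 0's neighbors (1.0 each for binary contiguity)
w.neighbors[5]  # IDs of observation 5's rook neighbors (only this last value is displayed)
```

```
[1, 2, 3, 6, 10, 14]
```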

```python
# Create a weights matrix with queen contiguity from a shapefile
w = pysal.queen_from_shapefile("./data/census/stpete_cenacs_2015.shp")
# Pairs of (number of neighbors, number of observations with that many neighbors)
w.histogram
```

```
[(1, 1),
 (2, 2),
 (3, 10),
 (4, 20),
 (5, 48),
 (6, 57),
 (7, 58),
 (8, 21),
 (9, 8),
 (10, 2),
 (11, 2),
 (12, 0),
 (13, 0),
 (14, 1),
 (15, 0),
 (16, 0),
 (17, 1)]
```
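The counts in this histogram sum to 231, the number of polygons in the shapefile. A minimal sketch of the same tabulation on a plain neighbors dict, using a hypothetical helper `neighbor_histogram` and toy data, and assuming (as the output above suggests) that neighbor counts between the minimum and maximum with no observations are reported as zero:

```python
from collections import Counter


def neighbor_histogram(neighbors):
    """Return (number_of_neighbors, number_of_observations) pairs,
    filling gaps between the smallest and largest count with zeros."""
    cardinalities = [len(js) for js in neighbors.values()]
    counts = Counter(cardinalities)
    lo, hi = min(counts), max(counts)
    return [(k, counts.get(k, 0)) for k in range(lo, hi + 1)]


# Toy neighbors dict (hypothetical, not the St. Pete data)
toy = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
print(neighbor_histogram(toy))  # [(1, 1), (2, 2), (3, 1)]
```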

```python
# Export W as a .gal file, a plain-text neighbor list that GeoDa can also read
gal = pysal.open('./data/census/stpete_cenacs_2015.gal', 'w')
gal.write(w)
gal.close()
```

> *Demonstrate creating a W matrix in GeoDa*

#### 3.2.3 Higher Order Contiguity

Some use cases require higher-order contiguity. For example, pollution from a smokestack may bypass the immediately surrounding neighborhoods but affect neighborhoods farther away as particulates settle. PySAL makes it easy to construct a higher-order weights matrix from an existing one; in a second-order matrix, each unit's neighbors are the units exactly two contiguity steps away, so its first-order neighbors are excluded.
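Order-k neighbors (units whose shortest contiguity path has exactly k steps) can be found with a breadth-first search over a plain neighbors dict. This is a minimal sketch with a hypothetical helper, `higher_order_neighbors`, not PySAL's implementation.

```python
from collections import deque


def higher_order_neighbors(neighbors, k):
    """Map each unit to the units exactly k contiguity steps away."""
    result = {}
    for start in neighbors:
        dist = {start: 0}          # shortest number of steps from start
        queue = deque([start])
        while queue:
            i = queue.popleft()
            if dist[i] == k:
                continue           # no need to search beyond order k
            for j in neighbors[i]:
                if j not in dist:
                    dist[j] = dist[i] + 1
                    queue.append(j)
        # Keep only units whose shortest path is exactly k
        result[start] = sorted(j for j, d in dist.items() if d == k)
    return result


# A chain of five units: 0 - 1 - 2 - 3 - 4
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(higher_order_neighbors(chain, 2))
# {0: [2], 1: [3], 2: [0, 4], 3: [1], 4: [2]}
```

Unit 2's second-order neighbors are 0 and 4; its immediate neighbors 1 and 3 do not appear, matching the smokestack scenario.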

```python
# Create a second-order weights matrix from an existing W
# Note: this step ran into issues unless the W was read back from the .gal file
w = pysal.weights.Queen.from_file('./data/census/stpete_cenacs_2015.gal')
w2 = pysal.higher_order(w, 2)  # neighbors exactly two contiguity steps away
w2.histogram
```

```
[(4, 2),
 (5, 2),
 (6, 6),
 (7, 5),
 (8, 14),
 (9, 16),
 (10, 20),
 (11, 20),
 (12, 19),
 (13, 27),
 (14, 17),
 (15, 17),
 (16, 15),
 (17, 12),
 (18, 7),
 (19, 5),
 (20, 6),
 (21, 9),
 (22, 4),
 (23, 1),
 (24, 1),
 (25, 1),
 (26, 2),
 (27, 0),
 (28, 0),
 (29, 1),
 (30, 0),
 (31, 1),
 (32, 0),
 (33, 1)]
```

#### 3.2.4 Spatial Lag
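The spatial lag of a variable is a new variable whose value at each location is computed from the values at that location's neighbors. With a binary W, the lag is the sum of the neighboring values; with a row-standardized W (each row's weights sum to 1), it is the average of the neighboring values.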
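A minimal sketch of both kinds of lag on a plain neighbors dict. `spatial_lag` is a hypothetical helper, and the income values are made up for illustration; neither comes from the St. Pete data.

```python
def spatial_lag(neighbors, y, row_standardize=False):
    """Sum (binary W) or average (row-standardized W) of neighboring values."""
    lag = []
    for i in range(len(y)):
        js = neighbors.get(i, [])
        total = sum(y[j] for j in js)
        if row_standardize:
            # Each neighbor gets weight 1 / (number of neighbors); islands get 0
            lag.append(total / len(js) if js else 0.0)
        else:
            # Binary weights: every neighbor has weight 1
            lag.append(total)
    return lag


neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
income = [40000, 60000, 50000, 30000]  # hypothetical values
print(spatial_lag(neighbors, income))                        # [110000, 40000, 70000, 50000]
print(spatial_lag(neighbors, income, row_standardize=True))  # [55000.0, 40000.0, 35000.0, 50000.0]
```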

```python
import numpy as np

# Calculate the spatial lag of median household income
# With a binary W, each lag value is the sum of the neighboring values
f = pysal.open("./data/census/stpete_cenacs_2015.dbf")
f.header  # field names in the attribute table
y = np.array(f.by_col['MEDHHINC'])
yl = pysal.lag_spatial(w, y)
yl
```

```
array([  526135.,   233965.,   275383.,   375029.,   287964.,   423450.,
         263165.,   247640.,   210497.,   270351.,   344603.,   331851.,
         336902.,   300853.,   274127.,   228833.,   289431.,   135777.,
         377851.,   321967.,   413276.,   348732.,   255116.,   337399.,
         255593.,   411820.,   420203.,   399202.,   342894.,   488076.,
         393826.,   228450.,   372177.,   332736.,   232078.,   502282.,
         149555.,   483465.,   185496.,   412462.,   251819.,   305925.,
         322775.,   207482.,  1187391.,   176068.,    46765.,   420611.,
         209844.,   232717.,   163337.,   196814.,   135316.,   368255.,
         170266.,   198499.,   132426.,   208330.,   221998.,   181924.,
         442427.,    96957.,   153638.,   455600.,   573338.,   317967.,
         347957.,   235340.,   404991.,   389061.,   443302.,   393004.,
         118735.,   631616.,   283565.,   264506.,   251256.,   368086.,
         268564.,   283998.,   255837.,   319642.,
         ...
```

<img src="./img/medhhinc_wavg.png" width="1000" height="1000"/>

```python
# Calculate the spatial lag with a row-standardized W
# Each lag value is now the average of the neighboring values
w.transform = 'r'
yr = pysal.lag_spatial(w, y)
yr
```

```
array([  75162.14285714,   46793.        ,   45897.16666667,
         53575.57142857,   57592.8       ,   47050.        ,
         37595.        ,   41273.33333333,   42099.4       ,
         45058.5       ,   49229.        ,   47407.28571429,
         48128.85714286,   50142.16666667,   45687.83333333,
        114416.5       ,   48238.5       ,   45259.        ,
         53978.71428571,   45995.28571429,   51659.5       ,
         49818.85714286,   42519.33333333,   48199.85714286,
         51118.6       ,   68636.66666667,   60029.        ,
         49900.25      ,   68578.8       ,   81346.        ,
         49228.25      ,   45690.        ,   33834.27272727,
         47533.71428571,   33154.        ,   71754.57142857,
         29911.        ,   60433.125     ,   26499.42857143,
         68743.66666667,   25181.9       ,   50987.5       ,
         53795.83333333,   51870.5       ,   69846.52941176,
         44017.        ,   46765.        ,   46734.55555556,
         41968.8       ,
        ...
```

<img src="./img/medhhinc_cavg.png" width="1000" height="1000"/>

## 3.3 Spatial Autocorrelation
#### 3.3.1 Overview
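Spatial autocorrelation describes values distributed over space in a pattern that is not the product of a random process: nearby observations tend to be more similar (or more dissimilar) than chance alone would produce. The idea is often summarized by Waldo Tobler's oft-quoted First Law of Geography, which states that: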

>Everything is related to everything else, but near things are more related than distant things.

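Spatial autocorrelation can be measured globally (a single statistic for the whole study area) or locally (a statistic for each observation). It can be positive, when similar values cluster together, or negative, when neighboring values tend to be dissimilar.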

<img src="./img/sa.jpg" width="600" height="300"/>


#### 3.3.2 Measures of Global Spatial Autocorrelation
PySAL provides a number of global spatial autocorrelation measures. We'll focus on Moran's I, named after the statistician Pat Moran, which is arguably the most widely used. The statistic is calculated with the following formula:

<img src="./img/moransI.png" width="300" height="300"/>

where ...
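For reference, the standard form of Moran's I is written out below; the symbols may differ from those in the figure.

$$
I = \frac{n}{S_0}\,
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}
     {\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}
$$

Here $n$ is the number of observations, $y_i$ is the value at location $i$ (here, median household income), $\bar{y}$ is the mean of $y$, and $w_{ij}$ is the $(i, j)$ element of the spatial weights matrix $W$. For a row-standardized $W$ with no islands, $S_0 = n$, so the leading factor is 1. Under the null hypothesis of spatial randomness, $E[I] = -1/(n - 1)$; values above this indicate positive spatial autocorrelation and values below it indicate negative spatial autocorrelation.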

Let's calculate Moran's I for median household income in St. Petersburg, FL, using PySAL.

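```python
# Calculate Moran's I using a column and a weights matrix
# Note: pysal.Moran's default transformation='r' row-standardizes w,
# so setting w.transform = 'b' here does not change the result
w.transform = 'b'
mi = pysal.Moran(y, w, two_tailed=False)  # two_tailed affects the analytical p-values, not I
mi.I
# mi.EI  # expected value of I under spatial randomness, -1/(n - 1)
```

```
0.28135221461193555
```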
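To make the formula concrete, here is a from-scratch sketch of global Moran's I in plain Python, applied to two hypothetical 4 x 4 grids: one where similar values cluster (positive autocorrelation) and a checkerboard where every rook neighbor differs (negative autocorrelation). `rook_grid` and `morans_i` are illustrative helpers, not PySAL functions; like `pysal.Moran`, the sketch row-standardizes W by default.

```python
def rook_grid(nrows, ncols):
    """Rook neighbors for a grid of cells numbered row by row."""
    nb = {}
    for r in range(nrows):
        for c in range(ncols):
            cand = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nb[r * ncols + c] = [rr * ncols + cc for rr, cc in cand
                                 if 0 <= rr < nrows and 0 <= cc < ncols]
    return nb


def morans_i(y, neighbors, row_standardize=True):
    """Global Moran's I from a neighbors dict (binary or row-standardized W)."""
    n = len(y)
    ybar = sum(y) / n
    z = [v - ybar for v in y]  # deviations from the mean
    num = 0.0                  # sum_i sum_j w_ij * z_i * z_j
    s0 = 0.0                   # S_0, the sum of all weights
    for i, js in neighbors.items():
        for j in js:
            w_ij = 1.0 / len(js) if row_standardize else 1.0
            num += w_ij * z[i] * z[j]
            s0 += w_ij
    return (n / s0) * num / sum(v * v for v in z)


grid = rook_grid(4, 4)
clustered = [1 if c < 2 else 0 for r in range(4) for c in range(4)]  # left half high
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

print(morans_i(clustered, grid, row_standardize=False))  # ≈ 0.667 (positive)
print(morans_i(checkerboard, grid))                      # ≈ -1.0 (negative)
print(-1 / (len(grid) - 1))                              # E[I] = -1/(n - 1) ≈ -0.0667
```

The checkerboard reaches the extreme value of -1 because every neighbor pair contributes a negative cross-product.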

```python
# Calculate the pseudo p-value for Moran's I from 9,999 random permutations
np.random.seed(12345)  # fix the seed so the permutations are reproducible
mir = pysal.Moran(y, w, permutations=9999)
mir.p_sim
```

```
0.0001
```
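The pseudo p-value compares the observed I with the values obtained after randomly reshuffling the data across locations. With R permutations, a common definition is p = (M + 1) / (R + 1), where M counts permuted statistics at least as extreme as the observed one; the smallest attainable value is 1 / (R + 1), which for R = 9,999 is the 0.0001 reported above. The sketch below assumes binary weights, a folded one-sided count (the tail on the observed statistic's side), and Python's `random` module in place of NumPy's generator, so its numbers will not reproduce PySAL's. All helper names are hypothetical.

```python
import random


def rook_grid(nrows, ncols):
    """Rook neighbors for a grid of cells numbered row by row."""
    nb = {}
    for r in range(nrows):
        for c in range(ncols):
            cand = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nb[r * ncols + c] = [rr * ncols + cc for rr, cc in cand
                                 if 0 <= rr < nrows and 0 <= cc < ncols]
    return nb


def morans_i(y, neighbors):
    """Global Moran's I with binary weights."""
    n = len(y)
    ybar = sum(y) / n
    z = [v - ybar for v in y]
    num = sum(z[i] * z[j] for i, js in neighbors.items() for j in js)
    s0 = sum(len(js) for js in neighbors.values())
    return (n / s0) * num / sum(v * v for v in z)


def pseudo_p_value(y, neighbors, permutations=999, seed=12345):
    """Return the observed I and its permutation pseudo p-value."""
    rng = random.Random(seed)
    observed = morans_i(y, neighbors)
    values = list(y)
    larger = 0
    for _ in range(permutations):
        rng.shuffle(values)  # reassign the observed values to random locations
        if morans_i(values, neighbors) >= observed:
            larger += 1
    # Count the tail on the side where the observed statistic falls
    m = min(larger, permutations - larger)
    return observed, (m + 1) / (permutations + 1)


grid = rook_grid(4, 4)
clustered = [1 if c < 2 else 0 for r in range(4) for c in range(4)]  # hypothetical data
obs, p = pseudo_p_value(clustered, grid)
print(obs, p)
```

Because the clustered arrangement is one of very few with so many like-valued neighbors, almost no permutation matches it, and the pseudo p-value sits at or near its floor of 1 / 1000.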

> *Show Moran's I in GeoDa*

#### 3.3.3 Local Spatial Autocorrelation
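Local indicators of spatial association (LISA) assess spatial autocorrelation at each individual observation rather than across the whole study area. Local Moran's I identifies clusters of similar values (high-high and low-low) and spatial outliers (high-low and low-high).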
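A from-scratch sketch of the textbook local Moran's statistic, $I_i = (z_i / m_2)\sum_j w_{ij} z_j$ with $z_i = y_i - \bar{y}$ and $m_2 = \sum_i z_i^2 / n$, using a row-standardized W on a hypothetical chain of five units with a single high value in the middle. `local_morans_i` is illustrative; PySAL's implementation may use a different scaling constant and adds permutation-based p-values, so its `lm.Is` values need not match this sketch exactly. The quadrant labels compare each value, and its spatial lag, with the mean.

```python
def local_morans_i(y, neighbors):
    """Textbook local Moran's I with row-standardized weights.

    Returns a list of local statistics and a list of quadrant labels.
    """
    n = len(y)
    ybar = sum(y) / n
    z = [v - ybar for v in y]       # deviations from the mean
    m2 = sum(v * v for v in z) / n  # second moment of the deviations
    local_i, quadrants = [], []
    for i in range(n):
        js = neighbors.get(i, [])
        # Row-standardized spatial lag of the deviations (0 for islands)
        lag = sum(z[j] for j in js) / len(js) if js else 0.0
        local_i.append(z[i] * lag / m2)
        # Is the value high or low, and is its neighborhood high or low?
        # (ties at the mean are treated as "high" in this sketch)
        own = "H" if z[i] >= 0 else "L"
        nbr = "H" if lag >= 0 else "L"
        quadrants.append(own + nbr)
    return local_i, quadrants


chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
values = [0, 0, 10, 0, 0]  # hypothetical: one high value surrounded by low ones
local_i, quadrants = local_morans_i(values, chain)
print(local_i)    # [0.25, -0.375, -1.0, -0.375, 0.25]
print(quadrants)  # ['LL', 'LH', 'HL', 'LH', 'LL']
print(sum(local_i) / len(local_i))  # -0.25
```

With a row-standardized W and no islands, the local statistics average to the global Moran's I (here -0.25), which is why LISA is described as a decomposition of the global statistic. The middle unit is a high-low spatial outlier with the most negative local value.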

```python
# Calculate local Moran's I (LISA) for each observation
lm = pysal.Moran_Local(y, w)
lm.n        # number of observations
len(lm.Is)  # one local statistic per observation
# lm.p_sim  # per-observation pseudo p-values from conditional permutations
lm.Is
```

```
array([  1.19195720e+00,   1.40708980e-04,  -2.08709293e-02,
        -1.03863487e-01,  -2.56059590e-01,   3.47594333e-05,
        -2.18000788e-01,   1.79471402e-01,  -2.58146509e-01,
         4.00700307e-02,   3.49524736e-03,   4.99360946e-03,
        -2.51708808e-03,  -9.40303383e-04,  -1.99250515e-02,
        -1.19592062e+00,   1.97783850e-02,  -4.65263247e-02,
        -1.69376668e-01,  -2.29759446e-02,  -8.12841678e-02,
         9.46348286e-03,   4.31063445e-02,  -1.87424112e-03,
        -2.63852826e-02,   5.53106315e-01,  -2.52576339e-01,
         3.39377028e-02,   4.35727065e-01,   2.58599901e+00,
        -4.43553455e-02,   1.95347484e-02,   1.21626155e-01,
        -2.60597529e-03,   5.14456322e-01,   1.68261093e+00,
        -5.40866280e-01,   2.50888761e-01,   9.15167454e-01,
         1.78229863e+00,   5.81758288e-01,   1.95119064e-01,
         2.61653566e-01,  -1.16939058e-01,  -1.74510813e+00,
         4.73641696e-02,   4.74379187e-03,   1.13941203e-04,
        -3.01490591e-02,
        ...
```

<img src="./img/medhhinc_lisa.png" width="1000" height="1000"/>