# 3. Exploratory Spatial Data Analysis (ESDA)
## 3.1 Python Spatial Analysis Library (PySAL)

#### 3.1.1 Overview
[PySAL](http://pysal.readthedocs.io/en/latest/) is a Python library for spatial analysis that lets users incorporate a number of spatial analytical methods into their applications, including:
* Creating spatial weights matrices
* Assessing spatial autocorrelation
* Spatial econometric modeling

#### 3.1.2 Installation and Packaging

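Install PySAL with conda:

`conda install pysal`

or with pip:

`pip install -U pysal`

The examples in this section use the PySAL 1.x interface (for example `pysal.open` and `pysal.queen_from_shapefile`). PySAL 2.0 reorganized the library into subpackages such as `libpysal` and `esda` and removed these top-level functions, so the code below is expected to run only against a 1.x release (for example `pip install "pysal<2"`, which may also require an older Python version).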

#### 3.1.3 Basic Usage
PySAL has its own geospatial I/O tools that can read and write many of the formats that GDAL/OGR can, and it adds support for a number of spatial weights formats such as GAL and GWT.

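```python
# Read a shapefile
import pysal

# pysal.open.check()  # uncomment to list the file formats pysal.open supports
shp = pysal.open('./data/stpete_cenacs_2014.shp')
len(shp)  # number of shapes (polygons) in the file
```

```
231
```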

> *_Demonstrate opening a shapefile in GeoDa_*

## 3.2 Spatial Weights Matrix
#### 3.2.1 Introduction
Spatial weights matrices form the basis for a number of spatial analytical calculations. They come in a variety of types including:
* Contiguity-based weights
* Distance-based weights
* k-nearest neighbor weights
* Distance band weights
* Kernel weights

For this workshop we will focus on contiguity-based weights, which are typically used with polygon data and can be constructed using a variety of topological criteria:
* Queen contiguity (polygons share an edge or a vertex)
* Rook contiguity (polygons share an edge)
* Bishop contiguity (polygons share a vertex only)
* Higher-order (lagged) contiguity

<img src="./img/rook.png" width="300" height="300"/>

<img src="./img/queen.png" width="300" height="300"/>
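To make the rook, queen and bishop criteria concrete, here is a minimal pure-Python sketch for a regular grid of square cells, where each criterion reduces to which of the eight surrounding cells count as neighbors. `grid_contiguity` is a hypothetical helper written for illustration (PySAL itself offers `pysal.lat2W` for lattice weights); real polygons need geometric tests for shared edges and vertices rather than grid offsets.

```python
def grid_contiguity(nrows, ncols, rook=True):
    """Return a neighbors dict for the cells of an nrows x ncols grid.

    Cells are numbered row by row (id = row * ncols + col).
    Rook: cells sharing an edge (up, down, left, right).
    Queen: cells sharing an edge or a corner (adds the four diagonals).
    """
    if rook:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            cell = r * ncols + c
            # Keep only offsets that stay inside the grid
            neighbors[cell] = sorted(
                (r + dr) * ncols + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < nrows and 0 <= c + dc < ncols
            )
    return neighbors


rook = grid_contiguity(3, 3, rook=True)
queen = grid_contiguity(3, 3, rook=False)
# Bishop neighbors touch only at a corner: queen neighbors that are not rook neighbors
bishop = {i: sorted(set(queen[i]) - set(rook[i])) for i in queen}

print(rook[4])    # centre cell: [1, 3, 5, 7]
print(queen[4])   # centre cell: [0, 1, 2, 3, 5, 6, 7, 8]
print(bishop[4])  # centre cell: [0, 2, 6, 8]
```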

#### 3.2.2 Creating the weights matrix (W)
The easiest way to create a weights matrix in PySAL is directly from a shapefile. The resulting W object has several attributes that characterize contiguity in the data: `neighbors` maps each observation to the IDs of its neighbors, `weights` holds the corresponding weight values, and `histogram` reports how many observations have each number of neighbors.

```python
# Create a weights matrix with rook contiguity from a shapefile
w = pysal.rook_from_shapefile("./data/stpete_cenacs_2014.shp")
w.weights[0]    # weight values for observation 0's neighbors
w.neighbors[5]  # IDs of observation 5's neighbors
```

```
[193, 203, 204, 206]
```

```python
# Create a weights matrix with queen contiguity from a shapefile
w = pysal.queen_from_shapefile("./data/stpete_cenacs_2014.shp")
# w.weights[0]
# w.neighbors[5]
w.histogram  # (number of neighbors, number of observations) pairs
```

```
[(1, 1),
 (2, 2),
 (3, 10),
 (4, 20),
 (5, 48),
 (6, 57),
 (7, 58),
 (8, 21),
 (9, 8),
 (10, 2),
 (11, 2),
 (12, 0),
 (13, 0),
 (14, 1),
 (15, 0),
 (16, 0),
 (17, 1)]
```
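Each pair in the `histogram` output gives a number of neighbors and how many observations have that many; the counts sum to 231, the number of polygons returned by `len(shp)`. The hypothetical helper below sketches how such a histogram can be derived from a neighbors dictionary like `w.neighbors`; it illustrates the idea and is not PySAL's implementation.

```python
from collections import Counter


def neighbor_histogram(neighbors):
    """Return (number_of_neighbors, number_of_observations) pairs.

    Counts run from the smallest to the largest neighbor count, and counts
    that no observation has are reported with 0 observations.
    """
    cardinalities = Counter(len(ids) for ids in neighbors.values())
    lo, hi = min(cardinalities), max(cardinalities)
    return [(k, cardinalities.get(k, 0)) for k in range(lo, hi + 1)]


# Hypothetical (symmetric) neighbor dictionary for five polygons
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [1, 2], 4: [2]}
hist = neighbor_histogram(neighbors)
print(hist)  # [(1, 1), (2, 2), (3, 1), (4, 1)]

# Every observation is counted exactly once
assert sum(count for _, count in hist) == len(neighbors)
```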

```python
# Export a W as a .gal file
gal = pysal.open('stpete_cenacs_2014.gal', 'w')
gal.write(w)
gal.close()
```
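A GAL file is plain text. As a rough sketch, assuming the basic GAL layout (a header line with the number of observations, then for each observation one line with its ID and neighbor count followed by one line listing its neighbor IDs), a writer and matching reader could look like this. `write_gal` and `read_gal` are hypothetical helpers; files written by PySAL or GeoDa may use a longer header line, so this reader is only meant for its own output.

```python
import io


def write_gal(neighbors, stream):
    """Write a neighbors dict in a basic GAL-style text layout (assumed)."""
    stream.write(f"{len(neighbors)}\n")  # header: number of observations
    for obs_id, ids in neighbors.items():
        stream.write(f"{obs_id} {len(ids)}\n")              # ID and neighbor count
        stream.write(" ".join(str(i) for i in ids) + "\n")  # neighbor IDs


def read_gal(stream):
    """Read the layout produced by write_gal back into a neighbors dict."""
    n = int(stream.readline())
    neighbors = {}
    for _ in range(n):
        obs_id, count = stream.readline().split()
        ids = [int(i) for i in stream.readline().split()]
        if len(ids) != int(count):
            raise ValueError(f"observation {obs_id}: expected {count} neighbors")
        neighbors[int(obs_id)] = ids
    return neighbors


# Observation 3 is an "island" with no neighbors
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
buf = io.StringIO()
write_gal(neighbors, buf)
print(buf.getvalue())

buf.seek(0)  # rewind before reading the text back
restored = read_gal(buf)
```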

> Demonstrate creating a W matrix in GeoDa

#### 3.2.3 Higher Order Contiguity and Spatial Lag

Some use cases require the calculation of higher-order contiguity. For example, pollution from a smokestack may largely bypass the immediately adjacent neighborhoods and instead affect neighborhoods farther away as particulates settle. PySAL makes it easy to construct a higher-order weights matrix from an existing one.

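```python
# Create a second-order weights matrix from an existing W
w2 = pysal.higher_order(w, 2)
w2.neighbors[0]
w2
```

```
Unsupported sparse argument.

AttributeError: 'NoneType' object has no attribute 'neighbors'
```

In this run, `higher_order` printed "Unsupported sparse argument." and returned `None` instead of a W, so `w2.neighbors[0]` raised the `AttributeError`. The message suggests the function did not recognize the type of `w`, so checking `type(w)` is a reasonable first step when this happens.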
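Under the common shortest-path definition, observation j is a k-th order neighbor of i when the fewest contiguity steps needed to get from i to j is exactly k, so first-order neighbors are excluded from the second-order set. The hypothetical helper below sketches that definition with a breadth-first search over a neighbors dictionary; it does not depend on PySAL.

```python
from collections import deque


def higher_order_neighbors(neighbors, k):
    """Return, for each observation, its neighbors of exactly order k."""
    result = {}
    for start in neighbors:
        # Breadth-first search records the shortest number of steps to each node
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == k:
                continue  # no need to look beyond order k
            for nb in neighbors[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        result[start] = sorted(j for j, d in dist.items() if d == k)
    return result


# Rook neighbors of a 3 x 3 grid numbered row by row:
# 0 1 2
# 3 4 5
# 6 7 8
rook = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
        3: [0, 4, 6], 4: [1, 3, 5, 7], 5: [2, 4, 8],
        6: [3, 7], 7: [4, 6, 8], 8: [5, 7]}
second = higher_order_neighbors(rook, 2)
print(second[0])  # [2, 4, 6]
print(second[4])  # [0, 2, 6, 8]
```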

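The spatial lag of a variable is a new variable whose value at each location is the weighted sum of the original variable over that location's neighbors. With binary weights this is simply the sum of the neighbors' values; with a row-standardized W, where each row sums to 1, it is the weighted average of the neighbors' values.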

```python
# Calculate the spatial lag of median household income
import numpy as np

f = pysal.open("./data/stpete_cenacs_2014.dbf")
f.header  # column names of the attribute table
y = np.array(f.by_col['MEDHHINC'])
yl = pysal.lag_spatial(w, y)  # w is the binary queen W, so each value is a sum over neighbors
yl
```

```
array([  146763.,   166211.,   199561.,   306442.,   183476.,   352686.,
         616358.,   268633.,   392636.,   408417.,   160558.,   144656.,
         208949.,   256077.,   167465.,   350218.,   311834.,    97201.,
         273457.,   414432.,   112661.,   277598.,   201191.,   184138.,
         358053.,   166831.,   302095.,   346152.,   278698.,   289837.,
         334073.,   266484.,   241389.,   301873.,   185733.,   381341.,
         345456.,   495665.,   400397.,   125559.,   384452.,   261814.,
         291231.,   486758.,   291739.,   267592.,   133707.,   149572.,
          78086.,   272613.,   542617.,   579796.,   210776.,   244382.,
         168796.,   299722.,   348216.,   245197.,   325827.,   438478.,
         305547.,   183825.,   289119.,    81900.,   386661.,   444013.,
         269717.,   301594.,   322806.,   595313.,   299177.,   292400.,
         269888.,   502653.,   321707.,   146173.,   204626.,   192741.,
        1214896.,   242027.,   471751.,   166973., ...
```

```python
# Calculate the spatial lag with a row-standardized W
w.transform = 'r'  # each row now sums to 1; this changes w for all later cells
yr = pysal.lag_spatial(w, y)
yr
```

```
array([ 20966.14285714,  27701.83333333,  49890.25      ,  61288.4       ,
        30579.33333333,  70537.2       ,  77044.75      ,  53726.6       ,
        65439.33333333,  58345.28571429,  32111.6       ,  28931.2       ,
        29849.85714286,  42679.5       ,  41866.25      ,  50031.14285714,
        44547.71428571,  24300.25      ,  45576.16666667,  46048.        ,
        22532.2       ,  46266.33333333,  40238.2       ,  26305.42857143,
        51150.42857143,  41707.75      ,  50349.16666667,  49450.28571429,
        34837.25      ,  41405.28571429,  47724.71428571,  44414.        ,
        40231.5       ,  50312.16666667,  37146.6       ,  42371.22222222,
        49350.85714286,  70809.28571429,  66732.83333333,  41853.        ,
        48056.5       ,  43635.66666667,  58246.2       ,  54084.22222222,
        58347.8       ,  53518.4       ,  26741.4       ,  21367.42857143,
        26028.66666667,  38944.71428571,  77516.71428571,  96632.66666667,
        42155.2       , ...
```
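The two outputs above are related by the row sums of W: for example, the first row-standardized value, 20966.14, equals the first binary value, 146763, divided by 7, which is consistent with that observation having seven queen neighbors. The NumPy sketch below reproduces both calculations on a small, made-up four-observation weights matrix, treating the lag as the matrix product W y. It assumes a dense matrix in which every observation has at least one neighbor (an observation without neighbors would make the row standardization divide by zero).

```python
import numpy as np

# Hypothetical binary contiguity matrix for four observations
# (1 = neighbors, 0 = not neighbors; the diagonal is zero)
W = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=float)
y = np.array([10.0, 20.0, 30.0, 40.0])

# Binary weights: the lag is the sum of each observation's neighbors' values
lag_sum = W @ y

# Row-standardized weights: divide each row by its sum, so the lag
# becomes the average of the neighbors' values
W_r = W / W.sum(axis=1, keepdims=True)
lag_mean = W_r @ y

print(lag_sum)
print(lag_mean)
```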

## 3.3 Spatial Autocorrelation
#### 3.3.1 Overview
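Spatial autocorrelation describes the situation in which values observed at nearby locations are more similar (or less similar) to one another than would be expected if the values were distributed across space at random. It is often motivated by Waldo Tobler's oft-quoted First Law of Geography, which states that: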

>everything is related to everything else, but near things are more related than distant things.

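Spatial autocorrelation can be measured globally, as a single summary for the whole dataset, or locally, as a value for each observation. It can be positive (similar values cluster together) or negative (dissimilar values sit next to each other).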

<img src="./img/sa.jpg" width="600" height="300"/>


#### 3.3.2 Measures of Global Spatial Autocorrelation
PySAL provides a number of global spatial autocorrelation measures. We'll focus on arguably the most widely used one, Moran's I, named after Pat Moran. The statistic can be calculated using the following formula:

<img src="./img/moransI.png" width="300" height="300"/>

where ...
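In its standard form, with z_i = y_i - ȳ the deviation of observation i from the mean, w_ij the entries of the weights matrix (zero on the diagonal) and S0 the sum of all weights, Moran's I is I = (n / S0) · Σi Σj w_ij z_i z_j / Σi z_i². The NumPy sketch below computes this directly from a dense matrix, a simplification made for clarity. Under no spatial autocorrelation the expected value is -1/(n - 1), which is what `mi.EI` reports. Note that `pysal.Moran` row-standardizes W by default, so a hand calculation meant to match `mi.I` should use the row-standardized matrix.

```python
import numpy as np


def morans_i(y, W):
    """Global Moran's I for values y and a dense spatial weights matrix W."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    z = y - y.mean()  # deviations from the mean
    s0 = W.sum()      # sum of all weights
    return (y.size / s0) * (z @ W @ z) / (z @ z)


# Four observations in a line, 0 - 1 - 2 - 3, with binary contiguity
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

clustered = morans_i([1, 2, 3, 4], W)    # neighbors have similar values
alternating = morans_i([1, 0, 1, 0], W)  # neighbors have opposite values
print(clustered, alternating)
```

For this toy line the clustered pattern gives I = 1/3 (positive autocorrelation) and the alternating pattern gives I = -1 (perfect negative autocorrelation).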

Let's calculate Moran's I for median household income values in St. Petersburg, FL using PySAL.

```python
# Calculate Moran's I for a variable and a weights matrix
mi = pysal.Moran(y, w, two_tailed=False)  # one-tailed inference
mi.I
# mi.EI  # expected value of I under no spatial autocorrelation
```

```
0.33661606822377693
```

```python
# Calculate the pseudo p-value for Moran's I from random permutations
np.random.seed(12345)  # make the permutations reproducible
mir = pysal.Moran(y, w, permutations=9999)
mir.p_sim
```

```
0.0001
```
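The pseudo p-value comes from a permutation test: the observed values are randomly reassigned to locations many times, Moran's I is recomputed for each shuffle, and the observed statistic is compared with that reference distribution. With R permutations, of which M give a statistic at least as large as the observed one, the pseudo p-value is (M + 1) / (R + 1). With R = 9999 and M = 0 this gives 1/10000 = 0.0001, the smallest attainable value, consistent with the output above. The sketch below is a minimal upper-tail version with a hypothetical helper `permutation_p_value`; PySAL's own implementation may handle the tails differently.

```python
import numpy as np


def morans_i(y, W):
    """Global Moran's I for values y and a dense weights matrix W."""
    z = y - y.mean()
    return (y.size / W.sum()) * (z @ W @ z) / (z @ z)


def permutation_p_value(y, W, permutations=999, seed=12345):
    """Upper-tail pseudo p-value for Moran's I by random relabelling."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = morans_i(y, W)
    # Shuffle the values across locations and recompute I each time
    sims = np.array([morans_i(rng.permutation(y), W) for _ in range(permutations)])
    m = int((sims >= observed).sum())  # permuted statistics at least as large
    return observed, (m + 1) / (permutations + 1)


# A line of 20 observations, each touching the next one
n = 20
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = np.arange(n, dtype=float)  # smoothly increasing values: strongly clustered

observed, p = permutation_p_value(y, W, permutations=999)
print(round(observed, 4), p)
```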

> *Show Moran's I in GeoDa*

#### 3.3.3 Local Spatial Autocorrelation
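Local indicators of spatial autocorrelation (LISA) measure spatial autocorrelation at each individual observation, which makes it possible to locate clusters of similar values and spatial outliers rather than only summarizing the whole dataset.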
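One common form of the local statistic (Anselin's local Moran) is I_i = z_i Σj w_ij z_j / m2, where z_i = y_i - ȳ, W is row-standardized and m2 = Σ z_i² / n. A positive I_i means observation i resembles its neighbors (a high value among high values, or a low value among low values); a negative I_i flags a spatial outlier. With row-standardized weights the local values average to the global Moran's I. The NumPy sketch below assumes a dense W in which every observation has at least one neighbor; PySAL's `Moran_Local` may scale the values by a slightly different constant.

```python
import numpy as np


def local_morans_i(y, W):
    """Local Moran's I for each observation (Anselin-style form)."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    W_r = W / W.sum(axis=1, keepdims=True)  # row-standardize; assumes no islands
    z = y - y.mean()
    m2 = (z @ z) / y.size                   # scaling term: mean squared deviation
    return z * (W_r @ z) / m2               # z_i times the average neighbor deviation


# Four observations in a line, 0 - 1 - 2 - 3, with binary contiguity
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
y = np.array([1.0, 2.0, 3.0, 4.0])
Is = local_morans_i(y, W)
print(Is)

# With row-standardized weights, the mean of the local values equals global Moran's I
W_r = W / W.sum(axis=1, keepdims=True)
z = y - y.mean()
global_i = (z @ W_r @ z) / (z @ z)  # n / S0 equals 1 after row standardization
print(Is.mean(), global_i)
```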

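```python
# Calculate local Moran's I (LISA) for each observation
lm = pysal.Moran_Local(y, w)
lm.n        # number of observations
len(lm.Is)  # one local statistic per observation
# lm.p_sim  # pseudo p-value for each observation
lm.Is
```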

array([  1.18765140e+00,   9.34563357e-01,   8.44806031e-03,
        -1.93368834e-01,   7.22973914e-01,   1.18268986e+00,
         2.53963372e+00,  -2.16855935e-01,   4.21878455e-01,
        -3.53743686e-01,   2.94886358e-01,   8.56554404e-01,
         3.45131893e-01,   2.06155011e-01,   1.22118296e-01,
         3.82878667e-02,  -7.06033910e-02,   1.74910984e+00,
         4.34361984e-02,  -5.98533352e-03,   1.40233774e+00,
         2.43106110e-03,  -2.37491681e-01,   9.01101224e-01,
        -6.47838589e-03,  -4.49042113e-02,  -2.50510597e-02,
         3.05115672e-02,   6.80355820e-01,   1.72919456e-01,
        -4.31957876e-03,   2.28839923e-02,  -4.88228594e-02,
         4.50618088e-02,   1.53197312e-01,   1.81549311e-01,
         2.83476150e-03,   1.83965373e+00,   1.87716048e+00,
         4.10865352e-01,   6.70531262e-04,   4.79789773e-02,
         4.70265368e-02,   3.13831239e-01,   2.21060912e-01,
         8.59479509e-02,   1.11301603e+00,   1.16335530e+00,
         1.55586770e+00,