# Spatial Autocorrelation with PySAL

## Local Moran's I exercise (Local Indicators of Spatial Association, or LISAs) 

Using this Jupyter notebook, you will code the same process that we used in GeoDa to run the Univariate Local Moran's I to identify local hot/cold spots in the data.

Adapted from https://pysal.readthedocs.org/en/latest/users/tutorials/autocorrelation.html#local-moran-s-i
Information also available on pg 39 in PySAL_Documentation.pdf included with the SDS Bootcamp materials.


### Task #1: read data and create variables

#### First, import the needed Python components - all scripting in Python begins with import

In [None]:
# PySAL and NumPy are the only ones needed to actually run the spatial autocorrelation analysis
import pysal
import numpy as np

# Folium is used to create some map visualizations
import folium

# These others are to handle, query, and plot data
import os
import pandas as pd
import geopandas as gpd
import json
import simpledbf
%matplotlib inline
import matplotlib.pyplot as plt

# This message below will print after the commands above are successfully completed
print ('All requested Python libraries imported successfully')

#### To calculate Moran's I, we need to give it a list of neighbors. We can do this by reading in a spatial weights matrix.
Remember that PySAL likes the GAL file format, which we created in the GeoDa exercise. 

This file can be converted to an ArcGIS Spatial Weights Matrix (SWM) file, and vice versa.
PySAL can read the GAL file as follows: counties = pysal.open('path/to/file/called/file.gal').read()

#### Alternatively, and much easier, PySAL can read the neighbors directly from a shapefile

In [None]:
# Instead of reading in the .gal file we created in GeoDa, we will ask PySAL to create one from the shapefile.
counties = pysal.queen_from_shapefile('/home/ubuntu/Documents/Counties/cnty_Lyme_disease.shp')

# This message below will print after the command above is successfully completed
print ('New weights file successfully created')

#### The queen_from_shapefile function defines neighbors using the queen contiguity criterion: a location's neighbors are all areas that share at least one boundary point with it, whether that is a full edge or a single corner

#### Another option is a rook weights matrix, in which neighbors must share a border segment (i.e. a line connecting two vertices), so areas that touch only at a corner are not neighbors
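
To make the difference between the two criteria concrete, here is a minimal sketch on a regular 3 x 3 grid of square cells. It is a toy stand-in for polygon contiguity, not PySAL's implementation, and the function name `grid_neighbors` is made up for this illustration.

```python
def grid_neighbors(n_rows, n_cols, criterion="queen"):
    """Return {cell_id: sorted neighbor ids} for a regular grid of square cells.

    Cells are numbered row by row starting at 0.
    """
    # Rook: cells that share an edge
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    # Queen: also cells that share only a corner
    if criterion == "queen":
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            ids = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                # Skip positions that fall outside the grid
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    ids.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = sorted(ids)
    return neighbors


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")

print(rook[4])   # neighbors of the centre cell under rook contiguity
print(queen[4])  # neighbors of the centre cell under queen contiguity
```

The centre cell has four rook neighbors and eight queen neighbors, so the choice of criterion changes which values feed into each location's spatial lag.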

In [None]:
# Next, let's read in the dbf that contains data for the counties
# http://www.pysal.org/users/tutorials/fileio.html

table = pysal.open('/home/ubuntu/Documents/Counties/cnty_Lyme_disease.dbf')

# This message below will print after the command above is successfully completed
print ('DBF file successfully imported')

# Return the column headers from dbf
table.header

In [None]:
# Next, specify which column contains the variable of interest
# Notice that we are using the array function from numpy, which we named np during our import
# This array will contain the data from the column called 2005

lymecases = np.array(table.by_col['2005'])

# This message below will print after the command above is successfully completed
print ('A variable called lymecases successfully created')

### Task #2: complete a single run of Local Moran's I to identify local hot/cold spots in the data

In [None]:
# Using the function examples below, update the parameters to run Local Moran's I on the year 2005 data

# In the online tutorial, the function reads as follows: lm = pysal.Moran_Local(y,w)
# In their example, y = array containing homicide rates and w = spatial weights variable for the neighbors

# Another example could be something like: lm = pysal.Moran_Local(crimeindex, blockgroups)
# In this example, crimeindex is the array containing a crime index and blockgroups is the spatial weights variable 

# We also want to run multiple random permutations to build the reference distribution
# Hint: lm = pysal.Moran_Local(y, w, permutations=1)   (replace 1 with the number of permutations you want, e.g. 999)
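
As background for what the function returns, the sketch below computes a local Moran's I value for each area by hand on toy data. It uses one common textbook form with row-standardised weights; the scaling constant PySAL applies may differ, and the data, the neighbor dictionary, and the function name `local_moran` are all hypothetical.

```python
def local_moran(values, neighbors):
    """Local Moran's I for each observation with row-standardised weights.

    Textbook form: I_i = (z_i / m2) * sum_j(w_ij * z_j), where z are
    deviations from the mean and m2 = sum(z**2) / n.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n

    result = []
    for i in range(n):
        ids = neighbors[i]
        # Row standardisation: each neighbor gets weight 1 / (number of neighbors),
        # so the spatial lag is the average deviation of the neighbors
        lag = sum(z[j] for j in ids) / len(ids)
        result.append(z[i] * lag / m2)
    return result


# Hypothetical toy data: five areas in a row, high counts clustered at one end
cases = [10, 9, 5, 1, 0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

local_i = local_moran(cases, chain)
for area, value in enumerate(local_i):
    print(area, round(value, 3))
```

Areas whose value and neighbors deviate from the mean in the same direction get a positive local I; an area sitting exactly at the mean gets zero.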



In [None]:
# Again, examine the help to learn more about the outputs from your function
#help(lm)

In [None]:
# Which attribute of lm can you use to see the actual observed Local Moran's I values (LISAs)?
# Hint: lm.attributename


In [None]:
# Which attribute of lm can you use to see the statistical significance of each observed I value compared with its simulated I values?
# Hint: lm.attributename
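
The idea behind a permutation-based significance value can be sketched in plain Python. This is a simplified, one-sided version written for illustration only: it is not PySAL's code, PySAL's own calculation may treat the two tails differently, and the names `pseudo_p_value`, `cases`, and `chain` are hypothetical.

```python
import random


def pseudo_p_value(values, neighbors, i, permutations=999, seed=0):
    """One-sided pseudo p-value for observation i under conditional permutation."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    k = len(neighbors[i])

    # Observed cross-product of z_i with the average of its actual neighbors
    observed = z[i] * sum(z[j] for j in neighbors[i]) / k

    # Hold z_i fixed and draw k random "neighbors" from the remaining values
    others = [z[j] for j in range(n) if j != i]
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        simulated = z[i] * sum(rng.sample(others, k)) / k
        if simulated >= observed:
            at_least_as_large += 1

    # The +1 terms count the observed arrangement as one of the permutations
    return (at_least_as_large + 1) / (permutations + 1)


# Hypothetical toy data: five areas in a row
cases = [10, 9, 5, 1, 0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

p_first = pseudo_p_value(cases, chain, 0)
print(p_first)
```

In this toy chain only one of the four remaining values reproduces the observed cross-product for the first area, so its pseudo p-value lands close to 0.25. With so few areas no location can look strongly significant, which is one reason real analyses need many observations and many permutations.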


In [None]:
# Which attribute of lm can you use to check the cluster type for each LISA?
# Hint: lm.attributename
# Remember: 1 = High/High, 2 = Low/High, 3 = Low/Low, 4 = High/Low
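
The four cluster types come from the signs of two numbers: a location's own standardised value and the average standardised value of its neighbors (the spatial lag). A minimal sketch of that lookup, using the numbering listed above and a hypothetical helper called `quadrant`:

```python
def quadrant(z_value, lag_value):
    """Return 1 (High/High), 2 (Low/High), 3 (Low/Low) or 4 (High/Low).

    The first word describes the location itself, the second its neighbors.
    """
    high = z_value > 0
    high_neighbors = lag_value > 0
    if high and high_neighbors:
        return 1
    if not high and high_neighbors:
        return 2
    if not high and not high_neighbors:
        return 3
    return 4


# Hypothetical (own value, neighbor average) pairs, both measured relative to the mean
examples = [(1.2, 0.8), (-0.5, 0.8), (-0.5, -0.3), (1.2, -0.3)]
labels = [quadrant(z, lag) for z, lag in examples]
print(labels)
```

Note that the quadrant label on its own says nothing about significance; a location can fall in the High/High quadrant purely by chance.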


### Task #3: check for significant Local Moran's I values for the LISAs

In [None]:
# Hint: create a new variable and make it equal to a boolean statement regarding the p value being less than a certain value
# Hint: variablename = lm.attributename<value
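
For reference, this is how a comparison behaves on a NumPy array. The p-values and quadrant labels below are made-up toy numbers, not output from the county data, and the variable names are only for illustration.

```python
import numpy as np

# Hypothetical pseudo p-values and quadrant labels for five areas
p_values = np.array([0.001, 0.200, 0.049, 0.050, 0.730])
quadrants = np.array([1, 2, 3, 4, 1])

# Comparing an array with a number is applied element by element,
# giving one True/False value per area
sig = p_values < 0.05

# Keep the quadrant label where the area is significant, otherwise use 0
sig_quadrants = np.where(sig, quadrants, 0)

print(sig)
print(sig_quadrants)
```

Because the comparison is strict, an area with a p-value of exactly 0.05 is not flagged; whether to use `<` or `<=` is a choice to make deliberately.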



### Task #4: create a csv file of Local Moran's I output results for all counties

In [None]:
# Create a variable that will hold the county identifier, similar to how you made an array of the 2005 values
# Hint: variablename = np.array(table.by_col['NameofColumninDBF'])



In [None]:
# Next, use numpy's savetxt function to save the LISAs to a csv along with the county list
# This csv can be joined to the county shapefile to map the LISAs and highlight hot/cold spots 
# Uncomment the following lines and fill them in as needed to export to csv
# Be sure to include all the output variables you explored previously; there should be 3 plus the one you created based on output values

# Hint: np.savetxt("filename.csv", np.column_stack((countylist, lm.attributename, lm.attributename, lm.attributename, variableyoucreated)), delimiter=',', fmt="%s")
# print('a message to yourself that the file is ready')
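
The stacking-and-saving pattern can be tried on toy data first. The sketch below uses made-up arrays and writes to an in-memory buffer instead of a file so that the result can be printed straight away. Converting the numeric columns to text explicitly is a choice made for this sketch, to keep the stacking step unambiguous.

```python
import io

import numpy as np

# Hypothetical stand-ins for the county list and two result columns
names = np.array(["Alpha", "Beta", "Gamma"])
scores = np.array([1.5, -0.25, 0.0])
flags = np.array([True, False, False])

# column_stack places the 1-D arrays side by side as columns.
# An array holds a single data type, so everything is stored as text here,
# which is why fmt="%s" is used when saving.
stacked = np.column_stack((names, scores.astype(str), flags.astype(str)))

buffer = io.StringIO()  # in-memory file; pass a filename instead to write to disk
np.savetxt(buffer, stacked, delimiter=",", fmt="%s")

csv_text = buffer.getvalue()
print(csv_text)
```

Each row of the stacked array becomes one line of the csv, with the columns in the order they were passed to `column_stack`.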




## Bonus Exercises for Local Moran's I exercise

### Bonus #1: add the original lyme disease counts to the csv file of Local Moran's I output results for all counties

In [None]:
# Hint: use np.savetxt from Task #4 and add a new parameter to the column_stack function
# Hint: examine the help to learn more about the outputs from your function
# Hint: help(lm)  --> which output contains the original data? 
# Hint: you also previously created a variable that contains this info



### Bonus #2: add a header to the csv file of Local Moran's I output results for all counties

In [None]:
# Hint: http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.savetxt.html
# Hint: np.savetxt("filename.csv", np.column_stack((countylist, lm.attributename, lm.attributename, lm.attributename, sig)), delimiter=',', fmt="%s", header='FirstColumnName,SecondColumnName,etc', comments='')
# Hint: set the comments parameter to an empty string (or check out what happens if you don't)
# print('a message to yourself that the file is ready')
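
To see what the `comments` parameter does to the header line, the following sketch saves the same toy table twice to in-memory buffers. The data and the helper name `to_csv_text` are hypothetical.

```python
import io

import numpy as np

# Hypothetical two-column table, already stored as text
rows = np.array([["Alpha", "1.5"], ["Beta", "-0.25"]])


def to_csv_text(**savetxt_options):
    """Save the toy table to an in-memory buffer and return the text."""
    buffer = io.StringIO()
    np.savetxt(buffer, rows, delimiter=",", fmt="%s", **savetxt_options)
    return buffer.getvalue()


default_header = to_csv_text(header="county,value")
clean_header = to_csv_text(header="county,value", comments="")

print(default_header.splitlines()[0])  # header line with the default comment prefix
print(clean_header.splitlines()[0])    # header line with the prefix switched off
```

By default `np.savetxt` puts `# ` in front of the header, which would end up inside the first column name when the csv is joined to other data.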




## Advanced Bonus Exercises for Local Moran's I exercise

### Advanced Bonus #1: use a loop to run Local Moran's I for other years of data (2006 to 2014)

In [None]:
# Note: these advanced bonus exercises may be challenging for new Python programmers 
# Hint: review Advanced Bonus Exercise #1 for Global Moran's I
# Hint: create a list of the years to begin


In [None]:
# Hint: check out Lists as an iterable on: https://wiki.python.org/moin/ForLoop 
# Hint: review Advanced Bonus Exercise #1 for Global Moran's I



### Advanced Bonus #2: expand your loop to create a different csv for each year

In [None]:
# Note: these advanced bonus exercises are challenging for new Python programmers
# Hint: you will need to combine Task #4 (or Bonus #1/#2) and Advanced Bonus #1
# Hint: your loop will need to include the csv name based on the year of analysis
# Hint: create a variable for the csvname and append the year to the name in the loop
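
One possible way to build a separate file name for each year is sketched below. The `lisa_results_` prefix is a made-up example, and the sketch assumes the dbf columns are named by year in the same way as the '2005' column.

```python
# Years to analyse, stored as strings so they can double as column names
years = [str(year) for year in range(2006, 2015)]

csv_names = []
for year in years:
    # Hypothetical naming pattern: a fixed prefix plus the year
    csv_name = "lisa_results_" + year + ".csv"
    csv_names.append(csv_name)
    # In the real loop, the analysis and the export for this year would go here

print(csv_names[0], csv_names[-1])
```

Note that `range(2006, 2015)` stops before 2015, so the last year in the list is 2014.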



## End of Local Moran's I exercise

### Other options to continue your exploration:
#### You can run Global or Local Moran's I on other datasets for crime in the Bonus Data folder
#### You can explore the new csv in QGIS and join with the county shapefile to map out the Local Moran's I values. If you need help doing a table join in QGIS, check out this easy tutorial: http://www.qgistutorials.com/en/docs/performing_table_joins.html 


#### Advanced Python users: maybe continue to explore pandas. See if you can figure out how to combine your multi-year output to a single pandas dataframe:  http://pandas.pydata.org/pandas-docs/stable/10min.html 

In [None]:
# Hint: check out Lists as an iterable on: https://wiki.python.org/moin/ForLoop 
# Hint: review Advanced Bonus Exercise #1 for Global Moran's I
# Hint: we can create a separate data frame for each year and then concatenate all the years into one data frame
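
The concatenation step can be sketched on toy data as follows. The area names, p-values, and column names are all made up; in the exercise the per-year values would come from your own loop.

```python
import pandas as pd

# Hypothetical per-year pseudo p-values for three areas
names = ["Alpha", "Beta", "Gamma"]
p_by_year = {
    "2011": [0.30, 0.04, 0.60],
    "2012": [0.01, 0.20, 0.03],
}

frames = []
for year, p_values in p_by_year.items():
    frame = pd.DataFrame({"county": names, "p_sim": p_values})
    frame["year"] = year  # remember which year each row came from
    frames.append(frame)

# Stack the yearly frames on top of each other into one data frame
combined = pd.concat(frames, ignore_index=True)

# Rows from 2012 with a pseudo p-value below 0.05
significant_2012 = combined[(combined["year"] == "2012") & (combined["p_sim"] < 0.05)]
print(significant_2012)
```

Keeping the year as its own column (a "long" layout) makes it easy to filter by year or by county afterwards.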


In [None]:
# We can play with this data frame a little bit

# For example: which counties in 2012 have significant LISA?


### Or just for fun: explore some advanced visualizations below

In [None]:
# Create a map of the P Value for each LISA value for 2005

# To use Folium for map visualizations, we convert the shapefile to GeoJSON

# We can use GeoPandas for this conversion
# First, read in the shapefile to geopandas
shapefile = gpd.read_file('/home/ubuntu/Documents/Counties/viz/cnty_Lyme_disease_WGS84.shp').set_index('NAME_PCASE')

# Next, save the file out to GeoJSON
with open('/home/ubuntu/Documents/Counties/viz/cnty_Lyme_disease_WGS84.geojson', 'w') as f:
    f.write(shapefile.to_json())

# This message below will print after the commands above are successfully completed
print ('Successfully converted shapefile to geojson')

# Use simpledbf to read the attribute table (dbf file)
dbf = simpledbf.Dbf5('/home/ubuntu/Documents/Counties/cnty_Lyme_disease.dbf')

# Path to the GeoJSON file created in the previous step
# (named counties_geojson so it does not overwrite the spatial weights variable called counties)
counties_geojson = '/home/ubuntu/Documents/Counties/viz/cnty_Lyme_disease_WGS84.geojson'

# Convert dbf file to a pandas data frame
df = dbf.to_dataframe()

# Store the quadrant values in a variable called q
q = lm.q

# Store the p values in a data frame column called p
df["p"] = lm.p_sim

# Change the quadrant values of the counties whose p values are not significant to 0 for convenience of graphing
q = [q[i] if lm.p_sim[i] < 0.0501 else 0 for i in range(len(q))]

# Update the dataframe with the new quadrant values 
df["q"] = q

# Load the GeoJSON data
with open(counties_geojson) as geojson_file:
    geo_json_data = json.load(geojson_file)

# Assign the p values and q values to the GeoJSON features
# (this relies on the features being in the same order as the rows of the dbf)
for i in range(len(df)):
    geo_json_data["features"][i]["properties"]["q"] = str(df["q"][i])
    geo_json_data["features"][i]["properties"]["p"] = str(df["p"][i])

# Create an empty Folium map
m = folium.Map([37, -122], zoom_start=6)

# Create a map based on significance (p-values)
def p_fill_color(p):
    # Darker greens mark smaller p values; grey marks counties that are not significant
    if p <= 0.0001:
        return '#006400'
    elif p <= 0.001:
        return '#008000'
    elif p <= 0.01:
        return '#228B22'
    elif p < 0.05001:
        return '#00FF00'
    return '#808080'

folium.GeoJson(
    geo_json_data,
    style_function=lambda feature: {
        'fillColor': p_fill_color(float(feature['properties']['p'])),
        'color': 'black',
        'weight': 2,
        'dashArray': '5, 5',
        'fillOpacity': 0.6
    }
).add_to(m)

# Save the map as html
m.save(os.path.join("/home/ubuntu/Documents/Counties/viz/p_map.html"))

# Print the map to screen
m

In [None]:
# We can extend the code in the previous cell to create a second map for 2005 
# showing the quadrant type of each county whose p-value is significant

# Counties with non significant p values are in grey
# Counties with significant p values are in different colors depending on quadrant type
# 1 = High/High in Red
# 2 = Low/High in Light Blue
# 3 = Low/Low in Dark Blue
# 4 = High/Low in Pink

# Create an empty Folium map
n = folium.Map([37, -122], zoom_start=6)

# Create a map of the quadrant labels based on significance (p-values)
# Each quadrant label maps to a fill color; 0 (not significant) falls back to grey
quadrant_colors = {1: 'darkred', 2: 'lightblue', 3: 'darkblue', 4: 'pink'}

folium.GeoJson(
    geo_json_data,
    style_function=lambda feature: {
        'fillColor': quadrant_colors.get(int(feature['properties']['q']), 'gray'),
        'color': 'black',
        'weight': 2,
        'dashArray': '5, 5',
        'fillOpacity': 0.6
    }
).add_to(n)

# Save the map as html
n.save(os.path.join("/home/ubuntu/Documents/Counties/viz/q_map.html"))

# Print the map to screen
n