# Getting Started with Spatial Data Science in Python
### University of Minnesota Day of Data
Bryan C. Runck // runck014@umn.edu // Department of Geography, Environment and Society

**Overview**
How can we use Python to do spatial data science? This jam session will provide a hands-on overview of basic mapping in Python with GeoPandas and how to perform basic spatial analysis using PySAL. No programming experience is required.

## Objectives
1. Make simple maps with [GeoPandas](http://geopandas.org) and AirBnB data
    - Data I/O
    - Make choropleth maps
    - Make scatterplots
    - Rate mapping
    - Recognize the importance of projections
2. Perform an exploratory visual analysis of the data to identify potential places where you would want to book an AirBnB stay
3. Use [PySAL](http://pysal.readthedocs.io/en/latest/) to compute global spatial autocorrelation 
    - Constructing spatial weights
    - Moran's I (Global)
    - Visually check result
4. Use Moran's I to determine which AirBnB variables have high levels of spatial autocorrelation
    


# Table of Contents

1. [Motivation for spatial data science](https://docs.google.com/presentation/d/1_RuL1EHp7sOn5yLnCuBqRWW8eGo-Z8YemyEzjS2KpXU/edit?usp=sharing) (link to slides)
2. [Getting spatial data](#get_data)
3. [Data exploration](#esda)
4. [Basic spatial analysis](#bda)



<a id='get_data'></a>
# Getting Spatial Data

In [None]:
'''
wget is a Linux tool; Jupyter runs shell commands prefixed with the ! character; https://www.gnu.org/software/wget/
Download a file from the Minnesota Geospatial Commons
'''
#!wget ftp://ftp.gisdata.mn.gov/pub/gdrs/data/pub/us_mn_state_dnr/bdry_dnr_wildlife_mgmt_areas_pub/shp_bdry_dnr_wildlife_mgmt_areas_pub.zip

In [None]:
'''
unzip is a Linux tool for extracting zip archives; it may need to be installed on your system
'''
#!unzip shp_bdry_dnr_wildlife_mgmt_areas_pub.zip

In [None]:
'''
Linux command:
ls lists files; -l selects the long listing format;
*.shp matches every file in the current directory whose name ends in .shp
.shp is a common spatial data format (the shapefile); it ships with companion files such as .shx, .dbf, and usually .prj
https://en.wikipedia.org/wiki/Shapefile#Shapefile_shape_index_format_(.shx)
'''
#!ls -l *.shp
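
On systems without `wget` or `unzip`, the unzip and list steps can be approximated with Python's standard library. This is a minimal sketch, not part of the original session: the helper name `extract_shapefiles` and the `data` destination folder are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def extract_shapefiles(zip_path, dest_dir):
    """Unzip an archive and return the sorted paths of the .shp files inside it."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Rough equivalent of `ls *.shp`, but searched recursively under dest.
    return sorted(dest.rglob("*.shp"))
```

Once the zip file has been downloaded, calling `extract_shapefiles('shp_bdry_dnr_wildlife_mgmt_areas_pub.zip', 'data')` should return the path of each shapefile that was extracted.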

<a id='esda'></a>
# Spatial Data Exploration

In [None]:
import geopandas as gpd

## Import Data

AirBnB listings in Chicago: the data set comes from Luc Anselin's spatial data science group at the University of Chicago.

**Metadata can be found [here](https://geodacenter.github.io/data-and-lab//airbnb_Chicago-2015/).**

In [None]:
chicago_bnb = gpd.read_file('data/airbnb_Chicago 2015.shp')

In [None]:
#check import to make sure it looks OK
chicago_bnb.head()

In [None]:
chicago_bnb.crs

^^ crs = coordinate reference system; it ties the coordinates stored in the geometry column to real locations, which is what makes the spatial operations work correctly; see [wikipedia](https://en.wikipedia.org/wiki/Spatial_reference_system) for a nice overview. EPSG = European Petroleum Survey Group, whose code registry is a prominent way of identifying spatial reference systems; 4326 is the code for latitude/longitude coordinates on WGS84, a global datum (i.e., a model of how points in GIS are connected to real places on the earth).
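
To see why projections matter, note that latitude/longitude coordinates are angles, not distances. The sketch below is an illustration added here, not part of the original session: it uses the haversine formula on a spherical Earth (radius 6371 km, an approximation) to compare the length of one degree of longitude at the equator with one degree at roughly Chicago's latitude. The coordinates are illustrative values, not read from the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude at the equator versus at roughly Chicago's latitude.
at_equator = haversine_km(0.0, 0.0, 1.0, 0.0)
at_chicago = haversine_km(-88.0, 41.88, -87.0, 41.88)
print(f"equator: {at_equator:.1f} km, Chicago: {at_chicago:.1f} km")
```

Under these assumptions a degree of longitude is about 111 km at the equator but only about 83 km near Chicago, so distances and areas computed directly on unprojected degrees are distorted.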

## Summarizing and Basic Plotting

In [None]:
# describe() works just as it does in pandas
chicago_bnb.describe()

In [None]:
#%matplotlib inline #tells matplotlib to print to Jupyter
chicago_bnb.plot()

In [None]:
print(chicago_bnb['community'][0])
chicago_bnb['geometry'][0]

In [None]:
chicago_bnb['geometry'][0:10]

In [None]:
chicago_bnb['geometry'][0:10].plot()

## Choropleth Map
The histogram of maps

In [None]:
# columns that could be mapped
print(chicago_bnb.columns.values)

In [None]:
chicago_bnb.plot(column='price_pp', cmap='magma')

***Wait, where is the legend?***

In [None]:
########## HACK ############ 
# adapted from https://stackoverflow.com/questions/36008648/colorbar-on-geopandas
## add a color bar
## colormap options: https://matplotlib.org/users/colormaps.html
from matplotlib import pyplot as plt

# add a colorbar whose scale runs from the column's minimum to its maximum
def add_color_bar(map_object, variable_column, cmap_string):
    vmin, vmax = variable_column.min(), variable_column.max()
    fig = map_object.get_figure()
    # [left, bottom, width, height] in figure coordinates; adjust to taste
    cax = fig.add_axes([0.9, 0.1, 0.03, 0.8])
    sm = plt.cm.ScalarMappable(cmap=cmap_string, norm=plt.Normalize(vmin=vmin, vmax=vmax))
    # give the scalar mappable an empty array so that colorbar() accepts it
    sm.set_array([])
    fig.colorbar(sm, cax=cax)

In [None]:
chic_price_pp = chicago_bnb.plot(column='price_pp', cmap='magma')
add_color_bar(chic_price_pp, chicago_bnb['price_pp'], 'magma')

# Scatterplots

In [None]:
plt.scatter(x=chicago_bnb['rev_rating'], y=chicago_bnb['response_r'])

# Comparing Multiple Scatterplots with Seaborn
The goal is to identify interesting relationships that could guide exploratory mapping.

In [None]:
import seaborn as sns

In [None]:
list(chicago_bnb.columns.values)

In [None]:
sns.pairplot(chicago_bnb[['response_r','accept_r', 'price_pp']])

## Rate Mapping

Raw counts are not comparable across jurisdictions. Saying that there were 100 homicides in one US county and 10 in another makes it seem like the county with 100 is far more dangerous, until the counts are converted into rates by placing total population in the denominator.
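
A small worked example makes the point concrete. The populations below are hypothetical values chosen for illustration; only the homicide counts (100 and 10) come from the text above.

```python
# Hypothetical populations for two counties (illustrative values only).
areas = {
    "A": {"homicides": 100, "population": 1_000_000},
    "B": {"homicides": 10, "population": 50_000},
}

PER = 100_000  # report the rate per 100,000 residents

rates = {
    name: row["homicides"] * PER / row["population"]
    for name, row in areas.items()
}
print(rates)  # {'A': 10.0, 'B': 20.0}
```

With these made-up populations, the county with one tenth of the homicides has twice the homicide rate, which is the pattern a raw-count map would hide.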

In [None]:
num_thefts_map = chicago_bnb.plot(column='num_theft', cmap='magma')
add_color_bar(num_thefts_map, chicago_bnb['num_theft'], 'magma')

In [None]:
chicago_bnb['thefts_per_capita'] = chicago_bnb['num_theft']/chicago_bnb['population']

In [None]:
per_person_thefts_map = chicago_bnb.plot(column='thefts_per_capita', cmap='magma')
add_color_bar(per_person_thefts_map, chicago_bnb['thefts_per_capita'], 'magma')

In [None]:
chicago_bnb['crimes_per_capita'] = chicago_bnb['num_crimes']/chicago_bnb['population']
num_crimes_map = chicago_bnb.plot(column='crimes_per_capita', cmap='magma')
add_color_bar(num_crimes_map, chicago_bnb['crimes_per_capita'], 'magma')

# Activity: Identify Three Potential Places in Chicago Where You Would Want to Stay
You and a friend are planning to head to Chicago on a budget. You want to identify the top three communities to look for an AirBnB in. You’ve been provided with a dataset to aid you in your decision-making.

Utilize the basic ideas we explored related to mapping to:
1. Identify three potential communities where you would want to stay
2. Make a map with these three communities highlighted
3. **Challenge:** create a linear combination of variables to create an index score of where you would want to stay. For example, the value of community to you, $v(community)$, could be modeled as:

$ v(community) = weight_1 * norm(price_{pp}) + weight_2 * norm(accept_r) + ... + weight_n * norm(variable_n)$

Variables would need to be normalized, and subjective weights can be assigned based on what you personally value.
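
One possible way to approach the challenge is sketched below, assuming min-max scaling as the normalization (other choices, such as z-scores, would also work). The helper names `min_max` and `index_score`, the column values, and the weights are all hypothetical; with the real data the lists would come from columns of `chicago_bnb`.

```python
def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def index_score(columns, weights):
    """Weighted sum of min-max normalized columns, one score per row."""
    normed = {name: min_max(vals) for name, vals in columns.items()}
    n_rows = len(next(iter(columns.values())))
    return [
        sum(weights[name] * normed[name][i] for name in columns)
        for i in range(n_rows)
    ]


# Hypothetical values for three communities (not taken from the dataset).
columns = {
    "price_pp": [50.0, 100.0, 150.0],
    "accept_r": [80.0, 90.0, 100.0],
}
# A negative weight penalizes high prices; a positive weight rewards acceptance.
weights = {"price_pp": -1.0, "accept_r": 0.5}
scores = index_score(columns, weights)
print(scores)  # [0.0, -0.25, -0.5]
```

The sign of each weight encodes whether more of a variable is good or bad for you, and its size encodes how much you care; the community with the highest score is the preferred one under those subjective choices.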


<a id='bda'></a>
# Basic Spatial Data Analysis

In [None]:
import pysal
import numpy as np

## Constructing spatial weights

There are three different types of spatial weights:
1. Contiguity Based Weights
2. Distance Based Weights
3. Kernel Weights

This demonstration constructs only contiguity-based weights, namely queen and rook. Here is a [link](http://pysal.readthedocs.io/en/latest/users/tutorials/weights.html#pysal-spatial-weight-types) to the documentation with many more.
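
The difference between rook and queen contiguity is easiest to see on a regular grid: rook neighbors share an edge, while queen neighbors share an edge or a corner. The sketch below is a toy illustration in plain Python, not PySAL's implementation; `grid_neighbors` and `pct_nonzero` are hypothetical helpers, and this `pct_nonzero` returns a percentage (whether PySAL reports a fraction or a percentage may depend on the installed version).

```python
def grid_neighbors(n_rows, n_cols, queen=False):
    """Map each cell id of a regular grid to the ids of its contiguous cells.

    Rook contiguity: cells sharing an edge. Queen: sharing an edge or a corner.
    """
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors[r * n_cols + c] = [
                (r + dr) * n_cols + (c + dc)
                for dr, dc in steps
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
    return neighbors


def pct_nonzero(neighbors):
    """Percentage of nonzero entries in the implied n x n weights matrix."""
    n = len(neighbors)
    return 100.0 * sum(len(v) for v in neighbors.values()) / (n * n)


rook = grid_neighbors(3, 3)
queen = grid_neighbors(3, 3, queen=True)
print(sorted(rook[4]), sorted(queen[4]))  # neighbors of the center cell
print(pct_nonzero(rook), pct_nonzero(queen))
```

On a 3 x 3 grid the center cell has four rook neighbors and eight queen neighbors, so the queen weights matrix is denser. The same comparison applies to the irregular Chicago community polygons.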

In [None]:
chicago_weights = pysal.weights.Rook.from_dataframe(chicago_bnb)

In [None]:
print("%.4f"%chicago_weights.pct_nonzero)

In [None]:
chicago_Qweights = pysal.weights.Queen.from_dataframe(chicago_bnb)

In [None]:
print("%.4f"%chicago_Qweights.pct_nonzero)

In [None]:
chicago_Qweights.weights

In [None]:
help(chicago_Qweights)

## Moran's I
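
Global Moran's I measures whether similar values tend to sit next to each other. With deviations from the mean $z_i = y_i - \bar{y}$, spatial weights $w_{ij}$, and $S_0 = \sum_i \sum_j w_{ij}$, it is $I = \frac{n}{S_0} \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$. The sketch below computes it by hand for a toy example; it is an illustration rather than PySAL's code, and it assumes row-standardized binary contiguity weights (which, as far as I know, is also PySAL's default transformation for `Moran`). The function name `morans_i` and the four-cell example are hypothetical.

```python
def morans_i(y, neighbors):
    """Global Moran's I with row-standardized binary contiguity weights.

    y: list of values; neighbors: dict mapping index -> list of neighbor indices.
    """
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    num = 0.0  # sum over i, j of w_ij * z_i * z_j
    s0 = 0.0   # sum of all weights
    for i, nbrs in neighbors.items():
        for j in nbrs:
            w_ij = 1.0 / len(nbrs)  # each row of weights sums to 1
            num += w_ij * z[i] * z[j]
            s0 += w_ij
    return (n / s0) * num / sum(v * v for v in z)


# Four cells in a row: 0 - 1 - 2 - 3
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = morans_i([0, 0, 1, 1], line)    # like values are adjacent
alternating = morans_i([0, 1, 0, 1], line)  # unlike values are adjacent
print(clustered, alternating)  # 0.5 -1.0
```

Positive values indicate clustering of similar values, negative values indicate a checkerboard-like pattern, and values near the expectation under randomness, $-1/(n-1)$, indicate no spatial structure.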

In [None]:
y=chicago_bnb['price_pp']
w=chicago_Qweights
chicago_Qweights_moran = pysal.Moran(y, w, two_tailed=False)

In [None]:
"%.3f"%chicago_Qweights_moran.I

In [None]:
"%.8f"%chicago_Qweights_moran.p_norm

In [None]:
chicago_bnb.plot(column='price_pp')

# Activity
Utilize the basic ideas we explored related to spatial autocorrelation to:
1. Test spatial autocorrelation across multiple variables and weights
2. Which variable is the most spatially autocorrelated?
3. Do you have any hunches as to why there is or isn’t spatial autocorrelation in different variables?
