## Spatial Modeling and Analytics

Welcome to the Hour of Cyberinfrastructure (also known as the Hour of CI). This Beginner lesson on Spatial Modeling and Analytics introduces the fundamental concept of spatial analysis. If you have not already taken the Gateway Lesson, complete it first. 

Lesson Developers:

*Karen Kemp (kakemp@usc.edu), Mohsen Ahmadkhani (ahmad178@umn.edu), and Nafiseh Haghtalab (haghtala@msu.edu)*

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')

Data Analytics and its related profession, Data Science, are hot emerging careers. A quick Google search turns up hundreds of recent articles touting the glories of this career. Here are a few: <br>
- <a href="https://www.northeastern.edu/graduate/blog/data-science-careers-shaping-our-future/">11 Data Science Careers Shaping Our Future</a>
- <a href="https://www.noodle.com/articles/data-science-jobs-whos-hiring-how-much-do-they-pay">Data Science Jobs: Who's Hiring + How Much Do They Pay?</a>
- <a href="https://www.amazon.jobs/en/job_categories/data-science">From Amazon: Data Science | Amazon.jobs</a>
- <a href="https://www.linkedin.com/jobs/data-scientist-jobs">From LinkedIn: 31,000+ Data Scientist jobs in United States</a>

<c>Good news! Spatial Analytics and its related profession Spatial Data Science are just as hot!
But what do spatial data scientists do? Read on! 

When you have successfully completed this lesson, you will be able to:<br>
1. Distinguish between spatial modeling and spatial analytics.

2. Describe how spatial modeling and analytics can be used to solve an everyday problem.

3. State the First Law of Geography.

4. List some kinds of results that might be generated by spatial modeling and analytics. 

5. Run Python code to execute simple spatial modeling and analytics tasks.

This lesson is called Spatial Modeling AND Analytics. Why does it need two words to describe this topic?

They are related but distinct. Here are some formal definitions:
    
- Spatial Analytics focuses on statistical summaries and geometric analysis.
- Spatial Modeling is used for prediction and understanding spatial behavior. 

Data Science has similar variation within it - check out the variants of Data Science in the articles listed a couple of slides ago. 

However, in Geospatial Science these two are often intertwined and treated together, as we’re doing here.
<center>The key thing here is “Spatial” - **SPATIAL IS SPECIAL!**

## Spatial analytics help us discover spatial patterns
<center>Spatial patterns arise because of interactions between things distributed across the landscape. 


Spatial analytics help us discover spatial patterns.
<img src='supplementary/sma2-5.png' alt='Spatial patterns'>

Spatial patterns arise because of interactions between things distributed across the landscape. 

<img src='supplementary/sma2-6.png' alt='shops'>
[image: map showing dots of different types of shops, some clustered, some dispersed.]



Once we understand the underlying structure or behavior of a spatial phenomenon, we can describe it using a mathematical framework called spatial modeling.

[image: equation for spatially weighted regression] 

<img src='supplementary/sma2-7.png' alt='HK topo map'>


Using spatial models we can simulate spatial phenomena in a computer environment and predict their patterns.

[image: left sparse distribution of coffee shops now, right distribution dense infill of coffee shops future]

<img src='supplementary/sma2-8.png' alt='HK topo map'>

So, let’s start at the beginning with spatial analytics.

One of the most basic questions asked with spatial analytics is "what's near what?" 



For example:

- Where is the nearest coffee shop?
- How many coffee shops are within 2 blocks of where I am?
- What is the distance between that coffee shop and my favorite grocery store?
- What is the shortest route between the coffee shop and the grocery store?
- What is the average distance between all the coffee shops in downtown?

All of these questions involve knowing the location of things and calculating the distance between them. 
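As a minimal sketch of how such questions reduce to locations and distances, the example below uses made-up shop names and coordinates in metres on a local grid (all hypothetical). It measures straight-line distance only; the shortest-route question would need a street network, which is not shown here.

```python
import math
from itertools import combinations

# Hypothetical coordinates in metres on a local grid (not real locations)
me = (0.0, 0.0)
coffee_shops = {
    "Shop A": (120.0, 50.0),
    "Shop B": (-300.0, 400.0),
    "Shop C": (60.0, -80.0),
}

# Straight-line (Euclidean) distance from my location to every shop
distances = {name: math.dist(me, xy) for name, xy in coffee_shops.items()}

# Where is the nearest coffee shop?
nearest = min(distances, key=distances.get)

# How many coffee shops are within 200 m of where I am?
nearby = [name for name, d in distances.items() if d <= 200]

# What is the average distance between all pairs of coffee shops?
pairs = list(combinations(coffee_shops.values(), 2))
mean_spacing = sum(math.dist(a, b) for a, b in pairs) / len(pairs)

print(nearest, nearby, round(mean_spacing, 1))
```

The 200 m threshold stands in for "2 blocks"; the right value depends on the city.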

Now let’s try a more advanced problem. 

Below is a map of a city with a river flowing through it. Emergency managers forecast that the river will flood, and they need to evacuate everyone within 500 m of the river. 

Using the tools, draw the area that you would target for evacuation.

[D3 code to allow drawing a freehand line on an image. Image or frame needs a scale bar.] 



In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import elevation
from osgeo import gdal
from matplotlib.pyplot import figure
from mpl_toolkits.axes_grid1 import make_axes_locatable

dem_path = os.path.join(os.getcwd(), 'supplementary/river_buffering.tif')

# Bounding box coordinates obtained from: https://boundingbox.klokantech.com 
 
bounding = (-93.322647,44.969818,-93.237282,45.05338) # river
elevation.clip(bounds = bounding, output=dem_path)
mn_dem = gdal.Open(dem_path) 
mn_array = mn_dem.ReadAsArray() # Convert gdal.Dataset to numpy ndArray

plt.figure(figsize = (10,10))
ax = plt.gca()
im = ax.imshow(mn_array, cmap = 'gray_r') # color map in reverse gray so darker is higher
spliter = make_axes_locatable(ax)
cax = spliter.append_axes("right", size = "5%", pad = 0.1) # put the axes on the right side; the width is
# 5% of the raster; the padding will be 0.1 inch.
plt.colorbar(im, cax=cax)

In [None]:
import utm
mississipi = [[45.03826,-93.28277],
[45.0336,-93.2838],
[45.02317,-93.27813],
[45.01716,-93.27676],
[45.01207,-93.27444],
[45.00332,-93.27409],
[44.99841,-93.27555],
[44.99295,-93.27349],
[44.98961,-93.26774],
# [44.98391,-93.26225],
# [44.98176,-93.25693],
# [44.97927,-93.24843],
# [44.97809,-93.24139],
# [44.97557,-93.23954],
# [44.97117,-93.23907]
             ]

# mississipi = [utm.from_latlon(mississipi[i][0], mississipi[i][1]) for i in range(len(mississipi))]
mississipi = [[mississipi[i][1], mississipi[i][0]] for i in range(len(mississipi))]
mississipi


# Convert the coordinates to a LineString feature (vector data)
from shapely.geometry import LineString, Point

mississipi_line = LineString(mississipi)
mississipi_line

ps = []

for i in mississipi:
    ps.append(Point(i[0], i[1]))
    
import geopandas as gpd

# Build a GeoDataFrame whose geometry column holds the river vertices as points
df = gpd.GeoDataFrame(geometry = ps)


In [None]:
# Build a one-row GeoDataFrame holding the river centerline
df2 = gpd.GeoDataFrame(geometry = [mississipi_line])

df2


In [None]:
import geopandas
import matplotlib.pyplot as plt
import rasterio
import rasterio.plot
from mpl_toolkits.axes_grid1 import make_axes_locatable

raster = rasterio.open('supplementary/river_buffering.tif')

fig, ax = plt.subplots(figsize=(10, 10))
rasterio.plot.show(raster, ax=ax, cmap = 'gray_r')
df2.plot(ax=ax, facecolor='none', edgecolor='black', linewidth=2)

# Adding the color bar
fig = ax.get_figure()
spliter = make_axes_locatable(ax)
cax = spliter.append_axes("right", size = "5%", pad = 0.1)
sm = plt.cm.ScalarMappable(cmap='gray_r', norm=plt.Normalize(vmin=raster.read(1).min(), vmax=raster.read(1).max()))
sm.set_array([])  # public API instead of the private _A attribute
fig.colorbar(sm, cax=cax)

plt.savefig('supplementary/river_buffer.jpg')

In [None]:
from IPython.display import IFrame
print('Drag to draw!')
IFrame(src='supplementary/drawing.html', width=700, height=700)

In [None]:
import geopandas
import matplotlib.pyplot as plt
import rasterio
import rasterio.plot
raster = rasterio.open('supplementary/river_buffering.tif')
fig, ax = plt.subplots(figsize=(10, 10))
rasterio.plot.show(raster, ax=ax, cmap = 'gray')
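# The river coordinates are longitude/latitude (WGS 84, EPSG:4326)
df2 = df2.set_crs(epsg=4326)
# Project to a CRS measured in metres, buffer by 500 m, then convert back for plotting
buffer = df2.to_crs('epsg:3174').buffer(500).to_crs('epsg:4326')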
buffer.plot(ax=ax, facecolor='none', edgecolor='red', linewidth=4)

df2.plot(ax=ax, facecolor='none', edgecolor='white', linewidth=2)

# Adding the color bar
fig = ax.get_figure()
spliter = make_axes_locatable(ax)
cax = spliter.append_axes("right", size = "5%", pad = 0.1)
sm = plt.cm.ScalarMappable(cmap='gray', norm=plt.Normalize(vmin=raster.read(1).min(), vmax=raster.read(1).max()))
sm.set_array([])  # public API instead of the private _A attribute
fig.colorbar(sm, cax=cax)


How did you figure this out? 

<span style="color:red">[code cell to capture open text]</span>



Most likely you said something about drawing a line parallel to the river about 500 m away from it. Many of you probably drew a band around the river. 
This is something really easy to do with a GIS. We’ll learn more about that soon.
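The idea behind that band can be sketched in plain Python: a building falls inside the 500 m buffer exactly when its shortest distance to the river line is 500 m or less. The river vertices and building names below are hypothetical, with coordinates in metres on a local grid; a GIS would do the same job with its own buffer tool.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # a and b are the same point
        return math.hypot(px - ax, py - ay)
    # Relative position of p's projection along the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_line(p, vertices):
    """Shortest distance from point p to a polyline given as a list of vertices."""
    return min(distance_to_segment(p, a, b) for a, b in zip(vertices, vertices[1:]))

# Hypothetical river centerline and buildings (metres on a local grid)
river = [(0, 0), (1000, 0), (2000, 1000)]
buildings = {"school": (500, 300), "library": (500, 800), "clinic": (2300, 1300)}

# A building must be evacuated if it lies within 500 m of the river
evacuate = [name for name, xy in buildings.items()
            if distance_to_line(xy, river) <= 500]
print(evacuate)
```

Note that the clinic lies beyond the end of the river line but is still within 500 m of its last vertex, which is why a buffer has rounded ends.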

<span style="color:red">[click to reveal image from GIS of a linear buffer on a map showing building footprints, buildings inside buffer are colored red. Might have a count of buildings inside buffer, or even a table insert listing owners or addresses.]</span>


## An example of spatial analytics - Political Redistricting


In the US, after each decennial census, boundaries for electoral districts may be redrawn due to shifts in population distribution. This is intended to ensure that all the people within each electoral district are represented equitably at the next election.
Determining new boundaries is NOT EASY! It is highly political and there are many possible solutions. 
Fortunately, spatial analytics does provide many important and impartial measures to assess equitability.
Read on...


There are many spatial/geometric criteria that are used to assess the equitability of proposed redistricting schemes, including:
1. Compactness - can be measured by determining the ratio of the area of the proposed district shape to the area of a circle (the most compact shape) having the same perimeter.
2. Contiguity - exists when a single region is not interrupted by other areas, e.g. the contiguous US does not include Alaska and Hawaii.
3. Equal population - determined by adding up the individuals in each census reporting zone that falls within the proposed district boundary. 
4. Preservation of existing political communities
5. Partisan fairness
6. Racial fairness
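The compactness ratio in item 1 can be written as $4\pi A / P^2$, where $A$ is the district's area and $P$ its perimeter, because a circle with perimeter $P$ has area $P^2 / (4\pi)$. This ratio is commonly known as the Polsby-Popper score. Below is a minimal sketch using two made-up district shapes (hypothetical coordinates, arbitrary units); real districts would come from boundary files.

```python
import math

def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon."""
    twice_area = 0.0
    perimeter = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2, perimeter

def compactness(vertices):
    """District area divided by the area of a circle with the same perimeter."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4 * math.pi * area / perimeter ** 2

# Two hypothetical districts with the same area but very different shapes
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
sliver = [(0, 0), (100, 0), (100, 1), (0, 1)]

print(round(compactness(square), 3), round(compactness(sliver), 3))
```

Scores range from near 0 (long, thin, or contorted shapes) to 1 (a perfect circle), so the square scores far higher than the sliver even though both cover the same area.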


Try your hand at redistricting at <a href="https://districtr.org">Districtr.org</a>.

Choose a state in the US, then try to draw 3 or 4 districts that have equal population. Then use the tabs and drop-downs to show how you did in equalizing other characteristics. 

Play for a few minutes, but be sure to come back here as we’ve got a lot more ground to cover!


## Spatial modeling examples


Remember that spatial modeling is used to predict or understand spatial distributions.
There are many great examples of spatial modeling. Spatial models are used for:

- Hydrologic modeling - Determine where water will flow during heavy rainfall, to calculate how high rivers will rise and whether water will flood across the land.
- Transportation modeling - Plan the routes of delivery vehicles on a street network so that all destinations are visited while traveling the shortest overall distance. 
- Groundwater modeling - Groundwater can only be measured at widely dispersed wells, so these spatial models are built on spatial interpolation combined with the physical laws that govern how water moves through the ground. 
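To make the idea of spatial interpolation concrete, here is a minimal sketch of inverse distance weighting (IDW), one of the simplest interpolation methods. It is used here only for illustration and is not necessarily the method a groundwater model would use, since those also incorporate physics. The well locations and water-table elevations are hypothetical.

```python
import math

def idw(x, y, samples, power=2):
    """Inverse-distance-weighted estimate at (x, y) from (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value  # the location coincides with a sample point
        w = 1.0 / d ** power  # nearer samples get larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical wells: (x in m, y in m, water-table elevation in m)
wells = [(0, 0, 250.0), (1000, 0, 246.0), (0, 1000, 252.0), (1000, 1000, 247.0)]

# Estimate the water-table elevation at an unmeasured location
estimate = idw(500, 500, wells)
print(round(estimate, 2))
```

Because nearer wells get larger weights, the estimate at any location is pulled toward the values measured closest to it.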

OK, that’s the introduction. Now let’s see if you understand the difference between spatial modeling and spatial analytics. Remember that spatial analytics involves statistics and geometric calculations, while spatial modeling is for understanding and prediction. 
Each of the following is an example of either spatial modeling or spatial analytics. Which do you think each one is?

<span style="color:red">[needs code to present questions with a choice of two answers]</span>

A. You are in an urban area and want to find the nearest Starbucks store. (Analytics)

B. You would like to know the shortest route to that Starbucks. (Analytics)

C. Given the distribution of existing Starbucks and of the daytime population of an area and information about traffic congestion on various streets, choose the best location to build a new Starbucks. (Modeling) 


Good, now let’s get started by learning the most important principle in spatial modeling and analytics:

                                        !!The First Law of Geography!!


<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="sma-3.ipynb">Click here to go to the next notebook.</a></font>