## Introduction to Spatial Modeling and Analytics
### Part 1 of 4
# What are
# Spatial Modeling and 
# Spatial Analytics?

## Thank you for helping our study


<a href="#/slide-1-0" class="navigate-right" style="background-color:blue;color:white;padding:8px;margin:2px;font-weight:bold;">Continue with the lesson</a>

Throughout this lesson you will see reminders, like the one below, to ensure that all participants understand that they are in a voluntary research study.

### Reminder

<font size="+1">

By continuing with this lesson you are granting your permission to take part in this research study for the Hour of Cyberinfrastructure: Developing Cyber Literacy for GIScience project. In this study, you will be learning about cyberinfrastructure and related concepts using a web-based platform that will take approximately one hour per lesson. Participation in this study is voluntary.

Participants in this research must be 18 years or older. If you are under the age of 18 then please exit this webpage or navigate to another website such as the Hour of Code at https://hourofcode.com, which is designed for K-12 students.

If you are not interested in participating please exit the browser or navigate to this website: http://www.umn.edu. Your participation is voluntary and you are free to stop the lesson at any time.

For the full description please navigate to this website: <a href="../../gateway-lesson/gateway/gateway-1.ipynb">Gateway Lesson Research Study Permission</a>.

</font>

In [2]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

import warnings
warnings.filterwarnings('ignore') # Hide warnings

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
# HTML(''' 
#     <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
#     <input id="toggle_code" type="button" value="Toggle raw code">
# ''')

HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')


When you have successfully completed this lesson, you will be able to:<br>
1. Distinguish between spatial modeling and spatial analytics.

2. Describe how spatial modeling and analytics can be used to solve an everyday problem.

3. State the First Law of Geography.

4. List some kinds of results that might be generated by spatial modeling and analytics. 

5. Run Python code to execute simple spatial modeling and analytics tasks.

Data Analytics and its related profession, Data Science, are fast-growing careers. A quick Google search turns up hundreds of recent articles touting them. Here are a few: <br>
- <a href="https://www.northeastern.edu/graduate/blog/data-science-careers-shaping-our-future/">11 Data Science Careers Shaping Our Future</a>
- <a href="https://www.noodle.com/articles/data-science-jobs-whos-hiring-how-much-do-they-pay">Data Science Jobs: Who's Hiring + How Much Do They Pay?</a>
- <a href="https://www.amazon.jobs/en/job_categories/data-science">From Amazon: Data Science | Amazon.jobs</a>
- <a href="https://www.linkedin.com/jobs/data-scientist-jobs">From LinkedIn: 31,000+ Data Scientist jobs in United States</a>

Good news! Spatial Analytics and its related profession Spatial Data Science are just as hot!
But what do spatial data scientists do? Read on! 

This lesson is called Spatial Modeling AND Analytics. Why does it need two words to describe this topic?

They are related but distinct. Here are some formal definitions:

- Spatial Analytics focuses on statistical summaries and geometric analysis.
- Spatial Modeling is used for prediction and understanding spatial behavior. 

Data Science has similar variation within it - check out the variants of Data Science in the articles listed a couple of slides ago. 

<center>However, in Geospatial Science these two are often intertwined and treated together.</center>
<center>The key thing here is “Spatial”</center>


## Spatial is Special!



## Spatial analytics help us discover spatial patterns

Consider the table below. Each row includes a location (x and y) and some attributes. It is very hard to tell by looking at this table if there's anything going on spatially.

<img src='supplementary/sma2-5a.png' alt='Spatial pattern table'>

Now let's put these data on a grid. 

<img src='supplementary/sma2-5b.png' alt='Spatial pattern grid'>

Wow, there's a clear pattern that wasn't visible in the table. 

Spatial patterns arise because of interactions between things distributed across the landscape. Let's see if there's any pattern in the distribution of coffee shops in Minneapolis. On the left is a table of coffee shop data extracted from <a href="https://www.openstreetmap.org/">OpenStreetMap</a>. On the right, the points are plotted on a grid. There does seem to be some spatial pattern - are they clustered in certain areas?

<table>
    <tr style="background: #fff">   
        <td><img src='supplementary/sma2-6a.png' alt='shops table'></td>
         <td><img src='supplementary/sma2-6a2.png' alt='Shops grid'></td>
    </tr>
</table>

Using the location data in the table (lat and long), we performed a statistical cluster analysis which identifies points that are closer together than would be expected in a random distribution. Ah ha! There are three clear clusters. Let's take a look at these on the landscape. 

<table>
    <tr style="background: #fff">   
        <td>Here we've labeled the three clusters by the names of their neighborhoods. 
            
Yes, it makes sense for there to be lots of coffee shops around the university and in downtown. And according to <a href='https://www.minneapolis.org/neighborhoods/south/lyndale-lake/'>minneapolis.org</a>, Lyndale has a "fun selection of indie stores, entertainment, bars and restaurants in the most walkable neighborhood in Minneapolis [that] makes for a pleasant day and night of indulgence". Sounds like a coffee shop hot spot!

Here we have two kinds of spatial interaction. The first is between the coffee shops themselves: if one is successful, others will pop up nearby (think Starbucks). The second is between coffee shops and the landscape: certain areas have economic activity that is favorable to coffee shop businesses. </td>
         <td width=50%><img src='supplementary/sma2-6balt.png' alt='Coffee Shops in Minneapolis'></td>
    </tr>
</table>
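
Curious how "closer together than would be expected in a random distribution" can be turned into a number? The sketch below shows one simple possibility, the nearest-neighbor ratio (often called the Clark-Evans ratio). It is not necessarily the analysis used for the coffee shop map, and it only says whether a pattern is clustered overall, not where the clusters are. The coordinates and the study area size are made up for illustration, and edge effects are ignored.

```python
import math

def nearest_neighbor_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the mean
    expected for a random pattern with the same point density."""
    n = len(points)
    # Observed: distance from each point to its closest other point
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    # Expected for a completely random pattern: 1 / (2 * sqrt(density))
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Hypothetical coordinates in a 10 x 10 study area (not the lesson's data)
clustered = [(1.0, 1.0), (1.2, 1.1), (1.1, 1.3), (8.0, 8.0), (8.2, 8.1), (8.1, 8.3)]
spread = [(2.0, 2.0), (5.0, 2.0), (8.0, 2.0), (2.0, 7.0), (5.0, 7.0), (8.0, 7.0)]

print(round(nearest_neighbor_ratio(clustered, area=100.0), 2))
print(round(nearest_neighbor_ratio(spread, area=100.0), 2))
```

A ratio well below 1 suggests clustering, a ratio near 1 suggests a random pattern, and a ratio above 1 suggests points that are more evenly spread out than random.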

Once we understand the underlying structure or behavior of a spatial phenomenon, we can describe it using a mathematical framework called spatial modeling. 

Here's an example of a spatial model. (Don't worry about understanding it, just enjoy!)

<img src='supplementary/sma2-7.png' alt='Equation for spatially weighted regression' width="700" height="500">

We can use this equation to predict the value of something in a place where we did not measure it. 

So, let’s start at the beginning with spatial analytics.

One of the most basic questions asked with spatial analytics is "what's near what?" 



For example:

- Where is the nearest coffee shop?
- How many coffee shops are within 2 blocks of where I am?
- What is the distance between that coffee shop and my favorite grocery store?
- What is the shortest route between the coffee shop and the grocery store?
- What is the average distance between all the coffee shops in downtown?

All of these questions involve knowing the location of things and calculating the distance between them. 
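
Here is a minimal sketch of how a few of these questions could be answered with nothing more than coordinates and a distance function. The shop names and locations are hypothetical, and the coordinates are assumed to be in a projected system measured in meters, so straight-line distance makes sense. The shortest-route question is left out because it also needs a street network.

```python
import math

# Hypothetical shop locations in projected coordinates (meters); not real data
coffee_shops = {
    "Shop A": (100.0, 200.0),
    "Shop B": (400.0, 600.0),
    "Shop C": (150.0, 250.0),
}
me = (120.0, 210.0)

def distance(a, b):
    """Straight-line (Euclidean) distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Where is the nearest coffee shop?
nearest_shop = min(coffee_shops, key=lambda name: distance(me, coffee_shops[name]))

# How many coffee shops are within 100 m of where I am?
nearby = [name for name, xy in coffee_shops.items() if distance(me, xy) <= 100.0]

# What is the average distance between all pairs of coffee shops?
names = list(coffee_shops)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
average_gap = sum(distance(coffee_shops[a], coffee_shops[b]) for a, b in pairs) / len(pairs)

print(nearest_shop, nearby, round(average_gap, 1))
```

With latitude and longitude you would first project the points to a coordinate system in meters, since degrees are not a constant distance on the ground.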

Now let’s try a more advanced problem. 

On the next slide is a map of a city with a river flowing through it. Emergency managers are forecasting that the river will flood, and they need to evacuate everyone within 500 m of the river.

Using the tools, draw the area that you would target for evacuation.

In [3]:
%%html
<iframe src='supplementary/drawing.html' width=920 height=700 allowfullscreen></iframe>
<style>.output_wrapper, .output {height:auto !important; max-height:800px;}</style>

In [None]:
import rasterio
from rasterio import plot
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
from matplotlib_scalebar.scalebar import ScaleBar
import utm

mississippi = [[45.03826,-93.28277],   # river center line (latitude, longitude)
[45.0346,-93.2838],
[45.02717,-93.28013],
[45.02216,-93.27776],
[45.01207,-93.27404],
[45.00332,-93.27409],
[44.99841,-93.27555],
[44.99295,-93.27349],
[44.98961,-93.26774]]

# project latitude/longitude to UTM easting/northing in meters
mississippi = [utm.from_latlon(lat, lon)[:2] for lat, lon in mississippi]

# convert the coordinates to a LineString feature (vector data)
mississippi_line = LineString(mississippi)
river_centerline = gpd.GeoDataFrame(geometry = [mississippi_line]) # create the river centerline
raster = rasterio.open('supplementary/sma2_river.tif')# EPSG:32615 utm 15n -- Minneapolis meters e n
fig, ax = plt.subplots(figsize=(17, 17))
rasterio.plot.show(raster, ax=ax)
# river_centerline.plot(ax=ax, facecolor='none', edgecolor='Red', linewidth=5)
fig = ax.get_figure()
ax.add_artist(ScaleBar(1)) # add scale bar to the map
plt.savefig('supplementary/sma2_river_buffer.jpg', bbox_inches='tight', pad_inches=0)
plt.close(fig) #don't plot the image yet
raster = rasterio.open('supplementary/sma2_river.tif')
fig, ax = plt.subplots(figsize=(13, 13))
plot.show(raster, ax=ax)
river_centerline.crs = 32615 # EPSG:32615 (UTM zone 15N, meters), the same CRS as the raster
buffer = river_centerline.buffer(500) # a buffer of 500 m around the river center line
bounding = Polygon([[475700,4986050],[481376,4986050],[481353,4983439],[475717,4983439]]) #base map extent

# Crop the buffer polygon to the base map extent
cropped = buffer[0].intersection(bounding)
cropped_buffer = gpd.GeoDataFrame([cropped], geometry = [cropped])

cropped_buffer.plot(ax=ax, facecolor='none', edgecolor='red', linewidth=2.5)
ax.add_artist(ScaleBar(1))
# river_centerline.plot(ax=ax, facecolor='none', edgecolor='blue', linewidth=5)

plt.show()


In [6]:
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout
print('How did you figure this out?')
text1 = widgets.Text(name='How did you figure this out?', placeholder='Type your answer here...')
# Display widget
display(text1)

# Output function
def out1():
    print("Your answer is successfully submitted!")
    
# Submit button
hourofci.SubmitBtn(text1, out1)


How did you figure this out?



Most likely you said something about drawing a line parallel to the river about 500 m away from it. Many of you probably drew a band around the river. 
This is something really easy to do with a GIS. We’ll learn more about that soon.
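
In GIS terms, the band you drew is called a buffer. As a rough sketch of the idea, deciding whether a building falls inside a 500 m buffer comes down to measuring its shortest distance to the river centerline. The example below does this with plain Python. The river and building coordinates are hypothetical, in meters, and a GIS library would normally handle this for you.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # a and b are the same point
        return math.hypot(px - ax, py - ay)
    # Fraction along the segment of the closest point, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(p, centerline, radius):
    """True if p lies within `radius` of any segment of the centerline."""
    return any(
        distance_to_segment(p, a, b) <= radius
        for a, b in zip(centerline, centerline[1:])
    )

# Hypothetical river centerline and buildings in meters (not the lesson's data)
river = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)]
buildings = {"school": (500.0, 300.0), "library": (400.0, 800.0), "depot": (1300.0, 1300.0)}

evacuate = [name for name, xy in buildings.items() if within_buffer(xy, river, 500.0)]
print(evacuate)
```

Notice that the depot lies beyond the end of the river line but is still caught, because a buffer is rounded at the ends of the line rather than cut off square.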


In [None]:
import osmnx as ox 

place = "Minneapolis, MN"
# tags = {"building": True} would request every building type.
# After this cell has run, list(risky_buildings['building'].unique()) lists the
# building types found. Here we keep only these enclosed building types:
tags = {'building':['yes', 'shelter', 'garage', 'apartments', 'house', 'shed']}
# Note: newer OSMnx releases provide this function as ox.features_from_place
buildings = ox.geometries_from_place(place, tags)
# buildings = buildings.to_crs('epsg:3174')
buildings = buildings.to_crs('epsg:32615')

cropped_buffer.crs = 32615
risky_buildings = gpd.overlay(cropped_buffer, buildings[buildings.geometry.type=='Polygon'], how='intersection')

# Map 1
raster = rasterio.open('supplementary/sma2_river.tif')
fig, ax = plt.subplots(figsize=(13, 13))
rasterio.plot.show(raster, ax=ax)

risky_buildings.plot(ax = ax) # the figure size is already set by plt.subplots above
cropped_buffer.plot(ax=ax, facecolor='none', edgecolor='red', linewidth=2.5)

plt.show()

In [None]:
# Map 2
import folium
risky_buildings = risky_buildings.to_crs(epsg=4326) # folium expects latitude/longitude (WGS 84)
cropped_buffer = cropped_buffer.to_crs(epsg=4326)
m = folium.Map(location = [45.01594481203401, -93.276039169286], tiles='OpenStreetMap' , zoom_start = 14) # tiles="Stamen Toner"

for _, r in cropped_buffer.iterrows():
    sim_geo = gpd.GeoSeries(r['geometry']) 
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j, 
                           style_function = lambda x: {'color': 'blue', 'weight': 3, 'fill' : False })
    geo_j.add_to(m)
    
for _, r in risky_buildings.iterrows():
    sim_geo = gpd.GeoSeries(r['geometry'])  
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j, 
                           style_function = lambda x: {'color': 'red', 'weight': 1,  'fillColor': 'YlGnBu'})
    folium.Popup(f"<i>House Number: {r['addr:housenumber']}, Street: {r['addr:street']}, Postal Code: {r['addr:postcode']}</i>" , min_width=100, max_width=200).add_to(geo_j)
    folium.Tooltip("click me!").add_to(geo_j)
    
    geo_j.add_to(m)
m

## An example of spatial analytics - Political Redistricting


In the US, after each decennial census, boundaries for electoral districts may be redrawn due to shifts in population distribution. This is intended to ensure that all the people within each electoral district are represented equitably at the next election.
Determining new boundaries is NOT EASY! It is highly political and there are many possible solutions. 
Fortunately, spatial analytics does provide many important and impartial measures to assess equitability.
Read on...


There are many spatial/geometric criteria that are used to assess the equitability of proposed redistricting schemes, including:
1. Compactness - can be measured by determining the ratio of the area of the proposed district shape to the area of a circle (the most compact shape) having the same perimeter.
2. Contiguity - exists when a single region is not interrupted by other areas, e.g. the contiguous US does not include Alaska and Hawaii.
3. Equal population - determined by adding up the individuals in each census reporting zone that falls within the proposed district boundary. 
4. Preservation of existing political communities
5. Partisan fairness
6. Racial fairness
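
To make the compactness measure from item 1 concrete, here is a small sketch that computes it for two made-up district shapes. This ratio is often called the Polsby-Popper score. The shapes are hypothetical and the coordinates are assumed to be in a projected system, so that area and perimeter are in consistent units.

```python
import math

def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(vertices):
    """Sum of the edge lengths, including the closing edge."""
    return sum(
        math.dist(a, b) for a, b in zip(vertices, vertices[1:] + vertices[:1])
    )

def compactness(vertices):
    """Area of the shape divided by the area of a circle with the same perimeter."""
    perimeter = polygon_perimeter(vertices)
    circle_area = perimeter ** 2 / (4.0 * math.pi)
    return polygon_area(vertices) / circle_area

# Two hypothetical districts with the same area but very different shapes
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
sliver = [(0.0, 0.0), (16.0, 0.0), (16.0, 1.0), (0.0, 1.0)]
print(round(compactness(square), 3), round(compactness(sliver), 3))
```

The score ranges from 0 to 1, where 1 is a perfect circle. The long, thin district scores much lower than the square one even though both cover the same area.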


Take a few minutes to try your hand at redistricting at Districtr, "a free, public web tool for districting and community identification, brought to you by the MGGG Redistricting Lab".

Choose a state in the US, then try to draw 3 or 4 districts that have equal population. Then use the tabs and drop-downs to show how you did in equalizing other characteristics. 

Play for a few minutes, but be sure to come back here as we’ve got a lot more ground to cover!

<a href="https://districtr.org/">Districtr.org</a>

## Spatial modeling examples


Remember that spatial modeling is used to predict or understand spatial distributions.
There are lots of great examples of spatial modeling. Spatial models are used to:

- *Hydrologic modeling* - Determine where water will flow during a heavy rainfall to calculate how high the rivers will get and whether water will flood across the land.
- *Transportation modeling* - Plan the routes of delivery vehicles on a street network to ensure that all the destinations are visited while traveling the overall shortest distance. 
- *Groundwater modeling* - Estimate groundwater levels between widely dispersed wells, which are the only places groundwater can be measured. These models combine spatial interpolation (we'll learn about that next) with the laws of physics that determine how water moves through the ground.
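
As a taste of the transportation example, the sketch below finds the shortest round trip for a delivery vehicle by simply trying every possible visiting order. It is a toy illustration under loud assumptions: the depot and stops are hypothetical, distances are straight lines rather than distances along a street network, and trying every order is only practical for a handful of stops.

```python
import itertools
import math

def route_length(route):
    """Total straight-line length of a route visiting points in order."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))

def shortest_delivery_route(depot, stops):
    """Try every visiting order and keep the shortest round trip from the depot."""
    best_route, best_length = None, math.inf
    for order in itertools.permutations(stops):
        route = [depot, *order, depot]
        length = route_length(route)
        if length < best_length:
            best_route, best_length = route, length
    return best_route, best_length

# Hypothetical depot and delivery stops in meters
depot = (0.0, 0.0)
stops = [(0.0, 300.0), (400.0, 0.0), (400.0, 300.0)]
route, length = shortest_delivery_route(depot, stops)
print(route, length)
```

Real routing tools measure distance along the street network and use much smarter search strategies, because the number of possible orders grows extremely fast as stops are added.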

OK, that’s the introduction. Now let’s see if you understand the difference between spatial modeling and spatial analytics. Remember that spatial analytics involves statistics and geometric calculations, while spatial modeling is for understanding and prediction.
The following are examples of spatial modeling and analytics. Which do you think each of these is?


In [3]:

# Multiple choice question
widget1 = widgets.RadioButtons(
    options = ['Analytics', 'Modeling'],
    description = 'A. You are in an urban area and want to find the nearest Starbucks store.', style={'description_width': 'initial'},
    layout = Layout(width='100%'),
    value = None
)

display(widget1)

hourofci.SubmitBtn(widget1)



In [4]:
# Multiple choice question
widget2 = widgets.RadioButtons(
    options = ['Analytics', 'Modeling'],
    description = 'B. You would like to know the shortest route to that Starbucks.', style={'description_width': 'initial'},
    layout = Layout(width='100%'),
    value = None
)

display(widget2)

hourofci.SubmitBtn(widget2)




In [5]:
# Multiple choice question
widget3 = widgets.RadioButtons(
    options = ['Analytics', 'Modeling'],
    description = '''C. Given the distribution of existing Starbucks and of the daytime population of an area and  <br> 
    information about traffic congestion on various streets, choose the best location to build a new Starbucks.''', style={'description_width': 'initial'},
    layout = Layout(width='200%'),
    value = None
)

display(widget3)

hourofci.SubmitBtn(widget3)




Good, now let’s get started by learning the most important principle in spatial modeling and analytics:

## The First Law of Geography


<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="sma-3.ipynb">Click here to go to the next notebook.</a></font>