## Spatial Modeling and Analytics

### Part 2 of 4
# Why does spatial analysis work?
       
### Tobler’s First Law of Geography and 
### Spatial Autocorrelation

## Reminder
<a href="#/slide-2-0" class="navigate-right" style="background-color:blue;color:white;padding:8px;margin:2px;font-weight:bold;">Continue with the lesson</a>

<br>
<br>
<font size="+1">

By continuing with this lesson you are granting your permission to take part in this research study for the Hour of Cyberinfrastructure: Developing Cyber Literacy for GIScience project. In this study, you will be learning about cyberinfrastructure and related concepts using a web-based platform that will take approximately one hour per lesson. Participation in this study is voluntary.

Participants in this research must be 18 years or older. If you are under the age of 18 then please exit this webpage or navigate to another website such as the Hour of Code at https://hourofcode.com, which is designed for K-12 students.

If you are not interested in participating please exit the browser or navigate to this website: http://www.umn.edu. Your participation is voluntary and you are free to stop the lesson at any time.

For the full description please navigate to this website: <a href="../../gateway-lesson/gateway/gateway-1.ipynb">Gateway Lesson Research Study Permission</a>.

</font>

In [6]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci


# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')

Buried in a 1970 scholarly article on spatial modeling in *Economic Geography*, full of statistical equations, Greek letters, and subscripts, the late, famous Professor Waldo Tobler, then at the University of Michigan, came up with a blindingly simple way to justify his argument:

#### *“I invoke the first law of geography: everything is related to everything else, but near things are more related than distant things.”*

<br>
<br>
<small>Tobler, W. (1970). "A computer movie simulating urban growth in the Detroit region." *Economic Geography*, 46(Supplement): 234–240.</small>

## Think about it...

In a city, are there ethnic areas where there are many restaurants and shops selling similar kinds of food? Is the income of people living in the high rises in the city center different from that of people living in the suburbs?

If the air quality is poor where you are, is it likely to be different 1 km away, 10 km away or 200 km away?

Imagine you are a gold prospector in California in 1850. Would your chances of finding gold be greater on a plot right next to where someone else has found gold, or on a plot two valleys away where it has not yet been found?

Elevation is the classic illustration of Tobler’s First Law (often abbreviated as TFL). 

Consider the graphic below showing point elevations on a small section of landscape.

On the landscape, places closer to one another generally have similar elevation. 

Elevations at places far apart tend to be less similar.

<img src='supplementary/sma3-3sm.png' alt='elevations'>
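If you would like to see this pattern in numbers, here is a minimal sketch. It does not use the data behind the graphic above: the elevation profile is synthetic, and the function name `mean_abs_difference`, the 100 m spacing, and the two comparison distances are all assumptions made just for illustration.

```python
import numpy as np

# Synthetic (made-up) elevation profile: one sample every 100 m along a 10 km line.
rng = np.random.default_rng(0)
distance_m = np.arange(0, 10_000, 100)
elevation_m = (
    500
    + 200 * np.sin(2 * np.pi * distance_m / 10_000)  # broad hill and valley
    + 50 * np.sin(2 * np.pi * distance_m / 2_500)    # smaller ridges
    + rng.normal(0, 5, size=distance_m.size)         # small local roughness
)

def mean_abs_difference(values, lag):
    """Average absolute difference between samples that are `lag` steps apart."""
    return float(np.mean(np.abs(values[lag:] - values[:-lag])))

near = mean_abs_difference(elevation_m, lag=1)   # pairs 100 m apart
far = mean_abs_difference(elevation_m, lag=25)   # pairs 2,500 m apart

print(f"Average elevation difference at 100 m:   {near:.1f} m")
print(f"Average elevation difference at 2,500 m: {far:.1f} m")
```

On this made-up landscape, points 100 m apart differ far less in elevation than points 2,500 m apart, which is what TFL leads us to expect.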

In spatial analytics we often want to know the strength of the relationship between values at nearby points. 

Traditional statistics gives us the tools to measure this through correlation analysis.

Correlation analysis is a general method of statistical evaluation used to study the strength of a relationship between two variables. This particular type of analysis is useful when a researcher wants to establish if there are possible connections between variables. 
<img src='supplementary/sma3-8.jpg' alt='correlation' width = 600>

Here is an example using real data of the relationship between occurrences of the disease *brucellosis* and various attributes of the locations in which incidents of the disease were recorded. There are many strong positive and negative correlations here.

<img src='supplementary/sma3-test.png' alt='correlation' width = 600>

<small>Ahmadkhani, M., and Alesheikh, A. A. (2017). "Space-time analysis of human brucellosis considering environmental factors in Iran." *Asian Pacific Journal of Tropical Disease*, 7(5): 257–265.</small>

For those who want to see an equation for calculating a correlation coefficient, here you go! 

<span style="color:red">Everyone else, feel free to go to the next slide</span> 

<img src='supplementary/sma3-7.png' alt='Pearson correlation coefficient equation' width="500">

<small>r<sub>xy</sub> = Pearson r correlation coefficient between x and y<br>
n = number of observations<br>
x<sub>i</sub> = value of x (for the ith observation)<br>
y<sub>i</sub> = value of y (for the ith observation)</small>
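For those who prefer code to equations, here is a minimal sketch of the same calculation. It uses the deviation-from-the-mean form of Pearson's r, which may look different from the equation in the image but is algebraically equivalent to the standard formula. The function name `pearson_r` and the elevation and temperature values are hypothetical, made up only for this example.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # how far each x value is from the mean of x
    dy = y - y.mean()  # how far each y value is from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical observations: elevation (m) and air temperature (deg C) at five sites.
elevation_m = [120, 340, 560, 780, 1000]
temperature_c = [24.1, 22.8, 21.0, 19.9, 18.2]

r = pearson_r(elevation_m, temperature_c)
print(f"r = {r:.3f}")

# NumPy's built-in correlation function should give the same value.
r_numpy = float(np.corrcoef(elevation_m, temperature_c)[0, 1])
```

A value of r close to -1 means a strong negative relationship (here, higher sites are cooler), a value close to +1 means a strong positive relationship, and a value near 0 means little or no linear relationship.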

When we take correlation into the spatial context to demonstrate TFL, we are concerned about the correlation between the value of a single variable at different locations. How similar are the values at different locations?

Thus we measure

<center>Spatial <i>auto</i>correlation (auto = self)</center>

<br>
<br>
<small>The math for this is way beyond a beginner lesson. Here’s <a href="https://gistbok.ucgis.org/bok-topics/global-measures-spatial-association">a link</a> and <a href="https://gistbok.ucgis.org/bok-topics/local-measures-spatial-association">another</a> if you must know.</small>

When nearby observations have similar values, the map shows <span style="color:red">positive</span> spatial autocorrelation.

When nearby observations tend to have strongly contrasting values, the map shows <span style="color:red">negative</span> spatial autocorrelation.

Think of it like this:

### Strong positive spatial autocorrelation is an illustration of TFL
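For the curious, positive and negative spatial autocorrelation can be put into numbers. The sketch below uses Moran's I, one commonly used global measure, on two tiny made-up grids. It is an illustration under simplifying assumptions, not a full implementation: only cells that share an edge count as neighbors, every neighbor pair gets the same weight, and the function name `morans_i` is our own.

```python
import numpy as np

def morans_i(grid):
    """Moran's I for a 2D grid where edge-sharing cells are neighbors (equal weights)."""
    grid = np.asarray(grid, dtype=float)
    z = grid - grid.mean()  # deviation of each cell from the overall mean
    # Products of deviations for left-right and up-down neighbor pairs.
    # Each pair is counted twice (i -> j and j -> i), hence the factor of 2.
    cross = 2 * (np.sum(z[:, :-1] * z[:, 1:]) + np.sum(z[:-1, :] * z[1:, :]))
    n_rows, n_cols = grid.shape
    total_weight = 2 * (n_rows * (n_cols - 1) + (n_rows - 1) * n_cols)
    return float((grid.size / total_weight) * cross / np.sum(z ** 2))

# Clustered map: low values on the left half, high values on the right half.
clustered = np.zeros((6, 6))
clustered[:, 3:] = 1

# Checkerboard map: every cell contrasts with all of its neighbors.
rows, cols = np.indices((6, 6))
checkerboard = (rows + cols) % 2

i_clustered = morans_i(clustered)
i_checkerboard = morans_i(checkerboard)
print(f"Clustered grid:    I = {i_clustered:.2f}")
print(f"Checkerboard grid: I = {i_checkerboard:.2f}")
```

The clustered grid gives a clearly positive value (similar values sit next to each other), while the checkerboard gives a clearly negative one (neighbors always contrast).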


## Why is spatial autocorrelation so important?

When something displays strong spatial autocorrelation (like elevation), then we can do spatial modeling! 

By knowing the elevation at some points and knowing that elevation is strongly spatially autocorrelated, we can estimate the elevation at any location between them.

This process is called <b>spatial interpolation</b>.
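To make the idea concrete, here is a minimal sketch of one simple interpolation technique, inverse distance weighting (IDW), in which nearer known points count more toward the estimate than distant ones. This lesson does not prescribe a particular method, so treat IDW as just one option among many. The function name `idw_estimate`, the `power` setting, and the sample coordinates and elevations are all hypothetical.

```python
import numpy as np

def idw_estimate(known_xy, known_values, target_xy, power=2.0):
    """Estimate a value at target_xy as a distance-weighted average of known points."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    distances = np.linalg.norm(known_xy - target_xy, axis=1)
    # If the target sits exactly on a known point, return that point's value.
    if np.any(distances == 0):
        return float(known_values[np.argmin(distances)])
    weights = 1.0 / distances ** power  # nearer points get larger weights
    return float(np.sum(weights * known_values) / np.sum(weights))

# Hypothetical survey points (x, y in meters) and their measured elevations (m).
points = [(0, 0), (10, 0), (0, 10), (10, 10)]
elevations = [100.0, 120.0, 110.0, 130.0]

center = idw_estimate(points, elevations, (5, 5))       # equally far from all four points
near_corner = idw_estimate(points, elevations, (1, 1))  # very close to the 100 m point

print(f"Estimated elevation at (5, 5): {center:.1f} m")
print(f"Estimated elevation at (1, 1): {near_corner:.1f} m")
```

The estimate at the center is the plain average of the four measurements, while the estimate at (1, 1) stays close to the 100 m value of its nearest neighbor. That is TFL at work: near things count more than distant things.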

## Why is spatial interpolation important?

Fortunately, you don’t have to measure things like air pollution, temperature, ozone level, or air pressure everywhere to be able to map their values over large areas. Air temperature is typically measured at only a few points (often airports), but we often see maps like this showing the temperature across the landscape.

<img src='supplementary/sma3-14.png' alt='temperature map' width = 500>

## Statistically speaking, spatial autocorrelation is REALLY important!

Strong spatial autocorrelation is usually taken to indicate that there is something of interest in the mapped distribution of values. This is particularly important in data like census or disease counts that are mapped over regions like counties or states. If we see (or measure) strong spatial autocorrelation among counties with high disease counts, that likely means there’s something going on that is causing a hot spot.

Also, the presence of strong spatial autocorrelation (either positive or negative) implies information redundancy (i.e., you don’t need all your data points to capture the distribution). This has really important implications for a lot of traditional statistical analyses. Many of those methods assume that observations are independent of one another, so the existence of spatial autocorrelation can invalidate their results!

If you want to dive deep into this - <a href='https://gistbok.ucgis.org/bok-topics/spatial-autocorrelation'>here’s a great source</a>. Or just store that thought in your brain until you see someone doing traditional statistics on spatial data.

In [7]:
user_agent = getpass.getuser()
lesson = 'spatial-modeling-analytics'
lesson_level = 'beginner'

print('Which of the following situations demonstrate Tobler’s First Law of Geography?')
# Multiple choice question
widget1 = widgets.RadioButtons(
    options = ['No', 'Yes'],
    description = 'A. Soil types are similar within 10 feet of a soil pit.', style={'description_width': 'initial'},
    layout = Layout(width='100%'),
    value = None
)

display(widget1)

hourofci.SubmitBtn(user_agent,lesson,lesson_level,'3A',widget1)





In [8]:

# Multiple choice question
widget2 = widgets.RadioButtons(
    options = ['No', 'Yes'],
    description = 'B. Cookies on the shelf in the grocery store near where your favorite cookie is found will also be delicious.', style={'description_width': 'initial'},
    layout = Layout(width='100%'),
    value = None
)

display(widget2)

hourofci.SubmitBtn(user_agent,lesson,lesson_level,'3B',widget2)




In [9]:

# Multiple choice question
widget3 = widgets.RadioButtons(
    options = ['No', 'Yes'],
    description = 'C. Families in neighborhoods tend to have similar demographic characteristics.', style={'description_width': 'initial'},
    layout = Layout(width='100%'),
    value = None
)

display(widget3)

hourofci.SubmitBtn(user_agent,lesson,lesson_level,'3C',widget3)



If you want to read more about Tobler's First Law, here are some interesting reads:

- Miller, H. J. (2004). Tobler's First Law and Spatial Analysis. Annals of the Association of American Geographers, 94(2): 284–289. https://doi.org/10.1111/j.1467-8306.2004.09402005.x

- Sui, D. Z. (2004). Tobler's First Law of Geography: A Big Idea for a Small World? Annals of the Association of American Geographers, 94(2): 269–277. https://doi.org/10.1111/j.1467-8306.2004.09402003.x

OK, let’s try running some code chunks to illustrate TFL and spatial autocorrelation.

<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" 
href="try_it1_sma.ipynb">Click here to go to Try it #1.</a></font>