# Chapter 4
## Descriptive Spatial Statistics

# Learning Objectives
* Explain central tendency as applied in a spatial context
* Define spatial measures of dispersion and recognize possible applications
* Identify potential limitations and locational issues associated with applied descriptive spatial statistics

# Descriptive Spatial Statistics
* <b>Descriptive Spatial Statistics</b>, also referred to as <b>Geostatistics</b>, are the spatial equivalents of the basic descriptive statistics.
* They can be used to summarize point patterns and the dispersion of a phenomenon across space.

# Mean Center  
* The mean center represents the average location of a set of coordinates.
* It is calculated by averaging the X coordinates and the Y coordinates separately; the two averages together form the mean center coordinate.
* Considered the Center of Gravity
* Can be strongly affected by outliers  
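
To illustrate how strongly a single outlier can pull the mean center, the sketch below recomputes it after adding one distant point. The coordinates are points A to G from Table 4.3; the outlier at (40, 40) is a hypothetical value chosen only for illustration, and `mean_center` is a helper name introduced here.

```python
import numpy as np

# Coordinates of points A-G from Table 4.3
xy = np.array([
    [2.8, 1.5], [1.6, 3.8], [3.5, 3.3], [4.4, 2.0],
    [4.3, 1.1], [5.2, 2.4], [4.9, 3.5],
])

def mean_center(points):
    """Return (mean of x, mean of y) for an (n, 2) array of coordinates."""
    return points.mean(axis=0)

center = mean_center(xy)

# Hypothetical outlier far from the cluster (illustrative value only)
outlier = np.array([[40.0, 40.0]])
center_with_outlier = mean_center(np.vstack([xy, outlier]))

# Straight-line distance the mean center moved because of one extra point
shift = np.linalg.norm(center_with_outlier - center)

print("Mean center:              ", center)
print("Mean center with outlier: ", center_with_outlier)
print("Shift caused by one point:", shift)
```

One added point out of eight moves the center well outside the original cluster, which is why the mean center should be read together with a plot of the points.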

# Mean Center Formula
$$ \overline{X}_{c} = \dfrac{\sum X_{i}}{n} \quad \text{and} \quad \overline{Y}_{c} = \dfrac{\sum Y_{i}}{n} $$
Mean of all the x coordinates and the mean of all the y coordinates.

In [18]:
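import pandas as pd

# Read Table 4.3 from the book.
# read_excel has no `index` argument; without sheet_name the first sheet is read.
# Pass sheet_name=... if the table is stored on a different sheet.
tb4p3 = pd.read_excel('../data/ClassData.xlsx')
tb4p3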

Unnamed: 0,Point,X,Y,W
0,A,2.8,1.5,5
1,B,1.6,3.8,20
2,C,3.5,3.3,8
3,D,4.4,2.0,4
4,E,4.3,1.1,6
5,F,5.2,2.4,5
6,G,4.9,3.5,3


In [19]:
XMean = tb4p3["X"].sum() / tb4p3["X"].count()
print("The mean of X is: ", XMean)
#Use Built in Function for mean
tb4p3["X"].mean()

The mean of X is:  3.8142857142857145


3.8142857142857145

In [20]:
MeanCenter = tb4p3["X"].mean(), tb4p3["Y"].mean()
print(MeanCenter)
# The Mean Center is the mean of x and the mean of y

(3.8142857142857145, 2.5142857142857147)


# Mean Center

<table>
<tr>
<td>
<img src="../figures/book/McGrew-et-al_3E---Figure-4-3.jpg" alt="Figure 4.3" width="300"/>
</td>
<td>
<img src="../figures/book/McGrew-et-al_3E---Figure-4-4.jpg" alt="Figure 4.4" width="300"/>
</td>
</tr>
</table>


# Mean Center Application

<img src="../figures/book/McGrew-et-al_3E---Figure-4-5.jpg" alt="Figure 4.5" width="700"/>
https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/mean-center.htm

# Central Tendency in a Spatial Context
* <b>Weighted Mean Center </b>
    * Points can be weighted, meaning each point can be given more or less influence on the calculation of the mean center.
    * Points could represent cities, and the weights could be frequencies, volume of sales, or some other value that affects each point's influence.


# Central Tendency in a Spatial Context
* Weighted Mean Center (Cont.)
    * Analogous to the use of frequencies in grouped statistics such as the weighted mean.
    * Pulled toward points that carry large frequencies (weights).

# Weighted Mean Center Formula
$$ \overline{X}_{wc} = \dfrac{\sum f_{i}X_{i}}{\sum f_{i}} \quad \text{and} \quad \overline{Y}_{wc} = \dfrac{\sum f_{i}Y_{i}}{\sum f_{i}} $$
* Where:
$$ \overline{X}_{wc} = \text{weighted mean center of } X $$
$$ \overline{Y}_{wc} = \text{weighted mean center of } Y $$
$$ f_{i} = \text{frequency (weight) of point } i $$

In [22]:
tb4p3["Xw"] =  tb4p3["X"] * tb4p3["W"]
tb4p3["Yw"] =  tb4p3["Y"] * tb4p3["W"]
tb4p3

Unnamed: 0,Point,X,Y,W,Xw,Yw
0,A,2.8,1.5,5,14.0,7.5
1,B,1.6,3.8,20,32.0,76.0
2,C,3.5,3.3,8,28.0,26.4
3,D,4.4,2.0,4,17.6,8.0
4,E,4.3,1.1,6,25.8,6.6
5,F,5.2,2.4,5,26.0,12.0
6,G,4.9,3.5,3,14.7,10.5


In [25]:
WeightedMeanCenterX = tb4p3["Xw"].sum() / tb4p3["W"].sum()
WeightedMeanCenterY = tb4p3["Yw"].sum() / tb4p3["W"].sum()
print(WeightedMeanCenterX, WeightedMeanCenterY)

3.099999999999999 2.8823529411764706
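
As a cross-check on the column-by-column calculation, the same weighted mean center can be obtained with NumPy's `np.average`, which computes the weighted sum divided by the sum of the weights when `weights` is supplied. This is a minimal sketch with the Table 4.3 values typed in directly rather than read from the workbook.

```python
import numpy as np

# X, Y and weight (W) columns of Table 4.3
x = np.array([2.8, 1.6, 3.5, 4.4, 4.3, 5.2, 4.9])
y = np.array([1.5, 3.8, 3.3, 2.0, 1.1, 2.4, 3.5])
w = np.array([5, 20, 8, 4, 6, 5, 3])

# np.average with weights returns sum(w * x) / sum(w)
x_wc = np.average(x, weights=w)
y_wc = np.average(y, weights=w)
print(x_wc, y_wc)
```

Point B carries a weight of 20 out of a total of 51, so the weighted center moves toward B: from about (3.81, 2.51) unweighted to about (3.10, 2.88) weighted.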


# Weighted Mean Center
<img src="../figures/book/McGrew-et-al_3E---Figure-4-6.jpg" alt="Figure 4.6" width="300"/>

# Least Squares Property
* Analogous to the least squares property of the mean
    * Sum of deviations about the mean is zero
    * Sum of squared deviations about the mean is less than the sum of squared deviations about any other number
* Deviations are distances
    * Calculated as the Euclidean distance
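
The property can be checked numerically. The sketch below uses the Table 4.3 coordinates and compares the sum of squared Euclidean distances at the mean center with the sum at a few other locations; the candidate locations and the helper name `sum_sq_dist` are arbitrary choices made for illustration.

```python
import numpy as np

# Coordinates of points A-G from Table 4.3
xy = np.array([
    [2.8, 1.5], [1.6, 3.8], [3.5, 3.3], [4.4, 2.0],
    [4.3, 1.1], [5.2, 2.4], [4.9, 3.5],
])

def sum_sq_dist(points, p):
    """Sum of squared Euclidean distances from each point to location p."""
    return float(((points - p) ** 2).sum())

center = xy.mean(axis=0)

# Deviations about the mean center sum to (numerically) zero on each axis
deviation_sums = (xy - center).sum(axis=0)
print("Sum of deviations:", deviation_sums)

# Arbitrary alternative locations; each gives a larger sum of squares
candidates = [(3.5, 2.5), (4.0, 3.0), (1.6, 3.8)]
at_center = sum_sq_dist(xy, center)
for candidate in candidates:
    other = sum_sq_dist(xy, np.array(candidate))
    print(candidate, round(other, 3), ">", round(at_center, 3))
```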

## Euclidean Median
* Considered the Median Center
* Used when determining the central location that minimizes the sum of unsquared, rather than squared, distances to the points
* Can be weighted
* Algorithmic solution rather than formulaic (requires multiple steps, but easy for a computer).
* https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/median-center.htm
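
Because no closed-form formula exists, the median center is found by repeated refinement. The sketch below uses a Weiszfeld-style iteration, which is one common approach and an assumption here; it is not necessarily the procedure used by the book or by the ArcGIS tool. The function name `median_center`, the tolerance, and the iteration cap are illustrative choices.

```python
import numpy as np

def median_center(points, weights=None, tol=1e-9, max_iter=1000):
    """Approximate the location minimizing the (weighted) sum of Euclidean distances."""
    points = np.asarray(points, dtype=float)
    w = np.ones(len(points)) if weights is None else np.asarray(weights, dtype=float)
    # Start from the (weighted) mean center
    estimate = np.average(points, axis=0, weights=w)
    for _ in range(max_iter):
        dist = np.linalg.norm(points - estimate, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        coef = w / dist                 # nearer points get larger coefficients
        new_estimate = (coef[:, None] * points).sum(axis=0) / coef.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate

# Coordinates and weights from Table 4.3
xy = np.array([
    [2.8, 1.5], [1.6, 3.8], [3.5, 3.3], [4.4, 2.0],
    [4.3, 1.1], [5.2, 2.4], [4.9, 3.5],
])
w = np.array([5, 20, 8, 4, 6, 5, 3])

med = median_center(xy)
med_w = median_center(xy, weights=w)
print("Median center:         ", med)
print("Weighted median center:", med_w)
```

Each pass recomputes a weighted average in which points closer to the current estimate count more, and the loop stops once the estimate no longer moves by more than `tol`.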

# Euclidean Median
* Used in economic geography to solve the “Weber” problem which searches for the “best” location for an industry.
* The best location will result in
    * Minimized transportation costs of raw material to factory
    * Minimized transportation costs of finished products to the market

# Euclidean Median
* Heavily used in public and private facility location
* Used to minimize the average distance a person must travel to reach a destination.
    * Useful in location of fire stations, police stations, hospitals and care centers
    * Used in conjunction with demographics to select store locations that will target the desired consumers


# Standard Distance
* Analogous to the Standard Deviation in descriptive statistics
* Measures the amount of absolute dispersion in a point pattern
* Uses the straight-line Euclidean distance of each point from the mean center
* https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/standard-distance.htm

# Standard Distance
* Like Standard Deviation, strongly influenced by extreme locations
$$ S_{D} = \sqrt{ \dfrac{\sum(X_{i}-\overline{X}_{c})^{2} + \sum(Y_{i}-\overline{Y}_{c})^{2}}{n}} $$
   * There is an alternative formula in the book (4.13)
* Weighted standard distance can be used for problems that use the weighted mean center

In [60]:
#Like table 4.5
# read_excel has no `index` argument; without sheet_name the first sheet is read
tb4p5 = pd.read_excel('../data/ClassData.xlsx')
tb4p5["sqX"] = tb4p5["X"] * tb4p5["X"]
tb4p5["sqY"] = tb4p5["Y"] ** 2
tb4p5

Unnamed: 0,Point,X,Y,W,sqX,sqY
0,A,2.8,1.5,5,7.84,2.25
1,B,1.6,3.8,20,2.56,14.44
2,C,3.5,3.3,8,12.25,10.89
3,D,4.4,2.0,4,19.36,4.0
4,E,4.3,1.1,6,18.49,1.21
5,F,5.2,2.4,5,27.04,5.76
6,G,4.9,3.5,3,24.01,12.25


In [61]:
sqXSum = tb4p5["sqX"].sum()
sqYSum = tb4p5["sqY"].sum()
sqXMean = tb4p5["X"].mean() ** 2
sqYMean = tb4p5["Y"].mean() ** 2
n = tb4p5["X"].count()
SD = (((sqXSum/n)- sqXMean) + ((sqYSum/n)- sqYMean)) ** 0.5
SD
#The answer is slightly different than book because I never rounded

1.5239583260679521
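
The cell above uses the computational shortcut (mean of squares minus square of the mean). As a check, the sketch below applies the definitional formula directly: square each point's straight-line distance from the mean center, average, and take the square root. The Table 4.3 coordinates are typed in directly, and `SD_direct` is a name introduced here.

```python
import numpy as np

# X and Y columns of Table 4.3
x = np.array([2.8, 1.6, 3.5, 4.4, 4.3, 5.2, 4.9])
y = np.array([1.5, 3.8, 3.3, 2.0, 1.1, 2.4, 3.5])

# Squared straight-line distance of each point from the mean center
sq_dist = (x - x.mean()) ** 2 + (y - y.mean()) ** 2

# Standard distance: square root of the mean squared distance
SD_direct = np.sqrt(sq_dist.mean())
print(SD_direct)
```

Both routes should agree up to floating-point rounding.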

In [68]:
#Like table 4.6
# read_excel has no `index` argument; without sheet_name the first sheet is read
tb4p5 = pd.read_excel('../data/ClassData.xlsx')
tb4p5["WsqX"] = tb4p5["X"] * tb4p5["X"] * tb4p5["W"]
tb4p5["WsqY"] = tb4p5["Y"] ** 2 * tb4p5["W"] 
tb4p5

Unnamed: 0,Point,X,Y,W,WsqX,WsqY
0,A,2.8,1.5,5,39.2,11.25
1,B,1.6,3.8,20,51.2,288.8
2,C,3.5,3.3,8,98.0,87.12
3,D,4.4,2.0,4,77.44,16.0
4,E,4.3,1.1,6,110.94,7.26
5,F,5.2,2.4,5,135.2,28.8
6,G,4.9,3.5,3,72.03,36.75


In [69]:
WsqXSum = tb4p5["WsqX"].sum()
WsqYSum = tb4p5["WsqY"].sum()
Wsum = tb4p5["W"].sum()
WXMeansq = WeightedMeanCenterX ** 2
WYMeansq = WeightedMeanCenterY ** 2
SDW = (((WsqXSum/Wsum)- WXMeansq) + ((WsqYSum/Wsum)- WYMeansq)) ** 0.5
SDW
#The answer is slightly different than book because I never rounded

1.6929734698305754
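
The weighted result can also be checked from the definition: measure each point's squared distance from the weighted mean center, then take the weighted average of those squared distances. This is a minimal sketch with the Table 4.3 values typed in directly; `SDW_direct` is a name introduced here.

```python
import numpy as np

# X, Y and weight (W) columns of Table 4.3
x = np.array([2.8, 1.6, 3.5, 4.4, 4.3, 5.2, 4.9])
y = np.array([1.5, 3.8, 3.3, 2.0, 1.1, 2.4, 3.5])
w = np.array([5, 20, 8, 4, 6, 5, 3])

# Weighted mean center
x_wc = np.average(x, weights=w)
y_wc = np.average(y, weights=w)

# Squared distance of each point from the weighted mean center
sq_dist = (x - x_wc) ** 2 + (y - y_wc) ** 2

# Weighted standard distance: root of the weighted mean squared distance
SDW_direct = np.sqrt(np.average(sq_dist, weights=w))
print(SDW_direct)
```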

# Standard Deviational Ellipse
Represents how far points vary from the mean center along each axis, so the spread can differ by direction
$$ SD_{x} = \sqrt{\dfrac{\sum(X_{i}-\overline{X}_{c})^2}{n}} \quad \text{and} \quad SD_{y} = \sqrt{\dfrac{\sum(Y_{i}-\overline{Y}_{c})^2}{n}} $$
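
A minimal sketch of the two axis-wise deviations for the Table 4.3 points follows. It covers only the quantities in the formula above and does not compute the orientation of the ellipse. Note that NumPy's `std` divides by n by default, matching the formula, whereas pandas' `Series.std` divides by n - 1 unless `ddof=0` is passed.

```python
import numpy as np

# X and Y columns of Table 4.3
x = np.array([2.8, 1.6, 3.5, 4.4, 4.3, 5.2, 4.9])
y = np.array([1.5, 3.8, 3.3, 2.0, 1.1, 2.4, 3.5])

# Population standard deviations (divide by n) along each axis
sd_x = x.std()
sd_y = y.std()
print("SD_x:", sd_x)
print("SD_y:", sd_y)

# The squared axis deviations add up to the squared standard distance
print("sqrt(SD_x^2 + SD_y^2):", np.hypot(sd_x, sd_y))
```

For these points the spread along X is larger than along Y, so the pattern is more elongated in the X direction.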

# Standard Deviational Ellipse

<table>
<tr>
<td>
<img src="../figures/book/McGrew-et-al_3E---Figure-4-11.jpg" alt="Figure 4.11" width="250"/>
</td>
<td>
<img src="../figures/book/McGrew-et-al_3E---Figure-4-12.jpg" alt="Figure 4.12" width="400"/>
</td>
</tr>
</table>
https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/directional-distribution.htm

# Distance Calculations
* Euclidean Distance
$$ d_{ij} = \sqrt{(X_{i}-X_{j})^{2} + (Y_{i}-Y_{j})^{2}} $$
* General Form
$$ d_{ij} = \left[ \, |X_{i}-X_{j}|^{k} + |Y_{i}-Y_{j}|^{k} \, \right]^{1/k} $$


# Distance Calculations
* Distance does not always have to be straight-line (Euclidean) distance
    * Manhattan Distance (k = 1 in general form)
    * Network Distance 
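
The general form can be sketched as a small function. It takes absolute coordinate differences so that odd exponents such as k = 1 behave correctly. The function name `general_distance` is introduced here for illustration; the two points are A and B from Table 4.3. Network distance needs a road or path network and is not covered by this formula.

```python
def general_distance(p, q, k):
    """Distance between 2-D points p and q with exponent k (k=1 Manhattan, k=2 Euclidean)."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return (dx ** k + dy ** k) ** (1 / k)

a = (2.8, 1.5)  # point A from Table 4.3
b = (1.6, 3.8)  # point B from Table 4.3

manhattan = general_distance(a, b, 1)
euclidean = general_distance(a, b, 2)
print("Manhattan:", manhattan)
print("Euclidean:", euclidean)
```

The Manhattan distance is never shorter than the Euclidean distance between the same two points, since it follows the axes rather than the diagonal.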

## Distance Calculations
<img src="../figures/book/McGrew-et-al_3E---Figure-4-7.jpg" alt="Figure 4.7" width="400"/>

# Coefficient of Variation 
* Calculated by dividing the standard deviation by the mean
* Measures the relative dispersion of values
* No analogous method exists for measuring spatial dispersion
* Dividing the standard distance by the mean center does not provide meaningful results
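
One way to see why the ratio fails: the mean center depends on where the coordinate origin happens to be, while the standard distance does not. The sketch below shifts the Table 4.3 points by a hypothetical offset of 100 units and compares the results; the ordinary coefficient of variation of the weight column is included for contrast. The helper name `standard_distance` is introduced here.

```python
import numpy as np

# X, Y and weight (W) columns of Table 4.3
x = np.array([2.8, 1.6, 3.5, 4.4, 4.3, 5.2, 4.9])
y = np.array([1.5, 3.8, 3.3, 2.0, 1.1, 2.4, 3.5])
w = np.array([5, 20, 8, 4, 6, 5, 3])

# Ordinary coefficient of variation: standard deviation divided by the mean
cv_w = w.std() / w.mean()
print("CV of weights:", cv_w)

def standard_distance(x, y):
    """Root mean squared distance of the points from their mean center."""
    return np.sqrt(((x - x.mean()) ** 2 + (y - y.mean()) ** 2).mean())

# Hypothetical shift of the coordinate origin; the pattern itself is unchanged
offset = 100.0
sd_original = standard_distance(x, y)
sd_shifted = standard_distance(x + offset, y + offset)

ratio_original = sd_original / x.mean()
ratio_shifted = sd_shifted / (x + offset).mean()

print("Standard distance:", sd_original, sd_shifted)   # identical dispersion
print("SD / mean X:      ", ratio_original, ratio_shifted)  # depends on origin
```

The same point pattern yields very different ratios depending only on the coordinate system, so the ratio says nothing about the pattern itself.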

# Relative Distance
* To obtain a measure of relative dispersion, the standard distance must be divided by some measure of regional magnitude
* The mean center cannot serve as that regional magnitude
* Some standard area is used to normalize the measurement. The radius of a circle the same size as the region is often used (a minimum bounding circle is one way to obtain such a circle).

# Relative Distance
$$ R_{D} = \dfrac{S_{D}}{r_{a}} $$
* This measure allows direct comparison of the dispersion of different point patterns from different areas, even if the areas are of varying sizes.
* https://pro.arcgis.com/en/pro-app/tool-reference/data-management/minimum-bounding-geometry.htm
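
A minimal sketch of the calculation follows, assuming that $r_{a}$ is the radius of a circle whose area equals the region's area (one reading of "a circle the same size"). The two region areas are hypothetical values in squared coordinate units, and `relative_distance` is a name introduced here.

```python
import math

def relative_distance(standard_distance, region_area):
    """Standard distance divided by the radius of a circle with the region's area."""
    radius = math.sqrt(region_area / math.pi)  # from area = pi * r**2
    return standard_distance / radius

# Standard distance of the Table 4.3 points, as computed earlier in the notebook
sd = 1.5239583260679521

# Hypothetical region areas (illustrative values only)
small_region = 25.0
large_region = 100.0

rd_small = relative_distance(sd, small_region)
rd_large = relative_distance(sd, large_region)
print("Relative distance, small region:", rd_small)
print("Relative distance, large region:", rd_large)
```

The same absolute dispersion counts as relatively more dispersed in the smaller region: quadrupling the area doubles the radius and halves the relative distance.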

# Relative Distance
<table>
<tr>
<td>
<img src="../figures/book/McGrew-et-al_3E---Figure-4-13.jpg" alt="Figure 4.13" width="200"/>
</td>
<td>
<img src="../figures/book/McGrew-et-al_3E---Figure-4-14.jpg" alt="Figure 4.14" width="200"/>
</td>
</tr>
</table>


# Linear Directional Mean (LDM)
* Identifies the typical or general mean direction for a set of lines
$$ LDM = \arctan \dfrac{\sum_{i=1}^{n}\sin\theta_{i}}{\sum_{i=1}^{n}\cos\theta_{i}} $$
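
The formula can be sketched with NumPy as shown below. The angles are hypothetical directions in degrees, measured from any consistent reference axis, and `linear_directional_mean` is a name introduced here. The sketch uses `np.arctan2`, which evaluates the arctangent of the ratio of the two sums while also resolving the quadrant; that choice is an assumption of this example rather than something stated in the formula.

```python
import numpy as np

def linear_directional_mean(angles_deg):
    """Mean direction, in degrees from 0 to 360, of a set of line directions."""
    theta = np.radians(angles_deg)
    sin_sum = np.sin(theta).sum()
    cos_sum = np.cos(theta).sum()
    # arctan2(sin_sum, cos_sum) is arctan(sin_sum / cos_sum) with the quadrant resolved
    return np.degrees(np.arctan2(sin_sum, cos_sum)) % 360

# Hypothetical line directions in degrees (illustrative values only)
angles = [350, 10, 20]

ldm = linear_directional_mean(angles)
arithmetic = np.mean(angles)
print("Linear directional mean:  ", ldm)
print("Arithmetic mean of angles:", arithmetic)
```

All three lines point within 20 degrees of the reference axis, and the directional mean reflects that, while the plain arithmetic mean of the angle values points in a direction none of the lines takes.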

# Linear Directional Mean (LDM)
<img src="../figures/book/McGrew-et-al_3E---Figure-4-8.jpg" alt="Figure 4.8" width="400"/>

https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/linear-directional-mean.htm

# Linear Directional Mean (LDM)
<img src="../figures/book/McGrew-et-al_3E---Figure-4-9.jpg" alt="Figure 4.9" width="400"/>

# Limitations and Locational Issues
Geographers should interpret geostatistics very carefully.
* Interpretation can be difficult
    * The mean center for a high income area could be in a low income area
* Should view geostatistics as general indicators of location instead of precise measurements

* Point pattern analysis can benefit from consideration of other possible pattern characteristics
    * Knowledge of descriptive statistics like skewness and kurtosis can offer insights about the symmetry of the pattern that geographers could find useful when comparing point patterns
    * There is value in comparing degrees of clustering and dispersal in different point patterns through measuring spatial kurtosis levels