In this exercise we're going to get into some key spatial statistics. So far in this course we've mostly been visualizing spatial distributions and patterns. Here we will run statistical tests to determine whether or not a pattern or spatial structure exists, and to test what kind of pattern (dispersed vs. random vs. clustered) is present.


In [0]:
# start by installing tools as usual
!pip install geopandas
!pip install descartes
!pip install mapclassify
!pip install pysal

In [0]:
#and importing tools...
import geopandas as gpd
import requests
import zipfile
import io
import matplotlib.pyplot as plt

%matplotlib inline

import seaborn as sns
import pandas as pd
import pysal  # later cells refer to pysal.lib and pysal.explore by this full name
import numpy as np


In [0]:
# And now, as usual, get the data
url = 'https://github.com/ropitz/spatialarchaeology/blob/master/gabii_spatial.zip?raw=true'
local_path = 'temp/'

print('Downloading shapefile...')
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
print("Done")

z.extractall(path=local_path) # extract to folder

filenames = [y for y in sorted(z.namelist()) for ending in ['dbf', 'prj', 'shp', 'shx'] if y.endswith(ending)] 
print(filenames)
gabii = gpd.read_file(local_path + 'gabii_SU_poly.shp')

In [0]:
# As you've done before, print out some information on the data to check it has loaded in ok
print("Shape of the dataframe: {}".format(gabii.shape))
print("Projection of dataframe: {}".format(gabii.crs))
gabii.tail() #last 5 records in dataframe

In [0]:
# As we've done before (returning to the Gabii finds data) get the non-spatial special finds data
sf_su = pd.read_csv("https://raw.githubusercontent.com/ropitz/gabii_experiments/master/spf_SU.csv")
sf_su.head()

In [0]:
#Then let's combine our polygons representing context shape and location
#with the special finds data
# We do this with a command called 'merge'

gabii_textools = gabii.merge(sf_su, on='SU')
gabii_textools.head()

In [0]:
#Let's pull all those find types out of the big list. These commands should look familiar because you've done them before.
types = ['Loom Weight','Spool','Spindle Whorl']
textile_tools = gabii_textools.loc[gabii_textools['SF_OBJECT_TYPE'].isin(types)]

# Now let's count up how many of these tools appear in each context (SU).
# This builds a table with one row per SU and one column per tool type, with 0 where a type is absent.
textile_tool_counts = textile_tools.groupby('SU')['SF_OBJECT_TYPE'].value_counts().unstack().fillna(0)


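# Attach the counts to the SU polygons. This is an inner join, so SUs with no textile tools drop out.
gts = gabii_textools.merge(textile_tool_counts, on='SU')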
gts_new = gts.drop_duplicates(subset="SU")
gts_new.head()
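If the `groupby(...).value_counts().unstack().fillna(0)` chain feels opaque, here it is on a tiny made-up finds table (the SU numbers and finds are hypothetical, not from Gabii). It shows how one row per find becomes one row per SU with one column per tool type, and how non-textile finds are filtered out first.

```python
import pandas as pd

# Hypothetical finds table: one row per special find
finds = pd.DataFrame({
    'SU': [101, 101, 101, 102, 103, 103],
    'SF_OBJECT_TYPE': ['Spool', 'Spool', 'Loom Weight', 'Spindle Whorl', 'Spool', 'Coin'],
})
types = ['Loom Weight', 'Spool', 'Spindle Whorl']
textile = finds.loc[finds['SF_OBJECT_TYPE'].isin(types)]  # drops the coin

# One row per SU, one column per tool type; missing SU/type combinations become 0
counts = textile.groupby('SU')['SF_OBJECT_TYPE'].value_counts().unstack().fillna(0)
print(counts)
```

SU 101 ends up with two spools and one loom weight, while SU 102 gets a 0 in the `Spool` column because it only has a spindle whorl.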

Now plot your data to visualize it.

In [0]:
# Set up figure and axis
f, ax = plt.subplots(1, figsize=(9, 9))
# Optional: uncomment to draw every SU in light grey underneath
#gabii.plot(ax=ax, facecolor='0.85', linewidth=0)
# Colour each SU polygon by its spool count, classified with Fisher-Jenks natural breaks
gts_new.plot(column='Spool', scheme='fisher_jenks', ax=ax,
        cmap='YlGn', legend=True, linewidth=3)
# Remove axis frame
ax.set_axis_off()
# Change background color of the figure
f.set_facecolor('0.75')
# Keep axes proportionate
plt.axis('equal')
# Title
f.suptitle('Spool Distribution', size=30)
# Draw
plt.show()

So far you've (rapidly) repeated the steps you've done in a previous exercise to visualize a spatial pattern - this time of the spools discovered while excavating at Gabii. 

Now, how do you statistically test whether there is a pattern? It's not obvious from just looking at the distribution. We can start with some of the more basic tests, global Moran's I and local Moran's I, which test for spatial autocorrelation: the tendency of nearby places to have similar (or dissimilar) values.

Read about [Moran's I](https://mgimond.github.io/Spatial/spatial-autocorrelation.html)
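For reference, global Moran's I for values $x_i$ with deviations $z_i = x_i - \bar{x}$ and spatial weights $w_{ij}$ is $I = \frac{n}{S_0}\frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $S_0$ is the sum of all the weights. The sketch below computes it directly with NumPy on two made-up sequences of six areas in a row (not the Gabii data), with hypothetical binary weights linking each area to the areas next to it. It is only an illustration of the formula; PySAL's `Moran` does this, plus the significance testing, for you.

```python
import numpy as np

def morans_i(y, w):
    """Global Moran's I for values y and an n x n spatial weights matrix w."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    z = y - y.mean()        # deviations from the mean
    s0 = w.sum()            # S0: sum of all weights
    num = z @ w @ z         # sum_i sum_j w_ij * z_i * z_j
    den = (z ** 2).sum()    # sum_i z_i^2
    return (len(y) / s0) * num / den

# Hypothetical layout: 6 areas in a row, neighbours are the directly adjacent areas
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = [1, 1, 1, 5, 5, 5]    # similar values sit together
alternating = [1, 5, 1, 5, 1, 5]  # neighbours always differ

print(morans_i(clustered, w))
print(morans_i(alternating, w))
```

The clustered sequence gives an I of about 0.6 and the alternating one about -1.0. Note that under spatial randomness the expected value is $-1/(n-1)$ (here -0.2), not exactly 0.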


In [0]:
# To start your Moran's statistical test, you need to create weights that define how strongly you think things near to one another influence one another.
# see the types of weights available to you
help(pysal.lib.weights)
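To see what k-nearest-neighbour (KNN) weights actually are, here is a minimal NumPy sketch, assuming each SU is represented by a single point such as its centroid. The coordinates are hypothetical. Each point is linked to its k closest other points, and the matrix is then row-standardized so every row sums to 1. This is an illustration of the idea, not PySAL's implementation.

```python
import numpy as np

def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbour weights from an (n, 2) array of points."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between all points
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # a point is never its own neighbour
    w = np.zeros((n, n))
    for i in range(n):
        nearest = np.argsort(dist[i])[:k]     # indices of the k closest points
        w[i, nearest] = 1.0
    return w / w.sum(axis=1, keepdims=True)   # row-standardize: each row sums to 1

# Hypothetical SU centroids (x, y): two small groups far apart
centroids = np.array([[0, 0], [1, 0], [0, 2], [10, 10], [12, 10]])
w = knn_weights(centroids, k=2)
print(w)
```

KNN weights are not symmetric: here SU 3 counts SU 2 among its two nearest neighbours, but SU 2 does not count SU 3 among its own. Every SU also always gets exactly k neighbours, even isolated ones.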

In [0]:
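#create some weights. I've gone with KNN weights: each SU is linked to its 5 nearest neighbours,
#measured between the centroids of the SU polygons. Ignore the warnings, we know not all the SU areas connect up physically
# .copy() gives an independent table, so adding columns later doesn't trigger pandas' SettingWithCopyWarning
gts_spool = gts_new[['SU','Spool']].copy()
# The weights must come from the geometry (gts_new), not from the SU numbers and counts
gts_spool_weights = pysal.lib.weights.KNN.from_dataframe(gts_new, k=5)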

In [0]:
# Make sure the IDs in the weights object match the row index of gts_spool
gts_spool_weights.remap_ids(gts_spool.index)
# Row standardize the matrix
gts_spool_weights.transform = 'R'

In [0]:
#add the spatial lag to the attribute table: for each SU, the weighted average of its neighbours' spool counts
gts_spool['w_spool'] = pysal.lib.weights.lag_spatial(gts_spool_weights, gts_spool['Spool'])
gts_spool.head()
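The spatial lag is just the weights matrix multiplied by the vector of values, $Wy$. With row-standardized weights, that is simply the average value of each SU's neighbours. A small NumPy sketch with four hypothetical SUs and made-up weights and counts:

```python
import numpy as np

# Hypothetical row-standardized weights for 4 SUs (each row sums to 1)
w = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.0],
])
spools = np.array([4.0, 0.0, 2.0, 6.0])  # hypothetical spool counts per SU

# Spatial lag W @ y: with row-standardized W, the mean of each SU's neighbours
lag = w @ spools
print(lag)
```

SU 1's neighbours are SUs 0 and 3 (4 and 6 spools), so its lag is 5.0 even though it has no spools itself.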

Now we want to standardize our counts.

Read about [standardization](http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-statistics-toolbox/modeling-spatial-relationships.htm#GUID-DB9C20A7-51DB-4704-A0D7-1D4EA22C23A7) in spatial modelling to see why.
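Standardizing here means converting each count to a z-score: subtract the mean, then divide by the standard deviation, so that 0 means "average" and the units are standard deviations. A sketch on hypothetical counts, using `ddof=1` to match pandas' `Series.std()`, which the next cell uses:

```python
import numpy as np

spools = np.array([0.0, 0.0, 1.0, 3.0, 6.0])  # hypothetical spool counts
# z-score: (value - mean) / sample standard deviation
spool_std = (spools - spools.mean()) / spools.std(ddof=1)
print(spool_std.round(3))
```

After the transformation the values have mean 0 and standard deviation 1; SUs with more spools than average get positive values, which is what puts them on the "high" side of the Moran scatterplot.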

In [0]:
# standardize the counts of the number of spools in each context (z-scores), then take the spatial lag of the standardized counts
gts_spool['spool_std'] = (gts_spool['Spool'] - gts_spool['Spool'].mean()) / gts_spool['Spool'].std()
gts_spool['w_spool_std'] = pysal.lib.weights.lag_spatial(gts_spool_weights, gts_spool['spool_std'])
gts_spool.head()

In [0]:
#get some more tools for the Moran test
from pysal.explore.esda.moran import Moran

In [0]:
# Run the Moran test
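mi = Moran(gts_spool['Spool'], gts_spool_weights)
# Moran's I itself, and its pseudo p-value from the permutation test
print("Moran's I:", mi.I)
print("Pseudo p-value:", mi.p_sim)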


What does the value above mean?
Read how to [interpret the results](https://www.statisticshowto.datasciencecentral.com/morans-i/).

Are your spools actually clustered?
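To decide whether a given I is "real", one approach is a permutation test: shuffle the values randomly across the same areas many times, recompute I each time, and see how unusual the observed I is. As far as we understand, this is the idea behind the pseudo p-value PySAL reports as `p_sim`; the sketch below is a simplified NumPy version on a hypothetical row of 20 areas, not PySAL's exact code.

```python
import numpy as np

def morans_i(y, w):
    """Global Moran's I for values y and an n x n weights matrix w."""
    z = y - y.mean()
    return (len(y) / w.sum()) * (z @ w @ z) / (z @ z)

def moran_permutation_test(y, w, permutations=999, seed=0):
    """Observed Moran's I and a one-tailed pseudo p-value from random permutations."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = morans_i(y, w)
    # Shuffle the values across the same areas and recompute I each time
    simulated = np.array([morans_i(rng.permutation(y), w) for _ in range(permutations)])
    larger = int((simulated >= observed).sum())
    # Use whichever tail the observed I falls in (high I = clustering, low I = dispersion)
    larger = min(larger, permutations - larger)
    p_value = (larger + 1) / (permutations + 1)
    return observed, p_value

# Hypothetical layout: 20 areas in a row, neighbours are the adjacent areas
n = 20
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = np.array([0.0] * 10 + [5.0] * 10)  # all the high values sit together
observed, p = moran_permutation_test(clustered, w)
print(observed, p)
print("expected I under spatial randomness:", -1 / (n - 1))
```

Here almost no random shuffle produces an I as high as the observed one, so the pseudo p-value is very small. With 999 permutations the smallest possible value is 0.001, which is why you will often see exactly that number in PySAL output.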

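Now plot the results as a Moran scatterplot. Each point is one SU: the x-axis is its standardized spool count and the y-axis is the spatial lag of that value (the average standardized count of its neighbours).

The four quadrants distinguish a high value surrounded by high values (HH), a low value surrounded by low values (LL), a high value surrounded primarily by low values (HL, an outlier), and a low value surrounded primarily by high values (LH, an outlier). The scatterplot itself says nothing about statistical significance; that is tested with the local Moran's I, at the 95 percent confidence level (p < 0.05).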
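The regression line that `sns.regplot` draws is more than decoration: with row-standardized weights and standardized values, its slope equals global Moran's I. The sketch below checks this with NumPy on randomly generated, hypothetical centroids and counts (not the Gabii data), using 4-nearest-neighbour weights.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 30
coords = rng.uniform(0, 100, size=(n, 2))      # hypothetical SU centroids
y = rng.poisson(2, size=n).astype(float)       # hypothetical spool counts

# Row-standardized 4-nearest-neighbour weights
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
w = np.zeros((n, n))
for i in range(n):
    w[i, np.argsort(d[i])[:4]] = 1.0
w /= w.sum(axis=1, keepdims=True)

z = (y - y.mean()) / y.std(ddof=1)  # standardized values (x-axis of the scatterplot)
lag_z = w @ z                       # their spatial lag (y-axis)

slope = np.polyfit(z, lag_z, 1)[0]  # slope of an ordinary least-squares line
moran_i = (z @ w @ z) / (z @ z)     # Moran's I; S0 = n because W is row-standardized
print(slope, moran_i)
```

So a steep upward line in the scatterplot means strong positive autocorrelation, a flat line means little, and a downward line means neighbouring SUs tend to differ.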

In [0]:
# Setup the figure and axis
f, ax = plt.subplots(1, figsize=(9, 9))
# Plot values
sns.regplot(x='spool_std', y='w_spool_std', data=gts_spool)
# Add vertical and horizontal lines
plt.axvline(0, c='k', alpha=0.5)
plt.axhline(0, c='k', alpha=0.5)
ax.set_xlim(-2, 7)
ax.set_ylim(-2.5, 2.5)
plt.text(3, 1.5, "HH", fontsize=25)
plt.text(3, -1.5, "HL", fontsize=25)
plt.text(-1, 1.5, "LH", fontsize=25)
plt.text(-1, -1.5, "LL", fontsize=25)
# Display
plt.show()

We started by looking at the global pattern, that is, the overall pattern. But might there be local patterns inside the global one? To test this, we use the local variant of Moran's I.

In [0]:
# get the tools for the local test
from pysal.explore.esda.moran import Moran_Local

In [0]:
# run the local test
lisa = Moran_Local(gts_spool['Spool'].values, gts_spool_weights)

The local test breaks the global pattern down to test for the presence of local clusters. For each SU you can check whether it is likely, in the statistical-significance sense, to be part of a local cluster or to be a local outlier.

A positive value for I indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. A negative value for I indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant. Note that the local Moran's I index (I) is a relative measure and can only be interpreted within the context of its computed z-score or p-value.
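The core of local Moran's I for SU $i$ is its own z-score times the spatial lag of the z-scores, $I_i \propto z_i \sum_j w_{ij} z_j$. The minimal NumPy sketch below uses a hypothetical row of 10 SUs with made-up counts; PySAL's `Moran_Local` applies its own positive scaling constant, so the magnitudes differ, but the signs, which separate clusters from outliers, should be the same for the same weights.

```python
import numpy as np

def local_morans_i(y, w):
    """Unscaled local Moran's I: z_i times the spatial lag of z, with row-standardized w."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()
    return z * (w @ z)

# Hypothetical layout: 10 SUs in a row, row-standardized to their adjacent SUs
n = 10
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0
w /= w.sum(axis=1, keepdims=True)

spools = np.array([8, 8, 0, 0, 0, 0, 8, 0, 0, 0], dtype=float)
local_i = local_morans_i(spools, w)
print(np.sign(local_i))
```

SUs 0 and 1 form a high-high cluster and SUs 3 and 4 a low-low one (positive I), while SU 6, a lone high value among zeros, is a high-low outlier (negative I). Whether any of these is significant still depends on the p-values.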

In [0]:
# Break observations into significant or not (pseudo p-value from the permutation test below 0.05)
gts_spool['significant'] = lisa.p_sim < 0.05
# Store the quadrant each SU belongs to. PySAL numbers them 1 = high-high (HH), 2 = low-high (LH), 3 = low-low (LL), 4 = high-low (HL)
gts_spool['quadrant'] = lisa.q
gts_spool['significant'][:20]
# True means the SU is part of a significant local cluster or is a significant outlier; False means it is not

In [0]:
# You can read out the calculated pseudo p-values for each SU (here, the first 20)
lisa.p_sim[:20]

In [0]:
#add this info back onto the spatial data
gabii_spool_lisa = gabii.merge(gts_spool, on='SU')
gabii_spool_lisa.head()

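Now we can make a map showing which quadrant each SU belongs to, essentially a display of where the local clusters and outliers are located. Note that this map colours every SU by its quadrant, whether or not its result is statistically significant.

PySAL numbers the quadrants 1 = high-high (HH), 2 = low-high (LH), 3 = low-low (LL), 4 = high-low (HL).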
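To map only the meaningful results, you can combine the quadrant codes with the significance test. The sketch below reproduces the 1 to 4 quadrant numbering from standardized values and their spatial lags, then labels each SU with its cluster type only if its pseudo p-value is below 0.05. All input values are hypothetical; in the notebook the equivalent inputs would be `lisa.q` and `lisa.p_sim`.

```python
import numpy as np

def lisa_quadrant(z, lag_z):
    """Quadrant codes: 1 = HH, 2 = LH, 3 = LL, 4 = HL (values of 0 count as low)."""
    q = np.empty(len(z), dtype=int)
    q[(z > 0) & (lag_z > 0)] = 1
    q[(z <= 0) & (lag_z > 0)] = 2
    q[(z <= 0) & (lag_z <= 0)] = 3
    q[(z > 0) & (lag_z <= 0)] = 4
    return q

labels = np.array(['HH', 'LH', 'LL', 'HL'])
z = np.array([1.2, -0.8, -1.0, 0.9, 0.3])          # hypothetical standardized counts
lag_z = np.array([0.7, 0.5, -0.6, -0.4, 0.1])      # hypothetical spatial lags
p_sim = np.array([0.01, 0.20, 0.03, 0.04, 0.60])   # hypothetical pseudo p-values

q = lisa_quadrant(z, lag_z)
# Keep the cluster/outlier label only where the result is significant
cluster = np.where(p_sim < 0.05, labels[q - 1], 'not significant')
print(cluster)
```

A column built this way can be plotted with `categorical=True`, so that non-significant SUs get their own neutral class instead of being coloured like clusters.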

In [0]:
# Setup the figure and axis
f, ax = plt.subplots(1, figsize=(9, 9))

# Colour each SU polygon by its LISA quadrant (1 = HH, 2 = LH, 3 = LL, 4 = HL) as separate classes
gabii_spool_lisa.plot(column='quadrant', categorical=True, ax=ax,
        cmap='Blues', legend=True, linewidth=3)

ax.set_axis_off()
plt.axis('equal')
plt.show()

How would you interpret the results of this analysis?

This exercise ends here. Hopefully you've learned that there are statistical tests for spatial patterns, and that these let us go beyond 'just visualizing' to test whether a pattern is really there.