# 4. Spatial Smoothing, Regionalization, and Neighborhood Analysis
## 4.1 Spatial Smoothing
#### 4.1.1 Introduction
In many studies, the target variable needs to be modeled as a rate or other normalized value rather than a raw count, for reasons including:
* Varying underlying population
* Variation in the age structure of underlying population
* Variation in environmental variables across a study area
* Arbitrarily shaped areal units

These rates can be heavily influenced by unusually high or low raw counts. To reduce that influence, spatial smoothing can be used to moderate extreme values at an observation by drawing on the values of nearby observations.

#### 4.1.2 Mean and Median Smoothing
One of the simplest ways to conduct spatial smoothing is locally weighted averaging. This technique takes the rates of the surrounding areal units (as defined by the spatial weights matrix W) and uses their weighted average as the smoothed value at a given observation.
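
The weighted-average idea can be sketched in a few lines of plain Python. This is an illustration only, not PySAL's implementation: the function name `mean_smooth`, the toy counts, and the four-unit chain of neighbors are all hypothetical, and binary weights are assumed when none are supplied.

```python
# Illustrative sketch of locally weighted mean smoothing (not PySAL's code).
def mean_smooth(events, base, neighbors, weights=None):
    """Return, for each unit, the weighted average of its neighbors' raw rates."""
    nan = float("nan")
    # Raw rate per unit; a zero base population yields nan instead of an error.
    rates = {i: (events[i] / float(base[i]) if base[i] else nan) for i in events}
    smoothed = {}
    for i, nbrs in neighbors.items():
        # Assumption: binary weights (each neighbor counts equally) by default.
        wts = weights[i] if weights is not None else [1.0] * len(nbrs)
        total = sum(wts)
        if total == 0:
            smoothed[i] = nan  # a unit with no neighbors cannot be smoothed
        else:
            smoothed[i] = sum(wt * rates[j] for wt, j in zip(wts, nbrs)) / total
    return smoothed

# Hypothetical toy data: four areal units arranged in a chain 0-1-2-3.
events = {0: 1, 1: 2, 2: 4, 3: 8}
base = {0: 10, 1: 10, 2: 10, 3: 10}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smoothed = mean_smooth(events, base, neighbors)
```

In this toy example the raw rate of unit 3 (0.8) is pulled down toward the rate of its only neighbor, which is the moderating effect that smoothing is meant to provide.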

In [2]:
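# Conduct mean smoothing on a dataset
import pysal
import numpy as np
from pysal.esda import smoothing as sm
# Alternative: build queen-contiguity weights directly from the shapefile
# tw = pysal.queen_from_shapefile("./data/census/stpete_cenacs_2015.shp") # W
# Read the spatial weights (W) from a GAL file
w = pysal.open('./data/census/stpete_cenacs_2015.gal', 'r').read()
cenacs = pysal.open("./data/census/stpete_cenacs_2015.dbf", 'r')
# e = event counts (column 37), b = base population at risk (column 14)
e, b = np.array(cenacs[:,37]), np.array(cenacs[:,14])
# Give W an explicit ID order if the file did not set one
if not w.id_order_set: w.id_order = range(1,len(cenacs) + 1)
# Smooth each rate using the weighted average of neighboring rates
rate = sm.Disk_Smoother(e, b, w)
rate.r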

  r = e * 1.0 / b


array([[        nan],
       [ 0.28978714],
       [ 0.31227247],
       [ 0.28597484],
       [ 0.25410764],
       [ 0.38129535],
       [ 0.3666606 ],
       [ 0.42608267],
       [ 0.14037201],
       [ 0.40075638],
       [ 0.40457529],
       [ 0.35861035],
       [ 0.37758482],
       [ 0.36879074],
       [ 0.37546536],
       [ 0.30926673],
       [ 0.32052604],
       [ 0.31834301],
       [ 0.36707935],
       [ 0.36302275],
       [ 0.37051485],
       [ 0.31219222],
       [ 0.28216787],
       [ 0.28507133],
       [ 0.31776291],
       [ 0.30363948],
       [ 0.28300834],
       [ 0.29529016],
       [ 0.25881302],
       [ 0.41813164],
       [ 0.40099614],
       [ 0.24318322],
       [        nan],
       [ 0.30809461],
       [ 0.18840403],
       [        nan],
       [        nan],
       [ 0.32086027],
       [ 0.44807609],
       [ 0.33281751],
       [ 0.33285112],
       [ 0.32828625],
       [ 0.23658932],
       [ 0.26084882],
       [ 0.3406834 ],
       [ 0

<img src="./img/genx_rate.png" width="1000" height="1000"/>
<img src="./img/genx_rate_nosmooth.png" width="1000" height="1000"/>

In [3]:
# Conduct median smoothing with the same e, b, and w
rate = sm.Spatial_Median_Rate(e, b, w)
rate.r

  self.r = e * 1.0 / b
  r = func(a, **kwargs)


array([        nan,  0.26782433,  0.30886076,  0.26503841,  0.23707957,
        0.35096005,  0.37196731,  0.47841727,  0.15708973,  0.46934866,
        0.35096005,  0.37065678,  0.38178699,  0.30798479,  0.28250401,
        0.23047375,  0.28178694,  0.22751452,  0.28214548,  0.3102424 ,
        0.3125    ,  0.2952444 ,  0.28598131,  0.24973223,  0.31116121,
        0.23809524,  0.2897179 ,  0.30798479,  0.2482611 ,  0.37426901,
        0.4055666 ,  0.22675417,         nan,  0.2958128 ,  0.13786646,
               nan,         nan,  0.29641694,  0.45016066,  0.28301887,
        0.35353535,  0.29641694,  0.21516755,  0.21348315,         nan,
        0.29641694,  0.26728079,  0.30814354,  0.3590333 ,  0.316609  ,
        0.4137931 ,  0.28598131,  0.27785672,  0.31116121,  0.40517241,
        0.31820487,  0.31904306,  0.26041667,  0.48616125,  0.34199766,
        0.23329283,  0.28514192,  0.3125    ,  0.2061697 ,  0.25247319,
        0.27165354,  0.2635488 ,  0.33113407,  0.16538462,  0.16
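
Median smoothing can be illustrated with a short pure-Python sketch. It is not PySAL's implementation: the name `median_smooth` and the toy data are hypothetical, and including each unit's own rate in the window alongside its neighbors' rates is an assumption made here.

```python
from statistics import median

# Illustrative sketch of spatial median smoothing (not PySAL's code).
def median_smooth(events, base, neighbors):
    """Return, for each unit, the median rate over the unit and its neighbors."""
    # Raw rate per unit; base populations are assumed to be nonzero here.
    rates = {i: events[i] / float(base[i]) for i in events}
    smoothed = {}
    for i, nbrs in neighbors.items():
        # Assumption: the window is the unit itself plus its neighbors.
        window = [rates[i]] + [rates[j] for j in nbrs]
        smoothed[i] = median(window)
    return smoothed

# Hypothetical toy data: a chain 0-1-2-3 in which unit 2 has an extreme rate.
events = {0: 1, 1: 2, 2: 9, 3: 3}
base = {0: 10, 1: 10, 2: 10, 3: 10}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smoothed_median = median_smooth(events, base, neighbors)
```

Because the median ignores the magnitude of outliers, the extreme raw rate at unit 2 (0.9) is replaced by the middle value of its window (0.3), whereas a mean would still be pulled upward by it.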

#### 4.1.3 Spatial Missing Value Imputation


## 4.2 Regionalization
#### 4.2.1 Introduction
Spatial regions are groups of observations that have similar qualities to the observations they share a boundary with. Regionalization is a technique used to assign regions to a spatial dataset and to attribute region membership to each observation. The technique is analogous to unsupervised learning methods such as hierarchical or k-means clustering, which determine cluster membership in feature space.
#### 4.2.2 max-p
Max-p is a regionalization method that does not require the number of regions (clusters) to be specified a priori. The only required input is a floor constraint, such as the minimum number of observations within a region. Let's calculate max-p for the median household income values in St. Petersburg, FL.

In [4]:
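# Calculate max-p for a set of areal units
stnorm = pysal.open("./data/census/stpete_cenacs_2015_norms.dbf", 'r')
y = np.array(stnorm)  # attribute values, one row per areal unit
# floor_variable is a column of ones (231 areal units), so floor=5 requires
# at least 5 observations per region; initial=99 is the number of initial solutions
r = pysal.Maxp(w, y, floor=5, floor_variable=np.ones((231, 1)), initial=99)
r.regions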

[['43', '42', '49', '24', '50'],
 ['197', '198', '192', '54', '134', '200', '204', '206'],
 ['65', '74', '73', '104', '113'],
 ['80', '79', '64', '66', '81'],
 ['228', '25', '3', '1', '4', '224'],
 ['223', '221', '202', '199', '7', '216', '219'],
 ['15', '161', '159', '155', '166'],
 ['119', '138', '62', '137', '59', '116', '114', '145'],
 ['148', '143', '177', '178', '149', '147', '146'],
 ['17', '47', '48', '11', '46', '12'],
 ['212', '213', '215', '126', '214'],
 ['160', '170', '169', '158', '168'],
 ['180', '187', '57', '186', '182', '56'],
 ['0', '230', '222', '227', '29', '39'],
 ['107', '58', '176', '172', '162'],
 ['205', '203', '208', '211', '125'],
 ['13', '21', '16', '27', '18'],
 ['77', '84', '91', '78', '88', '93', '83', '86'],
 ['142', '115', '141', '117', '118'],
 ['82', '85', '44', '76', '229', '87'],
 ['33', '23', '45', '22', '30'],
 ['226', '225', '220', '2', '5'],
 ['133', '127', '193', '135', '136'],
 ['174', '175', '181', '191', '188', '196'],
 ['150', '179', '151'
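
To make the floor constraint concrete, the following sketch checks whether a proposed set of regions is feasible: every region must be spatially connected, and its floor variable must sum to at least the floor. It does not implement the max-p search itself. The helper names `is_connected` and `is_feasible` and the six-unit chain are hypothetical and exist only for illustration.

```python
from collections import deque

# Illustrative feasibility check for a regionalization (not the max-p heuristic).
def is_connected(region, neighbors):
    """Breadth-first search restricted to the members of one region."""
    members = set(region)
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nbr in neighbors[current]:
            if nbr in members and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen == members

def is_feasible(regions, neighbors, floor_variable, floor):
    """True if every region is non-empty, meets the floor, and is contiguous."""
    for region in regions:
        if not region:
            return False
        if sum(floor_variable[i] for i in region) < floor:
            return False
        if not is_connected(region, neighbors):
            return False
    return True

# Hypothetical toy data: six units in a chain '0'-'1'-'2'-'3'-'4'-'5'.
neighbors = {'0': ['1'], '1': ['0', '2'], '2': ['1', '3'],
             '3': ['2', '4'], '4': ['3', '5'], '5': ['4']}
ones = {unit: 1 for unit in neighbors}  # counts observations, like np.ones

ok = is_feasible([['0', '1', '2'], ['3', '4', '5']], neighbors, ones, floor=3)
too_small = is_feasible([['0', '1'], ['2', '3', '4', '5']], neighbors, ones, floor=3)
split = is_feasible([['0', '1', '3'], ['2', '4', '5']], neighbors, ones, floor=3)
```

With a floor variable of ones, the floor is simply a minimum region size, which mirrors the `floor = 5` setting used in the cell above.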

## 4.3 Neighborhood Analysis