# Practical 5 Point Density Functions: k-functions and Nearest Neighbors in PySAL

## Task 1: k-nearest neighbor weights

The neighbors for a given observation can be defined using a k-nearest neighbor criterion. For example, we could use the centroids of our 5x5 lattice as point locations to measure the distances.

First, we import numpy to create the coordinates as a 25x2 numpy array named data:

In [None]:
import numpy as np
import pysal
x,y=np.indices((5,5))
x.shape=(25,1)
y.shape=(25,1)
data=np.hstack([x,y])

Then define the KNN weights as:

In [None]:
wknn3 = pysal.weights.KNN(data, k = 3)
wknn3.neighbors[0]

>>> [1, 5, 6]

wknn3.s0

>>> 75.0
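
To make the criterion concrete, the following minimal sketch reproduces the neighbor set of the first lattice point with a brute-force search in plain Python. The helper `knn_neighbors` is a hypothetical name and not part of PySAL, and ties are broken by index order, which is an assumption of this sketch rather than documented PySAL behavior.

```python
import math

def knn_neighbors(points, k):
    """Map each point index to the indices of its k nearest other points (Euclidean)."""
    neighbors = {}
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to every other point, paired with that point's index.
        ranked = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        # Ties are broken by index order (an assumption of this sketch).
        neighbors[i] = [j for _, j in ranked[:k]]
    return neighbors

# Same ordering as the 25x2 array built with np.indices above: row-major (x, y) pairs.
lattice = [(x, y) for x in range(5) for y in range(5)]
nbrs3 = knn_neighbors(lattice, k=3)

# With binary weights, s0 (the sum of all weights) equals n * k.
s0 = float(sum(len(v) for v in nbrs3.values()))
print(sorted(nbrs3[0]), s0)
```

Every observation has exactly three neighbors, so `s0` equals 25 x 3 = 75, which agrees with the value reported by `wknn3.s0`.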


For efficiency, a KDTree is constructed to answer nearest neighbor queries. To construct several k-nearest neighbor weights from the same data, the reweight convenience method avoids rebuilding the KDTree while letting the user change aspects of the weights object. By default, reweight operates in place; passing inplace=False returns a new weights object instead:

In [None]:
w4 = wknn3.reweight(k=4, inplace=False)
w4.neighbors[0]

>>> [1,5,6,2]

l1norm = wknn3.reweight(p=1, inplace=False)
l1norm.neighbors[0]

>>> [1,5,2]

set(w4.neighbors[0]) == set([1, 5, 6, 2])

>>> True

w4.s0

>>> 100.0

w4.weights[0]

>>> [1.0, 1.0, 1.0, 1.0]
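
The idea of building the search structure once and querying it with different settings can be sketched as follows. The class `BruteForceKnnIndex` is hypothetical: it stands in for the KDTree with a plain brute-force search, and it only illustrates how `k` and the Minkowski exponent `p` change the result. Tie-breaking by index order is again an assumption of the sketch.

```python
class BruteForceKnnIndex:
    """Hypothetical stand-in for a KDTree: store the points once, answer many queries."""

    def __init__(self, points):
        self.points = list(points)

    def query(self, i, k, p=2):
        """Return the k nearest neighbors of point i under the Minkowski-p distance."""
        xi, yi = self.points[i]
        ranked = sorted(
            ((abs(xi - xj) ** p + abs(yi - yj) ** p) ** (1.0 / p), j)
            for j, (xj, yj) in enumerate(self.points)
            if j != i
        )
        # Ties are broken by index order (an assumption of this sketch).
        return [j for _, j in ranked[:k]]

# Build the index once for the 5x5 lattice, then vary k and p without rebuilding it.
index = BruteForceKnnIndex((x, y) for x in range(5) for y in range(5))
euclid_k4 = index.query(0, k=4)          # Euclidean distance, four neighbors
manhattan_k3 = index.query(0, k=3, p=1)  # Manhattan (L1) distance, three neighbors
print(euclid_k4, manhattan_k3)
```

Under the L1 norm the diagonal point 6 lies at distance 2, the same as points 2 and 10, so it no longer ranks strictly ahead of them; this is why the neighbor set changes when `p=1`.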

Alternatively, we can use a utility function to build a knn W straight from a shapefile:

In [None]:
wknn5 = pysal.weights.KNN.from_shapefile(pysal.examples.get_path('columbus.shp'), k=5)
wknn5.neighbors[0]

>>> [2, 1, 3, 7, 4]

Or from a dataframe:

In [None]:
import geopandas as gpd
df = gpd.read_file(pysal.examples.get_path('baltim.shp'))
k5 = pysal.weights.KNN.from_dataframe(df, k=5)

## Task 2: Distance band weights

KNN weights ensure that all observations have the same number of neighbors. [3] An alternative set of distance-based weights uses distance bands, or thresholds: the neighbor set of each spatial unit consists of the other units that fall within a threshold distance of that focal unit:

In [None]:
wthresh = pysal.weights.DistanceBand.from_array(data, 2)
set(wthresh.neighbors[0]) == set([1, 2, 5, 6, 10])

>>> True

set(wthresh.neighbors[1]) == set([0, 2, 5, 6, 7, 11, 3])

>>> True

wthresh.weights[0]

>>> [1, 1, 1, 1, 1]

wthresh.weights[1]

>>> [1, 1, 1, 1, 1, 1, 1]


As the example above shows, the number of neighbors is likely to vary across observations with distance band weights, in contrast to KNN weights.
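
A minimal plain-Python sketch of the distance band rule on the same lattice is given here. The helper `distance_band` is a hypothetical name, and the sketch assumes that points lying exactly at the threshold distance count as neighbors, which is consistent with the neighbor sets printed above.

```python
import math

def distance_band(points, threshold):
    """Map each point index to all other points within `threshold` (inclusive)."""
    neighbors = {}
    for i, (xi, yi) in enumerate(points):
        neighbors[i] = [
            j
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= threshold
        ]
    return neighbors

lattice = [(x, y) for x in range(5) for y in range(5)]
band = distance_band(lattice, threshold=2)

# Neighbor counts differ by location: a corner point versus the center of the lattice.
print(len(band[0]), len(band[12]))
```

The corner point 0 has five neighbors while the central point 12 has twelve, which illustrates how cardinalities vary under a fixed threshold.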

In addition to the array-based helper function, distance band weights can be constructed from other data sources. For example, a threshold binary W can be constructed from a dataframe:

In [None]:
import geopandas as gpd
df = gpd.read_file(pysal.examples.get_path('baltim.shp'))
pysal.weights.DistanceBand.from_dataframe(df, threshold=6, binary=True)

Distance band weights can be generated for shapefiles as well as arrays of points. [4] First, the minimum threshold distance (the largest nearest neighbor distance across all units) should be determined so that each unit is assured of at least one neighbor:

In [None]:
thresh = pysal.min_threshold_dist_from_shapefile(pysal.examples.get_path('columbus.shp'))
thresh

>>> 0.61886415807685413
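
The logic behind such a threshold can be sketched in plain Python, under the assumption that it is the largest of the nearest-neighbor distances. The helper `largest_nn_distance` and the six sample points are illustrative choices for this sketch, not the Columbus data.

```python
import math

def largest_nn_distance(points):
    """Largest nearest-neighbor distance: the smallest band that leaves no unit isolated."""
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        ))
    return max(nearest)

# Illustrative sample points (an assumption of this sketch).
sample_points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
sample_thresh = largest_nn_distance(sample_points)
print(round(sample_thresh, 4))
```

Any threshold at or above this value gives every point at least one neighbor; a smaller threshold would isolate the point whose nearest neighbor is farthest away, here the point at (40, 10).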

With this threshold in hand, the distance band weights are obtained as:

In [None]:
wt = pysal.weights.DistanceBand.from_shapefile(pysal.examples.get_path('columbus.shp'), threshold=thresh, binary=True)
wt.min_neighbors

>>> 1

wt.histogram

>>> [(1, 4), (2, 8), (3, 6), (4, 2), (5, 5), (6, 8), (7, 6), (8, 2), (9, 6), (10, 1), (11, 1)]

set(wt.neighbors[0]) == set([1,2])

>>> True

set(wt.neighbors[1]) == set([3,0])

>>> True

Distance band weights can also be specified to take on continuous values rather than binary, with the values set to the inverse distance separating each pair within a given threshold distance. We illustrate this with a small set of 6 points:

In [None]:
points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
wid = pysal.weights.DistanceBand.from_array(points,14.2,binary=False)
wid.weights[0]

>>> [0.10000000000000001, 0.089442719099991588]

If we change the distance decay exponent to -2.0, the result is so-called gravity weights:

In [None]:
wid2 = pysal.weights.DistanceBand.from_array(points,14.2,alpha = -2.0, binary=False)
wid2.weights[0]

>>> [0.01, 0.0079999999999999984]
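
Both sets of values follow from raising the pairwise distance to the power `alpha` for pairs within the threshold. A minimal sketch in plain Python, where `inverse_distance_weights` is a hypothetical helper and not a PySAL function:

```python
import math

def inverse_distance_weights(points, threshold, alpha=-1.0):
    """For each point, weight every other point within `threshold` by distance ** alpha."""
    weights = {}
    for i, (xi, yi) in enumerate(points):
        row = {}
        for j, (xj, yj) in enumerate(points):
            d = math.hypot(xi - xj, yi - yj)
            if j != i and d <= threshold:
                row[j] = d ** alpha
        weights[i] = row
    return weights

points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
w_inv = inverse_distance_weights(points, 14.2)               # inverse distance
w_grav = inverse_distance_weights(points, 14.2, alpha=-2.0)  # gravity weights
print(w_inv[0], w_grav[0])
```

For the first point, the two neighbors lie at distances 10 and about 11.18, which gives 1/10 and 1/11.18 with `alpha=-1.0`, and 1/100 and 1/125 with `alpha=-2.0`.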

## Task 3: Kernel weights

A combination of distance based thresholds together with continuously valued weights is supported through kernel weights:

In [None]:
points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
kw = pysal.Kernel(points)
kw.weights[0]

>>> [1.0, 0.500000049999995, 0.4409830615267465]

kw.neighbors[0]

>>> [0, 1, 3]

kw.bandwidth

>>> array([[ 20.000002],
>>>        [ 20.000002],
>>>        [ 20.000002],
>>>        [ 20.000002],
>>>        [ 20.000002],
>>>        [ 20.000002]])
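
The weights printed above are consistent with a triangular kernel, K(z) = 1 - z for z = d/h <= 1, where d is the distance and h the bandwidth. The following plain-Python sketch reproduces them under that assumption; `triangular_kernel_weights` is a hypothetical helper, not a PySAL function.

```python
import math

def triangular_kernel_weights(points, bandwidth):
    """Triangular kernel weights 1 - d/h for all points within the bandwidth h."""
    weights = {}
    for i, (xi, yi) in enumerate(points):
        row = {}
        for j, (xj, yj) in enumerate(points):
            z = math.hypot(xi - xj, yi - yj) / bandwidth
            if z <= 1.0:
                # The focal unit itself has z = 0 and therefore weight 1.0.
                row[j] = 1.0 - z
        weights[i] = row
    return weights

points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
kw_sketch = triangular_kernel_weights(points, bandwidth=20.000002)
print(kw_sketch[0])
```

Unlike the distance band weights, the focal unit appears in its own neighbor list with weight 1.0, and the weight falls linearly to zero as the distance approaches the bandwidth.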

The bandwidth attribute plays the role of the distance threshold with kernel weights, while the form of the kernel function determines the distance decay in the derived continuous weights. The following kernel functions are available: 'triangular', 'uniform', 'quadratic', 'epanechnikov', 'quartic', 'bisquare', 'gaussian'. In the above example, the bandwidth is set to the default value and fixed across the observations. The user could specify a different value for a fixed bandwidth:

In [None]:
kw15 = pysal.Kernel(points,bandwidth = 15.0)
kw15[0]

>>> {0: 1.0, 1: 0.33333333333333337, 3: 0.2546440075000701}

kw15.neighbors[0]

>>> [0, 1, 3]

kw15.bandwidth

>>> array([[ 15.],
>>>        [ 15.],
>>>        [ 15.],
>>>        [ 15.],
>>>        [ 15.],
>>>        [ 15.]])

Here the smaller bandwidth leaves the neighbor set of the first unit unchanged but lowers the weights of its neighbors. Adaptive bandwidths (i.e., different bandwidths for each unit) can also be user specified:

In [None]:
bw = [25.0,15.0,25.0,16.0,14.5,25.0]
kwa = pysal.Kernel(points,bandwidth = bw)
kwa.weights[0]

>>> [1.0, 0.6, 0.552786404500042, 0.10557280900008403]

kwa.neighbors[0]

>>> [0, 1, 3, 4]

kwa.bandwidth

>>> array([[ 25. ],
>>>        [ 15. ],
>>>        [ 25. ],
>>>        [ 16. ],
>>>        [ 14.5],
>>>        [ 25. ]])

Alternatively, the adaptive bandwidths could be defined endogenously:

In [None]:
kwea = pysal.Kernel(points,fixed = False)
kwea.weights[0]

>>> [1.0, 0.10557289844279438, 9.99999900663795e-08]

kwea.neighbors[0]

>>> [0, 1, 3]

kwea.bandwidth

>>> array([[ 11.18034101],
>>>        [ 11.18034101],
>>>        [ 20.000002  ],
>>>        [ 11.18034101],
>>>        [ 14.14213704],
>>>        [ 18.02775818]])
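
The endogenous bandwidths printed above match, up to a very small upward adjustment, the distance from each point to its second nearest neighbor. The sketch below computes that quantity in plain Python. The helper `knn_bandwidths` is hypothetical, and the choice `k=2` is an assumption inferred from the printed values.

```python
import math

def knn_bandwidths(points, k=2):
    """Per-unit bandwidth: distance from each point to its k-th nearest other point."""
    bandwidths = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        bandwidths.append(dists[k - 1])
    return bandwidths

points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
bw_sketch = knn_bandwidths(points, k=2)
print([round(b, 5) for b in bw_sketch])
```

Because the first point's bandwidth is essentially the distance to its second nearest neighbor, that neighbor receives a triangular kernel weight close to zero, as seen in `kwea.weights[0]`.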

Finally, the kernel function could be changed (with endogenous adaptive bandwidths):

In [None]:
kweag = pysal.Kernel(points,fixed = False,function = 'gaussian')
kweag.weights[0]

>>> [0.3989422804014327, 0.2674190291577696, 0.2419707487162134]

kweag.bandwidth

>>> array([[ 11.18034101],
>>>        [ 11.18034101],
>>>        [ 20.000002  ],
>>>        [ 11.18034101],
>>>        [ 14.14213704],
>>>        [ 18.02775818]])
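
The Gaussian weights above are consistent with the standard normal density evaluated at z = d/h. A minimal sketch under that assumption, using the first unit's bandwidth from the output above; `gaussian_kernel` is a hypothetical helper name.

```python
import math

def gaussian_kernel(z):
    """Standard normal density evaluated at the scaled distance z = d / h."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

h = 11.18034101  # bandwidth of the first unit, copied from the output above
# Distances from (10, 10) to itself, to (20, 10), and to (15, 20).
distances = [0.0, 10.0, math.hypot(5, 10)]
gauss = [gaussian_kernel(d / h) for d in distances]
print([round(g, 6) for g in gauss])
```

Unlike the triangular kernel, the Gaussian kernel does not reach zero at the bandwidth, so the second nearest neighbor keeps a substantial weight.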

More details on kernel weights can be found in the Kernel documentation (http://pysal.readthedocs.io/en/latest/library/weights/Distance.html#pysal.weights.Distance.Kernel).

All kernel methods also support construction from shapefiles with Kernel.from_shapefile and from dataframes with Kernel.from_dataframe.


* Please refer to http://pysal.readthedocs.io/en/latest/users/tutorials/weights.html#k-nearest-neighbor-weights for further study on the nearest neighbor and kernel weights in PySAL for analysing Point Patterns.