# DNAInfo Silhouettes

This script measures the dissimilarity between neighborhood drawings and calculates a silhouette for each drawing. For a given drawing, let $a$ be its average dissimilarity from all other drawings in its own neighborhood, and let $b$ be its average dissimilarity from all drawings in the nearest other neighborhood (the one for which this average is smallest). The silhouette is the difference $b - a$ divided by whichever of $a$ and $b$ is larger. The average over all drawings is a global measure of silhouette.

The silhouette value is a measure of how similar a drawing is to its own neighborhood (cohesion) compared to other neighborhoods (separation). The silhouette ranges from -1 to 1, where a high value indicates that the drawing is well matched to its own neighborhood and poorly matched to adjacent neighborhoods. If most drawings have a high value, the neighborhood configuration is cohesive, with similarly defined neighborhoods. If many drawings have a low or negative value, the configuration may be chaotic, overlapping or contested.
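As a worked illustration of the arithmetic, the sketch below computes the silhouette of a single drawing from made-up dissimilarities. The numbers, the neighborhood labels `X` and `Y`, and the helper names are all hypothetical and are not taken from the data.

```python
# Made-up dissimilarities (in metres) for a single drawing; not taken from the data.
own = [120.0, 80.0, 100.0]        # to the other drawings in its own neighborhood
others = {                        # to the drawings of each other neighborhood
    "X": [400.0, 380.0],
    "Y": [900.0, 950.0, 1000.0],
}


def mean(values):
    return sum(values) / float(len(values))


def silhouette(own, others):
    a = mean(own)                                  # cohesion
    b = min(mean(v) for v in others.values())      # separation: nearest other neighborhood
    return (b - a) / max(a, b)


s_good = silhouette(own, others)              # a = 100, b = 390, so s is positive
s_bad = silhouette([500.0, 700.0], others)    # a = 600, b = 390, so s is negative
print(s_good, s_bad)
```

The first drawing sits much closer to its own neighborhood than to the nearest alternative, so its silhouette is well above zero. The second is closer on average to neighborhood `X` than to its own, which gives a negative value.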

To make computation slightly easier, the data is split according to geography: Manhattan, the Bronx and Staten Island are assumed to be independent as they are separated by water. Brooklyn and Queens are processed together to account for edge effects at the Brooklyn-Queens border.

This script looks at four different measures of dissimilarity: centroid distance, Hausdorff distance, discrete Fréchet distance, and Jaccard distance. The centroid distance measure simplifies a drawing to its centroid. However, as this may reduce heterogeneity in drawings (the centroid effectively averages a drawing), we also investigate shape-based measures of dissimilarity.
Making comparisons between shapes is a difficult problem, and can be computationally expensive. There are several ways to make pairwise comparisons of simple polygons. In particular, we looked at two leading vector-based candidates, Hausdorff distance (Equation 1) and Fréchet distance (Equation 2), and one set-theoretic (overlay) approach, Jaccard distance (Equation 3).

Equation 1:<center>
$ d_{H}(A,B) = \max_{a\in A}\left \{ \min_{b\in B} \left \{ d(a,b) \right \}  \right \} $ 
</center>

where $A$ and $B$ are polygons with vertices $a$ and $b$ respectively, and $d$ is the Euclidean distance between a given vertex in $A$ and a given vertex in $B$.
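A minimal NumPy sketch of Equation 1 on two toy vertex lists follows. The function names and the shapes are illustrative only. It also shows why taking the larger of the two directions matters: the directed form from $A$ to $B$ can be zero even when $B$ has a vertex far from $A$.

```python
import numpy as np


def directed_hausdorff(A, B):
    """Max over a in A of the distance from a to its nearest b in B (Eq. 1)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    D = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))
    return D.min(axis=1).max()


def hausdorff(A, B):
    """Symmetric form: the larger of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


# A unit square, and the same square with one extra vertex pulled far out.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
spiked = [(0, 0), (1, 0), (4, 1), (1, 1), (0, 1)]

d_forward = directed_hausdorff(square, spiked)   # every square vertex is also in spiked
d_backward = directed_hausdorff(spiked, square)  # (4, 1) is 3 units from (1, 1)
d_sym = hausdorff(square, spiked)
print(d_forward, d_backward, d_sym)
```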

Equation 2: <center>
$d_{F}(P,Q) = \min_{\substack{\alpha:[0,1]\rightarrow [0,N]\\ \beta:[0,1]\rightarrow [0,M]}} \{\max_{ t \in [0,1]} \{d(P(\alpha(t)), Q(\beta(t)))\}\}$
</center>

where $P$ and $Q$ are polygons whose boundaries are parameterized over $[0,N]$ and $[0,M]$ respectively. The position of a point traversing the boundary of $P$ at time $t$ is given by $P(\alpha(t))$, and similarly for $Q$ by $Q(\beta(t))$, where $\alpha$ and $\beta$ are continuous and non-decreasing with $\alpha(0) = 0$, $\alpha(1) = N$, $\beta(0) = 0$ and $\beta(1) = M$. The Fréchet distance is thus the minimum, over all such traversals, of the maximum distance ($d$ is assumed to be Euclidean distance) between the two points as they traverse the boundaries of $P$ and $Q$.
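The code later in this notebook uses the discrete variant, which only considers the vertices of each polygon. The sketch below is an iterative version of the same recurrence, written as an assumption-laden alternative rather than the notebook's own implementation: filling the table with loops avoids deep recursion, which may matter for drawings with many vertices. The function name and toy paths are illustrative.

```python
import numpy as np


def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two ordered vertex sequences."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    n, m = len(P), len(Q)
    ca = np.empty((n, m))  # ca[i, j]: best achievable leash length up to (i, j)
    for i in range(n):
        for j in range(m):
            d = np.linalg.norm(P[i] - Q[j])
            if i == 0 and j == 0:
                ca[i, j] = d
            elif i == 0:
                ca[i, j] = max(ca[0, j - 1], d)
            elif j == 0:
                ca[i, j] = max(ca[i - 1, 0], d)
            else:
                ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d)
    return ca[n - 1, m - 1]


# Two parallel paths one unit apart, walked in the same direction.
P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (1, 1), (2, 1)]
same_direction = discrete_frechet(P, Q)

# Reversing one path changes the result: vertex order matters for Fréchet.
opposite_direction = discrete_frechet(P, Q[::-1])
print(same_direction, opposite_direction)
```

Unlike Hausdorff distance, the result depends on the order in which the vertices are visited, so two drawings of the same outline digitized in opposite directions are not treated as identical.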

Equation 3:<center>
$d_{J}\left ( R,S \right ) = \frac{\left | R \cup S \right |-\left | R \cap S \right |}{\left | R \cup S \right |}
$
</center>

where $R$ and $S$ are polygons, $\cup$ indicates union, $\cap$ indicates intersection, and $\left | \cdot \right |$ is the area of the resulting shape.
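To make Equation 3 concrete, here is a small pure-Python sketch restricted to axis-aligned rectangles, where union and intersection areas can be computed by hand. This is a simplification for illustration only. The rectangle representation `(minx, miny, maxx, maxy)` and the function names are assumptions, not part of the notebook.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle (minx, miny, maxx, maxy); 0 if empty."""
    return float(max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1]))


def jaccard_distance(r, s):
    # The intersection of two axis-aligned rectangles is a (possibly empty) rectangle.
    inter = rect_area((max(r[0], s[0]), max(r[1], s[1]),
                       min(r[2], s[2]), min(r[3], s[3])))
    union = rect_area(r) + rect_area(s) - inter
    return (union - inter) / union


identical = jaccard_distance((0, 0, 2, 2), (0, 0, 2, 2))   # full overlap
shifted = jaccard_distance((0, 0, 2, 2), (1, 0, 3, 2))     # overlap area 2, union area 6
disjoint = jaccard_distance((0, 0, 1, 1), (5, 5, 6, 6))    # no overlap
print(identical, shifted, disjoint)
```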

Hausdorff distance (Eq. 1) is conceptually simple. It is the largest distance in the set of distances between each vertex in $A$ and the nearest vertex to it in $B$. As written, Eq. 1 is directed (from $A$ to $B$); the implementation below evaluates it in both directions and keeps the larger value, which makes the measure symmetric.

Fréchet distance (Eq. 2) is slightly more difficult to understand. Imagine that a man is walking his dog on a leash: the man walks along the boundary of one polygon and his dog along the other. Each may vary their speed, but neither may move backwards. For a given walk, the length of leash required at each point in time is recorded, and the maximum is returned when the walk is finished (both polygons have been walked completely). Now consider every possible way of pacing the two walks, recording the maximum leash length needed each time. The Fréchet distance is the smallest of these maximum distances.

The Jaccard distance (Eq. 3) is effectively 1 minus the 'intersection over union' of the two shapes. Here it is given as a ratio of areas: the area of union minus the area of intersection, over the area of union. Completely overlapping (equal) drawings have a Jaccard distance of 0, and completely separate (disjoint) drawings have a Jaccard distance of 1.


In [1]:
from __future__ import division

import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd
import shapely
import numpy as np
import scipy.stats as sp
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist  # needed by HausdorffDist below
import Pycluster as pc

%matplotlib inline

In [2]:
# DNAInfo Data
ny_data = r'C:\Users\djl543\OneDrive\Draw-Your-Neighborhood-master\NYC_Analysis_wgs84.geojson'
#ny_data = r'C:\Users\Dan\OneDrive\Draw-Your-Neighborhood-master\NYC_raw_wgs84.geojson'
ny_nhoods = gpd.read_file(ny_data)

# Set the crs of the input geojson to WGS84
ny_nhoods.crs = {'init': 'epsg:4326'}

# Transform to projected coordinate system - EPSG:32118 - New York Long Island, NAD83-based projection in metres.
ny_nhoods = ny_nhoods.to_crs({'init':'epsg:32118'})

ny_nhoods.head()

Unnamed: 0,geometry,neighborhoodLive,nhood,otherNeighborhood,shapeID,yearsLived
0,"POLYGON ((311834.6421283125 77460.83003323362,...",Allerton,Allerton,,259,2
1,"POLYGON ((311305.0904452928 78317.74772848203,...",Allerton,Allerton,,298,0
2,"POLYGON ((311189.369271409 78288.7412320913, 3...",Allerton,Allerton,,5964,2
3,"POLYGON ((311199.2696228614 77331.72545915398,...",Allerton,Allerton,,59957,15
4,"POLYGON ((311137.8304619553 78879.66082988337,...",Allerton,Allerton,,61660,1


In [3]:
# Add Centroids, Area, Shape fields to data

# Calculate the geometric centroid of each record
ny_nhoods['centroid'] = ny_nhoods['geometry'].centroid

In [4]:
# Work out neighborhood median centers based on drawing centroids

# Group the data
ny_grp = ny_nhoods.groupby('nhood')

# Now calculate the median center of each neighborhood
n_hood_centroids = {}
count = 0
for nid, data in ny_grp:
    centroids = []
    for row in data.iterrows():
        centroids.append(np.array(row[1]['centroid']))
    ncenter, nmask = pc.clustercentroids(centroids,method='m')
    n_hood_centroids[count] = [nid,shapely.geometry.Point(ncenter[0]),len(data)]
    count+=1
    
# Finally, put the median center information and counts of drawings into a new geopandas dataframe
n_centroids = gpd.GeoDataFrame(n_hood_centroids).transpose()
# Name the columns
n_centroids.columns = ["neighborhood","geometry","count"]
# Add a rank column based upon the number of drawings
n_centroids['rank'] = n_centroids['count'].rank(ascending=False)
# Set the CRS of n-centroids
n_centroids.crs = {'init': 'epsg:32118'}

n_centroids.head()

Unnamed: 0,neighborhood,geometry,count,rank
0,Allerton,POINT (311643.6932491333 77597.82904214269),104,84.5
1,Alphabet City,POINT (301709.2279281036 62002.55728907162),140,68.0
2,Annadale,POINT (284646.1229115428 41711.41527360405),94,88.5
3,Arden Heights,POINT (284048.4499701513 43075.44375345457),77,106.5
4,Arrochar,POINT (293869.7801887682 47913.39909703079),40,162.0
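
For readers without Pycluster, the median center can be approximated with NumPy alone. The sketch below assumes that `pc.clustercentroids(..., method='m')` takes the median of each coordinate separately; that assumption, the function name and the toy coordinates are illustrative and have not been checked against the outputs above.

```python
import numpy as np


def median_center(points):
    """Coordinate-wise median of a sequence of (x, y) pairs."""
    return np.median(np.asarray(points, dtype=float), axis=0)


# Three toy centroids; the third is an outlier in x.
centroids = [(0.0, 0.0), (2.0, 1.0), (100.0, 3.0)]
center = median_center(centroids)
mean_center = np.mean(np.asarray(centroids), axis=0)
print(center, mean_center)
```

The median stays at the middle drawing while the mean is dragged toward the outlier, which is the usual reason for preferring a median center with hand-drawn data.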


In [5]:
# Let's add borough data so that we can aggregate by NYC borough

# New York City borough boundaries.
nyc = gpd.read_file(r'C:\Users\djl543\OneDrive\Draw-Your-Neighborhood-master\NYC.shp')

# Project to EPSG:32118 - New York Long Island, NAD83-based projection in metres.
nyc = nyc.to_crs({'init':'epsg:32118'})

# Do a spatial join of nyc on n_centroids (point in polygon)
n_centroids = gpd.sjoin(n_centroids, nyc, how="left", op='within')

# Now append the BoroName to the Drawings based on the neighborhood.
ny_nhoods = ny_nhoods.merge(n_centroids.ix[:,['neighborhood','BoroName']],left_on='nhood',right_on='neighborhood')
del ny_nhoods['neighborhood']

In [6]:
# Split the data into analytical components
manhattan = ny_nhoods[ny_nhoods['BoroName']=='Manhattan']
bronx = ny_nhoods[ny_nhoods['BoroName']=='Bronx']
brooklyn_queens = ny_nhoods[ny_nhoods['BoroName'].isin(['Brooklyn','Queens'])]
statenisland = ny_nhoods[ny_nhoods['BoroName']=='Staten Island']

## Calculate Centroid Silhouettes
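
The cell below, and the cells in the later sections, repeat the same loop for each borough and change only the distance function. As one possible way to factor that out, here is a minimal pure-Python sketch that takes the distance as a callable. The names `silhouette_scores`, `items`, `labels` and `dist` are hypothetical, and it treats a lone drawing in a neighborhood as having silhouette 0, which is a convention chosen here and not something the notebook specifies.

```python
from collections import defaultdict


def silhouette_scores(items, labels, dist):
    """Silhouette of every item, given group labels and a pairwise distance."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    scores = []
    for i, own in enumerate(labels):
        members = groups[own]
        if len(members) < 2 or len(groups) < 2:
            scores.append(0.0)  # no cohesion or no separation term available
            continue
        # a: mean distance to the other members of the same group (n - 1 of them)
        a = sum(dist(items[i], items[j]) for j in members if j != i) / float(len(members) - 1)
        # b: smallest mean distance to any other group
        b = min(sum(dist(items[i], items[j]) for j in other) / float(len(other))
                for label, other in groups.items() if label != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy usage: 1-D "drawings" with absolute difference as the distance.
items = [0.0, 1.0, 10.0, 11.0]
labels = ["A", "A", "B", "B"]
scores = silhouette_scores(items, labels, lambda p, q: abs(p - q))
print(scores)
```

With this shape, switching measure would only mean passing a different `dist`, for example a centroid, Hausdorff, Fréchet or Jaccard function operating on polygons.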

In [None]:
# Calculate silhouettes using some helper functions.
# This is a slow approach because for each drawing it looks at every other drawing.

def calc_ai(within,centroid,nhood):
    # If the drawing is within the neighborhood, divide by n-1, otherwise by n.
    if within:
        return nhood['geometry'].apply(lambda x: x.centroid.distance(centroid)).sum()/float(len(nhood)-1)
    else:
        return nhood['geometry'].apply(lambda x: x.centroid.distance(centroid)).sum()/float(len(nhood))

def calc_bi(centroid, nhoods):
    return np.min([calc_ai(None,centroid,n[1]) for n in nhoods])

# Group data by neighborhood
manhattan_grp = manhattan.groupby('nhood')
#bronx_grp = bronx.groupby('nhood')
#statenisland_grp = statenisland.groupby('nhood')
#brooklyn_queens_grp = brooklyn_queens.groupby('nhood')

silhouette = []
count = 0
# iterate through each drawing in Manhattan and compute silhouette
for row in manhattan.iterrows():
#for row in bronx.iterrows():
#for row in statenisland.iterrows():
#for row in brooklyn_queens.iterrows():
    # get the neighborhood that the candidate drawing is in. 
    nhood = manhattan_grp.get_group(row[1]['nhood'])
    #nhood = bronx_grp.get_group(row[1]['nhood'])
    #nhood = statenisland_grp.get_group(row[1]['nhood'])
    #nhood = brooklyn_queens_grp.get_group(row[1]['nhood'])
    
    # calculate ai
    ai = calc_ai(True,row[1]['centroid'],nhood)
    
    # get the neighbourhoods that the candidate drawing is not in.
    nhoods = manhattan[manhattan['nhood'] != row[1]['nhood']].groupby('nhood')
    #nhoods = bronx[bronx['nhood'] != row[1]['nhood']].groupby('nhood')
    #nhoods = statenisland[statenisland['nhood'] != row[1]['nhood']].groupby('nhood')
    #nhoods = brooklyn_queens[brooklyn_queens['nhood'] != row[1]['nhood']].groupby('nhood')
    
    # calculate bi
    bi = calc_bi(row[1]['centroid'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1

manhattan['silhouette'] = silhouette
#bronx['silhouette'] = silhouette
#statenisland['silhouette'] = silhouette
#brooklyn_queens['silhouette'] = silhouette

In [None]:
# Combine pieces
pieces = [manhattan,bronx,brooklyn_queens,statenisland]
nyc_s = pd.concat(pieces)
# geojson can't hold geometry objects other than the main geometry.
del nyc_s['centroid']
nyc_s.to_file('nyc_silhouette.geojson',driver='GeoJSON')

## Calculate Hausdorff Silhouettes

In [None]:
# Calculate Hausdorff silhouettes using some helper functions.
# This is a slow approach because for each drawing it looks at every other drawing.

def HausdorffDist(A,B):
    A = np.array(A.exterior.coords)
    B = np.array(B.exterior.coords)
    D_mat = cdist(A,B)
    dH = np.max(np.array([np.max(np.min(D_mat,axis=0)),np.max(np.min(D_mat,axis=1))]))
    return dH

def calc_ai(within,polygon,nhood):
    # If the drawing is within the neighborhood, divide by n-1, otherwise by n.
    if within:
        return nhood['geometry'].apply(lambda x: HausdorffDist(polygon,x)).sum()/float(len(nhood)-1)
    else:
        return nhood['geometry'].apply(lambda x: HausdorffDist(polygon,x)).sum()/float(len(nhood))

def calc_bi(polygon, nhoods):
    return np.min([calc_ai(None,polygon,n[1]) for n in nhoods])

# Group data by neighborhood
manhattan_grp = manhattan.groupby('nhood')

silhouette = []
count = 0
current_nhood = ''
# iterate through each drawing in Manhattan and compute silhouette
for row in manhattan.iterrows():
    # get the neighborhood that the candidate drawing is in. 
    nhood = manhattan_grp.get_group(row[1]['nhood'])
    # calculate ai
    ai = calc_ai(True,row[1]['geometry'],nhood)
    # Only rebuild the "other neighborhoods" grouping when the neighborhood changes
    # (do it ~250 times, not ~40,000!). current_nhood must be updated for this to work.
    if row[1]['nhood'] != current_nhood:
        current_nhood = row[1]['nhood']
        # get the neighbourhoods that the candidate drawing is not in.
        nhoods = manhattan[manhattan['nhood'] != current_nhood].groupby('nhood')
    # calculate bi
    bi = calc_bi(row[1]['geometry'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1
manhattan['hausdorff_silhouette'] = silhouette

del manhattan_grp

bronx_grp = bronx.groupby('nhood')

silhouette = []
count = 0
current_nhood = ''
# iterate through each drawing in the Bronx and compute silhouette
for row in bronx.iterrows():
    # get the neighborhood that the candidate drawing is in.
    nhood = bronx_grp.get_group(row[1]['nhood'])
    # calculate ai
    ai = calc_ai(True,row[1]['geometry'],nhood)
    # Only rebuild the "other neighborhoods" grouping when the neighborhood changes
    # (do it ~250 times, not ~40,000!). current_nhood must be updated for this to work.
    if row[1]['nhood'] != current_nhood:
        current_nhood = row[1]['nhood']
        # get the neighbourhoods that the candidate drawing is not in.
        nhoods = bronx[bronx['nhood'] != current_nhood].groupby('nhood')
    # calculate bi
    bi = calc_bi(row[1]['geometry'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1

bronx['hausdorff_silhouette'] = silhouette

del bronx_grp

statenisland_grp = statenisland.groupby('nhood')

silhouette = []
count = 0
current_nhood = ''
# iterate through each drawing in Staten Island and compute silhouette
for row in statenisland.iterrows():
    # get the neighborhood that the candidate drawing is in. 
    nhood = statenisland_grp.get_group(row[1]['nhood'])
    # calculate ai
    ai = calc_ai(True,row[1]['geometry'],nhood)
    # Only rebuild the "other neighborhoods" grouping when the neighborhood changes
    # (do it ~250 times, not ~40,000!). current_nhood must be updated for this to work.
    if row[1]['nhood'] != current_nhood:
        current_nhood = row[1]['nhood']
        # get the neighbourhoods that the candidate drawing is not in.
        nhoods = statenisland[statenisland['nhood'] != current_nhood].groupby('nhood')
    
    # calculate bi
    bi = calc_bi(row[1]['geometry'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1

statenisland['hausdorff_silhouette'] = silhouette

del statenisland_grp

brooklyn_queens_grp = brooklyn_queens.groupby('nhood')

silhouette = []
count = 0
current_nhood = ''
# iterate through each drawing in Brooklyn and Queens and compute silhouette
for row in brooklyn_queens.iterrows():
    # get the neighborhood that the candidate drawing is in. 
    nhood = brooklyn_queens_grp.get_group(row[1]['nhood'])
    
    # calculate ai
    ai = calc_ai(True,row[1]['geometry'],nhood)
    # Only rebuild the "other neighborhoods" grouping when the neighborhood changes
    # (do it ~250 times, not ~40,000!). current_nhood must be updated for this to work.
    if row[1]['nhood'] != current_nhood:
        current_nhood = row[1]['nhood']
        # get the neighbourhoods that the candidate drawing is not in.
        nhoods = brooklyn_queens[brooklyn_queens['nhood'] != current_nhood].groupby('nhood')
    
    # calculate bi
    bi = calc_bi(row[1]['geometry'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1

brooklyn_queens['hausdorff_silhouette'] = silhouette

del brooklyn_queens_grp

# Combine pieces and save to file
pieces = [manhattan,bronx,brooklyn_queens,statenisland]
nyc_s = pd.concat(pieces)
del nyc_s['centroid']
nyc_s.to_file('nyc_Hsilhouette.geojson',driver='GeoJSON')

## Calculate Fréchet Silhouettes
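
The cell below limits the search for the nearest other neighborhood to those whose median centers are closest to the drawing's centroid. The following sketch isolates that pre-filter on toy data. The neighborhood names, coordinates and the helper `candidates` are hypothetical. It also caps `k` at the number of centers, on the understanding that `cKDTree.query` pads missing neighbors with an out-of-range index when asked for more points than the tree holds.

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy neighborhood centers; names and coordinates are made up.
names = ["A", "B", "C", "D", "E"]
centers = np.array([[0, 0], [1, 0], [5, 0], [10, 0], [0, 2]], dtype=float)
tree = cKDTree(centers)


def candidates(tree, names, point, own, k):
    """Names of the k nearest centers to `point`, nearest first, excluding `own`."""
    k = min(k, tree.n)  # never ask for more neighbors than there are centers
    _, idx = tree.query(np.asarray(point, dtype=float), k=k)
    return [names[i] for i in np.atleast_1d(idx) if names[i] != own]


near = candidates(tree, names, [0.1, 0.0], own="A", k=3)
everything = candidates(tree, names, [0.1, 0.0], own="A", k=15)
print(near, everything)
```

Because the drawing's own neighborhood is usually among the nearest centers, asking for `k` neighbors typically leaves `k - 1` candidate neighborhoods for the separation term.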

In [None]:
# Calculate Discrete Frechet silhouettes using some helper functions.
# Using: https://gist.github.com/MaxBareiss/ba2f9441d9455b56fbc9
# This approach uses a KDTree in order to restrict the number of comparisons made in order to reduce computation time.
# Testing with the Bronx suggested that k = 10 produces the same output as a brute force approach; here we conservatively use k = 15.

def _c(ca,i,j,P,Q):
    if ca[i,j] > -1:
        return ca[i,j]
    elif i == 0 and j == 0:
        ca[i,j] = np.linalg.norm(P[0]-Q[0])
    elif i > 0 and j == 0:
        ca[i,j] = max(_c(ca,i-1,0,P,Q),np.linalg.norm(P[i]-Q[0]))
    elif i == 0 and j > 0:
        ca[i,j] = max(_c(ca,0,j-1,P,Q),np.linalg.norm(P[0]-Q[j]))
    elif i > 0 and j > 0:
        ca[i,j] = max(min(_c(ca,i-1,j,P,Q),_c(ca,i-1,j-1,P,Q),_c(ca,i,j-1,P,Q)),np.linalg.norm(P[i]-Q[j]))
    else:
        ca[i,j] = float("inf")
    return ca[i,j]

def frechetDist(P,Q):
    P = np.array(P.exterior.coords)
    Q = np.array(Q.exterior.coords)
    ca = np.ones((len(P),len(Q)))
    ca = np.multiply(ca,-1)
    return _c(ca,len(P)-1,len(Q)-1,P,Q)

def calc_ai(within,polygon,nhood):
    # If the drawing is within the neighborhood, divide by n-1, otherwise by n.
    if within:
        return nhood['geometry'].apply(lambda x: frechetDist(polygon,x)).sum()/float(len(nhood)-1)
    else:
        return nhood['geometry'].apply(lambda x: frechetDist(polygon,x)).sum()/float(len(nhood))

def calc_bi(polygon, nhoods):
    return np.min([calc_ai(None,polygon,n[1]) for n in nhoods])

# Group data by neighborhood
manhattan_grp = manhattan.groupby('nhood')

points = np.array(list(n_centroids[n_centroids['BoroName']=='Manhattan']['geometry'].apply(lambda x: [x.x,x.y])))
tree = cKDTree(points)

silhouette = []
count = 0
# iterate through each drawing in Manhattan and compute silhouette
for row in manhattan.iterrows():
    # get the neighborhood that the candidate drawing is in. 
    nhood = manhattan_grp.get_group(row[1]['nhood'])
    # calculate ai
    ai = calc_ai(True,row[1]['geometry'],nhood)
    # Get nearest neighbor - dd is distance, ii is index
    dd, ii = tree.query(np.array(row[1]['geometry'].centroid), k=15)
    
    # Lookup neighborhood, returns array with neighborhood name, assuming field called 'neighborhood'.
    nh = n_centroids[n_centroids['BoroName']=='Manhattan'].reset_index().ix[ii]['neighborhood'].values
    
    # Remove neighborhood from neighbor candidates
    nh = nh[nh != row[1]['nhood']]
    
    # Make candidate neighborhoods
    nhoods = manhattan[manhattan['nhood'].isin(nh)].groupby('nhood')
    # calculate bi
    bi = calc_bi(row[1]['geometry'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1
manhattan['frechet_silhouette'] = silhouette

del manhattan_grp

points = np.array(list(n_centroids[n_centroids['BoroName']=='Bronx']['geometry'].apply(lambda x: [x.x,x.y])))
bronx_grp = bronx.groupby('nhood')

tree = cKDTree(points)

silhouette = []
count = 0
# iterate through each drawing in the Bronx and compute silhouette
for row in bronx.iterrows():
    # get the neighborhood that the candidate drawing is in.
    nhood = bronx_grp.get_group(row[1]['nhood'])  
    # calculate ai
    ai = calc_ai(True,row[1]['geometry'],nhood)
    
    # Get nearest neighbor - dd is distance, ii is index
    dd, ii = tree.query(np.array(row[1]['geometry'].centroid), k=15)
    
    # Lookup neighborhood, returns array with neighborhood name, assuming field called 'neighborhood'.
    nh = n_centroids[n_centroids['BoroName']=='Bronx'].reset_index().ix[ii]['neighborhood'].values
    
    # Remove neighborhood from neighbor candidates
    nh = nh[nh != row[1]['nhood']]
    
    # Make candidate neighborhoods
    nhoods = bronx[bronx['nhood'].isin(nh)].groupby('nhood')
    
    # calculate bi
    bi = calc_bi(row[1]['geometry'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1

bronx['frechet_silhouette'] = silhouette

del bronx_grp
points = np.array(list(n_centroids[n_centroids['BoroName']=='Staten Island']['geometry'].apply(lambda x: [x.x,x.y])))
statenisland_grp = statenisland.groupby('nhood')

tree = cKDTree(points)

silhouette = []
count = 0
# iterate through each drawing in Staten Island and compute silhouette
for row in statenisland.iterrows():
    # get the neighborhood that the candidate drawing is in. 
    nhood = statenisland_grp.get_group(row[1]['nhood'])
    # calculate ai
    ai = calc_ai(True,row[1]['geometry'],nhood)
    
    # Get nearest neighbor - dd is distance, ii is index
    dd, ii = tree.query(np.array(row[1]['geometry'].centroid), k=15)
    
    # Lookup neighborhood, returns array with neighborhood name, assuming field called 'neighborhood'.
    nh = n_centroids[n_centroids['BoroName']=='Staten Island'].reset_index().ix[ii]['neighborhood'].values
    
    # Remove neighborhood from neighbor candidates
    nh = nh[nh != row[1]['nhood']]
    
    # Make candidate neighborhoods
    nhoods = statenisland[statenisland['nhood'].isin(nh)].groupby('nhood')
    
    # calculate bi
    bi = calc_bi(row[1]['geometry'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1

statenisland['frechet_silhouette'] = silhouette

del statenisland_grp

brooklyn_queens_grp = brooklyn_queens.groupby('nhood')

points = np.array(list(n_centroids[n_centroids['BoroName'].isin(['Brooklyn','Queens'])]['geometry'].apply(lambda x: [x.x,x.y])))

tree = cKDTree(points)

silhouette = []
count = 0
# iterate through each drawing in Brooklyn and Queens and compute silhouette
for row in brooklyn_queens.iterrows():
    # get the neighborhood that the candidate drawing is in. 
    nhood = brooklyn_queens_grp.get_group(row[1]['nhood'])
    
    # calculate ai
    ai = calc_ai(True,row[1]['geometry'],nhood)
    
    # Get nearest neighbor - dd is distance, ii is index
    dd, ii = tree.query(np.array(row[1]['geometry'].centroid), k=15)
    
    # Lookup neighborhood, returns array with neighborhood name, assuming field called 'neighborhood'.
    nh = n_centroids[n_centroids['BoroName'].isin(['Brooklyn','Queens'])].reset_index().ix[ii]['neighborhood'].values
    
    # Remove neighborhood from neighbor candidates
    nh = nh[nh != row[1]['nhood']]
    
    # Make candidate neighborhoods
    nhoods = brooklyn_queens[brooklyn_queens['nhood'].isin(nh)].groupby('nhood')
    
    # calculate bi
    bi = calc_bi(row[1]['geometry'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1

brooklyn_queens['frechet_silhouette'] = silhouette

del brooklyn_queens_grp

pieces = [manhattan,bronx,statenisland,brooklyn_queens]
nyc_s = pd.concat(pieces)
del nyc_s['centroid']
nyc_s.to_file('nyc_Fsilhouette.geojson',driver='GeoJSON')

## Calculate Jaccard Silhouettes

In [None]:
# This is a simple function, but computationally heavy, so the KD-tree heuristic is used again.

def jacquardDist(P,Q):
    union = P.union(Q).area
    intersection = P.intersection(Q).area
    return (union-intersection)/union

def calc_ai(within,polygon,nhood):
    # If the drawing is within the neighborhood, divide by n-1, otherwise by n.
    if within:
        return nhood['geometry'].apply(lambda x: jacquardDist(polygon,x)).sum()/float(len(nhood)-1)
    else:
        return nhood['geometry'].apply(lambda x: jacquardDist(polygon,x)).sum()/float(len(nhood))

def calc_bi(polygon, nhoods):
    return np.min([calc_ai(None,polygon,n[1]) for n in nhoods])

# Group data by neighborhood
manhattan_grp = manhattan.groupby('nhood')
#bronx_grp = bronx.groupby('nhood')
#statenisland_grp = statenisland.groupby('nhood')
#brooklyn_queens_grp = brooklyn_queens.groupby('nhood')

# Get points from a geopandas dataframe (needs some coercing)
points = np.array(list(n_centroids[n_centroids['BoroName']=='Manhattan']['geometry'].apply(lambda x: [x.x,x.y])))
#points = np.array(list(n_centroids[n_centroids['BoroName']=='Bronx']['geometry'].apply(lambda x: [x.x,x.y])))
#points = np.array(list(n_centroids[n_centroids['BoroName']=='Staten Island']['geometry'].apply(lambda x: [x.x,x.y])))
#points = np.array(list(n_centroids[n_centroids['BoroName'].isin(['Brooklyn','Queens'])]['geometry'].apply(lambda x: [x.x,x.y])))

# Make a tree
tree = cKDTree(points)

silhouette = []
count = 0
# iterate through each drawing in Manhattan and compute silhouette
for row in manhattan.iterrows():
#for row in bronx.iterrows():
#for row in statenisland.iterrows():
#for row in brooklyn_queens.iterrows():
    # get the neighborhood that the candidate drawing is in. 
    nhood = manhattan_grp.get_group(row[1]['nhood'])
    #nhood = bronx_grp.get_group(row[1]['nhood'])
    #nhood = statenisland_grp.get_group(row[1]['nhood'])
    #nhood = brooklyn_queens_grp.get_group(row[1]['nhood'])
    
    # calculate ai
    ai = calc_ai(True,row[1]['geometry'],nhood)
    
    # Get nearest neighbor - dd is distance, ii is index
    dd, ii = tree.query(np.array(row[1]['geometry'].centroid), k=15)
    
    # Lookup neighborhood, returns array with neighborhood name, assuming field called 'neighborhood'.
    nh = n_centroids[n_centroids['BoroName']=='Manhattan'].reset_index().ix[ii]['neighborhood'].values
    #nh = n_centroids[n_centroids['BoroName']=='Bronx'].reset_index().ix[ii]['neighborhood'].values
    
    # Remove neighborhood from neighbor candidates
    nh = nh[nh != row[1]['nhood']]
    
    # Make candidate neighborhoods
    nhoods = manhattan[manhattan['nhood'].isin(nh)].groupby('nhood')
    #nhoods = bronx[bronx['nhood'].isin(nh)].groupby('nhood')
    
    # calculate bi
    bi = calc_bi(row[1]['geometry'],nhoods)
    # Calculate si
    si = (bi-ai)/max(ai,bi)
    silhouette.append(si)
    if count %100 == 0:
        print count
    count += 1

manhattan['jacquard_silhouette'] = silhouette

## Combine Silhouette Measures for Analysis
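
Merging whole dataframes on `shapeID` carries every shared column (geometry, neighborhood name and so on) across each time, which pandas disambiguates with `_x`/`_y` suffixes. A hedged alternative is to bring across only the silhouette column from each file. The sketch below uses tiny invented stand-ins for the four files; the silhouette values are made up and the helper `with_measure` is hypothetical.

```python
import pandas as pd

# Invented stand-ins for the four GeoJSON files; values are not real results.
centroid = pd.DataFrame({"shapeID": [259, 298], "nhood": ["Allerton", "Allerton"],
                         "silhouette": [0.5, 0.4]})
hausdorff = pd.DataFrame({"shapeID": [259, 298], "nhood": ["Allerton", "Allerton"],
                          "hausdorff_silhouette": [0.3, 0.2]})
frechet = pd.DataFrame({"shapeID": [259, 298], "nhood": ["Allerton", "Allerton"],
                        "frechet_silhouette": [0.1, 0.0]})
jacquard = pd.DataFrame({"shapeID": [259, 298], "nhood": ["Allerton", "Allerton"],
                         "jacquard_silhouette": [0.2, -0.1]})


def with_measure(base, other, column, key="shapeID"):
    """Left-join a single silhouette column from `other` onto `base`."""
    return base.merge(other[[key, column]], on=key, how="left")


combined = centroid
for frame, column in [(hausdorff, "hausdorff_silhouette"),
                      (frechet, "frechet_silhouette"),
                      (jacquard, "jacquard_silhouette")]:
    combined = with_measure(combined, frame, column)

print(list(combined.columns))
```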

In [None]:
# Read in each dataset
centroid_data = r'C:\Users\djl543\OneDrive\Draw-Your-Neighborhood-master\nyc_silhouette.geojson'
centroid = gpd.read_file(centroid_data)

hausdorff_data = r'C:\Users\djl543\OneDrive\Draw-Your-Neighborhood-master\nyc_Hsilhouette.geojson'
hausdorff = gpd.read_file(hausdorff_data)

frechet_data = r'C:\Users\djl543\OneDrive\Draw-Your-Neighborhood-master\nyc_Fsilhouette.geojson'
frechet = gpd.read_file(frechet_data)

jacquard_data = r'C:\Users\djl543\OneDrive\Draw-Your-Neighborhood-master\nyc_Jsilhouette.geojson'
jacquard = gpd.read_file(jacquard_data)

In [None]:
# Merge all dataframes to the centroid dataframe
centroid = centroid.merge(hausdorff, on='shapeID')
centroid = centroid.merge(frechet, on='shapeID')
centroid = centroid.merge(jacquard, on='shapeID')