# Portland, OR Neighborhood Clustering for Remote Employees

### This Notebook is used for the Capstone Project for the IBM Data Science Professional Certificate.
### Author: Skyler Schilke


### Introduction
Portland, OR is a very popular city among many different demographics.  It has friendly people, great food, good shops, and the list goes on.  

Within the city of Portland, there are 94 neighborhoods.  Choosing which neighborhood to live in can be a daunting decision to someone who would like to move to Portland or to a different neighborhood in Portland.  

For this project, I will be grouping and ranking Portland's neighborhoods based on categories that are important to a remote employee who works from home.  These categories will be walking score, biking score, number of bars, and number of coffee shops.  

### Data Requirements
In order to group all these neighborhoods, I will need the following data from the following sources:
1. List of neighborhoods from Portland Oregon Gov website
2. Latitude and Longitude of the center of each neighborhood from Google Maps Geocode API
3. Walking and Biking score from Walk Score API
4. Number of Bars and Coffee shops from Foursquare API

In [14]:
# !conda install -c conda-forge geopy --yes # uncomment this line if geopy is downloaded
# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is downloaded

In [15]:
# import list of neighborhoods
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3
import numpy as np
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium
import requests
from sklearn import preprocessing
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [3]:
# The code was removed by Watson Studio for sharing.

(98, 10)


Unnamed: 0,OBJECTID,NAME,COMMPLAN,SHARED,COALIT,HORZ_VERT,MAPLABEL,ID,Shape_Length,Shape_Area
0,1,CATHEDRAL PARK,,,NPNS,HORZ,Cathedral Park,31,11434.254777,5424298.0
1,2,UNIVERSITY PARK,,,NPNS,HORZ,University Park,88,11950.859827,6981457.0
2,3,PIEDMONT,ALBINA,,NPNS,VERT,Piedmont,70,10849.327392,6079530.0
3,4,WOODLAWN,ALBINA,,NECN,HORZ,Woodlawn,93,8078.360994,3870554.0
4,5,CULLY ASSOCIATION OF NEIGHBORS,,,CNN,HORZ,Cully Association of Neighbors,23,18179.39209,16580620.0


In [4]:
# filter out the 'unclaimed' neighborhoods in this list
df_neigh = df_neigh[~df_neigh.NAME.str.contains("UNCLAIMED")]
df_neigh = df_neigh[['NAME']]
print(df_neigh.shape)
df_neigh.head()

(94, 1)


Unnamed: 0,NAME
0,CATHEDRAL PARK
1,UNIVERSITY PARK
2,PIEDMONT
3,WOODLAWN
4,CULLY ASSOCIATION OF NEIGHBORS


In [5]:
# change words to title format
df_neigh = df_neigh.rename(columns={'NAME':'Name'})
df_neigh['Name'] = df_neigh['Name'].str.title()
df_neigh.head()

Unnamed: 0,Name
0,Cathedral Park
1,University Park
2,Piedmont
3,Woodlawn
4,Cully Association Of Neighbors


In [6]:
# # fix data to represent areas instead of associations
# df_neigh = df_neigh.replace(' Association Of Neighbors', '', regex=True)
# df_neigh = df_neigh.replace(' Community Association', '', regex=True)
# df_neigh = df_neigh.replace(' Community Group', '', regex=True)
# df_neigh = df_neigh.replace(' Association', '', regex=True)
# df_neigh = df_neigh.replace(' Neighborhood Network', '', regex=True)
# df_neigh = df_neigh.replace(' Neighborhood District Assn.', '', regex=True)
# df_neigh = df_neigh.replace(' League', '', regex=True)
# df_neigh = df_neigh.replace(' Improvement', '', regex=True)
# df_neigh = df_neigh.replace(' Residential', '', regex=True)
# df_neigh.head()

In [7]:
# this replaces the spaces with + for the url
df_neigh_plus = df_neigh.replace(' ', '+', regex=True)

In [8]:
# The code was removed by Watson Studio for sharing.

In [9]:
# add a blank column for lat and long to df_neigh
df_neigh['Latitude'] = None
df_neigh['Longitude'] = None

# create a loop that goes through df_neigh and appends the latitude and longitude
# search for df_neigh['Name'][i] + ' Neighborhood, Portland, OR'
# if that raises an error, search for df_neigh['Name'][i] + ', Portland, OR'
# if that raises an error, set them = None
for i in range(0, len(df_neigh), 1):
    try:
        name = df_neigh_plus['Name'][i]
        request_url = 'https://maps.googleapis.com/maps/api/geocode/json?address=' + name + '+,+Portland,+OR&key=' + api_key
        request = pd.read_json(request_url)
        lat = request['results'][0]['geometry']['location']['lat']
        lng = request['results'][0]['geometry']['location']['lng']
        df_neigh.iat[i, df_neigh.columns.get_loc('Latitude')] = lat
        df_neigh.iat[i, df_neigh.columns.get_loc('Longitude')] = lng
    except:
        try:
            request_url = 'https://maps.googleapis.com/maps/api/geocode/json?address=' + name + '+Neighborhood,+Portland,+OR&key=' + api_key
            request = pd.read_json(request_url)
            lat = request['results'][0]['geometry']['location']['lat']
            lng = request['results'][0]['geometry']['location']['lng']
            df_neigh.iat[i, df_neigh.columns.get_loc('Latitude')] = lat
            df_neigh.iat[i, df_neigh.columns.get_loc('Longitude')] = lng
        except:
            lat = None
            lng = None
    
print('Longitude and Lat loaded!')

Longitude and Lat loaded!


In [10]:
# use geopy library to get lat and long of Portland
city = 'Portland, OR'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Portland, OR are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Portland, OR are 45.5202471, -122.6741949.


In [16]:
# create a map of Portland with neighborhoods marked
# create map of Portland using latitude and longitude values
map_portland = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neigh in zip(df_neigh['Latitude'], df_neigh['Longitude'], df_neigh['Name']):
    label = '{}'.format(neigh)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_portland)  
map_portland

# neighborhoods do not match correct lat and long.  We will proceed based on area

In [17]:
# The code was removed by Watson Studio for sharing.

In [18]:
# now we will use Foursquare API to generate nearby bars and coffee houses
# explore neighborhoods in manhattan
# create function to repeat the same process to all neighborhoods

LIMIT = 100
radius = 800 # half mile radius, since all markers are roughly a mile a part
def getNearbyVenues(query, names, latitudes, longitudes, radius=radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&query={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            query,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)



In [19]:
# get coffee shops
df_neigh_coffee = getNearbyVenues(query='coffee', names=df_neigh['Name'], latitudes=df_neigh['Latitude'], longitudes=df_neigh['Longitude'])

# check the shape of the df
print(df_neigh_coffee.shape)
df_neigh_coffee.head()

Cathedral Park
University Park
Piedmont
Woodlawn
Cully Association Of Neighbors
Arbor Lodge
Overlook
Concordia
Parkrose
Sumner Association Of Neighbors
Humboldt
King
Vernon
Northwest Heights
Rose City Park
Parkrose Heights Association Of Neighbors
Russell
Pearl District
Woodland Park
Montavilla
Laurelhurst
Kerns
North Tabor
Old Town Community Association
Glenfair
Buckman Community Association
Portland Downtown
Mt. Tabor
Sunnyside
South Portland
Homestead
South Tabor
Brooklyn Action Corps
Creston-Kenilworth
Hillsdale
Foster-Powell
Mt. Scott-Arleta
Sellwood-Moreland Improvement League
Hayhurst
Maplewood
Multnomah
Brentwood-Darlington
South Burlingame
Markham
Marshall Park
Collins View
West Portland Park
Arnold Creek
Far Southwest
Linnton
Sylvan-Highlands
Arlington Heights
Southwest Hills Residential League
Ashcreek
Crestwood
Ardenwald-Johnson Creek
Woodstock
Eastmoreland
Reed
Pleasant Valley
Centennial Community Association
Powellhurst-Gilbert
Lents
Hazelwood
Mill Park
Wilkes Community G

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Cathedral Park,45.58749,-122.762478,Two Stroke Coffee,45.59168,-122.756629,Coffee Shop
1,Cathedral Park,45.58749,-122.762478,The Great North,45.590399,-122.754684,Coffee Shop
2,Cathedral Park,45.58749,-122.762478,Anna Bannanas,45.590769,-122.755616,Coffee Shop
3,Cathedral Park,45.58749,-122.762478,Affogato,45.590794,-122.755594,Coffee Shop
4,Cathedral Park,45.58749,-122.762478,St Johns Coffee Roasters,45.589342,-122.754122,Coffee Shop


In [20]:
# get bars
df_neigh_bars = getNearbyVenues(query='bars', names=df_neigh['Name'], latitudes=df_neigh['Latitude'], longitudes=df_neigh['Longitude'])

# check the shape of the df
print(df_neigh_bars.shape)
df_neigh_bars.head()

Cathedral Park
University Park
Piedmont
Woodlawn
Cully Association Of Neighbors
Arbor Lodge
Overlook
Concordia
Parkrose
Sumner Association Of Neighbors
Humboldt
King
Vernon
Northwest Heights
Rose City Park
Parkrose Heights Association Of Neighbors
Russell
Pearl District
Woodland Park
Montavilla
Laurelhurst
Kerns
North Tabor
Old Town Community Association
Glenfair
Buckman Community Association
Portland Downtown
Mt. Tabor
Sunnyside
South Portland
Homestead
South Tabor
Brooklyn Action Corps
Creston-Kenilworth
Hillsdale
Foster-Powell
Mt. Scott-Arleta
Sellwood-Moreland Improvement League
Hayhurst
Maplewood
Multnomah
Brentwood-Darlington
South Burlingame
Markham
Marshall Park
Collins View
West Portland Park
Arnold Creek
Far Southwest
Linnton
Sylvan-Highlands
Arlington Heights
Southwest Hills Residential League
Ashcreek
Crestwood
Ardenwald-Johnson Creek
Woodstock
Eastmoreland
Reed
Pleasant Valley
Centennial Community Association
Powellhurst-Gilbert
Lents
Hazelwood
Mill Park
Wilkes Community G

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Cathedral Park,45.58749,-122.762478,Your Inn,45.593823,-122.761859,Bar
1,Cathedral Park,45.58749,-122.762478,Slim's,45.590471,-122.754992,Dive Bar
2,Cathedral Park,45.58749,-122.762478,Occidental Wursthaus,45.588864,-122.761344,German Restaurant
3,Cathedral Park,45.58749,-122.762478,Hoplandia Beer,45.589662,-122.755614,Beer Store
4,Cathedral Park,45.58749,-122.762478,StormBreaker St. Johns,45.589758,-122.753088,Brewery


In [22]:
# group and merge the counts into the df
df_neigh_bars_count = df_neigh_bars.groupby(['Neighborhood']).size().reset_index(name='BarsCount')
df_neigh_bars_count = df_neigh_bars_count.rename(columns={'Neighborhood':'Name'})

df_neigh = pd.merge(df_neigh, df_neigh_bars_count,
               on=['Name'],
               how="left").fillna({'Count': 0})

# remove BarsCount_x column
df_neigh = df_neigh.drop(columns='BarsCount_x')
# change column name of BarsCount_y to BarsCount
df_neigh = df_neigh.rename(columns={'BarsCount_y':'BarsCount'})
# turn all NaN's in BarsCount to 0
df_neigh['BarsCount'] = df_neigh['BarsCount'].fillna(0)
# convert to integer
df_neigh['BarsCount'] = df_neigh['BarsCount'].astype(float)
df_neigh.head()

Unnamed: 0,Name,Latitude,Longitude,BarsCount
0,Cathedral Park,45.5875,-122.762,11.0
1,University Park,45.5785,-122.732,4.0
2,Piedmont,45.5651,-122.668,10.0
3,Woodlawn,45.5697,-122.652,6.0
4,Cully Association Of Neighbors,45.5547,-122.574,4.0


In [24]:
# group and merge the counts into the df
df_neigh_coffee_count = df_neigh_coffee.groupby(['Neighborhood']).size().reset_index(name='CoffeeCount')
df_neigh_coffee_count = df_neigh_coffee_count.rename(columns={'Neighborhood':'Name'})

df_neigh = pd.merge(df_neigh, df_neigh_coffee_count,
               on=['Name'],
               how="left").fillna({'Count': 0})

# remove CoffeeCount_x column
df_neigh = df_neigh.drop(columns='CoffeeCount_x')
# change column name of CoffeeCount_y to CoffeeCount
df_neigh = df_neigh.rename(columns={'CoffeeCount_y':'CoffeeCount'})
# turn all NaN's in CoffeeCount to 0
df_neigh['CoffeeCount'] = df_neigh['CoffeeCount'].fillna(0)
# convert to integer
df_neigh['CoffeeCount'] = df_neigh['CoffeeCount'].astype(float)
df_neigh.head()

Unnamed: 0,Name,Latitude,Longitude,BarsCount,CoffeeCount
0,Cathedral Park,45.5875,-122.762,11.0,9.0
1,University Park,45.5785,-122.732,4.0,3.0
2,Piedmont,45.5651,-122.668,10.0,9.0
3,Woodlawn,45.5697,-122.652,6.0,4.0
4,Cully Association Of Neighbors,45.5547,-122.574,4.0,1.0


In [25]:
# The code was removed by Watson Studio for sharing.

In [26]:
# use the Walkscore API to extract walkscore and bikescore based on coordinates
# add a blank column for walkscore and bikescore to df_neigh
df_neigh['Walkscore'] = None
df_neigh['Bikescore'] = None

# create a loop that goes through df_neigh and appends the walkscore and bikescore
# search for df_neigh['Name'][i] + ' Neighborhood, Portland, OR'
# if that raises an error, search for df_neigh['Name'][i] + ', Portland, OR'
# if that raises an error, set them = None
for i in range(0, len(df_neigh), 1):
    try:
        lat = df_neigh['Latitude'][i]
        lng = df_neigh['Longitude'][i]
        request_url = 'http://api.walkscore.com/score?format=json&lat=' + str(lat) + '&lon=' + str(lng) + '&transit=1&bike=1&wsapikey=' + walkscore_api
        request = pd.read_json(request_url)
        walk = request['walkscore']['score']
        df_neigh.iat[i, df_neigh.columns.get_loc('Walkscore')] = walk
        bike = request['bike']['score']
        df_neigh.iat[i, df_neigh.columns.get_loc('Bikescore')] = bike
    except:
        bike = None
        walk = None
    
print('Walkscore and Bikescore loaded!')

Walkscore and Bikescore loaded!


In [27]:
# change walkscore and bikescore to floats
df_neigh['Walkscore'] = df_neigh['Walkscore'].astype(float)
df_neigh['Bikescore'] = df_neigh['Bikescore'].astype(float)

## All data is now loaded.  Now we will normalize the data, cluster it, and rank it

In [28]:
# drop columns to get only the metrics
df_neigh_cluster = df_neigh.drop('Name', 1)
df_neigh_cluster = df_neigh_cluster.drop('Latitude', 1)
df_neigh_cluster = df_neigh_cluster.drop('Longitude', 1)
df_neigh_cluster.head()

Unnamed: 0,BarsCount,CoffeeCount,Walkscore,Bikescore
0,11.0,9.0,49.0,86.0
1,4.0,3.0,65.0,95.0
2,10.0,9.0,81.0,100.0
3,6.0,4.0,76.0,95.0
4,4.0,1.0,62.0,74.0


In [29]:
# normalize columns
x = df_neigh_cluster.values #returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df_neigh_normal = pd.DataFrame(x_scaled)
df_neigh_normal.columns = df_neigh_cluster.columns

In [30]:
# set number of clusters
kclusters = 5

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_neigh_normal)

# insert the kmeans array to df_neigh
df_neigh.insert(0, 'ClusterLabels', kmeans.labels_)

In [31]:
# add an equally weighted 'overall score' column based on the average of the normalized column and sort by that
df_neigh['Score'] = (df_neigh_normal['BarsCount'] + df_neigh_normal['CoffeeCount'] + df_neigh_normal['Walkscore'] + df_neigh_normal['Bikescore']) / 4
# reorder columns and SORT from highest score to lowest
df_neigh = df_neigh[['Name', 'Score', 'ClusterLabels', 'Latitude', 'Longitude', 'BarsCount', 'CoffeeCount', 'Walkscore', 'Bikescore']]
df_neigh = df_neigh.sort_values(by=['Score'], ascending=False)
df_neigh = df_neigh.reset_index(drop=True)

In [32]:
# view top 20 neighborhoods
df_neigh.head(20)

Unnamed: 0,Name,Score,ClusterLabels,Latitude,Longitude,BarsCount,CoffeeCount,Walkscore,Bikescore
0,Mt. Tabor,0.91056,4,45.5135,-122.68,57.0,81.0,98.0,87.0
1,Glenfair,0.883567,4,45.5247,-122.674,72.0,47.0,98.0,96.0
2,Woodland Park,0.77089,4,45.5302,-122.681,50.0,37.0,97.0,95.0
3,Lloyd District Community Association,0.666556,3,45.5515,-122.672,38.0,17.0,91.0,100.0
4,Portland Downtown,0.663257,3,45.5134,-122.627,35.0,16.0,95.0,100.0
5,Portsmouth,0.659353,3,45.5343,-122.698,26.0,26.0,97.0,97.0
6,South Portland,0.638629,3,45.5161,-122.627,31.0,15.0,92.0,100.0
7,North Tabor,0.621182,3,45.5266,-122.644,28.0,12.0,94.0,99.0
8,Hosford-Abernethy Neighborhood District Assn.,0.617979,3,45.5312,-122.658,20.0,24.0,97.0,92.0
9,Healy Heights,0.605813,3,45.559,-122.65,24.0,14.0,91.0,99.0


In [33]:
# use geopy library to get lat and long of Portland
city = 'Portland, OR'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Portland, OR are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Portland, OR are 45.5202471, -122.6741949.


In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_neigh['Latitude'], df_neigh['Longitude'], df_neigh['Name'], df_neigh['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [35]:
# now that all areas are clustered, you can examine each cluster and determine the discriminating venue categories
# cluster 0
df_neigh.loc[df_neigh['ClusterLabels'] == 0, df_neigh.columns[[0] + list(range(5, df_neigh.shape[1]))]]

Unnamed: 0,Name,BarsCount,CoffeeCount,Walkscore,Bikescore
75,Woodstock,1.0,4.0,33.0,57.0
76,Multnomah,0.0,1.0,31.0,63.0
77,Arnold Creek,0.0,0.0,28.0,61.0
78,Eastmoreland,0.0,1.0,20.0,62.0
79,Ashcreek,1.0,0.0,26.0,52.0
80,Southwest Hills Residential League,1.0,0.0,26.0,52.0
81,Sylvan-Highlands,1.0,2.0,21.0,48.0
82,Sullivan'S Gulch,0.0,1.0,22.0,46.0
83,Ardenwald-Johnson Creek,0.0,1.0,14.0,48.0
84,Arlington Heights,0.0,2.0,23.0,36.0


In [36]:
# cluster 1
df_neigh.loc[df_neigh['ClusterLabels'] == 1, df_neigh.columns[[0] + list(range(5, df_neigh.shape[1]))]]

Unnamed: 0,Name,BarsCount,CoffeeCount,Walkscore,Bikescore
16,Boise,8.0,13.0,93.0,96.0
17,Humboldt,14.0,5.0,89.0,100.0
18,King,13.0,8.0,90.0,97.0
19,Hayhurst,9.0,7.0,92.0,97.0
20,Piedmont,10.0,9.0,81.0,100.0
21,Bridlemile,14.0,7.0,79.0,97.0
22,Hillside,9.0,10.0,78.0,99.0
23,Laurelhurst,11.0,4.0,87.0,92.0
24,Centennial Community Association,8.0,5.0,83.0,97.0
25,Mt. Scott-Arleta,11.0,6.0,75.0,98.0


In [37]:
# cluster 2
df_neigh.loc[df_neigh['ClusterLabels'] == 2, df_neigh.columns[[0] + list(range(5, df_neigh.shape[1]))]]

Unnamed: 0,Name,BarsCount,CoffeeCount,Walkscore,Bikescore
45,Rose City Park,6.0,4.0,66.0,83.0
46,Cathedral Park,11.0,9.0,49.0,86.0
47,Alameda,4.0,1.0,69.0,82.0
48,Brentwood-Darlington,5.0,7.0,69.0,73.0
49,Homestead,5.0,5.0,70.0,71.0
50,Parkrose,6.0,1.0,64.0,79.0
51,Pleasant Valley,0.0,2.0,56.0,92.0
52,Montavilla,4.0,2.0,64.0,77.0
53,Foster-Powell,4.0,4.0,67.0,71.0
54,Powellhurst-Gilbert,1.0,3.0,55.0,84.0


In [38]:
# cluster 3
df_neigh.loc[df_neigh['ClusterLabels'] == 3, df_neigh.columns[[0] + list(range(5, df_neigh.shape[1]))]]

Unnamed: 0,Name,BarsCount,CoffeeCount,Walkscore,Bikescore
3,Lloyd District Community Association,38.0,17.0,91.0,100.0
4,Portland Downtown,35.0,16.0,95.0,100.0
5,Portsmouth,26.0,26.0,97.0,97.0
6,South Portland,31.0,15.0,92.0,100.0
7,North Tabor,28.0,12.0,94.0,99.0
8,Hosford-Abernethy Neighborhood District Assn.,20.0,24.0,97.0,92.0
9,Healy Heights,24.0,14.0,91.0,99.0
10,Northwest District Association,27.0,13.0,87.0,100.0
11,Vernon,25.0,14.0,87.0,98.0
12,Richmond,25.0,21.0,91.0,84.0


In [39]:
# cluster 4
df_neigh.loc[df_neigh['ClusterLabels'] == 4, df_neigh.columns[[0] + list(range(5, df_neigh.shape[1]))]]

Unnamed: 0,Name,BarsCount,CoffeeCount,Walkscore,Bikescore
0,Mt. Tabor,57.0,81.0,98.0,87.0
1,Glenfair,72.0,47.0,98.0,96.0
2,Woodland Park,50.0,37.0,97.0,95.0
