# Neighborhood Analysis of Denver, Colorado
### Project by Megan Harrison

## Introduction
I was born and raised in Denver but have lived outside the country for the last five years. I will be moving back in the next 6-8 months, but I do not know where to live: the metropolitan area has changed considerably as the city has grown.  

In this notebook, I will analyze the neighborhoods in and around Denver, Colorado in order to find the best place to live. I will take into consideration the following factors:

* __Neighborhood characteristics__ 
    * *Outdoor activities* (parks, trails, dog parks, etc.)
    * *Restaurants & bars* (coffee shops, breweries, restaurants)
* __Housing prices__ (to live within our budget)
* __Proximity to downtown and to the airport__ (my husband and I don't want a long commute to work, and we often fly to visit family)

Let's jump into it!

![alt text](https://assets.simpleviewinc.com/simpleview/image/upload/c_fill,h_280,q_50,w_640/v1/clients/denver/larimer-square-colorado-flags-lights_55bb2276-f0ae-5102-065b8b653961bbb5.jpg) 

                           Photo of Colorado flags in Denver's famous Larimer Street. 

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib
import json
import geocoder
from geopy.geocoders import Nominatim 
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
from bs4 import BeautifulSoup
from sklearn import preprocessing
import geopy.distance

## Data

To carry out this analysis, I will use the following inputs:
* __Zillow__ data for Denver neighborhoods and housing prices
* __Foursquare__ data to understand venues

### Neighborhoods

In [2]:
# Obtain Neighborhoods from Zillow
url = "http://files.zillowstatic.com/research/public/Neighborhood/Neighborhood_Zhvi_AllHomes.csv"

data_neighborhood = pd.read_csv(url)
id_vars=["RegionID", "SizeRank", "RegionName",
         "RegionType", "StateName", "State",
         "City", "Metro", "CountyName"]

data_neighborhood = data_neighborhood.melt(id_vars, var_name="date", value_name="price" )
data_neighborhood["date"] = pd.to_datetime(data_neighborhood["date"])
data_neighborhood.head()

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,date,price
0,274772,0,Northeast Dallas,Neighborhood,TX,TX,Dallas,Dallas-Fort Worth-Arlington,Dallas County,1996-01-31,134517.0
1,112345,1,Maryvale,Neighborhood,AZ,AZ,Phoenix,Phoenix-Mesa-Scottsdale,Maricopa County,1996-01-31,
2,192689,2,Paradise,Neighborhood,NV,NV,Las Vegas,Las Vegas-Henderson-Paradise,Clark County,1996-01-31,139741.0
3,270958,3,Upper West Side,Neighborhood,NY,NY,New York,New York-Newark-Jersey City,New York County,1996-01-31,246925.0
4,118208,4,South Los Angeles,Neighborhood,CA,CA,Los Angeles,Los Angeles-Long Beach-Anaheim,Los Angeles County,1996-01-31,134826.0
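
The reshaping step above is easier to follow on a tiny table. The sketch below uses made-up region names and prices (not Zillow data) to show how `melt` turns a table with one column per month into one row per region and month, and how the same filter pattern then picks out a single city and date.

```python
import pandas as pd

# Toy wide table: one row per region, one column per month (values are made up)
wide = pd.DataFrame({
    "RegionName": ["A", "B"],
    "City": ["Denver", "Denver"],
    "2020-02-29": [100.0, 200.0],
    "2020-03-31": [110.0, 210.0],
})

# Keep the identifier columns and stack the date columns into (date, price) pairs
long = wide.melt(id_vars=["RegionName", "City"], var_name="date", value_name="price")
long["date"] = pd.to_datetime(long["date"])

# Same filter pattern as the notebook: one city, one month
latest = long[(long["City"] == "Denver") & (long["date"] == "2020-03-31")]
print(latest)
```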


In [3]:
# Filter to Denver
denver_data = data_neighborhood[(data_neighborhood['City']=='Denver') & (data_neighborhood['date']== '2020-03-31')]

# Neighborhood data
denver_neighborhoods = denver_data[['RegionName','City','State']].reset_index(drop=True)
denver_neighborhoods.columns = ['Neighborhood','City','State']
denver_neighborhoods.head()

Unnamed: 0,Neighborhood,City,State
0,Gateway - Green Valley Ranch,Denver,CO
1,Montbello,Denver,CO
2,Hampden,Denver,CO
3,Stapleton,Denver,CO
4,Westwood,Denver,CO


In [4]:
# Create empty dataframe
denver_coords = pd.DataFrame(columns=["Neighborhood", "Latitude", "Longitude"])

# Create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent='denver_explorer')

# Loop through neighborhoods and look up coordinate data
for neighborhood in denver_neighborhoods['Neighborhood']:
    location = geolocator.geocode('{}, Denver, Colorado'.format(neighborhood))

    # geocode() returns None when it finds no match
    if location is None:
        print('Did not identify coordinates for:', neighborhood)
        continue

    new_row = pd.DataFrame({'Neighborhood':[neighborhood], 
                            'Latitude':[location.latitude], 
                            'Longitude':[location.longitude]})
    denver_coords = pd.concat([denver_coords,new_row])

Did not identify coordinates for: Gateway - Green Valley Ranch
Did not identify coordinates for: Virginia Village
Did not identify coordinates for: Washington Virginia Vale
Did not identify coordinates for: Lowry Field
Did not identify coordinates for: Bear Valley
Did not identify coordinates for: College View - South Platte
Did not identify coordinates for: Cory - Merrill
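
Public geocoding services generally expect clients to pace their requests, and a lookup that finds nothing returns `None` rather than raising. The sketch below makes both concerns explicit. `geocode_all` and `fake_geocode` are hypothetical names, and the fake lookup stands in for a real geocoder so that the example runs offline.

```python
import time
from collections import namedtuple

Location = namedtuple("Location", ["latitude", "longitude"])

def geocode_all(names, geocode, pause_seconds=0.0):
    """Return (found, missing), where found maps name -> (lat, lon)."""
    found, missing = {}, []
    for name in names:
        location = geocode("{}, Denver, Colorado".format(name))
        if location is None:          # no match: record it instead of failing
            missing.append(name)
        else:
            found[name] = (location.latitude, location.longitude)
        time.sleep(pause_seconds)     # simple pause between requests
    return found, missing

# Stand-in for a real geocoder (hypothetical; it only knows one place)
def fake_geocode(query):
    known = {"Montbello, Denver, Colorado": Location(39.7842, -104.831)}
    return known.get(query)

found, missing = geocode_all(["Montbello", "Lowry Field"], fake_geocode)
print(found)
print(missing)
```

With geopy, the `geocode` argument could be the `geocode` method of a `Nominatim` instance, optionally wrapped in `geopy.extra.rate_limiter.RateLimiter` to enforce a minimum delay between calls.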


In [5]:
# Add coordinates that the geocoder did not identify.
# A list of rows (rather than np.array) keeps Latitude/Longitude numeric;
# np.array would coerce the mixed values to strings.
manual_coords = pd.DataFrame([['Gateway',39.783274, -104.791926],
                              ['Green Valley Ranch',39.782727, -104.754307],
                              ['Virginia Village',39.687734, -104.926123],
                              ['Washington Virginia Vale',39.702988, -104.914266],
                              ['Lowry Field',39.719706, -104.891184],
                              ['Bear Valley',39.660655, -105.067345],
                              ['College View',39.670535, -105.020746],
                              ['South Platte',39.673043, -105.003487],
                              ['Cory - Merrill',39.691222, -104.950056]],
                             columns=["Neighborhood", "Latitude", "Longitude"])

denver_coords = pd.concat([denver_coords,manual_coords])
denver_coords.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Montbello,39.7842,-104.831
0,Hampden,39.6575,-104.873
0,Stapleton,39.7901,-104.984
0,Westwood,39.7042,-105.039
0,Five Points,39.7547,-104.978
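
One thing to watch when building a coordinate table by hand: a NumPy array holds a single dtype, so mixing names and numbers in `np.array` turns the numbers into strings. The sketch below, using one row from this notebook, contrasts that with a plain list of rows and shows an explicit conversion for the case where strings have already crept in.

```python
import numpy as np
import pandas as pd

cols = ["Neighborhood", "Latitude", "Longitude"]

# Mixed strings and floats inside np.array: every value becomes a string
as_array = pd.DataFrame(np.array([["Bear Valley", 39.660655, -105.067345]]), columns=cols)
before = as_array.loc[0, "Latitude"]
print(type(before))

# A list of rows lets pandas infer a dtype per column
as_rows = pd.DataFrame([["Bear Valley", 39.660655, -105.067345]], columns=cols)
print(as_rows["Latitude"].dtype)

# If strings are already present, convert the numeric columns explicitly
as_array[["Latitude", "Longitude"]] = as_array[["Latitude", "Longitude"]].astype(float)
print(as_array["Latitude"].dtype)
```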


In [6]:
# Initiate map for plotting Denver's neighborhoods
address = 'Denver, Colorado'

geolocator = Nominatim(user_agent = 'denver_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_denver = folium.Map(location=[latitude, longitude], zoom_start=5)

In [7]:
# Create a map of Denver's neighborhoods
for lat, lng, neighborhood in zip(denver_coords['Latitude'], 
                                  denver_coords['Longitude'], 
                                  denver_coords['Neighborhood']):
    label = neighborhood
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_denver)  
    
map_denver

The map shows that some coordinates were not identified correctly, so we need to fix them.

In [8]:
# Remove rows of neighborhoods with faulty coordinates
to_drop = ['Country Club','Speer','University Park','Indian Creek']
denver_coords = denver_coords[~denver_coords['Neighborhood'].isin(to_drop)]

# Add correct coordinates manually (a list of rows keeps the numbers numeric)
manual_coords2 = pd.DataFrame([['Country Club',39.719574, -104.966665],
                               ['Speer',39.717973, -104.979178],
                               ['University Park',39.674270, -104.950838],
                               ['Indian Creek',39.685663, -104.896827]],
                              columns=["Neighborhood", "Latitude", "Longitude"])

denver_coords = pd.concat([denver_coords,manual_coords2])

# Create new map of Denver's neighborhoods
map_denver2 = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, neighborhood in zip(denver_coords['Latitude'], denver_coords['Longitude'], denver_coords['Neighborhood']):
    label = neighborhood
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_denver2)  
    
map_denver2

This is much better. All neighborhoods are now correctly located, and we can move on to loading the rest of the data. 

### Venue data

In [33]:
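# Import Foursquare data
# Define Credentials, Version and Search Parameters
# Credentials are read from environment variables so that secrets are not stored
# in the notebook (the variable names below are an assumption)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605' 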

radius = 500
LIMIT = 500

In [34]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
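
The list comprehension in `getNearbyVenues` assumes every venue has at least one category; an empty `categories` list would raise an `IndexError`. Below is a defensive sketch of just the parsing step, run against a hand-written payload that mirrors the fields the function reads. The payload and the name `parse_venues` are illustrative, not real API output.

```python
def parse_venues(name, lat, lng, items):
    """Flatten explore-style items into rows, skipping venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # nothing to classify this venue by
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     categories[0]["name"]))
    return rows

# Hand-written sample in the shape the notebook's function expects
items = [
    {"venue": {"name": "Example Park",
               "location": {"lat": 39.70, "lng": -105.00},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 39.71, "lng": -105.01},
               "categories": []}},
]
rows = parse_venues("Athmar Park", 39.7037, -105.011, items)
print(rows)
```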

In [35]:
denver_venues = getNearbyVenues(names=denver_coords['Neighborhood'],
                                latitudes=denver_coords['Latitude'],
                                longitudes=denver_coords['Longitude']
                               )

Montbello
Hampden
Stapleton
Westwood
Five Points
Capitol Hill
Hampden South
Mar Lee
Windsor
Harvey Park
Congress Park
Ruby Hill
East Colfax
Marston
North Park Hill
Sunnyside
University
West Highland
Villa Park
South Park Hill
Northeast Park Hill
Athmar Park
West Colfax
Fort Logan
Highland
Harvey Park South
Berkeley
Cheesman Park
Hilltop
Sloan Lake
Washington Park West
Washington Park
Hale
Elyria Swansea
Cherry Creek
Barnum
Barnum West
Lincoln Park
University Hills
Whittier
Montclair
Baker
North Capitol Hill
Platt Park
Goldsmith
Union Station
Southmoor Park
City Park West
Belcaro
Clayton
Cole
Regis
Globeville
Chaffee Park
Valverde
Central Business District
Wellshire
City Park
Overland
Skyland
Jefferson Park
Rosedale
Civic Center
DIA
Sun Valley
Auraria
Gateway
Green Valley Ranch
Virginia Village
Washington Virginia Vale
Lowry Field
Bear Valley
College View
South Platte
Cory - Merrill
Country Club
Speer
University Park
Indian Creek


In [36]:
print('There are {} unique categories.'.format(len(denver_venues['Venue Category'].unique())))

There are 234 unique categories.


That is a lot of unique categories. To simplify the data and help the clustering algorithm generalize, I grouped similar categories together in a separate spreadsheet. I will now load that mapping and merge it with the Foursquare venue data for Denver. 

In [37]:
df_mapping = pd.read_csv("Venue_mapping.csv",sep=';')
print('There are {} grouped categories.'.format(len(df_mapping['Category'].unique())))

There are 14 grouped categories.


In [38]:
denver_venues = pd.DataFrame.merge(denver_venues,
                                   df_mapping,
                                   how = 'left',
                                   left_on = 'Venue Category',
                                   right_on = 'Venue'
                                  )

denver_venues = denver_venues.iloc[:,[0,-1]]
denver_venues.head()

Unnamed: 0,Neighborhood,Category
0,Montbello,Outdoor recreation
1,Montbello,Outdoor recreation
2,Montbello,Outdoor recreation
3,Hampden,Restaurant
4,Hampden,Restaurant
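
A left merge keeps venues whose category has no entry in the mapping file and fills their `Category` with `NaN`, so it is worth checking for gaps. The sketch below uses made-up rows; the column names `Venue` and `Category` follow the mapping file used above, and `indicator=True` adds a `_merge` column that flags unmatched rows.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Park", "Brewery", "Zoo Exhibit"],
})
mapping = pd.DataFrame({
    "Venue": ["Park", "Brewery"],
    "Category": ["Outdoor recreation", "Bar / Brewery"],
})

merged = venues.merge(mapping, how="left",
                      left_on="Venue Category", right_on="Venue",
                      indicator=True)

# Rows marked 'left_only' had no match in the mapping
unmapped = merged.loc[merged["_merge"] == "left_only", "Venue Category"].unique()
print(list(unmapped))
```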


In [39]:
# one hot encoding
denver_onehot = pd.get_dummies(denver_venues[['Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe and move to first column
denver_onehot['Neighborhood'] = denver_venues['Neighborhood'] 

fixed_columns = [denver_onehot.columns[-1]] + list(denver_onehot.columns[:-1])
denver_onehot = denver_onehot[fixed_columns]

denver_onehot = denver_onehot.groupby('Neighborhood').mean().reset_index()
denver_onehot.head()

Unnamed: 0,Neighborhood,Athletics / Sports,Bar / Brewery,Business,Entertainment,Food store,Health / Personal care,Hotel,Landmark,Nightlife,Other,Outdoor recreation,Restaurant,Shopping,Transit
0,Athmar Park,0.0,0.0,0.4,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0
1,Auraria,0.12,0.12,0.0,0.04,0.08,0.0,0.04,0.0,0.04,0.0,0.0,0.4,0.08,0.04
2,Baker,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.166667,0.0,0.166667
3,Barnum,0.0,0.0,0.4,0.0,0.4,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0
4,Barnum West,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.333333,0.0
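
Averaging one-hot indicators within a group gives the share of that group's venues in each category. A minimal sketch on made-up venues:

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Category": ["Restaurant", "Restaurant", "Restaurant", "Shopping",
                 "Outdoor recreation", "Restaurant"],
})

# One indicator column per category (1.0 where the venue belongs to it)
onehot = pd.get_dummies(venues["Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of 0/1 indicators = share of a neighborhood's venues in each category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Here every row sums to 1. In the table above some rows sum to less (Athmar Park, for example, sums to 0.8), which is what one would expect if a few venues had no mapped category: a missing category produces zeros in every indicator column.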


In [40]:
print('There are {} neighborhoods with venue information in Denver, Colorado.'.format
      (len(denver_onehot['Neighborhood'])))

There are 78 neighborhoods with venue information in Denver, Colorado.


## Clustering neighborhoods

In [41]:
# set number of clusters
kclusters = 5

# use the columns keyword: the positional axis argument was removed in pandas 2.0
denver_grouped_clustering = denver_onehot.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(denver_grouped_clustering)

# add clustering labels and neighborhood names 
denver_grouped_clustering.insert(0, 'Cluster Labels', kmeans.labels_)
denver_grouped_clustering['Neighborhood'] = denver_onehot['Neighborhood']


# merge data to add latitude/longitude for each neighborhood
denver_grouped_clustering = pd.DataFrame.merge(denver_grouped_clustering,denver_coords,
                                               how='inner', 
                                               on='Neighborhood')

denver_grouped_clustering.head() 

Unnamed: 0,Cluster Labels,Athletics / Sports,Bar / Brewery,Business,Entertainment,Food store,Health / Personal care,Hotel,Landmark,Nightlife,Other,Outdoor recreation,Restaurant,Shopping,Transit,Neighborhood,Latitude,Longitude
0,2,0.0,0.0,0.4,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,Athmar Park,39.7037,-105.011
1,0,0.12,0.12,0.0,0.04,0.08,0.0,0.04,0.0,0.04,0.0,0.0,0.4,0.08,0.04,Auraria,39.7465,-105.007
2,2,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.166667,0.0,0.166667,Baker,39.7116,-104.994
3,2,0.0,0.0,0.4,0.0,0.4,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,Barnum,39.7177,-105.032
4,2,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.333333,0.0,Barnum West,39.7172,-105.046
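
The notebook fixes the number of clusters at five. One common way to sanity-check such a choice, not used in the original analysis, is to compare the within-cluster sum of squares (`inertia_`) across several values of k and look for the point where improvements level off. The sketch below runs on synthetic blobs rather than the Denver data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic data: three well-separated blobs in 2D (not the Denver data)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

# Fit k-means for a range of k and record the inertia of each fit
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this, inertia drops sharply up to k=3 and only slightly afterwards. Real venue frequencies rarely show such a clean elbow, so the result is a guide rather than a rule.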


In [42]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, indexing colors by the 0-based cluster label
for lat, lon, poi, cluster in zip(denver_grouped_clustering['Latitude'], 
                                  denver_grouped_clustering['Longitude'], 
                                  denver_grouped_clustering['Neighborhood'], 
                                  denver_grouped_clustering['Cluster Labels']):
    
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Great, now we have five clusters! Let's deep dive into each one to see which has the characteristics we want in a neighborhood to live in.

In [43]:
# Define a function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
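
To see what this helper returns, here is a small usage sketch on a made-up row with the same layout as `denver_onehot` (neighborhood name first, then category frequencies). The function is repeated so the example runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the Neighborhood entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up frequencies for a single neighborhood
row = pd.Series({"Neighborhood": "Example",
                 "Restaurant": 0.5,
                 "Shopping": 0.2,
                 "Outdoor recreation": 0.3})

top = return_most_common_venues(row, 2)
print(list(top))
```

Categories with equal frequency, including the many categories at zero, have no meaningful order among themselves, so the lower positions of a top-10 list should be read with care.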

In [44]:
# Create a dataframe to display top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd', fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = denver_onehot['Neighborhood']

for ind in np.arange(denver_onehot.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(denver_onehot.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_clustered = pd.DataFrame.merge(neighborhoods_venues_sorted,
                                                           denver_grouped_clustering,
                                                           on = 'Neighborhood')

neighborhoods_venues_sorted_clustered = neighborhoods_venues_sorted_clustered.iloc[:,[0,11,1,2,3,4,5,6,7,8,9,10]]

### Cluster 0

In [45]:
neighborhoods_venues_sorted_clustered.loc[neighborhoods_venues_sorted_clustered['Cluster Labels'] == 0, neighborhoods_venues_sorted_clustered.columns[[0] + list(range(2, neighborhoods_venues_sorted_clustered.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Auraria,Restaurant,Bar / Brewery,Athletics / Sports,Shopping,Food store,Transit,Nightlife,Hotel,Entertainment,Outdoor recreation
6,Belcaro,Restaurant,Business,Health / Personal care,Food store,Transit,Shopping,Outdoor recreation,Other,Nightlife,Landmark
7,Berkeley,Restaurant,Shopping,Bar / Brewery,Health / Personal care,Food store,Outdoor recreation,Nightlife,Entertainment,Athletics / Sports,Transit
8,Capitol Hill,Restaurant,Bar / Brewery,Shopping,Food store,Nightlife,Entertainment,Athletics / Sports,Hotel,Transit,Outdoor recreation
9,Central Business District,Restaurant,Hotel,Food store,Nightlife,Shopping,Entertainment,Bar / Brewery,Athletics / Sports,Landmark,Health / Personal care
10,Chaffee Park,Restaurant,Outdoor recreation,Bar / Brewery,Transit,Shopping,Other,Nightlife,Landmark,Hotel,Health / Personal care
12,Cherry Creek,Restaurant,Food store,Athletics / Sports,Transit,Shopping,Outdoor recreation,Other,Nightlife,Landmark,Hotel
14,City Park West,Restaurant,Bar / Brewery,Shopping,Health / Personal care,Food store,Transit,Outdoor recreation,Other,Nightlife,Landmark
15,Civic Center,Restaurant,Bar / Brewery,Athletics / Sports,Shopping,Entertainment,Nightlife,Landmark,Hotel,Transit,Outdoor recreation
19,Congress Park,Restaurant,Shopping,Food store,Transit,Outdoor recreation,Other,Nightlife,Landmark,Hotel,Health / Personal care


### Cluster 1

In [46]:
neighborhoods_venues_sorted_clustered.loc[neighborhoods_venues_sorted_clustered['Cluster Labels'] == 1, neighborhoods_venues_sorted_clustered.columns[[0] + list(range(2, neighborhoods_venues_sorted_clustered.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Bear Valley,Outdoor recreation,Transit,Shopping,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care,Food store
11,Cheesman Park,Outdoor recreation,Restaurant,Athletics / Sports,Transit,Shopping,Other,Nightlife,Landmark,Hotel,Health / Personal care
17,Cole,Restaurant,Outdoor recreation,Transit,Shopping,Other,Nightlife,Landmark,Hotel,Health / Personal care,Food store
38,Indian Creek,Outdoor recreation,Transit,Shopping,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care,Food store
43,Montbello,Outdoor recreation,Transit,Shopping,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care,Food store
55,South Park Hill,Outdoor recreation,Restaurant,Athletics / Sports,Transit,Shopping,Other,Nightlife,Landmark,Hotel,Health / Personal care
56,South Platte,Outdoor recreation,Transit,Shopping,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care,Food store
65,University Park,Shopping,Outdoor recreation,Transit,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care,Food store
69,Washington Park,Outdoor recreation,Athletics / Sports,Transit,Shopping,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care
70,Washington Park West,Outdoor recreation,Athletics / Sports,Transit,Shopping,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care


### Cluster 2

In [47]:
neighborhoods_venues_sorted_clustered.loc[neighborhoods_venues_sorted_clustered['Cluster Labels'] == 2, neighborhoods_venues_sorted_clustered.columns[[0] + list(range(2, neighborhoods_venues_sorted_clustered.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Athmar Park,Business,Outdoor recreation,Entertainment,Transit,Shopping,Restaurant,Other,Nightlife,Landmark,Hotel
2,Baker,Bar / Brewery,Transit,Restaurant,Other,Nightlife,Shopping,Outdoor recreation,Landmark,Hotel,Health / Personal care
3,Barnum,Food store,Business,Nightlife,Transit,Shopping,Restaurant,Outdoor recreation,Other,Landmark,Hotel
4,Barnum West,Shopping,Outdoor recreation,Business,Transit,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care
13,City Park,Entertainment,Outdoor recreation,Restaurant,Shopping,Landmark,Nightlife,Health / Personal care,Food store,Transit,Other
18,College View,Transit,Food store,Shopping,Business,Athletics / Sports,Restaurant,Outdoor recreation,Other,Nightlife,Landmark
24,Elyria Swansea,Restaurant,Other,Food store,Transit,Outdoor recreation,Hotel,Health / Personal care,Business,Athletics / Sports,Shopping
26,Fort Logan,Other,Transit,Shopping,Restaurant,Outdoor recreation,Nightlife,Landmark,Hotel,Health / Personal care,Food store
28,Globeville,Bar / Brewery,Transit,Shopping,Restaurant,Outdoor recreation,Nightlife,Other,Landmark,Hotel,Health / Personal care
31,Hale,Food store,Restaurant,Outdoor recreation,Health / Personal care,Business,Transit,Shopping,Other,Nightlife,Landmark


### Cluster 3

In [48]:
neighborhoods_venues_sorted_clustered.loc[neighborhoods_venues_sorted_clustered['Cluster Labels'] == 3, neighborhoods_venues_sorted_clustered.columns[[0] + list(range(2, neighborhoods_venues_sorted_clustered.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Clayton,Food store,Shopping,Restaurant,Transit,Outdoor recreation,Other,Nightlife,Landmark,Hotel,Health / Personal care
20,Cory - Merrill,Food store,Transit,Shopping,Restaurant,Outdoor recreation,Other,Nightlife,Landmark,Hotel,Health / Personal care
42,Mar Lee,Restaurant,Food store,Shopping,Transit,Outdoor recreation,Other,Nightlife,Landmark,Hotel,Health / Personal care
50,Regis,Food store,Restaurant,Athletics / Sports,Transit,Shopping,Outdoor recreation,Other,Nightlife,Landmark,Hotel
75,Westwood,Restaurant,Food store,Transit,Shopping,Outdoor recreation,Other,Nightlife,Landmark,Hotel,Health / Personal care


### Cluster 4

In [49]:
neighborhoods_venues_sorted_clustered.loc[neighborhoods_venues_sorted_clustered['Cluster Labels'] == 4, neighborhoods_venues_sorted_clustered.columns[[0] + list(range(2, neighborhoods_venues_sorted_clustered.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Country Club,Athletics / Sports,Transit,Shopping,Restaurant,Outdoor recreation,Other,Nightlife,Landmark,Hotel,Health / Personal care
30,Green Valley Ranch,Athletics / Sports,Business,Transit,Shopping,Restaurant,Outdoor recreation,Other,Nightlife,Landmark,Hotel


Based on these characteristics, we are most excited about Cluster 2 for its abundance of outdoor recreation as well as restaurants. Now let's dive deeper into this cluster to see what we can afford while still living close to downtown and the airport!

In [50]:
# Filter data for cluster 2
cluster_2 = denver_grouped_clustering[denver_grouped_clustering['Cluster Labels']==2]

### Housing prices

In [51]:
# Filter Zillow data for home value (ZHVI) by neighborhood
denver_house_prices = denver_data.loc[:,['RegionName','price']]

# Rename neighborhoods that were altered in mapping coordinates
rows_to_alter = denver_house_prices[(denver_house_prices['RegionName']=='Gateway - Green Valley Ranch') |
                                    (denver_house_prices['RegionName']=='College View - South Platte')]

rows_to_alter = pd.concat([rows_to_alter]*2,ignore_index=True)
new_rows = pd.DataFrame({'RegionName':['Gateway','College View','Green Valley Ranch','South Platte']})
rows_to_alter.update(new_rows)

# Remove bad rows and add correct ones
denver_house_prices = denver_house_prices[(denver_house_prices['RegionName']!='Gateway - Green Valley Ranch') &
                                          (denver_house_prices['RegionName']!='College View - South Platte')]
denver_house_prices = pd.concat([denver_house_prices,rows_to_alter])
denver_house_prices.sort_values('RegionName',inplace= True)

# Filter to neighborhoods in Cluster 2
denver_house_prices_c2 = pd.DataFrame.merge(cluster_2,
                                           denver_house_prices,
                                           how = 'left',
                                           left_on = 'Neighborhood',
                                           right_on = 'RegionName')

denver_house_prices_c2 = denver_house_prices_c2.reset_index(drop = True)

# Now, let's normalize housing prices
x = denver_house_prices_c2[['price']].values.astype(float)
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
prices_normalized_c2 = pd.DataFrame(x_scaled)

# Add Neighborhood column to normalized prices
prices_normalized_c2['Neighborhood'] = denver_house_prices_c2['RegionName'] 
prices_normalized_c2.columns = ('Normalized Price','Neighborhood')
prices_normalized_c2.head()

Unnamed: 0,Normalized Price,Neighborhood
0,0.155293,Athmar Park
1,0.590954,Baker
2,0.093326,Barnum
3,0.127498,Barnum West
4,0.976423,City Park
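
Min-max scaling maps each value x to (x - min) / (max - min), so the cheapest neighborhood in the set gets 0 and the most expensive gets 1. A minimal NumPy sketch of the same formula on made-up prices (`min_max` is a hypothetical helper, not part of scikit-learn):

```python
import numpy as np

def min_max(values):
    """Scale values to the range [0, 1] using (x - min) / (max - min)."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.nanmin(values), np.nanmax(values)
    return (values - lo) / (hi - lo)

prices = [300_000, 450_000, 600_000]  # made-up prices
scaled = min_max(prices)
print(scaled)
```

Because the scaler here is fitted only on the Cluster 2 neighborhoods, a normalized price of 1.0 means "most expensive within Cluster 2", not most expensive in Denver.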


### Proximity to Downtown Denver and to Denver International Airport (DIA)

In [52]:
# Calculate distance (in miles) from each neighborhood to downtown Denver and to the airport
coords_downtown = (39.7392364,-104.9848623)
coords_DIA = (39.8561,-104.6737)

# Collect one row per neighborhood, then build the dataframe once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for lat, lng, neighborhood in zip(cluster_2['Latitude'], cluster_2['Longitude'], cluster_2['Neighborhood']):
    coords_neighborhood = (lat,lng)
    
    distance_downtown = geopy.distance.distance(coords_neighborhood,coords_downtown).miles
    distance_airport = geopy.distance.distance(coords_neighborhood,coords_DIA).miles
    
    rows.append({'Neighborhood': neighborhood,
                 'Distance Downtown': distance_downtown,
                 'Distance DIA': distance_airport})

distances_denver_c2 = pd.DataFrame(rows, columns=['Neighborhood','Distance Downtown','Distance DIA'])

# Normalize these values
x_downtown = distances_denver_c2[['Distance Downtown']].values.astype(float)
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled_downtown = min_max_scaler.fit_transform(x_downtown)
normalized_distances_c2 = pd.DataFrame(x_scaled_downtown)
normalized_distances_c2.columns = ['Normalized Distance Downtown']

x_airport = distances_denver_c2[['Distance DIA']].values.astype(float)
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled_airport = min_max_scaler.fit_transform(x_airport)

normalized_distances_c2['Normalized Distance Airport'] = pd.DataFrame(x_scaled_airport)
normalized_distances_c2['Neighborhood'] = distances_denver_c2['Neighborhood']

normalized_distances_c2.head()

Unnamed: 0,Normalized Distance Downtown,Normalized Distance Airport,Neighborhood
0,0.149582,0.573789,Athmar Park
1,0.006789,0.448596,Baker
2,0.170665,0.63941,Barnum
3,0.2851,0.72075,Barnum West
4,0.0,0.067545,City Park
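
The distances above are straight-line distances between coordinate pairs, not driving distances or travel times. As a rough cross-check that needs no extra library, the haversine formula gives the great-circle distance on a sphere; it is a swapped-in approximation of the geodesic distance that geopy computes, and at city scale the two typically differ by well under one percent. The function name `haversine_miles` is an assumption; the two coordinate pairs are the ones used in the notebook.

```python
import math

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    earth_radius_miles = 3958.8  # mean Earth radius
    return 2 * earth_radius_miles * math.asin(math.sqrt(h))

coords_downtown = (39.7392364, -104.9848623)
coords_DIA = (39.8561, -104.6737)

d = haversine_miles(coords_downtown, coords_DIA)
print(round(d, 1))
```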


In [53]:
# Merge this information to create another clustering
denver_prices_distance_c2 = pd.DataFrame.merge(prices_normalized_c2, normalized_distances_c2, 
                                            how = 'inner',
                                            on = 'Neighborhood'
                                           )

In [54]:
# set number of clusters
kclusters = 2

# use the columns keyword: the positional axis argument was removed in pandas 2.0
denver_prices_distance_c2_clustering = denver_prices_distance_c2.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(denver_prices_distance_c2_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 0, 0, 1, 0, 0, 0, 0, 1], dtype=int32)

In [55]:
# add clustering labels and neighborhood names 
denver_prices_distance_c2_clustering.insert(0, 'Sub Cluster Labels', kmeans.labels_)
denver_prices_distance_c2_clustering['Neighborhood'] = denver_prices_distance_c2['Neighborhood']


# merge data to add latitude/longitude for each neighborhood
denver_prices_distance_c2_clustering = pd.DataFrame.merge(denver_prices_distance_c2_clustering,denver_coords,
                                               how='left', 
                                               on='Neighborhood')

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, indexing colors by the 0-based sub-cluster label
for lat, lon, poi, cluster in zip(denver_prices_distance_c2_clustering['Latitude'], 
                                  denver_prices_distance_c2_clustering['Longitude'], 
                                  denver_prices_distance_c2_clustering['Neighborhood'], 
                                  denver_prices_distance_c2_clustering['Sub Cluster Labels']):
    
    label = folium.Popup(str(poi) + ' Cluster 2.' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

In [56]:
denver_final = pd.DataFrame.merge(neighborhoods_venues_sorted_clustered,
                                  denver_prices_distance_c2_clustering,
                                  on = 'Neighborhood',
                                  how = 'right')

### Cluster 2.0

In [57]:
denver_final.loc[denver_final['Sub Cluster Labels'] == 0, denver_final.columns[[0] + list(range(13,16)) + list(range(2,12))]]

Unnamed: 0,Neighborhood,Normalized Price,Normalized Distance Downtown,Normalized Distance Airport,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Athmar Park,0.155293,0.149582,0.573789,Business,Outdoor recreation,Entertainment,Transit,Shopping,Restaurant,Other,Nightlife,Landmark,Hotel
2,Barnum,0.093326,0.170665,0.63941,Food store,Business,Nightlife,Transit,Shopping,Restaurant,Outdoor recreation,Other,Landmark,Hotel
3,Barnum West,0.127498,0.2851,0.72075,Shopping,Outdoor recreation,Business,Transit,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care
5,College View,0.105618,0.535949,0.773215,Transit,Food store,Shopping,Business,Athletics / Sports,Restaurant,Outdoor recreation,Other,Nightlife,Landmark
6,Elyria Swansea,0.0,0.237363,0.0,Restaurant,Other,Food store,Transit,Outdoor recreation,Hotel,Health / Personal care,Business,Athletics / Sports,Shopping
7,Fort Logan,0.391966,0.862331,1.0,Other,Transit,Shopping,Restaurant,Outdoor recreation,Nightlife,Landmark,Hotel,Health / Personal care,Food store
8,Globeville,0.04617,0.158181,0.178651,Bar / Brewery,Transit,Shopping,Restaurant,Outdoor recreation,Nightlife,Other,Landmark,Hotel,Health / Personal care
10,Harvey Park,0.236142,0.569811,0.851027,Outdoor recreation,Business,Bar / Brewery,Athletics / Sports,Transit,Shopping,Restaurant,Other,Nightlife,Landmark
11,Harvey Park South,0.336458,0.569811,0.851027,Outdoor recreation,Business,Bar / Brewery,Athletics / Sports,Transit,Shopping,Restaurant,Other,Nightlife,Landmark
12,Northeast Park Hill,0.319777,1.0,0.111758,Outdoor recreation,Business,Transit,Shopping,Restaurant,Other,Nightlife,Landmark,Hotel,Health / Personal care


### Cluster 2.1

In [58]:
denver_final.loc[denver_final['Sub Cluster Labels'] == 1, denver_final.columns[[0] + list(range(13,16)) + list(range(2,12))]]

Unnamed: 0,Neighborhood,Normalized Price,Normalized Distance Downtown,Normalized Distance Airport,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Baker,0.590954,0.006789,0.448596,Bar / Brewery,Transit,Restaurant,Other,Nightlife,Shopping,Outdoor recreation,Landmark,Hotel,Health / Personal care
4,City Park,0.976423,0.0,0.067545,Entertainment,Outdoor recreation,Restaurant,Shopping,Landmark,Nightlife,Health / Personal care,Food store,Transit,Other
9,Hale,0.731874,0.163077,0.019909,Food store,Restaurant,Outdoor recreation,Health / Personal care,Business,Transit,Shopping,Other,Nightlife,Landmark
15,Skyland,0.608967,0.040752,0.036994,Business,Shopping,Food store,Transit,Restaurant,Outdoor recreation,Other,Nightlife,Landmark,Hotel
16,Sloan Lake,1.0,0.186884,0.565382,Shopping,Restaurant,Outdoor recreation,Food store,Bar / Brewery,Transit,Other,Nightlife,Landmark,Hotel
17,Stapleton,0.918158,0.266979,0.135911,Business,Athletics / Sports,Food store,Shopping,Restaurant,Health / Personal care,Transit,Outdoor recreation,Other,Nightlife


# Conclusion 

These clustering analyses lead us to conclude that Cluster 2.1 has our ideal characteristics: an abundance of bars, restaurants and outdoor spaces, as well as proximity to downtown Denver and the airport. Housing prices in this sub-cluster are higher than in Sub-cluster 2.0, but choosing the neighborhoods with lower normalized prices (e.g. Baker, Skyland) may address this. 
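
One hypothetical way to make the trade-off between price and proximity explicit is to combine the three normalized columns into a single score, where lower is better. The sketch below copies the values from the Cluster 2.1 table above; the equal weights are an assumption for illustration, not part of the original analysis.

```python
# Normalized values copied from the Cluster 2.1 table:
# (price, distance to downtown, distance to airport)
cluster_2_1 = {
    "Baker":      (0.590954, 0.006789, 0.448596),
    "City Park":  (0.976423, 0.0,      0.067545),
    "Hale":       (0.731874, 0.163077, 0.019909),
    "Skyland":    (0.608967, 0.040752, 0.036994),
    "Sloan Lake": (1.0,      0.186884, 0.565382),
    "Stapleton":  (0.918158, 0.266979, 0.135911),
}

# Hypothetical equal weights; change them to reflect personal priorities
weights = (1.0, 1.0, 1.0)

def score(values, weights):
    """Weighted sum of normalized factors; lower means cheaper and closer."""
    return sum(v * w for v, w in zip(values, weights))

ranking = sorted(cluster_2_1, key=lambda name: score(cluster_2_1[name], weights))
for name in ranking:
    print(name, round(score(cluster_2_1[name], weights), 3))
```

Different weights can change the order, so the result is only as good as the priorities it encodes.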

Please see my separate report for an explanation of the methodology and additional discussion. Thank you!