<h1> Data Science Capstone: Chicago, IL</h1>
<h3> Introduction: </h3>

Chicago is one of the most populous and growing cities in the United States. It is home to a wide range of arts, entertainment, sports, education, and research institutions, so it is no surprise that the Windy City attracts many young professionals each year. Chicago is a relatively young city: the average resident is 34.9 years old, and 63% of its 2.7 million residents are single.

As young people move to Chicago looking for entertainment and adjusting to life in a new city, there is demand for programs that research and provide information for newcomers, create guides for tourists, and keep the city's economy booming.

In this project, we use location data and clustering techniques to determine which areas of Chicago have the most activity and foot traffic. This can help a newcomer choose an accessible yet quiet neighborhood for their new home.

<b> Methodology </b>
<ul>
<li>Build a dataframe of neighborhoods in Chicago, IL by web scraping data from a Wikipedia page.</li>
<li>Get the geographical coordinates of the different neighborhoods.</li>
<li>Explore and cluster the neighborhoods to determine which neighborhoods are within the city center.</li>
<li>Obtain the venue data for the cluster that contains the city center neighborhoods from the Foursquare API.</li>
<li>Determine the top 10 venues within each neighborhood and perform k-means clustering for each.</li>
<li>Select the best neighborhood in each cluster that contains the most venues.</li>
</ul>

<i>Census data reference: https://censusreporter.org/profiles/16000US1714000-chicago-il/ </i>

<h3> <b>1. Import Required Libraries</b> </h3>

In [1]:
# Import required packages
#!conda install -c anaconda beautifulsoup4
from bs4 import BeautifulSoup

import requests
import urllib.request

import pandas as pd
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0
import numpy as np

!conda install -c conda-forge folium=0.5.0 --yes
import folium

from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

import sys
!{sys.executable} -m pip install geocoder
!{sys.executable} -m pip install geopy
import geocoder
from geopy.geocoders import Nominatim

from IPython.display import Image 
from IPython.core.display import HTML 
print('Packages Loaded')

Packages Loaded


<h3> <b>2. Data and Webscraping from Wikipedia page to a DataFrame</b> </h3>

In [2]:
# Wikipedia Data Source
url = "https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")
#print(soup.prettify)

In [3]:
# Find desired table from HTML text of webpage
all_tables=soup.find_all("table")
right_table=soup.find('table', class_='wikitable sortable')
#right_table

In [4]:
# Append neighborhood table into list format
Neighborhood=[]
for row in right_table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) == 2:
        # strip() drops the trailing newline that the cell text can carry
        Neighborhood.append(cells[0].find(string=True).strip())
#Neighborhood

In [5]:
# Create dataframe for all neighborhood data
chi_df = pd.DataFrame({"Neighborhood": Neighborhood})
chi_df.head()

Unnamed: 0,Neighborhood
0,Albany Park
1,Altgeld Gardens
2,Andersonville
3,Archer Heights
4,Armour Square


In [6]:
# Shape of data frame
chi_df.shape

(246, 1)
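
The row filter used above (keep the first cell of every two-cell row) can be tried without a network call. The following is only a minimal sketch: it swaps BeautifulSoup for the standard-library `html.parser`, and both the `SAMPLE_HTML` table and the `FirstCellParser` class are made up for illustration. It also strips surrounding whitespace, which keeps stray newlines out of the neighborhood names.

```python
from html.parser import HTMLParser

# Made-up sample that mimics a two-column wikitable (assumption, not the real page).
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>Neighborhood</th><th>Community area</th></tr>
<tr><td><a href="#">Albany Park</a>
</td><td>Albany Park</td></tr>
<tr><td>The Loop
</td><td>Loop</td></tr>
</table>
"""


class FirstCellParser(HTMLParser):
    """Collect the first <td> text of every row that has exactly two <td> cells."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._cells = []      # text of each <td> in the current row
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td":
            self._in_td = True
            self._cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and len(self._cells) == 2:
            # strip() removes the trailing newline carried by the raw cell text
            self.names.append(self._cells[0].strip())

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data


parser = FirstCellParser()
parser.feed(SAMPLE_HTML)
names = parser.names
print(names)
```

The header row is skipped because it contains `<th>` cells only, mirroring the `len(cells)==2` check on `<td>` cells in the notebook.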

<h3><b>3. Get Geographical Coordinates </b></h3>

In [7]:
def get_latlng(neighborhood, max_attempts=5):
    # try the ArcGIS geocoder a limited number of times so a failed lookup cannot loop forever
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Chicago,IL'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # no result: return empty coordinates so the row can be inspected later
    return [None, None]

In [8]:
# Call the function to get the coordinates, store in a new list using list comprehension
coords = [get_latlng(Neighborhood) for Neighborhood in chi_df["Neighborhood"].tolist()]
#coords

In [9]:
# Create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [10]:
# Merge coordinates into original dataframe
chi_df['Latitude'] = df_coords['Latitude']
chi_df['Longitude'] = df_coords['Longitude']

In [11]:
# Check the merged dataframe
print(chi_df.shape)
chi_df.head()

(246, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Albany Park,41.96829,-87.72338
1,Altgeld Gardens,41.65441,-87.60225
2,Andersonville,41.98046,-87.66834
3,Archer Heights,41.81154,-87.72556
4,Armour Square,41.83458,-87.63189


In [12]:
# Save the dataframe as CSV file
chi_df.to_csv("df.csv", index=False)

<h3><b>4. Create map of Chicago with coordinates and neighborhood data</b></h3>

In [13]:
# Get coordinates for Chicago
address = 'Chicago, IL'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

41.8755616 -87.6244212


In [14]:
# Create map of Chicago using latitude and longitude values
map_chi = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to map
for lat, lng, neighborhood in zip(chi_df['Latitude'], chi_df['Longitude'], chi_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='darkgreen',
        fill=True,
        fill_color='green',
        fill_opacity=0.7).add_to(map_chi)  
    
map_chi 

In [15]:
# Save the map as HTML file
map_chi.save('map_chi.html')

<h3><b>5. k-Means Clustering Neighborhoods</b></h3>

There are 246 neighborhoods in the greater Chicago area. We group them into 10 clusters based on their coordinates to see which neighborhoods lie in the urban core of Chicago. From there, we focus on the cluster that contains the most inner-city neighborhoods.

In [16]:
# set number of clusters
k = 10

chi_clustering = chi_df.drop(["Neighborhood"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=k, random_state=0).fit(chi_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 5, 7, 6, 0, 6, 4, 0, 9, 1])
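
To make the clustering step less of a black box, here is a rough sketch of what k-means does with latitude/longitude pairs. It is a plain-Python version of Lloyd's algorithm, not scikit-learn's `KMeans` implementation, and the function name `kmeans`, its parameters, and the `toy_points` coordinates are all made up for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


# Made-up (latitude, longitude) pairs forming two well-separated groups.
toy_points = [
    (41.97, -87.72), (41.98, -87.67), (41.96, -87.70),   # northern group
    (41.65, -87.60), (41.66, -87.62), (41.67, -87.61),   # southern group
]
labels, centroids = kmeans(toy_points, k=2)
print(labels)
```

The label numbers themselves are arbitrary; only the grouping matters, which is why the notebook picks the inner-city cluster by looking at the map rather than by label value.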

In [17]:
# create a new dataframe that includes the cluster
chi_merged = chi_df.copy()

# add clustering labels
chi_merged["Cluster Labels"] = kmeans.labels_
chi_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels
0,Albany Park,41.96829,-87.72338,1
1,Altgeld Gardens,41.65441,-87.60225,5
2,Andersonville,41.98046,-87.66834,7
3,Archer Heights,41.81154,-87.72556,6
4,Armour Square,41.83458,-87.63189,0


In [18]:
# sort the results by Cluster Labels
print(chi_merged.shape)
chi_merged.sort_values(["Cluster Labels"], inplace=True)
chi_merged

(246, 4)


<b>Create a visual map of the 10 clusters in the Greater Chicago Area.</b>

In [19]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i+x+(i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chi_merged['Latitude'], chi_merged['Longitude'], chi_merged['Neighborhood'], chi_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [20]:
# save the map as HTML file
map_clusters.save('map_clusters.html')
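
The marker colors above come from a list with one entry per cluster. As a small illustration of how labels map to colors, the sketch below builds a palette with the standard-library `colorsys` module instead of matplotlib's `cm.rainbow`; the helper name `cluster_palette` is hypothetical. It also shows what an index of `cluster-1` does for label 0.

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(10)
# k-means labels start at 0, so they can index the palette directly.
color_for_label_0 = palette[0]
# With an offset of -1, label 0 becomes index -1, which is the last entry.
wrapped = palette[0 - 1]
print(color_for_label_0, wrapped)
```

Because Python wraps negative indices, the `cluster-1` offset still gives every label its own color; indexing with the label directly is simply easier to read.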

The map shows that most inner-city neighborhoods fall in Cluster 2. We filter for the Cluster 2 neighborhoods only and store them in a new dataframe.

In [21]:
#Boolean to filter cluster 2
is_2 =  chi_merged['Cluster Labels']==2
is_2.head()

52     False
208    False
33     False
171    False
45     False
Name: Cluster Labels, dtype: bool

In [22]:
#Filter for cluster 2 list of neighborhoods
chi_cluster2 = chi_merged[is_2]
print(chi_cluster2.shape)

#Cluster 2 dataframe
chi_cluster2_df = pd.DataFrame(chi_cluster2)
print('There are {} neighborhoods in the city area of Chicago based on the total in Cluster 2.'.format(len(chi_cluster2_df['Neighborhood'].unique())))
chi_cluster2_df.head()

(60, 4)
There are 60 neighborhoods in the city area of Chicago based on the total in Cluster 2.


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels
109,Lake View East,41.885757,-87.624328,2
195,Sheridan Park,41.87015,-87.65425,2
198,Smith Park,41.89241,-87.69167,2
127,Marshall Square,41.851993,-87.699064,2
124,Magnificent Mile,41.90058,-87.62421,2


<h3><b>6. Explore FOURSQUARE API Location Data (Cluster 2): Chicago, IL </b> </h3>

We will explore the venue data in Cluster 2 by using Foursquare location data.
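
A request of the kind used in this section can be sketched offline. The example below is illustrative only: it builds the query string with `urllib.parse.urlencode` rather than string formatting, and it parses a hand-written payload shaped like the fields this notebook reads. The helper names `build_explore_url` and `extract_venues`, the credentials, and `sample_payload` are all assumptions.

```python
from urllib.parse import urlencode

# Endpoint as used in the notebook; every value passed below is a placeholder.
BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the query string with urlencode so values are escaped safely."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


def extract_venues(payload, neighborhood):
    """Flatten an explore-style payload into (neighborhood, name, category) rows."""
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Guard against venues that come back without any category.
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, venue["name"], category))
    return rows


# Hand-written payload (assumption), not a real API response.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "The Chicago Theatre",
                   "location": {"lat": 41.8855, "lng": -87.6272},
                   "categories": [{"name": "Theater"}]}},
        {"venue": {"name": "Unnamed Spot",
                   "location": {"lat": 41.8850, "lng": -87.6250},
                   "categories": []}},
    ]}]}
}

url = build_explore_url("ID", "SECRET", "20200609", 41.8858, -87.6243, 2000, 100)
rows = extract_venues(sample_payload, "The Loop")
print(url)
print(rows)
```

The empty-category guard matters because indexing `categories[0]` on an empty list raises an `IndexError` and would stop the whole collection loop.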

In [23]:
CLIENT_ID = 'Client ID'
CLIENT_SECRET = 'Client Secret'
VERSION = '20200609'
radius = 2000
LIMIT = 100

# CREATE A DF WITH JUST A FEW ROWS, TO TEST OUT THE CODE
#pretend_dict = {df}

df_chi = pd.DataFrame(chi_cluster2_df)

venues = []

for lat, long, neighborhood in zip(chi_cluster2_df['Latitude'], chi_cluster2_df['Longitude'], chi_cluster2_df['Neighborhood']):
    
    print(lat, long, neighborhood)
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT) 
        
    # make the GET request
    results = requests.get(url).json()['response']['groups'][0]['items']
        
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append([
            neighborhood,
            lat, 
            long,
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']])

venues

41.88575700000001 -87.62432827519756 Lake View East
41.87015000000008 -87.65424999999999 Sheridan Park
41.89241000000004 -87.69166999999999 Smith Park
41.85199302125155 -87.6990644845565 Marshall Square
41.90058000000005 -87.62421007054608 Magnificent Mile
41.90333000000004 -87.66728999999998 The Villa
41.874540000000025 -87.62883999999997 South Loop
41.87834000000004 -87.61996999999997 The Loop

41.898430000000076 -87.62140999999997 Streeterville
41.928973966097374 -87.6562237405041 Wrightwood

41.884250000000065 -87.63244999999995 Talley's Corner

41.910360000000026 -87.68295999999998 Wicker Park
41.899020000000064 -87.68203999999997 Ukrainian Village
41.869380000000035 -87.66064999999998 Little Italy
41.92184000000003 -87.64743999999996 Lincoln Park

41.86932300000001 -87.68611269329514 Tri-Taylor
41.928973966097374 -87.6562237405041 Wrightwood Neighbors

41.8794200372207 -87.6331300255352 West DePaul

41.893290000000036 -87.65742999999998 West Town

41.92167344736649 -87.6534885 Sh

[['Lake View East',
  41.88575700000001,
  -87.62432827519756,
  'The Chicago Theatre',
  41.8855389313031,
  -87.6271507302385,
  'Theater'],
 ['Lake View East',
  41.88575700000001,
  -87.62432827519756,
  'sweetgreen',
  41.8849641,
  -87.6247284,
  'Salad Place'],
 ['Lake View East',
  41.88575700000001,
  -87.62432827519756,
  'Roti Modern Mediterranean',
  41.8860482,
  -87.6249484,
  'Mediterranean Restaurant'],
 ['Lake View East',
  41.88575700000001,
  -87.62432827519756,
  'Chicago Architecture Center',
  41.88772,
  -87.62365,
  'Tour Provider'],
 ['Lake View East',
  41.88575700000001,
  -87.62432827519756,
  'Virgin Hotels Chicago',
  41.8860646,
  -87.62585279999999,
  'Hotel'],
 ['Lake View East',
  41.88575700000001,
  -87.62432827519756,
  'Chicago Cultural Center',
  41.883640263310966,
  -87.62467077533687,
  'Museum'],
 ['Lake View East',
  41.88575700000001,
  -87.62432827519756,
  'LondonHouse Chicago, Curio Collection by Hilton',
  41.8878318,
  -87.6254261,
  'H

In [24]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.tail()

(5951, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
5946,East Garfield Park,41.87863,-87.70514,Tony's Italian Beef 4100 W Madison,41.881121,-87.728168,Fast Food Restaurant
5947,East Garfield Park,41.87863,-87.70514,Baba Pita,41.868758,-87.685328,Middle Eastern Restaurant
5948,East Garfield Park,41.87863,-87.70514,Iron Mountain,41.864173,-87.690909,Business Service
5949,East Garfield Park,41.87863,-87.70514,Mason (Elizabeth) Park,41.883345,-87.728445,Park
5950,East Garfield Park,41.87863,-87.70514,McDonald's,41.881077,-87.727853,Fast Food Restaurant


In [25]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Big Oaks,100,100,100,100,100,100
Bucktown,100,100,100,100,100,100
Budlong Woods,100,100,100,100,100,100
Cabrini–Green,100,100,100,100,100,100
Chinatown,100,100,100,100,100,100
Dearborn Park,100,100,100,100,100,100
Douglas Park,96,96,96,96,96,96
East Garfield Park,78,78,78,78,78,78
East Pilsen,100,100,100,100,100,100
East Village,100,100,100,100,100,100


In [26]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))
print('There are {} neighborhoods.'.format(len(venues_df['Neighborhood'].unique())))

There are 245 unique categories.
There are 60 neighborhoods.


<h3><b>7. Analyze Each Neighborhood</b></h3>

After collecting the venues in the inner-city neighborhoods, we explore each neighborhood and determine which venue categories are most common there.

In [27]:
#Count unique types of venue categories
venue_count = venues_df['VenueCategory'].value_counts().to_frame(name='Count')
venue_count

Unnamed: 0,Count
Italian Restaurant,246
Mexican Restaurant,237
Coffee Shop,232
Hotel,213
Park,187
Pizza Place,175
New American Restaurant,146
Bar,144
Grocery Store,138
Theater,133


In [28]:
# one hot encoding
chi_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
chi_onehot['Neighborhood'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [chi_onehot.columns[-1]] + list(chi_onehot.columns[:-1])
chi_onehot = chi_onehot[fixed_columns]

print(chi_onehot.shape)
chi_onehot.head()

(5951, 246)


Unnamed: 0,Neighborhood,ATM,American Restaurant,Amphitheater,Animal Shelter,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Lake View East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Lake View East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Lake View East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Lake View East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Lake View East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [29]:
chi_grouped = chi_onehot.groupby('Neighborhood').mean().reset_index()
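
Taking the mean of one-hot columns per neighborhood gives each category's share of that neighborhood's venues. The sketch below reproduces that idea with `collections.Counter` on a few made-up rows; `venue_rows` and `category_frequencies` are illustrative names, not part of the notebook.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, category) rows standing in for venues_df.
venue_rows = [
    ("Bucktown", "Bar"),
    ("Bucktown", "Bar"),
    ("Bucktown", "Coffee Shop"),
    ("Bucktown", "Park"),
    ("Chinatown", "Chinese Restaurant"),
    ("Chinatown", "Chinese Restaurant"),
]


def category_frequencies(rows):
    """Share of each category among a neighborhood's venues."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }


freqs = category_frequencies(venue_rows)
print(freqs["Bucktown"])
```

Unlike the one-hot table, this version stores only the categories that actually occur in a neighborhood; the missing ones correspond to the zero columns in `chi_grouped`.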

In [30]:
#group neighborhoods with top 10 common venues
num_top_venues = 10 
for hood in chi_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = chi_grouped[chi_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')

----Big Oaks----
                      venue  freq
0                     Hotel  0.10
1                   Theater  0.07
2        Italian Restaurant  0.06
3               Coffee Shop  0.04
4   New American Restaurant  0.04
5  Mediterranean Restaurant  0.03
6                      Park  0.03
7                       Gym  0.03
8               Salad Place  0.03
9                Restaurant  0.02


----Bucktown----
                     venue  freq
0                      Bar  0.07
1              Coffee Shop  0.06
2              Pizza Place  0.05
3                     Park  0.04
4  New American Restaurant  0.03
5              Yoga Studio  0.03
6                Bookstore  0.03
7           Breakfast Spot  0.03
8     Gym / Fitness Center  0.03
9                      Gym  0.02


----Budlong Woods----
                venue  freq
0         Coffee Shop  0.05
1  Italian Restaurant  0.05
2  Mexican Restaurant  0.05
3                 Bar  0.04
4                 Gym  0.03
5          Donut Shop  0.03
6      

                       venue  freq
0         Mexican Restaurant  0.17
1                Art Gallery  0.07
2                        Bar  0.06
3                Pizza Place  0.05
4         Chinese Restaurant  0.04
5                     Bakery  0.04
6  Latin American Restaurant  0.03
7              Grocery Store  0.03
8                       Park  0.03
9                Coffee Shop  0.03


----Museum Campus----
                venue  freq
0            Aquarium  0.08
1                Park  0.06
2  Chinese Restaurant  0.04
3        Burger Joint  0.04
4         Pizza Place  0.04
5    Asian Restaurant  0.03
6       Grocery Store  0.03
7         Planetarium  0.03
8         Yoga Studio  0.02
9    Sculpture Garden  0.02


----Near North Side
----
                  venue  freq
0                 Hotel  0.09
1            Steakhouse  0.06
2           Coffee Shop  0.05
3    Italian Restaurant  0.05
4   American Restaurant  0.04
5           Pizza Place  0.03
6    Salon / Barbershop  0.03
7         Grocer

                      venue  freq
0                     Hotel  0.10
1                   Theater  0.07
2        Italian Restaurant  0.06
3               Coffee Shop  0.04
4   New American Restaurant  0.04
5  Mediterranean Restaurant  0.03
6                      Park  0.03
7                       Gym  0.03
8               Salad Place  0.03
9                Restaurant  0.02


----The Loop
----
            venue  freq
0           Hotel  0.11
1         Theater  0.07
2            Park  0.05
3     Music Venue  0.04
4        Fountain  0.03
5     Coffee Shop  0.03
6     Pizza Place  0.02
7   Grocery Store  0.02
8      Steakhouse  0.02
9  Breakfast Spot  0.02


----The Villa----
                     venue  freq
0                      Bar  0.06
1              Pizza Place  0.05
2       Mexican Restaurant  0.04
3              Coffee Shop  0.04
4         Sushi Restaurant  0.04
5                     Park  0.03
6                     Café  0.03
7                   Bakery  0.03
8       Italian Restauran

In [31]:
#Putting common venues into a new dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [32]:
#Determine the top 10 common venues per neighborhood
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = chi_grouped['Neighborhood']
for ind in np.arange(chi_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(chi_grouped.iloc[ind, :], num_top_venues)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head(5)

(60, 11)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Big Oaks,Hotel,Theater,Italian Restaurant,Coffee Shop,New American Restaurant,Park,Mediterranean Restaurant,Gym,Salad Place,Cocktail Bar
1,Bucktown,Bar,Coffee Shop,Pizza Place,Park,Bookstore,Yoga Studio,New American Restaurant,Breakfast Spot,Gym / Fitness Center,Trail
2,Budlong Woods,Coffee Shop,Mexican Restaurant,Italian Restaurant,Bar,Sushi Restaurant,Sandwich Place,Grocery Store,Gym,Theater,Donut Shop
3,Cabrini–Green,Italian Restaurant,Coffee Shop,Park,Yoga Studio,Grocery Store,Sandwich Place,Hot Dog Joint,Café,Mexican Restaurant,Ice Cream Shop
4,Chinatown,Chinese Restaurant,Pizza Place,Park,Grocery Store,Asian Restaurant,Mexican Restaurant,Yoga Studio,Bar,Thai Restaurant,Dessert Shop
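
The column labels above rely on a three-item suffix list plus an exception for everything else, which is fine for ten columns. If more columns were ever needed, a small helper could produce the suffixes directly. This is a hypothetical sketch; `ordinal` is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"          # covers the irregular 11th, 12th, 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns[:4])
```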


<h3><b>8. Perform k-Means clustering again for inner city neighborhoods in Cluster 2 and venues. </b></h3>

In [33]:
chi_grouped_clustering = chi_grouped.drop('Neighborhood', axis=1)
#chi_grouped_clustering

In [34]:
# set number of clusters
ks = 5
# run k-means clustering
kmeans = KMeans(n_clusters = ks, random_state=0).fit(chi_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 3, 0, 0, 2, 1, 4, 0, 4, 3])

In [35]:
#Create a new dataframe that includes the clusters as well as the top 10 venues for each neighborhoods

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster_Label 2', kmeans.labels_)
chi_merged = df_chi
# merge the Chicago neighborhood coordinates with the top venues for each neighborhood
chi_merged_latlong = chi_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on = 'Neighborhood')
chi_merged_latlong.head(5)

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,Cluster_Label 2,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
109,Lake View East,41.885757,-87.624328,2,1,Hotel,Theater,New American Restaurant,Park,Italian Restaurant,Coffee Shop,Steakhouse,Grocery Store,Donut Shop,Mediterranean Restaurant
195,Sheridan Park,41.87015,-87.65425,2,0,Italian Restaurant,Coffee Shop,Grocery Store,Hot Dog Joint,Sandwich Place,Park,Mexican Restaurant,Yoga Studio,Greek Restaurant,Bar
198,Smith Park,41.89241,-87.69167,2,3,New American Restaurant,Coffee Shop,Restaurant,Breakfast Spot,Brewery,Deli / Bodega,Flower Shop,Dive Bar,Pub,Southern / Soul Food Restaurant
127,Marshall Square,41.851993,-87.699064,2,4,Mexican Restaurant,Sandwich Place,Italian Restaurant,Taco Place,Bank,Gas Station,Fast Food Restaurant,Donut Shop,Mobile Phone Shop,Supermarket
124,Magnificent Mile,41.90058,-87.62421,2,1,Hotel,American Restaurant,Italian Restaurant,Coffee Shop,Café,Grocery Store,Pizza Place,New American Restaurant,Gym,Salon / Barbershop


<i> Cluster Labels is from the original k-Means clustering to obtain city center neighborhoods only.</i>

<i> Cluster_Label 2 is the cluster label for the current k-Means clustering that was just performed.</i>

<b>Create map of Chicago for inner city neighborhoods (Cluster 2).</b>

In [36]:
# create map
map_clusters2 = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(ks)
ys = [i + x + (i*x)**2 for i in range(ks)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chi_merged_latlong['Latitude'], chi_merged_latlong['Longitude'], chi_merged_latlong['Neighborhood'], chi_merged_latlong['Cluster_Label 2']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=20,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters2)
display(map_clusters2)

<h3><b>9. Analyze Individual Clusters Created</b></h3>

In [37]:
# Cluster 0
cluster0=chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 0]
print(cluster0.shape)
chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 0]

(16, 15)


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,Cluster_Label 2,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
195,Sheridan Park,41.87015,-87.65425,2,0,Italian Restaurant,Coffee Shop,Grocery Store,Hot Dog Joint,Sandwich Place,Park,Mexican Restaurant,Yoga Studio,Greek Restaurant,Bar
243,Wrightwood,41.928974,-87.656224,2,0,Coffee Shop,Pizza Place,Theater,Burger Joint,Park,Tapas Restaurant,Bakery,Donut Shop,Bar,Sushi Restaurant
117,Little Italy,41.86938,-87.66065,2,0,Mexican Restaurant,Italian Restaurant,Park,Sandwich Place,Coffee Shop,Hot Dog Joint,Yoga Studio,Bar,Grocery Store,Restaurant
114,Lincoln Park,41.92184,-87.64744,2,0,Pizza Place,Coffee Shop,Italian Restaurant,Bakery,Park,Music Venue,Gym / Fitness Center,Zoo Exhibit,Donut Shop,Café
244,Wrightwood Neighbors,41.928974,-87.656224,2,0,Coffee Shop,Pizza Place,Theater,Burger Joint,Park,Tapas Restaurant,Bakery,Donut Shop,Bar,Sushi Restaurant
194,Sheffield Neighbors,41.921673,-87.653488,2,0,Pizza Place,Italian Restaurant,Coffee Shop,Grocery Store,Music Venue,Whisky Bar,Gym,Gym / Fitness Center,Bakery,Cosmetics Shop
168,Polish Village,41.861917,-87.647798,2,0,Mexican Restaurant,Sandwich Place,Italian Restaurant,Grocery Store,Park,Hot Dog Joint,Coffee Shop,Bar,Bakery,Asian Restaurant
231,West Lakeview,41.926541,-87.639149,2,0,Pizza Place,Zoo Exhibit,Coffee Shop,Italian Restaurant,Bakery,Grocery Store,Mexican Restaurant,Music Venue,Park,Seafood Restaurant
144,North Halsted,41.885386,-87.647449,2,0,Italian Restaurant,New American Restaurant,Coffee Shop,Gym,Pizza Place,Restaurant,Deli / Bodega,Sushi Restaurant,Park,Grocery Store
138,Near West Side,41.87301,-87.67613,2,0,Mexican Restaurant,Sandwich Place,Italian Restaurant,Park,Coffee Shop,Café,Bakery,Brewery,Diner,Bar


In [38]:
# Cluster 1
cluster1 = chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 1]
print(cluster1.shape)
chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 1]

(19, 15)


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,Cluster_Label 2,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
109,Lake View East,41.885757,-87.624328,2,1,Hotel,Theater,New American Restaurant,Park,Italian Restaurant,Coffee Shop,Steakhouse,Grocery Store,Donut Shop,Mediterranean Restaurant
124,Magnificent Mile,41.90058,-87.62421,2,1,Hotel,American Restaurant,Italian Restaurant,Coffee Shop,Café,Grocery Store,Pizza Place,New American Restaurant,Gym,Salon / Barbershop
206,South Loop,41.87454,-87.62884,2,1,Theater,Park,Hotel,Yoga Studio,Coffee Shop,Pizza Place,Music Venue,Grocery Store,Hot Dog Joint,Liquor Store
121,The Loop,41.87834,-87.61997,2,1,Hotel,Theater,Park,Music Venue,Fountain,Coffee Shop,Pizza Place,Steakhouse,Diner,Middle Eastern Restaurant
210,Streeterville,41.89843,-87.62141,2,1,Hotel,American Restaurant,Italian Restaurant,Coffee Shop,Seafood Restaurant,New American Restaurant,Café,Pizza Place,Salon / Barbershop,Grocery Store
211,Talley's Corner,41.88425,-87.63245,2,1,Hotel,Theater,Italian Restaurant,Coffee Shop,New American Restaurant,Park,Mediterranean Restaurant,Gym,Salad Place,Cocktail Bar
226,West DePaul,41.87942,-87.63313,2,1,Hotel,Theater,Park,Coffee Shop,Steakhouse,Italian Restaurant,Grocery Store,Gym,Cocktail Bar,Restaurant
149,Nortown,41.88425,-87.63245,2,1,Hotel,Theater,Italian Restaurant,Coffee Shop,New American Restaurant,Park,Mediterranean Restaurant,Gym,Salad Place,Cocktail Bar
173,Printer's Row,41.87369,-87.62965,2,1,Theater,Park,Hotel,Coffee Shop,Yoga Studio,Pizza Place,Music Venue,Breakfast Spot,Gym,Grocery Store
140,New City,41.88425,-87.63245,2,1,Hotel,Theater,Italian Restaurant,Coffee Shop,New American Restaurant,Park,Mediterranean Restaurant,Gym,Salad Place,Cocktail Bar


In [39]:
# Cluster 2
cluster2 =chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 2]
print(cluster2.shape)
chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 2]

(4, 15)


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,Cluster_Label 2,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
170,Prairie Avenue Historic District,41.85642,-87.62088,2,2,Chinese Restaurant,Park,Aquarium,Pizza Place,Burger Joint,Asian Restaurant,Grocery Store,Italian Restaurant,Beach,Dim Sum Restaurant
139,New Chinatown,41.85276,-87.63232,2,2,Chinese Restaurant,Pizza Place,Park,Grocery Store,Yoga Studio,Burger Joint,Mexican Restaurant,Asian Restaurant,Dessert Shop,Thai Restaurant
135,Museum Campus,41.858407,-87.61453,2,2,Aquarium,Park,Pizza Place,Burger Joint,Chinese Restaurant,Planetarium,Grocery Store,Asian Restaurant,History Museum,Beach
37,Chinatown,41.85277,-87.63499,2,2,Chinese Restaurant,Pizza Place,Park,Grocery Store,Asian Restaurant,Mexican Restaurant,Yoga Studio,Bar,Thai Restaurant,Dessert Shop


In [40]:
# Cluster 3
cluster3 = chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 3]
print(cluster3.shape)
chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 3]

(14, 15)


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,Cluster_Label 2,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
198,Smith Park,41.89241,-87.69167,2,3,New American Restaurant,Coffee Shop,Restaurant,Breakfast Spot,Brewery,Deli / Bodega,Flower Shop,Dive Bar,Pub,Southern / Soul Food Restaurant
217,The Villa,41.90333,-87.66729,2,3,Bar,Pizza Place,Coffee Shop,Mexican Restaurant,Sushi Restaurant,New American Restaurant,Park,Café,Bakery,Italian Restaurant
240,Wicker Park,41.91036,-87.68296,2,3,Pizza Place,Coffee Shop,Bar,Italian Restaurant,Trail,Salon / Barbershop,Breakfast Spot,Sushi Restaurant,New American Restaurant,Bakery
213,Ukrainian Village,41.89902,-87.68204,2,3,Coffee Shop,Dive Bar,Bakery,Bar,Breakfast Spot,Sushi Restaurant,Café,Cajun / Creole Restaurant,Pub,Caribbean Restaurant
238,West Town,41.89329,-87.65743,2,3,Italian Restaurant,New American Restaurant,Restaurant,Brewery,Deli / Bodega,Bar,Pizza Place,Café,Burger Joint,Coffee Shop
158,Old Town Triangle,41.90539,-87.64179,2,3,Hotel,Gym,Coffee Shop,Italian Restaurant,Café,American Restaurant,Furniture / Home Store,Steakhouse,Grocery Store,New American Restaurant
157,Old Town,41.90539,-87.64179,2,3,Hotel,Gym,Coffee Shop,Italian Restaurant,Café,American Restaurant,Furniture / Home Store,Steakhouse,Grocery Store,New American Restaurant
174,Pulaski Park,41.90637,-87.66392,2,3,Bar,Pizza Place,Mexican Restaurant,Coffee Shop,Brewery,New American Restaurant,Sushi Restaurant,Park,Gym,Italian Restaurant
141,Noble Square,41.9034,-87.66451,2,3,Bar,Pizza Place,Mexican Restaurant,Sushi Restaurant,New American Restaurant,Park,Coffee Shop,Café,Clothing Store,Bookstore
68,Fulton River District,41.886612,-87.686698,2,3,New American Restaurant,Coffee Shop,Dive Bar,Restaurant,Flower Shop,Mexican Restaurant,Breakfast Spot,Ukrainian Restaurant,Brewery,Grocery Store


In [41]:
# Cluster 4
cluster4 =chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 4]
print(cluster4.shape)
chi_merged_latlong.loc[chi_merged_latlong['Cluster_Label 2'] == 4]

(7, 15)


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,Cluster_Label 2,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
127,Marshall Square,41.851993,-87.699064,2,4,Mexican Restaurant,Sandwich Place,Italian Restaurant,Taco Place,Bank,Gas Station,Fast Food Restaurant,Donut Shop,Mobile Phone Shop,Supermarket
212,Tri-Taylor,41.869323,-87.686113,2,4,Mexican Restaurant,Sandwich Place,Sports Bar,Stadium,Bar,Outdoor Sculpture,Park,Hockey Arena,Fast Food Restaurant,Pizza Place
166,Pilsen,41.85702,-87.65759,2,4,Mexican Restaurant,Bar,Sandwich Place,Italian Restaurant,Bakery,Breakfast Spot,Park,Art Gallery,Dessert Shop,Diner
128,Marynook,41.846188,-87.653358,2,4,Mexican Restaurant,Art Gallery,Bar,Pizza Place,Chinese Restaurant,Bakery,Park,Grocery Store,Latin American Restaurant,Coffee Shop
122,Lower West Side,41.85224,-87.67115,2,4,Mexican Restaurant,Bar,Italian Restaurant,Park,Bakery,Diner,Coffee Shop,Grocery Store,Brewery,Taco Place
53,East Pilsen,41.85702,-87.65759,2,4,Mexican Restaurant,Bar,Sandwich Place,Italian Restaurant,Bakery,Breakfast Spot,Park,Art Gallery,Dessert Shop,Diner
47,Douglas Park,41.86315,-87.69689,2,4,Mexican Restaurant,Park,Fast Food Restaurant,Sandwich Place,Bank,Fried Chicken Joint,Train Station,Coffee Shop,Pizza Place,Gas Station


<h3><b>10. Observations</b></h3>
<ul>
    <li>Most Restaurant Options: Cluster 0 </li>
    <li>Most Hotels: Cluster 1</li>
    <li>Parks: Cluster 2</li>
    <li>Entertainment in Area: Cluster 3</li>
    <li>Most Bars: Cluster 4</li>
</ul>

<b>Conclusion</b>

Cluster 3 has the most diversity in venue options.
<i>Please see pdf presentation report for more details.</i>
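
One possible way to put a number on "diversity" is to count the distinct venue categories that appear in each cluster. The sketch below assumes that definition, which is not necessarily how the conclusion above was reached, and it runs on made-up lists with placeholder cluster names rather than the notebook's data.

```python
# Made-up venue lists per cluster; names "A", "B", "C" are placeholders.
top_venues_by_cluster = {
    "A": ["Italian Restaurant", "Coffee Shop", "Pizza Place", "Coffee Shop"],
    "B": ["Bar", "Brewery", "Coffee Shop", "Flower Shop", "Pub"],
    "C": ["Mexican Restaurant", "Mexican Restaurant", "Bar"],
}


def distinct_category_counts(venues_by_cluster):
    """Number of distinct categories seen in each cluster."""
    return {
        cluster: len(set(categories))
        for cluster, categories in venues_by_cluster.items()
    }


diversity = distinct_category_counts(top_venues_by_cluster)
most_diverse = max(diversity, key=diversity.get)
print(diversity, most_diverse)
```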