# Chicago Neighborhood Capstone Project

### 1. Introduction

A friend was recently promoted to director at her company. The new position requires her to relocate to Chicago, IL within the next few months. She is ecstatic about the opportunity: she has worked hard to move up in the company, and the move will also bring her closer to family.

As a single mother of a 4-year-old, her priority is to find a neighborhood that suits both her and her daughter. She wants a great elementary school, since her daughter will start kindergarten in the next couple of years, along with activities and events that they can both enjoy.

### 2. Data

To answer her question, we first need to map out the neighborhoods that make up the city. Once those are established, we look up the top schools and the resources available in each neighborhood. After searching the web, we settled on the following sources.

<ol type="a">
    <li><b>Chicago Neighborhoods:</b> The JSON file is used to draw a polygon for each neighborhood on top of the city map. (<a href="https://github.com/jkgiesler/parse-chicago-neighborhoods" target="_blank">https://github.com/jkgiesler/parse-chicago-neighborhoods</a>)</li>
    <li><b>School Ranking:</b> A list of the top 50 schools in Chicago. The rankings are based on criteria such as standardized test scores, student-to-teacher ratio, etc. (<a href="https://www.realgroupre.com/blog/50-top-chicago-neighborhood-elementary-schools.html"  target="_blank">https://www.realgroupre.com/blog/50-top-chicago-neighborhood-elementary-schools.html</a>)</li>
    <li><b>Resources:</b> An API provided by Foursquare lets us pull information on the options available to families, such as parks, youth centers, etc. (<a href="https://foursquare.com/"  target="_blank">https://foursquare.com/</a>)</li>
    </ol>

In [73]:
import pandas as pd
import requests
import numpy as np # library to handle data in a vectorized manner

# HTML parsing library (for web scraping)
from bs4 import BeautifulSoup

# CSV file handling
import csv

# library to handle JSON files
import json 

# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 

# transform JSON into a flat pandas dataframe
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed yet
import folium

In [2]:
with open('Neighborhoods_2012_polygons.json') as json_data:
    chicago_data = json.load(json_data)

In [45]:
chicago_data

{'type': 'FeatureCollection',
 'crs': {'type': 'name',
  'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}},
 'features': [{'type': 'Feature',
   'properties': {'PRI_NEIGH': 'Grand Boulevard',
    'SEC_NEIGH': 'BRONZEVILLE',
    'SHAPE_AREA': 48492503.1554,
    'SHAPE_LEN': 28196.837157,
    'style': {},
    'highlight': {}},
   'geometry': {'type': 'Polygon',
    'coordinates': [[[-87.60670812560363, 41.81681377137387],
      [-87.60670480953505, 41.81657908583579],
      [-87.60670022648407, 41.816338713552454],
      [-87.60669581538588, 41.816099357727296],
      [-87.60668982110376, 41.8158118024656],
      [-87.60668357216157, 41.81556631526606],
      [-87.60667660553894, 41.815299912163404],
      [-87.6066796364493, 41.814994168113515],
      [-87.60668235893172, 41.81471953500853],
      [-87.60667153481008, 41.8142816453241],
      [-87.60666414094068, 41.81399460252956],
      [-87.60665643548599, 41.81366052091469],
      [-87.6066508943903, 41.81342058153228],
     

In [4]:
m = folium.Map(
    location=[41.881832, -87.623177],
    tiles="cartodbpositron",
    zoom_start=10,
)

folium.GeoJson(chicago_data, name="geojson").add_to(m)
m

In [5]:
schools = pd.read_csv("Top 50 Chicago Elementary Schools.csv")
schools.head(5)

Unnamed: 0,Rank,School,Type,Grades,Address,City,Zip,County,District,Is Charter,Is Magnet,Is Virtual,Is Title I,Phone
0,1,Keller Elementary Gifted Magnet School,"Public, Magnet",1-8,3020 W 108th St,Chicago,60655,Cook County,City of Chicago School District 299,No,Yes,No,No,(773) 535-2636
1,2,Decatur Classical Elementary School,"Public, Magnet",K-6,7030 N Sacramento Ave,Chicago,60645,Cook County,City of Chicago School District 299,No,Yes,No,No,(773) 534-2201
2,3,Skinner North Elementary School,Public,K-8,640 W Scott St,Chicago,60610,Cook County,City of Chicago School District 299,No,No,No,No,(773) 534-8500
3,4,Edison Elementary Regional Gifted Center,"Public, Magnet",K-8,4929 N Sawyer Ave,Chicago,60625,Cook County,City of Chicago School District 299,No,Yes,No,No,(773) 534-0540
4,7,Lenart Elementary Regional Gifted Center,"Public, Magnet","PK, KG-8",8101 S LA Salle St,Chicago,60620,Cook County,City of Chicago School District 299,No,Yes,No,No,(773) 535-0040


In [6]:
schools['Zip'] = schools['Zip'].astype(str)
schools["address_merge"] = schools[['Address','Zip']].apply(lambda x: ' '.join(x),axis=1)

In [7]:
geolocator = Nominatim(user_agent="foursquare_agent")

schools['loc'] = schools['address_merge'].apply(geolocator.geocode)
schools['point'] = schools['loc'].apply(lambda loc: tuple(loc.point) if loc else None)
schools[['lat','long','altitude']] = pd.DataFrame(schools['point'].to_list(),index=schools.index)
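
The geocoding cell above sends one request per school with no pause between calls, and a failed lookup comes back as `None`. Nominatim's usage policy asks for at most one request per second, so a throttled wrapper is worth considering. The following is a minimal sketch under stated assumptions: `throttled`, `to_lat_long`, and `fake_geocode` are hypothetical helper names, and the stub geocoder with placeholder coordinates stands in for `geolocator.geocode` so the example runs offline. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` for the same purpose.

```python
import time
from types import SimpleNamespace


def throttled(func, min_delay_seconds=1.0, sleep=time.sleep, clock=time.monotonic):
    """Wrap func so that consecutive calls start at least min_delay_seconds apart."""
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            wait = min_delay_seconds - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        return func(*args, **kwargs)

    return wrapper


def to_lat_long(location):
    """Return (lat, long) for a geocoder hit, or (None, None) for a miss."""
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


# Stand-in for geolocator.geocode (hypothetical; placeholder coordinates,
# not real geocoder output).
def fake_geocode(address):
    known = {"640 W Scott St 60610": SimpleNamespace(latitude=41.9, longitude=-87.6)}
    return known.get(address)


# No delay here so the demo finishes instantly; use 1.0 against the real service.
geocode = throttled(fake_geocode, min_delay_seconds=0.0)
coords = [to_lat_long(geocode(a)) for a in ["640 W Scott St 60610", "no such address"]]
```

With the real geocoder, `throttled(geolocator.geocode)` could be passed to `schools['address_merge'].apply(...)` in place of the bare method.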

In [8]:
blank_lat = schools['lat'].isna().sum()
blank_long = schools['long'].isna().sum()

print(f'Rows with null latitude: {blank_lat}; rows with null longitude: {blank_long}.')

Rows with null latitude: 1; rows with null longitude: 1.


In [9]:
# drop schools that could not be geocoded
schools.dropna(subset=['lat', 'long'], inplace=True)

schools.reset_index(drop=True, inplace=True)
schools.tail(5)

Unnamed: 0,Rank,School,Type,Grades,Address,City,Zip,County,District,Is Charter,Is Magnet,Is Virtual,Is Title I,Phone,address_merge,loc,point,lat,long,altitude
44,482,Garvy J Elementary School,Public,K-8,5225 N Oak Park Ave,Chicago,60656,Cook County,City of Chicago School District 299,No,No,No,No,(773) 534-1185,5225 N Oak Park Ave 60656,"(5225, North Oak Park Avenue, Norwood Park, Ch...","(41.9762252, -87.79553229995909, 0.0)",41.976225,-87.795532,0.0
45,486,LaSalle Elementary Language Academy,"Public, Magnet",K-8,1734 N Orleans St,Chicago,60614,Cook County,City of Chicago School District 299,No,Yes,No,No,(773) 534-8470,1734 N Orleans St 60614,"(1734-1742, North Orleans Street, Belgravia Te...","(41.9137744, -87.63805298402289, 0.0)",41.913774,-87.638053,0.0
46,497,Edison Park Elementary School,"Public, Alternative",K-12,6220 N Olcott Ave,Chicago,60631,Cook County,City of Chicago School District 299,No,No,No,No,(773) 534-0960,6220 N Olcott Ave 60631,"(6220, North Olcott Avenue, Edison Park, Chica...","(41.99397795, -87.81413508689438, 0.0)",41.993978,-87.814135,0.0
47,505,Jamieson Elementary School,Public,"PK, KG-8",5650 N Mozart St,Chicago,60659,Cook County,City of Chicago School District 299,No,No,No,Yes,(773) 534-2395,5650 N Mozart St 60659,"(5650, North Mozart Street, West Ridge, Chicag...","(41.98433945, -87.7009928908049, 0.0)",41.984339,-87.700993,0.0
48,506,Mitchell Elementary School,Public,"PK, KG-8",2233 W Ohio St,Chicago,60612,Cook County,City of Chicago School District 299,No,No,No,Yes,(773) 534-7655,2233 W Ohio St 60612,"(2233, West Ohio Street, Ukrainian Village, We...","(41.89194725, -87.68353612279662, 0.0)",41.891947,-87.683536,0.0


In [10]:
school_points = folium.map.FeatureGroup()

# loop through the geocoded schools and add each one to the feature group
for lt, lng in zip(schools.lat, schools.long):
    school_points.add_child(
        folium.features.CircleMarker(
            [lt, lng],
            radius=3, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add school markers to the map
m.add_child(school_points)

In [48]:
import gps_to_neighborhood

all_neighborhoods = gps_to_neighborhood.get_all_neighborhoods()

hoods = []
for lng, lt in zip(schools['long'], schools['lat']):
    hood = gps_to_neighborhood.find_neighborhood(lng,lt,all_neighborhoods)
    hoods.append(hood)
schools["Neighborhood"] = hoods

schools["Neighborhood"].value_counts()

Lake View                4
Norwood Park             4
West Ridge               3
Sauganash,Forest Glen    3
Lincoln Park             3
North Center             2
Mount Greenwood          2
Little Italy, UIC        2
Bridgeport               2
Chatham                  2
Lincoln Square           1
Pullman                  1
North Park               1
Irving Park              1
Jefferson Park           1
Lower West Side          1
Chinatown                1
Old Town                 1
Calumet Heights          1
Armour Square            1
Bucktown                 1
West Loop                1
Dunning                  1
River North              1
Sheffield & DePaul       1
Uptown                   1
West Town                1
Garfield Ridge           1
Albany Park              1
Edison Park              1
East Village             1
Name: Neighborhood, dtype: int64
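
The `gps_to_neighborhood` module comes from the repository linked in the Data section, and its internals are not shown in this notebook. As an illustration only, a lookup of this kind can be done with a ray-casting point-in-polygon test against the GeoJSON structure loaded earlier (features with a `PRI_NEIGH` property and `[longitude, latitude]` rings). The function names `point_in_ring` and `find_hood` and the toy square are assumptions made for this sketch, not the module's actual code.

```python
def point_in_ring(lng, lat, ring):
    """Ray casting: count how many ring edges a ray going east from the point crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # The edge straddles the horizontal line through the point.
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lng < x_cross:
                inside = not inside
        j = i
    return inside


def find_hood(lng, lat, geojson):
    """Return the PRI_NEIGH of the first feature containing the point, else None."""
    for feature in geojson["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            polygons = [geometry["coordinates"]]
        elif geometry["type"] == "MultiPolygon":
            polygons = geometry["coordinates"]
        else:
            continue
        for polygon in polygons:
            # Only the outer ring is checked; holes are ignored in this sketch.
            if point_in_ring(lng, lat, polygon[0]):
                return feature["properties"]["PRI_NEIGH"]
    return None


# Toy data (hypothetical): a 2x2 square instead of a real neighborhood polygon.
toy = {
    "features": [
        {
            "properties": {"PRI_NEIGH": "Toy Square"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]],
            },
        }
    ]
}
inside_name = find_hood(1, 1, toy)
outside_name = find_hood(3, 1, toy)
```

Note the argument order: GeoJSON stores longitude first, which matches the `find_neighborhood(lng, lt, ...)` call above.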

In [50]:
top_schools = schools[['Neighborhood','long','lat']]

chicago_hoods = top_schools.groupby(['Neighborhood'],as_index=False).mean()
chicago_hoods


Unnamed: 0,Neighborhood,long,lat
0,Albany Park,-87.709385,41.971108
1,Armour Square,-87.635136,41.843465
2,Bridgeport,-87.641031,41.841721
3,Bucktown,-87.682936,41.918861
4,Calumet Heights,-87.583301,41.726501
5,Chatham,-87.623221,41.74055
6,Chinatown,-87.633903,41.849788
7,Dunning,-87.829735,41.947928
8,East Village,-87.673968,41.902537
9,Edison Park,-87.810821,42.004637


In [49]:
CLIENT_ID = 'SUDTZ1502OTF2NLNZBZZOEFXYYYBCSQ1JQQVQWRGPDTQDKS4' # your Foursquare ID
CLIENT_SECRET = 'XFQRWAZSSGQBPWC32QAYR0H3CREFEMZWIIE5YYOC1TNKCY43' # your Foursquare Secret
ACCESS_TOKEN = 'B5KLKFWPF113SRZCO0BU3SFZ15YGXMTXPQLWHLT4UHXBPYLY' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 30
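
Hard-coding API credentials in a notebook exposes them to anyone who sees the file or its output. One common alternative is to read them from environment variables. This is a small sketch, assuming the hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from the environment, failing loudly if absent."""
    keys = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")  # hypothetical names
    missing = [key for key in keys if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[keys[0]], env[keys[1]]


# Demo with a plain dict standing in for os.environ (placeholder values).
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
```

In the notebook this would become `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.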

In [51]:
chicago_hoods.loc[0, 'Neighborhood']

'Albany Park'

In [52]:
neighborhood_latitude = chicago_hoods.loc[0, 'lat'] # neighborhood latitude value
neighborhood_longitude = chicago_hoods.loc[0, 'long'] # neighborhood longitude value

neighborhood_name = chicago_hoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Albany Park are 41.971107599999996, -87.70938526816113.


In [53]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=SUDTZ1502OTF2NLNZBZZOEFXYYYBCSQ1JQQVQWRGPDTQDKS4&client_secret=XFQRWAZSSGQBPWC32QAYR0H3CREFEMZWIIE5YYOC1TNKCY43&v=20180604&ll=41.971107599999996,-87.70938526816113&radius=500&limit=100'
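
Building the query string with `str.format` works, but it skips URL encoding and makes the parameters hard to read. A sketch of the same request built from a dictionary is shown below, using only the standard library; `build_explore_url` is a hypothetical helper name, and the endpoint and parameter names are the ones used in the cell above. `requests.get` also accepts a `params` dictionary, which achieves the same result.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; the coordinates are rounded values for Albany Park.
demo_url = build_explore_url("my-id", "my-secret", "20180604", 41.9711, -87.7094, 500, 100)
demo_query = parse_qs(urlsplit(demo_url).query)
```

The comma in `ll` is percent-encoded by `urlencode`; `parse_qs` shows that it decodes back to the original `lat,lng` pair.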

In [54]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '606e872e75375111a10e727e'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Albany Park',
  'headerFullLocation': 'Albany Park, Chicago',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 37,
  'suggestedBounds': {'ne': {'lat': 41.9756076045, 'lng': -87.70334396311326},
   'sw': {'lat': 41.966607595499994, 'lng': -87.715426573209}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c1ea95b920076b005bec3e9',
       'name': 'Jaafer Sweets',
       'location': {'address': '4825 N Kedzie Ave',
        'crossStreet': 'Lawrence',
        'lat': 41.969691,
        'lng': -87.708085,
        'labeled

In [55]:
def get_category_type(row):
    # the key is 'categories' for search results and 'venue.categories' for explore results
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [56]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Jaafer Sweets,Bakery,41.969691,-87.708085
1,Great Sea Chinese Restaurant,Chinese Restaurant,41.968496,-87.710678
2,Brazilian Bowl,Brazilian Restaurant,41.968537,-87.708558
3,Lindo Michoacan,Grocery Store,41.968864,-87.708453
4,la Michoacana Premium,Ice Cream Shop,41.968559,-87.70651


In [57]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

37 venues were returned by Foursquare.


In [58]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
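
One caveat about `getNearbyVenues`: the expression `v['venue']['categories'][0]['name']` raises an `IndexError` if a venue comes back with an empty category list, a case that `get_category_type` already guards against for the single-neighborhood request. A hedged sketch of a per-venue row builder that tolerates this is shown below; `venue_row` is a hypothetical helper, and the sample items only mimic the response fields used above.

```python
def venue_row(name, lat, lng, item):
    """Build one output row for a venue item, using None when no category is listed."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (
        name,
        lat,
        lng,
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )


# Sample items shaped like the explore response (second one is made up, with no category).
items = [
    {
        "venue": {
            "name": "Jaafer Sweets",
            "location": {"lat": 41.969691, "lng": -87.708085},
            "categories": [{"name": "Bakery"}],
        }
    },
    {
        "venue": {
            "name": "Uncategorized Venue",
            "location": {"lat": 41.97, "lng": -87.71},
            "categories": [],
        }
    },
]
rows = [venue_row("Albany Park", 41.971108, -87.709385, item) for item in items]
```

Inside `getNearbyVenues`, the list comprehension could then call `venue_row(name, lat, lng, v)` for each `v` in `results`.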

In [59]:
# collect nearby venues for every neighborhood that has a top school
chicago_venues = getNearbyVenues(names=chicago_hoods['Neighborhood'],
                                   latitudes=chicago_hoods['lat'],
                                   longitudes=chicago_hoods['long']
                                  )


Albany Park
Armour Square
Bridgeport
Bucktown
Calumet Heights
Chatham
Chinatown
Dunning
East Village
Edison Park
Garfield Ridge
Irving Park
Jefferson Park
Lake View
Lincoln Park
Lincoln Square
Little Italy, UIC
Lower West Side
Mount Greenwood
North Center
North Park
Norwood Park
Old Town
Pullman
River North
Sauganash,Forest Glen
Sheffield & DePaul
Uptown
West Loop
West Ridge
West Town


In [61]:
print(chicago_venues.shape)
chicago_venues.head()

(984, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Albany Park,41.971108,-87.709385,Jaafer Sweets,41.969691,-87.708085,Bakery
1,Albany Park,41.971108,-87.709385,Great Sea Chinese Restaurant,41.968496,-87.710678,Chinese Restaurant
2,Albany Park,41.971108,-87.709385,Brazilian Bowl,41.968537,-87.708558,Brazilian Restaurant
3,Albany Park,41.971108,-87.709385,Lindo Michoacan,41.968864,-87.708453,Grocery Store
4,Albany Park,41.971108,-87.709385,la Michoacana Premium,41.968559,-87.70651,Ice Cream Shop


In [62]:
chicago_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Albany Park,37,37,37,37,37,37
Armour Square,10,10,10,10,10,10
Bridgeport,14,14,14,14,14,14
Bucktown,68,68,68,68,68,68
Calumet Heights,7,7,7,7,7,7
Chatham,11,11,11,11,11,11
Chinatown,44,44,44,44,44,44
Dunning,5,5,5,5,5,5
East Village,72,72,72,72,72,72
Edison Park,11,11,11,11,11,11


In [63]:
print('There are {} unique categories.'.format(len(chicago_venues['Venue Category'].unique())))

There are 217 unique categories.


In [65]:
# one hot encoding
chicago_onehot = pd.get_dummies(chicago_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
chicago_onehot['Neighborhood'] = chicago_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [chicago_onehot.columns[-1]] + list(chicago_onehot.columns[:-1])
chicago_onehot = chicago_onehot[fixed_columns]

chicago_onehot.head()

Unnamed: 0,Neighborhood,Adult Boutique,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
979,West Town,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
980,West Town,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
981,West Town,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
982,West Town,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
983,West Town,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [66]:
chicago_onehot.shape

(984, 218)

In [67]:
chicago_grouped = chicago_onehot.groupby('Neighborhood').mean().reset_index()
chicago_grouped

Unnamed: 0,Neighborhood,Adult Boutique,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Albany Park,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,...,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Armour Square,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bridgeport,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bucktown,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,...,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.014706,0.0
4,Calumet Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0
5,Chatham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Chinatown,0.0,0.0,0.0,0.0,0.0,0.0,0.068182,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Dunning,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.013889,0.027778,0.0,0.0,0.013889,0.0,0.0,0.013889,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.041667
9,Edison Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [69]:
chicago_grouped.shape

(31, 218)
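
Taking the mean of one-hot columns per neighborhood gives the share of that neighborhood's venues falling in each category. For example, Albany Park has 37 venues, so a category that appears once gets 1/37 ≈ 0.027027, which matches the grouped table above. The same computation can be sketched without pandas; `category_frequencies` is a hypothetical helper and the rows are toy data.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """Map each neighborhood to {category: share of its venues in that category}."""
    counts = defaultdict(Counter)
    for hood, category in rows:
        counts[hood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }


# Toy (neighborhood, venue category) pairs, not real Foursquare results.
toy_rows = [
    ("A", "Bakery"),
    ("A", "Bakery"),
    ("A", "Park"),
    ("B", "Park"),
]
freqs = category_frequencies(toy_rows)
```

Because each row is a share, neighborhoods with very few venues (such as Dunning with 5) produce coarse frequencies, which is worth keeping in mind when comparing them with denser areas.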

In [70]:
num_top_venues = 5

for hood in chicago_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = chicago_grouped[chicago_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albany Park----
                venue  freq
0      Ice Cream Shop  0.05
1          Donut Shop  0.05
2  Mexican Restaurant  0.05
3         Coffee Shop  0.05
4       Grocery Store  0.05


----Armour Square----
                venue  freq
0  Chinese Restaurant   0.3
1       Grocery Store   0.2
2   Mobile Phone Shop   0.1
3    Storage Facility   0.1
4         Pizza Place   0.1


----Bridgeport----
                venue  freq
0  Chinese Restaurant  0.14
1         Pizza Place  0.14
2              Bakery  0.07
3         Video Store  0.07
4  Mexican Restaurant  0.07


----Bucktown----
             venue  freq
0              Bar  0.15
1      Coffee Shop  0.06
2      Pizza Place  0.04
3         Dive Bar  0.03
4  Thai Restaurant  0.03


----Calumet Heights----
                 venue  freq
0  Rental Car Location  0.29
1        Women's Store  0.14
2          Wings Joint  0.14
3      Supplement Shop  0.14
4               Museum  0.14


----Chatham----
                  venue  freq
0           Ga

In [71]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [74]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = chicago_grouped['Neighborhood']

for ind in np.arange(chicago_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(chicago_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albany Park,Ice Cream Shop,Donut Shop,Mexican Restaurant,Coffee Shop,Grocery Store,Outlet Store,Thrift / Vintage Store,Hot Dog Joint,Business Service,Chinese Restaurant
1,Armour Square,Chinese Restaurant,Grocery Store,Mobile Phone Shop,Storage Facility,Pizza Place,Mexican Restaurant,Cosmetics Shop,Pet Service,Outdoors & Recreation,Outlet Store
2,Bridgeport,Chinese Restaurant,Pizza Place,Bakery,Video Store,Mexican Restaurant,Ice Cream Shop,Tanning Salon,Dessert Shop,Smoke Shop,Italian Restaurant
3,Bucktown,Bar,Coffee Shop,Pizza Place,Dive Bar,Thai Restaurant,Park,Convenience Store,Cuban Restaurant,Dance Studio,Hot Dog Joint
4,Calumet Heights,Rental Car Location,Women's Store,Wings Joint,Supplement Shop,Museum,Deli / Bodega,Adult Boutique,Pakistani Restaurant,Park,Paper / Office Supplies Store


In [75]:
# set number of clusters
kclusters = 5

chicago_grouped_clustering = chicago_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chicago_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 0, 3, 3, 4, 0, 4, 3, 3])
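
To make the clustering step less of a black box, here is a minimal pure-Python sketch of the two alternating steps behind k-means (Lloyd's iteration): assign each point to its nearest centroid, then move each centroid to the mean of its members. This is not scikit-learn's implementation, which by default also uses k-means++ initialization; the function names, the fixed starting centroids, and the toy points are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, initial_centroids, max_iter=100):
    """Run Lloyd's iteration from the given starting centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 2-D points forming two obvious groups (not the venue frequency vectors).
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
toy_labels, toy_centroids = lloyd_kmeans(toy_points, [(0, 0), (10, 10)])
```

In the notebook, each point is a neighborhood's row of category frequencies, so two neighborhoods land in the same cluster when their venue mixes are close in this squared-distance sense.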

In [79]:
# add clustering labels (insert raises an error if the column already exists, so run this cell only once)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

chicago_merged = chicago_hoods

# join chicago_hoods with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
chicago_merged = chicago_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

chicago_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,long,lat,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albany Park,-87.709385,41.971108,3,Ice Cream Shop,Donut Shop,Mexican Restaurant,Coffee Shop,Grocery Store,Outlet Store,Thrift / Vintage Store,Hot Dog Joint,Business Service,Chinese Restaurant
1,Armour Square,-87.635136,41.843465,0,Chinese Restaurant,Grocery Store,Mobile Phone Shop,Storage Facility,Pizza Place,Mexican Restaurant,Cosmetics Shop,Pet Service,Outdoors & Recreation,Outlet Store
2,Bridgeport,-87.641031,41.841721,0,Chinese Restaurant,Pizza Place,Bakery,Video Store,Mexican Restaurant,Ice Cream Shop,Tanning Salon,Dessert Shop,Smoke Shop,Italian Restaurant
3,Bucktown,-87.682936,41.918861,3,Bar,Coffee Shop,Pizza Place,Dive Bar,Thai Restaurant,Park,Convenience Store,Cuban Restaurant,Dance Studio,Hot Dog Joint
4,Calumet Heights,-87.583301,41.726501,3,Rental Car Location,Women's Store,Wings Joint,Supplement Shop,Museum,Deli / Bodega,Adult Boutique,Pakistani Restaurant,Park,Paper / Office Supplies Store


In [82]:
# create map
map_clusters = folium.Map(location=[41.881832, -87.623177], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chicago_merged['lat'], chicago_merged['long'], chicago_merged['Neighborhood'], chicago_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [86]:
chicago_merged.loc[chicago_merged['Cluster Labels'] == 0, chicago_merged.columns[[0] + list(range(4, chicago_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Armour Square,Chinese Restaurant,Grocery Store,Mobile Phone Shop,Storage Facility,Pizza Place,Mexican Restaurant,Cosmetics Shop,Pet Service,Outdoors & Recreation,Outlet Store
2,Bridgeport,Chinese Restaurant,Pizza Place,Bakery,Video Store,Mexican Restaurant,Ice Cream Shop,Tanning Salon,Dessert Shop,Smoke Shop,Italian Restaurant
6,Chinatown,Chinese Restaurant,Asian Restaurant,Dessert Shop,Dim Sum Restaurant,Korean Restaurant,Bakery,Tea Room,Bubble Tea Shop,Ramen Restaurant,Flower Shop
12,Jefferson Park,Train Station,Filipino Restaurant,Donut Shop,Chinese Restaurant,Furniture / Home Store,Concert Hall,Currency Exchange,Convenience Store,Video Store,Ice Cream Shop


In [87]:
chicago_merged.loc[chicago_merged['Cluster Labels'] == 1, chicago_merged.columns[[0] + list(range(4, chicago_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,West Ridge,Indian Restaurant,Grocery Store,Pakistani Restaurant,Korean Restaurant,Boutique,Bookstore,Middle Eastern Restaurant,Fast Food Restaurant,Hookah Bar,Pharmacy


In [88]:
chicago_merged.loc[chicago_merged['Cluster Labels'] == 2, chicago_merged.columns[[0] + list(range(4, chicago_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Pullman,History Museum,Adult Boutique,Persian Restaurant,Other Great Outdoors,Other Repair Shop,Outdoor Supply Store,Outdoors & Recreation,Outlet Store,Pakistani Restaurant,Paper / Office Supplies Store


In [89]:
chicago_merged.loc[chicago_merged['Cluster Labels'] == 3, chicago_merged.columns[[0] + list(range(4, chicago_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albany Park,Ice Cream Shop,Donut Shop,Mexican Restaurant,Coffee Shop,Grocery Store,Outlet Store,Thrift / Vintage Store,Hot Dog Joint,Business Service,Chinese Restaurant
3,Bucktown,Bar,Coffee Shop,Pizza Place,Dive Bar,Thai Restaurant,Park,Convenience Store,Cuban Restaurant,Dance Studio,Hot Dog Joint
4,Calumet Heights,Rental Car Location,Women's Store,Wings Joint,Supplement Shop,Museum,Deli / Bodega,Adult Boutique,Pakistani Restaurant,Park,Paper / Office Supplies Store
8,East Village,Bar,Café,Yoga Studio,Sushi Restaurant,Pizza Place,Sandwich Place,Spa,Cocktail Bar,Italian Restaurant,Argentinian Restaurant
9,Edison Park,Theater,Liquor Store,Mexican Restaurant,Hot Dog Joint,Bar,Seafood Restaurant,Pizza Place,Salon / Barbershop,Italian Restaurant,Pub
10,Garfield Ridge,Bakery,Mexican Restaurant,Construction & Landscaping,Gym / Fitness Center,Park,Bar,Bank,Home Service,Pizza Place,Intersection
11,Irving Park,Bar,Coffee Shop,Diner,Pharmacy,Pizza Place,Brewery,Train Station,Tapas Restaurant,Bookstore,Bike Rental / Bike Share
13,Lake View,Bar,Café,Cosmetics Shop,Boutique,Mexican Restaurant,Clothing Store,Ice Cream Shop,Bank,Coffee Shop,Pizza Place
14,Lincoln Park,Coffee Shop,Music Venue,Park,Theater,Hot Dog Joint,Pizza Place,Currency Exchange,Donut Shop,Fried Chicken Joint,German Restaurant
15,Lincoln Square,Bar,Thai Restaurant,Pub,Indian Restaurant,Pizza Place,Gift Shop,Sushi Restaurant,Gym,Health & Beauty Service,Train Station


In [90]:
chicago_merged.loc[chicago_merged['Cluster Labels'] == 4, chicago_merged.columns[[0] + list(range(4, chicago_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Chatham,Gas Station,Café,Fast Food Restaurant,Cosmetics Shop,Park,Sporting Goods Shop,Gym / Fitness Center,Clothing Store,Sandwich Place,Chinese Restaurant
7,Dunning,Sports Bar,Playground,Park,Health & Beauty Service,Fast Food Restaurant,Adult Boutique,Persian Restaurant,Other Repair Shop,Outdoor Supply Store,Outdoors & Recreation
18,Mount Greenwood,Diner,Home Service,Park,Fast Food Restaurant,Other Repair Shop,Outdoor Supply Store,Outdoors & Recreation,Outlet Store,Pakistani Restaurant,Paper / Office Supplies Store
21,Norwood Park,Gas Station,Park,Pharmacy,Fast Food Restaurant,Bar,Mexican Restaurant,Chinese Restaurant,Discount Store,Bus Station,Flower Shop
