## Table of Contents
0. Overview
1. Download and Explore the Data  
2. Explore Neighborhoods in Chicago  
3. Analyze Each Neighborhood  
4. Cluster Neighborhoods  
5. Examine Clusters  
6. Conclusion

## 0. Overview

Suppose I am relocating to Chicago for work and need to decide which neighborhood to move to. The Wikipedia list used below names 246 neighborhoods, each assigned to a community area. I will base the choice on my main interests: a good selection of restaurants, music venues, and pool halls.

##### Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas 1.0 or later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')
#There are 5 tables on this page; we use match to make sure we get the table we want
table_Chicago = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago', match = 'Community area')

Libraries imported.


Let's take a quick look at the data.

## 1. Download and Explore the Data

In [16]:
df = table_Chicago[0]
df.head()

Unnamed: 0,Neighborhood,Community area
0,Albany Park,Albany Park
1,Altgeld Gardens,Riverdale
2,Andersonville,Edgewater
3,Archer Heights,Archer Heights
4,Armour Square,Armour Square


In [26]:
df.shape

(246, 2)

### Data Cleaning

#### There are 4 neighborhoods whose names are incorrect and need to be replaced

In [35]:
df['Neighborhood'] = df['Neighborhood'].replace(['Legends South (Robert Taylor Homes)'],'Legends South')

In [39]:
df['Neighborhood'] = df['Neighborhood'].replace(['Sheffield Neighbors'],'Sheffield Neighborhood')

In [43]:
df['Neighborhood'] = df['Neighborhood'].replace(['Sheridan Station Corridor'],'Fort Sheridan')

In [46]:
df['Neighborhood'] = df['Neighborhood'].replace(['Wrightwood Neighbors'],'Wrightwood Park')
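
The four replacements above can also be written as a single mapping. Below is a minimal sketch on a small stand-in dataframe; the names `renames` and `sample` are illustrative and not part of the original notebook. It relies on `Series.replace` accepting a dictionary of old-to-new values.

```python
import pandas as pd

# Old name -> replacement, copied from the four cells above.
renames = {
    'Legends South (Robert Taylor Homes)': 'Legends South',
    'Sheffield Neighbors': 'Sheffield Neighborhood',
    'Sheridan Station Corridor': 'Fort Sheridan',
    'Wrightwood Neighbors': 'Wrightwood Park',
}

# Illustrative stand-in for df; the real frame has 246 rows.
sample = pd.DataFrame(
    {'Neighborhood': ['Albany Park', 'Sheffield Neighbors', 'Wrightwood Neighbors']}
)

# Values that are not keys in the mapping are left untouched.
sample['Neighborhood'] = sample['Neighborhood'].replace(renames)
print(sample['Neighborhood'].tolist())
```

Keeping the renames in one dictionary makes it easier to review them together and to add more later.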

#### Use the geopy library to get the latitude and longitude values of Chicago

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>chicago_explorer</em>, as shown below.

In [61]:
address = 'Chicago,Il'

geolocator = Nominatim(user_agent="chicago_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chicago are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chicago are 41.8755616, -87.6244212.


#### Next we loop through all the rows and append the latitude and longitude of each neighborhood. This took considerable time, so I saved the result as a CSV file and read that file when needed.

In [47]:
geolocator = Nominatim(user_agent="chicago_explorer")
list_lat = []   # latitudes, one per neighborhood
list_long = []  # longitudes, one per neighborhood

# iterate over rows in the dataframe, counting from 1 for the progress output
for i, (index, row) in enumerate(df.iterrows(), start=1):
    Neighborhood = row['Neighborhood']
    State = "Il"
    query = str(Neighborhood) + ',' + str(State)

    location = geolocator.geocode(query)
    lat = location.latitude
    long = location.longitude
    print('The geographical coordinate of  {}, {}.'.format(lat, long), query, i)

    list_lat.append(lat)
    list_long.append(long)

# create new columns from lists
df['lat'] = list_lat
df['lon'] = list_long

The geographical coordinate of  41.9719367, -87.7161739. Albany Park,Il 1
The geographical coordinate of  41.6552585, -87.60958355351951. Altgeld Gardens,Il 2
The geographical coordinate of  32.1959947, -84.1399085. Andersonville,Il 3
The geographical coordinate of  41.8114215, -87.7261651. Archer Heights,Il 4
The geographical coordinate of  41.8400333, -87.633107. Armour Square,Il 5
The geographical coordinate of  39.0437192, -77.4874899. Ashburn,Il 6
The geographical coordinate of  39.0492678, -77.49332992444596. Ashburn Estates,Il 7
The geographical coordinate of  41.74338725, -87.6560415931265. Auburn Gresham,Il 8
The geographical coordinate of  41.7450346, -87.5886584. Avalon Park,Il 9
The geographical coordinate of  33.4354989, -112.3495572. Avondale,Il 10
The geographical coordinate of  -36.887916450000006, 174.68028486408917. Avondale Gardens,Il 11
The geographical coordinate of  41.8075332, -87.6661628. Back of the Yards,Il 12
The geographical coordinate of  -33.03678095, 151.6592516170581

In [48]:
df.head()

Unnamed: 0,Neighborhood,Community area,lat,lon
0,Albany Park,Albany Park,41.971937,-87.716174
1,Altgeld Gardens,Riverdale,41.655259,-87.609584
2,Andersonville,Edgewater,32.195995,-84.139909
3,Archer Heights,Archer Heights,41.811422,-87.726165
4,Armour Square,Armour Square,41.840033,-87.633107
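
Some of the coordinates above fall far from Chicago: Andersonville resolves to a latitude of about 32.2, and Avondale Gardens to a point in the southern hemisphere. This suggests the query matched places with the same name elsewhere. Below is a minimal sketch of one way to flag such rows. The bounding box values and the helper name `in_box` are assumptions for illustration, not an official city boundary and not part of the original notebook.

```python
# Rough bounding box around Chicago (an assumption for illustration only).
CHICAGO_BOX = {'lat_min': 41.6, 'lat_max': 42.1, 'lon_min': -88.0, 'lon_max': -87.5}

def in_box(lat, lon, box=CHICAGO_BOX):
    """Return True when (lat, lon) falls inside the bounding box."""
    return (box['lat_min'] <= lat <= box['lat_max']
            and box['lon_min'] <= lon <= box['lon_max'])

# Coordinates copied from the geocoding output above.
geocoded = {
    'Albany Park': (41.9719367, -87.7161739),
    'Altgeld Gardens': (41.6552585, -87.60958355351951),
    'Andersonville': (32.1959947, -84.1399085),
    'Ashburn': (39.0437192, -77.4874899),
    'Avondale Gardens': (-36.887916450000006, 174.68028486408917),
}

# Names whose geocoded point lies outside the box.
suspect = [name for name, (lat, lon) in geocoded.items() if not in_box(lat, lon)]
print(suspect)
```

Rows flagged this way could be geocoded again with a more specific query, for example one that includes the city name, or left out before clustering.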


In [55]:
df.to_csv('chi.csv', index=True)  # to_csv returns None when given a path, so there is nothing to assign

In [5]:
chi_data = pd.read_csv('chi.csv')  # read back the file saved above
chi_data.head()

Unnamed: 0,Neighborhood,Community area,lat,lon
0,Albany Park,Albany Park,41.971937,-87.716174
1,Altgeld Gardens,Riverdale,41.655259,-87.609584
2,Andersonville,Edgewater,32.195995,-84.139908
3,Archer Heights,Archer Heights,41.811422,-87.726165
4,Armour Square,Armour Square,41.840033,-87.633107


#### Create a map of Chicago with neighborhoods superimposed on top.

In [68]:
# create map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, community_area, neighborhood in zip(chi_data['lat'], chi_data['lon'], chi_data['Community area'], chi_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, community_area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='lime',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [7]:
CLIENT_ID = '0NZE4JJEZIWDDVBLRV1UNVICM4S5XLYZTFPXQQCKGQHQXNTW' # your Foursquare ID
CLIENT_SECRET = 'JM5B4Y52UWIDARSGMTDMX2GDEA02H1JKCN3EMSRKMGJB5YMI' # your Foursquare Secret
ACCESS_TOKEN = 'KPOTUFLJAGKKUMOPQLPQPE1LOLNNGPEZAE5QRTHXCHWAWM40' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0NZE4JJEZIWDDVBLRV1UNVICM4S5XLYZTFPXQQCKGQHQXNTW
CLIENT_SECRET: JM5B4Y52UWIDARSGMTDMX2GDEA02H1JKCN3EMSRKMGJB5YMI
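
Credentials written into a notebook are visible to anyone who reads it. One alternative, sketched below, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are hypothetical, chosen for this example.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names below are hypothetical; use whatever names you export.
    """
    names = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {'client_id': env[names[0]], 'client_secret': env[names[1]]}

# Usage with a stand-in mapping instead of the real environment.
creds = load_credentials({'FOURSQUARE_CLIENT_ID': 'my-id',
                          'FOURSQUARE_CLIENT_SECRET': 'my-secret'})
print('CLIENT_ID loaded:', bool(creds['client_id']))
```

Passing the mapping as an argument keeps the function easy to try out without touching the real environment.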


#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [71]:
chi_data.loc[0, 'Neighborhood']

'Albany Park'

Get the neighborhood's latitude and longitude values.

In [75]:
neighborhood_latitude = chi_data.loc[0, 'lat'] # neighborhood latitude value
neighborhood_longitude = chi_data.loc[0, 'lon'] # neighborhood longitude value

neighborhood_name = chi_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Albany Park are 41.9719367, -87.7161739.


#### Now, let's get the top 100 venues (restaurants, music venues, and pool halls) in Albany Park within a radius of 500 meters.

In [88]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4d4b7105d754a06374d81259,4bf58dd8d48988d1e3931735,4bf58dd8d48988d1e5931735&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?categoryId=4d4b7105d754a06374d81259,4bf58dd8d48988d1e3931735,4bf58dd8d48988d1e5931735&client_id=0NZE4JJEZIWDDVBLRV1UNVICM4S5XLYZTFPXQQCKGQHQXNTW&client_secret=JM5B4Y52UWIDARSGMTDMX2GDEA02H1JKCN3EMSRKMGJB5YMI&v=20180604&ll=41.9719367,-87.7161739&radius=500&limit=100'
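
The URL above packs seven query parameters into one format string. The sketch below assembles the same URL from a dictionary, which makes each parameter easier to see. The helper name `build_explore_url` and the placeholder credentials are illustrative and not part of the original notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit, category_ids):
    """Assemble the explore URL from named parameters (helper name is illustrative)."""
    params = {
        'categoryId': ','.join(category_ids),
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the commas in categoryId and ll unescaped, matching the URL above.
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

example_url = build_explore_url(
    'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180604',
    41.9719367, -87.7161739, 500, 100,
    ['4d4b7105d754a06374d81259', '4bf58dd8d48988d1e3931735', '4bf58dd8d48988d1e5931735'],
)
print(example_url)
```

With `requests`, a dictionary like this can also be passed as `params=` to `requests.get`, which handles the encoding itself.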

Send the GET request and examine the results.

In [86]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fcb9b6ce02006393a011484'},
 'response': {'headerLocation': 'Albany Park',
  'headerFullLocation': 'Albany Park, Chicago',
  'headerLocationGranularity': 'neighborhood',
  'query': 'food',
  'totalResults': 11,
  'suggestedBounds': {'ne': {'lat': 41.9764367045, 'lng': -87.71013251631616},
   'sw': {'lat': 41.9674366955, 'lng': -87.72221528368384}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c6ecc3d06ed6dcbaecca722',
       'name': 'Peking Mandarin Resturant',
       'location': {'address': '3459 W Lawrence Ave',
        'crossStreet': 'at St Louis Ave',
        'lat': 41.96829199927995,
        'lng': -87.71578270440223,
        'labeledLatLngs': [{'label': 'display',
          'lat': 41.96829199927995,
          'lng': -87.7157

We know that all the information is in the _items_ key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [89]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # the flattened dataframe uses the dotted column name instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [92]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Peking Mandarin Resturant,Chinese Restaurant,41.968292,-87.715783
1,Popeyes Louisiana Kitchen,Fried Chicken Joint,41.968756,-87.713019
2,Banpojung,Korean Restaurant,41.975707,-87.715609
3,Subway,Sandwich Place,41.968748,-87.712861
4,Dunkin',Donut Shop,41.968255,-87.712964
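
The dotted column names such as `venue.location.lat` come from flattening the nested JSON. Below is a minimal sketch on a single item whose name and coordinates are taken from the outputs above. The shape of the `categories` entry is an assumption, consistent with how `get_category_type` reads it.

```python
import pandas as pd

# One item shaped like the entries in results['response']['groups'][0]['items'],
# trimmed to the fields used here. The categories list is an assumed shape.
items = [{
    'venue': {
        'name': 'Peking Mandarin Resturant',
        'location': {'lat': 41.96829199927995, 'lng': -87.71578270440223},
        'categories': [{'name': 'Chinese Restaurant'}],
    }
}]

flat = pd.json_normalize(items)

# Nested dictionaries become dotted column names; lists are kept as single values.
print(sorted(flat.columns))
print(flat.loc[0, 'venue.categories'][0]['name'])
```

Because the `categories` list survives flattening as one cell, a helper is still needed to pull the first category name out of it.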


And how many venues were returned by Foursquare?

In [93]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

11 venues were returned by Foursquare.


## 2. Explore Neighborhoods in Chicago

#### Let's create a function to repeat the same process for all the neighborhoods in Chicago

In [103]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4d4b7105d754a06374d81259,4bf58dd8d48988d1e3931735,4bf58dd8d48988d1e5931735&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
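
The function above indexes `v['venue']['categories'][0]`, which raises an `IndexError` if a venue has an empty category list, the case `get_category_type` guards against. Below is a hedged sketch of the row-building step that tolerates this. The helper name `venue_rows` and the sample items are made up for illustration.

```python
def venue_rows(name, lat, lng, items):
    """Build one output row per venue, tolerating venues with no category.

    `items` is assumed to have the same shape as response['groups'][0]['items'].
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Two made-up items: one with a category, one without.
sample_items = [
    {'venue': {'name': 'A', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Diner'}]}},
    {'venue': {'name': 'B', 'location': {'lat': 3.0, 'lng': 4.0},
               'categories': []}},
]
rows = venue_rows('Albany Park', 41.971937, -87.716174, sample_items)
print(rows)
```

Separating row building from the HTTP request also makes this step testable without calling the API.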

In [104]:
chi_venues = getNearbyVenues(names=chi_data['Neighborhood'],
                                   latitudes=chi_data['lat'],
                                   longitudes=chi_data['lon']
                                  )

Albany Park
Altgeld Gardens
Andersonville
Archer Heights
Armour Square
Ashburn
Ashburn Estates
Auburn Gresham
Avalon Park
Avondale
Avondale Gardens
Back of the Yards
Belmont Central
Belmont Gardens
Belmont Heights
Belmont Terrace
Beverly
Beverly View
Beverly Woods
Big Oaks
Boystown
Bowmanville
Brainerd
Brickyard
Bridgeport
Brighton Park
Bronzeville
Bucktown
Budlong Woods
Buena Park
Burnside
Cabrini–Green
Calumet Heights
Canaryville
Central Station
Chatham
Chicago Lawn
Chinatown
Chrysler Village
Clarendon Park
Clearing East
Clearing West
Cottage Grove Heights
Cragin
Crestline
Dearborn Homes
Dearborn Park
Douglas Park
Dunning
East Beverly
East Chatham
East Garfield Park
East Hyde Park
East Pilsen
East Side
East Village
Eden Green
Edgebrook
Edgewater
Edgewater Beach
Edgewater Glen
Edison Park
Englewood
Fernwood
Fifth City
Ford City
Forest Glen
Fuller Park
Fulton River District
Gage Park
Galewood
The Gap
Garfield Ridge
Gladstone Park
Gold Coast
Golden Gate
Goose Island
Graceland West
Grand

#### Let's check the size of the resulting dataframe

In [105]:
print(chi_venues.shape)
chi_venues.head()

(3255, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Albany Park,41.971937,-87.716174,Peking Mandarin Resturant,41.968292,-87.715783,Chinese Restaurant
1,Albany Park,41.971937,-87.716174,Popeyes Louisiana Kitchen,41.968756,-87.713019,Fried Chicken Joint
2,Albany Park,41.971937,-87.716174,Banpojung,41.975707,-87.715609,Korean Restaurant
3,Albany Park,41.971937,-87.716174,Subway,41.968748,-87.712861,Sandwich Place
4,Albany Park,41.971937,-87.716174,Dunkin',41.968255,-87.712964,Donut Shop


Let's check how many venues were returned for each neighborhood.

In [106]:
chi_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Albany Park,11,11,11,11,11,11
Altgeld Gardens,2,2,2,2,2,2
Andersonville,2,2,2,2,2,2
Archer Heights,13,13,13,13,13,13
Armour Square,16,16,16,16,16,16
Ashburn,5,5,5,5,5,5
Auburn Gresham,1,1,1,1,1,1
Avalon Park,11,11,11,11,11,11
Avondale,6,6,6,6,6,6
Avondale Gardens,4,4,4,4,4,4


Let's find out how many unique categories appear among all the returned venues.

In [107]:
print('There are {} unique categories.'.format(len(chi_venues['Venue Category'].unique())))

There are 121 unique categories.


## 3. Analyze Each Neighborhood

To analyze the data, we use one-hot encoding to convert the categorical venue categories into numerical columns that machine learning algorithms can work with.

In [108]:
# one hot encoding
chi_onehot = pd.get_dummies(chi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
chi_onehot['Neighborhood'] = chi_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [chi_onehot.columns[-1]] + list(chi_onehot.columns[:-1])
chi_onehot = chi_onehot[fixed_columns]

chi_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cha Chaan Teng,Chinese Restaurant,Comfort Food Restaurant,Creperie,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Hawaiian Restaurant,Hot Dog Joint,Hotpot Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Peking Duck Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Poke Place,Portuguese Restaurant,Poutine Place,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Salvadoran Restaurant,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Steakhouse,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Albany Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Albany Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Albany Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Albany Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Albany Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [109]:
chi_onehot.shape

(3255, 122)

#### Next, let's group rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category

In [110]:
chi_grouped = chi_onehot.groupby('Neighborhood').mean().reset_index()
chi_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cha Chaan Teng,Chinese Restaurant,Comfort Food Restaurant,Creperie,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Hawaiian Restaurant,Hot Dog Joint,Hotpot Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Peking Duck Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Poke Place,Portuguese Restaurant,Poutine Place,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Salvadoran Restaurant,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Steakhouse,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Albany Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Altgeld Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Andersonville,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Archer Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.384615,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923
4,Armour Square,0.0,0.0,0.0625,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Ashburn,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Auburn Gresham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Avalon Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Avondale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Avondale Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [111]:
chi_grouped.shape

(192, 122)
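
To see why the mean of the one-hot columns is a frequency, here is a small sketch on made-up data; the two neighborhoods and five venues are invented for illustration. Each venue contributes a 1 in exactly one category column, so the mean within a neighborhood is that category's share of its venues.

```python
import pandas as pd

# Made-up venues for two neighborhoods.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Venue Category': ['Pizza Place', 'Pizza Place', 'Diner', 'Bakery', 'Diner'],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)

# Averaging the indicators within a neighborhood gives each category's share of its venues.
freq = onehot.groupby(venues['Neighborhood']).mean()
print(freq)
```

Each row of `freq` sums to 1, which is why a neighborhood with a single venue, such as Auburn Gresham above, shows a frequency of 1.0 in one column.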

#### Let's print each neighborhood along with the top 5 most common venues

In [112]:
num_top_venues = 5

for hood in chi_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = chi_grouped[chi_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albany Park----
                 venue  freq
0   Mexican Restaurant  0.18
1   Chinese Restaurant  0.18
2       Sandwich Place  0.09
3                Diner  0.09
4  Fried Chicken Joint  0.09


----Altgeld Gardens----
               venue  freq
0               Food   1.0
1  Afghan Restaurant   0.0
2        Salad Place   0.0
3         Restaurant   0.0
4   Ramen Restaurant   0.0


----Andersonville----
                     venue  freq
0      American Restaurant   0.5
1             Burger Joint   0.5
2        Afghan Restaurant   0.0
3  New American Restaurant   0.0
4       Russian Restaurant   0.0


----Archer Heights----
                venue  freq
0  Mexican Restaurant  0.38
1         Wings Joint  0.08
2              Bakery  0.08
3       Hot Dog Joint  0.08
4                Food  0.08


----Armour Square----
                venue  freq
0  Chinese Restaurant  0.25
1              Bakery  0.12
2    Asian Restaurant  0.12
3      Sandwich Place  0.06
4                Café  0.06


----Ashbu

#### Let's put that into a _pandas_ dataframe

First, let's write a function to sort the venues in descending order.

In [113]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [114]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd' every ordinal up to 10 takes 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = chi_grouped['Neighborhood']

for ind in np.arange(chi_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(chi_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albany Park,Chinese Restaurant,Mexican Restaurant,Taco Place,Donut Shop,Diner,Sandwich Place,Fried Chicken Joint,Korean Restaurant,Latin American Restaurant,Falafel Restaurant
1,Altgeld Gardens,Food,Indonesian Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
2,Andersonville,American Restaurant,Burger Joint,Wings Joint,Food,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
3,Archer Heights,Mexican Restaurant,Bakery,Deli / Bodega,Sandwich Place,Chinese Restaurant,Food,Wings Joint,Hot Dog Joint,Italian Restaurant,Gluten-free Restaurant
4,Armour Square,Chinese Restaurant,Asian Restaurant,Bakery,Café,Indian Restaurant,Fast Food Restaurant,Sandwich Place,Food,Hot Dog Joint,Italian Restaurant


## 4. Cluster Neighborhoods

Run _k_-means to cluster the neighborhoods into 5 clusters.

In [116]:
# set number of clusters
kclusters = 5

chi_grouped_clustering = chi_grouped.drop(columns='Neighborhood')  # keyword form; the positional axis argument is not accepted by recent pandas

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chi_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 4, 0, 2, 2, 2, 2, 2, 3])
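
The ten labels printed above show only the start of `kmeans.labels_`. To see how many rows land in each cluster, the labels can be tallied with `np.bincount`. The sketch below is illustrative only: it uses just the ten labels shown in the output (the name `first_ten_labels` is a placeholder), so the counts are not the real cluster sizes.

```python
import numpy as np

# Only the ten labels printed above; the full kmeans.labels_ array is longer
first_ten_labels = np.array([2, 1, 4, 0, 2, 2, 2, 2, 2, 3])

# Count how many rows fall into each of the 5 clusters (index = cluster label)
cluster_sizes = np.bincount(first_ten_labels, minlength=5)

for label, size in enumerate(cluster_sizes):
    print('Cluster label {}: {} rows'.format(label, size))
```

On the full array, `np.bincount(kmeans.labels_, minlength=kclusters)` would give the size of every cluster at once.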

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [132]:
#neighborhoods_venues_sorted.drop('Cluster Labels',axis=1,inplace=True)
#chi_merged.head()


In [133]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

chi_merged = chi_data

# join the cluster labels and top venues onto chi_data, which holds latitude/longitude for each neighborhood
chi_merged = chi_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

chi_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Community area,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albany Park,Albany Park,41.971937,-87.716174,2.0,Chinese Restaurant,Mexican Restaurant,Taco Place,Donut Shop,Diner,Sandwich Place,Fried Chicken Joint,Korean Restaurant,Latin American Restaurant,Falafel Restaurant
1,Altgeld Gardens,Riverdale,41.655259,-87.609584,1.0,Food,Indonesian Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
2,Andersonville,Edgewater,32.195995,-84.139908,4.0,American Restaurant,Burger Joint,Wings Joint,Food,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
3,Archer Heights,Archer Heights,41.811422,-87.726165,0.0,Mexican Restaurant,Bakery,Deli / Bodega,Sandwich Place,Chinese Restaurant,Food,Wings Joint,Hot Dog Joint,Italian Restaurant,Gluten-free Restaurant
4,Armour Square,Armour Square,41.840033,-87.633107,2.0,Chinese Restaurant,Asian Restaurant,Bakery,Café,Indian Restaurant,Fast Food Restaurant,Sandwich Place,Food,Hot Dog Joint,Italian Restaurant


In [135]:
print(chi_merged.dtypes)

Neighborhood               object
Community area             object
lat                       float64
lon                       float64
Cluster Labels            float64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object


In [142]:
chi_merged['Cluster Labels'] = np.int32(chi_merged['Cluster Labels'])

chi_merged.dtypes

Neighborhood               object
Community area             object
lat                       float64
lon                       float64
Cluster Labels              int32
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object
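
`Cluster Labels` came back from the join as float64, which usually means some rows of `chi_data` had no match in `neighborhoods_venues_sorted` and were filled with NaN. Casting NaN to an integer with `np.int32` is not well defined, so the values those rows end up with can vary; this may be why rows with empty venue columns are listed under label 0 later on. Below is a hedged sketch of two alternatives, using made-up toy frames (`chi_data_demo` and `labels_demo` are placeholders, not the notebook's data).

```python
import pandas as pd

# Hypothetical toy data: three neighborhoods, but only two received a cluster label
chi_data_demo = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'lat': [41.9, 41.8, 41.7],
})
labels_demo = pd.DataFrame({
    'Neighborhood': ['A', 'C'],
    'Cluster Labels': [2, 0],
})

merged_demo = chi_data_demo.join(labels_demo.set_index('Neighborhood'), on='Neighborhood')

# The unmatched row ('B') gets NaN, which forces the column to float64
print(merged_demo['Cluster Labels'].dtype)

# Option 1: keep every row and mark missing labels with a nullable integer dtype
nullable_labels = merged_demo['Cluster Labels'].astype('Int64')

# Option 2: drop rows without a label, then cast to a plain integer
labelled_only = merged_demo.dropna(subset=['Cluster Labels']).copy()
labelled_only['Cluster Labels'] = labelled_only['Cluster Labels'].astype(int)
```

With option 2, neighborhoods that returned no venues never reach the map or the per-cluster listings, so they cannot be mistaken for members of a cluster.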

Finally, let's visualize the resulting clusters on a map.

In [158]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chi_merged['lat'], chi_merged['lon'], chi_merged['Neighborhood'], chi_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

#### Cluster 1

In [144]:
chi_merged.loc[chi_merged['Cluster Labels'] == 0, chi_merged.columns[[1] + list(range(5, chi_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Archer Heights,Mexican Restaurant,Bakery,Deli / Bodega,Sandwich Place,Chinese Restaurant,Food,Wings Joint,Hot Dog Joint,Italian Restaurant,Gluten-free Restaurant
6,Ashburn,,,,,,,,,,
11,New City,Mexican Restaurant,Pizza Place,Food,Chinese Restaurant,Fast Food Restaurant,Food Court,Fried Chicken Joint,Bakery,Deli / Bodega,Empanada Restaurant
14,Dunning,,,,,,,,,,
18,Morgan Park,,,,,,,,,,
19,Norwood Park,,,,,,,,,,
23,Belmont Cragin,,,,,,,,,,
30,Burnside,,,,,,,,,,
35,Chatham,,,,,,,,,,
38,Clearing,,,,,,,,,,


#### Cluster 2

In [145]:
chi_merged.loc[chi_merged['Cluster Labels'] == 1, chi_merged.columns[[1] + list(range(5, chi_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Riverdale,Food,Indonesian Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
32,Calumet Heights,Food,Deli / Bodega,Indian Chinese Restaurant,Hotpot Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant
78,Grand Boulevard,Food,Breakfast Spot,Pizza Place,Restaurant,Hawaiian Restaurant,Filipino Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
83,Douglas,Food,Indonesian Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
108,Lake View,Food,Diner,Indian Chinese Restaurant,Hotpot Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant
118,South Lawndale,Food,Indonesian Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
136,The Loop,Food,Indonesian Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
146,North Lawndale,Food,BBQ Joint,Diner,Wings Joint,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
156,Norwood Park,Food,Indonesian Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
183,Riverdale,Food,Food Truck,Comfort Food Restaurant,Fondue Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant


#### Cluster 3

In [146]:
chi_merged.loc[chi_merged['Cluster Labels'] == 2, chi_merged.columns[[1] + list(range(5, chi_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albany Park,Chinese Restaurant,Mexican Restaurant,Taco Place,Donut Shop,Diner,Sandwich Place,Fried Chicken Joint,Korean Restaurant,Latin American Restaurant,Falafel Restaurant
4,Armour Square,Chinese Restaurant,Asian Restaurant,Bakery,Café,Indian Restaurant,Fast Food Restaurant,Sandwich Place,Food,Hot Dog Joint,Italian Restaurant
5,Ashburn,Chinese Restaurant,Italian Restaurant,Deli / Bodega,Food Truck,BBQ Joint,Wings Joint,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
7,Auburn Gresham,Fast Food Restaurant,Wings Joint,Indonesian Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
8,Avalon Park,Burger Joint,Pizza Place,Fast Food Restaurant,Fish & Chips Shop,Diner,Food,Sandwich Place,Cajun / Creole Restaurant,Falafel Restaurant,Filipino Restaurant
9,Avondale,Food,Mexican Restaurant,Bakery,Burger Joint,Fish & Chips Shop,Fondue Restaurant,Filipino Restaurant,Fast Food Restaurant,Wings Joint,Food Court
12,Belmont Cragin,Café,Sandwich Place,Asian Restaurant,Noodle House,Fast Food Restaurant,Wings Joint,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant
13,Hermosa,Café,Thai Restaurant,Sandwich Place,Seafood Restaurant,Bakery,Mediterranean Restaurant,Pizza Place,Restaurant,French Restaurant,Dumpling Restaurant
15,Dunning,American Restaurant,Restaurant,Burger Joint,Fried Chicken Joint,Japanese Restaurant,Fast Food Restaurant,Mexican Restaurant,New American Restaurant,Café,Greek Restaurant
16,Beverly,Pizza Place,Donut Shop,Fast Food Restaurant,Hot Dog Joint,Food,Sandwich Place,Poke Place,Breakfast Spot,English Restaurant,Falafel Restaurant


#### Cluster 4

In [147]:
chi_merged.loc[chi_merged['Cluster Labels'] == 3, chi_merged.columns[[1] + list(range(5, chi_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Irving Park,Café,Sushi Restaurant,Fish & Chips Shop,Food,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
24,Bridgeport,Pizza Place,Café,Wings Joint,Fondue Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
73,Jefferson Park,Café,Diner,Wings Joint,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
74,Near North Side,Café,Indian Restaurant,Food Truck,New American Restaurant,Fondue Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant
114,Lincoln Park,Food,Café,Swiss Restaurant,Wings Joint,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
158,Lincoln Park,Café,Wings Joint,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
161,Greater Grand Crossing,Café,American Restaurant,BBQ Joint,Wings Joint,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
171,Douglas,Food,BBQ Joint,Café,Wings Joint,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
186,West Ridge,Café,Wings Joint,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
189,North Center,Café,Restaurant,Food Truck,Cafeteria,Falafel Restaurant,Fondue Restaurant,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Wings Joint


#### Cluster 5

In [148]:
chi_merged.loc[chi_merged['Cluster Labels'] == 4, chi_merged.columns[[1] + list(range(5, chi_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Edgewater,American Restaurant,Burger Joint,Wings Joint,Food,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
48,Dunning,Deli / Bodega,American Restaurant,Wings Joint,Food,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
104,Kenwood,American Restaurant,Wings Joint,Food,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
201,Douglas,Deli / Bodega,American Restaurant,Wings Joint,Food,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant
242,Woodlawn,American Restaurant,Wings Joint,Food,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant


## 6. Conclusion

##### Cluster 1 was the biggest, but most of its neighborhoods had no venues that I was interested in (NaNs).
##### Cluster 3 was the second biggest and had a very nice variety of restaurants and other food venues.
##### Clusters 2, 4 and 5 were all rather small and did not have much variety in food venues.
##### Very few, if any, of the clusters had music venues or pool halls in the top 10.
##### Cluster 3 is my choice for an area to start looking into the real estate market to find a place to live!
##### I am disappointed, though, that there are not more music venues and pool halls.


I also ran a sub-search that excluded the food venues and looked only at music venues and pool halls. There was some overlap, because many music venues and pool halls also serve food and so had already shown up in the food categories. But, as the clusters above suggest, music venues and pool halls are so few and far between that I do not think they make a good criterion for a search of their own; they would limit the choice of neighborhoods too much.

In [1]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4bf58dd8d48988d1e3931735,4bf58dd8d48988d1e5931735&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
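
The chained indexing in `getNearbyVenues` assumes every response contains `response`, `groups`, and at least one category per venue; a request that returns an error or an empty result would raise a `KeyError` or `IndexError`. The helpers below are a sketch of a more defensive way to read the same structure. The names `extract_items` and `first_category_name` are hypothetical, and the payload shape is inferred only from the indexing used above, not from the Foursquare documentation.

```python
def extract_items(payload):
    """Return the venue items from an explore-style response, or [] if any level is missing."""
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])


def first_category_name(item, default='Unknown'):
    """Return the first category name of a venue item, or a default if it has none."""
    categories = item.get('venue', {}).get('categories', [])
    return categories[0]['name'] if categories else default


# Hypothetical payloads shaped like the structure indexed in getNearbyVenues
good_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Hall', 'categories': [{'name': 'Pool Hall'}]}},
    {'venue': {'name': 'No Category Venue', 'categories': []}},
]}]}}
empty_payload = {'response': {}}

good_names = [first_category_name(item) for item in extract_items(good_payload)]
empty_names = [first_category_name(item) for item in extract_items(empty_payload)]
print(good_names, empty_names)
```

Inside the loop, `results = extract_items(requests.get(url).json())` would then let a neighborhood with no matching venues contribute an empty list instead of stopping the run.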

In [8]:
chi_venues = getNearbyVenues(names=chi_data['Neighborhood'],
                                   latitudes=chi_data['lat'],
                                   longitudes=chi_data['lon']
                                  )

Albany Park
Altgeld Gardens
Andersonville
Archer Heights
Armour Square
Ashburn
Ashburn Estates
Auburn Gresham
Avalon Park
Avondale
Avondale Gardens
Back of the Yards
Belmont Central
Belmont Gardens
Belmont Heights
Belmont Terrace
Beverly
Beverly View
Beverly Woods
Big Oaks
Boystown
Bowmanville
Brainerd
Brickyard
Bridgeport
Brighton Park
Bronzeville
Bucktown
Budlong Woods
Buena Park
Burnside
Cabrini–Green
Calumet Heights
Canaryville
Central Station
Chatham
Chicago Lawn
Chinatown
Chrysler Village
Clarendon Park
Clearing East
Clearing West
Cottage Grove Heights
Cragin
Crestline
Dearborn Homes
Dearborn Park
Douglas Park
Dunning
East Beverly
East Chatham
East Garfield Park
East Hyde Park
East Pilsen
East Side
East Village
Eden Green
Edgebrook
Edgewater
Edgewater Beach
Edgewater Glen
Edison Park
Englewood
Fernwood
Fifth City
Ford City
Forest Glen
Fuller Park
Fulton River District
Gage Park
Galewood
The Gap
Garfield Ridge
Gladstone Park
Gold Coast
Golden Gate
Goose Island
Graceland West
Grand

In [9]:
chi_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Ashburn,1,1,1,1,1,1
Belmont Gardens,2,2,2,2,2,2
Belmont Terrace,13,13,13,13,13,13
Beverly,2,2,2,2,2,2
Bowmanville,1,1,1,1,1,1
Boystown,7,7,7,7,7,7
Brainerd,2,2,2,2,2,2
Bridgeport,1,1,1,1,1,1
Cabrini–Green,2,2,2,2,2,2
Canaryville,1,1,1,1,1,1


In [10]:
print('There are {} unique categories.'.format(len(chi_venues['Venue Category'].unique())))

There are 37 unique categories.


In [11]:
# one hot encoding
chi_onehot = pd.get_dummies(chi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
chi_onehot['Neighborhood'] = chi_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [chi_onehot.columns[-1]] + list(chi_onehot.columns[:-1])
chi_onehot = chi_onehot[fixed_columns]

chi_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Arcade,BBQ Joint,Bar,Boat or Ferry,Building,Café,Cajun / Creole Restaurant,Coffee Shop,Convention Center,Coworking Space,Dive Bar,Farmers Market,Historic Site,Indian Restaurant,Jazz Club,Jewelry Store,Lounge,Mediterranean Restaurant,Music School,Music Store,Music Venue,Nightclub,Office,Park,Performing Arts Venue,Piano Bar,Pool Hall,Radio Station,Recording Studio,Restaurant,Rock Club,Southern / Soul Food Restaurant,Speakeasy,Sports Bar,Tapas Restaurant,Theater
0,Ashburn,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Belmont Gardens,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Belmont Gardens,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,Belmont Terrace,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Belmont Terrace,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
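
To see at a glance which neighborhoods have any music venues or pool halls, the one-hot rows can be summed per neighborhood for just those two columns. The sketch below rebuilds only the five rows shown above (the names `venues_demo`, `onehot_demo`, and `counts_demo` are placeholders), so it illustrates the idea rather than reproducing the full result.

```python
import pandas as pd

# The five (neighborhood, category) pairs visible in the output above
venues_demo = pd.DataFrame({
    'Neighborhood': ['Ashburn', 'Belmont Gardens', 'Belmont Gardens',
                     'Belmont Terrace', 'Belmont Terrace'],
    'Venue Category': ['Music Venue', 'Music Venue', 'Pool Hall',
                       'Historic Site', 'Music Venue'],
})

# Same one-hot encoding as in the notebook, with Neighborhood as the first column
onehot_demo = pd.get_dummies(venues_demo[['Venue Category']], prefix="", prefix_sep="")
onehot_demo.insert(0, 'Neighborhood', venues_demo['Neighborhood'])

# Count only the two categories of interest per neighborhood
wanted = ['Music Venue', 'Pool Hall']
counts_demo = onehot_demo.groupby('Neighborhood')[wanted].sum()
counts_demo['Total'] = counts_demo.sum(axis=1)

print(counts_demo.sort_values('Total', ascending=False))
```

Applied to the full `chi_onehot` frame, the same grouping would rank every neighborhood by how many of these venues were found nearby.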
