# Exploring Los Angeles County Cities: Identifying 'Quiet' Areas

## 1. Discussion and Background of the Business Problem:

### 1.1 Problem Statement: Prospects of Relatively 'Quiet' Cities in the Surrounding Los Angeles Area

Los Angeles is the largest city in the state of California. With an estimated population of nearly four million people, it is difficult to find 'quiet' places that lack cultural noise.

### 1.2 Target Audience

Which clients or groups of people would be interested in this project?

- Business Owners
- Prospective Home Owners
- Seismic Sensor Installers

### 1.3 Data Description

Data will be collected from the following sources:
- Los Angeles County city data containing **city name** and **population**.<br>
     *Data Source:* [Wikipedia](https://en.wikipedia.org/wiki/List_of_cities_in_Los_Angeles_County,_California)
- Coordinate data for the cities in Los Angeles County, including **latitude** and **longitude**.<br>
     *Data Source:* The Geocoder Python library
- The number of retail establishments near each city.<br>
     *Data Source:* Foursquare API

### 1.4 Approach

The approach to identifying the quietest areas in Los Angeles County is as follows:

- Collect Los Angeles County city data from Wikipedia.
- Use the Geocoder Python library to collect city coordinate data.
- Use the Foursquare API to get the approximate number of retail establishments in each city.
- Visualize the data and perform some statistical analysis.
- Analyze the data using clustering (K-Means).
- Find the best value for K.
- Visualize the cities with the highest population density.
- Visualize the cities with the highest density of retail establishments.
- From results, infer which areas would be the quietest and draw conclusions.

## 2. Data Preparation:

### 2.1 Scraping Los Angeles County Cities from Wikipedia

In [3]:
# https://towardsdatascience.com/exploring-the-tokyo-neighborhoods-data-science-in-real-life-8b6c2454ca16
# https://towardsdatascience.com/classification-of-moscow-metro-stations-using-foursquare-data-fb8aad3e0e4
# https://ruddra.com/posts/project-battle-of-capstones/

import pandas as pd

import numpy as np

# A List of Cities in Los Angeles County, California
wiki_link = "https://en.wikipedia.org/wiki/List_of_cities_in_Los_Angeles_County,_California"

Pandas provides a function for reading HTML tables directly into DataFrames.

In [4]:
# Read the wikipedia tables into dataframes
dfs = pd.read_html(wiki_link)
# the first DataFrame contains the city data
df = dfs[0]
df.columns

Index(['City', 'Date incorporated', 'Population as of(2010 Census)'], dtype='object')

### 2.2 Clean the City Data

Now that the data is obtained, we must clean it up a bit. The date the city was incorporated will be of no use to us during analysis. We should also rename the Population column so it is more easily accessed.  

In [5]:
# Date Incorporated is useless
df = df.drop(['Date incorporated'], axis=1)

# Rename Population Column
df = df.rename(columns={'Population as of(2010 Census)':'Population'})
df.head()

Unnamed: 0,City,Population
0,Agoura Hills,20330
1,Alhambra,83653
2,Arcadia,56364
3,Artesia,16522
4,Avalon,3728
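The population column above already looks numeric, but scraped tables sometimes arrive as text with thousands separators or footnote markers. The sketch below shows one way to guard against that. The raw strings are hypothetical examples, not values taken from the Wikipedia page.

```python
import pandas as pd

# Hypothetical raw values; the real table may already parse as integers.
raw = pd.DataFrame({
    'City': ['Agoura Hills', 'Alhambra', 'Avalon'],
    'Population': ['20,330', '83,653[1]', '3728'],
})

# Drop bracketed footnote markers and thousands separators, then convert.
cleaned = (raw['Population']
           .astype(str)
           .str.replace(r'\[.*?\]', '', regex=True)
           .str.replace(',', '', regex=False)
           .str.strip())

# errors='coerce' turns anything unparseable into a missing value
raw['Population'] = pd.to_numeric(cleaned, errors='coerce').astype('Int64')
print(raw)
```

Values that cannot be parsed become missing rather than raising, so they can be inspected before the analysis continues.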


In [30]:
df.describe()

Unnamed: 0,Population,Latitude,Longitude
count,88.0,88.0,88.0
mean,100128.7,34.014636,-118.179643
std,403102.1,0.174567,0.224828
min,112.0,33.34411,-118.81875
25%,20213.5,33.906053,-118.343337
50%,39767.0,34.019855,-118.150575
75%,80949.25,34.108705,-118.042162
max,3792621.0,34.6989,-117.7164


### 2.3 Obtaining Coordinate Data

In [7]:
import geocoder

We will use the ArcGIS provider, as it seems to be the most reliable.

In [8]:
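def get_lat_lon(city, max_tries=5):
    print('Trying to get coordinates for {}'.format(city))
    geo_str = '{}, Los Angeles County, California'.format(city)
    lat_lon = None
    # retry a bounded number of times so a persistent failure cannot loop forever
    for _ in range(max_tries):
        lat_lon = geocoder.arcgis(geo_str).latlng
        if lat_lon:
            break
    if not lat_lon:
        raise RuntimeError('Could not get coordinates for {}'.format(city))
    print('Successfully got coordinates for {}'.format(city))
    return lat_lon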

Use the applymap function to pass each city to the get_lat_lon function. Please be patient: obtaining all the coordinates may take well over a minute.

In [9]:
lat_lon_df = df[['City']].applymap(get_lat_lon)

Trying to get coordinates for Agoura Hills
Successfully got coordinates for Agoura Hills
Trying to get coordinates for Alhambra
Successfully got coordinates for Alhambra
Trying to get coordinates for Arcadia
Successfully got coordinates for Arcadia
Trying to get coordinates for Artesia
Successfully got coordinates for Artesia
Trying to get coordinates for Avalon
Successfully got coordinates for Avalon
Trying to get coordinates for Azusa
Successfully got coordinates for Azusa
Trying to get coordinates for Baldwin Park
Successfully got coordinates for Baldwin Park
Trying to get coordinates for Bell
Successfully got coordinates for Bell
Trying to get coordinates for Bell Gardens
Successfully got coordinates for Bell Gardens
Trying to get coordinates for Bellflower
Successfully got coordinates for Bellflower
Trying to get coordinates for Beverly Hills
Successfully got coordinates for Beverly Hills
Trying to get coordinates for Bradbury
Successfully got coordinates for Bradbury
Trying to get coordinates for Burbank
Successfully got coordinates for Burbank
Trying to get coordinates for Calabasas
Successfully got coordinates for Calabasas
Trying to get coordinates for Carson
Successfully got coordinates for Carson
Trying to get coordinates for Cerritos
Successfully got coordinates for Cerritos
Trying to get coordinates for Claremont
Successfully got coordinates for Claremont
Trying to get coordinates for Commerce
Successfully got coordinates for Commerce
Trying to get coordinates for Compton
Successfully got coordinates for Compton
Trying to get coordinates for Covina
Successfully got coordinates for Covina
Trying to get coordinates for Cudahy
Successfully got coordinates for Cudahy
Trying to get coordinates for Culver City
Successfully got coordina

Make sure the shape of this DataFrame matches the original.

In [31]:
lat_lon_df.shape

(88, 1)

We will now insert the latitude and longitude values into the original DataFrame.

In [32]:
df['Latitude'] = lat_lon_df.City.map(lambda x: x[0])
df['Longitude'] = lat_lon_df.City.map(lambda x: x[1])
df.head()

Unnamed: 0,City,Population,Latitude,Longitude
0,Agoura Hills,20330,34.14611,-118.77812
1,Alhambra,83653,34.0937,-118.12727
2,Arcadia,56364,34.13614,-118.03887
3,Artesia,16522,33.86114,-118.07968
4,Avalon,3728,33.34411,-118.32139
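As an alternative to applymap followed by two lambda maps, the coordinate pairs can be expanded into both columns in one step. In recent pandas releases `DataFrame.applymap` is deprecated in favor of `DataFrame.map`, so mapping over the single `City` Series avoids the issue. The sketch below uses a stub function, `fake_lat_lon` (a hypothetical name), in place of the real geocoder call so that it runs offline. The coordinates are the ones shown in the table above.

```python
import pandas as pd

# Stub standing in for get_lat_lon so the example runs without network access.
KNOWN = {
    'Agoura Hills': [34.14611, -118.77812],
    'Alhambra': [34.0937, -118.12727],
}

def fake_lat_lon(city):
    return KNOWN[city]

cities = pd.DataFrame({'City': list(KNOWN)})

# Series.map yields one [lat, lon] list per city
pairs = cities['City'].map(fake_lat_lon)

# Expand the pairs into two named columns aligned on the original index
coords = pd.DataFrame(pairs.tolist(), index=cities.index,
                      columns=['Latitude', 'Longitude'])
cities = cities.join(coords)
print(cities)
```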


Are there any other data points we might find useful? City area?

## 3. Exploring and Clustering Los Angeles Cities

In [12]:
import folium

import requests

Use geocoder with the ArcGIS provider to get the coordinates of Los Angeles, CA.

In [13]:
address = 'Los Angeles, CA'

g = geocoder.arcgis(address)
lat = g.latlng[0]
lon = g.latlng[1]

In [14]:
print('The geographical coordinates of Los Angeles, CA are {}, {}.'.format(lat, lon))

The geographical coordinates of Los Angeles, CA are 34.05349000000007, -118.24531999999999.


Create a map of Los Angeles with cities superimposed on top.

In [15]:
map_la = folium.Map(location=g.latlng, zoom_start=9)

for lat, lon, city in zip(df['Latitude'], df['Longitude'], 
                          df['City']):
    label = '{}, Los Angeles'.format(city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(location=[lat, lon],
                    radius=5,
                    popup=label,
                    color='blue',
                    fill=True,
                    fill_color='#3186cc',
                    fill_opacity=0.7,
                    parse_html=False).add_to(map_la)

map_la

We need to specify Foursquare credentials to get data through their API.

In [16]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
#VERSION = '20180605' # Foursquare API version
VERSION = '20200506' # Foursquare API version
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


Define a function to obtain the venues near each city. What is a good value for radius? Should it be set dynamically, based on the area each city covers?
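One possible answer to the radius question is to use the radius of a circle with the same area as the city. This is only a sketch: the notebook does not collect city areas, so `area_km2` is a hypothetical input, and the 100,000 m cap is an assumed upper bound on the API's radius parameter that should be checked against the Foursquare documentation.

```python
import math

def equivalent_radius_m(area_km2, max_radius_m=100_000):
    """Radius in metres of a circle with the same area as the city.

    area_km2 is a hypothetical input (city area is not part of the
    notebook's data); max_radius_m is an assumed API limit.
    """
    radius_m = math.sqrt(area_km2 / math.pi) * 1000
    return min(round(radius_m), max_radius_m)

# Example with a made-up area of 20 square kilometres
print(equivalent_radius_m(20))
```

A circle is a crude approximation for irregularly shaped cities, so the result is better treated as a starting point than as a precise boundary.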

In [63]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    print('Getting venue list, this may take a while...')
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],
            # TODO: how to get only the primary category?
            v['venue']['categories'][0]['name']) for v in results])
                

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
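The function above assumes every response contains at least one group and every venue has at least one category. If either is missing, the indexing raises an error. The sketch below separates the parsing step into a helper, `venue_rows` (a hypothetical name), that tolerates those gaps. The payload is a hand-made stand-in shaped like the fields the function reads, not a real API response.

```python
def venue_rows(city, lat, lng, payload):
    """Flatten one explore-style response into row tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # use the first listed category, or None when the venue has none
        category = categories[0]['name'] if categories else None
        rows.append((city, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hand-made payload for illustration only
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 34.1, 'lng': -118.7},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorized Spot',
               'location': {'lat': 34.2, 'lng': -118.8},
               'categories': []}},
]}]}}

rows = venue_rows('Example City', 34.0, -118.0, fake_payload)
print(rows)
```

Keeping the parsing separate from the HTTP request also makes it possible to test the row-building logic without calling the API.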

Use the previously defined function to obtain a list of Foursquare venues for the LA cities.

In [64]:
la_venues = getNearbyVenues(names=df['City'], 
                            latitudes=df['Latitude'],
                            longitudes=df['Longitude']
                       )
print('Done!')

Getting venue list, this may take a while...
Done!


Check the size and view the first few results.

In [66]:
print(la_venues.shape)
la_venues.head()

(2324, 7)


Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agoura Hills,34.14611,-118.77812,Future Track Running Center,34.145819,-118.779251,Sporting Goods Shop
1,Agoura Hills,34.14611,-118.77812,Twisted Oak Tavern,34.145308,-118.778679,Gastropub
2,Agoura Hills,34.14611,-118.77812,Grissini Ristorante,34.145815,-118.778534,Italian Restaurant
3,Agoura Hills,34.14611,-118.77812,Cafe Bizou,34.14841,-118.782587,French Restaurant
4,Agoura Hills,34.14611,-118.77812,Pizza Nosh,34.148311,-118.782181,Pizza Place


How many venue categories do we have?

In [65]:
len(la_venues['Venue Category'].unique())

276

Check how many venues were returned for each city.

In [67]:
la_venues.groupby('City').count()['Venue'].sort_values(ascending=True)

City
Arcadia                   1
Vernon                    2
Cudahy                    3
Rolling Hills             3
Bradbury                  3
Diamond Bar               4
Rolling Hills Estates     4
Calabasas                 5
Signal Hill               5
La Mirada                 6
Commerce                  6
Palos Verdes Estates      7
Pico Rivera               7
Bell Gardens              7
South El Monte            8
South Gate                8
Lynwood                   8
Santa Fe Springs          9
La Puente                 9
San Gabriel               9
Huntington Park          10
El Monte                 10
Santa Clarita            12
Montebello               12
Westlake Village         15
La Cañada Flintridge     15
Gardena                  15
Torrance                 16
Hidden Hills             17
Palmdale                 17
                         ..
Hawthorne                33
Norwalk                  33
San Fernando             34
Bellflower               34
Covina         
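A groupby count only lists cities that returned at least one venue, so cities with no venues disappear from the result. Since the goal is to find quiet areas, those are exactly the cities worth keeping. The sketch below maps the counts back onto the full city list and also normalizes by population. The populations come from the table earlier in the notebook, while the venue rows are made up for illustration.

```python
import pandas as pd

cities = pd.DataFrame({
    'City': ['Agoura Hills', 'Arcadia', 'Avalon'],
    'Population': [20330, 56364, 3728],   # from the city table above
})

# Hypothetical venue rows; Avalon deliberately has none.
venues = pd.DataFrame({
    'City': ['Agoura Hills', 'Agoura Hills', 'Arcadia'],
    'Venue': ['A', 'B', 'C'],
})

counts = venues.groupby('City')['Venue'].count()

# Cities missing from counts get 0 instead of being dropped
cities['Venue Count'] = cities['City'].map(counts).fillna(0).astype(int)
cities['Venues per 10k'] = cities['Venue Count'] / cities['Population'] * 10_000
print(cities.sort_values('Venues per 10k'))
```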

Find out how many unique categories can be curated from all the returned venues.

In [72]:
cat_count = len(toronto_venues['Venue Category'].unique())
print('There are {} unique venue categories.'.format(cat_count))

There are 219 unique venue categories.


# Analyzing Each Neighborhood

We would like to see how many venues of each category are in each Neighborhood. The cell below will construct a new DataFrame containing counts of each category, along with a Neighborhood column.

In [82]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = list(toronto_onehot)
cols.insert(0, cols.pop(cols.index('Neighborhood')))
toronto_onehot = toronto_onehot.loc[:, cols]

toronto_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Toy / Game Store,Trail,Train Station,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [75]:
toronto_onehot.shape

(1599, 219)

We can see the frequency at which each category occurs in each neighborhood by using groupby.

In [85]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Toy / Game Store,Trail,Train Station,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,...,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925
1,"Brockton, Parkdale Village, Exhibition Place",0.022727,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business reply mail Processing Centre,0.0,0.02,0.0,0.01,0.0,0.0,0.03,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905
7,"Commerce Court, Victoria Hotel",0.0,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [87]:
toronto_grouped.shape

(39, 219)
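To make the one-hot and groupby steps concrete, here is a small sketch on made-up data. Taking the mean of the 0/1 columns per neighborhood gives the share of that neighborhood's venues in each category, so each row sums to 1. The neighborhood and category values are placeholders.

```python
import pandas as pd

# Made-up venues: neighborhood A has two cafés and a park, B has one park
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Park'],
})

# One indicator column per category
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of the indicators = fraction of venues in each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```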

We can list each neighborhood along with its 10 most common venue categories.

In [89]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [92]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Seafood Restaurant,Lounge,Bakery,Breakfast Spot,Restaurant,Cheese Shop,Hotel
1,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Café,Thrift / Vintage Store,Diner,Pizza Place,Gift Shop,Sandwich Place,Boutique,Italian Restaurant,Brewery
2,Business reply mail Processing Centre,Coffee Shop,Hotel,Japanese Restaurant,Café,Restaurant,Asian Restaurant,Italian Restaurant,Theater,Steakhouse,Bookstore
3,"CN Tower, King and Spadina, Railway Lands, Har...",Coffee Shop,Café,Restaurant,French Restaurant,Park,Bar,Speakeasy,Lounge,Italian Restaurant,Japanese Restaurant
4,Central Bay Street,Coffee Shop,Café,Middle Eastern Restaurant,Plaza,Clothing Store,Restaurant,Bubble Tea Shop,Sandwich Place,Hotel,Mexican Restaurant
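The try/except approach to column names works for ten columns but would produce labels such as "21th" if the number of top venues grew. A small helper, here called `ordinal` (a hypothetical name), handles the general case. This is a sketch of an alternative, not the code that produced the table above.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```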


Wow, they sure do like coffee in Toronto!

# Cluster Neighborhoods

In [108]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

In [134]:
# set number of clusters
kclusters = 5

# drop the label column by keyword; recent pandas versions reject a positional axis argument
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0,
       0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0], dtype=int32)
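A quick way to judge how balanced the clusters are is to count the members of each label. The sketch below applies `np.bincount` to the label array printed above, copied here by hand.

```python
import numpy as np

# Labels copied from the k-means output above
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0,
                   0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0])

# Number of rows assigned to each cluster label 0..4
sizes = np.bincount(labels, minlength=5)
print(sizes)
```

One dominant cluster with several singletons suggests that the chosen number of clusters, or the features themselves, may not separate the rows well.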

In [135]:
# add clustering labels
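# run this once only; inserting the same column a second time raises an error
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_boroughs

# join the sorted venues (with cluster labels) onto toronto_boroughs, which holds latitude/longitude for each neighborhood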
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.650964,-79.353041,0,Pub,Café,Athletics & Sports,Coffee Shop,Performing Arts Venue,Theater,Seafood Restaurant,Mexican Restaurant,Food Truck,French Restaurant
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66179,-79.38939,0,Coffee Shop,Sushi Restaurant,Café,Yoga Studio,Discount Store,Pharmacy,Park,Middle Eastern Restaurant,Juice Bar,Italian Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657491,-79.377529,0,Coffee Shop,Clothing Store,Sandwich Place,Middle Eastern Restaurant,Café,Cosmetics Shop,Restaurant,Hotel,Bar,Italian Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651734,-79.375554,0,Coffee Shop,Café,American Restaurant,Seafood Restaurant,Cosmetics Shop,Gastropub,Cocktail Bar,Theater,Italian Restaurant,Hotel
4,M4E,East Toronto,The Beaches,43.678148,-79.295349,0,Health Food Store,Trail,Pub,Yoga Studio,Donut Shop,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


In [136]:
# create map
map_clusters = folium.Map(location=g.latlng, zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Examine Clusters

### Cluster 1

In [137]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Pub,Café,Athletics & Sports,Coffee Shop,Performing Arts Venue,Theater,Seafood Restaurant,Mexican Restaurant,Food Truck,French Restaurant
1,Downtown Toronto,0,Coffee Shop,Sushi Restaurant,Café,Yoga Studio,Discount Store,Pharmacy,Park,Middle Eastern Restaurant,Juice Bar,Italian Restaurant
2,Downtown Toronto,0,Coffee Shop,Clothing Store,Sandwich Place,Middle Eastern Restaurant,Café,Cosmetics Shop,Restaurant,Hotel,Bar,Italian Restaurant
3,Downtown Toronto,0,Coffee Shop,Café,American Restaurant,Seafood Restaurant,Cosmetics Shop,Gastropub,Cocktail Bar,Theater,Italian Restaurant,Hotel
4,East Toronto,0,Health Food Store,Trail,Pub,Yoga Studio,Donut Shop,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
5,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Beer Bar,Seafood Restaurant,Lounge,Bakery,Breakfast Spot,Restaurant,Cheese Shop,Hotel
6,Downtown Toronto,0,Coffee Shop,Café,Middle Eastern Restaurant,Plaza,Clothing Store,Restaurant,Bubble Tea Shop,Sandwich Place,Hotel,Mexican Restaurant
7,Downtown Toronto,0,Grocery Store,Café,Park,Baby Store,Athletics & Sports,Coffee Shop,Candy Store,Playground,Fish & Chips Shop,Fish Market
8,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Clothing Store,Salad Place,Gym,Sushi Restaurant,Deli / Bodega,Thai Restaurant,Hotel
9,West Toronto,0,Park,Smoke Shop,Pharmacy,Brazilian Restaurant,Café,Liquor Store,Bank,Bakery,Furniture / Home Store,Pool


### Cluster 2

In [138]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Central Toronto,1,Gym / Fitness Center,Park,Yoga Studio,Eastern European Restaurant,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm


### Cluster 3

In [139]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,2,Bus Line,Swim School,Yoga Studio,Food Court,Food,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


### Cluster 4

In [140]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,East Toronto,3,Business Service,Government Building,Night Market,Yoga Studio,Electronics Store,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


### Cluster 5

In [141]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Central Toronto,4,IT Services,Yoga Studio,Eastern European Restaurant,Food,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm


We can see that 'Cluster 1' is really the only true cluster, as every other cluster contains a single neighborhood. One thought on why this may be: in Cluster 1, cafés and coffee shops are the most common venues, while in the other clusters we don't see coffee shops or cafés at all.
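A result like this is a hint that the number of clusters deserves a second look. One common check is to compare silhouette scores across several values of K. The sketch below does this on synthetic data with three well-separated groups, because the notebook's own feature matrix is not reproduced here. The candidate range of K and the synthetic points are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in: three tight blobs in two dimensions
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

# Fit k-means for each candidate K and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('Best K by silhouette score:', best_k)
```

On real venue data the scores are unlikely to be this clear-cut, so they are best read alongside the cluster sizes rather than as a single deciding number.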