## IBM Applied Data Science Capstone Project 
### Evaluating Restaurant Locations in Toronto By Using Major Crime Indicators (MCI)

Now that you have the skills and the tools to use location data to explore a geographical location, you have two weeks to be as creative as you want. Come up with an idea that leverages the Foursquare location data to explore or compare neighborhoods or cities of your choice, or with a problem that the Foursquare location data can help solve. If you cannot think of an idea or a problem, here are some ideas to get you started:

1. In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. Is New York City more like Toronto or Paris or some other multicultural city? I will leave it to you to refine this idea.

2. In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they set up their office?

These are just a couple of the many ideas and problems that can be solved using location data in addition to other datasets. No matter what you decide to do, make sure to justify why the problem you want to solve is important and why a client or a group of people would be interested in your project.

Review criteria
This capstone project will be graded by your peers. This capstone project is worth 70% of your total grade. The project will be completed over the course of 2 weeks. Week 1 submissions will be worth 30% whereas week 2 submissions will be worth 40% of your total grade.

For this week, you are required to submit the following:

1. A description of the problem and a discussion of the background. (15 marks)
2. A description of the data and how it will be used to solve the problem. (15 marks)

For the second week, the final deliverables of the project will be:

1. A link to your Notebook on your GitHub repository, showing your code. (15 marks)
2. A full report consisting of all of the following components (15 marks):

Introduction where you discuss the business problem and who would be interested in this project.
Data where you describe the data that will be used to solve the problem and the source of the data.
Methodology section, which represents the main component of the report, where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and which machine learning techniques were used and why.
Results section where you discuss the results.
Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
Conclusion section where you conclude the report.

3. Your choice of a presentation or blogpost. (10 marks)

### Create Toronto Dataframe

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from urllib.request import urlopen
from bs4 import BeautifulSoup

#### Define function to read content of the link using BeautifulSoup

In [2]:
def getHTMLContent(link):
    html = urlopen(link)
    soup = BeautifulSoup(html, 'html.parser')
    return soup

In [3]:
# Call the previously created getHTMLContent method
content = getHTMLContent('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
postal_code_table = content.find("table", attrs={"class": "wikitable"})
postal_code_table_data = postal_code_table.tbody.find_all("tr")  
postal_code_table_data

table_data = []
# Get all the rows of the table
for tr in postal_code_table.tbody.find_all("tr"):  # find all tr's in the table's tbody
    t_row = []
    # collect the text of each of the three td's in this tr
    for td in tr.find_all("td"):
        t_row.append(td.text.replace('\n', '').strip())
    # keep only rows that contain values (header rows have no td's)
    if len(t_row) > 0:
        table_data.append(t_row)
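
The row-extraction loop above can be tried offline. The sketch below applies the same idea (collect the text of every `td` in each `tr` and skip rows without any `td`) to a small inline HTML string. It uses Python's standard-library `html.parser` instead of BeautifulSoup, and both the `TableRowParser` class and the sample HTML are assumptions made for illustration, not part of the notebook.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows that contain at least one <td>
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # same clean-up as the notebook: drop newlines, trim whitespace
            self._row.append("".join(self._cell).replace("\n", "").strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # header rows hold only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


# Made-up sample that mimics the structure of the Wikipedia table
SAMPLE_HTML = """
<table class="wikitable"><tbody>
<tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>M1A
</td><td>Not assigned
</td><td>
</td></tr>
<tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
</tbody></table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
sample_rows = parser.rows
```

`sample_rows` can be passed to `pd.DataFrame` with the same three column names used in the next cell.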

In [4]:
# Add header to the dataframe
df = pd.DataFrame(table_data,columns = ['Postal Code','Borough', 'Neighborhood'] ) 
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


In [5]:
# Remove rows with 'Not assigned' Borough
df.drop(df[df['Borough']=='Not assigned']. index, axis=0, inplace=True)
df = df.reset_index(drop=True)
df = df.drop_duplicates()
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


In [6]:
df.shape

(103, 3)

In [7]:
# Get Geocode to a dataframe
df_geocode = pd.read_csv('Geospatial_Coordinates.csv')
df_geocode.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [8]:
# Merge Geocode to the dataframe
df_toronto = pd.merge(df, df_geocode, how='inner', on='Postal Code')
df_toronto.head(5)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494


In [9]:
df_toronto.shape

(103, 5)

#### Filter Borough with 'Toronto' in it

In [10]:
df_toronto.drop(df_toronto[~df_toronto['Borough'].str.contains('Toronto', regex=True)]. index, axis=0, inplace=True)
df_toronto = df_toronto.reset_index(drop=True)
df_toronto.shape

(39, 5)

#### Get Geocode of Toronto

In [11]:
from geopy.geocoders import Nominatim 

city = 'Toronto'
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude
print('Geographical coordinates of Toronto, ON, Canada are {}, {}.'.format(latitude, longitude))

Geographical coordinates of Toronto, ON, Canada are 43.6534817, -79.3839347.


#### Create map of Toronto using latitude and longitude values

In [12]:
import folium 

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# Add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], 
    df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng],radius=5,popup=label,color='blue',fill=True,fill_color='#3186cc',fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Use Foursquare API to explore the neighborhoods and segment them

In [13]:
# My Foursquare credential
VERSION = '20200401' 
CLIENT_ID='0QUMVQD2MXECCBFR4U3CDLQFO2WVVQOE5JGGBR04VJJ5Q5OO'
CLIENT_SECRET = 'V2RDAFWOSJ4IKWKLNNAIZDEE30EDGSA5FE0LTROSJAVGYHFQ'
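
Credentials written directly into a notebook are visible to anyone who can read it. One common alternative is to read them from environment variables. The following is a minimal sketch of that approach; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions chosen for this example, not names defined by Foursquare or by the notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names used here are hypothetical and chosen for this example.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            f"Set the environment variable {missing} before running the notebook"
        ) from None


# Demonstration with a stand-in mapping instead of the real environment
demo_id, demo_secret = load_foursquare_credentials(
    {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
)
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the hard-coded assignments.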

#### Explore the first neighborhood in Toronto dataframe

In [14]:
df_toronto[0:1]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636


#### Get the 1st neighborhood's name, latitude and longitude values

In [15]:
neighborhood_latitude = df_toronto.loc[0, 'Latitude'] 
neighborhood_longitude = df_toronto.loc[0, 'Longitude']
neighborhood_name = df_toronto.loc[0, 'Neighborhood']
print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Regent Park / Harbourfront are 43.6542599, -79.3606359.


#### Get the top 100 venues within a radius of 500 meters of Regent Park / Harbourfront

In [16]:
num_venues = 100
radius = 500 
URL = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, num_venues)
URL

'https://api.foursquare.com/v2/venues/explore?&client_id=0QUMVQD2MXECCBFR4U3CDLQFO2WVVQOE5JGGBR04VJJ5Q5OO&client_secret=V2RDAFWOSJ4IKWKLNNAIZDEE30EDGSA5FE0LTROSJAVGYHFQ&v=20200401&ll=43.6542599,-79.3606359&radius=500&limit=100'
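
The same URL can be assembled from a dictionary of query parameters, which keeps each parameter on its own line and escapes special characters automatically. This is a sketch using the standard-library `urllib.parse.urlencode`; the helper name `build_explore_url` is hypothetical, while the parameter names are the ones that appear in the URL above.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a venues/explore URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",   # latitude,longitude as a single parameter
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; the coordinates are those of the first neighborhood
example_url = build_explore_url("ID", "SECRET", "20200401", 43.6542599, -79.3606359)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, which is equivalent to the literal comma once the server decodes the query string.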

#### Make HTTP GET Request via Foursquare API

In [17]:
import requests
results = requests.get(URL).json()
results

{'meta': {'code': 200, 'requestId': '5e8578e6b1cac0001bd9938f'},
 'response': {'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 45,
  'suggestedBounds': {'ne': {'lat': 43.6587599045, 'lng': -79.3544279001486},
   'sw': {'lat': 43.6497598955, 'lng': -79.36684389985142}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54ea41ad498e9a11e9e13308',
       'name': 'Roselle Desserts',
       'location': {'address': '362 King St E',
        'crossStreet': 'Trinity St',
        'lat': 43.653446723052674,
        'lng': -79.3620167174383,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.653446723052674,
          'lng': -79.3620167174383}],
        'distance': 143,
       

The venue information in the above response is under the items key. We need the following get_category_type() function from the Foursquare lab to extract each venue's category from the JSON data.

In [18]:
# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
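
To see what the function returns, the sketch below applies the same logic to two hand-written rows. Plain dictionaries stand in for the dataframe rows, and the second sample row is made up to show the empty-category case.

```python
def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Dictionaries standing in for rows of the normalized dataframe
sample_rows = [
    {'venue.name': 'Roselle Desserts', 'venue.categories': [{'name': 'Bakery'}]},
    {'venue.name': 'Uncategorized venue', 'venue.categories': []},
]
sample_categories = [get_category_type(r) for r in sample_rows]
```

The first row yields its category name and the second yields `None`, which pandas will display as a missing value.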

#### Transform JSON Response into a Pandas Dataframe

In [19]:
from pandas import json_normalize  # pandas.io.json.json_normalize was deprecated in pandas 1.0

venues = results['response']['groups'][0]['items']

# Normalize Json data
nearby_venues = json_normalize(venues)

# Filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# Filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# Clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Cooper Koo Family YMCA,Distribution Center,43.653249,-79.358008
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Morning Glory Cafe,Breakfast Spot,43.653947,-79.361149


In [20]:
print('There are {} venues identified by Foursquare.'.format(nearby_venues.shape[0]))

There are 45 venues identified by Foursquare.
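
The `distance` field in the response (143 for Roselle Desserts) can be sanity-checked against the coordinates. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6,371 km; it is an independent check written for this example and is not necessarily how Foursquare computes the value.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (an approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Neighborhood centre vs. the first venue in the response
distance = haversine_m(43.6542599, -79.3606359, 43.653446723052674, -79.3620167174383)
```

The result lands within a few metres of the reported 143, well inside the 500 m search radius.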


### Explore All Neighborhoods in Toronto

Define a function that repeats the above process for all neighborhoods in Toronto.

#### Use this function to find nearby venues for all neighborhoods and create a new dataframe called df_toronto_venues.

In [21]:
LIMIT = 100
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
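
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue returned without any category would raise an `IndexError`, and a response without `groups` would raise as well. The following is a defensive sketch of the flattening step only, working on an already decoded response; the helper `extract_venue_rows` and the sample payload are hypothetical.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one venues/explore payload into tuples matching the dataframe columns."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # fall back to None when the venue has no category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Made-up payload with one categorized and one uncategorized venue
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Tandem Coffee",
                   "location": {"lat": 43.653559, "lng": -79.361809},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Unnamed spot",
                   "location": {"lat": 43.654, "lng": -79.36},
                   "categories": []}},
    ]}]}
}
sample_venue_rows = extract_venue_rows("Regent Park / Harbourfront", 43.65426, -79.360636, sample_payload)
empty_rows = extract_venue_rows("Nowhere", 0.0, 0.0, {"response": {}})
```

Inside the loop of `getNearbyVenues`, `venues_list.append(extract_venue_rows(name, lat, lng, requests.get(url).json()))` would be the corresponding call.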

In [22]:
df_toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'], latitudes=df_toronto['Latitude'],
                    longitudes=df_toronto['Longitude'])

Regent Park / Harbourfront
Queen's Park / Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond / Adelaide / King
Dufferin / Dovercourt Village
Harbourfront East / Union Station / Toronto Islands
Little Portugal / Trinity
The Danforth West / Riverdale
Toronto Dominion Centre / Design Exchange
Brockton / Parkdale Village / Exhibition Place
India Bazaar / The Beaches West
Commerce Court / Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
High Park / The Junction South
North Toronto West
The Annex / North Midtown / Yorkville
Parkdale / Roncesvalles
Davisville
University of Toronto / Harbord
Runnymede / Swansea
Moore Park / Summerhill East
Kensington Market / Chinatown / Grange Park
Summerhill West / Rathnelly / South Hill / Forest Hill SE / Deer Park
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport
Roseda

In [23]:
print(df_toronto_venues.shape)
df_toronto_venues.head()

(1693, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Regent Park / Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Regent Park / Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Regent Park / Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Regent Park / Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Regent Park / Harbourfront,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


#### Find the Unique Venue Categories

In [24]:
print('There are {} unique categories.'.format(len(df_toronto_venues['Venue Category'].unique())))

There are 236 unique categories.


In [25]:
print('Unique categories are: {}'.format(df_toronto_venues['Venue Category'].unique()))

Unique categories are: ['Bakery' 'Coffee Shop' 'Distribution Center' 'Spa' 'Breakfast Spot'
 'Restaurant' 'Park' 'Historic Site' 'Pub' 'Farmers Market'
 'Chocolate Shop' 'Dessert Shop' 'Theater' 'Performing Arts Venue'
 'French Restaurant' 'Café' 'Mexican Restaurant' 'Event Space'
 'Yoga Studio' 'Ice Cream Shop' 'Asian Restaurant' 'Shoe Store'
 'Cosmetics Shop' 'Electronics Store' 'Bank' 'Beer Store' 'Hotel'
 'Health Food Store' 'Antique Shop' 'Italian Restaurant' 'Beer Bar'
 'Creperie' 'Arts & Crafts Store' 'Burrito Place' 'Diner' 'Hobby Shop'
 'Discount Store' 'Wings Joint' 'Nightclub' 'Fried Chicken Joint'
 'Burger Joint' 'Boutique' 'Juice Bar' 'Gym' 'College Auditorium' 'Bar'
 'Music Venue' 'Clothing Store' 'Comic Shop' 'Pizza Place' 'Plaza'
 'Tea Room' 'Ramen Restaurant' 'Thai Restaurant' 'Movie Theater'
 'Art Gallery' 'Sandwich Place' 'Steakhouse' 'Shopping Mall'
 'American Restaurant' 'Japanese Restaurant' 'College Rec Center'
 'Gastropub' 'Bookstore' 'Sushi Restaurant' 'Tannin

In [26]:
df_toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,56,56,56,56,56,56
Brockton / Parkdale Village / Exhibition Place,23,23,23,23,23,23
Business reply mail Processing CentrE,15,15,15,15,15,15
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport,16,16,16,16,16,16
Central Bay Street,77,77,77,77,77,77
Christie,18,18,18,18,18,18
Church and Wellesley,81,81,81,81,81,81
Commerce Court / Victoria Hotel,100,100,100,100,100,100
Davisville,36,36,36,36,36,36
Davisville North,9,9,9,9,9,9


### Analyze Each Neighborhood

In [27]:
# one hot encoding
toronto_onehot = pd.get_dummies(df_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = df_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [28]:
toronto_onehot.shape

(1693, 236)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [29]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0
1,Brockton / Parkdale Village / Exhibition Place,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business reply mail Processing CentrE,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,CN Tower / King and Spadina / Railway Lands / ...,0.0,0.0,0.0625,0.0625,0.0625,0.125,0.125,0.125,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,...,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.0
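
Averaging one-hot columns within a neighborhood gives the share of that neighborhood's venues that fall in each category. For example, Berczy Park has 56 venues, so a single Vegetarian / Vegan Restaurant yields 1/56 ≈ 0.017857, the value shown in the table. The sketch below reproduces that calculation with `collections.Counter` on a made-up venue list; the function name `category_frequencies` and the toy data are assumptions for illustration.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each neighborhood to {category: share of that neighborhood's venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }


# Toy (neighborhood, category) pairs, one per venue
toy_venues = [
    ("A", "Coffee Shop"), ("A", "Coffee Shop"), ("A", "Bakery"), ("A", "Park"),
    ("B", "Gym"),
]
freqs = category_frequencies(toy_venues)
```

Here half of neighborhood A's four venues are coffee shops, so its Coffee Shop frequency is 0.5, and each neighborhood's frequencies sum to 1.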


In [30]:
toronto_grouped.shape

(39, 236)

#### Let's print each neighborhood along with the top 5 most common venues

In [31]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
          venue  freq
0   Coffee Shop  0.09
1  Cocktail Bar  0.05
2      Beer Bar  0.04
3          Café  0.04
4        Bakery  0.04


----Brockton / Parkdale Village / Exhibition Place----
            venue  freq
0            Café  0.13
1  Breakfast Spot  0.09
2       Nightclub  0.09
3     Coffee Shop  0.09
4             Gym  0.04


----Business reply mail Processing CentrE----
                  venue  freq
0           Yoga Studio  0.07
1         Auto Workshop  0.07
2            Skate Park  0.07
3                   Spa  0.07
4  Fast Food Restaurant  0.07


----CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport----
                 venue  freq
0       Airport Lounge  0.12
1      Airport Service  0.12
2     Airport Terminal  0.12
3          Coffee Shop  0.06
4  Rental Car Location  0.06


----Central Bay Street----
                 venue  freq
0          Coffee Shop  0.18
1   Italian Restaurant  0.05
2     

#### Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [33]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Cheese Shop,Café,Restaurant,Seafood Restaurant,Beer Bar,Farmers Market,Breakfast Spot
1,Brockton / Parkdale Village / Exhibition Place,Café,Coffee Shop,Breakfast Spot,Nightclub,Gym,Stadium,Burrito Place,Restaurant,Climbing Gym,Performing Arts Venue
2,Business reply mail Processing CentrE,Yoga Studio,Auto Workshop,Park,Comic Shop,Pizza Place,Restaurant,Burrito Place,Brewery,Light Rail Station,Skate Park
3,CN Tower / King and Spadina / Railway Lands / ...,Airport Lounge,Airport Service,Airport Terminal,Sculpture Garden,Harbor / Marina,Rental Car Location,Plane,Coffee Shop,Boat or Ferry,Boutique
4,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Japanese Restaurant,Thai Restaurant,Burger Joint,Café,Ice Cream Shop,Spa,Middle Eastern Restaurant
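
The `indicators` lookup with a fallback to "th" gives correct labels for 1 through 20, but it would produce "21th" if `num_top_venues` were raised beyond 20. A small helper that derives the suffix from the number itself avoids that. This is a sketch; the name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"   # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Column names equivalent to the ones built in the cell above
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
```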


### Cluster Neighborhoods

In [34]:
# one hot encoding
toronto_onehot = pd.get_dummies(df_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = df_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0
1,Brockton / Parkdale Village / Exhibition Place,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business reply mail Processing CentrE,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,CN Tower / King and Spadina / Railway Lands / ...,0.0,0.0,0.0625,0.0625,0.0625,0.125,0.125,0.125,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,...,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.0


#### Run k-means to cluster the neighborhoods into 5 clusters.

In [35]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 0, 0, 0, 0, 0, 0, 1], dtype=int32)
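
For readers who want to see what the clustering step does, here is a minimal pure-Python sketch of Lloyd's algorithm, the basic assign-then-update iteration behind k-means. It is not scikit-learn's implementation, which adds smarter initialisation and optimised numerics; the function name `lloyd_kmeans` and the toy points are assumptions for illustration.

```python
import random


def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # k distinct points as initial centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance)
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:        # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                 # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Two well-separated toy groups in two dimensions
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
toy_labels, toy_centroids = lloyd_kmeans(toy_points, k=2)
```

In the notebook, each "point" is one neighborhood's row of category frequencies, so neighborhoods with similar venue mixes end up with the same label.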

#### Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [36]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_toronto

# merge df_toronto with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,0,Coffee Shop,Pub,Park,Theater,Mexican Restaurant,Breakfast Spot,Bakery,Restaurant,Café,Shoe Store
1,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,0,Coffee Shop,Gym,Diner,Music Venue,Mexican Restaurant,Juice Bar,Italian Restaurant,Hobby Shop,Wings Joint,Fried Chicken Joint
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Café,Bubble Tea Shop,Japanese Restaurant,Cosmetics Shop,Thai Restaurant,Bookstore,Pizza Place,Middle Eastern Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Italian Restaurant,Café,Restaurant,Clothing Store,Cocktail Bar,Beer Bar,Cosmetics Shop,Bakery,Diner
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Trail,Pub,Health Food Store,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Women's Store


#### Finally, let's visualize the resulting clusters

In [37]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], 
                                  toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
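
The map needs only one distinct color per cluster, looked up by the cluster label. As a dependency-free sketch (an alternative to the matplotlib `cm.rainbow` approach in the cell above, not something the notebook requires), evenly spaced hues can be turned into hex strings with the standard-library `colorsys` module. The helper name `cluster_palette` and the saturation and brightness values are assumptions.

```python
import colorsys


def cluster_palette(n_clusters, saturation=0.8, value=0.9):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, saturation, value)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
# Look up a marker color directly by its cluster label
color_for_cluster_0 = palette[0]
```

Indexing with the label itself (`palette[cluster]`) keeps the mapping from cluster 0 to the first color explicit.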

### Examine Clusters

#### Top Cluster 1

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Pub,Park,Theater,Mexican Restaurant,Breakfast Spot,Bakery,Restaurant,Café,Shoe Store
1,Downtown Toronto,0,Coffee Shop,Gym,Diner,Music Venue,Mexican Restaurant,Juice Bar,Italian Restaurant,Hobby Shop,Wings Joint,Fried Chicken Joint
2,Downtown Toronto,0,Clothing Store,Coffee Shop,Café,Bubble Tea Shop,Japanese Restaurant,Cosmetics Shop,Thai Restaurant,Bookstore,Pizza Place,Middle Eastern Restaurant
3,Downtown Toronto,0,Coffee Shop,Italian Restaurant,Café,Restaurant,Clothing Store,Cocktail Bar,Beer Bar,Cosmetics Shop,Bakery,Diner
4,East Toronto,0,Trail,Pub,Health Food Store,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Women's Store
5,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Bakery,Cheese Shop,Café,Restaurant,Seafood Restaurant,Beer Bar,Farmers Market,Breakfast Spot
6,Downtown Toronto,0,Coffee Shop,Italian Restaurant,Sandwich Place,Japanese Restaurant,Thai Restaurant,Burger Joint,Café,Ice Cream Shop,Spa,Middle Eastern Restaurant
7,Downtown Toronto,0,Grocery Store,Café,Park,Gas Station,Baby Store,Restaurant,Athletics & Sports,Italian Restaurant,Candy Store,Diner
8,Downtown Toronto,0,Coffee Shop,Restaurant,Café,Bakery,Bar,Thai Restaurant,Clothing Store,Gastropub,Juice Bar,Salad Place
9,West Toronto,0,Bakery,Pharmacy,Bar,Pool,Bank,Supermarket,Grocery Store,Park,Gym / Fitness Center,Music Venue


#### Top Cluster 2

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,East Toronto,1,Sandwich Place,Gym,Food & Drink Shop,Liquor Store,Burrito Place,Italian Restaurant,Restaurant,Fast Food Restaurant,Steakhouse,Fish & Chips Shop
18,Central Toronto,1,Park,Lawyer,Swim School,Bus Line,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
20,Central Toronto,1,Gym,Hotel,Convenience Store,Department Store,Sandwich Place,Breakfast Spot,Food & Drink Shop,Asian Restaurant,Park,Gas Station
21,Central Toronto,1,Trail,Sushi Restaurant,Park,Bus Line,Jewelry Store,Home Service,Doner Restaurant,Distribution Center,Dog Run,Women's Store
38,East Toronto,1,Yoga Studio,Auto Workshop,Park,Comic Shop,Pizza Place,Restaurant,Burrito Place,Brewery,Light Rail Station,Skate Park


#### Top Cluster 3

In [40]:
# Top Cluster #3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Central Toronto,2,Gym,Playground,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant


### MCI Data
http://data.torontopolice.on.ca/datasets/mci-2014-to-2019/data?geometry=-79.404%2C43.714%2C-79.359%2C43.725

In [41]:
#MCI_DATA_URL = "https://opendata.arcgis.com/datasets/f4c2e5de021f4836a3caf77f8421f487_0.geojson"
#MCI_data = pd.read_json(MCI_DATA_URL)

mci_data = pd.read_csv('MCI_2014_to_2019.csv')

In [42]:
mci_data.head()

Unnamed: 0,X,Y,Index_,event_unique_id,occurrencedate,reporteddate,premisetype,ucr_code,ucr_ext,offence,...,occurrencedayofyear,occurrencedayofweek,occurrencehour,MCI,Division,Hood_ID,Neighbourhood,Long,Lat,ObjectId
0,-79.395706,43.722324,7734,GO-20151746998,2015-10-10T01:45:00.000Z,2015-10-10T04:05:00.000Z,Outside,1610,220,Robbery - Other,...,283,Saturday,1,Robbery,D53,103,Lawrence Park South (103),-79.395706,43.722324,7375
1,-79.395706,43.722324,7735,GO-20151746998,2015-10-10T01:45:00.000Z,2015-10-10T04:05:00.000Z,Outside,1610,200,Robbery - Mugging,...,283,Saturday,1,Robbery,D53,103,Lawrence Park South (103),-79.395706,43.722324,7376
2,-79.400047,43.722816,7627,GO-2015498747,2015-03-25T10:50:00.000Z,2015-03-25T11:42:00.000Z,House,2120,200,B&E,...,84,Wednesday,10,Break and Enter,D53,103,Lawrence Park South (103),-79.400047,43.722817,7781
3,-79.383369,43.71801,6448,GO-20151734904,2015-10-07T17:00:00.000Z,2015-10-08T04:47:00.000Z,House,2120,200,B&E,...,280,Wednesday,17,Break and Enter,D53,41,Bridle Path-Sunnybrook-York Mills (41),-79.383369,43.71801,6469
4,-79.381615,43.723148,6387,GO-201581071,2015-01-14T14:40:00.000Z,2015-01-14T21:09:00.000Z,House,2120,200,B&E,...,14,Wednesday,14,Break and Enter,D53,41,Bridle Path-Sunnybrook-York Mills (41),-79.381615,43.723148,6697
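
The `occurrencedate` and `reporteddate` columns arrive as ISO 8601 strings. A minimal sketch of parsing them, assuming the format shown in the preview; the column names `occurrence_ts`, `hour` and `weekday` are made up for illustration and are not part of the dataset:

```python
import pandas as pd

# Two timestamp strings copied from the preview above
sample = pd.DataFrame({
    'occurrencedate': ['2015-10-10T01:45:00.000Z', '2015-03-25T10:50:00.000Z'],
})

# Parse the ISO 8601 strings; utc=True keeps the trailing "Z" (UTC) explicit
sample['occurrence_ts'] = pd.to_datetime(sample['occurrencedate'], utc=True)

# Derive calendar fields from the parsed timestamps
sample['hour'] = sample['occurrence_ts'].dt.hour
sample['weekday'] = sample['occurrence_ts'].dt.day_name()

print(sample[['occurrence_ts', 'hour', 'weekday']])
```

For these two rows the derived hour and weekday agree with the `occurrencehour` and `occurrencedayofweek` values already in the file.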


In [43]:
mci_data.shape

(566, 29)

#### Find the unique MCI categories

In [44]:
print('There are {} unique MCI categories.'.format(len(mci_data['MCI'].unique())))

There are 5 unique MCI categories.


In [45]:
print('Unique MCI categories are: {}'.format(mci_data['MCI'].unique()))

Unique MCI categories are: ['Robbery' 'Break and Enter' 'Assault' 'Theft Over' 'Auto Theft']


In [46]:
mci_data.groupby('Neighbourhood').count()

Neighbourhood,X,Y,Index_,event_unique_id,occurrencedate,reporteddate,premisetype,ucr_code,ucr_ext,offence,...,occurrenceday,occurrencedayofyear,occurrencedayofweek,occurrencehour,MCI,Division,Hood_ID,Long,Lat,ObjectId
Bridle Path-Sunnybrook-York Mills (41),167,167,167,167,167,167,167,167,167,167,...,167,167,167,167,167,167,167,167,167,167
Lawrence Park North (105),7,7,7,7,7,7,7,7,7,7,...,7,7,7,7,7,7,7,7,7,7
Lawrence Park South (103),217,217,217,217,217,217,217,217,217,217,...,217,217,217,217,217,217,217,217,217,217
Leaside-Bennington (56),117,117,117,117,117,117,117,117,117,117,...,117,117,117,117,117,117,117,117,117,117
Mount Pleasant East (99),49,49,49,49,49,49,49,49,49,49,...,49,49,49,49,49,49,49,49,49,49
Thorncliffe Park (55),9,9,9,9,9,9,9,9,9,9,...,9,9,9,9,9,9,9,9,9,9
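
Because `count()` repeats the same number in every column, a single count per neighbourhood may be easier to read. A minimal sketch on a toy frame; the neighbourhood names `A`, `B`, `C` and the variable `incidents_per_hood` are illustrative assumptions, not taken from the dataset:

```python
import pandas as pd

# Toy stand-in for mci_data: one row per reported incident (values are illustrative)
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'B', 'A', 'B', 'C'],
    'MCI': ['Assault', 'Robbery', 'Assault', 'Assault', 'Auto Theft', 'Assault'],
})

# size() returns one count per group instead of repeating it for every column
incidents_per_hood = toy.groupby('Neighbourhood').size().sort_values(ascending=False)

print(incidents_per_hood)
```

Applied to `mci_data`, the same call should reproduce the per-neighbourhood totals shown in the table above as a single column.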


In [47]:
# one hot encoding
toronto_mci_onehot = pd.get_dummies(mci_data[['MCI']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_mci_onehot['Neighbourhood'] = mci_data['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_mci_onehot.columns[-1]] + list(toronto_mci_onehot.columns[:-1])
toronto_mci_onehot = toronto_mci_onehot[fixed_columns]

toronto_mci_onehot.head()

Unnamed: 0,Neighbourhood,Assault,Auto Theft,Break and Enter,Robbery,Theft Over
0,Lawrence Park South (103),0,0,0,1,0
1,Lawrence Park South (103),0,0,0,1,0
2,Lawrence Park South (103),0,0,1,0,0
3,Bridle Path-Sunnybrook-York Mills (41),0,0,1,0,0
4,Bridle Path-Sunnybrook-York Mills (41),0,0,1,0,0


In [48]:
toronto_mci_grouped = toronto_mci_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_mci_grouped.head()

Unnamed: 0,Neighbourhood,Assault,Auto Theft,Break and Enter,Robbery,Theft Over
0,Bridle Path-Sunnybrook-York Mills (41),0.479042,0.101796,0.347305,0.017964,0.053892
1,Lawrence Park North (105),0.571429,0.285714,0.0,0.142857,0.0
2,Lawrence Park South (103),0.258065,0.133641,0.520737,0.050691,0.036866
3,Leaside-Bennington (56),0.435897,0.17094,0.230769,0.042735,0.119658
4,Mount Pleasant East (99),0.346939,0.081633,0.530612,0.0,0.040816
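
Averaging one-hot columns within a group gives the share of each crime category in that neighbourhood, so every row sums to 1. The sketch below, on an illustrative toy frame (names `A` and `B` are assumptions), checks that `pd.crosstab` with `normalize='index'` yields the same shares in one step:

```python
import pandas as pd

# Toy stand-in for mci_data (values are illustrative)
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'MCI': ['Assault', 'Assault', 'Robbery', 'Auto Theft', 'Assault', 'Robbery'],
})

# Route 1: one-hot encode, then average within each neighbourhood
onehot = pd.get_dummies(toy['MCI']).astype(float)
onehot['Neighbourhood'] = toy['Neighbourhood']
via_mean = onehot.groupby('Neighbourhood').mean()

# Route 2: cross-tabulate and normalise each row so it sums to 1
via_crosstab = pd.crosstab(toy['Neighbourhood'], toy['MCI'], normalize='index')

print(via_mean)
print(via_crosstab)
```

Either route works; the one-hot version mirrors the venue analysis earlier in the notebook, while the crosstab version is shorter.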


#### Let's print each neighborhood along with the top 5 most common crimes

In [49]:
num_top_crimes = 5

for hood in toronto_mci_grouped['Neighbourhood']:
    print("----"+hood+"----")
    # transpose the neighbourhood's single row so each crime category becomes a row
    temp = toronto_mci_grouped[toronto_mci_grouped['Neighbourhood'] == hood].T.reset_index()
    # the 'venue' label is carried over from the venue analysis; here it holds the crime category
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]  # drop the Neighbourhood row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_crimes))
    print('\n')

----Bridle Path-Sunnybrook-York Mills (41)----
             venue  freq
0          Assault  0.48
1  Break and Enter  0.35
2       Auto Theft  0.10
3       Theft Over  0.05
4          Robbery  0.02


----Lawrence Park North (105)----
             venue  freq
0          Assault  0.57
1       Auto Theft  0.29
2          Robbery  0.14
3  Break and Enter  0.00
4       Theft Over  0.00


----Lawrence Park South (103)----
             venue  freq
0  Break and Enter  0.52
1          Assault  0.26
2       Auto Theft  0.13
3          Robbery  0.05
4       Theft Over  0.04


----Leaside-Bennington (56)----
             venue  freq
0          Assault  0.44
1  Break and Enter  0.23
2       Auto Theft  0.17
3       Theft Over  0.12
4          Robbery  0.04


----Mount Pleasant East (99)----
             venue  freq
0  Break and Enter  0.53
1          Assault  0.35
2       Auto Theft  0.08
3       Theft Over  0.04
4          Robbery  0.00


----Thorncliffe Park (55)----
             venue  freq
0    

#### Let's put that into a pandas dataframe

In [50]:
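def return_most_common_crimes(row, num_top_crime):
    """Return the names of the num_top_crime categories with the largest share in row."""
    # skip the first entry, which holds the neighbourhood name
    row_crimes = row.iloc[1:]
    row_crimes_sorted = row_crimes.sort_values(ascending=False)

    return row_crimes_sorted.index.values[0:num_top_crime]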

#### Now let's create the new dataframe and display the top 5 crimes for each neighborhood.

In [51]:
num_top_crimes = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top crimes
columns_mci = ['Neighbourhood']
for ind in np.arange(num_top_crimes):
    try:
        columns_mci.append('{}{} Most Common Crime'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd use the 'th' suffix
        columns_mci.append('{}th Most Common Crime'.format(ind+1))

# create a new dataframe
neighborhoods_crimes_sorted = pd.DataFrame(columns=columns_mci)
neighborhoods_crimes_sorted['Neighbourhood'] = toronto_mci_grouped['Neighbourhood']

for ind in np.arange(toronto_mci_grouped.shape[0]):
    neighborhoods_crimes_sorted.iloc[ind, 1:] = return_most_common_crimes(toronto_mci_grouped.iloc[ind, :], num_top_crimes)

neighborhoods_crimes_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Crime,2nd Most Common Crime,3rd Most Common Crime,4th Most Common Crime,5th Most Common Crime
0,Bridle Path-Sunnybrook-York Mills (41),Assault,Break and Enter,Auto Theft,Theft Over,Robbery
1,Lawrence Park North (105),Assault,Auto Theft,Robbery,Theft Over,Break and Enter
2,Lawrence Park South (103),Break and Enter,Assault,Auto Theft,Robbery,Theft Over
3,Leaside-Bennington (56),Assault,Break and Enter,Auto Theft,Theft Over,Robbery
4,Mount Pleasant East (99),Break and Enter,Assault,Auto Theft,Theft Over,Robbery
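
The row-by-row loop above could also be written as a single ranking step. A sketch using `numpy.argsort`, with three rows copied from the `toronto_mci_grouped` preview; the `rank_N` column names are an assumption for illustration, and rows with tied shares may come out in a different order than the loop produces:

```python
import numpy as np
import pandas as pd

# Three rows copied from the toronto_mci_grouped preview above
grouped = pd.DataFrame({
    'Neighbourhood': ['Bridle Path-Sunnybrook-York Mills (41)',
                      'Lawrence Park South (103)',
                      'Leaside-Bennington (56)'],
    'Assault': [0.479042, 0.258065, 0.435897],
    'Auto Theft': [0.101796, 0.133641, 0.170940],
    'Break and Enter': [0.347305, 0.520737, 0.230769],
    'Robbery': [0.017964, 0.050691, 0.042735],
    'Theft Over': [0.053892, 0.036866, 0.119658],
})

shares = grouped.set_index('Neighbourhood')

# argsort on the negated values orders each row from largest to smallest share
order = np.argsort(-shares.to_numpy(), axis=1)

# look up the category name for every ranked position
ranked = pd.DataFrame(shares.columns.to_numpy()[order],
                      index=shares.index,
                      columns=['rank_{}'.format(i + 1) for i in range(shares.shape[1])])

print(ranked)
```

For these three neighbourhoods the ranking matches the table above.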


### Cluster Neighborhoods using MCI Data

#### Run k-means to cluster the neighborhoods into 5 clusters.

In [52]:
# set number of clusters
kclusters = 5

toronto_mci_grouped_clustering = toronto_mci_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_mci_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 3, 0, 2, 0, 1], dtype=int32)
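
With only six neighbourhoods in this extract and `kclusters = 5`, most clusters can hold just one neighbourhood. A quick tally of the labels printed above illustrates this:

```python
import numpy as np

# Labels copied from the kmeans.labels_ output above (one per neighbourhood)
labels = np.array([4, 3, 0, 2, 0, 1])

# Count how many neighbourhoods fall in each cluster
cluster_sizes = np.bincount(labels, minlength=5)

print(dict(enumerate(cluster_sizes.tolist())))
# prints {0: 2, 1: 1, 2: 1, 3: 1, 4: 1}
```

Only cluster 0 groups two neighbourhoods, so a smaller `kclusters` might give a more informative grouping on a sample this small.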

#### Let's create a new dataframe that includes the cluster as well as the top 5 crimes for each neighborhood.

In [53]:
# add clustering labels
neighborhoods_crimes_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_mci_merged = mci_data

# join the cluster label and ranked crimes onto every incident row, keyed on Neighbourhood
toronto_mci_merged = toronto_mci_merged.join(neighborhoods_crimes_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_mci_merged.head() # check the last columns!

Unnamed: 0,X,Y,Index_,event_unique_id,occurrencedate,reporteddate,premisetype,ucr_code,ucr_ext,offence,...,Neighbourhood,Long,Lat,ObjectId,Cluster Labels,1st Most Common Crime,2nd Most Common Crime,3rd Most Common Crime,4th Most Common Crime,5th Most Common Crime
0,-79.395706,43.722324,7734,GO-20151746998,2015-10-10T01:45:00.000Z,2015-10-10T04:05:00.000Z,Outside,1610,220,Robbery - Other,...,Lawrence Park South (103),-79.395706,43.722324,7375,0,Break and Enter,Assault,Auto Theft,Robbery,Theft Over
1,-79.395706,43.722324,7735,GO-20151746998,2015-10-10T01:45:00.000Z,2015-10-10T04:05:00.000Z,Outside,1610,200,Robbery - Mugging,...,Lawrence Park South (103),-79.395706,43.722324,7376,0,Break and Enter,Assault,Auto Theft,Robbery,Theft Over
2,-79.400047,43.722816,7627,GO-2015498747,2015-03-25T10:50:00.000Z,2015-03-25T11:42:00.000Z,House,2120,200,B&E,...,Lawrence Park South (103),-79.400047,43.722817,7781,0,Break and Enter,Assault,Auto Theft,Robbery,Theft Over
3,-79.383369,43.71801,6448,GO-20151734904,2015-10-07T17:00:00.000Z,2015-10-08T04:47:00.000Z,House,2120,200,B&E,...,Bridle Path-Sunnybrook-York Mills (41),-79.383369,43.71801,6469,4,Assault,Break and Enter,Auto Theft,Theft Over,Robbery
4,-79.381615,43.723148,6387,GO-201581071,2015-01-14T14:40:00.000Z,2015-01-14T21:09:00.000Z,House,2120,200,B&E,...,Bridle Path-Sunnybrook-York Mills (41),-79.381615,43.723148,6697,4,Assault,Break and Enter,Auto Theft,Theft Over,Robbery
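
The join above is many-to-one: many incident rows share a single summary row per neighbourhood. A hedged sketch of the same step with `DataFrame.merge`, whose `validate` argument checks that assumption; the frames `incidents` and `summary` are toy stand-ins, not the notebook's data:

```python
import pandas as pd

# Toy stand-ins: incident rows (many per neighbourhood) and one summary row per neighbourhood
incidents = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'B'],
    'MCI': ['Assault', 'Robbery', 'Assault'],
})
summary = pd.DataFrame({
    'Neighbourhood': ['A', 'B'],
    'Cluster Labels': [0, 1],
})

# validate='m:1' raises MergeError if the right-hand side has duplicate keys
merged = incidents.merge(summary, on='Neighbourhood', how='left', validate='m:1')

# every incident row should have received a cluster label
assert merged['Cluster Labels'].notna().all()

print(merged)
```

If a neighbourhood name were missing from the summary, the left merge would leave `NaN` in `Cluster Labels`, which the final check would catch.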


#### Finally, let's visualize the resulting clusters based on the MCI data

In [54]:
# create map
map_mci_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_mci_merged['Lat'], toronto_mci_merged['Long'], 
                                  toronto_mci_merged['Neighbourhood'], toronto_mci_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_mci_clusters)
       
map_mci_clusters
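
The map above draws one marker per incident. If one marker per neighbourhood is preferred, the incident coordinates could be averaged first. A minimal sketch on a toy frame; the coordinates are illustrative and the names `centroids`, `cluster` and `incidents` are assumptions:

```python
import pandas as pd

# Toy stand-in for toronto_mci_merged (coordinates are illustrative, not real data)
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'B', 'B'],
    'Lat': [43.70, 43.72, 43.71, 43.73],
    'Long': [-79.40, -79.38, -79.36, -79.34],
    'Cluster Labels': [0, 0, 1, 1],
})

# One row per neighbourhood: mean position, its cluster label, and the incident count
centroids = (toy.groupby('Neighbourhood')
                .agg(Lat=('Lat', 'mean'),
                     Long=('Long', 'mean'),
                     cluster=('Cluster Labels', 'first'),
                     incidents=('Lat', 'size'))
                .reset_index())

print(centroids)
```

The resulting rows could then be passed to the same `folium.CircleMarker` loop, optionally scaling `radius` by the incident count.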

In [55]:
top_cluster = toronto_merged[toronto_merged['Cluster Labels'].isin([0])]