# Toronto Foursquare Analysis
## Part 1

> The Wikipedia table has been updated since the course was written, so it is already in the required format, grouped by postal code.

### Step 1
> Just need to download it and load it into a pandas DataFrame, using pandas 1.0.0 or newer, since string is now an official dtype separate from object.



In [1]:
! pip install --user pandas==1.0.5



In [2]:
import pandas as pd
print(pd.__version__)

1.0.5


In [3]:
toronto_neigh=pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]

In [4]:
toronto_neigh
toronto_neigh=toronto_neigh.astype("string")
toronto_neigh.dtypes

Postal Code      string
Borough          string
Neighbourhood    string
dtype: object

### Step 2
> Clean the table by removing the rows whose borough is "Not assigned"


In [5]:
clean_df=toronto_neigh[toronto_neigh['Borough'] != "Not assigned"]

In [6]:
clean_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


## Part 2
> Get the latitude and longitude for each postal code and add them to the DataFrame

### Step 1
> Import the geocoder

In [7]:
! pip install geopy



In [8]:
from geopy.geocoders import Nominatim

The code below only returned the latitude and longitude for the first location; after that it only returned None. I couldn't get it working. The Google API is no longer usable without an API key, and the geopy package and geocoder code have changed so much since the course was written that I had to dig through a lot of documentation. The coordinates are read from Geospatial_Coordinates.csv instead.

In [9]:
# not sure why this keeps failing: it gets the first location and then fails forever after that

#lat_list=[]
#long_list=[]
#
#for pos_c in clean_df['Postal Code']:
#    print(pos_c)
#    location=None
#    while(location is None):
#        g = Nominatim(user_agent="toronto_explorer")
#        location = g.geocode('{}, Toronto, Ontario'.format(pos_c))
#    lat_list.append(location.latitude)
#    long_list.append(location.longitude)
#    
#print(lat_list)
#print(long_list)
#clean_df['Latitude']=lat_list
#clean_df['Longitude']=long_list
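
A possible explanation for the loop above (a guess, not something confirmed here): Nominatim either has no match for a bare postal code or is rejecting rapid repeat requests, since its usage policy allows at most one request per second. In both cases `location` stays None and the `while` loop never ends. Below is a minimal sketch of a bounded retry with a pause. `geocode_with_retry` is a hypothetical helper, `geocode` stands for any callable that returns a result or None (with geopy that could be the `geocode` method of a `Nominatim` instance), and the attempt count and pause length are assumptions.

```python
import time


def geocode_with_retry(geocode, query, max_attempts=3, pause_seconds=1.0, sleep=time.sleep):
    """Call geocode(query) up to max_attempts times, pausing between tries.

    Returns the first non-None result, or None if every attempt fails,
    so a postal code with no match cannot loop forever.
    """
    for attempt in range(max_attempts):
        result = geocode(query)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(pause_seconds)  # wait before asking the service again
    return None


# Stand-in geocoder for illustration: knows one query, returns None otherwise.
def fake_geocode(query):
    known = {"M5A, Toronto, Ontario": (43.65426, -79.360636)}
    return known.get(query)


# sleep is replaced by a no-op so the demo runs instantly
found = geocode_with_retry(fake_geocode, "M5A, Toronto, Ontario", sleep=lambda s: None)
missing = geocode_with_retry(fake_geocode, "M0Z, Toronto, Ontario", sleep=lambda s: None)
print(found, missing)
```

A postal code that never resolves comes back as None and can be filled from another source, such as the CSV used in the next cell.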

In [10]:
coords=pd.read_csv('Geospatial_Coordinates.csv')
coords

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [11]:
clean_df=pd.merge(clean_df, coords, on=['Postal Code'])
clean_df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
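
`pd.merge` defaults to an inner join, so a postal code that is missing from the CSV would silently drop out of `clean_df`. A small sketch of how that could be checked, using made-up rows rather than the real tables (the `M0Z` row is invented for the example):

```python
import pandas as pd

# Toy stand-ins for clean_df and coords (values are illustrative only)
left = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M0Z"],
                     "Borough": ["North York", "North York", "Nowhere"]})
right = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                      "Latitude": [43.753259, 43.725882],
                      "Longitude": [-79.329656, -79.315572]})

# how="left" keeps every row of `left`; indicator=True adds a `_merge` column
checked = pd.merge(left, right, on="Postal Code", how="left", indicator=True)

# rows marked "left_only" had no coordinates in `right`
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list would mean the inner join lost nothing.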


## Part 3
### Step 1
> Analyze and map like in the example

In [12]:
! pip install folium
import folium



In [13]:
address = 'Toronto, Ontario, Canada'

geolocator = Nominatim(user_agent="toronto")
t_location = geolocator.geocode(address)
t_latitude = t_location.latitude
t_longitude = t_location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(t_latitude, t_longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [14]:
map_toronto = folium.Map(location=[t_latitude, t_longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(clean_df['Latitude'], clean_df['Longitude'], clean_df['Postal Code']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [15]:
CLIENT_ID = '0RGBNDWZRDACEKIXYD4TLZ24JCK2CHZ51TLYCDCPXYVRLDJP' # your Foursquare ID
CLIENT_SECRET = '01B5TCEM3IUMXB1NKTQK4ACOKFEI525X2OLAW1WLUMCN0F1J' # your Foursquare Secret
VERSION = '20200725' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0RGBNDWZRDACEKIXYD4TLZ24JCK2CHZ51TLYCDCPXYVRLDJP
CLIENT_SECRET:01B5TCEM3IUMXB1NKTQK4ACOKFEI525X2OLAW1WLUMCN0F1J
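
Hard-coding and printing the secret leaves it in the saved notebook. One alternative is to read it from the environment and print only a masked form. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are made up for this example, and both helper functions are hypothetical.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables.

    The variable names are assumptions for this sketch, not a Foursquare convention.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret


def mask(value, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return value[:visible] + "*" * (len(value) - visible)


# Example with a plain dict standing in for os.environ (placeholder values)
demo_env = {"FOURSQUARE_CLIENT_ID": "ABCD1234", "FOURSQUARE_CLIENT_SECRET": "WXYZ5678"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(demo_id))
print("CLIENT_SECRET: " + mask(demo_secret))
```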


In [16]:
clean_df.loc[0, 'Postal Code']

neighborhood_latitude = clean_df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = clean_df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = clean_df.loc[0, 'Postal Code'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of M3A are 43.7532586, -79.3296565.


In [17]:
import requests
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius


url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f24c6971474c964139f8e7a'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': 43.757758604500005,
    'lng': -79.32343823984928},
   'sw': {'lat': 43.7487585955, 'lng': -79.33587476015072}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
        'c
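
The request URL above is assembled with a long `format` string. As an alternative, the query can be built from a dict so each parameter is named and URL-encoded automatically. This is only a sketch: `build_explore_url` is a hypothetical helper and the credentials passed to it are placeholders. `requests.get` also accepts the same kind of dict through its `params` argument.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL from a dict of query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude as one value
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; coordinates are the M3A values printed earlier
demo_url = build_explore_url("ID", "SECRET", "20200725", 43.7532586, -79.3296565)

# Parse the query back to confirm what was encoded
query = parse_qs(urlsplit(demo_url).query)
print(query["ll"], query["radius"], query["limit"])
```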

In [18]:
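def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # a venue can come back with no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']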

In [19]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114
2,Corrosion Service Company Limited,Construction & Landscaping,43.752432,-79.334661


In [20]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

3 venues were returned by Foursquare.


In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
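
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an IndexError for a venue with an empty category list (the case `get_category_type` guards against). It did not happen in this run, but a more defensive version could look like the sketch below. `flatten_items` is a hypothetical helper and the demo items are hand-written in the shape `getNearbyVenues` expects, not real API output.

```python
def flatten_items(name, lat, lng, items):
    """Turn explore items into flat tuples, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # use None when the venue has no category instead of failing on [0]
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hand-written items for illustration only
demo_items = [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]
demo_rows = flatten_items("M3A", 43.753259, -79.329656, demo_items)
print(demo_rows[0][-1], demo_rows[1][-1])
```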

In [22]:
t_venues = getNearbyVenues(names=clean_df['Postal Code'],
                                   latitudes=clean_df['Latitude'],
                                   longitudes=clean_df['Longitude'])

M3A
M4A
M5A
M6A
M7A
M9A
M1B
M3B
M4B
M5B
M6B
M9B
M1C
M3C
M4C
M5C
M6C
M9C
M1E
M4E
M5E
M6E
M1G
M4G
M5G
M6G
M1H
M2H
M3H
M4H
M5H
M6H
M1J
M2J
M3J
M4J
M5J
M6J
M1K
M2K
M3K
M4K
M5K
M6K
M1L
M2L
M3L
M4L
M5L
M6L
M9L
M1M
M2M
M3M
M4M
M5M
M6M
M9M
M1N
M2N
M3N
M4N
M5N
M6N
M9N
M1P
M2P
M4P
M5P
M6P
M9P
M1R
M2R
M4R
M5R
M6R
M7R
M9R
M1S
M4S
M5S
M6S
M1T
M4T
M5T
M1V
M4V
M5V
M8V
M9V
M1W
M4W
M5W
M8W
M9W
M1X
M4X
M5X
M8X
M4Y
M7Y
M8Y
M8Z


In [23]:
print(t_venues.shape)
t_venues.head()

(2140, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M3A,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,M3A,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,M3A,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping
3,M4A,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,M4A,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [24]:
t_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M1B,1,1,1,1,1,1
M1C,1,1,1,1,1,1
M1E,9,9,9,9,9,9
M1G,3,3,3,3,3,3
M1H,8,8,8,8,8,8
...,...,...,...,...,...,...
M9N,2,2,2,2,2,2
M9P,8,8,8,8,8,8
M9R,4,4,4,4,4,4
M9V,8,8,8,8,8,8


In [25]:
print('There are {} unique categories.'.format(len(t_venues['Venue Category'].unique())))

There are 267 unique categories.


In [26]:
t_onehot = pd.get_dummies(t_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
t_onehot['Neighborhood'] = t_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [t_onehot.columns[-1]] + list(t_onehot.columns[:-1])
t_onehot = t_onehot[fixed_columns]

t_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [27]:
t_onehot.shape

(2140, 267)
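
The shape here is worth a second look: 267 dummy columns plus the added `Neighborhood` column should give 268, not 267. One explanation (an inference from the shapes, not something checked against the data) is that Foursquare returned a venue category literally named "Neighborhood", so the assignment overwrote that dummy column instead of adding a new one. That would also explain why `Yoga Studio`, the last column alphabetically, ended up first after the reorder. Below is a toy sketch of the collision and one way around it, using a key column name that cannot be a venue category. The rows are made up.

```python
import pandas as pd

# Toy venues table; one venue has the category "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["M3A", "M3A", "M4A"],
    "Venue Category": ["Park", "Neighborhood", "Yoga Studio"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)

# True here means assigning onehot['Neighborhood'] would overwrite a dummy column
collides = "Neighborhood" in onehot.columns

# insert the key as the first column under a name no category can have
onehot.insert(0, "Postal Code", venues["Neighborhood"])

grouped = onehot.groupby("Postal Code").mean().reset_index()
print(collides, onehot.shape)
print(list(grouped.columns))
```

With this approach the later cells would group and look up by the new key column instead of `Neighborhood`.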

In [28]:
t_grouped = t_onehot.groupby('Neighborhood').mean().reset_index()
t_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,M9N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
97,M9P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
98,M9R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
99,M9V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [29]:
t_grouped.shape

(101, 267)

In [30]:
num_top_venues = 5

for hood in t_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = t_grouped[t_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M1B----
                  venue  freq
0  Fast Food Restaurant   1.0
1                 Motel   0.0
2     Martial Arts Dojo   0.0
3        Massage Studio   0.0
4        Medical Center   0.0


----M1C----
                 venue  freq
0                  Bar   1.0
1          Yoga Studio   0.0
2   Miscellaneous Shop   0.0
3  Moroccan Restaurant   0.0
4  Monument / Landmark   0.0


----M1E----
                venue  freq
0       Moving Target  0.11
1  Mexican Restaurant  0.11
2      Medical Center  0.11
3                Bank  0.11
4          Restaurant  0.11


----M1G----
                 venue  freq
0          Coffee Shop  0.67
1    Korean Restaurant  0.33
2          Yoga Studio  0.00
3  Moroccan Restaurant  0.00
4  Monument / Landmark  0.00


----M1H----
                 venue  freq
0                 Bank  0.12
1  Fried Chicken Joint  0.12
2     Hakka Restaurant  0.12
3               Bakery  0.12
4          Gas Station  0.12


----M1J----
                       venue  freq
0            

4  Monument / Landmark   0.0


----M5P----
              venue  freq
0     Jewelry Store  0.25
1  Sushi Restaurant  0.25
2          Bus Line  0.25
3             Trail  0.25
4             Motel  0.00


----M5R----
            venue  freq
0            Café  0.14
1  Sandwich Place  0.14
2     Coffee Shop  0.09
3     Pizza Place  0.05
4    Burger Joint  0.05


----M5S----
            venue  freq
0            Café  0.14
1             Bar  0.06
2  Sandwich Place  0.06
3          Bakery  0.06
4      Restaurant  0.06


----M5T----
                           venue  freq
0                           Café  0.08
1                    Coffee Shop  0.06
2  Vegetarian / Vegan Restaurant  0.06
3             Mexican Restaurant  0.05
4                            Bar  0.05


----M5V----
              venue  freq
0   Airport Service  0.20
1    Airport Lounge  0.13
2  Airport Terminal  0.13
3   Harbor / Marina  0.07
4           Airport  0.07


----M5W----
                venue  freq
0         Coffee Shop  0.

In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [32]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = t_grouped['Neighborhood']

for ind in np.arange(t_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(t_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Fast Food Restaurant,Women's Store,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
1,M1C,Bar,Women's Store,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant,Falafel Restaurant
2,M1E,Electronics Store,Moving Target,Breakfast Spot,Medical Center,Intersection,Mexican Restaurant,Bank,Rental Car Location,Restaurant,Cosmetics Shop
3,M1G,Coffee Shop,Korean Restaurant,Women's Store,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Donut Shop
4,M1H,Fried Chicken Joint,Gas Station,Bank,Hakka Restaurant,Athletics & Sports,Caribbean Restaurant,Thai Restaurant,Bakery,Department Store,Dessert Shop


In [33]:
# set number of clusters
kclusters = 5

t_grouped_clustering = t_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(t_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 3, 0, 3, 3, 3, 3, 3, 3])

In [34]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

t_merged = clean_df

# join the sorted venues onto clean_df so each postal code keeps its latitude/longitude
t_merged = t_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Postal Code')

t_merged.head(50) # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4.0,Construction & Landscaping,Park,Food & Drink Shop,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
1,M4A,North York,Victoria Village,43.725882,-79.315572,3.0,French Restaurant,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena,Discount Store,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,3.0,Coffee Shop,Bakery,Pub,Park,Breakfast Spot,Café,Theater,Event Space,Chocolate Shop,Beer Store
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,3.0,Clothing Store,Accessories Store,Women's Store,Event Space,Shoe Store,Miscellaneous Shop,Furniture / Home Store,Boutique,Vietnamese Restaurant,Coffee Shop
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,3.0,Coffee Shop,Diner,Gym,Park,Mexican Restaurant,Japanese Restaurant,Italian Restaurant,Hobby Shop,General Entertainment,Fried Chicken Joint
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242,2.0,Pizza Place,Dance Studio,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,3.0,Fast Food Restaurant,Women's Store,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
7,M3B,North York,Don Mills,43.745906,-79.352188,3.0,Gym,Café,Caribbean Restaurant,Japanese Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,3.0,Pizza Place,Pharmacy,Gastropub,Café,Intersection,Athletics & Sports,Bank,Gym / Fitness Center,Pet Store,Doner Restaurant
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,3.0,Clothing Store,Coffee Shop,Café,Italian Restaurant,Bubble Tea Shop,Japanese Restaurant,Cosmetics Shop,Fast Food Restaurant,Electronics Store,Pizza Place


In [35]:
t_merged=t_merged.dropna()

In [36]:
t_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4.0,Construction & Landscaping,Park,Food & Drink Shop,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
1,M4A,North York,Victoria Village,43.725882,-79.315572,3.0,French Restaurant,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena,Discount Store,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,3.0,Coffee Shop,Bakery,Pub,Park,Breakfast Spot,Café,Theater,Event Space,Chocolate Shop,Beer Store
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,3.0,Clothing Store,Accessories Store,Women's Store,Event Space,Shoe Store,Miscellaneous Shop,Furniture / Home Store,Boutique,Vietnamese Restaurant,Coffee Shop
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,3.0,Coffee Shop,Diner,Gym,Park,Mexican Restaurant,Japanese Restaurant,Italian Restaurant,Hobby Shop,General Entertainment,Fried Chicken Joint
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,3.0,River,Pool,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,3.0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Café,Hotel,Bubble Tea Shop,Pub,Yoga Studio
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,3.0,Light Rail Station,Yoga Studio,Auto Workshop,Skate Park,Burrito Place,Spa,Farmers Market,Fast Food Restaurant,Restaurant,Butcher
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,1.0,Business Service,Baseball Field,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store,Falafel Restaurant


In [37]:
map_clusters = folium.Map(location=[t_latitude, t_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(t_merged['Latitude'], t_merged['Longitude'], t_merged['Postal Code'], t_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Final notes

> Honestly, this was a really hard and frustrating assignment. I spent far more time going through the pandas, Jupyter, and geopy documentation just to make things work than on actually learning and understanding the map analysis and the end result. So I really don't know what my results mean or how accurate they are, nor do I have the time to figure that out. It took a lot of effort to get the course's pandas, Jupyter, and geopy code working, as the course needs updating to match more recent changes, including pandas 1.0.0+ and geopy 2.0. pandas has always annoyed me by making me dig into its internals to figure out what is going on and how to get things working. I admit I much prefer R's tidyverse over pandas by a long shot.