# Business Problem

The aim is to help tourists choose their destinations depending on the experiences that the neighbourhoods have to offer and what they would want to have. This also helps people make decisions if they are thinking about migrating to London or Toronto or even if they want to relocate neighbourhoods within the city. Our findings will help stakeholders make informed decisions and address any concerns they have including the different kinds of cuisines, provision stores and what the city has to offer.

# Data Description

We require the gro location data of both the cities. we need the postal codes to find the nearby neighbourhoods, boroughs, venues and their most popular venue categories.

## London

To derive our solution, We scrape our data from https://en.wikipedia.org/wiki/List_of_areas_of_London

This wikipedia page has information about all the neighbourhoods, we limit it London.

1. *borough* : Name of Neighbourhood
2. *town* : Name of borough
3. *post_code* : Postal codes for London.

This wikipedia page lacks information about the geographical locations. To solve this problem we use ArcGIS API

## ArcGIS API
we use ArcGIS to get the geo locations of the neighbourhoods of London. The following columns are added to our initial dataset which prepares our data. 

4. *latitude* : Latitude for Neighbourhood
5. *longitude* : Longitude for Neighbourhood

## Toronto

To derive our solution, We scrape the data from https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&direction=prev&oldid=926287641


The JSON file has data about all the neighbourhoods in Canada, we limit to toronto.

1. *post_code* : Postal codes for Toronto.
2. *borough* : Name of the boroughs
3. *Neighbourhood* : Name of Neighbourhood

We use data from https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv to get the latitudes and logitudes for the postal codes. 

4. *latitude* : Latitude for Neighbourhood
5. *longitude* : Longitude for Neighbourhood

## Foursquare API Data

We will like knowledge regarding totally different venues in numerous neighbourhoods of that specific borough. so as to realize that info we'll use "Foursquare" locational info. Foursquare may be a location knowledge supplier with info regarding all manner of venues and events at intervals a part of interest. Such info includes venue names, locations, menus and even photos. As such, the foursquare location platform are going to be used because the sole knowledge supply since all the expressed needed info are often obtained through the API.

After finding the list of neighbourhoods, we then connect to the Foursquare API to gather information about venues inside each and every neighbourhood. For each neighbourhood, we have chosen the radius to be 500 meters.

Based on all the knowledge collected for each London and Toronto, we've got spare information to create our model. we have a tendency to cluster the neighbourhoods along supported similar venue classes. we have a tendency to then gift our observations and findings. victimization this information, our stakeholders will take the required call.

# Libraries used

In [113]:
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim 
import folium

# import k-means for the clustering stage
from sklearn.cluster import KMeans

# Exploring London

## Data Collection

In [5]:
wiki_london = requests.get("https://en.wikipedia.org/wiki/List_of_areas_of_London")
london_data = pd.read_html(wiki_london.text)

In [7]:
london_data = london_data[1]
london_data

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
...,...,...,...,...,...,...
526,Woolwich,Greenwich,LONDON,SE18,020,TQ435795
527,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,020,TQ225655
528,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12,020,TQ225815
529,Yeading,Hillingdon,HAYES,UB4,020,TQ115825


## Data Preprocessing

In [8]:
london_data.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)

## Feature Selection

we only need borough,Neighborhood and postal codes

In [9]:
df = london_data.drop( [ london_data.columns[0], london_data.columns[4], london_data.columns[5] ], axis=1)

In [10]:
df.columns = ['borough','neighborhood','post_code']
df

Unnamed: 0,borough,neighborhood,post_code
0,"Bexley, Greenwich [7]",LONDON,SE2
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4"
2,Croydon[8],CROYDON,CR0
3,Croydon[8],CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
...,...,...,...
526,Greenwich,LONDON,SE18
527,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4
528,Hammersmith and Fulham,LONDON,W12
529,Hillingdon,HAYES,UB4


## Geolocations of the London Neighbourhoods
### ArcGis API
We need to get the geographical co-ordinates for the neighbourhoods to plot out map.

In [12]:
from arcgis.geocoding import geocode
from arcgis.gis import GIS
gis = GIS()

In [13]:
def get_x_y_uk(address1):
   lat_coords = 0
   lng_coords = 0
   g = geocode(address='{}, London, England, GBR'.format(address1))[0]
   lng_coords = g['location']['x']
   lat_coords = g['location']['y']
   return str(lat_coords) +","+ str(lng_coords)

In [14]:
c = get_x_y_uk('SE2')
c

'51.499741450000045,0.12406135200006929'

In [18]:
geo_coordinates_uk = df['post_code']    
geo_coordinates_uk

0            SE2
1         W3, W4
2            CR0
3            CR0
4      DA5, DA14
         ...    
526         SE18
527          KT4
528          W12
529          UB4
530          UB7
Name: post_code, Length: 531, dtype: object

In [19]:
coordinates_latlng_uk = geo_coordinates_uk.apply(lambda x: get_x_y_uk(x))
coordinates_latlng_uk

0       51.499741450000045,0.12406135200006929
1        51.49776500000007,-0.2558519459999502
2       51.38475500000004,-0.05149847299992416
3       51.38475500000004,-0.05149847299992416
4      51.507408360000056,-0.12769869299995662
                        ...                   
526     51.50312900000006,-0.10802518599996347
527    51.507408360000056,-0.12769869299995662
528    51.515085000000056,-0.24269643599996016
529    51.507408360000056,-0.12769869299995662
530    51.507408360000056,-0.12769869299995662
Name: post_code, Length: 531, dtype: object

## Latitude

In [20]:
lat_uk = coordinates_latlng_uk.apply(lambda x: x.split(',')[0])
lat_uk

0      51.499741450000045
1       51.49776500000007
2       51.38475500000004
3       51.38475500000004
4      51.507408360000056
              ...        
526     51.50312900000006
527    51.507408360000056
528    51.515085000000056
529    51.507408360000056
530    51.507408360000056
Name: post_code, Length: 531, dtype: object

## Longitude

In [57]:
lng_uk = coordinates_latlng_uk.apply(lambda x: x.split(',')[1])
lng_uk

0       0.12406135200006929
1       -0.2558519459999502
2      -0.05149847299992416
3      -0.05149847299992416
4      -0.12769869299995662
               ...         
526    -0.10802518599996347
527    -0.12769869299995662
528    -0.24269643599996016
529    -0.12769869299995662
530    -0.12769869299995662
Name: post_code, Length: 531, dtype: object

We now have the geographical co-ordinates of the London Neighbourhoods.

We proceed with Merging our source data with the geographical co-ordinates 

In [69]:
london_merged = pd.concat([df,lat_uk.astype(float), lng_uk.astype(float)], axis=1)
london_merged.columns= ['borough','neighborhood','post_code','latitude','longitude']
london_merged

Unnamed: 0,borough,neighborhood,post_code,latitude,longitude
0,"Bexley, Greenwich [7]",LONDON,SE2,51.499741,0.124061
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",51.497765,-0.255852
2,Croydon[8],CROYDON,CR0,51.384755,-0.051498
3,Croydon[8],CROYDON,CR0,51.384755,-0.051498
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14",51.507408,-0.127699
...,...,...,...,...,...
526,Greenwich,LONDON,SE18,51.503129,-0.108025
527,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,51.507408,-0.127699
528,Hammersmith and Fulham,LONDON,W12,51.515085,-0.242696
529,Hillingdon,HAYES,UB4,51.507408,-0.127699


### Co-ordinates for London

In [60]:
london = geocode(address='London, England, GBR')[0]
london_lng_coords = london['location']['x']
london_lat_coords = london['location']['y']
london_lng_coords

-0.12769869299995662

In [61]:
london_lat_coords

51.507408360000056

## Visualize the Map of London

In [62]:
# Creating the map of London
map_London = folium.Map(location=[london_lat_coords, london_lng_coords], zoom_start=12)
map_London

# adding markers to map
for latitude, longitude, borough, town in zip(london_merged['latitude'], london_merged['longitude'], london_merged['borough'], london_merged['neighborhood']):
    label = '{}, {}'.format(town, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_London)  
    
map_London

### Venues in London

To proceed with the next part, we need to define Foursquare API credentials.

Using Foursquare API, we are able to get the venue and venue categories around each neighbourhood in London.

In [74]:
CLIENT_ID = '1MEGV2SFZYQGYSH4YEOUJHDJRNHOZ45XGGGRYZWORE30UAQE' # your Foursquare ID
CLIENT_SECRET = '1BY3IPKHSRNCR5LNWGEFW3BNSWJDODBC1JV52A4N2YFXJINY' # your Foursquare Secret
VERSION = '20210501' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
LIMIT = 100

Your credentails:
CLIENT_ID: 1MEGV2SFZYQGYSH4YEOUJHDJRNHOZ45XGGGRYZWORE30UAQE
CLIENT_SECRET:1BY3IPKHSRNCR5LNWGEFW3BNSWJDODBC1JV52A4N2YFXJINY


Defining a function to get the neraby venues in the neighbourhood. This will help us get venue categories which is important for our analysis

In [76]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Getting the venues in London

In [77]:
venues_in_London = getNearbyVenues(london_merged['borough'], london_merged['latitude'], london_merged['longitude'])

Bexley, Greenwich [7]
Ealing, Hammersmith and Fulham[8]
Croydon[8]
Croydon[8]
Bexley
Redbridge[9]
City[10]
Westminster[10]
Brent[11]
Bromley[11]
Islington[8]
Bromley[11]
Islington[12]
Havering[12]
Barnet[12]
Enfield[12]
Wandsworth[13]
Southwark[14]
City[14]
Barking and Dagenham[14]
Redbridge[15]
Bexley[15]
Richmond upon Thames[15]
Bexley[16]
Barnet
Barnet[16]
Islington[17]
Wandsworth[18]
Westminster[19]
Bromley[20]
Newham[20]
Barking and Dagenham[20]
Barking and Dagenham[21]
Sutton[21]
Ealing[21]
Westminster[22]
Lewisham[22]
Harrow[22]
Sutton[22]
Camden[23]
Bexley[23]
Southwark[24]
Kingston upon Thames[24]
Tower Hamlets[25]
Bexley[25]
Bexley[26]
Bromley[26]
Bromley[26]
Bexley[27]
City[27]
Lewisham[28]
Greenwich
Tower Hamlets[28]
Bexley
Camden[29]
Enfield[30]
Haringey[31]
Tower Hamlets[31]
Haringey[32]
Hounslow[33]
Barnet
Brent
Enfield[34]
Lambeth[34]
Lewisham[34]
Bromley[35]
Tower Hamlets[36]
Bromley[35]
Kensington and Chelsea, Hammersmith and Fulham[36]
Brent[36]
Barnet[37]
Enfield[38

In [78]:
venues_in_London.head()

Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Bexley, Greenwich [7]",51.499741,0.124061,Southmere Lake,51.500381,0.125012,Lake
1,"Ealing, Hammersmith and Fulham[8]",51.497765,-0.255852,Hack & Veldt,51.494845,-0.255256,Coffee Shop
2,"Ealing, Hammersmith and Fulham[8]",51.497765,-0.255852,Good Boy Coffee,51.49445,-0.255996,Coffee Shop
3,"Ealing, Hammersmith and Fulham[8]",51.497765,-0.255852,Lara Restaurant,51.495515,-0.255263,Mediterranean Restaurant
4,"Ealing, Hammersmith and Fulham[8]",51.497765,-0.255852,Chief Coffee,51.493997,-0.254861,Coffee Shop


In [79]:
venues_in_London.shape

(28548, 7)

### Grouping by Venue Categories

We need to now see how many Venue Categories are there for further processing

In [80]:
venues_in_London.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Accessories Store,Camden[29],51.517165,-0.126810,James Smith & Sons,51.516902,-0.126843
Adult Boutique,Hackney,51.529675,-0.083470,Sh! Women's Erotic Emporium,51.527102,-0.083728
Afghan Restaurant,Enfield,51.535185,-0.100543,Afghan Kitchen,51.535760,-0.102973
African Restaurant,Westminster,51.586805,-0.065515,Le Chamarel,51.589688,-0.070170
American Restaurant,Westminster[19],51.515390,0.029115,When Mac Met Cheese,51.517239,0.024169
...,...,...,...,...,...,...
Wine Bar,Westminster[22],51.557670,-0.069523,Vivat Bacchus,51.555434,-0.068002
Wine Shop,Westminster[22],51.553177,-0.061776,Wine Pantry,51.556078,-0.061502
Winery,Lambeth,51.463370,-0.115820,The Wine Parlour,51.461466,-0.111461
Women's Store,Westminster[22],51.504456,-0.140785,Victoria Beckham,51.508666,-0.142451


We can see 321 records, just goes to show how diverse and interesting the place is.

### One Hot Encoding 
We need to Encode our venue categories to get a better result for our clustering

In [81]:
London_venue_cat = pd.get_dummies(venues_in_London[['Venue Category']], prefix="", prefix_sep="")
London_venue_cat

Unnamed: 0,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28543,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
28544,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
28545,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
28546,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Adding Neighbourhood into the mix.

In [82]:
London_venue_cat['Neighbourhood'] = venues_in_London['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [London_venue_cat.columns[-1]] + list(London_venue_cat.columns[:-1])
London_venue_cat = London_venue_cat[fixed_columns]

London_venue_cat.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,"Bexley, Greenwich [7]",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Ealing, Hammersmith and Fulham[8]",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Ealing, Hammersmith and Fulham[8]",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Ealing, Hammersmith and Fulham[8]",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Ealing, Hammersmith and Fulham[8]",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Venue categories mean value
We will group the Neighbourhoods and calculate the mean venue categories value in each Neighbourhood

In [83]:
London_grouped = London_venue_cat.groupby('Neighbourhood').mean().reset_index()
London_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,Barking and Dagenham,0.0,0.0,0.0,0.010178,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.030534,0.0,0.0,0.0,0.0
1,Barking and Dagenham[14],0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Barking and Dagenham[20],0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.030928,0.0,0.0,0.0,0.0
3,Barking and Dagenham[21],0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.030928,0.0,0.0,0.0,0.0
4,Barnet,0.0,0.0,0.0,0.001773,0.0,0.0,0.0,0.0,0.0,...,0.007092,0.0,0.010638,0.0,0.0,0.005319,0.0,0.0,0.0,0.0



Let's make a function to get the top most common venue categories

In [84]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


There are way too many venue categories, we can take the top 10 to cluster the neighbourhoods.

Creating a function to label the columns of the venue correctly

In [85]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))


### Top venue categories

Getting the top venue categories in London

In [86]:
# create a new dataframe for London
neighborhoods_venues_sorted_london = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_london['Neighbourhood'] = London_grouped['Neighbourhood']

for ind in np.arange(London_grouped.shape[0]):
    neighborhoods_venues_sorted_london.iloc[ind, 1:] = return_most_common_venues(London_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_london.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,Theater,Hotel,Plaza,Pub,Bakery,English Restaurant,Japanese Restaurant,Ice Cream Shop,Cocktail Bar,Garden
1,Barking and Dagenham[14],Grocery Store,Spa,Supermarket,Park,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market
2,Barking and Dagenham[20],Theater,Hotel,Pub,Plaza,Bakery,Japanese Restaurant,Monument / Landmark,English Restaurant,Cocktail Bar,Garden
3,Barking and Dagenham[21],Theater,Hotel,Pub,Plaza,Bakery,Japanese Restaurant,Monument / Landmark,English Restaurant,Cocktail Bar,Garden
4,Barnet,Pub,Coffee Shop,Park,Bakery,Café,Cocktail Bar,Bus Stop,Gastropub,Theater,Hotel


## Model Building

### K Means
Let's cluster the city of london to roughly 5 to make it easier to analyze. 

We use the K Means clustering technique to do so.

In [87]:
# set number of clusters
k_num_clusters = 5

London_grouped_clustering = London_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans_london = KMeans(n_clusters=k_num_clusters, random_state=0).fit(London_grouped_clustering)
kmeans_london

KMeans(n_clusters=5, random_state=0)

### Labelling Clustered Data

In [88]:
kmeans_london.labels_

array([3, 0, 3, 3, 0, 0, 3, 3, 0, 0, 3, 3, 3, 0, 1, 0, 3, 3, 3, 1, 3, 3,
       0, 0, 3, 3, 3, 0, 3, 3, 3, 3, 3, 0, 3, 0, 3, 0, 0, 3, 3, 0, 3, 4,
       3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3,
       3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 3, 3, 4,
       3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 3, 0, 0, 3], dtype=int32)

So our model has labeled the city

In [89]:
neighborhoods_venues_sorted_london.insert(0, 'Cluster Labels', kmeans_london.labels_ +1)

Join London_merged with our neighbourhood venues sorted to add latitude & longitude for each of the neighborhood to prepare it for plotting

In [90]:
london_data = london_merged

london_data = london_data.join(neighborhoods_venues_sorted_london.set_index('Neighbourhood'), on='borough')

london_data.head()

Unnamed: 0,borough,neighborhood,post_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bexley, Greenwich [7]",LONDON,SE2,51.499741,0.124061,2,Lake,Yoga Studio,Ethiopian Restaurant,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",51.497765,-0.255852,1,Coffee Shop,Park,Spa,Grocery Store,Pub,Comedy Club,Fish & Chips Shop,Soccer Field,Bus Stop,Platform
2,Croydon[8],CROYDON,CR0,51.384755,-0.051498,5,Grocery Store,Yoga Studio,Food Stand,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
3,Croydon[8],CROYDON,CR0,51.384755,-0.051498,5,Grocery Store,Yoga Studio,Food Stand,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14",51.507408,-0.127699,4,Theater,Hotel,Bakery,Plaza,Pub,English Restaurant,Cocktail Bar,Japanese Restaurant,Monument / Landmark,Wine Bar



Drop all the NaN values to prevent data skew

In [91]:
london_data_nonan = london_data.dropna(subset=['Cluster Labels'])

### Visualizing the clustered neighbourhood
Let's plot the clusters

In [92]:
map_clusters_london = folium.Map(location=[london_lat_coords, london_lng_coords], zoom_start=12)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_data_nonan['latitude'], london_data_nonan['longitude'], london_data_nonan['borough'], london_data_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters_london)
        
map_clusters_london

## Examining our Clusters

Cluster 1

In [93]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 1, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,LONDON,1,Coffee Shop,Park,Spa,Grocery Store,Pub,Comedy Club,Fish & Chips Shop,Soccer Field,Bus Stop,Platform
7,LONDON,1,Coffee Shop,Pub,Sandwich Place,Hotel,Restaurant,Theater,French Restaurant,Italian Restaurant,History Museum,Building
10,LONDON,1,Pub,Coffee Shop,Cocktail Bar,Sandwich Place,Thai Restaurant,Bar,Grocery Store,Bus Stop,Café,Food & Drink Shop
12,LONDON,1,Café,Thai Restaurant,Pub,Park,Indian Restaurant,Italian Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Cocktail Bar,Coffee Shop
15,LONDON,1,Pub,Coffee Shop,Cocktail Bar,Sandwich Place,Thai Restaurant,Bar,Grocery Store,Bus Stop,Café,Food & Drink Shop
...,...,...,...,...,...,...,...,...,...,...,...,...
519,LONDON,1,Pub,Café,Coffee Shop,Cocktail Bar,Gastropub,Hotel,Theater,Food & Drink Shop,Breakfast Spot,French Restaurant
520,LONDON,1,Café,Pub,Grocery Store,Coffee Shop,Park,Bus Stop,Yoga Studio,Hotel,Fast Food Restaurant,Restaurant
525,LONDON,1,Pub,Coffee Shop,Park,Bakery,Café,Cocktail Bar,Bus Stop,Gastropub,Theater,Hotel
526,LONDON,1,Pub,Coffee Shop,Hotel,Bar,Gym / Fitness Center,Café,Italian Restaurant,Grocery Store,Bakery,Thai Restaurant


Cluster 2

In [94]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 2, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,LONDON,2,Lake,Yoga Studio,Ethiopian Restaurant,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
45,"BEXLEYHEATH, LONDON",2,Lake,Yoga Studio,Ethiopian Restaurant,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant


Cluster 3

In [95]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 3, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,LONDON,3,Playground,Hardware Store,Yoga Studio,Filipino Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market


Cluster 4

In [96]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 4, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,"BEXLEY, SIDCUP",4,Theater,Hotel,Bakery,Plaza,Pub,English Restaurant,Cocktail Bar,Japanese Restaurant,Monument / Landmark,Wine Bar
5,ILFORD,4,Theater,Hotel,Pub,Plaza,Bakery,Japanese Restaurant,Monument / Landmark,English Restaurant,Cocktail Bar,Garden
6,LONDON,4,Hotel,Coffee Shop,Cocktail Bar,Gym / Fitness Center,Restaurant,Italian Restaurant,Wine Bar,Garden,English Restaurant,Asian Restaurant
8,WEMBLEY,4,Theater,Hotel,Pub,Plaza,Bakery,Japanese Restaurant,Monument / Landmark,English Restaurant,Cocktail Bar,Garden
9,LONDON,4,Theater,Hotel,Bakery,Plaza,Pub,English Restaurant,Steakhouse,Ice Cream Shop,Cocktail Bar,Japanese Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...
523,ISLEWORTH,4,Theater,Hotel,Pub,Bakery,Plaza,Monument / Landmark,Garden,Ice Cream Shop,Japanese Restaurant,Steakhouse
524,CROYDON,4,Theater,Hotel,Pub,Bakery,Plaza,Wine Bar,Garden,Ice Cream Shop,Cocktail Bar,Steakhouse
527,WORCESTER PARK,4,Theater,Hotel,Pub,Plaza,Bakery,Japanese Restaurant,Monument / Landmark,English Restaurant,Cocktail Bar,Garden
529,HAYES,4,Theater,Hotel,Pub,Plaza,Bakery,Japanese Restaurant,Monument / Landmark,English Restaurant,Cocktail Bar,Garden


Cluster 5

In [97]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 5, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,CROYDON,5,Grocery Store,Yoga Studio,Food Stand,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
3,CROYDON,5,Grocery Store,Yoga Studio,Food Stand,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
33,"WALLINGTON, CROYDON",5,Grocery Store,Yoga Studio,Food Stand,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant


# Exploring Toronto

## Data Collection

To get the neighbourhoods in london, we start by scraping the list of areas of toronto wiki page. after scraping, 

In [107]:
html_data = requests.get('https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&direction=prev&oldid=926287641')
wiki_raw = pd.read_html(html_data.content, header = 0)[0]
df1 = wiki_raw[wiki_raw.Borough != 'Not assigned']
df1 = df1.rename(columns={'Postcode': 'Postal Code','Neighbourhood':'Neighborhood'})
df1.reset_index(inplace = True)
del df1['index']
df1.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


In [118]:
df1.shape

(211, 5)

we have 211 neighborhoods to work with.

we use the csv containing in the url to use

In [110]:
df1_geo = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv')
df1_geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Data Preprocessing

merging our dataframe with new coordinates to work with

In [111]:
df1 = df1.join(df1_geo.set_index('Postal Code'), on='Postal Code')
df1.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
4,M6A,North York,Lawrence Heights,43.718518,-79.464763


In [119]:
df1.shape

(211, 5)

### Co-ordinates for Toronto

Getting the geocode for toronto to help visualize it on the map

In [114]:
address = 'Toronto'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


## Visualize the Map of Toronto

To help visualize the Map of Toronto and the neighbourhoods in Toronto, we make use of the folium package.

In [116]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df1['Latitude'], df1['Longitude'], df1['Borough'], df1['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Venues in Toronto

To proceed with the next part, we need to define Foursquare API credentials.

Using Foursquare API, we are able to get the venue and venue categories around each neighbourhood in toronto.

Getting the venues in Toronto

In [122]:
toronto_venues = getNearbyVenues(names=df1['Neighborhood'],
                                   latitudes=df1['Latitude'],
                                   longitudes=df1Getting the venues in London['Longitude']
                                  )

Parkwoods
Victoria Village
Harbourfront
Regent Park
Lawrence Heights
Lawrence Manor
Not assigned
Islington Avenue
Rouge
Malvern
Don Mills North
Woodbine Gardens
Parkview Hill
Ryerson
Garden District
Glencairn
Cloverdale
Islington
Martin Grove
Princess Gardens
West Deane Park
Highland Creek
Rouge Hill
Port Union
Flemingdon Park
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Bloordale Gardens
Eringate
Markland Wood
Old Burnhamthorpe
Guildwood
Morningside
West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor
Downsview North
Wilson Heights
Thorncliffe Park
Adelaide
King
Richmond
Dovercourt Village
Dufferin
Scarborough Village
Fairview
Henry Farm
Oriole
Northwood Park
York University
East Toronto
Harbourfront East
Toronto Islands
Union Station
Little Portugal
Trinity
East Birchmount Park
Ionview
Kennedy Park
Bayview Village
CFB Toronto
Downsview East
The Danforth West
Riverdale
Design E

In [123]:
print('{} venues were returned by Foursquare.'.format(toronto_venues.shape[0]))
toronto_venues = toronto_venues.rename(columns={'Neighbourhood':'Neighborhood'})
toronto_venues.head()

4214 venues were returned by Foursquare.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [124]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 271 uniques categories.


### One Hot Encoding 
We need to Encode our venue categories to get a better result for our clustering

In [125]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
manhattan_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()### One Hot Encoding 
We need to Encode our venue categories to get a better result for our clustering

Unnamed: 0,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [126]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,...,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Agincourt North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Albion Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alderwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0



There are way too many venue categories, we can take the top 10 to cluster the neighbourhoods.

Creating a function to label the columns of the venue correctly

In [149]:
num_top_venues = 10

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
             venue  freq
0      Coffee Shop  0.11
1             Café  0.05
2       Restaurant  0.04
3  Thai Restaurant  0.04
4              Gym  0.03
5   Clothing Store  0.03
6    Deli / Bodega  0.03
7   Cosmetics Shop  0.02
8    Burrito Place  0.02
9        Bookstore  0.02


----Agincourt----
                             venue  freq
0                   Clothing Store  0.25
1                           Lounge  0.25
2                   Breakfast Spot  0.25
3        Latin American Restaurant  0.25
4               Miscellaneous Shop  0.00
5              Moroccan Restaurant  0.00
6              Monument / Landmark  0.00
7  Molecular Gastronomy Restaurant  0.00
8       Modern European Restaurant  0.00
9                Mobile Phone Shop  0.00


----Agincourt North----
                             venue  freq
0                       Playground  0.33
1                             Park  0.33
2                     Intersection  0.33
3        Middle Eastern Restaurant  0.00
4     

9            Gastropub  0.02


----Don Mills North----
                  venue  freq
0  Caribbean Restaurant  0.14
1        Baseball Field  0.14
2                  Café  0.14
3                   Gym  0.14
4    Athletics & Sports  0.14
5          Dessert Shop  0.14
6   Japanese Restaurant  0.14
7   Monument / Landmark  0.00
8         Movie Theater  0.00
9                 Motel  0.00


----Don Mills South----
                venue  freq
0         Coffee Shop  0.10
1          Restaurant  0.10
2                 Gym  0.10
3      Clothing Store  0.05
4      Discount Store  0.05
5      Sandwich Place  0.05
6          Beer Store  0.05
7         Supermarket  0.05
8  Italian Restaurant  0.05
9           Bike Shop  0.05


----Dorset Park----
                             venue  freq
0                Indian Restaurant  0.33
1                        Pet Store  0.17
2           Thrift / Vintage Store  0.17
3            Vietnamese Restaurant  0.17
4               Chinese Restaurant  0.17
5            

9    Italian Restaurant  0.04


----Ionview----
                 venue  freq
0           Hobby Shop  0.17
1    Convenience Store  0.17
2   Chinese Restaurant  0.17
3     Department Store  0.17
4          Coffee Shop  0.17
5          Bus Station  0.17
6    Accessories Store  0.00
7  Monument / Landmark  0.00
8        Movie Theater  0.00
9                Motel  0.00


----Island airport----
                 venue  freq
0       Airport Lounge  0.13
1      Airport Service  0.13
2      Harbor / Marina  0.07
3                  Bar  0.07
4                Plane  0.07
5  Rental Car Location  0.07
6          Coffee Shop  0.07
7     Sculpture Garden  0.07
8        Boat or Ferry  0.07
9             Boutique  0.07


----Jamestown----
                        venue  freq
0               Grocery Store  0.22
1                 Pizza Place  0.22
2                    Pharmacy  0.11
3                  Beer Store  0.11
4         Fried Chicken Joint  0.11
5        Fast Food Restaurant  0.11
6              Sa

9  Mediterranean Restaurant  0.00


----Not assigned----
                 venue  freq
0          Coffee Shop  0.20
1     Sushi Restaurant  0.07
2          Yoga Studio  0.03
3   Italian Restaurant  0.03
4   Mexican Restaurant  0.03
5        Smoothie Shop  0.03
6                 Café  0.03
7  Fried Chicken Joint  0.03
8             Beer Bar  0.03
9       Sandwich Place  0.03


----Oakridge----
                             venue  freq
0                           Bakery  0.25
1                         Bus Line  0.12
2                   Ice Cream Shop  0.12
3                     Soccer Field  0.12
4                     Intersection  0.12
5                             Park  0.12
6                      Bus Station  0.12
7              Moroccan Restaurant  0.00
8              Monument / Landmark  0.00
9  Molecular Gastronomy Restaurant  0.00


----Old Burnhamthorpe----
                 venue  freq
0                 Café  0.12
1          Pizza Place  0.12
2    Convenience Store  0.12
3         

9             Bagel Shop  0.07


----South Niagara----
                 venue  freq
0       Airport Lounge  0.13
1      Airport Service  0.13
2      Harbor / Marina  0.07
3                  Bar  0.07
4                Plane  0.07
5  Rental Car Location  0.07
6          Coffee Shop  0.07
7     Sculpture Garden  0.07
8        Boat or Ferry  0.07
9             Boutique  0.07


----South Steeles----
                        venue  freq
0               Grocery Store  0.22
1                 Pizza Place  0.22
2                    Pharmacy  0.11
3                  Beer Store  0.11
4         Fried Chicken Joint  0.11
5        Fast Food Restaurant  0.11
6              Sandwich Place  0.11
7             Organic Grocery  0.00
8  Modern European Restaurant  0.00
9                   Pet Store  0.00


----South of Bloor----
                    venue  freq
0                  Bakery  0.08
1  Thrift / Vintage Store  0.08
2            Burger Joint  0.08
3             Wings Joint  0.08
4          Hardware S

                venue  freq
0        Dance Studio  0.11
1        Intersection  0.11
2                Park  0.11
3            Bus Stop  0.11
4         Video Store  0.11
5         Curling Ice  0.11
6  Athletics & Sports  0.11
7          Beer Store  0.11
8        Skating Rink  0.11
9   Mobile Phone Shop  0.00


----York Mills West----
                             venue  freq
0       Construction & Landscaping  0.33
1                Convenience Store  0.33
2                             Park  0.33
3                Accessories Store  0.00
4        Middle Eastern Restaurant  0.00
5              Monument / Landmark  0.00
6  Molecular Gastronomy Restaurant  0.00
7       Modern European Restaurant  0.00
8                Mobile Phone Shop  0.00
9               Miscellaneous Shop  0.00


----York University----
                      venue  freq
0      Caribbean Restaurant  0.17
1               Coffee Shop  0.17
2        Falafel Restaurant  0.17
3        Miscellaneous Shop  0.17
4                  

In [139]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Café,Thai Restaurant,Restaurant,Clothing Store,Deli / Bodega,Gym,Salad Place,Steakhouse,Hotel
1,Agincourt,Lounge,Latin American Restaurant,Breakfast Spot,Clothing Store,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore
2,Agincourt North,Playground,Intersection,Park,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
3,Albion Gardens,Pizza Place,Grocery Store,Pharmacy,Beer Store,Sandwich Place,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
4,Alderwood,Pizza Place,Coffee Shop,Pub,Sandwich Place,Gym,Electronics Store,Escape Room,Eastern European Restaurant,Dumpling Restaurant,Drugstore


## Model Building

### K Means
Let's cluster the city of london to roughly 5 to make it easier to analyze. 

We use the K Means clustering technique to do so.

In [140]:
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clusterin## Model Building

### K Means
Let's cluster the city of london to roughly 5 to make it easier to analyze. 

We use the K Means clustering technique to do so.g
kmeans_toronto = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

kmeans_toronto

KMeans(n_clusters=5, random_state=0)

### Labelling Clustered Data

In [141]:
# check cluster labels generated for each row in the dataframe
kmeans_toronto.labels_

array([3, 3, 1, 0, 0, 3, 3, 3, 0, 3, 3, 3, 0, 3, 3, 1, 3, 3, 1, 3, 3, 3,
       3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3,
       3, 3, 3, 3, 1, 4, 0, 3, 3, 3, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 2, 3, 4, 3, 3, 0, 4, 1, 3, 3, 3, 0, 3, 3, 3, 3, 3, 4, 1,
       4, 3, 1, 0, 3, 3, 3, 1, 3, 3, 0, 0, 0, 1, 3, 1, 4, 3, 3, 3, 3, 3,
       3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 4, 3, 3, 3, 0, 1, 2, 3, 3, 3,
       3, 1, 3, 3, 1, 3, 0, 2, 4, 3, 3, 3, 3, 1, 3, 0, 3, 3, 3, 0, 3, 3,
       1, 1, 3, 3, 0, 3, 3, 4, 3, 0, 3, 3, 3, 3, 3, 3, 3, 4, 3, 0, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 0, 3, 1, 3, 3],
      dtype=int32)

So our model has labeled the city. Join London_merged with our neighbourhood venues sorted to add latitude & longitude for each of the neighborhood to prepare it for plotting. Drop all the NaN values to prevent data skew

In [142]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'ClusterLabels', kmeans_toronto.labels_)

toronto_merged = df1

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged = toronto_merged.dropna()


toronto_merged.head() 

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1.0,Park,Food & Drink Shop,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,3.0,Pizza Place,Intersection,Coffee Shop,Portuguese Restaurant,Hockey Arena,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,3.0,Coffee Shop,Café,Park,Pub,Bakery,Theater,Restaurant,Breakfast Spot,Brewery,French Restaurant
3,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636,3.0,Coffee Shop,Café,Park,Pub,Bakery,Theater,Restaurant,Breakfast Spot,Brewery,French Restaurant
4,M6A,North York,Lawrence Heights,43.718518,-79.464763,3.0,Clothing Store,Accessories Store,Coffee Shop,Miscellaneous Shop,Furniture / Home Store,Athletics & Sports,Arts & Crafts Store,Carpet Store,Boutique,Vietnamese Restaurant


### Visualizing the clustered neighbourhood
Let's plot the clusters

In [133]:
# create map
map_clusters_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='red',
        fill_opacity=0.7).add_to(map_clusters_toronto### Visualizing the clustered neighbourhood
Let's plot the clusters)
       
map_clusters_toronto

## Examining our Clusters

Cluster 1

In [134]:
toronto_merged.loc[toronto_merged['ClusterLabels'] ==1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North York,1.0,Park,Food & Drink Shop,Yoga Studio,Drugstore,Discount Store
28,York,1.0,Hockey Arena,Park,Field,Trail,Donut Shop
38,York,1.0,Park,Women's Store,Pool,Drugstore,Diner
54,Scarborough,1.0,Playground,Yoga Studio,Drugstore,Diner,Discount Store
60,East York,1.0,Intersection,Park,Convenience Store,Yoga Studio,Drugstore
70,North York,1.0,Airport,Park,Bus Stop,Yoga Studio,Drugstore
71,North York,1.0,Airport,Park,Bus Stop,Yoga Studio,Drugstore
112,Central Toronto,1.0,Park,Bus Line,Swim School,Yoga Studio,Donut Shop
120,North York,1.0,Park,Construction & Landscaping,Convenience Store,Yoga Studio,Drugstore
122,Central Toronto,1.0,Jewelry Store,Park,Sushi Restaurant,Trail,Drugstore


Cluster 2

In [144]:
toronto_merged.loc[toronto_merged['ClusterLabels'] ==2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Scarborough,2.0,Bar,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Farmers Market
22,Scarborough,2.0,Bar,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Farmers Market
23,Scarborough,2.0,Bar,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Farmers Market


Cluster 3

In [145]:
toronto_merged.loc[toronto_merged['ClusterLabels'] ==3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,3.0,Pizza Place,Intersection,Coffee Shop,Portuguese Restaurant,Hockey Arena,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run
2,Downtown Toronto,3.0,Coffee Shop,Café,Park,Pub,Bakery,Theater,Restaurant,Breakfast Spot,Brewery,French Restaurant
3,Downtown Toronto,3.0,Coffee Shop,Café,Park,Pub,Bakery,Theater,Restaurant,Breakfast Spot,Brewery,French Restaurant
4,North York,3.0,Clothing Store,Accessories Store,Coffee Shop,Miscellaneous Shop,Furniture / Home Store,Athletics & Sports,Arts & Crafts Store,Carpet Store,Boutique,Vietnamese Restaurant
5,North York,3.0,Clothing Store,Accessories Store,Coffee Shop,Miscellaneous Shop,Furniture / Home Store,Athletics & Sports,Arts & Crafts Store,Carpet Store,Boutique,Vietnamese Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...
206,Etobicoke,3.0,Grocery Store,Convenience Store,Fast Food Restaurant,Burger Joint,Sandwich Place,Supplement Shop,Discount Store,Bakery,Thrift / Vintage Store,Tanning Salon
207,Etobicoke,3.0,Grocery Store,Convenience Store,Fast Food Restaurant,Burger Joint,Sandwich Place,Supplement Shop,Discount Store,Bakery,Thrift / Vintage Store,Tanning Salon
208,Etobicoke,3.0,Grocery Store,Convenience Store,Fast Food Restaurant,Burger Joint,Sandwich Place,Supplement Shop,Discount Store,Bakery,Thrift / Vintage Store,Tanning Salon
209,Etobicoke,3.0,Grocery Store,Convenience Store,Fast Food Restaurant,Burger Joint,Sandwich Place,Supplement Shop,Discount Store,Bakery,Thrift / Vintage Store,Tanning Salon


Cluster 4

In [146]:
toronto_merged.loc[toronto_merged['ClusterLabels'] ==4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
106,North York,4.0,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Farmers Market
107,North York,4.0,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Farmers Market
198,Etobicoke,4.0,Park,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
199,Etobicoke,4.0,Park,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
200,Etobicoke,4.0,Park,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
201,Etobicoke,4.0,Park,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
202,Etobicoke,4.0,Park,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
203,Etobicoke,4.0,Park,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
204,Etobicoke,4.0,Park,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
205,Etobicoke,4.0,Park,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


Cluster 5

In [147]:
toronto_merged.loc[toronto_merged['ClusterLabels'] ==5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


# Results and Discussion

The neighbourhoods of London are very mulitcultural. we have a 321 kinds of venues to attend. There are a lot of different cusines including Indian, Italian, Turkish and Chinese. London seems to take a step further in this direction by having a lot of Restaurants, bars, juice bars, coffee shops, Fish and Chips shop and Breakfast spots. It has a lot of shopping options too with that of the Flea markets, flower shops, fish markets, Fishing stores, clothing stores. The main modes of transport seem to be Buses and trains. For leisure, the neighbourhoods are set up to have lots of parks, golf courses, zoo, gyms and Historic sites.

Overall, the city of London offers a multicultural, diverse and certainly an entertaining experience.

Toronto is relatively small in size geographically. It has a wide variety of cusines and eateries including French, Thai, Cambodian, Asian, Chinese etc. There are a lot of hangout spots including many Restaurants and Bars. Different means of public transport in toronto which includes buses, bikes, boats or ferries. For leisure and sight seeing, there are a lot of Plazas, Trails, Parks, Historic sites, clothing shops, Art galleries and Museums. Overall, Toronto is good place to migrate with good living conditions but london is better. 


# Conclusion

The purpose of this project was to explore the cities of London and Toronto and see however enticing it's to potential tourists and migrants. we have a tendency to explored each the cities supported their communicating codes and so compute the common venues gift in every of the neighbourhoods finally final with bunch similar neighbourhoods along.

We may see that every of the neighbourhoods in each the cities have a large style of experiences to supply that is exclusive in it's own method. The cultural diversity is kind of evident that conjointly offers the sensation of a way of inclusion.

Both Toronto and London appear to supply a vacation keep or a romantic gateaway with a great deal of places to explore, lovely landscapes and a large style of culture.Overall, it's upto the stakeholders to choose that expertise they'd like a lot of and which might a lot of to their feeling.