# Segmenting & Clustering Neighborhoods in Toronto, Canada

### By Lee Gang

## Step 1: Extract Neighborhood Data from Web (Wikipedia) and Process Data

We will use the Toronto neighborhood information from the Wikipedia page [List of postal codes of Canada: M](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M). The goal is to read the table of postal codes on that page and transform it into a pandas dataframe.

First, import the required libraries.

In [164]:
#!conda install -c conda-forge folium
#!conda install -c conda-forge beautifulsoup4

In [208]:
import numpy as np
import pandas as pd
import folium
import json
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans as KM

### Scrape Webpage And Convert Postcode Table into Dataframe

I will use the *BeautifulSoup4* library to scrape the table from the webpage and read it into a *pandas* dataframe.

In [93]:
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))[0]
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


### Process and Clean Dataframe

We will then proceed to process the dataframe based on the following criteria:

- Only process the cells that have an assigned **Borough**. Ignore cells with a **Borough** that is 'Not assigned'.
- Combine **Neighborhoods** into a single row for each **Postal Code**, separated with a comma.
- If a cell has a **Borough** but a 'Not assigned' **Neighborhood**, then the **Neighborhood** will be the same as the **Borough**.
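
The three rules can be checked on a small hand-made table before they are applied to the scraped data. The sketch below uses made-up rows (the `toy` frame and its values are illustrative, not taken from the Wikipedia table) and applies the rules in the same order as the cell that follows.

```python
import pandas as pd

# Hypothetical rows that exercise all three cleaning rules
toy = pd.DataFrame({
    'PostalCode':   ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough':      ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

# Rule 1: keep only rows with an assigned borough
toy = toy[toy['Borough'] != 'Not assigned'].copy()

# Rule 3: fall back to the borough name when the neighborhood is missing
missing = toy['Neighborhood'] == 'Not assigned'
toy.loc[missing, 'Neighborhood'] = toy.loc[missing, 'Borough']

# Rule 2: one row per postal code, neighborhoods joined with commas
cleaned = (toy.groupby(['PostalCode', 'Borough'])['Neighborhood']
              .agg(','.join)
              .reset_index())
print(cleaned)
```

The unassigned M1A row disappears, the two M5A rows collapse into one, and M7A takes its borough name as its neighborhood.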

In [95]:
# Rename columns for consistency
df.rename(columns = {'Postcode':'PostalCode', 'Neighbourhood':'Neighborhood'}, inplace = True)

# Filter out "Not assigned" values in "Borough"
df = df[df['Borough'] !='Not assigned']
df = df.reset_index(drop = True)

# Assign "Borough" value to "Neighborhood" with 'Not assigned' values
df.loc[df['Neighborhood'] == 'Not assigned', 'Neighborhood'] = df['Borough']

# Combine Neighborhoods into a single row for the same Postal Code
df = df.groupby(['PostalCode','Borough'])['Neighborhood'].apply(lambda x: ','.join(x)).reset_index()
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


In [96]:
# Print the dimension of the dataframe
print(df.shape)

(103, 3)


### Append Latitude and Longitude Values To Each Postal Code in Dataframe

We will be using the *geopy* library to retrieve the longitude and latitude coordinates for each **PostalCode**.

In [149]:
postalcode = df['PostalCode']
longitude = np.zeros(len(postalcode))
latitude = np.zeros(len(postalcode))

# Create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent = 'toronto')

for n in range(len(postalcode)):
    location = geolocator.geocode('{}, Toronto, Ontario, Canada'.format(postalcode[n]))
    if location is None:
        # Mark failed lookups with 0 so they can be counted below
        latitude[n] = 0
        longitude[n] = 0
    else:
        latitude[n] = location.latitude
        longitude[n] = location.longitude

df['Latitude'] = latitude
df['Longitude'] = longitude

df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Longitude,Latitude
0,M1B,Scarborough,"Rouge,Malvern",-79.387207,43.653963
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",-79.387207,43.653963
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",0.0,0.0
3,M1G,Scarborough,Woburn,-79.221898,43.765717
4,M1H,Scarborough,Cedarbrae,0.0,0.0


In [153]:
print('Successfully Geocoded ' + str(df[df["Longitude"]!=0].shape[0]) + ' Rows')
print('Unsuccessfully Geocoded ' + str(df[df["Longitude"]==0].shape[0]) + ' Rows')

Successfully Geocoded 27 Rows
Unsuccessfully Geocoded 76 Rows
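
Using 0 as a marker for failed lookups works for counting, but a 0 can be mistaken for a real coordinate later on. A minimal sketch of an alternative is shown below: it records `NaN` for misses and takes the geocoding function as an argument. The helper `geocode_postal_codes` and the stand-in `fake_geocode` are hypothetical names introduced for illustration; with geopy, one would pass the `geocode` method of a `Nominatim` instance instead.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

def geocode_postal_codes(codes, geocode):
    """Look up each postal code; record NaN when the geocoder finds nothing."""
    rows = []
    for code in codes:
        location = geocode('{}, Toronto, Ontario, Canada'.format(code))
        if location is None:
            rows.append((code, np.nan, np.nan))
        else:
            rows.append((code, location.latitude, location.longitude))
    return pd.DataFrame(rows, columns=['PostalCode', 'Latitude', 'Longitude'])

# Stand-in geocoder for illustration only: it knows a single postal code
def fake_geocode(query):
    known = {
        'M1G, Toronto, Ontario, Canada':
            SimpleNamespace(latitude=43.765717, longitude=-79.221898),
    }
    return known.get(query)

coords = geocode_postal_codes(['M1G', 'M1H'], fake_geocode)
print(coords['Latitude'].isna().sum(), 'postal code(s) could not be geocoded')
```

Passing the geocoder in as a function also makes the loop easy to try out without network access.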


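Since the geocoder could not resolve most of the postal codes, we instead download a CSV file that contains the coordinates for each postal code. Note also that M1B and M1C received identical coordinates above, which suggests that some of the lookups counted as successful may have fallen back to a less specific location.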

In [155]:
df_coord = pd.read_csv("https://cocl.us/Geospatial_data")
df_coord.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [156]:
df_coord.shape

(103, 3)

For consistency, we drop the geocoded 'Latitude' and 'Longitude' columns from our main dataframe and then merge it with the coordinates dataframe on **PostalCode**.

In [162]:
# Drop "Latitude" and "Longitude" columns from main df
df.drop(["Latitude", "Longitude"], axis = 1, inplace = True)

# Rename coordinate dataframe columns
df_coord.rename(columns = {'Postal Code':'PostalCode'}, inplace = True)

# Merge coordinates and main dataframe
df_main = df.merge(df_coord, how = 'left', on = 'PostalCode')
df_main.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [163]:
df_main.shape

(103, 5)
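
A matching shape does not by itself show that every postal code found its coordinates. One way to check this is sketched below, using the `validate` and `indicator` parameters of `DataFrame.merge` on two small made-up tables (`left` and `right` are illustrative stand-ins for the main and coordinate dataframes).

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
left = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M1E'],
                     'Borough': ['Scarborough'] * 3})
right = pd.DataFrame({'PostalCode': ['M1B', 'M1C'],
                      'Latitude': [43.806686, 43.784535],
                      'Longitude': [-79.194353, -79.160497]})

# validate raises MergeError if a postal code is duplicated on either side;
# indicator adds a _merge column showing where each row came from
merged = left.merge(right, how='left', on='PostalCode',
                    validate='one_to_one', indicator=True)

# Postal codes that exist in the main table but have no coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that the left join filled in coordinates for every row.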

## Step 2: Extract Nearby Venues Using Foursquare API

Create a query function for the Foursquare API. We will only look at the top 100 venues within a 500 m radius of each neighborhood.

In [169]:
CLIENT_ID = '' # Foursquare Client ID
CLIENT_SECRET = '' # Foursquare Client Secret
VERSION = '20200223' # Foursquare API version

In [170]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit = 100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
             
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
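
The function above indexes straight into the response, so a reply without any groups, or a venue without categories, would raise an exception partway through the loop. The sketch below shows one defensive way to flatten a single response. The helper `parse_explore_items` is hypothetical, and the `sample` payload is hand-written to mimic only the keys that the function above reads; it is not a real API reply.

```python
def parse_explore_items(name, lat, lng, payload):
    """Flatten one explore response into (neighborhood, ..., category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # Fall back to a placeholder when a venue carries no category
        categories = venue.get('categories') or [{}]
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0].get('name', 'Unknown')))
    return rows

# Hand-written payload that mimics the nesting used above (illustrative only)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]}]}}

rows = parse_explore_items('Rouge,Malvern', 43.806686, -79.194353, sample)
empty = parse_explore_items('Nowhere', 0.0, 0.0, {'response': {}})
print(len(rows), len(empty))
```

A neighborhood with no usable response then contributes zero rows instead of stopping the whole run.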

Now we will apply the function to our dataframe to extract the nearby venues for each row, that is, for each **Postal Code**.

In [175]:
df_venues = getNearbyVenues(names = df_main['Neighborhood'],
                            latitudes = df_main['Latitude'],
                            longitudes = df_main['Longitude'])

In [176]:
df_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge,Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Scarborough Historical Society,43.788755,-79.162438,History Museum
3,"Guildwood,Morningside,West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Guildwood,Morningside,West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


Let's have a quick look at the summary counts of venues and number of unique venue categories.

In [184]:
pd.DataFrame(df_venues.groupby('Neighborhood')['Venue'].count()).sort_values(['Venue'], ascending = False)[0:10]

Neighborhood,Venue
"Adelaide,King,Richmond",100
St. James Town,100
"Ryerson,Garden District",100
"Harbourfront East,Toronto Islands,Union Station",100
"First Canadian Place,Underground city",100
"Design Exchange,Toronto Dominion Centre",100
"Commerce Court,Victoria Hotel",100
Stn A PO Boxes 25 The Esplanade,96
"Chinatown,Grange Park,Kensington Market",84
Church and Wellesley,83


In [185]:
print('There are {} unique categories.'.format(len(df_venues['Venue Category'].unique())))

There are 269 unique categories.


### Convert Dataframe to Contain the Top 10 Venue Categories for Each Neighborhood

Convert the venue categories in the venues dataframe to dummy variables (one-hot encoding).

In [186]:
# Perform one hot encoding on 'Venue Category'
df_oh = pd.get_dummies(df_venues[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
df_oh['Neighborhood'] = df_venues['Neighborhood'] 

# Move neighborhood column to the first column
fixed_columns = [df_oh.columns[-1]] + list(df_oh.columns[:-1])
df_oh = df_oh[fixed_columns]
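
One thing worth checking in the cell above: if any venue category is itself called "Neighborhood", assigning `df_oh['Neighborhood']` overwrites that dummy column instead of appending a new one, and the last column is then no longer the neighborhood name. Whether this applies depends on the data returned by the API. The sketch below, on made-up venues, shows one way to sidestep the issue by grouping on a separately named key rather than writing into the one-hot frame; the key name `NeighborhoodName` is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up venues; note that one category is literally called "Neighborhood"
venues = pd.DataFrame({
    'Neighborhood':   ['A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Neighborhood', 'Park', 'Park'],
})

# One-hot encode the category column only
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)

# Group by the original neighborhood values under a distinct name, so nothing
# is written into (or overwritten in) the one-hot columns
key = venues['Neighborhood'].rename('NeighborhoodName')
freq = onehot.groupby(key).mean()
print(freq)
```

Because the key has its own name, `freq.reset_index()` can still be called without a column name clash.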

Group the rows by **Neighborhood** and take the mean of each one-hot column. Each value is then the share of a neighborhood's venues that fall into that category.

In [187]:
df_group = df_oh.groupby('Neighborhood').mean().reset_index()
df_group.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


We will now define a function to extract the top categories of venues in a **Neighborhood**, row by row.

In [188]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
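
To see what the function returns, it can help to run the same idea on a tiny frequency table. The sketch below is a variant, not the function above: it uses `Series.nlargest` and `DataFrame.apply` on made-up numbers, with the neighborhood kept in the index so that no column needs to be skipped. The name `top_categories` is hypothetical.

```python
import pandas as pd

# Made-up frequency table: one row per neighborhood, one column per category
freq = pd.DataFrame(
    {'Coffee Shop': [0.5, 0.1], 'Park': [0.2, 0.6], 'Bar': [0.3, 0.3]},
    index=['Downtown', 'Suburb'],
)

def top_categories(row, n):
    """Return the n category names with the highest frequency in this row."""
    return row.nlargest(n).index.tolist()

# Apply row by row; the result is a Series of lists indexed by neighborhood
top2 = freq.apply(top_categories, axis=1, n=2)
print(top2['Downtown'])
print(top2['Suburb'])
```

For "Downtown" the two most frequent categories are Coffee Shop and Bar, and for "Suburb" they are Park and Bar.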

Extract the Top 10 Venue Categories found in each **Neighborhood**.

In [202]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
df_sorted = pd.DataFrame(columns=columns)
df_sorted['Neighborhood'] = df_group['Neighborhood']

# Call the function defined earlier to fill the new dataframe
for ind in np.arange(df_group.shape[0]):
    df_sorted.iloc[ind, 1:] = return_most_common_venues(df_group.iloc[ind, :], num_top_venues)

df_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Bar,Café,Thai Restaurant,Bakery,Burger Joint,Cosmetics Shop,Steakhouse,Restaurant,Sushi Restaurant
1,Agincourt,Chinese Restaurant,Latin American Restaurant,Breakfast Spot,Lounge,Ethiopian Restaurant,Electronics Store,Event Space,Eastern European Restaurant,Dumpling Restaurant,Dessert Shop
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Park,Playground,Women's Store,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Grocery Store,Beer Store,Fried Chicken Joint,Fast Food Restaurant,Pharmacy,Pizza Place,Sandwich Place,Coffee Shop,Airport Service,Department Store
4,"Alderwood,Long Branch",Pizza Place,Pharmacy,Athletics & Sports,Dance Studio,Coffee Shop,Pool,Pub,Sandwich Place,Skating Rink,Gym


## Step 3: Segment and Cluster Neighborhoods Using K-Means Clustering

Perform K-Means clustering on the grouped frequency dataframe (`df_group`, without the **Neighborhood** column) using k = 5. This value was chosen after trying several values of k and inspecting the resulting clusters on the map.
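
Visual inspection can be complemented with a numeric score. The sketch below is not the procedure used in this notebook: it computes the silhouette score for several values of k on synthetic data from `make_blobs`, which stands in for the real feature matrix. On real venue frequencies the scores are usually far less clear-cut than in this toy case.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (stand-in for df_km)
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=2020)

# Fit K-Means for each candidate k and record the mean silhouette score
scores = {}
for k_candidate in range(2, 7):
    labels = KMeans(n_clusters=k_candidate, random_state=2020,
                    n_init=10).fit_predict(X)
    scores[k_candidate] = silhouette_score(X, labels)

# The candidate with the highest score is the best-separated clustering
best_k = max(scores, key=scores.get)
print(best_k)
```

Here the highest score is reached at the number of groups the data was generated with.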

In [245]:
# Set k value
k = 5

df_km = df_group.drop('Neighborhood', axis = 1)

# run k-means clustering
kmeans = KM(n_clusters=k, random_state=2020, n_init = 20).fit(df_km)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([2, 2, 4, 2, 2, 2, 2, 2, 2, 2], dtype=int32)

Combine Neighborhood Coordinates, Top 10 Venue Categories and Cluster Labels into a final dataframe.

In [246]:
# Create Cluster Label Dataframe
df_label = pd.DataFrame()
df_label['Neighborhood'] = df_group['Neighborhood']
df_label['Cluster'] = kmeans.labels_

# Create Neighbourhood Coordinates Dataframe
df_cd = df_main[['Neighborhood', 'Latitude', 'Longitude']]

# Merge Coordinates To Cluster Label Dataframe
df_label = df_label.merge(df_cd, how = 'left', on = 'Neighborhood')

# Merge Cluster Label Dataframe to Sorted Dataframe
df_final = df_label.merge(df_sorted, how = 'left', on = "Neighborhood")
df_final.head()

Unnamed: 0,Neighborhood,Cluster,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",2,43.650571,-79.384568,Coffee Shop,Bar,Café,Thai Restaurant,Bakery,Burger Joint,Cosmetics Shop,Steakhouse,Restaurant,Sushi Restaurant
1,Agincourt,2,43.7942,-79.262029,Chinese Restaurant,Latin American Restaurant,Breakfast Spot,Lounge,Ethiopian Restaurant,Electronics Store,Event Space,Eastern European Restaurant,Dumpling Restaurant,Dessert Shop
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",4,43.815252,-79.284577,Park,Playground,Women's Store,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",2,43.739416,-79.588437,Grocery Store,Beer Store,Fried Chicken Joint,Fast Food Restaurant,Pharmacy,Pizza Place,Sandwich Place,Coffee Shop,Airport Service,Department Store
4,"Alderwood,Long Branch",2,43.602414,-79.543484,Pizza Place,Pharmacy,Athletics & Sports,Dance Studio,Coffee Shop,Pool,Pub,Sandwich Place,Skating Rink,Gym


## Step 4: Plot Neighborhoods Onto Map

Let's now visualize the Neighborhood Clustering on a map.

In [247]:
# Get Toronto Coordinates
geolocater = Nominatim(user_agent = 'toronto')
toronto = geolocater.geocode("Toronto, Ontario, Canada")

# Initialize Folium Map centred in Toronto
map_clusters = folium.Map(location=[toronto.latitude, toronto.longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_final['Latitude'], df_final['Longitude'], df_final['Neighborhood'], df_final['Cluster']):
    label = folium.Popup(str(poi) + ' (Cluster ' + str(cluster) + ')', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
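
The color lookup above uses `rainbow[cluster-1]`, so label 0 maps to index -1, which Python resolves to the last color. Every cluster still gets its own color, but the mapping is easy to misread. A minimal sketch of a more direct palette, indexed by the 0-based label itself, is shown below; `palette` and `color_for` are names introduced only for this illustration.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

k = 5  # number of clusters, as above

# One evenly spaced rainbow color per cluster label 0..k-1
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, k))]

def color_for(cluster):
    """Map a 0-based cluster label to its hex color."""
    return palette[int(cluster)]

print(color_for(0), color_for(k - 1))
```

In the marker loop, `color_for(cluster)` could then replace both uses of `rainbow[cluster-1]`.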


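Let's now look at the neighborhoods in each cluster and their top 10 venue categories individually. The headings below number the clusters 1 to 5, which correspond to the K-Means labels 0 to 4.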
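
Before going cluster by cluster, a compact overview of cluster sizes can be useful. The sketch below uses a made-up stand-in for `df_final` (the `toy_final` frame and its rows are illustrative) and counts the neighborhoods per cluster together with the most frequent "1st Most Common Venue" in each.

```python
import pandas as pd

# Made-up stand-in for df_final with just the columns needed here
toy_final = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D', 'E'],
    'Cluster': [2, 2, 4, 2, 4],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop', 'Park',
                              'Pizza Place', 'Park'],
})

# Named aggregation: size of each cluster and its most frequent top venue
summary = toy_final.groupby('Cluster').agg(
    neighborhoods=('Neighborhood', 'count'),
    top_venue=('1st Most Common Venue', lambda s: s.mode().iloc[0]),
)
print(summary)
```

Applied to the real dataframe, the same aggregation would show at a glance which clusters hold only a handful of neighborhoods.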

### Cluster 1

In [248]:
df_final.loc[df_final['Cluster'] == 0, df_final.columns[[0,1] + list(range(4, df_final.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
92,Victoria Village,0,Hockey Arena,Coffee Shop,French Restaurant,Portuguese Restaurant,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
97,Woburn,0,Coffee Shop,Korean Restaurant,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Dumpling Restaurant


### Cluster 2

In [249]:
df_final.loc[df_final['Cluster'] == 1, df_final.columns[[0,1] + list(range(4, df_final.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Downsview Central,1,Home Service,Food Truck,Baseball Field,Department Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Women's Store
42,"Emery,Humberlea",1,Baseball Field,Women's Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Fast Food Restaurant
56,"Humber Bay,King's Mill Park,Kingsway Park Sout...",1,Construction & Landscaping,Baseball Field,Women's Store,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


### Cluster 3

In [250]:
df_final.loc[df_final['Cluster'] == 2, df_final.columns[[0,1] + list(range(4, df_final.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",2,Coffee Shop,Bar,Café,Thai Restaurant,Bakery,Burger Joint,Cosmetics Shop,Steakhouse,Restaurant,Sushi Restaurant
1,Agincourt,2,Chinese Restaurant,Latin American Restaurant,Breakfast Spot,Lounge,Ethiopian Restaurant,Electronics Store,Event Space,Eastern European Restaurant,Dumpling Restaurant,Dessert Shop
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",2,Grocery Store,Beer Store,Fried Chicken Joint,Fast Food Restaurant,Pharmacy,Pizza Place,Sandwich Place,Coffee Shop,Airport Service,Department Store
4,"Alderwood,Long Branch",2,Pizza Place,Pharmacy,Athletics & Sports,Dance Studio,Coffee Shop,Pool,Pub,Sandwich Place,Skating Rink,Gym
5,"Bathurst Manor,Downsview North,Wilson Heights",2,Coffee Shop,Ice Cream Shop,Pharmacy,Pizza Place,Chinese Restaurant,Middle Eastern Restaurant,Restaurant,Deli / Bodega,Sandwich Place,Diner
6,Bayview Village,2,Chinese Restaurant,Café,Bank,Japanese Restaurant,Dessert Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Women's Store
7,"Bedford Park,Lawrence Manor East",2,Restaurant,Coffee Shop,Italian Restaurant,Sandwich Place,Pizza Place,Butcher,Indian Restaurant,Café,Sushi Restaurant,Pub
8,Berczy Park,2,Coffee Shop,Cheese Shop,Café,French Restaurant,Steakhouse,Cocktail Bar,Bakery,Beer Bar,Farmers Market,Seafood Restaurant
9,"Birch Cliff,Cliffside West",2,General Entertainment,College Stadium,Café,Skating Rink,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
10,"Bloordale Gardens,Eringate,Markland Wood,Old B...",2,Cosmetics Shop,Pizza Place,Coffee Shop,Beer Store,Liquor Store,Café,Convenience Store,Drugstore,Discount Store,Dog Run


### Cluster 4

In [251]:
df_final.loc[df_final['Cluster'] == 3, df_final.columns[[0,1] + list(range(4, df_final.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
77,"Rouge,Malvern",3,Fast Food Restaurant,Deli / Bodega,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant


### Cluster 5

In [252]:
df_final.loc[df_final['Cluster'] == 4, df_final.columns[[0,1] + list(range(4, df_final.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",4,Park,Playground,Women's Store,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
13,"CFB Toronto,Downsview East",4,Park,Airport,Snack Place,Women's Store,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
16,Caledonia-Fairbanks,4,Park,Market,Pool,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
39,"Downsview,North Park,Upwood Park",4,Basketball Court,Park,Bakery,Construction & Landscaping,Women's Store,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
41,East Toronto,4,Park,Convenience Store,Women's Store,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop
47,Glencairn,4,Pub,Park,Pizza Place,Japanese Restaurant,Women's Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
57,Humber Summit,4,Pizza Place,Dance Studio,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
58,Humewood-Cedarvale,4,Hockey Arena,Park,Field,Trail,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
59,"Kingsview Village,Martin Grove Gardens,Richvie...",4,Park,Pizza Place,Mobile Phone Shop,Sandwich Place,Women's Store,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
63,Lawrence Park,4,Photography Studio,Swim School,Bus Line,Construction & Landscaping,Park,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant
