# Segmenting and Clustering 

### Neighborhoods in Toronto:

#### Parsing data from Wikipedia page, and creating a dataframe.

#### Import libraries:

In [1]:
from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd

#### Url for wiki page:

In [2]:
url_w = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

#### Use BeautifulSoup to parse the HTML of the wiki page and find the table within it:

In [3]:
shtml = requests.get(url_w).text
soup = BeautifulSoup(shtml, 'html.parser')

In [4]:
table = soup.find('table')  # first table element on the page
print("... table data parsed ...")

... table data parsed ...


#### Define "table_rows" and find all "tr" tags:

In [5]:
table_rows = table.find_all('tr')

#### Define the rows_list list, and use a loop to append all the rows to it:

In [6]:
rows_list = []

In [7]:
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    rows_list.append(row)
    print(row)

[]
['M1A', 'Not assigned', 'Not assigned\n']
['M2A', 'Not assigned', 'Not assigned\n']
['M3A', 'North York', 'Parkwoods\n']
['M4A', 'North York', 'Victoria Village\n']
['M5A', 'Downtown Toronto', 'Harbourfront\n']
['M5A', 'Downtown Toronto', 'Regent Park\n']
['M6A', 'North York', 'Lawrence Heights\n']
['M6A', 'North York', 'Lawrence Manor\n']
['M7A', "Queen's Park", 'Not assigned\n']
['M8A', 'Not assigned', 'Not assigned\n']
['M9A', 'Etobicoke', 'Islington Avenue\n']
['M1B', 'Scarborough', 'Rouge\n']
['M1B', 'Scarborough', 'Malvern\n']
['M2B', 'Not assigned', 'Not assigned\n']
['M3B', 'North York', 'Don Mills North\n']
['M4B', 'East York', 'Woodbine Gardens\n']
['M4B', 'East York', 'Parkview Hill\n']
['M5B', 'Downtown Toronto', 'Ryerson\n']
['M5B', 'Downtown Toronto', 'Garden District\n']
['M6B', 'North York', 'Glencairn\n']
['M7B', 'Not assigned', 'Not assigned\n']
['M8B', 'Not assigned', 'Not assigned\n']
['M9B', 'Etobicoke', 'Cloverdale\n']
['M9B', 'Etobicoke', 'Islington\n']
['M9B'

In [8]:
rows_list[0:5]

[[],
 ['M1A', 'Not assigned', 'Not assigned\n'],
 ['M2A', 'Not assigned', 'Not assigned\n'],
 ['M3A', 'North York', 'Parkwoods\n'],
 ['M4A', 'North York', 'Victoria Village\n']]
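The empty list at index 0 most likely comes from the header row, whose cells are `th` rather than `td`. The following is a minimal sketch on an inline HTML snippet (the snippet and the names `sample_html`, `header`, and `clean_rows` are illustrative assumptions, not the live Wikipedia markup). It collects the header separately, skips rows without `td` cells, and strips the trailing newline while reading the cells:

```python
from bs4 import BeautifulSoup

# Illustrative stand-in for the page's table; not the real Wikipedia markup.
sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

sample_table = BeautifulSoup(sample_html, 'html.parser').find('table')

header = []
clean_rows = []
for tr in sample_table.find_all('tr'):
    th_cells = tr.find_all('th')
    td_cells = tr.find_all('td')
    if th_cells:
        # header row: column names live in <th>, so find_all('td') is empty here
        header = [cell.get_text(strip=True) for cell in th_cells]
    elif td_cells:
        # data row: strip=True removes the trailing newline
        clean_rows.append([cell.get_text(strip=True) for cell in td_cells])

print(header)
print(clean_rows)
```

With this approach there is no empty first row to drop later, and the header can be passed to `pd.DataFrame(clean_rows, columns=header)`.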

#### Add list content to dataframe (df_neigh):

In [9]:
df_neigh = pd.DataFrame(rows_list)
df_neigh.head(5)

Unnamed: 0,0,1,2
0,,,
1,M1A,Not assigned,Not assigned\n
2,M2A,Not assigned,Not assigned\n
3,M3A,North York,Parkwoods\n
4,M4A,North York,Victoria Village\n


#### Rename the columns with proper names, drop the empty first row (index 0), and remove all the rows where "Borough" is "Not assigned":

In [10]:
df_neigh.columns = ['PostalCode', 'Borough','Neighborhood']
df_neigh.drop(0, inplace = True)
df_neigh.drop(df_neigh.loc[df_neigh['Borough']=='Not assigned'].index, inplace=True)
df_neigh.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood
3,M3A,North York,Parkwoods\n
4,M4A,North York,Victoria Village\n
5,M5A,Downtown Toronto,Harbourfront\n
6,M5A,Downtown Toronto,Regent Park\n
7,M6A,North York,Lawrence Heights\n


#### During parsing of the HTML, the newline character (\n) was captured too, so we need to remove it from the values in the "Neighborhood" column:

In [11]:
df_neigh['Neighborhood'] = df_neigh['Neighborhood'].map(lambda x: x.rstrip('\n'))
df_neigh.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
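The same cleanup can be written with the vectorized string accessor. This is a small sketch on made-up values (the frame `sample` is an assumption for illustration). Note that `.str.strip()` removes leading and trailing whitespace of any kind, which is slightly broader than `rstrip('\n')`:

```python
import pandas as pd

# Made-up values, including one with extra spaces around the name
sample = pd.DataFrame({'Neighborhood': ['Parkwoods\n', 'Victoria Village\n', ' Regent Park \n']})

# vectorized strip over the whole column
sample['Neighborhood'] = sample['Neighborhood'].str.strip()
print(sample['Neighborhood'].tolist())
```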


#### In the next step we group the dataframe by postal code and combine the neighborhoods that share a postal code into one string, separated by the ',' character. We also reset the index of the dataset.

In [12]:
df_tor = df_neigh.astype(str).groupby('PostalCode').agg(lambda x: ','.join(x.unique()))
df_tor.reset_index(inplace = True) 
df_tor.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
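To see what the grouping does, here is a minimal sketch on a few rows taken from the parsed output above. The names `toy_codes` and `toy_by_code` are illustrative, and this variant makes the per-column aggregation explicit (first borough, joined neighborhoods) instead of joining the unique values of every column:

```python
import pandas as pd

toy_codes = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M6A', 'M6A', 'M9A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York', 'North York', 'Etobicoke'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights', 'Lawrence Manor', 'Islington Avenue'],
})

# one row per postal code: keep the first borough, join the neighborhoods with ','
toy_by_code = (
    toy_codes.groupby('PostalCode')
             .agg({'Borough': 'first', 'Neighborhood': ','.join})
             .reset_index()
)
print(toy_by_code)
```

Taking the first borough assumes each postal code belongs to exactly one borough, which holds for the rows shown here.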


#### To replace "Not assigned" values in "Neighborhood" with the values from "Borough", we first replace every "Not assigned" entry in the "Neighborhood" column with NaN. Then we fill those NaN values with the values from the "Borough" column:

In [13]:
df_tor['Neighborhood'] = df_tor['Neighborhood'].replace("Not assigned", np.nan)
df_tor['Neighborhood'] = df_tor['Neighborhood'].fillna(df_tor['Borough'])
df_tor.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"
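None of the first ten rows needed the fill, so its effect is not visible in this output. A minimal sketch on two rows (the M7A row mirrors the "Queen's Park" entry seen in the parsed list; the frame `toy_fill` is an illustrative assumption) shows the same replacement done in one step with `Series.mask`:

```python
import pandas as pd

toy_fill = pd.DataFrame({
    'PostalCode': ['M5A', 'M7A'],
    'Borough': ['Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Harbourfront,Regent Park', 'Not assigned'],
})

# where the neighborhood is unassigned, take the borough name instead
unassigned = toy_fill['Neighborhood'] == 'Not assigned'
toy_fill['Neighborhood'] = toy_fill['Neighborhood'].mask(unassigned, toy_fill['Borough'])
print(toy_fill)
```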


#### Just to keep things safe, we save the dataset to a .csv file:

In [14]:
df_tor.to_csv('Toronto_PostalCodes.csv')

#### For testing and observation purposes, we sort the dataset by 'Neighborhood':

In [15]:
toronto_set = df_tor.sort_values(by=['Neighborhood'])
toronto_set.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
58,M5H,Downtown Toronto,"Adelaide,King,Richmond"
12,M1S,Scarborough,Agincourt
14,M1V,Scarborough,"Agincourt North,L'Amoreaux East,Milliken,Steel..."
101,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam..."
89,M8W,Etobicoke,"Alderwood,Long Branch"
28,M3H,North York,"Bathurst Manor,Downsview North,Wilson Heights"
19,M2K,North York,Bayview Village
62,M5M,North York,"Bedford Park,Lawrence Manor East"
56,M5E,Downtown Toronto,Berczy Park
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


### Size of the set:

In [16]:
toronto_set.shape

(103, 3)

## Importing Geospatial data for Neighborhoods:

##### After downloading the Geospatial Coordinates data, we read it with pandas:

In [17]:
#url = 'https://cocl.us/Geospatial_data'

In [18]:
df_cord = pd.read_csv('Geospatial_Coordinates.csv')
df_cord.head(5)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Listing the earlier set for comparison purposes:

In [19]:
df_tor.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


#### Let's look at the size of the two sets:

In [20]:
df_cord.shape

(103, 3)

In [21]:
df_tor.shape

(103, 3)

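#### Next we combine the two datasets into one. `pd.concat(..., axis=1)` places the frames side by side and aligns rows by index, so this relies on both frames listing the postal codes in the same order: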

In [22]:
df_toronto = pd.concat([df_tor, df_cord], axis = 1)

#### Using .head() and .tail() to observe if PostalCode values match after merge:

In [23]:
df_toronto.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Postal Code,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",M1B,43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",M1C,43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",M1E,43.763573,-79.188711
3,M1G,Scarborough,Woburn,M1G,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,M1H,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,M1J,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",M1K,43.727929,-79.262029
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",M1L,43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",M1M,43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff,Cliffside West",M1N,43.692657,-79.264848


In [24]:
df_toronto.tail(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Postal Code,Latitude,Longitude
93,M9A,Etobicoke,Islington Avenue,M9A,43.667856,-79.532242
94,M9B,Etobicoke,"Cloverdale,Islington,Martin Grove,Princess Gar...",M9B,43.650943,-79.554724
95,M9C,Etobicoke,"Bloordale Gardens,Eringate,Markland Wood,Old B...",M9C,43.643515,-79.577201
96,M9L,North York,Humber Summit,M9L,43.756303,-79.565963
97,M9M,North York,"Emery,Humberlea",M9M,43.724766,-79.532242
98,M9N,York,Weston,M9N,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,M9P,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie...",M9R,43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",M9V,43.739416,-79.588437
102,M9W,Etobicoke,Northwest,M9W,43.706748,-79.594054
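The side-by-side concatenation works here because both frames happen to be sorted by postal code. A key-based merge does not depend on row order. The sketch below uses three rows copied from the tables above, with the coordinate rows deliberately shuffled; the frame names `left`, `right`, and `merged` are illustrative:

```python
import pandas as pd

left = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M1E'],
    'Borough': ['Scarborough'] * 3,
})
# Deliberately shuffled to show that row order does not matter for a key-based merge
right = pd.DataFrame({
    'Postal Code': ['M1E', 'M1B', 'M1C'],
    'Latitude': [43.763573, 43.806686, 43.784535],
    'Longitude': [-79.188711, -79.194353, -79.160497],
})

merged = left.merge(
    right.rename(columns={'Postal Code': 'PostalCode'}),  # align the key name
    on='PostalCode',
    how='left',
    validate='one_to_one',  # raise if a postal code is duplicated on either side
)
print(merged)
```

With this form there is no duplicate "Postal Code" column to drop afterwards, and a missing coordinate shows up as NaN instead of a silently misaligned row.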


#### Now we can drop "Postal Code" column from our set:

In [25]:
df_toronto.drop('Postal Code', axis=1, inplace = True)
df_toronto.head(10)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.264848


#### As a safety measure, we are going to save full set to .csv file:

In [26]:
df_toronto.to_csv('Toronto_FullSet.csv')

# Segmenting and Clustering Neighborhoods in Toronto

Let's import the libraries that we need:

In [27]:
import numpy as np 
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from geopy.geocoders import Nominatim
import requests 

from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated
import json

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
import folium

Let's take a look at our dataset, which was created above:

In [28]:
df_toronto.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


#### Let's use the geopy library to get the latitude and longitude values of Toronto.

In [29]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
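`geolocator.geocode` returns `None` when it finds no match, in which case `location.latitude` fails with an `AttributeError`. Below is a minimal sketch of a guard around the call. The helper name `geocode_or_raise` and the stand-in `FakeLocation` are assumptions for illustration; in the notebook the function passed in would be `geolocator.geocode`:

```python
from collections import namedtuple

def geocode_or_raise(geocode_fn, address):
    """Call a geocoding function and fail with a clear message if nothing was found."""
    location = geocode_fn(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude

# Stand-in for a geopy Location; only the two attributes used here are modelled.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])

demo_coords = geocode_or_raise(lambda addr: FakeLocation(43.65, -79.38), 'Toronto, ON')
print(demo_coords)
```

In the cell above this would read `latitude, longitude = geocode_or_raise(geolocator.geocode, address)`.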


#### Create a map of Toronto with neighborhoods superimposed on top.

In [30]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

<img src = "https://github.com/juricajelic/Coursera_Capstone/blob/master/toronto_map.png?raw=true" width = 1000>

#### Let's check out the neighborhoods in the borough of Scarborough:

In [31]:
scarborough_data = df_toronto[df_toronto['Borough'] == 'Scarborough'].reset_index(drop=True)
scarborough_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [32]:
address = 'Scarborough, Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Scarborough, Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Scarborough, Toronto are 43.773077, -79.257774.


In [33]:
# create map of Scarborough using latitude and longitude values
map_scarborough = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(scarborough_data['Latitude'], scarborough_data['Longitude'], scarborough_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_scarborough)  
    
map_scarborough

<img src = "https://github.com/juricajelic/Coursera_Capstone/blob/master/Scarborough.png?raw=true" width = 1000>

#### Let's use the Foursquare API to explore the neighborhoods and segment them

CREDENTIALS:

In [34]:
CLIENT_ID = 'KOEKRAHZNXXK5WU4QDWWUQGMNYRXSYXUFB4TAZDOI1FS43D5' # your Foursquare ID
CLIENT_SECRET = '4CL5JYQOKD15LNR0S4XALDHHLPNGLBZLNIUDQY2TIMLXKN1F' # your Foursquare Secret
VERSION = '20190623' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: KOEKRAHZNXXK5WU4QDWWUQGMNYRXSYXUFB4TAZDOI1FS43D5
CLIENT_SECRET:4CL5JYQOKD15LNR0S4XALDHHLPNGLBZLNIUDQY2TIMLXKN1F
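Hard-coding and printing credentials makes them visible to anyone who can read the notebook. A common alternative is to read them from environment variables and to mask them when printing. The sketch below is one way to do that; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions, not part of the Foursquare API:

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from a mapping of environment variables (names are assumptions)."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None

def mask_secret(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id-123', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret-456'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask_secret(demo_id))
print('CLIENT_SECRET: ' + mask_secret(demo_secret))
```

Calling `load_foursquare_credentials()` without arguments reads from the real process environment.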


#### Get the neighbourhood name:

In [35]:
scarborough_data.loc[4, 'Neighborhood']

'Cedarbrae'

In [36]:
neighborhood_latitude = scarborough_data.loc[4, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = scarborough_data.loc[4, 'Longitude'] # neighborhood longitude value

neighborhood_name = scarborough_data.loc[4, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Cedarbrae are 43.773136, -79.23947609999999.


#### Now, let's get up to 300 venues (the value of LIMIT) within a radius of 600 meters of Cedarbrae.

In [37]:
radius = 600 # define radius
LIMIT = 300
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=KOEKRAHZNXXK5WU4QDWWUQGMNYRXSYXUFB4TAZDOI1FS43D5&client_secret=4CL5JYQOKD15LNR0S4XALDHHLPNGLBZLNIUDQY2TIMLXKN1F&v=20190623&ll=43.773136,-79.23947609999999&radius=600&limit=300'
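The query string can also be assembled from a dictionary, which avoids counting `{}` placeholders and handles URL encoding. This is a minimal sketch using only the standard library; the helper name `build_explore_url` and the placeholder credentials are assumptions, while the endpoint and parameter names are the ones used in the cell above:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from a dict of query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude as a single parameter
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; real values would come from CLIENT_ID and CLIENT_SECRET
demo_url = build_explore_url('ID', 'SECRET', '20190623', 43.773136, -79.239476, 600, 300)
print(demo_url)
```

`urlencode` percent-encodes the comma in `ll` as `%2C`, which is an equivalent encoding of the same value. Passing the same dictionary as `params=` to `requests.get` is another option.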

#### Let's see the results:

In [38]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d107e6a5315930039878d0b'},
 'response': {'headerLocation': 'Woburn',
  'headerFullLocation': 'Woburn, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 15,
  'suggestedBounds': {'ne': {'lat': 43.77853600540001,
    'lng': -79.23201170809249},
   'sw': {'lat': 43.767735994599995, 'lng': -79.2469404919075}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b1711a6f964a520cbc123e3',
       'name': 'Federick Restaurant',
       'location': {'address': '1920 Ellesmere Rd',
        'crossStreet': 'at Bellamy Rd. N',
        'lat': 43.77469659057996,
        'lng': -79.24114242818267,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.77469659057996,
          'lng': -79.24114242818267}],
        'd

#### Let's extract the category of the venue:

In [39]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories',
    # depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [40]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,Federick Restaurant,Hakka Restaurant,43.774697,-79.241142
1,Drupati's Roti & Doubles,Caribbean Restaurant,43.775222,-79.241678
2,CANBE Foods Inc,Indian Restaurant,43.773546,-79.246082
3,Thai One On,Thai Restaurant,43.774468,-79.241268
4,Hakka Legend,Chinese Restaurant,43.776309,-79.234939
5,Centennial Recreation Centre,Athletics & Sports,43.774593,-79.2365
6,La Sani Grill,Indian Restaurant,43.776214,-79.234848
7,TD Canada Trust,Bank,43.774952,-79.241343
8,B&A Bakery,Bakery,43.774391,-79.243877
9,Popeyes Louisiana Kitchen,Fried Chicken Joint,43.77593,-79.235328


In [41]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

15 venues were returned by Foursquare.


## Exploring Neighborhoods in Scarborough, Toronto:

In [42]:
def getNearbyVenues(names, latitudes, longitudes, radius=600):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
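A venue can come back with an empty `categories` list, and indexing `categories[0]`, as the function above does, would then raise an `IndexError`. The per-venue extraction can be pulled into a small helper that is testable without calling the API. This is a sketch under that assumption; the helper name `extract_venue_rows` and the hand-written `demo_items` are illustrative and only mimic the response shape shown earlier:

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into flat tuples, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows

# Hand-written items mimicking the response shape; not real API output.
demo_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 43.77, 'lng': -79.24},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.78, 'lng': -79.23},
               'categories': []}},
]
demo_rows = extract_venue_rows('Cedarbrae', 43.773136, -79.239476, demo_items)
print(demo_rows)
```

Inside `getNearbyVenues`, the list comprehension could then become `venues_list.append(extract_venue_rows(name, lat, lng, results))`.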

In [43]:
scarborough_venues = getNearbyVenues(names=scarborough_data['Neighborhood'],
                                   latitudes=scarborough_data['Latitude'],
                                   longitudes=scarborough_data['Longitude']
                                  )


Rouge,Malvern
Highland Creek,Rouge Hill,Port Union
Guildwood,Morningside,West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park,Ionview,Kennedy Park
Clairlea,Golden Mile,Oakridge
Cliffcrest,Cliffside,Scarborough Village West
Birch Cliff,Cliffside West
Dorset Park,Scarborough Town Centre,Wexford Heights
Maryvale,Wexford
Agincourt
Clarks Corners,Sullivan,Tam O'Shanter
Agincourt North,L'Amoreaux East,Milliken,Steeles East
L'Amoreaux West
Upper Rouge


Let's observe the resulting frame:

In [44]:
print(scarborough_venues.shape)
scarborough_venues.head()

(146, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge,Malvern",43.806686,-79.194353,Images Salon & Spa,43.802283,-79.198565,Spa
1,"Rouge,Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
2,"Rouge,Malvern",43.806686,-79.194353,Wendy's,43.802008,-79.19808,Fast Food Restaurant
3,"Rouge,Malvern",43.806686,-79.194353,Lee Valley,43.803161,-79.199681,Hobby Shop
4,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Chris Effects Painting,43.784343,-79.163742,Construction & Landscaping


Let's check the number of venues for each neighborhood:

In [45]:
scarborough_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",6,6,6,6,6,6
"Birch Cliff,Cliffside West",5,5,5,5,5,5
Cedarbrae,15,15,15,15,15,15
"Clairlea,Golden Mile,Oakridge",17,17,17,17,17,17
"Clarks Corners,Sullivan,Tam O'Shanter",13,13,13,13,13,13
"Cliffcrest,Cliffside,Scarborough Village West",8,8,8,8,8,8
"Dorset Park,Scarborough Town Centre,Wexford Heights",11,11,11,11,11,11
"East Birchmount Park,Ionview,Kennedy Park",8,8,8,8,8,8
"Guildwood,Morningside,West Hill",16,16,16,16,16,16


# Neighbourhood analysis:

In [46]:
# one hot encoding
scarborough_onehot = pd.get_dummies(scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
scarborough_onehot['Neighborhood'] = scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [scarborough_onehot.columns[-1]] + list(scarborough_onehot.columns[:-1])
scarborough = scarborough_onehot[fixed_columns]

scarborough_onehot.head()

Unnamed: 0,American Restaurant,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Basketball Court,Breakfast Spot,Bus Line,Bus Station,Bus Stop,Business Service,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,College Stadium,Construction & Landscaping,Convenience Store,Department Store,Diner,Discount Store,Electronics Store,Fast Food Restaurant,Fish Market,Flower Shop,Fried Chicken Joint,Furniture / Home Store,Gas Station,General Entertainment,Greek Restaurant,Grocery Store,Hakka Restaurant,Hobby Shop,Hockey Arena,Home Service,Indian Restaurant,Intersection,Italian Restaurant,Korean Restaurant,Latin American Restaurant,Light Rail Station,Lounge,Medical Center,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Motel,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Playground,Rental Car Location,Sandwich Place,Seafood Restaurant,Skating Rink,Soccer Field,Spa,Sporting Goods Shop,Tech Startup,Thai Restaurant,Thrift / Vintage Store,Vietnamese Restaurant,Wings Joint,Neighborhood
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,"Rouge,Malvern"
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Rouge,Malvern"
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Rouge,Malvern"
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Rouge,Malvern"
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Highland Creek,Rouge Hill,Port Union"


In [47]:
scarborough_onehot.shape

(146, 68)

#### Let's group rows by neighborhood and take the mean of the frequency of occurrence of each category


In [48]:
scarborough_grouped = scarborough_onehot.groupby('Neighborhood').mean().reset_index()
scarborough_grouped

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Basketball Court,Breakfast Spot,Bus Line,Bus Station,Bus Stop,Business Service,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,College Stadium,Construction & Landscaping,Convenience Store,Department Store,Diner,Discount Store,Electronics Store,Fast Food Restaurant,Fish Market,Flower Shop,Fried Chicken Joint,Furniture / Home Store,Gas Station,General Entertainment,Greek Restaurant,Grocery Store,Hakka Restaurant,Hobby Shop,Hockey Arena,Home Service,Indian Restaurant,Intersection,Italian Restaurant,Korean Restaurant,Latin American Restaurant,Light Rail Station,Lounge,Medical Center,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Motel,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Playground,Rental Car Location,Sandwich Place,Seafood Restaurant,Skating Rink,Soccer Field,Spa,Sporting Goods Shop,Tech Startup,Thai Restaurant,Thrift / Vintage Store,Vietnamese Restaurant,Wings Joint
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
1,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Birch Cliff,Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Cedarbrae,0.0,0.066667,0.066667,0.0,0.133333,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0
4,"Clairlea,Golden Mile,Oakridge",0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.117647,0.058824,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.058824,0.0,0.117647,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.176471,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Clarks Corners,Sullivan,Tam O'Shanter",0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.153846,0.153846,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0
6,"Cliffcrest,Cliffside,Scarborough Village West",0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125
7,"Dorset Park,Scarborough Town Centre,Wexford He...",0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909
8,"East Birchmount Park,Ionview,Kennedy Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Guildwood,Morningside,West Hill",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.125,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0625,0.0,0.0
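The one-hot and group-mean steps are easier to follow on a tiny example. In the sketch below (made-up data; the names `toy_venues`, `toy_onehot`, and `toy_freq` are illustrative), each venue becomes a row of 0/1 indicators, and the mean of those indicators per neighborhood is the share of that neighborhood's venues in each category:

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bakery', 'Bakery', 'Park', 'Bank', 'Park', 'Park'],
})

# one indicator column per category, one row per venue
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# mean of the indicators = fraction of venues in each category
toy_freq = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_freq)
```

Neighborhood A has four venues, two of them bakeries, so its Bakery frequency is 0.5. Each row of frequencies sums to 1.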


#### Let's print each neighborhood along with the top 5 most common venues

In [49]:
num_top_venues = 5

for hood in scarborough_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = scarborough_grouped[scarborough_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                 venue  freq
0  Sporting Goods Shop  0.25
1       Sandwich Place  0.25
2       Breakfast Spot  0.25
3               Lounge  0.25
4  American Restaurant  0.00


----Agincourt North,L'Amoreaux East,Milliken,Steeles East----
                  venue  freq
0             BBQ Joint  0.33
1  Fast Food Restaurant  0.33
2    Chinese Restaurant  0.17
3                  Park  0.17
4   American Restaurant  0.00


----Birch Cliff,Cliffside West----
                   venue  freq
0        College Stadium   0.2
1  General Entertainment   0.2
2                  Diner   0.2
3                   Café   0.2
4           Skating Rink   0.2


----Cedarbrae----
                  venue  freq
0     Indian Restaurant  0.13
1                Bakery  0.13
2  Caribbean Restaurant  0.07
3      Asian Restaurant  0.07
4                Lounge  0.07


----Clairlea,Golden Mile,Oakridge----
          venue  freq
0  Intersection  0.18
1        Bakery  0.12
2   Coffee Shop  0.12
3      Bus Li

#### Let's put that into a *pandas* dataframe

In [50]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] =scarborough_grouped['Neighborhood']

for ind in np.arange(scarborough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scarborough_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Sandwich Place,Lounge,Sporting Goods Shop,Breakfast Spot,Wings Joint,Construction & Landscaping,Convenience Store,Department Store,Diner,Discount Store
1,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Fast Food Restaurant,BBQ Joint,Park,Chinese Restaurant,Gas Station,Furniture / Home Store,Fried Chicken Joint,Flower Shop,Fish Market,Coffee Shop
2,"Birch Cliff,Cliffside West",College Stadium,General Entertainment,Café,Diner,Skating Rink,Electronics Store,Construction & Landscaping,Convenience Store,Department Store,Discount Store
3,Cedarbrae,Indian Restaurant,Bakery,Hakka Restaurant,Caribbean Restaurant,Fried Chicken Joint,Flower Shop,Lounge,Coffee Shop,Chinese Restaurant,Asian Restaurant
4,"Clairlea,Golden Mile,Oakridge",Intersection,Coffee Shop,Diner,Bakery,Bus Line,Convenience Store,Park,Soccer Field,Fast Food Restaurant,Metro Station
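To see what `return_most_common_venues` does on a single row, here is a minimal sketch on a made-up frequency table. The neighborhood names `A` and `B` and all frequencies are illustrative assumptions, not values from the Scarborough data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), then rank categories by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# toy data (hypothetical frequencies, for illustration only)
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bakery': [0.50, 0.10],
    'Coffee Shop': [0.30, 0.20],
    'Park': [0.20, 0.70],
})

top2_a = list(return_most_common_venues(toy_grouped.iloc[0, :], 2))
top2_b = list(return_most_common_venues(toy_grouped.iloc[1, :], 2))
print(top2_a)
print(top2_b)
```

Categories tied at the same frequency (for example, several at 0.0) come back in whatever order the sort leaves them, so the tail of a top-10 list for a neighborhood with few venues is likely filler rather than a meaningful ranking.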


## Cluster Neighborhoods

In [51]:
# set number of clusters
kclusters = 5

# keep only the numeric frequency columns (axis must be passed by keyword in recent pandas)
scarborough_grouped_clustering = scarborough_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(scarborough_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 3, 2, 2, 1, 2, 2, 2, 1, 2], dtype=int32)
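The label values are arbitrary cluster ids, so a quick sanity check is to count how many rows fall into each cluster. The sketch below uses only the ten labels printed above; in the notebook one would presumably pass the full `kmeans.labels_` array instead.

```python
import numpy as np

# the first ten labels printed above
labels = np.array([4, 3, 2, 2, 1, 2, 2, 2, 1, 2])
kclusters = 5

# counts[i] is how many of these rows were assigned to cluster i
counts = np.bincount(labels, minlength=kclusters)
for cluster, n in enumerate(counts):
    print(f"cluster {cluster}: {n} neighborhood(s)")
```

A very uneven split, with one large cluster and several singletons, can be a hint that a different `kclusters` is worth trying.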

#### Add clustering labels:

In [52]:

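# add the cluster label as the first column (in place; re-running this cell raises an error because the column already exists)
neighborhoods_venues_sorted.insert(0, 'Clusters', kmeans.labels_)

# join the labels and top venues onto the location data by neighborhood name
scarborough_merged = scarborough_data.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')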


scarborough_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,3.0,Fast Food Restaurant,Hobby Shop,Spa,Wings Joint,Electronics Store,Construction & Landscaping,Convenience Store,Department Store,Diner,Discount Store
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,0.0,Construction & Landscaping,Bar,Wings Joint,Fast Food Restaurant,Convenience Store,Department Store,Diner,Discount Store,Electronics Store,Fish Market
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711,2.0,Pizza Place,Electronics Store,Intersection,Bus Line,Breakfast Spot,Fried Chicken Joint,Rental Car Location,Mexican Restaurant,Park,Fast Food Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,2.0,Coffee Shop,Business Service,Convenience Store,Korean Restaurant,Furniture / Home Store,Fried Chicken Joint,Flower Shop,Fish Market,College Stadium,Fast Food Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,2.0,Indian Restaurant,Bakery,Hakka Restaurant,Caribbean Restaurant,Fried Chicken Joint,Flower Shop,Lounge,Coffee Shop,Chinese Restaurant,Asian Restaurant


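The float values in `Clusters` (3.0, 0.0, ...) indicate that at least one row has no cluster label (NaN) after the join. Drop those rows and convert the cluster values to int for plotting in Folium: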

In [53]:
scarborough_merged.dropna(axis=0,inplace = True)
scarborough_merged['Clusters'] = scarborough_merged['Clusters'].astype('int')
scarborough_merged


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,3,Fast Food Restaurant,Hobby Shop,Spa,Wings Joint,Electronics Store,Construction & Landscaping,Convenience Store,Department Store,Diner,Discount Store
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,0,Construction & Landscaping,Bar,Wings Joint,Fast Food Restaurant,Convenience Store,Department Store,Diner,Discount Store,Electronics Store,Fish Market
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711,2,Pizza Place,Electronics Store,Intersection,Bus Line,Breakfast Spot,Fried Chicken Joint,Rental Car Location,Mexican Restaurant,Park,Fast Food Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,2,Coffee Shop,Business Service,Convenience Store,Korean Restaurant,Furniture / Home Store,Fried Chicken Joint,Flower Shop,Fish Market,College Stadium,Fast Food Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,2,Indian Restaurant,Bakery,Hakka Restaurant,Caribbean Restaurant,Fried Chicken Joint,Flower Shop,Lounge,Coffee Shop,Chinese Restaurant,Asian Restaurant
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,2,Playground,Caribbean Restaurant,Pizza Place,Basketball Court,Middle Eastern Restaurant,Diner,College Stadium,Construction & Landscaping,Convenience Store,Department Store
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029,1,Coffee Shop,Bus Line,Convenience Store,Metro Station,Hockey Arena,Department Store,Light Rail Station,Discount Store,Diner,Electronics Store
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.711112,-79.284577,1,Intersection,Coffee Shop,Diner,Bakery,Bus Line,Convenience Store,Park,Soccer Field,Fast Food Restaurant,Metro Station
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.239476,2,Motel,Wings Joint,Park,Chinese Restaurant,Furniture / Home Store,Home Service,American Restaurant,Soccer Field,Skating Rink,Fried Chicken Joint
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.264848,2,College Stadium,General Entertainment,Café,Diner,Skating Rink,Electronics Store,Construction & Landscaping,Convenience Store,Department Store,Discount Store
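The float-then-int behaviour of `Clusters` can be reproduced on a tiny example. This is a sketch with hypothetical stand-ins (`left`, `right`) for `scarborough_data` and `neighborhoods_venues_sorted`; the names and values are made up for illustration.

```python
import pandas as pd

# hypothetical stand-ins for scarborough_data and neighborhoods_venues_sorted
left = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
right = pd.DataFrame({'Clusters': [2, 0], 'Neighborhood': ['A', 'C']})

# 'B' has no match on the right, so its Clusters value is NaN and the column becomes float
merged = left.join(right.set_index('Neighborhood'), on='Neighborhood')
print(merged['Clusters'].dtype)

# drop the unmatched row, then the column can be cast back to int
cleaned = merged.dropna(axis=0).copy()
cleaned['Clusters'] = cleaned['Clusters'].astype('int')
print(cleaned)
```

Note that `dropna(axis=0)` removes a row if any column is NaN, not only `Clusters`, so it is worth checking which rows disappear.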


### Let's visualize the resulting clusters

In [54]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(scarborough_merged['Latitude'], scarborough_merged['Longitude'],
                                  scarborough_merged['Neighborhood'], scarborough_merged['Clusters']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # cluster-1 maps cluster 0 to the last color (index -1); each cluster still gets its own color
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<img src = "https://github.com/juricajelic/Coursera_Capstone/blob/master/clusters_new.png?raw=true" width = 1000>
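The color lookup in the map cell uses `rainbow[cluster-1]`, which is easy to misread. The sketch below shows how that indexing assigns colors. It builds the palette with the standard-library `colorsys` module as a stand-in for `cm.rainbow` and `colors.rgb2hex`, so the actual hex values differ from the ones on the map.

```python
import colorsys

kclusters = 5

# one evenly spaced hue per cluster, converted to a hex string
# (colorsys is a standard-library stand-in for cm.rainbow + colors.rgb2hex)
palette = []
for i in range(kclusters):
    r, g, b = colorsys.hsv_to_rgb(i / kclusters, 1.0, 1.0)
    palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))

# the map code indexes with cluster-1, so cluster 0 takes the last color
assigned = {cluster: palette[cluster - 1] for cluster in range(kclusters)}
print(assigned)
```

Because Python's negative indexing wraps around, the mapping is a rotation of the palette: every cluster still receives a distinct color, just not the one at its own index.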

### Overview of clusters:

#### 1st Cluster:

In [55]:
scarborough_merged.loc[scarborough_merged['Clusters'] == 0, scarborough_merged.columns[[1] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,0,Construction & Landscaping,Bar,Wings Joint,Fast Food Restaurant,Convenience Store,Department Store,Diner,Discount Store,Electronics Store,Fish Market


#### 2nd Cluster:

In [56]:
scarborough_merged.loc[scarborough_merged['Clusters'] == 1, scarborough_merged.columns[[1] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,1,Coffee Shop,Bus Line,Convenience Store,Metro Station,Hockey Arena,Department Store,Light Rail Station,Discount Store,Diner,Electronics Store
7,Scarborough,1,Intersection,Coffee Shop,Diner,Bakery,Bus Line,Convenience Store,Park,Soccer Field,Fast Food Restaurant,Metro Station


#### 3rd Cluster:

In [57]:
scarborough_merged.loc[scarborough_merged['Clusters'] == 2, scarborough_merged.columns[[1] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Scarborough,2,Pizza Place,Electronics Store,Intersection,Bus Line,Breakfast Spot,Fried Chicken Joint,Rental Car Location,Mexican Restaurant,Park,Fast Food Restaurant
3,Scarborough,2,Coffee Shop,Business Service,Convenience Store,Korean Restaurant,Furniture / Home Store,Fried Chicken Joint,Flower Shop,Fish Market,College Stadium,Fast Food Restaurant
4,Scarborough,2,Indian Restaurant,Bakery,Hakka Restaurant,Caribbean Restaurant,Fried Chicken Joint,Flower Shop,Lounge,Coffee Shop,Chinese Restaurant,Asian Restaurant
5,Scarborough,2,Playground,Caribbean Restaurant,Pizza Place,Basketball Court,Middle Eastern Restaurant,Diner,College Stadium,Construction & Landscaping,Convenience Store,Department Store
8,Scarborough,2,Motel,Wings Joint,Park,Chinese Restaurant,Furniture / Home Store,Home Service,American Restaurant,Soccer Field,Skating Rink,Fried Chicken Joint
9,Scarborough,2,College Stadium,General Entertainment,Café,Diner,Skating Rink,Electronics Store,Construction & Landscaping,Convenience Store,Department Store,Discount Store
10,Scarborough,2,Indian Restaurant,Electronics Store,Wings Joint,Pet Store,Vietnamese Restaurant,Latin American Restaurant,Coffee Shop,Chinese Restaurant,Bakery,Basketball Court
11,Scarborough,2,Middle Eastern Restaurant,Pizza Place,Grocery Store,Vietnamese Restaurant,Fish Market,Intersection,Korean Restaurant,Coffee Shop,Café,Pharmacy
13,Scarborough,2,Pizza Place,Pharmacy,Chinese Restaurant,Fried Chicken Joint,Thai Restaurant,Fast Food Restaurant,Bank,Noodle House,Sandwich Place,Bus Stop
15,Scarborough,2,Chinese Restaurant,Fast Food Restaurant,Pharmacy,Pizza Place,Breakfast Spot,Coffee Shop,American Restaurant,Grocery Store,Thrift / Vintage Store,Sandwich Place


#### 4th Cluster:

In [58]:
scarborough_merged.loc[scarborough_merged['Clusters'] == 3, scarborough_merged.columns[[1] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,3,Fast Food Restaurant,Hobby Shop,Spa,Wings Joint,Electronics Store,Construction & Landscaping,Convenience Store,Department Store,Diner,Discount Store
14,Scarborough,3,Fast Food Restaurant,BBQ Joint,Park,Chinese Restaurant,Gas Station,Furniture / Home Store,Fried Chicken Joint,Flower Shop,Fish Market,Coffee Shop


#### 5th Cluster:

In [59]:
scarborough_merged.loc[scarborough_merged['Clusters'] == 4, scarborough_merged.columns[[1] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Scarborough,4,Sandwich Place,Lounge,Sporting Goods Shop,Breakfast Spot,Wings Joint,Construction & Landscaping,Convenience Store,Department Store,Diner,Discount Store
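The five cells above repeat the same filter with a different cluster id. As a compact alternative, a `groupby` can summarize all clusters at once. The sketch below uses a handful of (cluster, top venue) pairs copied from the tables above rather than the full `scarborough_merged` dataframe.

```python
import pandas as pd

# a few (cluster, top venue) pairs copied from the tables above
sample = pd.DataFrame({
    'Clusters': [3, 0, 2, 2, 2, 1, 1, 4, 3],
    '1st Most Common Venue': [
        'Fast Food Restaurant', 'Construction & Landscaping', 'Pizza Place',
        'Coffee Shop', 'Indian Restaurant', 'Coffee Shop', 'Intersection',
        'Sandwich Place', 'Fast Food Restaurant',
    ],
})

# list the top venues seen in each cluster instead of filtering one cluster per cell
summary = sample.groupby('Clusters')['1st Most Common Venue'].apply(list)
for cluster, venues in summary.items():
    print(cluster, venues)
```

In this sample, both neighborhoods in the cluster labelled 3 have Fast Food Restaurant as their most common venue, while the clusters labelled 0 and 4 contain a single neighborhood each.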
