# IBM Applied Data Science Capstone for Professional Certificate

## Week 3 Final Assignment - Segmenting & Clustering Neighborhoods

#### Install packages to get started!

In [1]:
# install packages
import pandas as pd
import numpy as np
!pip install requests
# !pip install beautifulsoup4 -- not going with this approach as much more complicated
print('Libraries imported successfully!')

Libraries imported successfully!


#### Bring the Toronto table into a pandas dataframe and clean it up as noted in the assignment

In [2]:
# bring in table with pandas
dfs = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M",header=0)
tor_df = dfs[0] # bring in first table as df
print('Size of original dataframe is', tor_df.shape)
bor_na = tor_df['Borough'] != 'Not assigned' # Boolean test for Borough <> Not assigned
tor_df1 = tor_df[bor_na] # final dataframe w/out Borough = Not assigned
print('Size of df with Borough "Not assigned" removed is', tor_df1.shape)

# any Neighbourhood = Not assigned?
neibrhd_na = tor_df1[tor_df1.Neighbourhood == 'Not assigned']
print('Number of records with Neighbourhood = Not assigned is', neibrhd_na.shape)
print(neibrhd_na)
print('No rows contain Neighbourhoods = Not assigned')

Size of original dataframe is (180, 3)
Size of df with Borough "Not assigned" removed is (103, 3)
Number of records with Neighbourhood = Not assigned is (0, 3)
Empty DataFrame
Columns: [Postal Code, Borough, Neighbourhood]
Index: []
No rows contain Neighbourhoods = Not assigned


#### Reset the index of the final Toronto dataframe, print its shape, and preview it with .head()

In [3]:
tor_df1.reset_index(drop=True,inplace=True)
print('Number of rows of cleaned Toronto dataframe is', tor_df1.shape)
tor_df1.head()

Number of rows of cleaned Toronto dataframe is (103, 3)


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


#### Bring in the latitude and longitude csv file

In [4]:
# read in the csv file with latitude and longitude coordinates
lat_long = pd.read_csv("http://cocl.us/Geospatial_data",header=0)
lat_long.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [5]:
# Merge the two dfs together and view
tor_final_df = pd.merge(tor_df1, lat_long, on='Postal Code')
tor_final_df.columns = ['PostalCode','Borough','Neighborhood','Latitude','Longitude'] # do away with spaces in Columns
tor_final_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


#### Limit Boroughs to those that contain Toronto to explore and cluster

In [6]:
# limit Boroughs to those that contain Toronto
tor_final_df = tor_final_df[tor_final_df['Borough'].str.contains('Toronto')]
tor_final_df.reset_index(drop=True,inplace=True)
print('Number of rows with Toronto in Borough is', tor_final_df.shape)
tor_final_df.head()

Number of rows with Toronto in Borough is (39, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


### Geographical data for Toronto, Ontario

In [7]:
# bring in necessary libraries
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim




In [8]:
# Lat and Long data for Toronto
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The lat and long coordinates of Toronto are ', latitude, ',', longitude)

The lat and long coordinates of Toronto are  43.6534817 , -79.3839347


### Define Foursquare Credentials and Version

In [9]:
CLIENT_ID = '1EICQWL3EGEU3WKOP2Z353S3L1ANUPVGHFGD3KJ10WGSCTCC'
CLIENT_SECRET = 'SDH1N2FSMBSVIAQ3GVRXXZI4OTGRIQT3UCMZFHX204NU21C0'
VERSION = '20180605'
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ', CLIENT_ID)
print('CLIENT_SECRET: ', CLIENT_SECRET)

Your credentials:
CLIENT_ID:  1EICQWL3EGEU3WKOP2Z353S3L1ANUPVGHFGD3KJ10WGSCTCC
CLIENT_SECRET:  SDH1N2FSMBSVIAQ3GVRXXZI4OTGRIQT3UCMZFHX204NU21C0


### Create GET request and URL

In [10]:
LIMIT = 100
radius = 500
toronto_latitude = latitude
toronto_longitude = longitude
ACCESS_TOKEN = '0XTELH0D32YAMWODS0QX5PHBXV5TK1E30CA3ZRIDKMS3I1IA'
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&oauth_token={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    ACCESS_TOKEN,
    VERSION,
    toronto_latitude,
    toronto_longitude,
    radius,
    LIMIT)

print('Toronto Lat and Long are: ', toronto_latitude, ',', toronto_longitude)
url

Toronto Lat and Long are:  43.6534817 , -79.3839347


'https://api.foursquare.com/v2/venues/explore?&client_id=1EICQWL3EGEU3WKOP2Z353S3L1ANUPVGHFGD3KJ10WGSCTCC&client_secret=SDH1N2FSMBSVIAQ3GVRXXZI4OTGRIQT3UCMZFHX204NU21C0&oauth_token=0XTELH0D32YAMWODS0QX5PHBXV5TK1E30CA3ZRIDKMS3I1IA&v=20180605&ll=43.6534817,-79.3839347&radius=500&limit=100'
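The URL above is assembled with `str.format` and has the credentials written directly into the notebook. As a rough sketch of an alternative, the query string could be built with `urllib.parse.urlencode` and the credentials read from environment variables. The helper name `build_explore_url` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions for illustration, and the `oauth_token` parameter is left out for brevity:

```python
import os
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, radius=500, limit=100, version='20180605'):
    """Build an explore URL; the environment variable names are hypothetical."""
    params = {
        'client_id': os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID'),
        'client_secret': os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET'),
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-escapes values, so the comma in ll becomes %2C
    return BASE_URL + '?' + urlencode(params)

url = build_explore_url(43.6534817, -79.3839347)
query = parse_qs(urlsplit(url).query)  # decode again to inspect the parameters
print(query['ll'], query['radius'], query['limit'])
```

Keeping the secret out of the notebook also keeps it out of any printed output or exported copy.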

### Send the GET requests and examine the results

In [11]:
# import appropriate libraries
import json
import requests
from pandas import json_normalize # transforms JSON file into a pandas dataframe (pandas >= 1.0)

In [12]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '602727d5e0b857202ab3ee72'},
 'notifications': [{'type': 'notificationTray', 'item': {'unreadCount': 0}}],
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 162,
  'suggestedBounds': {'ne': {'lat': 43.6579817045, 'lng': -79.37772678059432},
   'sw': {'lat': 43.6489816955, 'lng': -79.39014261940568}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatL

In [13]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # rows flattened by json_normalize use the dotted column name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']
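The function accepts either a raw venue record (key `categories`) or a flattened row (key `venue.categories`). To make those two paths and the empty-list case concrete, here is a small sketch on plain dictionaries. The helper `first_category` is a hypothetical variant written for this example, and the records are made up:

```python
def first_category(record):
    """Return the first category name, or None if the venue has no categories."""
    categories = record.get('categories')
    if categories is None:
        # fall back to the flattened column name
        categories = record.get('venue.categories', [])
    return categories[0]['name'] if categories else None

raw = {'categories': [{'name': 'Plaza'}]}
flattened = {'venue.categories': [{'name': 'Sushi Restaurant'}]}
uncategorised = {'venue.categories': []}

print(first_category(raw))
print(first_category(flattened))
print(first_category(uncategorised))
```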

### Clean the JSON file and put into pandas dataframe

In [14]:
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues) # flatten JSON

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns] # clean columns
nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,Japango,Sushi Restaurant,43.655268,-79.385165
3,Poke Guys,Poke Place,43.654895,-79.385052
4,Chatime 日出茶太,Bubble Tea Shop,43.655542,-79.384684


In [15]:
# number of venues returned by Foursquare
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [16]:
# unique categories from JSON file
print('There are {} unique categories.'.format(len(nearby_venues['categories'].unique())))

There are 62 unique categories.


### Explore Neighborhoods in Toronto

In [17]:
LIMIT = 50

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&oauth_token={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            ACCESS_TOKEN,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
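One thing to watch in the function above: the list comprehension indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for any venue that comes back with an empty category list. A minimal sketch of a more tolerant parsing step follows. The helper `parse_items` and the sample payload are hypothetical and only mimic the shape of the response shown earlier:

```python
def parse_items(name, lat, lng, items):
    """Flatten explore 'items' into tuples, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Made-up payload; the second venue deliberately has no categories.
sample_items = [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabelled Venue',
               'location': {'lat': 43.654, 'lng': -79.36},
               'categories': []}},
]

rows = parse_items('Regent Park, Harbourfront', 43.65426, -79.360636, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category could then be dropped or relabelled before the one-hot encoding step.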

#### Code to apply the function to each neighborhood

In [18]:
toronto_venues = getNearbyVenues(names=tor_final_df['Neighborhood'],
                                   latitudes=tor_final_df['Latitude'],
                                   longitudes=tor_final_df['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West, Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
R

In [19]:
# size/shape and preview of df
print(toronto_venues.shape)
toronto_venues.head()

(1398, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [20]:
# count of venues for each neighborhood
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,50,50,50,50,50,50
"Brockton, Parkdale Village, Exhibition Place",35,35,35,35,35,35
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",23,23,23,23,23,23
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",19,19,19,19,19,19
Central Bay Street,50,50,50,50,50,50
Christie,29,29,29,29,29,29
Church and Wellesley,50,50,50,50,50,50
"Commerce Court, Victoria Hotel",50,50,50,50,50,50
Davisville,44,44,44,44,44,44
Davisville North,13,13,13,13,13,13


In [21]:
# how many unique categories
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 246 unique categories.


### Analyze each Neighborhood

In [22]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
# a venue category is itself named 'Neighborhood', so the column sits in the middle and
# [columns[-1]] + list(columns[:-1]) does not work; build the column order by name instead
fixed_columns = ['Neighborhood'] + [col for col in toronto_onehot.columns if col != 'Neighborhood']
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,ATM,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
# list the column names (originally run before the previous cell to see where Neighborhood sat among the columns)
toronto_onehot.columns.values.tolist()

['Neighborhood',
 'ATM',
 'Adult Boutique',
 'Airport',
 'Airport Food Court',
 'Airport Gate',
 'Airport Lounge',
 'Airport Service',
 'Airport Terminal',
 'American Restaurant',
 'Antique Shop',
 'Aquarium',
 'Arepa Restaurant',
 'Art Gallery',
 'Art Museum',
 'Arts & Crafts Store',
 'Asian Restaurant',
 'Athletics & Sports',
 'Auto Workshop',
 'BBQ Joint',
 'Baby Store',
 'Bagel Shop',
 'Bakery',
 'Bank',
 'Bar',
 'Baseball Field',
 'Baseball Stadium',
 'Basketball Stadium',
 'Beach',
 'Beer Bar',
 'Beer Store',
 'Belgian Restaurant',
 'Bistro',
 'Board Shop',
 'Boat or Ferry',
 'Bookstore',
 'Boutique',
 'Brazilian Restaurant',
 'Breakfast Spot',
 'Brewery',
 'Bubble Tea Shop',
 'Burger Joint',
 'Burrito Place',
 'Bus Line',
 'Bus Station',
 'Bus Stop',
 'Business Service',
 'Butcher',
 'Café',
 'Cajun / Creole Restaurant',
 'Candy Store',
 'Caribbean Restaurant',
 'Cheese Shop',
 'Chinese Restaurant',
 'Chocolate Shop',
 'Church',
 'Climbing Gym',
 'Clothing Store',
 'Cocktail Bar

In [24]:
# size of df
toronto_onehot.shape

(1398, 246)

#### Next, let's group rows by neighborhood and take the mean frequency of occurrence of each category

In [25]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,ATM,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.052632,0.052632,0.052632,0.105263,0.157895,0.105263,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,...,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
# size of new df
toronto_grouped.shape

(39, 246)
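The mean of a 0/1 one-hot column within a neighborhood is simply the share of that neighborhood's venues that fall in the category. A plain-Python sketch on made-up data (the neighborhoods `A` and `B` are hypothetical) shows the same arithmetic without pandas:

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs; illustrative data only
venues = [
    ('A', 'Coffee Shop'), ('A', 'Coffee Shop'), ('A', 'Park'), ('A', 'Bakery'),
    ('B', 'Park'), ('B', 'Pub'),
]

# Count how often each category occurs in each neighborhood.
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Dividing each count by the neighborhood's venue total gives the same number
# as averaging that category's one-hot column within the neighborhood.
frequencies = {
    hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
    for hood, cats in counts.items()
}

print(frequencies['A'])
print(frequencies['B'])
```

This is why a value such as 0.02 for Berczy Park corresponds to 1 venue out of the 50 returned for that neighborhood.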

#### Each Neighborhood with top 5 most common venues

In [27]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0         Coffee Shop  0.08
1        Cocktail Bar  0.06
2          Restaurant  0.04
3         Cheese Shop  0.04
4  Seafood Restaurant  0.04


----Brockton, Parkdale Village, Exhibition Place----
                   venue  freq
0                   Café  0.11
1            Coffee Shop  0.06
2         Breakfast Spot  0.06
3      Convenience Store  0.06
4  Performing Arts Venue  0.06


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                  venue  freq
0  Gym / Fitness Center  0.09
1  Fast Food Restaurant  0.09
2    Light Rail Station  0.09
3               Brewery  0.04
4               Butcher  0.04


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                 venue  freq
0      Airport Service  0.16
1       Airport Lounge  0.11
2     Airport Terminal  0.11
3  Rental Car Location  0.11
4             Boutique  0.05


---

### Top 5 into a df

In [28]:
# function
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# create the df
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Farmers Market,Cheese Shop,Beer Bar,Seafood Restaurant,Restaurant,Sandwich Place,Fish Market
1,"Brockton, Parkdale Village, Exhibition Place",Café,Sandwich Place,Performing Arts Venue,Nightclub,Breakfast Spot,Convenience Store,Coffee Shop,Bus Station,Bus Stop,Bakery
2,"Business reply mail Processing Centre, South C...",Gym / Fitness Center,Fast Food Restaurant,Light Rail Station,Garden Center,Skate Park,Restaurant,Recording Studio,Pizza Place,Park,Martial Arts School
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Rental Car Location,Airport Lounge,Airport Terminal,Plane,Harbor / Marina,Coffee Shop,Sculpture Garden,Boutique,Bar
4,Central Bay Street,Coffee Shop,Café,Chinese Restaurant,Sushi Restaurant,Italian Restaurant,Japanese Restaurant,Ice Cream Shop,Bubble Tea Shop,Spa,Pharmacy
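The column names above come from a three-item suffix list plus an exception fallback, which works for ranks 1 to 10. As a sketch of a more general approach, the helpers below (both hypothetical names, with made-up frequencies) build ordinal labels for any rank and pick the top categories from a frequency mapping:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_categories(frequencies, n):
    """Return the n category names with the highest frequency."""
    return sorted(frequencies, key=frequencies.get, reverse=True)[:n]

# Illustrative frequencies only.
freqs = {'Coffee Shop': 0.08, 'Cocktail Bar': 0.06, 'Bakery': 0.04, 'Park': 0.02}
columns = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 4)]
print(dict(zip(columns, top_categories(freqs, 3))))
```

Categories tied on frequency can come out in a different order depending on the sort used, which may be why the top-5 printout and this table rank some 0.04 categories differently.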


## Cluster Neighborhoods

In [29]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# Run K-Means to cluster Neighborhoods

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
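For intuition about what the clustering step does with the frequency vectors, here is a bare-bones sketch of the assign-then-update loop (Lloyd's algorithm) in plain Python. It is not scikit-learn's implementation, which by default uses k-means++ initialisation among other refinements. The function name and the tiny two-dimensional points are made up for illustration:

```python
import math

def simple_kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then recentre."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the label of its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Two made-up "frequency vectors" per group: one coffee-heavy, one park-heavy.
points = [(0.5, 0.0), (0.4, 0.1), (0.0, 0.6), (0.1, 0.5)]
labels, centroids = simple_kmeans(points, [points[0], points[2]])
print(labels)
print(centroids)
```

In the notebook itself, `collections.Counter(kmeans.labels_)` would be a quick way to see how many neighborhoods land in each cluster, since the first ten labels shown above are all 0.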

#### New df with clusters and top 10 for Neighborhoods

In [30]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = tor_final_df

# join the cluster labels and top venues onto tor_final_df, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Pub,Café,Theater,Breakfast Spot,Furniture / Home Store,Greek Restaurant,Historic Site
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Sushi Restaurant,Diner,Burrito Place,Burger Joint,Bubble Tea Shop,Distribution Center,Spa,Japanese Restaurant,Smoothie Shop
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Coffee Shop,Café,Ramen Restaurant,Clothing Store,Theater,Middle Eastern Restaurant,Diner,Burrito Place,Burger Joint,Lake
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Restaurant,Gastropub,Coffee Shop,Italian Restaurant,Café,Diner,Hotel,Farmers Market,Japanese Restaurant,Cocktail Bar
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Asian Restaurant,Pizza Place,Coffee Shop,Park,Trail,Health Food Store,Home Service,Pub,Diner,Dance Studio


#### Visualization Time!

In [34]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# folium
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library


In [35]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
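The marker colour is looked up with `rainbow[cluster-1]`, while the k-means labels start at 0. Label 0 therefore uses index -1, which Python reads as the last element of the list. A short sketch makes the resulting mapping explicit. The colour names are approximate stand-ins for the hex codes in `rainbow`, not values taken from the notebook:

```python
kclusters = 5

# Approximate names for the five colours, in palette order (an assumption).
palette = ['purple', 'aqua', 'green', 'orange', 'red']

# cluster - 1 is -1 for label 0, so label 0 wraps around to the last colour.
assigned = [palette[cluster - 1] for cluster in range(kclusters)]

for cluster, colour in enumerate(assigned):
    print('label', cluster, '->', colour)
```

This is consistent with the red, purple and aqua colours named in the cluster headings for labels 0, 1 and 2.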

## Examine each Neighborhood Cluster

### Cluster 1 --> RED dots on map

### Surprising findings
- Furniture/Home Store is located Downtown
- Construction/Landscaping is common in Downtown
- Skating Rinks/Parks (there are at least 2!)
- Some Martial Arts on Lunch Break? You can do it!
- One location has several Airport related venues (I didn't know the airport was right downtown!)
- There are some members of this cluster that are geographically closer to other clusters

### Unsurprising findings
- Cafes and coffee shops abound
- Plenty of food and entertainment venues
- This cluster has the most members
- Public transportation venues are common here, likely because of commuters

In [43]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Park,Bakery,Pub,Café,Theater,Breakfast Spot,Furniture / Home Store,Greek Restaurant,Historic Site
1,Downtown Toronto,0,Coffee Shop,Sushi Restaurant,Diner,Burrito Place,Burger Joint,Bubble Tea Shop,Distribution Center,Spa,Japanese Restaurant,Smoothie Shop
2,Downtown Toronto,0,Coffee Shop,Café,Ramen Restaurant,Clothing Store,Theater,Middle Eastern Restaurant,Diner,Burrito Place,Burger Joint,Lake
3,Downtown Toronto,0,Restaurant,Gastropub,Coffee Shop,Italian Restaurant,Café,Diner,Hotel,Farmers Market,Japanese Restaurant,Cocktail Bar
4,East Toronto,0,Asian Restaurant,Pizza Place,Coffee Shop,Park,Trail,Health Food Store,Home Service,Pub,Diner,Dance Studio
5,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Bakery,Farmers Market,Cheese Shop,Beer Bar,Seafood Restaurant,Restaurant,Sandwich Place,Fish Market
6,Downtown Toronto,0,Coffee Shop,Café,Chinese Restaurant,Sushi Restaurant,Italian Restaurant,Japanese Restaurant,Ice Cream Shop,Bubble Tea Shop,Spa,Pharmacy
7,Downtown Toronto,0,Grocery Store,Café,Rental Car Location,Park,Coffee Shop,Gym / Fitness Center,Athletics & Sports,Business Service,Restaurant,Flower Shop
8,Downtown Toronto,0,Coffee Shop,American Restaurant,Café,Seafood Restaurant,Steakhouse,Hotel,Japanese Restaurant,Restaurant,Sushi Restaurant,Asian Restaurant
9,West Toronto,0,Pharmacy,Furniture / Home Store,Bakery,Smoke Shop,Pool,Brewery,Liquor Store,Bus Stop,Café,Middle Eastern Restaurant
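
The same column selection is repeated for every cluster in this notebook, with only the label changing. A small helper could wrap it. This is a sketch, not part of the original analysis: `show_cluster` is a hypothetical name, and it looks up the `Cluster Labels` column by name instead of relying on the hard-coded position 5. The toy dataframe only mimics the assumed column layout.

```python
import pandas as pd

def show_cluster(df, label, label_col="Cluster Labels"):
    """Return Borough plus every column from the label column onward for one cluster."""
    start = df.columns.get_loc(label_col)          # position of the label column
    cols = ["Borough"] + list(df.columns[start:])  # Borough, labels, venue columns
    return df.loc[df[label_col] == label, cols]

# Toy stand-in for toronto_merged (values are illustrative only)
toy_merged = pd.DataFrame({
    "Postal Code": ["M5A", "M4N", "M5B"],
    "Borough": ["Downtown Toronto", "Central Toronto", "Downtown Toronto"],
    "Neighbourhood": ["A", "B", "C"],
    "Latitude": [43.65, 43.73, 43.66],
    "Longitude": [-79.36, -79.39, -79.38],
    "Cluster Labels": [0, 1, 0],
    "1st Most Common Venue": ["Coffee Shop", "Gym / Fitness Center", "Café"],
})
cluster_0 = show_cluster(toy_merged, 0)
print(cluster_0)
```

Note that the headings count clusters from 1 while the labels start at 0, so "Cluster 2" corresponds to `show_cluster(toronto_merged, 1)`.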


### Cluster 2 --> PURPLE dots on map

### Commentary
- Not surprised that this is the only member of this cluster, as it is the farthest from downtown
- Let's go see some Live Music (after COVID of course) at the Concert Hall!
- Eclectic mix of venue types - from Construction/Landscaping to Lawyers and Swim Schools and Comic Shops

In [44]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,1,Gym / Fitness Center,Construction & Landscaping,Photography Studio,Lawyer,Swim School,Bus Line,Business Service,Park,Concert Hall,Comic Shop


### Cluster 3 --> AQUA dots on map

### Commentary
- The venues here seem to be more suburban-oriented (Home Service, Department Store, Discount Store, etc.)
- Probably the best spot for a younger married couple that just had a baby and is looking to stay close to downtown

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Central Toronto,2,Home Service,Pool,Health & Beauty Service,Ice Cream Shop,Fast Food Restaurant,Distribution Center,Department Store,Dessert Shop,Diner,Discount Store


### Cluster 4 --> TEAL dots on map

### Commentary
- Looks like a good place to be for active people with Parks, Trails, Yoga Studio, etc.
- And after all that exercise you can hit a popular Dessert Shop!
- Downtowners travel here to get their engagement rings???
- Bus Line being the most common venue makes sense, as this is farther away from Downtown

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Central Toronto,3,Bus Line,Jewelry Store,Sushi Restaurant,Park,Trail,Discount Store,Department Store,Dessert Shop,Diner,Yoga Studio


### Cluster 5 --> ORANGE dots on map

### Commentary
- FINALLY another Cluster with more than 1 member!!!
- Another place for the more active people
- Seems like active locations like their sweets (see Cluster 4 as well)!
- Beer Store being the most common venue in one of the neighbourhoods is a bit of a surprise, given how active this cluster otherwise looks

In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Central Toronto,4,Beer Store,Playground,Tennis Court,Park,Trail,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Diner
33,Downtown Toronto,4,Park,Playground,Trail,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Yoga Studio
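
Several of the comments above are really about cluster size (one big cluster, a few single-member ones). A quick way to see that at a glance is `value_counts` on the label column. The labels below are a made-up toy series, not the actual Toronto result, so the printed counts are illustrative only.

```python
import pandas as pd

# Toy labels (hypothetical); in the notebook this would be toronto_merged['Cluster Labels']
labels = pd.Series([0, 0, 0, 1, 2, 3, 4, 4], name="Cluster Labels")

# Members per cluster, ordered by label
sizes = labels.value_counts().sort_index()

# Clusters that contain exactly one neighbourhood
singletons = sizes[sizes == 1].index.tolist()

print(sizes)
print("Single-member clusters:", singletons)
```

Many single-member clusters can be a hint that a different number of clusters is worth trying, though that is a judgment call and not something this notebook tests.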


## I really enjoyed this analysis and learning about Clustering and Geospatial data while analyzing Toronto! Serves as inspiration for future travels... after COVID.