## Importing the libraries needed for the project

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Importing the Requests package to get the Wikipedia page 

In [2]:
import requests

### Reading the Wikipedia page and storing the HTTP response in the variable 'page'

In [3]:
page = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
page

<Response [200]>

### Importing BeautifulSoup and parsing the HTML in the 'page' variable into a BeautifulSoup object

In [4]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')

### Using the “prettify” function to inspect the nested structure of the HTML page

In [5]:
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className = document.documentElement.className.replace( /(^|\s)client-nojs(\s|$)/, "$1client-js$2" );
  </script>
  <script>
   (window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":861189943,"wgRevisionId":861189943,"wgArticleId":539066,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communications in Ontario","Postal codes in Canada","Toronto","Ontario-related lists"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wg

### Accessing individual tags and the content enclosed within them

In [6]:
soup.title

<title>List of postal codes of Canada: M - Wikipedia</title>

In [7]:
soup.a

<a id="top"></a>

In [8]:
soup.title.string

'List of postal codes of Canada: M - Wikipedia'

In [9]:
soup.find_all('a')

[<a id="top"></a>,
 <a class="mw-jump-link" href="#mw-head">Jump to navigation</a>,
 <a class="mw-jump-link" href="#p-search">Jump to search</a>,
 <a href="/wiki/Postal_codes_in_Canada" title="Postal codes in Canada">postal codes in Canada</a>,
 <a href="/wiki/Toronto" title="Toronto">Toronto</a>,
 <a href="/wiki/Ontario" title="Ontario">Ontario</a>,
 <a href="/wiki/Canada_Post" title="Canada Post">Canada Post</a>,
 <a href="#cite_note-1">[1]</a>,
 <a class="mw-redirect" href="/wiki/Smartphones" title="Smartphones">smartphones</a>,
 <a href="/wiki/IPhone" title="IPhone">iPhone</a>,
 <a href="/wiki/BlackBerry" title="BlackBerry">BlackBerry</a>,
 <a href="#cite_note-2">[2]</a>,
 <a href="/wiki/CD-ROM" title="CD-ROM">CD-ROMs</a>,
 <a href="/w/index.php?title=List_of_postal_codes_of_Canada:_M&amp;action=edit&amp;section=1" title="Edit section: Toronto]] - 103 Canadian postal code#Forward sortation areas|FSAs]]">edit</a>,
 <a href="/wiki/North_York" title="North York">North York</a>,
 <a hr

### Finding the right table: we want the table of postal codes, boroughs and neighbourhoods, so we first extract the contents of every table tag.

In [10]:
all_tables=soup.find_all('table')
all_tables

[<table class="wikitable sortable">
 <tbody><tr>
 <th>Postcode</th>
 <th>Borough</th>
 <th>Neighbourhood
 </th></tr>
 <tr>
 <td>M1A</td>
 <td>Not assigned</td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M2A</td>
 <td>Not assigned</td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M3A</td>
 <td><a href="/wiki/North_York" title="North York">North York</a></td>
 <td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
 </td></tr>
 <tr>
 <td>M4A</td>
 <td><a href="/wiki/North_York" title="North York">North York</a></td>
 <td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
 </td></tr>
 <tr>
 <td>M5A</td>
 <td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
 <td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
 </td></tr>
 <tr>
 <td>M5A</td>
 <td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
 <td><a href="/wiki/Regent_Park" title="Regent Park">Regent Pa

#### To identify the right table, we filter on the table's “class” attribute. In Chrome, you can find the class name by right-clicking the table on the web page, choosing Inspect, and copying the class name, or you can read it from the output of the command above.

Extracting the information into a DataFrame: we iterate over each row (tr), take each of its cells (td), and append each cell's value to its own list.

In [11]:
right_table=soup.find('table', class_='wikitable sortable')
right_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
</td></tr>
<tr>
<td>M6A</td>
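The row-by-row extraction described above (each `tr`, then each `td`) does not depend on BeautifulSoup itself. Purely as an illustration, the sketch below swaps in Python's standard-library `html.parser` to walk the table whose class is `wikitable sortable`. The class name `WikitableParser` is hypothetical, nested tables are not handled, and, unlike `find(text=True)`, it joins all the text in a cell and strips surrounding whitespace.

```python
from html.parser import HTMLParser


class WikitableParser(HTMLParser):
    """Collect the <td> text of each row in the first table whose class
    attribute is exactly 'wikitable sortable' (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_table = False   # currently inside the target table
        self.done = False       # target table already parsed
        self.in_cell = False    # currently inside a <td>
        self.rows = []
        self.current_row = []
        self.current_cell = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table" and dict(attrs).get("class") == "wikitable sortable":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.current_row = []
        elif self.in_table and tag == "td":
            self.in_cell = True
            self.current_cell = []

    def handle_data(self, data):
        if self.in_cell:
            self.current_cell.append(data)

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag == "td":
            # join text from nested tags such as <a> and drop surrounding whitespace
            self.current_row.append("".join(self.current_cell).strip())
            self.in_cell = False
        elif tag == "tr" and len(self.current_row) == 3:
            # header rows only contain <th>, so they never reach three cells
            self.rows.append(self.current_row)
        elif tag == "table":
            self.in_table = False
            self.done = True


sample = """<table class="wikitable sortable"><tbody>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood
</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td><a href="/wiki/North_York">North York</a></td><td><a href="/wiki/Parkwoods">Parkwoods</a>
</td></tr>
</tbody></table>"""

parser = WikitableParser()
parser.feed(sample)
print(parser.rows)
```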

#### To access the value of each cell, we use the “find(string=True)” option (written “find(text=True)” in older BeautifulSoup code), which returns the first piece of text inside the cell.

In [12]:
A=[]
B=[]
C=[]

for row in right_table.find_all("tr"):
    cells = row.find_all('td')
    if len(cells)==3: # header rows use <th>, so only body rows have three <td> cells
        A.append(cells[0].find(string=True))
        B.append(cells[1].find(string=True))
        C.append(cells[2].find(string=True))

A

['M1A',
 'M2A',
 'M3A',
 'M4A',
 'M5A',
 'M5A',
 'M6A',
 'M6A',
 'M7A',
 'M8A',
 'M9A',
 'M1B',
 'M1B',
 'M2B',
 'M3B',
 'M4B',
 'M4B',
 'M5B',
 'M5B',
 'M6B',
 'M7B',
 'M8B',
 'M9B',
 'M9B',
 'M9B',
 'M9B',
 'M9B',
 'M1C',
 'M1C',
 'M1C',
 'M2C',
 'M3C',
 'M3C',
 'M4C',
 'M5C',
 'M6C',
 'M7C',
 'M8C',
 'M9C',
 'M9C',
 'M9C',
 'M9C',
 'M1E',
 'M1E',
 'M1E',
 'M2E',
 'M3E',
 'M4E',
 'M5E',
 'M6E',
 'M7E',
 'M8E',
 'M9E',
 'M1G',
 'M2G',
 'M3G',
 'M4G',
 'M5G',
 'M6G',
 'M7G',
 'M8G',
 'M9G',
 'M1H',
 'M2H',
 'M3H',
 'M3H',
 'M3H',
 'M4H',
 'M5H',
 'M5H',
 'M5H',
 'M6H',
 'M6H',
 'M7H',
 'M8H',
 'M9H',
 'M1J',
 'M2J',
 'M2J',
 'M2J',
 'M3J',
 'M3J',
 'M4J',
 'M5J',
 'M5J',
 'M5J',
 'M6J',
 'M6J',
 'M7J',
 'M8J',
 'M9J',
 'M1K',
 'M1K',
 'M1K',
 'M2K',
 'M3K',
 'M3K',
 'M4K',
 'M4K',
 'M5K',
 'M5K',
 'M6K',
 'M6K',
 'M6K',
 'M7K',
 'M8K',
 'M9K',
 'M1L',
 'M1L',
 'M1L',
 'M2L',
 'M2L',
 'M3L',
 'M4L',
 'M4L',
 'M5L',
 'M5L',
 'M6L',
 'M6L',
 'M6L',
 'M7L',
 'M8L',
 'M9L',
 'M1M',
 'M1M',


### Converting the lists to one dataframe

In [13]:
#import pandas to convert list to data frame
import pandas as pd
df=pd.DataFrame(A,columns=['Postal_code'])
df['Borough']=B
df['Neighborhood']=C

df

,Postal_code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


#### Removing the rows whose Borough or Neighborhood is "Not assigned"

In [14]:
df = df[df.Borough !="Not assigned"]
df = df[df.Neighborhood !="Not assigned"]
df

,Postal_code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern
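In the output above, M7A (Queen's Park) still has the Neighborhood "Not assigned". The scraped HTML suggests why: the third cell ends with a newline (`Not assigned\n`), so the comparison with "Not assigned" never matches, and the same `\n` reappears later in names such as `Guildwood]]\n`. A minimal sketch, assuming the column names used in this notebook, strips whitespace before filtering; the helper name `drop_not_assigned` is hypothetical, and applying it would change the row counts reported further down (for example the 212 rows).

```python
import pandas as pd


def drop_not_assigned(df):
    """Strip surrounding whitespace from the text columns, then drop rows
    whose Borough or Neighborhood is 'Not assigned' (hypothetical helper)."""
    cleaned = df.copy()
    for col in ["Postal_code", "Borough", "Neighborhood"]:
        cleaned[col] = cleaned[col].str.strip()
    keep = (cleaned["Borough"] != "Not assigned") & (cleaned["Neighborhood"] != "Not assigned")
    return cleaned[keep].reset_index(drop=True)


# Small hand-made sample mimicking the scraped values (note the trailing newlines)
raw = pd.DataFrame({
    "Postal_code": ["M1A", "M3A", "M7A"],
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighborhood": ["Not assigned\n", "Parkwoods\n", "Not assigned\n"],
})
cleaned_demo = drop_not_assigned(raw)
print(cleaned_demo)
```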


#### Resetting the Index

In [15]:
df = df.reset_index(drop=True)
df

,Postal_code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Not assigned
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


In [16]:
df.shape

(212, 3)

## Joining the Neighborhood values that share the same Postal_code and Borough into one comma-separated row

In [17]:
df2=df.groupby(['Postal_code','Borough'])['Neighborhood'].apply(','.join).reset_index()
df2

,Postal_code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood]]\n,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park\n,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West\n"
9,M1N,Scarborough,"Birch Cliff,Cliffside West\n"
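The grouping step can be seen on a toy frame: `groupby` collects the rows that share a `Postal_code` and `Borough`, and `','.join` concatenates their neighborhoods in the order they appeared. The rows below are a small hand-made sample, not the scraped data.

```python
import pandas as pd

rows = pd.DataFrame({
    "Postal_code": ["M1B", "M1B", "M1C", "M1C", "M1C"],
    "Borough": ["Scarborough"] * 5,
    "Neighborhood": ["Rouge", "Malvern", "Highland Creek", "Rouge Hill", "Port Union"],
})

# One output row per (Postal_code, Borough); groupby keeps the original row order within each group
combined = (rows.groupby(["Postal_code", "Borough"])["Neighborhood"]
                .apply(",".join)
                .reset_index())
print(combined)
```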


In [18]:
df2.shape

(103, 3)

In [19]:
postal = df.Postal_code.tolist()
len(postal)

212

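### Creating 2 functions to get Latitude and Longitude
I'm using geopy's Nominatim geolocator instead of the geocoder that was given to us in the sample.

For each postal code, I look up the first neighborhood listed together with its borough; if Nominatim finds nothing, the functions fall back to the borough alone.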

In [20]:
# Function for getting Latitude
def lat(elem):
    n, b = elem
    first_neighborhood = n.split(",")[0]          # only the first listed neighborhood is geocoded
    query = '{},{}'.format(first_neighborhood, b)
    # geopy >= 2.0 requires a custom user_agent for Nominatim; the string is an arbitrary app name
    geolocator = Nominatim(user_agent="toronto_neighborhoods")
    location = geolocator.geocode(query)
    if location is None:
        location = geolocator.geocode(b)          # fall back to the borough alone
    return location.latitude

In [21]:
# Function for getting Longitude
def lng(elem):
    n,b=elem
    n1=list(n.split(","))
    a='{},{}'.format(n1[0],b)
    #print(a)
    geolocator = Nominatim()
    location = geolocator.geocode(a)
    if location == None:
        location = geolocator.geocode(b)
        latitude = location.latitude
        longitude = location.longitude
    else:
        latitude = location.latitude
        longitude = location.longitude
    return longitude

In [22]:
df2['latitude'] = df2[['Neighborhood','Borough']].apply(lat,axis=1)



In [23]:
df2['longitude'] = df2[['Neighborhood','Borough']].apply(lng,axis=1)



In [24]:
df2.head()

,Postal_code,Borough,Neighborhood,latitude,longitude
0,M1B,Scarborough,"Rouge,Malvern",43.80493,-79.165837
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.790117,-79.173334
2,M1E,Scarborough,"Guildwood]]\n,Morningside,West Hill",43.754899,-79.197776
3,M1G,Scarborough,Woburn,43.759824,-79.225291
4,M1H,Scarborough,Cedarbrae,43.756467,-79.226692
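Because `lat()` and `lng()` geocode the same query independently, every row is looked up twice. A sketch of a single helper that returns both coordinates at once is below. It takes the geocoding function as a parameter, so it can be tried offline with a stand-in, and it returns `(None, None)` instead of raising when both the neighborhood and the borough lookups fail. The helper name and the dictionary-based stand-in are illustrative assumptions; the coordinates are the ones printed elsewhere in this notebook.

```python
from collections import namedtuple


def geocode_with_fallback(neighborhoods, borough, geocode):
    """Return (latitude, longitude) for the first neighborhood in a
    comma-separated string, falling back to the borough alone.

    `geocode` is any callable that takes a query string and returns an object
    with .latitude and .longitude, or None when nothing is found (the shape of
    geopy's Nominatim.geocode). Returns (None, None) if both lookups fail.
    """
    first = neighborhoods.split(",")[0]
    location = geocode("{},{}".format(first, borough))
    if location is None:
        location = geocode(borough)              # fallback: borough only
    if location is None:
        return None, None                        # avoid AttributeError on a double miss
    return location.latitude, location.longitude


# Offline demo: a dictionary lookup stands in for the real geocoder
Point = namedtuple("Point", ["latitude", "longitude"])
known = {
    "Woburn,Scarborough": Point(43.7598243, -79.2252908),
    "Scarborough": Point(43.773077, -79.257774),
}
print(geocode_with_fallback("Woburn", "Scarborough", known.get))
print(geocode_with_fallback("Unknown Place,Other", "Scarborough", known.get))
```

With geopy, a rate-limited geocoder could be passed in instead, e.g. `RateLimiter(Nominatim(user_agent="my_app").geocode, min_delay_seconds=1)` from `geopy.extra.rate_limiter` (`my_app` is a placeholder), and both columns filled in one pass with `df2.apply(lambda r: pd.Series(geocode_with_fallback(r['Neighborhood'], r['Borough'], geocode)), axis=1)`. The one-second delay reflects Nominatim's public usage policy of at most one request per second.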


In [25]:
neighborhoods =df2

In [26]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        len(neighborhoods['Neighborhood'].unique())
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


### Using the geopy library to get the latitude and longitude of Toronto

In [27]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_neighborhoods")  # arbitrary app name; required by geopy >= 2.0
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


### Creating a map of Toronto with neighborhoods superimposed on top

In [28]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
# add markers to map (nb_lat/nb_lng avoid overwriting the lat() and lng() functions defined earlier)
for nb_lat, nb_lng, borough, neighborhood in zip(neighborhoods['latitude'], neighborhoods['longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [nb_lat, nb_lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto


### Considering Scarborough as my sample borough

In [29]:
Scarborough_data = neighborhoods[neighborhoods['Borough'] == 'Scarborough'].reset_index(drop=True)
Scarborough_data.head()


,Postal_code,Borough,Neighborhood,latitude,longitude
0,M1B,Scarborough,"Rouge,Malvern",43.80493,-79.165837
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.790117,-79.173334
2,M1E,Scarborough,"Guildwood]]\n,Morningside,West Hill",43.754899,-79.197776
3,M1G,Scarborough,Woburn,43.759824,-79.225291
4,M1H,Scarborough,Cedarbrae,43.756467,-79.226692


#### Getting the latitude and longitude of Scarborough

In [30]:
address = 'Scarborough, Toronto'

geolocator = Nominatim(user_agent="toronto_neighborhoods")  # arbitrary app name; required by geopy >= 2.0
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Scarborough are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Scarborough are 43.773077, -79.257774.


#### Creating a map of Scarborough with neighborhoods superimposed on top.

In [31]:
# create map of Scarborough using latitude and longitude values
map_Scarborough = folium.Map(location=[latitude, longitude], zoom_start=12)
# add markers to map (nb_lat/nb_lng avoid overwriting the lat() and lng() functions defined earlier)
for nb_lat, nb_lng, borough, neighborhood in zip(Scarborough_data['latitude'], Scarborough_data['longitude'], Scarborough_data['Borough'], Scarborough_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [nb_lat, nb_lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Scarborough)  
    
map_Scarborough

#### Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them
#### Defining Foursquare Credentials and Version

In [32]:
CLIENT_ID = 'LSTPSCV1WUB2U1MDWUFASOWYY3VXINOS5PHALJ5QYE2IUICP' # your Foursquare ID
CLIENT_SECRET = 'O3YFLC4P5VOSTTOGRRUCH3FTBT0HYOJWGQUSQH0SM00D432W' # your Foursquare Secret
VERSION = '20180924' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: LSTPSCV1WUB2U1MDWUFASOWYY3VXINOS5PHALJ5QYE2IUICP
CLIENT_SECRET:O3YFLC4P5VOSTTOGRRUCH3FTBT0HYOJWGQUSQH0SM00D432W
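Hard-coding and printing the client secret exposes it to anyone who can see the notebook. One possible alternative, sketched below, reads the credentials from environment variables and masks the secret when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper names are assumptions, not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(environ=os.environ, prefix="FOURSQUARE"):
    """Return (client_id, client_secret) from environment variables.
    The names <prefix>_CLIENT_ID / <prefix>_CLIENT_SECRET are an assumed convention."""
    client_id = environ.get(prefix + "_CLIENT_ID")
    client_secret = environ.get(prefix + "_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set {0}_CLIENT_ID and {0}_CLIENT_SECRET before running.".format(prefix))
    return client_id, client_secret


def mask(secret, visible=4):
    """Hide all but the last `visible` characters of a secret for display."""
    hidden = max(len(secret) - visible, 0)
    return "*" * hidden + secret[hidden:]


# Usage in the notebook (the variables must be set in the environment first):
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
# print('CLIENT_SECRET: ' + mask(CLIENT_SECRET))
```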


#### Taking Woburn as the neighborhood to explore

In [33]:
Scarborough_data.loc[3, 'Neighborhood']

'Woburn'

### Get the neighborhood's latitude and longitude values.

In [34]:
neighborhood_latitude = Scarborough_data.loc[3, 'latitude'] # neighborhood latitude value
neighborhood_longitude = Scarborough_data.loc[3, 'longitude'] # neighborhood longitude value

neighborhood_name = Scarborough_data.loc[3, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Woburn are 43.7598243, -79.2252908.


#### Getting the venues within a radius of 700 meters of Woburn

In [35]:
radius = 700 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=LSTPSCV1WUB2U1MDWUFASOWYY3VXINOS5PHALJ5QYE2IUICP&client_secret=O3YFLC4P5VOSTTOGRRUCH3FTBT0HYOJWGQUSQH0SM00D432W&v=20180924&ll=43.7598243,-79.2252908&radius=700'
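Formatting the query string by hand (note the stray `?&`) works, but `urllib.parse.urlencode` handles the separators and escaping. The sketch below assumes the same `venues/explore` parameters used above; the optional `limit` parameter is included on the assumption that the v2 endpoint accepts it (it was documented with a maximum of 50), which would let a request ask for up to 50 venues explicitly. The helper name `build_explore_url` is hypothetical.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=None):
    """Build a venues/explore URL; parameter names mirror the hand-built URL above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
    }
    if limit is not None:
        params["limit"] = limit   # assumed optional parameter (max 50 in the v2 docs)
    return EXPLORE_URL + "?" + urlencode(params)


print(build_explore_url("YOUR_ID", "YOUR_SECRET", "20180924",
                        43.7598243, -79.2252908, radius=700, limit=50))
```

The `requests` library can also build the query string itself from a dictionary passed as `requests.get(EXPLORE_URL, params=...)`.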

In [36]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5bab1e344434b93b6beb6ace'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Woburn',
  'headerFullLocation': 'Woburn, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 27,
  'suggestedBounds': {'ne': {'lat': 43.76612430630001,
    'lng': -79.21658428052649},
   'sw': {'lat': 43.75352429369999, 'lng': -79.2339973194735}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bdaf4cb3904a5930902489e',
       'name': 'GoodLife Fitness Scarborough Cedarbrae Mall',
       'location': {'address': '3495 Lawrence Ave E',
        'crossStreet': 'Markham Rd.',
        'lat': 43.75830314509114,
        'lng': -79.2285328416918,
        

#### From the Foursquare lab in the previous module, we know that all the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab.

In [37]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#### Now we are ready to clean the json and structure it into a pandas dataframe.

In [38]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.distance','venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

,name,categories,distance,lat,lng
0,GoodLife Fitness Scarborough Cedarbrae Mall,Gym,310,43.758303,-79.228533
1,Shoppers Drug Mart,Pharmacy,117,43.760718,-79.224505
2,Pho Dau Bo,Vietnamese Restaurant,162,43.761286,-79.22517
3,Staples Cedarbrae,Paper / Office Supplies Store,244,43.759851,-79.228335
4,The Beer Store,Beer Store,20,43.759714,-79.225487
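To see what `json_normalize` and the column clean-up do, here is a sketch on a hand-made list shaped like the `items` list above. Only the keys this notebook reads are included, and the second venue, with an empty category list, is invented to show the `None` case. It uses `pd.json_normalize`, the top-level name available in pandas 1.0 and later.

```python
import pandas as pd

# A trimmed-down imitation of the 'items' list returned by venues/explore
items = [
    {"venue": {"name": "The Beer Store",
               "categories": [{"name": "Beer Store"}],
               "location": {"distance": 20, "lat": 43.759714, "lng": -79.225487}}},
    {"venue": {"name": "Invented Venue",
               "categories": [],
               "location": {"distance": 90, "lat": 43.76, "lng": -79.22}}},
]

flat = pd.json_normalize(items)  # nested keys become dotted columns, e.g. 'venue.location.lat'
flat = flat[["venue.name", "venue.categories", "venue.location.distance",
             "venue.location.lat", "venue.location.lng"]].copy()

# Keep only the first category name, or None when the list is empty
flat["venue.categories"] = flat["venue.categories"].apply(lambda cats: cats[0]["name"] if cats else None)

# 'venue.location.lat' -> 'lat', etc.
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```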


In [39]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

27 venues were returned by Foursquare.


# 2. Explore Neighborhoods in Scarborough

### Creating a function to repeat the same process to all the neighborhoods in Scarborough

In [40]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
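        # return only relevant information for each nearby venue
        # (a venue with an empty category list gets None instead of raising IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])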

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue',                             
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Getting venues in Scarborough

In [41]:
Scarborough_venues = getNearbyVenues(names=Scarborough_data['Neighborhood'],
                                   latitudes=Scarborough_data['latitude'],
                                   longitudes=Scarborough_data['longitude']
                                  )

Rouge,Malvern
Highland Creek,Rouge Hill,Port Union
Guildwood]]
,Morningside,West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park
,Ionview,Kennedy Park
Clairlea,Golden Mile,Oakridge
Cliffcrest,Cliffside,Scarborough Village West

Birch Cliff,Cliffside West

Dorset Park,Scarborough Town Centre,Wexford Heights
Maryvale,Wexford
Agincourt
Clarks Corners
,Sullivan
,Tam O'Shanter
Agincourt North,L'Amoreaux East
,Milliken,Steeles East

L'Amoreaux West
,Steeles West
Upper Rouge


In [42]:
print(Scarborough_venues.shape)
Scarborough_venues.head()

(174, 7)


,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge,Malvern",43.80493,-79.165837,Dean Park,43.804364,-79.169159,Park
1,"Rouge,Malvern",43.80493,-79.165837,Paul's Breakfast & Burgers,43.803835,-79.169825,Fast Food Restaurant
2,"Highland Creek,Rouge Hill,Port Union",43.790117,-79.173334,Highland Creek,43.790281,-79.173703,Neighborhood
3,"Highland Creek,Rouge Hill,Port Union",43.790117,-79.173334,Rapid Tac Roofing & Metal Works,43.788898,-79.173198,Construction & Landscaping
4,"Highland Creek,Rouge Hill,Port Union",43.790117,-79.173334,Canmore Park,43.792401,-79.178339,Park
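The loop in `getNearbyVenues` mixes the HTTP call with the parsing. One way to make the parsing testable offline, sketched here, passes in a function that returns the `items` list for a coordinate; the names `venues_to_rows`, `get_nearby_venues` and `fetch_items` are hypothetical. Passing `columns=` to the DataFrame constructor also keeps the column names when no venues come back at all, whereas assigning seven names to `.columns` of an empty frame raises an error.

```python
import pandas as pd

COLUMNS = ["Neighborhood", "Neighborhood Latitude", "Neighborhood Longitude",
           "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"]


def venues_to_rows(name, lat, lng, items):
    """Convert the 'items' of one explore response into row tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"] if categories else None))
    return rows


def get_nearby_venues(names, latitudes, longitudes, fetch_items):
    """fetch_items(lat, lng) must return the list of items for that point;
    in the notebook it would wrap the requests.get(...) call."""
    rows = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        rows.extend(venues_to_rows(name, lat, lng, fetch_items(lat, lng)))
    return pd.DataFrame(rows, columns=COLUMNS)


# Offline demo with a stand-in fetcher built from the first venue shown above
def fake_fetch(lat, lng):
    return [{"venue": {"name": "Dean Park",
                       "location": {"lat": 43.804364, "lng": -79.169159},
                       "categories": [{"name": "Park"}]}}]


demo = get_nearby_venues(["Rouge,Malvern"], [43.80493], [-79.165837], fake_fetch)
empty = get_nearby_venues([], [], [], fake_fetch)
print(demo)
```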


In [43]:
Scarborough_venues = Scarborough_venues[Scarborough_venues['Venue Category'] !="Neighborhood"]

#### Checking how many venues were returned for each neighborhood

In [44]:
Scarborough_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,11,11,11,11,11,11
"Agincourt North,L'Amoreaux East\n,Milliken,Steeles East\n",30,30,30,30,30,30
"Birch Cliff,Cliffside West\n",4,4,4,4,4,4
Cedarbrae,22,22,22,22,22,22
"Clairlea,Golden Mile,Oakridge",8,8,8,8,8,8
"Clarks Corners\n,Sullivan\n,Tam O'Shanter",1,1,1,1,1,1
"Cliffcrest,Cliffside,Scarborough Village West\n",7,7,7,7,7,7
"Dorset Park,Scarborough Town Centre,Wexford Heights",13,13,13,13,13,13
"East Birchmount Park\n,Ionview,Kennedy Park",7,7,7,7,7,7
"Guildwood]]\n,Morningside,West Hill",5,5,5,5,5,5


In [45]:
print('There are {} unique categories.'.format(len(Scarborough_venues['Venue Category'].unique())))

There are 68 unique categories.


## Breakpoint:
### End of Processing the Retrieved Information from Foursquare
#### Saving a Cleaned Version of DataFrame as the Results from Foursquare

In [46]:
Scarborough_venues.to_csv('Scarborough_venues.csv')

#### Loading Data from File (Saved Foursquare DataFrame for Venues)

In [47]:
Scarborough_venues = pd.read_csv('Scarborough_venues.csv')
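Writing the CSV with the default `index=True` is why an extra `Unnamed: 0` column appears after reloading (it is dropped by hand in the summary cell and still shows up in the one-hot frame). A small sketch with an in-memory buffer standing in for `Scarborough_venues.csv` shows that `index=False` avoids it; if adopted, the later `drop(columns=['Unnamed: 0'])` call would have to go as well.

```python
import io

import pandas as pd

venues = pd.DataFrame({"Neighborhood": ["Woburn", "Cedarbrae"],
                       "Venue Category": ["Gym", "Pharmacy"]})

buffer = io.StringIO()                 # stands in for 'Scarborough_venues.csv'
venues.to_csv(buffer, index=False)     # index=False: no unnamed index column is written
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))          # ['Neighborhood', 'Venue Category']
```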

#### Some Summary Information about Neighborhoods inside "Scarborough"

In [48]:
neigh_list = list(Scarborough_venues['Neighborhood'].unique())
print('Number of Neighborhoods inside Scarborough:')
print(len(neigh_list))
print('List of Neighborhoods inside Scarborough:')
neigh_list

Number of Neighborhoods inside Scarborough:
17
List of Neighborhoods inside Scarborough:


['Rouge,Malvern',
 'Highland Creek,Rouge Hill,Port Union',
 'Guildwood]]\r\n,Morningside,West Hill',
 'Woburn',
 'Cedarbrae',
 'Scarborough Village',
 'East Birchmount Park\r\n,Ionview,Kennedy Park',
 'Clairlea,Golden Mile,Oakridge',
 'Cliffcrest,Cliffside,Scarborough Village West\r\n',
 'Birch Cliff,Cliffside West\r\n',
 'Dorset Park,Scarborough Town Centre,Wexford Heights',
 'Maryvale,Wexford',
 'Agincourt',
 "Clarks Corners\r\n,Sullivan\r\n,Tam O'Shanter",
 "Agincourt North,L'Amoreaux East\r\n,Milliken,Steeles East\r\n",
 "L'Amoreaux West\r\n,Steeles West",
 'Upper Rouge']

#### Some Summary Information about Neighborhoods inside "Scarborough" Cont'd

In [49]:
neigh_venue_summary = Scarborough_venues.groupby('Neighborhood').count()
neigh_venue_summary.drop(columns = ['Unnamed: 0']).head()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,11,11,11,11,11,11
"Agincourt North,L'Amoreaux East\r\n,Milliken,Steeles East\r\n",30,30,30,30,30,30
"Birch Cliff,Cliffside West\r\n",4,4,4,4,4,4
Cedarbrae,22,22,22,22,22,22
"Clairlea,Golden Mile,Oakridge",8,8,8,8,8,8


In [50]:
print('There are {} unique categories.'.format(len(Scarborough_venues['Venue Category'].unique())))

print('Here is the list of different categories:')
list(Scarborough_venues['Venue Category'].unique())

There are 68 unique categories.
Here is the list of different categories:


['Park',
 'Fast Food Restaurant',
 'Construction & Landscaping',
 'Train Station',
 'Moving Target',
 'Storage Facility',
 'Baseball Field',
 'Gym',
 'Pharmacy',
 'Vietnamese Restaurant',
 'Beer Store',
 'Paper / Office Supplies Store',
 'Bank',
 'Pizza Place',
 'Discount Store',
 'Coffee Shop',
 'Toy / Game Store',
 'Big Box Store',
 'Furniture / Home Store',
 'Clothing Store',
 'Supermarket',
 'Bar',
 'Grocery Store',
 'Liquor Store',
 'Video Game Store',
 'Optical Shop',
 'Shopping Mall',
 'Pub',
 'Chinese Restaurant',
 'Caribbean Restaurant',
 'Mexican Restaurant',
 'Greek Restaurant',
 'Taiwanese Restaurant',
 'Ice Cream Shop',
 'Sandwich Place',
 'Diner',
 'Convenience Store',
 'Burger Joint',
 'Gym Pool',
 'Indian Restaurant',
 'Latin American Restaurant',
 'Bowling Alley',
 'Wings Joint',
 'Fish & Chips Shop',
 'Electronics Store',
 'Plaza',
 'Asian Restaurant',
 'Gaming Cafe',
 'Hakka Restaurant',
 'Spa',
 'Pet Store',
 'Video Store',
 'Health Food Store',
 'Bus Line',
 'Korea

In [51]:
# one hot encoding
Scarborough_onehot = pd.get_dummies(data = Scarborough_venues, drop_first  = False, 
                              prefix = "", prefix_sep = "", columns = ['Venue Category'])
Scarborough_onehot.head()

,Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Asian Restaurant,Bakery,Bank,Bar,Baseball Field,Beer Store,Big Box Store,Bowling Alley,Breakfast Spot,Burger Joint,Bus Line,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store,Diner,Discount Store,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Greek Restaurant,Grocery Store,Gym,Gym Pool,Hakka Restaurant,Health Food Store,Hong Kong Restaurant,Ice Cream Shop,Indian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Liquor Store,Mexican Restaurant,Movie Theater,Moving Target,Noodle House,Optical Shop,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pub,Sandwich Place,Shopping Mall,Spa,Sporting Goods Shop,Storage Facility,Supermarket,Taco Place,Taiwanese Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint
0,0,"Rouge,Malvern",43.80493,-79.165837,Dean Park,43.804364,-79.169159,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,"Rouge,Malvern",43.80493,-79.165837,Paul's Breakfast & Burgers,43.803835,-79.169825,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,3,"Highland Creek,Rouge Hill,Port Union",43.790117,-79.173334,Rapid Tac Roofing & Metal Works,43.788898,-79.173198,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,4,"Highland Creek,Rouge Hill,Port Union",43.790117,-79.173334,Canmore Park,43.792401,-79.178339,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,5,"Guildwood]]\r\n,Morningside,West Hill",43.754898,-79.197776,Guildwood GO Station,43.754977,-79.198372,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
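The one-hot encoding and the per-neighborhood totals computed a few cells later can be illustrated on a toy frame: `get_dummies` with an empty `prefix` and `prefix_sep` turns each category value into its own column, and `groupby(...).sum()` counts the venues of each category per neighborhood. The three rows are made up for illustration.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["Cedarbrae", "Cedarbrae", "Woburn"],
    "Venue Category": ["Gym", "Video Game Store", "Gym"],
})

# Each distinct category becomes an indicator column (0/1, or False/True in pandas >= 2.0)
onehot = pd.get_dummies(venues, columns=["Venue Category"], prefix="", prefix_sep="")

# Summing the indicators per neighborhood gives a count of venues per category
counts = onehot.groupby("Neighborhood").sum()
print(counts)
```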


In [52]:
Scarborough_onehot.columns

Index(['Unnamed: 0', 'Neighborhood', 'Neighborhood Latitude',
       'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude',
       'Asian Restaurant', 'Bakery', 'Bank', 'Bar', 'Baseball Field',
       'Beer Store', 'Big Box Store', 'Bowling Alley', 'Breakfast Spot',
       'Burger Joint', 'Bus Line', 'Cantonese Restaurant',
       'Caribbean Restaurant', 'Chinese Restaurant', 'Clothing Store',
       'Coffee Shop', 'Construction & Landscaping', 'Convenience Store',
       'Diner', 'Discount Store', 'Electronics Store', 'Fast Food Restaurant',
       'Fish & Chips Shop', 'Fried Chicken Joint', 'Furniture / Home Store',
       'Gaming Cafe', 'Greek Restaurant', 'Grocery Store', 'Gym', 'Gym Pool',
       'Hakka Restaurant', 'Health Food Store', 'Hong Kong Restaurant',
       'Ice Cream Shop', 'Indian Restaurant', 'Japanese Restaurant',
       'Juice Bar', 'Korean Restaurant', 'Latin American Restaurant',
       'Liquor Store', 'Mexican Restaurant', 'Movie Theater', 'Movin

### Manually Selecting (Subsetting) Features Relevant to a Gym and Gaming Supplies Contractor

In [53]:
# This list is created manually 
important_list_of_features = [
    'Neighborhood',
    'Neighborhood Latitude',
    'Neighborhood Longitude',
    'Baseball Field',
    'Bowling Alley',
    'Gaming Cafe',
    'Gym', 
    'Gym Pool',
    'Sporting Goods Shop',
    'Toy / Game Store',
    'Video Game Store']

### Updating the One-hot Encoded DataFrame and Grouping the Data by Neighborhood

In [54]:
Scarborough_onehot = Scarborough_onehot[important_list_of_features].drop(
    columns = ['Neighborhood Latitude', 'Neighborhood Longitude']).groupby(
    'Neighborhood').sum()


Scarborough_onehot.head()

Neighborhood,Baseball Field,Bowling Alley,Gaming Cafe,Gym,Gym Pool,Sporting Goods Shop,Toy / Game Store,Video Game Store
Agincourt,0,0,0,0,0,0,0,0
"Agincourt North,L'Amoreaux East\r\n,Milliken,Steeles East\r\n",0,0,0,0,0,1,0,1
"Birch Cliff,Cliffside West\r\n",0,0,0,1,1,0,0,0
Cedarbrae,0,0,0,1,0,0,1,1
"Clairlea,Golden Mile,Oakridge",0,0,0,0,0,0,0,0
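The cell above turns one row per venue into one row per neighborhood. Because it aggregates with `sum()` rather than `mean()`, each cell is a count of venues in that category rather than a share. A small sketch of both aggregations on hypothetical data; building the one-hot frame with `pd.get_dummies` and the `Venue Category` column name are assumptions here, not necessarily how the notebook built it:

```python
import pandas as pd

# Hypothetical venue rows (one row per venue); the column names are assumptions
venues = pd.DataFrame({
    'Neighborhood': ['Neighborhood A', 'Neighborhood A',
                     'Neighborhood B', 'Neighborhood B', 'Neighborhood B'],
    'Venue Category': ['Gym', 'Video Game Store', 'Gym', 'Gym', 'Bakery'],
})

# One 0/1 column per venue category
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# sum(): how many venues of each category a neighborhood has
counts = onehot.groupby('Neighborhood').sum()
# mean(): the share of the neighborhood's venues in each category
shares = onehot.groupby('Neighborhood').mean()

print(counts)
print(shares)
```

Counts favor neighborhoods with many relevant venues in absolute terms; shares would instead favor neighborhoods where relevant venues make up a large fraction of a possibly small venue list.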


#### Combining the Gym Categories and the Joint Categories into Totals

In [55]:
feat_name_list = list(Scarborough_onehot.columns)

# Columns whose name contains 'Gym' ('Gym' and 'Gym Pool'). Only gym columns
# are merged; the game-store columns stay as separate features.
restaurant_list = [value for value in feat_name_list if 'Gym' in value]

# Despite its name, 'Total Restaurants' holds the combined gym count
Scarborough_onehot['Total Restaurants'] = Scarborough_onehot[restaurant_list].sum(axis = 1)
Scarborough_onehot = Scarborough_onehot.drop(columns = restaurant_list)

feat_name_list = list(Scarborough_onehot.columns)

# Columns whose name contains 'Joint'. None survive the earlier subsetting,
# so this list is empty and 'Total Joints' is 0.0 for every neighborhood.
joint_list = [value for value in feat_name_list if 'Joint' in value]

Scarborough_onehot['Total Joints'] = Scarborough_onehot[joint_list].sum(axis = 1)
Scarborough_onehot = Scarborough_onehot.drop(columns = joint_list)
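The gym-merging step in this cell matches only column names containing `'Gym'`, which is why `Toy / Game Store` and `Video Game Store` are still separate columns in the output below and `Total Restaurants` sums only `Gym` and `Gym Pool`. A keyword test written as `value.find('Gym' or 'Game')` behaves the same way, because Python evaluates `'Gym' or 'Game'` first and `or` returns its first truthy operand. A short sketch contrasting that with an explicit multi-keyword match (the column list is a hypothetical subset):

```python
# `or` returns its first truthy operand, so this expression is just 'Gym'
keyword = 'Gym' or 'Game'
print(keyword)  # Gym

columns = ['Gym', 'Gym Pool', 'Toy / Game Store', 'Video Game Store', 'Bowling Alley']

# find() with that expression therefore matches 'Gym' columns only
gym_only = [c for c in columns if c.find('Gym' or 'Game') != -1]

# Matching any of several keywords needs an explicit any(...)
keywords = ('Gym', 'Game')
gym_or_game = [c for c in columns if any(k in c for k in keywords)]

print(gym_only)     # ['Gym', 'Gym Pool']
print(gym_or_game)  # ['Gym', 'Gym Pool', 'Toy / Game Store', 'Video Game Store']
```

If the game-store columns were meant to be merged as well, switching to the `any(...)` test would change the feature table and therefore the clustering results shown below.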

#### The Fully Processed DataFrame for the Neighborhoods in Scarborough
#### This Dataset Is Now Ready for Clustering with k-means

In [56]:
Scarborough_onehot

| Neighborhood | Baseball Field | Bowling Alley | Gaming Cafe | Sporting Goods Shop | Toy / Game Store | Video Game Store | Total Restaurants | Total Joints |
|---|---|---|---|---|---|---|---|---|
| Agincourt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| Agincourt North,L'Amoreaux East\r\n,Milliken,Steeles East\r\n | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0.0 |
| Birch Cliff,Cliffside West\r\n | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.0 |
| Cedarbrae | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0.0 |
| Clairlea,Golden Mile,Oakridge | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| Clarks Corners\r\n,Sullivan\r\n,Tam O'Shanter | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| Cliffcrest,Cliffside,Scarborough Village West\r\n | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| Dorset Park,Scarborough Town Centre,Wexford Heights | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0.0 |
| East Birchmount Park\r\n,Ionview,Kennedy Park | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| Guildwood]]\r\n,Morningside,West Hill | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |


#### Run k-means to Cluster Neighborhoods into 5 Clusters

In [57]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# run k-means clustering
kmeans = KMeans(n_clusters = 5, random_state = 0).fit(Scarborough_onehot)
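`KMeans` assigns each neighborhood to the cluster whose center is nearest in Euclidean distance, and `fit` stores those assignments in `labels_`. A small sketch on a hypothetical count matrix that recomputes the nearest center by hand and checks it against `labels_`. Passing `n_init=10` is an addition here: it keeps the number of random restarts explicit, since scikit-learn changed that default across versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical count matrix: rows are neighborhoods, columns are venue categories
X = np.array([
    [0, 0, 0],
    [0, 0, 0],
    [2, 0, 1],
    [2, 1, 1],
    [0, 3, 0],
], dtype=float)

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X)

# Distance from every row to every center, shape (n_rows, n_clusters)
dists = np.linalg.norm(X[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
# Index of the closest center for each row
nearest = dists.argmin(axis=1)

print(kmeans.labels_)
print(nearest)
```

Because the distance is Euclidean on raw counts, a category with larger counts pulls harder on the assignment than a rare one; that is one reason the notebook's choice of `sum()` over `mean()` matters for the clusters.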

#### Showing Centers of Each Cluster

In [58]:
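# One row per cluster center; each value is the mean of that feature over the
# neighborhoods assigned to the cluster
means_df = pd.DataFrame(kmeans.cluster_centers_)
means_df.columns = Scarborough_onehot.columns
means_df.index = ['G1','G2','G3','G4','G5']  # G1 is k-means label 0, G2 is label 1, ...
# Rank the clusters by the sum of their center values
means_df['Total Sum'] = means_df.sum(axis = 1)
means_df.sort_values(axis = 0, by = ['Total Sum'], ascending=False)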

|  | Baseball Field | Bowling Alley | Gaming Cafe | Sporting Goods Shop | Toy / Game Store | Video Game Store | Total Restaurants | Total Joints | Total Sum |
|---|---|---|---|---|---|---|---|---|---|
| G2 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.5 | 1.0 | 0.0 | 2.5 |
| G4 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| G5 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 |
| G3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.5 | 0.0 | 1.5 |
| G1 | 0.090909 | -6.938894e-18 | -6.938894e-18 | -6.938894e-18 | -1.387779e-17 | -1.387779e-17 | 5.5511150000000004e-17 | 0.0 | 0.090909 |
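Each row of `cluster_centers_` is the per-feature average over the neighborhoods in that cluster, so `Total Sum` reads as the average number of relevant venues per neighborhood in a group. Tiny values such as `-6.938894e-18` in G1 are floating-point noise around zero. A sketch on hypothetical data that checks the averaging property and rounds the centers before ranking:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-neighborhood counts for two venue categories
X = pd.DataFrame(
    [[2, 1], [2, 2], [0, 0], [0, 0]],
    index=['N1', 'N2', 'N3', 'N4'],
    columns=['Gym', 'Video Game Store'],
)

kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# Per-feature mean of the rows carrying each label (result index = label 0, 1)
member_means = X.groupby(kmeans.labels_).mean()
print(np.allclose(kmeans.cluster_centers_, member_means.to_numpy()))

# Round away float noise near zero, then rank clusters by their summed center
centers = pd.DataFrame(kmeans.cluster_centers_, columns=X.columns).round(6)
centers['Total Sum'] = centers.sum(axis=1)
ranked = centers.sort_values('Total Sum', ascending=False)
print(ranked)
```

Summing the center values weights every selected category equally; if some categories matter more to the contractor, a weighted sum would be a natural variation.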


### Result

Ranked by Total Sum (the sum of each cluster center's values):
- Best group: G2 (2.5)
- Second best group: G4 (2.0)
- Third best group: G5 (2.0, tied with G4)

### Pairing `kmeans.labels_` with the Scarborough Neighborhoods to Find the Group of Each Neighborhood

In [59]:
# Pair each neighborhood with its cluster label; adding 1 makes Group 1 correspond to G1
neigh_summary = pd.DataFrame({'Neighborhood': Scarborough_onehot.index,
                              'Group': 1 + kmeans.labels_})
neigh_summary

|  | Neighborhood | Group |
|---|---|---|
| 0 | Agincourt | 1 |
| 1 | Agincourt North,L'Amoreaux East\r\n,Milliken,S... | 5 |
| 2 | Birch Cliff,Cliffside West\r\n | 3 |
| 3 | Cedarbrae | 2 |
| 4 | Clairlea,Golden Mile,Oakridge | 1 |
| 5 | Clarks Corners\r\n,Sullivan\r\n,Tam O'Shanter | 1 |
| 6 | Cliffcrest,Cliffside,Scarborough Village West\r\n | 1 |
| 7 | Dorset Park,Scarborough Town Centre,Wexford He... | 4 |
| 8 | East Birchmount Park\r\n,Ionview,Kennedy Park | 1 |
| 9 | Guildwood]]\r\n,Morningside,West Hill | 1 |
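Several neighborhood labels still carry residue from the Wikipedia scrape, such as `\r\n` and the wiki-markup `]]` in `Guildwood]]`. A minimal cleaning sketch, assuming the `\r\n` shown in the outputs are real carriage-return and newline characters rather than literal backslashes; `clean_name` is a hypothetical helper, not part of the notebook:

```python
import pandas as pd

# Neighborhood labels as they appear in the outputs above
names = pd.Series([
    "Agincourt North,L'Amoreaux East\r\n,Milliken,Steeles East\r\n",
    "Guildwood]]\r\n,Morningside,West Hill",
    "Cedarbrae",
])

def clean_name(name: str) -> str:
    # Hypothetical rule: drop ']]' wiki-markup residue, split on commas,
    # strip surrounding whitespace (including \r\n) and drop empty parts
    name = name.replace(']]', '')
    parts = [part.strip() for part in name.split(',')]
    return ', '.join(part for part in parts if part)

cleaned = names.map(clean_name)
print(cleaned.tolist())
```

Cleaning would best happen right after scraping, before the venue lookup, so that every later table and filter works with the same tidy names.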


#### Deducing Results
#### Best Neighborhoods (Group G2)

In [64]:
neigh_summary[neigh_summary['Group'] == 2]

|  | Neighborhood | Group |
|---|---|---|
| 3 | Cedarbrae | 2 |
| 16 | Woburn | 2 |


#### Second Best Neighborhoods (Group G4)

In [65]:
neigh_summary[neigh_summary['Group'] == 4]

|  | Neighborhood | Group |
|---|---|---|
| 7 | Dorset Park,Scarborough Town Centre,Wexford He... | 4 |


#### Third Best Neighborhoods (Group G5)

In [66]:
neigh_summary[neigh_summary['Group'] == 5]

|  | Neighborhood | Group |
|---|---|---|
| 1 | Agincourt North,L'Amoreaux East\r\n,Milliken,S... | 5 |
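The group numbers 2, 4 and 5 above are read off the sorted centers table by hand. A sketch of deriving them from `means_df` instead, using small hypothetical stand-ins for `means_df` and `neigh_summary` (the Total Sum values are copied from the centers output; the neighborhood names are shortened):

```python
import pandas as pd

# Hypothetical stand-ins: Total Sum values copied from the centers table,
# neighborhood names shortened for readability
means_df = pd.DataFrame(
    {'Total Sum': [0.090909, 2.5, 1.5, 2.0, 2.0]},
    index=['G1', 'G2', 'G3', 'G4', 'G5'],
)
neigh_summary = pd.DataFrame({
    'Neighborhood': ['Agincourt', 'Agincourt North', 'Birch Cliff',
                     'Cedarbrae', 'Dorset Park', 'Woburn'],
    'Group': [1, 5, 3, 2, 4, 2],
})

# Rank the groups; mergesort is stable, so tied groups (G4, G5) keep their order
ranked = means_df['Total Sum'].sort_values(ascending=False, kind='mergesort')

# 'G2' -> 2, matching the 1 + kmeans.labels_ numbering used in neigh_summary
top_groups = [int(name[1:]) for name in ranked.index[:3]]

for rank, group in enumerate(top_groups, start=1):
    members = neigh_summary.loc[neigh_summary['Group'] == group, 'Neighborhood']
    print(rank, f'G{group}', members.tolist())
```

Deriving the ranking this way keeps the conclusions in sync if the clustering is rerun and the label numbers change.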


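#### Looking Up the Location of the Second-Best Neighborhood (Group G4)

In [67]:
# Name of the neighborhood assigned to group 4
name_of_neigh = neigh_summary.loc[neigh_summary['Group'] == 4, 'Neighborhood'].iloc[0]
# First venue row for that neighborhood: name, coordinates, and one nearby venue
Scarborough_venues[Scarborough_venues['Neighborhood'] == name_of_neigh].iloc[0, 1:5].to_dict()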

{'Neighborhood': 'Dorset Park,Scarborough Town Centre,Wexford Heights',
 'Neighborhood Latitude': 43.7528467,
 'Neighborhood Longitude': -79.282067,
 'Venue': 'Starbucks'}

### THANK YOU