# Introduction / Business Problem

# Data

### First, some context

Before describing the data, here is some context on what I will do with it.

I want to divide the neighborhoods of Toronto into 4 clusters based on how Upscale and Diverse they are. The 4 clusters will be:
(1) Both Upscale & Diverse
(2) Upscale, but not Diverse
(3) Diverse, but not Upscale
(4) Neither Upscale nor Diverse

Of these, I will focus just on the Upscale & Diverse neighborhoods for my final test. In this final test, I will rank the Upscale & Diverse neighborhoods by how many Indian restaurants they already have, and I will suggest that my friend pick one of the neighborhoods with fewer Indian restaurants.
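
To make the final test concrete, here is a minimal sketch of the ranking step. The neighborhood names, cluster labels, and restaurant counts are made up for illustration, and the column names `Cluster` and `IndianRestaurants` are my own placeholders rather than anything that exists in the notebook yet.

```python
import pandas as pd

# Toy data: names, cluster labels, and counts are invented for illustration.
neighborhoods = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster": ["Upscale & Diverse", "Upscale, but not Diverse",
                "Upscale & Diverse", "Upscale & Diverse"],
    "IndianRestaurants": [3, 0, 1, 5],
})

# Keep only the cluster of interest, then rank by the number of
# existing Indian restaurants (fewest first).
candidates = (
    neighborhoods[neighborhoods["Cluster"] == "Upscale & Diverse"]
    .sort_values("IndianRestaurants")
    .reset_index(drop=True)
)
print(candidates)
```

The neighborhoods at the top of `candidates` would be the ones I suggest to my friend.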

### Next, an outline of the data

I am going to need the following data from Folium, Foursquare, and Wikipedia.

1) A map of Toronto to get started. This will come from Folium.
2) A basic dataset of boroughs and neighborhoods in Toronto, like the one I created for the Week 3 assignment of Segmenting and Clustering neighborhoods in Toronto. This will come from Wikipedia, just like in the Week 3 assignment.
3) Finally, for the K-means clustering, I will need data on the top venues in each neighborhood within a given radius. This will come from Foursquare, just like in the Week 3 assignment.

Overall, my approach will be similar to the analysis in Week 3, with just two major changes:
(1) I care only about the top restaurant venues, rather than all the top venues (i.e., I will be ignoring any venue that is not a restaurant).
(2) I need to define some metric that estimates the degree of diversity of restaurants, and the degree of upscale-ness of restaurants, for the K Means clustering.
    For measuring diversity, I will use the cuisine of each restaurant as raw data. I will get the cuisine from the 'Venue Category' field in Foursquare, which classified restaurants into "Afghan restaurant", "African restaurant", "American restaurant", etc. in the Week 3 Lab of segmenting neighborhoods in New York. The higher the variety in cuisine names, the higher that neighborhood's diversity score will be.
    For measuring upscale-ness, I will use the price tier of each restaurant (on the 1-4 "dollar" scale) as raw data. I am looking for a neighborhood that is dominated by 2-dollar and 3-dollar restaurants, because 4-dollar will be too upscale, and 1-dollar will be too cheap.
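
The two metrics could be computed along these lines. This is only a sketch under my own assumptions: the data is a toy table, the `Price Tier` column is a hypothetical field (the Week 3 dataset does not contain it), counting distinct cuisines is just one possible diversity measure, and the share of 2-dollar and 3-dollar restaurants is just one possible upscale-ness measure.

```python
import pandas as pd

# Toy data: "Price Tier" (1-4) is an assumed column, not part of the Week 3 dataset.
restaurants = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
    "Venue Category": ["Greek Restaurant", "Indian Restaurant", "Thai Restaurant",
                       "Greek Restaurant", "Greek Restaurant", "Greek Restaurant"],
    "Price Tier": [2, 3, 4, 1, 1, 2],
})

# Diversity: number of distinct cuisines in each neighborhood.
diversity = restaurants.groupby("Neighborhood")["Venue Category"].nunique()

# Upscale-ness: share of restaurants in the 2-dollar or 3-dollar tiers.
is_mid_tier = restaurants["Price Tier"].isin([2, 3])
upscale = is_mid_tier.groupby(restaurants["Neighborhood"]).mean()

scores = pd.DataFrame({"Diversity": diversity, "Upscale": upscale})
print(scores)
```

A table like `scores`, with one row per neighborhood, is the kind of input the K-means step would then cluster.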

### Finally, the code

#### Part 1: Scraping Data from Wikipedia to create a Pandas dataframe

Import libraries needed for data preparation

In [1]:
import requests
import pandas as pd

Get raw data from the Wikipedia page

In [2]:
website_url = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text

Import the Beautiful Soup package and format the HTML code behind the Wikipedia page

In [3]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,"lxml")
# print(soup.prettify())
# Executed the above line while testing code, but I have now commented it since it is a long, intermediate output that will make it harder to read my notebook.

Extract the table containing the data from the rest of the HTML code.

In [5]:
My_table = soup.find("table",{"class":"wikitable sortable"})

My_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
</td></tr>
<tr>
<td>M6A</td>

Create a list in which every list element corresponds to one cell in the Wikipedia table.

E.g., the first 3 list elements will be "M1A", "Not assigned", and "Not assigned".

In [6]:
# find_all is the current name of the method; findAll is a legacy alias
My_table_cells = My_table.find_all("td")

The ".text" attribute of each list element contains the data which we need for our Pandas dataframe.

Run the below code snippet to see how this works.

In [7]:
print("List elements of My_table_cells look like this:\n")
print(My_table_cells[6])
print(My_table_cells[7])
print(My_table_cells[8])
print("\n")

print("Using the .text function, we can extract the text we need:\n")
print(My_table_cells[6].text)
print(My_table_cells[7].text)
print(My_table_cells[8].text)
print("\n")

List elements of My_table_cells look like this:

<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td>


Using the .text function, we can extract the text we need:

M3A
North York
Parkwoods





Create a new list that contains only the text we are interested in (without the "td" parts).

The rstrip() function was used to remove the trailing "\n" that was a part of some list elements.

In [8]:
My_table_cells_text = []
num_entries = len(My_table_cells)
for i in range(0, num_entries):
    My_table_cells_text.append(My_table_cells[i].text.rstrip())

My_table_cells_text

['M1A',
 'Not assigned',
 'Not assigned',
 'M2A',
 'Not assigned',
 'Not assigned',
 'M3A',
 'North York',
 'Parkwoods',
 'M4A',
 'North York',
 'Victoria Village',
 'M5A',
 'Downtown Toronto',
 'Harbourfront',
 'M5A',
 'Downtown Toronto',
 'Regent Park',
 'M6A',
 'North York',
 'Lawrence Heights',
 'M6A',
 'North York',
 'Lawrence Manor',
 'M7A',
 "Queen's Park",
 'Not assigned',
 'M8A',
 'Not assigned',
 'Not assigned',
 'M9A',
 'Etobicoke',
 'Islington Avenue',
 'M1B',
 'Scarborough',
 'Rouge',
 'M1B',
 'Scarborough',
 'Malvern',
 'M2B',
 'Not assigned',
 'Not assigned',
 'M3B',
 'North York',
 'Don Mills North',
 'M4B',
 'East York',
 'Woodbine Gardens',
 'M4B',
 'East York',
 'Parkview Hill',
 'M5B',
 'Downtown Toronto',
 'Ryerson',
 'M5B',
 'Downtown Toronto',
 'Garden District',
 'M6B',
 'North York',
 'Glencairn',
 'M7B',
 'Not assigned',
 'Not assigned',
 'M8B',
 'Not assigned',
 'Not assigned',
 'M9B',
 'Etobicoke',
 'Cloverdale',
 'M9B',
 'Etobicoke',
 'Islington',
 'M9B',
 

Create an empty Pandas dataframe to populate with the Wikipedia data

In [9]:
df = pd.DataFrame(columns = ["PostalCode", "Borough", "Neighborhood"])
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood


Create 3 new lists from the My_table_cells_text list: one for postal codes, one for boroughs, and one for neighborhoods.

We will use these 3 separate lists to populate the Pandas dataframe columns.

Postal codes are present in list elements 0,3,6,9,12,15, and so on.

In [10]:
My_postcodes = []
for i in range(0, len(My_table_cells_text),3):
    My_postcodes.append(My_table_cells_text[i])

My_postcodes

['M1A',
 'M2A',
 'M3A',
 'M4A',
 'M5A',
 'M5A',
 'M6A',
 'M6A',
 'M7A',
 'M8A',
 'M9A',
 'M1B',
 'M1B',
 'M2B',
 'M3B',
 'M4B',
 'M4B',
 'M5B',
 'M5B',
 'M6B',
 'M7B',
 'M8B',
 'M9B',
 'M9B',
 'M9B',
 'M9B',
 'M9B',
 'M1C',
 'M1C',
 'M1C',
 'M2C',
 'M3C',
 'M3C',
 'M4C',
 'M5C',
 'M6C',
 'M7C',
 'M8C',
 'M9C',
 'M9C',
 'M9C',
 'M9C',
 'M1E',
 'M1E',
 'M1E',
 'M2E',
 'M3E',
 'M4E',
 'M5E',
 'M6E',
 'M7E',
 'M8E',
 'M9E',
 'M1G',
 'M2G',
 'M3G',
 'M4G',
 'M5G',
 'M6G',
 'M7G',
 'M8G',
 'M9G',
 'M1H',
 'M2H',
 'M3H',
 'M3H',
 'M3H',
 'M4H',
 'M5H',
 'M5H',
 'M5H',
 'M6H',
 'M6H',
 'M7H',
 'M8H',
 'M9H',
 'M1J',
 'M2J',
 'M2J',
 'M2J',
 'M3J',
 'M3J',
 'M4J',
 'M5J',
 'M5J',
 'M5J',
 'M6J',
 'M6J',
 'M7J',
 'M8J',
 'M9J',
 'M1K',
 'M1K',
 'M1K',
 'M2K',
 'M3K',
 'M3K',
 'M4K',
 'M4K',
 'M5K',
 'M5K',
 'M6K',
 'M6K',
 'M6K',
 'M7K',
 'M8K',
 'M9K',
 'M1L',
 'M1L',
 'M1L',
 'M2L',
 'M2L',
 'M3L',
 'M4L',
 'M4L',
 'M5L',
 'M5L',
 'M6L',
 'M6L',
 'M6L',
 'M7L',
 'M8L',
 'M9L',
 'M1M',
 'M1M',


Boroughs are present in list elements 1,4,7,10, and so on.

In [11]:
My_boroughs = []
for i in range(1, len(My_table_cells_text),3):
    My_boroughs.append(My_table_cells_text[i])

My_boroughs

['Not assigned',
 'Not assigned',
 'North York',
 'North York',
 'Downtown Toronto',
 'Downtown Toronto',
 'North York',
 'North York',
 "Queen's Park",
 'Not assigned',
 'Etobicoke',
 'Scarborough',
 'Scarborough',
 'Not assigned',
 'North York',
 'East York',
 'East York',
 'Downtown Toronto',
 'Downtown Toronto',
 'North York',
 'Not assigned',
 'Not assigned',
 'Etobicoke',
 'Etobicoke',
 'Etobicoke',
 'Etobicoke',
 'Etobicoke',
 'Scarborough',
 'Scarborough',
 'Scarborough',
 'Not assigned',
 'North York',
 'North York',
 'East York',
 'Downtown Toronto',
 'York',
 'Not assigned',
 'Not assigned',
 'Etobicoke',
 'Etobicoke',
 'Etobicoke',
 'Etobicoke',
 'Scarborough',
 'Scarborough',
 'Scarborough',
 'Not assigned',
 'Not assigned',
 'East Toronto',
 'Downtown Toronto',
 'York',
 'Not assigned',
 'Not assigned',
 'Not assigned',
 'Scarborough',
 'Not assigned',
 'Not assigned',
 'East York',
 'Downtown Toronto',
 'Downtown Toronto',
 'Not assigned',
 'Not assigned',
 'Not assigned

Neighborhoods are present in list elements 2,5,8,11, and so on.

In [12]:
My_neighborhoods = []
for i in range(2, len(My_table_cells_text),3):
    My_neighborhoods.append(My_table_cells_text[i])

My_neighborhoods

['Not assigned',
 'Not assigned',
 'Parkwoods',
 'Victoria Village',
 'Harbourfront',
 'Regent Park',
 'Lawrence Heights',
 'Lawrence Manor',
 'Not assigned',
 'Not assigned',
 'Islington Avenue',
 'Rouge',
 'Malvern',
 'Not assigned',
 'Don Mills North',
 'Woodbine Gardens',
 'Parkview Hill',
 'Ryerson',
 'Garden District',
 'Glencairn',
 'Not assigned',
 'Not assigned',
 'Cloverdale',
 'Islington',
 'Martin Grove',
 'Princess Gardens',
 'West Deane Park',
 'Highland Creek',
 'Rouge Hill',
 'Port Union',
 'Not assigned',
 'Flemingdon Park',
 'Don Mills South',
 'Woodbine Heights',
 'St. James Town',
 'Humewood-Cedarvale',
 'Not assigned',
 'Not assigned',
 'Bloordale Gardens',
 'Eringate',
 'Markland Wood',
 'Old Burnhamthorpe',
 'Guildwood',
 'Morningside',
 'West Hill',
 'Not assigned',
 'Not assigned',
 'The Beaches',
 'Berczy Park',
 'Caledonia-Fairbanks',
 'Not assigned',
 'Not assigned',
 'Not assigned',
 'Woburn',
 'Not assigned',
 'Not assigned',
 'Leaside',
 'Central Bay Stre

Populate the Pandas dataframe with data from the 3 lists we just created.

In [13]:
df["PostalCode"] = My_postcodes
df["Borough"] = My_boroughs
df["Neighborhood"] = My_neighborhoods
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned
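
As a side note, the three loops above can also be written with list slicing, since every third element of the flat list belongs to the same column. The sketch below uses a short stand-in list (`cells`, my own name) instead of the full `My_table_cells_text`, so it can be run on its own.

```python
import pandas as pd

# A short stand-in for My_table_cells_text (first three table rows only).
cells = ["M1A", "Not assigned", "Not assigned",
         "M2A", "Not assigned", "Not assigned",
         "M3A", "North York", "Parkwoods"]

# cells[start::3] takes every third element, beginning at index `start`.
toy_df = pd.DataFrame({
    "PostalCode": cells[0::3],
    "Borough": cells[1::3],
    "Neighborhood": cells[2::3],
})
print(toy_df)
```

This builds the dataframe in one step, without creating an empty dataframe first.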


Check how big the dataframe is

In [14]:
df.shape

(288, 3)

Drop rows where the Borough is "Not assigned"

In [15]:
df.drop(df[df["Borough"] == "Not assigned"].index, inplace=True)
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


Check the size of the dataframe after dropping rows where Borough = "Not assigned"

In [16]:
df.shape

(211, 3)

Check how many Neighborhoods are "Not assigned"

In [17]:
df.loc[df["Neighborhood"] == "Not assigned", "Neighborhood"]

8    Not assigned
Name: Neighborhood, dtype: object

Only 1 neighborhood is "not assigned"; replace the "not assigned" with the name of the Borough.

In [18]:
df.loc[df["Neighborhood"] == "Not assigned", "Neighborhood"] = df.loc[df["Neighborhood"] == "Not assigned", "Borough"]

Group the data by PostalCode and Borough, joining the neighborhoods that share a postal code into one comma-separated string

In [19]:
grouped_df = df.groupby(["PostalCode", "Borough"])["Neighborhood"].apply(",".join).reset_index()

View the size of the final dataframe

In [20]:
grouped_df.shape

(103, 3)

View the final dataframe

In [21]:
grouped_df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"
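
To show what the cleaning and grouping steps do, here is the same sequence on a four-row toy table. The rows are a hand-picked subset in the same format as the Wikipedia data, and the names `toy`, `missing`, and `toy_grouped` are placeholders of mine.

```python
import pandas as pd

toy = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M7A", "M8A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Queen's Park", "Not assigned"],
    "Neighborhood": ["Harbourfront", "Regent Park", "Not assigned", "Not assigned"],
})

# 1) Drop rows without a borough.
toy = toy[toy["Borough"] != "Not assigned"].copy()

# 2) Where the neighborhood is missing, fall back to the borough name.
missing = toy["Neighborhood"] == "Not assigned"
toy.loc[missing, "Neighborhood"] = toy.loc[missing, "Borough"]

# 3) One row per postal code, with its neighborhoods joined by commas.
toy_grouped = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .apply(",".join)
    .reset_index()
)
print(toy_grouped)
```

The two M5A rows collapse into a single row, and M7A gets its borough name as its neighborhood.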


#### Part 2: Getting Latitude & Longitude

I tried using Geocoder, but it did not work for me, so the code below gets the latitude and longitude data from the CSV file shared in the assignment.

In [22]:
LatLong = pd.read_csv("http://cocl.us/Geospatial_data")

Check the columns and number of rows to verify that the data was imported correctly.

In [23]:
LatLong.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [24]:
LatLong.shape

(103, 3)

In [25]:
LatLong.rename(columns = {"Postal Code": "PostalCode"}, inplace=True)
LatLong.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [26]:
final_df = pd.merge(grouped_df, LatLong, how="left", on="PostalCode")

Check the number of rows.

In [27]:
final_df.shape

(103, 5)

Check the final dataframe

In [28]:
final_df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.264848
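
Checking the shape confirms the row count, but it does not tell me whether every postal code actually found its coordinates. One way to check that, sketched here on toy data, is to use the `validate` and `indicator` options of `pd.merge`. The frames `left` and `right` are stand-ins of mine, with one postal code deliberately left without coordinates.

```python
import pandas as pd

left = pd.DataFrame({"PostalCode": ["M1B", "M1C", "M1E"],
                     "Borough": ["Scarborough"] * 3})
right = pd.DataFrame({"PostalCode": ["M1B", "M1C"],
                      "Latitude": [43.806686, 43.784535],
                      "Longitude": [-79.194353, -79.160497]})

# validate="one_to_one" raises an error if a postal code repeats on either side;
# indicator=True adds a "_merge" column saying where each row came from.
merged = pd.merge(left, right, how="left", on="PostalCode",
                  validate="one_to_one", indicator=True)

# Postal codes that exist on the left but have no coordinates on the right.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list would mean that every postal code received a latitude and longitude.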


#### Part 3: Clustering the Neighborhoods

Import all the required libraries.

In [29]:
import pandas as pd
import numpy as np
import json
import requests
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium
print("Libraries imported.")

Create a new dataframe with just the Boroughs that have "Toronto" in their name.

In [30]:
df_toronto = final_df[final_df.Borough.str.contains("Toronto", case=False)]
df_toronto

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
42,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
43,M4M,East Toronto,Studio District,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
45,M4P,Central Toronto,Davisville North,43.712751,-79.390197
46,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
47,M4S,Central Toronto,Davisville,43.704324,-79.38879
48,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316
49,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049


Reset the index to start from index=0

In [31]:
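# reset_index returns a new dataframe, so assign it back to keep the new index
df_toronto = df_toronto.reset_index(drop=True)
df_toronto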

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049


Create a map of Toronto

In [32]:
# create map of Toronto using latitude and longitude values
toronto_latitude = 43.6532
toronto_longitude = -79.3832

map_toronto = folium.Map(location=[toronto_latitude,toronto_longitude], zoom_start=10)

map_toronto

Add markers to the map

In [33]:
# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Define Foursquare credentials and version.

In [34]:
import os

# Read the credentials from environment variables rather than hard-coding them in the notebook.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are my own choice; set them before running this cell.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


Define function to get nearby venues for each neighborhood.

In [35]:
LIMIT = 100

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
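
The function above indexes straight into `v['venue']['categories'][0]`, which would fail if a venue ever came back with an empty category list. Below is a sketch of a more defensive way to flatten the response items. `extract_venue_rows` is a hypothetical helper of mine, not part of the Week 3 code, and `fake_items` is a hand-made payload that only mimics the fields the function above reads.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten the 'items' list of an explore response into tuples.

    Hypothetical helper: venues without a category get None
    instead of raising an IndexError.
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Hand-made payload in the same shape that getNearbyVenues indexes into.
fake_items = [
    {"venue": {"name": "Pantheon",
               "location": {"lat": 43.677621, "lng": -79.351434},
               "categories": [{"name": "Greek Restaurant"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 43.68, "lng": -79.35},
               "categories": []}},
]
rows = extract_venue_rows("The Danforth West,Riverdale",
                          43.679557, -79.352188, fake_items)
print(rows)
```

Because the helper takes the already-parsed items, it can be tried out without calling the Foursquare API.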

Get nearby venues for the Toronto neighborhoods.

In [36]:
toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )
print(toronto_venues.shape)
toronto_venues.head()

The Beaches
The Danforth West,Riverdale
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvall

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West,Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


In [37]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,56,56,56,56,56,56
"Brockton,Exhibition Place,Parkdale Village",22,22,22,22,22,22
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",14,14,14,14,14,14
"Cabbagetown,St. James Town",44,44,44,44,44,44
Central Bay Street,83,83,83,83,83,83
"Chinatown,Grange Park,Kensington Market",94,94,94,94,94,94
Christie,18,18,18,18,18,18
Church and Wellesley,88,88,88,88,88,88


In [38]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 234 unique categories.


Filter the venues to keep only the restaurants.

In [44]:
toronto_venues[toronto_venues['Venue Category'].str.contains("Restaurant")]

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
4,"The Danforth West,Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
6,"The Danforth West,Riverdale",43.679557,-79.352188,Cafe Fiorentina,43.677743,-79.350115,Italian Restaurant
8,"The Danforth West,Riverdale",43.679557,-79.352188,Mezes,43.677962,-79.350196,Greek Restaurant
9,"The Danforth West,Riverdale",43.679557,-79.352188,Messini Authentic Gyros,43.677827,-79.350569,Greek Restaurant
13,"The Danforth West,Riverdale",43.679557,-79.352188,7 Numbers,43.677062,-79.353934,Italian Restaurant
16,"The Danforth West,Riverdale",43.679557,-79.352188,Alexandros,43.678304,-79.349486,Greek Restaurant
20,"The Danforth West,Riverdale",43.679557,-79.352188,Rikkochez,43.677267,-79.353274,Restaurant
24,"The Danforth West,Riverdale",43.679557,-79.352188,Athen's Pastries,43.678166,-79.348927,Greek Restaurant
27,"The Danforth West,Riverdale",43.679557,-79.352188,Christina's On The Danforth,43.678240,-79.349185,Greek Restaurant
28,"The Danforth West,Riverdale",43.679557,-79.352188,Pan on the Danforth,43.678263,-79.348648,Greek Restaurant
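
As a possible next step, the filtered restaurants can be turned into a table of cuisine counts per neighborhood, which would give both the variety of cuisines and the number of Indian restaurants in one place. This is a sketch on toy data: `toy_venues` and its rows are invented, and `pd.crosstab` is simply the tool I would reach for here, not something the Week 3 lab uses.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Greek Restaurant", "Indian Restaurant", "Pub",
                       "Indian Restaurant", "Indian Restaurant"],
})

# na=False treats a missing category as "not a restaurant".
is_restaurant = toy_venues["Venue Category"].str.contains("Restaurant", na=False)
toy_restaurants = toy_venues[is_restaurant]

# Rows: neighborhoods, columns: cuisines, values: number of restaurants.
cuisine_counts = pd.crosstab(toy_restaurants["Neighborhood"],
                             toy_restaurants["Venue Category"])
print(cuisine_counts)
```

The "Indian Restaurant" column of `cuisine_counts` is the count I would use in the final ranking.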


### Rest of the K Means clustering code is yet to be written.