# Capstone - Location of Founders' Bakery
### Applied Data Science Capstone by IBM/Coursera

#### Capstone Week 1 
* A description of the problem and a discussion of the background. (15 marks)
* A description of the data and how it will be used to solve the problem. (15 marks)

#### Capstone Week 2
1. A full report consisting of all of the following components (15 marks):
* Introduction where you discuss the business problem and who would be interested in this project.
* Data where you describe the data that will be used to solve the problem and the source of the data.
* Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and what machine learnings were used and why.
* Results section where you discuss the results.
* Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
* Conclusion section where you conclude the report.
2. A link to your Notebook on your Github repository pushed showing your code. (15 marks)

3. Your choice of a presentation or blogpost. (10 marks)

Here are examples of previous outstanding submissions that should give you an idea of what your report would look like, what your notebook would look like in terms of clean, clear, and well-commented code, and what your presentation would look like or your blogpost would look like:

Report: https://cocl.us/coursera_capstone_report <br>
Notebook: https://cocl.us/coursera_capstone_notebook <br>
Presentation: https://cocl.us/coursera_capstone_presentation <br>
Blogpost: https://cocl.us/coursera_capstone_blogpost <br>

## Introduction : Business Problem

A group of undergraduate students from the University of Toronto (UofT) are looking to open a new bakery. In order to do so, they will need to gather capital by getting their fellow students to invest in their new venture via the Kickstarter portal.

This project aims to determine the optimal location of the new proposed bakery in order to get buy-in from their peers. 

The founders would prefer a neighborhood that is physically close to the UofT campus and does not already have a bakery. Ideally, the location should also be similar to the UofT neighborhood, as the founders are targeting a similar demographic.

## Description of Data

The following data sources are used in this analysis:
* **Beautiful Soup** was used to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
* The coordinates of the University of Toronto campus and of the neighborhoods in the Downtown Toronto borough were obtained by geocoding each postal code with the **geocoder** package (ArcGIS provider)
* The number of bakeries and their locations in every Downtown neighborhood were obtained using **Foursquare API**
* The types of venues in each Downtown Toronto neighborhood were obtained using **Foursquare API**

## Methodology 

1. Identify the location of the UofT campus and the borough in which it is located
2. Identify the neighborhoods that are in the same borough as UofT
3. Explore the location of bakeries within each neighborhood
4. Determine which neighborhoods do not have a bakery
5. Analyze the similarity of the neighborhoods by grouping them with k-means clustering
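
The introduction asks for a location that is physically close to the UofT campus, but the steps above do not quantify that distance. A minimal sketch of one way to do so is a great-circle (haversine) calculation. The function name `haversine_km` is an assumption made for this illustration, and the coordinates are the ones geocoded in the analysis section of this notebook.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Coordinates of the "University of Toronto / Harbord" postal code and of
# three Downtown neighborhoods, as they appear in this notebook's outputs
uoft = (43.66328128200007, -79.39808791199994)
neighborhoods = {
    'Rosedale': (43.681893, -79.376706),
    'Church and Wellesley': (43.666659, -79.381472),
    'Central Bay Street': (43.656072, -79.385653),
}

distances = {name: haversine_km(*uoft, *coords) for name, coords in neighborhoods.items()}
for name, km in sorted(distances.items(), key=lambda kv: kv[1]):
    print('{}: {:.2f} km'.format(name, km))
```

Sorting by this distance gives a simple ranking that could be combined with the bakery counts and cluster labels when choosing a neighborhood.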

## Analysis
### Download and explore dataset

The boroughs and neighborhoods of Toronto (postal codes beginning with M) were extracted from Wikipedia using Beautiful Soup.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
extract_data = requests.get(url).text
wiki_data = BeautifulSoup(extract_data, 'lxml')

In [3]:
#create a new dataframe
column_names = ['Postalcode','Borough','Neighborhood']
toronto = pd.DataFrame(columns = column_names)

In [4]:
# loop through the HTML table to find postcode, borough and neighborhood
content = wiki_data.find('div', class_='mw-parser-output')
table = content.table.tbody
postcode = 0
borough = 0
neighborhood = 0
rows = []

for tr in table.find_all('tr'):
    i = 0
    for td in tr.find_all('td'):
        if i == 0:
            postcode = td.text
            i = i + 1
        elif i == 1:
            borough = td.text
            i = i + 1
        elif i == 2: 
            neighborhood = td.text.strip('\n').replace(']','')
    rows.append({'Postalcode': postcode,'Borough': borough,'Neighborhood': neighborhood})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
toronto = pd.DataFrame(rows, columns=column_names)

#### Cleaning and exploring the dataset

In [5]:
# clean dataframe 
toronto = toronto[toronto.Borough!='Not assigned']
toronto = toronto[toronto.Borough!= 0]
toronto.reset_index(drop = True, inplace = True)
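# where a neighborhood is not assigned, fall back to the borough name
# (a single .loc assignment avoids chained indexing, which may write to a copy)
unassigned = toronto['Neighborhood'] == 'Not assigned'
toronto.loc[unassigned, 'Neighborhood'] = toronto.loc[unassigned, 'Borough']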
                                 
df = toronto.groupby(['Postalcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
df = df.replace('\n','', regex=True)
df.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1A,Not assigned,
1,M1B,Scarborough,Malvern / Rouge
2,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
3,M1E,Scarborough,Guildwood / Morningside / West Hill
4,M1G,Scarborough,Woburn


In [6]:
#cleaning dataframe - dropping rows with empty values
df = df[(df.Postalcode != 'Not assigned' ) & (df.Borough != 'Not assigned') & (df.Neighborhood != 'Not assigned')]

In [7]:
df.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
1,M1B,Scarborough,Malvern / Rouge
2,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
3,M1E,Scarborough,Guildwood / Morningside / West Hill
4,M1G,Scarborough,Woburn
5,M1H,Scarborough,Cedarbrae


In [8]:
df.describe()

Unnamed: 0,Postalcode,Borough,Neighborhood
count,103,103,103
unique,103,10,98
top,M3J,North York,Downsview
freq,1,24,4


In [9]:
df.to_csv('toronto.csv', index=False)

### Use the geocoder library to get the latitude and longitude of each postal code

In [10]:
!pip install geocoder



In [11]:
import geocoder

In [12]:
df = pd.read_csv('toronto.csv')
df.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [13]:
def get_latlng(postal_code):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
    return lat_lng_coords
    
get_latlng('M4G')

[43.70941386000004, -79.36309957799995]
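
The `while` loop in `get_latlng` never exits if the geocoding service keeps returning no result. A minimal sketch of a bounded alternative follows. The names `geocode_with_retry`, `max_attempts` and `delay_seconds` are assumptions made for this illustration, and the geocoder is passed in as a plain callable so that the sketch runs without network access.

```python
import time

def geocode_with_retry(geocode, query, max_attempts=5, delay_seconds=1.0):
    """Call geocode(query) until it returns coordinates or attempts run out.

    `geocode` is any callable returning [lat, lng] or None. In this notebook
    it could be `lambda q: geocoder.arcgis(q).latlng`.
    """
    for attempt in range(1, max_attempts + 1):
        coords = geocode(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    raise RuntimeError('No coordinates for {!r} after {} attempts'.format(query, max_attempts))

# Stand-in geocoder for demonstration only: fails twice, then succeeds
calls = {'n': 0}

def flaky_geocode(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [43.70941386000004, -79.36309957799995]

print(geocode_with_retry(flaky_geocode, 'M4G, Toronto, Ontario', delay_seconds=0))
```

Raising an error after a fixed number of attempts makes a failing postal code visible instead of leaving the notebook cell running indefinitely.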

In [14]:
postal_codes = df['Postalcode']    
coords = [ get_latlng(postal_code) for postal_code in postal_codes.tolist() ]

In [15]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

In [16]:
# call out one of the rows as a test
df[df.Postalcode == 'M6B']

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
72,M6B,North York,Glencairn,43.707279,-79.4475


In [17]:
df.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.808626,-79.189913
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.785779,-79.157368
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.765806,-79.185284
3,M1G,Scarborough,Woburn,43.771545,-79.218135
4,M1H,Scarborough,Cedarbrae,43.768791,-79.238813


### Create a map of Toronto and explore its neighborhoods

In [18]:
from geopy.geocoders import Nominatim

In [19]:
address = 'Toronto, Ontario Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [20]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

In [21]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [22]:
# Foursquare API
CLIENT_ID = '32GDPVDPDSMRCRMFJL1KEQK0ZPDFWPOYVYZ1B1NB0WR5MT3G' # your Foursquare ID
CLIENT_SECRET = '1QP54JVUZUSDXYTTOUC4MHA1YEJUXUIPOGVQG2FBJRJDOL5Z' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
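
Hard-coding API credentials in a notebook exposes them to anyone who can read the file or its printed URLs. A minimal sketch of reading them from environment variables instead is shown below. The function name and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for this illustration, not names required by Foursquare.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook'.format(missing)) from None

# Demonstration with a stand-in mapping instead of the real environment
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(
    {'FOURSQUARE_CLIENT_ID': 'example-id', 'FOURSQUARE_CLIENT_SECRET': 'example-secret'})
print(CLIENT_ID)
```

In the notebook itself the function would be called without arguments, so that the values come from the environment of the process running the kernel.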


### Exploring University of Toronto's neighborhood


In [23]:
df.index[df.Neighborhood == 'University of Toronto / Harbord']

Int64Index([66], dtype='int64')

In [24]:
#get latitude and longitude of University of Toronto / Harbord neighborhood

neighborhood_latitude = df.loc[66, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[66, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[66, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of University of Toronto / Harbord are 43.66328128200007, -79.39808791199994.


In [25]:
# create the GET request URL
LIMIT = 100
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=32GDPVDPDSMRCRMFJL1KEQK0ZPDFWPOYVYZ1B1NB0WR5MT3G&client_secret=1QP54JVUZUSDXYTTOUC4MHA1YEJUXUIPOGVQG2FBJRJDOL5Z&v=20180605&ll=43.66328128200007,-79.39808791199994&radius=500&limit=100'

In [26]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eb00dabad1ab4001b417a79'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'University of Toronto',
  'headerFullLocation': 'University of Toronto, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 35,
  'suggestedBounds': {'ne': {'lat': 43.66778128650007,
    'lng': -79.3918789793354},
   'sw': {'lat': 43.65878127750007, 'lng': -79.40429684466447}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5362c366498e602fbe1db395',
       'name': 'Yasu',
       'location': {'address': '81 Harbord St.',
        'lat': 43.66283719650635,
        'lng': -79.40321739973975,
        'labeledLatLngs': [{'label': 'display',
          '

In [27]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [28]:
import requests # library to handle requests
from pandas import json_normalize  # top-level location since pandas 1.0

In [29]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Yasu,Japanese Restaurant,43.662837,-79.403217
1,Hart House Theatre,Theater,43.663571,-79.394616
2,Rasa,Restaurant,43.662757,-79.403988
3,Philosopher's Walk,Park,43.666894,-79.395597
4,The Dessert Kitchen,Dessert Shop,43.662823,-79.402746


#### Determine the types of venues available in the area

In [30]:
nearby_venues.groupby('categories').count()

Unnamed: 0_level_0,name,lat,lng
categories,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Bakery,1,1,1
Bank,1,1,1
Bar,1,1,1
Beer Store,1,1,1
Bookstore,2,2,2
Bubble Tea Shop,1,1,1
Café,5,5,5
Chinese Restaurant,1,1,1
Coffee Shop,1,1,1
College Arts Building,1,1,1


### Exploring Downtown Toronto

In [31]:
downtown = df[df['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.681893,-79.376706
1,M4X,Downtown Toronto,St. James Town / Cabbagetown,43.667656,-79.367326
2,M4Y,Downtown Toronto,Church and Wellesley,43.666659,-79.381472
3,M5A,Downtown Toronto,Regent Park / Harbourfront,43.650964,-79.353041
4,M5B,Downtown Toronto,Garden District / Ryerson,43.657491,-79.377529


In [32]:
downtown.shape

(19, 5)

### Analyzing each neighborhood in Downtown Toronto

In [33]:
# define a function to get nearby venues for each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [34]:
# create a new dataframe of venues for the Downtown Toronto neighborhoods

downtown_venues = getNearbyVenues(names=downtown['Neighborhood'],
                                   latitudes=downtown['Latitude'],
                                   longitudes=downtown['Longitude']
                                  )

Rosedale
St. James Town / Cabbagetown
Church and Wellesley
Regent Park / Harbourfront
Garden District / Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond / Adelaide / King
Harbourfront East / Union Station / Toronto Islands
Toronto Dominion Centre / Design Exchange
Commerce Court / Victoria Hotel
University of Toronto / Harbord
Kensington Market / Chinatown / Grange Park
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport
Stn A PO Boxes
First Canadian Place / Underground city
Christie
Queen's Park / Ontario Provincial Government


In [35]:
downtown_venues.shape

(1157, 7)

In [36]:
downtown_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rosedale,43.681893,-79.376706,Summerhill Market,43.686265,-79.375458,Grocery Store
1,Rosedale,43.681893,-79.376706,Rosedale Park,43.682328,-79.378934,Playground
2,Rosedale,43.681893,-79.376706,Whitney Park,43.682036,-79.373788,Park
3,Rosedale,43.681893,-79.376706,Scoops Convenience Boutique,43.686148,-79.375828,Candy Store
4,St. James Town / Cabbagetown,43.667656,-79.367326,Cranberries,43.667843,-79.369407,Diner


In [37]:
venue_cat = downtown_venues.groupby('Venue Category').count()
venue_cat

Unnamed: 0_level_0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
American Restaurant,20,20,20,20,20,20
Art Gallery,9,9,9,9,9,9
Art Museum,2,2,2,2,2,2
Arts & Crafts Store,3,3,3,3,3,3
Asian Restaurant,12,12,12,12,12,12
...,...,...,...,...,...,...
Video Game Store,3,3,3,3,3,3
Vietnamese Restaurant,4,4,4,4,4,4
Wine Bar,6,6,6,6,6,6
Women's Store,1,1,1,1,1,1


### Identify bakeries in Downtown neighborhoods

In [38]:
downtown_bakery = downtown_venues.loc[downtown_venues['Venue Category'] == 'Bakery']
downtown_bakery

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
11,St. James Town / Cabbagetown,43.667656,-79.367326,Absolute Bakery & Café,43.667469,-79.369277,Bakery
32,St. James Town / Cabbagetown,43.667656,-79.367326,Daniel et Daniel Event Creation & Catering,43.664384,-79.368328,Bakery
151,Regent Park / Harbourfront,43.650964,-79.353041,The Sweet Escape Patisserie,43.650632,-79.358709,Bakery
205,Garden District / Ryerson,43.657491,-79.377529,Danish Pastry House,43.654574,-79.38074,Bakery
300,St. James Town,43.651734,-79.375554,Stonemill Bread,43.648668,-79.37161,Bakery
356,Berczy Park,43.645196,-79.373855,Stonemill Bread,43.648668,-79.37161,Bakery
374,Berczy Park,43.645196,-79.373855,Carousel Bakery,43.648707,-79.37158,Bakery
486,Richmond / Adelaide / King,43.650542,-79.384116,Brick Street Bakery,43.648815,-79.380605,Bakery
546,Richmond / Adelaide / King,43.650542,-79.384116,Forno Cultura,43.648533,-79.382535,Bakery
574,Toronto Dominion Centre / Design Exchange,43.646923,-79.381626,Brick Street Bakery,43.648815,-79.380605,Bakery
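
Some bakeries in the table above appear under more than one neighborhood: the 500 m search circles overlap, so one shop can be returned for two postal codes. A minimal sketch of grouping the rows by physical venue follows. The rows are copied from the table above, and the variable names are assumptions made for this illustration.

```python
# (neighborhood, venue, venue latitude, venue longitude), copied from the table above
bakery_rows = [
    ('St. James Town', 'Stonemill Bread', 43.648668, -79.37161),
    ('Berczy Park', 'Stonemill Bread', 43.648668, -79.37161),
    ('Berczy Park', 'Carousel Bakery', 43.648707, -79.37158),
    ('Richmond / Adelaide / King', 'Brick Street Bakery', 43.648815, -79.380605),
    ('Toronto Dominion Centre / Design Exchange', 'Brick Street Bakery', 43.648815, -79.380605),
]

# Map each physical bakery to the neighborhoods whose search radius contains it
neighborhoods_by_bakery = {}
for neighborhood, venue, lat, lng in bakery_rows:
    neighborhoods_by_bakery.setdefault((venue, lat, lng), []).append(neighborhood)

# Bakeries that were counted for more than one neighborhood
shared = {key[0]: hoods for key, hoods in neighborhoods_by_bakery.items() if len(hoods) > 1}
print(len(bakery_rows), 'rows,', len(neighborhoods_by_bakery), 'distinct bakeries')
print(shared)
```

On the full dataframe, `downtown_bakery.drop_duplicates(subset=['Venue', 'Venue Latitude', 'Venue Longitude'])` would give the distinct venues. Per-neighborhood counts should therefore be read as "bakeries within 500 m of the postal code centre" rather than as disjoint totals.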


### Determine the number of bakeries in each neighborhood

In [39]:
downtown_bakery.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,2,2,2,2,2,2
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport,1,1,1,1,1,1
Commerce Court / Victoria Hotel,1,1,1,1,1,1
First Canadian Place / Underground city,1,1,1,1,1,1
Garden District / Ryerson,1,1,1,1,1,1
Kensington Market / Chinatown / Grange Park,2,2,2,2,2,2
Regent Park / Harbourfront,1,1,1,1,1,1
Richmond / Adelaide / King,2,2,2,2,2,2
St. James Town,1,1,1,1,1,1
St. James Town / Cabbagetown,2,2,2,2,2,2
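
Step 4 of the methodology asks which neighborhoods have no bakery, and those are exactly the neighborhoods missing from the count above. A minimal sketch of that comparison follows. The function name `neighborhoods_without` is an assumption made for this illustration, and the lists are an illustrative subset taken from rows visible in this notebook's outputs, not the full result.

```python
def neighborhoods_without(all_neighborhoods, neighborhoods_with_category):
    """Return the neighborhoods that never appear in the filtered venue rows.

    The original order is kept and repeated names are reported once.
    """
    seen = set(neighborhoods_with_category)
    return [name for name in dict.fromkeys(all_neighborhoods) if name not in seen]

# Illustrative subset only
all_neighborhoods = ['Rosedale', 'St. James Town / Cabbagetown', 'Church and Wellesley',
                     'Berczy Park', 'Central Bay Street']
bakery_neighborhoods = ['St. James Town / Cabbagetown', 'St. James Town / Cabbagetown',
                        'Berczy Park', 'Berczy Park']

print(neighborhoods_without(all_neighborhoods, bakery_neighborhoods))
```

Because the function only iterates over its arguments, it could be called in the notebook as `neighborhoods_without(downtown['Neighborhood'], downtown_bakery['Neighborhood'])`.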


### Clustering Downtown Toronto neighborhoods

In [40]:
# one hot encoding
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")
downtown_onehot.head()

Unnamed: 0,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,...,Theater,Theme Park,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [41]:
# add neighborhood column back to dataframe
downtown_onehot['Neighborhood'] = downtown_venues['Neighborhood'] 
downtown_onehot.head()

Unnamed: 0,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,...,Theater,Theme Park,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [42]:
downtown_onehot.shape

(1157, 185)

In [43]:
downtown_grouped = downtown_onehot.groupby('Neighborhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,...,Theater,Theme Park,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.015152,0.0,0.0,0.0,0.0,0.015152,0.0,0.015152,...,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.015152
1,CN Tower / King and Spadina / Railway Lands / ...,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,...,0.018519,0.0,0.0,0.0,0.0,0.018519,0.018519,0.0,0.0,0.0
3,Christie,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.011628,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,...,0.011628,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.011628
5,Commerce Court / Victoria Hotel,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01
6,First Canadian Place / Underground city,0.04,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
7,Garden District / Ryerson,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0
8,Harbourfront East / Union Station / Toronto Is...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Kensington Market / Chinatown / Grange Park,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.04878,0.0,0.04878,0.02439,0.0,0.0
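
Taking the mean of the one-hot columns within each neighborhood yields the share of that neighborhood's venues that fall in each category. A minimal sketch of the same calculation with plain counters is shown below. The Rosedale rows are the four venues listed for it in `downtown_venues.head()`, and the variable names are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs for Rosedale, from downtown_venues.head()
venue_rows = [
    ('Rosedale', 'Grocery Store'),
    ('Rosedale', 'Playground'),
    ('Rosedale', 'Park'),
    ('Rosedale', 'Candy Store'),
]

# Count how often each category occurs in each neighborhood
counts = defaultdict(Counter)
for neighborhood, category in venue_rows:
    counts[neighborhood][category] += 1

# Dividing each count by the neighborhood's venue total gives the same number
# as the mean of that category's one-hot column within the group
frequencies = {
    hood: {cat: n / sum(cat_counts.values()) for cat, n in cat_counts.items()}
    for hood, cat_counts in counts.items()
}
print(frequencies['Rosedale'])
```

A neighborhood with only a handful of venues gets coarse frequencies (0.25 per venue here), which is worth keeping in mind when comparing it with neighborhoods that returned close to 100 venues.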


#### Print each neighborhood along with the top 5 most common venues

In [44]:
num_top_venues = 5

for hood in downtown_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_grouped[downtown_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
          venue  freq
0   Coffee Shop  0.09
1  Cocktail Bar  0.05
2        Lounge  0.03
3        Bakery  0.03
4    Restaurant  0.03


----CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport----
               venue  freq
0        Coffee Shop  0.06
1               Café  0.06
2               Park  0.05
3  French Restaurant  0.05
4         Restaurant  0.05


----Central Bay Street----
                       venue  freq
0                Coffee Shop  0.15
1                       Café  0.06
2  Middle Eastern Restaurant  0.06
3         Italian Restaurant  0.04
4            Bubble Tea Shop  0.04


----Christie----
           venue  freq
0           Café  0.25
1  Grocery Store  0.25
2     Playground  0.08
3     Baby Store  0.08
4    Coffee Shop  0.08


----Church and Wellesley----
                 venue  freq
0          Coffee Shop  0.13
1  Japanese Restaurant  0.07
2     Sushi Restaurant  0.05
3           Restaura

### Write a function to sort the venues in descending order.

In [45]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

### Create the new pandas dataframe and display the top 10 venues for each neighborhood.

In [46]:
import numpy as np

In [47]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Lounge,Bakery,Café,Seafood Restaurant,Breakfast Spot,Cheese Shop,Beer Bar,Restaurant
1,CN Tower / King and Spadina / Railway Lands / ...,Café,Coffee Shop,Restaurant,French Restaurant,Park,Bar,Speakeasy,Japanese Restaurant,Italian Restaurant,Lounge
2,Central Bay Street,Coffee Shop,Middle Eastern Restaurant,Café,Breakfast Spot,Restaurant,Clothing Store,Plaza,Sandwich Place,Bubble Tea Shop,Italian Restaurant
3,Christie,Café,Grocery Store,Park,Coffee Shop,Candy Store,Playground,Athletics & Sports,Baby Store,Farmers Market,Farm
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Restaurant,Sushi Restaurant,Pub,Grocery Store,Smoke Shop,Hotel,Dance Studio,Gay Bar
5,Commerce Court / Victoria Hotel,Coffee Shop,Restaurant,Hotel,Café,Italian Restaurant,American Restaurant,Japanese Restaurant,Gym,Deli / Bodega,Seafood Restaurant
6,First Canadian Place / Underground city,Coffee Shop,Café,Hotel,American Restaurant,Gym,Japanese Restaurant,Restaurant,Deli / Bodega,Asian Restaurant,Steakhouse
7,Garden District / Ryerson,Coffee Shop,Clothing Store,Sandwich Place,Middle Eastern Restaurant,Hotel,Restaurant,Café,Bar,Italian Restaurant,Cosmetics Shop
8,Harbourfront East / Union Station / Toronto Is...,Harbor / Marina,Theme Park,Fast Food Restaurant,Park,Farm,Distribution Center,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store
9,Kensington Market / Chinatown / Grange Park,Café,Mexican Restaurant,Coffee Shop,Bakery,Vietnamese Restaurant,Gaming Cafe,Vegetarian / Vegan Restaurant,Cheese Shop,Record Shop,Cocktail Bar


## Cluster Neighborhoods

In [48]:
# set number of clusters
kclusters = 5

downtown_grouped_clustering = downtown_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 3, 0, 0, 0, 0, 2, 0], dtype=int32)
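
To make concrete what `KMeans` does with the frequency vectors, here is a minimal pure-Python sketch of Lloyd's algorithm, the iteration behind k-means. It is not scikit-learn's implementation: the function `kmeans`, the first-k initialisation and the three-category toy vectors are all assumptions made for this illustration.

```python
def kmeans(points, k, iterations=100):
    """Minimal Lloyd's algorithm; returns (labels, centroids).

    Centroids start at the first k points, a simplification chosen for
    reproducibility. scikit-learn's KMeans uses k-means++ by default.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical frequency vectors: [coffee shop, park, playground]
points = [
    [0.15, 0.00, 0.00],  # coffee-shop-heavy neighborhood
    [0.00, 0.25, 0.25],  # park-and-playground neighborhood
    [0.13, 0.01, 0.00],
    [0.00, 0.20, 0.30],
]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Neighborhoods whose venue mix is similar end up with the same label, which is how the cluster assignments in this section should be read.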

In [49]:
downtown.shape

(19, 5)

In [50]:
downtown

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.681893,-79.376706
1,M4X,Downtown Toronto,St. James Town / Cabbagetown,43.667656,-79.367326
2,M4Y,Downtown Toronto,Church and Wellesley,43.666659,-79.381472
3,M5A,Downtown Toronto,Regent Park / Harbourfront,43.650964,-79.353041
4,M5B,Downtown Toronto,Garden District / Ryerson,43.657491,-79.377529
5,M5C,Downtown Toronto,St. James Town,43.651734,-79.375554
6,M5E,Downtown Toronto,Berczy Park,43.645196,-79.373855
7,M5G,Downtown Toronto,Central Bay Street,43.656072,-79.385653
8,M5H,Downtown Toronto,Richmond / Adelaide / King,43.650542,-79.384116
9,M5J,Downtown Toronto,Harbourfront East / Union Station / Toronto Is...,43.62375,-79.3692


In [51]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Clusters', kmeans.labels_)
downtown_merged = downtown

In [52]:
# join the sorted venues and cluster labels onto the downtown dataframe, which holds latitude/longitude for each neighborhood
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
downtown_merged.head() # check the last columns!

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.681893,-79.376706,1,Playground,Grocery Store,Park,Candy Store,Yoga Studio,Dog Run,Farmers Market,Farm,Falafel Restaurant,Ethiopian Restaurant
1,M4X,Downtown Toronto,St. James Town / Cabbagetown,43.667656,-79.367326,0,Coffee Shop,Park,Restaurant,Pub,Italian Restaurant,Pizza Place,Bakery,Café,Cosmetics Shop,Caribbean Restaurant
2,M4Y,Downtown Toronto,Church and Wellesley,43.666659,-79.381472,0,Coffee Shop,Japanese Restaurant,Restaurant,Sushi Restaurant,Pub,Grocery Store,Smoke Shop,Hotel,Dance Studio,Gay Bar
3,M5A,Downtown Toronto,Regent Park / Harbourfront,43.650964,-79.353041,0,Pub,Café,Athletics & Sports,Mediterranean Restaurant,Chocolate Shop,French Restaurant,Intersection,Coffee Shop,Mexican Restaurant,Bank
4,M5B,Downtown Toronto,Garden District / Ryerson,43.657491,-79.377529,0,Coffee Shop,Clothing Store,Sandwich Place,Middle Eastern Restaurant,Hotel,Restaurant,Café,Bar,Italian Restaurant,Cosmetics Shop


In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighborhood'], downtown_merged['Clusters']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examining each cluster in Downtown Toronto

In [54]:
downtown_merged.loc[downtown_merged['Clusters'] == 0, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,0,Coffee Shop,Park,Restaurant,Pub,Italian Restaurant,Pizza Place,Bakery,Café,Cosmetics Shop,Caribbean Restaurant
2,Downtown Toronto,0,Coffee Shop,Japanese Restaurant,Restaurant,Sushi Restaurant,Pub,Grocery Store,Smoke Shop,Hotel,Dance Studio,Gay Bar
3,Downtown Toronto,0,Pub,Café,Athletics & Sports,Mediterranean Restaurant,Chocolate Shop,French Restaurant,Intersection,Coffee Shop,Mexican Restaurant,Bank
4,Downtown Toronto,0,Coffee Shop,Clothing Store,Sandwich Place,Middle Eastern Restaurant,Hotel,Restaurant,Café,Bar,Italian Restaurant,Cosmetics Shop
5,Downtown Toronto,0,Coffee Shop,Café,American Restaurant,Cosmetics Shop,Gastropub,Cocktail Bar,Seafood Restaurant,Gym,Moroccan Restaurant,Creperie
6,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Lounge,Bakery,Café,Seafood Restaurant,Breakfast Spot,Cheese Shop,Beer Bar,Restaurant
7,Downtown Toronto,0,Coffee Shop,Middle Eastern Restaurant,Café,Breakfast Spot,Restaurant,Clothing Store,Plaza,Sandwich Place,Bubble Tea Shop,Italian Restaurant
8,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Clothing Store,Deli / Bodega,Salad Place,Gym,Sushi Restaurant,Thai Restaurant,Hotel
10,Downtown Toronto,0,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Seafood Restaurant,Deli / Bodega,Salad Place,American Restaurant,Tea Room
11,Downtown Toronto,0,Coffee Shop,Restaurant,Hotel,Café,Italian Restaurant,American Restaurant,Japanese Restaurant,Gym,Deli / Bodega,Seafood Restaurant


In [55]:
downtown_merged.loc[downtown_merged['Clusters'] == 1, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Playground,Grocery Store,Park,Candy Store,Yoga Studio,Dog Run,Farmers Market,Farm,Falafel Restaurant,Ethiopian Restaurant


In [56]:
downtown_merged.loc[downtown_merged['Clusters'] == 2, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Downtown Toronto,2,Harbor / Marina,Theme Park,Fast Food Restaurant,Park,Farm,Distribution Center,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store


In [57]:
downtown_merged.loc[downtown_merged['Clusters'] == 3, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Downtown Toronto,3,Café,Grocery Store,Park,Coffee Shop,Candy Store,Playground,Athletics & Sports,Baby Store,Farmers Market,Farm


In [58]:
downtown_merged.loc[downtown_merged['Clusters'] == 4, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Clusters,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Downtown Toronto,4,Coffee Shop,Sushi Restaurant,Café,Middle Eastern Restaurant,Italian Restaurant,Bookstore,Juice Bar,Burrito Place,Sandwich Place,Yoga Studio
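
The five cells above differ only in the cluster label, so the same inspection could be written as one loop. The sketch below is an illustration, not the notebook's own code: `demo` is a hypothetical stand-in for `downtown_merged` that reuses a few values from the outputs above, and `rows_by_cluster` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical stand-in for downtown_merged; values are copied from the outputs above.
demo = pd.DataFrame(
    {
        "Borough": ["Downtown Toronto"] * 4,
        "Clusters": [1, 0, 0, 2],
        "1st Most Common Venue": [
            "Playground",
            "Coffee Shop",
            "Coffee Shop",
            "Harbor / Marina",
        ],
    },
    index=[0, 1, 2, 9],
)


def rows_by_cluster(df, cluster_col="Clusters"):
    """Return {cluster label: sub-DataFrame} for every label present in df."""
    return {int(label): group for label, group in df.groupby(cluster_col)}


clusters = rows_by_cluster(demo)
for label, group in clusters.items():
    # Print the size of each cluster followed by its rows.
    print(f"Cluster {label}: {len(group)} neighborhood(s)")
    print(group.to_string())
```

Applied to the real DataFrame, the same loop would show every cluster in one cell and make the cluster sizes easy to compare.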


### Results and Discussion
* Because the UofT neighborhood already has a bakery, opening a new one in the immediate vicinity is not ideal, as it would face direct competition.
* Because UofT is located in the Downtown Toronto borough, the other neighborhoods in this borough were considered as candidate locations for the new bakery.
* The closest neighborhoods to the UofT campus that do not have bakeries were identified as Queen's Park, Church and Wellesley, and Central Bay.
* After examining these three neighborhoods, Church and Wellesley was found to be the most suitable location because it is similar to the UofT neighborhood (based on the cluster analysis) and is closer to the UofT neighborhood than Central Bay is.
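
The proximity comparison in the bullets above could be made explicit by computing great-circle distances between neighborhood coordinates. The following is a minimal sketch that assumes the haversine formula is an acceptable measure; the notebook does not state how closeness was determined. The functions `haversine_km` and `rank_by_distance` are introduced here for illustration, the reference point and the placeholder neighborhood are hypothetical, and only the Garden District / Ryerson coordinates come from the data shown earlier.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def rank_by_distance(origin, candidates):
    """Sort {name: (lat, lon)} by distance from origin, nearest first."""
    return sorted(
        ((name, haversine_km(*origin, *coords)) for name, coords in candidates.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical reference point, NOT taken from the notebook's data.
reference = (43.66, -79.40)
candidates = {
    # Coordinates from the Garden District / Ryerson row shown earlier.
    "Garden District / Ryerson": (43.657491, -79.377529),
    # Hypothetical placeholder used only to illustrate the ranking.
    "Placeholder neighborhood": (43.70, -79.40),
}
for name, dist in rank_by_distance(reference, candidates):
    print(f"{name}: {dist:.2f} km")
```

With the actual `Latitude` and `Longitude` columns of `downtown_merged`, the same ranking could be restricted to the neighborhoods that have no bakery among their venues.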

### Conclusion
* The purpose of this project was to identify a potential location for a new bakery based on the founders' criteria.
* Narrowing the choice of location to Church and Wellesley is only the first step for the bakery's founders; they will need to perform further investigation and market research on this neighborhood to assess its potential for their new venture.