# Coursera Capstone Project

## Best Location to Start an Ice Cream Store in Toronto

### Introduction

**Background:** A family wants to start a business in Toronto and needs to find the areas that can attract the most customers. They plan to open an ice cream store, so they hope to find the region of Toronto with the most businesses around it. A neighborhood with many businesses appeals to them because many people visit such areas.

**Introduction/Business Problem:** They want to find the best locations to open the store in Toronto. They want to attract the most customers, and they will judge each area by the number of nearby restaurants and other venues to see whether the store can succeed there.

**Target Audience:** This analysis will help people who are looking to start a new food store or restaurant. They want to find the most popular and busiest locations so they can attract a large number of customers, so it is mainly useful for anyone searching for the most popular locations in Toronto.

### Data

#### We will need a list of neighborhoods in Toronto. We are referencing Wikipedia here.

In [1]:
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (requires pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [None]:
# The first table on the page is a grid with one postal code per cell
grid = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")[0]
postal_codes = []
borough = []
neighborhood = []

# Walk the grid column by column, then cell by cell.
# The parsing assumes each cell is "<3-character code><borough>(<name> / <name> ...)";
# a cell without parentheses is treated as having no borough or neighborhood.
for col in grid.columns:
    for cell in grid[col].dropna():
        cell = str(cell)
        postal_codes.append(cell[0:3])
        rest = cell[3:].split("(")
        if len(rest) == 1:
            borough.append(None)
            neighborhood.append(None)
        else:
            borough.append(rest[0].strip())
            names = rest[1].split(")")[0].split(" / ")
            for n in names:
                neighborhood.append(n.strip())
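
The string handling in the cell above can be checked without downloading the page. The following is a minimal sketch, assuming each table cell has the layout the loop relies on (a 3-character postal code, a borough, then neighborhoods in parentheses separated by " / "). The helper name `parse_cell` and the sample strings in the usage note are illustrative and not part of the original notebook.

```python
def parse_cell(cell):
    """Split one table cell into (postal_code, borough, neighborhoods).

    Assumed layout: "<code><borough>(<name> / <name> ...)".
    Cells without parentheses yield borough None and an empty list.
    """
    text = cell.strip()
    code, rest = text[:3], text[3:]
    # partition returns an empty separator when "(" is absent
    borough, paren, inside = rest.partition("(")
    if not paren:
        return code, None, []
    names = inside.split(")")[0]
    return code, borough.strip(), [n.strip() for n in names.split(" / ")]
```

For example, `parse_cell("M5A Downtown Toronto (Regent Park / Harbourfront)")` yields the code `"M5A"`, the borough `"Downtown Toronto"`, and a list with the two neighborhood names. Returning one tuple per cell keeps the postal code, borough, and neighborhoods aligned, which three separately appended lists do not guarantee.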



In [23]:
neighborhood = list(set(neighborhood))

In [24]:
df = pd.DataFrame(neighborhood, columns = ['Neighborhood'])
df.dropna(axis = 0, how = 'any', inplace = True)

#### Now we are going to get the latitude and longitude of each neighborhood in Toronto and continue cleaning the data.

In [26]:
def get_latlng(neighborhood):
    # Geocode "<neighborhood>, Toronto, Canada" with Nominatim
    geolocator = Nominatim(user_agent="cn_explorer")
    location = geolocator.geocode('{}, Toronto, Canada'.format(neighborhood))
    if location is not None:
        return (location.latitude, location.longitude)
    # Return a pair of None values when the lookup fails so callers can always unpack two items
    return (None, None)

In [27]:
latitude = []
longitude = []

for neighborhood in df['Neighborhood']:
    lat, lng = get_latlng(neighborhood)
    latitude.append(lat)
    longitude.append(lng)

In [28]:
df["Latitude"] = latitude
df["Longitude"] = longitude

In [30]:
# Drop the neighborhoods that could not be geocoded
df = df.dropna(subset=['Latitude', 'Longitude'])
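
The loop above sends one geocoding request per neighborhood, and the public Nominatim service asks clients to stay at or below one request per second. The following is a minimal sketch of a throttled lookup that also skips duplicate names. The function name `geocode_all` and its parameters are hypothetical, and `geocode` is assumed to be any callable that returns an object with `latitude` and `longitude` attributes, or `None` when nothing is found.

```python
import time


def geocode_all(names, geocode, delay_seconds=1.0, sleep=time.sleep):
    """Look up each distinct name once, pausing between consecutive requests.

    Returns a dict mapping name -> (latitude, longitude), with (None, None)
    for names the geocoder could not resolve.
    """
    coords = {}
    for name in names:
        if name in coords:
            continue  # already looked up, no new request needed
        if coords:
            sleep(delay_seconds)  # wait before every request after the first
        location = geocode('{}, Toronto, Canada'.format(name))
        if location is not None:
            coords[name] = (location.latitude, location.longitude)
        else:
            coords[name] = (None, None)
    return coords
```

With geopy this could be called as `geocode_all(df['Neighborhood'], Nominatim(user_agent="cn_explorer").geocode)`. Passing the geocoder in as an argument makes it possible to try the function with a stub before making any network calls.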

In [31]:
df = df.reset_index(drop=True)
df

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | South Steeles | 43.8162 | -79.3145 |
| 1 | East Birchmount Park | 43.7142 | -79.2711 |
| 2 | Rexdale | 43.7214 | -79.5655 |
| 3 | Bloordale Gardens | 43.6353 | -79.5637 |
| 4 | Highland Creek | 43.7901 | -79.1733 |
| ... | ... | ... | ... |
| 188 | Church and Wellesley | 43.6655 | -79.3838 |
| 189 | Richmond | 43.8126 | -79.2634 |
| 190 | Parkview Hill | 43.7063 | -79.3219 |
| 191 | The Beaches West | 43.671 | -79.2967 |


#### We are now going to find the location of Toronto so we can get the nearby venues

In [32]:
address = 'Toronto'

geolocator = Nominatim(user_agent="cn_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [35]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# Add one circle marker per neighborhood, with the neighborhood name as its popup
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

#### Now we are going to use Foursquare to get the venue data for each neighborhood

In [36]:
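import os

# Read the Foursquare credentials from the environment instead of hard-coding them.
# The environment variable names are placeholders chosen for this notebook.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare secret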
VERSION = '20180605' # Foursquare API version
LIMIT = 20


def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue;
        # venues with an empty category list are skipped so categories[0] cannot fail
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
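
The function above indexes straight into the response (`response`, then `groups[0]`, then `items`). The flattening step can be tried offline on a hand-written dictionary of the same shape. The following is a minimal sketch under that assumption. The helper name `venue_rows` is hypothetical, and the sketch records `None` as the category for a venue whose category list is empty.

```python
def venue_rows(name, lat, lng, payload):
    """Flatten one 'explore' response into a list of row tuples.

    A payload with no groups or no items yields an empty list
    instead of raising an exception.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows
```

Each returned tuple has the same seven fields, in the same order, as the columns of the dataframe built by `getNearbyVenues`, so the rows can be passed to `pd.DataFrame` with those column names.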

In [38]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

South Steeles
East Birchmount Park
Rexdale
Bloordale Gardens
Highland Creek
The Danforth East
Woburn
Agincourt
Chinatown
Willowdale
Milliken
Cabbagetown
Fairview
Yorkville
Dufferin
West Hill
Harbourfront
The Junction South
Silver Hills
The Queensway East
Rathnelly
Woodbine Downs
Harbourfront East
Princess Gardens
Swansea
Dorset Park
Humberwood
Enclave of M5E
Woodbine Gardens
Sullivan
Islington Avenue
Brockton
Mimico NW
India Bazaar
Humber Summit
South Niagara
Riverdale
Montgomery Road
Henry Farm
Upper Rouge
Golden Mile
Toronto Dominion Centre
Glencairn
Upwood Park
North Midtown
Eringate
Clarks Corners
Oriole
Old Mill North
Cliffcrest
St. Phillips
First Canadian Place
Cloverdale
CN Tower
Kingsview Village
Parkdale
Steeles East
Port Union
Bayview Village
Northwood Park
Union Station
Forest Hill North & West
Newtonbrook
The Queensway West
York Mills West
Markland Wood
Grange Park
Agincourt North
Martin Grove
King
Leaside
Hillcrest Village
Victoria Hotel
Deer Park
Maple Leaf Park
Tam O'Sha

In [40]:
toronto_venues

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | South Steeles | 43.816178 | -79.314538 | Sanwood Park | 43.817374 | -79.310045 | Playground |
| 1 | East Birchmount Park | 43.714167 | -79.271109 | Shell | 43.711699 | -79.271869 | Gas Station |
| 2 | East Birchmount Park | 43.714167 | -79.271109 | The Beer Store | 43.713765 | -79.272981 | Beer Store |
| 3 | East Birchmount Park | 43.714167 | -79.271109 | Birchmount Rd & St Clair Ave E | 43.714040 | -79.271507 | Intersection |
| 4 | East Birchmount Park | 43.714167 | -79.271109 | Fu Yao Supermarket | 43.711709 | -79.271876 | Grocery Store |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2411 | Weston | 43.700161 | -79.516247 | EZ LaundroMat | 43.698615 | -79.512607 | Laundromat |
| 2412 | Weston | 43.700161 | -79.516247 | Dollarama | 43.701049 | -79.510984 | Discount Store |
| 2413 | Weston | 43.700161 | -79.516247 | Raymore Park | 43.696247 | -79.514491 | Park |
| 2414 | Weston | 43.700161 | -79.516247 | Olympic convenience store | 43.704486 | -79.515789 | Convenience Store |


#### We have acquired all the data
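
Since the business problem is to find the busiest neighborhoods, one simple way to use this data is to count the venues returned for each neighborhood. The following is a minimal sketch of that idea, not the analysis itself. The function name `rank_by_venue_count` is hypothetical, and it works on plain (neighborhood, venue category) pairs so it can be tried without pandas.

```python
from collections import Counter


def rank_by_venue_count(rows):
    """Rank neighborhoods by how many venue rows they have, busiest first.

    rows: iterable of (neighborhood, venue_category) pairs.
    Returns a list of (neighborhood, count) tuples.
    """
    counts = Counter(neighborhood for neighborhood, _category in rows)
    return counts.most_common()
```

With the dataframe above, the pairs could be produced by `zip(toronto_venues['Neighborhood'], toronto_venues['Venue Category'])`. Because the request sets `LIMIT = 20`, no neighborhood can have more than 20 rows, so the densest areas will tie at that cap.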