# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Insights and conclusion](#conclusion)

### Introduction: Business Problem <a name="introduction"></a>

In this project I'll try to find the Neighborhoods that are suitable for opening a new restaurant in the city of Toronto. The project specifically targets stakeholders interested in opening an **Indian restaurant** in **Toronto**, Canada.

We will primarily target locations that are **as close to the city center as possible**, where a new restaurant has the best chance of attracting customers. Since there are many restaurants in Toronto, we will try to detect **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no Indian restaurants in the vicinity**.

Using data science techniques, a few promising Neighborhoods will be selected that can drive the new restaurant business for the stakeholders.

### Data <a name="data"></a>

The data for this project is collected from multiple sources:
* I'll start preparing the data from scratch by scraping a wiki page to get the **Neighborhood** details of **Toronto**.
* Then I'll attach the latitude and longitude of each Neighborhood from a geospatial coordinates file (`Geospatial_Coordinates.csv`), and use a geolocator to get the coordinates of the city center.
* Then I'll use the Foursquare API to get the venue locations of local restaurants and Indian restaurants in each Neighborhood of Toronto.
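
The first step, turning an HTML table into rows of text, can be sketched with the standard library alone. This is a minimal illustration, not the approach used in the cells below (those use BeautifulSoup); the class name `TableRowParser` and the sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:   # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


# Sample markup shaped like the Wikipedia table (illustrative only)
sample_html = """
<table>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td><a href="/wiki/North_York">North York</a></td><td><a href="/wiki/Parkwoods">Parkwoods</a>
</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
parsed_rows = parser.rows
print(parsed_rows)
```

Link text inside a cell is kept and the trailing newline is stripped, which mirrors what `i.text.rstrip()` does in the scraping cell.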



In [1]:
# Scraping the wiki page
# Import the required libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

In [2]:
wiki_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [3]:
#  Get the requested page
page = requests.get(wiki_url)

In [4]:
# Get the text data from the response object
data = page.text

In [5]:
# Parse the page
soup = BeautifulSoup(data, 'html.parser')  # name the parser explicitly so results do not depend on what is installed

In [6]:
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className = document.documentElement.className.replace( /(^|\s)client-nojs(\s|$)/, "$1client-js$2" );
  </script>
  <script>
   (window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":890001695,"wgRevisionId":890001695,"wgArticleId":539066,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communications in Ontario","Postal codes in Canada","Toronto","Ontario-related lists"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wg

In [7]:
# Extract the table tag
wiki_table = soup.find('table')
wiki_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
</td></tr>
<tr>
<td>M6A</td>

In [8]:
# Extract the table rows
table_row = wiki_table.find_all('tr')
table_row

[<tr>
 <th>Postcode</th>
 <th>Borough</th>
 <th>Neighbourhood
 </th></tr>, <tr>
 <td>M1A</td>
 <td>Not assigned</td>
 <td>Not assigned
 </td></tr>, <tr>
 <td>M2A</td>
 <td>Not assigned</td>
 <td>Not assigned
 </td></tr>, <tr>
 <td>M3A</td>
 <td><a href="/wiki/North_York" title="North York">North York</a></td>
 <td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
 </td></tr>, <tr>
 <td>M4A</td>
 <td><a href="/wiki/North_York" title="North York">North York</a></td>
 <td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
 </td></tr>, <tr>
 <td>M5A</td>
 <td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
 <td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
 </td></tr>, <tr>
 <td>M5A</td>
 <td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
 <td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
 </td></tr>, <tr>
 <td>M6A</td>
 <td

In [9]:
# Get the data from each table row

dataframe = []
for row in table_row:
    table_data = row.find_all('td')
    data = [i.text.rstrip() for i in table_data]
    dataframe.append(data)
 
# The header row has only <th> cells, so its entry is an empty list; drop it
del dataframe[0]
dataframe

[['M1A', 'Not assigned', 'Not assigned'],
 ['M2A', 'Not assigned', 'Not assigned'],
 ['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Harbourfront'],
 ['M5A', 'Downtown Toronto', 'Regent Park'],
 ['M6A', 'North York', 'Lawrence Heights'],
 ['M6A', 'North York', 'Lawrence Manor'],
 ['M7A', "Queen's Park", 'Not assigned'],
 ['M8A', 'Not assigned', 'Not assigned'],
 ['M9A', 'Etobicoke', 'Islington Avenue'],
 ['M1B', 'Scarborough', 'Rouge'],
 ['M1B', 'Scarborough', 'Malvern'],
 ['M2B', 'Not assigned', 'Not assigned'],
 ['M3B', 'North York', 'Don Mills North'],
 ['M4B', 'East York', 'Woodbine Gardens'],
 ['M4B', 'East York', 'Parkview Hill'],
 ['M5B', 'Downtown Toronto', 'Ryerson'],
 ['M5B', 'Downtown Toronto', 'Garden District'],
 ['M6B', 'North York', 'Glencairn'],
 ['M7B', 'Not assigned', 'Not assigned'],
 ['M8B', 'Not assigned', 'Not assigned'],
 ['M9B', 'Etobicoke', 'Cloverdale'],
 ['M9B', 'Etobicoke', 'Islington'],
 ['M9B', 

In [10]:
# Declare labels to assign to the dataframe
labels = ['Postcode', 'Borough', 'Neighborhood']

In [11]:
# Create pandas dataframe and load it with data
wiki_table_df = pd.DataFrame.from_records(dataframe, columns=labels)

In [12]:
# Remove the data cells with 'Not assigned' value in Borough
wiki_table_df = wiki_table_df[wiki_table_df.Borough != 'Not assigned']

In [13]:
wiki_table_df.head()
# wiki_table_df.shape

| | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |


In [14]:
# Replace the 'Not assigned' in Neighborhood with the respective Borough values.
wiki_table_df.Neighborhood = wiki_table_df.Neighborhood.replace('Not assigned', wiki_table_df.Borough)  

In [15]:
wiki_table_df.head(10)

| | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |
| 7 | M6A | North York | Lawrence Manor |
| 8 | M7A | Queen's Park | Queen's Park |
| 10 | M9A | Etobicoke | Islington Avenue |
| 11 | M1B | Scarborough | Rouge |
| 12 | M1B | Scarborough | Malvern |
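
The two cleaning rules applied so far (drop rows whose Borough is 'Not assigned', then use the Borough name wherever the Neighborhood is 'Not assigned') can be restated on plain tuples. This is a sketch for illustration; `clean_rows` is a hypothetical helper, and the sample rows are taken from the scraped list above.

```python
def clean_rows(rows):
    """Drop rows without a borough; fall back to the borough name for a missing neighborhood."""
    cleaned = []
    for postcode, borough, neighborhood in rows:
        if borough == 'Not assigned':
            continue  # nothing usable in this row
        if neighborhood == 'Not assigned':
            neighborhood = borough
        cleaned.append((postcode, borough, neighborhood))
    return cleaned


sample_rows = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M3A', 'North York', 'Parkwoods'),
    ('M7A', "Queen's Park", 'Not assigned'),
]
cleaned_rows = clean_rows(sample_rows)
print(cleaned_rows)
```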


In [16]:

del wiki_table_df['Borough']

In [17]:
wiki_table_df.shape

(211, 2)

In [18]:
wiki_table_df.head(5)

| | Postcode | Neighborhood |
|---|---|---|
| 2 | M3A | Parkwoods |
| 3 | M4A | Victoria Village |
| 4 | M5A | Harbourfront |
| 5 | M5A | Regent Park |
| 6 | M6A | Lawrence Heights |


In [19]:
# Read the geospatial data
geo_data = pd.read_csv('Geospatial_Coordinates.csv')
# Rename the name of the common column
geo_data = geo_data.rename(columns={'Postal Code': 'Postcode'})
geo_data.head()

| | Postcode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [20]:
# Merge both the dataframes to join the Latitude and Longitude
wiki_table_df = pd.merge(wiki_table_df, geo_data, on = 'Postcode')
wiki_table_df.head()

| | Postcode | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | M3A | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Harbourfront | 43.65426 | -79.360636 |
| 3 | M5A | Regent Park | 43.65426 | -79.360636 |
| 4 | M6A | Lawrence Heights | 43.718518 | -79.464763 |


In [21]:
wiki_table_df.shape

(211, 4)
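
`pd.merge` performs an inner join by default, so a Neighborhood row survives only if its Postcode also appears in the coordinates file. The sketch below shows that behaviour on plain Python data; `inner_join_on_postcode` is a hypothetical helper and `'M0Z'` is a made-up postcode added to show a dropped row.

```python
def inner_join_on_postcode(neighborhood_rows, coordinates):
    """Keep only rows whose postcode has coordinates, and append those coordinates."""
    joined = []
    for postcode, neighborhood in neighborhood_rows:
        if postcode in coordinates:
            lat, lon = coordinates[postcode]
            joined.append((postcode, neighborhood, lat, lon))
    return joined


neighborhood_rows = [
    ('M5A', 'Harbourfront'),
    ('M5A', 'Regent Park'),
    ('M1B', 'Rouge'),
    ('M0Z', 'Hypothetical'),  # made-up postcode with no coordinates
]
coordinates = {
    'M5A': (43.65426, -79.360636),
    'M1B': (43.806686, -79.194353),
}
joined_rows = inner_join_on_postcode(neighborhood_rows, coordinates)
print(joined_rows)
```

Since the row count is still 211 after the merge, every Postcode in the scraped table found a match in the coordinates file. Neighborhoods that share a Postcode also share the same coordinates.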

In [22]:
# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors

# transform JSON file into a pandas dataframe
from pandas.io.json import json_normalize 

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [23]:
# Get the latitude and longitude values of Toronto.

address = 'Downtown Toronto, Ontario, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)

toronto_lat = location.latitude
toronto_lon = location.longitude
toronto_center = [toronto_lat, toronto_lon]

print('The geographical coordinates of {} are {} {}.'.format(address, toronto_lat, toronto_lon))

The geographical coordinates of Downtown Toronto, Ontario, Canada are 43.655115 -79.380219.


In [24]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

def lonlat_to_xy(longitude, latitude):
    '''Converts latitude, longitude pair to an equivalent X,Y coordinates'''
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
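    # NOTE: UTM zone 33 is centered on 15°E; Toronto (about 79.4°W) lies in zone 17.
    # zone=33 is kept here and in xy_to_lonlat because the outputs in this notebook were produced with it.
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')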
    xy = pyproj.transform(proj_latlon, proj_xy, longitude, latitude)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    '''Converts X,Y pair to an equivalent latitude, longitude coordinates'''
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    '''Calculates the Euclidean distance between two points'''
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Toronto center longitude={}, latitude={}'.format(toronto_lon, toronto_lat))
toronto_X, toronto_Y = lonlat_to_xy(toronto_lon, toronto_lat)
print('Toronto center UTM X={}, Y={}'.format(toronto_X, toronto_Y))
tor_lon, tor_lat = xy_to_lonlat(toronto_X,toronto_Y)
print('Toronto center longitude={}, latitude={}'.format(tor_lon, tor_lat))

Coordinate transformation check
-------------------------------
Toronto center longitude=-79.380219, latitude=43.655115
Toronto center UTM X=-5310314.843463886, Y=10507080.468591856
Toronto center longitude=-79.38021900000047, latitude=43.65511499999977


In [25]:
distances_from_center = []
xs = []
ys = []

for row in wiki_table_df.itertuples(index=False):
    lat, lon = row[2:4]  # columns 2 and 3 hold Latitude and Longitude
    x, y = lonlat_to_xy(lon, lat)
    dist = calc_xy_distance(toronto_X, toronto_Y, x, y)
    xs.append(x)
    ys.append(y)
    distances_from_center.append(dist) 



In [26]:
wiki_table_df['Dist_from_center'] = distances_from_center
wiki_table_df['X'] = xs
wiki_table_df['Y'] = ys


In [27]:
wiki_table_df.head()

| | Postcode | Neighborhood | Latitude | Longitude | Dist_from_center | X | Y |
|---|---|---|---|---|---|---|---|
| 0 | M3A | Parkwoods | 43.753259 | -79.329656 | 16757.753354 | -5295352.0 | 10499540.0 |
| 1 | M4A | Victoria Village | 43.725882 | -79.315572 | 13582.956021 | -5299879.0 | 10498390.0 |
| 2 | M5A | Harbourfront | 43.65426 | -79.360636 | 2280.198876 | -5310700.0 | 10504830.0 |
| 3 | M5A | Regent Park | 43.65426 | -79.360636 | 2280.198876 | -5310700.0 | 10504830.0 |
| 4 | M6A | Lawrence Heights | 43.718518 | -79.464763 | 14114.081861 | -5299146.0 | 10515710.0 |


In [28]:
wiki_table_df.shape

(211, 7)
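
The projected distances in `Dist_from_center` can be cross-checked against a great-circle distance computed directly from latitude and longitude. The sketch below uses the haversine formula, which is a different technique from the UTM projection used above; `haversine_m` is a hypothetical helper, and the two coordinate pairs are the Toronto center and the Harbourfront row from this notebook.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


center = (43.655115, -79.380219)       # Toronto center from the geocoding cell
harbourfront = (43.65426, -79.360636)  # Harbourfront row from the table above
check_m = haversine_m(*center, *harbourfront)
print(round(check_m))
```

On these inputs the great-circle distance comes out at roughly 1.6 km, while the table records 2280 m for the same pair. One possible explanation, stated here as an assumption, is the projection: the helper functions use UTM zone 33 (centered on 15°E), whereas Toronto's longitude falls in zone 17, and projecting far outside a zone distorts distances.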

In [29]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[toronto_lat, toronto_lon], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(wiki_table_df['Latitude'], wiki_table_df['Longitude'], wiki_table_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.9,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [30]:
# Foursquare API credentials

CLIENT_ID = 'SZRMZIJBD3STU4VC3E5B4XNTYKNUF2DCWH5DFAYQXP34WAYB' # your Foursquare ID
CLIENT_SECRET = 'LAYRHUG4GBNIQEMEFOHE0MMK1NF5F3SKHQQQBLDW3VWBGHWS' # your Foursquare Secret
VERSION = '20180323'
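
Hard-coding API keys in a notebook exposes them to anyone who can read it. A common alternative is to read them from environment variables. The sketch below is only an illustration: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions, not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(environ=None):
    """Read the client ID and secret from environment variables (hypothetical names)."""
    environ = os.environ if environ is None else environ
    client_id = environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = environ.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```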



In [31]:
# Category IDs corresponding to Indian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

indian_restaurant_categories = ['4bf58dd8d48988d10f941735', '54135bf5e4b08f3d2429dfe5', '54135bf5e4b08f3d2429dff3',
                               '54135bf5e4b08f3d2429dff5', '54135bf5e4b08f3d2429dfe2', '54135bf5e4b08f3d2429dff2',
                               '54135bf5e4b08f3d2429dfe1', '54135bf5e4b08f3d2429dfe3', '54135bf5e4b08f3d2429dfe8',
                               '54135bf5e4b08f3d2429dfe9', '54135bf5e4b08f3d2429dfe6', '54135bf5e4b08f3d2429dfdf',
                               '54135bf5e4b08f3d2429dfe4', '54135bf5e4b08f3d2429dfe7', '54135bf5e4b08f3d2429dfea',
                               '54135bf5e4b08f3d2429dfeb', '54135bf5e4b08f3d2429dfed', '54135bf5e4b08f3d2429dfee',
                               '54135bf5e4b08f3d2429dff4', '54135bf5e4b08f3d2429dfe0', '54135bf5e4b08f3d2429dfdd',
                               '54135bf5e4b08f3d2429dff6', '54135bf5e4b08f3d2429dfef', '54135bf5e4b08f3d2429dff0',
                               '54135bf5e4b08f3d2429dff1', '54135bf5e4b08f3d2429dfde', '54135bf5e4b08f3d2429dfec']


def is_restaurant(categories, specific_filter=None):
    '''Returns the food category venues which come under restaurant category'''
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Ontario', '')
    address = address.replace(', Canada', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:  # any request or parsing failure yields an empty venue list
        venues = []
    return venues
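
To make the classification rule concrete, here is a usage sketch of `is_restaurant`. The function is restated so the block runs on its own. The category names and the IDs marked hypothetical are made up for the example; only `'4bf58dd8d48988d10f941735'` comes from the list above, and pairing it with the name 'Indian Restaurant' is an assumption.

```python
def is_restaurant(categories, specific_filter=None):
    """Return (is a restaurant, matches the specific filter) for a venue's categories."""
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for name, category_id in categories:
        category_name = name.lower()
        if any(word in category_name for word in restaurant_words):
            restaurant = True
        if 'fast food' in category_name:
            restaurant = False  # fast food places are excluded
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True   # a filter match always counts as a restaurant
    return restaurant, specific


indian_ids = ['4bf58dd8d48988d10f941735']  # first ID from the list above

indian = is_restaurant([('Indian Restaurant', '4bf58dd8d48988d10f941735')], indian_ids)
fast_food = is_restaurant([('Fast Food Restaurant', 'hypothetical-id-1')], indian_ids)
coffee = is_restaurant([('Coffee Shop', 'hypothetical-id-2')], indian_ids)
steak = is_restaurant([('Steakhouse', 'hypothetical-id-3')], indian_ids)
print(indian, fast_food, coffee, steak)
```

The rule is purely name-based apart from the ID filter, so a venue whose category name lacks all four keywords is not counted unless its ID is in the filter.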

In [32]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found Indian restaurants

import pickle

def get_restaurants(lats, lons):
    '''Returns all restaurants in a given location'''
    restaurants = {}
    indian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_indian = is_restaurant(venue_categories, specific_filter=indian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_indian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_indian:
                    indian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, indian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
indian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('indian_restaurants_350.pkl', 'rb') as f:
        indian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except Exception:  # missing or unreadable cache files: fall back to the API below
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, indian_restaurants, location_restaurants = get_restaurants(wiki_table_df['Latitude'], wiki_table_df['Longitude'])
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('indian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(indian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
        

Restaurant data loaded.


In [33]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Indian restaurants:', len(indian_restaurants))
print('Percentage of Indian restaurants: {:.2f}%'.format(len(indian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 442
Total number of Indian restaurants: 16
Percentage of Indian restaurants: 3.62%
Average number of restaurants in neighborhood: 3.9245283018867925


### Methodology <a name="methodology"></a>

### Analysis <a name="analysis"></a>

In [34]:
# Visualize the Restaurants in Toronto
map_toronto = folium.Map(location=toronto_center, zoom_start=12)
folium.Marker(toronto_center, popup='Downtown Toronto, Ontario, Canada').add_to(map_toronto)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_indian = res[6]
    color = 'red' if is_indian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_toronto)
map_toronto

In [35]:
toronto_boroughs_url = 'https://raw.githubusercontent.com/madhulika95b/Coursera_capstone/master/neighbourhoods.js'
toronto_boroughs = requests.get(toronto_boroughs_url).json()

def boroughs_style(feature):
    return { 'color': 'gray', 'fill': False }


In [36]:
rest_latlons = [[res[2], res[3]] for res in restaurants.values()]

In [37]:
from folium.plugins import HeatMap

map_toronto_data = folium.Map(location=toronto_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_toronto_data) #cartodbpositron cartodbdark_matter
HeatMap(rest_latlons).add_to(map_toronto_data)
folium.Marker(toronto_center).add_to(map_toronto_data)
folium.Circle(toronto_center, radius=1000, fill=False, color='black').add_to(map_toronto_data)
folium.Circle(toronto_center, radius=5000, fill=False, color='black').add_to(map_toronto_data)
folium.Circle(toronto_center, radius=3000, fill=False, color='black').add_to(map_toronto_data)
folium.GeoJson(toronto_boroughs, style_function=boroughs_style, name='geojson').add_to(map_toronto_data)
map_toronto_data

In [38]:
location_restaurants_count = [len(res) for res in location_restaurants]
wiki_table_df['Restaurants_in_area'] = location_restaurants_count
wiki_table_df.head()

ValueError: Length of values does not match length of index
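
A `ValueError` like this is raised when the list being assigned has a different length than the DataFrame. One plausible cause, stated as an assumption because the pickle files are not part of this notebook, is that the cached `location_restaurants` list was built for a different set of locations than the 211 rows now in `wiki_table_df`. A cache loader can guard against that by checking the length before trusting the file. The sketch below is illustrative; `load_or_compute` is a hypothetical helper and the lambda stands in for the API call.

```python
import os
import pickle
import tempfile


def load_or_compute(path, expected_len, compute):
    """Return the cached list if it has the expected length; otherwise recompute and re-cache."""
    try:
        with open(path, 'rb') as f:
            cached = pickle.load(f)
        if len(cached) == expected_len:
            return cached
    except (OSError, pickle.UnpicklingError, EOFError):
        pass  # no usable cache; fall through and recompute
    fresh = compute()
    with open(path, 'wb') as f:
        pickle.dump(fresh, f)
    return fresh


# Demonstration with a temporary file holding a stale two-entry cache
cache_path = os.path.join(tempfile.mkdtemp(), 'location_restaurants_demo.pkl')
with open(cache_path, 'wb') as f:
    pickle.dump([[], []], f)

counts = load_or_compute(cache_path, expected_len=3, compute=lambda: [[], [], []])
print(len(counts))
```

A length check is a weak fingerprint: it catches a changed number of locations but not a changed set of the same size.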

##### Compute the distance to the nearest Indian restaurant from each Neighborhood.

In [39]:
distances_to_indian_restaurant = []
for area_x, area_y in zip(wiki_table_df['X'], wiki_table_df['Y']):
    # Distances are capped at 10000 m: an area with no Indian restaurant within 10 km gets exactly 10000
    min_distance = 10000
    for res in indian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_indian_restaurant.append(min_distance)

wiki_table_df['Distance_to_Indian_restaurant'] = distances_to_indian_restaurant
wiki_table_df.head(10)

| | Postcode | Neighborhood | Latitude | Longitude | Dist_from_center | X | Y | Distance_to_Indian_restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | M3A | Parkwoods | 43.753259 | -79.329656 | 16757.753354 | -5295352.0 | 10499540.0 | 6199.479257 |
| 1 | M4A | Victoria Village | 43.725882 | -79.315572 | 13582.956021 | -5299879.0 | 10498390.0 | 5228.901798 |
| 2 | M5A | Harbourfront | 43.65426 | -79.360636 | 2280.198876 | -5310700.0 | 10504830.0 | 1425.528596 |
| 3 | M5A | Regent Park | 43.65426 | -79.360636 | 2280.198876 | -5310700.0 | 10504830.0 | 1425.528596 |
| 4 | M6A | Lawrence Heights | 43.718518 | -79.464763 | 14114.081861 | -5299146.0 | 10515710.0 | 5881.129383 |
| 5 | M6A | Lawrence Manor | 43.718518 | -79.464763 | 14114.081861 | -5299146.0 | 10515710.0 | 5881.129383 |
| 6 | M7A | Queen's Park | 43.662301 | -79.389494 | 1576.342147 | -5309053.0 | 10508030.0 | 394.234936 |
| 7 | M9A | Islington Avenue | 43.667856 | -79.532242 | 17780.213618 | -5306315.0 | 10524400.0 | 10000.0 |
| 8 | M1B | Rouge | 43.806686 | -79.194353 | 32428.186484 | -5288536.0 | 10483050.0 | 10000.0 |
| 9 | M1B | Malvern | 43.806686 | -79.194353 | 32428.186484 | -5288536.0 | 10483050.0 | 10000.0 |


In [40]:
print('Average distance to closest Indian restaurant from each area center:', wiki_table_df['Distance_to_Indian_restaurant'].mean())

Average distance to closest Indian restaurant from each area center: 5682.611283408463
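
The nearest-restaurant search above is a minimum over Euclidean distances with a ceiling of 10000. A compact restatement, offered as a sketch with a hypothetical helper name `nearest_distance`, makes the ceiling an explicit parameter:

```python
import math


def nearest_distance(point, others, cap=None):
    """Smallest Euclidean distance from point to any of others, optionally capped."""
    best = min(
        (math.hypot(point[0] - ox, point[1] - oy) for ox, oy in others),
        default=math.inf,  # no candidates at all
    )
    return best if cap is None else min(best, cap)


uncapped = nearest_distance((0, 0), [(3, 4), (6, 8)])
capped = nearest_distance((0, 0), [(3, 4), (6, 8)], cap=4)
no_candidates = nearest_distance((0, 0), [], cap=10000)
print(uncapped, capped, no_candidates)
```

Because capped values can only be smaller than or equal to the true distances, the 10000.0 entries in the table make the reported average a lower bound on the real average distance.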


##### Let's visualize the data using a Heat Map before applying our criteria.

##### I'll first make the data more specific by considering only the Neighborhoods within 10 km of the Toronto center.

In [41]:
wiki_table_df = wiki_table_df[wiki_table_df.Dist_from_center <= 10000]
wiki_table_df.shape

(74, 8)

##### Let's visualize the data with a Heat Map.

In [42]:
restaurants, indian_restaurants, location_restaurants = get_restaurants(wiki_table_df['Latitude'], wiki_table_df['Longitude'])



restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

indian_latlons = [[res[2], res[3]] for res in indian_restaurants.values()]

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [43]:

map_toronto_restaurants = folium.Map(location=toronto_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_toronto_restaurants) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_toronto_restaurants)
folium.Marker(toronto_center).add_to(map_toronto_restaurants)
folium.Circle(toronto_center, radius=1000, fill=False, color='black').add_to(map_toronto_restaurants)
folium.Circle(toronto_center, radius=5000, fill=False, color='black').add_to(map_toronto_restaurants)
folium.Circle(toronto_center, radius=3000, fill=False, color='black').add_to(map_toronto_restaurants)
folium.GeoJson(toronto_boroughs, style_function=boroughs_style, name='geojson').add_to(map_toronto_restaurants)
map_toronto_restaurants

In [44]:
# from folium import plugins

map_toronto_indian_restaurants = folium.Map(location=toronto_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_toronto_indian_restaurants) #cartodbpositron cartodbdark_matter
HeatMap(indian_latlons).add_to(map_toronto_indian_restaurants)
folium.Marker(toronto_center).add_to(map_toronto_indian_restaurants)
folium.Circle(toronto_center, radius=1000, fill=False, color='black').add_to(map_toronto_indian_restaurants)
folium.Circle(toronto_center, radius=5000, fill=False, color='black').add_to(map_toronto_indian_restaurants)
folium.Circle(toronto_center, radius=3000, fill=False, color='black').add_to(map_toronto_indian_restaurants)
folium.GeoJson(toronto_boroughs, style_function=boroughs_style, name='geojson').add_to(map_toronto_indian_restaurants)
map_toronto_indian_restaurants

In [45]:
def get_direction(destination_x, origin_x, destination_y, origin_y):
    deltaX = destination_x - origin_x
    deltaY = destination_y - origin_y
    degrees_temp = math.atan2(deltaX, deltaY)/math.pi*180
    if degrees_temp < 0:
        degrees_final = 360 + degrees_temp
    else:
        degrees_final = degrees_temp

    compass_brackets = ["N", "NE", "E", "SE", "S", "SW", "W", "NW", "N"]
    compass_lookup = round(degrees_final / 45)

    return compass_brackets[compass_lookup], degrees_final
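
The direction lookup can be checked on a few offsets whose bearings are obvious. The sketch below restates the same idea in a slightly different form (it takes the offsets directly and uses a modulo instead of the `if`); `bearing_label` is a hypothetical name, and it assumes x grows eastward and y grows northward.

```python
import math

COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW", "N"]


def bearing_label(dx, dy):
    """Compass label and bearing in degrees clockwise from north for an offset (dx, dy)."""
    degrees = math.degrees(math.atan2(dx, dy)) % 360
    return COMPASS[round(degrees / 45)], degrees


north = bearing_label(0, 1)
north_east = bearing_label(1, 1)
south = bearing_label(0, -1)
west = bearing_label(-1, 0)
print(north[0], north_east[0], south[0], west[0])
```

Note that `get_direction` takes its arguments in the order `(destination_x, origin_x, destination_y, origin_y)`, which is easy to mix up. Python's `round` also rounds exact halves to the nearest even integer, so a bearing of exactly 22.5° is labelled "N" while 67.5° is labelled "E".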

In [46]:
direction_from_center = []
for borough in wiki_table_df.itertuples(index=False):
    x, y = (borough[5:7])
    direction = get_direction(x, toronto_X, y, toronto_Y)
    direction_from_center.append(direction[0])


In [47]:
# assign() returns a new DataFrame with the extra column, which avoids pandas' SettingWithCopyWarning on a filtered frame
wiki_table_df = wiki_table_df.assign(Direction_from_center=direction_from_center)


In [48]:
wiki_table_df.head()

| | Postcode | Neighborhood | Latitude | Longitude | Dist_from_center | X | Y | Distance_to_Indian_restaurant | Direction_from_center |
|---|---|---|---|---|---|---|---|---|---|
| 2 | M5A | Harbourfront | 43.65426 | -79.360636 | 2280.198876 | -5310700.0 | 10504830.0 | 1425.528596 | S |
| 3 | M5A | Regent Park | 43.65426 | -79.360636 | 2280.198876 | -5310700.0 | 10504830.0 | 1425.528596 | S |
| 6 | M7A | Queen's Park | 43.662301 | -79.389494 | 1576.342147 | -5309053.0 | 10508030.0 | 394.234936 | NE |
| 13 | M5B | Ryerson | 43.657162 | -79.378937 | 359.91694 | -5310006.0 | 10506900.0 | 277.852061 | SE |
| 14 | M5B | Garden District | 43.657162 | -79.378937 | 359.91694 | -5310006.0 | 10506900.0 | 277.852061 | SE |


In [49]:
wiki_table_df = wiki_table_df[wiki_table_df.Direction_from_center != 'SE']
wiki_table_df.shape

(65, 9)

In [50]:
wiki_table_df = wiki_table_df[wiki_table_df.Direction_from_center != 'NW']
wiki_table_df.shape

(55, 9)

In [51]:
wiki_table_df = wiki_table_df[wiki_table_df.Direction_from_center != 'N']
wiki_table_df.shape

(43, 9)
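
The three direction filters above can also be expressed as a single membership test. The sketch below shows the idea on a list of dictionaries; `keep_uncrowded` is a hypothetical helper, and the sample rows reuse Neighborhoods and directions from the table above. In pandas the same filter could be written with `~wiki_table_df['Direction_from_center'].isin(['SE', 'NW', 'N'])`.

```python
EXCLUDED_DIRECTIONS = {'SE', 'NW', 'N'}


def keep_uncrowded(rows, excluded=EXCLUDED_DIRECTIONS):
    """Keep rows whose direction from the center is not in the excluded set."""
    return [row for row in rows if row['Direction_from_center'] not in excluded]


sample = [
    {'Neighborhood': 'Harbourfront', 'Direction_from_center': 'S'},
    {'Neighborhood': 'Ryerson', 'Direction_from_center': 'SE'},
    {'Neighborhood': "Queen's Park", 'Direction_from_center': 'NE'},
]
kept = keep_uncrowded(sample)
print([row['Neighborhood'] for row in kept])
```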

##### Now I'll narrow down my search to the Neighborhoods within 5 km of the Toronto center.

In [52]:
wiki_table_df = wiki_table_df[wiki_table_df.Dist_from_center <= 5000]
wiki_table_df.shape

(24, 9)

##### Keeping only the Neighborhoods with at most 5 restaurants, which makes them more suitable for opening a new restaurant.

In [53]:
wiki_table_df = wiki_table_df[wiki_table_df.Restaurants_in_area <= 5]
wiki_table_df.shape

AttributeError: 'DataFrame' object has no attribute 'Restaurants_in_area'

##### After analyzing the data I'm left with 10 Neighborhoods, which become potential targets for stakeholders to open a new Indian restaurant. Let's view the Neighborhoods that will be presented to the stakeholder as the final suggestion.

In [54]:
wiki_table_df

| | Postcode | Neighborhood | Latitude | Longitude | Dist_from_center | X | Y | Distance_to_Indian_restaurant | Direction_from_center |
|---|---|---|---|---|---|---|---|---|---|
| 2 | M5A | Harbourfront | 43.65426 | -79.360636 | 2280.198876 | -5310700.0 | 10504830.0 | 1425.528596 | S |
| 3 | M5A | Regent Park | 43.65426 | -79.360636 | 2280.198876 | -5310700.0 | 10504830.0 | 1425.528596 | S |
| 6 | M7A | Queen's Park | 43.662301 | -79.389494 | 1576.342147 | -5309053.0 | 10508030.0 | 394.234936 | NE |
| 27 | M5C | St. James Town | 43.651494 | -79.375418 | 804.62411 | -5310952.0 | 10506590.0 | 384.476804 | SW |
| 37 | M5E | Berczy Park | 43.644771 | -79.373306 | 1840.664029 | -5312049.0 | 10506460.0 | 630.782814 | W |
| 41 | M5G | Central Bay Street | 43.657952 | -79.387383 | 948.365068 | -5309772.0 | 10507860.0 | 461.423226 | NE |
| 61 | M5J | Harbourfront East | 43.640816 | -79.381752 | 2296.169144 | -5312571.0 | 10507510.0 | 165.477218 | W |
| 62 | M5J | Toronto Islands | 43.640816 | -79.381752 | 2296.169144 | -5312571.0 | 10507510.0 | 165.477218 | W |
| 63 | M5J | Union Station | 43.640816 | -79.381752 | 2296.169144 | -5312571.0 | 10507510.0 | 165.477218 | W |
| 74 | M5K | Design Exchange | 43.647177 | -79.381576 | 1280.549005 | -5311561.0 | 10507380.0 | 1156.059507 | W |


### Insights and conclusion  <a name="conclusion"></a>

From the above analysis, of the **211 Neighborhoods** in the city of **Toronto**, **74 Neighborhoods** lie within 10 km of the city center, which I assume makes them among the most visited Neighborhoods in the city.

After visualizing these Neighborhoods with a Heat Map, it can be observed that many restaurants are located to the North-West, South-East and North of the Toronto center. Our target Neighborhoods are therefore those that are not densely packed with restaurants. Omitting the Neighborhoods in those three directions leaves **43 Neighborhoods**.

The analysis can be made more precise by considering only the Neighborhoods within 5 km of the Toronto center, which brings the count down to **24 Neighborhoods**.

Among these 24 Neighborhoods, there are **10 Neighborhoods** with at most 5 restaurants, which means people living there have to travel a little farther to eat out. These become potential locations for a new restaurant, since customers have few nearby alternatives and the stakeholder would face the least competition.