# Capstone Project - The Battle of Neighborhoods (Week 1)

## 1. Introduction/Business Problem

This section deals with the introduction to the business problem.

<b>Problem statement:</b> Toronto is Canada's largest city, the fourth largest in North America, and home to people from diverse ethnic and cultural backgrounds. Over the years, Canada has seen increasing immigration from across the globe, especially from Asian countries such as India and China, which account for the largest share of new arrivals.

<img src="https://www.canada.ca/content/dam/ircc/images/corporate/publications-manuals/annual-report-2018/en/chart-16-en.png">

The diagram above shows admissions of permanent residents to Canada in 2017 for the top 10 source countries (Source: Canada.ca). Asian countries (China, India and Pakistan) are clearly among the top countries of origin. Given the growing Asian population in Canada and the popularity of Chinese cuisine across Asia and around the world, this project explores where to open a Chinese restaurant in Toronto, taking into account factors such as proximity to tourist attractions.

In this project we will identify ideal locations for opening a Chinese restaurant in the Canadian city of Toronto. Tripadvisor ranked <b> CN Tower </b> third among the top tourist attractions and first among the important landmarks in Canada. Our focus will therefore be on locations near CN Tower that can be reached quickly and easily from the landmark.

We will also identify and visualize suitable <b>neighborhoods</b> for <b> targeted marketing and initial promotion activities </b> around the restaurant launch (near each of the candidate locations).

The selection of locations should also consider the <b> proximity to other places of interest, such as the Royal Ontario Museum</b>, another important landmark, in order to maximize tourist footfall.

## 2. Data

The problem statement above indicates which data sets we need to solve the business problem. For this assignment we need the following data from various sources:

<ul style="list-style-type:square;">
<li>Data on all the neighborhoods in Toronto</li>
  <li>Data on all restaurants near places of interest such as <b> CN Tower and Royal Ontario Museum</b></li>
  <li>Data on all Chinese restaurants near places of interest such as <b> CN Tower and Royal Ontario Museum </b></li>
</ul>

To meet these data requirements we will use the following data sources:

<ul style="list-style-type:square;">
<li><a href="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M">Wikipedia for retrieving data on neighborhoods in Toronto</a></li>
  <li>Foursquare for retrieving data on <b>all restaurants</b> near places of interest such as <b> CN Tower and Royal Ontario Museum</b></li>
  <li>Foursquare for retrieving data on <b>all Chinese restaurants</b> near places of interest such as <b> CN Tower and Royal Ontario Museum </b></li>
</ul>

### 2.1 Retrieving data of neighborhoods in Toronto

#### Let's retrieve the data of all neighborhoods in Toronto from Wikipedia using the Python package <b> Beautiful Soup </b> for HTML parsing

First we will install and import the libraries required at this stage, including the Beautiful Soup package.

In [18]:
import pandas as pd
import numpy as np

import requests

from bs4 import BeautifulSoup
from pandas import json_normalize  # public location since pandas 1.0

!pip install geopy
from geopy.geocoders import Nominatim 

import matplotlib.cm as cm
import matplotlib.colors as colors


!pip install scikit-learn
from sklearn.cluster import KMeans

!pip install folium
import folium 



We are good to go!

Now we will use the Beautiful Soup library to parse the data of <b> neighborhoods in Toronto from Wikipedia </b>

In [19]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(source, 'html5lib')

postal_codes_dict = {} # initialize an empty dictionary to save the data in
for table_cell in soup.find_all('td'):
    try:
        postal_code = table_cell.p.b.text # get the postal code
        neighborhoods_data = table_cell.span.text # get the rest of the data in the cell
        borough = neighborhoods_data.split('(')[0] # get the borough in the cell
        
        # if the cell is not assigned then ignore it
        if neighborhoods_data == 'Not assigned':
            neighborhoods = []
        # else process the data and add it to the dictionary
        else:
            postal_codes_dict[postal_code] = {}
            
            try:
                neighborhoods = neighborhoods_data.split('(')[1]
            
                # remove parentheses from neighborhoods string
                neighborhoods = neighborhoods.replace('(', ' ')
                neighborhoods = neighborhoods.replace(')', ' ')

                neighborhoods_names = neighborhoods.split('/')
                neighborhoods_clean = ', '.join([name.strip() for name in neighborhoods_names])
            except IndexError:  # no parentheses in the cell: the borough doubles as the neighborhood
                borough = borough.strip('\n')
                neighborhoods_clean = borough
 
            # add borough and neighborhood to dictionary
            postal_codes_dict[postal_code]['borough'] = borough
            postal_codes_dict[postal_code]['neighborhoods'] = neighborhoods_clean
    except AttributeError:  # cell lacks the expected <p>/<b> or <span> markup
        pass
    
# build the dataframe from the dictionary in a single step
# (DataFrame.append was removed in pandas 2.0)
columns = ['PostalCode', 'Borough', 'Neighborhood']
rows = [{"PostalCode": postal_code,
         "Borough": data['borough'],
         "Neighborhood": data['neighborhoods']}
        for postal_code, data in postal_codes_dict.items()]
toronto_data = pd.DataFrame(rows, columns=columns)

# print number of rows of dataframe
toronto_data.shape[0]

103
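The per-cell parsing rule used above can be isolated and checked without fetching the page. The following is a minimal sketch, not the notebook's exact code: `parse_cell` is a hypothetical helper, the sample strings are assumed cell formats inferred from the parsing logic, and unlike the loop above it also strips whitespace from the borough.

```python
def parse_cell(text):
    """Split one table-cell string into (borough, neighborhoods), or None if unassigned."""
    text = text.strip()
    if text == 'Not assigned':
        return None
    if '(' not in text:
        # No parentheses: the notebook reuses the borough as the neighborhood
        return text, text
    borough, rest = text.split('(', 1)
    # Drop any remaining parentheses, then split the neighborhood names on '/'
    names = rest.replace('(', ' ').replace(')', ' ').split('/')
    return borough.strip(), ', '.join(name.strip() for name in names)


# Assumed sample inputs, for illustration only
print(parse_cell('North York(Parkwoods)'))
print(parse_cell('Downtown Toronto(Regent Park / Harbourfront)'))
print(parse_cell('Not assigned'))
```

Keeping the rule in a small function like this makes it easier to see which cells fall into the "no parentheses" branch before the results reach the dataframe.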

Let's check what the data frame looks like.

In [20]:
toronto_data

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park / Ontario Provincial Government,Queen's Park / Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


Now we will use open data from another source to get the geographical coordinates of each of the above neighborhoods and load them into another data frame. This will be useful for visualizations in later stages.

In [21]:
# load geographical coordinates
neighborhoods_geo = pd.read_csv("http://cocl.us/Geospatial_data", sep=",")

# rename column 'Postal Code' -> 'PostalCode' for the merge
neighborhoods_geo.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
neighborhoods_geo

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


<b> Let's merge both the dataframes together </b>

In [22]:
toronto_data = pd.merge(toronto_data, neighborhoods_geo, how="left", on="PostalCode")
toronto_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park / Ontario Provincial Government,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
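A left merge silently leaves `NaN` coordinates for any postal code missing from the second table. As a hedged sketch, the example below uses three rows copied from the tables above and the `validate` and `indicator` options of `pd.merge` to make such gaps visible; the small frames and variable names are illustrative, not part of the notebook.

```python
import pandas as pd

# Three rows taken from the tables above, for illustration only
neighborhoods = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
coordinates = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M5A'],
    'Latitude': [43.753259, 43.725882, 43.654260],
    'Longitude': [-79.329656, -79.315572, -79.360636],
})

# validate raises if a postal code is duplicated on either side;
# indicator adds a '_merge' column showing where each row was found
merged = pd.merge(neighborhoods, coordinates, how='left', on='PostalCode',
                  validate='one_to_one', indicator=True)
unmatched = merged[merged['_merge'] != 'both']
print(len(merged), 'rows,', len(unmatched), 'without coordinates')

merged = merged.drop(columns='_merge')
```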


In [23]:
# toronto_data already contains the coordinates from the merge above, so
# filter it directly instead of merging a second time
toronto_data1 = toronto_data[toronto_data['Borough'].str.contains('Toronto')].reset_index(drop=True)
toronto_data1

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
8,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259
9,M4J,East YorkEast Toronto,The Danforth East,43.685347,-79.338106


In [24]:
print ('The dataset has {} boroughs and {} neighbourhoods'.format(len(toronto_data['Borough'].unique()), toronto_data.shape[0]))

The dataset has 15 boroughs and 103 neighbourhoods


#### Now we will visualize the data we have using the folium library

In [25]:
from pandas import json_normalize

#!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
#!pip install scikit-learn
from sklearn.cluster import KMeans

#!pip install folium
import folium # map rendering library



#### First we will get the co-ordinates of Toronto using Geopy

In [26]:
my_address_code = 'Toronto, Canada'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(my_address_code)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Canada are 43.653963, -79.387207.


#### Now we will visualize the neighborhoods in folium map and add some cool pop-up markers from Font Awesome libraries

In [27]:
map_nbr = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    # label each marker with its neighborhood and borough
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        location=[lat, lng],
        popup=label,
        icon=folium.Icon(color="orange", icon="fa-check", prefix='fa')
    ).add_to(map_nbr)
    
    #folium.CircleMarker(
        #[lat, lng],
        #radius=10,
        #popup=label,
        #color='blue',
        #fill=True,
        #fill_color='green',
        #fill_opacity=.1,
        #parse_html=False).add_to(map_toronto)  
    
map_nbr

<b> Lovely! Now we have the neighborhood data for Toronto. Next we will retrieve data for possible locations around CN Tower. </b>

Our area of interest is centred on CN Tower, so first we will get the coordinates of CN Tower using the Geopy geocoder.

### 2.2 Candidates for suitable locations

In [28]:

address = 'CN Tower,Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of CN Tower are {}, {}.'.format(latitude, longitude))

The geographical coordinates of CN Tower are 43.6425637, -79.38708718320467.


In [29]:
toronto_center = location.latitude, location.longitude

#### Now let's create a grid of circular candidate areas around CN Tower. The centres lie within 6 km of CN Tower and each area has a radius of 300 metres

In [30]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

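# NOTE: zone=33 is the UTM zone for central Europe; Toronto lies in UTM zone 17.
# The outputs in this notebook were produced with zone=33, so the projected
# X/Y values and distances are not true metres on the ground.
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]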

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Toronto center longitude={}, latitude={}'.format(toronto_center[1], toronto_center[0]))
x, y = lonlat_to_xy(toronto_center[1], toronto_center[0])
print('Toronto center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Toronto center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Toronto center longitude=-79.38708718320467, latitude=43.6425637
Toronto center UTM X=-5312224.210116627, Y=10508095.011275409
Toronto center longitude=-79.38708718320515, latitude=43.642563699999755


In [31]:
toronto_center_x, toronto_center_y = lonlat_to_xy(toronto_center[1], toronto_center[0]) 

k = math.sqrt(3) / 2 
x_min = toronto_center_x - 6000
x_step = 600
y_min = toronto_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(toronto_center_x, toronto_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
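The grid logic above does not depend on the projection, so it can be reproduced in a plain local plane with the centre at the origin. The sketch below is a restatement under that assumption: `hex_grid` is a hypothetical helper, and the 1-unit tolerance on the radius mirrors the `<= 6001` test in the cell above.

```python
import math

def hex_grid(center_x, center_y, radius=6000.0, spacing=600.0):
    """Return the staggered grid points that lie within `radius` of the centre."""
    k = math.sqrt(3) / 2                     # row height factor of a hexagonal grid
    n_cols = int(2 * radius / spacing) + 1   # 21 columns for the defaults
    n_rows = int(n_cols / k)                 # 24 rows for the defaults
    x_min = center_x - radius
    # Centre the rows vertically around the centre point
    y_min = center_y - radius - (n_rows * k * spacing - 2 * radius) / 2
    points = []
    for i in range(n_rows):
        y = y_min + i * spacing * k
        x_offset = spacing / 2 if i % 2 == 0 else 0.0   # shift every other row
        for j in range(n_cols):
            x = x_min + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= radius + 1:
                points.append((x, y))
    return points

points = hex_grid(0.0, 0.0)
print(len(points), 'candidate centres')
```

With the default arguments this yields the same 364 candidates as the cell above, and each point sits `spacing` units from its nearest neighbours, which is why circles of radius 300 just touch.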


#### Let's now visualize the candidate centres and add a folium popup marker for CN Tower in Toronto

In [32]:
map_toronto = folium.Map(location=toronto_center, zoom_start=13)
folium.Marker(toronto_center, popup='CN Tower').add_to(map_toronto)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_toronto)
map_toronto

#### Now we will look up the address of each of these locations and put the results into a dataframe

First we will use reverse geocoding to get the address of CN Tower

In [33]:

addr = geolocator.reverse(toronto_center)
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(toronto_center[0], toronto_center[1], addr))

Reverse geocoding check
-----------------------
Address of [43.6425637, -79.38708718320467] is: CN Tower, 301, Front Street West, Entertainment District, Spadina—Fort York, Old Toronto, Toronto, Golden Horseshoe, Ontario, M5V 2X3, Canada


##### Now we will retrieve the address of each of the location candidates

In [34]:
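print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    # reverse() expects the coordinates as a single (lat, lon) query
    address = geolocator.reverse((lat, lon), timeout=10)
    if address is None:
        address = 'NO ADDRESS'
    # append this location's address, not the CN Tower address held in `addr`
    addresses.append(address)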

Obtaining location addresses: 
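The address lookup is easy to get subtly wrong, for example by appending a stale variable on every iteration. A minimal sketch of the loop with the lookup passed in as a function is shown below, so it can be exercised without network access; `reverse_all` and `fake_reverse` are hypothetical names introduced only for this illustration.

```python
def reverse_all(coords, reverse_fn, placeholder='NO ADDRESS'):
    """Return one address string per (lat, lon) pair, in input order."""
    addresses = []
    for lat, lon in coords:
        result = reverse_fn((lat, lon))   # one query per coordinate pair
        addresses.append(placeholder if result is None else str(result))
    return addresses


# Stand-in lookup used only for this demonstration
def fake_reverse(query):
    known = {(43.6425637, -79.3870872): 'CN Tower, 301, Front Street West'}
    return known.get(query)


print(reverse_all([(43.6425637, -79.3870872), (0.0, 0.0)], fake_reverse))
```

With geopy, `reverse_fn` could be something like `lambda q: geolocator.reverse(q, timeout=10)`. The public Nominatim service asks for at most one request per second, and geopy's `geopy.extra.rate_limiter.RateLimiter` is one way to respect that across several hundred lookups.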

In [35]:
df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center
0,"(CN Tower, 301, Front Street West, Entertainme...",43.635297,-79.336538,-5314024.0,10502380.0,5992.495307
1,"(CN Tower, 301, Front Street West, Entertainme...",43.639021,-79.3371,-5313424.0,10502380.0,5840.3767
2,"(CN Tower, 301, Front Street West, Entertainme...",43.642746,-79.337661,-5312824.0,10502380.0,5747.173218
3,"(CN Tower, 301, Front Street West, Entertainme...",43.646471,-79.338223,-5312224.0,10502380.0,5715.767665
4,"(CN Tower, 301, Front Street West, Entertainme...",43.650197,-79.338785,-5311624.0,10502380.0,5747.173218
5,"(CN Tower, 301, Front Street West, Entertainme...",43.653922,-79.339346,-5311024.0,10502380.0,5840.3767
6,"(CN Tower, 301, Front Street West, Entertainme...",43.657648,-79.339908,-5310424.0,10502380.0,5992.495307
7,"(CN Tower, 301, Front Street West, Entertainme...",43.629357,-79.340136,-5314924.0,10502900.0,5855.766389
8,"(CN Tower, 301, Front Street West, Entertainme...",43.633081,-79.340698,-5314324.0,10502900.0,5604.462508
9,"(CN Tower, 301, Front Street West, Entertainme...",43.636806,-79.34126,-5313724.0,10502900.0,5408.326913
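One way to sanity-check the "Distance from center" column is to compare it with a great-circle distance computed directly from latitude and longitude. The sketch below uses the haversine formula on a spherical Earth (mean radius 6,371 km, an assumption) for row 3 of the table above; `haversine_m` is a hypothetical helper, not part of the notebook.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

cn_tower = (43.6425637, -79.38708718320467)
candidate = (43.646471, -79.338223)   # row 3 of the table above
great_circle = haversine_m(*cn_tower, *candidate)
print('Great-circle distance: {:.0f} m'.format(great_circle))
print('Projected distance from the table: 5716 m')
```

The great-circle figure comes out at roughly 4 km, well below the 5,716 reported in the table. That gap suggests the projection stretches distances here, which is consistent with the helper functions using UTM zone 33 while Toronto lies in zone 17. Switching the `zone` argument to 17 would likely bring the two figures much closer, although every downstream output would then need to be regenerated.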


#### Looking great. Now we have the locations of the candidate centres and their distances from CN Tower. We will now save this data to a local file

In [37]:
df_locations.to_pickle('./locations.pkl')

#### So far we have collected data for

1. Neighborhoods in Toronto
2. Location candidates for the restaurant around CN Tower

#### Now we will collect data on all restaurants and on Chinese restaurants using Foursquare

### 2.3 Restaurant Data

In [39]:
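import os

# Read the Foursquare credentials from the environment instead of hard-coding them
# (the environment variable names are illustrative)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'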

In [40]:
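# Category IDs for Chinese restaurants, as taken from the Foursquare web site (https://developer.foursquare.com/docs/resources/categories).
# Caution: the sample output later in this section lists Italian venues as Chinese, which suggests
# these IDs may match Italian categories; check them against the category list.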

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

chinese_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues
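Building the request URL with string formatting works, but it leaves escaping to chance. A small sketch using `urllib.parse.urlencode` with the same query parameters as `get_venues_near_location` is shown below; `build_explore_url` and the `DEMO_*` credentials are placeholders introduced for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lon, category, client_id, client_secret,
                      version='20180724', radius=500, limit=100):
    """Assemble the explore URL, letting urlencode handle the escaping."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials; the category ID is the food root category from above
url = build_explore_url(43.6425637, -79.3870872, '4d4b7105d754a06374d81259',
                        'DEMO_ID', 'DEMO_SECRET', radius=350)
print(parse_qs(urlsplit(url).query)['ll'])
```

The same dictionary could also be passed to `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.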

In [41]:
# reuse the credentials defined earlier instead of repeating them
foursquare_client_id = CLIENT_ID
foursquare_client_secret = CLIENT_SECRET

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    chinese_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, foursquare_client_id, foursquare_client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_chinese = is_restaurant(venue_categories, specific_filter=chinese_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_chinese, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_chinese:
                    chinese_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, chinese_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
chinese_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('chinese_restaurants_350.pkl', 'rb') as f:
        chinese_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except:
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, chinese_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('chinese_restaurants_350.pkl', 'wb') as f:
        pickle.dump(chinese_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
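The load-or-fetch pattern in the cell above can be factored into a small helper so that each pickle file is handled the same way. This is a sketch under assumptions, not the notebook's code: `load_or_compute` and `expensive` are hypothetical names, and a temporary directory stands in for the working directory.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the object pickled at `path`, computing and saving it on a cache miss."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        result = compute()
        with open(path, 'wb') as f:
            pickle.dump(result, f)
        return result


calls = []

def expensive():
    """Stand-in for the Foursquare download; records how often it runs."""
    calls.append(1)
    return {'venue-1': 'Example Diner'}


cache_path = os.path.join(tempfile.mkdtemp(), 'restaurants_demo.pkl')
first = load_or_compute(cache_path, expensive)    # computes and writes the file
second = load_or_compute(cache_path, expensive)   # reads the file back
print(first == second, len(calls))
```

Catching only the cache-miss exceptions, rather than a bare `except`, keeps unrelated errors visible while still falling back to the API when the file is absent or unreadable.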


In [42]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Chinese restaurants:', len(chinese_restaurants))
print('Percentage of Chinese restaurants: {:.2f}%'.format(len(chinese_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 1135
Total number of Chinese restaurants: 104
Percentage of Chinese restaurants: 9.16%
Average number of restaurants in neighborhood: 5.035714285714286
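These totals rely on the dictionaries being keyed by venue ID, so a venue returned by two overlapping search circles is counted once. A toy illustration with invented venues follows; all names and IDs in it are hypothetical.

```python
# Hypothetical results from two overlapping search circles: (venue_id, name, is_chinese)
area_a = [('v1', 'Noodle House', True), ('v2', 'Corner Diner', False)]
area_b = [('v2', 'Corner Diner', False), ('v3', 'Harbour Grill', False)]

unique_venues = {}
unique_chinese = {}
for area in (area_a, area_b):
    for venue in area:
        venue_id, _, is_chinese = venue
        unique_venues[venue_id] = venue      # a repeated ID overwrites, never duplicates
        if is_chinese:
            unique_chinese[venue_id] = venue

share = len(unique_chinese) / len(unique_venues) * 100
print(len(unique_venues), len(unique_chinese), '{:.2f}%'.format(share))
```

The per-location lists in `location_restaurants` are not deduplicated in this way, which is intended: a venue near the edge of two circles belongs to both neighborhoods.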


<b>Now we will print the first few restaurants from the collected data</b>

In [43]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4f6d1982e4b062083139cc8a', 'Village of Thai Peterborough', 43.654336, -79.339295, 'Peterborough, Canada', 46, False, -5310958.9956449745, 10502366.095392086)
('539a1eff498e1bf8518e4eb5', 'Food Dudes Pantry', 43.65630014845779, -79.3382027627999, '24 Carlaw Ave (at Lake Shore Blvd E), Toronto ON M4M 2R7, Canada', 280, False, -5310660.2536905715, 10502205.67661057)
('4bc1df994cdfc9b6a3229521', "Gale's Snack Bar", 43.658239, -79.339077, '539 Eastern Ave (Carlaw Ave), Toronto ON, Canada', 93, False, -5310340.672131852, 10502272.900043106)
('4ce55a695bf68cfa81633c17', 'The Logan Lodge', 43.65849357683422, -79.34146459721917, '167 Logan Ave, Toronto ON, Canada', 156, False, -5310269.953141994, 10502544.29399832)
('519d414a498e6cc21f677ff4', 'Tabule', 43.65973096534363, -79.3463408134466, '810 Queen St East, Toronto ON M4M 1H7, Canada', 264, False, -5310011.315256245, 10503086.021006016)
('5ba55b9ae0c0c9002c29d056', 'EAT BKK Thai Kitchen', 43.

In [44]:
print('List of Chinese restaurants')
print('---------------------------')
for r in list(chinese_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(chinese_restaurants))

List of Chinese restaurants
---------------------------
('4ad8b99df964a5200b1421e3', "Lil' Baci", 43.6605116572649, -79.34315072724176, '892 Queen St. E (at Logan Ave.), Toronto ON M4M 1J3, Canada', 311, True, -5309927.496701752, 10502703.891979454)
('5ad12ca2345cbe490eb125e5', 'Frankie’s Italian', 43.660411, -79.343097, '892 Queen Street West, Toronto ON M4M 1J3, Canada', 323, True, -5309944.192957075, 10502699.44081489)
('579f0363498ed2eec32ab828', 'Il Ponte Cucina Italiana', 43.65785164410602, -79.35296058654784, '625 Queen St East, Toronto ON M4M 1G7, Canada', 80, True, -5310226.408227068, 10503883.560613222)
('4ac3e6cef964a520629d20e3', 'Archeo', 43.65066723014277, -79.35943064816142, '31 Trinity St., Toronto ON M5A 3C4, Canada', 154, True, -5311287.439623134, 10504756.758828964)
('5d4f546e0e971c0007dd24ff', 'Vicino Italian Kitchen', 43.659291, -79.360015, '146 Sumach St (St David St.), Toronto ON M5A 0P7, Canada', 160, True, -5309907.803621248, 10504673.294441462)
('4dcc6f4445dd8

In [45]:
print('Restaurants around location')
print('---------------------------')
for i in range(100, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 101: 
Restaurants around location 102: 
Restaurants around location 103: 
Restaurants around location 104: 
Restaurants around location 105: Tides
Restaurants around location 106: 
Restaurants around location 107: Market Street Catch, Buster's Sea Cove, Paddington's Pump, European Delight, True True Diner, Bombay Palace, St. Lawrence Pizza and Pasta, Ardo
Restaurants around location 108: Mystic Muffin, The George Street Diner, Fusaro's, Schnitzel Queen, GEORGE Restaurant, Ueno Sushi Deli Cafe, Olympos / crow-bar
Restaurants around location 109: King's Place647-352-0786, Citrus, Casey's Outdoor Patio
Restaurants around location 110: King's Place647-352-0786


#### As a last step in the data section we will visualize the restaurants on a folium map and highlight the Chinese restaurants

In [46]:
map_toronto = folium.Map(location=toronto_center, zoom_start=13)
folium.Marker(toronto_center, popup='CN Tower').add_to(map_toronto)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_chinese = res[6]
    color = 'red' if is_chinese else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_toronto)
map_toronto

#### Good! 

## Now we have the following data sets

1. Toronto neighborhood data
2. Candidate locations around CN Tower
3. Restaurant data for all restaurants and for Chinese restaurants

In the coming week we will use this data to identify suitable locations for starting a Chinese restaurant around CN Tower and other places of interest.