
# Soroush Rasti
### 27 May, 2020
# Which part of Toronto is most suitable for opening a restaurant?


# Introduction

Many people would like to start a business in a well-chosen location in a big city such as Toronto. Many of them fail, however, simply because they lack accurate and detailed information about the population, the cultural composition and many other characteristics of the city.

People who want to open a restaurant might also be reluctant to do so because of the competition. This project is meant to help business owners who are planning to open a restaurant. The key question is which neighbourhood is suitable for the specific style of restaurant that the owner is going to open. To answer it, several factors should be taken into account.

1- Which neighbourhoods of Toronto have the most restaurants of that specific style? This tells the business owner which areas have the highest competition and should be avoided.

2- Which styles do the restaurants in each neighbourhood belong to? This helps the business owner decide on the best area for her/his restaurant. For example, if he/she wants to open a Mexican restaurant, it is better to open it in a neighbourhood with fewer Mexican restaurants.

3- Which kinds of buildings dominate the neighbourhood? Are there many commercial, educational or amusement venues around? Areas with these types of venues can help attract more customers and make for an ideal location.

4- The culture of the neighbourhood is also very important. Some places have a larger Mexican population and a stronger Mexican culture, which makes an important difference when deciding whether to open a Mexican restaurant in that area.

By considering all of these factors, the business owner should get a good idea of which area would be the most suitable for the restaurant under consideration.

The audience of this report is anyone who would like to open a restaurant in Toronto.


#  Description and Data


First, I get the list of neighbourhoods in Toronto from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M. 
Transforming this table into a data frame makes the analysis a lot easier.
The Foursquare API is used to acquire the venues belonging to each postal code and neighbourhood of Toronto. The Foursquare API also provides location data, such as the latitude and longitude of each venue, so the location of each venue can easily be shown on a map.

After that, with data analysis packages such as Pandas, I sort the lists of venues and restaurants and give some basic information about the frequency of each of them in the city of Toronto. Using these frequencies, I divide the city into five zones based on how common each venue and restaurant category is. 
The data sorted by restaurant type show the prominent style in each neighbourhood. Information such as venue type and venue frequency is used to judge whether the venues nearby are likely to attract customers.
You can then easily see what percentage of each venue category belongs to which city zone, and from that judge which neighbourhoods have a higher potential for attracting customers. This should help you decide wisely where to start your restaurant. 
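
As a minimal sketch of the counting idea described above, the snippet below counts how many venues of one category each neighbourhood has and lists the least crowded neighbourhoods first. The tiny `venues` table and the helper name `count_category` are made up for illustration; only the column names mirror the ones used later in this report.

```python
import pandas as pd

# Hypothetical sample data; the real table comes from the Foursquare API.
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Venue Category': ['Mexican Restaurant', 'Coffee Shop', 'Mexican Restaurant',
                       'Mexican Restaurant', 'Park', 'Coffee Shop'],
})

def count_category(venues, category):
    """Number of venues of `category` per neighbourhood, fewest first."""
    is_match = venues['Venue Category'] == category
    # Summing booleans per neighbourhood keeps neighbourhoods with zero matches.
    counts = is_match.groupby(venues['Neighbourhood']).sum().astype(int)
    return counts.sort_values()

mexican_counts = count_category(venues, 'Mexican Restaurant')
print(mexican_counts)
```

A low count only indicates low direct competition; it says nothing about demand, which is why the other factors from the introduction still matter.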

#  Methodology



### In this part, all the necessary data are loaded from the website and the Foursquare API

In [29]:
import requests
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

import geocoder
from IPython.display import Image

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json
import time
import folium
from pandas.io.json import json_normalize
from sklearn.cluster import KMeans


In [30]:
df = pd.read_csv('toronto_geo.csv')
df = df[df['PostalCode'] != 'M7R']
df = df.reset_index(drop=True)

borough = df.loc[0, 'Borough']
latitude = df.loc[0, 'Latitude']
longitude = df.loc[0, 'Longitude']
NEAR = df.loc[0, 'PostalCode']
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
def getNearbyVenues(borough, NEAR, Neighbourhood):
    
    venues_list=[]
    for borough, Neighbourhood, NEAR in zip(borough, Neighbourhood, NEAR):
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            NEAR,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()
        results = results['response']['groups'][0]['items']
#        time.sleep(1)
        
        venues_list.append([(
            NEAR,
            borough, 
            Neighbourhood,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        print('Finished parsing: ',Neighbourhood)


    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code',
                             'Borough',
                             'Neighbourhood', 
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return(nearby_venues)    
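
The list comprehension in `getNearbyVenues` assumes that every venue has at least one category. A hedged sketch of a more defensive parser is shown below; it works on an already decoded response, so it can be tried without API credentials. The function name `parse_venue_items` and the sample `fake_items` are hypothetical, and the dictionary layout is only assumed to match what `getNearbyVenues` reads.

```python
def parse_venue_items(items):
    """Turn Foursquare-style 'items' into (name, lat, lng, category) tuples.

    Venues without a category get None instead of raising an IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Made-up items in the same shape as results['response']['groups'][0]['items']
fake_items = [
    {'venue': {'name': 'Cafe X', 'location': {'lat': 43.65, 'lng': -79.38},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 43.66, 'lng': -79.39},
               'categories': []}},
]
rows = parse_venue_items(fake_items)
print(rows)
```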



In [31]:
# For privacy, CLIENT_ID and CLIENT_SECRET are not shown
CLIENT_ID = ''
CLIENT_SECRET = ''
VERSION = '20180604'

borough = df['Borough']
NEAR = df['PostalCode']
Neighbourhood = df['Neighbourhood']
LIMIT=500
toronto_venues = getNearbyVenues(borough=borough, NEAR=NEAR, Neighbourhood=Neighbourhood)


('Finished parsing: ', 'Rouge, Malvern')
('Finished parsing: ', 'Highland Creek, Rouge Hill, Port Union')
('Finished parsing: ', 'Guildwood, Morningside, West Hill')
('Finished parsing: ', 'Woburn')
('Finished parsing: ', 'Cedarbrae')
('Finished parsing: ', 'Scarborough Village')
('Finished parsing: ', 'East Birchmount Park, Ionview, Kennedy Park')
('Finished parsing: ', 'Clairlea, Golden Mile, Oakridge')
('Finished parsing: ', 'Cliffcrest, Cliffside, Scarborough Village West')
('Finished parsing: ', 'Birch Cliff, Cliffside West')
('Finished parsing: ', 'Dorset Park, Scarborough Town Centre, Wexford Heights')
('Finished parsing: ', 'Maryvale, Wexford')
('Finished parsing: ', 'Agincourt')
('Finished parsing: ', "Clarks Corners, Sullivan, Tam O'Shanter")
('Finished parsing: ', "Agincourt North, L'Amoreaux East, Milliken, Steeles East")
('Finished parsing: ', "L'Amoreaux West")
('Finished parsing: ', 'Upper Rouge')
('Finished parsing: ', 'Hillcrest Village')
('Finished parsing: ', 'Fairvi

### In this part the analysis starts

In [32]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
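
To make the step above concrete, here is a small sketch on made-up data (the `toy` table is hypothetical). It shows that averaging the one-hot columns per neighbourhood yields the share of each venue category in that neighbourhood.

```python
import pandas as pd

# Hypothetical venues: neighbourhood 'A' has 4 venues, 'B' has 2.
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Bank', 'Park', 'Park'],
})

# One column per category, 1.0 where the venue belongs to it, 0.0 elsewhere.
onehot = pd.get_dummies(toy[['Venue Category']], prefix="", prefix_sep="").astype(float)
onehot['Neighbourhood'] = toy['Neighbourhood']

# The mean of a 0/1 column is the fraction of venues in that category.
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, so the values can be read directly as proportions: half of the venues in 'A' are cafés, and every venue in 'B' is a park.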


In [33]:
### The table below shows the average frequency of each venue category in every neighbourhood

In [34]:
toronto_grouped.head()

Unnamed: 0,Neighbourhood,ATM,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Big Box Store,Bike Shop,Bistro,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Business Service,Butcher,Café,Camera Store,Campground,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Carpet Store,Casino,Castle,Caucasian Restaurant,Cemetery,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Quad,College Rec Center,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free 
Restaurant,Golf Course,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hong Kong Restaurant,Hookah Bar,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Leather Goods Store,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,National Park,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoor Supply Store,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pide Place,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Pub,Racecourse,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,Rock Climbing Spot,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Seafood Restaurant,Shoe Repair,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Southern / Soul Food 
Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,Stationery Store,Steakhouse,Street Art,Supermarket,Sushi Restaurant,Syrian Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.14,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.010526,0.0,0.010526,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042105,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.021053,0.0,0.0,0.0,0.0,0.0,0.0,0.031579,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.136842,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.0,0.0,0.0,0.010526,0.021053,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031579,0.0,0.010526,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.010526,0.010526,0.0,0.0,0.0,0.010526,0.0,0.042105,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042105,0.0,0.0,0.021053,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.0,0.0,0.0,0.0,0.0,0.042105,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.010526,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.010526,0.021053,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.04,0.03,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0


In [35]:
def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighbourhood name) and rank the category frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
    
num_top_venues = 10

#indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
        columns.append('Most Common {}'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)
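
The ranking step can be tried on a toy frequency table. This is a sketch under assumptions: the `freq` values and the helper name `top_categories` are invented for illustration, and the logic simply mirrors `return_most_common_venues`.

```python
import pandas as pd

# Hypothetical frequency table in the same layout as toronto_grouped.
freq = pd.DataFrame({
    'Neighbourhood': ['A', 'B'],
    'Bank': [0.2, 0.0],
    'Café': [0.5, 0.1],
    'Park': [0.3, 0.9],
})

def top_categories(row, n):
    """Names of the n categories with the highest frequency in one row."""
    values = row.iloc[1:].astype(float)  # drop the neighbourhood name
    return values.sort_values(ascending=False).index[:n].tolist()

top2 = {row['Neighbourhood']: top_categories(row, 2) for _, row in freq.iterrows()}
print(top2)
```

Note that categories with equal frequency are ordered arbitrarily by the default sort, so when a neighbourhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with zero-frequency categories.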
     

### Below you can see the most common venues in each neighbourhood

In [36]:
neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,Most Common 1,Most Common 2,Most Common 3,Most Common 4,Most Common 5,Most Common 6,Most Common 7,Most Common 8,Most Common 9,Most Common 10
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Restaurant,Hotel,Burger Joint,Gastropub,Sushi Restaurant,Bar,Asian Restaurant
1,Agincourt,Chinese Restaurant,Coffee Shop,Supermarket,Indian Restaurant,Caribbean Restaurant,Restaurant,Pharmacy,Bookstore,Fast Food Restaurant,Breakfast Spot
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Chinese Restaurant,Vietnamese Restaurant,Bakery,Sandwich Place,Bubble Tea Shop,Supermarket,Indian Restaurant,Tea Room,Dessert Shop,Noodle House
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Coffee Shop,Fast Food Restaurant,Italian Restaurant,Asian Restaurant,Pharmacy,Sandwich Place,Grocery Store,Chinese Restaurant,Caribbean Restaurant,Fried Chicken Joint
4,"Alderwood, Long Branch",Burger Joint,Coffee Shop,Furniture / Home Store,Grocery Store,Middle Eastern Restaurant,Café,Park,Bakery,Seafood Restaurant,Burrito Place


### Now I cluster the city into five classes based on the venue category frequencies in each neighbourhood

In [37]:
from sklearn.cluster import KMeans
kclusters = 5

# drop the name column so that only the venue frequencies are clustered
toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', axis=1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

kmeans.labels_[0:10] 
neighbourhoods_venues_sorted.insert(0, 'Cluster', kmeans.labels_)
toronto_merged = df

# join the cluster labels and top venues onto df, which holds the latitude/longitude of each neighbourhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
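
The report fixes `kclusters = 5` without stating how the number was chosen. One common way to sanity-check such a choice is to compare the k-means inertia for several values of k and look for the point where the improvement flattens. The sketch below does this on synthetic data; the matrix `X` is a random stand-in for `toronto_grouped_clustering`, and the range of k is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the frequency matrix (rows: neighbourhoods, columns: categories).
rng = np.random.RandomState(0)
X = rng.rand(40, 6)

inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 3))
```

Inertia tends to shrink as k grows, so the absolute value matters less than where the curve stops dropping sharply.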

# Results

### Below, each neighbourhood is labelled with its cluster

In [38]:
toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster,Most Common 1,Most Common 2,Most Common 3,Most Common 4,Most Common 5,Most Common 6,Most Common 7,Most Common 8,Most Common 9,Most Common 10
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,3,Zoo Exhibit,Breakfast Spot,Indian Restaurant,Fast Food Restaurant,Sandwich Place,Pizza Place,Coffee Shop,Gas Station,Pharmacy,Liquor Store
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,3,Zoo Exhibit,Pharmacy,Park,Coffee Shop,Fast Food Restaurant,Pizza Place,Mexican Restaurant,Liquor Store,Beer Store,Breakfast Spot
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,3,Coffee Shop,Pharmacy,Park,Fast Food Restaurant,Pizza Place,Indian Restaurant,Bank,Beer Store,Breakfast Spot,Pub
3,M1G,Scarborough,Woburn,43.770992,-79.216917,3,Coffee Shop,Indian Restaurant,Clothing Store,Pharmacy,Gas Station,Park,Bank,Pizza Place,Sandwich Place,Fried Chicken Joint
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,3,Coffee Shop,Indian Restaurant,Chinese Restaurant,Pharmacy,Fast Food Restaurant,Caribbean Restaurant,Clothing Store,Bakery,Sandwich Place,Supermarket



### On the map below, you can see how the neighbourhoods of each cluster are distributed, with one colour per cluster

In [40]:
toronto_location = [43.64, -79.381499]

map_clusters = folium.Map(location=toronto_location, zoom_start=10.4)


x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Discussion

### Now the analysis for solving the problem starts

In [41]:
def perc(cluster):
    # count how often each category appears at each of the ten ranks
    ranks = ['Most Common {}'.format(i) for i in range(1, 11)]
    freqs = [cluster[rank].value_counts() for rank in ranks]
    # rows: venue categories, columns: ranks (DataFrame.append was removed in pandas 2.0)
    cluster_counts = pd.concat(freqs, axis=1, keys=ranks, sort=False)

    cluster_counts['Total Frequency'] = cluster_counts.sum(axis=1, skipna=True)
    cluster_counts = cluster_counts.sort_values(by='Total Frequency', ascending=False)
    cluster_counts['Percentage'] =(cluster_counts['Total Frequency'] / cluster_counts['Total Frequency'].sum()) * 100
    return cluster_counts
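
A small worked example may help when reading the percentages produced by `perc`. The sketch below uses a made-up cluster with two neighbourhoods and only three rank columns, and computes the totals with `stack().value_counts()`, which is a shortcut chosen here for brevity rather than the exact steps of `perc`.

```python
import pandas as pd

# Hypothetical cluster with two neighbourhoods and only three ranks, to keep it small.
toy_cluster = pd.DataFrame({
    'Most Common 1': ['Coffee Shop', 'Bakery'],
    'Most Common 2': ['Bakery', 'Coffee Shop'],
    'Most Common 3': ['Park', 'Bank'],
})

# Count each category across all rank columns, then convert to percentages.
counts = toy_cluster.stack().value_counts()
percentage = counts / counts.sum() * 100
print(percentage)
```

Each neighbourhood contributes one entry per rank, so a category that appears in every neighbourhood's list reaches 100 divided by the number of ranks: one third here, and 10% with the ten ranks used in this report. A value of 10.0 in the tables therefore means the category is in the top ten of every neighbourhood of that cluster.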

### Cluster 0

In [42]:
cluster_0 = toronto_merged[toronto_merged['Cluster'] == 0].drop(['PostalCode','Borough', 'Latitude', 'Longitude'],axis=1)
cluster_0_counts = perc(cluster_0)
cluster_0_totals = cluster_0_counts['Percentage'].to_frame()
cluster_0_totals.rename(columns={'Percentage' : 'Cluster 0'}, inplace=True)
cluster_0_totals.head(10)

Venue Category,Cluster 0
Chinese Restaurant,10.0
Bakery,8.75
Coffee Shop,8.75
Caribbean Restaurant,6.25
Fast Food Restaurant,5.0
Pharmacy,5.0
Supermarket,5.0
Restaurant,5.0
Noodle House,3.75
Clothing Store,3.75


### Cluster 1

In [43]:
cluster_1 = toronto_merged[toronto_merged['Cluster'] == 1].drop(['PostalCode','Borough', 'Latitude', 'Longitude'],axis=1)
cluster_1_counts = perc(cluster_1)
cluster_1_totals = cluster_1_counts['Percentage'].to_frame()
cluster_1_totals.rename(columns={'Percentage' : 'Cluster 1'}, inplace=True)
cluster_1_totals.head(10)

Venue Category,Cluster 1
Coffee Shop,10.0
Italian Restaurant,10.0
Café,9.411765
Bakery,8.235294
Sushi Restaurant,7.647059
Park,4.705882
Grocery Store,3.529412
Pizza Place,2.941176
Dessert Shop,2.352941
Indian Restaurant,2.352941


### Cluster 2

In [44]:
cluster_2 = toronto_merged[toronto_merged['Cluster'] == 2].drop(['PostalCode','Borough', 'Latitude', 'Longitude'],axis=1)
cluster_2_counts = perc(cluster_2)
cluster_2_totals = cluster_2_counts['Percentage'].to_frame()
cluster_2_totals.rename(columns={'Percentage' : 'Cluster 2'}, inplace=True)
cluster_2_totals.head(10)

Venue Category,Cluster 2
Café,10.0
Coffee Shop,9.333333
Bakery,7.333333
Restaurant,6.666667
Bar,6.0
Italian Restaurant,5.0
Park,5.0
Gastropub,4.666667
Hotel,3.333333
Pub,2.666667


### Cluster 3

In [45]:
cluster_3 = toronto_merged[toronto_merged['Cluster'] == 3].drop(['PostalCode','Borough', 'Latitude', 'Longitude'],axis=1)
cluster_3_counts = perc(cluster_3)
cluster_3_totals = cluster_3_counts['Percentage'].to_frame()
cluster_3_totals.rename(columns={'Percentage' : 'Cluster 3'}, inplace=True)
cluster_3_totals.head(10)

Venue Category,Cluster 3
Coffee Shop,10.0
Pharmacy,8.636364
Sandwich Place,8.181818
Fast Food Restaurant,6.818182
Pizza Place,5.909091
Bank,5.454545
Chinese Restaurant,4.090909
Grocery Store,3.181818
Indian Restaurant,3.181818
Hotel,2.727273


### Cluster 4

In [46]:
cluster_4 = toronto_merged[toronto_merged['Cluster'] == 4].drop(['PostalCode','Borough', 'Latitude', 'Longitude'],axis=1)
cluster_4_counts = perc(cluster_4)
cluster_4_totals = cluster_4_counts['Percentage'].to_frame()
cluster_4_totals.rename(columns={'Percentage' : 'Cluster 4'}, inplace=True)
cluster_4_totals.head(10)

Venue Category,Cluster 4
Coffee Shop,10.0
Grocery Store,5.6
Restaurant,4.8
Burger Joint,4.4
Middle Eastern Restaurant,4.0
Park,4.0
Supermarket,4.0
Bakery,4.0
Clothing Store,3.6
Gym / Fitness Center,3.2


### All the clusters

In [47]:
all_clusters = pd.DataFrame()
all_clusters = pd.concat([cluster_0_totals, cluster_1_totals, cluster_2_totals, cluster_3_totals, cluster_4_totals], axis=1, sort=False)
all_clusters['Total'] = all_clusters.sum(axis=1)
all_clusters = all_clusters.sort_values(by='Total', ascending=False)
all_clusters

Venue Category,Cluster 0,Cluster 1,Cluster 2,Cluster 3,Cluster 4,Total
Coffee Shop,8.75,10.0,9.333333,10.0,10.0,48.083333
Bakery,8.75,8.235294,7.333333,2.272727,4.0,30.591355
Café,,9.411765,10.0,,2.8,22.211765
Restaurant,5.0,1.764706,6.666667,1.363636,4.8,19.595009
Italian Restaurant,,10.0,5.0,1.363636,2.0,18.363636
Pharmacy,5.0,1.176471,,8.636364,2.0,16.812834
Chinese Restaurant,10.0,0.588235,0.333333,4.090909,1.6,16.612478
Park,,4.705882,5.0,2.272727,4.0,15.97861
Fast Food Restaurant,5.0,1.176471,,6.818182,2.8,15.794652
Sandwich Place,1.25,1.176471,0.666667,8.181818,2.8,14.074955
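
One way to use the combined table for the question posed in the introduction is to look up, for a given cuisine, the cluster in which it has the smallest share. The sketch below copies two rows from the table above; the helper name `least_saturated_cluster` is hypothetical, and treating a missing value as zero (the category did not appear in that cluster's top-ten lists) is an assumption.

```python
import numpy as np
import pandas as pd

# Two rows copied from the combined table above (percentages per cluster).
shares = pd.DataFrame(
    {'Cluster 0': [np.nan, 10.0],
     'Cluster 1': [10.0, 0.588235],
     'Cluster 2': [5.0, 0.333333],
     'Cluster 3': [1.363636, 4.090909],
     'Cluster 4': [2.0, 1.6]},
    index=['Italian Restaurant', 'Chinese Restaurant'])

def least_saturated_cluster(shares, category):
    """Cluster in which `category` has the smallest share (NaN counts as 0)."""
    return shares.loc[category].fillna(0.0).idxmin()

print(least_saturated_cluster(shares, 'Italian Restaurant'))
print(least_saturated_cluster(shares, 'Chinese Restaurant'))
```

A small share only signals low competition from the same cuisine. Whether customers in that cluster actually want it depends on the cultural and venue factors discussed in the conclusion.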


## Conclusion
Coffee shops rank at or near the top of every cluster, so it is difficult to differentiate the clusters by coffee shops. I will skip them in the cluster descriptions.

### Cluster 0. "Residential. Asian majority (?)"
The most frequent venue is the Chinese restaurant. This cluster has the highest concentration of tea rooms, bookstores and Japanese restaurants of all clusters. Surprisingly, there are no sushi venues.
### Cluster 1. "Residential. Younger Population"
There are plenty of Italian-themed venues such as pizza places and Italian restaurants (the highest concentration of all clusters), as well as plenty of parks and supermarkets. This cluster also stands out for its variety of cuisines: it has the highest concentration of sushi, Thai and Mexican venues, as well as ice-cream shops. There are also plenty of bars and sporting goods shops.
### Cluster 2. "Downtown. Business, Fast-paced living"
Even on the map we can see that the clustering algorithm placed most of these neighbourhoods around downtown Toronto. Typical venues are hotels, banks and parks. Interestingly, this is one of two clusters where coffee shops are not the most frequent venue.
### Cluster 3. "Mixed Residential/Work. Multi-cultural"
Dominated by pharmacies, banks and various mid-scale food establishments, as well as Chinese and Indian cuisine, which suggests a large immigrant population. It has the highest concentration of sandwich places and banks of all clusters, and also the highest concentration of liquor stores and beer stores. It is the only cluster where Zoo Exhibits are present.
### Cluster 4. "Residential. Family"
Places where people live. Dominated by grocery stores, various food-service establishments, parks, bakeries and stores. It looks like a place for families, given the lack of bars (this can be seen in the combined `all_clusters` table).