## COURSERA CAPSTONE 
## THE BATTLE OF CANADIAN NEIGHBORHOODS WEEK 2

### Applied Data Science Capstone by IBM/COURSERA

## TABLE OF CONTENTS
### 1.  [Introduction: Business Problem](#introduction)
### 2. [Data](#data)
### 3.  [Methodology](#methodology)
### 4.  [Analysis](#analysis)
### 5.  [Results and Discussion](#results)
### 6.  [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

### In this project we explore the neighborhoods of the City of Toronto. The objectives are to:
### [1. Use the Foursquare API to recommend restaurant locations for a given search query and location](#BusinessProblem1)
### [2. Rank the neighborhoods of Toronto by the frequency of venue categories in each location and cluster them using KMeans](#BusinessProblem2) 
### [3. Identify the neighborhoods that have at most one Pizza Place among their venues, as potential locations for new ones](#BusinessProblem3)

## DATA <a name="data"></a>

### Based on the business problem, the following factors will affect our analysis:

### * number of existing restaurants in the neighborhoods
### * frequency of the venue categories in each neighborhood

### The following sources will be used for the data: 
### * We plan to use the Google Geocoding API to get the coordinates of a location, then pass a query and those coordinates to the search endpoint of the Foursquare API to get the relevant data
### * We plan to scrape the list of Toronto neighborhoods with BeautifulSoup4 and use the neighborhood names to get their coordinates from the Geocoding API
### * We plan to identify the neighborhoods with potential to open more Pizza Places

In [444]:
# importing all the necessary modules
import requests 

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from bs4 import BeautifulSoup
from urllib.request import urlopen

import folium

## Business Problem 1 <a name="BusinessProblem1" ></a>

### We define a function that takes an API key and an address as input and returns the latitude and longitude of that location.

In [445]:
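def get_coordinates(api_key, address, verbose=False):
    """Return [latitude, longitude] for an address, or [None, None] on failure."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        # pass the parameters separately so that requests URL-encodes the address
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        print(results[0])  # show the first match
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # network error, invalid JSON, or no match for the address
        return [None, None]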
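
### The address goes into the query string, so characters such as spaces and commas should be percent-encoded. A minimal sketch of building the encoded request URL with the standard library; the helper name `build_geocode_url` and the placeholder key `DEMO_KEY` are assumptions for illustration and are not part of the notebook:

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    """Return the geocode request URL with key and address percent-encoded."""
    query = urlencode({'key': api_key, 'address': address})
    return '{}?{}'.format(GEOCODE_ENDPOINT, query)

# 'DEMO_KEY' is a placeholder, not a working credential
example_url = build_geocode_url('DEMO_KEY', 'Bay Street, Toronto')
print(example_url)
```

### The space becomes `+` and the comma becomes `%2C`, so the address cannot be confused with other query parameters.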

### We make a sample call to get the coordinates of Bay Street, Toronto

In [459]:
coordinates = get_coordinates(api_key='AIzaSyBGfEu4-AzbO2mhOYFhRSTvCmgK6b9VUqo', address = 'Bay Street, Toronto')

{'place_id': 'ChIJ34IoW8o0K4gRSgpqWe8kLhA', 'geometry': {'location_type': 'GEOMETRIC_CENTER', 'viewport': {'southwest': {'lat': 43.6409476, 'lng': -79.391605}, 'northeast': {'lat': 43.6737658, 'lng': -79.37612179999999}}, 'location': {'lat': 43.65729839999999, 'lng': -79.38436449999999}, 'bounds': {'southwest': {'lat': 43.6409476, 'lng': -79.391605}, 'northeast': {'lat': 43.6737658, 'lng': -79.37612179999999}}}, 'types': ['route'], 'address_components': [{'types': ['route'], 'long_name': 'Bay Street', 'short_name': 'Bay St'}, {'types': ['political', 'sublocality', 'sublocality_level_1'], 'long_name': 'Old Toronto', 'short_name': 'Old Toronto'}, {'types': ['locality', 'political'], 'long_name': 'Toronto', 'short_name': 'Toronto'}, {'types': ['administrative_area_level_2', 'political'], 'long_name': 'Toronto Division', 'short_name': 'Toronto Division'}, {'types': ['administrative_area_level_1', 'political'], 'long_name': 'Ontario', 'short_name': 'ON'}, {'types': ['country', 'political'],

In [460]:
print("coordinates for Bay Street, Toronto are {}".format(coordinates))

coordinates for Bay Street, Toronto are [43.65729839999999, -79.38436449999999]


In [461]:
# plotting the map for the location of Bay Street, Toronto

toronto = folium.Map(location=coordinates, zoom_start=12)
toronto

### Now we use the Foursquare API to find sushi restaurants near Bay Street, Toronto

In [463]:
CLIENT_ID='HEL3APBYSYIE4X4Y2S1RGKECXNJELZX3SDULSUPPLEXVFJ0E'
CLIENT_SECRET='JESGRQE1GDCRGUDQMZBODK0RLMLBCMBWYP2RMHCD5KMTZHFD'
radius=500
LIMIT=50
VERSION='20190915'
neighborhood_latitude=coordinates[0]
neighborhood_longitude=coordinates[1]
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&query={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius,
    'sushi',
    LIMIT)

In [471]:
results=requests.get(url).json()
results['response']['venues']

[{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/sushi_',
     'suffix': '.png'},
    'id': '4bf58dd8d48988d1d2941735',
    'name': 'Sushi Restaurant',
    'pluralName': 'Sushi Restaurants',
    'primary': True,
    'shortName': 'Sushi'}],
  'hasPerk': False,
  'id': '4b464cd6f964a520f11c26e3',
  'location': {'address': '220 Yonge St.',
   'cc': 'CA',
   'city': 'Toronto',
   'country': 'Canada',
   'crossStreet': 'in Urban Eatery, Toronto Eaton Centre',
   'distance': 398,
   'formattedAddress': ['220 Yonge St. (in Urban Eatery, Toronto Eaton Centre)',
    'Toronto ON M5B 2H6',
    'Canada'],
   'labeledLatLngs': [{'label': 'display',
     'lat': 43.65480108229508,
     'lng': -79.3808126449585}],
   'lat': 43.65480108229508,
   'lng': -79.3808126449585,
   'postalCode': 'M5B 2H6',
   'state': 'ON'},
  'name': 'Sushi-Q',
  'referralId': 'v-1569204422'},
 {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/sushi_',
     'suf

In [472]:
def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [473]:
venues = results['response']['venues']

# flatten the nested JSON into dotted column names (pd.json_normalize needs pandas >= 1.0)
nearby_sushi_venues = pd.json_normalize(venues)
filtered_columns=['location.lat', 'location.lng', 'name','location.address']
nearby_sushi_venues = nearby_sushi_venues.loc[:,filtered_columns]

nearby_sushi_venues

| | location.lat | location.lng | name | location.address |
|---|---|---|---|---|
| 0 | 43.654801 | -79.380813 | Sushi-Q | 220 Yonge St. |
| 1 | 43.661385 | -79.38158 | Daily Sushi | 20 Carlton St. |
| 2 | 43.657466 | -79.380957 | Tatami Sushi | 335A Yonge St. |
| 3 | 43.655031 | -79.386724 | Kathy's Sushi and Bento | 187 Dundas St W |
| 4 | 43.659092 | -79.382789 | Sushi & bbbop | 384 Yonge St #57 |
| 5 | 43.656253 | -79.38066 | Spring Sushi | 10 Dundas St. E |
| 6 | 43.656709 | -79.38092 | Rolltation Sushi Burrito | 321 Yonge St |
| 7 | 43.65708 | -79.381587 | Mi’hito Sushi Laboratory | 4 Edward St |
| 8 | 43.658691 | -79.388551 | Sushi | 2 Bloor St W Cumberland Terrace |
| 9 | 43.655067 | -79.386732 | Torch Sushi & Bento | 187 Dundas Street West |
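
### The column names `location.lat` and `location.lng` come from flattening the nested `location` dictionary of each venue. A minimal pure-Python sketch of that flattening step; the helper `flatten` is a hypothetical stand-in for illustration (not the pandas implementation), and the sample record is an abridged copy of the first venue in the response:

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value  # lists and scalars are kept as they are
    return flat

# abridged copy of the first venue in the Foursquare response
venue = {
    'name': 'Sushi-Q',
    'location': {
        'address': '220 Yonge St.',
        'lat': 43.65480108229508,
        'lng': -79.3808126449585,
    },
}
flat_venue = flatten(venue)
print(flat_venue)
```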


In [475]:
nearby_sushi_venues.dropna(inplace=True)
nearby_sushi_venues.rename(columns={'location.lat':'Latitude', 'location.lng':'Longitude','name':'Name','location.address':'Address'}, inplace=True)

In [478]:
latitude = coordinates[0]
longitude = coordinates[1]
bay_street_toronto = folium.Map(location=[latitude, longitude], zoom_start=16)

for lat, lng, name, addr in zip(nearby_sushi_venues['Latitude'], nearby_sushi_venues['Longitude'], nearby_sushi_venues['Name'], nearby_sushi_venues['Address']):
    label = '{}, {}'.format(addr, name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(bay_street_toronto)  
    
bay_street_toronto

## Business Problem 2 <a name = 'BusinessProblem2'> </a>

### We scrape the City of Toronto webpage to get the names of all the neighborhoods in the Toronto region

In [479]:
page = urlopen(r'https://www.toronto.ca/city-government/data-research-maps/neighbourhoods-communities/neighbourhood-profiles/')

In [480]:
bsobj = BeautifulSoup(page, 'lxml')

In [481]:
obj = bsobj.find_all('area')
obj[0]['alt']

'Eringate-Centennial-West Deane'
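
### The neighborhood names are read from the `alt` attribute of the `<area>` tags of the page's image map. Since the live page may change, here is a self-contained sketch of the same extraction on a small hand-written fragment. It uses the standard library's `html.parser` instead of BeautifulSoup; the class name `AreaAltParser` and the sample HTML are assumptions for illustration, and only the first name is taken from the output above:

```python
from html.parser import HTMLParser

class AreaAltParser(HTMLParser):
    """Collect the alt text of every <area> tag that has one."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == 'area':
            alt = dict(attrs).get('alt')
            if alt:  # skip <area> tags without an alt attribute
                self.names.append(alt)

# made-up fragment shaped like an image map
sample_html = '''
<map name="neighbourhoods">
  <area shape="poly" alt="Eringate-Centennial-West Deane" href="#">
  <area shape="poly" alt="Example Neighbourhood" href="#">
  <area shape="poly" href="#">
</map>
'''
parser = AreaAltParser()
parser.feed(sample_html)
print(parser.names)
```

### Skipping tags without `alt` avoids the `KeyError` that a direct `tag['alt']` lookup would raise on such a tag.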

In [482]:
Neighborhood=[]
Coordinates=[]

for i in bsobj.find_all('area'):
    Neighborhood.append(i['alt'])

for j in Neighborhood:
    coord = get_coordinates(api_key='AIzaSyBGfEu4-AzbO2mhOYFhRSTvCmgK6b9VUqo', address = '{}, Toronto'.format(j))
    Coordinates.append(coord)

neighborhoods = pd.DataFrame({
        'Neighborhood':Neighborhood,
        'Coordinates':Coordinates
    })
neighborhoods

{'place_id': 'ChIJNeSfcjU4K4gRyqKTftFEFto', 'geometry': {'location_type': 'APPROXIMATE', 'viewport': {'southwest': {'lat': 43.6378853, 'lng': -79.6087101}, 'northeast': {'lat': 43.674488, 'lng': -79.5551321}}, 'location': {'lat': 43.6599082, 'lng': -79.58331679999999}, 'bounds': {'southwest': {'lat': 43.6378853, 'lng': -79.6087101}, 'northeast': {'lat': 43.674488, 'lng': -79.5551321}}}, 'types': ['neighborhood', 'political'], 'address_components': [{'types': ['neighborhood', 'political'], 'long_name': 'Eringate - Centennial - West Deane', 'short_name': 'Eringate - Centennial - West Deane'}, {'types': ['political', 'sublocality', 'sublocality_level_1'], 'long_name': 'Etobicoke', 'short_name': 'Etobicoke'}, {'types': ['locality', 'political'], 'long_name': 'Toronto', 'short_name': 'Toronto'}, {'types': ['administrative_area_level_2', 'political'], 'long_name': 'Toronto Division', 'short_name': 'Toronto Division'}, {'types': ['administrative_area_level_1', 'political'], 'long_name': 'Onta

| | Coordinates | Neighborhood |
|---|---|---|
| 0 | [43.6599082, -79.58331679999999] | Eringate-Centennial-West Deane |
| 1 | [43.6451146, -79.56877279999999] | Etobicoke West Mall |
| 2 | [43.6335688, -79.570763] | Markland Wood |
| 3 | [43.6017103, -79.5452384] | Alderwood |
| 4 | [43.593421, -79.538164] | Long Branch |
| 5 | [43.6309156, -79.5434841] | Islington-City Centre West |
| 6 | [43.6688924, -79.5434841] | Princess-Rosethorn |
| 7 | [43.7206347, -79.584248] | Willowridge-Martingrove-Richview |
| 8 | [43.6628917, -79.39565640000001] | University |
| 9 | [43.6599648, -79.4174767] | Palmerston-Little Italy |

In [484]:
# add the latitudes and longitudes in the neighborhoods dataframe
def unpacklat(coords):
    return coords[0]

def unpacklng(coords):
    return coords[1]

neighborhoods['Latitude']=neighborhoods['Coordinates'].apply(unpacklat)
neighborhoods['Longitude']=neighborhoods['Coordinates'].apply(unpacklng)

In [485]:
neighborhoods.head(10)

| | Coordinates | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | [43.6599082, -79.58331679999999] | Eringate-Centennial-West Deane | 43.659908 | -79.583317 |
| 1 | [43.6451146, -79.56877279999999] | Etobicoke West Mall | 43.645115 | -79.568773 |
| 2 | [43.6335688, -79.570763] | Markland Wood | 43.633569 | -79.570763 |
| 3 | [43.6017103, -79.5452384] | Alderwood | 43.60171 | -79.545238 |
| 4 | [43.593421, -79.538164] | Long Branch | 43.593421 | -79.538164 |
| 5 | [43.6309156, -79.5434841] | Islington-City Centre West | 43.630916 | -79.543484 |
| 6 | [43.6688924, -79.5434841] | Princess-Rosethorn | 43.668892 | -79.543484 |
| 7 | [43.7206347, -79.584248] | Willowridge-Martingrove-Richview | 43.720635 | -79.584248 |
| 8 | [43.6628917, -79.39565640000001] | University | 43.662892 | -79.395656 |
| 9 | [43.6599648, -79.4174767] | Palmerston-Little Italy | 43.659965 | -79.417477 |


In [486]:
print('The dataframe has {} Neighborhoods from City of Toronto.'.format(
        len(neighborhoods['Neighborhood'].unique()),
    )
)

The dataframe has 140 Neighborhoods from City of Toronto.


### Now we get the coordinates of Toronto from the Geocoding API and plot all these neighborhoods on the map

In [487]:
toronto_coordinates = get_coordinates(api_key='AIzaSyBGfEu4-AzbO2mhOYFhRSTvCmgK6b9VUqo', address = 'Toronto')

{'place_id': 'ChIJpTvG15DL1IkRd8S0KlBVNTI', 'geometry': {'location_type': 'APPROXIMATE', 'viewport': {'southwest': {'lat': 43.5810245, 'lng': -79.639219}, 'northeast': {'lat': 43.8554579, 'lng': -79.1168971}}, 'location': {'lat': 43.653226, 'lng': -79.3831843}, 'bounds': {'southwest': {'lat': 43.5810245, 'lng': -79.639219}, 'northeast': {'lat': 43.8554579, 'lng': -79.1168971}}}, 'types': ['locality', 'political'], 'address_components': [{'types': ['locality', 'political'], 'long_name': 'Toronto', 'short_name': 'Toronto'}, {'types': ['administrative_area_level_2', 'political'], 'long_name': 'Toronto Division', 'short_name': 'Toronto Division'}, {'types': ['administrative_area_level_1', 'political'], 'long_name': 'Ontario', 'short_name': 'ON'}, {'types': ['country', 'political'], 'long_name': 'Canada', 'short_name': 'CA'}], 'formatted_address': 'Toronto, ON, Canada'}


In [488]:
city_toronto = folium.Map(location=toronto_coordinates, zoom_start=11)

# add markers to map
for lat, lng, label in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(city_toronto)  
    
city_toronto

## METHODOLOGY <a name="methodology"></a>

### In this section we compute the frequency of venue categories in each Toronto neighborhood
### we take the mean of each one-hot encoded venue category per neighborhood and analyse the top 5 categories
### we then apply KMeans to cluster the neighborhoods 
### and finally list the neighborhoods with at most one Pizza Place
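
### The mean of a one-hot encoded category column within a neighborhood is the share of that neighborhood's venues that fall into the category. A minimal pure-Python sketch of that frequency computation; the function name `category_frequencies` and the sample pairs are illustrative assumptions, not the notebook's data:

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs; illustrative values only
venues = [
    ('Alderwood', 'Pizza Place'),
    ('Alderwood', 'Pharmacy'),
    ('Alderwood', 'Pizza Place'),
    ('Alderwood', 'Pub'),
    ('Annex', 'Café'),
    ('Annex', 'Pizza Place'),
]

def category_frequencies(pairs):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }

freqs = category_frequencies(venues)
# top 2 categories for one neighborhood, highest share first
top = sorted(freqs['Alderwood'].items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top)
```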

### Now we want to get the nearby venues in these neighborhoods

In [491]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
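
### The list comprehension above indexes `categories[0]` directly, which would raise an `IndexError` if a venue came back with an empty `categories` list. A hedged sketch of a more defensive row builder; the helper name `venue_row` and the sample items are assumptions for illustration, shaped like the `items` entries the function reads:

```python
def venue_row(name, lat, lng, item):
    """Build one output row from an 'items' entry, tolerating missing categories."""
    venue = item['venue']
    categories = venue.get('categories') or []
    category = categories[0]['name'] if categories else None
    return (name, lat, lng, venue['name'],
            venue['location']['lat'], venue['location']['lng'], category)

# hand-written items shaped like the explore response; values are illustrative
items = [
    {'venue': {'name': 'Pizza Pizza',
               'location': {'lat': 43.660392, 'lng': -79.582686},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Uncategorised Spot',
               'location': {'lat': 43.66, 'lng': -79.58},
               'categories': []}},
]
rows = [venue_row('Eringate-Centennial-West Deane', 43.659908, -79.583317, item)
        for item in items]
print(rows)
```

### Rows with a `None` category can then be dropped or kept explicitly before the one-hot encoding step.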

In [492]:
city_of_toronto_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Eringate-Centennial-West Deane
Etobicoke West Mall
Markland Wood
Alderwood
Long Branch
Islington-City Centre West
Princess-Rosethorn
Willowridge-Martingrove-Richview
University
Palmerston-Little Italy
Dufferin Grove
Roncesvalles
High Park-Swansea
Runnymede-Bloor West Village
Annex
Dovercourt-Wallace Emerson-Junction
High Park North
Junction Area
Weston-Pellam Park
Briar Hill-Belgravia
Wychwood
Casa Loma
Trinity-Bellwoods
Little Portugal
South Parkdale
Niagara
Waterfront Communities-The Island
South Riverdale
The Beaches
New Toronto
Mimico (includes Humber Bay Shores)
Stonegate-Queensway
Kingsway South
Edenbridge-Humber Valley
Kensington-Chinatown
Bay Street Corridor
Moss Park
Church-Yonge Corridor
Rosedale-Moore Park
North St. James Town
Regent Park
Cabbagetown-South St. James Town
North Riverdale
Broadview North
Blake Jones
Greenwood-Coxwell
Danforth
Woodbine Corridor
Lambton-Baby Point
East End-Danforth
Woodbine-Lumsden
Danforth-East York
Playter Estates-Danforth
Old East York
Yonge-

In [493]:
city_of_toronto_venues.shape

(4516, 7)

In [494]:
city_of_toronto_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Eringate-Centennial-West Deane | 43.659908 | -79.583317 | Centennial Park | 43.656154 | -79.58754 | Park |
| 1 | Eringate-Centennial-West Deane | 43.659908 | -79.583317 | Porta Via | 43.663449 | -79.589638 | Sandwich Place |
| 2 | Eringate-Centennial-West Deane | 43.659908 | -79.583317 | Mrakovic | 43.666641 | -79.57885 | Eastern European Restaurant |
| 3 | Eringate-Centennial-West Deane | 43.659908 | -79.583317 | HMV | 43.662347 | -79.590456 | Record Shop |
| 4 | Eringate-Centennial-West Deane | 43.659908 | -79.583317 | Pizza Pizza | 43.660392 | -79.582686 | Pizza Place |


In [558]:
toronto_grouped = city_of_toronto_venues.groupby('Neighborhood')['Venue'].count().to_frame().reset_index()

In [560]:
toronto_grouped.shape

(140, 2)

In [579]:
top_neighborhood = toronto_grouped[toronto_grouped['Venue']>45]

### 50 neighborhoods have more than 45 venues each

In [584]:
top_neighborhood.shape

(50, 2)

In [496]:
print('There are {} unique categories.'.format(len(city_of_toronto_venues['Venue Category'].unique())))

There are 309 unique categories.


In [497]:
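# one hot encoding
toronto_onehot = pd.get_dummies(city_of_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add the neighborhood column to the dataframe
# note: 'Neighborhood' is also one of the venue categories, so this assignment
# overwrites that dummy column in place instead of appending a new last column
# (the frame keeps 309 columns)
toronto_onehot['Neighborhood'] = city_of_toronto_venues['Neighborhood'] 

# move the last column to the front; because of the overwrite above this is
# 'Zoo Exhibit' rather than 'Neighborhood', as the output below shows
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]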

toronto_onehot.head()

Unnamed: 0,Zoo Exhibit,Afghan Restaurant,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [498]:
toronto_onehot.shape

(4516, 309)

## ANALYSIS <a name="analysis"></a>

### In the analysis we find the top 5 venue categories of each neighborhood by frequency, list the 10 most common venue categories per neighborhood, inspect the neighborhoods that fall into each cluster, and finally identify the potential neighborhoods that have at most one Pizza Place
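
### The last step, finding neighborhoods with at most one Pizza Place, amounts to counting the venues of that category per neighborhood and filtering on the count. A minimal sketch under stated assumptions: the function name `neighborhoods_with_at_most` and the sample pairs are hypothetical and stand in for the (neighborhood, venue category) columns of the venues dataframe:

```python
from collections import Counter

def neighborhoods_with_at_most(pairs, all_neighborhoods, category='Pizza Place', limit=1):
    """Return the neighborhoods whose count of `category` venues is <= limit."""
    counts = Counter(hood for hood, cat in pairs if cat == category)
    # a Counter returns 0 for neighborhoods with no matching venue
    return [hood for hood in all_neighborhoods if counts[hood] <= limit]

# illustrative (neighborhood, category) pairs, not the notebook's data
pairs = [
    ('Alderwood', 'Pizza Place'),
    ('Alderwood', 'Pizza Place'),
    ('Annex', 'Pizza Place'),
    ('Annex', 'Café'),
    ('Long Branch', 'Bar'),
]
candidates = neighborhoods_with_at_most(pairs, ['Alderwood', 'Annex', 'Long Branch'])
print(candidates)
```

### Passing the full neighborhood list matters: neighborhoods with no Pizza Place at all never appear in the counts and would otherwise be missed.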

In [499]:
toronto_onehot_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()

In [500]:
toronto_onehot_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,Afghan Restaurant,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Argentinian Restaurant,Art Gallery,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt North,0.0,0.000000,0.000000,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.000000,0.033333,0.000000,0.000000,0.0,0.000000,0.000000,0.033333,0.00,0.000000
1,Agincourt South-Malvern West,0.0,0.000000,0.000000,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.000000,0.031250,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000
2,Alderwood,0.0,0.000000,0.000000,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000
3,Annex,0.0,0.000000,0.020000,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.020000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000
4,Banbury-Don Mills,0.0,0.000000,0.000000,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000
5,Bathurst Manor,0.0,0.000000,0.000000,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000
6,Bay Street Corridor,0.0,0.000000,0.000000,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.020000,0.00,0.020000
7,Bayview Village,0.0,0.000000,0.000000,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000
8,Bayview Woods-Steeles,0.0,0.000000,0.000000,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000
9,Bedford Park-Nortown,0.0,0.000000,0.027778,0.0,0.0,0.000,0.00,0.000000,0.00,...,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.027778,0.027778,0.00,0.000000


In [501]:
toronto_onehot_grouped.shape

(140, 309)

In [502]:
num_top_venues = 5

for hood in toronto_onehot_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_onehot_grouped[toronto_onehot_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt North----
                venue  freq
0   Indian Restaurant  0.07
1         Coffee Shop  0.07
2      Ice Cream Shop  0.07
3  Chinese Restaurant  0.07
4   Convenience Store  0.07


----Agincourt South-Malvern West----
                  venue  freq
0    Chinese Restaurant  0.12
1  Cantonese Restaurant  0.09
2     Korean Restaurant  0.06
3  Fast Food Restaurant  0.06
4           Coffee Shop  0.06


----Alderwood----
               venue  freq
0     Discount Store  0.12
1           Pharmacy  0.12
2  Convenience Store  0.08
3        Pizza Place  0.08
4                Pub  0.04


----Annex----
                           venue  freq
0  Vegetarian / Vegan Restaurant  0.06
1                           Café  0.06
2                  Deli / Bodega  0.04
3                    Pizza Place  0.04
4                     Restaurant  0.04


----Banbury-Don Mills----
                 venue  freq
0  Japanese Restaurant  0.17
1       Baseball Field  0.08
2                 Café  0.08
3       Cosme

In [503]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [504]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_onehot_grouped['Neighborhood']

for ind in np.arange(toronto_onehot_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_onehot_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt North,Ice Cream Shop,Chinese Restaurant,Convenience Store,Coffee Shop,Indian Restaurant,Taco Place,Bank,Bakery,Clothing Store,Spa
1,Agincourt South-Malvern West,Chinese Restaurant,Cantonese Restaurant,Korean Restaurant,Fast Food Restaurant,Coffee Shop,Pet Store,Market,Beer Store,Liquor Store,Restaurant
2,Alderwood,Discount Store,Pharmacy,Convenience Store,Pizza Place,Shopping Mall,Dance Studio,Coffee Shop,Park,Grocery Store,Gym
3,Annex,Café,Vegetarian / Vegan Restaurant,Beer Bar,Pizza Place,Restaurant,Bakery,Deli / Bodega,Japanese Restaurant,Coffee Shop,Pool Hall
4,Banbury-Don Mills,Japanese Restaurant,Golf Course,Middle Eastern Restaurant,Baseball Field,Cosmetics Shop,Pharmacy,Gym / Fitness Center,Café,Caribbean Restaurant,Coffee Shop


In [505]:
neighborhoods_venues_sorted.shape

(140, 11)
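
### The column labels above are correct for the ten columns used here, but appending 'th' to every rank after the third would give '21th' if more columns were requested. A small sketch of a general ordinal helper; the function name `ordinal` is an assumption for illustration:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# same column names as in the notebook, built with the helper
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '|', columns[4], '|', columns[10])
```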

In [506]:
from sklearn.cluster import KMeans

kclusters = 3

toronto_onehot_grouped_clustering = toronto_onehot_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_onehot_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 0, 0, 1, 0, 0, 1, 1])
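
### To make the clustering step less of a black box, here is a bare-bones sketch of the iteration at the core of k-means: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is not scikit-learn's implementation (which, among other things, uses k-means++ initialisation by default); the function names, starting centroids and toy frequency vectors are all assumptions for illustration:

```python
def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    def sqdist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: sqdist(p, centroids[k]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # keep an empty cluster's centroid in place
        else:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Repeat assign/update from the given starting centroids until labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# toy frequency vectors (share of cafés, share of parks); values are made up
points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = kmeans(points, centroids=[(1.0, 0.0), (0.0, 1.0)])
print(labels)
```

### Neighborhoods with similar category frequencies end up with the same label, which is how the cluster labels above should be read.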

In [507]:
neighborhoods.drop(columns='Coordinates', inplace=True)

In [514]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighborhoods_merged = neighborhoods

# join the neighborhood coordinates with the cluster labels and top venues of each neighborhood
city_of_toronto_merged = neighborhoods_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

city_of_toronto_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Eringate-Centennial-West Deane,43.659908,-79.583317,1,Park,Baseball Field,Pub,Convenience Store,Hockey Arena,Pharmacy,Fast Food Restaurant,Sandwich Place,Eastern European Restaurant,Ski Chalet
1,Etobicoke West Mall,43.645115,-79.568773,1,Convenience Store,Pizza Place,Coffee Shop,Café,Farmers Market,Theater,Beer Store,Liquor Store,Mexican Restaurant,Shopping Plaza
2,Markland Wood,43.633569,-79.570763,1,Convenience Store,Coffee Shop,Italian Restaurant,Pharmacy,Discount Store,Donut Shop,Sandwich Place,Shipping Store,Fast Food Restaurant,Pizza Place
3,Alderwood,43.60171,-79.545238,1,Discount Store,Pharmacy,Convenience Store,Pizza Place,Shopping Mall,Dance Studio,Coffee Shop,Park,Grocery Store,Gym
4,Long Branch,43.593421,-79.538164,0,Café,Bar,Coffee Shop,Pharmacy,Grocery Store,Gym,Discount Store,Sandwich Place,Gas Station,Beer Store


In [515]:
city_of_toronto_merged.isnull().sum(axis=0)

Neighborhood              0
Latitude                  0
Longitude                 0
Cluster Labels            0
1st Most Common Venue     0
2nd Most Common Venue     0
3rd Most Common Venue     0
4th Most Common Venue     0
5th Most Common Venue     0
6th Most Common Venue     0
7th Most Common Venue     0
8th Most Common Venue     0
9th Most Common Venue     0
10th Most Common Venue    0
dtype: int64

In [516]:
city_of_toronto_merged.dropna(inplace=True)

In [517]:
city_of_toronto_merged.columns

Index(['Neighborhood', 'Latitude', 'Longitude', 'Cluster Labels',
       '1st Most Common Venue', '2nd Most Common Venue',
       '3rd Most Common Venue', '4th Most Common Venue',
       '5th Most Common Venue', '6th Most Common Venue',
       '7th Most Common Venue', '8th Most Common Venue',
       '9th Most Common Venue', '10th Most Common Venue'],
      dtype='object')

In [518]:
city_of_toronto_merged['Cluster Labels'] = city_of_toronto_merged['Cluster Labels'].astype(int)

In [519]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=coordinates, zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(city_of_toronto_merged['Latitude'], city_of_toronto_merged['Longitude'], city_of_toronto_merged['Neighborhood'], city_of_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [520]:
city_of_toronto_merged.loc[city_of_toronto_merged['Cluster Labels'] == 0, city_of_toronto_merged.columns[[0] + list(range(4, city_of_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Long Branch,Café,Bar,Coffee Shop,Pharmacy,Grocery Store,Gym,Discount Store,Sandwich Place,Gas Station,Beer Store
8,University,Coffee Shop,Vegetarian / Vegan Restaurant,Café,Restaurant,Japanese Restaurant,Bookstore,Ice Cream Shop,Italian Restaurant,Park,Music School
9,Palmerston-Little Italy,Korean Restaurant,Café,Dessert Shop,Coffee Shop,Pizza Place,Indian Restaurant,Tapas Restaurant,Taco Place,Cocktail Bar,Beer Bar
10,Dufferin Grove,Café,Coffee Shop,Italian Restaurant,Cocktail Bar,Vegetarian / Vegan Restaurant,Bakery,Comedy Club,Restaurant,Bar,Sports Bar
11,Roncesvalles,Coffee Shop,Restaurant,Bakery,American Restaurant,Café,Pizza Place,Gift Shop,Park,Sushi Restaurant,Eastern European Restaurant
14,Annex,Café,Vegetarian / Vegan Restaurant,Beer Bar,Pizza Place,Restaurant,Bakery,Deli / Bodega,Japanese Restaurant,Coffee Shop,Pool Hall
15,Dovercourt-Wallace Emerson-Junction,Café,Coffee Shop,Bar,Park,Pharmacy,Gourmet Shop,Gym,Italian Restaurant,Portuguese Restaurant,Supermarket
16,High Park North,Bar,Italian Restaurant,Sushi Restaurant,Café,Thai Restaurant,Bakery,Grocery Store,Coffee Shop,Flea Market,Mexican Restaurant
17,Junction Area,Italian Restaurant,Café,Bar,Coffee Shop,Nail Salon,Mexican Restaurant,Flea Market,Grocery Store,Arts & Crafts Store,Breakfast Spot
18,Weston-Pellam Park,Café,Italian Restaurant,Burger Joint,Seafood Restaurant,Bar,Thai Restaurant,Bakery,Mexican Restaurant,Gastropub,Vietnamese Restaurant


In [521]:
# with kclusters = 3 the labels are 0, 1 and 2, so label 3 matches no rows and the result is empty
city_of_toronto_merged.loc[city_of_toronto_merged['Cluster Labels'] == 3, city_of_toronto_merged.columns[[0] + list(range(4, city_of_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [371]:
city_of_toronto_merged.loc[city_of_toronto_merged['Cluster Labels'] == 1, city_of_toronto_merged.columns[[0] + list(range(4, city_of_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Niagara,Park,Gym,Café,Historic Site,Light Rail Station,Trail,Arts & Crafts Store,Dog Run,Pizza Place,Dance Studio
33,Edenbridge-Humber Valley,Park,Skating Rink,Baseball Field,Women's Store,Eastern European Restaurant,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
48,Lambton-Baby Point,Park,Playground,Garden,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Donut Shop,Event Space
49,East End-Danforth,Park,Asian Restaurant,Flower Shop,Café,Hungarian Restaurant,Farmers Market,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant
71,Kingsview Village-The Westway,Park,Café,Women's Store,Dumpling Restaurant,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
75,Thistletown-Beaumond Heights,Park,Bakery,Skating Rink,Women's Store,Fish & Chips Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
77,Humber Summit,Pharmacy,Empanada Restaurant,Park,Bakery,Women's Store,Event Space,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant
86,Clanton Park,Park,Bar,IT Services,Electronics Store,Elementary School,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Women's Store
90,Westminster-Branson,Pharmacy,Park,Skating Rink,Women's Store,Event Space,Eastern European Restaurant,Electronics Store,Elementary School,Empanada Restaurant,Ethiopian Restaurant
93,Willowdale West,Restaurant,Park,Women's Store,Event Space,Eastern European Restaurant,Electronics Store,Elementary School,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant


In [522]:
city_of_toronto_merged.loc[city_of_toronto_merged['Cluster Labels'] == 2, city_of_toronto_merged.columns[[0] + list(range(4, city_of_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Princess-Rosethorn,Playground,Golf Course,Garden,Yoga Studio,Farmers Market,Electronics Store,Ethiopian Restaurant,Event Service,Event Space,Falafel Restaurant


In [524]:
city_of_toronto_merged['Cluster Labels'].unique()

array([1, 0, 2], dtype=int64)
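
### The label check above can be extended to count how many neighborhoods fall into each cluster. The sketch below is illustrative only: `toy_merged` is a made-up stand-in for `city_of_toronto_merged`, and the expected label range 0 to 3 is an assumption.

```python
import pandas as pd

# Toy stand-in for city_of_toronto_merged; the values are illustrative only.
toy_merged = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D", "E"],
    "Cluster Labels": [1, 0, 0, 2, 0],
})

# Number of neighborhoods assigned to each cluster label, sorted by label.
cluster_sizes = toy_merged["Cluster Labels"].value_counts().sort_index()

# Assumed range of expected labels (0 to 3). A label that never occurs is
# absent from the counts, so filtering on it returns an empty frame.
expected_labels = range(4)
missing_labels = sorted(set(expected_labels) - set(cluster_sizes.index))

print(cluster_sizes)
print("Labels with no neighborhoods:", missing_labels)
```

### A label that is missing from the counts explains an empty result when the frame is filtered on that label.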

## Business Problem 3 <a name="BusinessProblem3"></a>

### Next, we select the venues in city_of_toronto_venues whose venue category is Pizza Place

In [525]:
# keep only the venues whose category is 'Pizza Place'
query = ['Pizza Place']
pizza_place_toronto = city_of_toronto_venues[city_of_toronto_venues['Venue Category'].isin(query)]

In [526]:
pizza_place_toronto.reset_index(drop=True)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Eringate-Centennial-West Deane,43.659908,-79.583317,Pizza Pizza,43.660392,-79.582686,Pizza Place
1,Etobicoke West Mall,43.645115,-79.568773,Pizza Hut,43.641845,-79.576556,Pizza Place
2,Etobicoke West Mall,43.645115,-79.568773,Little Caesars Pizza,43.653209,-79.573385,Pizza Place
3,Markland Wood,43.633569,-79.570763,Pizzaville,43.631250,-79.575464,Pizza Place
4,Alderwood,43.601710,-79.545238,Il Paesano Pizzeria & Restaurant,43.601280,-79.545028,Pizza Place
5,Alderwood,43.601710,-79.545238,Pizza Pizza,43.605340,-79.547252,Pizza Place
6,Long Branch,43.593421,-79.538164,850 Degrees,43.595276,-79.529537,Pizza Place
7,Willowridge-Martingrove-Richview,43.720635,-79.584248,Domino's Pizza,43.719365,-79.594622,Pizza Place
8,University,43.662892,-79.395656,Pi Co.,43.670107,-79.389852,Pizza Place
9,Palmerston-Little Italy,43.659965,-79.417477,Apiecalypse Now! Vegan Pizza & Snack Bar,43.663399,-79.418651,Pizza Place


In [535]:
# count the Pizza Place venues found in each neighborhood
one_venue_pizza = pizza_place_toronto.groupby('Neighborhood')['Venue'].count().to_frame()

In [539]:
one_venue_pizza.reset_index(inplace=True)

In [543]:
# keep the neighborhoods that have exactly one Pizza Place
q = [1]
one_venue_pizza = one_venue_pizza[one_venue_pizza['Venue'].isin(q)]

In [550]:
one_venue_pizza.reset_index(drop=True, inplace=True)

### Neighborhoods in the Toronto region with exactly one venue in the Pizza Place category

In [551]:
one_venue_pizza

Unnamed: 0,Neighborhood,Venue
0,Agincourt North,1
1,Agincourt South-Malvern West,1
2,Bathurst Manor,1
3,Bay Street Corridor,1
4,Bendale,1
5,Brookhaven-Amesbury,1
6,Cabbagetown-South St. James Town,1
7,Casa Loma,1
8,Centennial Scarborough,1
9,Church-Yonge Corridor,1
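
### The filter above keeps neighborhoods with exactly one Pizza Place. Neighborhoods with none never appear in `pizza_place_toronto`, so they cannot show up in this table. Below is a minimal sketch of counting that also keeps the zero-count neighborhoods. `toy_venues` is a made-up stand-in for `city_of_toronto_venues`, and the grouping approach is an alternative to the cells above, not the notebook's own method.

```python
import pandas as pd

# Toy stand-in for city_of_toronto_venues; rows are illustrative only.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "C", "C"],
    "Venue": ["Pizza One", "Pizza Two", "Slice Bar", "Cafe X", "Park Y", "Gym Z"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Pizza Place",
                       "Café", "Park", "Gym"],
})

# Flag pizza venues, then sum the flags per neighborhood. Every neighborhood
# stays in the grouping, so those without a pizza place get a count of 0.
is_pizza = toy_venues["Venue Category"].eq("Pizza Place")
pizza_counts = (
    is_pizza.groupby(toy_venues["Neighborhood"]).sum()
    .rename("Pizza Places")
    .reset_index()
)

exactly_one = pizza_counts[pizza_counts["Pizza Places"] == 1]
at_most_one = pizza_counts[pizza_counts["Pizza Places"] <= 1]

print(exactly_one)
print(at_most_one)
```

### Whether zero-count neighborhoods belong in the shortlist depends on how "potential" is defined; the sketch only shows that both readings are easy to compute.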


## RESULTS AND DISCUSSION <a name="results"></a>

### Using the geocode API to obtain coordinates and the Foursquare search endpoint to run the query, we can take the type of food a user wants to try in a particular location and return recommended nearby places
### Our findings show that the neighborhoods in the Toronto region have a great variety of venue categories
### We can also identify which venue category each neighborhood is best known for, based on how frequently that category occurs in the neighborhood
### We were also able to identify the potential neighborhoods that currently have only one Pizza Place venue
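
### The frequency-based ranking mentioned above can be sketched on toy data as follows. The data frame is made up, and `pd.crosstab` is used here as a compact stand-in; the notebook's own frequency computation may differ.

```python
import pandas as pd

# Toy venue list; names and categories are illustrative only.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
    "Venue Category": ["Park", "Park", "Café",
                       "Pizza Place", "Pizza Place", "Café"],
})

# Share of each venue category within a neighborhood (each row sums to 1).
freq = pd.crosstab(
    toy_venues["Neighborhood"],
    toy_venues["Venue Category"],
    normalize="index",
)

# Most frequent venue category per neighborhood.
top_category = freq.idxmax(axis=1)

print(freq.round(2))
print(top_category)
```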

## CONCLUSION <a name="conclusion"></a>

### Finally, our main objectives have been achieved, and we were able to find the necessary results.
### As further work, this model could be connected to the backend of a web application that serves all the necessary results in the frontend.

### Lastly, selecting a location for a particular restaurant, a Pizza Place in this case, depends on many other factors, such as land price, area, and the population demographics of that area.
### This project is an attempt to take a step further and solve a problem with the help of technology, and no product can be perfect on the first attempt at building it. So, keep breaking and making things, fellas!!

### I thank the Coursera and IBM teams for bringing together all the course materials, organizing them, and preparing all the assignments. This course is fantastic for anyone who wants to learn about Data Science.