# Capstone Project - The Battle of the Neighborhoods

## Table of contents:

* [Section-1 Introduction:](#introduction)
* [Section-2 Data:](#data)
* [Section-3 Methodology:](#methodology)
* [Section-4 Analysis:](#analysis)
* [Section-5 Results and Discussion:](#results)
* [Section-6 Conclusion:](#conclusion)

##  Section-1 Introduction:

In this project we will try to find the best neighborhood in the financial capitals of 2 big North American countries, namely United States and Canada, for a restaurant investment.

Since both **New York and Toronto** are big multicultural cities, we will attempt at finding the **most optimal restaurant** that most people from the neighborhood are likely to go.

**Thus, the main goal of this project is to find the best restaurant type for financial investments and find out the most optimal neighborhood for it**.

## Section-2 Data:

##### Now that we have a clear goal in mind for this project, let us look at what data will be used for it:
* New York neighborhood data obtained from IBM cloud storage
* Toronto postal codes data that is scraped from Wikipedia
* Toronto neighborhood coordinates data obtained from IBM cloud storage
* Map data used via Folium library as well as finding coordinates of a given location via geopy library
* All the restaurant related data (names, location, frequency of customers, popularity) from Foursquare API

## Section-3 Methodology:


Following strategies will be used to acquire, pre-process, and clean the datasets:
* The New York city data will be obtained from the IBM cloud storage via a json file provided earlier in the course. The dataset will be converted from json into a pandas dataframe containing the neighborhood names as well as their latitude and longitude coordinates.
* The Toronto data would be acquired from two sources, the neighbordhood names will be scraped from Wikipedia page while the Coordinate data will be obtained from a csv file provided previously in this course.
* Foursquare API will be extensively used in order to find the top-10 most visited restaurants in each city which will then be loaded in seperate dataset for each city awaiting further analysis.

### Importing Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
from sklearn import preprocessing

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium
import folium # map rendering library

!pip install bs4
!pip install requests
from bs4 import BeautifulSoup # Webscrape data and load it into a dataframe
import requests # Make http requests

print('Libraries imported.')

  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
Libraries imported.


### Loading New York Neighborhoods Data

In [2]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)


In [3]:
# Cleaning the data

ny_neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
ny_neighborhoods = pd.DataFrame(columns=column_names)
for data in ny_neighborhoods_data:
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    ny_neighborhoods = ny_neighborhoods.append({'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
ny_neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Wakefield,40.894705,-73.847201
1,Co-op City,40.874294,-73.829939
2,Eastchester,40.887556,-73.827806
3,Fieldston,40.895437,-73.905643
4,Riverdale,40.890834,-73.912585
5,Kingsbridge,40.881687,-73.902818
6,Marble Hill,40.876551,-73.91066
7,Woodlawn,40.898273,-73.867315
8,Norwood,40.877224,-73.879391
9,Williamsbridge,40.881039,-73.857446


### Loading Toronto Neighborhoods Data

In [4]:
link = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(link,'html5lib')
table = soup.find('table')

In [5]:
table_contents=[]
for row in table.findAll('td'):
    cell={}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)
print(table_contents)

[{'PostalCode': 'M3A', 'Borough': 'North York', 'Neighborhood': 'Parkwoods'}, {'PostalCode': 'M4A', 'Borough': 'North York', 'Neighborhood': 'Victoria Village'}, {'PostalCode': 'M5A', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Regent Park, Harbourfront'}, {'PostalCode': 'M6A', 'Borough': 'North York', 'Neighborhood': 'Lawrence Manor, Lawrence Heights'}, {'PostalCode': 'M7A', 'Borough': "Queen's Park", 'Neighborhood': 'Ontario Provincial Government'}, {'PostalCode': 'M9A', 'Borough': 'Etobicoke', 'Neighborhood': 'Islington Avenue'}, {'PostalCode': 'M1B', 'Borough': 'Scarborough', 'Neighborhood': 'Malvern, Rouge'}, {'PostalCode': 'M3B', 'Borough': 'North York', 'Neighborhood': 'Don Mills North'}, {'PostalCode': 'M4B', 'Borough': 'East York', 'Neighborhood': 'Parkview Hill, Woodbine Gardens'}, {'PostalCode': 'M5B', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Garden District, Ryerson'}, {'PostalCode': 'M6B', 'Borough': 'North York', 'Neighborhood': 'Glencairn'}, {'PostalCode': 'M9

In [7]:
# Cleaning the data

tor_neighborhoods=pd.DataFrame(table_contents)
tor_neighborhoods['Borough']=tor_neighborhoods['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
tor_neighborhoods = tor_neighborhoods.drop(['Borough'], axis=1)
display(tor_neighborhoods)

Unnamed: 0,PostalCode,Neighborhood
0,M3A,Parkwoods
1,M4A,Victoria Village
2,M5A,"Regent Park, Harbourfront"
3,M6A,"Lawrence Manor, Lawrence Heights"
4,M7A,Ontario Provincial Government
5,M9A,Islington Avenue
6,M1B,"Malvern, Rouge"
7,M3B,Don Mills North
8,M4B,"Parkview Hill, Woodbine Gardens"
9,M5B,"Garden District, Ryerson"


#### Loading geospatial data from a csv file provided by IBM in this course

In [8]:
tor_geosp_data = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv')
tor_geosp_data.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
tor_neighborhoods = pd.merge(tor_neighborhoods, tor_geosp_data, on='PostalCode')
tor_neighborhoods = tor_neighborhoods.drop(['PostalCode'], axis=1)
tor_neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Parkwoods,43.753259,-79.329656
1,Victoria Village,43.725882,-79.315572
2,"Regent Park, Harbourfront",43.65426,-79.360636
3,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,Ontario Provincial Government,43.662301,-79.389494
5,Islington Avenue,43.667856,-79.532242
6,"Malvern, Rouge",43.806686,-79.194353
7,Don Mills North,43.745906,-79.352188
8,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,"Garden District, Ryerson",43.657162,-79.378937


#### Finding lattitude and longitude of Toronto and New York City

In [9]:
address_ny = 'New York City, NY'
geolocator_ny = Nominatim(user_agent="ny_explorer")
location_ny = geolocator_ny.geocode(address_ny)
latitude_ny = location_ny.latitude
longitude_ny = location_ny.longitude


address_tor = 'Toronto, ONT'
geolocator_tor = Nominatim(user_agent="ont_explorer")
location_tor = geolocator_tor.geocode(address_tor)
latitude_tor = location_tor.latitude
longitude_tor = location_tor.longitude

print('The geographical coordinate of New York City are {}, {} and'.format(latitude_ny, longitude_ny))
print('the geographical coordinate of Toronto are {}, {}.'.format(latitude_tor, longitude_tor))

mid_lat = (latitude_ny + latitude_tor)/2
mid_long = (longitude_ny + longitude_tor)/2

The geographical coordinate of New York City are 40.7127281, -74.0060152 and
the geographical coordinate of Toronto are 43.678523999999996, -79.62912913064454.


### Visualizing the neighborhoods of New York City and Toronto on Map created via Folium library

In [10]:
# create map of New York using latitude and longitude values
map_newyork_toronto = folium.Map(location=[mid_lat, mid_long], zoom_start=5)

# add ny markers to map
for lat, lng, neighborhood in zip(ny_neighborhoods['Latitude'], ny_neighborhoods['Longitude'], ny_neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork_toronto)  

# add tor markers to map
for lat, lng, neighborhood in zip(tor_neighborhoods['Latitude'], tor_neighborhoods['Longitude'], tor_neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#00ff00',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork_toronto)

map_newyork_toronto

### Foursquare

#### Defining Foursquare Credentials

In [11]:
CLIENT_ID = 'X2TMWYD1XGUQ5S3TN5KLDFMRXTN3UD30JKFUTZ445AMTX5JR' # Foursquare ID
CLIENT_SECRET = 'PJ0JQ40GQE1XVRG2YOOMBZEYVPW2HQY1GX3AJR2HVNDJETAE' # Foursquare Secret
ACCESS_TOKEN = 'DTXZLU1BKF3I4HILBKRDYIPHE3TKCSUKNKFHUYURFETADW2M' # Access Token
VERSION = '20180605' # Foursquare API version
LIMIT = 10 # A default Foursquare API limit value
search_query = 'Restaurant'

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: X2TMWYD1XGUQ5S3TN5KLDFMRXTN3UD30JKFUTZ445AMTX5JR
CLIENT_SECRET:PJ0JQ40GQE1XVRG2YOOMBZEYVPW2HQY1GX3AJR2HVNDJETAE


#### Creating a function that uses Foursquare API to look for top 10 restaurants in each neighborhood

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name1, lat1, lng1 in zip(names, latitudes, longitudes):
        print(name1)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&query=restaurant&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat1, 
            lng1, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name1, 
            lat1, 
            lng1, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Finding restaurants in New York

In [13]:
ny_venues = getNearbyVenues(names=ny_neighborhoods['Neighborhood'],
                                   latitudes=ny_neighborhoods['Latitude'],
                                   longitudes=ny_neighborhoods['Longitude']
                                  )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [14]:
print(ny_venues.shape)
ny_venues.head()

(2524, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Dunkin',40.890459,-73.849089,Donut Shop
1,Wakefield,40.894705,-73.847201,Cooler Runnings Jamaican Restaurant,40.898083,-73.850259,Caribbean Restaurant
2,Wakefield,40.894705,-73.847201,Subway,40.890468,-73.849152,Restaurant
3,Wakefield,40.894705,-73.847201,Pitman Deli,40.896744,-73.844398,Food
4,Wakefield,40.894705,-73.847201,Central Deli,40.896728,-73.844387,Deli / Bodega


#### Utilizing one hot encoding technique to arrange the data by restaurant category

In [15]:
# one hot encoding
ny_onehot = pd.get_dummies(ny_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ny_onehot['Neighborhood'] = ny_venues['Neighborhood'] 

# move neighborhood column to the first column
first_column = ny_onehot.pop('Neighborhood')
ny_onehot.insert(0, 'Neighborhood', first_column)

ny_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dosa Place,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hot Dog Joint,Hotpot Restaurant,Indian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,Peruvian Restaurant,Pet Café,Pizza Place,Polish Restaurant,Portuguese Restaurant,Puerto Rican Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Snack Place,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tex-Mex Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Varenyky restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [16]:
ny_grouped = ny_onehot.groupby('Neighborhood').mean().reset_index()
num_top_venues = 10

for hood in ny_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = ny_grouped[ny_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))

----Allerton----
                  venue  freq
0           Pizza Place   0.2
1   Fried Chicken Joint   0.1
2            Restaurant   0.1
3            Donut Shop   0.1
4  Fast Food Restaurant   0.1
5         Deli / Bodega   0.1
6                  Food   0.1
7        Breakfast Spot   0.1
8    Chinese Restaurant   0.1
9   Peruvian Restaurant   0.0
----Annadale----
                     venue  freq
0      American Restaurant   0.2
1              Pizza Place   0.2
2         Sushi Restaurant   0.1
3               Restaurant   0.1
4                    Diner   0.1
5            Deli / Bodega   0.1
6                   Bakery   0.1
7                     Food   0.1
8  New American Restaurant   0.0
9         Ramen Restaurant   0.0
----Arden Heights----
                     venue  freq
0            Deli / Bodega   0.5
1              Pizza Place   0.5
2        Afghan Restaurant   0.0
3      Moroccan Restaurant   0.0
4               Restaurant   0.0
5         Ramen Restaurant   0.0
6  Puerto Rican Rest

                         venue  freq
0           Chinese Restaurant   0.3
1                       Bakery   0.1
2                  Pizza Place   0.1
3  Eastern European Restaurant   0.1
4           Mexican Restaurant   0.1
5               Breakfast Spot   0.1
6           Spanish Restaurant   0.1
7           Italian Restaurant   0.1
8            Polish Restaurant   0.0
9                 Noodle House   0.0
----Brooklyn Heights----
                     venue  freq
0          Thai Restaurant   0.1
1              Pizza Place   0.1
2      Japanese Restaurant   0.1
3       Falafel Restaurant   0.1
4       Italian Restaurant   0.1
5         Asian Restaurant   0.1
6                    Diner   0.1
7  New American Restaurant   0.1
8            Deli / Bodega   0.1
9         Sushi Restaurant   0.1
----Brookville----
                     venue  freq
0            Deli / Bodega   1.0
1        Afghan Restaurant   0.0
2      Moroccan Restaurant   0.0
3               Restaurant   0.0
4         Ramen Resta

                      venue  freq
0                    Bakery   0.2
1        Italian Restaurant   0.1
2  Mediterranean Restaurant   0.1
3            Breakfast Spot   0.1
4             Deli / Bodega   0.1
5            Sandwich Place   0.1
6                Restaurant   0.1
7       Japanese Restaurant   0.1
8       American Restaurant   0.1
9          Ramen Restaurant   0.0
----Dyker Heights----
                     venue  freq
0       Italian Restaurant   0.2
1               Food Truck   0.2
2               Bagel Shop   0.2
3       Mexican Restaurant   0.2
4                     Food   0.2
5               Restaurant   0.0
6         Ramen Restaurant   0.0
7  Puerto Rican Restaurant   0.0
8    Portuguese Restaurant   0.0
9        Polish Restaurant   0.0
----East Elmhurst----
                       venue  freq
0                 Donut Shop   0.2
1        American Restaurant   0.1
2  Latin American Restaurant   0.1
3              Deli / Bodega   0.1
4                Snack Place   0.1
5  South 

                       venue  freq
0                 Food Truck   0.1
1                 Donut Shop   0.1
2             Sandwich Place   0.1
3          Indian Restaurant   0.1
4         Falafel Restaurant   0.1
5                 Bagel Shop   0.1
6       Fast Food Restaurant   0.1
7         Mexican Restaurant   0.1
8  Middle Eastern Restaurant   0.1
9                Pizza Place   0.1
----Glendale----
                     venue  freq
0              Pizza Place  0.43
1            Deli / Bodega  0.29
2                   Bakery  0.14
3       Chinese Restaurant  0.14
4        Afghan Restaurant  0.00
5  New American Restaurant  0.00
6               Restaurant  0.00
7         Ramen Restaurant  0.00
8  Puerto Rican Restaurant  0.00
9    Portuguese Restaurant  0.00
----Gowanus----
                    venue  freq
0             Pizza Place   0.2
1      Italian Restaurant   0.2
2     American Restaurant   0.1
3        Ramen Restaurant   0.1
4  Argentinian Restaurant   0.1
5       Indian Restaurant  

                       venue  freq
0           Sushi Restaurant   0.2
1  Middle Eastern Restaurant   0.2
2                Pizza Place   0.2
3              Deli / Bodega   0.1
4                       Food   0.1
5               Burger Joint   0.1
6                 Restaurant   0.1
7  Latin American Restaurant   0.0
8        Peruvian Restaurant   0.0
9           Ramen Restaurant   0.0
----Kingsbridge----
                       venue  freq
0         Mexican Restaurant   0.3
1                Pizza Place   0.2
2                Wings Joint   0.1
3  Latin American Restaurant   0.1
4         Spanish Restaurant   0.1
5               Burger Joint   0.1
6       Caribbean Restaurant   0.1
7    New American Restaurant   0.0
8           Ramen Restaurant   0.0
9    Puerto Rican Restaurant   0.0
----Kingsbridge Heights----
                       venue  freq
0                Pizza Place   0.3
1                 Restaurant   0.1
2      Vietnamese Restaurant   0.1
3                      Diner   0.1
4  Lati

----Morningside Heights----
                     venue  freq
0           Sandwich Place   0.2
1      American Restaurant   0.2
2         Greek Restaurant   0.1
3                     Café   0.1
4             Burger Joint   0.1
5       Mexican Restaurant   0.1
6               Food Truck   0.1
7               Restaurant   0.1
8        Polish Restaurant   0.0
9  New American Restaurant   0.0
----Morris Heights----
                     venue  freq
0       Spanish Restaurant  0.38
1            Deli / Bodega  0.25
2              Pizza Place  0.12
3                     Food  0.12
4                   Buffet  0.12
5        Afghan Restaurant  0.00
6  New American Restaurant  0.00
7         Ramen Restaurant  0.00
8  Puerto Rican Restaurant  0.00
9    Portuguese Restaurant  0.00
----Morris Park----
                     venue  freq
0              Pizza Place   0.3
1            Deli / Bodega   0.2
2       Italian Restaurant   0.1
3         Arepa Restaurant   0.1
4               Donut Shop   0.1
5    

                     venue  freq
0              Pizza Place  0.29
1         Sushi Restaurant  0.14
2       Italian Restaurant  0.14
3               Bagel Shop  0.14
4       Chinese Restaurant  0.14
5               Food Truck  0.14
6      Moroccan Restaurant  0.00
7         Ramen Restaurant  0.00
8  Puerto Rican Restaurant  0.00
9    Portuguese Restaurant  0.00
----Prospect Heights----
                       venue  freq
0                       Café   0.1
1       Caribbean Restaurant   0.1
2  Cajun / Creole Restaurant   0.1
3          Korean Restaurant   0.1
4                      Diner   0.1
5            Thai Restaurant   0.1
6           Sushi Restaurant   0.1
7               Burger Joint   0.1
8         Mexican Restaurant   0.1
9                Pizza Place   0.1
----Prospect Lefferts Gardens----
                venue  freq
0                Café   0.3
1    Sushi Restaurant   0.1
2              Bakery   0.1
3  Italian Restaurant   0.1
4         Pizza Place   0.1
5           Gastropub   0

                             venue  freq
0               Italian Restaurant   0.3
1                           Bakery   0.2
2    Vegetarian / Vegan Restaurant   0.1
3                      Pizza Place   0.1
4         Mediterranean Restaurant   0.1
5               Spanish Restaurant   0.1
6                French Restaurant   0.1
7  Molecular Gastronomy Restaurant   0.0
8          Puerto Rican Restaurant   0.0
9            Portuguese Restaurant   0.0
----Soundview----
                       venue  freq
0         Chinese Restaurant   0.3
1                Pizza Place   0.2
2  Latin American Restaurant   0.1
3                       Food   0.1
4             Breakfast Spot   0.1
5               Burger Joint   0.1
6        Fried Chicken Joint   0.1
7          Afghan Restaurant   0.0
8               Noodle House   0.0
9                 Restaurant   0.0
----South Beach----
                     venue  freq
0            Deli / Bodega   1.0
1        Afghan Restaurant   0.0
2      Moroccan Restaurant 

                 venue  freq
0                 Café   0.2
1           Restaurant   0.1
2   Italian Restaurant   0.1
3     Ramen Restaurant   0.1
4        Deli / Bodega   0.1
5               Bakery   0.1
6          Pizza Place   0.1
7             Pet Café   0.1
8         Burger Joint   0.1
9  Moroccan Restaurant   0.0
----Weeksville----
                   venue  freq
0     Chinese Restaurant   0.3
1                 Bakery   0.1
2    American Restaurant   0.1
3             Food Truck   0.1
4             Donut Shop   0.1
5                  Diner   0.1
6   Caribbean Restaurant   0.1
7          Deli / Bodega   0.1
8  Portuguese Restaurant   0.0
9            Pizza Place   0.0
----West Brighton----
                 venue  freq
0   Italian Restaurant   0.2
1           Bagel Shop   0.1
2   Mexican Restaurant   0.1
3    German Restaurant   0.1
4                 Café   0.1
5           Taco Place   0.1
6         Burger Joint   0.1
7          Wings Joint   0.1
8  American Restaurant   0.1
9   Tibet

#### Arranging the data by most frequently visited categories

In [17]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [18]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Type of Restaurant'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Type of Restaurant'.format(ind+1))

# create a new dataframe
ny_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
ny_neighborhoods_venues_sorted['Neighborhood'] = ny_grouped['Neighborhood']

for ind in np.arange(ny_grouped.shape[0]):
    ny_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ny_grouped.iloc[ind, :], num_top_venues)

ny_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Type of Restaurant,2nd Most Common Type of Restaurant,3rd Most Common Type of Restaurant,4th Most Common Type of Restaurant,5th Most Common Type of Restaurant,6th Most Common Type of Restaurant,7th Most Common Type of Restaurant,8th Most Common Type of Restaurant,9th Most Common Type of Restaurant,10th Most Common Type of Restaurant
0,Allerton,Pizza Place,Restaurant,Deli / Bodega,Chinese Restaurant,Fried Chicken Joint,Fast Food Restaurant,Breakfast Spot,Donut Shop,Food,Fish & Chips Shop
1,Annadale,American Restaurant,Pizza Place,Restaurant,Bakery,Diner,Sushi Restaurant,Food,Deli / Bodega,Eastern European Restaurant,Fish & Chips Shop
2,Arden Heights,Deli / Bodega,Pizza Place,Wings Joint,Food,Dosa Place,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant
3,Arlington,American Restaurant,Deli / Bodega,Fast Food Restaurant,Caribbean Restaurant,Wings Joint,Food Court,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant
4,Arrochar,Pizza Place,Italian Restaurant,Polish Restaurant,Mediterranean Restaurant,Restaurant,Deli / Bodega,Bagel Shop,Middle Eastern Restaurant,Dumpling Restaurant,Eastern European Restaurant


### Finding restaurants in Toronto

In [None]:
tor_venues = getNearbyVenues(names=tor_neighborhoods['Neighborhood'],
                                   latitudes=tor_neighborhoods['Latitude'],
                                   longitudes=tor_neighborhoods['Longitude']
                                  )


In [None]:
print(tor_venues.shape)
tor_venues.head()

#### Utilizing one hot encoding technique to arrange the data by restaurant category

In [None]:
# one hot encoding
tor_onehot = pd.get_dummies(tor_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
tor_onehot['Neighborhood'] = tor_venues['Neighborhood'] 

# move neighborhood column to the first column
first_column = tor_onehot.pop('Neighborhood')
tor_onehot.insert(0, 'Neighborhood', first_column)

tor_onehot.head()

In [None]:
tor_grouped = tor_onehot.groupby('Neighborhood').mean().reset_index()
num_top_venues = 10

for hood in tor_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = tor_grouped[tor_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))

#### Arranging the data by most frequently visited categories

In [None]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Type of Restaurant'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Type of Restaurant'.format(ind+1))

# create a new dataframe
tor_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
tor_neighborhoods_venues_sorted['Neighborhood'] = tor_grouped['Neighborhood']

for ind in np.arange(tor_grouped.shape[0]):
    tor_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tor_grouped.iloc[ind, :], num_top_venues)

tor_neighborhoods_venues_sorted.head()

## Section-4 Analysis:

Now that we have top-10 most visited restaurants in every neighborhood of New York City as well as Toronto, let us dive in the analysis section.

### Analysis 4.1
* Firstly, we will drop the neighborhood column from the "_neighborhoods_venues_sorted" dataset as part of cleaning.
* Then we will count the number of times each restaurant category has been in top-10 for every neighborhood.
* We would change the NaN values into 0 to prepare the data for further analysis.
* We will then arrange the data in descending order so we can see which restaurant type was in top-10 most number of times for a given city.
* Next, we will normalize the data by min-max method.
* The, we will add restaurant type labels in the normalized value dataframe.
* Repeating the process for both cities.
* Now, we will merge the normalized value dataframe of both cities by restaurant types.
* We will add the normalized value for NY and Toronto for each restaurant types and find the one with maximum value.
* This restaurant is our optimum restaurant type for investment as it has been in top-10 most visited for most neighborhoods.

### Analyzing Data to find out which restaurant category are most frequently visited by people of New York and Toronto

In [None]:
ny_analysis = ny_neighborhoods_venues_sorted.drop('Neighborhood', inplace=False, axis=1) # Dropping neighborhood in order to analyze only restaurant categories
ny_analysis = ny_analysis.apply(pd.Series.value_counts) # Counting number of times a given category was in top-10
ny_analysis = ny_analysis.fillna(0) # Replacing NaN values with 0 to make analysis easier
ny_analysis.head()

In [None]:
tor_analysis = tor_neighborhoods_venues_sorted.drop('Neighborhood', inplace=False, axis=1) # Dropping neighborhood in order to analyze only restaurant categories
tor_analysis = tor_analysis.apply(pd.Series.value_counts) # Counting number of times a given category was in top-10
tor_analysis = tor_analysis.fillna(0) # Replacing NaN values with 0 to make analysis easier
tor_analysis.head()

In [None]:
# Finding the restaurant type that was in top 10 most number of times and arranging in descending order
ny_sum = ny_analysis.sum(axis=1).sort_values(ascending=False)
tor_sum = tor_analysis.sum(axis=1).sort_values(ascending=False)

#Converting variables into dataframe
ny_sum = pd.DataFrame(ny_sum)


tor_sum = pd.DataFrame(tor_sum)



In [None]:
# Normalizing NY data
x_ny = ny_sum.values #returns a numpy array
min_max_scaler_ny = preprocessing.MinMaxScaler()
x_ny_scaled = min_max_scaler_ny.fit_transform(x_ny)
ny_sum_norm = pd.DataFrame(x_ny_scaled)
ny_sum_norm.columns = ['Normalized Frequency NY']


# Normalizing Toronto data
x_tor = tor_sum.values #returns a numpy array
min_max_scaler_tor = preprocessing.MinMaxScaler()
x_tor_scaled = min_max_scaler_tor.fit_transform(x_tor)
tor_sum_norm = pd.DataFrame(x_tor_scaled)
tor_sum_norm.columns = ['Normalized Frequency Tor']


In [None]:
# Preparing the data:
ny_sum.index.name = 'Restaurant type'
ny_sum.reset_index(inplace=True)
ny_sum.columns = ['Restaurant type', 'Frequency NY']

tor_sum.index.name = 'Restaurant type'
tor_sum.reset_index(inplace=True)
tor_sum.columns = ['Restaurant type', 'Frequency Tor']



In [None]:
ny_sum_norm = ny_sum.join(ny_sum_norm)
tor_sum_norm = tor_sum.join(tor_sum_norm)


In [None]:
# Merging data of both cities
both_city_sum_norm = pd.merge(ny_sum_norm, tor_sum_norm, on='Restaurant type')
both_city_sum_norm

In [None]:
both_city_sum_norm = both_city_sum_norm.set_index('Restaurant type')
both_city_sum_norm = both_city_sum_norm.drop(['Frequency NY','Frequency Tor'], axis=1)

best_category_both_city = both_city_sum_norm.sum(axis=1).sort_values(ascending=False)
best_restaurant_type = best_category_both_city.head(1)


In [None]:
best_restaurant_type = pd.DataFrame(best_restaurant_type)
best_restaurant_type.index.name = 'Restaurant type'
best_restaurant_type.reset_index(inplace=True)
best_restaurant_type.columns = ['Restaurant type', 'Total Normalized Score in both cities']
best_restaurant_type_name = best_restaurant_type.iat[0,0]
print('The best restaurant category to open in New York City as well as Toronto is:', best_restaurant_type_name)

### Analysis 4.2

Let us find which neighborhood in NY City and Toronto should the restaurant be opened:
* We will find the neighborhoods from the "_neighborhoods_venues_sorted" dataframe where our above optimal restaurant type is most visisted since that neighborhood will have most people who like such restaurant types
* We will repeat it for dataset of both cities

In [None]:
# Finding the neighborhood where the given restaurant type is most common;y visited:

ny_best_neighborhood = ny_neighborhoods_venues_sorted[ny_neighborhoods_venues_sorted['1st Most Common Type of Restaurant'].str.contains(best_restaurant_type_name)]
ny_best_neighborhood = ny_best_neighborhood.iat[0,0]

tor_best_neighborhood = tor_neighborhoods_venues_sorted[tor_neighborhoods_venues_sorted['1st Most Common Type of Restaurant'].str.contains(best_restaurant_type_name)]
tor_best_neighborhood = tor_best_neighborhood.iat[0,0]

print('The best neighborhoods to open', best_restaurant_type_name, 'type of restaurant are', ny_best_neighborhood, ' for New York and,', tor_best_neighborhood, 'for Toronto')

## Section-5 Results and Discussion:

The thorough analysis of the location data by statistical means has yeilded the result that Deli/Bodega type of restaurants are consistently enjoyed by majority of neighborhoods in both New York City in United States as well as City of Toronto in Canada. The best neighborhoods to open this type of restaurant are Arden Heights in New York and either of North Park, Maple Leaf Park, or Upwood Park in Toronto since people from these neighborhoods are most likely to visit Deli/Bodega type of restaurant as it was the most frequency in respective neighborhoods.

Apart from the given analysis, there are some other factors to consider before making this investment, that this project has not taken into account due to scarcity of such data avaiable on any open source platform, which include competition in the same restaurant type sector, profitability of such restaurants, ease of availability of labor in the neighborhood to run such restaurant type, initial investment cost as well as maintenance cost.


## Section-6 Conclusion:

Thus, as per the thorough analysis presented by this project, Deli/Bodega type restaurants are most likely to be visited by people of New York City as well as Toronto and the neighborhood of Arden Heights in New York and North Park, Maple Leaf Park, or Upwood Park in Toronto seem to be most optimum for this investment.