# Capstone Project - The Battle of the Neighborhoods (Week 2)

## Restaurant equipment supplier by Juan Luis Mejía Villa - February 2020

### *Description of the problem and a discussion of the background*

A multinational company that manufactures home appliances and mechanical/electronic products such as refrigerators, microwaves, ovens and stoves wants to diversify with a new, complementary product line aimed at a specific market niche in a promising location.  
The restaurant industry in North America has shown consistent growth of 2.1% over the last 20 years (https://aaronallen.com/blog/restaurant-industry-growth).
The company is therefore interested in starting a new business that supplies tools and equipment to all kinds of restaurants. It chose North America as the starting point. Since this will be a large investment, the choice of the city for the main warehouse-store must be a wise one.   
That city has to be truly promising, with a large number of restaurants that could become clients of the new supply store.

Another research team selected one city from each of two North American countries (USA and Canada) so that they could be compared. My first task is to decide which of the two is the better place to make the investment and open the first store.  

For Canada, the foodservice analytics firm CHD Expert reported the following (https://www.chd-expert.com/blog/press_release/the-canadian-restaurant-industry-landscape-why-is-toronto-unique/):

> Chicago-based foodservice database, marketing, and analytics firm, CHD Expert has new data indicating that independent restaurant operators are making an impact in Toronto’s foodservice market landscape. The new data reveals that 69.3 percent of restaurants in Toronto are independent restaurants (1 to 9 units), whereas only 63.4 percent of nationwide restaurants are considered independent. This release will focus on the Canadian restaurant landscape, specifically focusing on Toronto’s independent restaurant operator growth, and popular menu types.

Toronto was therefore chosen as the Canadian candidate: its growing share of independent restaurants could make it a really promising market.  

For New York City, a Forbes contributor wrote (https://www.forbes.com/sites/garyocchiogrosso/2019/12/20/the-new-york-city-restaurant-business-is-so-much-more-than-just-the-center-of-the-plate/#3d705859639c):

> The restaurant business in New York City is like no other business in the world. The rent structure, the volume of business, minimum wage pay scale, spotlight and notoriety, 3rd party online order, celeb chefs, and delivery platforms, as well as the ever-increasing regulation set forth by NYC, make operating a restaurant in NYC exciting, exhausting, and sometimes as nerve-wracking as bungee jumping. As a result, national organizations established to support restaurant operators in other parts of the country very often do not connect with restaurant issues in New York City. The unique rewards and challenges facing restaurateurs are often more complicated, misunderstood, or not embraced at all. As a franchise consultant in the restaurant development space, my experience has been that when it comes to addressing and assisting restaurateurs in New York, one size does not fit all.

This perspective makes New York a very attractive option for starting the business.  

The stakeholders of the project are the company owners and managers. They have already decided to go all in: they will invest whatever is necessary, because their idea is to start big. That is why they want to begin in the city with the best profile and the largest number of potential clients. They need an exhaustive analysis as a solid foundation for the decision, and this is where the analytics team comes in to solve the business problem.  
The second part of the problem comes after choosing the city. The stakeholders made it very clear that the store's location must be strategic: ideally, it should be in a zone whose neighborhoods contain a large number of restaurants across a variety of categories, because it is important to show that the products can be used in different cuisines, in order to attract diverse clients and grow faster.
That is the second task: a complete analysis is needed to tell the stakeholders, with good certainty, where to locate the store. It is very important that the location puts the store close to the largest possible number of restaurants. The stakeholders want an outstanding level of logistics so the store can respond to client needs as fast as possible, since this level of service will be key to penetrating the market as they intend. A location near many restaurants would also be great for advertising the store.
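The logistics requirement (being close to as many restaurants as possible) can be made concrete by counting the restaurants within a fixed distance of each candidate site. A minimal sketch, assuming straight-line great-circle (haversine) distance as a proxy for delivery distance; the candidate sites and the helper names are hypothetical, while the restaurant coordinates are the Wakefield (Bronx) results retrieved later in this notebook:

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def restaurants_within(site, restaurants, radius_km):
    """Count restaurants (lat, lon) lying within radius_km of a site (lat, lon)."""
    return sum(1 for lat, lon in restaurants
               if haversine_km(site[0], site[1], lat, lon) <= radius_km)


def best_site(candidates, restaurants, radius_km=1.0):
    """Return the name of the candidate site with the most restaurants nearby."""
    return max(candidates,
               key=lambda name: restaurants_within(candidates[name], restaurants, radius_km))


# Restaurant coordinates from the Wakefield results shown further below
restaurants = [
    (40.894123, -73.845892),  # Lollipops Gelato
    (40.890487, -73.848568),  # Carvel Ice Cream
    (40.898083, -73.850259),  # Cooler Runnings Jamaican Restaurant Inc
    (40.890468, -73.849152),  # SUBWAY
    (40.890459, -73.849089),  # Dunkin'
]

# Hypothetical candidate store sites, for illustration only
candidates = {
    "site_a": (40.8910, -73.8490),
    "site_b": (40.9300, -73.8000),
}

print(best_site(candidates, restaurants, radius_km=1.0))
```

Real street distances are longer than great-circle distances, so the radius would need to be chosen with that in mind.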


---

### *Data*

For these tasks, the Foursquare API will be used to retrieve the necessary venue data.
We will use geographical data for both cities. The idea is to build, for each city, a table of its restaurants that records each restaurant's category and the neighborhood it belongs to.  
If needed, the geopy library will be used to get the latitude and longitude of the cities and neighborhoods.
With this data we can count the restaurants in each city and in each neighborhood, and by analyzing their categories we can get an indication of the diversity of clients that could be reached. For the scope of this project, no other data is needed at this point.
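Both counts described above (restaurants per neighborhood and the number of distinct categories among them) map onto a single pandas `groupby`. A sketch, assuming a restaurant table with the `Neighborhood`, `Venue` and `Venue Category` columns produced later in this notebook; the rows are a small sample of those results:

```python
import pandas as pd

# A few restaurant rows taken from the results retrieved later in this notebook
rest = pd.DataFrame({
    "Neighborhood": ["Wakefield"] * 5 + ["Guildwood"] * 3,
    "Venue": ["Lollipops Gelato", "Carvel Ice Cream",
              "Cooler Runnings Jamaican Restaurant Inc", "SUBWAY", "Dunkin'",
              "Swiss Chalet Rotisserie & Grill", "Big Bite Burrito", "Eggsmart"],
    "Venue Category": ["Dessert Shop", "Ice Cream Shop", "Caribbean Restaurant",
                       "Sandwich Place", "Donut Shop",
                       "Pizza Place", "Mexican Restaurant", "Breakfast Spot"],
})

# Number of restaurants and number of distinct categories per neighborhood,
# sorted so the busiest neighborhoods come first
summary = (rest.groupby("Neighborhood")
               .agg(restaurants=("Venue", "count"),
                    categories=("Venue Category", "nunique"))
               .sort_values("restaurants", ascending=False))
print(summary)
```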

- Toronto data will be taken from Wikipedia: the list of neighborhoods with their postal codes (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)
- Toronto locations (latitude and longitude) will be taken from the CSV provided at: https://cocl.us/Geospatial_data
- New York City data will be taken from the provided link: https://geo.nyu.edu/catalog/nyu_2451_34572
- New York City neighborhood coordinates are included in that dataset; geopy is used to find the coordinates (latitude and longitude) of the city itself, to center the map

### *Methodology*

1) Define the business problem and understand what the stakeholders want.
2) Clearly define the tasks that the analytics team must accomplish.
3) Define what data will be used to address the tasks.
4) Select the data sources.
5) Extract the data from the data sources.
6) Clean and prepare the data; in this case the data are tables that will be loaded as pandas DataFrames.
7) Enrich the neighborhood tables with latitude and longitude.
8) Map the neighborhoods.
9) Get the venues within a 500 m radius of the center of each neighborhood (up to 50 venues per neighborhood), for both cities.
10) Keep only the restaurants, filtering the venues by Venue Category for each city.
11) Compute metrics to choose the best city for the store: the number of restaurants, the number of categories, and categories per restaurant.
12) Conclude which city is best.
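Step 11 can be expressed as a small helper that computes all three metrics from the category of each restaurant. A minimal sketch; the function name `city_metrics` is hypothetical, and the notebook below computes the same quantities inline:

```python
def city_metrics(categories):
    """Compute the three comparison metrics from a list of venue categories, one entry per restaurant."""
    n_restaurants = len(categories)
    n_categories = len(set(categories))  # "diversity": distinct categories
    return {
        "restaurants": n_restaurants,
        "categories": n_categories,
        "categories_per_restaurant": n_categories / n_restaurants if n_restaurants else 0.0,
    }


# Hypothetical example: four restaurants spread over three categories
example = city_metrics(["Pizza Place", "Pizza Place", "Diner", "Sushi Restaurant"])
print(example)  # {'restaurants': 4, 'categories': 3, 'categories_per_restaurant': 0.75}
```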


#### Data extraction and cleansing

In [None]:
# Install packages; uncomment if necessary
# !conda install -c conda-forge geopy --yes
# !conda install -c conda-forge folium=0.5.0 --yes

In [6]:
# Imports
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import json

In [None]:
## New York City Data


In [7]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [8]:
neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# Collect one record per neighborhood and build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0 and is slow inside a loop)
records = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    records.append({'Borough': borough,
                    'Neighborhood': neighborhood_name,
                    'Latitude': neighborhood_lat,
                    'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(records, columns=column_names)

In [9]:
## Find the latitude and longitude of New York City (used to center the map)
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [32]:
neighborhoods.head()
# print(len(neighborhoods["Neighborhood"]))
# print(len(neighborhoods["Neighborhood"].unique()))
# print(len(neighborhoods["Borough"]))
# print(len(neighborhoods["Borough"].unique()))

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [11]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

---

In [None]:
## Toronto data


In [12]:
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url,'lxml')  # Parse the HTML of the Wikipedia page requested above
My_table = soup.find('table',{'class':'wikitable sortable'}) # The postal-code table has the HTML class "wikitable sortable"
lineas=My_table.find_all('td') # Take every table cell (<td></td>); each row has three cells: PostCode, Borough, Neighborhood

In [13]:
# Clean the cells. A cell without a link ('a' tag) holds plain text (a postal code or "Not assigned"),
# so we keep its text without the <td></td> tags and line breaks.
# A cell with a link holds a Borough or Neighborhood, whose full name is taken from the link's 'title' attribute.
lineas_new = []
for linea in lineas:
    link = linea.find('a')

    if link is None:
        linea = linea.get_text(strip=True)
    else:
        linea = str(link.get('title'))
        # Wikipedia titles often carry a ", Toronto" suffix; remove it
        if linea.endswith(", Toronto"):
            linea = linea[:-len(", Toronto")]

    lineas_new.append(linea)

In [39]:
# Each table row has three cells (PostCode, Borough, Neighborhood), so group the cleaned cells in threes.
# Building the DataFrame from the rows directly avoids writing strings into a float (zeros) table.
rows = [lineas_new[k:k + 3] for k in range(0, len(lineas_new), 3)]
Tabla = pd.DataFrame(rows, columns=["PostCode", "Borough", "Neighborhood"])
Tabla.head()

| | PostCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park |


In [40]:
## Clean details
Tabla=Tabla[Tabla['Borough']!="Not assigned"].reset_index(drop=True)
Tabla.sort_values(by=["PostCode"],inplace=True)
Tabla.reset_index(drop=True,inplace=True)
Tabla[Tabla["Neighborhood"]=="Not assigned"]


| | PostCode | Borough | Neighborhood |
|---|---|---|---|
| 182 | M9A | Queen's Park (Toronto) | Not assigned |


In [41]:
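# Where a postal code has a borough but no neighborhood, use the borough name as the neighborhood
not_assigned = Tabla["Neighborhood"] == "Not assigned"
Tabla.loc[not_assigned, "Neighborhood"] = Tabla.loc[not_assigned, "Borough"]
Tabla[Tabla["Neighborhood"]=="Not assigned"]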

| | PostCode | Borough | Neighborhood |
|---|---|---|---|


In [42]:
Tabla.head()

| | PostCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge |
| 1 | M1B | Scarborough | Malvern |
| 2 | M1C | Scarborough | Port Union |
| 3 | M1C | Scarborough | Rouge Hill |
| 4 | M1C | Scarborough | Highland Creek (Toronto) |


In [43]:
## Take locations from provided csv
longlat=pd.read_csv('https://cocl.us/Geospatial_data')

In [44]:
## Merge Tables

Tabla=pd.merge(Tabla,longlat, left_on="PostCode", right_on="Postal Code")
Tabla.drop("Postal Code",axis=1, inplace=True)
Tabla.head()

| | PostCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 |
| 1 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 |
| 2 | M1C | Scarborough | Port Union | 43.784535 | -79.160497 |
| 3 | M1C | Scarborough | Rouge Hill | 43.784535 | -79.160497 |
| 4 | M1C | Scarborough | Highland Creek (Toronto) | 43.784535 | -79.160497 |
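`pd.merge` performs an inner join by default, so a postal code missing from the coordinates CSV would silently drop its neighborhoods. A sketch of a check using the real `how`, `indicator` and `validate` parameters of `pd.merge`; the sample rows follow the tables above, and the `M9Z` row is hypothetical, added only to show how a missing match is reported:

```python
import pandas as pd

# Sample rows shaped like the Wikipedia table and the Geospatial_data CSV used above
tabla = pd.DataFrame({"PostCode": ["M1B", "M1B", "M1C", "M9Z"],  # M9Z is hypothetical
                      "Neighborhood": ["Rouge", "Malvern", "Port Union", "Example Neighborhood"]})
longlat = pd.DataFrame({"Postal Code": ["M1B", "M1C"],
                        "Latitude": [43.806686, 43.784535],
                        "Longitude": [-79.194353, -79.160497]})

# how="left" keeps every neighborhood; indicator=True adds a _merge column;
# validate="many_to_one" raises an error if a postal code appears twice in the CSV
checked = pd.merge(tabla, longlat, how="left", left_on="PostCode",
                   right_on="Postal Code", indicator=True, validate="many_to_one")

missing = checked[checked["_merge"] == "left_only"]
print(f"{len(missing)} neighborhood(s) without coordinates")
print(missing[["PostCode", "Neighborhood"]])
```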


In [45]:
## Map of Toronto

latitude =43.651070
longitude = -79.347015

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Tabla['Latitude'], Tabla['Longitude'], Tabla['Borough'], Tabla['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

---

#### Take information from Foursquare

In [46]:
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them
# (FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumed variable names)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')  # your Foursquare Secret
VERSION = '20200218'  # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + '*' * len(CLIENT_SECRET))  # do not echo the secret itself


In [48]:
## Function to get near venues for all neighborhoods
LIMIT = 50 # maximum number of venues returned by the Foursquare API per neighborhood
radius = 500 # search radius in meters
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [49]:
## Take NY venues for all neighborhoods
ny_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [51]:
ny_venues.shape

(7871, 7)

In [55]:
ny_venues.columns

Index(['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
       'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'],
      dtype='object')

In [52]:
## Take Toronto venues for all neighborhoods
toronto_venues = getNearbyVenues(names=Tabla['Neighborhood'],
                                   latitudes=Tabla['Latitude'],
                                   longitudes=Tabla['Longitude']
                                  )

Rouge
Malvern
Port Union
Rouge Hill
Highland Creek (Toronto)
Guildwood
Morningside
West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park
Ionview
Kennedy Park
Golden Mile
Oakridge
Clairlea
Cliffcrest
Scarborough Village West
Cliffside
Cliffside West
Birch Cliff
Wexford Heights
Dorset Park
Scarborough Town Centre
Maryvale
Wexford
Agincourt
Sullivan
Clarks Corners
Tam O'Shanter – Sullivan
Steeles East
Milliken, Ontario
L'Amoreaux East
Agincourt North
L'Amoreaux West
Upper Rouge
Hillcrest Village
Henry Farm
Fairview
Oriole
Bayview Village
Silver Hills
York Mills
Willowdale
Newtonbrook
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Don Mills South
Flemingdon Park
Bathurst Manor
Wilson Heights
Downsview North
Northwood Park
York University
Downsview East
CFB Toronto
Downsview
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens
Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
Riverdale
The Danforth W

In [53]:
toronto_venues.shape

(3344, 7)

#### Search for Restaurants

In [None]:
### NY

In [72]:
categories_ny=ny_venues["Venue Category"].unique()

In [88]:
list_food=['Dessert Shop', 'Ice Cream Shop', 'Caribbean Restaurant', 'Sandwich Place', 'Donut Shop', 'Food', 'Pizza Place',
       'Bagel Shop', 'Fast Food Restaurant',
       'Restaurant', 'Fried Chicken Joint',
        'Diner',
       'Seafood Restaurant',
       'Food & Drink Shop',
       'Chinese Restaurant',
       'Food Truck',
       'Latin American Restaurant', 'Burger Joint',
       'Spanish Restaurant',
       'Mexican Restaurant', 'Coffee Shop', 'Wings Joint',
       'Bakery', 'Breakfast Spot', 'Café', 
       'American Restaurant', 'Steakhouse',
       'Italian Restaurant', 'Indian Restaurant',
       'Soup Place',
       'Sushi Restaurant',
       'French Restaurant',
       'African Restaurant',
       'Burrito Place', 'Buffet',
       'Paella Restaurant',
       'Asian Restaurant',
       'Peruvian Restaurant', 'Fish & Chips Shop',
       'Cupcake Shop',
       'BBQ Joint', 'South American Restaurant', 'Sports Bar',
       'Southern / Soul Food Restaurant',
       'Middle Eastern Restaurant',
       'Arepa Restaurant', 'Eastern European Restaurant',
       'Cheese Shop', 'Thai Restaurant',
       'Japanese Restaurant', 'Comfort Food Restaurant', 'Taco Place',
       'Caucasian Restaurant', 'Greek Restaurant',
       'New American Restaurant', 'Vietnamese Restaurant',
       'Noodle House',
       'Shabu-Shabu Restaurant', 'Hotpot Restaurant', 'Halal Restaurant',
       'Creperie',
       'Polish Restaurant', 'Vegetarian / Vegan Restaurant',
       'Gastropub',
       'Mediterranean Restaurant', 'Korean Restaurant',
       'Russian Restaurant', 'Varenyky restaurant', 'Turkish Restaurant',
       'Salad Place', 'Cajun / Creole Restaurant',
       'North Indian Restaurant', 'Tapas Restaurant',
       'Cuban Restaurant', 'Pakistani Restaurant', 'Food Stand',
       'Falafel Restaurant',
       'Dumpling Restaurant', 'Filipino Restaurant',
       'Pie Shop', 'Argentinian Restaurant',
       'Ramen Restaurant', 'Israeli Restaurant',
       'Ethiopian Restaurant',
       'German Restaurant', 'Dim Sum Restaurant', 'Cantonese Restaurant',
       'Kebab Restaurant', 'Bistro',
       'Food Court',
       'Shanghai Restaurant',
       'Taiwanese Restaurant',
       'English Restaurant', 'Malay Restaurant', 'Pet Café',
       'Empanada Restaurant', 'Japanese Curry Restaurant', 'Cafeteria',
       'Chocolate Shop', 'Hot Dog Joint',
       'Czech Restaurant', 'Afghan Restaurant', 'Kosher Restaurant',
       'Szechuan Restaurant', 'Hawaiian Restaurant',
       'Jewish Restaurant', 'Snack Place',
       'Udon Restaurant', 'Moroccan Restaurant',
       'Scandinavian Restaurant', 'Swiss Restaurant',
       'Austrian Restaurant',
       'Cooking School',
       'Brazilian Restaurant', 'Tibetan Restaurant',
       'Himalayan Restaurant', 'Colombian Restaurant',
       'Indonesian Restaurant','Hunan Restaurant', 'Romanian Restaurant',
       'Egyptian Restaurant', 
       'Persian Restaurant', 'Gluten-free Restaurant',
       'Sri Lankan Restaurant',
       'Tex-Mex Restaurant', 'Molecular Gastronomy Restaurant',
       'Australian Restaurant']

In [89]:
## Only Restaurants for NY

ny_rest=ny_venues[ny_venues["Venue Category"].isin(list_food)].reset_index(drop=True)
ny_rest.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Carvel Ice Cream | 40.890487 | -73.848568 | Ice Cream Shop |
| 2 | Wakefield | 40.894705 | -73.847201 | Cooler Runnings Jamaican Restaurant Inc | 40.898083 | -73.850259 | Caribbean Restaurant |
| 3 | Wakefield | 40.894705 | -73.847201 | SUBWAY | 40.890468 | -73.849152 | Sandwich Place |
| 4 | Wakefield | 40.894705 | -73.847201 | Dunkin' | 40.890459 | -73.849089 | Donut Shop |


In [None]:
### Toronto

In [90]:
toronto_venues[~toronto_venues["Venue Category"].isin(categories_ny)]["Venue Category"].unique()

array(['Hakka Restaurant', 'College Stadium', 'Light Rail Station',
       'Airport', 'Hockey Arena', 'Portuguese Restaurant', 'Curling Ice',
       'Stationery Store', 'Coworking Space', 'Swim School',
       'Costume Shop', 'Indoor Play Area', 'Theme Restaurant',
       'Comic Shop', 'College Rec Center', 'Modern European Restaurant',
       'Basketball Stadium', 'Belgian Restaurant', 'General Travel',
       'Aquarium', 'College Gym', 'Poutine Place', 'Doner Restaurant',
       'Airport Lounge', 'Airport Food Court', 'Plane', 'Airport Service',
       'Baby Store', 'Mac & Cheese Joint', 'College Auditorium',
       'Auto Workshop', 'Shopping Plaza', 'Drugstore'], dtype=object)
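Picking the food categories out of this array is done by hand in the next cell. As a cross-check, a simple keyword filter can flag likely food categories automatically. A sketch, assuming a hypothetical keyword list; the categories are copied from the output above:

```python
# Toronto categories that were not in the New York list (copied from the output above)
new_categories = [
    'Hakka Restaurant', 'College Stadium', 'Light Rail Station', 'Airport', 'Hockey Arena',
    'Portuguese Restaurant', 'Curling Ice', 'Stationery Store', 'Coworking Space', 'Swim School',
    'Costume Shop', 'Indoor Play Area', 'Theme Restaurant', 'Comic Shop', 'College Rec Center',
    'Modern European Restaurant', 'Basketball Stadium', 'Belgian Restaurant', 'General Travel',
    'Aquarium', 'College Gym', 'Poutine Place', 'Doner Restaurant', 'Airport Lounge',
    'Airport Food Court', 'Plane', 'Airport Service', 'Baby Store', 'Mac & Cheese Joint',
    'College Auditorium', 'Auto Workshop', 'Shopping Plaza', 'Drugstore',
]

# Hypothetical keywords that usually mark a place serving food (case-sensitive substrings)
FOOD_KEYWORDS = ("Restaurant", "Joint", "Place", "Food Court", "Café", "Diner", "Bakery")


def flag_food_categories(categories):
    """Return the categories whose name contains any of the food keywords."""
    return [c for c in categories if any(k in c for k in FOOD_KEYWORDS)]


flagged = flag_food_categories(new_categories)
print(flagged)
```

On this array the filter returns the six categories added by hand to `list_food_toronto`, plus 'Theme Restaurant', 'Poutine Place' and 'Airport Food Court', which may be worth reviewing. Keyword matching can also produce false positives, so the flagged list is a starting point rather than a replacement for the manual review.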

In [91]:
list_food_toronto=list_food+['Hakka Restaurant','Portuguese Restaurant','Modern European Restaurant','Belgian Restaurant','Doner Restaurant','Mac & Cheese Joint']


In [92]:
## Only Restaurants for toronto

toronto_rest=toronto_venues[toronto_venues["Venue Category"].isin(list_food_toronto)].reset_index(drop=True)
toronto_rest.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Rouge | 43.806686 | -79.194353 | Wendy's | 43.807448 | -79.199056 | Fast Food Restaurant |
| 1 | Malvern | 43.806686 | -79.194353 | Wendy's | 43.807448 | -79.199056 | Fast Food Restaurant |
| 2 | Guildwood | 43.763573 | -79.188711 | Swiss Chalet Rotisserie & Grill | 43.767697 | -79.189914 | Pizza Place |
| 3 | Guildwood | 43.763573 | -79.188711 | Big Bite Burrito | 43.766299 | -79.19072 | Mexican Restaurant |
| 4 | Guildwood | 43.763573 | -79.188711 | Eggsmart | 43.7678 | -79.190466 | Breakfast Spot |


---

## Analyze which city to choose

### Number of restaurants and diversity

#### New York

In [94]:
#Number of Restaurants
numb_ny = len(ny_rest["Venue"])

# Diversity: number of distinct restaurant categories
div_ny = len(ny_rest["Venue Category"].unique())

# Categories per restaurant
cat_by_venues_ny = div_ny/numb_ny

In [95]:
print(numb_ny)
print(div_ny)
print(cat_by_venues_ny)

3895
127
0.03260590500641849


#### Toronto

In [98]:
#Number of Restaurants
numb_toronto = len(toronto_rest["Venue"])

# Diversity: number of distinct restaurant categories
div_toronto = len(toronto_rest["Venue Category"].unique())

# Categories per restaurant
cat_by_venues_toronto = div_toronto/numb_toronto

In [99]:

print(numb_toronto)
print(div_toronto)
print(cat_by_venues_toronto)

1725
77
0.04463768115942029


### Conclusion

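Toronto has a slightly higher ratio of categories per restaurant (0.045 vs 0.033, about 4.5 vs 3.3 distinct categories per 100 restaurants), but New York is far ahead in the number of restaurants (3,895 vs 1,725, more than twice as many) and in the number of categories (127 vs 77), which we defined as diversity.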

#### This is why NY is the definitive city to start the store.
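The comparison can be checked with a quick calculation from the counts printed above, expressing the differences as relative sizes rather than raw ratios:

```python
# Counts printed above
numb_ny, div_ny = 3895, 127
numb_toronto, div_toronto = 1725, 77

# Categories per restaurant for each city
ratio_ny = div_ny / numb_ny
ratio_toronto = div_toronto / numb_toronto

print(f"Restaurants: NY has {numb_ny / numb_toronto:.2f}x as many as Toronto")
print(f"Categories:  NY has {div_ny / div_toronto:.2f}x as many as Toronto")
print(f"Categories per 100 restaurants: NY {100 * ratio_ny:.2f}, Toronto {100 * ratio_toronto:.2f}")
```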

### Discussion  

Scraping isn't the best way to extract data, especially when the data is a table that can be obtained with other methods (for example, pandas' `read_html`). It is laborious and not the most efficient approach, so my recommendation after this project is to look for such alternatives first. Still, when the data you need is only available on a web page, web scraping is a great resource.
Other metrics to compare cities for the restaurant industry can be used.
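As one example of such a metric: categories per restaurant ignores how evenly restaurants are spread across categories. The Shannon diversity index, a standard measure borrowed from ecology and swapped in here as an illustration rather than something used in this project, accounts for that balance. A minimal sketch with hypothetical category samples:

```python
from collections import Counter
from math import exp, log


def shannon_diversity(categories):
    """Shannon index H = -sum(p_i * ln p_i), where p_i is the share of restaurants in category i."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((n / total) * log(n / total) for n in counts.values())


def effective_categories(categories):
    """exp(H): the number of equally common categories that would give the same index."""
    return exp(shannon_diversity(categories))


# Hypothetical samples: both have two categories, but with different balance
balanced = ["Pizza Place", "Pizza Place", "Diner", "Diner"]
skewed = ["Pizza Place"] * 9 + ["Diner"]

print(round(effective_categories(balanced), 2))  # 2.0
print(round(effective_categories(skewed), 2))
```

A city whose restaurants are spread evenly over many categories scores higher than one with the same number of categories dominated by a few of them.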
