# Capstone Project - The Battle of the Neighborhoods (Week 2)

## Restaurant equipment supplier by Juan Luis Mejía Villa - February 2020

### *Description of the problem and a discussion of the background*

A multinational company that manufactures home appliances and mechanical/electronic products such as refrigerators, microwaves, ovens and stoves wants to diversify with a new complementary product line in a specific market niche and in a promising location.  
The restaurant industry in North America has shown consistent growth of 2.1% over the last 20 years (https://aaronallen.com/blog/restaurant-industry-growth).
The company is therefore interested in starting a new business that supplies tools and equipment to restaurants of every kind. It chose North America as the starting point, and because the investment will be large, the city for the main warehouse-store must be chosen wisely.   
The city has to be really promising, with a large number of restaurants that could become clients of the new supply store.

Another research team selected one city from each of two North American countries (USA and Canada) to be compared against each other. My first task is to decide which of the two cities is the best place to invest and open the first store.  
Chicago-based foodservice database, marketing, and analytics firm, CHD Expert has new data indicating that independent restaurant operators are making an impact in Toronto’s foodservice market landscape. The new data reveals that 69.3 percent of restaurants in Toronto are independent restaurants (1 to 9 units), whereas only 63.4 percent of nationwide restaurants are considered independent. This release will focus on the Canadian restaurant landscape, specifically focusing on Toronto’s independent restaurant operator growth, and popular menu types. https://www.chd-expert.com/blog/press_release/the-canadian-restaurant-industry-landscape-why-is-toronto-unique/  
Toronto was therefore chosen as the Canadian candidate: the growing market share of independent restaurants could make the city really promising.  

For New York City, a Forbes contributor wrote:
The restaurant business in New York City is like no other business in the world. The rent structure, the volume of business, minimum wage pay scale, spotlight and notoriety, 3rd party online order, celeb chefs, and delivery platforms, as well as the ever-increasing regulation set forth by NYC, make operating a restaurant in NYC exciting, exhausting, and sometimes as nerve-wracking as bungee jumping. As a result, national organizations established to support restaurant operators in other parts of the country very often do not connect with restaurant issues in New York City. The unique rewards and challenges facing restaurateurs are often more complicated, misunderstood, or not embraced at all. As a franchise consultant in the restaurant development space, my experience has been that when it comes to addressing and assisting restaurateurs in New York, one size does not fit all. https://www.forbes.com/sites/garyocchiogrosso/2019/12/20/the-new-york-city-restaurant-business-is-so-much-more-than-just-the-center-of-the-plate/#3d705859639c  
This perspective makes New York an excellent candidate for starting the business.  

The stakeholders of the project are the company owners and managers. They have already decided to go all in: they will invest whatever is necessary because they want to start big. That is why they want to open in the city with the best profile and the largest number of potential clients, and why they need an exhaustive analysis as the foundation for the decision. This is the business problem the analytics team has to solve.  
The second part of the problem comes after choosing the city. The stakeholders made it very clear that the location of the store must be strategic: it should sit in a zone whose neighborhoods contain a large number of restaurants and a variety of categories, because it is important to show that the products can be used in different cuisines in order to attract diverse clients and grow faster.
That is the second task: a complete analysis is needed to tell the stakeholders, with good certainty, where to locate the store. The location should put the store close to the largest possible number of restaurants, because the stakeholders want outstanding logistics that respond to clients' needs as fast as possible. This level of service would be key to penetrating the market as the stakeholders intend, and a location near many restaurants would also help to publicize the store.


---

### *Data*

For these tasks, Foursquare will be used to retrieve the necessary data through the API that the service offers.
We will use geographical data from both cities; the idea is to have one table per city that lists the city's restaurants, with each restaurant's category and the neighborhood it belongs to.  
If needed, the geocoder library will be used to get the latitude and longitude of the cities and neighborhoods.
This data gives the number of restaurants in each city and in each neighborhood, and analyzing the categories gives an indication of the diversity of clients that could be reached. For now, that is all the data needed for the scope of the project.
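
As a rough illustration of how such a table could be summarized, the sketch below counts restaurants and distinct categories per neighborhood. The `venues` table, its column names and its values are made up for the example; they are not data from this project.

```python
import pandas as pd

# Hypothetical venues table: one row per restaurant (illustrative values only)
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Category": ["Italian Restaurant", "Sushi Restaurant", "Italian Restaurant",
                 "Pizza Place", "Pizza Place"],
})

# Number of restaurants and number of distinct categories per neighborhood,
# sorted so the busiest and most diverse neighborhoods come first
summary = (
    venues.groupby("Neighborhood")["Category"]
    .agg(Restaurants="count", Categories="nunique")
    .sort_values(["Restaurants", "Categories"], ascending=False)
)
print(summary)
```

Summing the `Restaurants` column gives the city-level total, so the same table can serve both the city comparison and the neighborhood comparison.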

- Toronto data will be taken from Wikipedia: the list of neighborhoods with their postal codes (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)
- Toronto locations (latitude and longitude) will be taken from the CSV provided at the following link: https://cocl.us/Geospatial_data
- New York City data will be taken from the provided link: https://geo.nyu.edu/catalog/nyu_2451_34572
- New York City locations (latitude and longitude) will be found with geopy
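
The Foursquare retrieval could look roughly like the sketch below. It assumes the legacy Foursquare v2 `venues/explore` endpoint and a response shaped as `response -> groups -> items -> venue`; both are assumptions that should be checked against the current Foursquare documentation. The helper names `build_explore_url` and `venues_to_rows`, the credentials and the sample payload are hypothetical.

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; verify against the current Foursquare docs
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build the request URL for venues around one neighborhood center."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def venues_to_rows(neighborhood, payload):
    """Flatten one (assumed) explore response into one dict per venue."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append({
            "Neighborhood": neighborhood,
            "Venue": venue["name"],
            "Latitude": venue["location"]["lat"],
            "Longitude": venue["location"]["lng"],
            "Category": categories[0]["name"] if categories else None,
        })
    return rows


# Made-up payload in the assumed response shape (no network call is made here)
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Trattoria",
               "location": {"lat": 40.8947, "lng": -73.8472},
               "categories": [{"name": "Italian Restaurant"}]}},
]}]}}

url = build_explore_url(40.894705, -73.847201, "CLIENT_ID", "CLIENT_SECRET")
rows = venues_to_rows("Wakefield", sample_payload)
print(url)
print(rows)
```

In the notebook, the URL would be passed to `requests.get(url).json()` for each neighborhood, and the collected rows would be turned into a DataFrame and filtered down to restaurant categories.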

#### Data extraction and cleansing

In [None]:
# Install packages; uncomment if necessary
# !conda install -c conda-forge geopy --yes
# !conda install -c conda-forge folium=0.5.0 --yes

In [6]:
# Imports
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
from pandas import json_normalize  # top-level since pandas 1.0; the pandas.io.json path is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import json

In [None]:
## New York City Data


In [7]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [8]:
neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
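
The same flattening can also be sketched without an explicit loop, using `pd.json_normalize`. The single feature below is a hand-written stand-in for one entry of `newyork_data['features']`, assuming each feature carries `properties.borough`, `properties.name` and `geometry.coordinates` as in the cell above; the names `features`, `flat` and `neighborhoods_alt` are illustrative.

```python
import pandas as pd

# Hand-written stand-in for one GeoJSON feature (values taken from the table output)
features = [
    {"type": "Feature",
     "properties": {"name": "Wakefield", "borough": "Bronx"},
     "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]}},
]

# Nested keys become dotted column names, e.g. "properties.borough"
flat = pd.json_normalize(features)

# GeoJSON coordinates are [longitude, latitude]; .str[i] picks list elements
neighborhoods_alt = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    "Latitude": flat["geometry.coordinates"].str[1],
    "Longitude": flat["geometry.coordinates"].str[0],
})
print(neighborhoods_alt)
```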

In [9]:
## Find latitude and longitude
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [10]:
neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---------|--------------|----------|-----------|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [11]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

---

In [None]:
## Toronto data


In [None]:
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url,'lxml')  # Parse the HTML fetched from the URL above (the Wikipedia page)
My_table = soup.find('table',{'class':'wikitable sortable'}) # Get the table from the page; in the HTML it has the class "wikitable sortable"
lineas=My_table.find_all('td') # Take the table cells, i.e. the elements between <td></td>

In [None]:
# Clean each cell. If the cell has no link (for example "Not assigned"), searching for 'a' finds nothing,
# so the cell is treated as a string: the <td></td> tags and the \n are removed.
# If the cell contains an 'a' tag, the neighborhood or borough name is taken from its 'title' attribute.
lineas_new = []
for linea in lineas:

    if linea.find('a') is None:

        linea=str(linea)
        linea=linea[4:]    # drop the leading "<td>"
        linea=linea[:-5]   # drop the trailing "</td>"
        linea=linea.replace("\n","")

    else:

        linea=linea.findChild()
        linea=linea.get('title')
        linea=str(linea)
        if linea[-9:]==", Toronto":
            linea=linea[:-9]

    lineas_new.append(linea)

In [None]:
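# Base for the table: group the flat list of cells into rows of three
# (PostCode, Borough, Neighborhood); an incomplete trailing row is dropped
n_rows = len(lineas_new) // 3
filas = [lineas_new[3*r:3*r+3] for r in range(n_rows)]

# Table is created (built from strings directly, instead of writing strings into a float array)
Tabla = pd.DataFrame(filas, columns=["PostCode","Borough","Neighborhood"])
Tabla.head()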

In [None]:
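## Clean details
Tabla=Tabla[Tabla['Borough']!="Not assigned"].reset_index(drop=True)
Tabla.sort_values(by=["PostCode"],inplace=True)
Tabla.reset_index(drop=True,inplace=True)
# Where a borough has no assigned neighborhood, use the borough name as the neighborhood
# (rows are selected by value rather than by the hard-coded row label 6)
sin_barrio = Tabla["Neighborhood"]=="Not assigned"
Tabla.loc[sin_barrio,"Neighborhood"]=Tabla.loc[sin_barrio,"Borough"]
Tabla[Tabla["Neighborhood"]=="Not assigned"]  # check: should now be empty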


In [None]:
Tabla_final=pd.DataFrame(Tabla.groupby(['PostCode','Borough'])['Neighborhood'].apply(', '.join)) # Concatenate the neighborhoods that share a postal code
Tabla_final.reset_index(inplace=True)

In [None]:
## Take locations from provided csv
longlat=pd.read_csv('https://cocl.us/Geospatial_data')
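
A natural next step is to attach these coordinates to the postal-code table. The sketch below shows one way to do that with a left merge on small stand-in frames. It assumes the CSV exposes the columns `Postal Code`, `Latitude` and `Longitude`, which should be confirmed with `longlat.columns`; the frame names and values are illustrative only.

```python
import pandas as pd

# Stand-in for Tabla_final (illustrative values)
tabla_demo = pd.DataFrame({
    "PostCode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighborhood": ["Rouge, Malvern", "Highland Creek, Rouge Hill, Port Union"],
})

# Stand-in for the CSV; the column names are an assumption about the file
longlat_demo = pd.DataFrame({
    "Postal Code": ["M1C", "M1B"],
    "Latitude": [43.784535, 43.806686],
    "Longitude": [-79.160497, -79.194353],
})

# A left merge keeps every postal code from the table, even if coordinates are missing
toronto_demo = (
    tabla_demo.merge(longlat_demo, left_on="PostCode", right_on="Postal Code", how="left")
    .drop(columns="Postal Code")
)
print(toronto_demo)
```

Merging on the postal code rather than relying on row order keeps the result correct even when the two sources are sorted differently.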