# Capstone Project - Restaurant Location Analysis
### Applied Data Science Capstone by IBM/Coursera

## Table of Contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)

## Introduction: Business Problem <a name="introduction"></a>

One of the most important factors in the success of a restaurant is its location. In this project, I will attempt to recommend, based on the available data and its analysis, **an optimal location for a new restaurant and its preferable menu style**, on the basis of existing venues in the different cities of Israel.

We will assume the business stakeholder in this situation is an entrepreneur interested in opening a new restaurant in an urban environment somewhere in Israel. **We will work under the assumption that the prevalence of specific menu types in an area is indicative of demand for that restaurant style** in that area.

We will also use socio-economic index data to segment Israeli cities, to try to differentiate between preferred venues, and eventually to make recommendations based on this categorization.

## Data <a name="data"></a>

Based on our definition of the problem, we will require data sources that include:

* A list of Israeli cities, including their longitude & latitude information
* Socio-Economic Index per city
* Specification of venues in each city

Accordingly, these are the data sources that will be used:

* We will scrape the Wikipedia table of cities in Israel, which includes the Socio-Economic Index per city
* We will use a geolocating API to add longitude & latitude
* The list of venues per area will be accessed with the Foursquare API

In [20]:
### required libraries 

import urllib.request  # used below to download the Wikipedia page
import requests
import pandas as pd
import folium
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated
from bs4 import BeautifulSoup
import xlrd

Let's scrape the Wikipedia table located at: https://en.wikipedia.org/wiki/List_of_cities_in_Israel

In [51]:
url = "https://en.wikipedia.org/wiki/List_of_cities_in_Israel"

In [54]:
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")

In [58]:
right_table=soup.find('table', class_='wikitable sortable') #create object for the table alone

In [60]:
#loop to scrape each column contents into a list
A=[]
B=[]
C=[]
D=[]
E=[]
F=[]
G=[]
H=[]
I=[]
J=[]

for row in right_table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)>0:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))  
        D.append(cells[3].find(text=True))  
        E.append(cells[4].find(text=True))  
        F.append(cells[5].find(text=True))  
        G.append(cells[6].find(text=True))  
        H.append(cells[7].find(text=True))
        I.append(cells[8].find(text=True))  
        J.append(cells[9].find(text=True))

In [65]:
#create pandas df
df_socio=pd.DataFrame(A,columns=['Name'])
df_socio['First_settlement']=B
df_socio['District']=D
df_socio['Population_Estimate_2018']=E
df_socio['Population_Census_2008']=F
df_socio['Change_2008_2018']=G
df_socio['Area_KM']=H
df_socio['Density_Per_KM']=I
df_socio['Socio_Economic_Index']=J
df_socio.head()

| | Name | First_settlement | District | Population_Estimate_2018 | Population_Census_2008 | Change_2008_2018 | Area_KM | Density_Per_KM | Socio_Economic_Index |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Acre | Bronze Age | North | 48930 | 46100 | +6.14% | 13.5 | 3362.0 | −0.395 |
| 1 | Afula | Bronze Age | North | 51737 | 40200 | +28.70% | 26.9 | 1611.7 | −0.028 |
| 2 | Arad | 1962 | South | 26451 | 23400 | +13.04% | 93.1 | 195.9 | 0.287 |
| 3 | Arraba | | North | 25369 | 20600 | +23.15% | 8.25 | 3097.1 | −0.945 |
| 4 | Ashdod | Bronze Age | South | 224628 | 204300 | +9.95% | 47.2 | 4783.9 | −0.109 |
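
The column-by-column scraping loop can also be written generically, so that the number of lists does not have to match the number of table columns. The following is only a minimal sketch under stated assumptions, not the method used in this notebook: `SAMPLE_HTML` is a made-up three-column stand-in for the Wikipedia table (the real page has more columns), and `table_to_frame` is a hypothetical helper name.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, heavily shortened stand-in for the Wikipedia table.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Name</th><th>District</th><th>Index</th></tr>
  <tr><td><a href="#">Acre</a></td><td>North</td><td>\u22120.395</td></tr>
  <tr><td>Arad</td><td>South</td><td>0.287</td></tr>
</table>
"""

def table_to_frame(html, columns):
    """Collect the <td> text of every data row into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < len(columns):
            continue  # header rows contain <th> cells only, so they are skipped
        # get_text(strip=True) also removes the trailing newline of the last cell
        rows.append([cell.get_text(strip=True) for cell in cells[:len(columns)]])
    return pd.DataFrame(rows, columns=columns)

sample_df = table_to_frame(SAMPLE_HTML, ["Name", "District", "Socio_Economic_Index"])
print(sample_df)
```

Every value is still a string at this point, including the index values that use the Unicode minus sign, so numeric columns need a separate conversion step.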


In [66]:
df_socio.describe(include="all")

| | Name | First_settlement | District | Population_Estimate_2018 | Population_Census_2008 | Change_2008_2018 | Area_KM | Density_Per_KM | Socio_Economic_Index |
|---|---|---|---|---|---|---|---|---|---|
| count | 73 | 73 | 73 | 73 | 73 | 73 | 73.0 | 73.0 | 73 |
| unique | 73 | 42 | 6 | 73 | 70 | 73 | 68.0 | 73.0 | 72 |
| top | Or Yehuda | Bronze Age | Center | 25636 | 35700 | +15.57% | 14.2 | 4741.2 | −1.011 |
| freq | 1 | 9 | 21 | 1 | 2 | 1 | 2.0 | 1.0 | 2 |


### Add longitude & latitude 

We will use the OpenCage Geocoder to look up the coordinates of each city.

In [133]:
!pip install opencage
from opencage.geocoder import OpenCageGeocode



In [None]:
key = 'Use your key here' # get api key from:  https://opencagedata.com

In [135]:
geocoder = OpenCageGeocode(key)

query = 'Tel Aviv'
results = geocoder.geocode(query)

lat = results[0]['geometry']['lat']
lng = results[0]['geometry']['lng']

print(lat, lng)

32.0854162 34.7817131


In [136]:
# now for all cities in our df:
list_lat = []   # latitude of each city, in row order
list_long = []  # longitude of each city, in row order

for index, row in df_socio.iterrows():  # iterate over rows in dataframe
    City = row['Name']
    State = 'Israel'
    query = str(City) + ',' + str(State)

    results = geocoder.geocode(query)
    if results:  # use the first (best) match
        lat = results[0]['geometry']['lat']
        long = results[0]['geometry']['lng']
    else:        # no match: keep the lists aligned with the rows
        lat, long = None, None

    list_lat.append(lat)
    list_long.append(long)

# create new columns from lists
df_socio['latitude'] = list_lat
df_socio['longitude'] = list_long
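
The geocoding loop can be wrapped in a small helper that caches repeated queries and pauses between calls. This is a hedged sketch, not part of the original notebook: `geocode_cities` and `fake_geocode` are hypothetical names, and the stub only imitates the shape of the result that `geocoder.geocode` returned above (a list of matches, each with a `geometry` dictionary), so the example runs offline.

```python
import time

def geocode_cities(names, geocode, country="Israel", pause_seconds=0.0):
    """Return parallel lists of latitudes and longitudes (None where no match)."""
    lats, lngs = [], []
    cache = {}  # query string -> (lat, lng), so repeated names cost one call
    for name in names:
        query = f"{name}, {country}"
        if query not in cache:
            results = geocode(query)
            if results:
                geometry = results[0]["geometry"]
                cache[query] = (geometry["lat"], geometry["lng"])
            else:
                cache[query] = (None, None)
            time.sleep(pause_seconds)  # stay under the service's rate limit
        lat, lng = cache[query]
        lats.append(lat)
        lngs.append(lng)
    return lats, lngs

# Offline stub (assumption): mimics the list-of-matches shape used above.
def fake_geocode(query):
    known = {
        "Tel Aviv, Israel": [{"geometry": {"lat": 32.0854162, "lng": 34.7817131}}],
    }
    return known.get(query, [])

demo_lats, demo_lngs = geocode_cities(["Tel Aviv", "Nowhere", "Tel Aviv"], fake_geocode)
print(demo_lats, demo_lngs)
```

With the real client, the call would look like `geocode_cities(df_socio['Name'], geocoder.geocode, pause_seconds=1)`; the appropriate pause depends on the rate limit of your OpenCage plan.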

In [137]:
df_socio.head()

| | Name | First_settlement | District | Population_Estimate_2018 | Population_Census_2008 | Change_2008_2018 | Area_KM | Density_Per_KM | Socio_Economic_Index | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acre | Bronze Age | North | 48930 | 46100 | +6.14% | 13.5 | 3362.0 | −0.395 | 32.928173 | 35.075638 |
| 1 | Afula | Bronze Age | North | 51737 | 40200 | +28.70% | 26.9 | 1611.7 | −0.028 | 32.607559 | 35.289086 |
| 2 | Arad | 1962 | South | 26451 | 23400 | +13.04% | 93.1 | 195.9 | 0.287 | 31.26122 | 35.214581 |
| 3 | Arraba | | North | 25369 | 20600 | +23.15% | 8.25 | 3097.1 | −0.945 | 32.848613 | 35.335827 |
| 4 | Ashdod | Bronze Age | South | 224628 | 204300 | +9.95% | 47.2 | 4783.9 | −0.109 | 31.797731 | 34.652992 |


In [None]:
# convert the scraped area strings to numbers; unparseable values become NaN
df_socio['Area_KM'] = pd.to_numeric(df_socio['Area_KM'], errors='coerce')
df_socio.dtypes
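
The other scraped columns are strings as well, and some of them will not convert directly: the Socio-Economic Index values shown above use the Unicode minus sign (`−`, U+2212) and the change column carries `+` and `%`. The sketch below is one possible way to clean such values before conversion; `to_number` is a hypothetical helper, and the handling of thousands separators is an assumption about how the raw population figures may look.

```python
import pandas as pd

def to_number(series):
    """Strip common formatting characters, then convert to floats (NaN on failure)."""
    cleaned = (
        series.astype(str)
        .str.strip()
        .str.replace("\u2212", "-", regex=False)  # Unicode minus -> ASCII hyphen-minus
        .str.replace(",", "", regex=False)        # thousands separator (assumption)
        .str.replace("%", "", regex=False)
        .str.replace("+", "", regex=False)
    )
    return pd.to_numeric(cleaned, errors="coerce")

# Sample strings in the formats discussed above; "n/a" is a made-up bad value.
demo = to_number(pd.Series(["\u22120.395", "+6.14%", "224,628", "n/a"]))
print(demo)
```

Applied to the notebook's frame, this would read, for example, `df_socio['Socio_Economic_Index'] = to_number(df_socio['Socio_Economic_Index'])`.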

In [155]:
df_socio.describe(include="all")

| | Name | First_settlement | District | Population_Estimate_2018 | Population_Census_2008 | Change_2008_2018 | Area_KM | Density_Per_KM | Socio_Economic_Index | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 73 | 73 | 73 | 73.0 | 73.0 | 73 | 73.0 | 73.0 | 73 | 73.0 | 73.0 |
| unique | 73 | 42 | 6 | 73.0 | 70.0 | 73 | | 73.0 | 72 | | |
| top | Or Yehuda | Bronze Age | Center | 25636.0 | 35700.0 | +15.57% | | 4741.2 | −1.011 | | |
| freq | 1 | 9 | 21 | 1.0 | 2.0 | 1 | | 1.0 | 2 | | |
| mean | | | | | | | 22.682877 | | | 32.199448 | 34.983206 |
| std | | | | | | | 24.767805 | | | 0.594389 | 0.229776 |
| min | | | | | | | 2.6 | | | 29.556935 | 34.573016 |
| 25% | | | | | | | 8.2 | | | 31.927999 | 34.824681 |
| 50% | | | | | | | 14.2 | | | 32.143128 | 34.949795 |
| 75% | | | | | | | 26.9 | | | 32.70663 | 35.100408 |


Let's create a map of Israel with the cities superimposed on top.

In [148]:

map_Israel = folium.Map(location=[31, 35], zoom_start=7)

# add markers to map
for lat, lng, Name, District in zip(df_socio['latitude'], df_socio['longitude'], df_socio['Name'], df_socio['District']):
    label = '{}, {}'.format(Name, District)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Israel)  
    
map_Israel

## Foursquare

Now that all of our locations are set, let's get venue data for all cities.
We will make a call to the Foursquare API for all venues in the vicinity of each city.

In [156]:
#Define Foursquare Credentials and Version

CLIENT_ID = 'set your id' # your Foursquare ID
CLIENT_SECRET = 'set your pass' # your Foursquare Secret
VERSION = '20200713' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID has been set')
print('CLIENT_SECRET has been set')


Your credentials:
CLIENT_ID has been set 
CLIENT_SECRET has been set


In [180]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 3000 # define radius
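
A request that uses these settings can be assembled from a parameter dictionary instead of a long format string, which leaves the URL encoding to `requests`. This is a minimal sketch under assumptions: the credentials are placeholders, `build_explore_request` is a hypothetical helper, and the request is only prepared, never sent, so no network access or valid key is needed to run it.

```python
import requests

# Placeholder credentials and settings, mirroring the notebook's variable names.
CLIENT_ID = "set your id"
CLIENT_SECRET = "set your pass"
VERSION = "20200713"
LIMIT = 100
radius = 3000

def build_explore_request(lat, lng):
    """Prepare (but do not send) a venues/explore request for one coordinate pair."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": LIMIT,
    }
    request = requests.Request(
        "GET", "https://api.foursquare.com/v2/venues/explore", params=params
    )
    return request.prepare()

prepared = build_explore_request(32.928173, 35.075638)
print(prepared.url)
```

To actually send it, the same dictionary can be passed as `requests.get(url, params=params, timeout=10)`; the timeout value here is an arbitrary choice.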

In [181]:
# create a function to get venues for all the cities in Israel

def getNearbyVenues(names, latitudes, longitudes, radius=3000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    print("Finished!")
    
    return(nearby_venues)
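
The function above indexes straight into the response (`["response"]['groups'][0]['items']` and `['categories'][0]`), so a city with an unexpected or empty response would raise an exception. The sketch below shows one defensive way to flatten such a response. It is an illustration only: `rows_from_explore` is a hypothetical helper, `fake_payload` is hand-written to imitate the nesting that the function above relies on, and the second venue in it is invented to exercise the missing-category case.

```python
def rows_from_explore(city, city_lat, city_lng, payload):
    """Flatten one explore-style response into row tuples, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]  # empty list -> no category name
        rows.append((
            city, city_lat, city_lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name"),
        ))
    return rows

# Hand-written payload (assumption) with the same nesting as used above.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Uri Buri",
               "location": {"lat": 32.920179, "lng": 35.066798},
               "categories": [{"name": "Seafood Restaurant"}]}},
    {"venue": {"name": "Uncategorised place",
               "location": {"lat": 32.92, "lng": 35.07},
               "categories": []}},
]}]}}

demo_rows = rows_from_explore("Acre", 32.928173, 35.075638, fake_payload)
for row in demo_rows:
    print(row)
```

Rows with a missing category come back with `None` in the last position and can be dropped or inspected later, rather than stopping the whole run.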

In [182]:
# run the function above for every city and collect the results in a new dataframe called Israel_venues


Israel_venues = getNearbyVenues(names=df_socio['Name'],
                                 latitudes=df_socio['latitude'],
                                 longitudes=df_socio['longitude'],
                                 radius = 3000
                                )

Finished!


In [183]:
# check the size of the resulting dataframe
print(Israel_venues.shape)
Israel_venues

(2266, 7)


| | City | City Latitude | City Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Acre | 32.928173 | 35.075638 | Old City of Acre / Akko (העיר העתיקה של עכו) | 32.922891 | 35.070638 | Historic Site |
| 1 | Acre | 32.928173 | 35.075638 | Uri Buri | 32.920179 | 35.066798 | Seafood Restaurant |
| 2 | Acre | 32.928173 | 35.075638 | Kukushka - Premium Snack Bar - קוקושקה | 32.922540 | 35.069923 | Tapas Restaurant |
| 3 | Acre | 32.928173 | 35.075638 | The Crusader Citadel (Citadel of Acre) | 32.923712 | 35.070589 | Historic Site |
| 4 | Acre | 32.928173 | 35.075638 | Effendi Hotel | 32.922566 | 35.068062 | Hotel |
| 5 | Acre | 32.928173 | 35.075638 | Hummus Said (חומוס סעיד) | 32.921535 | 35.069755 | Middle Eastern Restaurant |
| 6 | Acre | 32.928173 | 35.075638 | Suhila Hummus (חומוס סוהילה) | 32.922480 | 35.071718 | Mediterranean Restaurant |
| 7 | Acre | 32.928173 | 35.075638 | Doniana | 32.919306 | 35.068455 | Asian Restaurant |
| 8 | Acre | 32.928173 | 35.075638 | El Marsa | 32.920104 | 35.070044 | Restaurant |
| 9 | Acre | 32.928173 | 35.075638 | חוף הים | 32.935526 | 35.072735 | Scenic Lookout |


In [184]:
Israel_venues.describe(include="all")

| | City | City Latitude | City Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| count | 2266 | 2266.0 | 2266.0 | 2266 | 2266.0 | 2266.0 | 2266 |
| unique | 73 | | | 1690 | | | 213 |
| top | Ramat Gan | | | Aroma (ארומה) | | | Café |
| freq | 100 | | | 57 | | | 304 |
| mean | | 32.098229 | 34.920833 | | 32.098472 | 34.917284 | |
| std | | 0.61373 | 0.181069 | | 0.613732 | 0.182567 | |
| min | | 29.556935 | 34.573016 | | 29.531398 | 34.546251 | |
| 25% | | 32.015456 | 34.811328 | | 32.005942 | 34.804078 | |
| 50% | | 32.085416 | 34.856625 | | 32.087046 | 34.860495 | |
| 75% | | 32.328618 | 34.998386 | | 32.320483 | 34.990508 | |


In [185]:
Israel_venues.groupby('City').count()

| City | City Latitude | City Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Acre | 31 | 31 | 31 | 31 | 31 | 31 |
| Afula | 15 | 15 | 15 | 15 | 15 | 15 |
| Arad | 14 | 14 | 14 | 14 | 14 | 14 |
| Arraba | 3 | 3 | 3 | 3 | 3 | 3 |
| Ashdod | 60 | 60 | 60 | 60 | 60 | 60 |
| Ashkelon | 22 | 22 | 22 | 22 | 22 | 22 |
| Baqa al-Gharbiyye | 10 | 10 | 10 | 10 | 10 | 10 |
| Bat Yam | 75 | 75 | 75 | 75 | 75 | 75 |
| Beersheba | 45 | 45 | 45 | 45 | 45 | 45 |
| Beit She'an | 5 | 5 | 5 | 5 | 5 | 5 |
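
Because `count()` repeats the same number in every column, a single venue count per city is often easier to read. The sketch below shows one way to get it with `groupby(...).size()`; `demo_venues` is a tiny made-up frame (the row "Some cafe" is invented) that only borrows the column names of `Israel_venues`.

```python
import pandas as pd

# Tiny illustrative frame with the same column names as Israel_venues.
demo_venues = pd.DataFrame({
    "City": ["Acre", "Acre", "Afula", "Acre"],
    "Venue": ["Uri Buri", "El Marsa", "Some cafe", "Effendi Hotel"],
    "Venue Category": ["Seafood Restaurant", "Restaurant", "Café", "Hotel"],
})

# One number per city, sorted so the best-covered cities come first.
venues_per_city = (
    demo_venues.groupby("City")
    .size()
    .rename("Venue Count")
    .sort_values(ascending=False)
)
print(venues_per_city)
```

On the real data, the same expression with `Israel_venues` in place of `demo_venues` would make it easy to spot cities with very few returned venues, such as Arraba above.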


In [187]:
# Let's find out how many unique categories can be curated from all the returned venues

print('There are {} unique categories.'.format(len(Israel_venues['Venue Category'].unique())))

There are 213 unique categories.


### At this point we have all of the data we require; this concludes the data gathering phase