# Capstone project - Week 2
### Applied Data Science Capstone by IBM/Coursera

## Introduction: Business Problem

In this project we will try to predict the popularity of a newly opened restaurant given its location.
Specifically, this report is targeted at stakeholders who want to open a **McDonald's** restaurant in **Kiev, Ukraine** and want to see whether their selected location will be popular enough.
The project focuses on a **chain fast food restaurant** for a few reasons:
* A fast food restaurant is valued for the possibility to hop in and out and grab some food on the commute; it is not popular because of high-class cuisine.
* Chain restaurants look similar and have the same menu within a given country.
* Because of this similarity, we can assume that the popularity of each restaurant in the chain depends more on its location than on its cuisine or interior.
* There are already quite a few **McDonald's** restaurants open in **Kiev**, so we should have enough data to make predictions.

Speaking of location, a fast food restaurant should be more popular if it is located near some significant point(s) of interest, such as a shopping center, metro station, city center or train station. But how much does each type of point of interest affect popularity? Is the place selected for the new restaurant good enough if it was chosen by simple criteria (e.g. near a metro station)? These are the questions I am trying to answer with this project.


## Data and how it will be used to solve the problem

Based on the Business Problem, we will use multiple metrics from the **Foursquare API**:
* Number of McDonald's restaurants in Kiev
* Location for each restaurant
* Popularity of each restaurant (number of visitors)

Unfortunately, the third metric is not that simple to retrieve from the API. For this metric we could have used the total number of check-ins for each restaurant, but initial research showed that the current version of the **Foursquare API** no longer returns the number of check-ins.<br>
The two other possible sources for this metric are the number of *'Likes'* and the number of *'Rating Signals'* for each restaurant.

I've decided to stick with the number of *'Rating Signals'*, as the initial research suggests that there are more *'Rating Signals'* than *'Likes'* per restaurant, so the results should be more precise.<br>
The rating of each restaurant is a value between 1 and 10. *Rating Signals* is the total number of people who rated the venue. As we are not interested in the rating itself, we will use only the number of votes as our indicator of the venue's popularity: the more people visit a place, the more people rate it.<br>
But what if one restaurant opened two years ago and has only 100 *Rating Signals*, while another has been open for 10 years and has 1000 *Rating Signals*? Can we assume that the second restaurant is more popular than the first based only on the number of *Rating Signals*? No. To solve this issue we will use one more metric for each restaurant:
* Venue Creation Date


This is the date when the restaurant was added to **Foursquare**. From this value we will calculate the average number of *Rating Signals* per year, which will be our main metric of venue popularity.
After that we will normalize these values to the range from 0 to 1. Let's call this normalized value the **Venue Popularity Index**, or **VPI**.
The closer the **VPI** is to 1, the higher the popularity.
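
The calculation described above can be sketched as follows. This is a minimal illustration rather than the final implementation: the three rows are taken from the venue details collected later in this notebook, while the reference date (`reference_date`), the 365.25-day year and the choice of min-max scaling for the normalization are assumptions.

```python
import pandas as pd

# Three illustrative rows (values taken from the venue details collected later)
df = pd.DataFrame({
    "ratingSignals": [2190, 829, 308],
    "createdAt": pd.to_datetime([
        "2010-04-18 16:31:51",
        "2016-01-06 13:42:40",
        "2011-11-28 16:03:30",
    ]),
})

# Assumption: venue age is measured up to the day the data was collected
reference_date = pd.Timestamp("2020-05-03")

# Age of each venue listing in years (assumption: 365.25 days per year)
years_listed = (reference_date - df["createdAt"]).dt.days / 365.25

# Average number of rating signals per year
df["signals_per_year"] = df["ratingSignals"] / years_listed

# Assumption: "normalize" means min-max scaling to the range [0, 1]
lowest = df["signals_per_year"].min()
highest = df["signals_per_year"].max()
df["VPI"] = (df["signals_per_year"] - lowest) / (highest - lowest)

print(df[["ratingSignals", "signals_per_year", "VPI"]])
```

With min-max scaling the busiest venue per year gets a VPI of 1 and the quietest gets 0; dividing by the maximum alone would be another reasonable choice that also keeps all values between 0 and 1.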

After calculating the **VPI** for each McDonald's restaurant in Kiev, we can plot these values on a map of the city and predict the **VPI** for future restaurants based on their location.



### Data collection

Let's start by collecting all the necessary data for our project.

Importing some Python libraries and Foursquare API credentials:

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
from pandas import json_normalize

Foursquare credentials:

In [2]:
CLIENT_ID = 'EHGYCFMDLQCZJXQA2DP0BSXGLVMMTEIGS1NU0JNJ5O0I4QES' 
CLIENT_SECRET = 'ECRF4FO2EAABHXNR2GJIJHBSVUQRDSWL3BWPUE404HM3BG5V'

Now we will pull a list of all McDonald's restaurants in Kiev by sending a search request and building a dataframe from the result.
We will use the 'near' parameter for the query, which takes the name of a place instead of coordinates, so that we don't need to add a radius parameter around the coordinates.

In [3]:
VERSION = '20200430'
LIMIT = 40

search_query = 'McDonalds'
Location = 'Kiev, Ukraine'

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&near={}&v={}&query={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, Location, VERSION, search_query, LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eaf091f69babe001bddddf9'},
 'response': {'venues': [{'id': '4bcb33f7fb84c9b6b64d1e3e',
    'name': "McDonald's",
    'location': {'address': 'вул. Борщагівська, 2б',
     'lat': 50.44804208177356,
     'lng': 30.479175234841115,
     'labeledLatLngs': [{'label': 'display',
       'lat': 50.44804208177356,
       'lng': 30.479175234841115}],
     'postalCode': '03087',
     'cc': 'UA',
     'city': 'Київ',
     'state': 'м. Київ',
     'country': 'Україна',
     'formattedAddress': ['вул. Борщагівська, 2б', 'Київ, 03087', 'Україна']},
    'categories': [{'id': '4bf58dd8d48988d16e941735',
      'name': 'Fast Food Restaurant',
      'pluralName': 'Fast Food Restaurants',
      'shortName': 'Fast Food',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/fastfood_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1588529540',
    'hasPerk': False},
   {'id': '4c00b39434ccc9284a10e2cd',
    'name': "McDonald's",
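
A sketch of an alternative way to assemble the same search URL: the parameters are kept in a dictionary and percent-encoded with the standard library, so values such as 'Kiev, Ukraine' are escaped explicitly. `build_search_url` is a hypothetical helper name, and the credentials passed to it below are placeholders.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, near, query,
                     version='20200430', limit=40):
    """Return the venue search URL with all parameters percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'near': near,
        'v': version,
        'query': query,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials, for illustration only
example_url = build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                               'Kiev, Ukraine', 'McDonalds')
print(example_url)
```

With `requests`, the same dictionary can also be passed directly through the `params` argument of `requests.get`, which performs the encoding itself.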

In [37]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
McD_df = json_normalize(venues)
McD_df

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,location.neighborhood,venuePage.id
0,4bcb33f7fb84c9b6b64d1e3e,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"вул. Борщагівська, 2б",50.448042,30.479175,"[{'label': 'display', 'lat': 50.44804208177356...",3087.0,UA,Київ,м. Київ,Україна,"[вул. Борщагівська, 2б, Київ, 03087, Україна]",,,
1,4c00b39434ccc9284a10e2cd,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"вул. Софіївська, 1/2",50.451128,30.521917,"[{'label': 'display', 'lat': 50.45112810903885...",1001.0,UA,Київ,м. Київ,Україна,"[вул. Софіївська, 1/2 (Майдан Незалежності), К...",Майдан Незалежності,,
2,4bd200aa77b29c748fc38d82,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"вул. Хрещатик, 19а",50.44752,30.522896,"[{'label': 'display', 'lat': 50.4475202043031,...",1001.0,UA,Київ,м. Київ,Україна,"[вул. Хрещатик, 19а, Київ, 01001, Україна]",,Липки,
3,568d19d0498e545e812fa206,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"Боричів узвіз, 10",50.459679,30.525817,"[{'label': 'display', 'lat': 50.45967850686011...",4070.0,UA,Київ,м. Київ,Україна,"[Боричів узвіз, 10 (Поштова площа), Київ, 0407...",Поштова площа,"Podil, Kyiv",
4,4ed3b0d2e5faa5ec069df659,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"Майдан Незалежності, 1",50.450967,30.522714,"[{'label': 'display', 'lat': 50.45096729018484...",1001.0,UA,Київ,м. Київ,Україна,"[Майдан Незалежності, 1 (ТРЦ «Глобус», фудкорт...","ТРЦ «Глобус», фудкорт",,
5,4c111d9681e976b0623e10eb,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"пл. Московська, 1/3",50.406227,30.518996,"[{'label': 'display', 'lat': 50.40622742762063...",2000.0,UA,Київ,м. Київ,Україна,"[пл. Московська, 1/3, Київ, 02000, Україна]",,,
6,4bc6088842419521dc76031d,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"вул. Богдана Хмельницького, 40/25",50.446909,30.509092,"[{'label': 'display', 'lat': 50.44690870171596...",,UA,Київ,м. Київ,Україна,"[вул. Богдана Хмельницького, 40/25 (вул. Івана...",вул. Івана Франка,,
7,4c39d1edae2da5938f1103c6,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"вул. Мельникова, 3",50.462544,30.481603,"[{'label': 'display', 'lat': 50.46254352624871...",4119.0,UA,Київ,м. Київ,Україна,"[вул. Мельникова, 3, Київ, 04119, Україна]",,Лукьяновка,
8,4c0a64c932daef3bf7a14b50,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"просп. Степана Бандери, 12А",50.488507,30.497852,"[{'label': 'display', 'lat': 50.48850712917407...",4073.0,UA,Київ,м. Київ,Україна,"[просп. Степана Бандери, 12А (Оболонський прос...",Оболонський просп.,Оболонь,
9,4c1686aadaf42d7f4b4e4466,McDonald's,"[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",v-1588529540,False,"вул. Вишгородська, 33а",50.506461,30.450408,"[{'label': 'display', 'lat': 50.50646077074081...",,UA,Київ,м. Київ,Україна,"[вул. Вишгородська, 33а, Київ, Україна]",,,


As we can see, items with index 34 and higher are not McDonald's restaurants and are irrelevant for our task. Let's drop those items. We will also drop rows 24, 28 and 33, as those are not in Kiev itself but in a satellite town, and are therefore also irrelevant (note that the 'location.city' and 'location.state' columns differ from the others for those venues).

In [38]:
McD_df.drop(McD_df.index[[24,28,33,34,35,36,37,38,39]], axis=0, inplace=True)
McD_df = McD_df.reset_index(drop=True)
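
Dropping rows by hard-coded positions works for this particular response, but the positions change if the API returns the venues in a different order. A possible alternative, sketched below on a small made-up dataframe, is to filter on the same criteria described above: the venue name and the 'location.city' column. The sample rows and the exact strings to match (`"McDonald's"`, `'Київ'`) are assumptions for illustration.

```python
import pandas as pd

# Made-up sample with the same column names as the search result
sample = pd.DataFrame({
    'name': ["McDonald's", "McDonald's", "Some Cafe"],
    'location.city': ['Київ', 'Other town', 'Київ'],
    'location.lat': [50.448042, 50.40, 50.45],
    'location.lng': [30.479175, 30.30, 30.52],
})

# Keep only venues that are McDonald's restaurants located in Kiev itself
is_mcdonalds = sample['name'] == "McDonald's"
is_in_kiev = sample['location.city'] == 'Київ'

filtered = sample[is_mcdonalds & is_in_kiev].reset_index(drop=True)
print(filtered)
```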

Let's also drop all the columns that we don't need. From this dataframe we will only need the unique id, address and coordinates.

In [39]:
McD_df = McD_df[['id', 'location.address', 'location.lat', 'location.lng']]

In [40]:
McD_df

Unnamed: 0,id,location.address,location.lat,location.lng
0,4bcb33f7fb84c9b6b64d1e3e,"вул. Борщагівська, 2б",50.448042,30.479175
1,4c00b39434ccc9284a10e2cd,"вул. Софіївська, 1/2",50.451128,30.521917
2,4bd200aa77b29c748fc38d82,"вул. Хрещатик, 19а",50.44752,30.522896
3,568d19d0498e545e812fa206,"Боричів узвіз, 10",50.459679,30.525817
4,4ed3b0d2e5faa5ec069df659,"Майдан Незалежності, 1",50.450967,30.522714
5,4c111d9681e976b0623e10eb,"пл. Московська, 1/3",50.406227,30.518996
6,4bc6088842419521dc76031d,"вул. Богдана Хмельницького, 40/25",50.446909,30.509092
7,4c39d1edae2da5938f1103c6,"вул. Мельникова, 3",50.462544,30.481603
8,4c0a64c932daef3bf7a14b50,"просп. Степана Бандери, 12А",50.488507,30.497852
9,4c1686aadaf42d7f4b4e4466,"вул. Вишгородська, 33а",50.506461,30.450408


Now we can visualize our data: Kiev city with all McDonald's restaurants marked on it.

In [41]:
import folium
Kiev_lat = 50.45466
Kiev_lng = 30.5238

Kiev = folium.Map(location=[Kiev_lat, Kiev_lng], zoom_start=11)
for lat, lng, address in zip(McD_df['location.lat'], McD_df['location.lng'], McD_df['location.address']):
    # encode Cyrillic characters as XML character references (and decode back to str),
    # as otherwise Cyrillic symbols are rendered incorrectly in the popup
    label = str(address).encode('ascii', 'xmlcharrefreplace').decode('ascii')
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup = label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Kiev) 
Kiev

Now we can collect information for each venue: 'Rating Signals' and creation date.

First, we will create a new copy of the McD_df dataframe with two more columns: '*ratingSignals*' and '*createdAt*'.

In [42]:
# copy McD_df and add two more columns for ratingSignals and createdAt
McD_full_df = McD_df.copy()
McD_full_df['ratingSignals'] = ''
McD_full_df['createdAt'] = ''

In [43]:
McD_full_df

Unnamed: 0,id,location.address,location.lat,location.lng,ratingSignals,createdAt
0,4bcb33f7fb84c9b6b64d1e3e,"вул. Борщагівська, 2б",50.448042,30.479175,,
1,4c00b39434ccc9284a10e2cd,"вул. Софіївська, 1/2",50.451128,30.521917,,
2,4bd200aa77b29c748fc38d82,"вул. Хрещатик, 19а",50.44752,30.522896,,
3,568d19d0498e545e812fa206,"Боричів узвіз, 10",50.459679,30.525817,,
4,4ed3b0d2e5faa5ec069df659,"Майдан Незалежності, 1",50.450967,30.522714,,
5,4c111d9681e976b0623e10eb,"пл. Московська, 1/3",50.406227,30.518996,,
6,4bc6088842419521dc76031d,"вул. Богдана Хмельницького, 40/25",50.446909,30.509092,,
7,4c39d1edae2da5938f1103c6,"вул. Мельникова, 3",50.462544,30.481603,,
8,4c0a64c932daef3bf7a14b50,"просп. Степана Бандери, 12А",50.488507,30.497852,,
9,4c1686aadaf42d7f4b4e4466,"вул. Вишгородська, 33а",50.506461,30.450408,,
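
One detail worth keeping in mind at this step: in pandas, plain assignment such as `new_df = old_df` does not copy the data. Both names refer to the same dataframe, so columns added through one name appear under the other, and an independent copy requires `.copy()`. A minimal illustration with a throwaway dataframe (all names here are hypothetical):

```python
import pandas as pd

original = pd.DataFrame({'id': ['a', 'b']})

# Plain assignment: `alias` and `original` are the same object
alias = original
alias['ratingSignals'] = ''
print('ratingSignals' in original.columns)  # the new column shows up in `original` too

# Explicit copy: `independent` is a separate dataframe
independent = original.copy()
independent['createdAt'] = ''
print('createdAt' in original.columns)      # `original` is not affected
```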


Now we can loop over the venue ids and request the details for each venue.

In [44]:
# request the details of each venue and store the two fields we need
for i, venue in enumerate(McD_df['id']):
    url3 = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue, CLIENT_ID, CLIENT_SECRET, VERSION)
    venue_info = requests.get(url3).json()
    McD_full_df.at[i, 'ratingSignals'] = venue_info['response']['venue']['ratingSignals']
    McD_full_df.at[i, 'createdAt'] = venue_info['response']['venue']['createdAt']


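KeyError: 'venue'

The loop stopped with this error because one of the responses did not contain a 'venue' key. This most likely means that the API returned an error instead of venue details (for example, because the request quota was exceeded). The values fetched before the error remain in the dataframe.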
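
A `KeyError: 'venue'` like the one above is raised when a response has no 'venue' entry, for instance when the API answers with an error instead of venue details. A defensive variant of the extraction step could look like the sketch below. `extract_venue_stats` is a hypothetical helper, and the two sample payloads are hand-written to mimic the response layout used in this notebook; they are not real API output.

```python
def extract_venue_stats(payload):
    """Return (ratingSignals, createdAt) from a venue-details payload.

    Returns (None, None) when the payload has no 'venue' entry,
    instead of raising a KeyError.
    """
    venue = payload.get('response', {}).get('venue')
    if venue is None:
        return None, None
    return venue.get('ratingSignals'), venue.get('createdAt')


# Hand-written payloads (assumed shapes, for illustration only)
ok_payload = {
    'meta': {'code': 200},
    'response': {'venue': {'ratingSignals': 2190, 'createdAt': 1271608311}},
}
error_payload = {'meta': {'code': 429}, 'response': {}}

print(extract_venue_stats(ok_payload))     # (2190, 1271608311)
print(extract_venue_stats(error_payload))  # (None, None)
```

Inside the loop, rows for which the helper returns `None` can be skipped or retried later, so that one failed request does not stop the whole collection.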

In [45]:
McD_full_df

Unnamed: 0,id,location.address,location.lat,location.lng,ratingSignals,createdAt
0,4bcb33f7fb84c9b6b64d1e3e,"вул. Борщагівська, 2б",50.448042,30.479175,2190.0,1271608311.0
1,4c00b39434ccc9284a10e2cd,"вул. Софіївська, 1/2",50.451128,30.521917,1131.0,1275114388.0
2,4bd200aa77b29c748fc38d82,"вул. Хрещатик, 19а",50.44752,30.522896,3406.0,1272053930.0
3,568d19d0498e545e812fa206,"Боричів узвіз, 10",50.459679,30.525817,829.0,1452087760.0
4,4ed3b0d2e5faa5ec069df659,"Майдан Незалежності, 1",50.450967,30.522714,308.0,1322496210.0
5,4c111d9681e976b0623e10eb,"пл. Московська, 1/3",50.406227,30.518996,1452.0,1276190102.0
6,4bc6088842419521dc76031d,"вул. Богдана Хмельницького, 40/25",50.446909,30.509092,1294.0,1271269512.0
7,4c39d1edae2da5938f1103c6,"вул. Мельникова, 3",50.462544,30.481603,2640.0,1278857709.0
8,4c0a64c932daef3bf7a14b50,"просп. Степана Бандери, 12А",50.488507,30.497852,2027.0,1275749577.0
9,4c1686aadaf42d7f4b4e4466,"вул. Вишгородська, 33а",50.506461,30.450408,1142.0,1276544682.0


The '*createdAt*' value is stored in epoch time format (seconds since 1970-01-01 UTC), so we need to convert it to datetime format.

In [47]:
McD_full_df['createdAt'] = pd.to_datetime(McD_full_df['createdAt'], unit='s')

In [48]:
McD_full_df

Unnamed: 0,id,location.address,location.lat,location.lng,ratingSignals,createdAt
0,4bcb33f7fb84c9b6b64d1e3e,"вул. Борщагівська, 2б",50.448042,30.479175,2190.0,2010-04-18 16:31:51
1,4c00b39434ccc9284a10e2cd,"вул. Софіївська, 1/2",50.451128,30.521917,1131.0,2010-05-29 06:26:28
2,4bd200aa77b29c748fc38d82,"вул. Хрещатик, 19а",50.44752,30.522896,3406.0,2010-04-23 20:18:50
3,568d19d0498e545e812fa206,"Боричів узвіз, 10",50.459679,30.525817,829.0,2016-01-06 13:42:40
4,4ed3b0d2e5faa5ec069df659,"Майдан Незалежності, 1",50.450967,30.522714,308.0,2011-11-28 16:03:30
5,4c111d9681e976b0623e10eb,"пл. Московська, 1/3",50.406227,30.518996,1452.0,2010-06-10 17:15:02
6,4bc6088842419521dc76031d,"вул. Богдана Хмельницького, 40/25",50.446909,30.509092,1294.0,2010-04-14 18:25:12
7,4c39d1edae2da5938f1103c6,"вул. Мельникова, 3",50.462544,30.481603,2640.0,2010-07-11 14:15:09
8,4c0a64c932daef3bf7a14b50,"просп. Степана Бандери, 12А",50.488507,30.497852,2027.0,2010-06-05 14:52:57
9,4c1686aadaf42d7f4b4e4466,"вул. Вишгородська, 33а",50.506461,30.450408,1142.0,2010-06-14 19:44:42
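
As a quick cross-check of the conversion, the first 'createdAt' value (1271608311) can be converted with Python's standard library. The sketch assumes the epoch value is in seconds and interprets it in UTC, which matches the result of `pd.to_datetime(..., unit='s')` shown in the table.

```python
from datetime import datetime, timezone

# 'createdAt' of the first venue, in seconds since the Unix epoch
created_at_epoch = 1271608311

created_at = datetime.fromtimestamp(created_at_epoch, tz=timezone.utc)
print(created_at.strftime('%Y-%m-%d %H:%M:%S'))  # 2010-04-18 16:31:51
```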
