<center>
<h2>Capstone Project</h2>

# <span style='color: #81A5FF; '> 👨‍🍳 Exploratory Restaurant Data Analysis and Cleaning </span>

<h3> Group 4 </h3>

<span style='color: #81A5FF; font-size: 18px;'>2023/2024</span>
</center>


---------

## Table of Contents <a class="anchor" id="toc"></a>

This notebook explores the extracted restaurant data and performs the cleaning operations needed to prepare it for the Flavour Flix platform. The key procedures followed throughout this section are: integrating data from multiple sources, handling missing values and duplicates, restructuring the data, geocoding, standardizing text, and extracting the most relevant features.

<br>

<span style = 'font-size: 16px;'>

1. [🍽️ Importing Libraries and Data](#data-import)
2. [🍽️ Summary Statistics](#sum-stats)
3. [🍽️ Extracting Locations](#location)
4. [🍽️ Handling Missing Values and Standardizing Data](#missing-val)
5. [🍽️ Section-Based Analysis and Treatment](#sections)
6. [🍽️ Final Feature Extraction](#feat-ex)
7. [🍽️ Integrating Menu Data](#menu)
</span>

--------

## 1. 🍽️ Importing Libraries and Data <a class="anchor" id="data-import"></a>
[Back to TOC](#toc)

In [1]:
#Importing Packages
import numpy as np
import pandas as pd

from functions.utils import *
from functions.location import *
from functions.menus import *
from functions.preprocessement import *

#ignore warnings
import warnings
warnings.filterwarnings("ignore")

%load_ext autoreload
%autoreload 2

In [2]:
#Import the data and display all columns when printing dataframes
pd.set_option('display.max_columns', None) 
data = pd.read_csv('data/og_restaurant_data.csv')
data.drop('Unnamed: 0', axis=1, inplace=True)

In [3]:
#Extract the restaurant ID from the URL (the text after the last '-r') and use it as the index
data['restaurantID'] = data['url'].str.split('-r').str[-1]
data.set_index('restaurantID', inplace=True)
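
The ID extraction can be checked on a toy frame. This is a minimal sketch that assumes the URLs end in `-r` followed by a numeric ID; the two URLs are invented for illustration. It compares the split-based approach with a stricter regex alternative, which is a suggestion and not the method used in this notebook.

```python
import pandas as pd

# Hypothetical URLs for illustration only; slugs and IDs are invented.
sample = pd.DataFrame({
    "url": [
        "https://www.thefork.com/restaurant/casa-rosa-r12345",
        "https://www.thefork.com/restaurant/o-forno-r678",
    ]
})

# Same idea as the cell above: keep whatever follows the last "-r".
sample["id_split"] = sample["url"].str.split("-r").str[-1]

# Stricter alternative (an assumption): require digits at the end of the URL,
# which yields NaN instead of a wrong ID when the pattern is absent.
sample["id_regex"] = sample["url"].str.extract(r"-r(\d+)$", expand=False)

print(sample[["id_split", "id_regex"]])
```

Taking the last piece of the split matters because a slug such as `casa-rosa` also contains `-r`.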

In [4]:
#Remove duplicated restaurants
data = data[~data.index.duplicated(keep='first')]

## 2. 🍽️ Summary Statistics <a class="anchor" id="sum-stats"></a>
[Back to TOC](#toc)

In [5]:
data.describe()

Unnamed: 0,averagePrice,latitude,longitude,maxPartySize,phone,radius,ratingValue,reviewCount,reviewList/0/ambienceRatingValue,reviewList/0/foodRatingValue,reviewList/0/ratingValue,reviewList/0/serviceRatingValue,reviewList/1/ambienceRatingValue,reviewList/1/foodRatingValue,reviewList/1/ratingValue,reviewList/1/serviceRatingValue,reviewList/2/ambienceRatingValue,reviewList/2/foodRatingValue,reviewList/2/ratingValue,reviewList/2/serviceRatingValue,reviewList/3/ambienceRatingValue,reviewList/3/foodRatingValue,reviewList/3/ratingValue,reviewList/3/serviceRatingValue,reviewList/4/ambienceRatingValue,reviewList/4/foodRatingValue,reviewList/4/ratingValue,reviewList/4/serviceRatingValue,reviewList/5/ambienceRatingValue,reviewList/5/foodRatingValue,reviewList/5/ratingValue,reviewList/5/serviceRatingValue,reviewList/6/ambienceRatingValue,reviewList/6/foodRatingValue,reviewList/6/ratingValue,reviewList/6/serviceRatingValue,reviewList/7/ambienceRatingValue,reviewList/7/foodRatingValue,reviewList/7/ratingValue,reviewList/7/serviceRatingValue,reviewList/8/ambienceRatingValue,reviewList/8/foodRatingValue,reviewList/8/ratingValue,reviewList/8/serviceRatingValue,reviewList/9/ambienceRatingValue,reviewList/9/foodRatingValue,reviewList/9/ratingValue,reviewList/9/serviceRatingValue,reviewList/10/ambienceRatingValue,reviewList/10/foodRatingValue,reviewList/10/ratingValue,reviewList/10/serviceRatingValue,reviewList/11/ambienceRatingValue,reviewList/11/foodRatingValue,reviewList/11/ratingValue,reviewList/11/serviceRatingValue,reviewList/12/ambienceRatingValue,reviewList/12/foodRatingValue,reviewList/12/ratingValue,reviewList/12/serviceRatingValue,reviewList/13/ambienceRatingValue,reviewList/13/foodRatingValue,reviewList/13/ratingValue,reviewList/13/serviceRatingValue,reviewList/14/ambienceRatingValue,reviewList/14/foodRatingValue,reviewList/14/ratingValue,reviewList/14/serviceRatingValue,reviewList/15/ambienceRatingValue,reviewList/15/foodRatingValue,reviewList/15/ratingValue,reviewList/1
5/serviceRatingValue,reviewList/16/ambienceRatingValue,reviewList/16/foodRatingValue,reviewList/16/ratingValue,reviewList/16/serviceRatingValue,reviewList/17/ambienceRatingValue,reviewList/17/foodRatingValue,reviewList/17/ratingValue,reviewList/17/serviceRatingValue,reviewList/18/ambienceRatingValue,reviewList/18/foodRatingValue,reviewList/18/ratingValue,reviewList/18/serviceRatingValue,reviewList/19/ambienceRatingValue,reviewList/19/foodRatingValue,reviewList/19/ratingValue,reviewList/19/serviceRatingValue
count,1632.0,1632.0,1632.0,1410.0,974.0,1632.0,1592.0,1621.0,1583.0,1583.0,1583.0,1583.0,1520.0,1520.0,1520.0,1520.0,1473.0,1473.0,1473.0,1473.0,1422.0,1422.0,1422.0,1422.0,1391.0,1391.0,1391.0,1391.0,1358.0,1358.0,1358.0,1358.0,1330.0,1330.0,1330.0,1330.0,1298.0,1298.0,1298.0,1298.0,1256.0,1256.0,1256.0,1256.0,1224.0,1224.0,1224.0,1224.0,1191.0,1191.0,1191.0,1191.0,1150.0,1150.0,1150.0,1150.0,1125.0,1125.0,1125.0,1125.0,1100.0,1100.0,1100.0,1100.0,1072.0,1072.0,1072.0,1072.0,1052.0,1052.0,1052.0,1052.0,1034.0,1034.0,1034.0,1034.0,1004.0,1004.0,1004.0,1004.0,981.0,981.0,981.0,981.0,960.0,960.0,960.0,960.0
mean,26.08701,40.169333,-8.305869,51.44539,339526500000.0,72452320000000.0,8.98304,560.998766,9.080227,9.163613,9.147505,9.182565,9.035526,9.185526,9.150658,9.196053,9.093007,9.170401,9.168364,9.239647,9.059072,9.194093,9.159634,9.19128,9.079799,9.181884,9.167505,9.226456,9.185567,9.300442,9.272091,9.301915,9.094737,9.186466,9.173684,9.227068,9.149461,9.212635,9.206086,9.249615,9.136943,9.248408,9.222532,9.256369,9.173203,9.292484,9.262663,9.292484,9.108312,9.232578,9.201511,9.232578,9.222609,9.304348,9.284783,9.307826,9.189333,9.292444,9.262667,9.276444,9.16,9.247273,9.235909,9.289091,9.212687,9.356343,9.322295,9.363806,9.222433,9.285171,9.290875,9.370722,9.160542,9.232108,9.230174,9.295938,9.219124,9.338645,9.305777,9.326693,9.178389,9.288481,9.285423,9.38634,9.139583,9.314583,9.282292,9.360417
std,22.778323,3.825969,1.516901,26.795764,65558130000.0,44465490000000.0,0.553375,888.559617,1.697178,1.76004,1.568759,1.715962,1.724639,1.71038,1.534655,1.751812,1.617722,1.72699,1.513693,1.691652,1.712856,1.687898,1.52628,1.734278,1.605329,1.602378,1.434285,1.639369,1.531597,1.551717,1.394196,1.583401,1.639905,1.663071,1.488071,1.64891,1.629765,1.628418,1.480566,1.647672,1.574813,1.599426,1.400499,1.60414,1.481221,1.486734,1.293042,1.54075,1.604884,1.685972,1.483137,1.655796,1.510339,1.491338,1.334415,1.581285,1.477191,1.501122,1.335637,1.618132,1.530949,1.656801,1.418239,1.528637,1.453772,1.46413,1.281702,1.467391,1.422916,1.490359,1.292956,1.497037,1.583846,1.649399,1.42859,1.600896,1.468782,1.445887,1.290593,1.552294,1.492082,1.555556,1.310313,1.432344,1.521927,1.48155,1.273373,1.4482
min,9.0,37.019355,-9.381659,4.0,33140140000.0,5000.0,4.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0
25%,15.0,38.676524,-9.165105,40.0,351218400000.0,5000000000000.0,8.8,87.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,8.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,8.0,9.0,8.0,8.0,10.0,9.0,10.0,8.0,8.0,9.0,10.0,8.0,8.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0,8.0,10.0,9.0,10.0
50%,20.0,38.696861,-8.629105,60.0,351280000000.0,100000000000000.0,9.1,264.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0
75%,27.0,41.157944,-8.24788,60.0,351920200000.0,100000000000000.0,9.3,653.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0
max,350.0,53.400002,-2.983333,370.0,447880700000.0,100000000000000.0,10.0,11476.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0


In [6]:
data.describe(include='object')

Unnamed: 0,address,chefName,cuisine,currency,customerPhotos/0,customerPhotos/1,customerPhotos/2,customerPhotos/3,customerPhotos/4,customerPhotos/5,customerPhotos/6,customerPhotos/7,customerPhotos/8,customerPhotos/9,customerPhotos/10,customerPhotos/11,customerPhotos/12,customerPhotos/13,customerPhotos/14,customerPhotos/15,customerPhotos/16,customerPhotos/17,customerPhotos/18,customerPhotos/19,description,location,name,offer,openingHours,paymentAccepted/0,paymentAccepted/1,paymentAccepted/2,paymentAccepted/3,paymentAccepted/4,paymentAccepted/5,paymentAccepted/6,paymentAccepted/7,paymentAccepted/8,photo,photos/0,photos/1,photos/2,photos/3,photos/4,photos/5,photos/6,photos/7,photos/8,photos/9,photos/10,photos/11,photos/12,photos/13,photos/14,photos/15,photos/16,photos/17,photos/18,photos/19,photos/20,photos/21,photos/22,photos/23,photos/24,photos/25,photos/26,photos/27,photos/28,photos/29,photos/30,photos/31,photos/32,photos/33,photos/34,photos/35,photos/36,photos/37,photos/38,photos/39,photos/40,photos/41,photos/42,photos/43,photos/44,photos/45,photos/46,photos/47,photos/48,photos/49,photos/50,photos/51,photos/52,photos/53,photos/54,photos/55,photos/56,photos/57,photos/58,photos/59,photos/60,photos/61,photos/62,photos/63,photos/64,photos/65,photos/66,photos/67,photos/68,photos/69,photos/70,photos/71,photos/72,photos/73,photos/74,photos/75,photos/76,photos/77,photos/78,photos/79,photos/80,photos/81,photos/82,photos/83,photos/84,photos/85,photos/86,photos/87,photos/88,photos/89,photos/90,reviewList/0/date,reviewList/0/review,reviewList/0/reviewerName,reviewList/1/date,reviewList/1/review,reviewList/1/reviewerName,reviewList/2/date,reviewList/2/review,reviewList/2/reviewerName,reviewList/3/date,reviewList/3/review,reviewList/3/reviewerName,reviewList/4/date,reviewList/4/review,reviewList/4/reviewerName,reviewList/5/date,reviewList/5/review,reviewList/5/reviewerName,reviewList/6/date,reviewList/6/review,reviewList/6/reviewerName,reviewList/7/date,reviewList/7/review,review
List/7/reviewerName,reviewList/8/date,reviewList/8/review,reviewList/8/reviewerName,reviewList/9/date,reviewList/9/review,reviewList/9/reviewerName,reviewList/10/date,reviewList/10/review,reviewList/10/reviewerName,reviewList/11/date,reviewList/11/review,reviewList/11/reviewerName,reviewList/12/date,reviewList/12/review,reviewList/12/reviewerName,reviewList/13/date,reviewList/13/review,reviewList/13/reviewerName,reviewList/14/date,reviewList/14/review,reviewList/14/reviewerName,reviewList/15/date,reviewList/15/review,reviewList/15/reviewerName,reviewList/16/date,reviewList/16/review,reviewList/16/reviewerName,reviewList/17/date,reviewList/17/review,reviewList/17/reviewerName,reviewList/18/date,reviewList/18/review,reviewList/18/reviewerName,reviewList/19/date,reviewList/19/review,reviewList/19/reviewerName,style,tags/0,tags/1,tags/2,tags/3,tags/4,tags/5,tags/6,tags/7,url,paymentAccepted/9,paymentAccepted/10,photos/91,photos/92,photos/93,photos/94,photos/95,photos/96,photos/97,photos/98,photos/99,photos/100,photos/101,photos/102,photos/103,photos/104,photos/105
count,1632,658,1630,1632,1540,1493,1452,1416,1376,1346,1311,1282,1242,1215,1184,1155,1131,1110,1082,1066,1044,1019,996,973,25,1522,1632,524,1605,1582,1394,1189,713,396,224,88,13,7,1632,1632,1631,1622,1584,1522,1448,1350,1265,1181,1081,953,895,815,753,684,621,594,549,500,457,419,390,359,321,296,269,248,226,200,182,156,148,138,131,122,113,99,92,84,81,70,64,61,55,51,45,44,44,44,41,40,38,38,37,35,32,31,29,27,26,25,25,24,23,20,20,20,19,19,16,14,13,11,10,9,8,8,7,6,5,5,5,5,5,5,5,5,5,4,4,3,1583,1583,1583,1520,1520,1520,1473,1473,1473,1422,1421,1422,1391,1391,1391,1358,1358,1358,1330,1330,1330,1298,1298,1298,1256,1256,1256,1224,1223,1224,1191,1191,1191,1150,1150,1150,1125,1125,1125,1100,1100,1100,1072,1072,1072,1052,1052,1052,1034,1034,1034,1004,1004,1004,981,981,981,960,960,960,1571,1632,450,162,74,45,23,7,2,1632,2,1,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1
unique,1618,611,56,5,1540,1493,1452,1416,1376,1346,1311,1282,1242,1215,1184,1155,1131,1110,1082,1066,1044,1019,996,973,25,39,1629,126,1146,15,18,21,14,11,9,8,5,3,1632,1632,1631,1622,1584,1522,1448,1350,1265,1181,1081,953,895,815,753,684,621,594,549,500,457,419,390,359,321,296,269,248,226,200,182,156,148,138,131,122,113,99,92,84,81,70,64,61,55,51,45,44,44,44,41,40,38,38,37,35,32,31,29,27,26,25,25,24,23,20,20,20,19,19,16,14,13,11,10,9,8,8,7,6,5,5,5,5,5,5,5,5,5,4,4,3,862,1569,1402,1002,1501,1342,1058,1446,1332,1085,1388,1267,1099,1372,1231,1124,1337,1211,1108,1304,1166,1115,1283,1163,1096,1241,1113,1083,1199,1107,1042,1177,1081,1031,1140,1041,996,1109,1019,980,1079,1012,976,1059,960,973,1035,963,945,1020,947,932,990,908,908,968,892,881,944,882,46,46,38,22,20,18,9,2,1,1632,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1
top,"Marina,8500-843,Portimão",Rui Sequeira,Portuguese,EUR,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,Le secret du restaurant Edouard Loubet n'est p...,"Almada, Portugal",Raízes,30% off the 'a la carte' menu,Monday -\r\nTuesday -\r\nWednesday -\r\nThursd...,Credit Card,Mastercard,Visa,Visa,Visa,Visa,Visa Electron,Visa Electron,Visa 
Electron,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/u
pload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/u
pload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,2023-09-27T18:30:00.000Z,.,Ana S.,2023-09-20T18:30:00.000Z,.,Ana S.,2023-09-02T19:00:00.000Z,.,Ana C.,2023-09-09T18:30:00.000Z,.,Ana F.,2023-09-08T19:30:00.000Z,.,Nuno M.,2023-09-02T12:00:00.000Z,.,- -,2023-08-15T18:30:00.000Z,.,Ricardo S.,2023-07-29T19:30:00.000Z,.,- -,2023-08-29T19:30:00.000Z,.,- -,2023-08-26T19:00:00.000Z,.,- -,2023-07-07T12:00:00.000Z,.,- -,2022-12-31T20:00:00.000Z,.,- -,2023-08-19T19:00:00.000Z,Muito bom,- -,2023-06-24T19:00:00.000Z,.,- -,2023-08-10T18:00:00.000Z,.,- -,2023-02-11T20:00:00.000Z,Muito bom,- -,2023-07-23T19:30:00.000Z,.,- -,2023-07-22T19:30:00.000Z,.,- -,2023-06-03T18:00:00.000Z,Muito bom,- -,2023-03-30T19:30:00.000Z,.,- -,After work,Portuguese,Portuguese,Accepting my yums,Gift cards,Gift cards,Accepting my yums,Welcome,Welcome,https://www.thefork.com/restaurant/invictus-r7...,Visa 
Electron,Voucher,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...,https://res.cloudinary.com/tf-lab/image/upload...
freq,3,4,499,1602,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,456,2,261,59,903,568,527,248,150,96,59,5,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,23,6,6,14,7,7,9,13,6,7,18,7,7,5,4,6,8,6,5,6,5,5,7,9,4,7,9,6,11,6,3,7,6,4,6,8,5,5,8,4,8,7,3,8,11,4,6,9,3,9,5,3,7,15,4,6,9,3,5,15,534,397,83,34,24,10,9,4,2,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1


## 3. 🍽️ Extracting Locations <a class="anchor" id="location"></a>
[Back to TOC](#toc)

In [7]:
#Separate the data referring to locations in a new dataframe
locations = data[['name','address', 'location', 'longitude', 'latitude']].copy()

In [8]:
#Take the last comma-separated element of the address to determine the city
locations['location2'] = locations['address'].map(lambda x: str(x).split(',')[-1])

#Take the first comma-separated element of "location" to find the city -> this matches how the data was extracted
locations['location3'] = locations['location'].map(lambda x: str(x).split(',')[0])
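
One detail worth noting: `str(x)` turns a missing value into the literal string `'nan'`. The sketch below shows an alternative based on the pandas `.str` accessor, which keeps missing values missing. The toy frame reuses the most frequent `address` and `location` values from the summary above; the column names mirror the cell above, and the added `.str.strip()` is an assumption about desired whitespace handling.

```python
import pandas as pd

toy = pd.DataFrame({
    "address": ["Marina,8500-843,Portimão", None],
    "location": ["Almada, Portugal", None],
})

# Last comma-separated token of the address.
toy["location2"] = toy["address"].str.split(",").str[-1].str.strip()

# First comma-separated token of "location".
toy["location3"] = toy["location"].str.split(",").str[0].str.strip()

print(toy)
```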

In [9]:
#Rows without a "location" value are restaurants not based in Portugal: remove them from the original data
#and keep a new dataframe with only the Portuguese location info
foreign_indexes = list(locations[locations['location'].isna()].index)
data.drop(foreign_indexes, inplace=True, axis=0)
pt_locations = locations[locations['location'].notna()].copy()

In [10]:
#Preprocess the address so that latitude and longitude coordinates can be looked up from it
pt_locations['address'] = pt_locations['address'].apply(preprocess_address)

In [11]:
# #Find latitude and longitude coordinates from address
# pt_locations[['latitude', 'longitude']] = pt_locations['address'].apply(lambda x: pd.Series(find_coordinates(x)))

In [12]:
#Save the data so it can be reused afterwards if necessary
# pt_locations.to_csv('data/portuguese_locations.csv')

In [13]:
#SAFE CELL: intermediate step used while the notebook is not finalized
pt_locations2 = pd.read_csv('data/portuguese_locations.csv', index_col=0)
pt_locations2.rename(columns={'latitude': 'latitude2', 'longitude': 'longitude2'}, inplace=True)
pt_locations_temp = pt_locations.merge(right=pt_locations2, on=['name', 'address', 'location', 'location2'], how='inner')
pt_locations_temp.drop_duplicates(keep='first', inplace=True)
pt_locations_temp.index = pt_locations.index
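
Reassigning the index by position only works when the merged frame has exactly the same length and row order as `pt_locations`. A minimal sketch of an alternative, on invented toy data, carries the index through the merge and lets pandas check that the keys are unique on both sides. The frames, keys, and coordinates below are hypothetical.

```python
import pandas as pd

# Toy stand-ins for pt_locations (indexed by restaurantID) and the saved coordinates.
left = pd.DataFrame(
    {"name": ["A", "B"], "address": ["x", "y"]},
    index=pd.Index(["101", "102"], name="restaurantID"),
)
right = pd.DataFrame(
    {"name": ["B", "A"], "address": ["y", "x"], "latitude2": [41.1, 38.7]}
)

merged = (
    left.reset_index()  # keep restaurantID as a regular column during the merge
    .merge(right, on=["name", "address"], how="inner", validate="one_to_one")
    .set_index("restaurantID")
)

print(merged)
```

With `validate="one_to_one"`, pandas raises a `MergeError` if either side has duplicated keys, so duplicated rows surface immediately.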

In [14]:
data[['address', 'latitude', 'longitude', 'location', 'city']] = pt_locations_temp[['address', 'latitude2', 'longitude2', 'location2', 'location3_y']].copy()

## 4. 🍽️ Handling Missing Values and Standardizing Data <a class="anchor" id="missing-val"></a>
[Back to TOC](#toc)

In [15]:
#Find the columns with nans
null_columns = data.isnull().any()
null_columns = null_columns[null_columns].index

#### Chef Name

In [16]:
#Standardize the Chef Names
data['chefName'] = np.where(data['chefName'].isnull(), 'Not Applicable', data['chefName'])

In [17]:
data['chefes'] = data['chefName'].apply(get_chef_name)
data['chefName1'] = data['chefes'].apply(lambda x: preprocess_chefs(0, x))
data['chefName2'] = data['chefes'].apply(lambda x: preprocess_chefs(1, x))
data['chefName3'] = data['chefes'].apply(lambda x: preprocess_chefs(2, x))

#### Cuisine

In [18]:
data['cuisine'].unique()

array(['International food', 'Japanese', 'Indian', 'Portuguese',
       'Italian', 'Pizzeria', 'International', 'Mediterranean', 'Fusion',
       'Nepalese', 'European', 'Seafood', 'Vegan cuisine',
       'Traditional cuisine', 'Steakhouse', 'Regional', 'Greek',
       'Vegetarian', 'Varied', 'Grilled', 'Thai', 'Mexican', 'Local',
       'Asian', 'French', 'South American', 'Pub grub', 'Brazilian',
       'Venezuelan', 'Peruvian', 'Meat Cuisine', 'Korean', 'American',
       'Toscano', 'Spanish', 'African', 'Syrian', 'Iranian', 'Lebanese',
       'Cantonese', 'Tibetan', 'Vietnamese', 'Argentinian', 'Pugliese',
       'Chinese'], dtype=object)

In [19]:
#Standardize the cuisine types
data['cuisine'] = data['cuisine'].replace({'International food': 'International', 
                                           'Vegan cuisine': 'Vegan',
                                           'Meat Cuisine': 'Meat',
                                           'Toscano': 'Italian', 
                                           'Pugliese': 'Italian', 
                                           'Traditional cuisine': 'Traditional', 
                                           'Regional': 'Portuguese', 
                                           'Local': 'Portuguese', 
                                           'Cantonese': 'Chinese',
                                           'Pub grub': 'International'})
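
`Series.replace` silently ignores dictionary keys that do not occur in the column, so a misspelled key goes unnoticed. The sketch below, on a small invented series with a deliberately misspelled key, shows one way to list mapping keys that match nothing before applying the mapping.

```python
import pandas as pd

# Toy data; the values are a subset of the cuisine labels listed above.
cuisine = pd.Series(["International food", "Pub grub", "Toscano", "Thai"])

# "Pub grup" is misspelled on purpose to show what the check catches.
mapping = {
    "International food": "International",
    "Toscano": "Italian",
    "Pub grup": "International",
}

# Keys that never appear in the column would be ignored by replace().
unused = sorted(set(mapping) - set(cuisine.unique()))
print("Mapping keys with no match:", unused)

cleaned = cuisine.replace(mapping)
print(cleaned.tolist())
```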

#### Phone Number

In [20]:
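#Extract and format the phone number
#Phones are stored as floats, so their string form ends in ".0"
data['phone'] = data['phone'].astype(str)
#Drop the trailing ".0"; missing values become 'Not Available'
data['phone'] = data['phone'].apply(lambda x: x[:-2] if x != 'nan' else 'Not Available')
#Keep the 9-digit national number of 12-digit phones starting with 351; anything else becomes 'Not Available'
data['phone'] = data['phone'].apply(lambda x: x[3:12] if len(x) == 12 and x.startswith('351') and x != 'Not Available' else 'Not Available')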
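
The same rule can be written as one small function, which is easier to test in isolation. This is a sketch under the assumption that phones arrive as floats, as in the summary statistics above; `format_pt_phone` is a hypothetical helper name and the sample numbers are illustrative.

```python
import numpy as np
import pandas as pd

def format_pt_phone(value):
    """Return the 9-digit national number of a Portuguese phone, else 'Not Available'."""
    if pd.isna(value):
        return "Not Available"
    digits = str(int(value))  # 351218400000.0 -> "351218400000"
    if len(digits) == 12 and digits.startswith("351"):
        return digits[3:]  # strip the 351 country code
    return "Not Available"

# Illustrative values: a Portuguese number, a foreign number, and a missing one.
phones = pd.Series([351218400000.0, 447880700000.0, np.nan])
formatted = phones.apply(format_pt_phone)
print(formatted.tolist())
```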

#### Dealing with Schedule Format

In [21]:
data['openingHours'] = data['openingHours'].fillna('Not Available')

In [22]:
#Cleaning the openingHours column
data['schedule'] = data['openingHours'].apply(clean_openinghours)

#### Generating Promotions

In [23]:
#Generate promotions based on the schedule of the restaurant
data['promotions'] = data['schedule'].apply(lambda x: promotion_generator(x, 0.5))

#### Standardizing Style

In [24]:
data['style'] = data['style'].fillna('Not Available')

In [25]:
data['style'].unique()

array(['Bachelor party', 'After work', 'Birthday', 'Good for ceremonies',
       'All you can eat buffet', 'Good for families',
       'Contemporary cuisine', 'Author', 'Good for groups',
       'Central location', 'Bistro', 'Not Available', 'Fine Dining',
       'Cosy', 'Live music', 'Creative', 'Brunch', 'Lunch', 'Traditional',
       'Bistronomic', 'Good for a business lunch', 'Terrace',
       'Homemade cuisine', 'Brasserie', 'Garden', 'Oceanfront',
       'Great view', 'Café', 'Trendy', 'Romantic', 'Restaurant hotel',
       'Communion', 'Breakfast', 'Winter terrace', 'Nightlife',
       'With friends', 'Street Food', 'Organic', 'Healthy.old', 'Ethnic',
       'From market', 'Design', 'Wine bar'], dtype=object)

In [26]:
#Re-define the restaurant style categories
data['style'] = data['style'].replace({'Bachelor party': 'Festivities', 
                                       'Good for ceremonies': 'Festivities',
                                       'Birthday': 'Festivities',
                                       'Cosy': 'Chill Out',
                                       'Live music': 'Chill Out',
                                       'Creative': 'Chill Out',
                                       'After work': 'Chill Out',
                                       'Terrace': 'Chill Out',
                                       'Garden': 'Chill Out', 
                                       'Trendy': 'Chill Out',
                                       'Winter terrace': 'Chill Out', 
                                       'Nightlife': 'Chill Out',
                                       'Wine bar': 'Chill Out',
                                       'All you can eat buffet': 'Buffet',
                                       'Good for families': 'Family',
                                       'Romantic': 'Family', 
                                       'Author': 'Fine Dining',
                                       'Bistro': 'Friends',
                                       'Bistronomic': 'Friends', 
                                       'Brasserie': 'Friends', 
                                       'With friends': 'Friends', 
                                       'Good for a business lunch': 'Meetings', 
                                       'Oceanfront': 'View',
                                       'Great view': 'View',
                                       'Café': 'Café',
                                       'Communion': 'Festivities', 
                                       'Good for groups': 'Groups',
                                       'Organic': 'Healthy',
                                       'From market': 'Healthy', 
                                       'Homemade cuisine': 'Homemade', 
                                       'Traditional': 'Homemade',
                                       'Design': 'Modern', 
                                       'Contemporary cuisine': 'Modern',
                                       'Restaurant hotel': 'Modern',
                                       'Healthy.old': 'Healthy', 
                                       'Ethnic': 'Ethnic', 
                                       'Street Food': 'Street Food',
                                       'Lunch': 'Casual',
                                       'Central location': 'Central Location', 
                                       })

#### Defining whether the restaurant has outdoor seating

In [27]:
#Randomly assign, independently for each restaurant, a 0/1 flag for whether it has an outdoor area
data['outdoor_area'] = np.random.randint(0, 2, size=len(data))
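
A single scalar draw assigned to a column gives every restaurant the same value, so the flag needs one draw per row. The sketch below does this on an invented frame with a seeded NumPy generator, which also makes the synthetic column reproducible between runs. The seed value and the toy frame are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen only for reproducibility

# Toy frame standing in for the restaurant data.
toy = pd.DataFrame({"name": [f"restaurant_{i}" for i in range(1000)]})

# One independent 0/1 draw per row (the upper bound 2 is exclusive).
toy["outdoor_area"] = rng.integers(0, 2, size=len(toy))

print(toy["outdoor_area"].value_counts())
```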

#### Defining Random Occupation of the Restaurant

In [28]:
#Generate the current occupation of the restaurant
data['current_occupation'] = data.apply(lambda x: generate_current_occupation(x), axis=1)

## 5. 🍽️  Section-Based Analysis and Treatment <a class="anchor" id="sections"></a>
[Back to TOC](#toc)

### 5.1. Photos

In [29]:
#Extracting all columns referring to photos
photos = ['photo', 'customerPhotos/0','customerPhotos/1', 'customerPhotos/2', 'customerPhotos/3', 'customerPhotos/4', 'customerPhotos/5', 'customerPhotos/6',
 'customerPhotos/7', 'customerPhotos/8', 'customerPhotos/9', 'customerPhotos/10', 'customerPhotos/11', 'customerPhotos/12', 'customerPhotos/13',
 'customerPhotos/14', 'customerPhotos/15', 'customerPhotos/16', 'customerPhotos/17', 'customerPhotos/18', 'customerPhotos/19', 'photos/1',
 'photos/2', 'photos/3', 'photos/4', 'photos/5', 'photos/6', 'photos/7','photos/8','photos/9','photos/10', 'photos/11', 'photos/12', 'photos/13', 'photos/14',
 'photos/15','photos/16', 'photos/17', 'photos/18', 'photos/19', 'photos/20', 'photos/21', 'photos/22', 'photos/23', 'photos/24', 'photos/25', 'photos/26', 'photos/27',
 'photos/28', 'photos/29', 'photos/30', 'photos/31', 'photos/32', 'photos/33', 'photos/34', 'photos/35', 'photos/36', 'photos/37', 'photos/38', 'photos/39', 'photos/40',
 'photos/41', 'photos/42', 'photos/43', 'photos/44', 'photos/45', 'photos/46', 'photos/47', 'photos/48', 'photos/49', 'photos/50', 'photos/51', 'photos/52','photos/53',
 'photos/54', 'photos/55', 'photos/56', 'photos/57', 'photos/58', 'photos/59', 'photos/60', 'photos/61', 'photos/62', 'photos/63', 'photos/64', 'photos/65', 'photos/66',
 'photos/67','photos/68','photos/69', 'photos/70', 'photos/71', 'photos/72', 'photos/73', 'photos/74', 'photos/75', 'photos/76', 'photos/77', 'photos/78', 'photos/79',
 'photos/80', 'photos/81', 'photos/82', 'photos/83', 'photos/84', 'photos/85', 'photos/86', 'photos/87', 'photos/88', 'photos/89', 'photos/90',  'photos/91', 'photos/92',
 'photos/93', 'photos/94', 'photos/95', 'photos/96', 'photos/97', 'photos/98', 'photos/99', 'photos/100', 'photos/101', 'photos/102', 'photos/103', 'photos/104', 'photos/105']
df_photos = data[photos].copy()

### 5.2. Reviews

In [30]:
#Extracting all columns referring to reviews
#Each of the 20 reviews (reviewList/0 to 19) exposes the same seven fields
reviews = [f'reviewList/{i}/{field}'
           for i in range(20)
           for field in ('ambienceRatingValue', 'date', 'foodRatingValue', 'ratingValue',
                         'review', 'reviewerName', 'serviceRatingValue')]
df_reviews = data[reviews].copy()

In [31]:
#Grouping the review columns by rating perspective (ambience, food, service)
ambience = [col for col in df_reviews.columns if 'ambienceRatingValue' in col]
food = [col for col in df_reviews.columns if 'foodRatingValue' in col]
service = [col for col in df_reviews.columns if 'serviceRatingValue' in col]

In [32]:
#Creating three new columns regarding ratings per category using the mean values of the user-based ratings
df_reviews['ambienceRatingSummary'] = df_reviews[ambience].mean(axis=1)
df_reviews['foodRatingSummary'] = df_reviews[food].mean(axis=1)
df_reviews['serviceRatingSummary'] = df_reviews[service].mean(axis=1)

In [33]:
data[['ambienceRatingSummary', 'foodRatingSummary', 'serviceRatingSummary']] = df_reviews[['ambienceRatingSummary', 'foodRatingSummary', 'serviceRatingSummary']].copy()
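
The per-category summaries are row-wise means over the matching review columns. A minimal sketch on a toy frame (hypothetical values, only two reviews per restaurant) shows how `mean(axis=1)` treats missing ratings: `NaN` entries are skipped, and a restaurant with no rating at all in a category ends up with `NaN`.

```python
import numpy as np
import pandas as pd

# Toy review frame (hypothetical ratings) with the same column naming scheme
toy_reviews = pd.DataFrame({
    'reviewList/0/foodRatingValue': [8.0, 6.0],
    'reviewList/1/foodRatingValue': [10.0, np.nan],
    'reviewList/0/serviceRatingValue': [7.0, np.nan],
    'reviewList/1/serviceRatingValue': [9.0, np.nan],
}, index=['r1', 'r2'])

# Select the columns of each category by their suffix
food_cols = [c for c in toy_reviews.columns if c.endswith('foodRatingValue')]
service_cols = [c for c in toy_reviews.columns if c.endswith('serviceRatingValue')]

# Row-wise mean; NaN values are ignored by default (skipna=True)
food_mean = toy_reviews[food_cols].mean(axis=1)
service_mean = toy_reviews[service_cols].mean(axis=1)
```

Here `r2` has a food mean of 6.0 (one rating) and a missing service mean, so the summary columns can still contain `NaN` that a later step may need to handle.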

### 5.3. Payment Methods

In [34]:
#Extracting all columns referring to payment types accepted
payments = [f'paymentAccepted/{i}' for i in range(11)]
df_payments = data[payments].copy()

In [35]:
#Collecting the non-missing payment methods of each restaurant into a single list
df_payments.fillna(0, inplace=True)
df_payments['paymentAcceptedSummary'] = df_payments.apply(lambda row: [row[col] for col in df_payments.columns if row[col] != 0], axis=1)

In [36]:
data['paymentAcceptedSummary'] = df_payments['paymentAcceptedSummary'].copy()

In [37]:
#Counting how many restaurants accept each payment method
data_exploded_pay = data['paymentAcceptedSummary'].explode()
payment_counts = data_exploded_pay.value_counts()
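
The two steps above (collapsing the wide payment columns into one list per restaurant, then counting labels) can be sketched on a toy frame. This variant drops missing entries with `dropna()` instead of filling them with 0 first. That is an alternative formulation, not the notebook's own code, and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame with the same wide layout (hypothetical values)
toy_pay = pd.DataFrame({
    'paymentAccepted/0': ['Visa', 'Cash Only'],
    'paymentAccepted/1': ['Mastercard', np.nan],
    'paymentAccepted/2': [np.nan, np.nan],
}, index=['r1', 'r2'])

# One list of accepted methods per restaurant, skipping missing cells
summary = pd.Series(
    [row.dropna().tolist() for _, row in toy_pay.iterrows()],
    index=toy_pay.index,
)

# explode() yields one row per (restaurant, method) pair, ready for counting
counts = summary.explode().value_counts()
```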

In [38]:
#Replace 'Cash Only' with 'Cash'
data['paymentAcceptedSummary'] = [x if 'Cash Only' not in x else [item.replace('Cash Only', 'Cash') for item in x] for x in data['paymentAcceptedSummary']]

#Replace 'Elo', 'Cabal Credit Card', 'Cabal Debit Card', 'MobilePay', 'Rede Shop', 'Clave Debit Card', 'Clave Credit Card' and 'EC card' with 'MBWay'
data['paymentAcceptedSummary'] = [ [item if item not in ['Elo', 'Cabal Credit Card', 'Cabal Debit Card', 'MobilePay', 'Rede Shop',
                                                         'Clave Debit Card', 'Clave Credit Card', 'EC card'] else 'MBWay' for item in x]
                                  for x in data['paymentAcceptedSummary']]

#Eliminate duplicates by converting each list to a set
data['paymentAcceptedSummary'] = [set(x) for x in data['paymentAcceptedSummary']]
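
The same label clean-up can also be expressed with a lookup dictionary, which keeps all renames in one place. The sketch below is an illustrative alternative: `LABEL_MAP` and `normalize_payments` are hypothetical names, and the mapping lists only a few of the labels handled above.

```python
# Hypothetical lookup table; only a subset of the labels is shown
LABEL_MAP = {
    'Cash Only': 'Cash',
    'Elo': 'MBWay',
    'MobilePay': 'MBWay',
}

def normalize_payments(methods):
    """Map each label through LABEL_MAP and drop duplicates, keeping first-seen order."""
    mapped = [LABEL_MAP.get(m, m) for m in methods]
    # dict.fromkeys preserves insertion order while removing repeats
    return list(dict.fromkeys(mapped))

example = normalize_payments(['Cash Only', 'Elo', 'Visa', 'MobilePay'])
```

Unlike a set, the returned list has a stable order, which can make the exported CSV easier to compare between runs.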

### 5.4. Tags

In [39]:
#Extracting all columns referring to tags
df_tags = data[['tags/0','tags/1', 'tags/2', 'tags/3', 'tags/4', 'tags/5', 'tags/6', 'tags/7',]].copy()
df_tags

| restaurantID | tags/0 | tags/1 | tags/2 | tags/3 | tags/4 | tags/5 | tags/6 | tags/7 |
|---|---|---|---|---|---|---|---|---|
| 730060 | YUMS x2 | International food | | | | | | |
| 805058 | International food | | | | | | | |
| 712669 | Japanese | | | | | | | |
| 576521 | Indian | | | | | | | |
| 802974 | Portuguese | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501687 | Portuguese | | | | | | | |
| 724143 | Italian | | | | | | | |
| 418275 | Portuguese | | | | | | | |
| 612699 | INSIDER | Steakhouse | | | | | | |


In [40]:
#Flagging restaurants that have "MICHELIN" among their tags (1 = yes, 0 = no)
df_tags['michelin'] = df_tags.apply(lambda row: 1 if 'MICHELIN' in row.values else 0, axis=1)

In [41]:
data['michelin'] = df_tags['michelin'].copy()
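
A row-wise `apply` works for this flag, but the same check can be written as a vectorized comparison. The following is a minimal sketch on a toy tag frame with hypothetical values. It assumes, as in the cell above, that the tag is spelled exactly `'MICHELIN'`.

```python
import numpy as np
import pandas as pd

# Toy tag frame (hypothetical values) with the same wide layout
toy_tags = pd.DataFrame({
    'tags/0': ['MICHELIN', 'Portuguese', 'INSIDER'],
    'tags/1': ['French', np.nan, 'Steakhouse'],
}, index=['r1', 'r2', 'r3'])

# Compare every cell to the tag, then check whether any cell in the row matched
michelin = toy_tags.eq('MICHELIN').any(axis=1).astype(int)
```

Missing cells compare as `False`, so no `fillna` is needed before the comparison.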

## 6. 🍽️  Final Feature Extraction <a class="anchor" id="feat-ex"></a>
[Back to TOC](#toc)

In [42]:
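#Selecting the final set of features
#Note: 'photo' is listed twice below, which yields two 'photo' columns
df_workable = data[['url', 'name',  'address', 'photo', 'averagePrice', 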
                    'chefName1', 'chefName2', 'chefName3', 
                    'cuisine', 'michelin',
                    'description',  'isBookable', 'maxPartySize',
                    'schedule', 'promotions', 'phone', 'photo', 
                    'ratingValue', 'reviewCount', 'style',
                    'latitude', 'longitude', 'location', 'city', 
                    'ambienceRatingSummary', 'foodRatingSummary', 'serviceRatingSummary', 'paymentAcceptedSummary', 'outdoor_area','current_occupation']].copy()

## 7. 🍽️  Integrating Menu Data <a class="anchor" id="menu"></a>
[Back to TOC](#toc)

In [43]:
menu_df = pd.read_csv('data/menus_with_translations.csv')

In [44]:
#Left join on the restaurant URL, so restaurants without menu data are kept
df_workable = df_workable.merge(menu_df, left_on = 'url', right_on='input', how='left')
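
With a left join it can be useful to check how many restaurants found no menu and to guard against duplicated keys on the menu side. The sketch below uses toy frames with hypothetical values. The `validate='many_to_one'` argument assumes each URL appears at most once in the menu table, which has not been verified for the real file.

```python
import pandas as pd

# Toy stand-ins for the restaurant and menu tables (hypothetical values)
left = pd.DataFrame({'url': ['u1', 'u2', 'u3'], 'name': ['A', 'B', 'C']})
menus = pd.DataFrame({'input': ['u1', 'u3'], 'menu': ['m1', 'm3']})

# indicator=True adds a '_merge' column telling where each row came from;
# validate raises a MergeError if a URL is repeated in the menu table
merged = left.merge(menus, left_on='url', right_on='input', how='left',
                    validate='many_to_one', indicator=True)

# Restaurants that had no matching menu row
n_unmatched = (merged['_merge'] == 'left_only').sum()

# Drop the helper columns once the check is done
merged = merged.drop(columns=['input', '_merge'])
```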

In [45]:
df_workable.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1522 entries, 0 to 1521
Data columns (total 34 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   url                     1522 non-null   object 
 1   name                    1522 non-null   object 
 2   address                 1522 non-null   object 
 3   photo                   1522 non-null   object 
 4   averagePrice            1522 non-null   int64  
 5   chefName1               1522 non-null   object 
 6   chefName2               1522 non-null   object 
 7   chefName3               1522 non-null   object 
 8   cuisine                 1522 non-null   object 
 9   michelin                1522 non-null   int64  
 10  description             11 non-null     object 
 11  isBookable              1522 non-null   bool   
 12  maxPartySize            1311 non-null   float64
 13  schedule                1522 non-null   object 
 14  promotions              1522 non-null   

In [46]:
#Dropping the join key inherited from the menu data, since it duplicates 'url'
df_workable.drop(['input'], axis=1, inplace=True)

In [47]:
df_workable.to_csv('data/preprocessed_restaurant_data.csv', index=False)

In [54]:
df_workable = pd.read_csv('data/preprocessed_restaurant_data.csv')

In [55]:
df_workable['promotions']

0                                               No Offers
1                                               No Offers
2                                               No Offers
3                                               No Offers
4       {'promotion_type': '10% off', 'day_of_week': '...
                              ...                        
1517                                            No Offers
1518                                            No Offers
1519    {'promotion_type': 'Happy Hour', 'day_of_week'...
1520                                            No Offers
1521                                            No Offers
Name: promotions, Length: 1522, dtype: object

In [59]:
#Promotions are read back from the CSV as strings; parse one into a dictionary
import ast
promotion = df_workable.loc[4, 'promotions']
promotion = ast.literal_eval(promotion)
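
Since most rows hold the plain string `'No Offers'`, calling `ast.literal_eval` on the whole column would fail on those entries. A small helper along the following lines could parse only the dictionary-like strings. The name `parse_promotion` and the sample values are hypothetical, and the `'day_of_week'` value in particular is made up because the real one is truncated in the output above.

```python
import ast

def parse_promotion(value):
    """Return a dict for a stringified promotion, or None when there is no offer."""
    # Missing values and the 'No Offers' placeholder carry no promotion
    if not isinstance(value, str) or value == 'No Offers':
        return None
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # The string is not a valid Python literal
        return None
    return parsed if isinstance(parsed, dict) else None

# Hypothetical sample; only the 'promotion_type' value appears in the data shown
sample = "{'promotion_type': '10% off', 'day_of_week': 'Monday'}"
parsed_sample = parse_promotion(sample)
```

It could then be applied column-wise, for example `df_workable['promotions'].apply(parse_promotion)`, assuming the column keeps the format shown above.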

In [62]:
df_workable[df_workable['name'] == 'Invictus']['promotions']

0    No Offers
Name: promotions, dtype: object