# FoodBER: *For Berliners on the go!*

Berliners are a busy bunch who want quick meals and good vibes without breaking the bank. Ever the fans of good recommendations, they cannot be arsed to spend hours going over restaurants to decide which place offers the most unique culinary experience they so desire. 

Our program allows anyone to enter the cuisine they would like to try, the price range they are willing to pay and the minimum rating they demand. The program then recommends restaurants that match this input and provides additional information, such as a link to each restaurant's TripAdvisor page, to make the choice easier for the user.


### Let's begin!

# Module 1
### Ask for input

The user will be asked to provide their preferred cuisine style, price and rating.

In [2]:
# Mount Google Drive so the notebook can read the dataset stored there
# google.colab is the package behind Google Colaboratory, a hosted Jupyter notebook service developed by Google Research that runs in the cloud
# In Colab we can write and execute Python code, and upload and share notebooks (File.ipynb) with others using a Google account
# External datasets can be imported into the notebook environment

from google.colab import drive
drive.mount('/content/drive')

# "from ... import" is Python's import syntax: "from" names the package (google.colab) and "import" names the module to load from it (drive)
# drive.mount() makes the user's Google Drive available under the given path, /content/drive, once the user authorizes access

Mounted at /content/drive


In [55]:
# import modules
import pandas as pd
from google.colab import data_table
data_table.enable_dataframe_formatter()

# The pandas library is imported under the short name pd, the standard alias for pandas
# data_table is the Colab extension that displays pandas DataFrames as interactive tables
# This allows us to filter and sort DataFrames dynamically in the output
# This suits our purpose of exploring the dataset to provide restaurant recommendations according to the input provided by the end user
# Calling data_table.enable_dataframe_formatter() after the import turns on the interactive display for DataFrame outputs

In [56]:
# read in dataset
berlin = pd.read_csv('/content/drive/MyDrive/E1326 Python Programming/Project: HertiEats/TAdata/TA_berlin_clean.csv')
berlin.head()
# pd.read_csv reads the comma-separated values file at the given path and stores it as the DataFrame "berlin"
# .head() returns the first n rows of the DataFrame (n = 5 by default)
# It is a quick way to check that the object contains the expected data
# Only 5 rows are shown here, but "berlin" still holds the full dataset, which is later filtered according to the end user's input

Unnamed: 0,Name,Cuisine,Ranking,Rating,Price,ReviewCount,Reviews,URL_TA
0,FACIL,"['International', 'European', 'Gluten Free Opt...",1.0,4.5,High,687.0,"[['What a way to turn 60!', 'Only to recommend...",https://www.tripadvisor.com/Restaurant_Review-...
1,La Gondola Due,"['Italian', 'Mediterranean', 'European', 'Vege...",2.0,4.5,Medium,282.0,"[['Great food!', 'Very good'], ['12/17/2017', ...",https://www.tripadvisor.com/Restaurant_Review-...
2,Ristorante A Mano,"['Italian', 'Mediterranean', 'European', 'Vege...",3.0,4.5,Medium,1501.0,"[['Sensational!', 'Delicious food and lovely s...",https://www.tripadvisor.com/Restaurant_Review-...
3,Restaurant Bieberbau,"['German', 'Central European', 'Vegetarian Fri...",4.0,4.5,High,621.0,"[['Excellent restaurant', 'Perfect Birthday tr...",https://www.tripadvisor.com/Restaurant_Review-...
4,El Loco,"['Mexican', 'Central American', 'Vegetarian Fr...",5.0,5.0,Medium,171.0,"[['Join the Loco family 😉', 'Best sweet potato...",https://www.tripadvisor.com/Restaurant_Review-...


In [57]:
# one version of the input function
input("Welcome to FoodBER. You must be hungry! Press enter to start.")

# input() displays its prompt argument and waits for the user to type something
# It returns what the user typed as a string, without the trailing newline
# The first prompt is answered simply by pressing the Enter key

u_cuisine = input("What type of cuisine do you like? ") # ideally show list of cuisines available
u_price = input("What is your price range? Enter High, Medium or Low: ") # ideally show list of price points
u_rating = float(input("What is your rating preference? Choose from 5.0 to 1.0: "))

# The variable "u_cuisine" stores the user's cuisine preference
# The variable "u_price" stores the user's price preference
# The variable "u_rating" stores the minimum restaurant rating the user wants
# The dataset stores ratings as numbers (e.g. 4.5) rather than as "4.5 stars"
# To compare them, the input string is converted to a number by wrapping the input() call in float()

Welcome to FoodBER. You must be hungry! Press enter to start.
What type of cuisine do you like? German
What is your price range? Enter High, Medium or Low: Medium
What is your rating preference? Choose from 5.0 to 1.0: 4
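The `float(...)` conversion above raises a `ValueError` if the user types something like "four", and a price typed as "medium" (lower case) would silently match no restaurants, because prices are compared exactly. A minimal sketch of re-prompting until the answer is valid follows; the helper names `ask_choice` and `ask_rating` and the `read` parameter (which lets the prompts be tested without a keyboard) are our own assumptions, not part of the original notebook.

```python
def ask_choice(prompt, choices, read=input):
    """Ask until the answer matches one of `choices`, ignoring case and surrounding spaces."""
    lookup = {choice.lower(): choice for choice in choices}
    while True:
        answer = read(prompt).strip().lower()
        if answer in lookup:
            return lookup[answer]  # return the canonical spelling, e.g. "Medium"
        print("Please enter one of: " + ", ".join(choices))


def ask_rating(prompt, read=input, low=1.0, high=5.0):
    """Ask until the answer is a number between `low` and `high`."""
    while True:
        raw = read(prompt).strip()
        try:
            value = float(raw)
        except ValueError:
            print("Please enter a number, e.g. 4 or 4.5")
            continue
        if low <= value <= high:
            return value
        print(f"Please enter a value between {low} and {high}")


# Usage in the notebook (these calls prompt the user interactively):
# u_price = ask_choice("What is your price range? Enter High, Medium or Low: ", ["High", "Medium", "Low"])
# u_rating = ask_rating("What is your rating preference? Choose from 5.0 to 1.0: ")
```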


In [58]:
# Alternative display for the input
#@title Welcome to HertiEATS! { display-mode: "both" }
#@markdown Pick some choices below to get started:

Cuisine = 'German' #@param ["Asian", "German", "Mediterranean", "American", "Middle East", "Sushi", "Cafe", "Central European", "European", "Vegan Options", "Vegetarian Friendly", "International", "Gluten Free Options", "Healthy", "Soups", "Italian", "Vietnamese", "Greek", "Japanese", "Indonesian", "Gastropub", "Fast Food", "Fusion", "Halal", "French", "Korean", "Turkish", "Israeli", "Ukrainian", "Russian", "Eastern European", "Austrian", "Indian", "Contemporary", "Bar", "Latin", "Chilean", "Steakhouse", "Barbecue", "Delicatessen", "Diner", "Chinese", "Brazilian", "Pizza", "Choose Below", "Mexican"]
Price = 'Medium' #@param ["Choose Below", "High", "Medium", "Low"]
Rating = 4.0 #@param ["1.0", "2.0", "3.0", "4.0", "5.0", "0-5"] {type:"raw"}

print(Cuisine) 
print(Price)
print(Rating)

German
Medium
4.0
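Because the cuisine matching in Module 3 is a plain substring test, any stray whitespace in a selected value (for example an option typed as "Chinese " with a trailing space) makes the match fail. Calling `.strip()` on the form values is a cheap safeguard; the example cuisine string below is made up but follows the format shown in the dataset preview.

```python
# A cuisine string in the format stored in the dataset (example value)
restaurant_cuisine = "['Chinese', 'Asian', 'Vegetarian Friendly']"

raw_choice = "Chinese "             # form value with a trailing space
print(raw_choice in restaurant_cuisine)    # False: the space is followed by a quote in the data

clean_choice = raw_choice.strip()   # remove leading and trailing whitespace
print(clean_choice in restaurant_cuisine)  # True
```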


# Module 2

### Creating the database

Our program will call on a database of restaurants to serve as a point of reference. The database is a pandas DataFrame built from a TripAdvisor dataset of Berlin restaurants and includes each restaurant's name, cuisine, ranking, rating, price range, reviews and TripAdvisor link. The aim is to also include information on distance, waiting time, indoor-outdoor seating options and menu recommendations inclusive of dietary restrictions where possible.


In [59]:
# IMPORTANT: Please note that the dataset used in Module 1 (TA_berlin_clean.csv) is the result of Module 2
# The raw dataset (TA_berlin.csv) is cleaned and edited in the cells below before being saved as the "TA_berlin_clean.csv" file

# read in dataset and clean data for processing
# use the unnamed first column as the row index instead of reading it as a data column
data = pd.read_csv('/content/drive/MyDrive/E1326 Python Programming/Project: HertiEats/TAdata/TA_berlin.csv', index_col=0)

# The object "data" is created by reading the raw CSV with pd.read_csv
# index_col=0 tells pandas to use the first column of the file (an unnamed index column) as the row labels, so it does not appear as an extra "Unnamed: 0" column


# drop the City and ID_TA columns
data = data.drop(['City','ID_TA'], axis=1)
# drop() removes the columns labeled "City" and "ID_TA"; axis=1 tells pandas that these labels refer to columns rather than rows

# drop rows with missing values NaN
data = data.dropna(subset=['Name', 'Cuisine Style', 'Rating', 'Price Range'])
# Similarly, we remove rows with missing values using the pandas .dropna() function
# The subset argument limits the check for NaN cells to the columns "Name", "Cuisine Style", "Rating" and "Price Range"

# recode price range column
data = data.replace({'Price Range': {'$$$$': 'High', '$$ - $$$' : 'Medium', '$':'Low'}})
# The Price Range column in our DataFrame uses $ symbols, but we want to present the price range to our end users as High, Medium or Low
# To do this we use .replace() with a nested dictionary: the outer key names the column, and each inner "x: y" pair replaces the value x with the value y

# rename columns
data = data.rename({'Cuisine Style': 'Cuisine', 'Price Range': 'Price', 'Number of Reviews': 'ReviewCount'}, axis='columns')
# For ease of recall and aesthetics we renamed the columns of our DataFrame using the pandas function .rename(), with axis='columns' so that the mapping applies to column labels
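The four cleaning steps above can also be chained into a single expression. A minimal sketch on two made-up rows follows; the restaurant names and values are invented, and only the raw column names are taken from the code above. The second row is dropped because its cuisine is missing.

```python
import pandas as pd

# Two invented rows using the raw column names that the cleaning code expects
raw = pd.DataFrame({
    "Name": ["Example Bistro", "Nameless Place"],
    "City": ["Berlin", "Berlin"],
    "Cuisine Style": ["['German', 'European']", None],
    "Ranking": [1.0, 2.0],
    "Rating": [4.5, 4.0],
    "Price Range": ["$$ - $$$", "$"],
    "Number of Reviews": [120.0, 3.0],
    "Reviews": ["[['Great food'], ['01/01/2018']]", "[[], []]"],
    "URL_TA": ["/Restaurant_Review-example-1", "/Restaurant_Review-example-2"],
    "ID_TA": ["d1", "d2"],
})

clean = (
    raw.drop(columns=["City", "ID_TA"])  # same effect as drop([...], axis=1)
       .dropna(subset=["Name", "Cuisine Style", "Rating", "Price Range"])
       .replace({"Price Range": {"$$$$": "High", "$$ - $$$": "Medium", "$": "Low"}})
       .rename(columns={"Cuisine Style": "Cuisine", "Price Range": "Price",
                        "Number of Reviews": "ReviewCount"})
)
print(clean)
```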

In [50]:
# prepend the TripAdvisor domain to turn the relative paths into full URLs
data['URL_TA'] = 'https://www.tripadvisor.com' + data['URL_TA'].astype(str)

# We want to edit the values in the "URL_TA" column
# The column holds relative links (paths such as /Restaurant_Review-...); adding the domain gives our end users full links to each restaurant's TripAdvisor page
# data['URL_TA'] selects that column from the DataFrame "data"
# "+" prepends the string 'https://www.tripadvisor.com' to every value in the column
# .astype(str) makes sure every value is a string before the concatenation
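Because the cell above prepends the domain unconditionally, running it a second time produces links starting with `https://www.tripadvisor.comhttps://...`. A minimal alternative sketch uses the standard library's `urllib.parse.urljoin`, which leaves already-absolute URLs unchanged; the helper name `to_absolute` and the example path are hypothetical.

```python
from urllib.parse import urljoin

BASE = "https://www.tripadvisor.com"

def to_absolute(url):
    """Turn a relative TripAdvisor path into a full URL; full URLs are returned unchanged."""
    return urljoin(BASE, url)

relative = "/Restaurant_Review-example.html"  # made-up path for illustration
once = to_absolute(relative)
twice = to_absolute(once)                      # re-running the conversion is harmless
print(once)          # https://www.tripadvisor.com/Restaurant_Review-example.html
print(twice == once) # True
```

In the notebook it could be applied as `data['URL_TA'] = data['URL_TA'].astype(str).apply(to_absolute)`.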

In [51]:
data.head(5)
# This quickly checks the first 5 rows to confirm that the DataFrame includes (and excludes) the columns and information we intended

Unnamed: 0,Name,Cuisine,Ranking,Rating,Price,ReviewCount,Reviews,URL_TA
0,FACIL,"['International', 'European', 'Gluten Free Opt...",1.0,4.5,High,687.0,"[['What a way to turn 60!', 'Only to recommend...",https://www.tripadvisor.com/Restaurant_Review-...
1,La Gondola Due,"['Italian', 'Mediterranean', 'European', 'Vege...",2.0,4.5,Medium,282.0,"[['Great food!', 'Very good'], ['12/17/2017', ...",https://www.tripadvisor.com/Restaurant_Review-...
2,Ristorante A Mano,"['Italian', 'Mediterranean', 'European', 'Vege...",3.0,4.5,Medium,1501.0,"[['Sensational!', 'Delicious food and lovely s...",https://www.tripadvisor.com/Restaurant_Review-...
3,Restaurant Bieberbau,"['German', 'Central European', 'Vegetarian Fri...",4.0,4.5,High,621.0,"[['Excellent restaurant', 'Perfect Birthday tr...",https://www.tripadvisor.com/Restaurant_Review-...
4,El Loco,"['Mexican', 'Central American', 'Vegetarian Fr...",5.0,5.0,Medium,171.0,"[['Join the Loco family 😉', 'Best sweet potato...",https://www.tripadvisor.com/Restaurant_Review-...


In [52]:
# save the clean data as a CSV file to Google Drive
data.to_csv('/content/drive/MyDrive/E1326 Python Programming/Project: HertiEats/TAdata/TA_berlin_clean.csv', index=False)
# data.to_csv() saves the object "data" as a CSV file at the specified path; index=False leaves out the row index, so it is not written to the file as an extra column
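Why `index=False` matters: when a DataFrame is saved together with its index and read back with a plain `pd.read_csv`, the old index returns as an extra column named `Unnamed: 0`. A small round trip through an in-memory buffer shows the difference; the two rows are taken from the preview above and trimmed to two columns.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["FACIL", "El Loco"], "Rating": [4.5, 5.0]})

with_index = io.StringIO()
df.to_csv(with_index)                   # default index=True also writes the row labels
with_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)
print(cols_with)     # ['Unnamed: 0', 'Name', 'Rating']

without_index = io.StringIO()
df.to_csv(without_index, index=False)   # row labels are not written
without_index.seek(0)
cols_without = list(pd.read_csv(without_index).columns)
print(cols_without)  # ['Name', 'Rating']
```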

Some more data cleaning steps below.

In [53]:
# check first before converting columns into different types
dfn = data.convert_dtypes()
dfn.info()
# .convert_dtypes() converts the columns of "data" to the best possible dtypes that support pd.NA (useful because some columns have missing values)
# It returns a new DataFrame, "dfn"; "data" itself is left unchanged
# .info() prints a summary of the DataFrame, including the number of entries, the number of columns and the dtype of each column

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3513 entries, 0 to 6370
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         3513 non-null   string 
 1   Cuisine      3513 non-null   string 
 2   Ranking      3510 non-null   Int64  
 3   Rating       3513 non-null   float64
 4   Price        3513 non-null   string 
 5   ReviewCount  3440 non-null   Int64  
 6   Reviews      3513 non-null   string 
 7   URL_TA       3513 non-null   string 
dtypes: Int64(2), float64(1), string(5)
memory usage: 253.9 KB
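`convert_dtypes()` returns a new DataFrame rather than modifying `data` in place, which is why the `data.info()` call in the next cell still reports `object` and `float64` columns. A short sketch of its effect on a column like `Ranking`, which is stored as float only because it contains missing values (the three values are invented):

```python
import pandas as pd

df = pd.DataFrame({"Ranking": [1.0, 2.0, None]})  # float64 because of the missing value
converted = df.convert_dtypes()                     # returns a NEW DataFrame

print(df.dtypes["Ranking"])                   # float64 (the original is unchanged)
print(converted.dtypes["Ranking"])            # Int64 (nullable integer)
print(converted["Ranking"].isna().tolist())   # [False, False, True]
```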


In [54]:
# convert cuisine to list column = CuisineList
import ast
data["CuisineList"] = data["Cuisine"].apply(ast.literal_eval)
data.info()
# The "Cuisine" column stores each list as a string, e.g. "['German', 'Bar']"
# ast.literal_eval parses each string into a real Python list and stores it in a new column, "CuisineList"
# Unlike eval, literal_eval only accepts Python literals, so it cannot run arbitrary code hidden in the data
# .info() prints a summary of the DataFrame, including the number of entries, the number of columns and the dtype of each column

# check if changing to list works = yes
for i, l in enumerate(data["CuisineList"]):
  print("list",i,"is",type(l))
  break
# enumerate() yields (index, element) pairs from the "CuisineList" column
# break stops the loop after the first pair, so only the first element's type is checked
# Result: list 0 is <class 'list'>
# Note: data now contains two cuisine columns
# Cuisine, which holds lists stored as strings
# CuisineList, which holds actual Python lists


#Extraction of items from lists
df = data.loc[data.Cuisine.str.contains('German|Mexican'), :] # extract items from cuisine using this syntax
df.head()
len(df)
# .loc[] selects rows and columns by label and also accepts a boolean array
# data.Cuisine.str.contains() tests each string in the Cuisine column against a pattern; 'German|Mexican' is a regular expression meaning "German or Mexican"
# It returns a boolean Series (True where the pattern is found), which .loc[] uses to keep the matching rows
# The built-in len() function returns the number of rows in df
# Note: a notebook cell only displays its last expression, so df.head() and len(df) produce no visible output here

# ADDITIONAL - Extraction method
# this option for the CuisineList column also works, and it is probably better to save user choices in list form
mycuisine = ['German', 'Mexican']
# save user preferences in a list

mask = data.CuisineList.apply(lambda x: any(item in x for item in mycuisine))
# Create a boolean mask based on the list
# A mask is a Series of True/False values used to keep only the rows that satisfy a condition
# A lambda is an anonymous function: it has no name and is limited to a single expression
# Here the lambda is applied to each list x in the CuisineList column
# any(item in x for item in mycuisine) is True if at least one of the user's cuisines saved in "mycuisine" appears in the restaurant's list

df1 = data[mask]
# "df1" is the new DataFrame, containing only restaurants that serve German and/or Mexican cuisine


df1.head()
len(df1)
# The built-in len() function returns the number of rows in df1


# Similar example for reviews column
rev = data.loc[data.Reviews.str.contains('Excellent'), :] # extract items from reviews column using this syntax 
rev.head()
len(rev)

# As above, .loc[] with a boolean Series keeps only the rows whose Reviews string contains the word "Excellent"
# str.contains() is case-sensitive by default, so "excellent" in lower case is not counted
# .head() returns the first 5 rows by default, and len() returns the number of matching rows (219, shown below)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3513 entries, 0 to 6370
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         3513 non-null   object 
 1   Cuisine      3513 non-null   object 
 2   Ranking      3510 non-null   float64
 3   Rating       3513 non-null   float64
 4   Price        3513 non-null   object 
 5   ReviewCount  3440 non-null   float64
 6   Reviews      3513 non-null   object 
 7   URL_TA       3513 non-null   object 
 8   CuisineList  3513 non-null   object 
dtypes: float64(3), object(6)
memory usage: 274.5+ KB
list 0 is <class 'list'>


219
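The two extraction methods above differ in more than syntax: `str.contains` on the `Cuisine` string is a substring test, while the `CuisineList` mask tests exact membership in a list. A sketch in plain Python on one example cuisine string modeled on the dataset preview; `ast.literal_eval` is used for parsing because, unlike `eval`, it only accepts literals such as lists and strings.

```python
import ast

cuisine_text = "['German', 'Central European', 'Vegetarian Friendly']"  # as stored in the CSV
cuisine_list = ast.literal_eval(cuisine_text)  # parses Python literals only, never runs code

user_choices = ["European"]

# Substring test on the raw string (what str.contains does): found inside "Central European"
substring_match = any(choice in cuisine_text for choice in user_choices)

# Exact membership test on the parsed list (what the CuisineList mask does)
exact_match = any(choice in cuisine_list for choice in user_choices)

print(substring_match, exact_match)  # True False

# literal_eval refuses anything that is not a plain literal
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("rejected")
```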

# Module 3

## Generate a list of restaurants

The program will reference the database to find restaurants matching the input provided by the user. The user can further refine this list with optional parameters:
- Type of cuisine
- Dietary preferences, e.g. vegetarianism, veganism  
- Number and snippet of reviews

The program will generate a final restaurant list that fulfills the goal of providing the user with preferred dining options while incorporating the user's eating preferences.
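A sketch of how the dietary and review refinements could work, assuming (as the dataset preview suggests) that dietary tags such as 'Vegan Options' appear inside the cuisine list and that the `Reviews` column holds a string of the form `[[review titles], [review dates]]`. The function names `meets_diet` and `review_snippets` are hypothetical, and the dates in the example are made up.

```python
import ast

# Example values modeled on the dataset preview (dates are invented)
cuisine_text = "['German', 'Vegetarian Friendly', 'Vegan Options']"
reviews_text = "[['Authentic German food', 'Beautiful food'], ['01/05/2018', '12/20/2017']]"

def meets_diet(cuisine_text, diet_tags):
    """True if every requested dietary tag appears in the restaurant's cuisine list."""
    cuisines = ast.literal_eval(cuisine_text)
    return all(tag in cuisines for tag in diet_tags)

def review_snippets(reviews_text, n=2):
    """Return up to n (title, date) pairs; an empty list if there are no reviews."""
    titles, dates = ast.literal_eval(reviews_text)
    return list(zip(titles, dates))[:n]

print(meets_diet(cuisine_text, ["Vegan Options"]))  # True
print(review_snippets(reviews_text, n=1))           # [('Authentic German food', '01/05/2018')]
```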


In [60]:
# Function to compare input vs cuisine values in the df.
# Run this function over every row in the dataframe and add matches to a new df
def compareCuisine(UserCuisine, restaurantCuisine): # make sure user inputs string
  return UserCuisine in restaurantCuisine # returns True or False
# "def" is the keyword for defining a function
# compareCuisine checks whether the user's cuisine appears as a substring of the restaurant's cuisine string
# "return" sends the result back to the caller
# The result is True if UserCuisine occurs anywhere in restaurantCuisine and False otherwise

def compareRating(UserRating, restaurantRating): # make sure user input is a numerical value (float)
  return restaurantRating >= UserRating

# compareRating returns True if the restaurant's rating is equal to or greater than the user's minimum rating

def comparePrice(UserPrice, restaurantPrice):
  return UserPrice == restaurantPrice

# comparePrice returns True if the restaurant's price category is exactly the one chosen by the user (the comparison is case-sensitive)
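The three comparison functions are plain predicates: each returns True or False for a single restaurant. A stricter variant is sketched below; the snake_case names are our own, chosen so they do not shadow the originals. It ignores case and surrounding whitespace and matches whole cuisine labels instead of substrings, so "European" no longer also matches "Central European". Which behavior is preferable is a design choice: substring matching is more forgiving, exact matching is more precise.

```python
import ast

def compare_cuisine(user_cuisine, restaurant_cuisine):
    """Case-insensitive exact match against the parsed cuisine list."""
    cuisines = ast.literal_eval(restaurant_cuisine)  # "['German', 'Bar']" -> ['German', 'Bar']
    wanted = user_cuisine.strip().lower()
    return any(c.lower() == wanted for c in cuisines)

def compare_price(user_price, restaurant_price):
    """Case-insensitive equality, so 'medium' matches 'Medium'."""
    return user_price.strip().lower() == restaurant_price.strip().lower()

def compare_rating(user_rating, restaurant_rating):
    """True if the restaurant is rated at least as highly as requested."""
    return float(restaurant_rating) >= float(user_rating)
```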


In [61]:
#Create an empty dataframe that the results will be added into
results = pd.DataFrame(columns= ["Name", "Cuisine", "Rating", "Price", "ReviewCount", "Reviews", "URL_TA"])
#print("Empty df", results, sep='\n')

# pd.DataFrame creates a mutable, two-dimensional table with labeled rows and columns
# The columns are labeled "Name", "Cuisine", "Rating", "Price", "ReviewCount", "Reviews", "URL_TA"
# If uncommented, the print call displays the empty DataFrame, confirming that it has these columns but no rows

In [62]:
#Add the compare functions for all three variables into a loop that adds the rows that are "true" to the results DF.
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so matching rows are collected in a list and added in one step with pd.concat
matches = []
for i, restaurant in berlin.iterrows():
    if compareCuisine(Cuisine, restaurant['Cuisine']) and comparePrice(Price, restaurant['Price']) and compareRating(Rating, restaurant["Rating"]):
        #print(restaurant)
        matches.append(restaurant)
results = pd.concat([results, pd.DataFrame(matches)], ignore_index=True)
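Looping with `iterrows()` works, but pandas can apply all three conditions to whole columns at once with a boolean mask, which is shorter and typically much faster on thousands of rows. A sketch on a made-up three-row sample (the names and values are invented); on the real data, `sample` would be `berlin`, and the lower-case variables would be the form values.

```python
import pandas as pd

# Invented sample in the same shape as `berlin`
sample = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Cuisine": ["['German', 'Bar']", "['Italian']", "['German']"],
    "Rating": [4.5, 4.5, 3.5],
    "Price": ["Medium", "Medium", "Medium"],
})
cuisine, price, rating = "German", "Medium", 4.0

mask = (
    sample["Cuisine"].str.contains(cuisine, regex=False)  # substring test, like compareCuisine
    & (sample["Price"] == price)                           # like comparePrice
    & (sample["Rating"] >= rating)                         # like compareRating
)
matched = sample[mask].reset_index(drop=True)
print(matched["Name"].tolist())  # ['A']
```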

In [63]:
data_table.DataTable(results.head(), include_index=False, num_rows_per_page=10)
# .head() keeps only the first 5 matches; num_rows_per_page=10 sets how many rows each page of the interactive table shows
# Pass results instead of results.head() to page through every match

Unnamed: 0,Name,Cuisine,Rating,Price,ReviewCount,Reviews,URL_TA,Ranking
0,Hackethals,"['German', 'Bar', 'European', 'Pub', 'Central ...",4.5,Medium,893.0,"[['Authentic brilliance', 'Traditional German ...",https://www.tripadvisor.com/Restaurant_Review-...,15.0
1,Restaurant Schlossgarten,"['German', 'Vegetarian Friendly', 'Gluten Free...",4.5,Medium,392.0,"[['Authentic German food', 'Beautiful food, be...",https://www.tripadvisor.com/Restaurant_Review-...,16.0
2,Schnitzelei,"['Diner', 'German', 'Vegetarian Friendly', 'Ve...",4.5,Medium,370.0,"[['Authentic, superb German tapas', 'Hidden Ge...",https://www.tripadvisor.com/Restaurant_Review-...,21.0
3,Marjellchen,"['German', 'Vegetarian Friendly']",4.5,Medium,2364.0,[['Delicious food and good Prussian hospitali....,https://www.tripadvisor.com/Restaurant_Review-...,24.0
4,KaDeWe Feinschmeckerbars,"['Italian', 'French', 'German', 'Seafood', 'Me...",4.5,Medium,1414.0,[['bouillabaisse for one for €9.95 not worth.....,https://www.tripadvisor.com/Restaurant_Review-...,32.0


# Module 4

## Desired output: Restaurant search results in a hyperlinked list 

The output is the list of restaurants, which can be viewed in the notebook as text or as an HTML file.
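For the text version, a minimal sketch that prints one numbered line per restaurant and handles the case where nothing matched follows. The function name `format_results` and the example rows (names and URLs) are hypothetical; in the notebook the rows could come from `results.to_dict("records")`.

```python
def format_results(rows):
    """Return one numbered line per restaurant, or a message if nothing matched."""
    if not rows:
        return "No restaurants match your criteria."
    lines = []
    for number, row in enumerate(rows, start=1):
        lines.append(f"{number}. {row['Name']} | rating {row['Rating']} | {row['Price']} | {row['URL_TA']}")
    return "\n".join(lines)


# Hypothetical rows with the same keys as the results DataFrame columns
example_rows = [
    {"Name": "Example Bistro", "Rating": 4.5, "Price": "Medium",
     "URL_TA": "https://www.tripadvisor.com/Restaurant_Review-example-1"},
    {"Name": "Example Imbiss", "Rating": 4.0, "Price": "Low",
     "URL_TA": "https://www.tripadvisor.com/Restaurant_Review-example-2"},
]
print(format_results(example_rows))
```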


In [48]:
#FINAL OUTPUT
# The code creates an HTML file by writing strings to a file
# The strings are plain text, but because they contain HTML tags, the browser (and the notebook) renders them as HTML
import html  # html.escape() makes names and URLs safe to embed in HTML

opening_html_code = """<!DOCTYPE html>
<html>
   <body>
      <h1>Search Results for Restaurants:</h1>
"""
# the restaurant_names variable collects one hyperlinked paragraph per restaurant
restaurant_names = ""
for ind in results.index:
    url = html.escape(str(results["URL_TA"][ind]))   # escapes &, <, > and quotes
    name = html.escape(str(results["Name"][ind]))
    restaurant_names += '<p><a href="' + url + '">' + name + "</a></p>\n"
# Loops through the results DataFrame and hyperlinks each restaurant name to its URL
# ind is the index label: if results.index is [0, 1, 2, 3], ind is 0 in the first pass, then 1, then 2, and so forth
# the href attribute specifies the URL of the page the link should go to

# code to close (complete) our HTML page
# We add 14 <br> line breaks below the list to make the display bigger, as we could not find a more elegant solution
ending_html_code = "<br>\n" * 14 + "   </body>\n</html>\n"

# total_html_code combines the opening HTML, the variable "restaurant_names" (the restaurant names hyperlinked to their URLs) and the closing HTML
total_html_code = opening_html_code + restaurant_names + ending_html_code
#print(total_html_code)

# Write the HTML to a file in our drive (adjust this path to match your own drive)
# 'w' is write mode; the with-block closes the file automatically when it ends
with open("/content/drive/MyDrive/E1326 Python Programming/Project: HertiEats/ResturantList.html", "w", encoding="utf-8") as FinList:
    FinList.write(total_html_code)  # writes the variable created above, 'total_html_code'

# Display the HTML file
from IPython.display import HTML
# IPython is the interactive shell behind Jupyter and Colab; its display module renders rich output such as HTML in the notebook
HTML(filename='/content/drive/MyDrive/E1326 Python Programming/Project: HertiEats/ResturantList.html')
# HTML(filename=...) loads the HTML file created above and renders it as the cell's output
# The restaurant recommendations should now be visible, and you can click a restaurant name to open its page on TripAdvisor
