# Predict Waiter Tips

Predicting waiter tips is a regression problem: the target, the tip amount, is a continuous number. In this tutorial, we'll walk through building a machine learning model that predicts the tip from features such as the total bill amount, party size, and other factors.


## Dataset Overview

We'll use the well-known "tips" dataset, which is available in the seaborn library. Each row describes one bill at a restaurant: the total bill amount, the tip amount, the gender of the person paying, whether the party included smokers, the day of the week, the time of day, and the party size.


In [4]:
import numpy as np
import pandas as pd
import plotly.express as px


In [None]:
# This is how the tips dataset was loaded from seaborn and saved to a CSV file.
#import seaborn as sns
#tips = sns.load_dataset("tips")
#tips.to_csv('tips.csv', encoding='utf-8', index=False)


In [None]:
# Load the data from the csv.
tips = pd.read_csv('tips.csv')

# Take a look.
print(tips.head())
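
Before plotting, it can help to check the column types and look for missing values. The sketch below does this on a few hand-typed rows laid out like the tips data. The `sample` frame and its values are illustrative assumptions; on the real data you would call the same methods on `tips`.

```python
import pandas as pd

# Illustrative rows with the same columns as the tips data (values are made up).
sample = pd.DataFrame(
    {
        "total_bill": [16.50, 10.25, 21.00, 30.40],
        "tip": [1.50, 1.75, 3.50, 5.00],
        "sex": ["Female", "Male", "Male", "Female"],
        "smoker": ["No", "No", "Yes", "No"],
        "day": ["Sun", "Sat", "Thur", "Fri"],
        "time": ["Dinner", "Dinner", "Lunch", "Lunch"],
        "size": [2, 3, 2, 4],
    }
)

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # numeric columns vs. text columns

# Count of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```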


## Getting to Know the Data


Let’s have a look at the tips given to the waiters according to:

- the total bill paid
- the number of people at a table
- the day of the week

In [None]:
# trendline="ols" draws a least-squares line per color group; it requires the statsmodels package.
figure = px.scatter(data_frame=tips, x="total_bill", y="tip", size="size", color="day", trendline="ols")
figure.show()


Now let’s have a look at the tips given to the waiters according to: 

- the total bill paid
- the number of people at a table
- the gender of the person paying the bill


In [None]:
figure = px.scatter(data_frame=tips, x="total_bill", y="tip", size="size", color="sex", trendline="ols")
figure.show()


Now let’s have a look at the tips given to the waiters according to:

- the total bill paid
- the number of people at a table
- the time of the meal


In [None]:
figure = px.scatter(data_frame=tips, x="total_bill", y="tip", size="size", color="time", trendline="ols")
figure.show()


Now let’s total the tips by day of the week to find out on which day waiters receive the most tips:

In [None]:
figure = px.pie(tips, values='tip', names='day', hole=0.5)
figure.show()


According to the visualization above, the largest share of tips is given on Saturdays. Now let’s look at the total tips by the gender of the person paying the bill to see who tips waiters the most:

In [None]:
figure = px.pie(tips, values='tip', names='sex', hole=0.5)
figure.show()


According to the visualization above, most tips are given by men.
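
The pie charts above add up the `tip` column within each category, so a category with more bills gets a larger slice even if its typical tip is no higher. A minimal sketch of the same aggregation, run on made-up rows (the `sample` frame is an assumption for illustration), puts the total next to the count and the mean:

```python
import pandas as pd

# Made-up rows; on the real data, use the `tips` frame instead.
sample = pd.DataFrame(
    {
        "day": ["Sat", "Sat", "Sat", "Sun", "Thur"],
        "tip": [2.0, 3.0, 1.0, 4.0, 2.5],
    }
)

# "sum" corresponds to the slice sizes in the pie chart;
# "count" and "mean" help separate "more bills" from "bigger tips".
summary = sample.groupby("day")["tip"].agg(["sum", "count", "mean"])
print(summary)
```

In this toy sample Saturday has the largest total but the lowest mean tip, which is the kind of difference the pie chart alone cannot show.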


This is how we can explore the factors affecting waiter tips. In the sections below, we will walk through how to train a machine learning model for the task of waiter tips prediction.


## Data Preprocessing

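Before we start building our machine learning model, we need to preprocess the data. In general this can involve handling missing values, encoding categorical variables, and splitting the data into training and testing sets. Here, we encode the four categorical columns as integers so the model can use them; the train/test split comes later, in the Model Training section.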


In [None]:
tips["sex"] = tips["sex"].map({"Female": 0, "Male": 1})
tips["smoker"] = tips["smoker"].map({"No": 0, "Yes": 1})
tips["day"] = tips["day"].map({"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3})
tips["time"] = tips["time"].map({"Lunch": 0, "Dinner": 1})
tips.head()
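
One thing to watch with `Series.map` and a dictionary: any value that is not a key in the dictionary becomes `NaN`. The sketch below, on a made-up `days` series, shows a simple check to run after encoding. The misspelled label is a deliberate illustration, not something observed in the tips data.

```python
import pandas as pd

day_codes = {"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3}

# "Thu" is deliberately not a key in day_codes.
days = pd.Series(["Thur", "Fri", "Thu", "Sun"])
encoded = days.map(day_codes)

# Labels that were missing from the dictionary show up as NaN after mapping.
unmapped = days[encoded.isna()]
print(encoded.tolist())
print(unmapped.tolist())
```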


## Feature Selection and Engineering

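Next, we'll select the features the model will use and the target it should predict. Here we keep all six encoded columns as features and use `tip` as the target. Further engineering, which this tutorial does not apply, could include calculating the tip percentage (tip amount divided by total bill amount) or creating dummy variables for the categorical features.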


In [None]:
x = np.array(tips[["total_bill", "sex", "smoker", "day", "time", "size"]])
y = np.array(tips["tip"])
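
The two engineering ideas mentioned above, tip percentage and dummy variables, could be sketched as follows. The `sample` frame is made up for illustration, and `tip_pct` and `engineered` are hypothetical names that the tutorial's own code does not use.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "total_bill": [20.0, 40.0, 10.0],
        "tip": [3.0, 8.0, 1.0],
        "day": ["Sat", "Sun", "Sat"],
    }
)

# Tip as a percentage of the bill. It is computed from the target itself,
# so it suits analysis or an alternative target, not an input feature for `tip`.
sample["tip_pct"] = sample["tip"] / sample["total_bill"] * 100

# One indicator column per day instead of a single integer code.
dummies = pd.get_dummies(sample["day"], prefix="day")
engineered = pd.concat([sample.drop(columns="day"), dummies], axis=1)
print(engineered)
```

Dummy variables avoid implying an order or distance between days, which the integer codes 0 to 3 do imply when fed to a linear model.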


## Model Selection

We'll choose a regression algorithm suitable for our problem. Common choices include linear regression, decision trees, random forests, and gradient boosting algorithms. In this tutorial we use linear regression as a simple baseline; a fuller workflow would train several models and compare their performance before selecting one.


In [None]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
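
One way to compare several candidate models is k-fold cross-validation. The sketch below is an illustration under stated assumptions, not this tutorial's method: it uses synthetic stand-in data (tip of roughly 15% of the bill plus noise) and arbitrary hyperparameters, so the printed scores say nothing about the real tips dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption), not the real tips dataset.
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=200)
party_size = rng.integers(1, 7, size=200)
X = np.column_stack([total_bill, party_size])
y = 0.15 * total_bill + 0.2 * party_size + rng.normal(0, 0.5, size=200)

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=3, random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# 5-fold cross-validated mean absolute error (lower is better).
# scikit-learn reports it negated, so flip the sign back.
mae = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=5, scoring="neg_mean_absolute_error")
    mae[name] = -scores.mean()

for name, value in mae.items():
    print(f"{name}: {value:.3f}")
```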


## Model Training

We'll split the dataset into training and testing sets.


In [None]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
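
To see what the split produces, the sketch below runs the same call on placeholder arrays. The 244 rows match the size of the seaborn tips dataset; the zero-filled arrays and the `_demo` names are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with 244 rows and 6 feature columns (values are irrelevant here).
x_demo = np.zeros((244, 6))
y_demo = np.zeros(244)

x_train, x_test, y_train, y_test = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42
)

# Roughly 80% of the rows land in the training set and 20% in the test set.
print(x_train.shape, x_test.shape)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows end up in the test set on every run.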


Now we can train the selected machine learning model on the training data.


In [None]:
model.fit(xtrain, ytrain)
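
With the six features selected earlier, a fitted linear regression has the form below. The symbols are notation introduced here: $\hat{y}$ is the predicted tip, $x_1, \dots, x_6$ are `total_bill`, `sex`, `smoker`, `day`, `time`, and `size` in that order, $\beta_0$ is the intercept (`model.intercept_` in scikit-learn), and $\beta_1, \dots, \beta_6$ are the coefficients (`model.coef_`).

$$
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6
$$

Because `day` is encoded as 0 to 3, this form treats it as a numeric scale, which is a modelling simplification.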


## Test the Model

Now let’s try the model on a new input. The values must appear in the same order, and use the same encoding, as the features used to train the model:


In [None]:
# features = [[total_bill, sex, smoker, day, time, size]]
# Here: a 24.50 bill, male payer, non-smoker, Thursday, dinner, party of 4.
features = np.array([[24.50, 1, 0, 0, 1, 4]])
model.predict(features)
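
A single prediction shows that the model runs, but not how accurate it is. One common way to measure accuracy is to score predictions on the held-out test set. The sketch below is a minimal example on synthetic stand-in data (an assumption, not the real tips dataset), and `demo_model` is a hypothetical name chosen to avoid clashing with the notebook's `model`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption), not the real tips dataset.
rng = np.random.default_rng(42)
total_bill = rng.uniform(5, 50, size=244)
party_size = rng.integers(1, 7, size=244)
X = np.column_stack([total_bill, party_size])
y = 0.15 * total_bill + 0.2 * party_size + rng.normal(0, 0.5, size=244)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
demo_model = LinearRegression().fit(X_train, y_train)

# Score on rows the model did not see during fitting.
y_pred = demo_model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)  # average absolute error, in tip units
r2 = r2_score(y_test, y_pred)              # share of variance explained
print(f"MAE: {mae:.2f}  R^2: {r2:.2f}")
```

On the notebook's own objects, the equivalent call would be `mean_absolute_error(ytest, model.predict(xtest))`.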


Beyond the prediction itself, two more factors from the data are worth charting. First, let’s see whether smokers or non-smokers tip more:

In [None]:
figure = px.pie(tips, values='tip', names='smoker', hole=0.5)
figure.show()


According to the visualization above, non-smokers tip waiters more than smokers. Now let’s see if most tips are given during lunch or dinner:

In [None]:
figure = px.pie(tips, values='tip', names='time', hole=0.5)
figure.show()


According to the visualization above, a waiter is tipped more during dinner.


## Summary

In this tutorial, we learned how to build a machine learning model to predict waiter tips using the tips dataset. We covered exploratory analysis, data preprocessing, feature selection, model training, and making a prediction with the trained model. You can further enhance the model's performance by experimenting with different algorithms, feature engineering techniques, and hyperparameter tuning.


<details>
<summary><b>Instructor Notes</b></summary>

Nothing to add...

</details>