# Case Study on Waiter Tips

This case study uses the Waiter's Tips dataset. The objective is to build a machine learning model that predicts the tip given to a waiter.

## Steps

1. Installation.
2. Load libraries.
3. Load the dataset.
4. Descriptive analysis: Get to know the data.
5. Data exploration: Visualize the data.
6. Identifying target and features.
7. Building and training a Linear Regression Model.
8. Evaluating the Model and making predictions.


## Installation.
  
You can install scikit-learn and the other required libraries using pip:


In [None]:
pip install scikit-learn pandas numpy seaborn matplotlib


## Load libraries.

- Pandas : Data structures and operations for manipulating numerical tables and time series.
- NumPy : N-dimensional arrays and numerical routines.
- Scikit-learn (sklearn) : Various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN.
- Pickle : Serializing and de-serializing a Python object structure.
- Seaborn : High-level interface for drawing attractive and informative statistical graphics.
- Matplotlib : Object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK.


In [None]:
# Load libraries 
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV, Lasso, LassoCV
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Ignore warnings
# https://docs.python.org/3/library/warnings.html
import warnings

warnings.filterwarnings('ignore')


## Load the dataset.


In [None]:
# Solution


The dataset is available from Kaggle. It contains the following columns:
- total_bill: total bill in dollars, including taxes
- tip: tip given to the waiter in dollars
- sex: gender of the person paying the bill
- smoker: whether the person smoked or not
- day: day of the week
- time: lunch or dinner
- size: number of people at the table
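
As a rough sketch of the loading step, the snippet below reads a CSV with the columns listed above into a dataframe. The three rows are made-up illustrative values, not taken from the real dataset, and the file name `tips.csv` mentioned in the comment is an assumption about how the Kaggle download was saved.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: a few made-up rows with
# the same columns as the dataset. In practice, pass the path of the CSV
# (for example "tips.csv", an assumed file name) to pd.read_csv instead.
sample_csv = io.StringIO(
    "total_bill,tip,sex,smoker,day,time,size\n"
    "20.50,3.00,Female,No,Sun,Dinner,2\n"
    "12.75,2.00,Male,No,Sat,Dinner,3\n"
    "31.20,5.00,Male,Yes,Thur,Lunch,4\n"
)

df = pd.read_csv(sample_csv)
print(df.head())
```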


## Descriptive analysis: Get to know the data.

View the dataset and its basic details: the total number of rows and columns, the data type of each column, and whether any new columns need to be created.
  

In [None]:
# Solution
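
One possible way to get these basic details is sketched below. The dataframe is a small made-up sample so the snippet runs on its own; in the notebook, the same calls would be applied to the loaded `df`.

```python
import pandas as pd

# Hypothetical stand-in rows (made-up values); use the loaded dataset instead.
df = pd.DataFrame({
    "total_bill": [20.0, 12.5, 31.0, 8.0],
    "tip": [3.0, 2.0, 5.0, 1.5],
    "sex": ["Female", "Male", "Male", "Female"],
    "smoker": ["No", "No", "Yes", "No"],
    "day": ["Sun", "Sat", "Thur", "Fri"],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch"],
    "size": [2, 3, 4, 1],
})

# Number of rows and columns
n_rows, n_cols = df.shape
print(f"Rows: {n_rows}, columns: {n_cols}")

# Data type of each column
print(df.dtypes)

# Summary statistics for the numeric columns
print(df.describe())

# Missing values per column
missing = df.isna().sum()
print(missing)
```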


Checking the data types shows what kinds of variables the dataset contains.


In [None]:
# Categorical columns may be stored as 'object' (e.g. when read from a CSV) or 'category'
category_cols = ['object', 'category']
category_lst = list(df.select_dtypes(include=category_cols).columns)
print("Total number of categorical columns:", len(category_lst))
print("Their names are as follows:", category_lst)


In [None]:
int64_cols = ['int64']
int64_lst = list(df.select_dtypes(include=int64_cols).columns)
print("Total number of int64 columns:", len(int64_lst))
print("Their names are as follows:", int64_lst)


In [None]:
float64_cols = ['float64']
float64_lst = list(df.select_dtypes(include=float64_cols).columns)
print("Total number of float64 columns:", len(float64_lst))
print("Their names are as follows:", float64_lst)


## Data exploration: Visualize the data.

Graphs we are going to view:
- Histogram of all columns to check the distribution of the columns
- Distplot (distribution plot) of all columns to check the variation in the data distribution
- Heatmap to show the correlation between feature variables
- Boxplot to find outliers in the feature columns


In [None]:
# Solution


## Identifying target and features.

Before we start building our machine learning model, we need to preprocess the data. This involves tasks such as handling missing values, encoding categorical variables, and splitting the data into training and testing sets.

Separate the target variable and the feature columns into two different dataframes, and check their shapes for validation.


In [None]:
# Encode the categorical columns as integers so the model can use them
df["sex"] = df["sex"].map({"Female": 0, "Male": 1})
df["smoker"] = df["smoker"].map({"No": 0, "Yes": 1})
df["day"] = df["day"].map({"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3})
df["time"] = df["time"].map({"Lunch": 0, "Dinner": 1})
df.head()
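
Separating the target from the features might look like the sketch below. The `encoded` dataframe is a made-up stand-in for the dataset after the integer encoding; the names `X` and `y` are a common convention, not something the notebook prescribes.

```python
import pandas as pd

# Hypothetical stand-in rows (made-up values), already encoded as integers.
encoded = pd.DataFrame({
    "total_bill": [20.0, 12.5, 31.0, 8.0],
    "tip": [3.0, 2.0, 5.0, 1.5],
    "sex": [0, 1, 1, 0],
    "smoker": [0, 0, 1, 0],
    "day": [3, 2, 0, 1],
    "time": [1, 1, 0, 0],
    "size": [2, 3, 4, 1],
})

# Features: every column except the target
X = encoded.drop(columns=["tip"])

# Target: the tip amount
y = encoded["tip"]

# Shapes should agree on the number of rows
print(X.shape, y.shape)
```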


Next, we'll select relevant features and engineer new features if needed. For example, we might calculate the tip percentage (tip amount divided by total bill amount) or create dummy variables for categorical features.


In [None]:
# Solution
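
The two ideas mentioned above can be sketched as follows, on a made-up sample with the original string categories. Note one assumption about usage: `tip_pct` is computed from the target itself, so it is useful for exploration but would leak the answer if passed to the model as a feature.

```python
import pandas as pd

# Hypothetical stand-in rows (made-up values) with string categories.
raw = pd.DataFrame({
    "total_bill": [20.0, 12.5, 31.0, 8.0],
    "tip": [3.0, 2.0, 5.0, 1.5],
    "sex": ["Female", "Male", "Male", "Female"],
    "smoker": ["No", "No", "Yes", "No"],
    "day": ["Sun", "Sat", "Thur", "Fri"],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch"],
    "size": [2, 3, 4, 1],
})

# Tip as a fraction of the bill (exploration only: derived from the target)
raw["tip_pct"] = raw["tip"] / raw["total_bill"]

# One indicator column per category level; drop_first avoids a redundant column
dummies = pd.get_dummies(
    raw, columns=["sex", "smoker", "day", "time"], drop_first=True
)
print(dummies.columns.tolist())
```

Dummy variables are an alternative to the integer mapping used earlier; they avoid implying an order between categories such as the days of the week.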


We'll choose a regression algorithm suitable for our problem. Some common choices include linear regression, decision trees, random forests, or gradient boosting algorithms. We'll train multiple models and evaluate their performance to select the best one.


In [None]:
# Solution
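
One way to compare several regressors is k-fold cross-validation, sketched below. The data is synthetic (generated with an assumed linear relationship plus noise), and the choice of models and hyperparameters is illustrative rather than a recommendation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: two features and a noisy linear target.
rng = np.random.default_rng(42)
X = np.column_stack([rng.uniform(5, 50, 100), rng.integers(1, 7, 100)])
y = 0.15 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.5, 100)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# The same shuffled 5-fold split is used for every model
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Mean R^2 across the folds for each model
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```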


We'll split the dataset into training and testing sets.


In [None]:
# Solution
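
A minimal sketch of the split using `train_test_split`, assuming an 80/20 ratio and a fixed `random_state` for reproducibility (both are illustrative choices). The `features` and `target` objects are made-up stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 20 rows, two feature columns.
features = pd.DataFrame({
    "total_bill": np.linspace(10, 48, 20),
    "size": [1, 2, 3, 4] * 5,
})
target = pd.Series(np.linspace(1.5, 7.0, 20), name="tip")

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```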


## Building and training a Linear Regression Model.

Now we can train the selected machine learning model on the training data.


In [None]:
# Solution
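
Fitting a linear regression could look like the sketch below. The training data is synthetic and noise-free, with an assumed relationship of tip = 0.15 * total_bill + 0.5, so that the learned coefficient and intercept can be checked against known values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in training data with a known, exact linear relationship.
X_train = pd.DataFrame({"total_bill": np.linspace(10, 50, 30)})
y_train = 0.15 * X_train["total_bill"] + 0.5

# Fit ordinary least squares on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# The fitted slope and intercept should recover the values used above
print("Coefficient:", model.coef_[0])
print("Intercept:", model.intercept_)
```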


## Evaluating the Model and making predictions.

Now let’s evaluate the model on the test set, then make predictions by passing it inputs with the same features that were used for training.


In [None]:
# Solution
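
Evaluation and prediction might be sketched as follows, using the R² score on a held-out test set and then predicting the tip for one new table. The data is synthetic, and `new_table` is a hypothetical input; its columns must match the ones used for training.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: noisy linear relationship between features and tip.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 120),
    "size": rng.integers(1, 7, 120),
})
y = 0.15 * X["total_bill"] + 0.2 * X["size"] + rng.normal(0, 0.5, 120)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# R^2 on data the model has not seen during training
test_r2 = r2_score(y_test, model.predict(X_test))
print(f"Test R^2: {test_r2:.3f}")

# Predict the tip for a new table (hypothetical input values)
new_table = pd.DataFrame({"total_bill": [24.50], "size": [2]})
predicted_tip = model.predict(new_table)
print(f"Predicted tip: {predicted_tip[0]:.2f}")
```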


<details>
<summary><b>Instructor Notes</b></summary>

https://hidevscommunity.medium.com/waiter-tips-prediction-29d527efa0d6

</details>
