# Fundamentals of Data Analysis - Project
**Author** Mark Cotter - GMIT
**Email** g00376335@gmit.ie
**Dates** September 2019 to November 2019
**Lecturer** Ian McLoughlin

This document is my analysis of the well-known 'tips' dataset. The analysis runs in this Jupyter notebook and uses the Python programming language.

### Import python libraries and dataset

This code imports the Python libraries used in the analysis. The 'tips' dataset itself is loaded from seaborn in the next section.

In [1]:
# import numpy for use of arrays
import numpy as np
# import pandas to use DataFrames for the dataset
import pandas as pd
# Import pyplot for plotting
import matplotlib.pyplot as plt
# import seaborn for plotting and loading the tips dataset
import seaborn as sns

### Initial review of dataset content

This code prints a brief summary of the dataset content.

In [2]:
#The python seaborn module already includes the 'tips' dataset
#Code adapted from https://seaborn.pydata.org/introduction.html?highlight=tips%20dataset
tips = sns.load_dataset("tips")

# Prints the first 5 lines of the dataset
print("\nFirst 5 lines of the 'tips' dataset\n")
print(tips.head())

# Prints the last 5 lines of the dataset
print("\nLast 5 lines of the 'tips' dataset\n")
print(tips.tail())


First 5 lines of the 'tips' dataset

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Last 5 lines of the 'tips' dataset

     total_bill   tip     sex smoker   day    time  size
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2



The 'tips' dataset contains 244 observations (index 0 to 243) of 7 variables. Based on the names of the variables, it appears that the dataset relates to information recorded at a restaurant over a number of days (reference: https://dfrieds.com/data-visualizations/bar-plot-python-pandas).
The information recorded appears to include the following:
- **total_bill**: The total cost of the bill for a meal
- **tip**: The tip received by waiting staff
- **sex**: The sex of the person who paid the bill
- **smoker**: Whether or not the customer was a smoker
- **day**: Day of the week
- **time**: Meal time
- **size**: The party size that was served
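
The variable list above can be checked directly against the DataFrame. The following is a minimal sketch, not part of the original notebook: it re-enters the first five rows printed earlier by hand (the name `sample` is an assumption made for this illustration) so that it runs without downloading the dataset, then reports the number of rows and columns and the type of each variable.

```python
import pandas as pd

# The first five rows printed above, re-entered by hand so that the
# sketch runs without downloading the dataset. 'sample' is a
# hypothetical name used only for this illustration.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No", "No", "No", "No", "No"],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})

# shape is (number of observations, number of variables)
n_rows, n_cols = sample.shape
print("Observations:", n_rows, "Variables:", n_cols)

# dtypes shows which variables are numeric and which hold text
print(sample.dtypes)
```

Applied to the full dataset, `tips.shape` should agree with the 244 observations and 7 variables described above.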

### Basic dataset statistics

This code prints some basic statistics about the dataset.

In [3]:
# Display number of observations
# Code adapted from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.count.html
print("Number of observations", tips.total_bill.count())

# Display the max, min and mean total_bill, with the mean rounded to 2 decimal places
# Code adapted from https://stackoverflow.com/questions/455612/limiting-floats-to-two-decimal-points
print("The max, min and mean values for the total_bill are", tips.total_bill.max(),
      ",", tips.total_bill.min(), "and", round((tips.total_bill.mean()), 2))

# Display the max, min and mean tip, with the mean rounded to 2 decimal places
# Code adapted from https://stackoverflow.com/questions/455612/limiting-floats-to-two-decimal-points
print("The max, min and mean values for the tip are", tips.tip.max(),
      ",", tips.tip.min(), "and", round((tips.tip.mean()), 2))

# Display unique values of days
# Code adapted from https://chrisalbon.com/python/data_wrangling/pandas_list_unique_values_in_column/
print("List of days of the week included\n", list(tips.day.unique()))

Number of observations 244
The max, min and mean values for the total_bill are 50.81 , 3.07 and 19.79
The max, min and mean values for the tip are 10.0 , 1.0 and 3.0
List of days of the week included
 ['Sun', 'Sat', 'Thur', 'Fri']
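
The maximum, minimum and mean for both numeric columns can also be computed in a single call. This is a hedged alternative sketch rather than the method used in this notebook: it applies `DataFrame.agg` to the `total_bill` and `tip` values from the first five rows printed earlier (the names `sample` and `summary` are assumptions made for this illustration), so the numbers differ from the full-dataset figures above.

```python
import pandas as pd

# total_bill and tip from the first five rows printed earlier;
# 'sample' is a hypothetical name used only for this illustration.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# One row per statistic, one column per variable, rounded to 2 decimal places
summary = sample.agg(["max", "min", "mean"]).round(2)
print(summary)
```

On the full dataset the equivalent call would be `tips[["total_bill", "tip"]].agg(["max", "min", "mean"]).round(2)`.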


The basic statistics for the 'tips' dataset show that observations were recorded on only 4 days of the week: Thursday, Friday, Saturday and Sunday. Because these days fall on or just before the weekend, the restaurant may have been closed from Monday to Wednesday, although the data alone cannot confirm this.
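
The list of unique days shows which days occur but not how many observations fall on each. A minimal sketch of one way to check this uses `Series.value_counts`. It is an illustration only: the `day` values are taken from the ten rows printed earlier (first five and last five), and the names `days` and `counts` are assumptions, so the counts do not describe the full dataset.

```python
import pandas as pd

# 'day' values from the ten rows printed earlier:
# the first five rows are all Sun, the last five are Sat x4 and Thur x1.
days = pd.Series(["Sun"] * 5 + ["Sat"] * 4 + ["Thur"], name="day")

# Number of observations recorded on each day, most frequent first
counts = days.value_counts()
print(counts)
```

On the full dataset, `tips.day.value_counts()` would show how the 244 observations are spread across the 4 days.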