**Context**

One waiter recorded information about each tip he received over a period of a few months working in one restaurant. In all he recorded 244 tips.


**Acknowledgements**

The data was reported in a collection of case studies for business statistics.

Bryant, P. G. and Smith, M. (1995). Practical Data Analysis: Case Studies in Business Statistics. Homewood, IL: Richard D. Irwin Publishing.

The dataset is also available through the Python package Seaborn.

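**Hint:**
This version of the dataset has additional columns (`price_per_person`, `Payer Name`, `CC Number`, `Payment ID`) compared to other tips datasets, such as the one shipped with Seaborn.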

**Dataset info**

RangeIndex: 244 entries, 0 to 243

Data columns (total 11 columns):

| # | Column | Non-Null Count | Dtype |
|---|--------|----------------|-------|
| 0 | total_bill | 244 non-null | float64 |
| 1 | tip | 244 non-null | float64 |
| 2 | sex | 244 non-null | object |
| 3 | smoker | 244 non-null | object |
| 4 | day | 244 non-null | object |
| 5 | time | 244 non-null | object |
| 6 | size | 244 non-null | int64 |
| 7 | price_per_person | 244 non-null | float64 |
| 8 | Payer Name | 244 non-null | object |
| 9 | CC Number | 244 non-null | int64 |
| 10 | Payment ID | 244 non-null | object |

dtypes: float64(3), int64(2), object(6)

**Some details**
total_bill
a numeric vector, the bill amount (dollars)

tip
a numeric vector, the tip amount (dollars)

sex
a factor with levels Female Male, gender of the payer of the bill

smoker
a factor with levels No Yes, whether the party included smokers

day
a factor with levels Friday Saturday Sunday Thursday, day of the week (the filters later in this notebook use the abbreviated labels `Sat` and `Sun`)

time
a factor with levels Day Night, rough time of day

size
a numeric vector, number of people in party

# Import Libraries

In [None]:
import numpy as np
import pandas as pd

# Reading Dataset

In [None]:
df=pd.read_csv('../input/tips-dataset/tips.csv')

In [None]:
df

# Data overview

In [None]:
# This command displays the first 5 rows by default.

df.head()

In [None]:
# If you need more rows, specify their number in the command.
# For example 10 rows

df.head(10)

In [None]:
# .shape returns the (rows, columns) tuple of a DataFrame;
# .size and .ndim return the number of elements and the number of dimensions.

df.shape
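To make the difference between the three attributes concrete, here is a minimal sketch on a small made-up frame. The name `toy` and its values are illustrative only and are not taken from `tips.csv`.

```python
import pandas as pd

# Toy data standing in for a few rows of the tips table (values are made up).
toy = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
})

print(toy.shape)        # (number of rows, number of columns)
print(toy.size)         # total number of cells: rows * columns
print(toy.ndim)         # number of axes: 2 for a DataFrame
print(toy["tip"].ndim)  # a single column is a Series, which has 1 axis
```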

In [None]:
# DataFrame.info() prints a concise summary: the index range, each column's
# name, non-null count and dtype, and the memory usage.
# It is handy for a quick overview during exploratory analysis.

df.info()

In [None]:
# The DataFrame.columns attribute returns the column labels of the DataFrame.

df.columns

In [None]:
# Pandas tail() method is used to return bottom n (5 by default) rows of a data frame or series.

df.tail()

In [None]:
# Pandas describe() is used to view some basic statistical details like 
# percentile, mean, std etc. of a data frame or a series of numeric values.

df.describe()

## Selecting & Indexing

In [None]:
df.head()

In [None]:
df['total_bill']

In [None]:
# type() returns the class of the object passed to it.
# It is mostly used for debugging.

type(df['total_bill'])

In [None]:
df[['total_bill', 'tip']]

In [None]:
type(df[['total_bill', 'tip']])
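The two `type()` calls above differ because of the brackets: a single column name gives a Series, while a list of names gives a DataFrame, even when the list has only one entry. A small sketch with made-up values (the frame `toy` is hypothetical):

```python
import pandas as pd

# Made-up rows; only the column names mirror the tips table.
toy = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "tip": [1.01, 1.66],
    "size": [2, 3],
})

one_col = toy["tip"]                   # single name -> Series
one_col_frame = toy[["tip"]]           # list with one name -> one-column DataFrame
two_cols = toy[["total_bill", "tip"]]  # list of names -> DataFrame

print(type(one_col).__name__)
print(type(one_col_frame).__name__)
print(two_cols.shape)
```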

In [None]:
# Add a new column named tip_percentage holding the tip as a percentage of the total bill.

df['tip_percentage']= 100*(df['tip']/df['total_bill'])

In [None]:
df.head()

In [None]:
# np.round() rounds an array (or Series) to the given number of decimals.

df['tip_percentage']=np.round(df['tip_percentage'], decimals=3)

In [None]:
df['tip_percentage']

In [None]:
np.round(df['tip_percentage'], 2)
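As a sketch of the same percentage-and-rounding step on made-up numbers: the column arithmetic is applied element by element, and `np.round` on a Series should give the same result as the Series' own `round` method. The frame `toy` and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up bills and tips chosen so the percentages are easy to check by hand.
toy = pd.DataFrame({"total_bill": [20.0, 30.0, 16.0], "tip": [3.0, 5.0, 2.0]})

# Element-wise arithmetic: 3/20, 5/30, 2/16, each times 100.
pct = 100 * toy["tip"] / toy["total_bill"]

rounded_np = np.round(pct, 2)  # NumPy function applied to a Series
rounded_pd = pct.round(2)      # equivalent pandas method

print(rounded_np)
print(rounded_pd)
```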

In [None]:
# DataFrame.drop() removes rows (by index label) or columns (by column name).
# It returns a new DataFrame; df itself is unchanged unless the result is assigned back.

df.drop('tip_percentage', axis=1)

# A DataFrame object has two axes: “axis 0” and “axis 1”.
# “axis 0” represents rows and “axis 1” represents columns. 

In [None]:
df=df.drop('tip_percentage', axis=1)
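The reason the assignment is needed is that `drop` returns a new object rather than modifying the frame it is called on. A minimal sketch on a made-up frame (the column name `extra` is hypothetical):

```python
import pandas as pd

# Made-up frame with one column and one row we want to remove.
toy = pd.DataFrame({"total_bill": [20.0, 30.0], "tip": [3.0, 5.0], "extra": [1, 2]})

without_col = toy.drop("extra", axis=1)  # axis=1: drop a column by name
without_row = toy.drop(0, axis=0)        # axis=0: drop a row by index label

# The original frame still has all of its columns and rows.
print(list(toy.columns))
print(list(without_col.columns))
print(list(without_row.index))
```

The keyword forms `toy.drop(columns="extra")` and `toy.drop(index=0)` express the same operations without an `axis` argument.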

In [None]:
df.head()

In [None]:
# set_index() uses an existing column (or a list/Series of the same length) as the index.
# It returns a new DataFrame, so the result is assigned back to df.

df = df.set_index('Payment ID')

In [None]:
# DataFrame.iloc[] selects rows by integer position (0, 1, 2, ...), regardless of the index labels.
# It is useful when the index is not the default 0..n-1 range, or when the labels are unknown.

df.iloc[0]

In [None]:
df.iloc[2:4]
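To illustrate position-based versus label-based selection, here is a small sketch where the index holds string labels. The payment IDs and tips are made up, and `by_id` is a hypothetical name.

```python
import pandas as pd

toy = pd.DataFrame({
    "Payment ID": ["A1", "B2", "C3", "D4"],  # made-up IDs
    "tip": [1.0, 2.0, 3.0, 4.0],
})
by_id = toy.set_index("Payment ID")

first_row = by_id.iloc[0]   # by position: the first row
same_row = by_id.loc["A1"]  # by label: the row whose index label is "A1"
middle = by_id.iloc[1:3]    # positions 1 and 2; the end position is excluded

print(first_row["tip"], same_row["tip"])
print(list(middle.index))
```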

In [None]:
# reset_index() moves the current index into a regular column and
# restores the default integer index running from 0 to len(df) - 1.

df=df.reset_index()

In [None]:
df.head()
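One detail worth checking with `reset_index` is what happens to the old index. The sketch below, on made-up data, shows three cases: undoing a `set_index`, resetting a frame that already has the default index, and discarding the old index with `drop=True`.

```python
import pandas as pd

toy = pd.DataFrame({"Payment ID": ["A1", "B2"], "tip": [1.0, 2.0]})  # made-up rows
by_id = toy.set_index("Payment ID")

restored = by_id.reset_index()       # "Payment ID" becomes a column again
extra = toy.reset_index()            # default index is kept as a new "index" column
clean = toy.reset_index(drop=True)   # old index is discarded instead

print(list(restored.columns))
print(list(extra.columns))
print(list(clean.columns))
```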

## Conditional Selection

In [None]:
df[df['total_bill']> 30]

In [None]:
df[df['total_bill']> 30].shape

In [None]:
df[df['sex']=='Male']

In [None]:
df[df['sex']=='Male'].shape

In [None]:
# & --> and
df[(df['sex']=='Male') & (df['total_bill']> 30)]

In [None]:
df[(df['sex']=='Male') & (df['total_bill']> 30)].shape

In [None]:
# | --> or

df[(df['sex']=='Male') | (df['total_bill']> 30)]

In [None]:
df[(df['day']=='Sat') | (df['day']=='Sun')]

In [None]:
# isin() returns a boolean mask that is True where the column value is in the given list,
# which makes it a compact way to select rows matching one of several values.

df[df['day'].isin(['Sat', 'Sun'])]
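The filters in this section all work the same way: a comparison produces a boolean Series (a mask), and indexing with that mask keeps the rows where it is True. A sketch on made-up rows, with the masks stored in hypothetical variables so they can be combined:

```python
import pandas as pd

# Made-up rows; the labels mimic the ones used in the filters above.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "day": ["Sat", "Sun", "Thur", "Fri"],
    "total_bill": [35.0, 12.0, 18.0, 40.0],
})

is_male = toy["sex"] == "Male"
is_large = toy["total_bill"] > 30

both = toy[is_male & is_large]    # rows where both masks are True
either = toy[is_male | is_large]  # rows where at least one mask is True

# Two ways to write the same weekend filter.
weekend_or = toy[(toy["day"] == "Sat") | (toy["day"] == "Sun")]
weekend_isin = toy[toy["day"].isin(["Sat", "Sun"])]

# ~ negates a mask: everything that is not a weekend row.
not_weekend = toy[~toy["day"].isin(["Sat", "Sun"])]

print(len(both), len(either))
print(weekend_or.equals(weekend_isin))
print(list(not_weekend["day"]))
```

The parentheses around each comparison are required because `&` and `|` bind more tightly than `==` and `>` in Python.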

## Useful Methods

In [None]:
df.info()

In [None]:
def last_four(num):
    return str(num)[-4:]

In [None]:
num = 1234567890
print(f"The last 4 digits of {num} are: {last_four(num)}")

In [None]:
df['total_bill'].apply(lambda bill: 0.2*bill)

In [None]:
df['total_bill_20']=df['total_bill'].apply(lambda bill: 0.2*bill)

In [None]:
df.head()
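As a sketch of how these pieces fit together: `apply` calls a function once per element, so a helper like `last_four` can be applied to a whole column, while simple arithmetic such as `0.2 * bill` can also be written directly on the column. The card numbers below are made up, and the column name `last_four` is a hypothetical choice.

```python
import pandas as pd

def last_four(num):
    """Return the last four characters of a number's string form."""
    return str(num)[-4:]

toy = pd.DataFrame({
    "total_bill": [10.0, 25.0],
    "CC Number": [1111222233334444, 5555666677778888],  # made-up numbers
})

via_apply = toy["total_bill"].apply(lambda bill: 0.2 * bill)  # one call per row
vectorized = 0.2 * toy["total_bill"]                          # whole column at once

toy["last_four"] = toy["CC Number"].apply(last_four)

print(via_apply.equals(vectorized))
print(list(toy["last_four"]))
```

For plain arithmetic the vectorized form is usually the simpler choice; `apply` is more useful when the per-element logic cannot be expressed as column operations.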

In [None]:
df.sort_values('tip')

In [None]:
df.sort_values('tip', ascending=False)

In [None]:
df.sort_values(['tip', 'size'])
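When a list of columns is passed, rows are ordered by the first column and ties are broken by the next one. The sketch below uses made-up values with deliberate ties in `tip`, and also shows a per-column sort direction via a list passed to `ascending`.

```python
import pandas as pd

# Made-up values with ties in "tip" so the second key matters.
toy = pd.DataFrame({"tip": [2.0, 1.0, 2.0, 1.0], "size": [4, 2, 2, 3]})

by_tip_then_size = toy.sort_values(["tip", "size"])
mixed = toy.sort_values(["tip", "size"], ascending=[False, True])  # tip down, size up

print(list(by_tip_then_size.index))
print(list(mixed.index))
```

The original row labels travel with the rows, which is why the printed index shows the new order.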

In [None]:
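# numeric_only=True (pandas 1.5 or newer) restricts the correlation to numeric columns;
# pandas 2.0+ raises an error on the text columns otherwise.

df.corr(numeric_only=True)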
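Correlation is only defined for numeric columns, so text columns such as `sex` or `day` need to be left out. One version-independent way to do that, sketched here on made-up data, is to select the numeric columns first with `select_dtypes`.

```python
import pandas as pd

# Made-up rows where tip is exactly 10% of the bill, so the correlation is 1.
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "sex": ["Male", "Female", "Male", "Female"],
})

numeric = toy.select_dtypes(include="number")  # keeps total_bill and tip only
corr = numeric.corr()                          # Pearson correlation by default

print(list(corr.columns))
print(corr.loc["total_bill", "tip"])
```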

In [None]:
df['total_bill'].max()

In [None]:
df['total_bill'].mean()

In [None]:
# Series.idxmax() returns the index label of the first occurrence of the maximum.
# NA/null values are skipped when searching for it.

df['total_bill'].idxmax()

In [None]:
df.iloc[170]

In [None]:
df['total_bill'].min()

In [None]:
df['total_bill'].idxmin()
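Because `idxmax` and `idxmin` return index labels rather than positions, the matching row is most safely fetched with `loc`. With the default 0..n-1 index the two coincide, but the sketch below uses a made-up frame with a non-default index to show the difference.

```python
import pandas as pd

# Made-up bills with index labels that differ from row positions.
toy = pd.DataFrame({"total_bill": [16.99, 50.81, 3.07]}, index=[10, 20, 30])

label_max = toy["total_bill"].idxmax()           # index label of the largest bill
row_max = toy.loc[label_max]                     # label-based lookup
pos_max = toy["total_bill"].to_numpy().argmax()  # integer position of the same row

print(label_max, pos_max)
print(row_max["total_bill"])
```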

In [None]:
df.head()

In [None]:
# Series.value_counts() returns a Series with the count of each unique value,
# sorted in descending order so the most frequent value comes first.
# NA values are excluded by default.

df['sex'].value_counts()

In [None]:
# unique() returns the distinct values found in a column;
# nunique() returns how many distinct values there are.

df['day'].unique()

In [None]:
df['day'].nunique()
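The three methods treat missing values differently, which is easy to miss on a column without gaps. A sketch on a made-up Series containing one missing entry:

```python
import pandas as pd

# Made-up day labels with one missing value.
days = pd.Series(["Sat", "Sun", "Sat", "Fri", "Sat", None], name="day")

counts = days.value_counts()                # missing value excluded by default
shares = days.value_counts(normalize=True)  # fractions of the non-missing rows
distinct = days.unique()                    # includes the missing value
n_distinct = days.nunique()                 # excludes the missing value by default

print(counts)
print(shares["Sat"])
print(len(distinct), n_distinct)
```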

In [None]:
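# numeric_only=True averages only the numeric columns;
# pandas 2.0+ raises an error on the text columns otherwise.

df.groupby('sex').mean(numeric_only=True)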
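A grouped mean splits the rows by the values of one column and averages within each group. An alternative to averaging every column is to pick the column of interest first, sketched here on made-up data, with `agg` used to compute several statistics at once.

```python
import pandas as pd

# Made-up rows; "day" is a text column that a plain mean could not average.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "tip": [2.0, 3.0, 4.0, 5.0],
    "day": ["Sat", "Sun", "Sat", "Sun"],
})

mean_tip = toy.groupby("sex")["tip"].mean()                       # one value per group
summary = toy.groupby("sex")["tip"].agg(["mean", "max", "count"])  # several statistics

print(mean_tip)
print(summary)
```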