## Grouping pandas data frames


The examples below use seaborn's `tips` dataset, which has the following columns:

1. `total_bill`: The total bill for the meal, including tax, in US dollars.

2. `tip`: The tip left for the meal, in US dollars.

3. `sex`: The sex of the person who paid for the meal. A categorical variable with two levels: "Male" and "Female".

4. `smoker`: Whether the person who paid for the meal is a smoker. A categorical variable with two levels: "Yes" and "No".

5. `day`: The day of the week on which the meal took place. A categorical variable with four levels: "Thur", "Fri", "Sat", and "Sun".

6. `time`: The time of day at which the meal took place. A categorical variable with two levels: "Lunch" and "Dinner".

7. `size`: The number of people in the party. A discrete numerical variable.

Installing seaborn:
 - `pip install seaborn`

#### Imports and setup


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import random 

random.seed(2)

#### Groupby operation

In [None]:
tips = sns.load_dataset("tips")

In [None]:
tips.head()

In [None]:
# Group by day
tip_per_day = tips.groupby('day')

In [None]:
# Inspect the first group: iterating over a GroupBy yields (key, DataFrame) tuples
group = next(iter(tip_per_day))

In [None]:
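type(group)  # tuple

In [None]:
len(group)  # 2: the group key and the group's rows

In [None]:
group[0]  # the group key, i.e. the day

In [None]:
group[1]  # DataFrame holding the rows for that day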
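
Because each item is a `(key, DataFrame)` pair, the loop variable can be unpacked directly instead of indexing with `[0]` and `[1]`. The sketch below uses a small hand-made frame (the values are illustrative, not taken from the tips data) so that it runs without downloading anything; `toy`, `group_sizes`, and `thur` are names introduced only for this example.

```python
import pandas as pd

# Hand-made stand-in for the tips data; the values are made up for illustration.
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat"],
    "total_bill": [10.0, 20.0, 15.0, 30.0],
})

# Each iteration yields a (key, sub-DataFrame) tuple, so unpack it in the loop header.
group_sizes = {}
for day, frame in toy.groupby("day"):
    group_sizes[day] = len(frame)

print(group_sizes)

# get_group fetches a single group by its key without looping.
thur = toy.groupby("day").get_group("Thur")
print(thur)
```

`get_group` is handy when only one group is of interest, since it avoids the loop-and-break pattern.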

In [None]:
# Print the first rows of each group
for _, day_tips in tip_per_day:
    print(day_tips.head())

In [None]:
# Average bill for each group
average_bill_per_day = tips.groupby('day')['total_bill'].mean()

print(average_bill_per_day)
print(type(average_bill_per_day))

In [None]:
# Mean total bill and mean tip for each day; selecting a list of columns returns a DataFrame
averages_per_day = tips.groupby('day')[['total_bill', 'tip']].mean()

print(averages_per_day)
print(type(averages_per_day))
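
The two cells above differ only in how the columns are selected: single brackets give a Series, a list of columns gives a DataFrame. When different columns need different statistics, named aggregation with `agg` is one option. The following is a minimal sketch on a hand-made frame (illustrative values; `toy`, `mean_bill`, and `max_tip` are names chosen for this example).

```python
import pandas as pd

# Hand-made stand-in for the tips data; the values are made up for illustration.
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri"],
    "total_bill": [10.0, 20.0, 15.0, 25.0],
    "tip": [1.0, 3.0, 2.0, 4.0],
})

# Single brackets select one column, so the result is a Series.
as_series = toy.groupby("day")["total_bill"].mean()

# A list of columns keeps the result two-dimensional, so it is a DataFrame.
as_frame = toy.groupby("day")[["total_bill", "tip"]].mean()

# Named aggregation: each keyword is an output column built from (source column, function).
summary = toy.groupby("day").agg(
    mean_bill=("total_bill", "mean"),
    max_tip=("tip", "max"),
)
print(summary)
```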

In [None]:
# Group the data by 'day' and 'time', and calculate the average total bill for each combination
average_bill_per_day_and_time = tips.groupby(['day', 'time'])['total_bill'].mean()
print(average_bill_per_day_and_time)

In [None]:
average_bill_per_day_and_time.reset_index()

In [None]:
# Group the data by 'day' and 'time', and count the bills recorded for each combination
bill_count_per_day_and_time = tips.groupby(['day', 'time'])['total_bill'].count()
print(bill_count_per_day_and_time)
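
When counting group members, note that `count()` counts only non-missing values in the selected column, whereas `size()` counts every row in the group. The two agree when the column has no missing values. A small sketch with a hand-made frame that contains one missing value (illustrative data, not from the tips dataset):

```python
import numpy as np
import pandas as pd

# Hand-made frame with one missing bill; the values are made up for illustration.
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri"],
    "total_bill": [10.0, np.nan, 15.0],
})

# count() skips the NaN, so Thur has one counted bill.
counts = toy.groupby("day")["total_bill"].count()

# size() counts rows regardless of missing values, so Thur has two rows.
sizes = toy.groupby("day").size()

print(counts)
print(sizes)
```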

In [None]:
# Same grouping with observed=True: only the 'day'/'time' combinations that occur in the data are kept
average_bill_per_day_and_time = tips.groupby(['day', 'time'], observed=True)['total_bill'].mean()
print(average_bill_per_day_and_time)

In [None]:
average_bill_per_day_and_time.reset_index()
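
The `observed` argument matters here because `day` and `time` are categorical columns. With `observed=False`, every combination of categories appears in the result, and combinations with no rows get `NaN` as their mean; with `observed=True`, only combinations present in the data are returned. The default has changed between pandas versions, so passing it explicitly is the safer choice. A minimal sketch on a hand-made categorical frame (illustrative values):

```python
import pandas as pd

# Two categorical columns where only 2 of the 4 possible combinations occur.
toy = pd.DataFrame({
    "day": pd.Categorical(["Thur", "Sat"], categories=["Thur", "Sat"]),
    "time": pd.Categorical(["Lunch", "Dinner"], categories=["Lunch", "Dinner"]),
    "total_bill": [10.0, 30.0],
})

# observed=False: one row per category combination, NaN where no data exists.
all_combos = toy.groupby(["day", "time"], observed=False)["total_bill"].mean()

# observed=True: only the combinations that actually appear in the data.
seen_only = toy.groupby(["day", "time"], observed=True)["total_bill"].mean()

print(all_combos)
print(seen_only)
```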

#### Group and sort

In [None]:
# Sort by tip (largest first), then keep the first 3 rows of each day: the 3 largest tips per day
tips_sorted = tips.sort_values('tip', ascending=False)
top_tips_per_day = tips_sorted.groupby('day').head(3)
top_tips_per_day
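
Sorting first and then calling `head(n)` on the groups keeps the full rows of the `n` largest values per group. If only the values themselves are needed, `nlargest` on the grouped column is an alternative. The sketch below compares the two on a hand-made frame (illustrative values; the variable names are chosen for this example).

```python
import pandas as pd

# Hand-made stand-in for the tips data; the values are made up for illustration.
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri"],
    "tip": [1.0, 5.0, 3.0, 2.0, 4.0],
})

# Sort descending, then keep the first 2 rows of each group: full rows, in sorted order.
top_by_head = toy.sort_values("tip", ascending=False).groupby("day").head(2)

# nlargest on the grouped column returns a Series indexed by (day, original row label).
top_by_nlargest = toy.groupby("day")["tip"].nlargest(2)

print(top_by_head)
print(top_by_nlargest)
```

The `head` version returns a DataFrame with the original index, while the `nlargest` version returns a Series with a two-level index, so the choice depends on what the next step needs.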