
## Grouping pandas data frames


This notebook uses seaborn's `tips` dataset. Its columns are:

1. `total_bill`: Total bill for the meal, including tax, in US dollars.

2. `tip`: Tip left for the meal, in US dollars.

3. `sex`: Sex of the person who paid for the meal. Categorical with two levels: "Male" or "Female".

4. `smoker`: Whether the person who paid for the meal is a smoker. Categorical with two levels: "Yes" or "No".

5. `day`: Day of the week on which the meal took place. Categorical with four levels: "Thur", "Fri", "Sat", or "Sun".

6. `time`: Time of day of the meal. Categorical with two levels: "Lunch" or "Dinner".

7. `size`: Number of people in the party. Discrete numerical variable.

Install seaborn if it is not already available:
 - `pip install seaborn`

#### Imports and setup


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import random

random.seed(2)

#### Groupby operation

In [2]:
tips = sns.load_dataset("tips")

In [3]:
tips.head()

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


In [4]:
# Group by day
tip_per_day = tips.groupby('day')

In [5]:
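# Inspect the first group: iterating over a GroupBy yields (key, DataFrame) tuples,
# so break after the first iteration to keep that tuple in `group`
for group in tip_per_day:
    break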

In [6]:
type(group)

tuple

In [7]:
len(group)

2

In [8]:
group[0]

'Thur'

In [9]:
group[1]

     total_bill   tip     sex smoker   day   time  size
77        27.20  4.00    Male     No  Thur  Lunch     4
78        22.76  3.00    Male     No  Thur  Lunch     2
79        17.29  2.71    Male     No  Thur  Lunch     2
80        19.44  3.00    Male    Yes  Thur  Lunch     2
81        16.66  3.40    Male     No  Thur  Lunch     2
..          ...   ...     ...    ...   ...    ...   ...
202       13.00  2.00  Female    Yes  Thur  Lunch     2
203       16.40  2.50  Female    Yes  Thur  Lunch     2
204       20.53  4.00    Male    Yes  Thur  Lunch     4
205       16.47  3.23  Female    Yes  Thur  Lunch     3
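
Because each item produced by iterating a GroupBy is a `(key, DataFrame)` tuple, the loop variable can be unpacked directly, and `get_group` retrieves a single group without looping. The sketch below uses a made-up four-row frame named `toy` (a hypothetical stand-in, not the real tips data) so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical miniature stand-in for the tips data (values are made up)
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat"],
    "total_bill": [10.0, 20.0, 15.0, 30.0],
})

grouped = toy.groupby("day")

# Unpack each (key, sub-frame) pair instead of indexing group[0] and group[1]
group_sizes = {}
for day, frame in grouped:
    group_sizes[day] = len(frame)

# get_group returns the rows of one group directly
thur = grouped.get_group("Thur")
```

Unpacking into `day, frame` reads more clearly than `group[0]` and `group[1]`, and `get_group` avoids the `for ... break` idiom when only one group is of interest.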


In [10]:
# Print the first five rows of each group (group[1] is the group's DataFrame)
for group in tip_per_day:
    print(group[1].head())

    total_bill   tip   sex smoker   day   time  size
77       27.20  4.00  Male     No  Thur  Lunch     4
78       22.76  3.00  Male     No  Thur  Lunch     2
79       17.29  2.71  Male     No  Thur  Lunch     2
80       19.44  3.00  Male    Yes  Thur  Lunch     2
81       16.66  3.40  Male     No  Thur  Lunch     2
    total_bill   tip     sex smoker  day    time  size
90       28.97  3.00    Male    Yes  Fri  Dinner     2
91       22.49  3.50    Male     No  Fri  Dinner     2
92        5.75  1.00  Female    Yes  Fri  Dinner     2
93       16.32  4.30  Female    Yes  Fri  Dinner     2
94       22.75  3.25  Female     No  Fri  Dinner     2
    total_bill   tip     sex smoker  day    time  size
19       20.65  3.35    Male     No  Sat  Dinner     3
20       17.92  4.08    Male     No  Sat  Dinner     2
21       20.29  2.75  Female     No  Sat  Dinner     2
22       15.77  2.23  Female     No  Sat  Dinner     2
23       39.42  7.58    Male     No  Sat  Dinner     4
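   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4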

In [11]:
# Average total bill for each day (selecting one column returns a Series)
average_bill_per_day = tips.groupby('day')['total_bill'].mean()

print(average_bill_per_day)
print(type(average_bill_per_day))

day
Thur    17.682742
Fri     17.151579
Sat     20.441379
Sun     21.410000
Name: total_bill, dtype: float64
<class 'pandas.core.series.Series'>


In [12]:
# Mean total bill and mean tip for each day (selecting a list of columns returns a DataFrame)
average_bill_per_day = tips.groupby('day')[['total_bill', 'tip']].mean()

print(average_bill_per_day)
print(type(average_bill_per_day))

      total_bill       tip
day                       
Thur   17.682742  2.771452
Fri    17.151579  2.734737
Sat    20.441379  2.993103
Sun    21.410000  3.255132
<class 'pandas.core.frame.DataFrame'>
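
When different columns need different statistics, `agg` with named aggregation is one option. This is a minimal sketch on a made-up frame; `toy` and the output column names `mean_bill`, `max_tip`, and `n_meals` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical miniature stand-in for the tips data (values are made up)
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri"],
    "total_bill": [10.0, 20.0, 30.0, 50.0],
    "tip": [1.0, 3.0, 4.0, 6.0],
})

# Named aggregation: output_column=(input_column, function)
summary = toy.groupby("day").agg(
    mean_bill=("total_bill", "mean"),
    max_tip=("tip", "max"),
    n_meals=("tip", "count"),
)
```

The result is a DataFrame indexed by `day` with one column per named aggregation, which saves renaming columns afterwards.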


In [13]:
# Group the data by 'day' and 'time', and calculate the average total bill for each combination
average_bill_per_day_and_time = tips.groupby(['day', 'time'])['total_bill'].mean()
print(average_bill_per_day_and_time)

day   time  
Thur  Lunch     17.664754
      Dinner    18.780000
Fri   Lunch     12.845714
      Dinner    19.663333
Sat   Lunch           NaN
      Dinner    20.441379
Sun   Lunch           NaN
      Dinner    21.410000
Name: total_bill, dtype: float64


In [14]:
average_bill_per_day_and_time.reset_index()

    day    time  total_bill
0  Thur   Lunch   17.664754
1  Thur  Dinner   18.780000
2   Fri   Lunch   12.845714
3   Fri  Dinner   19.663333
4   Sat   Lunch         NaN
5   Sat  Dinner   20.441379
6   Sun   Lunch         NaN
7   Sun  Dinner   21.410000


In [15]:
# Group the data by 'day' and 'time', and count the total_bill entries for each combination
count_per_day_and_time = tips.groupby(['day', 'time'])['total_bill'].count()
print(count_per_day_and_time)

day   time  
Thur  Lunch     61
      Dinner     1
Fri   Lunch      7
      Dinner    12
Sat   Lunch      0
      Dinner    87
Sun   Lunch      0
      Dinner    76
Name: total_bill, dtype: int64
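
Note that `count()` counts non-null values in the selected column, whereas `size()` counts rows regardless of missing values. The two agree on the tips data only when the column has no missing entries. A small sketch on a made-up frame (`toy` is hypothetical) shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing bill (values are made up)
toy = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat"],
    "total_bill": [12.0, np.nan, 20.0],
})

# count() skips the NaN in the Fri group
non_null_per_day = toy.groupby("day")["total_bill"].count()

# size() counts every row in each group
rows_per_day = toy.groupby("day").size()
```

Here the Fri group has two rows but only one non-null bill, so the two results differ for that group.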


In [16]:
# Same grouping with observed=True: only (day, time) combinations that actually occur are kept,
# so the empty Sat/Lunch and Sun/Lunch groups no longer appear
average_bill_per_day_and_time = tips.groupby(['day', 'time'], observed=True)['total_bill'].mean()
print(average_bill_per_day_and_time)

day   time  
Thur  Lunch     17.664754
      Dinner    18.780000
Fri   Lunch     12.845714
      Dinner    19.663333
Sat   Dinner    20.441379
Sun   Dinner    21.410000
Name: total_bill, dtype: float64


In [17]:
average_bill_per_day_and_time.reset_index()

    day    time  total_bill
0  Thur   Lunch   17.664754
1  Thur  Dinner   18.780000
2   Fri   Lunch   12.845714
3   Fri  Dinner   19.663333
4   Sat  Dinner   20.441379
5   Sun  Dinner   21.410000
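
The `observed` argument only matters when the grouping keys are categorical, as `day` and `time` are in the tips data. The sketch below reproduces the effect on a made-up frame (`toy` is hypothetical). It passes `observed` explicitly in both calls, since the default has differed between pandas versions.

```python
import pandas as pd

# Hypothetical frame with categorical keys; Thur/Dinner and Sat/Lunch never occur
toy = pd.DataFrame({
    "day": pd.Categorical(["Sat", "Sat", "Thur"], categories=["Thur", "Sat"]),
    "time": pd.Categorical(["Dinner", "Dinner", "Lunch"], categories=["Lunch", "Dinner"]),
    "total_bill": [20.0, 30.0, 10.0],
})

# observed=False: one row per combination of category levels, NaN where no data exists
all_combos = toy.groupby(["day", "time"], observed=False)["total_bill"].mean()

# observed=True: only combinations that appear in the data
seen_only = toy.groupby(["day", "time"], observed=True)["total_bill"].mean()
```

With two levels per key, `all_combos` has four entries, two of them NaN, while `seen_only` has just the two combinations present in the data.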


#### Group and sort

In [18]:
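# Sort by tip (largest first), then keep the first 3 rows of each day:
# this gives the 3 largest tips per day, still ordered by tip overall
tips_sorted = tips.sort_values('tip', ascending=False)
top_tips_per_day = tips_sorted.groupby('day').head(3)
top_tips_per_day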

     total_bill    tip     sex smoker   day    time  size
170       50.81  10.00    Male    Yes   Sat  Dinner     3
212       48.33   9.00    Male     No   Sat  Dinner     4
23        39.42   7.58    Male     No   Sat  Dinner     4
141       34.30   6.70    Male     No  Thur   Lunch     6
183       23.17   6.50    Male    Yes   Sun  Dinner     4
47        32.40   6.00    Male     No   Sun  Dinner     4
88        24.71   5.85    Male     No  Thur   Lunch     2
181       23.33   5.65    Male    Yes   Sun  Dinner     2
85        34.83   5.17  Female     No  Thur   Lunch     4
95        40.17   4.73    Male    Yes   Fri  Dinner     4
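
The sort-then-`head` pattern keeps rows in the overall sorted order, so groups end up interleaved, as in the output above. If the rows should be listed group by group instead, one option is a second sort on the group key. A minimal sketch on a made-up frame (`toy`, `top2`, and `top2_by_day` are hypothetical names):

```python
import pandas as pd

# Hypothetical miniature stand-in for the tips data (values are made up)
toy = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sun", "Sun"],
    "tip": [2.0, 5.0, 3.0, 4.0, 1.0],
})

# Top 2 tips per day; rows stay in descending-tip order across all days
top2 = toy.sort_values("tip", ascending=False).groupby("day").head(2)

# Re-sort so each day's rows sit together, largest tip first within a day
top2_by_day = top2.sort_values(["day", "tip"], ascending=[True, False])
```

`head(n)` on a GroupBy keeps the original row labels, which makes it easy to trace each selected row back to the source frame.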
