# Groupby

The groupby method lets you group rows that share a value in one or more columns and then apply an aggregate function to each group.

In [1]:
import pandas as pd
# Sample data used to build the DataFrame
data = {'Company':['Google','Google','MS','MS','Meta','Meta'],
       'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
       'Sales':[200,120,340,124,243,350]}

In [2]:
df = pd.DataFrame(data)
df

,Company,Person,Sales
0,Google,Sam,200
1,Google,Charlie,120
2,MS,Amy,340
3,MS,Vanessa,124
4,Meta,Carl,243
5,Meta,Sarah,350


Now you can use the .groupby() method to group rows together based on a column name.
For instance, let's group by Company. Calling df.groupby('Company') on its own creates a DataFrameGroupBy object; selecting ['Sales'] and calling .mean() then returns the average sales per company:

In [3]:
# Select the numeric 'Sales' column before aggregating. In pandas 2.0 and later,
# df.groupby('Company').mean() raises a TypeError because 'Person' holds strings,
# unless numeric_only=True is passed.
df.groupby('Company')['Sales'].mean()

Company
Google    160.0
MS        232.0
Meta      296.5
Name: Sales, dtype: float64
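
As a minimal sketch of the alternative to selecting ['Sales'] first, the whole grouped frame can be aggregated by passing numeric_only=True, which skips the non-numeric Person column. The variable name means is only illustrative.

```python
import pandas as pd

data = {'Company': ['Google', 'Google', 'MS', 'MS', 'Meta', 'Meta'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# Aggregate every numeric column; the string column 'Person' is left out
means = df.groupby('Company').mean(numeric_only=True)
print(means)
```

The result here is a DataFrame with a single Sales column, rather than the Series returned when ['Sales'] is selected before aggregating.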

You can save the grouped object (before any aggregation is applied) as a new variable:

In [4]:
by_comp = df.groupby("Company")['Sales']
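
To see what the saved variable actually holds, the sketch below inspects it before any aggregation. It assumes the same sample data as above; because a single column was selected, the object is a SeriesGroupBy rather than a DataFrameGroupBy.

```python
import pandas as pd

data = {'Company': ['Google', 'Google', 'MS', 'MS', 'Meta', 'Meta'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

by_comp = df.groupby('Company')['Sales']

# The grouped object is lazy: nothing is computed until an aggregate is called
print(type(by_comp).__name__)
print(by_comp.ngroups)

# Iterating yields (group key, Series of that group's values) pairs
for company, sales in by_comp:
    print(company, sales.tolist())
```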

And then call aggregate methods on the object:

In [5]:
by_comp.mean()

Company
Google    160.0
MS        232.0
Meta      296.5
Name: Sales, dtype: float64

Note that Company is now the index of the result, not a column!

In [6]:
df.groupby('Company')['Sales'].mean()

Company
Google    160.0
MS        232.0
Meta      296.5
Name: Sales, dtype: float64
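
If Company is needed as a regular column instead of the index, two common options are sketched below: calling reset_index() on the result, or passing as_index=False to groupby. The variable names are illustrative only.

```python
import pandas as pd

data = {'Company': ['Google', 'Google', 'MS', 'MS', 'Meta', 'Meta'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# Option 1: aggregate first, then move the index back into a column
via_reset = df.groupby('Company')['Sales'].mean().reset_index()

# Option 2: ask groupby not to use the group key as the index
via_flag = df.groupby('Company', as_index=False)['Sales'].mean()

print(via_reset)
print(via_flag)
```

Both forms give a DataFrame with Company and Sales columns and a default integer index.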

More examples of aggregate methods:

In [7]:
# standard deviation
by_comp.std()

Company
Google     56.568542
MS        152.735065
Meta       75.660426
Name: Sales, dtype: float64

In [8]:
by_comp.min()

Company
Google    120
MS        124
Meta      243
Name: Sales, dtype: int64

In [9]:
by_comp.max()

Company
Google    200
MS        340
Meta      350
Name: Sales, dtype: int64

In [10]:
by_comp.count()

Company
Google    2
MS        2
Meta      2
Name: Sales, dtype: int64
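
The individual calls above can also be combined. As a small sketch, agg() accepts a list of aggregate names and returns one column per function; the name summary is just for illustration.

```python
import pandas as pd

data = {'Company': ['Google', 'Google', 'MS', 'MS', 'Meta', 'Meta'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
by_comp = df.groupby('Company')['Sales']

# Compute several aggregates in one pass; each becomes a column
summary = by_comp.agg(['mean', 'std', 'min', 'max', 'count'])
print(summary)
```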

In [11]:
by_comp.describe()

Company,count,mean,std,min,25%,50%,75%,max
Google,2.0,160.0,56.568542,120.0,140.0,160.0,180.0,200.0
MS,2.0,232.0,152.735065,124.0,178.0,232.0,286.0,340.0
Meta,2.0,296.5,75.660426,243.0,269.75,296.5,323.25,350.0


In [12]:
by_comp.describe().transpose()

Company,Google,MS,Meta
count,2.0,2.0,2.0
mean,160.0,232.0,296.5
std,56.568542,152.735065,75.660426
min,120.0,124.0,243.0
25%,140.0,178.0,269.75
50%,160.0,232.0,296.5
75%,180.0,286.0,323.25
max,200.0,340.0,350.0


In [13]:
by_comp.describe().transpose()['Google']

count      2.000000
mean     160.000000
std       56.568542
min      120.000000
25%      140.000000
50%      160.000000
75%      180.000000
max      200.000000
Name: Google, dtype: float64
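
The same per-company statistics can be read without transposing. The sketch below, using the same sample data, selects the Google row of the describe() output by its index label with .loc.

```python
import pandas as pd

data = {'Company': ['Google', 'Google', 'MS', 'MS', 'Meta', 'Meta'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
by_comp = df.groupby('Company')['Sales']

# Company is the index of describe(), so a row can be selected by label
google_stats = by_comp.describe().loc['Google']
print(google_stats)
```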