Step 1: Splitting Data into Groups

1. Grouping data by a single key:

In [7]:
import pandas as pd

data1 = {'Name':['Jai', 'Anuj', 'Jai', 'Princi',
                 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
        'Age':[27, 24, 22, 32,
               33, 36, 27, 32],
        'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj',
                   'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd',
                         'B.Tech', 'B.com', 'Msc', 'MA']}
df = pd.DataFrame(data1)
print(df)

print(df.groupby('Name').groups)

     Name  Age    Address Qualification
0     Jai   27     Nagpur           Msc
1    Anuj   24     Kanpur            MA
2     Jai   22  Allahabad           MCA
3  Princi   32    Kannuaj           Phd
4  Gaurav   33    Jaunpur        B.Tech
5    Anuj   36     Kanpur         B.com
6  Princi   27  Allahabad           Msc
7    Abhi   32    Aligarh            MA
{'Abhi': [7], 'Anuj': [1, 5], 'Gaurav': [4], 'Jai': [0, 2], 'Princi': [3, 6]}
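
The groups dictionary lists index labels. When only the group sizes matter, the GroupBy object also provides ngroups and size(). Below is a minimal sketch that rebuilds just the Name and Age columns of data1 so it runs on its own:

```python
import pandas as pd

# Same Name/Age values as data1 above (other columns omitted for brevity)
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

gk = df.groupby('Name')

print(gk.ngroups)   # number of distinct keys: 5
print(gk.size())    # rows per group, as a Series indexed by Name
```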


In [9]:
gk = df.groupby('Name') 
gk.first()

       Age  Address Qualification
Name
Abhi    32  Aligarh            MA
Anuj    24   Kanpur            MA
Gaurav  33  Jaunpur        B.Tech
Jai     27   Nagpur           Msc
Princi  32  Kannuaj           Phd
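
Note that first() does not simply return the first row of each group. For each column separately, it takes the first non-missing value, so one output row can combine values from different input rows. If you want the literal first row, head(1) keeps rows intact and preserves the original index. Here is a small sketch with a hypothetical missing Age value:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Jai's first row has a missing Age
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai'],
    'Age': [np.nan, 24, 22],
    'Address': ['Nagpur', 'Kanpur', 'Allahabad'],
})

gk = df.groupby('Name')

firsts = gk.first()   # per-column first non-null value, indexed by Name
heads = gk.head(1)    # actual first row of each group, original index kept

print(firsts)   # Jai gets Age 22.0 (row 2) but Address Nagpur (row 0)
print(heads)    # Jai's row 0 is returned unchanged, NaN included
```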


2. Grouping data with multiple keys:

In [12]:
gk2 = df.groupby(['Name', 'Qualification'])
print(gk2.groups)

{('Abhi', 'MA'): [7], ('Anuj', 'B.com'): [5], ('Anuj', 'MA'): [1], ('Gaurav', 'B.Tech'): [4], ('Jai', 'MCA'): [2], ('Jai', 'Msc'): [0], ('Princi', 'Msc'): [6], ('Princi', 'Phd'): [3]}
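
With several keys, every group key is a tuple, and any result indexed by the keys gets a MultiIndex. Here is a short sketch (Address column omitted):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA'],
})

gk2 = df.groupby(['Name', 'Qualification'])

counts = gk2.size()                       # Series with a two-level MultiIndex
jai_msc = gk2.get_group(('Jai', 'Msc'))   # select one group by its tuple key

print(counts)
print(jai_msc)
```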


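3. Controlling the sort order of group keys:

By default, groupby() sorts the group keys, as the first result below shows. With sort=False, the keys keep the order in which they first appear in the data.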

In [19]:
df.groupby('Name')['Age'].sum()


Name
Abhi      32
Anuj      60
Gaurav    33
Jai       49
Princi    59
Name: Age, dtype: int64

In [21]:
df.groupby(['Name'], sort=False)['Age'].sum()

Name
Jai       49
Anuj      60
Princi    59
Gaurav    33
Abhi      32
Name: Age, dtype: int64
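
A quick way to confirm the order that sort=False produces is to compare it with Series.unique(), which also returns values in order of first appearance. The pandas documentation also notes that turning sorting off can improve performance. A sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

sorted_sums = df.groupby('Name')['Age'].sum()                # keys sorted
unsorted_sums = df.groupby('Name', sort=False)['Age'].sum()  # first-appearance order

# Same order as Series.unique(): Jai, Anuj, Princi, Gaurav, Abhi
print(list(unsorted_sums.index))
print(list(df['Name'].unique()))
```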

4. Inspecting groups through the groups attribute:

In [24]:
df.groupby('Name').groups

{'Abhi': [7], 'Anuj': [1, 5], 'Gaurav': [4], 'Jai': [0, 2], 'Princi': [3, 6]}
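
The groups attribute maps each key to index labels. The related indices attribute maps each key to integer positions instead. The two look identical here only because df has the default integer index. The following sketch uses a hypothetical string index to make the difference visible:

```python
import pandas as pd

# Hypothetical string index so that labels and positions differ
df = pd.DataFrame(
    {'Name': ['Jai', 'Anuj', 'Jai'], 'Age': [27, 24, 22]},
    index=['r1', 'r2', 'r3'],
)

gk = df.groupby('Name')

labels = gk.groups      # group key -> index labels
positions = gk.indices  # group key -> integer positions (NumPy arrays)

print(labels)
print(positions)
```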

5. Iterating through groups:

In [27]:
grp = df.groupby('Name')
for name, group in grp:
    print(name)
    print(group)
    print()

Abhi
   Name  Age  Address Qualification
7  Abhi   32  Aligarh            MA

Anuj
   Name  Age Address Qualification
1  Anuj   24  Kanpur            MA
5  Anuj   36  Kanpur         B.com

Gaurav
     Name  Age  Address Qualification
4  Gaurav   33  Jaunpur        B.Tech

Jai
  Name  Age    Address Qualification
0  Jai   27     Nagpur           Msc
2  Jai   22  Allahabad           MCA

Princi
     Name  Age    Address Qualification
3  Princi   32    Kannuaj           Phd
6  Princi   27  Allahabad           Msc
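
When grouping by several keys, each yielded name is a tuple, which can be unpacked directly in the for statement. Below is a minimal sketch that records each (Name, Qualification) pair together with its row count:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA'],
})

keys = []
for (name, qual), group in df.groupby(['Name', 'Qualification']):
    # `group` is a DataFrame holding only this key pair's rows
    keys.append((name, qual, len(group)))

print(keys)
```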



6. Selecting a group:

In [30]:
grp = df.groupby('Name')
grp.get_group('Jai')

  Name  Age    Address Qualification
0  Jai   27     Nagpur           Msc
2  Jai   22  Allahabad           MCA
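
get_group() returns the same rows as an ordinary boolean filter, and it raises KeyError when the requested key is not a group. The short sketch below checks both behaviors (the key 'Nobody' is a made-up name):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi'],
    'Age': [27, 24, 22, 32],
})
grp = df.groupby('Name')

via_groupby = grp.get_group('Jai')
via_mask = df[df['Name'] == 'Jai']   # plain boolean indexing
print(via_groupby.equals(via_mask))

# Asking for a key that is not a group raises KeyError
try:
    grp.get_group('Nobody')   # 'Nobody' is a made-up name
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)
```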


Step 2: Applying Functions to Groups

1. Aggregation: 

In [36]:
import numpy as np
grp1 = df.groupby('Name')
# Use the string alias; recent pandas versions warn when np.sum is passed here
grp1['Age'].aggregate('sum')

Name
Abhi      32
Anuj      60
Gaurav    33
Jai       49
Princi    59
Name: Age, dtype: int64
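
Besides string aliases like 'sum', aggregate() accepts any function that reduces a Series to a single value. The following sketch computes the age range (max minus min) per name; the lambda is illustrative and not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

grp1 = df.groupby('Name')

# Each group's Age Series is reduced to one number
age_range = grp1['Age'].aggregate(lambda s: s.max() - s.min())

print(age_range)
```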

In [38]:
grp1 = df.groupby(['Name', 'Qualification'])
grp1['Age'].aggregate('sum')

Name    Qualification
Abhi    MA               32
Anuj    B.com            36
        MA               24
Gaurav  B.Tech           33
Jai     MCA              22
        Msc              27
Princi  Msc              27
        Phd              32
Name: Age, dtype: int64
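
The result above is indexed by a two-level MultiIndex. Calling reset_index() turns the keys back into ordinary columns, which is often easier to export or merge. Passing as_index=False to groupby() is a similar alternative. A sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA'],
})

grp1 = df.groupby(['Name', 'Qualification'])

# Move the two key levels out of the index into regular columns
flat = grp1['Age'].sum().reset_index()

print(flat)
```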

In [40]:
grp = df.groupby('Name')
grp['Age'].agg(['sum', 'mean', 'std'])

       sum  mean       std
Name
Abhi    32  32.0       NaN
Anuj    60  30.0  8.485281
Gaurav  33  33.0       NaN
Jai     49  24.5  3.535534
Princi  59  29.5  3.535534
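
When a list of functions is passed, the output columns are named after the functions. Named aggregation lets you choose the column names yourself by passing keyword arguments of the form name=(column, function). In the sketch below, the names total_age, mean_age and std_age are arbitrary choices. As in the table above, std is NaN for single-member groups.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

# keyword = (source column, aggregation); the keywords become column names
summary = df.groupby('Name').agg(
    total_age=('Age', 'sum'),
    mean_age=('Age', 'mean'),
    std_age=('Age', 'std'),
)

print(summary)
```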


2. Transformation:

In [45]:
import pandas as pd
data2 = {'Name':['Jai', 'Anuj', 'Jai', 'Princi', 
                 'Gaurav', 'Anuj', 'Princi', 'Abhi'], 
        'Age':[27, 24, 22, 32, 
               33, 36, 27, 32], 
        'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj',
                   'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd',
                         'B.Tech', 'B.com', 'Msc', 'MA'],
        'Score':[23, 34, 35, 45, 47, 50, 52, 53]} 
df2 = pd.DataFrame(data2)
grp2 = df2.groupby('Name')
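# Standardize Age within each group: (value - group mean) / group std.
# Single-member groups have an undefined sample std, so they yield NaN.
sc = lambda x: (x - x.mean()) / x.std()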
grp2['Age'].transform(sc)

0    0.707107
1   -0.707107
2   -0.707107
3    0.707107
4         NaN
5    0.707107
6   -0.707107
7         NaN
Name: Age, dtype: float64
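
Unlike aggregation, transform() returns a result aligned with the original rows (same length and index), so it can be stored as a new column. The sketch below attaches each person's group mean age and the deviation from it. The new column names are hypothetical.

```python
import pandas as pd

df2 = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

# 'mean' is computed per group, then broadcast back to every row of that group
df2['group_mean_age'] = df2.groupby('Name')['Age'].transform('mean')
df2['age_minus_mean'] = df2['Age'] - df2['group_mean_age']

print(df2)
```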

3. Filtration:

In [48]:
grp2 = df2.groupby('Name')
grp2.filter(lambda x: len(x) >= 2)

     Name  Age    Address Qualification Score
0     Jai   27     Nagpur           Msc    23
1    Anuj   24     Kanpur            MA    34
2     Jai   22  Allahabad           MCA    35
3  Princi   32    Kannuaj           Phd    45
5    Anuj   36     Kanpur         B.com    50
6  Princi   27  Allahabad           Msc    52
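
The predicate passed to filter() receives each group as a DataFrame, so it can test any column. The same row selection can also be written as a boolean mask built with transform(). The sketch below shows both approaches, rebuilding df2 without the text columns:

```python
import pandas as pd

df2 = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
    'Score': [23, 34, 35, 45, 47, 50, 52, 53],
})
grp2 = df2.groupby('Name')

# Keep groups with at least two rows, written two equivalent ways
by_filter = grp2.filter(lambda x: len(x) >= 2)
by_mask = df2[grp2['Age'].transform('count') >= 2]
print(by_filter.equals(by_mask))

# Keep groups whose mean Score exceeds 40 (drops Jai, whose mean is 29)
high_scorers = grp2.filter(lambda x: x['Score'].mean() > 40)
print(high_scorers)
```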


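Step 3: Combining Results

After a function has been applied to every group, pandas combines the per-group results into a single object indexed by the group keys. Passing a dict to agg() chooses which aggregation is applied to which column: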

In [51]:
df.groupby('Name').agg({'Age': 'sum'})

       Age
Name
Abhi    32
Anuj    60
Gaurav  33
Jai     49
Princi  59
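
The dict can also name a different aggregation for each column. The following sketch uses the Name, Age and Score values of df2 from the transformation example, summing Age and averaging Score per name:

```python
import pandas as pd

df2 = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
    'Score': [23, 34, 35, 45, 47, 50, 52, 53],
})

# One aggregation per column; the results are combined side by side
combined = df2.groupby('Name').agg({'Age': 'sum', 'Score': 'mean'})

print(combined)
```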
