## Python Pandas - GroupBy

Any groupby operation involves one or more of the following steps on the original object:

1. Splitting the object.
2. Applying a function.
3. Combining the results.

In many situations, we split the data into groups and apply some function to each group. The apply step typically performs one of the following operations:

* Aggregation: computing a summary statistic for each group
* Transformation: performing a group-specific operation that returns a result the same size as the input
* Filtration: discarding groups that fail some condition
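
As a minimal sketch of the three kinds of apply step, the cell below runs each one on a tiny frame. The `toy` data and its column names are made up for illustration and are not part of the IPL table used in the rest of this notebook.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the three operations
toy = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 3, 10]})
g = toy.groupby("key")["val"]

# Aggregation: one summary value per group
agg_result = g.sum()

# Transformation: result has the same length as the input,
# here each value minus the mean of its own group
centered = g.transform(lambda s: s - s.mean())

# Filtration: keep only the rows of groups with more than one member
kept = toy.groupby("key").filter(lambda grp: len(grp) > 1)

print(agg_result)
print(centered)
print(kept)
```

Aggregation shrinks the data to one row per group, transformation preserves the original shape, and filtration returns a subset of the original rows.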

In [None]:
import pandas as pd

In [None]:
ipl_data = {'Team': ['Riders','Riders','Devils','Devils','Kings','Kings','Kings','Kings',
                     'Riders','Royals','Royals','Riders'],
            'Rank': [1,2,2,3,3,4,1,1,2,4,1,2],
            'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
            'Points': [876,789,863,673,741,812,756,788,694,701,804,690]}
df = pd.DataFrame(ipl_data)

In [None]:
df

In [None]:
# Split the data into groups
df.groupby("Team")

In [None]:
df.groupby("Team").groups

In [None]:
df.groupby(["Team","Year"]).groups # Groupby with multiple columns

In [None]:
df.groupby("Year").groups
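
The `.groups` attribute used above maps each group key to the index labels of the rows in that group. A small sketch, using the first four rows of the IPL data repeated here so the cell runs on its own (the name `sample` is only for this illustration):

```python
import pandas as pd

# First four rows of the IPL data, restricted to two columns
sample = pd.DataFrame({"Team": ["Riders", "Riders", "Devils", "Devils"],
                       "Year": [2014, 2015, 2014, 2015]})

groups = sample.groupby("Year").groups

# Convert the index objects to plain lists for easier reading
as_lists = {key: list(labels) for key, labels in groups.items()}
print(as_lists)
```

Each value lists row labels rather than the rows themselves, so the labels can be passed to `sample.loc[...]` to recover the rows.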

In [None]:
# Iterating through groups
grouped = df.groupby("Year")
for name,group in grouped:
    print(name)
    print(group)

In [None]:
# Using the get_group() method, we can select a single group by its key.
grouped = df.groupby("Year")
grouped.get_group(2014)

## Aggregations

In [None]:
import numpy as np

In [None]:
df

In [None]:
df.groupby("Team").size()

In [None]:
# Another way to see the size of each group: agg(np.size) reports it once per remaining column
df.groupby("Team").agg(np.size)

In [None]:
grouped = df.groupby("Team")
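# Sum, mean and standard deviation of Points per team.
# String names are used because newer pandas versions warn when NumPy callables are passed to agg().
grouped['Points'].agg(['sum', 'mean', 'std'])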

In [None]:
# The same aggregation, selecting the column by attribute access
df.groupby("Team").Points.agg(['sum', 'mean', 'std'])
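
When the output columns need descriptive names, pandas also supports named aggregation in `agg()`. The sketch below uses a four-row subset of the IPL data so that it runs on its own; the names `sample`, `total_points` and `mean_points` are chosen only for this illustration.

```python
import pandas as pd

# Four rows of the IPL data, repeated so the cell is self-contained
sample = pd.DataFrame({"Team": ["Riders", "Riders", "Devils", "Devils"],
                       "Points": [876, 789, 863, 673]})

# Named aggregation: output_column=(input_column, function_name)
summary = sample.groupby("Team").agg(
    total_points=("Points", "sum"),
    mean_points=("Points", "mean"),
)
print(summary)
```

Each keyword becomes a column in the result, which avoids renaming columns after the aggregation.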

## Filtration

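Filtration keeps or drops whole groups according to a criterion and returns the matching subset of the original rows. The filter() function takes a function that is called once per group and returns True to keep that group or False to drop it.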

In [None]:
# Return the rows for teams that appear three or more times in the IPL data.
df.groupby("Team").filter(lambda x: len(x)>=3)
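
The condition passed to `filter()` can use any property of the group, not only its length. As a sketch under an assumed criterion (a mean of more than 800 points, a threshold picked purely for illustration), using a four-row subset of the IPL data:

```python
import pandas as pd

# Four rows of the IPL data, repeated so the cell is self-contained
sample = pd.DataFrame({"Team": ["Riders", "Riders", "Devils", "Devils"],
                       "Points": [876, 789, 863, 673]})

# Keep only teams whose mean Points exceeds the illustrative threshold of 800
high_scoring = sample.groupby("Team").filter(lambda grp: grp["Points"].mean() > 800)
print(high_scoring)
```

The result contains the original rows of the surviving groups, not one summary row per group.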