<center><font size=6 color="#00416d">GroupBy</font></center>

Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
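
The split, apply, and combine steps described above can be sketched by hand and compared with a single `groupby` call. This is a minimal illustration on a hypothetical toy DataFrame (the values are made up and are not taken from `fortune1000.csv`):

```python
import pandas as pd

# Hypothetical toy data, not taken from fortune1000.csv
toy = pd.DataFrame({
    "Sector": ["Energy", "Retail", "Energy", "Retail"],
    "Revenue": [100, 200, 300, 400],
})

# Split: one sub-DataFrame per distinct Sector value
pieces = {name: toy[toy["Sector"] == name] for name in toy["Sector"].unique()}

# Apply: reduce each piece to a single number
totals = {name: piece["Revenue"].sum() for name, piece in pieces.items()}

# Combine: collect the per-group results into one Series
manual = pd.Series(totals, name="Revenue").sort_index()

# groupby performs the same three steps in one call
grouped = toy.groupby("Sector")["Revenue"].sum()

print(manual.to_dict())
print(grouped.to_dict())
```

Both approaches should give the same per-sector totals; `groupby` simply avoids writing the loop.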

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)

In [None]:
import pandas as pd

In [None]:
fortune_comp = pd.read_csv("fortune1000.csv")
fortune_comp.set_index('Rank', inplace=True)
# groupby returns a DataFrameGroupBy object: the rows bundled into one group per sector
sector = fortune_comp.groupby('Sector')
fortune_comp.sample(6)

In [None]:
type(fortune_comp)

In [None]:
type(sector)

In [None]:
# Returns the total number of rows
len(fortune_comp)

In [None]:
# Returns the number of groups, which is equal to
# fortune_comp['Sector'].nunique()
len(sector)

In [None]:
# Returns each sector together with the number of rows in that sector
# Similar to fortune_comp["Sector"].value_counts(),
# but ordered by sector name rather than by count
sector.size()
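
The relationship between `size()` and `value_counts()` can be checked on a small example. The sketch below uses a hypothetical toy DataFrame; the counts agree, but the ordering of the result differs:

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({"Sector": ["Retail", "Energy", "Retail"]})

# size() is ordered by group key (groupby sorts keys by default)
sizes = toy.groupby("Sector").size()

# value_counts() is ordered by count, largest first
counts = toy["Sector"].value_counts()

print(sizes.index.tolist())
print(counts.index.tolist())
```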

In [None]:
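# Returns the first non-null value of each column within each group
# (this is the first row of the group when that row has no missing values)
sector.first()

In [None]:
# Returns the last non-null value of each column within each group
sector.last()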
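
`first()` and `last()` work column by column and skip missing values, so the result is not always a physical row of the data. A minimal sketch on hypothetical toy data, using `head(1)` for comparison when the literal first row is wanted:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the first Energy row has a missing Profits value
toy = pd.DataFrame({
    "Sector": ["Energy", "Energy", "Retail"],
    "Company": ["A", "B", "C"],
    "Profits": [np.nan, 5.0, 7.0],
})
groups = toy.groupby("Sector")

# first(): first non-null value per column, so Company and Profits
# may come from different rows of the same group
first_values = groups.first()

# head(1): the first physical row of each group, missing values included
first_rows = groups.head(1)

print(first_values)
print(first_rows)
```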

In [None]:
# Returns a dictionary with each sector as key and the index labels (here, the Rank values) of its rows as value
sector.groups
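
The values stored in `.groups` are index labels, so they can be passed to `.loc` to recover the rows of a group. A small sketch, assuming a hypothetical toy DataFrame indexed by `Rank`:

```python
import pandas as pd

# Hypothetical toy data indexed by Rank, for illustration only
toy = pd.DataFrame(
    {"Sector": ["Energy", "Retail", "Energy"], "Revenue": [100, 200, 300]},
    index=pd.Index([1, 2, 3], name="Rank"),
)

groups = toy.groupby("Sector").groups

# Index labels (Rank values) of the rows in the Energy group
energy_labels = list(groups["Energy"])

# The same labels select the group's rows from the original DataFrame
energy_rows = toy.loc[groups["Energy"]]

print(energy_labels)
print(energy_rows)
```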

### .get_group() method
Retrieves the rows that belong to a particular group.

In [None]:
fortune = pd.read_csv("fortune1000.csv", index_col=["Rank"])
sectors = fortune.groupby("Sector")
fortune.head()

In [None]:
# The get_group() method returns the rows that belong to a particular group
sectors.get_group("Energy")

In [None]:
sectors.get_group("Technology")
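
For a single grouping column, `get_group()` selects the same rows as a boolean filter on that column. The sketch below checks this on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data indexed by Rank, for illustration only
toy = pd.DataFrame(
    {"Sector": ["Energy", "Retail", "Energy"], "Revenue": [100, 200, 300]},
    index=pd.Index([1, 2, 3], name="Rank"),
)

# Rows of the Energy group, taken from the groupby object
selected = toy.groupby("Sector").get_group("Energy")

# The same rows, selected with a boolean mask
filtered = toy[toy["Sector"] == "Energy"]

print(selected.index.tolist())
print(filtered.index.tolist())
```

The groupby object is the better choice when several groups are needed, since the grouping is computed once.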

### Mathematical methods on a groupby object

In [None]:
fortune = pd.read_csv("fortune1000.csv", index_col=["Rank"])
sectors = fortune.groupby("Sector")
fortune.head()

In [None]:
# min() is computed for each column independently within each group
# For string columns such as "Company", it returns the alphabetically smallest value
sectors.min()

In [None]:
sectors.max()

In [None]:
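# Some methods only apply to the numeric columns of the DataFrame
# From pandas 2.0, non-numeric columns are no longer dropped automatically, so pass numeric_only=True
# Total of each numeric column per sector
sectors.sum(numeric_only=True)
# Mean of each numeric column per sector (only this last expression is displayed)
sectors.mean(numeric_only=True)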

In [None]:
# We can also apply these methods to a particular column of each sector
sectors['Revenue'].sum()
# Returns maximum revenue value
sectors['Revenue'].max()
# Returns minimum revenue value
sectors['Revenue'].min()

In [None]:
# We can also pass multiple columns
sectors[['Revenue', 'Profits']].sum()

### Grouping by multiple columns

In [None]:
fortune = pd.read_csv("fortune1000.csv", index_col=["Rank"])
sectors = fortune.groupby(["Sector", "Industry"])
fortune.head()

In [None]:
sectors.size()

In [None]:
# Each row of the result is equivalent to a command like the one below
# fortune[(fortune["Sector"] == "Business Services") & (fortune["Industry"] == "Education")].sum(numeric_only=True)
sectors.sum(numeric_only=True)

In [None]:
sectors.mean(numeric_only=True)

In [None]:
# Total number of employees across all companies within each Sector and Industry pair
sectors["Employees"].sum().to_frame()
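
Grouping by two columns produces a result indexed by a `MultiIndex` of (Sector, Industry) pairs. A minimal sketch on hypothetical toy data, showing how to look up one pair and how to turn the keys back into ordinary columns:

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Sector": ["Energy", "Energy", "Retail"],
    "Industry": ["Oil", "Oil", "Apparel"],
    "Employees": [10, 20, 5],
})

# Series indexed by (Sector, Industry) pairs
by_pair = toy.groupby(["Sector", "Industry"])["Employees"].sum()

# Look up a single pair with a tuple
oil_total = by_pair.loc[("Energy", "Oil")]

# reset_index() moves the grouping keys back into regular columns
flat = by_pair.reset_index()

print(oil_total)
print(flat)
```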

### The agg() method

In [None]:
fortune = pd.read_csv("fortune1000.csv", index_col=["Rank"])
sectors = fortune.groupby(["Sector", "Industry"])
fortune.head()

In [None]:
# We can apply several aggregations to individual columns
sectors[["Revenue", "Profits", "Employees"]].agg(['sum', 'mean', 'min'])

In [None]:
# We can also pass a dictionary that maps each column to its aggregation(s)
sectors.agg({'Revenue': ['sum', 'mean'],
             'Profits': 'min',
             'Employees': ['mean', 'min']})
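
Passing lists to `agg()` yields two-level column labels. When flat, descriptive column names are preferred, pandas also supports named aggregation with `output_name=(column, function)` keyword arguments. A small sketch on hypothetical toy data (the output names `total_revenue` and `min_profit` are chosen here for illustration):

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Sector": ["Energy", "Energy", "Retail"],
    "Revenue": [100, 300, 50],
    "Profits": [10, -5, 4],
})

# Each keyword names an output column; the tuple gives (input column, function)
summary = toy.groupby("Sector").agg(
    total_revenue=("Revenue", "sum"),
    min_profit=("Profits", "min"),
)

print(summary)
```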

### Iterating through groups

In [None]:
fortune = pd.read_csv("fortune1000.csv", index_col=["Rank"])
sectors = fortune.groupby(["Sector", "Industry"])
fortune.head()

In [None]:
# Create an empty DataFrame with the same columns as fortune
df = pd.DataFrame(columns = fortune.columns)

In [None]:
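for (sector, industry), data in sectors:
    # Each key is a (Sector, Industry) tuple; data holds the rows of that group
    # print(data)
    # Company with the highest revenue in this sector and industry
    highest_revenue = data.nlargest(1, 'Revenue')
    # Append that row to df (the Rank index is not carried over)
    df.loc[len(df)] = highest_revenue.values.tolist()[0]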

In [None]:
df.columns

In [None]:
df
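
The same "top company per group" result can usually be obtained without an explicit loop. The sketch below is one possible alternative, not a replacement for the iteration shown above: it uses `idxmax()` to find the index label of the highest-revenue row in each group, assuming the index (here `Rank`) is unique. The data is a hypothetical toy DataFrame.

```python
import pandas as pd

# Hypothetical toy data indexed by a unique Rank, for illustration only
toy = pd.DataFrame(
    {
        "Company": ["A", "B", "C", "D"],
        "Sector": ["Energy", "Energy", "Retail", "Retail"],
        "Industry": ["Oil", "Oil", "Apparel", "Apparel"],
        "Revenue": [100, 300, 50, 20],
    },
    index=pd.Index([1, 2, 3, 4], name="Rank"),
)

# Index label (Rank) of the highest-revenue row in each (Sector, Industry) group
top_labels = toy.groupby(["Sector", "Industry"])["Revenue"].idxmax()

# Select those rows; unlike the loop, this keeps the original Rank index
top = toy.loc[top_labels.tolist()]

print(top)
```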