# What is Groupby in Pandas?

The pandas `groupby()` method splits a DataFrame into groups based on one or more columns, so that each group can be analyzed and aggregated separately.
It follows a "split-apply-combine" strategy: the data is divided into groups, a function is applied to each group, and the results are combined into a new Series or DataFrame.
For example, given a dataset of sales transactions, you can use `groupby()` to group the rows by product category and calculate the total sales for each category.
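
The three steps can be spelled out by hand to see what `groupby()` does in a single call. This is only a sketch: the `sales` frame and its column names are invented for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration
sales = pd.DataFrame({
    "category": ["toys", "books", "toys", "books", "games"],
    "amount": [20, 15, 30, 5, 50],
})

# Split: one sub-DataFrame per category
pieces = {name: part for name, part in sales.groupby("category")}

# Apply: total the amounts within each piece
totals = {name: int(part["amount"].sum()) for name, part in pieces.items()}

# Combine: reassemble the per-group results into a single Series
manual = pd.Series(totals, name="amount").rename_axis("category")

# groupby performs the same three steps in one expression
direct = sales.groupby("category")["amount"].sum()

print(manual)
print(direct)
```

Both printed Series should hold the same totals per category; the one-line form is what the rest of this notebook uses.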

## How to Use the Pandas groupby() Method

### Syntax

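`DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)`

Note that the `axis` parameter is deprecated as of pandas 2.1, so the signature above may not match the pandas version you have installed.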
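
As a small illustration of two of these parameters, the sketch below compares the default result with `as_index=False` and `sort=False`. The `staff` frame is a hypothetical example, not data from this notebook.

```python
import pandas as pd

# Hypothetical frame, invented for illustration
staff = pd.DataFrame({
    "team": ["ops", "dev", "ops", "dev"],
    "hours": [5, 8, 7, 6],
})

# Default: the group keys become the (sorted) index of the result
by_index = staff.groupby("team")["hours"].sum()

# as_index=False keeps the keys as an ordinary column
as_column = staff.groupby("team", as_index=False)["hours"].sum()

# sort=False keeps the groups in order of first appearance
unsorted = staff.groupby("team", sort=False)["hours"].sum()

print(by_index)
print(as_column)
print(unsorted)
```

`as_index=False` is often convenient when the result is going to be merged or exported, since the keys stay available as a regular column.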

In [1]:
import pandas as pd
import numpy as np



In [2]:
# Sample Data
data = {
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank'],
    'Salary': [50000, 60000, 45000, 47000, 70000, 72000],
    'Bonus': [5000, 4000, 3000, 3500, 8000, 7500]
}

df = pd.DataFrame(data)
print(df)

  Department Employee  Salary  Bonus
0      Sales    Alice   50000   5000
1      Sales      Bob   60000   4000
2         HR  Charlie   45000   3000
3         HR    David   47000   3500
4         IT      Eva   70000   8000
5         IT    Frank   72000   7500


In [3]:
# for name, group in df.groupby("Department"):
#     print(name)
#     print(group)

In [4]:
# Group by 'Department' and sum the 'Salary'
result = df.groupby('Department')['Salary'].sum()
print(result)

Department
HR        92000
IT       142000
Sales    110000
Name: Salary, dtype: int64


In [5]:
# Apply multiple aggregation functions
result = df.groupby('Department')['Salary'].agg(['sum', 'mean'])
print(result)

               sum     mean
Department                 
HR           92000  46000.0
IT          142000  71000.0
Sales       110000  55000.0
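
When several columns need different aggregations, named aggregation is one option. The sketch below rebuilds the numeric part of the sample data so that it runs on its own; the output column names (`total_salary`, `mean_salary`, `max_bonus`) are chosen here for illustration.

```python
import pandas as pd

# Same departments, salaries and bonuses as the sample data above
pay = pd.DataFrame({
    "Department": ["Sales", "Sales", "HR", "HR", "IT", "IT"],
    "Salary": [50000, 60000, 45000, 47000, 70000, 72000],
    "Bonus": [5000, 4000, 3000, 3500, 8000, 7500],
})

# Each keyword names an output column: (input column, aggregation)
summary = pay.groupby("Department").agg(
    total_salary=("Salary", "sum"),
    mean_salary=("Salary", "mean"),
    max_bonus=("Bonus", "max"),
)
print(summary)
```

Compared with passing a list of function names, this form gives flat, descriptive column labels.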


In [6]:
# Group by both 'Department' and 'Employee'
result = df.groupby(['Department', 'Employee'])['Bonus'].sum()
print(result)

Department  Employee
HR          Charlie     3000
            David       3500
IT          Eva         8000
            Frank       7500
Sales       Alice       5000
            Bob         4000
Name: Bonus, dtype: int64


In [None]:
df=pd.DataFrame(
    {"key1":list("aabbab"),
     "key2":["one","two","three"]*2,
     "data1":np.random.randn(6),
     "data2":np.random.randn(6)})
df

In [None]:
group=df["data1"].groupby(df["key1"])

In [None]:
group

In [None]:
group.mean()

In [None]:
ave=df["data1"].groupby([df["key1"],
                         df["key2"]]).mean()
ave

In [None]:
ave.unstack()
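
Because the cells above use random numbers, the effect of `unstack()` can be easier to follow with fixed values. The following sketch uses the same key columns with hand-picked `data1` values (an assumption made purely for illustration).

```python
import pandas as pd

# Same keys as above, but with fixed values instead of random ones
frame = pd.DataFrame({
    "key1": list("aabbab"),
    "key2": ["one", "two", "three"] * 2,
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Grouping by two keys gives a Series with a two-level index
ave_fixed = frame.groupby(["key1", "key2"])["data1"].mean()

# unstack() moves the inner level (key2) into the columns;
# key combinations that never occur show up as NaN
wide = ave_fixed.unstack()
print(wide)
```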

In [None]:
# key2 holds strings, so restrict the mean to numeric columns
df.groupby("key1").mean(numeric_only=True)

In [None]:
df.groupby(["key1","key2"]).mean()

## Iterating over Groups

In [None]:
for name, group in df.groupby("key1"):
    print(name)
    print(group)

In [None]:
for (x1,x2),group in df.groupby(["key1",
                                 "key2"]):
    print(x1,x2)
    print(group)

In [None]:
piece=dict(list(df.groupby("key1")))

In [None]:
piece["a"]
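
If only one group is needed, building a dict of all groups is not required. A minimal sketch, using a hypothetical `frame` with fixed values, shows `get_group()` and the `groups` attribute as alternatives.

```python
import pandas as pd

# Hypothetical frame with fixed values, for illustration only
frame = pd.DataFrame({
    "key1": list("aabbab"),
    "value": [1, 2, 3, 4, 5, 6],
})

grouped = frame.groupby("key1")

# Materialise every group into a dict, as in the cell above...
pieces = dict(list(grouped))

# ...or fetch a single group on demand
only_a = grouped.get_group("a")

print(only_a)
# Mapping from each group key to the row labels it contains
print(grouped.groups)
```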

## Selecting a Column or Subset of Columns

In [None]:
df.groupby(['key1', 
            'key2'])[['data1']].mean()
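
The bracket style used for column selection determines the type of the result. The sketch below, on a hypothetical frame with fixed values, contrasts a single label with a list of labels.

```python
import pandas as pd

# Hypothetical frame with fixed values, for illustration only
frame = pd.DataFrame({
    "key1": list("aabbab"),
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "data2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# A single label selects one column: the result is a Series
as_series = frame.groupby("key1")["data1"].mean()

# A list of labels (even of length one) keeps a DataFrame
as_frame = frame.groupby("key1")[["data1"]].mean()

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```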

## Grouping with Dicts and Series

In [None]:
fruit=pd.DataFrame(np.random.randn(4,4),
                   columns=list("abcd"),
                   index=["apple","cherry",
                          "banana","kiwi"])
fruit

In [None]:
label={"a": "green","b":"yellow",
       "c":"green","d":"yellow",
       "e":"purple"}

In [None]:
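# groupby(..., axis=1) is deprecated since pandas 2.1;
# group the transposed frame instead
group=fruit.T.groupby(label)

In [None]:
# Transpose back so the fruits are rows again
group.sum().T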

In [None]:
s=pd.Series(label)
s

In [None]:
# axis=1 is deprecated since pandas 2.1; group the transposed frame instead
fruit.T.groupby(s).count().T
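
With fixed numbers it is easier to check how the mapping combines columns. This sketch assumes a small hypothetical frame `wide` and groups its columns by transposing first, which avoids the `axis` argument altogether; note that the unused `"e"` key in the mapping is simply ignored.

```python
import pandas as pd

# Hypothetical frame with fixed values, for illustration only
wide = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=list("abcd"),
    index=["apple", "kiwi"],
)

# Column label -> group label; "e" matches no column
colour = {"a": "green", "b": "yellow",
          "c": "green", "d": "yellow",
          "e": "purple"}

# Transpose so the columns become rows, group them, transpose back
by_colour = wide.T.groupby(colour).sum().T
print(by_colour)
```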

## Grouping with Functions

In [None]:
fruit.groupby(len).sum()
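
A function passed to `groupby()` is called once per index label, and its return value becomes the group key. The sketch below uses a hypothetical `fruit_counts` frame with fixed values so that the grouping by name length can be checked by hand.

```python
import pandas as pd

# Hypothetical frame with fixed values, for illustration only
fruit_counts = pd.DataFrame(
    {"stock": [1, 2, 3, 4]},
    index=["apple", "cherry", "banana", "kiwi"],
)

# len("apple") == 5, len("cherry") == len("banana") == 6, len("kiwi") == 4
by_length = fruit_counts.groupby(len).sum()

# Equivalent: compute the keys explicitly and group by them
explicit = fruit_counts.groupby(fruit_counts.index.map(len)).sum()

print(by_length)
```

Here "cherry" and "banana" fall into the same group because both names have six letters.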

## Grouping by Index Levels

In [None]:
data=pd.DataFrame(np.random.randn(4,5),
                  columns=[list("AAABB"),
                           [1,2,3,1,2]])

In [None]:
data.columns.names=["letter","number"]
data

In [None]:
# axis=1 is deprecated since pandas 2.1; group the transposed frame instead
data.T.groupby(level="letter").sum().T
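
Grouping by level works the same way when the hierarchical index is on the rows. A minimal sketch, assuming a hypothetical Series `scores` that reuses the `letter` and `number` level names:

```python
import pandas as pd

# Two-level row index with the same level names as above
index = pd.MultiIndex.from_arrays(
    [list("AAABB"), [1, 2, 3, 1, 2]],
    names=["letter", "number"],
)

# Hypothetical fixed values, for illustration only
scores = pd.Series([10, 20, 30, 40, 50], index=index)

# Aggregate over one level at a time
by_letter = scores.groupby(level="letter").sum()
by_number = scores.groupby(level="number").sum()

print(by_letter)
print(by_number)
```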

## Application to a Real Dataset

In [None]:
game=pd.read_csv("vgsalesGlobale.csv")

In [None]:
game.head()

In [None]:
game.dtypes

In [None]:
game.info()
#game.isnull().sum().sort_values(ascending=False)

In [None]:
game.isnull().sum()

In [None]:
game.dropna().isnull().sum()

In [None]:
game.describe()

In [None]:
game.dropna().describe()

In [None]:
game.Global_Sales.mean()

In [None]:
group=game.groupby("Genre")

In [None]:
group["Global_Sales"].count()

In [None]:
group["Global_Sales"].describe()

In [None]:
game[game.Genre=="Action"].Global_Sales.mean()
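
Filtering one genre and averaging gives the same number as the matching entry of the grouped mean. Since the CSV file is not bundled here, the sketch below checks this on a tiny made-up frame `toy` that only borrows the `Genre` and `Global_Sales` column names.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the real dataset
toy = pd.DataFrame({
    "Genre": ["Action", "Sports", "Action", "Puzzle"],
    "Global_Sales": [2.0, 4.0, 6.0, 1.0],
})

# Mean for one genre via boolean filtering
filtered = toy[toy.Genre == "Action"].Global_Sales.mean()

# Means for all genres at once via groupby
grouped_mean = toy.groupby("Genre")["Global_Sales"].mean()

print(filtered)
print(grouped_mean["Action"])
```

The grouped form computes every genre in one pass, which is why it is preferred over repeating the filter for each value.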

In [None]:
# The dataset also has text columns, so restrict the mean to numeric ones
group.mean(numeric_only=True)

In [None]:
%matplotlib inline

In [None]:
group["Global_Sales"].mean().plot(kind="bar")

In [None]:
group[["NA_Sales",
       "EU_Sales",
       "JP_Sales"]].mean().plot(kind="bar")