# Grouping Operations in Pandas

This notebook demonstrates various grouping operations in Pandas, including grouping by a single column, using multiple aggregation functions, and grouping by multiple columns.

In [7]:
import pandas as pd

## Creating a DataFrame for Employee Data

We will create a DataFrame to represent employee data, including names, genders, departments, locations, and salaries.

In [8]:
# Create a DataFrame from a dictionary of employee data (one key per column)
employees_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude', 'Jane', 'Jesse'],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Female'],
    'Department': ['IT', 'Finance', 'IT', 'Finance', 'IT'],
    'Location': ['Nairobi', 'Nakuru', 'Nairobi', 'Nakuru', 'Mombasa'],
    'Salary': [50000, 60000, 55000, 62000, 58000]
})

# Display the DataFrame
employees_df

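|   | Name  | Gender | Department | Location | Salary |
|---|-------|--------|------------|----------|--------|
| 0 | Eric  | Male   | IT         | Nairobi  | 50000  |
| 1 | Ivy   | Female | Finance    | Nakuru   | 60000  |
| 2 | Jude  | Male   | IT         | Nairobi  | 55000  |
| 3 | Jane  | Female | Finance    | Nakuru   | 62000  |
| 4 | Jesse | Female | IT         | Mombasa  | 58000  |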


## Grouping by Gender and Summing the Salaries

We will group the data by the 'Gender' column and sum the salaries.

In [9]:
# Group by gender and sum the salaries
grouped_gender = employees_df.groupby('Gender')['Salary'].sum()

# Display the grouped data
grouped_gender

Gender
Female    180000
Male      105000
Name: Salary, dtype: int64
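To make the "split, apply, combine" steps behind `groupby` visible, here is a minimal sketch that rebuilds the same `employees_df` (so it runs on its own) and recomputes the per-gender totals by iterating over the groups. The name `manual_totals` is purely illustrative.

```python
import pandas as pd

# Same employee data as above, rebuilt so this snippet is self-contained
employees_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude', 'Jane', 'Jesse'],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Female'],
    'Department': ['IT', 'Finance', 'IT', 'Finance', 'IT'],
    'Location': ['Nairobi', 'Nakuru', 'Nairobi', 'Nakuru', 'Mombasa'],
    'Salary': [50000, 60000, 55000, 62000, 58000]
})

grouped_gender = employees_df.groupby('Gender')['Salary'].sum()

# Split: iterating over a GroupBy yields (group key, sub-DataFrame) pairs
manual_totals = {}
for gender, group in employees_df.groupby('Gender'):
    # Apply: sum the Salary column of the rows in this group
    manual_totals[gender] = group['Salary'].sum()

# Combine: the hand-built totals match the one-line groupby result
print(manual_totals == grouped_gender.to_dict())  # True
```

In practice the one-line `groupby(...).sum()` form is preferable; the loop is only there to show what each group contains.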

## Grouping by Department and Calculating the Mean Salary

We will group the data by the 'Department' column and calculate the mean salary.

In [10]:
# Group by 'Department' and calculate the mean 'Salary'
grouped_department = employees_df.groupby('Department')['Salary'].mean()

# Display the grouped data
grouped_department

Department
Finance    61000.000000
IT         54333.333333
Name: Salary, dtype: float64
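The mean of each group is its sum divided by its count. The sketch below, which again rebuilds `employees_df`, checks that relationship. It relies on pandas aligning the two Series on their shared `Department` index when dividing. The intermediate names (`sums`, `counts`, `manual_means`) are illustrative only.

```python
import numpy as np
import pandas as pd

employees_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude', 'Jane', 'Jesse'],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Female'],
    'Department': ['IT', 'Finance', 'IT', 'Finance', 'IT'],
    'Location': ['Nairobi', 'Nakuru', 'Nairobi', 'Nakuru', 'Mombasa'],
    'Salary': [50000, 60000, 55000, 62000, 58000]
})

salary_by_dept = employees_df.groupby('Department')['Salary']
sums = salary_by_dept.sum()
counts = salary_by_dept.count()
means = salary_by_dept.mean()

# Division aligns on the Department index, so each group's sum is divided by its own count
manual_means = sums / counts
print(np.allclose(manual_means, means))  # True

# Rounding only changes the display, e.g. IT becomes 54333.33
print(means.round(2))
```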

## Using Multiple Aggregation Functions

The `agg` method applies several aggregation functions to each group in a single call. Here it computes the sum, mean, and count of the 'Salary' column for each department.

In [11]:
# Group by 'Department' and calculate multiple statistics for 'Salary'
grouped_department_statistics = employees_df.groupby('Department')['Salary'].agg(['sum', 'mean', 'count'])

# Display the grouped data with multiple statistics
grouped_department_statistics

| Department | sum    | mean         | count |
|------------|--------|--------------|-------|
| Finance    | 122000 | 61000.0      | 2     |
| IT         | 163000 | 54333.333333 | 3     |
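When the default column names `sum`, `mean`, and `count` are not descriptive enough, pandas also supports "named aggregation" on a `DataFrameGroupBy`: each keyword becomes an output column and maps to a `(column, function)` tuple. A minimal sketch follows; the output names `total_salary`, `average_salary`, and `headcount` are arbitrary choices for illustration.

```python
import pandas as pd

employees_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude', 'Jane', 'Jesse'],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Female'],
    'Department': ['IT', 'Finance', 'IT', 'Finance', 'IT'],
    'Location': ['Nairobi', 'Nakuru', 'Nairobi', 'Nakuru', 'Mombasa'],
    'Salary': [50000, 60000, 55000, 62000, 58000]
})

# Each keyword is an output column: (input column, aggregation function)
stats = employees_df.groupby('Department').agg(
    total_salary=('Salary', 'sum'),
    average_salary=('Salary', 'mean'),
    headcount=('Name', 'count'),  # different input columns can be mixed
)
print(stats)
```

Unlike passing a list of function names, this form yields flat, self-describing column names and lets each statistic draw on a different input column.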


## Grouping by Multiple Columns

We will group the data by both 'Department' and 'Location' columns and calculate the mean salary.

In [6]:
# Grouping by multiple columns and calculating the mean salary
grouped_multiple = employees_df.groupby(['Department', 'Location'])['Salary'].mean()

# Display the grouped data
grouped_multiple

Department  Location
Finance     Nakuru      61000.0
IT          Mombasa     58000.0
            Nairobi     52500.0
Name: Salary, dtype: float64
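Grouping by two columns produces a Series with a two-level `MultiIndex`. The sketch below, which rebuilds `employees_df`, shows three common ways to work with that result: selecting one combination with a tuple key, pivoting the inner level into columns with `unstack`, and flattening the levels back into columns with `reset_index`. The names `salary_table` and `flat` are illustrative.

```python
import pandas as pd

employees_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude', 'Jane', 'Jesse'],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Female'],
    'Department': ['IT', 'Finance', 'IT', 'Finance', 'IT'],
    'Location': ['Nairobi', 'Nakuru', 'Nairobi', 'Nakuru', 'Mombasa'],
    'Salary': [50000, 60000, 55000, 62000, 58000]
})

grouped_multiple = employees_df.groupby(['Department', 'Location'])['Salary'].mean()

# Select a single (Department, Location) combination with a tuple key
print(grouped_multiple.loc[('IT', 'Nairobi')])  # 52500.0

# unstack() moves the 'Location' level into columns; missing combinations become NaN
salary_table = grouped_multiple.unstack('Location')
print(salary_table)

# reset_index() turns both index levels back into ordinary columns
flat = grouped_multiple.reset_index()
print(flat)
```

The `unstack` view is convenient for side-by-side comparison, while the `reset_index` form is easier to merge with other tables or export.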