# Pandas Aggregation Tutorial

In this tutorial, we will cover several aspects of data aggregation with the pandas library for Python. We will go through the following topics:

1. Creating a Sample Dataset
2. Mean, Median, Max, Sum
3. Conditional Selection of Columns
4. Getting the Number of Unique Values
5. Boolean Masks for Selection
6. Groupby Commands

Let's get started!


## 1. Creating a Sample Dataset

First, let's create a sample dataset that we'll use throughout this tutorial.
We'll create a DataFrame with columns `Name`, `Age`, `Salary`, and `Department`.
Then, we'll display the dataset.

In [1]:
import pandas as pd
# Creating the sample dataset
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 25, 30, 35],
    'Salary': [50000, 60000, 70000, 55000, 65000, 75000],
    'Department': ['HR', 'Finance', 'IT', 'HR', 'Finance', 'IT']
}

# Creating DataFrame
df = pd.DataFrame(data)

# Displaying the dataset
df

|   | Name | Age | Salary | Department |
|---|---|---|---|---|
| 0 | Alice | 25 | 50000 | HR |
| 1 | Bob | 30 | 60000 | Finance |
| 2 | Charlie | 35 | 70000 | IT |
| 3 | Alice | 25 | 55000 | HR |
| 4 | Bob | 30 | 65000 | Finance |
| 5 | Charlie | 35 | 75000 | IT |
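
Before aggregating, it can help to confirm the size of the DataFrame and the type of each column. The following is a minimal sketch that rebuilds the same sample data so it runs on its own; the names `shape` and `dtypes` are illustrative only.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 25, 30, 35],
    'Salary': [50000, 60000, 70000, 55000, 65000, 75000],
    'Department': ['HR', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
})

shape = df.shape    # (number of rows, number of columns)
dtypes = df.dtypes  # one dtype per column

print(shape)
print(dtypes)
```

Numeric columns such as `Age` and `Salary` should come back as integer types, which is what makes `mean()`, `sum()`, and similar methods applicable to them.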


## 2. Mean, Median, Max, Sum

Let's calculate the mean, max, and sum of the `Salary` column, along with the median of the `Age` column.

In [2]:
# Calculating the mean, max, and sum of Salary, and the median of Age
mean_salary = df['Salary'].mean()
median_age = df['Age'].median()
max_salary = df['Salary'].max()
sum_salaries = df['Salary'].sum()

# Displaying the results
mean_salary, median_age, max_salary, sum_salaries

(np.float64(62500.0), np.float64(30.0), np.int64(75000), np.int64(375000))
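
Several statistics for one column can also be requested in a single call with `Series.agg`. This is a minimal sketch on the same sample data, rebuilt here so the snippet is self-contained; the name `salary_stats` is illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 25, 30, 35],
    'Salary': [50000, 60000, 70000, 55000, 65000, 75000],
    'Department': ['HR', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
})

# Passing a list of function names returns a Series indexed by those names
salary_stats = df['Salary'].agg(['mean', 'median', 'max', 'sum'])

print(salary_stats)
```

Compared with calling each method separately, this keeps the results together in one labeled object, so a value can be looked up by name, for example `salary_stats['max']`.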

## 3. Conditional Selection of Columns

We can select rows based on a condition. Let's select the rows where the salary is greater than 60000.

In [4]:
# Selecting people with salary greater than 60000
high_salary_df = df[df['Salary'] > 60000]

# Displaying the resulting DataFrame
high_salary_df



|   | Name | Age | Salary | Department |
|---|---|---|---|---|
| 2 | Charlie | 35 | 70000 | IT |
| 4 | Bob | 30 | 65000 | Finance |
| 5 | Charlie | 35 | 75000 | IT |
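
Conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses, because these operators bind more tightly than comparisons such as `>`. The sketch below rebuilds the sample data so it runs on its own; the names `high_salary_it` and `high_salary_or_hr` are illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 25, 30, 35],
    'Salary': [50000, 60000, 70000, 55000, 65000, 75000],
    'Department': ['HR', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
})

# Rows that satisfy both conditions
high_salary_it = df[(df['Salary'] > 60000) & (df['Department'] == 'IT')]

# Rows that satisfy at least one condition
high_salary_or_hr = df[(df['Salary'] > 60000) | (df['Department'] == 'HR')]

print(high_salary_it)
print(high_salary_or_hr)
```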


## 4. Getting the Number of Unique Values

We can get the number of unique values in a column with `nunique()`, which ignores missing values by default. Let's find the number of unique names.

In [5]:
# Getting the number of unique names
unique_names_count = df['Name'].nunique()

# Displaying the result
unique_names_count


3
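
When the distinct values themselves are needed, or how often each one occurs, `unique()` and `value_counts()` are the usual companions to `nunique()`. This is a minimal sketch on the same sample data, rebuilt here so it is self-contained; the names `unique_names` and `name_counts` are illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 25, 30, 35],
    'Salary': [50000, 60000, 70000, 55000, 65000, 75000],
    'Department': ['HR', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
})

# Distinct values, in order of first appearance
unique_names = df['Name'].unique()

# Number of rows per distinct value
name_counts = df['Name'].value_counts()

print(unique_names)
print(name_counts)
```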

## 5. Boolean Masks for Selection

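A boolean mask is a Series of `True`/`False` values, one per row; indexing the DataFrame with it keeps only the rows marked `True`. This is the same mechanism as in section 3, with the mask stored in a variable so it can be reused. Let's select the rows where the age is greater than 30.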

In [6]:
# Selecting people with age greater than 30
mask = df['Age'] > 30
older_people_df = df[mask]

# Displaying the resulting DataFrame
older_people_df

|   | Name | Age | Salary | Department |
|---|---|---|---|---|
| 2 | Charlie | 35 | 70000 | IT |
| 5 | Charlie | 35 | 75000 | IT |
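
Once a mask is stored in a variable, it can be inverted with `~` or passed to `.loc` together with a list of columns. Membership tests with `isin` also produce a mask. The sketch below rebuilds the sample data so it runs on its own; the names `not_older_df`, `older_name_salary`, and `finance_or_hr_df` are illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 25, 30, 35],
    'Salary': [50000, 60000, 70000, 55000, 65000, 75000],
    'Department': ['HR', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
})

mask = df['Age'] > 30

# ~ flips the mask: rows where age is NOT greater than 30
not_older_df = df[~mask]

# .loc accepts a mask for the rows and a list of labels for the columns
older_name_salary = df.loc[mask, ['Name', 'Salary']]

# isin builds a mask from a list of allowed values
finance_or_hr_df = df[df['Department'].isin(['Finance', 'HR'])]

print(not_older_df)
print(older_name_salary)
print(finance_or_hr_df)
```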


## 6. Groupby Commands

We can group rows that share a value using the `groupby` method. Let's group by `Department` and calculate the mean salary for each department.

In [7]:
# Grouping by Department and calculating mean salary
department_salary_mean = df.groupby('Department')['Salary'].mean()

# Displaying the result
department_salary_mean

Department
Finance    62500.0
HR         52500.0
IT         72500.0
Name: Salary, dtype: float64
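
To compute several statistics per group at once, `groupby` can be combined with `agg` and named aggregation, where each keyword names an output column and maps to a `(column, function)` pair. This is a minimal sketch on the same sample data, rebuilt here so it is self-contained; the output column names `mean_salary`, `max_salary`, and `row_count` are illustrative choices.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 25, 30, 35],
    'Salary': [50000, 60000, 70000, 55000, 65000, 75000],
    'Department': ['HR', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
})

# One row per department, one column per named aggregation
department_summary = df.groupby('Department').agg(
    mean_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    row_count=('Name', 'count'),  # counts rows, not distinct people
)

print(department_summary)
```

The result is a DataFrame indexed by `Department`, so a single value can be read with `department_summary.loc['IT', 'mean_salary']`.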