# **6. Aggregation & Grouping**

## **5. Pivot Tables & Cross Tabulations**

In [1]:
import pandas as pd

## ✅ 1. What it does and **when** to use it

Pivot tables and cross tabulations **summarize**, **reshape**, and **aggregate** large datasets.

They’re essential when:

* You need a **matrix-style summary** of your data
* You want to view the relationship between two or more categorical features
* You want to **aggregate a value column** based on multiple row/column labels

| Feature         | Use case                                                  |
| --------------- | --------------------------------------------------------- |
| `pivot_table()` | Multi-dimensional aggregation (like Excel pivot tables)   |
| `pivot()`       | Simple reshaping when data is already uniquely structured |
| `crosstab()`    | Frequency/contingency tables for categorical variables    |


## ✅ 2. Syntax and Core Parameters

### 🔹 `pivot_table()`

```python
pd.pivot_table(
    data, 
    values=None, 
    index=None, 
    columns=None, 
    aggfunc='mean', 
    fill_value=None, 
    margins=False, 
    dropna=True
)
```

| Parameter    | Description                                               |
| ------------ | --------------------------------------------------------- |
| `data`       | DataFrame                                                 |
| `values`     | Column(s) to aggregate                                    |
| `index`      | Row labels                                                |
| `columns`    | Column labels                                             |
| `aggfunc`    | Aggregation function (`'mean'`, `'sum'`, `'count'`, etc.) |
| `fill_value` | Value that replaces missing cells in the result           |
| `margins`    | Add row and column totals (labeled `All`)                 |
| `dropna`     | Exclude columns whose entries are all `NaN`               |

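As a minimal sketch of how `aggfunc` and `fill_value` interact, the snippet below uses a small hypothetical `orders` frame (an assumption for illustration, not one of this section's datasets) in which the North/A pair occurs twice. With the default `aggfunc='mean'` the duplicates are averaged, and combinations with no rows come back as NaN unless `fill_value` is set.

```python
import pandas as pd

# Hypothetical data: North/A appears twice, South/A and North/B never appear.
orders = pd.DataFrame({
    'Region': ['North', 'North', 'South'],
    'Product': ['A', 'A', 'B'],
    'Sales': [100, 300, 50],
})

# Default aggfunc is 'mean': the two North/A rows are averaged.
default_table = pd.pivot_table(
    orders, values='Sales', index='Region', columns='Product'
)

# Summing instead, and replacing empty combinations with 0.
filled_table = pd.pivot_table(
    orders, values='Sales', index='Region', columns='Product',
    aggfunc='sum', fill_value=0
)

print(default_table)
print(filled_table)
```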
---

### 🔹 `pivot()`

```python
df.pivot(index='row_col', columns='col_col', values='value_col')
```

⚠️ Requires **unique combinations** of index and columns; raises a `ValueError` if any index/column pair appears more than once.

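The uniqueness requirement can be seen in a short sketch. The `dup` frame below is hypothetical: its North/A pair appears twice, so `pivot()` cannot decide which value belongs in that cell, while `pivot_table()` resolves the clash by aggregating.

```python
import pandas as pd

# Hypothetical data with a duplicated Region-Product pair (North/A).
dup = pd.DataFrame({
    'Region': ['North', 'North', 'South'],
    'Product': ['A', 'A', 'A'],
    'Sales': [250, 100, 150],
})

try:
    dup.pivot(index='Region', columns='Product', values='Sales')
    pivot_failed = False
except ValueError as exc:
    # pandas refuses to reshape when index/column pairs are not unique.
    pivot_failed = True
    print(f"pivot() failed: {exc}")

# pivot_table() aggregates the duplicates instead of failing.
fallback = pd.pivot_table(
    dup, values='Sales', index='Region', columns='Product', aggfunc='sum'
)
print(fallback)
```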
---

### 🔹 `crosstab()`

```python
pd.crosstab(index, columns, values=None, aggfunc=None, normalize=False)
```

| Parameter   | Description                               |
| ----------- | ----------------------------------------- |
| `index`     | Rows (categorical)                        |
| `columns`   | Columns (categorical)                     |
| `values`    | Optional numeric values to aggregate      |
| `aggfunc`   | Aggregation function if `values` given    |
| `normalize` | Convert counts to proportions: `'index'` (per row), `'columns'` (per column), `'all'` or `True` (overall) |

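To make the `normalize` options concrete, here is a small sketch on a hypothetical `survey` frame (the column names `Group` and `Answer` are assumptions for illustration). It compares raw counts with row-wise and overall proportions.

```python
import pandas as pd

# Hypothetical survey responses.
survey = pd.DataFrame({
    'Group': ['X', 'X', 'X', 'Y'],
    'Answer': ['yes', 'no', 'yes', 'no'],
})

# Raw frequency table.
counts = pd.crosstab(survey['Group'], survey['Answer'])

# Each row sums to 1: share of answers within each group.
row_share = pd.crosstab(survey['Group'], survey['Answer'], normalize='index')

# The whole table sums to 1: share of all responses.
overall = pd.crosstab(survey['Group'], survey['Answer'], normalize='all')

print(counts)
print(row_share)
print(overall)
```

Use `normalize='columns'` in the same way when each column should sum to 1.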

## ✅ 3. Different Methods and Techniques

| Technique                                  | Description                                    |
| ------------------------------------------ | ---------------------------------------------- |
| Basic pivot table                          | Aggregates one column by multiple rows/columns |
| Multi-index rows/columns                   | Hierarchical summarization                     |
| Using `fill_value`                         | Replace NaNs with defaults (e.g., 0)           |
| Adding `margins=True`                      | Adds an `All` row and column of totals         |
| Normalizing in `crosstab()`                | Converts raw counts into proportions           |
| Multi-function aggregation (via `aggfunc`) | Use list or dict for multiple summaries        |

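Two of the techniques in the table, hierarchical row labels and multi-function aggregation, can be sketched as follows. The `sales` frame and its `Quarter` column are hypothetical and only serve to give the index a second level.

```python
import pandas as pd

# Hypothetical data with an extra 'Quarter' column.
sales = pd.DataFrame({
    'Region': ['North', 'North', 'North', 'South', 'South', 'South'],
    'Quarter': ['Q1', 'Q1', 'Q2', 'Q1', 'Q2', 'Q2'],
    'Product': ['A', 'B', 'A', 'A', 'A', 'B'],
    'Sales': [100, 200, 150, 80, 120, 60],
})

# Passing a list to `index` produces a MultiIndex on the rows.
by_region_quarter = pd.pivot_table(
    sales, values='Sales', index=['Region', 'Quarter'],
    columns='Product', aggfunc='sum', fill_value=0
)

# Passing a list to `aggfunc` produces one column group per function.
multi = pd.pivot_table(
    sales, values='Sales', index='Region', aggfunc=['sum', 'mean']
)

print(by_region_quarter)
print(multi)
```

A dict such as `aggfunc={'Sales': 'sum'}` works the same way when different value columns need different functions.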

## ✅ 4. Common Pitfalls & Best Practices

| Pitfall                                     | Tip / Best Practice                            |
| ------------------------------------------- | ---------------------------------------------- |
| Using `pivot()` when data has duplicates    | Use `pivot_table()` instead                    |
| Forgetting `fill_value`                     | Use `fill_value=0` (or `''` for display only) to avoid NaNs |
| Misinterpreting MultiIndex                  | Use `.reset_index()` or `.unstack()` if needed |
| Unaware of `margins=True`                   | Use it for subtotals like Excel pivot tables   |
| Using `pivot()` with ambiguous combinations | Use `pivot_table()` with `aggfunc` instead     |

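One way to handle the MultiIndex pitfall is to flatten the column labels before passing the table on. The sketch below is one possible approach, not the only one; the name `sales_df` and the `func_product` naming scheme are assumptions chosen for illustration.

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'North', 'South', 'East'],
    'Product': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Sales': [250, 150, 200, 400, 300, 350],
})

# Two aggregation functions give columns like ('sum', 'A'), ('count', 'B').
table = pd.pivot_table(
    sales_df, values='Sales', index='Region',
    columns='Product', aggfunc=['sum', 'count']
)

# Join each (function, product) tuple into a single flat label,
# then turn the 'Region' index back into an ordinary column.
flat = table.copy()
flat.columns = [f"{func}_{product}" for func, product in flat.columns]
flat = flat.reset_index()

print(flat)
```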

## ✅ 5. Examples on Real/Pseudo Data

In [2]:
df = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'North', 'South', 'East'],
    'Product': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Sales': [250, 150, 200, 400, 300, 350]
})
df

|     | Region | Product | Sales |
| --- | ------ | ------- | ----- |
| 0   | North  | A       | 250   |
| 1   | South  | A       | 150   |
| 2   | East   | A       | 200   |
| 3   | North  | B       | 400   |
| 4   | South  | B       | 300   |
| 5   | East   | B       | 350   |


In [3]:
### 🔸 Example 1: Simple Pivot Table

pd.pivot_table(df, values='Sales', index='Region', columns='Product', aggfunc='sum')

| Region / Product | A   | B   |
| ---------------- | --- | --- |
| East             | 200 | 350 |
| North            | 250 | 400 |
| South            | 150 | 300 |


In [4]:
### 🔸 Example 2: Pivot Table with `fill_value` and `margins`
pd.pivot_table(
    df,
    values='Sales',
    index='Region',
    columns='Product',
    aggfunc='sum',
    fill_value=0,
    margins=True
)

| Region / Product | A   | B    | All  |
| ---------------- | --- | ---- | ---- |
| East             | 200 | 350  | 550  |
| North            | 250 | 400  | 650  |
| South            | 150 | 300  | 450  |
| All              | 600 | 1050 | 1650 |


In [5]:
### 🔸 Example 3: Using `crosstab()`
df

|     | Region | Product | Sales |
| --- | ------ | ------- | ----- |
| 0   | North  | A       | 250   |
| 1   | South  | A       | 150   |
| 2   | East   | A       | 200   |
| 3   | North  | B       | 400   |
| 4   | South  | B       | 300   |
| 5   | East   | B       | 350   |


In [8]:
pd.crosstab(df['Region'], df['Product'])

| Region / Product | A   | B   |
| ---------------- | --- | --- |
| East             | 1   | 1   |
| North            | 1   | 1   |
| South            | 1   | 1   |


**Counts how many rows (sales records) exist for each Region-Product pair.** Each pair appears exactly once in this dataset, so every cell is 1.

In [9]:
pd.crosstab(df['Region'], df['Sales'])

| Region / Sales | 150 | 200 | 250 | 300 | 350 | 400 |
| -------------- | --- | --- | --- | --- | --- | --- |
| East           | 0   | 1   | 0   | 0   | 1   | 0   |
| North          | 0   | 0   | 1   | 0   | 0   | 1   |
| South          | 1   | 0   | 0   | 1   | 0   | 0   |


In [10]:
### 🔸 Example 4: Crosstab with aggregation
pd.crosstab(
    df['Region'],
    df['Product'],
    values=df['Sales'],
    aggfunc='sum'
)

| Region / Product | A   | B   |
| ---------------- | --- | --- |
| East             | 200 | 350 |
| North            | 250 | 400 |
| South            | 150 | 300 |


In [12]:
### 🔸 Example 5: Pivot (Not Pivot Table)
# This works only if there’s one value per Region-Product pair
df.pivot(index='Region', columns='Product', values='Sales')

| Region / Product | A   | B   |
| ---------------- | --- | --- |
| East             | 200 | 350 |
| North            | 250 | 400 |
| South            | 150 | 300 |


## ✅ 6. Real World Use Cases

| Scenario                   | How Pivot Table / Crosstab Helps                     |
| -------------------------- | ---------------------------------------------------- |
| 🔸 **Sales Analysis**      | Summarize total sales by product, region, or quarter |
| 🔸 **Retail Inventory**    | Count how many SKUs per category per store           |
| 🔸 **Survey Results**      | Frequency tables of responses per demographic        |
| 🔸 **Student Performance** | Average marks per class per subject                  |
| 🔸 **E-commerce**          | Total revenue by category and customer segment       |
| 🔸 **Call Center**         | Count of issues per agent per department             |

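As an illustration of one scenario from the table, the student-performance case might look like the sketch below. The `marks` data, class names, and column names are all made up for the example.

```python
import pandas as pd

# Hypothetical marks for two classes and two subjects.
marks = pd.DataFrame({
    'Class': ['10A', '10A', '10A', '10A', '10B', '10B', '10B', '10B'],
    'Subject': ['Math', 'Math', 'Science', 'Science',
                'Math', 'Math', 'Science', 'Science'],
    'Marks': [80, 90, 70, 60, 50, 70, 90, 100],
})

# Average marks per class per subject, with overall averages in 'All'.
avg_marks = pd.pivot_table(
    marks, values='Marks', index='Class', columns='Subject',
    aggfunc='mean', margins=True
)

print(avg_marks)
```

With `aggfunc='mean'`, the `All` row and column hold averages over the underlying rows rather than sums.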

## ✅ Summary

| Concept         | Use When You Want To...                            |
| --------------- | -------------------------------------------------- |
| `pivot_table()` | Aggregate values with flexible multi-dim summaries |
| `pivot()`       | Reshape data (unique combinations only)            |
| `crosstab()`    | Build frequency or contingency tables              |

