# Pivot Tables

A **pivot table** is a powerful tool for summarizing and analyzing data. It allows you to quickly reorganize and condense large datasets by grouping data and applying aggregation functions such as **sum, mean, median, min, max, or count**.

When you aggregate data with a pivot table, you are essentially:

1. **Choosing rows and columns** → deciding how to categorize the data (e.g., by store type, product category, or date).
2. **Selecting values to summarize** → picking a numeric column (like sales, revenue, or quantities).
3. **Applying an aggregation function** → determining how the values should be combined (e.g., summing sales, averaging prices, finding the maximum revenue).
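As a minimal sketch, the three choices above map onto the arguments of pandas' `.pivot_table()`. The `toy` DataFrame and its numbers are made up for illustration and are not the course dataset.

```python
import pandas as pd

# Made-up toy data for illustration (not the course dataset)
toy = pd.DataFrame({
    "type": ["A", "A", "B", "B", "B"],
    "department": [1, 2, 1, 1, 2],
    "weekly_sales": [100.0, 50.0, 200.0, 300.0, 25.0],
})

summary = toy.pivot_table(
    index="type",            # step 1: categories that become rows
    columns="department",    # step 1: categories that become columns
    values="weekly_sales",   # step 2: numeric column to summarize
    aggfunc="sum",           # step 3: how each group's values are combined
)
print(summary)
```

Each cell of `summary` holds the sum of `weekly_sales` for one combination of store type and department.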

Pivot tables are especially useful because they:

* Allow multi-level grouping (e.g., by store type *and* by holiday status).
* Provide flexibility to compare across categories.
* Make it easier to spot patterns and trends in the data.
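Multi-level grouping can be sketched by passing a list of columns to `index`. The data below is a made-up toy example; only the column names mirror the ones used later in this notebook.

```python
import pandas as pd

# Made-up toy data for illustration (not the course dataset)
toy = pd.DataFrame({
    "type": ["A", "A", "A", "B", "B", "B"],
    "is_holiday": [False, False, True, False, True, True],
    "weekly_sales": [100.0, 200.0, 50.0, 300.0, 80.0, 120.0],
})

# A list in `index` groups by store type first, then by holiday status
by_type_holiday = toy.pivot_table(values="weekly_sales", index=["type", "is_holiday"])
print(by_type_holiday)
```

The result has a two-level row index, with one row per (type, holiday) combination that occurs in the data.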

In short, a pivot table **aggregates raw data into a compact summary table**, giving you insights at a glance without manually filtering or grouping.

## Preparing Data

In [1]:
import pandas as pd
sales = pd.read_csv("datasets/sales_subset.csv")

## Exercise: Pivoting on one variable

In spreadsheets, pivot tables are the go-to method for summarizing data.

In pandas, the `.pivot_table()` method serves a similar purpose. It works much like `.groupby()` but presents the result in a spreadsheet-like layout. In this exercise, you’ll create pivot tables that reproduce calculations you previously did with `.groupby()`.
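To make the relationship with `.groupby()` concrete, here is a small sketch on made-up data (the `toy` DataFrame is hypothetical). Both calls compute the mean of `weekly_sales` per store type.

```python
import pandas as pd

# Made-up toy data for illustration (not the course dataset)
toy = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "weekly_sales": [100.0, 200.0, 300.0, 200.0],
})

# pivot_table: aggfunc defaults to the mean
via_pivot = toy.pivot_table(values="weekly_sales", index="type")

# groupby: the double brackets keep the result as a DataFrame
via_groupby = toy.groupby("type")[["weekly_sales"]].mean()

print(via_pivot)
print(via_groupby)
```

The two printed tables should contain the same values; which form to use is largely a matter of readability.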

### Instructions

1. Create a pivot table that calculates the **mean** of `weekly_sales` for each store type, and save it as `mean_sales_by_type`.
2. Build another pivot table to compute both the **mean** and **median** of `weekly_sales` for each store type, storing it in `mean_med_sales_by_type`.
3. Make a pivot table that calculates the **mean** of `weekly_sales` grouped by both store type and holiday indicator. Save this as `mean_sales_by_type_holiday`.

In [2]:
# Step 1: Mean weekly sales by store type (aggfunc defaults to "mean")
mean_sales_by_type = sales.pivot_table(values="weekly_sales", index="type")
print(mean_sales_by_type)

      weekly_sales
type              
A     23674.667242
B     25696.678370


In [4]:
# Step 2: Mean and median weekly sales by store type
mean_med_sales_by_type = sales.pivot_table(values="weekly_sales", index="type", aggfunc=["mean", "median"])
print(mean_med_sales_by_type)

              mean       median
      weekly_sales weekly_sales
type                           
A     23674.667242     11943.92
B     25696.678370     13336.08
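Passing a list to `aggfunc` produces two-level column labels, as the output above shows. The following sketch, on made-up data, illustrates one way to select a single statistic from such a table.

```python
import pandas as pd

# Made-up toy data for illustration (not the course dataset)
toy = pd.DataFrame({
    "type": ["A", "A", "A", "B", "B"],
    "weekly_sales": [100.0, 200.0, 600.0, 50.0, 150.0],
})

mean_med = toy.pivot_table(values="weekly_sales", index="type", aggfunc=["mean", "median"])

# Columns are (statistic, value column) pairs
print(mean_med.columns.tolist())

# Select one statistic with a tuple, or a whole top-level group by name
print(mean_med[("mean", "weekly_sales")])
print(mean_med["median"])
```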


In [5]:
# Step 3: Mean weekly sales by store type and holiday flag
mean_sales_by_type_holiday = sales.pivot_table(values="weekly_sales", index="type", columns="is_holiday")
print(mean_sales_by_type_holiday)

is_holiday         False      True 
type                               
A           23768.583523  590.04525
B           25751.980533  810.70500
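The `columns` argument spreads the second grouping variable across the table instead of stacking it in the rows. As a rough sketch on made-up data, the same layout can be obtained by grouping on both variables and calling `.unstack()`.

```python
import pandas as pd

# Made-up toy data for illustration (not the course dataset)
toy = pd.DataFrame({
    "type": ["A", "A", "A", "B", "B", "B"],
    "is_holiday": [False, False, True, False, True, True],
    "weekly_sales": [100.0, 200.0, 50.0, 300.0, 80.0, 120.0],
})

# Store type in rows, holiday flag in columns
wide = toy.pivot_table(values="weekly_sales", index="type", columns="is_holiday")

# Same numbers: group by both keys, then move the last key into the columns
unstacked = toy.groupby(["type", "is_holiday"])["weekly_sales"].mean().unstack()

print(wide)
print(unstacked)
```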


## Exercise: Fill in missing values and sum values with pivot tables

The `.pivot_table()` method comes with extra options that make it even more powerful. Two of the most useful are:

* **`fill_value`** → replaces missing cells in the resulting table (group combinations that have no data) with a placeholder such as `0`. Filling in missing values is called *imputation*. Choosing a good replacement can be complex; a simple approach for numeric data is to fill gaps with `0`.
* **`margins`** → adds an `All` row and column that apply the same aggregation function across each row and column, saving you from doing extra calculations. With the default `mean`, these are overall means rather than sums.
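A small sketch of both options on made-up data, in which store type B has no rows for department 2, so that cell would otherwise be `NaN`:

```python
import pandas as pd

# Made-up toy data for illustration; type B has no department 2
toy = pd.DataFrame({
    "type": ["A", "A", "A", "B"],
    "department": [1, 1, 2, 1],
    "weekly_sales": [10.0, 30.0, 20.0, 40.0],
})

# Without fill_value, the (department 2, type B) cell is NaN
raw = toy.pivot_table(values="weekly_sales", index="department", columns="type")

# fill_value=0 replaces that empty cell
filled = toy.pivot_table(values="weekly_sales", index="department", columns="type",
                         fill_value=0)

# margins=True adds an "All" row and column holding means over the underlying rows
with_margins = toy.pivot_table(values="weekly_sales", index="department", columns="type",
                               fill_value=0, margins=True)
print(with_margins)
```

Note that the `All` values are computed from the original rows, not from the filled table, so the `0` placeholder does not pull the department 2 mean down.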

In this task, you’ll practice using both arguments to make pivot tables more complete and insightful.

### Instructions

1. Create a pivot table showing the **mean** of `weekly_sales` by department and store type. Replace any missing values with `0`.
2. Make the same pivot table, but this time also display the `All` row and column (overall means) using the `margins` option.

In [6]:
# Step 1: Mean weekly sales by department and type; replace missing values with 0
print(sales.pivot_table(values="weekly_sales", index="department", columns="type", fill_value=0))

type                    A              B
department                              
1            30961.725379   44050.626667
2            67600.158788  112958.526667
3            17160.002955   30580.655000
4            44285.399091   51219.654167
5            34821.011364   63236.875000
...                   ...            ...
95          123933.787121   77082.102500
96           21367.042857    9528.538333
97           28471.266970    5828.873333
98           12875.423182     217.428333
99             379.123659       0.000000

[80 rows x 2 columns]


In [7]:
# Step 2: Same pivot, but also include the "All" row and column (overall means)
print(sales.pivot_table(values="weekly_sales", index="department", columns="type", fill_value=0, margins=True))

type                   A              B           All
department                                           
1           30961.725379   44050.626667  32052.467153
2           67600.158788  112958.526667  71380.022778
3           17160.002955   30580.655000  18278.390625
4           44285.399091   51219.654167  44863.253681
5           34821.011364   63236.875000  37189.000000
...                  ...            ...           ...
96          21367.042857    9528.538333  20337.607681
97          28471.266970    5828.873333  26584.400833
98          12875.423182     217.428333  11820.590278
99            379.123659       0.000000    379.123659
All         23674.667242   25696.678370  23843.950149

[81 rows x 3 columns]
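In the output above, the `All` entries are means because the pivot table uses the default `mean` aggregation. If actual totals are wanted, one option is to pass `aggfunc="sum"`. The sketch below uses made-up data, and the label `"Total"` is an arbitrary choice set through the `margins_name` parameter.

```python
import pandas as pd

# Made-up toy data for illustration; type B has no department 2
toy = pd.DataFrame({
    "type": ["A", "A", "A", "B"],
    "department": [1, 1, 2, 1],
    "weekly_sales": [10.0, 30.0, 20.0, 40.0],
})

totals = toy.pivot_table(
    values="weekly_sales",
    index="department",
    columns="type",
    aggfunc="sum",          # sum instead of the default mean
    fill_value=0,           # empty combinations become 0
    margins=True,           # add a summary row and column
    margins_name="Total",   # rename the default "All" label
)
print(totals)
```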
