## [ Pivot Tables ]

#### 1. **Pivot Table is a Data Summarization Tool**
- **What it means**: A pivot table helps you **summarize and analyze** large datasets quickly.

#### 2. **Aggregates Data by One or More Keys**
- **Explanation**: It groups data using one or more columns (keys), and then **applies an aggregation** (like `sum()`, `mean()`, etc.).

#### 3. **Data Arranged in a Rectangular Table**
- **Explanation**: The result is a **grid (table)** with row headers and column headers from your grouping keys.

#### 4. **Pandas Uses `groupby` + Reshape (Hierarchical Indexing)**
- **Explanation**: Pivot tables work under the hood using:
  - `groupby()` to **group data**.
  - **Reshaping** methods (like `.unstack()` or `.stack()`) to build the table format.
- **Hierarchical indexing** allows for multiple group levels.
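
The relationship between `groupby()`, reshaping, and `pivot_table()` can be sketched on a tiny, made-up DataFrame (the data below is hypothetical and is not the tips dataset used later). It is only an illustration of the idea, not a description of pandas internals line by line.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration
df = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat", "Sat", "Fri"],
    "smoker": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Step 1: group by both keys and aggregate.
# The result is a Series with a hierarchical (MultiIndex) index.
grouped = df.groupby(["day", "smoker"])["total_bill"].mean()

# Step 2: reshape by moving the inner index level into the columns.
manual = grouped.unstack("smoker")

# The same table in a single pivot_table call
pivoted = df.pivot_table(
    index="day", columns="smoker", values="total_bill", aggfunc="mean"
)

print(manual)
print((manual.values == pivoted.values).all())
```

Both approaches should produce the same day-by-smoker grid of means; `pivot_table()` simply saves writing the two steps by hand.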

#### 5. **`pivot_table()` Method and Function**
- **Explanation**: 
  - You can use **`DataFrame.pivot_table()`** directly.
  - Or use **`pandas.pivot_table()`** as a top-level function.
  - Both allow **easy pivoting** without writing `groupby()` and `unstack()` manually.
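
As a quick check that the two spellings are interchangeable, the following sketch calls both on a small hypothetical DataFrame (the column names and values are made up for illustration).

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "day": ["Fri", "Sat", "Fri", "Sat"],
    "tip": [1.0, 2.0, 3.0, 4.0],
})

# Method form: the DataFrame is the object being called
via_method = df.pivot_table(index="day", values="tip", aggfunc="sum")

# Function form: the DataFrame is passed as the first argument
via_function = pd.pivot_table(df, index="day", values="tip", aggfunc="sum")

same = via_method.equals(via_function)
print(via_method)
print(same)
```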

#### 6. **Supports Aggregations and Margins (Partial Totals)**
- **Explanation**: 
  - You can specify aggregation functions like `sum`, `mean`, etc.
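  - Set `margins=True` to add an `All` row and column. These hold the chosen aggregation over each full row/column, so they are true totals only when the aggregation is `sum`.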
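
To make the behavior of the margins concrete, here is a minimal sketch on hypothetical data that builds the same table twice, once with `sum` and once with `mean`. The `All` entries follow whichever aggregation is in use.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat"],
    "smoker": ["No", "Yes", "No", "Yes"],
    "size": [2, 3, 4, 1],
})

# With sum, the All row/column contains totals
totals = df.pivot_table(
    index="day", columns="smoker", values="size", aggfunc="sum", margins=True
)

# With mean, the All row/column contains means over all rows in that tier
means = df.pivot_table(
    index="day", columns="smoker", values="size", aggfunc="mean", margins=True
)

print(totals)
print(means)
```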


In [4]:
import numpy as np 
import pandas as pd 

# Returning to the tipping dataset:
tips = pd.read_csv("examples/tips.csv")
# Suppose you want a table of group means (the default pivot_table aggregation type),
# arranged by day and smoker on the rows. First, inspect the data:
tips.head()

,total_bill,tip,smoker,day,time,size
0,16.99,1.01,No,Sun,Dinner,2
1,10.34,1.66,No,Sun,Dinner,3
2,21.01,3.5,No,Sun,Dinner,3
3,23.68,3.31,No,Sun,Dinner,2
4,24.59,3.61,No,Sun,Dinner,4


In [6]:
tips.pivot_table(index=["day", "smoker"], values="total_bill", aggfunc="mean")

,,total_bill
day,smoker,
Fri,No,18.42
Fri,Yes,16.813333
Sat,No,19.661778
Sat,Yes,21.276667
Sun,No,20.506667
Sun,Yes,24.12
Thur,No,17.113111
Thur,Yes,19.190588


In [8]:
# Now, suppose we want the average of only size, additionally grouping by time.
# I’ll put smoker in the table columns and time and day in the rows.

tips.pivot_table(index=["time", "day"], columns="smoker", values=["size"])

,,size,size
,smoker,No,Yes
time,day,,
Dinner,Fri,2.0,2.222222
Dinner,Sat,2.555556,2.47619
Dinner,Sun,2.929825,2.578947
Dinner,Thur,2.0,
Lunch,Fri,3.0,1.833333
Lunch,Thur,2.5,2.352941


In [9]:
# We can augment this table with partial totals by passing margins=True.
# This adds an "All" row and an "All" column whose values are the group statistics
# (here, means) for all the data within a single tier.

tips.pivot_table(index=["time", "day"], columns="smoker", values=["size"], margins=True)

,,size,size,size
,smoker,No,Yes,All
time,day,,,
Dinner,Fri,2.0,2.222222,2.166667
Dinner,Sat,2.555556,2.47619,2.517241
Dinner,Sun,2.929825,2.578947,2.842105
Dinner,Thur,2.0,,2.0
Lunch,Fri,3.0,1.833333,2.0
Lunch,Thur,2.5,2.352941,2.459016
All,,2.668874,2.408602,2.569672


In [10]:
# To use an aggregation function other than mean, pass it to the aggfunc keyword argument.
# Here len counts the rows in each group.
tips.pivot_table(index=["time", "smoker"], columns="day", values="total_bill", aggfunc=len, margins=True)

,day,Fri,Sat,Sun,Thur,All
time,smoker,,,,,
Dinner,No,3.0,45.0,57.0,1.0,106
Dinner,Yes,9.0,42.0,19.0,,70
Lunch,No,1.0,,,44.0,45
Lunch,Yes,6.0,,,17.0,23
All,,19.0,87.0,76.0,62.0,244


In [11]:
# If some combinations are empty (or otherwise NA), you may wish to pass a fill_value.
tips.pivot_table(index=["time", "smoker"], columns="day", values="total_bill", aggfunc=len, margins=True, fill_value=0)

,day,Fri,Sat,Sun,Thur,All
time,smoker,,,,,
Dinner,No,3,45,57,1,106
Dinner,Yes,9,42,19,0,70
Lunch,No,1,0,0,44,45
Lunch,Yes,6,0,0,17,23
All,,19,87,76,62,244



#### **`pandas.pivot_table()` Parameters:**

| **Parameter**     | **Description** |
|-------------------|------------------|
| `data`            | The DataFrame to operate on. |
| `values`          | Column(s) to aggregate. Can be a string or list. |
| `index`           | Column(s) to group by along the **rows**. |
| `columns`         | Column(s) to group by along the **columns**. |
| `aggfunc`         | Aggregation function(s), e.g., `'mean'`, `'sum'`, `np.mean`, list of functions. Default is `'mean'`. |
| `fill_value`      | Value to replace missing values in the output (e.g., `0`, `"-"`, etc.). |
| `margins`         | If `True`, add a row and a column of subtotals and a grand total, computed with `aggfunc`. Default is `False`. |
| `margins_name`    | Name for the totals row/column when `margins=True`. Default is `'All'`. |
| `dropna`          | If `True` (default), don’t include columns whose entries are all NaN. |
| `observed`        | For categorical groupers: if `True`, only show observed combinations. Default is `False`. |
| `sort` (v1.3+)    | If `True`, sorts the result by the group keys. Default is `True`. |
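
Several of these parameters can be combined in one call. The sketch below uses a small hypothetical DataFrame and the function form; the label `"Total"` passed to `margins_name` is just an illustrative choice.

```python
import pandas as pd

# Hypothetical toy data; note there is no ("Dinner", "Yes") combination
df = pd.DataFrame({
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"],
    "smoker": ["No", "Yes", "No", "No"],
    "total_bill": [10.0, 20.0, 30.0, 50.0],
})

table = pd.pivot_table(
    df,
    index="time",            # row keys
    columns="smoker",        # column keys
    values="total_bill",     # column to aggregate
    aggfunc="sum",           # aggregation applied to each group
    fill_value=0,            # shown instead of NaN for the empty combination
    margins=True,            # add the totals row and column
    margins_name="Total",    # label used instead of the default "All"
)

print(table)
```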


## [ Cross-Tabulations: Crosstab ]
A cross-tabulation (or crosstab for short) is a special case of a pivot table that computes group frequencies.
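
To illustrate the "special case" claim, the sketch below computes the same frequency table two ways on hypothetical data: with `pd.crosstab`, and with `pivot_table` summing a helper column of ones. The helper column name `n` is an assumption made for this example only.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Nationality": ["USA", "Japan", "USA", "Japan"],
    "Handedness": ["Right-handed", "Left-handed", "Right-handed", "Right-handed"],
})

# Frequencies via crosstab
counts = pd.crosstab(df["Nationality"], df["Handedness"])

# The same frequencies via pivot_table: sum a column of ones per group,
# filling combinations that never occur with 0
via_pivot = df.assign(n=1).pivot_table(
    index="Nationality", columns="Handedness", values="n",
    aggfunc="sum", fill_value=0,
)

print(counts)
print((counts.values == via_pivot.values).all())
```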

In [12]:
# example

from io import StringIO

data = """Sample Nationality Handedness
1 USA Right-handed
2 Japan Left-handed
3 USA Right-handed
4 Japan Right-handed
5 Japan Left-handed
6 Japan Right-handed
7 USA Right-handed
8 USA Left-handed
9 Japan Right-handed
10 USA Right-handed"""

# Use a raw string so "\s" is not treated as an invalid escape sequence
data = pd.read_table(StringIO(data), sep=r"\s+")
data


,Sample,Nationality,Handedness
0,1,USA,Right-handed
1,2,Japan,Left-handed
2,3,USA,Right-handed
3,4,Japan,Right-handed
4,5,Japan,Left-handed
5,6,Japan,Right-handed
6,7,USA,Right-handed
7,8,USA,Left-handed
8,9,Japan,Right-handed
9,10,USA,Right-handed


In [13]:
# As part of some survey analysis, we might want to summarize this data by nationality and handedness.
# You could use pivot_table to do this, but the pandas.crosstab function can be more convenient.

pd.crosstab(data["Nationality"], data["Handedness"], margins=True)

Handedness,Left-handed,Right-handed,All
Nationality,,,
Japan,2,3,5
USA,1,4,5
All,3,7,10


In [14]:
# The first two arguments to crosstab can each be an array, a Series, or a list of arrays.
pd.crosstab([tips["time"], tips["day"]], tips["smoker"], margins=True)

,smoker,No,Yes,All
time,day,,,
Dinner,Fri,3,9,12
Dinner,Sat,45,42,87
Dinner,Sun,57,19,76
Dinner,Thur,1,0,1
Lunch,Fri,1,6,7
Lunch,Thur,44,17,61
All,,151,93,244
