# Pivot Tables

## Introduction 

Many of us first met pivot tables in Excel. A pivot table reshapes and aggregates data: it groups rows by one or more key columns and summarizes the remaining columns, which makes patterns in the data much easier to spot.

![Example of a pivot table in Pandas](https://i2.wp.com/cmdlinetips.com/wp-content/uploads/2018/12/pivot_example_Pandas.jpg)

## The pivot_table function

In Pandas, we create a pivot table with the pivot_table function. Before using it, let's look at its main arguments one at a time.

### Index

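The index argument lists the columns we group by. Each unique value of these columns (or each unique combination, if we pass several) becomes one row of the pivot table.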

For example, recall our vehicles dataset. We can look at a pivot table where the index is vehicle class:

In [None]:
import pandas as pd

In [None]:
vehicles = pd.read_csv('vehicles.csv')

In [None]:
vehicles.head()

In [None]:
# Distinct cylinder counts that appear in the dataset
vehicles.Cylinders.unique()

In [None]:
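# Average every numeric column per vehicle class.
# Recent pandas versions raise an error when asked to average text columns,
# so we pass the numeric columns explicitly as values.
numeric_columns = vehicles.select_dtypes("number").columns.tolist()
vehicles.pivot_table(index=["Vehicle Class"], values=numeric_columns).head()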

In [None]:
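# Where do the numbers in the pivot table come from? Start with a boolean mask:
# True for each row whose vehicle class is 'Large Cars'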
vehicles["Vehicle Class"].isin(['Large Cars'])

In [None]:
# Create subset
large_cars = vehicles[vehicles["Vehicle Class"].isin(['Large Cars'])]

In [None]:
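# Compute the mean by hand: sum of City MPG divided by the number of rows
# (this matches .mean() as long as the column has no missing values)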
large_cars["City MPG"].sum() / len(large_cars["City MPG"])

In [None]:
large_cars["City MPG"].mean()
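The manual check above generalizes: with the default aggfunc, a pivot table indexed by one column holds the same numbers as grouping by that column and taking the mean. Here is a minimal sketch on a small made-up DataFrame; the rows and values are hypothetical and only mirror the column names of the vehicles data.

```python
import pandas as pd

# Hypothetical mini dataset standing in for vehicles.csv
cars = pd.DataFrame({
    "Vehicle Class": ["Large Cars", "Large Cars", "Compact Cars", "Compact Cars", "Compact Cars"],
    "City MPG": [18, 21, 25, 27, 30],
    "Highway MPG": [26, 29, 33, 35, 38],
})

# Pivot table with the default aggregation (the mean)
pivot = cars.pivot_table(index=["Vehicle Class"])

# The same summary computed with groupby
grouped = cars.groupby("Vehicle Class")[["City MPG", "Highway MPG"]].mean()

print(pivot)

# Raises an AssertionError if the two tables differ
pd.testing.assert_frame_equal(pivot, grouped)
```

Seen this way, a pivot table is a convenient front end to a group-by followed by an aggregation.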

### Columns
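The columns argument adds a second set of grouping keys: each unique value of the listed column(s) becomes a column of the pivot table, and each cell summarizes the rows that share that index value and that column value.

In the example above, we aggregated all numeric columns by vehicle class alone. Here we also split the result by number of cylinders, so each cell holds the mean for one vehicle class and one cylinder count. (The values argument, covered below, lets us restrict the aggregation to a subset of columns.)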

For example:

In [None]:
vehicles = pd.read_csv('vehicles.csv')

In [None]:
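# Cylinders is now a grouping key, so leave it out of the aggregated columns
value_columns = [col for col in vehicles.select_dtypes("number").columns if col != "Cylinders"]
vehicles.pivot_table(index=["Vehicle Class"], columns=["Cylinders"], values=value_columns).head()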

In [None]:
# Rows for compact cars only
compact_cars = vehicles[vehicles["Vehicle Class"].isin(['Compact Cars'])]

In [None]:
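# Cylinder counts that actually occur for compact cars; any cylinder column
# not listed here is NaN in the Compact Cars row of the pivot table
compact_cars.Cylinders.unique()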

Notice that many cells contain NaN. A NaN cell means that no rows in the data have that combination of values. For example, there are no 2-cylinder cargo vans, so we cannot compute a mean CO2 emissions value for that cell of the pivot table.
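To see exactly where NaN cells come from, here is a minimal sketch on a small made-up dataset (the rows are hypothetical). It also shows that the columns argument behaves like grouping by both keys and then moving the second key into the columns with unstack.

```python
import pandas as pd

# Hypothetical mini dataset: no van in it has 4 cylinders
cars = pd.DataFrame({
    "Vehicle Class": ["Compact Cars", "Compact Cars", "Vans", "Vans"],
    "Cylinders": [4, 6, 6, 8],
    "Combined MPG": [30.0, 24.0, 17.0, 14.0],
})

# Each unique Cylinders value becomes a column; each cell is a mean
pivot = cars.pivot_table(index="Vehicle Class", columns="Cylinders", values="Combined MPG")
print(pivot)

# No rows match (Vans, 4 cylinders), so there is nothing to average
print(pivot.loc["Vans", 4])  # nan

# Equivalent: group by both keys, then pivot Cylinders into the columns
by_hand = cars.groupby(["Vehicle Class", "Cylinders"])["Combined MPG"].mean().unstack()
pd.testing.assert_frame_equal(pivot, by_hand)
```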

### Aggregation Function
The default aggregation function is the mean, but we can choose a different one by setting the aggfunc argument of pivot_table. We can pass an existing function or define our own custom aggregation function.

In this example, we compute sums. We pass the string "sum"; passing numpy's np.sum also works, but recent pandas versions emit a FutureWarning for it and recommend the string form instead.

In [None]:
vehicles.pivot_table(index=["Vehicle Class"], values=["Combined MPG"], aggfunc="sum")

In [None]:
vehicles.pivot_table(index=["Vehicle Class"], values=["Combined MPG", "CO2 Emission Grams/Mile"], aggfunc="sum")

In [None]:
# Two index columns produce a hierarchical (MultiIndex) row index
vehicles.pivot_table(index=["Vehicle Class", "City MPG"], values=["Combined MPG", "CO2 Emission Grams/Mile"], aggfunc="sum")
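The aggfunc argument also accepts a custom function, or a list of functions. As a minimal sketch on a small made-up dataset (hypothetical rows), the function mpg_range below is our own illustrative helper, not part of pandas; it receives the values of one group as a Series and returns a single number. Passing a list produces one block of columns per function.

```python
import pandas as pd

# Hypothetical mini dataset
cars = pd.DataFrame({
    "Vehicle Class": ["Compact Cars", "Compact Cars", "Compact Cars", "Vans", "Vans"],
    "Combined MPG": [24.0, 30.0, 27.0, 14.0, 17.0],
})

# Hypothetical custom aggregation: spread between best and worst MPG in a group
def mpg_range(series):
    return series.max() - series.min()

# Built-in aggregations can be named by string; custom ones are passed as callables
summary = cars.pivot_table(
    index="Vehicle Class",
    values="Combined MPG",
    aggfunc=["mean", mpg_range],
)
print(summary)
```

The top level of the resulting column index is the function name ("mean" or "mpg_range"), so each statistic can be selected on its own, for example with summary["mpg_range"].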

### Values
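The values argument specifies which columns are aggregated. If we leave it out, pandas tries to aggregate every column that is not used as an index or columns key.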

Here is an example with combined MPG and CO2 emission grams per mile passed to the values argument.

In [None]:
vehicles.pivot_table(index=["Vehicle Class"], values=["Combined MPG", "CO2 Emission Grams/Mile"])
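To make the difference concrete, here is a minimal sketch on a small made-up dataset (hypothetical rows) with one column, Year, that we do not actually want to summarize. Without values it gets averaged along with everything else; with values it is left out.

```python
import pandas as pd

# Hypothetical mini dataset with more columns than we want to summarize
cars = pd.DataFrame({
    "Vehicle Class": ["Compact Cars", "Compact Cars", "Vans", "Vans"],
    "Year": [2010, 2014, 2011, 2015],
    "Combined MPG": [30.0, 24.0, 17.0, 14.0],
    "CO2 Emission Grams/Mile": [296.0, 370.0, 522.0, 635.0],
})

# Without values, every non-key column (including Year) is averaged
all_columns = cars.pivot_table(index="Vehicle Class").columns.tolist()
print(all_columns)

# With values, only the listed columns are aggregated
subset = cars.pivot_table(
    index="Vehicle Class",
    values=["Combined MPG", "CO2 Emission Grams/Mile"],
)
print(subset.columns.tolist())
```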

### Fill Value
When a combination of index and column values does not occur in the dataset, the corresponding cell of the pivot table is a missing value (NaN). The fill_value argument replaces these missing cells with a default of our choice.

In this example, we will fill the missing values with zero.

In [None]:
# Pivot table with missing combinations left as NaN
vehicles.pivot_table(index=["Vehicle Class"], columns=["Cylinders"], values=["Combined MPG"])

In [None]:
# Same pivot table with the missing cells filled with 0
vehicles.pivot_table(index=["Vehicle Class"], columns=["Cylinders"], values=["Combined MPG"], fill_value=0)
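Here is a minimal sketch on a small made-up dataset (hypothetical rows) showing that fill_value only touches the empty cells and leaves every computed mean unchanged.

```python
import pandas as pd

# Hypothetical mini dataset: no van has 4 cylinders, no compact car has 8
cars = pd.DataFrame({
    "Vehicle Class": ["Compact Cars", "Compact Cars", "Vans", "Vans"],
    "Cylinders": [4, 6, 6, 8],
    "Combined MPG": [30.0, 24.0, 17.0, 14.0],
})

without_fill = cars.pivot_table(index="Vehicle Class", columns="Cylinders", values="Combined MPG")
with_fill = cars.pivot_table(index="Vehicle Class", columns="Cylinders", values="Combined MPG", fill_value=0)

# Count the missing cells before and after filling
print(without_fill.isna().sum().sum())  # 2
print(with_fill.isna().sum().sum())  # 0
```

Keep in mind that a filled 0 is not a real measurement: a Combined MPG of 0 only marks an empty combination, so filling is mainly a presentation choice.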

## Summary 
In this lesson, we learned how to create pivot tables in Pandas. We grouped rows with the index and columns arguments, chose which columns to aggregate with values, switched to other aggregation functions with aggfunc, and filled missing cells with fill_value.

Pivot tables are extremely useful for deriving insight into a dataset, which makes them a great tool to master.