Title: Pivot Tables 
Slug: pivot-table-python-pandas
Summary: Implementing a pivot table in Python using pandas with simple examples
Date: 2018-11-23 12:00  
Category: Python
Subcategory: Data Analysis in pandas
PostType: Tutorial
Keywords: pivot table pandas
Tags: pivot table, python, pandas
Authors: Dan Friedman

**Pivot tables** let us group rows by the unique values in one or more columns and compute an aggregate metric, such as a sum or mean, for each group. This data analysis technique is very popular in GUI spreadsheet applications and also works well in Python using pandas.

### Import Modules

In [28]:
import pandas as pd
import seaborn as sns
import numpy as np

### Example 1: Pivot Tables with Flights Dataset

#### Get Data

Let's get the `flights` dataset included in the `seaborn` library and assign it to the DataFrame `df_flights`.

In [14]:
df_flights = sns.load_dataset('flights')

Preview the first few rows of `df_flights`. 

Each row represents one month of flight history. The `passengers` column is the total number of passengers that flew that month.

In [15]:
df_flights.head()

index | year | month | passengers
--- | --- | --- | ---
0 | 1949 | January | 112
1 | 1949 | February | 118
2 | 1949 | March | 132
3 | 1949 | April | 129
4 | 1949 | May | 121


#### Implement Pivot Tables

I want to know the total number of passengers that flew each year. So, from pandas, we'll call the `pivot_table()` function and set the following arguments:

- `data` to be our DataFrame `df_flights`
- `index` to be `year` since that's the column from `df_flights` that we want to appear as a unique value in each row
- `values` as `passengers` since that's the column we want to apply some aggregate operation on
- `aggfunc` to `sum` since we want to sum (aka total) up all values in `passengers` that belong to a unique year

In [19]:
pd.pivot_table(data=df_flights, index='year', values='passengers', aggfunc='sum')

year
1949    1520
1950    1676
1951    2042
1952    2364
1953    2700
1954    2867
1955    3408
1956    3939
1957    4421
1958    4572
1959    5140
1960    5714
Name: passengers, dtype: int64

We can see above that the total number of passengers that flew increased every year.
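
Since a pivot table is essentially a group-by followed by an aggregation, the same yearly totals can also be produced with `groupby()`. Here's a minimal sketch of that equivalence. It assumes a small hand-built DataFrame, `df_sample`, as a stand-in for `df_flights` (the values are illustrative only) so that it runs without loading the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for df_flights; values are illustrative only
df_sample = pd.DataFrame({
    'year': [1949, 1949, 1949, 1950, 1950, 1950],
    'month': ['January', 'February', 'March', 'January', 'February', 'March'],
    'passengers': [112, 118, 132, 115, 126, 141],
})

# Pivot table: one row per unique year, summing passengers
pivot_totals = pd.pivot_table(data=df_sample, index='year', values='passengers', aggfunc='sum')

# Equivalent group-by: returns a Series instead of a one-column DataFrame
groupby_totals = df_sample.groupby('year')['passengers'].sum()

print(pivot_totals)
print(groupby_totals)
```

Both approaches give the same totals per year. The pivot table hands back a DataFrame, while the group-by on a single column hands back a Series.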

Now, I want to know the sum of passengers per month in the dataset. So, from pandas, we'll call the `pivot_table()` function and include all of the same arguments as above, except we'll set the `index` to be `month` since that's the column from `df_flights` that we want to appear as a unique value in each row.

In [22]:
pd.pivot_table(data=df_flights, index='month', values='passengers', aggfunc='sum')

month
January      2901
February     2820
March        3242
April        3205
May          3262
June         3740
July         4216
August       4213
September    3629
October      3199
November     2794
December     3142
Name: passengers, dtype: int64

Our results indicate most people flew in the summer months of July and August.
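
Rather than scanning the output by eye, we could rank the months by their totals. The sketch below is one way to do that with `sort_values()` and `idxmax()`. It assumes a small hand-built DataFrame, `df_sample`, with illustrative values instead of the real `flights` data.

```python
import pandas as pd

# Hypothetical stand-in for df_flights; values are illustrative only
df_sample = pd.DataFrame({
    'year': [1949, 1949, 1949, 1950, 1950, 1950],
    'month': ['June', 'July', 'August', 'June', 'July', 'August'],
    'passengers': [135, 148, 146, 149, 170, 168],
})

# Sum passengers for each unique month
monthly_totals = pd.pivot_table(data=df_sample, index='month', values='passengers', aggfunc='sum')

# Rank months from busiest to quietest
ranked = monthly_totals.sort_values(by='passengers', ascending=False)

# Label of the month with the largest total
busiest_month = monthly_totals['passengers'].idxmax()

print(ranked)
print(busiest_month)
```

Note that `month` holds plain strings in this sample, so the unsorted pivot table lists months alphabetically, not in calendar order.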

Now, I want to know the average number of passengers that flew per month in the dataset. So, from pandas, we'll call the `pivot_table()` function and include all of the same arguments from the previous operation, except we'll set the `aggfunc` to `mean` since we want to find the mean (aka average) number of passengers that flew in each unique month.

In [21]:
pd.pivot_table(data=df_flights, index='month', values='passengers', aggfunc='mean')

month
January      241.750000
February     235.000000
March        270.166667
April        267.083333
May          271.833333
June         311.666667
July         351.333333
August       351.083333
September    302.416667
October      266.583333
November     232.833333
December     261.833333
Name: passengers, dtype: float64
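
As a quick sanity check, a mean is just the sum divided by the count. In the outputs above, January's mean of 241.75 equals its total of 2901 divided by the 12 years in the dataset. The sketch below verifies the same relationship with pivot tables. It assumes a small hand-built DataFrame, `df_sample`, with illustrative values.

```python
import pandas as pd

# Hypothetical stand-in for df_flights; values are illustrative only
df_sample = pd.DataFrame({
    'month': ['January', 'January', 'January', 'February', 'February', 'February'],
    'passengers': [112, 115, 145, 118, 126, 150],
})

# Same index and values each time; only the aggregate operation changes
sums = pd.pivot_table(data=df_sample, index='month', values='passengers', aggfunc='sum')
counts = pd.pivot_table(data=df_sample, index='month', values='passengers', aggfunc='count')
means = pd.pivot_table(data=df_sample, index='month', values='passengers', aggfunc='mean')

# Recompute the mean by hand: sum divided by count, aligned on the month index
recomputed = sums['passengers'] / counts['passengers']

print(means)
print(recomputed)
```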

Now, I want to know the median number of passengers that flew per month in the dataset. So, from pandas, we'll call the `pivot_table()` function and include all of the same arguments from the previous operation, except we'll set the `aggfunc` to `median` since we want to find the median (aka middle value) number of passengers that flew in each unique month.

In [33]:
pd.pivot_table(data=df_flights, index='month', values='passengers', aggfunc='median')

month
January      223.0
February     214.5
March        251.5
April        252.0
May          252.0
June         289.5
July         333.0
August       320.0
September    285.5
October      251.5
November     220.0
December     253.5
Name: passengers, dtype: float64

### Aggregate Operations

Common values you can pass to the `aggfunc` argument, and the aggregate operation each one performs, are:

value | description 
--- | ---
`sum` | summation 
`mean` | average
`count` | count
`max` | maximum value
`min` | minimum value
`np.std` | standard deviation
`median` | median
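
`aggfunc` also accepts a list of these values, which computes several aggregates in one call. Here's a minimal sketch, assuming a small hand-built DataFrame, `df_sample`, with illustrative values instead of the real `flights` data.

```python
import pandas as pd

# Hypothetical stand-in for df_flights; values are illustrative only
df_sample = pd.DataFrame({
    'month': ['January', 'January', 'January', 'February', 'February', 'February'],
    'passengers': [112, 115, 145, 118, 126, 150],
})

# Passing a list to aggfunc yields one column per aggregate operation
summary = pd.pivot_table(
    data=df_sample,
    index='month',
    values='passengers',
    aggfunc=['min', 'max', 'median', 'count'],
)

print(summary)

# Columns are two-level: (aggregate name, value column)
january_max = summary.loc['January', ('max', 'passengers')]
print(january_max)
```

The result has two-level column labels, so a single value is selected with a tuple such as `('max', 'passengers')`.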