<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Slicing-Pivot-Table" data-toc-modified-id="Slicing-Pivot-Table-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Slicing Pivot-Table</a></span></li><li><span><a href="#The-axis-Argument" data-toc-modified-id="The-axis-Argument-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>The <code>axis</code> Argument</a></span></li></ul></div>

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load cars.csv into a DataFrame, using its first column as the index
cars = pd.read_csv('../datasets/cars.csv', index_col=0)
cars.head()

,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
IN,18,India,False
RU,200,Russia,True


## Slicing Pivot-Table

- **Pivot tables are just DataFrames with sorted indexes**
- This means every *slicing and subsetting* technique works on them, in particular label-based selection with `.loc[]`

In [3]:
# Build a pivot table of cars_per_cap by country and driving side
cars_pivot_table = cars.pivot_table(
    values="cars_per_cap",  # values to aggregate
    index="country",  # column to group by and display as pivot table rows
    columns=["drives_right"],  # column to group by and display as pivot table columns
    # Aggregate functions to calculate; string names avoid the FutureWarning
    # that recent pandas versions raise for builtins and NumPy callables
    aggfunc=["sum", "mean", "median", "min", "max"],
    fill_value=0,  # replace NaN with this value
    margins=True,  # add an 'All' row and an 'All' column
)
cars_pivot_table

,sum,sum,sum,mean,mean,mean,median,median,median,min,min,min,max,max,max
drives_right,False,True,All,False,True,All,False,True,All,False,True,All,False,True,All
country,,,,,,,,,,,,,,,
Australia,731,0,731,731.0,0,731.0,731,0,731,731,0,731,731,0,731
Egypt,0,45,45,0.0,45,45.0,0,45,45,0,45,45,0,45,45
India,18,0,18,18.0,0,18.0,18,0,18,18,0,18,18,0,18
Japan,588,0,588,588.0,0,588.0,588,0,588,588,0,588,588,0,588
Morocco,0,70,70,0.0,70,70.0,0,70,70,0,70,70,0,70,70
Russia,0,200,200,0.0,200,200.0,0,200,200,0,200,200,0,200,200
United States,0,809,809,0.0,809,809.0,0,809,809,0,809,809,0,809,809
All,1337,1124,2461,445.666667,281,351.571429,588,135,200,18,45,18,731,809,809


In [4]:
# Slice rows from India to Russia and columns from mean to median
# (label-based slices with .loc include both endpoints)
cars_pivot_table.loc["India":"Russia", "mean":"median"]

,mean,mean,mean,median,median,median
drives_right,False,True,All,False,True,All
country,,,,,,
India,18.0,0,18.0,18,0,18
Japan,588.0,0,588.0,588,0,588
Morocco,0.0,70,70.0,0,70,70
Russia,0.0,200,200.0,0,200,200
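
The slicing idea above can also be tried on a small standalone table. The sketch below uses made-up data (the `toy` frame and its `side` column are hypothetical and are not taken from `cars.csv`) and a single aggregate function so that the columns stay a flat index. It relies on `pivot_table` sorting the group labels by default, which is what makes the label slices predictable.

```python
import pandas as pd

# Hypothetical toy data, not the contents of cars.csv
toy = pd.DataFrame({
    "country": ["Japan", "India", "Russia", "Egypt", "Japan", "India"],
    "side": ["left", "left", "right", "right", "left", "left"],
    "cars_per_cap": [588, 18, 200, 45, 600, 20],
})

# A single aggregate function keeps the column index flat
toy_pivot = toy.pivot_table(
    values="cars_per_cap",
    index="country",
    columns="side",
    aggfunc="mean",
    fill_value=0,
)

# Row and column labels come out sorted, so label slices are well defined
print(toy_pivot.index.is_monotonic_increasing)

# Label-based slices include both endpoints on both axes
subset = toy_pivot.loc["India":"Japan", "left":"right"]
print(subset)
```

Because `.loc` slices by label rather than by position, both `"India"` and `"Japan"` appear in the result, unlike a positional slice where the end is excluded.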


## The `axis` Argument

- Pivot tables are filled with summary statistics, but they are only a first step toward finding something insightful
  - Often you'll need to perform further calculations on them
  - A common task is to find the rows or columns where the highest or lowest value occurs
- The methods that calculate statistics on DataFrames take an `axis` argument
  - `axis = 'index'`: calculate the statistic down the rows (vertical direction), giving one result per column (default)
  - `axis = 'columns'`: calculate the statistic across the columns (horizontal direction), giving one result per row
- For most DataFrames, `axis = 'columns'` rarely makes sense because each column tends to hold a different data type
  - Pivot tables are a special case, since every column contains the same data type
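
To make the two directions concrete, here is a minimal sketch on a hypothetical all-numeric table (the `toy` frame and its values are invented for illustration). It shows that `axis="index"` returns one value per column, while `axis="columns"` returns one value per row.

```python
import pandas as pd

# Hypothetical all-numeric table standing in for a pivot table
toy = pd.DataFrame(
    {"left": [10, 20, 30], "right": [1, 2, 3]},
    index=["Australia", "India", "Japan"],
)

# axis="index" (the default) collapses the rows: one result per column
per_column = toy.mean(axis="index")

# axis="columns" collapses the columns: one result per row
per_row = toy.mean(axis="columns")

print(per_column)
print(per_row)
```

The result of `axis="index"` is indexed by the column labels, and the result of `axis="columns"` is indexed by the row labels, which is a quick way to check which direction was used.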

In [5]:
# Largest value in each row, taken across every column of the pivot table
# (the sum, mean, median, min and max columns, including the 'All' margins)
max_cars_per_cap_by_country = cars_pivot_table.max(axis="columns")
max_cars_per_cap_by_country

country
Australia         731.0
Egypt              45.0
India              18.0
Japan             588.0
Morocco            70.0
Russia            200.0
United States     809.0
All              2461.0
dtype: float64

In [6]:
# Country whose row-wise maximum is the lowest
max_cars_per_cap_by_country[max_cars_per_cap_by_country == max_cars_per_cap_by_country.min()]

country
India    18.0
dtype: float64
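
When only the label of the extreme value is needed, `Series.idxmin()` and `Series.idxmax()` are a shorter alternative to the boolean mask. The sketch below uses a small hypothetical Series (the name `per_country` and the choice to drop the `All` row are assumptions made for illustration), with an `All` entry mimicking the margins row.

```python
import pandas as pd

# Hypothetical per-country values; "All" mimics the margins row
per_country = pd.Series(
    {"Australia": 731.0, "India": 18.0, "Japan": 588.0, "All": 1337.0}
)

# Drop the margins row so the total does not compete with real countries
countries_only = per_country.drop("All")

# idxmin / idxmax return the index label of the smallest / largest value
lowest_label = countries_only.idxmin()
highest_label = countries_only.idxmax()
print(lowest_label, highest_label)
```

Note that `idxmin()` returns only the first matching label if several rows tie, whereas the boolean mask keeps every tied row.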