# Indexing and Selection

Documentation sources:

* https://www.tutorialspoint.com/python_pandas/

More advanced topics are discussed in the following sources:

* https://tomaugspurger.github.io
* https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-39e811c81a0c
* https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-part-3-d5704b4b9116
* https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-part-4-c4216f84d388
* https://nikgrozev.com/2015/07/01/reshaping-in-pandas-pivot-pivot-table-stack-and-unstack-explained-with-pictures/

In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas import read_csv
from pandas import pivot_table

### Basic properties of DataFrame

A DataFrame is characterised by the following attributes and helper methods:

* `ndim`    – number of dimensions
* `shape`   – dimensions of the data matrix
* `size`    – number of elements in the data matrix
* `axes`    – list containing the row index and the column index
* `index`   – row labels, same as `axes[0]`
* `columns` – column labels, same as `axes[1]`
* `head()`  – method returning the first few rows of the data matrix
* `tail()`  – method returning the last few rows of the data matrix
* `T`       – transpose of the data matrix
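
The difference between attributes and methods in the list above can be checked on a small frame. This is only a sketch: the `toy` DataFrame and its values are made up for illustration and do not come from `realwage.csv`.

```python
import pandas as pd

# Hypothetical toy data; the column names only mimic realwage.csv.
toy = pd.DataFrame(
    {"Country": ["Estonia", "Latvia", "Ireland"],
     "value": [6383.9, 5404.9, 17747.4]}
)

# Attributes are read without parentheses.
print(toy.ndim, toy.shape, toy.size)
print(toy.axes[0].equals(toy.index))    # axes[0] is the row index
print(toy.axes[1].equals(toy.columns))  # axes[1] is the column index

# head and tail are methods and accept the number of rows to return.
first_two = toy.head(2)
last_one = toy.tail(1)

# T is an attribute that returns the transposed frame.
transposed = toy.T
print(first_two, last_one, transposed.shape, sep="\n")
```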

In [2]:
df = read_csv('realwage.csv', index_col = 0)
print(df.ndim, ':', df.shape, ':', df.size, '\n')
print(df.axes, '\n')
print(df.index)
print(df.columns)
display(df.head())
display(df.tail())

2 : (1408, 5) : 7040 

[Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
            ...
            1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405, 1406, 1407],
           dtype='int64', length=1408), Index(['Time', 'Country', 'Series', 'Pay period', 'value'], dtype='object')] 

Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
            ...
            1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405, 1406, 1407],
           dtype='int64', length=1408)
Index(['Time', 'Country', 'Series', 'Pay period', 'value'], dtype='object')


Unnamed: 0,Time,Country,Series,Pay period,value
0,2006-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,17132.443
1,2007-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,18100.918
2,2008-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,17747.406
3,2009-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,18580.139
4,2010-01-01,Ireland,In 2015 constant prices at 2015 USD PPPs,Annual,18755.832


Unnamed: 0,Time,Country,Series,Pay period,value
1403,2012-01-01,Costa Rica,In 2015 constant prices at 2015 USD exchange r...,Hourly,
1404,2013-01-01,Costa Rica,In 2015 constant prices at 2015 USD exchange r...,Hourly,
1405,2014-01-01,Costa Rica,In 2015 constant prices at 2015 USD exchange r...,Hourly,2.41
1406,2015-01-01,Costa Rica,In 2015 constant prices at 2015 USD exchange r...,Hourly,2.56
1407,2016-01-01,Costa Rica,In 2015 constant prices at 2015 USD exchange r...,Hourly,2.63


## I. Simple indices

### Comparison magic for selecting

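* **Do not use chained (indirect) indexing in assignments**, as the assignment may modify a temporary copy instead of the original DataFrame.
* Use simple comparison operators like `df.value > 300` to create Boolean indices for selection.
* Use set operations to combine simple restrictions, but put every simple comparison in brackets, because `&`, `|` and `^` bind more tightly than comparison operators: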
 
  * `~a`     for complement 
  * `a & b`  for intersection
  * `a | b`  for union 
  * `a & ~b` for set difference 
  * `a ^ b`  for symmetric difference

* Available comparison operators are:
  
  * numeric comparison operators
  * string comparison operators  
  * `isin` method for checking the value against a list
  * regex search and match methods `str.contains` and `str.match`
  * datetime comparison operations and the `isin` method combined with `pd.date_range`
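
The bracketing rule can be illustrated with a short sketch. The `toy` frame below is hypothetical; the point is only that the unbracketed expression is parsed differently by Python and fails, while the bracketed masks combine as sets.

```python
import pandas as pd

# Hypothetical toy data, not taken from realwage.csv.
toy = pd.DataFrame({"value": [0.2, 450.0, 999.0, 25600.0]})

# Without brackets Python evaluates `300 & toy.value` first,
# which is not what was meant and raises an exception.
try:
    toy.loc[toy.value > 300 & toy.value < 1000, :]
    failed = False
except (TypeError, ValueError):
    failed = True

# With brackets each comparison yields a Boolean Series
# and the set operations work as expected.
mask = (toy.value > 300) & (toy.value < 1000)
inside = toy.loc[mask, :]    # intersection of the two restrictions
outside = toy.loc[~mask, :]  # complement of the intersection
print(failed, inside, outside, sep="\n")
```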

In [3]:
# Do not use this selection statement in an assignment as it uses chained indexing
display(df.loc[(df.value > 300) & (df.value < 1000), :].iloc[0:1,:])

# These are assignable selections
display(df.loc[(df.value > 300) & (df.value < 1000), :])
display(df.loc[(df.value > 300) & ~(df.value >= 1000), :])

display(df.loc[(df.value <= 0.5) | (df.value >= 25500), :])
display(df.loc[(df.value >  0.5) ^ (df.value <  25500), :])

display(df.loc[df.Country < 'B', :].head(4))
display(df.loc[df.Country == 'Estonia', :].head(4)) 
display(df.loc[df.Country.isin(['Estonia', 'Latvia']), :].head(4)) 
display(df.loc[df.Country.isin(['Estonia', 'Latvia']) & df.Time.str.match('^2010'), :]) 

# The index vector idx is defined for convenience
idx = pd.to_datetime(df.Time).isin(pd.date_range("2010-01-01", "2010-12-31"))
display(df.loc[idx & df.Country.isin(['Estonia', 'Latvia']), :])

Unnamed: 0,Time,Country,Series,Pay period,value
1210,2006-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Annual,568.23199


Unnamed: 0,Time,Country,Series,Pay period,value
1210,2006-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Annual,568.23199
1212,2008-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Annual,955.16498


Unnamed: 0,Time,Country,Series,Pay period,value
1210,2006-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Annual,568.23199
1212,2008-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Annual,955.16498


Unnamed: 0,Time,Country,Series,Pay period,value
120,2016-01-01,Australia,In 2015 constant prices at 2015 USD exchange r...,Annual,25643.729
206,2014-01-01,Luxembourg,In 2015 constant prices at 2015 USD exchange r...,Annual,25713.797
207,2015-01-01,Luxembourg,In 2015 constant prices at 2015 USD exchange r...,Annual,25592.293
1221,2006-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Hourly,0.234
1222,2007-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Hourly,0.448
1223,2008-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Hourly,0.393


Unnamed: 0,Time,Country,Series,Pay period,value
120,2016-01-01,Australia,In 2015 constant prices at 2015 USD exchange r...,Annual,25643.729
206,2014-01-01,Luxembourg,In 2015 constant prices at 2015 USD exchange r...,Annual,25713.797
207,2015-01-01,Luxembourg,In 2015 constant prices at 2015 USD exchange r...,Annual,25592.293
1221,2006-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Hourly,0.234
1222,2007-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Hourly,0.448
1223,2008-01-01,Russian Federation,In 2015 constant prices at 2015 USD exchange r...,Hourly,0.393


Unnamed: 0,Time,Country,Series,Pay period,value
88,2006-01-01,Australia,In 2015 constant prices at 2015 USD PPPs,Annual,20410.652
89,2007-01-01,Australia,In 2015 constant prices at 2015 USD PPPs,Annual,21087.568
90,2008-01-01,Australia,In 2015 constant prices at 2015 USD PPPs,Annual,20718.238
91,2009-01-01,Australia,In 2015 constant prices at 2015 USD PPPs,Annual,20984.768


Unnamed: 0,Time,Country,Series,Pay period,value
660,2006-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,5179.6499
661,2007-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,5830.6699
662,2008-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,6383.8848
663,2009-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,6388.894


Unnamed: 0,Time,Country,Series,Pay period,value
660,2006-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,5179.6499
661,2007-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,5830.6699
662,2008-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,6383.8848
663,2009-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,6388.894


Unnamed: 0,Time,Country,Series,Pay period,value
664,2010-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,6204.4941
675,2010-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Hourly,2.9748
686,2010-01-01,Estonia,In 2015 constant prices at 2015 USD exchange r...,Annual,4124.624
697,2010-01-01,Estonia,In 2015 constant prices at 2015 USD exchange r...,Hourly,1.978
1280,2010-01-01,Latvia,In 2015 constant prices at 2015 USD PPPs,Annual,5923.5762
1291,2010-01-01,Latvia,In 2015 constant prices at 2015 USD PPPs,Hourly,2.84007
1302,2010-01-01,Latvia,In 2015 constant prices at 2015 USD exchange r...,Annual,3714.6919
1313,2010-01-01,Latvia,In 2015 constant prices at 2015 USD exchange r...,Hourly,1.781


Unnamed: 0,Time,Country,Series,Pay period,value
664,2010-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,6204.4941
675,2010-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Hourly,2.9748
686,2010-01-01,Estonia,In 2015 constant prices at 2015 USD exchange r...,Annual,4124.624
697,2010-01-01,Estonia,In 2015 constant prices at 2015 USD exchange r...,Hourly,1.978
1280,2010-01-01,Latvia,In 2015 constant prices at 2015 USD PPPs,Annual,5923.5762
1291,2010-01-01,Latvia,In 2015 constant prices at 2015 USD PPPs,Hourly,2.84007
1302,2010-01-01,Latvia,In 2015 constant prices at 2015 USD exchange r...,Annual,3714.6919
1313,2010-01-01,Latvia,In 2015 constant prices at 2015 USD exchange r...,Hourly,1.781
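
The warning about chained indexing in assignments can be made concrete with a minimal sketch. The `toy` frame and the value `-1.0` are invented for illustration. The chained form writes into a temporary object, so the original frame stays untouched, whereas a single `.loc` call on the original frame changes it.

```python
import warnings
import pandas as pd

# Hypothetical toy data, not taken from realwage.csv.
toy = pd.DataFrame({"Country": ["Estonia", "Latvia", "Ireland"],
                    "value": [450.0, 999.0, 17747.4]})
mask = (toy.value > 300) & (toy.value < 1000)

with warnings.catch_warnings():
    # pandas may warn about the chained assignment; silence it for the demo.
    warnings.simplefilter("ignore")
    # Chained indexing: .loc with a Boolean mask returns a new object,
    # so the assignment changes only that temporary object.
    toy.loc[mask, :].iloc[0:1, 1] = -1.0
unchanged = toy.value.tolist()

# A single .loc call on the original frame: the assignment sticks.
first_label = toy.index[mask.to_numpy()][0]
toy.loc[first_label, "value"] = -1.0
changed = toy.value.tolist()
print(unchanged, changed)
```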


## II. Multidimensional row indices

### Multidimensional indexing and index slices

* Sometimes you want to slice the DataFrame based on several columns.
* For that, create a multi-index from a list of columns with `df.set_index(column_list)`.
* As a result, it is possible to slice data according to the columns in `column_list`.
* Top-level row indexing works as usual, but hierarchical selection is defined through tuples.
* The data **changes format** unless all indexers are slices or lists.
  * Outer index levels that are fixed to a single value are dropped from the result.
  * Still, the output format is quite unpredictable. **Validate your guess in practice!**
* Index slices `IndexSlice[...]` are handy if you want to leave some outer index levels unspecified.
* By default, a multi-index is not sorted, as sorting is quite costly.
* You should sort the index with `df.sort_index()` whenever you use slices!
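
The points above can be tried out on a small frame before working with the real data. This is a sketch under assumptions: the `toy` data, the integer `Year` level and all values are made up. It shows how fixing outer levels to single values drops them from the result, while an index slice keeps all levels.

```python
import pandas as pd

# Hypothetical toy data in deliberately unsorted order.
toy = pd.DataFrame({
    "Country":    ["Latvia", "Estonia", "Latvia", "Estonia",
                   "Estonia", "Latvia", "Estonia", "Latvia"],
    "Pay period": ["Hourly", "Annual", "Annual", "Hourly",
                   "Annual", "Annual", "Hourly", "Hourly"],
    "Year":       [2008, 2009, 2008, 2008, 2008, 2009, 2009, 2009],
    "value":      [2.6, 6388.9, 5404.9, 3.1, 6383.9, 5100.0, 3.0, 2.5],
})

# Build the multi-index and sort it so that slices work.
hdf = toy.set_index(["Country", "Pay period", "Year"]).sort_index()

# One fixed outer level: 'Country' is dropped from the result.
by_country = hdf.loc["Estonia", :]

# Two fixed outer levels given as a tuple: only 'Year' remains.
by_pair = hdf.loc[("Estonia", "Annual"), :]

# Index slice: the first level is left unspecified, all levels are kept.
annual_2008 = hdf.loc[pd.IndexSlice[:, "Annual", [2008]], :]
print(by_country, by_pair, annual_2008, sep="\n\n")
```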

In [4]:
hdf = (df
       .assign(Time = pd.to_datetime(df.Time))
       .set_index(['Country', 'Pay period', 'Time'])
       .sort_index())


display(hdf.head())
display(hdf.loc['Estonia', :].head())
display(hdf.loc[('Estonia', 'Annual'), :].head())
display(hdf.loc[('Estonia', 'Annual', pd.Timestamp('2008-01-01 00:00:00')), :].head())

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Series,value
Country,Pay period,Time,Unnamed: 3_level_1,Unnamed: 4_level_1
Australia,Annual,2006-01-01,In 2015 constant prices at 2015 USD PPPs,20410.652
Australia,Annual,2006-01-01,In 2015 constant prices at 2015 USD exchange r...,23826.637
Australia,Annual,2007-01-01,In 2015 constant prices at 2015 USD PPPs,21087.568
Australia,Annual,2007-01-01,In 2015 constant prices at 2015 USD exchange r...,24616.844
Australia,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,20718.238


Unnamed: 0_level_0,Unnamed: 1_level_0,Series,value
Pay period,Time,Unnamed: 2_level_1,Unnamed: 3_level_1
Annual,2006-01-01,In 2015 constant prices at 2015 USD PPPs,5179.6499
Annual,2006-01-01,In 2015 constant prices at 2015 USD exchange r...,3443.3291
Annual,2007-01-01,In 2015 constant prices at 2015 USD PPPs,5830.6699
Annual,2007-01-01,In 2015 constant prices at 2015 USD exchange r...,3876.114
Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,6383.8848


Unnamed: 0_level_0,Series,value
Time,Unnamed: 1_level_1,Unnamed: 2_level_1
2006-01-01,In 2015 constant prices at 2015 USD PPPs,5179.6499
2006-01-01,In 2015 constant prices at 2015 USD exchange r...,3443.3291
2007-01-01,In 2015 constant prices at 2015 USD PPPs,5830.6699
2007-01-01,In 2015 constant prices at 2015 USD exchange r...,3876.114
2008-01-01,In 2015 constant prices at 2015 USD PPPs,6383.8848


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Series,value
Country,Pay period,Time,Unnamed: 3_level_1,Unnamed: 4_level_1
Estonia,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,6383.8848
Estonia,Annual,2008-01-01,In 2015 constant prices at 2015 USD exchange r...,4243.8799


In [5]:
# Note that a list-like selector should contain only labels that are present in the index level
# Wrong: display(hdf.loc[('Estonia', 'Annual', pd.date_range('2008-01-01', '2010-01-01')), :])
# Wrong: display(hdf.loc[pd.IndexSlice[:, 'Annual', pd.date_range('2008', '2008')], :].head())
display(hdf.loc[('Estonia', 'Annual', ['2008-01-01', '2009-01-01', '2010-01-01']), :])

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Series,value
Country,Pay period,Time,Unnamed: 3_level_1,Unnamed: 4_level_1
Estonia,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,6383.8848
Estonia,Annual,2008-01-01,In 2015 constant prices at 2015 USD exchange r...,4243.8799
Estonia,Annual,2009-01-01,In 2015 constant prices at 2015 USD PPPs,6388.894
Estonia,Annual,2009-01-01,In 2015 constant prices at 2015 USD exchange r...,4247.21
Estonia,Annual,2010-01-01,In 2015 constant prices at 2015 USD PPPs,6204.4941
Estonia,Annual,2010-01-01,In 2015 constant prices at 2015 USD exchange r...,4124.624


In [6]:
display(hdf.loc[pd.IndexSlice[:, 'Annual', pd.date_range('2008', '2008')], :].head())
display(hdf.loc[pd.IndexSlice[['Estonia', 'Latvia'], :, pd.date_range('2008', '2008')], :])

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Series,value
Country,Pay period,Time,Unnamed: 3_level_1,Unnamed: 4_level_1
Australia,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,20718.238
Australia,Annual,2008-01-01,In 2015 constant prices at 2015 USD exchange r...,24185.703
Belgium,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,21416.957
Belgium,Annual,2008-01-01,In 2015 constant prices at 2015 USD exchange r...,20588.934
Brazil,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,3664.3911


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Series,value
Country,Pay period,Time,Unnamed: 3_level_1,Unnamed: 4_level_1
Estonia,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,6383.8848
Estonia,Annual,2008-01-01,In 2015 constant prices at 2015 USD exchange r...,4243.8799
Estonia,Hourly,2008-01-01,In 2015 constant prices at 2015 USD PPPs,3.06081
Estonia,Hourly,2008-01-01,In 2015 constant prices at 2015 USD exchange r...,2.035
Latvia,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,5404.939
Latvia,Annual,2008-01-01,In 2015 constant prices at 2015 USD exchange r...,3389.4529
Latvia,Hourly,2008-01-01,In 2015 constant prices at 2015 USD PPPs,2.59141
Latvia,Hourly,2008-01-01,In 2015 constant prices at 2015 USD exchange r...,1.625


### Controlling levels inside multi-index

* Columns in a multi-index are called levels.
* It is possible to reorder levels in the index with `hdf.reorder_levels` and `hdf.swaplevel`.
* It is possible to push some levels out of the index into ordinary columns with `hdf.reset_index`.
* It is possible to delete levels from the index with `hdf.reset_index(..., drop = True)`, but this discards the values stored in those levels.
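
A minimal sketch of these operations on made-up data follows. The `toy` frame and the integer `Year` level are assumptions for illustration. Checking the level names after each step is a quick way to see what the operation did.

```python
import pandas as pd

# Hypothetical toy data, not taken from realwage.csv.
toy = pd.DataFrame({
    "Country":    ["Estonia", "Estonia", "Latvia", "Latvia"],
    "Pay period": ["Annual", "Hourly", "Annual", "Hourly"],
    "Year":       [2008, 2008, 2008, 2008],
    "value":      [6383.9, 3.1, 5404.9, 2.6],
})
hdf = toy.set_index(["Country", "Pay period", "Year"]).sort_index()

# Swap two levels, or give a complete new order of levels.
swapped = hdf.swaplevel("Pay period", "Country")
reordered = hdf.reorder_levels(["Year", "Country", "Pay period"])

# Push the 'Year' level out into an ordinary column ...
kept = hdf.reset_index("Year")
# ... or delete it together with its values.
dropped = hdf.reset_index("Year", drop=True)

print(list(swapped.index.names), list(reordered.index.names))
print(list(kept.columns), list(dropped.columns))
```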

In [7]:
display(hdf.swaplevel('Pay period', 'Country').head())
display(hdf.reorder_levels(['Pay period', 'Country','Time']).head())
display(hdf.reset_index().head())
display(hdf.reset_index('Time').head())
display(hdf.reset_index('Time', drop = True).head())

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Series,value
Pay period,Country,Time,Unnamed: 3_level_1,Unnamed: 4_level_1
Annual,Australia,2006-01-01,In 2015 constant prices at 2015 USD PPPs,20410.652
Annual,Australia,2006-01-01,In 2015 constant prices at 2015 USD exchange r...,23826.637
Annual,Australia,2007-01-01,In 2015 constant prices at 2015 USD PPPs,21087.568
Annual,Australia,2007-01-01,In 2015 constant prices at 2015 USD exchange r...,24616.844
Annual,Australia,2008-01-01,In 2015 constant prices at 2015 USD PPPs,20718.238


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Series,value
Pay period,Country,Time,Unnamed: 3_level_1,Unnamed: 4_level_1
Annual,Australia,2006-01-01,In 2015 constant prices at 2015 USD PPPs,20410.652
Annual,Australia,2006-01-01,In 2015 constant prices at 2015 USD exchange r...,23826.637
Annual,Australia,2007-01-01,In 2015 constant prices at 2015 USD PPPs,21087.568
Annual,Australia,2007-01-01,In 2015 constant prices at 2015 USD exchange r...,24616.844
Annual,Australia,2008-01-01,In 2015 constant prices at 2015 USD PPPs,20718.238


Unnamed: 0,Country,Pay period,Time,Series,value
0,Australia,Annual,2006-01-01,In 2015 constant prices at 2015 USD PPPs,20410.652
1,Australia,Annual,2006-01-01,In 2015 constant prices at 2015 USD exchange r...,23826.637
2,Australia,Annual,2007-01-01,In 2015 constant prices at 2015 USD PPPs,21087.568
3,Australia,Annual,2007-01-01,In 2015 constant prices at 2015 USD exchange r...,24616.844
4,Australia,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,20718.238


Unnamed: 0_level_0,Unnamed: 1_level_0,Time,Series,value
Country,Pay period,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Australia,Annual,2006-01-01,In 2015 constant prices at 2015 USD PPPs,20410.652
Australia,Annual,2006-01-01,In 2015 constant prices at 2015 USD exchange r...,23826.637
Australia,Annual,2007-01-01,In 2015 constant prices at 2015 USD PPPs,21087.568
Australia,Annual,2007-01-01,In 2015 constant prices at 2015 USD exchange r...,24616.844
Australia,Annual,2008-01-01,In 2015 constant prices at 2015 USD PPPs,20718.238


Unnamed: 0_level_0,Unnamed: 1_level_0,Series,value
Country,Pay period,Unnamed: 2_level_1,Unnamed: 3_level_1
Australia,Annual,In 2015 constant prices at 2015 USD PPPs,20410.652
Australia,Annual,In 2015 constant prices at 2015 USD exchange r...,23826.637
Australia,Annual,In 2015 constant prices at 2015 USD PPPs,21087.568
Australia,Annual,In 2015 constant prices at 2015 USD exchange r...,24616.844
Australia,Annual,In 2015 constant prices at 2015 USD PPPs,20718.238


## III. Multidimensional column indices

* If the data is in long key-value format, multidimensional column indices are unnecessary.
* Pivoting and other data aggregation operations can create data that is in wide format. In this case, multidimensional column indices are appropriate.


### Transposed row indices

The simplest option to get column indices is to transpose a DataFrame with multi-indexed rows. The resulting table is not very useful, as each row holds a different kind of data.

In [8]:
hdf = df.set_index(['Country', 'Pay period']).T
display(hdf.head())
display(hdf.loc[['Time'], ['Estonia']])


Country,Ireland,Ireland,Ireland,Ireland,Ireland,Ireland,Ireland,Ireland,Ireland,Ireland,...,Costa Rica,Costa Rica,Costa Rica,Costa Rica,Costa Rica,Costa Rica,Costa Rica,Costa Rica,Costa Rica,Costa Rica
Pay period,Annual,Annual.1,Annual.2,Annual.3,Annual.4,Annual.5,Annual.6,Annual.7,Annual.8,Annual.9,...,Hourly,Hourly.1,Hourly.2,Hourly.3,Hourly.4,Hourly.5,Hourly.6,Hourly.7,Hourly.8,Hourly.9
Time,2006-01-01,2007-01-01,2008-01-01,2009-01-01,2010-01-01,2011-01-01,2012-01-01,2013-01-01,2014-01-01,2015-01-01,...,2007-01-01,2008-01-01,2009-01-01,2010-01-01,2011-01-01,2012-01-01,2013-01-01,2014-01-01,2015-01-01,2016-01-01
Series,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,...,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...
value,17132.443,18100.918,17747.406,18580.139,18755.832,18284.299,17979.943,17890.01,17854.875,17907.637,...,,,,,,,,2.41,2.56,2.63


Country,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia,Estonia
Pay period,Annual,Annual.1,Annual.2,Annual.3,Annual.4,Annual.5,Annual.6,Annual.7,Annual.8,Annual.9,...,Hourly,Hourly.1,Hourly.2,Hourly.3,Hourly.4,Hourly.5,Hourly.6,Hourly.7,Hourly.8,Hourly.9
Time,2006-01-01,2007-01-01,2008-01-01,2009-01-01,2010-01-01,2011-01-01,2012-01-01,2013-01-01,2014-01-01,2015-01-01,...,2007-01-01,2008-01-01,2009-01-01,2010-01-01,2011-01-01,2012-01-01,2013-01-01,2014-01-01,2015-01-01,2016-01-01


### Pivot tables

* Pivot tables provide a way to group data by two keys: one for rows and one for columns.
* The function is best called with keyword arguments: `pivot_table(df, index=..., columns=..., values=..., aggfunc=...)`.
* The row key `index` names the columns whose values form the row index.
* The column key `columns` names the columns whose values form the column index.
* Each key pair determines a set of rows (a group) in the original table.
* Parameter `values` names the columns whose entries are used for computing a cell value.
* Parameter `aggfunc` determines how the entries of a group are aggregated into a single value. The default is the mean.
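
The role of each parameter can be seen on a four-row table. This is a sketch with invented numbers: when the row key is `Time` alone, each cell groups two countries, and `aggfunc` decides how the two values are merged.

```python
import pandas as pd

# Hypothetical toy data in long key-value format.
toy = pd.DataFrame({
    "Time":       ["2008-01-01"] * 4,
    "Country":    ["Estonia", "Estonia", "Latvia", "Latvia"],
    "Pay period": ["Annual", "Hourly", "Annual", "Hourly"],
    "value":      [6000.0, 3.0, 5000.0, 2.0],
})

# Each (Time, Country) x Pay period cell holds exactly one value.
per_country = pd.pivot_table(toy, index=["Time", "Country"],
                             columns="Pay period", values="value")

# Each Time x Pay period cell groups two countries; the default is the mean.
mean_table = pd.pivot_table(toy, index="Time",
                            columns="Pay period", values="value")

# The same groups aggregated with the maximum instead.
max_table = pd.pivot_table(toy, index="Time", columns="Pay period",
                           values="value", aggfunc="max")
print(per_country, mean_table, max_table, sep="\n\n")
```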

In [9]:
sdf=df.loc[df.Country.isin(['Estonia', 'Latvia']) & df.Time.isin(['2008-01-01', '2010-01-10']), : ]
display(sdf)
display(pivot_table(sdf, index = ['Time', 'Country'], columns = ['Series', 'Pay period'], values = 'value'))

display(pivot_table(sdf, index = ['Time'], columns = ['Series', 'Pay period'], values = 'value', aggfunc=lambda x: "; ".join(map(str, x))))
display(pivot_table(sdf, index = ['Time'], columns = ['Series', 'Pay period'], values = 'value', aggfunc=np.mean))

Unnamed: 0,Time,Country,Series,Pay period,value
662,2008-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Annual,6383.8848
673,2008-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,Hourly,3.06081
684,2008-01-01,Estonia,In 2015 constant prices at 2015 USD exchange r...,Annual,4243.8799
695,2008-01-01,Estonia,In 2015 constant prices at 2015 USD exchange r...,Hourly,2.035
1278,2008-01-01,Latvia,In 2015 constant prices at 2015 USD PPPs,Annual,5404.939
1289,2008-01-01,Latvia,In 2015 constant prices at 2015 USD PPPs,Hourly,2.59141
1300,2008-01-01,Latvia,In 2015 constant prices at 2015 USD exchange r...,Annual,3389.4529
1311,2008-01-01,Latvia,In 2015 constant prices at 2015 USD exchange r...,Hourly,1.625


Unnamed: 0_level_0,Series,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD exchange rates,In 2015 constant prices at 2015 USD exchange rates
Unnamed: 0_level_1,Pay period,Annual,Hourly,Annual,Hourly
Time,Country,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2008-01-01,Estonia,6383.8848,3.06081,4243.8799,2.035
2008-01-01,Latvia,5404.939,2.59141,3389.4529,1.625


Series,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD exchange rates,In 2015 constant prices at 2015 USD exchange rates
Pay period,Annual,Hourly,Annual,Hourly
Time,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
2008-01-01,6383.8848; 5404.939,3.0608101; 2.5914099,4243.8799; 3389.4529,2.0350001; 1.625


Series,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD exchange rates,In 2015 constant prices at 2015 USD exchange rates
Pay period,Annual,Hourly,Annual,Hourly
Time,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
2008-01-01,5894.4119,2.82611,3816.6664,1.83


### Controlling levels inside multi-index

* It is possible to reorder levels in the column index with `hdf.reorder_levels(..., axis = 1)` and `hdf.swaplevel(..., axis = 1)`.
* It is possible to push some levels out of the row index with `hdf.reset_index`.
* It is possible to delete levels from the row index with `hdf.reset_index(..., drop = True)`, but this causes data loss.
* You are probably doing something wrong if you need to push levels out of the column index. Still, it is possible to do this by moving levels between the column index and the row index:
  * `hdf.stack()` pushes a level from the column index to the row index
  * `hdf.unstack()` pushes a level from the row index to the column index
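
The effect of `stack` and `unstack` on the shape of the data can be sketched as follows. The data, the short labels `PPP` and `FX`, and the variable names are hypothetical. Depending on the pandas version, `stack` may emit a `FutureWarning` about its implementation, which does not affect this example.

```python
import pandas as pd

# Hypothetical long-format data: 2 countries x 2 series x 2 pay periods.
rows = [(country, series, period)
        for country in ("Estonia", "Latvia")
        for series in ("PPP", "FX")
        for period in ("Annual", "Hourly")]
long = pd.DataFrame(rows, columns=["Country", "Series", "Pay period"])
long["value"] = [float(i) for i in range(1, 9)]

# Wide table with a two-level column index.
wide = pd.pivot_table(long, index="Country",
                      columns=["Series", "Pay period"], values="value")

# stack moves the 'Series' level from the columns to the rows.
tall = wide.stack("Series")

# unstack moves 'Country' from the rows to the columns; as no row level
# is left, the result is a Series indexed by all three levels.
flat = wide.unstack("Country")
print(wide.shape, tall.shape, flat.shape)
```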

In [10]:
hdf = pivot_table(sdf, index = ['Time', 'Country'], columns = ['Series', 'Pay period'], values = 'value')
display(hdf.reorder_levels(order=['Pay period','Series'], axis = 1).sort_index(axis=1))
display(hdf.reset_index())
display(hdf.T.reset_index().T)
display(hdf.stack('Series'))
display(hdf.unstack('Country'))

Unnamed: 0_level_0,Pay period,Annual,Annual,Hourly,Hourly
Unnamed: 0_level_1,Series,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD exchange rates,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD exchange rates
Time,Country,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2008-01-01,Estonia,6383.8848,4243.8799,3.06081,2.035
2008-01-01,Latvia,5404.939,3389.4529,2.59141,1.625


Series,Time,Country,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD exchange rates,In 2015 constant prices at 2015 USD exchange rates
Pay period,Unnamed: 1_level_1,Unnamed: 2_level_1,Annual,Hourly,Annual,Hourly
0,2008-01-01,Estonia,6383.8848,3.06081,4243.8799,2.035
1,2008-01-01,Latvia,5404.939,2.59141,3389.4529,1.625


Unnamed: 0_level_0,Unnamed: 1_level_0,0,1,2,3
Time,Country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Series,,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD exchange r...,In 2015 constant prices at 2015 USD exchange r...
Pay period,,Annual,Hourly,Annual,Hourly
2008-01-01,Estonia,6383.8848,3.06081,4243.8799,2.035
2008-01-01,Latvia,5404.939,2.59141,3389.4529,1.625


Unnamed: 0_level_0,Unnamed: 1_level_0,Pay period,Annual,Hourly
Time,Country,Series,Unnamed: 3_level_1,Unnamed: 4_level_1
2008-01-01,Estonia,In 2015 constant prices at 2015 USD PPPs,6383.8848,3.06081
2008-01-01,Estonia,In 2015 constant prices at 2015 USD exchange rates,4243.8799,2.035
2008-01-01,Latvia,In 2015 constant prices at 2015 USD PPPs,5404.939,2.59141
2008-01-01,Latvia,In 2015 constant prices at 2015 USD exchange rates,3389.4529,1.625


Series,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD PPPs,In 2015 constant prices at 2015 USD exchange rates,In 2015 constant prices at 2015 USD exchange rates,In 2015 constant prices at 2015 USD exchange rates,In 2015 constant prices at 2015 USD exchange rates
Pay period,Annual,Annual,Hourly,Hourly,Annual,Annual,Hourly,Hourly
Country,Estonia,Latvia,Estonia,Latvia,Estonia,Latvia,Estonia,Latvia
Time,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3
2008-01-01,6383.8848,5404.939,3.06081,2.59141,4243.8799,3389.4529,2.035,1.625
