# Reshaping and Pivot Tables

Doc Sources: 
* https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html    
* https://pandas.pydata.org/pandas-docs/stable/reshaping.html

While pivot() provides general purpose pivoting with various data types (strings, numerics, etc.), pandas also provides pivot_table() for pivoting with aggregation of numeric data.
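The difference between the two functions shows up as soon as an index/column pair repeats. The sketch below uses a small hypothetical DataFrame (the names `toy`, `key`, `group`, and `value` are made up for illustration and are not part of the dataset used later) to show that pivot() refuses duplicates while pivot_table() aggregates them.

```python
import pandas as pd

# Hypothetical toy data: the ("a", "x") pair appears twice.
toy = pd.DataFrame({
    "key": ["a", "a", "b"],
    "group": ["x", "x", "y"],
    "value": [1, 3, 5],
})

# pivot() only reshapes, so duplicate index/column pairs raise an error.
try:
    toy.pivot(index="key", columns="group", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead (mean by default).
table = pd.pivot_table(toy, index="key", columns="group", values="value")

print(pivot_failed)
print(table)
```

Cells with no matching rows, such as ("a", "y") here, are left as NaN.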

The function pivot_table() can be used to create spreadsheet-style pivot tables. See the cookbook for some advanced strategies.

It takes a number of arguments:

* data: a DataFrame object.
* values: a column or a list of columns to aggregate.
* index: a column, Grouper, array of the same length as data, or a list of them. Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.
* columns: a column, Grouper, array of the same length as data, or a list of them. Keys to group by on the pivot table columns. If an array is passed, it is used in the same manner as column values.
* aggfunc: function to use for aggregation, defaulting to numpy.mean.
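As a minimal sketch of how these arguments fit together, the example below builds a tiny hypothetical DataFrame (`toy` is invented for illustration; it only borrows three column names from the dataset loaded below) and passes each argument by keyword.

```python
import pandas as pd

# Hypothetical toy data reusing three column names from the dataset below.
toy = pd.DataFrame({
    "goodbad": [0, 0, 1, 1, 1],
    "AGE_groups": ["AG_0_30", "AG_30_50", "AG_0_30", "AG_0_30", "AG_30_50"],
    "TRADES": [4, 3, 9, 13, 41],
})

table = pd.pivot_table(
    toy,                   # data: the DataFrame to summarize
    values="TRADES",       # column to aggregate
    index="goodbad",       # keys that become the rows
    columns="AGE_groups",  # keys that become the columns
    aggfunc="sum",         # aggregation applied within each cell
)
print(table)
```

Each cell holds the sum of TRADES for one (goodbad, AGE_groups) combination.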

In [2]:
import pandas as pd
import numpy as np

In [3]:
# Read the CSV file into a DataFrame.

url_data = "https://raw.githubusercontent.com/sb0709/bootcamp_KSU/master/Data/data.csv"
data = pd.read_csv(url_data,sep=',')

In [7]:
# Check the DataFrame shape and preview the first rows.

print(data.shape)

print(data.head())

(1000, 11)
   Unnamed: 0  MATCHKEY  RBAL  TRADES  AGE AGE_groups  DELQID  CRELIM  \
0           0  16345246  1492       4   39   AG_30_50       1     750   
1           1  13728016     0       3   71   AG_70_UP       0    3250   
2           2  14716776   854       9   30    AG_0_30       4     500   
3           3  14568809   408      13   28    AG_0_30       1    3000   
4           4  13513749  4965      41   51   AG_50_70       0     500   

   goodbad  BRNEW  BRAGE  
0        0      5     20  
1        0     19     19  
2        1      0     46  
3        0      2     33  
4        0      3     68  


In [16]:
# Keyword arguments are required by pivot() in recent pandas versions.
pivot_data = data.pivot(index='MATCHKEY', columns='AGE_groups', values='AGE')

In [18]:
pivot_data.head()

AGE_groups  AG_0_30  AG_30_50  AG_50_70  AG_70_UP
MATCHKEY
1338454         NaN      46.0       NaN       NaN
1343107        29.0       NaN       NaN       NaN
1374470        29.0       NaN       NaN       NaN
1427263         NaN       NaN      53.0       NaN
1431876         NaN       NaN      56.0       NaN


In [28]:
pd.pivot_table(data, values='TRADES', index=['goodbad', 'AGE_groups'], columns=['DELQID'],  aggfunc=np.sum)

DELQID                   0       1      2      3      4      5      6       7
goodbad AGE_groups
0       AG_0_30     1210.0   458.0  196.0    NaN    NaN    NaN    NaN     NaN
        AG_30_50    4409.0  1604.0  366.0    NaN    NaN    NaN    NaN     NaN
        AG_50_70    4122.0  1325.0  298.0    NaN    NaN    NaN    NaN     NaN
        AG_70_UP    1220.0   154.0  134.0    NaN    NaN    NaN    NaN     NaN
1       AG_0_30        NaN     NaN    NaN   59.0   22.0   59.0   18.0   209.0
        AG_30_50       NaN     NaN    NaN  185.0  293.0  249.0  127.0  1085.0
        AG_50_70       NaN     NaN    NaN  202.0  111.0  284.0   83.0   465.0
        AG_70_UP       NaN     NaN    NaN   22.0    NaN   13.0    NaN   201.0


In [40]:
pd.pivot_table(data,  values=['BRNEW', 'BRAGE'], index=['goodbad'], columns=['AGE_groups'], aggfunc='count')

              BRAGE                                  BRNEW
AGE_groups  AG_0_30  AG_30_50  AG_50_70  AG_70_UP  AG_0_30  AG_30_50  AG_50_70  AG_70_UP
goodbad
0               131       338       265        79      131       338       265        79
1                31        96        51         9       31        96        51         9


# Create a pivot table of TRADES counts, by goodbad and AGE_groups

In [38]:
pd.pivot_table(data,index=['goodbad','AGE_groups'], values=["TRADES"], aggfunc='count')  # 'mean' or any other valid aggregation function can be used instead

                    TRADES
goodbad AGE_groups
0       AG_0_30        131
        AG_30_50       338
        AG_50_70       265
        AG_70_UP        79
1       AG_0_30         31
        AG_30_50        96
        AG_50_70        51
        AG_70_UP         9
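Since the aggregation function is interchangeable, several can also be requested at once by passing a list to aggfunc. The sketch below assumes a small hypothetical DataFrame (`toy`, invented for illustration) rather than the real dataset, so the numbers are not comparable to the output above.

```python
import pandas as pd

# Hypothetical toy data reusing column names from the dataset.
toy = pd.DataFrame({
    "goodbad": [0, 0, 1, 1, 1],
    "AGE_groups": ["AG_0_30", "AG_30_50", "AG_0_30", "AG_0_30", "AG_30_50"],
    "TRADES": [4, 3, 9, 13, 41],
})

# A list of aggregations yields one column block per function.
summary = pd.pivot_table(
    toy,
    index=["goodbad", "AGE_groups"],
    values=["TRADES"],
    aggfunc=["count", "mean"],
)
print(summary)
```

The resulting columns form a two-level index, so a single statistic is selected with a tuple such as `summary[("mean", "TRADES")]`.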


In [39]:
pd.pivot_table(data,index=['goodbad','AGE_groups'], values=["TRADES"], aggfunc=np.sum)

                    TRADES
goodbad AGE_groups
0       AG_0_30       1864
        AG_30_50      6379
        AG_50_70      5745
        AG_70_UP      1508
1       AG_0_30        367
        AG_30_50      1939
        AG_50_70      1145
        AG_70_UP       236


In [41]:

rbal_tabled = pd.pivot_table(data,index=['goodbad','AGE_groups'],values=["RBAL"], aggfunc=np.sum)
rbal_tabled

                       RBAL
goodbad AGE_groups
0       AG_0_30      554413
        AG_30_50    2211399
        AG_50_70    2467394
        AG_70_UP     640988
1       AG_0_30      146351
        AG_30_50     740535
        AG_50_70     561325
        AG_70_UP      80439


In [43]:
data['DELQID'].max()

7

In [44]:
data['DELQID'].min()

0

In [48]:
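# Bin DELQID with pd.cut and pass the result directly to pivot_table; the binned
# Series keeps the name DELQID, so the index level is still labeled DELQID.
# Note: the bins are right-closed, (0, 3], (3, 5], (5, 7], so rows with
# DELQID == 0 fall outside every bin and are left out of the table.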

d_id = pd.cut(data['DELQID'], [0, 3, 5, 7])
pd.pivot_table(data,index = ['goodbad', d_id], values=["RBAL"], columns=['AGE_groups'], aggfunc=np.sum)

                   RBAL
AGE_groups      AG_0_30  AG_30_50  AG_50_70  AG_70_UP
goodbad DELQID
0       (0, 3]   182596    750440    690531    140971
1       (0, 3]    24501     51605    121095     23259
        (3, 5]    27969    226521    218388      2846
        (5, 7]    93881    462409    221842     54334
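The bin edges decide which rows survive the pivot. As a minimal sketch on a hypothetical Series (`delq` is invented for illustration), the example below shows that the default right-closed bins leave a value of 0 unbinned, and that `include_lowest=True` is one way to keep it.

```python
import pandas as pd

# Hypothetical values spanning the same 0..7 range as DELQID.
delq = pd.Series([0, 1, 3, 4, 7], name="DELQID")

# Default bins are right-closed: (0, 3], (3, 5], (5, 7]; 0 matches none of them.
default_bins = pd.cut(delq, [0, 3, 5, 7])

# include_lowest=True closes the first interval on the left, so 0 is kept.
inclusive_bins = pd.cut(delq, [0, 3, 5, 7], include_lowest=True)

print(default_bins.isna().sum())    # unbinned values with the default edges
print(inclusive_bins.isna().sum())  # unbinned values with include_lowest=True
print(default_bins.name)            # the Series name carries over from delq
```

Whether the zero values belong in the first bin or in a separate one depends on the analysis; the point here is only that they are silently excluded by default.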


# Advanced pivot_table Filtering

In [49]:
rbal_tabled.query('AGE_groups == ["AG_30_50"]')

                       RBAL
goodbad AGE_groups
0       AG_30_50    2211399
1       AG_30_50     740535
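query() works here because it can refer to index level names as well as columns. The sketch below, on a hypothetical DataFrame (`toy` and `tabled` are invented for illustration), compares it with `DataFrame.xs`, which is one alternative way to select a single value of an index level.

```python
import pandas as pd

# Hypothetical toy data reusing column names from the dataset.
toy = pd.DataFrame({
    "goodbad": [0, 0, 1, 1],
    "AGE_groups": ["AG_0_30", "AG_30_50", "AG_0_30", "AG_30_50"],
    "RBAL": [100, 200, 300, 400],
})
tabled = pd.pivot_table(
    toy, index=["goodbad", "AGE_groups"], values=["RBAL"], aggfunc="sum"
)

# Filter on the AGE_groups index level with a query expression.
by_query = tabled.query('AGE_groups == "AG_30_50"')

# Same selection with xs; drop_level=False keeps both index levels.
by_xs = tabled.xs("AG_30_50", level="AGE_groups", drop_level=False)

print(by_query)
print(by_xs)
```

query() is convenient for combining several conditions in one string, while xs() is more direct when only one level value is needed.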


# Summary Q&A