In [None]:
#Importing packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')

In [None]:
#Read the dataset
df = pd.read_csv(r"../input/titanic/train.csv")
df.head()

In [None]:
#dropping a few features to make it easier to analyze the data 
#and demonstrate the capabilities of the pivot_table function:


In [None]:
df.drop(['PassengerId','Ticket','Name'],inplace=True,axis=1)
df.head()

# Building a Pivot Table using Pandas


Time to build a pivot table in Python using the Pandas library! We will walk through the main parameters of the pivot_table function in this article and build a flexible pivot table from scratch.

 

# How to group data using index in a pivot table?
In its simplest form, pivot_table takes a data and an index parameter:

- data is the Pandas dataframe you pass to the function
- index is the feature (or list of features) used to group your data. Its values appear as the row index of the resultant table

I will be using the ‘Sex’ column as the index for now:


In [None]:
#a single index(Sex)
table = pd.pivot_table(data=df,index=['Sex'])
table
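
To see what the index parameter does, here is a minimal sketch on a tiny made-up dataframe (the values below are hypothetical, not taken from the Titanic file). With the default aggregation, grouping by a single index should give the same numbers as a groupby followed by a mean:

```python
import pandas as pd

# Hypothetical toy data so the block runs on its own (not the Titanic file).
toy = pd.DataFrame({
    "Sex": ["male", "female", "male", "female"],
    "Age": [20.0, 30.0, 40.0, 50.0],
    "Fare": [10.0, 20.0, 30.0, 40.0],
})

# Group by one index; the default aggregation is the mean of each group.
by_pivot = pd.pivot_table(data=toy, index=["Sex"])

# The same grouping written with groupby, for comparison.
by_groupby = toy.groupby("Sex").mean()

print(by_pivot)
print(by_groupby)
```

Each row of the result is one group (one value of ‘Sex’) and each column is the mean of one remaining feature.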

In [None]:
#Plotting the findings
table.plot(kind='bar');

In [None]:
#a single index
table = pd.pivot_table(data=df,index=['Embarked'])
table

In [None]:
table.plot(kind='bar');

In [None]:
#We can also use more than one index to group our data
# pivot with a multi-index

In [None]:
#multiple indexes
table = pd.pivot_table(df,index=['Sex','Pclass'])
table
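
With two indexes the rows of the result are labelled by a pair of values. The sketch below uses a small made-up dataframe (hypothetical values) to show how such a table can be read back with .loc:

```python
import pandas as pd

# Hypothetical toy data so the block runs on its own (not the Titanic file).
toy = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male"],
    "Pclass": [1, 1, 3, 1, 3, 3],
    "Survived": [1, 1, 0, 0, 0, 1],
})

# Two indexes produce a row MultiIndex of (Sex, Pclass) pairs.
table = pd.pivot_table(toy, index=["Sex", "Pclass"])

# Select every class for one sex by the outer level only.
female_rows = table.loc["female"]

# Select a single cell with a (Sex, Pclass) tuple plus the column name.
male_third_class = table.loc[("male", 3), "Survived"]

print(table)
print(female_rows)
print(male_third_class)
```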

In [None]:
table.plot(kind='bar');

# Different aggregation function for different features
The values shown in the table are produced by aggfunc, the aggregate function that pivot_table applies to each group of your data.

By default it is the mean, but you can use different aggregate functions for different features too! Just provide a dictionary as an input to the aggfunc parameter with the feature name as the key and the corresponding aggregate function as the value.

I will be using the median for the ‘Age’ feature and the sum for the ‘Survived’ feature:

In [None]:
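#different aggregate functions: median of Age, sum of Survived
#(function names as strings avoid deprecation warnings in newer Pandas versions)
table = pd.pivot_table(df,index=['Sex','Pclass'],aggfunc={'Age':'median','Survived':'sum'})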
table
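
The dictionary form of aggfunc can be checked by hand on a small made-up dataframe (hypothetical values). This is only a sketch of the idea, with the aggregate functions given by name:

```python
import pandas as pd

# Hypothetical toy data so the block runs on its own (not the Titanic file).
toy = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male"],
    "Age": [20.0, 30.0, 40.0, 10.0, 50.0, 60.0],
    "Survived": [1, 1, 0, 0, 0, 1],
})

# One aggregate function per feature: the key is the column name,
# the value is the function applied to that column within each group.
table = pd.pivot_table(
    toy,
    index=["Sex"],
    aggfunc={"Age": "median", "Survived": "sum"},
)

print(table)
```

Only the columns named in the dictionary appear in the result, so the dictionary also acts as a column selection.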

In [None]:
table.plot(kind='bar');

# Aggregate on specific features with values parameter
But what are you aggregating on? You can tell Pandas which feature(s) to apply the aggregate function to with the values parameter.

values is an optional field. If you don’t specify it, the function tries to aggregate all the remaining features of the dataset (older Pandas versions silently skip the non-numerical ones, while newer versions may raise an error instead), so naming the features explicitly is the safer choice:


In [None]:
table = pd.pivot_table(df,index=['Sex','Pclass'],values=['Survived'], aggfunc='mean')
table
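
A small sketch of the values parameter on made-up data (hypothetical values): only the listed feature is aggregated, and because ‘Survived’ is coded as 0/1, its mean is the survival rate of each group.

```python
import pandas as pd

# Hypothetical toy data so the block runs on its own (not the Titanic file).
toy = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male"],
    "Age": [20.0, 30.0, 40.0, 10.0, 50.0, 60.0],
    "Fare": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "Survived": [1, 1, 0, 0, 0, 1],
})

# values restricts the aggregation to the listed columns;
# Age and Fare are left out of the result.
table = pd.pivot_table(toy, index=["Sex"], values=["Survived"], aggfunc="mean")

print(table)
```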

In [None]:
#Plotting using a bargraph
table.plot(kind='bar');

# Find the relationship between features with columns parameter
Using multiple features as indexes is fine, but moving some of them to columns makes the relationship between features easier to read, because the result becomes a compact two-way table.

The columns parameter is optional and spreads the values of the chosen feature horizontally across the top of the resultant table.

Neither index nor columns is mandatory on its own, but in practice you pass at least one of them so that pivot_table has something to group by.

In [None]:
#columns
table = pd.pivot_table(df,index=['Sex'],columns=['Pclass'],values=['Survived'],aggfunc='sum')
table
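
As a rough illustration of what columns does, the sketch below builds the same kind of two-way table on made-up data (hypothetical values) and compares it with a groupby whose second key is unstacked into columns:

```python
import pandas as pd

# Hypothetical toy data so the block runs on its own (not the Titanic file).
toy = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male"],
    "Pclass": [1, 1, 3, 1, 3, 3],
    "Survived": [1, 1, 0, 0, 0, 1],
})

# Sex goes down the rows, Pclass across the top, cells hold the survivor count.
wide = pd.pivot_table(
    toy,
    index=["Sex"],
    columns=["Pclass"],
    values=["Survived"],
    aggfunc="sum",
)

# Equivalent view: group by both keys, then move Pclass from rows to columns.
same = toy.groupby(["Sex", "Pclass"])["Survived"].sum().unstack("Pclass")

print(wide)
print(same)
```

Because values was passed as a list, the column labels of wide are pairs such as ('Survived', 1); the groupby version has plain class labels.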

In [None]:
table.plot(kind='bar');

# Handling missing data

In [None]:
#display null values
table = pd.pivot_table(df,index=['Sex','Survived','Pclass'],columns=['Embarked'],values=['Age'],aggfunc='mean')
table

In [None]:
#handling null values: empty cells are replaced with the overall mean age
table = pd.pivot_table(df,index=['Sex','Survived','Pclass'],columns=['Embarked'],values=['Age'],aggfunc='mean',fill_value=df['Age'].mean())
table
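
The two cells above can be reproduced on a tiny made-up dataframe (hypothetical values). A combination of index and column values that has no rows in the data shows up as NaN, and fill_value replaces those empty cells in the finished table:

```python
import pandas as pd

# Hypothetical toy data so the block runs on its own (not the Titanic file).
# There is no male passenger who embarked at 'Q'.
toy = pd.DataFrame({
    "Sex": ["female", "female", "male"],
    "Embarked": ["S", "Q", "S"],
    "Age": [20.0, 30.0, 40.0],
})

# Without fill_value the (male, Q) cell is NaN because that group is empty.
raw = pd.pivot_table(toy, index=["Sex"], columns=["Embarked"],
                     values="Age", aggfunc="mean")

# With fill_value the empty cell is replaced by the overall mean age.
filled = pd.pivot_table(toy, index=["Sex"], columns=["Embarked"],
                        values="Age", aggfunc="mean",
                        fill_value=toy["Age"].mean())

print(raw)
print(filled)
```

Keep in mind that a filled cell is a placeholder rather than an observed value: no passenger in that group contributed to it.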