# Building a Pivot Table using Pandas in Python

Hello all, in this notebook I will be exploring the Titanic dataset with pivot tables using pandas in Python.

Pivot tables are useful because they let you filter a dataset and summarize it from several perspectives.

The pandas library offers a function called **pivot_table** that summarizes a feature’s values in a neat two-dimensional table.

**pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')**

**Parameters:**

- **data**: DataFrame
- **values**: column to aggregate, optional
- **index**: column, Grouper, array, or list of the previous
- **columns**: column, Grouper, array, or list of the previous
- **aggfunc**: function, list of functions, dict, default numpy.mean. If a list of functions is passed, the resulting pivot table will have hierarchical columns whose top level are the function names. If a dict is passed, the key is the column to aggregate and the value is a function or list of functions.
- **fill_value**: scalar, default None. Value to replace missing values with.
- **margins**: boolean, default False. Add all row / columns (e.g. for subtotal / grand totals).
- **dropna**: boolean, default True. Do not include columns whose entries are all NaN.
- **margins_name**: string, default ‘All’. Name of the row / column that will contain the totals when margins is True.

**Returns:** DataFrame
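To make the parameters above concrete, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its numbers are assumptions for illustration only, not Titanic data. It passes `values`, `index`, `columns`, `aggfunc`, `margins` and `margins_name` in a single call:

```python
import pandas as pd

# Made-up toy data (NOT the Titanic dataset), used only to illustrate the parameters.
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Pclass": [1, 3, 1, 3, 3],
    "Fare": [80.0, 10.0, 60.0, 8.0, 12.0],
})

summary = pd.pivot_table(
    data=toy,
    values="Fare",       # column to aggregate
    index="Sex",         # one row per value of Sex
    columns="Pclass",    # one column per value of Pclass
    aggfunc="mean",      # how to combine the fares in each cell
    margins=True,        # add a totals row and column
    margins_name="All",  # label used for those totals
)
print(summary)
```

The `All` row and column hold the mean fare over the whole row, column, or table, because the margins reuse the same `aggfunc`.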

## 1. Exploring the Titanic Dataset using Pandas in Python

In [None]:
# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12,6
plt.style.use('ggplot')

Let's load the Titanic dataset:

In [None]:
df = pd.read_csv('../input/titanic/train.csv')  # read the data into a DataFrame
df.head()  # display the first five rows

Dropping a few features to make it easier to analyze the data with the pivot_table function:

In [None]:
df.drop(['PassengerId','Ticket','Name'],axis=1,inplace=True)
df.head()

## 2. Building a Pivot Table using Pandas

### 2.1 Group data using index in a pivot table

- **A single index**

Using the ‘Sex’ column as the index for now:

In [None]:
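# Cabin and Embarked hold text; recent pandas versions (2.0+) raise an error
# when asked to average them, so they are dropped before pivoting.
table1 = pd.pivot_table(data=df.drop(columns=['Cabin','Embarked']),index=['Sex'])
table1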

Each cell holds the mean of that column for the given category. For example, 27.915709 is the mean age of female passengers.

We can instantly compare all the feature values for both sexes. Visualizing the findings:

In [None]:
table1.plot(kind='bar')

- **Multiple indexes**

You can even use more than one feature as an index to group your data. This increases the level of granularity in the resultant table and you can get more specific with your findings:

In [None]:
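# multiple indexes (text columns are dropped first, because recent pandas
# versions raise an error when asked to average them)

table2 = pd.pivot_table(df.drop(columns=['Cabin','Embarked']),index=['Sex','Pclass'])
table2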

Using multiple indexes lets us confirm that the gap in ticket fare between female and male passengers held across every Pclass on the Titanic.
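As a rough illustration of how such a multi-index table can be queried, the sketch below builds one from made-up data (the `toy` frame is hypothetical, not the Titanic dataset) and reads from it with `.loc` and `.xs`:

```python
import pandas as pd

# Made-up toy data (NOT the Titanic dataset).
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Pclass": [1, 3, 1, 3, 3],
    "Fare": [80.0, 10.0, 60.0, 8.0, 12.0],
})

by_sex_class = pd.pivot_table(toy, index=["Sex", "Pclass"],
                              values=["Fare"], aggfunc="mean")

# One cell: mean fare of first-class female passengers.
female_first = by_sex_class.loc[("female", 1), "Fare"]

# One slice: every row where Pclass == 1, keyed by Sex.
first_class = by_sex_class.xs(1, level="Pclass")

print(female_first)
print(first_class)
```

A tuple passed to `.loc` addresses one row of the two-level index, while `.xs` selects on a single level and drops it from the result.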

### 2.2 Different aggregation function for different features

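Passing a dict to aggfunc applies a different function to each feature. Here the mean is used for the ‘Age’ feature and the sum for the ‘Survived’ feature:

In [None]:
# different aggregate functions
# (string names such as 'mean' and 'sum' behave like np.mean and np.sum here
# and avoid the warning that newer pandas versions emit for NumPy callables)

table3 = pd.pivot_table(df,index=['Sex','Pclass'],
                        aggfunc={'Age':'mean','Survived':'sum'})
table3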
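The dict form can also map a feature to a list of functions, which produces the hierarchical columns mentioned in the parameter list. The following is a small sketch on made-up data (the `toy` frame is an assumption for illustration, not the Titanic dataset):

```python
import pandas as pd

# Made-up toy data (NOT the Titanic dataset).
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Age": [30.0, 40.0, 20.0, 50.0],
    "Survived": [1, 1, 0, 1],
})

multi = pd.pivot_table(
    toy,
    index="Sex",
    values=["Age", "Survived"],
    # Two functions for Age, one for Survived.
    aggfunc={"Age": ["mean", "max"], "Survived": "sum"},
)
print(multi)

# Columns are now (feature, function) pairs.
female_mean_age = multi.loc["female", ("Age", "mean")]
male_survivors = multi.loc["male", ("Survived", "sum")]
```

Each column is addressed by a `(feature, function)` tuple, so the same feature can appear several times with different summaries.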

### 2.3 Aggregate on specific features with values parameter

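The values parameter tells the function which features to aggregate. It is optional; if you omit it, pivot_table tries to aggregate every remaining column. Older pandas versions silently dropped the non-numeric columns, while pandas 2.0 and later raise an error for columns the aggregation cannot handle, so naming the features explicitly is the safer choice: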

In [None]:
# the mean of a 0/1 column such as Survived is the survival rate
table4 = pd.pivot_table(df,index=['Sex','Pclass'],
                        values=['Survived'],
                        aggfunc='mean')
table4

In [None]:
table4.plot(kind='bar');

Findings: The survival rate of passengers aboard the Titanic fell from first class to third class for both sexes. Moreover, male passengers had a lower survival rate than female passengers in every Pclass.
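The effect of the values parameter can be sketched on made-up data that includes a text column. The `toy` frame and its `Cabin` entries are hypothetical, chosen only to mimic the shape of the Titanic data. Naming the features keeps the text column out of the aggregation:

```python
import pandas as pd

# Made-up toy data (NOT the Titanic dataset).
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Cabin": ["C85", None, "E46", None],  # text column that cannot be averaged
    "Fare": [70.0, 10.0, 50.0, 8.0],
    "Survived": [1, 1, 0, 1],
})

# Only the listed features are aggregated; Cabin is never touched.
rates = pd.pivot_table(toy, index="Sex",
                       values=["Survived", "Fare"], aggfunc="mean")
print(rates)
```

Because `Survived` is coded as 0 or 1, its mean per group reads directly as a survival rate.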

### 2.4 Find the relationship between features with columns parameter

The *columns* parameter is optional. It spreads the values of the chosen feature horizontally across the top of the resulting table, so each cell relates one index value to one column value.

In [None]:
# columns: number of survivors for each Sex (rows) and Pclass (columns)

table5 = pd.pivot_table(df,index=['Sex'],
                        columns=['Pclass'],
                        values=['Survived'],
                        aggfunc='sum')
table5

In [None]:
table5.plot(kind='bar');
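One way to think about the columns parameter is as a group-by followed by an unstack. The sketch below uses made-up data (the `toy` frame is an assumption, not the Titanic dataset) and suggests that both routes give the same numbers when every Sex/Pclass combination is present:

```python
import pandas as pd

# Made-up toy data (NOT the Titanic dataset).
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Pclass": [1, 3, 1, 3, 3, 1],
    "Survived": [1, 0, 0, 1, 0, 1],
})

# Route 1: pivot_table with index and columns.
pivot = pd.pivot_table(toy, index="Sex", columns="Pclass",
                       values="Survived", aggfunc="sum")

# Route 2: group by both features, then move Pclass into the columns.
grouped = toy.groupby(["Sex", "Pclass"])["Survived"].sum().unstack("Pclass")

print(pivot)
print(grouped)
```

pivot_table adds conveniences such as `margins` and `fill_value` on top of this pattern, which is why it is often the shorter option.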

### 2.5 Handling missing data

pivot_table even allows you to deal with the missing values through the parameters dropna and fill_value:

In [None]:
# display null values: combinations with no recorded age appear as NaN

table6 = pd.pivot_table(df,index=['Sex','Survived','Pclass'],
                        columns=['Embarked'],
                        values=['Age'],
                        aggfunc='mean')
table6

Replacing the NaN cells of the table with the overall mean of the ‘Age’ column:

In [None]:
# handling null values: fill_value is applied to the finished table,
# not to the individual passengers' ages

table7 = pd.pivot_table(df,index=['Sex','Survived','Pclass'],
                        columns=['Embarked'],
                        values=['Age'],
                        aggfunc='mean',
                        fill_value=df['Age'].mean())
table7
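The behaviour of fill_value is easier to see on a tiny example. The following sketch uses made-up data (the `toy` frame is hypothetical, not the Titanic dataset) in which one Sex/Embarked combination never occurs:

```python
import pandas as pd

# Made-up toy data (NOT the Titanic dataset): no male passenger embarked at "C".
toy = pd.DataFrame({
    "Sex": ["female", "female", "male"],
    "Embarked": ["C", "S", "S"],
    "Age": [20.0, 30.0, 40.0],
})

# Without fill_value the missing combination shows up as NaN.
with_gaps = pd.pivot_table(toy, index="Sex", columns="Embarked",
                           values="Age", aggfunc="mean")

# With fill_value the empty cell is replaced after aggregation.
overall_mean = toy["Age"].mean()
filled = pd.pivot_table(toy, index="Sex", columns="Embarked",
                        values="Age", aggfunc="mean",
                        fill_value=overall_mean)

print(with_gaps)
print(filled)
```

Filling with the overall mean is only one possible choice; a filled cell no longer shows that the group had no data, so it is worth keeping the unfilled table around for reference.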

This is just an introduction to using pivot tables for data analysis. Feel free to upvote if you find this notebook useful. Thank you!