# Data Analysis with Python and Pandas Tutorial
# Part 4 - Pivot tables

## Tutorial Objectives

In this tutorial, you will learn:

  * How to create various pivot tables to solve complex analytical problems

More information about pivot tables is available here:

https://www.dataquest.io/blog/pandas-pivot-table/

In [None]:
# import the Pandas, NumPy, and Matplotlib libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# read the data set from a URL
url = 'https://s3.amazonaws.com/dq-content/blog/pivot_table/data.csv'
df = pd.read_csv(url, index_col=0)

In [None]:
# output the first few rows of the data frame
df.head()

In [None]:
# output detailed information
df.info()

In [None]:
# count how many rows there are for each value in the Year column
df.Year.value_counts()

In [None]:
# convert the Year, Country, and Region columns to categorical columns
df.Year = df.Year.astype('category')
df.Country = df.Country.astype('category')
df.Region = df.Region.astype('category')
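
The three conversions above can also be written as a single `astype` call that maps column names to dtypes. The following is a minimal sketch on a small hand-made frame; the `toy` name and its values are invented for illustration and are not taken from the tutorial's data set.

```python
import pandas as pd

# Toy stand-in for the tutorial's data frame; the values are made up.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'A', 'B'],
    'Region': ['North', 'South', 'North', 'South'],
    'Year': [2015, 2015, 2016, 2016],
    'Happiness Score': [7.0, 5.0, 7.2, 5.4],
})

# Convert several columns in one call by mapping column name -> dtype.
toy = toy.astype({'Year': 'category', 'Country': 'category', 'Region': 'category'})

# Columns not named in the mapping keep their original dtype.
print(toy.dtypes)

# The categories of Year remain integers, so a year is still addressed as 2016, not '2016'.
print(list(toy['Year'].cat.categories))
```

Because the year categories stay integers, later cells can keep referring to a year column by its integer label.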

## The happiest countries in the world

In [None]:
# create a pivot table with countries in the rows (index), years in the columns, and Happiness Score as the values
# add margins: the 'All' row and column hold the mean, because the default aggfunc is 'mean'
# sort the resulting data frame in descending order by the 'All' column
df1 = pd.pivot_table(df, index='Country', columns='Year', values='Happiness Score', margins=True)
df1.sort_values(by='All', ascending=False, inplace=True)
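
To make the meaning of the `All` margin concrete, here is a small sketch on hand-made data (the `toy` frame and its scores are invented for illustration). It checks that `All` holds means, and it shows one possible way to keep the `All` row out of a ranking by dropping it before sorting.

```python
import pandas as pd

# Two hypothetical countries with one score per year; values are made up.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B'],
    'Year': [2015, 2016, 2015, 2016],
    'Happiness Score': [7.0, 8.0, 4.0, 6.0],
})

# Same call shape as in the tutorial; aggfunc defaults to 'mean'.
table = pd.pivot_table(toy, index='Country', columns='Year',
                       values='Happiness Score', margins=True)
print(table)

# The 'All' column is the mean over the years of each country.
a_mean = table.loc['A', 'All']      # mean of 7.0 and 8.0
# The bottom-right cell is the mean over every row of the input.
overall = table.loc['All', 'All']   # mean of all four scores

# Rank countries only: drop the 'All' row, then sort without inplace=True.
ranked = table.drop(index='All').sort_values(by='All', ascending=False)
print(ranked)
```

Returning a new sorted frame instead of sorting in place is a style choice; both approaches give the same order.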

In [None]:
# output the happiest countries
df1.head()

In [None]:
# output the least happy countries
df1.tail()

In [None]:
# look up the overall mean happiness score for Vietnam
df1.loc['Vietnam', 'All']

## The happiest regions in the world

In [None]:
# create a pivot table with regions in the rows, years in the columns, and Happiness Score as the values, including margins
# sort by the 'All' column (the mean across years) in descending order
df2 = pd.pivot_table(df, index='Region', columns='Year', values='Happiness Score', margins=True)
df2.sort_values(by='All', ascending=False, inplace=True)

In [None]:
# output the happiest few regions
df2.head()

In [None]:
# output the least happy regions
df2.tail()
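
A mean alone can hide how much the countries within a region differ. As a hedged sketch on invented data (the `toy` frame below is not the tutorial's data set), `aggfunc` also accepts a list of functions, which produces one group of columns per function.

```python
import pandas as pd

# Hypothetical regions with several made-up scores each.
toy = pd.DataFrame({
    'Region': ['North', 'North', 'North', 'South', 'South', 'South', 'South'],
    'Year': [2015, 2015, 2016, 2015, 2015, 2016, 2016],
    'Happiness Score': [7.0, 6.0, 7.5, 4.0, 5.0, 5.5, 4.5],
})

# Passing a list to aggfunc yields a column MultiIndex of (function, value column).
spread = pd.pivot_table(toy, index='Region', values='Happiness Score',
                        aggfunc=['mean', 'min', 'max'])
print(spread)

# Individual cells are addressed with a (function, column) tuple.
north_min = spread.loc['North', ('min', 'Happiness Score')]
north_max = spread.loc['North', ('max', 'Happiness Score')]
south_max = spread.loc['South', ('max', 'Happiness Score')]
```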

## Is happiness affected by region?

In [None]:
# build the same pivot table as the previous one, but without margins
# sort by the 2017 column in descending order
df3 = pd.pivot_table(df, index='Region', columns='Year', values='Happiness Score')
df3.sort_values(by=2017, ascending=False, inplace=True)
df3

In [None]:
# plot the pivot table as a bar chart
df3.plot(kind='bar', figsize=(10,4))
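
The `plot` method returns a Matplotlib `Axes` object, which can be used to label the chart. The sketch below assumes a small hand-built table shaped like the pivot table above (regions in the index, years in the columns); the numbers and label texts are invented for illustration.

```python
import pandas as pd

# Hand-built stand-in for a region-by-year pivot table; values are made up.
toy = pd.DataFrame(
    {2015: [7.0, 5.0], 2016: [7.2, 5.4]},
    index=pd.Index(['North', 'South'], name='Region'),
)
toy.columns.name = 'Year'

# plot() returns the Axes, so labels and a title can be added afterwards.
ax = toy.plot(kind='bar', figsize=(10, 4))
ax.set_ylabel('Mean happiness score')
ax.set_title('Happiness by region and year')
ax.legend(title='Year')
```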

## Exercise

Load the Titanic dataset, clean up its columns, and then investigate research questions such as:
  * Which passenger class had the best chance of survival?
  * What was the influence of gender on survival rates?
  * Which age groups had the best chance to survive?

The dataset is available at:
https://1drv.ms/u/s!AgtH78k0_cuvglrqiiwiOUvCS2ZJ

And a detailed description is available on Kaggle:
https://www.kaggle.com/c/titanic/data

To answer the questions, you need to create the relevant pivot tables.
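
The age-group question needs a step this tutorial has not shown: turning a numeric column into groups. One possible approach is `pd.cut`, sketched below on hand-made rows. The column names `Survived`, `Pclass`, `Sex`, and `Age` are assumed to match the Kaggle file; the rows, bin edges, and group labels are arbitrary choices for illustration, not real passenger data or a reference solution.

```python
import pandas as pd

# Hand-made rows using the assumed Kaggle column names; not real passenger data.
toy = pd.DataFrame({
    'Survived': [1, 0, 1, 0, 1, 0],
    'Pclass': [1, 3, 2, 3, 1, 2],
    'Sex': ['female', 'male', 'female', 'male', 'male', 'female'],
    'Age': [29.0, 40.0, 8.0, None, 62.0, 15.0],
})

# Bin the ages into labelled groups; intervals are right-closed, e.g. (12, 18].
# A missing age stays missing and is left out of the pivot table below.
toy['AgeGroup'] = pd.cut(toy['Age'], bins=[0, 12, 18, 60, 120],
                         labels=['child', 'teen', 'adult', 'senior'])

# Survived is 0 or 1, so its mean per group is the share of survivors.
by_age = pd.pivot_table(toy, index='AgeGroup', values='Survived',
                        aggfunc='mean', observed=True)
print(by_age)
```

The same pattern with `index='Pclass'` or `index='Sex'` is a starting point for the other two questions.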

Discuss your solutions with the person next to you!