# Data Visualization Practice

The goal of this notebook is to practice visualizing data. We will use the Life Expectancy dataset from Kaggle ([link to dataset](https://www.kaggle.com/kumarajarshi/life-expectancy-who)). The dataset contains life expectancy data for multiple countries from 2000 to 2015, along with contributing factors.

There are 7 tasks for you to complete in this notebook. For each task, you will determine the appropriate visual for the given scenario, create a basic visualization, and then add style to it.

To format the data properly for visualization, we need pandas. The pandas code is provided because the focus of this notebook is on creating compelling visuals. Although you will not write the pandas code that formats the data, this is a good chance to start thinking about how you would structure and aggregate a dataset to make appropriate visualizations.

### Visualization Tasks:
- Distribution of world-wide life expectancy
- Distribution of life expectancy for developed vs developing countries
- Ten countries with highest average life expectancy in 2015
- Average world-wide life expectancy over 15-year period
- Distribution of world-wide life expectancy for each year in 15-year period
- Correlation between years of schooling and life expectancy
- Life expectancy over 15-year period for 3 countries with highest average life expectancy and 3 countries with lowest average life expectancy

In [None]:
# IMPORT PANDAS AND PREPARE DATASET
import pandas as pd
import numpy as np

# READ IN DATASET
df = pd.read_csv('./data/life-expectancy.csv')

# CLEAN COLUMN NAMES
df.columns = [col.strip().replace(' ', '_').lower() for col in df.columns]

# DROP MISSING VALUES
df = df.dropna(subset=['life_expectancy', 'bmi', 'schooling'])

# DISPLAY DATAFRAME
df.head()

## Task 0: Import matplotlib using the standard alias

In [None]:
# IMPORT MATPLOTLIB USING STANDARD ALIAS

## Task 1: Distribution of world-wide life expectancy

In [None]:
# PANDAS CODE
life_expectancy = df['life_expectancy']

In [None]:
# Your code here
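
A histogram is one common choice for showing the distribution of a single numeric variable. The sketch below is one possible approach, not the only valid answer. It uses synthetic values in a hypothetical `sample_life_expectancy` array so it runs on its own; in the notebook you would pass `life_expectancy` instead. The bin count, colors, and labels are assumptions to adjust.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the notebook's `life_expectancy` Series (not real data)
rng = np.random.default_rng(0)
sample_life_expectancy = rng.normal(loc=70, scale=9, size=500)

fig, ax = plt.subplots(figsize=(8, 5))

# hist returns the per-bin counts, the bin edges, and the drawn patches
counts, bin_edges, _ = ax.hist(
    sample_life_expectancy, bins=20, color="steelblue", edgecolor="white"
)

# Basic styling: title and axis labels
ax.set_title("Distribution of life expectancy")
ax.set_xlabel("Life expectancy (years)")
ax.set_ylabel("Number of observations")
```

Try a few different `bins` values: too few hides the shape of the distribution, while too many makes it noisy.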

## Task 2: Distribution of life expectancy for developed vs developing countries

In [None]:
# PANDAS CODE
life_expectancy_developed = df[df['status'] == 'Developed']['life_expectancy']
life_expectancy_developing = df[df['status'] == 'Developing']['life_expectancy']

In [None]:
# Your code here
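
To compare two distributions, one option is to overlay two semi-transparent histograms that share the same bin edges. The sketch below uses synthetic arrays with hypothetical names (`sample_developed`, `sample_developing`) in place of `life_expectancy_developed` and `life_expectancy_developing`. The group sizes and values are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the two notebook Series (not real data)
rng = np.random.default_rng(1)
sample_developed = rng.normal(loc=79, scale=4, size=200)
sample_developing = rng.normal(loc=67, scale=9, size=800)

# Compute one set of bin edges from the combined data so the bars line up
combined = np.concatenate([sample_developed, sample_developing])
shared_bins = np.histogram_bin_edges(combined, bins=25)

fig, ax = plt.subplots(figsize=(8, 5))

# alpha < 1 keeps both histograms visible where they overlap
ax.hist(sample_developing, bins=shared_bins, alpha=0.6, label="Developing")
ax.hist(sample_developed, bins=shared_bins, alpha=0.6, label="Developed")

ax.set_title("Life expectancy: developed vs developing countries")
ax.set_xlabel("Life expectancy (years)")
ax.set_ylabel("Number of observations")
ax.legend()
```

If one group has far more rows than the other, passing `density=True` to both `hist` calls compares the shapes rather than the raw counts.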

## Task 3: Ten countries with highest average life expectancy in 2015

In [None]:
# PANDAS CODE
df_2015 = df[df['year'] == 2015]
top_10 = dict(df_2015[['country', 'life_expectancy']].sort_values(by='life_expectancy', ascending=False)[:10].values)
top_10

In [None]:
# Your code here
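
A ranked comparison across categories often reads well as a horizontal bar chart, because long country names stay legible on the y-axis. The sketch below assumes a dictionary shaped like `top_10` (name mapped to value). The hypothetical `sample_top_10` uses placeholder names and made-up values so the example runs on its own.

```python
import matplotlib.pyplot as plt

# Placeholder dict shaped like the notebook's `top_10` (values are made up)
sample_top_10 = {f"Country {chr(65 + i)}": 88.0 - i * 0.7 for i in range(10)}

names = list(sample_top_10.keys())
values = list(sample_top_10.values())

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.barh(names, values, color="seagreen")

# barh draws the first item at the bottom; invert so the highest value is on top
ax.invert_yaxis()

ax.set_title("Ten countries with the highest life expectancy in 2015")
ax.set_xlabel("Life expectancy (years)")
```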

## Task 4: Average world-wide life expectancy over 15-year period

In [None]:
# PANDAS CODE
# Select the column before aggregating so non-numeric columns are never averaged
avg_per_year = df.groupby('year')['life_expectancy'].mean().to_dict()
avg_per_year

In [None]:
# Your code here
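
A change in one value over time is typically shown with a line plot. The sketch below assumes a dictionary shaped like `avg_per_year` (year mapped to average). The hypothetical `sample_avg_per_year` holds made-up values, used only so the example is self-contained.

```python
import matplotlib.pyplot as plt

# Placeholder dict shaped like the notebook's `avg_per_year` (values are made up)
sample_avg_per_year = {year: 66.0 + 0.3 * (year - 2000) for year in range(2000, 2016)}

years = list(sample_avg_per_year.keys())
averages = list(sample_avg_per_year.values())

fig, ax = plt.subplots(figsize=(8, 5))

# Markers make the individual yearly data points visible along the line
(line,) = ax.plot(years, averages, marker="o", color="darkorange")

# Show every year as a tick and rotate the labels so they do not overlap
ax.set_xticks(years)
ax.tick_params(axis="x", labelrotation=45)

ax.set_title("Average worldwide life expectancy by year")
ax.set_xlabel("Year")
ax.set_ylabel("Average life expectancy (years)")
```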

## Task 5: Distribution of world-wide life expectancy for each year in 15-year period

In [None]:
# PANDAS CODE
country_expectancy_per_year = df.groupby('year')['life_expectancy'].apply(lambda x: x.values).to_dict()
country_expectancy_per_year

In [None]:
# Your code here
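
To compare many distributions side by side, one option is a box plot per year. The sketch below assumes a dictionary shaped like `country_expectancy_per_year` (year mapped to an array of values). The hypothetical `sample_per_year` is filled with synthetic numbers for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic dict shaped like `country_expectancy_per_year` (not real data)
rng = np.random.default_rng(2)
sample_per_year = {
    year: rng.normal(loc=66 + 0.3 * (year - 2000), scale=9, size=150)
    for year in range(2000, 2016)
}

years = sorted(sample_per_year)
data = [sample_per_year[year] for year in years]

fig, ax = plt.subplots(figsize=(10, 5))

# boxplot draws one box per array, at x positions 1..N by default
box = ax.boxplot(data)

# Replace the default 1..N tick labels with the actual years
ax.set_xticks(range(1, len(years) + 1))
ax.set_xticklabels([str(year) for year in years], rotation=45)

ax.set_title("Distribution of life expectancy by year")
ax.set_xlabel("Year")
ax.set_ylabel("Life expectancy (years)")
```

A violin plot (`ax.violinplot`) is an alternative if you want to see the shape of each distribution rather than only its quartiles.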

## Task 6: Correlation between years of schooling and life expectancy

In [None]:
# PANDAS CODE
# life_expectancy (a pandas Series) was already defined in Task 1
years_of_schooling = df['schooling']

In [None]:
# Your code here
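
The relationship between two numeric variables is commonly shown with a scatter plot. The sketch below uses synthetic arrays with hypothetical names (`sample_schooling`, `sample_life`) in place of `years_of_schooling` and `life_expectancy`. The linear trend in the synthetic data is invented and says nothing about the real dataset. Reporting the Pearson coefficient in the title is an optional extra, not a requirement of the task.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the two notebook Series (the relationship is made up)
rng = np.random.default_rng(3)
sample_schooling = rng.uniform(3, 20, size=300)
sample_life = 45 + 2.0 * sample_schooling + rng.normal(0, 4, size=300)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(sample_schooling, sample_life)[0, 1]

fig, ax = plt.subplots(figsize=(8, 5))

# Small, semi-transparent points reduce overplotting in dense regions
points = ax.scatter(sample_schooling, sample_life, s=15, alpha=0.5, color="purple")

ax.set_title(f"Years of schooling vs life expectancy (r = {r:.2f})")
ax.set_xlabel("Years of schooling")
ax.set_ylabel("Life expectancy (years)")
```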

## Task 7: Life expectancy over 15-year period for 3 countries with highest average life expectancy and 3 countries with lowest average life expectancy

In [None]:
# PANDAS CODE
# Select the column before aggregating so non-numeric columns are never averaged
avg_by_country = df.groupby('country')['life_expectancy'].mean().sort_values(ascending=False)
top_3_country_names = list(avg_by_country.index[:3])
low_3_country_names = list(avg_by_country.index[-3:])

# Build one {'name', 'yearly_expectancies'} record per selected country
country_expectancies_per_year = []
for country in top_3_country_names + low_3_country_names:
    d = {}
    d['name'] = country
    d['yearly_expectancies'] = dict(df[df['country'] == country][['year', 'life_expectancy']].sort_values(by='year').values)
    country_expectancies_per_year.append(d)

country_expectancies_per_year

In [None]:
# Your code here
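
Several time series on a shared axis can be drawn by calling `plot` once per series and labeling each line. The sketch below assumes a list of records shaped like `country_expectancies_per_year`. The hypothetical `sample_country_expectancies` uses placeholder names and made-up values so the example runs on its own.

```python
import matplotlib.pyplot as plt

# Placeholder records shaped like `country_expectancies_per_year` (values are made up)
starting_values = [80.0, 79.0, 78.0, 48.0, 46.0, 45.0]
sample_country_expectancies = [
    {
        "name": f"Country {label}",
        "yearly_expectancies": {
            year: start + 0.3 * (year - 2000) for year in range(2000, 2016)
        },
    }
    for label, start in zip("ABCDEF", starting_values)
]

fig, ax = plt.subplots(figsize=(10, 6))

# One line per country; the label feeds the legend
for entry in sample_country_expectancies:
    yearly = entry["yearly_expectancies"]
    ax.plot(list(yearly.keys()), list(yearly.values()), marker="o", label=entry["name"])

ax.set_title("Life expectancy over time: highest and lowest three countries")
ax.set_xlabel("Year")
ax.set_ylabel("Life expectancy (years)")
ax.legend(title="Country")
```

In the notebook, the year keys come back as floats (for example `2000.0`), because `.values` on a mixed integer and float selection upcasts to float. The plot still works, but you may want to convert the keys with `int` for cleaner tick labels.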