In [159]:
import pandas as pd
import matplotlib.pyplot as plt  # required by clean_plot below

In [3]:
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot, plot_mpl
init_notebook_mode(connected=True)

In [4]:
# Personal plotting function for pretty plots
def clean_plot(leg=True, grid=None, font=None):
    ax = plt.gca()
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.yaxis.set_ticks_position('left')
    ax.xaxis.set_ticks_position('bottom')
    
    axis_color = 'lightgrey'
    ax.spines['bottom'].set_color(axis_color)
    ax.spines['left'].set_color(axis_color)
    ax.tick_params(axis='both', color=axis_color)
    
    if leg:
        ax.legend(frameon = False, loc='upper left', bbox_to_anchor=(1, 1))
        
    if grid is not None:
        plt.grid(color='lightgrey', axis = grid, linestyle='-', linewidth=.5)
        
    if font is not None:
        for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
            ax.get_xticklabels() + ax.get_yticklabels()):
            
            item.set_fontfamily(font['family'])
            item.set_color(font['color'])

## Problem 1

Gapminder Test

Score: 31%

Judging by my 31% score, many of the answers surprised me. One that surprised me in particular was that the proportion of people living in extreme poverty has halved over the past 20 years. My surprise may come from a misunderstanding of the term "extreme poverty". I know that, statistically, less wealthy populations tend to have more children and so would be expanding faster. However, if the definition of "extreme poverty" relates to access to electricity, clean water, and medicine, then the proportional decline makes more sense. In general, I took the Gapminder test with a pessimistic view, assuming the worst for many of the questions. The world seems to be improving more than I had assumed in many of the areas the questions cover.

**Question Restatement: What is the distribution of population by income bracket?**

In [137]:
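# Gapminder data directory
gp = 'ddf--gapminder--systema_globalis'

# Country metadata (income group, display name); needed for the merge at the end of this cell
cont_map = pd.read_csv(f'{gp}/ddf--entities--geo--country.csv')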

popfm = pd.read_csv(f'{gp}/ddf--datapoints--female_population_with_projections--by--geo--time.csv')
popm = pd.read_csv(f'{gp}/ddf--datapoints--male_population_with_projections--by--geo--time.csv')

pop = popfm.merge(popm, how='left', on=['geo', 'time'])
pop['total'] = pop['female_population_with_projections'] + pop['male_population_with_projections']

pop = pop[pop.time <= 2019]

# Get most recent year up to 2019 (don't include projections)
pop = pop.sort_values(['geo', 'time'], ascending=False).drop_duplicates('geo').reset_index(drop=True)

pop = pop.merge(cont_map[['country', 'income_groups', 'name']], how='left', left_on='geo', right_on='country')


In [147]:
fig = px.bar(pop, x="income_groups", y="total", barmode='group', hover_name='name',
            labels=dict(total='Population', income_groups='Income Groups', name='Country'))

fig.layout.title.text = 'Income Bracket Distribution by Population'

fig.show()

The largest share of the world's population lies in the lower-middle income bracket. India has the largest population in that bracket and the second-largest population of any country. The upper-middle income bracket has the second-highest population, followed by high income and low income, respectively. Interestingly, if India and China are removed, the high-income bracket has the highest population.
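The India and China observation can be checked numerically instead of being read off the chart. The following is a minimal sketch, assuming a frame shaped like `pop` above (columns `geo`, `total`, `income_groups`). The helper name `population_by_income`, the geo codes `ind` and `chn`, and the toy numbers are illustrative assumptions, not Gapminder values.

```python
import pandas as pd


def population_by_income(pop, exclude=()):
    """Sum population per income group, optionally dropping some geo codes."""
    kept = pop[~pop["geo"].isin(list(exclude))]
    return kept.groupby("income_groups")["total"].sum().sort_values(ascending=False)


# Toy frame with made-up numbers, only to show the mechanics.
toy = pd.DataFrame({
    "geo": ["ind", "chn", "usa", "nga"],
    "total": [100, 130, 30, 20],
    "income_groups": ["lower_middle_income", "upper_middle_income",
                      "high_income", "lower_middle_income"],
})

with_all = population_by_income(toy)
without_giants = population_by_income(toy, exclude=["ind", "chn"])
```

On the real data, `population_by_income(pop, exclude=["ind", "chn"])` should rank the brackets without the two largest countries, provided those are the codes used in the `geo` column.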

## Problem 2

In [34]:
gdp = pd.read_csv(f'{gp}/ddf--datapoints--gdppercapita_us_inflation_adjusted--by--geo--time.csv').rename(columns={'gdppercapita_us_inflation_adjusted' : 'gdp'})
cont_map = pd.read_csv(f'{gp}/ddf--entities--geo--country.csv')

gdp = gdp.merge(cont_map[['country', 'name', 'world_4region', 'world_6region']],
                how='left', left_on='geo', right_on='country') \
    .rename(columns={'name' : 'Country', 'time' : 'Year', 'gdp' : 'GDP'})
gdp.head()

In [114]:
fig = px.area(gdp, x="Year", y="GDP", color='world_4region', line_group='Country',
             labels=dict(world_4region='Large Continent'))

fig.layout.title.text = 'GDP Over Time by Country and Large Continent'

fig.show()

Generally, GDP increased over time until 2008. That year shows a dip across every continent, likely reflecting the 2008 financial crisis. After an abrupt recovery spike, the plot does not indicate that global GDP ever recovered from the financial crisis. Note that the underlying series is GDP per capita, so the stacked height is a sum of per-capita values across countries rather than true global GDP. Also noteworthy is that Europe and the Americas have accounted for a disproportionate share of the growth since 1960.
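The 2008 dip can also be quantified as a year-over-year change per region. This is a rough sketch, assuming a frame with the `Year`, `GDP`, and `world_4region` columns built above. The function name `yearly_change` and the toy values are hypothetical, and the sum mirrors what the stacked area chart draws (a total of per-capita values).

```python
import pandas as pd


def yearly_change(gdp, group_col="world_4region"):
    """Fractional year-over-year change of the summed GDP column per region."""
    totals = (gdp.groupby(["Year", group_col])["GDP"].sum()
                 .unstack(group_col)
                 .sort_index())
    # (this year - last year) / last year; the first year has no predecessor
    return totals.diff() / totals.shift()


# Toy frame with made-up numbers, only to show the mechanics.
toy = pd.DataFrame({
    "Year": [2007, 2008, 2009, 2007, 2008, 2009],
    "GDP": [60.0, 66.0, 59.0, 40.0, 44.0, 40.0],
    "world_4region": ["europe"] * 6,
})

change = yearly_change(toy)
```

Negative entries mark years in which a region's stacked total fell relative to the previous year.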

In [113]:
fig = px.area(gdp, x="Year", y="GDP", color='world_6region', line_group='Country',
             labels=dict(world_6region='Small Continent'))

fig.layout.title.text = 'GDP Over Time by Country and Small Continent'

fig.show()

A more granular breakdown of continents reveals trends similar to those in the larger-continent plot. An additional insight is that the Sub-Saharan Africa and East Asia Pacific regions have changed the least since 1960.

## Problem 3

In [68]:
life_exp = pd.read_csv(f'{gp}/ddf--datapoints--life_expectancy_years--by--geo--time.csv')
mortality = pd.read_csv(f'{gp}/ddf--datapoints--child_mortality_0_5_year_olds_more_years_version_7--by--geo--time.csv')
gdp = pd.read_csv(f'{gp}/ddf--datapoints--gdppercapita_us_inflation_adjusted--by--geo--time.csv').rename(columns={'gdppercapita_us_inflation_adjusted' : 'gdp'})
cont_map = pd.read_csv(f'{gp}/ddf--entities--geo--country.csv')

gdp = gdp.merge(cont_map[['country', 'name', 'world_4region', 'world_6region']],
                how='left', left_on='geo', right_on='country') \
    .merge(life_exp, how='left', on=['geo', 'time']) \
    .merge(mortality, how='left', on=['geo', 'time']) \
    .rename(columns={'name' : 'Country', 'time' : 'Year', 'gdp' : 'GDP',
                    'child_mortality_0_5_year_olds_more_years_version_7': 'Child Mortality',
                    'life_expectancy_years' : 'Life Expectancy'}) \
    .sort_values('Year')

In [118]:
xcol = "GDP"
ycol = "Life Expectancy"

fig = px.scatter(gdp, x=xcol, y=ycol, animation_frame="Year", animation_group="Country",
               color="world_6region", hover_name="Country", log_x=True,
               # Series.min()/.max() skip NaN values, unlike the built-in min()/max()
               range_x=[round(gdp[xcol].min() * .95), round(gdp[xcol].max() * 1.05)],
               range_y=[round(gdp[ycol].min() * .95), round(gdp[ycol].max() * 1.05)],
               labels=dict(world_6region='Small Continent'))

fig.layout.title.text = 'Life Expectancy by GDP Over Time'

fig.show()

Over time, life expectancy has increased for all continents. Furthermore, the continent averages seem to have grown closer together over time, even with a similar disparity in GDP. Interestingly, China had the lowest life expectancy in 1960, but it rose rapidly relative to the other countries with low life expectancies. One outlier is Lesotho (Sub-Saharan Africa), where life expectancy increased from 50 to 60 years by 1990 and then decreased from 60 to 50 by 2015.
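One way to test the impression that continents are converging is to track the gap between the highest and lowest regional mean for each year. The sketch below assumes the `Year`, `Life Expectancy`, and `world_6region` columns from the merged frame above. The helper `regional_spread` and the toy values are hypothetical, and the means are unweighted country averages rather than population-weighted ones.

```python
import pandas as pd


def regional_spread(df, value_col="Life Expectancy", region_col="world_6region"):
    """Gap between the highest and lowest regional mean, per year."""
    means = df.groupby(["Year", region_col])[value_col].mean().unstack(region_col)
    return means.max(axis=1) - means.min(axis=1)


# Toy frame with made-up numbers, only to show the mechanics.
toy = pd.DataFrame({
    "Year": [1960, 1960, 1960, 2015, 2015, 2015],
    "world_6region": ["a", "a", "b", "a", "a", "b"],
    "Life Expectancy": [40.0, 50.0, 70.0, 60.0, 70.0, 80.0],
})

spread = regional_spread(toy)
```

A spread that shrinks from the first year to the last would support the convergence reading.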

In [149]:
xcol = "GDP"
ycol = "Child Mortality"

fig = px.scatter(gdp, x=xcol, y=ycol, animation_frame="Year", animation_group="Country",
           color="world_6region", hover_name="Country", log_x=True,
           # Series.min()/.max() skip NaN values, unlike the built-in min()/max()
           range_x=[round(gdp[xcol].min() * .95), round(gdp[xcol].max() * 1.05)],
           range_y=[round(gdp[ycol].min() * .95), round(gdp[ycol].max() * 1.05)],
           labels=dict(world_6region='Small Continent'))

fig.layout.title.text = 'Child Mortality by GDP Over Time'

fig.show()

Child mortality decreased over time for all continents, regardless of GDP. The relationship between GDP and child mortality is much stronger in 1960 than in 2012. Sub-Saharan Africa stands out as an outlier, with a noticeably higher child mortality rate relative to other continents.
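The weakening link between GDP and child mortality could be made concrete by fitting a straight line of child mortality against log10(GDP) for each year and comparing the slopes. This is only a sketch under that assumption: a least-squares line is one of several reasonable measures, the helper `mortality_slope` is hypothetical, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def mortality_slope(df, years):
    """Least-squares slope of Child Mortality against log10(GDP), per year."""
    slopes = {}
    for year in years:
        frame = df.loc[df["Year"] == year, ["GDP", "Child Mortality"]].dropna()
        slope, _intercept = np.polyfit(np.log10(frame["GDP"]),
                                       frame["Child Mortality"], 1)
        slopes[year] = slope
    return pd.Series(slopes, name="slope_per_tenfold_gdp")


# Toy frame with made-up numbers, only to show the mechanics.
toy = pd.DataFrame({
    "Year": [1960] * 3 + [2012] * 3,
    "GDP": [100.0, 1000.0, 10000.0] * 2,
    "Child Mortality": [300.0, 200.0, 100.0, 50.0, 20.0, 40.0],
})

slopes = mortality_slope(toy, [1960, 2012])
```

Each slope reads as the change in child mortality associated with a tenfold increase in GDP, so a slope closer to zero in later years would match the pattern described above.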

## Problem 4

In [152]:
popfm = pd.read_csv(f'{gp}/ddf--datapoints--female_population_with_projections--by--geo--time.csv')
popm = pd.read_csv(f'{gp}/ddf--datapoints--male_population_with_projections--by--geo--time.csv')

pop = popfm.merge(popm, how='left', on=['geo', 'time'])
pop['total'] = pop['female_population_with_projections'] + pop['male_population_with_projections']

pop = pop[pop.time <= 2019].merge(cont_map[['country', 'name', 'main_religion_2008', 'world_4region', 'world_6region']],
                                 how = 'left', left_on='geo', right_on='country')

In [158]:
viz = pop
bounds = viz.groupby(['main_religion_2008', 'time']).agg({'total' : 'sum'})

xcol = "main_religion_2008"
ycol = "total"

fig = px.bar(viz, x=xcol, y=ycol, animation_frame="time", barmode='group', hover_name='name',
             range_y=[round(min(bounds[ycol]) * .95), round(max(bounds[ycol]) * 1.05)])

fig.layout.title.text = 'Population for Religious Categories Over Time'
fig.layout.xaxis.title = 'Religion'
fig.layout.yaxis.title = 'Population'

fig.show()

Among countries labeled with a national religion, those labeled with eastern religions account for the largest share of the population. However, China and India once again contribute heavily to the eastern religions bracket, and removing them would dramatically reduce the number of people in that category. Islam grew faster than Christianity and eastern religions between 1950 and 2018. This inference is limited: not everyone in a country with a national religion identifies with that religion, and the visualization does not account for this.
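The growth comparison can be expressed as a ratio of end-year to start-year population per religion label. The sketch below assumes the `main_religion_2008`, `time`, and `total` columns from `pop` above. The helper `growth_by_religion`, the label strings, and the toy numbers are illustrative assumptions.

```python
import pandas as pd


def growth_by_religion(pop, start, end):
    """Ratio of end-year to start-year population for each religion label."""
    totals = (pop.groupby(["main_religion_2008", "time"])["total"].sum()
                 .unstack("time"))
    return (totals[end] / totals[start]).sort_values(ascending=False)


# Toy frame with made-up numbers, only to show the mechanics.
toy = pd.DataFrame({
    "main_religion_2008": ["muslim", "muslim", "christian", "christian"],
    "time": [1950, 2018, 1950, 2018],
    "total": [10, 40, 20, 40],
})

growth = growth_by_religion(toy, 1950, 2018)
```

A ratio of 4.0 means the population in countries with that label quadrupled over the period. As noted above, this counts whole countries, not adherents.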

## Problem 5

I used interactive visualizations to answer the previous questions. Most of the questions were best answered using more than two dimensions, and interactive plots are one way to layer information. Interactive plots also made it easy to inspect individual examples in the plot, and those examples added to the story of the data in some responses.

Generally, I believe static and interactive plots serve different purposes. Static graphs can be useful for drawing out specific stories and pushing the reader to focus on a particular point. One example might be to make the line of interest on a line plot bold, with the other lines faded. Interactive plots can help people explore the data themselves and draw their own inferences. However, in the context of a data narrative, a viewer might miss the point while on their own exploration. An interactive animation is one method that can create a specific narrative while still enabling interactivity. The change in life expectancy by GDP is an example: the animation shows that life expectancy has increased over time for all countries.
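The static emphasis technique mentioned above (one bold line, the rest faded) might look like the following with matplotlib. This is a sketch rather than something used in the answers: the function `highlight_line`, its default column names, and the toy values are all hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def highlight_line(df, focus, x="Year", y="Life Expectancy", group="Country"):
    """Draw one line per group, with `focus` bold and the others faded."""
    fig, ax = plt.subplots()
    for name, part in df.groupby(group):
        is_focus = name == focus
        ax.plot(part[x], part[y],
                color="tab:blue" if is_focus else "lightgrey",
                linewidth=2.5 if is_focus else 1.0,
                zorder=3 if is_focus else 2,  # keep the focus line on top
                label=name if is_focus else None)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(frameon=False)
    return ax


# Toy frame with made-up numbers, only to show the mechanics.
toy = pd.DataFrame({
    "Country": ["Country A"] * 3 + ["Country B"] * 3,
    "Year": [1960, 1990, 2015] * 2,
    "Life Expectancy": [50.0, 60.0, 50.0, 55.0, 65.0, 75.0],
})

ax = highlight_line(toy, focus="Country A")
```

Only the focus line gets a legend entry, which keeps the reader's attention on the single story the plot is meant to tell.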