# Data Visualization Assignment 1 - Jesper Provoost (s1789198)

From the World Bank data, I chose the dataset on access to electricity. I first had to wrangle and clean this dataset, because the data was not properly formatted and contained many missing (NaN) values. I did this with the Trifacta Wrangler software. The wrangled dataset was exported in CSV format and loaded into a Pandas DataFrame.
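The wrangling itself was done in Trifacta Wrangler, so the following is only a rough pandas sketch of the same kind of cleanup, not the steps actually used. It assumes the raw export has one row per country and one column per year. The toy values and the helper name `wide_to_long` are hypothetical.

```python
import pandas as pd

# Toy stand-in for a wide export: one row per country, one column per year.
# Column names and values are assumptions, not the headers of the real file.
raw = pd.DataFrame({
    "Country_Name": ["A", "B"],
    "1990": [50.0, None],
    "1991": [55.0, 80.0],
})

def wide_to_long(df, value_name="electricityAccess"):
    """Turn the year columns into rows and drop missing observations."""
    long_df = df.melt(id_vars="Country_Name", var_name="year", value_name=value_name)
    long_df["year"] = long_df["year"].astype(int)
    return long_df.dropna(subset=[value_name]).reset_index(drop=True)

tidy = wide_to_long(raw)
print(tidy)
```

The long layout (one row per country and year) is what makes the `groupby` and year-filter operations later in this notebook straightforward.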

In [18]:
import pandas as pd
import numpy as np

data = pd.read_csv('Data.csv')

I decided to work with Bokeh.

In [12]:
from bokeh.io import show, output_notebook, push_notebook
from bokeh.plotting import figure
from bokeh.models import HoverTool, ColumnDataSource, ColorMapper, ColorBar
from bokeh.layouts import Column
from ipywidgets import interact, IntSlider

First, I was interested in the general trend in electricity access over the past 26 years. Instead of plotting all countries at once, I grouped the data by year (from 1990 to 2016) and then applied an aggregate mean function. The resulting graph shows the global average electricity access per year. I added a tooltip so that the exact percentage can be inspected for each year.

In [13]:
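# Average every numeric column per year; numeric_only skips text columns such as the country name.
avg_per_year = ColumnDataSource(data.groupby(["year"]).mean(numeric_only=True))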

hover_access = HoverTool(tooltips=[("Year", "@year"),("Electricity access", "@electricityAccess%")],names=["access"])
hover_consumption = HoverTool(tooltips=[("Year", "@year"),("Electricity consumption", "@electricityConsumption kWh/capita")])

plot_access = figure(title="Access to electricity from 1990 to 2016", plot_width=800, plot_height=300, x_axis_label='Year', y_axis_label='Access to electricity (%)', tools=[hover_access])
plot_access.line("year", "electricityAccess", source=avg_per_year, color="blue", line_width=3, line_alpha=0.5, name="access", legend="World Bank Historical Data")
plot_access.line([1990,2016],[66.468,88.264], color="grey", line_dash="dotted", legend="UN Development Goal Projection")

plot_access.legend.location = "top_left"
plot_access.legend.click_policy="hide"

plot_consumption = figure(title="Consumption of electricity from 1990 to 2016", plot_width=800, plot_height=300, x_axis_label='Year', y_axis_label='Electricity consumption per capita (kWh)', tools=[hover_consumption])
plot_consumption.line("year", "electricityConsumption", source=avg_per_year, color="red", line_width=3, line_alpha=0.5)

output_notebook()
show(Column(plot_access,plot_consumption))

This graph gives good insight into how access to electricity has improved globally over the years. In 26 years, the percentage increased from 66.459% to 83.606%.
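The start and end values quoted above can be read off the grouped means directly. The sketch below shows the idea on made-up toy data in the same long layout. The numbers are not the World Bank values. Note that a plain `groupby(...).mean()` gives every country equal weight, so the result is an average over countries rather than a population-weighted figure.

```python
import pandas as pd

# Toy data in the same layout as the notebook's DataFrame (values are made up).
toy = pd.DataFrame({
    "year": [1990, 1990, 2016, 2016],
    "electricityAccess": [40.0, 80.0, 70.0, 100.0],
})

# Unweighted mean over countries for each year, sorted by year.
yearly_mean = toy.groupby("year")["electricityAccess"].mean()

first, last = yearly_mean.iloc[0], yearly_mean.iloc[-1]
change = last - first
print(f"{yearly_mean.index[0]}: {first:.1f}%, "
      f"{yearly_mean.index[-1]}: {last:.1f}%, "
      f"change: {change:+.1f} percentage points")
```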

In [36]:
hover_correlation = HoverTool(tooltips=[("Country", "@Country_Name"),("Electricity access", "@electricityAccess%"),("Urban population", "@urbanPopulation%"),("GDP per capita", "@gdpPerCap $")])

plot_correlation = figure(title="Correlation between urban population and electricity access", x_axis_label='Access to electricity (%)', y_axis_label='Urban population (%)', x_range=(-5,105), y_range=(-5,105),tools=[hover_correlation])
s = plot_correlation.scatter("electricityAccess","urbanPopulation",fill_alpha=0.2,radius="radius",source=ColumnDataSource(data), color="purple")

def update(f):
    # Keep only the rows for the selected year and push the change to the notebook.
    filtered_data = data[data["year"] == f]
    s.data_source.data = ColumnDataSource.from_df(filtered_data)
    push_notebook()

show(plot_correlation, notebook_handle=True)

menu = IntSlider(min=1990,max=2016,step=1,value=1990,description="Year:")

interact(update, f=menu)

<function __main__.update>
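The scatter plot shows the relationship visually but does not quantify it. As a possible complement, the sketch below computes a Pearson correlation coefficient per year. It uses made-up toy data with the same column names as the notebook. The helper `correlation_by_year` is hypothetical and not part of the original analysis.

```python
import pandas as pd

# Toy data with the notebook's column names (values are made up and perfectly linear).
toy = pd.DataFrame({
    "year": [1990] * 3 + [2016] * 3,
    "electricityAccess": [10.0, 50.0, 90.0, 60.0, 80.0, 100.0],
    "urbanPopulation": [20.0, 40.0, 60.0, 30.0, 50.0, 70.0],
})

def correlation_by_year(df):
    """Pearson correlation between access and urban population for each year."""
    result = {
        year: group["electricityAccess"].corr(group["urbanPopulation"])
        for year, group in df.groupby("year")
    }
    return pd.Series(result, name="pearson_r")

r = correlation_by_year(toy)
print(r)
```

`Series.corr` ignores rows where either value is missing, so countries without data for a given year simply drop out of that year's coefficient.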

This visualization is not completely truthful.

In [63]:
country_list = data["Country_Name"].dropna().unique().tolist()
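# Average every numeric column per country; numeric_only skips any remaining text columns.
data_by_country = data.groupby(["Country_Name"]).mean(numeric_only=True)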

p = figure(y_range=country_list, x_range=(0,100), plot_width=800, plot_height=3000)
p.hbar(y="Country_Name", left='electricityAccess', right='urbanPopulation', height=0.4, source=ColumnDataSource(data_by_country))

# Both bar ends (electricity access and urban population) are percentages.
p.xaxis.axis_label = "Percentage (%)"
p.outline_line_color = None

show(p)