# Data Visualization Assignment 1 - Jesper Provoost (s1789198)

From the World Bank data, I chose the dataset on access to electricity. The data was not properly formatted and contained many missing (NaN) values, so I first wrangled and cleaned it with the Trifacta Wrangler software. The wrangled dataset was exported as CSV and loaded into a pandas DataFrame.
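The wrangling itself was done in Trifacta Wrangler, so its exact steps are not part of this notebook. As a rough pandas equivalent, the sketch below assumes the raw export had one column per year (a common layout for World Bank downloads) and reshapes it into the long `Country_Name` / `year` / `electricityAccess` layout that the code below works with, then counts the missing values. The country names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format extract: one row per country, one column per year.
# Country names and values are invented for illustration only.
raw = pd.DataFrame({
    "Country_Name": ["Country A", "Country B"],
    "1990": [50.0, np.nan],
    "1991": [52.5, 99.0],
})

# Reshape to long format: one row per (country, year) pair.
long = raw.melt(id_vars="Country_Name", var_name="year", value_name="electricityAccess")
long["year"] = long["year"].astype(int)

# Count missing (NaN) values per column before deciding how to handle them.
missing_per_column = long.isna().sum()
print(missing_per_column)

# One possible choice: drop rows that have no measurement.
clean = long.dropna(subset=["electricityAccess"])
print(clean)
```

Dropping rows is only one option; keeping the NaN values is also workable, because pandas aggregations such as `mean()` skip them by default.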

![Image](map.png)

```python
import pandas as pd

# Load the wrangled, long-format dataset (one row per country and year)
data = pd.read_csv('Data.csv')
```

I decided to work with Bokeh.

```python
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from bokeh.models import HoverTool, ColumnDataSource, LinearColorMapper
from bokeh.layouts import column
from bokeh.palettes import RdYlGn
```

First, I wanted to see the general trend in electricity access over the 26 years from 1990 to 2016. Instead of plotting all countries at the same time, I grouped the data by year and applied a mean, which gives the worldwide average electricity access per year. This is an unweighted average: every country counts equally, regardless of its population. A tooltip shows the exact percentage for each year. A second graph shows the average electricity consumption per capita, computed the same way.
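The aggregation step can be illustrated on a small long-format table (invented values; column names mirror `Data.csv`). `groupby("year").mean()` gives every country equal weight, and pandas skips NaN values by default, so a year's average is taken only over the countries that reported a value. Passing `numeric_only=True` keeps recent pandas versions from raising an error on the text column `Country_Name`.

```python
import numpy as np
import pandas as pd

# Toy long-format data with invented values; column names mirror Data.csv.
toy = pd.DataFrame({
    "Country_Name": ["A", "B", "C", "A", "B", "C"],
    "year": [1990, 1990, 1990, 2016, 2016, 2016],
    "electricityAccess": [40.0, 80.0, np.nan, 70.0, 90.0, 100.0],
})

# Unweighted mean per year. The NaN for country C is skipped,
# so the 1990 value averages only countries A and B.
toy_avg = toy.groupby("year").mean(numeric_only=True)
print(toy_avg)
```

Because missing countries simply drop out of a year's mean, a change in which countries report data can shift the average on its own, independent of any real change in access.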

```python
# Unweighted average over all countries for each year; numeric_only skips text columns
avg_per_year = ColumnDataSource(data.groupby("year").mean(numeric_only=True))

plot_access = figure(title="Access to electricity from 1990 to 2016", width=800, height=300,
                     x_axis_label="Year", y_axis_label="Access to electricity (%)", tools="")
access_line = plot_access.line("year", "electricityAccess", source=avg_per_year, color="blue",
                               line_width=3, line_alpha=0.5, legend_label="World Bank Historical Data")
# Dotted straight line between the projected 1990 and 2016 values
plot_access.line([1990, 2016], [67.465, 88.264], color="grey", line_dash="dotted",
                 legend_label="UN Development Goal Projection")
# Attach the tooltip to the historical line only
plot_access.add_tools(HoverTool(renderers=[access_line],
                                tooltips=[("Year", "@year"), ("Electricity access", "@electricityAccess%")]))
plot_access.legend.location = "top_left"
plot_access.legend.click_policy = "hide"

hover_consumption = HoverTool(tooltips=[("Year", "@year"), ("Electricity consumption", "@electricityConsumption kWh/capita")])
plot_consumption = figure(title="Consumption of electricity from 1990 to 2016", width=800, height=300,
                          x_axis_label="Year", y_axis_label="Electricity consumption per capita (kWh)",
                          tools=[hover_consumption])
plot_consumption.line("year", "electricityConsumption", source=avg_per_year, color="red", line_width=3, line_alpha=0.5)

output_notebook()
show(column(plot_access, plot_consumption))
```

```python
plot_gdp = figure(title="Correlation between GDP per capita and electricity access",
                  x_axis_label="Access to electricity (%)", y_axis_label="GDP per capita ($)", tools="")

# One faint line per country, traced in chronological order
for country in data["Country_Name"].unique():
    country_rows = data[data["Country_Name"] == country].sort_values("year")
    plot_gdp.line("electricityAccess", "gdpPerCap", source=ColumnDataSource(country_rows),
                  line_alpha=0.2, color="green")

# Dot at each country's latest complete observation
# (a column-wise max() could combine values from different years)
latest = (data.dropna(subset=["electricityAccess", "gdpPerCap"])
              .sort_values("year").groupby("Country_Name").tail(1))
dots = plot_gdp.scatter("electricityAccess", "gdpPerCap", source=ColumnDataSource(latest),
                        line_alpha=0.5, color="green")
plot_gdp.add_tools(HoverTool(renderers=[dots], tooltips=[("Country", "@Country_Name"), ("Year", "@year")]))

show(plot_gdp)
```

The access graph above shows how access to electricity has improved globally: over these 26 years, the average access rose from 66.459% to 83.606%.
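As a quick check of these figures, the sketch below computes the absolute change, the average yearly gain, and the gap between the 2016 value and the end point of the dotted projection line (88.264%, taken from the plotting code). It uses only the numbers quoted in this notebook; nothing is recomputed from the data.

```python
# Figures quoted in the text and in the projection line of the plotting code
access_1990 = 66.459
access_2016 = 83.606
projection_2016 = 88.264
years = 2016 - 1990

change = access_2016 - access_1990               # percentage points
per_year = change / years                        # average gain per year, in percentage points
relative = change / access_1990                  # increase relative to the 1990 level
gap_to_projection = projection_2016 - access_2016

print(f"{change:.3f} pp over {years} years, {per_year:.3f} pp/year, "
      f"{relative:.1%} relative, {gap_to_projection:.3f} pp below the projection")
```

Read this way, the 2016 average sits about 4.7 percentage points below the end of the dotted projection line.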

```python
hover = HoverTool(tooltips=[("Country", "@Country_Name"), ("Electricity access", "@electricityAccess%"),
                            ("Urban population", "@urbanPopulation%"), ("GDP per capita", "@gdpPerCap $")])

# Per-country averages over 1990-2016; numeric_only skips text columns
source = ColumnDataSource(data.groupby("Country_Name").mean(numeric_only=True))

plot_urban = figure(title="Correlation between urban population and electricity access",
                    x_axis_label="Access to electricity (%)", y_axis_label="Urban population (%)",
                    x_range=(-5, 105), y_range=(-5, 105), tools=[hover])
# radius="radius" reads circle sizes (in data units) from a "radius" column,
# which must already exist in Data.csv; it is not computed in this notebook
plot_urban.circle("electricityAccess", "urbanPopulation", radius="radius", fill_alpha=0.2,
                  color="purple", source=source)

show(plot_urban)
```

<img style="float: left;" src="vis1.gif" width="600">

This visualization is not completely truthful.

```python
# Wide table: one row per country, one column per year, values = electricity access (%)
access_by_country = (data.pivot(index="Country_Name", columns="year", values="electricityAccess")
                         .dropna()
                         .sort_values(1990, ascending=False)
                         .reset_index())
# Skip countries that already had full access in 1990
access_by_country = access_by_country[access_by_country[1990] != 100]
# String column names so they can be used as Bokeh field names ("1990", "2016")
access_by_country.columns = access_by_country.columns.astype(str)

country_list = access_by_country["Country_Name"].tolist()
bar_source = ColumnDataSource(access_by_country)

p = figure(title="Increase in electricity access between 1990 and 2016", y_range=country_list, x_range=(0, 100),
           width=800, height=2000, outline_line_color=None, tools="")
bars = p.hbar(y="Country_Name", left="1990", right="2016", height=0.5, source=bar_source)
# Diamond marker at each country's 2016 value
p.scatter(x="2016", y="Country_Name", marker="diamond", angle=4.71, size=10, source=bar_source)
p.add_tools(HoverTool(renderers=[bars], tooltips=[("Country", "@Country_Name"),
                                                  ("Access in 1990", "@{1990}%"), ("Access in 2016", "@{2016}%")]))

show(p)
```

```python
# Per-country averages over 1990-2016; numeric_only skips text columns
source = ColumnDataSource(data.groupby("Country_Name").mean(numeric_only=True))
# Same color mapping as swapping low and high, written with low < high by reversing the palette
mapper = LinearColorMapper(palette=RdYlGn[10][::-1], low=data.renewableEnergy.min(), high=data.renewableEnergy.max())

hover = HoverTool(tooltips=[("Country", "@Country_Name"), ("Electricity access", "@electricityAccess%"),
                            ("Renewable energy", "@renewableEnergy%"), ("Urban population", "@urbanPopulation%"),
                            ("GDP per capita", "@gdpPerCap $")])

plot_renewable = figure(title="Do developing countries lead the way towards green energy?",
                        x_axis_label="Access to electricity (%)", y_axis_label="Use of renewable energy sources (%)",
                        width=600, height=600, x_range=(-5, 105), y_range=(-5, 105), tools=[hover])
# Color each circle by its average renewable share; sizes again come from the "radius" column in Data.csv
plot_renewable.circle("electricityAccess", "renewableEnergy", radius="radius", fill_alpha=0.2, source=source,
                      color={"field": "renewableEnergy", "transform": mapper})

show(plot_renewable)
```

<img style="float: left;" src="vis2.gif" width="600">