# Data Visualization Assignment 1 - Jesper Provoost (s1789198)

From the World Bank data, I chose the dataset on access to electricity as my base dataset. It contains the percentage of each country's population that has access to electricity. I was curious about this indicator and its possible correlations with other indicators.

First, I had to wrangle and clean the data, since it was not properly formatted and contained some missing (NaN) values. I did this using the Trifacta Wrangler software. To explore possible causal links and connections with other data, I added other World Bank datasets (on GDP per capita, urban population and renewable energy usage) to the electricity access dataset. The wrangled datasets were combined and then exported in CSV format using Trifacta.
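
The combining step was done in Trifacta, so the following is only a rough pandas sketch of the same idea, not the procedure that was actually used. It assumes each indicator is available as a long table keyed by `Country_Name` and `year`; the values below are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for two wrangled World Bank indicators (values are invented)
access = pd.DataFrame({
    "Country_Name": ["A", "A", "B"],
    "year": [1990, 2016, 2016],
    "electricityAccess": [40.0, 75.0, 100.0],
})
gdp = pd.DataFrame({
    "Country_Name": ["A", "A", "B"],
    "year": [1990, 2016, 2016],
    "gdpPerCap": [500.0, 1500.0, 40000.0],
})

# Keep every row of the base dataset and attach the extra indicator to it
combined = access.merge(gdp, on=["Country_Name", "year"], how="left")

# Drop rows where the base indicator itself is missing
combined = combined.dropna(subset=["electricityAccess"])
print(combined)
```

A left join keeps the electricity access dataset as the base, so a country that lacks a GDP value still appears in the result with NaN in that column.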

### *Access to electricity is not a universal privilege*

To explore the electricity access data per country, I imported the dataset into Tableau. Then I created a world map that shows, per country, the percentage of the population with access to electricity. For the viewer this map is functional, since it is easy to find the areas where electricity access is low and where it is high. Since the dataset contains all countries of the world, a world map is an ideal way to visualize the percentages per country. It is insightful for viewers to see that all countries with low electricity access are located around the equator. In Western countries like the Netherlands, we take the availability of electricity for granted. It is enlightening for viewers to see that the availability of electricity is almost zero in a significant number of countries. It will make them realize that access to electricity is definitely not a universal privilege.

Then I imported the CSV file into a pandas DataFrame, which makes it easy to work with the data in Python.

In [4]:
%%HTML
<div class="flourish-embed" data-src="visualisation/46846"></div><script src="https://public.flourish.studio/resources/embed.js"></script>

In [None]:
import pandas as pd
import numpy as np

data = pd.read_csv('data.csv')

I decided to work with Bokeh to create the visualizations. Since I have been learning Python over the last few months, I wanted to learn more about Python tools for data visualization and analytics. I discovered that Bokeh is a great tool for creating interactive visualizations, for example with tooltips, widgets and animations. I believe that these interactive visualizations can help people understand the message of my data better and more quickly, which is why I am eager to learn a framework like Bokeh.

In [None]:
from bokeh.io import show, output_notebook, push_notebook
from bokeh.plotting import figure
from bokeh.models import HoverTool, ColumnDataSource, ColorBar, Diamond, LinearColorMapper, GeoJSONDataSource
from bokeh.layouts import Column
from bokeh.palettes import RdYlGn
from ipywidgets import interact, IntSlider

First, I was interested in the general trend in electricity access between 1990 and 2016. Instead of plotting all countries at the same time, I grouped the data by year and then applied an aggregate mean function. This results in a graph which shows the global average electricity access per year. I added a tooltip so that the exact percentage can be read off for each year.
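
To illustrate what the grouping step produces, here is a small sketch on invented numbers (two hypothetical countries, two years). Passing `numeric_only=True` is an addition of mine: it restricts the mean to numeric columns so that text columns such as `Country_Name` are skipped.

```python
import pandas as pd

# Toy data: two hypothetical countries observed in two years
toy = pd.DataFrame({
    "Country_Name": ["A", "B", "A", "B"],
    "year": [1990, 1990, 2016, 2016],
    "electricityAccess": [50.0, 90.0, 70.0, 100.0],
})

# One row per year, holding the unweighted mean over all countries
toy_avg = toy.groupby("year").mean(numeric_only=True)
print(toy_avg)
```

Note that this is an unweighted mean over countries: a small and a large country count equally, so the result is not the share of the world population with access.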

In [None]:
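# Average every numeric indicator per year; numeric_only skips text columns such as Country_Name
avg_per_year = ColumnDataSource(data.groupby(["year"]).mean(numeric_only=True))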

hover = HoverTool(tooltips=[("Year", "@year"),("Electricity access", "@electricityAccess%")],names=["access"])

plot_access = figure(title="Access to electricity from 1990 to 2016", plot_width=800, plot_height=300, x_axis_label='Year', y_axis_label='Access to electricity (%)', tools=[hover])
plot_access.line("year", "electricityAccess", source=avg_per_year, color="blue", line_width=3, line_alpha=0.5, name="access", legend="World Bank Historical Data")
plot_access.line([1990,2016],[67.465,88.264], color="grey", line_dash="dotted", legend="UN Development Goal Projection")

plot_access.legend.location = "top_left"
plot_access.legend.click_policy="hide"

output_notebook()
show(plot_access)

In [None]:
hover = HoverTool(tooltips=[("Country", "@Country_Name"),("Electricity access", "@electricityAccess%"),("GDP per capita", "@gdpPerCap{int}$")],names=["circle"])

plot_correlation = figure(title="Correlation between GDP per capita and access to electricity",width=900,height=400, x_axis_label='Access to electricity (%)', y_axis_label='GDP per capita ($)', tools=[hover])
plot_correlation.left[0].formatter.use_scientific = False

for x in data["Country_Name"].unique():
    plot_correlation.line("electricityAccess","gdpPerCap",source=ColumnDataSource(data[data["Country_Name"]==x]),line_alpha=0.2,color="green")

plot_correlation.circle("electricityAccess","gdpPerCap",source=ColumnDataSource(data[data["year"]==2016]),line_alpha=0.3,color="green",name="circle")

show(plot_correlation)

The line graph of average electricity access per year gives great insight into how access to electricity has improved globally over the years. In 26 years, the percentage has increased from 66.459% to 83.606%.
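
The two percentages quoted above can be put in perspective with a quick calculation. This is only a back-of-the-envelope sketch; the variable names are my own.

```python
# Average electricity access quoted in the text for 1990 and 2016
start, end = 66.459, 83.606
years = 2016 - 1990

gain_points = end - start               # absolute gain in percentage points
per_year = gain_points / years          # average gain per year
relative = gain_points / start * 100    # gain relative to the 1990 level, in %

print(f"{gain_points:.3f} points in total, {per_year:.2f} points per year, {relative:.1f}% relative increase")
```

This works out to a gain of roughly 17.1 percentage points, or about 0.66 points per year on average.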

In [None]:
hover = HoverTool(tooltips=[("Country", "@Country_Name"),("Electricity access", "@electricityAccess%"),("Urban population", "@urbanPopulation%"),("GDP per capita", "@gdpPerCap{int}$")])

source = ColumnDataSource(data.groupby("Country_Name").mean(numeric_only=True))

plot_correlation = figure(title="Correlation between urban population and electricity access", x_axis_label='Access to electricity (%)', y_axis_label='Urban population (%)', x_range=(-5,105), y_range=(-5,105),tools=[hover])
plot_correlation.scatter("electricityAccess","urbanPopulation",fill_alpha=0.2,radius="radius",source=source, color="purple")

show(plot_correlation)

In [None]:
from IPython.display import Image
Image("vis1.gif",width=600, height=600)

This visualization is not completely truthful.

In [None]:
access_by_country = data.pivot(index="Country_Name", columns="year", values="electricityAccess").groupby(["Country_Name"]).mean().reset_index().sort_values(1990, ascending=False).dropna()
access_by_country = access_by_country[access_by_country[1990]!=100]
access_by_country.columns = access_by_country.columns.astype(str)

country_list = access_by_country["Country_Name"].unique()
hover = HoverTool(tooltips=[("Country", "@Country_Name"),("Access in 1990", "@1990%"),("Access in 2016", "@2016%")],names=["bar"])

p = figure(title="Increase in electricity access between 1990 and 2016",y_range=country_list, x_range=(0,100), plot_width=800, plot_height=2000, outline_line_color=None,tools=[hover])
p.hbar(y="Country_Name", left="1990", right="2016", height=0.5, source=ColumnDataSource(access_by_country), name="bar")
p.diamond(x="2016",y="Country_Name",angle=4.71,size=10,source=ColumnDataSource(access_by_country))

show(p)

In [None]:
source = ColumnDataSource(data.groupby("Country_Name").mean(numeric_only=True))
mapper = LinearColorMapper(palette=RdYlGn[10], low=data.renewableEnergy.max(), high=data.renewableEnergy.min())

hover = HoverTool(tooltips=[("Country", "@Country_Name"),("Electricity access", "@electricityAccess%"),("Urban population", "@urbanPopulation%"),("GDP per capita", "@gdpPerCap{int}$")])

plot_correlation = figure(title="Do developing countries lead the way towards sustainable energy?", x_axis_label='Access to electricity (%)', y_axis_label='Use of renewable energy sources (%)', width=600, height=600, x_range=(-5,105), y_range=(-5,105),tools=[hover])
s = plot_correlation.circle("electricityAccess","renewableEnergy",fill_alpha=0.2,radius="radius",source=source, color={'field': 'renewableEnergy', 'transform': mapper})

show(plot_correlation)

In [None]:
Image("vis2.gif",width=600, height=600)