# A Study of the Energy Sources used for Global Electricity Generation

___
### __Visualization Goals__

Global warming is a critical issue facing all of humanity.  There has been a worldwide movement to adopt more renewable energy sources, such as wind and solar power, in order to reduce the impacts of burning fossil fuels to generate electricity.

The first goal of these visualizations is to encourage the discovery of how reliant humans are on fossil fuels. The second goal is to identify which continents have been increasing or decreasing their use of fossil fuels to generate electricity.  The third goal is to demonstrate how a continent's GDP and population affect its demand for electricity and the resulting pollution.

___
### __Dataset Import & Transform__

The “Data on Energy” dataset was selected from the "Our World in Data" GitHub repository (https://github.com/owid/energy-data).  This dataset contains extensive information on energy production and consumption.

A subset of the data from 1985 to 2018 was taken, focusing on the types of energy used to generate electricity on each continent.  The "energy-data.csv" and "continents.csv" files contain the raw data from the Our World in Data repository.  These .csv files were combined and transformed externally using a Knime workflow to produce the "energy.csv" file.  This "energy" dataset will be used to study how electricity is generated globally.

#### *Knime workflow:*

![workflow](./workflow.svg)
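
The workflow above was built in Knime, but the same kind of join-and-aggregate step can be sketched in pandas. The snippet below is only an illustration on toy data: the column names (`country`, `year`, `continent`, `coal_twh`) and values are assumptions and do not reflect the actual schemas of "energy-data.csv" or "continents.csv".

```python
import pandas as pd

# Toy stand-ins for "energy-data.csv" and "continents.csv".
# Column names and values are hypothetical, not the real schemas.
energy_raw = pd.DataFrame({
    'country': ['Kenya', 'Egypt', 'France', 'Kenya', 'Egypt', 'France'],
    'year': [1985, 1985, 1985, 1986, 1986, 1986],
    'coal_twh': [0.0, 1.0, 50.0, 0.0, 2.0, 48.0],
})
continents = pd.DataFrame({
    'country': ['Kenya', 'Egypt', 'France'],
    'continent': ['Africa', 'Africa', 'Europe'],
})

# Attach a continent to every country row
merged = energy_raw.merge(continents, on='country', how='inner')

# Keep the study period, then total each continent's generation per year
by_continent = (
    merged[merged['year'].between(1985, 2018)]
    .groupby(['year', 'continent'], as_index=False)['coal_twh']
    .sum()
)
print(by_continent)
```

An inner join silently drops countries that are missing from the lookup table, so a real pipeline would likely want to check for unmatched rows before aggregating.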

In [2]:
# Importing Python packages
import pandas as pd
import altair as alt

In [31]:
# Import energy dataframe
energy = pd.read_csv('https://raw.githubusercontent.com/ryan-bulger/electricity-sources/main/data/energy.csv')

# Set N/A = 0
energy = energy.fillna(0)
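
Replacing missing values with 0 keeps the later sums simple, but it also makes an unrecorded value indistinguishable from a true zero. The sketch below uses a small toy series (the years and the missing entries are made up for illustration) to show how zero-filling can pull an average down, which is one reason the later cells restrict the greenhouse gas analysis to years where all continents have recorded values.

```python
import numpy as np
import pandas as pd

# Toy emissions series with two unrecorded years (hypothetical example)
toy_ghg = pd.Series([np.nan, np.nan, 236.5, 244.2], index=[1998, 1999, 2000, 2001])

# pandas skips NaN by default, so only the two recorded years are averaged
mean_skipping_missing = toy_ghg.mean()

# After fillna(0), the two unrecorded years count as zero emissions
mean_zero_filled = toy_ghg.fillna(0).mean()

print(mean_skipping_missing, mean_zero_filled)
```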

In [32]:
# Format energy dataframe

# Energy source groups
energy['Low Carbon Sources (TWh)'] = sum([energy['Biofuel Power (TWh)'], energy['Hydro Power (TWh)'], energy['Other Renewable Power (TWh)'],
                                            energy['Solar Power (TWh)'], energy['Wind Power (TWh)'], energy['Nuclear Power (TWh)']])
energy['High Carbon Sources (TWh)'] = sum([energy['Coal Power (TWh)'], energy['Oil Power (TWh)'], energy['Natural Gas Power (TWh)']])

# Per capita calculations
energy['GDP per capita ($/person)'] = energy['GDP ($)'] / energy['Population']
# 1 TWh = 10**9 kWh
energy['Electricity Generation per capita (kWh per person)'] = sum([energy['Low Carbon Sources (TWh)'],
                                                                    energy['High Carbon Sources (TWh)']]) / energy['Population'] * 10**9
# 1 MM tonnes = 10**6 tonnes
energy['GHGs per capita (Tonnes CO2e per person)'] = energy['GHGs from Electricity Generation (MM tonnes of CO2e)'] / energy['Population'] * 10**6

# Display energy dataframe
energy

Unnamed: 0,Year,Continent,Biofuel Power (TWh),Coal Power (TWh),Natural Gas Power (TWh),Hydro Power (TWh),Nuclear Power (TWh),Oil Power (TWh),Other Renewable Power (TWh),Solar Power (TWh),Wind Power (TWh),Population,GDP ($),Electricity Demand (TWh),GHGs from Electricity Generation (MM tonnes of CO2e),Low Carbon Sources (TWh),High Carbon Sources (TWh),GDP per capita ($/person),Electricity Generation per capita (kWh per person),GHGs per capita (Tonnes CO2e per person)
0,1985,Africa,0.00,135.403,9.236,10.904,5.315,11.911,0.00,0.00,0.00,540131194,1225082245818,0.00,0.00,16.219,156.550,2268.119782,319.864881,0.000000
1,1986,Africa,0.00,135.988,10.289,11.488,8.803,13.242,0.00,0.00,0.00,555618223,1251227005661,0.00,0.00,20.291,159.519,2251.954587,323.621495,0.000000
2,1987,Africa,0.00,142.773,11.716,11.911,6.167,15.011,0.00,0.00,0.00,571509829,1267464622308,0.00,0.00,18.078,169.500,2217.747724,328.214828,0.000000
3,1988,Africa,0.00,143.032,13.156,13.110,10.493,16.779,0.00,0.00,0.00,587756630,1323389864320,0.00,0.00,23.603,172.967,2251.594957,334.441144,0.000000
4,1989,Africa,0.00,148.462,13.715,13.629,11.099,17.412,0.00,0.00,0.00,604293660,1364475496994,0.00,0.00,24.728,179.589,2257.967586,338.108793,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
199,2014,South America,56.44,48.470,219.320,636.750,19.720,96.790,0.00,0.82,15.52,408493800,6223611926995,1094.33,243.65,729.250,364.580,15235.511352,2677.715060,0.596459
200,2015,South America,58.74,52.190,233.360,636.930,20.410,96.790,0.00,1.71,27.29,412362690,6391625133915,1127.52,254.33,745.080,382.340,15500.008340,2734.049484,0.616763
201,2016,South America,59.78,52.190,217.070,653.650,22.650,79.470,0.00,3.21,40.81,416164870,5922122056469,1129.29,235.09,780.100,348.730,14230.230573,2712.458646,0.564896
202,2017,South America,60.59,49.390,222.240,659.150,20.570,66.320,0.06,5.39,51.57,419903920,5944422958512,1136.69,226.66,797.330,337.950,14156.626493,2703.666115,0.539790
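
As a quick sanity check on the unit conversions, the per capita formulas can be reproduced by hand for two rows of the table above. This is a minimal sketch; the helper function names are hypothetical and exist only for this illustration.

```python
TWH_TO_KWH = 10**9            # 1 TWh = 1,000,000,000 kWh
MM_TONNES_TO_TONNES = 10**6   # 1 million tonnes = 1,000,000 tonnes


def generation_per_capita_kwh(generation_twh, population):
    """Convert total generation in TWh to kWh per person."""
    return generation_twh * TWH_TO_KWH / population


def ghg_per_capita_tonnes(ghg_mm_tonnes, population):
    """Convert emissions in MM tonnes of CO2e to tonnes per person."""
    return ghg_mm_tonnes * MM_TONNES_TO_TONNES / population


# Africa, 1985 (row 0): 16.219 TWh low carbon + 156.550 TWh high carbon
africa_1985 = generation_per_capita_kwh(16.219 + 156.550, 540131194)

# South America, 2017 (row 202): 226.66 MM tonnes of CO2e
south_america_2017 = ghg_per_capita_tonnes(226.66, 419903920)

print(africa_1985, south_america_2017)
```

Both results agree with the corresponding table values (319.864881 kWh per person and 0.539790 tonnes CO2e per person) to the displayed precision.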


___
### __Summary & Justification of the Key Design Elements__

#### *Task Elicitation*  

>**Task 1**  
- Goal:
    - To understand how the mixture of energy sources used for global electricity generation has changed over time, and to demonstrate how humans are reliant on fossil fuels for the generation of electricity.
- Means:
    - Users will navigate the visualizations by scanning through the years to see what the breakdown has been between high and low carbon generation sources.
- Characteristics:
    - The task's intention is to determine whether the use of high carbon sources has trended upwards or downwards compared to low carbon sources.
- Target data:
    - The data used will be created from the total amount of electricity generated from each generation type.  These generation types will be broken out into two groups: High-carbon sources (Coal, Oil, & Natural Gas), and Low-carbon sources (Wind, Solar, Hydro, Biofuels, Nuclear, and Other Renewables).  These generation types and groups will be compared to the Year data.
- Workflow:
    - The visualization will be broken down into two charts.  The first chart will show a 100% area chart comparing the two generation groups (High-carbon & Low-carbon sources) on the y-axis versus the Year on the x-axis.  Users will be able to hover over this chart and a vertical line will track the cursor to show which year is currently being evaluated.  A second chart will be a waterfall chart showing the total generation and a breakdown of each generation type for the year that is being highlighted.
- Roles:
    - This visualization will be targeted towards the general public to assist in their understanding of global electricity generation.
 
>**Task 2**  
- Goal:
    - To understand how greenhouse gases and the generation of electricity are connected, and which continents are increasing or decreasing pollutants due to the generation of electricity.
- Means:
    - Users will use this visualization to compare the amount of greenhouse gases emitted by each continent over time.
- Characteristics:
    - Users will evaluate the low-level characteristics of the visualization by comparing the overall trends of greenhouse gases over time by continent.
- Target data:
    - The greenhouse gas emissions will be compared to the Year for each individual continent.
- Workflow:
    - There will be 6 area charts, one for each continent, faceted into 3 rows by 2 columns.  Users will scan and compare the trends between each continent.
- Roles:
    - This visualization will be targeted towards the general public to assist in their understanding of global electricity generation.

>**Task 3**  
- Goal:
    - To understand how the population and per capita GDP of a continent affect the type of electricity generation and the resulting per capita greenhouse gas emissions.
- Means:
    - Users will be able to organize the data by zooming the view to a single continent to understand how generation and emissions are related to GDP.
- Characteristics:
    - The users will observe the high-level characteristics of patterns and overall trends for each continent, and how those trends are comparable to the other continents.
- Target data:
    - The GDP per capita will be compared to both the Electricity Generation per capita and the Greenhouse Gases per capita.
- Workflow:
    - There will be two bubble charts with the GDP per capita on the x-axis for both charts.  The first chart will compare the Generation per capita on the y-axis to the GDP per capita for each continent, and the second chart will compare the Greenhouse Gases per capita on the y-axis to the GDP per capita for each continent.  The size of the bubbles will indicate the population.  All years will be displayed with bubbles getting a darker hue by year to allow the user to observe how the data changes over time.
- Roles:
    - This visualization will be targeted towards the general public to assist in their understanding of global electricity generation.

___
### __Visualization Implementation__

##### *Low-fidelity Prototyping:*

<img src="Task1-2.jpg" width=500>
<img src="Task3.jpg" width=500>

___
### __Visualizations__

#### Task 1 - Mixture of Global Electricity Generation Types

In [5]:
# Create generation dataframe

# Renaming cols
generation = energy.rename({'Biofuel Power (TWh)':'Biofuel',
                            'Coal Power (TWh)':'Coal',
                            'Natural Gas Power (TWh)':'Nat. Gas',
                            'Hydro Power (TWh)':'Hydro',
                            'Nuclear Power (TWh)':'Nuclear',
                            'Oil Power (TWh)':'Oil',
                            'Other Renewable Power (TWh)':'Other',
                            'Solar Power (TWh)':'Solar',
                            'Wind Power (TWh)':'Wind'
                            },
                            axis=1)

# Pivoting dataframe from wide to long format
generation = generation.melt(id_vars=['Year'], 
                            value_vars=['Coal','Biofuel','Nat. Gas','Hydro','Nuclear','Oil','Other','Solar','Wind'],
                            var_name='EnergyType',
                            value_name='TWh')

# Adding High/Low Carbon nomenclature to dataframe
high_c = ['Coal', 'Oil', 'Nat. Gas']
generation['SourceType'] = ['High Carbon Sources' if x in high_c else 'Low Carbon Sources' for x in generation['EnergyType']]

# Display generation dataframe
generation

Unnamed: 0,Year,EnergyType,TWh,SourceType
0,1985,Coal,135.403,High Carbon Sources
1,1986,Coal,135.988,High Carbon Sources
2,1987,Coal,142.773,High Carbon Sources
3,1988,Coal,143.032,High Carbon Sources
4,1989,Coal,148.462,High Carbon Sources
...,...,...,...,...
1831,2014,Wind,15.520,Low Carbon Sources
1832,2015,Wind,27.290,Low Carbon Sources
1833,2016,Wind,40.810,Low Carbon Sources
1834,2017,Wind,51.570,Low Carbon Sources
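
The wide-to-long reshape above can be easier to follow on a tiny example. The sketch below uses a made-up two-row frame; note that because `Continent` is not listed in `id_vars`, it is dropped by `melt`, so summing by `Year` and `EnergyType` yields global totals, which is what the Task 1 charts aggregate with `sum(TWh)`.

```python
import pandas as pd

# Toy wide-format frame: one row per year and continent (made-up values)
wide = pd.DataFrame({
    'Year': [1985, 1985],
    'Continent': ['Africa', 'Europe'],
    'Coal': [135.0, 900.0],
    'Wind': [0.0, 1.0],
})

# Wide -> long: one row per (Year, EnergyType) observation
long_df = wide.melt(id_vars=['Year'],
                    value_vars=['Coal', 'Wind'],
                    var_name='EnergyType',
                    value_name='TWh')

# Continent is not an id_var, so these sums are global totals
totals = long_df.groupby(['Year', 'EnergyType'])['TWh'].sum()
print(long_df)
print(totals)
```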


In [12]:
# Create High/Low Carbon stacked bar chart
bar_genType = alt.Chart(generation).mark_bar(color='grey', opacity=0.7, size=13).encode(
    x=alt.X('Year:N', title=None),
    y=alt.Y('sum(TWh):Q', stack=True, title='Terawatt Hours'),
    color=alt.Color(
        'SourceType:N', 
        scale=alt.Scale(range=['#303030','green']),
        legend=alt.Legend(
            direction='horizontal',
            orient='top'),
        title=None),
    order = alt.Order('SourceType', sort='descending'),
).properties(height=300, width=450)

# Create Generation Energy Type bar chart
bar_genSource = alt.Chart(generation).mark_bar().encode(
    x=alt.X('EnergyType:N', sort='-y', title=None,),
    y=alt.Y('sum(TWh):Q', title='Terawatt Hours',scale=alt.Scale(domainMin=0)),
    color=alt.Color(
        'EnergyType',
        scale=alt.Scale(
            domain=['Coal','Oil','Nat. Gas','Hydro','Solar','Wind','Biofuel','Nuclear','Other'],
            range=['#303030','#303030','#303030','green','green','green','green','green','green'],
        ),
        legend=None))

In [13]:
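# Hover selection on Year (hold <Shift> to select multiple years)
# Note: Altair 5 deprecates selection_multi/add_selection in favor of selection_point/add_params
interval = alt.selection_multi(fields=['Year'], on='mouseover')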

# Background genType bar chart
bar_genType_background = bar_genType.add_selection(interval)

# Selected genType bar chart 
bar_genType_selected = bar_genType.transform_filter(interval).mark_bar().encode(
    y=alt.Y('sum(TWh):Q', stack=True),
    color=alt.Color(
        'SourceType:N', 
        scale=alt.Scale(range=['#303030','green']),
        legend=alt.Legend(direction='horizontal', orient='top'), title=None))

# Selected genSource bar chart
bar_genSource_selected = bar_genSource.transform_filter(interval)

# Concatenate charts
alt.concat(bar_genType_background + bar_genType_selected, bar_genSource_selected
).resolve_scale(color='independent',
).properties(title=alt.TitleParams(
    text='Global Electricity Generation Sources',
    fontSize=20,
    subtitle='*Hover mouse to highlight a year.  Hold down <Shift> to select multiple years.',
    subtitleFontSize=10))

#### Task 2 - Greenhouse Gas Emissions from Electricity Generation

In [8]:
# Create Greenhouse Gases dataframe

# Filter to years >= 2000 because GHGs were not recorded for all continents prior to 2000
ghg = energy[energy['Year']>1999]

# Rename cols (rename returns a new dataframe, which avoids pandas' SettingWithCopyWarning)
ghg = ghg.rename(columns={'GHGs from Electricity Generation (MM tonnes of CO2e)':'GHGs'})
ghg = ghg[['Year','Continent','GHGs']]

# Display ghg dataframe
ghg

Unnamed: 0,Year,Continent,GHGs
15,2000,Africa,236.50
16,2001,Africa,244.20
17,2002,Africa,256.58
18,2003,Africa,273.31
19,2004,Africa,287.44
...,...,...,...
199,2014,South America,243.65
200,2015,South America,254.33
201,2016,South America,235.09
202,2017,South America,226.66


In [35]:
# Create GHGs visualization

# Continent selector
selection = alt.selection_multi(fields=['Continent'], on='mouseover')

# Faceted GHGs area charts
ghg_facet = alt.Chart(ghg).mark_area(
    opacity=0.7,
).encode(
    x=alt.X('Year:N', title=None, axis=None),
    y=alt.Y('GHGs', title=None, axis=None),
    color=alt.Color('Continent:N', legend=None),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.4)),
    facet=alt.Facet('Continent:N', columns=2, title=None, header=alt.Header(labelOrient='bottom', labelAnchor='start')),
).properties(width=200, height=80,
).add_selection(selection)

# GHGs details bar chart
ghg_zoom = alt.Chart(ghg).mark_bar(color='#303030').encode(
    x=alt.X('Year:N', title=None),
    y=alt.Y('sum(GHGs):Q', title='Emissions (CO2e)'),
).properties(width=400, height=300,
).transform_filter(selection)

# Concatenating to create GHGs visualization
alt.concat(ghg_facet, ghg_zoom
).resolve_scale(color='independent',
).properties(title=alt.TitleParams(
    text='Global Greenhouse Gas Emissions Created by Electricity Generation',
    fontSize=20,
    subtitle='*Hover mouse over area charts to view discrete emissions details.',
    subtitleFontSize=10))

#### Task 3 - Influence that Population and GDP have on Electricity Demand and Greenhouse Gases

In [33]:
# Create GDP dataframe

# Filter to years >= 2000 because GHGs were not recorded for all continents prior to 2000
# drop() returns a new dataframe, which avoids pandas' SettingWithCopyWarning
gdp = energy[energy['Year']>1999].drop(columns=['Biofuel Power (TWh)', 'Coal Power (TWh)', 'Natural Gas Power (TWh)', 'Hydro Power (TWh)',
                                                'Nuclear Power (TWh)', 'Oil Power (TWh)', 'Other Renewable Power (TWh)', 'Solar Power (TWh)',
                                                'Wind Power (TWh)', 'Low Carbon Sources (TWh)', 'High Carbon Sources (TWh)'])

# Display gdp dataframe
gdp

Unnamed: 0,Year,Continent,Population,GDP ($),Electricity Demand (TWh),GHGs from Electricity Generation (MM tonnes of CO2e),GDP per capita ($/person),Electricity Generation per capita (kWh per person),GHGs per capita (Tonnes CO2e per person)
15,2000,Africa,804634502,2150734280526,424.24,236.50,2672.933208,522.535386,0.293922
16,2001,Africa,824299002,2292132649491,440.66,244.20,2780.705356,530.377932,0.296252
17,2002,Africa,844449025,2450288937929,464.51,256.58,2901.642213,547.386504,0.303843
18,2003,Africa,865145959,2622270527264,487.25,273.31,3031.015171,560.772428,0.315912
19,2004,Africa,886457068,2823553364349,516.33,287.44,3185.211632,581.291547,0.324257
...,...,...,...,...,...,...,...,...,...
199,2014,South America,408493800,6223611926995,1094.33,243.65,15235.511352,2677.715060,0.596459
200,2015,South America,412362690,6391625133915,1127.52,254.33,15500.008340,2734.049484,0.616763
201,2016,South America,416164870,5922122056469,1129.29,235.09,14230.230573,2712.458646,0.564896
202,2017,South America,419903920,5944422958512,1136.69,226.66,14156.626493,2703.666115,0.539790


In [36]:
# Create GDP vs Generation chart
gdp_gen = alt.Chart(gdp).mark_circle().encode(
    x='GDP per capita ($/person):Q',
    y='Electricity Generation per capita (kWh per person):Q',
    color='Continent:N',
    opacity='Population:Q',
    size=alt.Size('Year:O', legend=alt.Legend(symbolLimit=10)),
).properties(height=400, width=300
).interactive()

# Create GDP vs GHGs chart
gdp_ghg = gdp_gen.encode(y='GHGs per capita (Tonnes CO2e per person):Q')

# Create Per Capita Demand / GHGs visualization
alt.hconcat(gdp_gen, gdp_ghg).properties(
    title=alt.TitleParams(
        text='Per Capita Electricity Demand and Greenhouse Gas Emissions',
        fontSize=20,
        subtitle='*Scroll to zoom and drag to move around charts.  Double click to reset view.',
        subtitleFontSize=10))

___
### __Visualization Evaluation__

>_Target Question_
- What is the global impact of greenhouse gases in the generation of electricity?
  
>_Evaluation Participants_
- My business partner and I are currently developing analytics solutions for a consortium of electricity power traders.  For my evaluation I asked two of the power traders and my business partner to participate in the evaluation study.  The power traders typically interact with this type of data on a daily basis.  They were eager to help in the evaluation process due to their experience and breadth of knowledge on the topic of electricity generation.

>_Evaluation Approach_
- The evaluation approach I used involved a combination of Summative Qualitative and Insight Based Evaluation techniques.
- Summative Qualitative Evaluation:
    - This approach was employed to understand the users' early impressions of the low-fidelity prototypes and associated visualizations.  Because I already have a solid understanding of how electricity is generated, evaluation techniques such as systematic surveys and semi-structured interviews with my subject matter experts were not necessary; instead, I used Think-Aloud Studies.  Users were first asked to review my low-fidelity prototypes and provide verbal feedback.  Afterwards, they were allowed to interact with the first versions of the visualizations and noted further observations.

>_Quantifying Success of the Visualizations_
- Insight Based Evaluation:
    - To quantify the success of each visualization I used an Insight Based Evaluation approach.  While users interacted with the initial versions of the visualizations, I timed how long it took each of them to reach a first insight and then counted how many insights each user generated per visualization.
    - Results of the Insight Based Evaluation are summarized below:  
    
        | Task | User | Time to 1st Insight | # of Insights |
        |------|------|---------------------|---------------|
        |   1  |   1  |       < 1 sec       |       3       |
        |   1  |   2  |       < 1 sec       |       2       |
        |   1  |   3  |       < 1 sec       |       7       |
        |   2  |   1  |        3 sec        |       4       |
        |   2  |   2  |        5 sec        |       4       |
        |   2  |   3  |        1 sec        |       6       |
        |   3  |   1  |        11 sec       |       6       |
        |   3  |   2  |        14 sec       |       9       |
        |   3  |   3  |        6 sec        |       8       |
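
One simple way to summarize the table above is the average number of insights per task. The sketch below is only an illustrative aggregation of the counts already listed, not an additional result reported by the study; the time-to-first-insight column is left out because the "< 1 sec" entries are not exact numbers.

```python
from statistics import mean

# (task, user, number of insights) rows copied from the table above
results = [
    (1, 1, 3), (1, 2, 2), (1, 3, 7),
    (2, 1, 4), (2, 2, 4), (2, 3, 6),
    (3, 1, 6), (3, 2, 9), (3, 3, 8),
]

# Group the insight counts by task
insights_by_task = {}
for task, _user, insights in results:
    insights_by_task.setdefault(task, []).append(insights)

# Average insights per user for each task
mean_insights = {task: mean(counts) for task, counts in insights_by_task.items()}
print(mean_insights)
```

With only three participants these averages are descriptive at best, but they are consistent with the observation below that Task 3 took longer to decode yet produced the most insights.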

>_Results of the Evaluation_
- Task 1
    - The users found it difficult to compare year over year changes because the supply breakdown bar chart's y-axis scale was changing depending on the year(s) selected.    
    - Initially a 100% area chart was used to demonstrate the split between high and low carbon sources, as shown in the prototype.  The users said they found it difficult to judge the actual volume of electricity being produced while viewing the area chart.  Using this feedback I changed this to a bar chart to better show how supply has changed over time.
- Task 2
    - Originally only the area charts were presented to users.  They liked how they could see the impact that Asia has on emissions but struggled to understand how Africa, Oceania, and South America were changing over time.  With this feedback I added a bar chart showing the total emissions globally, and the ability for the user to filter to a specific continent by hovering over that respective continent's area chart. 
- Task 3
    - This visualization was more difficult for users to gain insights from than the other visualizations, but once they decoded how to read the charts they began to generate many insights.
    - Initially there was no interaction (pan/zoom) and users did not like how the bubbles were overlapping each other on a high level.  I added interaction so each continent's data marks could be viewed more thoroughly. 
    - Originally I had Population represented as bubble size and Year indicated by an increase of hue.  The users couldn't easily tell how the years were increasing because the hue was too difficult to differentiate between marks.  I used this information to switch the Year to bubble size and Population to hue (implemented as mark opacity).
- Overall
    - The power traders were impressed and found these visualizations to be very insightful.  They have even asked that my business partner and I now incorporate some of the concepts presented into the analytics solutions that we're building for them.

___
### __Summary of Visualization Findings__

>Task 1
- Findings:
    - Global electricity demand has been consistently increasing since 1985.
    - Despite the recent increase of renewable energy supply, the world still heavily relies on fossil fuels in order to meet that demand.
    - It's difficult to imagine the amount of wind and solar power infrastructure needed to completely replace those fossil fuels, especially with demand constantly increasing.
    - The world relies on fossil fuels for our survival, and will rely on them for a very long time going forward.  It's a fallacy to believe renewables will suddenly replace this demand in its entirety.
- What worked well with this visualization:
    - User insights were quick to generate about electricity demand and how this demand is broken down by supply over time.
- Future improvements:
    - Changing the supply breakdown bar chart to a waterfall chart as was proposed in the prototype.  Unfortunately, I did not find an easy method to create a waterfall chart using Altair.  Will need to investigate if other Python visualization packages have this functionality.
    - Would have helped to include the year over year percent change for each energy source to understand which sources have been increasing or decreasing usage over time.
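
The year over year percent change mentioned above could be computed with pandas before charting. The following is a minimal sketch on toy data shaped like the `generation` dataframe (the values are made up); the `YoY change (%)` column name is an assumption for this example.

```python
import pandas as pd

# Toy long-format data (made-up values), shaped like the `generation` dataframe
toy = pd.DataFrame({
    'Year': [2015, 2016, 2017, 2015, 2016, 2017],
    'EnergyType': ['Coal', 'Coal', 'Coal', 'Wind', 'Wind', 'Wind'],
    'TWh': [100.0, 110.0, 99.0, 20.0, 25.0, 30.0],
})

# Total per source and year (sorted by source, then year)
yearly = toy.groupby(['EnergyType', 'Year'], as_index=False)['TWh'].sum()

# Percent change from the previous year, computed within each source
yearly['YoY change (%)'] = yearly.groupby('EnergyType')['TWh'].pct_change() * 100
print(yearly)
```

The first year of each source has no previous value, so its change is NaN rather than 0.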

>Task 2
- Findings:
    - Asia produces a disproportionate majority of global emissions from the generation of electricity, so much so that other continents' emissions are barely noticeable when conducting a high level review of the data.
    - Asia's emissions have consistently increased since 2000, essentially tripling over the 18 years looked at in this study.  The year over year increase in emissions from electricity generation in Asia does not look like it is slowing down.
    - Europe and North America's emissions peaked around 2007 and have been slightly dropping since.  Asia's increases have more than offset these reductions.
    - Africa's emissions have been increasing, but not nearly as fast as Asia's.
    - South America's emissions peaked in 2015 and have been decreasing since.
    - Oceania's emissions have dropped slightly but on average look to be flat over time.
- What worked well with this visualization:
    - The high level area charts gave an instant insight about how disproportionate the emissions are from Asia when compared to other continents.
- Future improvements:
    - It's difficult to hover over the area charts for Africa, Oceania, and South America because their emissions are so much lower.  Would like to find a better methodology to view this data on a high level without losing the context of how many more emissions Asia produces in comparison.
    - Would be interesting to break this down by country after clicking on the continent to identify who are the largest polluters on those continents.
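
One possible approach to the scale problem described above, offered only as a sketch and not something evaluated in this study, is to index each continent's emissions to a base year so that every series starts at 100. The toy values and the `GHGs (2000 = 100)` column name below are assumptions for illustration.

```python
import pandas as pd

# Toy emissions data (made-up values), shaped like the `ghg` dataframe
toy = pd.DataFrame({
    'Year': [2000, 2001, 2000, 2001],
    'Continent': ['Africa', 'Africa', 'Asia', 'Asia'],
    'GHGs': [200.0, 220.0, 3000.0, 3150.0],
})

# Sort so the first row of each continent is its base year
toy = toy.sort_values(['Continent', 'Year'])

# Divide every value by that continent's base-year value
base = toy.groupby('Continent')['GHGs'].transform('first')
toy['GHGs (2000 = 100)'] = toy['GHGs'] / base * 100
print(toy)
```

Indexing makes growth rates comparable across continents but hides absolute magnitudes, so it would complement the existing charts rather than replace them.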

>Task 3
- Findings:
    - Continents with higher GDP per capita have both higher per capita emissions and higher per capita demand.
    - Asia and South America are increasing per capita emissions and demand as their GDP grows over time.
    - Africa has slightly increased per capita demand as its GDP has grown but has managed to keep per capita emissions steady.
    - Oceania and North America are consistently decreasing per capita demand with increasing GDP, and are producing significantly less per capita emissions as GDP grows.
    - Europe has a somewhat similar pattern to Oceania and North America but Europe's reductions are not as pronounced.
- What worked well with this visualization:
    - This visualization accurately showed how much further Asia has to go before its per capita emissions and demand are close to that of the rest of the developed world.
- Future improvements:
    - It is still difficult to understand and compare population sizes because population is encoded as hue (mark opacity).  The bubbles overlap at points on the high level view and can give the sense that a population is higher than it really is.