<a href="https://colab.research.google.com/github/luke-scot/emissions-tracking/blob/main/energy_consumption.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Energy consumption


Run the first two cells to set up the notebook.





In [1]:
%%capture
"""Installation and downloads"""
# Install floweaver and display widget packages
%pip install floweaver ipysankeywidget openpyxl --upgrade

# Import necessary packages
from floweaver import *
import gdown, os
from google.colab import files

# Import and unzip files -> You can then view them in the left files panel
folder, zip_path = 'example_data', 'example_data.zip'
if not os.path.exists(folder): 
  gdown.download('https://drive.google.com/uc?id=1qriY29v7eKJIs07UxAw5RlJirfwuLnyP', zip_path ,quiet=True)
  ! unzip $zip_path -d 'example_data'
  ! rm $zip_path

In [2]:
"""Display setup"""
# Enable widget display for Sankeys in Colab
from google.colab import output
output.enable_custom_widget_manager()

## Task 1 - US example 

Step through this section to see an example for the US based on the [Sankey diagrams of US energy consumption from the Lawrence Livermore National Laboratory](https://flowcharts.llnl.gov/) (thanks to John Muth for the suggestion and transcribing the data).

In [3]:
"""Load the dataset"""
dataset = Dataset.from_csv('example_data/us-energy-consumption.csv',
                           dim_process_filename='example_data/us-energy-consumption-processes.csv')

In [4]:
"""Define the order the nodes appear in"""
sources = ['Solar', 'Nuclear', 'Hydro', 'Wind', 'Geothermal',
           'Natural_Gas', 'Coal', 'Biomass', 'Petroleum']

uses = ['Residential', 'Commercial', 'Industrial', 'Transportation']

In [5]:
"""define the Sankey diagram definition"""
nodes = {
    'sources': ProcessGroup('type == "source"', Partition.Simple('process', sources), title='Sources'),
    'imports': ProcessGroup(['Net_Electricity_Import'], title='Net electricity imports'),
    'electricity': ProcessGroup(['Electricity_Generation'], title='Electricity Generation'),
    'uses': ProcessGroup('type == "use"', partition=Partition.Simple('process', uses)),
    
    'energy_services': ProcessGroup(['Energy_Services'], title='Energy services'),
    'rejected': ProcessGroup(['Rejected_Energy'], title='Rejected energy'),
    
    'direct_use': Waypoint(Partition.Simple('source', [
        # This is a hack to hide the labels of the partition, there should be a better way...
        (' '*i, [k]) for i, k in enumerate(sources)
    ])),
}

ordering = [
    [[], ['sources'], []],
    [['imports'], ['electricity', 'direct_use'], []],
    [[], ['uses'], []],
    [[], ['rejected', 'energy_services'], []]
]

bundles = [
    Bundle('sources', 'electricity'),
    Bundle('sources', 'uses', waypoints=['direct_use']),
    Bundle('electricity', 'uses'),
    Bundle('imports', 'uses'),
    Bundle('uses', 'energy_services'),
    Bundle('uses', 'rejected'),
    Bundle('electricity', 'rejected'),
]

In [6]:
"""Define the colours to roughly imitate the original Sankey diagram"""
palette = {
    'Solar': 'gold',
    'Nuclear': 'red',
    'Hydro': 'blue',
    'Wind': 'purple',
    'Geothermal': 'brown',
    'Natural_Gas': 'steelblue',
    'Coal': 'black',
    'Biomass': 'lightgreen',
    'Petroleum': 'green',
    'Electricity': 'orange',
    'Rejected energy': 'lightgrey',
    'Energy services': 'dimgrey',
}

And here's the result!

In [7]:
sdd = SankeyDefinition(nodes, bundles, ordering,
                       flow_partition=dataset.partition('type'))
weave(sdd, dataset, palette=palette) \
    .to_widget(width=700, height=450, margins=dict(left=100, right=120), debugging=True)


You can save a copy of the Sankey by adding `.auto_save_png('filename.png')` or `.auto_save_svg('filename.svg')` to the end of the `weave(...).to_widget(...)` call in the previous cell.

## Task 2 - Create your own

Follow the steps below to create an equivalent Sankey for your own country.

  1. Find and download the IEA World Energy Balances Highlights spreadsheet, from the webpage: https://www.iea.org/reports/world-energy-balances-overview. Then upload it to Colab using the `upload` button in the left panel.

  2. In the next cell, import the Excel sheet into a pandas DataFrame. To find appropriate functions for the next steps, have a look at the [pandas documentation](https://pandas.pydata.org/docs/reference/index.html), or remember [your best friend](https://www.google.com/) when writing code.
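The `header` argument tells pandas which row holds the column names, which matters when a sheet starts with a title row. The sketch below illustrates this with `pd.read_csv` on made-up in-memory text rather than the real spreadsheet; the title row and values are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical contents: a title row first, then the real column names.
raw = io.StringIO(
    "World Energy Balances Highlights,,\n"
    "Country,Product,2019\n"
    "United Kingdom,Nuclear,1.0\n"
)

# header=1 means "use the second row (index 1) as column names";
# the rows above it are skipped.
table = pd.read_csv(raw, header=1)
print(list(table.columns))  # ['Country', 'Product', '2019']
```

Note that `read_csv` keeps the year label as text here, whereas the later steps in this notebook rely on year columns being integers.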



In [545]:
"""Read in an Excel file"""
import pandas as pd
fileName = 'WorldEnergyBalancesHighlights2021.xlsx'
sheetName = 'TimeSeries_1971-2020'
data = pd.read_excel(fileName,sheet_name=sheetName,header=1)

3. Filter the DataFrame to contain only the desired country data.

In [733]:
"""Get desired country"""
country = 'United Kingdom'
countryData = data.loc[data['Country']==country]


4. Filter the DataFrame to only contain 'Product', 'Flow' and value for the latest full year.
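One way to find the latest year is to keep only the column labels that are integers and take their maximum. A minimal sketch with plain Python lists, where the column labels (including `'2020 provisional'`) are hypothetical:

```python
# Hypothetical column labels: text columns, integer years and one
# text-labelled year that should not count as a full year.
columns = ['Country', 'Product', 'Flow', 2018, 2019, '2020 provisional']

# Keep integer labels only; calling max() on the mixed list directly
# would raise a TypeError because str and int cannot be compared.
year_columns = [c for c in columns if isinstance(c, int)]
last_year = max(year_columns)

print(last_year)  # 2019
```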

In [723]:
"""Get values for latest year"""
lastYear = max([colName for colName in data.columns if isinstance(colName, int)])
filterData = countryData[['Product','Flow',lastYear]]

# Display data
display(filterData)

|  | Product | Flow | 2019 |
|---|---|---|---|
| 3780 | Coal, peat and oil shale | Production (PJ) | 58.094726 |
| 3781 | Coal, peat and oil shale | Imports (PJ) | 201.938391 |
| 3782 | Coal, peat and oil shale | Exports (PJ) | -22.416345 |
| 3783 | Coal, peat and oil shale | Total energy supply (PJ) | 242.756119 |
| 3784 | Coal, peat and oil shale | Electricity, CHP and heat plants (PJ) | -90.851513 |
| ... | ... | ... | ... |
| 3887 | Total | Other final consumption (PJ) | 407.099906 |
| 6080 | Fossil fuels | Electricity output (GWh) | 139728.651 |
| 6081 | Nuclear | Electricity output (GWh) | 56183.934 |
| 6082 | Renewable sources | Electricity output (GWh) | 120505.012 |


5. Filter out rows containing summaries (i.e. Total, Production), different units (GWh) or non-numeric values.
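Joining the unwanted words with `|` gives a regular expression that matches a label containing any of them. The sketch below shows the idea with the standard `re` module on a few example labels; it stands in for the pandas string matching and is not the notebook's own cell.

```python
import re

# 'Production|Total|GWh' matches any label containing one of the words.
remove = '|'.join(['Production', 'Total', 'GWh'])

flows = [
    'Imports (PJ)',
    'Production (PJ)',
    'Total energy supply (PJ)',
    'Electricity output (GWh)',
    'Industry (PJ)',
]

# Keep only the labels where the pattern is not found anywhere.
kept = [f for f in flows if not re.search(remove, f)]
print(kept)  # ['Imports (PJ)', 'Industry (PJ)']
```

The match is case-sensitive, so a label such as `'Oil products'` is not caught by `'Production'`.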

In [724]:
"""Filter out Totals and bad characters"""
remove = '|'.join(['Production','Total','GWh'])
filterData = filterData[~filterData['Product'].str.contains(remove)]
filterData = filterData[~filterData['Flow'].str.contains(remove)]
filterData = filterData[[type(i) is not str for i in filterData[lastYear]]]

6. Let's match the format in the files for the US example that you can find in the 'example_data' folder.
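The target layout has one row per flow with `source`, `target`, `type` and `value`. A minimal sketch in plain Python of how one balance row could be turned into such a flow, with imports flipped so that they act as a source; the helper `to_flow` and the numbers are hypothetical:

```python
# Hypothetical balance rows: exports carry a negative sign.
rows = [
    {'Product': 'Natural gas', 'Flow': 'Imports (PJ)', 'value': 10.0},
    {'Product': 'Natural gas', 'Flow': 'Exports (PJ)', 'value': -3.0},
]

def to_flow(row):
    """Map a balance row to a flow record with a positive value."""
    source, target = row['Product'], row['Flow']
    flow = {'source': source, 'target': target,
            'type': source, 'value': abs(row['value'])}
    # Imports enter the system, so they become the source of the flow.
    if 'Imports' in target:
        flow['source'], flow['target'] = target, source
    return flow

flows = [to_flow(r) for r in rows]
print(flows[0]['source'], '->', flows[0]['target'])  # Imports (PJ) -> Natural gas
```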

In [725]:
"""Create dataset table"""
# Rename the columns to define source, target and value
# (rename returns a new DataFrame, which avoids SettingWithCopyWarning on the filtered slice)
filterData = filterData.rename(columns={'Product':'source', 'Flow':'target', lastYear:'value'})

# Create type column
filterData['type'] = filterData['source']

# Get absolute values to display exports
filterData['value'] = abs(filterData['value'])

# Create groupings: any target containing g[1] is relabelled as g[0]
groups = [['Electricity','Electricity'],['Oil products','Oil refineries']]
for g in groups:
  filterData['target'] = [g[0] if g[1] in t else t for t in filterData['target']]

# Order data so that imports are considered a source and not a target
orderData = filterData.copy()
isImport = orderData['target'].str.contains('Imports')
# Swap source and target on the import rows in a single assignment (no chained indexing)
orderData.loc[isImport, ['source', 'target']] = filterData.loc[isImport, ['target', 'source']].to_numpy()

display(orderData)

|  | source | target | value | type |
|---|---|---|---|---|
| 3781 | Imports (PJ) | Coal, peat and oil shale | 201.938391 | Coal, peat and oil shale |
| 3782 | Coal, peat and oil shale | Exports (PJ) | 22.416345 | Coal, peat and oil shale |
| 3784 | Coal, peat and oil shale | Electricity | 90.851513 | Coal, peat and oil shale |
| 3785 | Coal, peat and oil shale | Oil products | 0 | Coal, peat and oil shale |
| 3787 | Coal, peat and oil shale | Industry (PJ) | 61.131224 | Coal, peat and oil shale |
| ... | ... | ... | ... | ... |
| 3871 | Heat | Industry (PJ) | 28.167735 | Heat |
| 3872 | Heat | Transport (PJ) | 0 | Heat |
| 3873 | Heat | Residential (PJ) | 11.269748 | Heat |
| 3874 | Heat | Commercial and public services (PJ) | 12.490803 | Heat |


7. Let's display all the individual sources and targets and attribute them to process groups for our Sankey diagram.

In [726]:
"""Display individual sources and targets"""
display(orderData['source'].unique(), orderData['target'].unique())

array(['Imports (PJ)', 'Coal, peat and oil shale',
       'Crude, NGL and feedstocks', 'Oil products', 'Natural gas',
       'Nuclear', 'Renewables and waste', 'Electricity', 'Heat'],
      dtype=object)

array(['Coal, peat and oil shale', 'Exports (PJ)', 'Electricity',
       'Oil products', 'Industry (PJ)', 'Transport (PJ)',
       'Residential (PJ)', 'Commercial and public services (PJ)',
       'Other final consumption (PJ)', 'Crude, NGL and feedstocks',
       'Natural gas', 'Nuclear', 'Renewables and waste', 'Heat'],
      dtype=object)

In [727]:
"""Attribute to process groups"""
sources = ['Coal, peat and oil shale', 'Crude, NGL and feedstocks', 'Natural gas', 'Nuclear', 'Renewables and waste','Heat']
uses = ['Industry (PJ)', 'Transport (PJ)', 'Residential (PJ)', 'Commercial and public services (PJ)',
       'Other final consumption (PJ)']
imports = ['Imports (PJ)']
exports = ['Exports (PJ)']
electricity = ['Electricity']
refining = ['Oil products']

8. Create the process table, as in `us-energy-consumption-processes.csv`.
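The process table used in this notebook has one row per process, with an `id` and a `type` that query strings such as `type == "source"` can select on. A minimal sketch with shortened, illustrative lists; the pandas `query` call below only mimics that kind of selection, and floweaver's own evaluation may differ:

```python
import pandas as pd

# Shortened example lists (the notebook's real lists are longer).
sources = ['Natural gas', 'Nuclear']
uses = ['Industry (PJ)', 'Transport (PJ)']

# One row per process: its id and whether it is a source or a use.
processes = pd.DataFrame({
    'id': sources + uses,
    'type': ['source'] * len(sources) + ['use'] * len(uses),
})

# Selecting by type returns exactly the processes listed as sources.
source_ids = processes.query('type == "source"')['id'].tolist()
print(source_ids)  # ['Natural gas', 'Nuclear']
```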

In [728]:
"""Create the process table: one row per process, with its id and type"""
import numpy as np
idColumn = np.concatenate((sources,uses))
typeColumn = ['source']*len(sources)+['use']*len(uses)
processes = pd.DataFrame(np.array([idColumn,typeColumn]).transpose(), columns=['id','type'])

We now have the same tables as used in the US example. Now copy the Sankey-building cells and see what you can do.

In [729]:
"""Load the dataset"""
dataset = Dataset(orderData, dim_process=processes.set_index('id'))

9. Take the Sankey definition from the US energy consumption example and adapt it to fit your new source and target values.
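When adapting a definition, it is easy to mention a node in `ordering` or `bundles` that is not defined in `nodes`. The sketch below checks for that using plain tuples `(source, target, waypoints)` in place of floweaver `Bundle` objects; the helper `undefined_nodes` and the sample data are hypothetical.

```python
def undefined_nodes(nodes, ordering, bundles):
    """Return names used in ordering or bundles that are missing from nodes."""
    used = set()
    # ordering is a list of layers, each a list of bands, each a list of names
    for layer in ordering:
        for band in layer:
            used.update(band)
    # each bundle is a (source, target, waypoints) tuple in this sketch
    for source, target, waypoints in bundles:
        used.update([source, target, *waypoints])
    return sorted(used - set(nodes))

# Stand-in definitions: 'imports' is used but never defined.
nodes = {'sources': None, 'electricity': None, 'uses': None}
ordering = [[['sources']], [['imports'], ['electricity']], [['uses']]]
bundles = [
    ('sources', 'electricity', []),
    ('electricity', 'uses', []),
    ('imports', 'uses', []),
]

print(undefined_nodes(nodes, ordering, bundles))  # ['imports']
```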

In [730]:
"""Define the Sankey diagram definition"""
nodes = {
    # Processes
    'sources': ProcessGroup('type == "source"', Partition.Simple('process', sources), title='Sources'),
    'imports': ProcessGroup(imports, title='Imports'),
    'exports': ProcessGroup(exports, title='Exports'),
    'electricity': ProcessGroup(electricity, title='Electricity\n Generation'),
    'uses': ProcessGroup('type == "use"', Partition.Simple('process', uses), title='Uses'),
    'refining': ProcessGroup(refining, title='Refining'),
    # Waypoints
    'direct_use': Waypoint(Partition.Simple('source', [(' '*i, [k]) for i, k in enumerate(sources)])),
    'direct_use_2': Waypoint(Partition.Simple('source', [(' '*i, [k]) for i, k in enumerate(sources)])),
    'direct_use_3': Waypoint(Partition.Simple('source', [(' '*i, [k]) for i, k in enumerate(sources)])),
    'electricity_use': Waypoint(Partition.Simple('source', ' ')),
    'refining_use':  Waypoint(Partition.Simple('source', ' ')),
    'refining_use_2': Waypoint(Partition.Simple('source', ' ')),
    'import_refining': Waypoint(Partition.Simple('source', ' ')),
    'import_electricity': Waypoint(Partition.Simple('source', ' ')),
    'import_electricity_2': Waypoint(Partition.Simple('source', ' '))
}

ordering = [ 
    [[],['imports']],
    [['sources'],['import_refining','import_electricity']],
    [['direct_use','refining'], ['import_electricity_2']],
    [['direct_use_2','refining_use', 'electricity'],[]],
    [['direct_use_3','refining_use_2','electricity_use'], ['exports']],
    [['uses'], []],
]

bundles = [
    Bundle('imports','sources'),
    Bundle('sources', 'refining'),   
    Bundle('sources', 'electricity', waypoints=['direct_use']),
    Bundle('sources', 'exports', waypoints=['direct_use','direct_use_2']),
    Bundle('sources', 'uses', waypoints=['direct_use','direct_use_2','direct_use_3']),
    Bundle('imports', 'refining', waypoints=['import_refining']),
    Bundle('refining', 'electricity'),
    Bundle('refining','exports', waypoints=['refining_use']),
    Bundle('refining', 'uses', waypoints=['refining_use','refining_use_2']),
    Bundle('imports','electricity', waypoints=['import_electricity','import_electricity_2']),
    Bundle('electricity', 'exports'),
    Bundle('electricity', 'uses', waypoints=['electricity_use'])
]

In [731]:
"""Define the colours to roughly imitate the original Sankey diagram"""
palette = {
    'Coal, peat and oil shale': 'black',
    'Crude, NGL and feedstocks':'grey',
    'Oil products': 'purple',
    'Natural gas': 'steelblue',
    'Nuclear': 'red',
    'Renewables and waste':'green',
    'Electricity': 'orange',
    'Heat': 'red',
    'Fossil Fuels': 'darkgrey',
    'Renewable sources':'lightgreen'
}

In [732]:
"""Draw out Sankey"""
sdd = SankeyDefinition(nodes, bundles, ordering,
                       flow_partition=dataset.partition('type'))
weave(sdd, dataset, palette=palette) \
    .to_widget(width=900, height=500, margins=dict(left=100, right=200)) \
    .auto_save_svg(country+'Sankey.svg')


## Task 3 - Let's automate this procedure for any country with just one click

Define a function that incorporates all of the previous steps, while keeping it possible to adjust its settings from the outside.
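One way to keep such a function adjustable from the outside is to merge caller-supplied settings over a dictionary of defaults. A minimal sketch, assuming the hypothetical names `DEFAULT_PARAMS` and `resolve_params` and only a subset of the settings:

```python
# Hypothetical defaults; a real function would list every setting it needs.
DEFAULT_PARAMS = {
    'countryName': False,
    'source': 'Product',
    'target': 'Flow',
    'year': False,
}

def resolve_params(params=None):
    """Return the defaults with any caller-supplied values laid on top."""
    # A new dict is built, so DEFAULT_PARAMS itself is never modified.
    return {**DEFAULT_PARAMS, **(params or {})}

settings = resolve_params({'countryName': 'United Kingdom', 'year': 2019})
print(settings['countryName'], settings['source'])  # United Kingdom Product
```

Using `None` as the default argument avoids sharing one mutable dictionary between calls.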

In [None]:
fileName = 'WorldEnergyBalancesHighlights2021.xlsx'
sheetName = 'TimeSeries_1971-2020'

In [None]:
import pandas as pd
def draw_Country_Sankey(fileName:str, sheetName:str, params:dict=None):

    def get_country_data(fileName, sheetName, countryName=False, headerRows=1):
        data = pd.read_excel(fileName,sheet_name=sheetName,header=headerRows)
        return data.loc[data['Country']==countryName] if countryName else data.copy()

    def filter_data(data, source, target, year=False):
        if year is False: year = max([colName for colName in data.columns if isinstance(colName, int)])
        return data[[source,target,year]].copy(), year

    # Drop summary rows, other units and non-numeric values (step 5 of Task 2)
    def clean_data(data, source, target, year, remove):
        pattern = '|'.join(remove)
        data = data[~data[source].str.contains(pattern)]
        data = data[~data[target].str.contains(pattern)]
        return data[[not isinstance(i, str) for i in data[year]]].copy()

    def format_data(data, colNames=False):
        if colNames: data = data.rename(columns=colNames)
        data['type'], data['value'] = data['source'], abs(data['value'])
        return data

    def group_processes(data, groupings):
        for g in groupings:
            data['target'] = [g[0] if g[1] in t else t for t in data['target']]
        return data

    # Order data so that imports are considered a source and not a target
    def reorder_data(data, reorders):
        orderData = data.copy()
        swapRows = orderData['target'].str.contains(reorders)
        orderData.loc[swapRows, ['source', 'target']] = data.loc[swapRows, ['target', 'source']].to_numpy()
        return orderData

    defaults={
        'countryName': False,
        'source': 'Product',
        'target':'Flow',
        'year': False,
        'remove':['Production','Total','GWh'],
        'groupings':[['Electricity','Electricity'],['Oil products','Oil refineries']],
        'reordering':'Imports',
        'processes':{'sources':['Coal, peat and oil shale', 'Crude, NGL and feedstocks', 'Natural gas', 'Nuclear', 'Renewables and waste','Heat'],
                     'uses':['Industry (PJ)', 'Transport (PJ)', 'Residential (PJ)', 'Commercial and public services (PJ)', 'Other final consumption (PJ)'],
                     'imports':['Imports (PJ)'],
                     'exports':['Exports (PJ)'],
                     'electricity':['Electricity'],
                     'refining':['Oil products']},
        'palette': {
            'Coal, peat and oil shale': 'black',
            'Crude, NGL and feedstocks':'grey',
            'Oil products': 'purple',
            'Natural gas': 'steelblue',
            'Nuclear': 'red',
            'Renewables and waste':'green',
            'Electricity': 'orange',
            'Heat': 'red',
            'Fossil Fuels': 'darkgrey',
            'Renewable sources':'lightgreen'
            }
    }
    # Values passed in by the caller override the defaults
    params = {**defaults, **(params or {})}

    countryData = get_country_data(fileName, sheetName, params['countryName'])
    filterData, year = filter_data(countryData,params['source'],params['target'],params['year'])
    cleanData = clean_data(filterData,params['source'],params['target'],year,params['remove'])
    formattedData = format_data(cleanData,colNames={params['source']:'source', params['target']:'target', year:'value'})
    groupData = group_processes(formattedData, params['groupings'])
    orderedData = reorder_data(groupData,params['reordering'])

    # The process table and Sankey definition from Task 2 still need to be added here;
    # for now, return the prepared flow table
    return orderedData



    


