ETDataset - Start here
ETDataset is a repository that contains all the input data of the ETModel. This data describes:
1. A country's energy system
2. Energy technologies (like electric vehicles)
3. Energy carriers (like fuels)
Data of type 1 are country-specific: both the values and their sources vary per country. Almost all data of types 2 and 3 are global within the Energy Transition Model, meaning that every country largely uses the same parameters for energy technologies and carriers.
Documentation of a country's energy system data can be found in the source_analyses directory. Generating these datasets is somewhat involved and is explained below.
Documentation and analysis of data on energy technologies can be found in the nodes_source_analyses directory.
Documentation and analysis of data on energy carriers, such as fuels, can be found in the carriers_source_analyses directory.
Each country shown in the Energy Transition Model (ETM) requires a country-specific dataset (Input Data) to correctly model the energy system of that country. The ETDataset repository is dedicated to creating these country-specific datasets. The Input Data is created by various Research Analyses. Once a dataset is complete, it is exported to ETSource, from where ETEngine uses the Input Data in its calculations. Users interact with the model through the ETM front-end, which is maintained in the ETModel repository. In addition to creating and documenting datasets, the ETDataset repository is used to log all issues and discussions encountered while creating and maintaining Input Data.
One of the key ingredients of a dataset is the energy balance. We currently use the openly available energy balances of Eurostat, but some of the older datasets use the proprietary energy balances of the IEA. Therefore, we created two versions of the ETDataset repository:
- ETDataset is a private repository that Quintel uses to generate the Input Data for all countries shown in the ETM. It is private because, in addition to the Eurostat energy balances, it contains the proprietary IEA energy balances. Non-Quintel employees can only get access after signing an NDA.
- ETDataset-public is a public copy of the ETDataset repository that does not include the proprietary energy balances. It also contains the entire dataset (including a fictional energy balance) for the country 'example', which can be used to review the dataset generation process.
More information on the generation of Input Data for a country and a more in-depth explanation of the relation between ETDataset and ETDataset-public can be found here. If you want to make changes to the model or if you would like to add a new country, have a look at this page. Contact Quintel if you desire more information.
ETDataset's second function is to share the research on all technologies and carriers used by the Energy Transition Model. As explained in the Documentation repository, the ETM can be represented as a network of connected energy conversion technologies. The properties of these technologies are called attributes and the converters themselves are called nodes. These node attributes and the research on which they are based have been documented here.
Energy is carried between nodes by so-called energy carriers (carriers for short). Carriers also have attributes, which are documented here.
Since almost all these attributes are the same for all countries, we call this the 'Global dataset'.
The ETDataset repository contains the following folders and files:
- The Analysis Manager is an Excel workbook that serves as the control room for generating country-specific datasets. The Analysis Manager contains macros that facilitate the generation process of Input Data.
- The Analyses folder contains the Research Analyses: Excel tools used to process Research Data. They do not contain any data themselves. The various analyses can be opened via the Analysis Manager. See Dataflow for an explanation.
- The Data folder contains country-specific data that are imported, manipulated and exported by the Research Analyses.
- The Source Analyses folder contains analyses for the assumptions used in the various Research Analyses. Whereas the Analyses folder contains analyses of uniform data sources, i.e. sources whose data are formatted in the same way for all countries (such as energy balances), the Source Analyses folder contains non-uniform data and the manipulation of such data, for example data on how many diesel and gasoline cars are found in a country.
- The Nodes Source Analyses folder contains the analyses for each node. If you want to know what publications and attributes we used for all the technologies, this is where you need to be.
- The Carriers Source Analyses folder contains the analyses for each carrier. If you want to know what publications and attributes we used for the energy carriers, this is where you need to be.
- The Documentation folder contains additional and more detailed information for this repository.
This image outlines the dataflow that we use to generate country-specific Input Data from Research Data (such as the energy balance) and assumptions based on Source Analyses. The generation of Input Data occurs in various Research Analyses. Descriptions of the various Research Analyses and a detailed visualization of the dataflow can be found here.
If you are new to the project, please carefully read our introduction to the nomenclature. Make sure you are familiar with the terms Input Data, Research Analysis, Research Data, assumption, Node Source Analysis and Source Analysis before you read on. These terms are used with a specific intended meaning.
Generate a country dataset for the ETM
The main purpose of this repository is to create country-specific datasets that serve as input for the ETM. In this section we will outline the Input Data generation process using the Analysis Manager and Research Analyses.
To keep things as simple as possible, we describe the process from two perspectives. First, you might want to investigate how Input Data is generated and perform minor adjustments to the dataset. Second, you might want to create a new dataset (i.e. Input Data for a new country or a different year for a country that already has a dataset). We highly recommend that you first get acquainted with manipulating a dataset that already exists. Once you understand how everything works, you can create your own dataset and start using the ETM to model a new country.
The process of generating Input Data is divided into three steps:
1. Generating output files
The process of generating Input Data is covered by a range of Research Analyses, stored in the Analyses folder. The Analysis Manager serves as the control room for managing this process. It is important that you work on the analyses in the given order to end up with a meaningful dataset, because input to later analyses often depends on the output of earlier ones. Nevertheless, creating a complete dataset is an iterative process, and you might want to jump back or ahead to look at other analyses. In the end, however, you have to make sure you export the Input Data files from all analyses in the given order.
- Before getting started describes the prerequisites for starting with the dataset generation process.
- A. Investigate the Input Data generation process is a walk-through to get familiar with the Analysis Manager and the analyses.
- B. Manipulate Input Data of an existing dataset is a walk-through describing the steps required to make minor changes to an existing dataset.
- C. Create Input Data for a new country or year is a walk-through for generating a whole new dataset for a new country or a new starting year.
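Because later analyses consume the output of earlier ones, the export order matters. The snippet below is a minimal sketch, not part of ETDataset, of how one might verify a planned export order against known dependencies. The function name check_export_order and the analysis names A, B and C are hypothetical; the real order is the one given in the Analysis Manager.

```python
def check_export_order(order, depends_on):
    """Return (analysis, prerequisite) pairs that are exported in the wrong order.

    order:      analysis names in the order you plan to export them
    depends_on: maps an analysis to the analyses whose output it needs
    Assumes every name in depends_on also appears in order.
    """
    position = {name: index for index, name in enumerate(order)}
    violations = []
    for analysis, prerequisites in depends_on.items():
        for prerequisite in prerequisites:
            # A prerequisite exported after its dependant would feed it outdated output.
            if position[prerequisite] > position[analysis]:
                violations.append((analysis, prerequisite))
    return violations


# Hypothetical dependencies: B needs A's output, C needs A's and B's output.
dependencies = {"B": ["A"], "C": ["A", "B"]}
print(check_export_order(["A", "B", "C"], dependencies))  # []
print(check_export_order(["B", "A", "C"], dependencies))  # [('B', 'A')]
```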
Once you have finished your dataset, you may want to test the impact of your changes on the ETM (on your own computer) and eventually share your changes with others.
2. Testing the dataset
Testing your dataset for the ETM involves two steps:
- You should test whether Atlas can perform its calculations with your dataset. Atlas is dedicated to initialising the graph structure and energy flows for the ETM.
- You should investigate the impact of your changes on the ETM results by running ETEngine and ETModel on your local machine.
See the Testing locally documentation for detailed instructions. When you are satisfied with your dataset, you can start sharing your work.
3. Sharing your work
The tools and data of ETDataset are available in this GitHub repository. Git is a version control tool that enables easy collaboration within projects. You can obtain all files, make changes, upload those changes and request a review and a merge into the master project. A short introduction to GitHub can be found here.
Sharing your work involves the following steps:
Furthermore, the Understanding the GitHub Workflow page gives a 5-minute introduction to these steps. For an interactive introduction to Git, try the 15-minute course.
Typically there are two kinds of commits: commits of new sets of input and output files, and commits of changes to an Excel analysis. The first are generated for each completed analysis step and involve only text files, which Git handles easily. The latter involve binary files, which are less straightforward to manage with Git. Changes to input and output files and changes to Excel files should therefore go in separate commits. If you accidentally saved an Excel file during the dataset generation process, check out (i.e. discard) the changed Excel file (see Commit your changes).
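The convention of keeping data commits and Excel commits apart can be checked mechanically before committing. The following is a minimal sketch of such a check, assuming that Excel analyses are recognised by their file extension and that every other changed file is an input or output data file. split_changes and check_commit are hypothetical helpers, not part of the repository.

```python
from pathlib import PurePosixPath

# Assumption: Excel analyses are identified by these extensions;
# every other changed file is treated as an input/output data file.
EXCEL_SUFFIXES = {".xls", ".xlsx", ".xlsm", ".xlsb"}


def split_changes(paths):
    """Split changed file paths into (excel_files, data_files)."""
    excel, data = [], []
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        (excel if suffix in EXCEL_SUFFIXES else data).append(path)
    return excel, data


def check_commit(paths):
    """Return a list of ways in which a planned commit breaks the conventions."""
    excel, data = split_changes(paths)
    problems = []
    if excel and data:
        problems.append("mixes Excel analyses with data files; commit them separately")
    if len(excel) > 1:
        problems.append("changes more than one Excel file; commit one at a time")
    return problems


# Hypothetical paths: one problem is reported because data and Excel are mixed.
print(check_commit(["data/example/output.csv", "analyses/some_analysis.xlsx"]))
```

The list of staged paths could, for instance, be taken from the output of git diff --cached --name-only.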
Committing input and output files
Ideally, commit the inputs and outputs of each analysis step separately. That way, if something goes wrong, it is easy to roll back step by step. When committing inputs and outputs, state at least the following in the commit message:
- Which Excel analysis step it involves.
- The reason for creating new input and output files.
- What kind of changes you made to the data.
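To illustrate these three requirements, the sketch below assembles a commit message from them and refuses to produce one if any item is empty. The function name data_commit_message and the layout of the message (a subject line naming the analysis step, followed by Reason and Changes lines) are assumptions for illustration, not a format prescribed by ETDataset.

```python
def data_commit_message(analysis_step, reason, changes):
    """Build a commit message for new input and output files.

    Hypothetical layout: the subject line names the Excel analysis step,
    the body states why new files were created and what changed.
    """
    required = {"analysis step": analysis_step, "reason": reason, "changes": changes}
    missing = [label for label, value in required.items() if not value.strip()]
    if missing:
        raise ValueError("commit message lacks: " + ", ".join(missing))
    return (
        f"{analysis_step.strip()}: new input and output files\n"
        "\n"
        f"Reason: {reason.strip()}\n"
        f"Changes: {changes.strip()}\n"
    )


# Hypothetical values, for illustration only.
message = data_commit_message(
    "Hypothetical analysis step 1",
    "corrected a data source",
    "updated the exported output files",
)
print(message)
```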
Committing changes to Excel analysis files
When changing Excel analysis files, make sure to document the changes both in the Excel file itself (on the Changelog sheet) and in the commit message. Commit one changed Excel file at a time. If you do not properly document your commit, your pull request will not be considered.
Consider the fall-out of your changes. Will more or fewer csv-files be exported to ETSource? Do new files need to be added, or superfluous files deleted, elsewhere?
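One way to answer the first question is to compare the exported CSV files before and after re-exporting. The sketch below assumes you keep a copy of the previous export to compare against; csv_names and export_fallout are hypothetical helpers, and the directories they walk are whatever export locations you use.

```python
from pathlib import Path


def csv_names(directory):
    """Return the paths, relative to directory, of all CSV files below it."""
    root = Path(directory)
    return {path.relative_to(root).as_posix() for path in root.rglob("*.csv")}


def export_fallout(before, after):
    """Compare two sets of exported CSV names and report what changed."""
    return {
        "added": sorted(after - before),    # files the changed analysis now exports
        "removed": sorted(before - after),  # files that are no longer exported
    }


# Usage with literal sets; in practice pass csv_names(old_dir) and csv_names(new_dir).
print(export_fallout({"a.csv", "b.csv"}, {"b.csv", "c.csv"}))
# {'added': ['c.csv'], 'removed': ['a.csv']}
```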
State the following in the commit message:
- Which Excel file you created or changed.
- What worksheets you made changes on.
- What kind of changes you made to the analysis.
After you commit changes on your local machine, you push your new_branch to GitHub as described in the walk-through. Your changes are then online and visible to others, but not yet merged. By following the walk-through, you will end up with a pull request that you assign to a Quintel team member.
Rules (Code of Conduct)
There are a couple of rules that you need to follow when collaborating with other people via Git. There are also some extra rules that apply to working on this repository:
- Every commit has to be well documented.
- Always use pull requests instead of just pushing your changes.
- Country-specific data is NEVER stored within an Excel analysis. More specifically, Excel files may not contain energy balances, autoproducer tables, technical specifications and dashboard inputs.
- Commit data changes separately from changes to Excel files. Only under certain conditions is it legitimate to change the calculations, text or formatting of an analysis.