# Introduction

Given the data downloaded from www.euets.info as a zip file, we extract it using the *ziploader* sub-module:

# Packages and options

In [1]:
import os
from pyeutl import download_data
from pyeutl.ziploader import (
    get_installations, get_accounts, get_transactions,
    get_compliance, get_account_holders) 

# Get Data

Download the data from euets.info. This has to be done only once:

In [2]:
fn_zip = "./eutl.zip"
if not os.path.exists(fn_zip):
    download_data(fn_out=fn_zip)

Get the installation data from the zip file. The function merges information from the look-up tables into the installation table:

In [3]:
df_inst = get_installations(fn_zip)
df_inst.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19624 entries, 0 to 19623
Data columns (total 39 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   id                      19624 non-null  object 
 1   name                    19624 non-null  object 
 2   tradingSystem_id        19624 non-null  object 
 3   registry_id             19624 non-null  object 
 4   activity_id             19624 non-null  int64  
 5   eprtrID                 18158 non-null  float64
 6   parentCompany           17127 non-null  object 
 7   subsidiaryCompany       5466 non-null   object 
 8   permitID                16483 non-null  object 
 9   designatorICAO          1675 non-null   float64
 10  monitoringID            1674 non-null   object 
 11  monitoringExpiry        0 non-null      float64
 12  monitoringFirstYear     1675 non-null   object 
 13  permitDateExpiry        5849 non-null   object 
 14  isAircraftOperator      19596 non-null
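Conceptually, merging a look-up table attaches a description to each id. A minimal pandas sketch with toy data, assuming a hypothetical look-up table `df_activity` with columns `id` and `activity`; it uses `Series.map`, which is one possible technique and not necessarily how pyeutl does it internally:

```python
import pandas as pd

# Toy installation table (hypothetical values); activity_id refers to a look-up table
df_inst_toy = pd.DataFrame({
    "id": ["FI_531", "NL_99", "XX_1"],
    "activity_id": [1, 2, 99],
})

# Hypothetical look-up table with activity descriptions
df_activity = pd.DataFrame({
    "id": [1, 2],
    "activity": ["Combustion installations", "Mineral oil refineries"],
})

# Map each activity_id to its description; ids missing from the look-up table become NaN
lookup = df_activity.set_index("id")["activity"]
df_inst_toy["activity"] = df_inst_toy["activity_id"].map(lookup)
print(df_inst_toy)
```

Unmatched ids (here `99`) end up as missing values rather than being dropped, so all installations are kept.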

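Get the account holder information and prefix all columns except `id` with `accountHolder`, so that they can be told apart from the account columns after merging: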

In [4]:
df_accountHolder = get_account_holders(fn_zip)
df_accountHolder = df_accountHolder.rename(columns={
    c: "accountHolder" + c.capitalize() for c in df_accountHolder.columns if c != "id"})
df_accountHolder.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20406 entries, 0 to 20405
Data columns (total 9 columns):
 #   Column                              Non-Null Count  Dtype 
---  ------                              --------------  ----- 
 0   id                                  20406 non-null  int64 
 1   accountHolderTradingsystem_id       20406 non-null  object
 2   accountHolderName                   20380 non-null  object
 3   accountHolderAddressmain            20319 non-null  object
 4   accountHolderPostalcode             20240 non-null  object
 5   accountHolderCity                   20336 non-null  object
 6   accountHolderLegalentityidentifier  2992 non-null   object
 7   accountHolderCountry_id             20291 non-null  object
 8   accountHolderCountry                20291 non-null  object
dtypes: int64(1), object(8)
memory usage: 1.4+ MB
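The renaming relies on `str.capitalize()`, which upper-cases the first character and lower-cases all remaining ones. That explains names such as `accountHolderTradingsystem_id` in the output above. A small stand-alone illustration, assuming (hypothetically) that the raw columns are camel case like `tradingSystem_id` in the installation table:

```python
# Hypothetical raw column names, assumed to be camel case like the other EUTL tables
columns = ["id", "tradingSystem_id", "name", "postalCode", "country_id"]

# Same comprehension as in the cell above: prefix everything except "id"
mapping = {c: "accountHolder" + c.capitalize() for c in columns if c != "id"}

# Alternative that keeps the camel case: only upper-case the first character
preserved = {c: "accountHolder" + c[:1].upper() + c[1:] for c in columns if c != "id"}

for old in mapping:
    print(f"{old:18} -> {mapping[old]:32} | {preserved[old]}")
```

Switching to the camel-case-preserving variant would change the resulting column names, so any later code referring to them would need to be adapted.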


Next, load the account data. The account dataframe includes the information from the related look-up tables. If dataframes with installation or account holder information are passed to the function, this information is also merged into the account table:

In [5]:
df_acc = get_accounts(fn_zip,
                      df_installation=df_inst[["id", "name", "activity_id", "activity", "activityCategory",
                                               "nace_id", "nace", "naceCategory"]],
                      df_accountHolder=df_accountHolder)
df_acc.info()
# export the list of all accounts with related installation and account holder information
# (writing .xlsx files requires an Excel engine such as openpyxl)
df_acc.to_excel("eutl_account_full_list.xlsx", index=False)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45532 entries, 0 to 45531
Data columns (total 36 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   id                                  45532 non-null  int64  
 1   tradingSystem_id                    44748 non-null  object 
 2   accountIDTransactions               33878 non-null  object 
 3   accountIDESD                        226 non-null    object 
 4   yearValid                           224 non-null    float64
 5   name                                44741 non-null  object 
 6   registry_id                         45532 non-null  object 
 7   accountHolder_id                    44744 non-null  float64
 8   accountType_id                      43283 non-null  object 
 9   isOpen                              44748 non-null  object 
 10  openingDate                         44486 non-null  object 
 11  closingDate                         27208
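Since both the account and the installation table have `id` and `name` columns, the installation columns have to be renamed before merging. A minimal pandas sketch with toy data, assuming the accounts link to installations through `installation_id` and that the installation name becomes `installationName` (as in the column list used for the transactions below); this illustrates a left join, not pyeutl's actual code:

```python
import pandas as pd

# Toy account table (hypothetical values); installation_id links to the installation table
df_acc_toy = pd.DataFrame({
    "id": [1407, 1854, 201],
    "name": ["Account A", "Account B", "Party account"],
    "installation_id": [None, "FI_531", None],
})

# Selected installation columns (toy subset of what is passed via df_installation)
df_inst_sel = pd.DataFrame({
    "id": ["FI_531"],
    "name": ["Olkiluodon ydinvoimalaitos"],
    "activity_id": [1],
})

# Rename to avoid clashing with the account's own "id" and "name" columns
df_inst_sel = df_inst_sel.rename(columns={"id": "installation_id", "name": "installationName"})

# Left join: accounts without an installation (e.g. administrative accounts) are kept
df_acc_merged = df_acc_toy.merge(df_inst_sel, on="installation_id", how="left")
print(df_acc_merged)
```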

Compliance data follow the same logic; passing the installation dataframe merges the installation information into the compliance table:

In [6]:
df_comp = get_compliance(fn_zip, df_installation=df_inst)
df_comp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 525010 entries, 0 to 525009
Data columns (total 55 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   id                      525010 non-null  object 
 1   installation_id         525010 non-null  object 
 2   year                    525010 non-null  int64  
 3   reportedInSystem_id     525010 non-null  object 
 4   euetsPhase              524786 non-null  object 
 5   compliance_id           231376 non-null  object 
 6   allocatedFree           225554 non-null  float64
 7   allocatedNewEntrance    6897 non-null    float64
 8   allocatedTotal          525010 non-null  int64  
 9   allocated10c            1548 non-null    float64
 10  verified                209363 non-null  float64
 11  verifiedCummulative     219836 non-null  float64
 12  verifiedUpdated         524786 non-null  object 
 13  surrendered             192145 non-null  float64
 14  balance             
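Many compliance columns are sparsely populated (e.g. `verified` has 209,363 non-null values out of 525,010 rows), so missing values deserve attention when aggregating. A minimal sketch with toy data, assuming one wants total verified emissions per year:

```python
import numpy as np
import pandas as pd

# Toy compliance rows (hypothetical values); missing verified emissions are NaN
df_comp_toy = pd.DataFrame({
    "installation_id": ["A", "B", "A", "B", "A"],
    "year": [2005, 2005, 2006, 2006, 2007],
    "verified": [100.0, np.nan, 120.0, 80.0, np.nan],
})

# groupby().sum() skips NaN; min_count=1 keeps a year NaN if it has no reported value at all
verified_by_year = df_comp_toy.groupby("year")["verified"].sum(min_count=1)
print(verified_by_year)
```

Without `min_count=1`, a year without any reported value would show `0.0`, which is easy to mistake for zero emissions.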

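Transactions follow the same logic and also include the descriptions from the look-up tables. In addition, we can pass a table with account information, which is merged into the transactions (see, e.g., the `acquiringRegistry` and `acquiringInstallationName` columns below). The transaction loader can also resample the transaction data to a coarser frequency, here month-end (`"ME"`):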

In [7]:
freq = "ME"
# get transactions merged with account information
cols = ["id", "name", "registry_id", "accountHolder_id", "accountType_id",
        "isRegisteredEutl", "installation_id", "registry", "accountType",
        "accountCategory", "installationName", "activity_id", "activity",
        "nace_id", "nace", "naceCategory"]
df_trans = get_transactions(fn_zip, drop=["project_id"], freq=freq,
                            df_account=df_acc[cols])
df_trans.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2016690 entries, 0 to 2016689
Data columns (total 45 columns):
 #   Column                           Dtype         
---  ------                           -----         
 0   date                             datetime64[ns]
 1   id                               int64         
 2   transactionID                    object        
 3   transactionTypeMain_id           int64         
 4   transactionTypeSupplementary_id  int64         
 5   transferringAccount_id           float64       
 6   acquiringAccount_id              float64       
 7   unitType_id                      object        
 8   tradingSystem_id                 object        
 9   acquiringYear                    float64       
 10  transferringYear                 float64       
 11  amount                           int64         
 12  transactionTypeMain              object        
 13  transactionTypeSupplementary     object        
 14  unitType                         o

In [8]:
df_trans

,date,id,transactionID,transactionTypeMain_id,transactionTypeSupplementary_id,transferringAccount_id,acquiringAccount_id,unitType_id,tradingSystem_id,acquiringYear,...,acquiringInstallation_id,acquiringRegistry,acquiringAccountType,acquiringAccountCategory,acquiringInstallationName,acquiringActivity_id,acquiringActivity,acquiringNace_id,acquiringNace,acquiringNaceCategory
0,2005-01-31,1480620,DK1,1,51,201.0,201.0,EUA2005,euets,,...,,Denmark,Party Holding Account,Administrative Account,,,,,,
1,2005-02-28,449,FI389,10,53,1407.0,1854.0,EUA2005,euets,,...,FI_531,Finland,Former Operator Holding Account,Operator Holding Account,Olkiluodon ydinvoimalaitos,1.0,Combustion installations with a rated thermal ...,35.00,"Electricity, gas, steam and air conditioning s...",Energy: General
2,2005-02-28,1137,FI125,10,53,1407.0,1539.0,EUA2005,euets,,...,FI_87,Finland,Former Operator Holding Account,Operator Holding Account,Centralstationens värmecentral,99.0,Other activity opted-in pursuant to Article 24...,35.11,Production of electricity,Energy: Electricity generation
3,2005-02-28,2691,NL59,10,53,1106.0,1257.0,EUA2005,euets,,...,NL_99,Netherlands,Former Operator Holding Account,Operator Holding Account,Shell Nederland Raffinaderij B.V.,2.0,Mineral oil refineries,19.20,Manufacture of refined petroleum products,"Manufacturing: Refineries, chemicals, and pha..."
4,2005-02-28,4986,FI326,10,53,1407.0,1781.0,EUA2005,euets,,...,FI_263,Finland,Former Operator Holding Account,Operator Holding Account,Mattilan lämpökeskus,99.0,Other activity opted-in pursuant to Article 24...,35.30,Steam and air conditioning supply,Energy: Steam and air conditioning supply
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2016685,2023-03-31,1876738,EU770612,10,17,7171817.0,7171851.0,AEA,esd,,...,,European Commission,EU Effort Sharing Account,,,,,,,
2016686,2023-03-31,1876739,EU770614,10,17,7171841.0,7171851.0,AEA,esd,,...,,European Commission,EU Effort Sharing Account,,,,,,,
2016687,2023-03-31,1876740,EU770615,10,17,7171833.0,7171851.0,AEA,esd,,...,,European Commission,EU Effort Sharing Account,,,,,,,
2016688,2023-03-31,1876741,EU770616,10,17,7171673.0,7171851.0,AEA,esd,,...,,European Commission,EU Effort Sharing Account,,,,,,,
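With `freq="ME"`, all dates in the table above are month ends (e.g. 2005-02-28). (`"ME"` is the pandas month-end alias; pandas versions before 2.2 use `"M"`.) The idea of bucketing transactions by month end can be sketched in plain Python. This is an illustration that assumes amounts are simply summed per month, using a hypothetical `(date, amount)` input format; it does not describe how `get_transactions` treats the other columns:

```python
import calendar
from collections import defaultdict
from datetime import date


def month_end(d: date) -> date:
    """Return the last calendar day of the month containing d."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return date(d.year, d.month, last_day)


def sum_by_month_end(transactions):
    """Sum transaction amounts per month-end date.

    transactions: iterable of (date, amount) pairs (hypothetical input format).
    """
    totals = defaultdict(int)
    for d, amount in transactions:
        totals[month_end(d)] += amount
    # Return the totals in chronological order
    return dict(sorted(totals.items()))


example = [
    (date(2005, 2, 3), 10),
    (date(2005, 2, 17), 5),
    (date(2005, 3, 1), 7),
]
print(sum_by_month_end(example))
```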


We can wrap all of this into a single function, e.g.:

In [9]:
def load_data(fn_zip, freq=None):
    """Load installation, account, compliance and transaction data from the zip file
    :param fn_zip: <string> name of zip file with eutl data
    :param freq: <string> frequency for resampling of transaction data
    :return: tuple of dataframes (installations, accounts, compliance, transactions)"""
    # get installations
    df_inst = get_installations(fn_zip)
    # get account holders and prefix their columns (except "id")
    df_accountHolder = get_account_holders(fn_zip)
    df_accountHolder = df_accountHolder.rename(columns={
        c: "accountHolder" + c.capitalize() for c in df_accountHolder.columns if c != "id"})
    # get accounts with selected installation and account holder information
    df_acc = get_accounts(fn_zip,
                          df_installation=df_inst[["id", "name", "activity_id", "activity", "activityCategory",
                                                   "nace_id", "nace", "naceCategory"]],
                          df_accountHolder=df_accountHolder)
    # get compliance data with installation information
    df_comp = get_compliance(fn_zip, df_installation=df_inst)
    # get transactions merged with account information
    cols = ["id", "name", "registry_id", "accountHolder_id", "accountType_id",
            "isRegisteredEutl", "installation_id", "registry", "accountType",
            "accountCategory", "installationName", "activity_id", "activity",
            "nace_id", "nace", "naceCategory"]
    df_trans = get_transactions(fn_zip, drop=["project_id"], freq=freq,
                                df_account=df_acc[cols])
    return df_inst, df_acc, df_comp, df_trans

df_inst, df_acc, df_comp, df_trans = load_data(fn_zip, freq=None)