# Introduction

Given the data downloaded from www.euets.info as a zip file, we extract the tables using the *ziploader* sub-module of *pyeutl*.

# Packages and options

In [1]:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from pyeutl.ziploader import (get_installations, get_accounts, get_transactions,
                              get_compliance) 

In [2]:
fn_zip = "./eutl_release_20220526.zip"
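
Before calling the loaders, it can help to confirm that the archive exists and to see which files it contains. The following is a minimal sketch using only the standard library; the helper name `list_zip_tables` is hypothetical and not part of *pyeutl*, and no assumption is made about which file names the release actually contains.

```python
import zipfile
from pathlib import Path


def list_zip_tables(fn_zip):
    """Return the sorted member names of a zip archive.

    Hypothetical helper for illustration; not part of pyeutl.
    """
    path = Path(fn_zip)
    if not path.is_file():
        raise FileNotFoundError(f"Zip file not found: {path}")
    with zipfile.ZipFile(path) as zf:
        return sorted(zf.namelist())
```

Calling `list_zip_tables(fn_zip)` returns a plain list of names, so a missing or misnamed archive shows up before any of the loaders run.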

# Get Data

Get the installation data from the zip file. The function merges information from the look-up tables into the installation table:

In [3]:
df_inst = get_installations(fn_zip)
df_inst.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17820 entries, 0 to 17819
Data columns (total 32 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id                   17820 non-null  object 
 1   name                 16204 non-null  object 
 2   registry_id          17820 non-null  object 
 3   activity_id          17820 non-null  int64  
 4   eprtrID              5403 non-null   object 
 5   parentCompany        6348 non-null   object 
 6   subsidiaryCompany    3725 non-null   object 
 7   permitID             16204 non-null  object 
 8   designatorICAO       618 non-null    object 
 9   monitoringID         1615 non-null   object 
 10  monitoringExpiry     0 non-null      float64
 11  monitoringFirstYear  1616 non-null   object 
 12  permitDateExpiry     4899 non-null   object 
 13  isAircraftOperator   17820 non-null  bool   
 14  ec748_2009Code       1615 non-null   object 
 15  permitDateEntry      16204 non-null 
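
To illustrate what merging a look-up table into the installation table means, here is a small sketch with toy data. It is not the *pyeutl* implementation: the look-up table layout, the variable names and all values are assumptions made for the example.

```python
import pandas as pd

# Toy installation table (made-up values).
df_installation = pd.DataFrame({
    "id": ["AT_1", "DE_7", "FR_3"],
    "activity_id": [1, 29, 1],
})

# Toy look-up table mapping activity ids to descriptions (made-up labels).
df_activity_lookup = pd.DataFrame({
    "id": [1, 29],
    "activity": ["activity A", "activity B"],
})

# A left merge keeps every installation and attaches the description.
df_merged = df_installation.merge(
    df_activity_lookup.rename(columns={"id": "activity_id"}),
    on="activity_id",
    how="left",
)
```

The result has one row per installation with the additional `activity` column, which is the pattern visible in the `df_inst.info()` output above (an id column alongside its description).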

Next, load the account data. The account dataframe includes the information from the related look-up tables. If a dataframe with installation information is passed to the function, that information is also merged into the account table:

In [4]:
df_acc = get_accounts(fn_zip,
                      df_installation=df_inst[["id", "name", "activity_id", "activity", "activityCategory",
                                               "nace_id", "nace", "naceCategory"]])
df_acc.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 42430 entries, 0 to 42429
Data columns (total 22 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   id                         42430 non-null  int64  
 1   name                       41607 non-null  object 
 2   registry_id                42430 non-null  object 
 3   accountHolder_id           41602 non-null  float64
 4   accountType_id             41579 non-null  object 
 5   isOpen                     41606 non-null  object 
 6   openingDate                41570 non-null  object 
 7   closingDate                26279 non-null  object 
 8   commitmentPeriod           40609 non-null  object 
 9   companyRegistrationNumber  37260 non-null  object 
 10  isRegisteredEutl           41606 non-null  object 
 11  installation_id            30614 non-null  object 
 12  registry                   42430 non-null  object 
 13  accountType                41579 non-null  obj
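
In the output above, `installation_id` is filled for only part of the accounts, so not every account is linked to an installation. The sketch below shows how such an optional link could be merged with a left join so that unlinked accounts are kept. It uses toy data and is an assumption about the general approach, not the code inside `get_accounts`; the rename to `installationName` mirrors a column name used later in this notebook.

```python
import pandas as pd

# Toy account table; the second account has no installation (made-up values).
df_acc_toy = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Account 1", "Account 2", "Account 3"],
    "installation_id": ["AT_1", None, "DE_7"],
})

# Toy installation table (made-up values).
df_inst_toy = pd.DataFrame({
    "id": ["AT_1", "DE_7"],
    "name": ["Plant A", "Plant B"],
    "activity_id": [1, 29],
})

# Rename the installation columns so they do not collide with account columns.
inst = df_inst_toy.rename(columns={"id": "installation_id",
                                   "name": "installationName"})

# Left join: every account is kept; validate guards against duplicated
# installation ids that would silently multiply account rows.
df_acc_merged = df_acc_toy.merge(inst, on="installation_id", how="left",
                                 validate="many_to_one")
```

Accounts without an installation simply carry missing values in the installation columns.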

Compliance data follow the same logic:

In [5]:
df_comp = get_compliance(fn_zip, df_installation=df_inst)
df_comp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 478005 entries, 0 to 478004
Data columns (total 46 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   id                    478005 non-null  object 
 1   installation_id       478005 non-null  object 
 2   year                  478005 non-null  int64  
 3   reportedInSystem      478005 non-null  object 
 4   euetsPhase            478005 non-null  object 
 5   compliance_id         209014 non-null  object 
 6   allocatedFree         223739 non-null  float64
 7   allocatedNewEntrance  6886 non-null    float64
 8   allocatedTotal        478005 non-null  int64  
 9   allocated10c          1548 non-null    float64
 10  verified              190430 non-null  float64
 11  verifiedCummulative   198601 non-null  float64
 12  verifiedUpdated       478005 non-null  bool   
 13  surrendered           174173 non-null  float64
 14  complianceCode        209014 non-null  object 
 15  

Transactions follow the same logic: the loader adds the descriptions from the look-up tables, and we can additionally pass a table with account information, which is merged into the transaction table. The transaction loader can also resample the transaction data to a given frequency (here monthly, `freq = "M"`):

In [6]:
freq = "M"
# get transactions merged with account information
cols = ['id', 'name', 'registry_id', 'accountHolder_id', 'accountType_id',
        'isRegisteredEutl', 'installation_id', 'registry', 'accountType', 
        'accountCategory', 'installationName', 'activity_id', 'activity', 
        'nace_id', 'nace', "naceCategory" ]
df_trans = get_transactions(fn_zip, drop=["project_id"], freq=freq,
                            df_account=df_acc[cols])
df_trans.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2080089 entries, 0 to 2080088
Data columns (total 42 columns):
 #   Column                           Dtype         
---  ------                           -----         
 0   date                             datetime64[ns]
 1   id                               int64         
 2   transactionID                    object        
 3   transactionTypeMain_id           int64         
 4   transactionTypeSupplementary_id  int64         
 5   transferringAccount_id           float64       
 6   acquiringAccount_id              float64       
 7   unitType_id                      object        
 8   amount                           int64         
 9   transactionTypeMain              object        
 10  transactionTypeSupplementary     object        
 11  unitType                         object        
 12  transferringName                 object        
 13  transferringRegistry_id          object        
 14  transferringAccountHolder_id     f
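
As a rough illustration of what resampling transactions to a monthly frequency can look like, the sketch below sums toy transaction amounts per month and account pair. This is an assumption about the general idea, not the implementation behind the `freq` argument of `get_transactions`; all values are made up.

```python
import pandas as pd

# Toy transactions with exact dates (made-up values).
df_toy = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-20",
                            "2021-02-03", "2021-02-10"]),
    "transferringAccount_id": [1, 1, 1, 2],
    "acquiringAccount_id": [2, 2, 2, 1],
    "amount": [100, 50, 30, 70],
})

# Map each date to its calendar month, then sum amounts per month
# and per (transferring, acquiring) account pair.
df_toy["month"] = df_toy["date"].dt.to_period("M")
keys = ["month", "transferringAccount_id", "acquiringAccount_id"]
df_monthly = df_toy.groupby(keys, as_index=False)["amount"].sum()
```

The two January transfers from account 1 to account 2 collapse into a single row, which reduces the number of rows while preserving the transferred totals.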

We can wrap all of this into a single function, e.g.:

In [7]:
def load_data(fn_zip, freq=None):
    """Load installation, account, compliance and transaction data from a zip file.

    :param fn_zip: <string> name of zip file with eutl data
    :param freq: <string> frequency for resampling of transaction data
    :return: tuple of dataframes (df_inst, df_acc, df_comp, df_trans)"""
    # get installations
    df_inst = get_installations(fn_zip)
    # get accounts with selected installation information
    df_acc = get_accounts(fn_zip, 
                          df_installation=df_inst[["id", "name", "activity_id", "activity", "activityCategory", 
                                                   "nace_id", "nace", "naceCategory"]]) 
    # get compliance data with installation information
    df_comp = get_compliance(fn_zip, df_installation=df_inst)    
    # get transactions merged with account information
    cols = ['id', 'name', 'registry_id', 'accountHolder_id', 'accountType_id',
            'isRegisteredEutl', 'installation_id', 'registry', 'accountType', 
            'accountCategory', 'installationName', 'activity_id', 'activity', 
            'nace_id', 'nace', "naceCategory" ]
    df_trans = get_transactions(fn_zip, drop=["project_id"], freq=freq, 
                          df_account=df_acc[cols])
    return df_inst, df_acc, df_comp, df_trans

df_inst, df_acc, df_comp, df_trans = load_data(fn_zip, freq=None)

# Export data

We export the compliance data in a long format, with one row per installation and year, aggregating out the system in which the data were reported:

In [12]:
idx_cols = ["installation_id", "name", "registry",
            "activity", "activityCategory",
            "nace_id", "nace", "naceCategory", "year"]
val_cols = ["allocatedFree", "allocatedNewEntrance",
            "allocatedTotal", "allocated10c",
            "verified", "surrendered"]
df_comp_out = df_comp.groupby(idx_cols, as_index=False)[val_cols].sum()
df_comp_out["link"] = df_comp_out.installation_id.map(lambda x: "https://euets.info/installation/%s" % x)
df_comp_out.to_csv("compliance_2022.csv", index=False)
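
The export above keeps `year` as a grouping key, so the CSV is in long format. Two points may be worth checking, sketched here on toy data with made-up values. First, pandas `groupby` drops rows whose key contains a missing value by default, so installations without a `name` would not appear in the result unless `dropna=False` is passed (available from pandas 1.1). Second, if years are wanted as columns, the aggregated table can be pivoted. Whether either step is needed for the real data is an assumption to verify.

```python
import pandas as pd

# Toy compliance data; installation DE_7 has no name (made-up values).
df_toy = pd.DataFrame({
    "installation_id": ["AT_1", "AT_1", "AT_1", "DE_7", "DE_7"],
    "name": ["Plant A", "Plant A", "Plant A", None, None],
    "year": [2020, 2020, 2021, 2020, 2021],
    "verified": [10.0, 5.0, 12.0, 7.0, 8.0],
})

keys = ["installation_id", "name", "year"]

# Default behaviour: groups with a missing key value are dropped.
agg_default = df_toy.groupby(keys, as_index=False)["verified"].sum()

# dropna=False keeps them as their own groups.
agg_keep = df_toy.groupby(keys, as_index=False, dropna=False)["verified"].sum()

# Long to wide: one row per installation, one column per year.
wide = agg_keep.pivot(index="installation_id", columns="year",
                      values="verified")
```

Here `agg_default` contains only the rows of the named installation, while `agg_keep` and `wide` cover both installations.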