# Monitoring of CO2 emissions from passenger cars, 2023 - Final
[2023 Dataset](https://www.eea.europa.eu/en/datahub/datahubitem-view/fa8b1229-3db6-495d-b18e-9c9b3267c02b?activeAccordion=)

|Name|Definition|Datatype|Cardinality|Relevance Comment|
|---|---|---|---|---|
|ID|Identification number.|integer|1..1|mapping/identification only|
|MS|Member state.|varchar(2)|0..1|only indirect influence?|
|Mp|Manufacturer pooling.|varchar(50)|0..1|mapping/identification only|
|VFN|Vehicle family identification number.|varchar(25)|0..1|mapping/identification only|
|Mh|Manufacturer name EU standard denomination.|varchar(50)|0..1|mapping/identification only|
|Man|Manufacturer name OEM declaration.|varchar(50)|0..1|mapping/identification only|
|MMS|Manufacturer name MS registry denomination.|varchar(125)|0..1|mapping/identification only|
|TAN|Type approval number.|varchar(50)|0..1|mapping/identification only|
|T|Type.|varchar(25)|0..1|mapping/identification only|
|Va|Variant.|varchar(25)|0..1|mapping/identification only|
|Ve|Version.|varchar(35)|0..1|mapping/identification only|
|Mk|Make.|varchar(25)|0..1|mapping/identification only|
|Cn|Commercial name.|varchar(50)|0..1|mapping/identification only|
|Ct|Category of the vehicle type approved.|varchar(5) |0..1|maybe correlated to fuel type or engine type?|
|Cr|Category of the vehicle registered.|varchar(5) |0..1|maybe correlated to fuel type or engine type?|
|M (kg)|Mass in running order (completed/complete vehicle).|integer|0..1|relevant?|
|Mt|WLTP test mass.|integer|0..1|relevant?|
|Enedc (g/km)|Specific CO2 Emissions (NEDC).|integer|0..1|older standard?|
|Ewltp (g/km)|Specific CO2 Emissions (WLTP).|integer|0..1|our target variable?|
|W (mm)|Wheel Base.|integer|0..1|potentially relevant (influence on size and weight?)|
|At1 (mm)|Axle width steering axle.|integer|0..1|potentially relevant (influence on size and weight?)|
|At2 (mm)|Axle width other axle.|integer|0..1|potentially relevant (influence on size and weight?)|
|Ft|Fuel type.|varchar(25)|0..1|highly relevant?|
|Fm|Fuel mode.|varchar(1) |0..1|relevant? (e.g. if hybrid)|
|Ec (cm3)|Engine capacity.|integer|0..1|relevant?|
|Ep (KW)|Engine power.|integer|0..1|relevant?|
|Z (Wh/km)|Electric energy consumption.|integer|0..1|tbd|
|IT|Innovative technology or group of innovative technologies.|varchar(25)|0..1|potentially relevant (influence of car characteristics, but maybe too superficial/complex)|
|Ernedc (g/km)|Emissions reduction through innovative technologies.|float|0..1|probably depends on the IT value, but focuses on emissions; relevant?|
|Erwltp (g/km)|Emissions reduction through innovative technologies (WLTP).|float|0..1|probably depends on the IT value, but focuses on emissions; relevant?|
|De|Deviation factor.|float|0..1|tbd|
|Vf|Verification factor.|integer|0..1|tbd|
|R|Total new registrations.|integer|0..1|tbd|
|Year|Reporting year.|integer|0..1|relevant?|
|Status|P = Provisional data, F = Final data.|varchar(1) |0..1|tbd|
|Version_file|Internal versioning of deliverables.|varchar(10)|0..1|tbd|
|E (g/km)|Specific CO2 Emission. Deprecated value, only relevant for data until 2016.|float|0..1|tbd|
|Er (g/km)|Emissions reduction through innovative technologies. Deprecated value, only relevant for data until 2016.|float|0..1|tbd|
|Zr|Electric range.|integer|0..1|tbd|
|Dr|Registration date.|date|0..1|tbd|
|Fc|Fuel consumption.|float|0..1|tbd|

## Basic Analysis of data

* Distribution
* Missing values
* Correlations
* ... TODO

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [4]:
df = pd.read_csv("files/2023-eea_europa_eu-CarsCO2.csv")
df.head(5)


| |ID|Country|VFN|Mp|Mh|Man|MMS|Tan|T|Va|...|Erwltp (g/km)|De|Vf|Status|year|Date of registration|Fuel consumption|ech|RLFI|Electric range (km)|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|132193881|DE|IP-MQB37SZ_A3_1036-WVW-1|VOLKSWAGEN|VOLKSWAGEN|VOLKSWAGEN AG|NaN|`E13*2007/46*1845*26`|A1|DXDBX0AC4|...|1.17|NaN|NaN|F|2023|2023-03-14|6.3|NaN|RL-MQ281_6F_20_001-WVW-1|NaN|
|1|132193882|DE|IP-03_356_0299-ZFA-1|STELLANTIS|STELLANTIS EUROPE|STELLANTIS EUROPE SPA|NaN|`E3*2007/46*0373*33`|356|HXS12|...|1.35|NaN|NaN|F|2023|2023-01-27|5.2|NaN|RL-03_BU_334_0112-1C4-1|NaN|
|2|132193883|DE|IP-MQB37SZ_A0_0564-WVW-1|VOLKSWAGEN|VOLKSWAGEN|VOLKSWAGEN AG|NaN|`E13*2007/46*1845*27`|A1|DLAAX0AE2|...|1.17|NaN|NaN|F|2023|2023-05-15|6.6|NaN|RL-MQ200_6F_18_019-WVW-1|NaN|
|3|132193884|DE|IP-0000667-WBA-1|BMW|BMW AG|BAYERISCHE MOTOREN WERKE AG|NaN|`E1*2007/46*2063*05`|FML2E|11DJ|...|NaN|NaN|NaN|F|2023|2023-11-10|NaN|NaN|RL-0100492-WBA-1|227.0|
|4|132193885|DE|IP-MEB31AZ_A0_1902-WVW-1|VOLKSWAGEN|VOLKSWAGEN|VOLKSWAGEN AG|NaN|`E1*2018/858*00004*12`|E2|4ACX1EBL1GX1|...|NaN|NaN|NaN|F|2023|2023-08-10|NaN|NaN|RL-EQ151_1K_21_001-WVW-1|491.0|
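The file holds about 10.7 million rows (see the `RangeIndex` reported by `df.info()`), so loading every column as generic `object` text can be memory-hungry. One possible mitigation, sketched below on a tiny in-memory stand-in, is to restrict the load with `usecols` and to read low-cardinality text columns as `category`. The sample rows and the choice of columns are assumptions made for illustration, not part of the original notebook.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV; the rows are invented for illustration.
csv_text = """ID,Country,Ft,Ewltp (g/km),Cn
1,DE,petrol,120.0,GOLF
2,DE,diesel,,A3
3,FR,electric,0.0,ID.3
"""

df_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["ID", "Country", "Ft", "Ewltp (g/km)"],   # skip columns that are not needed
    dtype={"Country": "category", "Ft": "category"},  # low-cardinality text columns
)

print(df_small.dtypes)
print(df_small.memory_usage(deep=True))
```

For the real file, the same keyword arguments would be passed to the existing `pd.read_csv("files/2023-eea_europa_eu-CarsCO2.csv")` call; which columns are worth keeping depends on the relevance notes in the table above.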


In [5]:
missing_counts = df.isna().sum()
print(missing_counts)

ID                             0
Country                        0
VFN                       130446
Mp                        960237
Mh                             0
Man                            0
MMS                     10734898
Tan                        32451
T                           5838
Va                         29555
Ve                         37864
Mk                           316
Cn                           387
Ct                         13212
Cr                             0
r                              0
m (kg)                       178
Mt                        161527
Enedc (g/km)            10734898
Ewltp (g/km)               13366
W (mm)                  10734898
At1 (mm)                10734898
At2 (mm)                10734898
Ft                             0
Fm                             0
ec (cm3)                 1670030
ep (KW)                    50683
z (Wh/km)                8298363
IT                       3748017
Ernedc (g/km)           10734898
Erwltp (g/

In [6]:
# Despite the name, this is a fraction between 0.0 and 1.0, not a percentage.
missing_percentage = missing_counts / len(df)
print(missing_percentage)

ID                      0.000000
Country                 0.000000
VFN                     0.012152
Mp                      0.089450
Mh                      0.000000
Man                     0.000000
MMS                     1.000000
Tan                     0.003023
T                       0.000544
Va                      0.002753
Ve                      0.003527
Mk                      0.000029
Cn                      0.000036
Ct                      0.001231
Cr                      0.000000
r                       0.000000
m (kg)                  0.000017
Mt                      0.015047
Enedc (g/km)            1.000000
Ewltp (g/km)            0.001245
W (mm)                  1.000000
At1 (mm)                1.000000
At2 (mm)                1.000000
Ft                      0.000000
Fm                      0.000000
ec (cm3)                0.155570
ep (KW)                 0.004721
z (Wh/km)               0.773027
IT                      0.349143
Ernedc (g/km)           1.000000
Erwltp (g/

In [7]:
# Drop every column whose fraction of missing values exceeds the threshold.
del_threshold = 0.7

# Boolean indexing on the Series replaces the explicit loop.
cols_to_be_dropped = missing_percentage[missing_percentage > del_threshold].index.tolist()

print(f"Removing: {cols_to_be_dropped}")
df = df.drop(columns=cols_to_be_dropped)

Removing: ['MMS', 'Enedc (g/km)', 'W (mm)', 'At1 (mm)', 'At2 (mm)', 'z (Wh/km)', 'Ernedc (g/km)', 'De', 'Vf', 'RLFI', 'Electric range (km)']
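The threshold-based cleanup can also be wrapped in a small helper so the same rule is reusable with other thresholds. The sketch below is one way to do it; the function name `drop_sparse_columns` and the toy data are hypothetical and only serve to show the mechanics.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """Return a copy of `frame` without columns whose missing fraction exceeds `threshold`."""
    # The mean of a boolean mask is the share of missing values per column.
    missing_fraction = frame.isna().mean()
    keep = missing_fraction[missing_fraction <= threshold].index
    return frame.loc[:, keep].copy()


# Toy data, invented for illustration; column names mirror the dataset.
toy = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "MMS": [np.nan, np.nan, np.nan, np.nan],      # 100 % missing -> dropped
        "IT": ["e1", np.nan, np.nan, "e2"],           # 50 % missing -> kept
        "Ewltp (g/km)": [120.0, 98.0, np.nan, 0.0],   # 25 % missing -> kept
    }
)

reduced = drop_sparse_columns(toy, threshold=0.7)
print(list(reduced.columns))  # ['ID', 'IT', 'Ewltp (g/km)']
```

Returning a copy leaves the input frame untouched, which makes it easy to compare several thresholds before committing to one.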


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10734898 entries, 0 to 10734897
Data columns (total 29 columns):
 #   Column                Dtype  
---  ------                -----  
 0   ID                    int64  
 1   Country               object 
 2   VFN                   object 
 3   Mp                    object 
 4   Mh                    object 
 5   Man                   object 
 6   Tan                   object 
 7   T                     object 
 8   Va                    object 
 9   Ve                    object 
 10  Mk                    object 
 11  Cn                    object 
 12  Ct                    object 
 13  Cr                    object 
 14  r                     int64  
 15  m (kg)                float64
 16  Mt                    float64
 17  Ewltp (g/km)          float64
 18  Ft                    object 
 19  Fm                    object 
 20  ec (cm3)              float64
 21  ep (KW)               float64
 22  IT                    object 
 23  Erwlt

In [None]:
# TODO: split the columns into categorical and numerical groups
# TODO: plot graphs for the analysis (distributions, correlations)
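A possible starting point for this open cell is sketched below: `select_dtypes` separates numerical from categorical columns, `describe` and `value_counts` summarise their distributions, and `corr` ranks the numerical features against `Ewltp (g/km)`. It runs on a small invented sample rather than on `df`, and treating `Ewltp (g/km)` as the target follows the tentative note in the table above, so this is an assumption rather than a settled decision.

```python
import pandas as pd

# Invented sample rows; the column names mirror those left after the cleanup step.
sample = pd.DataFrame(
    {
        "Ft": ["petrol", "diesel", "electric", "petrol"],
        "Country": ["DE", "DE", "FR", "FR"],
        "m (kg)": [1300.0, 1500.0, 1800.0, 1250.0],
        "ep (KW)": [81.0, 110.0, 150.0, 70.0],
        "Ewltp (g/km)": [125.0, 130.0, 0.0, 118.0],
    }
)

# Split the columns by dtype.
numerical_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Distribution summaries: descriptive statistics and category frequencies.
print(sample[numerical_cols].describe())
for col in categorical_cols:
    print(sample[col].value_counts(dropna=False))

# Pearson correlation of every numerical column with the assumed target.
corr = sample[numerical_cols].corr()
print(corr["Ewltp (g/km)"].sort_values(ascending=False))
```

On the full dataset, identifier-like numeric columns such as `ID` would probably be excluded before computing correlations, since they carry no physical meaning.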