# Electric Vehicle Specifications Dataset (2025)

This dataset provides a comprehensive collection of specifications and performance metrics for modern electric vehicles (EVs). It is designed to support researchers, analysts, students, and developers working on data science, machine learning, automotive market research, sustainability studies, or EV adoption analysis.

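1. Multi-task Model (Multi-task Learning)
Predict the driving range and the vehicle segment/class at the same time.

The model therefore produces both a numeric and a categorical output. In this notebook the two tasks are handled by two separate models: a regressor for range, whose prediction is then passed to the segment classifier as an extra feature.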


In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings 
warnings.filterwarnings("ignore")

pd.set_option("display.max_columns",100)
pd.set_option("display.width",100)
pd.set_option('display.float_format', lambda x: '%.3f' % x) 


In [2]:
df=pd.read_csv('electric_vehicles_spec_2025.csv.csv')

In [3]:
# EDA

In [4]:
df.head()

Unnamed: 0,brand,model,top_speed_kmh,battery_capacity_kWh,battery_type,number_of_cells,torque_nm,efficiency_wh_per_km,range_km,acceleration_0_100_s,fast_charging_power_kw_dc,fast_charge_port,towing_capacity_kg,cargo_volume_l,seats,drivetrain,segment,length_mm,width_mm,height_mm,car_body_type,source_url
0,Abarth,500e Convertible,155,37.8,Lithium-ion,192.0,235.0,156,225,7.0,67.0,CCS,0.0,185,4,FWD,B - Compact,3673,1683,1518,Hatchback,https://ev-database.org/car/1904/Abarth-500e-C...
1,Abarth,500e Hatchback,155,37.8,Lithium-ion,192.0,235.0,149,225,7.0,67.0,CCS,0.0,185,4,FWD,B - Compact,3673,1683,1518,Hatchback,https://ev-database.org/car/1903/Abarth-500e-H...
2,Abarth,600e Scorpionissima,200,50.8,Lithium-ion,102.0,345.0,158,280,5.9,79.0,CCS,0.0,360,5,FWD,JB - Compact,4187,1779,1557,SUV,https://ev-database.org/car/3057/Abarth-600e-S...
3,Abarth,600e Turismo,200,50.8,Lithium-ion,102.0,345.0,158,280,6.2,79.0,CCS,0.0,360,5,FWD,JB - Compact,4187,1779,1557,SUV,https://ev-database.org/car/3056/Abarth-600e-T...
4,Aiways,U5,150,60.0,Lithium-ion,,310.0,156,315,7.5,78.0,CCS,,496,5,FWD,JC - Medium,4680,1865,1700,SUV,https://ev-database.org/car/1678/Aiways-U5


In [5]:
df.shape

(478, 22)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 478 entries, 0 to 477
Data columns (total 22 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   brand                      478 non-null    object 
 1   model                      477 non-null    object 
 2   top_speed_kmh              478 non-null    int64  
 3   battery_capacity_kWh       478 non-null    float64
 4   battery_type               478 non-null    object 
 5   number_of_cells            276 non-null    float64
 6   torque_nm                  471 non-null    float64
 7   efficiency_wh_per_km       478 non-null    int64  
 8   range_km                   478 non-null    int64  
 9   acceleration_0_100_s       478 non-null    float64
 10  fast_charging_power_kw_dc  477 non-null    float64
 11  fast_charge_port           477 non-null    object 
 12  towing_capacity_kg         452 non-null    float64
 13  cargo_volume_l             477 non-null    object 

In [7]:
df.isnull().sum()

brand                          0
model                          1
top_speed_kmh                  0
battery_capacity_kWh           0
battery_type                   0
number_of_cells              202
torque_nm                      7
efficiency_wh_per_km           0
range_km                       0
acceleration_0_100_s           0
fast_charging_power_kw_dc      1
fast_charge_port               1
towing_capacity_kg            26
cargo_volume_l                 1
seats                          0
drivetrain                     0
segment                        0
length_mm                      0
width_mm                       0
height_mm                      0
car_body_type                  0
source_url                     0
dtype: int64

In [8]:
# Feature Engineering

The number_of_cells column has too many missing values (202 of 478 rows), so I drop it and start with range prediction.
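
The decision to drop a sparse column can be made explicit by computing the share of missing values per column. The sketch below uses a small toy frame with made-up values, and the 40% cutoff is a hypothetical choice, not one stated in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): one sparse column, one nearly complete column.
toy = pd.DataFrame({
    "number_of_cells": [192.0, np.nan, 102.0, np.nan, np.nan],
    "torque_nm": [235.0, 235.0, 345.0, np.nan, 310.0],
})

# Share of missing values per column, as a fraction between 0 and 1.
missing_share = toy.isnull().mean()

# Hypothetical cutoff: drop columns with more than 40% missing values.
threshold = 0.40
to_drop = missing_share[missing_share > threshold].index.tolist()
print(missing_share)
print("columns to drop:", to_drop)

# The same ratio for the real dataset, using the counts from df.isnull().sum().
print(f"number_of_cells missing share: {202 / 478:.1%}")
```

With the real counts, roughly 42% of number_of_cells is missing, while every other column is missing less than 6%.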

In [9]:
## Drop the number_of_cells column.
df = df.drop(columns=['number_of_cells'])


In [10]:
df.head()

Unnamed: 0,brand,model,top_speed_kmh,battery_capacity_kWh,battery_type,torque_nm,efficiency_wh_per_km,range_km,acceleration_0_100_s,fast_charging_power_kw_dc,fast_charge_port,towing_capacity_kg,cargo_volume_l,seats,drivetrain,segment,length_mm,width_mm,height_mm,car_body_type,source_url
0,Abarth,500e Convertible,155,37.8,Lithium-ion,235.0,156,225,7.0,67.0,CCS,0.0,185,4,FWD,B - Compact,3673,1683,1518,Hatchback,https://ev-database.org/car/1904/Abarth-500e-C...
1,Abarth,500e Hatchback,155,37.8,Lithium-ion,235.0,149,225,7.0,67.0,CCS,0.0,185,4,FWD,B - Compact,3673,1683,1518,Hatchback,https://ev-database.org/car/1903/Abarth-500e-H...
2,Abarth,600e Scorpionissima,200,50.8,Lithium-ion,345.0,158,280,5.9,79.0,CCS,0.0,360,5,FWD,JB - Compact,4187,1779,1557,SUV,https://ev-database.org/car/3057/Abarth-600e-S...
3,Abarth,600e Turismo,200,50.8,Lithium-ion,345.0,158,280,6.2,79.0,CCS,0.0,360,5,FWD,JB - Compact,4187,1779,1557,SUV,https://ev-database.org/car/3056/Abarth-600e-T...
4,Aiways,U5,150,60.0,Lithium-ion,310.0,156,315,7.5,78.0,CCS,,496,5,FWD,JC - Medium,4680,1865,1700,SUV,https://ev-database.org/car/1678/Aiways-U5


In [11]:
df.sample()

Unnamed: 0,brand,model,top_speed_kmh,battery_capacity_kWh,battery_type,torque_nm,efficiency_wh_per_km,range_km,acceleration_0_100_s,fast_charging_power_kw_dc,fast_charge_port,towing_capacity_kg,cargo_volume_l,seats,drivetrain,segment,length_mm,width_mm,height_mm,car_body_type,source_url
193,Lotus,Emeya S,250,98.9,Lithium-ion,710.0,198,520,4.2,240.0,CCS,2250.0,509,5,AWD,F - Luxury,5139,2005,1464,Liftback Sedan,https://ev-database.org/car/2142/Lotus-Emeya-S


In [12]:
# Convert cargo_volume_l to numeric: keep the first run of digits in each value.
df['cargo_volume_l'] = df['cargo_volume_l'].str.extract(r'(\d+)', expand=False).astype(float)
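
Digit extraction and strict numeric parsing treat mixed text differently, which matters for an object-typed column. The following sketch compares the two on hypothetical raw values (the strings are illustrative and not taken from the dataset).

```python
import pandas as pd

# Hypothetical raw values; the last two mimic a non-numeric and a missing entry.
raw = pd.Series(["185", "360", "10 boxes", None])

# Option 1: keep the first run of digits (the approach used in the cell above).
extracted = raw.str.extract(r"(\d+)", expand=False).astype(float)

# Option 2: accept only values that are numeric as a whole; everything else becomes NaN.
coerced = pd.to_numeric(raw, errors="coerce")

print(pd.DataFrame({"raw": raw, "extracted": extracted, "coerced": coerced}))
```

Extraction turns "10 boxes" into 10.0, which may not be a volume in litres, whereas coercion marks it as missing and leaves it to the imputation step.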


In [13]:
# Fill the remaining missing values with the column mean.
df['torque_nm'] = df['torque_nm'].fillna(df['torque_nm'].mean())
df['fast_charging_power_kw_dc'] = df['fast_charging_power_kw_dc'].fillna(df['fast_charging_power_kw_dc'].mean())
df['towing_capacity_kg'] = df['towing_capacity_kg'].fillna(df['towing_capacity_kg'].mean())
df['cargo_volume_l'] = df['cargo_volume_l'].fillna(df['cargo_volume_l'].mean())
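
The cell above computes the means on the full dataset before the train/test split. An alternative, sketched here on hypothetical toy data with scikit-learn's `SimpleImputer`, is to learn the means from the training rows only and reuse them for the test rows, so that no test-set information enters preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy data with gaps in two numeric columns.
toy = pd.DataFrame({
    "torque_nm": [235.0, 345.0, np.nan, 310.0, 440.0, 343.0, np.nan, 543.0],
    "towing_capacity_kg": [0.0, 0.0, 1600.0, np.nan, 2000.0, 1600.0, 1600.0, np.nan],
})

train, test = train_test_split(toy, test_size=0.25, random_state=42)

# Learn the column means from the training rows only, then reuse them for the test rows.
imputer = SimpleImputer(strategy="mean")
train_filled = pd.DataFrame(imputer.fit_transform(train), columns=toy.columns, index=train.index)
test_filled = pd.DataFrame(imputer.transform(test), columns=toy.columns, index=test.index)

print(imputer.statistics_)  # training-set means used for both splits
```

With only a handful of missing values per column the practical difference is likely small, but the pattern keeps the evaluation clean.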


In [14]:
# volume_mm3 (bounding-box volume) is, together with battery capacity, a key helper feature for range prediction.
df['volume_mm3'] = df['length_mm'] * df['width_mm'] * df['height_mm']


In [15]:
# Theoretical range in km: battery energy in Wh divided by consumption in Wh/km.
df['estimated_range'] = df['battery_capacity_kWh'] * 1000 / df['efficiency_wh_per_km']
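
As a worked example of the two engineered features, the arithmetic can be checked by hand for a single car. The input values below are those of the Abarth 500e Convertible from the `df.head()` output; the variable names are only for illustration.

```python
# Values for the Abarth 500e Convertible, taken from the df.head() output.
battery_capacity_kwh = 37.8
efficiency_wh_per_km = 156
length_mm, width_mm, height_mm = 3673, 1683, 1518

# kWh -> Wh (x 1000), then divide by consumption in Wh per km to get km.
estimated_range = battery_capacity_kwh * 1000 / efficiency_wh_per_km

# Bounding-box volume in cubic millimetres, plus the same value in cubic metres.
volume_mm3 = length_mm * width_mm * height_mm
volume_m3 = volume_mm3 / 1e9

print(f"estimated_range = {estimated_range:.3f} km")
print(f"volume_mm3 = {volume_mm3} ({volume_m3:.2f} m^3)")
```

The theoretical value of about 242 km is higher than the listed range_km of 225 for this car, so estimated_range is a strong hint for the model rather than a copy of the target.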


In [16]:
df.head()

Unnamed: 0,brand,model,top_speed_kmh,battery_capacity_kWh,battery_type,torque_nm,efficiency_wh_per_km,range_km,acceleration_0_100_s,fast_charging_power_kw_dc,fast_charge_port,towing_capacity_kg,cargo_volume_l,seats,drivetrain,segment,length_mm,width_mm,height_mm,car_body_type,source_url,volume_mm3,estimated_range
0,Abarth,500e Convertible,155,37.8,Lithium-ion,235.0,156,225,7.0,67.0,CCS,0.0,185.0,4,FWD,B - Compact,3673,1683,1518,Hatchback,https://ev-database.org/car/1904/Abarth-500e-C...,9383758362,242.308
1,Abarth,500e Hatchback,155,37.8,Lithium-ion,235.0,149,225,7.0,67.0,CCS,0.0,185.0,4,FWD,B - Compact,3673,1683,1518,Hatchback,https://ev-database.org/car/1903/Abarth-500e-H...,9383758362,253.691
2,Abarth,600e Scorpionissima,200,50.8,Lithium-ion,345.0,158,280,5.9,79.0,CCS,0.0,360.0,5,FWD,JB - Compact,4187,1779,1557,SUV,https://ev-database.org/car/3057/Abarth-600e-S...,11597583861,321.519
3,Abarth,600e Turismo,200,50.8,Lithium-ion,345.0,158,280,6.2,79.0,CCS,0.0,360.0,5,FWD,JB - Compact,4187,1779,1557,SUV,https://ev-database.org/car/3056/Abarth-600e-T...,11597583861,321.519
4,Aiways,U5,150,60.0,Lithium-ion,310.0,156,315,7.5,78.0,CCS,1052.261,496.0,5,FWD,JC - Medium,4680,1865,1700,SUV,https://ev-database.org/car/1678/Aiways-U5,14837940000,384.615


In [17]:
df.tail()

Unnamed: 0,brand,model,top_speed_kmh,battery_capacity_kWh,battery_type,torque_nm,efficiency_wh_per_km,range_km,acceleration_0_100_s,fast_charging_power_kw_dc,fast_charge_port,towing_capacity_kg,cargo_volume_l,seats,drivetrain,segment,length_mm,width_mm,height_mm,car_body_type,source_url,volume_mm3,estimated_range
473,Zeekr,7X Premium RWD,210,71.0,Lithium-ion,440.0,148,365,6.0,240.0,CCS,2000.0,539.0,5,RWD,JD - Large,4787,1930,1650,SUV,https://ev-database.org/car/3081/Zeekr-7X-Prem...,15244201500,479.73
474,Zeekr,X Core RWD (MY25),190,49.0,Lithium-ion,343.0,148,265,5.9,70.0,CCS,1600.0,362.0,5,RWD,JB - Compact,4432,1836,1566,SUV,https://ev-database.org/car/3197/Zeekr-X-Core-RWD,12742780032,331.081
475,Zeekr,X Long Range RWD (MY25),190,65.0,Lithium-ion,343.0,146,360,5.6,114.0,CCS,1600.0,362.0,5,RWD,JB - Compact,4432,1836,1566,SUV,https://ev-database.org/car/3198/Zeekr-X-Long-...,12742780032,445.205
476,Zeekr,X Privilege AWD (MY25),190,65.0,Lithium-ion,543.0,153,350,3.8,114.0,CCS,1600.0,362.0,5,AWD,JB - Compact,4432,1836,1566,SUV,https://ev-database.org/car/3199/Zeekr-X-Privi...,12742780032,424.837
477,firefly,,150,41.2,Lithium-ion,200.0,125,250,8.1,65.0,CCS,0.0,404.0,5,RWD,B - Compact,4003,1885,1557,Hatchback,https://ev-database.org/car/3178/firefly-firefly,11748584835,329.6


In [18]:
df.sample()

Unnamed: 0,brand,model,top_speed_kmh,battery_capacity_kWh,battery_type,torque_nm,efficiency_wh_per_km,range_km,acceleration_0_100_s,fast_charging_power_kw_dc,fast_charge_port,towing_capacity_kg,cargo_volume_l,seats,drivetrain,segment,length_mm,width_mm,height_mm,car_body_type,source_url,volume_mm3,estimated_range
14,Audi,A6 Sportback e-tron performance,210,94.9,Lithium-ion,565.0,141,610,5.4,200.0,CCS,2100.0,502.0,5,RWD,JE - Executive,4928,1923,1455,Liftback Sedan,https://ev-database.org/car/2270/Audi-A6-Sport...,13788371520,673.05


In [19]:
# Convert the categorical variables to numeric indicator columns.
df = pd.get_dummies(df, columns=['drivetrain', 'car_body_type'], drop_first=True)
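
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a toy column that uses the three drivetrain values seen in this data (the rows themselves are made up).

```python
import pandas as pd

# Hypothetical toy column with the three drivetrain values seen in the data.
toy = pd.DataFrame({"drivetrain": ["AWD", "FWD", "RWD", "FWD"]})

encoded = pd.get_dummies(toy, columns=["drivetrain"], drop_first=True)
print(encoded)
# AWD is the dropped baseline: a row with both indicator columns off is an AWD car.
```

The first category in sorted order is dropped, which is why the later feature list contains drivetrain_FWD and drivetrain_RWD but no drivetrain_AWD.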


In [21]:
# Separate the target from the features.
target = 'range_km'

features = [
    'top_speed_kmh', 'battery_capacity_kWh', 'torque_nm',
    'efficiency_wh_per_km', 'acceleration_0_100_s', 'fast_charging_power_kw_dc',
    'towing_capacity_kg', 'cargo_volume_l', 'seats',
    'volume_mm3', 'estimated_range'
] + [col for col in df.columns if col.startswith('drivetrain_') or col.startswith('car_body_type_')]

x = df[features]
y = df[target]


In [23]:
#Train-Test Split
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)


## Goal: Predict the driving range

In [24]:
# Scaling: fit on the training set only, then apply to the test set.
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)


In [26]:
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

xgb_model = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=5, random_state=42)
xgb_model.fit(x_train_scaled, y_train)

y_pred_xgb = xgb_model.predict(x_test_scaled)

# Performance metrics
mae_xgb = mean_absolute_error(y_test, y_pred_xgb)
# RMSE as the square root of MSE (the squared=False argument is deprecated since scikit-learn 1.4)
rmse_xgb = mean_squared_error(y_test, y_pred_xgb) ** 0.5
r2_xgb = r2_score(y_test, y_pred_xgb)

print(f"XGBoost MAE: {mae_xgb:.2f} km")
print(f"XGBoost RMSE: {rmse_xgb:.2f} km")
print(f"XGBoost R2: {r2_xgb:.3f}")


XGBoost MAE: 12.47 km
XGBoost RMSE: 16.81 km
XGBoost R2: 0.973


In [27]:
from sklearn.ensemble import RandomForestRegressor

rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(x_train_scaled, y_train)

y_pred_rf = rf_model.predict(x_test_scaled)

# Performance metrics
mae_rf = mean_absolute_error(y_test, y_pred_rf)
# RMSE as the square root of MSE (the squared=False argument is deprecated since scikit-learn 1.4)
rmse_rf = mean_squared_error(y_test, y_pred_rf) ** 0.5
r2_rf = r2_score(y_test, y_pred_rf)

print(f"Random Forest MAE: {mae_rf:.2f} km")
print(f"Random Forest RMSE: {rmse_rf:.2f} km")
print(f"Random Forest R2: {r2_rf:.3f}")


Random Forest MAE: 15.70 km
Random Forest RMSE: 21.87 km
Random Forest R2: 0.955


Conclusion: Random Forest also performs well, but XGBoost Regressor is better on every metric (MAE 12.47 vs 15.70 km, RMSE 16.81 vs 21.87 km, R2 0.973 vs 0.955). Its lower RMSE in particular suggests that XGBoost makes fewer large errors, so it is the model chosen for range prediction. The low test RMSE is a good sign, but it does not by itself rule out overfitting; that requires comparing training and test errors, which this notebook does not report.
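
One common way to look for overfitting is to compute the same metric on the training split and on the held-out split and compare them. The sketch below does this on synthetic stand-in data with a Random Forest; the data, sizes, and variable names are assumptions for illustration and are not the notebook's EV data or its XGBoost model.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the EV dataset).
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# RMSE on the data the model was fitted on versus held-out data.
rmse_train = mean_squared_error(y_train, model.predict(X_train)) ** 0.5
rmse_test = mean_squared_error(y_test, model.predict(X_test)) ** 0.5

print(f"train RMSE: {rmse_train:.2f}")
print(f"test RMSE:  {rmse_test:.2f}")
print(f"test/train ratio: {rmse_test / rmse_train:.2f}")
```

Tree ensembles almost always fit the training data more closely than the test data, so some gap is expected; a test error many times larger than the training error is the pattern that would point to overfitting.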



## Goal: Predict the vehicle segment/class

In [39]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix


In [40]:
df['predicted_range_km'] = xgb_model.predict(scaler.transform(df[features]))


In [41]:
le = LabelEncoder()
df['segment_encoded'] = le.fit_transform(df['segment'])
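
For readers new to `LabelEncoder`, this small sketch shows the mapping it builds and the round trip back to the original strings. The labels are a few segment names from the dataset preview; the list itself is made up.

```python
from sklearn.preprocessing import LabelEncoder

# A few segment labels taken from the dataset preview.
segments = ["JC - Medium", "B - Compact", "F - Luxury", "B - Compact"]

encoder = LabelEncoder()
codes = encoder.fit_transform(segments)

print(list(encoder.classes_))                  # classes in sorted order
print(list(codes))                             # integer code per row
print(list(encoder.inverse_transform(codes)))  # back to the original strings
```

The integer codes follow the sorted order of the class names and carry no ranking, which is fine for a target variable in a classifier.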


In [42]:
features_classification = [
    'top_speed_kmh', 'battery_capacity_kWh', 'torque_nm',
    'efficiency_wh_per_km', 'acceleration_0_100_s',
    'fast_charging_power_kw_dc', 'towing_capacity_kg',
    'cargo_volume_l', 'seats', 'volume_mm3',
    'estimated_range', 'predicted_range_km'
] + [col for col in df.columns if col.startswith('drivetrain_') or col.startswith('car_body_type_')]


In [43]:
x = df[features_classification]
y = df['segment_encoded']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
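
With many segments and a small dataset, a plain random split can leave some classes thinly represented in the test set. A possible refinement, sketched here on hypothetical imbalanced labels, is the `stratify` argument of `train_test_split`; it is not used in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 rows of class 0 and 20 rows of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

# stratify=y keeps the 80/20 class ratio in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(np.bincount(y_train))  # class counts in the training split
print(np.bincount(y_test))   # class counts in the test split
```

Stratification requires at least two rows per class, so very rare segments would have to be merged or removed first.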


In [44]:
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)


In [45]:
# Class indices that actually occur in the test set
unique_labels = np.unique(y_test)

# Original (string) names of those classes
target_names = le.inverse_transform(unique_labels)

# Build the report
print("Classification Report:\n")
print(classification_report(y_test, y_pred, labels=unique_labels, target_names=target_names))

# Optional: confusion matrix
print("\nConfusion Matrix:\n")
print(confusion_matrix(y_test, y_pred, labels=unique_labels))


Classification Report:

                   precision    recall  f1-score   support

      B - Compact       1.00      1.00      1.00         8
       C - Medium       1.00      1.00      1.00         5
        D - Large       1.00      1.00      1.00         8
    E - Executive       0.83      0.83      0.83         6
       F - Luxury       0.88      0.88      0.88         8
     JB - Compact       1.00      1.00      1.00         7
      JC - Medium       1.00      0.95      0.97        19
       JD - Large       0.79      1.00      0.88        11
   JE - Executive       1.00      0.71      0.83         7
      JF - Luxury       1.00      1.00      1.00         6
N - Passenger Van       1.00      1.00      1.00        11

         accuracy                           0.95        96
        macro avg       0.95      0.94      0.94        96
     weighted avg       0.95      0.95      0.95        96


Confusion Matrix:

[[ 8  0  0  0  0  0  0  0  0  0  0]
 [ 0  5  0  0  0  0  0  0  0  0  

Conclusion: The Random Forest classifier for vehicle class (segment) reaches 95% accuracy on the 96-vehicle test set. B - Compact, C - Medium, D - Large, JB - Compact, JF - Luxury and N - Passenger Van are classified with precision and recall of 1.00. Most of the confusion involves the Executive classes (E - Executive F1 0.83, JE - Executive recall 0.71), with JD - Large (precision 0.79) and F - Luxury (F1 0.88) also affected, most likely because these segments have similar battery, volume and performance values. The weighted-average F1 score is 0.95 (macro average 0.94), which makes the model a strong baseline for segment prediction.
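
The report lists both a macro and a weighted average, and the two differ when classes have unequal support. The following sketch reproduces the distinction on a tiny set of hypothetical labels.

```python
from sklearn.metrics import f1_score

# Hypothetical labels: class 0 has four samples, class 1 has two.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]

per_class = f1_score(y_true, y_pred, average=None)          # one F1 per class
macro_f1 = f1_score(y_true, y_pred, average="macro")        # unweighted mean over classes
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # mean weighted by class support

print("per class:", per_class.round(3))
print(f"macro: {macro_f1:.3f}  weighted: {weighted_f1:.3f}")
```

The macro average treats every class equally, so it drops more when a small class is misclassified; the weighted average is pulled toward the larger classes.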

In [46]:
# Take one vehicle from the test set as an example (from ChatGPT)
sample_index = 10  # any positional index into the test set
sample_input = x_test.iloc[[sample_index]]  # double brackets keep the input 2D


In [47]:
true_label_encoded = y_test.iloc[sample_index]
true_label_str = le.inverse_transform([true_label_encoded])[0]

print(f"True segment: {true_label_str}")


True segment: JE - Executive


In [48]:
# From ChatGPT: predict the segment for the sampled vehicle
pred_encoded = clf.predict(sample_input)[0]
pred_str = le.inverse_transform([pred_encoded])[0]

print(f"Predicted segment: {pred_str}")


Predicted segment: JE - Executive


In [50]:
print(f" Vehicle index: {sample_index}")
print(f" True class:       {true_label_str}")
print(f" Predicted class:  {pred_str}")


 Vehicle index: 10
 True class:       JE - Executive
 Predicted class:  JE - Executive


In [51]:
import joblib

joblib.dump(xgb_model, "xgb_model.pkl")
joblib.dump(rf_model, "rf_model.pkl")
joblib.dump(scaler, "scaler.pkl")
joblib.dump(le, "label_encoder.pkl")


['label_encoder.pkl']

In [52]:
from sklearn.ensemble import RandomForestClassifier
import joblib

# Correct model type: a CLASSIFIER (the rf_model saved in the previous cell was the regressor)
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(x_train, y_train)

# Save it under the same file name, overwriting the regressor
joblib.dump(rf_model, "rf_model.pkl")


['rf_model.pkl']
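
A quick way to confirm that a saved file holds the intended model is to load it back and compare predictions. This is a minimal sketch on synthetic stand-in data, writing to a temporary directory so that it does not touch the files saved above; the data and the small forest size are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption; not the EV dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf_model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
same_predictions = (restored.predict(X) == model.predict(X)).all()
print("type:", type(restored).__name__)
print("round trip ok:", same_predictions)
```

Checking the type of the restored object would also have caught the regressor/classifier mix-up that the cell above corrects.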

In [53]:
x_train.columns

Index(['top_speed_kmh', 'battery_capacity_kWh', 'torque_nm', 'efficiency_wh_per_km',
       'acceleration_0_100_s', 'fast_charging_power_kw_dc', 'towing_capacity_kg', 'cargo_volume_l',
       'seats', 'volume_mm3', 'estimated_range', 'predicted_range_km', 'drivetrain_FWD',
       'drivetrain_RWD', 'car_body_type_Coupe', 'car_body_type_Hatchback',
       'car_body_type_Liftback Sedan', 'car_body_type_SUV', 'car_body_type_Sedan',
       'car_body_type_Small Passenger Van', 'car_body_type_Station/Estate'],
      dtype='object')
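
The saved classifier expects exactly these columns in this order. When scoring a new vehicle, one way to guarantee that layout is to reindex the input against the training columns. The sketch below uses a shortened, hypothetical subset of the column list and a made-up vehicle.

```python
import pandas as pd

# A shortened, hypothetical subset of the training columns listed above.
train_columns = ["battery_capacity_kWh", "seats", "drivetrain_FWD", "drivetrain_RWD"]

# A new vehicle described with only the fields that apply to it, in arbitrary order.
new_vehicle = {"drivetrain_RWD": 1, "seats": 5, "battery_capacity_kWh": 65.0}

# Reorder to the training layout; indicator columns that are absent become 0.
row = pd.DataFrame([new_vehicle]).reindex(columns=train_columns, fill_value=0)
print(row)
```

Note that `fill_value=0` also fills any missing numeric feature with 0 without warning, so in practice it is safer to check that all non-indicator columns are present before reindexing.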