<a href="https://colab.research.google.com/github/elemnurguner/data-ai-projects/blob/main/%C4%B0kinciElAra%C3%A7FiyatTahmini(Used_Car_Price_Prediction).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

🚗 İkinci El Araç Fiyat Tahmini | Used Car Price Prediction
Makine Öğrenmesi ile İkinci El Araç Fiyat Tahmin Modeli
Used Car Price Prediction Model with Machine Learning

📌 Proje Özeti | Project Overview
Bu proje, ikinci el araçların özelliklerine göre fiyat tahmini yapan bir makine öğrenmesi modelini içerir. Kaggle Cars24 veri seti kullanılarak geliştirilmiştir.

This project contains a machine learning model that predicts used car prices based on their features. Developed using the Kaggle Cars24 dataset.

🌟 Öne Çıkan Özellikler | Key Features
Veri Ön İşleme

Eksik veri tamamlama

Kategorik değişken dönüşümü (One-Hot Encoding)

Aykırı değer temizleme

Özellik Mühendisliği

Araç yaşı hesaplama (2024 - Year)

Yıllık ortalama km (km_per_year)

Model Pipeline

Standart ölçeklendirme (StandardScaler)

XGBoost Regressor ile optimizasyon

Hiperparametre Ayarı

RandomizedSearchCV ile en iyi parametrelerin bulunması

🛠️ Kullanılan Teknolojiler | Technologies Used
Python 3.11

Kütüphaneler: Pandas, NumPy, Scikit-learn, XGBoost, Matplotlib, Seaborn

Araçlar: Google Colab, Jupyter Notebook



1. Veri Yükleme ve Keşifçi Veri Analizi (EDA)


In [7]:
import pandas as pd
import numpy as np

# Veriyi yükle
df = pd.read_csv("train-data.csv")

# 1.1. Birimleri Kaldırma ve Sayısala Çevirme
def clean_numeric(col, unit):
    return pd.to_numeric(
        df[col].str.replace(unit, '', regex=False).str.strip(),
        errors='coerce'
    )

df['Mileage'] = clean_numeric('Mileage', 'kmpl|km/kg')  # Hem kmpl hem km/kg için
df['Engine'] = clean_numeric('Engine', ' CC')
df['Power'] = clean_numeric('Power', ' bhp')

# 1.2. Eksik Veri Doldurma
df['Mileage'].fillna(df['Mileage'].median(), inplace=True)
df['Engine'].fillna(df['Engine'].median(), inplace=True)
df['Power'].fillna(df['Power'].median(), inplace=True)
df['Seats'].fillna(df['Seats'].mode()[0], inplace=True)

# 1.3. Yeni Özellikler
df['car_age'] = 2024 - df['Year']
df['km_per_year'] = df['Kilometers_Driven'] / df['car_age']

# 1.4. Aykırı Değer Temizleme (Price için)
Q1 = df['Price'].quantile(0.25)
Q3 = df['Price'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['Price'] >= (Q1 - 1.5*IQR)) & (df['Price'] <= (Q3 + 1.5*IQR))]

# 1.5. Kategorik Değişkenler
cat_cols = ['Fuel_Type', 'Transmission', 'Owner_Type', 'Location']
df = pd.get_dummies(df, columns=cat_cols, drop_first=True)

# 1.6. Gereksiz Sütunları Çıkar
df = df.drop(['Unnamed: 0', 'Name', 'New_Price'], axis=1)

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df['Mileage'].fillna(df['Mileage'].median(), inplace=True)
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df['Engine'].fillna(df['Engine'].median(), inplace=True)
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which

In [9]:
from sklearn.model_selection import train_test_split

X = df.drop('Price', axis=1)
y = df['Price']

# Eğer ayrılmış train-test dosyanız varsa:
# train_df = pd.read_csv("train.csv")
# test_df = pd.read_csv("test.csv")
# X_train, X_test = train_df.drop('Price', axis=1), test_df.drop('Price', axis=1)
# y_train, y_test = train_df['Price'], test_df['Price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [10]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, r2_score

numeric_features = ['Year', 'Kilometers_Driven', 'Engine', 'Power', 'Seats', 'car_age', 'km_per_year']

model = Pipeline([
    ('scaler', StandardScaler()),
    ('xgb', XGBRegressor(
        n_estimators=300,
        max_depth=7,
        learning_rate=0.05,
        subsample=0.8,
        random_state=42
    ))
])

model.fit(X_train, y_train)

# Test Performansı
y_pred = model.predict(X_test)
print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f} Lakh")
print(f"R2: {r2_score(y_test, y_pred):.2f}")

  updated_mean = (last_sum + new_sum) / updated_sample_count
  T = new_sum / new_sample_count
  new_unnormalized_variance -= correction**2 / new_sample_count


MAE: 0.82 Lakh
R2: 0.89


In [11]:
from sklearn.model_selection import RandomizedSearchCV

param_grid = {
    'xgb__n_estimators': [200, 300, 400],
    'xgb__max_depth': [5, 7, 9],
    'xgb__learning_rate': [0.03, 0.05, 0.07],
}

search = RandomizedSearchCV(
    model,
    param_grid,
    cv=3,
    scoring='r2',
    n_iter=10,
    random_state=42
)

search.fit(X_train, y_train)
print("En İyi Parametreler:", search.best_params_)

  updated_mean = (last_sum + new_sum) / updated_sample_count
  T = new_sum / new_sample_count
  new_unnormalized_variance -= correction**2 / new_sample_count
  updated_mean = (last_sum + new_sum) / updated_sample_count
  T = new_sum / new_sample_count
  new_unnormalized_variance -= correction**2 / new_sample_count
  updated_mean = (last_sum + new_sum) / updated_sample_count
  T = new_sum / new_sample_count
  new_unnormalized_variance -= correction**2 / new_sample_count
  updated_mean = (last_sum + new_sum) / updated_sample_count
  T = new_sum / new_sample_count
  new_unnormalized_variance -= correction**2 / new_sample_count
  updated_mean = (last_sum + new_sum) / updated_sample_count
  T = new_sum / new_sample_count
  new_unnormalized_variance -= correction**2 / new_sample_count
  updated_mean = (last_sum + new_sum) / updated_sample_count
  T = new_sum / new_sample_count
  new_unnormalized_variance -= correction**2 / new_sample_count
  updated_mean = (last_sum + new_sum) / updated_samp

En İyi Parametreler: {'xgb__n_estimators': 400, 'xgb__max_depth': 5, 'xgb__learning_rate': 0.05}


In [12]:
new_car = {
    'Year': 2019,
    'Kilometers_Driven': 45000,
    'Fuel_Type_Diesel': 1,
    'Transmission_Manual': 1,
    'Owner_Type_First Owner': 1,
    'Engine': 1498,
    'Power': 110,
    'Seats': 5,
    'car_age': 5,
    'km_per_year': 9000,
    'Location_Mumbai': 1
}

new_df = pd.DataFrame([new_car])

# Eksik sütunları tamamla
missing_cols = set(X_train.columns) - set(new_df.columns)
for col in missing_cols:
    new_df[col] = 0

pred_price = search.best_estimator_.predict(new_df[X_train.columns])
print(f"Tahmini Fiyat: ₹{pred_price[0]*100000:.2f}")  # Lakh to Rupees

Tahmini Fiyat: ₹964758.00


Veri Seti Yapısı Kontrolü:



In [14]:
# Eğitim verisinin son halini göster
print("\nEğitim verisi örnek satır:")
print(X_train.iloc[0])

# Yeni veriyi karşılaştır
print("\nYeni veri:")
print(new_df)


Eğitim verisi örnek satır:
Year                                2013
Kilometers_Driven                  38998
Mileage                              NaN
Engine                            1995.0
Power                              181.0
Seats                                4.0
car_age                               11
km_per_year                  3545.272727
Fuel_Type_Diesel                    True
Fuel_Type_Electric                 False
Fuel_Type_LPG                      False
Fuel_Type_Petrol                   False
Transmission_Manual                False
Owner_Type_Fourth & Above          False
Owner_Type_Second                  False
Owner_Type_Third                   False
Location_Bangalore                 False
Location_Chennai                   False
Location_Coimbatore                False
Location_Delhi                     False
Location_Hyderabad                 False
Location_Jaipur                    False
Location_Kochi                     False
Location_Kolkata             