## Car Price Prediction

In this notebook, we build a model to predict the **price of a car** based on:  
- **Age** (years of use)  
- **Mileage** (total miles driven)  
- **Accident history** (whether the car had an accident or not)  

This is a **regression task**, since the target variable (price) is continuous.  
The goal is to understand how these features influence a car's value and to build a model that can estimate the price of cars it has not seen before.

In [117]:
# Importing libraries
import pandas as pd
import torch
from torch import nn

In [118]:
# Importing dataframe
df = pd.read_csv("./data/used_cars.csv")

df

Unnamed: 0,brand,model,model_year,milage,fuel_type,engine,transmission,ext_col,int_col,accident,clean_title,price
0,Ford,Utility Police Interceptor Base,2013,"51,000 mi.",E85 Flex Fuel,300.0HP 3.7L V6 Cylinder Engine Flex Fuel Capa...,6-Speed A/T,Black,Black,At least 1 accident or damage reported,Yes,"$10,300"
1,Hyundai,Palisade SEL,2021,"34,742 mi.",Gasoline,3.8L V6 24V GDI DOHC,8-Speed Automatic,Moonlight Cloud,Gray,At least 1 accident or damage reported,Yes,"$38,005"
2,Lexus,RX 350 RX 350,2022,"22,372 mi.",Gasoline,3.5 Liter DOHC,Automatic,Blue,Black,None reported,,"$54,598"
3,INFINITI,Q50 Hybrid Sport,2015,"88,900 mi.",Hybrid,354.0HP 3.5L V6 Cylinder Engine Gas/Electric H...,7-Speed A/T,Black,Black,None reported,Yes,"$15,500"
4,Audi,Q3 45 S line Premium Plus,2021,"9,835 mi.",Gasoline,2.0L I4 16V GDI DOHC Turbo,8-Speed Automatic,Glacier White Metallic,Black,None reported,,"$34,999"
...,...,...,...,...,...,...,...,...,...,...,...,...
4004,Bentley,Continental GT Speed,2023,714 mi.,Gasoline,6.0L W12 48V PDI DOHC Twin Turbo,8-Speed Automatic with Auto-Shift,C / C,Hotspur,None reported,Yes,"$349,950"
4005,Audi,S4 3.0T Premium Plus,2022,"10,900 mi.",Gasoline,349.0HP 3.0L V6 Cylinder Engine Gasoline Fuel,Transmission w/Dual Shift Mode,Black,Black,None reported,Yes,"$53,900"
4006,Porsche,Taycan,2022,"2,116 mi.",,Electric,Automatic,Black,Black,None reported,,"$90,998"
4007,Ford,F-150 Raptor,2020,"33,000 mi.",Gasoline,450.0HP 3.5L V6 Cylinder Engine Gasoline Fuel,A/T,Blue,Black,None reported,Yes,"$62,999"


In [119]:
# Preparing the age data (years relative to the newest model year in the dataset)
age = df["model_year"].max() - df["model_year"]

age

0       11
1        3
2        2
3        9
4        3
        ..
4004     1
4005     2
4006     2
4007     4
4008     4
Name: model_year, Length: 4009, dtype: int64

In [120]:
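# Preparing the mileage data (the dataset spells the column "milage")
milage = df["milage"]
milage = milage.str.replace(",", "", regex=False)
milage = milage.str.replace(" mi.", "", regex=False)  # Literal match: "." is not a wildcard here
milage = milage.astype(int)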

milage

0       51000
1       34742
2       22372
3       88900
4        9835
        ...  
4004      714
4005    10900
4006     2116
4007    33000
4008    43000
Name: milage, Length: 4009, dtype: int64
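The same cleanup can also be written as a plain-Python helper, which makes the literal (non-regex) matching explicit and is easy to check in isolation. This is a minimal sketch: `parse_number` is a hypothetical name, not part of the notebook, and the sample strings are copied from the rows shown earlier.

```python
def parse_number(text: str) -> int:
    """Convert strings such as '51,000 mi.' or '$10,300' to an int.

    Hypothetical helper for illustration; every replacement is a literal
    substring match, so '.' and '$' carry no regex meaning.
    """
    cleaned = text.replace("$", "").replace(",", "").replace("mi.", "").strip()
    return int(cleaned)


# Sample values taken from the DataFrame preview
examples = ["51,000 mi.", "714 mi.", "$349,950"]
parsed = [parse_number(s) for s in examples]
print(parsed)  # [51000, 714, 349950]
```

Assuming the column contains no missing values, a helper like this could be applied with `df["milage"].map(parse_number)`.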

In [121]:
# Preparing the accident data
# Note: missing values (NaN) also compare as != "None reported", so they are encoded as 1
accident = df["accident"] != "None reported"
accident = accident.astype(int)

accident

0       1
1       1
2       0
3       0
4       0
       ..
4004    0
4005    0
4006    0
4007    0
4008    1
Name: accident, Length: 4009, dtype: int64
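Because the flag is built with `!=`, any row whose `accident` field is missing is encoded as 1 as well. The sketch below uses a three-row stand-in Series (not the real data) to contrast that behavior with an explicit `==` test against the accident label seen in the preview. Which encoding is preferable depends on how missing reports should be interpreted, so treat this as an illustration rather than a correction.

```python
import numpy as np
import pandas as pd

# Stand-in data: one clean report, one accident report, one missing value
s = pd.Series([
    "None reported",
    "At least 1 accident or damage reported",
    np.nan,
])

# Notebook approach: everything that is not "None reported" becomes 1 (NaN included)
flag_ne = (s != "None reported").astype(int)

# Alternative: only an explicit accident report becomes 1 (NaN becomes 0)
flag_eq = (s == "At least 1 accident or damage reported").astype(int)

print(flag_ne.tolist())     # [0, 1, 1]
print(flag_eq.tolist())     # [0, 1, 0]
print(int(s.isna().sum()))  # 1 missing value in the stand-in data
```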

In [122]:
# Preparing the price data (output)
price = df["price"]
price = price.str.replace("$", "", regex=False)  # "$" is a regex anchor, so match it literally
price = price.str.replace(",", "", regex=False)
price = price.astype(int)

price

0        10300
1        38005
2        54598
3        15500
4        34999
         ...  
4004    349950
4005     53900
4006     90998
4007     62999
4008     40000
Name: price, Length: 4009, dtype: int64

In [123]:
# Define the input tensor (X): one column per feature
X = torch.column_stack([
    torch.tensor(age.to_numpy(), dtype=torch.float32),
    torch.tensor(milage.to_numpy(), dtype=torch.float32),
    torch.tensor(accident.to_numpy(), dtype=torch.float32)
])

# Normalize X (z-score each column)
X_mean = X.mean(dim=0)
X_std = X.std(dim=0)
X = (X - X_mean) / X_std

X

tensor([[ 0.4121, -0.2623,  1.6270],
        [-0.8984, -0.5732,  1.6270],
        [-1.0622, -0.8097, -0.6145],
        ...,
        [-1.0622, -1.1970, -0.6145],
        [-0.7346, -0.6065, -0.6145],
        [-0.7346, -0.4153,  1.6270]])
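The normalization above is a per-column z-score. A small sketch on the first four feature rows from the preview shows the round trip: after scaling, each column has mean 0 and standard deviation 1, and multiplying by the standard deviation and adding the mean recovers the raw values. Note that `Tensor.std` applies Bessel's correction by default, so it is the sample standard deviation. The names `X_raw`, `X_norm`, and `X_back` are illustrative only.

```python
import torch

# First four rows of [age, mileage, accident] from the notebook's data
X_raw = torch.tensor([
    [11.0, 51000.0, 1.0],
    [3.0, 34742.0, 1.0],
    [2.0, 22372.0, 0.0],
    [9.0, 88900.0, 0.0],
])

mean = X_raw.mean(dim=0)
std = X_raw.std(dim=0)          # sample std (divides by n - 1) by default

X_norm = (X_raw - mean) / std   # forward transform
X_back = X_norm * std + mean    # inverse transform

print(X_norm.mean(dim=0))       # approximately zero in every column
print(X_norm.std(dim=0))        # approximately one in every column
```

The same `mean` and `std` computed on the training data must be reused for any later input, which is why the notebook keeps `X_mean` and `X_std` around.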

In [124]:
# Define the output tensor (y)
y = torch.tensor(price.to_numpy(), dtype=torch.float32).reshape(-1, 1)

# Normalize y
y_mean = y.mean()
y_std = y.std()
y = (y - y_mean) / y_std

y

tensor([[-0.4352],
        [-0.0832],
        [ 0.1276],
        ...,
        [ 0.5901],
        [ 0.2343],
        [-0.0578]])

In [125]:
# Define model
model = nn.Linear(3, 1) # 3 inputs and 1 output

# Define loss function
criterion = nn.MSELoss()

# Define optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
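`nn.Linear(3, 1)` computes a weighted sum of the three features plus a bias, that is `x @ W.T + b`. The sketch below checks this on a freshly initialized layer (random weights, not the trained model), using the first normalized feature row printed earlier as input.

```python
import torch
from torch import nn

torch.manual_seed(0)            # reproducible random initial weights
layer = nn.Linear(3, 1)         # weight shape (1, 3), bias shape (1,)

# First normalized row [age, mileage, accident] from the notebook output
x = torch.tensor([[0.4121, -0.2623, 1.6270]])

manual = x @ layer.weight.T + layer.bias   # explicit affine map
builtin = layer(x)                         # what the layer computes

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(manual, builtin))     # True
```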

In [126]:
# Training loop
epochs = 1000

for epoch in range(epochs):
    optimizer.zero_grad()             # Reset the gradients from the previous step
    y_pred = model(X)                 # Compute the model predictions
    loss = criterion(y_pred, y)       # Compute the mean squared error
    loss.backward()                   # Compute the gradients of the loss with respect to the parameters
    optimizer.step()                  # Update the model weights using those gradients

    if epoch % 100 == 0:
        print(f"Loss: {loss.item()}") # Print the loss every 100 epochs

Loss: 1.0453537702560425
Loss: 0.9103996753692627
Loss: 0.906356155872345
Loss: 0.905862033367157
Loss: 0.9057636260986328
Loss: 0.9057421684265137
Loss: 0.9057374000549316
Loss: 0.9057362675666809
Loss: 0.9057360887527466
Loss: 0.9057360887527466
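The loss levels off near 0.906. Because `y` was standardized to unit variance, always predicting the mean would give an MSE of about 1, so a plateau at 0.906 suggests that these three features explain only roughly 9% of the variance in price. The sketch below shows one way to make that check explicit: an `r_squared` helper (a hypothetical name) and a closed-form least-squares fit via `torch.linalg.lstsq` that can serve as a reference for what gradient descent should converge to. It runs on synthetic data, not the car dataset.

```python
import torch


def r_squared(y_pred: torch.Tensor, y_true: torch.Tensor) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return (1.0 - ss_res / ss_tot).item()


# Synthetic stand-in data: 200 samples, 3 features, known linear signal plus noise
torch.manual_seed(0)
X_demo = torch.randn(200, 3)
true_w = torch.tensor([[0.5], [-0.3], [0.1]])
y_demo = X_demo @ true_w + 0.1 * torch.randn(200, 1)

# Closed-form least squares; the column of ones plays the role of the bias
A = torch.cat([X_demo, torch.ones(200, 1)], dim=1)
solution = torch.linalg.lstsq(A, y_demo).solution   # shape (4, 1)

y_fit = A @ solution
r2 = r_squared(y_fit, y_demo)
print(f"R^2 of the least-squares fit: {r2:.3f}")
```

On the notebook's own tensors, the equivalent call would be `r_squared(model(X).detach(), y)`.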


In [127]:
# Creating data to evaluate: each row is [age in years, mileage in miles, accident flag]
X_data = torch.tensor([
    [5, 10000, 0],
    [2, 10000, 0],
    [5, 20000, 1],
    [5, 20000, 0]
], dtype=torch.float32)

# Evaluating model
model.eval()
with torch.no_grad():
    prediction = model((X_data - X_mean) / X_std)  # Apply the training normalization
    print(prediction * y_std + y_mean)             # Convert back to dollars

tensor([[70222.7656],
        [70864.4844],
        [61760.9922],
        [65873.7344]])
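The scale, predict, unscale steps used above can be bundled into one function so that the training statistics are always applied consistently. This is a minimal sketch: `predict_price` is a hypothetical helper, and the demo uses a zero-weight model with placeholder statistics rather than the values learned in this notebook.

```python
import torch
from torch import nn


def predict_price(model, features, X_mean, X_std, y_mean, y_std):
    """Scale raw features, run the model, and map the output back to dollars."""
    model.eval()
    with torch.no_grad():
        scaled = (features - X_mean) / X_std
        return model(scaled) * y_std + y_mean


# Stand-in model with all weights set to zero, so the output equals y_mean
demo_model = nn.Linear(3, 1)
with torch.no_grad():
    demo_model.weight.fill_(0.0)
    demo_model.bias.fill_(0.0)

# Placeholder statistics for illustration only (not computed from the dataset)
demo = predict_price(
    demo_model,
    torch.tensor([[5.0, 10000.0, 0.0]]),   # [age, mileage, accident]
    X_mean=torch.tensor([6.0, 60000.0, 0.3]),
    X_std=torch.tensor([6.0, 50000.0, 0.45]),
    y_mean=torch.tensor(40000.0),
    y_std=torch.tensor(70000.0),
)
print(demo)
```

In the notebook itself, the call would pass the trained `model` together with `X_mean`, `X_std`, `y_mean`, and `y_std` from the earlier cells.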
