# 🚗 Advanced Car Price Prediction Model Training

This notebook trains the ensemble model used by the car price prediction application and exports the artifacts the app loads.

### Steps:
1. **Load Data**: Import the enhanced dataset (602+ records).
2. **Feature Engineering**: Derive `Car_Age`, `Price_Per_Year`, and `Kms_Per_Year`.
3. **Preprocessing**: Encode categorical variables and scale numerical features.
4. **Training**: Train a Voting Regressor (RandomForest + GradientBoosting).
5. **Evaluation**: Report R², MAE, and RMSE on the held-out test set.
6. **Save**: Export the model, scaler, and label encoders for the Flask app.

In [None]:
import pandas as pd
import numpy as np
import pickle
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, VotingRegressor
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Pandas display options
pd.set_option('display.max_columns', None)
print("✅ Libraries loaded successfully")

### 1. Load Data

In [None]:
# Load the dataset
# Assumes car_data.csv sits one level above this notebook (e.g. 'ML/car_data.csv')
csv_path = '../car_data.csv'

try:
    df = pd.read_csv(csv_path)
except FileNotFoundError:
    print(f"❌ Error: {csv_path} not found. Please check the path.")
    raise  # Stop here: every later cell needs df

print(f"✅ Data loaded: {df.shape[0]} records")
display(df.head())

### 2. Feature Engineering

In [None]:
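# Reference year for the age calculation.
# Hard-coded on purpose: inference code must use the same value to reproduce Car_Age.
current_year = 2024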

# 1. Car Age
df['Car_Age'] = current_year - df['Year']

# 2. Present price spread over the car's age (a rough depreciation signal)
# The +1 avoids division by zero for cars from the current year
df['Price_Per_Year'] = df['Present_Price'] / (df['Car_Age'] + 1)

# 3. Usage Intensity
df['Kms_Per_Year'] = df['Kms_Driven'] / (df['Car_Age'] + 1)

print("✅ Feature Engineering Complete")
df[['Year', 'Car_Age', 'Present_Price', 'Price_Per_Year', 'Kms_Driven', 'Kms_Per_Year']].head()
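
To make the three derived features concrete, here is a minimal worked example on two made-up rows. The values are illustrative only, not taken from `car_data.csv`. The first row is a current-year car, which shows why the denominator uses `Car_Age + 1`.

```python
import pandas as pd

# Toy rows (made-up values) using the notebook's column names.
toy = pd.DataFrame({
    "Year": [2024, 2014],
    "Present_Price": [10.0, 5.5],
    "Kms_Driven": [1200, 55000],
})
current_year = 2024  # same constant as the cell above

toy["Car_Age"] = current_year - toy["Year"]
# The +1 keeps the denominator non-zero when Car_Age is 0.
toy["Price_Per_Year"] = toy["Present_Price"] / (toy["Car_Age"] + 1)
toy["Kms_Per_Year"] = toy["Kms_Driven"] / (toy["Car_Age"] + 1)

print(toy)
```

For the ten-year-old car the denominator is 11, so `Price_Per_Year` is 0.5 and `Kms_Per_Year` is 5000.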

### 3. Encoding Categorical Variables

In [None]:
# Initialize Encoders
car_name_encoder = LabelEncoder()
fuel_type_encoder = LabelEncoder()
seller_type_encoder = LabelEncoder()
transmission_encoder = LabelEncoder()

# Fit Encoders
df['Car_Name_Encoded'] = car_name_encoder.fit_transform(df['Car_Name'])
df['Fuel_Type_Encoded'] = fuel_type_encoder.fit_transform(df['Fuel_Type'])
df['Seller_Type_Encoded'] = seller_type_encoder.fit_transform(df['Seller_Type'])
df['Transmission_Encoded'] = transmission_encoder.fit_transform(df['Transmission'])

print("✅ Encoding Complete")
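
One thing to keep in mind at inference time: a fitted `LabelEncoder` raises `ValueError` when `transform` receives a label it never saw during `fit`. The sketch below shows one possible way to handle that. The helper `safe_encode` and the fallback code `-1` are hypothetical and not part of this notebook or the Flask app. Rejecting the request may be a better choice than guessing.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder


def safe_encode(encoder, values, unknown=-1):
    """Encode values with a fitted LabelEncoder; unseen labels map to `unknown`.

    Hypothetical helper for illustration only.
    """
    # classes_ is sorted, and a label's position is its integer code.
    known = {label: idx for idx, label in enumerate(encoder.classes_)}
    return np.array([known.get(v, unknown) for v in values])


# Stand-in encoder fitted on three made-up fuel types.
fuel_encoder = LabelEncoder().fit(["Petrol", "Diesel", "CNG"])
codes = safe_encode(fuel_encoder, ["Diesel", "Electric"])
print(codes)
```

The model never saw the fallback code during training, so any prediction made with it should be treated with caution.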

### 4. Prepare Training Data

In [None]:
# Select Features and Target
# CRITICAL: Must match the order expected by the Flask app
feature_cols = [
    'Year', 'Present_Price', 'Kms_Driven', 'Owner',
    'Car_Name_Encoded', 'Fuel_Type_Encoded', 'Seller_Type_Encoded', 'Transmission_Encoded',
    'Car_Age', 'Price_Per_Year', 'Kms_Per_Year'
]

X = df[feature_cols]
y = df['Selling_Price']

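# Split first so the scaler never sees the test rows (avoids data leakage)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training rows only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)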

print(f"Training samples: {X_train.shape[0]}")
print(f"Testing samples: {X_test.shape[0]}")
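
Because the scaler and the model are always used together, one option is to bundle them in a scikit-learn `Pipeline`. The pipeline fits the scaler only on the rows passed to `fit` and reuses those statistics in `predict`. This is a sketch on synthetic data, not what the notebook does. Adopting it would mean the Flask app loads one object instead of separate `model.pkl` and `scaler.pkl` files.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for df[feature_cols] and df['Selling_Price'].
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo[:, 0] * 2.0 + rng.normal(scale=0.1, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(n_estimators=20, random_state=42)),
])
pipe.fit(X_tr, y_tr)        # scaler statistics come from X_tr only
preds = pipe.predict(X_te)  # X_te is scaled with the training statistics
```

Tree-based models such as random forests and gradient boosting are largely insensitive to feature scaling, so the scaler mainly matters if a scale-sensitive model is added later.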

### 5. Advanced Model Training (Ensemble)

In [None]:
# Define base models
rf_model = RandomForestRegressor(n_estimators=200, random_state=42)
gb_model = GradientBoostingRegressor(n_estimators=200, random_state=42)

# Create Ensemble Voting Regressor
voting_model = VotingRegressor(estimators=[
    ('rf', rf_model),
    ('gb', gb_model)
])

# Train Model
print("⏳ Training Voting Regressor...")
voting_model.fit(X_train, y_train)
print("✅ Training Complete!")
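
With no `weights` argument, a `VotingRegressor` predicts the plain average of its fitted members' predictions. The following sketch checks that on synthetic data. It uses smaller `n_estimators` than the notebook so it runs quickly.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 2))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1]

ensemble = VotingRegressor(estimators=[
    ("rf", RandomForestRegressor(n_estimators=10, random_state=42)),
    ("gb", GradientBoostingRegressor(n_estimators=10, random_state=42)),
]).fit(X_demo, y_demo)

# estimators_ holds the fitted clones, in the order given above.
member_preds = np.column_stack(
    [est.predict(X_demo) for est in ensemble.estimators_]
)
manual_average = member_preds.mean(axis=1)
same = np.allclose(ensemble.predict(X_demo), manual_average)
print(same)
```

Passing `weights=[...]` to `VotingRegressor` would turn this into a weighted average, which can help if one member is clearly stronger on validation data.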

### 6. Evaluation

In [None]:
# Make Predictions
y_pred = voting_model.predict(X_test)

# Calculate Metrics
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("=" * 30)
# R² is the share of variance explained, not a classification accuracy
print(f"🚀 R² score:            {r2:.4f}")
print(f"📉 Mean Absolute Error: ${mae:.2f}")
print(f"📉 Root Mean Sq Error:  ${rmse:.2f}")
print("=" * 30)
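
As a sanity check on what the three metrics measure, here is a small worked example on made-up numbers. It computes each metric by hand from the residuals and compares it with the scikit-learn function.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 8.0, 9.5])

residuals = y_true - y_hat  # [0.5, 0.0, -1.0, -0.5]

# MAE: mean of absolute residuals = (0.5 + 0 + 1 + 0.5) / 4 = 0.5
mae_manual = np.mean(np.abs(residuals))

# RMSE: square root of the mean squared residual = sqrt(1.5 / 4)
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R²: 1 - SS_res / SS_tot = 1 - 1.5 / 20 = 0.925
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mae_manual, rmse_manual, r2_manual)
```

MAE and RMSE are in the same units as `Selling_Price`, while R² is unitless and can be negative for a model that does worse than predicting the mean.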

### 7. Save Model Artifacts

In [None]:
# Save all necessary files for the Flask App
print("💾 Saving model artifacts...")

with open('model.pkl', 'wb') as f:
    pickle.dump(voting_model, f)
    
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

# Save Encoders
encoders = {
    'car_name_encoder.pkl': car_name_encoder,
    'fuel_type_encoder.pkl': fuel_type_encoder,
    'seller_type_encoder.pkl': seller_type_encoder,
    'transmission_encoder.pkl': transmission_encoder
}

for filename, encoder in encoders.items():
    with open(filename, 'wb') as f:
        pickle.dump(encoder, f)

print("✅ All files saved successfully! Ready for Flask App.")
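
The saved files are only useful if they load back and give the same predictions. The sketch below round-trips a tiny stand-in model and scaler through `pickle` in a temporary directory. The loading code is an assumption about how the app side might look, not the actual Flask code, and the two-column input is a simplification of the notebook's eleven features.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Tiny stand-ins for the notebook's voting_model and scaler.
X_demo = np.array([[2015.0, 5.0], [2018.0, 8.0], [2020.0, 9.5]])
y_demo = np.array([3.0, 6.0, 8.0])
demo_scaler = StandardScaler().fit(X_demo)
demo_model = LinearRegression().fit(demo_scaler.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp:
    artifacts = {"model.pkl": demo_model, "scaler.pkl": demo_scaler}
    for name, obj in artifacts.items():
        with open(Path(tmp) / name, "wb") as f:
            pickle.dump(obj, f)

    # What a loader on the app side might look like (assumed).
    with open(Path(tmp) / "model.pkl", "rb") as f:
        loaded_model = pickle.load(f)
    with open(Path(tmp) / "scaler.pkl", "rb") as f:
        loaded_scaler = pickle.load(f)

# A single input row; columns must follow the training order.
row = np.array([[2018.0, 8.0]])
before = demo_model.predict(demo_scaler.transform(row))
after = loaded_model.predict(loaded_scaler.transform(row))
print(np.allclose(before, after))
```

Pickled scikit-learn objects should be loaded with the same scikit-learn version that wrote them, and pickle files should only be loaded from trusted sources because unpickling can execute arbitrary code.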