# <font color=darkblue> Machine Learning model deployment with Flask framework</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of a used car from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the model with the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com
- **Car_Name**: Name of the car
- **Year**: Year of Purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: first, second or third owner


### 1. Import required libraries

In [1]:
# Create ML model

In [3]:
# import libraries
# numpy: For numerical operations and handling arrays.
# pandas: For data manipulation and analysis.
# pickle: For saving and loading machine learning models.    
import numpy as np
import pandas as pd
import pickle

### 2. Load the dataset

In [4]:
# Load the dataset and show the first five rows
# to verify that it was loaded correctly.
df = pd.read_csv('cardata.csv')
df.head()

|   | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


### 3. Check the shape and basic information of the dataset.

In [5]:
# Check the shape of the dataset
print("Shape of the dataset:", df.shape)

# Check basic information about the dataset
df.info()

Shape of the dataset: (301, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


### 4. Check for duplicate records in the dataset and drop them if present

In [6]:
# Check for duplicates
duplicate_count = df.duplicated().sum()
print(f"Number of duplicate records: {duplicate_count}")

# Drop duplicates if present
if duplicate_count > 0:
    df = df.drop_duplicates()
    print("Duplicates have been dropped.")

# Confirm the shape of the dataset after dropping duplicates
print("Shape of the dataset after removing duplicates:", df.shape)

Number of duplicate records: 2
Duplicates have been dropped.
Shape of the dataset after removing duplicates: (299, 9)


### 5. Drop the columns that are redundant for the analysis.

In [7]:
# Drop columns that are not relevant for the analysis
columns_to_drop = ['Car_Name'] 
df = df.drop(columns=columns_to_drop)

# Verify the columns after dropping
print("Columns after dropping redundant ones:", df.columns)

Columns after dropping redundant ones: Index(['Year', 'Selling_Price', 'Present_Price', 'Kms_Driven', 'Fuel_Type',
       'Seller_Type', 'Transmission', 'Owner'],
      dtype='object')


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [8]:
# Reference year for the age calculation
current_year = 2024  # Hard-coded; use datetime for dynamic year extraction

# Create a new feature 'age_of_the_car' based on 'Year'
df['age_of_the_car'] = current_year - df['Year']

# Drop the 'Year' column
df = df.drop('Year', axis=1)

# Display the first few rows to verify the new column
df.head()

|   | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | age_of_the_car |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 | 10 |
| 1 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 | 11 |
| 2 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 | 7 |
| 3 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 | 13 |
| 4 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 | 10 |
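The hard-coded year can be replaced by the system date, as the comment in the cell suggests. The following is a minimal sketch, assuming a helper named `car_age` (hypothetical, not part of the notebook). Note that the ages shown above were computed with 2024, so a dynamic year will shift them.

```python
from datetime import date


def car_age(purchase_year, reference_year=None):
    """Return the age in years; works for an int or a pandas Series of years."""
    if reference_year is None:
        # Assumption: "current year" means the year of the machine's local date
        reference_year = date.today().year
    return reference_year - purchase_year


# Passing the year explicitly reproduces the notebook's values
print(car_age(2014, reference_year=2024))
```

In the notebook this could be applied as `df['age_of_the_car'] = car_age(df['Year'])`, since the subtraction broadcasts over a pandas column.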


### 7. Encode the categorical columns

In [10]:
from sklearn.preprocessing import LabelEncoder

# Define the categorical columns
categorical_cols = ['Fuel_Type', 'Seller_Type', 'Transmission']

# Apply label encoding to each categorical column (the encoder is refit per column)
label_encoder = LabelEncoder()
for col in categorical_cols:
    df[col] = label_encoder.fit_transform(df[col])

# Display the first few rows after label encoding
df.head()

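|   | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | age_of_the_car |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.35 | 5.59 | 27000 | 2 | 0 | 1 | 0 | 10 |
| 1 | 4.75 | 9.54 | 43000 | 1 | 0 | 1 | 0 | 11 |
| 2 | 7.25 | 9.85 | 6900 | 2 | 0 | 1 | 0 | 7 |
| 3 | 2.85 | 4.15 | 5200 | 2 | 0 | 1 | 0 | 13 |
| 4 | 4.6 | 6.87 | 42450 | 1 | 0 | 1 | 0 | 10 |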
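`LabelEncoder` assigns integer codes in sorted order of the class labels, which is why Petrol appears as 2 and Manual as 1 in the output above. Because the loop reuses a single encoder, only the last column's mapping is still available afterwards. The sketch below keeps one encoder per column so the mapping can be inspected and reused at prediction time. It assumes a small hand-made frame in place of `cardata.csv`, and the names `encoders` and `mappings` are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the real data (values chosen for illustration only)
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Seller_Type": ["Dealer", "Individual", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

encoders = {}   # hypothetical name: one fitted encoder per column
mappings = {}   # label -> integer code, per column
for col in toy.columns:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc
    mappings[col] = {str(label): code for code, label in enumerate(enc.classes_)}

for col, mapping in mappings.items():
    print(col, mapping)
```

Any form that collects these values as numbers has to use the same codes as the encoder (for example Diesel = 1 and Petrol = 2 for `Fuel_Type`); otherwise the model receives a different category than the user intended.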


### 8. Separate the target and independent features.

In [11]:
# Define the independent features (X) and the target variable (y)
X = df.drop('Selling_Price', axis=1)  # Independent features (drop the target column)
y = df['Selling_Price']  # Target variable

# Display the shapes of X and y to confirm
print(f"Shape of independent features (X): {X.shape}")
print(f"Shape of target variable (y): {y.shape}")

Shape of independent features (X): (299, 7)
Shape of target variable (y): (299,)


### 9. Split the data into train and test.

In [12]:
from sklearn.model_selection import train_test_split

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the train and test sets to confirm
print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of X_test: {X_test.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of y_test: {y_test.shape}")

Shape of X_train: (239, 7)
Shape of X_test: (60, 7)
Shape of y_train: (239,)
Shape of y_test: (60,)
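The 239/60 split follows from how a fractional `test_size` is rounded: scikit-learn rounds the test count up and gives the remainder to the training set. A small worked check (the helper name `split_sizes` is hypothetical):

```python
import math


def split_sizes(n_samples, test_size):
    """Mirror the rounding used for a float test_size when train_size is not given."""
    n_test = math.ceil(test_size * n_samples)   # test count is rounded up
    n_train = n_samples - n_test                # the rest goes to training
    return n_train, n_test


print(split_sizes(299, 0.2))  # (239, 60)
```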


### 10. Build a Random Forest Regressor model and check the R2 score for train and test.

In [13]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Initialize the Random Forest Regressor model
rf_model = RandomForestRegressor(random_state=42)

# Train the model on the training data
rf_model.fit(X_train, y_train)

# Predict on the training set and calculate the R2 score
y_train_pred = rf_model.predict(X_train)
train_r2_score = r2_score(y_train, y_train_pred)

# Predict on the testing set and calculate the R2 score
y_test_pred = rf_model.predict(X_test)
test_r2_score = r2_score(y_test, y_test_pred)

# Print the R2 scores for training and testing sets
print(f"R2 Score for Training Set: {train_r2_score:.4f}")
print(f"R2 Score for Testing Set: {test_r2_score:.4f}")

R2 Score for Training Set: 0.9856
R2 Score for Testing Set: 0.5777
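The gap between the training score (0.9856) and the test score (0.5777) suggests that the forest fits the training data far more closely than unseen data. For reference, R2 compares the model's squared error with that of always predicting the mean of the true values. Below is a minimal pure-Python sketch of that definition; the function name `r2_manual` is hypothetical, and `r2_score` remains the call used in the notebook.

```python
def r2_manual(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot, written out for plain Python sequences."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # model error
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # mean-baseline error
    return 1 - ss_res / ss_tot


y = [3.35, 4.75, 7.25, 2.85, 4.6]            # first five Selling_Price values
print(r2_manual(y, y))                        # perfect predictions
print(r2_manual(y, [sum(y) / len(y)] * 5))    # always predicting the mean
```

A score near 1 means the predictions explain almost all of the variance, while a score near 0 means the model does no better than the mean baseline.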


### 11. Create a pickle file with an extension as .pkl

In [14]:
# Create a pickle file for the Random Forest Regressor model
with open('random_forest_model.pkl', 'wb') as file:
    pickle.dump(rf_model, file)

print("Model has been saved as random_forest_model.pkl")

Model has been saved as random_forest_model.pkl
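Before moving the file into the web project, it is worth checking that the pickle loads and predicts the same values as the in-memory model. The sketch below assumes a toy model and a temporary directory in place of `rf_model` and the notebook's working folder.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data standing in for X_train / y_train (illustration only)
rng = np.random.default_rng(42)
X_toy = rng.random((30, 7))
y_toy = X_toy.sum(axis=1)

model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "random_forest_model.pkl")
    with open(path, "wb") as file:
        pickle.dump(model, file)
    with open(path, "rb") as file:      # note 'rb' when reading the pickle back
        loaded_model = pickle.load(file)

# The reloaded model should reproduce the original predictions
same = np.allclose(model.predict(X_toy), loaded_model.predict(X_toy))
print("Round trip OK:", same)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that was used to save them.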


### 12. Create a new folder/project in Visual Studio/PyCharm that contains the saved model (.pkl) file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

In [None]:
# Refer to the HTML file in templates/index.html

### b) Create app.py file and write the predict function

In [None]:
# Refer to the app.py file in the Lab5solution "Normal root" directory
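The referenced `app.py` is not reproduced here. As a rough idea of what such a file might contain, here is a minimal sketch. The route names, the form field names, and the `create_app` / `main` helpers are assumptions for illustration, not the lab's actual solution. The feature order matches the columns of `X` after the preprocessing above.

```python
import pickle

import pandas as pd
from flask import Flask, render_template, request

# Column order of X in the notebook (Selling_Price removed, age appended last)
FEATURES = ["Present_Price", "Kms_Driven", "Fuel_Type", "Seller_Type",
            "Transmission", "Owner", "age_of_the_car"]


def create_app(model):
    """Build the Flask app around any object that has a predict() method."""
    app = Flask(__name__)

    @app.route("/")
    def home():
        # Expects templates/index.html next to app.py
        return render_template("index.html")

    @app.route("/predict", methods=["POST"])
    def predict():
        # Assumption: form fields are named after the feature columns
        try:
            row = {name: float(request.form[name]) for name in FEATURES}
        except (KeyError, ValueError):
            return "Please fill in every field with a number.", 400
        price = model.predict(pd.DataFrame([row], columns=FEATURES))[0]
        return f"Predicted selling price: {price:.2f} lakhs"

    return app


def main():
    # Assumption: the pickle from step 11 sits in the project root
    with open("random_forest_model.pkl", "rb") as file:
        model = pickle.load(file)
    create_app(model).run(debug=True)
```

Calling `main()` at the bottom of `app.py` would start Flask's development server. Passing a DataFrame with the training column names keeps the input consistent with how the model was fitted, and the categorical fields must carry the `LabelEncoder` codes from step 7.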

### 13. Run the app.py file, which renders the index.html page; then enter the input values and get the prediction.

# OUTPUT SCREEN (Captured Text)

## Used Car Price Prediction

Present Price (in lakhs):<br>
Kms Driven:<br>
Number of Owners:<br>
Age of the Car:<br>
Fuel Type (0 for Diesel, 1 for Petrol):<br>
Seller Type (0 for Dealer, 1 for Individual):<br>
Transmission Type (0 for Manual, 1 for Automatic):<br>

---

### **Inputs:**

- **Present Price**: ₹1500000 lakhs  
- **Kms Driven**: 500000 km  
- **Number of Owners**: 3  
- **Age of the Car**: 7 years  
- **Fuel Type**: Diesel  
- **Seller Type**: Individual  
- **Transmission Type**: Manual  

---

### **Predicted Selling Price of the Car: ₹12.56 lakhs**



### Happy Learning :)