# <font color=darkblue> Machine Learning model deployment with Flask framework</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of a used car from input features such as fuel type, kilometres driven, and transmission type.
2. Deploy the model as a web application using the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car, in lakhs
- **Present_Price**: Present price of the car, in lakhs
- **Kms_Driven**: Kilometres driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: First, second, or third owner


### 1. Import required libraries

In [31]:
# pip show scikit-learn


Name: scikit-learn
Version: 1.2.2
Summary: A set of python modules for machine learning and data mining
Home-page: http://scikit-learn.org
Author: 
Author-email: 
License: new BSD
Location: /opt/anaconda3/lib/python3.11/site-packages
Requires: joblib, numpy, scipy, threadpoolctl
Required-by: imbalanced-learn
Note: you may need to restart the kernel to use updated packages.


In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.preprocessing import LabelEncoder
import pickle


### 2. Load the dataset

In [2]:
# Load the dataset

df = pd.read_csv('car+data.csv')
df

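|     | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|-----|----------|------|---------------|---------------|------------|-----------|-------------|--------------|-------|
| 0   | ritz     | 2014 | 3.35          | 5.59          | 27000      | Petrol    | Dealer      | Manual       | 0     |
| 1   | sx4      | 2013 | 4.75          | 9.54          | 43000      | Diesel    | Dealer      | Manual       | 0     |
| 2   | ciaz     | 2017 | 7.25          | 9.85          | 6900       | Petrol    | Dealer      | Manual       | 0     |
| 3   | wagon r  | 2011 | 2.85          | 4.15          | 5200       | Petrol    | Dealer      | Manual       | 0     |
| 4   | swift    | 2014 | 4.60          | 6.87          | 42450      | Diesel    | Dealer      | Manual       | 0     |
| ... | ...      | ...  | ...           | ...           | ...        | ...       | ...         | ...          | ...   |
| 296 | city     | 2016 | 9.50          | 11.60         | 33988      | Diesel    | Dealer      | Manual       | 0     |
| 297 | brio     | 2015 | 4.00          | 5.90          | 60000      | Petrol    | Dealer      | Manual       | 0     |
| 298 | city     | 2009 | 3.35          | 11.00         | 87934      | Petrol    | Dealer      | Manual       | 0     |
| 299 | city     | 2017 | 11.50         | 12.50         | 9000       | Diesel    | Dealer      | Manual       | 0     |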


### 3. Check the shape and basic information of the dataset.

In [3]:
# Check the shape of the dataset
print(df.shape)

# Check basic info and null values
print(df.info())

# Check the first few rows of the dataset
df.head()


(301, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB
None


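|   | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|----------|------|---------------|---------------|------------|-----------|-------------|--------------|-------|
| 0 | ritz     | 2014 | 3.35          | 5.59          | 27000      | Petrol    | Dealer      | Manual       | 0     |
| 1 | sx4      | 2013 | 4.75          | 9.54          | 43000      | Diesel    | Dealer      | Manual       | 0     |
| 2 | ciaz     | 2017 | 7.25          | 9.85          | 6900       | Petrol    | Dealer      | Manual       | 0     |
| 3 | wagon r  | 2011 | 2.85          | 4.15          | 5200       | Petrol    | Dealer      | Manual       | 0     |
| 4 | swift    | 2014 | 4.6           | 6.87          | 42450      | Diesel    | Dealer      | Manual       | 0     |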


### 4. Check for duplicate records in the dataset and drop any that are present

In [20]:
# Check for duplicates
duplicates = df.duplicated().sum()
print(f"Number of duplicates: {duplicates}")

# Drop duplicates
df = df.drop_duplicates()


Number of duplicates: 2
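As a minimal sketch of what `duplicated()` and `drop_duplicates()` do, the toy frame below contains one repeated row. The rows are made up for illustration and are not taken from the car dataset.

```python
import pandas as pd

# Toy data (hypothetical rows, not from the car dataset): row 2 repeats row 0.
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "Petrol"],
    "Kms_Driven": [27000, 43000, 27000],
})

# duplicated() flags every repeat of an earlier row; the first copy is not flagged.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first copy; reset_index() renumbers the remaining rows.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates)
print(deduped.shape)
```

Resetting the index is optional, but it avoids gaps in the row labels after rows are removed.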


### 5. Drop the columns that are redundant for the analysis.

In [4]:
df = df.drop(columns=['Car_Name'])

### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [5]:
# Calculate age of the car
df['age_of_the_car'] = 2024 - df['Year']

# Drop the 'Year' column
df = df.drop(columns=['Year'])
df

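|     | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | age_of_the_car |
|-----|---------------|---------------|------------|-----------|-------------|--------------|-------|----------------|
| 0   | 3.35          | 5.59          | 27000      | Petrol    | Dealer      | Manual       | 0     | 10             |
| 1   | 4.75          | 9.54          | 43000      | Diesel    | Dealer      | Manual       | 0     | 11             |
| 2   | 7.25          | 9.85          | 6900       | Petrol    | Dealer      | Manual       | 0     | 7              |
| 3   | 2.85          | 4.15          | 5200       | Petrol    | Dealer      | Manual       | 0     | 13             |
| 4   | 4.60          | 6.87          | 42450      | Diesel    | Dealer      | Manual       | 0     | 10             |
| ... | ...           | ...           | ...        | ...       | ...         | ...          | ...   | ...            |
| 296 | 9.50          | 11.60         | 33988      | Diesel    | Dealer      | Manual       | 0     | 8              |
| 297 | 4.00          | 5.90          | 60000      | Petrol    | Dealer      | Manual       | 0     | 9              |
| 298 | 3.35          | 11.00         | 87934      | Petrol    | Dealer      | Manual       | 0     | 15             |
| 299 | 11.50         | 12.50         | 9000       | Diesel    | Dealer      | Manual       | 0     | 7              |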
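The age calculation above hard-codes 2024 as the reference year. One possible alternative, sketched below, is a small helper that takes the reference year as a parameter and defaults to the current year. The name `car_age` is hypothetical and not part of the notebook.

```python
from datetime import date
from typing import Optional


def car_age(purchase_year: int, reference_year: Optional[int] = None) -> int:
    """Return the car's age in years relative to reference_year (default: current year)."""
    if reference_year is None:
        reference_year = date.today().year
    return reference_year - purchase_year


# With the reference year fixed at 2024, a 2014 car is 10 years old,
# matching the first row of the output above.
print(car_age(2014, reference_year=2024))
```

Whichever reference year is used for training should also be used when users enter the age in the deployed form, otherwise the feature means something different at prediction time.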


### 7. Encode the categorical columns

In [6]:
le = LabelEncoder()

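# Encode 'Fuel_Type', 'Seller_Type', 'Transmission' as integers.
# LabelEncoder assigns codes in sorted order of the labels, and refitting
# the same encoder for each column discards the previous column's mapping.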
df['Fuel_Type'] = le.fit_transform(df['Fuel_Type'])
df['Seller_Type'] = le.fit_transform(df['Seller_Type'])
df['Transmission'] = le.fit_transform(df['Transmission'])
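Because the same `LabelEncoder` is refit for every column, the label-to-integer mappings are not kept, yet the web form needs them later. The sketch below recovers the mappings, assuming the category labels are spelled `Petrol`, `Diesel`, `CNG`, `Dealer`, `Individual`, `Manual`, and `Automatic`. `LabelEncoder` stores its classes in sorted order, so the codes follow alphabetical order.

```python
from sklearn.preprocessing import LabelEncoder

# Category labels per column (spelling assumed, see the note above).
columns = {
    "Fuel_Type": ["Petrol", "Diesel", "CNG"],
    "Seller_Type": ["Dealer", "Individual"],
    "Transmission": ["Manual", "Automatic"],
}

mappings = {}
for name, values in columns.items():
    encoder = LabelEncoder().fit(values)  # one encoder per column
    # classes_ is sorted; the position of each class is its integer code.
    mappings[name] = {str(label): code for code, label in enumerate(encoder.classes_)}

for name, mapping in mappings.items():
    print(name, mapping)
```

Under these assumptions, `Fuel_Type` maps to CNG = 0, Diesel = 1, Petrol = 2, `Seller_Type` to Dealer = 0, Individual = 1, and `Transmission` to Automatic = 0, Manual = 1.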


### 8. Separate the target and independent features.

In [7]:
# Separate target variable (Selling Price) and features
X = df.drop(columns=['Selling_Price'])
y = df['Selling_Price']
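The column order of `X` matters later, because the deployed app has to pass values to the model in the same order. The sketch below rebuilds a one-row frame with the columns that remain after steps 5 and 6 and prints the resulting feature order. The row values are illustrative only.

```python
import pandas as pd

# One illustrative row with the columns left after dropping Car_Name and Year.
row = pd.DataFrame([{
    "Selling_Price": 3.35, "Present_Price": 5.59, "Kms_Driven": 27000,
    "Fuel_Type": 2, "Seller_Type": 0, "Transmission": 1, "Owner": 0,
    "age_of_the_car": 10,
}])

# Dropping the target leaves the features in their original column order.
X_demo = row.drop(columns=["Selling_Price"])
feature_order = list(X_demo.columns)
print(feature_order)
```

In the notebook itself, `list(X.columns)` gives the same information directly.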

### 9. Split the data into train and test.

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
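To see how `test_size=0.2` divides a dataset of this size, the sketch below splits a synthetic array with 301 rows and 7 features. The zeros are placeholders, not the car data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows (301) and features (7).
X_demo = np.zeros((301, 7))
y_demo = np.zeros(301)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The test share is rounded up, so 301 rows split into 240 train and 61 test.
print(X_tr.shape, X_te.shape)
```

Fixing `random_state` makes the split repeatable between runs.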

### 10. Build a Random Forest Regressor model and check the R2 score for train and test.

In [9]:
# Build and train Random Forest Regressor
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Check the R2 score for training and test sets
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

train_score = r2_score(y_train, y_train_pred)
test_score = r2_score(y_test, y_test_pred)

print(f"R2 Score - Training: {train_score}")
print(f"R2 Score - Testing: {test_score}")


R2 Score - Training: 0.9857813344912233
R2 Score - Testing: 0.9549145293339902
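The R2 score compares the model's squared error with that of always predicting the mean, so 1.0 is a perfect fit and 0.0 is no better than the mean. The sketch below computes it by hand on made-up numbers to show what `r2_score` measures. The helper name `r2_manual` is hypothetical.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # baseline error
    return 1 - ss_res / ss_tot


# Made-up prices and predictions, not results from the notebook.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
print(r2_manual(y_true, y_pred))
```

Some gap between the training score (about 0.986) and the test score (about 0.955) is expected, because the model has already seen the training rows.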


### 11. Save the model as a pickle file with the .pkl extension

In [10]:
import pickle

# Save the model to a pickle file
with open('model.pkl', 'wb') as file:
    pickle.dump(model, file)
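A quick way to check that a saved model is usable is to load it back and compare predictions. The sketch below does this for a small model trained on synthetic data (not the car data) and writes to a temporary directory so that an existing `model.pkl` is not overwritten.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Tiny synthetic problem, only there to have a fitted model to save.
rng = np.random.default_rng(0)
X_demo = rng.random((50, 3))
y_demo = X_demo.sum(axis=1)
demo_model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Save the model to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(demo_model, f)

# Load it back and check that it predicts exactly like the original.
with open(path, "rb") as f:
    restored = pickle.load(f)

same = np.array_equal(demo_model.predict(X_demo), restored.predict(X_demo))
print(same)
```

Pickled scikit-learn models are best loaded with the same scikit-learn version that created them (1.2.2 in this notebook), and pickle files should only be loaded from trusted sources.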


### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

In [None]:
python3 -m venv venv
source venv/bin/activate
pip install flask scikit-learn pandas numpy


### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

In [None]:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Car Price Prediction</title>
</head>
<body>
    <h1>Predict the Selling Price of a Used Car</h1>
    <form action="/predict" method="POST">
        <label for="present_price">Present Price (Lakhs):</label>
        <input type="text" id="present_price" name="present_price" required><br>

        <label for="kms_driven">Kms Driven:</label>
        <input type="text" id="kms_driven" name="kms_driven" required><br>

        <label for="fuel_type">Fuel Type (0: CNG, 1: Diesel, 2: Petrol):</label>
        <input type="text" id="fuel_type" name="fuel_type" required><br>

        <label for="seller_type">Seller Type (0: Dealer, 1: Individual):</label>
        <input type="text" id="seller_type" name="seller_type" required><br>

        <label for="transmission">Transmission (0: Automatic, 1: Manual):</label>
        <input type="text" id="transmission" name="transmission" required><br>

        <label for="owner">Owner (0: First, 1: Second, 2: Third):</label>
        <input type="text" id="owner" name="owner" required><br>

        <label for="age_of_the_car">Age of the Car:</label>
        <input type="text" id="age_of_the_car" name="age_of_the_car" required><br>

        <button type="submit">Predict</button>
    </form>
</body>
</html>


### b) Create app.py file and write the predict function

In [34]:
from flask import Flask, request, render_template
import pickle
import numpy as np

app = Flask(__name__)

# Load the trained model; the context manager closes the file after reading
with open('model.pkl', 'rb') as file:
    model = pickle.load(file)

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        # Get form data
        present_price = float(request.form['present_price'])
        kms_driven = float(request.form['kms_driven'])
        fuel_type = int(request.form['fuel_type'])
        seller_type = int(request.form['seller_type'])
        transmission = int(request.form['transmission'])
        owner = int(request.form['owner'])
        age_of_the_car = int(request.form['age_of_the_car'])

        # Arrange the input data in the same order as training features
        input_features = np.array([[present_price, kms_driven, fuel_type, seller_type, transmission, owner, age_of_the_car]])

        # Make the prediction
        prediction = model.predict(input_features)

        # Return the result
        return f'The predicted selling price of the car is {prediction[0]:.2f} Lakhs'

    except Exception as e:
        # Report the problem with a 400 status instead of a normal 200 response
        return f"Error: {e}", 400

if __name__ == '__main__':
    app.run(debug=True)
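The `predict` function converts the form values but does not check that they are in a sensible range. A possible validation step is sketched below. The names `parse_form` and `ALLOWED_CODES` are hypothetical, and the allowed code ranges are assumptions based on the categories listed in the dataset description.

```python
# Allowed integer codes per categorical field (assumed ranges, see above).
ALLOWED_CODES = {
    "fuel_type": {0, 1, 2},
    "seller_type": {0, 1},
    "transmission": {0, 1},
}


def parse_form(form):
    """Turn raw form strings into the feature list.

    Raises ValueError for malformed or out-of-range values.
    """
    present_price = float(form["present_price"])
    kms_driven = float(form["kms_driven"])
    owner = int(form["owner"])
    age = int(form["age_of_the_car"])
    if min(present_price, kms_driven, owner, age) < 0:
        raise ValueError("numeric fields must be non-negative")

    codes = []
    for field, allowed in ALLOWED_CODES.items():
        value = int(form[field])
        if value not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
        codes.append(value)
    fuel_type, seller_type, transmission = codes

    # Same order as the training features.
    return [present_price, kms_driven, fuel_type, seller_type, transmission, owner, age]


# Example with made-up form values (all strings, as a browser would send them).
example = {
    "present_price": "5.59", "kms_driven": "27000", "fuel_type": "2",
    "seller_type": "0", "transmission": "1", "owner": "0", "age_of_the_car": "10",
}
print(parse_form(example))
```

Inside the Flask route, `parse_form(request.form)` could replace the individual conversions, with the returned list wrapped as `np.array([features])` before calling `model.predict`.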


### 13. Run app.py, which renders the index.html page; then enter the input values and get the prediction.

In [None]:
Predict the Selling Price of a Used Car
Present Price (Lakhs): 
1000000

Kms Driven: 
200000

Fuel Type (0: CNG, 1: Diesel, 2: Petrol): 
0

Seller Type (0: Dealer, 1: Individual): 
1

Transmission (0: Automatic, 1: Manual): 
1

Owner (0: First, 1: Second, 2: Third): 
2

Age of the Car: 
5

Predict

The predicted selling price of the car is 28.98 Lakhs

### Happy Learning :)