# <font color=darkblue> Machine Learning model deployment with Flask framework</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of a used car from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the model with the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com.
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car, in lakhs
- **Present_Price**: Present price of the car, in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: First, second, or third owner


### 1. Import required libraries

In [21]:
# Data handling and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Silence library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

# Preprocessing, data splitting, model, and evaluation metric
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Model persistence
import pickle


### 2. Load the dataset

In [22]:
df = pd.read_csv("car+data.csv")
df.head()

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


### 3. Check the shape and basic information of the dataset.

In [23]:
df.shape

(301, 9)

### 4. Check for duplicate records in the dataset and drop any that are found

In [24]:
df.duplicated().sum()

2

In [25]:
df = df.drop_duplicates()

In [27]:
df.duplicated().sum()

0
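As a small illustration of what the duplicate check counts, the sketch below runs the same two calls on a toy frame. The rows are invented for the example and are not taken from the dataset.

```python
import pandas as pd

# Toy rows, invented for illustration (not taken from the CarDekho file).
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ritz", "ritz"],
    "Year": [2014, 2013, 2014, 2015],
    "Kms_Driven": [27000, 43000, 27000, 27000],
})

# duplicated() marks a row True only if an identical row appeared earlier.
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence and preserves the original index labels.
deduped = toy.drop_duplicates()

print(n_dupes)
print(deduped.index.tolist())
```

Because the surviving rows keep their original labels, a de-duplicated frame can report fewer entries than its index range suggests, as in the later `df.info()` output ("299 entries, 0 to 300").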

### 5. Drop the columns that are redundant for the analysis

In [28]:
df.sample(5)

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 120 | Bajaj Pulsar RS200 | 2016 | 1.05 | 1.26 | 5700 | Petrol | Individual | Manual | 0 |
| 21 | ignis | 2017 | 4.9 | 5.71 | 2400 | Petrol | Dealer | Manual | 0 |
| 210 | i10 | 2012 | 3.1 | 4.6 | 35775 | Petrol | Dealer | Manual | 0 |
| 285 | jazz | 2016 | 7.4 | 8.5 | 15059 | Petrol | Dealer | Automatic | 0 |
| 224 | verna | 2013 | 5.11 | 9.4 | 36198 | Petrol | Dealer | Automatic | 0 |


In [29]:
df.drop(['Seller_Type', 'Owner'], inplace=True, axis=1)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 299 entries, 0 to 300
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       299 non-null    object 
 1   Year           299 non-null    int64  
 2   Selling_Price  299 non-null    float64
 3   Present_Price  299 non-null    float64
 4   Kms_Driven     299 non-null    int64  
 5   Fuel_Type      299 non-null    object 
 6   Transmission   299 non-null    object 
dtypes: float64(2), int64(2), object(3)
memory usage: 18.7+ KB


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [30]:
# Age in years, measured relative to a fixed reference year of 2023
df['age_of_the_car'] = 2023 - df['Year']
df.sample(5)

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Transmission | age_of_the_car |
|---|---|---|---|---|---|---|---|---|
| 157 | Yamaha FZ S V 2.0 | 2015 | 0.48 | 0.84 | 23000 | Petrol | Manual | 8 |
| 215 | verna | 2012 | 4.5 | 9.4 | 36100 | Petrol | Manual | 11 |
| 229 | i20 | 2012 | 3.1 | 6.79 | 52132 | Diesel | Manual | 11 |
| 278 | jazz | 2016 | 6.0 | 8.4 | 4000 | Petrol | Manual | 7 |
| 30 | ritz | 2012 | 3.1 | 5.98 | 51439 | Diesel | Manual | 11 |


In [31]:
df.drop(['Year'], inplace=True, axis=1)
df.sample(5)

| | Car_Name | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Transmission | age_of_the_car |
|---|---|---|---|---|---|---|---|
| 251 | city | 5.0 | 9.9 | 56701 | Petrol | Manual | 10 |
| 61 | etios cross | 4.5 | 7.7 | 40588 | Petrol | Manual | 8 |
| 219 | verna | 4.5 | 9.4 | 36000 | Petrol | Manual | 11 |
| 28 | alto k10 | 1.95 | 3.95 | 44542 | Petrol | Manual | 13 |
| 112 | KTM 390 Duke | 1.15 | 2.4 | 7000 | Petrol | Manual | 9 |
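The age calculation above hard-codes 2023 as the reference year. One possible way to make that choice explicit is a small helper that takes the reference year as a parameter. `add_car_age` and its arguments are hypothetical names introduced for this sketch, and the two rows are toy data.

```python
from datetime import date

import pandas as pd


def add_car_age(frame, reference_year=None, year_col="Year"):
    """Return a copy with an 'age_of_the_car' column and without the year column."""
    if reference_year is None:
        # Fall back to the current calendar year when no reference year is given.
        reference_year = date.today().year
    out = frame.copy()
    out["age_of_the_car"] = reference_year - out[year_col]
    return out.drop(columns=[year_col])


# Toy frame for illustration only.
toy = pd.DataFrame({"Car_Name": ["ritz", "sx4"], "Year": [2014, 2013]})
aged = add_car_age(toy, reference_year=2023)
print(aged)
```

Passing `reference_year=2023` reproduces the notebook's ages. Whichever year is chosen, the same value has to be used when preparing inputs at prediction time.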


### 7. Encode the categorical columns

In [32]:
numerical_columns = df.select_dtypes(include=[np.number]).columns.tolist()
categorical_columns = df.select_dtypes(include=[object]).columns.tolist()
print('Numerical Columns: ', numerical_columns)
print('~'*50)
print('Categorical Columns: ', categorical_columns)
print('~'*50)

Numerical Columns:  ['Selling_Price', 'Present_Price', 'Kms_Driven', 'age_of_the_car']
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Categorical Columns:  ['Car_Name', 'Fuel_Type', 'Transmission']
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


In [33]:
df.isnull().sum().sum()

0

In [34]:
# No values are missing (the count above is 0), so this mode imputation is only a safeguard
df[categorical_columns] = df[categorical_columns].apply(lambda x: x.fillna(x.mode().iloc[0]))
df.sample(10)

| | Car_Name | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Transmission | age_of_the_car |
|---|---|---|---|---|---|---|---|
| 187 | Honda CB twister | 0.25 | 0.51 | 32000 | Petrol | Manual | 10 |
| 221 | i20 | 4.5 | 6.79 | 32000 | Petrol | Automatic | 10 |
| 117 | Royal Enfield Thunder 500 | 1.1 | 1.9 | 14000 | Petrol | Manual | 8 |
| 158 | Honda Dream Yuga | 0.48 | 0.54 | 8600 | Petrol | Manual | 6 |
| 229 | i20 | 3.1 | 6.79 | 52132 | Diesel | Manual | 11 |
| 7 | s cross | 6.5 | 8.61 | 33429 | Diesel | Manual | 8 |
| 66 | innova | 19.75 | 23.15 | 11000 | Petrol | Automatic | 6 |
| 14 | dzire | 2.25 | 7.21 | 77427 | Petrol | Manual | 14 |
| 175 | Hero Honda CBZ extreme | 0.38 | 0.787 | 75000 | Petrol | Manual | 12 |
| 41 | alto k10 | 2.55 | 3.98 | 46706 | Petrol | Manual | 9 |
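The cells in this step list the categorical columns and fill missing values, but the columns themselves remain text. A minimal sketch of one common encoding, one-hot encoding with `pd.get_dummies`, is shown below on toy rows. Encoding only `Fuel_Type` and `Transmission`, and leaving out the high-cardinality `Car_Name`, is an assumption made for this example rather than a choice taken from the notebook.

```python
import pandas as pd

# Toy rows for illustration only.
toy = pd.DataFrame({
    "Present_Price": [5.59, 9.54, 0.57],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol"],
    "Transmission": ["Manual", "Manual", "Automatic"],
})

# One 0/1 indicator column per category; drop_first removes the first
# (alphabetically lowest) category of each column to avoid redundant columns.
encoded = pd.get_dummies(
    toy,
    columns=["Fuel_Type", "Transmission"],
    drop_first=True,
    dtype=int,
)

print(encoded.columns.tolist())
print(encoded)
```

With indicator columns like these, the fuel type and transmission could be kept as model inputs instead of being dropped before training.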


### 8. Separate the target and independent features.

In [35]:
df.sample(10)

| | Car_Name | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Transmission | age_of_the_car |
|---|---|---|---|---|---|---|---|
| 243 | i20 | 6.25 | 7.6 | 7000 | Petrol | Manual | 7 |
| 203 | i10 | 2.95 | 4.6 | 53460 | Petrol | Manual | 12 |
| 255 | brio | 3.0 | 5.35 | 53675 | Petrol | Manual | 11 |
| 177 | Honda Activa 125 | 0.35 | 0.57 | 24000 | Petrol | Automatic | 7 |
| 72 | corolla altis | 7.45 | 18.61 | 56001 | Petrol | Manual | 10 |
| 64 | fortuner | 33.0 | 36.23 | 6000 | Diesel | Automatic | 6 |
| 268 | brio | 4.8 | 5.8 | 19000 | Petrol | Manual | 6 |
| 284 | brio | 3.5 | 5.9 | 9800 | Petrol | Manual | 10 |
| 1 | sx4 | 4.75 | 9.54 | 43000 | Diesel | Manual | 10 |
| 52 | innova | 18.0 | 19.77 | 15000 | Diesel | Automatic | 6 |


In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 299 entries, 0 to 300
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Car_Name        299 non-null    object 
 1   Selling_Price   299 non-null    float64
 2   Present_Price   299 non-null    float64
 3   Kms_Driven      299 non-null    int64  
 4   Fuel_Type       299 non-null    object 
 5   Transmission    299 non-null    object 
 6   age_of_the_car  299 non-null    int64  
dtypes: float64(2), int64(2), object(3)
memory usage: 18.7+ KB


In [37]:
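# Caution: 'Selling_Price' is not in the drop list, so x still contains the target
# (see the x.info() output below). Add it to the list to avoid target leakage.
x = df.drop(['Car_Name', 'Fuel_Type', 'Transmission'], axis=1)
y = df['Selling_Price']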

In [38]:
x.info()

<class 'pandas.core.frame.DataFrame'>
Index: 299 entries, 0 to 300
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Selling_Price   299 non-null    float64
 1   Present_Price   299 non-null    float64
 2   Kms_Driven      299 non-null    int64  
 3   age_of_the_car  299 non-null    int64  
dtypes: float64(2), int64(2)
memory usage: 11.7 KB
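The `x.info()` output above still lists `Selling_Price` among the features, which lets the model see the value it is asked to predict. A minimal sketch of a split that excludes the target is given below. The two toy rows and the variable names `target` and `X` are assumptions for the example.

```python
import pandas as pd

# Toy rows shaped like the notebook's frame at this step (illustration only).
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4"],
    "Selling_Price": [3.35, 4.75],
    "Present_Price": [5.59, 9.54],
    "Kms_Driven": [27000, 43000],
    "Fuel_Type": ["Petrol", "Diesel"],
    "Transmission": ["Manual", "Manual"],
    "age_of_the_car": [9, 10],
})

target = "Selling_Price"
y = toy[target]

# Remove the target first, then keep only the numeric feature columns.
X = toy.drop(columns=[target]).select_dtypes(include="number")

print(X.columns.tolist())
```

A model trained on features like these would be expected to score lower than one that also receives `Selling_Price`, but the score would reflect what the deployed app can actually do, since the selling price is unknown at prediction time.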


In [39]:
y.info()

<class 'pandas.core.series.Series'>
Index: 299 entries, 0 to 300
Series name: Selling_Price
Non-Null Count  Dtype  
--------------  -----  
299 non-null    float64
dtypes: float64(1)
memory usage: 4.7 KB


In [40]:
y.unique()

array([ 3.35,  4.75,  7.25,  2.85,  4.6 ,  9.25,  6.75,  6.5 ,  8.75,
        7.45,  6.85,  7.5 ,  6.1 ,  2.25,  7.75,  3.25,  2.65,  4.9 ,
        4.4 ,  2.5 ,  2.9 ,  3.  ,  4.15,  6.  ,  1.95,  3.1 ,  2.35,
        4.95,  5.5 ,  2.95,  4.65,  0.35,  5.85,  2.55,  1.25,  1.05,
        5.8 , 14.9 , 23.  , 18.  , 16.  ,  2.75,  3.6 ,  4.5 ,  4.1 ,
       19.99,  6.95, 18.75, 23.5 , 33.  , 19.75,  4.35, 14.25,  3.95,
        1.5 ,  5.25, 14.5 , 14.73, 12.5 ,  3.49, 35.  ,  5.9 ,  3.45,
        3.8 , 11.25,  3.51,  4.  , 20.75, 17.  ,  7.05,  9.65,  1.75,
        1.7 ,  1.65,  1.45,  1.35,  1.2 ,  1.15,  1.11,  1.1 ,  1.  ,
        0.95,  0.9 ,  0.75,  0.8 ,  0.78,  0.72,  0.65,  0.6 ,  0.55,
        0.52,  0.51,  0.5 ,  0.48,  0.45,  0.42,  0.4 ,  0.38,  0.31,
        0.3 ,  0.27,  0.25,  0.2 ,  0.18,  0.17,  0.16,  0.15,  0.12,
        0.1 ,  5.75,  5.15,  7.9 ,  4.85, 11.75,  3.15,  6.45,  3.5 ,
        8.25,  5.11,  2.7 ,  6.15, 11.45,  3.9 ,  9.1 ,  4.8 ,  2.  ,
        5.35,  6.25,

### 9. Split the data into train and test.

In [42]:
X_train,X_test,y_train,y_test = train_test_split(x,y,test_size=0.3,random_state=0)

print(X_train.shape,X_test.shape)
print(y_train.shape,y_test.shape)

(209, 4) (90, 4)
(209,) (90,)


### 10. Build a Random Forest regressor and check the R² score on the train and test sets

In [43]:
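# max_depth=3 keeps each tree shallow; no random_state is set, so the score varies slightly between runs
rf_reg = RandomForestRegressor(n_estimators=100, max_depth=3)
rf_reg.fit(X_train, y_train)
# score() returns the R² on the data it is given; only the test set is scored here
rf_reg_score = rf_reg.score(X_test, y_test)
print(rf_reg_score)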

0.9542424154759371
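This step asks for the R² on both the train and test sets, while the cell above reports only the test score. The sketch below computes both with `r2_score`. It uses synthetic data so that it runs on its own, so its numbers say nothing about the car dataset. The data-generating formula and the fixed `random_state` values are assumptions made for reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustration only): three features, noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 2.0 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(0, 0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Same hyperparameters as the notebook, plus a fixed seed for repeatable results.
model = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=0)
model.fit(X_train, y_train)

# r2_score(y_true, y_pred) matches what model.score(X, y) computes internally.
train_r2 = r2_score(y_train, model.predict(X_train))
test_r2 = r2_score(y_test, model.predict(X_test))
print(f"train R^2: {train_r2:.3f}, test R^2: {test_r2:.3f}")
```

A train score far above the test score would point to overfitting; similar values suggest the shallow trees generalize about as well as they fit.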


### 11. Save the model as a pickle file with the .pkl extension

In [44]:
# Serialize the fitted model so the Flask app can reload it for prediction
with open('model.pkl', 'wb') as file:
    pickle.dump(rf_reg, file)

In [45]:
# Reload the pickled object to confirm the file can be read back
with open('model.pkl', 'rb') as file:
    model = pickle.load(file)
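A quick round-trip check can confirm that the pickle file holds an estimator with a `predict` method rather than some other object. The sketch below fits a tiny forest on the first four rows shown by `df.head()` (with ages computed relative to 2023) and writes to a temporary directory. The variable names and the three-feature layout are assumptions for the example.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestRegressor

# Features: Present_Price, Kms_Driven, age_of_the_car; target: Selling_Price.
X = [[5.59, 27000, 9], [9.54, 43000, 10], [9.85, 6900, 6], [4.15, 5200, 12]]
y = [3.35, 4.75, 7.25, 2.85]
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Write to a temporary directory so the sketch does not overwrite a real model.pkl.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as fh:
    pickle.dump(model, fh)

with open(path, "rb") as fh:
    restored = pickle.load(fh)

# The restored object should be a usable estimator that reproduces the predictions.
has_predict = hasattr(restored, "predict")
same = bool((restored.predict(X) == model.predict(X)).all())
print(type(restored).__name__, has_predict, same)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.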

### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and have installed the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder containing a form with one input for each feature the model expects.

### b) Create the app.py file and write the predict function
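What follows is a minimal sketch of what such an app.py could look like, not the notebook's original file. To keep it in one file, a hypothetical inline template (`FORM`) stands in for templates/index.html. The form field names, the `get_model` helper, and the three-feature order (present price, kilometers driven, age) are all assumptions, and they presume a model trained on exactly those three columns.

```python
import pickle

from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical inline stand-in for templates/index.html.
FORM = """
<form action="/predict" method="post">
  <input name="present_price" placeholder="Present price (lakhs)">
  <input name="kms_driven" placeholder="Kms driven">
  <input name="age_of_the_car" placeholder="Age of the car (years)">
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}
<p>Estimated selling price: {{ prediction }} lakhs</p>
{% endif %}
"""


def get_model(path="model.pkl"):
    """Load the pickled regressor once and cache it in the app config."""
    if "MODEL" not in app.config:
        with open(path, "rb") as fh:
            app.config["MODEL"] = pickle.load(fh)
    return app.config["MODEL"]


@app.route("/")
def home():
    # Show the empty form.
    return render_template_string(FORM, prediction=None)


@app.route("/predict", methods=["POST"])
def predict():
    # Field names and feature order are assumptions; they must match the training columns.
    row = [[
        float(request.form["present_price"]),
        float(request.form["kms_driven"]),
        float(request.form["age_of_the_car"]),
    ]]
    price = float(get_model().predict(row)[0])
    return render_template_string(FORM, prediction=round(price, 2))
```

Saved as app.py next to model.pkl, a sketch like this could be started with `flask --app app run`. In a real project the form would live in templates/index.html and be rendered with `render_template("index.html", ...)`.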

### 13. Run app.py, which renders the index.html page; then enter the input values to get the prediction.

### Happy Learning :)