# <font color=darkblue> Machine Learning model deployment with Flask framework</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of a used car from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the trained model as a web application using the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com.
- **Car_Name**: Name of the car
- **Year**: Year of Purchase
- **Selling_Price (target)**: Selling price of the car in lakhs (1 lakh = 100,000 rupees)
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: Number of previous owners (0, 1, or 3 in this data)


### 1. Import required libraries

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import pickle
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

from scipy.stats import zscore

### 2. Load the dataset

In [5]:
df = pd.read_csv('car_data.csv')
df.head()

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


### 3. Check the shape and basic information of the dataset.

In [7]:
def basic_info(df):
    print(df.shape)
    print(df.info())
basic_info(df)

(301, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB
None


In [9]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Year | 301.0 | 2013.627907 | 2.891554 | 2003.0 | 2012.0 | 2014.0 | 2016.0 | 2018.0 |
| Selling_Price | 301.0 | 4.661296 | 5.082812 | 0.1 | 0.9 | 3.6 | 6.0 | 35.0 |
| Present_Price | 301.0 | 7.628472 | 8.644115 | 0.32 | 1.2 | 6.4 | 9.9 | 92.6 |
| Kms_Driven | 301.0 | 36947.20598 | 38886.883882 | 500.0 | 15000.0 | 32000.0 | 48767.0 | 500000.0 |
| Owner | 301.0 | 0.043189 | 0.247915 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 |


In [11]:
df.describe(include='O')

| | Car_Name | Fuel_Type | Seller_Type | Transmission |
|---|---|---|---|---|
| count | 301 | 301 | 301 | 301 |
| unique | 98 | 3 | 2 | 2 |
| top | city | Petrol | Dealer | Manual |
| freq | 26 | 239 | 195 | 261 |


### 4. Check for duplicate records in the dataset and drop any that are present.

In [13]:
df.duplicated().sum()

2

In [15]:
df.drop_duplicates(inplace=True)

In [17]:
df.duplicated().sum()

0
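
The check-then-drop pattern above can be tried in isolation on a tiny frame. This is a minimal sketch: the three-row `toy` frame is made up for illustration, and the `reset_index` call is an optional extra that the notebook itself does not perform.

```python
import pandas as pd

# Hypothetical miniature frame: row 2 is an exact copy of row 0.
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ritz"],
    "Year": [2014, 2013, 2014],
    "Selling_Price": [3.35, 4.75, 3.35],
})

# duplicated() flags each repeat of an earlier row; the first occurrence is not flagged.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence; reset_index(drop=True) closes the index gap.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, deduped.shape)
```

Because only repeats are counted, a result of 2 on the full dataset means two rows are removed, which takes the 301 rows down to 299.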

### 5. Drop the columns that are redundant for the analysis.

In [19]:
df.drop('Car_Name', inplace=True, axis =1)

### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year', then drop 'Year'.

In [21]:
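# 2024 is the reference year for this run; use the same reference when computing the age at prediction time.
df['age_of_the_car'] = 2024 - df['Year']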

In [23]:
df.drop('Year', axis =1, inplace=True)

In [25]:
df.head()

| | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | age_of_the_car |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 | 10 |
| 1 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 | 11 |
| 2 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 | 7 |
| 3 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 | 13 |
| 4 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 | 10 |
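
The age feature is the difference between a reference year and the purchase year. A small helper makes the reference year explicit, which may be useful when the same calculation is needed outside the notebook. This is only a sketch: `car_age` and its `reference_year` parameter are hypothetical names, not part of the notebook.

```python
from datetime import date


def car_age(purchase_year, reference_year=None):
    """Return the car's age in whole years relative to reference_year.

    reference_year defaults to the current calendar year (an assumption of
    this sketch); the notebook itself uses the fixed value 2024.
    """
    if reference_year is None:
        reference_year = date.today().year
    age = reference_year - purchase_year
    if age < 0:
        raise ValueError("purchase_year is later than reference_year")
    return age


# Reproduce the first and third rows of the table above (Year 2014 and 2017).
print(car_age(2014, reference_year=2024), car_age(2017, reference_year=2024))
```

A model trained on ages measured from 2024 has only seen values on that scale, so it seems safest to pass the same reference year when preparing inputs for it.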


In [27]:
df['Owner'].value_counts()

Owner
0    288
1     10
3      1
Name: count, dtype: int64

In [29]:
cat_feature = df.select_dtypes(include='O')
for col in cat_feature:
    print(df[col].value_counts(), '\n')

Fuel_Type
Petrol    239
Diesel     58
CNG         2
Name: count, dtype: int64 

Seller_Type
Dealer        193
Individual    106
Name: count, dtype: int64 

Transmission
Manual       260
Automatic     39
Name: count, dtype: int64 



### 7. Encode the categorical columns

In [31]:
le = LabelEncoder()

cat_data = df.select_dtypes(include='O')
for col in cat_data:
    df[col] = le.fit_transform(df[col])
    print(df[col].value_counts(), '\n')

df.head()

Fuel_Type
2    239
1     58
0      2
Name: count, dtype: int64 

Seller_Type
0    193
1    106
Name: count, dtype: int64 

Transmission
1    260
0     39
Name: count, dtype: int64 



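| | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | age_of_the_car |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.35 | 5.59 | 27000 | 2 | 0 | 1 | 0 | 10 |
| 1 | 4.75 | 9.54 | 43000 | 1 | 0 | 1 | 0 | 11 |
| 2 | 7.25 | 9.85 | 6900 | 2 | 0 | 1 | 0 | 7 |
| 3 | 2.85 | 4.15 | 5200 | 2 | 0 | 1 | 0 | 13 |
| 4 | 4.6 | 6.87 | 42450 | 1 | 0 | 1 | 0 | 10 |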
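
`LabelEncoder` assigns integer codes in sorted label order, which is why Petrol became 2 and CNG became 0 above. Refitting a single encoder in a loop keeps only the mapping for the last column, so the sketch below fits one encoder per column and records each label-to-code mapping. The names `toy`, `encoders`, and `mappings` are hypothetical, and the toy frame only reuses the category labels seen in the value counts.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame containing the category labels that appear in the dataset.
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Seller_Type": ["Dealer", "Individual", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

encoders = {}  # one fitted encoder per column
mappings = {}  # label -> integer code, per column
for col in toy.columns:
    enc = LabelEncoder().fit(toy[col].tolist())
    encoders[col] = enc
    # classes_ is sorted; each label's position in it is the code it receives.
    mappings[col] = {str(label): code for code, label in enumerate(enc.classes_)}

print(mappings)
```

These mappings are what a web form needs in order to send the same codes the model was trained on, for example a "Diesel" option whose submitted value is 1.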


### 8. Separate the target and independent features.

In [33]:
x = df.drop('Selling_Price', axis =1)
y = df['Selling_Price']

In [None]:
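# Optional: standardize the features with z-scores. Left disabled here; tree-based models such as random forests do not require scaling.
# x = x.apply(zscore)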

### 9. Split the data into train and test.

In [35]:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42)
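
To see how many rows land in each part, the same call can be run on a synthetic stand-in with 299 rows and 7 feature columns, matching the deduplicated data. The random values below are placeholders, not the car data, and the `_demo` names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 299 rows and 7 feature columns, like x after step 8.
rng = np.random.default_rng(0)
x_demo = pd.DataFrame(rng.normal(size=(299, 7)))
y_demo = pd.Series(rng.normal(size=299))

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.20, random_state=42
)

# The test share is rounded up (ceil(0.2 * 299) = 60); the remaining 239 rows train the model.
print(x_tr.shape, x_te.shape)
```

With only 60 test rows, the test score can shift noticeably under a different `random_state`, so a single split gives a rough estimate at best.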

### 10. Build a Random Forest regressor model and check the R² score for train and test.

In [37]:
model = RandomForestRegressor(n_estimators=4, random_state=42)
model = model.fit(x_train, y_train)

In [39]:
test_pred = model.predict(x_test)
r2_score(y_test, test_pred)

0.6401646378378523

In [41]:
model.score(x_test, y_test)

0.6401646378378523
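
The two cells above return the same number because a regressor's `score` method reports the coefficient of determination, the same quantity `r2_score` computes. As a reminder of what that quantity is, here is a sketch of the formula R² = 1 - SS_res / SS_tot in NumPy. The helper `r2_manual` is a hypothetical name, and the prices are simply the first five selling prices from the table in step 2.

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error left after prediction
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # spread around the mean
    return 1.0 - ss_res / ss_tot


prices = [3.35, 4.75, 7.25, 2.85, 4.60]
perfect = r2_manual(prices, prices)                     # predictions equal the targets
baseline = r2_manual(prices, [np.mean(prices)] * 5)     # always predict the mean
print(perfect, baseline)
```

Perfect predictions score 1 and always predicting the mean scores 0, so the test score of about 0.64 lies between those two reference points. The train score asked for in the heading can be obtained the same way with `model.score(x_train, y_train)`.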

### 11. Save the trained model as a pickle file with the .pkl extension

In [45]:
# Open the file in a with-block so the handle is closed once the model is written.
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
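
A quick way to gain confidence in the saved file is a round trip: dump a model, load it back, and compare predictions. The sketch below does this with a small model trained on synthetic data and a temporary directory, so it does not touch the real `model.pkl`. The names `demo_model` and `restored` and the synthetic data are assumptions of the example.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with 7 features, standing in for the car features.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 7))
y = X.sum(axis=1)

demo_model = RandomForestRegressor(n_estimators=4, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as f:   # "wb": write binary
        pickle.dump(demo_model, f)
    with open(path, "rb") as f:   # "rb": read binary
        restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly.
same = np.array_equal(demo_model.predict(X), restored.predict(X))
print(same)
```

Pickle files are generally tied to the library versions that wrote them, so it helps to install the same scikit-learn version in the environment that will load the model.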

### 12. Create a new folder/project in Visual Studio or PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and have installed the required packages.*

### a) Create a basic HTML form for the frontend

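Create a file **index.html** in the `templates` folder. It should contain a form that sends a POST request to `/predict` with the fields `present_price`, `kms_driven`, `fuel_type`, `seller_type`, `transmission`, `owner`, and `age_of_car`. The app also renders `predict.html`, `400.html`, and `500.html`, so those templates must exist in the same folder.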

### b) Create app.py file and write the predict function

In [None]:
from flask import Flask, render_template, request
import pickle
import numpy as np
from werkzeug.exceptions import HTTPException

app = Flask(__name__)

# Load the trained model once at startup ('rb' = read binary).
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)


@app.route('/')
def home():
    # Render the form created in step (a); the name must match the file in the templates folder.
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        data1 = request.form['present_price']
        data2 = request.form['kms_driven']
        data3 = request.form['fuel_type']
        data4 = request.form['seller_type']
        data5 = request.form['transmission']
        data6 = request.form['owner']
        data7 = request.form['age_of_car']
        
        if not all([data1, data2, data3, data4, data5, data6, data7]):
            raise ValueError("All fields must be filled out.")
        
        arr = np.array([[data1, data2, data3, data4, data5, data6, data7]], dtype=float)
        pred = model.predict(arr)
        return render_template('predict.html', data=cleanvalue(pred))
    
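    except ValueError:
        # Raised for empty fields and for values that cannot be converted to float.
        return render_template('400.html', error_message="Invalid input. Please fill in every field with a valid number."), 400

    except HTTPException:
        # Let Flask handle HTTP errors itself, e.g. the 400 raised when a form key is missing.
        raise

    except Exception:
        return render_template('500.html'), 500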

def cleanvalue(val):
    # model.predict returns a one-element array; format that element (price in lakhs) to two decimals.
    return f"{float(val[0]):.2f}"

@app.errorhandler(500)
def internal_server_error(e):
    return render_template('500.html'), 500

if __name__ == '__main__':
    app.run(debug=False)
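
The order of the values passed to `model.predict` has to match the training column order: Present_Price, Kms_Driven, Fuel_Type, Seller_Type, Transmission, Owner, age_of_the_car. The conversion step can be pulled into a plain function and checked without starting Flask. This is a sketch under assumptions: `FEATURE_ORDER` and `form_to_features` are hypothetical names, and the form is assumed to submit the numeric category codes, as app.py above expects.

```python
# Form field names, listed in the same order as the training columns.
FEATURE_ORDER = [
    "present_price", "kms_driven", "fuel_type", "seller_type",
    "transmission", "owner", "age_of_car",
]


def form_to_features(form):
    """Convert a mapping of form strings into one row of floats for model.predict."""
    row = []
    for name in FEATURE_ORDER:
        raw = str(form.get(name, "")).strip()
        if not raw:
            raise ValueError(f"Missing value for '{name}'")
        row.append(float(raw))  # float() raises ValueError for non-numeric text
    return [row]  # 2-D shape: one sample, seven features


# First row of the encoded table in step 7, as a browser would submit it.
example_form = {
    "present_price": "5.59", "kms_driven": "27000", "fuel_type": "2",
    "seller_type": "0", "transmission": "1", "owner": "0", "age_of_car": "10",
}
features = form_to_features(example_form)
print(features)
```

Both failure modes raise `ValueError`, so a view that calls this helper can keep a single handler for bad input.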

### 13. Run app.py, which serves the index.html page; then enter the input values to get the prediction.

### Happy Learning :)