# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of a used car from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the trained model with the Flask framework on Heroku.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com:
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: Number of previous owners (0 to 3 in this dataset)


### 1. Import required libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import pickle

### 2. Load the dataset

In [2]:
df = pd.read_csv(r"C:\Users\SANATH\Downloads\car+data.csv")
df.head()

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,swift,2014,4.6,6.87,42450,Diesel,Dealer,Manual,0


### 3. Check the shape and basic information of the dataset.

In [3]:
df.shape

(301, 9)

- Number of Rows = 301
- Number of Columns = 9

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


- The DataFrame has 4 categorical (object) columns and 5 numerical columns.
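
The same count can be checked in code instead of being read off the `df.info()` output. A minimal sketch, assuming a hypothetical three-row frame `sample_df` that copies the first rows shown earlier (it stands in for `df` and is not the full dataset):

```python
import pandas as pd

# Hypothetical three-row sample with the same columns and dtypes as the dataset
sample_df = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ciaz"],
    "Year": [2014, 2013, 2017],
    "Selling_Price": [3.35, 4.75, 7.25],
    "Present_Price": [5.59, 9.54, 9.85],
    "Kms_Driven": [27000, 43000, 6900],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol"],
    "Seller_Type": ["Dealer", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Manual", "Manual"],
    "Owner": [0, 0, 0],
})

# Split column names by dtype: everything non-numeric is treated as categorical
categorical_cols = sample_df.select_dtypes(exclude="number").columns.tolist()
numerical_cols = sample_df.select_dtypes(include="number").columns.tolist()

print("Categorical:", categorical_cols)
print("Numerical:", numerical_cols)
```

Running the same two `select_dtypes` calls on `df` should list the columns that need encoding later.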

In [5]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Year,301.0,2013.627907,2.891554,2003.0,2012.0,2014.0,2016.0,2018.0
Selling_Price,301.0,4.661296,5.082812,0.1,0.9,3.6,6.0,35.0
Present_Price,301.0,7.628472,8.644115,0.32,1.2,6.4,9.9,92.6
Kms_Driven,301.0,36947.20598,38886.883882,500.0,15000.0,32000.0,48767.0,500000.0
Owner,301.0,0.043189,0.247915,0.0,0.0,0.0,0.0,3.0


### 4. Check for duplicate records in the dataset and drop them if present

In [6]:
df[df.duplicated()]

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
17,ertiga,2016,7.75,10.79,43000,Diesel,Dealer,Manual,0
93,fortuner,2015,23.0,30.61,40000,Diesel,Dealer,Automatic,0


In [7]:
df.drop_duplicates(inplace=True)

In [8]:
df[df.duplicated()]

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner


### 5. Drop the columns that are redundant for the analysis

In [9]:
df.drop('Car_Name',axis=1,inplace=True)
df.head()

Unnamed: 0,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,2014,4.6,6.87,42450,Diesel,Dealer,Manual,0


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [10]:
current_year = 2023
df['age_of_the_car'] = current_year - df['Year']

In [11]:
df.drop('Year',axis=1,inplace=True)

In [12]:
df.head()

Unnamed: 0,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,age_of_the_car
0,3.35,5.59,27000,Petrol,Dealer,Manual,0,9
1,4.75,9.54,43000,Diesel,Dealer,Manual,0,10
2,7.25,9.85,6900,Petrol,Dealer,Manual,0,6
3,2.85,4.15,5200,Petrol,Dealer,Manual,0,12
4,4.6,6.87,42450,Diesel,Dealer,Manual,0,9


### 7. Encode the categorical columns

In [13]:
label_encoder = LabelEncoder()

categorical_columns = ['Fuel_Type', 'Seller_Type', 'Transmission']

for column in categorical_columns:
    df[column] = label_encoder.fit_transform(df[column])

In [14]:
df.head()

Unnamed: 0,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,age_of_the_car
0,3.35,5.59,27000,2,0,1,0,9
1,4.75,9.54,43000,1,0,1,0,10
2,7.25,9.85,6900,2,0,1,0,6
3,2.85,4.15,5200,2,0,1,0,12
4,4.6,6.87,42450,1,0,1,0,9
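
`LabelEncoder` assigns integer codes in sorted (alphabetical) order of the labels, which is why Petrol appears as 2 and Manual as 1 in the output above. Any frontend that sends encoded values has to use the same mapping, so it helps to print it. A minimal sketch, assuming a hypothetical frame `raw` that contains every category once; the `encoders` and `mappings` dictionaries are illustrative names, not part of the original notebook:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame covering every category of the three encoded columns
raw = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG"],
    "Seller_Type": ["Dealer", "Individual", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual"],
})

encoders = {}   # one fitted encoder per column, so each can be reused later
mappings = {}   # label -> integer code for each column

for column in raw.columns:
    encoder = LabelEncoder()
    encoder.fit(raw[column])
    encoders[column] = encoder
    # classes_ is sorted; a label's position in it is its integer code
    mappings[column] = {label: int(code) for code, label in enumerate(encoder.classes_)}

for column, mapping in mappings.items():
    print(column, mapping)
```

Keeping one encoder per column (rather than refitting a single shared encoder) also makes it possible to call `inverse_transform` later or to pickle the encoders alongside the model.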


### 8. Separate the target and independent features.

In [16]:
X = df.drop('Selling_Price', axis=1)
Y = df['Selling_Price']

In [20]:
X.head()

Unnamed: 0,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,age_of_the_car
0,5.59,27000,2,0,1,0,9
1,9.54,43000,1,0,1,0,10
2,9.85,6900,2,0,1,0,6
3,4.15,5200,2,0,1,0,12
4,6.87,42450,1,0,1,0,9


### 9. Split the data into train and test.

In [17]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
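
After the two duplicates are dropped, 299 rows remain, so `test_size=0.2` leaves 239 rows for training and 60 for testing. A quick shape check makes this visible. The sketch below uses placeholder arrays of the same size (`X_demo` and `y_demo` are stand-ins, not the car data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as X and Y after deduplication
X_demo = np.arange(299 * 7).reshape(299, 7)
y_demo = np.arange(299)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)
```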

### 10. Build a Random Forest Regressor model and check the R2 score for train and test

In [18]:
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train,Y_train)

RandomForestRegressor(random_state=42)

In [19]:
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)

train_r2 = r2_score(Y_train, train_predictions)
test_r2 = r2_score(Y_test, test_predictions)

print(f"Train R2 Score: {train_r2}")
print(f"Test R2 Score: {test_r2}")

Train R2 Score: 0.9856305226349797
Test R2 Score: 0.5777259891991424
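
The train score is much higher than the test score, which usually points to overfitting, and with only about 60 test rows a single split can also be noisy. One way to get a steadier estimate is k-fold cross-validation. A minimal sketch, using synthetic data from `make_regression` as a stand-in for `X` and `Y` (an assumption, not the car dataset), so the printed numbers say nothing about the real model:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with 7 features, like the car feature matrix
X_demo, y_demo = make_regression(
    n_samples=120, n_features=7, noise=10.0, random_state=42
)

cv_model = RandomForestRegressor(n_estimators=50, random_state=42)

# Fit and score the model on 5 different train/validation splits
cv_scores = cross_val_score(cv_model, X_demo, y_demo, cv=5, scoring="r2")

print("R2 per fold:", cv_scores)
print("Mean R2:", cv_scores.mean())
```

On the real data, replacing `X_demo` and `y_demo` with `X` and `Y` would give five R2 values whose spread shows how much the score depends on the split.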


### 11. Save the model as a pickle file with the .pkl extension

In [None]:
with open("model.pkl", "wb") as model_file:
    pickle.dump(model, model_file)
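
Before copying the file into the web project, it can be worth confirming that the model survives a save/load round trip. The sketch below uses a small hypothetical model trained on random numbers (`small_model`, `X_small`, and `y_small` are placeholders) and pickles it in memory with `pickle.dumps`, which is the in-memory counterpart of `pickle.dump`:

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder training data: 30 rows, 7 features
rng = np.random.default_rng(0)
X_small = rng.random((30, 7))
y_small = X_small.sum(axis=1)

small_model = RandomForestRegressor(n_estimators=10, random_state=42)
small_model.fit(X_small, y_small)

# Serialize to bytes and restore, as writing and reading model.pkl would
payload = pickle.dumps(small_model)
restored = pickle.loads(payload)

# The restored model should reproduce the original predictions
same_predictions = np.allclose(
    small_model.predict(X_small), restored.predict(X_small)
)
print(same_predictions)
```

A pickled scikit-learn model is best loaded with the same scikit-learn version that created it, so the deployment environment should pin that version.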


### 12. Create a new folder/project in Visual Studio or PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and have installed the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

In [None]:
<!DOCTYPE html>
<html>
<head>
    <title>Car Price Predictor</title>
</head>
<body>
    <h1>Car Price Predictor</h1>
    <form action="/predict" method="POST">
        <label for="present_price">Present Price (lakhs):</label>
        <input type="number" step="any" id="present_price" name="Present_Price" required><br><br>

        <label for="age_of_the_car">Age of the Car (years):</label>
        <input type="number" id="age_of_the_car" name="age_of_the_car" required><br><br>

        <!-- Option values follow LabelEncoder's alphabetical order: CNG=0, Diesel=1, Petrol=2 -->
        <label for="fuel_type">Fuel Type:</label>
        <select id="fuel_type" name="Fuel_Type" required>
            <option value="2">Petrol</option>
            <option value="1">Diesel</option>
            <option value="0">CNG</option>
        </select><br><br>

        <label for="kms_driven">Kilometers Driven:</label>
        <input type="number" id="kms_driven" name="Kms_Driven" required><br><br>

        <!-- Dealer=0, Individual=1 -->
        <label for="seller_type">Seller Type:</label>
        <select id="seller_type" name="Seller_Type" required>
            <option value="0">Dealer</option>
            <option value="1">Individual</option>
        </select><br><br>

        <!-- Automatic=0, Manual=1 -->
        <label for="transmission">Transmission:</label>
        <select id="transmission" name="Transmission" required>
            <option value="1">Manual</option>
            <option value="0">Automatic</option>
        </select><br><br>

        <label for="owner">Owner:</label>
        <input type="number" id="owner" name="Owner" required><br><br>

        <button type="submit">Predict Selling Price</button>
    </form>

    <!-- Filled in by the /predict route; empty on first load -->
    <p>{{ prediction_text }}</p>
</body>
</html>


### b) Create app.py file and write the predict function

In [None]:
import pickle

from flask import Flask, request, render_template

app = Flask(__name__)

# Load the trained model saved as model.pkl
with open('model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)


@app.route('/', methods=['GET'])
def home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    # Form values arrive as strings, so cast each one to a number
    Present_Price = float(request.form['Present_Price'])
    Kms_Driven = int(request.form['Kms_Driven'])
    Fuel_Type = int(request.form['Fuel_Type'])
    Seller_Type = int(request.form['Seller_Type'])
    Transmission = int(request.form['Transmission'])
    Owner = int(request.form['Owner'])
    age_of_the_car = int(request.form['age_of_the_car'])

    # Features must be in the same column order as X during training
    prediction = model.predict([[Present_Price, Kms_Driven, Fuel_Type, Seller_Type,
                                 Transmission, Owner, age_of_the_car]])
    output = round(prediction[0], 2)
    return render_template('index.html', prediction_text="You can sell your car at {} lakhs".format(output))


if __name__ == "__main__":
    app.run(debug=True)
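
The model expects each input row in the same column order as `X` at training time (Present_Price, Kms_Driven, Fuel_Type, Seller_Type, Transmission, Owner, age_of_the_car), and form fields always arrive as strings. A small helper can make both requirements explicit and easy to test without starting the server. This is a sketch; `FEATURE_ORDER` and `build_feature_row` are hypothetical names, not part of Flask or the original app:

```python
# Column order of X when the model was trained
FEATURE_ORDER = [
    "Present_Price", "Kms_Driven", "Fuel_Type", "Seller_Type",
    "Transmission", "Owner", "age_of_the_car",
]


def build_feature_row(form):
    """Convert a mapping of form strings into a numeric row in training order."""
    row = []
    for name in FEATURE_ORDER:
        value = form[name]
        # Present_Price is a decimal amount in lakhs; the rest are integers
        row.append(float(value) if name == "Present_Price" else int(value))
    return row


# Example: the first row of the dataset, as the browser would submit it
example_form = {
    "Present_Price": "5.59", "age_of_the_car": "9", "Fuel_Type": "2",
    "Kms_Driven": "27000", "Seller_Type": "0", "Transmission": "1", "Owner": "0",
}
print(build_feature_row(example_form))
```

Inside the route, `model.predict([build_feature_row(request.form)])` would then replace the hand-written list of variables.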



### 13. Run the app.py file, which renders the index.html page; then enter the input values to get the prediction.

### Happy Learning :)