# <font color=darkblue> Machine Learning model deployment with Flask framework</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as Fuel_Type, Kms_Driven and Transmission.
2. Deploy the model using the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com.
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price** (target): Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: First, second or third owner (stored as an integer)


### 1. Import required libraries

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
import pickle
```


### 2. Load the dataset

```python
# Load the dataset from a local copy of the CSV file
csv_path = "car+data.csv"
data = pd.read_csv(csv_path)
```


### 3. Check the shape and basic information of the dataset.

```python
# Check the shape
print(data.shape)

# Check the basic information (info() prints directly and returns None,
# so it is not wrapped in print())
data.info()

# Display the first few rows of the dataset
print(data.head())
```

```
(301, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB
  Car_Name  Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type  \
0     ritz  2014           3.35           5.59       27000    Petrol   
1      sx4  2013           4.75           9.54       43000    Diesel   
2     ciaz  2017           7.25           9.85        6900    Petrol   
3  wagon r  2011           2.85           4.15 
```

### 4. Check for duplicate records in the dataset and drop them if present

```python
# Check for duplicates
duplicates = data.duplicated().sum()
print(f"Number of duplicate records: {duplicates}")

# Drop duplicates if any
if duplicates > 0:
    data = data.drop_duplicates()
```

```
Number of duplicate records: 2
```


### 5. Drop the columns that are redundant for the analysis.

```python
# Drop the 'Car_Name' column as it is redundant for the analysis
data = data.drop(['Car_Name'], axis=1)
```


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop 'Year'

```python
# Extract 'age_of_the_car' from 'Year'
data['age_of_the_car'] = 2024 - data['Year']  # Assuming the current year is 2024
data = data.drop(['Year'], axis=1)
```
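Because the Flask app will have to compute the same age at prediction time, it can help to keep the reference year in one place. A minimal sketch, assuming a hypothetical helper `add_age_feature` and a hypothetical constant `REFERENCE_YEAR` (neither is part of the original notebook):

```python
import pandas as pd

# Hypothetical constant: the notebook hardcodes 2024, and the app must
# subtract from the same year or its ages will not match training.
REFERENCE_YEAR = 2024


def add_age_feature(df: pd.DataFrame, reference_year: int = REFERENCE_YEAR) -> pd.DataFrame:
    """Return a copy of df with 'age_of_the_car' added and 'Year' dropped."""
    out = df.copy()
    out["age_of_the_car"] = reference_year - out["Year"]
    return out.drop(columns=["Year"])


# Two rows taken from the head() output above (Year and Kms_Driven only)
sample = pd.DataFrame({"Year": [2014, 2017], "Kms_Driven": [27000, 6900]})
aged = add_age_feature(sample)
print(aged)
```

In the notebook, `data = add_age_feature(data)` would do the same work as the two lines above.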


### 7. Encode the categorical columns

```python
# Encode categorical columns
data = pd.get_dummies(data, drop_first=True)
```
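With `drop_first=True`, `pd.get_dummies` keeps the numeric columns as they are and replaces each text column with one indicator column per category except the first; categories are sorted, so CNG, Dealer and Automatic become the all-zero baseline. A small illustration on made-up rows that reuse this dataset's category labels:

```python
import pandas as pd

# Made-up rows that use the category labels from this dataset
toy = pd.DataFrame({
    "Present_Price": [5.59, 9.54, 4.15],
    "Fuel_Type": ["Petrol", "Diesel", "CNG"],
    "Seller_Type": ["Dealer", "Individual", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual"],
})

# Each text column becomes indicator columns; the first (alphabetical)
# category of each column is dropped and acts as the baseline.
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())

# A single row has only one category per column, so drop_first=True
# leaves no indicator columns at all for it.
single = pd.get_dummies(toy.iloc[[0]], drop_first=True)
print(single.columns.tolist())
```

The single-row case matters for deployment: a form submission encoded this way would not have the columns the model was trained on, so the app side needs to align its input to the training columns.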


### 8. Separate the target and independent features.

```python
# Separate the target and independent features
X = data.drop(['Selling_Price'], axis=1)
y = data['Selling_Price']
```


### 9. Split the data into train and test.

```python
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```


### 10. Build a Random Forest regressor and check the R² score on the train and test sets.

```python
# Build the Random Forest Regressor model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict on the train and test sets
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Calculate the r2-score for train and test sets
r2_train = r2_score(y_train, y_train_pred)
r2_test = r2_score(y_test, y_test_pred)

print(f"R2 Score for Train Set: {r2_train}")
print(f"R2 Score for Test Set: {r2_test}")
```

```
R2 Score for Train Set: 0.9840594603786668
R2 Score for Test Set: 0.547390441726516
```
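The gap between the train R² (about 0.98) and the test R² (about 0.55) suggests the forest is overfitting, and with only about 60 test rows a single split can also give a noisy estimate. One common way to get a steadier number is k-fold cross-validation, which the original notebook does not use. A sketch with a hypothetical helper `cv_r2`; synthetic data stands in for `X` and `y` so the snippet runs on its own:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cv_r2(X, y, n_splits=5, random_state=42):
    """Return per-fold R² scores for a Random Forest using shuffled K-fold CV."""
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return cross_val_score(model, X, y, cv=folds, scoring="r2")


# In the notebook this would be cv_r2(X, y); here a synthetic linear
# target with a little noise stands in for the car data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

scores = cv_r2(X_demo, y_demo)
print(f"R2 per fold: {np.round(scores, 3)}, mean: {scores.mean():.3f}")
```

The spread of the per-fold scores gives a rough sense of how much a single 80/20 split can move the test R².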


### 11. Save the model to a pickle file with the .pkl extension

```python
# Save the model as a pickle file
with open('model.pkl', 'wb') as file:
    pickle.dump(model, file)
```
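The app has to pass the model its columns in the same order as `X_train`. One option, sketched here with hypothetical helpers `save_artifacts` and `load_artifacts`, is to pickle the column list next to the model in a dict (scikit-learn 1.0+ also records the names in `model.feature_names_in_` when the model is fit on a DataFrame). Note that this changes the file format, so an app expecting a bare model in `model.pkl` would need to load it with `load_artifacts` instead:

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def save_artifacts(model, feature_columns, path="model.pkl"):
    """Pickle the fitted model together with the ordered training column names."""
    with open(path, "wb") as f:
        pickle.dump({"model": model, "columns": list(feature_columns)}, f)


def load_artifacts(path="model.pkl"):
    """Read a file written by save_artifacts and return (model, columns)."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    return bundle["model"], bundle["columns"]


# Demo on a tiny model built from four rows of the head() output above,
# written to a temporary folder so an existing model.pkl is not
# overwritten. In the notebook: save_artifacts(model, X_train.columns)
X_demo = pd.DataFrame({"Present_Price": [5.59, 9.54, 9.85, 4.15],
                       "age_of_the_car": [10, 11, 7, 13]})
y_demo = [3.35, 4.75, 7.25, 2.85]
demo_model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "model.pkl")
    save_artifacts(demo_model, X_demo.columns, demo_path)
    loaded_model, loaded_columns = load_artifacts(demo_path)

print(loaded_columns)
```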


### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in a folder named `templates` (the folder Flask searches for templates by default) and put the HTML input form in it.

### b) Create the app.py file and write the predict function
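A minimal sketch of `app.py`, assuming `model.pkl` holds the bare `RandomForestRegressor` from step 11; because it was fit on the DataFrame `X_train`, it exposes `feature_names_in_` (scikit-learn 1.0+). The form markup is inlined with `render_template_string` only to keep the example in one file; in the project layout above it belongs in `templates/index.html` and would be rendered with `render_template("index.html", ...)`. The names `FORM_HTML`, `build_feature_row`, `create_app`, `REFERENCE_YEAR` and the `/predict` route are hypothetical:

```python
import os
import pickle

import pandas as pd
from flask import Flask, render_template_string, request

REFERENCE_YEAR = 2024  # hypothetical: must equal the year used in step 6

# Stand-in for templates/index.html: one field per raw input feature.
FORM_HTML = """
<form action="/predict" method="post">
  Year <input name="Year" type="number" required>
  Present_Price (lakhs) <input name="Present_Price" type="number" step="0.01" required>
  Kms_Driven <input name="Kms_Driven" type="number" required>
  Owner <input name="Owner" type="number" required>
  Fuel_Type <select name="Fuel_Type"><option>Petrol</option><option>Diesel</option><option>CNG</option></select>
  Seller_Type <select name="Seller_Type"><option>Dealer</option><option>Individual</option></select>
  Transmission <select name="Transmission"><option>Manual</option><option>Automatic</option></select>
  <button type="submit">Predict</button>
</form>
<p>{{ prediction_text }}</p>
"""


def build_feature_row(form, columns):
    """Repeat the notebook's preprocessing for one submitted form."""
    raw = pd.DataFrame([{
        "Present_Price": float(form["Present_Price"]),
        "Kms_Driven": int(form["Kms_Driven"]),
        "Owner": int(form["Owner"]),
        "age_of_the_car": REFERENCE_YEAR - int(form["Year"]),
        "Fuel_Type": form["Fuel_Type"],
        "Seller_Type": form["Seller_Type"],
        "Transmission": form["Transmission"],
    }])
    # Encode without drop_first (one row has one category per column), then
    # keep exactly the training columns in training order. Missing dummies
    # become 0, and a baseline category such as CNG is dropped, which leaves
    # its indicator columns at 0 just as in training.
    encoded = pd.get_dummies(raw)
    return encoded.reindex(columns=columns, fill_value=0).astype(float)


def create_app(model):
    app = Flask(__name__)
    columns = list(model.feature_names_in_)  # recorded by scikit-learn at fit time

    @app.route("/")
    def home():
        return render_template_string(FORM_HTML, prediction_text="")

    @app.route("/predict", methods=["POST"])
    def predict():
        row = build_feature_row(request.form, columns)
        price = float(model.predict(row)[0])
        return render_template_string(
            FORM_HTML, prediction_text=f"Estimated selling price: {price:.2f} lakhs")

    return app


if __name__ == "__main__":
    if os.path.exists("model.pkl"):
        with open("model.pkl", "rb") as f:
            model = pickle.load(f)
        create_app(model).run(debug=True)
    else:
        print("model.pkl not found: copy it into the same folder as app.py")
```

Running `python app.py` then serves the form at Flask's default address, http://127.0.0.1:5000/.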

### 13. Run app.py, which renders the index.html page, then enter the input values to get the prediction.

### Happy Learning :)