# Final Validation of the selected model

This notebook performs the final validation and visualization of the selected model.

### Importing the standard libraries

In [17]:
import numpy as np
import pandas as pd

### Importing the dataset

In [18]:
dataset = pd.read_csv('../assets/car-details-for-ml-fuel-types.csv')

dataset.head()

Unnamed: 0,km_driven,seats,mileage,engine,max_power,nm,selling_price,fuel
0,145500,5.0,23.4,1248,74.0,190.0,450000,0
1,120000,5.0,21.14,1498,103.52,250.0,370000,0
2,140000,5.0,17.7,1497,78.0,124.54,158000,1
3,127000,5.0,23.0,1396,90.0,219.67,225000,0
4,120000,5.0,16.1,1298,88.2,112.78,130000,1


### Splitting the dataset into the Training and Test set

In [19]:
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [20]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=73)

### Feature scaling the data

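The artificial neural network expects feature-scaled (standardized) input, so the scaler is fitted on the training set and then applied unchanged to the test set.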

In [21]:
from sklearn.preprocessing import StandardScaler

std_scaler = StandardScaler()

X_train_scaled = std_scaler.fit_transform(X_train)
X_test_scaled = std_scaler.transform(X_test)

# Print the first scaled row of each set (all 7 feature columns)
print(X_train_scaled[:1])
print(X_test_scaled[:1])

[[-0.23 -0.45  0.59 -0.9  -0.7  -0.98 -0.64]]
[[-1.02 -0.45 -0.38  1.05  3.04  2.62  5.11]]
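
As a rough illustration of what the scaling step above computes, the sketch below standardizes a single column by hand. The helper names `fit_standardizer` and `standardize` and the toy values are assumptions made for this example, not part of the notebook; `StandardScaler` applies the same formula to every column, using the population standard deviation.

```python
from statistics import fmean, pstdev

def fit_standardizer(column):
    """Return the (mean, std) learned from one training column."""
    # Population standard deviation (ddof=0), matching StandardScaler.
    return fmean(column), pstdev(column)

def standardize(column, mean, std):
    """Apply z = (x - mean) / std with statistics learned elsewhere."""
    return [(x - mean) / std for x in column]

# Toy values for illustration only (not taken from the dataset).
train_km = [100000, 120000, 140000]
test_km = [160000]

mean, std = fit_standardizer(train_km)          # learned from training data only
train_scaled = standardize(train_km, mean, std)
test_scaled = standardize(test_km, mean, std)   # reuses the training statistics

print([round(v, 2) for v in train_scaled])
print([round(v, 2) for v in test_scaled])
```

Reusing the training mean and standard deviation for the test set is why a test value can fall well outside the range seen in the scaled training data.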


## Artificial Neural Network Classification

Let's load the model saved by the model selection notebook, since it is the one I will use for the final validation.

In [22]:
# Use the public Keras entry point rather than TensorFlow's private module path
from tensorflow import keras

ann_class = keras.models.load_model('../assets/ann_class.h5')

Let's see if the model is working as expected.

In [23]:
y_pred = ann_class.predict(X_test_scaled)
y_pred = (y_pred > 0.5)

np.set_printoptions(precision=2)
actual_vs_pred = np.concatenate((y_test.reshape(len(y_test), 1), y_pred.reshape(len(y_pred), 1)), 1)

print(["Actual", "Predictions"])
print(actual_vs_pred[4:12])

['Actual', 'Predictions']
[[0 0]
 [0 0]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [0 0]]


In [24]:
from sklearn.metrics import confusion_matrix, accuracy_score

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Accuracy: ", accuracy)
print("Confusion Matrix: \n", cm)

Accuracy:  1.0
Confusion Matrix: 
 [[866   0]
 [  0 668]]


Perfect. The model still performs as it should: all 1,534 test samples are classified correctly. Let's see if it also performs well on real-world data.
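
The accuracy figure follows directly from the confusion matrix. The sketch below recomputes it by hand from the printed counts, assuming scikit-learn's layout for binary labels (rows are actual classes, columns are predicted classes, class 0 first). The variable names are chosen for this example only.

```python
# Counts copied from the confusion matrix printed above.
cm = [[866, 0],
      [0, 668]]

# Binary layout in scikit-learn: [[TN, FP], [FN, TP]].
(tn, fp), (fn, tp) = cm

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Share of each actual class that was predicted correctly.
class_0_recall = tn / (tn + fp)
class_1_recall = tp / (tp + fn)

print(total, accuracy, class_0_recall, class_1_recall)
```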

## Trying real world data

Even though the model was trained on real-world data, it's still a good idea to test it on examples you collect yourself. Let's gather some listings that are newer than the dataset (which goes up to 2020) and see if the model holds up.

In [25]:
dataset.head(1)

Unnamed: 0,km_driven,seats,mileage,engine,max_power,nm,selling_price,fuel
0,145500,5.0,23.4,1248,74.0,190.0,450000,0


In [26]:
predictions = []

def append(car, actual, predicted):
    predictions.append([actual, "Petrol" if predicted > 0.5 else "Diesel", car])
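
The `append` helper above implies the label encoding used throughout this notebook: an output above 0.5 means Petrol (class 1), anything else means Diesel (class 0). The sketch below isolates that rule. `label_from_probability` is a hypothetical name, and the example assumes the network ends in a single sigmoid-style output, which the 0.5 threshold suggests but the notebook does not state.

```python
def label_from_probability(probability, threshold=0.5):
    """Map a model output in [0, 1] to a fuel label."""
    # Strictly greater than the threshold counts as Petrol (class 1);
    # everything else, including exactly 0.5, counts as Diesel (class 0).
    return "Petrol" if probability > threshold else "Diesel"

# Made-up outputs for illustration only.
for p in (0.02, 0.5, 0.97):
    print(p, label_from_probability(p))
```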

In [27]:
# Data from https://www.cardekho.com/used-car-details/used-BMW-7-Series-740Li-DPE-Signature-cars-New-Delhi_2DE7F9C50F867CD984ED8E8C3204DAEB.htm
car = "2021 BMW 7 Series"
actual = "Petrol"
result = ann_class.predict(std_scaler.transform([[11000, 5, 11.86, 2998, 335.2, 450, 12500000]]))
append(car, actual, result[0][0])

# Data from https://www.cardekho.com/buy-used-car-details/used-Maruti-Swift-Dzire-Amt-Zdi-cars-New-Delhi_4c983a6b-42aa-430a-88c9-7e0b5aeceba2.htm
car = "2016 Maruti Swift Dzire"
actual = "Diesel"
result = ann_class.predict(std_scaler.transform([[78192, 5, 26.59, 1248, 74, 190, 549500]]))
append(car, actual, result[0][0])

# Data from https://www.cardekho.com/used-car-details/used-Audi-Q5-2012-2017-2.0-TDI-Premium-Plus-cars-New-Delhi_8CF8185D79A93CEF20D03BDF858DBFBE.htm
car = "2013 Audi Q5"
actual = "Diesel"
result = ann_class.predict(std_scaler.transform([[60000, 5, 14.16, 1968, 174.3, 380, 1430000]]))
append(car, actual, result[0][0])

# Data from https://www.cardekho.com/buy-used-car-details/used-Hyundai-I20-Magna-1.2-cars-New-Delhi_9ed4c70a-3930-4c97-84d9-8c261c401c20.htm
car = "2015 Hyundai i20"
actual = "Petrol"
result = ann_class.predict(std_scaler.transform([[70296, 5, 18.6, 1197, 81.83, 114.7, 435000]]))
append(car, actual, result[0][0])

# Data from https://www.cardekho.com/buy-used-car-details/used-Mahindra-Xuv300-W6-Diesel-cars-New-Delhi_6ef1d552-3d68-4120-a61e-9fbb7570103c.htm
car = "2019 Mahindra XUV300"
actual = "Diesel"
result = ann_class.predict(std_scaler.transform([[71786, 5, 20, 1497, 115, 300, 765000]]))
append(car, actual, result[0][0])


In [28]:
print(["Actual", "Predictions", "Car"])
for row in predictions:
    print(row)

['Actual', 'Predictions', 'Car']
['Petrol', 'Petrol', '2021 BMW 7 Series']
['Diesel', 'Diesel', '2016 Maruti Swift Dzire']
['Diesel', 'Diesel', '2013 Audi Q5']
['Petrol', 'Petrol', '2015 Hyundai i20']
['Diesel', 'Diesel', '2019 Mahindra XUV300']


Five out of five correct predictions on the real-world data. The model is working as hoped and expected, which wraps things up nicely.
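
For a longer list of cars, counting the matches by eye becomes error-prone. A minimal sketch of tallying them, with the rows copied from the printed output above and `hits` as a name chosen for this example:

```python
# Rows copied from the printed output: [actual, predicted, car].
predictions = [
    ["Petrol", "Petrol", "2021 BMW 7 Series"],
    ["Diesel", "Diesel", "2016 Maruti Swift Dzire"],
    ["Diesel", "Diesel", "2013 Audi Q5"],
    ["Petrol", "Petrol", "2015 Hyundai i20"],
    ["Diesel", "Diesel", "2019 Mahindra XUV300"],
]

# Count the rows where the predicted label equals the actual label.
hits = sum(actual == predicted for actual, predicted, _ in predictions)

print(f"{hits} of {len(predictions)} correct")
```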

## Summary of the final model

After determining that an artificial neural network is the best model for this problem, I validated it on the test set and on some real-world data. As expected, it classified every example correctly, so I think it is a good model for this problem.

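The results of the model on the test set were:

Accuracy: 100%

Confusion Matrix (rows are actual classes, columns are predicted classes; 0 = Diesel, 1 = Petrol, with Petrol as the positive class):

[100%, 0%] - True Negative - False Positive

[0%, 100%] - False Negative - True Positive

In other words: the model classified all 1,534 test samples correctly.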