<a href="https://colab.research.google.com/github/myeze/MachineLearningModels/blob/main/CarEvaluationModel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Car Evaluation Neural Network Model

**This notebook contains a model created by Myles Ezeanii.**

In this notebook, I build a neural network model that predicts the comparative value of motor vehicles from a pre-existing dataset. The features included are:

---
* **Buying Price** (`buying`)
  * vhigh, high, med, low
* **Maintenance Price** (`maint`)
  * vhigh, high, med, low
* **Number of Doors** (`doors`)
  * 2, 3, 4, 5more (five or more)
* **Seating Capacity** (`persons`)
  * 2, 4, more
* **Luggage Boot Size** (`lug_boot`)
  * small, med, big
* **Car Safety (estimated)** (`safety`)
  * low, med, high
---
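Because every feature is categorical with a small, fixed set of levels, the size of the input space can be checked by hand. The sketch below is a minimal illustration, assuming the UCI column names and level spellings (including `5more` for the doors feature); the `LEVELS` dictionary is a hypothetical helper, not part of the notebook. It enumerates every combination of levels and counts the columns that one-hot encoding would produce.

```python
from itertools import product
from math import prod

# Hypothetical helper: levels of each categorical feature
# (spellings assumed to match the UCI data files).
LEVELS = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# Every possible car description is one combination of levels.
all_cars = list(product(*LEVELS.values()))
n_combinations = len(all_cars)

# One-hot encoding creates one binary column per level of each feature.
n_onehot_columns = sum(len(levels) for levels in LEVELS.values())

print(n_combinations)    # 1728
print(n_onehot_columns)  # 21
assert n_combinations == prod(len(v) for v in LEVELS.values())
```

The 1728 combinations match `num_instances` in the metadata printed by `car_evaluation.metadata`, which is consistent with the dataset covering each combination once; 21 is the width the network's first layer receives after one-hot encoding.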

The target field, `class`, represents the evaluation level of each vehicle.

It is a categorical value with four levels: `unacc` (unacceptable), `acc` (acceptable), `good`, and `vgood` (very good).
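A neural network needs these labels as integers. The sketch below shows how scikit-learn's `LabelEncoder`, which the notebook uses later, assigns them, assuming the raw label spellings `unacc`, `acc`, `good` and `vgood`: the unique labels are sorted alphabetically, so the integer order does not follow the quality order.

```python
from sklearn.preprocessing import LabelEncoder

# Raw target labels (spellings assumed to match the UCI data).
labels = ["unacc", "acc", "good", "vgood", "unacc"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# LabelEncoder sorts the unique labels, so "acc" becomes 0 and "vgood" becomes 3.
print(encoder.classes_.tolist())  # ['acc', 'good', 'unacc', 'vgood']
print(encoded.tolist())           # [2, 0, 1, 3, 2]

# inverse_transform maps integer predictions back to label strings.
print(encoder.inverse_transform([3, 0]).tolist())  # ['vgood', 'acc']
```

Because the order is alphabetical rather than ordinal, index 2 of a four-way softmax output corresponds to `unacc`, not to the third-best class.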

---

The overall goal is for the model to accurately determine how a car's worth compares with that of the other cars in the dataset.

---
Bohanec, M. (1988). Car Evaluation [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5JP48.

In [1]:
!pip install ucimlrepo
!pip install tensorflow

Collecting ucimlrepo
  Downloading ucimlrepo-0.0.7-py3-none-any.whl.metadata (5.5 kB)
Downloading ucimlrepo-0.0.7-py3-none-any.whl (8.0 kB)
Installing collected packages: ucimlrepo
Successfully installed ucimlrepo-0.0.7


In [2]:
from ucimlrepo import fetch_ucirepo
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split

from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder

# fetch dataset
car_evaluation = fetch_ucirepo(id=19)

# data (as pandas dataframes)
X = car_evaluation.data.features
y = car_evaluation.data.targets

# metadata
print(car_evaluation.metadata)

# variable information
print(car_evaluation.variables)
print(X)
print(y)

{'uci_id': 19, 'name': 'Car Evaluation', 'repository_url': 'https://archive.ics.uci.edu/dataset/19/car+evaluation', 'data_url': 'https://archive.ics.uci.edu/static/public/19/data.csv', 'abstract': 'Derived from simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods.', 'area': 'Other', 'tasks': ['Classification'], 'characteristics': ['Multivariate'], 'num_instances': 1728, 'num_features': 6, 'feature_types': ['Categorical'], 'demographics': [], 'target_col': ['class'], 'index_col': None, 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 1988, 'last_updated': 'Thu Aug 10 2023', 'dataset_doi': '10.24432/C5JP48', 'creators': ['Marko Bohanec'], 'intro_paper': {'ID': 249, 'type': 'NATIVE', 'title': 'Knowledge acquisition and explanation for multi-attribute decision making', 'authors': 'M. Bohanec, V. Rajkovič', 'venue': '8th Intl Workshop on Expert Systems and their Applications, 

In [3]:
# Split the data into training and testing sets
XTrained, XTested, yTrained, yTested = train_test_split(X, y, test_size=0.2, random_state=42)
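With 1728 rows and `test_size=0.2`, scikit-learn rounds the test set up to 346 rows and leaves 1382 for training. The classes are also imbalanced (the UCI listing reports 1210 `unacc`, 384 `acc`, 69 `good` and 65 `vgood` rows), so passing `stratify` would keep the rare classes represented in proportion in both splits. This is an optional variant, not what the notebook ran; the sketch uses placeholder labels with those counts instead of the real data.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Placeholder labels with the class counts reported for the UCI dataset.
counts = {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}
labels = [label for label, n in counts.items() for _ in range(n)]
rows = list(range(len(labels)))  # stand-in for the feature rows

# Same split settings as the notebook, plus stratify=labels.
rows_train, rows_test, y_train, y_test = train_test_split(
    rows, labels, test_size=0.2, random_state=42, stratify=labels
)

print(len(rows_train), len(rows_test))  # 1382 346
test_counts = Counter(y_test)
for label, n in counts.items():
    # Compare each class's share of the test set with its share overall.
    print(label, round(test_counts[label] / len(y_test), 3), round(n / len(labels), 3))
```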

In [4]:
# Create a standalone mean imputer. It is not used below: the preprocessing
# cells define their own imputers, and the metadata reports no missing values.
imputer = SimpleImputer(strategy='mean')

In [5]:
# Separate the categorical and numerical features
categoryfeatures = X.select_dtypes(include=['object']).columns.tolist()
numericalfeatures = X.select_dtypes(exclude=['object']).columns.tolist()
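All six features are stored as text, so `numericalfeatures` should come out empty and the mean imputer has no columns to act on; the metadata also reports `'has_missing_values': 'no'`. The sketch below uses a hypothetical two-column frame, built explicitly with `object` dtype on the assumption that the fetched string columns load that way, to show the split.

```python
import pandas as pd

# Hypothetical stand-in for X: two categorical columns stored as object dtype.
X_demo = pd.DataFrame(
    {"doors": ["2", "4", "5more"], "safety": ["low", "high", "med"]},
    dtype=object,
)

category_cols = X_demo.select_dtypes(include=["object"]).columns.tolist()
numeric_cols = X_demo.select_dtypes(exclude=["object"]).columns.tolist()

print(category_cols)  # ['doors', 'safety']
print(numeric_cols)   # []
```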

In [6]:
# Create transformers for numerical and categorical features
numericalTransformer = SimpleImputer(strategy='mean') # Use mean for numerical features
categoryTransformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')), # Use most frequent for categorical features
    ('onehot', OneHotEncoder(sparse_output=False, handle_unknown='ignore')) # One-hot encode categorical features
])

In [7]:
# Create a ColumnTransformer to apply transformers to the correct columns
preprocessor = ColumnTransformer(transformers=[
    ('num', numericalTransformer, numericalfeatures),
    ('cat', categoryTransformer, categoryfeatures)
])

In [8]:
# Fit the preprocessor on the training data and transform both training and testing data
XTrainedImputed = preprocessor.fit_transform(XTrained)
XTestedImputed = preprocessor.transform(XTested)

In [9]:
model = keras.Sequential([
    keras.Input(shape=(XTrainedImputed.shape[1],)),  # One input per one-hot column (21 for this dataset)
    keras.layers.Dense(128, activation='relu'),       # First hidden layer: 128 units, ReLU activation
    keras.layers.Dense(64, activation='relu'),        # Second hidden layer: 64 units, ReLU activation
    keras.layers.Dense(4, activation='softmax')       # Output layer: 4 units (one per class), softmax activation
])
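Each `Dense` layer stores one weight per input-unit pair plus one bias per unit, so the model size can be worked out by hand. A quick check, assuming 21 one-hot input columns (the sum of the feature levels listed at the top):

```python
# Parameters of a Dense layer: inputs * units weights + units biases.
def dense_params(n_inputs: int, n_units: int) -> int:
    return n_inputs * n_units + n_units

n_features = 21  # assumed one-hot width (4 + 4 + 4 + 3 + 3 + 3)
layers = [(n_features, 128), (128, 64), (64, 4)]

per_layer = [dense_params(i, u) for i, u in layers]
print(per_layer)       # [2816, 8256, 260]
print(sum(per_layer))  # 11332
```

If the input width really is 21, `model.summary()` should report the same total.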



In [10]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
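`sparse_categorical_crossentropy` takes integer class labels (the `LabelEncoder` output) rather than one-hot vectors; per sample, it is the negative log of the probability that the softmax assigns to the true class. The NumPy sketch below applies that formula to a hypothetical batch of two predictions. It is not Keras's own implementation.

```python
import numpy as np

# Hypothetical softmax outputs for two samples over the 4 classes.
probs = np.array([
    [0.1, 0.1, 0.7, 0.1],
    [0.5, 0.2, 0.2, 0.1],
])
# Integer labels, as produced by LabelEncoder (e.g. 2 = "unacc").
labels = np.array([2, 0])

# Pick each sample's probability for its true class, then average -log(p).
true_class_probs = probs[np.arange(len(labels)), labels]
loss = -np.log(true_class_probs).mean()
print(round(float(loss), 4))  # 0.5249
```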

In [11]:
# Create a LabelEncoder
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()

# Fit the encoder on the target variable and transform it
yTrainedEncoded = label_encoder.fit_transform(yTrained.values.ravel())

# Now use yTrainedEncoded in model.fit
model.fit(XTrainedImputed, yTrainedEncoded, epochs=10, batch_size=32, validation_split=0.1)

Epoch 1/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 4s 45ms/step - accuracy: 0.6439 - loss: 1.0310 - val_accuracy: 0.6475 - val_loss: 0.7412
Epoch 2/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - accuracy: 0.7827 - loss: 0.5228 - val_accuracy: 0.8058 - val_loss: 0.5209
Epoch 3/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.8728 - loss: 0.3696 - val_accuracy: 0.8705 - val_loss: 0.4056
Epoch 4/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - accuracy: 0.9035 - loss: 0.2574 - val_accuracy: 0.9137 - val_loss: 0.3159
Epoch 5/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.9297 - loss: 0.1979 - val_accuracy: 0.9281 - val_loss: 0.2633
Epoch 6/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9538 - loss: 0.1667 - val_accuracy: 0.9353 - val_loss: 0.2071
Epoch 7/10
39/39 ━━━━━━

<keras.src.callbacks.history.History at 0x7ead9a79d600>
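The `39/39` in the log follows from the sizes involved. Assume the training split has 1382 rows, which is what `test_size=0.2` leaves from 1728. `validation_split=0.1` then holds out roughly the last 10% of those rows, since Keras takes validation data from the end of the arrays before shuffling. The remaining rows are processed in batches of 32.

```python
import math

n_train_split = 1382   # rows left after train_test_split(test_size=0.2) on 1728 rows
validation_split = 0.1
batch_size = 32

# Keras keeps the first rows for training and holds out the tail for validation
# (rounding assumed to be floor here; ceil would give the same step count).
n_fit = math.floor(n_train_split * (1 - validation_split))
n_val = n_train_split - n_fit

steps_per_epoch = math.ceil(n_fit / batch_size)  # the last batch is partial
print(n_fit, n_val, steps_per_epoch)  # 1243 139 39
```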

In [12]:
# Reuse the label_encoder fitted on the training labels so test labels map to the same integers
yTestedEncoded = label_encoder.transform(yTested.values.ravel())
loss, accuracy = model.evaluate(XTestedImputed, yTestedEncoded, verbose=0)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

Test Loss: 0.13039006292819977
Test Accuracy: 0.9508670568466187
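The test accuracy can be read as a count of correct predictions. Assuming the 346-row test split produced by `test_size=0.2`, the reported value corresponds to 329 correctly classified cars. For context, a trivial model that always predicts the majority class `unacc` would score about 0.70 on the full dataset (1210 of 1728 rows, per the UCI class distribution). The exact class mix of this test split was not printed, so that baseline is only approximate.

```python
reported_accuracy = 0.9508670568466187  # value printed by model.evaluate
n_test = 346                              # ceil(0.2 * 1728)

n_correct = round(reported_accuracy * n_test)
print(n_correct, n_test - n_correct)  # 329 17

# Majority-class baseline on the full dataset (1210 "unacc" rows of 1728).
baseline = 1210 / 1728
print(round(baseline, 3))  # 0.7
```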
