<a href="https://colab.research.google.com/github/myeze/MachineLearningModels/blob/main/CarEvaluationModel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Car Evaluation Neural Network Model

**This notebook contains a model created by Myles Ezeanii.**

In this notebook, I build a neural network model that predicts the comparative value of motor vehicles using a preexisting dataset. The features included are:

---
* **Buying Price**
  * vhigh, high, med, low
* **Maintenance Price**
  * vhigh, high, med, low
* **Number of Doors**
  * 2, 3, 4, 5more
* **Seating Capacity**
  * 2, 4, more
* **Luggage Boot Size**
  * small, med, big
* **Car Safety (estimated)**
  * low, med, high
---

The "goal" field (the `class` column in the dataset) represents the evaluation level of the vehicle.

This is represented as a categorical value with four levels: unacceptable (`unacc`), acceptable (`acc`), good (`good`), and very good (`vgood`).

---

The data was preprocessed with imputation for missing values and one-hot encoding for categorical features.
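
As a minimal illustration of what one-hot encoding does to a single feature, the sketch below turns each category into its own 0/1 column. It is plain Python, not the scikit-learn pipeline used later in this notebook, and the helper `one_hot` is a hypothetical name introduced only for this example. The column order here follows the feature list above, whereas scikit-learn orders categories alphabetically.

```python
# Categories of the "safety" feature, as listed above.
SAFETY_LEVELS = ["low", "med", "high"]

def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`.

    Hypothetical helper for illustration. An unknown value maps to all
    zeros, similar in spirit to OneHotEncoder(handle_unknown='ignore').
    """
    return [1 if value == c else 0 for c in categories]

encoded = [one_hot(v, SAFETY_LEVELS) for v in ["low", "high", "med"]]
print(encoded)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```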

The model has two hidden layers, each using a ReLU activation function, and an output layer that uses a softmax function.

I felt that a neural network worked best here for three reasons: the categorical data becomes usable through one-hot encoding; the network can model feature interactions, capturing how combinations of features affect a car's evaluation; and it can learn complex relationships between those features and the evaluation.

The overall goal is for the model to accurately determine a car's worth relative to the other cars it has seen.

---
Bohanec, M. (1988). Car Evaluation [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5JP48.

In [None]:
!pip install ucimlrepo
!pip install tensorflow



In [None]:
from ucimlrepo import fetch_ucirepo

# fetch dataset
car_evaluation = fetch_ucirepo(id=19)

# data (as pandas dataframes)
X = car_evaluation.data.features
y = car_evaluation.data.targets

# metadata
#print(car_evaluation.metadata)

# variable information
#print(car_evaluation.variables)
print(X)
print(y)

     buying  maint  doors persons lug_boot safety
0     vhigh  vhigh      2       2    small    low
1     vhigh  vhigh      2       2    small    med
2     vhigh  vhigh      2       2    small   high
3     vhigh  vhigh      2       2      med    low
4     vhigh  vhigh      2       2      med    med
...     ...    ...    ...     ...      ...    ...
1723    low    low  5more    more      med    med
1724    low    low  5more    more      med   high
1725    low    low  5more    more      big    low
1726    low    low  5more    more      big    med
1727    low    low  5more    more      big   high

[1728 rows x 6 columns]
      class
0     unacc
1     unacc
2     unacc
3     unacc
4     unacc
...     ...
1723   good
1724  vgood
1725  unacc
1726   good
1727  vgood

[1728 rows x 1 columns]


In [None]:
# Split the data into training and testing sets
from sklearn.model_selection import train_test_split
XTrained, XTested, yTrained, yTested = train_test_split(X, y, test_size=0.2, random_state=42)
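
The split sizes that follow from `test_size=0.2`, together with the `validation_split=0.1` and `batch_size=32` passed to `model.fit` later on, can be worked out by hand. The sketch below assumes scikit-learn rounds the test set up and Keras rounds the fitted portion down, which agrees with the `39/39` step count in the training log. The variable names are illustrative.

```python
import math

n_samples = 1728          # rows reported by the printed DataFrame
test_size = 0.2           # from train_test_split
validation_split = 0.1    # from model.fit
batch_size = 32           # from model.fit

# Assumed rounding: the test set is rounded up, training gets the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Assumed rounding: Keras keeps the floor of the non-validation share for fitting.
n_fit = math.floor(n_train * (1.0 - validation_split))
n_val = n_train - n_fit

steps_per_epoch = math.ceil(n_fit / batch_size)
print(n_test, n_train, n_fit, n_val, steps_per_epoch)  # 346 1382 1243 139 39
```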

In [None]:
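# Create a standalone imputer that replaces missing values with the mean.
# Note: this object is not used again; the transformers defined below create their own imputers.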
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')

In [None]:
# Separate the categorical and numerical features
categoryfeatures = X.select_dtypes(include=['object']).columns.tolist()
numericalfeatures = X.select_dtypes(exclude=['object']).columns.tolist()

In [None]:
# Create transformers for numerical and categorical features
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

numericalTransformer = SimpleImputer(strategy='mean') # Use mean for numerical features
categoryTransformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')), # Use most frequent for categorical features
    ('onehot', OneHotEncoder(sparse_output=False, handle_unknown='ignore')) # One-hot encode categorical features
])

In [None]:
# Create a ColumnTransformer to apply transformers to the correct columns
from sklearn.compose import ColumnTransformer

preprocessor = ColumnTransformer(transformers=[
    ('num', numericalTransformer, numericalfeatures),
    ('cat', categoryTransformer, categoryfeatures)
])

In [None]:
# Fit the preprocessor on the training data and transform both training and testing data
XTrainedImputed = preprocessor.fit_transform(XTrained)
XTestedImputed = preprocessor.transform(XTested)

In [None]:
from tensorflow import keras
model = keras.Sequential([
    keras.layers.Input(shape=(XTrainedImputed.shape[1],)),
    keras.layers.Dense(128, activation='relu'), # First hidden layer with 128 units and ReLU activation
    keras.layers.Dense(64, activation='relu'), # Second hidden layer with 64 units and ReLU activation
    keras.layers.Dense(4, activation='softmax') # Output layer with 4 units (for 4 classes) and softmax activation
])
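
To get a feel for the size of this network, the trainable parameter count can be estimated by hand. The sketch below assumes all six features are one-hot encoded and that every category listed at the top of the notebook appears in the training split, which gives 4 + 4 + 4 + 3 + 3 + 3 = 21 input columns. The helper `dense_params` is a hypothetical name.

```python
# One-hot widths per feature (assumption: every listed category occurs in training):
# buying, maint, doors, persons, lug_boot, safety.
category_counts = [4, 4, 4, 3, 3, 3]
n_inputs = sum(category_counts)

layer_units = [128, 64, 4]  # Dense layer widths from the model above

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

total = 0
n_in = n_inputs
for n_out in layer_units:
    total += dense_params(n_in, n_out)
    n_in = n_out

print(n_inputs, total)  # 21 11332
```

If the assumptions hold, `model.summary()` should report the same total.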

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
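
The `sparse_categorical_crossentropy` loss takes integer class labels and, for each sample, is the negative log of the probability the softmax output assigns to the true class. The following is a pure-Python sketch of that idea for a single sample, not the Keras implementation. The logits are hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability of the true class, given an integer label."""
    return -math.log(probs[label])

probs = softmax([2.0, 0.5, 0.1, -1.0])  # hypothetical logits for 4 classes
loss = sparse_categorical_crossentropy(probs, label=0)
print(probs, loss)
```

A confident correct prediction pushes the loss toward 0, while a uniform guess over four classes gives a loss of ln(4), roughly 1.386.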

In [None]:
# Create a LabelEncoder
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()

# Fit the encoder on the target variable and transform it
yTrainedEncoded = label_encoder.fit_transform(yTrained.values.ravel())

# Now use yTrainedEncoded in model.fit
model.fit(XTrainedImputed, yTrainedEncoded, epochs=10, batch_size=32, validation_split=0.1)

Epoch 1/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 5s 17ms/step - accuracy: 0.6070 - loss: 1.0655 - val_accuracy: 0.6259 - val_loss: 0.7569
Epoch 2/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.7543 - loss: 0.5429 - val_accuracy: 0.8201 - val_loss: 0.4837
Epoch 3/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8715 - loss: 0.3553 - val_accuracy: 0.8561 - val_loss: 0.3520
Epoch 4/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9072 - loss: 0.2487 - val_accuracy: 0.8993 - val_loss: 0.2880
Epoch 5/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9220 - loss: 0.2016 - val_accuracy: 0.8921 - val_loss: 0.2461
Epoch 6/10
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9566 - loss: 0.1672 - val_accuracy: 0.9137 - val_loss: 0.2010
Epoch 7/10
39/39 ━━━━━━━━━

<keras.src.callbacks.history.History at 0x791f0a495330>

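The integer labels passed to `model.fit` come from `LabelEncoder`, which sorts the class names before numbering them. The sketch below mimics that mapping in plain Python and shows how a row of softmax probabilities could be turned back into a class name. The `decode` helper and the probability row are hypothetical. In the notebook itself, `label_encoder.inverse_transform` applied to the argmax of the model's predictions would play the same role.

```python
# LabelEncoder numbers classes in sorted order; mimic that here.
classes = sorted({"unacc", "acc", "good", "vgood"})
class_to_index = {name: i for i, name in enumerate(classes)}

def decode(probabilities):
    """Return the class name with the highest predicted probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return classes[best]

print(class_to_index)  # {'acc': 0, 'good': 1, 'unacc': 2, 'vgood': 3}
print(decode([0.05, 0.10, 0.80, 0.05]))  # unacc
```
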
In [None]:
# Assuming label_encoder is already defined from previous code
yTestedEncoded = label_encoder.transform(yTested.values.ravel())
loss, accuracy = model.evaluate(XTestedImputed, yTestedEncoded, verbose=0)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

Test Loss: 0.12043614685535431
Test Accuracy: 0.9566473960876465
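
A single accuracy figure can hide how the model does on the less common classes, so a per-class view is a useful follow-up. The sketch below computes accuracy and per-class recall in plain Python. The labels are made up for illustration and are not results from this model, and the `recall` helper is a hypothetical name.

```python
from collections import Counter

# Hypothetical true and predicted labels, for illustration only.
y_true = ["unacc", "unacc", "unacc", "acc", "acc", "good", "vgood", "unacc"]
y_pred = ["unacc", "unacc", "unacc", "acc", "unacc", "acc", "vgood", "unacc"]

# Count each (true, predicted) pair; this is a sparse confusion matrix.
pair_counts = Counter(zip(y_true, y_pred))
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(label):
    """Fraction of samples with true class `label` that were predicted correctly."""
    total = sum(1 for t in y_true if t == label)
    return pair_counts[(label, label)] / total

per_class = {c: recall(c) for c in ["unacc", "acc", "good", "vgood"]}
print(accuracy)   # 0.75
print(per_class)  # {'unacc': 1.0, 'acc': 0.5, 'good': 0.0, 'vgood': 1.0}
```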
