# Experimenting with three feature extraction techniques for traditional ML

- Histogram of Oriented Gradients (HOG): HOG features describe the distribution of local gradient orientations in an image, capturing shape and edge information. The image is divided into small cells, and a histogram of gradient orientations is computed within each cell.
- Local Binary Patterns (LBP): LBP is a texture descriptor that encodes the local patterns of pixel intensities in an image. It compares each pixel with its surrounding neighborhood to generate binary patterns.
- Normalized raw pixel values: The pixel intensities of the image are rescaled to a common range (here [0, 1]) and used directly as features. Rescaling puts every feature on the same numeric range; it does not otherwise change the image content.
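
To make the LBP idea concrete, here is a minimal sketch of the basic 8-neighbour code for a single pixel. The 3x3 patch, the clockwise neighbour order, and the helper name `lbp_code_3x3` are illustrative assumptions; library implementations such as scikit-image sample neighbours on a circle and may order the bits differently.

```python
def lbp_code_3x3(patch):
    """Return the basic 8-neighbour LBP code for the centre of a 3x3 patch.

    Hypothetical helper for illustration: neighbours are visited clockwise
    from the top-left corner, and a neighbour contributes a 1 bit when its
    intensity is greater than or equal to the centre intensity.
    """
    center = patch[1][1]
    # Clockwise neighbour coordinates, starting at the top-left corner
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


# Made-up intensities; the centre pixel has value 6
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code_3x3(patch)
print(f"{code:08b}", code)  # 11110001 241
```

Each pixel of an image gets one such code, so the result is a map of texture codes with the same shape as the image.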


In [2]:
import os
import pandas as pd
import cv2 
import matplotlib.pyplot as plt
import warnings
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
import seaborn as sns


# Define the path to the dataset directory
dataset_dir = './PlantVillage'

# Create lists to store image paths and corresponding labels
image_paths = []
labels = []

# Iterate through each folder in the dataset directory
for folder_name in os.listdir(dataset_dir):
    folder_path = os.path.join(dataset_dir, folder_name)
    # Skip stray files at the top level; only class folders contain images
    if not os.path.isdir(folder_path):
        continue
    # Iterate through each file in the folder
    for file_name in os.listdir(folder_path):
        # Append the image path to the list
        image_path = os.path.join(folder_path, file_name)
        image_paths.append(image_path)
        # Append the label (folder name) to the labels list
        labels.append(folder_name)

# Create a DataFrame to store the image paths and labels
dataset = pd.DataFrame({'Image_Path': image_paths, 'Label': labels})


# Display the first few rows of the dataset
dataset.head()


|   | Image_Path | Label |
|---|---|---|
| 0 | ./PlantVillage\Tomato_Bacterial_spot\00416648-... | Tomato_Bacterial_spot |
| 1 | ./PlantVillage\Tomato_Bacterial_spot\0045ba29-... | Tomato_Bacterial_spot |
| 2 | ./PlantVillage\Tomato_Bacterial_spot\00639d29-... | Tomato_Bacterial_spot |
| 3 | ./PlantVillage\Tomato_Bacterial_spot\00728f4d-... | Tomato_Bacterial_spot |
| 4 | ./PlantVillage\Tomato_Bacterial_spot\00a7c269-... | Tomato_Bacterial_spot |


In [3]:
# Display the counts of images in each label
label_counts = dataset['Label'].value_counts()
print("\nCounts of images in each label:")
print(label_counts)


Counts of images in each label:
Tomato_Spider_mites_Two_spotted_spider_mite    3237
Tomato__Tomato_YellowLeaf__Curl_Virus          3208
Tomato_Bacterial_spot                          3080
Tomato_healthy                                 3051
Tomato__Tomato_mosaic_virus                    2931
Tomato_Late_blight                             2861
Tomato_Early_blight                            2813
Tomato_Septoria_leaf_spot                      2717
Tomato__Target_Spot                            2711
Tomato_Leaf_Mold                               2685
Name: Label, dtype: int64


### Preprocessing and Feature Extraction using HOG (Histogram of Oriented Gradient)

This cell uses scikit-image (skimage) and OpenCV (cv2) to preprocess images and extract HOG features. Labels are kept as class-name strings, which scikit-learn classifiers accept directly.

#### Libraries Used:
- `sklearn.preprocessing.LabelEncoder`: Encodes categorical labels as integers. It is imported in this cell, but the labels are passed on as class-name strings.
- `skimage.feature.hog`: Used to extract Histogram of Oriented Gradient (HOG) features from grayscale images.

#### Functions Defined:
1. `extract_hog_features(image)`: This function takes an input image, converts it to grayscale, computes HOG features using the skimage.feature.hog function, and returns the extracted features.

2. `read_resize_image(image_path, target_size=(64, 64))`: This function reads an image from the given `image_path`, converts it from BGR to RGB color space, resizes it to the specified `target_size`, and returns the resized image.

3. `preprocess_images_and_labels(image_paths, labels)`: This function preprocesses a list of image paths and corresponding labels. It reads and resizes each image using the `read_resize_image` function, extracts HOG features using the `extract_hog_features` function, and collects the labels unchanged. The function returns NumPy arrays of HOG features and labels.

#### Usage:
- Call `preprocess_images_and_labels` with a list of image paths and corresponding labels to preprocess the images and extract HOG features for further machine learning tasks.
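
The length of each HOG feature vector follows from the parameters used in this section (64x64 images, 8x8 pixels per cell, 2x2 cells per block, 9 orientations). The sketch below works through that arithmetic; the helper name `hog_feature_length` is hypothetical, and it assumes blocks slide one cell at a time, which is how scikit-image's `hog` lays them out.

```python
def hog_feature_length(image_size, pixels_per_cell, cells_per_block, orientations):
    """Length of a HOG descriptor when blocks slide one cell at a time."""
    # Number of whole cells along each axis
    cells = [image_size[i] // pixels_per_cell[i] for i in range(2)]
    # Number of overlapping block positions along each axis
    blocks = [cells[i] - cells_per_block[i] + 1 for i in range(2)]
    values_per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks[0] * blocks[1] * values_per_block


n_features = hog_feature_length((64, 64), (8, 8), (2, 2), 9)
print(n_features)  # 7 * 7 blocks * (2 * 2 cells * 9 bins) = 1764
```

Each image is therefore represented by 1,764 values, compared with 4,096 grayscale pixels at the same resolution.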



In [19]:
from sklearn.preprocessing import LabelEncoder
from skimage.feature import hog


# Preprocessing function to extract HOG features
def extract_hog_features(image):
    # Convert the RGB image to grayscale
    gray_image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    
    # Compute HOG features; the visualization image is not needed here,
    # so visualize keeps its default (False) and only the vector is returned
    features = hog(gray_image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm='L2-Hys')
    
    return features

    
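# Preprocessing function (reading and resizing images)
def read_resize_image(image_path, target_size=(64, 64)):
    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise ValueError(f"Could not read image: {image_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert BGR to RGB
    image = cv2.resize(image, target_size)  # Resize to target_size, given as (width, height)
    return image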

# Function to preprocess images and labels and extract HOG features
def preprocess_images_and_labels(image_paths, labels):
    features_list = []
    encoded_labels = []

    for image_path, label in zip(image_paths, labels):
        # Read and resize image
        image = read_resize_image(image_path)
        
        # Extract HOG features
        features = extract_hog_features(image)
        
        # Append HOG features and label to lists
        features_list.append(features)
        encoded_labels.append(label)
    
    # Convert lists to numpy arrays
    features_array = np.array(features_list)
    encoded_labels = np.array(encoded_labels)
    
    return features_array, encoded_labels


In [20]:
# Preprocess images and labels to extract HOG features
hog_features, encoded_labels = preprocess_images_and_labels(image_paths, labels)

We use `train_test_split` from scikit-learn to split the HOG features and labels into training (80%) and testing (20%) sets.

In [21]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(hog_features, encoded_labels, test_size=0.2, random_state=42)
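
As a sanity check, the split sizes can be worked out from the label counts printed earlier. This sketch assumes that scikit-learn rounds the test count up when `test_size` is a fraction, which agrees with the per-class support values in the reports (they sum to 5859).

```python
import math

# Image counts per class, copied from the value_counts() output
label_counts = {
    "Tomato_Spider_mites_Two_spotted_spider_mite": 3237,
    "Tomato__Tomato_YellowLeaf__Curl_Virus": 3208,
    "Tomato_Bacterial_spot": 3080,
    "Tomato_healthy": 3051,
    "Tomato__Tomato_mosaic_virus": 2931,
    "Tomato_Late_blight": 2861,
    "Tomato_Early_blight": 2813,
    "Tomato_Septoria_leaf_spot": 2717,
    "Tomato__Target_Spot": 2711,
    "Tomato_Leaf_Mold": 2685,
}

n_samples = sum(label_counts.values())
test_size = 0.2
n_test = math.ceil(test_size * n_samples)  # assumed rounding rule (round up)
n_train = n_samples - n_test

print(n_samples, n_train, n_test)  # 29294 23435 5859
```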


### Testing the HOG feature extraction with the 4 proposed traditional ML techniques

In [22]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Create an instance of the RandomForestClassifier class
model = RandomForestClassifier()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Accuracy: 0.43556920976275815
Confusion Matrix:
[[350  25  19  10  23  38  15  41  30  23]
 [ 91  97  48  58  36  75  32  64  39  18]
 [ 43  41 255  61  30  28  16  21  56  23]
 [ 17  28  41 260  23  31   9  29  99  30]
 [ 92  32  38  59  80  59  24  36  71  58]
 [ 30  32  13  12   9 299  40  57 107  33]
 [114  35  11  21  24 102  79  60  61  47]
 [ 50  12   9   7  10  64   7 439  17   9]
 [  4  12  28  66  13  61   7  18 329  81]
 [ 27   8   7  10  14  62  32  12  72 364]]
Classification Report:
                                             precision    recall  f1-score   support

                      Tomato_Bacterial_spot       0.43      0.61      0.50       574
                        Tomato_Early_blight       0.30      0.17      0.22       558
                         Tomato_Late_blight       0.54      0.44      0.49       574
                           Tomato_Leaf_Mold       0.46      0.46      0.46       567
                  Tomato_Septoria_leaf_spot       0.31      0.15      0.
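
The accuracy and the per-class scores can be recomputed from the confusion matrix alone, because scikit-learn places true classes on the rows and predicted classes on the columns. The sketch below does this in plain Python for the random forest matrix printed above.

```python
# Random forest confusion matrix on HOG features, copied from the output above
conf_matrix = [
    [350, 25, 19, 10, 23, 38, 15, 41, 30, 23],
    [91, 97, 48, 58, 36, 75, 32, 64, 39, 18],
    [43, 41, 255, 61, 30, 28, 16, 21, 56, 23],
    [17, 28, 41, 260, 23, 31, 9, 29, 99, 30],
    [92, 32, 38, 59, 80, 59, 24, 36, 71, 58],
    [30, 32, 13, 12, 9, 299, 40, 57, 107, 33],
    [114, 35, 11, 21, 24, 102, 79, 60, 61, 47],
    [50, 12, 9, 7, 10, 64, 7, 439, 17, 9],
    [4, 12, 28, 66, 13, 61, 7, 18, 329, 81],
    [27, 8, 7, 10, 14, 62, 32, 12, 72, 364],
]

n_classes = len(conf_matrix)
total = sum(sum(row) for row in conf_matrix)
correct = sum(conf_matrix[i][i] for i in range(n_classes))
accuracy = correct / total

# Recall: correct predictions divided by the row sum (number of true samples)
recall = [conf_matrix[i][i] / sum(conf_matrix[i]) for i in range(n_classes)]
# Precision: correct predictions divided by the column sum (number of predictions)
precision = [
    conf_matrix[i][i] / sum(row[i] for row in conf_matrix)
    for i in range(n_classes)
]

print(correct, total, round(accuracy, 4))            # 2552 5859 0.4356
print(round(precision[0], 2), round(recall[0], 2))   # 0.43 0.61
```

The last line reproduces the precision and recall reported for `Tomato_Bacterial_spot`.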

In [23]:
from sklearn.neighbors import KNeighborsClassifier


# Create an instance of the KNeighborsClassifier class
knn_model = KNeighborsClassifier()

# Train the model
knn_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)



Accuracy: 0.46816862945895205
Confusion Matrix:
[[289   6   1   6  16  29  20  15  27 165]
 [ 79  37   6  52  31  67  27  26  93 140]
 [ 34  17 239  75  23  18  14  11  46  97]
 [ 11   3   5 306  15  21  10  14  96  86]
 [ 29   8  15  66 119  49  18   9  72 164]
 [ 19   7   1  13   7 331  22  31  90 111]
 [ 36  13   1  11  20  70  89  16  58 240]
 [ 23   4   2  15   8  53  18 427  26  48]
 [  2   3   5  52  21  29   1  13 388 105]
 [  2   0   0   5   3  19   9   5  47 518]]
Classification Report:
                                             precision    recall  f1-score   support

                      Tomato_Bacterial_spot       0.55      0.50      0.53       574
                        Tomato_Early_blight       0.38      0.07      0.11       558
                         Tomato_Late_blight       0.87      0.42      0.56       574
                           Tomato_Leaf_Mold       0.51      0.54      0.52       567
                  Tomato_Septoria_leaf_spot       0.45      0.22      0.

In [24]:
from sklearn.tree import DecisionTreeClassifier


# Create an instance of the DecisionTreeClassifier class
dt_model = DecisionTreeClassifier()

# Train the model
dt_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = dt_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Accuracy: 0.2261478067929681
Confusion Matrix:
[[155  60  29  26  40  56  77  53  24  54]
 [ 70  78  41  58  51  56  59  48  50  47]
 [ 40  60 133  56  69  40  36  37  66  37]
 [ 30  52  73 152  59  42  35  29  66  29]
 [ 56  52  45  53  59  60  52  60  71  41]
 [ 49  53  39  44  44 139  80  67  66  51]
 [ 65  58  33  39  54  90  67  52  47  49]
 [ 74  39  33  29  63  55  59 215  30  27]
 [ 41  59  59  69  55  68  48  33 117  70]
 [ 53  38  26  36  45  47  61  31  61 210]]
Classification Report:
                                             precision    recall  f1-score   support

                      Tomato_Bacterial_spot       0.24      0.27      0.26       574
                        Tomato_Early_blight       0.14      0.14      0.14       558
                         Tomato_Late_blight       0.26      0.23      0.25       574
                           Tomato_Leaf_Mold       0.27      0.27      0.27       567
                  Tomato_Septoria_leaf_spot       0.11      0.11      0.1

In [25]:
from sklearn.naive_bayes import GaussianNB

# Create an instance of the GaussianNB class
nb_model = GaussianNB()
    
# Train the model
nb_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = nb_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

Accuracy: 0.4362519201228879
Confusion Matrix:
[[329  39  24  24  29  33  18  29  26  23]
 [ 71 147  84  71  26  49  36  35  28  11]
 [ 31  47 199  86  36  16  11  11 121  16]
 [  8  33  45 268  39  30  15  16  96  17]
 [ 77  35  31  78 110  47  37   8  80  46]
 [ 17  47  28  35  13 295  27  61  76  33]
 [ 93  52  11  30  22  88 116  47  58  37]
 [ 35  11  29   9  24  64  19 401  11  21]
 [  1  23  11  77  16  56  12   4 349  70]
 [ 20  12   3  23  18  60  51   9  70 342]]
Classification Report:
                                             precision    recall  f1-score   support

                      Tomato_Bacterial_spot       0.48      0.57      0.52       574
                        Tomato_Early_blight       0.33      0.26      0.29       558
                         Tomato_Late_blight       0.43      0.35      0.38       574
                           Tomato_Leaf_Mold       0.38      0.47      0.42       567
                  Tomato_Septoria_leaf_spot       0.33      0.20      0.2

### Preprocessing and Feature Extraction using LBP (Local Binary Patterns)

This cell uses scikit-image (skimage) and OpenCV (cv2) to preprocess images and extract Local Binary Pattern (LBP) features. Labels are kept as class-name strings.

#### Libraries Used:
- `skimage.feature.local_binary_pattern`: Used to extract LBP features from grayscale images.


#### Functions Defined:
1. `extract_lbp_features(image)`: This function takes an input image, converts it to grayscale using OpenCV (`cv2`), computes LBP codes using the `local_binary_pattern` function from scikit-image (`skimage.feature.local_binary_pattern`), and returns the per-pixel LBP code map flattened to a 1D array.


#### Usage:
- Call `preprocess_images_and_labels` with a list of image paths and corresponding labels to preprocess the images and extract LBP features for further machine learning tasks.



In [33]:
from skimage.feature import local_binary_pattern
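# Preprocessing function (reading and resizing images)
def read_resize_image(image_path, target_size=(64, 64)):
    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise ValueError(f"Could not read image: {image_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert BGR to RGB
    image = cv2.resize(image, target_size)  # Resize to target_size, given as (width, height)
    return image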
    
# Function to extract LBP features
def extract_lbp_features(image):
    # Convert the image to grayscale
    gray_image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    
    # Compute LBP features
    lbp_radius = 3
    lbp_points = 8 * lbp_radius
    lbp_features = local_binary_pattern(gray_image, lbp_points, lbp_radius, method='uniform')
    
    return lbp_features.ravel()  # Flatten the 2D array to 1D

# Function to preprocess images and labels and extract LBP features
def preprocess_images_and_labels(image_paths, labels):
    features_list = []
    encoded_labels = []

    for image_path, label in zip(image_paths, labels):
        # Read and resize image
        image = read_resize_image(image_path)
        
        # Extract LBP features
        features = extract_lbp_features(image)
        
        # Append LBP features and label to lists
        features_list.append(features)
        encoded_labels.append(label)
    
    # Convert lists to numpy arrays
    features_array = np.array(features_list)
    encoded_labels = np.array(encoded_labels)
    
    return features_array, encoded_labels

# Preprocess images and labels to extract LBP features
lbp_features, encoded_labels = preprocess_images_and_labels(image_paths, labels)
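
`extract_lbp_features` returns the flattened per-pixel code map, that is 64 x 64 = 4096 values per image. A common alternative, which is not what this notebook does, summarizes the code map as a normalized histogram so that the descriptor no longer depends on where a texture appears in the image. The sketch below illustrates this on a synthetic code map; the helper name `lbp_histogram` and the random input are assumptions, and it relies on the `'uniform'` method producing integer codes from 0 to `n_points + 1`.

```python
import numpy as np


def lbp_histogram(lbp_map, n_points):
    """Summarize a uniform-LBP code map as a normalized histogram.

    Hypothetical helper: assumes codes are integers in [0, n_points + 1].
    """
    n_bins = n_points + 2
    codes = lbp_map.astype(np.int64).ravel()
    counts = np.bincount(codes, minlength=n_bins)
    return counts / counts.sum()


# Synthetic stand-in for the output of local_binary_pattern on a 64x64 image
rng = np.random.default_rng(0)
fake_lbp_map = rng.integers(0, 26, size=(64, 64)).astype(np.float64)

hist = lbp_histogram(fake_lbp_map, n_points=24)
print(fake_lbp_map.ravel().shape, hist.shape)  # (4096,) (26,)
```

Whether the histogram form would change the results reported in this section has not been tested here.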

In [34]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(lbp_features, encoded_labels, test_size=0.2, random_state=42)

### Testing the LBP feature extraction with the 4 proposed traditional ML techniques

In [35]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Create an instance of the RandomForestClassifier class
model = RandomForestClassifier()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Accuracy: 0.20634920634920634
Confusion Matrix:
[[151  64  40  27  37  74  41  70  29  41]
 [124  80  43  24  27  59  21 125  23  32]
 [ 69  59  97  45  30  69  32  87  42  44]
 [ 74  43  56  39  44  90  35  63  64  59]
 [ 89  35  36  26  44 128  35  19  50  87]
 [ 78  37  42  25  37 168  41  43  66  95]
 [ 79  28  51  33  47  98  43  42  53  80]
 [ 65  42  48  15  21  64  23 309  23  14]
 [ 49  20  40  36  61 141  49  15 107 101]
 [ 54  10  27  25  55 127  38  12  89 171]]
Classification Report:
                                             precision    recall  f1-score   support

                      Tomato_Bacterial_spot       0.18      0.26      0.21       574
                        Tomato_Early_blight       0.19      0.14      0.16       558
                         Tomato_Late_blight       0.20      0.17      0.18       574
                           Tomato_Leaf_Mold       0.13      0.07      0.09       567
                  Tomato_Septoria_leaf_spot       0.11      0.08      0.

In [36]:
from sklearn.neighbors import KNeighborsClassifier


# Create an instance of the KNeighborsClassifier class
knn_model = KNeighborsClassifier()

# Train the model
knn_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)



Accuracy: 0.09865164703874381
Confusion Matrix:
[[  0   0 574   0   0   0   0   0   0   0]
 [  0   0 558   0   0   0   0   0   0   0]
 [  0   0 574   0   0   0   0   0   0   0]
 [  0   0 567   0   0   0   0   0   0   0]
 [  0   0 549   0   0   0   0   0   0   0]
 [  0   0 632   0   0   0   0   0   0   0]
 [  0   0 554   0   0   0   0   0   0   0]
 [  0   0 620   0   0   0   0   4   0   0]
 [  0   0 619   0   0   0   0   0   0   0]
 [  0   0 608   0   0   0   0   0   0   0]]
Classification Report:
                                             precision    recall  f1-score   support

                      Tomato_Bacterial_spot       0.00      0.00      0.00       574
                        Tomato_Early_blight       0.00      0.00      0.00       558
                         Tomato_Late_blight       0.10      1.00      0.18       574
                           Tomato_Leaf_Mold       0.00      0.00      0.00       567
                  Tomato_Septoria_leaf_spot       0.00      0.00      0.

In [37]:
from sklearn.tree import DecisionTreeClassifier


# Create an instance of the DecisionTreeClassifier class
dt_model = DecisionTreeClassifier()

# Train the model
dt_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = dt_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Accuracy: 0.12203447687318655
Confusion Matrix:
[[ 79  78  50  43  40  73  61  54  49  47]
 [ 57  44  52  51  40  81  61  78  46  48]
 [ 69  59  72  55  40  66  58  65  53  37]
 [ 49  56  70  56  44  66  59  49  68  50]
 [ 60  59  44  47  58  49  65  43  59  65]
 [ 60  55  68  70  57  78  62  48  72  62]
 [ 67  59  55  46  54  77  44  40  61  51]
 [ 67  66  57  47  54  63  46 128  60  36]
 [ 63  61  57  53  50  84  56  39  68  88]
 [ 57  53  48  65  70  79  49  30  69  88]]
Classification Report:
                                             precision    recall  f1-score   support

                      Tomato_Bacterial_spot       0.13      0.14      0.13       574
                        Tomato_Early_blight       0.07      0.08      0.08       558
                         Tomato_Late_blight       0.13      0.13      0.13       574
                           Tomato_Leaf_Mold       0.11      0.10      0.10       567
                  Tomato_Septoria_leaf_spot       0.11      0.11      0.

In [38]:
from sklearn.naive_bayes import GaussianNB

# Create an instance of the GaussianNB class
nb_model = GaussianNB()
    
# Train the model
nb_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = nb_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

Accuracy: 0.3715651135005974
Confusion Matrix:
[[292  74  35  23  18  36  46  17   9  24]
 [ 56 189  47  58  13  25  26 121  13  10]
 [ 39 125 126  75  31  14  20  60  47  37]
 [ 14  90  37 197  57  41  27  19  65  20]
 [ 42  25  18  50 103  92  42   6  61 110]
 [ 42  20  18  51  61 185  63   7  80 105]
 [ 67  31  21  43  48 107 104   6  61  66]
 [ 40  43   9  42  31  69  26 354   7   3]
 [  2   4   8  71  52  63  26   0 273 120]
 [  6   0   1   9  53  82  44   1  58 354]]
Classification Report:
                                             precision    recall  f1-score   support

                      Tomato_Bacterial_spot       0.49      0.51      0.50       574
                        Tomato_Early_blight       0.31      0.34      0.33       558
                         Tomato_Late_blight       0.39      0.22      0.28       574
                           Tomato_Leaf_Mold       0.32      0.35      0.33       567
                  Tomato_Septoria_leaf_spot       0.22      0.19      0.2

### Preprocessing and Feature Extraction using Normalized Raw Pixel Values


#### Functions Defined:
1. `read_resize_image(image_path, target_size=(32, 32))`: This function reads an image from the given `image_path`, converts it from BGR to RGB color space using OpenCV (`cv2`), resizes it to the specified `target_size`, and returns the resized image.

2. `preprocess_images_and_labels(image_paths, labels)`: This function preprocesses a list of image paths and corresponding labels. It reads and resizes each image using the `read_resize_image` function, normalizes pixel values to the range [0, 1], and collects the labels unchanged. The function returns NumPy arrays of preprocessed images and labels.

#### Usage:
- Call `preprocess_images_and_labels` with a list of image paths and corresponding labels to resize the images and normalize their pixel values for further machine learning tasks.

- After preprocessing, apply `LabelEncoder` from scikit-learn to encode the class-name labels as integers.
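
The normalization step is a single division. The sketch below applies it to a tiny synthetic image (made-up values, not taken from the dataset) to show that the shape is preserved while the 8-bit integers become floats in [0, 1].

```python
import numpy as np

# Synthetic 2x2 RGB image with 8-bit intensities (stand-in for a resized photo)
image = np.array(
    [[[0, 128, 255], [64, 64, 64]],
     [[255, 255, 255], [10, 20, 30]]],
    dtype=np.uint8,
)

# True division by a float converts the uint8 values to floating point
normalized = image / 255.0

print(normalized.shape, normalized.dtype)   # (2, 2, 3) float64
print(normalized.min(), normalized.max())   # 0.0 1.0
```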



In [None]:
from sklearn.preprocessing import LabelEncoder

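# Preprocessing function (reading and resizing images)
def read_resize_image(image_path, target_size=(32, 32)):
    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise ValueError(f"Could not read image: {image_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert BGR to RGB
    image = cv2.resize(image, target_size)  # Resize to target_size, given as (width, height)
    return image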

# Function to preprocess images and labels
def preprocess_images_and_labels(image_paths, labels):
    images = []
    encoded_labels = []

    for image_path, label in zip(image_paths, labels):
        # Read and resize image
        image = read_resize_image(image_path)
        
        # Convert image to array and normalize pixel values
        image = image / 255.0  # Normalize pixel values to [0, 1]
        
        # Append image and label to lists
        images.append(image)
        encoded_labels.append(label)
    
    # Convert lists to numpy arrays
    images = np.array(images)
    encoded_labels = np.array(encoded_labels)
    
    return images, encoded_labels

# Preprocess images and labels
images, encoded_labels = preprocess_images_and_labels(image_paths, labels)

# Encode labels
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(encoded_labels)
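
`LabelEncoder` numbers the classes in sorted order of their names, which is why the classification reports in this section show the digits 0 to 9 instead of folder names. The NumPy sketch below reproduces that mapping on a few sample labels; it is an illustration of the idea, not the notebook's code.

```python
import numpy as np

labels_sample = np.array([
    "Tomato_healthy",
    "Tomato_Bacterial_spot",
    "Tomato_Late_blight",
    "Tomato_healthy",
])

# np.unique returns the sorted class names and, for each label, its index
classes, encoded = np.unique(labels_sample, return_inverse=True)

print(classes.tolist())   # ['Tomato_Bacterial_spot', 'Tomato_Late_blight', 'Tomato_healthy']
print(encoded.tolist())   # [2, 0, 1, 2]

# Decoding is an index lookup into the sorted class names
decoded = classes[encoded]
```

With the fitted encoder from the cell above, `label_encoder.classes_` holds the same kind of sorted name list and `label_encoder.inverse_transform` performs the decoding.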

In [None]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(images, encoded_labels, test_size=0.2, random_state=42)

### Testing the normalized raw pixel feature extraction method with the 4 proposed traditional ML techniques
We train each model on the training data (`X_train` and `y_train`). Since scikit-learn estimators expect a 2D input of shape (n_samples, n_features), we flatten each 32x32x3 image into a vector of 3,072 values by reshaping `X_train` and `X_test` with `reshape(-1, 32 * 32 * 3)`.
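
A minimal sketch of the reshape on a synthetic batch (zeros standing in for `X_train`) shows the effect on the array shape:

```python
import numpy as np

# Synthetic batch standing in for X_train: 5 images, 32x32 pixels, 3 channels
batch = np.zeros((5, 32, 32, 3))

# One row per image, one column per pixel-channel value
flat = batch.reshape(-1, 32 * 32 * 3)
print(batch.shape, "->", flat.shape)  # (5, 32, 32, 3) -> (5, 3072)

# Equivalent form that does not hard-code the image size
flat_generic = batch.reshape(len(batch), -1)
```

The second form gives the same result and keeps working if `target_size` is changed.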


In [48]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Create an instance of the RandomForestClassifier class
model = RandomForestClassifier()

# Train the model
model.fit(X_train.reshape(-1, 32*32*3), y_train)  # Reshape X_train to 2D array

# Make predictions on the test set
y_pred = model.predict(X_test.reshape(-1, 32*32*3))  # Reshape X_test to 2D array

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Accuracy: 0.6562553336746885
Confusion Matrix:
[[439  36  15   1   7   6  15  37   3  15]
 [ 65 245  17  22  33  48  54  47  18   9]
 [ 29  85 271  32  34  37  14  25  16  31]
 [  6  20   8 349  43  27  12  14  79   9]
 [ 20  19  22  52 309   8  16  31  58  14]
 [ 13  34  11  13  12 442  31  28  36  12]
 [ 35  39   7  17  21 129 238   2  37  29]
 [ 25   3   3   0   3  43   5 540   0   2]
 [  5   5  18  31  20  28   6  11 477  18]
 [  3   1   5   5   2  10  10   3  34 535]]
Classification Report:
              precision    recall  f1-score   support

           0       0.69      0.76      0.72       574
           1       0.50      0.44      0.47       558
           2       0.72      0.47      0.57       574
           3       0.67      0.62      0.64       567
           4       0.64      0.56      0.60       549
           5       0.57      0.70      0.63       632
           6       0.59      0.43      0.50       554
           7       0.73      0.87      0.79       624
           8

### K-Nearest Neighbours

In [49]:
from sklearn.neighbors import KNeighborsClassifier

# Create an instance of the KNeighborsClassifier class
knn_model = KNeighborsClassifier()

# Train the KNN model
knn_model.fit(X_train.reshape(-1, 32*32*3), y_train)

# Make predictions on the test set using the trained KNN model
knn_y_pred = knn_model.predict(X_test.reshape(-1, 32*32*3))

# Evaluate the KNN model
knn_accuracy = accuracy_score(y_test, knn_y_pred)
knn_conf_matrix = confusion_matrix(y_test, knn_y_pred)
knn_class_report = classification_report(y_test, knn_y_pred)

print("KNN Model Accuracy:", knn_accuracy)
print("KNN Model Confusion Matrix:")
print(knn_conf_matrix)
print("KNN Model Classification Report:")
print(knn_class_report)


KNN Model Accuracy: 0.434715821812596
KNN Model Confusion Matrix:
[[489   1  15   3  15   2   5  12   2  30]
 [332  21  28  29  36  37  17  23  10  25]
 [125  15 277  38  39  11   3  14   3  49]
 [ 58   9  73 314  26  14   2  11  31  29]
 [ 93   6  80  53 215  23   6  20  15  38]
 [208   6  47  18  37 213  35  25  18  25]
 [280   6  38   8  39  64  52   5  14  48]
 [146   1   8   6  15  25   7 407   0   9]
 [ 19   4 189  63  35  66   6   7 193  37]
 [114   0  99  10   9   4   4   0   2 366]]
KNN Model Classification Report:
              precision    recall  f1-score   support

           0       0.26      0.85      0.40       574
           1       0.30      0.04      0.07       558
           2       0.32      0.48      0.39       574
           3       0.58      0.55      0.57       567
           4       0.46      0.39      0.42       549
           5       0.46      0.34      0.39       632
           6       0.38      0.09      0.15       554
           7       0.78      0.65    

### Decision Tree Classifier

In [50]:
from sklearn.tree import DecisionTreeClassifier

# Create an instance of the DecisionTreeClassifier class
dt_model = DecisionTreeClassifier()

# Train the Decision Tree model
dt_model.fit(X_train.reshape(-1, 32*32*3), y_train)

# Make predictions on the test set using the trained Decision Tree model
dt_y_pred = dt_model.predict(X_test.reshape(-1, 32*32*3))

# Evaluate the Decision Tree model
dt_accuracy = accuracy_score(y_test, dt_y_pred)
dt_conf_matrix = confusion_matrix(y_test, dt_y_pred)
dt_class_report = classification_report(y_test, dt_y_pred)

print("Decision Tree Model Accuracy:", dt_accuracy)
print("Decision Tree Model Confusion Matrix:")
print(dt_conf_matrix)
print("Decision Tree Model Classification Report:")
print(dt_class_report)


Decision Tree Model Accuracy: 0.37753882915173237
Decision Tree Model Confusion Matrix:
[[273  67  30  18  25  30  42  48  12  29]
 [ 78 117  43  35  30  59  82  52  32  30]
 [ 35  56 200  38  56  42  31  29  43  44]
 [ 30  39  39 211  71  35  28  15  66  33]
 [ 32  41  43  62 189  37  41  40  44  20]
 [ 41  53  43  34  39 218  82  43  44  35]
 [ 42  63  38  28  48  96 123  13  43  60]
 [ 67  30  29  12  34  47  18 361  16  10]
 [ 24  33  35  68  65  63  59  27 204  41]
 [ 36  16  48  36  34  29  37   8  48 316]]
Decision Tree Model Classification Report:
              precision    recall  f1-score   support

           0       0.41      0.48      0.44       574
           1       0.23      0.21      0.22       558
           2       0.36      0.35      0.36       574
           3       0.39      0.37      0.38       567
           4       0.32      0.34      0.33       549
           5       0.33      0.34      0.34       632
           6       0.23      0.22      0.22       554
     

### Naive Bayes Classifier

In [51]:
from sklearn.naive_bayes import GaussianNB

# Create an instance of the GaussianNB class
nb_model = GaussianNB()

# Train the Naive Bayes model
nb_model.fit(X_train.reshape(-1, 32*32*3), y_train)

# Make predictions on the test set using the trained Naive Bayes model
nb_y_pred = nb_model.predict(X_test.reshape(-1, 32*32*3))

# Evaluate the Naive Bayes model
nb_accuracy = accuracy_score(y_test, nb_y_pred)
nb_conf_matrix = confusion_matrix(y_test, nb_y_pred)
nb_class_report = classification_report(y_test, nb_y_pred)

print("Naive Bayes Model Accuracy:", nb_accuracy)
print("Naive Bayes Model Confusion Matrix:")
print(nb_conf_matrix)
print("Naive Bayes Model Classification Report:")
print(nb_class_report)


Naive Bayes Model Accuracy: 0.40757808499743986
Naive Bayes Model Confusion Matrix:
[[194  78   5   1  34   3  23 166   7  63]
 [ 54 218  59  19  64  31  30  55  18  10]
 [ 25 121 115  14  50  14  10  23  41 161]
 [ 31  28  32 156 127  20  17  27  87  42]
 [ 27  42  14  37 205  38  21  30  75  60]
 [ 41  54  30  15  91 184  26  59 106  26]
 [ 97  49  36  25  79  67 119  14  47  21]
 [ 36  12   8   0  20  44   0 429  18  57]
 [ 16   4   7  53  52  14   0  26 375  72]
 [ 19   3  42  23  21  14  37  13  43 393]]
Naive Bayes Model Classification Report:
              precision    recall  f1-score   support

           0       0.36      0.34      0.35       574
           1       0.36      0.39      0.37       558
           2       0.33      0.20      0.25       574
           3       0.45      0.28      0.34       567
           4       0.28      0.37      0.32       549
           5       0.43      0.29      0.35       632
           6       0.42      0.21      0.28       554
           

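The highest accuracy, about 65.6%, comes from the random forest trained on normalized raw pixel values; every other combination of features and classifier stays below 47%. Traditional machine learning classifiers therefore look like a poor fit for this image classification task.
Raw pixel values are also the best feature set overall, although the advantage depends on the classifier: they lead for the random forest and the decision tree, while HOG scores slightly higher with KNN and Naive Bayes. One possible reason is that the raw pixels keep the color information that the grayscale HOG and LBP features discard.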
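
The comparison can be tabulated from the accuracies printed in the cells above. The sketch below copies those values, rounded to four decimals, and picks the best classifier per feature set. Since the random forest and decision tree were created without a fixed `random_state`, a rerun may give slightly different numbers.

```python
# Test accuracies copied from the notebook outputs, rounded to four decimals
accuracies = {
    "HOG": {"Random forest": 0.4356, "KNN": 0.4682,
            "Decision tree": 0.2261, "Naive Bayes": 0.4363},
    "LBP": {"Random forest": 0.2063, "KNN": 0.0987,
            "Decision tree": 0.1220, "Naive Bayes": 0.3716},
    "Raw pixels": {"Random forest": 0.6563, "KNN": 0.4347,
                   "Decision tree": 0.3775, "Naive Bayes": 0.4076},
}

best_per_feature = {}
for feature, scores in accuracies.items():
    best_model = max(scores, key=scores.get)
    best_per_feature[feature] = best_model
    print(f"{feature}: best = {best_model} ({scores[best_model]:.4f})")
```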