#My Observation
The Naive Bayes classifier, rooted in Bayes' theorem and the assumption of feature independence, exhibits computational efficiency and robust performance across various domains, notably in text classification, spam filtering, and recommendation systems. However, its simplicity and reliance on strong independence assumptions might limit its performance in capturing intricate relationships within complex datasets.

In contrast, the Binary Bonsai Tree (BB-tree) algorithm employs a hierarchical, tree-based approach, effectively navigating structured data by recursively partitioning features. Its unique structure allows for decision-making through a series of classifiers, offering a distinctive perspective on classification tasks. While advantageous in structured data scenarios, its performance might vary based on the depth of the tree and feature representation.

Nearest Neighbors (NN), functioning as a lazy learning algorithm, proves beneficial in scenarios where data distribution is unclear or fluctuates, showcasing adaptability and simplicity. Its reliance on similarity metrics, however, can lead to increased computational requirements with larger datasets, affecting scalability.

Gradient Boosting, an ensemble learning technique, iteratively builds a strong model by combining weak learners, often shallow decision trees, each fitted to reduce the errors left by the previous ones. Its strength lies in high predictive accuracy, which makes it a common choice for predictive modeling tasks across diverse domains. Yet its sequential training demands more computational resources, and it can overfit without careful hyperparameter tuning (learning rate, number of trees, tree depth).

Lastly, the Voting Classifier combines multiple individual classifiers and aggregates their predictions to improve predictive performance. This can reduce variance and improve robustness, especially when the constituent classifiers are diverse, but its performance depends heavily on the diversity and quality of those constituents.

Overall, the selection among these algorithms hinges on the specific characteristics of the dataset, the desired trade-offs between accuracy and computational efficiency, and the complexity of relationships within the data. Understanding the nuances and performance profiles of each algorithm is crucial for informed selection tailored to the unique requirements of a given classification task.

In [None]:
!pip install scikit-learn tensorflow xgboost




#Naive Bayes
Naive Bayes: This probabilistic classifier, grounded in Bayes' theorem, assumes independence among features within the dataset. It operates efficiently, making it a go-to choice for tasks like text classification, spam filtering, and recommendation systems. Despite its simplifying assumptions, Naive Bayes often performs surprisingly well and is computationally efficient.
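
To make the independence assumption concrete, here is a minimal sketch that computes multinomial Naive Bayes scores by hand and compares them with `MultinomialNB`. The variable names (`log_prior`, `log_likelihood`, `joint_log_prob`) are illustrative, the model is fitted and scored on the full digits data only to check agreement, and the smoothing value is assumed to be the library default of 1.0.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.naive_bayes import MultinomialNB

X, y = load_digits(return_X_y=True)
classes = np.unique(y)
alpha = 1.0  # Laplace smoothing (assumed to match the MultinomialNB default)

# Log prior: log P(class), estimated from class frequencies
log_prior = np.log(np.array([(y == c).mean() for c in classes]))

# Log likelihood: log P(feature | class) from smoothed per-class feature totals
feature_totals = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
log_likelihood = np.log(feature_totals / feature_totals.sum(axis=1, keepdims=True))

# Independence assumption: the joint log-likelihood is a plain sum over features
joint_log_prob = X @ log_likelihood.T + log_prior
manual_pred = classes[np.argmax(joint_log_prob, axis=1)]

# Compare with the library implementation fitted on the same data
sklearn_pred = MultinomialNB(alpha=alpha).fit(X, y).predict(X)
agreement = float((manual_pred == sklearn_pred).mean())
print(f"Agreement with MultinomialNB: {agreement:.3f}")
```

Because each feature contributes an independent term, scoring reduces to one matrix product, which is where the method's computational efficiency comes from.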

In [None]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Load the digits dataset (8x8 grayscale images, pixel values 0-16)
digits = load_digits()

# Separate features (flattened pixels) and labels
X = digits.data
y = digits.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Naive Bayes classifier
nb_classifier = MultinomialNB()

# Train the classifier using the training data
nb_classifier.fit(X_train, y_train)

# Make predictions on the test data
predicted = nb_classifier.predict(X_test)

# Calculate accuracy
accuracy = metrics.accuracy_score(y_test, predicted)
print(f"Accuracy: {accuracy}")

# Classification report
print(metrics.classification_report(y_test, predicted))

# Confusion matrix
print("Confusion Matrix:")
print(metrics.confusion_matrix(y_test, predicted))

Accuracy: 0.9111111111111111
              precision    recall  f1-score   support

           0       1.00      0.97      0.98        33
           1       0.87      0.71      0.78        28
           2       0.86      0.94      0.90        33
           3       1.00      0.88      0.94        34
           4       0.94      1.00      0.97        46
           5       0.97      0.83      0.90        47
           6       0.97      0.97      0.97        35
           7       0.92      1.00      0.96        34
           8       0.82      0.93      0.87        30
           9       0.77      0.85      0.81        40

    accuracy                           0.91       360
   macro avg       0.91      0.91      0.91       360
weighted avg       0.92      0.91      0.91       360

Confusion Matrix:
[[32  0  0  0  1  0  0  0  0  0]
 [ 0 20  4  0  0  0  0  0  2  2]
 [ 0  1 31  0  0  0  0  0  1  0]
 [ 0  0  1 30  0  0  0  0  2  1]
 [ 0  0  0  0 46  0  0  0  0  0]
 [ 0  0  0  0  0 39  1  0  0 

In [None]:
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Load the Olivetti Faces dataset (64x64 grayscale faces, 40 subjects, pixel values in [0, 1])
faces = fetch_olivetti_faces()

# Separate features (flattened pixels) and labels
X = faces.data
y = faces.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Naive Bayes classifier
nb_classifier = MultinomialNB()

# Train the classifier using the training data
nb_classifier.fit(X_train, y_train)

# Make predictions on the test data
predicted = nb_classifier.predict(X_test)

# Calculate accuracy
accuracy = metrics.accuracy_score(y_test, predicted)
print(f"Accuracy: {accuracy}")

# Classification report
print(metrics.classification_report(y_test, predicted))

# Confusion matrix
print("Confusion Matrix:")
print(metrics.confusion_matrix(y_test, predicted))


downloading Olivetti faces from https://ndownloader.figshare.com/files/5976027 to /root/scikit_learn_data
Accuracy: 0.8625
              precision    recall  f1-score   support

           0       1.00      0.33      0.50         3
           1       1.00      1.00      1.00         1
           2       1.00      0.50      0.67         2
           3       1.00      0.75      0.86         4
           4       1.00      0.67      0.80         3
           5       1.00      1.00      1.00         3
           7       0.83      0.83      0.83         6
           8       1.00      1.00      1.00         2
           9       1.00      1.00      1.00         2
          10       1.00      1.00      1.00         2
          11       0.67      0.67      0.67         3
          12       1.00      1.00      1.00         2
          13       1.00      1.00      1.00         1
          14       1.00      1.00      1.00         3
          15       1.00      1.00      1.00         2
          17

  _warn_prf(average, modifier, msg_start, len(result))


In [None]:
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Load the 20 Newsgroups dataset (pre-vectorized text; returns a sparse feature matrix)
newsgroups = fetch_20newsgroups_vectorized()

# Separate features and labels
X = newsgroups.data
y = newsgroups.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Naive Bayes classifier
nb_classifier = MultinomialNB()

# Train the classifier using the training data
nb_classifier.fit(X_train, y_train)

# Make predictions on the test data
predicted = nb_classifier.predict(X_test)

# Calculate accuracy
accuracy = metrics.accuracy_score(y_test, predicted)
print(f"Accuracy: {accuracy}")

# Classification report
print(metrics.classification_report(y_test, predicted))

# Confusion matrix
print("Confusion Matrix:")
print(metrics.confusion_matrix(y_test, predicted))


Accuracy: 0.7445868316394167
              precision    recall  f1-score   support

           0       0.88      0.38      0.53        93
           1       0.82      0.62      0.71       118
           2       0.89      0.66      0.76       128
           3       0.62      0.77      0.69       120
           4       0.72      0.82      0.77       102
           5       0.88      0.73      0.80       124
           6       0.88      0.66      0.76       112
           7       0.65      0.95      0.77       112
           8       0.91      0.88      0.90       118
           9       0.97      0.93      0.95       125
          10       0.95      0.94      0.94       117
          11       0.52      0.97      0.68       120
          12       0.92      0.50      0.65       138
          13       0.87      0.90      0.88       118
          14       0.90      0.87      0.88       122
          15       0.38      0.98      0.55       120
          16       0.81      0.84      0.83       10

In [None]:
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Load the LFW dataset, keeping only people with at least 70 images and resizing each image by a factor of 0.4
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# Separate features (flattened pixels) and labels
X = lfw_people.data
y = lfw_people.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Naive Bayes classifier
nb_classifier = MultinomialNB()

# Train the classifier using the training data
nb_classifier.fit(X_train, y_train)

# Make predictions on the test data
predicted = nb_classifier.predict(X_test)

# Calculate accuracy
accuracy = metrics.accuracy_score(y_test, predicted)
print(f"Accuracy: {accuracy}")

# Classification report
print(metrics.classification_report(y_test, predicted))

# Confusion matrix
print("Confusion Matrix:")
print(metrics.confusion_matrix(y_test, predicted))

Accuracy: 0.5697674418604651
              precision    recall  f1-score   support

           0       0.45      0.45      0.45        11
           1       0.57      0.49      0.53        47
           2       0.50      0.50      0.50        22
           3       0.71      0.70      0.70       119
           4       0.37      0.53      0.43        19
           5       0.67      0.46      0.55        13
           6       0.28      0.33      0.31        27

    accuracy                           0.57       258
   macro avg       0.51      0.49      0.50       258
weighted avg       0.58      0.57      0.57       258

Confusion Matrix:
[[ 5  1  2  2  0  0  1]
 [ 1 23  3 19  0  0  1]
 [ 2  6 11  2  1  0  0]
 [ 3  6  5 83  6  1 15]
 [ 0  0  0  3 10  2  4]
 [ 0  1  0  2  2  6  2]
 [ 0  3  1  6  8  0  9]]
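
The numbers in the classification report can be recovered directly from the confusion matrix. As a small sketch, the matrix printed above for the LFW run is copied into an array (rows are true classes, columns are predicted classes), and accuracy, recall, and precision are derived from it; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the LFW output above
# (rows: true class, columns: predicted class)
conf = np.array([
    [5, 1, 2, 2, 0, 0, 1],
    [1, 23, 3, 19, 0, 0, 1],
    [2, 6, 11, 2, 1, 0, 0],
    [3, 6, 5, 83, 6, 1, 15],
    [0, 0, 0, 3, 10, 2, 4],
    [0, 1, 0, 2, 2, 6, 2],
    [0, 3, 1, 6, 8, 0, 9],
])

true_positives = np.diag(conf)
recall = true_positives / conf.sum(axis=1)     # correct / all samples of the true class
precision = true_positives / conf.sum(axis=0)  # correct / all samples predicted as the class
accuracy = true_positives.sum() / conf.sum()   # correct / all samples

print(f"Accuracy: {accuracy:.4f}")
for label, (p, r) in enumerate(zip(precision, recall)):
    print(f"class {label}: precision={p:.2f} recall={r:.2f}")
```

Row sums give the per-class support, so class 3 (119 of 258 test samples) dominates the weighted averages in the report.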


#Boosting binary tree image classification
Binary Bonsai Tree (BB-tree): BB-tree employs a binary tree structure, recursively partitioning data based on distinct features. This method constructs a tree of classifiers, utilizing the principles of decision trees. Its utility lies in its ability to navigate structured data and make decisions hierarchically, offering a unique approach to classification tasks.

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np

# BB-tree node structure
class BBTreeNode:
    def __init__(self, features, classifiers=None):
        self.features = features
        self.classifiers = classifiers if classifiers else []
        self.left = None
        self.right = None

# Function to train BB-tree
def train_bb_tree(data, labels, max_depth):
    if max_depth <= 0 or len(set(labels)) == 1:
        return BBTreeNode(None, classifiers=[DecisionTreeClassifier()])

    selected_features = select_features(data)
    left_data, right_data, left_labels, right_labels = split_data(data, labels, selected_features)

    left_node = train_bb_tree(left_data, left_labels, max_depth - 1)
    right_node = train_bb_tree(right_data, right_labels, max_depth - 1)

    node = BBTreeNode(selected_features)
    node.left = left_node
    node.right = right_node

    if left_node.classifiers:
        left_node.classifiers[0].fit(left_data, left_labels)
    if right_node.classifiers:
        right_node.classifiers[0].fit(right_data, right_labels)

    return node

# Function for prediction
def predict(node, images):
    predictions = []
    if node.classifiers:
        for image in images:
            predictions.append(node.classifiers[0].predict([image])[0])  # Assuming one classifier for simplicity
        return predictions

    for image in images:
        if some_condition(image, node.features):
            predictions.append(predict(node.left, [image])[0])
        else:
            predictions.append(predict(node.right, [image])[0])
    return predictions

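# Dummy functions (to be replaced with actual implementations)
def select_features(data):
    # Pick 10 random feature indices (np.random.choice samples with replacement by default)
    return np.random.choice(data.shape[1], 10)

def split_data(data, labels, selected_features):
    # Placeholder: splits the samples in half by position and ignores selected_features,
    # so the training partition does not match the routing rule in some_condition
    split_point = data.shape[0] // 2
    left_data, right_data = data[:split_point], data[split_point:]
    left_labels, right_labels = labels[:split_point], labels[split_point:]
    return left_data, right_data, left_labels, right_labels

def some_condition(image, features):
    # Route left when the mean of the selected features exceeds 0.5
    return np.mean(image[features]) > 0.5


# Train the BB-tree
# train_images, train_labels, test_images and test_labels are not defined in this cell;
# they are assumed to be flattened, normalized images with integer labels loaded in another cell
max_depth = 5
bb_tree = train_bb_tree(train_images, train_labels, max_depth)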

# Predict labels for the test images
predicted_labels = predict(bb_tree, test_images)

# Calculate accuracy
accuracy = accuracy_score(test_labels, predicted_labels)
print(f"Accuracy: {accuracy}")

# Generate confusion matrix
conf_matrix = confusion_matrix(test_labels, predicted_labels)
print("Confusion Matrix:")
print(conf_matrix)


Accuracy: 0.6966
Confusion Matrix:
[[837   1  14   8   7  19  26  32  22  14]
 [  1 900  56   5   4  44   5  55  22  43]
 [ 28 101 635  36  27  36  62  17  77  13]
 [ 16  20  17 749  31  54  23  17  29  54]
 [ 15  32  34  19 645  25  28  37  19 128]
 [ 25  18  16 116  41 486  45  26  48  71]
 [ 28  24  17  11  66  28 722  20  32  10]
 [ 24  23  36  25  42  18  41 743  20  56]
 [ 42  72  60  83  20  67  38   6 529  57]
 [ 19  21  35  34  78  34  14  39  15 720]]
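
The placeholder `split_data` above divides samples by position, while `predict` routes them with `some_condition`, so training and prediction follow different paths. The following is a rough sketch of one way to make the two agree, not the BB-tree algorithm itself: the `Node` class, the `goes_left` rule, the median threshold, and the use of `load_digits` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

class Node:
    """Internal nodes hold a routing rule; leaves hold a fitted classifier."""
    def __init__(self, features=None, threshold=None, leaf=None):
        self.features, self.threshold, self.leaf = features, threshold, leaf
        self.left = self.right = None

def goes_left(X, features, threshold):
    # Same rule at training and prediction time: mean of the selected features
    return X[:, features].mean(axis=1) > threshold

def build(X, y, depth, min_samples=20):
    features = rng.choice(X.shape[1], size=10, replace=False)
    threshold = np.median(X[:, features].mean(axis=1))
    mask = goes_left(X, features, threshold)
    # Stop when depth runs out, the node is pure or small, or the split is degenerate
    if (depth == 0 or len(np.unique(y)) == 1 or len(y) < min_samples
            or mask.all() or not mask.any()):
        return Node(leaf=DecisionTreeClassifier(random_state=0).fit(X, y))
    node = Node(features, threshold)
    node.left = build(X[mask], y[mask], depth - 1, min_samples)
    node.right = build(X[~mask], y[~mask], depth - 1, min_samples)
    return node

def predict(node, X):
    if node.leaf is not None:
        return node.leaf.predict(X)
    mask = goes_left(X, node.features, node.threshold)
    out = np.empty(len(X), dtype=int)
    if mask.any():
        out[mask] = predict(node.left, X[mask])
    if (~mask).any():
        out[~mask] = predict(node.right, X[~mask])
    return out

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
tree = build(X_train, y_train, depth=3)
pred = predict(tree, X_test)
print(f"Accuracy: {(pred == y_test).mean():.3f}")
```

Each leaf classifier is fitted on exactly the samples that the routing rule would send to it, and whole batches are routed with boolean masks instead of one image at a time.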


#NNR
Nearest Neighbors (NN): As a lazy learning algorithm, NN stores all instances in the training set and classifies new instances by their similarity to the nearest stored neighbors. It is effective when the data distribution is unknown or changes frequently, offering simplicity and adaptability in classification tasks. Note that the cells in this section train a small feed-forward neural network on MNIST rather than a nearest-neighbors classifier.
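
For the nearest-neighbors method as described, a minimal sketch with scikit-learn's `KNeighborsClassifier` might look like the following. The choice of `load_digits` and `n_neighbors=3` is an assumption for illustration and not a setting taken from this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fitting only stores (and possibly indexes) the training data;
# the distance computations happen at prediction time
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Each test image gets the majority label among its 3 closest training images
predicted = knn.predict(X_test)
knn_accuracy = accuracy_score(y_test, predicted)
print(f"Accuracy: {knn_accuracy:.3f}")
```

Because every prediction compares the query against stored training samples, prediction cost grows with the size of the training set, which is the scalability concern mentioned in the observations.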

In [None]:
import tensorflow as tf

# Load the MNIST dataset (28x28 grayscale digit images)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to range [0, 1]
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# Expand dimensions to add a channel dimension (for grayscale images)
train_images = train_images[..., tf.newaxis]
test_images = test_images[..., tf.newaxis]

# Convert labels to one-hot encoding
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)

# Build a simple neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),  # Input layer, flattens input (28x28) for each image
    tf.keras.layers.Dense(128, activation='relu'),  # Hidden layer with 128 neurons and ReLU activation
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer with 10 neurons (for 10 classes) and softmax activation
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=5, batch_size=32, validation_data=(test_images, test_labels))

# Evaluate the model on the test data
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_accuracy}')


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy: 0.9765999913215637


In [None]:
import numpy as np
from sklearn.metrics import confusion_matrix

# Assuming 'model' is the trained neural network from the previous cell
# Predict class probabilities for the test data
predicted_probs = model.predict(test_images)
# Convert predicted probabilities to class numbers
predicted_classes = np.argmax(predicted_probs, axis=1)
# Convert one-hot true labels to class numbers
true_classes = np.argmax(test_labels, axis=1)

# Generate confusion matrix
conf_matrix = confusion_matrix(true_classes, predicted_classes)
print("Confusion Matrix:")
print(conf_matrix)


Confusion Matrix:
[[ 968    0    0    3    0    2    1    3    1    2]
 [   0 1131    2    1    0    1    0    0    0    0]
 [   6    4  987    6    9    1    2    6   11    0]
 [   0    0    1 1001    0    2    0    3    2    1]
 [   1    0    0    0  970    0    2    2    0    7]
 [   2    0    0   15    1  868    1    1    3    1]
 [   4    5    0    1   16    6  923    0    3    0]
 [   0    7    5    9    0    0    0  999    3    5]
 [   4    1    1    8    8    7    0    4  937    4]
 [   1    5    0    5    6    3    0    6    1  982]]


In [None]:
from sklearn.metrics import classification_report

# Make predictions
predictions = model.predict(test_images)
predicted_labels = tf.argmax(predictions, axis=1)
true_labels = tf.argmax(test_labels, axis=1)

# Generate classification report
class_report = classification_report(true_labels, predicted_labels)
print(class_report)


              precision    recall  f1-score   support

           0       0.98      0.99      0.98       980
           1       0.98      1.00      0.99      1135
           2       0.99      0.96      0.97      1032
           3       0.95      0.99      0.97      1010
           4       0.96      0.99      0.97       982
           5       0.98      0.97      0.97       892
           6       0.99      0.96      0.98       958
           7       0.98      0.97      0.97      1028
           8       0.98      0.96      0.97       974
           9       0.98      0.97      0.98      1009

    accuracy                           0.98     10000
   macro avg       0.98      0.98      0.98     10000
weighted avg       0.98      0.98      0.98     10000



#XGBoost (Gradient boosting image classification)
Gradient Boosting: This ensemble learning technique builds a robust model by sequentially adding weak models (usually decision trees) and correcting errors made by prior models. Its iterative nature and focus on reducing errors make it a powerful tool in predictive modeling, particularly prized for its high accuracy and performance across various domains.
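
The sequential error correction can be sketched from scratch for squared-error regression, where each new weak learner is fitted to the residuals of the current ensemble. This is a simplified illustration and not how XGBoost is implemented; the synthetic data, the depth-1 trees, and the learning rate of 0.1 are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (illustrative data only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
trees, mse_history = [], []

for _ in range(50):
    # Residuals are proportional to the negative gradient of the squared error
    residuals = y - prediction
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    # Add a damped correction from the new weak learner
    prediction += learning_rate * stump.predict(X)
    trees.append(stump)
    mse_history.append(float(np.mean((y - prediction) ** 2)))

print(f"MSE after 1 round: {mse_history[0]:.4f}")
print(f"MSE after 50 rounds: {mse_history[-1]:.4f}")
```

The training error keeps shrinking as rounds are added, which is also why the number of rounds and the learning rate need tuning against held-out data.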

In [None]:
import tensorflow as tf
import xgboost as xgb
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

# Load the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Flatten each 28x28 image into a 784-element feature vector
train_images = train_images.reshape((train_images.shape[0], -1))
test_images = test_images.reshape((test_images.shape[0], -1))

# Normalize pixel values to range [0, 1]
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# Initialize XGBoost classifier
model = xgb.XGBClassifier()

# Hold out 20% of the training images for evaluation (the official MNIST test set is not used here)
X_train, X_test, y_train, y_test = train_test_split(train_images, train_labels, test_size=0.2, random_state=42)

# Train the model
model.fit(X_train, y_train)

# Predict on the test set
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")

# Generate confusion matrix
conf_matrix = confusion_matrix(y_test, predictions)
print("Confusion Matrix:")
print(conf_matrix)

# Generate classification report
class_report = classification_report(y_test, predictions)
print("Classification Report:")
print(class_report)


Accuracy: 0.977
Confusion Matrix:
[[1156    0    2    2    4    0    1    2    8    0]
 [   0 1307    8    4    1    0    1    0    1    0]
 [   0    4 1151    2    4    0    0    8    3    2]
 [   1    0   12 1172    1    8    0    6    9   10]
 [   2    0    1    1 1149    0    1    3    0   19]
 [   3    1    2    8    2 1077    2    0    7    2]
 [   3    0    0    1    2    6 1160    0    5    0]
 [   0    8   14    1    2    0    0 1263    3    8]
 [   1    0    7    4    4    4    2    1 1134    3]
 [   2    3    3    3   11    2    2   11    2 1155]]
Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.98      0.99      1175
           1       0.99      0.99      0.99      1322
           2       0.96      0.98      0.97      1174
           3       0.98      0.96      0.97      1219
           4       0.97      0.98      0.98      1176
           5       0.98      0.98      0.98      1104
           6       0.99      0.99

#voting classification
Voting Classifier: By combining multiple individual classifiers, the voting classifier aggregates predictions either by majority vote over the predicted class labels (hard voting) or by averaging the predicted class probabilities, optionally weighted (soft voting). This technique can improve predictive performance by leveraging the diversity of multiple models, which tends to reduce variance and improve robustness.
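
The two aggregation rules can be sketched in a few lines of NumPy. The helper names `hard_vote` and `soft_vote` and the toy predictions are hypothetical; they only illustrate the idea and are not scikit-learn's implementation.

```python
import numpy as np

def hard_vote(label_predictions):
    """Majority vote over integer class labels, one row per classifier."""
    label_predictions = np.asarray(label_predictions)
    n_classes = label_predictions.max() + 1
    # Count votes per class for each sample; argmax sends ties to the lowest label
    counts = np.stack(
        [np.bincount(col, minlength=n_classes) for col in label_predictions.T],
        axis=1,
    )
    return counts.argmax(axis=0)

def soft_vote(probabilities, weights=None):
    """Average class probabilities of shape (n_classifiers, n_samples, n_classes)."""
    averaged = np.average(np.asarray(probabilities), axis=0, weights=weights)
    return averaged.argmax(axis=1)

# Three classifiers, three samples: each column is one sample's votes
labels = [[0, 1, 2],
          [0, 2, 2],
          [1, 1, 0]]
hard_result = hard_vote(labels)

# Three classifiers, one sample, two classes: one confident model against two lukewarm ones
probs = [[[0.9, 0.1]],
         [[0.4, 0.6]],
         [[0.4, 0.6]]]
soft_result = soft_vote(probs)
hard_from_probs = hard_vote(np.argmax(probs, axis=2))

print("hard vote:", hard_result)
print("soft vote:", soft_result, "vs hard vote on the same models:", hard_from_probs)
```

In the second example the two rules disagree: two of the three models prefer class 1, but the averaged probabilities favor class 0 because one model is much more confident.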

In [None]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Load the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Flatten each 28x28 image into a 784-element feature vector
train_images = train_images.reshape((train_images.shape[0], -1))
test_images = test_images.reshape((test_images.shape[0], -1))

# Normalize pixel values to range [0, 1]
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# Initialize individual classifiers
# LogisticRegression uses its default max_iter here, which produces the convergence warning in the output
log_clf = LogisticRegression()
tree_clf = DecisionTreeClassifier()
rf_clf = RandomForestClassifier()

# Create a voting classifier combining different classifiers
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('tree', tree_clf), ('rf', rf_clf)],
    voting='hard'  # Majority vote on labels; use 'soft' to average predicted class probabilities
)

# Hold out 20% of the training images for evaluation (the official MNIST test set is not used here)
X_train, X_test, y_train, y_test = train_test_split(train_images, train_labels, test_size=0.2, random_state=42)

# Train the voting classifier
voting_clf.fit(X_train, y_train)

# Predict on the test set
predictions = voting_clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")

# Generate confusion matrix
conf_matrix = confusion_matrix(y_test, predictions)
print("Confusion Matrix:")
print(conf_matrix)

# Generate classification report
class_report = classification_report(y_test, predictions)
print("Classification Report:")
print(class_report)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Accuracy: 0.953
Confusion Matrix:
[[1158    0    5    1    3    1    0    0    6    1]
 [   1 1308    6    1    2    1    1    1    1    0]
 [   7   13 1120    3    4    3    2    8   10    4]
 [   6    5   27 1135    1   26    1    5    4    9]
 [   2    5    3    2 1130    0    7    3    2   22]
 [  15    7    8   26    8 1025    4    1    6    4]
 [  12    3    4    1    5   10 1139    0    3    0]
 [   0   17   17    6    9    1    1 1240    1    7]
 [  11   13   16   21   10   14    3    1 1068    3]
 [   5    8    8    9   18    8    1   18    6 1113]]
Classification Report:
              precision    recall  f1-score   support

           0       0.95      0.99      0.97      1175
           1       0.95      0.99      0.97      1322
           2       0.92      0.95      0.94      1174
           3       0.94      0.93      0.94      1219
           4       0.95      0.96      0.96      1176
           5       0.94      0.93      0.93      1104
           6       0.98      0.97