<a href="https://colab.research.google.com/github/shivani-202/Deep-Learning-Assignment/blob/main/perceptron.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1) Design a single unit perceptron for classification of a linearly separable binary dataset without using pre-defined models.

2) Use the Perceptron from sklearn.
Identify the problem with single unit Perceptron.

* Classify using OR-, And- and XOR-ed data and analyze the result.

* Multiclass  classification task: Classify MNIST dataset  and analyze the result

In [None]:
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.1, max_epochs=100):
        self.learning_rate = learning_rate
        self.max_epochs = max_epochs
        self.weights = None
        self.bias = None

    def step_function(self, x):
        return 1 if x >= 0 else 0

    def fit(self, X, y):
        # Initialize weights and bias
        self.weights = np.zeros(X.shape[1])
        self.bias = 0

        for epoch in range(self.max_epochs):
            for i in range(len(X)):
                # Compute linear combination
                linear_output = np.dot(X[i], self.weights) + self.bias

                # Applying step function
                y_pred = self.step_function(linear_output)

                # Calculating error
                error = y[i] - y_pred

                # Updating weights and bias
                self.weights += self.learning_rate * error * X[i]
                self.bias += self.learning_rate * error

    def predict(self, X):
        linear_output = np.dot(X, self.weights) + self.bias
        return np.array([self.step_function(x) for x in linear_output])

# Defining datasets
AND_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
AND_labels = np.array([0, 0, 0, 1])

OR_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
OR_labels = np.array([0, 1, 1, 1])

XOR_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
XOR_labels = np.array([0, 1, 1, 0])

# Training perceptron on AND dataset
perceptron = Perceptron(learning_rate=0.1, max_epochs=10)
print("Training on AND dataset:")
perceptron.fit(AND_data, AND_labels)
AND_predictions = perceptron.predict(AND_data)
print("Predictions:", AND_predictions)
print("Actual Labels:", AND_labels)

# Training perceptron on OR dataset
print("\nTraining on OR dataset:")
perceptron.fit(OR_data, OR_labels)
OR_predictions = perceptron.predict(OR_data)
print("Predictions:", OR_predictions)
print("Actual Labels:", OR_labels)

# Training perceptron on XOR dataset
print("\nTraining on XOR dataset:")
perceptron.fit(XOR_data, XOR_labels)
XOR_predictions = perceptron.predict(XOR_data)
print("Predictions:", XOR_predictions)
print("Actual Labels:", XOR_labels)

Training on AND dataset:
Predictions: [0 0 0 1]
Actual Labels: [0 0 0 1]

Training on OR dataset:
Predictions: [0 1 1 1]
Actual Labels: [0 1 1 1]

Training on XOR dataset:
Predictions: [1 1 0 0]
Actual Labels: [0 1 1 0]


It correctly classify AND and OR datasets (linearly separable).
It fails on the XOR dataset (non-linearly separable).

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score, classification_report


print("MNIST dataset...")
data = pd.read_csv('/content/sample_data/mnist_train_small.csv')

data_test = pd.read_csv('/content/sample_data/mnist_test.csv')

data.head(), data_test.head()

MNIST dataset...


(   6  0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  ...  0.581  0.582  0.583  \
 0  5  0    0    0    0    0    0    0    0    0  ...      0      0      0   
 1  7  0    0    0    0    0    0    0    0    0  ...      0      0      0   
 2  9  0    0    0    0    0    0    0    0    0  ...      0      0      0   
 3  5  0    0    0    0    0    0    0    0    0  ...      0      0      0   
 4  2  0    0    0    0    0    0    0    0    0  ...      0      0      0   
 
    0.584  0.585  0.586  0.587  0.588  0.589  0.590  
 0      0      0      0      0      0      0      0  
 1      0      0      0      0      0      0      0  
 2      0      0      0      0      0      0      0  
 3      0      0      0      0      0      0      0  
 4      0      0      0      0      0      0      0  
 
 [5 rows x 785 columns],
    7  0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  ...  0.658  0.659  0.660  \
 0  2  0    0    0    0    0    0    0    0    0  ...      0      0      0   
 1  1  0    0    0    0

In [4]:
y_train = data.iloc[:, 0].values  # Training labels
X_train = data.iloc[:, 1:].values  # Training features
y_test = data_test.iloc[:, 0].values  # Test labels
X_test = data_test.iloc[:, 1:].values  # Test features

# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize Perceptron model
print("Training Perceptron...")
perceptron = Perceptron(max_iter=1000, eta0=1.0, random_state=42)
perceptron.fit(X_train, y_train)


y_pred = perceptron.predict(X_test)

# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
print("\nAccuracy on MNIST test set:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Training Perceptron...

Accuracy on MNIST test set: 0.8772877287728773

Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.95      0.95       980
           1       0.92      0.98      0.95      1135
           2       0.90      0.86      0.88      1032
           3       0.84      0.90      0.87      1010
           4       0.85      0.90      0.88       982
           5       0.84      0.80      0.82       892
           6       0.92      0.90      0.91       958
           7       0.85      0.91      0.88      1027
           8       0.83      0.80      0.82       974
           9       0.87      0.74      0.80      1009

    accuracy                           0.88      9999
   macro avg       0.88      0.88      0.87      9999
weighted avg       0.88      0.88      0.88      9999



### **Summary of Analysis**
Accuracy: **87.7%**, with good performance on simple, linearly separable digits like '0', '1', and '6'. It struggled with '5', '8' and '9' due to structural similarities and confusion.

**Strengths**:
   - High precision and recall for clearly defined digits ('0', '1').
   - Weighted and macro averages of **0.88** indicates consistent performance across classes.

**Weaknesses**:
   - Struggled with non-linearly separable digits like '5', '8', and '9' due to the perceptron's inability to learn non-linear decision boundaries.
   - Lower recall for digits with overlapping features, e.g., '9' (recall: **0.74**) and '8'.

  The perceptron performed well for linearly separable data but is limited for complex datasets like MNIST.
  We can use a non-linear model such as a multi-layer perceptron (MLP) or a convolutional neural network (CNN).

In [5]:
# Multilayer Perceptron

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report


y_train = data.iloc[:, 0].values  # Training labels
X_train = data.iloc[:, 1:].values  # Training features
y_test = data_test.iloc[:, 0].values  # Test labels
X_test = data_test.iloc[:, 1:].values  # Test features

# Standardization
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initializing MLPClassifier
print("Training Multi-Layer Perceptron...")
mlp = MLPClassifier(hidden_layer_sizes=(128, 64),  # Two hidden layers with 128 and 64 neurons
                     activation='relu',           # ReLU activation function
                     solver='adam',               # Adam optimizer
                     max_iter=100,                # Maximum number of iterations
                     random_state=42)

mlp.fit(X_train, y_train)

y_pred = mlp.predict(X_test)

# Evaluating performance
accuracy = accuracy_score(y_test, y_pred)
print("\nAccuracy on MNIST test set:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))


Training Multi-Layer Perceptron...

Accuracy on MNIST test set: 0.9657965796579658

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.98      0.98       980
           1       0.98      0.99      0.99      1135
           2       0.96      0.96      0.96      1032
           3       0.96      0.97      0.96      1010
           4       0.97      0.96      0.96       982
           5       0.96      0.96      0.96       892
           6       0.97      0.97      0.97       958
           7       0.96      0.97      0.96      1027
           8       0.96      0.94      0.95       974
           9       0.96      0.96      0.96      1009

    accuracy                           0.97      9999
   macro avg       0.97      0.97      0.97      9999
weighted avg       0.97      0.97      0.97      9999



**Summary of MLP Analysis**
Performance: Achieved 96.6% accuracy, with consistently high precision, recall, and F1-scores (~0.97) across all classes.


*   Multilayer Perceptron effectively handles non-linear decision boundaries, resolving limitations of the single-layer perceptron. It performs exceptionally well for clear and distinct digits like '0', '1', and '6'.
*   Slight misclassification for visually similar digits like '8' and '9'.
*   Higher computational complexity compared to simpler models.

**Conclusion**: The MLP is a robust model for MNIST, offering significant improvements in accuracy and generalization. Further enhancements could involve deeper architectures like CNNs.

In [6]:
# Increasing number of layers in multilayer perceptron

y_train = data.iloc[:, 0].values
X_train = data.iloc[:, 1:].values
y_test = data_test.iloc[:, 0].values
X_test = data_test.iloc[:, 1:].values
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize MLPClassifier
print("Training Multi-Layer Perceptron...")
mlp = MLPClassifier(hidden_layer_sizes=(256, 128, 64, 32),  # Added more layers: 256 -> 128 -> 64 -> 32 neurons
                     activation='relu',                   # ReLU activation function
                     solver='adam',                       # Adam optimizer
                     max_iter=100,                        # Maximum number of iterations
                     random_state=42)

mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("\nAccuracy on MNIST test set:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Training Multi-Layer Perceptron...

Accuracy on MNIST test set: 0.9670967096709671

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.98      0.98       980
           1       0.99      0.99      0.99      1135
           2       0.96      0.97      0.96      1032
           3       0.96      0.97      0.96      1010
           4       0.97      0.97      0.97       982
           5       0.97      0.96      0.96       892
           6       0.97      0.97      0.97       958
           7       0.96      0.96      0.96      1027
           8       0.96      0.95      0.95       974
           9       0.97      0.96      0.96      1009

    accuracy                           0.97      9999
   macro avg       0.97      0.97      0.97      9999
weighted avg       0.97      0.97      0.97      9999



**4-Layer vs 2-Layer MLP**

1. **Accuracy**:
   - **4-Layer MLP**: 96.7%
   - **2-Layer MLP**: 96.6%
   - The performance difference is minimal (0.1% improvement with 4 layers).

2. **Precision, Recall, F1-Score**:
   - Both models showed strong results (~0.96-0.97), with the 4-layer model slightly outperforming the 2-layer model in handling complex digits (e.g., '8' and '9').

3. **Computational Cost**:
   - The 4-layer MLP has slightly higher computational cost, with a small improvement in accuracy.

4. **Conclusion**:
   - The 4-layer MLP performs marginally better, but both architectures are highly effective for MNIST. Adding layers provides only small gains, and more advanced models (e.g., CNNs) may yield larger improvements.