**Assignment 1: Perceptron**

*CPSC 381/581: Machine Learning*

*Yale University*

*Instructor: Alex Wong*

*Student: Hailey Robertson, hdr22*


**Prerequisites**:

1. Enable Google Colaboratory as an app on your Google Drive account

2. Create a new Google Colab notebook, this will also create a "Colab Notebooks" directory under "MyDrive" i.e.
```
/content/drive/MyDrive/Colab Notebooks
```

3. Create the following directory structure in your Google Drive
```
/content/drive/MyDrive/Colab Notebooks/CPSC 381-581: Machine Learning/Assignments
```

4. Move the 01_assignment_perceptron_multiclass.ipynb into
```
/content/drive/MyDrive/Colab Notebooks/CPSC 381-581: Machine Learning/Assignments
```
so that its absolute path is
```
/content/drive/MyDrive/Colab Notebooks/CPSC 381-581: Machine Learning/Assignments/01_assignment_perceptron.ipynb
```

In this assignment, you will implement both binary and multiclass perceptron classifiers from scratch.
You will test your implementations on the digits dataset from scikit-learn. The assignment is divided
into three main parts:

1. Implementing a binary perceptron for digit classification (0 vs 1)
2. Implementing a multiclass perceptron for full digits classification (0-9)
3. Comparing your implementations with scikit-learn's Perceptron


**Submission**:

1. Implement all TODOs in the code blocks below.

2. Report your validation and testing scores. For full credit, your testing scores should be higher than 0.9.

```
Report validation and testing scores here.
```

3. List any collaborators.

```
Collaborators: None.

Collaboration details: N/A.
```

In [15]:
import numpy as np
import sklearn.datasets as skdata
from sklearn.linear_model import Perceptron
import sklearn.metrics as skmetrics
from sklearn.model_selection import train_test_split
import warnings, time
import matplotlib.pyplot as plt

warnings.filterwarnings(action='ignore')
np.random.seed(42)

In [None]:
class BinaryPerceptron:
    '''
    Implementation of Binary Perceptron
    '''

    def __init__(self):
        self.__weights = None

    def __update(self, x, y):
        '''
        Update weights for misclassified examples

        Arg(s):
            x : numpy.ndarray
                Feature vector of shape d x 1
            y : int
                Label/target (-1 or 1)
        '''

        # DONE: Implement weight update rule for binary perceptron
        self.__weights += x * y


    def fit(self, x, y, max_iter=100):
        '''
        Fit the binary perceptron to training data

        Arg(s):
            x : numpy.ndarray
                Features of shape d x N
            y : numpy.ndarray
                Labels/targets of shape 1 x N
            max_iter : int
                Maximum number of iterations
        '''

        n_features, n_samples = x.shape

        # DONE: Initialize weights (including a bias term, w0) as zeros vector with shape d+1 x 1
        self.__weights = np.zeros((n_features + 1, 1))

        # DONE: Append artificial coordinate (x0) to the data
        # Use ones to shift decision boundary
        x = np.concatenate((np.ones((1, n_samples)), x), axis=0)

        # DONE: Implement training loop

        for _ in range(max_iter):
            n_updates = 0

            # Process each sample
            for n in range(n_samples):
                # DONE: Calculate prediction
                x_n = x[:, n:n+1]
                y_n = y[0, n]

                prediction = np.sign(np.matmul(self.__weights.T, x_n))

                if prediction == 0:
                    prediction = 1
                else:
                    prediction

                # DONE: Update weights if misclassified
                if prediction != y_n:
                    self.__update(x_n, y_n)
                    n_updates += 1

            # DONE: Break if no updates were made, e.g., check for convergence
            if n_updates == 0:
                break


    def predict(self, x):
        '''
        Make predictions

        Arg(s):
            x : numpy.ndarray
                Features of shape d x N

        Returns:
            numpy.ndarray : Predicted labels (-1 or 1) of 1 x N
        '''

        n_features, n_samples = x.shape

        # DONE: Append artificial coordinate (x0) to the data
        x = np.concatenate((np.ones((1, n_samples)), x), axis=0)

        # DONE: Implement prediction logic
        predictions = np.sign(np.matmul(self.__weights.T, x))
        if predictions == 0:
            predictions = 1
        else:
            predictions
            
        return predictions

    def score(self, x, y):
        '''
        Calculate prediction accuracy

        Arg(s):
            x : numpy.ndarray
                Features of shaped d x N
            y : numpy.ndarray
                Labels/targets of shape 1 x N

        Returns:
            float: Accuracy score
        '''

        # DONE: Implement accuracy calculation
        predictions = self.predict(x)
        accuracy = np.where(predictions == y, 1, 0)
        mean_accuracy = np.mean(accuracy)
        return mean_accuracy



In [22]:
class MulticlassPerceptron:
    '''
    Implementation of Multiclass Perceptron using one-vs-rest strategy
    '''

    def __init__(self):
        self.__weights = None
        self.__n_classes = None

    def __update(self, x, y, y_hat):
        '''
        Update weights for misclassified examples

        Arg(s):
            x : numpy.ndarray
                Feature vector of shape d x 1
            y : int
                Label/target (-1 or 1)
            y_hat : int
                Predicted label (-1 or 1)
        '''

        # DONE: Implement weight update rule for multiclass case
        self.__weights[:, y] += x.flatten()
        self.__weights[:, y_hat] -= x.flatten()


    def fit(self, x, y, max_iter=100):
        '''
        Fit the multiclass perceptron to training data

        Arg(s):
            x : numpy.ndarray)
                Feature vector of shape d x N
            y : numpy.ndarray
                Label/target (-1 or 1) of shape 1 x N
            max_iter : int
                Maximum number of iterations
        '''

        n_features, n_samples = x.shape

        # DONE: Get number of classes from unique values in y
        self.__n_classes = len(np.unique(y))

        # DONE: Initialize weights matrix of zeros with shape d+1 x C
        self.__weights = np.zeros((n_features + 1, self.__n_classes))

        # DONE: Append artificial coordinate (x0) to the data such that it is d+1 x N
        x = np.concatenate((np.ones((1, n_samples)), x), axis=0)

        # DONE: Implement training loop
        for _ in range(max_iter):
            n_updates = 0

            # Process each sample
            for n in range(n_samples):
                x_n = x[:, n:n+1]
                y_n = y[0, n]

                # DONE: Calculate scores and make prediction for each class
                scores = np.matmul(self.__weights.T, x_n).flatten()
                y_hat = np.argmax(scores)

                # Update if prediction is wrong
                if y_hat != y_n:
                    self.__update(x_n, y_n, y_hat)
                    n_updates += 1

            # DONE: Break if no updates were made, e.g., check for convergence
            if n_updates == 0:
                break


    def predict(self, x):
        '''
        Make predictions on new data

        Arg(s):
            x : numpy.ndarray
                Features of shape d x N

        Returns:
            numpy.ndarray : Predicted class labels
        '''

        n_features, n_samples = x.shape

        # DONE: Append artificial coordinate (x0) to the data
        x = np.concatenate((np.ones((1, n_samples)), x), axis=0)

        scores = np.matmul(self.__weights.T, x)
        predictions = np.argmax(scores, axis=0)

        # DONE: Implement prediction logic for multiclass case
        return predictions.reshape(1, -1)

    def score(self, x, y):
        '''
        Calculate prediction accuracy

        Arg(s):
            x : numpy.ndarray
                Features of shape d x N
            y : numpy.ndarray
                Label/target (-1 or 1) of shape 1 x N

        Returns:
            float : Accuracy score
        '''

        # DONE: Implement accuracy calculation
        predictions = self.predict(x)
        accuracy = np.where(predictions == y, 1, 0)
        mean_accuracy = np.mean(accuracy)
        return mean_accuracy

In [23]:
def prepare_binary_digits_data(digits_zero=0, digits_one=1):
    '''
    Prepare binary classification dataset from digits

    Args:
        digits_zero : int
            First digit to classify
        digits_one : int
            Second digit to classify

    Returns:
        tuple: (X_train, y_train, X_val, y_val, X_test, y_test)
            X_train : N x d
            y_train : N x 1
            X_val : M x d
            y_val : M x 1
            X_test : P x d
            y_test : P x 1
    '''

    # Load digits dataset using sklearn.datasets
    digits = skdata.load_digits()

    # Select only the two specified digits
    mask = np.isin(digits.target, [digits_zero, digits_one])
    X = digits.data[mask]
    y = digits.target[mask]

    # Convert labels to -1/1
    y = np.where(y == digits_zero, -1, 1)

    # Split into train (60%), validation (20%), and test (20%) sets using random_state=42
    X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

    return X_train, np.expand_dims(y_train, axis=-1), X_val, np.expand_dims(y_val, axis=-1), X_test, np.expand_dims(y_test, axis=-1)



In [24]:
def prepare_multiclass_digits_data():
    '''
    Prepare multiclass classification dataset from digits

    Returns:
        tuple: (X_train, y_train, X_val, y_val, X_test, y_test)
            X_train : N x d
            y_train : N x 1
            X_val : M x d
            y_val : M x 1
            X_test : P x d
            y_test : P x 1
    '''

    # Load digits dataset using sklearn.datasets
    digits = skdata.load_digits()
    X, y = digits.data, digits.target

    # Split into train (60%), validation (20%), and test (20%) sets with random_state=42
    X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

    return X_train, np.expand_dims(y_train, axis=-1), X_val, np.expand_dims(y_val, axis=-1), X_test, np.expand_dims(y_test, axis=-1)



In [25]:
# Binary classification experiment
print("Binary Classification Experiment (0 vs 1)")
print("-" * 50)

labels = [0, 1]

# Load and prepare binary data (0 vs 1)
X_train, y_train, X_val, y_val, X_test, y_test = prepare_binary_digits_data(0, 1)

# Try different max_iter values
max_iters = [10, 50, 100]
best_val_score = 0
best_model = None


for max_iter in max_iters:
    # DONE: Initialize and train binary perceptron
    model = BinaryPerceptron()
    model.fit(X_train, y_train, max_iter=max_iter)

    # DONE: Calculate validation score
    val_score = model.score(X_val, y_val)
    print("Max iterations: {}, Validation accuracy: {:.4f}".format(max_iter, val_score))

    # DONE: Update best_model if current model performs better
    if val_score > best_val_score:
        best_val_score = val_score
        best_model = model


# DONE: Test best model on test set
test_score = best_model.score(X_test, y_test)
print("\nBest model test accuracy: {:.4f}".format(test_score))

# DONE: Create a confusion matrix using skmetrics.confusion_matrix for your model on the test set
conf_matrix = skmetrics.confusion_matrix(y_test.flatten(), best_model.predict(X_test).flatten())

# Show confusion matrix
plt.show()
time.sleep(1)

# DONE: Compare with scikit-learn implementation by training with max_iter=10 and random_state=42 and testing on the test set
sk_model = Perceptron(max_iter=10, random_state=42)
sk_model.fit(X_train.T, y_train.flatten())

sk_score = skmetrics.accuracy_score(y_test.flatten(), sk_model.predict(X_test.T))
print("Scikit-learn Perceptron test accuracy: {:.4f}".format(sk_score))

# DONE: Create a confusion matrix using skmetrics.confusion_matrix for scikit model on the test set
sk_conf_matrix = skmetrics.confusion_matrix(y_test.flatten(), sk_model.predict(X_test.T))

# Show confusion matrix
plt.show()
time.sleep(1)


print("\nMulticlass Classification Experiment (0-9)")
print("-" * 50)

labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Load and prepare multiclass data
X_train, y_train, X_val, y_val, X_test, y_test = prepare_multiclass_digits_data()

# Try different max_iter values
max_iters = [10, 50, 100]
best_val_score = 0
best_model = None

for max_iter in max_iters:
    # DONE: Initialize and train multiclass perceptron
    model = MulticlassPerceptron(max_iter=max_iter, random_state=42)
    model.fit(X_train, y_train)

    # DONE: Calculate validation score
    val_score = skmetrics.accuracy_score(y_val, model.predict(X_val))
    print("Max iterations: {}, Validation accuracy: {:.4f}".format(max_iter, val_score))

    # DONE: Update best_model if current model performs better
    if val_score > best_val_score:
        best_val_score = val_score
        best_model = model


# DONE: Test best model on test set
test_score = skmetrics.accuracy_score(y_test, best_model.predict(X_test))
print("\nBest model test accuracy: {:.4f}".format(test_score))

# DONE: Create a confusion matrix using skmetrics.confusion_matrix for your model on the test set
conf_matrix = skmetrics.confusion_matrix(y_test, best_model.predict(X_test))


# Show confusion matrix
plt.show()
time.sleep(1)

# DONE: Compare with scikit-learn implementation by training with max_iter=10 and random_state=42 and testing on the test set
sk_model = Perceptron(max_iter=10, random_state=42)
sk_model.fit(X_train, y_train)

sk_score = skmetrics.accuracy_score(y_test, sk_model.predict(X_test))
print("Scikit-learn Perceptron test accuracy: {:.4f}".format(sk_score))

# TODO: Create a confusion matrix using skmetrics.confusion_matrix for your model on the test set
sk_conf_matrix = skmetrics.confusion_matrix(y_test, sk_model.predict(X_test))

# Show confusion matrix
plt.show()
time.sleep(1)

Binary Classification Experiment (0 vs 1)
--------------------------------------------------


ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 73 is different from 217)