# **Introduction to AI - Lab 13**

## **Neural Networks and Perceptrons**

### Perceptron Training: Loss Function and Gradient Descent

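The perceptron is trained using a simple loss function known as the perceptron criterion. This criterion penalizes only misclassified samples, in proportion to how far they lie on the wrong side of the decision boundary. With labels encoded as $y \in \{-1, +1\}$, the loss for a single sample is defined as: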

$
\text{Loss} = -y(x \cdot W)
$

if the sample is misclassified, where $y$ is the true label, $x$ is the input vector, and $W$ is the weight vector. If the sample is correctly classified, the loss is zero.
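The definition above can be sketched in a few lines of NumPy. This is only an illustration: the helper name `perceptron_loss` and the example vectors are made up for this note, and the labels are assumed to be encoded as $-1$ or $+1$.

```python
import numpy as np

def perceptron_loss(w, x, y):
    """Perceptron criterion for one sample; y is assumed to be -1 or +1."""
    margin = y * np.dot(x, w)
    # A positive margin means the sample is on the correct side: no loss.
    return max(0.0, -margin)

w = np.array([0.5, -1.0])   # hypothetical weights
x = np.array([2.0, 3.0])    # x . w = 1.0 - 3.0 = -2.0

loss_pos = perceptron_loss(w, x, +1)  # misclassified sample
loss_neg = perceptron_loss(w, x, -1)  # correctly classified sample
print(loss_pos, loss_neg)
```

The misclassified sample receives a loss equal to the size of its negative margin, while the correctly classified one contributes nothing.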

The weights are trained by gradient descent on this loss. When labels and predictions are encoded as 0 or 1, as in the code in this lab, the resulting update rule is:

$
W = W + \eta \cdot (y - \hat{y}) \cdot x
$

where:
- $\eta$ is the learning rate,
- $y$ is the true label,
- $\hat{y}$ is the predicted label,
- $x$ is the input vector.

Because $y - \hat{y}$ is zero for a correct prediction, only misclassified samples change the weights. Each update moves the decision boundary so that the misclassified sample is closer to the correct side.
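As a small worked example of this rule, the sketch below applies two consecutive updates to the same sample. The function name `perceptron_step` and the numbers are hypothetical, and labels are assumed to be 0 or 1 with a constant 1 prepended to the input for the bias.

```python
import numpy as np

def perceptron_step(w, x, y, lr=0.1):
    """One perceptron update for a sample with label y in {0, 1}."""
    y_hat = 1 if np.dot(w, x) >= 0 else 0
    return w + lr * (y - y_hat) * x

w = np.zeros(3)
x = np.array([1.0, 2.0, -1.0])  # first entry is the constant bias input

# w . x = 0, so the prediction is 1; the true label is 0, so the error is -1.
w_new = perceptron_step(w, x, y=0)

# Now w_new . x is negative, the prediction is 0, and nothing changes.
w_same = perceptron_step(w_new, x, y=0)
print(w_new, w_same)
```

The first call moves the weights against `x`; the second call leaves them untouched because the sample is now classified correctly.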


### Motivation
In this lab, we will explore the basics of neural networks, focusing on perceptrons. We will build and train a perceptron to perform binary classification on a simple dataset.

### Components of a Neural Network
- **Neurons:** Basic units of the network that perform computations.
- **Activation Functions:** Functions applied to a neuron's weighted input sum to introduce non-linearity.
- **Layers:** Stacked neurons that form the architecture of the network.
- **Weights:** Parameters that are adjusted during training to minimize the error.
- **Biases:** Additional parameters that allow the activation function to be shifted left or right.

### Perceptron
A perceptron is the simplest type of neural network, consisting of a single layer of output nodes connected to the input nodes. It is a linear classifier used for binary classification tasks.
- **Activation Function:** The perceptron uses a step function as its activation function: it outputs 1 when the weighted sum of the inputs is at least zero and 0 otherwise.
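To make the step activation concrete, here is a minimal sketch of a perceptron forward pass that reproduces the logical AND of two binary inputs. The weights and bias are hand-picked for illustration and are not learned.

```python
import numpy as np

def step(z):
    """Step activation: 1 where z >= 0, else 0."""
    return np.where(z >= 0, 1, 0)

# All four combinations of two binary inputs.
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked (hypothetical) parameters: the weighted sum is
# non-negative only when both inputs are 1.
w = np.array([1.0, 1.0])
b = -1.5

outputs = step(X_and @ w + b)
print(outputs)
```

The bias shifts the threshold so that a single active input is not enough to fire the neuron.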

## **Task 1:** **Implementing a Perceptron**
In this task, we will implement a simple perceptron from scratch and use it to classify data points from the Moons dataset, generated with scikit-learn's `make_moons`.

In [None]:
import numpy as np
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

# Generate dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=42)

# Plot the dataset
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
plt.title('Moons Dataset')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

### Perceptron Model
We will define a perceptron model with a simple training algorithm.

In [None]:
class Perceptron:
    def __init__(self, input_size, lr=0.1, epochs=1000):
        self.W = np.zeros(input_size + 1)  # Initialize weights (including bias)
        self.lr = lr
        self.epochs = epochs

    def activation_fn(self, x):
        return 1 if x >= 0 else 0

    def predict(self, x):
        z = self.W.dot(np.insert(x, 0, 1))  # Prepend the constant bias input (1) to x
        a = self.activation_fn(z)
        return a

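    def compute_loss(self, y, y_hat):
        # 0-1 error for a single sample: 1 if misclassified, 0 otherwise.
        # (The perceptron criterion itself needs the pre-activation x . W,
        # which is not passed to this method.)
        return abs(y - y_hat)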

    def update_weights(self, x, y, y_hat):
        # Update the weights using the perceptron update rule
        error = y - y_hat
        self.W += self.lr * error * np.insert(x, 0, 1)

    def train_epoch(self, X, y):
        # Train for one epoch
        for i in range(y.shape[0]):
            y_hat = self.predict(X[i])
            self.update_weights(X[i], y[i], y_hat)

    def fit(self, X, y):
        # Train the perceptron for the specified number of epochs
        for _ in range(self.epochs):
            self.train_epoch(X, y)
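
The same training loop can be sanity-checked on data that is linearly separable, where the perceptron is expected to reach zero training errors. The sketch below is self-contained and does not use the class above; the toy dataset (two well-separated Gaussian blobs), the variable names, and the early-stopping check are assumptions made for this illustration.

```python
import numpy as np

# Hypothetical toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X_toy = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(50, 2)),  # class 0
    rng.normal(loc=2.0, scale=0.5, size=(50, 2)),   # class 1
])
y_toy = np.array([0] * 50 + [1] * 50)

# Prepend a constant 1 to every sample so W[0] acts as the bias.
Xb = np.hstack([np.ones((X_toy.shape[0], 1)), X_toy])
W = np.zeros(Xb.shape[1])
lr = 0.1

for epoch in range(100):
    mistakes = 0
    for x_i, y_i in zip(Xb, y_toy):
        y_hat = 1 if W.dot(x_i) >= 0 else 0
        if y_hat != y_i:
            W += lr * (y_i - y_hat) * x_i  # perceptron update rule
            mistakes += 1
    if mistakes == 0:  # every sample classified correctly: stop early
        break

predictions = (Xb @ W >= 0).astype(int)
accuracy = float(np.mean(predictions == y_toy))
print(f"epochs used: {epoch + 1}, training accuracy: {accuracy}")
```

Stopping once an epoch produces no mistakes avoids running the remaining epochs, since no further update would change the weights.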

### Derivation of the Perceptron Training Rule

Starting with the perceptron criterion and labels $y \in \{-1, +1\}$, the goal is to reduce the total loss over the misclassified samples. The loss for a single misclassified sample is:

$
\text{Loss} = -y(x \cdot W)
$

To minimize this loss, we adjust the weights using the gradient of the loss function with respect to the weights. The gradient is computed as:

$
\frac{\partial \text{Loss}}{\partial W} = -y \cdot x
$

The weight update rule derived from gradient descent is:

$
W = W - \eta \cdot \frac{\partial \text{Loss}}{\partial W}
$

Substituting the gradient, we get:

$
W = W + \eta \cdot y \cdot x
$

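This update is applied only when a sample is misclassified. With labels and predictions encoded as 0 or 1, as in the code above, the factor $(y - \hat{y})$ takes over the role of the signed label: it is $0$ for a correct prediction, $+1$ when a positive sample is predicted as negative, and $-1$ in the opposite case. This leads to the final update rule used in the perceptron algorithm: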

$
W = W + \eta \cdot (y - \hat{y}) \cdot x
$
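
The gradient used in this derivation can be checked numerically. The sketch below compares the analytic expression $-y \cdot x$ with a central finite-difference estimate; the helper names and the example vectors are hypothetical, and the label is assumed to be $-1$ or $+1$.

```python
import numpy as np

def loss(w, x, y):
    """Loss of a misclassified sample: -y * (x . w)."""
    return -y * np.dot(x, w)

def numerical_grad(w, x, y, eps=1e-6):
    """Central finite-difference estimate of d(loss)/dw."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        delta = np.zeros_like(w)
        delta[i] = eps
        grad[i] = (loss(w + delta, x, y) - loss(w - delta, x, y)) / (2 * eps)
    return grad

w = np.array([0.2, -0.4, 0.1])   # hypothetical weights
x = np.array([1.0, 2.0, -3.0])   # hypothetical input
y = -1

analytic = -y * x
numeric = numerical_grad(w, x, y)
print(analytic, numeric)
```

Because the loss is linear in the weights, the two gradients agree up to floating-point rounding.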


### Training the Perceptron
We will now train our perceptron on the Moons dataset.

In [None]:
# Initialize and train the perceptron
perceptron = Perceptron(input_size=2)
perceptron.fit(X, y)

# Predict and plot decision boundary
h = 0.02
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = np.array([perceptron.predict(np.array([x1, x2])) for x1, x2 in zip(xx.ravel(), yy.ravel())])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
plt.title('Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
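
The plot shows the boundary qualitatively; a training accuracy gives a number to go with it. The helper below is a minimal sketch, and `accuracy`, `demo_predict`, and the demo arrays are hypothetical names introduced only for this example.

```python
import numpy as np

def accuracy(predict_fn, X, y):
    """Fraction of samples for which predict_fn(x) equals the true label."""
    predictions = np.array([predict_fn(x) for x in X])
    return float(np.mean(predictions == y))

# Hypothetical stand-in for a trained model: predicts 1 when x[0] + x[1] >= 0.
def demo_predict(x):
    return 1 if x[0] + x[1] >= 0 else 0

X_demo = np.array([[1.0, 1.0], [-1.0, -2.0], [2.0, -1.0], [-3.0, 1.0]])
y_demo = np.array([1, 0, 0, 0])

demo_accuracy = accuracy(demo_predict, X_demo, y_demo)  # 3 of 4 correct
print(demo_accuracy)

# With the objects from this lab, the call would be:
# accuracy(perceptron.predict, X, y)
```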

## **Conclusion**
In this lab, we implemented a simple perceptron and used it to classify data points from the Moons dataset. We visualized its decision boundary, which is a straight line because the perceptron is a linear classifier. Since the two moons are not linearly separable, this boundary cannot separate the two classes perfectly.