# Class notes
Lecture 5, CampusX Deep Learning playlist

### Lecture 5: Perceptron Trick - How to Train a Perceptron

This video, part of the 100 Days of Deep Learning course, focuses on how to **train the weights and biases of a Perceptron**. The goal is to develop mathematical intuition and then translate it into code, including an animation for step-by-step visualisation.

---

### 1. Recap & Problem Statement

*   The previous video covered what a Perceptron is, its difference from a neuron, and how it makes predictions.
*   **Perceptron Prediction**: A Perceptron model calculates `w0 + w1*x1 + w2*x2` (where `w0` is the bias or intercept, and `w1, w2` are weights for inputs `x1, x2`).
*   If this sum is **non-negative** (`>= 0`), the prediction is `1` (e.g., the student is placed).
*   If the sum is **negative** (`< 0`), the prediction is `0` (e.g., the student is not placed).
*   The crucial missing piece was how to find the optimal values for these weights (`w0, w1, w2`) to achieve accurate predictions.
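
The prediction rule above can be sketched in a few lines. The function name `predict_placement` and the weight values are illustrative assumptions, not taken from the video.

```python
def predict_placement(x1, x2, w0, w1, w2):
    """Return 1 if the weighted sum is non-negative, otherwise 0."""
    weighted_sum = w0 + w1 * x1 + w2 * x2
    return 1 if weighted_sum >= 0 else 0


# Hypothetical weights, chosen only to illustrate the rule.
w0, w1, w2 = -10.0, 1.0, 0.5

print(predict_placement(8.0, 6.0, w0, w1, w2))  # sum = -10 + 8 + 3 = 1  -> 1
print(predict_placement(5.0, 4.0, w0, w1, w2))  # sum = -10 + 5 + 2 = -3 -> 0
```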

---

### 2. Linearly Separable Data

*   The Perceptron is suitable for **linearly separable data**, meaning you can draw a straight line (or a hyperplane in higher dimensions) to classify the data into two distinct classes.
*   The training process involves finding a line that correctly classifies both labels. Any line that separates the two classes will do; the Perceptron does not look for the "best" one.
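
As a rough illustration of linear separability, the sketch below checks whether one given line classifies every point correctly. The helper `separates`, the gate datasets, and the candidate line are assumptions made for this example. Note that a `False` result only says this particular line fails, not that no separating line exists (although for XOR it is a known fact that none does).

```python
def separates(w0, w1, w2, points, labels):
    """True if the line w0 + w1*x1 + w2*x2 = 0 puts every point on the side matching its label."""
    for (x1, x2), label in zip(points, labels):
        predicted = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
        if predicted != label:
            return False
    return True


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]  # AND gate: linearly separable
xor_labels = [0, 1, 1, 0]  # XOR gate: not linearly separable

# Hypothetical candidate line: x1 + x2 - 1.5 = 0
print(separates(-1.5, 1, 1, points, and_labels))  # True
print(separates(-1.5, 1, 1, points, xor_labels))  # False
```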

---

### 3. Identifying Positive and Negative Regions

*   For a given line equation (e.g., `2x + 3y + 5 = 0`), how do you determine which side is the "positive region" and which is the "negative region"?
*   You evaluate the expression `2x + 3y + 5`.
    *   If `2x + 3y + 5 > 0`, that region is considered the **positive region**.
    *   If `2x + 3y + 5 < 0`, that region is the **negative region**.
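
A small sketch of this sign test for the line `2x + 3y + 5 = 0`. The function name `region` and the sample points are illustrative choices.

```python
def region(x, y):
    """Report which side of the line 2x + 3y + 5 = 0 the point (x, y) lies on."""
    value = 2 * x + 3 * y + 5
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "on the line"


print(region(1, 1))    # 2 + 3 + 5 = 10  -> positive
print(region(-4, -2))  # -8 - 6 + 5 = -9 -> negative
print(region(-1, -1))  # -2 - 3 + 5 = 0  -> on the line
```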

---

### 4. Line Transformations based on Coefficients

Changes in the coefficients of the line equation (`w0 + w1*x1 + w2*x2 = 0`) result in different types of line transformations:

*   **Changing `w0` (the intercept/bias 'C')**:
    *   Increasing or decreasing `w0` moves the line **parallel** to its original position, either upwards or downwards. For `2x + 3y + C = 0`, increasing `C` moves the line downwards, while decreasing `C` moves it upwards.
*   **Changing `w1` (coefficient of `x`) or `w2` (coefficient of `y`)**:
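    *   These changes cause the line to **rotate** around a point: changing `w1` rotates it about its y-intercept, and changing `w2` rotates it about its x-intercept, because those intercepts do not depend on the coefficient being changed.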
    *   For example, changing the coefficient of `y` in `2x + y + 5 = 0` results in rotation.
*   The Perceptron trick uses a combination of these transformations to move the line.
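
One way to check these claims numerically is to rewrite the line in slope-intercept form, `y = m*x + b`, and compare values before and after a coefficient change. The helper `slope_intercept` is a hypothetical name for this sketch and assumes `w2` is non-zero.

```python
def slope_intercept(w0, w1, w2):
    """Rewrite w0 + w1*x + w2*y = 0 as y = m*x + b (requires w2 != 0)."""
    return -w1 / w2, -w0 / w2


# Changing only the bias: 2x + 3y + C = 0 with C = 5, then C = 8.
m_a, b_a = slope_intercept(5, 2, 3)
m_b, b_b = slope_intercept(8, 2, 3)
print(m_a == m_b, b_b < b_a)  # same slope, lower intercept: a parallel shift downwards

# Changing the coefficient of y: 2x + y + 5 = 0, then 2x + 2y + 5 = 0.
m_c, _ = slope_intercept(5, 2, 1)
m_d, _ = slope_intercept(5, 2, 2)
print(m_c, m_d)  # different slopes: the line has rotated

# The x-intercept (-w0 / w1) is the same for both lines, so it is the pivot.
x_intercept = -5 / 2
```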

---

### 5. The Perceptron Trick (Intuition for Weight Update)

The core idea is to move the decision boundary (the line) towards a **misclassified point**.

*   **Adding '1' to Coordinates**: To treat the bias like any other weight, append a `1` to each point's coordinates (a point `(x, y)` becomes `(x, y, 1)`) and write the line's coefficients as a vector in the matching order. An update can then be applied to all three coefficients, including the bias `w0`, in one step.
*   **Adjusting Coefficients**: When a point is misclassified, the line's coefficients (`w0, w1, w2`) are adjusted.
    *   If a **negative point** (actual output `0`) is in the **positive region** (predicted output `1`), the line must move so that this point falls into the negative region. The transformation **subtracts** the point's coordinates (multiplied by a learning rate) from the current weights, which lowers the value of the line expression at that point.
    *   If a **positive point** (actual output `1`) is in the **negative region** (predicted output `0`), the line must move so that this point falls into the positive region. The transformation **adds** the point's coordinates (multiplied by a learning rate) to the current weights, which raises the value of the line expression at that point.
*   **Learning Rate (`alpha`)**: Instead of making huge transformations in one go, a **small number called the learning rate** (e.g., `0.01` or `0.001`) is used. All coordinate changes are multiplied by this learning rate, so each update nudges the line slightly rather than making it jump past points that were already classified correctly.
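
A minimal numeric sketch of one such adjustment. It assumes the weight order `[w0, w1, w2]`, a point written with its `1` in the first position to match, and an illustrative learning rate of `0.1`. The point is a negative example sitting in the positive region, so its scaled coordinates are subtracted until the line expression turns negative at that point.

```python
import numpy as np

weights = np.array([5.0, 2.0, 3.0])  # [w0, w1, w2] for the line 2x + 3y + 5 = 0
point = np.array([1.0, 4.0, 5.0])    # (x, y) = (4, 5) with a leading 1 for the bias
learning_rate = 0.1                  # illustrative value, not from the video

score_before = point @ weights       # 5 + 8 + 15 = 28, so the point is in the positive region

# Negative point in the positive region: subtract the scaled coordinates.
weights = weights - learning_rate * point
score_after = point @ weights        # lower than before, since lr * (point @ point) was removed

# Repeat the same small step until the point lands in the negative region.
steps = 1
while point @ weights >= 0:
    weights = weights - learning_rate * point
    steps += 1

print(score_before, score_after, steps)
```

Each step lowers the score by `learning_rate * (point @ point)`, which is why a smaller learning rate needs more steps to move the same point across the line.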

---

### 6. Perceptron Training Algorithm

The training process involves an iterative loop to adjust weights.

**a. Model Representation:**
*   The model can be expressed as `w0*x0 + w1*x1 + w2*x2 = 0`, where `x0` is always `1` (to account for the bias `w0`).
*   This can be represented as a **dot product** of the input vector `X = [x0, x1, x2]` and the weight vector `W = [w0, w1, w2]`.
*   Prediction (`y_hat`) is `1` if `dot_product(X, W) >= 0`, and `0` if `dot_product(X, W) < 0`.
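
The dot-product form makes it possible to score many points at once. The sketch below assumes NumPy, with made-up inputs and weights; `np.insert` prepends the `x0 = 1` column.

```python
import numpy as np

# Three hypothetical students, each with two features (x1, x2).
X_raw = np.array([[8.0, 6.0],
                  [5.0, 4.0],
                  [2.0, 20.0]])

X = np.insert(X_raw, 0, 1, axis=1)  # each row becomes [1, x1, x2]
W = np.array([-10.0, 1.0, 0.5])     # [w0, w1, w2], illustrative values

scores = X @ W                      # one dot product per row: [1, -3, 2]
y_hat = (scores >= 0).astype(int)   # step function applied element-wise

print(y_hat)  # [1 0 1]
```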

**b. Initial (Conceptual) Algorithm:**
1.  **Initialise weights** (`w0, w1, w2`) with random values.
2.  Loop for a fixed number of **epochs** (e.g., 1000 times).
3.  **Randomly select a data point** (student) from the training data.
4.  **Check for misclassification**:
    *   If a **negative point** (actual `0`) is classified as **positive** (model predicts `1`):
        *   Update `W_new = W_old - learning_rate * X`, where `X` is the selected point with `x0 = 1` included.
    *   If a **positive point** (actual `1`) is classified as **negative** (model predicts `0`):
        *   Update `W_new = W_old + learning_rate * X`.
    *   If correctly classified, no change to weights.
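
The conceptual loop can be sketched with explicit branches as follows. This is one possible reading of the steps, not the video's code: the function name `train_with_branches`, the toy dataset, the seed, and the uniform initial weights are all assumptions. A false positive subtracts the point from the weights and a false negative adds it.

```python
import numpy as np


def train_with_branches(X, y, learning_rate=0.1, epochs=1000, seed=0):
    """Perceptron training with one explicit branch per misclassification case."""
    rng = np.random.default_rng(seed)
    X = np.insert(X, 0, 1, axis=1)                # add x0 = 1 for the bias
    W = rng.uniform(-1.0, 1.0, size=X.shape[1])   # random initial weights

    for _ in range(epochs):
        i = rng.integers(0, len(X))               # pick one random data point
        y_hat = 1 if X[i] @ W >= 0 else 0

        if y[i] == 0 and y_hat == 1:              # negative point predicted positive
            W = W - learning_rate * X[i]
        elif y[i] == 1 and y_hat == 0:            # positive point predicted negative
            W = W + learning_rate * X[i]
        # correctly classified: leave W unchanged
    return W


# Toy, clearly separable data (hypothetical values).
X = np.array([[-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0],
              [2.0, 2.0], [3.0, 1.0], [1.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

W = train_with_branches(X, y)
preds = [1 if row @ W >= 0 else 0 for row in np.insert(X, 0, 1, axis=1)]
print(preds)
```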

**c. Simplified Algorithm (for Implementation):**
To simplify the code, a single update rule can be used that incorporates both misclassification scenarios and handles correct classifications without explicit `if` conditions:

*   **Prediction (y_hat)**:
    *   `dot_product = w0 + w1*x1 + w2*x2` (using `x0=1` for `w0`).
    *   `y_hat = 1` if `dot_product >= 0`, else `y_hat = 0`.
*   **Weight Update Rule**:
    *   `W_new = W_old + learning_rate * (actual_y - predicted_y) * X`

    This single rule works as follows:
    *   **Correct Prediction (e.g., `actual_y = 1`, `predicted_y = 1` OR `actual_y = 0`, `predicted_y = 0`)**:
        *   `(actual_y - predicted_y)` will be `0`.
        *   `W_new = W_old + learning_rate * 0 * X = W_old`. **No change** in weights, which is correct.
    *   **False Negative (e.g., `actual_y = 1`, `predicted_y = 0`)**:
        *   `(actual_y - predicted_y)` will be `(1 - 0) = 1`.
        *   `W_new = W_old + learning_rate * 1 * X = W_old + learning_rate * X`. The weights are adjusted to make the output more positive.
    *   **False Positive (e.g., `actual_y = 0`, `predicted_y = 1`)**:
        *   `(actual_y - predicted_y)` will be `(0 - 1) = -1`.
        *   `W_new = W_old + learning_rate * (-1) * X = W_old - learning_rate * X`. The weights are adjusted to make the output more negative.
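
The three cases can be checked directly in code. The helper name `update` and the numbers are illustrative assumptions.

```python
import numpy as np


def update(W, x, actual_y, predicted_y, learning_rate=0.1):
    """Single update rule covering all three cases."""
    return W + learning_rate * (actual_y - predicted_y) * x


W = np.array([1.0, 1.0, 1.0])  # [w0, w1, w2]
x = np.array([1.0, 2.0, 3.0])  # [x0 = 1, x1, x2]

correct = update(W, x, 1, 1)         # (1 - 1) = 0  -> weights unchanged
false_negative = update(W, x, 1, 0)  # (1 - 0) = 1  -> W + 0.1 * x
false_positive = update(W, x, 0, 1)  # (0 - 1) = -1 -> W - 0.1 * x

print(correct, false_negative, false_positive)
print(x @ W, x @ false_negative, x @ false_positive)  # score stays, rises, falls
```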

---

### 7. Code Implementation Details

The video demonstrates the implementation of the simplified Perceptron algorithm:

*   **`perceptron` Function**: Takes input data `X` and output `Y` (where `Y` contains `0`s and `1`s) and returns the learned weights and intercept.
*   **Initialisation**:
    *   A `weights` array is created, initialised with ones (e.g., `[1, 1, 1]` for two input features). The first element is for the intercept (`w0`), the second for `w1`, and the third for `w2`.
    *   A `learning_rate` (e.g., `0.01`) and `epochs` (e.g., `1000`) are defined.
*   **Data Preparation**: For `X = [x1, x2]`, a column of ones (`x0=1`) is added to `X` to account for the intercept term, effectively making `X = [1, x1, x2]`.
*   **Training Loop**:
    *   Iterates for the specified number of `epochs`.
    *   In each iteration, a **random student index** is selected.
    *   The **dot product** of the selected student's `X` vector and the current `weights` vector is calculated to get the `output`.
    *   A **step function** converts this `output` into a binary `prediction` (`0` or `1`).
    *   The **weight update formula** `weights = weights + learning_rate * (Y[random_index] - prediction) * X[random_index]` is applied.
*   **Visualisation**: The code also visualises how the line moves with each update, showing it gradually converging to correctly classify the points.

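```python
import numpy as np


class Perceptron:
    def __init__(self, lr, epochs):
        self.lr = lr          # learning rate
        self.epochs = epochs  # number of single-point updates
        self.weights = None   # [w0, w1, w2, ...], set by fit()

    def activate(self, z):
        # Step function: 1 when the weighted sum is non-negative, else 0.
        return 1 if z >= 0 else 0

    def fit(self, X, y):
        # Prepend a column of ones so that weights[0] acts as the intercept.
        X = np.insert(X, 0, 1, axis=1)
        self.weights = np.ones(X.shape[1])

        for _ in range(self.epochs):
            # Pick one random training point per iteration.
            i = np.random.randint(0, len(X))
            y_hat = self.activate(X[i] @ self.weights)
            # No change when y[i] == y_hat; otherwise the line moves.
            self.weights = self.weights + self.lr * (y[i] - y_hat) * X[i]
        return self.weights

    def predict(self, X):
        X = np.insert(X, 0, 1, axis=1)
        return [self.activate(z) for z in X @ self.weights]
```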