# **The Idea of a Linear Classifier:**

### **1. What is a classifier?**

At the most basic level, a **`classifier`** is a function that:

> **Assigns an input to one of several categories (classes).**

**Examples:**   
   * Email → spam / not spam
   * Image → cat / dog
   * Sensor data → normal / abnormal

Mathematically:

> $f: x \mapsto \text{class label}$ 

### **2. What makes a classifier “linear”?**

A classifier is called **linear** if its decision is based on a **linear function of the input**.

That linear function is:

> $z = w^\top x + b$ 

Where:   
   * $x \in \mathbb{R}^n$ is the input vector
   * $w \in \mathbb{R}^n$ is the weight vector
   * $b \in \mathbb{R}$ is the bias (equivalent to a negative threshold, $b = -\theta$)

The classifier then applies a **decision rule**:

> $\text{class} =
\begin{cases}
+1 & \text{if } z \ge 0 \\
-1 & \text{if } z < 0
\end{cases}$ 

This is the **essence of a linear classifier**.
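A minimal Python sketch of the score $z$ and the decision rule above. The function names `linear_score` and `classify` and the example values of $w$ and $b$ are hypothetical, chosen only for illustration. Ties ($z = 0$) go to $+1$ to match the rule.

```python
from typing import Sequence


def linear_score(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Compute z = w^T x + b."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], x: Sequence[float], b: float) -> int:
    """Decision rule: +1 if z >= 0, else -1."""
    return 1 if linear_score(w, x, b) >= 0 else -1


# Hypothetical 2-D example: w = (2, -1), b = -1
w, b = [2.0, -1.0], -1.0
print(classify(w, [1.0, 0.0], b))   # z = 2 - 0 - 1 = 1   -> +1
print(classify(w, [0.0, 1.0], b))   # z = 0 - 1 - 1 = -2  -> -1
print(classify(w, [0.5, 0.0], b))   # z = 1 - 0 - 1 = 0   -> +1 (tie goes to +1)
```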

### **3. Geometric meaning (core intuition):**

The equation:

> $w^\top x + b = 0$ 

defines a **hyperplane** in input space.

* In 2D → a line
* In 3D → a plane
* In higher dimensions → a hyperplane

This hyperplane:

* **Separates space into two halves**
* Each half corresponds to a class

So a linear classifier:

> **Decides class membership by which side of a hyperplane a point lies on.**
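The side of the hyperplane comes from the sign of $w^\top x + b$. Dividing by $\lVert w \rVert$ also gives the signed Euclidean distance to the hyperplane. The sketch below assumes a hypothetical 2-D line $x_1 + x_2 - 1 = 0$ (so $w = (1, 1)$, $b = -1$), and the helper name `signed_distance` is illustrative.

```python
import math


def signed_distance(w, b, x):
    """Signed Euclidean distance from x to the hyperplane w^T x + b = 0.
    Positive on the side w points toward, negative on the other side."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0:
        raise ValueError("w must be non-zero to define a hyperplane")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


# Hypothetical line in 2-D: x1 + x2 - 1 = 0, i.e. w = (1, 1), b = -1
w, b = (1.0, 1.0), -1.0
for point in [(1.0, 1.0), (0.0, 0.0), (0.5, 0.5)]:
    d = signed_distance(w, b, point)
    side = "on the line" if d == 0 else ("positive side" if d > 0 else "negative side")
    print(point, round(d, 4), side)
```

The point $(0.5, 0.5)$ lies exactly on the line, so its distance is $0$. The classifier itself only needs the sign. The magnitude says how far the point is from the boundary.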

### **4. Why linear classifiers matter historically:**

Before learning rules existed, researchers first had to answer:

> *What kind of decision can a single neuron make?*

The answer:   
   * A neuron computes a weighted sum
   * Applies a threshold
   * Outputs a class

That is *exactly* a linear classifier.

So historically:    
   * **MCP neuron = fixed linear classifier**
   * **Perceptron = learned linear classifier**

### **5. How Linear Classifiers Fit into Our Learning Pipeline:**

Let’s place them precisely.

**MCP neuron:**

> $y = H(w^\top x - \theta)$, where $H$ is the Heaviside step function and $\theta$ is a fixed threshold

* Fixed weights
* Hard-coded decision boundary
* Linear classifier with no learning
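A fixed-weight unit of this kind can be sketched as follows. The weights and thresholds for AND and OR are hand-picked illustrative values, not taken from the original MCP paper. The sketch assumes $H(0) = 1$ so that it matches the $z \ge 0$ convention of section 2.

```python
def heaviside(z: float) -> int:
    """H(z) = 1 if z >= 0 else 0."""
    return 1 if z >= 0 else 0


def mcp_neuron(weights, theta, x):
    """MCP-style unit: y = H(w^T x - theta), with weights fixed by hand."""
    return heaviside(sum(w * xi for w, xi in zip(weights, x)) - theta)


# Hand-chosen (not learned) parameters for two Boolean gates
AND_PARAMS = ((1, 1), 2)   # fires only when both inputs are 1
OR_PARAMS = ((1, 1), 1)    # fires when at least one input is 1

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
for x in inputs:
    print(x, "AND:", mcp_neuron(*AND_PARAMS, x), "OR:", mcp_neuron(*OR_PARAMS, x))
```

Changing the gate means editing the numbers by hand. Nothing in the unit adjusts them from data.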

**Hebb / Oja:**

* Learn weights from data
* No target labels
* Discover dominant directions
* Still linear projections

They shape the hyperplane but **not toward a classification goal**.
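The notes do not spell out the update formulas. The sketch below therefore assumes the standard forms: plain Hebb $\Delta w = \eta\, y\, x$ and Oja's rule $\Delta w = \eta\, y\,(x - y\, w)$. Both use $y = w^\top x$ and no labels. The synthetic data, learning rate, and step counts are hypothetical. The data are zero-mean and stretched along $(1, 1)/\sqrt{2}$. On these data, Oja's weight vector should settle near a unit vector along that dominant direction, and the plain Hebbian vector should keep growing in norm.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def hebb_step(w, x, eta):
    """Plain Hebbian update: dw = eta * y * x, with y = w^T x (no normalization)."""
    y = dot(w, x)
    return [wi + eta * y * xi for wi, xi in zip(w, x)]


def oja_step(w, x, eta):
    """Oja's rule: dw = eta * y * (x - y * w); the -y^2 w term keeps ||w|| near 1."""
    y = dot(w, x)
    return [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]


def sample(rng):
    """Hypothetical zero-mean 2-D data: large spread along (1, 1)/sqrt(2),
    small spread along (1, -1)/sqrt(2)."""
    t = rng.gauss(0.0, 3.0)    # coordinate along the dominant direction
    s = rng.gauss(0.0, 0.5)    # coordinate along the minor direction
    c = 1.0 / math.sqrt(2.0)
    return [c * (t + s), c * (t - s)]


rng = random.Random(0)
w_oja = [1.0, 0.0]
w_hebb = [1.0, 0.0]
for step in range(20000):
    x = sample(rng)
    w_oja = oja_step(w_oja, x, eta=0.001)
    if step < 1000:   # run plain Hebb only briefly: its norm grows without bound
        w_hebb = hebb_step(w_hebb, x, eta=0.001)

norm_oja = math.sqrt(dot(w_oja, w_oja))
dominant = [1.0 / math.sqrt(2.0)] * 2
alignment = abs(dot(w_oja, dominant)) / norm_oja   # |cos| of angle to dominant direction
print("Oja:  w =", [round(v, 3) for v in w_oja], "norm =", round(norm_oja, 3),
      "alignment =", round(alignment, 4))
print("Hebb: norm after 1000 steps =", round(math.sqrt(dot(w_hebb, w_hebb)), 1))
```

Neither run uses a class label. The result is a projection direction, not a decision boundary.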

**Delta Rule:**

> $\Delta w = \eta (t - y)x$

* Learns weights using labeled data
* Minimizes squared error
* Learns the **best linear fit**

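But its output, $y = w^\top x + b$, is still continuous: learning uses the raw linear output rather than a thresholded class label.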
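Here is a minimal online sketch of the Delta rule. It assumes a linear unit $y = w^\top x + b$, targets $t \in \{-1, +1\}$, and a bias update $\Delta b = \eta (t - y)$ alongside the weight update. The AND data, learning rate, and epoch count are illustrative choices, and the name `delta_rule` is hypothetical.

```python
def delta_rule(data, eta=0.001, epochs=20000):
    """Online Delta (Widrow-Hoff / LMS) rule on a linear unit y = w^T x + b.
    Each step applies dw = eta * (t - y) * x and db = eta * (t - y)."""
    n = len(data[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, t in data:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b   # continuous output, no threshold
            err = t - y
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b


# AND gate with targets in {-1, +1}
AND_DATA = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = delta_rule(AND_DATA)
outputs = [sum(wi * xi for wi, xi in zip(w, x)) + b for x, _ in AND_DATA]
print("w =", [round(v, 3) for v in w], "b =", round(b, 3))
print("continuous outputs:", [round(o, 3) for o in outputs])
print("sign of outputs:  ", [1 if o >= 0 else -1 for o in outputs])
```

For this data, the normal equations give the least-squares solution $w = (1, 1)$, $b = -1.5$, so the learned values should land close to it. The continuous outputs come out near $-1.5, -0.5, -0.5, 0.5$. None of them equals its $\pm 1$ target, which is the sense in which the output is still continuous. Taking the sign recovers the AND labels.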

**Perceptron:**

> $y = \text{sign}(w^\top x + b)$ 

* Converts linear output into class labels
* Updates only on misclassification
* Fully realizes the **linear classifier concept**
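A sketch of the classic perceptron training loop for labels in $\{-1, +1\}$ follows. The notes do not state the update, so the code assumes the standard Rosenblatt rule: $w \leftarrow w + \eta\, y_i x_i$ and $b \leftarrow b + \eta\, y_i$. The update fires only when $y_i (w^\top x_i + b) \le 0$. The names `perceptron` and `predict`, $\eta = 1$, and the epoch cap are illustrative.

```python
def perceptron(data, eta=1.0, max_epochs=100):
    """Perceptron for labels in {-1, +1}.
    Updates (w, b) only when an example is misclassified: y_i (w^T x_i + b) <= 0."""
    n = len(data[0][0])
    w = [0.0] * n
    b = 0.0
    for epoch in range(max_epochs):
        mistakes = 0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * z <= 0:                       # misclassified (or on the boundary)
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
                mistakes += 1
        if mistakes == 0:                        # a full clean pass: data are separated
            return w, b, epoch + 1
    return w, b, max_epochs


def predict(w, b, x):
    """Decision rule: +1 if w^T x + b >= 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


OR_DATA = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b, epochs = perceptron(OR_DATA)
print("w =", w, "b =", b, "epochs =", epochs)
print([predict(w, b, x) for x, _ in OR_DATA])
```

OR is linearly separable, so the loop stops after a full pass with no mistakes. Every point then satisfies $y_i (w^\top x_i + b) > 0$.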

### **6. Linear Separability (critical concept):**

A dataset is **`linearly separable`** if:

> There exists at least one hyperplane that perfectly separates the classes.

Plotted on the four corners of the unit square:   
   * OR → linearly separable
   * AND → linearly separable
   * XOR → **not linearly separable**

This explains:   
   * Why a single MCP neuron cannot compute XOR
   * Why Hebb and Delta learning couldn’t fix it: they still adjust a single hyperplane
   * Why multilayer networks were needed
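The XOR claim can be checked directly. Take the decision rule from section 2 ($+1$ iff $w^\top x + b \ge 0$) and the XOR labels $(0,0) \mapsto -1$, $(0,1) \mapsto +1$, $(1,0) \mapsto +1$, $(1,1) \mapsto -1$. A separating $(w_1, w_2, b)$ would have to satisfy all four inequalities at once:

$$
\begin{aligned}
(0,0):&\quad b < 0 \\
(0,1):&\quad w_2 + b \ge 0 \\
(1,0):&\quad w_1 + b \ge 0 \\
(1,1):&\quad w_1 + w_2 + b < 0
\end{aligned}
$$

Adding the second and third inequalities gives $w_1 + w_2 + 2b \ge 0$. By the first inequality this means $w_1 + w_2 + b \ge -b > 0$, which contradicts the fourth. So no line separates XOR, whichever rule is used to choose $(w, b)$.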

### **7. Mathematical Perspective:**

A linear classifier defines:

> $f(x) = \text{sign}(w^\top x + b)$ 

Given labeled examples $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, learning means finding $(w, b)$ such that

> $y_i (w^\top x_i + b) > 0 \quad \forall i$

This is a **`geometric constraint problem`**.
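Checking a candidate $(w, b)$ against these constraints is just a loop over the examples. The sketch below adds a brute-force search over a coarse grid. The helper names and the grid are hypothetical. The search finds a separator for AND and, consistent with XOR not being linearly separable, none for XOR. A grid search only illustrates the constraint problem. It is neither a proof nor a practical learning method.

```python
from itertools import product


def satisfies_constraints(w, b, data):
    """True if y_i (w^T x_i + b) > 0 for every labeled example (x_i, y_i)."""
    return all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0 for x, y in data)


def grid_search(data, values):
    """Brute-force search over a small grid of (w1, w2, b) values.
    Illustrative only: failing to find a solution on a grid is not a proof."""
    for w1, w2, b in product(values, repeat=3):
        if satisfies_constraints((w1, w2), b, data):
            return (w1, w2), b
    return None


AND_DATA = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR_DATA = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
grid = [k / 2 for k in range(-4, 5)]     # -2.0, -1.5, ..., 2.0

print(satisfies_constraints((1, 1), -1.5, AND_DATA))   # True
print(grid_search(AND_DATA, grid))
print(grid_search(XOR_DATA, grid))                      # None
```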

### **8. Optimization view:**

Different learning rules correspond to different optimization goals:

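| Rule       | Objective                                                               |
| ---------- | ----------------------------------------------------------------------- |
| Hebb       | Correlation maximization                                                |
| Oja        | Variance maximization (with $\lVert w \rVert$ kept near 1)              |
| Delta      | Squared error minimization                                              |
| Perceptron | Perceptron criterion: sum of $-y_i(w^\top x_i + b)$ over misclassified points |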

But all operate within the **same linear model**.
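To make "same model, different objectives" concrete, the sketch below scores one linear model $(w, b)$ on AND data under the three label-driven objectives. It leaves out Hebb and Oja because they do not use labels. The perceptron criterion is written in its usual form, $\sum -y_i z_i$ over examples with $y_i z_i \le 0$. The function names and the two $(w, b)$ settings are hypothetical.

```python
def scores(w, b, data):
    """Linear scores z_i = w^T x_i + b for every example."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x, _ in data]


def squared_error(w, b, data):
    """Delta-rule objective: 0.5 * sum_i (t_i - z_i)^2 on the continuous output z."""
    return 0.5 * sum((t - z) ** 2 for z, (_, t) in zip(scores(w, b, data), data))


def perceptron_criterion(w, b, data):
    """Perceptron objective: sum of -y_i z_i over examples with y_i z_i <= 0."""
    return sum(-y * z for z, (_, y) in zip(scores(w, b, data), data) if y * z <= 0)


def misclassifications(w, b, data):
    """0-1 count using the decision rule: +1 if z >= 0, else -1."""
    return sum((1 if z >= 0 else -1) != y for z, (_, y) in zip(scores(w, b, data), data))


AND_DATA = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
for b in (-1.5, -0.5):
    print(b, squared_error((1, 1), b, AND_DATA),
          perceptron_criterion((1, 1), b, AND_DATA),
          misclassifications((1, 1), b, AND_DATA))
# -1.5 0.5 0 0
# -0.5 2.5 1.0 2
```

With $b = -1.5$ the perceptron criterion and the error count are both zero. The squared error is still positive because the outputs do not equal the $\pm 1$ targets.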

### **9. Why Linear Classifiers Are Still Important Today:**

Even modern deep networks:   
   * Typically end with a linear classifier on top of learned features
   * Learn representations in which the classes become (close to) linearly separable

Examples:   
   * Softmax output layer (a linear layer followed by softmax)
   * Logistic regression head
   * Linear SVMs

Linear classifiers are **not obsolete** — they are foundational.
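A multi-class head of this kind can be sketched as one weight row per class followed by softmax. Softmax preserves the ordering of the logits, so the predicted class is simply the argmax of the linear scores. The boundary between classes $k$ and $j$ is the hyperplane $(w_k - w_j)^\top h + (b_k - b_j) = 0$. The feature vector `h`, the weights, and the function names below are hypothetical.

```python
import math


def linear_head(W, b, h):
    """Logits z_k = w_k^T h + b_k for each class k (one weight row per class)."""
    return [sum(wkj * hj for wkj, hj in zip(w_k, h)) + b_k for w_k, b_k in zip(W, b)]


def softmax(z):
    m = max(z)                               # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-class head on a 2-D learned feature vector h
W = [[2.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b = [0.0, 0.1, -0.2]
h = [1.0, 0.3]

z = linear_head(W, b, h)
p = softmax(z)
predicted = max(range(len(p)), key=p.__getitem__)
print("logits:", [round(v, 3) for v in z])
print("probabilities:", [round(v, 3) for v in p])
print("predicted class:", predicted)
```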

### **10. Intuitive summary (mental model):**

Think of learning as:   
   * Rotating a ruler (hyperplane)
   * Sliding it by adjusting the bias
   * Until it separates points as well as possible

That ruler is the **linear classifier**.

----

> **A linear classifier assigns classes by separating input space with a hyperplane defined by a weighted sum and bias. The early neuron models and learning rules all work within this model: the MCP neuron fixes the hyperplane by hand, Hebb and Oja shape it without labels, and the Delta rule and Perceptron fit it to labeled data.**