```{contents}
```

## Cost Function 

### 1. **Hard-margin SVM (ideal separable case)**

* Goal: maximize the margin

$$
\max_{w,b} \frac{2}{\|w\|}
$$

* Equivalent to minimizing (squaring $\|w\|$ gives a smooth, convex objective with the same minimizer)

$$
\min_{w,b} \frac{1}{2}\|w\|^2
$$

* Constraints:

  $$
  y_i (w^T x_i + b) \geq 1, \quad \forall i
  $$

  This means every training point is classified correctly and lies on or outside the margin.
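
As a small check of the constraint above, the following sketch tests whether a candidate hyperplane is feasible for the hard-margin problem and reports its margin width $2/\|w\|$. The toy data and the values of `w` and `b` are hand-picked assumptions for illustration; they are not the output of a solver.

```python
import numpy as np

# Toy 2-D data, linearly separable (illustrative values only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Hand-picked candidate hyperplane w^T x + b = 0 (assumed, not optimized).
w = np.array([0.5, 0.5])
b = -1.0

# Hard-margin constraint: y_i (w^T x_i + b) >= 1 for every i.
functional_margins = y * (X @ w + b)   # values: 1, 2, 1, 1.5
feasible = bool(np.all(functional_margins >= 1.0 - 1e-12))

margin_width = 2.0 / np.linalg.norm(w)  # the quantity being maximized
objective = 0.5 * np.dot(w, w)          # the quantity being minimized

print("feasible:", feasible)
print("margin width:", margin_width)
print("objective:", objective)
```

Points whose functional margin equals exactly 1 lie on the margin; here those are $(2, 2)$ and $(0, 0)$.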

---

### 2. **Soft-margin SVM (real-world case, with overlap)**

Real data are noisy and often not perfectly separable, so we relax the constraints with **slack variables** $\xi_i \geq 0$:

$$
y_i (w^T x_i + b) \geq 1 - \xi_i
$$

* If $\xi_i = 0$: correctly classified, on or outside the margin.
* If $0 < \xi_i \leq 1$: on the correct side of the decision boundary (exactly on it when $\xi_i = 1$), but inside the margin.
* If $\xi_i > 1$: misclassified.
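
The three cases can be checked numerically. For a fixed hyperplane, the smallest slack that satisfies the relaxed constraint is $\xi_i = \max(0, 1 - y_i (w^T x_i + b))$. The sketch below computes it for a few points; the data, `w`, `b`, and the helper names `slack` and `describe` are assumptions made for this example.

```python
import numpy as np

# Illustrative points and a hand-picked hyperplane (not fitted).
X = np.array([[3.0, 3.0], [1.5, 1.5], [0.5, 0.5], [0.0, 0.0]])
y = np.array([1.0, 1.0, 1.0, -1.0])
w = np.array([0.5, 0.5])
b = -1.0

def slack(w, b, X, y):
    """Smallest xi_i >= 0 with y_i (w^T x_i + b) >= 1 - xi_i."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def describe(xi):
    """Map a slack value to the three cases listed above."""
    if xi == 0.0:
        return "on or outside the margin"
    if xi <= 1.0:
        return "inside the margin, not misclassified"
    return "misclassified"

xi = slack(w, b, X, y)  # values: 0, 0.5, 1.5, 0
for value in xi:
    print(f"xi = {value:.2f}: {describe(value)}")
```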

---

### 3. **Cost function for soft margin**

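$$
\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \xi_i
\quad \text{subject to} \quad y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad \forall i
$$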

* First term: keeps the margin large.
* Second term: penalizes violations (misclassified or margin-crossing points).
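* $C > 0$: hyperparameter that controls the tradeoff between the two terms:

  * Large $C$: less tolerance for violations → narrower margin, closer to the hard-margin case, higher risk of overfitting noisy points.
  * Small $C$: more tolerance → wider margin, which often generalizes better, although a $C$ that is too small can underfit.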
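
To make the tradeoff concrete, the sketch below evaluates the soft-margin objective for two hand-picked candidate solutions on 1-D toy data: a wide-margin one that tolerates two violations and a narrow-margin one with none. The data, both candidates, and the function name `soft_margin_objective` are assumptions for illustration; neither candidate is claimed to be the true minimizer.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of slacks, using the smallest feasible slacks."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * np.sum(slack)

# 1-D toy data; the points 0.5 and -1.5 sit close to the other class.
X = np.array([[3.0], [2.0], [0.5], [-1.5], [-3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0])

wide = (np.array([0.5]), 0.0)    # margin width 2/0.5 = 4, total slack 1.0
narrow = (np.array([1.0]), 0.5)  # margin width 2/1.0 = 2, total slack 0.0

for C in (0.1, 10.0):
    cost_wide = soft_margin_objective(*wide, X, y, C)
    cost_narrow = soft_margin_objective(*narrow, X, y, C)
    preferred = "wide" if cost_wide < cost_narrow else "narrow"
    print(f"C = {C}: wide = {cost_wide:.3f}, narrow = {cost_narrow:.3f}, preferred = {preferred}")
```

With the small $C$ the wide-margin candidate has the lower cost despite its violations; with the large $C$ the violation penalty dominates and the narrow-margin candidate wins.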

---

### 4. **Hinge loss interpretation**

The penalty for each point is given by the hinge loss:

$$
L_i = \max(0, 1 - y_i (w^T x_i + b))
$$

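* If $y_i (w^T x_i + b) \geq 1$ (correct side, on or outside the margin) → loss = 0.
* If $y_i (w^T x_i + b) < 1$ (inside the margin or misclassified) → positive loss that grows linearly with the violation.

At the optimum each slack takes its smallest feasible value, $\xi_i = L_i$, so the constrained problem can be rewritten as an unconstrained objective: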

$$
\min_{w,b} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \max(0, 1 - y_i (w^T x_i + b))
$$
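
Because this objective is convex and has no constraints, it can be minimized directly. The sketch below uses plain subgradient descent, chosen here only because it is short; library solvers typically use more specialized algorithms. The toy data, the step size `lr`, the step count `n_steps`, and the function names are all assumptions for illustration.

```python
import numpy as np

def hinge_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    margins = y * (X @ w + b)
    return 0.5 * np.dot(w, w) + C * np.sum(np.maximum(0.0, 1.0 - margins))

def fit_subgradient(X, y, C=1.0, lr=0.01, n_steps=2000):
    """Subgradient descent on the hinge-loss objective (illustrative only)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        margins = y * (X @ w + b)
        active = margins < 1.0  # points with positive hinge loss
        # Regularizer contributes w; each active point contributes -C * y_i * x_i.
        grad_w = w - C * (y[active] @ X[active])
        grad_b = -C * np.sum(y[active])
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

# Small separable toy set (illustrative values only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w_fit, b_fit = fit_subgradient(X, y, C=1.0)
start_cost = hinge_objective(np.zeros(2), 0.0, X, y, C=1.0)
final_cost = hinge_objective(w_fit, b_fit, X, y, C=1.0)
predictions = np.sign(X @ w_fit + b_fit)

print("objective at start:", start_cost)
print("objective after fitting:", final_cost)
print("predictions:", predictions)
```

With a constant step size the iterates hover near the minimizer rather than converging exactly, which is enough to show the objective decreasing from its starting value.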

---

### 5. **Summary**

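* Hard margin → requires perfect separation, allows no violations.
* Soft margin → allows violations with a penalty, controlled by $C$.
* Slack variables $\xi_i$ measure how far each point violates the margin.
* Hinge loss is the per-point loss that the soft-margin SVC minimizes, together with the $\frac{1}{2}\|w\|^2$ regularization term.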

