# Perceptron as Minimizing Loss

- 📺 **Video:** [https://youtu.be/hhTkyP7EzGw](https://youtu.be/hhTkyP7EzGw)

## Overview
- Connect perceptron updates to stochastic subgradient descent on a hinge-style loss.
- Understand when the perceptron converges and how the margin controls its mistake bound.
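The connection can be written out explicitly. The notation here is assumed rather than taken from the lecture: weights $\mathbf{w}$, example $\mathbf{x}$, label $y \in \{-1, +1\}$, step size $\eta$, and $g$ for the subgradient used in a step.

$$
\begin{aligned}
\ell_{0/1}(\mathbf{w}) &= \mathbb{1}\!\left[\, y\,\mathbf{w}^\top\mathbf{x} \le 0 \,\right] \\
\ell_{\text{perc}}(\mathbf{w}) &= \max\!\left(0,\; -y\,\mathbf{w}^\top\mathbf{x}\right) \\
\ell_{\text{hinge}}(\mathbf{w}) &= \max\!\left(0,\; 1 - y\,\mathbf{w}^\top\mathbf{x}\right) \;\ge\; \ell_{0/1}(\mathbf{w}) \\[4pt]
g_{\text{perc}} &= \begin{cases} -y\,\mathbf{x} & \text{if } y\,\mathbf{w}^\top\mathbf{x} \le 0 \\ \mathbf{0} & \text{otherwise} \end{cases} \\[4pt]
g_{\text{hinge}} &= \begin{cases} -y\,\mathbf{x} & \text{if } y\,\mathbf{w}^\top\mathbf{x} < 1 \\ \mathbf{0} & \text{otherwise} \end{cases} \\[4pt]
\mathbf{w} &\leftarrow \mathbf{w} - \eta\, g \;=\; \mathbf{w} + \eta\, y\,\mathbf{x} \quad \text{whenever the trigger condition holds}
\end{aligned}
$$

Both steps add $\eta\, y\,\mathbf{x}$; they differ only in when they fire. Starting from $\mathbf{w} = \mathbf{0}$, the perceptron's condition $y\,\mathbf{w}^\top\mathbf{x} \le 0$ is unaffected by rescaling $\mathbf{w}$, so $\eta$ does not change its sequence of mistakes. The hinge threshold of 1 does depend on the scale of $\mathbf{w}$, so $\eta$ matters there.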

## Key ideas
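- **Hinge loss:** upper-bounds the 0/1 loss. Its subgradient step (add `eta * y * x` whenever the margin `y * w·x` is below 1) has the same form as the perceptron update, and the perceptron itself is subgradient descent on the zero-margin version of this loss.
- **Mistake bound:** if the data are linearly separable with margin γ and every example has norm at most R, the perceptron makes at most (R/γ)² mistakes (Novikoff's theorem), so it converges after finitely many updates.
- **Update rule:** only misclassified examples (`y * w·x <= 0`) trigger weight changes.
- **Regularization:** on data that are not separable, the weights keep oscillating. The averaged perceptron returns the average of the weight vectors over all steps, which smooths out these oscillations and acts as a form of regularization.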
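The mistake bound can be checked empirically. This is a minimal sketch under stated assumptions: a bias-free perceptron (separator through the origin), unit step size, and a synthetic dataset built from a random unit vector `u` with every point at margin at least 0.1. The helpers `make_separable` and `perceptron_until_converged` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def make_separable(n=500, d=5, min_margin=0.1, seed=0):
    """Hypothetical helper: sample points and keep those at margin >= min_margin from a random unit vector u."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                  # unit-norm separator through the origin
    X = rng.uniform(-1, 1, size=(4 * n, d))
    scores = X @ u
    keep = np.abs(scores) >= min_margin     # drop points too close to the boundary
    X, scores = X[keep][:n], scores[keep][:n]
    y = np.where(scores > 0, 1, -1)
    return X, y, u

def perceptron_until_converged(X, y, max_epochs=1000):
    """Hypothetical helper: run the perceptron until an epoch has no mistakes."""
    w = np.zeros(X.shape[1])
    total_mistakes = 0
    for _ in range(max_epochs):
        epoch_mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi                # unit step; from w = 0 the step size only rescales w
                epoch_mistakes += 1
        total_mistakes += epoch_mistakes
        if epoch_mistakes == 0:
            return w, total_mistakes, True  # converged: a full clean pass
    return w, total_mistakes, False

X, y, u = make_separable()
R = np.max(np.linalg.norm(X, axis=1))       # largest example norm
gamma = np.min(y * (X @ u))                 # actual margin of u on this sample
bound = (R / gamma) ** 2
w, mistakes, converged = perceptron_until_converged(X, y)
print(f"converged={converged} | mistakes={mistakes} | bound (R/gamma)^2={bound:.1f}")
```

Since `u` separates the sample with margin `gamma`, the theorem guarantees the printed mistake count never exceeds the bound; in practice it is often well below it.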

## Demo
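Run hinge-loss subgradient descent and the classic perceptron side by side on the same dataset, following the lecture walkthrough (https://youtu.be/hOUX9xFIN90). Both use the same update, `w += eta * y * x`, but the perceptron fires only on mistakes (margin `y * w·x <= 0`), while the hinge step also fires on correctly classified points whose margin is below 1. The two weight vectors therefore end up different, as the final difference norm shows.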

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# Synthetic binary data; labels mapped from {0, 1} to {-1, +1}.
X, y = make_classification(n_samples=300, n_features=4, n_informative=4, n_redundant=0, class_sep=1.5, random_state=5)
y_signed = np.where(y == 1, 1, -1)
# Prepend a constant column so the first weight acts as the bias.
X = np.c_[np.ones(len(X)), X]

perceptron_w = np.zeros(X.shape[1])
hinge_w = np.zeros_like(perceptron_w)
eta = 0.1  # step size shared by both learners

for epoch in range(10):
    mistakes = 0
    for xi, yi in zip(X, y_signed):
        # Perceptron: update only on mistakes (margin <= 0).
        if yi * (perceptron_w @ xi) <= 0:
            perceptron_w += eta * yi * xi
            mistakes += 1
        # Hinge-loss subgradient step: update whenever the margin is below 1.
        margin = yi * (hinge_w @ xi)
        if margin < 1:
            hinge_w += eta * yi * xi
    # Training accuracy of the perceptron after this epoch.
    preds = np.where((X @ perceptron_w) > 0, 1, -1)
    acc = accuracy_score(y_signed, preds)
    print(f"epoch {epoch+1:2d} | perceptron mistakes {mistakes:3d} | accuracy {acc:.3f}")

print()
print('Final hinge weights close to perceptron weights? diff norm =', np.linalg.norm(perceptron_w - hinge_w))
```


```text
epoch  1 | perceptron mistakes  36 | accuracy 0.910
epoch  2 | perceptron mistakes  31 | accuracy 0.940
epoch  3 | perceptron mistakes  27 | accuracy 0.930
epoch  4 | perceptron mistakes  31 | accuracy 0.923
epoch  5 | perceptron mistakes  24 | accuracy 0.923
epoch  6 | perceptron mistakes  28 | accuracy 0.937
epoch  7 | perceptron mistakes  30 | accuracy 0.933
epoch  8 | perceptron mistakes  23 | accuracy 0.923
epoch  9 | perceptron mistakes  27 | accuracy 0.913
epoch 10 | perceptron mistakes  24 | accuracy 0.913

Final hinge weights close to perceptron weights? diff norm = 1.9347429944849273
```
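The per-epoch accuracy above moves between 0.910 and 0.940 instead of settling, which is the kind of oscillation the averaged perceptron targets. A minimal sketch of averaging, assuming the same mistake-driven update and step size as the demo: keep a running sum of the weight vector after every example and return its mean alongside the last weights. The function name `averaged_perceptron` and the four-point toy dataset are hypothetical, chosen only to exercise the code.

```python
import numpy as np

def averaged_perceptron(X, y, epochs=10, eta=0.1):
    """Hypothetical helper: return (last weights, averaged weights).

    X is assumed to already include a bias column; y is in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    w_sum = np.zeros_like(w)  # running sum of w after every example
    steps = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:  # same mistake-driven update as the demo
                w += eta * yi * xi
            w_sum += w
            steps += 1
    return w, w_sum / steps

# Tiny hypothetical dataset (first column is the bias), just to exercise the function.
X_toy = np.array([[1, 2.0, 1.0], [1, 1.5, 2.0], [1, -1.0, -1.5], [1, -2.0, -0.5]])
y_toy = np.array([1, 1, -1, -1])
w_last, w_avg = averaged_perceptron(X_toy, y_toy)
print("last:", w_last, "averaged:", w_avg)
```

On the demo's data, `averaged_perceptron(X, y_signed)` returns both vectors, so their training accuracies can be compared with `accuracy_score` as in the demo; whether averaging helps on this particular dataset is something to check rather than assume.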


## Try it
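- Modify the demo: change `eta`, `class_sep`, or the number of epochs and watch how the mistake counts and the gap between the two weight vectors respond.
- Add a tiny counter-example, such as XOR, that no linear classifier can separate, and check that the perceptron never stops making mistakes.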
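XOR is a standard tiny counter-example: its four points cannot be separated by any line, so the perceptron can never finish an epoch without a mistake, and the mistake bound does not apply. A minimal sketch, reusing the demo's bias-column trick with a unit step; the 100-epoch cap is an arbitrary choice.

```python
import numpy as np

# XOR: no line separates the positive and negative points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])
X = np.c_[np.ones(len(X)), X]  # bias column, as in the demo

w = np.zeros(X.shape[1])
mistakes_per_epoch = []
for epoch in range(100):
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:
            w += yi * xi
            mistakes += 1
    mistakes_per_epoch.append(mistakes)

# A clean epoch would mean w separates XOR, which is impossible.
print("mistakes in the last 5 epochs:", mistakes_per_epoch[-5:])
```

Because every epoch contains at least one mistake, the weights keep cycling; this is the non-separable regime where averaging the weights is useful.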


## References
- [Eisenstein 2.0-2.5, 4.1-4.4.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Perceptron and logistic regression connections](https://www.cs.utexas.edu/~gdurrett/courses/online-course/perc-lr-connections.pdf)
- [Thumbs up? Sentiment Classification using Machine Learning Techniques](https://www.aclweb.org/anthology/W02-1011/)
- [Baselines and Bigrams: Simple, Good Sentiment and Topic Classification](https://www.aclweb.org/anthology/P12-2018/)
- [Convolutional Neural Networks for Sentence Classification](https://www.aclweb.org/anthology/D14-1181/)
- [[GitHub] NLP Progress on Sentiment Analysis](https://github.com/sebastianruder/NLP-progress/blob/master/english/sentiment_analysis.md)


*Links only; we do not redistribute slides or papers.*