# Multiclass Perceptron and Logistic Regression

- 📺 **Video:** [https://youtu.be/EA627DC7k6M](https://youtu.be/EA627DC7k6M)

## Overview
- See how the multiclass perceptron extends the binary mistake-driven update rule to linearly separable problems with more than two labels.
- Understand why multinomial logistic regression provides probabilistic scores and smoother updates that converge even on data that is not strictly separable.

## Key ideas
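- **Multiclass perceptron:** maintain one weight vector per class and predict the class with the highest score. On each mistake, add the feature vector to the true class's weights and subtract it from the predicted class's weights; all other classes stay unchanged.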
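- **Multinomial logistic regression:** optimize a softmax cross-entropy loss with gradient-based updates, yielding a probability distribution over classes rather than raw scores (these probabilities are often, though not always, well calibrated).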
- **Regularisation and convergence:** logistic regression benefits from L2 penalties and convexity, giving stable solutions even with noisy features.
- **Evaluation:** inspect accuracy and per-class precision/recall to see how the two models behave on overlapping class clusters.
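
The two update rules above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the lecture's reference implementation: the function names `perceptron_update` and `logistic_update`, the learning rate `lr`, and the penalty strength `l2` are hypothetical names introduced here, and the weight matrix `W` is assumed to hold one row per class.

```python
import numpy as np


def perceptron_update(W, x, y_true):
    """One mistake-driven step; W has shape (n_classes, n_features)."""
    y_pred = int(np.argmax(W @ x))
    if y_pred != y_true:
        W[y_true] += x  # pull the true class toward x
        W[y_pred] -= x  # push the wrongly predicted class away
    return y_pred


def softmax(scores):
    """Turn raw scores into probabilities (shifted for numerical stability)."""
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()


def logistic_update(W, x, y_true, lr=0.1, l2=0.0):
    """One gradient step on softmax cross-entropy with an optional L2 penalty."""
    probs = softmax(W @ x)
    target = np.zeros_like(probs)
    target[y_true] = 1.0
    # Gradient of the loss w.r.t. W: (probs - one_hot) outer x, plus l2 * W.
    grad = np.outer(probs - target, x) + l2 * W
    W -= lr * grad
    return probs


# Usage on a single example with three classes and two features.
x = np.array([1.0, 2.0])
W_perc = np.zeros((3, 2))
W_log = np.zeros((3, 2))
print(perceptron_update(W_perc, x, y_true=2), W_perc)
print(logistic_update(W_log, x, y_true=2), W_log)
```

The contrast is visible in the code: the perceptron changes nothing when the prediction is already correct and otherwise moves two rows by the full feature vector, while the logistic step moves every row on every example, scaled by how far the predicted probabilities are from the one-hot target.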

## Demo
Train a multiclass perceptron and a multinomial logistic regression model on noisy blobs to compare accuracy and per-class precision/recall.
The lecture video (https://youtu.be/EA627DC7k6M) motivates these algorithms; this demo shows them in code.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron, LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Three overlapping Gaussian blobs; the large cluster_std makes the classes noisy.
X, y = make_blobs(n_samples=600, centers=3, cluster_std=2.2, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=7, stratify=y
)

# Note: scikit-learn's Perceptron handles multiclass targets with a
# one-vs-rest scheme (one binary perceptron per class).
perceptron = Perceptron(max_iter=1000, tol=1e-3, random_state=7)
# With the lbfgs solver, scikit-learn fits a multinomial (softmax) model on
# multiclass targets by default; the explicit multi_class argument is
# deprecated in recent releases, so it is omitted here.
logistic = LogisticRegression(max_iter=1000, solver='lbfgs', random_state=7)

perceptron.fit(X_train, y_train)
logistic.fit(X_train, y_train)

y_pred_perc = perceptron.predict(X_test)
y_pred_log = logistic.predict(X_test)

print('Multiclass Perceptron')
print()
print(classification_report(y_test, y_pred_perc, digits=3))
print()
print('Multinomial Logistic Regression')
print()
print(classification_report(y_test, y_pred_log, digits=3))
```

```text
Multiclass Perceptron

              precision    recall  f1-score   support

           0      0.769     1.000     0.870        70
           1      1.000     0.700     0.824        70
           2      1.000     1.000     1.000        70

    accuracy                          0.900       210
   macro avg      0.923     0.900     0.898       210
weighted avg      0.923     0.900     0.898       210


Multinomial Logistic Regression

              precision    recall  f1-score   support

           0      0.956     0.929     0.942        70
           1      0.931     0.957     0.944        70
           2      1.000     1.000     1.000        70

    accuracy                          0.962       210
   macro avg      0.962     0.962     0.962       210
weighted avg      0.962     0.962     0.962       210
```

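The reported figures suggest where the perceptron goes wrong: a recall of 0.700 on class 1 with support 70 means 49 correct predictions, and a precision of 0.769 on class 0 is consistent with the remaining 21 class 1 points being labelled as class 0 (70 / 91 ≈ 0.769). A confusion matrix makes this kind of error pattern explicit. The sketch below refits the same models on the same split and prints the matrices along with raw perceptron scores and logistic probabilities for a few test points. It assumes a scikit-learn version in which `LogisticRegression` with the default `lbfgs` solver fits a multinomial model on multiclass targets; the variable names `cm_perc`, `cm_log`, `scores`, and `proba` are introduced here for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Same data and split as the demo above.
X, y = make_blobs(n_samples=600, centers=3, cluster_std=2.2, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=7, stratify=y
)

perceptron = Perceptron(max_iter=1000, tol=1e-3, random_state=7).fit(X_train, y_train)
logistic = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm_perc = confusion_matrix(y_test, perceptron.predict(X_test))
cm_log = confusion_matrix(y_test, logistic.predict(X_test))
print("Perceptron confusion matrix:\n", cm_perc)
print("Logistic regression confusion matrix:\n", cm_log)

# The perceptron exposes unnormalised scores; logistic regression exposes
# probabilities that sum to 1 across classes.
scores = perceptron.decision_function(X_test[:5])
proba = logistic.predict_proba(X_test[:5])
print("Perceptron scores:\n", scores.round(2))
print("Logistic probabilities:\n", proba.round(3))
```

Exact numbers may vary across scikit-learn versions, so treat the printed matrices as something to inspect rather than as fixed expected output.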
## Try it
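- Modify the demo: vary `cluster_std` or `test_size` and watch how accuracy and per-class precision/recall shift for each model.
- Add a tiny dataset or counter-example, such as a handful of points that no linear boundary can separate, and check how each model behaves on it.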

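One possible starting point for a counter-example is the XOR pattern, which no linear boundary can separate. The sketch below is an illustration under assumptions rather than a prescribed solution: it uses a hand-written multiclass perceptron loop (the names `X_bias`, `W`, and `mistakes_per_epoch` are introduced here), appends a constant feature as a bias term, and runs for an arbitrary 20 epochs before fitting scikit-learn's `LogisticRegression` with default settings on the same four points.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR-style points: no linear boundary separates the two labels.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([0, 0, 1, 1])

# Append a constant 1 so each class weight vector includes a bias term.
X_bias = np.hstack([X, np.ones((len(X), 1))])
W = np.zeros((2, X_bias.shape[1]))  # one weight row per class

mistakes_per_epoch = []
for epoch in range(20):
    mistakes = 0
    for x_i, y_i in zip(X_bias, y):
        y_hat = int(np.argmax(W @ x_i))
        if y_hat != y_i:
            W[y_i] += x_i    # reward the true class
            W[y_hat] -= x_i  # penalise the predicted class
            mistakes += 1
    mistakes_per_epoch.append(mistakes)

# Logistic regression on the same points settles on a single solution.
logistic = LogisticRegression().fit(X, y)
proba = logistic.predict_proba(X)

print("Perceptron mistakes per epoch:", mistakes_per_epoch)
print("Logistic probabilities:\n", proba.round(3))
```

Because the data are not separable, the perceptron makes at least one mistake in every epoch and its weights never settle. The regularised logistic model still converges, but with probabilities near 0.5 for every point, which signals that a linear model cannot tell these classes apart.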

## References
- [Eisenstein 4.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Multiclass lecture note](https://www.cs.utexas.edu/~gdurrett/courses/online-course/multiclass.pdf)
- [A large annotated corpus for learning natural language inference](https://www.aclweb.org/anthology/D15-1075/)
- [Authorship Attribution of Micro-Messages](https://www.aclweb.org/anthology/D13-1193/)
- [50 Years of Test (Un)fairness: Lessons for Machine Learning](https://arxiv.org/pdf/1811.10104.pdf)
- [[Article] Amazon scraps secret AI recruiting tool that showed bias against women](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G)
- [[Blog] Neural Networks, Manifolds, and Topology](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
- [Eisenstein Chapter 3.1-3.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Dropout: a simple way to prevent neural networks from overfitting](https://dl.acm.org/doi/10.5555/2627435.2670313)
- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
- [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)
- [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://papers.nips.cc/paper/2017/hash/81b3833e2504647f9d794f7d7b9bf341-Abstract.html)


*Links only; we do not redistribute slides or papers.*