# Multiclass Classification

- 📺 **Video:** [https://youtu.be/My6GaGhqxdI](https://youtu.be/My6GaGhqxdI)

## Overview
- Extend binary linear models to multiclass problems using one-vs-rest, one-vs-one, and softmax formulations.
- Compare the modeling and computational trade-offs between these strategies.

## Key ideas
- **One-vs-rest:** fit K binary classifiers and choose the argmax score.
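- **One-vs-one:** train one classifier per class pair, K(K-1)/2 in total, and predict by majority vote. Each model sees only two classes' data, which helps when the base learner scales poorly with training-set size, but the number of models grows quadratically with K.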
- **Softmax regression:** a single model shares features and normalizes scores into probabilities.
- **Evaluation:** macro-averaged metrics weight every class equally, so performance on minority classes counts as much as performance on majority classes.
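
The ideas above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's implementation: the score vector is made up, and the helper names `softmax` and `n_binary_classifiers` are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Normalize a vector of K class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

def n_binary_classifiers(n_classes):
    """Number of binary models each reduction strategy trains."""
    return {
        "one_vs_rest": n_classes,
        "one_vs_one": n_classes * (n_classes - 1) // 2,
    }

# Hypothetical scores for K = 4 classes on a single example.
scores = np.array([2.0, 0.5, -1.0, 0.1])
probs = softmax(scores)

# One-vs-rest and softmax both predict the class with the highest score.
predicted_class = int(np.argmax(scores))
counts = n_binary_classifiers(4)

print(predicted_class, probs.round(3), counts)
```

With K = 4, one-vs-rest trains 4 binary models and one-vs-one trains 6. Softmax does not change the argmax, because it preserves the ordering of the scores; it only rescales them into a distribution.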

## Demo
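Contrast a one-vs-rest linear SVM with multinomial logistic regression on the same synthetic dataset by comparing their per-class precision, recall, and F1, as discussed in the lecture (https://youtu.be/SbgP-3NBp58). Both models are scored on the data they were trained on, so the numbers describe fit rather than generalization.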

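```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# Four overlapping Gaussian blobs, 150 points per class.
X, y = make_blobs(n_samples=600, centers=4, cluster_std=2.3, random_state=9)

# One-vs-rest: one binary LinearSVC per class, prediction by argmax of the margins.
ovr = OneVsRestClassifier(LinearSVC(max_iter=2000, dual=False))
ovr.fit(X, y)

# With the lbfgs solver, LogisticRegression fits a multinomial (softmax) model
# for multiclass targets by default; the `multi_class` argument is deprecated
# in recent scikit-learn releases, so it is omitted here.
softmax = LogisticRegression(max_iter=2000, solver='lbfgs')
softmax.fit(X, y)

# Both reports are computed on the training data.
print('One-vs-rest LinearSVC')
print()
print(classification_report(y, ovr.predict(X), digits=3))
print()
print('Multinomial logistic regression')
print()
print(classification_report(y, softmax.predict(X), digits=3))
```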


```text
One-vs-rest LinearSVC

              precision    recall  f1-score   support

           0      0.854     0.933     0.892       150
           1      0.667     0.787     0.722       150
           2      0.740     0.740     0.740       150
           3      0.661     0.480     0.556       150

    accuracy                          0.735       600
   macro avg      0.730     0.735     0.727       600
weighted avg      0.730     0.735     0.727       600


Multinomial logistic regression

              precision    recall  f1-score   support

           0      0.907     0.913     0.910       150
           1      0.732     0.727     0.729       150
           2      0.789     0.800     0.795       150
           3      0.669     0.660     0.664       150

    accuracy                          0.775       600
   macro avg      0.774     0.775     0.775       600
weighted avg      0.774     0.775     0.775       600
```
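
The reports compare hard predictions only. To look at the scores each model produces, a sketch along the following lines may help. It assumes the same data settings as the demo, and the variable names `margins` and `probs` are illustrative.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Same synthetic data as the demo.
X, y = make_blobs(n_samples=600, centers=4, cluster_std=2.3, random_state=9)

ovr = OneVsRestClassifier(LinearSVC(max_iter=2000, dual=False)).fit(X, y)
softmax = LogisticRegression(max_iter=2000).fit(X, y)

# One-vs-rest LinearSVC: one unnormalized margin per class, no probabilities.
margins = ovr.decision_function(X[:3])

# Multinomial logistic regression: each row is a distribution over the 4 classes.
probs = softmax.predict_proba(X[:3])

print(margins.round(2))
print(probs.round(3))
print(probs.sum(axis=1))  # each row sums to 1 up to floating-point error
```

The margins from the separately trained binary SVMs are not calibrated against each other, whereas the softmax rows are normalized jointly across classes.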





## Try it
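- Modify the demo: change `cluster_std` or `centers` in `make_blobs` and see how the gap between the two models shifts.
- Add a tiny dataset or counter-example, such as an imbalanced label set where macro and weighted averages disagree.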
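
One possible starting point for a counter-example, tied to the evaluation point under Key ideas, is a hand-made imbalanced label set. The labels and the always-majority "classifier" below are hypothetical and exist only to show how the two averages diverge.

```python
from sklearn.metrics import f1_score

# Hypothetical imbalanced labels: 8 examples of class 0, 2 of class 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 10

# zero_division=0 scores the never-predicted class as 0 without a warning.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print(f"macro F1:    {macro_f1:.3f}")
print(f"weighted F1: {weighted_f1:.3f}")
```

Class 0 gets an F1 of about 0.889 and class 1 gets 0, so the macro average is about 0.444 while the support-weighted average is about 0.711. The macro figure exposes the failure on the minority class that the weighted figure partly hides.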


## References
- [Eisenstein 4.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Multiclass lecture note](https://www.cs.utexas.edu/~gdurrett/courses/online-course/multiclass.pdf)
- [A large annotated corpus for learning natural language inference](https://www.aclweb.org/anthology/D15-1075/)
- [Authorship Attribution of Micro-Messages](https://www.aclweb.org/anthology/D13-1193/)
- [50 Years of Test (Un)fairness: Lessons for Machine Learning](https://arxiv.org/pdf/1811.10104.pdf)
- [[Article] Amazon scraps secret AI recruiting tool that showed bias against women](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G)
- [[Blog] Neural Networks, Manifolds, and Topology](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
- [Eisenstein Chapter 3.1-3.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Dropout: a simple way to prevent neural networks from overfitting](https://dl.acm.org/doi/10.5555/2627435.2670313)
- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
- [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)
- [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://papers.nips.cc/paper/2017/hash/81b3833e2504647f9d794f7d7b9bf341-Abstract.html)


*Links only; we do not redistribute slides or papers.*