# Multiclass Classification Examples

- 📺 **Video:** [https://youtu.be/va2i7LXt9zI](https://youtu.be/va2i7LXt9zI)

## Overview
- Survey real-world multiclass NLP setups such as intent detection or topic tagging.
- Understand evaluation beyond accuracy, including per-class precision/recall.
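To see why evaluation has to go beyond accuracy, here is a minimal counter-example using hypothetical toy labels (not from the demo): a degenerate classifier that always predicts the majority class. It is a sketch using scikit-learn's `accuracy_score` and `f1_score`.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical toy labels: 8 examples of class 0 and 2 of the rare class 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Degenerate classifier: always predict the majority class.
y_pred = [0] * len(y_true)

acc = accuracy_score(y_true, y_pred)
# zero_division=0 scores class 1's undefined precision (it is never predicted) as 0.
weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"accuracy    = {acc:.3f}")          # 0.800
print(f"weighted F1 = {weighted_f1:.3f}")  # 0.711
print(f"macro F1    = {macro_f1:.3f}")     # 0.444
```

Accuracy and the support-weighted F1 look respectable, while the macro F1 is pulled down by the class the model never gets right.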

## Key ideas
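- **Label sets:** each example has exactly one gold label drawn from a fixed set of mutually exclusive classes (unlike multilabel tasks, where an example can carry several labels).
- **One-vs-rest vs. multinomial:** one-vs-rest trains an independent binary classifier per class (that class vs. all others) and predicts the highest-scoring class; multinomial (softmax) regression fits all class weight vectors jointly, so the class probabilities sum to 1 by construction.
- **Class imbalance:** accuracy and support-weighted averages are dominated by frequent classes; macro-averaging gives every class equal weight, so poor performance on rare classes stays visible.
- **Error inspection:** a confusion matrix counts, for each true class, how often each class was predicted, showing which pairs of labels the model confuses.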
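A minimal sketch contrasting the two training strategies in scikit-learn, reusing the demo's synthetic data. Wrapping a binary `LogisticRegression` in `OneVsRestClassifier` is one way to get one-vs-rest, not the only one; with the default `lbfgs` solver, a plain `LogisticRegression` fits a multinomial model when there are more than two classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Same synthetic setup as the demo: 4 imbalanced classes, 6 features.
X, y = make_classification(
    n_samples=500, n_features=6, n_classes=4, n_informative=6, n_redundant=0,
    class_sep=1.5, weights=[0.4, 0.3, 0.2, 0.1], random_state=4
)

# One-vs-rest: four independent binary logistic regressions, one per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=2000)).fit(X, y)

# Multinomial: one model whose four weight vectors are fit jointly via softmax.
softmax = LogisticRegression(max_iter=2000).fit(X, y)

print("OvR binary models:", len(ovr.estimators_))        # 4
print("Multinomial coef_ shape:", softmax.coef_.shape)   # (4, 6)

# Each OvR model outputs its own sigmoid score; these raw scores need not sum to 1,
# so OneVsRestClassifier.predict_proba rescales them. Softmax sums to 1 directly.
raw_ovr = np.column_stack([est.predict_proba(X)[:, 1] for est in ovr.estimators_])
print("Raw OvR score sums (first 3 rows):", raw_ovr[:3].sum(axis=1).round(3))
print("Softmax sums (first 3 rows):", softmax.predict_proba(X)[:3].sum(axis=1).round(3))

# How often the two strategies pick the same label on this data.
agree = np.mean(ovr.predict(X) == softmax.predict(X))
print(f"Prediction agreement: {agree:.3f}")
```

On well-separated data the two strategies usually agree on most predictions; they differ mainly in how the per-class scores are calibrated against each other.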

## Demo
Generate a synthetic 4-class dataset, train a multinomial logistic regression model, and inspect per-class metrics to connect to the lecture (https://youtu.be/JUdxV9C0VGA).

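```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
import pandas as pd

# Synthetic 4-class data with a 40/30/20/10 class imbalance.
X, y = make_classification(
    n_samples=500, n_features=6, n_classes=4, n_informative=6, n_redundant=0,
    class_sep=1.5, weights=[0.4, 0.3, 0.2, 0.1], random_state=4
)
# With the lbfgs solver, LogisticRegression fits a multinomial (softmax) model
# when there are more than two classes. The multi_class argument is deprecated
# as of scikit-learn 1.5, so it is omitted here.
model = LogisticRegression(max_iter=2000, solver='lbfgs')
model.fit(X, y)

# Metrics are computed on the training data itself, so they are optimistic.
preds = model.predict(X)
print(classification_report(y, preds, digits=3))
print()
# Rows = true class, columns = predicted class.
print('Confusion matrix:')
print(pd.DataFrame(confusion_matrix(y, preds)))
```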
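
Output:

```text
              precision    recall  f1-score   support

           0      0.771     0.873     0.819       197
           1      0.728     0.776     0.752       152
           2      0.690     0.588     0.635       102
           3      0.679     0.388     0.494        49

    accuracy                          0.738       500
   macro avg      0.717     0.656     0.675       500
weighted avg      0.733     0.738     0.729       500


Confusion matrix:
     0    1   2   3
0  172   10  13   2
1   17  118  14   3
2   21   17  60   4
3   13   17   0  19
```

In scikit-learn's `confusion_matrix`, rows are true classes and columns are predicted classes. The rarest class (3, with 49 examples) has a recall of only 0.388, and the macro-averaged F1 (0.675) falls below the support-weighted F1 (0.729) because macro-averaging gives that class the same weight as class 0. The largest off-diagonal cell shows 21 class-2 examples predicted as class 0. All of these numbers are measured on the training data.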
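The averaged rows of the report can be reproduced by hand from the confusion matrix above. This NumPy sketch copies the printed counts, derives per-class precision, recall, and F1, and then locates the most frequent confusion.

```python
import numpy as np

# Confusion matrix copied from the demo output
# (rows = true class, columns = predicted class).
cm = np.array([
    [172,  10, 13,  2],
    [ 17, 118, 14,  3],
    [ 21,  17, 60,  4],
    [ 13,  17,  0, 19],
])

tp = np.diag(cm)               # correct predictions per class
support = cm.sum(axis=1)       # true examples per class (row sums)
predicted = cm.sum(axis=0)     # predictions per class (column sums)

precision = tp / predicted
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)

# Macro average: plain mean over classes, so class 3 counts as much as class 0.
print(f"macro    P={precision.mean():.3f} R={recall.mean():.3f} F1={f1.mean():.3f}")
# Weighted average: per-class scores weighted by support.
print(f"weighted F1={np.average(f1, weights=support):.3f}")
print(f"accuracy    {tp.sum() / cm.sum():.3f}")

# Error inspection: zero out the diagonal, then take the largest remaining cell.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_cls, pred_cls = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(f"most frequent confusion: true {true_cls} -> predicted {pred_cls} "
      f"({off_diag[true_cls, pred_cls]} examples)")
```

This prints macro P=0.717, R=0.656, F1=0.675, weighted F1=0.729, and accuracy 0.738, matching the report. It also flags the 21 class-2 examples predicted as class 0.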




## Try it
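- Modify the demo: change `weights`, `class_sep`, or `random_state` and watch how per-class recall and the macro averages respond.
- Add a tiny dataset or counter-example, for instance one where accuracy looks good but macro-F1 exposes a neglected class.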
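One possible modification, sketched under assumptions that are not in the original demo: evaluate on a stratified held-out split rather than the training data, and compare the default model with `class_weight="balanced"`. The 30% test size and `random_state=0` are arbitrary choices. Reweighting often trades some recall on frequent classes for recall on rare ones, but check the printed numbers rather than assuming it helps.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Same synthetic data as the demo.
X, y = make_classification(
    n_samples=500, n_features=6, n_classes=4, n_informative=6, n_redundant=0,
    class_sep=1.5, weights=[0.4, 0.3, 0.2, 0.1], random_state=4
)

# Hold out 30% for evaluation; stratify=y keeps the class mix similar in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

results = {}
for cw in (None, "balanced"):
    # class_weight="balanced" weights each class inversely to its training frequency.
    clf = LogisticRegression(max_iter=2000, class_weight=cw).fit(X_tr, y_tr)
    per_class = recall_score(y_te, clf.predict(X_te), average=None)
    results[str(cw)] = per_class
    print(f"class_weight={str(cw):<8} per-class recall={np.round(per_class, 3)} "
          f"macro recall={per_class.mean():.3f}")
```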


## References
- [Eisenstein 4.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Multiclass lecture note](https://www.cs.utexas.edu/~gdurrett/courses/online-course/multiclass.pdf)
- [A large annotated corpus for learning natural language inference](https://www.aclweb.org/anthology/D15-1075/)
- [Authorship Attribution of Micro-Messages](https://www.aclweb.org/anthology/D13-1193/)
- [50 Years of Test (Un)fairness: Lessons for Machine Learning](https://arxiv.org/pdf/1811.10104.pdf)
- [[Article] Amazon scraps secret AI recruiting tool that showed bias against women](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G)
- [[Blog] Neural Networks, Manifolds, and Topology](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
- [Eisenstein Chapter 3.1-3.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Dropout: a simple way to prevent neural networks from overfitting](https://dl.acm.org/doi/10.5555/2627435.2670313)
- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
- [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)
- [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://papers.nips.cc/paper/2017/hash/81b3833e2504647f9d794f7d7b9bf341-Abstract.html)


*Links only; we do not redistribute slides or papers.*