# Multiclass Classification Examples

- 📺 **Video:** [https://youtu.be/va2i7LXt9zI](https://youtu.be/va2i7LXt9zI)

## Overview
This video provides case studies of multiclass classification in NLP, illustrating the variety of tasks that go beyond binary decisions. One likely example is Natural Language Inference (NLI), where the task is to label a pair of sentences (a premise and a hypothesis) as “entailment”, “neutral”, or “contradiction”: a three-way classification. The video might describe the SNLI dataset (Bowman et al., 2015), which introduced large-scale data for this three-class problem (MultiNLI later extended it to more genres). Both show how multiclass classifiers are used in practice for natural language understanding (NLU) tasks.
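To make the three-way decision concrete, here is a minimal sketch of how a linear multiclass classifier might score one premise/hypothesis pair. It computes one score per label from a weight matrix, turns the scores into probabilities with a softmax, and predicts the highest-probability label. The feature vector and weight values are made-up assumptions for illustration, not learned from SNLI.

```python
import numpy as np

# The three NLI labels used by SNLI (Bowman et al., 2015).
LABELS = ["entailment", "neutral", "contradiction"]

def softmax(scores: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical feature vector f(x) for one premise/hypothesis pair,
# e.g. [bias, word-overlap ratio, hypothesis-contains-negation]; values are made up.
features = np.array([1.0, 0.8, 0.0])

# Hypothetical weight matrix W: one row of weights per class (3 classes x 3 features).
W = np.array([
    [0.1, 2.0, -1.0],   # entailment
    [0.2, 0.5, 0.0],    # neutral
    [0.0, -1.0, 2.5],   # contradiction
])

scores = W @ features                      # one score per class
probs = softmax(scores)                    # P(y | x) over the 3 labels
prediction = LABELS[int(np.argmax(probs))] # pick the most probable label

for label, p in zip(LABELS, probs):
    print(f"{label:>13}: {p:.3f}")
print("prediction:", prediction)  # prediction: entailment
```

A binary classifier needs only one weight vector. The multiclass version keeps one weight row per class, so adding a class adds a row rather than requiring a new decision procedure.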

```python
import os, random

random.seed(0)  # fix the random seed for reproducibility
CI = os.environ.get('CI') == 'true'  # True when running under continuous integration
```

## Key ideas
- Another example given is authorship attribution of tweets (Schwartz et al., 2013), which involves deciding which of many candidate authors wrote a tweet: a multiclass problem with potentially dozens of classes.
- By walking through these examples, the lecture highlights the considerations that arise with many classes. Evaluation often uses confusion matrices to see which classes are confused with each other. Raw accuracy must also be read against a baseline: with K classes, uniform guessing achieves only about 1/K accuracy, so a seemingly low accuracy can still be well above chance.
- The video might also mention class imbalance: if some authors have many more tweets than others, a classifier can become biased toward the frequent classes. Common strategies to handle this include resampling and class weighting.
- Through these real examples, students appreciate that multiclass classification is central to many NLP tasks (topic classification, language identification, etc.) and see how the theory is applied.
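The evaluation points above (chance baseline, majority-class baseline, confusion matrix) can be sketched with scikit-learn on synthetic data. The number of classes `K = 5` and the dataset sizes are arbitrary choices for illustration, not taken from the lecture.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

K = 5  # number of classes (an arbitrary choice for illustration)
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    n_classes=K, random_state=0,
)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

# Baselines: uniform guessing (about 1/K) and always predicting the most frequent class.
chance = 1.0 / K
majority = DummyClassifier(strategy="most_frequent").fit(Xtr, ytr)
clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
y_pred = clf.predict(Xte)

print(f"chance accuracy (1/K):   {chance:.3f}")
print(f"majority-class accuracy: {accuracy_score(yte, majority.predict(Xte)):.3f}")
print(f"logistic regression:     {accuracy_score(yte, y_pred):.3f}")

# Rows = true class, columns = predicted class.
# Large off-diagonal cells show which pairs of classes get confused.
cm = confusion_matrix(yte, y_pred, labels=np.arange(K))
print(cm)
```

Comparing against both baselines makes it clearer whether an accuracy figure reflects real learning or just the number and frequency of classes.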

## Demo

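```python
# Tiny multiclass logistic regression demo on synthetic data (3 classes)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# n_classes=3 makes this a multiclass problem; make_classification requires
# n_classes * n_clusters_per_class <= 2**n_informative, so n_informative is raised from its default of 2
X, y = make_classification(n_samples=800, n_features=10, n_informative=5, n_classes=3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)
# With the default lbfgs solver, LogisticRegression fits a multinomial (softmax) model over all classes
clf = LogisticRegression(max_iter=500).fit(Xtr, ytr)
print("Accuracy:", accuracy_score(yte, clf.predict(Xte)))
```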
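The class-imbalance point from the key ideas can be explored by extending the demo with skewed class proportions. This sketch compares an unweighted model with `class_weight="balanced"` and reports macro-averaged F1 alongside accuracy, since macro-F1 weights every class equally. The 80% / 15% / 5% proportions are an assumption for illustration. Resampling, the other strategy mentioned, is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Imbalanced 3-class data: roughly 80% / 15% / 5% of examples (assumed proportions).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    n_classes=3, weights=[0.80, 0.15, 0.05], random_state=0,
)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

results = {}
for weighting in (None, "balanced"):
    # class_weight="balanced" weights each class by n_samples / (n_classes * count(class)),
    # so errors on rare classes cost more during training.
    clf = LogisticRegression(max_iter=1000, class_weight=weighting).fit(Xtr, ytr)
    pred = clf.predict(Xte)
    acc = accuracy_score(yte, pred)
    macro_f1 = f1_score(yte, pred, average="macro")
    results[weighting] = (acc, macro_f1)
    print(f"class_weight={weighting!s:>8}: accuracy={acc:.3f}  macro-F1={macro_f1:.3f}")
```

Under imbalance, accuracy is dominated by the frequent class. Watching macro-F1 as well shows whether reweighting actually helps the rare classes.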


## Try it
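- Modify the demo (e.g., change `n_classes` or `n_informative`) and observe how accuracy changes.
- Add a tiny dataset of your own, or construct a counter-example where the classifier fails.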
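As one way to try the second suggestion, here is a sketch of a tiny language-identification task, one of the multiclass NLP tasks mentioned above. The nine training sentences are a hypothetical, hand-written toy dataset, far too small for a real system. The character n-gram features are one common choice for language ID, not necessarily the lecture's.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-written toy dataset for 3-way language identification.
train_texts = [
    "the cat is on the table", "where is the station", "i would like a coffee",
    "le chat est sur la table", "où est la gare", "je voudrais un café",
    "el gato está en la mesa", "dónde está la estación", "quisiera un café",
]
train_labels = ["en"] * 3 + ["fr"] * 3 + ["es"] * 3

# Character 1- to 3-grams taken within word boundaries ("char_wb"), then a multiclass logistic regression.
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

test_texts = ["the dog is in the house", "le chien est dans la maison", "el perro está en la casa"]
for text, label in zip(test_texts, model.predict(test_texts)):
    print(f"{label}: {text}")
```

Adding a fourth language only requires more labeled sentences. The pipeline and the classifier stay the same.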


## References
- [Eisenstein 4.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Multiclass lecture note](https://www.cs.utexas.edu/~gdurrett/courses/online-course/multiclass.pdf)
- [A large annotated corpus for learning natural language inference](https://www.aclweb.org/anthology/D15-1075/)
- [Authorship Attribution of Micro-Messages](https://www.aclweb.org/anthology/D13-1193/)
- [50 Years of Test (Un)fairness: Lessons for Machine Learning](https://arxiv.org/pdf/1811.10104.pdf)
- [Amazon scraps secret AI recruiting tool that showed bias against women](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G) (Reuters article)
- [Neural Networks, Manifolds, and Topology](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/) (blog post)
- [Eisenstein Chapter 3.1-3.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Dropout: a simple way to prevent neural networks from overfitting](https://dl.acm.org/doi/10.5555/2627435.2670313)
- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
- [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)
- [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://papers.nips.cc/paper/2017/hash/81b3833e2504647f9d794f7d7b9bf341-Abstract.html)


*Links only; we do not redistribute slides or papers.*