# Logistic Regression

- 📺 **Video:** [https://youtu.be/0naHFT07ja8](https://youtu.be/0naHFT07ja8)

## Overview
- Understand logistic regression as a probabilistic linear classifier trained by minimizing a convex loss.
- Learn how regularization and interpretability make it a strong baseline for NLP tasks.

## Key ideas
- **Sigmoid link:** maps any real-valued linear score to a probability in (0, 1).
- **Cross-entropy loss:** a convex objective in the weights, so gradient-based optimizers are not trapped in poor local minima.
- **Regularization:** L2 penalties shrink coefficient magnitudes and reduce overfitting.
- **Interpretability:** weights reveal how features push decisions toward positive or negative classes.
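
The first two ideas can be made concrete with a minimal sketch in plain Python. The helper names `sigmoid` and `cross_entropy`, along with the feature weights and bias, are illustrative assumptions and not values from the lecture.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one example with label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Hypothetical weights for a two-word vocabulary (assumed for illustration).
weights = {"wonderful": 1.2, "bland": -1.5}
bias = 0.1
features = {"wonderful": 1, "bland": 0}  # word counts for one review

# Linear score, then the sigmoid turns it into P(y = 1).
score = bias + sum(weights[f] * x for f, x in features.items())
p_pos = sigmoid(score)

print(f"score={score:.2f}  P(y=1)={p_pos:.3f}")
print(f"loss if the true label is 1: {cross_entropy(1, p_pos):.3f}")
print(f"loss if the true label is 0: {cross_entropy(0, p_pos):.3f}")
```

A confident prediction that turns out wrong is penalized far more than one that turns out right, which is what pushes the weights toward scores that match the labels.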

## Demo
Fit logistic regression on a sentiment dataset and inspect probabilities to see how the model expresses confidence, following the lecture (https://youtu.be/QqBt22E7U4c).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sentiment dataset: 1 = positive review, 0 = negative review.
texts = [
    'wonderful soundtrack and heartfelt acting',
    'unconvincing plot with clumsy twists',
    'sharp writing and charismatic leads',
    'slow pacing and lifeless direction',
    'the chemistry between leads sells it',
    'overly long and disappointingly bland'
]
labels = [1, 0, 1, 0, 1, 0]

# Bag-of-words counts feed a logistic regression (L2-regularized by default in scikit-learn).
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000, random_state=19))
model.fit(texts, labels)

# predict_proba returns [P(class 0), P(class 1)] per document; print P(positive).
# These are probabilities on the training texts themselves, not on held-out data.
proba = model.predict_proba(texts)
for doc, probs in zip(texts, proba):
    print(f"{probs[1]:.3f} -> {doc}")
```

```text
0.729 -> wonderful soundtrack and heartfelt acting
0.225 -> unconvincing plot with clumsy twists
0.751 -> sharp writing and charismatic leads
0.250 -> slow pacing and lifeless direction
0.796 -> the chemistry between leads sells it
0.250 -> overly long and disappointingly bland
```
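
To see regularization and interpretability at work on the same six reviews, the following is a from-scratch sketch under stated assumptions, not a description of what scikit-learn does internally. It uses whitespace tokenization (a simplification of `CountVectorizer`), plain batch gradient descent, and a hypothetical penalty strength `l2` that plays the opposite role to scikit-learn's `C`, which is the inverse of regularization strength. The function names `featurize` and `train` are also illustrative.

```python
import math
from collections import Counter

texts = [
    'wonderful soundtrack and heartfelt acting',
    'unconvincing plot with clumsy twists',
    'sharp writing and charismatic leads',
    'slow pacing and lifeless direction',
    'the chemistry between leads sells it',
    'overly long and disappointingly bland',
]
labels = [1, 0, 1, 0, 1, 0]


def featurize(text):
    """Bag-of-words counts from whitespace tokens (assumed simplification)."""
    return Counter(text.lower().split())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(docs, ys, l2=0.0, lr=0.1, epochs=500):
    """Batch gradient descent on mean cross-entropy plus (l2 / 2) * ||w||^2."""
    weights, bias = {}, 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, y in zip(docs, ys):
            score = bias + sum(weights.get(f, 0.0) * v for f, v in x.items())
            err = sigmoid(score) - y  # gradient of the loss w.r.t. the score
            grad_b += err / n
            for f, v in x.items():
                grad_w[f] += err * v / n
        for f in set(grad_w) | set(weights):
            # The L2 term adds l2 * w to the gradient, pulling w toward zero.
            g = grad_w[f] + l2 * weights.get(f, 0.0)
            weights[f] = weights.get(f, 0.0) - lr * g
        bias -= lr * grad_b  # the bias is left unpenalized
    return weights, bias


docs = [featurize(t) for t in texts]

# Stronger penalties should yield a smaller weight vector.
norms = {}
for l2 in (0.0, 0.1, 1.0):
    w, _ = train(docs, labels, l2=l2)
    norms[l2] = math.sqrt(sum(v * v for v in w.values()))
    print(f"l2={l2:<4} ||w||={norms[l2]:.3f}")

# Sorting the learned weights shows which words push toward each class.
w, b = train(docs, labels, l2=0.1)
ranked = sorted(w.items(), key=lambda kv: kv[1])
print("most negative:", [f for f, _ in ranked[:3]])
print("most positive:", [f for f, _ in ranked[-3:]])
```

With only six documents, most words occur once, so the ranking mainly reflects which review each word came from. On a realistic corpus the same inspection surfaces genuinely sentiment-bearing words.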


## Try it
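- Modify the demo: for example, change the regularization strength (`C` in `LogisticRegression`) and watch how the predicted probabilities move.
- Add a tiny dataset or a counter-example, such as a review with negation, and check whether the model's probabilities still make sense.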


## References
- [Eisenstein 2.0-2.5, 4.1-4.4.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Perceptron and logistic regression connections](https://www.cs.utexas.edu/~gdurrett/courses/online-course/perc-lr-connections.pdf)
- [Thumbs up? Sentiment Classification using Machine Learning Techniques](https://www.aclweb.org/anthology/W02-1011/)
- [Baselines and Bigrams: Simple, Good Sentiment and Topic Classification](https://www.aclweb.org/anthology/P12-2018/)
- [Convolutional Neural Networks for Sentence Classification](https://www.aclweb.org/anthology/D14-1181/)
- [NLP Progress on Sentiment Analysis (GitHub)](https://github.com/sebastianruder/NLP-progress/blob/master/english/sentiment_analysis.md)


*Links only; we do not redistribute slides or papers.*