# Sentiment Analysis

- 📺 **Video:** [https://youtu.be/cKbnEmjxnOY](https://youtu.be/cKbnEmjxnOY)

## Overview
- Frame sentiment analysis as supervised document classification.
- Explore how labeled corpora enable rapid prototyping of usable models.

## Key ideas
- **Label design:** binary, ternary, and fine-grained sentiment labels all require consistent annotation guidelines.
- **Lexical cues:** words like 'excellent' or 'boring' provide strong signals even in simple models.
- **Evaluation:** per-class precision and recall expose problems that overall accuracy can hide when classes are imbalanced.
- **Error analysis:** inspect misclassified reviews to motivate richer features or models.
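
To make the evaluation point concrete, the sketch below computes accuracy and per-class precision and recall by hand for a hypothetical imbalanced test set (nine negative reviews, one positive) and a classifier that always predicts the majority class. The labels, predictions, and the helper name `precision_recall` are illustrative assumptions, not data from the lecture.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Convention assumed here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical imbalanced test set: nine negative reviews (0), one positive (1).
y_true = [0] * 9 + [1]
# Degenerate classifier that always predicts the majority class.
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")
for label in (0, 1):
    p, r = precision_recall(y_true, y_pred, positive=label)
    print(f"class {label}: precision = {p:.2f}, recall = {r:.2f}")
```

Accuracy comes out at 0.90 even though the classifier never finds the positive review: recall for class 1 is 0, which is the signal that accuracy alone would hide.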

## Demo
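Train and evaluate a sentiment classifier on a small review dataset, following the lecture's discussion of iteration and evaluation (https://youtu.be/WEkRCbUwqcs). With only eight reviews, the held-out set contains two examples, so the resulting report is illustrative rather than a reliable estimate of model quality.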

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Eight short reviews, alternating positive (1) and negative (0).
texts = [
    'What a delightful surprise',
    'An absolute snooze fest',
    'Smart script and compelling cast',
    'Predictable drama with clichés',
    'Energetic, heartfelt storytelling',
    'Messy pacing and awkward humor',
    'A crowd-pleasing feel-good movie',
    'Fails to deliver on its premise'
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Hold out 25% (two reviews), stratified so each class appears once in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=11, stratify=labels
)

# Bag-of-words counts feeding a logistic regression classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000, random_state=11))
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test), digits=3))
```


```text
              precision    recall  f1-score   support

           0      0.000     0.000     0.000       1.0
           1      0.000     0.000     0.000       1.0

    accuracy                          0.000       2.0
   macro avg      0.000     0.000     0.000       2.0
weighted avg      0.000     0.000     0.000       2.0
```
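
One way to start the error analysis on a dataset this small is to check how much vocabulary the reviews share. The sketch below does not depend on the train/test split: for each demo review it lists the tokens that also occur in at least one other review. The regular expression only approximates the default `CountVectorizer` tokenization (lowercased runs of two or more word characters), and the helper names `tokenize` and `shared_tokens` are assumptions for illustration.

```python
import re

texts = [
    'What a delightful surprise',
    'An absolute snooze fest',
    'Smart script and compelling cast',
    'Predictable drama with clichés',
    'Energetic, heartfelt storytelling',
    'Messy pacing and awkward humor',
    'A crowd-pleasing feel-good movie',
    'Fails to deliver on its premise'
]


def tokenize(text):
    """Approximate CountVectorizer's default: lowercase, tokens of 2+ word characters."""
    return set(re.findall(r"\b\w\w+\b", text.lower()))


def shared_tokens(texts):
    """Map each review index to the tokens it shares with any other review."""
    token_sets = [tokenize(t) for t in texts]
    shared = {}
    for i, tokens in enumerate(token_sets):
        # Union of the vocabularies of all reviews except review i.
        others = set().union(*(s for j, s in enumerate(token_sets) if j != i))
        shared[i] = tokens & others
    return shared


for i, overlap in shared_tokens(texts).items():
    print(i, sorted(overlap), "|", texts[i])
```

The only token that any two reviews share is `and`, so whichever reviews end up in the test set, the model has seen almost none of their words. In that situation predictions rest mostly on the intercept, and scores on two held-out examples say little about the approach itself. More data, or cross-validation, would likely give a more informative estimate.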



## Try it
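- Modify the demo: change the vectorizer, the classifier, or the train/test split, and compare the classification reports.
- Add a tiny dataset or a counter-example and check whether the model handles it.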
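
As one possible starting point for a counter-example, the sketch below scores text with a tiny hand-written lexicon (a stand-in for the lexical cues a bag-of-words model picks up) and includes a negated phrase that it gets wrong. The lexicon, its weights, and the function `lexicon_score` are hypothetical and not part of the demo.

```python
# Hypothetical polarity lexicon; the words and weights are illustrative only.
LEXICON = {"excellent": 1, "delightful": 1, "boring": -1, "predictable": -1}


def lexicon_score(text):
    """Sum the polarity of known words; a positive total suggests positive sentiment."""
    tokens = text.lower().replace(",", " ").split()
    return sum(LEXICON.get(token, 0) for token in tokens)


examples = [
    "An excellent, delightful film",
    "Boring and predictable",
    "Not excellent at all",  # counter-example: negation flips the meaning
]
for text in examples:
    print(f"{lexicon_score(text):+d}  {text}")
```

The third example receives a positive score although the review is negative, because summing unigram cues ignores word order. Adding cases like this to the demo data, or trying `CountVectorizer(ngram_range=(1, 2))` so that bigrams such as "not excellent" become features, is one way to probe where the classifier breaks.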


## References
- [Eisenstein 2.0-2.5, 4.1-4.4.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Perceptron and logistic regression connections](https://www.cs.utexas.edu/~gdurrett/courses/online-course/perc-lr-connections.pdf)
- [Thumbs up? Sentiment Classification using Machine Learning Techniques](https://www.aclweb.org/anthology/W02-1011/)
- [Baselines and Bigrams: Simple, Good Sentiment and Topic Classification](https://www.aclweb.org/anthology/P12-2018/)
- [Convolutional Neural Networks for Sentence Classification](https://www.aclweb.org/anthology/D14-1181/)
- [[GitHub] NLP Progress on Sentiment Analysis](https://github.com/sebastianruder/NLP-progress/blob/master/english/sentiment_analysis.md)


*Links only; we do not redistribute slides or papers.*