# Course Preview

- 📺 **Video:** [https://youtu.be/Mz8-LTednt4](https://youtu.be/Mz8-LTednt4)

## Overview
- Get a tour of the workflow we will revisit: representing text, training linear models, and evaluating NLP systems.
- Preview how each module builds from bag-of-words baselines toward neural architectures and responsible deployment.

## Key ideas
- **Feature pipelines:** combine tokenization and vectorization to turn raw text into numerical features.
- **Linear classifiers:** logistic regression offers a strong baseline for many classification problems.
- **Evaluation loop:** hold-out validation provides early feedback on whether features generalize.
- **Iteration:** small experiments guide which modeling ideas deserve deeper investment later in the course.
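
To make the first two ideas concrete, here is a minimal, dependency-free sketch of a bag-of-words featurizer feeding a logistic score. It is a simplified stand-in for illustration, not scikit-learn's implementation: the helper names (`tokenize`, `build_vocabulary`, `vectorize`, `predict_proba`), the three example documents, and the hand-picked weights are all assumptions made for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())

def build_vocabulary(docs):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(doc, vocab):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(doc))
    return [counts.get(tok, 0) for tok in vocab]  # dicts keep insertion order

# Hypothetical toy documents, not part of the course data.
docs = ["Good movie", "Bad movie", "Good, good plot"]
vocab = build_vocabulary(docs)
matrix = [vectorize(doc, vocab) for doc in docs]

# Hypothetical hand-picked weights, one per vocabulary entry.
# A trained logistic regression would learn these from labeled data.
weights = {"bad": -1.5, "good": 1.5, "movie": 0.0, "plot": 0.0}
bias = 0.0

def predict_proba(doc):
    """Probability of the positive class: sigmoid of a weighted sum of counts."""
    x = vectorize(doc, vocab)
    score = bias + sum(weights[tok] * count for tok, count in zip(vocab, x))
    return 1.0 / (1.0 + math.exp(-score))

print(list(vocab))   # ['bad', 'good', 'movie', 'plot']
print(matrix)        # [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
print(predict_proba("Good movie") > 0.5)  # True
```

Each document becomes a row of counts over a shared vocabulary, and the classifier is a weighted sum of those counts passed through a sigmoid. In practice the library components used in the demo handle both steps.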

## Demo
Train a bag-of-words logistic regression baseline on a toy sentiment dataset to illustrate the course pipeline end to end.
Use the lecture video (https://youtu.be/DGPJc93HJJo) to dive into what each stage covers in more depth.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# Toy sentiment dataset: 1 = positive, 0 = negative.
texts = [
    'Loved the crisp dialogue and pacing',
    'The plot was boring and predictable',
    'Fantastic soundtrack with great acting',
    'Terrible effects and wooden performances',
    'A delightful, funny story',
    'I will never watch this again'
]
labels = [1, 0, 1, 0, 1, 0]

# Turn each text into word counts, then fit a logistic regression on those counts.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000, random_state=2))
model.fit(texts, labels)

# Predictions are made on the training texts themselves, so the report
# measures fit to the training data, not generalization to new text.
pred = model.predict(texts)
print(classification_report(labels, pred, digits=3))
```

```text
              precision    recall  f1-score   support

           0      1.000     1.000     1.000         3
           1      1.000     1.000     1.000         3

    accuracy                          1.000         6
   macro avg      1.000     1.000     1.000         6
weighted avg      1.000     1.000     1.000         6
```

## Try it
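- Modify the demo, for example by changing a `CountVectorizer` option such as `ngram_range=(1, 2)`, and check how the report changes.
- Add a tiny dataset or a counter-example, such as a sentence with negation, and see whether the baseline still classifies it correctly.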
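
One possible modification, sketched below, is to evaluate on held-out text instead of the training set, in line with the evaluation loop described under Key ideas. The split size, the `random_state` value, and the variable names are assumptions chosen for this example. With only six sentences the held-out score is very noisy, so treat the number as an illustration of the procedure and not as a meaningful result.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Same toy dataset as the demo: 1 = positive, 0 = negative.
texts = [
    'Loved the crisp dialogue and pacing',
    'The plot was boring and predictable',
    'Fantastic soundtrack with great acting',
    'Terrible effects and wooden performances',
    'A delightful, funny story',
    'I will never watch this again'
]
labels = [1, 0, 1, 0, 1, 0]

# Hold out two examples (one per class, via stratification) for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=2, stratify=labels, random_state=0
)

# Fit the vectorizer and classifier on the training portion only,
# so the held-out texts play no part in building the vocabulary.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
held_out_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy:    {train_acc:.3f}")
print(f"held-out accuracy: {held_out_acc:.3f}")
```

Comparing the two numbers shows how far training-set performance can overstate what the model does on unseen text. Many words in the held-out sentences never appear in the training vocabulary, so the vectorizer ignores them.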


## References
- [Eisenstein 2.0-2.5, 4.2-4.4.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Perceptron and logistic regression](https://www.cs.utexas.edu/~gdurrett/courses/online-course/perc-lr-connections.pdf)
- [Eisenstein 4.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Thumbs up? Sentiment Classification using Machine Learning Techniques](https://www.aclweb.org/anthology/W02-1011/)
- [Baselines and Bigrams: Simple, Good Sentiment and Topic Classification](https://www.aclweb.org/anthology/P12-2018/)
- [Convolutional Neural Networks for Sentence Classification](https://www.aclweb.org/anthology/D14-1181/)
- [[GitHub] NLP Progress on Sentiment Analysis](https://github.com/sebastianruder/NLP-progress/blob/master/english/sentiment_analysis.md)


*Links only; we do not redistribute slides or papers.*