# Sentiment Analysis

This notebook illustrates three methods for sentiment analysis:

1. Rule-based with TextBlob  
2. Machine-learning with scikit-learn (Naive Bayes)  
3. Transformer-based with Hugging Face


In [1]:
# Install dependencies (run once)
!pip install textblob scikit-learn transformers torch
!python -m textblob.download_corpora



## 1) Rule-based Sentiment with TextBlob

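Uses a lexicon-based approach: TextBlob's default analyzer looks up words in a sentiment lexicon and returns a polarity in [-1.0, 1.0] (negative to positive) and a subjectivity in [0.0, 1.0] (objective to subjective). No training is required.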


In [2]:
from textblob import TextBlob

def sentiment_textblob(text: str):
    blob = TextBlob(text)
    return {
        "polarity": blob.sentiment.polarity,
        "subjectivity": blob.sentiment.subjectivity
    }

# Test
sample = "I love using Jupyter notebooks!"
print("TextBlob result:", sentiment_textblob(sample))


TextBlob result: {'polarity': 0.625, 'subjectivity': 0.6}
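
TextBlob returns continuous scores rather than a class label, so comparing it with the two classifiers in the following sections needs a mapping step. A minimal sketch, assuming a symmetric cutoff around zero; the function name `polarity_to_label` and the `threshold` default of 0.1 are illustrative choices, not part of TextBlob:

```python
def polarity_to_label(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    The threshold is a hypothetical cutoff; tune it on labeled data.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must lie in [-1, 1], got {polarity}")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# The polarity reported for the sample sentence above
print(polarity_to_label(0.625))
```

A wider threshold sends more borderline sentences to "neutral"; the right value depends on the data and should be checked against labeled examples.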


## 2) Machine-Learning Sentiment with scikit-learn

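A tiny toy dataset and a Naive Bayes pipeline (bag-of-words counts fed into `MultinomialNB`), trained on the fly. With only six examples, the model demonstrates the workflow rather than a usable classifier.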


In [3]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data
TRAIN_TEXTS = [
    "I love this product, it’s amazing!",
    "That was the worst experience ever.",
    "Absolutely fantastic service.",
    "I hate this. So bad.",
    "Very happy with the results!",
    "Terrible, I will never buy again."
]
TRAIN_LABELS = ["positive", "negative", "positive", "negative", "positive", "negative"]

# Train pipeline
ml_pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
ml_pipeline.fit(TRAIN_TEXTS, TRAIN_LABELS)

def sentiment_naive_bayes(text: str):
    # Cast NumPy scalars to built-in types so the result prints cleanly
    label = str(ml_pipeline.predict([text])[0])
    scores = ml_pipeline.predict_proba([text])[0]
    probs = {str(c): float(p) for c, p in zip(ml_pipeline.classes_, scores)}
    return {"label": label, "probabilities": probs}

# Test
print("Naive Bayes result:", sentiment_naive_bayes(sample))


Naive Bayes result: {'label': 'positive', 'probabilities': {'negative': 0.32231404958677673, 'positive': 0.6776859504132231}}
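
The probabilities above can be reproduced by hand, which shows what the pipeline is doing. The sketch below re-implements multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python, assuming the scikit-learn defaults (lowercasing, tokens of two or more word characters, `alpha=1.0`, priors estimated from the training labels). The helper names `tokenize` and `posterior` are illustrative, not scikit-learn API:

```python
import re
from collections import Counter

TRAIN = [
    ("I love this product, it’s amazing!", "positive"),
    ("That was the worst experience ever.", "negative"),
    ("Absolutely fantastic service.", "positive"),
    ("I hate this. So bad.", "negative"),
    ("Very happy with the results!", "positive"),
    ("Terrible, I will never buy again.", "negative"),
]


def tokenize(text):
    # Mirrors the CountVectorizer defaults: lowercase, keep tokens of 2+ word characters
    return re.findall(r"\b\w\w+\b", text.lower())


# Per-class word counts and the shared vocabulary
counts = {"positive": Counter(), "negative": Counter()}
for text, label in TRAIN:
    counts[label].update(tokenize(text))
vocab = set(counts["positive"]) | set(counts["negative"])


def posterior(text, alpha=1.0):
    scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        # Class prior: fraction of training documents carrying this label
        score = sum(1 for _, l in TRAIN if l == label) / len(TRAIN)
        for token in tokenize(text):
            if token in vocab:  # words unseen in training contribute nothing
                score *= (word_counts[token] + alpha) / (total + alpha * len(vocab))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


probs = posterior("I love using Jupyter notebooks!")
print(probs)
```

Only "love" from the sample sentence appears in the training vocabulary, so the whole prediction rests on one word: it occurs once among 13 positive tokens and never among 15 negative tokens, and with 26 vocabulary entries the smoothed ratio works out to 82/121 ≈ 0.678 for the positive class.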


## 3) Transformer-based Sentiment with Hugging Face

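Calling `pipeline("sentiment-analysis")` without a model name loads `distilbert-base-uncased-finetuned-sst-2-english`, a DistilBERT checkpoint fine-tuned on the SST-2 sentiment dataset.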


In [4]:
from transformers import pipeline

# Load once; the model weights are downloaded on first use and then cached
hf_pipeline = pipeline("sentiment-analysis")

def sentiment_transformer(text: str):
    return hf_pipeline(text)[0]

# Test
print("Transformer result:", sentiment_transformer(sample))


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


Transformer result: {'label': 'POSITIVE', 'score': 0.9187505841255188}
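
The `score` field is the probability assigned to the predicted label. For a two-class classification head like this one, the pipeline obtains it by applying softmax to the model's raw logits and reporting the largest entry. A sketch in plain Python; the logit values are made up for illustration and are not the model's actual output for the sample sentence:

```python
import math


def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ["NEGATIVE", "POSITIVE"]  # label order assumed for the SST-2 checkpoint
logits = [-1.2, 1.3]               # hypothetical raw model outputs

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
result = {"label": LABELS[best], "score": probs[best]}
print(result)
```

Because the two probabilities sum to 1, a score near 0.5 signals an uncertain prediction even though a label is always returned.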


## Summary

| Method           | Output                                  | Pros                                     | Cons                                                     |
|------------------|-----------------------------------------|------------------------------------------|----------------------------------------------------------|
| TextBlob         | polarity & subjectivity scores          | Very quick, no training                  | Fixed lexicon; weak on context and sarcasm               |
| Naive Bayes      | label + class probabilities             | Lightweight, interpretable               | Needs labeled data; six toy examples will not generalize |
| Transformer      | label + confidence score                | Typically the most accurate of the three | Heavy compute, 268 MB model download                     |




### Next: Try different datasets and different Hugging Face models!
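
When trying other datasets, a small accuracy helper makes the three methods comparable on the same labeled examples. The sketch below is one way to do it; `accuracy`, `HELD_OUT`, and `keyword_baseline` are hypothetical names, and the four examples are invented placeholders rather than real evaluation data:

```python
from typing import Callable, Iterable, Tuple


def accuracy(predict: Callable[[str], str],
             examples: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (text, gold_label) pairs that `predict` labels correctly."""
    examples = list(examples)
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for text, gold in examples if predict(text) == gold)
    return correct / len(examples)


# Hypothetical held-out examples; replace with a real labeled dataset
HELD_OUT = [
    ("What a wonderful day", "positive"),
    ("This is awful", "negative"),
    ("I love it", "positive"),
    ("Awful and bad", "negative"),
]


def keyword_baseline(text: str) -> str:
    """Stand-in classifier so the sketch runs without any of the libraries."""
    lowered = text.lower()
    return "negative" if any(w in lowered for w in ("awful", "bad")) else "positive"


score = accuracy(keyword_baseline, HELD_OUT)
print(f"accuracy: {score:.2f}")
```

To score the notebook's own functions, pass a wrapper that returns a lowercase label, for example `lambda t: sentiment_naive_bayes(t)["label"]` or `lambda t: sentiment_transformer(t)["label"].lower()`.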