<a href="https://colab.research.google.com/github/micah-shull/LLMs/blob/main/LLM_048_huggingFace_SentimentAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# !pip install python-dotenv
# !pip install transformers
# !pip install huggingface_hub



In [3]:
from transformers import pipeline
from huggingface_hub import login
from dotenv import load_dotenv
import os
import warnings
warnings.filterwarnings("ignore", message=".*The secret.*")


# Load the .env file
load_dotenv("/content/HUGGINGFACE_HUB_TOKEN.env")

# Login using the token
login(token=os.environ["HUGGINGFACE_HUB_TOKEN"])

# Create your pipeline
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("I'm really enjoying Hugging Face with a token!")
print(result)

# Run it on some text
result = classifier("I'm really enjoying learning Hugging Face!")
print(result)
result = classifier("I hate jogging!")
print(result)
result = classifier("I dont' care either way")
print(result)
result = classifier("meh")
print(result)
result = classifier("whatever you say.")
print(result)
result = classifier("you see awfully sure of yourself")
print(result)

Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.999873161315918}]
[{'label': 'POSITIVE', 'score': 0.9998546838760376}]
[{'label': 'NEGATIVE', 'score': 0.9937392473220825}]
[{'label': 'NEGATIVE', 'score': 0.9991015195846558}]
[{'label': 'POSITIVE', 'score': 0.9790390133857727}]
[{'label': 'POSITIVE', 'score': 0.967018723487854}]
[{'label': 'POSITIVE', 'score': 0.9966733455657959}]


## 🤔 Why Is the Model So Confident?

You're seeing results like:
```python
[{'label': 'NEGATIVE', 'score': 0.999}]
[{'label': 'POSITIVE', 'score': 0.996}]
```

Even for **neutral or sarcastic text** like:
- "meh"
- "whatever you say"
- "you seem awfully sure of yourself"

This seems... overly confident, right? Here's why 👇

---

## 🧠 What's Actually Happening

### 1. **The Model Was Trained for Binary Classification**
The model you're using:
```python
"distilbert-base-uncased-finetuned-sst-2-english"
```
…was trained on the **Stanford Sentiment Treebank** in its binary form (**SST-2**), a set of sentences from movie reviews labeled positive or negative.

**SST-2 only includes:**
- `POSITIVE`
- `NEGATIVE`

There’s **no "neutral" class**, no sarcasm, no subtlety, no "mixed feelings."  
So the model **must choose** between just two buckets—even when the text is ambiguous or unopinionated.

---

### 2. **Softmax Always Picks a Winner**
The model outputs **raw scores** (called *logits*) for each label.

Then it applies the **softmax function**, which converts logits into **probabilities that always sum to 1.0**.

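> Softmax does not create confidence by itself; it only normalizes the scores. With two classes, the winning probability depends solely on the **gap** between the two logits, so a gap of about 4 already maps to roughly 98%. And because there is no "unsure" output, the top class is still reported however ambiguous the text is.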
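
To make the normalization concrete, here is a minimal pure-Python sketch of softmax. The helper name `softmax` and the variable names are illustrative and not part of the `transformers` API; the logits are the ones this notebook prints for `"meh"` further down.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the largest logit keeps exp() numerically stable
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits recorded in this notebook for "meh": [negative, positive]
meh_logits = [-1.8109, 2.0331]
probs = softmax(meh_logits)

print("P(negative):", round(probs[0], 4))
print("P(positive):", round(probs[1], 4))
```

The result should agree with the probabilities the model run reports for the same text (about 2% negative, 98% positive).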

---

### 3. **Pretraining Bias + Overfitting**
LLMs like BERT or DistilBERT are pretrained on huge corpora (Wikipedia, books, etc.) and then **fine-tuned** on small datasets like SST-2.

Fine-tuning on a limited dataset with polarized opinions can cause:
- Overconfidence on short or vague sentences
- Misclassification of sarcasm or nuance
- Poor generalization to real-world tones (like "meh" or "whatever")

---

## 🔍 Try This Yourself

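Let’s inspect the **raw logits** instead of the processed output. The code cell further down (the one that imports `AutoTokenizer` and `AutoModelForSequenceClassification`) does exactly this: it loads the tokenizer and model directly and prints the logits and softmax probabilities for several short texts.

The output shows that the model leans firmly toward one class even for text a human would call neutral.

Looking at these numbers is a key step toward **understanding how the model actually arrives at a label**. The sections below unpack the results and why they matter.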

---

## ✅ What You Just Did:

You bypassed the Hugging Face `pipeline` abstraction and used the **raw model and tokenizer directly** to:

1. Tokenize the input (`"meh"`)
2. Run it through the model
3. See the raw output scores (called **logits**)
4. Apply `softmax` to get **probabilities**

---

## 🔍 What to Look For

### 🔹 **1. Logits**:
```python
tensor([[-1.8109,  2.0331]])
```

These are the **raw, unnormalized outputs** from the model for each class:
- The **first number** corresponds to the model’s score for the **negative class**
- The **second number** is for the **positive class**

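> A larger logit means a stronger score for that class. The final confidence is set by the **gap** between the two logits, not by their absolute size.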

---

### 🔹 **2. Probabilities (Softmax Applied):**
```python
tensor([[0.0210, 0.9790]])
```

This means:
- 2.1% confidence the text is **negative**
- 97.9% confidence the text is **positive**

🤔 Even though you typed `"meh"` (neutral/indifferent), the model had to **choose** between two labels (POSITIVE or NEGATIVE) and decided it's positive — probably because `"meh"` isn’t clearly negative in its limited training data.
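
With only two classes, softmax reduces to a sigmoid of the difference between the two logits. The sketch below, which assumes a hypothetical helper named `positive_probability`, shows how quickly that difference turns into a near-certain score.

```python
import math

def positive_probability(neg_logit, pos_logit):
    """Two-class softmax is a sigmoid of the logit gap (positive minus negative)."""
    gap = pos_logit - neg_logit
    return 1.0 / (1.0 + math.exp(-gap))

# How the gap alone drives the reported confidence
for gap in [0.0, 1.0, 2.0, 4.0, 8.0]:
    p = positive_probability(0.0, gap)
    print(f"gap={gap:>4}: P(positive)={p:.4f}")

# The "meh" logits from this notebook have a gap of about 3.84
meh_p = positive_probability(-1.8109, 2.0331)
print(f"meh: P(positive)={meh_p:.4f}")
```

A gap of zero gives exactly 50%, so the model would have to score both classes almost identically to look undecided.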

---

## 📘 Why This Is Insightful

- You now see **how the model scores each class** — not just the final label.
- You understand **why it looks "overconfident"**: with only two classes, the probability depends on the gap between the two logits, and a gap of about 3.8 already maps to roughly 98%.
- You’re **not limited to trusting pipelines** — you can interpret the actual math behind it!





In [9]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# text = "meh"
# inputs = tokenizer(text, return_tensors="pt")
# with torch.no_grad():
#     outputs = model(**inputs)
# logits = outputs.logits
# probs = torch.nn.functional.softmax(logits, dim=-1)
# print("Logits:", logits)
# print("Probabilities:", probs)

texts = ["meh", "whatever", "love it", "hate it", "this is fine", "wow, just wow", "I guess it's okay"]
for text in texts:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.nn.functional.softmax(logits, dim=-1)
    print(f"Text: {text}")
    print(f"Logits: {logits}")
    print(f"Probabilities: {probs}\n")


Text: meh
Logits: tensor([[-1.8109,  2.0331]])
Probabilities: tensor([[0.0210, 0.9790]])

Text: whatever
Logits: tensor([[ 1.8082, -1.6105]])
Probabilities: tensor([[0.9683, 0.0317]])

Text: love it
Logits: tensor([[-4.3101,  4.6796]])
Probabilities: tensor([[1.2467e-04, 9.9988e-01]])

Text: hate it
Logits: tensor([[ 4.5873, -3.6634]])
Probabilities: tensor([[9.9974e-01, 2.6101e-04]])

Text: this is fine
Logits: tensor([[-4.2480,  4.5880]])
Probabilities: tensor([[1.4538e-04, 9.9985e-01]])

Text: wow, just wow
Logits: tensor([[-3.9157,  4.2371]])
Probabilities: tensor([[2.8783e-04, 9.9971e-01]])

Text: I guess it's okay
Logits: tensor([[-3.8674,  4.2151]])
Probabilities: tensor([[3.0880e-04, 9.9969e-01]])



## ✅ Hugging Face Uses PyTorch Under the Hood

Hugging Face models are built on top of either:
- **PyTorch** (`torch.nn.Module`)
- or **TensorFlow** (`tf.keras.Model`)

When you ran:
```python
AutoModelForSequenceClassification.from_pretrained(...)
```
You were loading the **PyTorch version** of the model by default.

This is why your tensors look like this:
```python
tensor([[-1.8109,  2.0331]])
```
...and you used:
```python
torch.nn.functional.softmax(...)
```

You’re fully inside the **PyTorch workflow** here.

---

## ✅ You're Bypassing the Pipeline

When you use:
```python
pipeline("sentiment-analysis")
```

You’re using a **wrapper** that does all of this:
1. Loads the tokenizer
2. Loads the model
3. Tokenizes your text
4. Runs it through the model
5. Applies softmax
6. Translates scores into human-readable labels (`"POSITIVE"`, `"NEGATIVE"`)
7. Formats the result into a Python-friendly output

You just **peeled back that wrapper** and exposed:

- The **raw logits** (before softmax)
- The **manual softmax application**
- The actual **tensor outputs** the model gives you

This is **exactly what the pipeline was doing for you**, just hidden behind that 1-liner!
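
The last three steps of that list (softmax, label lookup, formatting) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's actual implementation: `postprocess` is a hypothetical helper, and the `ID2LABEL` mapping is assumed to match the label order used in this notebook (index 0 negative, index 1 positive).

```python
import math

# Assumed index-to-label mapping for the binary SST-2 model used above
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def postprocess(logits):
    """Mimic the pipeline's final steps: softmax, pick the top class, format."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [{"label": ID2LABEL[best], "score": probs[best]}]

# Logits recorded in this notebook for "meh" and "whatever"
print(postprocess([-1.8109, 2.0331]))
print(postprocess([1.8082, -1.6105]))
```

The returned structure has the same shape as the pipeline output shown at the top of the notebook: a list with one dictionary holding `label` and `score`.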

---

## ✅ Why This Is Important

- You now know how to **debug**, **customize**, or **interpret** model outputs more deeply.
- You’ll be able to **build custom logic**, like multi-label classification, thresholding, or even modifying logits directly.
- You’ve taken a huge step toward understanding how to **train or fine-tune your own models** later.
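
As one example of custom logic, the sketch below applies a confidence threshold to probabilities recorded in this notebook. The helper `label_with_threshold`, the `UNCERTAIN` label, and the 0.99 cutoff are all assumptions chosen for illustration; a real cutoff would need tuning on labeled data.

```python
def label_with_threshold(probs, labels=("NEGATIVE", "POSITIVE"), threshold=0.99):
    """Return the top label, or 'UNCERTAIN' when its probability is below the cutoff."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return "UNCERTAIN"
    return labels[best]

# Probabilities printed by the binary model in this notebook: [negative, positive]
recorded = {
    "meh": [0.0210, 0.9790],
    "whatever": [0.9683, 0.0317],
    "love it": [1.2467e-04, 9.9988e-01],
    "hate it": [9.9974e-01, 2.6101e-04],
}

for text, probs in recorded.items():
    print(f"{text!r}: {label_with_threshold(probs)}")
```

Thresholding helps only partly: texts such as "this is fine" score above 0.999 in the recorded output, so they would still pass as confidently positive.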



## ✅ What You Can Do About It

- **Use a multi-class sentiment model** with a neutral category  
  → e.g., `cardiffnlp/twitter-roberta-base-sentiment`  
  (has Positive, Neutral, Negative)

- **Try zero-shot classification** if you want more nuanced control:
```python
classifier = pipeline("zero-shot-classification")
result = classifier(
    "meh",
    candidate_labels=["positive", "negative", "neutral", "sarcastic"]
)
print(result)
```

- **Train your own model** on more subtle or domain-specific sentiment examples.

The loop above already has the right structure: it iterates over several texts and prints both the logits and the probabilities. 🎯

Now, let’s switch that to a **3-class sentiment model** (positive, neutral, negative) so we can capture **more nuance** for ambiguous texts like `"meh"` or `"whatever"`.

---

## ✅ Step-by-Step: Use a 3-Class Sentiment Model

We'll use the popular model:
> `cardiffnlp/twitter-roberta-base-sentiment`

It is a RoBERTa model trained on Twitter data with 3 sentiment labels:
- `LABEL_0`: Negative  
- `LABEL_1`: Neutral  
- `LABEL_2`: Positive

---

### ⚠️ This model uses a slightly different tokenizer and label mapping, so let’s update your code to:

✅ Use the correct tokenizer  
✅ Apply softmax  
✅ Map logits to human-friendly labels


---

## 🧠 What to Look For

Now, instead of **forcing everything into POSITIVE or NEGATIVE**, you’ll start seeing outputs like:

- `"meh"` → **neutral**
- `"love it"` → **positive**
- `"hate it"` → **negative**
- `"I guess it's okay"` → **neutral or positive (low confidence)**

This gives you a **much better understanding of nuanced or lukewarm statements**, which are common in real-world feedback.
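
One simple way to quantify "low confidence" is the margin between the top two class probabilities. The sketch below is an assumption-laden illustration: `top_two_margin` is a hypothetical helper, and the numbers are the three-class probabilities this notebook records for the same texts, rounded to four decimals.

```python
# Three-class probabilities recorded in this notebook (rounded)
recorded = {
    "meh": {"negative": 0.2318, "neutral": 0.5304, "positive": 0.2378},
    "whatever": {"negative": 0.2889, "neutral": 0.5566, "positive": 0.1545},
    "love it": {"negative": 0.0242, "neutral": 0.0796, "positive": 0.8963},
    "hate it": {"negative": 0.8897, "neutral": 0.0906, "positive": 0.0197},
}

def top_two_margin(label_probs):
    """Return the top label and how far it leads the runner-up."""
    ranked = sorted(label_probs.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    return top_label, top_p - second_p

for text, label_probs in recorded.items():
    label, margin = top_two_margin(label_probs)
    print(f"{text!r}: {label} (margin {margin:.3f})")
```

A small margin, as for "meh" and "whatever", signals a lukewarm prediction, while "love it" and "hate it" lead by a wide margin.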


In [10]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

# 3-class sentiment model from Cardiff NLP
model_name = "cardiffnlp/twitter-roberta-base-sentiment"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Label mapping (from model card)
labels = ['negative', 'neutral', 'positive']

# Example texts to analyze
texts = ["meh", "whatever", "love it", "hate it", "this is fine", "wow, just wow", "I guess it's okay"]

# Inference loop
for text in texts:
    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt", truncation=True)

    # Get model outputs
    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    probs = torch.nn.functional.softmax(logits, dim=-1)
    probs_np = probs.numpy()[0]

    # Map probabilities to labels
    label_probs = {labels[i]: float(probs_np[i]) for i in range(len(labels))}

    # Find top prediction
    top_label = labels[np.argmax(probs_np)]

    # Print results
    print(f"Text: {text}")
    print("Probabilities:", label_probs)
    print("Predicted Sentiment:", top_label)
    print("-" * 40)



Text: meh
Probabilities: {'negative': 0.23177234828472137, 'neutral': 0.5303865075111389, 'positive': 0.2378411442041397}
Predicted Sentiment: neutral
----------------------------------------
Text: whatever
Probabilities: {'negative': 0.28891637921333313, 'neutral': 0.5565615296363831, 'positive': 0.1545221358537674}
Predicted Sentiment: neutral
----------------------------------------
Text: love it
Probabilities: {'negative': 0.024173742160201073, 'neutral': 0.07955925911664963, 'positive': 0.8962669968605042}
Predicted Sentiment: positive
----------------------------------------
Text: hate it
Probabilities: {'negative': 0.8896681666374207, 'neutral': 0.09059063345193863, 'positive': 0.019741209223866463}
Predicted Sentiment: negative
----------------------------------------
Text: this is fine
Probabilities: {'negative': 0.01242265198379755, 'neutral': 0.14099746942520142, 'positive': 0.846579909324646}
Predicted Sentiment: positive
----------------------------------------
Text: wow, 



Your realization is **exactly right**:

> 💡 *“There is a lot to understand about the model (and pipeline) you choose to match it to the project you are working on.”*

This is **core to working with Hugging Face effectively** — the right model + task + dataset = success. Choosing the wrong one? You get poor results, even if the model is state-of-the-art.

---

## 🧭 High-Level Framework for Exploring Hugging Face

Here’s a **structured map** of what’s available and how to think about it.

---

### 🧱 1. **Pipelines = High-Level Tasks**

These are the main categories of work you can do. Each one is tied to a model type behind the scenes.

| Pipeline Task               | Description                                           | Example Use Case                              |
|----------------------------|-------------------------------------------------------|------------------------------------------------|
| `"sentiment-analysis"`     | Classify text as positive/negative (or neutral)       | Product reviews, user feedback                |
| `"text-classification"`    | General category prediction                           | Spam detection, topic tagging                 |
| `"zero-shot-classification"` | Classify into your own labels (no training required) | Customer intent, content moderation           |
| `"text-generation"`        | Generate coherent text completions                    | Chatbots, story generation                    |
| `"summarization"`          | Shorten long text into concise summaries              | News, reports, articles                       |
| `"translation_xx_to_yy"`   | Translate between languages                           | English → French                              |
| `"question-answering"`     | Answer a question given a passage                     | Customer support, document search             |
| `"fill-mask"`              | Predict masked word(s)                                | Cloze tasks, language model probing           |
| `"ner"`                    | Named Entity Recognition (people, places, orgs)       | Info extraction from documents                |
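| `"conversational"`         | Dialogue-style interaction (removed from recent `transformers` releases; use `"text-generation"` with a chat model instead) | Simple chatbot interaction |
| `"image-classification"`   | Classify images                                       | Photo tagging, visual categorization          |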
| `"audio-classification"`   | Classify sound files                                  | Music genre, speaker identification           |
| `"automatic-speech-recognition"` | Convert speech to text                       | Voice transcription, meeting notes            |

---

### 🧠 2. **Model Types = Underlying Architectures**

The model you choose inside the pipeline matters!

| Architecture         | Known For                               | Strengths                                     | Example Models |
|----------------------|------------------------------------------|-----------------------------------------------|----------------|
| BERT / DistilBERT    | Bidirectional transformers               | Classification, QA, embeddings                | `bert-base-uncased` |
| RoBERTa              | Robust BERT variant                      | Sentiment, NER, classification                | `roberta-base`, `cardiffnlp/twitter-roberta-base-sentiment` |
| GPT / GPT-2          | Text generation (unidirectional)         | Generating coherent text                     | `gpt2`         |
| T5 / BART            | Sequence-to-sequence (text2text) models  | Translation, summarization, QA               | `t5-small`, `facebook/bart-large-cnn` |
| XLNet                | Permutation-based language modeling      | Classification, QA                           | `xlnet-base-cased` |
| Whisper              | Audio → Text                             | Speech recognition                           | `openai/whisper-base` |
| ViT (Vision Transformer) | Image classification                | Computer vision tasks                        | `google/vit-base-patch16-224` |

---

### 🧪 3. **Datasets Matter Too**

You also want a model **trained on data similar to your task**:

| Dataset              | Domain                        | Impact on Model Behavior                       |
|----------------------|-------------------------------|------------------------------------------------|
| SST-2                | Movie reviews (binary)        | Overconfident positive/negative predictions   |
| TweetEval            | Tweets (several tasks: sentiment, hate speech, irony, emoji, and others) | Real-world language, emojis, slang       |
| CNN/DailyMail        | News articles                 | Great for summarization                       |
| MultiNLI             | Textual entailment (inference) | Used for zero-shot classification             |
| SQuAD                | Question answering             | Factual answer extraction                     |

---

### 🛠 4. **Use This Checklist to Match Project to Model**

| Step | Question                                               | Example                                          |
|------|--------------------------------------------------------|--------------------------------------------------|
| ✅ 1 | What is the task?                                       | "I want to summarize support tickets."          |
| ✅ 2 | Is there a pipeline that matches that task?             | → `"summarization"`                             |
| ✅ 3 | What domain is your data in?                            | Tech support, so look for domain-tuned models   |
| ✅ 4 | Do you need binary, multi-class, or zero-shot labels?   | Sentiment with neutral = use a 3-class model    |
| ✅ 5 | Does the model size fit your hardware?                  | `distilbert` vs `bert-large`                    |
| ✅ 6 | Do you want to fine-tune or use off-the-shelf?          | Off-the-shelf = pipeline                          |

---

### 🚀 Explore Hugging Face with Purpose

You can structure your exploration like this:

| Goal                          | Pipeline         | Suggested Model Example                             |
|-------------------------------|------------------|------------------------------------------------------|
| Understand tone               | `sentiment-analysis` | `cardiffnlp/twitter-roberta-base-sentiment`      |
| Build a chatbot               | `text-generation` | `gpt2`, `microsoft/DialoGPT-medium`                 |
| Classify open-ended input     | `zero-shot-classification` | `facebook/bart-large-mnli`             |
| Translate content             | `translation_en_to_fr` | `Helsinki-NLP/opus-mt-en-fr`           |
| Extract key info              | `ner`             | `dbmdz/bert-large-cased-conll03-english`            |
| Build a summarizer            | `summarization`   | `facebook/bart-large-cnn`, `t5-small`               |

---


In [None]:
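import gc
import torch

# clean up cache and memory
def clear_memory(*var_names):
    """Delete the named notebook-level variables, then release cached memory."""
    for name in var_names:
        # globals() here is the notebook namespace, where these variables live
        if name in globals():
            del globals()[name]
    gc.collect()                  # collect unreachable Python objects first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached GPU memory blocks
    print("Cleared memory (and GPU cache, if a GPU is present).")

clear_memory('inputs', 'outputs', 'model')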

#### Remove Widgets from Notebook to Save to GitHub

In [None]:
import json
from google.colab import drive
drive.mount('/content/drive')

# Path to your current notebook file (adjust if different)
notebook_path = "/content/drive/My Drive/LLM/LLM_048_huggingFace_SentimentAnalysis.ipynb"


# Load the notebook JSON
with open(notebook_path, 'r', encoding='utf-8') as f:
    nb = json.load(f)

# Remove the widget metadata if it exists
if 'widgets' in nb.get('metadata', {}):
    del nb['metadata']['widgets']

# Save the cleaned notebook
with open(notebook_path, 'w', encoding='utf-8') as f:
    json.dump(nb, f, indent=2)

print("Notebook metadata cleaned. Try saving to GitHub again.")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Notebook metadata cleaned. Try saving to GitHub again.


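In [None]:
# clean up cache and memory
# pop() avoids a NameError if clear_memory() above already removed these names
for name in ("inputs", "outputs", "model"):
    globals().pop(name, None)
torch.cuda.empty_cache()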