# **Step 1 — Why Transformers Replaced RNNs** 🚀

---

## **1️⃣ The Problem with RNNs**

RNNs (and their advanced versions like LSTMs/GRUs) were **the main choice** for sequence tasks like text and speech before Transformers arrived.

They work **word-by-word**, passing information forward through hidden states.

🔹 **Limitations:**

1. **Slow training** → RNNs process text sequentially, so you can’t parallelize easily.
2. **Long-term memory loss** → Information from far-back words fades away (vanishing gradient problem).
3. **Difficult to capture long-range dependencies** → E.g., in "The book that I bought last week was amazing", the word "book" is far from "amazing".
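4. **Harder to scale** → Because the steps can't run in parallel, training time grows with sequence length and it is hard to make full use of GPUs, so training on large datasets is slow.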
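To make limitations 1 and 2 concrete, here is a minimal sketch of a toy scalar RNN. The weights `w_x`, `w_h` and the inputs are invented for illustration. Each hidden state needs the previous one, so the loop cannot be split across positions. The product of per-step derivatives shows how the influence of the first step on the last one shrinks (the vanishing gradient problem).

```python
import math

def rnn_forward(xs, w_x=0.5, w_h=0.8):
    """Toy scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    hidden_states = []
    for x in xs:  # strictly sequential: step t needs h from step t-1
        h = math.tanh(w_x * x + w_h * h)
        hidden_states.append(h)
    return hidden_states

def grad_last_wrt_first(hidden_states, w_h=0.8):
    """d h_T / d h_1 = product over t = 2..T of w_h * (1 - h_t^2)."""
    g = 1.0
    for h in hidden_states[1:]:
        g *= w_h * (1.0 - h * h)  # each factor is < 1 here, so the product shrinks
    return g

states = rnn_forward([1.0] * 20)
print(f"last hidden state: {states[-1]:.4f}")
print(f"dh_20/dh_1: {grad_last_wrt_first(states):.2e}")
```

After 20 steps the derivative is vanishingly small, which is why signals from far-back words are hard to learn from.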

---

## **2️⃣ The Transformer Breakthrough**

In 2017, researchers at Google introduced the **Transformer** in the paper *"Attention Is All You Need"* (Vaswani et al.), and it completely changed NLP.

🔹 **Key advantages over RNNs:**

1. **Parallel processing** → The whole sentence is processed at once.
2. **Self-Attention mechanism** → The model decides **which other words matter most** for each word.
3. **Better long-range understanding** → No matter how far apart words are, self-attention can link them.
4. **Scalable** → Works better with big datasets and GPUs.

---

## **3️⃣ How Self-Attention Changed Everything**

Let’s say our sentence is:

> "The cat sat on the mat."

When processing the word **"cat"**, the self-attention mechanism also looks at:

* **"The"** → to know it's a noun phrase.
* **"sat"** → to know what the cat is doing.
* **"mat"** → to understand location.

Instead of just passing memory step-by-step like an RNN, **every word can look at every other word directly**.

---

💡 **In short:**

* RNNs: Walk through the sentence **one step at a time** (like reading word by word).
* Transformers: Look at the **whole sentence at once** (like skimming the entire text instantly).

---
---


# **Step 2 — Key Transformer Ideas** 🧠✨

---

## **1️⃣ Self-Attention — The Core of Transformers**

Self-attention is the **reason Transformers work so well**.

It answers the question:

> "When processing this word, which other words in the sentence are important?"

Example sentence:

*"The cat sat on the mat"*

When looking at **"sat"**, the model might give:

* High attention to **"cat"** (who sat)
* Some attention to **"mat"** (where)
* Less attention to **"The"**

**How it works (simplified):**

* Each word is turned into 3 vectors: **Query (Q)**, **Key (K)**, and **Value (V)**.
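* Attention score between two words = `Q • K / √d_k` (dot product, scaled by the square root of the key dimension `d_k`).
* A softmax turns each word's scores into weights that are positive and sum to 1. Higher score = more relevant.
* The weights are used to take a weighted sum of the **V** vectors → this becomes the updated representation of the word.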
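The steps above can be sketched in pure Python as single-head scaled dot-product attention, `softmax(Q·Kᵀ / √d_k) · V`. This is a minimal illustration, not a library implementation: the Q, K, V numbers are invented, whereas in a real model they come from learned linear projections of the word embeddings.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(Q, K, V):
    """Scaled dot-product attention for one sequence (one head, no batching).

    Q, K, V: lists of vectors, one per token. Returns (outputs, weights).
    """
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:  # every query looks at every key directly
        scores = [dot(q, k) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors
        out = [sum(w * v[i] for w, v in zip(weights, V)) for i in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy 2-d vectors for "The cat sat" (values invented for illustration)
tokens = ["The", "cat", "sat"]
Q = [[0.1, 0.0], [1.0, 0.2], [0.9, 1.0]]
K = [[0.1, 0.1], [1.0, 0.3], [0.2, 1.0]]
V = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

outputs, weights = self_attention(Q, K, V)
for tok, w in zip(tokens, weights):
    print(tok, [round(x, 2) for x in w])
```

With these made-up vectors, the row for "sat" puts its largest weight on "cat", mirroring the intuition in the example sentence.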

💡 Benefit:

Every word **"sees"** the whole sentence at once — unlike RNNs, which can only pass info step-by-step.

---

## **2️⃣ Encoder-Decoder Architecture**

The original Transformer has **two main parts** (later models often keep only one of them):

1. **Encoder**:

   * Reads the input sentence.
   * Uses multiple self-attention layers to create **contextual embeddings** (meaning each word knows about others).

2. **Decoder**:

   * Generates the **output sequence** one token at a time (like in translation), attending to the encoder output through cross-attention.
   * Uses **masked self-attention** so it doesn’t "peek" at future words while generating.
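Masked (causal) self-attention can be sketched by setting the scores for future positions to negative infinity before the softmax, so those positions receive exactly zero weight. A minimal pure-Python illustration; the all-zero score matrix is an arbitrary choice that makes the pattern easy to read.

```python
import math

def causal_attention_weights(scores):
    """Apply a causal (look-ahead) mask to a square score matrix, then softmax.

    scores[i][j] is the raw attention score of position i for position j.
    Positions j > i (the "future") are set to -inf so they get weight 0.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        masked = [scores[i][j] if j <= i else -math.inf for j in range(n)]
        m = max(masked)  # always finite: position i can always see itself
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# With all-equal raw scores, each position spreads its attention
# evenly over itself and the positions before it.
w = causal_attention_weights([[0.0] * 4 for _ in range(4)])
for row in w:
    print([round(x, 2) for x in row])
```

Each row is a lower-triangular slice: the first word sees only itself, the last word sees everything before it.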

💡 **Example:**

* Input (English): *"I love apples"*
* Output (French): *"J'adore les pommes"*
* Encoder understands meaning, decoder generates translation step-by-step.
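"Step-by-step" generation means autoregressive decoding: at each step the decoder looks at the encoder output and at the tokens generated so far, picks the next token, feeds it back in, and stops at an end-of-sequence marker. The sketch below uses greedy decoding (always take the most likely token; beam search is a common alternative). The function `toy_decoder_step` is hypothetical: a hard-coded table stands in for a trained decoder, and tokens are whole words for readability.

```python
def toy_decoder_step(source_tokens, generated):
    """Hypothetical stand-in for a trained decoder.

    A real decoder would attend to the encoder output (built from source_tokens)
    and to the tokens generated so far, then return a probability for every
    vocabulary token. Here a fixed table plays that role, for illustration only.
    """
    table = {
        (): {"J'adore": 0.9, "Je": 0.1},
        ("J'adore",): {"les": 0.8, "des": 0.2},
        ("J'adore", "les"): {"pommes": 0.95, "poires": 0.05},
        ("J'adore", "les", "pommes"): {"<eos>": 1.0},
    }
    return table[tuple(generated)]

def greedy_decode(source_tokens, max_len=10):
    generated = []
    for _ in range(max_len):
        probs = toy_decoder_step(source_tokens, generated)
        next_token = max(probs, key=probs.get)  # greedy: most likely token
        if next_token == "<eos>":  # end-of-sequence marker stops generation
            break
        generated.append(next_token)  # fed back in at the next step
    return generated

print(" ".join(greedy_decode(["I", "love", "apples"])))
```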

---

## **3️⃣ BERT vs GPT — Two Different Uses of Transformers**

**🔹 BERT (Bidirectional Encoder Representations from Transformers)**

* Uses **only the encoder** part of the Transformer.
* Reads **both left and right context** at the same time (bidirectional).
* Good for **understanding** tasks:

  * Sentiment analysis
  * Named entity recognition
  * Question answering

**🔹 GPT (Generative Pre-trained Transformer)**

* Uses **only the decoder** part.
* Reads **left-to-right only** (unidirectional).
* Good for **generation** tasks:

  * Text completion
  * Chatbots
  * Story writing

💡 Summary Table:

| Model    | Uses    | Direction     | Good for           |
| -------- | ------- | ------------- | ------------------ |
| **BERT** | Encoder | Bidirectional | Understanding text |
| **GPT**  | Decoder | Left-to-right | Generating text    |
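The two directions come from different pre-training objectives: BERT uses masked language modeling (hide some tokens and predict them from both sides), while GPT uses causal language modeling (predict the next token from the left context only). This sketch only builds the training examples each objective would see for one sentence; the helper names are hypothetical, and real BERT picks about 15% of positions at random (with extra replacement rules) rather than a fixed position.

```python
def gpt_style_examples(tokens):
    """Causal language modeling: predict each token from the tokens to its left."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def bert_style_example(tokens, mask_positions):
    """Masked language modeling: hide some tokens, predict them from BOTH sides.

    Positions are passed in explicitly here so the output is deterministic.
    """
    masked = ["[MASK]" if i in mask_positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in mask_positions}
    return masked, targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]

for context, target in gpt_style_examples(tokens):
    print(f"GPT  sees {context} -> predicts {target!r}")

masked, targets = bert_style_example(tokens, {2})
print(f"BERT sees {masked} -> predicts {targets}")
```

GPT never sees the words to the right of its target, while BERT can use "on the mat" to guess the hidden "sat".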

---
---

# **Step 3 — Hugging Face Basics** 🧩🚀

---

## **1️⃣ What is Hugging Face Transformers library?**

* **Hugging Face** is a company + open-source community that maintains the **Transformers library**.
* This library provides **pre-trained state-of-the-art NLP models** (like BERT, GPT, RoBERTa, DistilBERT, etc.).
* You can **download & use these models in just 2 lines of code** — no need to train from scratch.

💡 Think of it as:

> "The app store for AI models": the Hugging Face Hub hosts the models, and the library lets you pick one, download it, and start predicting.

---



## **2️⃣ Installing & Loading Pre-trained Models (2 lines)**

In [1]:
# Install Hugging Face Transformers library
!pip install transformers

# Import the pipeline function
from transformers import pipeline



Once installed, you can load a model like:


In [2]:
# Create a sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

Device set to use cpu


✅ This downloads a **pre-trained DistilBERT** model fine-tuned for sentiment analysis.

---

## **3️⃣ Pipelines in Hugging Face**

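A **pipeline** = a ready-made shortcut for a specific NLP task. It bundles the tokenizer, the model, and the post-processing that turns raw model outputs into readable labels and scores.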

Some common ones:

* `"sentiment-analysis"` → Positive / Negative classification
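* `"text-classification"` → Assign labels to text (`"sentiment-analysis"` is an alias for this task, which is why Example 2 loads the same default model)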
* `"translation"` → Translate text between languages
* `"summarization"` → Create a short summary
* `"question-answering"` → Answer a question given a context
* `"fill-mask"` → Predict missing word in a sentence
* `"text-generation"` → Generate new text (like GPT)


---

### **Example 1 — Sentiment Analysis**

In [3]:
# Import the pipeline function
from transformers import pipeline #The pipeline function is a shortcut tool in Hugging Face that lets you run powerful AI models with just one function (no need to manually load models, tokenizers, etc.).

# Create a sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")
#classifier → A variable name where we store our ready-to-use model.
#pipeline(...) → Creates a pre-built model pipeline for a given task.
#"sentiment-analysis" → This tells Hugging Face:
#“Load a model that can read text and tell if it’s positive or negative.”
#By default, this loads DistilBERT fine-tuned on the SST-2 dataset, which has only two labels (POSITIVE / NEGATIVE, no neutral class).

# Run the pipeline on a sample text
result = classifier("I love learning deep learning with Hugging Face!")
#The text inside "..." → This is the input sentence for analysis.
#The model reads this sentence and decides:
#What the sentiment is (POSITIVE or NEGATIVE)
#How confident it is (score between 0 and 1)

# Print the result
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9998145699501038}]
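The `label` and `score` come from post-processing the model's raw outputs (logits): for single-label classification, the pipeline applies a softmax and reports the most probable class, using the `id2label` mapping stored in the model's `config.json`. A minimal sketch of that step; the helper `postprocess` is hypothetical and the logits are invented, not the model's real output.

```python
import math

def postprocess(logits, id2label):
    """Sketch: turn one logit per class into a pipeline-style result."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]  # softmax: probabilities summing to 1
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [{"label": id2label[best], "score": probs[best]}]

# Label mapping of distilbert-base-uncased-finetuned-sst-2-english (from its config.json)
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Invented logits, for illustration only
print(postprocess([-2.0, 2.0], id2label))
```

So `score` is the softmax probability of the winning label, which is why it always lies between 0 and 1.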


---

### **Example 2 — Text Classification**


In [4]:
classifier = pipeline("text-classification")
result = classifier("The service was really bad and disappointing.")
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'NEGATIVE', 'score': 0.9997557997703552}]


---
---

# **Step 4: Practice — Sentiment Analysis with Hugging Face**

This time we specify the model explicitly:

`distilbert-base-uncased-finetuned-sst-2-english` → a **DistilBERT model** fine-tuned on the **SST-2 sentiment dataset** (Stanford Sentiment Treebank).

In [5]:
# Import pipeline from transformers
from transformers import pipeline
#We’re importing the pipeline function from Hugging Face’s transformers library.
#This will let us quickly load pre-trained models for different tasks.

# Load pre-trained DistilBERT sentiment analysis model
sentiment = pipeline(
    "sentiment-analysis", # pipeline("sentiment-analysis") → Loads a model for sentiment classification.
    model="distilbert-base-uncased-finetuned-sst-2-english"
    )
#model=... → Explicitly tells Hugging Face which model to load:
#distilbert-base-uncased-finetuned-sst-2-english
#This is a lightweight BERT model (DistilBERT) trained for binary classification (POSITIVE vs NEGATIVE) on the SST-2 dataset.
#sentiment → A variable storing this ready-to-use model.

# Test sentences
texts = [
    "I absolutely loved this movie, it was funtastic!", #Example 1 → clearly positive.
    "The product was terrible and broke on the first day.", #Example 2 → clearly negative.
    "It was okay, not great but not bad either." #Example 3 → mixed/neutral, but this model only has POSITIVE/NEGATIVE labels, so it must pick one.
]

# Run sentiment analysis
results = sentiment(texts)
#Passes all sentences at once into the model.
#Hugging Face automatically tokenizes the text, runs it through DistilBERT, and returns predictions.

# Print results
for text, result in zip(texts, results):
  print(f"Text: {text}")
  print(f"Predicted Label: {result['label']} | Confidence: {result['score']:.4f}")
  print("------")

# zip(texts, results)
#zip() is a built-in Python function.
#It combines two lists element by element, making pairs.

#text, result
#This is called tuple unpacking.
#Each item that comes out of zip(texts, results) will be a pair:
#text → one sentence from texts list.
#result → the corresponding prediction dictionary from results.

Device set to use cpu


Text: I absolutely loved this movie, it was funtastic!
Predicted Label: POSITIVE | Confidence: 0.9999
------
Text: The product was terrible and broke on the first day.
Predicted Label: NEGATIVE | Confidence: 0.9991
------
Text: It was okay, not great but not bad either.
Predicted Label: POSITIVE | Confidence: 0.9977
------


---
---
# **Step 5: Zero-Shot Classification**
This is one of the most **mind-blowing** uses of Transformers because it lets you classify text into categories **without any task-specific training**.

---


## 🧠 **What It Means**

Normally, to classify text (like "Is this electronics, clothing, or food?"), you need:

* A labeled dataset
* Training a model for those specific categories

But with **Zero-Shot Learning**, the model can:

* Read the category names you write in plain language, without being trained on them
* Classify text into *new categories* you give it **on the fly** ✨

It’s like telling the model:
👉 *“Here’s a sentence. Decide if it’s about clothing, electronics, or food.”*
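The model doesn't literally follow that instruction, though. The default model (`facebook/bart-large-mnli`) is a natural language inference (NLI) model: the pipeline turns each label into a hypothesis such as *“This example is electronics.”*, scores how strongly the input sentence entails it, and ranks the labels by those scores. Its large-scale pre-training is what lets it judge entailment for labels it was never trained on.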
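A minimal sketch of the NLI framing, assuming the pipeline's default hypothesis template `"This example is {}."` and its single-label mode, where the entailment logits of all labels are softmaxed against each other. The helper names are hypothetical and the entailment logits are invented; a real NLI model would compute them from each (premise, hypothesis) pair.

```python
import math

def build_nli_pairs(text, candidate_labels, hypothesis_template="This example is {}."):
    """Pair the input text (premise) with one hypothesis per candidate label."""
    return [(text, hypothesis_template.format(label)) for label in candidate_labels]

def rank_labels(candidate_labels, entailment_logits):
    """Single-label mode: softmax the entailment logits across all labels."""
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scored = sorted(zip(candidate_labels, (e / total for e in exps)),
                    key=lambda pair: pair[1], reverse=True)
    return {"labels": [label for label, _ in scored],
            "scores": [score for _, score in scored]}

text = "This smartphone has an amazing battery life."
labels = ["clothing", "electronics", "food"]

for premise, hypothesis in build_nli_pairs(text, labels):
    print(f"premise: {premise!r} | hypothesis: {hypothesis!r}")

# Invented entailment logits, standing in for what an NLI model would return
print(rank_labels(labels, [-3.0, 3.0, -3.0]))
```

Because the scores are softmaxed across labels in this mode, they always add up to 1.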

---

In [6]:
from transformers import pipeline
#from transformers: Importing from the Hugging Face Transformers library.
#pipeline: A shortcut function that lets us load a full NLP model for a task in 1 line (like sentiment-analysis, text-generation, etc.).

# Load zero-shot classification pipeline
classifier = pipeline("zero-shot-classification") # pipeline(...): Creates a ready-to-use model pipeline.
# "zero-shot-classification": Task type — tells Hugging Face we want to classify text into any labels we provide, even if the model was never trained on them.

text = "This smartphone has an amazing battery life." # Our input sentence.

candidate_labels = ["clothing", "electronics", "food"] # candidate_labels: A list of possible categories you want. You decide these — they don’t need training data! Here, we test if the text is about clothing, electronics, or food.

result = classifier(text, candidate_labels)

print(result)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


{'sequence': 'This smartphone has an amazing battery life.', 'labels': ['electronics', 'food', 'clothing'], 'scores': [0.9962769150733948, 0.001884312485344708, 0.001838712370954454]}


Interpretation:

* `electronics` → 99.6% confidence ✅
* `food` → 0.19%
* `clothing` → 0.18%

So the model correctly classifies it as **electronics** 🔥

---
---

# 🔹 **Step 6 — Bonus: Save & Use Model Offline**

---

## 🧠 **Part A — Save the Model Locally**

In [8]:
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

from: Python keyword, tells the interpreter we’re importing specific items from a library.

transformers: The Hugging Face library for NLP models like BERT, GPT, DistilBERT, etc.

import pipeline: The pipeline function is Hugging Face’s shortcut that bundles model + tokenizer into one ready-to-use tool (e.g., for sentiment-analysis).

AutoTokenizer: A class that automatically loads the right tokenizer for any model you specify. (Tokenizer = breaks text into numbers the model understands).

AutoModelForSequenceClassification: A class that automatically loads a model (like DistilBERT) designed for classification tasks (sentiment, spam detection, etc.).

In [9]:
# Model Name
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

model_name: A variable that stores a string.

"distilbert-base-uncased-finetuned-sst-2-english": The exact name of the pretrained model on Hugging Face Hub:

- `distilbert-base-uncased`: DistilBERT model, lowercase-only vocabulary.
- `finetuned-sst-2`: Fine-tuned on the SST-2 dataset (movie-review sentences labeled Positive/Negative).
- `english`: The language this model understands.

So this is the ID Hugging Face will use to fetch tokenizer + model.

In [10]:
# Load tokenizer + model (only once)
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokenizer: Variable to hold the tokenizer object.

AutoTokenizer: Class we imported earlier.

.from_pretrained(model_name): Method that downloads the tokenizer files (for this WordPiece tokenizer: vocab.txt, tokenizer_config.json, etc.) for that specific model.

from_pretrained: Means “grab this tokenizer by its model name from Hugging Face Hub (or local cache if already downloaded).”

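Intuition 💡: Without the tokenizer, the model couldn't turn text into numbers: the tokenizer splits text into tokens and maps each one to an integer ID from vocab.txt (e.g., “loved” → one or more token IDs).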
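BERT-family models such as DistilBERT use a WordPiece tokenizer: each word is split greedily into the longest pieces found in the vocabulary, and pieces that continue a word carry a `##` prefix. Below is a toy sketch of that greedy longest-match-first idea. The vocabulary and IDs are hypothetical (the real vocab.txt has about 30k entries and may well contain "loved" as a single token); the real tokenizer also lowercases and splits punctuation for uncased models and adds special tokens such as `[CLS]` and `[SEP]`.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:  # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:  # nothing matches: the whole word becomes [UNK]
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical tiny vocabulary: token -> ID
vocab = {"[UNK]": 0, "i": 1, "love": 2, "##d": 3, "this": 4, "movie": 5}

tokens = [p for w in "I loved this movie".lower().split()
          for p in wordpiece_tokenize(w, vocab)]
ids = [vocab[t] for t in tokens]
print(tokens)
print(ids)
```

With this toy vocabulary, "loved" becomes the two pieces `love` + `##d`, and those pieces' IDs are what the model actually receives.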

In [11]:
model = AutoModelForSequenceClassification.from_pretrained(model_name)

model: Variable to hold the actual deep learning model.

AutoModelForSequenceClassification: Loads a neural network fine-tuned for sequence classification.

.from_pretrained(model_name): Downloads the pretrained weights/config for the given model name.

Intuition 💡: This line brings in the brain that turns tokenized numbers into predictions (e.g., "POSITIVE" or "NEGATIVE").

In [12]:
#save locally
model.save_pretrained("./saved_model")

model.save_pretrained(...): Method that saves model weights + config files into the folder you specify.

"./saved_model": Path where to save the model.

./ → means “current working directory.”

saved_model → folder name to create or overwrite.

Intuition 💡: Inside this folder you’ll get files like model.safetensors (weights; older library versions saved pytorch_model.bin instead) + config.json (settings).

In [13]:
tokenizer.save_pretrained("./saved_model")

('./saved_model/tokenizer_config.json',
 './saved_model/special_tokens_map.json',
 './saved_model/vocab.txt',
 './saved_model/added_tokens.json',
 './saved_model/tokenizer.json')

tokenizer.save_pretrained(...): Similar to model saving, but for tokenizer.

"./saved_model": Saves tokenizer files into the same folder as model.

Intuition 💡: You’ll see tokenizer.json, vocab.txt, etc. — these tell the model how to interpret words.

---

## **🧠 Part B — Load Model Offline**

In [14]:
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

transformers: Hugging Face library.

pipeline: Quick helper that combines model + tokenizer into a ready-to-use function (like "sentiment-analysis").

AutoTokenizer: Automatically finds the correct tokenizer for your saved model.

AutoModelForSequenceClassification: Automatically loads a classification model (like DistilBERT sentiment model).

In [15]:
# Load Model/Tokenizer from saved folder
tokenizer = AutoTokenizer.from_pretrained("./saved_model")

tokenizer: Variable to hold the tokenizer object.

AutoTokenizer: Class for loading tokenizers.

.from_pretrained("./saved_model"): Instead of downloading from internet, this time it loads from the local folder path (./saved_model).

Intuition 💡: It finds the vocab and tokenizer config files saved earlier.

In [17]:
model = AutoModelForSequenceClassification.from_pretrained("./saved_model")

model: Variable to hold the neural network.

AutoModelForSequenceClassification: Class to load classification models.

.from_pretrained("./saved_model"): Loads weights + config directly from the local folder.

Intuition 💡: Instead of fetching from Hugging Face Hub, it just opens the saved files like model.safetensors (or pytorch_model.bin) and config.json.

In [19]:
# Create Pipeline
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

Device set to use cpu


classifier: Variable name chosen for the ready-made tool.

pipeline(...): Hugging Face function to make inference easy.

"sentiment-analysis": Tells the pipeline what task we want (sentiment classification).

model=model: Explicitly give it our locally loaded model.

tokenizer=tokenizer: Explicitly give it our locally loaded tokenizer.

Intuition 💡: Now classifier knows how to take raw text → tokenize → pass to model → return result.

In [20]:
# Test it
print(classifier("I love this!"))

#classifier("I love this!"):
#"I love this!" is the input text.
#classifier automatically tokenizes it, runs it through the model, and returns a prediction.

[{'label': 'POSITIVE', 'score': 0.9998764991760254}]


label: The predicted class.

score: Confidence probability (close to 1.0 = very sure).

---

## **🧠 Part C — Save Predictions to CSV**

In [22]:
import pandas as pd

In [24]:
# Example Texts
texts = [
    "The food was delicious and fresh.",
    "The phone stopped working after a week.",
    "These shoes are really comfortable!"
]

# So here we have 3 reviews: about food, a phone, and shoes.

In [26]:
# Run Predictions
results = classifier(texts)
print(results)

[{'label': 'POSITIVE', 'score': 0.9998831748962402}, {'label': 'NEGATIVE', 'score': 0.9994798302650452}, {'label': 'POSITIVE', 'score': 0.9998409748077393}]


In [27]:
# Convert to DataFrame
df = pd.DataFrame(results)

In [29]:
df["text"] = texts # Add original texts for clarity
#df["text"]: Creates a new column named "text" inside the DataFrame.
#= texts: Assigns our original input sentences to this new column.
#Intuition 💡: Now each prediction row will also show the original sentence

print(df)

      label     score                                     text
0  POSITIVE  0.999883        The food was delicious and fresh.
1  NEGATIVE  0.999480  The phone stopped working after a week.
2  POSITIVE  0.999841      These shoes are really comfortable!


In [30]:
# Save to csv
df.to_csv("sentiment_results.csv", index=False)
#df.to_csv(...): Pandas function to save the DataFrame as a CSV file.
#"sentiment_results.csv": File name of the saved CSV.
#index=False: Prevents pandas from saving row numbers (0, 1, 2). Keeps the file clean.
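If pandas isn't available, or you only need the file, the standard-library `csv` module can write an equivalent CSV with the same columns. A minimal sketch; `write_results_csv` is a hypothetical helper name, and the data is copied from the pipeline output above.

```python
import csv

# Inputs and pipeline output copied from the cells above
texts = [
    "The food was delicious and fresh.",
    "The phone stopped working after a week.",
    "These shoes are really comfortable!",
]
results = [
    {"label": "POSITIVE", "score": 0.9998831748962402},
    {"label": "NEGATIVE", "score": 0.9994798302650452},
    {"label": "POSITIVE", "score": 0.9998409748077393},
]

def write_results_csv(path, texts, results):
    """Write one row per prediction: label, score, text (no index column)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["label", "score", "text"])
        writer.writeheader()
        for text, result in zip(texts, results):
            writer.writerow({"label": result["label"],
                             "score": result["score"],
                             "text": text})

write_results_csv("sentiment_results.csv", texts, results)
```

`newline=""` is what the `csv` docs recommend when opening the file, so line endings are handled by the writer itself.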

---
---