# Exercise 2: Zero-Shot Learning

**Type:** Guided Exercise

This exercise takes a deep dive into the Hugging Face ecosystem: you will use pre-trained models for zero-shot tasks and extract model embeddings.

**Learning Objectives:**
- Take a deep dive into the Hugging Face ecosystem
- Leverage pre-trained models for zero-shot tasks
- Understand and extract model embeddings

## Setup and Installation

In [None]:
!pip install transformers torch datasets sentence-transformers scikit-learn matplotlib seaborn plotly pandas numpy -q

In [None]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModel
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_similarity

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Part A: Zero-Shot Learning

Explore the power of pre-trained models on various tasks without fine-tuning.

## 1. Zero-Shot Classification

Classify text into categories without training on those specific categories.

In [None]:
# TODO: Load zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Test texts
texts = [
    "I love this new smartphone! The camera quality is amazing.",
    "The stock market crashed today, causing major losses.",
    "Scientists discovered a new species in the Amazon rainforest.",
    "The new restaurant downtown has excellent Italian cuisine."
]

# Define candidate labels
candidate_labels = ["technology", "finance", "science", "food", "sports"]

# TODO: Classify each text
for text in texts:
    result = classifier(text, candidate_labels)
    print(f"\nText: {text}")
    print(f"Predicted: {result['labels'][0]} (score: {result['scores'][0]:.3f})")
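
The pipeline above treats classification as natural language inference: each candidate label is slotted into a hypothesis (the default template is "This example is {}."), and the model's entailment logit for each hypothesis is compared across labels. The sketch below mirrors only the single-label scoring step. The logit values are made-up placeholders, not real model output, and `build_hypotheses` and `rank_labels` are hypothetical helpers written for illustration.

```python
import math

# Default hypothesis template used by the zero-shot classification pipeline.
HYPOTHESIS_TEMPLATE = "This example is {}."

def build_hypotheses(labels):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [HYPOTHESIS_TEMPLATE.format(label) for label in labels]

def rank_labels(labels, entailment_logits):
    """Softmax the entailment logits across labels and sort by score."""
    max_logit = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)

labels = ["technology", "finance", "science", "food", "sports"]
# Made-up entailment logits, one per label (assumption, for illustration only).
toy_logits = [3.1, -0.4, 0.2, -1.5, -2.0]

hypotheses = build_hypotheses(labels)
ranked = rank_labels(labels, toy_logits)

print(hypotheses[0])
for label, score in ranked:
    print(f"{label:12} {score:.3f}")
```

Because the scores come from a softmax across labels, they always sum to 1, which is why adding an unrelated label such as "sports" can still shift the other scores.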

## 2. Zero-Shot Question Answering

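Extract answers from a context passage without fine-tuning on your own data. Note that `distilbert-base-cased-distilled-squad` was already fine-tuned for extractive QA on SQuAD, so it is zero-shot only with respect to this passage and domain, not to the task itself.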

In [None]:
# TODO: Load question answering pipeline
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = """
The transformer architecture was introduced in 2017 by Vaswani et al. in the paper 'Attention is All You Need'. 
It relies entirely on attention mechanisms and dispenses with recurrence and convolutions. 
The model achieved state-of-the-art results on machine translation tasks and has since become 
the foundation for models like BERT, GPT, and T5.
"""

questions = [
    "When was the transformer architecture introduced?",
    "Who introduced the transformer architecture?",
    "What does the transformer rely on?",
    "What models are based on transformers?"
]

# TODO: Answer questions
for question in questions:
    result = qa_pipeline(question=question, context=context)
    print(f"\nQ: {question}")
    print(f"A: {result['answer']} (score: {result['score']:.3f})")

## 3. Zero-Shot Sentiment Analysis

Analyze sentiment without training on sentiment labels.

In [None]:
# TODO: Use zero-shot classification for sentiment
reviews = [
    "This product exceeded my expectations! Highly recommend.",
    "Terrible quality, broke after one day. Waste of money.",
    "It's okay, nothing special but does the job.",
    "Absolutely love it! Best purchase I've made this year."
]

sentiment_labels = ["positive", "negative", "neutral"]

for review in reviews:
    result = classifier(review, sentiment_labels)
    print(f"\nReview: {review}")
    print(f"Sentiment: {result['labels'][0]} ({result['scores'][0]:.3f})")

## 4. Zero-Shot Translation

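Translate text with a pre-trained model. T5 saw English-to-French translation as one of the tasks in its multi-task pre-training, so no further fine-tuning is needed here.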

In [None]:
# TODO: Load translation pipeline
translator = pipeline("translation_en_to_fr", model="t5-small")

texts_to_translate = [
    "Hello, how are you?",
    "Machine learning is fascinating.",
    "I love natural language processing."
]

for text in texts_to_translate:
    result = translator(text)
    print(f"\nEN: {text}")
    print(f"FR: {result[0]['translation_text']}")

## 5. Zero-Shot Summarization

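Summarize text without any training of your own. `facebook/bart-large-cnn` was fine-tuned on CNN/DailyMail news summarization, so this is zero-shot with respect to your input text, not to the task.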

In [None]:
# TODO: Load summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = """
Artificial intelligence has made remarkable progress in recent years, particularly in natural language processing. 
Large language models like GPT, BERT, and their variants have achieved human-level performance on many tasks. 
These models are trained on massive amounts of text data and can perform various tasks without task-specific training. 
The transformer architecture, introduced in 2017, has been the key innovation enabling these advances. 
Researchers continue to push the boundaries, developing more efficient models and exploring new applications 
in fields ranging from healthcare to education.
"""

summary = summarizer(article, max_length=50, min_length=25, do_sample=False)
print("Original:")
print(article)
print("\nSummary:")
print(summary[0]['summary_text'])

## 6. Mathematical Reasoning

Test zero-shot mathematical reasoning capabilities.

In [None]:
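# TODO: Use a generative model for math problems
# Note: GPT-2 is a small base model and is unreliable at arithmetic, so expect
# wrong or rambling answers; larger instruction-tuned models do noticeably better.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use separate names so this cell never clobbers the `tokenizer`/`model` of other cells
math_model_name = "gpt2"
math_tokenizer = AutoTokenizer.from_pretrained(math_model_name)
math_model = AutoModelForCausalLM.from_pretrained(math_model_name)

def solve_math_problem(problem):
    """Greedy-decode a short continuation of a 'Solve: ... Answer:' prompt."""
    prompt = f"Solve: {problem}\nAnswer:"
    inputs = math_tokenizer(prompt, return_tensors="pt")
    outputs = math_model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=False,  # greedy decoding keeps the output reproducible
        # GPT-2 has no pad token; reuse EOS to avoid the generation warning
        pad_token_id=math_tokenizer.eos_token_id,
    )
    return math_tokenizer.decode(outputs[0], skip_special_tokens=True)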

math_problems = [
    "15 + 27",
    "8 * 9",
    "100 - 35"
]

print("Testing mathematical reasoning (results may vary):")
for problem in math_problems:
    result = solve_math_problem(problem)
    print(f"\n{problem}")
    print(f"Model output: {result}")
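
Reading raw generations by eye gets tedious, so it can help to score them automatically. The sketch below pulls the first integer out of the text that follows the `Solve: ... Answer:` prompt and compares it with the true result. The continuations in `placeholder_outputs` are invented for illustration and are NOT real GPT-2 output; `extract_first_integer` is a hypothetical helper.

```python
import re

def extract_first_integer(generated, prompt):
    """Return the first integer found after the prompt, or None if there is none."""
    # Decoded output normally starts with the prompt; strip it if present.
    continuation = generated[len(prompt):] if generated.startswith(prompt) else generated
    match = re.search(r"-?\d+", continuation)
    return int(match.group()) if match else None

expected = {"15 + 27": 42, "8 * 9": 72, "100 - 35": 65}

# Invented continuations standing in for model output (assumption, not real results).
placeholder_outputs = {
    "15 + 27": "Solve: 15 + 27\nAnswer: 42",
    "8 * 9": "Solve: 8 * 9\nAnswer: The answer is not",
    "100 - 35": "Solve: 100 - 35\nAnswer: 75 - 10",
}

correct = 0
for problem, target in expected.items():
    prompt = f"Solve: {problem}\nAnswer:"
    predicted = extract_first_integer(placeholder_outputs[problem], prompt)
    correct += int(predicted == target)
    print(f"{problem:10} predicted={predicted} target={target}")

accuracy = correct / len(expected)
print(f"Accuracy on placeholder outputs: {accuracy:.2f}")
```

To score the real model, replace the lookup in `placeholder_outputs` with a call to `solve_math_problem(problem)`.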

# Part B: Embeddings

Learn to extract and work with contextual embeddings from transformers.

## 1. Loading Pre-trained Models and Extracting Embeddings

In [None]:
# TODO: Load BERT model and tokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model = model.to(device)
model.eval()

def get_embeddings(text):
    """Extract embeddings from BERT model."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(device)
    
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Get embeddings from last hidden state
    embeddings = outputs.last_hidden_state
    return embeddings

# Test
test_text = "Natural language processing is fascinating."
embeddings = get_embeddings(test_text)
print(f"Text: {test_text}")
print(f"Embedding shape: {embeddings.shape}")
print("Dimensions: [batch_size, sequence_length, hidden_size]")

## 2. Token-Level vs Sentence-Level Embeddings

In [None]:
# TODO: Compare token-level and sentence-level embeddings
sentence = "Transformers are powerful models."

# Get token-level embeddings
inputs = tokenizer(sentence, return_tensors="pt").to(device)
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])

with torch.no_grad():
    outputs = model(**inputs)

token_embeddings = outputs.last_hidden_state[0]  # [seq_len, hidden_size]

# Sentence-level embedding (mean pooling over all tokens, including [CLS] and [SEP])
sentence_embedding = token_embeddings.mean(dim=0)  # [hidden_size]

print(f"Sentence: {sentence}")
print(f"\nTokens: {tokens}")
print(f"Token embeddings shape: {token_embeddings.shape}")
print(f"Sentence embedding shape: {sentence_embedding.shape}")

# Visualize token embeddings
print("\nFirst few dimensions of each token:")
for i, token in enumerate(tokens):
    print(f"{token:15} {token_embeddings[i][:5].cpu().numpy()}")
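
A plain mean over the sequence dimension is only safe when the batch contains no padding. With padded batches, pad positions have to be excluded using the attention mask. The NumPy sketch below contrasts mask-aware mean pooling with taking the first ([CLS]) vector; random arrays stand in for `last_hidden_state`, and `masked_mean_pool` and `cls_pool` are hypothetical helper names.

```python
import numpy as np

def masked_mean_pool(hidden, attention_mask):
    """Average token vectors over real tokens only.

    hidden: [batch, seq_len, hidden_size]; attention_mask: [batch, seq_len] of 0/1.
    """
    mask = attention_mask[..., None].astype(hidden.dtype)  # [batch, seq_len, 1]
    summed = (hidden * mask).sum(axis=1)                   # padded positions contribute 0
    counts = np.clip(mask.sum(axis=1), 1e-9, None)         # guard against division by zero
    return summed / counts

def cls_pool(hidden):
    """Use the vector at position 0 (the [CLS] token for BERT-style models)."""
    return hidden[:, 0, :]

rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 5, 8))          # stand-in for last_hidden_state
attention_mask = np.array([[1, 1, 1, 1, 1],  # no padding
                           [1, 1, 1, 0, 0]]) # last two positions are padding

pooled = masked_mean_pool(hidden, attention_mask)
naive = hidden.mean(axis=1)

print("Row 0 matches naive mean:", np.allclose(pooled[0], naive[0]))
print("Row 1 matches naive mean:", np.allclose(pooled[1], naive[1]))
print("CLS pooling shape:", cls_pool(hidden).shape)
```

The same arithmetic carries over to PyTorch tensors with `unsqueeze(-1)`, `sum(dim=1)`, and `clamp(min=1e-9)`.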

## 3. Similarity Computation with Embeddings

In [None]:
# TODO: Compute similarity between sentences
def get_sentence_embedding(text):
    """Get sentence embedding using mean pooling."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(device)
    
    with torch.no_grad():
        outputs = model(**inputs)
    
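    # Mean pooling over real tokens only (the attention mask excludes padding,
    # so the function stays correct if a padded batch is ever passed in)
    hidden = outputs.last_hidden_state
    mask = inputs["attention_mask"].unsqueeze(-1).type_as(hidden)
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return embeddings.cpu().numpy()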

sentences = [
    "I love machine learning.",
    "Machine learning is amazing.",
    "The weather is nice today.",
    "Natural language processing is a subfield of AI.",
    "It's sunny outside."
]

# Get embeddings for all sentences
embeddings_list = [get_sentence_embedding(s) for s in sentences]
embeddings_matrix = np.vstack(embeddings_list)

# Compute similarity matrix
similarity_matrix = cosine_similarity(embeddings_matrix)

# Visualize
plt.figure(figsize=(10, 8))
sns.heatmap(similarity_matrix, annot=True, fmt='.2f', 
            xticklabels=[f"S{i+1}" for i in range(len(sentences))],
            yticklabels=[f"S{i+1}" for i in range(len(sentences))],
            cmap='coolwarm', vmin=0, vmax=1)
plt.title('Sentence Similarity Matrix')
plt.tight_layout()
plt.show()

print("Sentences:")
for i, s in enumerate(sentences):
    print(f"S{i+1}: {s}")
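
Cosine similarity is the dot product of two vectors after each has been scaled to unit length, so the whole matrix can be computed with one normalization and one matrix product. The sketch below does this by hand on a tiny made-up matrix and then finds the most similar pair of distinct rows; `cosine_similarity_matrix` and `most_similar_pair` are hypothetical helpers, and the 2-D vectors are toy values rather than real embeddings.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)  # scale each row to unit length
    return unit @ unit.T

def most_similar_pair(sim):
    """Return (i, j, score) for the highest off-diagonal similarity."""
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)  # ignore each row's similarity with itself
    i, j = np.unravel_index(np.argmax(masked), masked.shape)
    return int(i), int(j), float(sim[i, j])

# Toy vectors (assumption): rows 0 and 1 point in nearly the same direction.
toy = np.array([[1.0, 0.0],
                [2.0, 0.1],
                [0.0, 3.0]])

sim = cosine_similarity_matrix(toy)
i, j, score = most_similar_pair(sim)
print(np.round(sim, 3))
print(f"Most similar pair: rows {i} and {j} (cosine {score:.3f})")
```

Note that row 1 is twice as long as row 0 yet scores close to 1: cosine similarity compares direction only, not magnitude.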

## 4. Dimensionality Reduction and Visualization

In [None]:
# TODO: Visualize embeddings with PCA
# Create more sentences for better visualization
extended_sentences = [
    # Technology cluster
    "Machine learning models are powerful.",
    "Deep learning requires GPUs.",
    "Neural networks process data.",
    "AI is transforming industries.",
    # Weather cluster
    "The sun is shining brightly.",
    "It's raining heavily today.",
    "The weather forecast predicts snow.",
    "Temperature dropped significantly.",
    # Food cluster
    "I love Italian cuisine.",
    "Pizza is my favorite food.",
    "The restaurant serves delicious pasta.",
    "Fresh ingredients make better meals."
]

# Get embeddings
embeddings_ext = np.vstack([get_sentence_embedding(s) for s in extended_sentences])

# Apply PCA
pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(embeddings_ext)

# Plot
plt.figure(figsize=(12, 8))
colors = ['red']*4 + ['blue']*4 + ['green']*4
labels = ['Tech']*4 + ['Weather']*4 + ['Food']*4

for i, (x, y) in enumerate(embeddings_2d):
    plt.scatter(x, y, c=colors[i], s=100, alpha=0.6, label=labels[i] if i % 4 == 0 else "")
    plt.annotate(f"S{i+1}", (x, y), fontsize=8, alpha=0.7)

plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.2%})')
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.2%})')
plt.title('Sentence Embeddings Visualization (PCA)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Sentences by category:")
for i, s in enumerate(extended_sentences):
    print(f"S{i+1} ({labels[i]}): {s}")
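
PCA itself is short enough to write out: center the data, take a singular value decomposition, and project onto the top right-singular vectors. The NumPy sketch below does this on random stand-in data (not real embeddings); `pca_svd` is a hypothetical helper, and its output should agree with scikit-learn's `PCA` up to the sign of each component.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """Project X onto its top principal components via SVD.

    Returns (projected, explained_variance_ratio) for the kept components.
    """
    centered = X - X.mean(axis=0)                 # PCA requires zero-mean features
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ Vt[:n_components].T    # coordinates in the new basis
    explained = (S ** 2) / np.sum(S ** 2)         # share of variance per component
    return projected, explained[:n_components]

rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(12, 16))  # 12 "sentences", 16 dims (stand-in data)
toy_embeddings[:, 0] *= 10.0                # make one direction dominate the variance

coords, ratio = pca_svd(toy_embeddings)
print("2-D coordinates shape:", coords.shape)
print("Explained variance ratio:", np.round(ratio, 3))
```

When the two ratios on the plot axes add up to a small fraction, most of the structure in the embeddings is not visible in the 2-D picture, so cluster separation should be read with caution.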

In [None]:
# TODO: Visualize with t-SNE
tsne = TSNE(n_components=2, random_state=42, perplexity=5)
embeddings_tsne = tsne.fit_transform(embeddings_ext)

plt.figure(figsize=(12, 8))
for i, (x, y) in enumerate(embeddings_tsne):
    plt.scatter(x, y, c=colors[i], s=100, alpha=0.6, label=labels[i] if i % 4 == 0 else "")
    plt.annotate(f"S{i+1}", (x, y), fontsize=8, alpha=0.7)

plt.xlabel('t-SNE 1')
plt.ylabel('t-SNE 2')
plt.title('Sentence Embeddings Visualization (t-SNE)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 5. Attention Visualization Dashboard

Visualize attention weights from transformer models.

In [None]:
# TODO: Extract and visualize attention weights
def visualize_attention(text, layer=0, head=0):
    """Visualize attention weights for a specific layer and head."""
    inputs = tokenizer(text, return_tensors="pt").to(device)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    
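    with torch.no_grad():
        # Note: some transformers versions may not return weights under the default
        # SDPA attention; if `outputs.attentions` is None, try reloading the model
        # with AutoModel.from_pretrained(model_name, attn_implementation="eager").
        outputs = model(**inputs, output_attentions=True)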
    
    # Get attention weights for specified layer and head
    attention = outputs.attentions[layer][0, head].cpu().numpy()
    
    # Plot
    plt.figure(figsize=(10, 8))
    sns.heatmap(attention, xticklabels=tokens, yticklabels=tokens, 
                cmap='viridis', annot=False, square=True)
    plt.title(f'Attention Weights (Layer {layer}, Head {head})')
    plt.xlabel('Key')
    plt.ylabel('Query')
    plt.tight_layout()
    plt.show()

# Test with a sentence
test_sentence = "The transformer model revolutionized natural language processing."
visualize_attention(test_sentence, layer=0, head=0)
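
Each row of the heatmap is one query token's distribution over all key tokens, which is why every row sums to 1. The sketch below computes scaled dot-product attention weights, softmax(QKᵀ / √d_k), for a single head in NumPy. The random `Q` and `K` matrices are stand-ins rather than values taken from BERT, and `attention_weights` is a hypothetical helper.

```python
import numpy as np

def attention_weights(Q, K):
    """Scaled dot-product attention weights for one head.

    Q, K: [seq_len, d_k]. Returns [seq_len, seq_len]; row i is query i's
    distribution over the keys.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys

rng = np.random.default_rng(0)
seq_len, d_k = 6, 64  # 64 is the per-head size in bert-base (768 / 12 heads)
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))

weights = attention_weights(Q, K)
print("Shape:", weights.shape)
print("Row sums:", np.round(weights.sum(axis=1), 6))
```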

## 6. Comparative Study: Different Pre-trained Models

In [None]:
# TODO: Compare embeddings from different models
models_to_compare = [
    "bert-base-uncased",
    "distilbert-base-uncased",
    "roberta-base"
]

def compare_models(text, models):
    """Compare sentence embeddings from different models."""
    results = {}
    
    for model_name in models:
        tokenizer_temp = AutoTokenizer.from_pretrained(model_name)
        model_temp = AutoModel.from_pretrained(model_name).to(device)
        model_temp.eval()
        
        inputs = tokenizer_temp(text, return_tensors="pt").to(device)
        
        with torch.no_grad():
            outputs = model_temp(**inputs)
        
        # Mean pooling
        embedding = outputs.last_hidden_state.mean(dim=1).cpu().numpy()
        results[model_name] = embedding
        
        print(f"{model_name}:")
        print(f"  Embedding shape: {embedding.shape}")
        print(f"  Mean: {embedding.mean():.4f}, Std: {embedding.std():.4f}")
    
    return results

test_text = "Machine learning is a subset of artificial intelligence."
print(f"Comparing models on: '{test_text}'\n")
comparison_results = compare_models(test_text, models_to_compare)

## Exercises and Exploration

1. **Zero-Shot Tasks:**
   - Try different models for classification
   - Experiment with custom label sets
   - Compare performance across tasks

2. **Embeddings:**
   - Extract embeddings from different layers
   - Compare CLS token vs mean pooling
   - Build a semantic search system

3. **Attention Analysis:**
   - Visualize all attention heads
   - Compare attention patterns across layers
   - Identify which heads focus on different linguistic features
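
As a starting point for the semantic search exercise, one possible shape is: embed the corpus once, embed the query, and rank documents by cosine similarity. The sketch below shows only the ranking step on made-up 3-D vectors; in the notebook the vectors would come from a sentence-embedding function instead. `semantic_search` is a hypothetical helper.

```python
import numpy as np

def semantic_search(query_vec, corpus_vecs, top_k=2):
    """Return [(doc_index, cosine_score), ...] for the top_k closest documents."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                        # cosine similarity with every document
    order = np.argsort(-scores)[:top_k]   # highest score first
    return [(int(i), float(scores[i])) for i in order]

corpus = [
    "Machine learning models are powerful.",
    "The weather forecast predicts snow.",
    "Neural networks process data.",
]
# Made-up vectors standing in for real sentence embeddings (assumption).
corpus_vecs = np.array([[0.9, 0.1, 0.0],
                        [0.0, 1.0, 0.1],
                        [0.8, 0.3, 0.1]])
query_vec = np.array([1.0, 0.2, 0.0])

hits = semantic_search(query_vec, corpus_vecs)
for idx, score in hits:
    print(f"{score:.3f}  {corpus[idx]}")
```

Normalizing the corpus vectors once and caching them keeps each query down to a single matrix-vector product.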

## Reflection Questions

1. **What are the advantages of zero-shot learning compared to fine-tuning?**
   - YOUR ANSWER HERE

2. **How do contextual embeddings differ from traditional word embeddings (like Word2Vec)?**
   - YOUR ANSWER HERE

3. **What did you observe in the attention patterns? Which words attend to which?**
   - YOUR ANSWER HERE

4. **How do different models compare in terms of embedding quality?**
   - YOUR ANSWER HERE

5. **What are potential applications of sentence embeddings in real-world systems?**
   - YOUR ANSWER HERE

## Deliverables Checklist

- [ ] Implemented zero-shot classification
- [ ] Tested question answering and sentiment analysis
- [ ] Extracted and analyzed embeddings
- [ ] Created similarity visualizations
- [ ] Built attention visualization dashboard
- [ ] Compared different pre-trained models
- [ ] Answered all reflection questions