# Tensor Operations for NLP

**Learning Objectives:**
- Understand how to represent text as tensors
- Master text preprocessing operations using PyTorch
- Learn about word embeddings and their tensor representations
- Create and manipulate embedding spaces
- Visualize text data transformations
- Build foundational skills for chatbot development

**Prerequisites:**
- PyTorch fundamentals (Notebook 01)
- Basic understanding of NLP concepts

---

## Introduction

Welcome to the Tensor Operations for NLP notebook! This tutorial will guide you through the essential concepts of representing and manipulating text data using PyTorch tensors, focusing on practical NLP applications that will be crucial for building our chatbot.

Text differs from numerical data: it is discrete and symbolic, and the relationships between words depend on order and context. To process text with neural networks, we first map it to integer indices and then to numerical vectors that tensors can store and batch efficiently.

## 1. Imports and Setup

Let's start by importing the necessary libraries and setting up our environment.

In [None]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter, defaultdict
import re
import json
import os
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Set up plotting style
# 'seaborn-v0_8' exists in matplotlib >= 3.6; older releases call this style 'seaborn'
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")

## 2. Text Representation Using Tensors

Before we can process text with neural networks, we need to convert text into numerical representations that tensors can handle. Let's explore different approaches to text representation.

### 2.1 Character-Level Representation

The simplest approach is to represent text at the character level, where each character is mapped to a unique integer.

In [None]:
# Character-level representation
print("=== Character-Level Text Representation ===")

# Sample text
text = "Hello, world! How are you?"
print(f"Original text: '{text}'")
print(f"Text length: {len(text)} characters")
print()

# Create character vocabulary
chars = sorted(list(set(text)))
vocab_size = len(chars)
print(f"Unique characters: {chars}")
print(f"Vocabulary size: {vocab_size}")
print()

# Create character-to-index and index-to-character mappings
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for i, ch in enumerate(chars)}

print("Character to index mapping:")
for char, idx in char_to_idx.items():
    print(f"  '{char}' -> {idx}")
print()

# Convert text to tensor
char_indices = [char_to_idx[ch] for ch in text]
char_tensor = torch.tensor(char_indices, dtype=torch.long)

print(f"Character indices: {char_indices}")
print(f"Character tensor: {char_tensor}")
print(f"Tensor shape: {char_tensor.shape}")
print(f"Tensor dtype: {char_tensor.dtype}")
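
A quick sanity check for any encoding is to decode it again. The sketch below rebuilds the mappings for the same sample string so that it runs on its own; `encode` and `decode` are illustrative helper names and are not used elsewhere in this notebook.

```python
import torch

text = "Hello, world! How are you?"
chars = sorted(set(text))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}

def encode(s):
    """Map each character to its integer index."""
    return torch.tensor([char_to_idx[ch] for ch in s], dtype=torch.long)

def decode(t):
    """Map a 1-D tensor of indices back to a string."""
    return "".join(idx_to_char[i] for i in t.tolist())

encoded = encode(text)
decoded = decode(encoded)
print(decoded == text)  # True: the mapping is lossless for characters in the vocabulary
```

Note that `encode` raises a `KeyError` for any character that was not in the original text, which is one reason the word-level code later reserves an `<UNK>` entry.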

In [None]:
# One-hot encoding for characters
print("=== One-Hot Encoding for Characters ===")

# Create one-hot encoded representation
def char_to_onehot(char_indices, vocab_size):
    """Convert character indices to one-hot encoding"""
    one_hot = torch.zeros(len(char_indices), vocab_size)
    for i, idx in enumerate(char_indices):
        one_hot[i, idx] = 1
    return one_hot

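# Manual version, kept for comparison
manual_one_hot = char_to_onehot(char_indices, vocab_size)

# Alternative using PyTorch's built-in function
one_hot_tensor = F.one_hot(char_tensor, num_classes=vocab_size).float()

# Both approaches should produce the same tensor
assert torch.equal(manual_one_hot, one_hot_tensor)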

print(f"One-hot tensor shape: {one_hot_tensor.shape}")
print(f"First few characters as one-hot vectors:")
for i in range(min(5, len(text))):
    char = text[i]
    vector = one_hot_tensor[i]
    print(f"  '{char}' -> {vector.numpy()}")

# Visualize one-hot encoding
plt.figure(figsize=(12, 6))
plt.imshow(one_hot_tensor[:15].T, cmap='Blues', aspect='auto')
plt.title('One-Hot Encoding Visualization (First 15 Characters)')
plt.xlabel('Character Position')
plt.ylabel('Character Index')
plt.colorbar(label='Activation')

# Add character labels
char_labels = [text[i] for i in range(min(15, len(text)))]
plt.xticks(range(len(char_labels)), char_labels)
plt.yticks(range(vocab_size), chars)
plt.show()
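
One-hot vectors also explain what an embedding lookup does: multiplying a one-hot row by a weight matrix selects one row of that matrix. The sketch below checks this on a small random matrix; `W` and the sizes are made-up values for illustration only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, embedding_dim = 6, 3
indices = torch.tensor([4, 0, 2])

one_hot = F.one_hot(indices, num_classes=vocab_size).float()  # shape [3, 6]
W = torch.randn(vocab_size, embedding_dim)                    # hypothetical weight matrix

via_matmul = one_hot @ W   # multiply one-hot rows by the matrix
via_lookup = W[indices]    # index the rows directly

print(torch.equal(one_hot.argmax(dim=1), indices))  # True: argmax recovers the indices
print(torch.allclose(via_matmul, via_lookup))       # True: both select the same rows
```

Indexing avoids building the large, mostly-zero one-hot tensor, which is why embedding layers take integer indices as input.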

### 2.2 Word-Level Representation

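For most NLP tasks, word-level representation is more practical than character-level representation: each token carries meaning on its own, and sequences are much shorter. The trade-off is a larger vocabulary and the need to handle words that never appeared in the training data.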

In [None]:
# Word-level representation
print("=== Word-Level Text Representation ===")

# Sample sentences for our chatbot
sentences = [
    "Hello, how can I help you today?",
    "What is the weather like?",
    "Can you tell me a joke?",
    "How are you doing?",
    "What time is it?",
    "Thank you for your help!",
    "Goodbye, have a nice day!"
]

print("Sample sentences:")
for i, sentence in enumerate(sentences):
    print(f"  {i+1}. {sentence}")
print()

# Simple tokenization function
def simple_tokenize(text):
    """Simple tokenization: lowercase and split by spaces, remove punctuation"""
    # Convert to lowercase and remove punctuation
    text = re.sub(r'[^\w\s]', '', text.lower())
    return text.split()

# Tokenize all sentences
tokenized_sentences = [simple_tokenize(sentence) for sentence in sentences]

print("Tokenized sentences:")
for i, tokens in enumerate(tokenized_sentences):
    print(f"  {i+1}. {tokens}")
print()

# Build vocabulary
all_words = []
for tokens in tokenized_sentences:
    all_words.extend(tokens)

word_counts = Counter(all_words)
vocab = ['<PAD>', '<UNK>'] + [word for word, count in word_counts.most_common()]
vocab_size = len(vocab)

print(f"Vocabulary size: {vocab_size}")
print(f"Vocabulary: {vocab}")
print(f"Word frequencies: {dict(word_counts.most_common())}")

In [None]:
# Create word-to-index mappings
word_to_idx = {word: i for i, word in enumerate(vocab)}
idx_to_word = {i: word for i, word in enumerate(vocab)}

print("Word to index mapping:")
for word, idx in word_to_idx.items():
    print(f"  '{word}' -> {idx}")
print()

# Convert sentences to tensor representation
def sentence_to_indices(tokens, word_to_idx, max_length=None):
    """Convert tokenized sentence to indices with optional padding"""
    indices = [word_to_idx.get(token, word_to_idx['<UNK>']) for token in tokens]
    
    if max_length is not None:
        if len(indices) < max_length:
            # Pad with <PAD> tokens
            indices.extend([word_to_idx['<PAD>']] * (max_length - len(indices)))
        else:
            # Truncate if too long
            indices = indices[:max_length]
    
    return indices

# Convert all sentences to indices with padding
max_length = max(len(tokens) for tokens in tokenized_sentences)
print(f"Maximum sentence length: {max_length} words")

sentence_indices = [sentence_to_indices(tokens, word_to_idx, max_length) 
                   for tokens in tokenized_sentences]

# Create tensor
sentences_tensor = torch.tensor(sentence_indices, dtype=torch.long)

print(f"\nSentences tensor shape: {sentences_tensor.shape}")
print(f"Sentences tensor:")
print(sentences_tensor)

# Show the mapping back to words
print("\nTensor to words mapping:")
for i, indices in enumerate(sentence_indices[:3]):  # Show first 3 sentences
    words = [idx_to_word[idx] for idx in indices]
    print(f"  Sentence {i+1}: {indices} -> {words}")
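
Padding by hand works, but PyTorch also ships a helper for it. The following sketch uses `torch.nn.utils.rnn.pad_sequence` on a few made-up index sequences (the values are hypothetical, with 0 reserved for `<PAD>` as in the vocabulary above).

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical index sequences of different lengths (0 is reserved for <PAD>)
seqs = [torch.tensor([5, 3, 9]), torch.tensor([7, 2]), torch.tensor([4, 8, 6, 1])]

# Pad every sequence on the right to the length of the longest one
padded = pad_sequence(seqs, batch_first=True, padding_value=0)
lengths = torch.tensor([len(s) for s in seqs])

print(padded.shape)  # torch.Size([3, 4])
print(padded)
print(lengths)
```

Keeping `lengths` alongside the padded tensor is useful later, because it lets you rebuild a padding mask without relying on the padding value.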

## 3. Text Preprocessing Operations with Tensors

Now let's load some conversation data and build a reusable preprocessor that turns raw text into padded index sequences, ready to be wrapped in PyTorch tensors.

In [None]:
# Text preprocessing operations
print("=== Text Preprocessing with Tensors ===")

# Load sample conversation data
def load_conversation_data():
    """Load conversation data from JSON file"""
    try:
        with open('../data/conversations/simple_qa_pairs.json', 'r') as f:
            data = json.load(f)
        return data['conversations']
    except FileNotFoundError:
        # Fallback data if file doesn't exist
        return [
            {"input": "Hello", "response": "Hi there! How can I help you?"},
            {"input": "How are you?", "response": "I'm doing well, thank you for asking!"},
            {"input": "What's your name?", "response": "I'm a helpful AI assistant."},
            {"input": "Tell me a joke", "response": "Why don't scientists trust atoms? Because they make up everything!"},
            {"input": "What time is it?", "response": "I don't have access to real-time information."},
            {"input": "Goodbye", "response": "Goodbye! Have a great day!"}
        ]

conversations = load_conversation_data()
print(f"Loaded {len(conversations)} conversation pairs")

# Extract inputs and responses
inputs = [conv['input'] for conv in conversations]
responses = [conv['response'] for conv in conversations]

print("\nSample conversations:")
for i, (inp, resp) in enumerate(zip(inputs[:3], responses[:3])):
    print(f"  {i+1}. Input: '{inp}' -> Response: '{resp}'")

In [None]:
# Advanced tokenization and preprocessing
class TextPreprocessor:
    def __init__(self, vocab_size=1000, min_freq=1):
        self.vocab_size = vocab_size
        self.min_freq = min_freq
        self.word_to_idx = {}
        self.idx_to_word = {}
        self.vocab = []
        
    def tokenize(self, text):
        """Lowercase, expand common English contractions, strip punctuation, split on whitespace"""
        # Convert to lowercase
        text = text.lower()
        # Expand contractions; the irregular forms must come before the generic "n't" rule
        text = re.sub(r"won't", "will not", text)
        text = re.sub(r"can't", "cannot", text)
        text = re.sub(r"n't", " not", text)
        text = re.sub(r"'re", " are", text)
        text = re.sub(r"'ve", " have", text)
        text = re.sub(r"'ll", " will", text)
        text = re.sub(r"'d", " would", text)  # approximation: "'d" can also mean "had"
        text = re.sub(r"'m", " am", text)
        # "'s" is left alone (possessive vs. "is" is ambiguous), so "what's" becomes "whats"
        # Remove punctuation and split
        text = re.sub(r'[^\w\s]', '', text)
        return text.split()
    
    def build_vocab(self, texts):
        """Build vocabulary from texts"""
        # Tokenize all texts
        all_tokens = []
        for text in texts:
            all_tokens.extend(self.tokenize(text))
        
        # Count word frequencies
        word_counts = Counter(all_tokens)
        
        # Filter by minimum frequency and vocabulary size
        filtered_words = [word for word, count in word_counts.items() 
                         if count >= self.min_freq]
        
        # Sort by frequency and take top vocab_size - 2 (for special tokens)
        sorted_words = sorted(filtered_words, 
                            key=lambda x: word_counts[x], reverse=True)
        
        # Build vocabulary with special tokens
        self.vocab = ['<PAD>', '<UNK>'] + sorted_words[:self.vocab_size-2]
        
        # Create mappings
        self.word_to_idx = {word: i for i, word in enumerate(self.vocab)}
        self.idx_to_word = {i: word for i, word in enumerate(self.vocab)}
        
        print(f"Built vocabulary with {len(self.vocab)} words")
        print(f"Most common words: {self.vocab[2:12]}")
        
    def text_to_indices(self, text, max_length=None):
        """Convert text to indices"""
        tokens = self.tokenize(text)
        indices = [self.word_to_idx.get(token, self.word_to_idx['<UNK>']) 
                  for token in tokens]
        
        if max_length is not None:
            if len(indices) < max_length:
                indices.extend([self.word_to_idx['<PAD>']] * (max_length - len(indices)))
            else:
                indices = indices[:max_length]
        
        return indices
    
    def indices_to_text(self, indices):
        """Convert indices back to text"""
        words = [self.idx_to_word.get(idx, '<UNK>') for idx in indices]
        # Remove padding tokens
        words = [word for word in words if word != '<PAD>']
        return ' '.join(words)

# Initialize preprocessor
preprocessor = TextPreprocessor(vocab_size=100, min_freq=1)

# Build vocabulary from all texts
all_texts = inputs + responses
preprocessor.build_vocab(all_texts)

print(f"\nVocabulary size: {len(preprocessor.vocab)}")
print(f"Sample vocabulary: {preprocessor.vocab[:20]}")
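
To see how frequency filtering and the special tokens interact, here is a stripped-down sketch that is independent of the `TextPreprocessor` class. The two-sentence corpus, the `min_freq` value, and the `encode` helper are all assumptions chosen to keep the result easy to verify by hand.

```python
from collections import Counter

corpus = ["how are you", "how can i help you"]
counts = Counter(tok for line in corpus for tok in line.split())

# Keep words seen at least twice; everything else will map to <UNK>
min_freq = 2
vocab = ["<PAD>", "<UNK>"] + [w for w, c in counts.most_common() if c >= min_freq]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def encode(text, max_length):
    """Map tokens to indices, truncate to max_length, then right-pad with <PAD>."""
    ids = [word_to_idx.get(tok, word_to_idx["<UNK>"]) for tok in text.split()]
    ids = ids[:max_length]
    return ids + [word_to_idx["<PAD>"]] * (max_length - len(ids))

print(vocab)                               # ['<PAD>', '<UNK>', 'how', 'you']
print(encode("how are you", max_length=5)) # [2, 1, 3, 0, 0]
```

The word "are" occurs only once, so it falls below `min_freq` and is encoded as `<UNK>` (index 1), while the two trailing zeros are padding.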

## 4. Word Embeddings: From Scratch and Pre-trained

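Word embeddings are dense vector representations of words; after training, words that appear in similar contexts tend to end up with similar vectors. Embeddings can be learned from scratch, as we do below, or loaded from a pre-trained model. We start with a randomly initialized embedding layer and then train embeddings on our own text with Skip-gram.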

### 4.1 Creating Word Embeddings from Scratch

In [None]:
# Creating word embeddings from scratch
print("=== Word Embeddings from Scratch ===")

# Embedding parameters
vocab_size = len(preprocessor.vocab)
embedding_dim = 50  # Dimension of embedding vectors

print(f"Vocabulary size: {vocab_size}")
print(f"Embedding dimension: {embedding_dim}")

# Create embedding layer
embedding_layer = nn.Embedding(vocab_size, embedding_dim)

print(f"\nEmbedding layer: {embedding_layer}")
print(f"Embedding weight shape: {embedding_layer.weight.shape}")
print(f"Number of parameters: {embedding_layer.weight.numel()}")

# Get embeddings for some words
sample_words = ['hello', 'help', 'you', 'time', '<UNK>', '<PAD>']
sample_indices = [preprocessor.word_to_idx.get(word, preprocessor.word_to_idx['<UNK>']) 
                 for word in sample_words]
sample_tensor = torch.tensor(sample_indices, dtype=torch.long)

# Get embeddings
sample_embeddings = embedding_layer(sample_tensor)

print(f"\nSample word embeddings:")
print(f"Sample indices: {sample_indices}")
print(f"Sample embeddings shape: {sample_embeddings.shape}")

for i, word in enumerate(sample_words):
    embedding = sample_embeddings[i]
    print(f"  '{word}' (idx {sample_indices[i]}): {embedding[:5].detach().numpy()}... (first 5 dims)")
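
The embedding layer above treats `<PAD>` like any other word. As a possible refinement, `nn.Embedding` accepts a `padding_idx` argument that pins that row to zeros and excludes it from gradient updates. The sizes and token ids in this sketch are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)

batch = torch.tensor([[3, 7, 0, 0]])  # the last two positions are padding
out = emb(batch)                      # shape [1, 4, 4]

# Backpropagate a dummy loss to inspect the gradients
out.sum().backward()

print(out[0, 2])           # all zeros for a <PAD> position
print(emb.weight.grad[0])  # all zeros: the padding row receives no gradient
```

This assumes `<PAD>` sits at index 0, which matches the vocabulary built by `TextPreprocessor`.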

In [None]:
# Demonstrate embedding properties
print("=== Embedding Properties and Operations ===")

# Convert conversation data to tensors
max_input_length = max(len(preprocessor.tokenize(text)) for text in inputs)
max_response_length = max(len(preprocessor.tokenize(text)) for text in responses)
max_length = max(max_input_length, max_response_length)

print(f"Max input length: {max_input_length}")
print(f"Max response length: {max_response_length}")
print(f"Using max length: {max_length}")

# Convert to indices
input_indices = [preprocessor.text_to_indices(text, max_length) for text in inputs]
response_indices = [preprocessor.text_to_indices(text, max_length) for text in responses]

# Create tensors
input_tensor = torch.tensor(input_indices, dtype=torch.long)
response_tensor = torch.tensor(response_indices, dtype=torch.long)

# Get embeddings for our conversation data
input_embeddings = embedding_layer(input_tensor)
response_embeddings = embedding_layer(response_tensor)

print(f"\nInput embeddings shape: {input_embeddings.shape}")
print(f"Response embeddings shape: {response_embeddings.shape}")

# Calculate similarity between words using cosine similarity
def cosine_similarity(a, b):
    """Calculate cosine similarity between two vectors"""
    return F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()

# Compare some word embeddings
word_pairs = [('hello', 'hi'), ('help', 'you'), ('what', 'how'), ('time', 'day')]

print("\nWord similarity (random embeddings):")
for word1, word2 in word_pairs:
    if word1 in preprocessor.word_to_idx and word2 in preprocessor.word_to_idx:
        idx1 = preprocessor.word_to_idx[word1]
        idx2 = preprocessor.word_to_idx[word2]
        
        emb1 = embedding_layer(torch.tensor([idx1]))[0]
        emb2 = embedding_layer(torch.tensor([idx2]))[0]
        
        similarity = cosine_similarity(emb1, emb2)
        print(f"  '{word1}' vs '{word2}': {similarity:.4f}")
    else:
        print(f"  '{word1}' vs '{word2}': One or both words not in vocabulary")
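
Comparing hand-picked pairs is one way to probe an embedding space; another is to ask for the nearest neighbors of a word. The sketch below is a minimal version of that idea, with `nearest_neighbors` as a hypothetical helper and hand-made 2-D vectors so the answer can be checked by eye.

```python
import torch
import torch.nn.functional as F

def nearest_neighbors(weights, query_idx, k=3):
    """Return indices and cosine similarities of the k rows closest to row query_idx."""
    normed = F.normalize(weights, p=2, dim=1)   # unit-length rows
    sims = normed @ normed[query_idx]           # cosine similarity to every row
    sims[query_idx] = float("-inf")             # exclude the query itself
    top = torch.topk(sims, k)
    return top.indices.tolist(), top.values.tolist()

# Row 1 points almost the same way as row 0, row 2 is orthogonal, row 3 is opposite
weights = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
idx, vals = nearest_neighbors(weights, query_idx=0, k=2)
print(idx)  # [1, 2]
```

To use it with the notebook's layer you would pass `embedding_layer.weight.detach()` and translate the returned indices with `preprocessor.idx_to_word`.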

### 4.2 Training Word Embeddings with Skip-gram

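Let's implement a simple Skip-gram model and train it on our text data. Given a center word, the model predicts each word in its context window. This version scores every vocabulary word with a full softmax, which is fine for a tiny vocabulary. With so little text the learned similarities will be noisy, but the training mechanics are the same as at larger scale.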

In [None]:
# Simple Skip-gram implementation
print("=== Training Word Embeddings with Skip-gram ===")

class SkipGram(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super(SkipGram, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(embedding_dim, vocab_size)
        
    def forward(self, center_word):
        embeds = self.embeddings(center_word)
        out = self.linear(embeds)
        return out

# Create training data for skip-gram
def create_skipgram_data(texts, preprocessor, window_size=2):
    """Create (center_word, context_word) pairs for skip-gram training"""
    data = []
    
    for text in texts:
        tokens = preprocessor.tokenize(text)
        indices = [preprocessor.word_to_idx.get(token, preprocessor.word_to_idx['<UNK>']) 
                  for token in tokens]
        
        for i, center_idx in enumerate(indices):
            # Get context words within window
            start = max(0, i - window_size)
            end = min(len(indices), i + window_size + 1)
            
            for j in range(start, end):
                if i != j:  # Skip the center word itself
                    context_idx = indices[j]
                    data.append((center_idx, context_idx))
    
    return data

# Create training data
skipgram_data = create_skipgram_data(all_texts, preprocessor, window_size=2)
print(f"Created {len(skipgram_data)} training pairs")

# Show some examples
print("\nSample training pairs:")
for i in range(min(10, len(skipgram_data))):
    center_idx, context_idx = skipgram_data[i]
    center_word = preprocessor.idx_to_word[center_idx]
    context_word = preprocessor.idx_to_word[context_idx]
    print(f"  ({center_word}, {context_word})")

# Initialize model
embedding_dim = 32  # Smaller for faster training
model = SkipGram(vocab_size, embedding_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

print(f"\nModel parameters: {sum(p.numel() for p in model.parameters())}")
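
It can help to trace the pair generation on a single short sentence. This sketch mirrors the windowing logic of `create_skipgram_data` but works on plain tokens instead of indices; `skipgram_pairs` is an illustrative name, not part of the notebook.

```python
def skipgram_pairs(tokens, window_size=1):
    """Return (center, context) pairs for every neighbor inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

pairs = skipgram_pairs(["how", "are", "you"], window_size=1)
print(pairs)
# [('how', 'are'), ('are', 'how'), ('are', 'you'), ('you', 'are')]
```

Words at the edges of a sentence produce fewer pairs because the window is clipped, so they contribute less to training.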

In [None]:
# Train the skip-gram model
print("=== Training Skip-gram Model ===")

# Convert data to tensors
center_words = torch.tensor([pair[0] for pair in skipgram_data], dtype=torch.long)
context_words = torch.tensor([pair[1] for pair in skipgram_data], dtype=torch.long)

# Training parameters
n_epochs = 50
batch_size = 16
# Ceiling division so the final partial batch is used and n_batches is never 0
n_batches = (len(skipgram_data) + batch_size - 1) // batch_size

print(f"Training for {n_epochs} epochs with batch size {batch_size}")
print(f"Number of batches per epoch: {n_batches}")

# Training loop
losses = []
for epoch in range(n_epochs):
    epoch_loss = 0
    
    # Shuffle data
    perm = torch.randperm(len(skipgram_data))
    center_words_shuffled = center_words[perm]
    context_words_shuffled = context_words[perm]
    
    for batch_idx in range(n_batches):
        start_idx = batch_idx * batch_size
        end_idx = start_idx + batch_size
        
        batch_center = center_words_shuffled[start_idx:end_idx]
        batch_context = context_words_shuffled[start_idx:end_idx]
        
        # Forward pass
        outputs = model(batch_center)
        loss = criterion(outputs, batch_context)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
    
    avg_loss = epoch_loss / n_batches
    losses.append(avg_loss)
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1:3d}: Average Loss = {avg_loss:.4f}")

print("\nTraining completed!")

# Plot training loss
plt.figure(figsize=(10, 4))
plt.plot(losses)
plt.title('Skip-gram Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()
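
The manual shuffle-and-slice loop above can also be expressed with `torch.utils.data`. The sketch below shows only the batching part, using a handful of made-up index pairs in place of the real skip-gram data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical (center, context) index pairs standing in for the skip-gram data
centers = torch.tensor([2, 2, 3, 3, 4, 4, 5])
contexts = torch.tensor([3, 4, 2, 4, 2, 3, 4])

dataset = TensorDataset(centers, contexts)
loader = DataLoader(dataset, batch_size=3, shuffle=True)  # reshuffles every epoch

batch_sizes = [len(batch_center) for batch_center, batch_context in loader]
print(batch_sizes)  # [3, 3, 1]: the final partial batch is kept by default
```

Inside a training loop you would iterate `for batch_center, batch_context in loader:` and run the same forward and backward steps as above.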

## 5. Embedding Space Visualization

Let's visualize our trained embeddings to understand the relationships between words.
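
PCA, used in this section to reduce the embeddings to two dimensions, centers the data and projects it onto the directions of largest variance. The following sketch reproduces those two steps with `torch.linalg.svd` on a random stand-in matrix. It is meant to show what the projection computes, not to replace scikit-learn's `PCA`.

```python
import torch

torch.manual_seed(0)
X = torch.randn(20, 8)  # stand-in for a [vocab_size, embedding_dim] matrix

# Step 1: center each dimension
X_centered = X - X.mean(dim=0, keepdim=True)

# Step 2: the rows of Vh are the principal directions, ordered by singular value
U, S, Vh = torch.linalg.svd(X_centered, full_matrices=False)

X_2d = X_centered @ Vh[:2].T           # project onto the top-2 components
explained = (S ** 2) / (S ** 2).sum()  # variance ratio per component

print(X_2d.shape)     # torch.Size([20, 2])
print(explained[:2])
```

The individual components may differ in sign from scikit-learn's output, since the direction of each principal axis is arbitrary.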

In [None]:
# Visualize embedding space
print("=== Embedding Space Visualization ===")

# Get trained embeddings
trained_embeddings = model.embeddings.weight.detach().numpy()

print(f"Trained embeddings shape: {trained_embeddings.shape}")

# Use PCA to reduce dimensionality for visualization
pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(trained_embeddings)

print(f"PCA explained variance ratio: {pca.explained_variance_ratio_}")
print(f"Total variance explained: {sum(pca.explained_variance_ratio_):.3f}")

# Plot embeddings
plt.figure(figsize=(12, 8))
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], alpha=0.6)

# Add labels for some words
words_to_show = ['hello', 'help', 'you', 'what', 'how', 'time', 'day', 'thank', 'goodbye']
for word in words_to_show:
    if word in preprocessor.word_to_idx:
        idx = preprocessor.word_to_idx[word]
        x, y = embeddings_2d[idx]
        plt.annotate(word, (x, y), xytext=(5, 5), textcoords='offset points')

plt.title('Word Embeddings Visualization (PCA)')
plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.3f} variance)')
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.3f} variance)')
plt.grid(True, alpha=0.3)
plt.show()

# Compare word similarities with trained embeddings
def get_word_similarity(word1, word2, embeddings, preprocessor):
    """Calculate similarity between two words using trained embeddings"""
    idx1 = preprocessor.word_to_idx.get(word1, preprocessor.word_to_idx['<UNK>'])
    idx2 = preprocessor.word_to_idx.get(word2, preprocessor.word_to_idx['<UNK>'])
    
    emb1 = torch.tensor(embeddings[idx1])
    emb2 = torch.tensor(embeddings[idx2])
    
    return F.cosine_similarity(emb1.unsqueeze(0), emb2.unsqueeze(0)).item()

# Test word similarities
test_pairs = [('hello', 'hi'), ('help', 'you'), ('what', 'how'), ('time', 'day'), ('thank', 'thanks')]

print("\nWord similarities (trained embeddings):")
for word1, word2 in test_pairs:
    if word1 in preprocessor.word_to_idx and word2 in preprocessor.word_to_idx:
        similarity = get_word_similarity(word1, word2, trained_embeddings, preprocessor)
        print(f"  '{word1}' vs '{word2}': {similarity:.4f}")
    else:
        print(f"  '{word1}' vs '{word2}': One or both words not in vocabulary")
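
Once an embedding matrix exists, whether trained here or obtained elsewhere, it can be loaded into a new layer with `nn.Embedding.from_pretrained`. In this sketch a random matrix stands in for the trained weights, so the numbers themselves carry no meaning.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for a trained [vocab_size, embedding_dim] weight matrix
pretrained = torch.randn(12, 5)

# freeze=True keeps the vectors fixed while the rest of a model trains
frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)

tokens = torch.tensor([[4, 9, 0]])
vectors = frozen(tokens)

print(vectors.shape)                # torch.Size([1, 3, 5])
print(frozen.weight.requires_grad)  # False
```

With the notebook's model, the matrix to pass would be `model.embeddings.weight.detach()`. Setting `freeze=False` instead allows the vectors to be fine-tuned.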

## 6. Tensor Operations for Text Processing

Let's work through tensor operations that come up constantly in NLP code: padding masks, embedding lookup, masked pooling, attention-style weighting, and pairwise similarity.

In [None]:
# Advanced tensor operations for NLP
print("=== Advanced Tensor Operations for NLP ===")

# Create sample batch of sequences
batch_size = 4
seq_length = 8
vocab_size = len(preprocessor.vocab)

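# Random token ids for demonstration (2 and up, so <PAD>=0 and <UNK>=1 are not drawn by chance)
sequences = torch.randint(2, vocab_size, (batch_size, seq_length))

# Simulate right-padded sequences of different lengths by zeroing the tails
true_lengths = torch.tensor([8, 6, 4, 5])  # one length per sequence in the batch
positions = torch.arange(seq_length).unsqueeze(0)
sequences[positions >= true_lengths.unsqueeze(1)] = 0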
print(f"Sample sequences shape: {sequences.shape}")
print(f"Sample sequences:")
print(sequences)

# 1. Masking operations
print("\n1. Masking Operations:")

# Create padding mask (assuming 0 is padding token)
padding_mask = (sequences != 0).float()
print(f"Padding mask shape: {padding_mask.shape}")
print(f"Padding mask:")
print(padding_mask)

# Calculate sequence lengths
seq_lengths = padding_mask.sum(dim=1)
print(f"Sequence lengths: {seq_lengths}")

# 2. Embedding lookup
print("\n2. Embedding Lookup:")
embedding_dim = 16
embedding = nn.Embedding(vocab_size, embedding_dim)

embedded_sequences = embedding(sequences)
print(f"Embedded sequences shape: {embedded_sequences.shape}")
print(f"First sequence embedding (first 3 tokens, first 5 dims):")
print(embedded_sequences[0, :3, :5])

# 3. Masked operations
print("\n3. Masked Operations:")

# Apply mask to embeddings
mask_expanded = padding_mask.unsqueeze(-1).expand_as(embedded_sequences)
masked_embeddings = embedded_sequences * mask_expanded

print(f"Masked embeddings shape: {masked_embeddings.shape}")

# Calculate mean embeddings (ignoring padding)
sum_embeddings = masked_embeddings.sum(dim=1)  # Sum over sequence length
# clamp(min=1) avoids dividing by zero if a sequence consists only of padding
mean_embeddings = sum_embeddings / seq_lengths.clamp(min=1).unsqueeze(-1)

print(f"Mean embeddings shape: {mean_embeddings.shape}")
print(f"Mean embeddings (first 5 dims):")
print(mean_embeddings[:, :5])
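
When the true sequence lengths are known, the mask can be built from them directly instead of comparing against the padding index. The sketch below shows masked mean pooling in that style; `masked_mean` is a hypothetical helper, and the value 100.0 marks padded positions so that any leak into the average would be obvious.

```python
import torch

def masked_mean(embeddings, lengths):
    """Average token vectors over the first lengths[i] positions of each sequence."""
    seq_len = embeddings.size(1)
    # mask[i, t] is True while position t is inside sequence i
    mask = torch.arange(seq_len).unsqueeze(0) < lengths.unsqueeze(1)  # [batch, seq_len]
    summed = (embeddings * mask.unsqueeze(-1)).sum(dim=1)             # [batch, dim]
    return summed / lengths.clamp(min=1).unsqueeze(-1)

emb = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
                    [[5.0, 6.0], [100.0, 100.0], [100.0, 100.0]]])
lengths = torch.tensor([2, 1])

pooled = masked_mean(emb, lengths)
print(pooled)  # rows are [2., 3.] and [5., 6.]
```

The first row averages two real tokens and the second row returns its single real token unchanged, so the padded values never enter the result.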

In [None]:
# 4. Attention-like operations
print("4. Attention-like Operations:")

# Simple attention mechanism
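def simple_attention(embeddings, mask):
    """Attention-style pooling with a toy, non-learned score (sum of each token's dimensions).

    Padded positions get weight 0; a row that is entirely padding would yield NaN.
    """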
    # Calculate attention scores (simplified)
    attention_scores = torch.sum(embeddings, dim=-1)  # [batch_size, seq_length]
    
    # Apply mask to attention scores
    attention_scores = attention_scores.masked_fill(mask == 0, float('-inf'))
    
    # Apply softmax to get attention weights
    attention_weights = F.softmax(attention_scores, dim=-1)
    
    # Apply attention weights to embeddings
    attended_embeddings = torch.sum(embeddings * attention_weights.unsqueeze(-1), dim=1)
    
    return attended_embeddings, attention_weights

attended_emb, attention_weights = simple_attention(embedded_sequences, padding_mask)

print(f"Attended embeddings shape: {attended_emb.shape}")
print(f"Attention weights shape: {attention_weights.shape}")
print(f"Attention weights (first sequence): {attention_weights[0]}")

# Visualize attention weights
plt.figure(figsize=(10, 6))
plt.imshow(attention_weights.detach().numpy(), cmap='Blues', aspect='auto')
plt.title('Attention Weights Visualization')
plt.xlabel('Sequence Position')
plt.ylabel('Batch Index')
plt.colorbar(label='Attention Weight')
plt.show()

# 5. Sequence similarity operations
print("\n5. Sequence Similarity Operations:")

# Calculate pairwise similarities between sequences
def sequence_similarity_matrix(embeddings):
    """Calculate pairwise cosine similarities between sequences"""
    # Normalize embeddings
    normalized_emb = F.normalize(embeddings, p=2, dim=-1)
    
    # Calculate similarity matrix
    similarity_matrix = torch.mm(normalized_emb, normalized_emb.t())
    
    return similarity_matrix

similarity_matrix = sequence_similarity_matrix(mean_embeddings)
print(f"Similarity matrix shape: {similarity_matrix.shape}")
print(f"Similarity matrix:")
print(similarity_matrix)

# Visualize similarity matrix
plt.figure(figsize=(8, 6))
plt.imshow(similarity_matrix.detach().numpy(), cmap='RdYlBu', vmin=-1, vmax=1)
plt.title('Sequence Similarity Matrix')
plt.xlabel('Sequence Index')
plt.ylabel('Sequence Index')
plt.colorbar(label='Cosine Similarity')
plt.show()
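
The attention function in this cell scores each token with a simple sum. A more common formulation is scaled dot-product attention, where every position scores every other position. The following is a minimal sketch of it without learned projections, using random inputs and a boolean mask that is `True` for real tokens. The function name and shapes are assumptions for illustration.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, key_mask):
    """q, k, v: [batch, seq_len, dim]; key_mask: [batch, seq_len], True for real tokens."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))            # [batch, seq, seq]
    scores = scores.masked_fill(~key_mask.unsqueeze(1), float("-inf"))  # hide padded keys
    weights = F.softmax(scores, dim=-1)                                 # each row sums to 1
    return weights @ v, weights

torch.manual_seed(0)
x = torch.randn(2, 4, 8)
mask = torch.tensor([[True, True, True, False],
                     [True, True, False, False]])

out, weights = scaled_dot_product_attention(x, x, x, mask)
print(out.shape, weights.shape)  # torch.Size([2, 4, 8]) torch.Size([2, 4, 4])
print(weights[1])                # the last two columns are exactly zero
```

Dividing by the square root of the dimension keeps the scores in a range where softmax does not saturate. As with `simple_attention`, a sequence with no real tokens would produce NaN weights.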

## 7. Summary and Next Steps

Let's summarize what we've learned about tensor operations for NLP.

In [None]:
# Summary of key concepts
print("=== Summary: Tensor Operations for NLP ===")
print()

concepts = {
    "Text Representation": [
        "Character-level: Simple but limited semantic understanding",
        "Word-level: More meaningful for most NLP tasks",
        "Vocabulary building: Map words to unique indices",
        "Padding: Handle variable-length sequences"
    ],
    
    "Text Preprocessing": [
        "Tokenization: Split text into meaningful units",
        "Normalization: Lowercase, handle contractions",
        "Vocabulary filtering: Remove rare words, limit size",
        "Special tokens: <PAD>, <UNK> for handling edge cases"
    ],
    
    "Word Embeddings": [
        "Dense vector representations of words",
        "Capture semantic relationships between words",
        "Can be learned from scratch or pre-trained",
        "Skip-gram: Predict context from center word"
    ],
    
    "Tensor Operations": [
        "Masking: Handle variable-length sequences",
        "Embedding lookup: Convert indices to vectors",
        "Aggregation: Mean, sum, attention-weighted",
        "Similarity: Cosine similarity for semantic comparison"
    ]
}

for category, points in concepts.items():
    print(f"{category}:")
    for point in points:
        print(f"  • {point}")
    print()

print("Key Takeaways:")
print("• Text must be converted to numerical form for neural networks")
print("• Word embeddings capture semantic meaning in dense vectors")
print("• Masking is crucial for handling variable-length sequences")
print("• Tensor operations enable efficient batch processing")
print("• Similarity measures help understand semantic relationships")

print("\nNext Steps:")
print("• Data Preprocessing and Tokenization (Notebook 03)")
print("• Neural Networks Basics (Notebook 04)")
print("• Sequence Models for Chatbots (Notebook 05+)")

print("\n" + "="*50)
print("Ready to build more sophisticated NLP models!")
print("="*50)