# Chapter 2: Classic DP/Graphs for ML Engineers

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jmamath/interview_prep/blob/main/chapter_02_dp_graphs_ml.ipynb)

## Introduction
Dynamic programming (DP) and graph algorithms underpin many machine learning systems, including sequence generation, speech recognition, and natural language processing (NLP). This chapter works through these ideas with exercises drawn from real-world ML scenarios.

## Learning Objectives
By the end of this chapter, you will be able to:
- Implement dynamic programming solutions for ML-related problems
- Design and implement beam search algorithms for sequence generation
- Apply graph algorithms to model training and inference problems
- Implement the Viterbi algorithm for sequence tagging
- Use diverse beam search to reduce redundancy among generated sequences

## Prerequisites
- Understanding of dynamic programming concepts
- Familiarity with graph traversal algorithms (BFS, DFS)
- Basic knowledge of sequence generation in NLP
- Experience with Python data structures and algorithms


## 2.1 Dynamic Programming Refresher

### Core DP Concepts
Dynamic Programming is a method for solving complex problems by breaking them down into simpler subproblems and storing the results to avoid redundant calculations.

**Key Principles:**
1. **Optimal Substructure**: An optimal solution can be assembled from optimal solutions to its subproblems
2. **Overlapping Subproblems**: The same subproblems recur many times in a naive recursion
3. **Memoization**: Cache each subproblem's result so it is computed only once (or fill a table bottom-up)
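
To make the three principles concrete, here is a minimal sketch of top-down DP for edit distance. The function name `edit_distance` and the use of `functools.lru_cache` as the memo table are illustrative choices, not something this chapter prescribes.

```python
from functools import lru_cache

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via top-down DP (illustrative sketch)."""

    @lru_cache(maxsize=None)  # memoization: each (i, j) pair is solved once
    def solve(i: int, j: int) -> int:
        # solve(i, j) is the distance between the suffixes a[i:] and b[j:]
        if i == len(a):
            return len(b) - j  # insert the rest of b
        if j == len(b):
            return len(a) - i  # delete the rest of a
        if a[i] == b[j]:
            return solve(i + 1, j + 1)
        # optimal substructure: best of delete, insert, substitute
        return 1 + min(solve(i + 1, j), solve(i, j + 1), solve(i + 1, j + 1))

    return solve(0, 0)

print(edit_distance("kitten", "sitting"))  # 3
```

Without the cache, the three recursive calls revisit the same `(i, j)` pairs repeatedly; with it, the work is bounded by the number of distinct pairs, `len(a) * len(b)`.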

### Common DP Patterns in ML

**1. Sequence Problems**
- Longest Common Subsequence (LCS)
- Edit Distance (Levenshtein)
- Sequence alignment in bioinformatics

**2. Optimization Problems**
- Knapsack variants for feature selection
- Resource allocation in distributed training
- Path optimization in neural architecture search

**3. Probability Problems**
- Forward-backward algorithm in HMMs
- Viterbi algorithm for sequence tagging
- Belief propagation in graphical models

### DP vs Greedy vs Divide-and-Conquer
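- **DP**: Overlapping subproblems with optimal substructure; each subproblem is solved once and reused, which yields an optimal solution
- **Greedy**: Commits to the locally best choice at each step and never revisits it; optimal only when the problem has the greedy-choice property
- **Divide-and-Conquer**: Splits the problem into independent subproblems (e.g. merge sort), so there are no results to reuse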
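
A small coin-change example shows where greedy and DP diverge. This is a sketch with hypothetical helper names (`greedy_coins`, `dp_coins`) and a coin set chosen specifically so that the greedy choice fails.

```python
def greedy_coins(coins, amount):
    """Always take the largest coin that fits; fast but may be suboptimal."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else None

def dp_coins(coins, amount):
    """Bottom-up DP: best[x] is the fewest coins that sum to x."""
    INF = float("inf")
    best = [0] + [INF] * amount
    for x in range(1, amount + 1):
        for c in coins:
            if c <= x and best[x - c] + 1 < best[x]:
                best[x] = best[x - c] + 1
    return best[amount] if best[amount] != INF else None

coins = [1, 3, 4]
print(greedy_coins(coins, 6))  # 3  (4 + 1 + 1)
print(dp_coins(coins, 6))      # 2  (3 + 3)
```

Greedy commits to the coin of value 4 and cannot undo it, while DP compares every way of reaching each intermediate amount.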


## Problem 1: Simple Beam Search (Easy)

### Contextual Introduction
In machine translation, a sentence such as "The cat sat on the mat" usually has many valid translations, and the decoder has to choose among them one word at a time. Beam search handles this by keeping several partial translations alive at once, balancing exploration of alternatives against exploitation of the current best hypothesis.

### Key Concepts
Beam search explores multiple translation paths simultaneously. Unlike greedy search, which keeps only the single best option at each step, beam search keeps the top-k partial sequences (k is the beam width), so a hypothesis that starts with a slightly lower score can still win later.


## 2.2 Graph Algorithms for ML

### Graph Representation
```python
# Adjacency List (most common for ML)
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C']
}

# Adjacency Matrix (for dense graphs); rows and columns are ordered A, B, C, D
adj_matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0]
]
```

### Essential Graph Algorithms

**1. Breadth-First Search (BFS)**
- Level-order traversal
- Shortest path in unweighted graphs
- Connected components

**2. Depth-First Search (DFS)**
- Preorder and postorder traversals (inorder is defined only for binary trees)
- Cycle detection
- Topological sorting

**3. Shortest Path Algorithms**
- Dijkstra's algorithm (single-source, non-negative weights)
- Bellman-Ford (single-source, negative weights allowed)
- Floyd-Warshall (all-pairs shortest paths)
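
As a sketch of two of the algorithms listed above, the block below runs BFS on an unweighted adjacency list and Dijkstra on a weighted one. The function names and the `weighted` example graph are assumptions made for illustration.

```python
from collections import deque
import heapq

def bfs_shortest_path(graph, start, goal):
    """Shortest path by edge count in an unweighted graph (adjacency list)."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable

def dijkstra(weighted, source):
    """Single-source shortest distances; edge weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for nxt, w in weighted.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

graph = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}
print(bfs_shortest_path(graph, 'A', 'D'))  # ['A', 'B', 'D']

# Hypothetical weighted graph: node -> list of (neighbor, weight)
weighted = {'A': [('B', 4), ('C', 1)], 'B': [('D', 1)], 'C': [('D', 2)], 'D': []}
print(dijkstra(weighted, 'A'))  # {'A': 0, 'B': 4, 'C': 1, 'D': 3}
```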

### ML Applications of Graph Algorithms

**1. Neural Architecture Search (NAS)**
- Graph-based representation of neural networks
- Search space exploration using graph traversal
- Architecture optimization using shortest path algorithms

**2. Knowledge Graphs**
- Entity relationship modeling
- Graph neural networks (GNNs)
- Link prediction and recommendation systems

**3. Dependency Parsing**
- Parse tree construction
- Graph-based parsing algorithms
- Syntactic analysis in NLP
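
One concrete case of the graph view of a neural network: layers form a DAG, and a topological order gives a valid execution order. The sketch below uses Kahn's algorithm; the layer names and the `net` dictionary are hypothetical.

```python
from collections import deque

def topological_order(edges):
    """Kahn's algorithm. `edges` maps each node to the nodes that consume its output."""
    indegree = {n: 0 for n in edges}
    for outs in edges.values():
        for m in outs:
            indegree[m] = indegree.get(m, 0) + 1
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in edges.get(n, []):
            indegree[m] -= 1
            if indegree[m] == 0:  # all inputs of m have been scheduled
                ready.append(m)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle; no valid execution order")
    return order

# Hypothetical network with a skip connection from input to add
net = {
    'input': ['conv1', 'add'],
    'conv1': ['conv2'],
    'conv2': ['add'],
    'add': ['output'],
    'output': [],
}
print(topological_order(net))  # ['input', 'conv1', 'conv2', 'add', 'output']
```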


## 2.3 Beam Search and Sequence Generation

### Beam Search Algorithm
Beam search is a heuristic search algorithm that explores a graph by expanding the most promising nodes in a limited set (beam width).

**Algorithm Steps:**
1. Start with initial state(s)
2. At each step, generate all possible next states
3. Score all candidates
4. Keep only the top-k (beam width) candidates
5. Repeat until termination condition

### Beam Search Variants

**1. Standard Beam Search**
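```python
import heapq

def beam_search(initial_states, beam_width, max_length, generate_next_states, score):
    """Generic beam search; `generate_next_states` and `score` are caller-supplied callables."""
    beam = list(initial_states)
    for _ in range(max_length):
        # Expand every state currently in the beam
        candidates = [nxt for state in beam for nxt in generate_next_states(state)]
        if not candidates:
            break
        # Keep only the beam_width highest-scoring candidates
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return beam
```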

**2. Top-k Beam Search**
- Select top-k candidates at each step
- More diverse than greedy search
- Balances quality and diversity

**3. Diverse Beam Search**
- Group candidates by similarity
- Select best from each group
- Reduces redundancy in generated sequences
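
The variants above share the same expand, score, and prune loop. Here is a self-contained sketch of that loop with an end token, scoring by summed log-probability. The bigram table `LOGP`, the tokens `<s>` and `</s>`, and the function name `toy_beam_search` are all made up for illustration.

```python
import heapq
import math

# Hypothetical bigram log-probabilities: LOGP[previous_token][next_token]
LOGP = {
    '<s>': {'the': math.log(0.6), 'a': math.log(0.4)},
    'the': {'cat': math.log(0.5), 'dog': math.log(0.4), '</s>': math.log(0.1)},
    'a':   {'cat': math.log(0.3), 'dog': math.log(0.7)},
    'cat': {'</s>': math.log(1.0)},
    'dog': {'</s>': math.log(1.0)},
}

def toy_beam_search(beam_width, max_length, end='</s>'):
    beam = [(0.0, ['<s>'])]  # (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_length):
        candidates = []
        for score, seq in beam:
            for tok, lp in LOGP.get(seq[-1], {}).items():
                hyp = (score + lp, seq + [tok])
                if tok == end:
                    finished.append(hyp)    # stop expanding this hypothesis
                else:
                    candidates.append(hyp)
        # Prune: keep only the beam_width best unfinished hypotheses
        beam = heapq.nlargest(beam_width, candidates, key=lambda h: h[0])
        if not beam:
            break
    # Hypotheses still open at max_length compete with the finished ones
    return heapq.nlargest(beam_width, finished + beam, key=lambda h: h[0])

for log_prob, tokens in toy_beam_search(beam_width=2, max_length=3):
    print(tokens, round(math.exp(log_prob), 2))
# ['<s>', 'the', 'cat', '</s>'] 0.3
# ['<s>', 'a', 'dog', '</s>'] 0.28
```

With `beam_width=2`, the hypothesis starting with `a` survives the first step even though `the` scores higher, and it ends up as the second-best complete sequence. Greedy search would have dropped it immediately.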

### Applications in ML

**1. Neural Machine Translation**
- Generate target sequences word by word
- Maintain multiple translation hypotheses
- Select best overall translation

**2. Text Generation**
- Language model inference
- Story generation
- Code generation

**3. Speech Recognition**
- Acoustic model + language model
- Generate most likely word sequences
- Handle multiple pronunciation variants


## 2.4 Viterbi Algorithm

### Hidden Markov Models (HMMs)
HMMs are statistical models where the system being modeled is assumed to be a Markov process with unobserved (hidden) states.

**Components:**
- **States**: Hidden variables we want to infer
- **Observations**: Visible variables we can measure
- **Transition Probabilities**: P(state_t | state_{t-1})
- **Emission Probabilities**: P(observation_t | state_t)
- **Initial Probabilities**: P(state_0)

### Viterbi Algorithm
The Viterbi algorithm finds the most likely sequence of hidden states given a sequence of observations.

**Dynamic Programming Formulation:**
```
Base case:
v[0][s] = initial[s] * emission[s][observation[0]]

Recurrence (t >= 1):
v[t][s] = max over all previous states s' of:
    v[t-1][s'] * transition[s'][s] * emission[s][observation[t]]
```

**Backtracking:**
- Keep track of which previous state led to each current state
- Trace back from the final state to get the optimal path
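
The recurrence and backtracking steps can be sketched as follows. This is one possible dictionary-based version that works in log space to avoid underflow on long sequences; the function signature and the toy Noun/Verb parameters are assumptions for illustration, and returning probability 1.0 for an empty input is a convention chosen here.

```python
import math

def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """Most likely hidden-state path and its probability (log-space sketch)."""
    if not observations:
        return [], 1.0

    def log(p):
        return math.log(p) if p > 0 else float('-inf')

    # v[t][s]: best log-prob of any path that ends in state s at time t
    v = [{s: log(start_prob[s]) + log(emit_prob[s].get(observations[0], 0.0))
          for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        v.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: v[t - 1][p] + log(trans_prob[p][s]))
            v[t][s] = (v[t - 1][prev] + log(trans_prob[prev][s])
                       + log(emit_prob[s].get(observations[t], 0.0)))
            back[t][s] = prev

    # Backtrack from the best final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, math.exp(v[-1][last])

states = ['Noun', 'Verb']
start_prob = {'Noun': 0.6, 'Verb': 0.4}
trans_prob = {'Noun': {'Noun': 0.7, 'Verb': 0.3}, 'Verb': {'Noun': 0.4, 'Verb': 0.6}}
emit_prob = {'Noun': {'cat': 0.8, 'sat': 0.2}, 'Verb': {'cat': 0.1, 'sat': 0.9}}

path, prob = viterbi(['cat', 'sat'], states, start_prob, trans_prob, emit_prob)
print(path, round(prob, 4))  # ['Noun', 'Verb'] 0.1296
```

The result can be checked by hand: 0.6 × 0.8 for starting in `Noun` and emitting `cat`, then 0.3 × 0.9 for moving to `Verb` and emitting `sat`, which gives 0.1296.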

### Applications in ML

**1. Part-of-Speech Tagging**
- Hidden states: POS tags (noun, verb, adjective, etc.)
- Observations: words in a sentence
- Find most likely sequence of POS tags

**2. Named Entity Recognition**
- Hidden states: entity types (person, location, organization, etc.)
- Observations: words in text
- Identify and classify named entities

**3. Speech Recognition**
- Hidden states: phonemes or words
- Observations: acoustic features
- Convert speech to text


---

## Practice Questions

Now let's apply these concepts in five progressive exercises. Each one builds on the concepts above and increases in difficulty.


### Question 1: Simple Beam Search (Easy)

**Problem**: Implement a basic beam search algorithm for sequence generation. Given a vocabulary and a scoring function, generate the top-k sequences.

**Requirements**:
- Implement beam search with configurable beam width
- Support early stopping when end token is reached
- Return sequences with their scores
- Handle edge cases (empty vocabulary, zero beam width)

**Example**:
```python
vocabulary = ['hello', 'world', 'end']
beam_width = 2
max_length = 3
# Should return top-2 sequences of length up to 3
```

**Starter Code**:


```python
from typing import List, Tuple, Callable, Optional
import heapq

class BeamSearch:
    """
    Simple beam search implementation for sequence generation.
    """

    def __init__(self, vocabulary: List[str], end_token: str = '<END>'):
        """
        Initialize beam search.

        Args:
            vocabulary: List of possible tokens
            end_token: Token that signals sequence end
        """
        self.vocabulary = vocabulary
        self.end_token = end_token
        # TODO: Add any additional initialization

    def score_sequence(self, sequence: List[str]) -> float:
        """
        Score a sequence (higher is better).
        Simple scoring function - can be replaced with more sophisticated models.

        Args:
            sequence: List of tokens

        Returns:
            Score for the sequence
        """
        # TODO: Implement scoring function
        pass

    def generate_next_tokens(self, sequence: List[str]) -> List[str]:
        """
        Generate possible next tokens for a sequence.

        Args:
            sequence: Current sequence

        Returns:
            List of possible next tokens
        """
        # TODO: Implement next token generation
        pass

    def search(self, beam_width: int, max_length: int) -> List[Tuple[List[str], float]]:
        """
        Perform beam search.

        Args:
            beam_width: Number of sequences to keep at each step
            max_length: Maximum sequence length

        Returns:
            List of (sequence, score) tuples, sorted by score (descending)
        """
        # TODO: Implement beam search algorithm
        pass

# Test cases
def test_beam_search():
    """Test beam search implementation."""
    print("Running beam search tests...")

    # Test case 1: Basic functionality
    vocabulary = ['hello', 'world', 'end']
    beam_search = BeamSearch(vocabulary, end_token='end')

    results = beam_search.search(beam_width=2, max_length=3)

    assert len(results) <= 2, f"Expected at most 2 results, got {len(results)}"
    assert all(isinstance(seq, list) for seq, _ in results), "Results should be lists"
    assert all(isinstance(score, (int, float)) for _, score in results), "Scores should be numeric"
    print("✓ Test 1: Basic functionality passed")

    # Test case 2: Empty vocabulary
    empty_vocab = []
    empty_beam = BeamSearch(empty_vocab)
    empty_results = empty_beam.search(beam_width=2, max_length=3)
    assert len(empty_results) == 0, "Empty vocabulary should return no results"
    print("✓ Test 2: Empty vocabulary handled")

    # Test case 3: Zero beam width
    zero_beam = BeamSearch(vocabulary)
    zero_results = zero_beam.search(beam_width=0, max_length=3)
    assert len(zero_results) == 0, "Zero beam width should return no results"
    print("✓ Test 3: Zero beam width handled")

    print("🎉 All beam search tests passed!")

# Run tests
test_beam_search()
```


### Question 2: Top-k Beam Search with Scores (Medium)

**Problem**: Extend the beam search to implement top-k sampling with proper scoring and ranking. Include length normalization and diversity measures.

**Requirements**:
- Implement top-k beam search with configurable k
- Add length normalization to scores
- Include diversity penalty to avoid repetitive sequences
- Support different decoding strategies (greedy, sampling, nucleus)
- Handle sequences of different lengths fairly

**Example**:
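```python
# Generate top-3 sequences with diversity
vocab = ['hello', 'world', 'end', 'good', 'morning']
searcher = TopKBeamSearch(vocab, end_token='end')
sequences = searcher.search(k=3, max_length=4, diversity_penalty=0.5)
# Should return diverse, high-scoring (sequence, score) pairs
```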

**Starter Code**:


```python
import random
import math
from typing import List, Tuple, Dict, Set
from collections import Counter

class TopKBeamSearch:
    """
    Enhanced beam search with top-k sampling and diversity measures.
    """

    def __init__(self, vocabulary: List[str], end_token: str = '<END>'):
        """
        Initialize top-k beam search.

        Args:
            vocabulary: List of possible tokens
            end_token: Token that signals sequence end
        """
        self.vocabulary = vocabulary
        self.end_token = end_token
        # TODO: Add initialization for diversity tracking

    def score_sequence(self, sequence: List[str], length_penalty: float = 0.6) -> float:
        """
        Score a sequence with length normalization.

        Args:
            sequence: List of tokens
            length_penalty: Penalty factor for length (0 = no penalty, 1 = full penalty)

        Returns:
            Normalized score for the sequence
        """
        # TODO: Implement length-normalized scoring
        pass

    def calculate_diversity_penalty(self, sequence: List[str],
                                  existing_sequences: List[List[str]],
                                  penalty_weight: float = 0.5) -> float:
        """
        Calculate diversity penalty based on similarity to existing sequences.

        Args:
            sequence: Current sequence
            existing_sequences: Previously generated sequences
            penalty_weight: Weight of diversity penalty

        Returns:
            Diversity penalty score
        """
        # TODO: Implement diversity penalty calculation
        pass

    def top_k_sampling(self, logits: List[float], k: int, temperature: float = 1.0) -> List[int]:
        """
        Sample top-k tokens from logits.

        Args:
            logits: Raw scores for each token
            k: Number of top tokens to consider
            temperature: Sampling temperature (higher = more random)

        Returns:
            Indices of top-k tokens
        """
        # TODO: Implement top-k sampling
        pass

    def search(self, k: int, max_length: int, diversity_penalty: float = 0.5,
              temperature: float = 1.0) -> List[Tuple[List[str], float]]:
        """
        Perform top-k beam search with diversity.

        Args:
            k: Number of sequences to generate
            max_length: Maximum sequence length
            diversity_penalty: Weight for diversity penalty
            temperature: Sampling temperature

        Returns:
            List of (sequence, score) tuples, sorted by score
        """
        # TODO: Implement top-k beam search with diversity
        pass

# Test cases
def test_top_k_beam_search():
    """Test top-k beam search implementation."""
    print("Running top-k beam search tests...")

    # Test case 1: Basic top-k functionality
    vocabulary = ['hello', 'world', 'end', 'good', 'morning']
    top_k_search = TopKBeamSearch(vocabulary, end_token='end')

    results = top_k_search.search(k=3, max_length=4)

    assert len(results) <= 3, f"Expected at most 3 results, got {len(results)}"
    assert all(isinstance(seq, list) for seq, _ in results), "Results should be lists"
    print("✓ Test 1: Basic top-k functionality passed")

    # Test case 2: Diversity penalty
    diverse_results = top_k_search.search(k=3, max_length=4, diversity_penalty=0.8)
    # Check that sequences are different (simple check)
    sequences = [seq for seq, _ in diverse_results]
    unique_sequences = set(tuple(seq) for seq in sequences)
    assert len(unique_sequences) == len(sequences), "Sequences should be unique"
    print("✓ Test 2: Diversity penalty working")

    # Test case 3: Temperature sampling
    high_temp_results = top_k_search.search(k=2, max_length=3, temperature=2.0)
    low_temp_results = top_k_search.search(k=2, max_length=3, temperature=0.1)

    # Higher temperature should produce more diverse results
    assert len(high_temp_results) <= 2, "Should respect k parameter"
    assert len(low_temp_results) <= 2, "Should respect k parameter"
    print("✓ Test 3: Temperature sampling working")

    print("🎉 All top-k beam search tests passed!")

# Run tests
test_top_k_beam_search()
```


## Problem 3: Viterbi Algorithm for Sequence Tagging (Medium)

### Contextual Introduction
In Natural Language Processing (NLP), sequence tagging is essential for tasks like Part-of-Speech (POS) tagging. The Viterbi algorithm, crucial in Hidden Markov Models (HMMs), is widely used to find the most probable sequence of hidden states.

### Key Concepts
- **Hidden Markov Models**: Statistical models where the system is assumed to be a Markov process with hidden states.
- **Transition Probabilities**: Probability of moving from one state to another.
- **Emission Probabilities**: Probability of an observed output given a state.

### Problem
Implement the Viterbi algorithm for POS tagging a sentence using a given HMM with transition and emission probabilities.

### Requirements
- Implement the Viterbi algorithm
- Process given sentences based on provided HMM parameters
- Handle common NLP sequences and basic language structures
- Test with sentences of varying complexity

**Example**:
```python
states = ['Noun', 'Verb']
observations = ['cat', 'sat']
start_prob = {'Noun': 0.6, 'Verb': 0.4}
trans_prob = {'Noun': {'Noun': 0.7, 'Verb': 0.3}, 'Verb': {'Noun': 0.4, 'Verb': 0.6}}
emit_prob = {'Noun': {'cat': 0.8, 'sat': 0.2}, 'Verb': {'cat': 0.1, 'sat': 0.9}}

# Determine the most likely sequence of states
```

**Starter Code**:


```python
import numpy as np
from typing import List, Tuple, Dict, Optional
from collections import defaultdict

class HMMTagger:
    """
    Hidden Markov Model for Part-of-Speech Tagging using Viterbi algorithm.
    """

    def __init__(self, tags: List[str], words: List[str]):
        """
        Initialize HMM tagger.

        Args:
            tags: List of possible POS tags
            words: List of possible words
        """
        self.tags = tags
        self.words = words
        self.tag_to_idx = {tag: i for i, tag in enumerate(tags)}
        self.word_to_idx = {word: i for i, word in enumerate(words)}

        # Transition probabilities: P(tag_t | tag_{t-1})
        self.transitions = np.zeros((len(tags), len(tags)))

        # Emission probabilities: P(word_t | tag_t)
        self.emissions = np.zeros((len(tags), len(words)))

        # Initial probabilities: P(tag_0)
        self.initial = np.zeros(len(tags))

        # TODO: Initialize probabilities (can be loaded from training data)

    def train(self, tagged_sentences: List[List[Tuple[str, str]]]):
        """
        Train the HMM on tagged sentences.

        Args:
            tagged_sentences: List of sentences, each is list of (word, tag) tuples
        """
        # TODO: Implement training to estimate probabilities
        pass

    def viterbi(self, sentence: List[str]) -> Tuple[List[str], float]:
        """
        Find most likely tag sequence using Viterbi algorithm.

        Args:
            sentence: List of words

        Returns:
            Tuple of (best_tag_sequence, probability)
        """
        # TODO: Implement Viterbi algorithm
        pass

    def forward_pass(self, sentence: List[str]) -> np.ndarray:
        """
        Forward pass: calculate probabilities for each state at each time step.

        Args:
            sentence: List of words

        Returns:
            Array of shape (len(sentence), len(tags)) with probabilities
        """
        # TODO: Implement forward pass
        pass

    def backward_pass(self, forward_probs: np.ndarray, sentence: List[str]) -> List[str]:
        """
        Backward pass: reconstruct the best path.

        Args:
            forward_probs: Probabilities from forward pass
            sentence: List of words

        Returns:
            Best tag sequence
        """
        # TODO: Implement backward pass
        pass

    def get_emission_prob(self, word: str, tag: str) -> float:
        """
        Get emission probability with smoothing for unknown words.

        Args:
            word: Word
            tag: POS tag

        Returns:
            Emission probability
        """
        # TODO: Implement emission probability with smoothing
        pass

# Test cases
def test_viterbi_algorithm():
    """Test Viterbi algorithm implementation."""
    print("Running Viterbi algorithm tests...")

    # Test case 1: Basic functionality
    tags = ['DET', 'NOUN', 'VERB', 'ADJ']
    words = ['the', 'cat', 'sat', 'big', 'dog']
    tagger = HMMTagger(tags, words)

    # Simple test sentence
    sentence = ['the', 'cat']
    best_tags, prob = tagger.viterbi(sentence)

    assert len(best_tags) == len(sentence), "Tag sequence length should match sentence length"
    assert all(tag in tags for tag in best_tags), "All tags should be valid"
    assert 0 <= prob <= 1, f"Probability should be between 0 and 1, got {prob}"
    print("✓ Test 1: Basic functionality passed")

    # Test case 2: Unknown word handling
    unknown_sentence = ['the', 'unknown_word']
    unknown_tags, unknown_prob = tagger.viterbi(unknown_sentence)

    assert len(unknown_tags) == len(unknown_sentence), "Should handle unknown words"
    assert all(tag in tags for tag in unknown_tags), "Should return valid tags for unknown words"
    print("✓ Test 2: Unknown word handling passed")

    # Test case 3: Empty sentence
    empty_tags, empty_prob = tagger.viterbi([])
    assert len(empty_tags) == 0, "Empty sentence should return empty tag sequence"
    print("✓ Test 3: Empty sentence handled")

    print("🎉 All Viterbi algorithm tests passed!")

# Run tests
test_viterbi_algorithm()
```


## Problem 4: Constrained Beam Search (Medium-Hard)

### Contextual Introduction
Constrained beam search is valuable in chatbot development, where responses need to fit specific conditions. By respecting constraints, we ensure outputs align with predetermined rules.

### Key Concepts
Constraints enforce rules on the generated sequences. Logical handling of these constraints allows the system to generate coherent and context-aware responses.

### Problem
Implement beam search respecting constraints for applications like constrained text generation in chatbots.

### Requirements
- Design constraints (must-contain, must-not-contain, etc.)
- Integrate these constraints into beam search
- Produce sequences meeting specified conditions

**Example**:
```python
# Constraint classes are defined in the starter code below
constraints = [MustContainConstraint(['hello']), MustNotContainConstraint(['error'])]
beam_width = 2
# Generate sequences that fulfill all constraints
```

**Starter Code**:

```python
from typing import List, Tuple, Dict, Set, Callable, Optional
import heapq
from abc import ABC, abstractmethod

class Constraint(ABC):
    """Abstract base class for constraints."""

    @abstractmethod
    def check(self, sequence: List[str]) -> bool:
        """Check if sequence satisfies constraint."""
        pass

    @abstractmethod
    def can_be_satisfied(self, partial_sequence: List[str], vocabulary: List[str]) -> bool:
        """Check if constraint can still be satisfied given partial sequence."""
        pass

class MustContainConstraint(Constraint):
    """Constraint that requires sequence to contain specific tokens."""

    def __init__(self, required_tokens: List[str]):
        self.required_tokens = set(required_tokens)

    def check(self, sequence: List[str]) -> bool:
        # TODO: Implement must contain check
        pass

    def can_be_satisfied(self, partial_sequence: List[str], vocabulary: List[str]) -> bool:
        # TODO: Implement can be satisfied check
        pass

class MustNotContainConstraint(Constraint):
    """Constraint that forbids sequence from containing specific tokens."""

    def __init__(self, forbidden_tokens: List[str]):
        self.forbidden_tokens = set(forbidden_tokens)

    def check(self, sequence: List[str]) -> bool:
        # TODO: Implement must not contain check
        pass

    def can_be_satisfied(self, partial_sequence: List[str], vocabulary: List[str]) -> bool:
        # TODO: Implement can be satisfied check
        pass

class ConstrainedBeamSearch:
    """
    Beam search with constraint satisfaction.
    """

    def __init__(self, vocabulary: List[str], end_token: str = '<END>'):
        self.vocabulary = vocabulary
        self.end_token = end_token
        self.constraints = []

    def add_constraint(self, constraint: Constraint):
        """Add a constraint to the search."""
        # TODO: Implement constraint addition
        pass

    def check_all_constraints(self, sequence: List[str]) -> bool:
        """Check if sequence satisfies all constraints."""
        # TODO: Implement constraint checking
        pass

    def can_satisfy_constraints(self, partial_sequence: List[str]) -> bool:
        """Check if partial sequence can still satisfy all constraints."""
        # TODO: Implement partial constraint checking
        pass

    def search(self, beam_width: int, max_length: int) -> List[Tuple[List[str], float]]:
        """
        Perform constrained beam search.

        Args:
            beam_width: Number of sequences to keep at each step
            max_length: Maximum sequence length

        Returns:
            List of (sequence, score) tuples that satisfy all constraints
        """
        # TODO: Implement constrained beam search
        pass

# Test cases
def test_constrained_beam_search():
    """Test constrained beam search implementation."""
    print("Running constrained beam search tests...")

    # Test case 1: Basic constraint functionality
    vocabulary = ['hello', 'world', 'good', 'bad', 'end']
    constrained_search = ConstrainedBeamSearch(vocabulary, end_token='end')

    # Add must contain constraint
    must_contain = MustContainConstraint(['hello'])
    constrained_search.add_constraint(must_contain)

    results = constrained_search.search(beam_width=2, max_length=4)

    # All results should contain 'hello'
    for sequence, score in results:
        assert 'hello' in sequence, f"Sequence {sequence} should contain 'hello'"
    print("✓ Test 1: Must contain constraint working")

    # Test case 2: Must not contain constraint
    forbidden_search = ConstrainedBeamSearch(vocabulary, end_token='end')
    must_not_contain = MustNotContainConstraint(['bad'])
    forbidden_search.add_constraint(must_not_contain)

    forbidden_results = forbidden_search.search(beam_width=2, max_length=4)

    # No results should contain 'bad'
    for sequence, score in forbidden_results:
        assert 'bad' not in sequence, f"Sequence {sequence} should not contain 'bad'"
    print("✓ Test 2: Must not contain constraint working")

    # Test case 3: Conflicting constraints
    conflicting_search = ConstrainedBeamSearch(vocabulary, end_token='end')
    conflicting_search.add_constraint(MustContainConstraint(['hello']))
    conflicting_search.add_constraint(MustNotContainConstraint(['hello']))

    conflicting_results = conflicting_search.search(beam_width=2, max_length=4)

    # Should handle conflicting constraints gracefully
    assert isinstance(conflicting_results, list), "Should return list even with conflicting constraints"
    print("✓ Test 3: Conflicting constraints handled")

    print("🎉 All constrained beam search tests passed!")

# Run tests
test_constrained_beam_search()
```


## Problem 5: Diverse Beam Search with Groups (Hard)

### Contextual Introduction
Diverse beam search ensures varied output, crucial in creative applications like storytelling. By grouping similar sequences, we can select the best from each group, ensuring diversity.

### Key Concepts
- **Sequence Similarity**: Measurement to group similar sequences.
- **Grouping Strategies**: Use clustering or other methods to form groups.
- **Selection Process**: Choose the best sequence from each group.

### Problem
Implement a diverse beam search that groups sequences and selects the best from each.

### Requirements
- Measure sequence similarity for grouping
- Group sequences during the search
- Select the best sequence from each group

**Example**:
```python
# Group similar sequences and select best
vocabulary = ['once', 'upon', 'time', 'end']
beam_width = 5
num_groups = 3
# Generate diverse storylines
```

**Starter Code**:

```python
from typing import List, Tuple, Dict, Set
import heapq
from collections import defaultdict
import numpy as np

class DiverseBeamSearch:
    """
    Diverse beam search that groups similar sequences and selects best from each group.
    """

    def __init__(self, vocabulary: List[str], end_token: str = '<END>'):
        self.vocabulary = vocabulary
        self.end_token = end_token

    def calculate_similarity(self, seq1: List[str], seq2: List[str]) -> float:
        """
        Calculate similarity between two sequences.

        Args:
            seq1, seq2: Sequences to compare

        Returns:
            Similarity score between 0 and 1
        """
        # TODO: Implement sequence similarity calculation
        pass

    def group_sequences(self, sequences: List[Tuple[List[str], float]],
                       num_groups: int, similarity_threshold: float = 0.7) -> List[List[Tuple[List[str], float]]]:
        """
        Group sequences by similarity.

        Args:
            sequences: List of (sequence, score) tuples
            num_groups: Target number of groups
            similarity_threshold: Minimum similarity for grouping

        Returns:
            List of groups, each containing similar sequences
        """
        # TODO: Implement sequence grouping
        pass

    def select_best_from_groups(self, groups: List[List[Tuple[List[str], float]]]) -> List[Tuple[List[str], float]]:
        """
        Select the best sequence from each group.

        Args:
            groups: List of groups of sequences

        Returns:
            List of best sequences from each group
        """
        # TODO: Implement best selection from groups
        pass

    def search(self, beam_width: int, max_length: int, num_groups: int,
              similarity_threshold: float = 0.7) -> List[Tuple[List[str], float]]:
        """
        Perform diverse beam search.

        Args:
            beam_width: Number of sequences to keep at each step
            max_length: Maximum sequence length
            num_groups: Number of diversity groups
            similarity_threshold: Similarity threshold for grouping

        Returns:
            List of diverse (sequence, score) tuples
        """
        # TODO: Implement diverse beam search
        pass

# Test cases
def test_diverse_beam_search():
    """Test diverse beam search implementation."""
    print("Running diverse beam search tests...")

    # Test case 1: Basic diverse search
    vocabulary = ['hello', 'world', 'good', 'morning', 'end']
    diverse_search = DiverseBeamSearch(vocabulary, end_token='end')

    results = diverse_search.search(beam_width=6, max_length=4, num_groups=3)

    assert len(results) <= 6, f"Expected at most 6 results, got {len(results)}"
    assert all(isinstance(seq, list) for seq, _ in results), "Results should be lists"
    print("✓ Test 1: Basic diverse search passed")

    # Test case 2: Similarity calculation
    seq1 = ['hello', 'world']
    seq2 = ['hello', 'good']
    similarity = diverse_search.calculate_similarity(seq1, seq2)

    assert 0 <= similarity <= 1, f"Similarity should be between 0 and 1, got {similarity}"
    print("✓ Test 2: Similarity calculation working")

    # Test case 3: Grouping functionality
    test_sequences = [
        (['hello', 'world'], 0.9),
        (['hello', 'good'], 0.8),
        (['good', 'morning'], 0.7)
    ]
    groups = diverse_search.group_sequences(test_sequences, num_groups=2)

    assert len(groups) <= 2, f"Expected at most 2 groups, got {len(groups)}"
    print("✓ Test 3: Grouping functionality working")

    print("🎉 All diverse beam search tests passed!")

# Run tests
test_diverse_beam_search()
```


---

## 💡 Hints

<details>
<summary>Click to reveal hint for Question 1: Simple Beam Search</summary>

**Hint**: For beam search, maintain a beam (priority queue) of sequences sorted by score. At each step, generate all possible next tokens for each sequence in the beam, score the new sequences, and keep only the top-k. Use a heap to efficiently maintain the beam.

**Key insight**: Beam search is essentially a breadth-first search with a limited frontier size (beam width).
</details>

<details>
<summary>Click to reveal hint for Question 2: Top-k Beam Search with Scores</summary>

**Hint**: For top-k sampling, first apply temperature scaling to logits, then select the top-k tokens. For length normalization, divide the total score by sequence length raised to a power. For diversity penalty, calculate similarity between sequences and penalize similar ones.

**Key insight**: Length normalization prevents shorter sequences from being unfairly favored, while diversity penalty encourages exploration of different sequence patterns.
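
A tiny numeric sketch of why normalization matters, assuming scores are summed log-probabilities. The helper name `length_normalized` and the example numbers are made up; the default exponent mirrors the `length_penalty=0.6` default in the starter code.

```python
def length_normalized(log_prob_sum: float, length: int, length_penalty: float = 0.6) -> float:
    """Divide a summed log-probability by length ** length_penalty (0 disables it)."""
    return log_prob_sum / (length ** length_penalty)

short_raw, long_raw = -2.0, -3.0  # raw scores: the 2-token sequence looks better
short_norm = length_normalized(short_raw, length=2)
long_norm = length_normalized(long_raw, length=6)
print(short_raw > long_raw)    # True: raw sums favor the shorter sequence
print(long_norm > short_norm)  # True: after normalization the longer one wins
```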
</details>

<details>
<summary>Click to reveal hint for Question 3: Viterbi Algorithm for Sequence Tagging</summary>

**Hint**: The Viterbi algorithm has two phases: forward pass (calculate probabilities) and backward pass (reconstruct path). Use dynamic programming to store the best path to each state at each time step. For unknown words, use smoothing or assign uniform probability across all tags.

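**Key insight**: The forward pass computes, for each state, the probability of the single most likely path ending there (a max over predecessors, not the sum used by the HMM forward algorithm), while the backward pass follows stored backpointers to recover the actual path.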
</details>

<details>
<summary>Click to reveal hint for Question 4: Constrained Beam Search</summary>

**Hint**: Implement constraints as separate classes with `check()` and `can_be_satisfied()` methods. During beam search, filter out sequences that violate constraints and prune branches that can't satisfy constraints. Use early termination when no valid sequences remain.

**Key insight**: Constraints should be checked both on complete sequences and partial sequences to enable early pruning of invalid branches.
</details>

<details>
<summary>Click to reveal hint for Question 5: Diverse Beam Search with Groups</summary>

**Hint**: For similarity calculation, use Jaccard similarity or edit distance. For grouping, use clustering algorithms like K-means or hierarchical clustering on sequence features. Select the highest-scoring sequence from each group to maintain both quality and diversity.

**Key insight**: The key is balancing exploration (diversity) with exploitation (quality) by ensuring each group contributes its best sequence to the final result.
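
For reference, Jaccard similarity on token sets can be sketched in a few lines. The function name is hypothetical, and treating two empty sequences as identical is a convention chosen here. Note that this measure ignores token order and repetition.

```python
def jaccard_similarity(seq1, seq2):
    """Size of the token-set intersection divided by the size of the union."""
    a, b = set(seq1), set(seq2)
    if not a and not b:
        return 1.0  # convention: two empty sequences count as identical
    return len(a & b) / len(a | b)

# One shared token out of three distinct tokens gives 1/3
print(jaccard_similarity(['hello', 'world'], ['hello', 'good']))
```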
</details>
