# ML Coding Practice Problems

## For Sindhuja's Coding Round (Oct 23)

### General Guidelines
- Write clean, readable code
- Think aloud - explain your approach
- Consider edge cases
- Discuss time/space complexity
- Ask clarifying questions
- Test your solution

**7 Problems covering:**
1. Attention Mechanism
2. Cosine Similarity
3. Data Collator
4. Gradient Descent
5. Confusion Matrix
6. K-Means Clustering
7. Softmax Stability

In [None]:
# Essential imports for ML coding
import numpy as np
import torch
import torch.nn as nn
from collections import Counter, defaultdict
import matplotlib.pyplot as plt

---
## Problem 1: Attention Mechanism Implementation (Medium)

**Time:** 30-40 minutes  
**Topics:** Transformers, Neural Networks

Implement scaled dot-product attention from scratch.

```
Attention(Q, K, V) = softmax(Q @ K^T / sqrt(d_k)) @ V
```

### Your Turn!

In [None]:
def scaled_dot_product_attention(Q, K, V, mask=None):
    """
    Your implementation
    
    Args:
        Q: Query matrix (batch_size, seq_len, d_k)
        K: Key matrix (batch_size, seq_len, d_k)
        V: Value matrix (batch_size, seq_len, d_v)
        mask: Optional mask (batch_size, seq_len, seq_len); 1 marks positions to hide
    
    Returns:
        output: (batch_size, seq_len, d_v)
        attention_weights: (batch_size, seq_len, seq_len)
    """
    # TODO: Implement
    # 1. Compute scores: Q @ K^T / sqrt(d_k)
    # 2. Apply mask if provided
    # 3. Apply softmax
    # 4. Multiply by V
    pass

# Test
batch_size, seq_len, d_k, d_v = 2, 4, 8, 8
Q = np.random.randn(batch_size, seq_len, d_k)
K = np.random.randn(batch_size, seq_len, d_k)
V = np.random.randn(batch_size, seq_len, d_v)

output, weights = scaled_dot_product_attention(Q, K, V)
print(f"Output shape: {output.shape}")  # (2, 4, 8)
print(f"Attention weights shape: {weights.shape}")  # (2, 4, 4)

In [None]:
# SOLUTION
def scaled_dot_product_attention_solution(Q, K, V, mask=None):
    """
    Scaled Dot-Product Attention
    """
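    # Get dimension of keys
    d_k = K.shape[-1]
    
    # Compute attention scores: Q @ K^T / sqrt(d_k)
    scores = np.matmul(Q, K.transpose(0, 2, 1)) / np.sqrt(d_k)
    
    # Apply mask if provided (here mask == 1 marks positions to hide)
    if mask is not None:
        scores = scores + (mask * -1e9)
    
    # Apply softmax; subtract the row max first so np.exp cannot overflow
    scores = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(scores)
    attention_weights = exp_scores / exp_scores.sum(axis=-1, keepdims=True)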
    
    # Apply attention to values
    output = np.matmul(attention_weights, V)
    
    return output, attention_weights

# Test
batch_size, seq_len, d_k, d_v = 2, 4, 8, 8
Q = np.random.randn(batch_size, seq_len, d_k)
K = np.random.randn(batch_size, seq_len, d_k)
V = np.random.randn(batch_size, seq_len, d_v)

output, weights = scaled_dot_product_attention_solution(Q, K, V)
print(f"Output shape: {output.shape}")
print(f"Attention weights sum: {weights.sum(axis=-1)}")

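**Key Discussion Points:**
- Why divide by sqrt(d_k)? If the entries of Q and K are roughly independent with zero mean and unit variance, each dot product has variance d_k. Dividing by sqrt(d_k) brings the variance back to about 1, which keeps softmax out of its saturated region, where gradients become very small
- Time complexity: O(n²d) per batch element, where n is the sequence length and d is the key/value dimension
- Space complexity: O(n²) per batch element for the attention weights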
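
The scaling argument above can be checked numerically. The snippet below is a minimal sketch, assuming standard-normal queries and keys; the helper `softmax`, the sizes, and the seed are illustrative choices and not part of the original problem.

```python
import numpy as np

def softmax(x):
    # Stable softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_k = 512

# 1000 independent query/key pairs with unit-variance entries (an assumption)
q = rng.standard_normal((1000, d_k))
k = rng.standard_normal((1000, d_k))

raw_scores = (q * k).sum(axis=-1)          # plain dot products
scaled_scores = raw_scores / np.sqrt(d_k)  # scaled as in attention

print(f"std of raw scores:    {raw_scores.std():.2f}")     # roughly sqrt(d_k)
print(f"std of scaled scores: {scaled_scores.std():.2f}")  # roughly 1

# One query against 8 keys: the unscaled softmax is far more peaked
query = rng.standard_normal(d_k)
keys = rng.standard_normal((8, d_k))
scores = keys @ query
w_raw = softmax(scores)
w_scaled = softmax(scores / np.sqrt(d_k))
print(f"max weight without scaling: {w_raw.max():.3f}")
print(f"max weight with scaling:    {w_scaled.max():.3f}")
```

Without scaling, one key tends to take nearly all the weight, which is the saturation the first discussion point refers to.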

---
## Problem 2: Cosine Similarity (Easy-Medium)

**Time:** 20-25 minutes  
**Topics:** NLP, Embeddings

Find the top-k sentences most similar to a query embedding using cosine similarity.

### Your Turn!

In [None]:
def find_similar_sentences(query_embedding, sentence_embeddings, k=5):
    """
    Your implementation
    
    Args:
        query_embedding: (d,)
        sentence_embeddings: (n, d)
        k: number of most similar sentences to return
    
    Returns:
        indices: top-k indices
        similarities: cosine similarity scores
    """
    # TODO: Implement
    # 1. Normalize embeddings
    # 2. Compute dot product
    # 3. Get top-k
    pass

# Test
np.random.seed(42)
query_embedding = np.random.randn(384)
sentence_embeddings = np.random.randn(100, 384)

indices, similarities = find_similar_sentences(query_embedding, sentence_embeddings, k=5)
print(f"Top 5 indices: {indices}")
print(f"Similarities: {similarities}")

In [None]:
# SOLUTION
def find_similar_sentences_solution(query_embedding, sentence_embeddings, k=5):
    """
    Find top-k similar using cosine similarity
    """
    # Normalize query
    query_norm = query_embedding / np.linalg.norm(query_embedding)
    
    # Normalize sentences
    sentence_norms = sentence_embeddings / np.linalg.norm(
        sentence_embeddings, axis=1, keepdims=True
    )
    
    # Compute cosine similarities
    similarities = np.dot(sentence_norms, query_norm)
    
    # Get top-k indices
    top_k_indices = np.argsort(similarities)[::-1][:k]
    top_k_similarities = similarities[top_k_indices]
    
    return top_k_indices, top_k_similarities

# Test
np.random.seed(42)
query_embedding = np.random.randn(384)
sentence_embeddings = np.random.randn(100, 384)

indices, similarities = find_similar_sentences_solution(query_embedding, sentence_embeddings, k=5)
print(f"Top 5 indices: {indices}")
print(f"Similarities: {similarities}")
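
A possible follow-up is how to avoid sorting all n similarities when only k are needed. The sketch below uses `np.argpartition` to select the k largest values and then sorts only those k. The function name `top_k_argpartition` and the test data are hypothetical and not part of the original solution.

```python
import numpy as np

def top_k_argpartition(similarities, k):
    """Return indices of the k largest values, ordered from most to least similar."""
    k = min(k, similarities.shape[0])
    if k <= 0:
        return np.array([], dtype=int)
    # argpartition moves the k largest values into the last k slots (unordered)
    candidates = np.argpartition(similarities, -k)[-k:]
    # Sort only the k candidates, descending by similarity
    order = np.argsort(similarities[candidates])[::-1]
    return candidates[order]

rng = np.random.default_rng(42)
sims = rng.uniform(-1, 1, size=10_000)

fast = top_k_argpartition(sims, 5)
slow = np.argsort(sims)[::-1][:5]  # full sort, as in the solution above
print("argpartition:", fast)
print("full argsort:", slow)
```

The full sort costs O(n log n). Partitioning is linear in n on average, plus O(k log k) for the final sort, which matters mostly when n is large and k is small.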

---
## Problem 3: Data Collator for Fine-tuning (Medium)

**Time:** 25-30 minutes  
**Topics:** Data Processing

Implement a data collator for batching variable-length sequences.

### Your Turn!

In [None]:
def collate_batch(batch, tokenizer, max_length=512):
    """
    Your implementation
    
    Args:
        batch: List of {'input_ids': [...], 'labels': [...]}
        tokenizer: Has pad_token_id
        max_length: Max sequence length
    
    Returns:
        Dict with padded tensors
    """
    # TODO: Implement
    # 1. Find max length in batch
    # 2. Create padded tensors
    # 3. Create attention mask
    pass

# Test
class MockTokenizer:
    pad_token_id = 0

tokenizer = MockTokenizer()
batch = [
    {'input_ids': [1,2,3,4,5], 'labels': [1,2,3,4,5]},
    {'input_ids': [1,2,3], 'labels': [1,2,3]},
]

result = collate_batch(batch, tokenizer)
print(result)

In [None]:
# SOLUTION
def collate_batch_solution(batch, tokenizer, max_length=512):
    """
    Collate batch with padding
    """
    input_ids = [item['input_ids'] for item in batch]
    labels = [item['labels'] for item in batch]
    
    # Find max length in batch
    batch_max_length = min(max(len(ids) for ids in input_ids), max_length)
    
    batch_size = len(batch)
    padded_input_ids = torch.full(
        (batch_size, batch_max_length),
        tokenizer.pad_token_id,
        dtype=torch.long
    )
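    # Pad labels with -100, the default ignore_index of nn.CrossEntropyLoss,
    # so padded positions do not contribute to the loss
    padded_labels = torch.full(
        (batch_size, batch_max_length),
        -100,
        dtype=torch.long
    )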
    attention_mask = torch.zeros((batch_size, batch_max_length), dtype=torch.long)
    
    for i, (ids, lbls) in enumerate(zip(input_ids, labels)):
        length = min(len(ids), batch_max_length)
        padded_input_ids[i, :length] = torch.tensor(ids[:length])
        padded_labels[i, :length] = torch.tensor(lbls[:length])
        attention_mask[i, :length] = 1
    
    return {
        'input_ids': padded_input_ids,
        'attention_mask': attention_mask,
        'labels': padded_labels
    }

# Test
tokenizer = MockTokenizer()
batch = [
    {'input_ids': [1,2,3,4,5], 'labels': [1,2,3,4,5]},
    {'input_ids': [1,2,3], 'labels': [1,2,3]},
]

result = collate_batch_solution(batch, tokenizer)
print("Input IDs:", result['input_ids'])
print("Attention mask:", result['attention_mask'])

---
## Problem 4: Gradient Descent (Easy-Medium)

**Time:** 20-25 minutes  
**Topics:** Optimization

Minimize f(x) = (x-3)² + 5 using gradient descent.

In [None]:
def gradient_descent(learning_rate=0.1, num_iterations=100, initial_x=0):
    """
    Your implementation
    
    f(x) = (x-3)^2 + 5
    f'(x) = 2(x-3)
    """
    # TODO: Implement
    pass

# Test
x_hist, f_hist = gradient_descent(learning_rate=0.1, num_iterations=50)
print(f"Final x: {x_hist[-1]}")
print(f"Final f(x): {f_hist[-1]}")

In [None]:
# SOLUTION
def gradient_descent_solution(learning_rate=0.1, num_iterations=100, initial_x=0):
    """
    Minimize f(x) = (x-3)^2 + 5
    """
    x = initial_x
    x_history = [x]
    f_history = [(x - 3)**2 + 5]
    
    for i in range(num_iterations):
        # Gradient at the current point
        gradient = 2 * (x - 3)
        
        # Early stopping: the gradient is already (numerically) zero
        if abs(gradient) < 1e-6:
            print(f"Converged at iteration {i}")
            break
        
        # Update
        x = x - learning_rate * gradient
        
        # Record the new point together with its function value,
        # so x_history[-1] and f_history[-1] refer to the same x
        x_history.append(x)
        f_history.append((x - 3)**2 + 5)
    
    return x_history, f_history

# Test
x_hist, f_hist = gradient_descent_solution(learning_rate=0.1, num_iterations=50)
print(f"Final x: {x_hist[-1]:.6f}")  # Should be ~3
print(f"Final f(x): {f_hist[-1]:.6f}")  # Should be ~5
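
A natural discussion point here is the choice of learning rate. For this particular function the update gives x_new - 3 = (1 - 2 * learning_rate) * (x - 3), so the error shrinks only when 0 < learning_rate < 1. The sketch below illustrates this with three rates. The helper `run_gd` is hypothetical and simply repeats the update rule from the solution without history tracking.

```python
def run_gd(learning_rate, num_iterations=50, initial_x=0.0):
    """Apply the gradient descent update for f(x) = (x - 3)^2 + 5 and return the final x."""
    x = initial_x
    for _ in range(num_iterations):
        gradient = 2 * (x - 3)
        x = x - learning_rate * gradient
    return x

for lr in (0.1, 0.5, 1.1):
    x_final = run_gd(lr)
    print(f"lr={lr}: |x - 3| = {abs(x_final - 3):.3e}")
```

With `lr=0.1` the error shrinks by a factor of 0.8 per step, with `lr=0.5` the minimum is reached in a single step, and with `lr=1.1` the error grows by a factor of 1.2 per step, so the iterates diverge.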

---
## Problem 5: Confusion Matrix and Metrics (Easy)

**Time:** 15-20 minutes  
**Topics:** Evaluation

In [None]:
def compute_metrics(y_true, y_pred):
    """
    Your implementation
    
    Returns:
        Dict with precision, recall, f1_score
    """
    # TODO: Implement
    # Calculate TP, FP, FN, TN
    # precision = TP / (TP + FP)
    # recall = TP / (TP + FN)
    # f1 = 2 * (precision * recall) / (precision + recall)
    pass

# Test
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
print(compute_metrics(y_true, y_pred))

In [None]:
# SOLUTION
def compute_metrics_solution(y_true, y_pred):
    """
    Compute metrics from scratch
    """
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    
    # Cast to int so the counts print as plain Python numbers
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0
    accuracy = (tp + tn) / len(y_true)
    
    return {
        'precision': precision,
        'recall': recall,
        'f1_score': f1_score,
        'accuracy': accuracy,
        'confusion_matrix': {'tp': tp, 'fp': fp, 'fn': fn, 'tn': tn}
    }

# Test
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
metrics = compute_metrics_solution(y_true, y_pred)
for key, value in metrics.items():
    print(f"{key}: {value}")

---
## Problem 6: K-Means Clustering (Medium)

**Time:** 30-35 minutes  
**Topics:** Unsupervised Learning

In [None]:
class KMeans:
    def __init__(self, n_clusters=3, max_iters=100, random_state=42):
        self.n_clusters = n_clusters
        self.max_iters = max_iters
        self.random_state = random_state
        self.centroids = None
    
    def fit(self, X):
        """Your implementation"""
        # TODO: Implement
        # 1. Initialize centroids
        # 2. Assign points to clusters
        # 3. Update centroids
        # 4. Check convergence
        pass
    
    def predict(self, X):
        """Predict cluster labels"""
        pass

# Test
from sklearn.datasets import make_blobs
X, y_true = make_blobs(n_samples=300, centers=4, random_state=42)

kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
labels = kmeans.predict(X)
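
The four steps listed in the `fit` stub can be sketched as follows. This is one possible reference implementation, not the only valid answer. It assumes random data points as initial centroids, squared Euclidean distance, and a centroid-shift tolerance `tol`. The class name `KMeansSketch`, the `tol` parameter, and the demo data are illustrative additions.

```python
import numpy as np

class KMeansSketch:
    """Lloyd's algorithm with random-point initialization (illustrative sketch)."""

    def __init__(self, n_clusters=3, max_iters=100, random_state=42, tol=1e-6):
        self.n_clusters = n_clusters
        self.max_iters = max_iters
        self.random_state = random_state
        self.tol = tol  # assumed convergence threshold, not in the original stub
        self.centroids = None

    def _assign(self, X):
        # Squared distance from every point to every centroid: shape (n, k)
        diffs = X[:, None, :] - self.centroids[None, :, :]
        return np.argmin((diffs ** 2).sum(axis=-1), axis=1)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)

        # 1. Initialize centroids as k distinct data points
        idx = rng.choice(len(X), size=self.n_clusters, replace=False)
        self.centroids = X[idx].copy()

        for _ in range(self.max_iters):
            # 2. Assign each point to its nearest centroid
            labels = self._assign(X)

            # 3. Move each centroid to the mean of its points
            #    (an empty cluster keeps its previous centroid)
            new_centroids = self.centroids.copy()
            for c in range(self.n_clusters):
                members = X[labels == c]
                if len(members) > 0:
                    new_centroids[c] = members.mean(axis=0)

            # 4. Stop once the centroids have essentially stopped moving
            shift = np.linalg.norm(new_centroids - self.centroids)
            self.centroids = new_centroids
            if shift < self.tol:
                break
        return self

    def predict(self, X):
        """Predict cluster labels"""
        return self._assign(np.asarray(X, dtype=float))

# Demo on two well-separated synthetic blobs
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(10, 10), scale=0.5, size=(50, 2)),
])
model = KMeansSketch(n_clusters=2).fit(X_demo)
labels_demo = model.predict(X_demo)
print("Centroids:\n", model.centroids)
print("Cluster sizes:", np.bincount(labels_demo))
```

Each iteration costs O(n * k * d) for n points, k clusters, and d features. Random initialization can land in a poor local optimum on harder data, so k-means++ initialization or several restarts are worth mentioning as extensions.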

---
## Problem 7: Softmax with Numerical Stability (Easy-Medium)

**Time:** 15-20 minutes  
**Topics:** Numerical Stability

In [None]:
def softmax_stable(x):
    """
    Your implementation
    
    Hint: Subtract max to prevent overflow
    """
    # TODO: Implement
    pass

# Test
x_large = np.array([[1000, 1001, 1002]])
print(softmax_stable(x_large))

In [None]:
# SOLUTION
def softmax_stable_solution(x):
    """
    Numerically stable softmax
    """
    x_shifted = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

# Test
x_large = np.array([[1000, 1001, 1002]])
result = softmax_stable_solution(x_large)
print("Stable softmax:", result)
print("Sum:", result.sum())
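
To see why the shift matters, it can help to compare against a naive version. Softmax is unchanged by subtracting a constant from every input, because the factor exp(-c) cancels between numerator and denominator. The sketch below assumes float64 inputs; the names `softmax_naive` and `softmax_shifted` are illustrative.

```python
import numpy as np

def softmax_naive(x):
    # Direct formula: overflows once the inputs are large
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def softmax_shifted(x):
    # Same formula after subtracting the row max, so the largest exponent is 0
    x_shifted = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

x_large = np.array([[1000.0, 1001.0, 1002.0]])
x_small = x_large - 1000.0  # same differences, small magnitudes

# exp(1000) overflows to inf in float64, and inf / inf gives nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(x_large)

stable = softmax_shifted(x_large)
print("Naive: ", naive)
print("Stable:", stable)
print("Matches softmax of shifted input:", np.allclose(stable, softmax_naive(x_small)))
```

The naive version returns `nan` for every entry, while the shifted version gives the same probabilities as softmax applied to `[0, 1, 2]`.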

---
## Interview Tips for Coding Round

### 1. Clarify Requirements
- Ask about input/output formats
- Confirm edge cases
- Understand performance requirements

### 2. Think Aloud
- Explain your approach before coding
- Discuss trade-offs
- Mention alternative solutions

### 3. Write Clean Code
- Use meaningful variable names
- Add comments for complex logic
- Follow Python conventions (PEP 8)

### 4. Test Your Solution
- Walk through a simple example
- Test edge cases
- Verify output shapes and types

### 5. Discuss Complexity
- Time complexity
- Space complexity
- How to optimize for large scale

### 6. Be Ready to Extend
- How would you handle batching?
- How to optimize for production?
- What if the dataset doesn't fit in memory?

---
## Topics to Review Before Interview

- [ ] NumPy operations (broadcasting, indexing, matrix operations)
- [ ] PyTorch basics (tensors, autograd, nn.Module)
- [ ] Attention mechanism implementation
- [ ] Data loading and batching
- [ ] Common ML metrics implementation
- [ ] Numerical stability tricks
- [ ] Vectorization techniques
- [ ] Basic optimization algorithms

## Good Luck!

You've practiced:
- Transformer components (attention)
- Similarity search (embeddings)
- Data processing (collation, padding)
- Optimization (gradient descent)
- Evaluation (metrics)
- Clustering (K-means)
- Numerical stability (softmax)

These cover many of the common ML coding interview topics!