# Python Testing and Debugging

## Learning Objectives
By the end of this notebook, you will understand:
- Unit testing with unittest and pytest
- Test-driven development (TDD)
- Debugging techniques and tools
- Code coverage and quality metrics
- Integration and functional testing
- Mock objects and fixtures

## Why Testing and Debugging Matter for ML/NLP
- Ensures data pipeline reliability
- Validates model performance
- Catches preprocessing errors
- Maintains code quality in large projects

## 1. Introduction to Testing

In [None]:
# Example function to test
def calculate_accuracy(true_positives, true_negatives, total_samples):
    """
    Calculate accuracy for a classification model.
    
    Args:
        true_positives (int): Number of correctly predicted positive cases
        true_negatives (int): Number of correctly predicted negative cases
        total_samples (int): Total number of samples
    
    Returns:
        float: Accuracy score between 0 and 1
    """
    if total_samples <= 0:
        raise ValueError("Total samples must be positive")
    
    return (true_positives + true_negatives) / total_samples

def preprocess_text(text):
    """
    Simple text preprocessing for NLP.
    """
    if not isinstance(text, str):
        raise TypeError("Input must be a string")
    
    # Lowercase, then collapse every run of whitespace into a single space and trim the ends
    return ' '.join(text.lower().split())

# Test the functions
print(f"Accuracy: {calculate_accuracy(85, 90, 200)}")
print(f"Preprocessed: '{preprocess_text('  Hello   WORLD  ')}'")

## 2. Unit Testing with unittest

In [None]:
import unittest
import sys
from io import StringIO

class TestMLFunctions(unittest.TestCase):
    """
    Unit tests for ML utility functions.
    """
    
    def test_calculate_accuracy_valid_input(self):
        """Test accuracy calculation with valid inputs."""
        result = calculate_accuracy(85, 90, 200)
        self.assertAlmostEqual(result, 0.875, places=3)
    
    def test_calculate_accuracy_perfect_score(self):
        """Test perfect accuracy score."""
        result = calculate_accuracy(50, 50, 100)
        self.assertEqual(result, 1.0)
    
    def test_calculate_accuracy_zero_correct(self):
        """Test accuracy with no correct predictions."""
        result = calculate_accuracy(0, 0, 100)
        self.assertEqual(result, 0.0)
    
    def test_calculate_accuracy_invalid_total(self):
        """Test accuracy calculation with invalid total samples."""
        with self.assertRaises(ValueError):
            calculate_accuracy(10, 10, 0)
        
        with self.assertRaises(ValueError):
            calculate_accuracy(10, 10, -5)
    
    def test_preprocess_text_normal_case(self):
        """Test text preprocessing with normal input."""
        result = preprocess_text("  Hello   WORLD  ")
        self.assertEqual(result, "hello world")
    
    def test_preprocess_text_empty_string(self):
        """Test text preprocessing with empty string."""
        result = preprocess_text("")
        self.assertEqual(result, "")
    
    def test_preprocess_text_invalid_type(self):
        """Test text preprocessing with invalid input type."""
        with self.assertRaises(TypeError):
            preprocess_text(123)
        
        with self.assertRaises(TypeError):
            preprocess_text(None)

# Run the tests
if __name__ == '__main__':
    # Capture test output
    test_output = StringIO()
    runner = unittest.TextTestRunner(stream=test_output, verbosity=2)
    suite = unittest.TestLoader().loadTestsFromTestCase(TestMLFunctions)
    result = runner.run(suite)
    
    print(test_output.getvalue())
    print(f"\nTests run: {result.testsRun}")
    print(f"Failures: {len(result.failures)}")
    print(f"Errors: {len(result.errors)}")
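
The learning objectives also list pytest, which the cell above does not use. As a rough sketch, the same checks can be written as plain functions with bare `assert` statements, which is the style pytest collects when it is run on a test file. The two functions under test are repeated so the block is self-contained; the case table `ACCURACY_CASES` and the test function names are illustrative assumptions, not part of the notebook.

```python
import math

def calculate_accuracy(true_positives, true_negatives, total_samples):
    if total_samples <= 0:
        raise ValueError("Total samples must be positive")
    return (true_positives + true_negatives) / total_samples

def preprocess_text(text):
    if not isinstance(text, str):
        raise TypeError("Input must be a string")
    return ' '.join(text.lower().split())

# Hypothetical case table: (tp, tn, total, expected accuracy)
ACCURACY_CASES = [
    (85, 90, 200, 0.875),
    (50, 50, 100, 1.0),
    (0, 0, 100, 0.0),
]

def test_calculate_accuracy_cases():
    for tp, tn, total, expected in ACCURACY_CASES:
        assert math.isclose(calculate_accuracy(tp, tn, total), expected)

def test_preprocess_text_collapses_whitespace():
    assert preprocess_text("  Hello   WORLD  ") == "hello world"
    assert preprocess_text("") == ""

# pytest would discover and run the test_* functions on its own;
# calling them directly also works, which is handy inside a notebook.
test_calculate_accuracy_cases()
test_preprocess_text_collapses_whitespace()
print("pytest-style checks passed")
```

With pytest installed, `pytest.mark.parametrize` and `pytest.raises` would be the usual replacements for the loop over cases and for `assertRaises`.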

## 3. Test-Driven Development (TDD)

In [None]:
# TDD Example: Implementing a confusion matrix calculator

# Step 1: Write the test first
class TestConfusionMatrix(unittest.TestCase):
    def test_binary_confusion_matrix(self):
        """Test binary classification confusion matrix."""
        y_true = [1, 1, 0, 0, 1, 0, 1, 0]
        y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
        
        expected = {
            'tp': 3,  # True Positives
            'tn': 3,  # True Negatives 
            'fp': 1,  # False Positives
            'fn': 1   # False Negatives
        }
        
        result = confusion_matrix_binary(y_true, y_pred)
        self.assertEqual(result, expected)
    
    def test_confusion_matrix_metrics(self):
        """Test precision, recall, F1 calculation from confusion matrix."""
        cm = {'tp': 3, 'tn': 3, 'fp': 1, 'fn': 1}
        
        metrics = calculate_metrics_from_cm(cm)
        
        self.assertAlmostEqual(metrics['precision'], 0.75, places=2)
        self.assertAlmostEqual(metrics['recall'], 0.75, places=2)
        self.assertAlmostEqual(metrics['f1'], 0.75, places=2)

# Step 2: Implement the function to make tests pass
def confusion_matrix_binary(y_true, y_pred):
    """
    Calculate confusion matrix for binary classification.
    
    Args:
        y_true (list): True labels
        y_pred (list): Predicted labels
    
    Returns:
        dict: Confusion matrix components
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have same length")
    
    tp = sum(1 for true, pred in zip(y_true, y_pred) if true == 1 and pred == 1)
    tn = sum(1 for true, pred in zip(y_true, y_pred) if true == 0 and pred == 0)
    fp = sum(1 for true, pred in zip(y_true, y_pred) if true == 0 and pred == 1)
    fn = sum(1 for true, pred in zip(y_true, y_pred) if true == 1 and pred == 0)
    
    return {'tp': tp, 'tn': tn, 'fp': fp, 'fn': fn}

def calculate_metrics_from_cm(cm):
    """
    Calculate precision, recall, and F1 score from confusion matrix.
    
    Args:
        cm (dict): Confusion matrix with tp, tn, fp, fn keys
    
    Returns:
        dict: Precision, recall, and F1 scores
    """
    tp, tn, fp, fn = cm['tp'], cm['tn'], cm['fp'], cm['fn']
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    
    return {
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

# Step 3: Run the tests
test_output = StringIO()
runner = unittest.TextTestRunner(stream=test_output, verbosity=2)
suite = unittest.TestLoader().loadTestsFromTestCase(TestConfusionMatrix)
result = runner.run(suite)

print(test_output.getvalue())
print(f"\nTDD Tests - Run: {result.testsRun}, Failures: {len(result.failures)}, Errors: {len(result.errors)}")
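
The cell above defines the tests first but only runs them once the implementation exists. A minimal sketch of the full red/green cycle follows, assuming a new, hypothetical metric `specificity_from_cm` that is not part of the notebook: the suite is run once before the function exists (the test errors with a `NameError`) and once after.

```python
import io
import unittest

class TestSpecificity(unittest.TestCase):
    def test_specificity(self):
        cm = {'tp': 3, 'tn': 3, 'fp': 1, 'fn': 1}
        # specificity = tn / (tn + fp) = 3 / 4
        self.assertAlmostEqual(specificity_from_cm(cm), 0.75, places=2)

def run_specificity_tests():
    suite = unittest.TestLoader().loadTestsFromTestCase(TestSpecificity)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)

# Red: the function does not exist yet, so the test errors out
red = run_specificity_tests()
print(f"Red phase   - errors: {len(red.errors)}, failures: {len(red.failures)}")

# Green: write just enough code to make the test pass
def specificity_from_cm(cm):
    """Hypothetical helper: true negative rate from a confusion-matrix dict."""
    negatives = cm['tn'] + cm['fp']
    return cm['tn'] / negatives if negatives > 0 else 0.0

green = run_specificity_tests()
print(f"Green phase - successful: {green.wasSuccessful()}")
```

Seeing the test fail first is what confirms that it actually exercises the new code.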

## 4. Advanced Testing with Fixtures and Setup

In [None]:
import tempfile
import os
import json

class TestDataProcessor(unittest.TestCase):
    """
    Test class demonstrating fixtures and setup/teardown.
    """
    
    def setUp(self):
        """Set up test fixtures before each test method."""
        self.temp_dir = tempfile.mkdtemp()
        
        # Create sample data files
        self.sample_data = {
            'texts': ['hello world', 'machine learning', 'natural language'],
            'labels': [1, 0, 1]
        }
        
        self.data_file = os.path.join(self.temp_dir, 'sample_data.json')
        with open(self.data_file, 'w') as f:
            json.dump(self.sample_data, f)
    
    def tearDown(self):
        """Clean up after each test method."""
        import shutil
        shutil.rmtree(self.temp_dir)
    
    def test_load_data(self):
        """Test data loading functionality."""
        data = load_json_data(self.data_file)
        self.assertEqual(data, self.sample_data)
    
    def test_data_validation(self):
        """Test data validation."""
        is_valid = validate_text_data(self.sample_data)
        self.assertTrue(is_valid)
        
        # Test invalid data
        invalid_data = {'texts': ['hello'], 'labels': [1, 0]}  # Mismatched lengths
        is_valid = validate_text_data(invalid_data)
        self.assertFalse(is_valid)

# Functions to test
def load_json_data(filepath):
    """Load data from JSON file."""
    with open(filepath, 'r') as f:
        return json.load(f)

def validate_text_data(data):
    """Validate text data format."""
    if 'texts' not in data or 'labels' not in data:
        return False
    
    if len(data['texts']) != len(data['labels']):
        return False
    
    return True

# Run the tests
test_output = StringIO()
runner = unittest.TextTestRunner(stream=test_output, verbosity=2)
suite = unittest.TestLoader().loadTestsFromTestCase(TestDataProcessor)
result = runner.run(suite)

print(test_output.getvalue())
print(f"\nFixture Tests - Run: {result.testsRun}, Failures: {len(result.failures)}, Errors: {len(result.errors)}")

## 5. Mock Objects and Dependency Injection

In [None]:
from unittest.mock import Mock, patch, MagicMock
import requests

# Example API client to test
class MLModelAPI:
    """Simple ML model API client."""
    
    def __init__(self, base_url):
        self.base_url = base_url
    
    def predict(self, text):
        """Get prediction from API."""
        response = requests.post(
            f"{self.base_url}/predict",
            json={'text': text}
        )
        response.raise_for_status()
        return response.json()
    
    def get_model_info(self):
        """Get model information."""
        response = requests.get(f"{self.base_url}/info")
        response.raise_for_status()
        return response.json()

class TestMLModelAPI(unittest.TestCase):
    """Test ML API client using mocks."""
    
    def setUp(self):
        self.api = MLModelAPI('http://test-api.com')
    
    @patch('requests.post')
    def test_predict_success(self, mock_post):
        """Test successful prediction."""
        # Mock the response
        mock_response = Mock()
        mock_response.json.return_value = {
            'prediction': 'positive',
            'confidence': 0.95
        }
        mock_response.raise_for_status.return_value = None
        mock_post.return_value = mock_response
        
        # Test the method
        result = self.api.predict('This is great!')
        
        # Assertions
        self.assertEqual(result['prediction'], 'positive')
        self.assertEqual(result['confidence'], 0.95)
        
        # Verify the mock was called correctly
        mock_post.assert_called_once_with(
            'http://test-api.com/predict',
            json={'text': 'This is great!'}
        )
    
    @patch('requests.post')
    def test_predict_http_error(self, mock_post):
        """Test API error handling."""
        # Mock HTTP error
        mock_response = Mock()
        mock_response.raise_for_status.side_effect = requests.HTTPError('500 Server Error')
        mock_post.return_value = mock_response
        
        # Test that exception is raised
        with self.assertRaises(requests.HTTPError):
            self.api.predict('test text')
    
    @patch('requests.get')
    def test_get_model_info(self, mock_get):
        """Test model info retrieval."""
        # Mock successful response
        mock_response = Mock()
        mock_response.json.return_value = {
            'model_name': 'bert-sentiment',
            'version': '1.0.0',
            'accuracy': 0.92
        }
        mock_response.raise_for_status.return_value = None
        mock_get.return_value = mock_response
        
        result = self.api.get_model_info()
        
        self.assertEqual(result['model_name'], 'bert-sentiment')
        self.assertEqual(result['accuracy'], 0.92)
        mock_get.assert_called_once_with('http://test-api.com/info')

# Run mock tests
test_output = StringIO()
runner = unittest.TextTestRunner(stream=test_output, verbosity=2)
suite = unittest.TestLoader().loadTestsFromTestCase(TestMLModelAPI)
result = runner.run(suite)

print(test_output.getvalue())
print(f"\nMock Tests - Run: {result.testsRun}, Failures: {len(result.failures)}, Errors: {len(result.errors)}")
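
The section title also mentions dependency injection, which the tests above do not use: they patch `requests` globally instead. As a sketch of the injected variant, the hypothetical class `InjectedModelAPI` below receives its HTTP client as a constructor argument (assumed to expose a `post(url, json=...)` method returning a response-like object), so a test can pass in a `Mock` without any patching.

```python
from unittest.mock import Mock

class InjectedModelAPI:
    """Hypothetical variant of the API client that receives its HTTP client."""

    def __init__(self, base_url, http_client):
        self.base_url = base_url
        self.http_client = http_client  # anything with .post(url, json=...)

    def predict(self, text):
        response = self.http_client.post(
            f"{self.base_url}/predict",
            json={'text': text}
        )
        response.raise_for_status()
        return response.json()

# In a test, inject a Mock instead of a real HTTP library
fake_client = Mock()
fake_client.post.return_value.json.return_value = {
    'prediction': 'positive',
    'confidence': 0.95
}

api = InjectedModelAPI('http://test-api.com', fake_client)
prediction = api.predict('This is great!')

fake_client.post.assert_called_once_with(
    'http://test-api.com/predict',
    json={'text': 'This is great!'}
)
print(prediction)
```

The trade-off is a slightly longer constructor in exchange for tests that do not depend on where `requests` is imported.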

## 6. Debugging Techniques

In [None]:
import logging
import traceback
from functools import wraps

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def debug_decorator(func):
    """Decorator to add debugging information to functions."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug(f"Calling {func.__name__} with args={args}, kwargs={kwargs}")
        try:
            result = func(*args, **kwargs)
            logger.debug(f"{func.__name__} returned: {result}")
            return result
        except Exception as e:
            # logger.exception logs at ERROR level and appends the traceback
            logger.exception(f"Error in {func.__name__}: {e}")
            raise
    return wrapper

@debug_decorator
def process_ml_data(data, model_type='linear'):
    """Process ML data with debugging."""
    logger.info(f"Processing {len(data)} samples with {model_type} model")
    
    if not data:
        raise ValueError("Empty data provided")
    
    if model_type not in ['linear', 'neural', 'tree']:
        raise ValueError(f"Unsupported model type: {model_type}")
    
    # Simulate processing
    processed = [x * 2 for x in data]
    
    logger.info(f"Successfully processed data. Output size: {len(processed)}")
    return processed

# Demonstration of debugging
print("=== Debugging Examples ===")

# Successful case
try:
    result = process_ml_data([1, 2, 3, 4, 5], 'neural')
    print(f"Success: {result}")
except Exception as e:
    print(f"Error: {e}")

print("\n" + "="*50 + "\n")

# Error case - empty data
try:
    result = process_ml_data([], 'linear')
except Exception as e:
    print(f"Caught expected error: {e}")

print("\n" + "="*50 + "\n")

# Error case - invalid model type
try:
    result = process_ml_data([1, 2, 3], 'quantum')
except Exception as e:
    print(f"Caught expected error: {e}")
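
Log output can itself be tested. The sketch below uses `assertLogs` from `unittest` to check that a function logs what it should; the logger name `ml_debug_demo` and the function `scale_features` are hypothetical stand-ins for `process_ml_data`, kept small so the block runs on its own.

```python
import io
import logging
import unittest

demo_logger = logging.getLogger('ml_debug_demo')  # hypothetical logger name

def scale_features(data):
    """Hypothetical stand-in for process_ml_data."""
    if not data:
        demo_logger.error("Empty data provided")
        raise ValueError("Empty data provided")
    demo_logger.info("Scaling %d samples", len(data))
    return [x * 2 for x in data]

class TestLoggingBehaviour(unittest.TestCase):
    def test_info_logged_on_success(self):
        with self.assertLogs('ml_debug_demo', level='INFO') as captured:
            self.assertEqual(scale_features([1, 2, 3]), [2, 4, 6])
        self.assertIn('Scaling 3 samples', captured.output[0])

    def test_error_logged_on_empty_input(self):
        with self.assertLogs('ml_debug_demo', level='ERROR') as captured:
            with self.assertRaises(ValueError):
                scale_features([])
        self.assertEqual(len(captured.records), 1)

suite = unittest.TestLoader().loadTestsFromTestCase(TestLoggingBehaviour)
log_result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(f"Logging tests passed: {log_result.wasSuccessful()}")
```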

## 7. Advanced Debugging with pdb

In [None]:
import pdb

def complex_ml_function(X, y, learning_rate=0.01):
    """
    Simplified gradient descent implementation for debugging demo.
    """
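    # Validate input up front: an empty dataset would otherwise cause a
    # ZeroDivisionError when the average error is computed below
    if not X:
        raise ValueError("X must contain at least one sample")
    
    # Initialize weights
    weights = [0.0] * len(X[0])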
    
    print(f"Starting with weights: {weights}")
    
    for epoch in range(3):  # Simplified to 3 epochs
        total_error = 0
        
        for i, (features, target) in enumerate(zip(X, y)):
            # Simple prediction: dot product
            prediction = sum(w * f for w, f in zip(weights, features))
            
            # Calculate error
            error = target - prediction
            total_error += error ** 2
            
            # Update weights (simplified gradient descent)
            for j in range(len(weights)):
                weights[j] += learning_rate * error * features[j]
        
        avg_error = total_error / len(X)
        print(f"Epoch {epoch + 1}: Average Error = {avg_error:.4f}")
        print(f"Current weights: {[round(w, 4) for w in weights]}")
        
        # Set breakpoint here for debugging (commented out for notebook)
        # pdb.set_trace()  # Uncomment this line to debug interactively
    
    return weights

# Example usage
sample_X = [[1, 2], [2, 3], [3, 4], [4, 5]]
sample_y = [3, 5, 7, 9]  # y = x1 + x2

print("=== Debugging Complex ML Function ===")
final_weights = complex_ml_function(sample_X, sample_y, learning_rate=0.1)
print(f"\nFinal weights: {[round(w, 4) for w in final_weights]}")

# PDB commands reference (for interactive debugging):
print("\n=== PDB Commands Reference ===")
print("n (next): Execute next line")
print("s (step): Step into function calls")
print("c (continue): Continue execution")
print("l (list): Show current code")
print("p <var>: Print variable value")
print("pp <var>: Pretty print variable")
print("w (where): Show stack trace")
print("q (quit): Quit debugger")

## 8. Property-Based Testing

In [None]:
# Property-based testing example (simplified without external libraries)
import random

def property_test_text_preprocessing():
    """
    Property-based test for text preprocessing function.
    Tests that preprocessing is idempotent (applying twice gives same result).
    """
    test_cases = 0
    failures = []
    
    # Generate random test cases
    for _ in range(100):
        test_cases += 1
        
        # Generate random string with various whitespace patterns
        words = ['hello', 'world', 'machine', 'learning', 'NLP', 'TEST']
        text_parts = random.choices(words, k=random.randint(1, 5))
        
        # Add random whitespace
        text = ''
        for part in text_parts:
            text += ' ' * random.randint(0, 3) + part + ' ' * random.randint(0, 3)
        
        try:
            # Property: preprocessing should be idempotent
            processed_once = preprocess_text(text)
            processed_twice = preprocess_text(processed_once)
            
            if processed_once != processed_twice:
                failures.append({
                    'input': repr(text),
                    'first_result': repr(processed_once),
                    'second_result': repr(processed_twice)
                })
        
        except Exception as e:
            failures.append({
                'input': repr(text),
                'error': str(e)
            })
    
    return test_cases, failures

def property_test_confusion_matrix():
    """
    Property-based test for confusion matrix calculation.
    Tests that tp + tn + fp + fn equals total samples.
    """
    test_cases = 0
    failures = []
    
    for _ in range(50):
        test_cases += 1
        
        # Generate random binary classification data
        n_samples = random.randint(10, 100)
        y_true = [random.randint(0, 1) for _ in range(n_samples)]
        y_pred = [random.randint(0, 1) for _ in range(n_samples)]
        
        try:
            cm = confusion_matrix_binary(y_true, y_pred)
            total_from_cm = cm['tp'] + cm['tn'] + cm['fp'] + cm['fn']
            
            # Property: sum of confusion matrix should equal total samples
            if total_from_cm != n_samples:
                failures.append({
                    'n_samples': n_samples,
                    'cm_total': total_from_cm,
                    'cm': cm
                })
        
        except Exception as e:
            failures.append({
                'n_samples': n_samples,
                'error': str(e)
            })
    
    return test_cases, failures

# Run property-based tests
print("=== Property-Based Testing ===")

# Test text preprocessing idempotence
cases, failures = property_test_text_preprocessing()
print(f"Text preprocessing test: {cases} cases, {len(failures)} failures")
if failures:
    print("Failures:", failures[:3])  # Show first 3 failures

# Test confusion matrix properties
cases, failures = property_test_confusion_matrix()
print(f"Confusion matrix test: {cases} cases, {len(failures)} failures")
if failures:
    print("Failures:", failures[:3])  # Show first 3 failures

print("\nProperty-based testing helps discover edge cases and ensure invariants!")
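
The random tests above draw from the global `random` state, so a failure cannot be replayed. A small sketch of a seeded variant follows, assuming a hypothetical helper `check_preprocess_properties`; it uses its own `random.Random(seed)` instance and checks several invariants of `preprocess_text` (repeated here so the block is self-contained).

```python
import random

def preprocess_text(text):
    if not isinstance(text, str):
        raise TypeError("Input must be a string")
    return ' '.join(text.lower().split())

def check_preprocess_properties(seed, n_cases=200):
    """Return the generated inputs that violate any invariant."""
    rng = random.Random(seed)  # private generator: same seed, same inputs
    alphabet = 'abXY \t\n'
    failures = []
    for _ in range(n_cases):
        length = rng.randint(0, 20)
        text = ''.join(rng.choice(alphabet) for _ in range(length))
        out = preprocess_text(text)
        holds = (
            out == out.lower()                # no uppercase survives
            and out == out.strip()            # no leading/trailing whitespace
            and '  ' not in out               # no doubled spaces
            and preprocess_text(out) == out   # idempotent
        )
        if not holds:
            failures.append(text)
    return failures

violations = check_preprocess_properties(seed=0)
print(f"Violations found: {len(violations)}")
```

If a violation ever shows up, rerunning with the same seed regenerates the same inputs, which makes the failure straightforward to debug.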

## 9. Integration Testing Example

In [None]:
# Integration testing for an ML pipeline

class SimpleMLPipeline:
    """Simple ML pipeline for integration testing."""
    
    def __init__(self):
        self.is_trained = False
        self.model_weights = None
        self.preprocessing_params = None
    
    def preprocess(self, texts):
        """Preprocess text data."""
        if not isinstance(texts, list):
            raise TypeError("Input must be a list")
        
        # Simple preprocessing: lowercase and tokenize
        processed = []
        vocab = set()
        
        for text in texts:
            cleaned = preprocess_text(text)
            tokens = cleaned.split()
            processed.append(tokens)
            vocab.update(tokens)
        
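        # Build the vocabulary mapping only on the first call, so that later
        # calls (e.g. on unseen data) reuse the training vocabulary and yield
        # vectors whose length matches the trained weights
        if self.preprocessing_params is None:
            self.preprocessing_params = {word: i for i, word in enumerate(sorted(vocab))}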
        
        return processed
    
    def vectorize(self, tokenized_texts):
        """Convert tokens to vectors."""
        if not self.preprocessing_params:
            raise ValueError("Must call preprocess first")
        
        vectors = []
        vocab_size = len(self.preprocessing_params)
        
        for tokens in tokenized_texts:
            vector = [0] * vocab_size
            for token in tokens:
                if token in self.preprocessing_params:
                    vector[self.preprocessing_params[token]] = 1
            vectors.append(vector)
        
        return vectors
    
    def train(self, X, y, epochs=3):
        """Train the model."""
        if not X or not y:
            raise ValueError("Training data cannot be empty")
        
        if len(X) != len(y):
            raise ValueError("X and y must have same length")
        
        # Simple training: initialize weights
        feature_count = len(X[0]) if X else 0
        self.model_weights = [0.1] * feature_count
        
        # Simulate training
        for epoch in range(epochs):
            for features, label in zip(X, y):
                prediction = sum(w * f for w, f in zip(self.model_weights, features))
                error = label - prediction
                
                # Update weights
                for i in range(len(self.model_weights)):
                    self.model_weights[i] += 0.01 * error * features[i]
        
        self.is_trained = True
    
    def predict(self, X):
        """Make predictions."""
        if not self.is_trained:
            raise ValueError("Model must be trained first")
        
        predictions = []
        for features in X:
            prediction = sum(w * f for w, f in zip(self.model_weights, features))
            predictions.append(1 if prediction > 0.5 else 0)
        
        return predictions

class TestMLPipelineIntegration(unittest.TestCase):
    """Integration tests for the ML pipeline."""
    
    def setUp(self):
        self.pipeline = SimpleMLPipeline()
        self.sample_texts = [
            'This is positive',
            'This is negative', 
            'Great product',
            'Bad experience'
        ]
        self.sample_labels = [1, 0, 1, 0]
    
    def test_full_pipeline_integration(self):
        """Test the entire ML pipeline from text to prediction."""
        # Step 1: Preprocess
        tokenized = self.pipeline.preprocess(self.sample_texts)
        self.assertEqual(len(tokenized), len(self.sample_texts))
        self.assertIsNotNone(self.pipeline.preprocessing_params)
        
        # Step 2: Vectorize
        vectors = self.pipeline.vectorize(tokenized)
        self.assertEqual(len(vectors), len(self.sample_texts))
        self.assertTrue(all(isinstance(v, list) for v in vectors))
        
        # Step 3: Train
        self.pipeline.train(vectors, self.sample_labels)
        self.assertTrue(self.pipeline.is_trained)
        self.assertIsNotNone(self.pipeline.model_weights)
        
        # Step 4: Predict
        predictions = self.pipeline.predict(vectors)
        self.assertEqual(len(predictions), len(self.sample_texts))
        self.assertTrue(all(p in [0, 1] for p in predictions))
        
        print("Pipeline test completed successfully!")
        print(f"Original texts: {self.sample_texts}")
        print(f"True labels: {self.sample_labels}")
        print(f"Predictions: {predictions}")
    
    def test_pipeline_with_new_data(self):
        """Test pipeline with new unseen data."""
        # Train pipeline
        tokenized = self.pipeline.preprocess(self.sample_texts)
        vectors = self.pipeline.vectorize(tokenized)
        self.pipeline.train(vectors, self.sample_labels)
        
        # Test with new data
        new_texts = ['Amazing quality', 'Terrible service']
        new_tokenized = self.pipeline.preprocess(new_texts)
        new_vectors = self.pipeline.vectorize(new_tokenized)
        predictions = self.pipeline.predict(new_vectors)
        
        self.assertEqual(len(predictions), 2)
        print(f"New data predictions: {dict(zip(new_texts, predictions))}")

# Run integration tests
test_output = StringIO()
runner = unittest.TextTestRunner(stream=test_output, verbosity=2)
suite = unittest.TestLoader().loadTestsFromTestCase(TestMLPipelineIntegration)
result = runner.run(suite)

print(test_output.getvalue())
print(f"\nIntegration Tests - Run: {result.testsRun}, Failures: {len(result.failures)}, Errors: {len(result.errors)}")

## 10. Performance Testing and Profiling

In [None]:
import time
import cProfile
import pstats
from functools import wraps  # needed by performance_test_decorator below
from io import StringIO

def performance_test_decorator(func):
    """Decorator to measure function performance."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # perf_counter is a monotonic, high-resolution clock intended for timing
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        
        execution_time = end_time - start_time
        print(f"{func.__name__} executed in {execution_time:.4f} seconds")
        return result
    return wrapper

@performance_test_decorator
def inefficient_text_processing(texts):
    """Inefficient text processing for performance testing."""
    result = []
    for text in texts:
        # Inefficient: multiple string operations
        processed = text.lower()
        processed = processed.strip()
        processed = ' '.join(processed.split())
        
        # Inefficient: nested loops
        word_count = 0
        for word in processed.split():
            for char in word:
                if char.isalpha():
                    word_count += 0.001  # Simulate work
        
        result.append(processed)
    return result

@performance_test_decorator
def efficient_text_processing(texts):
    """More efficient text processing."""
    result = []
    for text in texts:
        # More efficient: single pass
        processed = ' '.join(text.lower().split())
        result.append(processed)
    return result

# Performance comparison
print("=== Performance Testing ===")

# Generate test data
test_texts = [f"Sample text number {i} with some content" for i in range(1000)]

print("Testing inefficient version:")
result1 = inefficient_text_processing(test_texts)

print("\nTesting efficient version:")
result2 = efficient_text_processing(test_texts)

# Verify results are the same
print(f"\nResults match: {result1 == result2}")

# Profiling example
def profile_function(func, *args, **kwargs):
    """Profile a function and return statistics."""
    profiler = cProfile.Profile()
    profiler.enable()
    
    result = func(*args, **kwargs)
    
    profiler.disable()
    
    # Get statistics
    stats_stream = StringIO()
    stats = pstats.Stats(profiler, stream=stats_stream)
    stats.sort_stats('cumulative')
    stats.print_stats(10)  # Show top 10 functions
    
    return result, stats_stream.getvalue()

print("\n=== Profiling Results ===")
result, profile_output = profile_function(inefficient_text_processing, test_texts[:100])
print(profile_output[:500] + "..." if len(profile_output) > 500 else profile_output)
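
A single timed call is noisy. As a sketch of a steadier comparison, the standard-library `timeit.repeat` can run each variant several times and report the best run. The two functions below are hypothetical, trimmed-down versions of the ones above, without the timing decorator.

```python
import timeit

def normalize_stepwise(texts):
    result = []
    for text in texts:
        processed = text.lower()
        processed = processed.strip()
        processed = ' '.join(processed.split())
        result.append(processed)
    return result

def normalize_single_pass(texts):
    return [' '.join(text.lower().split()) for text in texts]

sample_texts = [f"Sample   TEXT number {i} " for i in range(1000)]

# Check correctness before comparing speed
outputs_match = normalize_stepwise(sample_texts) == normalize_single_pass(sample_texts)

# 5 repetitions of 20 calls each; the minimum is the least disturbed run
stepwise_times = timeit.repeat(lambda: normalize_stepwise(sample_texts), number=20, repeat=5)
single_times = timeit.repeat(lambda: normalize_single_pass(sample_texts), number=20, repeat=5)

print(f"Outputs match: {outputs_match}")
print(f"stepwise    best of 5: {min(stepwise_times):.4f} s per 20 calls")
print(f"single pass best of 5: {min(single_times):.4f} s per 20 calls")
```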

## 11. Test Coverage Analysis

In [None]:
# Simple test coverage tracking (manual implementation)
class CoverageTracker:
    """Simple test coverage tracker."""
    
    def __init__(self):
        self.executed_lines = set()
        self.total_lines = set()
    
    def mark_line(self, line_id):
        """Mark a line as executed."""
        self.executed_lines.add(line_id)
        self.total_lines.add(line_id)
    
    def add_total_line(self, line_id):
        """Add a line to total count without marking as executed."""
        self.total_lines.add(line_id)
    
    def get_coverage(self):
        """Get coverage percentage."""
        if not self.total_lines:
            return 0.0
        return (len(self.executed_lines) / len(self.total_lines)) * 100
    
    def get_uncovered_lines(self):
        """Get lines that weren't executed."""
        return self.total_lines - self.executed_lines

# Instrumented function for coverage tracking
coverage = CoverageTracker()

def instrumented_sentiment_analyzer(text):
    """Simple sentiment analyzer with coverage tracking."""
    coverage.add_total_line('func_start')
    coverage.mark_line('func_start')
    
    if not isinstance(text, str):
        coverage.add_total_line('type_error')
        coverage.mark_line('type_error')
        raise TypeError("Input must be string")
    
    coverage.add_total_line('empty_check')
    coverage.mark_line('empty_check')
    if not text.strip():
        coverage.add_total_line('empty_return')
        coverage.mark_line('empty_return')
        return 'neutral'
    
    coverage.add_total_line('preprocessing')
    coverage.mark_line('preprocessing')
    text_lower = text.lower()
    
    # Simple sentiment keywords
    positive_words = ['good', 'great', 'excellent', 'amazing', 'love']
    negative_words = ['bad', 'terrible', 'awful', 'hate', 'horrible']
    
    coverage.add_total_line('scoring')
    coverage.mark_line('scoring')
    positive_score = sum(1 for word in positive_words if word in text_lower)
    negative_score = sum(1 for word in negative_words if word in text_lower)
    
    coverage.add_total_line('decision')
    coverage.mark_line('decision')
    if positive_score > negative_score:
        coverage.add_total_line('positive_return')
        coverage.mark_line('positive_return')
        return 'positive'
    elif negative_score > positive_score:
        coverage.add_total_line('negative_return')
        coverage.mark_line('negative_return')
        return 'negative'
    else:
        coverage.add_total_line('neutral_return')
        coverage.mark_line('neutral_return')
        return 'neutral'

class TestSentimentWithCoverage(unittest.TestCase):
    """Tests with coverage tracking."""
    
    def test_positive_sentiment(self):
        result = instrumented_sentiment_analyzer("This is great!")
        self.assertEqual(result, 'positive')
    
    def test_negative_sentiment(self):
        result = instrumented_sentiment_analyzer("This is terrible!")
        self.assertEqual(result, 'negative')
    
    def test_neutral_sentiment(self):
        result = instrumented_sentiment_analyzer("This is okay")
        self.assertEqual(result, 'neutral')
    
    def test_empty_text(self):
        result = instrumented_sentiment_analyzer("  ")
        self.assertEqual(result, 'neutral')
    
    # Note: We're not testing the TypeError case

# Run tests with coverage
print("=== Test Coverage Analysis ===")

# Reset coverage
coverage = CoverageTracker()

# Add all possible lines to total count
for line_id in ['func_start', 'type_error', 'empty_check', 'empty_return', 
                'preprocessing', 'scoring', 'decision', 'positive_return', 
                'negative_return', 'neutral_return']:
    coverage.add_total_line(line_id)

# Run tests
test_output = StringIO()
runner = unittest.TextTestRunner(stream=test_output, verbosity=1)
suite = unittest.TestLoader().loadTestsFromTestCase(TestSentimentWithCoverage)
result = runner.run(suite)

print(f"Tests run: {result.testsRun}")
print(f"Test coverage: {coverage.get_coverage():.1f}%")
print(f"Executed lines: {len(coverage.executed_lines)}")
print(f"Total lines: {len(coverage.total_lines)}")
print(f"Uncovered lines: {coverage.get_uncovered_lines()}")

print("\n=== Coverage Report ===")
print("Lines not covered by tests:")
for line in coverage.get_uncovered_lines():
    print(f"  - {line}")

print("\nTo improve coverage, add tests for:")
print("  - TypeError handling (invalid input type)")
print("  - Additional edge cases")
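
Manual markers like the ones above have to be kept in sync with the code by hand. In practice a dedicated tool such as coverage.py does the bookkeeping. To illustrate the underlying idea only, here is a rough sketch that uses `sys.settrace` to record which lines of a function actually ran. The function `classify` and the helper `executed_lines` are hypothetical, and lines are reported as offsets from the `def` line.

```python
import sys

def classify(text):
    if not isinstance(text, str):
        raise TypeError("Input must be string")
    if not text.strip():
        return 'neutral'
    return 'positive' if 'great' in text.lower() else 'neutral'

def executed_lines(func, *args):
    """Run func(*args); return the line offsets (from the def line) that executed."""
    code = func.__code__
    hits = set()

    def tracer(frame, event, arg):
        # Only follow frames that belong to the function under test
        if frame.f_code is code:
            if event == 'line':
                hits.add(frame.f_lineno - code.co_firstlineno)
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace function
    return hits

covered = executed_lines(classify, "This is great!") | executed_lines(classify, "   ")
print(f"Executed line offsets: {sorted(covered)}")
print(f"TypeError branch (offset 2) covered: {2 in covered}")
```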

## 12. Testing Best Practices Summary

In [None]:
# Summary of testing best practices for ML/NLP projects

print("=== Testing Best Practices for ML/NLP ===")

best_practices = {
    "Test Structure": [
        "Use descriptive test names",
        "Follow AAA pattern: Arrange, Act, Assert",
        "One assertion per test when possible",
        "Use setUp and tearDown for fixtures"
    ],
    
    "ML-Specific Testing": [
        "Test data preprocessing pipelines",
        "Validate model input/output shapes",
        "Test edge cases in data handling",
        "Mock external dependencies (APIs, databases)",
        "Test model serialization/deserialization"
    ],
    
    "Test Types": [
        "Unit tests: Individual functions",
        "Integration tests: Component interaction",
        "End-to-end tests: Full pipeline",
        "Performance tests: Speed and memory",
        "Property-based tests: Invariants"
    ],
    
    "Data Testing": [
        "Validate data schemas and types",
        "Test data cleaning functions",
        "Check for data leakage in splits",
        "Verify feature engineering correctness",
        "Test handling of missing values"
    ],
    
    "Debugging Tips": [
        "Use logging instead of print statements",
        "Set up proper error handling",
        "Use debugger (pdb) for complex issues",
        "Profile performance bottlenecks",
        "Write reproducible test cases"
    ]
}

for category, practices in best_practices.items():
    print(f"\n{category}:")
    for practice in practices:
        print(f"  • {practice}")

print("\n=== Common ML Testing Patterns ===")

patterns = [
    "✓ Test with small synthetic datasets first",
    "✓ Use fixtures for consistent test data", 
    "✓ Mock external services and APIs",
    "✓ Test boundary conditions and edge cases",
    "✓ Verify numerical stability and precision",
    "✓ Test model behavior with different data distributions",
    "✓ Validate that training improves performance",
    "✓ Test model persistence and loading",
    "✓ Use property-based testing for mathematical functions",
    "✓ Test preprocessing consistency across train/test"
]

for pattern in patterns:
    print(pattern)

print("\n" + "="*60)
print("Remember: Good tests make refactoring safe and debugging easier!")
print("Write tests first (TDD) when possible, and aim for high coverage.")
print("="*60)
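
One of the data-testing items above, checking for leakage between splits, can be sketched as a few assertions. The helper `split_indices` is a hypothetical, deterministic splitter written for this example. It only guards against the same index landing on both sides, so duplicated rows under different indices would need a separate check.

```python
import random

def split_indices(n_samples, test_fraction=0.25, seed=0):
    """Hypothetical helper: shuffle indices deterministically, then split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(100)

# No sample may appear on both sides of the split
assert set(train_idx).isdisjoint(test_idx)
# Every sample is used exactly once
assert sorted(train_idx + test_idx) == list(range(100))
# The same seed reproduces the same split
assert split_indices(100) == (train_idx, test_idx)

print(f"train: {len(train_idx)} samples, test: {len(test_idx)} samples, no overlap")
```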

## Practice Exercises

### Exercise 1: Write Unit Tests
Create unit tests for a function that calculates TF-IDF scores for documents.

### Exercise 2: Mock External API
Write tests for a function that calls a translation API, using mocks to simulate API responses.

### Exercise 3: Integration Testing
Create integration tests for a complete text classification pipeline.

### Exercise 4: Debug Complex Function
Use debugging techniques to fix a buggy implementation of a recommendation algorithm.

### Exercise 5: Performance Testing
Compare the performance of different text similarity algorithms and write performance tests.

## Key Takeaways

1. **Testing Philosophy**: Write tests first, test behavior not implementation
2. **Test Types**: Unit, integration, end-to-end, and performance tests all have their place
3. **ML Testing**: Special considerations for data pipelines, model validation, and reproducibility
4. **Debugging**: Use systematic approaches, logging, and proper tools
5. **Coverage**: Aim for high test coverage but focus on critical paths
6. **Automation**: Integrate testing into your development workflow

These testing and debugging skills are essential for building reliable ML and NLP systems!