# Example 25: Real-World Problem Solving

## Learning Objective
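Work through a realistic programming task from start to finish: turn the requirements into a data model, build and test each piece incrementally, and produce a useful report.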

## The Problem: Build a Log Analyzer

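We need to:
1. Parse web server access logs in Common Log Format
2. Count requests by status code and compute the overall error rate
3. Find the most requested endpoints
4. Flag IPs with a high error rate
5. Generate a report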

## Step 1: Define the Data Model

In [None]:
import re
from dataclasses import dataclass
from collections import Counter, defaultdict


@dataclass
class LogEntry:
    """A single parsed log entry."""
    ip_address: str
    timestamp: str
    method: str
    path: str
    status_code: int
    response_size: int
    
    @property
    def is_error(self) -> bool:
        return self.status_code >= 400
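`LogEntry` keeps `timestamp` as the raw string, which is enough for this example. If you later need time-based analysis (requests per hour, for instance), the following minimal sketch converts a Common Log Format timestamp into a timezone-aware `datetime`. The helper name `parse_clf_timestamp` is hypothetical. Note that `%b` matches month abbreviations for the current locale. Under Python's default C locale, that means English names like `Jan`.

```python
from datetime import datetime, timezone


def parse_clf_timestamp(raw: str) -> datetime:
    """Convert a CLF timestamp such as '15/Jan/2024:10:30:00 +0000' to an aware datetime."""
    # %b is the abbreviated month name; %z reads the numeric UTC offset (+0000, +0200, ...)
    return datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")


ts = parse_clf_timestamp("15/Jan/2024:10:30:00 +0000")
print(ts.isoformat())  # 2024-01-15T10:30:00+00:00

# Offsets are preserved, so entries from different time zones can be normalized to UTC
local = parse_clf_timestamp("15/Jan/2024:10:30:00 +0200")
print(local.astimezone(timezone.utc).isoformat())  # 2024-01-15T08:30:00+00:00
```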

## Step 2: Build the Parser

In [None]:
class LogParser:
    """Parser for Common Log Format."""
    
    # Pattern: host ident authuser [timestamp] "METHOD /path PROTOCOL" status size
    PATTERN = re.compile(
        r'(?P<ip>\S+)\s+'                        # client address (IPv4, IPv6, or hostname)
        r'\S+\s+\S+\s+'                          # ident and authuser, usually "-"
        r'\[(?P<timestamp>[^\]]+)\]\s+'
        r'"(?P<method>\w+)\s+(?P<path>\S+)\s+[^"]+"\s+'
        r'(?P<status>\d{3})\s+'
        r'(?P<size>\d+|-)'                       # "-" means no response body was sent
    )
    
    def parse_line(self, line: str) -> LogEntry | None:
        """Parse a single log line; return None if it doesn't match the pattern."""
        match = self.PATTERN.match(line)
        if not match:
            return None
        
        size = match.group('size')
        return LogEntry(
            ip_address=match.group('ip'),
            timestamp=match.group('timestamp'),
            method=match.group('method'),
            path=match.group('path'),
            status_code=int(match.group('status')),
            # Count a "-" size (e.g. on 304 responses) as 0 bytes instead of crashing in int()
            response_size=0 if size == '-' else int(size)
        )


# Test parser
parser = LogParser()
test_line = '192.168.1.1 - - [15/Jan/2024:10:30:00 +0000] "GET /api/users HTTP/1.1" 200 1234'
entry = parser.parse_line(test_line)
print(f"Parsed: {entry}")

## Step 3: Build the Analyzer

In [None]:
class LogAnalyzer:
    """Analyze parsed log entries."""
    
    def __init__(self, entries: list[LogEntry]):
        self.entries = entries
    
    def get_status_counts(self) -> dict[int, int]:
        """Count requests by status code."""
        return dict(Counter(e.status_code for e in self.entries))
    
    def get_top_endpoints(self, n: int = 10) -> list[tuple[str, int]]:
        """Get most requested endpoints."""
        return Counter(e.path for e in self.entries).most_common(n)
    
    def get_error_rate(self) -> float:
        """Calculate overall error rate."""
        if not self.entries:
            return 0.0
        errors = sum(1 for e in self.entries if e.is_error)
        return errors / len(self.entries)
    
    def find_suspicious_ips(self, threshold: float = 0.5, min_requests: int = 5) -> list[dict]:
        """Find IPs with high error rates."""
        ip_stats = defaultdict(lambda: {'total': 0, 'errors': 0})
        
        for entry in self.entries:
            ip_stats[entry.ip_address]['total'] += 1
            if entry.is_error:
                ip_stats[entry.ip_address]['errors'] += 1
        
        suspicious = []
        for ip, stats in ip_stats.items():
            if stats['total'] >= min_requests:
                error_rate = stats['errors'] / stats['total']
                if error_rate >= threshold:
                    suspicious.append({
                        'ip': ip,
                        'total': stats['total'],
                        'errors': stats['errors'],
                        'error_rate': round(error_rate, 2)
                    })
        
        return sorted(suspicious, key=lambda x: x['error_rate'], reverse=True)
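`find_suspicious_ips` tallies totals and errors in a `defaultdict` of small dictionaries. The same aggregation can also be written with two `Counter` objects, one for all requests and one for errors only. The following is a minimal standalone sketch. It works on hypothetical `(ip, status_code)` pairs rather than `LogEntry` objects so that it runs on its own, and the function name `suspicious_ips` is illustrative.

```python
from collections import Counter


def suspicious_ips(requests: list[tuple[str, int]],
                   threshold: float = 0.5,
                   min_requests: int = 5) -> list[tuple[str, int, int, float]]:
    """Return (ip, total, errors, error_rate) for IPs at or above the error threshold."""
    totals = Counter(ip for ip, _ in requests)
    errors = Counter(ip for ip, status in requests if status >= 400)

    flagged = []
    for ip, total in totals.items():
        if total < min_requests:
            continue  # too few requests to judge fairly
        rate = errors[ip] / total  # a Counter returns 0 for IPs with no errors
        if rate >= threshold:
            flagged.append((ip, total, errors[ip], rate))
    # Highest error rate first, matching find_suspicious_ips
    return sorted(flagged, key=lambda row: row[3], reverse=True)


sample = [("10.0.0.1", 200)] * 4 + [("10.0.0.2", 401)] * 5 + [("10.0.0.2", 200)]
print(suspicious_ips(sample))
```

Because missing keys in a `Counter` count as zero, IPs that never produced an error need no special case.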

## Step 4: Test with Sample Data

In [None]:
# Sample log lines, hard-coded for this example
sample_logs = [
    '192.168.1.1 - - [15/Jan/2024:10:30:00 +0000] "GET /api/users HTTP/1.1" 200 1234',
    '192.168.1.1 - - [15/Jan/2024:10:30:01 +0000] "GET /api/users HTTP/1.1" 200 1234',
    '192.168.1.2 - - [15/Jan/2024:10:30:02 +0000] "POST /api/login HTTP/1.1" 401 89',
    '192.168.1.2 - - [15/Jan/2024:10:30:03 +0000] "POST /api/login HTTP/1.1" 401 89',
    '192.168.1.2 - - [15/Jan/2024:10:30:04 +0000] "POST /api/login HTTP/1.1" 401 89',
    '192.168.1.2 - - [15/Jan/2024:10:30:05 +0000] "POST /api/login HTTP/1.1" 401 89',
    '192.168.1.2 - - [15/Jan/2024:10:30:06 +0000] "POST /api/login HTTP/1.1" 401 89',
    '192.168.1.3 - - [15/Jan/2024:10:30:07 +0000] "GET /api/products HTTP/1.1" 200 5678',
    '192.168.1.3 - - [15/Jan/2024:10:30:08 +0000] "GET /api/products HTTP/1.1" 500 456',
    '192.168.1.1 - - [15/Jan/2024:10:30:09 +0000] "DELETE /api/users/1 HTTP/1.1" 404 78',
]

# Parse logs
parser = LogParser()
entries = [parser.parse_line(line) for line in sample_logs]
entries = [e for e in entries if e is not None]

# Analyze
analyzer = LogAnalyzer(entries)

print("=" * 50)
print("LOG ANALYSIS REPORT")
print("=" * 50)

print(f"\nTotal Requests: {len(entries)}")
print(f"Error Rate: {analyzer.get_error_rate():.1%}")

print("\nStatus Codes:")
for status, count in sorted(analyzer.get_status_counts().items()):
    print(f"  {status}: {count}")

print("\nTop Endpoints:")
for path, count in analyzer.get_top_endpoints(5):
    print(f"  {count:>4}  {path}")

print("\nSuspicious IPs (high error rate):")
for item in analyzer.find_suspicious_ips():
    print(f"  {item['ip']}: {item['errors']}/{item['total']} errors ({item['error_rate']:.0%})")
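The code above parses a hard-coded list and silently discards lines that fail to parse. With a real log file you would typically stream it line by line and record which lines were rejected, because a sudden jump in rejects often means the log format has changed. The following is a minimal sketch of that approach. It uses a hypothetical `parse_log_file` helper that accepts any `parse_line`-style callable, meaning one that returns `None` on failure.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def parse_log_file(path: str, parse_line: Callable[[str], Optional[T]]) -> tuple[list[T], list[int]]:
    """Parse every line of a log file; return the parsed entries and rejected line numbers."""
    entries: list[T] = []
    rejected: list[int] = []
    # errors="replace" keeps one badly encoded byte from aborting the whole run
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # blank lines are neither entries nor rejects
            parsed = parse_line(line)
            if parsed is None:
                rejected.append(lineno)
            else:
                entries.append(parsed)
    return entries, rejected
```

In this notebook you could call it as `entries, rejected = parse_log_file("access.log", parser.parse_line)`, where `access.log` is a placeholder path, and report `len(rejected)` alongside the other statistics.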

## Real-World Problem-Solving Process

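1. **Understand Requirements** - State exactly what the tool must do before writing code
2. **Design Structure** - Plan the classes and modules (here: `LogEntry`, `LogParser`, `LogAnalyzer`)
3. **Implement Incrementally** - Build one piece at a time
4. **Test As You Go** - Verify each piece before building on it, as Step 2 does with a single log line
5. **Refactor** - Improve the code's structure once it works
6. **Document** - Leave docstrings and notes for future maintainers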
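Here is one way the "Refactor" and "Test As You Go" steps could apply to this example. Step 4 prints the report directly, which makes it hard to test or save to a file. The following sketch uses a hypothetical `format_report` function that takes plain data rather than a `LogAnalyzer`. It builds a shortened version of the report as a string, so a quick assertion can check it.

```python
def format_report(total: int, error_rate: float, status_counts: dict[int, int]) -> str:
    """Build the report text instead of printing it, so it can be tested or saved."""
    lines = [
        "=" * 50,
        "LOG ANALYSIS REPORT",
        "=" * 50,
        f"Total Requests: {total}",
        f"Error Rate: {error_rate:.1%}",
        "Status Codes:",
    ]
    # One indented line per status code, in ascending order as in Step 4
    lines += [f"  {status}: {count}" for status, count in sorted(status_counts.items())]
    return "\n".join(lines)


# A quick test-as-you-go check with small, hand-computed inputs
report = format_report(4, 0.25, {404: 1, 200: 3})
assert "Error Rate: 25.0%" in report
assert report.splitlines()[-2:] == ["  200: 3", "  404: 1"]
print(report)
```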

## Practice: Build Your Own Tool

Try building one of these:

1. **CSV Data Transformer** - Read CSV, transform, output JSON
2. **Duplicate File Finder** - Find files with identical contents by comparing their hashes
3. **Config Validator** - Validate YAML/JSON against rules
4. **Markdown to HTML** - Simple static site generator
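Here is one possible starting point for the Duplicate File Finder. This minimal sketch groups the files under a directory by the SHA-256 digest of their contents. The names `file_digest` and `find_duplicates` are hypothetical. A fuller tool would first group files by size, so it never hashes files that cannot possibly match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[Path]]:
    """Return groups of files under root whose contents hash identically."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_hash[file_digest(path)].append(path)
    # Only hashes shared by more than one file indicate duplicates
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```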

In [None]:
# Start your project here!
# Prompt: "Help me build a [tool] that [requirements]"
