# Logging with Real-World Examples

This notebook demonstrates practical logging implementations for common real-world scenarios. We'll explore best practices and patterns used in production applications.

## What We'll Cover

1. Rotating File Handlers (for long-running applications)
2. Logging in Web Applications
3. Logging in Data Processing Pipelines
4. Logging Best Practices

---

## 1. Rotating File Handlers

For applications that run continuously, log files can grow very large. Rotating file handlers automatically create new log files based on size or time.

### RotatingFileHandler - Size-Based Rotation

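The handler rolls the file over when the next write would push it past `maxBytes`: `app.log` is renamed to `app.log.1`, existing backups shift up by one, and the oldest backup beyond `backupCount` is deleted.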

In [None]:
import logging
from logging.handlers import RotatingFileHandler

# Create logger
logger = logging.getLogger('app')
logger.setLevel(logging.DEBUG)

# Create rotating file handler
# maxBytes: maximum file size before rotation (1 MB = 1024*1024 bytes)
# backupCount: number of backup files to keep
handler = RotatingFileHandler(
    'app.log',
    maxBytes=1024*1024,  # 1 MB
    backupCount=5  # Keep 5 backup files (app.log.1, app.log.2, etc.)
)

# Set format
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

# Generate logs
for i in range(100):
    logger.info(f"Log message number {i}")
    logger.debug(f"Debug info for iteration {i}")

print("Check app.log file. When it reaches 1MB, it will rotate to app.log.1")
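The 100 messages above are far too small to fill a 1 MB file, so no rotation is visible yet. As a minimal sketch, the cell below shrinks the limit to 200 bytes and writes to a temporary directory so the backup files appear immediately. The logger name `rotation_demo`, the file name, and the tiny `maxBytes` value are assumptions chosen for illustration only.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# Hypothetical demo logger and temporary directory, so app.log is left untouched
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "rotation_demo.log")

demo_logger = logging.getLogger("rotation_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep demo output away from the root logger's handlers

# Tiny limit (200 bytes) so rotation happens after a handful of messages
demo_handler = RotatingFileHandler(log_path, maxBytes=200, backupCount=2)
demo_handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
demo_logger.addHandler(demo_handler)

for i in range(50):
    demo_logger.info("Log message number %d", i)

# Release the file handle before inspecting the directory
demo_logger.removeHandler(demo_handler)
demo_handler.close()

rotated_files = sorted(os.listdir(log_dir))
print(rotated_files)
```

Only the active file and two backups remain, because `backupCount=2` causes older backups to be deleted on each rollover.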

### TimedRotatingFileHandler - Time-Based Rotation

Rotates log files at certain time intervals (hourly, daily, weekly, etc.).

In [None]:
import logging
from logging.handlers import TimedRotatingFileHandler

# Create logger
logger = logging.getLogger('timed_app')
logger.setLevel(logging.INFO)

# Create timed rotating handler
# when: 'S' = seconds, 'M' = minutes, 'H' = hours, 'D' = days,
#       'W0'-'W6' = weekday (0 = Monday), 'midnight' = roll over at midnight
# interval: how many units to wait before rotating
# backupCount: number of old log files to keep
handler = TimedRotatingFileHandler(
    'timed_app.log',
    when='midnight',  # Rotate daily at midnight
    interval=1,
    backupCount=7  # Keep 7 days of logs
)

formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

# Log some messages
logger.info("Application started")
logger.warning("This is a warning")
logger.error("This is an error")

print("Log file will rotate daily at midnight. Old logs: timed_app.log.YYYY-MM-DD")
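Waiting until midnight to see a rotation is impractical in a notebook. The sketch below forces one by calling the handler's `doRollover()` method directly, which is the same method the handler invokes itself when the first record after the rollover time is logged. The logger name `timed_demo` and the temporary directory are assumptions for illustration.

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler

# Hypothetical demo logger writing into a temporary directory
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "timed_demo.log")

timed_logger = logging.getLogger("timed_demo")
timed_logger.setLevel(logging.INFO)
timed_logger.propagate = False

timed_handler = TimedRotatingFileHandler(
    log_path, when="midnight", interval=1, backupCount=7
)
timed_handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
timed_logger.addHandler(timed_handler)

timed_logger.info("Written before the rollover")
timed_handler.doRollover()  # forced here; normally checked on each emitted record
timed_logger.info("Written after the rollover")

# Release the file handle before inspecting the directory
timed_logger.removeHandler(timed_handler)
timed_handler.close()

timed_files = sorted(os.listdir(log_dir))
print(timed_files)
```

The directory should now hold the active `timed_demo.log` plus one backup whose name ends in a `YYYY-MM-DD` date suffix.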

---

## 2. Logging in Web Applications

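This example simulates a Flask-like web application. Each request gets a short unique ID, and a `LoggerAdapter` attaches that ID to every log line so that all messages from one request can be correlated.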

In [None]:
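import logging
import uuid

# Configure logging for web app
# Note: this format requires every record that reaches the root handler to carry
# `request_id`, so log through the LoggerAdapter below rather than the bare logger.
# force=True (Python 3.8+) replaces any root handlers installed by an earlier cell;
# without it, basicConfig() is silently ignored once the root logger has handlers.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - [%(request_id)s] - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    force=True
)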

class WebApp:
    def __init__(self):
        self.logger = logging.getLogger('webapp')
    
    def handle_request(self, method, endpoint, user_id=None):
        # Generate unique request ID for tracking
        request_id = str(uuid.uuid4())[:8]
        
        # Create custom logger adapter to include request ID
        logger_adapter = logging.LoggerAdapter(
            self.logger, 
            {'request_id': request_id}
        )
        
        logger_adapter.info(f"{method} {endpoint} - User: {user_id}")
        
        # Simulate processing
        try:
            # Simulate some work
            if endpoint == "/api/users":
                logger_adapter.debug("Fetching users from database")
                logger_adapter.info("Users retrieved successfully")
            elif endpoint == "/api/error":
                raise ValueError("Simulated error")
        except Exception as e:
            # exception() logs at ERROR level and appends the traceback
            logger_adapter.exception(f"Error processing request: {e}")
        
        logger_adapter.info(f"Request completed - {endpoint}")

# Test web app logging
app = WebApp()
app.handle_request("GET", "/api/users", user_id=123)
app.handle_request("POST", "/api/orders", user_id=456)
app.handle_request("GET", "/api/error", user_id=789)
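One limitation of the format above is that any record without a `request_id` attribute (for example, one emitted by a third-party library) cannot be formatted. A common alternative, sketched here under assumptions, stores the ID in a `contextvars.ContextVar` and uses a `logging.Filter` on the handler to stamp it onto every record. The names `current_request_id`, `RequestIdFilter`, and `webapp_ctx` are hypothetical, and a `StringIO` stream stands in for the console so the output can be inspected.

```python
import contextvars
import io
import logging

# Hypothetical context variable; "-" is the placeholder outside any request
current_request_id = contextvars.ContextVar("current_request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record passing through the handler."""
    def filter(self, record):
        record.request_id = current_request_id.get()
        return True  # never drop the record, only annotate it

stream = io.StringIO()  # stands in for the console
ctx_handler = logging.StreamHandler(stream)
ctx_handler.addFilter(RequestIdFilter())
ctx_handler.setFormatter(
    logging.Formatter("[%(request_id)s] %(levelname)s - %(message)s")
)

ctx_logger = logging.getLogger("webapp_ctx")
ctx_logger.setLevel(logging.INFO)
ctx_logger.propagate = False
ctx_logger.addHandler(ctx_handler)

def handle_request(request_id, endpoint):
    # Set the ID for the duration of the request, then restore the previous value
    token = current_request_id.set(request_id)
    try:
        ctx_logger.info("Handling %s", endpoint)
    finally:
        current_request_id.reset(token)

ctx_logger.info("Server starting")
handle_request("a1b2c3d4", "/api/users")
ctx_logger.info("Server idle")

ctx_output = stream.getvalue()
print(ctx_output)
```

Because the filter runs on the handler, code deep inside the request can call a plain named logger and still get the ID, without passing an adapter around.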

---

## 3. Logging in Data Processing Pipelines

Example of logging in a data processing/ETL pipeline with progress tracking.

In [None]:
import logging
import time

# Configure logging
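# force=True (Python 3.8+) replaces the root handlers set up by the earlier
# basicConfig() call; without it this call is ignored and the request_id
# format from the web app example would still be active
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('pipeline.log'),
        logging.StreamHandler()
    ],
    force=True
)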

class DataPipeline:
    def __init__(self, name):
        self.logger = logging.getLogger(f'pipeline.{name}')
        self.name = name
    
    def extract(self, source):
        """Extract data from source"""
        self.logger.info(f"Starting extraction from {source}")
        try:
            # Simulate extraction
            time.sleep(0.1)
            self.logger.info(f"Successfully extracted data from {source}")
            return ["data1", "data2", "data3"]
        except Exception as e:
            self.logger.error(f"Extraction failed: {e}")
            raise
    
    def transform(self, data):
        """Transform data"""
        self.logger.info(f"Starting transformation of {len(data)} records")
        try:
            # Simulate transformation
            transformed = [d.upper() for d in data]
            self.logger.debug(f"Transformation details: {data} -> {transformed}")
            self.logger.info("Transformation completed successfully")
            return transformed
        except Exception as e:
            self.logger.error(f"Transformation failed: {e}")
            raise
    
    def load(self, data, destination):
        """Load data to destination"""
        self.logger.info(f"Loading {len(data)} records to {destination}")
        try:
            # Simulate loading
            time.sleep(0.1)
            self.logger.info(f"Successfully loaded data to {destination}")
        except Exception as e:
            self.logger.error(f"Loading failed: {e}")
            raise
    
    def run(self, source, destination):
        """Run the complete ETL pipeline"""
        self.logger.info(f"===== Pipeline {self.name} Started =====")
        start_time = time.time()
        
        try:
            data = self.extract(source)
            transformed_data = self.transform(data)
            self.load(transformed_data, destination)
            
            elapsed = time.time() - start_time
            self.logger.info(f"===== Pipeline {self.name} Completed in {elapsed:.2f}s =====")
        except Exception as e:
            self.logger.critical(f"Pipeline failed: {e}")
            raise

# Run the pipeline
pipeline = DataPipeline("user_data_etl")
pipeline.run("database_source", "data_warehouse")
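The pipeline above times only the whole run and repeats the same try/except logging in every stage. As one possible refinement, the sketch below wraps a stage in a context manager that logs its start, its duration, and any failure with a traceback. The helper `log_stage`, the `ListHandler` class, and the logger name `pipeline_demo` are hypothetical names introduced for this example.

```python
import contextlib
import logging
import time

@contextlib.contextmanager
def log_stage(logger, stage):
    """Log the start, duration, and any failure of one pipeline stage."""
    logger.info("%s started", stage)
    start = time.perf_counter()
    try:
        yield
    except Exception:
        elapsed = time.perf_counter() - start
        # exception() logs at ERROR level and includes the traceback
        logger.exception("%s failed after %.2fs", stage, elapsed)
        raise
    else:
        elapsed = time.perf_counter() - start
        logger.info("%s completed in %.2fs", stage, elapsed)

class ListHandler(logging.Handler):
    """Collect formatted messages in memory so the result can be inspected."""
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")

stage_logger = logging.getLogger("pipeline_demo")
stage_logger.setLevel(logging.INFO)
stage_logger.propagate = False
stage_handler = ListHandler()
stage_logger.addHandler(stage_handler)

with log_stage(stage_logger, "extract"):
    records = ["data1", "data2", "data3"]

try:
    with log_stage(stage_logger, "transform"):
        # The trailing None has no .upper() method, so this stage fails
        transformed = [r.upper() for r in records + [None]]
except AttributeError:
    pass

for message in stage_handler.messages:
    print(message)
```

Each stage body stays free of logging boilerplate, and per-stage timings make it easier to see which step dominates a slow run.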

---

## 4. Logging Best Practices

**Best Practices:**

1. **Use Appropriate Log Levels**
   - DEBUG: Detailed diagnostic information (usually disabled in production)
   - INFO: Confirmation that things are working (milestones, checkpoints)
   - WARNING: Something unexpected happened, but the operation still succeeded
   - ERROR: An operation failed, but the application can continue
   - CRITICAL: Fatal error, application may not be able to continue

2. **Include Contextual Information**
   - User IDs, request IDs, session IDs
   - Timestamps (automatic with proper formatting)
   - Function/module names
   - Error details and stack traces

3. **Don't Log Sensitive Data**
   - Never log passwords, API keys, credit card numbers
   - Be careful with personally identifiable information (PII)
   - Mask sensitive data if necessary

4. **Use Structured Logging**
   - Makes logs easier to parse and analyze
   - Better for log aggregation tools
   - Enables programmatic log processing

5. **Configure Logging Early**
   - Set up logging at application startup
   - Use configuration files for complex setups
   - Don't call `basicConfig()` multiple times: later calls are ignored once the root logger has handlers, unless you pass `force=True`

6. **Use Loggers, Not Root**
   - Create named loggers: `logging.getLogger(__name__)`
   - Avoid using `logging.info()` directly (uses root logger)
   - Better control and organization
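Practices 3 and 4 above can be combined in a small custom formatter. The following is a minimal sketch, not a production implementation: it emits each record as one JSON object and masks any `extra` field whose name appears in a hypothetical `SENSITIVE_KEYS` set. The class name `JsonFormatter`, the key list, and the `***` mask are assumptions. Note that it masks only fields passed via `extra=`, not secrets embedded in the message text itself.

```python
import io
import json
import logging

# Hypothetical set of field names treated as sensitive
SENSITIVE_KEYS = {"password", "api_key", "card_number"}

# Attributes present on every LogRecord; anything else arrived via `extra=`
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object, masking sensitive extras."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = "***" if key in SENSITIVE_KEYS else value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from raising
        return json.dumps(payload, default=str)

json_stream = io.StringIO()  # stands in for a file or the console
json_handler = logging.StreamHandler(json_stream)
json_handler.setFormatter(JsonFormatter())

json_logger = logging.getLogger("json_demo")
json_logger.setLevel(logging.INFO)
json_logger.propagate = False
json_logger.addHandler(json_handler)

json_logger.info("User logged in", extra={"user_id": 123, "password": "hunter2"})

parsed = json.loads(json_stream.getvalue())
print(parsed)
```

One JSON object per line is the shape most log aggregation tools expect, and the context fields become queryable keys instead of text to be parsed.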

---

## Summary

**Key Takeaways:**

1. **Rotating Handlers**: Use for production applications to manage log file sizes
   - `RotatingFileHandler`: Size-based rotation
   - `TimedRotatingFileHandler`: Time-based rotation

2. **Web Application Logging**: Include request IDs so that all log lines from one request can be correlated

3. **Data Pipeline Logging**: Log each stage (extract, transform, load) with timing info

4. **Best Practices**:
   - Use appropriate log levels
   - Include context (IDs, timestamps)
   - Never log sensitive data
   - Configure logging early
   - Use named loggers

5. **Common Patterns**:
   - Multiple handlers (console + file)
   - Different log levels for development vs production
   - Structured logging for analysis
   - Request/transaction tracking with unique IDs

**Next Steps:**
- Implement logging in your projects
- Explore log aggregation tools (ELK stack, Splunk)
- Learn about distributed tracing for microservices