# Advanced Performance Optimization with SQLAlchemy

## Overview

This topic covers advanced performance optimization techniques for SQLAlchemy applications, including query optimization, connection pooling, caching strategies, and database tuning. You'll learn how to build high-performance database applications that scale efficiently.

## Learning Objectives

By the end of this topic, you will be able to:

1. **Optimize query performance** - query analysis, indexing strategies, and execution plan optimization
2. **Implement advanced caching** - application-level caching, database query caching, and cache invalidation
3. **Configure connection pooling** - pool sizing, connection management, and performance tuning
4. **Use database-specific optimizations** - PostgreSQL-, MySQL-, and SQLite-specific features
5. **Monitor and profile performance** - performance metrics, profiling tools, and bottleneck identification

## Prerequisites

- Solid working knowledge of SQLAlchemy ORM and Core
- Familiarity with database performance concepts
- Knowledge of caching strategies
- Understanding of connection pooling

Let's optimize for maximum performance!

## 1. Engine Setup and Connection Pooling

Create the engine with an explicit connection pool configuration.


In [None]:
# Performance optimization setup
import time
from sqlalchemy import create_engine, Column, Integer, String, DateTime, ForeignKey, Text, Boolean, Float, Index, Date, Enum
# declarative_base lives in sqlalchemy.orm since SQLAlchemy 1.4;
# the old sqlalchemy.ext.declarative import path is deprecated
from sqlalchemy.orm import declarative_base, sessionmaker, relationship, joinedload, subqueryload, selectinload
from sqlalchemy import func, and_, or_, not_, desc, asc, case, cast, extract, distinct
from sqlalchemy.pool import QueuePool, StaticPool
from datetime import datetime, date, timedelta
import enum

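# Create database engine with explicit connection pool settings.
# These settings matter most for client/server databases such as PostgreSQL
# or MySQL; SQLite is used here only to keep the example self-contained.
engine = create_engine(
    'sqlite:///performance_optimization.db',
    echo=False,          # Disable SQL logging; echo=True is useful only while debugging
    poolclass=QueuePool,
    pool_size=10,        # Connections kept open in the pool
    max_overflow=20,     # Extra connections allowed under burst load (30 total)
    pool_pre_ping=True,  # Test each connection on checkout and replace stale ones
    pool_recycle=3600    # Replace connections older than one hour
)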

Base = declarative_base()
Session = sessionmaker(bind=engine)

print("✅ Performance-optimized database engine created!")
print("Features: Connection pooling, pre-ping, connection recycling")
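
To make the pool settings above more concrete, the following sketch inspects a pool while a connection is checked out and after it is returned. It is an illustration only: the temporary database file, the small `pool_size=2`, and the names `pool_engine` and `pool` are assumptions chosen for the demo, and it assumes SQLAlchemy 1.4 or later.

```python
import os
import tempfile

from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical throwaway database, separate from the tutorial's engine
    pool_engine = create_engine(
        f"sqlite:///{os.path.join(tmp_dir, 'pool_demo.db')}",
        poolclass=QueuePool,
        pool_size=2,
        max_overflow=1,
    )
    pool = pool_engine.pool
    configured_size = pool.size()  # the pool_size passed above

    with pool_engine.connect() as conn:
        conn.execute(text("SELECT 1"))
        in_use_during = pool.checkedout()  # connections currently handed out

    # Leaving the with-block returns the connection to the pool
    in_use_after = pool.checkedout()
    idle_after = pool.checkedin()  # connections waiting in the pool

    print(pool.status())
    pool_engine.dispose()  # close pooled connections before the file is removed
```

Watching `checkedout()` under real load is one way to judge whether `pool_size` and `max_overflow` are set sensibly: a value that sits at the maximum suggests requests are waiting for connections.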


## 2. Query Optimization and Indexing

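Index the columns that appear in filters, joins, and sort clauses, then check the database's execution plan to confirm that queries actually use those indexes. Every index also adds write and storage overhead, so index selectively.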


In [None]:
# Performance-optimized models with comprehensive indexing
class Customer(Base):
    __tablename__ = 'customers'
    
    id = Column(Integer, primary_key=True)
    first_name = Column(String(50), nullable=False, index=True)
    last_name = Column(String(50), nullable=False, index=True)
    email = Column(String(100), unique=True, nullable=False, index=True)
    phone = Column(String(20), index=True)
    address = Column(String(300))
    city = Column(String(100), index=True)
    state = Column(String(50), index=True)
    zip_code = Column(String(10), index=True)
    country = Column(String(50), index=True)
    is_active = Column(Boolean, default=True, index=True)
    created_at = Column(DateTime, default=datetime.utcnow, index=True)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
    
    # Relationships
    orders = relationship("Order", back_populates="customer", cascade="all, delete-orphan")
    
    def __repr__(self):
        return f"<Customer(name='{self.first_name} {self.last_name}', email='{self.email}')>"

class Product(Base):
    __tablename__ = 'products'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(200), nullable=False, index=True)
    sku = Column(String(50), unique=True, nullable=False, index=True)
    description = Column(Text)
    price = Column(Float, nullable=False, index=True)
    cost = Column(Float, nullable=False)
    category = Column(String(100), index=True)
    brand = Column(String(100), index=True)
    stock_quantity = Column(Integer, default=0, index=True)
    is_active = Column(Boolean, default=True, index=True)
    created_at = Column(DateTime, default=datetime.utcnow, index=True)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
    
    # Relationships
    order_items = relationship("OrderItem", back_populates="product", cascade="all, delete-orphan")
    
    def __repr__(self):
        return f"<Product(name='{self.name}', sku='{self.sku}')>"

print("✅ Performance-optimized models created!")
print("Features: Comprehensive indexing on frequently queried columns")
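
Declaring an index does not guarantee that a query uses it, so it helps to look at the execution plan. The sketch below runs SQLite's `EXPLAIN QUERY PLAN` against a hypothetical, simplified `customers` table in an in-memory database; the table, the index name `ix_customers_city`, and the variable names are assumptions for illustration. Other databases expose plans differently (for example `EXPLAIN` in PostgreSQL and MySQL).

```python
from sqlalchemy import create_engine, text

# In-memory database so the sketch does not touch the tutorial's file
plan_engine = create_engine("sqlite://")

with plan_engine.begin() as conn:
    conn.execute(text("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)"))
    conn.execute(text("CREATE INDEX ix_customers_city ON customers (city)"))

    # Ask SQLite how it would execute the query, without running it
    plan_rows = conn.execute(
        text("EXPLAIN QUERY PLAN SELECT * FROM customers WHERE city = :city"),
        {"city": "City1"},
    ).fetchall()

# The human-readable description is the last column of each plan row
plan_details = [row[-1] for row in plan_rows]
for detail in plan_details:
    print(detail)
```

A plan line that mentions the index name indicates an index search; a line that reports a scan of the whole table suggests the filter column is not usefully indexed.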


## 3. Advanced Caching Strategies

Implement application-level caching, database query caching, and cache invalidation patterns.
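
As a minimal sketch of application-level caching with explicit invalidation, the example below keeps query results in an in-process dictionary with a time-to-live. The `TTLCache` class, the `load_price` helper, the `stats` counter, and the simplified `products` table are all hypothetical names introduced for illustration; a production system would more likely use a shared cache such as Redis or a caching library.

```python
import time

from sqlalchemy import create_engine, text


class TTLCache:
    """Tiny in-process cache mapping key -> (expires_at, value)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh entry: skip the database
        value = loader()
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        self._store.pop(key, None)


cache_engine = create_engine("sqlite://")
with cache_engine.begin() as conn:
    conn.execute(text("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)"))
    conn.execute(text("INSERT INTO products (id, price) VALUES (1, 10.0)"))

stats = {"db_hits": 0}  # counts how often the database is actually queried


def load_price(product_id):
    stats["db_hits"] += 1
    with cache_engine.connect() as conn:
        return conn.execute(
            text("SELECT price FROM products WHERE id = :id"), {"id": product_id}
        ).scalar()


price_cache = TTLCache(ttl_seconds=60)
key = ("price", 1)
first = price_cache.get_or_load(key, lambda: load_price(1))   # hits the database
second = price_cache.get_or_load(key, lambda: load_price(1))  # served from cache

# After a write, invalidate the key so readers do not see the stale price
with cache_engine.begin() as conn:
    conn.execute(text("UPDATE products SET price = 12.5 WHERE id = 1"))
price_cache.invalidate(key)
third = price_cache.get_or_load(key, lambda: load_price(1))   # reloads new value
```

Invalidating on write keeps the cache correct at the cost of coupling writers to the cache; relying on the TTL alone is simpler but lets readers see stale data until the entry expires.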


In [None]:
# Remaining models (Order, OrderItem), composite indexes, and table creation
class Order(Base):
    __tablename__ = 'orders'
    
    id = Column(Integer, primary_key=True)
    order_number = Column(String(50), unique=True, nullable=False, index=True)
    customer_id = Column(Integer, ForeignKey('customers.id'), nullable=False, index=True)
    total_amount = Column(Float, nullable=False, index=True)
    order_date = Column(DateTime, nullable=False, index=True)
    is_active = Column(Boolean, default=True, index=True)
    created_at = Column(DateTime, default=datetime.utcnow, index=True)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
    
    # Relationships
    customer = relationship("Customer", back_populates="orders")
    order_items = relationship("OrderItem", back_populates="order", cascade="all, delete-orphan")
    
    def __repr__(self):
        return f"<Order(order_number='{self.order_number}')>"

class OrderItem(Base):
    __tablename__ = 'order_items'
    
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey('orders.id'), nullable=False, index=True)
    product_id = Column(Integer, ForeignKey('products.id'), nullable=False, index=True)
    quantity = Column(Integer, nullable=False)
    unit_price = Column(Float, nullable=False)
    total_price = Column(Float, nullable=False)
    is_active = Column(Boolean, default=True, index=True)
    created_at = Column(DateTime, default=datetime.utcnow, index=True)
    
    # Relationships
    order = relationship("Order", back_populates="order_items")
    product = relationship("Product", back_populates="order_items")
    
    def __repr__(self):
        return f"<OrderItem(order_id={self.order_id}, product_id={self.product_id})>"

# Create comprehensive indexes for performance
Index('idx_customers_name_email', Customer.first_name, Customer.last_name, Customer.email)
Index('idx_products_name_sku', Product.name, Product.sku)
Index('idx_orders_customer_date', Order.customer_id, Order.order_date)
Index('idx_order_items_order_product', OrderItem.order_id, OrderItem.product_id)
Index('idx_products_category_brand', Product.category, Product.brand)
Index('idx_customers_city_state', Customer.city, Customer.state)

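# Recreate tables from scratch so the notebook can be re-run without
# unique-constraint errors from rows inserted by a previous run
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

print("✅ Complete performance-optimized schema created!")
print("Models: Customer, Product, Order, OrderItem")
print("Features: Single-column and composite indexes, connection pooling")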


## 4. Performance Monitoring and Profiling

Implement comprehensive performance monitoring and profiling to identify bottlenecks and optimize application performance.
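
One common way to collect per-statement timings is to hook SQLAlchemy's `before_cursor_execute` and `after_cursor_execute` engine events. The sketch below records each statement and its elapsed time in a list; the in-memory engine and the names `prof_engine` and `query_log` are assumptions for illustration, and a real application would probably send the timings to a logger or metrics system instead.

```python
import time

from sqlalchemy import create_engine, event, text

prof_engine = create_engine("sqlite://")
query_log = []  # (statement, elapsed_seconds) pairs


@event.listens_for(prof_engine, "before_cursor_execute")
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    # conn.info is a per-connection dict; a stack handles nested executions
    conn.info.setdefault("query_start", []).append(time.perf_counter())


@event.listens_for(prof_engine, "after_cursor_execute")
def _stop_timer(conn, cursor, statement, parameters, context, executemany):
    elapsed = time.perf_counter() - conn.info["query_start"].pop()
    query_log.append((statement, elapsed))


with prof_engine.connect() as conn:
    conn.execute(text("SELECT 1"))
    conn.execute(text("SELECT 2"))

logged_statements = [statement for statement, _ in query_log]
slowest_statement, slowest_time = max(query_log, key=lambda item: item[1])
print(f"Slowest: {slowest_statement!r} took {slowest_time:.6f} seconds")
```

Sorting or thresholding `query_log` by elapsed time is a simple starting point for finding slow queries before reaching for a full profiler.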


In [None]:
# Build a test dataset for the timing comparisons that follow
def create_performance_test_data():
    """Create large dataset for performance testing"""
    session = Session()
    
    # Create customers
    customers = []
    for i in range(1000):
        customer = Customer(
            first_name=f"Customer{i}",
            last_name=f"LastName{i}",
            email=f"customer{i}@example.com",
            phone=f"555-{i:04d}",
            city=f"City{i % 50}",
            state=f"State{i % 10}",
            country="USA"
        )
        customers.append(customer)
    
    session.add_all(customers)
    session.commit()
    
    # Create products
    products = []
    categories = ["Electronics", "Clothing", "Books", "Home", "Sports"]
    brands = ["BrandA", "BrandB", "BrandC", "BrandD", "BrandE"]
    
    for i in range(500):
        product = Product(
            name=f"Product {i}",
            sku=f"SKU-{i:06d}",
            description=f"Description for product {i}",
            price=10.0 + (i % 100),
            cost=5.0 + (i % 50),
            category=categories[i % len(categories)],
            brand=brands[i % len(brands)],
            stock_quantity=100 + (i % 500)
        )
        products.append(product)
    
    session.add_all(products)
    session.commit()
    
    # Create orders
    orders = []
    for i in range(2000):
        customer_id = (i % 1000) + 1
        order = Order(
            order_number=f"ORD-{i:06d}",
            customer_id=customer_id,
            total_amount=50.0 + (i % 200),
            order_date=datetime.utcnow() - timedelta(days=i % 365)
        )
        orders.append(order)
    
    session.add_all(orders)
    session.commit()
    
    # Create order items
    order_items = []
    for i in range(5000):
        order_id = (i % 2000) + 1
        product_id = (i % 500) + 1
        quantity = 1 + (i % 5)
        unit_price = 10.0 + (i % 50)
        
        order_item = OrderItem(
            order_id=order_id,
            product_id=product_id,
            quantity=quantity,
            unit_price=unit_price,
            total_price=quantity * unit_price
        )
        order_items.append(order_item)
    
    session.add_all(order_items)
    session.commit()
    
    print("✅ Performance test data created!")
    print(f"Customers: {len(customers)}")
    print(f"Products: {len(products)}")
    print(f"Orders: {len(orders)}")
    print(f"Order Items: {len(order_items)}")
    
    session.close()

# Create performance test data
create_performance_test_data()


In [None]:
# Performance optimization techniques demonstration
def demonstrate_performance_optimizations():
    """Demonstrate various performance optimization techniques"""
    
    session = Session()
    
    print("=== PERFORMANCE OPTIMIZATION TECHNIQUES ===\n")
    
    # 1. Query optimization with proper indexing
    print("1. Query Optimization with Indexing:")
    start_time = time.time()
    
    # Optimized query using indexes
    customers_by_city = session.query(Customer).filter(
        Customer.city == "City1"
    ).limit(10).all()
    
    query_time = time.time() - start_time
    print(f"   Indexed query time: {query_time:.4f} seconds")
    print(f"   Found {len(customers_by_city)} customers in City1")
    print()
    
    # 2. Eager loading to prevent N+1 problems
    print("2. Eager Loading (Preventing N+1 Problems):")
    # Clear the session so customers loaded in step 1 are not reused
    session.expunge_all()
    start_time = time.perf_counter()

    # Without eager loading (N+1 problem): 1 query for orders + 1 per customer
    orders_without_eager = session.query(Order).limit(10).all()
    for order in orders_without_eager:
        _ = order.customer.first_name  # Lazy load: one extra SELECT per order

    n_plus_1_time = time.perf_counter() - start_time
    print(f"   Without eager loading: {n_plus_1_time:.4f} seconds")

    # Clear the session again so the second measurement starts cold
    session.expunge_all()

    # With eager loading: customers arrive in the same query via a JOIN
    start_time = time.perf_counter()
    orders_with_eager = session.query(Order).options(
        joinedload(Order.customer)
    ).limit(10).all()

    for order in orders_with_eager:
        _ = order.customer.first_name  # Already loaded, no additional queries

    eager_loading_time = time.perf_counter() - start_time
    print(f"   With eager loading: {eager_loading_time:.4f} seconds")
    print(f"   Performance improvement: {((n_plus_1_time - eager_loading_time) / n_plus_1_time * 100):.1f}%")
    print()
    
    # 3. Bulk operations for better performance
    print("3. Bulk Operations:")
    start_time = time.perf_counter()

    # Unit-of-work inserts: each object is tracked by the session and
    # written when the session flushes at commit time
    for i in range(100):
        customer = Customer(
            first_name=f"BulkCustomer{i}",
            last_name=f"BulkLastName{i}",
            email=f"bulk{i}@example.com",
            city="BulkCity"
        )
        session.add(customer)
    session.commit()

    individual_time = time.perf_counter() - start_time
    print(f"   Session.add() inserts (100 records): {individual_time:.4f} seconds")

    # Bulk insert: skips most per-object bookkeeping.
    # Note: bulk_save_objects() is a legacy API in SQLAlchemy 2.0.
    start_time = time.perf_counter()
    bulk_customers = []
    for i in range(100):
        customer = Customer(
            first_name=f"BulkCustomer2{i}",
            last_name=f"BulkLastName2{i}",
            email=f"bulk2{i}@example.com",
            city="BulkCity2"
        )
        bulk_customers.append(customer)

    session.bulk_save_objects(bulk_customers)
    session.commit()

    bulk_time = time.perf_counter() - start_time
    print(f"   Bulk insert (100 records): {bulk_time:.4f} seconds")
    print(f"   Performance improvement: {((individual_time - bulk_time) / individual_time * 100):.1f}%")
    print()
    
    # 4. Complex aggregation queries
    print("4. Complex Aggregation Queries:")
    start_time = time.time()
    
    # Sales by category (one row per order item, so counts are line items, not orders)
    sales_by_category = session.query(
        Product.category,
        func.count(OrderItem.id).label('item_count'),
        func.sum(OrderItem.total_price).label('total_sales'),
        func.avg(OrderItem.total_price).label('avg_item_total')
    ).join(OrderItem, Product.id == OrderItem.product_id)\
     .group_by(Product.category)\
     .all()
    
    aggregation_time = time.time() - start_time
    print(f"   Aggregation query time: {aggregation_time:.4f} seconds")
    print("   Sales by Category:")
    for category, count, total, avg in sales_by_category:
        print(f"     {category}: {count} items, ${total:,.2f} total, ${avg:.2f} avg per item")
    print()
    
    # 5. Pagination for large result sets
    print("5. Pagination for Large Result Sets:")
    start_time = time.time()
    
    # Get first page of customers
    page_size = 50
    page = 1
    offset = (page - 1) * page_size
    
    customers_page = session.query(Customer)\
        .order_by(Customer.id)\
        .offset(offset)\
        .limit(page_size)\
        .all()
    
    pagination_time = time.time() - start_time
    print(f"   Pagination query time: {pagination_time:.4f} seconds")
    print(f"   Retrieved {len(customers_page)} customers (page {page})")
    print()
    
    session.close()
    
    print("✅ Performance optimization techniques demonstrated!")
    print("Techniques: indexing, eager loading, bulk operations, aggregations, pagination")

# Demonstrate performance optimizations
demonstrate_performance_optimizations()
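
Wall-clock timings on a small SQLite file can be noisy, so counting the statements the ORM emits is often a clearer way to see the N+1 effect. The sketch below uses hypothetical `Author` and `Book` models in an in-memory database (not the tutorial's models) and an engine event listener to count `SELECT` statements with lazy loading and with `selectinload`. It assumes SQLAlchemy 1.4 or later.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base, relationship, selectinload, sessionmaker

DemoBase = declarative_base()


class Author(DemoBase):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)
    books = relationship("Book", back_populates="author")


class Book(DemoBase):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    title = Column(String(100), nullable=False)
    author_id = Column(Integer, ForeignKey("authors.id"), nullable=False, index=True)
    author = relationship("Author", back_populates="books")


n1_engine = create_engine("sqlite://")
DemoBase.metadata.create_all(n1_engine)
DemoSession = sessionmaker(bind=n1_engine)

emitted = []  # every SQL string sent to the database


@event.listens_for(n1_engine, "before_cursor_execute")
def _record(conn, cursor, statement, parameters, context, executemany):
    emitted.append(statement)


def count_selects(work):
    """Run work() and return how many SELECT statements it emitted."""
    emitted.clear()
    work()
    return sum(1 for sql in emitted if sql.lstrip().upper().startswith("SELECT"))


# Five books, each with its own author
with DemoSession() as session:
    session.add_all([Author(name=f"A{i}", books=[Book(title=f"B{i}")]) for i in range(5)])
    session.commit()


def lazy_access():
    with DemoSession() as session:
        for book in session.query(Book).all():
            _ = book.author.name  # one extra SELECT per book


def eager_access():
    with DemoSession() as session:
        query = session.query(Book).options(selectinload(Book.author))
        for book in query.all():
            _ = book.author.name  # authors were fetched in one batched SELECT


lazy_selects = count_selects(lazy_access)
eager_selects = count_selects(eager_access)
print(f"lazy: {lazy_selects} SELECTs, selectinload: {eager_selects} SELECTs")
```

The lazy count grows with the number of rows, while the eager count stays constant, which is the property that matters once result sets get large.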


## 5. Best Practices and Production Optimization

### Performance Optimization Best Practices

1. **Index Strategy**: Create indexes on foreign keys and on columns used in filters, joins, and sorts; avoid indexing every column, since each index slows writes
2. **Query Optimization**: Use appropriate loading strategies and avoid N+1 problems
3. **Connection Pooling**: Configure connection pools based on application load
4. **Bulk Operations**: Insert or update many rows in a single batched statement rather than row by row
5. **Caching**: Implement application-level caching for frequently accessed data
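
For the bulk operations item, one widely used approach is to pass a list of dictionaries to a Core `insert()` so the driver can run it as a single batched `executemany` call. The sketch below assumes SQLAlchemy 1.4 or later; the `events` table and the names `bulk_engine` and `rows` are hypothetical and exist only for this illustration.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, func, select

bulk_metadata = MetaData()
events_table = Table(
    "events",
    bulk_metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50), nullable=False),
)

bulk_engine = create_engine("sqlite://")
bulk_metadata.create_all(bulk_engine)

# Plain dictionaries: no ORM objects are constructed or tracked
rows = [{"name": f"event-{i}"} for i in range(1000)]

with bulk_engine.begin() as conn:
    # A list of parameter sets makes this one batched executemany call
    conn.execute(events_table.insert(), rows)
    row_count = conn.execute(select(func.count()).select_from(events_table)).scalar()

print(f"Inserted {row_count} rows")
```

The trade-off is that this path bypasses ORM features such as relationship cascades and in-memory object state, so it suits data loading better than business logic.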

### Production Performance Considerations

1. **Monitoring**: Implement comprehensive performance monitoring and alerting
2. **Profiling**: Regular performance profiling to identify bottlenecks
3. **Resource Management**: Monitor database resources and connection usage
4. **Scaling**: Plan for horizontal and vertical scaling strategies
5. **Backup and Recovery**: Ensure performance tuning does not compromise backup and recovery operations

### Common Performance Anti-Patterns

1. **N+1 Queries**: Accessing related objects in a loop without eager loading
2. **Missing Indexes**: Not indexing frequently queried columns
3. **Inefficient Joins**: Joining on unindexed columns or pulling in tables the query does not need
4. **Large Result Sets**: Loading more rows or columns than the caller needs
5. **Connection Leaks**: Failing to close sessions or return connections to the pool
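
For the large result sets item, `OFFSET`-based pagination still makes the database walk past every skipped row, which gets slower on deep pages. One alternative, sketched here as an assumption rather than something this tutorial prescribes, is keyset pagination: remember the last primary key seen and filter on it. The `items` table and the `fetch_page` helper are hypothetical, and the code assumes SQLAlchemy 1.4 or later.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

page_metadata = MetaData()
items = Table(
    "items",
    page_metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

page_engine = create_engine("sqlite://")
page_metadata.create_all(page_engine)
with page_engine.begin() as conn:
    conn.execute(items.insert(), [{"name": f"item-{i}"} for i in range(1, 26)])


def fetch_page(conn, after_id, page_size):
    """Return the next page of rows whose id is greater than after_id."""
    stmt = (
        select(items.c.id, items.c.name)
        .where(items.c.id > after_id)  # seek via the primary key index
        .order_by(items.c.id)
        .limit(page_size)
    )
    return conn.execute(stmt).fetchall()


pages = []
with page_engine.connect() as conn:
    last_id = 0
    while True:
        rows = fetch_page(conn, last_id, page_size=10)
        if not rows:
            break
        pages.append([row[0] for row in rows])
        last_id = rows[-1][0]  # cursor for the next page

print([len(page) for page in pages])
```

Keyset pagination requires a stable, indexed sort key and does not support jumping to an arbitrary page number, so `OFFSET` remains reasonable for small tables or shallow paging.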

### Summary

Advanced performance optimization in SQLAlchemy involves:

- **Query Optimization**: Proper indexing, eager loading, and efficient query patterns
- **Connection Management**: Connection pooling and resource optimization
- **Caching Strategies**: Application-level and database-level caching
- **Bulk Operations**: Efficient handling of large data operations
- **Monitoring and Profiling**: Continuous performance monitoring and optimization

These techniques are essential for building high-performance database applications that can scale efficiently in production environments.
