# Optimizing JOIN Operations

This notebook demonstrates various techniques for optimizing JOIN operations in PostgreSQL. We'll cover:
* Different types of joins and their performance implications
* Join order optimization
* Using proper indexes for joins
* Common join-related performance issues

## 1. Understanding Different Types of Joins

In [None]:
-- Example of INNER JOIN
EXPLAIN ANALYZE
SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    o.order_id,
    o.order_date,
    o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE c.country = 'USA'
AND o.order_date >= '2022-01-01';

In [None]:
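-- Example of LEFT JOIN: find customers who have never placed an order
-- (the HAVING clause keeps only unmatched customers, so num_orders and total_spent are always 0)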
EXPLAIN ANALYZE
SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    COUNT(o.order_id) as num_orders,
    COALESCE(SUM(o.total_amount), 0) as total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(o.order_id) = 0;
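
The LEFT JOIN above is effectively an anti-join: it keeps only customers without a matching order. As a sketch of an alternative worth comparing, the same question can be asked with NOT EXISTS, which avoids the GROUP BY and aggregation step. Whether it is faster depends on your data and PostgreSQL version, so compare the two plans rather than assuming a winner.

```sql
-- Anti-join written with NOT EXISTS: customers that have no orders
EXPLAIN ANALYZE
SELECT
    c.customer_id,
    c.first_name,
    c.last_name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);
```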

## 2. Join Order Optimization

In [None]:
-- Complex join with multiple tables - Original query
EXPLAIN ANALYZE
SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    p.product_name,
    p.category,
    oi.quantity,
    oi.unit_price,
    s.supplier_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN suppliers s ON p.supplier_id = s.supplier_id
WHERE c.country = 'USA'
AND o.order_date >= '2022-01-01'
AND p.category = 'Electronics';

In [None]:
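-- Rewritten query that filters products and customers in CTEs before joining
-- Note: PostgreSQL 12+ inlines CTEs like these and the planner chooses its own join order,
-- so compare this plan with the original instead of assuming it is faster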
EXPLAIN ANALYZE
WITH filtered_products AS (
    SELECT p.product_id, p.product_name, p.category, p.supplier_id
    FROM products p
    WHERE p.category = 'Electronics'
),
filtered_customers AS (
    SELECT c.customer_id, c.first_name, c.last_name
    FROM customers c
    WHERE c.country = 'USA'
)
SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    p.product_name,
    p.category,
    oi.quantity,
    oi.unit_price,
    s.supplier_name
FROM filtered_products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
JOIN filtered_customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2022-01-01';
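
Two settings decide whether a rewrite like the one above can change the plan at all. The sketch below first shows how many FROM items the planner reorders freely, then forces a CTE to be computed as a separate step with AS MATERIALIZED (PostgreSQL 12 and later). The shortened column list is an illustrative assumption; the tables and filter come from the queries above.

```sql
-- Number of FROM items the planner will reorder on its own (default is 8)
SHOW join_collapse_limit;

-- Force the CTE to be evaluated once as its own step (PostgreSQL 12+)
EXPLAIN ANALYZE
WITH filtered_products AS MATERIALIZED (
    SELECT p.product_id, p.product_name, p.supplier_id
    FROM products p
    WHERE p.category = 'Electronics'
)
SELECT p.product_name, s.supplier_name
FROM filtered_products p
JOIN suppliers s ON p.supplier_id = s.supplier_id;
```

Materializing acts as an optimization fence, so it can hurt as easily as help. Use it only when the plan output shows a benefit.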

## 3. Using Proper Indexes for Joins

In [None]:
-- Check existing indexes
SELECT 
    schemaname,
    tablename,
    indexname,
    indexdef
FROM pg_indexes
WHERE schemaname = 'public'
AND tablename IN ('customers', 'orders', 'order_items', 'products', 'suppliers');

In [None]:
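-- Create composite index for common join + filter condition
-- (join column first, range-filtered column second; IF NOT EXISTS makes the cell safe to re-run)
CREATE INDEX IF NOT EXISTS idx_orders_customer_date ON orders(customer_id, order_date);

-- Create composite index for products (category filter first, then the supplier join column)
CREATE INDEX IF NOT EXISTS idx_products_category_supplier ON products(category, supplier_id);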
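
After creating indexes, it helps to refresh planner statistics and confirm that the new index is actually chosen. The following is a minimal check, assuming customer_id is an integer column; the value 42 is a hypothetical placeholder.

```sql
-- Refresh statistics so the planner has current row estimates
ANALYZE orders;
ANALYZE products;

-- Plain EXPLAIN shows the chosen plan without executing the query
EXPLAIN
SELECT o.order_id, o.order_date
FROM orders o
WHERE o.customer_id = 42          -- hypothetical customer_id
AND o.order_date >= '2022-01-01';
```

If the plan still shows a sequential scan on orders, the table may be small enough that the planner judges an index scan not worthwhile.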

## 4. Common Join Performance Issues

In [None]:
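-- Example of cartesian product (bad performance): the join condition is missing
-- Caution: EXPLAIN ANALYZE executes the query; on large tables use plain EXPLAIN here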
EXPLAIN ANALYZE
SELECT c.customer_id, o.order_id
FROM customers c, orders o
WHERE c.country = 'USA';

In [None]:
-- Fixed query using proper join
EXPLAIN ANALYZE
SELECT c.customer_id, o.order_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.country = 'USA';

## Best Practices for Join Optimization

1. **Choose the Right Join Type**
   - Use INNER JOIN when you need matching records from both tables
   - Use LEFT JOIN when you need all records from the left table
   - Avoid RIGHT JOIN (convert to LEFT JOIN for better readability)
   - Use FULL OUTER JOIN only when absolutely necessary

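2. **Optimize Join Order**
   - Filter early so that joins process fewer rows; for queries with up to join_collapse_limit tables (default 8) the planner picks the physical join order itself
   - Use CTEs to materialize filtered results only when it helps: since PostgreSQL 12 a CTE referenced once is inlined unless declared AS MATERIALIZED
   - Consider table sizes when ordering joins
   - Keep statistics current with ANALYZE so the optimizer knows table sizes and value distributions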

3. **Index Strategy**
   - Create indexes on join columns
   - Use composite indexes for join + filter conditions
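   - Maintain indexes regularly (rebuild bloated ones with REINDEX)
   - Monitor index usage (for example via pg_stat_user_indexes) and drop indexes that are never scanned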

4. **Query Structure**
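   - Always use explicit join syntax (JOIN ... ON) rather than comma-separated FROM lists
   - Apply filters early
   - Give every join a condition that relates the two tables, to avoid accidental cartesian products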
   - Consider denormalization for heavy reporting queries
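
To make the index monitoring advice above concrete, here is a sketch of a query against the pg_stat_user_indexes statistics view for the tables used in this notebook. An idx_scan of 0 since the last statistics reset suggests an index that may be unused, though indexes backing primary key or unique constraints should be kept regardless.

```sql
-- Index usage and size for the notebook's tables, least-used first
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
AND relname IN ('customers', 'orders', 'order_items', 'products', 'suppliers')
ORDER BY idx_scan ASC;
```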