# Subquery Optimization

This notebook demonstrates techniques for optimizing subqueries in PostgreSQL:
* Converting subqueries to JOINs
* Optimizing correlated subqueries
* Using EXISTS vs IN
* Common Table Expressions (CTEs)

## 1. Inefficient Subquery Examples

In [None]:
-- Two correlated scalar subqueries: each can run once per matching customer row,
-- and both read the same orders rows with the same filter
EXPLAIN ANALYZE
SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    (
        SELECT COUNT(*)
        FROM orders o
        WHERE o.customer_id = c.customer_id
        AND o.status = 'Completed'
    ) as completed_orders,
    (
        SELECT COALESCE(SUM(total_amount), 0)
        FROM orders o
        WHERE o.customer_id = c.customer_id
        AND o.status = 'Completed'
    ) as total_spent
FROM customers c
WHERE c.country = 'USA';

## 2. Optimized JOIN-based Solutions

In [None]:
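-- Rewritten as one LEFT JOIN with aggregation, so orders is read once.
-- The status filter stays in the ON clause so customers without completed orders still appear with zeros.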
EXPLAIN ANALYZE
SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    COUNT(o.order_id) as completed_orders,
    COALESCE(SUM(o.total_amount), 0) as total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
    AND o.status = 'Completed'
WHERE c.country = 'USA'
GROUP BY c.customer_id, c.first_name, c.last_name;
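
Another way to remove the duplicated work is a single `LATERAL` subquery that computes both aggregates together. The following is a minimal sketch, assuming the same `customers` and `orders` tables as above; the alias `s` is only illustrative. The subquery is still correlated, so compare its plan with the JOIN version rather than assuming it is faster.

```sql
-- Sketch: one correlated LATERAL subquery instead of two scalar subqueries.
-- An aggregate without GROUP BY always returns exactly one row, so
-- CROSS JOIN LATERAL keeps customers that have no completed orders.
EXPLAIN ANALYZE
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    s.completed_orders,
    s.total_spent
FROM customers c
CROSS JOIN LATERAL (
    SELECT
        COUNT(*) AS completed_orders,
        COALESCE(SUM(o.total_amount), 0) AS total_spent
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.status = 'Completed'
) s
WHERE c.country = 'USA';
```

This form tends to suit cases where only a few outer rows survive the `WHERE` filter and `orders` has an index on `customer_id`.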

## 3. EXISTS vs IN Comparison

In [None]:
-- Using IN (PostgreSQL usually plans an uncorrelated IN subquery as a semi-join)
EXPLAIN ANALYZE
SELECT p.product_id, p.product_name, p.category
FROM products p
WHERE p.product_id IN (
    SELECT oi.product_id
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    WHERE o.order_date >= '2022-01-01'
    AND oi.quantity >= 5
);

In [None]:
-- Using EXISTS (often planned the same way as the IN version; compare the two EXPLAIN ANALYZE outputs)
EXPLAIN ANALYZE
SELECT p.product_id, p.product_name, p.category
FROM products p
WHERE EXISTS (
    SELECT 1
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    WHERE oi.product_id = p.product_id
    AND o.order_date >= '2022-01-01'
    AND oi.quantity >= 5
);

## 4. Using CTEs for Complex Queries

In [None]:
-- Multi-step aggregation written as CTEs instead of nested subqueries
EXPLAIN ANALYZE
WITH customer_orders AS (
    SELECT 
        customer_id,
        COUNT(*) as order_count,
        SUM(total_amount) as total_spent
    FROM orders
    WHERE status = 'Completed'
    GROUP BY customer_id
),
high_value_customers AS (
    SELECT 
        co.customer_id,
        c.first_name,
        c.last_name,
        c.country,
        co.order_count,
        co.total_spent
    FROM customer_orders co
    JOIN customers c ON co.customer_id = c.customer_id
    WHERE co.total_spent > 10000
)
SELECT 
    hvc.country,
    COUNT(*) as num_customers,
    AVG(hvc.order_count) as avg_orders,
    AVG(hvc.total_spent) as avg_spent
FROM high_value_customers hvc
GROUP BY hvc.country
ORDER BY avg_spent DESC;
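
In PostgreSQL 12 and later, a non-recursive CTE that is referenced only once is normally inlined into the outer query. The sketch below, which assumes PostgreSQL 12+ and the same tables as above, forces the aggregation CTE to be computed once and stored so the two behaviours can be compared.

```sql
-- Sketch (PostgreSQL 12+ only): force materialization of the CTE.
-- Replace MATERIALIZED with NOT MATERIALIZED to request inlining instead,
-- then compare the two plans.
EXPLAIN ANALYZE
WITH customer_orders AS MATERIALIZED (
    SELECT
        customer_id,
        COUNT(*) AS order_count,
        SUM(total_amount) AS total_spent
    FROM orders
    WHERE status = 'Completed'
    GROUP BY customer_id
)
SELECT
    c.country,
    COUNT(*) AS num_customers
FROM customer_orders co
JOIN customers c ON co.customer_id = c.customer_id
WHERE co.total_spent > 10000
GROUP BY c.country;
```

Materialization acts as an optimization fence: outer filters are not pushed into the CTE, which can help or hurt depending on the data.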

## 5. Optimizing NOT EXISTS Queries

In [None]:
-- Find customers with no orders (using NOT EXISTS)
EXPLAIN ANALYZE
SELECT c.customer_id, c.first_name, c.last_name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

In [None]:
-- Alternative anti-join using LEFT JOIN ... IS NULL.
-- Relies on orders.order_id never being NULL (assumed here to be the primary key).
EXPLAIN ANALYZE
SELECT c.customer_id, c.first_name, c.last_name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
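
A third spelling, `NOT IN`, is not equivalent to the two queries above when the subquery can return NULL. The self-contained sketch below uses inline `VALUES` lists (the aliases `t`, `s`, `x`, and `y` are illustrative only) so it can be run without any tables.

```sql
-- NOT IN: returns no rows, because x NOT IN (1, NULL) is never TRUE.
-- It is FALSE for x = 1 and NULL (unknown) for every other value.
SELECT t.x
FROM (VALUES (1), (2), (3)) AS t(x)
WHERE t.x NOT IN (
    SELECT s.y
    FROM (VALUES (1), (NULL)) AS s(y)
);

-- NOT EXISTS: returns 2 and 3, because the NULL row simply never matches.
SELECT t.x
FROM (VALUES (1), (2), (3)) AS t(x)
WHERE NOT EXISTS (
    SELECT 1
    FROM (VALUES (1), (NULL)) AS s(y)
    WHERE s.y = t.x
);
```

For the customers query, this matters if `orders.customer_id` is nullable: a single NULL would make a `NOT IN` version return no customers at all.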

## Best Practices for Subquery Optimization

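1. **Choose the Right Approach**
   - Convert correlated scalar subqueries to JOINs when they would otherwise run once per outer row
   - Choose between EXISTS and IN based on the measured plan; PostgreSQL often plans both as a semi-join
   - Consider CTEs to name the intermediate steps of complex queries
   - Treat NOT EXISTS and LEFT JOIN ... IS NULL as two spellings of an anti-join; compare plans rather than assuming one is faster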

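2. **Performance Considerations**
   - Avoid correlated subqueries that execute once per outer row (look for SubPlan nodes with a high loops count in EXPLAIN ANALYZE)
   - Index the columns used in join and filter conditions
   - In PostgreSQL 12 and later, CTEs referenced once are inlined by default; use AS MATERIALIZED only when computing the CTE once is worth the cost
   - Check execution plans for nested loops over large row counts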

3. **Query Structure**
   - Break down complex subqueries into CTEs
   - Apply filters early in the query
   - Use appropriate join types
   - Consider query readability

4. **Common Anti-patterns**
   - Unnecessary correlated subqueries
   - NOT IN against subqueries with large or nullable result sets (PostgreSQL does not plan these as anti-joins)
   - Deeply nested subqueries
   - Redundant subqueries
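
The indexing advice above can be sketched for the queries in this notebook as follows. The index names are hypothetical, and the column choices assume the `orders` and `customers` schemas used throughout; whether the planner actually uses these indexes depends on table sizes and data distribution.

```sql
-- Sketch: indexes on the join and filter columns used in this notebook.
-- Index names are illustrative, not taken from an existing schema.
CREATE INDEX IF NOT EXISTS idx_orders_customer_status
    ON orders (customer_id, status);

CREATE INDEX IF NOT EXISTS idx_customers_country
    ON customers (country);

-- Refresh planner statistics so the new indexes are costed accurately.
ANALYZE orders;
ANALYZE customers;
```

Re-running the earlier `EXPLAIN ANALYZE` cells after creating the indexes shows whether the plans change.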