# Query Plan Analysis

This notebook demonstrates how to analyze and understand query execution plans in PostgreSQL:
* Reading EXPLAIN output
* Understanding different scan types
* Analyzing join strategies
* Identifying performance bottlenecks

## 1. Basic EXPLAIN Usage

In [None]:
-- Estimated plan only: the query is planned but not executed
EXPLAIN
SELECT *
FROM customers
WHERE country = 'USA';

-- Plan plus actual timings and row counts (EXPLAIN ANALYZE executes the query)
EXPLAIN ANALYZE
SELECT *
FROM customers
WHERE country = 'USA';
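
Because EXPLAIN ANALYZE actually runs the statement, analyzing an INSERT, UPDATE, or DELETE changes data. A minimal sketch of the usual safeguard is to wrap the statement in a transaction and roll it back. The UPDATE and the 'United States' value are hypothetical; they only reuse the `customers.country` column from the queries above.

```sql
BEGIN;

-- The UPDATE really executes so that timings can be measured
EXPLAIN ANALYZE
UPDATE customers
SET country = 'USA'
WHERE country = 'United States';  -- hypothetical value, for illustration only

-- Undo the changes once the plan has been printed
ROLLBACK;
```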

## 2. Understanding Different Scan Types

In [None]:
-- Sequential scan (full table scan): expected when total_amount has no usable index
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE total_amount > 1000;

-- Index scan: possible only if order_date is indexed and the range matches few rows
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE order_date >= '2022-01-01';

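-- Bitmap scan: may be chosen when too many rows match for a plain index scan
-- but too few to justify a sequential scan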
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE order_date >= '2022-01-01'
AND total_amount > 1000;
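
Whether the planner picks an index scan or a bitmap scan for the queries above depends on which indexes exist and on the table statistics. A sketch under the assumption that `orders` has no index on these columns yet; the index names are hypothetical.

```sql
-- Hypothetical index names; adjust to local naming conventions
CREATE INDEX IF NOT EXISTS idx_orders_order_date ON orders (order_date);
CREATE INDEX IF NOT EXISTS idx_orders_total_amount ON orders (total_amount);

-- Refresh planner statistics so row estimates reflect the current data
ANALYZE orders;

-- Re-check the plan: with both indexes in place the planner may combine them
-- (BitmapAnd), use just one, or still prefer a sequential scan
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE order_date >= '2022-01-01'
AND total_amount > 1000;
```

If the plan does not change, the filter probably matches a large share of the table, in which case a sequential scan is often the cheaper choice.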

## 3. Analyzing Join Strategies

In [None]:
-- Nested loop join: often chosen when LIMIT or selective filters keep the row count small
EXPLAIN ANALYZE
SELECT c.first_name, c.last_name, o.order_id, o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.country = 'USA'
AND o.order_date >= '2022-01-01'
LIMIT 10;

-- Hash join: typical for an equality join that reads all rows of both tables
EXPLAIN ANALYZE
SELECT c.first_name, c.last_name, COUNT(*) as num_orders
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name;

-- Merge join: may be chosen when both inputs can be read in customer_id order
EXPLAIN ANALYZE
SELECT c.first_name, c.last_name, o.order_date, o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
ORDER BY c.customer_id, o.order_date;
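
The planner picks the join method from cost estimates, so the comments above describe likely outcomes rather than guarantees. To compare strategies on the same query, one option is to discourage individual methods for a single transaction with the planner's `enable_*` settings. This is a diagnostic sketch, not a setting to keep in production.

```sql
BEGIN;

-- SET LOCAL lasts only until the end of this transaction
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;

-- With hash and merge joins heavily penalized, the planner will
-- normally fall back to a nested loop for this join
EXPLAIN ANALYZE
SELECT c.first_name, c.last_name, COUNT(*) as num_orders
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name;

ROLLBACK;
```

Turning a method off does not forbid it; it adds a large cost penalty, so the planner can still use it when no alternative exists. Comparing the actual times of the two plans shows how much the original choice was worth.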

## 4. Complex Query Analysis

In [None]:
-- Multi-join aggregation; BUFFERS adds buffer hit and read counts to each plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT 
    c.country,
    p.category,
    DATE_TRUNC('month', o.order_date) as month,
    COUNT(DISTINCT c.customer_id) as num_customers,
    COUNT(DISTINCT o.order_id) as num_orders,
    SUM(oi.quantity * oi.unit_price) as total_revenue
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2022-01-01'
AND o.status = 'Completed'
GROUP BY c.country, p.category, DATE_TRUNC('month', o.order_date)
HAVING COUNT(DISTINCT o.order_id) > 10
ORDER BY total_revenue DESC;
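
If the plan for this query shows a sequential scan on `orders` that discards most of the rows it reads, an index matching the WHERE clause is one thing to try. A sketch with a hypothetical index name, placing the equality column first and the range column second:

```sql
-- Hypothetical index: equality column (status) first, range column (order_date) second
CREATE INDEX IF NOT EXISTS idx_orders_status_order_date
ON orders (status, order_date);

ANALYZE orders;

-- Check in isolation whether the filter on orders now uses the index
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, customer_id
FROM orders
WHERE order_date >= '2022-01-01'
AND status = 'Completed';
```

Whether this helps depends on how selective the two conditions are, so compare the buffer counts before and after creating the index.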

## 5. Identifying Bottlenecks

In [None]:
-- Potential bottlenecks: the scan and per-customer aggregation of orders
-- inside the CTE, then the join back to customers
EXPLAIN (ANALYZE, BUFFERS)
WITH customer_stats AS (
    SELECT 
        customer_id,
        COUNT(*) as order_count,
        SUM(total_amount) as total_spent
    FROM orders
    WHERE status = 'Completed'
    GROUP BY customer_id
)
SELECT 
    c.country,
    c.segment,
    COUNT(*) as num_customers,
    AVG(cs.order_count) as avg_orders,
    AVG(cs.total_spent) as avg_spent,
    MAX(cs.total_spent) as max_spent
FROM customers c
JOIN customer_stats cs ON c.customer_id = cs.customer_id
GROUP BY c.country, c.segment
ORDER BY avg_spent DESC;
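
How the CTE is planned depends on the PostgreSQL version. Before version 12 a CTE was always computed separately; from version 12 onward a side-effect-free CTE that is referenced once can be folded into the outer query. On version 12 or later the behavior can be stated explicitly in order to compare the two plans. A shortened sketch of the query above:

```sql
-- Requires PostgreSQL 12+; swap in NOT MATERIALIZED to let the planner inline the CTE
EXPLAIN (ANALYZE, BUFFERS)
WITH customer_stats AS MATERIALIZED (
    SELECT
        customer_id,
        COUNT(*) as order_count,
        SUM(total_amount) as total_spent
    FROM orders
    WHERE status = 'Completed'
    GROUP BY customer_id
)
SELECT
    c.country,
    AVG(cs.total_spent) as avg_spent
FROM customers c
JOIN customer_stats cs ON c.customer_id = cs.customer_id
GROUP BY c.country
ORDER BY avg_spent DESC;
```

For this particular query the two plans may look alike, since the outer query pushes no filter into the CTE. The difference tends to matter more when the outer query restricts the CTE to a few rows.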

## Understanding Query Plan Components

1. **Scan Types**
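   - Sequential Scan: Reads every row in the table and applies the filter to each one
   - Index Scan: Uses an index to locate matching rows, then fetches each row from the table
   - Bitmap Scan: Two-phase scan; a Bitmap Index Scan builds a bitmap of matching row locations, then a Bitmap Heap Scan fetches those pages in physical order
   - Index Only Scan: Answers the query from the index alone when every needed column is in the index; the table is still visited for pages not marked all-visible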

2. **Join Types**
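   - Nested Loop: Scans the inner input once per outer row; good when the outer side is small or the inner side has an index on the join column
   - Hash Join: Builds a hash table on one input and probes it with the other; good for larger tables, equality conditions only
   - Merge Join: Steps through two inputs sorted on the join key; good when the data is already sorted or an index supplies the order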

3. **Cost Components**
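   - Startup Cost: Estimated cost before the first row can be returned
   - Total Cost: Estimated cost to return all rows, in arbitrary planner units (not milliseconds)
   - Actual Time: Measured startup and total time in milliseconds, averaged per loop (EXPLAIN ANALYZE only)
   - Rows: Estimated number of rows returned by the node; EXPLAIN ANALYZE also shows the actual count per loop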

4. **Common Issues**
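   - Missing indexes: Sequential scans on large tables where the filter discards most rows
   - Poor join order: Often caused by row estimates that are far from the actual counts, for example after stale statistics
   - Inefficient scan methods: An index scan that fetches most of the table, or a sequential scan that returns only a few rows
   - Memory usage problems: Sorts or hashes that spill to disk because work_mem is too small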
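
Two quick checks for the issues listed above, sketched under the assumption that statistics collection is enabled (the default). The first reads the `pg_stat_user_tables` view to find tables that are mostly read by sequential scans; the second raises `work_mem` for one transaction to see whether a sort stops spilling to disk. The 64MB value is an arbitrary example.

```sql
-- Tables with heavy sequential scan activity are candidates for new indexes
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC
LIMIT 10;

BEGIN;

-- Arbitrary example value; applies only inside this transaction
SET LOCAL work_mem = '64MB';

-- In the output, "Sort Method: external merge" indicates a spill to disk,
-- while "quicksort" indicates the sort fit in memory
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
ORDER BY total_amount DESC;

ROLLBACK;
```

Running the second query once with the default `work_mem` and once with the raised value makes the effect of the setting visible in the Sort node.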