# Statistics and Cardinality

This notebook demonstrates how to work with table statistics and cardinality in PostgreSQL:
* Understanding table statistics
* Analyzing data distribution
* Managing statistics
* Impact on query optimization

## 1. Viewing Table Statistics

In [None]:
-- View basic table statistics
SELECT 
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE schemaname = 'public';

-- View column statistics
SELECT 
    tablename,
    attname,
    null_frac,
    n_distinct,
    most_common_vals,
    most_common_freqs,
    correlation
FROM pg_stats
WHERE tablename IN ('customers', 'orders', 'products')
AND schemaname = 'public';
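The `n_distinct` column is easy to misread: a positive value is an estimated count of distinct values, while a negative value is the negated fraction of rows that are distinct (so -1 means every row is unique). The following is a minimal sketch that converts both forms into an approximate count. It assumes the same `customers`, `orders`, and `products` tables in the `public` schema, and it uses `pg_class.reltuples` as the planner's row-count estimate; the `estimated_distinct_values` alias is only illustrative.

```sql
-- Convert n_distinct into an approximate number of distinct values per column.
-- reltuples is itself an estimate, so treat the result as a rough figure.
SELECT
    s.tablename,
    s.attname,
    s.n_distinct,
    CASE
        WHEN s.n_distinct < 0
            THEN round((-s.n_distinct * c.reltuples)::numeric)  -- fraction of rows
        ELSE s.n_distinct::numeric                              -- absolute count
    END AS estimated_distinct_values
FROM pg_stats s
JOIN pg_namespace n ON n.nspname = s.schemaname
JOIN pg_class c ON c.relnamespace = n.oid AND c.relname = s.tablename
WHERE s.schemaname = 'public'
  AND s.tablename IN ('customers', 'orders', 'products')
ORDER BY s.tablename, s.attname;
```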

## 2. Analyzing Data Distribution

In [None]:
-- Raise the per-column statistics target (the default is 100, the maximum is 10000), then re-analyze
ALTER TABLE orders ALTER COLUMN order_date SET STATISTICS 1000;
ALTER TABLE orders ALTER COLUMN total_amount SET STATISTICS 1000;
ANALYZE orders;

-- View distribution of orders by date
SELECT 
    DATE_TRUNC('month', order_date) as month,
    COUNT(*) as num_orders,
    MIN(total_amount) as min_amount,
    MAX(total_amount) as max_amount,
    AVG(total_amount) as avg_amount,
    percentile_cont(0.5) WITHIN GROUP (ORDER BY total_amount) as median_amount
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
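To check that a raised statistics target actually took effect, it can help to look at both the stored target and the size of the arrays that ANALYZE produced. The sketch below assumes the `orders` table and the two columns altered above. The `::text::text[]` cast is a workaround for the `anyarray` type of these `pg_stats` columns, and the output aliases are illustrative.

```sql
-- Per-column target as stored in the catalog
-- (-1, or NULL on newer releases, means default_statistics_target is used)
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'orders'::regclass
  AND attname IN ('order_date', 'total_amount');

-- Size of the collected most-common-values list and histogram per column
SELECT
    attname,
    array_length(most_common_vals::text::text[], 1) AS mcv_entries,
    array_length(histogram_bounds::text::text[], 1) AS histogram_bounds_entries
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'orders'
  AND attname IN ('order_date', 'total_amount');
```

A histogram can hold at most one more bound than the target, so the counts here should grow after the target is raised, provided the column has enough distinct values to fill them.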

## 3. Creating Extended Statistics

In [None]:
-- Create extended statistics for correlated columns
-- (functional dependencies are only applied to equality and IN conditions)
CREATE STATISTICS order_stats (dependencies)
ON order_date, total_amount, status
FROM orders;

CREATE STATISTICS customer_stats (dependencies)
ON country, segment
FROM customers;

-- Analyze tables to gather extended statistics
ANALYZE orders;
ANALYZE customers;

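-- View extended statistics definitions together with the gathered dependency data
-- (reading pg_statistic_ext_data requires superuser privileges)
SELECT 
    s.stxname,
    s.stxkeys,
    s.stxkind,
    d.stxddependencies
FROM pg_statistic_ext s
JOIN pg_statistic_ext_data d ON (s.oid = d.stxoid);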
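Functional dependencies are only one kind of extended statistics. On PostgreSQL 12 and later, the `ndistinct` and `mcv` kinds cover multi-column distinct counts and multi-column most-common-value lists. The sketch below shows how these could be added for the `customers` columns used above. The statistics name `customer_stats_mcv` is hypothetical, and the `pg_stats_ext` view is used because it does not require superuser access.

```sql
-- Hypothetical additional statistics object for the same column pair
CREATE STATISTICS IF NOT EXISTS customer_stats_mcv (ndistinct, mcv)
ON country, segment
FROM customers;

-- Extended statistics are only populated by ANALYZE
ANALYZE customers;

-- Inspect what was gathered, in readable form
SELECT
    statistics_name,
    attnames,
    kinds,
    n_distinct,
    dependencies
FROM pg_stats_ext
WHERE schemaname = 'public'
  AND tablename = 'customers';
```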

## 4. Impact on Query Planning

In [None]:
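-- Compare query plans with and without statistics
-- First, keep a copy of the table as a safety net
-- (this copies the rows, not the statistics; the ANALYZE further down regenerates those)
CREATE TABLE orders_backup AS SELECT * FROM orders;

-- Delete the column statistics for orders.
-- ANALYZE has no option for clearing statistics, so remove the pg_statistic rows directly.
-- This needs superuser privileges and belongs on a test system only.
DELETE FROM pg_statistic WHERE starelid = 'orders'::regclass;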

-- Query plan without good statistics
EXPLAIN (ANALYZE, BUFFERS)
SELECT 
    status,
    COUNT(*) as num_orders,
    AVG(total_amount) as avg_amount
FROM orders
WHERE order_date >= '2022-01-01'
AND total_amount > 1000
GROUP BY status;

-- Restore good statistics
ANALYZE orders;

-- Query plan with good statistics
EXPLAIN (ANALYZE, BUFFERS)
SELECT 
    status,
    COUNT(*) as num_orders,
    AVG(total_amount) as avg_amount
FROM orders
WHERE order_date >= '2022-01-01'
AND total_amount > 1000
GROUP BY status;
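A lower-risk way to see how the planner behaves without statistics is to query a fresh copy of the table that has never been analyzed. The sketch below assumes a scratch table named `orders_nostats` (a hypothetical name) and disables autovacuum on it so that auto-analyze does not fill in statistics in the background. Plain `EXPLAIN` is enough here, because the point is to compare the estimated `rows=` values, not execution times.

```sql
-- Scratch copy with no statistics; autovacuum is disabled so it stays unanalyzed
CREATE TABLE orders_nostats WITH (autovacuum_enabled = false) AS
SELECT * FROM orders;

-- Estimate based on built-in default selectivities
EXPLAIN
SELECT *
FROM orders_nostats
WHERE order_date >= '2022-01-01'
  AND total_amount > 1000;

-- Estimate based on the gathered statistics for the original table
EXPLAIN
SELECT *
FROM orders
WHERE order_date >= '2022-01-01'
  AND total_amount > 1000;

-- Clean up the scratch table
DROP TABLE orders_nostats;
```

Running the same statements with `EXPLAIN (ANALYZE)` adds the actual row counts, which shows how far each estimate is from reality.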

## 5. Statistics Maintenance

In [None]:
-- Update statistics for every table in the current database that you are allowed to analyze,
-- printing progress for each table
ANALYZE VERBOSE;

-- Update statistics for specific columns
ANALYZE orders (order_date, total_amount);

-- Monitor statistics age
SELECT 
    schemaname,
    relname,
    n_mod_since_analyze,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_mod_since_analyze DESC;
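The modification count is more useful when compared with the point at which auto-analyze would trigger, which is `autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples`. The sketch below computes that figure from the global settings only. It deliberately ignores any per-table storage parameter overrides, so treat it as an approximation; the `approx_autoanalyze_threshold` alias is illustrative.

```sql
-- Compare modifications since the last ANALYZE with the approximate auto-analyze trigger point.
-- Uses global settings only; per-table overrides in pg_class.reloptions are not considered.
SELECT
    s.schemaname,
    s.relname,
    s.n_mod_since_analyze,
    current_setting('autovacuum_analyze_threshold')::bigint
      + current_setting('autovacuum_analyze_scale_factor')::float8
        * GREATEST(c.reltuples, 0) AS approx_autoanalyze_threshold
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.schemaname = 'public'
ORDER BY s.n_mod_since_analyze DESC;
```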

## Best Practices for Statistics Management

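1. **Gathering Statistics**
   - Run ANALYZE regularly, or confirm that autovacuum's auto-analyze is keeping up
   - Raise statistics targets only on columns whose estimates are off
   - Create extended statistics for columns that are correlated and filtered together
   - Monitor statistics age through last_analyze and last_autoanalyze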

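2. **Statistics Configuration**
   - Set default_statistics_target to suit the workload (the default is 100); higher values make ANALYZE slower
   - Override the target per column with ALTER TABLE ... SET STATISTICS
   - Consider extended statistics where multi-column estimates are inaccurate
   - Check the correlation column in pg_stats, which influences index scan cost estimates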

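3. **Maintenance Schedule**
   - Run ANALYZE after bulk loads, large deletes, and other major data changes
   - Keep statistics current between such events (autovacuum normally handles this)
   - Monitor n_mod_since_analyze to spot tables whose statistics are drifting
   - Schedule manual database-wide ANALYZE runs during low-usage periods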

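4. **Common Issues**
   - Stale statistics after large data changes
   - Statistics targets too low for skewed columns
   - Missing extended statistics on correlated columns
   - Misestimates caused by the planner's assumption that columns are independent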
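
The configuration points above can be tried out without changing server-wide settings. The sketch below assumes the `orders` table from earlier. The numeric values (200, 0.02, 500) are illustrative only and not recommendations.

```sql
-- Session-level experiment: try a higher default target before changing it globally.
-- Columns with an explicit SET STATISTICS value keep their own target.
SET default_statistics_target = 200;
ANALYZE orders;
RESET default_statistics_target;

-- Per-table override: let auto-analyze run after roughly 2% of rows have changed
ALTER TABLE orders SET (
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_analyze_threshold = 500
);

-- Verify the per-table storage parameters
SELECT relname, reloptions
FROM pg_class
WHERE oid = 'orders'::regclass;
```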