# ANALYZE

## 🧠 What is ANALYZE?

In SQLite, the **query planner** chooses how to execute SQL statements (e.g., which index to use).
By default, SQLite uses **basic heuristics**, but it doesn’t know how **data is distributed**.

The `ANALYZE` command gathers **statistics about tables and their indexes** and stores them in a special internal table, *sqlite_stat1*. The planner uses these statistics to estimate how many rows each index lookup will return, which helps it make smarter choices, especially when several indexes could serve the same query.
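To see that `sqlite_stat1` really is created by `ANALYZE`, here is a minimal sketch on a throwaway table. The table `t`, the index `idx_x`, and the helper `has_stat1` are hypothetical names used only for this illustration. It checks `sqlite_master` for the statistics table before and after running `ANALYZE`.

```python
import sqlite3

# Hypothetical toy table: 100 rows, 10 distinct values of x
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("CREATE INDEX idx_x ON t(x)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(i % 10,) for i in range(100)])


def has_stat1(conn):
    """Return True if the sqlite_stat1 table exists in this database."""
    (count,) = conn.execute(
        "SELECT count(*) FROM sqlite_master "
        "WHERE type = 'table' AND name = 'sqlite_stat1'"
    ).fetchone()
    return count == 1


before = has_stat1(conn)   # no statistics yet
conn.execute("ANALYZE")
after = has_stat1(conn)    # ANALYZE has created sqlite_stat1
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(before, after, stats)
```

The `stat` string for `idx_x` should start with `100 10`: 100 rows, and on average 10 rows per distinct value of `x`.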


## 🧱 1. Create a New Database and Table

We’ll start by creating a fresh in-memory SQLite database and a `sales` table holding 1,000 rows of random sales, with customers and products each drawn uniformly from a list of five.

In [1]:
import sqlite3
import random
from datetime import datetime, timedelta

# Create a new connection and cursor
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Create the sales table
cursor.execute('''
CREATE TABLE sales (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer TEXT,
    product TEXT,
    sale_date TEXT,
    amount REAL
)
''')

# Insert 1,000 rows of sample data
customers = ['Alice', 'Bob', 'Carol', 'David', 'Eve']
products = ['Dice Set', 'Board Game', 'Card Sleeves', 'Miniatures', 'Rulebook']

# Build all rows first, then insert them with a single executemany() call
today = datetime.now()
rows = []
for _ in range(1000):
    customer = random.choice(customers)
    product = random.choice(products)
    days_ago = random.randint(0, 180)
    date = (today - timedelta(days=days_ago)).strftime('%Y-%m-%d')
    amount = round(random.uniform(10, 100), 2)
    rows.append((customer, product, date, amount))

cursor.executemany("INSERT INTO sales (customer, product, sale_date, amount) VALUES (?, ?, ?, ?)", rows)

conn.commit()

## 📊 2. Create Indexes

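Next we create an index on each column we filter by. SQLite can use these indexes right away, but without statistics it cannot tell which one narrows a query down more; supplying that information is what `ANALYZE` is for.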

In [2]:
cursor.execute("CREATE INDEX idx_customer ON sales(customer)")
cursor.execute("CREATE INDEX idx_product ON sales(product)")
conn.commit()
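To confirm that indexes are used even before any `ANALYZE`, this sketch compares the plan for a single-column filter with and without an index. The helper `plan` and the reduced `sales` schema are assumptions made for this example; only the last field (`detail`) of each `EXPLAIN QUERY PLAN` row is kept.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' field of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
query = "SELECT * FROM sales WHERE customer = 'Alice'"

without_index = plan(conn, query)   # no index on customer: full table scan
conn.execute("CREATE INDEX idx_customer ON sales(customer)")
with_index = plan(conn, query)      # index available, ANALYZE never run
print(without_index)
print(with_index)
```

The first plan is a `SCAN` of the table; the second is a `SEARCH` using `idx_customer`, with no statistics involved.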

## 🔍 3. Check Query Plan Before ANALYZE

We run a query and inspect the plan before running ANALYZE.

In [3]:
query = "SELECT * FROM sales WHERE customer = 'Alice' AND product = 'Board Game'"
cursor.execute(f"EXPLAIN QUERY PLAN {query}")
cursor.fetchall()

[(3, 0, 62, 'SEARCH sales USING INDEX idx_product (product=?)')]

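👆 Without statistics, SQLite falls back on fixed default estimates, so it has no data-driven way to tell whether `idx_customer` or `idx_product` narrows the search more. It can use only one of the two indexes for this `AND` query, so it picks `idx_product` and checks the `customer` condition on each matching row.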
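The tuples printed above are raw `EXPLAIN QUERY PLAN` rows. A small helper makes their layout explicit: the four columns are `id`, `parent`, `notused`, and `detail`, and only `detail` describes the plan. The helper name `explain` and the empty, never-analyzed `sales` table are assumptions for this sketch; because the table is empty, the choice of index here can only come from the planner's defaults, not from the data.

```python
import sqlite3


def explain(conn, sql, params=()):
    """Run EXPLAIN QUERY PLAN and return (column names, rows)."""
    cur = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    names = [col[0] for col in cur.description]
    return names, cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, product TEXT)")
conn.execute("CREATE INDEX idx_customer ON sales(customer)")
conn.execute("CREATE INDEX idx_product ON sales(product)")

# Empty and never analyzed: the planner relies only on built-in estimates
names, rows = explain(
    conn,
    "SELECT * FROM sales WHERE customer = ? AND product = ?",
    ("Alice", "Board Game"),
)
details = [row[3] for row in rows]
print(names)     # the four EXPLAIN QUERY PLAN columns
print(details)   # a SEARCH step using one of the two indexes
```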


In [4]:
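import time

# perf_counter() is a monotonic, high-resolution clock meant for timing code
start = time.perf_counter()
cursor.execute(query)
results_before = cursor.fetchall()
elapsed = time.perf_counter() - start

print(f"Before ANALYZE: {len(results_before)} rows in {elapsed:.6f} seconds")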

Before ANALYZE: 40 rows in 0.000533 seconds
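A single sub-millisecond measurement like this is dominated by noise. A sketch of a steadier measurement uses `timeit.repeat` and keeps the fastest run. The deterministic data below (rather than `random`) is an assumption so that the row count is reproducible; it happens to yield 40 matching rows.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, product TEXT)")
conn.execute("CREATE INDEX idx_customer ON sales(customer)")
conn.execute("CREATE INDEX idx_product ON sales(product)")
customers = ["Alice", "Bob", "Carol", "David", "Eve"]
products = ["Dice Set", "Board Game", "Card Sleeves", "Miniatures", "Rulebook"]
conn.executemany(
    "INSERT INTO sales (customer, product) VALUES (?, ?)",
    [(customers[i % 5], products[(i // 5) % 5]) for i in range(1000)],
)
query = "SELECT * FROM sales WHERE customer = 'Alice' AND product = 'Board Game'"


def run_query():
    return conn.execute(query).fetchall()


# 5 rounds of 100 calls each; the fastest round is the least disturbed
# by unrelated activity on the machine
timings = timeit.repeat(run_query, number=100, repeat=5)
best_per_call = min(timings) / 100
rows = len(run_query())
print(f"{rows} rows, best of 5: {best_per_call * 1e6:.1f} µs per query")
```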


## 🧠 4. Run ANALYZE

Now we run `ANALYZE`, which scans each table and its indexes and records the resulting statistics in `sqlite_stat1`.

In [5]:
cursor.execute("ANALYZE")
conn.commit()
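`ANALYZE` with no argument covers every table in the database. SQLite also accepts a single table or index name, which is cheaper on large databases. This sketch, with a hypothetical `sales` schema and data, analyzes one index and then the whole table, and lists which indexes have statistics after each step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, product TEXT)")
conn.execute("CREATE INDEX idx_customer ON sales(customer)")
conn.execute("CREATE INDEX idx_product ON sales(product)")
conn.executemany(
    "INSERT INTO sales (customer, product) VALUES (?, ?)",
    [(f"c{i % 5}", f"p{i % 7}") for i in range(700)],
)


def analyzed_indexes():
    """Names of the indexes that currently have a row in sqlite_stat1."""
    rows = conn.execute("SELECT idx FROM sqlite_stat1 WHERE idx IS NOT NULL")
    return sorted(name for (name,) in rows)


conn.execute("ANALYZE idx_customer")   # statistics for one index only
only_one = analyzed_indexes()
conn.execute("ANALYZE sales")          # every index on the sales table
whole_table = analyzed_indexes()
print(only_one, whole_table)
```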

## 📁 5. View the Statistics Table

We inspect the contents of the `sqlite_stat1` table that `ANALYZE` created.

In [6]:
cursor.execute("SELECT * FROM sqlite_stat1")
rows = cursor.fetchall()

for row in rows:
    print(row)

('sales', 'idx_product', '1000 200')
('sales', 'idx_customer', '1000 200')


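Each `stat` string starts with the number of rows in the table (1000), followed by the average number of rows that share one value of the indexed column. Here that is 1000 rows / 5 distinct customers = 200, and likewise 1000 / 5 products = 200.

SQLite uses these figures to estimate how many rows each index will return and picks the index expected to filter out the most rows. In this dataset both indexes report 200, so the statistics rate them as equally selective.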
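The arithmetic behind `'1000 200'` can be checked by hand: count the rows, count the distinct values, and divide. This sketch uses a hypothetical single-index `sales` table with an even split over five customers. It rounds the average up, which we believe matches SQLite's rounding; with 1000 / 5 the division is exact, so rounding does not affect the result.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE INDEX idx_customer ON sales(customer)")
customers = ["Alice", "Bob", "Carol", "David", "Eve"]
conn.executemany(
    "INSERT INTO sales (customer) VALUES (?)",
    [(customers[i % 5],) for i in range(1000)],
)
conn.execute("ANALYZE")

n_rows, n_distinct = conn.execute(
    "SELECT count(*), count(DISTINCT customer) FROM sales"
).fetchone()
# Row count, then the average number of rows per distinct customer (rounded up)
expected = f"{n_rows} {math.ceil(n_rows / n_distinct)}"
actual = conn.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_customer'"
).fetchone()[0]
print(expected, "|", actual)
```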

## 🔁 6. Recheck the Query Plan After ANALYZE

In [7]:
cursor.execute(f"EXPLAIN QUERY PLAN {query}")
cursor.fetchall()

[(3, 0, 100, 'SEARCH sales USING INDEX idx_product (product=?)')]

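The plan is unchanged: SQLite still searches with `idx_product`. The numbers in each tuple are the `id`, `parent`, and `notused` fields of the `EXPLAIN QUERY PLAN` output, not cost estimates, so the change from 62 to 100 carries no meaning. This outcome is expected here, because `sqlite_stat1` reports 200 rows per value for both indexes. `ANALYZE` changes plans when the indexes differ in selectivity, for example when one column has far more distinct values than the other.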
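To see `ANALYZE` actually change a plan, the two columns need different selectivity. This sketch uses hypothetical data with 1,000 distinct customers but only 5 products, so `idx_customer` should narrow a lookup to about 1 row while `idx_product` narrows it to about 200. The plan before `ANALYZE` depends on SQLite's defaults and may vary between versions; after `ANALYZE` the planner should pick `idx_customer`.

```python
import sqlite3


def plan_detail(conn, sql):
    """Join the 'detail' fields of the EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, product TEXT)")
conn.execute("CREATE INDEX idx_product ON sales(product)")
conn.execute("CREATE INDEX idx_customer ON sales(customer)")
products = ["Dice Set", "Board Game", "Card Sleeves", "Miniatures", "Rulebook"]
# Hypothetical data: every customer is distinct, products repeat every 5 rows
conn.executemany(
    "INSERT INTO sales (customer, product) VALUES (?, ?)",
    [(f"customer{i}", products[i % 5]) for i in range(1000)],
)
query = "SELECT * FROM sales WHERE customer = 'customer1' AND product = 'Board Game'"

before = plan_detail(conn, query)
conn.execute("ANALYZE")
after = plan_detail(conn, query)
stats = dict(conn.execute("SELECT idx, stat FROM sqlite_stat1"))
print("before:", before)
print("after: ", after)
print(stats)
```

The statistics should read roughly `1000 1` for `idx_customer` and `1000 200` for `idx_product`, which is exactly the difference the planner needs to prefer the customer index.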


## 🚀 7. Run the Query and Measure Time

In [8]:
start = time.perf_counter()
cursor.execute(query)
results = cursor.fetchall()
elapsed = time.perf_counter() - start

print(f"Query returned {len(results)} rows in {elapsed:.6f} seconds")

Query returned 40 rows in 0.000385 seconds


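You should observe that:

* The query returns the same rows (40 here) before and after `ANALYZE`
* The timing difference (about 0.5 ms vs 0.4 ms) is within run-to-run noise for a query this small, so it is not evidence of a speedup
* The query plan is the same, because the statistics rate both indexes as equally selective for this uniformly distributed data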

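## 📌 Summary of ANALYZE in SQLite

- ✅ `ANALYZE` collects statistics about tables and their indexes.
- 📁 It stores them in the `sqlite_stat1` table (plus `sqlite_stat4` in builds compiled with `SQLITE_ENABLE_STAT4`).
- 🧠 The query planner uses these statistics to estimate how many rows each index will return and to choose the cheaper plan.
- ⚡ It helps most when several indexes could serve the same query and they differ in selectivity; with uniform data, as in this notebook, the plan may not change.
- 🔁 Statistics are not updated automatically, so rerun `ANALYZE` after major data changes.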
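The last point is easy to demonstrate: statistics describe the data as it was when `ANALYZE` last ran. This sketch, with a hypothetical `sales` table and helper names, analyzes 100 rows, adds 900 more, and shows that `sqlite_stat1` stays stale until `ANALYZE` runs again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE INDEX idx_customer ON sales(customer)")
customers = ["Alice", "Bob", "Carol", "David", "Eve"]


def add_rows(n):
    """Insert n rows, cycling evenly through the five customers."""
    conn.executemany(
        "INSERT INTO sales (customer) VALUES (?)",
        [(customers[i % 5],) for i in range(n)],
    )


def current_stat():
    return conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_customer'"
    ).fetchone()[0]


add_rows(100)
conn.execute("ANALYZE")
stat_initial = current_stat()     # describes 100 rows
add_rows(900)
stat_stale = current_stat()       # unchanged: no automatic refresh
conn.execute("ANALYZE")
stat_refreshed = current_stat()   # now describes 1,000 rows
print(stat_initial, "|", stat_stale, "|", stat_refreshed)
```

SQLite also offers `PRAGMA optimize`, which reruns `ANALYZE` only on tables whose statistics it judges to be out of date; the SQLite documentation suggests running it periodically on long-lived connections.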