# 📗 In-Class Activity


## 👩‍🏫 Learning Goals

- Understand how indexes improve query speed.
- Use `EXPLAIN QUERY PLAN` to check query strategies.
- Use `ANALYZE` to help SQLite make smarter choices.
- Time query performance before and after optimizations.


## 🔧 Setup: Create a Sales Table with Skewed Data

In [1]:

import sqlite3
import random
from datetime import datetime, timedelta
import time

# Create a new in-memory database
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Create the sales table
cursor.execute('''
CREATE TABLE sales (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer TEXT,
    product TEXT,
    sale_date TEXT,
    amount REAL
)
''')

# Insert 10,000 rows drawn from a weighted pool so the data is skewed:
# Bob ~73%, Alice ~21%, and Carol, David, and Eve ~2% each
customers = ['Bob'] * 7000 + ['Alice'] * 2000 + ['Carol', 'David', 'Eve'] * 200
products = ['Board Game', 'Dice Set', 'Miniatures']

for _ in range(10000):
    customer = random.choice(customers)
    product = random.choice(products)
    days_ago = random.randint(0, 180)
    date = (datetime.now() - timedelta(days=days_ago)).strftime('%Y-%m-%d')
    amount = round(random.uniform(20, 100), 2)
    cursor.execute("INSERT INTO sales (customer, product, sale_date, amount) VALUES (?, ?, ?, ?)",
                   (customer, product, date, amount))

conn.commit()
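
Before timing anything, it can help to confirm that the data really is skewed. The sketch below is a minimal, self-contained illustration: it builds a hypothetical ten-row miniature of the `sales` table (the `demo` connection and the tiny `names` list are assumptions made for the example, not part of the activity) and counts rows per customer with `GROUP BY`.

```python
import sqlite3

# Hypothetical miniature of the sales table, small enough to check by eye
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT)")
names = ['Bob'] * 7 + ['Alice'] * 2 + ['Carol']
demo.executemany("INSERT INTO sales (customer) VALUES (?)", [(n,) for n in names])

# Count rows per customer, most frequent first
counts = demo.execute(
    "SELECT customer, COUNT(*) AS n FROM sales GROUP BY customer ORDER BY n DESC"
).fetchall()

for customer, n in counts:
    print(f"{customer}: {n}")
```

Running the same `SELECT ... GROUP BY customer` through `cursor` on the real table should show Bob with roughly 7,300 rows and Carol, David, and Eve with roughly 200 each; exact counts vary because the rows are random.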


## 🔍 Activity Part 1: Run a Query Without Optimization


### Task:

1. Time how long the following query takes.
2. Use `EXPLAIN QUERY PLAN` to describe how it's executed.


In [2]:

query = "SELECT * FROM sales WHERE customer = 'Carol'"

# Time the query (perf_counter is a high-resolution clock meant for measuring intervals)
start = time.perf_counter()
cursor.execute(query)
results = cursor.fetchall()
end = time.perf_counter()

print(f"Time WITHOUT index or analyze: {end - start:.6f} seconds")
print(f"Rows returned: {len(results)}")


Time WITHOUT index or analyze: 0.000739 seconds
Rows returned: 211
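
A single sub-millisecond timing like the one above is easily distorted by caching and other activity on the machine. One possible way to get a steadier number, sketched here under assumptions, is to repeat the query and report the median. The helper name `time_query`, the `repeats` parameter, and the simplified two-column `demo` table are all hypothetical and only serve the illustration.

```python
import sqlite3
import statistics
import time

def time_query(conn, sql, repeats=20):
    """Return the median wall-clock time in seconds over several runs of sql."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical stand-in table: 10,000 rows, of which every 50th belongs to Carol
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT)")
demo.executemany(
    "INSERT INTO sales (customer) VALUES (?)",
    [("Carol" if i % 50 == 0 else "Bob",) for i in range(10000)],
)

median_seconds = time_query(demo, "SELECT * FROM sales WHERE customer = 'Carol'")
print(f"Median over 20 runs: {median_seconds:.6f} seconds")
```

In the notebook, the same helper could be called as `time_query(conn, query)` before and after each optimization so the three measurements are comparable.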


In [3]:

# Inspect the query plan
cursor.execute(f"EXPLAIN QUERY PLAN {query}")
for row in cursor.fetchall():
    print(row)


(2, 0, 216, 'SCAN sales')



### ✍️ Questions:
- Was the query using an index?
- What kind of scan did the plan indicate?
- Why might this be slow for larger datasets?


## ⚙️ Activity Part 2: Add an Index and Retry

In [4]:

cursor.execute("CREATE INDEX idx_customer ON sales(customer)")
conn.commit()


In [5]:

# Time the query again (perf_counter is a high-resolution clock meant for measuring intervals)
start = time.perf_counter()
cursor.execute(query)
results = cursor.fetchall()
end = time.perf_counter()

print(f"Time WITH index: {end - start:.6f} seconds")
print(f"Rows returned: {len(results)}")


Time WITH index: 0.000499 seconds
Rows returned: 211


In [6]:

# Check if index is now used
cursor.execute(f"EXPLAIN QUERY PLAN {query}")
for row in cursor.fetchall():
    print(row)


(3, 0, 62, 'SEARCH sales USING INDEX idx_customer (customer=?)')
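
Each row returned by `EXPLAIN QUERY PLAN` has four columns (`id`, `parent`, `notused`, `detail`), and only the last one is meant to be read by people. The sketch below pulls out just that text and compares the plan before and after creating an index. The helper name `plan_details` and the small `demo` table are assumptions made for illustration, and the exact wording of the detail text differs between SQLite versions.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable detail text of each query plan row."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    return [row[3] for row in rows]  # columns: id, parent, notused, detail

# Hypothetical stand-in for the sales table
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
demo.executemany(
    "INSERT INTO sales (customer, amount) VALUES (?, ?)",
    [("Carol" if i % 50 == 0 else "Bob", 10.0) for i in range(1000)],
)

sql = "SELECT * FROM sales WHERE customer = 'Carol'"
before = plan_details(demo, sql)   # full table scan expected
demo.execute("CREATE INDEX idx_customer ON sales(customer)")
after = plan_details(demo, sql)    # index search expected

print("Before index:", before)
print("After index: ", after)
```

Checking for the index name in the detail text (for example `'idx_customer' in after[0]`) is a simple way to confirm programmatically that the planner picked up the new index.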



### ✍️ Questions:
- Did the planner use the index this time?
- Was the performance improved?
- Why does the index help most when only a few rows match, as with “Carol”?


## 📈 Activity Part 3: Use `ANALYZE` and Rerun

In [7]:

cursor.execute("ANALYZE")
conn.commit()


In [8]:

start = time.perf_counter()
cursor.execute(query)
results = cursor.fetchall()
end = time.perf_counter()

print(f"Time WITH index AND analyze: {end - start:.6f} seconds")
print(f"Rows returned: {len(results)}")


Time WITH index AND analyze: 0.000388 seconds
Rows returned: 211


In [9]:

cursor.execute(f"EXPLAIN QUERY PLAN {query}")
for row in cursor.fetchall():
    print(row)


(3, 0, 101, 'SEARCH sales USING INDEX idx_customer (customer=?)')



### ✍️ Questions:
- Did the query plan change again?
- What does the `sqlite_stat1` table show?
- Why does `ANALYZE` matter when there are multiple indexes?
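
To explore the last question, one option is to give the planner two candidate indexes and compare the plan before and after `ANALYZE`. The sketch below is an assumption-laden illustration, not part of the original activity: the second index `idx_product`, the two-condition query, the 5,000-row `demo` table, and the fixed random seed are all hypothetical choices. Which index appears in each plan can depend on the SQLite version, so the code prints the plans rather than assuming a result.

```python
import random
import sqlite3

random.seed(0)  # fixed seed so the hypothetical data is reproducible

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, product TEXT, amount REAL)"
)
customers = ['Bob'] * 70 + ['Alice'] * 20 + ['Carol', 'David', 'Eve'] * 2
products = ['Board Game', 'Dice Set', 'Miniatures']
demo.executemany(
    "INSERT INTO sales (customer, product, amount) VALUES (?, ?, ?)",
    [(random.choice(customers), random.choice(products), 1.0) for _ in range(5000)],
)

# Two indexes that could each serve part of the WHERE clause
demo.execute("CREATE INDEX idx_customer ON sales(customer)")
demo.execute("CREATE INDEX idx_product ON sales(product)")

sql = "SELECT * FROM sales WHERE customer = 'Carol' AND product = 'Dice Set'"

def plan(conn, statement):
    """Return the detail column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute(f"EXPLAIN QUERY PLAN {statement}")]

before = plan(demo, sql)   # planner relies on built-in default estimates
demo.execute("ANALYZE")    # gathers per-index statistics into sqlite_stat1
after = plan(demo, sql)    # planner can now compare the two indexes

print("Before ANALYZE:", before)
print("After ANALYZE: ", after)
print(demo.execute("SELECT idx, stat FROM sqlite_stat1 ORDER BY idx").fetchall())
```

Comparing the two `stat` strings shows what the planner has to work with: the index whose second number is smaller narrows the search to fewer rows on average.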


## 🧪 Optional: View `sqlite_stat1`

In [10]:

cursor.execute("SELECT * FROM sqlite_stat1")
for row in cursor.fetchall():
    print(row)


('sales', 'idx_customer', '10000 2000')
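
In the `stat` string, the first number is the approximate number of rows in the index and the second is the average number of rows that share one value of the indexed column. So `'10000 2000'` reads as 10,000 rows with about 2,000 rows per distinct customer, which matches five distinct customers. The sketch below parses those two numbers; the twelve-row `demo` table is a hypothetical stand-in, and the variable names are chosen for the example.

```python
import sqlite3

# Hypothetical miniature table: 12 rows spread over 3 distinct customers
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT)")
names = ['Bob'] * 8 + ['Alice'] * 3 + ['Carol']
demo.executemany("INSERT INTO sales (customer) VALUES (?)", [(n,) for n in names])
demo.execute("CREATE INDEX idx_customer ON sales(customer)")
demo.execute("ANALYZE")

tbl, idx, stat = demo.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx = 'idx_customer'"
).fetchone()

# Keep only the first two integers; later tokens, if any, are optional flags
total_rows, avg_rows_per_value = (int(x) for x in stat.split()[:2])
distinct = demo.execute("SELECT COUNT(DISTINCT customer) FROM sales").fetchone()[0]

print(f"{idx}: {total_rows} rows, ~{avg_rows_per_value} rows per customer value")
print(f"Distinct customers: {distinct}")
```

Note that the statistic is an average: it does not reveal that Bob accounts for most rows while Carol accounts for very few, so the planner's estimate for any single customer can be far from the true count.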


## 🧠 Reflection: What Did You Learn?

💡 Summary

- Indexes help queries run faster, especially when the condition is selective (it matches only a small fraction of the rows).
- `EXPLAIN QUERY PLAN` shows whether an index is being used.
- `ANALYZE` collects statistics about the actual data, allowing the planner to make better choices.
- Query timing shows the practical effect of optimization strategies.



## ✍️ Final Prompt


In your own words, explain:

- What changed between the three runs of the query?
- How do `EXPLAIN QUERY PLAN`, `CREATE INDEX`, and `ANALYZE` work together to improve performance?
