# INDEX

The main purpose of an index in SQL is to speed up the retrieval of rows from a table. An index works like the index of a book, which lets you find the page for a specific topic without reading every page.

In SQL, an index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional space and slower writes.

## How to CREATE INDEX in SQL

```sql
CREATE INDEX index_name
ON table_name (column1, column2, ...);
```

This creates an index on the specified columns of the table. The index name must be unique within the database.
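As a minimal sketch of the statement above, the following creates an index in a throwaway in-memory SQLite database and inspects it. The table `demo_items` and the index `idx_demo_category_price` are hypothetical names chosen for illustration; `PRAGMA index_list` and `PRAGMA index_info` are SQLite-specific and other databases expose this metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_items (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
conn.execute("CREATE INDEX idx_demo_category_price ON demo_items (category, price)")

# PRAGMA index_list returns one row per index on the table; the name is the second field
index_names = [row[1] for row in conn.execute("PRAGMA index_list('demo_items')")]
print(index_names)

# PRAGMA index_info lists the indexed columns in order; the column name is the third field
index_columns = [row[2] for row in conn.execute("PRAGMA index_info('idx_demo_category_price')")]
print(index_columns)

# Reusing an existing index name is rejected
duplicate_error = None
try:
    conn.execute("CREATE INDEX idx_demo_category_price ON demo_items (price)")
except sqlite3.OperationalError as exc:
    duplicate_error = str(exc)
print(duplicate_error)
```

The last step illustrates the uniqueness rule: the second `CREATE INDEX` fails because the name is already taken, even though it targets a different column.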

## How to DROP INDEX in SQL

```sql
DROP INDEX index_name;
```

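This removes the specified index from the database; the data in the table is not affected. The exact syntax varies by database: SQLite and PostgreSQL accept the form above, while MySQL and SQL Server also require the table name (`DROP INDEX index_name ON table_name;`).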
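The sketch below shows dropping an index in SQLite and what happens when the index does not exist. The names `demo_items`, `idx_demo_category`, and the helper `list_indexes` are assumptions made for this example; `IF EXISTS` is supported by SQLite but not by every database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_items (id INTEGER PRIMARY KEY, category TEXT)")
conn.execute("CREATE INDEX idx_demo_category ON demo_items (category)")

def list_indexes(connection):
    """Return the names of all indexes recorded in SQLite's schema table."""
    rows = connection.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    return [row[0] for row in rows]

before = list_indexes(conn)
conn.execute("DROP INDEX idx_demo_category")
after = list_indexes(conn)
print(before, after)

# Dropping the same index a second time raises an error...
try:
    conn.execute("DROP INDEX idx_demo_category")
    second_drop_failed = False
except sqlite3.OperationalError:
    second_drop_failed = True

# ...unless IF EXISTS is used, which turns the statement into a no-op
conn.execute("DROP INDEX IF EXISTS idx_demo_category")
```

Using `IF EXISTS` is a common choice in setup or migration scripts that may run more than once.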

## 📘 SQLite Indexes – A Practical Demo

From this point onwards we will be using SQLite to demonstrate the concepts of indexes.

In [1]:
# First, let's import the sqlite3 module and create a new in-memory database
import sqlite3
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

### 🧱 1. Creating a Sample Table Without Indexes

Let's create a simple table to simulate sales at a board game café. Each sale records the customer name, game title, date, and sale amount.

```sql
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_name TEXT,
    game_title TEXT,
    sale_date TEXT,
    amount REAL
);
```

In [2]:
# Create a table for a board game café sales register
cursor.execute('''
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_name TEXT,
    game_title TEXT,
    sale_date TEXT,
    amount REAL
)
''')

<sqlite3.Cursor at 0x107d19940>

We now generate and insert **1,000 rows of random data** to simulate realistic querying conditions.

In [3]:
# Insert sample data
import random
from datetime import datetime, timedelta

names = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
games = ['Catan', 'Ticket to Ride', 'Carcassonne', 'Pandemic', 'Azul']

# Generate 1000 random rows
for _ in range(1000):
    name = random.choice(names)
    game = random.choice(games)
    days_ago = random.randint(0, 365)
    date = (datetime.now() - timedelta(days=days_ago)).strftime('%Y-%m-%d')
    amount = round(random.uniform(20, 60), 2)
    cursor.execute('INSERT INTO sales (customer_name, game_title, sale_date, amount) VALUES (?, ?, ?, ?)',
                   (name, game, date, amount))
    
conn.commit()

### 🔍 2. Querying Without an Index

Let’s run a query that filters by `customer_name`. Since there’s no index yet, SQLite will scan the entire table to find matches.

```sql
EXPLAIN QUERY PLAN SELECT * FROM sales WHERE customer_name = 'Alice';
```

In [4]:
query = "SELECT * FROM sales WHERE customer_name = 'Alice'"
cursor.execute(f"EXPLAIN QUERY PLAN {query}")
cursor.fetchall()

[(2, 0, 216, 'SCAN sales')]

`SCAN sales` means SQLite is doing a **full table scan**: it reads every row to check the condition, so the cost grows with the size of the table. This is inefficient for large datasets.
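The tuples returned by `EXPLAIN QUERY PLAN` can be hard to read. As a rough sketch, the helper below (a hypothetical `plan_details` function, not part of `sqlite3`) keeps only the last field of each row, which is the human-readable description. It assumes the same `sales` schema as above and runs against a fresh, empty in-memory database; the exact wording of the plan text can differ between SQLite versions.

```python
import sqlite3

def plan_details(connection, sql, params=()):
    """Return only the descriptive text of each EXPLAIN QUERY PLAN row."""
    rows = connection.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_name TEXT,
    game_title TEXT,
    sale_date TEXT,
    amount REAL
)
""")

# No index on customer_name: the planner has to scan the table
name_plan = plan_details(conn, "SELECT * FROM sales WHERE customer_name = ?", ("Alice",))
print(name_plan)

# Lookups by the INTEGER PRIMARY KEY need no extra index
pk_plan = plan_details(conn, "SELECT * FROM sales WHERE sale_id = ?", (1,))
print(pk_plan)
```

A plan that starts with `SCAN` reads the whole table, while one that starts with `SEARCH` uses the primary key or an index to jump to the matching rows.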

We can also time the query:

In [5]:
import time
start = time.time()
cursor.execute(query)
results = cursor.fetchall()
end = time.time()
print(f"Query returned {len(results)} results in {end - start:.6f} seconds")

Query returned 193 results in 0.000565 seconds


### ⚙️ 3. Creating an Index on customer_name

Let’s now create an **index** on the `customer_name` column. This should improve performance for queries filtering on that column.

```sql
CREATE INDEX idx_customer_name ON sales(customer_name);
```

In [6]:
cursor.execute('CREATE INDEX idx_customer_name ON sales(customer_name)')
conn.commit()

### ⚡ 4. Querying After Creating the Index

We run the same query again and check if the index is being used:

```sql
EXPLAIN QUERY PLAN SELECT * FROM sales WHERE customer_name = 'Alice';
```

In [7]:
cursor.execute(f"EXPLAIN QUERY PLAN {query}")
cursor.fetchall()

[(3, 0, 62, 'SEARCH sales USING INDEX idx_customer_name (customer_name=?)')]

This confirms the index is being used! Let’s check the performance again:

In [8]:
start = time.time()
cursor.execute(query)
results = cursor.fetchall()
end = time.time()
print(f"Query returned {len(results)} results in {end - start:.6f} seconds")

Query returned 193 results in 0.000567 seconds


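✅ With only 1,000 rows the two timings are practically identical (0.000565 s before the index, 0.000567 s after), so the index gives no measurable gain here. On larger tables, and for queries that match only a small fraction of the rows, the improvement becomes much more noticeable.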
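To see the effect on a larger table, here is a hedged benchmark sketch. Everything in it is an assumption made for illustration: the table `big_sales`, the index `idx_big_customer`, the 100,000-row size, and the 10,000 distinct customer names (chosen so that each query matches only a few rows). It uses `time.perf_counter()`, which is better suited to short measurements than `time.time()`, and takes the best of several runs. Actual timings will vary by machine.

```python
import random
import sqlite3
import time

random.seed(0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_sales (sale_id INTEGER PRIMARY KEY, customer_name TEXT, amount REAL)")

# 100,000 rows spread over 10,000 distinct customer names
rows = [(f"customer_{random.randrange(10_000)}", round(random.uniform(20, 60), 2))
        for _ in range(100_000)]
conn.executemany("INSERT INTO big_sales (customer_name, amount) VALUES (?, ?)", rows)
conn.commit()

query = "SELECT * FROM big_sales WHERE customer_name = 'customer_42'"

def best_time(connection, sql, repeats=20):
    """Run the query several times and return the fastest wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        connection.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    return min(timings)

scan_time = best_time(conn, query)
rows_before = conn.execute(query).fetchall()

conn.execute("CREATE INDEX idx_big_customer ON big_sales (customer_name)")

search_time = best_time(conn, query)
rows_after = conn.execute(query).fetchall()

print(f"Without index: {scan_time:.6f} s, with index: {search_time:.6f} s")
```

The query returns the same rows in both cases; only the access path changes.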

### 🧩 5. Composite Index Example

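Let’s now create a **composite index** for two columns: `game_title` and `sale_date`. This is useful when you filter by both columns or by `game_title` alone. Column order matters: SQLite generally uses a composite index only when the query constrains its leading column.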

```sql
CREATE INDEX idx_game_date ON sales(game_title, sale_date);
```

In [9]:
cursor.execute('CREATE INDEX idx_game_date ON sales(game_title, sale_date)')
conn.commit()

Now we run a query that benefits from this composite index:

```sql
EXPLAIN QUERY PLAN SELECT * FROM sales WHERE game_title = 'Catan' AND sale_date >= '2024-09-01';
```

In [10]:
query2 = "SELECT * FROM sales WHERE game_title = 'Catan' AND sale_date >= '2024-09-01'"
cursor.execute(f"EXPLAIN QUERY PLAN {query2}")
cursor.fetchall()

[(3,
  0,
  51,
  'SEARCH sales USING INDEX idx_game_date (game_title=? AND sale_date>?)')]

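The plan shows the index narrowing down rows with both conditions: an equality match on `game_title` and a range on `sale_date`.

Let's run it. Because the sample dates are generated relative to the current date, the number of matching rows depends on when the notebook is run.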

In [11]:
cursor.execute(query2)
results = cursor.fetchall()
print(f"{len(results)} results found for 'Catan' sales since September 2024")

140 results found for 'Catan' sales since September 2024
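Column order in a composite index matters. The sketch below recreates the `sales` schema and the `idx_game_date` index in a separate, empty in-memory database and compares three query plans; the local `plan` helper is an assumption introduced for this example. The exact plan text may differ between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_name TEXT,
    game_title TEXT,
    sale_date TEXT,
    amount REAL
)
""")
conn.execute("CREATE INDEX idx_game_date ON sales(game_title, sale_date)")

def plan(sql):
    """Return the descriptive text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute(f"EXPLAIN QUERY PLAN {sql}")]

# Both columns constrained: the index is used for both conditions
both = plan("SELECT * FROM sales WHERE game_title = 'Catan' AND sale_date >= '2024-09-01'")

# Only the leading column constrained: the index is still usable
leading_only = plan("SELECT * FROM sales WHERE game_title = 'Catan'")

# Only the trailing column constrained: the planner falls back to a table scan
trailing_only = plan("SELECT * FROM sales WHERE sale_date >= '2024-09-01'")

for label, detail in [("both", both), ("leading", leading_only), ("trailing", trailing_only)]:
    print(label, detail)
```

If queries frequently filter by `sale_date` alone, a separate index on that column (or a composite index with `sale_date` first) would be the usual remedy.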


### 🧹 6. Dropping Indexes

Just like you can create indexes, you can also drop them when no longer needed:

In [12]:
cursor.execute('DROP INDEX idx_customer_name')
cursor.execute('DROP INDEX idx_game_date')
conn.commit()

## 📌 Summary:

- ✅ Indexes help speed up SELECT queries that use WHERE, JOIN, or ORDER BY on indexed columns.
- 🚫 Indexes slow down write operations like INSERT, UPDATE, and DELETE.
- 🧠 Use EXPLAIN QUERY PLAN to check if indexes are actually being used.
- 🧩 Composite indexes cover multiple columns for combined filtering.
- 🧼 Remove unused indexes to save space and speed up writes.
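The write-cost point can be checked with a rough experiment. The sketch below times a bulk insert into the `sales` schema with 0, 2, and 4 secondary indexes. The function `timed_bulk_insert`, the synthetic row values, and the row count are all assumptions for illustration, and the measured times will vary from machine to machine, so treat the numbers as indicative only.

```python
import sqlite3
import time

def timed_bulk_insert(index_count, row_count=50_000):
    """Insert row_count rows into a fresh table that has index_count secondary indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        game_title TEXT,
        sale_date TEXT,
        amount REAL
    )
    """)
    indexable = ["customer_name", "game_title", "sale_date", "amount"]
    for column in indexable[:index_count]:
        conn.execute(f"CREATE INDEX idx_{column} ON sales ({column})")

    # Synthetic rows; the values only need to be varied, not realistic
    rows = [(f"customer_{i % 500}", f"game_{i % 50}", f"2024-{i % 12 + 1:02d}-01", float(i % 40 + 20))
            for i in range(row_count)]

    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO sales (customer_name, game_title, sale_date, amount) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    elapsed = time.perf_counter() - start

    inserted = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    conn.close()
    return elapsed, inserted

for index_count in (0, 2, 4):
    elapsed, inserted = timed_bulk_insert(index_count)
    print(f"{index_count} indexes: inserted {inserted} rows in {elapsed:.4f} s")
```

Each additional index has to be updated on every insert, which is why the runs with more indexes are expected to take longer.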