# SQL Indexes


# Table of Contents

1. [What is an Index?](#1-what-is-an-index)
2. [How We Use Indexes](#2-how-we-use-indexes)
   - [Example of Using Indexes](#example-of-using-indexes)
3. [When to Use Indexes or Not](#3-when-to-use-indexes-or-not)
   - [When to Use Indexes](#when-to-use-indexes)
   - [When Not to Use Indexes](#when-not-to-use-indexes)
4. [Drawbacks of Using Indexes](#4-drawbacks-of-using-indexes)
5. [B+ Trees](#5-b-trees)
   - [Why Does Using an Index Improve Lookup Speed?](#why-does-using-an-index-improve-lookup-speed)
   - [How Data Retrieval Works with B+ Trees](#how-data-retrieval-works-with-b-trees)
6. [Multi-Column Indexes](#6-multi-column-indexes)
   - [Key Considerations](#key-considerations)
   - [Example of Multi-Column Index](#example-of-multi-column-index)
7. [Covering Indexes](#7-covering-indexes)
   - [Example of a Covering Index](#example-of-a-covering-index)


## 1. What is an Index?

An **index** is a separate data structure that a database uses to look up data quickly. It is primarily used to avoid table scans, which involve scanning data row by row and can be very costly in terms of time and resources. The cost of a table scan is proportional to the number of rows in the table.
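
To make the cost difference concrete, here is a minimal Python sketch that contrasts a row-by-row scan with a lookup through a sorted side structure. The `rows` list, the `director_NNNN` values, and both function names are made up for illustration; a real database index is a B+ tree on disk, not a sorted Python list.

```python
from bisect import bisect_left

# Hypothetical "table": (id, director) rows in insertion order.
rows = [(i, f"director_{i % 1000:04d}") for i in range(100_000)]

def table_scan(rows, director):
    """Check every row; the work grows linearly with the table size."""
    examined = 0
    matches = []
    for row_id, row_director in rows:
        examined += 1
        if row_director == director:
            matches.append(row_id)
    return matches, examined

# Toy index: (director, row position) pairs kept in sorted order,
# stored separately from the table itself.
index = sorted((director, pos) for pos, (_, director) in enumerate(rows))

def index_lookup(index, rows, director):
    """Binary-search the sorted index, then read only the matching rows."""
    i = bisect_left(index, (director, -1))
    matches = []
    while i < len(index) and index[i][0] == director:
        matches.append(rows[index[i][1]][0])
        i += 1
    return matches

matches_scan, examined = table_scan(rows, "director_0042")
matches_index = index_lookup(index, rows, "director_0042")
```

The scan touches all 100,000 rows to find the matches, while the index lookup needs a logarithmic number of comparisons plus one step per matching entry.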

## 2. How We Use Indexes

Indexes are used to improve the speed of data retrieval operations by providing a quick way to look up data without scanning the entire table. They can be created on one or more columns of a table, and the database uses these indexes to efficiently find rows that match a specific condition.

### Example of Using Indexes

```sql
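-- Without an index on 'director', this query has to scan every row.
SELECT
  *
FROM
  movies
WHERE
  director = 'Guy Ritchie';

CREATE INDEX idx_director ON movies (director); -- Create an index on the 'director' column
DROP INDEX idx_director; -- Drop the index (MySQL requires: DROP INDEX idx_director ON movies)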
```

To see how the database plans to execute a query and determine if an index will be used, you can use a query plan command:

```sql
EXPLAIN QUERY PLAN
SELECT
  title
FROM
  movies
WHERE
  release_date = 2022
  AND rating = 7
  AND revenue > 100; -- The plan shows whether the table is scanned or searched through an index.
```

**Note:** `EXPLAIN QUERY PLAN` is SQLite syntax. Other databases expose query plans differently; for example, MySQL and PostgreSQL use `EXPLAIN`.
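
The effect of an index on the plan can be checked from a script. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table schema, the sample rows, and the `query_plan` helper are assumptions made for this example. The exact wording of the plan text varies between SQLite versions, so the code only stores it rather than relying on a specific string.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable detail text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

conn = sqlite3.connect(":memory:")
# Assumed schema, not taken from the original text.
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, director TEXT)")
conn.executemany(
    "INSERT INTO movies (title, director) VALUES (?, ?)",
    [("Snatch", "Guy Ritchie"), ("Heat", "Michael Mann"), ("RocknRolla", "Guy Ritchie")],
)

sql = "SELECT * FROM movies WHERE director = ?"

# No index yet: SQLite reports a full scan of the table.
plan_before = query_plan(conn, sql, ("Guy Ritchie",))

conn.execute("CREATE INDEX idx_director ON movies (director)")

# With the index in place: SQLite reports a search using idx_director.
plan_after = query_plan(conn, sql, ("Guy Ritchie",))

print(plan_before)
print(plan_after)
```

Comparing `plan_before` and `plan_after` is a quick way to confirm that a newly created index is actually picked up by the query it was meant for.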

## 3. When to Use Indexes or Not

### When to Use Indexes

1. **Frequently Used Columns in Queries**: Columns often used in `WHERE`, `ORDER BY`, or `JOIN` operations.
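2. **Columns with Unique Values**: Columns whose values are unique or nearly unique (high cardinality) benefit most from an index. Most databases create an index automatically for `PRIMARY KEY` and `UNIQUE` constraints, so those columns do not need a separate `CREATE INDEX`.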
3. **Large Tables**: Indexes are more beneficial for larger tables where full table scans are expensive.
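4. **Foreign Keys**: Indexes are useful for foreign key columns to speed up join operations. Not every database indexes foreign key columns automatically (PostgreSQL and SQLite do not, for example), so they may need an explicit index.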
5. **Multi-Column Indexes**: For queries that filter or sort by multiple columns together.
6. **Covering Indexes**: Used when the index can completely satisfy the needs of a query, avoiding the need to access the main table.
7. **Consider Update Frequency**: Avoid indexing columns that are updated frequently, because every update to an indexed column must also modify the index, which degrades write performance.
8. **Full-Text Indexes for Large Text Columns**: For large text columns, such as a movie overview, use full-text indexes rather than B+ tree indexes.

### When Not to Use Indexes

1. **Overusing Indexes**: Too many indexes can slow down `INSERT`, `UPDATE`, and `DELETE` operations because each index must be updated whenever data changes.
2. **Small Tables**: Indexes may not provide significant benefits for small tables.
3. **Columns with Low Cardinality**: Columns with many repeated values (e.g., boolean flags) are less useful for indexing, because each lookup still matches a large share of the rows.

## 4. Drawbacks of Using Indexes

* **Storage Overhead**: Indexes require additional storage space, which can be substantial for large tables or multiple indexes.
* **Slower Write Operations**: `INSERT`, `UPDATE`, and `DELETE` operations can be slower because the database must update the index each time the data changes.
* **Maintenance Overhead**: Indexes need to be maintained and updated as data is added, modified, or deleted, increasing database maintenance costs.
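
The storage overhead is easy to observe. The following sketch writes a temporary SQLite database file and compares its size before and after creating an index. The schema, the generated rows, and the row count are arbitrary assumptions; the absolute sizes depend on the SQLite version and page size, so only the direction of the change is meaningful.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movies.db")
    conn = sqlite3.connect(path)
    # Assumed schema and synthetic data, for illustration only.
    conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, director TEXT)")
    conn.executemany(
        "INSERT INTO movies (title, director) VALUES (?, ?)",
        [(f"title {i}", f"director {i % 500}") for i in range(20_000)],
    )
    conn.commit()
    size_without_index = os.path.getsize(path)

    # The index stores a copy of every 'director' value plus a row reference.
    conn.execute("CREATE INDEX idx_director ON movies (director)")
    conn.commit()
    size_with_index = os.path.getsize(path)
    conn.close()

print(size_without_index, size_with_index)
```

Each additional index repeats this cost, and every later `INSERT`, `UPDATE`, or `DELETE` has to keep those extra pages up to date.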

## 5. B+ Trees

### Why Does Using an Index Improve Lookup Speed?

Indexes are typically implemented using a **B+ tree** structure. The B+ tree is similar to a binary search tree (BST) but has several advantages:

1. **Balanced Structure**: B+ trees are always balanced, ensuring consistent performance for all operations. This balance is maintained automatically during insertion and deletion operations, making B+ trees self-balancing.
2. **Efficient Disk I/O**: B+ trees are optimized for systems where disk I/O is the bottleneck. They minimize the number of disk accesses required to locate data, which is crucial for databases that store large amounts of data on disk.
3. **Minimized Tree Height**: By allowing multiple keys and children per node, B+ trees reduce the height of the tree, ensuring that searches, insertions, and deletions can be performed quickly.
4. **Range Queries**: B+ trees are particularly good for range queries (e.g., finding all values between two keys). Because all leaf nodes are linked, it's easy to traverse from one leaf to the next without needing to go back up the tree.
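
The properties above can be illustrated with a deliberately simplified, read-only B+ tree in Python. This is a sketch under several assumptions: the tree is bulk-built from sorted, unique, non-empty keys instead of supporting inserts and node splits, nodes live in memory rather than in disk pages, and all class and function names are invented for this example.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf
        self.next = None   # link to the next leaf, used by range scans

class Internal:
    def __init__(self, children, separators):
        self.children = children      # child nodes, left to right
        self.separators = separators  # separators[i] = smallest key under children[i + 1]

def smallest_key(node):
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]

def build(sorted_keys, fanout=4):
    """Bulk-build a B+ tree bottom-up; returns (root, height)."""
    leaves = [Leaf(sorted_keys[i:i + fanout]) for i in range(0, len(sorted_keys), fanout)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    level, height = leaves, 1
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Internal(group, [smallest_key(c) for c in group[1:]]))
        level, height = parents, height + 1
    return level[0], height

def find_leaf(root, key):
    """Descend from the root to the leaf that would contain key."""
    node = root
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.separators, key)]
    return node

def contains(root, key):
    return key in find_leaf(root, key).keys

def range_scan(root, low, high):
    """Yield keys in [low, high]: one descent, then follow the leaf links."""
    leaf = find_leaf(root, low)
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return
            if key >= low:
                yield key
        leaf = leaf.next

keys = list(range(0, 1000, 2))          # 500 even numbers
root, height = build(keys, fanout=4)
wide_root, wide_height = build(keys, fanout=100)
```

With a fanout of 4 the 500 keys need a tree of height 5, while a fanout of 100 needs only height 2, which is why real B+ trees pack as many keys as possible into each disk page. `list(range_scan(root, 95, 110))` walks sideways through the linked leaves without returning to the root.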

### How Data Retrieval Works with B+ Trees

* **Clustered Index**: The actual table data is stored in the leaf nodes of the B+ tree itself. Searching for data using a clustered index involves traversing the B+ tree and directly reaching the data stored in its leaf nodes. This is typically used for primary keys in InnoDB tables in MySQL.
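* **Non-Clustered Index**: The B+ tree stores pointers (references) to the actual rows in the table. After finding the index entry using the B+ tree, a second step is required to access the actual table data using these pointers. In some engines the pointer is a physical row location; in others, such as InnoDB, it is the row's primary key value, which is then looked up in the clustered index.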
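
The difference between the two lookups can be sketched with plain Python dictionaries standing in for the B+ trees. The sample rows and function names are assumptions for illustration, and the secondary index here stores primary keys as its row references.

```python
# Clustered index: primary key -> the full row, stored where the search ends.
clustered = {
    1: {"title": "Snatch", "director": "Guy Ritchie"},
    2: {"title": "Heat", "director": "Michael Mann"},
    3: {"title": "RocknRolla", "director": "Guy Ritchie"},
}

# Non-clustered index on 'director': key -> references to rows (primary keys here).
by_director = {}
for pk, row in clustered.items():
    by_director.setdefault(row["director"], []).append(pk)

def lookup_by_pk(pk):
    """One step: the row itself is found at the end of the search."""
    return clustered[pk]

def lookup_by_director(director):
    """Two steps: find the references, then fetch each referenced row."""
    references = by_director.get(director, [])
    return [clustered[pk] for pk in references]

titles = [row["title"] for row in lookup_by_director("Guy Ritchie")]
```

The second step in `lookup_by_director` is the extra cost of a non-clustered index, and it is the step a covering index is designed to avoid.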

## 6. Multi-Column Indexes

### Key Considerations

1. **Order of Columns in Index Matters**:
   * The order in which columns are declared in a multi-column index significantly affects how the index is used by the SQL query optimizer.
   * The SQL engine can use the index for efficient lookups only if the query's `WHERE` clause conditions align with the leading (left-most) columns of the index.
2. **Index Utilization Stops After a Range Condition**:
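   * When a range condition (such as `<`, `>`, or `BETWEEN`; engines that can use an index for `!=` treat it the same way) is applied to one of the indexed columns, the index can narrow the search up to and including that column, but the columns that follow it in the index can no longer be used to narrow the search.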

### Example of Multi-Column Index
```sql
CREATE INDEX idx ON movies (revenue, release_date, rating);

EXPLAIN QUERY PLAN
SELECT
  title
FROM
  movies
WHERE
  release_date = 2022
  AND rating = 7
  AND revenue > 100; -- Only the range condition on 'revenue' (the leading column) can use the index.
```
To optimize the above query, consider changing the index order if the primary use case involves equality checks on `release_date` and `rating`:
```sql
CREATE INDEX idx_movies_optimized ON movies (release_date, rating, revenue);
```
With this revised index:

  * **`release_date = 2022` and `rating = 7`:** Both are equality conditions on the leading columns, so the index can narrow the search using both.
  * **`revenue > 100`:** The range condition now applies to the last column in the index, so within the entries that match `release_date` and `rating`, the index can still narrow the search by `revenue`.
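
Both column orders can be compared directly in SQLite. The sketch below builds a fresh in-memory `movies` table for each index definition and captures the plan for the same query. The column types, the `idx_demo` index name, and the `plan_with_index` helper are assumptions; the plan wording differs slightly between SQLite versions.

```python
import sqlite3

QUERY = """
SELECT title FROM movies
WHERE release_date = 2022 AND rating = 7 AND revenue > 100
"""

def plan_with_index(index_columns):
    """Create an empty movies table with one index and return the plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT,"
        " release_date INTEGER, rating INTEGER, revenue INTEGER)"
    )
    # index_columns is trusted demo input, so plain string formatting is fine here.
    conn.execute(f"CREATE INDEX idx_demo ON movies ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    return " | ".join(row[-1] for row in rows)

# Range column first: only the 'revenue' condition appears in the index search.
range_first = plan_with_index("revenue, release_date, rating")

# Equality columns first: all three conditions appear in the index search.
equality_first = plan_with_index("release_date, rating, revenue")

print(range_first)
print(equality_first)
```

The conditions listed in parentheses in each plan line are the ones the index is actually used for, which makes the effect of column order visible without loading any data.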

## 7. Covering Indexes
A **covering index** is an index that completely satisfies the needs of a query, meaning all the columns required by the query are included in the index itself.

### Example of a Covering Index
```sql
SELECT
  title
FROM
  movies
WHERE
  rating > 7;

CREATE INDEX idx_rating_title ON movies (rating, title); -- Holds every column the query above needs
```

In this example, even though `title` is not in the `WHERE` clause, indexing the `title` column along with `rating` allows the query to avoid additional lookups in the main table, since the index contains all the data needed to satisfy the query.

**Note:** Avoid overusing covering indexes. Every extra column makes the index larger and more expensive to maintain on writes.
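
SQLite reports when an index covers a query. The following sketch compares a query that needs only indexed columns with one that also needs a column outside the index. The schema, the extra `overview` column, the `idx_rating_title` index name, and the `plan` helper are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT,"
    " rating INTEGER, overview TEXT)"
)
conn.execute("CREATE INDEX idx_rating_title ON movies (rating, title)")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# 'rating' and 'title' are both in the index: no table access is needed.
covered = plan("SELECT title FROM movies WHERE rating > 7")

# 'overview' is not in the index, so the table rows must still be read.
not_covered = plan("SELECT title, overview FROM movies WHERE rating > 7")

print(covered)
print(not_covered)
```

The first plan mentions a `COVERING INDEX`, while the second does not, which shows that adding a single unindexed column to the `SELECT` list is enough to lose the benefit.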



## Conclusion

Indexes are powerful tools for optimizing query performance in SQL databases, but they must be used strategically to balance the benefits of faster data retrieval against the drawbacks of additional storage and maintenance overhead. Understanding when and how to use different types of indexes, including single-column, multi-column, and covering indexes, is essential for efficient database design and query optimization.
