#  Optimising SQL Queries

In this notebook, we explore how to make SQL queries more efficient and readable.



###  Learning Objectives
By the end of this notebook, you should be able to write more efficient SQL queries by:
- Selecting named columns instead of `*`
- Writing joins with `INNER JOIN ... ON` instead of join conditions in `WHERE`
- Avoiding a leading `%` wildcard in `LIKE` patterns
- Avoiding functions on columns when searching for patterns
- Avoiding calculated fields in the `JOIN` and `WHERE` clauses


##  Connecting to the Database

We'll use the **US Flights database** to demonstrate optimisation techniques.

Make sure your file `flights.sqlite.db` is in the same folder as this notebook.


In [28]:
%reload_ext sql

In [29]:
%sql sqlite:///flights.sqlite.db

##  View All Tables in the Database
Let's confirm which tables are available in the `flights.sqlite.db` database.


In [30]:
%%sql
SELECT name FROM sqlite_master WHERE type='table';


   sqlite:///flights.db
   sqlite:///flights.sqlite
 * sqlite:///flights.sqlite.db
Done.


name
airports
carriers
flights
planes
sysdiagrams


##  Understanding the Database Structure

The **Flights** database consists of the following tables:

- **flights:** Contains records of all domestic flights in the USA in 2008, including details such as flight number, tail number, departure delay, arrival delay, and other flight-related metrics.  
- **carriers:** A lookup table that provides information about each airline carrier, including carrier codes and their full descriptions.  
- **airports:** A lookup table that contains data about all airports, including airport codes, names, cities, and locations.  
- **planes:** A lookup table with detailed information about aircraft, such as tail numbers, manufacturers, and models.  
- **sysdiagrams:** A system table created by SQL Server's database diagram tooling, most likely carried over when the data was exported to SQLite. It stores diagram metadata and is not used by the queries in this notebook.
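
To see which columns a table holds before writing queries against it, SQLite's `PRAGMA table_info` can be used. The sketch below runs against a small in-memory stand-in for `carriers` (an assumption for illustration, containing only the `Code` and `Description` columns used later in this notebook); against the real file you would connect to `flights.sqlite.db` instead.

```python
import sqlite3

# Hypothetical in-memory stand-in; the notebook itself connects to flights.sqlite.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carriers (Code TEXT PRIMARY KEY, Description TEXT)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(carriers)")]
print(columns)
```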


##  Why Optimise SQL Queries?

SQL queries can become slow depending on data size and complexity.  
Efficient queries:
- Reduce runtime
- Prevent server overload
- Scale better as data grows

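We'll use the `%%time` cell magic to measure each query's execution time. Wall time varies between runs (for example, because of caching), so treat the numbers as rough comparisons rather than precise benchmarks.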
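
Outside a notebook, a similar measurement can be taken with `time.perf_counter`. This is a minimal sketch, assuming a small in-memory `flights` table with made-up rows; `best_time` is a hypothetical helper, not part of the notebook. Taking the fastest of several runs reduces noise from caching and other processes.

```python
import sqlite3
import time

# Hypothetical sample data standing in for the real flights table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (TailNum TEXT, DepDelay INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [(f"N{i}", i % 60) for i in range(10_000)],
)

def best_time(query, repeats=5):
    """Return the fastest wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(query).fetchall()  # fetch all rows so the query runs to completion
        timings.append(time.perf_counter() - start)
    return min(timings)

elapsed = best_time("SELECT TailNum, DepDelay FROM flights")
print(f"best of 5 runs: {elapsed * 1000:.2f} ms")
```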


##  1. Use Column Names Instead of `*`

Avoid `SELECT *`: it reads and returns every column, including those the analysis never uses, which adds I/O and memory cost on wide tables.

###  Inefficient Query


In [31]:
%%time
%%sql
SELECT * FROM flights;


Done.
CPU times: total: 2.66 s
Wall time: 9.02 s


###  Efficient Query
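Select only the columns we need and limit the number of rows. Note that the two timings are not a like-for-like comparison: `LIMIT 50` accounts for much of the difference, because the first query returns every row in the table.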


In [32]:
%%time
%%sql

SELECT TailNum, DepDelay
FROM flights
LIMIT 50;


Done.
CPU times: total: 0 ns
Wall time: 4.99 ms
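
To isolate the effect of the column list, both queries can be run with the same `LIMIT`. The sketch below assumes a cut-down `flights` table with one made-up row and uses a hypothetical helper, `column_names`, to show how many fields each query returns.

```python
import sqlite3

# Hypothetical, reduced version of the flights table with a single sample row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights ("
    "FlightNum INTEGER, TailNum TEXT, Origin TEXT, Distance INTEGER, DepDelay INTEGER)"
)
conn.execute("INSERT INTO flights VALUES (335, 'N712SW', 'IAD', 810, 8)")

def column_names(query):
    """Return the names of the columns a query produces."""
    cursor = conn.execute(query)
    return [description[0] for description in cursor.description]

all_columns = column_names("SELECT * FROM flights LIMIT 50")
needed_columns = column_names("SELECT TailNum, DepDelay FROM flights LIMIT 50")

print(len(all_columns), "columns with *")
print(len(needed_columns), "columns when named explicitly")
```

On the real table, which has many more columns than this stand-in, the gap between the two is correspondingly larger.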


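##  2. Write Joins with `INNER JOIN` Instead of `WHERE`

When joining tables, state the join condition with `INNER JOIN ... ON` rather than listing the tables with commas and joining them in the `WHERE` clause. SQLite plans both forms the same way, as the identical timings below show, so the gain is mainly readability: the join condition stays separate from the row filters.

###  Implicit Join (`WHERE`)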


In [33]:
%%time
%%sql

SELECT FlightNum, TailNum, Description
FROM flights, carriers
WHERE carriers.Code = flights.UniqueCarrier
LIMIT 2500;


Done.
CPU times: total: 0 ns
Wall time: 26.9 ms


###  Explicit Join (`INNER JOIN`)


In [34]:
%%time
%%sql

SELECT FlightNum, TailNum, Description
FROM flights
INNER JOIN carriers
ON carriers.Code = flights.UniqueCarrier
LIMIT 2500;


Done.
CPU times: total: 0 ns
Wall time: 26.9 ms
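
The matching timings can be checked against the query plan. The following sketch assumes small in-memory versions of `flights` and `carriers` with made-up rows, and compares the `EXPLAIN QUERY PLAN` output and the results of the two join styles; `plan` is a hypothetical helper.

```python
import sqlite3

# Hypothetical sample tables; the row values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE carriers (Code TEXT PRIMARY KEY, Description TEXT);
CREATE TABLE flights (FlightNum INTEGER, TailNum TEXT, UniqueCarrier TEXT);
INSERT INTO carriers VALUES ('WN', 'Southwest Airlines Co.'), ('AA', 'American Airlines Inc.');
INSERT INTO flights VALUES (335, 'N712SW', 'WN'), (1, 'N338AA', 'AA'), (99, 'N000XX', 'ZZ');
""")

implicit = """
SELECT FlightNum, TailNum, Description
FROM flights, carriers
WHERE carriers.Code = flights.UniqueCarrier
"""
explicit = """
SELECT FlightNum, TailNum, Description
FROM flights
INNER JOIN carriers
ON carriers.Code = flights.UniqueCarrier
"""

def plan(query):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

same_plan = plan(implicit) == plan(explicit)
same_rows = sorted(conn.execute(implicit)) == sorted(conn.execute(explicit))
print("same plan:", same_plan)
print("same rows:", same_rows)
```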


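##  3. Avoid Using `%` at the Beginning of a `LIKE` Pattern

A pattern that starts with `%` can match anywhere in the string, so an index on the column cannot narrow the search and every row must be checked. Note that `'%N7%'` and `'N7%'` are not equivalent: the second matches only values that start with `N7`, so use it only when a prefix match is what you need.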

###  Inefficient Query


In [35]:
%%time
%%sql

SELECT TailNum, Distance
FROM flights
WHERE TailNum LIKE '%N7%'
LIMIT 2500;


Done.
CPU times: total: 15.6 ms
Wall time: 53.9 ms


###  Efficient Query


In [36]:
%%time
%%sql

SELECT TailNum, Distance
FROM flights
WHERE TailNum LIKE 'N7%'
LIMIT 2500;


Done.
CPU times: total: 15.6 ms
Wall time: 17 ms
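
The difference shows up in the query plan once the column is indexed. This sketch assumes an empty in-memory `flights` table and a hypothetical index, `idx_tailnum`. Because SQLite's `LIKE` is case-insensitive by default, the index is declared with `COLLATE NOCASE`; without that, SQLite would not use it for a prefix pattern either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (TailNum TEXT, Distance INTEGER)")
# Hypothetical index; NOCASE matches the default case-insensitive behaviour of LIKE.
conn.execute("CREATE INDEX idx_tailnum ON flights (TailNum COLLATE NOCASE)")

def plan(query):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

prefix_plan = plan("SELECT TailNum, Distance FROM flights WHERE TailNum LIKE 'N7%'")
wildcard_plan = plan("SELECT TailNum, Distance FROM flights WHERE TailNum LIKE '%N7%'")

print("prefix pattern:  ", prefix_plan)    # expected to be an index SEARCH
print("leading wildcard:", wildcard_plan)  # expected to be a full SCAN
```

The exact wording of the plan varies between SQLite versions, but a `SEARCH` indicates an index range lookup and a `SCAN` indicates that every row is visited.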


##  4. Avoid Using Functions When Searching for Patterns

Wrapping a column in a function (such as `LOWER()`) in the `WHERE` clause means the function is evaluated for every row, and an ordinary index on that column cannot be used.

###  Inefficient Query


In [37]:
%%time
%%sql

SELECT Origin, DepDelay
FROM flights
WHERE LOWER(Origin) = 'las'
LIMIT 7000;


Done.
CPU times: total: 62.5 ms
Wall time: 257 ms
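
An index-friendly alternative to the `LOWER(Origin)` filter is to compare the bare column against a literal in the stored case. The sketch below assumes airport codes are stored in upper case (as in `'LAS'`) and uses an empty in-memory table with a hypothetical index, `idx_origin`, to compare the two plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (Origin TEXT, DepDelay INTEGER)")
# Hypothetical index on the filtered column.
conn.execute("CREATE INDEX idx_origin ON flights (Origin)")

def plan(query):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

# Function applied to the column: the index on Origin cannot be used.
function_plan = plan("SELECT Origin, DepDelay FROM flights WHERE LOWER(Origin) = 'las'")
# Bare column compared with a literal (assumes codes are stored in upper case).
direct_plan = plan("SELECT Origin, DepDelay FROM flights WHERE Origin = 'LAS'")

print("with LOWER():", function_plan)
print("bare column: ", direct_plan)
```

If mixed-case values do need to match, an index on the expression itself or a `NOCASE` collation are possible alternatives, at the cost of extra setup.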


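##  5. Avoid Using Calculated Fields in `JOIN` and `WHERE` Clauses

An expression on a column in a `WHERE` or `JOIN` condition (for example `AirTime / 5`) is evaluated for every row and prevents an ordinary index on that column from being used. The query below names the calculated value in a CTE and then filters on that name.

###  Efficient Query (Using a CTE)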


In [38]:
%%time
%%sql

WITH CTE AS (
    SELECT AirTime / 5 AS fifth_time, DepDelay, TailNum
    FROM flights
)
SELECT TailNum, DepDelay
FROM CTE
WHERE CTE.fifth_time < CTE.DepDelay
LIMIT 5000;


Done.
CPU times: total: 62.5 ms
Wall time: 43.9 ms
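
Because the planner may treat a simple CTE much like the inline expression, it is worth checking the plan rather than relying on timings alone. The sketch below illustrates the general rule with a simpler case than the notebook's query: it compares the column against a constant instead of `DepDelay`, and moves the arithmetic to the constant side so the column stays bare. The table, rows, and index `idx_airtime` are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample data with integer air times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (TailNum TEXT, AirTime INTEGER, DepDelay INTEGER)")
conn.execute("CREATE INDEX idx_airtime ON flights (AirTime)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("N1", 40, 5), ("N2", 50, 12), ("N3", 49, 0), ("N4", 120, 30)],
)

# Calculation on the column: evaluated per row, index on AirTime unusable.
calculated = "SELECT TailNum FROM flights WHERE AirTime / 5 < 10"
# Same condition with the arithmetic moved to the constant (10 * 5 = 50).
rearranged = "SELECT TailNum FROM flights WHERE AirTime < 50"

def plan(query):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

same_rows = sorted(conn.execute(calculated)) == sorted(conn.execute(rearranged))
calculated_plan = plan(calculated)
rearranged_plan = plan(rearranged)

print("same rows:", same_rows)
print("calculated:", calculated_plan)
print("rearranged:", rearranged_plan)
```

This rewrite only applies when the expression can be solved for the column; a condition that compares two columns of the same row, such as the notebook's `AirTime / 5 < DepDelay`, still needs a per-row evaluation.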


##  Summary

Optimising SQL queries helps:
- Reduce processing time
- Improve scalability
- Keep code readable and maintainable

Always:
1. Select only the columns you need
2. Write joins with explicit `INNER JOIN ... ON`
3. Avoid a leading `%` in `LIKE` patterns
4. Avoid wrapping columns in functions in `WHERE`
5. Avoid inline calculations in `WHERE` or `JOIN` clauses
