In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook.
%load_ext sql

In [2]:
# Load the Flights database stored on your local machine.
# Make sure the file is saved in the same folder as this notebook.
%sql sqlite:///flights-a-3982.db

**Use column names instead of the * symbol**

Rather than using * in a SELECT statement, which returns every column in the table, we should name only the columns we are interested in, so that less data has to be read and returned. For example, say we are only interested in the tail number and departure delay of the first 50 flights.

**Inefficient query**

In [None]:
%%time
%%sql

SELECT *
FROM flights

 * sqlite:///flights-a-3982.db
Done.
CPU times: total: 17.8 s
Wall time: 27.4 s


**Efficient query**

In [None]:
%%time
%%sql

SELECT TailNum, DepDelay
FROM flights
LIMIT 50
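
The difference in the amount of data returned can be illustrated outside the notebook with Python's built-in `sqlite3` module. This is a minimal sketch: the in-memory table, its six columns, and its 200 made-up rows are assumptions standing in for the real `flights` table, which is much larger.

```python
import sqlite3

# Hypothetical miniature of the flights table (made-up rows, fewer columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (FlightNum INTEGER, TailNum TEXT, DepDelay INTEGER, "
    "Origin TEXT, Dest TEXT, Distance INTEGER)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?, ?, ?)",
    [(100 + i, f"N7{i:03d}", i % 30, "LAS", "SFO", 414) for i in range(200)],
)

# SELECT * returns every column of every row.
all_rows = conn.execute("SELECT * FROM flights").fetchall()

# Naming the columns and limiting the rows returns only what was asked for.
narrow_rows = conn.execute(
    "SELECT TailNum, DepDelay FROM flights LIMIT 50"
).fetchall()

print(len(all_rows), len(all_rows[0]))        # rows, columns from SELECT *
print(len(narrow_rows), len(narrow_rows[0]))  # rows, columns from the narrow query
```

On the real database the gap is far larger, since every extra column is read and transferred for every row.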

**Create joins with INNER JOIN**

When extracting data from two tables and joining them on specific columns, we could in most cases list both tables in the FROM clause and match the columns in the WHERE clause. An explicit INNER JOIN with an ON clause states the join condition directly instead. Assume we want to add each carrier's description to the flight data. That requires querying both the carriers table and the flights table for rows where Code in the carriers table matches UniqueCarrier in the flights table.

**Inefficient query**

In [None]:
%%time
%%sql

SELECT FlightNum, TailNum, Description
FROM flights, carriers
WHERE carriers.Code = flights.UniqueCarrier
LIMIT 2500;

**Efficient query**

In [None]:
%%time
%%sql

SELECT FlightNum, TailNum, Description
FROM flights
INNER JOIN carriers
ON carriers.Code = flights.UniqueCarrier
LIMIT 2500;
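
One way to check how the two forms behave on a given database is to compare their results and their query plans. The sketch below uses Python's `sqlite3` module with a tiny in-memory stand-in for the two tables; the rows, the carrier codes, and the helper `plan` are all hypothetical.

```python
import sqlite3

# Hypothetical stand-ins for the carriers and flights tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE carriers (Code TEXT PRIMARY KEY, Description TEXT);
CREATE TABLE flights (FlightNum INTEGER, TailNum TEXT, UniqueCarrier TEXT);
INSERT INTO carriers VALUES ('XX', 'Carrier X'), ('YY', 'Carrier Y');
INSERT INTO flights VALUES (335, 'N712XX', 'XX'), (1, 'N338YY', 'YY'),
                           (99, 'N000ZZ', 'ZZ');
""")

implicit_sql = """
SELECT FlightNum, TailNum, Description
FROM flights, carriers
WHERE carriers.Code = flights.UniqueCarrier
"""

explicit_sql = """
SELECT FlightNum, TailNum, Description
FROM flights
INNER JOIN carriers
ON carriers.Code = flights.UniqueCarrier
"""

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

rows_implicit = sorted(conn.execute(implicit_sql).fetchall())
rows_explicit = sorted(conn.execute(explicit_sql).fetchall())

print(rows_implicit == rows_explicit)  # do both forms return the same rows?
print(plan(implicit_sql))
print(plan(explicit_sql))
```

Running the same `plan` helper against the real database shows how SQLite intends to execute each form there.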

**Avoid using the wildcard (%) at the beginning of a LIKE pattern**

When a column is indexed, the database can search the index instead of scanning every row, making it faster to find the values we are looking for. Whether the % wildcard appears at the beginning or the end of the LIKE pattern affects whether that index can be used: a pattern that starts with % can match anywhere in the value, so it generally forces a scan of the whole table.

Suppose we want the tail number and distance of all flights whose tail number starts with N7. Note that the pattern '%N7%' also matches tail numbers that merely contain N7, so it is less precise as well as slower.

### Inefficient query

In [None]:
%%time
%%sql

SELECT TailNum, Distance
FROM flights
WHERE TailNum LIKE '%N7%'
LIMIT 2500;

### Efficient query

In [7]:
%%time
%%sql

SELECT TailNum, Distance
FROM flights
WHERE TailNum LIKE 'N7%'
LIMIT 2500;

 * sqlite:///flights-a-3982.db
Done.
CPU times: total: 93.8 ms
Wall time: 2.21 s
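
The effect of the leading wildcard can be observed in SQLite's query plan. The following is a sketch under stated assumptions: the in-memory table, its made-up rows, and the index `idx_flights_tailnum` are hypothetical, and the index is declared with `COLLATE NOCASE` because SQLite's default LIKE is case-insensitive and only uses an index with a matching collation.

```python
import sqlite3

# Hypothetical miniature of the flights table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (TailNum TEXT, Distance INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("N712SW", 810), ("N772SW", 515), ("N3N7AA", 1200), ("N428WN", 688)],
)
# Assumed index; NOCASE matches the default case-insensitive LIKE.
conn.execute(
    "CREATE INDEX idx_flights_tailnum ON flights (TailNum COLLATE NOCASE)"
)

prefix_sql = "SELECT TailNum, Distance FROM flights WHERE TailNum LIKE 'N7%'"
contains_sql = "SELECT TailNum, Distance FROM flights WHERE TailNum LIKE '%N7%'"

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

prefix_plan = plan(prefix_sql)
contains_plan = plan(contains_sql)
prefix_rows = sorted(conn.execute(prefix_sql).fetchall())
contains_rows = sorted(conn.execute(contains_sql).fetchall())

print(prefix_plan)    # plan for the trailing-wildcard pattern
print(contains_plan)  # plan for the leading-wildcard pattern
print(prefix_rows)
print(contains_rows)
```

The two patterns are not interchangeable: in this toy data the made-up tail number N3N7AA matches '%N7%' but not 'N7%'.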


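**Avoid using functions when searching for patterns**

Let's say we also want to return the origin and departure delay of all flights from a specific origin, here LAS. Wrapping the column in a function such as LOWER() in the WHERE clause means the function generally has to be evaluated for every row before the comparison can be made.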

### Inefficient query

In [None]:
%%time
%%sql

SELECT Origin, DepDelay
FROM flights
WHERE LOWER(Origin) = 'las'
LIMIT 7000;

### Efficient query

In [8]:
%%time
%%sql

WITH CTE AS
(SELECT LOWER(Origin) Origin_lower, DepDelay FROM flights)

SELECT * FROM CTE WHERE Origin_lower = 'las'
LIMIT 7000;

 * sqlite:///flights-a-3982.db
Done.
CPU times: total: 391 ms
Wall time: 1.01 s
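
A related alternative, different from the CTE shown above, is to compare the bare column against a value in the stored case so that an index on the column can be used. The sketch below assumes that origin codes are stored in upper case and that an index named `idx_flights_origin` exists; both are assumptions, as are the made-up rows.

```python
import sqlite3

# Hypothetical miniature of the flights table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (Origin TEXT, DepDelay INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("LAS", 12), ("SFO", 3), ("LAS", -2), ("JFK", 45)],
)
conn.execute("CREATE INDEX idx_flights_origin ON flights (Origin)")  # assumed index

# Function applied to the column: the index on Origin cannot be used directly.
wrapped_sql = "SELECT Origin, DepDelay FROM flights WHERE LOWER(Origin) = 'las'"
# Bare column compared with a literal in the stored (upper) case.
plain_sql = "SELECT Origin, DepDelay FROM flights WHERE Origin = 'LAS'"

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

wrapped_plan = plan(wrapped_sql)
plain_plan = plan(plain_sql)
wrapped_rows = sorted(conn.execute(wrapped_sql).fetchall())
plain_rows = sorted(conn.execute(plain_sql).fetchall())

print(wrapped_plan)
print(plain_plan)
print(wrapped_rows == plain_rows)  # same rows only if the stored case is consistent
```

If the stored values mix upper and lower case, the two queries no longer return the same rows, so this rewrite only applies when the data is consistently cased.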


**Avoid using calculated fields in the JOIN and WHERE clauses**

Suppose we want to return the flights where the departure delay was more than 20% of the flight time. We would need to calculate the value of 20% of the AirTime column and then compare that to the DepDelay column for each row.

### Inefficient query

In [None]:
%%time
%%sql

SELECT TailNum, DepDelay
FROM flights
WHERE AirTime/5 < DepDelay
LIMIT 5000;

### Efficient query

In [9]:
%%time
%%sql

WITH CTE AS
(SELECT AirTime/5 as fifth_time, DepDelay, TailNum FROM flights)

SELECT TailNum, DepDelay
FROM CTE
WHERE CTE.fifth_time < CTE.DepDelay
LIMIT 5000;

 * sqlite:///flights-a-3982.db
Done.
CPU times: total: 46.9 ms
Wall time: 234 ms
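
A single `%%time` measurement is sensitive to caching and to whatever else the machine is doing, so it can help to run each form several times and to confirm that both return the same rows. The following sketch does this with Python's `sqlite3` module; the in-memory table, its 50,000 generated rows, and the helper `best_time` are hypothetical.

```python
import sqlite3
import time

# Hypothetical miniature of the flights table with generated rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (TailNum TEXT, AirTime INTEGER, DepDelay INTEGER)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [(f"N{i:05d}", 60 + i % 240, i % 90) for i in range(50_000)],
)

inline_sql = """
SELECT TailNum, DepDelay
FROM flights
WHERE AirTime/5 < DepDelay
"""

cte_sql = """
WITH CTE AS
(SELECT AirTime/5 AS fifth_time, DepDelay, TailNum FROM flights)
SELECT TailNum, DepDelay
FROM CTE
WHERE CTE.fifth_time < CTE.DepDelay
"""

def best_time(sql, repeats=5):
    """Return the fastest wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Both forms should select exactly the same flights.
same_rows = sorted(conn.execute(inline_sql).fetchall()) == sorted(
    conn.execute(cte_sql).fetchall()
)

print(same_rows)
print(f"inline: {best_time(inline_sql):.4f} s")
print(f"CTE:    {best_time(cte_sql):.4f} s")
```

Taking the minimum of several runs reduces the influence of one-off effects such as a cold disk cache on the comparison.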
