# Query Explain Plans


<span style='font-size:1.2em'>This lab will pull some queries from previous activities and review the *Explain Plans*, or *Query Plans*.</span>

You are strongly encouraged to use `EXPLAIN` on all queries you write before you try to execute them.
We will look at a couple of bad queries to understand why.



In [None]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dsa_ro

In [None]:
%%sql
EXPLAIN ANALYZE
SELECT COUNT(*) FROM cities;

In [None]:
%%sql
EXPLAIN ANALYZE
SELECT COUNT(*) FROM cities WHERE country = 'India';

In both queries above, with or without the `WHERE` clause, the plan is a sequential scan of the table.
This is driven by the size of the table: recall that it has only 352 rows, so reading the whole table is cheap.
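
To check the table size the planner is working with, the system catalog can be queried directly. This is a minimal sketch, assuming the read-only lab user is allowed to read `pg_class`; note that `reltuples` and `relpages` are estimates maintained by the database, not exact counts. It can be run from a `%%sql` cell.

```sql
-- Planner statistics for the table: estimated row count and number of disk pages.
-- Assumption: 'cities' is an ordinary table visible to the current user.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'cities'
  AND relkind = 'r';
```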


---

By contrast, let us look at a larger table with 3295 rows.
A regular `COUNT` gets a table scan: `Seq Scan on us_second_order_divisions`.


However, adding the `WHERE` clause allows an index to come into play.
The index element of the plan in this case is `Bitmap Index Scan on us_second_order_divisions_pkey`.
We will discuss indexing within databases at the end of this module.
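
To see which indexes exist on a table before reading its plan, the `pg_indexes` system view can be queried. This is a minimal sketch, assuming the view is readable by the lab's read-only user; it can be run from a `%%sql` cell.

```sql
-- List every index defined on the table, with the statement that created it.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'us_second_order_divisions';
```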


In [None]:
%%sql
EXPLAIN ANALYZE
SELECT COUNT(*) FROM us_second_order_divisions;

In [None]:
%%sql
EXPLAIN ANALYZE
SELECT COUNT(*) FROM us_second_order_divisions
WHERE state_number_code = 25;

## Explain versus Explain Analyze

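You may notice that the cells above use `EXPLAIN ANALYZE` rather than plain `EXPLAIN`.
Plain `EXPLAIN` only shows the plan the optimizer has chosen, with estimated costs; `EXPLAIN ANALYZE` actually executes the query and reports real timings and row counts.
I used it above because I know these queries work and I know that running them will not drag down the database.

It is generally a good idea to `EXPLAIN` first and then, once you trust your SQL, `EXPLAIN ANALYZE`.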
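
Since `EXPLAIN ANALYZE` really runs the statement, one optional safeguard is a session-level `statement_timeout`. The sketch below is an illustration rather than part of the lab: the 5 second limit is an arbitrary value, and it assumes the `%%sql` cells share a single database session so that the setting stays in effect between statements.

```sql
-- Cancel any statement in this session that runs longer than 5 seconds
-- (arbitrary illustrative limit).
SET statement_timeout = '5s';

-- Plan only: nothing is executed.
EXPLAIN
SELECT country, MIN(population)
FROM cities
GROUP BY country;

-- Plan and execute, now protected by the timeout.
EXPLAIN ANALYZE
SELECT country, MIN(population)
FROM cities
GROUP BY country;

-- Restore the default timeout for the session.
RESET statement_timeout;
```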


**Take note of the differences in output for the same SQL without and with the `ANALYZE` option.**

In [None]:
%%sql 
EXPLAIN
SELECT country, MIN(population) 
FROM cities 
GROUP BY country;

In [None]:
%%sql 
EXPLAIN ANALYZE
SELECT country, MIN(population) 
FROM cities 
GROUP BY country;

## Aggregates 

We see that a `HashAggregate` node is used to perform the grouping and apply the aggregate function to each group of rows.
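
One way to see what the planner would do without hashing is to discourage it for the current session with the `enable_hashagg` planner setting. This is a sketch for experimentation only. With hashing discouraged, the planner will typically fall back to sorting the rows and grouping them in order, but the exact plan depends on the server version and the table statistics.

```sql
-- Discourage hash aggregation for this session only.
SET enable_hashagg = off;

-- Plan only; compare the aggregate node with the HashAggregate plan.
EXPLAIN
SELECT country, MIN(population)
FROM cities
GROUP BY country;

-- Restore the default planner behaviour.
RESET enable_hashagg;
```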

In [None]:
%%sql 
EXPLAIN ANALYZE
SELECT country, count(*) 
FROM cities 
GROUP BY country 
HAVING count(*) > 10;

## Sorting is expensive!

We previously used the SQL below to build up our understanding of aggregations.

Examine each of the `EXPLAIN` plans and try to correlate those to parts of the SQL.
Tuning a database is as much an **art** as a science.
The first step, however, is learning how to read explain plans and understanding how query structure and the data within a table affect the cost-based optimizer of a DBMS.

In [None]:
%%sql 
EXPLAIN ANALYZE
SELECT S.state_name, count(*)
FROM us_second_order_divisions as C
JOIN util_us_states as S
  ON (C.state_number_code=S.state_number_code)
GROUP BY S.state_name;

In [None]:
%%sql 
EXPLAIN ANALYZE
SELECT S.state_name, count(*)
FROM us_second_order_divisions as C
JOIN util_us_states as S
  ON (C.state_number_code=S.state_number_code)
GROUP BY S.state_name
ORDER BY S.state_name;

In [None]:
%%sql 
EXPLAIN ANALYZE
SELECT S.state_name, count(*)
FROM us_second_order_divisions as C
JOIN util_us_states as S
  ON (C.state_number_code=S.state_number_code)
GROUP BY S.state_name
HAVING COUNT(*) BETWEEN 10 AND 30
ORDER BY COUNT(*) DESC;

## <span style="background:yellow">Your Turn!</span>

Examine the **cross-product** query using `EXPLAIN` first, and then answer the question below.



In [None]:
%%sql

EXPLAIN ANALYZE
SELECT S.state_name, count(*)
FROM us_second_order_divisions as C
, util_us_states as S
GROUP BY S.state_name
HAVING COUNT(*) BETWEEN 10 AND 30
ORDER BY COUNT(*) DESC;
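
The comma-separated `FROM` list above has no join condition, so every row of one table is paired with every row of the other. As a sketch, the same query can be written with an explicit `CROSS JOIN` and inspected with plain `EXPLAIN`, which plans the query without running it:

```sql
-- Plan only: the cross product is estimated but never executed.
EXPLAIN
SELECT S.state_name, count(*)
FROM us_second_order_divisions AS C
CROSS JOIN util_us_states AS S
GROUP BY S.state_name
HAVING COUNT(*) BETWEEN 10 AND 30
ORDER BY COUNT(*) DESC;
```

Compare the estimated row count on the join node with the row counts of the two input tables.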


# Save your Notebook, then `File > Close and Halt`

---