## SQL is a declarative language

In an _imperative_ language, you spell out the specific steps the program should take. SQL is a _declarative_ language: you specify your intent, and it's up to the language implementation to work out the low-level steps that satisfy it.
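To make the difference concrete, here's a small sketch in Python (this notebook's kernel language) that finds the latest amount per transaction two ways. The first is an imperative loop where we spell out every comparison. The second is a declarative SQL query where we only describe the result. It uses Python's built-in `sqlite3` module as a stand-in database, not Redshift, and the toy rows and function names are made up for illustration.

```python
import sqlite3

# Toy history rows (hypothetical): (transaction_id, when_modified, amount)
history = [
    (1, "2020-01-01", 10),
    (1, "2020-01-02", 12),
    (2, "2020-01-01", 5),
]

def latest_amounts_imperative(rows):
    """Imperative: we dictate every step (loop, compare, overwrite)."""
    latest = {}
    for tid, when, amount in rows:
        if tid not in latest or when > latest[tid][0]:
            latest[tid] = (when, amount)
    return {tid: amount for tid, (_when, amount) in latest.items()}

def latest_amounts_declarative(rows):
    """Declarative: we describe the result and let SQLite plan the steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (transaction_id INTEGER, when_modified TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT h.transaction_id, h.amount
        FROM history AS h
        WHERE h.when_modified = (
            SELECT MAX(when_modified) FROM history
            WHERE transaction_id = h.transaction_id
        )
    """).fetchall()
    conn.close()
    return dict(result)

print(latest_amounts_imperative(history))  # {1: 12, 2: 5}
print(latest_amounts_declarative(history))
```

Both functions compute the same mapping; only the first one says *how*.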

Usually, that's great. You get to program at a high level and the database takes care of all the details. 

Until it's not. Eventually you'll write a query that simply runs too slowly, and you'll need to think more deeply about how the database executes it. The component of a SQL database that translates your query into concrete execution steps is called the _query planner_.

You can use the `EXPLAIN` command to peek at the plan your database has prepared to satisfy your query.
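Redshift's own `EXPLAIN` output appears later in this notebook. If you want to experiment without a cluster, SQLite (bundled with Python) offers a similar facility called `EXPLAIN QUERY PLAN` (plain `EXPLAIN` in SQLite prints low-level bytecode instead). The sketch below is only an analogue, not Redshift behavior: it shows the planner switching from a full-table scan to an index search once an index exists. The table and index names are hypothetical, and because SQLite's plan wording varies between versions, the sketch relies only on the `SCAN`/`SEARCH` keywords.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions_history (transaction_id INTEGER, amount INTEGER)")

query = "SELECT amount FROM transactions_history WHERE transaction_id = 42"

before = plan_details(conn, query)  # no index yet: the planner must scan the table
conn.execute("CREATE INDEX idx_tid ON transactions_history (transaction_id)")
after = plan_details(conn, query)   # with the index: the planner can search it

print(before)
print(after)
conn.close()
```

The query text never changed; only the planner's choice of steps did, which is exactly what `EXPLAIN` lets you observe.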

We're going to run through a case study where `EXPLAIN` helped us turn our "current state" query into something that performs well on an Amazon Redshift database with 1 TB+ of data.

# Set up the environment

We're connecting to Redshift, but the connection string below starts with `postgresql://` because we talk to Redshift through a PostgreSQL driver. Redshift was originally forked from PostgreSQL, and its compatible interface lets it present itself as an old version of PostgreSQL.

```
%load_ext sql
```

```sql
%%sql postgresql://ops:MyPr3cious@somehost:5439/dev

select 'hi there' as "message";
```

```
1 rows affected.

message
hi there
```


# Create the history table

```sql
%%sql

DROP TABLE if exists public.transactions_history;

CREATE TABLE public.transactions_history (
    operation character(6) NOT NULL,
    when_modified timestamp without time zone NOT NULL,
    transaction_id integer,
    user_id text,
    merchant text,
    amount numeric,
    when_created timestamp without time zone
)
DISTKEY (transaction_id)
SORTKEY (transaction_id, when_modified)
;
```


Query plans are read from the bottom up. If we naively copy the table structure and query from the previous notebook, Redshift's explain plan starts (at the bottom) with two troubling and expensive steps:

- DISTRIBUTE
- SORT

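See the docs on [Redshift query plans](https://docs.aws.amazon.com/redshift/latest/dg/c-the-query-plan.html) for all the details. The table we created above already sets `DISTKEY (transaction_id)`, so the plan for the first query below has no DISTRIBUTE step, but an `XN Sort` step remains.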
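Why does distribution matter for this query? Redshift spreads a table's rows across slices, and a window partitioned by `transaction_id` can only be computed locally if all of a transaction's rows already sit on the same slice; otherwise they have to be shuffled between slices first, which is roughly what a DISTRIBUTE step represents. The toy model below compares KEY-style distribution (hash the key, as `DISTKEY (transaction_id)` requests) with EVEN-style round-robin distribution. The slice count and Python's built-in `hash` are stand-ins; Redshift uses its own hashing and cluster layout.

```python
from collections import defaultdict

NUM_SLICES = 4  # hypothetical number of slices; real clusters vary

def slices_by_key(rows, num_slices=NUM_SLICES):
    """KEY-style: hash transaction_id to pick a slice (toy hash, not Redshift's)."""
    placement = defaultdict(set)
    for tid, _when in rows:
        placement[tid].add(hash(tid) % num_slices)
    return placement

def slices_round_robin(rows, num_slices=NUM_SLICES):
    """EVEN-style: deal rows out in turn, ignoring their content."""
    placement = defaultdict(set)
    for i, (tid, _when) in enumerate(rows):
        placement[tid].add(i % num_slices)
    return placement

# Three history rows for each of ten transactions: (transaction_id, when_modified)
rows = [(tid, t) for tid in range(10) for t in range(3)]

by_key = slices_by_key(rows)
even = slices_round_robin(rows)

# Largest number of slices any single transaction is spread across.
print(max(len(s) for s in by_key.values()))  # 1
print(max(len(s) for s in even.values()))    # 3
```

With KEY distribution every transaction lives on one slice, so `partition by transaction_id` needs no shuffle; with round-robin each transaction here is scattered across three slices.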

```sql
%%sql

explain select transaction_id, user_id, merchant, amount, when_created
from (
    select *,
    row_number() over (
        partition by transaction_id
        order by transaction_id, when_modified desc
    ) as n
    from transactions_history
) as latest
where n = 1
and operation != 'delete'
;
```

```
10 rows affected.

QUERY PLAN
XN Subquery Scan latest (cost=1000000000003.33..1000000000006.53 rows=1 width=833)
  Filter: ((n = 1) AND (operation <> 'delete'::bpchar))
  -> XN Window (cost=1000000000003.33..1000000000005.33 rows=80 width=863)
       Partition: transaction_id
       Order: transaction_id, when_modified
       -> XN Sort (cost=1000000000003.33..1000000000003.53 rows=80 width=863)
            Sort Key: transaction_id, when_modified
            -> XN Seq Scan on transactions_history (cost=0.00..0.80 rows=80 width=863)
----- Tables missing statistics: transactions_history -----
----- Update statistics by running the ANALYZE command on these tables -----
```
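Reading this plan from the bottom up: a sequential scan, then an `XN Sort` on `transaction_id, when_modified`, then the window. The notable difference from the next query is the direction of `when_modified`: this one asks for `desc`, while the table's compound `SORTKEY (transaction_id, when_modified)` keeps rows in ascending order. The sketch below models the idea that a sort can be skipped only when the requested order is already the stored order. It's a simplification for intuition, not Redshift's actual planning logic, and the helper names are hypothetical.

```python
# Rows as stored under SORTKEY (transaction_id, when_modified): ascending.
stored = sorted([(2, 1), (1, 3), (1, 1), (2, 2), (1, 2)])

def already_in_order(rows, key):
    """True if rows are already ordered by `key`, i.e. no sort step is needed."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))

def asc_key(row):
    """ORDER BY transaction_id, when_modified"""
    return (row[0], row[1])

def desc_key(row):
    """ORDER BY transaction_id, when_modified DESC"""
    return (row[0], -row[1])

print(already_in_order(stored, asc_key))   # True
print(already_in_order(stored, desc_key))  # False
```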


```sql
%%sql

explain select transaction_id, user_id, merchant, amount, when_created
from (
    select *,
    row_number() over (
        partition by transaction_id
        order by transaction_id, when_modified
    ) as n,
    count(*) over (
        partition by transaction_id
        order by transaction_id, when_modified
        rows between unbounded preceding and unbounded following
    ) as c
    from transactions_history
) as latest
where n = c
and operation != 'delete'
;
```

```
8 rows affected.

QUERY PLAN
XN Subquery Scan latest (cost=0.00..4.20 rows=1 width=833)
  Filter: ((c = n) AND (operation <> 'delete'::bpchar))
  -> XN Window (cost=0.00..3.00 rows=80 width=863)
       Partition: transaction_id
       Order: transaction_id, when_modified
       -> XN Seq Scan on transactions_history (cost=0.00..0.80 rows=80 width=863)
----- Tables missing statistics: transactions_history -----
----- Update statistics by running the ANALYZE command on these tables -----
```
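This rewrite orders each partition ascending, matching the sort key, and the `XN Sort` step is gone from its plan. To still pick the latest row, it numbers rows oldest-first with `row_number()` and counts the rows in each partition with `count(*)` over the whole frame; the row where `n = c` is the last, i.e. most recent, one. Here's a quick check, again using SQLite as a stand-in for Redshift on a handful of made-up rows, that both formulations return the same current state. It verifies results only (SQLite's plans say nothing about Redshift's), and it needs SQLite 3.25 or newer for window functions, which recent Python builds bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions_history (
        operation TEXT, when_modified TEXT, transaction_id INTEGER, amount INTEGER
    )
""")
# Hypothetical history: transaction 2 ends in a delete, so it should disappear.
conn.executemany(
    "INSERT INTO transactions_history VALUES (?, ?, ?, ?)",
    [
        ("insert", "2020-01-01", 1, 10),
        ("update", "2020-01-02", 1, 12),
        ("insert", "2020-01-01", 2, 5),
        ("delete", "2020-01-03", 2, 5),
        ("insert", "2020-01-02", 3, 7),
    ],
)

# First query: number rows newest-first and keep row 1.
desc_query = """
    SELECT transaction_id, amount FROM (
        SELECT *, row_number() OVER (
            PARTITION BY transaction_id
            ORDER BY transaction_id, when_modified DESC
        ) AS n
        FROM transactions_history
    ) AS latest
    WHERE n = 1 AND operation != 'delete'
    ORDER BY transaction_id
"""

# Rewritten query: number rows oldest-first (matching the sort key) and keep
# the row whose number equals the partition's row count, i.e. the last one.
asc_query = """
    SELECT transaction_id, amount FROM (
        SELECT *,
            row_number() OVER (
                PARTITION BY transaction_id
                ORDER BY transaction_id, when_modified
            ) AS n,
            count(*) OVER (
                PARTITION BY transaction_id
                ORDER BY transaction_id, when_modified
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
            ) AS c
        FROM transactions_history
    ) AS latest
    WHERE n = c AND operation != 'delete'
    ORDER BY transaction_id
"""

desc_rows = conn.execute(desc_query).fetchall()
asc_rows = conn.execute(asc_query).fetchall()
print(desc_rows)  # [(1, 12), (3, 7)]
print(asc_rows)   # [(1, 12), (3, 7)]
conn.close()
```

The extra `count(*)` window looks like more work, but because it shares the same partitioning and ascending order as `row_number()`, the plan above evaluates both in one `XN Window` step on top of the scan.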
