## Using Jupyter Environment

Let us understand how to connect to Redshift from a Jupyter environment.
* Here are the prerequisites for setting up a Jupyter environment with SQL magic.
  * Set up JupyterLab on your PC, Mac, or Ubuntu-based machine.
  * Make sure to install **ipython-sql**. It installs **SQLAlchemy** as a dependency.
  * Even though we can connect to Redshift directly using the **psycopg2** library, it is recommended to install the Redshift dialect using `pip install sqlalchemy-redshift`.
  * The Redshift dialect (sqlalchemy-redshift) works on top of the **psycopg2** (Postgres) library.
* Feel free to follow [Setup labs on Ubuntu 18.04 VM on GCP using Docker to learn Python and SQL](https://www.youtube.com/playlist?list=PLf0swTFhTI8qOGXb3e6BmqHGQ-tnsP51q)
  * The playlist also covers setting up Postgres. Once you learn PostgreSQL, you should be able to pick up Redshift comfortably.
* Once the prerequisites are taken care of, you can connect to Redshift using the same approach as PostgreSQL.
  * Load SQL magic
  * Create an environment variable with the connection string
  * Start running queries against Redshift tables
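
The steps above hinge on the connection string. As a minimal sketch, assuming you would rather assemble it in Python than type it by hand, the hypothetical helper `build_redshift_url` below builds a `redshift+psycopg2://` URL and stores it in `DATABASE_URL`, the same variable this section sets with `%env`. The host, user, and password are placeholders, and percent-encoding the credentials is an added precaution rather than something the original setup does.

```python
import os
from urllib.parse import quote


def build_redshift_url(user, password, host, database, port=5439):
    """Return a SQLAlchemy-style URL for the sqlalchemy-redshift dialect.

    Hypothetical helper for illustration; not part of any library.
    """
    # Percent-encode credentials so characters such as '@' or '/'
    # cannot be mistaken for URL delimiters.
    return (
        f"redshift+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values; substitute your own cluster endpoint and credentials.
url = build_redshift_url(
    user="example_user",
    password="p@ss/word",
    host="example-cluster.example.com",
    database="retail_db",
)

# Has the same effect as running `%env DATABASE_URL=...` in a notebook cell.
os.environ["DATABASE_URL"] = url
```

Building the URL in code also makes it possible to read the password from a prompt or a secrets store instead of leaving it in the notebook.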

In [2]:
%load_ext sql

In [3]:
%env DATABASE_URL=redshift+psycopg2://retail_user:Retail_Passw0rd@redshift-cluster-1.ckxblouy7rzo.us-east-1.redshift.amazonaws.com:5439/retail_db

env: DATABASE_URL=redshift+psycopg2://retail_user:Retail_Passw0rd@redshift-cluster-1.ckxblouy7rzo.us-east-1.redshift.amazonaws.com:5439/retail_db


In [9]:
%%sql

SELECT * FROM information_schema.tables
WHERE table_name ~ 'sequence'
LIMIT 10

 * redshift+psycopg2://retail_user:***@redshift-cluster-1.ckxblouy7rzo.us-east-1.redshift.amazonaws.com:5439/retail_db
3 rows affected.


| table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | user_defined_name |
|---|---|---|---|---|---|---|---|---|
| retail_db | pg_catalog | pg_statio_user_sequences | VIEW | | | | | |
| retail_db | pg_catalog | pg_statio_sys_sequences | VIEW | | | | | |
| retail_db | pg_catalog | pg_statio_all_sequences | VIEW | | | | | |
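
The `~` operator in the query above is a POSIX regular expression match, so `table_name ~ 'sequence'` keeps any name that contains `sequence` anywhere in it. As a rough analogy only (Python's `re` module is not the same regex dialect as Redshift's), the sketch below applies an unanchored search to the three names returned above, plus `orders` as a non-matching name for contrast.

```python
import re

# The first three names come from the query output above;
# "orders" is included only as a name that should not match.
table_names = [
    "pg_statio_user_sequences",
    "pg_statio_sys_sequences",
    "pg_statio_all_sequences",
    "orders",
]

# re.search matches anywhere in the string, like an unanchored `~` pattern.
pattern = re.compile("sequence")
matches = [name for name in table_names if pattern.search(name)]

print(matches)
```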


In [10]:
%%sql

SELECT count(1) FROM orders

 * postgresql://retail_user:***@34.229.28.240:5439/retail_db
1 rows affected.


| count |
|---|
| 68883 |


In [11]:
%%sql

SELECT * FROM (
    SELECT nq.*,
        dense_rank() OVER (
            PARTITION BY order_date
            ORDER BY revenue DESC
        ) AS drnk
    FROM (
        SELECT o.order_date,
            oi.order_item_product_id,
            round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
        FROM orders o 
            JOIN order_items oi
                ON o.order_id = oi.order_item_order_id
        WHERE o.order_status IN ('COMPLETE', 'CLOSED')
        GROUP BY o.order_date, oi.order_item_product_id
    ) nq
) nq1
WHERE drnk <= 5
ORDER BY order_date, revenue DESC
LIMIT 20

 * postgresql://retail_user:***@34.229.28.240:5439/retail_db
20 rows affected.


| order_date | order_item_product_id | revenue | drnk |
|---|---|---|---|
| 2013-07-25 00:00:00 | 1004 | 5600 | 1 |
| 2013-07-25 00:00:00 | 191 | 5099 | 2 |
| 2013-07-25 00:00:00 | 957 | 4500 | 3 |
| 2013-07-25 00:00:00 | 365 | 3359 | 4 |
| 2013-07-25 00:00:00 | 1073 | 3000 | 5 |
| 2013-07-26 00:00:00 | 1004 | 10799 | 1 |
| 2013-07-26 00:00:00 | 365 | 7979 | 2 |
| 2013-07-26 00:00:00 | 957 | 6900 | 3 |
| 2013-07-26 00:00:00 | 191 | 6799 | 4 |
| 2013-07-26 00:00:00 | 1014 | 4798 | 5 |
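
To make the window function concrete, here is a small pure-Python sketch of what `dense_rank() OVER (PARTITION BY order_date ORDER BY revenue DESC)` computes. It uses the 2013-07-25 rows from the output above plus one hypothetical tied row (product 9999) to show that ties share a rank and that no rank is skipped afterwards. The function name `dense_rank_by_date` is made up for illustration; this is not how Redshift evaluates the query internally.

```python
from collections import defaultdict


def dense_rank_by_date(rows):
    """Attach a dense rank per order_date, highest revenue first.

    rows: list of (order_date, product_id, revenue) tuples.
    Returns a list of (order_date, product_id, revenue, drnk) tuples.
    """
    # Collect the distinct revenue values seen in each partition.
    revenues = defaultdict(set)
    for order_date, _, revenue in rows:
        revenues[order_date].add(revenue)

    # Rank = 1-based position among the partition's distinct revenues,
    # sorted descending, so equal revenues get equal ranks with no gaps.
    rank_of = {
        date: {rev: i for i, rev in enumerate(sorted(revs, reverse=True), 1)}
        for date, revs in revenues.items()
    }
    ranked = [(d, p, r, rank_of[d][r]) for d, p, r in rows]
    # Mirror ORDER BY order_date, revenue DESC.
    return sorted(ranked, key=lambda t: (t[0], -t[2]))


rows = [
    ("2013-07-25", 1004, 5600),
    ("2013-07-25", 191, 5099),
    ("2013-07-25", 9999, 5099),  # hypothetical tie, not in the real data
    ("2013-07-25", 957, 4500),
    ("2013-07-25", 365, 3359),
    ("2013-07-25", 1073, 3000),
]

# Same filter as WHERE drnk <= 5 in the outer query.
top5 = [row for row in dense_rank_by_date(rows) if row[3] <= 5]
for row in top5:
    print(row)
```

Because the two tied rows both receive rank 2, filtering on `drnk <= 5` can return more than five rows per date, which is worth keeping in mind when reading the query results.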
