# L2 Exercise 3: Creating Fact and Dimension Tables with Star Schema

<img src="https://wiki.postgresql.org/images/a/a4/PostgreSQL_logo.3colors.svg" width="250" height="250">

<center><h1><span style='color:blue'>Environment preparation</span></h1></center>

The Udacity environment is prepared to ease the student's task; for instance, it provides a Postgres instance for the training exercises.

Here we create our own instance on Kubernetes instead:

* Add the Psycopg2 module to Python
* Deploy PostgreSQL on K8s

In [1]:
# Load package
#!pip install psycopg2-binary
#!pip install pandas --upgrade
#!pip install sqlalchemy --upgrade # ORM for databases
#!pip install ipython-sql --upgrade # SQL magic function

<h3><span style='color:blue'>Using K8S PostgreSQL</span></h3>

You need a Kubernetes cluster available, such as Minikube, Minishift or Docker (with K8s enabled).

Helm 3 is needed too; see [helm.sh](http://helm.sh).

In [2]:
from time import sleep
import os

In [3]:
helm_version = !helm version --short
assert helm_version[0][:2] == 'v3', "Expected HELM version not available, visit https://helm.sh"

#!curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
#!chmod 700 /tmp/get_helm.sh
#!ls -al /tmp/
#!./get_helm.sh

In [4]:
!helm repo add bitnami https://charts.bitnami.com/bitnami

"bitnami" has been added to your repositories


In [5]:
CHART_INSTANCE_NAME = 'dend-l2e3'
os.environ['postgresql_port_instance_name'] = CHART_INSTANCE_NAME + "-postgresql"
os.getenv('postgresql_port_instance_name')

'dend-l2e3-postgresql'

In [6]:
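# Installs from the "stable" chart repo, which must already be configured (only bitnami is added above).
# bitnami/postgresql is an alternative, but its NOTES, parsed by line index below, may be laid out differently.
helm_chart_out = !helm install {CHART_INSTANCE_NAME} stable/postgresql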

In [7]:
postgresql_port_forward_command = helm_chart_out[-2].strip()
os.environ['postgresql_port_forward_command'] = postgresql_port_forward_command
os.getenv('postgresql_port_forward_command')

'kubectl port-forward --namespace default svc/dend-l2e3-postgresql 5432:5432 &'
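Picking lines out of the Helm output by position (`helm_chart_out[-2]`, `[15]`, `[19]`) only works while the chart's NOTES keep exactly the same layout. A minimal sketch of a more tolerant alternative, assuming the NOTES still contain a line with `kubectl port-forward`: search for the first line that contains a marker string. The helper `find_notes_line` is hypothetical, and the `notes` list below is an illustrative stand-in (only the port-forward line comes from the output above).

```python
from typing import Iterable, Optional


def find_notes_line(lines: Iterable[str], marker: str) -> Optional[str]:
    """Return the first line (stripped) that contains `marker`, or None if no line does."""
    for line in lines:
        if marker in line:
            return line.strip()
    return None


# Hypothetical excerpt of Helm NOTES; only the port-forward line is taken from the real output above
notes = [
    "other NOTES text",
    "",
    "    kubectl port-forward --namespace default svc/dend-l2e3-postgresql 5432:5432 &",
    "",
]
print(find_notes_line(notes, "kubectl port-forward"))
```

In the notebook the same call would be made on `helm_chart_out`, which IPython returns as a list-like of output lines.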

In [8]:
# Wait until the PostgreSQL pod is running (up to 20 checks, 5 seconds apart)
max_checks_postgresql_run = 20

!kubectl get pods

while max_checks_postgresql_run > 0:

    postgres_is_running = !kubectl get pods|fgrep {CHART_INSTANCE_NAME}|fgrep "1/1"|fgrep "Running"
    
    if len(postgres_is_running) > 0 and not postgres_is_running[0] == 'No resources found.':
        break
    else:
        sleep(5)

        max_checks_postgresql_run -= 1

!kubectl get pods
assert max_checks_postgresql_run > 0, "Probably Postgresql is not running"

NAME                     READY   STATUS    RESTARTS   AGE
dend-l2e3-postgresql-0   0/1     Pending   0          0s
NAME                     READY   STATUS    RESTARTS   AGE
dend-l2e3-postgresql-0   1/1     Running   0          31s
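The cell above polls `kubectl get pods` up to 20 times with a 5-second pause between checks. The same polling pattern can be factored into a reusable helper; this is a sketch, and `wait_until` is a hypothetical name, not part of any library used here.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], attempts: int = 20, delay: float = 5.0) -> bool:
    """Call `predicate` up to `attempts` times, sleeping `delay` seconds between failed tries.

    Returns True as soon as the predicate succeeds, False if every attempt fails.
    """
    for attempt in range(attempts):
        if predicate():
            return True
        if attempt < attempts - 1:  # no point sleeping after the last attempt
            time.sleep(delay)
    return False


# Usage with a stand-in predicate that succeeds on its third call
calls = {"n": 0}


def pod_ready() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(pod_ready, attempts=5, delay=0.01))  # True
```

In the notebook, the predicate would wrap the `kubectl get pods | fgrep ...` check, for example via `subprocess.run`; separating "what to check" from "how often to retry" keeps the retry policy in one place.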


<h3><span style='color:blue'>Open Proxy to PostgreSQL on K8s</span></h3>
Run the next command in a separate terminal (if you are not running it from Jupyter ;-))

In [9]:
%%script env --bg bash --out console_out
nohup kubectl port-forward --namespace default svc/dend-l2e3-postgresql 5432:5432 &

#%%script env postgresql_port_forward_command="$postgresql_port_forward_command" --bg bash --out console_out
#nohup kubectl port-forward --namespace default svc/dend-l2e3-postgresql 5432:5432 &

In [10]:
# Extract the password command from line 15 of the Helm NOTES (the text inside "$(...)") and run it
postgresql_password = helm_chart_out[15].split('(')[1][:-1]
postgresql_password = !{postgresql_password}
postgresql_password = postgresql_password[0]
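The slicing `split('(')[1][:-1]` assumes that NOTES line has the form `export POSTGRES_PASSWORD=$(<command>)`. A slightly more explicit sketch, under that same assumption, pulls the command out of a trailing `$(...)` with a regular expression and returns `None` if the line does not match. `extract_subshell_command` is a hypothetical helper, and the sample line is illustrative, not copied from real output.

```python
import re
from typing import Optional

# Matches "$(...)" running to the end of the line; the greedy group keeps any inner parentheses.
_SUBSHELL = re.compile(r"\$\((.*)\)\s*$")


def extract_subshell_command(line: str) -> Optional[str]:
    """Return the command inside a trailing `$(...)`, or None if the line has none."""
    match = _SUBSHELL.search(line)
    return match.group(1) if match else None


# Illustrative line in the shape the slicing above expects
line = (
    'export POSTGRES_PASSWORD=$(kubectl get secret --namespace default dend-l2e3-postgresql '
    '-o jsonpath="{.data.postgresql-password}" | base64 --decode)'
)
print(extract_subshell_command(line))
```

The extracted command would then be run the same way as in the cell above (`!{command}`).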

In [11]:
# Build the psql console command from line 19 of the Helm NOTES, with the password filled in
k8s_psql_command = helm_chart_out[19].strip().replace("$POSTGRES_PASSWORD", postgresql_password) + " -c "

In [12]:
!ps -ef|fgrep 'kubectl port-forward'

  501 12340     1   0  9:30PM ??         0:00.14 kubectl port-forward --namespace default svc/dend-l2e3-postgresql 5432:5432
  501 12344 12268   0  9:30PM ttys003    0:00.01 /bin/sh -c ps -ef|fgrep 'kubectl port-forward'
  501 12346 12344   0  9:30PM ttys003    0:00.00 fgrep kubectl port-forward


In [13]:
# Checks if proxy is enabled
pids_kubectl_proxy = !ps -ef|fgrep 'kubectl port-forward'|fgrep $CHART_INSTANCE_NAME|cut -d ' ' -f4
assert len(pids_kubectl_proxy) > 1, f"No kubectl proxy found, try in a console: '{postgresql_port_forward_command}'"
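Finding the `kubectl port-forward` process in `ps` output shows that the proxy was started, not that it is accepting connections. A complementary sketch, assuming the forward listens on `127.0.0.1:5432` as in the command above, simply tries to open a TCP connection. `port_is_open` is a hypothetical helper built on the standard `socket` module.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out (socket.timeout is an OSError subclass)
        return False


# In this notebook the forwarded PostgreSQL port would be checked with:
# assert port_is_open("127.0.0.1", 5432), "Port-forward to PostgreSQL is not accepting connections"
```

A successful TCP connect still does not prove PostgreSQL is answering queries; the `SELECT 1` check below covers that.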

<h3><span style='color:blue'>Check PostgreSQL Availability</span></h3>
We have created a PostgreSQL instance on K8s infrastructure; next we test whether it is available.

In [14]:
# Checks postgresql connection
select_1_postgresql_out = !{k8s_psql_command} 'SELECT 1;'
assert len(select_1_postgresql_out) > 0, 'Postgresql -select 1- failed, check it'
!{k8s_psql_command} 'SELECT version();'

                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
(1 row)

pod "dend-l2e3-postgresql-client" deleted


<h3><span style='color:blue'>Initialize PostgreSQL Student DB for the Exercise</span></h3>

In [15]:
!{k8s_psql_command} "CREATE ROLE student WITH LOGIN ENCRYPTED PASSWORD 'student'"

CREATE ROLE
pod "dend-l2e3-postgresql-client" deleted


In [16]:
!{k8s_psql_command} 'alter user student createdb;'

ALTER ROLE
pod "dend-l2e3-postgresql-client" deleted


In [17]:
!{k8s_psql_command} 'create database studentdb;'

If you don't see a command prompt, try pressing enter.
CREATE DATABASE
pod "dend-l2e3-postgresql-client" deleted


In [18]:
!{k8s_psql_command} 'grant all privileges on database studentdb to student;'

GRANT
pod "dend-l2e3-postgresql-client" deleted


In [19]:
!{k8s_psql_command} 'SELECT usename, usecreatedb FROM pg_user;'

 usename  | usecreatedb 
----------+-------------
 postgres | t
 student  | t
(2 rows)

pod "dend-l2e3-postgresql-client" deleted


# Lesson 2 Demo 3: Creating Fact and Dimension Tables with Star Schema

## Walk through the basics of modeling data using Fact and Dimension tables. You will create both Fact and Dimension tables and see how they form the basic elements of a Star Schema.

#### Import the library 
Note: An error might pop up after this command has executed. If it does, read it carefully before ignoring it.

In [20]:
import psycopg2

JupyterLab supports "magics", shorthand commands that speed up common tasks such as running SQL queries :-)

References:
* https://github.com/catherinedevlin/ipython-sql
* https://towardsdatascience.com/jupyter-magics-with-sql-921370099589
* https://www.datacamp.com/community/tutorials/sql-interface-within-jupyterlab

In [21]:
%load_ext sql


### Create a connection to the database

In [22]:
try: 
    conn = psycopg2.connect("host=127.0.0.1 dbname=studentdb user=student password=student")
except psycopg2.Error as e: 
    print("Error: Could not make connection to the Postgres database")
    print(e)

In [23]:
%sql postgresql://student:student@localhost:5432/studentdb
%sql SELECT 1 as Test

 * postgresql://student:***@localhost:5432/studentdb
1 rows affected.


test
1
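The `%sql` magic above takes a connection URL of the form `postgresql://user:password@host:port/dbname`. A password containing characters such as `@`, `:` or `/` would break that URL unless it is percent-encoded; SQLAlchemy's documentation recommends `urllib.parse.quote_plus` for this. A minimal sketch, where `make_pg_url` is a hypothetical helper whose defaults simply mirror this exercise:

```python
from urllib.parse import quote_plus


def make_pg_url(user: str, password: str, host: str = "localhost",
                port: int = 5432, dbname: str = "studentdb") -> str:
    """Build a postgresql:// URL, percent-encoding the credentials so that
    characters like '@', ':' or '/' in a password cannot break the URL."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{dbname}"


print(make_pg_url("student", "student"))
# postgresql://student:student@localhost:5432/studentdb
print(make_pg_url("student", "p@ss:w/rd"))
# postgresql://student:p%40ss%3Aw%2Frd@localhost:5432/studentdb
```

With the simple `student` password nothing needs escaping, so the first result is exactly the URL used in the cell above.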


### Next, use that connection to get a cursor that we will use to execute queries.

In [24]:
try: 
    cur = conn.cursor()
except psycopg2.Error as e: 
    print("Error: Could not get cursor to the Database")
    print(e)

#### For this demo we will use automatic commit so that each action is committed without having to call conn.commit() after each command. The ability to roll back and commit transactions is a feature of relational databases.

In [25]:
conn.set_session(autocommit=True)
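psycopg2 needs a running server, so this sketch uses Python's built-in `sqlite3` module as a stand-in to show what autocommit changes. In `sqlite3`, `isolation_level=None` plays roughly the role of psycopg2's `set_session(autocommit=True)`; the two drivers handle transactions differently in the details, so treat this only as an illustration of the concept.

```python
import sqlite3

# Transactional mode (sqlite3's default): the INSERT opens an implicit transaction,
# so rollback() discards it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.rollback()
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()

# Autocommit mode: isolation_level=None makes every statement take effect immediately,
# so there is nothing left for rollback() to undo.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE t (x INTEGER)")
auto.execute("INSERT INTO t VALUES (1)")
auto.rollback()
rows_with_autocommit = auto.execute("SELECT COUNT(*) FROM t").fetchone()[0]
auto.close()

print(rows_after_rollback, rows_with_autocommit)  # 0 1
```

Autocommit is convenient for a demo like this one; for multi-step changes that must succeed or fail together, an explicit transaction with `commit()`/`rollback()` is the safer choice.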

### Let's imagine we work at an online music store. There will be many tables in our database, but let's focus on just 4 tables around customer purchases.

`Table Name: customer_transactions`
* column: Customer Id
* column: Store Id
* column: Spent

`Table Name: customer`
* column: Customer Id
* column: Name
* column: Rewards

`Table Name: store`
* column: Store Id
* column: State

`Table Name: items_purchased`
* column: Customer Id
* column: Item Number
* column: Item Name

#### From this representation we can already start to see the makings of a "STAR". We have one fact table (the center of the star) and 3 dimension tables radiating from it.
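To make the star shape concrete without a database, here is a plain-Python sketch: dimension tables as dictionaries keyed by ID, the fact table as rows of foreign keys plus a measure (`spent`), and a lookup that plays the role of a JOIN. The rows are the ones inserted later in this exercise; `items_purchased` is left out for brevity, and `describe` is a hypothetical helper.

```python
# Dimension tables, keyed by their ID (data from this exercise)
customers = {1: {"name": "Amanda", "rewards": True}, 2: {"name": "Toby", "rewards": False}}
stores = {1: {"state": "CA"}, 2: {"state": "WA"}}

# Fact table: each row holds foreign keys into the dimensions plus the measure "spent"
customer_transactions = [
    {"customer_id": 1, "store_id": 1, "spent": 20.50},
    {"customer_id": 2, "store_id": 1, "spent": 35.21},
]


def describe(fact: dict) -> dict:
    """Resolve a fact row's foreign keys against the dimensions, as a JOIN would."""
    customer = customers[fact["customer_id"]]
    store = stores[fact["store_id"]]
    return {"name": customer["name"], "state": store["state"], "spent": fact["spent"]}


for row in customer_transactions:
    print(describe(row))
```

The fact rows stay narrow (keys and numbers only), while descriptive attributes such as names and states live once in the dimensions.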

### Create the Fact Table and insert the data into the table

In [26]:
## Creating the fact table "customer_transactions"
try:
    cur.execute("""
    CREATE TABLE IF NOT EXISTS customer_transactions
    (
        customer_id int,
        store_id int,
        spent real
    )
    """)
    print ("Table created")
except psycopg2.Error as e:
    print("Error: Could not create table")
    print(e)

Table created
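The exercise's tables declare no keys, so nothing stops a fact row from pointing at a store that does not exist. As an optional variation (not what this exercise does), the star's links can be declared with `PRIMARY KEY` on the dimensions and `REFERENCES` on the fact table. The sketch uses Python's built-in `sqlite3` so it runs without a server; the same clauses are standard SQL that PostgreSQL accepts and enforces by default, whereas SQLite needs `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, rewards BOOLEAN);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE customer_transactions (
        customer_id INTEGER REFERENCES customer (customer_id),
        store_id    INTEGER REFERENCES store (store_id),
        spent       REAL
    );
    INSERT INTO customer VALUES (1, 'Amanda', 1), (2, 'Toby', 0);
    INSERT INTO store VALUES (1, 'CA'), (2, 'WA');
""")

conn.execute("INSERT INTO customer_transactions VALUES (1, 1, 20.50)")  # both keys exist: accepted
try:
    conn.execute("INSERT INTO customer_transactions VALUES (1, 99, 5.00)")  # store 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```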


### Create our Dimension Tables and insert data into those tables.

Data:

* customer
  * (1, "Amanda", True)
  * (2, "Toby", False)

* store
  * (1, "CA")
  * (2, "WA")

* items_purchased
  * (1, 1, "Rubber Soul")
  * (2, 3, "Let It Be")

In [27]:
%%sql 
CREATE TABLE IF NOT EXISTS customer
(
    customer_id int,
    name text,
    rewards boolean
)

 * postgresql://student:***@localhost:5432/studentdb
Done.


[]

In [28]:
%%sql 
CREATE TABLE IF NOT EXISTS store
(
    store_id int,
    state text
)

 * postgresql://student:***@localhost:5432/studentdb
Done.


[]

In [29]:
%%sql 
CREATE TABLE IF NOT EXISTS items_purchased
(
    customer_id int,
    item_number int,
    item_name text
)

 * postgresql://student:***@localhost:5432/studentdb
Done.


[]

In [30]:
%%sql 
INSERT INTO customer_transactions (customer_id, store_id, spent) VALUES (1, 1, 20.50);
INSERT INTO customer_transactions (customer_id, store_id, spent) VALUES (2, 1, 35.21);

INSERT INTO customer (customer_id, name, rewards) VALUES (1, 'Amanda', True);
INSERT INTO customer (customer_id, name, rewards) VALUES (2, 'Toby', False);

INSERT INTO store (store_id, state) VALUES (1, 'CA');
INSERT INTO store (store_id, state) VALUES (2, 'WA');

INSERT INTO items_purchased (customer_id, item_number, item_name) VALUES (1, 1, 'Rubber Soul');
INSERT INTO items_purchased (customer_id, item_number, item_name) VALUES (2, 3, 'Let It Be');

 * postgresql://student:***@localhost:5432/studentdb
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.


[]

### Now run the following queries on this data that utilize the Fact/Dimension and Star Schema

**Query 1**: Find all the customers who spent more than 30 dollars: who they are, which store they bought from, the store's location, what they bought, and whether they are a rewards member.

In [31]:
%%sql

SELECT  name, store.store_id, store.state, item_name, customer.rewards
FROM 
    customer_transactions JOIN customer on customer_transactions.customer_id = customer.customer_id
    JOIN store on customer_transactions.store_id = store.store_id
    JOIN items_purchased on customer_transactions.customer_id = items_purchased.customer_id
    
WHERE spent > 30

 * postgresql://student:***@localhost:5432/studentdb
1 rows affected.


name,store_id,state,item_name,rewards
Toby,1,CA,Let It Be,False
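The same star-schema join can be reproduced offline with Python's built-in `sqlite3`, which is handy for checking the query logic without the K8s setup. This sketch recreates the four tables with this exercise's data and uses table aliases for readability; since SQLite has no real boolean type, `rewards` is stored as 1/0, so `False` above appears as `0` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_transactions (customer_id INTEGER, store_id INTEGER, spent REAL);
    CREATE TABLE customer (customer_id INTEGER, name TEXT, rewards INTEGER);
    CREATE TABLE store (store_id INTEGER, state TEXT);
    CREATE TABLE items_purchased (customer_id INTEGER, item_number INTEGER, item_name TEXT);
    INSERT INTO customer_transactions VALUES (1, 1, 20.50), (2, 1, 35.21);
    INSERT INTO customer VALUES (1, 'Amanda', 1), (2, 'Toby', 0);
    INSERT INTO store VALUES (1, 'CA'), (2, 'WA');
    INSERT INTO items_purchased VALUES (1, 1, 'Rubber Soul'), (2, 3, 'Let It Be');
""")

# Query 1 with aliases: the fact table t is joined to each dimension on its key
query_1 = """
    SELECT c.name, s.store_id, s.state, i.item_name, c.rewards
    FROM customer_transactions AS t
    JOIN customer        AS c ON t.customer_id = c.customer_id
    JOIN store           AS s ON t.store_id    = s.store_id
    JOIN items_purchased AS i ON t.customer_id = i.customer_id
    WHERE t.spent > 30
"""
print(conn.execute(query_1).fetchall())  # [('Toby', 1, 'CA', 'Let It Be', 0)]
```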


**Query 2**: How much did Customer 2 spend?

In [32]:
%%sql

SELECT customer_transactions.customer_id, name, SUM(spent)
FROM 
    customer_transactions JOIN customer on customer_transactions.customer_id = customer.customer_id
WHERE customer_transactions.customer_id = 2
GROUP BY customer_transactions.customer_id, name


 * postgresql://student:***@localhost:5432/studentdb
1 rows affected.


customer_id,name,sum
2,Toby,35.21


### Summary: With this elegant schema we were able to get 1) "facts/metrics" from our fact table (how much each store sold), and 2) information about our customers that allows more in-depth analytics to answer business questions using our fact and dimension tables.
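The queries above aggregate per customer; "how much each store sold" is the same pattern grouped by the store dimension instead. A sketch using Python's built-in `sqlite3` with this exercise's data; it starts from `store` with a `LEFT JOIN` so stores without sales still appear with 0. The SQL itself is standard, so it should also run in a `%%sql` cell against `studentdb` before the tables are dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_transactions (customer_id INTEGER, store_id INTEGER, spent REAL);
    CREATE TABLE store (store_id INTEGER, state TEXT);
    INSERT INTO customer_transactions VALUES (1, 1, 20.50), (2, 1, 35.21);
    INSERT INTO store VALUES (1, 'CA'), (2, 'WA');
""")

# Start from the store dimension so stores with no sales still appear (with 0)
sales_per_store = """
    SELECT s.store_id, s.state, COALESCE(SUM(t.spent), 0) AS total_spent
    FROM store AS s
    LEFT JOIN customer_transactions AS t ON t.store_id = s.store_id
    GROUP BY s.store_id, s.state
    ORDER BY s.store_id
"""
for store_id, state, total in conn.execute(sales_per_store):
    print(store_id, state, round(total, 2))
# 1 CA 55.71
# 2 WA 0
```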

### Drop the tables

In [33]:
%%sql
DROP TABLE customer_transactions;
DROP TABLE customer;
DROP TABLE store;
DROP TABLE items_purchased;

 * postgresql://student:***@localhost:5432/studentdb
Done.
Done.
Done.
Done.


[]

### And finally close your cursor and connection. 

In [34]:
cur.close()
conn.close()

<h2><span style='color:blue'>Remove Environment</span></h2>

In [35]:
# Clears proxy
pids_kubectl_proxy = !ps -ef|fgrep 'kubectl port-forward'|fgrep $CHART_INSTANCE_NAME|cut -d ' ' -f4
!kill -9 {pids_kubectl_proxy[0]}
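The `cut -d ' ' -f4` step depends on how many spaces `ps` uses to pad the UID and PID columns, so it can return an empty field on another machine. A sketch of a sturdier parser: split each `ps -ef` line on any whitespace, take the PID from the second column, and skip the shell and `fgrep` processes that only exist because of the search itself. `pids_matching` is a hypothetical helper; the sample lines are the `ps` output shown earlier in this notebook.

```python
from typing import List


def pids_matching(ps_lines: List[str], *needles: str) -> List[int]:
    """Return PIDs (2nd column of `ps -ef`) of lines containing every needle,
    skipping shell/grep processes that appear only because of the search."""
    pids = []
    for line in ps_lines:
        fields = line.split()  # any run of whitespace, unlike `cut -d ' '`
        if len(fields) < 8 or not all(n in line for n in needles):
            continue
        command = " ".join(fields[7:])  # CMD is the 8th column of `ps -ef`
        if command.startswith(("fgrep", "grep", "/bin/sh")):
            continue
        pids.append(int(fields[1]))
    return pids


sample = [
    "  501 12340     1   0  9:30PM ??         0:00.14 kubectl port-forward --namespace default svc/dend-l2e3-postgresql 5432:5432",
    "  501 12344 12268   0  9:30PM ttys003    0:00.01 /bin/sh -c ps -ef|fgrep 'kubectl port-forward'",
    "  501 12346 12344   0  9:30PM ttys003    0:00.00 fgrep kubectl port-forward",
]
print(pids_matching(sample, "kubectl port-forward", "dend-l2e3"))  # [12340]
```

In the notebook this could be fed with `lines = !ps -ef`, and each returned PID stopped with `os.kill(pid, signal.SIGTERM)`.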

In [36]:
# Removes chart instances
!helm uninstall {CHART_INSTANCE_NAME}

release "dend-l2e3" uninstalled


In [37]:
# Remove the persistent volume claim(s) left behind by the chart
!kubectl get pvc|fgrep {CHART_INSTANCE_NAME}|cut -d ' '  -f1| xargs -t kubectl delete pvc

kubectl delete pvc data-dend-l2e3-postgresql-0
persistentvolumeclaim "data-dend-l2e3-postgresql-0" deleted


In [38]:
!kubectl get pvc

No resources found.
