# L2 Exercise 1: Creating Normalized Tables

<img src="https://wiki.postgresql.org/images/a/a4/PostgreSQL_logo.3colors.svg" width="250" height="250">

<center><h1><span style='color:blue'>Environment preparation</span></h1></center>

The Udacity environment is already prepared for students: it provides a Postgres instance for the training exercises.

Here we create an equivalent instance on Kubernetes instead.

* Add the psycopg2 module to Python
* Start PostgreSQL on K8s

In [1]:
# Load package
# !pip install psycopg2-binary

<h3><span style='color:blue'>Using K8S PostgreSQL</span></h3>

You need a Kubernetes cluster available, for example Minikube, Minishift, or Docker (with K8s).

Helm is needed too; see [helm.sh](http://helm.sh).

In [1]:
from time import sleep
import os

In [2]:
helm_version = !helm version --short
assert helm_version[0][:2] == 'v3', "Expected HELM version not available, visit https://helm.sh"

#!curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
#!chmod 700 /tmp/get_helm.sh
#!ls -al /tmp/
#!/tmp/get_helm.sh

In [3]:
!helm repo add bitnami https://charts.bitnami.com/bitnami

"bitnami" has been added to your repositories


In [4]:
CHART_INSTANCE_NAME = 'dend-l2e1'
os.environ['postgresql_port_instance_name'] = CHART_INSTANCE_NAME + "-postgresql"
os.getenv('postgresql_port_instance_name')

'dend-l2e1-postgresql'

In [5]:
# Install the chart from the bitnami repository added above
helm_chart_out = !helm install {CHART_INSTANCE_NAME} bitnami/postgresql

In [6]:
postgresql_port_forward_command = helm_chart_out[-2].strip()
os.environ['postgresql_port_forward_command'] = postgresql_port_forward_command
os.getenv('postgresql_port_forward_command')

'kubectl port-forward --namespace default svc/dend-l2e2-postgresql 5432:5432 &'
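
Picking the port-forward command by position (`helm_chart_out[-2]`) depends on the exact layout of the chart notes. A minimal sketch of a content-based lookup follows; `find_line` and `sample_out` are hypothetical names, and only the port-forward line in the sample is taken from the output above.

```python
def find_line(lines, needle):
    """Return the first line containing `needle`, stripped, or None if absent."""
    for line in lines:
        if needle in line:
            return line.strip()
    return None


# Hypothetical stand-in for helm_chart_out; the surrounding lines are placeholders.
sample_out = [
    "some other line of chart notes",
    "    kubectl port-forward --namespace default svc/dend-l2e2-postgresql 5432:5432 &",
    "",
]

print(find_line(sample_out, "kubectl port-forward"))
```

In the notebook this could be called as `find_line(helm_chart_out, "kubectl port-forward")`, with a check for `None` before the result is used.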

In [7]:
# Wait until the PostgreSQL pod is running
max_checks_postgresql_run = 20

!kubectl get pods

while max_checks_postgresql_run > 0:

    postgres_is_running = !kubectl get pods|fgrep {CHART_INSTANCE_NAME}|fgrep "1/1"|fgrep "Running"
    
    if len(postgres_is_running) > 0 and not postgres_is_running[0] == 'No resources found.':
        break
    else:
        sleep(5)

        max_checks_postgresql_run -= 1

!kubectl get pods
assert max_checks_postgresql_run > 0, "Probably Postgresql is not running"

NAME                     READY   STATUS     RESTARTS   AGE
dend-l2e2-postgresql-0   0/1     Init:0/1   0          2s
NAME                     READY   STATUS    RESTARTS   AGE
dend-l2e2-postgresql-0   1/1     Running   0          33s
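
The wait loop above is a bounded polling pattern: check, sleep, retry, give up after a fixed number of attempts. One way to sketch it as a reusable helper is shown here; `wait_until` and its parameters are illustrative names, not part of the notebook.

```python
from time import sleep


def wait_until(check, max_checks=20, delay=5.0):
    """Call `check()` up to `max_checks` times, sleeping `delay` seconds after
    each failed attempt. Return True as soon as a check succeeds, else False."""
    for _ in range(max_checks):
        if check():
            return True
        sleep(delay)
    return False


# Small demonstration with a fake check that succeeds on the third call.
attempts = iter([False, False, True])
ready = wait_until(lambda: next(attempts), max_checks=5, delay=0)
print(ready)
```

In the notebook, `check` would wrap the `kubectl get pods` filter, and the final `assert` would test the returned boolean.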


<h3><span style='color:blue'>Open Proxy to PostgreSQL on K8s</span></h3>
Run the next command in a separate terminal if you are not running this notebook in Jupyter.

In [13]:
%%script env --bg bash --out console_out
nohup kubectl port-forward --namespace default svc/dend-l2e1-postgresql 5432:5432 &

#%%script env postgres_port_forward_command="$postgres_port_forward_command" --bg bash --out console_out
#nohup kubectl port-forward --namespace default svc/dend-l1e1-postgresql 5432:5432 &

In [14]:
# Getting postgresql password from console out
postgresql_password = helm_chart_out[15].split('(')[1][:-1]
postgresql_password = !{postgresql_password}
postgresql_password = postgresql_password[0]

In [15]:
# Getting console command to connect with current instance of postgres
k8s_psql_command = helm_chart_out[19].strip().replace("$POSTGRES_PASSWORD", postgresql_password) + " -c "

In [16]:
!ps -ef|fgrep 'kubectl port-forward'

  501 57444 57129   0  7:02PM ttys000    0:00.01 /bin/sh -c ps -ef|fgrep 'kubectl port-forward'
  501 57446 57444   0  7:02PM ttys000    0:00.00 fgrep kubectl port-forward


In [11]:
# Checks if proxy is enabled
pids_kubectl_proxy = !ps -ef|fgrep 'kubectl port-forward'|fgrep $CHART_INSTANCE_NAME|cut -d ' ' -f4
assert len(pids_kubectl_proxy) > 1, f"No kubectl proxy found, try in a console: '{postgresql_port_forward_command}'"

AssertionError: No kubectl proxy found, try in a console: 'kubectl port-forward --namespace default svc/dend-l2e2-postgresql 5432:5432 &'
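
Parsing `ps` output is sensitive to column spacing, so the check above can fail even when the proxy is up. A possible alternative, sketched under the assumption that the forwarded port is 5432 on localhost, is to test whether the port accepts TCP connections; `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Assumption: the port-forward maps the service to local port 5432.
print(port_is_open("127.0.0.1", 5432))
```

This only shows that something is listening on the port; the `SELECT 1` check in the next section is what confirms that PostgreSQL itself answers.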

<h3><span style='color:blue'>Check Postgresql availability</span></h3>
We have created a PostgreSQL instance on K8s infrastructure; next we test whether it is available.

In [13]:
# Checks postgresql connection
select_1_postgresql_out = !{k8s_psql_command} 'SELECT 1;'
assert len(select_1_postgresql_out) > 0, 'Postgresql -select 1- failed, check it'
!{k8s_psql_command} 'SELECT version();'

                                                 version                        
                         
--------------------------------------------------------------------------------
-------------------------
 PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1
) 6.3.0 20170516, 64-bit
(1 row)

pod "dend-l2e1-postgresql-client" deleted


<h3><span style='color:blue'>Initialize Postgresql Student DB for exercise</span></h3>

In [14]:
!{k8s_psql_command} "CREATE ROLE student WITH LOGIN ENCRYPTED PASSWORD 'student'"

CREATE ROLE
pod "dend-l2e1-postgresql-client" deleted


In [15]:
!{k8s_psql_command} 'alter user student createdb;'

ALTER ROLE
pod "dend-l2e1-postgresql-client" deleted


In [16]:
!{k8s_psql_command} 'create database studentdb;'

If you don't see a command prompt, try pressing enter.
CREATE DATABASE
pod "dend-l2e1-postgresql-client" deleted


In [17]:
!{k8s_psql_command} 'grant all privileges on database studentdb to student;'

GRANT
pod "dend-l2e1-postgresql-client" deleted


In [18]:
!{k8s_psql_command} 'SELECT usename, usecreatedb FROM pg_user;'

 usename  | usecreatedb 
----------+-------------
 postgres | t
 student  | t
(2 rows)

pod "dend-l2e1-postgresql-client" deleted


## In this exercise we are going to walk through the basics of modeling data in normalized form. We will create tables in PostgreSQL, insert rows of data, and do simple JOIN SQL queries to show how these multiple tables can work together. 


#### Import the library 
Note: An error might pop up after this command has executed. If it does, read it carefully before ignoring it. 

In [19]:
import psycopg2

####  Create a connection to the database, get a cursor, and set autocommit to true

In [20]:
try: 
    conn = psycopg2.connect("host=127.0.0.1 dbname=studentdb user=student password=student")
except psycopg2.Error as e: 
    print("Error: Could not make connection to the Postgres database")
    print(e)
try: 
    cur = conn.cursor()
except psycopg2.Error as e: 
    print("Error: Could not get cursor to the Database")
    print(e)
conn.set_session(autocommit=True)

#### Let's imagine we have a table called Music Store. 

`Table Name: music_store
column 0: Transaction Id
column 1: Customer Name
column 2: Cashier Name
column 3: Year 
column 4: Albums Purchased`

## Now let's translate this information into a CREATE TABLE statement and insert the data

Data rows:

* (1, 'Amanda', 'Sam', 2000, ['Rubber Soul', 'Let it Be'])
* (2, 'Toby', 'Sam', 2000, ['My Generation'])
* (3, 'Max', 'Bob', 2018, ['Meet the Beatles', 'Help!'])


In [21]:
## Creating table "music_store"
try:
    cur.execute("""
    CREATE TABLE IF NOT EXISTS music_store 
    (
        transaction_id int,
        customer_name text,
        chashier_name text,
        year int,
        albums_purchased text[]
    )
    """)
except psycopg2.Error as e:
    print("Error: Could not create table")
    print(e)

In [22]:
# Insert row
try: 
    cur.execute(
    """
    INSERT INTO music_store (transaction_id, customer_name, chashier_name, year, albums_purchased) \
    VALUES (%s, %s, %s, %s, %s)
    """ , (1, 'Amanda', 'Sam', 2000, ['Rubber Soul', 'Let it Be'])
    )
except psycopg2.Error as e:
    print("Error: Could not create a row")
    print(e)

In [23]:
# Insert row
try: 
    cur.execute(
    """
    INSERT INTO music_store (transaction_id, customer_name, chashier_name, year, albums_purchased) \
    VALUES (%s, %s, %s, %s, %s)
    """ , (2, 'Toby', 'Sam', 2000, ['My Generation'])
    )
except psycopg2.Error as e:
    print("Error: Could not create a row")
    print(e)

In [24]:
# Insert row
try: 
    cur.execute(
    """
    INSERT INTO music_store (transaction_id, customer_name, chashier_name, year, albums_purchased) \
    VALUES (%s, %s, %s, %s, %s)
    """ , (3, 'Max', 'Bob', 2018, ['Meet the Beatles', 'Help!'])
    )
except psycopg2.Error as e:
    print("Error: Could not create a row")
    print(e)

In [25]:
try: 
    cur.execute(
    """
        SELECT * FROM music_store
    """)
except psycopg2.Error as e:
    print(e)

for crow in cur.fetchall():
    print(crow)

(1, 'Amanda', 'Sam', 2000, ['Rubber Soul', 'Let it Be'])
(2, 'Toby', 'Sam', 2000, ['My Generation'])
(3, 'Max', 'Bob', 2018, ['Meet the Beatles', 'Help!'])


#### Moving to 1st Normal Form (1NF)
This data has not been normalized. To get this data into 1st normal form, we need to remove any collections or lists of data, so that every column holds a single value. Here that means breaking up the list of albums into individual rows. 

`Table Name: music_store
column 0: Transaction Id
column 1: Customer Name
column 2: Cashier Name
column 3: Year 
column 4: Albums Purchased`

Data Rows:

* (1, 'Amanda', 'Sam', 2000, 'Rubber Soul')
* (1, 'Amanda', 'Sam', 2000, 'Let it Be')
* (2, 'Toby', 'Sam', 2000, 'My Generation')
* (3, 'Max', 'Bob', 2018, 'Help!')
* (3, 'Max', 'Bob', 2018, 'Meet the Beatles')
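
The move to 1NF can be sketched in plain Python as flattening the list column into one row per album. The function name `to_first_normal_form` is illustrative; the data is the set of rows listed above.

```python
def to_first_normal_form(rows):
    """Expand each (id, customer, cashier, year, [albums]) row into one row per album."""
    flat = []
    for transaction_id, customer, cashier, year, albums in rows:
        for album in albums:
            flat.append((transaction_id, customer, cashier, year, album))
    return flat


music_store = [
    (1, 'Amanda', 'Sam', 2000, ['Rubber Soul', 'Let it Be']),
    (2, 'Toby', 'Sam', 2000, ['My Generation']),
    (3, 'Max', 'Bob', 2018, ['Meet the Beatles', 'Help!']),
]

music_store2 = to_first_normal_form(music_store)
for row in music_store2:
    print(row)
```

Each resulting row carries exactly one album, at the cost of repeating the customer, cashier, and year for every album in a transaction.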

In [26]:
## Creating table "music_store2" in 1NF
try:
    cur.execute("""
    CREATE TABLE IF NOT EXISTS music_store2
    (
        transaction_id int,
        customer_name text,
        chashier_name text,
        year int,
        albums_purchased text
    )
    """)
except psycopg2.Error as e:
    print("Error: Could not create table")
    print(e)

In [27]:
# Insert row
try: 
    cur.execute(
    """
    INSERT INTO music_store2 (transaction_id, customer_name, chashier_name, year, albums_purchased) \
    VALUES (%s, %s, %s, %s, %s)
    """ , (1, 'Amanda', 'Sam', 2000, 'Rubber Soul')
    )
except psycopg2.Error as e:
    print("Error: Could not create a row")
    print(e)

In [28]:
# Insert row
try: 
    cur.execute(
    """
    INSERT INTO music_store2 (transaction_id, customer_name, chashier_name, year, albums_purchased) \
    VALUES (%s, %s, %s, %s, %s)
    """ , (1, 'Amanda', 'Sam', 2000,'Let it Be')
    )
except psycopg2.Error as e:
    print("Error: Could not create a row")
    print(e)

In [29]:
# Insert row
try: 
    cur.execute(
    """
    INSERT INTO music_store2 (transaction_id, customer_name, chashier_name, year, albums_purchased) \
    VALUES (%s, %s, %s, %s, %s)
    """ , (2, 'Toby', 'Sam', 2000, 'My Generation')
    )
except psycopg2.Error as e:
    print("Error: Could not create a row")
    print(e)

In [30]:
# Insert row
try: 
    cur.execute(
    """
    INSERT INTO music_store2 (transaction_id, customer_name, chashier_name, year, albums_purchased) \
    VALUES (%s, %s, %s, %s, %s)
    """ , (3, 'Max', 'Bob', 2018, 'Meet the Beatles')
    )
except psycopg2.Error as e:
    print("Error: Could not create a row")
    print(e)

In [31]:
# Insert row
try: 
    cur.execute(
    """
    INSERT INTO music_store2 (transaction_id, customer_name, chashier_name, year, albums_purchased) \
    VALUES (%s, %s, %s, %s, %s)
    """ , (3, 'Max', 'Bob', 2018,'Help!')
    )
except psycopg2.Error as e:
    print("Error: Could not create a row")
    print(e)

In [32]:
try: 
    cur.execute(
    """
        SELECT * FROM music_store2
    """)
except psycopg2.Error as e:
    print(e)

for crow in cur.fetchall():
    print(crow)

(1, 'Amanda', 'Sam', 2000, 'Rubber Soul')
(1, 'Amanda', 'Sam', 2000, 'Let it Be')
(2, 'Toby', 'Sam', 2000, 'My Generation')
(3, 'Max', 'Bob', 2018, 'Meet the Beatles')
(3, 'Max', 'Bob', 2018, 'Help!')
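
The five insert cells above differ only in their parameters, so they could be collapsed into a single `executemany` call. The sketch below uses the standard-library `sqlite3` module with an in-memory database as a stand-in so that it runs without a server; that substitution is an assumption of this example. psycopg2 cursors also offer `executemany`, with `%s` placeholders instead of `?`.

```python
import sqlite3

rows_1nf = [
    (1, 'Amanda', 'Sam', 2000, 'Rubber Soul'),
    (1, 'Amanda', 'Sam', 2000, 'Let it Be'),
    (2, 'Toby', 'Sam', 2000, 'My Generation'),
    (3, 'Max', 'Bob', 2018, 'Meet the Beatles'),
    (3, 'Max', 'Bob', 2018, 'Help!'),
]

# In-memory SQLite database used only as a stand-in for the PostgreSQL connection.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("""
    CREATE TABLE music_store2 (
        transaction_id int,
        customer_name text,
        cashier_name text,
        year int,
        albums_purchased text
    )
""")

# One statement, executed once per parameter tuple.
demo_cur.executemany("INSERT INTO music_store2 VALUES (?, ?, ?, ?, ?)", rows_1nf)

stored = demo_cur.execute("SELECT * FROM music_store2 ORDER BY rowid").fetchall()
for row in stored:
    print(row)
demo_conn.close()
```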


#### Moving to 2nd Normal Form (2NF)
We have moved our data into 1NF, which is the first step toward 2nd Normal Form. Our table is not yet in 2NF: every row is unique, but transaction id alone no longer identifies a row, so it cannot serve as the primary key. The key would have to be (transaction id, album), while customer name, cashier name, and year depend only on transaction id. We need to break this up into two tables, transactions and albums sold. 

`Table Name: transactions 
column 0: Transaction ID
column 1: Customer Name
column 2: Cashier Name
column 3: Year `

`Table Name: albums_sold
column 0: Album Id
column 1: Transaction Id
column 2: Album Name` 

##### Data Rows

* Table: transactions
 * (1, 'Amanda', 'Sam', 2000)
 * (2, 'Toby', 'Sam', 2000)
 * (3, 'Max', 'Bob', 2018)

* Table: albums_sold
 * (1, 1, 'Rubber Soul')
 * (2, 1, 'Let it Be')
 * (3, 2, 'My Generation')
 * (4, 3, 'Meet the Beatles')
 * (5, 3, 'Help!')
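
The split into two tables can be sketched in plain Python: one row per distinct transaction, and one row per album that points back to its transaction. `to_second_normal_form` is an illustrative name, and the sequential album ids are an assumption that happens to match the rows listed above.

```python
def to_second_normal_form(flat_rows):
    """Split 1NF rows into (transactions, albums_sold)."""
    transactions = []
    albums_sold = []
    seen = set()
    for transaction_id, customer, cashier, year, album in flat_rows:
        # Keep transaction-level facts once per transaction id.
        if transaction_id not in seen:
            seen.add(transaction_id)
            transactions.append((transaction_id, customer, cashier, year))
        # Album ids are assigned sequentially, starting at 1.
        albums_sold.append((len(albums_sold) + 1, transaction_id, album))
    return transactions, albums_sold


rows_1nf = [
    (1, 'Amanda', 'Sam', 2000, 'Rubber Soul'),
    (1, 'Amanda', 'Sam', 2000, 'Let it Be'),
    (2, 'Toby', 'Sam', 2000, 'My Generation'),
    (3, 'Max', 'Bob', 2018, 'Meet the Beatles'),
    (3, 'Max', 'Bob', 2018, 'Help!'),
]

transactions, albums_sold = to_second_normal_form(rows_1nf)
print(transactions)
print(albums_sold)
```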

In [33]:
## Creating table "transactions" in 2NF
try:
    cur.execute("""
    CREATE TABLE IF NOT EXISTS transactions
    (
        transaction_id int,
        customer_name text,
        chashier_name text,
        year int
    )
    """)
    print ("Table created")
except psycopg2.Error as e:
    print("Error: Could not create table")
    print(e)

Table created


In [34]:
## Creating table "albums_sold" in 2NF
try:
    cur.execute("""
    CREATE TABLE IF NOT EXISTS albums_sold
    (
        album_id int, 
        transaction_id int,
        album_name text
    )
    """)
    print ("Table created")    
except psycopg2.Error as e:
    print("Error: Could not create table")
    print(e)

Table created


In [35]:
# Insert row in transactions
def insert_into_transactions(params):
    try: 
        cur.execute(
        """
        INSERT INTO transactions (transaction_id, customer_name, chashier_name, year) \
        VALUES (%s, %s, %s, %s)
        """ , params
        )
        print(f"Row inserted {params}")
    except psycopg2.Error as e:
        print("Error: Could not create a row")
        print(e)
        
insert_into_transactions((1, 'Amanda', 'Sam', 2000))
insert_into_transactions((2, 'Toby', 'Sam', 2000))
insert_into_transactions((3, 'Max', 'Bob', 2018))

Row inserted (1, 'Amanda', 'Sam', 2000)
Row inserted (2, 'Toby', 'Sam', 2000)
Row inserted (3, 'Max', 'Bob', 2018)


In [36]:
# Insert row in albums_sold
def insert_into_albums_sold(params):
    try: 
        cur.execute(
        """
        INSERT INTO albums_sold (album_id, transaction_id, album_name) \
        VALUES (%s, %s, %s)
        """ , params
        )
        print(f"Row inserted {params}")
    except psycopg2.Error as e:
        print("Error: Could not create a row")
        print(e)
        
insert_into_albums_sold((1, 1, 'Rubber Soul'))
insert_into_albums_sold((2, 1, 'Let it Be'))
insert_into_albums_sold((3, 2, 'My Generation'))
insert_into_albums_sold((4, 3, 'Meet the Beatles'))
insert_into_albums_sold((5, 3, 'Help!'))

Row inserted (1, 1, 'Rubber Soul')
Row inserted (2, 1, 'Let it Be')
Row inserted (3, 2, 'My Generation')
Row inserted (4, 3, 'Meet the Beatles')
Row inserted (5, 3, 'Help!')


In [37]:
try: 
    cur.execute(
    """
        SELECT * FROM transactions
    """)
except psycopg2.Error as e:
    print(e)

for crow in cur.fetchall():
    print(crow)

(1, 'Amanda', 'Sam', 2000)
(2, 'Toby', 'Sam', 2000)
(3, 'Max', 'Bob', 2018)


In [38]:
try: 
    cur.execute(
    """
        SELECT * FROM albums_sold
    """)
except psycopg2.Error as e:
    print(e)

for crow in cur.fetchall():
    print(crow)

(1, 1, 'Rubber Soul')
(2, 1, 'Let it Be')
(3, 2, 'My Generation')
(4, 3, 'Meet the Beatles')
(5, 3, 'Help!')


#### Let's do a `JOIN` on these two tables so we can get all the information we had in our first table. 

In [39]:
# We complete the join on the transactions and albums_sold tables

try: 
    cur.execute("SELECT * FROM transactions JOIN albums_sold ON transactions.transaction_id = albums_sold.transaction_id ;")
except psycopg2.Error as e: 
    print("Error: select *")
    print (e)

row = cur.fetchone()
while row:
   print(row)
   row = cur.fetchone()


(1, 'Amanda', 'Sam', 2000, 1, 1, 'Rubber Soul')
(1, 'Amanda', 'Sam', 2000, 2, 1, 'Let it Be')
(2, 'Toby', 'Sam', 2000, 3, 2, 'My Generation')
(3, 'Max', 'Bob', 2018, 4, 3, 'Meet the Beatles')
(3, 'Max', 'Bob', 2018, 5, 3, 'Help!')
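
With `SELECT *`, the joined rows contain `transaction_id` twice plus the surrogate `album_id`. Naming the columns explicitly returns rows in the same shape as the 1NF table. The sketch below runs against an in-memory `sqlite3` database as a stand-in (an assumption of this example); the `SELECT` itself is standard SQL that should carry over to the PostgreSQL tables, given matching column names.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the PostgreSQL instance.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE transactions (transaction_id int, customer_name text, cashier_name text, year int)")
demo.execute("CREATE TABLE albums_sold (album_id int, transaction_id int, album_name text)")
demo.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(1, 'Amanda', 'Sam', 2000), (2, 'Toby', 'Sam', 2000), (3, 'Max', 'Bob', 2018)],
)
demo.executemany(
    "INSERT INTO albums_sold VALUES (?, ?, ?)",
    [(1, 1, 'Rubber Soul'), (2, 1, 'Let it Be'), (3, 2, 'My Generation'),
     (4, 3, 'Meet the Beatles'), (5, 3, 'Help!')],
)

# Select only the columns of the original table, in a deterministic order.
joined = demo.execute("""
    SELECT t.transaction_id, t.customer_name, t.cashier_name, t.year, a.album_name
    FROM transactions AS t
    JOIN albums_sold AS a ON t.transaction_id = a.transaction_id
    ORDER BY a.album_id
""").fetchall()

for row in joined:
    print(row)
demo.close()
```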


#### Moving to 3rd Normal Form (3NF)
Let's check our tables for any transitive dependencies. Cashier Name can be moved out of transactions into its own table, employees, and referenced by a Cashier Id, which leaves us with 3 tables. 
`Table Name: transactions2 
column 0: transaction Id
column 1: Customer Name
column 2: Cashier Id
column 3: Year `

`Table Name: albums_sold
column 0: Album Id
column 1: Transaction Id
column 2: Album Name` 

`Table Name: employees
column 0: Employee Id
column 1: Employee Name `

##### Data Rows

* Table: transactions2
 * (1, 'Amanda', 1, 2000)
 * (2, 'Toby', 1, 2000)
 * (3, 'Max', 2, 2018)
* Table: albums_sold
 * (1, 1, 'Rubber Soul')
 * (2, 1, 'Let it Be')
 * (3, 2, 'My Generation')
 * (4, 3, 'Meet the Beatles')
 * (5, 3, 'Help!')
* Table: employees
 * (1, 'Sam')
 * (2, 'Bob')



In [40]:
## Creating table "transactions2" in 3NF
try:
    cur.execute("""
    CREATE TABLE IF NOT EXISTS transactions2
    (
        transaction_id int,
        customer_name text,
        cashier_id int,
        year int
    )
    """)
    print ("Table created")
except psycopg2.Error as e:
    print("Error: Could not create table")
    print(e)

Table created


In [41]:
## Creating table "albums_sold" in 3NF (unchanged from 2NF, so IF NOT EXISTS makes this a no-op)
try:
    cur.execute("""
    CREATE TABLE IF NOT EXISTS albums_sold
    (
        album_id int, 
        transaction_id int,
        album_name text
    )
    """)
    print ("Table created")    
except psycopg2.Error as e:
    print("Error: Could not create table")
    print(e)

Table created


In [42]:
## Creating table "employees" in 3NF
try:
    cur.execute("""
    CREATE TABLE IF NOT EXISTS employees
    (
        employee_id int, 
        employee_name text
    )
    """)
    print ("Table created")    
except psycopg2.Error as e:
    print("Error: Could not create table")
    print(e)

Table created


In [43]:
# Insert row in transactions2
def insert_into_transactions2(params):
    try: 
        cur.execute(
        """
        INSERT INTO transactions2 (transaction_id, customer_name, cashier_id, year) \
        VALUES (%s, %s, %s, %s)
        """ , params
        )
        print(f"Row inserted {params}")
    except psycopg2.Error as e:
        print("Error: Could not create a row")
        print(e)
        
insert_into_transactions2((1, 'Amanda', 1, 2000))
insert_into_transactions2((2, 'Toby', 1, 2000))
insert_into_transactions2((3, 'Max', 2, 2018))

Row inserted (1, 'Amanda', 1, 2000)
Row inserted (2, 'Toby', 1, 2000)
Row inserted (3, 'Max', 2, 2018)


In [44]:
# No inserts for albums_sold: it already holds the same rows from the 2NF step

In [45]:
# Insert row in employees
def insert_into_employees(params):
    try: 
        cur.execute(
        """
        INSERT INTO employees (employee_id, employee_name) \
        VALUES (%s, %s)
        """ , params
        )
        print(f"Row inserted {params}")
    except psycopg2.Error as e:
        print("Error: Could not create a row")
        print(e)
        
insert_into_employees((1, 'Sam'))
insert_into_employees((2, 'Bob'))

Row inserted (1, 'Sam')
Row inserted (2, 'Bob')


#### Let's do two `JOIN`s on these 3 tables so we can get all the information we had in our first table. 

In [46]:
try: 
    cur.execute("SELECT * FROM (transactions2 JOIN albums_sold ON \
                               transactions2.transaction_id = albums_sold.transaction_id) JOIN \
                               employees ON transactions2.cashier_id=employees.employee_id;")
except psycopg2.Error as e: 
    print("Error: select *")
    print (e)

row = cur.fetchone()
while row:
   print(row)
   row = cur.fetchone()

(1, 'Amanda', 1, 2000, 1, 1, 'Rubber Soul', 1, 'Sam')
(1, 'Amanda', 1, 2000, 2, 1, 'Let it Be', 1, 'Sam')
(2, 'Toby', 1, 2000, 3, 2, 'My Generation', 1, 'Sam')
(3, 'Max', 2, 2018, 4, 3, 'Meet the Beatles', 2, 'Bob')
(3, 'Max', 2, 2018, 5, 3, 'Help!', 2, 'Bob')


### DONE! We have Normalized our dataset! 

### For the sake of the demo, let's drop the tables. 

In [47]:
try: 
    cur.execute("DROP table music_store")
except psycopg2.Error as e: 
    print("Error: Dropping table")
    print (e)
try: 
    cur.execute("DROP table music_store2")
except psycopg2.Error as e: 
    print("Error: Dropping table")
    print (e)
try: 
    cur.execute("DROP table albums_sold")
except psycopg2.Error as e: 
    print("Error: Dropping table")
    print (e)
try: 
    cur.execute("DROP table employees")
except psycopg2.Error as e: 
    print("Error: Dropping table")
    print (e)
try: 
    cur.execute("DROP table transactions")
except psycopg2.Error as e: 
    print("Error: Dropping table")
    print (e)
try: 
    cur.execute("DROP table transactions2")
except psycopg2.Error as e: 
    print("Error: Dropping table")
    print (e)
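
The six drop blocks above repeat the same pattern, so they could be written as a loop over the table names. The sketch below uses an in-memory `sqlite3` database as a stand-in (an assumption of this example); `drop_tables` is a hypothetical helper, and `DROP TABLE IF EXISTS` is also valid in PostgreSQL.

```python
import sqlite3

TABLES = ["music_store", "music_store2", "albums_sold",
          "employees", "transactions", "transactions2"]


def drop_tables(cursor, tables):
    """Drop each table in `tables`, skipping any that do not exist."""
    for table in tables:
        # Table names cannot be bound as query parameters; these names come
        # from the fixed list above, never from user input.
        cursor.execute(f"DROP TABLE IF EXISTS {table}")


# In-memory SQLite database used only as a stand-in for the PostgreSQL connection.
demo = sqlite3.connect(":memory:")
demo_cur = demo.cursor()
demo_cur.execute("CREATE TABLE employees (employee_id int, employee_name text)")

drop_tables(demo_cur, TABLES)
remaining = demo_cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(remaining)
demo.close()
```

Because of `IF EXISTS`, the loop can be rerun safely even when some tables were never created.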

### And finally close the cursor and connection. 

In [48]:
cur.close()
conn.close()

<h2><span style='color:blue'>Remove Environment</span></h2>

In [49]:
# Clears proxy
pids_kubectl_proxy = !ps -ef|fgrep 'kubectl port-forward'|fgrep $CHART_INSTANCE_NAME|cut -d ' ' -f4
!kill -9 {pids_kubectl_proxy[0]}

In [17]:
# Removes chart instances
!helm uninstall {CHART_INSTANCE_NAME}

release "dend-l2e2" uninstalled


In [18]:
# Removes persistent Volume
!kubectl get pvc|fgrep {CHART_INSTANCE_NAME}|cut -d ' '  -f1| xargs -t kubectl delete pvc

kubectl delete pvc data-dend-l2e2-postgresql-0
persistentvolumeclaim "data-dend-l2e2-postgresql-0" deleted
