# Lesson 2 Exercise 1: Creating Normalized Tables

<img src="images/postgresSQLlogo.png" width="250" height="250">

## In this exercise we are going to walk through the basics of modeling data in normalized form. We will create tables in PostgreSQL, insert rows of data, and run simple SQL JOIN queries to show how these multiple tables can work together. 

#### Import the library 
Note: An error might pop up after this command has executed. If it does, read it carefully before ignoring it. 

In [1]:
import psycopg2 as postgres

__Create a connection to the database, get a cursor, and set autocommit to true__

In [2]:
# Prepare connection string
host_ip = "127.0.0.1"
database_name = "postgres"
username = "postgres"
password = "123456"
port = "5432"
connection_string = "host={} dbname={} user={} password={} port={}".format(
    host_ip, database_name, username, password, port)

# Create a connection
connection = postgres.connect(connection_string)
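The connection string above hard-codes the credentials. As a minimal sketch, the same string could be assembled from environment variables instead, falling back to the notebook's local defaults. The helper name `build_connection_string` is hypothetical, and reading `PGHOST`, `PGDATABASE`, `PGUSER`, `PGPASSWORD` and `PGPORT` explicitly is an assumption made for this example.

```python
import os


def build_connection_string(env=os.environ):
    """Build a connection string from a mapping of environment variables.

    Falls back to the notebook's local defaults when a variable is unset.
    Values containing spaces would need quoting, which this sketch skips.
    """
    settings = {
        "host": env.get("PGHOST", "127.0.0.1"),
        "dbname": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", "123456"),
        "port": env.get("PGPORT", "5432"),
    }
    return " ".join("{}={}".format(key, value) for key, value in settings.items())


# Example with an explicit mapping instead of the real environment
example = build_connection_string(
    {"PGHOST": "db.example.internal", "PGPASSWORD": "s3cret"}
)
print(example)
```

The result can be passed to `postgres.connect(...)` exactly like `connection_string` above.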

In [3]:
# Get a cursor to execute our queries
cursor = connection.cursor()

In [4]:
# Set autocommit to true, so we don't have to commit every action we make
connection.set_session(autocommit = True)

#### Let's imagine we have a table called Music Store. 

`Table Name: music_store
column 0: Transaction Id
column 1: Customer Name
column 2: Cashier Name
column 3: Year 
column 4: Albums Purchased`


## Now to translate this information into a CREATE Table Statement and insert the data

<img src="images/table12.png" width="650" height="650">




In [5]:
# Start building the CREATE TABLE query
query = "CREATE TABLE IF NOT EXISTS music_store "

# add columns with their types
query = query + "(transaction_id INT, \
                  customer_name VARCHAR, \
                  cashier_name VARCHAR, \
                  year INT, \
                  albums_purchased TEXT[])"

# Execute the query
cursor.execute(query)

In [6]:
# Build the base INSERT query
query = "INSERT INTO music_store "

# add column names to the query
query = query + "(transaction_id, customer_name, cashier_name, year, albums_purchased) " 

# add placeholders of the columns to the query
query = query + "VALUES (%s, %s, %s, %s, %s)"

In [7]:
# Insert first row
cursor.execute(query, (1, 'Amanda', 'Sam', 2000, ['Rubber Soul', 'Let it Be']))

In [8]:
# Insert second row
cursor.execute(query, (2, 'Toby', 'Sam', 2000, ['My Generation']))

In [9]:
# Insert third row
cursor.execute(query, (3, 'Max', 'Bob', 2018, ['Meet the Beatles', 'Help!']))

In [10]:
# Confirm that the data were inserted successfully
query = "SELECT * FROM music_store"

# Execute the query (execute() returns None; rows are read from the cursor)
cursor.execute(query)

# Fetch first row
row = cursor.fetchone()

# Print rows until fetchone() returns None
while row:
    
    # print current fetched row
    print(row)
    
    # Fetch next row
    row = cursor.fetchone()

(1, 'Amanda', 'Sam', 2000, ['Rubber Soul', 'Let it Be'])
(2, 'Toby', 'Sam', 2000, ['My Generation'])
(3, 'Max', 'Bob', 2018, ['Meet the Beatles', 'Help!'])
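The `fetchone()` loop above works, but DB-API cursors can also be iterated directly or drained with `fetchall()`. The sketch below shows both. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a PostgreSQL server, which is an assumption of this example; the table name `demo` and the `demo_` variable names are hypothetical. Note that `sqlite3` uses `?` placeholders where psycopg2 uses `%s`.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here so the sketch runs without a server
demo_connection = sqlite3.connect(":memory:")
demo_cursor = demo_connection.cursor()
demo_cursor.execute("CREATE TABLE demo (transaction_id INT, customer_name TEXT)")
demo_cursor.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, "Amanda"), (2, "Toby"), (3, "Max")],
)

# Option 1: iterate over the cursor directly, one row at a time
demo_cursor.execute("SELECT * FROM demo ORDER BY transaction_id")
iterated_rows = [row for row in demo_cursor]

# Option 2: fetch every remaining row at once
demo_cursor.execute("SELECT * FROM demo ORDER BY transaction_id")
fetched_rows = demo_cursor.fetchall()

for row in fetched_rows:
    print(row)

demo_connection.close()
```

For small result sets like the ones in this exercise, `fetchall()` is usually the shortest option; iterating the cursor avoids holding every row in memory at once.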


In [11]:
# Drop the table to clean up
cursor.execute("DROP TABLE music_store")

#### Moving to 1st Normal Form (1NF)

### TO-DO: This data has not been normalized. To get this data into 1st Normal Form, remove any collections or lists of data and break up the list of albums into individual rows. 
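Before writing any SQL, the flattening step can be sketched in plain Python: every row that holds a list of albums becomes one row per album. The function name `to_first_normal_form` is hypothetical and only illustrates the idea; the notebook itself inserts the flattened rows by hand.

```python
# Rows as stored in the unnormalized table: the last field is a list of albums
unnormalized_rows = [
    (1, 'Amanda', 'Sam', 2000, ['Rubber Soul', 'Let it Be']),
    (2, 'Toby', 'Sam', 2000, ['My Generation']),
    (3, 'Max', 'Bob', 2018, ['Meet the Beatles', 'Help!']),
]


def to_first_normal_form(rows):
    """Return one row per (transaction, album) pair."""
    flat_rows = []
    for transaction_id, customer_name, cashier_name, year, albums in rows:
        for album_name in albums:
            flat_rows.append(
                (transaction_id, customer_name, cashier_name, year, album_name)
            )
    return flat_rows


first_nf_rows = to_first_normal_form(unnormalized_rows)
for row in first_nf_rows:
    print(row)
```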


In [12]:
# Start building the CREATE TABLE query
query = "CREATE TABLE IF NOT EXISTS music_store "

# add columns with their types
query = query + "(transaction_id INT, \
                  customer_name VARCHAR, \
                  cashier_name VARCHAR, \
                  year INT, \
                  album_name TEXT)"

# Execute the query
cursor.execute(query)

In [13]:
# Build the base INSERT query
query = "INSERT INTO music_store "

# add column names to the query
query = query + "(transaction_id, customer_name, cashier_name, year, album_name) " 

# add placeholders of the columns to the query
query = query + "VALUES (%s, %s, %s, %s, %s)"

In [14]:
# Insert first row
cursor.execute(query, (1, 'Amanda', 'Sam', 2000, 'Rubber Soul'))

In [15]:
# Insert second row
cursor.execute(query, (1, 'Amanda', 'Sam', 2000, 'Let it Be'))

In [16]:
# Insert third row
cursor.execute(query, (2, 'Toby', 'Sam', 2000, 'My Generation'))

In [17]:
# Insert fourth row
cursor.execute(query, (3, 'Max', 'Bob', 2018, 'Meet the Beatles'))

In [18]:
# Insert fifth row
cursor.execute(query, (3, 'Max', 'Bob', 2018, 'Help!'))
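The five single-row inserts above can also be sent as one batch with `executemany()`, which both psycopg2 and `sqlite3` cursors provide. The sketch below uses `sqlite3` as a stand-in so it runs without a server (an assumption of this example), so the placeholders are `?` rather than `%s`. The `batch_` variable names are hypothetical.

```python
import sqlite3

batch_connection = sqlite3.connect(":memory:")
batch_cursor = batch_connection.cursor()
batch_cursor.execute(
    "CREATE TABLE music_store (transaction_id INT, customer_name TEXT, "
    "cashier_name TEXT, year INT, album_name TEXT)"
)

# All rows of the 1NF table, collected in one list
rows_to_insert = [
    (1, 'Amanda', 'Sam', 2000, 'Rubber Soul'),
    (1, 'Amanda', 'Sam', 2000, 'Let it Be'),
    (2, 'Toby', 'Sam', 2000, 'My Generation'),
    (3, 'Max', 'Bob', 2018, 'Meet the Beatles'),
    (3, 'Max', 'Bob', 2018, 'Help!'),
]

# One call runs the parameterized statement once per row
batch_cursor.executemany(
    "INSERT INTO music_store "
    "(transaction_id, customer_name, cashier_name, year, album_name) "
    "VALUES (?, ?, ?, ?, ?)",
    rows_to_insert,
)
batch_connection.commit()

batch_cursor.execute("SELECT COUNT(*) FROM music_store")
inserted_count = batch_cursor.fetchone()[0]
print(inserted_count)

batch_connection.close()
```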

In [19]:
# Confirm that the data were inserted successfully
query = "SELECT * FROM music_store"

# Execute the query (execute() returns None; rows are read from the cursor)
cursor.execute(query)

# Fetch first row
row = cursor.fetchone()

# Print rows until fetchone() returns None
while row:
    
    # print current fetched row
    print(row)
    
    # Fetch next row
    row = cursor.fetchone()

(1, 'Amanda', 'Sam', 2000, 'Rubber Soul')
(1, 'Amanda', 'Sam', 2000, 'Let it Be')
(2, 'Toby', 'Sam', 2000, 'My Generation')
(3, 'Max', 'Bob', 2018, 'Meet the Beatles')
(3, 'Max', 'Bob', 2018, 'Help!')


In [20]:
# Drop the table to clean up
cursor.execute("DROP TABLE music_store")

#### Moving to 2nd Normal Form (2NF)
You have now moved the data into 1NF, which is the first step toward 2nd Normal Form, but the table is not there yet. While each record in the table is unique, the intended primary key (transaction_id) is not: it repeats whenever a transaction contains more than one album. 
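A quick way to see the problem is to count how often each transaction ID occurs in the 1NF rows. This is a small plain-Python sketch; the variable names are chosen for illustration.

```python
from collections import Counter

# The rows of the 1NF table, copied from the output above
first_nf_rows = [
    (1, 'Amanda', 'Sam', 2000, 'Rubber Soul'),
    (1, 'Amanda', 'Sam', 2000, 'Let it Be'),
    (2, 'Toby', 'Sam', 2000, 'My Generation'),
    (3, 'Max', 'Bob', 2018, 'Meet the Beatles'),
    (3, 'Max', 'Bob', 2018, 'Help!'),
]

# Count occurrences of each transaction_id (the first field of every row)
id_counts = Counter(row[0] for row in first_nf_rows)

# Any ID that appears more than once cannot identify a single row
repeated_ids = sorted(tid for tid, count in id_counts.items() if count > 1)
print(repeated_ids)  # [1, 3]
```

Transactions 1 and 3 each appear twice, so `transaction_id` alone cannot serve as a key for this table.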

### TO-DO: Break up the table into two tables, transactions and albums sold. 


`Table Name: transactions
column 0: transaction_id
column 1: customer_name
column 2: cashier_name
column 3: year `

`Table Name: albums_sold
column 0: album_id
column 1: transaction_id
column 2: album_name`

### Since a customer can buy multiple albums in a single transaction, we need to give each album sold a unique ID. That's why we will create the album_id column.
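The tables in this exercise are created without any key constraints, so nothing stops a duplicate `album_id` from being inserted. As a hedged sketch of how the database could enforce the keys, the example below declares a primary key and a foreign key. It uses `sqlite3` as a stand-in so it runs without a server, which is an assumption of this example; the constraint syntax shown is standard SQL, but the `PRAGMA` line is specific to SQLite.

```python
import sqlite3

keyed_connection = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on
keyed_connection.execute("PRAGMA foreign_keys = ON")

keyed_connection.execute(
    "CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, "
    "customer_name TEXT, cashier_name TEXT, year INTEGER)"
)
keyed_connection.execute(
    "CREATE TABLE albums_sold (album_id INTEGER PRIMARY KEY, "
    "transaction_id INTEGER REFERENCES transactions (transaction_id), "
    "album_name TEXT)"
)

keyed_connection.execute("INSERT INTO transactions VALUES (1, 'Amanda', 'Sam', 2000)")
keyed_connection.execute("INSERT INTO albums_sold VALUES (1, 1, 'Rubber Soul')")

# Both statements below violate a constraint and should be rejected
rejected = []
for label, statement in [
    ("duplicate album_id", "INSERT INTO albums_sold VALUES (1, 1, 'Let it Be')"),
    ("unknown transaction_id", "INSERT INTO albums_sold VALUES (2, 99, 'Help!')"),
]:
    try:
        keyed_connection.execute(statement)
    except sqlite3.IntegrityError:
        rejected.append(label)

print(rejected)
keyed_connection.close()
```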

In [21]:
# Build the query to create the transactions table
query = "CREATE TABLE IF NOT EXISTS transactions "
query += "(transaction_id INT, customer_name VARCHAR, cashier_name VARCHAR, year INT)"

# Execute the query and create the table
cursor.execute(query)

In [22]:
# Build the base INSERT query
query = "INSERT INTO transactions "

# add column names to the query
query += "(transaction_id, customer_name, cashier_name, year) " 

# add placeholders of the columns to the query
query = query + "VALUES (%s, %s, %s, %s)"

In [23]:
# Insert first row
cursor.execute(query, (1, 'Amanda', 'Sam', 2000))

In [24]:
# Insert second row
cursor.execute(query, (2, 'Toby', 'Sam', 2000))

In [25]:
# Insert third row
cursor.execute(query, (3, 'Max', 'Bob', 2018))

In [26]:
# Confirm that the data were inserted successfully
query = "SELECT * FROM transactions"

# Execute the query (execute() returns None; rows are read from the cursor)
cursor.execute(query)

# Fetch first row
row = cursor.fetchone()

# Print rows until fetchone() returns None
while row:
    
    # print current fetched row
    print(row)
    
    # Fetch next row
    row = cursor.fetchone()

(1, 'Amanda', 'Sam', 2000)
(2, 'Toby', 'Sam', 2000)
(3, 'Max', 'Bob', 2018)


In [27]:
# Build the query to create the albums sold table
query = "CREATE TABLE IF NOT EXISTS albums_sold "
query += "(album_id INT, transaction_id INT, album_name VARCHAR)"

# Execute the query and create the table
cursor.execute(query)

In [28]:
# Build the base INSERT query
query = "INSERT INTO albums_sold "

# add column names to the query
query += "(album_id, transaction_id, album_name) " 

# add placeholders of the columns to the query
query += "VALUES (%s, %s, %s)"

In [29]:
# Insert first row
cursor.execute(query, (1, 1, 'Rubber Soul'))

In [30]:
# Insert second row
cursor.execute(query, (2, 1, 'Let it Be'))

In [31]:
# Insert third row
cursor.execute(query, (3, 2, 'My Generation'))

In [32]:
# Insert fourth row
cursor.execute(query, (4, 3, 'Meet the Beatles'))

In [33]:
# Insert fifth row
cursor.execute(query, (5, 3, 'Help!'))

In [34]:
# Confirm that the data were inserted successfully
query = "SELECT * FROM albums_sold"

# Execute the query (execute() returns None; rows are read from the cursor)
cursor.execute(query)

# Fetch first row
row = cursor.fetchone()

# Print rows until fetchone() returns None
while row:
    
    # print current fetched row
    print(row)
    
    # Fetch next row
    row = cursor.fetchone()

(1, 1, 'Rubber Soul')
(2, 1, 'Let it Be')
(3, 2, 'My Generation')
(4, 3, 'Meet the Beatles')
(5, 3, 'Help!')


### TO-DO: Do a `JOIN` on these tables to get all the information in the original 1NF table. 

We want the same columns as the original table:  
transaction_id, customer_name, cashier_name, year, album_name

Original Table:  
(1, 'Amanda', 'Sam', 2000, 'Rubber Soul')  
(1, 'Amanda', 'Sam', 2000, 'Let it Be')  
(2, 'Toby', 'Sam', 2000, 'My Generation') 
(3, 'Max', 'Bob', 2018, 'Meet the Beatles')  
(3, 'Max', 'Bob', 2018, 'Help!')  

In [35]:
# Build the JOIN query
query = "SELECT transactions.transaction_id, transactions.customer_name, "
query += "transactions.cashier_name, transactions.year, albums_sold.album_name "
query += "FROM transactions JOIN albums_sold "
query += "ON transactions.transaction_id = albums_sold.transaction_id"

# Execute the query (execute() returns None; rows are read from the cursor)
cursor.execute(query)

# Fetch first row
row = cursor.fetchone()

# Print rows until fetchone() returns None
while row:
    
    # print current fetched row
    print(row)
    
    # Fetch next row
    row = cursor.fetchone()

(1, 'Amanda', 'Sam', 2000, 'Rubber Soul')
(1, 'Amanda', 'Sam', 2000, 'Let it Be')
(2, 'Toby', 'Sam', 2000, 'My Generation')
(3, 'Max', 'Bob', 2018, 'Meet the Beatles')
(3, 'Max', 'Bob', 2018, 'Help!')
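The same two-table `JOIN` can be written more compactly with table aliases, and an `ORDER BY` makes the row order explicit, since SQL does not guarantee any order without one. The sketch below uses `sqlite3` as a stand-in so it runs without a server (an assumption of this example); the aliases `t` and `a` and the `join_` variable names are hypothetical.

```python
import sqlite3

join_connection = sqlite3.connect(":memory:")
join_cursor = join_connection.cursor()

# Recreate the two 2NF tables with the same columns as in the notebook
join_cursor.execute(
    "CREATE TABLE transactions (transaction_id INT, customer_name TEXT, "
    "cashier_name TEXT, year INT)"
)
join_cursor.execute(
    "CREATE TABLE albums_sold (album_id INT, transaction_id INT, album_name TEXT)"
)
join_cursor.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(1, 'Amanda', 'Sam', 2000), (2, 'Toby', 'Sam', 2000), (3, 'Max', 'Bob', 2018)],
)
join_cursor.executemany(
    "INSERT INTO albums_sold VALUES (?, ?, ?)",
    [(1, 1, 'Rubber Soul'), (2, 1, 'Let it Be'), (3, 2, 'My Generation'),
     (4, 3, 'Meet the Beatles'), (5, 3, 'Help!')],
)

# Aliases shorten the column references; ORDER BY fixes the output order
join_query = (
    "SELECT t.transaction_id, t.customer_name, t.cashier_name, t.year, a.album_name "
    "FROM transactions AS t "
    "JOIN albums_sold AS a ON t.transaction_id = a.transaction_id "
    "ORDER BY a.album_id"
)
join_cursor.execute(join_query)
joined_rows = join_cursor.fetchall()

for row in joined_rows:
    print(row)

join_connection.close()
```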


In [36]:
# Drop the transactions table so it can be recreated for 3NF
cursor.execute("DROP TABLE transactions")

#### Moving to 3rd Normal Form (3NF)
Check our table for any transitive dependencies.  
_HINT:_ The _Cashier Name_ column in _Transactions_ can be moved to its own table, called _Employees_, which will leave us with 3 tables. 
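The split can be sketched in plain Python first: collect the distinct cashier names, give each one an ID, and replace the name in every transaction with that ID. Assigning IDs in order of first appearance is an assumption of this sketch; it happens to produce the same IDs the notebook uses.

```python
# The 2NF transactions table, copied from the output above
transactions_2nf = [
    (1, 'Amanda', 'Sam', 2000),
    (2, 'Toby', 'Sam', 2000),
    (3, 'Max', 'Bob', 2018),
]

# Assign each distinct cashier an ID in order of first appearance
employee_ids = {}
for _, _, cashier_name, _ in transactions_2nf:
    if cashier_name not in employee_ids:
        employee_ids[cashier_name] = len(employee_ids) + 1

# Rows for the new employees table: (employee_id, employee_name)
employees = [(employee_id, name) for name, employee_id in employee_ids.items()]

# Rows for the new transactions table: the cashier name becomes a cashier ID
transactions_3nf = [
    (transaction_id, customer_name, employee_ids[cashier_name], year)
    for transaction_id, customer_name, cashier_name, year in transactions_2nf
]

print(employees)
print(transactions_3nf)
```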


### TO-DO: Create the third table named *employees* to move to 3rd NF. 


`Table Name: transactions 
column 0: Transaction Id
column 1: Customer Name
column 2: Cashier Id
column 3: Year `

`Table Name: albums_sold
column 0: Album Id
column 1: Transaction Id
column 2: Album Name` 

`Table Name: employees
column 0: Employee Id
column 1: Employee Name `

In [37]:
# Build the query to create the employees table
query = "CREATE TABLE IF NOT EXISTS employees "
query += "(employee_id INT, employee_name VARCHAR)"

# Execute the query and create the table
cursor.execute(query)

In [38]:
# set fundamental query
query = "INSERT INTO employees "
query += "(employee_id, employee_name) "
query += "VALUES (%s, %s)"

In [39]:
# add the two employees to the table
cursor.execute(query, (1, 'Sam'))
cursor.execute(query, (2, 'Bob'))

In [40]:
# Confirm that the data were inserted successfully
query = "SELECT * FROM employees"

# Execute the query (execute() returns None; rows are read from the cursor)
cursor.execute(query)

# Fetch first row
row = cursor.fetchone()

# Print rows until fetchone() returns None
while row:
    
    # print current fetched row
    print(row)
    
    # Fetch next row
    row = cursor.fetchone()

(1, 'Sam')
(2, 'Bob')


In [41]:
# Build the query to create the transactions table
query = "CREATE TABLE IF NOT EXISTS transactions "
query += "(transaction_id INT, customer_name VARCHAR, cashier_id INT, year INT)"

# Execute the query and create the table
cursor.execute(query)

In [42]:
# Set the fundamental query
query = "INSERT INTO transactions "
query += "(transaction_id, customer_name, cashier_id, year) "
query += "VALUES (%s, %s, %s, %s)"

In [43]:
# add first transaction
cursor.execute(query, (1, 'Amanda', 1, 2000))

# add second transaction
cursor.execute(query, (2, 'Toby', 1, 2000))

# add third transaction
cursor.execute(query, (3, 'Max', 2, 2018))


In [44]:
# Confirm that the data were inserted successfully
query = "SELECT * FROM transactions"

# Execute the query (execute() returns None; rows are read from the cursor)
cursor.execute(query)

# Fetch first row
row = cursor.fetchone()

# Print rows until fetchone() returns None
while row:
    
    # print current fetched row
    print(row)
    
    # Fetch next row
    row = cursor.fetchone()

(1, 'Amanda', 1, 2000)
(2, 'Toby', 1, 2000)
(3, 'Max', 2, 2018)


### TO-DO: Complete the two `JOIN`s across these 3 tables so we can recover all the information we had in our first table. 

In [45]:
query = "SELECT transactions.transaction_id, "
query += "transactions.customer_name, "
query += "employees.employee_name, "
query += "transactions.year, "
query += "albums_sold.album_name "
query += "FROM (transactions JOIN employees "
query += "ON transactions.cashier_id = employees.employee_id) "
query += "JOIN albums_sold ON transactions.transaction_id = albums_sold.transaction_id"

# Execute the query (execute() returns None; rows are read from the cursor)
cursor.execute(query)

# Fetch first row
row = cursor.fetchone()

# Print rows until fetchone() returns None
while row:
    
    # print current fetched row
    print(row)
    
    # Fetch next row
    row = cursor.fetchone()

(1, 'Amanda', 'Sam', 2000, 'Rubber Soul')
(1, 'Amanda', 'Sam', 2000, 'Let it Be')
(2, 'Toby', 'Sam', 2000, 'My Generation')
(3, 'Max', 'Bob', 2018, 'Meet the Beatles')
(3, 'Max', 'Bob', 2018, 'Help!')
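One way to confirm that normalization lost no information is to rebuild the rows from the three tables and compare them with the 1NF rows programmatically. The sketch below does this with `sqlite3` as a stand-in so it runs without a server (an assumption of this example); the aliases and the `check_` variable names are hypothetical.

```python
import sqlite3

check_connection = sqlite3.connect(":memory:")
check_cursor = check_connection.cursor()

# Recreate the three 3NF tables with the same columns as in the notebook
check_cursor.execute("CREATE TABLE employees (employee_id INT, employee_name TEXT)")
check_cursor.execute(
    "CREATE TABLE transactions (transaction_id INT, customer_name TEXT, "
    "cashier_id INT, year INT)"
)
check_cursor.execute(
    "CREATE TABLE albums_sold (album_id INT, transaction_id INT, album_name TEXT)"
)

check_cursor.executemany(
    "INSERT INTO employees VALUES (?, ?)", [(1, 'Sam'), (2, 'Bob')]
)
check_cursor.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(1, 'Amanda', 1, 2000), (2, 'Toby', 1, 2000), (3, 'Max', 2, 2018)],
)
check_cursor.executemany(
    "INSERT INTO albums_sold VALUES (?, ?, ?)",
    [(1, 1, 'Rubber Soul'), (2, 1, 'Let it Be'), (3, 2, 'My Generation'),
     (4, 3, 'Meet the Beatles'), (5, 3, 'Help!')],
)

# Join all three tables; ORDER BY makes the comparison deterministic
check_cursor.execute(
    "SELECT t.transaction_id, t.customer_name, e.employee_name, t.year, a.album_name "
    "FROM transactions AS t "
    "JOIN employees AS e ON t.cashier_id = e.employee_id "
    "JOIN albums_sold AS a ON t.transaction_id = a.transaction_id "
    "ORDER BY a.album_id"
)
rebuilt_rows = check_cursor.fetchall()

# The rows of the 1NF table from earlier in the exercise
expected_rows = [
    (1, 'Amanda', 'Sam', 2000, 'Rubber Soul'),
    (1, 'Amanda', 'Sam', 2000, 'Let it Be'),
    (2, 'Toby', 'Sam', 2000, 'My Generation'),
    (3, 'Max', 'Bob', 2018, 'Meet the Beatles'),
    (3, 'Max', 'Bob', 2018, 'Help!'),
]

matches_original = rebuilt_rows == expected_rows
print(matches_original)

check_connection.close()
```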


<img src="images/table12.png" width="650" height="650">

### Awesome work!! You have Normalized the dataset! 

### Drop all tables to clean up

In [46]:
cursor.execute("DROP TABLE transactions, employees, albums_sold")

### And finally close your cursor and connection. 

In [47]:
cursor.close()
connection.close()
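If an error occurs partway through a notebook like this one, the explicit `close()` calls may never run. A common pattern is to wrap the objects in `contextlib.closing` so they are closed even when an exception is raised. The sketch below uses `sqlite3` as a stand-in so it runs without a server, which is an assumption of this example. Keep in mind that with psycopg2, a plain `with connection:` block ends the transaction but does not close the connection, so `closing` (or an explicit `close()`) is still needed.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on exit, even if the body raises an exception
with closing(sqlite3.connect(":memory:")) as managed_connection:
    with closing(managed_connection.cursor()) as managed_cursor:
        managed_cursor.execute("SELECT 1")
        result = managed_cursor.fetchone()

# After the block, the connection is closed and can no longer be used
try:
    managed_connection.execute("SELECT 1")
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True

print(result, connection_closed)
```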