### Import the library 
Note: An error might pop up after this command has executed. If it does, read it carefully before ignoring it.

In [1]:
import psycopg2

In [3]:
conn=psycopg2.connect('host=127.0.0.1 dbname=data_engineering user=postgres password=macbook')

In [4]:
cur=conn.cursor()

In [5]:
conn.set_session(autocommit=True)

#### Let's start with the normalized (3NF) set of tables from the last exercise, to which we have added a new table, `sales`.

`Table Name: transactions2 
column 0: transaction Id
column 1: Customer Name
column 2: Cashier Id
column 3: Year `

`Table Name: albums_sold
column 0: Album Id
column 1: Transaction Id
column 2: Album Name`

`Table Name: employees
column 0: Employee Id
column 1: Employee Name `

`Table Name: sales
column 0: Transaction Id
column 1: Amount Spent
`
<img src="images/table16.png" width="450" height="450"> <img src="images/table15.png" width="450" height="450"> <img src="images/table17.png" width="350" height="350"> <img src="images/table18.png" width="350" height="350">


### Add all Create statements for all Tables and Insert data into the tables

In [8]:
cur.execute("create table  if not exists transactions2(transaction_id int,customer_name varchar,cashier_id int,year int)")
cur.execute("create table  if not exists albums_sold(album_id int,transaction_id int,album_name varchar)")
cur.execute("create table  if not exists employees(employee_id int,employee_name varchar)")
cur.execute("create table  if not exists sales(transaction_id int,amount_spent int)")

# TO-DO: Insert data into the tables    


In [9]:
query="insert into transactions2(transaction_id,customer_name,cashier_id,year)values(%s,%s,%s,%s)"
cur.execute(query,(1,"adam",1,1990))

In [10]:
cur.execute(query,(2,"adam",1,1990))

In [11]:
cur.execute(query,(3,"tommy",1,1990))

In [12]:
cur.execute(query,(4,"gunk",2,1990))

In [21]:
query="insert into albums_sold(album_id,transaction_id,album_name)values(%s,%s,%s)"
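# Transaction 1 has two albums; the printed output further down shows both rows.
cur.execute(query,(1,1,"shiva"))
cur.execute(query,(5,1,"jackie"))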

In [14]:
cur.execute(query,(2,2,"ponny"))
cur.execute(query,(3,3,"sunny"))
cur.execute(query,(4,4,"staircase to hevan"))

In [15]:
query="insert into employees(employee_id,employee_name)values(%s,%s)"
cur.execute(query,(1,'Bob'))
cur.execute(query,(2,'Sam'))

In [16]:
query="INSERT INTO sales (transaction_id, amount_spent)VALUES (%s, %s)"

In [17]:
cur.execute(query,(1,100))
cur.execute(query,(2,10))
cur.execute(query,(3,400))
cur.execute(query,(4,100000))
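
The row-by-row inserts above can also be written as a single batched call. The following is a minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server; the names `demo_conn`, `demo_cur`, and `sales_rows` are hypothetical and chosen so they do not overwrite the notebook's `conn` and `cur`. With `psycopg2`, `cursor.executemany` is called the same way, but the placeholders are `%s` rather than `?`.

```python
import sqlite3

# In-memory SQLite database used only for illustration (assumption:
# the real notebook talks to PostgreSQL through psycopg2).
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE sales (transaction_id INT, amount_spent INT)")

# Same sample rows as the individual INSERT statements in the notebook.
sales_rows = [(1, 100), (2, 10), (3, 400), (4, 100000)]

# One call inserts every tuple in the list.
demo_cur.executemany(
    "INSERT INTO sales (transaction_id, amount_spent) VALUES (?, ?)",
    sales_rows,
)
demo_conn.commit()

inserted = demo_cur.execute(
    "SELECT transaction_id, amount_spent FROM sales ORDER BY transaction_id"
).fetchall()
print(inserted)
demo_conn.close()
```

Keeping the sample data in one list also makes it easier to see at a glance which rows a table is expected to contain.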


#### Confirm with `SELECT` statements that the data was added correctly

In [22]:
print("Table: transactions2\n")
try: 
    cur.execute("SELECT * FROM transactions2;")
except psycopg2.Error as e: 
    print("Error: select *")
    print (e)

row = cur.fetchone()
while row:
   print(row)
   row = cur.fetchone()

print("\nTable: albums_sold\n")
try: 
    cur.execute("SELECT * FROM albums_sold;")
except psycopg2.Error as e: 
    print("Error: select *")
    print (e)

row = cur.fetchone()
while row:
   print(row)
   row = cur.fetchone()

print("\nTable: employees\n")
try: 
    cur.execute("SELECT * FROM employees;")
except psycopg2.Error as e: 
    print("Error: select *")
    print (e)

row = cur.fetchone()
while row:
   print(row)
   row = cur.fetchone()
    
print("\nTable: sales\n")
try: 
    cur.execute("SELECT * FROM sales;")
except psycopg2.Error as e: 
    print("Error: select *")
    print (e)

row = cur.fetchone()
while row:
   print(row)
   row = cur.fetchone()

Table: transactions2

(1, 'adam', 1, 1990)
(2, 'adam', 1, 1990)
(3, 'tommy', 1, 1990)
(4, 'gunk', 2, 1990)

Table: albums_sold

(1, 1, 'shiva')
(2, 2, 'ponny')
(3, 3, 'sunny')
(4, 4, 'staircase to hevan')
(5, 1, 'jackie')

Table: employees

(1, 'Bob')
(2, 'Sam')

Table: sales

(1, 100)
(2, 10)
(3, 400)
(4, 100000)
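
The four select-and-print blocks above differ only in the table name, so they could be folded into a small helper. This is a sketch under stated assumptions: `fetch_table` and `print_table` are hypothetical names, and `sqlite3` stands in for PostgreSQL so the example runs without a server. The helper only relies on `cursor.execute` and `cursor.fetchall`, which `psycopg2` cursors also provide.

```python
import sqlite3


def fetch_table(cursor, table_name):
    """Return every row of table_name as a list of tuples."""
    # Table names cannot be passed as bound parameters, so only use
    # trusted, hard-coded names here (never user input).
    cursor.execute(f"SELECT * FROM {table_name}")
    return cursor.fetchall()


def print_table(cursor, table_name):
    """Print a header followed by one line per row."""
    print(f"\nTable: {table_name}\n")
    for row in fetch_table(cursor, table_name):
        print(row)


# Illustration with an in-memory SQLite database (assumption).
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE employees (employee_id INT, employee_name VARCHAR)")
demo_cur.executemany(
    "INSERT INTO employees VALUES (?, ?)", [(1, "Bob"), (2, "Sam")]
)

print_table(demo_cur, "employees")
employee_rows = fetch_table(demo_cur, "employees")
demo_conn.close()
```

In the notebook itself, a call such as `print_table(cur, "sales")` would replace one of the repeated blocks.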


### Let's say you need to do a query that gives:

`transaction_id
 customer_name
 cashier name
 year 
 albums sold
 amount spent`

### Complete the statement below to perform a 3-way `JOIN` on the 4 tables you have created.

In [23]:
try: 
    cur.execute("SELECT \
        transactions2.transaction_id AS transaction_id, \
        transactions2.customer_name  AS customer_name, \
        employees.employee_name      AS cashier_name, \
        transactions2.year           AS year, \
        albums_sold.album_name       AS albums_sold, \
        sales.amount_spent           AS amount_spent \
    FROM \
        transactions2 \
        JOIN employees ON (employees.employee_id = transactions2.cashier_id) \
        JOIN albums_sold ON (albums_sold.transaction_id = transactions2.transaction_id) \
        JOIN sales ON (sales.transaction_id = transactions2.transaction_id)")
    
    
except psycopg2.Error as e: 
    print("Error: select *")
    print (e)

row = cur.fetchone()
while row:
   print(row)
   row = cur.fetchone()

(1, 'adam', 'Bob', 1990, 'shiva', 100)
(1, 'adam', 'Bob', 1990, 'jackie', 100)
(2, 'adam', 'Bob', 1990, 'ponny', 10)
(3, 'tommy', 'Bob', 1990, 'sunny', 400)
(4, 'gunk', 'Sam', 1990, 'staircase to hevan', 100000)
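
Transaction 1 appears twice in the result because it has two rows in `albums_sold`, and the join repeats its `amount_spent` for each album. The sketch below illustrates why that matters if the joined rows are later aggregated. It assumes `sqlite3` as a stand-in for PostgreSQL, and the names `demo_conn`, `demo_cur`, `total_from_sales`, and `total_from_join` are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only for illustration (assumption).
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute(
    "CREATE TABLE albums_sold (album_id INT, transaction_id INT, album_name VARCHAR)"
)
demo_cur.execute("CREATE TABLE sales (transaction_id INT, amount_spent INT)")

# Same sample data as the notebook output above.
demo_cur.executemany(
    "INSERT INTO albums_sold VALUES (?, ?, ?)",
    [
        (1, 1, "shiva"),
        (2, 2, "ponny"),
        (3, 3, "sunny"),
        (4, 4, "staircase to hevan"),
        (5, 1, "jackie"),
    ],
)
demo_cur.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100), (2, 10), (3, 400), (4, 100000)],
)

# Summing directly over sales counts each transaction once.
total_from_sales = demo_cur.execute(
    "SELECT SUM(amount_spent) FROM sales"
).fetchone()[0]

# Summing over the join counts transaction 1 twice (once per album).
total_from_join = demo_cur.execute(
    "SELECT SUM(sales.amount_spent) FROM sales "
    "JOIN albums_sold ON albums_sold.transaction_id = sales.transaction_id"
).fetchone()[0]

print(total_from_sales)  # 100510
print(total_from_join)   # 100610
demo_conn.close()
```

The difference of 100 is exactly the amount of transaction 1, which the join lists once for each of its two albums.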


#### Great, we were able to get the data we wanted.

### But we had to perform a 3-way `JOIN` to get there. While it's great to have that flexibility, remember that `JOIN`s are slow, and if we have a read-heavy workload that requires low-latency queries, we want to reduce the number of `JOIN`s. Let's think about denormalizing our normalized tables.

### With denormalization, you think about the queries you are running and how to reduce the number of `JOIN`s, even if that means duplicating data. The following are the queries you need to run.


#### Query 1 : `select transaction_id, customer_name, amount_spent FROM <min number of tables>` 
It should generate the amount spent on each transaction 
#### Query 2: `select cashier_name, SUM(amount_spent) FROM <min number of tables> GROUP BY cashier_name` 
It should generate the total sales by cashier 

###  Query 1: `select transaction_id, customer_name, amount_spent FROM <min number of tables>`

One way to do this would be a `JOIN` on the `sales` and `transactions2` tables, but we want to minimize the use of `JOIN`s.

To reduce the number of tables, first add `amount_spent` to the `transactions` table so that you will not need to do a JOIN at all. 

`Table Name: transactions 
column 0: transaction Id
column 1: Customer Name
column 2: Cashier Name
column 3: Year
column 4: amount_spent`

<img src="images/table19.png" width="450" height="450">


In [32]:
cur.execute("create table if not exists transactions(transaction_id int,customer_name varchar,cashier_name varchar,year int,amount_spent int)")

In [35]:
query="insert into transactions(transaction_id,customer_name,cashier_name,year,amount_spent)values(%s,%s,%s,%s,%s)"

In [36]:
cur.execute(query,                (1, 'Amanda', 'Sam', 2000, 40))

In [37]:
cur.execute(query,                 (2, 'Toby', 'Sam', 2000, 19))
cur.execute(query,                 (3, 'Max', 'Bob', 2018, 45))


In [38]:
try: 
    cur.execute("SELECT transaction_id, customer_name, amount_spent FROM transactions")
        
except psycopg2.Error as e: 
    print("Error: select *")
    print (e)

row = cur.fetchone()
while row:
   print(row)
   row = cur.fetchone()

(1, 'Amanda', 40)
(2, 'Toby', 19)
(3, 'Max', 45)
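
The denormalized `transactions` table above was filled by hand with sample rows. In practice, it would more likely be derived from the normalized tables, paying the `JOIN` cost once at load time instead of on every read. The following is a minimal sketch of that idea, assuming `sqlite3` as a stand-in for PostgreSQL and reusing the sample rows of `transactions2`, `employees`, and `sales` from earlier; `demo_conn`, `demo_cur`, and `query1_rows` are hypothetical names.

```python
import sqlite3

# In-memory SQLite database used only for illustration (assumption).
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()

# Normalized source tables plus the empty denormalized target table.
demo_cur.executescript("""
    CREATE TABLE transactions2 (transaction_id INT, customer_name VARCHAR,
                                cashier_id INT, year INT);
    CREATE TABLE employees (employee_id INT, employee_name VARCHAR);
    CREATE TABLE sales (transaction_id INT, amount_spent INT);
    CREATE TABLE transactions (transaction_id INT, customer_name VARCHAR,
                               cashier_name VARCHAR, year INT, amount_spent INT);

    INSERT INTO transactions2 VALUES
        (1, 'adam', 1, 1990), (2, 'adam', 1, 1990),
        (3, 'tommy', 1, 1990), (4, 'gunk', 2, 1990);
    INSERT INTO employees VALUES (1, 'Bob'), (2, 'Sam');
    INSERT INTO sales VALUES (1, 100), (2, 10), (3, 400), (4, 100000);
""")

# The JOINs run once here, when the denormalized table is populated.
demo_cur.execute("""
    INSERT INTO transactions
    SELECT t.transaction_id, t.customer_name, e.employee_name,
           t.year, s.amount_spent
    FROM transactions2 AS t
    JOIN employees AS e ON e.employee_id = t.cashier_id
    JOIN sales AS s ON s.transaction_id = t.transaction_id
""")

# Query 1 now reads a single table, with no JOIN.
query1_rows = demo_cur.execute(
    "SELECT transaction_id, customer_name, amount_spent "
    "FROM transactions ORDER BY transaction_id"
).fetchall()
print(query1_rows)
demo_conn.close()
```

The trade-off is that `amount_spent` and the cashier name now live in two places, so any later change to the source tables has to be propagated to `transactions` as well.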


### Query 2: `select cashier_name, SUM(amount_spent) FROM <min number of tables> GROUP BY cashier_name` 

To avoid using any `JOINS`, first create a new table with just the information we need. 

`Table Name: cashier_sales
column 0: Transaction Id
column 1: Cashier Name
column 2: Cashier Id
column 3: Amount Spent
`


### TO-DO: Create a new table with just the information you need.

In [59]:
cur.execute("create table if not exists cashier_sales(transaction_id int,cashier_name varchar, cashier_id int,amount_spent int)")

In [60]:
query="insert into cashier_sales(transaction_id,cashier_name,cashier_id,amount_spent)values(%s,%s,%s,%s)"

In [61]:
cur.execute(query,                 (1, 'Sam', 1, 40))
cur.execute(query,                 (2, 'Sam', 1, 19))
cur.execute(query,                 (3, 'Bob', 2, 45))


In [64]:
cur.execute("SELECT cashier_name, sum(amount_spent) FROM cashier_sales group by cashier_name ")


In [65]:
row = cur.fetchone()
while row:
   print(row)
   row = cur.fetchone()

('Sam', 59)
('Bob', 45)
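
One way to gain confidence in a denormalized table is to check that it gives the same totals as the equivalent query on the normalized tables. The sketch below does this for the cashier totals, assuming `sqlite3` as a stand-in for PostgreSQL and the sample rows of `transactions2`, `employees`, and `sales` from earlier in the notebook; `demo_conn`, `demo_cur`, `totals_denormalized`, and `totals_joined` are hypothetical names.

```python
import sqlite3

# In-memory SQLite database used only for illustration (assumption).
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.executescript("""
    CREATE TABLE transactions2 (transaction_id INT, customer_name VARCHAR,
                                cashier_id INT, year INT);
    CREATE TABLE employees (employee_id INT, employee_name VARCHAR);
    CREATE TABLE sales (transaction_id INT, amount_spent INT);
    CREATE TABLE cashier_sales (transaction_id INT, cashier_name VARCHAR,
                                cashier_id INT, amount_spent INT);

    INSERT INTO transactions2 VALUES
        (1, 'adam', 1, 1990), (2, 'adam', 1, 1990),
        (3, 'tommy', 1, 1990), (4, 'gunk', 2, 1990);
    INSERT INTO employees VALUES (1, 'Bob'), (2, 'Sam');
    INSERT INTO sales VALUES (1, 100), (2, 10), (3, 400), (4, 100000);

    -- Populate the denormalized table once from the normalized ones.
    INSERT INTO cashier_sales
    SELECT t.transaction_id, e.employee_name, e.employee_id, s.amount_spent
    FROM transactions2 AS t
    JOIN employees AS e ON e.employee_id = t.cashier_id
    JOIN sales AS s ON s.transaction_id = t.transaction_id;
""")

# Query 2 on the denormalized table: no JOIN needed.
totals_denormalized = demo_cur.execute(
    "SELECT cashier_name, SUM(amount_spent) FROM cashier_sales "
    "GROUP BY cashier_name ORDER BY cashier_name"
).fetchall()

# The same totals computed the normalized way, with two JOINs.
totals_joined = demo_cur.execute(
    "SELECT e.employee_name, SUM(s.amount_spent) "
    "FROM transactions2 AS t "
    "JOIN employees AS e ON e.employee_id = t.cashier_id "
    "JOIN sales AS s ON s.transaction_id = t.transaction_id "
    "GROUP BY e.employee_name ORDER BY e.employee_name"
).fetchall()

print(totals_denormalized)
print(totals_joined)
demo_conn.close()
```

The explicit `ORDER BY` is there because the row order of a `GROUP BY` result is otherwise not guaranteed, which would make the two lists awkward to compare.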


In [58]:
cur.execute("drop table cashier_sales")

### Drop the tables


In [66]:
# IF EXISTS keeps the cleanup from failing when a table was already dropped.
for table in ["sales", "employees", "albums_sold", "transactions", "transactions2", "cashier_sales"]:
    try:
        cur.execute(f"DROP TABLE IF EXISTS {table}")
    except psycopg2.Error as e:
        print("Error: Dropping table")
        print(e)

### Close the cursor and the connection

In [67]:
cur.close()
conn.close()