# Normalisation - books purchased example (optional)

In this Notebook you can follow the normalisation of the books purchased data described in Activity 10.2 
using SQL to create a set of normalised tables from unnormalised data shown in Figure 10.21, 
which represents the same information as shown in Figure 10.2 but as a relation.

Enable access to the PostgreSQL database engine via [SQL Cell Magic](https://pypi.python.org/pypi/ipython-sql).

In [None]:
%load_ext sql
%sql postgresql://test:test@localhost:5432/tm351test

## Unnormalised Form (UNF)

Create the `books_purchased` table which will represent the `books_purchased` relation shown in Figure 10.21.

In [None]:
%%sql
DROP TABLE IF EXISTS books_purchased CASCADE;

CREATE TABLE books_purchased (
 invoice_no CHAR(8) NOT NULL,
 date DATE NOT NULL,
 customer_no CHAR(6) NOT NULL,
 customer_name VARCHAR(25) NOT NULL,
 isbn CHAR(14) NOT NULL,
 title VARCHAR(100) NOT NULL,
 quantity INTEGER NOT NULL,
 cost DECIMAL(5,2) NOT NULL,
 PRIMARY KEY (invoice_no, isbn)
);

Populate the `books_purchased` table from the data file `data/books_purchased.dat` using [Psycopg](http://initd.org/psycopg/docs/index.html), 
a PostgreSQL database adapter for Python.

In [None]:
import psycopg2 as pg
import pandas as pd
import pandas.io.sql as psqlg

In [None]:
# open a connection to the PostgreSQL database tm351test
conn = pg.connect(dbname='tm351test', host='localhost', user='test', password='test', port=5432)
# create a cursor
c = conn.cursor()
# open books_purchased.dat; the file is closed automatically when the block ends
with open('data/books_purchased.dat', 'r') as f:
    # execute the PostgreSQL copy command
    c.copy_from(f, 'books_purchased')
# commit transaction
conn.commit()
# close cursor
c.close()
# close database connection
conn.close()

In [None]:
%%sql
SELECT *
FROM books_purchased
ORDER BY invoice_no, isbn;

## Moving to First Normal Form (1NF)

In the unnormalised data above (the `books_purchased` table), there are several values for the `isbn`, `title`, 
`quantity` and `cost` attributes (columns) for each invoice. These attributes form a repeating group, which is removed to a 
separate relation representing order items (the `order_item_book` table). Its primary key comprises the 
`invoice_no` and `isbn` attributes, because a customer order may involve the purchase of several books.

In [None]:
%%sql
DROP TABLE IF EXISTS order_item_book CASCADE;

CREATE TABLE order_item_book AS
 SELECT DISTINCT invoice_no, isbn, title, quantity, cost
 FROM books_purchased;

SELECT *
FROM order_item_book
ORDER BY invoice_no, isbn;

Notes:
    
Remember that we need to include the `DISTINCT` keyword in the `SELECT` clause in order to achieve the same effect as 
a relational algebra project operation (see Exercise 9.6).
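
The effect of `DISTINCT` can be seen in a small sketch. It uses Python's built-in `sqlite3` module instead of the PostgreSQL database used in this Notebook, and the `purchases` table and its rows are hypothetical sample data, not the contents of Figure 10.21. The sketch projects a table onto its order-level columns with and without `DISTINCT`.

```python
import sqlite3

# In-memory SQLite database used purely for illustration;
# the table name and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (invoice_no TEXT, customer_no TEXT, isbn TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("INV00001", "C00001", "isbn-a"),
        ("INV00001", "C00001", "isbn-b"),  # same invoice, second book
        ("INV00002", "C00002", "isbn-a"),
    ],
)

# Without DISTINCT, the result keeps one row per source row,
# so the order-level values for INV00001 appear twice.
plain = conn.execute(
    "SELECT invoice_no, customer_no FROM purchases"
).fetchall()

# With DISTINCT, duplicate rows are eliminated, as in a relational project.
distinct = conn.execute(
    "SELECT DISTINCT invoice_no, customer_no FROM purchases "
    "ORDER BY invoice_no"
).fetchall()

conn.close()
print(len(plain), len(distinct))
```

In this sample the plain projection returns three rows and the `DISTINCT` projection returns two, one per invoice.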

With the repeating group removed to a separate relation, we now consider what remains of the original relation once the 
repeating-group attributes have been removed (the `order_customer` table). As each remaining attribute has a single value 
for each order, this relation is in 1NF.

In [None]:
%%sql
DROP TABLE IF EXISTS order_customer CASCADE;

CREATE TABLE order_customer AS
 SELECT DISTINCT invoice_no, date, customer_no, customer_name
 FROM books_purchased;

SELECT *
FROM order_customer
ORDER BY invoice_no;

As both new relations have an attribute in common, `invoice_no`, the original relation (the `books_purchased` table) 
can be recreated from these relations by performing a join operation on `invoice_no`.

In [None]:
books_purchased = \
 %sql SELECT * \
      FROM books_purchased \
      ORDER BY invoice_no, isbn
    
recreated_books_purchased = \
 %sql SELECT invoice_no, date, customer_no, customer_name, isbn, title, quantity, cost \
      FROM order_item_book NATURAL JOIN order_customer \
      ORDER BY invoice_no, isbn
    
books_purchased == recreated_books_purchased

Notes:
    
In the `SELECT` statement that recreates the `books_purchased` table, the `NATURAL JOIN` clause specifies that the 
join is to be performed on the columns that are common to both tables, i.e. `invoice_no`.
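
To make the behaviour of `NATURAL JOIN` concrete, the following sketch compares it with a join that names the common column explicitly through `USING`. It runs against an in-memory SQLite database through Python's `sqlite3` module, not the Notebook's PostgreSQL database, and the rows are hypothetical, cut-down versions of the two tables.

```python
import sqlite3

# In-memory SQLite database for illustration only; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_customer (invoice_no TEXT, customer_no TEXT);
    CREATE TABLE order_item_book (invoice_no TEXT, isbn TEXT, quantity INTEGER);
    INSERT INTO order_customer VALUES
        ('INV00001', 'C00001'),
        ('INV00002', 'C00002');
    INSERT INTO order_item_book VALUES
        ('INV00001', 'isbn-a', 1),
        ('INV00001', 'isbn-b', 2),
        ('INV00002', 'isbn-a', 1);
""")

columns = "invoice_no, customer_no, isbn, quantity"

# NATURAL JOIN: the join columns are inferred from matching column names.
natural = conn.execute(
    f"SELECT {columns} FROM order_item_book NATURAL JOIN order_customer "
    "ORDER BY invoice_no, isbn"
).fetchall()

# JOIN ... USING: the join column is stated explicitly.
using = conn.execute(
    f"SELECT {columns} FROM order_item_book "
    "JOIN order_customer USING (invoice_no) "
    "ORDER BY invoice_no, isbn"
).fetchall()

conn.close()
print(natural == using)
```

Because `NATURAL JOIN` infers its condition from column names, adding a second shared column name to either table would silently change the join. Writing `USING (invoice_no)` is one way to guard against that.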

## Moving to Second Normal Form (2NF)

In the first of the two 1NF relations shown above (the `order_item_book` table), the combination of the 
`invoice_no` and `isbn` attributes determines the `quantity` attribute, but `isbn` alone determines `cost` 
and `title`. Thus, `cost` and `title` are removed from the relation (leaving the `order_item` table), and `isbn`, `cost` and 
`title` form a new relation (the `book` table below), with `isbn` as the primary key.
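
The dependencies described above can be explored with a short sketch. The helper `determines` and the sample rows are hypothetical and not part of the module materials. The helper reports whether, in the given rows, equal values for one set of columns always go with equal values for another set.

```python
def determines(rows, lhs, rhs):
    """Return True if, within rows, the lhs columns determine the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The first value seen for a key is remembered; a different
        # value for the same key contradicts the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows shaped like the order_item_book table.
rows = [
    {"invoice_no": "INV00001", "isbn": "isbn-a", "title": "Book A", "quantity": 1, "cost": 10.00},
    {"invoice_no": "INV00001", "isbn": "isbn-b", "title": "Book B", "quantity": 2, "cost": 15.00},
    {"invoice_no": "INV00002", "isbn": "isbn-a", "title": "Book A", "quantity": 3, "cost": 10.00},
]

isbn_gives_book = determines(rows, ["isbn"], ["title", "cost"])
isbn_gives_quantity = determines(rows, ["isbn"], ["quantity"])
key_gives_quantity = determines(rows, ["invoice_no", "isbn"], ["quantity"])

print(isbn_gives_book, isbn_gives_quantity, key_gives_quantity)
```

A check like this can only refute a dependency for a particular set of rows. Whether a dependency really holds is decided by the meaning of the data, not by a sample.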

In [None]:
%%sql
DROP TABLE IF EXISTS order_item CASCADE;

CREATE TABLE order_item AS
 SELECT DISTINCT invoice_no, isbn, quantity
 FROM order_item_book;

SELECT *
FROM order_item
ORDER BY invoice_no, isbn;

In [None]:
%%sql
DROP TABLE IF EXISTS book CASCADE;

CREATE TABLE book AS
 SELECT DISTINCT isbn, title, cost
 FROM order_item_book;

SELECT *
FROM book
ORDER BY isbn;

As both new relations have an attribute in common, `isbn`, the original relation can be recreated from these 
relations by performing a join operation on `isbn`.

In [None]:
order_item_book = \
 %sql SELECT * \
      FROM order_item_book \
      ORDER BY invoice_no, isbn

recreated_order_item_book = \
 %sql SELECT invoice_no, isbn, title, quantity, cost \
      FROM order_item NATURAL JOIN book \
      ORDER BY invoice_no, isbn

order_item_book == recreated_order_item_book

## Moving to Third Normal Form (3NF)

In the second of the two 1NF relations shown above (the `order_customer` table), the `date` and `customer_no` 
attributes are both directly dependent on `invoice_no`, but `customer_name` is directly dependent on `customer_no`, 
not `invoice_no`. Therefore we create a new relation (the `customer` table) from `customer_no` and `customer_name`, 
with `customer_no` as the primary key. The `customer_no` attribute remains in the original relation (the `order` table), 
because its value is determined by `invoice_no`, and there it acts as a foreign key referencing the new relation.
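
One way to look for evidence of such a dependency in SQL is to group by the candidate determinant and count the distinct dependent values. The sketch below does this with Python's `sqlite3` module and hypothetical rows; the helper name `violations` is an assumption made for this illustration.

```python
import sqlite3

# In-memory SQLite database for illustration only; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_customer (
        invoice_no TEXT, date TEXT, customer_no TEXT, customer_name TEXT
    );
    INSERT INTO order_customer VALUES
        ('INV00001', '2024-01-05', 'C00001', 'Customer One'),
        ('INV00002', '2024-01-06', 'C00002', 'Customer Two'),
        ('INV00003', '2024-01-07', 'C00001', 'Customer One');
""")


def violations(determinant, dependent):
    """List determinant values that occur with more than one dependent value."""
    # Column names are interpolated directly, so this helper is only
    # suitable for fixed, trusted identifiers such as those used here.
    sql = (
        f"SELECT {determinant} FROM order_customer "
        f"GROUP BY {determinant} "
        f"HAVING COUNT(DISTINCT {dependent}) > 1 "
        f"ORDER BY {determinant}"
    )
    return [row[0] for row in conn.execute(sql)]


# customer_no -> customer_name: no customer number has two names.
name_violations = violations("customer_no", "customer_name")

# customer_no -> invoice_no does not hold: C00001 has two invoices.
invoice_violations = violations("customer_no", "invoice_no")

conn.close()
print(name_violations, invoice_violations)
```

An empty result is consistent with the dependency but does not prove it. It shows only that the current rows contain no counterexample.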

In [None]:
%%sql
DROP TABLE IF EXISTS orders CASCADE;

CREATE TABLE orders AS
 SELECT DISTINCT invoice_no, date, customer_no
 FROM order_customer;

SELECT *
FROM orders
ORDER BY invoice_no;

Notes:
    
As `ORDER` is a reserved word in SQL, the `order` table has been named `orders`.

In [None]:
%%sql
DROP TABLE IF EXISTS customer CASCADE;

CREATE TABLE customer AS
 SELECT DISTINCT customer_no, customer_name
 FROM order_customer;

SELECT *
FROM customer
ORDER BY customer_no;

As both new relations have an attribute in common, `customer_no`, the original relation can be recreated from these 
relations by performing a join operation on `customer_no`.

In [None]:
order_customer = \
 %sql SELECT * \
      FROM order_customer \
      ORDER BY invoice_no

recreated_order_customer = \
 %sql SELECT invoice_no, date, customer_no, customer_name \
      FROM orders NATURAL JOIN customer \
      ORDER BY invoice_no

order_customer == recreated_order_customer

## Normalised relations

The final set of relations (tables) is as follows:

In [None]:
%%sql
SELECT *
FROM orders
ORDER BY invoice_no;

The `customer_no` attribute is a foreign key referencing the `customer` table.

In [None]:
%%sql
SELECT *
FROM customer
ORDER BY customer_no;

In [None]:
%%sql
SELECT *
FROM order_item
ORDER BY invoice_no, isbn;

The `invoice_no` attribute is a foreign key referencing the `orders` table, and the `isbn` attribute is a foreign key 
referencing the `book` table.
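
The tables in this Notebook are built with `CREATE TABLE ... AS`, which copies data but does not declare key constraints, so the foreign keys described here are conceptual. As a sketch of what declaring them would enforce, the example below uses Python's `sqlite3` module with hypothetical, cut-down tables; it is not the module's PostgreSQL schema. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

# In-memory SQLite database for illustration only; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE orders (invoice_no TEXT PRIMARY KEY);
    CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE order_item (
        invoice_no TEXT REFERENCES orders (invoice_no),
        isbn TEXT REFERENCES book (isbn),
        quantity INTEGER,
        PRIMARY KEY (invoice_no, isbn)
    );
    INSERT INTO orders VALUES ('INV00001');
    INSERT INTO book VALUES ('isbn-a', 'Book A');
    INSERT INTO order_item VALUES ('INV00001', 'isbn-a', 1);
""")

# An order item that refers to a book missing from the book table
# should be rejected by the foreign key on isbn.
try:
    conn.execute("INSERT INTO order_item VALUES ('INV00001', 'isbn-x', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

item_count = conn.execute("SELECT COUNT(*) FROM order_item").fetchone()[0]
conn.close()
print(rejected, item_count)
```

With the constraints declared, the database itself refuses an order item whose invoice or book does not exist.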

In [None]:
%%sql
SELECT *
FROM book
ORDER BY isbn;

The original unnormalised relation (the `books_purchased` table) can be recreated from the normalised relations 
(the `orders`, `customer`, `order_item` and `book` tables) by performing the appropriate join operations via the foreign key columns described above.

In [None]:
books_purchased = \
 %sql SELECT * \
      FROM books_purchased \
      ORDER BY invoice_no, isbn
    
recreated_books_purchased = \
 %sql SELECT invoice_no, date, customer_no, customer_name, isbn, title, quantity, cost \
      FROM (((orders NATURAL JOIN order_item) NATURAL JOIN book) NATURAL JOIN customer) \
      ORDER BY invoice_no, isbn
    
books_purchased == recreated_books_purchased

## Summary
In this Notebook you have followed the normalisation of the books purchased data described in Activity 10.2
using SQL to create a set of normalised tables from unnormalised data shown in Figure 10.21, 
which represents the same information as shown in Figure 10.2 but as a relation.

## What next?
If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to `10.3 Update (modification), deletion and addition (insertion) anomalies`.