# Normalisation - drugs prescribed example (optional)

In this Notebook you can follow the normalisation of the drugs prescribed data described in Part 10, 
using SQL to create a set of normalised tables from the unnormalised data shown in Figure 10.4.

Enable access to the PostgreSQL database engine via [SQL Cell Magic](https://pypi.python.org/pypi/ipython-sql).

In [1]:
%load_ext sql
%sql postgresql://test:test@localhost:5432/tm351test

'Connected: test@tm351test'

## 2.1 Unnormalised Form (UNF)

Create the `drugs_prescribed` table which will represent the `drugs_prescribed` relation shown in Figure 10.4.

In [2]:
%%sql
DROP TABLE IF EXISTS drugs_prescribed CASCADE;

CREATE TABLE drugs_prescribed(
 patient_id CHAR(4) NOT NULL,
 patient_name VARCHAR(20) NOT NULL,
 doctor_id CHAR(3) NOT NULL,
 doctor_name VARCHAR(20) NOT NULL,
 date DATE NOT NULL,
 drug_code CHAR(6) NOT NULL, 
 drug_name VARCHAR(20) NOT NULL,
 dosage VARCHAR(20) NOT NULL, 
 duration VARCHAR(20) NOT NULL, 
 PRIMARY KEY(patient_id, date, drug_code)
);

Done.
Done.


[]

Populate the `drugs_prescribed` table from a file named `drugs_prescribed.dat` using 
[Psycopg](http://initd.org/psycopg/docs/index.html), a PostgreSQL database adapter for Python.

In [3]:
import psycopg2 as pg
import pandas as pd
import pandas.io.sql as psqlg

In [4]:
# open a connection to the PostgreSQL database tm351test
conn = pg.connect(dbname='tm351test', host='localhost', user='test', password='test', port=5432)
# create a cursor
c = conn.cursor()
# open drugs_prescribed.dat; the with block closes the file automatically
# (the name data_file avoids shadowing Python's built-in io module)
with open('data/drugs_prescribed.dat', 'r') as data_file:
    # execute the PostgreSQL copy command
    c.copy_from(data_file, 'drugs_prescribed')
# commit transaction
conn.commit()
# close cursor
c.close()
# close database connection
conn.close()
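
The `copy_from` call above is used with its default settings, in which each line of the file is one row and columns are separated by tab characters. The following is a minimal sketch of that layout, assuming `drugs_prescribed.dat` is tab-delimited; the `sample` text and the `parse_rows` helper are illustrative only and are not part of the module materials.

```python
import csv
import io

COLUMNS = ["patient_id", "patient_name", "doctor_id", "doctor_name", "date",
           "drug_code", "drug_name", "dosage", "duration"]

# Two rows written in the tab-delimited layout assumed for drugs_prescribed.dat
sample = (
    "p001\tThornton\td06\tGibson\t2014-05-15\tO17663\tOmeprazole\t40 mg 1 x day\tDaily\n"
    "p007\tTennent\td07\tPaxton\t2014-07-01\tS33558\tSimvastatin\t20 mg 1 x day\t6 weeks\n"
)


def parse_rows(text_file):
    """Read tab-delimited lines and return one dict per row, keyed by column name."""
    reader = csv.reader(text_file, delimiter="\t")
    return [dict(zip(COLUMNS, fields)) for fields in reader if fields]


rows = parse_rows(io.StringIO(sample))
print(len(rows), rows[0]["drug_name"])
```

Each parsed row has nine fields, matching the nine columns of the `drugs_prescribed` table in the same order.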

In [5]:
%%sql
SELECT patient_id, patient_name, doctor_id, doctor_name, date, drug_code, drug_name, dosage, duration
FROM drugs_prescribed
ORDER BY patient_id, date, drug_code;

7 rows affected.


patient_id,patient_name,doctor_id,doctor_name,date,drug_code,drug_name,dosage,duration
p001,Thornton,d06,Gibson,2014-05-15,O17663,Omeprazole,40 mg 1 x day,Daily
p001,Thornton,d06,Gibson,2014-05-15,T02378,Tramadol,50 mg 3 x day,As required
p001,Thornton,d06,Gibson,2014-05-23,S33558,Simvastatin,40 mg 1 x day,Daily
p001,Thornton,d06,Gibson,2014-06-15,A12458,Amitriptyline,10 mg 5 x day,As required
p007,Tennent,d07,Paxton,2014-06-01,C31319,Ciprofloxacin,500 mg 2 x day,20 days
p007,Tennent,d07,Paxton,2014-06-01,T05223,Tamsulosin,40 mg 1 x day,20 days
p007,Tennent,d07,Paxton,2014-07-01,S33558,Simvastatin,20 mg 1 x day,6 weeks


## 2.2 Moving to First Normal Form (1NF)

A relation is in **First Normal Form** (**1NF**) if each attribute contains only atomic values, 
that is, it has no repeating groups of values.

To represent the data in 1NF we remove any repeating groups of data to separate relations, and choose a primary key 
for each new relation. A repeating group of data is defined as any attribute or group of attributes that may occur 
with multiple values for a single value of the primary key. 
(See Ponniah (2003) ‘First Normal Form’, pp. 311–12.)

In the unnormalised data above (the `drugs_prescribed` table), there are several values for the 
`date`, `drug_code`, `drug_name`, `dosage` and `duration` attributes for each patient. 
For example, patient p001 has been prescribed Tramadol, Omeprazole, Simvastatin and Amitriptyline. 
These items are a repeating group and are removed to a separate relation (the `patient_prescription` table) using 
the relational algebra project operation. The new relation has a primary key comprising the `patient_id`, `date` and 
`drug_code` attributes – a patient may be prescribed several drugs on the same day or may be prescribed the same 
drug on separate occasions.
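
The repeating group described above can also be found mechanically. The sketch below is plain Python rather than SQL, and the helper name `repeating_attributes` is our own: it reports every attribute that takes more than one value for some value of `patient_id` in the seven rows shown earlier.

```python
from collections import defaultdict

COLUMNS = ["patient_id", "patient_name", "doctor_id", "doctor_name", "date",
           "drug_code", "drug_name", "dosage", "duration"]

ROWS = [
    ("p001", "Thornton", "d06", "Gibson", "2014-05-15", "O17663", "Omeprazole", "40 mg 1 x day", "Daily"),
    ("p001", "Thornton", "d06", "Gibson", "2014-05-15", "T02378", "Tramadol", "50 mg 3 x day", "As required"),
    ("p001", "Thornton", "d06", "Gibson", "2014-05-23", "S33558", "Simvastatin", "40 mg 1 x day", "Daily"),
    ("p001", "Thornton", "d06", "Gibson", "2014-06-15", "A12458", "Amitriptyline", "10 mg 5 x day", "As required"),
    ("p007", "Tennent", "d07", "Paxton", "2014-06-01", "C31319", "Ciprofloxacin", "500 mg 2 x day", "20 days"),
    ("p007", "Tennent", "d07", "Paxton", "2014-06-01", "T05223", "Tamsulosin", "40 mg 1 x day", "20 days"),
    ("p007", "Tennent", "d07", "Paxton", "2014-07-01", "S33558", "Simvastatin", "20 mg 1 x day", "6 weeks"),
]


def repeating_attributes(rows, columns, key):
    """Return the attributes that have several values for at least one value of `key`."""
    key_index = columns.index(key)
    values = defaultdict(lambda: defaultdict(set))  # key value -> column -> values seen
    for row in rows:
        for column, value in zip(columns, row):
            values[row[key_index]][column].add(value)
    return [column for column in columns
            if column != key
            and any(len(per_key[column]) > 1 for per_key in values.values())]


print(repeating_attributes(ROWS, COLUMNS, "patient_id"))
```

The attributes reported are the ones moved to the `patient_prescription` table; the remaining attributes have a single value per patient.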

In [6]:
%%sql
DROP TABLE IF EXISTS patient_prescription CASCADE;

CREATE TABLE patient_prescription AS
  SELECT DISTINCT patient_id, date, drug_code, drug_name, dosage, duration
  FROM drugs_prescribed;

SELECT *
FROM patient_prescription
ORDER BY patient_id, date, drug_code;

Done.
7 rows affected.
7 rows affected.


patient_id,date,drug_code,drug_name,dosage,duration
p001,2014-05-15,O17663,Omeprazole,40 mg 1 x day,Daily
p001,2014-05-15,T02378,Tramadol,50 mg 3 x day,As required
p001,2014-05-23,S33558,Simvastatin,40 mg 1 x day,Daily
p001,2014-06-15,A12458,Amitriptyline,10 mg 5 x day,As required
p007,2014-06-01,C31319,Ciprofloxacin,500 mg 2 x day,20 days
p007,2014-06-01,T05223,Tamsulosin,40 mg 1 x day,20 days
p007,2014-07-01,S33558,Simvastatin,20 mg 1 x day,6 weeks


Notes:
    
Remember that we need to include the `DISTINCT` keyword in the `SELECT` clause in order to achieve the same effect as 
a relational algebra project operation (see Exercise 9.6).
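
To see why `DISTINCT` matters, the following sketch compares a plain `SELECT` with a `SELECT DISTINCT` on a cut-down version of the data. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for PostgreSQL, which is an assumption made so that the example runs without a database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drugs_prescribed (patient_id TEXT, patient_name TEXT, drug_code TEXT)")
conn.executemany(
    "INSERT INTO drugs_prescribed VALUES (?, ?, ?)",
    [("p001", "Thornton", "O17663"),
     ("p001", "Thornton", "T02378"),
     ("p007", "Tennent", "C31319")],
)

# Without DISTINCT, the patient appears once per prescribed drug
plain = conn.execute(
    "SELECT patient_id, patient_name FROM drugs_prescribed").fetchall()

# With DISTINCT, duplicate rows are removed, as in a relational algebra project
distinct = conn.execute(
    "SELECT DISTINCT patient_id, patient_name FROM drugs_prescribed "
    "ORDER BY patient_id").fetchall()

print(len(plain), len(distinct))
conn.close()
```

The plain query returns one row per prescription, whereas the `DISTINCT` query returns one row per patient.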

With the repeating group removed to a separate relation, we now consider what remains of the original relation once 
those attributes have been removed (the `patient_doctor` table). 
As each remaining attribute has a single value for each patient, this relation is in 1NF.

In [8]:
%%sql
DROP TABLE IF EXISTS patient_doctor CASCADE;

CREATE TABLE patient_doctor AS
  SELECT DISTINCT patient_id, patient_name, doctor_id, doctor_name
  FROM drugs_prescribed;

SELECT *
FROM patient_doctor
ORDER BY patient_id;

Done.
2 rows affected.
2 rows affected.


patient_id,patient_name,doctor_id,doctor_name
p001,Thornton,d06,Gibson
p007,Tennent,d07,Paxton


As both new relations have an attribute in common, `patient_id`, the original relation (Figure 10.2) 
can be recreated from these relations by performing a join operation on `patient_id`, which will result in the 
unnormalised relation shown in the discussion of Exercise 10.1 (the `drugs_prescribed` table).

In [9]:
drugs_prescribed = \
 %sql SELECT * \
      FROM drugs_prescribed \
      ORDER BY patient_id, date, drug_code
    
recreated_drugs_prescribed = \
 %sql SELECT patient_id, patient_name, doctor_id, doctor_name, date, drug_code, drug_name, dosage, duration \
      FROM patient_prescription NATURAL JOIN patient_doctor \
      ORDER BY patient_id, date, drug_code
    
drugs_prescribed == recreated_drugs_prescribed

7 rows affected.
7 rows affected.


True

Notes:
    
In the `SELECT` statement that recreates the `drugs_prescribed` table, the `NATURAL JOIN` clause specifies that the 
join is to be performed on the columns that are common to both tables, i.e. `patient_id`.
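
As a rough illustration of what `NATURAL JOIN` does, the sketch below runs the same join twice: once with `NATURAL JOIN` and once with the join column named explicitly via `USING (patient_id)`. It uses an in-memory SQLite database from Python's standard library as a stand-in for PostgreSQL, with cut-down versions of the two tables; both are assumptions made for the sake of a self-contained example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient_doctor (patient_id TEXT, patient_name TEXT, doctor_id TEXT);
    CREATE TABLE patient_prescription (patient_id TEXT, date TEXT, drug_code TEXT);
    INSERT INTO patient_doctor VALUES ('p001', 'Thornton', 'd06');
    INSERT INTO patient_doctor VALUES ('p007', 'Tennent', 'd07');
    INSERT INTO patient_prescription VALUES ('p001', '2014-05-15', 'O17663');
    INSERT INTO patient_prescription VALUES ('p001', '2014-05-23', 'S33558');
    INSERT INTO patient_prescription VALUES ('p007', '2014-06-01', 'C31319');
""")

COLUMNS = "patient_id, patient_name, doctor_id, date, drug_code"
ORDER = "ORDER BY patient_id, date, drug_code"

# The join column is inferred from the column names the tables share
natural = conn.execute(
    f"SELECT {COLUMNS} FROM patient_prescription NATURAL JOIN patient_doctor {ORDER}"
).fetchall()

# The join column is stated explicitly
using = conn.execute(
    f"SELECT {COLUMNS} FROM patient_prescription JOIN patient_doctor USING (patient_id) {ORDER}"
).fetchall()

print(natural == using)
conn.close()
```

Because `patient_id` is the only column name the two tables share, both queries return the same rows.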

## 2.3 Moving to Second Normal Form (2NF)

A relation is in **Second Normal Form** (**2NF**) if it is in 1NF and every non-primary key attribute of the relation 
is dependent on the whole primary key, that is, there are no partial key dependencies.

To represent the data in 2NF we remove any attributes that only depend on part of the primary key to separate 
relations, and choose a primary key for each new relation. 
(See Ponniah (2003) ‘Second Normal Form’, pp. 312–14.)

This step only applies to relations that have a **composite primary key**. 
We have to decide whether any attributes in such relations are **functionally dependent** on only part of the 
composite primary key.

For any two attributes A and B, A is functionally dependent on B if and only if:

* For a given value of B there is precisely one associated value of A at any one time.

* For example, `patient_name` is functionally dependent on `patient_id` because each patient is given a unique patient identifier.

Another way of describing this is to say that:

* Attribute B determines attribute A.

* For example, `patient_id` determines `patient_name`.

But the converse is not true:

* `patient_name` does not determine `patient_id`, as there may be several patients with the same name.
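
The definition above can be tested against sample data. The sketch below is a small Python helper of our own, `determines`, which checks whether each value of one set of attributes is paired with exactly one value of another. The third patient is a hypothetical namesake added only to show the failing case. Note that sample data can refute a functional dependency but cannot prove one, since a dependency is a statement about every value the relation may ever hold.

```python
def determines(rows, determinant, dependent):
    """Return True if each value of `determinant` has exactly one value of `dependent` in `rows`."""
    seen = {}
    for row in rows:
        left = tuple(row[name] for name in determinant)
        right = tuple(row[name] for name in dependent)
        # setdefault stores the first pairing; any later, different pairing breaks the dependency
        if seen.setdefault(left, right) != right:
            return False
    return True


patients = [
    {"patient_id": "p001", "patient_name": "Thornton"},
    {"patient_id": "p007", "patient_name": "Tennent"},
    {"patient_id": "p009", "patient_name": "Thornton"},  # hypothetical namesake, not in the source data
]

print(determines(patients, ["patient_id"], ["patient_name"]))  # True
print(determines(patients, ["patient_name"], ["patient_id"]))  # False
```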

In the first of the two 1NF relations shown above (the `patient_prescription` table), 
the combination of the `patient_id`, `date` and `drug_code` attributes determines the `dosage` and `duration` attributes, 
but `drug_code` alone determines `drug_name`. Thus, `drug_name` is removed from the relation (leaving the `prescription` table), 
and `drug_code` and `drug_name` form a new relation (the `drug` table), with `drug_code` as the primary key.
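
The partial key dependency can be looked for in the data itself. The following sketch, in plain Python with a helper name of our own (`holds`), tests each single attribute of the composite key against each non-key attribute of `patient_prescription`. With only seven rows, larger subsets of the key could pass such a check by coincidence, so the sketch is restricted to single attributes, and a passing check is only evidence for a dependency, not proof of it.

```python
KEY = ("patient_id", "date", "drug_code")
NON_KEY = ("drug_name", "dosage", "duration")
COLUMNS = KEY + NON_KEY

ROWS = [
    ("p001", "2014-05-15", "O17663", "Omeprazole", "40 mg 1 x day", "Daily"),
    ("p001", "2014-05-15", "T02378", "Tramadol", "50 mg 3 x day", "As required"),
    ("p001", "2014-05-23", "S33558", "Simvastatin", "40 mg 1 x day", "Daily"),
    ("p001", "2014-06-15", "A12458", "Amitriptyline", "10 mg 5 x day", "As required"),
    ("p007", "2014-06-01", "C31319", "Ciprofloxacin", "500 mg 2 x day", "20 days"),
    ("p007", "2014-06-01", "T05223", "Tamsulosin", "40 mg 1 x day", "20 days"),
    ("p007", "2014-07-01", "S33558", "Simvastatin", "20 mg 1 x day", "6 weeks"),
]


def holds(rows, determinant, dependent):
    """Return True if each value of `determinant` has a single value of `dependent` in `rows`."""
    d, a = COLUMNS.index(determinant), COLUMNS.index(dependent)
    seen = {}
    for row in rows:
        if seen.setdefault(row[d], row[a]) != row[a]:
            return False
    return True


# Pairs (key attribute, non-key attribute) where part of the key alone determines the attribute
partial = [(k, a) for k in KEY for a in NON_KEY if holds(ROWS, k, a)]
print(partial)
```

In this data, `drug_name` is the only non-key attribute determined by a single key attribute (`drug_code`). The `dosage` and `duration` attributes are not, because Simvastatin (S33558) is prescribed with two different dosages and durations.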

In [10]:
%%sql
DROP TABLE IF EXISTS prescription CASCADE;

CREATE TABLE prescription AS
  SELECT DISTINCT patient_id, date, drug_code, dosage, duration
  FROM patient_prescription;

SELECT *
FROM prescription
ORDER BY patient_id, date, drug_code;

Done.
7 rows affected.
7 rows affected.


patient_id,date,drug_code,dosage,duration
p001,2014-05-15,O17663,40 mg 1 x day,Daily
p001,2014-05-15,T02378,50 mg 3 x day,As required
p001,2014-05-23,S33558,40 mg 1 x day,Daily
p001,2014-06-15,A12458,10 mg 5 x day,As required
p007,2014-06-01,C31319,500 mg 2 x day,20 days
p007,2014-06-01,T05223,40 mg 1 x day,20 days
p007,2014-07-01,S33558,20 mg 1 x day,6 weeks


In [11]:
%%sql
DROP TABLE IF EXISTS drug CASCADE;

CREATE TABLE drug AS
  SELECT DISTINCT drug_code, drug_name
  FROM patient_prescription;

SELECT *
FROM drug
ORDER BY drug_code;

Done.
6 rows affected.
6 rows affected.


drug_code,drug_name
A12458,Amitriptyline
C31319,Ciprofloxacin
O17663,Omeprazole
S33558,Simvastatin
T02378,Tramadol
T05223,Tamsulosin


As both new relations have an attribute in common, `drug_code`, the original relation can be recreated from these 
relations by performing a join operation on `drug_code`.

In [12]:
patient_prescription = \
 %sql SELECT * \
      FROM patient_prescription \
      ORDER BY patient_id, date, drug_code

recreated_patient_prescription = \
 %sql SELECT patient_id, date, drug_code, drug_name, dosage, duration \
      FROM prescription NATURAL JOIN drug \
      ORDER BY patient_id, date, drug_code

patient_prescription == recreated_patient_prescription

7 rows affected.
7 rows affected.


True

As the second of the two 1NF relations shown above (the `patient_doctor` table) has a non-composite primary key, 
`patient_id`, it is in 2NF.

## 2.4 Moving to Third Normal Form (3NF)

A relation is in **Third Normal Form** (**3NF**) if it is in 2NF and every non-primary key attribute of the relation 
is directly dependent on the primary key, and not on any other non-primary key attribute.

To represent the data in 3NF we remove any attributes that are not directly dependent on the primary key to separate 
relations, and choose a primary key for each new relation. 
(See Ponniah (2003) ‘Third Normal Form’, pp. 314–17.)

This step is similar to the previous one in that we are looking for a functional dependency between attributes within 
a relation. The difference is that here we are looking for non-primary key attributes that depend on other 
non-primary key attributes, rather than non-primary key attributes that depend on only part of the primary key.

In the second of the two 1NF relations shown above (the `patient_doctor` table), the `patient_name` and `doctor_id` 
attributes are both directly dependent on `patient_id`, but `doctor_name` is directly dependent on `doctor_id`, not 
`patient_id`. Therefore, we create a new relation from `doctor_id` and `doctor_name` where `doctor_id` is the 
primary key (the `doctor` table).

The `doctor_id` attribute remains in the original relation (now the `patient` table), as it records the patient’s doctor, 
and acts as a foreign key referencing the new relation (the `doctor` table).
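
The effect of this decomposition can be sketched in plain Python, using sets of tuples as relations. The third patient is a hypothetical addition, included only so that one doctor has more than one patient and the repetition of `doctor_name` becomes visible; it is not part of the source data.

```python
patient_doctor = {
    ("p001", "Thornton", "d06", "Gibson"),
    ("p007", "Tennent", "d07", "Paxton"),
    ("p010", "Example", "d06", "Gibson"),  # hypothetical row, not in the source data
}

# Project onto (doctor_id, doctor_name); a set removes duplicates, like SELECT DISTINCT
doctor = {(doctor_id, doctor_name) for _, _, doctor_id, doctor_name in patient_doctor}

# Project onto (patient_id, patient_name, doctor_id); doctor_id stays behind as a foreign key
patient = {(patient_id, patient_name, doctor_id)
           for patient_id, patient_name, doctor_id, _ in patient_doctor}

# Join the two projections on doctor_id to rebuild the original relation
rejoined = {(patient_id, patient_name, doctor_id, doctor_name)
            for patient_id, patient_name, doctor_id in patient
            for other_id, doctor_name in doctor
            if other_id == doctor_id}

print(sorted(doctor))
print(rejoined == patient_doctor)
```

Each doctor's name is now stored once, however many patients the doctor has, and joining on `doctor_id` still rebuilds the original rows.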

In [13]:
%%sql
DROP TABLE IF EXISTS doctor CASCADE;

CREATE TABLE doctor AS
  SELECT DISTINCT doctor_id, doctor_name
  FROM patient_doctor;

SELECT *
FROM doctor
ORDER BY doctor_id;

Done.
2 rows affected.
2 rows affected.


doctor_id,doctor_name
d06,Gibson
d07,Paxton


In [14]:
%%sql
DROP TABLE IF EXISTS patient CASCADE;

CREATE TABLE patient AS
  SELECT DISTINCT patient_id, patient_name, doctor_id
  FROM patient_doctor;

SELECT *
FROM patient
ORDER BY patient_id;

Done.
2 rows affected.
2 rows affected.


patient_id,patient_name,doctor_id
p001,Thornton,d06
p007,Tennent,d07


As both new relations have an attribute in common, `doctor_id`, the original relation 
(the `patient_doctor` table) can be recreated from these relations by performing a join operation on `doctor_id`.

In [15]:
patient_doctor = \
 %sql SELECT * \
      FROM patient_doctor \
      ORDER BY patient_id

recreated_patient_doctor = \
 %sql SELECT patient_id, patient_name, doctor_id, doctor_name \
      FROM doctor NATURAL JOIN patient \
      ORDER BY patient_id

patient_doctor == recreated_patient_doctor

2 rows affected.
2 rows affected.


True

## 2.5 Normalised relations

The final set of relations (tables) is as follows:

In [16]:
%%sql
SELECT *
FROM patient
ORDER BY patient_id;

2 rows affected.


patient_id,patient_name,doctor_id
p001,Thornton,d06
p007,Tennent,d07


Notes:
    
The `doctor_id` column is a foreign key referencing the `doctor` table (see Figure 10.11).

In [17]:
%%sql
SELECT *
FROM doctor
ORDER BY doctor_id;

2 rows affected.


doctor_id,doctor_name
d06,Gibson
d07,Paxton


In [18]:
%%sql
SELECT *
FROM prescription
ORDER BY patient_id, date, drug_code;

7 rows affected.


patient_id,date,drug_code,dosage,duration
p001,2014-05-15,O17663,40 mg 1 x day,Daily
p001,2014-05-15,T02378,50 mg 3 x day,As required
p001,2014-05-23,S33558,40 mg 1 x day,Daily
p001,2014-06-15,A12458,10 mg 5 x day,As required
p007,2014-06-01,C31319,500 mg 2 x day,20 days
p007,2014-06-01,T05223,40 mg 1 x day,20 days
p007,2014-07-01,S33558,20 mg 1 x day,6 weeks


Notes:
    
The `patient_id` column is a foreign key referencing the `patient` table, and the `drug_code` column is a foreign key 
referencing the `drug` table.
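
Tables created with `CREATE TABLE ... AS` contain the selected columns and rows but carry no primary key or foreign key constraints, so the keys described in these notes are not enforced by the database unless they are declared. The sketch below shows what declaring them could look like. It uses an in-memory SQLite database from Python's standard library rather than PostgreSQL, with `TEXT` columns in place of the original column types; the helper `try_insert` and the drug code `X00000` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.executescript("""
    CREATE TABLE doctor (
        doctor_id   TEXT PRIMARY KEY,
        doctor_name TEXT NOT NULL
    );
    CREATE TABLE patient (
        patient_id   TEXT PRIMARY KEY,
        patient_name TEXT NOT NULL,
        doctor_id    TEXT NOT NULL REFERENCES doctor (doctor_id)
    );
    CREATE TABLE drug (
        drug_code TEXT PRIMARY KEY,
        drug_name TEXT NOT NULL
    );
    CREATE TABLE prescription (
        patient_id TEXT NOT NULL REFERENCES patient (patient_id),
        date       TEXT NOT NULL,
        drug_code  TEXT NOT NULL REFERENCES drug (drug_code),
        dosage     TEXT NOT NULL,
        duration   TEXT NOT NULL,
        PRIMARY KEY (patient_id, date, drug_code)
    );
    INSERT INTO doctor VALUES ('d06', 'Gibson');
    INSERT INTO patient VALUES ('p001', 'Thornton', 'd06');
    INSERT INTO drug VALUES ('O17663', 'Omeprazole');
""")


def try_insert(row):
    """Insert one prescription row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO prescription VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


valid = try_insert(("p001", "2014-05-15", "O17663", "40 mg 1 x day", "Daily"))
unknown_drug = try_insert(("p001", "2014-05-16", "X00000", "1 x day", "Daily"))  # made-up drug code
duplicate_key = try_insert(("p001", "2014-05-15", "O17663", "20 mg 1 x day", "Daily"))
print(valid, unknown_drug, duplicate_key)
```

The first row is accepted, the second is rejected because its `drug_code` has no matching row in `drug`, and the third is rejected because it repeats an existing primary key value.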

In [19]:
%%sql
SELECT *
FROM drug
ORDER BY drug_code;

6 rows affected.


drug_code,drug_name
A12458,Amitriptyline
C31319,Ciprofloxacin
O17663,Omeprazole
S33558,Simvastatin
T02378,Tramadol
T05223,Tamsulosin


The original unnormalised relation (`drugs_prescribed` table) can be recreated from the normalised relations 
(`patient`, `doctor`, `prescription` and `drug` tables) by performing the appropriate join operations via the 
foreign key columns described above.
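
One way to check that nothing is lost or gained by the decomposition is to take the set difference between the original rows and the rejoined rows in both directions. The sketch below does this with `EXCEPT`, using an in-memory SQLite database from Python's standard library as a stand-in for PostgreSQL. For brevity it creates the four tables directly from `drugs_prescribed`, rather than through the intermediate 1NF and 2NF tables used in this Notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE drugs_prescribed (
    patient_id TEXT, patient_name TEXT, doctor_id TEXT, doctor_name TEXT, date TEXT,
    drug_code TEXT, drug_name TEXT, dosage TEXT, duration TEXT)""")
conn.executemany("INSERT INTO drugs_prescribed VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    ("p001", "Thornton", "d06", "Gibson", "2014-05-15", "O17663", "Omeprazole", "40 mg 1 x day", "Daily"),
    ("p001", "Thornton", "d06", "Gibson", "2014-05-15", "T02378", "Tramadol", "50 mg 3 x day", "As required"),
    ("p001", "Thornton", "d06", "Gibson", "2014-05-23", "S33558", "Simvastatin", "40 mg 1 x day", "Daily"),
    ("p001", "Thornton", "d06", "Gibson", "2014-06-15", "A12458", "Amitriptyline", "10 mg 5 x day", "As required"),
    ("p007", "Tennent", "d07", "Paxton", "2014-06-01", "C31319", "Ciprofloxacin", "500 mg 2 x day", "20 days"),
    ("p007", "Tennent", "d07", "Paxton", "2014-06-01", "T05223", "Tamsulosin", "40 mg 1 x day", "20 days"),
    ("p007", "Tennent", "d07", "Paxton", "2014-07-01", "S33558", "Simvastatin", "20 mg 1 x day", "6 weeks"),
])

# Decompose into the four normalised tables
conn.executescript("""
    CREATE TABLE patient AS SELECT DISTINCT patient_id, patient_name, doctor_id FROM drugs_prescribed;
    CREATE TABLE doctor AS SELECT DISTINCT doctor_id, doctor_name FROM drugs_prescribed;
    CREATE TABLE prescription AS
        SELECT DISTINCT patient_id, date, drug_code, dosage, duration FROM drugs_prescribed;
    CREATE TABLE drug AS SELECT DISTINCT drug_code, drug_name FROM drugs_prescribed;
""")

COLS = "patient_id, patient_name, doctor_id, doctor_name, date, drug_code, drug_name, dosage, duration"
ORIGINAL = f"SELECT {COLS} FROM drugs_prescribed"
REJOINED = f"SELECT {COLS} FROM doctor NATURAL JOIN patient NATURAL JOIN prescription NATURAL JOIN drug"

missing = conn.execute(f"{ORIGINAL} EXCEPT {REJOINED}").fetchall()   # original rows lost by the join
spurious = conn.execute(f"{REJOINED} EXCEPT {ORIGINAL}").fetchall()  # extra rows created by the join
print(missing, spurious)
```

Both differences are empty for this data, which indicates that the joins recreate exactly the original rows.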

In [20]:
drugs_prescribed = \
 %sql SELECT * \
      FROM drugs_prescribed \
      ORDER BY patient_id, date, drug_code
    
recreated_drugs_prescribed = \
 %sql SELECT patient_id, patient_name, doctor_id, doctor_name, date, drug_code, drug_name, dosage, duration \
      FROM (((doctor NATURAL JOIN patient) NATURAL JOIN prescription) NATURAL JOIN drug) \
      ORDER BY patient_id, date, drug_code
    
drugs_prescribed == recreated_drugs_prescribed

7 rows affected.
7 rows affected.


True

## Summary
In this Notebook you have followed the normalisation of the drugs prescribed data described in
Part 10, using SQL to create a set of normalised tables from the unnormalised data shown in Figure 10.4.

## What next?
If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to `10.2 Normalisation - book purchases example`.