# <a name="top"></a>Normalisation - Hospital: 3 hours (1 hour per NF, includes referring back to 10.2)

In this notebook, you will practise normalisation by normalising the hospital data from the TM351 running example.

You should make frequent reference to Notebook 10.2 for examples of how to perform each step. When you've finished this notebook, you should have something like Notebook 10.2 replicated here.

This notebook will take quite a while to work through, and there's no need to do it all in one go. Convenient places to stop are after completing each normal form, but you could also pause after creating each new table within a normal form.

* [Moving to first normal form (1NF)](#1nf)
* [Moving to second normal form (2NF)](#2nf)
* [Moving to third normal form (3NF)](#3nf)

## The data


This is the data you will be normalising. Your task is to move this data from the unnormalised form given below into a collection of relations in 3NF, implemented as a collection of PostgreSQL tables.

You should refer to notebook *10.2 Normalisation - Antique Opticals* for examples of how to carry out each of the steps.

An example of the form from which the data was taken is shown below.

<img src="images/tm351-patient_record.png" alt="Drawing" style="width: 75%;"/>

The functional dependencies in this example are:

| This attribute (or set of attributes) | functionally determines this attribute |
| ------------- |:------------- |
| `patient_id`  | `patient_name` |
| `patient_id`  | `doctor_id` |
| `doctor_id`   | `doctor_name`  |
| `drug_code`   |  `drug_name`   |
| (`patient_id`, `prescribing_doctor_id`, `drug_code`, `date`) | `dosage`   |
| (`patient_id`, `prescribing_doctor_id`, `drug_code`, `date`) | `duration` |
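A minimal sketch, assuming pandas, of checking that a set of rows does not violate a functional dependency. `fd_holds` is a hypothetical helper, and the toy rows are invented rather than taken from the hospital data. Data can only show that a dependency is *violated*; the dependencies themselves come from knowledge of the domain, as in the table above.

```python
import pandas as pd

def fd_holds(df, lhs, rhs):
    """Return True if, in df, each combination of values in the lhs columns
    maps to at most one distinct value of column rhs (no violation of lhs -> rhs)."""
    counts = df.groupby(list(lhs), dropna=False)[rhs].nunique(dropna=False)
    return bool((counts <= 1).all())

# Invented toy rows (not the hospital data)
rows = pd.DataFrame({
    "customer_id":   ["c1", "c1", "c2"],
    "customer_name": ["Ada", "Ada", "Ben"],
    "product_code":  ["p1", "p2", "p1"],
})

print(fd_holds(rows, ["customer_id"], "customer_name"))  # True
print(fd_holds(rows, ["customer_id"], "product_code"))   # False: c1 maps to p1 and p2
```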

You should use the same data as in Notebook 10.1, in which we imported the unnormalised data from the CSV file `unnormalised_prescription.csv`. You can preview the first few lines of the file with:

In [None]:
!head unnormalised_prescription.csv

We have not included solutions in this notebook: for our solution, you should look in notebook *10.4 Our solution to Normalisation - the Hospital scenario*.

### When things go wrong

You will almost certainly make mistakes during the process of working through this notebook. When you do, just clear out the database and repeat the steps you know work.

To clear out the database, re-run the `%run reset_databases.ipynb` cell in the *Setting up* section below (making sure you have an active connection).


## Setting up

The next group of cells sets up your database connection and resets the database to a clean state. Check notebook *8.1 Data Definition Language in SQL* if you are unsure what the next cells do.

You may need to change the given values of the variables `DB_USER` and `DB_PWD`, depending on which environment you are using.

In [None]:
# Make the connection

%run sql_init.ipynb
print("Connecting with connection string : {}".format(DB_CONNECTION))
%sql $DB_CONNECTION

In [None]:
%run reset_databases.ipynb

## Load the data

In [None]:
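# pandas may already be imported by sql_init.ipynb; importing it again is harmless
import pandas as pd

prescriptions_detail = pd.read_csv('unnormalised_prescription.csv', parse_dates=['date'])
prescriptions_detail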

# <a name="1nf"></a> Moving from unnormalised data to first normal form (1NF)
* [Top](#top)

Convert the data above into one or more relations, each in 1NF. Verify that the normalised tables accurately represent the original data. 

One relation should use `patient_id` as its primary key.
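One way to verify that normalised tables represent the original data is to join them back together and check that exactly the original rows come out, with none lost and no spurious rows added. The sketch below assumes pandas; `is_lossless` is a hypothetical helper, and the data is invented.

```python
import pandas as pd

def is_lossless(original, parts):
    """Natural-join the parts (on their shared columns) and check that the
    result holds exactly the same set of rows as the original."""
    joined = parts[0]
    for part in parts[1:]:
        joined = joined.merge(part)  # inner join on all columns the frames share
    cols = list(original.columns)
    expected = original[cols].drop_duplicates().sort_values(cols).reset_index(drop=True)
    actual = joined[cols].drop_duplicates().sort_values(cols).reset_index(drop=True)
    return expected.equals(actual)

# Invented flat data (not the hospital data)
flat = pd.DataFrame({
    "customer_id":   ["c1", "c1", "c2"],
    "customer_name": ["Ada", "Ada", "Ben"],
    "product_code":  ["p1", "p2", "p1"],
})

customers = flat[["customer_id", "customer_name"]].drop_duplicates()
purchases = flat[["customer_id", "product_code"]].drop_duplicates()
print(is_lossless(flat, [customers, purchases]))  # True

# Splitting on the wrong column: joining on product_code invents extra rows
bad_a = flat[["customer_id", "product_code"]]
bad_b = flat[["customer_name", "product_code"]]
print(is_lossless(flat, [bad_a, bad_b]))  # False
```

The same check works on DataFrames read back from your PostgreSQL tables, for example via `%sql` result sets converted with `.DataFrame()`.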

----

Remember the mantra: to be in third normal form, 
> attributes must be dependent on the key, the whole key, and nothing but the key.

Where a key has multiple values for an attribute (a _repeating group_), we need to extract the repeating values into a new relation, together with a copy of the key so that each value can be linked back to the row it came from.

More formally, **a relation in 1NF has no repeating groups**.
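As a minimal sketch of extracting a repeating group, the invented data below stores each customer's products as a list in a single cell. This layout is an assumption for illustration: your CSV may present the repeating group differently (for example as repeated rows), so adapt the idea rather than copy the code. pandas' `DataFrame.explode` turns each list element into its own row, and the new relation keeps a copy of `customer_id`, so its key becomes (`customer_id`, `product_code`).

```python
import pandas as pd

# Invented unnormalised data: one row per customer, with a repeating group
# of product codes held as a list in a single cell
unnormalised = pd.DataFrame({
    "customer_id":   ["c1", "c2"],
    "customer_name": ["Ada", "Ben"],
    "product_codes": [["p1", "p2"], ["p1"]],
})

# The single-valued attributes stay in a relation keyed on customer_id
customer = unnormalised[["customer_id", "customer_name"]]

# The repeating group moves to a new relation, one row per value, keeping a
# copy of customer_id so each row still identifies its customer
customer_product = (
    unnormalised[["customer_id", "product_codes"]]
    .explode("product_codes")
    .rename(columns={"product_codes": "product_code"})
    .reset_index(drop=True)
)
print(customer_product)
```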

In [None]:
# Write your code to convert the file into 1NF here.
# You will need more than one cell to do it. 

# <a name="2nf"></a>Moving to second normal form (2NF)
* [Top](#top)

Convert the 1NF tables you created above into a collection of relations, implemented as PostgreSQL tables, each in 2NF. 

The functional dependencies in this example are:

| This attribute (or set of attributes) | functionally determines this attribute |
| ------------- |:------------- |
| `patient_id`  | `patient_name` |
| `patient_id`  | `doctor_id` |
| `doctor_id`   | `doctor_name`  |
| `drug_code`   |  `drug_name`   |
| (`patient_id`, `prescribing_doctor_id`, `drug_code`, `date`) | `dosage`   |
| (`patient_id`, `prescribing_doctor_id`, `drug_code`, `date`) | `duration` |

----

To reiterate, to be in third normal form, 
> attributes must be dependent on the key, the whole key, and nothing but the key.

Formally, **a relation in 2NF is in 1NF and has every non-key attribute functionally dependent on the whole of the primary key**, not on just part of it.
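A partial dependency is a non-key attribute that depends on only part of a composite key. The sketch below, using hypothetical helpers (`fd_holds`, `partial_dependencies`) and invented order data keyed on (`order_id`, `product_code`), tests every proper subset of the key against each non-key attribute. On a small sample a dependency can hold by coincidence, so treat the output as a prompt to check against the known functional dependencies, not as proof.

```python
from itertools import combinations

import pandas as pd

def fd_holds(df, lhs, rhs):
    # True if each combination of lhs values maps to at most one rhs value
    counts = df.groupby(list(lhs), dropna=False)[rhs].nunique(dropna=False)
    return bool((counts <= 1).all())

def partial_dependencies(df, key, non_key):
    """List (subset_of_key, attribute) pairs where the attribute appears to
    depend on a proper subset of the composite key."""
    found = []
    for attr in non_key:
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                if fd_holds(df, subset, attr):
                    found.append((subset, attr))
    return found

# Invented toy rows (not the hospital data)
rows = pd.DataFrame({
    "order_id":     ["o1", "o1", "o2"],
    "product_code": ["p1", "p2", "p1"],
    "product_name": ["Pen", "Ink", "Pen"],
    "quantity":     [2, 1, 5],
})

print(partial_dependencies(rows, ["order_id", "product_code"],
                           ["product_name", "quantity"]))
# [(('product_code',), 'product_name')]
```

Here `product_name` depends only on `product_code`, so it would move to a new relation keyed on `product_code`, while `quantity` stays with the whole key.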


In [None]:
# Write your code to convert the database into 2NF here.
# You will need more than one cell to do it. 

# <a name="3nf"></a>Moving to third normal form (3NF)
* [Top](#top)


Convert the 2NF tables you created above into a collection of relations, implemented as PostgreSQL tables, each in 3NF. 

When you've finished, add all the foreign key constraints that link the tables to one another.
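PostgreSQL checks existing rows when a `FOREIGN KEY` constraint is added, so the constraint is rejected if any row refers to a key value that does not exist. Here is a sketch, assuming pandas and invented data, of finding such rows before you add the constraint; `orphan_rows` is a hypothetical helper. Like SQL, it does not treat a missing (NULL) foreign key as a violation.

```python
import pandas as pd

def orphan_rows(child, fk, parent, pk):
    """Return the rows of child whose non-null value in column fk does not
    appear in column pk of parent. A FOREIGN KEY constraint would reject these."""
    mask = child[fk].notna() & ~child[fk].isin(parent[pk])
    return child[mask]

# Invented toy tables (not the hospital data)
customer = pd.DataFrame({"customer_id": ["c1", "c2"], "customer_name": ["Ada", "Ben"]})
orders = pd.DataFrame({
    "order_id":    ["o1", "o2", "o3", "o4"],
    "customer_id": ["c1", "c2", "c9", None],  # c9 does not exist; None is allowed
})

print(orphan_rows(orders, "customer_id", customer, "customer_id"))  # only order o3
```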

As a reminder, the functional dependencies in this example are:

| This attribute (or set of attributes) | functionally determines this attribute |
| ------------- |:------------- |
| `patient_id`  | `patient_name` |
| `patient_id`  | `doctor_id` |
| `doctor_id`   | `doctor_name`  |
| `drug_code`   |  `drug_name`   |
| (`patient_id`, `prescribing_doctor_id`, `drug_code`, `date`) | `dosage`   |
| (`patient_id`, `prescribing_doctor_id`, `drug_code`, `date`) | `duration` |

----

To be in third normal form, 
> attributes must be dependent on the key, the whole key, and nothing but the key.

To move to third normal form (3NF), we have to ensure the final clause of that mantra ("nothing but the key"): each non-key attribute is directly functionally dependent on the key, and not functionally dependent on any other non-key attribute. As before, we ensure this is true by splitting relations as necessary, while ensuring that all relations remain in 2NF (and hence also in 1NF). 

Formally, **a relation in 3NF has all non-key attributes _directly_ functionally dependent on the whole of the primary key**.
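To remove a transitive dependency, one approach is to move the attribute that depends on a non-key attribute into a new relation keyed on that determinant, and leave the determinant behind as a foreign key. A minimal pandas sketch with invented data, where `customer_id` → `rep_id` → `rep_name`:

```python
import pandas as pd

# Invented 2NF relation keyed on customer_id, with a transitive dependency:
# customer_id -> rep_id and rep_id -> rep_name
customer = pd.DataFrame({
    "customer_id":   ["c1", "c2", "c3"],
    "customer_name": ["Ada", "Ben", "Cy"],
    "rep_id":        ["r1", "r1", "r2"],
    "rep_name":      ["Rae", "Rae", "Sam"],
})

# Move the transitively dependent attribute into a relation keyed on its
# real determinant (rep_id); drop_duplicates removes the repeated facts
rep = customer[["rep_id", "rep_name"]].drop_duplicates().reset_index(drop=True)

# rep_id stays behind in the original relation as a foreign key to rep
customer_3nf = customer[["customer_id", "customer_name", "rep_id"]]

# Joining back on rep_id recovers the original rows
rejoined = customer_3nf.merge(rep, on="rep_id")
print(rep)
print(rejoined.sort_values("customer_id").reset_index(drop=True).equals(customer))  # True
```

Each rep's name is now stored once, so changing it means updating a single row rather than every customer row that mentions that rep.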

In [None]:
# Write your code to convert the database into 3NF here.
# You will need more than one cell to do it. 

## Review

Compare the relations you have created here to the entities for the prescription example from Part 8. Have you generated the same relations? Have you generated the same relationships between the relations, as shown by the foreign keys?

What does this tell you about the role normalisation should play in structuring data systems?

## What next?

If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to notebook *10.5 Improvements with normalised data*.