# <a name="top"></a>Changes with normalised data: 30 minutes

In notebook *10.1 Problems with unnormalised data*, you saw some problems with unnormalised data. In this notebook you will see how these problems don't occur if we have normalised the data. 

This notebook will look at exactly the same problems as in notebook 10.1.

As with notebook 10.1, this notebook uses the `public` schema, so after running the preamble, none of the four tables are in the initial database. We will construct the four normalised tables from the single unnormalised table.

## Setting up

The next group of cells sets up your database connection and resets the database to a clean state. Check notebook *08.1 Data Definition Language in SQL* if you are unsure what the next cells do.

You may need to change the given values of the variables `DB_USER` and `DB_PWD`, depending on which environment you are using.

In [None]:
# Make the connection

%run sql_init.ipynb
print("Connecting with connection string : {}".format(DB_CONNECTION))
%sql $DB_CONNECTION

In [None]:
%run reset_databases.ipynb

## Loading the unnormalised data
This is the same process as in notebook 10.1, where we create the database tables from a _**pandas**_ DataFrame.

In [None]:
import pandas as pd  # has no effect if sql_init.ipynb already imported pandas

prescriptions = pd.read_csv('unnormalised_prescription.csv', parse_dates=['date'])
prescriptions

In [None]:
# Define the unnormalised_prescription table

prescriptions.to_sql('unnormalised_prescription',
                     DB_CONNECTION,
                     if_exists='replace',
                     index=False)

In [None]:
%%sql

ALTER TABLE unnormalised_prescription
ADD CONSTRAINT unnormalised_prescription_pk
    PRIMARY KEY (patient_id, prescribing_doctor_id, drug_code, date);

SELECT *
FROM unnormalised_prescription;

Now we can use this table to construct the normalised versions.

First create the `doctor` table:

In [None]:
%%sql

DROP TABLE IF EXISTS doctor;

CREATE TABLE doctor AS
    SELECT DISTINCT doctor_id, doctor_name
    FROM unnormalised_prescription;
    
ALTER TABLE doctor
ADD CONSTRAINT doctor_pk
    PRIMARY KEY (doctor_id);

SELECT *
FROM doctor;

Next, create the `drug` table:

In [None]:
%%sql

DROP TABLE IF EXISTS drug;

CREATE TABLE drug AS
    SELECT DISTINCT drug_code, drug_name
    FROM unnormalised_prescription;
    
ALTER TABLE drug
ADD CONSTRAINT drug_pk
    PRIMARY KEY (drug_code);

SELECT *
FROM drug;

Next, create the `patient` table:

In [None]:
%%sql

DROP TABLE IF EXISTS patient;

CREATE TABLE patient AS
    SELECT DISTINCT patient_id, patient_name, doctor_id
    FROM unnormalised_prescription;
    
ALTER TABLE patient
ADD CONSTRAINT patient_pk
    PRIMARY KEY (patient_id);

ALTER TABLE patient
ADD CONSTRAINT patient_doctor_fk
    FOREIGN KEY (doctor_id) REFERENCES doctor;

SELECT *
FROM patient;

And finally, create the `prescription` table:

In [None]:
%%sql

DROP TABLE IF EXISTS prescription;

CREATE TABLE prescription AS
    SELECT DISTINCT patient_id, prescribing_doctor_id, drug_code, date, dosage, duration
    FROM unnormalised_prescription;
    
ALTER TABLE prescription
ADD CONSTRAINT prescription_pk
    PRIMARY KEY (patient_id, prescribing_doctor_id, drug_code, date);

ALTER TABLE prescription
ADD CONSTRAINT prescription_patient_fk
    FOREIGN KEY (patient_id) REFERENCES patient;

ALTER TABLE prescription
ADD CONSTRAINT prescription_doctor_fk
    FOREIGN KEY (prescribing_doctor_id) REFERENCES doctor;

ALTER TABLE prescription
ADD CONSTRAINT prescription_drug_fk
    FOREIGN KEY (drug_code) REFERENCES drug;

SELECT *
FROM prescription;

We can see the tables we've defined by displaying the schema:

In [None]:
%schema --connection_string $DB_CONNECTION
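One way to sanity-check a decomposition like this is to join the normalised tables back together and compare the result with the original rows. The sketch below does that on a tiny in-memory SQLite database through Python's `sqlite3` module, not on the notebook's own database. The table `unnormalised` is a hypothetical, cut-down stand-in with only five columns, and the 12 June date is made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical cut-down version of the unnormalised table.
    CREATE TABLE unnormalised (
        patient_id TEXT, patient_name TEXT,
        drug_code TEXT, drug_name TEXT, date TEXT
    );
    INSERT INTO unnormalised VALUES
        ('p001', 'Thornton', 'O17663', 'Omeprazole', '2017-05-15'),
        ('p007', 'Tennent',  'T05223', 'Tamsulosin', '2017-06-12'),
        ('p007', 'Tennent',  'T05223', 'Tamsulosin', '2017-06-19');

    -- Same pattern as the cells above: one table per kind of entity.
    CREATE TABLE patient AS
        SELECT DISTINCT patient_id, patient_name FROM unnormalised;
    CREATE TABLE drug AS
        SELECT DISTINCT drug_code, drug_name FROM unnormalised;
    CREATE TABLE prescription AS
        SELECT DISTINCT patient_id, drug_code, date FROM unnormalised;
""")

original = set(conn.execute(
    "SELECT patient_id, patient_name, drug_code, drug_name, date FROM unnormalised"
))

# Rebuild the wide rows by joining the three normalised tables.
rejoined = set(conn.execute("""
    SELECT p.patient_id, pt.patient_name, p.drug_code, d.drug_name, p.date
    FROM prescription AS p
    JOIN patient AS pt ON p.patient_id = pt.patient_id
    JOIN drug AS d ON p.drug_code = d.drug_code
"""))

patient_count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
print("Patients stored once each:", patient_count)
print("Join reproduces the original rows:", rejoined == original)
```

If the two sets are equal, splitting the data into separate tables has not lost any information; it has only removed the repetition.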

# Insertion anomalies
The composite primary key of `unnormalised_prescription` limits what information can be added to this table. These limits are not present with normalised data.
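A minimal sketch of the kind of limit meant here, using an in-memory SQLite database through Python's `sqlite3` module rather than the notebook's own connection. The table `unnormalised_demo` is a hypothetical, cut-down stand-in for `unnormalised_prescription`. Standard SQL treats every primary key column as `NOT NULL`, but SQLite needs that written out explicitly, so the sketch does so.

```python
import sqlite3

# In-memory SQLite database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical cut-down table with a composite primary key.
# NOT NULL is spelled out because SQLite does not imply it here.
conn.execute("""
    CREATE TABLE unnormalised_demo (
        patient_id TEXT NOT NULL,
        drug_code  TEXT NOT NULL,
        drug_name  TEXT,
        date       TEXT NOT NULL,
        PRIMARY KEY (patient_id, drug_code, date)
    )
""")

try:
    # A drug that nobody has been prescribed yet: no patient and no date.
    conn.execute(
        "INSERT INTO unnormalised_demo (drug_code, drug_name) VALUES (?, ?)",
        ("P1234", "Pravastatin"),
    )
    insert_error = None
except sqlite3.IntegrityError as err:
    insert_error = str(err)

row_count = conn.execute("SELECT COUNT(*) FROM unnormalised_demo").fetchone()[0]
print("Insert rejected:", insert_error)
print("Rows stored:", row_count)
```

The row is refused because part of its primary key is missing, so a fact about a drug alone cannot be stored in a table keyed on prescriptions.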

### Activity 1

Pravastatin is an alternative to simvastatin, treating much the same conditions with much the same doses. The hospital wants to make this available for prescription. 

Add pravastatin, with `drug_code` P1234, to the normalised database.

Inserting into the unnormalised data fails with an integrity error, because the new row has no values for the primary key columns `patient_id`, `prescribing_doctor_id`, and `date`, and a primary key column cannot be null:

In [None]:
%%sql

INSERT INTO unnormalised_prescription (drug_code, drug_name) 
VALUES ('P1234', 'Pravastatin');

Now try to add pravastatin to the `drug` table in the normalised database.

In [None]:
# Write your code in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

With normalised data, adding a new drug is as simple as inserting it into the `drug` table.

In [None]:
%%sql

INSERT INTO drug (drug_code, drug_name) 
VALUES ('P1234', 'Pravastatin');

In [None]:
%%sql

SELECT *
FROM drug;

That all worked smoothly. The normalised `drug` table contains information only about drugs. If there is a new drug available, it is simply added to the `drug` table. When a drug is prescribed, the drug information is connected via the `prescription` table's foreign key to `drug`.

#### End of Activity 1

-------------------------------

### Activity 2

A patient by the name of Kay, `patient_id` = `p009`, has just arrived in hospital. Dr James, `doctor_id` = `d07`, is leading Kay's care. Kay has just been admitted, so has received no drugs yet.

Add Kay's details to the database.

Inserting into the unnormalised data fails with an integrity error, because the primary key columns `prescribing_doctor_id`, `drug_code`, and `date` are left null:

In [None]:
%%sql

INSERT INTO unnormalised_prescription (patient_id, patient_name, doctor_id, doctor_name) 
VALUES ('p009', 'Kay', 'd07', 'James');

Add Kay's details to the normalised data.

In [None]:
# Write your code in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

With normalised data, adding a new patient just requires inserting their details into the `patient` table. Their lack of prescriptions is no impediment to recording their existence.

In [None]:
%%sql

INSERT INTO patient (patient_id, patient_name, doctor_id) 
VALUES ('p009', 'Kay', 'd07');

In [None]:
%%sql

SELECT *
FROM patient;

#### End of Activity 2

-------------------------------

## Discussion
These examples show that, now that the data is normalised, we can add patients, drugs, and doctors to the database independently of each other and independently of prescription information. This is because the patients, drugs, doctors, and prescriptions are all stored as separate entities in the database.

When we add a new prescription, the primary key and foreign key constraints on `prescription` still ensure that the prescription is for a known patient and a known drug, but prescriptions are not required for drugs or patients to exist.
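The following is a minimal sketch of this behaviour, assuming an in-memory SQLite database reached through Python's `sqlite3` module instead of the notebook's own connection. The tables are cut-down versions of `drug`, `patient`, and `prescription`; the helper `try_prescription`, the date, and the codes `X0000` and `p999` are hypothetical and exist only for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE drug (
        drug_code TEXT PRIMARY KEY,
        drug_name TEXT
    );
    CREATE TABLE patient (
        patient_id   TEXT PRIMARY KEY,
        patient_name TEXT
    );
    CREATE TABLE prescription (
        patient_id TEXT NOT NULL REFERENCES patient (patient_id),
        drug_code  TEXT NOT NULL REFERENCES drug (drug_code),
        date       TEXT NOT NULL,
        PRIMARY KEY (patient_id, drug_code, date)
    );
""")

# A drug and a patient can be recorded with no prescriptions at all.
conn.execute("INSERT INTO drug VALUES ('P1234', 'Pravastatin')")
conn.execute("INSERT INTO patient VALUES ('p009', 'Kay')")


def try_prescription(patient_id, drug_code, date):
    """Return True if the prescription row is accepted, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO prescription VALUES (?, ?, ?)",
            (patient_id, drug_code, date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


known_ok = try_prescription("p009", "P1234", "2017-07-01")
unknown_drug_ok = try_prescription("p009", "X0000", "2017-07-01")     # no such drug
unknown_patient_ok = try_prescription("p999", "P1234", "2017-07-01")  # no such patient
print(known_ok, unknown_drug_ok, unknown_patient_ok)
```

Only the first prescription is accepted. The dependency runs one way: a prescription needs an existing patient and drug, while patients and drugs need nothing from `prescription`.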

# Deletion anomalies
When all the data was held in a single table, deletions had unintended consequences. When the data is normalised, deleting a row removes only the information that row was meant to hold.

### Activity 3

1. Find the drug code for omeprazole. 
2. Thornton's record of omeprazole was made in error. Correct the error by removing the record showing Thornton (`patient_id` = `p001`) being prescribed omeprazole on 15 May 2017.
3. Again, find the drug code for omeprazole. 

Deleting from the unnormalised data has unexpected consequences:

In [None]:
%%sql 

SELECT DISTINCT drug_code 
FROM unnormalised_prescription 
WHERE drug_name = 'Omeprazole';

In [None]:
%%sql

DELETE FROM unnormalised_prescription 
WHERE patient_id = 'p001' 
    AND prescribing_doctor_id='d06'
    AND drug_code = 'O17663' 
    AND date = '2017-05-15';

In [None]:
%%sql 

SELECT DISTINCT drug_code 
FROM unnormalised_prescription 
WHERE drug_name = 'Omeprazole';

This shows that the information about omeprazole has disappeared from the database, simply because we removed a prescription record.

Repeat the steps above on the normalised database: look up omeprazole, remove Thornton's prescription, then look up omeprazole again.

In [None]:
# Write your code in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

In [None]:
%%sql 

SELECT drug_code 
FROM drug 
WHERE drug_name = 'Omeprazole';

In [None]:
%%sql 

DELETE FROM prescription 
WHERE patient_id = 'p001'
    AND prescribing_doctor_id='d06'
    AND drug_code = 'O17663' 
    AND date = '2017-05-15';

In [None]:
%%sql

SELECT drug_code 
FROM drug 
WHERE drug_name = 'Omeprazole';

As a check, list the prescriptions: Thornton's prescription has been deleted from the database, but the information about omeprazole is still there.

In [None]:
%%sql 

SELECT * 
FROM prescription 
ORDER BY patient_id, date, drug_code;

#### End of Activity 3

-----------------------------------------------

## Discussion
With unnormalised data, information on drugs (and other entities) could only exist in the context of prescriptions. That meant we could easily lose information about drugs, patients, and doctors if we removed the last reference to them from the prescription table.

With normalised data, we can delete an incorrect prescription from the `prescription` table without affecting any other information in the database, such as information about patients, drugs, or doctors. Again, this is because these now exist in the database as separate entities.
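As a compact illustration of the same point, the sketch below repeats the deletion on a two-table, in-memory SQLite database using Python's `sqlite3` module. It is a simplified stand-in for the notebook's database, not a copy of it: the `prescription` table here has only three columns, and each table holds a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

conn.executescript("""
    CREATE TABLE drug (drug_code TEXT PRIMARY KEY, drug_name TEXT);
    CREATE TABLE prescription (
        patient_id TEXT NOT NULL,
        drug_code  TEXT NOT NULL REFERENCES drug (drug_code),
        date       TEXT NOT NULL,
        PRIMARY KEY (patient_id, drug_code, date)
    );
    INSERT INTO drug VALUES ('O17663', 'Omeprazole');
    INSERT INTO prescription VALUES ('p001', 'O17663', '2017-05-15');
""")

# Remove the only prescription that mentions omeprazole.
deleted = conn.execute(
    "DELETE FROM prescription "
    "WHERE patient_id = ? AND drug_code = ? AND date = ?",
    ("p001", "O17663", "2017-05-15"),
).rowcount

remaining_prescriptions = conn.execute(
    "SELECT COUNT(*) FROM prescription"
).fetchone()[0]
omeprazole_codes = [
    row[0]
    for row in conn.execute(
        "SELECT drug_code FROM drug WHERE drug_name = 'Omeprazole'"
    )
]

print("Prescriptions deleted:", deleted)
print("Prescriptions remaining:", remaining_prescriptions)
print("Codes still recorded for omeprazole:", omeprazole_codes)
```

The prescription is gone, yet the drug's code can still be looked up, because the drug's details live in their own row in `drug`.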

# Update anomalies

### Activity 4

1. Find the drug code for tamsulosin. 
2. Tennent's prescription of tamsulosin on 19 June 2017 was recorded incorrectly: the drug prescribed was actually pravastatin. Correct the error by replacing the `drug_code` with the correct one, `P1234`. 
3. Again, find the drug code for tamsulosin. 

Updating the unnormalised data has unexpected consequences. Initially all is well: tamsulosin has a single drug code.

In [None]:
%%sql 

SELECT DISTINCT drug_code, drug_name 
FROM unnormalised_prescription 
WHERE drug_name = 'Tamsulosin';

Tennent has two prescriptions for tamsulosin. We change the one on 19 June 2017.

In [None]:
%%sql 

SELECT * 
FROM unnormalised_prescription 
WHERE patient_id = 'p007';

In [None]:
%%sql

UPDATE unnormalised_prescription
SET drug_code = 'P1234'
    WHERE patient_id = 'p007' 
        AND drug_name = 'Tamsulosin' 
        AND date = '2017-06-19';

Now look up the drug code for tamsulosin again:

In [None]:
%%sql 

SELECT DISTINCT drug_code, drug_name 
FROM unnormalised_prescription 
WHERE drug_name = 'Tamsulosin';

We now have two drug codes for tamsulosin.

In the normalised database, update Tennent's prescription and then find the drug code for tamsulosin.

In [None]:
# Write your code in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

In [None]:
%%sql 

SELECT drug_code, drug_name 
FROM drug 
WHERE drug_name = 'Tamsulosin';

In [None]:
%%sql

UPDATE prescription
SET drug_code = 'P1234'
    WHERE patient_id = 'p007' 
        AND drug_code = 'T05223' 
        AND date = '2017-06-19';

In [None]:
%%sql 

SELECT drug_code, drug_name 
FROM drug 
WHERE drug_name = 'Tamsulosin';

Good. Following the update, there is still only a single code for tamsulosin.

As a check, what is Tennent's prescription recorded as?

In [None]:
%%sql 

SELECT * 
FROM prescription, drug
WHERE prescription.drug_code = drug.drug_code
    AND patient_id = 'p007' 
ORDER BY patient_id, date, prescription.drug_code;

The prescription of 19 June is now recorded as being for pravastatin.

#### End of Activity 4

-------------------------------------------------

## Discussion
The problems of inconsistent data have gone away. The drug name is recorded in only one place, so when a reference to a drug changes, the database picks up the information about the new drug. We can make sensible and appropriate changes to the data without causing additional errors. 
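The sketch below illustrates this on a small in-memory SQLite database using Python's `sqlite3` module, as a simplified stand-in for the notebook's database. The tables are cut down to a few columns, and the 12 June prescription date is hypothetical, added so that Tennent has two tamsulosin prescriptions as in the activity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

conn.executescript("""
    CREATE TABLE drug (drug_code TEXT PRIMARY KEY, drug_name TEXT);
    CREATE TABLE prescription (
        patient_id TEXT NOT NULL,
        drug_code  TEXT NOT NULL REFERENCES drug (drug_code),
        date       TEXT NOT NULL,
        PRIMARY KEY (patient_id, drug_code, date)
    );
    INSERT INTO drug VALUES ('T05223', 'Tamsulosin'), ('P1234', 'Pravastatin');
    INSERT INTO prescription VALUES
        ('p007', 'T05223', '2017-06-12'),   -- hypothetical earlier prescription
        ('p007', 'T05223', '2017-06-19');
""")

# Point the 19 June prescription at the correct drug.
conn.execute(
    "UPDATE prescription SET drug_code = 'P1234' "
    "WHERE patient_id = 'p007' AND drug_code = 'T05223' AND date = '2017-06-19'"
)

# The drug table is untouched, so tamsulosin still has exactly one code.
tamsulosin_codes = [
    row[0]
    for row in conn.execute(
        "SELECT drug_code FROM drug WHERE drug_name = 'Tamsulosin'"
    )
]

# The drug name is looked up through the join, never stored in prescription.
tennent = conn.execute("""
    SELECT p.date, p.drug_code, d.drug_name
    FROM prescription AS p
    JOIN drug AS d ON p.drug_code = d.drug_code
    WHERE p.patient_id = 'p007'
    ORDER BY p.date
""").fetchall()

print(tamsulosin_codes)
for row in tennent:
    print(row)
```

Because the name is only ever read from `drug`, changing the code on a prescription cannot leave a drug name paired with the wrong code.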

## What next?
This notebook has shown you the benefits of normalising data. As you saw in notebook 10.1 and again here, the unnormalised database is difficult to use. It cannot store some of the information we need, such as the existence of drugs or patients when they have no prescriptions. 

In contrast, the normalised database allows a much easier manipulation of data. The insertion, deletion, and update anomalies no longer affect this database. We can use this database to effectively record all the different states of information about the hospital prescription system.

If you are working through this notebook as part of an inline exercise, return to the module materials now.