# Update (modification), deletion and addition (insertion) anomalies
## Solution Notebook

This is a set of worked solutions to the `10.3 Update (modification), deletion and addition (insertion) anomalies` Notebook.
The solutions given here should be viewed as a guide only: other equally acceptable solutions may be possible.

Enable access to the PostgreSQL database engine via [SQL Cell Magic](https://pypi.python.org/pypi/ipython-sql).

In [None]:
%load_ext sql
%sql postgresql://test:test@localhost:5432/tm351test

Define and populate tables.

In [None]:
%%sql
-- unnormalised data
DROP TABLE IF EXISTS books_purchased CASCADE;

CREATE TABLE books_purchased (
 invoice_no CHAR(8) NOT NULL,
 date DATE NOT NULL,
 customer_no CHAR(6) NOT NULL,
 customer_name VARCHAR(25) NOT NULL,
 isbn CHAR(14) NOT NULL,
 title VARCHAR(100) NOT NULL,
 quantity INTEGER NOT NULL,
 cost DECIMAL(5,2) NOT NULL,
 PRIMARY KEY (invoice_no, isbn)
);

-- normalised data
DROP TABLE IF EXISTS order_item;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS book;
DROP TABLE IF EXISTS customer;

CREATE TABLE book (
 isbn CHAR(14) NOT NULL,
 title VARCHAR(100) NOT NULL,
 cost DECIMAL(5,2) NOT NULL,
 PRIMARY KEY (isbn)
);

CREATE TABLE customer (
 customer_no CHAR(6) NOT NULL,
 customer_name VARCHAR(25) NOT NULL,
 PRIMARY KEY (customer_no)
);

-- Note: as ORDER is a reserved word in SQL, calling the table 'orders' instead.
CREATE TABLE orders (
 invoice_no CHAR(8) NOT NULL,
 date DATE NOT NULL,
 customer_no CHAR(6) NOT NULL,
 PRIMARY KEY (invoice_no),
 FOREIGN KEY (customer_no) REFERENCES customer(customer_no)
);

CREATE TABLE order_item (
 invoice_no CHAR(8) NOT NULL,
 isbn CHAR(14) NOT NULL,
 quantity INTEGER NOT NULL,
 PRIMARY KEY (invoice_no, isbn),
 FOREIGN KEY (invoice_no) REFERENCES orders(invoice_no),
 FOREIGN KEY (isbn) REFERENCES book(isbn)
);

Populate the tables from CSV files using [Psycopg](http://initd.org/psycopg/docs/index.html), 
a PostgreSQL database adapter for Python.

In [3]:
import psycopg2 as pg
import pandas as pd
import pandas.io.sql as psqlg

In [4]:
# open a connection to the PostgreSQL database tm351test
conn = pg.connect(dbname='tm351test', host='localhost', user='test', password='test', port=5432)
# create a cursor
c = conn.cursor()

# populate 'books_purchased' table
io = open('data/books_purchased.dat', 'r')
c.copy_from(io, 'books_purchased')
io.close()
conn.commit()

# populate 'customer' table
io = open('data/customer.dat', 'r')
c.copy_from(io, 'customer')
io.close()
conn.commit()

# populate 'book' table
io = open('data/book.dat', 'r')
c.copy_from(io, 'book')
io.close()
conn.commit()

# populate 'orders' table
io = open('data/orders.dat', 'r')
c.copy_from(io, 'orders')
io.close()
conn.commit()

# populate 'order_item' table
io = open('data/order_item.dat', 'r')
c.copy_from(io, 'order_item')
io.close()
conn.commit()

# close cursor
c.close()
# close database connection
conn.close()

In [None]:
%%sql
SELECT *
FROM books_purchased
ORDER BY invoice_no, isbn;

In [None]:
%%sql
SELECT *
FROM customer
ORDER BY customer_no;

In [None]:
%%sql
SELECT *
FROM book
ORDER BY isbn;

In [None]:
%%sql
SELECT *
FROM orders
ORDER BY invoice_no;

In [None]:
%%sql
SELECT *
FROM order_item
ORDER BY invoice_no, isbn;

## Activity

Give example SQL `UPDATE`, `DELETE` and `INSERT` statements that will result in an update (modification), deletion and addition (insertion) anomaly respectively with the `books_purchased` table because the data are unnormalised.

### Update (modification) anomaly 

An update (modification) anomaly results in data inconsistency when a value that is stored in more than one place is only partially updated, rather than updated everywhere it occurs (see Ponniah (2003), Chapter 10, Update Anomaly, p. 307).
 
For example, the SQL `UPDATE` statement below is an attempt to associate a particular order with another customer, but it results in an update (modification) anomaly because it only changes the customer number (`customer_no`) and not 
the customer’s name (`customer_name`). This results in the customer identified by the customer number of '234678' 
being called both ‘Dimity Stone’ and ‘Roger Monk’ in the relation.
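
The inconsistency can also be detected programmatically. The following is a minimal sketch only: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for PostgreSQL, a cut-down version of `books_purchased`, and made-up rows. The second invoice number, the second ISBN and the pairing of names to customer numbers are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books_purchased (
        invoice_no    TEXT NOT NULL,
        customer_no   TEXT NOT NULL,
        customer_name TEXT NOT NULL,
        isbn          TEXT NOT NULL,
        PRIMARY KEY (invoice_no, isbn)
    )
""")

# Illustrative rows only; which name belongs to which number is assumed.
conn.executemany(
    "INSERT INTO books_purchased VALUES (?, ?, ?, ?)",
    [
        ("00966047", "123789", "Roger Monk", "978-1558604568"),
        ("00966048", "234678", "Dimity Stone", "978-0000000001"),
    ],
)

# The partial update: customer_no changes, customer_name does not.
conn.execute(
    "UPDATE books_purchased SET customer_no = '234678' "
    "WHERE invoice_no = '00966047'"
)

# Any customer_no now paired with more than one name is inconsistent.
inconsistent = conn.execute("""
    SELECT customer_no, COUNT(DISTINCT customer_name) AS names
    FROM books_purchased
    GROUP BY customer_no
    HAVING COUNT(DISTINCT customer_name) > 1
""").fetchall()
print(inconsistent)
conn.close()
```

A query of this `GROUP BY ... HAVING` shape is one way to audit an unnormalised table for update anomalies that have already occurred.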

In [1]:
%%sql
UPDATE books_purchased
 SET customer_no = '234678'
 WHERE invoice_no = '00966047';

In [None]:
%%sql
SELECT *
FROM books_purchased
ORDER BY invoice_no, isbn;

With the normalised data, updating the `orders` table achieves the required result, as we illustrate below by 
recreating the `books_purchased` table from the normalised tables.
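
The reason the normalised design avoids the anomaly is that the customer's name is stored once, in `customer`, and is looked up by the join. As a rough sketch, assuming in-memory SQLite in place of PostgreSQL, cut-down tables and made-up name assignments:

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_no   TEXT PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        invoice_no  TEXT PRIMARY KEY,
        customer_no TEXT NOT NULL REFERENCES customer(customer_no)
    );
    -- Illustrative rows; the name/number pairing is assumed.
    INSERT INTO customer VALUES ('123789', 'Roger Monk');
    INSERT INTO customer VALUES ('234678', 'Dimity Stone');
    INSERT INTO orders VALUES ('00966047', '123789');
""")

# Reassign the order: only the foreign key in 'orders' changes.
conn.execute(
    "UPDATE orders SET customer_no = '234678' WHERE invoice_no = '00966047'"
)

# The join picks up the single stored name for the new customer_no.
rows = conn.execute("""
    SELECT invoice_no, customer_no, customer_name
    FROM orders NATURAL JOIN customer
""").fetchall()
print(rows)
conn.close()
```

Because no name is copied into `orders`, there is nothing left behind to fall out of step with the new customer number.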

In [None]:
%%sql
UPDATE orders 
 SET customer_no = '234678' 
 WHERE invoice_no = '00966047';

In [None]:
%%sql
SELECT invoice_no, date, customer_no, customer_name, isbn, title, quantity, cost
FROM (((orders NATURAL JOIN order_item)
               NATURAL JOIN book)
               NATURAL JOIN customer)
ORDER BY invoice_no, isbn;

Note that the unnormalised form of the data above does not include customer '123789', who now has no recorded 
purchases, because an inner join drops rows that have no match.

We can include customers who have not purchased any books, and books that have not been purchased by any customers, by using `FULL OUTER JOIN` clauses on the `customer` and `book` tables respectively. 
(You were introduced to *outer joins* in the `03.3 Combining data from multiple datasets` Notebook.)
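
The effect of an outer join on unmatched rows can be seen in a small sketch. This is an illustration under assumptions: in-memory SQLite replaces PostgreSQL, and because older SQLite versions lack `FULL OUTER JOIN`, a `LEFT OUTER JOIN` starting from `customer` is used instead. That is enough to keep a customer with no orders. The rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_no   TEXT PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        invoice_no  TEXT PRIMARY KEY,
        customer_no TEXT NOT NULL REFERENCES customer(customer_no)
    );
    -- Illustrative rows; customer '123789' has no orders.
    INSERT INTO customer VALUES ('123789', 'Roger Monk');
    INSERT INTO customer VALUES ('234678', 'Dimity Stone');
    INSERT INTO orders VALUES ('00966047', '234678');
""")

# LEFT OUTER JOIN keeps every customer; unmatched order columns are NULL.
rows = conn.execute("""
    SELECT customer.customer_no, customer_name, invoice_no
    FROM customer LEFT OUTER JOIN orders
         ON orders.customer_no = customer.customer_no
    ORDER BY customer.customer_no
""").fetchall()
for row in rows:
    print(row)
conn.close()
```

The customer without an order appears with `None` (SQL `NULL`) in the `invoice_no` column rather than being dropped.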

In [None]:
%%sql
SELECT invoice_no, date, customer.customer_no, customer_name, book.isbn, title, quantity, cost
FROM (((orders NATURAL JOIN order_item)
               FULL OUTER JOIN book ON order_item.isbn = book.isbn)
               FULL OUTER JOIN customer ON orders.customer_no = customer.customer_no)
ORDER BY invoice_no, isbn;

### Deletion anomaly

A deletion anomaly results in unintended loss of data when deleting one fact also removes other facts that happen to be 
stored in the same rows (see Ponniah (2003), Chapter 10, Deletion Anomaly, p. 307).

For example, the SQL `DELETE` statement below is intended to delete a particular book purchase from an order,
but it results in a deletion anomaly: because this book appears only in this order, all the details of 
this book (`isbn`, `title` and `cost`) are lost from the relation.
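
A minimal sketch of the same loss, assuming in-memory SQLite in place of PostgreSQL and a cut-down `books_purchased` with made-up titles and a made-up second ISBN:

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books_purchased (
        invoice_no TEXT NOT NULL,
        isbn       TEXT NOT NULL,
        title      TEXT NOT NULL,
        PRIMARY KEY (invoice_no, isbn)
    )
""")

# Illustrative rows; the titles and the second ISBN are invented.
conn.executemany(
    "INSERT INTO books_purchased VALUES (?, ?, ?)",
    [
        ("00966047", "978-1558604568", "Illustrative Title A"),
        ("00966047", "978-0000000001", "Illustrative Title B"),
    ],
)

# Remove one purchase from the order.
conn.execute(
    "DELETE FROM books_purchased "
    "WHERE invoice_no = '00966047' AND isbn = '978-1558604568'"
)

# The book's details were stored only in that row, so they are gone too.
remaining = conn.execute(
    "SELECT COUNT(*) FROM books_purchased WHERE isbn = '978-1558604568'"
).fetchone()[0]
print("Rows still describing the book:", remaining)
conn.close()
```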

In [None]:
%%sql
DELETE FROM books_purchased 
 WHERE invoice_no = '00966047' AND isbn = '978-1558604568';

In [None]:
%%sql
SELECT *
FROM books_purchased
ORDER BY invoice_no, isbn;

With the normalised data, deleting from the `order_item` table achieves the required result.
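
As a rough sketch of why this works, assuming in-memory SQLite in place of PostgreSQL, cut-down tables and a made-up title: the order line lives in `order_item`, while the book's details live in `book`, so removing the former leaves the latter untouched.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (
        isbn  TEXT PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE order_item (
        invoice_no TEXT NOT NULL,
        isbn       TEXT NOT NULL REFERENCES book(isbn),
        PRIMARY KEY (invoice_no, isbn)
    );
    -- Illustrative rows; the title is invented.
    INSERT INTO book VALUES ('978-1558604568', 'Illustrative Title A');
    INSERT INTO order_item VALUES ('00966047', '978-1558604568');
""")

# Delete the order line only.
conn.execute(
    "DELETE FROM order_item "
    "WHERE invoice_no = '00966047' AND isbn = '978-1558604568'"
)

items_left = conn.execute("SELECT COUNT(*) FROM order_item").fetchone()[0]
book_rows = conn.execute("SELECT isbn, title FROM book").fetchall()
print(items_left, book_rows)
conn.close()
```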

In [None]:
%%sql
DELETE FROM order_item 
 WHERE invoice_no = '00966047' AND isbn = '978-1558604568';

In [None]:
%%sql
SELECT invoice_no, date, customer.customer_no, customer_name, book.isbn, title, quantity, cost
FROM (((orders NATURAL JOIN order_item)
               FULL OUTER JOIN book ON order_item.isbn = book.isbn)
               FULL OUTER JOIN customer ON orders.customer_no = customer.customer_no)
ORDER BY invoice_no, isbn;

### Addition (insertion) anomaly 

An addition (insertion) anomaly is the inability to add data to the database because other data, which is not yet 
available, would have to be supplied with it (see Ponniah (2003), Chapter 10, Addition Anomaly, p. 308).
 
For example, the SQL `INSERT` statement below is an attempt to add a new customer who has not yet 
ordered any books to the relation, but it results in an addition (insertion) anomaly because a tuple (row) cannot be added to the 
relation without values for the primary key attributes (`invoice_no`, `isbn`).
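
The rejection can be reproduced in a small sketch. It assumes in-memory SQLite in place of PostgreSQL and a cut-down `books_purchased`; the exact error message differs between database engines, so only the fact that the insert is refused is checked.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books_purchased (
        invoice_no    TEXT NOT NULL,
        isbn          TEXT NOT NULL,
        customer_no   TEXT NOT NULL,
        customer_name TEXT NOT NULL,
        PRIMARY KEY (invoice_no, isbn)
    )
""")

# Try to record a customer without any invoice or book.
try:
    conn.execute(
        "INSERT INTO books_purchased (customer_no, customer_name) "
        "VALUES ('346781', 'John Urquhart')"
    )
    rejected = False
except sqlite3.IntegrityError as exc:
    # The key columns are NOT NULL, so the row is refused.
    rejected = True
    print("Insert rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM books_purchased").fetchone()[0]
print("Rows stored:", row_count)
conn.close()
```

The explicit `NOT NULL` on the key columns matters in this sketch, since SQLite does not by itself forbid `NULL` in a `TEXT` primary key.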

In [None]:
%%sql
INSERT INTO books_purchased(customer_no, customer_name)
 VALUES('346781', 'John Urquhart');

With the normalised data, inserting into the `customer` table achieves the required result.

In [None]:
%%sql
INSERT INTO customer(customer_no, customer_name) 
 VALUES('346781', 'John Urquhart');

In [None]:
%%sql
SELECT invoice_no, date, customer.customer_no, customer_name, book.isbn, title, quantity, cost
FROM (((orders NATURAL JOIN order_item)
               FULL OUTER JOIN book ON order_item.isbn = book.isbn)
               FULL OUTER JOIN customer ON orders.customer_no = customer.customer_no)
ORDER BY invoice_no, isbn;