# Update (modification), deletion and addition (insertion) anomalies

In this Notebook you will explore the problems associated with unnormalised data and how these problems are resolved by 
normalising the data.

We will compare the unnormalised and normalised forms of the book purchases data (see Activity 10.2):
* unnormalised data - `books_purchased` table
* normalised data - `orders`, `order_item`, `book` and `customer` tables.

Enable access to the PostgreSQL database engine via [SQL Cell Magic](https://pypi.python.org/pypi/ipython-sql).

In [1]:
%load_ext sql
%sql postgresql://test:test@localhost:5432/tm351test

'Connected: test@tm351test'

Define the tables.

In [2]:
%%sql
-- unnormalised data
DROP TABLE IF EXISTS books_purchased CASCADE;

CREATE TABLE books_purchased (
 invoice_no CHAR(8) NOT NULL,
 date DATE NOT NULL,
 customer_no CHAR(6) NOT NULL,
 customer_name VARCHAR(25) NOT NULL,
 isbn CHAR(14) NOT NULL,
 title VARCHAR(100) NOT NULL,
 quantity INTEGER NOT NULL,
 cost DECIMAL(5,2) NOT NULL,
 PRIMARY KEY (invoice_no, isbn)
);

-- normalised data
DROP TABLE IF EXISTS order_item;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS book;
DROP TABLE IF EXISTS customer;

CREATE TABLE book (
 isbn CHAR(14) NOT NULL,
 title VARCHAR(100) NOT NULL,
 cost DECIMAL(5,2) NOT NULL,
 PRIMARY KEY (isbn)
);

CREATE TABLE customer (
 customer_no CHAR(6) NOT NULL,
 customer_name VARCHAR(25) NOT NULL,
 PRIMARY KEY (customer_no)
);

-- Note: ORDER is a reserved word in SQL, so the table is called 'orders' instead.
CREATE TABLE orders (
 invoice_no CHAR(8) NOT NULL,
 date DATE NOT NULL,
 customer_no CHAR(6) NOT NULL,
 PRIMARY KEY (invoice_no),
 FOREIGN KEY (customer_no) REFERENCES customer(customer_no)
);

CREATE TABLE order_item (
 invoice_no CHAR(8) NOT NULL,
 isbn CHAR(14) NOT NULL,
 quantity INTEGER NOT NULL,
 PRIMARY KEY (invoice_no, isbn),
 FOREIGN KEY (invoice_no) REFERENCES orders(invoice_no),
 FOREIGN KEY (isbn) REFERENCES book(isbn)
);

Done.
Done.
Done.
Done.
Done.
Done.
Done.
Done.
Done.
Done.


[]

Populate the tables from the tab-delimited `.dat` files in the `data` directory using [Psycopg](http://initd.org/psycopg/docs/index.html), 
a PostgreSQL database adapter for Python.

In [3]:
import psycopg2 as pg
import pandas as pd
import pandas.io.sql as psqlg

In [4]:
# open a connection to the PostgreSQL database tm351test
conn = pg.connect(dbname='tm351test', host='localhost', user='test', password='test', port=5432)
# create a cursor
c = conn.cursor()

# populate each table from its data file (copy_from expects tab-delimited columns by default);
# 'customer' and 'book' are loaded before 'orders' and 'order_item', which reference them
for table in ['books_purchased', 'customer', 'book', 'orders', 'order_item']:
    with open('data/{}.dat'.format(table), 'r') as f:
        c.copy_from(f, table)
    conn.commit()

# close cursor
c.close()
# close database connection
conn.close()

In [5]:
%%sql
SELECT *
FROM books_purchased
ORDER BY invoice_no, isbn;

8 rows affected.


invoice_no,date,customer_no,customer_name,isbn,title,quantity,cost
966047,2014-07-01,123789,Dimity Stone,978-0071005296,Database System Concepts,10,9.55
966047,2014-07-01,123789,Dimity Stone,978-0130402646,Database System Implementation,10,48.78
966047,2014-07-01,123789,Dimity Stone,978-1292025827,A First Course in Database Systems,10,10.0
966047,2014-07-01,123789,Dimity Stone,978-1558604568,SQL:1999,10,54.99
966047,2014-07-01,123789,Dimity Stone,978-1852330088,A Guided Tour of Relational Databases,10,41.69
966048,2014-07-01,234678,Roger Monk,978-0071005296,Database System Concepts,1,9.55
966048,2014-07-01,234678,Roger Monk,978-0471141617,Building the Data Warehouse,1,9.55
966048,2014-07-01,234678,Roger Monk,978-1558604896,Data Mining: Concepts and Techniques,1,18.55


In [6]:
%%sql
SELECT *
FROM customer
ORDER BY customer_no;

2 rows affected.


customer_no,customer_name
123789,Dimity Stone
234678,Roger Monk


In [7]:
%%sql
SELECT *
FROM book
ORDER BY isbn;

7 rows affected.


isbn,title,cost
978-0071005296,Database System Concepts,9.55
978-0130402646,Database System Implementation,48.78
978-0471141617,Building the Data Warehouse,9.55
978-1292025827,A First Course in Database Systems,10.0
978-1558604568,SQL:1999,54.99
978-1558604896,Data Mining: Concepts and Techniques,18.55
978-1852330088,A Guided Tour of Relational Databases,41.69


In [8]:
%%sql
SELECT *
FROM orders
ORDER BY invoice_no;

2 rows affected.


invoice_no,date,customer_no
966047,2014-07-01,123789
966048,2014-07-01,234678


In [9]:
%%sql
SELECT *
FROM order_item
ORDER BY invoice_no, isbn;

8 rows affected.


invoice_no,isbn,quantity
966047,978-0071005296,10
966047,978-0130402646,10
966047,978-1292025827,10
966047,978-1558604568,10
966047,978-1852330088,10
966048,978-0071005296,1
966048,978-0471141617,1
966048,978-1558604896,1
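
The normalised tables hold the same information as `books_purchased`: joining them on their keys reproduces the unnormalised rows. The following is a minimal sketch of that join. It assumes an in-memory SQLite database (Python's standard `sqlite3` module) in place of the PostgreSQL server, simplified column types, 8-character invoice numbers with leading zeros, and only three of the eight sample rows; none of these assumptions is part of the notebook's own setup.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database tm351test.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_no TEXT PRIMARY KEY, customer_name TEXT NOT NULL);
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, cost REAL NOT NULL);
CREATE TABLE orders (
    invoice_no TEXT PRIMARY KEY,
    date TEXT NOT NULL,
    customer_no TEXT NOT NULL REFERENCES customer(customer_no)
);
CREATE TABLE order_item (
    invoice_no TEXT NOT NULL REFERENCES orders(invoice_no),
    isbn TEXT NOT NULL REFERENCES book(isbn),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (invoice_no, isbn)
);
-- A subset of the sample data shown above
INSERT INTO customer VALUES ('123789', 'Dimity Stone'), ('234678', 'Roger Monk');
INSERT INTO book VALUES
    ('978-0071005296', 'Database System Concepts', 9.55),
    ('978-1558604568', 'SQL:1999', 54.99);
INSERT INTO orders VALUES
    ('00966047', '2014-07-01', '123789'),
    ('00966048', '2014-07-01', '234678');
INSERT INTO order_item VALUES
    ('00966047', '978-0071005296', 10),
    ('00966047', '978-1558604568', 10),
    ('00966048', '978-0071005296', 1);
""")

# Join the four normalised tables back into the shape of books_purchased.
rebuilt = conn.execute("""
    SELECT o.invoice_no, o.date, c.customer_no, c.customer_name,
           b.isbn, b.title, oi.quantity, b.cost
    FROM order_item AS oi
    JOIN orders AS o ON o.invoice_no = oi.invoice_no
    JOIN customer AS c ON c.customer_no = o.customer_no
    JOIN book AS b ON b.isbn = oi.isbn
    ORDER BY o.invoice_no, b.isbn
""").fetchall()

for row in rebuilt:
    print(row)
conn.close()
```

Each customer name and book title is stored once in the normalised tables and is repeated only in the join result, which is where the redundancy in `books_purchased` comes from.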


## Activity

Give example SQL `UPDATE`, `DELETE` and `INSERT` statements that will result in an update (modification), deletion and addition (insertion) anomaly respectively with the `books_purchased` table because the data are unnormalised.

In [11]:
%%sql
-- Update anomaly: moving this order item to invoice 00966048 leaves that invoice
-- with two conflicting sets of customer details (Dimity Stone and Roger Monk).
UPDATE books_purchased
SET invoice_no = '00966048'
WHERE customer_no = '123789' AND isbn = '978-1292025827';

1 rows affected.


[]
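
One way to see the inconsistency left by this kind of update is to look for invoices that are associated with more than one customer. The sketch below is illustrative only. It assumes an in-memory SQLite database (Python's `sqlite3` module) rather than the PostgreSQL server, simplified column types, and three sample rows with 8-character invoice numbers.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database tm351test.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books_purchased (
        invoice_no TEXT NOT NULL, date TEXT NOT NULL,
        customer_no TEXT NOT NULL, customer_name TEXT NOT NULL,
        isbn TEXT NOT NULL, title TEXT NOT NULL,
        quantity INTEGER NOT NULL, cost REAL NOT NULL,
        PRIMARY KEY (invoice_no, isbn)
    )
""")
conn.executemany(
    "INSERT INTO books_purchased VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("00966047", "2014-07-01", "123789", "Dimity Stone",
         "978-0071005296", "Database System Concepts", 10, 9.55),
        ("00966047", "2014-07-01", "123789", "Dimity Stone",
         "978-1292025827", "A First Course in Database Systems", 10, 10.0),
        ("00966048", "2014-07-01", "234678", "Roger Monk",
         "978-0071005296", "Database System Concepts", 1, 9.55),
    ],
)

# The same UPDATE as in the cell above.
conn.execute("""
    UPDATE books_purchased
    SET invoice_no = '00966048'
    WHERE customer_no = '123789' AND isbn = '978-1292025827'
""")

# An invoice should belong to exactly one customer; report any that do not.
conflicts = conn.execute("""
    SELECT invoice_no, COUNT(DISTINCT customer_no) AS customers
    FROM books_purchased
    GROUP BY invoice_no
    HAVING COUNT(DISTINCT customer_no) > 1
""").fetchall()
print(conflicts)
conn.close()
```

With this sample data the query should report invoice `00966048` with two customers. Nothing in the table definition prevents that state, because the customer details are repeated on every row.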

In [13]:
%%sql
-- Deletion anomaly: this row is the only record of the book, so deleting the
-- order item also deletes the book's title and cost.
DELETE FROM books_purchased
WHERE isbn = '978-1292025827';

1 rows affected.


[]
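
The loss caused by this deletion can be made visible by comparing the titles the table knows about before and after the `DELETE`. This is a small sketch under assumptions: an in-memory SQLite database (Python's `sqlite3` module) instead of PostgreSQL, simplified column types, a two-row sample, and a hypothetical helper `known_titles` that is not part of the notebook.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database tm351test.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books_purchased (
        invoice_no TEXT NOT NULL, date TEXT NOT NULL,
        customer_no TEXT NOT NULL, customer_name TEXT NOT NULL,
        isbn TEXT NOT NULL, title TEXT NOT NULL,
        quantity INTEGER NOT NULL, cost REAL NOT NULL,
        PRIMARY KEY (invoice_no, isbn)
    )
""")
conn.executemany(
    "INSERT INTO books_purchased VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("00966047", "2014-07-01", "123789", "Dimity Stone",
         "978-0071005296", "Database System Concepts", 10, 9.55),
        ("00966047", "2014-07-01", "123789", "Dimity Stone",
         "978-1292025827", "A First Course in Database Systems", 10, 10.0),
    ],
)


def known_titles(connection):
    """Hypothetical helper: every distinct book title recorded in the table."""
    rows = connection.execute(
        "SELECT DISTINCT title FROM books_purchased ORDER BY title"
    )
    return [title for (title,) in rows]


before = known_titles(conn)
# The same DELETE as in the cell above.
conn.execute("DELETE FROM books_purchased WHERE isbn = '978-1292025827'")
after = known_titles(conn)

# Titles that the database no longer knows anything about.
lost = sorted(set(before) - set(after))
print(lost)
conn.close()
```

Because the book's details exist only as part of an order row, removing the last order for a book removes the book itself.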

In [15]:
%%sql
-- Insertion anomaly: this fails because a customer cannot be added independently
-- of an order; the order and book columns are declared NOT NULL.
INSERT INTO books_purchased
VALUES(null, null, '999999', 'Pablo Toledo', null, null, null, null)

IntegrityError: (psycopg2.IntegrityError) null value in column "invoice_no" violates not-null constraint
DETAIL:  Failing row contains (null, null, 999999, Pablo Toledo, null, null, null, null).
 [SQL: "INSERT INTO books_purchased\nVALUES(null, null, '999999', 'Pablo Toledo', null, null, null, null)"]
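
For comparison, the corresponding operations can be tried against the normalised tables, where each one affects only the fact it is meant to change. The sketch below is not the module's solution. It assumes an in-memory SQLite database (Python's `sqlite3` module) in place of PostgreSQL, simplified column types, 8-character invoice numbers, and a reduced sample of the data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database tm351test.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (customer_no TEXT PRIMARY KEY, customer_name TEXT NOT NULL);
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, cost REAL NOT NULL);
CREATE TABLE orders (
    invoice_no TEXT PRIMARY KEY,
    date TEXT NOT NULL,
    customer_no TEXT NOT NULL REFERENCES customer(customer_no)
);
CREATE TABLE order_item (
    invoice_no TEXT NOT NULL REFERENCES orders(invoice_no),
    isbn TEXT NOT NULL REFERENCES book(isbn),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (invoice_no, isbn)
);
INSERT INTO customer VALUES ('123789', 'Dimity Stone'), ('234678', 'Roger Monk');
INSERT INTO book VALUES
    ('978-0071005296', 'Database System Concepts', 9.55),
    ('978-1292025827', 'A First Course in Database Systems', 10.0);
INSERT INTO orders VALUES
    ('00966047', '2014-07-01', '123789'),
    ('00966048', '2014-07-01', '234678');
INSERT INTO order_item VALUES
    ('00966047', '978-0071005296', 10),
    ('00966047', '978-1292025827', 10),
    ('00966048', '978-0071005296', 1);
""")

# Update: move an order item to invoice 00966048. The invoice's customer is
# stored once, in 'orders', so no conflicting customer details can appear.
conn.execute("""
    UPDATE order_item SET invoice_no = '00966048'
    WHERE invoice_no = '00966047' AND isbn = '978-1292025827'
""")
owners = conn.execute(
    "SELECT customer_no FROM orders WHERE invoice_no = '00966048'"
).fetchall()

# Deletion: remove the order item. The book's details survive in 'book'.
conn.execute("DELETE FROM order_item WHERE isbn = '978-1292025827'")
title = conn.execute(
    "SELECT title FROM book WHERE isbn = '978-1292025827'"
).fetchone()[0]

# Insertion: a customer can be recorded before placing any order.
conn.execute("INSERT INTO customer VALUES ('999999', 'Pablo Toledo')")
customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(owners, title, customer_count)
conn.close()
```

In this sketch the moved item simply becomes part of Roger Monk's invoice, the deleted item leaves the book's title and cost in place, and the new customer is stored without any order.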

Solutions can be found in the `10.3.soln Update (modification), deletion and addition (insertion) anomalies` Notebook, 
but please DO attempt the activity yourself before looking at these solutions.

## Summary
In this Notebook you have explored the problems associated with unnormalised data and how these problems are resolved 
by normalising the data.

## What next?
If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to `10.4 Normalised v. unnormalised data`.