# Reasons for normalizing databases

Which of the following are reasons to normalize a database?

- To reduce data duplication
- To increase data consistency
- To improve data organization

# Reducing data redundancy

A former employee of the Small Business Administration built the initial version of the database. Location information appears throughout it for borrowers, banks, and projects: each of these entities' tables has its own city, state, and zip_code columns, which creates redundant data. Your task is to normalize this location data by creating a place table that consolidates it.

```
-- Create the place table
CREATE TABLE place (
  -- Define zip_code column
  zip_code CHAR(5) PRIMARY KEY,
  -- Define city column
  city VARCHAR(50) NOT NULL,
  -- Define state column
  state CHAR(2) NOT NULL
);

CREATE TABLE borrower (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  approved BOOLEAN DEFAULT NULL,
  -- The zip_code, city, and state columns are removed;
  -- location now comes from the place table
  place_id CHAR(5) REFERENCES place(zip_code)
);
```
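To see the reduced redundancy concretely, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite has no SERIAL type, so `INTEGER PRIMARY KEY` plays that role, and foreign keys are only enforced after `PRAGMA foreign_keys = ON`. The borrower names and zip code are made-up sample data.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE place (
    zip_code CHAR(5) PRIMARY KEY,
    city VARCHAR(50) NOT NULL,
    state CHAR(2) NOT NULL
);
CREATE TABLE borrower (
    id INTEGER PRIMARY KEY,          -- SQLite counterpart of SERIAL
    name VARCHAR(50) NOT NULL,
    approved BOOLEAN DEFAULT NULL,
    place_id CHAR(5) REFERENCES place(zip_code)
);
""")

# The city/state pair is stored exactly once...
conn.execute("INSERT INTO place VALUES ('98101', 'Seattle', 'WA')")
# ...and every borrower in that zip code points at it (hypothetical names)
conn.executemany(
    "INSERT INTO borrower (name, place_id) VALUES (?, ?)",
    [("Acme Bakery", "98101"), ("Blue Bike Shop", "98101")],
)

# A join recovers the full location for each borrower
rows = conn.execute("""
    SELECT b.name, p.city, p.state
    FROM borrower b JOIN place p ON b.place_id = p.zip_code
    ORDER BY b.id
""").fetchall()
place_count = conn.execute("SELECT COUNT(*) FROM place").fetchone()[0]
print(rows, place_count)
```

Renaming a city now means updating one place row instead of every borrower, bank, and project row that mentions it.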

# Improving object-to-data mapping

The Small Business Development Center client table was originally defined without a point of contact for the client. The database team's first instinct was simply to add contact_name and contact_email columns to the client table. You object to this plan on data-organization grounds: in the future, a contact might be referenced from multiple tables. In this exercise, you will define table structures for the client and contact information that better separate the client and contact objects.

```
-- Create the contact table
CREATE TABLE contact (
  -- Define the id primary key column
  id SERIAL PRIMARY KEY,
  -- Define the name column
  name VARCHAR(50) NOT NULL,
  -- Define the email column
  email VARCHAR(50) NOT NULL
);

-- Add contact_id to the client table
-- (a NOT NULL column without a default can only be added while client has no rows)
ALTER TABLE client ADD contact_id INTEGER NOT NULL;

-- Add a FOREIGN KEY constraint to the client table
ALTER TABLE client ADD CONSTRAINT fk_c_id FOREIGN KEY (contact_id) REFERENCES contact(id);
```
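The benefit of a separate contact table can be sketched with Python's sqlite3 module standing in for PostgreSQL. SQLite cannot add a constraint through `ALTER TABLE ... ADD CONSTRAINT`, so the foreign key is declared inline here; the client table's `id` and `name` columns and all sample values are hypothetical, since the original client definition is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Foreign key declared inline because SQLite lacks ALTER TABLE ... ADD CONSTRAINT;
# client columns other than contact_id are hypothetical
conn.executescript("""
CREATE TABLE contact (
    id INTEGER PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    email VARCHAR(50) NOT NULL
);
CREATE TABLE client (
    id INTEGER PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    contact_id INTEGER NOT NULL REFERENCES contact(id)
);
""")

conn.execute("INSERT INTO contact (id, name, email) VALUES (1, 'Dana Lee', 'dana@example.com')")
# Two clients share one contact; the contact's details are stored only once
conn.executemany(
    "INSERT INTO client (name, contact_id) VALUES (?, ?)",
    [("Client A", 1), ("Client B", 1)],
)

# The foreign key rejects a client whose contact does not exist
try:
    conn.execute("INSERT INTO client (name, contact_id) VALUES ('Client C', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# One update to the contact is visible through every client that references it
conn.execute("UPDATE contact SET email = 'dana.lee@example.com' WHERE id = 1")
emails = conn.execute(
    "SELECT c.email FROM client cl JOIN contact c ON cl.contact_id = c.id"
).fetchall()
print(rejected, emails)
```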

# Simplifying database records

One teacher from the high school heard rumblings about efforts to better organize student records. He would like to organize student grades in his courses. The teacher proposes the following table structure for the test_grades table:
```
CREATE TABLE test_grades (
    student_id INTEGER NOT NULL,
    course_name VARCHAR(50) NOT NULL,
    grades TEXT NOT NULL
);
```
Each record identifies a student from one of the teacher's classes by the student's id, along with the course name and all of that student's test grades stored together in the grades text column. The teacher finds this structure difficult to manage: inserting a new grade means editing an existing text value with a complex query, and calculations on the grades require parsing that text first. Because grades packs multiple values into a single column, the table violates the atomic-value requirement of 1st Normal Form (1NF). In this exercise, you will help put this table in 1NF.

```
-- Create the test_grade table
CREATE TABLE test_grade (
    -- Include a column for the student id
    student_id INTEGER NOT NULL,

    -- Include a column for the course name
    course_name VARCHAR(50) NOT NULL,

    -- Add a column to capture a single test grade
    grade NUMERIC NOT NULL
);
```
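With one grade per row, adding a grade is a plain INSERT and averages are a single aggregate query. A minimal sketch, using Python's sqlite3 module as a stand-in for PostgreSQL and made-up grades:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE test_grade (
    student_id INTEGER NOT NULL,
    course_name VARCHAR(50) NOT NULL,
    grade NUMERIC NOT NULL
)
""")

# One row per test grade (sample values are made up)
conn.executemany(
    "INSERT INTO test_grade VALUES (?, ?, ?)",
    [(1, "Algebra", 90), (1, "Algebra", 84), (2, "Algebra", 70)],
)

# A new grade is just another row, not an edit to a text value
conn.execute("INSERT INTO test_grade VALUES (2, 'Algebra', 100)")

# Per-student averages come straight from AVG, with no string parsing
averages = conn.execute("""
    SELECT student_id, AVG(grade)
    FROM test_grade
    WHERE course_name = 'Algebra'
    GROUP BY student_id
    ORDER BY student_id
""").fetchall()
print(averages)
```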

# Too much normalization

Recall the definition of the loan table.
```
CREATE TABLE loan (
    borrower_id INTEGER REFERENCES borrower(id),
    bank_id INTEGER REFERENCES bank(id),
    approval_date DATE NOT NULL DEFAULT CURRENT_DATE,
    gross_approval DECIMAL(9, 2) NOT NULL,
    term_in_months SMALLINT NOT NULL,
    revolver_status BOOLEAN NOT NULL DEFAULT FALSE,
    initial_interest_rate DECIMAL(4, 2) NOT NULL
);
```
A new design for this table has been suggested to satisfy 1NF. The revised table definition replaces approval_date with approval_month, approval_day, and approval_year:
```
CREATE TABLE loan (
    ...
    approval_month SMALLINT,
    approval_day SMALLINT,
    approval_year SMALLINT,
    ...
);
```
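This exercise demonstrates how too much normalization can allow the insertion of invalid data: a DATE column rejects an impossible date such as February 30, but three independent SMALLINT columns accept any combination of numbers, including a month of 13.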

```
INSERT INTO loan (
  	borrower_id, bank_id, approval_month, approval_day,
  	approval_year, gross_approval, term_in_months,
  	revolver_status, initial_interest_rate
) VALUES (12, 14, 12, 1, 2013, 421115, 120, false, 4.42);

INSERT INTO loan (
  	borrower_id, bank_id, approval_month, approval_day,
  	approval_year, gross_approval, term_in_months,
  	revolver_status, initial_interest_rate
) VALUES (19, 5, 8, 19, 2018, 200000, 120, false, 6.3);
```
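The problem with the split columns can be sketched in Python. SQLite (used here as a stand-in) does not validate dates, so Python's `datetime.date` plays the role of PostgreSQL's DATE type, which rejects impossible dates. Only the three approval columns are modeled; the February 30 and month 13 rows are invented examples of bad input.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# Only the split date columns matter for this demonstration
conn.execute("""
CREATE TABLE loan (
    approval_month SMALLINT,
    approval_day SMALLINT,
    approval_year SMALLINT
)
""")

conn.execute("INSERT INTO loan VALUES (12, 1, 2013)")  # valid (from the example above)
# Nothing ties the three columns together, so impossible dates are accepted too
conn.execute("INSERT INTO loan VALUES (2, 30, 2019)")  # February 30th
conn.execute("INSERT INTO loan VALUES (13, 1, 2019)")  # month 13
stored = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]


def is_valid_date(year, month, day):
    """Return True if the three parts form a real calendar date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


# A single date value validates the combination as a whole
invalid_rows = [
    row
    for row in conn.execute(
        "SELECT approval_year, approval_month, approval_day FROM loan ORDER BY rowid"
    )
    if not is_valid_date(*row)
]
print(stored, invalid_rows)
```

All three rows are stored, yet two of them do not describe real dates; a single DATE column would have refused them at insert time.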

# Designing a course table

The school's administration decides to use its database to store course details. Because this is their first attempt at building the database, they are unsure which columns to include in the course table. Below is a list of possible columns, each with a description of its data type. In this exercise, you will choose the appropriate columns for this table from the list:

- id - a PRIMARY KEY for the course
- name - a variable length (max 50, not NULL) string for the course name
- meeting_time - a time representing the meeting time of the course
- student_name - a variable length (max 50, not NULL) string representing an enrolled student
- max_students - an integer for maximum student enrollment (classrooms can only fit 30 desks safely)

```
-- Create the course table
CREATE TABLE course (
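    -- Primary key for the course
    id SERIAL PRIMARY KEY,

    -- Course name
    name VARCHAR(50) NOT NULL,

    -- Maximum enrollment (classrooms can only fit 30 desks safely)
    max_students SMALLINT

    -- student_name is omitted: a course has many students, so storing it here
    -- would repeat the course data once per enrolled student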
);
```
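The 30-desk limit from the column list is not enforced by the table above. One option is a CHECK constraint; the sketch below assumes `CHECK (max_students <= 30)` as a hypothetical addition (not part of the original solution) and uses Python's sqlite3 module as a stand-in for PostgreSQL, which supports the same constraint syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is a hypothetical addition encoding the 30-desk limit
conn.execute("""
CREATE TABLE course (
    id INTEGER PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    max_students SMALLINT CHECK (max_students <= 30)
)
""")

conn.execute("INSERT INTO course (name, max_students) VALUES ('Biology', 28)")

# An enrollment cap above the classroom capacity is rejected
try:
    conn.execute("INSERT INTO course (name, max_students) VALUES ('Chemistry', 35)")
    over_limit_rejected = False
except sqlite3.IntegrityError:
    over_limit_rejected = True

course_names = [name for (name,) in conn.execute("SELECT name FROM course")]
print(over_limit_rejected, course_names)
```

Note that a NULL max_students still passes the check, because the comparison evaluates to unknown rather than false.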

# Streamlining meal options

The cafeteria staff hears about all of the great work happening at the high school to organize data for important aspects of school operations. This group now wants to join these efforts. In particular, the staff wants to keep track of the different meal options that are available throughout the school year. With the help of the IT staff, the following table is defined for this purpose:
```
CREATE TABLE meal (
    id INTEGER,
    name VARCHAR(50) NOT NULL,
    ingredients VARCHAR(150), -- comma-separated list
    avg_student_rating NUMERIC,
    date_served DATE,
    total_calories SMALLINT NOT NULL
);
```
Using your knowledge of database normalization, you will provide a better design for the meal table.

```
CREATE TABLE ingredient (
    -- Add PRIMARY KEY for table
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

CREATE TABLE meal (
    -- Make id a PRIMARY KEY
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL,

    -- The 2 columns that did not satisfy 2NF (ingredients and date_served)
    -- are removed; they now live in meal_ingredient and meal_date
    avg_student_rating NUMERIC,
    total_calories SMALLINT NOT NULL
);

CREATE TABLE meal_date (
    -- Define a column referencing the meal table
    meal_id INTEGER REFERENCES meal(id),
    date_served DATE NOT NULL
);

CREATE TABLE meal_ingredient (
    meal_id INTEGER REFERENCES meal(id),

    -- Define a column referencing the ingredient table
    ingredient_id INTEGER REFERENCES ingredient(id)
);
```
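With the junction table, questions such as "which meals contain cheese?" become joins instead of substring searches on a comma-separated list. A minimal sketch, using Python's sqlite3 module as a stand-in for PostgreSQL and made-up meal data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ingredient (id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL);
CREATE TABLE meal (
    id INTEGER PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    avg_student_rating NUMERIC,
    total_calories SMALLINT NOT NULL
);
CREATE TABLE meal_ingredient (
    meal_id INTEGER REFERENCES meal(id),
    ingredient_id INTEGER REFERENCES ingredient(id)
);
""")

# Sample data (made up): each ingredient name is stored once
conn.executemany("INSERT INTO ingredient VALUES (?, ?)",
                 [(1, "pasta"), (2, "tomato"), (3, "cheese")])
conn.execute("INSERT INTO meal VALUES (1, 'Spaghetti', 4.2, 650)")
conn.executemany("INSERT INTO meal_ingredient VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3)])

# Which meals use cheese? A join replaces searching inside a text column
cheese_meals = conn.execute("""
    SELECT m.name
    FROM meal m
    JOIN meal_ingredient mi ON mi.meal_id = m.id
    JOIN ingredient i ON i.id = mi.ingredient_id
    WHERE i.name = 'cheese'
""").fetchall()

# A display string can still be rebuilt when needed
listing = ", ".join(name for (name,) in conn.execute("""
    SELECT i.name FROM meal_ingredient mi
    JOIN ingredient i ON i.id = mi.ingredient_id
    WHERE mi.meal_id = 1
    ORDER BY i.id
"""))
print(cheese_meals, listing)
```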

# Identifying transitive dependencies

Imagine that a nationwide database of schools exists. Someone who is unfamiliar with database normalization proposes the following structure for the school table:
```
CREATE TABLE school (
    id serial PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    street_address VARCHAR(100) NOT NULL,
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50) NOT NULL,
    zip_code INTEGER NOT NULL
);
```
Identify the transitive dependency introduced by this table definition.

- `zip_code` determines `city` and `state`.
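The transitive chain is id → zip_code → city, state. A functional dependency X → Y holds when any two rows that agree on X also agree on Y. The hypothetical helper below, `fd_holds`, checks that rule over sample rows; note that sample data can only disprove a dependency, never prove it for all future data. The school rows are invented for illustration.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows.

    rows is a list of dicts; determinant and dependent are tuples of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # Two rows with the same key but different values break the dependency
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows shaped like the school table (street_address omitted)
schools = [
    {"id": 1, "name": "Lincoln High", "city": "Springfield", "state": "IL", "zip_code": 62701},
    {"id": 2, "name": "Washington Middle", "city": "Springfield", "state": "IL", "zip_code": 62701},
    {"id": 3, "name": "Travis High", "city": "Austin", "state": "TX", "zip_code": 78701},
    {"id": 4, "name": "Lamar Middle", "city": "Austin", "state": "TX", "zip_code": 78702},
]

print(fd_holds(schools, ("zip_code",), ("city", "state")))   # zip_code -> city, state
print(fd_holds(schools, ("city", "state"), ("zip_code",)))   # one city can have many zips
```

The first check agrees with the answer above; the second shows the dependency only runs one way, which is why zip_code (not city and state) is the natural key for a separate table.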

# Table definitions for 3rd Normal Form

Recall the definition of the school table from the previous exercise:
```
CREATE TABLE school (
    id serial PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    street_address VARCHAR(100) NOT NULL,
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50) NOT NULL,
    zip_code INTEGER NOT NULL
);
```
We can define a new table called zip to help satisfy 3rd Normal Form.

```
-- Complete the definition of the table for zip codes
CREATE TABLE zip (
    -- INTEGER drops leading zeros (e.g. 02134); CHAR(5) would preserve them
    code INTEGER PRIMARY KEY,
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50) NOT NULL
);

-- Complete the definition of the "zip_code" column
CREATE TABLE school (
    id serial PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    street_address VARCHAR(100) NOT NULL,
    zip_code INTEGER REFERENCES zip(code)
);
```
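After the split, city and state are no longer stored on the school row but are recovered through the zip_code foreign key. A minimal sketch, using Python's sqlite3 module as a stand-in for PostgreSQL (`INTEGER PRIMARY KEY` replaces `serial`) with made-up schools:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE zip (
    code INTEGER PRIMARY KEY,
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50) NOT NULL
);
CREATE TABLE school (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    street_address VARCHAR(100) NOT NULL,
    zip_code INTEGER REFERENCES zip(code)
);
""")

# City and state are stored once per zip code (sample data)
conn.execute("INSERT INTO zip VALUES (62701, 'Springfield', 'Illinois')")
conn.executemany(
    "INSERT INTO school (name, street_address, zip_code) VALUES (?, ?, ?)",
    [("Lincoln High", "1 Main St", 62701), ("Washington Middle", "2 Oak Ave", 62701)],
)

# The school table itself no longer carries city or state
school_columns = [col[1] for col in conn.execute("PRAGMA table_info(school)")]

# A join restores the full address when it is needed
rows = conn.execute("""
    SELECT s.name, z.city, z.state
    FROM school s JOIN zip z ON s.zip_code = z.code
    ORDER BY s.id
""").fetchall()
print(school_columns, rows)
```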

# Working through the normalization process

Normalizing tables before creating a new database is an important step: it reduces data redundancy and helps ensure that the integrity of your data is properly managed.

You will normalize database tables related to the Small Business Administration loan program:

- a borrower table will be altered to satisfy the requirements for 1st Normal Form (1NF)
- a bank and a loan table will be altered to satisfy the requirements for 2nd Normal Form (2NF)
- the loan table will be altered again to satisfy the requirements for 3rd Normal Form (3NF)

The borrower table is not in 1NF, because full_name packs two values (a first name and a last name) into a single column:
```
CREATE TABLE borrower (
    id serial PRIMARY KEY,
    full_name VARCHAR (100) NOT NULL
);
```

Resolving 1NF:

```
-- Add new columns to the borrower table
ALTER TABLE borrower
ADD COLUMN first_name VARCHAR (50) NOT NULL,
ADD COLUMN last_name VARCHAR (50) NOT NULL;

-- Remove column from borrower table to satisfy 1NF
ALTER TABLE borrower
DROP COLUMN full_name;
```
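The ALTER statements above work as written on an empty table. On a table that already holds rows, PostgreSQL refuses to add a NOT NULL column without a default, and the existing names must be copied into the new columns before full_name is dropped. The sketch below shows that data step with Python's sqlite3 module as a stand-in; the helper `split_full_name` is hypothetical and naively treats the first word as the first name, which many real names do not follow. SQLite requires a non-NULL default when adding a NOT NULL column, hence `DEFAULT ''`.

```python
import sqlite3


def split_full_name(full_name):
    """Naively split 'First Rest Of Name' into (first_name, last_name)."""
    parts = full_name.strip().split(maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"cannot split {full_name!r} into first and last name")
    return parts[0], parts[1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrower (id INTEGER PRIMARY KEY, full_name VARCHAR(100) NOT NULL)")
conn.executemany("INSERT INTO borrower (full_name) VALUES (?)",
                 [("Maria Garcia",), ("Ana de la Cruz",)])  # sample rows

# SQLite needs a non-NULL default to add a NOT NULL column to a populated table
conn.execute("ALTER TABLE borrower ADD COLUMN first_name VARCHAR(50) NOT NULL DEFAULT ''")
conn.execute("ALTER TABLE borrower ADD COLUMN last_name VARCHAR(50) NOT NULL DEFAULT ''")

# Populate the new atomic columns from the old composite one
for borrower_id, full_name in conn.execute("SELECT id, full_name FROM borrower").fetchall():
    first, last = split_full_name(full_name)
    conn.execute("UPDATE borrower SET first_name = ?, last_name = ? WHERE id = ?",
                 (first, last, borrower_id))

names = conn.execute("SELECT first_name, last_name FROM borrower ORDER BY id").fetchall()
print(names)
```

Once the new columns are filled, full_name can be dropped as in the block above.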

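The loan table contains a `bank_zip` column. A bank's zip code depends only on the bank (bank_id), not on the loan as a whole, so storing it in loan violates 2NF. The bank table is defined below: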

```
CREATE TABLE bank (
    id serial PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
```

Resolving 2NF:

```
-- Add a new column named 'zip' to the 'bank' table 
ALTER TABLE bank
ADD COLUMN zip VARCHAR(10) NOT NULL;

-- Remove corresponding column from 'loan' to satisfy 2NF
ALTER TABLE loan
DROP COLUMN bank_zip;
```
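The cost of the partial dependency shows up as an update anomaly: when a bank's zip code lives on every loan row, a change must touch all of those rows, and missing one leaves conflicting values. A minimal sketch, using Python's sqlite3 module as a stand-in for PostgreSQL; `loan_before` is a hypothetical name for the old design, and all values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before: bank_zip repeated on every loan row (columns trimmed for brevity)
CREATE TABLE loan_before (borrower_id INTEGER, bank_id INTEGER, bank_zip VARCHAR(10));
-- After: the zip code is stored once per bank
CREATE TABLE bank (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, zip VARCHAR(10) NOT NULL);
CREATE TABLE loan (borrower_id INTEGER, bank_id INTEGER REFERENCES bank(id));
""")

loans = [(1, 7), (2, 7), (3, 7)]  # three hypothetical loans from bank 7
conn.executemany("INSERT INTO loan_before VALUES (?, ?, '10001')", loans)
conn.execute("INSERT INTO bank VALUES (7, 'Example Bank', '10001')")
conn.executemany("INSERT INTO loan VALUES (?, ?)", loans)

# Bank 7 moves. In the old design every loan row must change; updating only
# one row (simulating a mistake) leaves the data inconsistent.
conn.execute("UPDATE loan_before SET bank_zip = '10002' WHERE borrower_id = 1")
zips_before = {z for (z,) in conn.execute(
    "SELECT bank_zip FROM loan_before WHERE bank_id = 7")}

# In the 2NF design the change is a single-row update
conn.execute("UPDATE bank SET zip = '10002' WHERE id = 7")
zips_after = {z for (z,) in conn.execute(
    "SELECT b.zip FROM loan l JOIN bank b ON l.bank_id = b.id WHERE b.id = 7")}
print(zips_before, zips_after)
```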

Let's also track each loan's program type. Create a new table named program that stores program records with id, description, and max_amount columns.

### Resolving 3NF: Creating a New Table

```
-- Define 'program' table with max amount for each program
CREATE TABLE program (
  	id serial PRIMARY KEY,
  	description text NOT NULL,
  	max_amount DECIMAL(9,2) NOT NULL
);
```

### Resolving 3NF: Removing Transitive Dependency from a Table

A loan's max_amount depends only on the loan's program, not on the loan itself, which makes it a transitive dependency (loan → program → max_amount). Adding a program_id foreign key that references the program table lets max_amount be looked up there, so both the program and max_amount columns can be dropped from loan. Alter loan to satisfy 3NF.

```
-- Alter the 'loan' table to satisfy 3NF
ALTER TABLE loan
ADD COLUMN program_id INTEGER REFERENCES program (id), 
DROP COLUMN program,
DROP COLUMN max_amount;
```
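After the change, a loan's limit is read from its program through the foreign key, so rules such as "the approval must not exceed the program maximum" become a join. A minimal sketch, using Python's sqlite3 module as a stand-in for PostgreSQL; the program descriptions and max_amount values are hypothetical and do not reflect real SBA limits, and loan is trimmed to the relevant columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE program (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    max_amount DECIMAL(9, 2) NOT NULL
);
-- loan trimmed to the columns relevant here
CREATE TABLE loan (
    id INTEGER PRIMARY KEY,
    gross_approval DECIMAL(9, 2) NOT NULL,
    program_id INTEGER REFERENCES program(id)
);
""")

# Hypothetical programs and limits
conn.executemany("INSERT INTO program VALUES (?, ?, ?)",
                 [(1, "Hypothetical program A", 350000),
                  (2, "Hypothetical program B", 5000000)])
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [(1, 421115, 2), (2, 200000, 1), (3, 400000, 1)])

# max_amount is stored once per program and reached through program_id
over_limit = conn.execute("""
    SELECT l.id
    FROM loan l JOIN program p ON l.program_id = p.id
    WHERE l.gross_approval > p.max_amount
    ORDER BY l.id
""").fetchall()
print(over_limit)
```

Changing a program's limit is now one update to the program row, and every loan in that program sees the new value.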