# Database Normalization

Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves dividing a database into smaller tables and defining relationships between them.

In [1]:
# Import SQLite library
import sqlite3

# Create an in-memory SQLite database
connection = sqlite3.connect(':memory:')
cursor = connection.cursor()

## Why is Normalization Important?

- **Reduces Redundancy**: Eliminates duplicate data, saving storage space.
- **Can Improve Query Performance**: Smaller, narrower tables are cheaper to scan and update, although reads that span several tables need joins.
- **Minimizes Update Anomalies**: Ensures data consistency during updates.
- **Enhances Data Integrity**: Maintains accurate and consistent data.

## Levels of Normalization

### First Normal Form (1NF)
- Ensures each column contains atomic values.
- Each column must have a unique name.
- The order of data does not matter.

### Second Normal Form (2NF)
- Builds on 1NF.
- Eliminates partial dependencies.
- Non-key attributes depend on the whole primary key, not on just part of a composite key.

### Third Normal Form (3NF)
- Builds on 2NF.
- Removes transitive dependencies.
- Non-key attributes depend directly on the primary key, not on other non-key attributes.

## First Normal Form (1NF)

A table is in 1NF if it contains only atomic (indivisible) values and each column contains values of a single type.

### Table Violating 1NF
| StudentID | Name  | Subjects            |
|-----------|-------|---------------------|
| 1         | Alice | Math, Science       |
| 2         | Bob   | History, Literature |

**Why it violates 1NF:** The "Subjects" column contains non-atomic values (multiple subjects in a single cell).

In [2]:
# Create a table that violates 1NF
cursor.execute('''
CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,
    Name TEXT,
    Subjects TEXT
)''')

# Insert data with non-atomic values
cursor.executemany('INSERT INTO Students (StudentID, Name, Subjects) VALUES (?, ?, ?)', [
    (1, 'Alice', 'Math, Science'),
    (2, 'Bob', 'History, Literature')
])

# Query the table
cursor.execute('SELECT * FROM Students')
for row in cursor.fetchall():
    print(row)

(1, 'Alice', 'Math, Science')
(2, 'Bob', 'History, Literature')
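
One practical symptom of the non-atomic column is that filtering by a single subject needs string matching. The sketch below is an illustration only: it uses its own in-memory database so it does not touch the tables above, and `split_subjects` is a hypothetical helper introduced here to show the split that 1NF asks for.

```python
import sqlite3

# Separate in-memory database so this sketch does not touch the notebook's tables.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Subjects TEXT)')
cur.executemany('INSERT INTO Students VALUES (?, ?, ?)', [
    (1, 'Alice', 'Math, Science'),
    (2, 'Bob', 'History, Literature'),
])

# With a non-atomic column, finding who takes Math relies on substring matching,
# which would also match an unrelated subject such as 'Mathematical Logic'.
cur.execute("SELECT Name FROM Students WHERE Subjects LIKE '%Math%'")
math_students = [name for (name,) in cur.fetchall()]
print(math_students)

def split_subjects(rows):
    """Hypothetical helper: turn (id, name, 'a, b') rows into one row per subject."""
    return [
        (student_id, name, subject.strip())
        for student_id, name, subjects in rows
        for subject in subjects.split(',')
    ]

cur.execute('SELECT StudentID, Name, Subjects FROM Students ORDER BY StudentID')
atomic_rows = split_subjects(cur.fetchall())
print(atomic_rows)
```

Once every row holds a single subject, the same question becomes an exact match (`WHERE Subject = 'Math'`) instead of a pattern match.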


### Converting to 1NF

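To achieve 1NF, we split the Subjects column into separate rows, one subject per row, and use (StudentID, Subject) as the composite primary key. Name now repeats for every subject a student takes; that redundancy is a partial dependency, which 2NF addresses.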

In [3]:
# Create a table in 1NF
cursor.execute('''
CREATE TABLE StudentSubjects (
    StudentID INTEGER,
    Name TEXT,
    Subject TEXT,
    PRIMARY KEY (StudentID, Subject)
)''')

# Insert data in 1NF
cursor.executemany('INSERT INTO StudentSubjects (StudentID, Name, Subject) VALUES (?, ?, ?)', [
    (1, 'Alice', 'Math'),
    (1, 'Alice', 'Science'),
    (2, 'Bob', 'History'),
    (2, 'Bob', 'Literature')
])

# Query the 1NF table
cursor.execute('SELECT * FROM StudentSubjects')
for row in cursor.fetchall():
    print(row)

(1, 'Alice', 'Math')
(1, 'Alice', 'Science')
(2, 'Bob', 'History')
(2, 'Bob', 'Literature')


## Second Normal Form (2NF)

A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

### Table Violating 2NF
| OrderID | ProductID | ProductName |
|---------|-----------|-------------|
| 1       | 101       | Laptop      |
| 1       | 102       | Mouse       |
| 2       | 101       | Laptop      |

**Why it violates 2NF:** The "ProductName" column depends only on "ProductID" and not on the composite primary key (OrderID, ProductID).
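
A functional dependency can be tested against sample rows with a few lines of Python. The helper below, `fd_holds`, is a hypothetical name introduced for this sketch and is not part of `sqlite3`. Keep in mind that sample data can only refute a dependency; whether it truly holds is a matter of the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the columns in `lhs` determine the columns in `rhs`
    for every row in `rows` (a list of dicts). Hypothetical helper."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it,
        # so a later row with a different value exposes a violation.
        if seen.setdefault(key, value) != value:
            return False
    return True

orders = [
    {'OrderID': 1, 'ProductID': 101, 'ProductName': 'Laptop'},
    {'OrderID': 1, 'ProductID': 102, 'ProductName': 'Mouse'},
    {'OrderID': 2, 'ProductID': 101, 'ProductName': 'Laptop'},
]

# ProductName is determined by ProductID alone: a partial dependency on the key.
partial = fd_holds(orders, ['ProductID'], ['ProductName'])
# OrderID alone does not determine ProductName (order 1 has two products).
by_order = fd_holds(orders, ['OrderID'], ['ProductName'])
print(partial, by_order)  # True False
```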

In [4]:
# Create a table that violates 2NF
cursor.execute('''
CREATE TABLE Orders (
    OrderID INTEGER,
    ProductID INTEGER,
    ProductName TEXT,
    PRIMARY KEY (OrderID, ProductID)
)''')

# Insert data
cursor.executemany('INSERT INTO Orders (OrderID, ProductID, ProductName) VALUES (?, ?, ?)', [
    (1, 101, 'Laptop'),
    (1, 102, 'Mouse'),
    (2, 101, 'Laptop')
])

# Query the table
cursor.execute('SELECT * FROM Orders')
for row in cursor.fetchall():
    print(row)

(1, 101, 'Laptop')
(1, 102, 'Mouse')
(2, 101, 'Laptop')
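
The cost of this partial dependency shows up as an update anomaly. The following sketch rebuilds the same table in a separate in-memory database (so the notebook's `Orders` table stays untouched) and renames product 101 in only one of its rows; the new name 'Notebook' is made up for the illustration.

```python
import sqlite3

# Separate in-memory database so this sketch does not modify the notebook's tables.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''
CREATE TABLE Orders (
    OrderID INTEGER,
    ProductID INTEGER,
    ProductName TEXT,
    PRIMARY KEY (OrderID, ProductID)
)''')
cur.executemany('INSERT INTO Orders VALUES (?, ?, ?)', [
    (1, 101, 'Laptop'),
    (1, 102, 'Mouse'),
    (2, 101, 'Laptop'),
])

# Rename product 101, but only in the row that belongs to order 1.
cur.execute("UPDATE Orders SET ProductName = 'Notebook' WHERE OrderID = 1 AND ProductID = 101")

# The same ProductID now carries two different names.
cur.execute('SELECT DISTINCT ProductName FROM Orders WHERE ProductID = 101 ORDER BY ProductName')
names_for_101 = [name for (name,) in cur.fetchall()]
print(names_for_101)  # ['Laptop', 'Notebook']
```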


### Converting to 2NF

To achieve 2NF, we move ProductName into a new Products table keyed by ProductID, and keep only the (OrderID, ProductID) pairs in OrderDetails.

In [5]:
# Create tables in 2NF
cursor.execute('''
CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    ProductName TEXT
)''')

cursor.execute('''
CREATE TABLE OrderDetails (
    OrderID INTEGER,
    ProductID INTEGER,
    PRIMARY KEY (OrderID, ProductID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
)''')

# Insert data into 2NF tables
cursor.executemany('INSERT INTO Products (ProductID, ProductName) VALUES (?, ?)', [
    (101, 'Laptop'),
    (102, 'Mouse')
])

cursor.executemany('INSERT INTO OrderDetails (OrderID, ProductID) VALUES (?, ?)', [
    (1, 101),
    (1, 102),
    (2, 101)
])

# Query the 2NF tables
print('Products Table:')
cursor.execute('SELECT * FROM Products')
for row in cursor.fetchall():
    print(row)

print('\nOrderDetails Table:')
cursor.execute('SELECT * FROM OrderDetails')
for row in cursor.fetchall():
    print(row)

Products Table:
(101, 'Laptop')
(102, 'Mouse')

OrderDetails Table:
(1, 101)
(1, 102)
(2, 101)
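
Splitting the table loses no information, because a join on ProductID rebuilds the original rows. The sketch below checks this in a separate in-memory database; the aliases `od` and `p` and the variable names are choices made for this illustration.

```python
import sqlite3

# Separate in-memory database holding the two 2NF tables.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT)')
cur.execute('''
CREATE TABLE OrderDetails (
    OrderID INTEGER,
    ProductID INTEGER,
    PRIMARY KEY (OrderID, ProductID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
)''')
cur.executemany('INSERT INTO Products VALUES (?, ?)', [(101, 'Laptop'), (102, 'Mouse')])
cur.executemany('INSERT INTO OrderDetails VALUES (?, ?)', [(1, 101), (1, 102), (2, 101)])

# Join the two tables back together on the shared ProductID column.
cur.execute('''
SELECT od.OrderID, od.ProductID, p.ProductName
FROM OrderDetails AS od
JOIN Products AS p ON p.ProductID = od.ProductID
ORDER BY od.OrderID, od.ProductID''')
rebuilt = cur.fetchall()

original = [(1, 101, 'Laptop'), (1, 102, 'Mouse'), (2, 101, 'Laptop')]
print(rebuilt == original)  # True
```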


## Third Normal Form (3NF)

A table is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key.

That is, no non-key attribute depends on another non-key attribute.

### Table Violating 3NF
| EmployeeID | DepartmentID | DepartmentName |
|------------|--------------|----------------|
| 1          | 10           | HR             |
| 2          | 20           | Finance        |
| 3          | 10           | HR             |

**Why it violates 3NF:** The "DepartmentName" column is transitively dependent on "EmployeeID" through "DepartmentID": EmployeeID determines DepartmentID, and DepartmentID determines DepartmentName.

In [6]:
# Create a table that violates 3NF
cursor.execute('''
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    DepartmentID INTEGER,
    DepartmentName TEXT
)''')

# Insert data
cursor.executemany('INSERT INTO Employees (EmployeeID, DepartmentID, DepartmentName) VALUES (?, ?, ?)', [
    (1, 10, 'HR'),
    (2, 20, 'Finance'),
    (3, 10, 'HR')
])

# Query the table
cursor.execute('SELECT * FROM Employees')
for row in cursor.fetchall():
    print(row)

(1, 10, 'HR')
(2, 20, 'Finance')
(3, 10, 'HR')
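
A transitive dependency also causes a deletion anomaly: removing the last employee of a department removes the department's name with it. The sketch below shows this on a copy of the table in a separate in-memory database, so the notebook's `Employees` table is not modified.

```python
import sqlite3

# Separate in-memory database so this sketch does not modify the notebook's tables.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    DepartmentID INTEGER,
    DepartmentName TEXT
)''')
cur.executemany('INSERT INTO Employees VALUES (?, ?, ?)', [
    (1, 10, 'HR'),
    (2, 20, 'Finance'),
    (3, 10, 'HR'),
])

# Employee 2 is the only member of department 20.
cur.execute('DELETE FROM Employees WHERE EmployeeID = 2')

# No row mentions department 20 any more, so the fact that it is called
# 'Finance' has been lost along with the employee.
cur.execute('SELECT COUNT(*) FROM Employees WHERE DepartmentID = 20')
rows_for_dept_20 = cur.fetchone()[0]
print(rows_for_dept_20)  # 0
```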


### Converting to 3NF

To achieve 3NF, we move DepartmentName into a new Departments table keyed by DepartmentID, and keep DepartmentID in the employee table as a foreign key.

The new tables are in 3NF as they do not contain any transitive dependencies.

In [7]:
# Create tables in 3NF
cursor.execute('''
CREATE TABLE Departments (
    DepartmentID INTEGER PRIMARY KEY,
    DepartmentName TEXT
)''')

cursor.execute('''
CREATE TABLE EmployeeDetails (
    EmployeeID INTEGER PRIMARY KEY,
    DepartmentID INTEGER,
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
)''')

# Insert data into 3NF tables
cursor.executemany('INSERT INTO Departments (DepartmentID, DepartmentName) VALUES (?, ?)', [
    (10, 'HR'),
    (20, 'Finance')
])

cursor.executemany('INSERT INTO EmployeeDetails (EmployeeID, DepartmentID) VALUES (?, ?)', [
    (1, 10),
    (2, 20),
    (3, 10)
])

# Query the 3NF tables
print('Departments Table:')
cursor.execute('SELECT * FROM Departments')
for row in cursor.fetchall():
    print(row)

print('\nEmployeeDetails Table:')
cursor.execute('SELECT * FROM EmployeeDetails')
for row in cursor.fetchall():
    print(row)

Departments Table:
(10, 'HR')
(20, 'Finance')

EmployeeDetails Table:
(1, 10)
(2, 20)
(3, 10)
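
Note that SQLite parses `FOREIGN KEY` clauses but does not enforce them unless enforcement is switched on for the connection with `PRAGMA foreign_keys = ON`. The sketch below, which assumes a SQLite build with foreign key support compiled in (the usual default), turns enforcement on in a separate in-memory database and tries to insert an employee into a department that does not exist; department 99 is made up for the illustration.

```python
import sqlite3

# Separate in-memory database with foreign key enforcement enabled.
conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')
cur = conn.cursor()
cur.execute('CREATE TABLE Departments (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT)')
cur.execute('''
CREATE TABLE EmployeeDetails (
    EmployeeID INTEGER PRIMARY KEY,
    DepartmentID INTEGER,
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
)''')
cur.execute("INSERT INTO Departments VALUES (10, 'HR')")
cur.execute('INSERT INTO EmployeeDetails VALUES (1, 10)')  # valid reference

try:
    # Department 99 does not exist, so this row should be refused.
    cur.execute('INSERT INTO EmployeeDetails VALUES (2, 99)')
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print('Rejected:', exc)

cur.execute('SELECT COUNT(*) FROM EmployeeDetails')
employee_count = cur.fetchone()[0]
```

Without the pragma, the second insert would succeed and leave an employee pointing at a missing department.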


## Benefits of Normalization

- **Data Integrity**: Ensures data remains accurate and consistent.
- **Storage Optimization**: Reduces unnecessary storage usage.
- **Simplified Maintenance**: Makes it easier to update and manage data.

## Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of the Third Normal Form (3NF). A table is in BCNF if it is in 3NF and, for every non-trivial functional dependency, the determinant is a candidate key (or, more generally, a superkey).

### Table Violating BCNF
| CourseID | Instructor  | Classroom |
|----------|-------------|-----------|
| 1        | Dr. Smith   | Room 101  |
| 2        | Dr. Smith   | Room 102  |
| 3        | Dr. Johnson | Room 101  |

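**Why it violates BCNF:** Assume the rule that each instructor always teaches in the same classroom, so "Instructor" determines "Classroom". "Instructor" is not a candidate key (the key is "CourseID"), so this dependency violates BCNF. The sample rows do not strictly satisfy that rule, because Dr. Smith appears with both Room 101 and Room 102; read them as illustrating the table layout rather than the dependency itself.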

### Example: A Table Not in BCNF

Consider a table that stores information about courses, instructors, and classrooms.

In [8]:
# Create a table that is not in BCNF
cursor.execute('''
CREATE TABLE Courses (
    CourseID INTEGER PRIMARY KEY,
    Instructor TEXT,
    Classroom TEXT
)''')

# Insert data
cursor.executemany('INSERT INTO Courses (CourseID, Instructor, Classroom) VALUES (?, ?, ?)', [
    (1, 'Dr. Smith', 'Room 101'),
    (2, 'Dr. Smith', 'Room 102'),
    (3, 'Dr. Johnson', 'Room 101')
])

# Query the table
cursor.execute('SELECT * FROM Courses')
for row in cursor.fetchall():
    print(row)

(1, 'Dr. Smith', 'Room 101')
(2, 'Dr. Smith', 'Room 102')
(3, 'Dr. Johnson', 'Room 101')


### Decomposing into BCNF

To convert the table into BCNF, we decompose it into two tables: one for instructors and classrooms, and another for courses and instructors.

In [9]:
# Create tables in BCNF
cursor.execute('''
CREATE TABLE Instructors (
    Instructor TEXT,
    Classroom TEXT,
    PRIMARY KEY (Instructor, Classroom)
)''')

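# CourseID alone identifies an assignment, so it is the primary key.
# Caveat: Instructors(Instructor) is not unique on its own, so SQLite would
# report a foreign key mismatch on inserts if enforcement were enabled
# with PRAGMA foreign_keys = ON (it is off by default).
cursor.execute('''
CREATE TABLE CourseAssignments (
    CourseID INTEGER PRIMARY KEY,
    Instructor TEXT,
    FOREIGN KEY (Instructor) REFERENCES Instructors(Instructor)
)''')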

# Insert data into BCNF tables
cursor.executemany('INSERT INTO Instructors (Instructor, Classroom) VALUES (?, ?)', [
    ('Dr. Smith', 'Room 101'),
    ('Dr. Smith', 'Room 102'),
    ('Dr. Johnson', 'Room 101')
])

cursor.executemany('INSERT INTO CourseAssignments (CourseID, Instructor) VALUES (?, ?)', [
    (1, 'Dr. Smith'),
    (2, 'Dr. Smith'),
    (3, 'Dr. Johnson')
])

# Query the BCNF tables
print('Instructors Table:')
cursor.execute('SELECT * FROM Instructors')
for row in cursor.fetchall():
    print(row)

print('\nCourseAssignments Table:')
cursor.execute('SELECT * FROM CourseAssignments')
for row in cursor.fetchall():
    print(row)

Instructors Table:
('Dr. Smith', 'Room 101')
('Dr. Smith', 'Room 102')
('Dr. Johnson', 'Room 101')

CourseAssignments Table:
(1, 'Dr. Smith')
(2, 'Dr. Smith')
(3, 'Dr. Johnson')
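
A decomposition is only safe if joining the pieces gives back exactly the original rows, which here requires "Instructor" to determine "Classroom". As a check (not part of the original walkthrough), the sketch below rejoins the two tables in a separate in-memory database. With the sample data, where Dr. Smith has two classrooms, the join produces rows that were never in the Courses table.

```python
import sqlite3

# Separate in-memory database holding copies of the two decomposed tables.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Instructors (Instructor TEXT, Classroom TEXT, PRIMARY KEY (Instructor, Classroom))')
cur.execute('CREATE TABLE CourseAssignments (CourseID INTEGER PRIMARY KEY, Instructor TEXT)')
cur.executemany('INSERT INTO Instructors VALUES (?, ?)', [
    ('Dr. Smith', 'Room 101'),
    ('Dr. Smith', 'Room 102'),
    ('Dr. Johnson', 'Room 101'),
])
cur.executemany('INSERT INTO CourseAssignments VALUES (?, ?)', [
    (1, 'Dr. Smith'),
    (2, 'Dr. Smith'),
    (3, 'Dr. Johnson'),
])

# Rejoin the pieces on the shared Instructor column.
cur.execute('''
SELECT ca.CourseID, ca.Instructor, i.Classroom
FROM CourseAssignments AS ca
JOIN Instructors AS i ON i.Instructor = ca.Instructor
ORDER BY ca.CourseID, i.Classroom''')
rejoined = cur.fetchall()

original = [
    (1, 'Dr. Smith', 'Room 101'),
    (2, 'Dr. Smith', 'Room 102'),
    (3, 'Dr. Johnson', 'Room 101'),
]
# Rows produced by the join that were never in the original table.
spurious = [row for row in rejoined if row not in original]
print(len(rejoined), spurious)
```

The two spurious rows show that, for this particular data, the split no longer records which classroom each course uses. If every instructor had exactly one classroom, the join would return the original three rows.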
