# **`Data Science Learners Hub`**

**Module : SQL**

**email** : [datasciencelearnershub@gmail.com](mailto:datasciencelearnershub@gmail.com)

### **`Normalisation in SQL`**


**1. Introduction:**

Normalization is a database design process, used in MS SQL Server as in any relational database, that organizes and structures data to reduce redundancy and improve data integrity. The goal is to eliminate insertion, update, and deletion anomalies and maintain a consistent, efficient, and reliable database.

**2. Why Learn This Topic:**

Learning normalization is crucial for designing efficient databases. Normalized databases reduce redundancy, improve data integrity, and facilitate easier maintenance. It ensures that data is stored in a way that minimizes the risk of inconsistencies and allows for more flexible querying.

**3. Real-world Applications:**

Consider a scenario of managing a library database. Without normalization, each book entry might include information about the author, the publisher, and the genre. Normalizing the database would involve creating separate tables for authors, publishers, and genres, linked by keys. This ensures that if an author's details change, you only need to update one record in the Authors table, avoiding redundant updates across multiple book entries.
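The library scenario can be sketched in a few lines. This is a minimal illustration, not a full design: it uses Python's built-in `sqlite3` module as a stand-in for SQL Server, covers only authors and books, and the table names, column names, and sample values are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Authors (
    AuthorID   INTEGER PRIMARY KEY,
    AuthorName TEXT NOT NULL
);
CREATE TABLE Books (
    BookID   INTEGER PRIMARY KEY,
    Title    TEXT NOT NULL,
    AuthorID INTEGER NOT NULL REFERENCES Authors(AuthorID)
);
INSERT INTO Authors VALUES (1, 'J. Smith');
INSERT INTO Books VALUES (10, 'Book One', 1), (11, 'Book Two', 1);
""")

# The author's name is stored once, so a single UPDATE corrects every book.
conn.execute("UPDATE Authors SET AuthorName = 'Jane Smith' WHERE AuthorID = 1")

library_rows = conn.execute("""
    SELECT b.Title, a.AuthorName
    FROM Books AS b
    JOIN Authors AS a ON a.AuthorID = b.AuthorID
    ORDER BY b.BookID
""").fetchall()
print(library_rows)
```

Both books pick up the new name through the join, even though only one row was updated.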



**4. Considerations:**

- Normalization is an iterative process. Not all databases need to be normalized to the same level.
- Over-normalization can lead to increased complexity and potentially slower query performance.

**5. Common Mistakes:**

- Failing to identify and eliminate data redundancy.
- Neglecting to address functional dependencies between attributes.
- Normalizing unnecessarily, leading to overcomplicated database structures.


**6. Key Terms:**

a. `Transitive Dependencies`:
- A transitive dependency occurs when a non-key attribute depends on the primary key only indirectly, through another non-key attribute (for example, the key determines AdvisorID, and AdvisorID in turn determines AdvisorName).

b. `Functional Dependencies`:
- Functional dependency describes the relationship between attributes in a table, where the value of one attribute uniquely determines the value of another.

c. `Candidate Keys`:
- A candidate key is a minimal set of one or more columns whose values uniquely identify each row in a table. Each candidate key is a potential choice for the primary key.

d. `Multi-valued Dependencies`:
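- A multi-valued dependency occurs when one attribute determines a set of values of another attribute, independently of the other attributes in the table (for example, a student's set of hobbies does not depend on the courses that student takes).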
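
To make the idea of a functional dependency concrete, here is a small Python sketch that tests whether a dependency holds in a set of sample rows. The helper name `holds_fd` is hypothetical, and the rows are the student/course sample used in the hands-on section of this module. Note that sample data can only disprove a dependency; whether it holds in general is a business rule, not something the rows can prove.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in the given rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The same left-hand value must always map to the same right-hand value.
        if seen.setdefault(key, value) != value:
            return False
    return True


grades = [
    {"StudentID": 1, "StudentName": "Alice", "Course": "Math", "Grade": "A"},
    {"StudentID": 1, "StudentName": "Alice", "Course": "History", "Grade": "B"},
    {"StudentID": 2, "StudentName": "Bob", "Course": "Math", "Grade": "C"},
    {"StudentID": 2, "StudentName": "Bob", "Course": "Physics", "Grade": "A"},
    {"StudentID": 3, "StudentName": "Charlie", "Course": "Chemistry", "Grade": "B"},
]

print(holds_fd(grades, ["StudentID"], ["StudentName"]))      # True
print(holds_fd(grades, ["StudentID"], ["Grade"]))            # False: student 1 has A and B
print(holds_fd(grades, ["StudentID", "Course"], ["Grade"]))  # True
```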




### **7. Hands On Experience:**


#### `Original Table`:

| StudentID | StudentName | Course       | Grade |
|-----------|-------------|--------------|-------|
| 1         | Alice       | Math         | A     |
| 1         | Alice       | History      | B     |
| 2         | Bob         | Math         | C     |
| 2         | Bob         | Physics      | A     |
| 3         | Charlie     | Chemistry    | B     |



#### `1. First Normal Form (1NF)`:

**Definition:**
- 1NF requires that every column holds a single atomic value in each row and that the table contains no repeating groups of columns.

**Conditions:**
1. Each column must contain simple, indivisible values.
2. Entries in a column must be of the same data type.

**Approach:**
- Eliminate any repeating groups or arrays by creating separate rows for each set of related values.
- Ensure each cell in the table contains a single, indivisible value.

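A layout that stores courses in repeating column groups, such as the following, violates 1NF:

| StudentID | StudentName | Course1   | Grade1 | Course2 | Grade2 |
|-----------|-------------|-----------|--------|---------|--------|
| 1         | Alice       | Math      | A      | History | B      |
| 2         | Bob         | Math      | C      | Physics | A      |
| 3         | Charlie     | Chemistry | B      |         |        |

Giving each course/grade pair its own row removes the repeating group. The result is the original table above, which is already in 1NF:

| StudentID | StudentName | Course    | Grade |
|-----------|-------------|-----------|-------|
| 1         | Alice       | Math      | A     |
| 1         | Alice       | History   | B     |
| 2         | Bob         | Math      | C     |
| 2         | Bob         | Physics   | A     |
| 3         | Charlie     | Chemistry | B     |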
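
The step from repeating `Course1`/`Grade1`, `Course2`/`Grade2` column pairs to one row per course can be sketched in plain Python. The function name `to_1nf` is a hypothetical helper for this illustration, and the sketch assumes the course/grade pairs always follow the first two columns.

```python
wide_rows = [
    (1, "Alice", "Math", "A", "History", "B"),
    (2, "Bob", "Math", "C", "Physics", "A"),
    (3, "Charlie", "Chemistry", "B", None, None),
]


def to_1nf(rows):
    """Turn each (CourseN, GradeN) column pair into its own row."""
    result = []
    for student_id, name, *pairs in rows:
        # pairs is [course1, grade1, course2, grade2, ...]
        for course, grade in zip(pairs[0::2], pairs[1::2]):
            if course is not None:  # skip unused course slots
                result.append((student_id, name, course, grade))
    return result


long_rows = to_1nf(wide_rows)
for row in long_rows:
    print(row)
```

The five resulting rows match the one-row-per-course table, and a student taking a third course would simply add a row rather than require new columns.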

#### `2. Second Normal Form (2NF):`

**Definition:**
- 2NF builds on 1NF by addressing partial dependencies on the primary key.

**Conditions:**
1. Must be in 1NF.
2. No partial dependencies on the primary key.

**Approach:**
- Identify the primary key.
- Move attributes dependent on only part of the primary key to a separate table.
- This ensures that each non-key attribute is fully dependent on the entire primary key.

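With {StudentID, Course} as the primary key, StudentName depends on StudentID alone, which is a partial dependency. Moving it to its own table gives two tables.

Students:

| StudentID | StudentName |
|-----------|-------------|
| 1         | Alice       |
| 2         | Bob         |
| 3         | Charlie     |

Enrollments:

| StudentID | Course    | Grade |
|-----------|-----------|-------|
| 1         | Math      | A     |
| 1         | History   | B     |
| 2         | Math      | C     |
| 2         | Physics   | A     |
| 3         | Chemistry | B     |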
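
One way to express a 2NF design for this data is a Students table keyed by StudentID plus an Enrollments table keyed by {StudentID, Course}. The sketch below uses Python's `sqlite3` module as a stand-in for SQL Server; the table names `Students` and `Enrollments` and the new name 'Alicia' are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (
    StudentID   INTEGER PRIMARY KEY,
    StudentName TEXT NOT NULL
);
CREATE TABLE Enrollments (
    StudentID INTEGER NOT NULL REFERENCES Students(StudentID),
    Course    TEXT NOT NULL,
    Grade     TEXT NOT NULL,
    PRIMARY KEY (StudentID, Course)   -- Grade depends on the whole key
);
INSERT INTO Students VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
INSERT INTO Enrollments VALUES
    (1, 'Math', 'A'), (1, 'History', 'B'),
    (2, 'Math', 'C'), (2, 'Physics', 'A'),
    (3, 'Chemistry', 'B');
""")

# Renaming a student touches exactly one row, however many courses they take.
cur = conn.execute("UPDATE Students SET StudentName = 'Alicia' WHERE StudentID = 1")
rows_touched = cur.rowcount

report = conn.execute("""
    SELECT s.StudentName, e.Course, e.Grade
    FROM Enrollments AS e
    JOIN Students AS s ON s.StudentID = e.StudentID
    WHERE e.StudentID = 1
    ORDER BY e.Course
""").fetchall()
print(rows_touched, report)
```

In the single-table layout the same rename would have to update one row per course, and missing one of them would leave the data inconsistent.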

#### `3. Third Normal Form (3NF):`

**Definition:**
- 3NF further refines the table by eliminating transitive dependencies between non-key attributes.

**Conditions:**
1. Must be in 2NF.
2. No transitive dependencies.

**Approach:**
- Identify and remove transitive dependencies by creating separate tables for related attributes.
- This ensures that non-key attributes are not dependent on other non-key attributes.

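In this example the only non-key attributes are StudentName (determined by StudentID) and Grade (determined by {StudentID, Course}). Neither depends on another non-key attribute, so once StudentName sits in its own table there are no transitive dependencies to remove, and the design is already in 3NF.

Students:

| StudentID | StudentName |
|-----------|-------------|
| 1         | Alice       |
| 2         | Bob         |
| 3         | Charlie     |

Enrollments:

| StudentID | Course    | Grade |
|-----------|-----------|-------|
| 1         | Math      | A     |
| 1         | History   | B     |
| 2         | Math      | C     |
| 2         | Physics   | A     |
| 3         | Chemistry | B     |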


#### `4. Boyce-Codd Normal Form (BCNF):`

**Definition:**
- BCNF builds on 3NF and ensures that all determinants are candidate keys.

**Conditions:**
1. Must be in 3NF.
2. All determinants must be candidate keys.

**Approach:**
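- Identify the candidate keys (minimal sets of columns that uniquely identify each row).
- List the functional dependencies and check that the left-hand side of every non-trivial one contains a candidate key.
- Where it does not, move that dependency's attributes into their own table.

In this example the only remaining determinant is the key {StudentID, Course}, so the enrollment data below needs no further change; StudentName is held separately, keyed by StudentID.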

| StudentID | Course    | Grade |
|-----------|-----------|-------|
| 1         | Math      | A     |
| 1         | History   | B     |
| 2         | Math      | C     |
| 2         | Physics   | A     |
| 3         | Chemistry | B     |
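
The BCNF condition can be checked mechanically once the functional dependencies are written down. The following Python sketch computes the closure of a set of attributes and reports any dependency whose left-hand side is not a superkey. The function names `closure` and `bcnf_violations` are hypothetical, and the dependency lists are assumptions about what the sample data is meant to express.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """List the non-trivial FDs whose left-hand side is not a superkey."""
    all_attrs = set(all_attrs)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)           # ignore trivial dependencies
        and closure(lhs, fds) != all_attrs    # lhs does not determine every column
    ]


# Single-table layout: StudentID alone determines StudentName.
flat_fds = [
    (("StudentID",), ("StudentName",)),
    (("StudentID", "Course"), ("Grade",)),
]
flat_result = bcnf_violations(
    ["StudentID", "StudentName", "Course", "Grade"], flat_fds
)

# Enrollment table on its own: the only determinant is the key.
enrollment_result = bcnf_violations(
    ["StudentID", "Course", "Grade"],
    [(("StudentID", "Course"), ("Grade",))],
)
print(flat_result)        # [(('StudentID',), ('StudentName',))]
print(enrollment_result)  # []
```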

#### `5. Fourth Normal Form (4NF):`

**Definition:**
- 4NF addresses multi-valued dependencies within a table.

**Conditions:**
1. Must be in BCNF.
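2. No non-trivial multi-valued dependencies, other than those whose determinant is a candidate key.

**Approach:**
- Identify and remove multi-valued dependencies by creating separate tables for multi-valued attributes.
- This prevents a table from storing two or more independent one-to-many facts about the same key, which would force every combination of their values to be stored.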

| StudentID | Course    | Grade |
|-----------|-----------|-------|
| 1         | Math      | A     |
| 1         | History   | B     |
| 2         | Math      | C     |
| 2         | Physics   | A     |
| 3         | Chemistry | B     |
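
The student/course/grade data contains no multi-valued dependency, so a 4NF violation needs a different example. The Python sketch below uses a hypothetical table of (StudentID, Course, Hobby), where a student's hobbies are independent of their courses; the Hobby column and its values are not part of the original data. A multi-valued dependency of Course on StudentID holds exactly when the table equals the join of its two projections, which is also what makes the split into two tables safe.

```python
def satisfies_mvd(rows):
    """Check the multi-valued dependency StudentID ->> Course.

    rows holds (StudentID, Course, Hobby) tuples. The dependency holds
    exactly when the table equals the natural join of its
    (StudentID, Course) and (StudentID, Hobby) projections.
    """
    courses = {(s, c) for s, c, _ in rows}
    hobbies = {(s, h) for s, _, h in rows}
    joined = {(s, c, h) for s, c in courses for s2, h in hobbies if s == s2}
    return joined == set(rows)


# Every course is paired with every hobby: redundant, and not in 4NF.
course_hobby_rows = [
    (1, "Math", "Chess"), (1, "Math", "Music"),
    (1, "History", "Chess"), (1, "History", "Music"),
]

mvd_holds = satisfies_mvd(course_hobby_rows)

# 4NF decomposition: one table per independent fact.
student_courses = sorted({(s, c) for s, c, _ in course_hobby_rows})
student_hobbies = sorted({(s, h) for s, _, h in course_hobby_rows})
print(mvd_holds, student_courses, student_hobbies)
```

Four rows shrink to two tables of two rows each, and adding a third hobby adds one row instead of one row per course.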


Now the table is in 4NF: Grade is a single fact about each {StudentID, Course} pair, so there are no independent multi-valued facts left to separate. This normalization process helps improve data integrity, eliminate redundancies, and maintain a more efficient database structure.





### **8. Another Hands-On Exercise to Deepen Our Understanding of Normalisation**

#### `Original Table:`

Consider a table tracking information about students and the projects they are working on:

| StudentID | StudentName | ProjectID | ProjectName  | AdvisorID | AdvisorName  |
|-----------|-------------|-----------|--------------|-----------|--------------|
| 1         | Alice       | 101       | Database     | 201       | Dr. Smith    |
| 1         | Alice       | 102       | Web Development | 202    | Prof. Johnson |
| 2         | Bob         | 101       | Database     | 201       | Dr. Smith    |
| 2         | Bob         | 103       | Networking   | 203       | Dr. Brown    |
| 3         | Charlie     | 102       | Web Development | 202    | Prof. Johnson |

#### `1. First Normal Form (1NF):`

**Approach:**
- Eliminate any repeating groups or arrays by creating separate rows for each set of related values.
- Ensure each cell in the table contains a single, indivisible value.

| StudentID | StudentName | ProjectID | ProjectName      | AdvisorID | AdvisorName    |
|-----------|-------------|-----------|------------------|-----------|----------------|
| 1         | Alice       | 101       | Database         | 201       | Dr. Smith      |
| 1         | Alice       | 102       | Web Development  | 202       | Prof. Johnson  |
| 2         | Bob         | 101       | Database         | 201       | Dr. Smith      |
| 2         | Bob         | 103       | Networking       | 203       | Dr. Brown      |
| 3         | Charlie     | 102       | Web Development  | 202       | Prof. Johnson  |

**Changes:**
- No changes are needed to meet 1NF as the original table is already in this form.

#### `2. Second Normal Form (2NF):`

**Approach:**
- Identify the primary key.
- Move attributes dependent on only part of the primary key to a separate table.

The primary key of the original table is {StudentID, ProjectID}. StudentName depends only on StudentID and ProjectName depends only on ProjectID, so both are partial dependencies and move to their own tables.

| StudentID | ProjectID | AdvisorID | AdvisorName    |
|-----------|-----------|-----------|----------------|
| 1         | 101       | 201       | Dr. Smith      |
| 1         | 102       | 202       | Prof. Johnson  |
| 2         | 101       | 201       | Dr. Smith      |
| 2         | 103       | 203       | Dr. Brown      |
| 3         | 102       | 202       | Prof. Johnson  |

| StudentID | StudentName |
|-----------|-------------|
| 1         | Alice       |
| 2         | Bob         |
| 3         | Charlie     |

| ProjectID | ProjectName      |
|-----------|------------------|
| 101       | Database         |
| 102       | Web Development  |
| 103       | Networking       |

**Changes:**
- Separated the students into a new table with StudentID as the primary key.
- Separated the projects into a new table with ProjectID as the primary key.

#### `3. Third Normal Form (3NF):`

**Approach:**
- Identify and remove transitive dependencies by creating separate tables for related attributes.

AdvisorName depends on AdvisorID, not directly on the key {StudentID, ProjectID}, so it is a transitive dependency. After moving it out, the design consists of four tables:

| StudentID | ProjectID | AdvisorID |
|-----------|-----------|-----------|
| 1         | 101       | 201       |
| 1         | 102       | 202       |
| 2         | 101       | 201       |
| 2         | 103       | 203       |
| 3         | 102       | 202       |

| StudentID | StudentName |
|-----------|-------------|
| 1         | Alice       |
| 2         | Bob         |
| 3         | Charlie     |

| ProjectID | ProjectName      |
|-----------|------------------|
| 101       | Database         |
| 102       | Web Development  |
| 103       | Networking       |

| AdvisorID | AdvisorName    |
|-----------|----------------|
| 201       | Dr. Smith      |
| 202       | Prof. Johnson  |
| 203       | Dr. Brown      |

**Changes:**
- Separated the advisor information into a new table with AdvisorID as the primary key.
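
A 3NF design for this example can be written as four tables, with a join that rebuilds the original six-column view. This is a sketch using Python's `sqlite3` module as a stand-in for SQL Server; the table name `Assignments` is an assumption (the source does not name the linking table), and the design assumes an advisor is assigned per student and project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, StudentName TEXT NOT NULL);
CREATE TABLE Projects (ProjectID INTEGER PRIMARY KEY, ProjectName TEXT NOT NULL);
CREATE TABLE Advisors (AdvisorID INTEGER PRIMARY KEY, AdvisorName TEXT NOT NULL);
CREATE TABLE Assignments (
    StudentID INTEGER NOT NULL REFERENCES Students(StudentID),
    ProjectID INTEGER NOT NULL REFERENCES Projects(ProjectID),
    AdvisorID INTEGER NOT NULL REFERENCES Advisors(AdvisorID),
    PRIMARY KEY (StudentID, ProjectID)
);
INSERT INTO Students VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
INSERT INTO Projects VALUES
    (101, 'Database'), (102, 'Web Development'), (103, 'Networking');
INSERT INTO Advisors VALUES
    (201, 'Dr. Smith'), (202, 'Prof. Johnson'), (203, 'Dr. Brown');
INSERT INTO Assignments VALUES
    (1, 101, 201), (1, 102, 202), (2, 101, 201), (2, 103, 203), (3, 102, 202);
""")

# Joining the four tables reproduces the original wide table.
original_view = conn.execute("""
    SELECT s.StudentID, s.StudentName,
           p.ProjectID, p.ProjectName,
           a.AdvisorID, a.AdvisorName
    FROM Assignments AS x
    JOIN Students AS s ON s.StudentID = x.StudentID
    JOIN Projects AS p ON p.ProjectID = x.ProjectID
    JOIN Advisors AS a ON a.AdvisorID = x.AdvisorID
    ORDER BY s.StudentID, p.ProjectID
""").fetchall()
for row in original_view:
    print(row)
```

Each name is now stored once, yet the join returns the same five rows as the starting table, so no information was lost in the decomposition.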

#### `4. Boyce-Codd Normal Form (BCNF):`

**Approach:**
- Identify candidate keys (unique identifiers for each row).
- Ensure that all functional dependencies are based on candidate keys.

| StudentID | ProjectID | AdvisorID |
|-----------|-----------|-----------|
| 1         | 101       | 201       |
| 1         | 102       | 202       |
| 2         | 101       | 201       |
| 2         | 103       | 203       |
| 3         | 102       | 202       |

**Changes:**
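- No changes at this stage: the only determinant in this table is the key {StudentID, ProjectID}, provided advisors are assigned per student and project. If every project always had the same advisor (as happens to be true of the sample rows), ProjectID would determine AdvisorID, and AdvisorID would belong in the Projects table instead.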

#### `5. Fourth Normal Form (4NF):`

**Approach:**
- Identify and remove multi-valued dependencies by creating separate tables for multi-valued attributes.

| StudentID | ProjectID | AdvisorID |
|-----------|-----------|-----------|
| 1         | 101       | 201       |
| 1         | 102       | 202       |
| 2         | 101       | 201       |
| 2         | 103       | 203       |
| 3         | 102       | 202       |

**Changes:**
- No multi-valued dependencies present, so no new changes are introduced.

Now the design is in 4NF. This normalization process helps improve data integrity, eliminate redundancies, and maintain a more efficient database structure.


**9. How to Transform a Table:**

Let's use a hypothetical table to illustrate these concepts:

| StudentID | StudentName | Course | Grade |
|-----------|-------------|--------|-------|
| 1         | Alice       | Math   | A     |
| 1         | Alice       | History| B     |
| 2         | Bob         | Math   | C     |
| 2         | Bob         | Physics| A     |
| 3         | Charlie     | Chemistry | B   |

1. **1NF:**
   - Ensure each cell has a single, indivisible value.

2. **2NF:**
   - Identify primary key {StudentID, Course}.
   - Move non-key attributes dependent on only part of the primary key to a new table (here StudentName, which depends on StudentID alone).

3. **3NF:**
   - Identify and remove transitive dependencies by creating separate tables for related attributes.

4. **BCNF:**
   - Identify candidate keys.
   - Ensure all functional dependencies are based on candidate keys.

5. **4NF:**
   - Identify and remove multi-valued dependencies by creating separate tables for multi-valued attributes.

By applying these normalization steps, we ensure a more efficient, organized, and structurally sound database design. Each step aims to eliminate redundancies, improve data integrity, and simplify the overall database structure.
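
The transformation of an existing flat table can be sketched as a small migration. The example uses Python's `sqlite3` module as a stand-in for SQL Server, and the table names `FlatGrades`, `Students`, and `Enrollments` are assumptions made for the illustration. The final comparison checks that joining the new tables gives back exactly the rows of the flat table, so the decomposition loses nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FlatGrades (
    StudentID INTEGER, StudentName TEXT, Course TEXT, Grade TEXT
);
INSERT INTO FlatGrades VALUES
    (1, 'Alice', 'Math', 'A'), (1, 'Alice', 'History', 'B'),
    (2, 'Bob', 'Math', 'C'), (2, 'Bob', 'Physics', 'A'),
    (3, 'Charlie', 'Chemistry', 'B');

CREATE TABLE Students (
    StudentID   INTEGER PRIMARY KEY,
    StudentName TEXT NOT NULL
);
CREATE TABLE Enrollments (
    StudentID INTEGER NOT NULL REFERENCES Students(StudentID),
    Course    TEXT NOT NULL,
    Grade     TEXT NOT NULL,
    PRIMARY KEY (StudentID, Course)
);

-- StudentName depends on StudentID alone, so it is copied once per student.
INSERT INTO Students SELECT DISTINCT StudentID, StudentName FROM FlatGrades;
-- Grade depends on the whole key {StudentID, Course}.
INSERT INTO Enrollments SELECT StudentID, Course, Grade FROM FlatGrades;
""")

student_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

rebuilt = conn.execute("""
    SELECT e.StudentID, s.StudentName, e.Course, e.Grade
    FROM Enrollments AS e
    JOIN Students AS s ON s.StudentID = e.StudentID
    ORDER BY e.StudentID, e.Course
""").fetchall()
flat = conn.execute("""
    SELECT StudentID, StudentName, Course, Grade
    FROM FlatGrades
    ORDER BY StudentID, Course
""").fetchall()

is_lossless = rebuilt == flat
print(student_count, is_lossless)
```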




**10. Lesser-Known Facts:**

- Normalization is part of the broader field of database design, which includes denormalization strategies for specific use cases.
- There is a trade-off between normalization and query performance, and the level of normalization depends on the specific requirements of the application.
- The concept of normalization was introduced by Edgar F. Codd, a computer scientist who defined the relational database model.


