<a href="https://colab.research.google.com/github/p-tech/wbs-dm/blob/main/Normalisation/1NF_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Implementing First Normal Form (1NF)**

---

We're going to look at how to apply First Normal Form (1NF) by designing and restructuring a Python-based database representation (using lists or pandas DataFrames).

We'll normalise a given dataset by ensuring:
*   No repeating groups or arrays.
*   Each column contains atomic (indivisible) values.
*   Each row has a unique identifier.
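As a rough illustration of the first two criteria, the sketch below flags columns whose cells look like they hold several values. The helper name `find_multivalued_columns` and the `example` DataFrame are hypothetical, and treating a comma as the sign of a repeating group is only a heuristic: a genuinely atomic value may contain a comma.

```python
import pandas as pd


def find_multivalued_columns(df, sep=","):
    """Return the names of columns where any cell contains `sep`.

    Heuristic only: it assumes multi-valued cells are stored as
    separator-delimited strings.
    """
    flagged = []
    for col in df.columns:
        # Convert to string so numeric columns can be scanned as well
        has_sep = df[col].astype(str).str.contains(sep, regex=False)
        if has_sep.any():
            flagged.append(col)
    return flagged


# Hypothetical two-row example in the same shape as the task data
example = pd.DataFrame({
    "student_id": [101, 102],
    "courses": ["Math, Science", "Biology"],
})

print(find_multivalued_columns(example))  # columns that break atomicity
print(example["student_id"].is_unique)    # does student_id identify each row?
```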



# **Task Description:**

---

You are given a dataset representing student course registrations, where multiple courses are stored in a single column as a comma-separated list.

Your task is to normalise this data to First Normal Form (1NF) using Python.



In [1]:
import pandas as pd

# Given dataset (Unnormalised)
data = {
    "student_id": [101, 102, 103],
    "student_name": ["Alice", "Bob", "Charlie"],
    "courses": ["Math, Science", "English, History, Math", "Biology"]
}

df = pd.DataFrame(data)
print(df)

   student_id student_name                 courses
0         101        Alice           Math, Science
1         102          Bob  English, History, Math
2         103      Charlie                 Biology


# **Transform the Data to 1NF**

---

You need to eliminate multi-valued attributes (the courses column) and create a separate row for each course while preserving student details.

**Initialise an Empty List**
Creates an empty list to store the transformed rows in a normalised format.

normalised_data = []

**Loop Through the Dataframe**
df.iterrows() steps through each row of the DataFrame, yielding an (index, row) pair on each iteration.
The '_' is used because we don't need the row index, just the row values.

for _, row in df.iterrows():
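To see what each iteration yields, here is a small sketch on a hypothetical two-row DataFrame (`df_demo` is not part of the task data). It keeps the index instead of discarding it with `_`, so both halves of the pair are visible.

```python
import pandas as pd

# Hypothetical demo data, separate from the task's df
df_demo = pd.DataFrame({
    "student_id": [101, 102],
    "student_name": ["Alice", "Bob"],
})

for index, row in df_demo.iterrows():
    # index is the row label; row is a pandas Series keyed by column name
    print(index, type(row).__name__, row["student_name"])
```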

**Extract Data from Each Row**
Split on ", " (a comma followed by a space): "Math, Science" -> ["Math", "Science"]

    student_id = row["student_id"]
    student_name = row["student_name"]
    courses = row["courses"].split(", ")  # Splitting courses
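Splitting on the exact string ", " works for this dataset because every comma is followed by exactly one space. If the spacing were inconsistent, one possible alternative is to split on the comma alone and strip each piece. The `raw` string below is a made-up example, not taken from the dataset.

```python
# Hypothetical input with inconsistent spacing around the commas
raw = "English,History ,  Math"

# Splitting on ", " only matches a comma followed directly by a space
naive = raw.split(", ")

# Splitting on "," and stripping whitespace tolerates uneven spacing;
# the `if` filter drops empty pieces such as those from a trailing comma
robust = [c.strip() for c in raw.split(",") if c.strip()]

print(naive)
print(robust)
```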

**Create Normalised Rows**
For each course, a new dictionary is created and added to normalised_data

    for course in courses:
        normalised_data.append({"student_id": student_id, "student_name": student_name, "course": course})

**Create a New DataFrame**
Builds a DataFrame from the normalised data, in which each course has its own row.

df_1NF = pd.DataFrame(normalised_data)
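The explicit loop makes each step visible. As an alternative sketch (a different technique from the loop above, not the notebook's own method), pandas can do the same reshaping with `str.split` and `DataFrame.explode`. The name `df_1nf_alt` is introduced here for illustration only.

```python
import pandas as pd

# Same unnormalised data as in the task description
df = pd.DataFrame({
    "student_id": [101, 102, 103],
    "student_name": ["Alice", "Bob", "Charlie"],
    "courses": ["Math, Science", "English, History, Math", "Biology"],
})

df_1nf_alt = (
    df.assign(course=df["courses"].str.split(","))  # each cell becomes a list
      .explode("course")                            # one row per list element
      .drop(columns="courses")                      # remove the multi-valued column
      .reset_index(drop=True)                       # renumber rows from 0
)

# Remove the space left behind after each comma
df_1nf_alt["course"] = df_1nf_alt["course"].str.strip()

print(df_1nf_alt)
```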



In [7]:
normalised_data = []  # reset the list so re-running this cell does not append duplicate rows

# Step 1: Create one row per (student, course) pair
for _, row in df.iterrows():
    student_id = row["student_id"]
    student_name = row["student_name"]
    courses = row["courses"].split(", ")  # Splitting courses

    for course in courses:
        normalised_data.append({"student_id": student_id, "student_name": student_name, "course": course})

# Step 2: Create new DataFrame
df_1NF = pd.DataFrame(normalised_data)

# Display the normalised DataFrame using the notebook's display function:
display(df_1NF)

# Alternatively, simply print the DataFrame:
#print(df_1NF)

   student_id student_name   course
0         101        Alice     Math
1         101        Alice  Science
2         102          Bob  English
3         102          Bob  History
4         102          Bob     Math
5         103      Charlie  Biology
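Once each course has its own row, student_id alone no longer identifies a row. The natural identifier becomes the pair (student_id, course). A minimal sketch of checking that key is shown below; `df_check` is a hypothetical table that deliberately contains one repeated pair, of the kind that appears if rows are appended to the same list twice.

```python
import pandas as pd

# Hypothetical normalised table; the last row repeats (101, "Math")
df_check = pd.DataFrame({
    "student_id": [101, 101, 102, 101],
    "student_name": ["Alice", "Alice", "Bob", "Alice"],
    "course": ["Math", "Science", "English", "Math"],
})

key = ["student_id", "course"]  # assumed composite key

# True if any (student_id, course) pair occurs more than once
has_duplicates = df_check.duplicated(subset=key).any()

# Keep the first occurrence of each pair and renumber the rows
deduplicated = df_check.drop_duplicates(subset=key).reset_index(drop=True)

print(has_duplicates)
print(deduplicated)
```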
