# SQL Task — Normalising a Flat-File Database (Veterinary Clinic)

**Submit:** ONE Jupyter Notebook (`.ipynb`) with all required sections completed.  
**Tooling:** Use **JupySQL** in this notebook (you already know how to integrate it).

---

## Context
A veterinary clinic currently stores records in a single “flat file” table (one big table), where **each row represents a visit/appointment** and mixes information about the **owner**, **patient (animal)**, **vet**, and **treatment**.

Your task is to **propose and implement** a well-designed relational database by applying **normalisation** and **justifying** your design decisions.

---

## Learning goals
By the end of this task you should be able to:
- Identify **redundancy**, **data anomalies** (insertion/update/deletion), and **functional dependencies**
- Apply normalisation to at least **3NF**
- Create an **ERD** with correct relationships and cardinalities
- Implement the model in SQL using **PK, FK, and constraints**
- Explain how the design improves **data integrity and data quality**


## 0) Setup (JupySQL)

> **Do not change the deliverable format:** everything must remain in this single notebook.

1. Start your Docker services (database + Jupyter).
2. Connect to your database from this notebook using JupySQL.

**Tip:** You may use either **MySQL** or **PostgreSQL**, depending on the stack your teacher provided.


In [None]:
# TODO: Load JupySQL and connect to your database.
# Example patterns (choose the correct one for your setup):
#
# %load_ext sql
# %sql postgresql://USER:PASSWORD@HOST:PORT/DBNAME
# %sql mysql+pymysql://USER:PASSWORD@HOST:PORT/DBNAME
#
# After connecting, test:
# %%sql
# SELECT 1;


## 1) Suggested flat-file dataset (starting point)

Create a flat table (e.g., `vet_flat`) and populate it with the following sample data.

| Owner ID | Owner Name | Owner Address | Patient ID | Patient Name | Patient Type | Vet ID | Vet Name | Vet Address | Date | Treatment | Treatment Type | Cost |
|---:|---|---|---:|---|---|---:|---|---|---|---|---|---:|
| 1029 | Alison Bachman | 12 Green Lane, 20192 | 1011 | Oskar | Dog | 4400 | Rachel | 345 Ridley St, 99554 | 23/05/25 | Worming | Tablet | 50 |
| 1922 | Aria Mathers | 458 Rigistr, 4993 | 1012 | Seb | Cat | 4100 | Lucy | 29 Entle Street, 3049 | 23/05/25 | Broken Tail | Surgery | 400 |
| 1029 | Alison Bachman | 12 Green Lane, 20192 | 3999 | Jaques | Hamster | 4400 | Rachel | 345 Ridley St, 99554 | 23/05/25 | Cut of Paw | Medication | 50 |
| 2032 | Theo Naidoo | 45 Rue Martignac | 2393 | Kai | Dog | 4400 | Rachel | 345 Ridley St, 99554 | 23/05/25 | Broken Leg | Surgery | 450 |
| 2032 | Theo Naidoo | 45 Rue Martignac | 2393 | Kai | Dog | 4400 | Rachel | 345 Ridley St, 99554 | 17/08/25 | Cast Removal | Surgery | 200 |

**Note on dates:** store dates in ISO format `YYYY-MM-DD` (e.g., `2025-05-23`) to avoid parsing issues across SQL engines.
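
If you start from the `DD/MM/YY` strings shown in the table, the conversion to ISO format can be sketched in Python as follows. The helper name `to_iso` is hypothetical, and the sketch assumes every date uses a two-digit year in the 2000s.

```python
from datetime import datetime


def to_iso(dmy: str) -> str:
    """Convert a DD/MM/YY string (as in the sample table) to YYYY-MM-DD."""
    return datetime.strptime(dmy, "%d/%m/%y").date().isoformat()


# The two distinct dates that appear in the sample data.
iso_dates = [to_iso(d) for d in ["23/05/25", "17/08/25"]]
print(iso_dates)  # ['2025-05-23', '2025-08-17']
```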

**Address fields:** You may keep addresses as a single text field, or split them into parts — but you must justify your choice.


### 1.1 Create and populate the flat table

Run (or adapt) the SQL below to create `vet_flat` and insert the sample data.


In [None]:
%%sql
-- TODO: If your DB already has a table with this name, DROP it first (carefully).
-- DROP TABLE IF EXISTS vet_flat;

CREATE TABLE vet_flat (
  owner_id      INT,
  owner_name    VARCHAR(100),
  owner_address VARCHAR(200),
  patient_id    INT,
  patient_name  VARCHAR(100),
  patient_type  VARCHAR(50),
  vet_id        INT,
  vet_name      VARCHAR(100),
  vet_address   VARCHAR(200),
  visit_date    DATE,
  treatment     VARCHAR(100),
  treatment_type VARCHAR(50),
  cost          DECIMAL(10,2)
);

INSERT INTO vet_flat (
  owner_id, owner_name, owner_address,
  patient_id, patient_name, patient_type,
  vet_id, vet_name, vet_address,
  visit_date, treatment, treatment_type, cost
) VALUES
(1029, 'Alison Bachman', '12 Green Lane, 20192', 1011, 'Oskar', 'Dog',     4400, 'Rachel', '345 Ridley St, 99554', '2025-05-23', 'Worming',      'Tablet',     50),
(1922, 'Aria Mathers',   '458 Rigistr, 4993',    1012, 'Seb',   'Cat',     4100, 'Lucy',   '29 Entle Street, 3049', '2025-05-23', 'Broken Tail',  'Surgery',    400),
(1029, 'Alison Bachman', '12 Green Lane, 20192', 3999, 'Jaques','Hamster', 4400, 'Rachel', '345 Ridley St, 99554', '2025-05-23', 'Cut of Paw',   'Medication', 50),
(2032, 'Theo Naidoo',    '45 Rue Martignac',     2393, 'Kai',   'Dog',     4400, 'Rachel', '345 Ridley St, 99554', '2025-05-23', 'Broken Leg',   'Surgery',    450),
(2032, 'Theo Naidoo',    '45 Rue Martignac',     2393, 'Kai',   'Dog',     4400, 'Rachel', '345 Ridley St, 99554', '2025-08-17', 'Cast Removal', 'Surgery',    200);


### 1.2 Explore the data (3–5 queries)

Write **3–5 simple queries** to explore the data and spot repetition patterns (e.g., the same owner appears across multiple rows).
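
As one possible starting point, the sketch below counts how many rows each owner appears in. It assumes an in-memory SQLite database as a stand-in so that it runs without Docker, and it loads only a subset of the `vet_flat` columns. The `GROUP BY ... HAVING` query itself should also work in a `%%sql` cell on MySQL or PostgreSQL.

```python
import sqlite3

# Assumption: SQLite stands in for your MySQL/PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vet_flat (owner_id INT, owner_name TEXT, patient_id INT, visit_date TEXT)"
)
conn.executemany(
    "INSERT INTO vet_flat VALUES (?, ?, ?, ?)",
    [
        (1029, "Alison Bachman", 1011, "2025-05-23"),
        (1922, "Aria Mathers", 1012, "2025-05-23"),
        (1029, "Alison Bachman", 3999, "2025-05-23"),
        (2032, "Theo Naidoo", 2393, "2025-05-23"),
        (2032, "Theo Naidoo", 2393, "2025-08-17"),
    ],
)

# Owners whose details are stored in more than one row.
repeated_owners = conn.execute(
    """
    SELECT owner_id, owner_name, COUNT(*) AS row_count
    FROM vet_flat
    GROUP BY owner_id, owner_name
    HAVING COUNT(*) > 1
    ORDER BY owner_id
    """
).fetchall()
print(repeated_owners)  # [(1029, 'Alison Bachman', 2), (2032, 'Theo Naidoo', 2)]
```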


In [None]:
%%sql
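-- Exploration query #1: table structure (portable alternative to the MySQL-only DESCRIBE)
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'vet_flat';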

In [None]:
%%sql
-- Exploration query #2: view all rows
SELECT * FROM vet_flat;

In [None]:
%%sql
-- TODO: Add your exploration query #3


In [None]:
%%sql
-- TODO: Add your exploration query #4


In [None]:
%%sql
-- TODO: Add your exploration query #5


## 2) Diagnose the problems (write in Markdown)

Answer briefly:

1. Where do you see **redundancy**?
2. Describe **at least TWO anomalies** that could occur (insertion / update / deletion).
3. Suggest likely **functional dependencies** (e.g., `owner_id → owner_name, owner_address`).
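
A candidate functional dependency can be tested against the sample data with a grouped query. The sketch below is one way to do it, assuming an in-memory SQLite stand-in and a hypothetical helper named `fd_violations`. An empty result means only that the sample does not contradict the dependency. It does not prove that the dependency holds in general.

```python
import sqlite3

# Assumption: SQLite stands in for your MySQL/PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vet_flat (owner_id INT, owner_name TEXT, patient_id INT)")
conn.executemany(
    "INSERT INTO vet_flat VALUES (?, ?, ?)",
    [
        (1029, "Alison Bachman", 1011),
        (1922, "Aria Mathers", 1012),
        (1029, "Alison Bachman", 3999),
        (2032, "Theo Naidoo", 2393),
        (2032, "Theo Naidoo", 2393),
    ],
)


def fd_violations(lhs: str, rhs: str) -> list:
    """Return lhs values that map to more than one rhs value (hypothetical helper)."""
    # Column names are interpolated directly, so pass only trusted identifiers.
    sql = (
        f"SELECT {lhs} FROM vet_flat "
        f"GROUP BY {lhs} HAVING COUNT(DISTINCT {rhs}) > 1"
    )
    return conn.execute(sql).fetchall()


holds = fd_violations("owner_id", "owner_name")  # consistent with the sample
fails = fd_violations("owner_id", "patient_id")  # owner 1029 has two patients
print(holds, fails)  # [] [(1029,)]
```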

### Your answers


**Redundancy:**

- TODO

**Anomalies:**

- TODO

**Functional dependencies:**

- TODO


## 3) Normalisation (1NF → 2NF → 3NF)

Normalise the database to at least **3NF**, showing and justifying your steps.

### Required functional expectations
- One **owner** can have **many patients**
- One **patient** can have **many visits**
- Each **visit** is handled by **one vet**
- A **visit may include one or more treatments**  
  *(Even if the flat file shows only one treatment per row, your new model must support multiple treatments per visit.)*
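
The "one or more treatments per visit" requirement is usually modelled with an associative table. The sketch below shows only that pattern, assuming an in-memory SQLite stand-in. The table names, column names, and the pairing of two treatments on one visit are hypothetical, and the sketch deliberately leaves out owners, patients, vets, and cost, which remain your design decisions.

```python
import sqlite3

# Assumption: SQLite stands in for your MySQL/PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript(
    """
    CREATE TABLE visit (
      visit_id   INTEGER PRIMARY KEY,
      visit_date TEXT NOT NULL
    );
    CREATE TABLE treatment (
      treatment_id INTEGER PRIMARY KEY,
      name         TEXT NOT NULL UNIQUE
    );
    -- Associative table: one row per (visit, treatment) pair.
    CREATE TABLE visit_treatment (
      visit_id     INTEGER NOT NULL REFERENCES visit(visit_id),
      treatment_id INTEGER NOT NULL REFERENCES treatment(treatment_id),
      PRIMARY KEY (visit_id, treatment_id)
    );
    INSERT INTO visit VALUES (1, '2025-05-23');
    INSERT INTO treatment VALUES (10, 'Broken Leg'), (11, 'Worming');
    INSERT INTO visit_treatment VALUES (1, 10), (1, 11);
    """
)

treatments_for_visit = conn.execute(
    """
    SELECT t.name
    FROM visit_treatment AS vt
    JOIN treatment AS t ON t.treatment_id = vt.treatment_id
    WHERE vt.visit_id = 1
    ORDER BY t.name
    """
).fetchall()
print(treatments_for_visit)  # [('Broken Leg',), ('Worming',)]
```

The composite primary key prevents the same treatment from being attached twice to the same visit. Whether that restriction suits the clinic is a trade-off worth mentioning in your justifications.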

---

### 3.1 First Normal Form (1NF)
Explain how 1NF applies here (atomic values, no repeating groups).

### 3.2 Second Normal Form (2NF)
Explain how your design avoids partial dependencies.

### 3.3 Third Normal Form (3NF)
Explain how your design removes transitive dependencies.
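
For example, once patient data is separated, `owner_name` and `owner_address` depend on the patient only through `owner_id`, which is a transitive dependency. The sketch below moves those columns into their own table, assuming an in-memory SQLite stand-in and a hypothetical table name `owner`. It loads only the owner-related columns of `vet_flat`.

```python
import sqlite3

# Assumption: SQLite stands in for your MySQL/PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vet_flat (owner_id INT, owner_name TEXT, owner_address TEXT, patient_id INT)"
)
conn.executemany(
    "INSERT INTO vet_flat VALUES (?, ?, ?, ?)",
    [
        (1029, "Alison Bachman", "12 Green Lane, 20192", 1011),
        (1922, "Aria Mathers", "458 Rigistr, 4993", 1012),
        (1029, "Alison Bachman", "12 Green Lane, 20192", 3999),
        (2032, "Theo Naidoo", "45 Rue Martignac", 2393),
        (2032, "Theo Naidoo", "45 Rue Martignac", 2393),
    ],
)

# Each owner is stored once; the PK would reject inconsistent duplicates.
conn.executescript(
    """
    CREATE TABLE owner (
      owner_id      INTEGER PRIMARY KEY,
      owner_name    TEXT NOT NULL,
      owner_address TEXT NOT NULL
    );
    INSERT INTO owner
    SELECT DISTINCT owner_id, owner_name, owner_address FROM vet_flat;
    """
)

flat_rows = conn.execute("SELECT COUNT(*) FROM vet_flat").fetchone()[0]
owner_rows = conn.execute("SELECT COUNT(*) FROM owner").fetchone()[0]
print(flat_rows, owner_rows)  # 5 3
```

After this step an address change is a single-row `UPDATE` on `owner`, rather than an update to every visit row for that owner.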

### Your normalisation notes


- **1NF:** TODO
- **2NF:** TODO
- **3NF:** TODO

**Final tables (3NF) with PK/FK (list):**

- TODO


## 4) ERD (Entity–Relationship Diagram)

Create an ERD showing:
- entities (tables) and key attributes  
- **PK/FK clearly labelled**  
- cardinalities (1–N, M–N)  
- associative table(s) where needed (e.g., visit–treatment)

### Add your ERD here
1. Export your ERD as an image (PNG recommended).
2. Place it next to this notebook (same folder) as `erd.png`.
3. Embed it below by keeping the Markdown image link.

![ERD](erd.png)


## 5) Full SQL implementation (DDL + sample data)

Write SQL that:
- creates all tables in your final design
- includes PK, FK, NOT NULL, UNIQUE, and CHECK constraints where appropriate
- inserts enough data to reproduce the flat-file information
- demonstrates:
  - **one owner with two patients**
  - **one visit with two treatments**
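
To check that your constraints actually reject bad data, you can attempt inserts that should fail. The sketch below assumes an in-memory SQLite stand-in. The `charge` table and the `rejected` helper are hypothetical and exist only to demonstrate a `CHECK` constraint. Error classes and messages differ on MySQL and PostgreSQL.

```python
import sqlite3

# Assumption: SQLite stands in for your MySQL/PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript(
    """
    CREATE TABLE owner (
      owner_id   INTEGER PRIMARY KEY,
      owner_name TEXT NOT NULL
    );
    CREATE TABLE patient (
      patient_id   INTEGER PRIMARY KEY,
      patient_name TEXT NOT NULL,
      owner_id     INTEGER NOT NULL REFERENCES owner(owner_id)
    );
    CREATE TABLE charge (  -- hypothetical table, only to show CHECK
      charge_id INTEGER PRIMARY KEY,
      cost      NUMERIC NOT NULL CHECK (cost >= 0)
    );
    INSERT INTO owner VALUES (1029, 'Alison Bachman');
    INSERT INTO patient VALUES (1011, 'Oskar', 1029), (3999, 'Jaques', 1029);
    """
)


def rejected(sql: str) -> bool:
    """Return True if the statement violates a constraint (hypothetical helper)."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


fk_rejected = rejected("INSERT INTO patient VALUES (5000, 'Ghost', 9999)")  # no such owner
check_rejected = rejected("INSERT INTO charge VALUES (1, -50)")             # negative cost
patients_of_1029 = conn.execute(
    "SELECT COUNT(*) FROM patient WHERE owner_id = 1029"
).fetchone()[0]
print(fk_rejected, check_rejected, patients_of_1029)  # True True 2
```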


In [None]:
%%sql
-- TODO: Write your full schema here (CREATE TABLE ... with PK/FK/constraints)


In [None]:
%%sql
-- TODO: Insert sample data for your normalised schema here


## 6) Validation queries (minimum 6)

Include at least these queries:
1. All visits for a specific owner
2. Full history for a patient
3. Total spend per owner
4. Number of visits per vet in a date range
5. Top 3 most frequent treatments
6. List of patients by species (the flat file's `patient_type`) and their owners
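
A useful cross-check is to compute the same figures from the original flat table and compare them with the results from your normalised schema. The sketch below computes two such baselines, assuming an in-memory SQLite stand-in, a subset of the `vet_flat` columns, and that each flat row counts as one visit. The May date range is an arbitrary example.

```python
import sqlite3

# Assumption: SQLite stands in for your MySQL/PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vet_flat (owner_id INT, vet_id INT, visit_date TEXT, cost INT)")
conn.executemany(
    "INSERT INTO vet_flat VALUES (?, ?, ?, ?)",
    [
        (1029, 4400, "2025-05-23", 50),
        (1922, 4100, "2025-05-23", 400),
        (1029, 4400, "2025-05-23", 50),
        (2032, 4400, "2025-05-23", 450),
        (2032, 4400, "2025-08-17", 200),
    ],
)

# Baseline for query (3): total spend per owner.
spend = conn.execute(
    "SELECT owner_id, SUM(cost) FROM vet_flat GROUP BY owner_id ORDER BY owner_id"
).fetchall()

# Baseline for query (4): visits per vet in a date range (ISO dates compare correctly as text).
visits_may = conn.execute(
    """
    SELECT vet_id, COUNT(*)
    FROM vet_flat
    WHERE visit_date BETWEEN ? AND ?
    GROUP BY vet_id
    ORDER BY vet_id
    """,
    ("2025-05-01", "2025-05-31"),
).fetchall()

print(spend)       # [(1029, 100), (1922, 400), (2032, 650)]
print(visits_may)  # [(4100, 1), (4400, 3)]
```

If your normalised schema adds extra demonstration rows (for example, a visit with two treatments), expect the totals to differ by exactly those rows.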


In [None]:
%%sql
-- (1) TODO: All visits for a specific owner


In [None]:
%%sql
-- (2) TODO: Full history for a patient


In [None]:
%%sql
-- (3) TODO: Total spend per owner


In [None]:
%%sql
-- (4) TODO: Number of visits per vet in a date range


In [None]:
%%sql
-- (5) TODO: Top 3 most frequent treatments


In [None]:
%%sql
-- (6) TODO: List of patients by species and their owners


## 7) Required justifications (write in Markdown)

Explain clearly:
- which flat-file problems you fixed (redundancy/anomalies)
- why you created the entities/tables you created
- how integrity is enforced (PK/FK/constraints)
- trade-offs you made (e.g., address as one field vs split fields; cost as part of visit vs treatment)


**Justifications:**

- TODO


## Suggested assessment criteria (for your reference)

- **Normalisation quality (1NF–3NF): 35%**
- **ERD (clarity, keys, cardinalities): 20%**
- **SQL implementation (correct DDL, constraints, data): 30%**
- **Justifications and communication quality: 15%**

