# Assignment HPDM172

## Intro

This notebook documents the design and build of a relational database modeling the activities of a group of hospitals. The aim is to create a structured dataset that captures the key entities and interactions found in a healthcare environment, including hospitals, doctors, patients, appointments, prescriptions, diseases and laboratory investigations.

The database is implemented using SQLite, a lightweight, file-based relational database engine that Python accesses through the standard-library `sqlite3` module. All tables, relationships and data-generation steps are documented.


## Aims

1. Design a relational schema for modeling the activities of a group of hospitals
2. Implement the tables using SQL CREATE TABLE statements in Python
3. Generate synthetic data to populate the database
4. Demonstrate key database operations such as inserts and queries 

## Scope

The database includes tables representing:

- Hospitals – the hospitals in the group
- Doctors – clinical staff working at a hospital
- Patients – individuals receiving care, each assigned to one doctor
- Appointments – events where a doctor sees a patient at a hospital
- Medications – a catalogue of drug names
- Prescriptions – many-to-many link between patients and medications, also recording the prescribing doctor and date
- Diseases – conditions treated by doctors and medications
- DiseaseSpecialist – which doctors specialise in which diseases
- DiseaseTreatment – which medications treat which diseases
- LabTest – a catalogue of test types 
- LabResult – individual patient test results ordered by a doctor

## Structure of notebook(s)

1. Introduction
2. Database Initialisation
3. Schema Definition: For each table, a markdown explanation and then SQL CREATE TABLE code executed using a helper function
4. Synthetic Data Generation
5. Data Validation and Testing
6. Queries
7. Entity-Relationship Diagram

## Database Initialisation

This section sets up the SQLite database that will be used throughout the remainder of the notebook. SQLite is chosen for its simplicity and native integration with Python, which makes it suitable for reproducible analysis within a Jupyter environment.

The following initialisation steps are performed:
1. Import required Python libraries
2. Create the database file
3. Enable foreign key enforcement, to ensure the database maintains referential integrity throughout data insertion and modification.
4. Create a reusable cursor object.

Once initialisation is complete, the database connection (`conn`) and cursor (`cur`) will be available for interacting with tables in later sections of the notebook.

In [19]:
"""
Initialises the SQLite database connection for the Hospital Information System.

This cell:
- Imports required Python modules
- Creates (or connects to) the SQLite database file
- Enables foreign key enforcement (disabled by default in SQLite)
- Creates a reusable cursor object
- Prints confirmation of successful setup and SQLite version

All subsequent CREATE TABLE and data insertion cells will rely on the
`conn` (connection) and `cur` (cursor) objects defined here.
"""

import sqlite3
from sqlite3 import Connection     
import pandas as pd

# Path to the SQLite database file.
# If 'hospital.db' does not yet exist, SQLite will create it automatically.
db_path = "hospital.db"

# Create a connection object to the database file.
conn: Connection = sqlite3.connect(db_path)

# Enable foreign key constraint enforcement in SQLite.
conn.execute("PRAGMA foreign_keys = ON;")

# Create a cursor object, for executing SQL statements.
cur = conn.cursor()

# Display confirmation messages to the user.
print("Database created at:", db_path)
print("SQLite version:", sqlite3.sqlite_version)


Database created at: hospital.db
SQLite version: 3.50.4


### SQL Execution Helper Function

This helper function simplifies database operations throughout the notebook. It executes a single SQL statement and commits the change, so that this boilerplate does not have to be repeated in every cell.

In [20]:
def run_sql(sql: str) -> str:
    """
    Executes a SQL statement using the global cursor and commits the change.

    Parameters
    ----------
    sql : str
        The SQL command to run. This should be a valid SQL statement
        written as a single string. It can include CREATE, INSERT,
        UPDATE, or DELETE statements.

    Returns
    -------
    str
        A confirmation message indicating that the SQL executed successfully.

    Notes
    -----
    - This function uses the global `cur` (cursor) and `conn` (connection)
      objects created during database initialisation.
    - Errors will propagate to the caller unless caught explicitly.
    """
    
    cur.execute(sql)   # Execute the SQL statement
    conn.commit()      # Save changes to the database
    return "SQL executed successfully."


## HOSPITAL table

This is the core table. Each hospital has a unique id and a unique name, along with descriptive attributes such as address, size (number of beds), hospital type and accreditation status.

| Field Name             | Data Type       | Constraints & Notes                            |
| ---------------------- | --------------- | ---------------------------------------------- |
| `hospital_id`          | INT             | **Primary key**, auto-increment                |
| `name`                 | VARCHAR(150)    | **Unique**, not null                           |
| `address`              | VARCHAR(255)    | Not null                                       |
| `size_beds`            | INT             | Not null, must be > 0                          |
| `hospital_type`        | VARCHAR(50)     | e.g. "General", "Teaching", "Specialist"       |
| `accreditation_status` | VARCHAR(50)     | e.g. "Accredited", "Pending", "Not accredited" |


In [21]:
"""
Creates the `Hospital` table in the SQLite database.

This table stores core information about each hospital, including:
- A unique primary key (`hospital_id`)
- A unique hospital name
- Address information
- Bed capacity (`size_beds`)
- Type of hospital (e.g., General, Teaching, Specialist)
- Accreditation status

The table is only created if it does not already exist, making the cell
safe to re-run. All SQL execution is handled by the `run_sql()` helper
function defined earlier.
"""

# Note: SQLite does not enforce VARCHAR(n) lengths, so TEXT is used for all string columns.
sql_create_hospital = """
CREATE TABLE IF NOT EXISTS Hospital (
    hospital_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    address TEXT NOT NULL,
    size_beds INTEGER NOT NULL CHECK(size_beds > 0),
    hospital_type TEXT,
    accreditation_status TEXT
);
"""

# Execute the SQL statement and report result
run_sql(sql_create_hospital)
print("Hospital table created.")


Hospital table created.
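
The `UNIQUE` and `CHECK` constraints in the Hospital table are enforced by SQLite when rows are inserted. The sketch below is a minimal, self-contained check of that behaviour: it re-creates the Hospital table in a throwaway in-memory database (so `hospital.db` is not touched) and records which inserts SQLite rejects. The hospital names and addresses are made-up placeholders.

```python
import sqlite3

# Throwaway in-memory database so the real hospital.db is not modified.
demo = sqlite3.connect(":memory:")
demo.execute("""
CREATE TABLE Hospital (
    hospital_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    address TEXT NOT NULL,
    size_beds INTEGER NOT NULL CHECK(size_beds > 0),
    hospital_type TEXT,
    accreditation_status TEXT
);
""")

insert = "INSERT INTO Hospital (name, address, size_beds) VALUES (?, ?, ?);"

# A valid row is accepted.
demo.execute(insert, ("Example General", "1 High Street", 250))

rejected = []
for row in [
    ("Example General", "2 Low Road", 100),  # duplicate name: UNIQUE fails
    ("Example Clinic", "3 Side Lane", 0),    # zero beds: CHECK fails
]:
    try:
        demo.execute(insert, row)
    except sqlite3.IntegrityError as err:
        rejected.append((row[0], str(err)))

for name, message in rejected:
    print(f"Rejected {name!r}: {message}")
```

Both violations surface as `sqlite3.IntegrityError`, so synthetic-data generation code can catch that one exception type.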


## DOCTOR table

This table lists the doctors. For each doctor it stores at least their name, date of birth and address, along with their specialty.

One hospital can have many doctors (a one-to-many relationship), implemented by the `hospital_id` foreign key in this table.

| Field Name      | Data Type       | Constraints / Notes                                    |
| --------------- | --------------- | ------------------------------------------------------ |
| `doctor_id`     | INT             | **Primary key**, auto-increment                        |
| `hospital_id`   | INT             | **Foreign key** → Hospital.hospital_id, not null       |
| `first_name`    | VARCHAR(100)    | Not null                                               |
| `last_name`     | VARCHAR(100)    | Not null                                               |
| `date_of_birth` | DATE            | Not null                                               |
| `specialty`     | VARCHAR(100)    | e.g. Cardiology, Oncology                              |
| `address`       | VARCHAR(255)    | Not null                                               |


In [22]:
"""
Creates the `Doctor` table in the SQLite database.

This table stores demographic and professional information for each doctor.
Each doctor is assigned to exactly one hospital, enforced by the
foreign key constraint on `hospital_id`.

Fields:
- doctor_id:      Primary key (auto-incrementing)
- hospital_id:    Foreign key linking to Hospital.hospital_id
- first_name:     Doctor's given name (required)
- last_name:      Doctor's surname (required)
- date_of_birth:  Stored as ISO string 'YYYY-MM-DD' (required)
- specialty:      Medical specialty (optional)
- address:        Work or home address (required)

The table is created only if it does not already exist.
"""

sql_create_doctor = """
CREATE TABLE IF NOT EXISTS Doctor (
    doctor_id INTEGER PRIMARY KEY AUTOINCREMENT,
    hospital_id INTEGER NOT NULL,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    date_of_birth TEXT NOT NULL,
    specialty TEXT,
    address TEXT NOT NULL,
    FOREIGN KEY (hospital_id) REFERENCES Hospital(hospital_id)
);
"""

run_sql(sql_create_doctor)
print("Doctor table created.")


Doctor table created.
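
Because `PRAGMA foreign_keys = ON` was issued during initialisation, SQLite rejects a doctor whose `hospital_id` does not match an existing hospital. A minimal, self-contained sketch of that behaviour, using an in-memory database and a cut-down Hospital table (only the columns needed here); the names, dates and addresses are placeholders.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
# Foreign keys are off by default in SQLite and must be enabled per connection.
demo.execute("PRAGMA foreign_keys = ON;")

demo.executescript("""
CREATE TABLE Hospital (
    hospital_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    address TEXT NOT NULL
);
CREATE TABLE Doctor (
    doctor_id INTEGER PRIMARY KEY AUTOINCREMENT,
    hospital_id INTEGER NOT NULL,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    date_of_birth TEXT NOT NULL,
    specialty TEXT,
    address TEXT NOT NULL,
    FOREIGN KEY (hospital_id) REFERENCES Hospital(hospital_id)
);
""")

demo.execute("INSERT INTO Hospital (name, address) VALUES (?, ?);",
             ("Example General", "1 High Street"))
hospital_id = demo.execute("SELECT hospital_id FROM Hospital;").fetchone()[0]

doctor_sql = ("INSERT INTO Doctor (hospital_id, first_name, last_name, date_of_birth, address) "
              "VALUES (?, ?, ?, ?, ?);")

# Doctor linked to an existing hospital: accepted.
demo.execute(doctor_sql, (hospital_id, "Ada", "Example", "1980-04-12", "4 Park Road"))

# Doctor linked to a hospital_id that does not exist: rejected.
try:
    demo.execute(doctor_sql, (999, "Bob", "Example", "1975-09-30", "5 Mill Lane"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("Orphan doctor rejected:", orphan_rejected)
```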


## PATIENT table

Each patient is assigned to exactly one doctor through the `doctor_id` foreign key; a doctor can have many patients (one-to-many).

| Field Name      | Data Type       | Constraints / Notes                 |
| --------------- | --------------- | ----------------------------------- |
| `patient_id`    | INT             | **Primary key**, auto-increment     |
| `doctor_id`     | INT             | **Foreign key** → Doctor.doctor_id  |
| `first_name`    | VARCHAR(100)    | Not null                            |
| `last_name`     | VARCHAR(100)    | Not null                            |
| `date_of_birth` | DATE            | Not null                            |
| `address`       | VARCHAR(255)    | Not null                            |
| `gender`        | VARCHAR(10)     | Not null                            |
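
A minimal sketch of a `CREATE TABLE` statement for this design, assuming the conventions of the Hospital and Doctor cells (TEXT for strings, ISO `YYYY-MM-DD` date strings, `AUTOINCREMENT` keys) and the table name `Patient`. In the notebook the string would be passed to `run_sql()`; here it runs against a throwaway in-memory database with a cut-down Doctor parent table so the block stands on its own. The patient values are placeholders.

```python
import sqlite3

# SQL for the Patient table, following the field list above.
sql_create_patient = """
CREATE TABLE IF NOT EXISTS Patient (
    patient_id INTEGER PRIMARY KEY AUTOINCREMENT,
    doctor_id INTEGER NOT NULL,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    date_of_birth TEXT NOT NULL,   -- ISO 'YYYY-MM-DD'
    address TEXT NOT NULL,
    gender TEXT NOT NULL,
    FOREIGN KEY (doctor_id) REFERENCES Doctor(doctor_id)
);
"""

# Self-contained check: in-memory DB with a minimal Doctor parent table.
demo = sqlite3.connect(":memory:")
demo.execute("PRAGMA foreign_keys = ON;")
demo.execute("CREATE TABLE Doctor (doctor_id INTEGER PRIMARY KEY AUTOINCREMENT);")
demo.execute(sql_create_patient)

demo.execute("INSERT INTO Doctor DEFAULT VALUES;")  # creates doctor_id = 1
demo.execute(
    "INSERT INTO Patient (doctor_id, first_name, last_name, date_of_birth, address, gender) "
    "VALUES (?, ?, ?, ?, ?, ?);",
    (1, "Sam", "Example", "1990-01-01", "6 Church Street", "Female"),
)

patient_count = demo.execute("SELECT COUNT(*) FROM Patient;").fetchone()[0]
print("Patients:", patient_count)
```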


## MEDICATION table

The catalogue will hold 30+ medications. A medication can be prescribed to many patients, and each patient can be prescribed many medications; this many-to-many relationship is captured in the PRESCRIPTION table.


| Field Name       | Data Type       | Constraints / Notes              |
| ---------------- | --------------- | -------------------------------- |
| `medication_id`  | INT             | **Primary key**, auto-increment  |
| `name`           | VARCHAR(150)    | **Unique**, not null             |
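
A minimal sketch of the Medication table, assuming the table name `Medication` and the same TEXT conventions as earlier cells. It also shows two SQLite features that suit loading a catalogue: `executemany()` to insert a list of rows, and `INSERT OR IGNORE` to skip names that would break the `UNIQUE` constraint. The three drug names are examples only, not the full catalogue.

```python
import sqlite3

sql_create_medication = """
CREATE TABLE IF NOT EXISTS Medication (
    medication_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE
);
"""

demo = sqlite3.connect(":memory:")
demo.execute(sql_create_medication)

# executemany() inserts one row per tuple in the list.
demo.executemany(
    "INSERT INTO Medication (name) VALUES (?);",
    [("Metformin",), ("Atorvastatin",), ("Amoxicillin",)],
)

# INSERT OR IGNORE silently skips a row that would violate UNIQUE(name).
demo.execute("INSERT OR IGNORE INTO Medication (name) VALUES (?);", ("Metformin",))

medication_names = [row[0] for row in
                    demo.execute("SELECT name FROM Medication ORDER BY medication_id;")]
print(medication_names)
```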

## PRESCRIPTION table

This is a junction table linking PATIENT and MEDICATION.

This table represents the many-to-many relationship between patients and medications. Each record indicates that a specific patient has been prescribed a specific medication.
It also records which doctor prescribed the medication, when it was prescribed, and further details such as dose instructions, duration and route. A junction table that carries its own attributes like this is sometimes called a descriptive junction table, or a bridge entity with attributes.

| Field Name          | Data Type    | Notes                               |
| ------------------- | ------------ | ----------------------------------- |
| `prescription_id`   | INT          | **Primary key**                     |
| `patient_id`        | INT          | **FK → Patient**                    |
| `doctor_id`         | INT          | **FK → Doctor**                     |
| `medication_id`     | INT          | **FK → Medication**                 |
| `prescribed_date`   | DATE         | Must be within past 2 years         |
| `dose_instructions` | VARCHAR(255) | e.g. "Take one tablet twice a day"  |
| `duration_days`     | INT          | Optional                            |
| `route`             | VARCHAR(150) | Not null                            |
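
A minimal sketch of the Prescription table, assuming the table name `Prescription` and ISO date strings. SQLite does not allow `date('now')` inside a `CHECK` constraint (it is non-deterministic), so this sketch assumes the "within the past 2 years" rule is applied when synthetic data is generated. The helper `random_prescribed_date()` is hypothetical and approximates 2 years as 730 days; parent tables are cut down to their key columns.

```python
import random
import sqlite3
from datetime import date, timedelta

sql_create_prescription = """
CREATE TABLE IF NOT EXISTS Prescription (
    prescription_id INTEGER PRIMARY KEY AUTOINCREMENT,
    patient_id INTEGER NOT NULL,
    doctor_id INTEGER NOT NULL,
    medication_id INTEGER NOT NULL,
    prescribed_date TEXT NOT NULL,   -- ISO 'YYYY-MM-DD'
    dose_instructions TEXT,
    duration_days INTEGER,
    route TEXT NOT NULL,
    FOREIGN KEY (patient_id) REFERENCES Patient(patient_id),
    FOREIGN KEY (doctor_id) REFERENCES Doctor(doctor_id),
    FOREIGN KEY (medication_id) REFERENCES Medication(medication_id)
);
"""

def random_prescribed_date(today: date, rng: random.Random) -> str:
    """Hypothetical helper: random ISO date within the 730 days up to `today`."""
    offset = rng.randint(0, 730)  # inclusive on both ends
    return (today - timedelta(days=offset)).isoformat()

demo = sqlite3.connect(":memory:")
demo.execute("PRAGMA foreign_keys = ON;")
demo.executescript("""
CREATE TABLE Patient (patient_id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE Doctor (doctor_id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE Medication (medication_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE);
INSERT INTO Patient DEFAULT VALUES;
INSERT INTO Doctor DEFAULT VALUES;
INSERT INTO Medication (name) VALUES ('Metformin');
""")
demo.execute(sql_create_prescription)

rng = random.Random(0)       # seeded so the example is reproducible
today = date(2025, 1, 1)     # fixed reference date for the example
demo.execute(
    "INSERT INTO Prescription (patient_id, doctor_id, medication_id, prescribed_date, "
    "dose_instructions, duration_days, route) VALUES (?, ?, ?, ?, ?, ?, ?);",
    (1, 1, 1, random_prescribed_date(today, rng), "Take one tablet twice a day", 28, "Oral"),
)
print(demo.execute("SELECT prescribed_date, route FROM Prescription;").fetchone())
```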



## DISEASES table

One disease can be treated by many medications and by many doctors.
One medication can treat many diseases.
One doctor can specialise in many diseases.

These are two many-to-many relationships, which are represented by two junction tables: DiseaseTreatment (disease-medication) and DiseaseSpecialist (disease-doctor).


| Field         | Type         | Notes                  |
| ------------- | ------------ | ---------------------- |
| `disease_id`  | INT          | **Primary key**        |
| `name`        | VARCHAR(150) | Unique, not null       |
| `description` | TEXT         | Optional               |
| `icd10_code`  | VARCHAR(10)  | Optional               |
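
A minimal sketch of the Disease table, assuming the table name `Disease` (matching the foreign-key notes in the junction tables below). The optional columns can simply be left out of an `INSERT`, in which case SQLite stores `NULL`. The two example diseases are illustrative; E11 is the ICD-10 code for type 2 diabetes mellitus.

```python
import sqlite3

sql_create_disease = """
CREATE TABLE IF NOT EXISTS Disease (
    disease_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    description TEXT,
    icd10_code TEXT
);
"""

demo = sqlite3.connect(":memory:")
demo.execute(sql_create_disease)

demo.execute(
    "INSERT INTO Disease (name, description, icd10_code) VALUES (?, ?, ?);",
    ("Type 2 diabetes mellitus", "Chronic hyperglycaemia due to insulin resistance", "E11"),
)
# Optional columns omitted: description and icd10_code are stored as NULL.
demo.execute("INSERT INTO Disease (name) VALUES (?);", ("Hypertension",))

rows = demo.execute("SELECT name, icd10_code FROM Disease ORDER BY disease_id;").fetchall()
print(rows)
```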




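## DiseaseTreatment table (Disease-Medication)

Junction table for the many-to-many relationship between diseases and medications: each row records that one medication is used to treat one disease.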

| Field                  | Type | Notes                    |
| ---------------------- | ---- | ------------------------ |
| `disease_treatment_id` | INT  | **Primary key**          |
| `disease_id`           | INT  | **FK → Disease**         |
| `medication_id`        | INT  | **FK → Medication**      |
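
A minimal sketch of the DiseaseTreatment junction table. The composite `UNIQUE (disease_id, medication_id)` constraint is an assumption not stated in the design above; it stops the same disease-medication pair being recorded twice. The query shows how the junction table is resolved back to names with two joins. Parent tables are cut down to their key and name columns.

```python
import sqlite3

sql_create_disease_treatment = """
CREATE TABLE IF NOT EXISTS DiseaseTreatment (
    disease_treatment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    disease_id INTEGER NOT NULL,
    medication_id INTEGER NOT NULL,
    UNIQUE (disease_id, medication_id),  -- each pairing recorded once
    FOREIGN KEY (disease_id) REFERENCES Disease(disease_id),
    FOREIGN KEY (medication_id) REFERENCES Medication(medication_id)
);
"""

demo = sqlite3.connect(":memory:")
demo.execute("PRAGMA foreign_keys = ON;")
demo.executescript("""
CREATE TABLE Disease (disease_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE);
CREATE TABLE Medication (medication_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE);
INSERT INTO Disease (name) VALUES ('Type 2 diabetes mellitus'), ('Hypertension');
INSERT INTO Medication (name) VALUES ('Metformin'), ('Ramipril'), ('Amlodipine');
""")
demo.execute(sql_create_disease_treatment)

demo.executemany(
    "INSERT OR IGNORE INTO DiseaseTreatment (disease_id, medication_id) VALUES (?, ?);",
    [(1, 1), (2, 2), (2, 3), (2, 2)],  # the last pair is a duplicate and is skipped
)

# Resolve the junction table back to disease and medication names.
pairs = demo.execute("""
SELECT d.name, m.name
FROM DiseaseTreatment AS dt
JOIN Disease AS d ON d.disease_id = dt.disease_id
JOIN Medication AS m ON m.medication_id = dt.medication_id
ORDER BY dt.disease_treatment_id;
""").fetchall()
print(pairs)
```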


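## DiseaseSpecialist table (Disease-Doctor)

Junction table for the many-to-many relationship between diseases and doctors: each row records that one doctor specialises in (treats) one disease.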

| Field                   | Type | Notes            |
| ----------------------- | ---- | ---------------- |
| `disease_specialist_id` | INT  | **Primary key**  |
| `disease_id`            | INT  | **FK → Disease** |
| `doctor_id`             | INT  | **FK → Doctor**  |
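
A minimal sketch of the DiseaseSpecialist junction table, again assuming a composite `UNIQUE (disease_id, doctor_id)` constraint that the design does not state. The parameterised query answers a typical question for this table: which doctors specialise in a given disease? The surnames are placeholders and the parent tables are cut down.

```python
import sqlite3

sql_create_disease_specialist = """
CREATE TABLE IF NOT EXISTS DiseaseSpecialist (
    disease_specialist_id INTEGER PRIMARY KEY AUTOINCREMENT,
    disease_id INTEGER NOT NULL,
    doctor_id INTEGER NOT NULL,
    UNIQUE (disease_id, doctor_id),
    FOREIGN KEY (disease_id) REFERENCES Disease(disease_id),
    FOREIGN KEY (doctor_id) REFERENCES Doctor(doctor_id)
);
"""

demo = sqlite3.connect(":memory:")
demo.execute("PRAGMA foreign_keys = ON;")
demo.executescript("""
CREATE TABLE Disease (disease_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE);
CREATE TABLE Doctor (doctor_id INTEGER PRIMARY KEY AUTOINCREMENT, last_name TEXT NOT NULL);
INSERT INTO Disease (name) VALUES ('Hypertension'), ('Asthma');
INSERT INTO Doctor (last_name) VALUES ('Smith'), ('Jones'), ('Patel');
""")
demo.execute(sql_create_disease_specialist)

demo.executemany(
    "INSERT INTO DiseaseSpecialist (disease_id, doctor_id) VALUES (?, ?);",
    [(1, 1), (1, 3), (2, 2)],  # Smith and Patel: hypertension; Jones: asthma
)

# Which doctors specialise in a given disease?
specialists = [row[0] for row in demo.execute("""
SELECT doc.last_name
FROM DiseaseSpecialist AS ds
JOIN Doctor AS doc ON doc.doctor_id = ds.doctor_id
JOIN Disease AS dis ON dis.disease_id = ds.disease_id
WHERE dis.name = ?
ORDER BY doc.last_name;
""", ("Hypertension",))]
print(specialists)
```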



## APPOINTMENTS table

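Each appointment records one meeting between a PATIENT and a DOCTOR at a HOSPITAL, with a scheduled start time and a duration. A patient, a doctor and a hospital can each have many appointments, so APPOINTMENTS has a many-to-one relationship with each of these three tables.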

| Field               | Data Type    | Notes                                      |
| ------------------- | ------------ | ------------------------------------------ |
| `appointment_id`    | INT          | **Primary key**                            |
| `patient_id`        | INT          | **FK → Patient**                           |
| `doctor_id`         | INT          | **FK → Doctor**                            |
| `hospital_id`       | INT          | **FK → Hospital**                          |
| `appointment_start` | DATETIME     | exact scheduled time                       |
| `duration_minutes`  | INT          | typical values 10–60                       |
| `reason`            | VARCHAR(255) | optional free-text                         |
| `status`            | VARCHAR(50)  | e.g. "Scheduled", "Completed", "Cancelled" |
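
A minimal sketch of the Appointment table, assuming the table name `Appointment` and `appointment_start` stored as an ISO `YYYY-MM-DD HH:MM:SS` string. The `CHECK(duration_minutes > 0)` is an assumption (the 10-60 range above is described as typical, not enforced). Rather than storing an end time, the query derives it with SQLite's `datetime()` function and a `'+N minutes'` modifier. Parent tables are cut down and the appointment values are placeholders.

```python
import sqlite3

sql_create_appointment = """
CREATE TABLE IF NOT EXISTS Appointment (
    appointment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    patient_id INTEGER NOT NULL,
    doctor_id INTEGER NOT NULL,
    hospital_id INTEGER NOT NULL,
    appointment_start TEXT NOT NULL,   -- ISO 'YYYY-MM-DD HH:MM:SS'
    duration_minutes INTEGER CHECK(duration_minutes > 0),
    reason TEXT,
    status TEXT,
    FOREIGN KEY (patient_id) REFERENCES Patient(patient_id),
    FOREIGN KEY (doctor_id) REFERENCES Doctor(doctor_id),
    FOREIGN KEY (hospital_id) REFERENCES Hospital(hospital_id)
);
"""

demo = sqlite3.connect(":memory:")
demo.execute("PRAGMA foreign_keys = ON;")
demo.executescript("""
CREATE TABLE Patient (patient_id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE Doctor (doctor_id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE Hospital (hospital_id INTEGER PRIMARY KEY AUTOINCREMENT);
INSERT INTO Patient DEFAULT VALUES;
INSERT INTO Doctor DEFAULT VALUES;
INSERT INTO Hospital DEFAULT VALUES;
""")
demo.execute(sql_create_appointment)

demo.execute(
    "INSERT INTO Appointment (patient_id, doctor_id, hospital_id, appointment_start, "
    "duration_minutes, reason, status) VALUES (?, ?, ?, ?, ?, ?, ?);",
    (1, 1, 1, "2024-03-15 09:00:00", 30, "Routine review", "Scheduled"),
)

# End time is derived, not stored: start + duration via a datetime() modifier.
start, end = demo.execute("""
SELECT appointment_start,
       datetime(appointment_start, '+' || duration_minutes || ' minutes')
FROM Appointment;
""").fetchone()
print(start, "->", end)
```

Deriving the end time keeps the table free of a redundant column that could drift out of sync with `duration_minutes`.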


## LABTESTS table

Lab test results must be stored for individual patients. A patient may have many lab results, but each lab result belongs to exactly one patient (many-to-one). Each result is requested by one doctor, who may request many results (many-to-one), and each result corresponds to one test type (many-to-one).

To achieve this, a LABTESTS table acts as a catalogue of test types, and a LABRESULTS table, a junction table with attributes, links each result to a test type, a patient and the requesting doctor. LABRESULTS also records the date of the request, the result value, the date of the result, and optionally whether the result is normal and free-text notes. The reference range is stored once per test type in LABTESTS.

| Field             | Type         | Notes                                              |
| ----------------- | ------------ | -------------------------------------------------- |
| `lab_test_id`     | INT          | **Primary key**                                    |
| `name`            | VARCHAR(150) | e.g. “HbA1c”, “CRP”                                |
| `description`     | TEXT         | optional                                           |
| `units`           | VARCHAR(20)  | e.g. “mmol/mol”, “mg/L”                            |
| `reference_range` | VARCHAR(50)  |                                                    |
| `sample_type`     | VARCHAR(50)  | e.g. “Blood”, “Urine”                              |
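
A minimal sketch of the lab test catalogue, assuming the table name `LabTest` (matching the foreign-key note in LABRESULTS). Making `name` `NOT NULL UNIQUE` is an assumption, mirroring the Medication catalogue. The two rows use the examples from the table above; `reference_range` is left `NULL` rather than inventing values.

```python
import sqlite3

sql_create_lab_test = """
CREATE TABLE IF NOT EXISTS LabTest (
    lab_test_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    description TEXT,
    units TEXT,
    reference_range TEXT,
    sample_type TEXT
);
"""

demo = sqlite3.connect(":memory:")
demo.execute(sql_create_lab_test)

demo.executemany(
    "INSERT INTO LabTest (name, units, sample_type) VALUES (?, ?, ?);",
    [("HbA1c", "mmol/mol", "Blood"), ("CRP", "mg/L", "Blood")],
)

# Look up the reporting units for each test type.
units = dict(demo.execute("SELECT name, units FROM LabTest;").fetchall())
print(units)
```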


## LABRESULTS table

| Field            | Type        | Notes                            |
| ---------------- | ----------- | -------------------------------- |
| `lab_result_id`  | INT         | **Primary key**                  |
| `lab_test_id`    | INT         | **FK → LabTest**                 |
| `patient_id`     | INT         | **FK → Patient**                 |
| `doctor_id`      | INT         | **FK → Doctor**                  |
| `requested_date` | DATE        | when the doctor ordered the test |
| `result_date`    | DATE        | when the lab produced the result |
| `result_value`   | VARCHAR(50) | numeric or text result           |
| `is_normal`      | BOOLEAN     | optional                         |
| `notes`          | TEXT        | optional commentary              |
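
A minimal sketch of the LabResult table, assuming the table name `LabResult`. SQLite has no separate Boolean storage class, so `is_normal` is stored as an INTEGER restricted to 0 or 1 (NULL still allowed, since it is optional). Two further assumptions not in the design: `result_date` may be NULL until the lab reports, and a table-level `CHECK` rejects a result dated before its request (ISO date strings compare correctly as text). Parent tables are cut down and the values are placeholders.

```python
import sqlite3

sql_create_lab_result = """
CREATE TABLE IF NOT EXISTS LabResult (
    lab_result_id INTEGER PRIMARY KEY AUTOINCREMENT,
    lab_test_id INTEGER NOT NULL,
    patient_id INTEGER NOT NULL,
    doctor_id INTEGER NOT NULL,
    requested_date TEXT NOT NULL,                  -- ISO 'YYYY-MM-DD'
    result_date TEXT,                              -- NULL until the lab reports
    result_value TEXT,
    is_normal INTEGER CHECK(is_normal IN (0, 1)),  -- 0/1 in place of BOOLEAN
    notes TEXT,
    CHECK (result_date IS NULL OR result_date >= requested_date),
    FOREIGN KEY (lab_test_id) REFERENCES LabTest(lab_test_id),
    FOREIGN KEY (patient_id) REFERENCES Patient(patient_id),
    FOREIGN KEY (doctor_id) REFERENCES Doctor(doctor_id)
);
"""

demo = sqlite3.connect(":memory:")
demo.execute("PRAGMA foreign_keys = ON;")
demo.executescript("""
CREATE TABLE LabTest (lab_test_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE);
CREATE TABLE Patient (patient_id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE Doctor (doctor_id INTEGER PRIMARY KEY AUTOINCREMENT);
INSERT INTO LabTest (name) VALUES ('CRP');
INSERT INTO Patient DEFAULT VALUES;
INSERT INTO Doctor DEFAULT VALUES;
""")
demo.execute(sql_create_lab_result)

insert = ("INSERT INTO LabResult (lab_test_id, patient_id, doctor_id, requested_date, "
          "result_date, result_value, is_normal) VALUES (?, ?, ?, ?, ?, ?, ?);")

# Result dated after the request: accepted.
demo.execute(insert, (1, 1, 1, "2024-05-01", "2024-05-02", "3", 1))

# Result dated before the request: rejected by the table-level CHECK.
try:
    demo.execute(insert, (1, 1, 1, "2024-05-10", "2024-05-09", "12", 0))
    backdated_rejected = False
except sqlite3.IntegrityError:
    backdated_rejected = True

print("Back-dated result rejected:", backdated_rejected)
```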
