-
Notifications
You must be signed in to change notification settings - Fork 135
SE 13 Database Design Principles
Part of the Software Engineering Principles series
The database is the persistent foundation of most applications. Unlike code, which can be refactored and redeployed, the database schema and the data it holds are far harder to change once in production. A poorly designed schema propagates bad data, degrades performance, and constrains every future development decision.
Good database design produces schemas that:
- Accurately represent the business domain
- Enforce data integrity constraints
- Perform well at expected data volumes
- Can be extended without destructive migrations
A table models an entity type (patients, bills, medicines). Each row is one instance of that entity. Each column is an attribute of that entity.
Every table must have a primary key — a column or set of columns whose values uniquely identify each row.
CREATE TABLE patient (
id BIGINT NOT NULL AUTO_INCREMENT,
name VARCHAR(200) NOT NULL,
dob DATE,
PRIMARY KEY (id)
);Use surrogate keys (auto-generated integers or UUIDs) for primary keys in most cases. Avoid using natural keys (like national ID numbers) as primary keys because they can change and they couple your schema to external systems.
A foreign key is a column in one table that references the primary key of another, enforcing referential integrity.
CREATE TABLE bill (
id BIGINT NOT NULL AUTO_INCREMENT,
patient_id BIGINT NOT NULL,
net_value DECIMAL(15,2) NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (patient_id) REFERENCES patient(id)
);A foreign key constraint prevents inserting a bill for a non-existent patient and prevents deleting a patient who has bills.
Normalisation is the process of structuring a database to reduce data redundancy and improve integrity. It is formalised as a series of "Normal Forms" (1NF, 2NF, 3NF, etc.).
- Each column contains atomic (indivisible) values
- No repeating groups or arrays in a single column
-- Bad: storing multiple phone numbers in one column
patient: id, name, phones = "077-123, 011-456"
-- Good: separate table for phone numbers
patient_phone: patient_id, phone_number, phone_type- Satisfies 1NF
- Every non-key column depends on the entire primary key (relevant for composite keys)
-- Bad: a composite key where description depends on only one part
order_item: (order_id, medicine_id, quantity, medicine_description)
-- medicine_description depends only on medicine_id, not the combination
-- Good: medicine description lives in the medicine table
medicine: (id, name, description)
order_item: (order_id, medicine_id, quantity)- Satisfies 2NF
- No non-key column depends on another non-key column (no transitive dependencies)
-- Bad: department_location depends on department_id, not bill_id
bill: (id, patient_id, department_id, department_location)
-- Good: location is in the department table
department: (id, name, location)
bill: (id, patient_id, department_id)Aim for 3NF in most cases. Denormalise (deliberately introduce redundancy) only when a specific performance requirement demands it, and document the reason.
The most common relationship. One patient has many bills.
-- patient (1) ──< bill (many)
bill.patient_id references patient.idImplemented with a junction table. A prescription can contain many medicines, and a medicine can appear in many prescriptions.
CREATE TABLE prescription_item (
prescription_id BIGINT NOT NULL,
medicine_id BIGINT NOT NULL,
quantity INT NOT NULL,
dosage VARCHAR(100),
PRIMARY KEY (prescription_id, medicine_id),
FOREIGN KEY (prescription_id) REFERENCES prescription(id),
FOREIGN KEY (medicine_id) REFERENCES medicine(id)
);One row in table A corresponds to exactly one row in table B. Useful when a large or rarely-needed subset of attributes should be split into a separate table.
Constraints enforce business rules at the database level — they cannot be bypassed by application bugs.
CREATE TABLE stock_item (
id BIGINT NOT NULL AUTO_INCREMENT,
medicine_id BIGINT NOT NULL,
quantity INT NOT NULL DEFAULT 0,
unit_price DECIMAL(10,2) NOT NULL,
CONSTRAINT chk_quantity_non_negative CHECK (quantity >= 0),
CONSTRAINT chk_price_positive CHECK (unit_price > 0),
PRIMARY KEY (id)
);| Constraint | Purpose |
|---|---|
NOT NULL |
Column must have a value |
UNIQUE |
No two rows can have the same value in this column |
PRIMARY KEY |
Unique, not null — identifies the row |
FOREIGN KEY |
References a row in another table |
CHECK |
Value must satisfy a condition |
DEFAULT |
Value to use when none is provided |
A transaction is a unit of work that either completes entirely or not at all (atomicity). Transactions ensure that related changes succeed or fail together.
@Transactional
public void transferStock(StockItem from, StockItem to, int quantity) {
from.deduct(quantity); // If this succeeds...
to.add(quantity); // ...this must also succeed, or both roll back
stockRepo.save(from);
stockRepo.save(to);
}
// If any step throws an exception, the entire transaction rolls backThe ACID properties of transactions:
- Atomicity — all or nothing
- Consistency — the database moves from one valid state to another
- Isolation — concurrent transactions do not interfere
- Durability — committed data survives system failure
An index is a data structure that allows the database to find rows quickly without scanning the entire table.
- Columns used frequently in WHERE clauses
- Columns used in JOIN conditions (foreign keys)
- Columns used in ORDER BY when large result sets are sorted
-- Without index: full table scan for every lookup
SELECT * FROM bill WHERE patient_id = 42;
-- With index: direct lookup
CREATE INDEX idx_bill_patient ON bill (patient_id);| Benefit | Cost |
|---|---|
| Faster reads | Slower writes (index must be updated) |
| Faster joins | More disk space |
| Faster sorts | More maintenance |
Do not index every column. Index columns that are actually used in queries on large tables. Over-indexing degrades write performance.
An index on multiple columns is most useful when queries filter on those columns together, in the same order as the index.
-- Useful for: WHERE department_id = 5 AND created_date > '2025-01-01'
CREATE INDEX idx_bill_dept_date ON bill (department_id, created_date);Consistent naming makes schemas readable.
| Object | Convention | Example |
|---|---|---|
| Table | lowercase, underscore |
patient, bill_item, stock_transfer
|
| Column | lowercase, underscore |
patient_id, created_date, net_value
|
| Primary key | id |
id |
| Foreign key | {referenced_table}_id |
patient_id, department_id
|
| Index | idx_{table}_{column(s)} |
idx_bill_patient, idx_stock_dept_date
|
| Constraint | chk_{table}_{description} |
chk_stock_quantity_non_negative |
A migration script modifies an existing database schema in a controlled, repeatable way. Every schema change — adding a table, adding a column, adding an index — should be implemented as a versioned migration script.
- Never modify a deployed migration — once it has run in production, it is permanent history
- Make migrations reversible — include rollback steps where possible
- Test migrations against a copy of production data before deploying
- Keep migrations small — one logical change per migration file
-- v2.1.5/migration.sql
-- Add expiry tracking to stock items
ALTER TABLE stock_item
ADD COLUMN expiry_date DATE NULL,
ADD COLUMN batch_number VARCHAR(50) NULL;
CREATE INDEX idx_stock_expiry ON stock_item (expiry_date);Previous: SE-12: Security Principles
Next: SE-14: Agile and Scrum