<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/march-2022/notebook-march-2022-solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 March 2022 - Solutions Notebook

This notebook contains **complete solutions** for the March 2022 exam.

**Exam Structure:**
- Section A: 10 MCQs (on VLE separately)
- Section B: Answer 2 of 3 questions
  - Q2: XML Family Tree (English Monarchy)
  - Q3: Wikidata SPARQL
  - Q4: Hospital Database

**Note:** Section A MCQs are completed separately on the VLE.

---

# 1. Environment Setup

Run these cells first to set up MySQL, xmllint, and SPARQL.

In [None]:
# === MySQL Setup ===
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# Create user and database
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON exam_db.* TO 'examuser'@'localhost';"

# === xmllint Setup (for XML/XPath exercises) ===
!apt -y -qq install libxml2-utils > /dev/null

# === Python libraries ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0 lxml sparqlwrapper

%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

print("MySQL ready!")
print("xmllint ready!")
print("SPARQLWrapper ready!")

In [None]:
# === SPARQL Setup (for Wikidata queries) ===
from SPARQLWrapper import SPARQLWrapper, JSON
import re

def run_sparql(query, limit=50):
    """Run a SPARQL query against Wikidata and print results."""
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

    # Only add LIMIT if not already in query
    if not re.search(r'\bLIMIT\b', query, re.IGNORECASE):
        query = query + f"\nLIMIT {limit}"

    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Print results dynamically based on SELECT variables
    variables = results["head"]["vars"]  # avoid shadowing the built-in vars()
    for result in results["results"]["bindings"]:
        row = [f"{var}: {result[var]['value']}" for var in variables if var in result]
        print("  ".join(row))

    return results

print("SPARQL ready!")
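
The helper above relies on the layout of the SPARQL JSON results format: a `head` listing the variables and one `bindings` entry per row. The sketch below shows that layout on a hand-written sample rather than live data, so the `sample_results` dictionary and the `format_rows` name are assumptions made for illustration.

```python
# Hand-written sample in the SPARQL JSON results layout (not fetched from Wikidata).
sample_results = {
    "head": {"vars": ["person", "personLabel"]},
    "results": {
        "bindings": [
            {
                "person": {"type": "uri", "value": "http://www.wikidata.org/entity/Q76"},
                "personLabel": {"type": "literal", "xml:lang": "en", "value": "Barack Obama"},
            },
            # A row can omit a variable entirely when it is unbound.
            {"person": {"type": "uri", "value": "http://www.wikidata.org/entity/Q5582"}},
        ]
    },
}


def format_rows(results):
    """Return one formatted string per row, skipping unbound variables."""
    variables = results["head"]["vars"]
    return [
        "  ".join(f"{v}: {row[v]['value']}" for v in variables if v in row)
        for row in results["results"]["bindings"]
    ]


for line in format_rows(sample_results):
    print(line)
```

The `if v in row` guard matters because an unbound variable is simply missing from the row instead of being present with an empty value.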

---

# Question 2: XML Family Tree (English Monarchy) [30 marks]

## Sample XML Data

In [None]:
%%writefile royals.xml
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"
         from="1485-08-22" to="1509-04-21" />
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"
               from="1509-04-22" to="1547-01-28" />
        <relationship type="marriage" spouse="#CatherineOfAragon"
                      from="1509-06-11" to="1533-05-23">
          <children>
            <royal name="Mary">
              <title rank="queen" territory="England" regnal="I"
                     from="1553-07-19" to="1558-11-17" />
              <relationship type="marriage" spouse="#PhilipOfSpain"
                            from="1554-07-25"/>
            </royal>
          </children>
        </relationship>
        <relationship type="marriage" spouse="#AnneBoleyn"
                      from="1533-01-25" to="1536-05-17">
          <children>
            <royal name="Elizabeth">
              <title rank="queen" territory="England" regnal="I"
                     from="1558-11-17" to="1603-03-24" />
            </royal>
          </children>
        </relationship>
        <relationship type="marriage" spouse="#JaneSeymour"
                      from="1536-05-30" to="1537-10-24">
          <children>
            <royal name="Edward">
              <title rank="king" territory="England" regnal="VI"
                     from="1547-01-28" to="1553-07-06" />
            </royal>
          </children>
        </relationship>
      </royal>
    </children>
  </relationship>
</royal>

## Q2(a): Identify Elements and Attributes [2 marks]

**Question:** Give two examples of element names and two examples of attribute names.

### Solution

In [None]:
# Q2(a) SOLUTION
print("""Element Names (examples):
1. royal
2. title
(Also valid: relationship, children)

Attribute Names (examples):
1. rank
2. territory
(Also valid: name, xml:id, regnal, from, to, type, spouse)

---
Key Distinction:
- Elements: Container tags like <royal>...</royal>, <title/>
- Attributes: Name-value pairs inside tags like rank="king"
""")

## Q2(b): XPath Query Analysis [3 marks]

**Question:** What will be the result of:
```xpath
//title[@rank="king" and @regnal="VIII"]/../royal[@name="Henry"]
```

### Solution

In [None]:
# Q2(b) SOLUTION
print("""Answer: This query returns NOTHING (empty result).

Query Breakdown:
1. //title[@rank="king" and @regnal="VIII"]
   -> Finds Henry VIII's title element

2. /..
   -> Goes UP to parent: <royal name="Henry" xml:id="HenryVIII">

3. /royal[@name="Henry"]
   -> Looks for a DIRECT CHILD <royal> with name="Henry"

Problem: Henry VIII's <royal> element has NO direct child named <royal name="Henry">.
His children (Mary, Elizabeth, Edward) are nested inside <relationship><children>.

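Note: switching to a descendant step (/..//royal[@name="Henry"]) would still return
nothing here, because none of Henry VIII's descendants is named Henry.
To select Henry VIII himself, stop at the parent: //title[@rank="king" and @regnal="VIII"]/..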
""")

In [None]:
# Demonstrate with xmllint
print("\n=== Testing with xmllint ===")
print("Step 1: Find the title")
!xmllint --xpath '//title[@rank="king" and @regnal="VIII"]' royals.xml 2>&1

In [None]:
print("\n\nStep 2: Full query (returns nothing)")
!xmllint --xpath '//title[@rank="king" and @regnal="VIII"]/../royal[@name="Henry"]' royals.xml 2>&1 || echo "No match found (as expected)"
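
The same check can be sketched in plain Python with the standard library's `xml.etree.ElementTree`, assuming a trimmed copy of `royals.xml` embedded as a string. ElementTree supports only a subset of XPath and has no `and` inside predicates, so the two attribute tests are chained instead; that substitution is an adaptation, not the exam's exact expression.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id

# Trimmed copy of royals.xml (dates and later marriages omitted for brevity).
ROYALS_XML = """
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"/>
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"/>
        <relationship type="marriage" spouse="#CatherineOfAragon">
          <children>
            <royal name="Mary">
              <title rank="queen" territory="England" regnal="I"/>
            </royal>
          </children>
        </relationship>
      </royal>
    </children>
  </relationship>
</royal>
"""

root = ET.fromstring(ROYALS_XML)

# Chained predicates stand in for: @rank="king" and @regnal="VIII"
title_path = ".//title[@rank='king'][@regnal='VIII']"

# The exam query: title -> parent royal -> direct child royal named Henry.
exam_query = root.findall(title_path + "/../royal[@name='Henry']")

# Stopping at the parent step selects Henry VIII himself.
parent_only = root.findall(title_path + "/..")

print("Exam query matches:", len(exam_query))
print("Parent of the title:", [el.get(XML_ID) for el in parent_only])
```

The empty first result and the single `HenryVIII` parent confirm that the final child step is what removes every match.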

## Q2(c): Deep XPath Navigation [3 marks]

**Question:** What will be returned by:
```xpath
//title[@rank="king" or @rank="queen"]/../relationship/children/royal/relationship/children/royal/
```

### Solution

In [None]:
# Q2(c) SOLUTION
print("""Answer: This query returns all <royal> elements that are GRANDCHILDREN
(through relationships) of any king or queen.

Query Breakdown:
1. //title[@rank="king" or @rank="queen"]
   -> All titles of monarchs

2. /..
   -> Parent <royal> (the monarch themselves)

3. /relationship/children/royal
   -> Monarch's children (first generation)

4. /relationship/children/royal
   -> Children's children = GRANDCHILDREN (second generation)

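In this data the result is Mary, Elizabeth and Edward: the grandchildren of
Henry VII through Henry VIII. No other monarch here has grandchildren recorded.
Note: the trailing "/" in the question is not valid XPath and must be dropped to run it.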
""")

In [None]:
# Run the query itself (without the trailing "/", which is not valid XPath)
print("\n=== Testing with xmllint ===")
print("Grandchildren of monarchs:")
!xmllint --xpath '//title[@rank="king" or @rank="queen"]/../relationship/children/royal/relationship/children/royal/@name' royals.xml 2>&1
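
As a cross-check, the two-generation walk can be sketched with `xml.etree.ElementTree`. ElementTree's XPath subset has no `or`, so the monarch test is done in Python; the embedded XML is a trimmed copy of `royals.xml`, and the constant and variable names are chosen for this sketch only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of royals.xml: structure kept, dates and territories dropped.
ROYALS_XML = """
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" regnal="VII"/>
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" regnal="VIII"/>
        <relationship type="marriage" spouse="#CatherineOfAragon">
          <children>
            <royal name="Mary">
              <title rank="queen" regnal="I"/>
              <relationship type="marriage" spouse="#PhilipOfSpain"/>
            </royal>
          </children>
        </relationship>
        <relationship type="marriage" spouse="#AnneBoleyn">
          <children>
            <royal name="Elizabeth"><title rank="queen" regnal="I"/></royal>
          </children>
        </relationship>
        <relationship type="marriage" spouse="#JaneSeymour">
          <children>
            <royal name="Edward"><title rank="king" regnal="VI"/></royal>
          </children>
        </relationship>
      </royal>
    </children>
  </relationship>
</royal>
"""

root = ET.fromstring(ROYALS_XML)

# Two child-generation steps, relative to a monarch's <royal> element.
TWO_GENERATIONS = "./relationship/children/royal/relationship/children/royal"

grandchildren = []
for royal in root.iter("royal"):
    ranks = {title.get("rank") for title in royal.findall("title")}
    if ranks & {"king", "queen"}:  # stands in for @rank="king" or @rank="queen"
        for grandchild in royal.findall(TWO_GENERATIONS):
            if grandchild not in grandchildren:  # keep each element once
                grandchildren.append(grandchild)

names = [el.get("name") for el in grandchildren]
print(names)
```

Only Henry VII contributes matches: Henry VIII's children have no `<children>` of their own in this data, so starting from him the second step finds nothing.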

## Q2(d): Add XML Fragment [4 marks]

**Question:** Mary I was also queen consort of Spain from 16 January 1556 until her death. Give an XML fragment.

### Solution

In [None]:
# Q2(d) SOLUTION
print("""XML Fragment to add inside <royal name="Mary">:

<title rank="queen" territory="Spain" regnal="consort"
       from="1556-01-16" to="1558-11-17"/>

Location: Inside Mary's <royal> element, alongside her existing English title.

Complete context:
<royal name="Mary">
  <title rank="queen" territory="England" regnal="I"
         from="1553-07-19" to="1558-11-17" />
  <title rank="queen" territory="Spain" regnal="consort"
         from="1556-01-16" to="1558-11-17"/>  <!-- NEW -->
  <relationship type="marriage" spouse="#PhilipOfSpain"
                from="1554-07-25"/>
</royal>

Key decisions:
- Follows existing attribute pattern (rank, territory, regnal, from, to)
- Uses ISO 8601 date format (YYYY-MM-DD)
- Death date (1558-11-17) used as 'to' date
""")

In [None]:
# Verify well-formedness
!echo '<title rank="queen" territory="Spain" regnal="consort" from="1556-01-16" to="1558-11-17"/>' | xmllint --noout - && echo "Fragment is well-formed!"
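
To see the fragment in place rather than only in isolation, here is a minimal sketch that parses Mary's element and inserts the new title after her existing one. It uses `xml.etree.ElementTree`; the variable names are illustrative, and placing the new title directly after the English one is an assumption about layout, since the question only asks for the fragment.

```python
import xml.etree.ElementTree as ET

# Mary's element as it appears in royals.xml.
mary = ET.fromstring(
    '<royal name="Mary">'
    '<title rank="queen" territory="England" regnal="I"'
    ' from="1553-07-19" to="1558-11-17"/>'
    '<relationship type="marriage" spouse="#PhilipOfSpain" from="1554-07-25"/>'
    '</royal>'
)

# The fragment from the solution; parsing it also checks well-formedness.
consort_title = ET.fromstring(
    '<title rank="queen" territory="Spain" regnal="consort"'
    ' from="1556-01-16" to="1558-11-17"/>'
)

# Insert after the existing titles so that titles stay grouped together.
existing_titles = mary.findall("title")
mary.insert(len(existing_titles), consort_title)

territories = [title.get("territory") for title in mary.findall("title")]
print(territories)
print(ET.tostring(mary, encoding="unicode"))
```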

## Q2(e): Strengths and Weaknesses [7 marks]

### Solution

In [None]:
# Q2(e) SOLUTION
print("""STRENGTHS of XML for Family Tree Data:

| Strength           | Explanation                                           |
|--------------------|-------------------------------------------------------|
| Natural hierarchy  | XML's tree structure mirrors family tree relationships|
| Self-describing    | Tags like <royal>, <relationship>, <children>         |
|                    | are intuitive                                         |
| Flexibility        | Easy to add new attributes or elements                |
| Human readable     | Can be read and edited without special tools          |
| Standard format    | Wide tool support for parsing, validation             |

WEAKNESSES of XML for Family Tree Data:

| Weakness            | Explanation                                          |
|---------------------|------------------------------------------------------|
| Verbosity           | Repeated tags create large files                     |
| Complex queries     | Deep nesting makes XPath cumbersome                  |
| Redundancy          | Same person may appear multiple times                |
| Limited cross-refs  | Hard to link across branches (graph relationships)   |
| Scalability         | Large genealogies become unwieldy                    |

Fundamental Issue: Genealogical data is GRAPH-like, not tree-like.
- A person has TWO parents (from different branches)
- Marriages connect different family trees
- XML's tree model struggles with this.
""")

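The cross-reference weakness can be made concrete. In the sample data a spouse is recorded only as a string such as `#ElizabethOfYork`, and nothing forces that string to point at a real element. The sketch below, which assumes a cut-down copy of the document and illustrative variable names, builds an index of `xml:id` values and lists the spouse references that do not resolve.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id

# Cut-down copy of royals.xml keeping only ids and spouse references.
doc = ET.fromstring("""
<royal name="Henry" xml:id="HenryVII">
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Henry" xml:id="HenryVIII">
        <relationship type="marriage" spouse="#CatherineOfAragon"/>
      </royal>
    </children>
  </relationship>
</royal>
""")

# Index every <royal> that carries an xml:id.
by_id = {el.get(XML_ID): el for el in doc.iter("royal") if el.get(XML_ID)}

# A spouse reference resolves only if its target id exists in the document.
unresolved = sorted(
    rel.get("spouse")
    for rel in doc.iter("relationship")
    if rel.get("spouse", "").lstrip("#") not in by_id
)

print("Known ids:", sorted(by_id))
print("Unresolved spouse references:", unresolved)
```

Both spouses are dangling here because they belong to other family trees that the document does not contain, which is the graph-versus-tree problem described above.
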
## Q2(f): RDF vs Relational - Who is correct? [1 mark]

### Solution

In [None]:
# Q2(f) SOLUTION
print("""Answer: BOTH are correct - the choice depends on requirements.

| Approach    | Best For                                              |
|-------------|-------------------------------------------------------|
| RDF         | Flexible cross-references, linking to external data,  |
|             | semantic reasoning                                    |
| Relational  | Structured queries, ACID transactions, well-defined   |
|             | schema, reporting                                     |

Genealogical data IS graph-like, so RDF captures this naturally.
But relational databases can also model graphs using:
- Junction tables for many-to-many
- Self-referential foreign keys
- Recursive queries (CTEs)
""")

## Q2(g): Implementing in Relational or RDF [10 marks]

### Solution (Relational Approach)

In [None]:
# Q2(g) SOLUTION - Relational Database Approach
print("""How Relational Addresses XML Weaknesses:

| XML Weakness     | Relational Solution                              |
|------------------|--------------------------------------------------|
| Verbosity        | Normalized tables eliminate redundancy           |
| Complex queries  | SQL JOINs are often simpler than deep XPath      |
| Redundancy       | Each person stored once, referenced by ID        |
| Cross-references | Foreign keys naturally link entities             |
| Scalability      | Databases optimized for large datasets           |
""")

In [None]:
%%sql
-- Q2(g) SOLUTION: Relational Schema for Royal Family Tree

DROP TABLE IF EXISTS ParentChild;
DROP TABLE IF EXISTS RoyalRelationships;
DROP TABLE IF EXISTS Titles;
DROP TABLE IF EXISTS Royals;

CREATE TABLE Royals (
    Id VARCHAR(50) PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

CREATE TABLE Titles (
    Id INT AUTO_INCREMENT PRIMARY KEY,
    RoyalId VARCHAR(50) NOT NULL,
    `Rank` VARCHAR(20),
    Territory VARCHAR(50),
    Regnal VARCHAR(10),
    FromDate DATE,
    ToDate DATE,
    FOREIGN KEY (RoyalId) REFERENCES Royals(Id)
);

CREATE TABLE RoyalRelationships (
    Id INT AUTO_INCREMENT PRIMARY KEY,
    RoyalId VARCHAR(50) NOT NULL,
    Type VARCHAR(20),
    SpouseId VARCHAR(50),
    FromDate DATE,
    ToDate DATE,
    FOREIGN KEY (RoyalId) REFERENCES Royals(Id),
    FOREIGN KEY (SpouseId) REFERENCES Royals(Id)
);

CREATE TABLE ParentChild (
    ParentId VARCHAR(50),
    ChildId VARCHAR(50),
    RelationshipId INT,
    PRIMARY KEY (ParentId, ChildId),
    FOREIGN KEY (ParentId) REFERENCES Royals(Id),
    FOREIGN KEY (ChildId) REFERENCES Royals(Id),
    FOREIGN KEY (RelationshipId) REFERENCES RoyalRelationships(Id)
);

SELECT 'Relational schema for royals created!' AS Status;

In [None]:
%%sql
-- Insert sample data
INSERT INTO Royals VALUES ('HenryVII', 'Henry'), ('HenryVIII', 'Henry'), ('Mary', 'Mary');

INSERT INTO Titles (RoyalId, `Rank`, Territory, Regnal, FromDate, ToDate) VALUES
('HenryVII', 'king', 'England', 'VII', '1485-08-22', '1509-04-21'),
('HenryVIII', 'king', 'England', 'VIII', '1509-04-22', '1547-01-28'),
('Mary', 'queen', 'England', 'I', '1553-07-19', '1558-11-17');

SELECT 'Sample data inserted!' AS Status;

In [None]:
%%sql
-- Example query: Find all monarchs
SELECT R.Name, T.Rank, T.Territory, T.Regnal
FROM Royals R
INNER JOIN Titles T ON R.Id = T.RoyalId
WHERE T.Rank IN ('king', 'queen');
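
The `ParentChild` table is created above but never populated or queried. The sketch below shows how it could answer a descendants question with a recursive CTE. It runs on Python's built-in `sqlite3` rather than the notebook's MySQL so that it is self-contained, and the `ElizabethOfYork` row and all `ParentChild` rows are sample data assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Royals (Id TEXT PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE ParentChild (
    ParentId TEXT REFERENCES Royals(Id),
    ChildId  TEXT REFERENCES Royals(Id),
    PRIMARY KEY (ParentId, ChildId)
);
INSERT INTO Royals VALUES
    ('HenryVII', 'Henry'), ('ElizabethOfYork', 'Elizabeth'),
    ('HenryVIII', 'Henry'), ('Mary', 'Mary');
-- One row per parent: a child with two parents simply has two rows.
INSERT INTO ParentChild VALUES
    ('HenryVII', 'HenryVIII'), ('ElizabethOfYork', 'HenryVIII'),
    ('HenryVIII', 'Mary');
""")

# Walk downwards from one ancestor, tracking the generation depth.
descendants = conn.execute("""
WITH RECURSIVE Descendants(Id, Depth) AS (
    SELECT ChildId, 1 FROM ParentChild WHERE ParentId = ?
    UNION
    SELECT PC.ChildId, D.Depth + 1
    FROM Descendants D
    JOIN ParentChild PC ON PC.ParentId = D.Id
)
SELECT Id, Depth FROM Descendants ORDER BY Depth, Id
""", ("HenryVII",)).fetchall()

parents = [row[0] for row in conn.execute(
    "SELECT ParentId FROM ParentChild WHERE ChildId = ? ORDER BY ParentId",
    ("HenryVIII",),
)]

print(descendants)
print(parents)
```

Each person is stored once in `Royals`, and Henry VIII's two parents are two ordinary rows, which is the situation the nested XML could not express without repeating him.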

---

# Question 3: Wikidata SPARQL [30 marks]

## Reference: Wikidata URIs

| URI | Meaning |
|-----|--------|
| `wdt:P19` | place of birth |
| `wdt:P31` | instance of (like rdf:type) |
| `wdt:P131` | located in administrative territorial entity |
| `wd:Q5` | human |
| `wd:Q60` | New York City |

## Q3(a): Basic SPARQL Query [2 marks]

**Query:**
```sparql
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19 wd:Q60.
}
```

### Solution

In [None]:
# Q3(a) SOLUTION
print("""This query returns all distinct entities (?person) that are:
1. Instance of human (wdt:P31 wd:Q5)
2. Born in New York City (wdt:P19 wd:Q60)

In other words: ALL HUMANS BORN IN NEW YORK CITY.

Query Breakdown:
| Pattern                  | Meaning                              |
|--------------------------|--------------------------------------|
| ?person wdt:P31 wd:Q5    | ?person is instance of human         |
| ?person wdt:P19 wd:Q60   | ?person's place of birth is NYC      |
| ;                        | Same subject, different predicate    |
| DISTINCT                 | No duplicate results                 |
""")

# Run the query
print("\n=== Running query (first 10 results) ===")
run_sparql("""
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19 wd:Q60.
}
LIMIT 10
""")

## Q3(b): Query Assumptions [2 marks]

### Solution

In [None]:
# Q3(b) SOLUTION
print("""Assumptions the query makes:

1. Each person has wdt:P31 wd:Q5 explicitly stating they are human
2. Each person has wdt:P19 (place of birth) defined
3. The place of birth is EXACTLY wd:Q60 (New York City), not a sub-location

Data Requirements:
- The P31 (instance of) property must be set to Q5 (human)
- The P19 (place of birth) must be set to exactly Q60
- People born in boroughs like "Queens" or "Manhattan" won't match
  if those aren't identified as Q60

| Scenario                        | Will It Match? |
|---------------------------------|----------------|
| Person with P19 = Q60           | Yes            |
| Person with P19 = "Queens"      | No             |
| Person with no P19 at all       | No             |
| Person without P31 Q5           | No             |
""")

## Q3(c): Property Path Query [4 marks]

**Query with property path:**
```sparql
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19/wdt:P131* wd:Q60.
}
```

### Solution

In [None]:
# Q3(c) SOLUTION
print("""Difference: Uses a PROPERTY PATH wdt:P19/wdt:P131* instead of just wdt:P19.

What it means:
- wdt:P19 = place of birth
- /wdt:P131* = zero or more steps up the administrative hierarchy

Resolution of Assumptions:
- RESOLVES: birthplace must be exactly NYC
  -> Now includes Queens, Manhattan, etc. (sub-locations of NYC)
- DOES NOT resolve: P31 or P19 being present

Example: A person born in "Queens" (Q18424) would now match because:
  Queens (Q18424) --P131--> New York City (Q60)

Property Path Syntax:
| Syntax  | Meaning                    |
|---------|----------------------------|
| p1/p2   | p1 followed by p2          |
| p*      | Zero or more of p          |
| p+      | One or more of p           |
| p?      | Zero or one of p           |
""")

# Run the improved query
print("\n=== Running property path query (first 10 results) ===")
run_sparql("""
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19/wdt:P131* wd:Q60.
}
LIMIT 10
""")

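To make the effect of `wdt:P19/wdt:P131*` tangible without a network call, the sketch below imitates it in plain Python on a tiny hand-made graph. The dictionaries, the function names, and the simplification that each place has a single parent are assumptions for illustration; real Wikidata places can have several P131 values.

```python
# Toy stand-ins for wdt:P19 (place of birth) and wdt:P131 (located in).
BIRTH_PLACE = {
    "Person_SongCi": "New_York_City",
    "Person_NehaKapoor": "Queens",
    "Person_JohnSmith": "Boston",
}
LOCATED_IN = {
    "Queens": "New_York_City",
    "Manhattan": "New_York_City",
    "New_York_City": "New_York_State",
}


def reachable(place):
    """Places reached by zero or more LOCATED_IN steps (the * in P131*)."""
    seen = []
    while place is not None and place not in seen:
        seen.append(place)  # zero steps: the starting place itself counts
        place = LOCATED_IN.get(place)
    return seen


def born_in(target, exact):
    """People whose birthplace is the target, exactly or via the hierarchy."""
    return sorted(
        person
        for person, place in BIRTH_PLACE.items()
        if (place == target if exact else target in reachable(place))
    )


print(born_in("New_York_City", exact=True))   # like plain wdt:P19
print(born_in("New_York_City", exact=False))  # like wdt:P19/wdt:P131*
```

Because `*` includes zero steps, a person recorded directly against New York City still matches the path query, so its results are a superset of the original query's.
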
## Q3(d): Why results aren't human-readable [1 mark]

### Solution

In [None]:
# Q3(d) SOLUTION
print("""Answer: The results return ENTITY URIs (like http://www.wikidata.org/entity/Q12345)
rather than human-readable LABELS (like "John Smith").

Without explicitly requesting labels, SPARQL returns only the identifiers.

| Query Returns | You Want         |
|---------------|------------------|
| wd:Q76        | Barack Obama     |
| wd:Q26876     | Taylor Swift     |
| wd:Q5582      | Vincent van Gogh |

Why URIs?
- URIs are unique identifiers
- Labels can be ambiguous ("John Smith" could be many people)
- Labels exist in multiple languages
""")

## Q3(e): Adding Human-Readable Labels [5 marks]

### Solution

In [None]:
# Q3(e) SOLUTION - Method 1: Using rdfs:label with FILTER
print("""Method 1: Using rdfs:label with FILTER

SELECT DISTINCT ?person ?personLabel
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19/wdt:P131* wd:Q60.
  ?person rdfs:label ?personLabel .
  FILTER (lang(?personLabel) = "en")
}
""")

run_sparql("""
SELECT DISTINCT ?person ?personLabel
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19/wdt:P131* wd:Q60.
  ?person rdfs:label ?personLabel .
  FILTER (lang(?personLabel) = "en")
}
LIMIT 10
""")

In [None]:
# Q3(e) SOLUTION - Method 2: Using Wikidata's Label Service
print("""Method 2: Using Wikidata's Label Service (Wikidata-specific)

SELECT DISTINCT ?person ?personLabel
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19/wdt:P131* wd:Q60.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

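How it works:
- For any selected variable ?foo, the service binds ?fooLabel automatically
- Languages are tried in the order listed (e.g. "en,fr" tries English, then French)
- If no label exists in those languages, ?fooLabel falls back to the entity's Q-id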
""")

## Q3(f): IMDB vs Wikidata Comparison [6 marks]

### Solution

In [None]:
# Q3(f) SOLUTION
print("""IMDB vs Wikidata Comparison for Actor Birthplace Search:

| Aspect             | IMDB                      | Wikidata                  |
|--------------------|---------------------------|---------------------------|
| Access             | Web interface only        | Open SPARQL endpoint      |
| Flexibility        | Fixed search parameters   | Arbitrary complex queries |
| Data scope         | Movies/TV only            | All knowledge domains     |
| Data depth         | Detailed film credits     | Basic biographical facts  |
| Programmatic access| Limited/restricted        | Fully open                |
| Data quality       | Curated, professional     | Community-contributed     |

Key Differences:
1. Openness: Wikidata provides free, open SPARQL endpoint.
   IMDB's search is not exposed in their API.

2. Query Power: Wikidata allows complex queries (birth place hierarchies,
   combinations of criteria). IMDB offers fixed search forms.

3. Integration: Wikidata links to other datasets. IMDB is a closed silo.

4. Specialization: IMDB has richer movie-specific data. Wikidata is
   broader but shallower.
""")

## Q3(g): Combining Wikidata and IMDB [4 marks]

### Solution

In [None]:
# Q3(g) SOLUTION
print("""Integration Strategies:

1. Use IMDB IDs stored in Wikidata:
   - Wikidata stores IMDB IDs as property P345
   - Query Wikidata for people, get their IMDB ID, then fetch from IMDB

2. Federated approach:
   - Use Wikidata for biographical queries (birthplace, family)
   - Use IMDB for filmography and ratings
   - Link results by shared identifiers

3. Data enrichment:
   - Export Wikidata results
   - Programmatically look up IMDB details
   - Combine into unified dataset

Example SPARQL to get IMDB IDs:
""")

# Example: Get IMDB IDs for actors born in NYC
print("\n=== Getting IMDB IDs for actors born in NYC ===")
run_sparql("""
SELECT ?person ?personLabel ?imdbId
WHERE {
  ?person wdt:P31 wd:Q5;           # human
          wdt:P106 wd:Q33999;      # occupation: actor
          wdt:P19/wdt:P131* wd:Q60; # born in NYC area
          wdt:P345 ?imdbId.        # has IMDB ID
  ?person rdfs:label ?personLabel .
  FILTER (lang(?personLabel) = "en")
}
LIMIT 10
""")

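The "link results by shared identifiers" step can be sketched as a simple join in Python. Everything in the data below is a placeholder: the Q-ids, labels, IMDb identifiers and film names are invented, and `imdb_details` stands in for whatever a separate IMDb lookup would return. Only the name-page URL pattern is taken from the public IMDb site.

```python
# Placeholder rows shaped like the SPARQL output above (invented values).
wikidata_rows = [
    {"person": "wd:Q_EXAMPLE_A", "label": "Actor A", "imdb_id": "nmEXAMPLE1"},
    {"person": "wd:Q_EXAMPLE_B", "label": "Actor B", "imdb_id": "nmEXAMPLE2"},
]

# Stand-in for details obtained from IMDb, keyed by the same identifier.
imdb_details = {
    "nmEXAMPLE1": {"known_for": "Film X"},
    "nmEXAMPLE3": {"known_for": "Film Z"},
}


def imdb_url(imdb_id):
    """Build the IMDb name-page URL for a person identifier."""
    return f"https://www.imdb.com/name/{imdb_id}/"


# Inner join on the shared identifier: keep only people found on both sides.
merged = [
    {**row, **imdb_details[row["imdb_id"]], "url": imdb_url(row["imdb_id"])}
    for row in wikidata_rows
    if row["imdb_id"] in imdb_details
]

for entry in merged:
    print(entry["label"], entry["known_for"], entry["url"])
```
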
## Q3(h): Triple Table in SQL [2 marks]

### Solution

In [None]:
%%sql
-- Q3(h) SOLUTION: Triple Table Schema and Query

DROP TABLE IF EXISTS Triples;

CREATE TABLE Triples (
    Subject VARCHAR(100),
    Predicate VARCHAR(50),
    Object VARCHAR(100),
    PRIMARY KEY (Subject, Predicate, Object)
);

-- Sample data
INSERT INTO Triples (Subject, Predicate, Object) VALUES
('Person_SongCi', 'InstanceOf', 'Human'),
('Person_SongCi', 'BirthPlace', 'New_York_City'),
('Person_NehaKapoor', 'InstanceOf', 'Human'),
('Person_NehaKapoor', 'BirthPlace', 'Queens'),
('Person_JohnSmith', 'InstanceOf', 'Human'),
('Person_JohnSmith', 'BirthPlace', 'Boston'),
('Queens', 'LocatedIn', 'New_York_City'),
('Manhattan', 'LocatedIn', 'New_York_City'),
('New_York_City', 'LocatedIn', 'New_York_State');

SELECT 'Triples table ready!' AS Status;

In [None]:
%%sql
-- Q3(h) SOLUTION: Find humans born in NYC (direct match)
-- Equivalent to the basic SPARQL query from 3(a)

SELECT DISTINCT T1.Subject AS Person
FROM Triples T1
INNER JOIN Triples T2 ON T1.Subject = T2.Subject
WHERE T1.Predicate = 'InstanceOf'
  AND T1.Object = 'Human'
  AND T2.Predicate = 'BirthPlace'
  AND T2.Object = 'New_York_City';

## Q3(i): SQL with Location Hierarchy [4 marks]

### Solution

In [None]:
%%sql
-- Q3(i) SOLUTION: Find humans born in NYC or sub-locations
-- Pragmatic approach: Multiple self-joins for known hierarchy depth

SELECT DISTINCT T1.Subject AS Person
FROM Triples T1
INNER JOIN Triples T2 ON T1.Subject = T2.Subject
LEFT JOIN Triples T3 ON T2.Object = T3.Subject AND T3.Predicate = 'LocatedIn'
LEFT JOIN Triples T4 ON T3.Object = T4.Subject AND T4.Predicate = 'LocatedIn'
WHERE T1.Predicate = 'InstanceOf'
  AND T1.Object = 'Human'
  AND T2.Predicate = 'BirthPlace'
  AND (T2.Object = 'New_York_City'
       OR T3.Object = 'New_York_City'
       OR T4.Object = 'New_York_City');

In [None]:
# Q3(i) EXPLANATION
print("""Two approaches for hierarchical queries:

Option 1: Multiple Self-Joins (Pragmatic - use in exams)
- Simple to understand and write
- Works when hierarchy depth is known
- Each additional level = one more LEFT JOIN

Option 2: Recursive CTE (Advanced)
- Handles arbitrary depth automatically
- More elegant for deep/variable hierarchies
- More complex syntax

WITH RECURSIVE LocationChain AS (
    -- Base case: direct birth place
    SELECT Subject, Object AS Location
    FROM Triples WHERE Predicate = 'BirthPlace'

    UNION

    -- Recursive case: follow LocatedIn chain
    SELECT LC.Subject, T.Object
    FROM LocationChain LC
    INNER JOIN Triples T ON LC.Location = T.Subject
    WHERE T.Predicate = 'LocatedIn'
)
SELECT DISTINCT T.Subject AS Person
FROM Triples T
INNER JOIN LocationChain LC ON T.Subject = LC.Subject
WHERE T.Predicate = 'InstanceOf' AND T.Object = 'Human'
  AND LC.Location = 'New_York_City';

Why SPARQL is more convenient here: the property path wdt:P131* expresses arbitrary
depth in one short expression, where SQL needs a recursive CTE.
""")

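The recursive CTE above is only printed, not executed. A minimal way to try it is Python's built-in `sqlite3`, which also supports `WITH RECURSIVE`. This sketch swaps SQLite in for the notebook's MySQL as an assumption of convenience, reuses the sample triples from Q3(h), and passes the target location as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Triples (
    Subject TEXT, Predicate TEXT, Object TEXT,
    PRIMARY KEY (Subject, Predicate, Object)
);
INSERT INTO Triples VALUES
    ('Person_SongCi', 'InstanceOf', 'Human'),
    ('Person_SongCi', 'BirthPlace', 'New_York_City'),
    ('Person_NehaKapoor', 'InstanceOf', 'Human'),
    ('Person_NehaKapoor', 'BirthPlace', 'Queens'),
    ('Person_JohnSmith', 'InstanceOf', 'Human'),
    ('Person_JohnSmith', 'BirthPlace', 'Boston'),
    ('Queens', 'LocatedIn', 'New_York_City'),
    ('Manhattan', 'LocatedIn', 'New_York_City'),
    ('New_York_City', 'LocatedIn', 'New_York_State');
""")

HIERARCHY_QUERY = """
WITH RECURSIVE LocationChain(Subject, Location) AS (
    -- Base case: the recorded birthplace
    SELECT Subject, Object FROM Triples WHERE Predicate = 'BirthPlace'
    UNION
    -- Recursive case: climb one LocatedIn step
    SELECT LC.Subject, T.Object
    FROM LocationChain LC
    INNER JOIN Triples T ON LC.Location = T.Subject
    WHERE T.Predicate = 'LocatedIn'
)
SELECT DISTINCT T.Subject
FROM Triples T
INNER JOIN LocationChain LC ON T.Subject = LC.Subject
WHERE T.Predicate = 'InstanceOf' AND T.Object = 'Human'
  AND LC.Location = ?
ORDER BY T.Subject
"""

people = [row[0] for row in conn.execute(HIERARCHY_QUERY, ("New_York_City",))]
print(people)
```

Using `UNION` rather than `UNION ALL` drops repeated rows, which also keeps the recursion from looping forever if the location data ever contained a cycle.
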
---

# Question 4: Hospital Database [30 marks]

## Database Setup

In [None]:
%%sql
-- Drop tables in reverse order of dependencies
DROP TABLE IF EXISTS WorksAt;
DROP TABLE IF EXISTS StayIn;
DROP TABLE IF EXISTS Patients;
DROP TABLE IF EXISTS Wards;
DROP TABLE IF EXISTS Departments;
DROP TABLE IF EXISTS Doctors;
DROP TABLE IF EXISTS Buildings;
DROP TABLE IF EXISTS Hospitals;

-- 1) Hospitals
CREATE TABLE Hospitals (
    Name VARCHAR(100) PRIMARY KEY
);

-- 2) Buildings (run by Hospitals)
CREATE TABLE Buildings (
    Name VARCHAR(100) PRIMARY KEY,
    Address VARCHAR(255),
    RunBy VARCHAR(100) NOT NULL,
    FOREIGN KEY (RunBy) REFERENCES Hospitals(Name)
);

-- 3) Departments (part of Hospitals)
CREATE TABLE Departments (
    Name VARCHAR(100) PRIMARY KEY,
    PartOf VARCHAR(100) NOT NULL,
    Specialisation VARCHAR(100),
    FOREIGN KEY (PartOf) REFERENCES Hospitals(Name)
);

-- 4) Wards (located in Buildings, operated by Departments)
CREATE TABLE Wards (
    Name VARCHAR(100) PRIMARY KEY,
    LocatedIn VARCHAR(100) NOT NULL,
    OperatedBy VARCHAR(100) NOT NULL,
    FOREIGN KEY (LocatedIn) REFERENCES Buildings(Name),
    FOREIGN KEY (OperatedBy) REFERENCES Departments(Name)
);

-- 5) Doctors
CREATE TABLE Doctors (
    Name VARCHAR(100) PRIMARY KEY
);

-- 6) Patients (treated by Doctors)
CREATE TABLE Patients (
    Id INT PRIMARY KEY,
    Name VARCHAR(100),
    DoB DATE,
    TreatedBy VARCHAR(100) NOT NULL,
    FOREIGN KEY (TreatedBy) REFERENCES Doctors(Name)
);

-- 7) StayIn (junction: Patients <-> Wards with dates)
CREATE TABLE StayIn (
    Patient INT,
    Ward VARCHAR(100),
    Arrived DATE NOT NULL,
    Departed DATE,
    PRIMARY KEY (Patient, Ward, Arrived),
    FOREIGN KEY (Patient) REFERENCES Patients(Id),
    FOREIGN KEY (Ward) REFERENCES Wards(Name)
);

-- 8) WorksAt (junction: Doctors <-> Departments, M:N)
CREATE TABLE WorksAt (
    Doctor VARCHAR(100),
    Department VARCHAR(100),
    PRIMARY KEY (Doctor, Department),
    FOREIGN KEY (Doctor) REFERENCES Doctors(Name),
    FOREIGN KEY (Department) REFERENCES Departments(Name)
);

SELECT 'Hospital database schema created!' AS Status;

In [None]:
%%sql
-- Insert sample data
INSERT INTO Hospitals (Name) VALUES ('City Hospital'), ('General Hospital');

INSERT INTO Buildings (Name, Address, RunBy) VALUES
('Main Building', 'Main Street', 'City Hospital'),
('Annex', 'Annex Lane', 'City Hospital'),
('North Wing', 'North Avenue', 'General Hospital'),
('The Alexander Fleming Building', 'Imperial College Rd', 'General Hospital');

INSERT INTO Departments (Name, PartOf, Specialisation) VALUES
('Orthopedics', 'City Hospital', 'Musculoskeletal'),
('Accident & Emergency', 'City Hospital', 'Acute Care'),
('ENT', 'General Hospital', 'Ear/Nose/Throat');

INSERT INTO Wards (Name, LocatedIn, OperatedBy) VALUES
('Ward A', 'Main Building', 'Accident & Emergency'),
('Orthopedics Ward', 'Main Building', 'Orthopedics'),
('Ward B', 'North Wing', 'ENT'),
('Fleming Ward', 'The Alexander Fleming Building', 'ENT');

INSERT INTO Doctors (Name) VALUES ('Song Ci'), ('Neha Kapoor');

INSERT INTO Patients (Id, Name, DoB, TreatedBy) VALUES
(100, 'Neha Ahuja', '1990-05-12', 'Song Ci'),
(101, 'John Smith', '1985-03-22', 'Neha Kapoor');

INSERT INTO StayIn (Patient, Ward, Arrived, Departed) VALUES
(100, 'Ward A', '2023-08-01', '2023-08-15'),
(100, 'Fleming Ward', '2023-09-01', '2023-09-10'),
(101, 'Orthopedics Ward', '2023-08-05', '2023-08-10');

INSERT INTO WorksAt (Doctor, Department) VALUES
('Song Ci', 'Orthopedics'),
('Song Ci', 'ENT'),
('Neha Kapoor', 'Accident & Emergency');

SELECT 'Sample data inserted!' AS Status;

## Q4(a): Which questions can be answered? [3 marks]

### Solution

In [None]:
# Q4(a) SOLUTION
print("""Analysis of which questions can be answered:

| Question | Answerable? | Reasoning                                    |
|----------|-------------|----------------------------------------------|
| (i)      | Yes         | Patient -> Ward -> Building path exists      |
| (ii)     | Yes         | Patient -> Ward -> Building -> Hospital      |
| (iii)    | Partial     | Need Ward -> Department link (OperatedBy)    |
| (iv)     | Yes         | Doctor -> Department -> Hospital             |
| (v)      | Yes         | Building -> Hospital -> Departments          |
| (vi)     | No*         | No direct link Patient -> Doctor             |

*Note: In the adapted model with TreatedBy FK, question (vi) IS answerable.

Summary: Questions i, ii, iv, v are directly answerable.
Questions iii and vi require model modifications.
""")

## Q4(b): Implementation Issue [3 marks]

### Solution

In [None]:
# Q4(b) SOLUTION
print("""Parts that cannot be directly implemented in relational model:

1. Attributes on Relationship (StayIn):
   - 'arrived' and 'departed' are attributes of the relationship, not entities
   - Solution: Create a junction table StayIn with these attributes

2. Many-to-Many Relationship (WorksAt):
   - Doctor to Department is M:N
   - Solution: Create a junction table WorksAt

Junction Table Pattern:

CREATE TABLE StayIn (
    Patient INT,
    Ward VARCHAR(100),
    Arrived DATE,
    Departed DATE,
    PRIMARY KEY (Patient, Ward, Arrived),
    FOREIGN KEY (Patient) REFERENCES Patients(Id),
    FOREIGN KEY (Ward) REFERENCES Wards(Name)
);
""")

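The choice of `(Patient, Ward, Arrived)` as the key is worth testing: it lets the same patient return to the same ward on a later date while rejecting an exact repeat of one stay. The sketch below checks this on Python's built-in `sqlite3` instead of MySQL, omits the foreign keys for brevity, and uses a second stay that is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE StayIn (
    Patient  INTEGER,
    Ward     TEXT,
    Arrived  TEXT NOT NULL,
    Departed TEXT,
    PRIMARY KEY (Patient, Ward, Arrived)
)
""")

# First stay, taken from the notebook's sample data.
conn.execute("INSERT INTO StayIn VALUES (100, 'Ward A', '2023-08-01', '2023-08-15')")

# A later stay in the same ward (invented): allowed, because Arrived differs.
conn.execute("INSERT INTO StayIn VALUES (100, 'Ward A', '2023-10-02', NULL)")

# Same patient, ward and arrival date again: violates the composite key.
try:
    conn.execute("INSERT INTO StayIn VALUES (100, 'Ward A', '2023-08-01', '2023-08-20')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

stay_count = conn.execute("SELECT COUNT(*) FROM StayIn").fetchone()[0]
print(stay_count, duplicate_rejected)
```

With a key of only `(Patient, Ward)`, the second insert would have failed as well, so repeat admissions could not be recorded.
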
## Q4(d): Tables and Keys [5 marks]

### Solution

In [None]:
# Q4(d) SOLUTION
print("""Tables and Keys for Hospital Database:

| Table       | Primary Key              | Foreign Keys                              |
|-------------|--------------------------|-------------------------------------------|
| Hospitals   | Name                     | -                                         |
| Buildings   | Name                     | RunBy -> Hospitals(Name)                  |
| Departments | Name                     | PartOf -> Hospitals(Name)                 |
| Wards       | Name                     | LocatedIn -> Buildings(Name),             |
|             |                          | OperatedBy -> Departments(Name)           |
| Doctors     | Name                     | -                                         |
| Patients    | Id                       | TreatedBy -> Doctors(Name)                |
| StayIn      | (Patient, Ward, Arrived) | Patient -> Patients(Id),                  |
|             |                          | Ward -> Wards(Name)                       |
| WorksAt     | (Doctor, Department)     | Doctor -> Doctors(Name),                  |
|             |                          | Department -> Departments(Name)           |
""")

## Q4(e): SQL Queries [6 marks]

### Solution

In [None]:
%%sql
-- Q4(e)(i) SOLUTION: Which building did patient Neha Ahuja stay in?

SELECT DISTINCT W.LocatedIn AS BuildingName
FROM Patients P
INNER JOIN StayIn S ON P.Id = S.Patient
INNER JOIN Wards W ON S.Ward = W.Name
WHERE P.Name = 'Neha Ahuja';

In [None]:
%%sql
-- Q4(e)(ii) SOLUTION: Which hospital was responsible for Neha Ahuja's stay?

SELECT DISTINCT B.RunBy AS HospitalName
FROM Patients P
INNER JOIN StayIn S ON P.Id = S.Patient
INNER JOIN Wards W ON S.Ward = W.Name
INNER JOIN Buildings B ON W.LocatedIn = B.Name
WHERE P.Name = 'Neha Ahuja';

In [None]:
%%sql
-- Q4(e)(iii) SOLUTION: In which wards are Orthopedics patients housed?

SELECT DISTINCT W.Name AS WardName
FROM Wards W
INNER JOIN Departments D ON W.OperatedBy = D.Name
WHERE D.Name = 'Orthopedics';

In [None]:
%%sql
-- Q4(e)(iv) SOLUTION: Which hospitals does doctor Song Ci work in?

SELECT DISTINCT D.PartOf AS HospitalName
FROM Doctors Doc
INNER JOIN WorksAt WA ON Doc.Name = WA.Doctor
INNER JOIN Departments D ON WA.Department = D.Name
WHERE Doc.Name = 'Song Ci';

In [None]:
%%sql
-- Q4(e)(v) SOLUTION: Which departments belong to the hospital that runs
-- 'The Alexander Fleming Building'?

SELECT D.Name AS DepartmentName
FROM Departments D
INNER JOIN Hospitals H ON D.PartOf = H.Name
INNER JOIN Buildings B ON B.RunBy = H.Name
WHERE B.Name = 'The Alexander Fleming Building';

In [None]:
%%sql
-- Q4(e)(vi) SOLUTION: Which doctor treated Neha Ahuja?

SELECT P.TreatedBy AS DoctorName
FROM Patients P
WHERE P.Name = 'Neha Ahuja';

## Q4(f): Would XML work better? [3 marks]

### Solution

In [None]:
# Q4(f) SOLUTION
print("""Answer: NO, a tree-based model like XML would NOT work better.

Reasons why Relational is Better:

| Aspect                 | Why Relational is Better                    |
|------------------------|---------------------------------------------|
| Many-to-many relations | Doctor-Department is M:N; XML trees can't   |
|                        | naturally represent this                    |
| Cross-references       | Patients link to doctors, wards link to     |
|                        | departments - multiple connection points    |
| Query efficiency       | Indexed JOINs across entities are usually   |
|                        | faster than chasing references with XPath   |
| Data integrity         | Foreign keys enforce referential integrity  |
| Updates                | Each fact is stored once, so a change is    |
|                        | made in one place; XML may duplicate it     |

When XML Might Be Better:
- Document-centric data (medical records as documents)
- Hierarchical data with single parent-child relationships
- Data interchange formats

This Hospital Data:
- Has multiple interlinked entities
- Requires referential integrity
- Needs efficient joins across relationships
- Relational model is clearly the better fit
""")

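The many-to-many point can be illustrated with a small sketch. If the `WorksAt` sample rows are forced into one possible tree layout (hospital, then department, then doctor), a doctor who works in two departments has to be written twice. The XML layout below is a hypothetical design made up for this comparison, not part of the exam.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical tree layout for the notebook's WorksAt sample data.
HOSPITAL_XML = """
<hospitals>
  <hospital name="City Hospital">
    <department name="Orthopedics"><doctor name="Song Ci"/></department>
    <department name="Accident &amp; Emergency"><doctor name="Neha Kapoor"/></department>
  </hospital>
  <hospital name="General Hospital">
    <department name="ENT"><doctor name="Song Ci"/></department>
  </hospital>
</hospitals>
"""

root = ET.fromstring(HOSPITAL_XML)

# Count how many separate elements describe each doctor.
copies = Counter(doctor.get("name") for doctor in root.iter("doctor"))
duplicated = sorted(name for name, count in copies.items() if count > 1)

print(copies)
print(duplicated)
```

In the relational schema the same information is one row in `Doctors` plus two rows in `WorksAt`, so the doctor's details exist in exactly one place.
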
---

# End of Solutions Notebook

All solutions have been provided. Compare with your attempts in the practice notebook!