<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/march-2022/notebook-march-2022.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 March 2022 - Practice Notebook

This notebook provides hands-on practice for the March 2022 exam.

**Exam Structure:**
- Section A: 10 MCQs (on VLE separately)
- Section B: Answer 2 of 3 questions
  - Q2: XML Family Tree (English Monarchy)
  - Q3: Wikidata SPARQL
  - Q4: Hospital Database

**Instructions:**
1. Run the Setup cells first
2. Write your answers in the empty code cells
3. Check your answers against the solution sheet

---

# 1. Environment Setup

Run these cells first to set up MySQL, MongoDB, xmllint, and SPARQL.

In [None]:
# === MySQL Setup ===
!apt-get -qq update > /dev/null
!apt-get -y -qq install mysql-server > /dev/null
!service mysql start

# Create user and database
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON exam_db.* TO 'examuser'@'localhost';"

# === xmllint Setup (for XML/XPath exercises) ===
!apt-get -y -qq install libxml2-utils > /dev/null

# === Python libraries ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0 lxml sparqlwrapper

%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

print("MySQL ready!")
print("xmllint ready!")
print("SPARQLWrapper ready!")

In [None]:

# === MongoDB Setup ===
!wget -q http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb > /dev/null 2>&1
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add - > /dev/null 2>&1
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list > /dev/null
!apt-get update -qq > /dev/null
!apt-get install -y -qq mongodb-org > /dev/null
!mkdir -p /data/db
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

# Test MongoDB is running
!mongo --quiet --eval 'print("MongoDB ready!")'

In [None]:
# === SPARQL Setup (for Wikidata queries) ===
from SPARQLWrapper import SPARQLWrapper, JSON
import re

def run_sparql(query, limit=50):
    """Run a SPARQL query against Wikidata and print results."""
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

    # Only add LIMIT if not already in query
    if not re.search(r'\bLIMIT\b', query, re.IGNORECASE):
        query = query + f"\nLIMIT {limit}"

    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Print each row using the variable names from the SELECT clause
    # (named `variables` to avoid shadowing the built-in `vars`)
    variables = results["head"]["vars"]
    for result in results["results"]["bindings"]:
        row = [f"{var}: {result[var]['value']}" for var in variables if var in result]
        print("  ".join(row))

    return results

print("SPARQL ready!")
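
The LIMIT guard inside `run_sparql` can be tried on its own, without contacting Wikidata. The sketch below repeats the same regular-expression check in a hypothetical helper named `with_default_limit`, which is not part of the notebook's setup.

```python
import re

def with_default_limit(query: str, limit: int = 50) -> str:
    """Append a LIMIT clause unless the query text already contains one."""
    if re.search(r"\bLIMIT\b", query, re.IGNORECASE):
        return query
    return f"{query}\nLIMIT {limit}"

bare = "SELECT ?s WHERE { ?s ?p ?o }"
capped = "SELECT ?s WHERE { ?s ?p ?o } limit 5"

print(with_default_limit(bare))    # a LIMIT 50 line is appended
print(with_default_limit(capped))  # returned unchanged
```

The check is purely textual, so the word LIMIT inside a comment or string literal would also suppress the default.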

---

# Question 2: XML Family Tree (English Monarchy)

## Sample XML Data

A historian has developed an XML file to track the English monarchy of the 16th Century.

In [None]:
%%writefile royals.xml
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"
         from="1485-08-22" to="1509-04-21" />
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"
               from="1509-04-22" to="1547-01-28" />
        <relationship type="marriage" spouse="#CatherineOfAragon"
                      from="1509-06-11" to="1533-05-23">
          <children>
            <royal name="Mary">
              <title rank="queen" territory="England" regnal="I"
                     from="1553-07-19" to="1558-11-17" />
              <relationship type="marriage" spouse="#PhilipOfSpain"
                            from="1554-07-25"/>
            </royal>
          </children>
        </relationship>
        <relationship type="marriage" spouse="#AnneBoleyn"
                      from="1533-01-25" to="1536-05-17">
          <children>
            <royal name="Elizabeth">
              <title rank="queen" territory="England" regnal="I"
                     from="1558-11-17" to="1603-03-24" />
            </royal>
          </children>
        </relationship>
        <relationship type="marriage" spouse="#JaneSeymour"
                      from="1536-05-30" to="1537-10-24">
          <children>
            <royal name="Edward">
              <title rank="king" territory="England" regnal="VI"
                     from="1547-01-28" to="1553-07-06" />
            </royal>
          </children>
        </relationship>
      </royal>
    </children>
  </relationship>
</royal>

## Q2(a): Identify Elements and Attributes [2 marks]

**Question:** Give two examples of element names and two examples of attribute names from this code.

In [None]:
# View the XML structure
!cat royals.xml

# Your answer:
# Element names:
# Attribute names:

## Q2(b): XPath Query Analysis [3 marks]

**Question:** What will be the result of the following XPath query?
```xpath
//title[@rank="king" and @regnal="VIII"]/../royal[@name="Henry"]
```

In [None]:
# Test with xmllint
# Note: This query looks for a child <royal name="Henry"> of the parent of Henry VIII's title

In [None]:
# First, let's find the title:
!xmllint --xpath '//title[@rank="king" and @regnal="VIII"]' royals.xml 2>&1

In [None]:
print("\n--- Now the full query ---")
# The parent of that title is <royal name="Henry" xml:id="HenryVIII">
# Looking for a child <royal> with name="Henry" - there isn't one directly
!xmllint --xpath '//title[@rank="king" and @regnal="VIII"]/../royal[@name="Henry"]' royals.xml 2>&1 || echo "No direct child match"
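
The same navigation can be traced step by step in Python. This is a sketch, assuming a trimmed copy of `royals.xml` and the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath: it has no `and` operator, so the two conditions are written as chained predicates instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of royals.xml: only the elements needed for this query
doc = ET.fromstring("""
<royal name="Henry" xml:id="HenryVII">
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"/>
        <relationship type="marriage" spouse="#CatherineOfAragon">
          <children>
            <royal name="Mary"/>
          </children>
        </relationship>
      </royal>
    </children>
  </relationship>
</royal>
""")

# ElementTree stores xml:id under the XML namespace URI
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Step 1: find the title, then use ".." to step up to its parent
title_path = ".//title[@rank='king'][@regnal='VIII']"
parents = doc.findall(title_path + "/..")
parent_ids = [p.get(XML_ID) for p in parents]

# Step 2: list the tags of that parent's direct children
child_tags = [child.tag for child in parents[0]]

# Step 3: the full query asks for a direct <royal name="Henry"> child
matches = doc.findall(title_path + "/../royal[@name='Henry']")

print(parent_ids, child_tags, len(matches))
```

Printing the intermediate steps shows which element `..` lands on and which children it actually has, which is usually enough to predict the result of the full path.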

## Q2(c): Deep XPath Navigation [3 marks]

**Question:** What (in general terms) will be returned by:
```xpath
//title[@rank="king" or @rank="queen"]/../relationship/children/royal/relationship/children/royal/
```

In [None]:
# List the regnal numbers of every king or queen title (the starting points of the query)
print("=== All titles of kings or queens ===")
!xmllint --xpath '//title[@rank="king" or @rank="queen"]/@regnal' royals.xml 2>&1

In [None]:
print("\n=== Grandchildren of monarchs (2 relationship levels deep) ===")
!xmllint --xpath '//title[@rank="king" or @rank="queen"]/../relationship/children/royal/relationship/children/royal/@name' royals.xml 2>&1 || echo "No matches at this depth"

## Q2(d): Add XML Fragment [4 marks]

**Question:** Mary I was also queen consort of Spain from 16 January 1556 until her death. Give an XML fragment that would record this.

In [None]:
# Your XML fragment:
xml_fragment = '''
<title rank="queen" territory="Spain" regnal="consort"
       from="1556-01-16" to="1558-11-17"/>
'''

print("XML Fragment to add inside <royal name='Mary'>:")
print(xml_fragment)

In [None]:
# Verify it's well-formed
!echo '<title rank="queen" territory="Spain" regnal="consort" from="1556-01-16" to="1558-11-17"/>' | xmllint --noout - && echo "Well-formed!"
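
A similar well-formedness check can be run from Python. The sketch below uses the standard library parser rather than `xmllint`, and the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a single XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

good = ('<title rank="queen" territory="Spain" regnal="consort" '
        'from="1556-01-16" to="1558-11-17"/>')
unclosed = '<title rank="queen" territory="Spain">'  # missing end tag
unquoted = '<title rank=queen/>'                     # attribute value not quoted

for fragment in (good, unclosed, unquoted):
    print(is_well_formed(fragment), fragment)
```

Well-formedness only covers syntax. It does not check that the attribute names or values fit the conventions used in the rest of the file.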

---

# Question 3: Wikidata SPARQL

## Reference: Wikidata URIs

| URI | Meaning |
|-----|--------|
| `wdt:P19` | place of birth |
| `wdt:P31` | instance of (like rdf:type) |
| `wdt:P131` | located in administrative territorial entity |
| `wd:Q5` | human |
| `wd:Q60` | New York City |

## Q3(a): SPARQL Query - Humans born in NYC

**Query:**
```sparql
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19 wd:Q60.
}
LIMIT 50
```

In [None]:
run_sparql("""
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19 wd:Q60.
}
LIMIT 50
""")

## Q3(c): Property Path - Born in NYC or sub-locations

**Question:** Extend the query to include people born in sub-locations of NYC (Queens, Manhattan, etc.)

**Hint:** Use property path `wdt:P19/wdt:P131*` where `*` means zero or more hops.
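
To see what "zero or more hops" means, the sketch below imitates the path on toy data in plain Python. The names and dictionaries are hypothetical stand-ins for `wdt:P19` and `wdt:P131`, and it assumes each place has at most one parent and that the hierarchy has no cycles.

```python
# Hypothetical toy data standing in for wdt:P19 (place of birth)
# and wdt:P131 (located in administrative territorial entity)
birth_place = {"alice": "Queens", "bob": "New_York_City", "carol": "Boston"}
located_in = {
    "Queens": "New_York_City",
    "Manhattan": "New_York_City",
    "New_York_City": "New_York_State",
}

def reachable(place: str) -> list[str]:
    """Places reached from `place` by zero or more located_in hops."""
    chain = [place]                  # zero hops: the place itself
    while chain[-1] in located_in:   # follow one more hop while a parent exists
        chain.append(located_in[chain[-1]])
    return chain

def born_in_or_under(target: str) -> list[str]:
    """People whose birth place reaches `target` via one P19 step, then P131*."""
    return sorted(
        person for person, place in birth_place.items()
        if target in reachable(place)
    )

print(born_in_or_under("New_York_City"))
```

Because zero hops is allowed, someone born directly in the target is still included alongside those born in its sub-locations.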

In [None]:
# Write your query here:
run_sparql("""
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P31 wd:Q5;
          # TODO: modify wdt:P19 to use property path
          wdt:P19 wd:Q60.
}
LIMIT 50
""")

## Q3(e): Adding Human-Readable Labels

**Question:** The results show URIs like `wd:Q12345`. How do you get readable names?

**Hint:** Use `rdfs:label` with a language filter to get English labels:
```sparql
?person rdfs:label ?personLabel.
FILTER(LANG(?personLabel) = "en")
```
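
Without the filter, a person with labels in several languages yields one row per label. The sketch below mirrors the effect of the `FILTER` on the client side, using invented rows in the SPARQL JSON results layout, where language-tagged literals carry an `xml:lang` key. The URIs, labels, and the helper `labels_in` are all hypothetical.

```python
# Hypothetical rows in the SPARQL JSON results layout; the URIs and
# labels are invented for illustration
bindings = [
    {"person": {"type": "uri", "value": "http://example.org/entity/E1"},
     "personLabel": {"type": "literal", "xml:lang": "en", "value": "Example Person"}},
    {"person": {"type": "uri", "value": "http://example.org/entity/E1"},
     "personLabel": {"type": "literal", "xml:lang": "fr", "value": "Personne Exemple"}},
    {"person": {"type": "uri", "value": "http://example.org/entity/E2"},
     "personLabel": {"type": "literal", "xml:lang": "de", "value": "Beispielperson"}},
]

def labels_in(rows, lang):
    """Keep only the labels tagged with the requested language."""
    return [
        row["personLabel"]["value"]
        for row in rows
        if row["personLabel"].get("xml:lang") == lang
    ]

english = labels_in(bindings, "en")
print(english)
```

The second entity has no English label, so it drops out of the result entirely. The `FILTER` in the hint behaves the same way for people without an English label.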

In [None]:
# Write your query here:
run_sparql("""
SELECT DISTINCT ?person ?personLabel
WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P19 wd:Q60.
  # TODO: Add rdfs:label with FILTER for English labels
}
LIMIT 50
""")

## Q3(h)-(i): Triple Table in SQL

We'll simulate RDF triples in a relational database.

In [None]:
%%sql
DROP TABLE IF EXISTS Triples;

CREATE TABLE Triples (
    Subject VARCHAR(100),
    Predicate VARCHAR(50),
    Object VARCHAR(100)
);

-- Sample data
INSERT INTO Triples (Subject, Predicate, Object) VALUES
('Person_SongCi', 'InstanceOf', 'Human'),
('Person_SongCi', 'BirthPlace', 'New_York_City'),
('Person_SongCi', 'Occupation', 'Doctor'),

('Person_NehaKapoor', 'InstanceOf', 'Human'),
('Person_NehaKapoor', 'BirthPlace', 'Queens'),
('Person_NehaKapoor', 'Occupation', 'Actor'),

('Person_JohnSmith', 'InstanceOf', 'Human'),
('Person_JohnSmith', 'BirthPlace', 'Boston'),

-- Location hierarchy
('Queens', 'LocatedIn', 'New_York_City'),
('Manhattan', 'LocatedIn', 'New_York_City'),
('New_York_City', 'LocatedIn', 'New_York_State');

SELECT 'Triples table ready!' AS Status;

## Sample Data: Triples Table

| Subject | Predicate | Object |
|---------|-----------|--------|
| Person_SongCi | InstanceOf | Human |
| Person_SongCi | BirthPlace | New_York_City |
| Person_SongCi | Occupation | Doctor |
| Person_NehaKapoor | InstanceOf | Human |
| Person_NehaKapoor | BirthPlace | Queens |
| Person_NehaKapoor | Occupation | Actor |
| Person_JohnSmith | InstanceOf | Human |
| Person_JohnSmith | BirthPlace | Boston |
| Queens | LocatedIn | New_York_City |
| Manhattan | LocatedIn | New_York_City |
| New_York_City | LocatedIn | New_York_State |

**Location Hierarchy:**
```
New_York_State
└── New_York_City
    ├── Queens
    └── Manhattan
```

### Q3(h): Find humans born in NYC (direct match)

In [None]:
%%sql
-- Write your query here:
-- Hint: Join Triples table to itself to match multiple conditions on same Subject
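
The self-join pattern from the hint can be tried on unrelated data first. This is a sketch, assuming an in-memory SQLite database in place of MySQL and a made-up set of triples about books.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Triples (Subject TEXT, Predicate TEXT, Object TEXT);
INSERT INTO Triples VALUES
  ('Book_A', 'InstanceOf', 'Novel'),
  ('Book_A', 'Language',   'French'),
  ('Book_B', 'InstanceOf', 'Novel'),
  ('Book_B', 'Language',   'English'),
  ('Book_C', 'InstanceOf', 'Atlas'),
  ('Book_C', 'Language',   'French');
""")

# One alias per condition; joining on Subject ties both conditions
# to the same resource
rows = conn.execute("""
SELECT T1.Subject
FROM Triples AS T1
JOIN Triples AS T2 ON T2.Subject = T1.Subject
WHERE T1.Predicate = 'InstanceOf' AND T1.Object = 'Novel'
  AND T2.Predicate = 'Language'   AND T2.Object = 'French'
""").fetchall()

print(rows)  # [('Book_A',)]
```

Each extra condition on the same subject needs its own alias, because a single row of a triple table holds only one predicate and object.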


### Q3(i): Find humans born in NYC or sub-locations (with hierarchy)

**Pragmatic approach:** Use multiple self-joins for known hierarchy depth.

In [None]:
%%sql
-- Write your query here:
-- Hint: Use LEFT JOINs to follow the LocatedIn hierarchy (one join per level)
-- Compare each level separately: T2.Object = 'New_York_City' OR T3.Object = 'New_York_City' OR T4.Object = 'New_York_City'
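
The fixed-depth approach can be sketched on a different hierarchy. The example below assumes SQLite instead of MySQL and a made-up category tree. Its aliases are numbered for this example only and do not correspond to the aliases in the hint above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Triples (Subject TEXT, Predicate TEXT, Object TEXT);
INSERT INTO Triples VALUES
  ('Item_Oak',   'Category', 'Hardwood'),
  ('Item_Pine',  'Category', 'Softwood'),
  ('Item_Steel', 'Category', 'Metal'),
  ('Item_Plank', 'Category', 'Wood'),
  ('Hardwood', 'SubcategoryOf', 'Wood'),
  ('Softwood', 'SubcategoryOf', 'Wood'),
  ('Wood',     'SubcategoryOf', 'Material'),
  ('Metal',    'SubcategoryOf', 'Material');
""")

# T1 holds the item's own category; T2 and T3 climb one level each.
# LEFT JOIN keeps items whose chain ends early (missing levels are NULL).
rows = conn.execute("""
SELECT T1.Subject
FROM Triples AS T1
LEFT JOIN Triples AS T2
       ON T2.Subject = T1.Object AND T2.Predicate = 'SubcategoryOf'
LEFT JOIN Triples AS T3
       ON T3.Subject = T2.Object AND T3.Predicate = 'SubcategoryOf'
WHERE T1.Predicate = 'Category'
  AND (T1.Object = 'Wood' OR T2.Object = 'Wood' OR T3.Object = 'Wood')
ORDER BY T1.Subject
""").fetchall()

print(rows)
```

Every comparison is written out in full, since `A OR B = 'x'` does not compare `A` with `'x'`. The query covers only as many levels as it has joins.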


### Q3(i) Advanced: Recursive CTE (Optional)

For hierarchies of arbitrary depth, a recursive CTE (available in MySQL 8.0 and later) avoids hard-coding one join per level, at the cost of a more complex query:

In [None]:
%%sql
-- Optional: Try writing a recursive CTE
-- WITH RECURSIVE LocationChain AS (
--     -- Base case: ...
--     -- UNION
--     -- Recursive case: ...
-- )
-- SELECT ...
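
The base-case and recursive-case structure in the skeleton can be seen on an unrelated hierarchy. This is a sketch, assuming SQLite rather than MySQL and a made-up `Reports` table of employees and managers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reports (Employee TEXT, Manager TEXT);
INSERT INTO Reports VALUES
  ('Bea', 'Ana'), ('Cal', 'Bea'), ('Dee', 'Cal'), ('Eli', 'Zoe');
""")

rows = conn.execute("""
WITH RECURSIVE Chain(Name) AS (
    SELECT 'Ana'                  -- base case: the starting node
    UNION
    SELECT R.Employee             -- recursive case: one level further down
    FROM Reports AS R
    JOIN Chain AS C ON R.Manager = C.Name
)
SELECT Name FROM Chain ORDER BY Name
""").fetchall()

print(rows)
```

The recursion stops when a pass adds no new rows, so the query follows the chain to whatever depth the data has. `UNION` rather than `UNION ALL` also discards duplicates along the way.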


---

# Question 4: Hospital Database

## E/R Model Overview

A health organization is designing a database for doctors, hospitals, and patients:
- Patients stay in wards (with arrival/departure dates)
- Wards are in buildings, buildings are run by hospitals
- Departments belong to hospitals
- Doctors can work in multiple departments

## Database Setup

In [None]:
%%sql
-- Drop tables in reverse order of dependencies
DROP TABLE IF EXISTS WorksAt;
DROP TABLE IF EXISTS StayIn;
DROP TABLE IF EXISTS Patients;
DROP TABLE IF EXISTS Wards;
DROP TABLE IF EXISTS Departments;
DROP TABLE IF EXISTS Doctors;
DROP TABLE IF EXISTS Buildings;
DROP TABLE IF EXISTS Hospitals;

-- 1) Hospitals
CREATE TABLE Hospitals (
    Name VARCHAR(100) PRIMARY KEY
);

-- 2) Buildings (run by Hospitals)
CREATE TABLE Buildings (
    Name VARCHAR(100) PRIMARY KEY,
    Address VARCHAR(255),
    RunBy VARCHAR(100) NOT NULL,
    FOREIGN KEY (RunBy) REFERENCES Hospitals(Name)
);

-- 3) Departments (part of Hospitals)
CREATE TABLE Departments (
    Name VARCHAR(100) PRIMARY KEY,
    PartOf VARCHAR(100) NOT NULL,
    Specialisation VARCHAR(100),
    FOREIGN KEY (PartOf) REFERENCES Hospitals(Name)
);

-- 4) Wards (located in Buildings, operated by Departments)
CREATE TABLE Wards (
    Name VARCHAR(100) PRIMARY KEY,
    LocatedIn VARCHAR(100) NOT NULL,
    OperatedBy VARCHAR(100) NOT NULL,
    FOREIGN KEY (LocatedIn) REFERENCES Buildings(Name),
    FOREIGN KEY (OperatedBy) REFERENCES Departments(Name)
);

-- 5) Doctors
CREATE TABLE Doctors (
    Name VARCHAR(100) PRIMARY KEY
);

-- 6) Patients (treated by Doctors)
CREATE TABLE Patients (
    Id INT PRIMARY KEY,
    Name VARCHAR(100),
    DoB DATE,
    TreatedBy VARCHAR(100) NOT NULL,
    FOREIGN KEY (TreatedBy) REFERENCES Doctors(Name)
);

-- 7) StayIn (junction: Patients <-> Wards with dates)
CREATE TABLE StayIn (
    Patient INT,
    Ward VARCHAR(100),
    Arrived DATE NOT NULL,
    Departed DATE,
    PRIMARY KEY (Patient, Ward, Arrived),
    FOREIGN KEY (Patient) REFERENCES Patients(Id),
    FOREIGN KEY (Ward) REFERENCES Wards(Name)
);

-- 8) WorksAt (junction: Doctors <-> Departments, M:N)
CREATE TABLE WorksAt (
    Doctor VARCHAR(100),
    Department VARCHAR(100),
    PRIMARY KEY (Doctor, Department),
    FOREIGN KEY (Doctor) REFERENCES Doctors(Name),
    FOREIGN KEY (Department) REFERENCES Departments(Name)
);

SELECT 'Hospital database schema created!' AS Status;

## Insert Sample Data

The tables below preview the rows that the insert cell after them loads.

## Sample Data: Hospital Tables

### Hospitals
| Name |
|------|
| City Hospital |
| General Hospital |

### Buildings
| Name | Address | RunBy |
|------|---------|-------|
| Main Building | Main Street | City Hospital |
| Annex | Annex Lane | City Hospital |
| North Wing | North Avenue | General Hospital |
| The Alexander Fleming Building | Imperial College Rd | General Hospital |

### Departments
| Name | PartOf | Specialisation |
|------|--------|----------------|
| Orthopedics | City Hospital | Musculoskeletal |
| Accident & Emergency | City Hospital | Acute Care |
| ENT | General Hospital | Ear/Nose/Throat |

### Wards
| Name | LocatedIn | OperatedBy |
|------|-----------|------------|
| Ward A | Main Building | Accident & Emergency |
| Orthopedics Ward | Main Building | Orthopedics |
| Ward B | North Wing | ENT |
| Fleming Ward | The Alexander Fleming Building | ENT |

### Doctors
| Name |
|------|
| Song Ci |
| Neha Kapoor |

### Patients
| Id | Name | DoB | TreatedBy |
|----|------|-----|-----------|
| 100 | Neha Ahuja | 1990-05-12 | Song Ci |
| 101 | John Smith | 1985-03-22 | Neha Kapoor |

### StayIn (Patient Ward Stays)
| Patient | Ward | Arrived | Departed |
|---------|------|---------|----------|
| 100 | Ward A | 2023-08-01 | 2023-08-15 |
| 100 | Fleming Ward | 2023-09-01 | 2023-09-10 |
| 101 | Orthopedics Ward | 2023-08-05 | 2023-08-10 |

### WorksAt (Doctor-Department Assignments)
| Doctor | Department |
|--------|------------|
| Song Ci | Orthopedics |
| Song Ci | ENT |
| Neha Kapoor | Accident & Emergency |

In [None]:
%%sql
-- Hospitals
INSERT INTO Hospitals (Name) VALUES
    ('City Hospital'),
    ('General Hospital');

-- Buildings
INSERT INTO Buildings (Name, Address, RunBy) VALUES
    ('Main Building', 'Main Street', 'City Hospital'),
    ('Annex', 'Annex Lane', 'City Hospital'),
    ('North Wing', 'North Avenue', 'General Hospital'),
    ('The Alexander Fleming Building', 'Imperial College Rd', 'General Hospital');

-- Departments
INSERT INTO Departments (Name, PartOf, Specialisation) VALUES
    ('Orthopedics', 'City Hospital', 'Musculoskeletal'),
    ('Accident & Emergency', 'City Hospital', 'Acute Care'),
    ('ENT', 'General Hospital', 'Ear/Nose/Throat');

-- Wards
INSERT INTO Wards (Name, LocatedIn, OperatedBy) VALUES
    ('Ward A', 'Main Building', 'Accident & Emergency'),
    ('Orthopedics Ward', 'Main Building', 'Orthopedics'),
    ('Ward B', 'North Wing', 'ENT'),
    ('Fleming Ward', 'The Alexander Fleming Building', 'ENT');

-- Doctors
INSERT INTO Doctors (Name) VALUES
    ('Song Ci'),
    ('Neha Kapoor');

-- Patients
INSERT INTO Patients (Id, Name, DoB, TreatedBy) VALUES
    (100, 'Neha Ahuja', '1990-05-12', 'Song Ci'),
    (101, 'John Smith', '1985-03-22', 'Neha Kapoor');

-- Patient ward stays
INSERT INTO StayIn (Patient, Ward, Arrived, Departed) VALUES
    (100, 'Ward A', '2023-08-01', '2023-08-15'),
    (100, 'Fleming Ward', '2023-09-01', '2023-09-10'),
    (101, 'Orthopedics Ward', '2023-08-05', '2023-08-10');

-- Doctor department assignments
INSERT INTO WorksAt (Doctor, Department) VALUES
    ('Song Ci', 'Orthopedics'),
    ('Song Ci', 'ENT'),
    ('Neha Kapoor', 'Accident & Emergency');

SELECT 'Sample data inserted!' AS Status;

## Q4(e): Practice Queries

### (i) Which building did patient Neha Ahuja stay in?

In [None]:
%%sql
-- Write your query here:


### (ii) Which hospital was responsible for Neha Ahuja's stay?

In [None]:
%%sql
-- Write your query here:


### (iii) In which wards are Orthopedics patients housed?

In [None]:
%%sql
-- Write your query here:


### (iv) Which hospitals does doctor Song Ci work in?

In [None]:
%%sql
-- Write your query here:


### (v) Which departments belong to the hospital that runs 'The Alexander Fleming Building'?

In [None]:
%%sql
-- Write your query here:


### (vi) Which doctor treated Neha Ahuja?

In [None]:
%%sql
-- Write your query here:


---

# Done!

Check your answers against the **solution sheet**.