<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/march-2023/notebook-march-2023-solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Environment Setup

Run this cell first to set up MySQL, the SQL magic, xmllint, and the cellspell XPath and SPARQL magics.

In [None]:
# === MySQL Setup ===
!apt-get update -qq > /dev/null
!apt-get install -y -qq mysql-server > /dev/null
!service mysql start
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost';"

# === SQL Magic ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0
%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

# === XPath Magic (cellspell) ===
!apt-get install -y libxml2-utils -qq > /dev/null
!pip install git+https://github.com/sreent/jupyter-query-magics.git -q
%load_ext cellspell.xpath

# === SPARQL Magic (cellspell) ===
!pip install "cellspell[sparql] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.sparql

---

# Question 2: Analyzing OpenDocument Format (ODF) and RelaxNG Schema [30 marks]

## Context

An extract from an ODF word processing document is shown below:

In [None]:
%%writefile odf_extract.xml
<?xml version="1.0" encoding="UTF-8"?>
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:p>Introduction to Data Structures</text:p>
  <text:list>
    <text:list-item>
      <text:p>Trees</text:p>
    </text:list-item>
    <text:list-item>
      <text:p>Graphs</text:p>
    </text:list-item>
    <text:list-item>
      <text:p>Relations</text:p>
    </text:list-item>
  </text:list>
</office:text>

In [None]:
%%writefile odf_text.rng
<?xml version="1.0" encoding="UTF-8"?>
<!-- Simplified RelaxNG schema for ODF text elements -->
<!-- Based on the schema snippet shown in the exam -->
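<!-- The office: and text: prefixes used in name="..." must be declared here -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
         xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">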

  <start>
    <ref name="office-text"/>
  </start>

  <define name="office-text">
    <element name="office:text" ns="urn:oasis:names:tc:opendocument:xmlns:office:1.0">
      <zeroOrMore>
        <choice>
          <ref name="text-p"/>
          <ref name="text-list"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>

  <define name="text-p">
    <element name="text:p" ns="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
      <optional>
        <attribute name="text:style-name"/>
      </optional>
      <text/>
    </element>
  </define>

  <!-- This is the schema snippet from the exam question -->
  <define name="text-list">
    <element name="text:list" ns="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
      <ref name="text-list-attr"/>
      <optional>
        <ref name="text-list-header"/>
      </optional>
      <zeroOrMore>
        <ref name="text-list-item"/>
      </zeroOrMore>
    </element>
  </define>

  <define name="text-list-attr">
    <optional>
      <attribute name="text:style-name"/>
    </optional>
    <optional>
      <attribute name="text:continue-numbering"/>
    </optional>
  </define>

  <define name="text-list-header">
    <element name="text:list-header" ns="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
      <zeroOrMore>
        <ref name="text-p"/>
      </zeroOrMore>
    </element>
  </define>

  <define name="text-list-item">
    <element name="text:list-item" ns="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
      <zeroOrMore>
        <choice>
          <ref name="text-p"/>
          <ref name="text-list"/>  <!-- Allows nested lists -->
        </choice>
      </zeroOrMore>
    </element>
  </define>

</grammar>

In [None]:
# Validate the ODF extract against the RelaxNG schema
print("=== Validating odf_extract.xml against odf_text.rng ===")
%xpath --rng odf_text.rng odf_extract.xml
print("If no errors are reported above, odf_extract.xml passes schema validation.")

### RelaxNG Schema Snippet

```xml
<define name="text-list">
  <element name="text:list">
    <ref name="text-list-attr"/>
    <optional>
      <ref name="text-list-header"/>
    </optional>
    <zeroOrMore>
      <ref name="text-list-item"/>
    </zeroOrMore>
  </element>
</define>
```

## Q2(d): XPath expressions [4 marks]

**`//text:list-item/text:p`:**
- Selects `<text:p>` elements that are **direct children** of `<text:list-item>`

**`//text:list//text:p`:**
- Selects **all** `<text:p>` elements that are **descendants** of `<text:list>` (at any depth)

**In this example:** Both expressions return the same three items (`Trees`, `Graphs`, `Relations`) because each `<text:p>` is already a direct child of `<text:list-item>`. In a more complex or nested structure, these expressions could yield different results.
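
To see the two expressions diverge, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The document in `HEADER_DOC` is a hypothetical variant of the extract (a `<text:list-header>` has been added) and is not part of the exam data. ElementTree supports only a subset of XPath and its paths are relative, so the leading `//` is written as `.//`.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as declared in odf_extract.xml
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical variant of the extract: the list gains a text:list-header
HEADER_DOC = """\
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:list>
    <text:list-header><text:p>Topics</text:p></text:list-header>
    <text:list-item><text:p>Trees</text:p></text:list-item>
    <text:list-item><text:p>Graphs</text:p></text:list-item>
  </text:list>
</office:text>
"""

root = ET.fromstring(HEADER_DOC)

# Direct children of text:list-item only
direct_children = [p.text for p in root.findall(".//text:list-item/text:p", NS)]
# Any text:p below text:list, at any depth (includes the header paragraph)
all_descendants = [p.text for p in root.findall(".//text:list//text:p", NS)]

print(direct_children)   # ['Trees', 'Graphs']
print(all_descendants)   # ['Topics', 'Trees', 'Graphs']
```

The header paragraph is a descendant of `<text:list>` but not a child of any `<text:list-item>`, so only the second expression selects it.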

In [None]:
%%xpath --ns office=urn:oasis:names:tc:opendocument:xmlns:office:1.0 --ns text=urn:oasis:names:tc:opendocument:xmlns:text:1.0 odf_extract.xml


<details>
<summary>Click to reveal solution</summary>

```xml
%%xpath --ns office=urn:oasis:names:tc:opendocument:xmlns:office:1.0 --ns text=urn:oasis:names:tc:opendocument:xmlns:text:1.0 odf_extract.xml
//text:list-item/text:p/text()
```

```xml
%%xpath --ns office=urn:oasis:names:tc:opendocument:xmlns:office:1.0 --ns text=urn:oasis:names:tc:opendocument:xmlns:text:1.0 odf_extract.xml
//text:list//text:p/text()
```

</details>


## Q2(h): Invalid element example [3 marks]

```xml
<text:list>
  <text:list-item>Item Content</text:list-item>
  <text:invalid-element>Invalid Content</text:invalid-element>
</text:list>
```

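The schema allows a `<text:list>` to contain only an optional `<text:list-header>` followed by zero or more `<text:list-item>` elements. `<text:invalid-element>` matches neither pattern and is not defined anywhere in the schema, so the document fails validation.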
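
The content model of `text-list` can be illustrated with a small hand-rolled check in plain Python. This is a sketch of the rule only, not a RelaxNG validator: the helper names `list_children_ok` and `parse_list` are hypothetical, and attributes and stray text nodes are ignored.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
HEADER = f"{{{TEXT_NS}}}list-header"
ITEM = f"{{{TEXT_NS}}}list-item"


def list_children_ok(list_elem):
    """Check one text:list: optional list-header first, then only list-items."""
    tags = [child.tag for child in list_elem]
    if tags and tags[0] == HEADER:
        tags = tags[1:]  # the header is optional and may only come first
    return all(tag == ITEM for tag in tags)


def parse_list(body):
    """Wrap a fragment in a text:list element with the text prefix declared."""
    return ET.fromstring(f'<text:list xmlns:text="{TEXT_NS}">{body}</text:list>')


valid = parse_list("<text:list-item><text:p>Trees</text:p></text:list-item>")
invalid = parse_list(
    "<text:list-item><text:p>Trees</text:p></text:list-item>"
    "<text:invalid-element>Invalid Content</text:invalid-element>"
)

print(list_children_ok(valid))    # True
print(list_children_ok(invalid))  # False
```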

In [None]:
%%writefile odf_invalid.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Q2(h) demonstration: an INVALID ODF document to validate against odf_text.rng -->
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:list>
    <text:invalid-element>This element is not allowed!</text:invalid-element>
    <text:list-item>
      <text:p>Valid item</text:p>
    </text:list-item>
  </text:list>
</office:text>

In [None]:
# TODO: Write your solution here


<details>
<summary>Click to reveal solution</summary>

```python
# Q2(h) Validate the INVALID ODF - should show an error
print("=== Validating odf_invalid.xml - should FAIL ===")
%xpath --rng odf_text.rng odf_invalid.xml
print("Expected: a validation error, because text:invalid-element is not defined in the schema.")
```

</details>


---

# Question 3: MusicBrainz / Linked Data [30 marks]

## Context

RDF/Turtle data describing a music group (BTS):

In [None]:
%%writefile musicbrainz.ttl
@prefix schema: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix mba: <http://musicbrainz.org/artist/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

mba:9fe8e-ba27-4859-bb8c-2f255f346853
    a schema:MusicGroup ;
    schema:name "BTS"@en ;
    schema:foundingDate "2013-06-12"^^xsd:date ;
    schema:member [
        a schema:OrganizationRole ;
        schema:startDate "2013-06-12"^^xsd:date ;
        schema:member mba:person-jin
    ] ;
    schema:member [
        a schema:OrganizationRole ;
        schema:startDate "2013-06-12"^^xsd:date ;
        schema:member mba:person-suga
    ] .

mba:person-jin
    a schema:Person, schema:MusicGroup ;
    schema:name "JIN"@en .

mba:person-suga
    a schema:Person ;
    schema:name "SUGA"@en .

In [None]:
%%sparql --file musicbrainz.ttl
SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }

## Q3(e): Types for "JIN" [1 mark]

JIN is typed as both:
- `schema:Person`
- `schema:MusicGroup`

(The double typing comes from how the MusicBrainz RDF is auto-generated.)

In [None]:
%%sparql --file musicbrainz.ttl
# TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sparql
%%sparql --file musicbrainz.ttl
PREFIX schema: <http://schema.org/>
SELECT ?type WHERE {
  ?person schema:name "JIN"@en .
  ?person a ?type .
}
```

</details>


## Q3(g): Query results [2 marks]

The query returns pairs of `(?a, ?b)` where:
- `?a` = **member name** (e.g., "JIN", "SUGA")
- `?b` = **startDate** from the membership role (e.g., "2013-06-12")

Essentially: each band member's name plus when they joined.

In [None]:
%%sparql --file musicbrainz.ttl
# TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sparql
%%sparql --file musicbrainz.ttl
PREFIX mba: <http://musicbrainz.org/artist/>
PREFIX schema: <http://schema.org/>

SELECT ?a ?b WHERE {
  mba:9fe8e-ba27-4859-bb8c-2f255f346853 schema:member ?c .
  ?c schema:startDate ?b ;
     schema:member ?d .
  ?d schema:name ?a .
}
```

</details>
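
The way the four triple patterns chain together can be sketched in plain Python. The list below is a hand-copied subset of `musicbrainz.ttl`, with the hypothetical labels `_:r1` and `_:r2` standing in for the two blank role nodes. This illustrates the join logic only; it is not how a SPARQL engine is implemented.

```python
# Triples as (subject, predicate, object); "_:r1" and "_:r2" stand in for
# the two blank OrganizationRole nodes in musicbrainz.ttl.
BAND = "mba:9fe8e-ba27-4859-bb8c-2f255f346853"
triples = [
    (BAND, "schema:member", "_:r1"),
    ("_:r1", "schema:startDate", "2013-06-12"),
    ("_:r1", "schema:member", "mba:person-jin"),
    (BAND, "schema:member", "_:r2"),
    ("_:r2", "schema:startDate", "2013-06-12"),
    ("_:r2", "schema:member", "mba:person-suga"),
    ("mba:person-jin", "schema:name", "JIN"),
    ("mba:person-suga", "schema:name", "SUGA"),
]


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


rows = []
for c in objects(BAND, "schema:member"):          # band -> role node ?c
    for b in objects(c, "schema:startDate"):      # ?c -> start date ?b
        for d in objects(c, "schema:member"):     # ?c -> person ?d
            for a in objects(d, "schema:name"):   # ?d -> name ?a
                rows.append((a, b))

print(rows)  # [('JIN', '2013-06-12'), ('SUGA', '2013-06-12')]
```

The intermediate role node `?c` is what connects a start date to a specific person, which is why `schema:member` appears twice in the pattern.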


## Q3(i): CREATE TABLE commands [4 marks]


In [None]:
%%sql
DROP TABLE IF EXISTS Membership;
DROP TABLE IF EXISTS Artist;

CREATE TABLE Artist (
  ArtistId     INT PRIMARY KEY,
  Name         VARCHAR(100) NOT NULL,
  Type         VARCHAR(20)  NOT NULL,  -- 'Person' or 'MusicGroup'
  FoundingDate DATE
);

CREATE TABLE Membership (
  BandId    INT NOT NULL,
  MemberId  INT NOT NULL,
  StartDate DATE,
  RoleName  VARCHAR(100),
  PRIMARY KEY (BandId, MemberId),
  FOREIGN KEY (BandId)   REFERENCES Artist(ArtistId),
  FOREIGN KEY (MemberId) REFERENCES Artist(ArtistId)
);

SELECT 'Tables created!' AS Status;

In [None]:
%%sql
-- TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sql
%%sql
-- Insert sample data
INSERT INTO Artist VALUES (1, 'BTS', 'MusicGroup', '2013-06-12');
INSERT INTO Artist VALUES (2, 'JIN', 'Person', NULL);
INSERT INTO Artist VALUES (3, 'SUGA', 'Person', NULL);

INSERT INTO Membership VALUES (1, 2, '2013-06-12', 'Member');
INSERT INTO Membership VALUES (1, 3, '2013-06-12', 'Member');

SELECT * FROM Artist;
SELECT * FROM Membership;
```

</details>


## Q3(j): Data integrity query [5 marks]

Query to find members who joined before the band was founded:

In [None]:
%%sql
-- TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sql
%%sql
SELECT aMember.Name AS MemberName,
       aBand.Name   AS BandName,
       m.StartDate,
       aBand.FoundingDate
FROM Membership m
INNER JOIN Artist aBand   ON m.BandId   = aBand.ArtistId
INNER JOIN Artist aMember ON m.MemberId = aMember.ArtistId
WHERE m.StartDate < aBand.FoundingDate;
```

</details>
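
With the sample data above, every start date equals the founding date, so the integrity query returns no rows. The sketch below shows it catching a violation. It uses Python's built-in `sqlite3` with an in-memory database rather than the MySQL server, stores dates as ISO text (which compares correctly as strings), and inserts a hypothetical bad row for SUGA that is not in the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (
  ArtistId INTEGER PRIMARY KEY,
  Name TEXT NOT NULL,
  Type TEXT NOT NULL,
  FoundingDate TEXT
);
CREATE TABLE Membership (
  BandId INTEGER NOT NULL REFERENCES Artist(ArtistId),
  MemberId INTEGER NOT NULL REFERENCES Artist(ArtistId),
  StartDate TEXT,
  RoleName TEXT,
  PRIMARY KEY (BandId, MemberId)
);
INSERT INTO Artist VALUES (1, 'BTS', 'MusicGroup', '2013-06-12');
INSERT INTO Artist VALUES (2, 'JIN', 'Person', NULL);
INSERT INTO Artist VALUES (3, 'SUGA', 'Person', NULL);
INSERT INTO Membership VALUES (1, 2, '2013-06-12', 'Member');
-- Hypothetical bad row: start date one year before the founding date
INSERT INTO Membership VALUES (1, 3, '2012-06-12', 'Member');
""")

violations = conn.execute("""
SELECT aMember.Name, aBand.Name, m.StartDate, aBand.FoundingDate
FROM Membership m
INNER JOIN Artist aBand   ON m.BandId   = aBand.ArtistId
INNER JOIN Artist aMember ON m.MemberId = aMember.ArtistId
WHERE m.StartDate < aBand.FoundingDate
""").fetchall()

print(violations)  # [('SUGA', 'BTS', '2012-06-12', '2013-06-12')]
conn.close()
```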


---

# Question 4: Enhancing an ER Model for 16th-Century Music Records [30 marks]

## Context

An existing ER model for a database of 16th-century European music books needs enhancement.

## Q4(c): Tables, PKs, and FKs [7 marks]

| Table | Primary Key | Foreign Keys |
|-------|-------------|---------------|
| **Piece** | PieceId | - |
| **Page** | PageId | BookId → Book(BookId) |
| **Region** | RegionId | PageId → Page(PageId) |
| **InstrumentOrVoicePart** | PartId | - |
| **Line** | LineId | PieceId → Piece(PieceId), PageId → Page(PageId), RegionId → Region(RegionId), PartId → InstrumentOrVoicePart(PartId) |

In [None]:
%%sql
DROP TABLE IF EXISTS Line;
DROP TABLE IF EXISTS Region;
DROP TABLE IF EXISTS Page;
DROP TABLE IF EXISTS Piece;
DROP TABLE IF EXISTS InstrumentOrVoicePart;
DROP TABLE IF EXISTS Book;

CREATE TABLE Book (
    BookId INT PRIMARY KEY AUTO_INCREMENT,
    Title VARCHAR(200) NOT NULL
);

CREATE TABLE Piece (
    PieceId INT PRIMARY KEY AUTO_INCREMENT,
    Title VARCHAR(200) NOT NULL,
    Composer VARCHAR(100)
);

CREATE TABLE Page (
    PageId INT PRIMARY KEY AUTO_INCREMENT,
    BookId INT,
    PageNumber INT,
    FOREIGN KEY (BookId) REFERENCES Book(BookId)
);

CREATE TABLE Region (
    RegionId INT PRIMARY KEY AUTO_INCREMENT,
    PageId INT,
    Description VARCHAR(100),
    Orientation VARCHAR(20),
    FOREIGN KEY (PageId) REFERENCES Page(PageId)
);

CREATE TABLE InstrumentOrVoicePart (
    PartId INT PRIMARY KEY AUTO_INCREMENT,
    PartName VARCHAR(50) NOT NULL
);

CREATE TABLE Line (
    LineId INT PRIMARY KEY AUTO_INCREMENT,
    PieceId INT,
    PageId INT,
    RegionId INT,
    PartId INT,
    LineOrder INT,
    XCoordinate FLOAT,
    YCoordinate FLOAT,
    FOREIGN KEY (PieceId) REFERENCES Piece(PieceId),
    FOREIGN KEY (PageId) REFERENCES Page(PageId),
    FOREIGN KEY (RegionId) REFERENCES Region(RegionId),
    FOREIGN KEY (PartId) REFERENCES InstrumentOrVoicePart(PartId)
);

SELECT 'Tables created!' AS Status;

In [None]:
%%sql
-- Insert sample data
INSERT INTO Book VALUES (1, 'Cantiones Sacrae 1575');
INSERT INTO Piece VALUES (1, 'Ave Maria', 'Palestrina'), (2, 'Kyrie', 'Byrd');
INSERT INTO Page VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO Region VALUES (1, 1, 'Top region', 'horizontal'), (2, 1, 'Bottom region', 'vertical');
INSERT INTO InstrumentOrVoicePart VALUES (1, 'Soprano'), (2, 'Alto'), (3, 'Tenor');

INSERT INTO Line VALUES (1, 1, 1, 1, 1, 1, 10.0, 50.0);
INSERT INTO Line VALUES (2, 1, 1, 1, 2, 2, 10.0, 100.0);
INSERT INTO Line VALUES (3, 1, 1, 2, 3, 1, 200.0, 50.0);
INSERT INTO Line VALUES (4, 2, 2, NULL, 1, 1, 10.0, 50.0);
INSERT INTO Line VALUES (5, 2, 2, NULL, 2, 2, 10.0, 100.0);

SELECT 'Sample data inserted!' AS Status;

## Q4(d): Line count query [5 marks]

Query to list pieces with total number of lines:

In [None]:
%%sql
-- TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sql
%%sql
SELECT p.Title, COUNT(*) AS TotalLines
FROM Piece p
INNER JOIN Line l ON p.PieceId = l.PieceId
GROUP BY p.PieceId, p.Title;
```

</details>
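
One design point about this query: because it uses `INNER JOIN`, a piece with no lines does not appear in the result at all. The sketch below contrasts it with a `LEFT JOIN` variant. It runs on Python's built-in `sqlite3` with cut-down tables (only the columns the query needs), and the third piece, `Gloria`, is a hypothetical addition used to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Piece (PieceId INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Line (
  LineId INTEGER PRIMARY KEY,
  PieceId INTEGER REFERENCES Piece(PieceId)
);
INSERT INTO Piece VALUES (1, 'Ave Maria'), (2, 'Kyrie');
-- Hypothetical piece with no lines recorded yet
INSERT INTO Piece VALUES (3, 'Gloria');
INSERT INTO Line VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
""")

# Same shape as the solution query: pieces without lines are dropped
inner = conn.execute("""
SELECT p.Title, COUNT(*) FROM Piece p
INNER JOIN Line l ON p.PieceId = l.PieceId
GROUP BY p.PieceId, p.Title ORDER BY p.PieceId
""").fetchall()

# LEFT JOIN keeps every piece; COUNT(l.LineId) ignores the NULL filler row
left = conn.execute("""
SELECT p.Title, COUNT(l.LineId) FROM Piece p
LEFT JOIN Line l ON p.PieceId = l.PieceId
GROUP BY p.PieceId, p.Title ORDER BY p.PieceId
""").fetchall()

print(inner)  # [('Ave Maria', 3), ('Kyrie', 2)]
print(left)   # [('Ave Maria', 3), ('Kyrie', 2), ('Gloria', 0)]
conn.close()
```

Which form is preferable depends on whether the question expects pieces with zero lines to be listed.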
