<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/march-2023/notebook-march-2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 March 2023 - Practice Notebook

This notebook allows you to practice the March 2023 exam questions.

**Exam Structure:**
- Section A: MCQs (taken separately on VLE)
- Section B: Answer 2 of 3 questions - 60 marks
  - Q2: Analyzing OpenDocument Format (ODF) and RelaxNG Schema
  - Q3: MusicBrainz / Linked Data
  - Q4: Enhancing an ER Model for 16th-Century Music Records

**Instructions:**
1. Run the Setup cells first
2. Fill in your answers in the empty cells
3. Compare with the solutions notebook when done

---

# Environment Setup

Run these cells first to set up MySQL, XML tools, and RDF libraries.

In [None]:
# === MySQL Setup ===
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# Create user and database
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON exam_db.* TO 'examuser'@'localhost';"

# === xmllint Setup (for XML/XPath exercises) ===
!apt -y -qq install libxml2-utils > /dev/null

# === rapper Setup (for RDF/Turtle validation) ===
!apt -y -qq install raptor2-utils > /dev/null

# === Python libraries ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0 lxml rdflib

%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

print("MySQL ready!")
print("xmllint ready!")
print("rapper ready!")
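
The messages above are printed whether or not the installs succeeded. As a hedged sanity check, the sketch below looks each command-line tool up on `PATH` and checks that each Python library can be found. The helper name `check_environment` is hypothetical and not part of the original notebook.

```python
import importlib.util
import shutil


def check_environment(tools=("mysql", "xmllint", "rapper"),
                      modules=("lxml", "rdflib", "pymysql", "sqlalchemy")):
    """Return {name: bool} saying whether each tool or module was found."""
    # shutil.which returns None when the executable is not on PATH
    status = {tool: shutil.which(tool) is not None for tool in tools}
    for module in modules:
        # find_spec returns None when a top-level module cannot be located
        status[module] = importlib.util.find_spec(module) is not None
    return status


for name, found in check_environment().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

If anything is reported as `MISSING`, re-run the setup cell before starting the questions.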

---

# Question 2: Analyzing OpenDocument Format (ODF) and RelaxNG Schema [30 marks]

## Context

An extract from an ODF word processing document is shown below:

In [None]:
%%writefile odf_extract.xml
<?xml version="1.0" encoding="UTF-8"?>
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:p>Introduction to Data Structures</text:p>
  <text:list>
    <text:list-item>
      <text:p>Trees</text:p>
    </text:list-item>
    <text:list-item>
      <text:p>Graphs</text:p>
    </text:list-item>
    <text:list-item>
      <text:p>Relations</text:p>
    </text:list-item>
  </text:list>
</office:text>

### RelaxNG Schema Snippet

```xml
<define name="text-list">
  <element name="text:list">
    <ref name="text-list-attr"/>
    <optional>
      <ref name="text-list-header"/>
    </optional>
    <zeroOrMore>
      <ref name="text-list-item"/>
    </zeroOrMore>
  </element>
</define>
```

## Q2(a): What language is this encoded in? [1 mark]

In [None]:
# Q2(a) YOUR ANSWER:


## Q2(b): What data structure does it use? [1 mark]

In [None]:
# Q2(b) YOUR ANSWER:


## Q2(c): List the two namespaces [2 marks]

**Question:** List the two namespaces that this document uses.

In [None]:
# Q2(c) YOUR ANSWER:
# 1.
# 2.

## Q2(d): XPath expressions [4 marks]

**Question:** What would the XPath expression `//text:list-item/text:p` return? Would it be different from `//text:list//text:p`?

In [None]:
# Q2(d) YOUR ANSWER:


In [None]:
# Test XPath expressions with lxml
from lxml import etree

doc = etree.parse('odf_extract.xml')
namespaces = {
    'office': 'urn:oasis:names:tc:opendocument:xmlns:office:1.0',
    'text': 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'
}

# Test //text:list-item/text:p
result1 = doc.xpath('//text:list-item/text:p/text()', namespaces=namespaces)
print("//text:list-item/text:p:", result1)

# Test //text:list//text:p
result2 = doc.xpath('//text:list//text:p/text()', namespaces=namespaces)
print("//text:list//text:p:", result2)
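
The two expressions differ in their last step: `/` selects children only, while `//` selects descendants at any depth. Whether the results differ depends on the document being queried. The sketch below runs both expressions on a hypothetical variant of the list that includes a header paragraph. The element name `text:list-header` is an assumption, based on the `text-list-header` pattern named in the schema snippet.

```python
from lxml import etree

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ns = {"text": TEXT_NS}

# Hypothetical variant: the header paragraph is a descendant of text:list
# but not a child of any text:list-item.
variant = etree.fromstring(
    f'<text:list xmlns:text="{TEXT_NS}">'
    "<text:list-header><text:p>Topics</text:p></text:list-header>"
    "<text:list-item><text:p>Trees</text:p></text:list-item>"
    "</text:list>"
)

child_step = variant.xpath("//text:list-item/text:p/text()", namespaces=ns)
descendant_step = variant.xpath("//text:list//text:p/text()", namespaces=ns)

print("//text:list-item/text:p:", child_step)
print("//text:list//text:p:", descendant_step)
```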

## Q2(e): Well-formedness [2 marks]

**Question:** How does the RelaxNG schema code help us assess if the document above is **well-formed**?

In [None]:
# Q2(e) YOUR ANSWER:
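
To experiment with well-formedness, the sketch below asks lxml's parser whether a string parses at all. No schema is loaded at any point. The helper name `is_well_formed` and the two sample strings are hypothetical.

```python
from lxml import etree


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        etree.fromstring(xml_text.encode("utf-8"))
        return True
    except etree.XMLSyntaxError:
        return False


# Properly nested tags
print(is_well_formed("<list><item>Trees</item></list>"))
# Closing tags in the wrong order
print(is_well_formed("<list><item>Trees</list></item>"))
```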


## Q2(f): Validity [2 marks]

**Question:** How does the RelaxNG schema code help us assess if the document above is **valid**?

In [None]:
# Q2(f) YOUR ANSWER:


## Q2(g): Schema relevance [2 marks]

**Question:** Which part or parts of the document is the RelaxNG schema snippet relevant to?

In [None]:
# Q2(g) YOUR ANSWER:


## Q2(h): Invalid element example [3 marks]

**Question:** Give an example of an element that would not be valid given this schema code (assume `text-list-attr` only defines attributes).

In [None]:
# Q2(h) YOUR ANSWER - Write example XML that would be invalid:
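
To test a candidate element, one option is to wrap the snippet in a complete grammar and validate against it with lxml's `etree.RelaxNG`. This is only a sketch: the definitions of `text-list-attr`, `text-list-header`, `text-list-item`, and `paragraph` below are hypothetical stand-ins, far simpler than the real ODF schema, and `is_valid_list` is a hypothetical helper name.

```python
from lxml import etree

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The text-list definition is copied from the snippet above; every other
# definition is an assumption made so that the grammar is complete.
schema_src = f"""
<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:text="{TEXT_NS}">
  <start><ref name="text-list"/></start>
  <define name="text-list">
    <element name="text:list">
      <ref name="text-list-attr"/>
      <optional><ref name="text-list-header"/></optional>
      <zeroOrMore><ref name="text-list-item"/></zeroOrMore>
    </element>
  </define>
  <define name="text-list-attr"><empty/></define>
  <define name="text-list-header">
    <element name="text:list-header"><ref name="paragraph"/></element>
  </define>
  <define name="text-list-item">
    <element name="text:list-item"><ref name="paragraph"/></element>
  </define>
  <define name="paragraph">
    <element name="text:p"><text/></element>
  </define>
</grammar>
""".strip()

relaxng = etree.RelaxNG(etree.fromstring(schema_src))


def is_valid_list(xml_text: str) -> bool:
    """Return True if xml_text validates against the sketch grammar."""
    return relaxng.validate(etree.fromstring(xml_text))


# Replace this with the element you wrote for Q2(h)
candidate = (
    f'<text:list xmlns:text="{TEXT_NS}">'
    "<text:list-item><text:p>Trees</text:p></text:list-item>"
    "</text:list>"
)
print("Valid against the sketch grammar:", is_valid_list(candidate))
```

After a failed validation, `relaxng.error_log` holds the parser's explanation of what did not match.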


## Q2(i): Compare XML vs Relational [13 marks]

**Question:** Assess the suitability of this data structure for encoding word processing documents. What advantages or disadvantages would a relational model bring?

In [None]:
# Q2(i) YOUR ANSWER:
# XML advantages:
#
# XML disadvantages:
#
# Relational model advantages:
#
# Relational model disadvantages:
#
# Conclusion:


---

# Question 3: MusicBrainz / Linked Data [30 marks]

## Context

The RDF/Turtle data below describes a music group (BTS) using properties such as `schema:foundingDate` and `schema:member`.

In [None]:
%%writefile musicbrainz.ttl
@prefix schema: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix mba: <http://musicbrainz.org/artist/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

mba:9fe8e-ba27-4859-bb8c-2f255f346853
    a schema:MusicGroup ;
    schema:name "BTS"@en ;
    schema:foundingDate "2013-06-12"^^xsd:date ;
    schema:member [
        a schema:OrganizationRole ;
        schema:startDate "2013-06-12"^^xsd:date ;
        schema:member mba:person-jin
    ] ;
    schema:member [
        a schema:OrganizationRole ;
        schema:startDate "2013-06-12"^^xsd:date ;
        schema:member mba:person-suga
    ] .

mba:person-jin
    a schema:Person, schema:MusicGroup ;
    schema:name "JIN"@en .

mba:person-suga
    a schema:Person ;
    schema:name "SUGA"@en .

In [None]:
# Validate the Turtle
!rapper -i turtle -c musicbrainz.ttl

## Q3(a): Accept header type [1 mark]

**Question:** What (approximately) was the type that we put into the `Accept` header to retrieve this data?

In [None]:
# Q3(a) YOUR ANSWER:


## Q3(b): Full URL of predicate [1 mark]

**Question:** What is the full URL of the predicate `schema:member` in this context?

In [None]:
# Q3(b) YOUR ANSWER:


## Q3(c): Band member count [1 mark]

**Question:** How many band members of BTS are listed in this snippet?

In [None]:
# Q3(c) YOUR ANSWER:


## Q3(d): Comment on schema:member usage [3 marks]

**Question:** Comment on the way the `schema:member` predicate is used in this snippet.

In [None]:
# Q3(d) YOUR ANSWER:


## Q3(e): Types for "JIN" [1 mark]

**Question:** What type(s) are associated with the entity having `schema:name` of "JIN"?

In [None]:
# Q3(e) YOUR ANSWER:


In [None]:
# Verify with rdflib
import rdflib

g = rdflib.Graph()
g.parse('musicbrainz.ttl', format='turtle')

query = """
PREFIX schema: <http://schema.org/>
SELECT ?type WHERE {
  ?person schema:name "JIN"@en .
  ?person a ?type .
}
"""

for row in g.query(query):
    print(row)

## Q3(f): SPARQL prefixes [1 mark]

**Question:** What prefixes need to be defined for this SPARQL query to work?

```sparql
SELECT ?a ?b WHERE {
  mba:9fe8e-ba27-4859-bb8c-2f255f346853 schema:member ?c .
  ?c schema:startDate ?b ;
     schema:member ?d .
  ?d schema:name ?a .
}
```

In [None]:
# Q3(f) YOUR ANSWER - Write the PREFIX declarations:


## Q3(g): Query results [2 marks]

**Question:** What would the query above return?

In [None]:
# Q3(g) YOUR ANSWER:


In [None]:
# Verify with rdflib
query = """
PREFIX mba: <http://musicbrainz.org/artist/>
PREFIX schema: <http://schema.org/>

SELECT ?a ?b WHERE {
  mba:9fe8e-ba27-4859-bb8c-2f255f346853 schema:member ?c .
  ?c schema:startDate ?b ;
     schema:member ?d .
  ?d schema:name ?a .
}
"""

for row in g.query(query):
    print(f"Name: {row.a}, StartDate: {row.b}")

## Q3(h): ER diagram [6 marks]

**Question:** This data represents an export from a relational database. Construct an ER diagram that could accommodate the instance data above.

In [None]:
# Q3(h) YOUR ANSWER - Describe or draw the ER diagram:
# Entities:
#
# Relationships:
#

## Q3(i): CREATE TABLE commands [4 marks]

**Question:** Give the CREATE TABLE commands for two tables based on your ER diagram.

In [None]:
%%sql
-- Q3(i) YOUR SQL:


## Q3(j): Data integrity query [5 marks]

**Question:** Suggest a MySQL query to check whether any band member in the database is recorded as joining before the founding date of their band.

In [None]:
%%sql
-- Q3(j) YOUR SQL:


## Q3(k): Database dump vs Linked Data [5 marks]

**Question:** MusicBrainz makes their data available as both a downloadable database dump and as Linked Data. What are the benefits and disadvantages of each approach?

In [None]:
# Q3(k) YOUR ANSWER:
# Database dump:
#   Pros:
#   Cons:
#
# Linked Data:
#   Pros:
#   Cons:

---

# Question 4: Enhancing an ER Model for 16th-Century Music Records [30 marks]

## Context

An existing ER model for a database of 16th-century European music books needs enhancement. The database tracks:
- Books containing music
- Pages within books
- Pieces of music
- Lines of music on pages

## Q4(a): Order and coordinates [3 marks]

**Question:** This model doesn't allow storing the order or coordinates for lines of music on a page. How could this be fixed?

In [None]:
# Q4(a) YOUR ANSWER:


## Q4(b): Tablebook format [8 marks]

**Question:** Some books are published in tablebook format, with multiple parts/voices to a piece and page regions with lines in different directions. How would you add these aspects to the model?

In [None]:
# Q4(b) YOUR ANSWER - New entities and relationships:


## Q4(c): Tables, PKs, and FKs [7 marks]

**Question:** List the tables, primary keys, and foreign keys for a relational implementation of your modified model.

In [None]:
# Q4(c) YOUR ANSWER:
# Table 1:
#   PK:
#   FKs:
#

In [None]:
%%sql
-- Q4(c) Create your tables:
DROP TABLE IF EXISTS Line;
DROP TABLE IF EXISTS Region;
DROP TABLE IF EXISTS Page;
DROP TABLE IF EXISTS Piece;
DROP TABLE IF EXISTS InstrumentOrVoicePart;
DROP TABLE IF EXISTS Book;

-- Add your CREATE TABLE statements here:


## Q4(d): Line count query [5 marks]

**Question:** Give a query to list pieces with the total number of lines of music that they occupy.

In [None]:
%%sql
-- Q4(d) YOUR SQL:


## Q4(e): Compare with another model [7 marks]

**Question:** Assess the suitability of this data structure for a relational model, and compare it with ONE other database model (XML-based, document-based, or Linked Data graph).

In [None]:
# Q4(e) YOUR ANSWER:
# Relational model assessment:
#   Pros:
#   Cons:
#
# Comparison with [XML/Document/Graph] model:
#   Pros:
#   Cons:
#
# Conclusion:

---

# Done!

Check your answers against the **solutions notebook**.