<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/september-2022/notebook-september-2022-solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Environment Setup

Run these cells first to set up MySQL, MongoDB, xmllint, and the cellspell query magics for XPath, SPARQL, and MongoDB.

In [None]:
# === MySQL Setup ===
!apt-get update -qq > /dev/null
!apt-get install -y -qq mysql-server > /dev/null
!service mysql start
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost';"

# === SQL Magic ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0
%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

# === XPath Magic (cellspell) ===
!apt-get install -y libxml2-utils -qq > /dev/null
!pip install git+https://github.com/sreent/jupyter-query-magics.git -q
%load_ext cellspell.xpath

# === SPARQL Magic (cellspell) ===
!pip install "cellspell[sparql] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.sparql

In [None]:
# === MongoDB Setup ===
!wget -q http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb > /dev/null 2>&1
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add - > /dev/null 2>&1
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list > /dev/null
!apt-get update -qq > /dev/null
!apt-get install -y -qq mongodb-org > /dev/null
!mkdir -p /data/db
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

!mongo --quiet --eval 'print("MongoDB ready!")'

# === MongoDB Magic (cellspell) ===
!pip install "cellspell[mongodb] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.mongodb
%mongodb mongodb://localhost:27017/exam_db

In [None]:
%%mongodb
db.actors.drop()

In [None]:
%%mongodb
db.actors.insertMany([{"name": "Marlon Brando", "dateOfBirth": ISODate("1924-04-03")}, {"name": "James Dean", "dateOfBirth": ISODate("1931-02-08")}, {"name": "Marilyn Monroe", "dateOfBirth": ISODate("1926-06-01")}, {"name": "Audrey Hepburn", "dateOfBirth": ISODate("1929-05-04")}, {"name": "Jack Nicholson", "dateOfBirth": ISODate("1937-04-22")}, {"name": "Robert De Niro", "dateOfBirth": ISODate("1943-08-17")}, {"name": "Meryl Streep", "dateOfBirth": ISODate("1949-06-22")}, {"name": "Tom Hanks", "dateOfBirth": ISODate("1956-07-09")}, {"name": "Leonardo DiCaprio", "dateOfBirth": ISODate("1974-11-11")}, {"name": "Scarlett Johansson", "dateOfBirth": ISODate("1984-11-22")}])

In [None]:
%%mongodb
db.actors.find({"dateOfBirth": {"$lt": ISODate("1957-01-01")}})

In [None]:
%%mongodb
// Try your own query here


## Q1(j) DTD Validation Demonstration

Let's validate the RecipeML DTD answers by actually running validation tests:

In [None]:
%%writefile recipeml.dtd
<!-- RecipeML DTD (simplified for exam practice) -->
<!-- Based on the DTD shown in Q1(j) -->

<!ELEMENT recipe (head, description*, equipment?, ingredients, directions, nutrition?, diet-exchanges?)>

<!-- Required elements (no modifier = exactly one) -->
<!ELEMENT head (#PCDATA)>
<!ELEMENT ingredients (#PCDATA)>
<!ELEMENT directions (#PCDATA)>

<!-- Optional elements (? = zero or one) -->
<!ELEMENT equipment (#PCDATA)>
<!ELEMENT nutrition (#PCDATA)>
<!ELEMENT diet-exchanges (#PCDATA)>

<!-- Zero or more (* = zero or more) -->
<!ELEMENT description (#PCDATA)>

Valid recipe, demonstrating Q1(j)(i): a recipe must have exactly one `ingredients` element.

In [None]:
%%writefile recipe_valid.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipeml.dtd">
<recipe>
  <head>Chocolate Chip Cookies</head>
  <description>A classic family recipe.</description>
  <ingredients>2 cups flour, 1 cup sugar, 1 cup chocolate chips</ingredients>
  <directions>Mix ingredients. Bake at 350F for 12 minutes.</directions>
</recipe>

In [None]:
# Validate the valid recipe
print("=== Testing Q1(j)(i): Recipe MUST have exactly one ingredients ===")
%xpath --dtd recipeml.dtd recipe_valid.xml

INVALID recipe with the wrong element order, demonstrating Q1(j)(ii): order matters.

In [None]:
%%writefile recipe_wrong_order.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipeml.dtd">
<recipe>
  <head>Bad Recipe</head>
  <directions>Cook somehow!</directions>
  <ingredients>Some stuff</ingredients>
</recipe>

In [None]:
# Validate - should FAIL (Q1(j)(ii): ingredients must come BEFORE directions)
print("=== Testing Q1(j)(ii): Order matters in DTD - ingredients before directions ===")
%xpath --dtd recipeml.dtd recipe_wrong_order.xml

In [None]:
# INVALID recipe - multiple ingredients (demonstrates Q1(j)(v) is FALSE)
%%writefile recipe_multi_ingredients.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe SYSTEM "recipeml.dtd">
<recipe>
  <head>Over-specified Recipe</head>
  <ingredients>First batch of ingredients</ingredients>
  <ingredients>Second batch of ingredients</ingredients>
  <directions>Mix everything!</directions>
</recipe>

In [None]:
# Validate - should FAIL (Q1(j)(v) is FALSE: can only have ONE ingredients)
print("=== Testing Q1(j)(v) is FALSE: Cannot have multiple ingredients ===")
%xpath --dtd recipeml.dtd recipe_multi_ingredients.xml
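
The content model in `recipeml.dtd` behaves like a regular expression over the sequence of child element names. As a rough illustration only (this is not how a DTD validator works internally), the sketch below translates the `recipe` content model by hand into a Python `re` pattern and applies it to shortened versions of the three recipes above. `CONTENT_MODEL` and `children_match` are names introduced here for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the DTD content model:
# (head, description*, equipment?, ingredients, directions, nutrition?, diet-exchanges?)
# Each child name is followed by a space so names cannot run together.
CONTENT_MODEL = re.compile(
    r"head (description )*(equipment )?ingredients directions "
    r"(nutrition )?(diet-exchanges )?"
)

def children_match(xml_text):
    """Return True if the root's children satisfy the recipe content model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(sequence) is not None

valid = ("<recipe><head>A</head><description>B</description>"
         "<ingredients>C</ingredients><directions>D</directions></recipe>")
wrong_order = ("<recipe><head>A</head><directions>D</directions>"
               "<ingredients>C</ingredients></recipe>")
two_ingredients = ("<recipe><head>A</head><ingredients>C</ingredients>"
                   "<ingredients>E</ingredients><directions>D</directions></recipe>")

print(children_match(valid))            # order and counts are correct
print(children_match(wrong_order))      # directions before ingredients
print(children_match(two_ingredients))  # ingredients appears twice
```

Only the first call reports a match, which mirrors the three validation outcomes expected from the cells above.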

---

# Question 2: Database Design and Querying [30 marks]

## Database Setup

In [None]:
%%sql
DROP TABLE IF EXISTS Tests;
DROP TABLE IF EXISTS Students;

CREATE TABLE Students (
    Id INT PRIMARY KEY,
    GivenName VARCHAR(50) NOT NULL,
    FamilyName VARCHAR(50) NOT NULL,
    Gender VARCHAR(10) NOT NULL,
    BirthDate DATE NOT NULL,
    School VARCHAR(130),
    City VARCHAR(130)
);

CREATE TABLE Tests (
    TestId INT PRIMARY KEY,
    StudentId INT,
    TestDate DATE,
    Score DOUBLE,
    FOREIGN KEY (StudentId) REFERENCES Students(Id)
);

INSERT INTO Students VALUES
(1, 'Alice', 'Smith', 'F', '2005-05-10', 'Birmingham High', 'Birmingham'),
(2, 'Bob', 'Jones', 'M', '2005-06-12', 'Berlin Academy', 'Berlin'),
(3, 'Charlie', 'Brown', 'M', '2004-03-20', 'Seoul International', 'Seoul'),
(4, 'Diana', 'Miles', 'F', '2005-01-01', 'Birmingham High', 'Birmingham');

INSERT INTO Tests VALUES
(101, 1, '2019-01-10', 50.5),
(102, 1, '2019-09-10', 55.0),
(103, 2, '2019-01-10', 80.9),
(104, 2, '2019-09-15', 77.2),
(105, 3, '2019-05-01', 91.0),
(106, 4, '2019-01-10', 63.0);

SELECT 'Database ready!' AS Status;

## Q2(a): Which aggregate function is used? [1 mark]

### Solution

**Q2(a) SOLUTION**

Answer: AVG()

The AVG() function calculates the arithmetic mean of Score values.

In the query:
  SELECT AVG(Score) AS Average, ...

Other common aggregate functions:
- COUNT() - count rows
- SUM() - sum values
- MIN() - minimum value
- MAX() - maximum value
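
To see these aggregate functions side by side without a running MySQL server, the sketch below uses Python's built-in `sqlite3` module with the six sample scores from the setup cell. SQLite is a stand-in chosen here for illustration; for this data the five functions behave the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tests (TestId INTEGER PRIMARY KEY, StudentId INTEGER, Score REAL)"
)
conn.executemany(
    "INSERT INTO Tests VALUES (?, ?, ?)",
    [(101, 1, 50.5), (102, 1, 55.0), (103, 2, 80.9),
     (104, 2, 77.2), (105, 3, 91.0), (106, 4, 63.0)],
)

# One row summarising the whole table with each aggregate function
count, total, lowest, highest, average = conn.execute(
    "SELECT COUNT(*), SUM(Score), MIN(Score), MAX(Score), AVG(Score) FROM Tests"
).fetchone()

print(count, lowest, highest)
print(round(total, 1), round(average, 1))
conn.close()
```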

In [None]:
%%sql
-- Demonstrate AVG function
SELECT AVG(Score) AS AverageScore,
       YEAR(TestDate) AS TestYear,
       S.Gender,
       S.City
FROM Tests T
INNER JOIN Students S ON T.StudentId = S.Id
GROUP BY YEAR(TestDate), S.Gender, S.City;

In [None]:
%%sql
-- Try your own query here


## Q2(d): Aggregated data access [4 marks]

### Solution

**Q2(d) SOLUTION**

Approach: Create a VIEW that exposes only aggregated data.

This protects individual student records while allowing research.

In [None]:
%%sql
-- Q2(d) SOLUTION: Create aggregated view
DROP VIEW IF EXISTS AggregatedTestData;

CREATE VIEW AggregatedTestData AS
SELECT S.Gender,
       S.City,
       AVG(T.Score) AS AvgScore,
       COUNT(*) AS SampleSize,
       YEAR(T.TestDate) AS TestYear
FROM Tests T
INNER JOIN Students S ON T.StudentId = S.Id
GROUP BY S.Gender, S.City, YEAR(T.TestDate)
HAVING COUNT(DISTINCT S.Id) >= 2;  -- Minimum number of distinct students per group, for privacy

SELECT 'View created!' AS Status;

In [None]:
%%sql
-- View the aggregated data
SELECT * FROM AggregatedTestData;

In [None]:
%%sql
-- Try your own query here


**Grant access only to the view**

Researchers are then granted `SELECT` on the view only, not on the underlying tables (assuming a `researcher` account already exists):

GRANT SELECT ON exam_db.AggregatedTestData TO 'researcher'@'localhost';

Benefits:
- Researcher cannot see individual student records
- Only aggregated statistics visible
- HAVING clause ensures minimum group sizes
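
One subtlety with minimum group sizes is what gets counted. In the sample data the (M, Berlin, 2019) group has two test rows but only one student, so a threshold on the row count would still release that student's personal average. The sketch below, in plain Python with a hypothetical threshold `MIN_STUDENTS`, counts distinct students instead.

```python
from collections import defaultdict

# (StudentId, Gender, City, TestYear, Score), copied from the sample data above
rows = [
    (1, "F", "Birmingham", 2019, 50.5),
    (1, "F", "Birmingham", 2019, 55.0),
    (2, "M", "Berlin", 2019, 80.9),
    (2, "M", "Berlin", 2019, 77.2),
    (3, "M", "Seoul", 2019, 91.0),
    (4, "F", "Birmingham", 2019, 63.0),
]

MIN_STUDENTS = 2  # hypothetical privacy threshold

groups = defaultdict(lambda: {"students": set(), "scores": []})
for student_id, gender, city, year, score in rows:
    group = groups[(gender, city, year)]
    group["students"].add(student_id)
    group["scores"].append(score)

# Release a group only if it covers enough distinct students
released = {
    key: sum(g["scores"]) / len(g["scores"])
    for key, g in groups.items()
    if len(g["students"]) >= MIN_STUDENTS
}
print(released)
```

Only the Birmingham group, which covers two students, survives the filter.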

## Q2(f): Student table problems [8 marks]

### Solution

**Q2(f) SOLUTION**

Problems with the Student table and solutions:

| Problem                  | Issue                           | Resolution                        |
|--------------------------|---------------------------------|-----------------------------------|
| VARCHAR(25) as PK        | Inefficient indexing            | Use INT AUTO_INCREMENT            |
| Gender ENUM('M','F')     | Binary-only, not inclusive      | Use VARCHAR or add more options   |
| School as free text      | Inconsistent entries            | Create Schools table, use FK      |
| City as free text        | Duplicates, typos possible      | Create Cities table, use FK       |
| No referential integrity | Can't ensure valid school/city  | Add foreign key constraints       |
| School/City can be NULL  | Data quality issues             | Consider NOT NULL or defaults     |

In [None]:
%%sql
-- Q2(f) SOLUTION: Improved normalized schema
DROP TABLE IF EXISTS TestsNorm;
DROP TABLE IF EXISTS StudentsNorm;
DROP TABLE IF EXISTS Schools;
DROP TABLE IF EXISTS Cities;

CREATE TABLE Cities (
    CityId INT PRIMARY KEY AUTO_INCREMENT,
    CityName VARCHAR(100) NOT NULL,
    Country VARCHAR(50)
);

CREATE TABLE Schools (
    SchoolId INT PRIMARY KEY AUTO_INCREMENT,
    SchoolName VARCHAR(130) NOT NULL,
    CityId INT,
    FOREIGN KEY (CityId) REFERENCES Cities(CityId)
);

CREATE TABLE StudentsNorm (
    Id INT PRIMARY KEY AUTO_INCREMENT,
    ExternalId VARCHAR(25) UNIQUE,
    GivenName VARCHAR(80) NOT NULL,
    FamilyName VARCHAR(80) NOT NULL,
    Gender VARCHAR(20),
    BirthDate DATE NOT NULL,
    SchoolId INT,
    CityId INT,
    FOREIGN KEY (SchoolId) REFERENCES Schools(SchoolId),
    FOREIGN KEY (CityId) REFERENCES Cities(CityId)
);

SELECT 'Normalized schema created!' AS Status;

---

# Question 3: XML, XPath, and Relational Models [30 marks]

## XML Setup

In [None]:
%%writefile manuscript.xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:id="manuscript_3945" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xmlns:tei="http://www.tei-c.org/ns/1.0">
    <fileDesc>
      <titleStmt>
        <title>Christ Church MS. 341</title>
        <title type="collection">Christ Church MSS.</title>
        <respStmt>
          <resp>Cataloguer</resp>
          <persName>Ralph Hanna</persName>
          <persName>David Rundle</persName>
        </respStmt>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>

In [None]:
%%writefile msdesc.rng
<?xml version="1.0" encoding="UTF-8"?>
<!-- Simplified RelaxNG schema for TEI manuscript descriptions -->
<!-- Based on TEI P5 msdesc module - simplified for exam practice -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:tei="http://www.tei-c.org/ns/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <start>
    <ref name="TEI"/>
  </start>

  <define name="TEI">
    <element name="TEI" ns="http://www.tei-c.org/ns/1.0">
      <attribute name="xml:id"/>
      <ref name="teiHeader"/>
    </element>
  </define>

  <define name="teiHeader">
    <element name="teiHeader" ns="http://www.tei-c.org/ns/1.0">
      <!-- Namespace declarations (xmlns:*) are not attributes in RELAX NG, so none is declared here -->
      <ref name="fileDesc"/>
    </element>
  </define>

  <define name="fileDesc">
    <element name="fileDesc" ns="http://www.tei-c.org/ns/1.0">
      <ref name="titleStmt"/>
    </element>
  </define>

  <define name="titleStmt">
    <element name="titleStmt" ns="http://www.tei-c.org/ns/1.0">
      <!-- At least one title is REQUIRED -->
      <oneOrMore>
        <ref name="title"/>
      </oneOrMore>
      <!-- respStmt is OPTIONAL (zero or more) -->
      <zeroOrMore>
        <ref name="respStmt"/>
      </zeroOrMore>
    </element>
  </define>

  <define name="title">
    <element name="title" ns="http://www.tei-c.org/ns/1.0">
      <optional>
        <attribute name="type"/>
      </optional>
      <text/>
    </element>
  </define>

  <define name="respStmt">
    <element name="respStmt" ns="http://www.tei-c.org/ns/1.0">
      <ref name="resp"/>
      <oneOrMore>
        <ref name="persName"/>
      </oneOrMore>
    </element>
  </define>

  <define name="resp">
    <element name="resp" ns="http://www.tei-c.org/ns/1.0">
      <text/>
    </element>
  </define>

  <define name="persName">
    <element name="persName" ns="http://www.tei-c.org/ns/1.0">
      <text/>
    </element>
  </define>

</grammar>

In [None]:
# Validate manuscript.xml against the RelaxNG schema
print("=== Validating manuscript.xml against msdesc.rng ===")
%xpath --rng msdesc.rng manuscript.xml

## Q3(b): Well-formedness [3 marks]

### Solution

**Q3(b) SOLUTION**

Answer: YES, this fragment is well-formed.

Well-formedness requirements met:
1. Exactly one root element (<TEI>) ✓
2. All tags properly opened and closed ✓
3. Proper nesting (no overlapping tags) ✓
4. Attribute values in quotes ✓
5. Valid characters in element/attribute names ✓

Note: The exam question may show a truncated fragment.
If closing tags are missing, it would NOT be well-formed.
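
As a quick illustration of the difference between a complete and a truncated fragment, the sketch below uses Python's standard `xml.etree.ElementTree` parser, which rejects anything that is not well-formed. The three sample strings and the `is_well_formed` helper are made up here for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

complete = "<TEI><teiHeader><fileDesc/></teiHeader></TEI>"
truncated = "<TEI><teiHeader><fileDesc>"    # closing tags missing
overlapping = "<TEI><a><b></a></b></TEI>"   # tags not properly nested

print(is_well_formed(complete))
print(is_well_formed(truncated))
print(is_well_formed(overlapping))
```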

In [None]:
# Verify well-formedness
%xpath manuscript.xml

## Q3(c): XPath expression //fileDesc//title/@type [2 marks]

### Solution

**Q3(c) SOLUTION**

Answer: The query selects the 'type' attribute of every <title>
element that is a descendant of <fileDesc>, at any depth.

Result: "collection"

This is from: <title type="collection">Christ Church MSS.</title>
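
Because `manuscript.xml` declares a default namespace, every element in it belongs to the TEI namespace, and an XPath 1.0 name test without a prefix matches only elements in no namespace. That is why the runnable cell uses the `tei:` prefix. The sketch below shows the same effect with Python's standard `xml.etree.ElementTree`, on a shortened copy of the document; it approximates the XPath with nested iteration rather than evaluating the expression itself.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>Christ Church MS. 341</title>
    <title type="collection">Christ Church MSS.</title>
  </titleStmt></fileDesc></teiHeader>
</TEI>"""
root = ET.fromstring(doc)

# Rough equivalent of //tei:fileDesc//tei:title/@type
types = [
    title.get("type")
    for file_desc in root.iter(TEI_NS + "fileDesc")
    for title in file_desc.iter(TEI_NS + "title")
    if title.get("type") is not None
]
print(types)

# Without the namespace, no element matches at all
unqualified = list(root.iter("fileDesc"))
print(unqualified)
```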

In [None]:
%%xpath --ns tei=http://www.tei-c.org/ns/1.0 manuscript.xml
//tei:fileDesc//tei:title/@type

In [None]:
%%xpath --ns tei=http://www.tei-c.org/ns/1.0 manuscript.xml
# Try your own XPath here

## Q3(d): XPath //resp[text()='Cataloguer']/../persName [2 marks]

### Solution

**Q3(d) SOLUTION**

Answer: All <persName> elements that share a parent with a <resp>
element whose text is exactly "Cataloguer".

Result:
- <persName>Ralph Hanna</persName>
- <persName>David Rundle</persName>

How it works:
1. //resp[text()='Cataloguer'] - Find <resp> with text "Cataloguer"
2. /.. - Navigate UP to parent (<respStmt>)
3. /persName - Select <persName> children
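
The three steps can be mimicked in plain Python with `xml.etree.ElementTree`. Since ElementTree elements do not keep a pointer to their parent, this sketch starts from each `respStmt` and looks down at its children instead of climbing up from `resp`; the outcome is the same. The second `respStmt` ("Editor", "Someone Else") is hypothetical data added only to show that it is filtered out.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

doc = """<titleStmt xmlns="http://www.tei-c.org/ns/1.0">
  <respStmt>
    <resp>Cataloguer</resp>
    <persName>Ralph Hanna</persName>
    <persName>David Rundle</persName>
  </respStmt>
  <respStmt>
    <resp>Editor</resp>
    <persName>Someone Else</persName>
  </respStmt>
</titleStmt>"""
root = ET.fromstring(doc)

names = []
for stmt in root.iter(TEI_NS + "respStmt"):
    # Keep this respStmt only if one of its resp children is exactly "Cataloguer"
    resps = stmt.findall(TEI_NS + "resp")
    if any(resp.text == "Cataloguer" for resp in resps):
        # Collect the persName children of the same parent
        names.extend(p.text for p in stmt.findall(TEI_NS + "persName"))

print(names)
```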

In [None]:
%%xpath --ns tei=http://www.tei-c.org/ns/1.0 manuscript.xml
//tei:resp[text()='Cataloguer']/../tei:persName/text()

In [None]:
%%xpath --ns tei=http://www.tei-c.org/ns/1.0 manuscript.xml
# Try your own XPath here

## Q3(f): Relational model for manuscript contents [8 marks]

### Solution

**Q3(f) SOLUTION**

Problems with n="2" attribute in relational model:

1. ORDERING NOT IMPLICIT
   - Relational tables have no inherent row order
   - Must store sequence explicitly

2. ATTRIBUTE VS COLUMN
   - The n attribute needs explicit storage as a column

3. NESTED STRUCTURE
   - Manuscripts contain items which may have sub-items
   - Requires careful modeling

SOLUTION: Store sequence number explicitly

In [None]:
%%sql
-- Q3(f) SOLUTION: Relational schema for manuscript contents
DROP TABLE IF EXISTS ManuscriptItems;
DROP TABLE IF EXISTS Manuscripts;

CREATE TABLE Manuscripts (
    ManuscriptId VARCHAR(50) PRIMARY KEY,
    Title VARCHAR(200)
);

CREATE TABLE ManuscriptItems (
    ItemId INT PRIMARY KEY AUTO_INCREMENT,
    ManuscriptId VARCHAR(50),
    ItemNumber INT NOT NULL,  -- Explicit ordering
    Incipit TEXT,
    Explicit TEXT,
    Notes TEXT,
    FOREIGN KEY (ManuscriptId) REFERENCES Manuscripts(ManuscriptId),
    UNIQUE (ManuscriptId, ItemNumber)  -- No duplicate item numbers
);

-- Insert sample data
INSERT INTO Manuscripts VALUES ('manuscript_3945', 'Christ Church MS. 341');
INSERT INTO ManuscriptItems (ManuscriptId, ItemNumber, Incipit) VALUES
('manuscript_3945', 1, 'First textual item...'),
('manuscript_3945', 2, 'Seynt austyn sei in e secounde boke...');

SELECT 'Schema created!' AS Status;

In [None]:
%%sql
-- Query items in correct order
SELECT ManuscriptId, ItemNumber, Incipit
FROM ManuscriptItems
WHERE ManuscriptId = 'manuscript_3945'
ORDER BY ItemNumber;

In [None]:
%%sql
-- Try your own query here


## Q3(i) & Q3(j): Omitting elements [2 marks]

### Solution

**Q3(i) and Q3(j) SOLUTION**

Q3(i): If respStmt was omitted, would the XML be legal?

Answer:
- Well-formed: YES (the remaining tags are still properly closed and nested)
- Valid: YES under the exam schema
  - The exam schema shows <zeroOrMore><ref name="model.respLike"/></zeroOrMore>
  - respStmt is therefore OPTIONAL, so omitting it leaves the document valid

---

Q3(j): If title elements were omitted, would the XML be legal?

Answer:
- Well-formed: YES (syntactically correct)
- Valid: NO
  - Schema shows <oneOrMore><ref name="title"/></oneOrMore>
  - At least one <title> is REQUIRED in <titleStmt>
  - Without it, validation fails
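
The two cardinality rules can be pictured as simple counts. The sketch below is a hand-written stand-in for the schema check, not a RELAX NG validator: it only counts children of `<titleStmt>` and ignores element order and unexpected elements. `title_stmt_problems` is a helper name introduced here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

def title_stmt_problems(xml_text):
    """List cardinality problems in a <titleStmt>, mimicking the schema's rules."""
    stmt = ET.fromstring(xml_text)
    problems = []
    # <oneOrMore> title: at least one is required
    if len(stmt.findall(TEI_NS + "title")) < 1:
        problems.append("at least one <title> is required")
    # <zeroOrMore> respStmt: none is fine, but each one needs a persName
    for resp_stmt in stmt.findall(TEI_NS + "respStmt"):
        if len(resp_stmt.findall(TEI_NS + "persName")) < 1:
            problems.append("<respStmt> needs at least one <persName>")
    return problems

no_resp = ('<titleStmt xmlns="http://www.tei-c.org/ns/1.0">'
           '<title>MS. 341</title></titleStmt>')
no_title = ('<titleStmt xmlns="http://www.tei-c.org/ns/1.0"><respStmt>'
            '<resp>Cataloguer</resp><persName>Ralph Hanna</persName>'
            '</respStmt></titleStmt>')

print(title_stmt_problems(no_resp))   # respStmt omitted
print(title_stmt_problems(no_title))  # title omitted
```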

Q3(i) validation test: XML without respStmt (should PASS, because respStmt is optional).

In [None]:
%%writefile manuscript_no_respstmt.xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:id="manuscript_3945" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xmlns:tei="http://www.tei-c.org/ns/1.0">
    <fileDesc>
      <titleStmt>
        <title>Christ Church MS. 341</title>
        <title type="collection">Christ Church MSS.</title>
        <!-- respStmt is OMITTED - but this is OK because it's optional -->
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>

In [None]:
print("=== Q3(i) Test: Validating XML without respStmt ===")
%xpath --rng msdesc.rng manuscript_no_respstmt.xml

Q3(j) validation test: XML without title (should FAIL, because title is required).

In [None]:
%%writefile manuscript_no_title.xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:id="manuscript_3945" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xmlns:tei="http://www.tei-c.org/ns/1.0">
    <fileDesc>
      <titleStmt>
        <!-- NO title elements - this violates the schema! -->
        <respStmt>
          <resp>Cataloguer</resp>
          <persName>Ralph Hanna</persName>
        </respStmt>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>

In [None]:
print("=== Q3(j) Test: Validating XML without title elements ===")
%xpath --rng msdesc.rng manuscript_no_title.xml

---

# Question 4: RDF, Ontologies, and Linked Data [30 marks]

## RDF Setup

In [None]:
%%writefile annotation.ttl
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix myrdf: <http://example.org/> .
@prefix armadale: <https://literary-greats.com/WCollins/Armadale/> .

myrdf:anno-001 a oa:Annotation ;
    dcterms:created "2015-10-13T13:00:00+00:00"^^xsd:dateTime ;
    dcterms:creator myrdf:DL192 ;
    oa:hasBody [
        a oa:TextualBody ;
        rdf:value "Note the use of visual language here."
    ] ;
    oa:hasTarget [
        a oa:SpecificResource ;
        oa:hasSelector [
            a oa:TextPositionSelector ;
            oa:start 235 ;
            oa:end 300
        ] ;
        oa:hasSource armadale:Chapter3
    ] ;
    oa:motivatedBy oa:commenting .

myrdf:DL192 a foaf:Person ;
    foaf:name "David Lewis" .

In [None]:
%%sparql --file annotation.ttl
SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }

## Q4(d): Fix the SPARQL query [7 marks]

### Solution

**Q4(d) SOLUTION - Show the broken and fixed queries**

BROKEN QUERY:

SELECT ?body ?creator
WHERE {
  ?annotation a oa:Annotation .
  ?creator ;
  oa:hasBody body .
  hasSource armadale:Chapter3 }

PROBLEMS:
1. Missing PREFIX declarations
2. ?creator has no predicate connecting it
3. 'body' should be '?body' (variable)
4. 'hasSource' should be 'oa:hasSource'
5. Need to navigate through oa:hasTarget to get to oa:hasSource
6. Need to get actual text value and creator name

---

CORRECTED QUERY:

PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX armadale: <https://literary-greats.com/WCollins/Armadale/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?bodyText ?creatorName
WHERE {
  ?annotation a oa:Annotation ;
              dcterms:creator ?creator ;
              oa:hasBody ?body ;
              oa:hasTarget ?target .
  ?body rdf:value ?bodyText .
  ?target oa:hasSource armadale:Chapter3 .
  ?creator foaf:name ?creatorName .
}
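
To make the idea of matching triple patterns concrete, here is a minimal pure-Python sketch of how the corrected WHERE clause can be evaluated: each pattern is unified against every triple, and variable bindings are carried from one pattern to the next. This is a toy matcher written for illustration, not how a real SPARQL engine is implemented, and the blank node labels `_:body1` and `_:target1` are arbitrary. The keyword `a` is written out as `rdf:type`.

```python
TRIPLES = [
    ("myrdf:anno-001", "rdf:type", "oa:Annotation"),
    ("myrdf:anno-001", "dcterms:creator", "myrdf:DL192"),
    ("myrdf:anno-001", "oa:hasBody", "_:body1"),
    ("myrdf:anno-001", "oa:hasTarget", "_:target1"),
    ("_:body1", "rdf:value", "Note the use of visual language here."),
    ("_:target1", "oa:hasSource", "armadale:Chapter3"),
    ("myrdf:DL192", "foaf:name", "David Lewis"),
]

# The seven triple patterns of the corrected query
PATTERNS = [
    ("?annotation", "rdf:type", "oa:Annotation"),
    ("?annotation", "dcterms:creator", "?creator"),
    ("?annotation", "oa:hasBody", "?body"),
    ("?annotation", "oa:hasTarget", "?target"),
    ("?body", "rdf:value", "?bodyText"),
    ("?target", "oa:hasSource", "armadale:Chapter3"),
    ("?creator", "foaf:name", "?creatorName"),
]

def unify(pattern, triple, bindings):
    """Extend bindings so pattern equals triple, or return None on a clash."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def match(patterns, bindings=None):
    """Yield every variable binding that satisfies all patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in TRIPLES:
        extended = unify(patterns[0], triple, bindings)
        if extended is not None:
            yield from match(patterns[1:], extended)

solutions = [(b["?bodyText"], b["?creatorName"]) for b in match(PATTERNS)]
print(solutions)
```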

In [None]:
%%sparql --file annotation.ttl
PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX armadale: <https://literary-greats.com/WCollins/Armadale/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?bodyText ?creatorName
WHERE {
  ?annotation a oa:Annotation ;
              dcterms:creator ?creator ;
              oa:hasBody ?body ;
              oa:hasTarget ?target .
  ?body rdf:value ?bodyText .
  ?target oa:hasSource armadale:Chapter3 .
  ?creator foaf:name ?creatorName .
}

In [None]:
%%sparql --file annotation.ttl
# Try your own query here


## Q4(f): Tables and keys [5 marks]

### Solution

**Q4(f) SOLUTION**

Tables for relational implementation:

OPTION 1: SINGLE TRIPLE TABLE

| Table   | Primary Key                     | Foreign Keys |
|---------|---------------------------------|--------------|
| Triples | (Subject, Predicate, Object)    | None         |

OPTION 2: TRADITIONAL RELATIONAL

| Table       | Primary Key   | Foreign Keys                    |
|-------------|---------------|---------------------------------|
| Persons     | PersonId      | -                               |
| Annotations | AnnotationId  | CreatorId -> Persons(PersonId)  |
| Bodies      | BodyId        | AnnotationId -> Annotations     |
| Targets     | TargetId      | AnnotationId -> Annotations     |
| Sources     | SourceId      | -                               |
| Selectors   | SelectorId    | TargetId -> Targets(TargetId)   |

In [None]:
%%sql
-- Q4(f) SOLUTION: Create Triple Store table
DROP TABLE IF EXISTS Triples;

CREATE TABLE Triples (
    Subject VARCHAR(255),
    Predicate VARCHAR(255),
    Object VARCHAR(255),
    -- 3 x 255 utf8mb4 characters = 3060 bytes, within InnoDB's 3072-byte index key limit
    PRIMARY KEY (Subject, Predicate, Object)
);

-- Insert sample annotation data
INSERT INTO Triples VALUES
('myrdf:anno-001', 'rdf:type', 'oa:Annotation'),
('myrdf:anno-001', 'dcterms:creator', 'myrdf:DL192'),
('myrdf:anno-001', 'oa:hasBody', '_:body1'),
('myrdf:anno-001', 'oa:hasTarget', '_:target1'),
('_:body1', 'rdf:value', 'Note the use of visual language here.'),
('_:target1', 'oa:hasSource', 'armadale:Chapter3'),
('myrdf:DL192', 'rdf:type', 'foaf:Person'),
('myrdf:DL192', 'foaf:name', 'David Lewis');

SELECT 'Triple store created!' AS Status;

## Q4(g): MySQL equivalent query [3 marks]

### Solution

**Q4(g) SOLUTION explanation**

MySQL query equivalent for the SPARQL query:

Using the triple store design, we need multiple self-joins:

In [None]:
%%sql
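-- Q4(g) SOLUTION: MySQL equivalent query
-- DISTINCT is needed because tAnno has no predicate filter and matches every triple about the annotation
SELECT DISTINCT tBodyVal.Object AS BodyText,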
       tCreatorName.Object AS CreatorName
FROM Triples tAnno
INNER JOIN Triples tType
    ON tAnno.Subject = tType.Subject
INNER JOIN Triples tBody
    ON tAnno.Subject = tBody.Subject
INNER JOIN Triples tBodyVal
    ON tBody.Object = tBodyVal.Subject
INNER JOIN Triples tCreator
    ON tAnno.Subject = tCreator.Subject
INNER JOIN Triples tCreatorName
    ON tCreator.Object = tCreatorName.Subject
INNER JOIN Triples tTarget
    ON tAnno.Subject = tTarget.Subject
INNER JOIN Triples tSource
    ON tTarget.Object = tSource.Subject
WHERE tType.Predicate = 'rdf:type'
  AND tType.Object = 'oa:Annotation'
  AND tBody.Predicate = 'oa:hasBody'
  AND tBodyVal.Predicate = 'rdf:value'
  AND tCreator.Predicate = 'dcterms:creator'
  AND tCreatorName.Predicate = 'foaf:name'
  AND tTarget.Predicate = 'oa:hasTarget'
  AND tSource.Predicate = 'oa:hasSource'
  AND tSource.Object = 'armadale:Chapter3';

In [None]:
%%sql
-- Try your own query here


**Q4(g) Explanation**

Note: This demonstrates why SPARQL is more natural for RDF data.

The SQL query joins EIGHT aliases of the same table (seven self-joins) to traverse the graph structure,
whereas SPARQL handles this pattern matching naturally.

Each alias except tAnno corresponds to one triple pattern:
- tAnno: Any triple whose subject is the annotation (anchors the other joins)
- tType: Verify it's an oa:Annotation
- tBody: Find oa:hasBody
- tBodyVal: Get rdf:value from body
- tCreator: Find dcterms:creator
- tCreatorName: Get foaf:name from creator
- tTarget: Find oa:hasTarget
- tSource: Verify oa:hasSource is armadale:Chapter3
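
For comparison, the sketch below runs a variant of the self-join with exactly one alias per triple pattern (seven aliases, with tType acting as the anchor instead of a separate tAnno). It uses Python's built-in `sqlite3` as a stand-in for MySQL, purely so that it can run without a server; the table contents are the same eight triples inserted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (Subject TEXT, Predicate TEXT, Object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", [
    ("myrdf:anno-001", "rdf:type", "oa:Annotation"),
    ("myrdf:anno-001", "dcterms:creator", "myrdf:DL192"),
    ("myrdf:anno-001", "oa:hasBody", "_:body1"),
    ("myrdf:anno-001", "oa:hasTarget", "_:target1"),
    ("_:body1", "rdf:value", "Note the use of visual language here."),
    ("_:target1", "oa:hasSource", "armadale:Chapter3"),
    ("myrdf:DL192", "rdf:type", "foaf:Person"),
    ("myrdf:DL192", "foaf:name", "David Lewis"),
])

# One alias per triple pattern; every join follows a shared variable
query = """
SELECT tBodyVal.Object, tCreatorName.Object
FROM Triples tType
JOIN Triples tBody        ON tBody.Subject = tType.Subject
JOIN Triples tBodyVal     ON tBodyVal.Subject = tBody.Object
JOIN Triples tCreator     ON tCreator.Subject = tType.Subject
JOIN Triples tCreatorName ON tCreatorName.Subject = tCreator.Object
JOIN Triples tTarget      ON tTarget.Subject = tType.Subject
JOIN Triples tSource      ON tSource.Subject = tTarget.Object
WHERE tType.Predicate = 'rdf:type' AND tType.Object = 'oa:Annotation'
  AND tBody.Predicate = 'oa:hasBody'
  AND tBodyVal.Predicate = 'rdf:value'
  AND tCreator.Predicate = 'dcterms:creator'
  AND tCreatorName.Predicate = 'foaf:name'
  AND tTarget.Predicate = 'oa:hasTarget'
  AND tSource.Predicate = 'oa:hasSource'
  AND tSource.Object = 'armadale:Chapter3'
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Because every alias is pinned to a single predicate, this variant yields one row per matching annotation without needing to remove duplicates.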