<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/september-2024/notebook-september-2024-solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 September 2024 - Solutions Notebook

This notebook contains **complete solutions** for the September 2024 exam.

**Exam Structure:**
- Section A: 10 MCQs - 40 marks
- Section B: Answer 2 of 3 questions - 60 marks
  - Q2: Historical Lute Music Database
  - Q3: Poetry Contest XML/TEI
  - Q4: Wikidata SPARQL / Belgian Artists / MongoDB
- Both parts completed together on Inspera (4 hours total)

**Instructions:**
1. Run the Setup cells first
2. All solution cells are pre-filled with correct answers
3. Compare with your own attempts from the practice notebook

---

# 1. Environment Setup

Run these cells first to set up MySQL, MongoDB, xmllint, jing, and SPARQL.

In [None]:
# === MySQL Setup ===
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# Create user and database
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost';"

# === xmllint Setup (for XML/XPath exercises and schema validation) ===
!apt -y -qq install libxml2-utils > /dev/null

# === jing Setup (for RelaxNG validation) ===
!apt -y -qq install jing > /dev/null

# === Python libraries ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0 lxml sparqlwrapper

%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

print("MySQL ready!")
print("xmllint ready!")
print("jing ready (for RelaxNG validation)!")

In [None]:
# === MongoDB Setup ===
!wget -q http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb > /dev/null 2>&1
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add - > /dev/null 2>&1
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list > /dev/null
!apt-get update -qq > /dev/null
!apt-get install -y -qq mongodb-org > /dev/null
!mkdir -p /data/db
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

!mongo --quiet --eval 'print("MongoDB ready!")'

In [None]:
# === SPARQL Setup (for Wikidata queries) ===
from SPARQLWrapper import SPARQLWrapper, JSON

def run_sparql(query, endpoint="https://query.wikidata.org/sparql"):
    """Run a SPARQL query against Wikidata and print results."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    try:
        results = sparql.query().convert()
        for result in results["results"]["bindings"]:
            print(result)
        return results
    except Exception as e:
        print(f"Error: {e}")
        return None

print("SPARQL ready!")
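
`run_sparql` prints each raw binding, which is a nested dictionary. As a minimal sketch, the helper below flattens a result in the standard SPARQL JSON layout into plain rows. `bindings_to_rows` and the sample data are illustrative assumptions, not part of the original setup, and the sample is hand-written rather than real query output.

```python
def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    if not results:
        return []  # run_sparql returns None when the query fails
    rows = []
    for binding in results["results"]["bindings"]:
        # Each cell looks like {"type": ..., "value": ...}; keep only the value
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows

# Hand-written sample in the SPARQL JSON results layout (not real query output)
sample = {
    "head": {"vars": ["person", "personLabel"]},
    "results": {"bindings": [
        {"person": {"type": "uri", "value": "http://example.org/person/1"},
         "personLabel": {"type": "literal", "value": "Example Person"}},
    ]},
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row)
```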

---

# Question 2: Historical Lute Music Database [30 marks]

## Context

An enthusiast website stores historical lute music data in CSV files.

## Question 2(a) [6 marks]

### Solution

In [None]:
# Q2(a) SOLUTION
print("""ADVANTAGES OF DATABASE APPROACH:

1. Data Integrity:
   - Enforce that Conc_no in pieces matches valid concordance
   - Foreign keys prevent orphaned records

2. Reduced Redundancy:
   - Composer names stored once (e.g., "V. Gaultier" not repeated)
   - Update in one place affects all references

3. Query Capabilities:
   - SQL allows complex queries (find all pieces by V. Gaultier)
   - JOIN across Sources and Pieces easily

4. Concurrent Access:
   - Multiple users can edit safely with transactions
   - No file locking issues

DISADVANTAGES OF DATABASE APPROACH:

1. Setup Complexity:
   - Need database server infrastructure
   - Schema design effort required

2. Learning Curve:
   - Users must know SQL instead of editing CSV
   - Current workflow with GitHub would change

3. Version Control:
   - Harder to track changes with Git
   - CSV diffs are human-readable

4. Migration Effort:
   - Converting existing CSV files requires work
   - Need to parse Concordances field (contains list)
""")

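The data-integrity advantage can be illustrated with a small sketch. It uses Python's built-in SQLite as a stand-in for MySQL (an assumption made so the example runs without a server), and the two-table layout is a simplified version of the lute data. The point is only that a foreign key rejects a piece whose concordance number does not exist.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE Concordance (ConcNo TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Piece (
        PieceId INTEGER PRIMARY KEY,
        Title   TEXT,
        ConcNo  TEXT REFERENCES Concordance(ConcNo)
    )
""")

conn.execute("INSERT INTO Concordance VALUES ('Conc_51')")
# Valid: the referenced concordance exists
conn.execute("INSERT INTO Piece (Title, ConcNo) VALUES ('sans titre', 'Conc_51')")

def try_insert_orphan(connection):
    """Return True if the database rejects a piece with an unknown ConcNo."""
    try:
        connection.execute(
            "INSERT INTO Piece (Title, ConcNo) VALUES ('Caprice', 'Conc_404')"
        )
    except sqlite3.IntegrityError:
        return True
    return False

orphan_rejected = try_insert_orphan(conn)
piece_count = conn.execute("SELECT COUNT(*) FROM Piece").fetchone()[0]
print("Orphan rejected:", orphan_rejected)
print("Pieces stored:", piece_count)
```

A CSV file has no equivalent check: a mistyped `Conc_no` would simply be saved.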
## Question 2(b) [2 marks]

### Solution

In [None]:
# Q2(b) SOLUTION
print("""RECOMMENDED: Document database (e.g., MongoDB)

REASONS:
1. Semi-structured data: CSV files have varying fields
2. Flexible schema: Can import CSV directly without strict schema
3. Nested data: The "Concordances" field contains multiple locations
   (e.g., "NL-At/2v – F-PnVmb7/188") that map naturally to arrays

ALTERNATIVE ACCEPTABLE ANSWER:
Relational database - because the data already has clear entities
(Sources, Concordances, Pieces) with relationships between them.
""")

## Question 2(c) [12 marks]

### Solution

In [None]:
# Q2(c) SOLUTION - Model explanation
print("""PROPOSED RELATIONAL MODEL:

Tables:
1. Source - stores manuscript/print information
2. Composer - normalizes composer names
3. Concordance - musical works that appear in multiple sources
4. Piece - individual pieces in each source
5. ConcordanceLocation - junction table for concordance locations

CONCERNS:
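1. Composer ambiguity: "V. Gaultier or D. Gaultier" - needs an Uncertain flag
   or a junction table (not modeled in the schema below)
2. Date format: "1600-1680" is a range - ideally DateStart/DateEnd
   (kept as a single DateRange string in the schema below)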
3. Page numbering: "2v" uses folio notation - keep as VARCHAR
4. Concordance parsing: "NL-At/2v – F-PnVmb7/188" needs splitting
""")

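The parsing concerns above can be sketched in Python. The helper names are hypothetical, and the separator rules are inferred from the single example values quoted above (a spaced dash between locations, a slash before the page), so real data may need more cases.

```python
import re

def parse_concordance_locations(field):
    """Split 'NL-At/2v – F-PnVmb7/188' into [(source, page), ...]."""
    locations = []
    # Assumption: locations are separated by a dash (en dash or hyphen)
    # with whitespace on both sides; hyphens inside sigla are left alone.
    for part in re.split(r"\s+[–-]\s+", field.strip()):
        source, _, page = part.rpartition("/")  # page follows the last slash
        locations.append((source, page))
    return locations

def parse_date_range(text):
    """Turn '1600-1680' into (1600, 1680) and '1603' into (1603, 1603)."""
    years = [int(y) for y in re.findall(r"\d{4}", text)]
    return (years[0], years[-1])

print(parse_concordance_locations("NL-At/2v – F-PnVmb7/188"))
print(parse_date_range("1600-1680"))
print(parse_date_range("1603"))
```

Each `(source, page)` pair would become one row in `ConcordanceLocation`.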
In [None]:
%%sql
-- Q2(c) SOLUTION - CREATE TABLE statements
DROP TABLE IF EXISTS ConcordanceLocation;
DROP TABLE IF EXISTS Piece;
DROP TABLE IF EXISTS Concordance;
DROP TABLE IF EXISTS Source;
DROP TABLE IF EXISTS Composer;

CREATE TABLE Composer (
    ComposerId INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(100) NOT NULL UNIQUE
);

CREATE TABLE Source (
    RefShort VARCHAR(50) PRIMARY KEY,
    RefLong VARCHAR(200),
    Library VARCHAR(200),
    NameGerman VARCHAR(200),
    NameEnglish VARCHAR(200),
    DateRange VARCHAR(50),
    Instrument VARCHAR(100)
);

CREATE TABLE Concordance (
    ConcNo VARCHAR(20) PRIMARY KEY,
    ComposerId INT,
    FOREIGN KEY (ComposerId) REFERENCES Composer(ComposerId)
);

CREATE TABLE Piece (
    PieceId INT PRIMARY KEY AUTO_INCREMENT,
    SourceRef VARCHAR(50) NOT NULL,
    PieceNo INT NOT NULL,
    MusicalKey VARCHAR(20),
    PageNo VARCHAR(20),
    Title VARCHAR(200),
    ComposerId INT,
    ConcNo VARCHAR(20),
    FOREIGN KEY (SourceRef) REFERENCES Source(RefShort),
    FOREIGN KEY (ComposerId) REFERENCES Composer(ComposerId),
    FOREIGN KEY (ConcNo) REFERENCES Concordance(ConcNo),
    UNIQUE (SourceRef, PieceNo)
);

CREATE TABLE ConcordanceLocation (
    ConcNo VARCHAR(20),
    SourceRef VARCHAR(50),
    PageNo VARCHAR(20),
    PRIMARY KEY (ConcNo, SourceRef, PageNo),
    FOREIGN KEY (ConcNo) REFERENCES Concordance(ConcNo),
    FOREIGN KEY (SourceRef) REFERENCES Source(RefShort)
);

SELECT 'Tables created!' AS Status;

In [None]:
%%sql
-- Insert sample data for testing
INSERT INTO Composer (ComposerId, Name) VALUES
(1, 'V. Gaultier'),
(2, 'D. Gaultier'),
(3, 'John Dowland');

INSERT INTO Source VALUES
('NL-At', 'Ms. 205.B.32', 'Amsterdam, Toonkunst-Bibliotheek', NULL, NULL, '1600-1680', 'Baroque Lute'),
('D-DI_M297', 'Mscr Dresd. M. 297', 'Staats- und Universitätsbibliothek Dresden', 'Liederbuch', 'Songbook', '1603', 'Renaissance Lute');

INSERT INTO Concordance VALUES
('Conc_51', 1),
('Conc_15', 2),
('Conc_99', 3);  -- Dowland concordance

INSERT INTO Piece (SourceRef, PieceNo, MusicalKey, PageNo, Title, ComposerId, ConcNo) VALUES
('NL-At', 4, 'c minor', '2v', 'sans titre', 1, 'Conc_51'),
('NL-At', 17, 'd minor', '24v', 'Caprice', 2, 'Conc_15'),
('NL-At', 25, 'a minor', '30r', 'Lachrimae', 3, 'Conc_99'),
('D-DI_M297', 1, 'g major', '1r', 'Flow my tears', NULL, NULL),  -- Not in any concordance
('D-DI_M297', 5, 'e minor', '5v', 'Lachrimae Pavan', NULL, NULL);  -- Not linked to Dowland

SELECT 'Sample data inserted!' AS Status;

## Question 2(d) [6 marks]

### Solution

In [None]:
%%sql
-- Q2(d) SOLUTION: Find pieces with 'lachrimae' or 'flow' NOT in Dowland concordance
SELECT p.PieceId, p.Title, p.SourceRef, p.PageNo
FROM Piece p
WHERE (LOWER(p.Title) LIKE '%lachrimae%'
       OR LOWER(p.Title) LIKE '%flow%')
  AND (p.ConcNo IS NULL
       OR p.ConcNo NOT IN (
           SELECT c.ConcNo
           FROM Concordance c
           INNER JOIN Composer comp ON c.ComposerId = comp.ComposerId
           WHERE LOWER(comp.Name) LIKE '%john dowland%'
       ));

In [None]:
# Q2(d) SOLUTION explanation
print("""QUERY EXPLANATION:

1. WHERE clause filters pieces with 'lachrimae' or 'flow' in title
2. AND clause ensures piece is either:
   - Not in any concordance (ConcNo IS NULL), OR
   - In a concordance NOT associated with John Dowland

3. The subquery finds all concordances where the composer
   name contains 'john dowland' (case-insensitive)

4. Result: Pieces that might need to be added to Dowland's
   Lachrimae concordance group
""")

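Why the explicit `ConcNo IS NULL` test matters: in SQL, `NULL NOT IN (...)` evaluates to unknown, so rows without a concordance are silently dropped. The sketch below shows this with toy rows in SQLite (used as a stand-in for MySQL; the literal list stands in for the Dowland subquery, and the rows are illustrative rather than the notebook's sample data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Piece (Title TEXT, ConcNo TEXT);
    INSERT INTO Piece VALUES ('Lachrimae', 'Conc_99');        -- Dowland concordance
    INSERT INTO Piece VALUES ('Flow my tears', NULL);         -- no concordance
    INSERT INTO Piece VALUES ('Lachrimae Pavan', 'Conc_15');  -- another concordance
""")

dowland_concs = "('Conc_99')"  # stands in for the Dowland subquery

# NOT IN alone: the NULL row is lost because the comparison is unknown
without_null_check = [row[0] for row in conn.execute(
    f"SELECT Title FROM Piece WHERE ConcNo NOT IN {dowland_concs} ORDER BY Title"
)]

# With the IS NULL branch: pieces outside any concordance are kept
with_null_check = [row[0] for row in conn.execute(
    f"SELECT Title FROM Piece WHERE ConcNo IS NULL "
    f"OR ConcNo NOT IN {dowland_concs} ORDER BY Title"
)]

print("NOT IN only:      ", without_null_check)
print("IS NULL OR NOT IN:", with_null_check)
```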
## Question 2(e) [4 marks]

### Solution

In [None]:
# Q2(e) SOLUTION
print("""GRANT COMMAND FOR WEB APPLICATION:

CREATE USER 'webapp'@'localhost' IDENTIFIED BY 'secure_password';

GRANT SELECT, INSERT ON lutemusic.Source TO 'webapp'@'localhost';
GRANT SELECT, INSERT ON lutemusic.Piece TO 'webapp'@'localhost';
GRANT SELECT, INSERT ON lutemusic.Concordance TO 'webapp'@'localhost';
GRANT SELECT, INSERT ON lutemusic.ConcordanceLocation TO 'webapp'@'localhost';
GRANT SELECT ON lutemusic.Composer TO 'webapp'@'localhost';

KEY POINTS:
- SELECT: Read existing data
- INSERT: Add new sources and pieces (as specified)
- No UPDATE/DELETE: Prevent accidental data loss
- No INSERT on Composer: Prevent arbitrary composer creation
- Principle of least privilege applied
""")

---

# Question 3: Poetry Contest XML/TEI [30 marks]

## XML Setup

In [None]:
%%writefile poetry_contest.xml
<?xml version="1.0" encoding="UTF-8"?>
<contests xmlns:tei="http://www.tei-c.org/ns/1.0">
  <competition theme="limericks" date="2024-01-03">
    <entry>
      <authors>
        <author viaf="23156">Edward Lear</author>
      </authors>
      <poem>
        <tei:lg type="stanza">
          <tei:l>There was an old man of Dumbree</tei:l>
          <tei:l>Who taught little owls to drink tea</tei:l>
          <tei:l>For he said, "To eat mice is not proper or nice"</tei:l>
          <tei:l>That amiable man of Dumbree</tei:l>
        </tei:lg>
      </poem>
    </entry>
    <entry>
      <authors>
        <author viaf="12345">Anonymous</author>
      </authors>
      <poem>
        <tei:lg type="stanza">
          <tei:l>A wonderful bird is the pelican</tei:l>
          <tei:l>His bill can hold more than his belican</tei:l>
          <tei:l>He can take in his beak enough food for a week</tei:l>
          <tei:l>But I'm darned if I see how the helican</tei:l>
        </tei:lg>
      </poem>
    </entry>
  </competition>
  <competition theme="haiku" date="2024-02-15">
    <entry>
      <authors>
        <author viaf="67890">Matsuo Basho</author>
      </authors>
      <poem>
        <tei:lg type="stanza">
          <tei:l>An old silent pond</tei:l>
          <tei:l>A frog jumps into the pond</tei:l>
          <tei:l>Splash! Silence again</tei:l>
        </tei:lg>
      </poem>
    </entry>
  </competition>
</contests>

In [None]:
%%writefile poetry_contest.rng
<?xml version="1.0" encoding="UTF-8"?>
<!-- RelaxNG schema for Poetry Contest XML with TEI elements -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:tei="http://www.tei-c.org/ns/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <start>
    <ref name="contests"/>
  </start>

  <define name="contests">
    <element name="contests">
      <!-- xmlns:tei is a namespace declaration, not an attribute, so the schema does not list it -->
      <oneOrMore>
        <ref name="competition"/>
      </oneOrMore>
    </element>
  </define>

  <define name="competition">
    <element name="competition">
      <attribute name="theme"/>
      <attribute name="date">
        <data type="date"/>
      </attribute>
      <oneOrMore>
        <ref name="entry"/>
      </oneOrMore>
    </element>
  </define>

  <define name="entry">
    <element name="entry">
      <ref name="authors"/>
      <ref name="poem"/>
    </element>
  </define>

  <define name="authors">
    <element name="authors">
      <oneOrMore>
        <ref name="author"/>
      </oneOrMore>
    </element>
  </define>

  <define name="author">
    <element name="author">
      <optional>
        <attribute name="viaf"/>
      </optional>
      <text/>
    </element>
  </define>

  <define name="poem">
    <element name="poem">
      <oneOrMore>
        <ref name="tei-lg"/>
      </oneOrMore>
    </element>
  </define>

  <!-- TEI line group element -->
  <define name="tei-lg">
    <element name="tei:lg" ns="http://www.tei-c.org/ns/1.0">
      <optional>
        <attribute name="type"/>
      </optional>
      <oneOrMore>
        <ref name="tei-l"/>
      </oneOrMore>
    </element>
  </define>

  <!-- TEI line element -->
  <define name="tei-l">
    <element name="tei:l" ns="http://www.tei-c.org/ns/1.0">
      <text/>
    </element>
  </define>

</grammar>

In [None]:
# Validate the poetry contest XML against the schema
print("=== Validating poetry_contest.xml against poetry_contest.rng ===")
!jing poetry_contest.rng poetry_contest.xml && echo "VALID: poetry_contest.xml passes schema validation!"

## Question 3(a) [1 mark]

### Solution

In [None]:
# Q3(a) SOLUTION
print("Answer: XML (Extensible Markup Language)")

## Question 3(b) [3 marks]

### Solution

In [None]:
# Q3(b) SOLUTION
print("""They are PARTIALLY CORRECT but IMPRECISE.

More accurate statement:
"The file is XML that INCORPORATES TEI elements (specifically
tei:lg and tei:l for line groups and lines) within a custom schema.
It is NOT a pure TEI document because:

1. The root element is <contests>, not <TEI>
2. Structure elements (<competition>, <entry>, <authors>) are not TEI
3. TEI elements are used only for poem content via namespace prefix
4. A true TEI document would have:
   - <TEI> as root
   - <teiHeader> for metadata
   - <text> for content"
""")

## Question 3(c) [3 marks]

### Solution

In [None]:
# Q3(c) SOLUTION
print("""XPath Expression:

//competition[@theme='limericks']//entry//tei:l[1]

OR more explicitly:

//competition[@theme='limericks']/entry/poem/tei:lg/tei:l[1]

Explanation:
- //competition[@theme='limericks'] - Find competition with theme='limericks'
- //entry - All entry descendants
- //tei:l[1] - First tei:l child of each tei:lg (i.e. of each stanza);
  use tei:lg[1]/tei:l[1] if a poem may contain several stanzas
- Requires namespace binding for 'tei:' prefix
""")

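One subtlety of the `[1]` predicate is that it is evaluated per parent element, so `tei:l[1]` yields the first line of every stanza. The sketch below uses a hypothetical two-stanza entry and the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath, hence the explicit child steps) to show the difference.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ns = {"tei": TEI}

# Hypothetical entry with two stanzas, to show how tei:l[1] is evaluated
doc_text = f"""
<contests xmlns:tei="{TEI}">
  <competition theme="limericks" date="2024-01-03">
    <entry>
      <poem>
        <tei:lg type="stanza">
          <tei:l>Stanza one, line one</tei:l>
          <tei:l>Stanza one, line two</tei:l>
        </tei:lg>
        <tei:lg type="stanza">
          <tei:l>Stanza two, line one</tei:l>
        </tei:lg>
      </poem>
    </entry>
  </competition>
</contests>
"""
root = ET.fromstring(doc_text)

base = "./competition[@theme='limericks']/entry/poem"
# First line of EVERY stanza
per_stanza = [l.text for l in root.findall(f"{base}/tei:lg/tei:l[1]", ns)]
# First line of the FIRST stanza only
first_only = [l.text for l in root.findall(f"{base}/tei:lg[1]/tei:l[1]", ns)]

print(per_stanza)
print(first_only)
```

With the single-stanza poems in `poetry_contest.xml` both forms return the same lines.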
In [None]:
# Test the XPath
from lxml import etree

doc = etree.parse('poetry_contest.xml')
namespaces = {'tei': 'http://www.tei-c.org/ns/1.0'}

# Get first lines from limerick entries
result = doc.xpath("//competition[@theme='limericks']/entry/poem/tei:lg/tei:l[1]/text()", 
                   namespaces=namespaces)
print("First lines from limerick entries:")
for line in result:
    print(f"  - {line}")

## Question 3(d) [12 marks]

### Solution

In [None]:
# Q3(d) SOLUTION - Design explanation
print("""RELATIONAL MODEL DESIGN:

Tables:
1. Competition - theme, date
2. Author - name, VIAF ID
3. Entry - poem text, competition reference
4. EntryAuthor - junction table (entries can have multiple authors)
5. Judge - judge information
6. Assessment - scores from judges

Design Choices:
- Separate Author table: Authors may enter multiple competitions
- EntryAuthor junction: Entries can have multiple authors
- VIAF ID in Author: Links to authority file
- PoemText as TEXT: Preserves full poem
- Assessment separate: Allows multiple judges per entry
- UNIQUE (EntryId, JudgeId) on Assessment: Prevents a judge scoring the same entry twice

Normal Forms:
- 1NF: All atomic values, no repeating groups
- 2NF: No partial dependencies (Author separate from Entry)
- 3NF: No transitive dependencies (Judge info not dependent on Entry)
- BCNF: All determinants are candidate keys
""")

In [None]:
%%sql
-- Q3(d) SOLUTION - CREATE TABLE statements
DROP TABLE IF EXISTS Assessment;
DROP TABLE IF EXISTS EntryAuthor;
DROP TABLE IF EXISTS Entry;
DROP TABLE IF EXISTS Author;
DROP TABLE IF EXISTS Judge;
DROP TABLE IF EXISTS Competition;

CREATE TABLE Competition (
    CompetitionId INT PRIMARY KEY AUTO_INCREMENT,
    Theme VARCHAR(100) NOT NULL,
    CompDate DATE NOT NULL,
    UNIQUE (Theme, CompDate)
);

CREATE TABLE Author (
    AuthorId INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(200) NOT NULL,
    ViafId VARCHAR(50)
);

CREATE TABLE Entry (
    EntryId INT PRIMARY KEY AUTO_INCREMENT,
    CompetitionId INT NOT NULL,
    PoemText TEXT NOT NULL,
    SubmittedAt DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (CompetitionId) REFERENCES Competition(CompetitionId)
);

CREATE TABLE EntryAuthor (
    EntryId INT,
    AuthorId INT,
    PRIMARY KEY (EntryId, AuthorId),
    FOREIGN KEY (EntryId) REFERENCES Entry(EntryId),
    FOREIGN KEY (AuthorId) REFERENCES Author(AuthorId)
);

CREATE TABLE Judge (
    JudgeId INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(200) NOT NULL
);

CREATE TABLE Assessment (
    AssessmentId INT PRIMARY KEY AUTO_INCREMENT,
    EntryId INT NOT NULL,
    JudgeId INT NOT NULL,
    Score DECIMAL(4,2) NOT NULL,
    AssessedAt DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (EntryId) REFERENCES Entry(EntryId),
    FOREIGN KEY (JudgeId) REFERENCES Judge(JudgeId),
    UNIQUE (EntryId, JudgeId)
);

SELECT 'Tables created!' AS Status;

In [None]:
%%sql
-- Insert sample data
INSERT INTO Competition VALUES (1, 'limericks', '2024-01-03');
INSERT INTO Competition VALUES (2, 'haiku', '2024-02-15');

INSERT INTO Author VALUES (1, 'Edward Lear', '23156');
INSERT INTO Author VALUES (2, 'Anonymous', '12345');

INSERT INTO Entry (EntryId, CompetitionId, PoemText) VALUES
(1, 1, 'There was an old man of Dumbree...'),
(2, 1, 'A wonderful bird is the pelican...');

INSERT INTO EntryAuthor VALUES (1, 1), (2, 2);

INSERT INTO Judge VALUES (1, 'Judge A'), (2, 'Judge B'), (3, 'Judge C');

INSERT INTO Assessment (EntryId, JudgeId, Score) VALUES
(1, 1, 8.5), (1, 2, 9.0), (1, 3, 8.0),
(2, 1, 7.5), (2, 2, 7.0), (2, 3, 8.0);

SELECT 'Sample data inserted!' AS Status;

## Question 3(e) [5 marks]

### Solution

In [None]:
%%sql
-- Q3(e) SOLUTION: Winning entry for Limerick 3 Jan 2024
SELECT e.EntryId, e.PoemText, AVG(a.Score) AS AvgScore
FROM Entry e
INNER JOIN Competition c ON e.CompetitionId = c.CompetitionId
INNER JOIN Assessment a ON e.EntryId = a.EntryId
WHERE c.Theme = 'limericks'
  AND c.CompDate = '2024-01-03'
GROUP BY e.EntryId, e.PoemText
ORDER BY AvgScore DESC
LIMIT 1;

## Question 3(f) [6 marks]

### Solution

In [None]:
# Q3(f) SOLUTION
print("""XML vs RELATIONAL COMPARISON:

| Aspect           | XML Approach           | Relational Approach    |
|------------------|------------------------|------------------------|
| Poem Storage     | Excellent - preserves  | Poor - TEXT blob loses |
|                  | structure, formatting  | structure              |
| Judging/Scores   | Poor - custom queries  | Excellent - easy AVG   |
| Author Queries   | Moderate - XPath works | Excellent - simple JOIN|
| Schema Flex      | High - add elements    | Low - ALTER TABLE      |
| Data Integrity   | Low - no FK            | High - constraints     |

WHAT WORKS BEST:
- Store poem with formatting: XML
- Calculate average scores: Relational
- Find entries by author: Relational
- Display poem as submitted: XML
- Generate reports: Relational

HYBRID APPROACH BENEFITS:
1. Store each poem's XML in a column of the relational database
   (MySQL has no native XML type, so a TEXT column is used)
2. Relational tables for Competition, Author, Judge, Assessment
3. Use EXTRACTVALUE() or XPath functions for poem content
4. Best of both: SQL for queries, XPath for poem structure

Example (PoemXml is a hypothetical column holding the poem markup):
SELECT EXTRACTVALUE(PoemXml, '//tei:l[1]') FROM Entry
""")

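The hybrid idea can be sketched without a MySQL server: the poem markup is stored as text in a relational table and its structure is read back with XPath in application code. SQLite and `xml.etree.ElementTree` are stand-ins chosen for portability, and the `PoemXml` column and `first_line` helper are hypothetical (the Q3(d) schema stores `PoemText`).

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

conn = sqlite3.connect(":memory:")
# PoemXml is a hypothetical column; the Q3(d) schema stores PoemText instead
conn.execute("CREATE TABLE Entry (EntryId INTEGER PRIMARY KEY, PoemXml TEXT NOT NULL)")

poem_xml = (
    '<poem xmlns:tei="http://www.tei-c.org/ns/1.0">'
    '<tei:lg type="stanza">'
    "<tei:l>There was an old man of Dumbree</tei:l>"
    "<tei:l>Who taught little owls to drink tea</tei:l>"
    "</tei:lg>"
    "</poem>"
)
conn.execute("INSERT INTO Entry (EntryId, PoemXml) VALUES (?, ?)", (1, poem_xml))

def first_line(xml_text):
    """Return the text of the first tei:l element in a stored poem."""
    root = ET.fromstring(xml_text)
    line = root.find(".//tei:l", NS)
    return line.text if line is not None else None

# SQL selects the rows; XPath reads the poem structure
rows = conn.execute("SELECT EntryId, PoemXml FROM Entry").fetchall()
first_lines = {entry_id: first_line(xml_text) for entry_id, xml_text in rows}
print(first_lines)
```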
---

# Question 4: Wikidata SPARQL / Belgian Artists / MongoDB [30 marks]

## Question 4(a) [1 mark]

### Solution

In [None]:
# Q4(a) SOLUTION
print("Answer: SPARQL (SPARQL Protocol and RDF Query Language)")

## Question 4(b) [2 marks]

### Solution

In [None]:
# Q4(b) SOLUTION
print("""ERRORS AND CORRECTIONS:

| Error                    | Line   | Correction                              |
|--------------------------|--------|-----------------------------------------|
| SHOW should be SELECT    | 1      | SELECT ?person ?personLabel...          |
| {{ should be {           | 2      | Single braces for WHERE clause          |
| }} should be }           | End    | Single closing brace                    |
| Missing ? on person      | 4      | ?person wdt:P19 ?place                  |
| Missing ? on place       | 7      | ?place wdt:P17 ?country                 |
| , should be ;            | 4-5    | Same subject, different predicates use ;|
| ; should be .            | 6      | End ?person triples before new subject  |
""")

corrected_query = """
SELECT ?person ?personLabel ?placeLabel ?dob
WHERE {
  BIND (wd:Q31 AS ?country)
  BIND (wd:Q483501 AS ?job)

  ?person wdt:P19 ?place ;
          wdt:P569 ?dob ;
          wdt:P106 ?job .

  ?place wdt:P17 ?country .

  FILTER(YEAR(?dob) < 1600)

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
"""
print("CORRECTED QUERY:")
print(corrected_query)

## Question 4(c) [4 marks]

### Solution

In [None]:
# Q4(c) SOLUTION
print("""REASONS FOR INCOMPLETE RETRIEVAL:

| Issue              | Example                                         |
|--------------------|-------------------------------------------------|
| Missing data       | Artist has no wdt:P569 (date of birth) recorded |
| Missing place link | Birth place exists but no wdt:P17 country link  |
| Different occupation| Artist listed as "painter" not "artist" (Q483501)|
| Place not in Belgium| Birth place's country is "Spanish Netherlands"  |
| Date precision     | Birth date recorded as century only, not year   |
| No English label   | Artist has name only in Dutch/French            |
""")

## Question 4(d) [3 marks]

### Solution

In [None]:
# Q4(d) SOLUTION
print("""UNWANTED RESULTS THAT MIGHT BE RETURNED:

| Issue                    | Example                                    |
|--------------------------|--------------------------------------------|
| Non-visual artists       | Musicians, writers ("artist" is broad)     |
| Born but not Belgian     | Foreign visitor's child born while traveling|
| Modern Belgium boundaries| Places now in Belgium but historically in  |
|                          | France/Netherlands                         |
| Amateur artists          | People with "artist" as secondary occupation|
""")

## Question 4(e) [4 marks]

### Solution

In [None]:
# Q4(e) SOLUTION
print("""HOW THE QUERY AVOIDS THE BELGIUM PROBLEM:

The query uses PLACE OF BIRTH (P19) instead of COUNTRY OF CITIZENSHIP (P27).

How this works:
1. Query finds the PLACE where artist was born (e.g., Antwerp)
2. Then checks if that place's CURRENT country (P17) is Belgium
3. Wikidata records Antwerp's country as Belgium (Q31) today
4. This works even though in 1525, Antwerp was in Habsburg Netherlands

Key insight:
Geographic locations PERSIST through political changes.
The city of Antwerp still exists and is NOW in Belgium,
even though Belgium didn't exist when the artist was born there.

Limitation:
Might include artists born in places that are now Belgian
but weren't culturally connected to the region historically.
""")

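The two-step lookup (person to birthplace, birthplace to present-day country) can be mimicked with a few toy triples. This is only an illustration of the join logic: the identifiers and predicate names are placeholders, not real Wikidata IDs, and no endpoint is queried.

```python
# Toy (subject, predicate, object) triples; identifiers are placeholders
triples = [
    ("artist_1", "placeOfBirth", "Antwerp"),
    ("artist_2", "placeOfBirth", "Paris"),
    ("Antwerp", "country", "Belgium"),
    ("Paris", "country", "France"),
]

def objects(subject, predicate):
    """Return all objects recorded for a subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def born_in_present_day(country):
    """People whose birthplace is currently recorded as lying in `country`."""
    people = []
    for s, p, place in triples:
        # Hop 1: person -> place; hop 2: place -> its present-day country
        if p == "placeOfBirth" and country in objects(place, "country"):
            people.append(s)
    return people

belgian_born = born_in_present_day("Belgium")
print(belgian_born)
```

No statement links the person to a country directly, which is why the result does not depend on which state governed the city at the time of birth.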
## Question 4(f) [6 marks]

### Solution

In [None]:
# Q4(f) SOLUTION
print("""MONGODB QUERY FOR ARTWORKS (1520-1530, artists born in Antwerp):

Assuming embedded artist data:

db.artworks.find({
  "dateOfCreation": {
    $gte: ISODate("1520-01-01"),
    $lte: ISODate("1530-12-31")
  },
  "artist.placeOfBirth": "Antwerp"
})

If using year integers:

db.artworks.find({
  "yearCreated": { $gte: 1520, $lte: 1530 },
  "artist.placeOfBirth": /Antwerp/i
})

If artist is a reference (requires aggregation):

db.artworks.aggregate([
  {
    $lookup: {
      from: "artists",
      localField: "artistId",
      foreignField: "_id",
      as: "artistInfo"
    }
  },
  { $unwind: "$artistInfo" },
  {
    $match: {
      "yearCreated": { $gte: 1520, $lte: 1530 },
      "artistInfo.placeOfBirth": "Antwerp"
    }
  }
])
""")

## Question 4(g) [10 marks]

### Solution

In [None]:
# Q4(g) SOLUTION
print("""DATABASE MODEL EVALUATION:

| Model               | Pros                          | Cons                           |
|---------------------|-------------------------------|--------------------------------|
| Graph (RDF/SPARQL)  | Natural fit for Wikidata;     | Complex aggregation;           |
|                     | follows links easily;         | requires SPARQL knowledge;     |
|                     | handles varied relationships  | performance issues             |
| Document (MongoDB)  | Flexible schema for varied    | Harder cross-artist queries;   |
|                     | artwork metadata; good for    | no referential integrity;      |
|                     | read-heavy access             | duplication of artist data     |
| Relational (MySQL)  | Strong for queries and        | Rigid schema; harder to        |
|                     | aggregation; data integrity;  | store varying metadata;        |
|                     | familiar SQL                  | many tables for attributes     |

SPECIAL CONSIDERATIONS FOR THIS CASE:

| Factor              | Impact                                          |
|---------------------|-------------------------------------------------|
| Wikidata integration| Graph model aligns with source data format      |
| Varied metadata     | Document model handles diverse artwork attributes|
| Analysis queries    | Relational model best for aggregation           |
| Image links         | All models handle equally well                  |

RECOMMENDATION:

Hybrid approach would be best:
1. Keep Wikidata as source for artist biographical data (graph)
2. Use MongoDB for artwork details (flexible schema)
3. Cache artist IDs to link artworks to Wikidata entities

Why MongoDB alone is suboptimal:
- Duplicates artist data across artworks
- Harder to update when Wikidata changes
- Loses rich relationship data from Wikidata

Assessment: The researcher's choice is REASONABLE but NOT OPTIMAL.
MongoDB works for the artwork catalog but loses the benefits of
Wikidata's graph structure for artist relationships.
""")

---

# End of Solutions Notebook

All solutions have been provided. Compare with your attempts in the practice notebook!