<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/september-2021/notebook-september-2021.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Environment Setup

Run these cells first to set up MySQL with the `%sql` magic, MongoDB, xmllint, and the cellspell XPath, SPARQL, and MongoDB magics.

In [None]:
# === MySQL Setup ===
!apt-get update -qq > /dev/null
!apt-get install -y -qq mysql-server > /dev/null
!service mysql start
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost';"

# === SQL Magic ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0
%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

# === XPath Magic (cellspell) ===
!apt-get install -y libxml2-utils -qq > /dev/null
!pip install "cellspell[xpath] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.xpath

# === SPARQL Magic (cellspell) ===
!pip install "cellspell[sparql] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.sparql

In [None]:
# Install and start MongoDB
!wget -q http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb > /dev/null 2>&1
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add - > /dev/null 2>&1
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list > /dev/null
!apt-get update -qq > /dev/null
!apt-get install -y -qq mongodb-org > /dev/null
!mkdir -p /data/db
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

# Test MongoDB is running
!mongo --quiet --eval 'print("MongoDB ready!")'

# === MongoDB Magic (cellspell) ===
!pip install "cellspell[mongodb] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.mongodb
%mongodb mongodb://localhost:27017/music_db

---

# Question 2: Bird Spotter's Database [30 marks]

## Data Setup

In [None]:
%%sql
DROP TABLE IF EXISTS Sightings;

CREATE TABLE Sightings (
    Species VARCHAR(100),
    Date DATE,
    NumberSighted INT,
    ConservationStatus VARCHAR(50),
    NatureReserve VARCHAR(100),
    Location VARCHAR(50)
);

INSERT INTO Sightings VALUES
('Bar-tailed godwit', '2021-04-21', 31, 'Least concern', 'Rainham Marshes', '51.5N 0.2E'),
('Wood pigeon', '2021-04-21', 31, 'Least concern', 'Rainham Marshes', '51.5N 0.2E'),
('Greater spotted woodpecker', '2021-06-13', 1, 'Least concern', 'Epping Forest', '51.6N 0.0E'),
('European turtle dove', '2021-06-13', 2, 'Vulnerable', 'Epping Forest', '51.6N 0.0E'),
('Wood pigeon', '2021-06-13', 2, 'Least concern', 'Epping Forest', '51.6N 0.0E'),
('Great bustard', '2020-04-15', 3, 'Vulnerable', 'Salisbury Plain', '51.1N -1.8W'),
('Bar-tailed godwit', '2020-04-20', 53, 'Least concern', 'Rainham Marshes', '51.5N 0.2E');

SELECT 'Sightings table ready!' AS Status;

## Q2(a): Retrieve bird types seen since January 1, 2021 [4 marks]


In [None]:
%%sql
-- TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sql
%%sql
-- Q2(a) SOLUTION: Bird types seen since January 1, 2021

SELECT DISTINCT Species
FROM Sightings
WHERE Date >= '2021-01-01';
```

</details>


## Q2(c): Normalise this data [7 marks]


In [None]:
%%sql
-- Q2(c) SOLUTION: Create normalized tables

DROP TABLE IF EXISTS SightingsNorm;
DROP TABLE IF EXISTS Species;
DROP TABLE IF EXISTS NatureReserves;

-- Species table (removes conservation status redundancy)
CREATE TABLE Species (
    SpeciesName VARCHAR(100) PRIMARY KEY,
    ConservationStatus VARCHAR(50)
);

-- NatureReserves table (removes location redundancy)
CREATE TABLE NatureReserves (
    ReserveName VARCHAR(100) PRIMARY KEY,
    Location VARCHAR(50)
);

-- Sightings table (normalized - references other tables)
CREATE TABLE SightingsNorm (
    SpeciesName VARCHAR(100),
    ReserveName VARCHAR(100),
    Date DATE,
    NumberSighted INT,
    PRIMARY KEY (SpeciesName, ReserveName, Date),
    FOREIGN KEY (SpeciesName) REFERENCES Species(SpeciesName),
    FOREIGN KEY (ReserveName) REFERENCES NatureReserves(ReserveName)
);

-- Populate Species
INSERT INTO Species VALUES
('Bar-tailed godwit', 'Least concern'),
('Wood pigeon', 'Least concern'),
('Greater spotted woodpecker', 'Least concern'),
('European turtle dove', 'Vulnerable'),
('Great bustard', 'Vulnerable');

-- Populate NatureReserves
INSERT INTO NatureReserves VALUES
('Rainham Marshes', '51.5N 0.2E'),
('Epping Forest', '51.6N 0.0E'),
('Salisbury Plain', '51.1N -1.8W');

-- Populate SightingsNorm
INSERT INTO SightingsNorm VALUES
('Bar-tailed godwit', 'Rainham Marshes', '2021-04-21', 31),
('Wood pigeon', 'Rainham Marshes', '2021-04-21', 31),
('Greater spotted woodpecker', 'Epping Forest', '2021-06-13', 1),
('European turtle dove', 'Epping Forest', '2021-06-13', 2),
('Wood pigeon', 'Epping Forest', '2021-06-13', 2),
('Great bustard', 'Salisbury Plain', '2020-04-15', 3),
('Bar-tailed godwit', 'Rainham Marshes', '2020-04-20', 53);

SELECT 'Normalized tables created!' AS Status;

**Q2(c) EXPLANATION**

Normalized Tables:

1. Species
   | Column             | Key |
   |--------------------|-----|
   | SpeciesName        | PK  |
   | ConservationStatus |     |

2. NatureReserves
   | Column      | Key |
   |-------------|-----|
   | ReserveName | PK  |
   | Location    |     |

3. SightingsNorm
   | Column        | Key                  |
   |---------------|----------------------|
   | SpeciesName   | PK, FK -> Species    |
   | ReserveName   | PK, FK -> NatureReserves |
   | Date          | PK                   |
   | NumberSighted |                      |

Why This Design:
- No update anomaly: Change conservation status once in Species table
- No insert anomaly: Can add a new species without needing a sighting
- No delete anomaly: Deleting last sighting doesn't lose species info
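
The update-anomaly point can be checked with a short script. This is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the notebook's MySQL server, loads a subset of the rows, and the new status `'Near threatened'` is an illustrative value rather than a real change.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Species (
    SpeciesName TEXT PRIMARY KEY,
    ConservationStatus TEXT
);
CREATE TABLE SightingsNorm (
    SpeciesName TEXT REFERENCES Species(SpeciesName),
    ReserveName TEXT,
    Date TEXT,
    NumberSighted INTEGER,
    PRIMARY KEY (SpeciesName, ReserveName, Date)
);
INSERT INTO Species VALUES
    ('Wood pigeon', 'Least concern'),
    ('European turtle dove', 'Vulnerable');
INSERT INTO SightingsNorm VALUES
    ('Wood pigeon', 'Rainham Marshes', '2021-04-21', 31),
    ('Wood pigeon', 'Epping Forest', '2021-06-13', 2),
    ('European turtle dove', 'Epping Forest', '2021-06-13', 2);
""")

# One UPDATE on Species changes the status seen by every sighting of that bird.
# 'Near threatened' is an illustrative value, not a real status change.
cur = conn.execute(
    "UPDATE Species SET ConservationStatus = 'Near threatened' "
    "WHERE SpeciesName = 'Wood pigeon'"
)
rows_updated = cur.rowcount

# Both Wood pigeon sightings pick up the new status through the join.
statuses = conn.execute("""
    SELECT S.ReserveName, SP.ConservationStatus
    FROM SightingsNorm S
    JOIN Species SP ON S.SpeciesName = SP.SpeciesName
    WHERE S.SpeciesName = 'Wood pigeon'
    ORDER BY S.ReserveName
""").fetchall()

print(rows_updated)  # 1
print(statuses)      # [('Epping Forest', 'Near threatened'), ('Rainham Marshes', 'Near threatened')]
```

In the original single-table design the same change would have to touch every Wood pigeon row, and missing one would leave the data inconsistent.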

## Q2(e): Query with JOIN [5 marks]

Retrieve bird types and conservation status for birds seen since January 1, 2021.


In [None]:
%%sql
-- TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sql
%%sql
-- Q2(e) SOLUTION: Bird types and conservation status since Jan 1, 2021

SELECT DISTINCT S.SpeciesName, SP.ConservationStatus
FROM SightingsNorm S
INNER JOIN Species SP ON S.SpeciesName = SP.SpeciesName
WHERE S.Date >= '2021-01-01';
```

</details>


---

# Question 3: MEI Music Encoding [30 marks]

## Data Setup

In [None]:
%%writefile mei_sample.xml
<measure>
  <staff n="2">
    <layer n="1">
      <chord xml:id="d13e1" dur="8" dur.ppq="12" stem.dir="up">
        <note xml:id="d1e101" pname="c" oct="5"/>
        <note xml:id="d1e118" pname="a" oct="4"/>
        <note xml:id="d1e136" pname="c" oct="4"/>
      </chord>
    </layer>
  </staff>
  <staff n="3">
    <layer n="1">
      <chord xml:id="d17e1" dur="8" dur.ppq="12" stem.dir="up">
        <note xml:id="d1e157" pname="f" oct="3"/>
        <note xml:id="d1e174" pname="f" oct="2"/>
      </chord>
    </layer>
  </staff>
</measure>

## Q3(b): Fix the XPath [3 marks]

**Original (incorrect):** `/staff[n="2"]/layer/chord[note/@pname="c"]`


In [None]:
%%xpath mei_sample.xml


<details>
<summary>Click to reveal solution</summary>

**Q3(b) SOLUTION: Corrected XPath**

Corrected XPath:

//staff[@n="2"]/layer/chord[note/@pname="c"]

What Was Wrong in the Original:
| Original        | Problem                                                      | Correct     |
|-----------------|--------------------------------------------------------------|-------------|
| `/staff[n="2"]` | `n="2"` tests a child element `<n>`, not the attribute `n`   | `[@n="2"]`  |
| `/staff`        | Starts at the document root, but the root element is `<measure>`, not `<staff>` | `//staff` |

Note: The question text mentions finding notes with pname="f", but the
incorrect XPath shows pname="c". Staff n="2" has notes with c and a.
Staff n="3" has notes with pname="f".

To find chords with pname="f":
//staff[@n="3"]/layer/chord[note/@pname="f"]

```xml
%%xpath mei_sample.xml
//staff[@n="2"]/layer/chord[note/@pname="c"]
```

```xml
%%xpath mei_sample.xml
//staff[@n="3"]/layer/chord[note/@pname="f"]
```
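
The difference between `[n="2"]` and `[@n="2"]` can also be seen from plain Python. The sketch below assumes a trimmed, inlined copy of `mei_sample.xml` and uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so the `note/@pname` condition is applied in Python rather than in the path expression.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of mei_sample.xml, inlined so the sketch is self-contained.
MEI = """
<measure>
  <staff n="2">
    <layer n="1">
      <chord xml:id="d13e1" dur="8">
        <note pname="c" oct="5"/>
        <note pname="a" oct="4"/>
      </chord>
    </layer>
  </staff>
  <staff n="3">
    <layer n="1">
      <chord xml:id="d17e1" dur="8">
        <note pname="f" oct="3"/>
      </chord>
    </layer>
  </staff>
</measure>
"""

root = ET.fromstring(MEI)  # root is <measure>, not <staff>

# [n='2'] tests a child ELEMENT <n> with text "2": no staff has one.
wrong = root.findall(".//staff[n='2']")

# [@n='2'] tests the ATTRIBUTE n: this matches the first staff.
right = root.findall(".//staff[@n='2']")

# ElementTree has no nested-path predicates such as [note/@pname="c"],
# so the chord filter is written in Python instead.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
chords = [
    chord.get(XML_ID)
    for staff in right
    for chord in staff.findall("layer/chord")
    if any(note.get("pname") == "c" for note in chord.findall("note"))
]

print(len(wrong), len(right), chords)  # 0 1 ['d13e1']
```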

</details>


## Q3(c)(i): Translate chord to JSON [5 marks]


In [None]:
# TODO: Write your solution here


<details>
<summary>Click to reveal solution</summary>

```json
{
  "chord": {
    "xml_id": "d13e1",
    "dur": 8,
    "dur_ppq": 12,
    "stem_dir": "up",
    "notes": [
      {"xml_id": "d1e101", "pname": "c", "oct": 5},
      {"xml_id": "d1e118", "pname": "a", "oct": 4},
      {"xml_id": "d1e136", "pname": "c", "oct": 4}
    ]
  }
}
```

**Conversion Decisions:**
- `dur` as number: Duration is numeric, enables math operations
- `oct` as number: Octave is numeric
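- `notes` as array: Multiple notes → array structure
- `xml_id` not `xml:id`: Colons are legal in JSON keys, but `xml_id` drops the namespace prefix and is easier to use as a field name
- `stem_dir` not `stem.dir`: Dots clash with dot notation in MongoDB queries (e.g. `notes.pname`) and with property access in many languages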
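
These decisions can be applied mechanically. The following is a minimal sketch, not a general MEI converter: the helper names `json_key` and `attrs_to_dict` and the `NUMERIC` set are assumptions made for this example, and only the first chord is handled.

```python
import json
import xml.etree.ElementTree as ET

CHORD_XML = """
<chord xml:id="d13e1" dur="8" dur.ppq="12" stem.dir="up">
  <note xml:id="d1e101" pname="c" oct="5"/>
  <note xml:id="d1e118" pname="a" oct="4"/>
  <note xml:id="d1e136" pname="c" oct="4"/>
</chord>
"""

# ElementTree expands the xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"
NUMERIC = {"dur", "dur_ppq", "oct"}  # attributes stored as JSON numbers


def json_key(attr_name):
    """Map an XML attribute name to a JSON key: xml:id -> xml_id, stem.dir -> stem_dir."""
    return attr_name.replace(XML_NS, "xml_").replace(".", "_")


def attrs_to_dict(element):
    """Convert an element's attributes, casting the numeric ones to int."""
    result = {}
    for name, value in element.attrib.items():
        key = json_key(name)
        result[key] = int(value) if key in NUMERIC else value
    return result


chord_el = ET.fromstring(CHORD_XML)
chord = attrs_to_dict(chord_el)
# Child <note> elements become an array under "notes".
chord["notes"] = [attrs_to_dict(note) for note in chord_el.findall("note")]
document = {"chord": chord}

print(json.dumps(document, indent=2))
```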

</details>

## Q3(c)(ii): MongoDB find query [5 marks]

Find chords with upward stems that have 'f' in one of their notes.


In [None]:
%%mongodb
db.chords.drop()

In [None]:
%%mongodb
db.chords.insertMany([{"xml_id": "d13e1", "dur": 8, "stem_dir": "up", "notes": [{"pname": "c", "oct": 5}, {"pname": "a", "oct": 4}, {"pname": "c", "oct": 4}]}, {"xml_id": "d17e1", "dur": 8, "stem_dir": "up", "notes": [{"pname": "f", "oct": 3}, {"pname": "f", "oct": 2}]}])

In [None]:
%%mongodb
// TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```js
%%mongodb
db.chords.find({"stem_dir": "up", "notes.pname": "f"})
```

```js
%%mongodb
db.chords.find({ "stem_dir": "up", "notes": { "$elemMatch": { "pname": "f" } } })
```
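
Both queries return the same chord here because there is only one condition on the array. They differ once two conditions are placed on `notes`. The sketch below is a pure-Python emulation of the two matching rules, not MongoDB or pymongo code; `match_dotted`, `match_elem`, and `ids` are hypothetical helpers written for this illustration.

```python
chords = [
    {"xml_id": "d13e1", "stem_dir": "up",
     "notes": [{"pname": "c", "oct": 5}, {"pname": "a", "oct": 4}, {"pname": "c", "oct": 4}]},
    {"xml_id": "d17e1", "stem_dir": "up",
     "notes": [{"pname": "f", "oct": 3}, {"pname": "f", "oct": 2}]},
]


def match_dotted(doc, conditions):
    """Emulate {"notes.k1": v1, "notes.k2": v2}: each condition may be met by a different note."""
    return all(
        any(note.get(key) == value for note in doc["notes"])
        for key, value in conditions.items()
    )


def match_elem(doc, conditions):
    """Emulate {"notes": {"$elemMatch": {...}}}: a single note must meet all conditions."""
    return any(
        all(note.get(key) == value for key, value in conditions.items())
        for note in doc["notes"]
    )


def ids(matcher, conditions):
    """xml_id of every upward-stemmed chord accepted by the given matcher."""
    return [d["xml_id"] for d in chords if d["stem_dir"] == "up" and matcher(d, conditions)]


# Single condition: both forms select the same chord.
single_dotted = ids(match_dotted, {"pname": "f"})
single_elem = ids(match_elem, {"pname": "f"})

# Two conditions: no single note is an 'a' in octave 5, but the first chord
# has an 'a' and, separately, a note in octave 5.
pair_dotted = ids(match_dotted, {"pname": "a", "oct": 5})
pair_elem = ids(match_elem, {"pname": "a", "oct": 5})

print(single_dotted, single_elem)  # ['d17e1'] ['d17e1']
print(pair_dotted, pair_elem)      # ['d13e1'] []
```

For this question either form is acceptable; `$elemMatch` matters when several conditions must hold for the same note.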

</details>


## Q3(d)(ii): RDF for first chord [5 marks]


In [None]:
%%writefile chord.ttl
@prefix mei: <http://example.org/mei#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Q3(d)(ii) SOLUTION: RDF for first chord

mei:chord_d13e1 a mei:Chord ;
    mei:duration "8"^^xsd:integer ;
    mei:durationPpq "12"^^xsd:integer ;
    mei:stemDirection "up" ;
    rdfs:member mei:note_d1e101 ,
                mei:note_d1e118 ,
                mei:note_d1e136 .

mei:note_d1e101 a mei:Note ;
    mei:pitchName "c" ;
    mei:octave "5"^^xsd:integer .

mei:note_d1e118 a mei:Note ;
    mei:pitchName "a" ;
    mei:octave "4"^^xsd:integer .

mei:note_d1e136 a mei:Note ;
    mei:pitchName "c" ;
    mei:octave "4"^^xsd:integer .

In [None]:
%%sparql --file chord.ttl
SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }

---

# Question 4: Zoo Database [30 marks]

## Data Setup

In [None]:
%%sql
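-- Drop Q2's SightingsNorm first: its foreign key references Species,
-- which is dropped and re-created with a different schema below.
DROP TABLE IF EXISTS SightingsNorm;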
DROP TABLE IF EXISTS Animal;
DROP TABLE IF EXISTS Species;
DROP TABLE IF EXISTS Enclosure;
DROP TABLE IF EXISTS Zoo;

CREATE TABLE Zoo (
    Name VARCHAR(255) PRIMARY KEY,
    Country VARCHAR(255)
);

CREATE TABLE Enclosure (
    Name VARCHAR(255) PRIMARY KEY,
    Location VARCHAR(255),
    ZooName VARCHAR(255),
    FOREIGN KEY (ZooName) REFERENCES Zoo(Name)
);

CREATE TABLE Species (
    LatinName VARCHAR(255) PRIMARY KEY,
    ConservationStatus VARCHAR(50)
);

CREATE TABLE Animal (
    Identifier INT AUTO_INCREMENT PRIMARY KEY,
    DateOfBirth DATE,
    SpeciesLatinName VARCHAR(255),
    EnclosureName VARCHAR(255),
    FOREIGN KEY (SpeciesLatinName) REFERENCES Species(LatinName),
    FOREIGN KEY (EnclosureName) REFERENCES Enclosure(Name)
);

-- Insert Zoos
INSERT INTO Zoo VALUES ('Singapore Zoo', 'Singapore'), ('London Zoo', 'UK');

-- Insert Enclosures
INSERT INTO Enclosure VALUES
('Tropical Aviary', 'Mandai Lake', 'Singapore Zoo'),
('Savannah Zone', 'Outer Gardens', 'Singapore Zoo'),
('Reptile House', 'Regents Park', 'London Zoo'),
('Bird Paradise', 'Regents Park', 'London Zoo');

-- Insert Species
INSERT INTO Species VALUES
('Buceros bicornis', 'Vulnerable'),
('Panthera leo', 'Vulnerable'),
('Varanus komodoensis', 'Endangered');

-- Insert Animals (Buceros bicornis in BOTH zoos for Q4(d))
INSERT INTO Animal (DateOfBirth, SpeciesLatinName, EnclosureName) VALUES
('2010-04-10', 'Buceros bicornis', 'Tropical Aviary'),
('2012-06-15', 'Panthera leo', 'Savannah Zone'),
('2005-02-01', 'Varanus komodoensis', 'Reptile House'),
('2015-09-09', 'Buceros bicornis', 'Savannah Zone'),
('2008-03-15', 'Buceros bicornis', 'Bird Paradise'),
('2018-11-20', 'Buceros bicornis', 'Bird Paradise');

SELECT 'Zoo database ready!' AS Status;

## Q4(c): Count species in Singapore Zoo [5 marks]


In [None]:
%%sql
-- TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sql
%%sql
-- Q4(c) SOLUTION: Count species in Singapore Zoo

SELECT COUNT(DISTINCT A.SpeciesLatinName) AS SpeciesCount
FROM Animal A
INNER JOIN Enclosure E ON A.EnclosureName = E.Name
WHERE E.ZooName = 'Singapore Zoo';
```

</details>


## Q4(d): Oldest 'Buceros bicornis' in each zoo [5 marks]


In [None]:
%%sql
-- TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sql
%%sql
-- Q4(d) SOLUTION: Oldest Buceros bicornis (Great Hornbill) in each zoo
-- MIN(DateOfBirth) = oldest animal (earliest birth date)

SELECT E.ZooName, MIN(A.DateOfBirth) AS OldestBirthDate
FROM Animal A
INNER JOIN Enclosure E ON A.EnclosureName = E.Name
WHERE A.SpeciesLatinName = 'Buceros bicornis'
GROUP BY E.ZooName;
```
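
As a sanity check on the expected result, the query can be replayed against the sample rows outside MySQL. This sketch assumes Python's built-in `sqlite3` with a simplified schema (dates stored as ISO text, the Zoo and Species tables omitted), and adds an `ORDER BY` only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite with a cut-down copy of the zoo data (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Enclosure (Name TEXT PRIMARY KEY, ZooName TEXT);
CREATE TABLE Animal (
    Identifier INTEGER PRIMARY KEY AUTOINCREMENT,
    DateOfBirth TEXT,
    SpeciesLatinName TEXT,
    EnclosureName TEXT REFERENCES Enclosure(Name)
);
INSERT INTO Enclosure VALUES
    ('Tropical Aviary', 'Singapore Zoo'),
    ('Savannah Zone', 'Singapore Zoo'),
    ('Reptile House', 'London Zoo'),
    ('Bird Paradise', 'London Zoo');
INSERT INTO Animal (DateOfBirth, SpeciesLatinName, EnclosureName) VALUES
    ('2010-04-10', 'Buceros bicornis', 'Tropical Aviary'),
    ('2012-06-15', 'Panthera leo', 'Savannah Zone'),
    ('2005-02-01', 'Varanus komodoensis', 'Reptile House'),
    ('2015-09-09', 'Buceros bicornis', 'Savannah Zone'),
    ('2008-03-15', 'Buceros bicornis', 'Bird Paradise'),
    ('2018-11-20', 'Buceros bicornis', 'Bird Paradise');
""")

# ISO-formatted date strings sort chronologically, so MIN() finds the earliest birth date.
oldest = conn.execute("""
    SELECT E.ZooName, MIN(A.DateOfBirth) AS OldestBirthDate
    FROM Animal A
    INNER JOIN Enclosure E ON A.EnclosureName = E.Name
    WHERE A.SpeciesLatinName = 'Buceros bicornis'
    GROUP BY E.ZooName
    ORDER BY E.ZooName
""").fetchall()

print(oldest)  # [('London Zoo', '2008-03-15'), ('Singapore Zoo', '2010-04-10')]
```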

</details>


## Q4(e): RDF instance data [10 marks]


<details>
<summary>Click to reveal solution</summary>

**Q4(e)(i) SOLUTION: Suitability assessment for RDF**

RDF is well-suited for the zoo database because:

1. Natural graph structure
   - Zoo->Enclosure->Animal->Species relationships map directly to RDF triples

2. Linking capability
   - Can link species to external data (IUCN Red List, Wikipedia, Wikidata)

3. Flexibility
   - Easy to add new properties without schema changes

4. Integration
   - Multiple zoos could share species data via linked URIs
   - Conservation organizations could query across zoos
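
The "natural graph structure" point can be made concrete without any RDF library. The sketch below stores triples as plain Python tuples; the resource and predicate names are illustrative strings chosen for this example, not real RDF terms, and the helper functions are hypothetical.

```python
# Each triple is (subject, predicate, object), held as plain strings.
triples = [
    ("TropicalAviary", "partOf", "SingaporeZoo"),
    ("BirdParadise", "partOf", "LondonZoo"),
    ("Animal001", "livesIn", "TropicalAviary"),
    ("Animal002", "livesIn", "BirdParadise"),
    ("Animal001", "species", "BucerosBicornis"),
    ("Animal002", "species", "BucerosBicornis"),
]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def species_in_zoo(zoo):
    """Walk zoo <- partOf <- livesIn -> species, one triple pattern per hop."""
    found = set()
    for enclosure in subjects("partOf", zoo):
        for animal in subjects("livesIn", enclosure):
            found |= objects(animal, "species")
    return found


def zoos_holding(species):
    """The reverse walk: which zoos hold an animal of this species?"""
    found = set()
    for animal in subjects("species", species):
        for enclosure in objects(animal, "livesIn"):
            found |= objects(enclosure, "partOf")
    return found


print(species_in_zoo("SingaporeZoo"))           # {'BucerosBicornis'}
print(sorted(zoos_holding("BucerosBicornis")))  # ['LondonZoo', 'SingaporeZoo']
```

Each hop corresponds to one foreign-key join in the relational schema, and the shared `BucerosBicornis` node is what lets two zoos be queried together.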

</details>


In [None]:
%%writefile zoo.ttl
@prefix zoo: <http://example.org/zoo#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Q4(e)(ii) SOLUTION: RDF instance data for Zoo database

# Zoo
zoo:SingaporeZoo a zoo:Zoo ;
    zoo:name "Singapore Zoo" ;
    zoo:country "Singapore" .

zoo:LondonZoo a zoo:Zoo ;
    zoo:name "London Zoo" ;
    zoo:country "UK" .

# Species
zoo:BucerosBicornis a zoo:Species ;
    zoo:latinName "Buceros bicornis" ;
    zoo:conservationStatus "Vulnerable" .

zoo:PantheraLeo a zoo:Species ;
    zoo:latinName "Panthera leo" ;
    zoo:conservationStatus "Vulnerable" .

# Enclosures
zoo:TropicalAviary a zoo:Enclosure ;
    zoo:name "Tropical Aviary" ;
    zoo:location "Mandai Lake" ;
    zoo:partOf zoo:SingaporeZoo .

zoo:BirdParadise a zoo:Enclosure ;
    zoo:name "Bird Paradise" ;
    zoo:location "Regents Park" ;
    zoo:partOf zoo:LondonZoo .

# Animals
zoo:Animal001 a zoo:Animal ;
    zoo:identifier "SG-HB-001" ;
    zoo:dateOfBirth "2010-04-10"^^xsd:date ;
    zoo:species zoo:BucerosBicornis ;
    zoo:livesIn zoo:TropicalAviary .

zoo:Animal002 a zoo:Animal ;
    zoo:identifier "UK-HB-001" ;
    zoo:dateOfBirth "2008-03-15"^^xsd:date ;
    zoo:species zoo:BucerosBicornis ;
    zoo:livesIn zoo:BirdParadise .

In [None]:
%%sparql --file zoo.ttl
SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }