<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/september-2021/notebook-september-2021.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 September 2021 - Practice Notebook

## Exam Overview

| Section | Questions | Marks |
|---------|-----------|-------|
| Section A | 10 MCQs (1a-1j) | 40 |
| Section B | Answer 2 of 3 (Q2, Q3, Q4) | 60 |
| **Total** | | **100** |

## How to Use This Notebook

1. **Run the Setup cells first** - they install dependencies and create databases
2. **Try answering each question yourself** before revealing the solution
3. **Experiment!** - Modify queries to deepen your understanding
4. **Use the collapsible sections** - Click to reveal hints and answers

---

# Environment Setup

Run these cells first to set up all required dependencies.

In [None]:
# Install and start MySQL
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# Create user and databases
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON exam_db.* TO 'examuser'@'localhost';"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost';"

# Install Python libraries
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0 lxml rdflib

%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

print("MySQL setup complete!")

In [None]:
# MongoDB Setup (for Question 3)
!sudo wget -q http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb > /dev/null 2>&1
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add - > /dev/null 2>&1
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list > /dev/null
!apt-get update -qq > /dev/null
!apt-get install -y -qq mongodb-org > /dev/null
!pip install -q pymongo
!mkdir -p /data/db
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

from pymongo import MongoClient
mongo_client = MongoClient('localhost', 27017)
print("MongoDB setup complete!")

---

# Section A: Multiple Choice Questions (40 marks)

Each MCQ is worth 4 marks. For each question:
1. Read the question and think about your answer
2. Use the code cells to explore and verify
3. Check your answer in the collapsible section

---

## Q1(a): E/R Diagram - Many-to-Many Relationships [4 marks]

**Question:**  
An E/R diagram for a zoo-breeding program shows **Keeper** and **Animal** in a **many-to-many** relationship. If this ER model is implemented in a **relational** database, what change is needed?

**Options:**
1. Rename attributes (e.g., `date of birth` → `dob`)
2. The M:N relationship requires a **new entity** (join/associative table)
3. A circular loop in the ER diagram must be removed
4. Spaces in attribute names must be removed

---

### Explore It Yourself

Think about: How would you implement this in SQL? Can you directly link two tables with M:N?

In [None]:
%%sql
-- Try creating the tables. What's missing for M:N relationship?

DROP TABLE IF EXISTS KeeperAnimal;
DROP TABLE IF EXISTS Keeper;
DROP TABLE IF EXISTS Animal;

CREATE TABLE Keeper (
    keeper_id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE Animal (
    animal_id INT PRIMARY KEY,
    name VARCHAR(100),
    species VARCHAR(100)
);

-- How do we link them? Try adding the join table below:
-- CREATE TABLE KeeperAnimal (
--     ???
-- );

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answer: (ii)** The M:N relationship requires a **new entity** (join/associative table)

**Solution - The Join Table:**
```sql
CREATE TABLE KeeperAnimal (
    keeper_id INT,
    animal_id INT,
    assigned_date DATE,
    PRIMARY KEY (keeper_id, animal_id),
    FOREIGN KEY (keeper_id) REFERENCES Keeper(keeper_id),
    FOREIGN KEY (animal_id) REFERENCES Animal(animal_id)
);
```

**Why others are wrong:**
- (i) Shortening attribute names is a stylistic choice, not a structural requirement
- (iii) Relationship cycles in an ER diagram are not inherently a problem
- (iv) Removing spaces from names is a naming convention, not a structural change
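
**Seeing the join table in action:** the sketch below uses Python's built-in `sqlite3` as a stand-in for the notebook's MySQL server (an assumption, made so the snippet runs anywhere). The keepers and the assignments are hypothetical sample data. It shows one keeper linked to several animals and one animal linked to several keepers, which is only possible because of the third table.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Keeper (keeper_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Animal (animal_id INTEGER PRIMARY KEY, name TEXT, species TEXT);
CREATE TABLE KeeperAnimal (
    keeper_id INTEGER REFERENCES Keeper(keeper_id),
    animal_id INTEGER REFERENCES Animal(animal_id),
    PRIMARY KEY (keeper_id, animal_id)
);
-- Hypothetical sample data
INSERT INTO Keeper VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO Animal VALUES (10, 'Simba', 'Lion'), (11, 'Baloo', 'Brown bear');
INSERT INTO KeeperAnimal VALUES (1, 10), (1, 11), (2, 10);
""")

# Resolve the M:N relationship by joining through the associative table.
pairs = conn.execute("""
    SELECT k.name, a.name
    FROM Keeper k
    JOIN KeeperAnimal ka ON ka.keeper_id = k.keeper_id
    JOIN Animal a ON a.animal_id = ka.animal_id
    ORDER BY k.name, a.name
""").fetchall()

print(pairs)  # [('Ana', 'Baloo'), ('Ana', 'Simba'), ('Ben', 'Simba')]
```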

</details>

---

## Q1(b): Normalization Level [4 marks]

**Question:**  
Evaluate the normalization level of this table:

| Animal  | Species      | Feed      |
|---------|--------------|----------|
| Simba   | Lion         | Meat      |
| Hiss    | Royal python | Meat      |
| Eeyore  | Donkey       | Silage    |
| Fozzy   | Brown bear   | Nuts      |
| Fozzy   | Brown bear   | Berries   |
| Baloo   | Brown bear   | Nuts      |
| Baloo   | Brown bear   | Berries   |

**Options:**
1. The table is in **1NF** only
2. The table is in **2NF**
3. The table is in **3NF**
4. The table is in **4NF**

---

### Explore It Yourself

In [None]:
%%sql
-- Create and examine the table
DROP TABLE IF EXISTS AnimalFeed;

CREATE TABLE AnimalFeed (
    Animal VARCHAR(50),
    Species VARCHAR(50),
    Feed VARCHAR(50)
);

INSERT INTO AnimalFeed VALUES
('Simba', 'Lion', 'Meat'),
('Hiss', 'Royal python', 'Meat'),
('Eeyore', 'Donkey', 'Silage'),
('Fozzy', 'Brown bear', 'Nuts'),
('Fozzy', 'Brown bear', 'Berries'),
('Baloo', 'Brown bear', 'Nuts'),
('Baloo', 'Brown bear', 'Berries');

SELECT * FROM AnimalFeed;

In [None]:
%%sql
-- Look for redundancy. What's repeated unnecessarily?
SELECT Animal, Species, COUNT(*) AS row_count  -- ROWS is a reserved word in MySQL 8
FROM AnimalFeed
GROUP BY Animal, Species
HAVING COUNT(*) > 1;

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answer: (i)** The table is in **1NF** only

**Analysis:**
- **1NF ✓**: All values are atomic (single values per cell)
- **Not 2NF ✗**: If PK is `(Animal, Feed)`, then `Species` depends only on `Animal` (partial dependency)
- **Not 3NF**: Can't be 3NF without being 2NF first

**The redundancy:** `Species` is repeated for each animal's feed type.
- Fozzy → Brown bear appears twice
- Baloo → Brown bear appears twice

**To normalize:** Split into `Animals(Animal, Species)` and `AnimalFeeds(Animal, Feed)`
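
**Checking the split:** a minimal pure-Python sketch (no database needed) that projects the rows onto the two proposed tables and joins them back. If the rejoined set equals the original rows, the decomposition loses no information. The variable names are illustrative only.

```python
# Rows of the original AnimalFeed table: (Animal, Species, Feed)
rows = [
    ("Simba", "Lion", "Meat"),
    ("Hiss", "Royal python", "Meat"),
    ("Eeyore", "Donkey", "Silage"),
    ("Fozzy", "Brown bear", "Nuts"),
    ("Fozzy", "Brown bear", "Berries"),
    ("Baloo", "Brown bear", "Nuts"),
    ("Baloo", "Brown bear", "Berries"),
]

# Project onto the two proposed tables (sets drop the repeated pairs).
animals = {(animal, species) for animal, species, _ in rows}
animal_feeds = {(animal, feed) for animal, _, feed in rows}

# Rejoin on Animal and compare with the original rows.
rejoined = {
    (animal, species, feed)
    for animal, species in animals
    for fed_animal, feed in animal_feeds
    if animal == fed_animal
}

print(len(animals), len(animal_feeds))  # 5 7
print(rejoined == set(rows))            # True
```

Each animal's species is now stored once (5 rows) instead of once per feed.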

</details>

---

## Q1(c): SQL GRANT Permissions [4 marks]

**Question:**  
A temporary admin needs to add, update, delete records in the `Students` table. Which `GRANT` is best?

**Options:**
1. `GRANT ALL ON * WITH GRANT OPTION`
2. `GRANT SELECT ON Students TO 'temp';`
3. `GRANT INSERT, UPDATE, SELECT, DELETE ON Students TO 'temp';`
4. `GRANT ALL ON Students TO 'temp';`

---

### Explore It Yourself

In [None]:
%%sql
-- Create a test table and user to experiment with GRANT
DROP TABLE IF EXISTS Students;
CREATE TABLE Students (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    grade CHAR(1)
);

INSERT INTO Students VALUES (1, 'Alice', 'A'), (2, 'Bob', 'B');

-- Think: What's the minimum privilege needed for add/update/delete?
-- What extra powers does 'ALL' include that we don't want?

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answer: (iii)** `GRANT INSERT, UPDATE, SELECT, DELETE ON Students TO 'temp';`

**Why this is correct:**
- Provides exactly what's needed: add (INSERT), update (UPDATE), delete (DELETE), and view (SELECT)
- Follows **principle of least privilege**

**Why others are wrong:**
- (i) `GRANT ALL ON *` - Way too broad! Affects ALL tables, allows re-granting
- (ii) `SELECT` only - Can't add, update, or delete
- (iv) `ALL` includes ALTER, DROP, etc. - Too powerful for a temp admin

</details>

---

## Q1(d): Counting RDF Triples [4 marks]

**Question:**  
Count the RDF triples in this Turtle snippet:

```turtle
chEvents:22498 a event:Event, ecrm:E7_Activity, schema:Event ;
             dct:date "1952-11-30T17:30:00"^^xsd:dateTime ;
             rdfs:label "Cordelle Walcott"@en .
```

**Options:**
1. 3
2. 4
3. 5
4. 6

---

### Explore It Yourself

In [None]:
from rdflib import Graph

# Parse the Turtle and count triples
ttl = """
@prefix chEvents: <http://example.org/events/> .
@prefix event: <http://example.org/event#> .
@prefix ecrm: <http://example.org/ecrm#> .
@prefix schema: <http://schema.org/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

chEvents:22498 a event:Event, ecrm:E7_Activity, schema:Event ;
             dct:date "1952-11-30T17:30:00"^^xsd:dateTime ;
             rdfs:label "Cordelle Walcott"@en .
"""

g = Graph()
g.parse(data=ttl, format='turtle')

print(f"Number of triples: {len(g)}")
print("\nAll triples:")
for s, p, o in g:
    print(f"  {s.split('/')[-1]} -- {p.split('/')[-1].split('#')[-1]} --> {o}")

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answer: (iii)** 5 triples

**The 5 triples are:**
1. `chEvents:22498 a event:Event`
2. `chEvents:22498 a ecrm:E7_Activity`
3. `chEvents:22498 a schema:Event`
4. `chEvents:22498 dct:date "1952-11-30T17:30:00"^^xsd:dateTime`
5. `chEvents:22498 rdfs:label "Cordelle Walcott"@en`

**Key insight:** Each comma-separated object after `a` creates a separate triple!

</details>

---

## Q1(e): XML Well-Formedness [4 marks]

**Question:**  
Why is this XML **not well-formed**?

```xml
<movie>
  <title>Citizen Kane</title>
  <cast>
    <actor>Orson Welles</actor>
    <actor role="Jebediah Leland">Joseph Cotton</actor>
</movie>
```

**Select the reason:**
1. cast should come before title
2. The cast element is not closed
3. title should have a lang attribute
4. actor for Orson Welles needs role
5. releaseYear is missing

---

### Explore It Yourself

In [None]:
from lxml import etree

# Try parsing this XML - what error do you get?
xml_snippet = """
<movie>
  <title>Citizen Kane</title>
  <cast>
    <actor>Orson Welles</actor>
    <actor role="Jebediah Leland">Joseph Cotton</actor>
</movie>
"""

try:
    root = etree.fromstring(xml_snippet)
    print("XML parsed successfully!")
except etree.XMLSyntaxError as e:
    print(f"XMLSyntaxError: {e}")

In [None]:
# Now try the corrected version
xml_fixed = """
<movie>
  <title>Citizen Kane</title>
  <cast>
    <actor>Orson Welles</actor>
    <actor role="Jebediah Leland">Joseph Cotton</actor>
  </cast>
</movie>
"""

try:
    root = etree.fromstring(xml_fixed)
    print("XML parsed successfully!")
    print(etree.tostring(root, pretty_print=True, encoding='unicode'))
except etree.XMLSyntaxError as e:
    print(f"XMLSyntaxError: {e}")

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answer: (ii)** The cast element is not closed

**Explanation:**
- `<cast>` is opened but never closed with `</cast>`
- This violates **well-formedness** (every start tag needs an end tag)

**Why others don't break well-formedness:**
- (i), (iii), (iv), (v) might affect **validity** against a schema, but not well-formedness
- Well-formedness = structural rules (matching tags, proper nesting)
- Validity = conformance to schema rules (required attributes, elements)

</details>

---

## Q1(f): XML Schema Validity [4 marks]

**Question:**  
The same XML is **not valid** per the schema. Ignoring well-formedness issues, which statements identify the **schema violations**?

Given schema requires:
- `<title>` with required `lang` attribute
- `<releaseYear>` element
- `<cast>` element

**Select all that apply:**
1. cast should come before title
2. cast is not closed
3. title should have a lang attribute
4. actor for Orson Welles must have role
5. releaseYear is missing

---

### Explore It Yourself

In [None]:
from lxml import etree

# XML with cast closed, but missing lang and releaseYear
xml_test = """
<movie>
  <title>Citizen Kane</title>
  <cast>
    <actor>Orson Welles</actor>
  </cast>
</movie>
"""

# Simple schema requiring lang attribute and releaseYear
xsd = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="movie">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title">
          <xsd:complexType mixed="true">
            <xsd:attribute name="lang" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="releaseYear" type="xsd:integer"/>
        <xsd:element name="cast">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="actor" maxOccurs="unbounded" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

xml_doc = etree.fromstring(xml_test)
xsd_doc = etree.fromstring(xsd)
schema = etree.XMLSchema(xsd_doc)

if schema.validate(xml_doc):
    print("XML is valid!")
else:
    print("XML is NOT valid. Errors:")
    for error in schema.error_log:
        print(f"  - {error.message}")

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answers: (iii) and (v)**

- **(iii)** title should have a lang attribute - Schema says `use="required"`
- **(v)** releaseYear is missing - Schema requires this element

**Why others are wrong:**
- (i) Nothing in the schema requires `cast` to come first; the requirements above list `title` before `cast`
- (ii) Unclosed tag is **well-formedness**, not validity
- (iv) `role` attribute is likely optional

</details>

---

## Q1(g): MongoDB vs SQL [4 marks]

**Question:**  
Which statements comparing MongoDB with SQL are **true**?

1. Unlike SQL, MongoDB has no explicit indexes
2. Unlike MongoDB, a SQL DBMS can guarantee ACID compliance in all transactions
3. A single MongoDB update would often map to more than one command in SQL
4. A MongoDB document can have a more complex structure than an SQL table

---

### Explore It Yourself

In [None]:
# MongoDB example - complex nested document
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.test_db

# A single MongoDB document with nested structure
animal_doc = {
    "name": "Simba",
    "species": "Lion",
    "feeds": ["Meat", "Fish"],  # Array!
    "keeper": {                   # Nested object!
        "name": "John",
        "shift": "Morning"
    },
    "medical_history": [          # Array of objects!
        {"date": "2023-01-15", "treatment": "Vaccination"},
        {"date": "2023-06-20", "treatment": "Checkup"}
    ]
}

# Insert and display
db.animals.drop()
db.animals.insert_one(animal_doc)

print("MongoDB document:")
import json
for doc in db.animals.find():
    doc.pop('_id')  # Remove MongoDB's auto-generated ID for cleaner display
    print(json.dumps(doc, indent=2))

In [None]:
# In SQL, the same data would need MULTIPLE tables
print("""In SQL, you would need:

1. Animals table
2. AnimalFeeds table (for the array)
3. Keepers table
4. AnimalKeepers table (relationship)
5. MedicalHistory table

That's 5 tables vs 1 MongoDB document!""")

In [None]:
# MongoDB DOES support indexes!
db.animals.create_index("name")
print("Indexes on animals collection:")
for index in db.animals.list_indexes():
    print(f"  - {index}")

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answers: (ii), (iii), (iv)**

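- **(ii) ✓** A SQL DBMS is built around ACID transactions; MongoDB has traditionally guaranteed atomicity only for single-document operations (multi-document transactions were added in version 4.0)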
- **(iii) ✓** Updating nested data in MongoDB = one command; in SQL = multiple UPDATEs across tables
- **(iv) ✓** MongoDB documents can have arrays, nested objects - much more complex than flat SQL rows

**Why (i) is FALSE:**
- MongoDB DOES support indexes: `db.collection.createIndex({ field: 1 })`

</details>

---

## Q1(h): Precision vs Recall [4 marks]

**Question:**  
A researcher wants documents for quick digitization, then discards irrelevant items. Should the IR system focus on **precision** or **recall**?

**Options:**
1. Focus on **precision**; recall is less critical
2. Focus on **recall**; precision is less critical

---

### Think About It

- **Precision** = Of items retrieved, what % are relevant? (Avoid junk)
- **Recall** = Of all relevant items, what % did we find? (Don't miss anything)

The researcher can easily discard irrelevant items. What matters more?
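
To make the two definitions concrete, here is a small sketch that computes both measures for a narrow and a broad result list. The document IDs and the `precision_recall` helper are hypothetical, chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved IDs and a set of relevant IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection: four documents are truly relevant.
relevant = {"d1", "d2", "d3", "d4"}

narrow = ["d1", "d2"]                                       # few results, all relevant
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]    # everything relevant, plus junk

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

The narrow search returns no junk but misses half of the relevant documents; the broad search misses nothing but half of its results must be discarded.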

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answer: (ii)** Focus on **recall**

**Reasoning:**
- The researcher checks the results and discards irrelevant items afterwards, so false positives are cheap to remove
- A relevant document that is never retrieved cannot be recovered by that check, so misses are the costly error
- High recall = fewer relevant documents missed, at the price of some extra items to discard

**When to prefer precision:**
- Web search (users only look at the first few results)
- Any setting where reviewing each retrieved item is expensive

</details>

---

## Q1(i): Graphs vs Trees [4 marks]

**Question:**  
What distinguishes a **general graph** from a **tree**?

1. A graph does not need a root node; a tree does
2. A tree can include text; a graph cannot
3. A node in a tree has exactly one parent node; a graph has no such constraint
4. A tree does not need a root node; a graph does

---

### Think About It

Consider:
- File system (tree) vs Social network (graph)
- XML document (tree) vs RDF (graph)

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answers: (i) and (iii)**

- **(i) ✓** Trees must have exactly one root; graphs don't need a root
- **(iii) ✓** In a tree, each node (except root) has exactly ONE parent; graphs can have multiple "parents" (incoming edges)

**Why others are wrong:**
- (ii) Both can contain text/data - FALSE
- (iv) Opposite of (i) - FALSE

**Key insight:** All trees are graphs, but not all graphs are trees!
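
**Testing the two properties:** a minimal sketch that checks statements (i) and (iii) on a directed graph given as `(parent, child)` edges. The function name `is_rooted_tree` and the sample data are hypothetical.

```python
from collections import defaultdict

def is_rooted_tree(nodes, edges):
    """Return True if the directed (parent, child) edges form a rooted tree over nodes."""
    parents = defaultdict(list)
    children = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
        children[parent].append(child)

    # A tree has exactly one root: a node with no parent.
    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False

    # No node may have more than one parent.
    if any(len(parents[n]) > 1 for n in nodes):
        return False

    # Every node must be reachable from the root.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return seen == set(nodes)

# File-system style hierarchy: one root, one parent per node.
fs_nodes = ["/", "home", "etc", "alice"]
fs_edges = [("/", "home"), ("/", "etc"), ("home", "alice")]

# Social-network style graph: "cat" has two incoming edges.
social_nodes = ["ann", "bob", "cat"]
social_edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat")]

print(is_rooted_tree(fs_nodes, fs_edges))          # True
print(is_rooted_tree(social_nodes, social_edges))  # False
```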

</details>

---

## Q1(j): SQL Joins [4 marks]

**Question:**  
Which statements about SQL joins are **correct**?

1. A LEFT JOIN will produce at least as many rows as an INNER JOIN
2. An INNER JOIN will produce at least as many rows as a LEFT JOIN
3. A CROSS JOIN will produce at least as many rows as a LEFT JOIN
4. A LEFT JOIN will produce at least as many rows as a CROSS JOIN
5. No type of join can produce more rows than a CROSS JOIN

---

### Explore It Yourself

In [None]:
%%sql
-- Create test tables
DROP TABLE IF EXISTS TableA;
DROP TABLE IF EXISTS TableB;

CREATE TABLE TableA (id INT, val VARCHAR(10));
CREATE TABLE TableB (id INT, val VARCHAR(10));

INSERT INTO TableA VALUES (1, 'A1'), (2, 'A2'), (3, 'A3');
INSERT INTO TableB VALUES (1, 'B1'), (2, 'B2');

SELECT 'TableA' AS tbl, COUNT(*) AS row_count FROM TableA
UNION ALL
SELECT 'TableB', COUNT(*) FROM TableB;

In [None]:
%%sql
-- Compare different joins
SELECT 'INNER JOIN' AS join_type, COUNT(*) AS row_count
FROM TableA a INNER JOIN TableB b ON a.id = b.id
UNION ALL
SELECT 'LEFT JOIN', COUNT(*)
FROM TableA a LEFT JOIN TableB b ON a.id = b.id
UNION ALL
SELECT 'CROSS JOIN', COUNT(*)
FROM TableA a CROSS JOIN TableB b;

<details>
<summary><b>Click to reveal answer</b></summary>

**Correct Answers: (i), (iii), (v)**

From our example (TableA=3 rows, TableB=2 rows):
- INNER JOIN: 2 rows (only matching)
- LEFT JOIN: 3 rows (all left + matches)
- CROSS JOIN: 6 rows (3 × 2 = Cartesian product)

**Analysis:**
- **(i) ✓** LEFT JOIN ≥ INNER JOIN (3 ≥ 2)
- **(ii) ✗** INNER JOIN < LEFT JOIN (2 < 3)
- **(iii) ✓** CROSS JOIN ≥ LEFT JOIN (6 ≥ 3)
- **(iv) ✗** LEFT JOIN < CROSS JOIN (3 < 6)
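- **(v) ✓** With non-empty tables, CROSS JOIN gives the maximum (rows in A × rows in B). Edge case: if the right table is empty, CROSS JOIN returns 0 rows while LEFT JOIN still returns every left row, so (iii) and (v) assume both tables contain data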
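
**Trying other table sizes:** the sketch below wraps the three counts in a helper so you can vary the inputs. It uses Python's built-in `sqlite3` instead of the notebook's MySQL connection (an assumption, so the snippet is self-contained), and `join_counts` is a hypothetical helper name. The second call uses an empty right-hand table, an edge case worth keeping in mind when reading statements (iii) and (v).

```python
import sqlite3

def join_counts(a_rows, b_rows):
    """Return row counts of INNER, LEFT and CROSS joins of TableA with TableB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA (id INTEGER, val TEXT)")
    conn.execute("CREATE TABLE TableB (id INTEGER, val TEXT)")
    conn.executemany("INSERT INTO TableA VALUES (?, ?)", a_rows)
    conn.executemany("INSERT INTO TableB VALUES (?, ?)", b_rows)
    queries = {
        "inner": "SELECT COUNT(*) FROM TableA a INNER JOIN TableB b ON a.id = b.id",
        "left": "SELECT COUNT(*) FROM TableA a LEFT JOIN TableB b ON a.id = b.id",
        "cross": "SELECT COUNT(*) FROM TableA a CROSS JOIN TableB b",
    }
    counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
    conn.close()
    return counts

table_a = [(1, "A1"), (2, "A2"), (3, "A3")]
table_b = [(1, "B1"), (2, "B2")]

print(join_counts(table_a, table_b))  # {'inner': 2, 'left': 3, 'cross': 6}
print(join_counts(table_a, []))       # {'inner': 0, 'left': 3, 'cross': 0}
```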

</details>

---

# Section B: Open-Ended Questions (60 marks)

Answer **TWO** of the following three questions.

---

# Question 2: Bird Spotter's Database [30 marks]

## Setup: Create the Sightings Table

First, let's create the denormalized table from the exam:

In [None]:
%%sql
-- Create the denormalized Sightings table as shown in the exam
DROP TABLE IF EXISTS Sightings;

CREATE TABLE Sightings (
    Species VARCHAR(100),
    Date DATE,
    NumberSighted INT,
    ConservationStatus VARCHAR(50),
    NatureReserve VARCHAR(100),
    Location VARCHAR(50)
);

INSERT INTO Sightings VALUES
('Bar-tailed godwit', '2021-04-21', 31, 'Least concern', 'Rainham Marshes', '51.5N 0.2E'),
('Wood pigeon', '2021-04-21', 31, 'Least concern', 'Rainham Marshes', '51.5N 0.2E'),
('Greater spotted woodpecker', '2021-06-13', 1, 'Least concern', 'Epping Forest', '51.6N 0.0E'),
('European turtle dove', '2021-06-13', 2, 'Vulnerable', 'Epping Forest', '51.6N 0.0E'),
('Wood pigeon', '2021-06-13', 2, 'Least concern', 'Epping Forest', '51.6N 0.0E'),
('Great bustard', '2020-04-15', 3, 'Vulnerable', 'Salisbury Plain', '51.1N -1.8W'),
('Bar-tailed godwit', '2020-04-20', 53, 'Least concern', 'Rainham Marshes', '51.5N 0.2E');

SELECT * FROM Sightings;

---

## Q2(a): Query - Bird Types Since 2021 [4 marks]

**Question:** Give a query to retrieve all bird types seen since the first of January 2021.

**Hint:** Use `SELECT DISTINCT` and `WHERE` with date comparison.

### Your Answer

In [None]:
%%sql
-- Write your query here:


<details>
<summary><b>Click to check answer</b></summary>

```sql
SELECT DISTINCT Species
FROM Sightings
WHERE Date >= '2021-01-01';
```

**Key points:**
- `DISTINCT` removes duplicate species
- Date format: `'YYYY-MM-DD'`
- `>=` includes January 1st

</details>

---

## Q2(b): Is the Table in 1NF? [3 marks]

**Question:** Is this table in 1NF? Explain your reasoning.

### Your Answer

*Write your explanation in the cell below:*

*Double-click to edit and write your answer here*



<details>
<summary><b>Click to check answer</b></summary>

**Yes**, the table is in 1NF because:
1. Each cell contains a single atomic value (no lists or arrays)
2. Each row is unique (no two rows share the same Species, Date, and NatureReserve)
3. All entries in a column are of the same data type

</details>

---

## Q2(c): Normalize the Data [7 marks]

**Question:** Normalise this data, listing the tables that result and their primary and foreign keys.

### Your Answer

Create the normalized tables below:

In [None]:
%%sql
-- Drop existing tables
DROP TABLE IF EXISTS SightingsNorm;
DROP TABLE IF EXISTS Species;
DROP TABLE IF EXISTS NatureReserves;

-- Create your normalized tables here:
-- 1. Species table (species_name PK, conservation_status)


-- 2. NatureReserves table (reserve_name PK, location)


-- 3. Sightings table (species_name FK, reserve_name FK, date, number_sighted)


<details>
<summary><b>Click to check answer</b></summary>

```sql
-- 1. Species table
CREATE TABLE Species (
    species_name VARCHAR(100) PRIMARY KEY,
    conservation_status VARCHAR(50)
);

-- 2. NatureReserves table
CREATE TABLE NatureReserves (
    reserve_name VARCHAR(100) PRIMARY KEY,
    location VARCHAR(50)
);

-- 3. Sightings table
CREATE TABLE SightingsNorm (
    species_name VARCHAR(100),
    reserve_name VARCHAR(100),
    date DATE,
    number_sighted INT,
    PRIMARY KEY (species_name, reserve_name, date),
    FOREIGN KEY (species_name) REFERENCES Species(species_name),
    FOREIGN KEY (reserve_name) REFERENCES NatureReserves(reserve_name)
);
```

</details>

---

## Q2(d): What Normal Form? [4 marks]

**Question:** What normal form have you reached? Explain your conclusion.

### Your Answer

*Double-click to edit and write your answer here*



<details>
<summary><b>Click to check answer</b></summary>

**Third Normal Form (3NF)**

The tables are in 3NF because:
1. **1NF ✓**: All values are atomic
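2. **2NF ✓**: No partial dependencies; `number_sighted` depends on the whole key `(species_name, reserve_name, date)`, while `conservation_status` and `location` now live in tables keyed by a single column
3. **3NF ✓**: No transitive dependencies; each table has only one non-key attribute, so no non-key attribute can depend on another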
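
**Checking dependencies against the data:** a small sketch that tests whether a functional dependency holds in the original Sightings rows. The helper `holds` is hypothetical. Note that a dependency holding in sample data does not prove it holds in general; it only shows the data is consistent with it.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, value) != value:
            return False
    return True

columns = ("Species", "Date", "NumberSighted", "ConservationStatus", "NatureReserve", "Location")
data = [
    ("Bar-tailed godwit", "2021-04-21", 31, "Least concern", "Rainham Marshes", "51.5N 0.2E"),
    ("Wood pigeon", "2021-04-21", 31, "Least concern", "Rainham Marshes", "51.5N 0.2E"),
    ("Greater spotted woodpecker", "2021-06-13", 1, "Least concern", "Epping Forest", "51.6N 0.0E"),
    ("European turtle dove", "2021-06-13", 2, "Vulnerable", "Epping Forest", "51.6N 0.0E"),
    ("Wood pigeon", "2021-06-13", 2, "Least concern", "Epping Forest", "51.6N 0.0E"),
    ("Great bustard", "2020-04-15", 3, "Vulnerable", "Salisbury Plain", "51.1N -1.8W"),
    ("Bar-tailed godwit", "2020-04-20", 53, "Least concern", "Rainham Marshes", "51.5N 0.2E"),
]
rows = [dict(zip(columns, record)) for record in data]

print(holds(rows, ["Species"], ["ConservationStatus"]))   # True
print(holds(rows, ["NatureReserve"], ["Location"]))       # True
print(holds(rows, ["Species"], ["NumberSighted"]))        # False
print(holds(rows, ["Species", "NatureReserve", "Date"], ["NumberSighted"]))  # True
```

The first two dependencies justify the `Species` and `NatureReserves` tables; the last one matches the composite key of `SightingsNorm`.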

</details>

---

## Q2(e): Query with JOIN [5 marks]

**Question:** Give a query for your new tables to retrieve bird types and their conservation status for birds seen since the first of January 2021.

First, let's set up the normalized tables with data:

In [None]:
%%sql
-- Setup normalized tables with data
DROP TABLE IF EXISTS SightingsNorm;
DROP TABLE IF EXISTS Species;
DROP TABLE IF EXISTS NatureReserves;

CREATE TABLE Species (
    species_name VARCHAR(100) PRIMARY KEY,
    conservation_status VARCHAR(50)
);

CREATE TABLE NatureReserves (
    reserve_name VARCHAR(100) PRIMARY KEY,
    location VARCHAR(50)
);

CREATE TABLE SightingsNorm (
    species_name VARCHAR(100),
    reserve_name VARCHAR(100),
    date DATE,
    number_sighted INT,
    PRIMARY KEY (species_name, reserve_name, date),
    FOREIGN KEY (species_name) REFERENCES Species(species_name),
    FOREIGN KEY (reserve_name) REFERENCES NatureReserves(reserve_name)
);

-- Insert data
INSERT INTO Species VALUES
('Bar-tailed godwit', 'Least concern'),
('Wood pigeon', 'Least concern'),
('Greater spotted woodpecker', 'Least concern'),
('European turtle dove', 'Vulnerable'),
('Great bustard', 'Vulnerable');

INSERT INTO NatureReserves VALUES
('Rainham Marshes', '51.5N 0.2E'),
('Epping Forest', '51.6N 0.0E'),
('Salisbury Plain', '51.1N -1.8W');

INSERT INTO SightingsNorm VALUES
('Bar-tailed godwit', 'Rainham Marshes', '2021-04-21', 31),
('Wood pigeon', 'Rainham Marshes', '2021-04-21', 31),
('Greater spotted woodpecker', 'Epping Forest', '2021-06-13', 1),
('European turtle dove', 'Epping Forest', '2021-06-13', 2),
('Wood pigeon', 'Epping Forest', '2021-06-13', 2),
('Great bustard', 'Salisbury Plain', '2020-04-15', 3),
('Bar-tailed godwit', 'Rainham Marshes', '2020-04-20', 53);

SELECT 'Setup complete!' as status;

### Your Answer

In [None]:
%%sql
-- Write your JOIN query here:


<details>
<summary><b>Click to check answer</b></summary>

```sql
SELECT DISTINCT s.species_name, sp.conservation_status
FROM SightingsNorm s
JOIN Species sp ON s.species_name = sp.species_name
WHERE s.date >= '2021-01-01';
```

</details>

---

## Q2(f): Transactions [7 marks]

**Question:** Would a transaction make a difference for the bird spotter's next set of updates? Give example SQL operations to illustrate your argument.

### Your Answer

In [None]:
%%sql
-- Write a transaction example here:
-- START TRANSACTION;
-- ... your statements ...
-- COMMIT; or ROLLBACK;


<details>
<summary><b>Click to check answer</b></summary>

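**Yes.** If the next updates involve several related statements (for example, recording a sighting and changing a species' status), a transaction ensures they are applied together or not at all.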

```sql
START TRANSACTION;

-- Insert new sighting
INSERT INTO SightingsNorm (species_name, reserve_name, date, number_sighted)
VALUES ('European turtle dove', 'Epping Forest', '2021-09-07', 3);

-- Update conservation status
UPDATE Species
SET conservation_status = 'Endangered'
WHERE species_name = 'European turtle dove';

COMMIT;
-- Or ROLLBACK; if something fails
```

**Benefits:**
- **Atomicity**: All statements succeed or all fail together
- **Consistency**: Database stays in valid state
- **Isolation**: Other users don't see partial updates
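
**Watching a rollback happen:** the sketch below uses Python's built-in `sqlite3` rather than MySQL (an assumption, so it runs without a server) and a cut-down version of the two tables. The second statement fails on purpose with a hypothetical species name, and the earlier `UPDATE` in the same transaction is undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Species (species_name TEXT PRIMARY KEY, conservation_status TEXT)")
conn.execute("""
    CREATE TABLE SightingsNorm (
        species_name TEXT REFERENCES Species(species_name),
        reserve_name TEXT,
        date TEXT,
        number_sighted INTEGER,
        PRIMARY KEY (species_name, reserve_name, date)
    )
""")
conn.execute("INSERT INTO Species VALUES ('European turtle dove', 'Vulnerable')")
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute(
            "UPDATE Species SET conservation_status = 'Endangered' "
            "WHERE species_name = 'European turtle dove'"
        )
        # Fails: 'Unknown bird' (hypothetical) is not in Species, so the foreign key is violated.
        conn.execute(
            "INSERT INTO SightingsNorm VALUES ('Unknown bird', 'Epping Forest', '2021-09-07', 3)"
        )
except sqlite3.IntegrityError as exc:
    print(f"Rolled back: {exc}")

status = conn.execute("SELECT conservation_status FROM Species").fetchone()[0]
print(status)  # Vulnerable (the UPDATE was undone together with the failed INSERT)
```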

</details>

---

# Question 3: MEI Music Encoding [30 marks]

## Setup: Parse the MEI XML

In [None]:
from lxml import etree
from IPython.display import display, Markdown
import json

mei_data = """
<measure>
  <staff n="2">
    <layer n="1">
      <chord xml:id="d13e1" dur="8" dur.ppq="12" stem.dir="up">
        <note xml:id="d1e101" pname="c" oct="5"/>
        <note xml:id="d1e118" pname="a" oct="4"/>
        <note xml:id="d1e136" pname="c" oct="4"/>
      </chord>
    </layer>
  </staff>
  <staff n="3">
    <layer n="1">
      <chord xml:id="d17e1" dur="8" dur.ppq="12" stem.dir="up">
        <note xml:id="d1e157" pname="f" oct="3"/>
        <note xml:id="d1e174" pname="f" oct="2"/>
      </chord>
    </layer>
  </staff>
</measure>
"""

root_mei = etree.fromstring(mei_data)
print("MEI snippet parsed successfully!")
print(etree.tostring(root_mei, pretty_print=True, encoding='unicode'))

---

## Q3(a): List Element Types [2 marks]

**Question:** List all the element types you can see in this code.

### Your Answer

*Double-click to edit and list the element types:*

1. 
2. 
3. 
4. 
5. 

<details>
<summary><b>Click to check answer</b></summary>

1. `<measure>`
2. `<staff>`
3. `<layer>`
4. `<chord>`
5. `<note>`

</details>

---

## Q3(b): Fix the XPath [3 marks]

**Question:** This XPath is incorrect:
```xpath
/staff[n="2"]/layer/chord[note/@pname="c"]
```

Give an XPath expression that would work.

### Your Answer

In [None]:
# Try the incorrect XPath first
try:
    result = root_mei.xpath('/staff[n="2"]/layer/chord[note/@pname="c"]')
    print(f"Incorrect XPath found: {len(result)} results")
except Exception as e:
    print(f"Error: {e}")

# Now write the correct XPath:
correct_xpath = ""  # Fill this in!

if correct_xpath:
    result = root_mei.xpath(correct_xpath)
    print(f"\nCorrect XPath found: {len(result)} results")
    for r in result:
        print(etree.tostring(r, pretty_print=True, encoding='unicode'))

<details>
<summary><b>Click to check answer</b></summary>

```xpath
//staff[@n="2"]/layer/chord[note/@pname="c"]
```

**Fixes:**
1. `//staff` instead of `/staff` - the root element is `<measure>`, so `/staff` matches nothing; `//staff` searches at any depth
2. `[@n="2"]` instead of `[n="2"]` - `@` indicates attribute

</details>

---

## Q3(c)(i): Translate to JSON [5 marks]

**Question:** Translate the first chord element into JSON as effectively as you can.

### Your Answer

In [None]:
# Build a JSON representation of the first chord
chord_el = root_mei.xpath('//chord')[0]  # Get first chord

# Create your JSON structure here:
chord_dict = {
    # Fill in the structure
}

print(json.dumps(chord_dict, indent=2))

<details>
<summary><b>Click to check answer</b></summary>

```python
chord_dict = {
    "xml_id": "d13e1",
    "dur": 8,
    "dur_ppq": 12,
    "stem_dir": "up",
    "notes": [
        {"xml_id": "d1e101", "pname": "c", "oct": 5},
        {"xml_id": "d1e118", "pname": "a", "oct": 4},
        {"xml_id": "d1e136", "pname": "c", "oct": 4}
    ]
}
```
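
The same structure can also be derived from the XML rather than typed by hand. This is a minimal sketch using the standard library's `xml.etree.ElementTree` (an assumption; the notebook itself uses `lxml`). The helper `attrs_to_dict` and its key-renaming rules (`xml:id` to `xml_id`, dots to underscores, digit strings to integers) are illustrative choices, not part of MEI.

```python
import json
import xml.etree.ElementTree as ET

# Namespace that the predefined xml: prefix expands to in ElementTree.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

chord_xml = """
<chord xml:id="d13e1" dur="8" dur.ppq="12" stem.dir="up">
  <note xml:id="d1e101" pname="c" oct="5"/>
  <note xml:id="d1e118" pname="a" oct="4"/>
  <note xml:id="d1e136" pname="c" oct="4"/>
</chord>
"""

def attrs_to_dict(element):
    """Copy an element's attributes into a dict with JSON-friendly keys and values."""
    result = {}
    for name, value in element.attrib.items():
        key = name.replace(XML_NS, "xml_").replace(".", "_")
        result[key] = int(value) if value.isdigit() else value
    return result

chord = ET.fromstring(chord_xml)
chord_dict = attrs_to_dict(chord)
# Child <note> elements become an array of objects.
chord_dict["notes"] = [attrs_to_dict(note) for note in chord.findall("note")]

print(json.dumps(chord_dict, indent=2))
```

Mapping repeated child elements to an array is the main design decision: it keeps the notes together inside the chord document instead of flattening them.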

</details>

---

## Q3(c)(ii): MongoDB Find Command [5 marks]

**Question:** Give a MongoDB find command that would return only chords with upward stems that have f in one of their notes.

### Your Answer

In [None]:
# Setup: Insert chord documents into MongoDB
db = mongo_client.music_db
db.chords.drop()

db.chords.insert_many([
    {
        "xml_id": "d13e1",
        "dur": 8,
        "stem_dir": "up",
        "notes": [
            {"pname": "c", "oct": 5},
            {"pname": "a", "oct": 4},
            {"pname": "c", "oct": 4}
        ]
    },
    {
        "xml_id": "d17e1",
        "dur": 8,
        "stem_dir": "up",
        "notes": [
            {"pname": "f", "oct": 3},
            {"pname": "f", "oct": 2}
        ]
    }
])

print("Chords inserted. Now write your find query:")

In [None]:
# Write your MongoDB find query here:
query = {
    # Fill in the query
}

results = db.chords.find(query)
for doc in results:
    doc.pop('_id')
    print(json.dumps(doc, indent=2))

<details>
<summary><b>Click to check answer</b></summary>

```python
query = {
    "stem_dir": "up",
    "notes.pname": "f"
}
```

Or using `$elemMatch`:
```python
query = {
    "stem_dir": "up",
    "notes": {"$elemMatch": {"pname": "f"}}
}
```

</details>

---

## Q3(d)(i): Why rdfs:member? [3 marks]

**Question:** Why use `rdfs:member` instead of a new `mei:hasNotes` property?

### Your Answer

*Double-click to edit and write your answer here*



<details>
<summary><b>Click to check answer</b></summary>

Using `rdfs:member` leverages an **existing W3C standard** vocabulary, which:
1. **Maximizes interoperability** - other linked data tools understand it
2. **Avoids redundancy** - no need to define what already exists
3. **Follows best practices** - reuse before creating new terms

</details>

---

## Q3(d)(ii): RDF for First Chord [5 marks]

**Question:** Give some RDF (in Turtle) for the first chord element.

### Your Answer

In [None]:
# Write your Turtle RDF here:
turtle_rdf = """
@prefix mei: <http://example.org/mei#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Add your RDF triples here:

"""

# Test if it parses correctly
from rdflib import Graph
g = Graph()
try:
    g.parse(data=turtle_rdf, format='turtle')
    print(f"Valid Turtle! Contains {len(g)} triples.")
except Exception as e:
    print(f"Parse error: {e}")

<details>
<summary><b>Click to check answer</b></summary>

```turtle
@prefix mei: <http://example.org/mei#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

mei:chord_d13e1 a mei:Chord ;
    mei:duration "8"^^xsd:integer ;
    mei:stemDirection "up" ;
    rdfs:member mei:note_d1e101 ,
                mei:note_d1e118 ,
                mei:note_d1e136 .

mei:note_d1e101 a mei:Note ;
    mei:pitchName "c" ;
    mei:octave "5"^^xsd:integer .

mei:note_d1e118 a mei:Note ;
    mei:pitchName "a" ;
    mei:octave "4"^^xsd:integer .

mei:note_d1e136 a mei:Note ;
    mei:pitchName "c" ;
    mei:octave "4"^^xsd:integer .
```

</details>

---

## Q3(e): Compare XML, MongoDB, and Linked Data [7 marks]

**Question:** How do these three models differ in what they might offer for music notation? What advantages and disadvantages does each have?

### Your Answer

*Double-click to edit and write your comparison here*



<details>
<summary><b>Click to check answer</b></summary>

| Aspect | XML/MEI | MongoDB/JSON | Linked Data/RDF |
|--------|---------|--------------|------------------|
| **Structure** | Hierarchical tree | Flexible documents | Graph (triples) |
| **Schema** | Strict (e.g. XSD, RELAX NG) | Optional (schema-less by default) | Ontologies (RDFS/OWL) |
| **Querying** | XPath, XQuery | MongoDB queries | SPARQL |

**XML/MEI:**
- ✓ Established community standard for music encoding
- ✓ Strong validation
- ✗ Verbose
- ✗ Limited cross-document linking

**MongoDB/JSON:**
- ✓ Flexible schema
- ✓ Fast queries on nested documents, good for web apps
- ✗ No standard music schema
- ✗ Validation is optional and off by default

**Linked Data/RDF:**
- ✓ Links to global knowledge
- ✓ Semantic reasoning
- ✗ More complex to set up and query (triple stores, SPARQL)
- ✗ No built-in ordering, so note sequences must be modelled explicitly

</details>

---

# Question 4: Zoo Database [30 marks]

## Setup: Create Zoo Tables

In [None]:
%%sql
-- Create Zoo database tables
DROP TABLE IF EXISTS Animal;
DROP TABLE IF EXISTS Species;
DROP TABLE IF EXISTS Enclosure;
DROP TABLE IF EXISTS Zoo;

CREATE TABLE Zoo (
    name VARCHAR(255) PRIMARY KEY,
    country VARCHAR(255)
);

CREATE TABLE Enclosure (
    name VARCHAR(255) PRIMARY KEY,
    location VARCHAR(255),
    zoo_name VARCHAR(255),
    FOREIGN KEY (zoo_name) REFERENCES Zoo(name)
);

CREATE TABLE Species (
    latin_name VARCHAR(255) PRIMARY KEY,
    conservation_status VARCHAR(50)
);

CREATE TABLE Animal (
    identifier INT AUTO_INCREMENT PRIMARY KEY,
    date_of_birth DATE,
    latin_name VARCHAR(255),
    enclosure_name VARCHAR(255),
    FOREIGN KEY (latin_name) REFERENCES Species(latin_name),
    FOREIGN KEY (enclosure_name) REFERENCES Enclosure(name)
);

-- Insert sample data
INSERT INTO Zoo VALUES ('Singapore Zoo', 'Singapore'), ('London Zoo', 'UK');

INSERT INTO Enclosure VALUES 
('Tropical Zone', 'Mandai Lake', 'Singapore Zoo'),
('Savannah Zone', 'Outer Gardens', 'Singapore Zoo'),
('Reptile House', 'Regents Park', 'London Zoo');

INSERT INTO Species VALUES 
('Buceros bicornis', 'Vulnerable'),
('Panthera leo', 'Vulnerable'),
('Elephas maximus', 'Endangered');

INSERT INTO Animal (date_of_birth, latin_name, enclosure_name) VALUES
('2010-04-10', 'Buceros bicornis', 'Tropical Zone'),
('2012-06-15', 'Panthera leo', 'Savannah Zone'),
('2005-02-01', 'Elephas maximus', 'Reptile House'),
('2015-09-09', 'Buceros bicornis', 'Savannah Zone');

SELECT 'Zoo database setup complete!' AS status;

---

## Q4(a): List Tables and Fields [4 marks]

**Question:** List the tables and their fields for an SQL implementation of this design. Indicate primary keys for each table.

### Your Answer

*Double-click to edit and list your tables:*



<details>
<summary><b>Click to check answer</b></summary>

**1. Zoo**
- name (PK)
- country

**2. Enclosure**
- name (PK)
- location
- zoo_name (FK → Zoo)

**3. Species**
- latin_name (PK)
- conservation_status

**4. Animal**
- identifier (PK)
- date_of_birth
- latin_name (FK → Species)
- enclosure_name (FK → Enclosure)

</details>

---

## Q4(b): CREATE TABLE Commands [6 marks]

**Question:** Give SQL CREATE TABLE commands for any TWO of your tables, including any foreign keys.

### Your Answer

In [None]:
%%sql
-- Write your CREATE TABLE commands here (for 2 tables):


<details>
<summary><b>Click to check answer</b></summary>

```sql
CREATE TABLE Zoo (
    name VARCHAR(255) PRIMARY KEY,
    country VARCHAR(255) NOT NULL
);

CREATE TABLE Enclosure (
    name VARCHAR(255) PRIMARY KEY,
    location VARCHAR(255),
    zoo_name VARCHAR(255) NOT NULL,
    FOREIGN KEY (zoo_name) REFERENCES Zoo(name)
);
```
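
To see what the foreign key does in practice, the sketch below replays the two tables in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is a stand-in for the notebook's MySQL server (an assumption, chosen so the example runs anywhere), and it only enforces foreign keys after `PRAGMA foreign_keys = ON`. The 'Ghost Zone' / 'Nowhere Zoo' row is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.executescript("""
CREATE TABLE Zoo (
    name VARCHAR(255) PRIMARY KEY,
    country VARCHAR(255) NOT NULL
);
CREATE TABLE Enclosure (
    name VARCHAR(255) PRIMARY KEY,
    location VARCHAR(255),
    zoo_name VARCHAR(255) NOT NULL,
    FOREIGN KEY (zoo_name) REFERENCES Zoo(name)
);
""")

conn.execute("INSERT INTO Zoo VALUES ('Singapore Zoo', 'Singapore')")
conn.execute(
    "INSERT INTO Enclosure VALUES ('Tropical Zone', 'Mandai Lake', 'Singapore Zoo')"
)

try:
    # 'Nowhere Zoo' does not exist in Zoo, so the foreign key rejects this row.
    conn.execute(
        "INSERT INTO Enclosure VALUES ('Ghost Zone', 'Unknown', 'Nowhere Zoo')"
    )
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print(f"Rejected: {exc}")

enclosure_count = conn.execute("SELECT COUNT(*) FROM Enclosure").fetchone()[0]
print(enclosure_count)  # 1
```

Only the valid enclosure is stored; the orphan row never reaches the table.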

</details>

---

## Q4(c): Count Species in Singapore Zoo [5 marks]

**Question:** Give a single SQL query to find out how many species are housed in the zoo which has the name 'Singapore Zoo'.

### Your Answer

In [None]:
%%sql
-- Write your query here:


<details>
<summary><b>Click to check answer</b></summary>

```sql
SELECT COUNT(DISTINCT a.latin_name) AS species_count
FROM Animal a
JOIN Enclosure e ON a.enclosure_name = e.name
WHERE e.zoo_name = 'Singapore Zoo';
```
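
A quick way to convince yourself that `DISTINCT` matters is to run the query with and without it. The sketch below does this against the notebook's sample rows in an in-memory SQLite database via Python's `sqlite3` module; SQLite is used here only as a convenient stand-in for MySQL, and the trimmed two-table schema is a simplification for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Enclosure (name TEXT PRIMARY KEY, location TEXT, zoo_name TEXT);
CREATE TABLE Animal (
    identifier INTEGER PRIMARY KEY,
    date_of_birth DATE,
    latin_name TEXT,
    enclosure_name TEXT REFERENCES Enclosure(name)
);
INSERT INTO Enclosure VALUES
    ('Tropical Zone', 'Mandai Lake', 'Singapore Zoo'),
    ('Savannah Zone', 'Outer Gardens', 'Singapore Zoo'),
    ('Reptile House', 'Regents Park', 'London Zoo');
INSERT INTO Animal (date_of_birth, latin_name, enclosure_name) VALUES
    ('2010-04-10', 'Buceros bicornis', 'Tropical Zone'),
    ('2012-06-15', 'Panthera leo', 'Savannah Zone'),
    ('2005-02-01', 'Elephas maximus', 'Reptile House'),
    ('2015-09-09', 'Buceros bicornis', 'Savannah Zone');
""")

# Same query as the answer, with the DISTINCT keyword made switchable.
query = """
SELECT COUNT({distinct} a.latin_name)
FROM Animal a
JOIN Enclosure e ON a.enclosure_name = e.name
WHERE e.zoo_name = 'Singapore Zoo'
"""
species_count = conn.execute(query.format(distinct="DISTINCT")).fetchone()[0]
animal_rows = conn.execute(query.format(distinct="")).fetchone()[0]

print(species_count)  # 2 species (Buceros bicornis, Panthera leo)
print(animal_rows)    # 3 animals, because the two hornbills are counted separately
```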

</details>

---

## Q4(d): Oldest Animal Per Zoo [5 marks]

**Question:** Give a single SQL query to find out the date of birth of the oldest animal of the species called 'Buceros bicornis' in each zoo.

### Your Answer

In [None]:
%%sql
-- Write your query here:


<details>
<summary><b>Click to check answer</b></summary>

```sql
SELECT e.zoo_name, MIN(a.date_of_birth) AS oldest_birth_date
FROM Animal a
JOIN Enclosure e ON a.enclosure_name = e.name
WHERE a.latin_name = 'Buceros bicornis'
GROUP BY e.zoo_name;
```

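**Key points:**
- `MIN(date_of_birth)` returns the earliest date, which belongs to the oldest animal
- `WHERE` filters to the species before grouping, so zoos without that species produce no row
- `GROUP BY e.zoo_name` gives one result row per zoo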
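
The grouping is easier to see when more than one zoo keeps the species. The sketch below runs the answer query in an in-memory SQLite database (a stand-in for MySQL) using the notebook's sample rows plus one hypothetical London hornbill, which is not part of the original data. The `ORDER BY` is added only to make the printed order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Enclosure (name TEXT PRIMARY KEY, location TEXT, zoo_name TEXT);
CREATE TABLE Animal (
    identifier INTEGER PRIMARY KEY,
    date_of_birth DATE,
    latin_name TEXT,
    enclosure_name TEXT REFERENCES Enclosure(name)
);
INSERT INTO Enclosure VALUES
    ('Tropical Zone', 'Mandai Lake', 'Singapore Zoo'),
    ('Savannah Zone', 'Outer Gardens', 'Singapore Zoo'),
    ('Reptile House', 'Regents Park', 'London Zoo');
INSERT INTO Animal (date_of_birth, latin_name, enclosure_name) VALUES
    ('2010-04-10', 'Buceros bicornis', 'Tropical Zone'),
    ('2012-06-15', 'Panthera leo', 'Savannah Zone'),
    ('2005-02-01', 'Elephas maximus', 'Reptile House'),
    ('2015-09-09', 'Buceros bicornis', 'Savannah Zone'),
    -- Hypothetical extra row so that a second zoo has a hornbill:
    ('1998-03-03', 'Buceros bicornis', 'Reptile House');
""")

rows = conn.execute("""
SELECT e.zoo_name, MIN(a.date_of_birth) AS oldest_birth_date
FROM Animal a
JOIN Enclosure e ON a.enclosure_name = e.name
WHERE a.latin_name = 'Buceros bicornis'
GROUP BY e.zoo_name
ORDER BY e.zoo_name
""").fetchall()

for zoo_name, oldest in rows:
    print(zoo_name, oldest)
# London Zoo 1998-03-03
# Singapore Zoo 2010-04-10
```

Singapore Zoo has two hornbills (born 2010 and 2015), and only the earlier date survives the `MIN`.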

</details>

---

## Q4(e): XML or RDF Representation [10 marks]

**Question:** Choose ONE of XML or RDF and:
1. BRIEFLY assess the suitability of this model for your chosen technology
2. Give some instance data for the database in your chosen technology

### Your Answer

*Double-click to write your suitability assessment:*



In [None]:
# Write your instance data here (XML or RDF/Turtle):
instance_data = """

"""

print(instance_data)

<details>
<summary><b>Click to check answer (RDF example)</b></summary>

**Suitability (RDF):**
- Natural graph structure for Zoo→Enclosure→Animal→Species
- Can link to external data (IUCN Red List, Wikipedia)
- Flexible - easy to add new properties

**Instance Data (Turtle):**
```turtle
@prefix zoo: <http://example.org/zoo#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

zoo:SingaporeZoo a zoo:Zoo ;
    zoo:name "Singapore Zoo" ;
    zoo:country "Singapore" .

zoo:TropicalZone a zoo:Enclosure ;
    zoo:name "Tropical Zone" ;
    zoo:location "Mandai Lake" ;
    zoo:partOf zoo:SingaporeZoo .

zoo:BucerosBicornis a zoo:Species ;
    zoo:latinName "Buceros bicornis" ;
    zoo:conservationStatus "Vulnerable" .

zoo:Animal001 a zoo:Animal ;
    zoo:dateOfBirth "2010-04-10"^^xsd:date ;
    zoo:species zoo:BucerosBicornis ;
    zoo:livesIn zoo:TropicalZone .
```
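
If you choose XML instead, the same data nests naturally as Zoo > Enclosure > Animal. The sketch below builds one possible document with Python's standard `xml.etree.ElementTree`; the element and attribute names are illustrative assumptions, not taken from the exam paper.

```python
import xml.etree.ElementTree as ET

# Element and attribute names are illustrative only.
zoo = ET.Element("zoo", name="Singapore Zoo", country="Singapore")
enclosure = ET.SubElement(
    zoo, "enclosure", name="Tropical Zone", location="Mandai Lake"
)
animal = ET.SubElement(enclosure, "animal", id="1", dateOfBirth="2010-04-10")
# Species is shared by many animals, so nesting it here repeats its details
# for every animal (the alternative is an ID reference to a separate list).
ET.SubElement(
    animal, "species",
    latinName="Buceros bicornis", conservationStatus="Vulnerable",
)

ET.indent(zoo)  # pretty-print; needs Python 3.9 or later
print(ET.tostring(zoo, encoding="unicode"))

# The hierarchy makes "animals in this zoo" a simple path query.
birth_dates = [a.get("dateOfBirth") for a in zoo.findall("./enclosure/animal")]
print(birth_dates)  # ['2010-04-10']
```

For the suitability part, the trade-off to mention is that the Zoo/Enclosure/Animal chain fits a tree well, while Species is a many-to-one relationship that XML can only express by repetition or by ID references.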

</details>

---

# Congratulations!

You've completed all the practice questions for the September 2021 exam.

## Next Steps

1. Review any questions you found difficult
2. Read the solution sheet for detailed explanations
3. Try modifying the queries to answer related questions
4. Practice with other past papers

Good luck with your exam!