<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/march-2024/notebook-march-2024.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Environment Setup

Run these cells first to set up MySQL, MongoDB, and SPARQL.

In [None]:
# === MySQL Setup ===
!apt-get update -qq > /dev/null
!apt-get install -y -qq mysql-server > /dev/null
!service mysql start
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost';"

# === SQL Magic ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0
%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

# === SPARQL Magic (cellspell) ===
!pip install "cellspell[sparql] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.sparql

In [None]:
# === MongoDB Setup ===
!wget -q http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb > /dev/null 2>&1
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add - > /dev/null 2>&1
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list > /dev/null
!apt-get update -qq > /dev/null
!apt-get install -y -qq mongodb-org > /dev/null
!mkdir -p /data/db
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

!mongo --quiet --eval 'print("MongoDB ready!")'

# === MongoDB Magic (cellspell) ===
!pip install "cellspell[mongodb] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.mongodb
%mongodb mongodb://localhost:27017/exam_db

In [None]:
%%writefile carnegie_hall.ttl
@prefix schema: <http://schema.org/> .
@prefix gnd: <http://d-nb.info/standards/elementset/gnd#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix chm: <http://data.carnegiehall.org/model/> .
@prefix chi: <http://data.carnegiehall.org/instruments/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix mo: <http://purl.org/ontology/mo/> .

<http://data.carnegiehall.org/names/18065> a chm:Entity, schema:Person ;
    rdfs:label "Maria Callas" ;
    gnd:playedInstrument chi:61 ;
    schema:birthDate "1923-12-02"^^xsd:date ;
    schema:birthPlace <http://sws.geonames.org/5128581/> ;
    schema:deathDate "1977-09-16"^^xsd:date ;
    schema:name "Maria Callas" ;
    skos:exactMatch <http://dbpedia.org/resource/Maria_Callas>,
        <http://id.loc.gov/authorities/names/n50032183>,
        wd:Q128297,
        <https://musicbrainz.org/artist/9dee40b2-25ad-404c-9c9a-139feffd4b57> .

chi:61 a mo:Instrument ;
    rdfs:label "soprano" .

wd:Q128297 wdt:P1477 "Maria Anna Cecilia Sofia Kalogeropoulou"@en,
    "Μαρία Άννα Καικιλία Σοφία Καλογεροπούλου"@el .

wdt:P1477 schema:description "full name of a person at birth, if different from their current, generally used name"@en .

<http://data.carnegiehall.org/names/52432> a chm:Entity, schema:Person ;
    rdfs:label "Joan Sutherland" ;
    gnd:playedInstrument chi:61 ;
    schema:birthDate "1926-11-07"^^xsd:date ;
    schema:name "Joan Sutherland" ;
    skos:exactMatch wd:Q229444 .

wd:Q229444 wdt:P1477 "Joan Alston Sutherland"@en .

<http://data.carnegiehall.org/names/12345> a chm:Entity, schema:Person ;
    rdfs:label "Luciano Pavarotti" ;
    gnd:playedInstrument chi:62 ;
    schema:birthDate "1935-10-12"^^xsd:date ;
    schema:name "Luciano Pavarotti" ;
    skos:exactMatch wd:Q36767 .

chi:62 a mo:Instrument ;
    rdfs:label "tenor" .

wd:Q36767 wdt:P1477 "Luciano Pavarotti"@en .

---

# Question 2: Carnegie Hall RDF/Linked Data [30 marks]

## Context

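RDF data from the Carnegie Hall data lab describing Maria Callas. The sample file `carnegie_hall.ttl` written above also includes Joan Sutherland and Luciano Pavarotti.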

## Q2(b)(ii) [5 marks]


In [None]:
%%sparql --file carnegie_hall.ttl
# TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sparql
%%sparql --file carnegie_hall.ttl
PREFIX gnd: <http://d-nb.info/standards/elementset/gnd#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX chi: <http://data.carnegiehall.org/instruments/>

SELECT ?person ?personLabel ?birthName
WHERE {
    ?person gnd:playedInstrument chi:61 .
    OPTIONAL { ?person rdfs:label ?personLabel }
    ?person skos:exactMatch ?wdEntity .
    ?wdEntity wdt:P1477 ?birthName .
}
```

</details>


---

# Question 3: UK Government Exam Attainment Data [30 marks]

## Q3(c) [15 marks]


<details>
<summary>Click to reveal solution</summary>

**Q3(c) SOLUTION - Model explanation**

RELATIONAL MODEL DESIGN:

Tables:
1. CharacteristicType - Gender, FSM, All students, etc.
2. Characteristic - Male, Female, Eligible for FSM, etc.
3. SubjectArea - Maths, Classical Studies, etc.
4. Subject - Additional Mathematics, Classical Greek, etc.
5. GradeMetric - Total Students, Number at grade A*, etc.
6. Attainment - Fact table with values

Design Choices:
- Separate CharacteristicType: Normalizes type/characteristic hierarchy
- Subject linked to SubjectArea: Enforces categorization
- GradeMetric table: Allows adding metrics without schema change
- NULL for Value: Handles "not applicable" properly
- Decimal for Value: Handles counts and percentages

Normal Forms:
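- 1NF: Every column holds a single atomic value and every table has a primary key
- 2NF: No partial dependencies, because every table uses a single-column surrogate key
- 3NF: No transitive dependencies; non-key attributes depend only on the key
- BCNF: Every determinant is a candidate key (the surrogate key or a UNIQUE constraint)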

</details>


In [None]:
%%sql
-- Q3(c) SOLUTION - CREATE TABLE statements
DROP TABLE IF EXISTS Attainment;
DROP TABLE IF EXISTS GradeMetric;
DROP TABLE IF EXISTS Subject;
DROP TABLE IF EXISTS SubjectArea;
DROP TABLE IF EXISTS Characteristic;
DROP TABLE IF EXISTS CharacteristicType;

CREATE TABLE CharacteristicType (
    CharTypeId INT PRIMARY KEY AUTO_INCREMENT,
    TypeName VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE Characteristic (
    CharId INT PRIMARY KEY AUTO_INCREMENT,
    CharTypeId INT NOT NULL,
    CharName VARCHAR(100) NOT NULL,
    FOREIGN KEY (CharTypeId) REFERENCES CharacteristicType(CharTypeId),
    UNIQUE (CharTypeId, CharName)
);

CREATE TABLE SubjectArea (
    SubjectAreaId INT PRIMARY KEY AUTO_INCREMENT,
    AreaName VARCHAR(100) NOT NULL UNIQUE
);

CREATE TABLE Subject (
    SubjectId INT PRIMARY KEY AUTO_INCREMENT,
    SubjectName VARCHAR(100) NOT NULL UNIQUE,
    SubjectAreaId INT NOT NULL,
    FOREIGN KEY (SubjectAreaId) REFERENCES SubjectArea(SubjectAreaId)
);

CREATE TABLE GradeMetric (
    MetricId INT PRIMARY KEY AUTO_INCREMENT,
    MetricName VARCHAR(50) NOT NULL UNIQUE,
    MetricType ENUM('count', 'cumulative', 'percentage') NOT NULL
);

CREATE TABLE Attainment (
    AttainmentId INT PRIMARY KEY AUTO_INCREMENT,
    CharId INT NOT NULL,
    SubjectId INT NOT NULL,
    MetricId INT NOT NULL,
    Value DECIMAL(10,4),
    AcademicYear VARCHAR(9) NOT NULL,  -- NOT NULL: MySQL UNIQUE allows repeated NULLs, which would permit duplicate rows
    FOREIGN KEY (CharId) REFERENCES Characteristic(CharId),
    FOREIGN KEY (SubjectId) REFERENCES Subject(SubjectId),
    FOREIGN KEY (MetricId) REFERENCES GradeMetric(MetricId),
    UNIQUE (CharId, SubjectId, MetricId, AcademicYear)
);

SELECT 'Tables created!' AS Status;

In [None]:
%%sql
-- Insert sample data for testing
INSERT INTO CharacteristicType (TypeName) VALUES
('Gender'), ('All students'), ('Free School Meals');

INSERT INTO Characteristic (CharTypeId, CharName) VALUES
(1, 'Male'), (1, 'Female'),
(2, 'State-funded students'),
(3, 'Eligible for FSM');

INSERT INTO SubjectArea (AreaName) VALUES
('Maths'), ('Classical Studies'), ('Design and Technology'), ('All STEM subjects');

INSERT INTO Subject (SubjectName, SubjectAreaId) VALUES
('Additional Mathematics', 1),
('Classical Greek', 2),
('Textiles Technology', 3),
('Total STEM subjects', 4);

INSERT INTO GradeMetric (MetricName, MetricType) VALUES
('Total Students', 'count'),
('Number at grade A*', 'count'),
('Number achieving grade A*-C', 'cumulative'),
('Percent achieving grade A*-C', 'percentage');

-- Sample attainment data
INSERT INTO Attainment (CharId, SubjectId, MetricId, Value, AcademicYear) VALUES
(2, 2, 1, 100, '2023-2024'),      -- Female, Classical Greek, Total Students
(2, 2, 3, 99, '2023-2024'),       -- Female, Classical Greek, A*-C count
(3, 3, 1, 661, '2023-2024'),      -- State-funded, Textiles, Total Students
(3, 3, 3, 475, '2023-2024');      -- State-funded, Textiles, A*-C count

SELECT 'Sample data inserted!' AS Status;

## Q3(d) [4 marks]


In [None]:
%%sql
-- TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```sql
%%sql
-- Q3(d) SOLUTION: Percentage of A*-C for Classical Studies by Characteristic
-- This calculates the actual percentage from count / total
SELECT
    ct.TypeName AS CharacteristicType,
    c.CharName AS Characteristic,
    ROUND(a_ac.Value / a_total.Value * 100, 2) AS PercentAStarToC
FROM Attainment a_ac
INNER JOIN Attainment a_total
    ON a_ac.CharId = a_total.CharId
    AND a_ac.SubjectId = a_total.SubjectId
    AND a_ac.AcademicYear = a_total.AcademicYear
INNER JOIN Characteristic c ON a_ac.CharId = c.CharId
INNER JOIN CharacteristicType ct ON c.CharTypeId = ct.CharTypeId
INNER JOIN Subject s ON a_ac.SubjectId = s.SubjectId
INNER JOIN SubjectArea sa ON s.SubjectAreaId = sa.SubjectAreaId
INNER JOIN GradeMetric m_ac ON a_ac.MetricId = m_ac.MetricId
INNER JOIN GradeMetric m_total ON a_total.MetricId = m_total.MetricId
WHERE sa.AreaName = 'Classical Studies'
  AND m_ac.MetricName = 'Number achieving grade A*-C'
  AND m_total.MetricName = 'Total Students'
  AND a_total.Value > 0
ORDER BY ct.TypeName, c.CharName;
```
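To see what the self-join computes, here is a minimal plain-JavaScript sketch (an emulation, not run against MySQL). It pairs each "Number achieving grade A*-C" row with the "Total Students" row for the same characteristic, subject, and year, then divides. The rows mirror the sample `Attainment` inserts; the helper name `percentAStarToC` is hypothetical, and filtering by `subjectId` instead of by area name is a simplification.

```js
// Rows mirror the sample Attainment inserts (MetricId 1 = Total Students,
// MetricId 3 = Number achieving grade A*-C in the sample GradeMetric data).
const attainment = [
  { charId: 2, subjectId: 2, metricId: 1, value: 100, year: "2023-2024" },
  { charId: 2, subjectId: 2, metricId: 3, value: 99, year: "2023-2024" },
  { charId: 3, subjectId: 3, metricId: 1, value: 661, year: "2023-2024" },
  { charId: 3, subjectId: 3, metricId: 3, value: 475, year: "2023-2024" },
];

// Hypothetical helper: emulates the a_ac / a_total self-join for one subject.
function percentAStarToC(rows, subjectId) {
  const counts = rows.filter(r => r.metricId === 3 && r.subjectId === subjectId);
  return counts.flatMap(ac => {
    const total = rows.find(t =>
      t.metricId === 1 &&
      t.charId === ac.charId &&
      t.subjectId === ac.subjectId &&
      t.year === ac.year &&
      t.value > 0);                 // same guard as a_total.Value > 0
    if (!total) return [];          // inner join: unmatched rows are dropped
    // Round to two decimal places, like ROUND(..., 2) in the query.
    const percent = Math.round((ac.value / total.value) * 10000) / 100;
    return [{ charId: ac.charId, percent }];
  });
}

console.log(percentAStarToC(attainment, 2)); // SubjectId 2 = Classical Greek
```

With the sample data, only CharId 2 (Female) has Classical Greek rows, so the result has a single entry built from 99 / 100.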

</details>


---

# Question 4: MongoDB Document Database [30 marks]

In [None]:
%%mongodb
db.people.drop()
db.people.insertMany([
  {
    "_id": 1,
    "first_name": "Tom",
    "email": "tom@example.com",
    "cell": "765-555-5555",
    "likes": ["fashion", "spas", "shopping"],
    "businesses": [
      {"name": "Entertainment 1080", "partner": "Jean", "status": "Bankrupt", "date_founded": new Date("2012-05-19")},
      {"name": "Swag for Tweens", "date_founded": new Date("2012-11-01")}
    ]
  },
  {
    "_id": 2,
    "first_name": "Jane",
    "email": "jane@example.com",
    "cell": "555-123-4567",
    "likes": ["travel", "fashun", "reading"],
    "businesses": [
      {"name": "Tech Solutions", "status": "Active", "date_founded": new Date("2019-03-15")}
    ]
  },
  {
    "_id": 3,
    "first_name": "Bob",
    "email": "bob@example.com",
    "likes": ["spas", "golf"],
    "businesses": [
      {"name": "Old Venture", "status": "Bankrupt", "date_founded": new Date("2015-01-10")},
      {"name": "New Hope Ltd", "status": "Active", "date_founded": new Date("2021-06-01")}
    ]
  }
])

## Q4(a)(i) [2 marks]


In [None]:
%%mongodb
// TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

**Q4(a)(i) SOLUTION**

MongoDB Query for people who like spas:

db.people.find({ likes: "spas" })

Explanation: when a field holds an array, an equality condition matches
any document in which at least one element equals the value, so
{ likes: "spas" } matches every person whose likes array contains "spas".

```js
%%mongodb
db.people.find({"likes": "spas"})
```

</details>


## Q4(a)(ii) [4 marks]


In [None]:
%%mongodb
// TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

**Q4(a)(ii) SOLUTION**

MongoDB query for people with a business founded before March 2020 AND at least one Bankrupt business:

db.people.find({
    "businesses.date_founded": { $lt: ISODate("2020-03-01") },
    "businesses.status": "Bankrupt"
})

IMPORTANT NOTE:
This finds documents where:
- At least one business was founded before March 1, 2020, AND
- At least one business has status "Bankrupt"

These don't have to be the SAME business element.

If you need BOTH conditions on the SAME business, use $elemMatch:

db.people.find({
    businesses: {
        $elemMatch: {
            date_founded: { $lt: ISODate("2020-03-01") },
            status: "Bankrupt"
        }
    }
})

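The difference between the two forms can be sketched in plain JavaScript (an emulation of the matching rules, not MongoDB itself). Tom and Jane are trimmed copies of the sample documents; "Ann" is a hypothetical extra person added because, in the sample data alone, both queries return the same people.

```js
// Sample documents reduced to the fields the two queries inspect.
const people = [
  { first_name: "Tom", businesses: [
    { status: "Bankrupt", date_founded: new Date("2012-05-19") },
    { date_founded: new Date("2012-11-01") } ] },
  { first_name: "Jane", businesses: [
    { status: "Active", date_founded: new Date("2019-03-15") } ] },
  // Hypothetical person: the bankrupt business was founded AFTER the cutoff,
  // while a different, active business was founded before it.
  { first_name: "Ann", businesses: [
    { status: "Bankrupt", date_founded: new Date("2021-01-01") },
    { status: "Active", date_founded: new Date("2018-01-01") } ] },
];

const cutoff = new Date("2020-03-01");
const isOld = b => b.date_founded < cutoff;
const isBankrupt = b => b.status === "Bankrupt";

// Dotted-path form: each condition may be satisfied by a different element.
const dotted = people.filter(p =>
  p.businesses.some(isOld) && p.businesses.some(isBankrupt));

// $elemMatch form: a single element must satisfy both conditions.
const elemMatch = people.filter(p =>
  p.businesses.some(b => isOld(b) && isBankrupt(b)));

console.log(dotted.map(p => p.first_name));    // Tom and Ann
console.log(elemMatch.map(p => p.first_name)); // Tom only
```

Ann appears only in the dotted-path result, because her two conditions are met by two different businesses.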
```js
%%mongodb
db.people.find({
    "businesses.date_founded": {"$lt": new Date("2020-03-01")},
    "businesses.status": "Bankrupt"
})
```

</details>


## Q4(b)(i) [4 marks]


<details>
<summary>Click to reveal solution</summary>

**Q4(b)(i) SOLUTION**

FIX "fashun" -> "fashion" IN MONGODB:

Approach:
Use updateMany() with $set and positional $ operator to update array elements.

Query:
db.people.updateMany(
    { likes: "fashun" },           // Find documents with "fashun"
    { $set: { "likes.$": "fashion" } }  // Replace matched element
)

Step-by-step:
1. updateMany() - Updates all matching documents (not just first)
2. { likes: "fashun" } - Filter for documents where likes contains "fashun"
3. $set - The update operator to set a value
4. "likes.$" - Positional operator refers to first matched array element
5. "fashion" - The new value to set

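Alternative using $addToSet and $pull (two steps, because MongoDB rejects an update in which two operators modify the same field; the corrected value moves to the end of the array):
// Add the correct value while "fashun" still identifies the affected documents
db.people.updateMany(
    { likes: "fashun" },
    { $addToSet: { likes: "fashion" } }
);
// Then remove the wrong value
db.people.updateMany(
    { likes: "fashun" },
    { $pull: { likes: "fashun" } }
);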

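The "first matched array element" behaviour of the positional `$` operator can be sketched in plain JavaScript (an emulation, not MongoDB itself). The helper `fixFirstMatch` and the document with `_id: 9` are hypothetical.

```js
// Hypothetical helper emulating { $set: { "likes.$": replacement } } with a
// filter of { likes: wrong }: only the FIRST matching element is replaced.
function fixFirstMatch(doc, wrong, replacement) {
  const i = doc.likes.indexOf(wrong);
  if (i === -1) return doc;          // the filter would not match this document
  const likes = [...doc.likes];      // copy so the input stays untouched
  likes[i] = replacement;
  return { ...doc, likes };
}

// Jane's likes from the sample data: the typo appears once and is fixed.
const jane = { _id: 2, likes: ["travel", "fashun", "reading"] };
console.log(fixFirstMatch(jane, "fashun", "fashion").likes);

// Hypothetical document with the typo twice: the second copy survives.
const dup = { _id: 9, likes: ["fashun", "golf", "fashun"] };
console.log(fixFirstMatch(dup, "fashun", "fashion").likes);
```

If an array could contain the typo more than once, MongoDB's filtered positional operator `$[<identifier>]` together with the `arrayFilters` option updates every matching element in one pass.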
</details>


In [None]:
%%mongodb
db.people.updateMany(
    {"likes": "fashun"},
    {"$set": {"likes.$": "fashion"}}
)

In [None]:
%%mongodb
// TODO: Write your query here


<details>
<summary>Click to reveal solution</summary>

```js
%%mongodb
db.people.find({"first_name": "Jane"})
```

</details>


## Q4(b)(iii) [8 marks]


<details>
<summary>Click to reveal solution</summary>

**Q4(b)(iii) SOLUTION**

RELATIONAL MODEL TABLES:

| Table          | Primary Key            | Foreign Keys               |
|----------------|------------------------|----------------------------|
| Person         | PersonId               | -                          |
| Interest       | InterestId             | -                          |
| PersonInterest | (PersonId, InterestId) | PersonId -> Person         |
|                |                        | InterestId -> Interest     |
| Business       | BusinessId             | PersonId -> Person         |
| Partner (opt)  | (BusinessId, PartnerId)| BusinessId -> Business     |
|                |                        | PartnerId -> Person        |

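As a rough illustration of how one nested document spreads across these tables, the plain-JavaScript sketch below flattens Tom's document into rows. The helper `toRows` is hypothetical, column names follow the `CREATE TABLE` statements in this solution, and reusing `_id` as `PersonId` is an assumption (the schema itself uses `AUTO_INCREMENT`). Dates are kept as strings for brevity.

```js
// Hypothetical helper: flattens one `people` document into rows for the
// Person, PersonInterest, and Business tables, assigning InterestIds as it goes.
function toRows(doc, interestIds = new Map()) {
  const person = {
    PersonId: doc._id, FirstName: doc.first_name,
    Email: doc.email, Cell: doc.cell ?? null,
  };
  const personInterest = (doc.likes ?? []).map(name => {
    // Reuse the InterestId if this interest name was already seen.
    if (!interestIds.has(name)) interestIds.set(name, interestIds.size + 1);
    return { PersonId: doc._id, InterestId: interestIds.get(name) };
  });
  const business = (doc.businesses ?? []).map(b => ({
    PersonId: doc._id, BusinessName: b.name,
    PartnerName: b.partner ?? null,  // missing optional fields become NULL
    Status: b.status ?? null,
    DateFounded: b.date_founded,
  }));
  return { person, personInterest, business, interestIds };
}

const tom = {
  _id: 1, first_name: "Tom", email: "tom@example.com", cell: "765-555-5555",
  likes: ["fashion", "spas", "shopping"],
  businesses: [
    { name: "Entertainment 1080", partner: "Jean", status: "Bankrupt", date_founded: "2012-05-19" },
    { name: "Swag for Tweens", date_founded: "2012-11-01" },
  ],
};

const rows = toRows(tom);
console.log(rows.person);
console.log(rows.personInterest);
console.log(rows.business);
```

The `likes` array becomes one `PersonInterest` row per element, and each embedded business becomes a `Business` row that points back to the person.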
</details>


In [None]:
%%sql
-- Q4(b)(iii) SOLUTION - CREATE TABLE statements
DROP TABLE IF EXISTS PersonInterest;
DROP TABLE IF EXISTS Business;
DROP TABLE IF EXISTS Interest;
DROP TABLE IF EXISTS Person;

-- 1. Person table
CREATE TABLE Person (
    PersonId INT PRIMARY KEY AUTO_INCREMENT,
    FirstName VARCHAR(100),
    Email VARCHAR(255) UNIQUE,
    Cell VARCHAR(20)
);

-- 2. Interest table (lookup for likes)
CREATE TABLE Interest (
    InterestId INT PRIMARY KEY AUTO_INCREMENT,
    InterestName VARCHAR(50) NOT NULL UNIQUE
);

-- 3. PersonInterest junction table
CREATE TABLE PersonInterest (
    PersonId INT,
    InterestId INT,
    PRIMARY KEY (PersonId, InterestId),
    FOREIGN KEY (PersonId) REFERENCES Person(PersonId) ON DELETE CASCADE,
    FOREIGN KEY (InterestId) REFERENCES Interest(InterestId)
);

-- 4. Business table
CREATE TABLE Business (
    BusinessId INT PRIMARY KEY AUTO_INCREMENT,
    PersonId INT NOT NULL,
    BusinessName VARCHAR(200) NOT NULL,
    PartnerName VARCHAR(100),  -- Simple string for partner
    Status VARCHAR(50),
    DateFounded DATE,
    FOREIGN KEY (PersonId) REFERENCES Person(PersonId) ON DELETE CASCADE
);

SELECT 'Tables created!' AS Status;