<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/september-2023/notebook-september-2023-solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Environment Setup

Run these cells first to set up MySQL, MongoDB, and RDF tools.

In [None]:
# === MySQL Setup ===
!apt-get update -qq > /dev/null
!apt-get install -y -qq mysql-server > /dev/null
!service mysql start
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost';"

# === SQL Magic ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0
%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

# === SPARQL Magic (cellspell) ===
!pip install "cellspell[sparql] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.sparql

In [None]:
# === MongoDB Setup ===
!wget -q http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb > /dev/null 2>&1
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add - > /dev/null 2>&1
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list > /dev/null
!apt-get update -qq > /dev/null
!apt-get install -y -qq mongodb-org > /dev/null
!mkdir -p /data/db
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

!pip install -q pymongo

!mongo --quiet --eval 'print("MongoDB ready!")'

# === MongoDB Magic (cellspell) ===
!pip install "cellspell[mongodb] @ git+https://github.com/sreent/jupyter-query-magics.git" -q
%load_ext cellspell.mongodb
%mongodb mongodb://localhost:27017/hathi_trust

---

# Question 1: Linked Data (RDF + SPARQL) [30 marks]

## Setup: Load Sample RDF Data

In [None]:
%%writefile babelnet_sample.ttl
@prefix bn: <http://babelnet.org/rdf/> .
@prefix lemon: <http://www.lemon-model.net/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .

bn:post_n_EN a lemon:LexicalEntry ;
    lemon:canonicalForm bn:post_n_EN_form ;
    lemon:language "EN" ;
    lexinfo:partOfSpeech lexinfo:noun .

bn:post_n_EN_form lemon:writtenRep "post" .

bn:run_v_EN a lemon:LexicalEntry ;
    lemon:canonicalForm bn:run_v_EN_form ;
    lemon:language "EN" ;
    lexinfo:partOfSpeech lexinfo:verb .

bn:run_v_EN_form lemon:writtenRep "run" .

bn:house_n_EN a lemon:LexicalEntry ;
    lemon:canonicalForm bn:house_n_EN_form ;
    lemon:language "EN" ;
    lexinfo:partOfSpeech lexinfo:noun .

bn:house_n_EN_form lemon:writtenRep "house" .

## Q1(c)(i) [6 marks]

### Solution

In [None]:
%%sparql --file babelnet_sample.ttl
PREFIX lemon:   <http://www.lemon-model.net/lemon#>
PREFIX lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#>

SELECT ?writtenRep ?lang
WHERE {
  ?lexEntry a lemon:LexicalEntry ;
            lemon:canonicalForm ?form ;
            lemon:language ?lang ;
            lexinfo:partOfSpeech lexinfo:noun .

  ?form lemon:writtenRep ?writtenRep .
}
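
To see what the graph pattern is doing, the same join can be sketched in plain Python. This is an illustration only: the triples are hand-copied from `babelnet_sample.ttl` as string tuples, and the helper names `objects` and `noun_written_reps` are hypothetical, not part of any RDF library.

```python
# Hand-copied triples from babelnet_sample.ttl as (subject, predicate, object).
# Prefixed names are kept as plain strings; this is not a real RDF parser.
triples = [
    ("bn:post_n_EN", "rdf:type", "lemon:LexicalEntry"),
    ("bn:post_n_EN", "lemon:canonicalForm", "bn:post_n_EN_form"),
    ("bn:post_n_EN", "lemon:language", "EN"),
    ("bn:post_n_EN", "lexinfo:partOfSpeech", "lexinfo:noun"),
    ("bn:post_n_EN_form", "lemon:writtenRep", "post"),
    ("bn:run_v_EN", "rdf:type", "lemon:LexicalEntry"),
    ("bn:run_v_EN", "lemon:canonicalForm", "bn:run_v_EN_form"),
    ("bn:run_v_EN", "lemon:language", "EN"),
    ("bn:run_v_EN", "lexinfo:partOfSpeech", "lexinfo:verb"),
    ("bn:run_v_EN_form", "lemon:writtenRep", "run"),
    ("bn:house_n_EN", "rdf:type", "lemon:LexicalEntry"),
    ("bn:house_n_EN", "lemon:canonicalForm", "bn:house_n_EN_form"),
    ("bn:house_n_EN", "lemon:language", "EN"),
    ("bn:house_n_EN", "lexinfo:partOfSpeech", "lexinfo:noun"),
    ("bn:house_n_EN_form", "lemon:writtenRep", "house"),
]


def objects(subject, predicate):
    """Return every object of triples matching (subject, predicate, ?o)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def noun_written_reps():
    """Mimic the query: noun entries joined to their canonical form's writtenRep."""
    rows = []
    entries = [s for s, p, o in triples
               if p == "rdf:type" and o == "lemon:LexicalEntry"]
    for entry in entries:
        # Fixed object in the pattern: partOfSpeech must be lexinfo:noun.
        if "lexinfo:noun" not in objects(entry, "lexinfo:partOfSpeech"):
            continue
        # ?form is the shared variable that joins the two blocks of the pattern.
        for form in objects(entry, "lemon:canonicalForm"):
            for lang in objects(entry, "lemon:language"):
                for written_rep in objects(form, "lemon:writtenRep"):
                    rows.append((written_rep, lang))
    return rows


rows = noun_written_reps()
for written_rep, lang in rows:
    print(written_rep, lang)
```

On the sample data only the two noun entries ("post" and "house") survive the part-of-speech check; the verb "run" is filtered out.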

In [None]:
%%sparql --file babelnet_sample.ttl
# Try your own query here


## Q1(c)(ii) [4 marks]

### Solution

In [None]:
%%sparql --file babelnet_sample.ttl
PREFIX lemon:   <http://www.lemon-model.net/lemon#>
PREFIX lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#>

SELECT ?language ?pos
WHERE {
  ?lexEntry a lemon:LexicalEntry ;
            lemon:canonicalForm ?form ;
            lemon:language ?language ;
            lexinfo:partOfSpeech ?pos .

  ?form lemon:writtenRep "post" .
}

In [None]:
%%sparql --file babelnet_sample.ttl
# Try your own query here


## Q1(d) [7 marks total]

### Solution

**Q1(d) SOLUTION**

(i) Role of the document:
ONTOLOGY/SCHEMA DEFINITION
- Defines classes (LexicalSense, SenseDefinition)
- Defines properties (definition, value)
- Structures the Lemon lexical model

(ii) Format:
TURTLE (or RDF/XML depending on content negotiation)

(iii) OWL prefix refers to:
WEB ONTOLOGY LANGUAGE (OWL)
- Namespace: http://www.w3.org/2002/07/owl#
- Provides expressive ontology constructs
- e.g., owl:Class, owl:ObjectProperty, owl:disjointWith

(iv) Definition triples for English noun "post":

```turtle
@prefix lemon: <http://www.lemon-model.net/lemon#> .
@prefix bn: <http://babelnet.org/rdf/> .
@prefix ex: <http://example.org/> .

# Link the LexicalEntry to a LexicalSense
bn:post_n_EN lemon:sense bn:post_n_EN_sense .

# Define the LexicalSense and link to SenseDefinition
bn:post_n_EN_sense a lemon:LexicalSense ;
    lemon:definition ex:post_n_EN_def .

# Provide the actual definition text
ex:post_n_EN_def a lemon:SenseDefinition ;
    lemon:value "A piece of wood or metal set upright to support something."@en .
```

---

# Question 2: ER Question - Estate Agency [30 marks]

## Q2(a) [3 marks]

### Solution

**Q2(a) SOLUTION**

CARDINALITY INDICATIONS:

| Relationship           | Cardinality | Explanation                          |
|------------------------|-------------|--------------------------------------|
| Seller - Property      | 1:M         | One seller owns many properties      |
| Estate Agent - Property| 1:M         | One agent handles many properties    |
| Property - Offers      | 1:M         | One property can have many offers    |
| Property - Views       | 1:M         | One property can have many viewings  |
| Buyer - Offers         | 1:M         | One buyer can make many offers       |
| Buyer - Views          | 1:M         | One buyer can have many viewings     |

## Q2(d) [3 marks]

### Solution

In [None]:
%%sql
-- Q2(d) SOLUTION - Create the tables
DROP TABLE IF EXISTS Views;
DROP TABLE IF EXISTS Offers;
DROP TABLE IF EXISTS Property;
DROP TABLE IF EXISTS Seller;
DROP TABLE IF EXISTS EstateAgent;
DROP TABLE IF EXISTS Buyer;

CREATE TABLE Seller (
    Name VARCHAR(100) PRIMARY KEY,
    Address VARCHAR(200),
    PhoneNumber VARCHAR(50)
);

CREATE TABLE EstateAgent (
    Name VARCHAR(100) PRIMARY KEY
);

CREATE TABLE Buyer (
    Name VARCHAR(100) PRIMARY KEY,
    Address VARCHAR(200),
    PhoneNumber VARCHAR(50)
);

CREATE TABLE Property (
    Address VARCHAR(200) PRIMARY KEY,
    Type VARCHAR(50),
    Bedrooms INT,
    AskingPrice DECIMAL(12, 2),
    SellerName VARCHAR(100) NOT NULL,
    AgentName VARCHAR(100) NOT NULL,
    FOREIGN KEY (SellerName) REFERENCES Seller(Name),
    FOREIGN KEY (AgentName) REFERENCES EstateAgent(Name)
);

CREATE TABLE Offers (
    PropertyAddress VARCHAR(200),
    BuyerName VARCHAR(100),
    OfferDate DATE,
    OfferStatus VARCHAR(50),
    OfferValue DECIMAL(12, 2),
    PRIMARY KEY (PropertyAddress, BuyerName, OfferDate),
    FOREIGN KEY (PropertyAddress) REFERENCES Property(Address),
    FOREIGN KEY (BuyerName) REFERENCES Buyer(Name)
);

CREATE TABLE Views (
    PropertyAddress VARCHAR(200),
    BuyerName VARCHAR(100),
    ViewDate DATE,
    PRIMARY KEY (PropertyAddress, BuyerName, ViewDate),
    FOREIGN KEY (PropertyAddress) REFERENCES Property(Address),
    FOREIGN KEY (BuyerName) REFERENCES Buyer(Name)
);

SELECT 'Tables created!' AS Status;

In [None]:
%%sql
-- Insert sample data
INSERT INTO Seller VALUES ('Alice Seller', '1 Seller St', '555-111');
INSERT INTO Seller VALUES ('Bob Seller', '2 Seller Rd', '555-222');
INSERT INTO EstateAgent VALUES ('AgentGrace');
INSERT INTO EstateAgent VALUES ('AgentHeidi');
INSERT INTO Buyer VALUES ('Charlie Buyer', '99 Buyer Rd', '555-333');
INSERT INTO Buyer VALUES ('Doris Buyer', '100 Buyer Ln', '555-444');
INSERT INTO Property VALUES ('10 Main Street', 'Flat', 2, 250000, 'Alice Seller', 'AgentGrace');
INSERT INTO Property VALUES ('20 Baker Avenue', 'Terraced House', 3, 350000, 'Bob Seller', 'AgentHeidi');
INSERT INTO Offers VALUES ('10 Main Street', 'Charlie Buyer', '2023-01-05', 'sale completed', 240000);
INSERT INTO Offers VALUES ('10 Main Street', 'Doris Buyer', '2023-01-10', 'rejected', 230000);
INSERT INTO Offers VALUES ('20 Baker Avenue', 'Doris Buyer', '2023-02-01', 'sale completed', 340000);

SELECT 'Sample data inserted!' AS Status;

## Q2(e)(i) [6 marks]

### Solution

In [None]:
%%sql
-- Q2(e)(i) SOLUTION: Commission per agent since Jan 2023
SELECT
    p.AgentName AS EstateAgent,
    SUM(o.OfferValue * 0.01) AS TotalCommission
FROM Property p
INNER JOIN Offers o ON p.Address = o.PropertyAddress
WHERE o.OfferStatus = 'sale completed'
  AND o.OfferDate >= '2023-01-01'
GROUP BY p.AgentName;
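
The aggregation can also be sanity-checked outside MySQL. The sketch below is an assumption-laden stand-in, not part of the exam answer: it uses Python's built-in `sqlite3` with a cut-down schema (only the columns the query touches) and the sample rows inserted earlier, then runs the same `JOIN` and `GROUP BY`.

```python
import sqlite3

# In-memory SQLite database standing in for exam_db (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Property (Address TEXT PRIMARY KEY, AgentName TEXT NOT NULL);
    CREATE TABLE Offers (
        PropertyAddress TEXT,
        BuyerName TEXT,
        OfferDate TEXT,      -- ISO dates compare correctly as strings
        OfferStatus TEXT,
        OfferValue REAL
    );
    INSERT INTO Property VALUES
        ('10 Main Street', 'AgentGrace'),
        ('20 Baker Avenue', 'AgentHeidi');
    INSERT INTO Offers VALUES
        ('10 Main Street', 'Charlie Buyer', '2023-01-05', 'sale completed', 240000),
        ('10 Main Street', 'Doris Buyer', '2023-01-10', 'rejected', 230000),
        ('20 Baker Avenue', 'Doris Buyer', '2023-02-01', 'sale completed', 340000);
""")

# Same query as the solution cell: 1% commission on completed sales since 2023-01-01.
query = """
    SELECT p.AgentName, SUM(o.OfferValue * 0.01) AS TotalCommission
    FROM Property p
    INNER JOIN Offers o ON p.Address = o.PropertyAddress
    WHERE o.OfferStatus = 'sale completed'
      AND o.OfferDate >= '2023-01-01'
    GROUP BY p.AgentName
"""
commission = dict(conn.execute(query).fetchall())
conn.close()

for agent, total in sorted(commission.items()):
    print(f"{agent}: {total:,.2f}")
```

With the sample rows, the rejected offer is excluded, so each agent's total is 1% of their single completed sale: 2,400 for AgentGrace and 3,400 for AgentHeidi.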

In [None]:
%%sql
-- Try your own query here


## Q2(e)(ii) [2 marks]

### Solution

In [None]:
%%sql
-- Q2(e)(ii) SOLUTION: Top earning agent
-- Note: LIMIT 1 shows a single agent even if two agents tie for first place
SELECT
    p.AgentName AS EstateAgent,
    SUM(o.OfferValue * 0.01) AS TotalCommission
FROM Property p
INNER JOIN Offers o ON p.Address = o.PropertyAddress
WHERE o.OfferStatus = 'sale completed'
  AND o.OfferDate >= '2023-01-01'
GROUP BY p.AgentName
ORDER BY TotalCommission DESC
LIMIT 1;

In [None]:
%%sql
-- Try your own query here


---

# Question 3: IR/Document DB - Hathi Trust [30 marks]

## Q3(a) [2 marks]

### Solution

In [None]:
# Q3(a) SOLUTION
listed_as_german = 2_200_000
precision = 0.80

true_positives = listed_as_german * precision

print(f"Listed as German: {listed_as_german:,}")
print(f"Precision: {precision}")
print()
print("True Positives = Listed × Precision")
print(f"              = {listed_as_german:,} × {precision}")
print(f"              = {true_positives:,.0f} books are actually German")

## Q3(b) [3 marks]

### Solution

In [None]:
# Q3(b) SOLUTION
recall = 0.88

# Recall = True Positives / All Actual German
# Therefore: All Actual German = True Positives / Recall
total_german = true_positives / recall

print(f"True Positives: {true_positives:,.0f}")
print(f"Recall: {recall}")
print()
print("All German Books = True Positives / Recall")
print(f"                 = {true_positives:,.0f} / {recall}")
print(f"                 = {total_german:,.0f} books in the collection are German")
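
Taken together, the answers to Q3(a) and Q3(b) also fix the two error counts. The sketch below restates the same figures with exact fractions to avoid floating-point noise; `false_positives` and `false_negatives` are names introduced here for illustration and are not asked for in the question.

```python
from fractions import Fraction

listed_as_german = 2_200_000
precision = Fraction(80, 100)
recall = Fraction(88, 100)

# Q3(a): books listed as German that really are German.
true_positives = listed_as_german * precision
# Q3(b): all German books in the collection.
total_german = true_positives / recall

# Listed as German but not German.
false_positives = listed_as_german - true_positives
# German but not listed as German.
false_negatives = total_german - true_positives

print(f"True positives:  {int(true_positives):,}")
print(f"All German:      {int(total_german):,}")
print(f"False positives: {int(false_positives):,}")
print(f"False negatives: {int(false_negatives):,}")
```

The check `true_positives / (true_positives + false_negatives) == recall` holds by construction, which is a quick way to confirm the rearranged recall formula.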

## Q3(d) [2 marks]

### Solution

**Q3(d) SOLUTION**

**F1 MEASURE:**

F1 = Harmonic mean of Precision and Recall

Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)

Properties:
- Ranges from 0 to 1 (higher is better)
- Balances precision and recall into single metric
- Penalizes extreme imbalance between P and R
- F1 = 1 only when both P and R are perfect
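
To make the "penalizes imbalance" property concrete, here is a small sketch comparing F1 with the plain arithmetic mean. The precision/recall pairs are made-up illustrative values, not figures from the question, and `f1_score` is a helper defined here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative pairs: balanced, imbalanced, and degenerate.
for p, r in [(0.5, 0.5), (0.9, 0.1), (1.0, 0.0)]:
    arithmetic_mean = (p + r) / 2
    print(f"P={p:.1f} R={r:.1f}  mean={arithmetic_mean:.2f}  F1={f1_score(p, r):.2f}")
```

All three pairs have the same arithmetic mean, but F1 drops as the gap between precision and recall widens, reaching 0 when either value is 0.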

In [None]:
# Q3(d) Calculate F1 for German and Danish
p_german, r_german = 0.80, 0.88
p_danish, r_danish = 1.00, 0.76

f1_german = 2 * (p_german * r_german) / (p_german + r_german)
f1_danish = 2 * (p_danish * r_danish) / (p_danish + r_danish)

print(f"German F1 = 2 × ({p_german:.2f} × {r_german:.2f}) / ({p_german:.2f} + {r_german:.2f}) = {f1_german:.3f}")
print(f"Danish F1 = 2 × ({p_danish:.2f} × {r_danish:.2f}) / ({p_danish:.2f} + {r_danish:.2f}) = {f1_danish:.3f}")

## Q3(f) [5 marks]

### Solution

In [None]:
%%mongodb
db.books.drop()
db.books.insertMany([
    {"title": "Book1", "lang": "German", "year": 1850, "text": "Ein Wort Strudel..."},
    {"title": "Book2", "lang": "German", "year": 1905, "text": "Keine Erwähnung"},
    {"title": "Book3", "lang": "English", "year": 1845, "text": "Something about strudel."},
    {"title": "Book4", "lang": "German", "year": 1830, "text": "No mention of desserts"},
    {"title": "Book5", "lang": "German", "year": 1880, "text": "STRUDEL mania!"}
])

In [None]:
%%mongodb
db.books.find({"lang": "German", "year": {"$gte": 1800, "$lt": 1900}})

In [None]:
%%mongodb
// Try your own query here


## Q3(g) [2 marks]

### Solution

In [None]:
%%mongodb
db.books.find({"lang": "German", "year": {"$gte": 1800, "$lt": 1900}, "text": {"$regex": /Strudel/i}})
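
As a cross-check on what the Q3(f) and Q3(g) queries should return for the sample documents, the filters can be mirrored in plain Python. This is only a sketch: it copies the five sample documents into a list of dicts and uses `re.search` with `re.IGNORECASE` as a stand-in for MongoDB's `$regex` with the `i` option.

```python
import re

# The five sample documents inserted into db.books.
books = [
    {"title": "Book1", "lang": "German", "year": 1850, "text": "Ein Wort Strudel..."},
    {"title": "Book2", "lang": "German", "year": 1905, "text": "Keine Erwähnung"},
    {"title": "Book3", "lang": "English", "year": 1845, "text": "Something about strudel."},
    {"title": "Book4", "lang": "German", "year": 1830, "text": "No mention of desserts"},
    {"title": "Book5", "lang": "German", "year": 1880, "text": "STRUDEL mania!"},
]

# Q3(f): German books with 1800 <= year < 1900.
german_19th = [
    b for b in books
    if b["lang"] == "German" and 1800 <= b["year"] < 1900
]

# Q3(g): the same filter plus a case-insensitive match on "Strudel".
with_strudel = [
    b for b in german_19th
    if re.search("Strudel", b["text"], re.IGNORECASE)
]

print("Q3(f):", [b["title"] for b in german_19th])
print("Q3(g):", [b["title"] for b in with_strudel])
```

Book3 mentions strudel but is English, and Book2 is German but published in 1905, so neither should appear in either result.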

In [None]:
%%mongodb
// Try your own query here
