<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/mock-april-2021/notebook-mock-april-2021-solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 Mock April 2021 - Solutions Notebook

This notebook contains **complete solutions** for the Mock April 2021 exam.

**Exam Structure:**
- Section A: 10 MCQs (Q1a-j)
- Section B: Answer 2 of 3 questions
  - Q2: Doctor Who Database
  - Q3: XML/XSD/XSLT Cast List
  - Q4: Recipe Database

**Instructions:**
1. Run the Setup cells first
2. All solution cells are pre-filled with correct answers
3. Compare with your own attempts from the practice notebook

---

# 1. Environment Setup

Run these cells first to set up MySQL, MongoDB, xmllint, and SPARQL.

In [None]:
# === MySQL Setup ===
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# Create user and database
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost';"

# === xmllint Setup (for XML/XPath exercises) ===
!apt -y -qq install libxml2-utils xsltproc > /dev/null

# === Python libraries ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0 lxml sparqlwrapper
!pip install -q git+https://github.com/sreent/jupyter-query-magics.git

%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

%load_ext cellspell

print("MySQL ready!")
print("xmllint ready!")
print("xsltproc ready!")

In [None]:
# === MongoDB Setup ===
!wget -q http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb > /dev/null 2>&1
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add - > /dev/null 2>&1
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list > /dev/null
!apt-get update -qq > /dev/null
!apt-get install -y -qq mongodb-org > /dev/null
!mkdir -p /data/db
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

# Test MongoDB is running
!mongo --quiet --eval 'print("MongoDB ready!")'

In [None]:
# === SPARQL Setup (for RDF/Turtle queries) ===
from SPARQLWrapper import SPARQLWrapper, JSON
import re

def run_sparql(query, endpoint="https://dbpedia.org/sparql", limit=50):
    """Run a SPARQL query against DBpedia and print results."""
    sparql = SPARQLWrapper(endpoint)
    
    # Only add LIMIT if not already in query
    if not re.search(r'\bLIMIT\b', query, re.IGNORECASE):
        query = query + f"\nLIMIT {limit}"
    
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    
    # Print results dynamically based on SELECT variables
    # (named `variables` to avoid shadowing the built-in vars())
    variables = results["head"]["vars"]
    for result in results["results"]["bindings"]:
        row = [f"{var}: {result[var]['value']}" for var in variables if var in result]
        print("  ".join(row))
    
    return results

print("SPARQL ready!")

---

# Question 2: Doctor Who Database [30 marks]

## Background

Doctor Who was first broadcast in 1963, with William Hartnell playing what we now call the 'First Doctor'. From the start, the Doctor was accompanied by companions. Between 1963 and the present, there have been 13 numbered incarnations of the Doctor, and one extra incarnation called 'The War Doctor', along with dozens of companions.

---

## Q2(a): Design Tables [6 marks]

**Question:** We wish to design a simple database about this series, modelling Doctors, their companions, who played them and when they were broadcast. List the tables needed, indicating all keys.

### Solution

| Table | Primary Key | Foreign Keys |
|-------|-------------|---------------|
| **Actors** | Name | - |
| **Doctors** | Incarnation | PlayedBy → Actors(Name) |
| **Companions** | Name | PlayedBy → Actors(Name) |
| **DoctorCompanion** | (Doctor, Companion) | Doctor → Doctors, Companion → Companions |

In [None]:
%%sql
-- Q2(a) SOLUTION: Create the Doctor Who database schema

DROP TABLE IF EXISTS DoctorCompanion;
DROP TABLE IF EXISTS Companions;
DROP TABLE IF EXISTS Doctors;
DROP TABLE IF EXISTS Actors;

-- Actors table (people who play Doctors and Companions)
CREATE TABLE Actors (
    Name VARCHAR(100) PRIMARY KEY
);

-- Doctors table (incarnations of the Doctor)
CREATE TABLE Doctors (
    Incarnation VARCHAR(50) PRIMARY KEY,
    PlayedBy VARCHAR(100),
    PeriodStart DATE,
    PeriodEnd DATE,
    FOREIGN KEY (PlayedBy) REFERENCES Actors(Name)
);

-- Companions table
CREATE TABLE Companions (
    Name VARCHAR(100) PRIMARY KEY,
    PlayedBy VARCHAR(100),
    FOREIGN KEY (PlayedBy) REFERENCES Actors(Name)
);

-- Junction table for Doctor-Companion (M:N relationship)
CREATE TABLE DoctorCompanion (
    Doctor VARCHAR(50),
    Companion VARCHAR(100),
    PRIMARY KEY (Doctor, Companion),
    FOREIGN KEY (Doctor) REFERENCES Doctors(Incarnation),
    FOREIGN KEY (Companion) REFERENCES Companions(Name)
);

SELECT 'Doctor Who schema created!' AS Status;

In [None]:
%%sql
-- Insert sample data

-- Actors
INSERT INTO Actors (Name) VALUES
    ('William Hartnell'),
    ('Patrick Troughton'),
    ('Jon Pertwee'),
    ('Tom Baker'),
    ('Peter Davison'),
    ('Colin Baker'),
    ('David Tennant'),
    ('Matt Smith'),
    ('Carole Ann Ford'),
    ('Louise Jameson'),
    ('Nicola Bryant'),
    ('Karen Gillan');

-- Doctors (incarnations)
INSERT INTO Doctors (Incarnation, PlayedBy, PeriodStart, PeriodEnd) VALUES
    ('First Doctor', 'William Hartnell', '1963-11-23', '1966-10-29'),
    ('Second Doctor', 'Patrick Troughton', '1966-10-29', '1969-06-21'),
    ('Third Doctor', 'Jon Pertwee', '1970-01-03', '1974-06-08'),
    ('Fourth Doctor', 'Tom Baker', '1974-06-08', '1981-03-21'),
    ('Fifth Doctor', 'Peter Davison', '1981-03-21', '1984-03-16'),
    ('Sixth Doctor', 'Colin Baker', '1984-03-16', '1986-12-06'),
    ('Tenth Doctor', 'David Tennant', '2005-06-18', '2010-01-01'),
    ('Eleventh Doctor', 'Matt Smith', '2010-04-03', '2013-12-25');

-- Companions
INSERT INTO Companions (Name, PlayedBy) VALUES
    ('Susan Foreman', 'Carole Ann Ford'),
    ('Leela', 'Louise Jameson'),
    ('Peri', 'Nicola Bryant'),
    ('Amy Pond', 'Karen Gillan');

-- Doctor-Companion relationships
INSERT INTO DoctorCompanion (Doctor, Companion) VALUES
    ('First Doctor', 'Susan Foreman'),
    ('Fourth Doctor', 'Leela'),
    ('Fifth Doctor', 'Peri'),
    ('Sixth Doctor', 'Peri'),
    ('Eleventh Doctor', 'Amy Pond');

SELECT 'Sample data inserted!' AS Status;

## Q2(b): CREATE TABLE [3 marks]

**Question:** Give a MySQL command for creating ONE of these tables.

### Solution

In [None]:
# Q2(b) SOLUTION: Example CREATE TABLE statement
print("""CREATE TABLE Doctors (
    Incarnation VARCHAR(50) PRIMARY KEY,
    PlayedBy VARCHAR(100),
    PeriodStart DATE,
    PeriodEnd DATE,
    FOREIGN KEY (PlayedBy) REFERENCES Actors(Name)
);""")

## Q2(c): Is it 2NF? [3 marks]

**Question:** Are your tables in 2NF? How can you tell?

### Solution

In [None]:
# Q2(c) SOLUTION
print("""Yes, the tables are in 2NF.

To be in 2NF:
1. Must be in 1NF (atomic values, no repeating groups) - YES
2. No partial dependencies (non-key attributes depend on WHOLE key) - YES

Verification:
- Tables with single-column PKs (Actors, Doctors, Companions) are automatically in 2NF
  because partial dependencies are impossible with a single-column key.
- DoctorCompanion has composite PK (Doctor, Companion) but NO non-key attributes,
  so there can't be any partial dependencies.
""")

## Q2(d)(i): Who played the Doctor whose companion was Amy Pond? [2 marks]

### Solution

In [None]:
%%sql
-- Q2(d)(i) SOLUTION: Who played the Doctor whose companion was Amy Pond?

SELECT D.PlayedBy
FROM Doctors D
INNER JOIN DoctorCompanion DC ON D.Incarnation = DC.Doctor
WHERE DC.Companion = 'Amy Pond';

## Q2(d)(ii): Was Peri featured before Leela? [4 marks]

### Solution

In [None]:
%%sql
-- Q2(d)(ii) SOLUTION: Was Peri featured before Leela?
-- Compare the earliest PeriodStart dates of the Doctors associated with each companion

SELECT C.Name AS Companion, MIN(D.PeriodStart) AS FirstAppearance
FROM Companions C
INNER JOIN DoctorCompanion DC ON C.Name = DC.Companion
INNER JOIN Doctors D ON DC.Doctor = D.Incarnation
WHERE C.Name IN ('Peri', 'Leela')
GROUP BY C.Name
ORDER BY FirstAppearance;

-- Result: No. Leela (Fourth Doctor, from 1974) appears BEFORE Peri (Fifth Doctor, from 1981)

## Q2(d)(iii): Which incarnation had the most companions? [3 marks]

### Solution

In [None]:
%%sql
-- Q2(d)(iii) SOLUTION: Which incarnation had the most companions?

SELECT DC.Doctor AS Incarnation, COUNT(*) AS CompanionCount
FROM DoctorCompanion DC
GROUP BY DC.Doctor
ORDER BY CompanionCount DESC
LIMIT 1;
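
One caveat with `LIMIT 1` is that it returns a single row even when several incarnations tie for the highest count. As a rough sketch in plain Python (the pairs are copied from the sample `DoctorCompanion` rows inserted earlier; the variable names are illustrative), the tie can be made visible like this:

```python
from collections import Counter

# (Doctor, Companion) pairs copied from the sample data
pairs = [
    ("First Doctor", "Susan Foreman"),
    ("Fourth Doctor", "Leela"),
    ("Fifth Doctor", "Peri"),
    ("Sixth Doctor", "Peri"),
    ("Eleventh Doctor", "Amy Pond"),
]

# Equivalent of GROUP BY Doctor with COUNT(*)
counts = Counter(doctor for doctor, _ in pairs)

# Keep every incarnation that reaches the maximum, not just one of them
top = max(counts.values())
leaders = sorted(doctor for doctor, n in counts.items() if n == top)

print(top, leaders)
```

With the small sample data every incarnation has exactly one companion, so all five tie; on the full data set the query would be more informative.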

## Q2(e): RDF/Turtle Analysis [9 marks]

Here is an extract from the DBpedia entry for the First Doctor:

```turtle
dbr:First_Doctor rdfs:label "First Doctor"@en ;
                dbp:periodEnd "1966-10-29"^^xsd:date ;
                dbp:periodStart "1963-11-23"^^xsd:date ;
                dbp:companions "Ben Jackson"@en ,
                              "Vicki"@en ,
                              "Sara Kingdom"@en ,
                              "Steven Taylor"@en ,
                              "Susan Foreman"@en ,
                              "Polly"@en ,
                              "Ian Chesterton"@en ,
                              "Barbara Wright"@en ,
                              "Dodo Chaplet"@en ,
                              "Katarina"@en ;
                dct:subject dbc:Doctor_Who_Doctors ;
                dbp:next dbr:Second_Doctor .
```

### Solution

In [None]:
# Q2(e)(i) SOLUTION: What serialization language is this?
print("Answer: Turtle (Terse RDF Triple Language)")
print("")

# Q2(e)(ii) SOLUTION: How many triples are encoded here?
print("""Number of triples: 14

Counting:
- rdfs:label → 1 triple
- dbp:periodEnd → 1 triple
- dbp:periodStart → 1 triple
- dbp:companions → 10 triples (one per companion name)
- dct:subject → 1 triple
- dbp:next → 1 triple
Total: 14 triples""")
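
The count can be double-checked without an RDF library. This is a small sketch that transcribes the extract by hand into a predicate-to-objects mapping (the dictionary layout is an assumption made for illustration); every `;` in Turtle starts a new predicate for the same subject and every `,` adds another object, so each object is one triple.

```python
# Predicate -> list of objects for the single subject dbr:First_Doctor
first_doctor = {
    "rdfs:label": ['"First Doctor"@en'],
    "dbp:periodEnd": ['"1966-10-29"^^xsd:date'],
    "dbp:periodStart": ['"1963-11-23"^^xsd:date'],
    "dbp:companions": [
        "Ben Jackson", "Vicki", "Sara Kingdom", "Steven Taylor", "Susan Foreman",
        "Polly", "Ian Chesterton", "Barbara Wright", "Dodo Chaplet", "Katarina",
    ],
    "dct:subject": ["dbc:Doctor_Who_Doctors"],
    "dbp:next": ["dbr:Second_Doctor"],
}

# One triple per (subject, predicate, object) combination
triple_count = sum(len(objects) for objects in first_doctor.values())
print(triple_count)
```

Summing the per-predicate counts (1 + 1 + 1 + 10 + 1 + 1) gives 15 triples.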

In [None]:
# Q2(e)(iii) SOLUTION: What can your database schema do that this approach can't?
print("""The database schema can:

1. Track who PLAYED each companion (actor information)
   - The RDF only lists companion names as literal strings
   - Our schema links companions to actors via foreign key

2. Query which actor played the Doctor
   - The RDF extract doesn't include actor information

3. Link companions to their actors as separate entities
   - RDF uses string literals, not linked entities with properties
""")

In [None]:
# Q2(e)(iv) SOLUTION: How would you fix that problem?
print("""Fix by using URIs instead of string literals for companions:

dbr:First_Doctor dbp:companions dbr:Susan_Foreman ,
                                dbr:Ian_Chesterton ,
                                dbr:Barbara_Wright .

dbr:Susan_Foreman a dbr:Companion ;
    rdfs:label "Susan Foreman"@en ;
    dbr:playedBy dbr:Carole_Ann_Ford .

dbr:Carole_Ann_Ford a foaf:Person ;
    rdfs:label "Carole Ann Ford"@en .

Key changes:
1. Use resource URIs (dbr:Susan_Foreman) instead of literals ("Susan Foreman"@en)
2. Create separate resources for each companion
3. Add dbr:playedBy property linking to actor resources
""")

In [None]:
# Q2(e)(v) SOLUTION: Fix the SPARQL query
print("""Original (broken) query:

SELECT dbr:doctor
WHERE {
  dbr:First_Doctor dbp:next+ ?doctor .
  dbr:doctor dbp:companion "Leela" .
}

---

FIXED query:

SELECT ?doctor
WHERE {
  dbr:First_Doctor dbp:next* ?doctor .
  ?doctor dbp:companions "Leela"@en .
}

---

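Fixes made:
1. SELECT ?doctor (a variable needs the ? prefix; dbr:doctor is an IRI, not a variable)
2. ?doctor in WHERE (the same variable must be used in both triple patterns)
3. dbp:companions (the property name used in the data, not dbp:companion)
4. "Leela"@en (add the language tag so the literal matches the data)
5. dbp:next* instead of + (zero or more steps, so the First Doctor is tested too;
   + would start matching from the Second Doctor)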
""")

---

# Question 3: XML Cast List [30 marks]

## Q3(a): XML Fragment Analysis

### Solution

In [None]:
%%writefile castlist.xml
<castList xmlns="http://www.tei-c.org/ns/1.0">
  <castGroup>
    <castGroup>
      <head>four lovers</head>
      <castItem xml:id="Hermia_MND">
        <role>
          <name>Hermia</name>
        </role>
      </castItem>
      <castItem xml:id="Lysander_MND">
        <role>
          <name>Lysander</name>
        </role>
      </castItem>
      <castItem xml:id="Helena_MND">
        <role>
          <name>Helena</name>
        </role>
      </castItem>
      <castItem xml:id="Demetrius_MND">
        <role>
          <name>Demetrius</name>
        </role>
      </castItem>
    </castGroup>
  </castGroup>
  <castGroup>
    <castItem xml:id="Theseus_MND">
      <role>
        <name>Theseus</name>
      </role>
      <roleDesc>duke of Athens</roleDesc>
    </castItem>
  </castGroup>
</castList>

In [None]:
# Q3(a) SOLUTION
print("""Q3(a)(i): The original fragment was unbalanced - what was missing?
Answer: The closing </castList> tag was missing.

Q3(a)(ii): What format is this?
Answer: XML (Extensible Markup Language), specifically using the TEI
(Text Encoding Initiative) namespace: http://www.tei-c.org/ns/1.0

Q3(a)(iii): What attributes are used in this fragment?
Answer:
1. xmlns - namespace declaration
2. xml:id - unique identifier for elements
""")

## Q3(b): XSD Schema Analysis

### Solution

In [None]:
# Q3(b) SOLUTION
print("""Q3(b)(i): What is this XSD file and what does it do?
Answer: This is an XSD (XML Schema Definition) file.

Purpose:
- Defines the structure and constraints for valid XML documents
- Specifies which elements can contain which children
- Defines attribute types and requirements
- Enables validation of XML against the schema

---

Q3(b)(ii): Does missing model.global make document invalid?
Answer: No, the document is NOT invalid.

The XSD specifies:
<xs:choice minOccurs="0" maxOccurs="unbounded">

The minOccurs="0" means zero occurrences are valid.
The model.global and model.headLike elements are OPTIONAL.

---

Q3(b)(iii): Do castGroup elements follow the definition correctly?
Answer: Yes, the castGroup elements follow the definition correctly.

Verification:
1. First castGroup contains another castGroup (valid per xs:element ref)
2. Inner castGroup has <head> (model.headLike) then <castItem> elements (valid)
3. Second castGroup contains <castItem> with nested <roleDesc> (valid)
""")

## Q3(c): XSLT Transformation

### Solution

In [None]:
%%writefile castlist.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

<xsl:output method="html" indent="yes"/>

<xsl:template match="/">
  <html>
    <body>
      <h1>Cast List</h1>
      <dl>
        <xsl:apply-templates select="//tei:castItem"/>
      </dl>
    </body>
  </html>
</xsl:template>

<xsl:template match="tei:castItem">
  <div>
    <xsl:apply-templates select="node()|@*"/>
  </div>
</xsl:template>

<xsl:template match="tei:role/tei:name">
  <dt>
    <xsl:value-of select="."/>
  </dt>
</xsl:template>

<xsl:template match="tei:roleDesc">
  <dd>
    <xsl:value-of select="."/>
  </dd>
</xsl:template>

<!-- Suppress other text nodes and attribute values (the built-in rule would copy xml:id into the output) -->
<xsl:template match="text()|@*"/>

</xsl:stylesheet>

In [None]:
# Q3(c) SOLUTION
print("""Q3(c)(i): What is this file and what is it for?
Answer: This is an XSLT (Extensible Stylesheet Language Transformations) file.

Purpose:
- Transforms XML documents into other formats (HTML, text, different XML)
- Uses template matching to process XML nodes
- Here, it transforms TEI cast list into HTML (<div>, <dt>, <dd>)

---

Q3(c)(ii): What was missing from the original match attributes?
Answer: The NAMESPACE PREFIX was missing.

The XML uses namespace xmlns="http://www.tei-c.org/ns/1.0", so XSLT must:
1. Declare the namespace: xmlns:tei="http://www.tei-c.org/ns/1.0"
2. Use prefix in match: match="tei:castItem" instead of match="castItem"

---

Q3(c)(iii): Output format and content:
Format: HTML

Content: A definition list structure with:
- Role names in <dt> tags (Hermia, Lysander, Helena, Demetrius, Theseus)
- Role descriptions in <dd> tags (duke of Athens)
- Each castItem wrapped in <div>
""")

In [None]:
# Run the transformation to see the actual output
print("Transformation output:")
print("=" * 50)
!xsltproc castlist.xsl castlist.xml

## Q3(d): Relational Model for Cast Data [11 marks]

### Solution

In [None]:
%%sql
-- Q3(d)(i) SOLUTION: Create tables for cast information

DROP TABLE IF EXISTS Roles;
DROP TABLE IF EXISTS CastGroups;
DROP TABLE IF EXISTS Plays;

CREATE TABLE Plays (
    PlayId VARCHAR(50) PRIMARY KEY,
    Title VARCHAR(200)
);

CREATE TABLE CastGroups (
    GroupId VARCHAR(50) PRIMARY KEY,
    GroupName VARCHAR(100),
    PlayId VARCHAR(50),
    ParentGroupId VARCHAR(50),
    FOREIGN KEY (PlayId) REFERENCES Plays(PlayId),
    FOREIGN KEY (ParentGroupId) REFERENCES CastGroups(GroupId)
);

CREATE TABLE Roles (
    RoleId VARCHAR(50) PRIMARY KEY,
    Name VARCHAR(100),
    Description VARCHAR(500),
    GroupId VARCHAR(50),
    FOREIGN KEY (GroupId) REFERENCES CastGroups(GroupId)
);

-- Insert sample data
INSERT INTO Plays VALUES ('MND', 'A Midsummer Night''s Dream');

INSERT INTO CastGroups VALUES 
    ('MND_main', NULL, 'MND', NULL),
    ('MND_lovers', 'four lovers', 'MND', 'MND_main'),
    ('MND_royalty', NULL, 'MND', NULL);  -- top-level group in the XML, so it has no parent

INSERT INTO Roles VALUES
    ('Hermia_MND', 'Hermia', NULL, 'MND_lovers'),
    ('Lysander_MND', 'Lysander', NULL, 'MND_lovers'),
    ('Helena_MND', 'Helena', NULL, 'MND_lovers'),
    ('Demetrius_MND', 'Demetrius', NULL, 'MND_lovers'),
    ('Theseus_MND', 'Theseus', 'duke of Athens', 'MND_royalty');

SELECT 'Cast tables created!' AS Status;

In [None]:
%%sql
-- View the roles with their groups
SELECT R.Name, R.Description, CG.GroupName
FROM Roles R
LEFT JOIN CastGroups CG ON R.GroupId = CG.GroupId
ORDER BY CG.GroupName, R.Name;

In [None]:
# Q3(d)(ii) SOLUTION: Which works better - relational or XML?
print("""Answer: XML works better for this specific use case.

Reasons XML is better:
1. Hierarchical nature - Cast lists are naturally nested (groups within groups)
2. Optional content - <roleDesc> appears only where needed, with no NULL-filled columns
3. Document-centric - This is essentially a document, not transactional data
4. Existing standards - TEI is an established standard for theatrical texts
5. Transformation - XSLT easily converts to HTML for display

Where relational would be better:
- Large-scale querying across many plays
- Transactional updates (actor assignments, scheduling)
- Joining with other data (actors, performance dates, venues)
- Statistical analysis across multiple plays
""")

---

# Question 4: Recipe Database [30 marks]

## E/R Diagram from Exam

```
RecipeBook ──Contains──> Recipe ──uses──> Ingredient
    │                      │
  Author                preparedBy
  Title                    │
                           ▼
                    PreparationStep
                         │
                       Action
```

## Q4(a): Missing Elements [4 marks]

### Solution

In [None]:
# Q4(a) SOLUTION
print("""Missing elements from the E/R model:

1. Quantity for ingredients
   - The "uses" relationship needs quantity and unit attributes
   - Example: "2 cups flour", "500g butter"

2. Step ordering
   - PreparationSteps need a sequence number
   - Otherwise, steps could be in any order

3. Recipe name
   - The Recipe entity needs a Name attribute
   - Currently only has relationships, no identifying data

4. Unique identifiers
   - Need primary keys beyond just names
   - RecipeId, IngredientId, etc. for proper foreign key relationships
""")

## Q4(b): Cardinalities [3 marks]

### Solution

In [None]:
# Q4(b) SOLUTION
print("""Cardinalities of the three relationships:

| Relationship | Cardinality | Explanation |
|--------------|-------------|-------------|
| Contains     | M:N         | A book has many recipes; a recipe can appear in multiple books |
| Uses         | M:N         | A recipe uses many ingredients; an ingredient is used in many recipes |
| PreparedBy   | 1:M         | A recipe has many steps; a step belongs to one recipe |

Note: M:N relationships require junction tables in the relational model.
""")

## Q4(c): CREATE TABLE Commands [10 marks]

### Solution

In [None]:
%%sql
-- Q4(c) SOLUTION: Create Recipe database tables

DROP TABLE IF EXISTS RecipeIngredients;
DROP TABLE IF EXISTS BookContains;
DROP TABLE IF EXISTS PreparationSteps;
DROP TABLE IF EXISTS Recipes;
DROP TABLE IF EXISTS Ingredients;
DROP TABLE IF EXISTS RecipeBooks;

-- Recipe Books
CREATE TABLE RecipeBooks (
    BookId INT AUTO_INCREMENT PRIMARY KEY,
    Title VARCHAR(200) NOT NULL,
    Author VARCHAR(100)
);

-- Recipes
CREATE TABLE Recipes (
    RecipeId INT AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(200) NOT NULL
);

-- Ingredients
CREATE TABLE Ingredients (
    IngredientId INT AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

-- Junction: Books contain Recipes (M:N)
CREATE TABLE BookContains (
    BookId INT,
    RecipeId INT,
    PRIMARY KEY (BookId, RecipeId),
    FOREIGN KEY (BookId) REFERENCES RecipeBooks(BookId),
    FOREIGN KEY (RecipeId) REFERENCES Recipes(RecipeId)
);

-- Junction: Recipes use Ingredients (M:N with quantity)
CREATE TABLE RecipeIngredients (
    RecipeId INT,
    IngredientId INT,
    Quantity DECIMAL(10,2),
    Unit VARCHAR(50),
    PRIMARY KEY (RecipeId, IngredientId),
    FOREIGN KEY (RecipeId) REFERENCES Recipes(RecipeId),
    FOREIGN KEY (IngredientId) REFERENCES Ingredients(IngredientId)
);

-- Preparation Steps (1:M from Recipe)
CREATE TABLE PreparationSteps (
    StepId INT AUTO_INCREMENT PRIMARY KEY,
    RecipeId INT NOT NULL,
    StepNumber INT NOT NULL,
    Action TEXT NOT NULL,
    FOREIGN KEY (RecipeId) REFERENCES Recipes(RecipeId),
    UNIQUE (RecipeId, StepNumber)
);

SELECT 'Recipe database schema created!' AS Status;

In [None]:
%%sql
-- Insert sample data for testing queries

INSERT INTO RecipeBooks (Title, Author) VALUES
    ('Mushrooms', 'John Cage'),
    ('Italian Classics', 'Julia Child'),
    ('Vegan Delights', 'Anonymous');

INSERT INTO Recipes (Name) VALUES
    ('Mushroom Risotto'),
    ('Spaghetti Carbonara'),
    ('Mushroom Soup'),
    ('Vegan Curry');

INSERT INTO Ingredients (Name) VALUES
    ('mushrooms'),
    ('butter'),
    ('rice'),
    ('pasta'),
    ('eggs'),
    ('cream'),
    ('vegetables');

-- Book 1 (Mushrooms) contains recipes 1 and 3
-- Book 2 (Italian Classics) contains recipe 2
-- Book 3 (Vegan Delights) contains recipe 4
INSERT INTO BookContains VALUES
    (1, 1), (1, 3),
    (2, 2),
    (3, 4);

-- Recipe ingredients (butter is in recipes 1 and 2, not in 3 or 4)
INSERT INTO RecipeIngredients VALUES
    (1, 1, 200, 'g'),   -- Risotto: mushrooms
    (1, 2, 50, 'g'),    -- Risotto: butter
    (1, 3, 300, 'g'),   -- Risotto: rice
    (2, 4, 400, 'g'),   -- Carbonara: pasta
    (2, 5, 4, 'pieces'),-- Carbonara: eggs
    (2, 2, 100, 'g'),   -- Carbonara: butter
    (3, 1, 500, 'g'),   -- Soup: mushrooms
    (3, 6, 200, 'ml'),  -- Soup: cream
    (4, 7, 500, 'g');   -- Curry: vegetables

-- Preparation steps
INSERT INTO PreparationSteps (RecipeId, StepNumber, Action) VALUES
    (1, 1, 'Slice mushrooms'),
    (1, 2, 'Saute in butter'),
    (1, 3, 'Add rice and stock'),
    (2, 1, 'Boil pasta'),
    (2, 2, 'Mix eggs and cheese'),
    (2, 3, 'Combine and serve'),
    (3, 1, 'Chop mushrooms'),
    (3, 2, 'Simmer with cream'),
    (4, 1, 'Chop vegetables'),
    (4, 2, 'Cook with spices');

SELECT 'Sample data inserted!' AS Status;

## Q4(d): Find recipe books that never use butter [4 marks]

### Solution

In [None]:
%%sql
-- Q4(d) SOLUTION: Find recipe books that never use butter
-- Method 1: Using NOT IN

SELECT RB.Title, RB.Author
FROM RecipeBooks RB
WHERE RB.BookId NOT IN (
    SELECT DISTINCT BC.BookId
    FROM BookContains BC
    INNER JOIN RecipeIngredients RI ON BC.RecipeId = RI.RecipeId
    INNER JOIN Ingredients I ON RI.IngredientId = I.IngredientId
    WHERE I.Name = 'butter'
);

In [None]:
%%sql
-- Q4(d) ALTERNATIVE SOLUTION: Using LEFT JOIN with IS NULL

SELECT DISTINCT RB.Title, RB.Author
FROM RecipeBooks RB
LEFT JOIN (
    SELECT DISTINCT BC.BookId
    FROM BookContains BC
    INNER JOIN RecipeIngredients RI ON BC.RecipeId = RI.RecipeId
    INNER JOIN Ingredients I ON RI.IngredientId = I.IngredientId
    WHERE I.Name = 'butter'
) ButterBooks ON RB.BookId = ButterBooks.BookId
WHERE ButterBooks.BookId IS NULL;
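
A third formulation uses `NOT EXISTS`. The sketch below is self-contained and runs against an in-memory SQLite database standing in for MySQL (an assumption made so that it needs no server); the tables are cut down to the columns the query touches and the rows are copied from the sample data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RecipeBooks (BookId INTEGER PRIMARY KEY, Title TEXT, Author TEXT);
CREATE TABLE Ingredients (IngredientId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE BookContains (BookId INTEGER, RecipeId INTEGER);
CREATE TABLE RecipeIngredients (RecipeId INTEGER, IngredientId INTEGER);

INSERT INTO RecipeBooks VALUES
    (1, 'Mushrooms', 'John Cage'),
    (2, 'Italian Classics', 'Julia Child'),
    (3, 'Vegan Delights', 'Anonymous');
INSERT INTO Ingredients VALUES
    (1, 'mushrooms'), (2, 'butter'), (3, 'rice'), (4, 'pasta'),
    (5, 'eggs'), (6, 'cream'), (7, 'vegetables');
INSERT INTO BookContains VALUES (1, 1), (1, 3), (2, 2), (3, 4);
INSERT INTO RecipeIngredients VALUES
    (1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, 2), (3, 1), (3, 6), (4, 7);
""")

# Keep a book only if no recipe in it uses an ingredient named 'butter'
not_exists_sql = """
SELECT RB.Title
FROM RecipeBooks RB
WHERE NOT EXISTS (
    SELECT 1
    FROM BookContains BC
    JOIN RecipeIngredients RI ON BC.RecipeId = RI.RecipeId
    JOIN Ingredients I ON RI.IngredientId = I.IngredientId
    WHERE BC.BookId = RB.BookId AND I.Name = 'butter'
)
"""
butter_free = [row[0] for row in conn.execute(not_exists_sql)]
print(butter_free)
```

`NOT EXISTS` is often preferred over `NOT IN` when the subquery column could contain NULLs; here `BookId` is part of a primary key in the notebook's schema, so all three forms should agree.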

## Q4(e): Average steps in "Mushrooms" by "John Cage" [4 marks]

### Solution

In [None]:
%%sql
-- Q4(e) SOLUTION: Average number of steps in "Mushrooms" by "John Cage"
-- Method 1: Using subquery to count steps per recipe, then average

SELECT AVG(StepCount) AS AvgSteps
FROM (
    SELECT R.RecipeId, COUNT(PS.StepId) AS StepCount
    FROM RecipeBooks RB
    INNER JOIN BookContains BC ON RB.BookId = BC.BookId
    INNER JOIN Recipes R ON BC.RecipeId = R.RecipeId
    INNER JOIN PreparationSteps PS ON R.RecipeId = PS.RecipeId
    WHERE RB.Title = 'Mushrooms' AND RB.Author = 'John Cage'
    GROUP BY R.RecipeId
) AS RecipeStepCounts;

In [None]:
%%sql
-- Q4(e) ALTERNATIVE: Direct calculation
-- Total steps / Number of distinct recipes

SELECT 
    COUNT(PS.StepId) AS TotalSteps,
    COUNT(DISTINCT R.RecipeId) AS NumRecipes,
    COUNT(PS.StepId) / COUNT(DISTINCT R.RecipeId) AS AvgSteps
FROM RecipeBooks RB
INNER JOIN BookContains BC ON RB.BookId = BC.BookId
INNER JOIN Recipes R ON BC.RecipeId = R.RecipeId
INNER JOIN PreparationSteps PS ON R.RecipeId = PS.RecipeId
WHERE RB.Title = 'Mushrooms' AND RB.Author = 'John Cage';
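
Both methods can be verified by hand. In this sketch the `(RecipeId, StepNumber)` pairs are copied from the sample steps of the two recipes in "Mushrooms" (recipes 1 and 3); the variable names are illustrative.

```python
from collections import Counter

# (RecipeId, StepNumber) rows that survive the joins for the book "Mushrooms"
rows = [(1, 1), (1, 2), (1, 3), (3, 1), (3, 2)]

# Method 1: count steps per recipe, then average the counts
steps_per_recipe = Counter(recipe_id for recipe_id, _ in rows)
avg_of_counts = sum(steps_per_recipe.values()) / len(steps_per_recipe)

# Method 2: total steps divided by the number of distinct recipes
total_over_distinct = len(rows) / len({recipe_id for recipe_id, _ in rows})

print(avg_of_counts, total_over_distinct)
```

The two methods agree because both divide the same total by the same number of recipes. Note that the `INNER JOIN` on `PreparationSteps` drops any recipe that has no steps, so such recipes would not pull the average down in either version.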

## Q4(f): Alternative Technology Analysis [5 marks]

### Solution

In [None]:
# Q4(f) SOLUTION: MongoDB Analysis
print("""Would MongoDB be more suitable for this recipe database?

ANSWER: Partially yes, depending on use case.

ADVANTAGES of MongoDB for Recipes:
┌────────────────────┬──────────────────────────────────────────┐
│ Aspect             │ Benefit                                  │
├────────────────────┼──────────────────────────────────────────┤
│ Document structure │ Recipe as single document with embedded  │
│                    │ steps and ingredients                    │
│ Flexible schema    │ Different recipes can have different     │
│                    │ fields (cooking time, difficulty, etc.)  │
│ Nested arrays      │ Ingredients and steps naturally fit as   │
│                    │ arrays within the recipe document        │
│ Read performance   │ Single document retrieval (no JOINs)     │
└────────────────────┴──────────────────────────────────────────┘

Example MongoDB Document:
{
  "_id": "carbonara",
  "name": "Spaghetti Carbonara",
  "ingredients": [
    {"name": "spaghetti", "quantity": 400, "unit": "g"},
    {"name": "guanciale", "quantity": 200, "unit": "g"}
  ],
  "steps": [
    {"order": 1, "action": "Boil pasta"},
    {"order": 2, "action": "Fry guanciale"}
  ],
  "books": ["Italian Classics", "Quick Dinners"]
}

DISADVANTAGES of MongoDB:
┌─────────────────────────┬────────────────────────────────────┐
│ Aspect                  │ Problem                            │
├─────────────────────────┼────────────────────────────────────┤
│ Cross-collection queries│ "All books with butter" requires   │
│                         │ scanning all documents             │
│ Data duplication        │ Ingredient info repeated across    │
│                         │ recipes                            │
│ Referential integrity   │ No enforced foreign keys           │
│ Update anomalies        │ Changing ingredient name requires  │
│                         │ updating many documents            │
└─────────────────────────┴────────────────────────────────────┘

CONCLUSION:
- MongoDB: Best if recipes are primarily read as whole documents
- Relational: Best for complex cross-recipe analytics and data integrity
""")

---

# Section A: MCQ Solutions

Complete solutions for all 10 MCQs.

In [None]:
print("""SECTION A: MCQ SOLUTIONS
========================

Q1(a) Normalisation - Primary reason for reducing duplication
Answer: ii. Duplicate information can get out of sync, leading to logical inconsistencies

Q1(b) JOIN Types - Get NULL for anonymous books
Answer: i. LEFT JOIN

Q1(c) Transactions - Replace XXXXXXX after START TRANSACTION
Answer: iv. COMMIT;

Q1(d) MongoDB regex /*man/ with year 1934
Answer: v. Finds all books with title and author ending in man with a year of 1934

Q1(e) URLs in Linked Data - Select ALL correct
Answer: i, ii, iv
- i. URL is unique like Primary Key ✓
- ii. URL is shareable for same reference ✓
- iv. URLs can be dereferenced ✓
- (v is FALSE - URLs are NOT permanent/reliable)

Q1(f) XPath //note/title
Answer: i. The four title elements that appear as direct child nodes of note elements

Q1(g) MapReduce - Select ALL correct
Answer: i, iii, v, vi, viii
- i. Map phase on local data ✓
- iii. Map produces key-distributable data ✓
- v. Input data can be distributed ✓
- vi. Reducer operations can be distributed ✓
- viii. Makes parallel processing easier ✓

Q1(h) RDF Inference - Select ALL that MUST be true
Answer: iii, v
- iii. Event is mo:Performance (from rdfs:range) ✓
- v. Event mo:listener orcid:... (from owl:inverseOf) ✓

Q1(i) Precision/Recall - 20% precision, 225,030 relevant in 15M docs
Answer: ii, iv
- ii. 80 docs → 16 correct matches (80 × 0.20 = 16) ✓
- iv. Over 10× better than random (20%/1.5% ≈ 13×) ✓

Q1(j) Copyleft - Select ALL correct
Answer: ii, iv, v
- ii. Copyleft is NOT permissive ✓
- iv. Derivatives must be copyleft ✓
- v. GPL is copyleft ✓
""")

---

# End of Solutions Notebook

All solutions have been provided. Compare with your attempts in the practice notebook!