<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/mock-april-2021/notebook-mock-april-2021.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 Mock April 2021 - Practice Notebook

This notebook provides hands-on practice for the Mock April 2021 exam.

**Exam Structure:**
- Section A: 10 MCQs (Q1a-j)
- Section B: Answer 2 of 3 questions
  - Q2: Doctor Who Database
  - Q3: XML/XSD/XSLT Cast List
  - Q4: Recipe Database

**Instructions:**
1. Run the Setup cells first
2. Write your answers in the empty code cells
3. Check your answers against the solution sheet

---

# 1. Environment Setup

Run these cells first to set up MySQL, MongoDB, xmllint, and SPARQL.

In [None]:
# === MySQL Setup ===
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# Create user and database
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost';"

# === xmllint Setup (for XML/XPath exercises) ===
!apt -y -qq install libxml2-utils xsltproc > /dev/null

# === Python libraries ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0 lxml sparqlwrapper

%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

print("MySQL ready!")
print("xmllint ready!")
print("xsltproc ready!")

In [None]:
# === MongoDB Setup ===
!wget -q http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb > /dev/null 2>&1
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add - > /dev/null 2>&1
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list > /dev/null
!apt-get update -qq > /dev/null
!apt-get install -y -qq mongodb-org > /dev/null
!mkdir -p /data/db
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

# Test MongoDB is running
!mongo --quiet --eval 'print("MongoDB ready!")'

In [None]:
# === SPARQL Setup (for RDF/Turtle queries) ===
from SPARQLWrapper import SPARQLWrapper, JSON
import re

def run_sparql(query, endpoint="https://dbpedia.org/sparql", limit=50):
    """Run a SPARQL query against an endpoint (DBpedia by default) and print results."""
    sparql = SPARQLWrapper(endpoint)
    
    # Only add LIMIT if not already in query
    if not re.search(r'\bLIMIT\b', query, re.IGNORECASE):
        query = query + f"\nLIMIT {limit}"
    
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    
    # Print results dynamically based on SELECT variables
    # (named `variables` to avoid shadowing the built-in vars())
    variables = results["head"]["vars"]
    for result in results["results"]["bindings"]:
        row = [f"{var}: {result[var]['value']}" for var in variables if var in result]
        print("  ".join(row))
    
    return results

print("SPARQL ready!")

---

# Question 2: Doctor Who Database [30 marks]

## Background

Doctor Who was first broadcast in 1963, with William Hartnell playing what we now call the 'First Doctor'. From the start, the Doctor was accompanied by companions. Between 1963 and the present, there have been 13 numbered incarnations of the Doctor, and one extra incarnation called 'The War Doctor', along with dozens of companions.

---

## Q2(a): Design Tables [6 marks]

**Question:** We wish to design a simple database about this series, modelling Doctors, their companions, who played them and when they were broadcast. List the tables needed, indicating all keys.

In [None]:
%%sql
-- Create the Doctor Who database schema
-- Q2(a): List your tables with keys

DROP TABLE IF EXISTS DoctorCompanion;
DROP TABLE IF EXISTS Companions;
DROP TABLE IF EXISTS Doctors;
DROP TABLE IF EXISTS Actors;

-- Actors table (people who play Doctors and Companions)
CREATE TABLE Actors (
    Name VARCHAR(100) PRIMARY KEY
);

-- Doctors table (incarnations of the Doctor)
CREATE TABLE Doctors (
    Incarnation VARCHAR(50) PRIMARY KEY,
    PlayedBy VARCHAR(100),
    PeriodStart DATE,
    PeriodEnd DATE,
    FOREIGN KEY (PlayedBy) REFERENCES Actors(Name)
);

-- Companions table
CREATE TABLE Companions (
    Name VARCHAR(100) PRIMARY KEY,
    PlayedBy VARCHAR(100),
    FOREIGN KEY (PlayedBy) REFERENCES Actors(Name)
);

-- Junction table for Doctor-Companion (M:N relationship)
CREATE TABLE DoctorCompanion (
    Doctor VARCHAR(50),
    Companion VARCHAR(100),
    PRIMARY KEY (Doctor, Companion),
    FOREIGN KEY (Doctor) REFERENCES Doctors(Incarnation),
    FOREIGN KEY (Companion) REFERENCES Companions(Name)
);

SELECT 'Doctor Who schema created!' AS Status;

## Sample Data: Doctor Who Tables

### Actors
| Name |
|------|
| William Hartnell |
| Patrick Troughton |
| Jon Pertwee |
| Tom Baker |
| Peter Davison |
| Colin Baker |
| David Tennant |
| Matt Smith |
| Carole Ann Ford |
| Louise Jameson |
| Nicola Bryant |
| Karen Gillan |

### Doctors
| Incarnation | PlayedBy | PeriodStart | PeriodEnd |
|-------------|----------|-------------|------------|
| First Doctor | William Hartnell | 1963-11-23 | 1966-10-29 |
| Second Doctor | Patrick Troughton | 1966-10-29 | 1969-06-21 |
| Third Doctor | Jon Pertwee | 1970-01-03 | 1974-06-08 |
| Fourth Doctor | Tom Baker | 1974-06-08 | 1981-03-21 |
| Fifth Doctor | Peter Davison | 1981-03-21 | 1984-03-16 |
| Sixth Doctor | Colin Baker | 1984-03-16 | 1986-12-06 |
| Tenth Doctor | David Tennant | 2005-06-18 | 2010-01-01 |
| Eleventh Doctor | Matt Smith | 2010-04-03 | 2013-12-25 |

### Companions
| Name | PlayedBy |
|------|----------|
| Susan Foreman | Carole Ann Ford |
| Leela | Louise Jameson |
| Peri | Nicola Bryant |
| Amy Pond | Karen Gillan |

### DoctorCompanion (Junction Table)
| Doctor | Companion |
|--------|-----------|
| First Doctor | Susan Foreman |
| Fourth Doctor | Leela |
| Fifth Doctor | Peri |
| Sixth Doctor | Peri |
| Eleventh Doctor | Amy Pond |

In [None]:
%%sql
-- Insert sample data

-- Actors
INSERT INTO Actors (Name) VALUES
    ('William Hartnell'),
    ('Patrick Troughton'),
    ('Jon Pertwee'),
    ('Tom Baker'),
    ('Peter Davison'),
    ('Colin Baker'),
    ('David Tennant'),
    ('Matt Smith'),
    ('Carole Ann Ford'),
    ('Louise Jameson'),
    ('Nicola Bryant'),
    ('Karen Gillan');

-- Doctors (incarnations)
INSERT INTO Doctors (Incarnation, PlayedBy, PeriodStart, PeriodEnd) VALUES
    ('First Doctor', 'William Hartnell', '1963-11-23', '1966-10-29'),
    ('Second Doctor', 'Patrick Troughton', '1966-10-29', '1969-06-21'),
    ('Third Doctor', 'Jon Pertwee', '1970-01-03', '1974-06-08'),
    ('Fourth Doctor', 'Tom Baker', '1974-06-08', '1981-03-21'),
    ('Fifth Doctor', 'Peter Davison', '1981-03-21', '1984-03-16'),
    ('Sixth Doctor', 'Colin Baker', '1984-03-16', '1986-12-06'),
    ('Tenth Doctor', 'David Tennant', '2005-06-18', '2010-01-01'),
    ('Eleventh Doctor', 'Matt Smith', '2010-04-03', '2013-12-25');

-- Companions
INSERT INTO Companions (Name, PlayedBy) VALUES
    ('Susan Foreman', 'Carole Ann Ford'),
    ('Leela', 'Louise Jameson'),
    ('Peri', 'Nicola Bryant'),
    ('Amy Pond', 'Karen Gillan');

-- Doctor-Companion relationships
INSERT INTO DoctorCompanion (Doctor, Companion) VALUES
    ('First Doctor', 'Susan Foreman'),
    ('Fourth Doctor', 'Leela'),
    ('Fifth Doctor', 'Peri'),
    ('Sixth Doctor', 'Peri'),
    ('Eleventh Doctor', 'Amy Pond');

SELECT 'Sample data inserted!' AS Status;

## Q2(b): CREATE TABLE [3 marks]

**Question:** Give a MySQL command for creating ONE of these tables.

In [None]:
# Your CREATE TABLE statement is shown above in the schema setup
# Example for Doctors table:
print("""
CREATE TABLE Doctors (
    Incarnation VARCHAR(50) PRIMARY KEY,
    PlayedBy VARCHAR(100),
    PeriodStart DATE,
    PeriodEnd DATE,
    FOREIGN KEY (PlayedBy) REFERENCES Actors(Name)
);
""")

## Q2(c): Is it 2NF? [3 marks]

**Question:** Are your tables in 2NF? How can you tell?

In [None]:
# Answer:
# Yes, the tables are in 2NF.
# 
# To be in 2NF:
# 1. Must be in 1NF (atomic values, no repeating groups) - YES
# 2. No partial dependencies (non-key attributes depend on WHOLE key) - YES
#
# Tables with single-column PKs (Actors, Doctors, Companions) are automatically in 2NF.
# DoctorCompanion has composite PK (Doctor, Companion) but NO non-key attributes,
# so there can't be any partial dependencies.

print("Tables are in 2NF because:")
print("1. All values are atomic (1NF satisfied)")
print("2. Single-column PKs can't have partial dependencies")
print("3. Junction table has no non-key attributes")
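
The partial-dependency test described above can be sketched in plain Python. The helper name `partial_dependencies` and the extra `CompanionActor` column in the counter-example are hypothetical, introduced only for illustration; the functional dependencies are written out by hand rather than inferred from the data.

```python
def partial_dependencies(key, dependencies):
    """Return the dependencies that violate 2NF for the given key.

    key          -- set of column names forming the primary key
    dependencies -- list of (determinant_columns, dependent_column) pairs
    A dependency is partial when its determinant is a proper subset of the
    key and the dependent column is not itself part of the key.
    """
    key = set(key)
    return [
        (sorted(determinant), dependent)
        for determinant, dependent in dependencies
        if set(determinant) < key and dependent not in key
    ]

# DoctorCompanion as designed: composite key, no non-key columns at all.
junction_ok = partial_dependencies({"Doctor", "Companion"}, [])

# Hypothetical variant: a CompanionActor column that depends on Companion alone.
junction_bad = partial_dependencies(
    {"Doctor", "Companion"},
    [({"Companion"}, "CompanionActor")],
)

# Single-column key: the determinant can never be a proper subset of it.
doctors_ok = partial_dependencies(
    {"Incarnation"},
    [({"Incarnation"}, "PlayedBy"), ({"Incarnation"}, "PeriodStart")],
)

print(junction_ok)
print(junction_bad)
print(doctors_ok)
```

Only the hypothetical variant reports a violation, which is why storing the actor in `Companions` rather than in the junction table keeps the design in 2NF.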

## Q2(d)(i): Who played the Doctor whose companion was Amy Pond? [2 marks]

In [None]:
%%sql
-- Write your query here:


## Q2(d)(ii): Was Peri featured before Leela? [4 marks]

In [None]:
%%sql
-- Write your query here:
-- Hint: Compare the PeriodStart dates of the Doctors associated with each companion


## Q2(d)(iii): Which incarnation had the most companions? [3 marks]

In [None]:
%%sql
-- Write your query here:
-- Hint: Use GROUP BY and COUNT


## Q2(e): RDF/Turtle Analysis [9 marks]

Here is an extract from the DBpedia entry for the First Doctor:

```turtle
dbr:First_Doctor rdfs:label "First Doctor"@en ;
                dbp:periodEnd "1966-10-29"^^xsd:date ;
                dbp:periodStart "1963-11-23"^^xsd:date ;
                dbp:companions "Ben Jackson"@en ,
                              "Vicki"@en ,
                              "Sara Kingdom"@en ,
                              "Steven Taylor"@en ,
                              "Susan Foreman"@en ,
                              "Polly"@en ,
                              "Ian Chesterton"@en ,
                              "Barbara Wright"@en ,
                              "Dodo Chaplet"@en ,
                              "Katarina"@en ;
                dct:subject dbc:Doctor_Who_Doctors ;
                dbp:next dbr:Second_Doctor .
```

In [None]:
# Q2(e)(i): What serialization language is this?
print("Answer: Turtle (Terse RDF Triple Language)")

# Q2(e)(ii): How many triples are encoded here?
# Count:
# - rdfs:label → 1
# - dbp:periodEnd → 1
# - dbp:periodStart → 1
# - dbp:companions → 10 (one per companion name)
# - dct:subject → 1
# - dbp:next → 1
# Total: 1 + 1 + 1 + 10 + 1 + 1 = 15 triples
print("\nNumber of triples: 15")
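
The per-predicate tally above can be checked mechanically. The sketch below transcribes the extract into a plain dictionary (the names `extract` and `triple_count` are illustrative, and no RDF library is assumed) and sums the objects, since each object listed after a predicate yields one triple.

```python
# Objects per predicate, copied from the Turtle extract for dbr:First_Doctor.
extract = {
    "rdfs:label": ['"First Doctor"@en'],
    "dbp:periodEnd": ['"1966-10-29"^^xsd:date'],
    "dbp:periodStart": ['"1963-11-23"^^xsd:date'],
    "dbp:companions": [
        "Ben Jackson", "Vicki", "Sara Kingdom", "Steven Taylor",
        "Susan Foreman", "Polly", "Ian Chesterton", "Barbara Wright",
        "Dodo Chaplet", "Katarina",
    ],
    "dct:subject": ["dbc:Doctor_Who_Doctors"],
    "dbp:next": ["dbr:Second_Doctor"],
}

# ';' repeats the subject and ',' repeats subject and predicate,
# so every listed object contributes exactly one triple.
triple_count = sum(len(objects) for objects in extract.values())
print(triple_count)
```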

In [None]:
# Q2(e)(iii): What can your database schema do that this RDF approach can't?
print("""The database schema can:
1. Track who PLAYED each companion (actor information)
2. Query which actor played the Doctor
3. Link companions to their actors as separate entities

The RDF extract only has companion names as literal strings,
not as linked entities with their own properties.""")

In [None]:
# Q2(e)(iv): How would you fix that problem?
print("""Fix by using URIs instead of string literals for companions:

dbr:First_Doctor dbp:companions dbr:Susan_Foreman ,
                                dbr:Ian_Chesterton ,
                                dbr:Barbara_Wright .

dbr:Susan_Foreman a dbr:Companion ;
    rdfs:label "Susan Foreman"@en ;
    dbr:playedBy dbr:Carole_Ann_Ford .
""")

In [None]:
# Q2(e)(v): Fix the SPARQL query
# Original (broken):
# SELECT dbr:doctor
# WHERE {
#   dbr:First_Doctor dbp:next+ ?doctor .
#   dbr:doctor dbp:companion "Leela" .
# }

print("""Fixed SPARQL query:

SELECT ?doctor
WHERE {
  dbr:First_Doctor dbp:next* ?doctor .
  ?doctor dbp:companions "Leela"@en .
}

Fixes made:
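1. SELECT ?doctor (a variable, not the prefixed name dbr:doctor)
2. ?doctor in WHERE (the same variable as in SELECT)
3. dbp:companions (the property name used in the extract)
4. "Leela"@en (language tag added to match the tagged literals)
5. dbp:next* instead of dbp:next+ (optional: also tests the First Doctor itself)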
""")
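
The broken query uses the property path `dbp:next+` and the fixed one uses `dbp:next*`. The difference is whether the starting node itself counts as a match. The sketch below imitates both paths over a small hand-written chain; the dictionary `next_doctor` and the helper `follow_next` are hypothetical stand-ins, not part of any SPARQL library.

```python
# dbp:next links between the first four incarnations (illustrative subset).
next_doctor = {
    "First_Doctor": "Second_Doctor",
    "Second_Doctor": "Third_Doctor",
    "Third_Doctor": "Fourth_Doctor",
}

def follow_next(start, include_start):
    """Walk the chain of next links from start.

    include_start=True imitates dbp:next* (zero or more hops);
    include_start=False imitates dbp:next+ (one or more hops).
    """
    reached = [start] if include_start else []
    current = start
    while current in next_doctor:
        current = next_doctor[current]
        reached.append(current)
    return reached

plus_matches = follow_next("First_Doctor", include_start=False)
star_matches = follow_next("First_Doctor", include_start=True)
print(plus_matches)
print(star_matches)
```

Both paths reach the Fourth Doctor, so either finds Leela's Doctor; only the `*` form would also find a companion of the First Doctor.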

---

# Question 3: XML Cast List [30 marks]

## Q3(a): XML Fragment Analysis

In [None]:
%%writefile castlist.xml
<castList xmlns="http://www.tei-c.org/ns/1.0">
  <castGroup>
    <castGroup>
      <head>four lovers</head>
      <castItem xml:id="Hermia_MND">
        <role>
          <name>Hermia</name>
        </role>
      </castItem>
      <castItem xml:id="Lysander_MND">
        <role>
          <name>Lysander</name>
        </role>
      </castItem>
      <castItem xml:id="Helena_MND">
        <role>
          <name>Helena</name>
        </role>
      </castItem>
      <castItem xml:id="Demetrius_MND">
        <role>
          <name>Demetrius</name>
        </role>
      </castItem>
    </castGroup>
  </castGroup>
  <castGroup>
    <castItem xml:id="Theseus_MND">
      <role>
        <name>Theseus</name>
      </role>
      <roleDesc>duke of Athens</roleDesc>
    </castItem>
  </castGroup>
</castList>

In [None]:
# View the XML
!cat castlist.xml

# Check if well-formed
!xmllint --noout castlist.xml && echo "XML is well-formed!"

In [None]:
# Q3(a)(i): The original fragment was unbalanced - what was missing?
print("Answer: The closing </castList> tag was missing.")

# Q3(a)(ii): What format is this?
print("\nFormat: XML (using TEI - Text Encoding Initiative namespace)")

# Q3(a)(iii): What attributes are used?
print("\nAttributes used:")
print("1. xmlns - namespace declaration")
print("2. xml:id - unique identifier")

## Q3(b): XSD Schema Analysis

The XSD extract defines the `castGroup` element with:
- Optional `model.global` or `model.headLike` elements (`minOccurs="0"`)
- One or more of: `castItem`, `castGroup`, or `roleDesc`

In [None]:
# Q3(b)(i): What is this XSD file and what does it do?
print("""Answer: This is an XSD (XML Schema Definition) file.

Purpose:
- Defines the structure and constraints for valid XML documents
- Specifies which elements can contain which children
- Defines attribute types and requirements
- Enables validation of XML against the schema
""")

# Q3(b)(ii): Does missing model.global make document invalid?
print("""Q3(b)(ii): No, the document is still valid.

The XSD specifies minOccurs="0" for model.global/model.headLike,
meaning zero occurrences are valid. The elements are OPTIONAL.
""")

# Q3(b)(iii): Do castGroup elements follow the definition?
print("""Q3(b)(iii): Yes, the castGroup elements follow the definition correctly.

Verification:
1. First castGroup contains another castGroup (valid)
2. Inner castGroup has <head> (model.headLike) then <castItem> elements (valid)
3. Second castGroup contains <castItem> with <roleDesc> (valid)
""")
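
The "one or more of `castItem`, `castGroup`, or `roleDesc`" rule can be spot-checked by hand. The sketch below is not XSD validation: it uses the standard library's ElementTree on a trimmed copy of the cast list and tests only that one rule. The helper `follows_rule` and the constant `REQUIRED` are illustrative names.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Trimmed copy of the cast list: same nesting, fewer castItem entries.
fragment = """
<castList xmlns="http://www.tei-c.org/ns/1.0">
  <castGroup>
    <castGroup>
      <head>four lovers</head>
      <castItem><role><name>Hermia</name></role></castItem>
    </castGroup>
  </castGroup>
  <castGroup>
    <castItem>
      <role><name>Theseus</name></role>
      <roleDesc>duke of Athens</roleDesc>
    </castItem>
  </castGroup>
</castList>
"""

REQUIRED = {TEI + "castItem", TEI + "castGroup", TEI + "roleDesc"}

def follows_rule(cast_group):
    """True when the group has at least one castItem, castGroup or roleDesc child."""
    return any(child.tag in REQUIRED for child in cast_group)

root = ET.fromstring(fragment)
# iter() visits castGroup elements in document order: outer, inner, second.
results = [follows_rule(group) for group in root.iter(TEI + "castGroup")]
print(results)
```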

## Q3(c): XSLT Transformation

In [None]:
%%writefile castlist.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

<xsl:output method="html" indent="yes"/>

<xsl:template match="/">
  <html>
    <body>
      <h1>Cast List</h1>
      <dl>
        <xsl:apply-templates select="//tei:castItem"/>
      </dl>
    </body>
  </html>
</xsl:template>

<xsl:template match="tei:castItem">
  <div>
    <xsl:apply-templates select="node()|@*"/>
  </div>
</xsl:template>

<xsl:template match="tei:role/tei:name">
  <dt>
    <xsl:value-of select="."/>
  </dt>
</xsl:template>

<xsl:template match="tei:roleDesc">
  <dd>
    <xsl:value-of select="."/>
  </dd>
</xsl:template>

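<!-- Suppress other text nodes and attribute values (e.g. xml:id),
     which the built-in rules would otherwise copy into the output -->
<xsl:template match="text()|@*"/>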

</xsl:stylesheet>

In [None]:
# Q3(c)(i): What is this file and what is it for?
print("""Answer: This is an XSLT (Extensible Stylesheet Language Transformations) file.

Purpose:
- Transforms XML documents into other formats (HTML, text, different XML)
- Uses template matching to process XML nodes
- Here, it transforms TEI cast list into HTML (<div>, <dt>, <dd>)
""")

# Q3(c)(ii): What was missing from the original?
print("""Q3(c)(ii): The NAMESPACE PREFIX was missing.

The XML uses namespace xmlns="http://www.tei-c.org/ns/1.0", so XSLT must:
1. Declare the namespace: xmlns:tei="http://www.tei-c.org/ns/1.0"
2. Use prefix in match: match="tei:castItem" instead of match="castItem"
""")
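
The effect of the missing prefix can be reproduced outside XSLT. The sketch below uses the standard library's ElementTree as a stand-in for XPath matching in `xsltproc`, which is an assumption made for illustration; the two-item document is a cut-down version of the cast list.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<castList xmlns="http://www.tei-c.org/ns/1.0">'
    '<castItem><role><name>Hermia</name></role></castItem>'
    '<castItem><role><name>Theseus</name></role></castItem>'
    '</castList>'
)
ns = {"tei": "http://www.tei-c.org/ns/1.0"}

# Without a prefix the path looks for castItem in no namespace: nothing matches.
unprefixed = doc.findall(".//castItem")

# With the prefix bound to the TEI namespace the same path finds both items.
prefixed = doc.findall(".//tei:castItem", ns)

print(len(unprefixed), len(prefixed))
```

The same principle applies to `match="castItem"` versus `match="tei:castItem"`: the default namespace of the source document does not carry over to unprefixed names in a path expression.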

In [None]:
# Q3(c)(iii): Run the transformation to see the output
!xsltproc castlist.xsl castlist.xml

In [None]:
print("""Q3(c)(iii): Output format and content:

Format: HTML

Content: A definition list structure with:
- Role names in <dt> tags (Hermia, Lysander, Helena, Demetrius, Theseus)
- Role descriptions in <dd> tags (duke of Athens)
- Each castItem wrapped in <div>
""")

## Q3(d): Relational Model for Cast Data [11 marks]

In [None]:
%%sql
-- Q3(d)(i): Create tables for cast information

DROP TABLE IF EXISTS Roles;
DROP TABLE IF EXISTS CastGroups;
DROP TABLE IF EXISTS Plays;

CREATE TABLE Plays (
    PlayId VARCHAR(50) PRIMARY KEY,
    Title VARCHAR(200)
);

CREATE TABLE CastGroups (
    GroupId VARCHAR(50) PRIMARY KEY,
    GroupName VARCHAR(100),
    PlayId VARCHAR(50),
    ParentGroupId VARCHAR(50),
    FOREIGN KEY (PlayId) REFERENCES Plays(PlayId),
    FOREIGN KEY (ParentGroupId) REFERENCES CastGroups(GroupId)
);

CREATE TABLE Roles (
    RoleId VARCHAR(50) PRIMARY KEY,
    Name VARCHAR(100),
    Description VARCHAR(500),
    GroupId VARCHAR(50),
    FOREIGN KEY (GroupId) REFERENCES CastGroups(GroupId)
);

-- Insert sample data
INSERT INTO Plays VALUES ('MND', 'A Midsummer Night''s Dream');

INSERT INTO CastGroups VALUES 
    ('MND_main', NULL, 'MND', NULL),
    ('MND_lovers', 'four lovers', 'MND', 'MND_main'),
    ('MND_royalty', NULL, 'MND', 'MND_main');

INSERT INTO Roles VALUES
    ('Hermia_MND', 'Hermia', NULL, 'MND_lovers'),
    ('Lysander_MND', 'Lysander', NULL, 'MND_lovers'),
    ('Helena_MND', 'Helena', NULL, 'MND_lovers'),
    ('Demetrius_MND', 'Demetrius', NULL, 'MND_lovers'),
    ('Theseus_MND', 'Theseus', 'duke of Athens', 'MND_royalty');

SELECT 'Cast tables created!' AS Status;

In [None]:
%%sql
-- View the roles
SELECT R.Name, R.Description, CG.GroupName
FROM Roles R
LEFT JOIN CastGroups CG ON R.GroupId = CG.GroupId
ORDER BY CG.GroupName, R.Name;

In [None]:
# Q3(d)(ii): Which works better - relational or XML?
print("""Answer: XML works better for this specific use case.

Reasons XML is better:
1. Hierarchical nature - Cast lists are naturally nested (groups within groups)
2. Optional content - A role may carry a description (roleDesc) or omit it, with no NULL columns
3. Document-centric - This is essentially a document, not transactional data
4. Existing standards - TEI is an established standard for encoding texts, including drama
5. Transformation - XSLT easily converts to HTML for display

Where relational would be better:
- Large-scale querying across many plays
- Transactional updates
- Joining with other data (actors, performance dates)
""")
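
To make the "hierarchical nature" point concrete: in the relational model, walking nested groups takes a recursive query, whereas XML expresses the nesting through element containment. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL (an assumption for portability) and a simplified `CastGroups` table without the `PlayId` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CastGroups (
    GroupId TEXT PRIMARY KEY,
    GroupName TEXT,
    ParentGroupId TEXT REFERENCES CastGroups(GroupId)
);
INSERT INTO CastGroups VALUES
    ('MND_main', NULL, NULL),
    ('MND_lovers', 'four lovers', 'MND_main'),
    ('MND_royalty', NULL, 'MND_main');
""")

# Start at the top-level group and follow parent/child links downwards,
# recording how deep each group sits in the hierarchy.
rows = conn.execute("""
WITH RECURSIVE tree(GroupId, Depth) AS (
    SELECT GroupId, 0 FROM CastGroups WHERE ParentGroupId IS NULL
    UNION ALL
    SELECT c.GroupId, tree.Depth + 1
    FROM CastGroups c JOIN tree ON c.ParentGroupId = tree.GroupId
)
SELECT GroupId, Depth FROM tree ORDER BY Depth, GroupId
""").fetchall()

print(rows)
conn.close()
```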

---

# Question 4: Recipe Database [30 marks]

## E/R Diagram from Exam

```
RecipeBook ──Contains──> Recipe ──uses──> Ingredient
    │                      │
  Author                preparedBy
  Title                    │
                           ▼
                    PreparationStep
                         │
                       Action
```

## Q4(a): Missing Elements [4 marks]

**Question:** Is there anything missing from this model?

In [None]:
print("""Missing elements:

1. Quantity for ingredients - "uses" needs amount (e.g., "2 cups flour")
2. Step ordering - PreparationSteps need a sequence number
3. Recipe name - The Recipe entity needs a Name attribute
4. Unique identifiers - Need PKs beyond just names (RecipeId, etc.)
""")

## Q4(b): Cardinalities [3 marks]

In [None]:
print("""Cardinalities:

| Relationship | Cardinality | Explanation |
|--------------|-------------|-------------|
| Contains     | M:N         | A book has many recipes; a recipe can appear in multiple books |
| Uses         | M:N         | A recipe uses many ingredients; an ingredient is used in many recipes |
| PreparedBy   | 1:M         | A recipe has many steps; a step belongs to one recipe |
""")

## Q4(c): CREATE TABLE Commands [10 marks]

In [None]:
%%sql
-- Drop tables in reverse dependency order
DROP TABLE IF EXISTS RecipeIngredients;
DROP TABLE IF EXISTS BookContains;
DROP TABLE IF EXISTS PreparationSteps;
DROP TABLE IF EXISTS Recipes;
DROP TABLE IF EXISTS Ingredients;
DROP TABLE IF EXISTS RecipeBooks;

-- Recipe Books
CREATE TABLE RecipeBooks (
    BookId INT AUTO_INCREMENT PRIMARY KEY,
    Title VARCHAR(200) NOT NULL,
    Author VARCHAR(100)
);

-- Recipes
CREATE TABLE Recipes (
    RecipeId INT AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(200) NOT NULL
);

-- Ingredients
CREATE TABLE Ingredients (
    IngredientId INT AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

-- Junction: Books contain Recipes (M:N)
CREATE TABLE BookContains (
    BookId INT,
    RecipeId INT,
    PRIMARY KEY (BookId, RecipeId),
    FOREIGN KEY (BookId) REFERENCES RecipeBooks(BookId),
    FOREIGN KEY (RecipeId) REFERENCES Recipes(RecipeId)
);

-- Junction: Recipes use Ingredients (M:N with quantity)
CREATE TABLE RecipeIngredients (
    RecipeId INT,
    IngredientId INT,
    Quantity DECIMAL(10,2),
    Unit VARCHAR(50),
    PRIMARY KEY (RecipeId, IngredientId),
    FOREIGN KEY (RecipeId) REFERENCES Recipes(RecipeId),
    FOREIGN KEY (IngredientId) REFERENCES Ingredients(IngredientId)
);

-- Preparation Steps (1:M from Recipe)
CREATE TABLE PreparationSteps (
    StepId INT AUTO_INCREMENT PRIMARY KEY,
    RecipeId INT NOT NULL,
    StepNumber INT NOT NULL,
    Action TEXT NOT NULL,
    FOREIGN KEY (RecipeId) REFERENCES Recipes(RecipeId),
    UNIQUE (RecipeId, StepNumber)
);

SELECT 'Recipe database schema created!' AS Status;

## Sample Data: Recipe Tables

### RecipeBooks
| BookId | Title | Author |
|--------|-------|--------|
| 1 | Mushrooms | John Cage |
| 2 | Italian Classics | Julia Child |
| 3 | Vegan Delights | Anonymous |

### Recipes
| RecipeId | Name |
|----------|------|
| 1 | Mushroom Risotto |
| 2 | Spaghetti Carbonara |
| 3 | Mushroom Soup |
| 4 | Vegan Curry |

### Ingredients
| IngredientId | Name |
|--------------|------|
| 1 | mushrooms |
| 2 | butter |
| 3 | rice |
| 4 | pasta |
| 5 | eggs |
| 6 | cream |
| 7 | vegetables |

### BookContains
| BookId | RecipeId |
|--------|----------|
| 1 | 1 |
| 1 | 3 |
| 2 | 2 |
| 3 | 4 |

### RecipeIngredients
| RecipeId | IngredientId | Quantity | Unit |
|----------|--------------|----------|------|
| 1 | 1 | 200 | g |
| 1 | 2 | 50 | g |
| 1 | 3 | 300 | g |
| 2 | 4 | 400 | g |
| 2 | 5 | 4 | pieces |
| 2 | 2 | 100 | g |
| 3 | 1 | 500 | g |
| 3 | 6 | 200 | ml |
| 4 | 7 | 500 | g |

### PreparationSteps
| StepId | RecipeId | StepNumber | Action |
|--------|----------|------------|--------|
| 1 | 1 | 1 | Slice mushrooms |
| 2 | 1 | 2 | Saute in butter |
| 3 | 1 | 3 | Add rice and stock |
| 4 | 2 | 1 | Boil pasta |
| 5 | 2 | 2 | Mix eggs and cheese |
| 6 | 2 | 3 | Combine and serve |
| 7 | 3 | 1 | Chop mushrooms |
| 8 | 3 | 2 | Simmer with cream |
| 9 | 4 | 1 | Chop vegetables |
| 10 | 4 | 2 | Cook with spices |

In [None]:
%%sql
-- Insert sample data

INSERT INTO RecipeBooks (Title, Author) VALUES
    ('Mushrooms', 'John Cage'),
    ('Italian Classics', 'Julia Child'),
    ('Vegan Delights', 'Anonymous');

INSERT INTO Recipes (Name) VALUES
    ('Mushroom Risotto'),
    ('Spaghetti Carbonara'),
    ('Mushroom Soup'),
    ('Vegan Curry');

INSERT INTO Ingredients (Name) VALUES
    ('mushrooms'),
    ('butter'),
    ('rice'),
    ('pasta'),
    ('eggs'),
    ('cream'),
    ('vegetables');

-- Book 1 (Mushrooms) contains recipes 1 and 3
-- Book 2 (Italian Classics) contains recipe 2
-- Book 3 (Vegan Delights) contains recipe 4
INSERT INTO BookContains VALUES
    (1, 1), (1, 3),
    (2, 2),
    (3, 4);

-- Recipe ingredients
INSERT INTO RecipeIngredients VALUES
    (1, 1, 200, 'g'),   -- Risotto: mushrooms
    (1, 2, 50, 'g'),    -- Risotto: butter
    (1, 3, 300, 'g'),   -- Risotto: rice
    (2, 4, 400, 'g'),   -- Carbonara: pasta
    (2, 5, 4, 'pieces'),-- Carbonara: eggs
    (2, 2, 100, 'g'),   -- Carbonara: butter (for this example)
    (3, 1, 500, 'g'),   -- Soup: mushrooms
    (3, 6, 200, 'ml'),  -- Soup: cream
    (4, 7, 500, 'g');   -- Curry: vegetables

-- Preparation steps
INSERT INTO PreparationSteps (RecipeId, StepNumber, Action) VALUES
    (1, 1, 'Slice mushrooms'),
    (1, 2, 'Saute in butter'),
    (1, 3, 'Add rice and stock'),
    (2, 1, 'Boil pasta'),
    (2, 2, 'Mix eggs and cheese'),
    (2, 3, 'Combine and serve'),
    (3, 1, 'Chop mushrooms'),
    (3, 2, 'Simmer with cream'),
    (4, 1, 'Chop vegetables'),
    (4, 2, 'Cook with spices');

SELECT 'Sample data inserted!' AS Status;

## Q4(d): Find recipe books that never use butter [4 marks]

In [None]:
%%sql
-- Write your query here:
-- Hint: Use NOT IN or LEFT JOIN with IS NULL to find books 
-- whose recipes don't use 'butter'


## Q4(e): Average steps in "Mushrooms" by "John Cage" [4 marks]

In [None]:
%%sql
-- Write your query here:
-- Hint: Count steps per recipe, then average


## Q4(f): Alternative Technology Analysis [5 marks]

**Question:** Would MongoDB, XML, or Linked Data be more suitable for this database?

In [None]:
print("""MongoDB Analysis:

Would MongoDB be more suitable? Partially yes, for certain aspects.

Advantages of MongoDB for Recipes:
- Document structure: Recipe as single document with embedded steps/ingredients
- Flexible schema: Different recipes can have different fields
- Nested arrays: Ingredients and steps naturally fit as arrays
- Read performance: Single document retrieval (no JOINs)

Example MongoDB Document:
{
  "_id": "carbonara",
  "name": "Spaghetti Carbonara",
  "ingredients": [
    {"name": "spaghetti", "quantity": 400, "unit": "g"},
    {"name": "guanciale", "quantity": 200, "unit": "g"}
  ],
  "steps": [
    {"order": 1, "action": "Boil pasta"},
    {"order": 2, "action": "Fry guanciale"}
  ],
  "books": ["Italian Classics", "Quick Dinners"]
}

Disadvantages:
- Cross-document queries: "All books with butter" must inspect every recipe's ingredients array (or rely on an index over it)
- Data duplication: Ingredient info repeated across recipes
- Referential integrity: No enforced foreign keys

Conclusion: MongoDB works well if recipes are primarily read as whole documents.
Relational is better for complex cross-recipe analytics.
""")
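
The trade-off above can be illustrated without a running MongoDB server. In the sketch below a Python list of dictionaries stands in for an unindexed collection of recipe documents; the data mirrors the notebook's sample recipes, and the variable names are illustrative.

```python
# Recipes as embedded documents, mirroring the notebook's sample data.
recipes = [
    {"name": "Mushroom Risotto", "books": ["Mushrooms"],
     "ingredients": [{"name": "mushrooms"}, {"name": "butter"}, {"name": "rice"}]},
    {"name": "Spaghetti Carbonara", "books": ["Italian Classics"],
     "ingredients": [{"name": "pasta"}, {"name": "eggs"}, {"name": "butter"}]},
    {"name": "Mushroom Soup", "books": ["Mushrooms"],
     "ingredients": [{"name": "mushrooms"}, {"name": "cream"}]},
    {"name": "Vegan Curry", "books": ["Vegan Delights"],
     "ingredients": [{"name": "vegetables"}]},
]

# Reading one recipe is a single lookup: everything needed is embedded.
risotto = next(r for r in recipes if r["name"] == "Mushroom Risotto")

# A cross-recipe question has to inspect every document's ingredient array.
with_butter = [
    r["name"] for r in recipes
    if any(i["name"] == "butter" for i in r["ingredients"])
]

print(len(risotto["ingredients"]))
print(with_butter)
```

Note that each ingredient name is repeated inside every recipe that uses it, which is the duplication listed among the disadvantages.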

---

# Section A: MCQ Practice

Test your understanding of the MCQ topics.

## Q1(a): Normalisation

**Question:** What is the ONE primary reason normalisation (reducing duplication) is desirable?

- i. Storage cost savings
- ii. Duplicate information can get out of sync
- iii. Speeds data entry
- iv. Security threat

In [None]:
# Your answer:
print("Answer: ii")
print("Duplicate information can get out of sync, leading to logical inconsistencies.")

## Q1(b): JOIN Types

**Question:** To get NULL for anonymous books (no matching author), what JOIN type do you need?

- i. LEFT JOIN
- ii. INNER JOIN
- iii. CROSS JOIN
- iv. INNER JOIN with special WHERE

In [None]:
%%sql
-- Demonstrate LEFT JOIN vs INNER JOIN

DROP TABLE IF EXISTS TestBooks;
DROP TABLE IF EXISTS TestAuthors;

CREATE TABLE TestAuthors (Id INT PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE TestBooks (Id INT PRIMARY KEY, Title VARCHAR(50), AuthorId INT);

INSERT INTO TestAuthors VALUES (1, 'Tolkien');
INSERT INTO TestBooks VALUES (1, 'Beowulf', NULL), (2, 'The Hobbit', 1);

SELECT '--- LEFT JOIN (includes anonymous) ---' AS '';
SELECT B.Title, A.Name 
FROM TestBooks B LEFT JOIN TestAuthors A ON B.AuthorId = A.Id;

In [None]:
%%sql
SELECT '--- INNER JOIN (excludes anonymous) ---' AS '';
SELECT B.Title, A.Name 
FROM TestBooks B INNER JOIN TestAuthors A ON B.AuthorId = A.Id;

## Q1(d): MongoDB Regex

In [None]:
# Test MongoDB regex
!mongo exam_db --quiet --eval '
db.books.drop();
db.books.insertMany([
    {title: "Batman", author: "Bob Kane", year: 1939},
    {title: "Superman", author: "Jerry Siegel", year: 1938},
    {title: "Sandman", author: "Neil Gaiman", year: 1989},
    {title: "Watchmen", author: "Alan Moore", year: 1986}
]);

print("Books with title ending in \"man\" and author ending in \"man\":");
db.books.find({title: /man$/, author: /man$/}).forEach(printjson);
'

---

# Done!

Check your answers against the **solution sheet**.