<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/mock-october-2025/notebook-mock-october-2025-solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 Mock October 2025 - Solutions Notebook

This notebook contains complete solutions for the Mock October 2025 exam.

**Exam Structure:**
- Part A: 10 MCQs (not included in mock)
- Part B: Answer BOTH questions - 60 marks
  - Q2: MARC Library Catalogue (Database Selection, ER Modeling, XML/XPath)
  - Q3: Conference Management System (ER Model, SQL, Security, XML vs Relational)

---

# 1. Environment Setup

Run these cells first to set up MySQL, xmllint, and Python libraries.

In [None]:
# === MySQL Setup ===
!apt-get update -qq > /dev/null
!apt-get install -y -qq mysql-server > /dev/null
!service mysql start
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost' WITH GRANT OPTION;"

# === SQL Magic ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0
%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

# === XPath Magic (cellspell) ===
!apt-get install -y libxml2-utils -qq > /dev/null
!pip install git+https://github.com/sreent/jupyter-query-magics.git -q
%load_ext cellspell.xpath
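
Before moving on, it can help to confirm that the command-line tools installed above are actually on the `PATH`. The sketch below is one way to check; `missing_tools` is a hypothetical helper name, not part of the notebook, and it assumes the `mysql` and `xmllint` binaries are the ones the setup cell installs.

```python
import shutil

def missing_tools(names):
    """Return the command names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# An empty list means both tools were found.
print(missing_tools(["mysql", "xmllint"]))
```

If the list is not empty, re-running the setup cell is the first thing to try.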

---

# Question 2: MARC Library Catalogue [30 marks]

## Context

The Bodleian Library of Oxford University stores its main catalogue using MARC (MAchine-Readable Cataloging).

**MARC record example:**

```
leader 00000nam a22002772i 4500
001    990118204070107026
005    20011114100255.0
008    860101s1954    enka j       000 1 eng d
035    ##$a(UkOxU)011820407
035    ##$a(UkOxU)011820407BIB01
904    ##$aMatched
010    ##$aGB54-13352
035    ##$aOCLC ocm06935463 from D960307M
040    ##$aKWW $cKWW $dOCL $dUKM $dEQO
041    1#$aengswe
082    ##$a[Fic]
090    ##$aPZ7.L6585 $bPi5
092    ##$aD0503666694
100    1#$aLindgren, Astrid, $d1907-2002.
240    10$aPippi Långstrump. $lEnglish
245    10$aPippi Longstocking / $c[translated from the Swedish by Edna Hurup ; illustrated by Richard Kennedy]
260    ##$aLondon : $bOxford, $c[1954]
300    ##$a120 p. : $bill. ; $c21 cm
500    ##$aTranslation of Pippi Långstrump
520    ##$aEscapades of a lucky little girl who lives with a horse and a monkey--but without any parents--at the edge of a Swedish village
700    1#$aKennedy, Richard.
```
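
To make the layout of the data fields above concrete, here is a minimal Python sketch that splits one line of this textual display into tag, two indicators, and `$`-prefixed subfields. `MarcField` and `parse_datafield` are illustrative names, not part of any MARC library, and the sketch assumes the display format shown above (`#` standing for a blank indicator); it does not handle the leader or the control fields (001-008), which have no indicators or subfields.

```python
import re
from typing import NamedTuple

class MarcField(NamedTuple):
    tag: str
    ind1: str
    ind2: str
    subfields: list  # (code, value) pairs in their original order

def parse_datafield(line: str) -> MarcField:
    """Parse one data-field line of the textual MARC display."""
    tag, rest = line.split(None, 1)           # tag, then indicators + subfields
    ind1, ind2, body = rest[0], rest[1], rest[2:]
    # Each subfield is "$", a one-character code, then text up to the next "$".
    subfields = [(code, value.strip())
                 for code, value in re.findall(r"\$(\w)([^$]*)", body)]
    return MarcField(tag, ind1, ind2, subfields)

field = parse_datafield("100    1#$aLindgren, Astrid, $d1907-2002.")
print(field.tag, field.ind1, field.subfields)
```

The same tag / indicator / subfield structure is what the MARC XML elements `datafield` and `subfield` encode later in this question.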

## Q2(d): MARC XML

Create the MARC XML sample file:

In [None]:
%%writefile marc_sample.xml
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a22002772i 4500</leader>
  <controlfield tag="001">990118204070107026</controlfield>
  <controlfield tag="005">20011114100255.0</controlfield>
  <controlfield tag="008">860101s1954    enka j       000 1 eng d</controlfield>
  
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">KWW</subfield>
    <subfield code="c">KWW</subfield>
    <subfield code="d">OCL</subfield>
    <subfield code="d">UKM</subfield>
    <subfield code="d">EQO</subfield>
  </datafield>
  
  <datafield tag="041" ind1="1" ind2=" ">
    <subfield code="a">engswe</subfield>
  </datafield>
  
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Lindgren, Astrid,</subfield>
    <subfield code="d">1907-2002.</subfield>
  </datafield>
  
  <datafield tag="240" ind1="1" ind2="0">
    <subfield code="a">Pippi Långstrump.</subfield>
    <subfield code="l">English</subfield>
  </datafield>
  
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Pippi Longstocking /</subfield>
    <subfield code="c">[translated from the Swedish by Edna Hurup ; illustrated by Richard Kennedy]</subfield>
  </datafield>
  
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">London :</subfield>
    <subfield code="b">Oxford,</subfield>
    <subfield code="c">[1954]</subfield>
  </datafield>
  
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Kennedy, Richard.</subfield>
  </datafield>
</record>

In [None]:
%xpath marc_sample.xml

## Q2(d)(i): XPath for translation information [2 marks]

**Question:** Give an XPath expression for retrieving information on whether items are translations.

---

### Solution

In MARC 21, field 041 (Language Code) records translation status in its first indicator: `1` means the item is or includes a translation, `0` means it is not. The first expression below returns that indicator; the second returns the language codes held in subfield `$a`.

In [None]:
%%xpath --ns marc=http://www.loc.gov/MARC21/slim marc_sample.xml
//marc:datafield[@tag='041']/@ind1

In [None]:
%%xpath --ns marc=http://www.loc.gov/MARC21/slim marc_sample.xml
//marc:datafield[@tag='041']/marc:subfield[@code='a']/text()
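
The same lookups can be cross-checked in plain Python. The sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (no `/@ind1` or `/text()` steps), so the attribute and text are read from the matched elements instead. `translation_info` is a hypothetical helper name, and the inline `SAMPLE` string is a cut-down copy of the 041 field from `marc_sample.xml`.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Cut-down copy of the 041 field from marc_sample.xml.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="041" ind1="1" ind2=" ">
    <subfield code="a">engswe</subfield>
  </datafield>
</record>"""

def translation_info(root):
    """Return (ind1, [language codes]) for every 041 field under root."""
    results = []
    for field in root.findall(".//marc:datafield[@tag='041']", NS):
        codes = [sf.text for sf in field.findall("marc:subfield[@code='a']", NS)]
        results.append((field.get("ind1"), codes))
    return results

root = ET.fromstring(SAMPLE)
print(translation_info(root))  # [('1', ['engswe'])]
```

To run it against the full file, `ET.parse("marc_sample.xml").getroot()` can replace `ET.fromstring(SAMPLE)`.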

In [None]:
%%xpath --ns marc=http://www.loc.gov/MARC21/slim marc_sample.xml
# Try your own XPath here

---

# Question 3: Conference Management System [30 marks]

## Q3(b): SQL query for dinner tickets [4 marks]

---

### Solution

In [None]:
%%sql
-- Q3(b) SOLUTION - Create tables
DROP TABLE IF EXISTS AttendsDinner;
DROP TABLE IF EXISTS AttendsDay;
DROP TABLE IF EXISTS AttendsWorkshop;
DROP TABLE IF EXISTS Review;
DROP TABLE IF EXISTS PaperAuthor;
DROP TABLE IF EXISTS Paper;
DROP TABLE IF EXISTS Registration;
DROP TABLE IF EXISTS Workshop;
DROP TABLE IF EXISTS Dinner;
DROP TABLE IF EXISTS Day;
DROP TABLE IF EXISTS Person;
DROP TABLE IF EXISTS Conference;

CREATE TABLE Conference (
    conference_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL,
    start_date DATE,
    end_date DATE,
    location VARCHAR(200)
);

CREATE TABLE Person (
    person_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE,
    affiliation VARCHAR(200)
);

CREATE TABLE Dinner (
    dinner_id INT PRIMARY KEY AUTO_INCREMENT,
    conference_id INT,
    date DATE,
    venue VARCHAR(200),
    price DECIMAL(10,2),
    FOREIGN KEY (conference_id) REFERENCES Conference(conference_id)
);

CREATE TABLE Registration (
    registration_id INT PRIMARY KEY AUTO_INCREMENT,
    person_id INT,
    conference_id INT,
    name_tag_text VARCHAR(100),
    registration_date DATE,
    FOREIGN KEY (person_id) REFERENCES Person(person_id),
    FOREIGN KEY (conference_id) REFERENCES Conference(conference_id)
);

CREATE TABLE AttendsDinner (
    registration_id INT,
    dinner_id INT,
    ticket_count INT DEFAULT 1,
    PRIMARY KEY (registration_id, dinner_id),
    FOREIGN KEY (registration_id) REFERENCES Registration(registration_id),
    FOREIGN KEY (dinner_id) REFERENCES Dinner(dinner_id)
);

In [None]:
%%sql
-- Insert sample data
INSERT INTO Conference (name, start_date, end_date, location) VALUES
('SIGMOD 2025', '2025-06-22', '2025-06-27', 'Berlin, Germany');

INSERT INTO Person (name, email, affiliation) VALUES
('Alice Smith', 'alice@university.edu', 'University A'),
('Bob Jones', 'bob@college.edu', 'College B'),
('Carol White', 'carol@institute.org', 'Institute C');

INSERT INTO Dinner (conference_id, date, venue, price) VALUES
(1, '2025-06-25', 'Grand Hotel Berlin', 75.00);

INSERT INTO Registration (person_id, conference_id, name_tag_text, registration_date) VALUES
(1, 1, 'Alice Smith - University A', '2025-04-15'),
(2, 1, 'Bob Jones - College B', '2025-04-20'),
(3, 1, 'Carol White - Institute C', '2025-05-01');

INSERT INTO AttendsDinner (registration_id, dinner_id, ticket_count) VALUES
(1, 1, 2),  -- Alice ordered 2 tickets
(2, 1, 1),  -- Bob ordered 1 ticket
(3, 1, 3);  -- Carol ordered 3 tickets

In [None]:
%%sql
-- Q3(b) SOLUTION - Query for total dinner tickets
SELECT SUM(ticket_count) AS total_dinner_tickets
FROM AttendsDinner;

In [None]:
%%sql
-- Alternative: with conference details
SELECT 
    c.name AS conference,
    d.date AS dinner_date,
    d.venue,
    COUNT(*) AS registrations_with_dinner,
    SUM(ad.ticket_count) AS total_tickets
FROM Conference c
JOIN Dinner d ON c.conference_id = d.conference_id
JOIN AttendsDinner ad ON d.dinner_id = ad.dinner_id
GROUP BY c.conference_id, d.dinner_id;
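
The difference between counting registrations and summing tickets can be checked outside MySQL. The sketch below swaps in Python's built-in `sqlite3` (an assumption made only so the example is self-contained; the notebook itself uses MySQL) and loads the same three `AttendsDinner` rows as the sample data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AttendsDinner (
    registration_id INTEGER,
    dinner_id INTEGER,
    ticket_count INTEGER DEFAULT 1,
    PRIMARY KEY (registration_id, dinner_id)
);
-- Same rows as the sample data: Alice 2, Bob 1, Carol 3 tickets
INSERT INTO AttendsDinner VALUES (1, 1, 2), (2, 1, 1), (3, 1, 3);
""")

# SUM adds up tickets; COUNT(*) only counts registrations with a dinner row.
total, registrations = conn.execute(
    "SELECT SUM(ticket_count), COUNT(*) FROM AttendsDinner"
).fetchone()
print(total, registrations)  # 6 3
```

Because one registration can order several tickets, `COUNT(*)` alone would under-report the number of dinner tickets.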

In [None]:
%%sql
-- Try your own query here


## Q3(c): Double-blind review security [4 marks]

---

### Solution

**Model changes needed:**

1. Create views that hide sensitive columns:
   - `ReviewerPaperView`: shows papers without author information
   - `AuthorReviewView`: shows reviews without reviewer information

2. Add an `anonymized_pdf_path` column to `Paper` (a PDF with author names removed).

The cells below create the supporting tables and views; the GRANT statements follow them.

In [None]:
%%sql
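-- Q3(c) SOLUTION - Create tables for reviews
-- Requires Person from the Q3(b) cell; that cell also drops these tables, so re-run it before re-running this one.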
CREATE TABLE Paper (
    paper_id INT PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(300),
    abstract TEXT,
    pdf_path VARCHAR(500),
    anonymized_pdf_path VARCHAR(500),
    status ENUM('submitted', 'under_review', 'accepted', 'rejected') DEFAULT 'submitted'
);

CREATE TABLE PaperAuthor (
    paper_id INT,
    person_id INT,
    author_order INT,
    is_corresponding BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (paper_id, person_id),
    FOREIGN KEY (paper_id) REFERENCES Paper(paper_id),
    FOREIGN KEY (person_id) REFERENCES Person(person_id)
);

CREATE TABLE Review (
    review_id INT PRIMARY KEY AUTO_INCREMENT,
    paper_id INT,
    reviewer_id INT,
    score INT,
    feedback TEXT,
    recommendation ENUM('accept', 'weak_accept', 'weak_reject', 'reject'),
    FOREIGN KEY (paper_id) REFERENCES Paper(paper_id),
    FOREIGN KEY (reviewer_id) REFERENCES Person(person_id)
);

In [None]:
%%sql
-- Q3(c) SOLUTION - Views and Grants for double-blind

-- View for reviewers: shows papers WITHOUT author info
CREATE OR REPLACE VIEW ReviewerPaperView AS
SELECT paper_id, title, abstract, anonymized_pdf_path, status
FROM Paper;

-- View for authors: shows reviews WITHOUT reviewer info  
CREATE OR REPLACE VIEW AuthorReviewView AS
SELECT paper_id, score, feedback, recommendation
FROM Review;
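
As a quick illustration of what the author-facing view exposes, the sketch below rebuilds `Review` and `AuthorReviewView` in an in-memory SQLite database (swapped in for MySQL purely for self-containment) and lists the view's columns. The inserted review row is made-up sample data. SQLite has no GRANT system, so this shows only the column projection, not the access control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Review (
    review_id INTEGER PRIMARY KEY,
    paper_id INTEGER,
    reviewer_id INTEGER,
    score INTEGER,
    feedback TEXT,
    recommendation TEXT
);
CREATE VIEW AuthorReviewView AS
SELECT paper_id, score, feedback, recommendation FROM Review;
-- Hypothetical sample review, for illustration only
INSERT INTO Review (paper_id, reviewer_id, score, feedback, recommendation)
VALUES (1, 42, 4, 'Clear contribution', 'accept');
""")

cursor = conn.execute("SELECT * FROM AuthorReviewView")
view_columns = [col[0] for col in cursor.description]
print(view_columns)  # ['paper_id', 'score', 'feedback', 'recommendation']
```

The view never selects `reviewer_id`, so a user limited to the view has no column through which to learn who wrote a review.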

In [None]:
%%sql
-- Try your own query here


**GRANT statements for double-blind review:**

```sql
-- Create roles
CREATE ROLE reviewer_role;
CREATE ROLE author_role;
CREATE ROLE pc_chair_role;

-- Reviewers: can see papers (without authors), can write reviews
GRANT SELECT ON ReviewerPaperView TO reviewer_role;
GRANT SELECT, INSERT, UPDATE ON Review TO reviewer_role;
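-- Reviewers CANNOT see PaperAuthor: no privilege on it is granted, and access is denied by default.
-- (No REVOKE is needed; in MySQL, revoking a privilege that was never granted raises an error.)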

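-- Authors: can read Paper and the anonymized reviews
-- (this grant is table-wide; limiting each author to their own papers needs an extra per-author view or application logic)
GRANT SELECT ON Paper TO author_role;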
GRANT SELECT ON AuthorReviewView TO author_role;
-- Authors CANNOT see reviewer_id in Review table
-- (handled via view - they only access AuthorReviewView)

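-- PC Chair: full access for decision making
-- (MySQL's GRANT names one object per statement)
GRANT ALL ON Paper TO pc_chair_role;
GRANT ALL ON Review TO pc_chair_role;
GRANT ALL ON PaperAuthor TO pc_chair_role;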
```
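
The access rules above follow a default-deny model: a role can do only what has been explicitly granted. The sketch below models that idea in plain Python as a lookup table and also emits one GRANT statement per object. `GRANTS`, `is_allowed`, and `grant_statements` are illustrative names, and the table is a simplified reading of the roles above, not an interface to MySQL.

```python
# Privileges per role and object, mirroring the intent of the GRANT statements.
GRANTS = {
    "reviewer_role": {"ReviewerPaperView": {"SELECT"},
                      "Review": {"SELECT", "INSERT", "UPDATE"}},
    "author_role": {"Paper": {"SELECT"}, "AuthorReviewView": {"SELECT"}},
    "pc_chair_role": {"Paper": {"ALL"}, "Review": {"ALL"}, "PaperAuthor": {"ALL"}},
}

def is_allowed(role, obj, privilege):
    """Default deny: allowed only if the privilege (or ALL) was granted."""
    granted = GRANTS.get(role, {}).get(obj, set())
    return privilege in granted or "ALL" in granted

def grant_statements(role):
    """Build one GRANT statement per object for the given role."""
    return [f"GRANT {', '.join(sorted(privs))} ON {obj} TO {role};"
            for obj, privs in GRANTS[role].items()]

print(is_allowed("reviewer_role", "PaperAuthor", "SELECT"))  # False
print(grant_statements("pc_chair_role"))
```

Nothing is listed for `reviewer_role` on `PaperAuthor`, so the lookup denies access without any explicit revoke step.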

---

# Summary

## Key Concepts Covered

| Topic | Key Points |
|-------|------------|
| Database Selection | Match DB type to data structure and query patterns |
| MARC/Bibliographic | Document DB for flexible schema; FRBR model for ER |
| XML Design | Generic vs semantic elements; schema trade-offs |
| Linked Data | BIBFRAME benefits (web integration) and risks (migration) |
| ER Modeling | Identify entities, attributes, relationships, cardinalities |
| SQL Security | Views + GRANT/REVOKE for column-level access control |
| Denormalization | Avoid premature optimization; use indexes first |
| XML vs Relational | Relational better for structured, transactional data |