<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/mock-october-2025/notebook-mock-october-2025.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 Mock October 2025 - Practice Notebook

This notebook provides hands-on practice for the Mock October 2025 exam.

**Exam Structure:**
- Part A: 10 MCQs (not included in mock)
- Part B: Answer BOTH questions - 60 marks
  - Q2: MARC Library Catalogue (Database Selection, ER Modelling, XML/XPath)
  - Q3: Conference Management System (ER Model, SQL, Security, XML vs Relational)

**Instructions:**
1. Run the Environment Setup cell first
2. Write your answers in the empty code cells
3. Check your answers against the solution sheet

---

# 1. Environment Setup

Run this cell first to install and start MySQL, install xmllint, and load the Python libraries.

In [None]:
# === MySQL Setup ===
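# apt-get is used instead of apt because apt warns that its CLI is not stable for scripted use
!apt-get -qq update > /dev/null
!apt-get -y -qq install mysql-server > /dev/null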
!service mysql start

# Create user and database
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
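# Global privileges let examuser create further users and issue GRANTs later (e.g. in Q3(c)); only appropriate in a throwaway sandbox
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost' WITH GRANT OPTION;"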

# === xmllint Setup (for XML/XPath) ===
!apt-get -y -qq install libxml2-utils > /dev/null

# === Python libraries ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0 lxml

%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

print("MySQL ready!")
print("xmllint ready!")
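
The two `print` calls above run whether or not the installs succeeded. As a minimal sketch of an extra sanity check, the hypothetical helper `missing_tools` (not part of the original notebook) reports which command-line tools are not on the `PATH`. It only checks that the executables exist, not that the MySQL service is actually running.

```python
import shutil


def missing_tools(names=("mysql", "xmllint")):
    """Return the command-line tools from `names` that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools()
print("Missing tools:", missing if missing else "none")
```

If anything is listed as missing, re-run the setup cell before attempting the questions.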

---

# Question 2: MARC Library Catalogue [30 marks]

## Context

The Bodleian Library of Oxford University stores its main catalogue using a standard called MARC. When navigating to a particular book in the web interface and choosing to view the source record, you see:

In [None]:
# MARC record example (displayed as text)
marc_record = """
leader 00000nam a22002772i 4500
001    990118204070107026
005    20011114100255.0
008    860101s1954    enka j       000 1 eng d
035    ##$a(UkOxU)011820407
035    ##$a(UkOxU)011820407BIB01
904    ##$aMatched
010    ##$aGB54-13352
035    ##$aOCLC ocm06935463 from D960307M
040    ##$aKWW $cKWW $dOCL $dUKM $dEQO
041    1#$aengswe
082    ##$a[Fic]
090    ##$aPZ7.L6585 $bPi5
092    ##$aD0503666694
100    1#$aLindgren, Astrid, $d1907-2002.
240    10$aPippi Långstrump. $lEnglish
245    10$aPippi Longstocking / $c[translated from the Swedish by Edna Hurup ; illustrated by Richard Kennedy]
260    ##$aLondon : $bOxford, $c[1954]
300    ##$a120 p. : $bill. ; $c21 cm
500    ##$aTranslation of Pippi Långstrump
520    ##$aEscapades of a lucky little girl who lives with a horse and a monkey--but without any parents--at the edge of a Swedish village
700    1#$aKennedy, Richard.
"""
print(marc_record)

**MARC Code Reference:**
- Opening 8 lines (up to code 035): Mostly catalogue IDs
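- Code 040: Cataloguing agencies ($a = original cataloguing agency, $c = transcribing agency, $d = modifying agencies; $d may repeat)
- Code 041: Language (first indicator 1 = the item is or includes a translation; $a = language codes, e.g. "engswe" for English and Swedish)
- Code 100: Main personal name (first indicator 1 = surname first; $a = name string, $d = dates)
- The two characters after the tag are the indicators; `#` stands for a blank indicator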
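
To make the field layout concrete, here is a minimal sketch that splits one line of the text display above into its tag, indicators and subfields. The function name `parse_marc_line` and the dictionary layout are illustrative assumptions, not part of the MARC standard or of any library, and the parser only handles the simple display format shown in this notebook.

```python
import re


def parse_marc_line(line):
    """Split one display line, e.g. '100    1#$aLindgren, Astrid, $d1907-2002.',
    into its tag, indicators and (code, value) subfield pairs."""
    if line.startswith("leader"):
        return {"tag": "leader", "value": line[len("leader"):].strip()}
    tag, rest = line[:3], line[3:].strip()
    if tag < "010":
        # Control fields (001-009) carry a single value: no indicators, no subfields
        return {"tag": tag, "value": rest}
    ind1, ind2, body = rest[0], rest[1], rest[2:]
    # Each subfield starts with '$' plus a one-character code and runs to the next '$'
    subfields = [(m.group(1), m.group(2).strip())
                 for m in re.finditer(r"\$(\w)([^$]*)", body)]
    return {"tag": tag, "ind1": ind1, "ind2": ind2, "subfields": subfields}


print(parse_marc_line("100    1#$aLindgren, Astrid, $d1907-2002."))
print(parse_marc_line("040    ##$aKWW $cKWW $dOCL $dUKM $dEQO"))
```

Subfields are kept as a list of pairs rather than a dictionary because a code can repeat within one field, as `$d` does in field 040.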

## Q2(a): Best database system for MARC [8 marks]

**Question:** The MARC standard specifies a binary format for sharing data from this sort of catalogue, but what sort of database system would be best for storing and retrieving it? Justify your answer.

In [None]:
# Q2(a) YOUR ANSWER:
# Recommended database system:
#
# Justification:
#
#

## Q2(b): Representing personal name (code 100) [6 marks]

**Question:** Code point 100 indicates the main personal name associated with the record. The leading 1 refers to the format of the name (surname first). $a indicates the name string, while $d specifies the dates. How would you represent this information in your recommended database system? What problems might arise?

In [None]:
# Q2(b) YOUR ANSWER:
# Data representation:
#
# Problems that might arise:
#
#

## Q2(c): ER model for bibliographic items [6 marks]

**Question:** Suggest Entities, Attributes and Relationships for an ER model that would represent bibliographic items such as this book. Try to include everything you can see in the record above, and any additional information you might think would be useful.

In [None]:
# Q2(c) YOUR ANSWER:
# Entities and Attributes:
#
# Relationships:
#
#

## Q2(d): MARC XML

The Library of Congress created an XML format and schema for sharing MARC data. Here's a fragment of MARC XML:

In [None]:
%%writefile marc_sample.xml
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a22002772i 4500</leader>
  <controlfield tag="001">990118204070107026</controlfield>
  <controlfield tag="005">20011114100255.0</controlfield>
  <controlfield tag="008">860101s1954    enka j       000 1 eng d</controlfield>
  
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">KWW</subfield>
    <subfield code="c">KWW</subfield>
    <subfield code="d">OCL</subfield>
    <subfield code="d">UKM</subfield>
    <subfield code="d">EQO</subfield>
  </datafield>
  
  <datafield tag="041" ind1="1" ind2=" ">
    <subfield code="a">engswe</subfield>
  </datafield>
  
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Lindgren, Astrid,</subfield>
    <subfield code="d">1907-2002.</subfield>
  </datafield>
  
  <datafield tag="240" ind1="1" ind2="0">
    <subfield code="a">Pippi Långstrump.</subfield>
    <subfield code="l">English</subfield>
  </datafield>
  
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Pippi Longstocking /</subfield>
    <subfield code="c">[translated from the Swedish by Edna Hurup ; illustrated by Richard Kennedy]</subfield>
  </datafield>
  
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">London :</subfield>
    <subfield code="b">Oxford,</subfield>
    <subfield code="c">[1954]</subfield>
  </datafield>
  
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Kennedy, Richard.</subfield>
  </datafield>
</record>

In [None]:
# Verify XML is well-formed
!xmllint --noout marc_sample.xml && echo "XML is well-formed!"

## Q2(d)(i): XPath for translation information [2 marks]

**Question:** Give an XPath expression for retrieving information on whether items are translations (see earlier for the relevant MARC code - hint: code 041).

In [None]:
# Q2(d)(i) YOUR XPATH EXPRESSION:
xpath_expr = ""  # Fill in your expression

In [None]:
# Test your XPath with lxml
from lxml import etree

doc = etree.parse('marc_sample.xml')
namespaces = {'marc': 'http://www.loc.gov/MARC21/slim'}

# Note: The MARC XML uses a namespace, so prefix paths with 'marc:'
# Example: //marc:datafield[@tag='041']/@ind1

if xpath_expr:
    result = doc.xpath(xpath_expr, namespaces=namespaces)
    print("XPath result:", result)
else:
    print("Please fill in your XPath expression above")
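
The namespace note in the cell above is easy to trip over, so here is a minimal sketch of why the `marc:` prefix matters. It uses a different field (245, the title) and an inline fragment copied from `marc_sample.xml`. As an assumption made for portability, it uses the standard library's `xml.etree.ElementTree` rather than lxml; ElementTree supports only a subset of XPath (for example, no `/@ind1` attribute step), whereas lxml's `doc.xpath(...)` accepts full XPath 1.0.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping; the prefix name 'marc' is arbitrary, the URI must match the document
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Inline fragment so the sketch does not depend on marc_sample.xml being present
fragment = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Pippi Longstocking /</subfield>
  </datafield>
</record>"""

root = ET.fromstring(fragment)

# Without a prefix the path looks for elements in no namespace, so nothing matches
unprefixed = root.findall(".//datafield[@tag='245']")

# With the prefix mapped to the MARC namespace URI, the same path matches
prefixed = root.findall(".//marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS)

print(len(unprefixed))
print([el.text for el in prefixed])
```

The same rule applies to the lxml cell: every element step in the path needs the `marc:` prefix, while attribute names such as `@tag` do not.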

## Q2(d)(ii): Alternative XML encoding [4 marks]

An alternative approach to encoding MARC data in XML could have produced code like this:

```xml
<mainPerson>
  <name format="surnameFirst">Lindgren, Astrid,</name>
  <dates>1907-2002</dates>
</mainPerson>
```

**Question:** What difference do you think this would make for the schema and the functionality of the system? Why might they have chosen not to do this (and to use datafield and subfield instead)?

In [None]:
# Q2(d)(ii) YOUR ANSWER:
# Differences for schema and functionality:
#
# Why they chose datafield/subfield instead:
#
#

## Q2(e): BIBFRAME Linked Data [4 marks]

**Question:** The Library of Congress have moved towards developing a Linked Data standard called BIBFRAME to supersede MARC. What benefits and what risks might they expect from this move?

In [None]:
# Q2(e) YOUR ANSWER:
# Benefits:
#
# Risks:
#
#

---

# Question 3: Conference Management System [30 marks]

## Context

Academic conferences are often organised through online database applications. These cover:

**Review Process:**
- Track who has submitted papers
- Assign each paper to multiple reviewers
- Get reviewer scores and feedback
- Program committee decides which papers to accept

**Conference Registration:**
- Who is registering
- What should be printed on name tags
- When they will attend (which days for multi-day conferences)
- Any extras (workshops, dinner)

## Q3(a): ER model for conference system [14 marks]

**Question:** Develop an ER model for a basic conference system. List entities, attributes and relationships for your model (you don't need to draw anything, just answer as text).

In [None]:
# Q3(a) YOUR ANSWER:
# Entities and Attributes:
#
# Relationships:
#
#

## Q3(b): SQL query for dinner tickets [4 marks]

**Question:** Imagine you have converted your model to the relational model and implemented it as a MySQL database. Give a query that would return the number of dinner tickets ordered.

In [None]:
%%sql
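-- Q3(b) First, create sample tables based on your ER model
-- The DROP statements below clear tables left over from a previous run; rename them to match your own model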
DROP TABLE IF EXISTS AttendsDinner;
DROP TABLE IF EXISTS Dinner;
DROP TABLE IF EXISTS Registration;
DROP TABLE IF EXISTS Conference;

-- Create your tables here:


In [None]:
%%sql
-- Q3(b) Insert sample data:


In [None]:
%%sql
-- Q3(b) YOUR SQL QUERY for dinner tickets:


## Q3(c): Double-blind review security [4 marks]

**Question:** Reviews are often 'double blind'. This means that a reviewer doesn't know who authored the paper they are reviewing, and the author doesn't know who reviewed their paper. Give any GRANT statements for your database (and list any changes you'd have to make to your model) that would help support that anonymity.

In [None]:
# Q3(c) YOUR ANSWER:
# Model changes needed:
#
# GRANT statements:
#
#

In [None]:
%%sql
-- Q3(c) Example SQL for views and grants (optional - demonstrate your approach):


## Q3(d): Denormalisation for large conferences [4 marks]

**Question:** A colleague looks at your database and suggests that, because some conferences can be very large with a thousand or more attendees, you should consider denormalising some tables to improve performance. How would you respond? Justify your answer.

In [None]:
# Q3(d) YOUR ANSWER:
# Response to colleague:
#
# Justification:
#
#

## Q3(e): XML database vs relational [4 marks]

**Question:** Another colleague suggests that you replace the relational database with an XML database. What advantages or disadvantages might this offer?

In [None]:
# Q3(e) YOUR ANSWER:
# Advantages of XML database:
#
# Disadvantages of XML database:
#
# Conclusion:
#

---

# Done!

Check your answers against the **solution sheet**.