<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/past-exam-papers/mock-october-2025/notebook-mock-october-2025-solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM3010 Mock October 2025 - Solutions Notebook

This notebook contains complete solutions for the Mock October 2025 exam.

**Exam Structure:**
- Part A: 10 MCQs (not included in mock)
- Part B: Answer BOTH questions - 60 marks
  - Q2: MARC Library Catalogue (Database Selection, ER Modeling, XML/XPath)
  - Q3: Conference Management System (ER Model, SQL, Security, XML vs Relational)

---

# 1. Environment Setup

Run these cells first to set up MySQL, xmllint, and Python libraries.

In [None]:
# === MySQL Setup ===
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# Create user and database
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS exam_db;"
!mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'examuser'@'localhost' WITH GRANT OPTION;"

# === xmllint Setup (for XML/XPath) ===
!apt -y -qq install libxml2-utils > /dev/null

# === Python libraries ===
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0 lxml

%reload_ext sql
%sql mysql+pymysql://examuser:exampass@localhost/exam_db

print("MySQL ready!")
print("xmllint ready!")

---

# Question 2: MARC Library Catalogue [30 marks]

## Context

The Bodleian Library of Oxford University stores its main catalogue using MARC (MAchine-Readable Cataloging).

In [None]:
# MARC record example (displayed as text)
marc_record = """
leader 00000nam a22002772i 4500
001    990118204070107026
005    20011114100255.0
008    860101s1954    enka j       000 1 eng d
035    ##$a(UkOxU)011820407
035    ##$a(UkOxU)011820407BIB01
904    ##$aMatched
010    ##$aGB54-13352
035    ##$aOCLC ocm06935463 from D960307M
040    ##$aKWW $cKWW $dOCL $dUKM $dEQO
041    1#$aengswe
082    ##$a[Fic]
090    ##$aPZ7.L6585 $bPi5
092    ##$aD0503666694
100    1#$aLindgren, Astrid, $d1907-2002.
240    10$aPippi Långstrump. $lEnglish
245    10$aPippi Longstocking / $c[translated from the Swedish by Edna Hurup ; illustrated by Richard Kennedy]
260    ##$aLondon : $bOxford, $c[1954]
300    ##$a120 p. : $bill. ; $c21 cm
500    ##$aTranslation of Pippi Långstrump
520    ##$aEscapades of a lucky little girl who lives with a horse and a monkey--but without any parents--at the edge of a Swedish village
700    1#$aKennedy, Richard.
"""
print(marc_record)

## Q2(a): Best database system for MARC [8 marks]

**Question:** The MARC standard specifies a binary format for sharing data from this sort of catalogue, but what sort of database system would be best for storing and retrieving it? Justify your answer.

---

### SOLUTION

In [None]:
# Q2(a) SOLUTION:

answer_2a = """
RECOMMENDED: Document Database (e.g., MongoDB) or Relational Database with careful design

JUSTIFICATION:

1. Why Document Database is often preferred:
   - Schema flexibility: MARC tags are three digits (up to 1,000 possible codes), but any
     one record uses only a small subset. Document DBs handle this without NULL-heavy tables.
   - Repeatable subfields: in field 040 above, $d occurs three times (OCL, UKM, EQO);
     a document stores repeated subfields naturally as an array.
   - Self-contained records: Each bibliographic record is a logical unit.
   - Query needs: libraries mostly retrieve whole records and search a few indexed fields,
     both of which document stores support well.

2. Why Relational could also work:
   - Mature technology with decades of library experience.
   - Strong ACID consistency for cataloguing workflows.
   - Cross-record queries (e.g. all books by one author) are straightforward joins.
   - Normalized authority tables prevent duplication.

3. Less suitable options:
   - Key-Value: fine for fetching a record by its control number (001), but cannot
     query by field content (author, subject).
   - Graph DB: Overkill unless modeling relationships between works is primary.

The choice depends on query patterns and existing infrastructure.
"""
print(answer_2a)
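
To make the schema-flexibility argument concrete, here is a minimal sketch that turns a few lines of the MARC display above into one nested document, the kind of value a document database would store as-is. `marc_text_to_document` is a hypothetical helper (not part of any MARC library), and the dict layout is one possible document shape assumed for illustration; it also assumes `#` marks a blank indicator and `$x` starts subfield `x`, as in the sample.

```python
# Hypothetical helper: parse the line-oriented MARC display used in this notebook
# into a nested document. Assumes '#' = blank indicator and '$x' starts subfield x.
SAMPLE = """\
leader 00000nam a22002772i 4500
001    990118204070107026
040    ##$aKWW $cKWW $dOCL $dUKM $dEQO
100    1#$aLindgren, Astrid, $d1907-2002.
700    1#$aKennedy, Richard.
"""


def marc_text_to_document(text):
    doc = {"fields": []}
    for line in text.strip().splitlines():
        tag, _, rest = line.partition(" ")
        rest = rest.strip()
        if tag == "leader":
            doc["leader"] = rest
        elif tag < "010":
            # Control fields 001-009 carry one value, no indicators or subfields
            doc["fields"].append({"tag": tag, "value": rest})
        else:
            indicators, _, body = rest.partition("$")
            doc["fields"].append({
                "tag": tag,
                "ind1": indicators[0].replace("#", " "),
                "ind2": indicators[1].replace("#", " "),
                # A list (not a dict) keeps repeated codes such as 040 $d
                "subfields": [{"code": chunk[0], "value": chunk[1:].strip()}
                              for chunk in body.split("$") if chunk],
            })
    return doc


record = marc_text_to_document(SAMPLE)
f040 = next(f for f in record["fields"] if f["tag"] == "040")
print([s["value"] for s in f040["subfields"] if s["code"] == "d"])  # ['OCL', 'UKM', 'EQO']
```

Records with different sets of tags simply produce documents with different `fields` lists; no table needs a column per tag.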

## Q2(b): Representing personal name (code 100) [6 marks]

**Question:** How would you represent code 100 (personal name) in your recommended database system? What problems might arise?

---

### SOLUTION

In [None]:
# Q2(b) SOLUTION:

# Document Database representation:
document_representation = {
    "field_100": {
        "ind1": "1",
        "ind2": " ",
        "subfields": {
            "a": "Lindgren, Astrid,",
            "d": "1907-2002."
        }
    }
}

# Or with normalized Person collection:
normalized_person = {
    "_id": "person_lindgren_astrid",
    "name_surname_first": "Lindgren, Astrid",
    "name_display": "Astrid Lindgren",
    "birth_year": 1907,
    "death_year": 2002,
    "authority_ids": ["LC:n50048009", "VIAF:32783289"]
}

print("Document representation:")
print(document_representation)
print("\nNormalized person:")
print(normalized_person)

In [None]:
# Q2(b) SOLUTION - Relational representation:

relational_sql = """
CREATE TABLE Person (
    person_id INT PRIMARY KEY AUTO_INCREMENT,
    name_string VARCHAR(200),
    name_format ENUM('forename', 'surname', 'family_name'),  -- MARC 100 ind1 = 0, 1, 3
    birth_year SMALLINT,
    death_year SMALLINT
);

-- Assumes a Book table with primary key book_id (not shown)
CREATE TABLE BookAuthor (
    book_id INT,
    person_id INT,
    role ENUM('main', 'contributor', 'illustrator'),
    PRIMARY KEY (book_id, person_id, role),
    FOREIGN KEY (book_id) REFERENCES Book(book_id),
    FOREIGN KEY (person_id) REFERENCES Person(person_id)
);
"""
print(relational_sql)

In [None]:
# Q2(b) SOLUTION - Problems that might arise:

problems = """
PROBLEMS THAT MIGHT ARISE:

1. Name parsing: "Lindgren, Astrid," ends with a comma that is cataloguing punctuation
   separating $a from $d, not part of the name; it must be stripped before display or matching.

2. Date formats: "1907-2002." has trailing period; some dates are approximate ("ca. 1900")
   or partial ("1907-" for living persons).

3. Name variants: Same person may appear as:
   - "Lindgren, Astrid"
   - "Astrid Lindgren"
   - "A. Lindgren"

4. Authority control: Need to link to authority records (VIAF, LC) to identify
   the same person across different records.

5. Indicator interpretation: ind1="1" means surname first, but the application
   must know this mapping.

6. Multiple names: Pseudonyms, married names, transliterated names
   (e.g., Cyrillic to Latin).
"""
print(problems)
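
Several of these problems (punctuation, open or approximate dates, indicator-dependent name order) can be seen in a small normalisation step. The sketch below is a hypothetical helper, `parse_100`, assuming the subfields have already been extracted into a dict; it only parses the simple `YYYY-YYYY` and `YYYY-` date patterns and keeps anything else as raw text rather than guessing.

```python
import re


def parse_100(ind1, subfields):
    """Hypothetical helper: normalise a MARC 100 field into Person attributes."""
    # Trailing commas/periods are cataloguing punctuation, not part of the data
    name = subfields.get("a", "").rstrip(" ,.")
    dates = subfields.get("d", "").rstrip(" .")
    birth = death = None
    # Only 'YYYY-YYYY' and 'YYYY-' (living person) are parsed; anything else,
    # e.g. 'ca. 1900', stays in raw_dates for a human to review
    m = re.fullmatch(r"(\d{4})-(\d{4})?", dates)
    if m:
        birth = int(m.group(1))
        death = int(m.group(2)) if m.group(2) else None
    if ind1 == "1" and ", " in name:  # ind1 = 1: surname entered first
        surname, forename = name.split(", ", 1)
        display = f"{forename} {surname}"
    else:
        display = name
    return {"name_heading": name, "name_display": display,
            "birth_year": birth, "death_year": death, "raw_dates": dates}


print(parse_100("1", {"a": "Lindgren, Astrid,", "d": "1907-2002."}))
print(parse_100("1", {"a": "Lindgren, Astrid,", "d": "1907-"}))
```

Name variants and pseudonyms (problems 3 and 6) are not solvable this way; they still need authority records.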

## Q2(c): ER model for bibliographic items [6 marks]

**Question:** Suggest Entities, Attributes and Relationships for an ER model.

---

### SOLUTION

In [None]:
# Q2(c) SOLUTION - ER Model for Bibliographic Data:

er_model = """
ENTITIES AND ATTRIBUTES:

1. Work
   - work_id (PK)
   - uniform_title
   - original_language

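2. Expression
   - expression_id (PK)
   - work_id (FK)
   - form (text/audio/etc)
   (its language(s) are recorded via the InLanguage relationship)

3. Manifestation
   - manifestation_id (PK)
   - expression_id (FK)
   - place
   - date
   - pages
   - dimensions
   - isbn
   (the publisher is recorded via the PublishedBy relationship)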

4. Item
   - item_id (PK)
   - manifestation_id (FK)
   - library_id (FK)
   - call_number
   - condition

5. Person
   - person_id (PK)
   - name_display
   - name_inverted
   - birth_date
   - death_date

6. CorporateBody
   - corp_id (PK)
   - name
   - location

7. Subject
   - subject_id (PK)
   - term
   - scheme (LCSH/MeSH/etc)

8. Language
   - language_code (PK)
   - language_name

9. Library
   - library_id (PK)
   - name
   - location

RELATIONSHIPS:

1. Created: Person -> Work (M:N)
   Attributes: role (author)

2. Realized: Person -> Expression (M:N)
   Attributes: role (translator/illustrator)

3. PublishedBy: Manifestation -> CorporateBody (M:1)

4. HasSubject: Work -> Subject (M:N)

5. InLanguage: Expression -> Language (M:N)

6. TranslationOf: Expression -> Expression (M:1)

7. CataloguedBy: Manifestation -> CorporateBody (M:N)
   Attributes: role (original/transcribing/modifying agency, from 040 $a/$c/$d), date

8. HeldBy: Item -> Library (M:1)
"""
print(er_model)
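
As a sanity check of the Work -> Expression -> Manifestation -> Item chain, the sketch below maps the Pippi record onto it using Python dataclasses as stand-ins for entities and object references as stand-ins for foreign keys. The IDs are hypothetical; the other values come from fields 240, 260, 300 and 090 of the record. Only a subset of the attributes listed above is included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    work_id: str
    uniform_title: str
    original_language: str


@dataclass
class Expression:
    expression_id: str
    work: Work
    language: str
    translation_of: Optional["Expression"] = None  # TranslationOf (M:1)


@dataclass
class Manifestation:
    manifestation_id: str
    expression: Expression
    place: str
    date: str
    pages: str


@dataclass
class Item:
    item_id: str
    manifestation: Manifestation
    call_number: str


# Hypothetical IDs; titles, place, date, extent and call number come from the record
work = Work("w1", "Pippi Långstrump", "swe")
original = Expression("e1", work, "swe")
english = Expression("e2", work, "eng", translation_of=original)
oxford_1954 = Manifestation("m1", english, "London", "1954", "120 p.")
item = Item("i1", oxford_1954, "PZ7.L6585 Pi5")

expr = item.manifestation.expression
print(expr.language, "translated from", expr.translation_of.language)  # eng translated from swe
```

Both expressions point at the same `Work`, which is what lets a catalogue group the Swedish original and the English translation together.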

## Q2(d): MARC XML

Create the MARC XML sample file:

In [None]:
%%writefile marc_sample.xml
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a22002772i 4500</leader>
  <controlfield tag="001">990118204070107026</controlfield>
  <controlfield tag="005">20011114100255.0</controlfield>
  <controlfield tag="008">860101s1954    enka j       000 1 eng d</controlfield>
  
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">KWW</subfield>
    <subfield code="c">KWW</subfield>
    <subfield code="d">OCL</subfield>
    <subfield code="d">UKM</subfield>
    <subfield code="d">EQO</subfield>
  </datafield>
  
  <datafield tag="041" ind1="1" ind2=" ">
    <subfield code="a">engswe</subfield>
  </datafield>
  
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Lindgren, Astrid,</subfield>
    <subfield code="d">1907-2002.</subfield>
  </datafield>
  
  <datafield tag="240" ind1="1" ind2="0">
    <subfield code="a">Pippi Långstrump.</subfield>
    <subfield code="l">English</subfield>
  </datafield>
  
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Pippi Longstocking /</subfield>
    <subfield code="c">[translated from the Swedish by Edna Hurup ; illustrated by Richard Kennedy]</subfield>
  </datafield>
  
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">London :</subfield>
    <subfield code="b">Oxford,</subfield>
    <subfield code="c">[1954]</subfield>
  </datafield>
  
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Kennedy, Richard.</subfield>
  </datafield>
</record>

In [None]:
# Verify XML is well-formed
!xmllint --noout marc_sample.xml && echo "XML is well-formed!"

## Q2(d)(i): XPath for translation information [2 marks]

**Question:** Give an XPath expression for retrieving information on whether items are translations.

---

### SOLUTION

In [None]:
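# Q2(d)(i) SOLUTION:
from lxml import etree

doc = etree.parse('marc_sample.xml')
namespaces = {'marc': 'http://www.loc.gov/MARC21/slim'}

# Field 041, first indicator: 0 = not a translation, 1 = is or includes a translation
xpath_translation_check = "//marc:datafield[@tag='041']/@ind1"

# Alternative: return the 041 field only when it flags a translation
xpath_translation_full = "//marc:datafield[@tag='041' and @ind1='1']"

# Alternative: a single true/false answer
xpath_is_translation = "boolean(//marc:datafield[@tag='041' and @ind1='1'])"

# Test the expressions
print("XPath expression:", xpath_translation_check)
result = doc.xpath(xpath_translation_check, namespaces=namespaces)
print("Result:", result)
print("\nInterpretation: ind1='1' means this is a translation")
print("Matching 041 fields:", len(doc.xpath(xpath_translation_full, namespaces=namespaces)))
print("Is translation?", doc.xpath(xpath_is_translation, namespaces=namespaces))

# Get language codes
xpath_languages = "//marc:datafield[@tag='041']/marc:subfield[@code='a']/text()"
languages = doc.xpath(xpath_languages, namespaces=namespaces)
print("\nLanguage codes:", languages)
# 'engswe' packs two codes (eng, swe) into one $a. Current MARC 21 practice would put
# the original's language in $h, so the direction (English from Swedish) is read
# from the context: 240 $l (English) and the 500 note "Translation of ...".
print("'engswe' = eng + swe: English translation from Swedish")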
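
The expressions above test a single record. Over a catalogue, the same predicate can select which records are translations. A minimal sketch with lxml (already installed above) on a small MARCXML `<collection>`: the first record reuses the Pippi 001/041 values, and the other two records are hypothetical. The second query is a reminder that a record with no 041 at all gives no explicit answer either way.

```python
from lxml import etree

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# First record reuses the Pippi values; records 2 and 3 are hypothetical
collection = etree.fromstring(b"""<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">990118204070107026</controlfield>
    <datafield tag="041" ind1="1" ind2=" "><subfield code="a">engswe</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">hypothetical-0002</controlfield>
    <datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">hypothetical-0003</controlfield>
  </record>
</collection>""")

# Control numbers (001) of records whose 041 first indicator flags a translation
translated = collection.xpath(
    "//marc:record[marc:datafield[@tag='041' and @ind1='1']]"
    "/marc:controlfield[@tag='001']/text()",
    namespaces=NS)

# Records with no 041 at all: translation status must come from other fields
no_041 = collection.xpath(
    "//marc:record[not(marc:datafield[@tag='041'])]"
    "/marc:controlfield[@tag='001']/text()",
    namespaces=NS)

print("Translations:", list(translated))
print("No 041 field:", list(no_041))
```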

## Q2(d)(ii): Alternative XML encoding [4 marks]

**Question:** What difference would semantic element names make? Why did they choose datafield/subfield?

---

### SOLUTION

In [None]:
# Q2(d)(ii) SOLUTION:

answer_2d_ii = """
DIFFERENCES WITH SEMANTIC ELEMENT NAMES:

| Aspect              | Generic (datafield/subfield) | Semantic (mainPerson/name)    |
|---------------------|------------------------------|-------------------------------|
| Schema complexity   | Simple, one schema for all   | Complex, hundreds of elements |
| Readability         | Requires MARC knowledge      | Self-documenting              |
| XPath queries       | //datafield[@tag='100']      | //mainPerson                  |
| Validation          | Structure only               | Can validate content types    |
| Extensibility       | Easy to add new codes        | Requires schema changes       |
| Tooling             | Generic MARC processors work | Need custom tools             |

WHY THE LIBRARY OF CONGRESS CHOSE THE GENERIC APPROACH:

1. Lossless round-tripping: any MARC 21 record converts to MARCXML and back
   without loss and without schema changes.

2. Schema stability: MARC tags are three digits (up to 1,000 codes), each with its
   own subfields; defining elements for each would create a massive,
   frequently changing schema.

3. Interoperability: the same tag/indicator/subfield structure can carry other
   MARC variants (e.g. UNIMARC), so one generic design covers them.

4. Existing tooling: MARC processors already understand tag/subfield structure.

5. Separation of concerns: Structure (XML) separate from semantics (MARC docs).
"""
print(answer_2d_ii)
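
The trade-off in the table shows up as soon as one tries to convert. The sketch below uses the standard library's ElementTree to map a generic record to a hypothetical semantic vocabulary: `mainPerson` and `name` come from the table above, `dates` is an assumed name, and only tag 100 is mapped. The local tag 904 has nowhere to go, and the indicators are silently dropped, which is exactly the schema-growth and information-loss problem the generic design avoids.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

generic = ET.fromstring(
    f'<record xmlns="{MARC_NS}">'
    '<datafield tag="100" ind1="1" ind2=" ">'
    '<subfield code="a">Lindgren, Astrid,</subfield>'
    '<subfield code="d">1907-2002.</subfield>'
    '</datafield>'
    '<datafield tag="904" ind1=" " ind2=" "><subfield code="a">Matched</subfield></datafield>'
    '</record>')

# Hypothetical semantic vocabulary: each further tag would need its own element
SEMANTIC = {"100": ("mainPerson", {"a": "name", "d": "dates"})}


def to_semantic(record):
    out = ET.Element("record")
    unmapped = []
    for field in record.findall("marc:datafield", NS):
        mapping = SEMANTIC.get(field.get("tag"))
        if mapping is None:
            unmapped.append(field.get("tag"))  # lost unless the schema grows
            continue
        element_name, sub_names = mapping
        el = ET.SubElement(out, element_name)  # note: ind1/ind2 are not carried over
        for sf in field.findall("marc:subfield", NS):
            ET.SubElement(el, sub_names.get(sf.get("code"), "other")).text = sf.text
    return out, unmapped


semantic, unmapped = to_semantic(generic)
# The same question asked of each encoding
print(generic.find("marc:datafield[@tag='100']/marc:subfield[@code='a']", NS).text)
print(semantic.find("mainPerson/name").text)
print("Unmapped tags:", unmapped)
```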

## Q2(e): BIBFRAME Linked Data [4 marks]

**Question:** What benefits and risks might the Library of Congress expect from moving to BIBFRAME?

---

### SOLUTION

In [None]:
# Q2(e) SOLUTION:

answer_2e = """
BENEFITS OF BIBFRAME/LINKED DATA:

1. Web integration: URIs for resources enable linking across the web.

2. Deduplication: Shared authority URIs (VIAF, Wikidata) reduce redundancy.

3. Richer relationships: RDF expresses complex relationships 
   (translations, adaptations).

4. Interoperability: BIBFRAME terms can be mapped to widely used vocabularies
   (e.g. schema.org), enabling cross-domain queries.

5. Discovery: Search engines can understand and index bibliographic data.

6. Flexibility: Add new properties without breaking existing data.


RISKS OF THE MOVE:

1. Migration cost: millions of existing MARC records must be converted, and
   exchange with libraries still using MARC must continue during the transition.

2. Training: Librarians must learn RDF, SPARQL, new workflows.

3. Tool ecosystem: Decades of MARC tools must be replaced.

4. Data loss: Some MARC nuances may not map cleanly to BIBFRAME.

5. Complexity: RDF/Linked Data has steeper learning curve.

6. Dependency: Relying on external URIs (Wikidata) creates dependencies.

7. Performance: SPARQL queries over large triple stores can be slower than
   queries against mature, index-tuned MARC-based catalogue systems.
"""
print(answer_2e)
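
To illustrate the "richer relationships" and "web integration" benefits, here is a minimal sketch of a few triples for the Pippi translation, written in plain Python rather than with an RDF library. The class and property names (`Work`, `Instance`, `instanceOf`, `translationOf`) are taken from the BIBFRAME 2.0 vocabulary as we understand it and should be checked against the Library of Congress definitions; the `example.org` URIs are hypothetical stand-ins for a library's own identifiers. Modelling the English text as its own Work linked to the Swedish one follows BIBFRAME's general approach, but is a simplification.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical base URI for this library's resources

# The translation is its own Work, linked to the original Work
triples = {
    (EX + "work/pippi-swe", RDF_TYPE, BF + "Work"),
    (EX + "work/pippi-eng", RDF_TYPE, BF + "Work"),
    (EX + "work/pippi-eng", BF + "translationOf", EX + "work/pippi-swe"),
    (EX + "instance/london-1954", RDF_TYPE, BF + "Instance"),
    (EX + "instance/london-1954", BF + "instanceOf", EX + "work/pippi-eng"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def translated_instances():
    """Instances whose Work is a translation: a two-hop graph traversal."""
    return {s for s, p, o in triples
            if p == RDF_TYPE and o == BF + "Instance"
            and any(objects(w, BF + "translationOf") for w in objects(s, BF + "instanceOf"))}


def to_ntriples(graph):
    # Every term here is a URI, so each is written as <...>
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


print(to_ntriples(triples))
print(translated_instances())
```

Because the original Work is identified by a URI, another library (or Wikidata) could point at the same resource; that shared identity is also where the dependency risk comes from.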

---

# Question 3: Conference Management System [30 marks]

## Q3(a): ER model for conference system [14 marks]

---

### SOLUTION

In [None]:
# Q3(a) SOLUTION - ER Model:

er_model_conference = """
ENTITIES AND ATTRIBUTES:

1. Conference
   - conference_id (PK)
   - name
   - start_date
   - end_date
   - location
   - registration_deadline
   - submission_deadline

2. Person
   - person_id (PK)
   - name
   - email
   - affiliation
   - dietary_requirements

3. Paper
   - paper_id (PK)
   - title
   - abstract
   - pdf_path
   - submission_date
   - status (submitted/under_review/accepted/rejected)

4. Review
   - review_id (PK)
   - paper_id (FK)
   - reviewer_id (FK -> Person)
   - score
   - feedback
   - confidence
   - recommendation
   - submitted_date

5. Registration
   - registration_id (PK)
   - person_id (FK)
   - conference_id (FK)
   - name_tag_text
   - registration_date
   - amount_paid

6. Day
   - day_id (PK)
   - conference_id (FK)
   - date
   - description

7. Workshop
   - workshop_id (PK)
   - conference_id (FK)
   - name
   - date
   - capacity
   - extra_cost

8. Dinner
   - dinner_id (PK)
   - conference_id (FK)
   - date
   - venue
   - price

RELATIONSHIPS:

1. Submits: Person -> Paper (M:N)
   Attributes: role (author/corresponding_author), author_order

2. AssignedTo: Paper -> Person (M:N) [for reviewers]
   Attributes: assignment_date

3. RegistersFor: Person -> Conference (M:N) via Registration

4. AttendsDay: Registration -> Day (M:N)

5. AttendsWorkshop: Registration -> Workshop (M:N)

6. AttendsDinner: Registration -> Dinner (M:N)
   Attributes: ticket_count

7. PCMember: Person -> Conference (M:N)
   Attributes: role (chair/pc_member)

8. SubmittedTo: Paper -> Conference (M:1)
"""
print(er_model_conference)

## Q3(b): SQL query for dinner tickets [4 marks]

---

### SOLUTION

In [None]:
%%sql
-- Q3(b) SOLUTION - Create tables
DROP TABLE IF EXISTS AttendsDinner;
DROP TABLE IF EXISTS AttendsDay;
DROP TABLE IF EXISTS AttendsWorkshop;
DROP TABLE IF EXISTS Review;
DROP TABLE IF EXISTS PaperAuthor;
DROP TABLE IF EXISTS Paper;
DROP TABLE IF EXISTS Registration;
DROP TABLE IF EXISTS Workshop;
DROP TABLE IF EXISTS Dinner;
DROP TABLE IF EXISTS Day;
DROP TABLE IF EXISTS Person;
DROP TABLE IF EXISTS Conference;

CREATE TABLE Conference (
    conference_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL,
    start_date DATE,
    end_date DATE,
    location VARCHAR(200)
);

CREATE TABLE Person (
    person_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE,
    affiliation VARCHAR(200)
);

CREATE TABLE Dinner (
    dinner_id INT PRIMARY KEY AUTO_INCREMENT,
    conference_id INT,
    date DATE,
    venue VARCHAR(200),
    price DECIMAL(10,2),
    FOREIGN KEY (conference_id) REFERENCES Conference(conference_id)
);

CREATE TABLE Registration (
    registration_id INT PRIMARY KEY AUTO_INCREMENT,
    person_id INT,
    conference_id INT,
    name_tag_text VARCHAR(100),
    registration_date DATE,
    FOREIGN KEY (person_id) REFERENCES Person(person_id),
    FOREIGN KEY (conference_id) REFERENCES Conference(conference_id)
);

CREATE TABLE AttendsDinner (
    registration_id INT,
    dinner_id INT,
    ticket_count INT DEFAULT 1,
    PRIMARY KEY (registration_id, dinner_id),
    FOREIGN KEY (registration_id) REFERENCES Registration(registration_id),
    FOREIGN KEY (dinner_id) REFERENCES Dinner(dinner_id)
);

In [None]:
%%sql
-- Insert sample data
INSERT INTO Conference (name, start_date, end_date, location) VALUES
('SIGMOD 2025', '2025-06-22', '2025-06-27', 'Berlin, Germany');

INSERT INTO Person (name, email, affiliation) VALUES
('Alice Smith', 'alice@university.edu', 'University A'),
('Bob Jones', 'bob@college.edu', 'College B'),
('Carol White', 'carol@institute.org', 'Institute C');

INSERT INTO Dinner (conference_id, date, venue, price) VALUES
(1, '2025-06-25', 'Grand Hotel Berlin', 75.00);

INSERT INTO Registration (person_id, conference_id, name_tag_text, registration_date) VALUES
(1, 1, 'Alice Smith - University A', '2025-04-15'),
(2, 1, 'Bob Jones - College B', '2025-04-20'),
(3, 1, 'Carol White - Institute C', '2025-05-01');

INSERT INTO AttendsDinner (registration_id, dinner_id, ticket_count) VALUES
(1, 1, 2),  -- Alice ordered 2 tickets
(2, 1, 1),  -- Bob ordered 1 ticket
(3, 1, 3);  -- Carol ordered 3 tickets

In [None]:
%%sql
-- Q3(b) SOLUTION - Query for total dinner tickets
SELECT SUM(ticket_count) AS total_dinner_tickets
FROM AttendsDinner;

In [None]:
%%sql
-- Alternative: with conference details
SELECT 
    c.name AS conference,
    d.date AS dinner_date,
    d.venue,
    COUNT(*) AS registrations_with_dinner,
    SUM(ad.ticket_count) AS total_tickets
FROM Conference c
JOIN Dinner d ON c.conference_id = d.conference_id
JOIN AttendsDinner ad ON d.dinner_id = ad.dinner_id
GROUP BY c.conference_id, d.dinner_id;
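
The MySQL queries above can be cross-checked without a server. The sketch below uses Python's built-in `sqlite3` with the same sample tickets (2 + 1 + 3). The second conference and its dinner are hypothetical, added to show one design choice that is not in the original answer: a `LEFT JOIN` with `COALESCE`, so that a dinner with no tickets yet reports 0 instead of disappearing from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Conference (conference_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Dinner (dinner_id INTEGER PRIMARY KEY,
                     conference_id INTEGER REFERENCES Conference(conference_id),
                     venue TEXT);
CREATE TABLE AttendsDinner (registration_id INTEGER,
                            dinner_id INTEGER REFERENCES Dinner(dinner_id),
                            ticket_count INTEGER DEFAULT 1,
                            PRIMARY KEY (registration_id, dinner_id));
INSERT INTO Conference VALUES (1, 'SIGMOD 2025'), (2, 'Hypothetical Conf 2025');
INSERT INTO Dinner VALUES (1, 1, 'Grand Hotel Berlin'), (2, 2, 'Hypothetical Venue');
INSERT INTO AttendsDinner VALUES (1, 1, 2), (2, 1, 1), (3, 1, 3);
""")

# Total tickets across all dinners (the Q3(b) answer)
total = conn.execute("SELECT SUM(ticket_count) FROM AttendsDinner").fetchone()[0]

# Per dinner, keeping dinners that have no tickets yet
per_dinner = conn.execute("""
    SELECT c.name, d.venue, COALESCE(SUM(ad.ticket_count), 0) AS total_tickets
    FROM Conference c
    JOIN Dinner d ON d.conference_id = c.conference_id
    LEFT JOIN AttendsDinner ad ON ad.dinner_id = d.dinner_id
    GROUP BY d.dinner_id
    ORDER BY d.dinner_id
""").fetchall()

print(total)  # 6
print(per_dinner)
```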

## Q3(c): Double-blind review security [4 marks]

---

### SOLUTION

In [None]:
# Q3(c) SOLUTION:

answer_3c = """
MODEL CHANGES NEEDED:

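1. Create views that hide sensitive columns:
   - ReviewerPaperView: shows papers without author information
   - AuthorReviewView: shows reviews without reviewer information
   MySQL views run with their definer's privileges by default (SQL SECURITY
   DEFINER), so a role granted SELECT on a view needs no privilege on the
   underlying tables. A production AuthorReviewView would also restrict rows
   to the logged-in author's own papers (e.g. by matching CURRENT_USER()).

2. Add anonymized_pdf_path column to Paper (PDF with author names removed)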

GRANT STATEMENTS:
"""
print(answer_3c)

In [None]:
%%sql
-- Q3(c) SOLUTION - Create tables for reviews
CREATE TABLE Paper (
    paper_id INT PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(300),
    abstract TEXT,
    pdf_path VARCHAR(500),
    anonymized_pdf_path VARCHAR(500),
    status ENUM('submitted', 'under_review', 'accepted', 'rejected') DEFAULT 'submitted'
);

CREATE TABLE PaperAuthor (
    paper_id INT,
    person_id INT,
    author_order INT,
    is_corresponding BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (paper_id, person_id),
    FOREIGN KEY (paper_id) REFERENCES Paper(paper_id),
    FOREIGN KEY (person_id) REFERENCES Person(person_id)
);

CREATE TABLE Review (
    review_id INT PRIMARY KEY AUTO_INCREMENT,
    paper_id INT,
    reviewer_id INT,
    score INT,
    feedback TEXT,
    recommendation ENUM('accept', 'weak_accept', 'weak_reject', 'reject'),
    FOREIGN KEY (paper_id) REFERENCES Paper(paper_id),
    FOREIGN KEY (reviewer_id) REFERENCES Person(person_id)
);

In [None]:
%%sql
-- Q3(c) SOLUTION - Views and Grants for double-blind

-- View for reviewers: shows papers WITHOUT author info
CREATE OR REPLACE VIEW ReviewerPaperView AS
SELECT paper_id, title, abstract, anonymized_pdf_path, status
FROM Paper;

-- View for authors: shows reviews WITHOUT reviewer info  
CREATE OR REPLACE VIEW AuthorReviewView AS
SELECT paper_id, score, feedback, recommendation
FROM Review;

In [None]:
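# Q3(c) SOLUTION - GRANT statements (conceptual: printed rather than executed,
# because they need real MySQL 8.0+ user accounts)

grant_statements = """
-- Create roles
CREATE ROLE reviewer_role, author_role, pc_chair_role;

-- Reviewers: can see papers (without authors), can write reviews
GRANT SELECT ON ReviewerPaperView TO reviewer_role;
GRANT SELECT, INSERT, UPDATE ON Review TO reviewer_role;
-- Reviewers CANNOT see PaperAuthor: no privilege on it is ever granted, and
-- privileges are deny-by-default. (REVOKE of a privilege that was never
-- granted raises an error in MySQL, so none is needed here.)

-- Authors: can see papers and anonymized reviews
GRANT SELECT ON Paper TO author_role;
-- Note: this exposes every submission; a row-filtered view limited to the
-- author's own papers would be tighter.
GRANT SELECT ON AuthorReviewView TO author_role;
-- Authors CANNOT see reviewer_id: they have no privilege on Review itself,
-- only on AuthorReviewView

-- PC Chair: full access for decision making
-- (MySQL's GRANT accepts one table per statement)
GRANT ALL ON Paper TO pc_chair_role;
GRANT ALL ON Review TO pc_chair_role;
GRANT ALL ON PaperAuthor TO pc_chair_role;

-- Attaching a role to an account ('rev1' is a hypothetical user)
-- GRANT reviewer_role TO 'rev1'@'localhost';
-- SET DEFAULT ROLE reviewer_role TO 'rev1'@'localhost';
"""
print(grant_statements)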
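
MySQL enforces the access-control half of this design through GRANT on the views. Python's built-in `sqlite3` has no users or GRANT, so the sketch below checks only the other half: that each view's column list leaves out the identifying columns. The tables are trimmed versions of the ones above, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Paper (paper_id INTEGER PRIMARY KEY, title TEXT, abstract TEXT,
                    pdf_path TEXT, anonymized_pdf_path TEXT, status TEXT);
CREATE TABLE Review (review_id INTEGER PRIMARY KEY, paper_id INTEGER, reviewer_id INTEGER,
                     score INTEGER, feedback TEXT, recommendation TEXT);
-- Same projections as the MySQL views above
CREATE VIEW ReviewerPaperView AS
    SELECT paper_id, title, abstract, anonymized_pdf_path, status FROM Paper;
CREATE VIEW AuthorReviewView AS
    SELECT paper_id, score, feedback, recommendation FROM Review;
INSERT INTO Paper VALUES (1, 'Hypothetical paper', 'Hypothetical abstract',
                          'papers/1.pdf', 'papers/1-anon.pdf', 'under_review');
INSERT INTO Review VALUES (1, 1, 42, 4, 'Clear contribution', 'weak_accept');
""")


def visible_columns(view):
    """Column names a user querying the view would see."""
    return [d[0] for d in conn.execute(f"SELECT * FROM {view}").description]


print(visible_columns("ReviewerPaperView"))
print(visible_columns("AuthorReviewView"))
```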

## Q3(d): Denormalization for large conferences [4 marks]

---

### SOLUTION

In [None]:
# Q3(d) SOLUTION:

answer_3d = """
RESPONSE: Denormalization is likely UNNECESSARY and potentially harmful.

JUSTIFICATION:

1. SCALE PERSPECTIVE:
   - 1,000 attendees is SMALL for modern databases
   - MySQL easily handles millions of rows
   - Conference queries are read-heavy but not high-frequency

2. PROBLEMS WITH DENORMALIZATION:
   - Update anomalies: changing email requires multiple updates
   - Storage waste: redundant data across tables
   - Consistency risks: data can become inconsistent

3. BETTER ALTERNATIVES:
   a) Proper indexing:
      CREATE INDEX idx_reg_conf ON Registration(conference_id);
      CREATE INDEX idx_paper_status ON Paper(status);
   
   b) Query optimization: Use EXPLAIN to find slow queries
   
   c) Caching: Cache computed values at application layer
   
   d) Read replicas: For high read load, add replica databases

4. WHEN DENORMALIZATION MIGHT HELP:
   - If profiling shows specific slow queries
   - For historical reporting (snapshot tables)
   - For dashboards (summary tables refreshed on a schedule; MySQL has no
     native materialized views)

BOTTOM LINE: At 1,000 attendees with proper indexes, queries should typically
complete in milliseconds. Denormalization adds complexity without measurable
benefit at this scale. "Premature optimization is the root of all evil" -
Donald Knuth.
"""
print(answer_3d)
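
The "indexes first" advice can be tried at roughly the scale discussed. This sketch uses Python's `sqlite3` rather than MySQL, so the plan text differs from MySQL's `EXPLAIN` output. It loads 1,000 hypothetical registrations and compares the query plan before and after adding the `idx_reg_conf` index from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Registration (registration_id INTEGER PRIMARY KEY, "
             "person_id INTEGER, conference_id INTEGER)")
# 1,000 hypothetical registrations spread evenly over 5 conferences
conn.executemany("INSERT INTO Registration (person_id, conference_id) VALUES (?, ?)",
                 [(i, i % 5 + 1) for i in range(1000)])


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT COUNT(*) FROM Registration WHERE conference_id = 3"
before = plan(query)  # full table scan
conn.execute("CREATE INDEX idx_reg_conf ON Registration(conference_id)")
after = plan(query)   # lookup through idx_reg_conf
print(before)
print(after)
print(conn.execute(query).fetchone()[0])  # 200
```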

## Q3(e): XML database vs relational [4 marks]

---

### SOLUTION

In [None]:
# Q3(e) SOLUTION:

answer_3e = """
COMPARISON: XML DATABASE vs RELATIONAL FOR CONFERENCE SYSTEM

POTENTIAL ADVANTAGES OF XML:

1. Document handling: submission metadata (title, abstract, author list) is
   document-shaped, so it maps naturally to XML

2. Flexible schema: Easy to add optional fields (dietary needs, accessibility)

3. Hierarchical data: Paper -> Authors -> Affiliations nests naturally

4. Integration: If other systems use XML (submission portals)


LIKELY DISADVANTAGES:

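1. Query complexity: "count dinner tickets by conference" is a one-line SQL
   GROUP BY; XQuery 3.0 has a group by clause, but joins across documents
   are more verbose

2. Aggregation: SQL aggregation functions are simpler and more mature

3. Integrity constraints: foreign keys are declarative in SQL, whereas XML
   Schema's ID/IDREF and key/keyref work within a single document. Rules
   such as "reviewer cannot be author of same paper" need triggers or
   application logic in either model, but are easier to express over tables.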

4. Performance: Relational query optimizers are more mature

5. Skills: Developers more familiar with SQL than XQuery

6. Tooling: Fewer ORMs, admin tools, hosting options for XML databases


VERDICT: For a conference system with structured entities (Person, Paper, 
Review, Registration) and relational queries (join papers with reviews, 
count registrations), a RELATIONAL DATABASE is more appropriate.

XML databases excel for document-centric, hierarchical data - not for 
transactional systems with many relationships.
"""
print(answer_3e)

---

# Summary

## Key Concepts Covered

| Topic | Key Points |
|-------|------------|
| Database Selection | Match DB type to data structure and query patterns |
| MARC/Bibliographic | Document DB for flexible schema; FRBR model for ER |
| XML Design | Generic vs semantic elements; schema trade-offs |
| Linked Data | BIBFRAME benefits (web integration) and risks (migration) |
| ER Modeling | Identify entities, attributes, relationships, cardinalities |
| SQL Security | Views + GRANT/REVOKE for column-level access control |
| Denormalization | Avoid premature optimization; use indexes first |
| XML vs Relational | Relational better for structured, transactional data |