
## SECTION 1: Question (d) – XML & XPath

### Context
The question: “Given this snippet, how many results does the XPath
`//disk[@xml:id="1847336"]/track[@duration>150]/*` select?”

We’ll parse the snippet, run that XPath, and see the count of matched nodes.

### Install & Parse XML with `lxml`

In [1]:
!pip install lxml

from lxml import etree
from IPython.display import display, Markdown

xml_data = """
<collection>
  <disk xml:id="d1847336">
    <title>The Greatest Hits Ever: Volume 123</title>
    <tracks>
      <track no="1" duration="193">
        <title>What is wrong with parsley?</title>
        <artist>Herbal Reasoning</artist>
      </track>
      <track no="2" duration="167">
        <title>Love threw me a googly</title>
        <artist>Botham and the Fielders</artist>
      </track>
      <track no="3" duration="121">
        <title>Comedy farm</title>
        <artist>Just weird</artist>
      </track>
    </tracks>
  </disk>
</collection>
"""

root = etree.fromstring(xml_data)
print("Parsed root tag:", root.tag)

Parsed root tag: collection


### Running the XPath Query


In [3]:
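# Attribute tests need the "@" prefix; the trailing "/*" selects each track's child elements.
xp_expr = '//disk[@xml:id="d1847336"]/track[@duration>150]/*'
nodes = root.xpath(xp_expr)

print("Number of matched nodes:", len(nodes))
for i, node in enumerate(nodes, start=1):
    snippet = etree.tostring(node, pretty_print=True, encoding='unicode').strip()
    print(f"Match {i}:\n{snippet}\n")

Number of matched nodes: 0


**Explanation:** The predicate uses `@xml:id` (with the `@` prefix) and the id exactly as it is written in the snippet, `d1847336`. The query still returns 0 here because the `<track>` elements sit inside `<tracks>`, so the child step `disk/track` matches nothing. Had the tracks been direct children of `<disk>`, 2 tracks would qualify (durations 193 and 167), each with 2 children (`<title>`, `<artist>`), for a total of 4.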
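To make the effect of the child step concrete, here is a minimal sketch that runs three variants of the expression against the same snippet. The variable names (`direct`, `via_tracks`, `descendant`) are illustrative only, and the two extra path variants are assumptions added for comparison, not part of the exam question.

```python
from lxml import etree

xml_data = """<collection>
  <disk xml:id="d1847336">
    <title>The Greatest Hits Ever: Volume 123</title>
    <tracks>
      <track no="1" duration="193">
        <title>What is wrong with parsley?</title>
        <artist>Herbal Reasoning</artist>
      </track>
      <track no="2" duration="167">
        <title>Love threw me a googly</title>
        <artist>Botham and the Fielders</artist>
      </track>
      <track no="3" duration="121">
        <title>Comedy farm</title>
        <artist>Just weird</artist>
      </track>
    </tracks>
  </disk>
</collection>"""
root = etree.fromstring(xml_data)

disk = '//disk[@xml:id="d1847336"]'

# "/track" only looks at direct children of <disk>; there are none in this snippet.
direct = root.xpath(disk + '/track[@duration>150]/*')

# Stepping through <tracks> reaches the tracks (assumed variant, for comparison).
via_tracks = root.xpath(disk + '/tracks/track[@duration>150]/*')

# "//track" searches all descendants of <disk> (assumed variant, for comparison).
descendant = root.xpath(disk + '//track[@duration>150]/*')

print(len(direct), len(via_tracks), len(descendant))  # 0 4 4
print([el.tag for el in via_tracks])
```

Only the variants that actually reach the `<track>` elements return the 4 child nodes; the direct-child form returns none for this document shape.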

## SECTION 2: Question (h) – SQL Joins for “Shug Avery”

### Context
We want to find staff members (Employees) who have had interactions
with a client named “Shug Avery.” The exam question asks, “How might
the query continue?” and shows multiple join approaches.

Below, we’ll create a small MySQL DB with tables: `Client`, `Employee`, `Meeting`,
and sample data. Then we can attempt different FROM/JOIN/WHERE styles.

### Install & Setup MySQL

In [5]:
# 1) MySQL installation (on Colab or Debian/Ubuntu)
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# 2) Create user & DB
!mysql -e "CREATE USER IF NOT EXISTS 'examuser'@'localhost' IDENTIFIED BY 'exampass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS question_h_db;"
!mysql -e "GRANT ALL PRIVILEGES ON question_h_db.* TO 'examuser'@'localhost';"

# 3) Python libs for SQL magic
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0
%reload_ext sql

import pandas as pd
pd.set_option('display.max_rows', 10)

# 4) Connect
%sql mysql+pymysql://examuser:exampass@localhost/question_h_db

print("MySQL ready for question (h) scenario.")



 * Starting MySQL database server mysqld
   ...done.
MySQL ready for question (h) scenario.


### Create Tables & Insert Sample Data

In [6]:
%%sql
DROP TABLE IF EXISTS Meeting;
DROP TABLE IF EXISTS Employee;
DROP TABLE IF EXISTS Client;

CREATE TABLE Client (
  ClientID INT PRIMARY KEY AUTO_INCREMENT,
  givenName VARCHAR(100),
  familyName VARCHAR(100)
);

CREATE TABLE Employee (
  EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
  givenName VARCHAR(100),
  familyName VARCHAR(100)
);

CREATE TABLE Meeting (
  ID INT PRIMARY KEY AUTO_INCREMENT,
  ClientID INT,
  EmployeeID INT,
  FOREIGN KEY (ClientID) REFERENCES Client(ClientID),
  FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID)
);

INSERT INTO Client (givenName, familyName) VALUES
('Shug', 'Avery'),
('Sam', 'Adams'),
('Jane', 'Doe');

INSERT INTO Employee (givenName, familyName) VALUES
('Alice', 'Smith'),
('Bob', 'Marley');

-- Some Meeting records
INSERT INTO Meeting (ClientID, EmployeeID) VALUES
(1,1),  -- Shug Avery with Alice Smith
(1,2),  -- Shug Avery with Bob Marley
(2,1),  -- Sam Adams with Alice Smith
(3,2);  -- Jane Doe with Bob Marley

 * mysql+pymysql://examuser:***@localhost/question_h_db
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
3 rows affected.
2 rows affected.
4 rows affected.


[]

### Queries for “Shug Avery”

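In [7]:
%%sql
-- One possible continuation: explicit INNER JOINs from Employee through Meeting to Client
SELECT DISTINCT Employee.givenName, Employee.familyName
FROM Employee
  INNER JOIN Meeting ON Employee.EmployeeID = Meeting.EmployeeID
  INNER JOIN Client ON Meeting.ClientID = Client.ClientID
WHERE Client.givenName = 'Shug'
  AND Client.familyName = 'Avery';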
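To compare the join styles the question alludes to without a running MySQL server, the sketch below rebuilds the same three tables and sample rows in an in-memory SQLite database. SQLite is a stand-in here, not what the notebook uses, and the three query texts are assumed examples of "how the query might continue" rather than the exam's actual options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client   (ClientID INTEGER PRIMARY KEY, givenName TEXT, familyName TEXT);
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, givenName TEXT, familyName TEXT);
CREATE TABLE Meeting  (ID INTEGER PRIMARY KEY,
                       ClientID INTEGER REFERENCES Client(ClientID),
                       EmployeeID INTEGER REFERENCES Employee(EmployeeID));
INSERT INTO Client (givenName, familyName) VALUES ('Shug','Avery'), ('Sam','Adams'), ('Jane','Doe');
INSERT INTO Employee (givenName, familyName) VALUES ('Alice','Smith'), ('Bob','Marley');
INSERT INTO Meeting (ClientID, EmployeeID) VALUES (1,1), (1,2), (2,1), (3,2);
""")

queries = {
    # Comma-separated FROM list, join conditions in WHERE
    "comma join": """
        SELECT DISTINCT Employee.givenName, Employee.familyName
        FROM Employee, Meeting, Client
        WHERE Employee.EmployeeID = Meeting.EmployeeID
          AND Meeting.ClientID = Client.ClientID
          AND Client.givenName = 'Shug' AND Client.familyName = 'Avery'
        ORDER BY Employee.givenName""",
    # Explicit INNER JOIN ... ON
    "inner join": """
        SELECT DISTINCT Employee.givenName, Employee.familyName
        FROM Employee
          INNER JOIN Meeting ON Employee.EmployeeID = Meeting.EmployeeID
          INNER JOIN Client ON Meeting.ClientID = Client.ClientID
        WHERE Client.givenName = 'Shug' AND Client.familyName = 'Avery'
        ORDER BY Employee.givenName""",
    # Subquery with IN
    "subquery": """
        SELECT givenName, familyName
        FROM Employee
        WHERE EmployeeID IN (
            SELECT Meeting.EmployeeID
            FROM Meeting INNER JOIN Client ON Meeting.ClientID = Client.ClientID
            WHERE Client.givenName = 'Shug' AND Client.familyName = 'Avery')
        ORDER BY givenName""",
}

results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
for label, rows in results.items():
    print(f"{label}: {rows}")
```

With this sample data all three forms return the same two employees, Alice Smith and Bob Marley, since both have a `Meeting` row with client 1.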


## SECTION 3: Question (i) – MongoDB Actors Born Before 1957

### Context
We want to see which query correctly finds actors born before 1957, i.e. those whose `dateOfBirth` is earlier than `ISODate("1957-01-01")`.

### Setup & Insert Data in MongoDB

In [9]:
# Install MongoDB's dependencies
!sudo wget http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb

# Import the public key used by the package management system
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add -

# Create a list file for MongoDB
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list

# Reload the local package database
!apt-get update > /dev/null

# Install the MongoDB packages
!apt-get install -y mongodb-org > /dev/null

# Install pymongo
!pip install -q pymongo

# Create Data Folder
!mkdir -p /data/db

# Start MongoDB
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

from pymongo import MongoClient

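# Establish connection to MongoDB.
# MongoClient connects lazily, so send a ping to confirm the server is reachable.
try:
    client = MongoClient('localhost', 27017, serverSelectionTimeoutMS=5000)
    client.admin.command("ping")
    print("Connected to MongoDB")
except Exception as e:
    print("Error connecting to MongoDB: ", e)
    raise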

# List databases to check the connection
try:
    databases = client.list_database_names()
    print("Databases:", databases)
except Exception as e:
    print("Error listing databases: ", e)

# Retrieve server status
try:
    server_status = client.admin.command("serverStatus")
    print("Server Status:", server_status)
except Exception as e:
    print("Error retrieving server status: ", e)

# Perform basic database operations (Create, Read)
try:
    db = client.test_db
    collection = db.test_collection
    # Insert a document
    insert_result = collection.insert_one({"name": "test", "value": 123})
    print("Insert operation result:", insert_result.inserted_id)
    # Read a document
    read_result = collection.find_one({"name": "test"})
    print("Read operation result:", read_result)
except Exception as e:
    print("Error performing database operations: ", e)


### Insert Sample Actors with Mongo Shell Commands

In [11]:
# 1) We'll prepare a multiline string for data insertion.
insert_actors = """
db.actors.insert({
  "name": "ActorBorn1956",
  "dateOfBirth": ISODate("1956-05-10T00:00:00Z")
});

db.actors.insert({
  "name": "ActorBorn1958",
  "dateOfBirth": ISODate("1958-03-01T00:00:00Z")
});

db.actors.insert({
  "name": "ActorBorn1930",
  "dateOfBirth": ISODate("1930-07-20T00:00:00Z")
});
"""

# 2) Execute the shell commands
!mongo --quiet --eval '{insert_actors}'

print("Inserted 3 sample actors into the 'actors' collection.")

WriteResult({ "nInserted" : 1 })
Inserted 3 sample actors into the 'actors' collection.


### Confirm Insert with find().pretty()

In [12]:
check_insert = """
db.actors.find().pretty()
"""
!mongo --quiet --eval '{check_insert}'


{
	"_id" : ObjectId("67a6b294e4e06b976387ce87"),
	"name" : "ActorBorn1956",
	"dateOfBirth" : ISODate("1956-05-10T00:00:00Z")
}
{
	"_id" : ObjectId("67a6b294e4e06b976387ce88"),
	"name" : "ActorBorn1958",
	"dateOfBirth" : ISODate("1958-03-01T00:00:00Z")
}
{
	"_id" : ObjectId("67a6b294e4e06b976387ce89"),
	"name" : "ActorBorn1930",
	"dateOfBirth" : ISODate("1930-07-20T00:00:00Z")
}


### Testing the “Correct” Queries

In [14]:
# Option (i): find() with a "$lt: ISODate(...)" filter on dateOfBirth
query = """
db.actors.find(
  { "dateOfBirth": { $lt: ISODate("1957-01-01") } }
).pretty();
"""

!mongo --quiet --eval '{query}'

{
	"_id" : ObjectId("67a6b294e4e06b976387ce87"),
	"name" : "ActorBorn1956",
	"dateOfBirth" : ISODate("1956-05-10T00:00:00Z")
}
{
	"_id" : ObjectId("67a6b294e4e06b976387ce89"),
	"name" : "ActorBorn1930",
	"dateOfBirth" : ISODate("1930-07-20T00:00:00Z")
}
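The choice between a date operand and a string operand matters because MongoDB's comparison operators generally only match fields whose BSON type is the same as the query value's type. The sketch below imitates that rule in plain Python, with no server involved; `matches_lt` is a hypothetical helper written for this illustration, not a pymongo or MongoDB function.

```python
from datetime import datetime, timezone

# The three sample actors inserted earlier, with dates as Python datetimes.
actors = [
    {"name": "ActorBorn1956", "dateOfBirth": datetime(1956, 5, 10, tzinfo=timezone.utc)},
    {"name": "ActorBorn1958", "dateOfBirth": datetime(1958, 3, 1, tzinfo=timezone.utc)},
    {"name": "ActorBorn1930", "dateOfBirth": datetime(1930, 7, 20, tzinfo=timezone.utc)},
]

def matches_lt(value, operand):
    """Hypothetical stand-in for $lt: compare only when both sides share a type."""
    if type(value) is not type(operand):
        return False
    return value < operand

# Date operand, analogous to {dateOfBirth: {$lt: ISODate("1957-01-01")}}
cutoff = datetime(1957, 1, 1, tzinfo=timezone.utc)
by_date = [a["name"] for a in actors if matches_lt(a["dateOfBirth"], cutoff)]

# String operand, analogous to {dateOfBirth: {$lt: "1957-01-01"}}
by_string = [a["name"] for a in actors if matches_lt(a["dateOfBirth"], "1957-01-01")]

print(by_date)    # ['ActorBorn1956', 'ActorBorn1930']
print(by_string)  # []
```

Under this imitation, only the date operand finds the two actors born before 1957; the string operand matches nothing because the stored values are dates.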


## SECTION 4: Question (j) – RecipeML & DTD

### Context
We have a DTD snippet for a `<recipe>` element:
```
<!ELEMENT recipe (head, description*, equipment?, ingredients, directions, nutrition?, diet-exchanges?)>
```
The question asks which statements follow from this content model, e.g. "there is exactly one `<ingredients>` element" and "`<ingredients>` must come before `<directions>`".

Below we show a minimal recipe snippet that obeys that order.

### Create a Minimal DTD & Save to File

In [18]:
dtd_content = """\
<!ELEMENT recipe (head, description*, equipment?, ingredients, directions, nutrition?, diet-exchanges?)>
<!ELEMENT head (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT equipment (#PCDATA)>
<!ELEMENT ingredients (item+)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item qty CDATA #IMPLIED>
<!ELEMENT directions (#PCDATA)>
<!ELEMENT nutrition (#PCDATA)>
<!ELEMENT diet-exchanges (#PCDATA)>

<!-- For demonstration, we skip real definitions of %common.att; %measurement.att; -->
"""

# We'll write it to a local file called recipe.dtd
with open("recipe.dtd", "w") as f:
    f.write(dtd_content)

print("Created recipe.dtd with minimal structure.")


Created recipe.dtd with minimal structure.


### Create the Recipe XML snippet referencing `recipe.dtd`

In [21]:
recipe_xml = """\
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
   <head>Example Recipe Head</head>
   <description>Quick description line 1</description>
   <description>Quick description line 2</description>
   <ingredients>
     <item qty="2">Eggs</item>
   </ingredients>
   <directions>Beat eggs thoroughly</directions>
</recipe>
"""

with open("test_recipe.xml", "w") as f:
    f.write(recipe_xml)

print("Created test_recipe.xml referencing our minimal DTD.")


Created test_recipe.xml referencing our minimal DTD.


### Validate the XML Using lxml

In [22]:
from lxml import etree

# Create parser with DTD validation
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
try:
    tree = etree.parse("test_recipe.xml", parser)
    print("DTD validation: SUCCESS. The XML conforms to the minimal DTD.")
except etree.XMLSyntaxError as e:
    print("DTD validation: FAILED!")
    print("Reason:", e)


DTD validation: SUCCESS. The XML conforms to the minimal DTD.
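A DTD content model behaves like a regular expression over the sequence of child element names: `,` is sequence, `*` is zero or more, `?` is optional. As a rough sketch, the `<recipe>` model can be rewritten as a Python regex to test individual statements such as "exactly one `<ingredients>`" or "`<ingredients>` before `<directions>`". This only checks the direct children of `<recipe>`, not nested content such as `item+`, and the names `RECIPE_MODEL`, `children_match` and `samples` are made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# (head, description*, equipment?, ingredients, directions, nutrition?, diet-exchanges?)
# expressed as a regex over child names, each followed by a single space.
RECIPE_MODEL = re.compile(
    r"head (description )*(equipment )?ingredients directions "
    r"(nutrition )?(diet-exchanges )?"
)

def children_match(xml_text):
    """Return True if the root's child sequence fits the <recipe> content model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return RECIPE_MODEL.fullmatch(sequence) is not None

samples = {
    "valid": "<recipe><head/><ingredients/><directions/></recipe>",
    "with nutrition": "<recipe><head/><ingredients/><directions/><nutrition/></recipe>",
    "directions first": "<recipe><head/><directions/><ingredients/></recipe>",
    "two ingredients": "<recipe><head/><ingredients/><ingredients/><directions/></recipe>",
    "no ingredients": "<recipe><head/><directions/></recipe>",
}

results = {name: children_match(xml) for name, xml in samples.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

The first two samples fit the model and the last three do not, which matches the statements that `<ingredients>` is required, occurs exactly once, and precedes `<directions>`.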
