<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/Lectures/CM3010%20March%202022.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Question 2: XML Family Tree (16th Century English Monarchy)
The exam provides the following XML snippet, which describes Henry VII and his children:

```xml
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"
         from="1485-08-22" to="1509-04-21" />
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"
               from="1509-04-22" to="1547-01-28" />
        <!-- more nested children/relationship omitted for brevity -->
      </royal>
    </children>
  </relationship>
</royal>
```

We'll **parse** a larger sample, then work through the exam parts:
- (a) Identify elements vs. attributes
- (b) `//title[@rank="king" and @regnal="VIII"]/../royal[@name="Henry"]`
- (c) `//title[@rank="king" or @rank="queen"]/../relationship/children/royal/relationship/children/royal/`
- (d) Insert new marriage data for Mary I, etc.
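
For part (a), one way to check the distinction is to walk the tree and collect tag names (elements) separately from attribute names. The following is a minimal sketch using `lxml` on a shortened copy of the exam snippet; the helper name `names_in` is made up for illustration.

```python
from lxml import etree

# Shortened copy of the exam snippet (assumption: enough to show every name once)
snippet = """<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"
         from="1485-08-22" to="1509-04-21" />
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
    </children>
  </relationship>
</royal>"""

# lxml reports xml:id under the reserved XML namespace URI
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

def names_in(tree_root):
    """Return (sorted element names, sorted attribute names) found in the tree."""
    elements, attributes = set(), set()
    for node in tree_root.iter("*"):  # "*" skips comments and processing instructions
        elements.add(node.tag)
        for key in node.attrib:
            attributes.add(key.replace(XML_NS, "xml:"))
    return sorted(elements), sorted(attributes)

elements, attributes = names_in(etree.fromstring(snippet))
print("Elements:  ", elements)
print("Attributes:", attributes)
```

Elements are the tags themselves (`royal`, `title`, and so on); anything written as `key="value"` inside a tag is an attribute.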

## Parsing and Experimenting with lxml

In [11]:
!pip install lxml

from lxml import etree
from IPython.display import display, Markdown

# 1) Define a sample genealogical XML snippet
xml_data = """
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"
         from="1485-08-22" to="1509-04-21" />
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"
               from="1509-04-22" to="1547-01-28" />
        <relationship type="marriage" spouse="#CatherineOfAragon"
                      from="1509-06-11" to="1533-05-23">
          <children>
            <royal name="Mary">
              <title rank="queen" territory="England" regnal="I"
                     from="1553-07-19" to="1558-11-17" />
              <relationship type="marriage" spouse="#PhilipOfSpain"
                            from="1554-07-25" />
            </royal>
          </children>
        </relationship>
      </royal>
    </children>
  </relationship>
</royal>
"""



In [12]:
# 2) Parse the XML
root = etree.fromstring(xml_data)
print("XML parsed successfully. Root tag =", root.tag)

XML parsed successfully. Root tag = royal


In [13]:
# 3) A helper function to display each node in a list of nodes
def display_xml(nodes):
    """
    Given a list of Element nodes, convert each to a pretty-printed string
    and display it in Markdown.
    """
    for node in nodes:
        xml_str = etree.tostring(node, pretty_print=True, encoding='unicode').strip()
        display(Markdown(f"```xml\n{xml_str}\n```"))

In [19]:
# 4) Example XPath expression (Question 2(b))
xp_expr = '//title[@rank="king" and @regnal="VIII"]/../royal[@name="Henry"]'

# 5) Evaluate the expression to get a list of matching nodes
matching_nodes = root.xpath(xp_expr)

# 6) Display those matching nodes
display_xml(matching_nodes)
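
On this sample, expression (b) matches nothing: `..` steps from the `title` up to its parent `royal` (Henry VIII), and the final step then looks for a *child* `royal` named Henry, which Henry VIII does not have directly. The sketch below shows this on a trimmed copy of the sample data (date attributes dropped for brevity) and also evaluates (c) with its trailing `/` removed, since a location path ending in `/` is not valid XPath 1.0. The helper `names` is hypothetical.

```python
from lxml import etree

xml_data = """<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"/>
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"/>
        <relationship type="marriage" spouse="#CatherineOfAragon">
          <children>
            <royal name="Mary">
              <title rank="queen" territory="England" regnal="I"/>
              <relationship type="marriage" spouse="#PhilipOfSpain"/>
            </royal>
          </children>
        </relationship>
      </royal>
    </children>
  </relationship>
</royal>"""
root = etree.fromstring(xml_data)

def names(expr):
    """Evaluate an XPath expression and return the name attribute of each match."""
    return [node.get("name") for node in root.xpath(expr)]

# (b) as written: title -> parent royal (Henry VIII) -> child royal named Henry
q_b = '//title[@rank="king" and @regnal="VIII"]/../royal[@name="Henry"]'
result_b = names(q_b)

# Stopping at the parent step selects the royal that holds the title
q_b_parent = '//title[@rank="king" and @regnal="VIII"]/..'
result_b_parent = names(q_b_parent)

# (c) without the trailing slash: grandchildren of any king or queen
q_c = ('//title[@rank="king" or @rank="queen"]/..'
       '/relationship/children/royal/relationship/children/royal')
result_c = names(q_c)

print("(b)        ->", result_b)
print("(b) parent ->", result_b_parent)
print("(c)        ->", result_c)
```

With this data, (c) reaches Mary through her grandfather Henry VII; starting from Henry VIII or from Mary herself yields nothing because neither has grandchildren in the sample.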

### Additional Task
- Insert new data recording Mary I as queen consort of Spain from "1556-01-16" to "...",
  then re-parse to check that the XML is still well-formed.
- Use `root.xpath(...)` to verify the newly added node.
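
One possible way to approach this task is sketched below. It assumes the new role is modelled as a second `title` element on Mary's record; the value `rank="queen consort"` is an illustrative choice, not something the exam prescribes, and the end date is left out because it is not given here.

```python
from lxml import etree

# Mary I's record, copied from the sample data
mary = etree.fromstring("""<royal name="Mary">
  <title rank="queen" territory="England" regnal="I"
         from="1553-07-19" to="1558-11-17" />
  <relationship type="marriage" spouse="#PhilipOfSpain"
                from="1554-07-25" />
</royal>""")

# Assumption: the new role is stored as a second <title>; the rank value is illustrative
new_title = etree.Element("title", rank="queen consort", territory="Spain")
new_title.set("from", "1556-01-16")  # "from" is a Python keyword, so set() is used

# Place the new element right after the existing <title>
first_title = mary.find("title")
mary.insert(mary.index(first_title) + 1, new_title)

# Serialise and re-parse to confirm the document is still well-formed
reparsed = etree.fromstring(etree.tostring(mary))

# Verify the new node with XPath
territories = reparsed.xpath("title/@territory")
spain_from = reparsed.xpath('title[@territory="Spain"]/@from')
print(territories, spain_from)
```

Re-parsing the serialised output is a cheap well-formedness check; it does not validate the data against any schema.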


## Question 4: Hospital Database Setup

In [21]:
# 1) Install and start MySQL server (on Colab or Debian/Ubuntu)
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# 2) Create user & DB (e.g. 'hospital_db') for our hospital scenario
!mysql -e "CREATE USER IF NOT EXISTS 'dbuser'@'localhost' IDENTIFIED BY 'dbpass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS hospital_db;"
!mysql -e "GRANT ALL PRIVILEGES ON hospital_db.* TO 'dbuser'@'localhost';"

# 3) Install Python libs for SQL Magic
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0

# 4) Load the sql extension and configure
%reload_ext sql

import pandas as pd
pd.set_option('display.max_rows', 10)

# 5) Connect to 'hospital_db' in MySQL using our user/password
%sql mysql+pymysql://dbuser:dbpass@localhost/hospital_db

print("MySQL environment is ready. Connected to hospital_db!")




 * Starting MySQL database server mysqld
   ...done.
MySQL environment is ready. Connected to hospital_db!


### Create Tables for the Hospital Scenario

In [22]:
%%sql
DROP TABLE IF EXISTS Doctor_Department;
DROP TABLE IF EXISTS Doctor;
DROP TABLE IF EXISTS Department;
DROP TABLE IF EXISTS PatientWardStay;
DROP TABLE IF EXISTS Patient;
DROP TABLE IF EXISTS Ward;
DROP TABLE IF EXISTS Building;
DROP TABLE IF EXISTS Hospital;

CREATE TABLE Hospital (
  HospitalID INT PRIMARY KEY AUTO_INCREMENT,
  Name VARCHAR(100)
);

CREATE TABLE Building (
  BuildingID INT PRIMARY KEY AUTO_INCREMENT,
  HospitalID INT,
  Name VARCHAR(100),
  FOREIGN KEY (HospitalID) REFERENCES Hospital(HospitalID)
);

CREATE TABLE Ward (
  WardID INT PRIMARY KEY AUTO_INCREMENT,
  BuildingID INT,
  Name VARCHAR(100),
  FOREIGN KEY (BuildingID) REFERENCES Building(BuildingID)
);

CREATE TABLE Department (
  DeptID INT PRIMARY KEY AUTO_INCREMENT,
  Name VARCHAR(100)
);

CREATE TABLE Doctor (
  DoctorID INT PRIMARY KEY AUTO_INCREMENT,
  Name VARCHAR(100)
);

CREATE TABLE Doctor_Department (
  DoctorID INT,
  DeptID INT,
  PRIMARY KEY(DoctorID, DeptID),
  FOREIGN KEY (DoctorID) REFERENCES Doctor(DoctorID),
  FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
);

CREATE TABLE Patient (
  PatientID INT PRIMARY KEY AUTO_INCREMENT,
  Name VARCHAR(100),
  DoB DATE
);

CREATE TABLE PatientWardStay (
  PatientID INT,
  WardID INT,
  ArrivalDate DATE,
  DepartureDate DATE,
  -- ArrivalDate is part of the key so a patient can return to the same ward later
  PRIMARY KEY(PatientID, WardID, ArrivalDate),
  FOREIGN KEY (PatientID) REFERENCES Patient(PatientID),
  FOREIGN KEY (WardID) REFERENCES Ward(WardID)
);


 * mysql+pymysql://dbuser:***@localhost/hospital_db
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.


[]

#### Explanation
We create eight tables, following one possible design for Q4:
- Hospital, Building, Ward: a chain of 1–M relationships (a hospital has many buildings, a building has many wards).
- Department and Doctor, linked by the bridging table Doctor_Department (M–N).
- Patient, linked to Ward by the bridging table PatientWardStay, which also stores arrival and departure dates.
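
The effect of the composite primary key and the foreign keys on a bridging table can be checked with a small experiment. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL, an assumption made so that it runs without a server: SQLite needs `PRAGMA foreign_keys = ON` and uses `INTEGER PRIMARY KEY` where the MySQL schema uses `AUTO_INCREMENT`. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Doctor (DoctorID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Doctor_Department (
  DoctorID INTEGER,
  DeptID INTEGER,
  PRIMARY KEY (DoctorID, DeptID),
  FOREIGN KEY (DoctorID) REFERENCES Doctor(DoctorID),
  FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
);
INSERT INTO Department (Name) VALUES ('Orthopedics'), ('Accident & Emergency');
INSERT INTO Doctor (Name) VALUES ('Dr. Song Ci');
INSERT INTO Doctor_Department VALUES (1, 1), (1, 2);
""")

def try_insert(doctor_id, dept_id):
    """Attempt to link a doctor to a department and report what happened."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Doctor_Department VALUES (?, ?)",
                (doctor_id, dept_id),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

duplicate = try_insert(1, 1)   # same (DoctorID, DeptID) pair again
orphan = try_insert(99, 1)     # DoctorID 99 does not exist
print("duplicate pair ->", duplicate)
print("unknown doctor ->", orphan)

link_count = conn.execute("SELECT COUNT(*) FROM Doctor_Department").fetchone()[0]
print("links stored   ->", link_count)
```

The composite key allows one doctor in many departments and one department with many doctors, while ruling out the same pairing twice.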


### Insert Sample Data

In [23]:
%%sql
-- 1) Insert sample hospitals
INSERT INTO Hospital (Name) VALUES
('City Hospital'),
('General Hospital');

-- 2) Insert buildings
INSERT INTO Building (HospitalID, Name) VALUES
(1, 'Main Building'),
(1, 'Annex'),
(2, 'North Wing');

-- 3) Insert wards
INSERT INTO Ward (BuildingID, Name) VALUES
(1, 'Ward A'),
(1, 'Orthopedics Ward'),
(3, 'Ward B');

-- 4) Insert departments
INSERT INTO Department (Name) VALUES
('Orthopedics'),
('Accident & Emergency');

-- 5) Insert doctors
INSERT INTO Doctor (Name) VALUES
('Dr. Song Ci'),
('Dr. Neha Kapoor');

-- 6) Link doctors to departments (M–N)
INSERT INTO Doctor_Department (DoctorID, DeptID) VALUES
(1, 1),  -- Dr. Song Ci in Orthopedics
(1, 2),  -- Dr. Song Ci also in A&E
(2, 2);  -- Dr. Neha Kapoor in A&E

-- 7) Insert patients
INSERT INTO Patient (Name, DoB) VALUES
('Neha Ahuja', '1990-05-12'),
('John Smith', '1985-03-22');

-- 8) Insert patient-ward stays
INSERT INTO PatientWardStay (PatientID, WardID, ArrivalDate, DepartureDate) VALUES
(1, 1, '2023-08-01', '2023-08-15'),  -- Neha Ahuja in Ward A
(2, 2, '2023-08-05', '2023-08-10');  -- John Smith in Orthopedics Ward


 * mysql+pymysql://dbuser:***@localhost/hospital_db
2 rows affected.
3 rows affected.
3 rows affected.
2 rows affected.
2 rows affected.
3 rows affected.
2 rows affected.
2 rows affected.


[]
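
With the sample rows loaded, the bridging tables can be joined back to their parent tables. The following sketch again uses the standard-library `sqlite3` as a stand-in for MySQL (an assumption, so that it runs anywhere) and a simplified copy of the schema without Hospital and Building. The stay length is computed with SQLite's `julianday`; in MySQL the equivalent would be `DATEDIFF(DepartureDate, ArrivalDate)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Doctor (DoctorID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Doctor_Department (DoctorID INTEGER, DeptID INTEGER,
                                PRIMARY KEY (DoctorID, DeptID));
CREATE TABLE Ward (WardID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Patient (PatientID INTEGER PRIMARY KEY, Name TEXT, DoB TEXT);
CREATE TABLE PatientWardStay (PatientID INTEGER, WardID INTEGER,
                              ArrivalDate TEXT, DepartureDate TEXT);

INSERT INTO Department (Name) VALUES ('Orthopedics'), ('Accident & Emergency');
INSERT INTO Doctor (Name) VALUES ('Dr. Song Ci'), ('Dr. Neha Kapoor');
INSERT INTO Doctor_Department VALUES (1, 1), (1, 2), (2, 2);
INSERT INTO Ward (Name) VALUES ('Ward A'), ('Orthopedics Ward'), ('Ward B');
INSERT INTO Patient (Name, DoB) VALUES
  ('Neha Ahuja', '1990-05-12'), ('John Smith', '1985-03-22');
INSERT INTO PatientWardStay VALUES
  (1, 1, '2023-08-01', '2023-08-15'), (2, 2, '2023-08-05', '2023-08-10');
""")

# Which doctor works in which department (resolves the M-N bridge)
doctor_depts = conn.execute("""
    SELECT doc.Name, dep.Name
    FROM Doctor AS doc
    JOIN Doctor_Department AS dd ON dd.DoctorID = doc.DoctorID
    JOIN Department AS dep ON dep.DeptID = dd.DeptID
    ORDER BY doc.DoctorID, dep.DeptID
""").fetchall()

# Where each patient stayed and for how many days
stays = conn.execute("""
    SELECT p.Name, w.Name,
           CAST(julianday(s.DepartureDate) - julianday(s.ArrivalDate) AS INTEGER)
    FROM PatientWardStay AS s
    JOIN Patient AS p ON p.PatientID = s.PatientID
    JOIN Ward AS w ON w.WardID = s.WardID
    ORDER BY p.PatientID
""").fetchall()

for row in doctor_depts:
    print(row)
for row in stays:
    print(row)
```

The same `SELECT` statements, with `DATEDIFF` swapped in, should be usable in a `%%sql` cell against `hospital_db`.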
