<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/Lectures/CM3010%20March%202022.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Question 2: XML Family Tree (16th Century English Monarchy)
The exam provides the following XML snippet, which describes part of the Tudor family tree:

```xml
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"
         from="1485-08-22" to="1509-04-21" />
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"
               from="1509-04-22" to="1547-01-28" />
        <!-- more nested children/relationship omitted for brevity -->
      </royal>
    </children>
  </relationship>
</royal>
```

We'll **parse** a larger sample, then work through the exam's sub-questions:
- (a) Identify elements vs. attributes
- (b) Evaluate `//title[@rank="king" and @regnal="VIII"]/../royal[@name="Henry"]`
- (c) Evaluate `//title[@rank="king" or @rank="queen"]/../relationship/children/royal/relationship/children/royal` (written without a trailing `/`, which XPath does not allow)
- (d) Insert new marriage data for Mary I, etc.
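
For part (a), one way to separate elements from attributes is to walk the tree and collect tag names and attribute names. The sketch below is an illustration rather than the exam's model answer; it uses the standard-library `xml.etree.ElementTree` instead of lxml so that it runs without extra installs, and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# The exam snippet, reproduced so the block runs on its own.
snippet = """
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"
         from="1485-08-22" to="1509-04-21" />
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"
               from="1509-04-22" to="1547-01-28" />
      </royal>
    </children>
  </relationship>
</royal>
"""

root = ET.fromstring(snippet)

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

element_names = set()
attribute_names = set()
for node in root.iter():
    element_names.add(node.tag)
    for key in node.attrib:
        # Map the expanded name back so xml:id reads as written in the source.
        attribute_names.add(key.replace(XML_NS, "xml:"))

print("Elements:  ", sorted(element_names))
print("Attributes:", sorted(attribute_names))
```

Everything that appears as a tag (`royal`, `title`, `relationship`, `children`) is an element; everything written as `name="value"` inside a start tag is an attribute.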

## Parsing and Experimenting with lxml

In [None]:
!pip install lxml

from lxml import etree
from IPython.display import display, Markdown

# We'll define a sample genealogical XML snippet
xml_data = """
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"
         from="1485-08-22" to="1509-04-21" />
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"
               from="1509-04-22" to="1547-01-28" />
        <relationship type="marriage" spouse="#CatherineOfAragon"
                      from="1509-06-11" to="1533-05-23">
          <children>
            <royal name="Mary">
              <title rank="queen" territory="England" regnal="I"
                     from="1553-07-19" to="1558-11-17" />
              <relationship type="marriage" spouse="#PhilipOfSpain"
                            from="1554-07-25" />
            </royal>
          </children>
        </relationship>
      </royal>
    </children>
  </relationship>
</royal>
"""

In [None]:
# 2) Parse the XML
root = etree.fromstring(xml_data)
print("XML parsed successfully. Root tag =", root.tag)

In [None]:
# 3) A helper function to display each node in a list of nodes
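def display_xml(nodes):
    """
    Given a list of Element nodes, convert each to a pretty-printed string
    and display it in Markdown. Say so explicitly when nothing matched.
    """
    if not nodes:
        # An empty result is a valid XPath outcome; make it visible.
        display(Markdown("*No matching nodes.*"))
        return
    for node in nodes:
        xml_str = etree.tostring(node, pretty_print=True, encoding='unicode').strip()
        display(Markdown(f"```xml\n{xml_str}\n```"))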

In [None]:
# 4) Example XPath expression (Question 2(b))
xp_expr = '//title[@rank="king" and @regnal="VIII"]/../royal[@name="Henry"]'

# 5) Evaluate the expression to get a list of matching nodes
matching_nodes = root.xpath(xp_expr)

# 6) Display those matching nodes
display_xml(matching_nodes)
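
Expression (b) matches nothing on this sample, which is easier to see when each location step is traced by hand. The sketch below is a step-by-step illustration, not the lxml call used above: it relies on the standard-library `xml.etree.ElementTree` with an explicit parent map, and the helper names (`parent_of`, `result_b`, `result_c`) are invented for this example. It traces expression (c) the same way.

```python
import xml.etree.ElementTree as ET

xml_data = """
<royal name="Henry" xml:id="HenryVII">
  <title rank="king" territory="England" regnal="VII"
         from="1485-08-22" to="1509-04-21" />
  <relationship type="marriage" spouse="#ElizabethOfYork">
    <children>
      <royal name="Arthur" xml:id="ArthurTudor"/>
      <royal name="Henry" xml:id="HenryVIII">
        <title rank="king" territory="England" regnal="VIII"
               from="1509-04-22" to="1547-01-28" />
        <relationship type="marriage" spouse="#CatherineOfAragon"
                      from="1509-06-11" to="1533-05-23">
          <children>
            <royal name="Mary">
              <title rank="queen" territory="England" regnal="I"
                     from="1553-07-19" to="1558-11-17" />
              <relationship type="marriage" spouse="#PhilipOfSpain"
                            from="1554-07-25" />
            </royal>
          </children>
        </relationship>
      </royal>
    </children>
  </relationship>
</royal>
"""
root = ET.fromstring(xml_data)
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# ElementTree elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

# (b) Step 1: //title[@rank="king" and @regnal="VIII"]
titles_b = [t for t in root.iter("title")
            if t.get("rank") == "king" and t.get("regnal") == "VIII"]
# (b) Step 2: /..  selects the <royal> that owns each matching title.
owners_b = [parent_of[t] for t in titles_b]
# (b) Step 3: /royal[@name="Henry"]  looks only at direct <royal> children.
result_b = [r for owner in owners_b
            for r in owner.findall("royal") if r.get("name") == "Henry"]

# (c) Titles ranked king or queen, then two generations down from the owner.
titles_c = [t for t in root.iter("title") if t.get("rank") in ("king", "queen")]
grandchild_path = "relationship/children/royal/relationship/children/royal"
result_c = []
for t in titles_c:
    for r in parent_of[t].findall(grandchild_path):
        if r not in result_c:  # an XPath node-set holds each node once
            result_c.append(r)

print("(b) owner:", [o.get(XML_ID) for o in owners_b])
print("(b) result:", [r.get("name") for r in result_b])
print("(c) result:", [r.get("name") for r in result_c])
```

In (b), the title's parent is Henry VIII himself, and his `<royal>` element has only `<title>` and `<relationship>` as direct children, so the final step finds no `<royal>`. In (c), only Henry VII has a grandchild in this sample (Mary, via Henry VIII).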

### Additional Task
- Insert a new title for Mary I as queen consort of Spain, from "1556-01-16" to "...",
  and check that the modified document still parses.
- Use `root.xpath(...)` to verify that the newly added node is found.
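
A minimal sketch of this task follows, under stated assumptions: the new data is modelled as a second `<title>` directly under Mary's `<royal>` element, the `rank` value `"queen consort"` is a hypothetical choice, and the end date is left out because the task elides it. The example uses the standard-library `xml.etree.ElementTree` on Mary's fragment only; the same steps should carry over to the lxml tree parsed above.

```python
import xml.etree.ElementTree as ET

# Mary's fragment from the sample, isolated so the block runs on its own.
mary_xml = """<royal name="Mary">
  <title rank="queen" territory="England" regnal="I"
         from="1553-07-19" to="1558-11-17" />
  <relationship type="marriage" spouse="#PhilipOfSpain"
                from="1554-07-25" />
</royal>"""
mary = ET.fromstring(mary_xml)

# Hypothetical modelling choice: a second <title> under Mary's <royal>.
# No "to" attribute is set because the task leaves the end date open.
new_title = ET.Element("title", {
    "rank": "queen consort",
    "territory": "Spain",
    "from": "1556-01-16",
})
# Index 1 places it right after the existing <title>, keeping titles together.
mary.insert(1, new_title)

# Round-trip through a string to confirm the result is still well-formed XML.
reparsed = ET.fromstring(ET.tostring(mary, encoding="unicode"))

# Verify the new node with a path query.
spanish_titles = reparsed.findall("title[@territory='Spain']")
print([t.attrib for t in spanish_titles])
```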


# **Q3(h) and (i) with a Triples Table**


## MySQL Setup

In [None]:
# Install MySQL (if in Colab/Ubuntu environment), start the service
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# Create user & DB for the triple store
!mysql -e "CREATE USER IF NOT EXISTS 'birduser'@'localhost' IDENTIFIED BY 'birdpass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS triple_store;"
!mysql -e "GRANT ALL PRIVILEGES ON triple_store.* TO 'birduser'@'localhost';"

# Install Python libs
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0

%reload_ext sql

import pandas as pd
pd.set_option('display.max_rows', 10)

# Connect to the triple_store DB
%sql mysql+pymysql://birduser:birdpass@localhost/triple_store

print("MySQL ready for SPARQL question (Q3).")




 * Starting MySQL database server mysqld
   ...done.
MySQL ready for SPARQL question (Q3).


We’ll show how to store RDF-like data in a **Triples** table (Subject, Predicate, Object) in MySQL, then run a **recursive CTE** to find entities whose “birthPlace” eventually leads to “New York City” up the location chain.
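
The recursive step can be sketched as follows. This is an illustration under assumptions, not the notebook's MySQL cell: it runs on an in-memory SQLite database from the standard library, and the `locatedIn` predicate, the location rows, and the `ExamplePerson` subject are hypothetical additions, since the sample data below contains no location chain. MySQL 8.0 also supports `WITH RECURSIVE`, so the query shape should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Triples (Subject TEXT, Predicate TEXT, Object TEXT);
INSERT INTO Triples VALUES
  ('SongCi',        'birthPlace', 'New York City'),
  ('NehaKapoor',    'birthPlace', 'Boston'),
  ('ExamplePerson', 'birthPlace', 'Manhattan'),
  ('Manhattan',     'locatedIn',  'New York City'),
  ('New York City', 'locatedIn',  'New York State'),
  ('Boston',        'locatedIn',  'Massachusetts');
""")
# ExamplePerson and every 'locatedIn' row above are hypothetical test data.

query = """
WITH RECURSIVE InNYC(Place) AS (
    -- Anchor: the city itself counts as "in New York City".
    SELECT 'New York City'
    UNION
    -- Recursive step: any place located in a place already collected.
    SELECT t.Subject
    FROM Triples AS t
    JOIN InNYC AS n ON t.Object = n.Place
    WHERE t.Predicate = 'locatedIn'
)
SELECT b.Subject
FROM Triples AS b
JOIN InNYC AS n ON b.Object = n.Place
WHERE b.Predicate = 'birthPlace'
ORDER BY b.Subject;
"""
born_in_nyc = [row[0] for row in conn.execute(query)]
print(born_in_nyc)
```

The CTE first collects every place that sits at or below New York City in the chain, then the outer query keeps subjects whose `birthPlace` is one of those places.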

### Create Triples Table & Insert Data

In [None]:
%%sql
DROP TABLE IF EXISTS Triples;

CREATE TABLE Triples (
  Subject  VARCHAR(50),
  Predicate VARCHAR(50),
  Object   VARCHAR(50)
);


INSERT INTO Triples (Subject, Predicate, Object) VALUES
('SongCi', 'instanceOf', 'Human'),
('SongCi', 'birthPlace', 'New York City'),
('SongCi', 'occupation', 'Doctor'),

('NehaKapoor', 'instanceOf', 'Human'),
('NehaKapoor', 'birthPlace', 'Boston'),
('NehaKapoor', 'occupation', 'Actor')
;


 * mysql+pymysql://birduser:***@localhost/triple_store
0 rows affected.
0 rows affected.
6 rows affected.


[]

## Find Humans Born in NYC

In [None]:
%%sql



## Find Doctors Born in NYC

In [None]:
%%sql


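
Both this query and the preceding "born in NYC" query can be written as a self-join on the triples table: one alias per predicate, joined on `Subject`. The sketch below is one possible answer, not necessarily the intended one; it uses an in-memory SQLite database so that it runs on its own, with the same six rows inserted above. The alias and variable names are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Triples (Subject TEXT, Predicate TEXT, Object TEXT);
INSERT INTO Triples VALUES
  ('SongCi',     'instanceOf', 'Human'),
  ('SongCi',     'birthPlace', 'New York City'),
  ('SongCi',     'occupation', 'Doctor'),
  ('NehaKapoor', 'instanceOf', 'Human'),
  ('NehaKapoor', 'birthPlace', 'Boston'),
  ('NehaKapoor', 'occupation', 'Actor');
""")

# Humans born in NYC: one alias for the type triple, one for the birthplace triple.
humans_sql = """
SELECT kind.Subject
FROM Triples AS kind
JOIN Triples AS born ON born.Subject = kind.Subject
WHERE kind.Predicate = 'instanceOf' AND kind.Object = 'Human'
  AND born.Predicate = 'birthPlace' AND born.Object = 'New York City';
"""

# Doctors born in NYC: same shape, with the occupation triple in place of the type.
doctors_sql = """
SELECT job.Subject
FROM Triples AS job
JOIN Triples AS born ON born.Subject = job.Subject
WHERE job.Predicate = 'occupation' AND job.Object = 'Doctor'
  AND born.Predicate = 'birthPlace' AND born.Object = 'New York City';
"""

humans_born_in_nyc = [r[0] for r in conn.execute(humans_sql)]
doctors_born_in_nyc = [r[0] for r in conn.execute(doctors_sql)]
print(humans_born_in_nyc, doctors_born_in_nyc)
```

Each additional condition on a subject costs one more self-join, which is the usual trade-off of a single triples table against a conventional relational schema.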

# Question 4: Hospital Database – Final Notebook

Below we implement the final solution approach:

- **Hospital** (Name)  
- **Building** (Name, HospitalName)  
- **Ward** (Name, BuildingName)  
- **Patient** (ID)  
- **PatientWardStay** bridging the “staysIn” relationship with arrival/departure  
- **Department** (Name, HospitalName)  
- **Doctor** (Name)  
- **Doctor_Department** bridging the many–many “worksAt.”  

We create them in MySQL, insert sample data, then run queries demonstrating how to answer part (a) questions.


In [None]:
# 1) Install and start MySQL server (on Colab or Debian/Ubuntu)
!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# 2) Create user & DB (e.g. 'hospital_db') for our hospital scenario
!mysql -e "CREATE USER IF NOT EXISTS 'dbuser'@'localhost' IDENTIFIED BY 'dbpass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS hospital_db;"
!mysql -e "GRANT ALL PRIVILEGES ON hospital_db.* TO 'dbuser'@'localhost';"

# 3) Install Python libs for SQL Magic
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0

# 4) Load the sql extension and configure
%reload_ext sql

import pandas as pd
pd.set_option('display.max_rows', 10)

# 5) Connect to 'hospital_db' in MySQL using our user/password
%sql mysql+pymysql://dbuser:dbpass@localhost/hospital_db

print("MySQL environment is ready. Connected to hospital_db!")


### Create Tables for the Hospital Scenario

In [None]:
%%sql
DROP TABLE IF EXISTS Doctor_Department;
DROP TABLE IF EXISTS Doctor;
DROP TABLE IF EXISTS Department;
DROP TABLE IF EXISTS PatientWardStay;
DROP TABLE IF EXISTS Ward;
DROP TABLE IF EXISTS Building;
DROP TABLE IF EXISTS Hospital;
DROP TABLE IF EXISTS Patient;

-- 1) Hospital with Name as PK
CREATE TABLE Hospital (
  Name VARCHAR(100) PRIMARY KEY
);

-- 2) Building with Name as PK and HospitalName as FK to Hospital
CREATE TABLE Building (
  Name VARCHAR(100),
  HospitalName VARCHAR(100),
  Address VARCHAR(255),
  PRIMARY KEY (Name),
  FOREIGN KEY (HospitalName) REFERENCES Hospital(Name)
);

-- 3) Ward with Name as PK and BuildingName as FK to Building
CREATE TABLE Ward (
  Name VARCHAR(100),
  BuildingName VARCHAR(100),
  PRIMARY KEY (Name),
  FOREIGN KEY (BuildingName) REFERENCES Building(Name)
);

-- 4) Patient with numeric ID from original E/R
CREATE TABLE Patient (
  ID INT PRIMARY KEY,
  Name VARCHAR(100),
  DoB DATE
);

-- 5) PatientWardStay bridging "staysIn" with arrival/departure
CREATE TABLE PatientWardStay (
  PatientID INT,
  WardName VARCHAR(100),
  ArrivalDate DATE,
  DepartureDate DATE,
  PRIMARY KEY (PatientID, WardName),
  FOREIGN KEY (PatientID) REFERENCES Patient(ID),
  FOREIGN KEY (WardName) REFERENCES Ward(Name)
);

-- 6) Department with (Name, HospitalName)
CREATE TABLE Department (
  Name VARCHAR(100),
  HospitalName VARCHAR(100),
  PRIMARY KEY (Name),
  FOREIGN KEY (HospitalName) REFERENCES Hospital(Name)
);

-- 7) Doctor with Name as PK
CREATE TABLE Doctor (
  Name VARCHAR(100) PRIMARY KEY
);

-- 8) Doctor_Department bridging many–many "worksAt"
CREATE TABLE Doctor_Department (
  DoctorName VARCHAR(100),
  DeptName VARCHAR(100),
  PRIMARY KEY (DoctorName, DeptName),
  FOREIGN KEY (DoctorName) REFERENCES Doctor(Name),
  FOREIGN KEY (DeptName) REFERENCES Department(Name)
);


#### Explanation
We create eight tables, which is one possible design for Q4:
- Hospital, Building, Ward (1–M relationships)
- Department and Doctor (with a bridging table Doctor_Department for M–N)
- Patient and a bridging table PatientWardStay to store arrival/departure data.


### Insert Sample Data

In [None]:
%%sql
-- 1) Hospitals
INSERT INTO Hospital (Name) VALUES
('City Hospital'),
('General Hospital');

-- 2) Buildings
INSERT INTO Building (Name, HospitalName, Address) VALUES
('Main Building', 'City Hospital', 'Main Street'),
('Annex', 'City Hospital', 'Annex Lane'),
('North Wing', 'General Hospital', 'North Av');

-- 3) Wards
INSERT INTO Ward (Name, BuildingName) VALUES
('Ward A', 'Main Building'),
('Orthopedics Ward', 'Main Building'),
('Ward B', 'North Wing');

-- 4) Patients
INSERT INTO Patient (ID, Name, DoB) VALUES
(100, 'Neha Ahuja', '1990-05-12'),
(101, 'John Smith', '1985-03-22');

-- 5) PatientWardStay
INSERT INTO PatientWardStay
  (PatientID, WardName, ArrivalDate, DepartureDate)
VALUES
(100, 'Ward A', '2023-08-01', '2023-08-15'),
(101, 'Orthopedics Ward', '2023-08-05', '2023-08-10');

-- 6) Departments
INSERT INTO Department (Name, HospitalName) VALUES
('Orthopedics', 'City Hospital'),
('Accident & Emergency', 'City Hospital'),
('ENT', 'General Hospital');

-- 7) Doctors
INSERT INTO Doctor (Name) VALUES
('Dr. Song Ci'),
('Dr. Neha Kapoor');

-- 8) Doctor_Department bridging
INSERT INTO Doctor_Department (DoctorName, DeptName) VALUES
('Dr. Song Ci', 'Orthopedics'),
('Dr. Song Ci', 'Accident & Emergency'),
('Dr. Neha Kapoor', 'Accident & Emergency');


## Example Queries (Answering sub-question (a))

### (i) Which building did the patient named Neha Ahuja stay in?


In [None]:
%%sql



### (ii) Which hospital was responsible for Neha Ahuja’s stay?


In [None]:
%%sql
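
Questions (i) and (ii) follow the same join chain: Patient to PatientWardStay to Ward to Building, with the hospital read from `Building.HospitalName`. The sketch below is one possible answer rather than the official solution; it rebuilds only the tables and rows it needs in an in-memory SQLite database so that it runs on its own, and the trimmed column lists are a simplification of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Building (Name TEXT PRIMARY KEY, HospitalName TEXT);
CREATE TABLE Ward (Name TEXT PRIMARY KEY,
                   BuildingName TEXT REFERENCES Building(Name));
CREATE TABLE Patient (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PatientWardStay (PatientID INTEGER REFERENCES Patient(ID),
                              WardName TEXT REFERENCES Ward(Name));

INSERT INTO Building VALUES ('Main Building', 'City Hospital'),
                            ('North Wing', 'General Hospital');
INSERT INTO Ward VALUES ('Ward A', 'Main Building'),
                        ('Ward B', 'North Wing');
INSERT INTO Patient VALUES (100, 'Neha Ahuja'), (101, 'John Smith');
INSERT INTO PatientWardStay VALUES (100, 'Ward A');
""")

# One query answers both parts: the building (i) and its hospital (ii).
stay_sql = """
SELECT DISTINCT b.Name AS Building, b.HospitalName AS Hospital
FROM Patient AS p
JOIN PatientWardStay AS s ON s.PatientID = p.ID
JOIN Ward AS w ON w.Name = s.WardName
JOIN Building AS b ON b.Name = w.BuildingName
WHERE p.Name = ?;
"""
rows = conn.execute(stay_sql, ("Neha Ahuja",)).fetchall()
print(rows)
```

`DISTINCT` keeps the answer to one row per building even if the patient has several stays in wards of the same building.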

### (iii) e.g. "In which wards are Orthopedics patients housed?"


In [None]:
%%sql

### (iv) Which hospitals does the doctor Song Ci work in?


In [None]:
%%sql
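
Question (iv) goes through the bridging table: Doctor_Department gives the doctor's departments, and each department names its hospital. The sketch below is one possible answer under the schema above, run on an in-memory SQLite database so that it stands alone; the `Doctor` table is left out because the bridging table already carries the doctor's name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (Name TEXT PRIMARY KEY, HospitalName TEXT);
CREATE TABLE Doctor_Department (
  DoctorName TEXT,
  DeptName TEXT REFERENCES Department(Name),
  PRIMARY KEY (DoctorName, DeptName)
);

INSERT INTO Department VALUES ('Orthopedics', 'City Hospital'),
                              ('Accident & Emergency', 'City Hospital'),
                              ('ENT', 'General Hospital');
INSERT INTO Doctor_Department VALUES
  ('Dr. Song Ci', 'Orthopedics'),
  ('Dr. Song Ci', 'Accident & Emergency'),
  ('Dr. Neha Kapoor', 'Accident & Emergency');
""")

# DISTINCT collapses the two City Hospital departments into one hospital row.
hospitals_sql = """
SELECT DISTINCT d.HospitalName
FROM Doctor_Department AS dd
JOIN Department AS d ON d.Name = dd.DeptName
WHERE dd.DoctorName = ?
ORDER BY d.HospitalName;
"""
hospitals = [r[0] for r in conn.execute(hospitals_sql, ("Dr. Song Ci",))]
print(hospitals)
```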


### (v) "What departments for building X?", etc.

In [None]:
%%sql