<a href="https://colab.research.google.com/github/sreent/data-management-intro/blob/main/Lectures/CM3010_September_2021.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **1. Introduction**

This notebook is tailored to the **Bird Spotter’s Records** (MySQL),
**Music Encoding/MEI** (XML, XPath, RDF), and **Zoo Database** (MySQL) exam questions.

It includes:

1. **MySQL** setup and queries for Bird Spotter’s data and Zoo data.
2. **XML + XPath** for MEI chord and staff queries.
3. **RDF / SPARQL** examples for music data.
4. **MongoDB** examples (optional) for JSON-based data (if needed).

**Sections**:
- **Section A:** MySQL for Bird Spotter’s Records
- **Section B:** XPath + MEI Examples
- **Section C:** RDF + SPARQL for Music Encoding
- **Section D:** MySQL for Zoo Database

### **2. MySQL Setup & Bird Spotter’s Records**

#### **2.1 Section A: MySQL Setup (Bird Spotter’s Records)**

We'll install MySQL, create a database (e.g. `bird_spotter`), and create tables that align with
**Question 2(a–f)** in the exam (e.g., a `Sightings` table, `Species` table, etc.).

#### **2.2 Install and Configure MySQL**

In [None]:
# ----- MySQL Installation & Setup in Colab/Ubuntu-like environment -----

!apt -qq update > /dev/null
!apt -y -qq install mysql-server > /dev/null
!service mysql start

# Create user & DB
!mysql -e "CREATE USER IF NOT EXISTS 'birduser'@'localhost' IDENTIFIED BY 'birdpass';"
!mysql -e "CREATE DATABASE IF NOT EXISTS bird_spotter;"
!mysql -e "GRANT ALL PRIVILEGES ON bird_spotter.* TO 'birduser'@'localhost';"

# Install required Python libraries
!pip install -q sqlalchemy==2.0.20 ipython-sql==0.5.0 pymysql==1.1.0 prettytable==2.0.0

%reload_ext sql

import pandas as pd
pd.set_option('display.max_rows', 10)

# Connect to the bird_spotter DB
%sql mysql+pymysql://birduser:birdpass@localhost/bird_spotter

print("MySQL for Bird Spotter’s Records is ready.")

#### **2.3** Create Tables (Q2: Bird Spotter’s `Sightings` and Normalized Tables)

We'll create:
- `Species` (SpeciesName, ConservationStatus, etc.)
- `Locations` (LocationID, NatureReserve, etc.)
- `Sightings` (SightingID, SpeciesName, Date, NumberSighted, LocationID, ...)

We can then run queries to demonstrate date filtering, distinct species, normalization, etc.

#### **2.4 Creating Tables**

In [None]:
%%sql

CREATE TABLE IF NOT EXISTS Species (
  SpeciesName VARCHAR(100) PRIMARY KEY,
  ConservationStatus VARCHAR(50)
);

CREATE TABLE IF NOT EXISTS Locations (
  LocationID INT AUTO_INCREMENT PRIMARY KEY,
  NatureReserve VARCHAR(100),
  Latitude DECIMAL(9,6),
  Longitude DECIMAL(9,6)
);

CREATE TABLE IF NOT EXISTS Sightings (
  SightingID INT AUTO_INCREMENT PRIMARY KEY,
  SpeciesName VARCHAR(100),
  Date DATE,
  NumberSighted INT,
  LocationID INT,
  FOREIGN KEY (SpeciesName) REFERENCES Species(SpeciesName),
  FOREIGN KEY (LocationID) REFERENCES Locations(LocationID)
);

#### **2.5 Insert Test Data**

In [None]:
%%sql

-- Insert sample species
INSERT IGNORE INTO Species (SpeciesName, ConservationStatus) VALUES
('Cardinal', 'Least Concern'),
('Wood pigeon', 'Least Concern'),
('Bald Eagle', 'Vulnerable'),
('Snowy Owl', 'Endangered');

-- Insert sample locations (explicit IDs keep re-runs idempotent and match the LocationID values used below)
INSERT IGNORE INTO Locations (LocationID, NatureReserve, Latitude, Longitude) VALUES
(1, 'Green Meadows Reserve', 39.12345, -77.54321),
(2, 'River Edge Sanctuary', 38.98765, -77.65432);

-- Insert sample sightings (explicit IDs so INSERT IGNORE skips duplicates when the cell is re-run)
INSERT IGNORE INTO Sightings (SightingID, SpeciesName, Date, NumberSighted, LocationID)
VALUES
(1, 'Cardinal', '2020-12-31', 2, 1),
(2, 'Cardinal', '2021-01-02', 4, 1),
(3, 'Wood pigeon', '2021-07-15', 10, 2),
(4, 'Bald Eagle', '2022-03-10', 1, 1),
(5, 'Snowy Owl', '2021-09-07', 2, 1);

#### **2.6 Example Queries**

1. Retrieve every bird species sighted since 1 January 2021 (2a).
2. Decide whether the original table is in 1NF (2b); this is a conceptual explanation rather than a query.
3. Normalization example: the data is split into `Species`, `Locations`, and `Sightings` (2c).
4. Identify the normal form reached, which is likely 3NF (2d).
5. A query to join species & sightings for birds seen since 2021-01-01 (2e).
6. Transaction example (2f): multiple operations in a single unit.
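
Items 2 to 4 are conceptual, but the decomposition can be illustrated in a few lines of plain Python. The sketch below starts from a hypothetical flat list of sighting records (the `flat_rows` data and the helper name `determines` are assumptions for illustration, not part of the exam question). It checks that `SpeciesName` functionally determines `ConservationStatus`, then splits the rows into a species lookup and a slimmer sightings list, mirroring the `Species` and `Sightings` tables.

```python
# Hypothetical flat (unnormalized) records: the status is repeated on every row.
flat_rows = [
    {"SpeciesName": "Cardinal", "ConservationStatus": "Least Concern",
     "Date": "2020-12-31", "NumberSighted": 2},
    {"SpeciesName": "Cardinal", "ConservationStatus": "Least Concern",
     "Date": "2021-01-02", "NumberSighted": 4},
    {"SpeciesName": "Snowy Owl", "ConservationStatus": "Endangered",
     "Date": "2021-09-07", "NumberSighted": 2},
]

def determines(rows, lhs, rhs):
    """Return True if every value of `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

fd_holds = determines(flat_rows, "SpeciesName", "ConservationStatus")

# Decompose: one entry per species, and sightings without the repeated status.
species = {r["SpeciesName"]: r["ConservationStatus"] for r in flat_rows}
sightings = [
    {k: v for k, v in r.items() if k != "ConservationStatus"} for r in flat_rows
]

print(fd_holds)
print(species)
print(len(sightings))
```

Because the status depends only on the species, storing it once per species removes the redundancy without losing information; joining `sightings` back to `species` on `SpeciesName` reproduces the flat rows.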


#### **2.7 Query 2(a)**

In [None]:
%%sql
SELECT DISTINCT SpeciesName
FROM Sightings
WHERE Date >= '2021-01-01';

#### **2.8 Query 2(e)**

In [None]:
%%sql
SELECT s.SpeciesName, sp.ConservationStatus
FROM Sightings s
JOIN Species sp ON s.SpeciesName = sp.SpeciesName
WHERE s.Date >= '2021-01-01';
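
To check what queries 2(a) and 2(e) should return for the sample data without a running MySQL server, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in. The swap to SQLite and the simplified schema (no `Locations` table, dates stored as text) are assumptions made so that the block is self-contained; the two `SELECT` statements are otherwise the same as above, with an `ORDER BY` added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Species (SpeciesName TEXT PRIMARY KEY, ConservationStatus TEXT);
CREATE TABLE Sightings (
  SightingID INTEGER PRIMARY KEY,
  SpeciesName TEXT REFERENCES Species(SpeciesName),
  Date TEXT,
  NumberSighted INTEGER
);
INSERT INTO Species VALUES
  ('Cardinal', 'Least Concern'), ('Wood pigeon', 'Least Concern'),
  ('Bald Eagle', 'Vulnerable'), ('Snowy Owl', 'Endangered');
INSERT INTO Sightings (SpeciesName, Date, NumberSighted) VALUES
  ('Cardinal', '2020-12-31', 2), ('Cardinal', '2021-01-02', 4),
  ('Wood pigeon', '2021-07-15', 10), ('Bald Eagle', '2022-03-10', 1),
  ('Snowy Owl', '2021-09-07', 2);
""")

# Query 2(a): ISO-formatted dates compare correctly as text in SQLite.
q2a = conn.execute(
    "SELECT DISTINCT SpeciesName FROM Sightings "
    "WHERE Date >= '2021-01-01' ORDER BY SpeciesName"
).fetchall()

# Query 2(e): one row per qualifying sighting, with the species' status.
q2e = conn.execute(
    "SELECT s.SpeciesName, sp.ConservationStatus "
    "FROM Sightings s JOIN Species sp ON s.SpeciesName = sp.SpeciesName "
    "WHERE s.Date >= '2021-01-01' ORDER BY s.Date"
).fetchall()

print(q2a)
print(q2e)
conn.close()
```

The Cardinal sighting dated 2020-12-31 is excluded by the date filter in both queries. Query 2(e) returns one row per sighting rather than one per species, so a species seen several times since 2021 would appear repeatedly unless `DISTINCT` is added.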

#### **2.9 Transaction Example (2f)**

In [None]:
%%sql
START TRANSACTION;

INSERT INTO Sightings (SpeciesName, Date, NumberSighted, LocationID)
VALUES ('Bald Eagle', '2022-05-01', 1, 2);

UPDATE Species
SET ConservationStatus = 'Endangered'
WHERE SpeciesName = 'Bald Eagle';

SELECT * FROM Sightings;

-- COMMIT;  -- Uncomment to finalize
-- ROLLBACK; -- Uncomment to revert

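*(To test each scenario, uncomment either `COMMIT` or `ROLLBACK` and re-run the cell; each run inserts another sighting row. Note that ipython-sql's `autocommit` setting is on by default, so `ROLLBACK` may appear to have no effect unless you first run `%config SqlMagic.autocommit=False`.)*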
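
The all-or-nothing behaviour of a transaction can also be seen outside the notebook's MySQL setup. The following sketch uses Python's built-in `sqlite3` module as a stand-in (an assumption, chosen so the block runs anywhere); the helper names `record_sighting` and `snapshot` are hypothetical. It performs the same insert and update as one unit, first rolling back and then committing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Species (SpeciesName TEXT PRIMARY KEY, ConservationStatus TEXT)")
conn.execute("CREATE TABLE Sightings (SpeciesName TEXT, Date TEXT, NumberSighted INTEGER)")
conn.execute("INSERT INTO Species VALUES ('Bald Eagle', 'Vulnerable')")
conn.commit()

def record_sighting(commit):
    """Insert a sighting and update the status as one unit, then commit or roll back."""
    conn.execute("INSERT INTO Sightings VALUES ('Bald Eagle', '2022-05-01', 1)")
    conn.execute(
        "UPDATE Species SET ConservationStatus = 'Endangered' "
        "WHERE SpeciesName = 'Bald Eagle'"
    )
    if commit:
        conn.commit()
    else:
        conn.rollback()

def snapshot():
    """Return (number of sightings, current status of the only species row)."""
    count = conn.execute("SELECT COUNT(*) FROM Sightings").fetchone()[0]
    status = conn.execute("SELECT ConservationStatus FROM Species").fetchone()[0]
    return count, status

record_sighting(commit=False)
after_rollback = snapshot()   # both changes undone together

record_sighting(commit=True)
after_commit = snapshot()     # both changes persisted together

print(after_rollback, after_commit)
```

After the rollback neither the new sighting nor the status change is visible; after the commit both are. No intermediate state, with only one of the two changes applied, is ever persisted.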

### **3. XPath + MEI Example**

#### **3.1 Section B: XML + XPath (Music Encoding / MEI)**

We'll parse a small MEI-like snippet containing staves, layers, chords, and notes.
We can then use XPath to select, for example, the chords in the staff with `@n="2"` that contain a note with `@pname="f"`.

#### **3.2 Install and Import `lxml`**

In [None]:

!pip install lxml
from lxml import etree
from IPython.display import display, Markdown

print("lxml installed. Ready for XPath on MEI-like data.")

#### **3.3 Sample MEI-Like XML**
The sample contains three staves; `<staff n="2">` holds a `<layer>` with two `<chord>` elements, each containing `<note>` elements.

#### **3.4 Define MEI XML**

In [None]:

mei_data = """
<score>
  <staff n="1">
    <layer>
      <chord>
        <note pname="c" oct="4"/>
        <note pname="e" oct="4"/>
      </chord>
    </layer>
  </staff>

  <staff n="2">
    <layer>
      <chord>
        <note pname="f" oct="5"/>
        <note pname="a" oct="5"/>
      </chord>
      <chord>
        <note pname="g" oct="5"/>
        <note pname="c" oct="6"/>
      </chord>
    </layer>
  </staff>

  <staff n="3">
    <layer>
      <chord>
        <note pname="d" oct="4"/>
        <note pname="f" oct="4"/>
      </chord>
    </layer>
  </staff>
</score>
"""

root_mei = etree.fromstring(mei_data)
print("MEI-like XML parsed.")

#### **3.5 Query: Find All Chords in `staff n="2"` that Contain a Note `@pname="f"`**

#### **3.6 The XPath Query**

In [None]:

def display_xml(nodes):
    for node in nodes:
        # Convert to string
        xml_str = etree.tostring(node, pretty_print=True, encoding='unicode').strip()
        display(Markdown(f"```xml\n{xml_str}\n```"))

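# Select <chord> elements under staff n="2"; the predicate [note[@pname="f"]]
# keeps only chords that have a child <note> with pname="f".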
chords_with_f = root_mei.xpath('//staff[@n="2"]/layer/chord[note[@pname="f"]]')
display_xml(chords_with_f)
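
The XPath above combines path steps with a nested predicate. As a rough sketch of how those two parts cooperate, the following uses the standard library's `xml.etree.ElementTree` instead of `lxml` (an assumption made so the block runs without extra installs; ElementTree supports only a subset of XPath, so the nested predicate is applied in Python). The trimmed `xml_text` is a hypothetical variant of the sample in which staff 1 also holds an `f`, to show that the staff filter matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed variant of the sample score (two staves only).
xml_text = """
<score>
  <staff n="1"><layer><chord><note pname="f" oct="4"/></chord></layer></staff>
  <staff n="2"><layer>
    <chord><note pname="f" oct="5"/><note pname="a" oct="5"/></chord>
    <chord><note pname="g" oct="5"/><note pname="c" oct="6"/></chord>
  </layer></staff>
</score>
"""
score = ET.fromstring(xml_text)

# Step 1: the path steps select every chord under staff n="2".
candidates = score.findall("./staff[@n='2']/layer/chord")

# Step 2: the predicate keeps chords with at least one child note pname="f".
matches = [c for c in candidates if c.find("note[@pname='f']") is not None]

# Summarize each matching chord as pitch name + octave, e.g. "f5".
pitches = [[n.get("pname") + n.get("oct") for n in c.findall("note")] for c in matches]
print(len(candidates), pitches)
```

Only the first chord of staff 2 survives both steps. The `f` in staff 1 is never considered because the staff filter is applied before the note predicate.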

### **4. RDF / SPARQL (Music Encoding or Extended MEI)**

#### **4.1 RDF & SPARQL for Music Encoding**

We'll demonstrate how to represent a chord and notes in RDF, then query with SPARQL.

#### **4.2 Install & Import `rdflib`**

In [None]:

!pip install rdflib
from rdflib import Graph, Namespace, Literal, RDF, URIRef
from rdflib.namespace import FOAF, XSD

print("rdflib installed.")

#### **4.3 Creating a Simple Turtle File**

In [None]:
%%writefile chord_data.ttl
@prefix mei: <http://example.org/mei#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/chord1> a mei:Chord ;
    mei:stemDirection "up"^^xsd:string ;
    mei:hasNote <http://example.org/note1> ,
                <http://example.org/note2> .

<http://example.org/note1> a mei:Note ;
    mei:pname "f"^^xsd:string ;
    mei:oct   "5"^^xsd:string .

<http://example.org/note2> a mei:Note ;
    mei:pname "a"^^xsd:string ;
    mei:oct   "5"^^xsd:string .

#### **4.4 Load & SPARQL Query**

In [None]:
g = Graph()
g.parse("chord_data.ttl", format="turtle")

print("Triples loaded:", len(g))

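# Example SPARQL: find chords with stemDirection "up" that have a note with pname "f".
# The literals are typed as xsd:string to match chord_data.ttl exactly, because
# rdflib may not treat a plain "up" and "up"^^xsd:string as the same term.
q = """
PREFIX mei: <http://example.org/mei#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?chord
WHERE {
  ?chord a mei:Chord ;
         mei:stemDirection "up"^^xsd:string ;
         mei:hasNote ?note .
  ?note mei:pname "f"^^xsd:string .
}
"""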

res = g.query(q)
for row in res:
    print("Chord found:", row.chord)
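
To see what the query asks for without a SPARQL engine, the same graph pattern can be evaluated by hand over plain Python tuples. This is only an illustrative sketch, not how `rdflib` works internally: the `ex:` names are shorthand for the `http://example.org/...` IRIs in `chord_data.ttl`, literals are reduced to bare strings, and the helpers `subjects` and `objects` are hypothetical.

```python
MEI = "http://example.org/mei#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The chord_data.ttl statements as (subject, predicate, object) tuples.
triples = {
    ("ex:chord1", RDF_TYPE, MEI + "Chord"),
    ("ex:chord1", MEI + "stemDirection", "up"),
    ("ex:chord1", MEI + "hasNote", "ex:note1"),
    ("ex:chord1", MEI + "hasNote", "ex:note2"),
    ("ex:note1", RDF_TYPE, MEI + "Note"),
    ("ex:note1", MEI + "pname", "f"),
    ("ex:note1", MEI + "oct", "5"),
    ("ex:note2", RDF_TYPE, MEI + "Note"),
    ("ex:note2", MEI + "pname", "a"),
    ("ex:note2", MEI + "oct", "5"),
}

def subjects(pred, obj):
    """All subjects that have the given predicate/object pair."""
    return {s for s, p, o in triples if p == pred and o == obj}

def objects(subj, pred):
    """All objects reachable from a subject via the given predicate."""
    return {o for s, p, o in triples if s == subj and p == pred}

# ?chord must be a mei:Chord and have stemDirection "up" ...
chords = subjects(RDF_TYPE, MEI + "Chord") & subjects(MEI + "stemDirection", "up")
# ... and ?note must be shared between the hasNote and pname "f" patterns.
f_notes = subjects(MEI + "pname", "f")
found = sorted(c for c in chords if objects(c, MEI + "hasNote") & f_notes)

print(len(triples), found)
```

The shared variable `?note` is what ties the two halves of the query together: a chord qualifies only if at least one of its `hasNote` objects is also a subject with `pname` equal to `f`.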