
# Section 2: Relational Algebra and Relational Calculus

## Fundamental Unary Operations: SELECT and PROJECT

## Set-Theoretic Operations in Relational Algebra

## Binary Operations: JOIN and DIVISION

## Advanced Relational Operations

## Sample Queries Using Relational Algebra

## Tuple-Based Relational Calculus

## Domain-Based Relational Calculus

## Summary


# Relational Algebra and Relational Calculus
## A Deeper Conceptual Understanding

---

## 6.1 Core Unary Operations: Selection and Projection

These operations apply to a single relation and shape the output by filtering rows or columns.

### Selection (σ): Row Filtering

- **Purpose**: Retrieves rows satisfying a specified condition.
- **Type**: Horizontal operation (reduces rows)

**Key Traits**:
- Schema remains unchanged.
- Conditions use comparison operators (=, <, >, ≠) and logical operators (AND, OR, NOT).

### Projection (π): Column Filtering

- **Purpose**: Retrieves specific attributes (columns).
- **Type**: Vertical operation (reduces columns)

**Key Traits**:
- Eliminates duplicates (since relations are sets).
- Produces a relation with only the selected attributes.
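
The duplicate-elimination trait matters in practice because SQL works on bags, not sets: a plain `SELECT` keeps duplicates and only `SELECT DISTINCT` matches the algebraic π. The following is a minimal sketch on a hypothetical three-row `Passenger` table (the table and its rows are made up for illustration and are not the Titanic data used in this notebook).

```python
import sqlite3

# Hypothetical toy relation, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Passenger (name TEXT, pclass INTEGER, town TEXT)")
conn.executemany(
    "INSERT INTO Passenger VALUES (?, ?, ?)",
    [("Ann", 1, "Cherbourg"), ("Bob", 3, "Southampton"), ("Cy", 1, "Southampton")],
)

# Selection: the schema is unchanged, only rows are filtered.
selected = conn.execute("SELECT * FROM Passenger WHERE pclass = 1").fetchall()

# Projection as SQL evaluates it by default: duplicates are kept (a bag).
bag = conn.execute("SELECT pclass FROM Passenger ORDER BY pclass").fetchall()

# Projection as relational algebra defines it: duplicates are removed (a set).
projected = conn.execute(
    "SELECT DISTINCT pclass FROM Passenger ORDER BY pclass"
).fetchall()

print("selection: ", selected)
print("bag:       ", bag)
print("projection:", projected)
conn.close()
```

The bag result contains the value 1 twice, while the `DISTINCT` result contains it once.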

---

In [667]:
import sqlite3
import pandas as pd
import seaborn as sns

# Load Titanic dataset and drop rows with missing data
titanic = sns.load_dataset("titanic").dropna()

# Connect to SQLite in-memory database
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Load DataFrame into SQLite
titanic.to_sql("Titanic", conn, index=False, if_exists="replace")

# --- Selection: Survived Passengers in 1st Class ---
print("Selection: Survived Passengers in 1st Class\n")
cursor.execute("""
SELECT * FROM Titanic
WHERE survived = 1 AND pclass = 1
""")
for row in cursor.fetchall():
    print(row)

# --- Projection: Unique Embarkation Towns ---
print("\nProjection: Unique Embarkation Towns\n")
cursor.execute("""
SELECT DISTINCT embark_town FROM Titanic
""")
for row in cursor.fetchall():
    print(row)

# --- Combined Selection + Projection: Female Survivors Under 30 ---
print("\nCombined Selection + Projection: Female Survivors Under 30\n")
cursor.execute("""
SELECT sex, age, fare, class FROM Titanic
WHERE sex = 'female' AND age < 30 AND survived = 1
""")
for row in cursor.fetchall():
    print(row)

conn.close()


Selection: Survived Passengers in 1st Class

(1, 1, 'female', 38.0, 1, 0, 71.2833, 'C', 'First', 'woman', 0, 'C', 'Cherbourg', 'yes', 0)
(1, 1, 'female', 35.0, 1, 0, 53.1, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0)
(1, 1, 'female', 58.0, 0, 0, 26.55, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 1)
(1, 1, 'male', 28.0, 0, 0, 35.5, 'S', 'First', 'man', 1, 'A', 'Southampton', 'yes', 1)
(1, 1, 'female', 49.0, 1, 0, 76.7292, 'C', 'First', 'woman', 0, 'D', 'Cherbourg', 'yes', 0)
(1, 1, 'female', 23.0, 3, 2, 263.0, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0)
(1, 1, 'male', 23.0, 0, 1, 63.3583, 'C', 'First', 'man', 1, 'D', 'Cherbourg', 'yes', 0)
(1, 1, 'female', 19.0, 0, 2, 26.2833, 'S', 'First', 'woman', 0, 'D', 'Southampton', 'yes', 0)
(1, 1, 'female', 22.0, 1, 0, 66.6, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0)
(1, 1, 'female', 44.0, 0, 0, 27.7208, 'C', 'First', 'woman', 0, 'B', 'Cherbourg', 'yes', 1)
(1, 1, 'female', 58.0, 0, 0, 146.5208, 'C', 


## 6.2 Set-Based Operations in Relational Algebra

These operations are grounded in set theory and assume relations contain unique tuples.

### Union (∪)

- Combines tuples from both relations, eliminating duplicates.

**Requirements**:
- Same number of attributes.
- Corresponding attributes from the same domains.

### Set Difference (−)

- Yields tuples in the first relation that are absent from the second.
- Also requires union compatibility.

### Intersection (∩)

- Shows tuples common to both relations.
- Can be derived from other operations.
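
One standard derivation is R ∩ S = R − (R − S). As a quick check, the sketch below models two union-compatible relations as Python sets of tuples; the relation names `R` and `S` and their rows are hypothetical.

```python
# Hypothetical relations, each a set of (age, embark_town) tuples.
R = {(22.0, "Southampton"), (35.0, "Cherbourg"), (58.0, "Southampton")}
S = {(22.0, "Southampton"), (19.0, "Cherbourg")}

union = R | S          # R ∪ S
difference = R - S     # R − S
intersection = R & S   # R ∩ S, computed directly
derived = R - (R - S)  # R ∩ S, derived from set difference alone

print(len(union), difference, intersection)
print(derived == intersection)
```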

### Cartesian Product (×)

- Produces all possible tuple combinations from two relations.
- Can result in large datasets—typically refined with selection or joins.

---

## 6.3 Binary Operations: Join and Division

These operations involve two relations and express meaningful data associations.

### Join Operations (⨝)

- **Theta Join (θ-Join)**: Combines tuples using any comparison condition (e.g., =, <, >).
- **Equi-Join**: A theta join that uses only equality conditions; the result keeps both copies of the join attributes.
- **Natural Join**: An equi-join on all same-named attributes that keeps only one copy of each shared attribute, which usually gives a cleaner result.
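
The three variants can be compared side by side in SQLite. This is a sketch under stated assumptions: the `Passenger` and `Ticket` tables, the `budget` column, and all rows are hypothetical and exist only to make the differences visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, invented for illustration only.
conn.executescript("""
CREATE TABLE Passenger (name TEXT, ticket_no INTEGER, budget REAL);
CREATE TABLE Ticket (ticket_no INTEGER, fare REAL);
INSERT INTO Passenger VALUES ('Ann', 101, 80.0), ('Bob', 102, 10.0);
INSERT INTO Ticket VALUES (101, 71.28), (102, 7.65);
""")

# Theta join: any comparison is allowed, here "tickets cheaper than the budget".
theta = conn.execute("""
SELECT p.name, t.ticket_no
FROM Passenger p JOIN Ticket t ON t.fare < p.budget
ORDER BY p.name, t.ticket_no
""").fetchall()

# Equi-join: equality only; ticket_no appears twice in the result.
equi = conn.execute("""
SELECT * FROM Passenger p JOIN Ticket t ON p.ticket_no = t.ticket_no
ORDER BY p.name
""").fetchall()

# Natural join: equality on the shared column name, which is kept only once.
natural = conn.execute(
    "SELECT * FROM Passenger NATURAL JOIN Ticket ORDER BY name"
).fetchall()

print(theta)
print(len(equi[0]), "columns in the equi-join")
print(len(natural[0]), "columns in the natural join")
conn.close()
```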

### Division (÷)

- Used for "for all" type queries.
- Returns all values from one relation that are related to every tuple in another.
- Common use case: finding students who have taken all courses offered.
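
SQL has no division operator, so the "for all" condition is usually rewritten as a double negation: keep a student if there is no course that the student has not taken. The sketch below illustrates this on hypothetical `Takes` and `Course` tables. It corresponds to the algebraic identity R ÷ S = π_A(R) − π_A((π_A(R) × S) − R), where A is the set of attributes of R that are not in S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, invented for illustration only.
conn.executescript("""
CREATE TABLE Takes (student TEXT, course TEXT);
CREATE TABLE Course (course TEXT);
INSERT INTO Takes VALUES
  ('Ann', 'DB'), ('Ann', 'OS'), ('Bob', 'DB'), ('Cy', 'OS'), ('Cy', 'DB');
INSERT INTO Course VALUES ('DB'), ('OS');
""")

# Takes ÷ Course: students for whom no course exists that they have not taken.
division = conn.execute("""
SELECT DISTINCT t.student
FROM Takes t
WHERE NOT EXISTS (
    SELECT 1 FROM Course c
    WHERE NOT EXISTS (
        SELECT 1 FROM Takes t2
        WHERE t2.student = t.student AND t2.course = c.course
    )
)
ORDER BY t.student
""").fetchall()

print(division)
conn.close()
```

Bob is excluded because he has taken only one of the two courses.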

---

In [668]:
import sqlite3
import pandas as pd
import seaborn as sns

# Load and clean Titanic dataset
titanic = sns.load_dataset("titanic").dropna()

# Subset A: Female passengers
female = titanic[titanic['sex'] == 'female'][['age', 'embark_town']]

# Subset B: Passengers under 30
under_30 = titanic[titanic['age'] < 30][['age', 'embark_town']]

# Ensure union compatibility (same columns and types)
female.columns = ['age', 'embark_town']
under_30.columns = ['age', 'embark_town']

# Create SQLite in-memory DB and insert tables
conn = sqlite3.connect(":memory:")
female.to_sql("Female", conn, index=False, if_exists="replace")
under_30.to_sql("Under30", conn, index=False, if_exists="replace")
cursor = conn.cursor()

# --- UNION: distinct (age, embark_town) pairs of passengers who are female OR under 30 ---
print("\nUNION (female ∪ under_30):\n")
cursor.execute("""
SELECT DISTINCT * FROM Female
UNION
SELECT DISTINCT * FROM Under30
""")
for row in cursor.fetchall():
    print(row)

# --- SET DIFFERENCE: Females who are NOT under 30 ---
print("\nSET DIFFERENCE (female − under_30):\n")
cursor.execute("""
SELECT DISTINCT * FROM Female
EXCEPT
SELECT DISTINCT * FROM Under30
""")
for row in cursor.fetchall():
    print(row)

# --- INTERSECTION: Passengers who are female AND under 30 ---
print("\nINTERSECTION (female ∩ under_30):\n")
cursor.execute("""
SELECT DISTINCT * FROM Female
INTERSECT
SELECT DISTINCT * FROM Under30
""")
for row in cursor.fetchall():
    print(row)

# --- CARTESIAN PRODUCT: every pairing of rows from two 3-row samples (kept small for readability) ---
print("\nCARTESIAN PRODUCT (first 3 female × first 3 under_30):\n")
cursor.execute("CREATE TABLE F3 AS SELECT * FROM Female LIMIT 3")
cursor.execute("CREATE TABLE U3 AS SELECT * FROM Under30 LIMIT 3")

cursor.execute("""
SELECT F3.age, F3.embark_town, U3.age, U3.embark_town
FROM F3, U3
""")
for row in cursor.fetchall():
    print(row)

conn.close()



UNION (female ∪ under_30):

(0.92, 'Southampton')
(1.0, 'Southampton')
(2.0, 'Southampton')
(3.0, 'Southampton')
(4.0, 'Southampton')
(6.0, 'Southampton')
(11.0, 'Southampton')
(14.0, 'Southampton')
(15.0, 'Southampton')
(16.0, 'Cherbourg')
(16.0, 'Southampton')
(17.0, 'Cherbourg')
(17.0, 'Southampton')
(18.0, 'Cherbourg')
(18.0, 'Southampton')
(19.0, 'Cherbourg')
(19.0, 'Southampton')
(21.0, 'Cherbourg')
(21.0, 'Southampton')
(22.0, 'Cherbourg')
(22.0, 'Southampton')
(23.0, 'Cherbourg')
(23.0, 'Southampton')
(24.0, 'Cherbourg')
(24.0, 'Southampton')
(25.0, 'Cherbourg')
(25.0, 'Southampton')
(26.0, 'Cherbourg')
(27.0, 'Cherbourg')
(27.0, 'Southampton')
(28.0, 'Southampton')
(29.0, 'Southampton')
(30.0, 'Cherbourg')
(30.0, 'Southampton')
(31.0, 'Cherbourg')
(31.0, 'Southampton')
(32.0, 'Cherbourg')
(32.5, 'Southampton')
(33.0, 'Queenstown')
(33.0, 'Southampton')
(34.0, 'Southampton')
(35.0, 'Southampton')
(36.0, 'Cherbourg')
(36.0, 'Southampton')
(38.0, 'Cherbourg')
(39.0, 'Cherbourg')


## 6.4 Enhanced Relational Operations

These additional operators are introduced in extended relational algebra for practical use.

### Renaming (ρ)

- Assigns new names to relations or attributes.
- Useful for self-joins or naming conflict resolution.
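
A self-join is the case where renaming is unavoidable, because the same relation appears twice and each copy needs its own name. The following sketch assumes a hypothetical `Passenger(name, cabin)` table and finds pairs of passengers who share a cabin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, invented for illustration only.
conn.executescript("""
CREATE TABLE Passenger (name TEXT, cabin TEXT);
INSERT INTO Passenger VALUES ('Ann', 'C85'), ('Bob', 'C85'), ('Cy', 'E46');
""")

# The aliases a and b play the role of ρ: two names for the same relation.
# a.name < b.name drops self-pairs and keeps each pair only once.
cabin_mates = conn.execute("""
SELECT a.name, b.name, a.cabin
FROM Passenger AS a JOIN Passenger AS b
  ON a.cabin = b.cabin AND a.name < b.name
""").fetchall()

print(cabin_mates)
conn.close()
```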

### Extended Projection

- Allows arithmetic expressions and renaming within projection.

### Aggregation and Grouping

- Includes functions such as COUNT, AVG, SUM, MIN, and MAX.
- Not part of classical relational algebra but widely used in practice.

---


In [669]:
import sqlite3
import pandas as pd
import seaborn as sns

# Load Titanic dataset and clean missing values
titanic = sns.load_dataset("titanic").dropna()

# Connect to in-memory SQLite
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Load Titanic into SQLite
titanic.to_sql("Titanic", conn, index=False, if_exists="replace")

# --- 1. RENAMING (ρ): Using aliases for relation and attributes ---
print("\n1. RENAMING (aliasing table and columns):\n")
cursor.execute("""
SELECT t.age AS passenger_age, t.fare AS ticket_fare
FROM Titanic AS t
LIMIT 5
""")
for row in cursor.fetchall():
    print(row)

# --- 2. EXTENDED PROJECTION: Arithmetic expression + alias ---
print("\n2. EXTENDED PROJECTION (fare per person):\n")
cursor.execute("""
SELECT fare / (sibsp + parch + 1) AS fare_per_person
FROM Titanic
LIMIT 5
""")
for row in cursor.fetchall():
    print(row)

# --- 3. AGGREGATION AND GROUPING: COUNT, AVG, SUM, etc. ---

# Count passengers per class
print("\n3.1 COUNT: Number of passengers in each class:\n")
cursor.execute("""
SELECT class, COUNT(*) AS passenger_count
FROM Titanic
GROUP BY class
""")
for row in cursor.fetchall():
    print(row)

# Average age per gender
print("\n3.2 AVG: Average age by sex:\n")
cursor.execute("""
SELECT sex, AVG(age) AS average_age
FROM Titanic
GROUP BY sex
""")
for row in cursor.fetchall():
    print(row)

# Total fare by embark_town
print("\n3.3 SUM: Total fare collected by embark_town:\n")
cursor.execute("""
SELECT embark_town, SUM(fare) AS total_fare
FROM Titanic
GROUP BY embark_town
""")
for row in cursor.fetchall():
    print(row)

# Minimum and Maximum fare by class
print("\n3.4 MIN/MAX: Min and Max fare by class:\n")
cursor.execute("""
SELECT class, MIN(fare) AS min_fare, MAX(fare) AS max_fare
FROM Titanic
GROUP BY class
""")
for row in cursor.fetchall():
    print(row)

conn.close()



1. RENAMING (aliasing table and columns):

(38.0, 71.2833)
(35.0, 53.1)
(54.0, 51.8625)
(4.0, 16.7)
(58.0, 26.55)

2. EXTENDED PROJECTION (fare per person):

(35.64165,)
(26.55,)
(51.8625,)
(5.566666666666666,)
(26.55,)

3.1 COUNT: Number of passengers in each class:

('First', 157)
('Second', 15)
('Third', 10)

3.2 AVG: Average age by sex:

('female', 32.67613636363637)
('male', 38.38212765957447)

3.3 SUM: Total fare collected by embark_town:

('Cherbourg', 6717.262700000001)
('Queenstown', 180.0)
('Southampton', 7466.129099999999)

3.4 MIN/MAX: Min and Max fare by class:

('First', 0.0, 512.3292)
('Second', 10.5, 39.0)
('Third', 7.65, 16.7)


---

## 6.5 Tuple Relational Calculus (TRC)

A declarative query language based on first-order logic, using tuple variables.

### Structure

- Defined as: { t | P(t) }, read as "the set of all tuples t such that P(t) is true", where:
  - t is a tuple variable
  - P(t) is a logical predicate

### Features

- Expresses what to retrieve, not how.
- Supports logical connectives (AND, OR, NOT) and quantifiers (∃ "there exists", ∀ "for all").
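
A TRC expression reads much like a Python set comprehension: the loop variable is the tuple variable, the `if` clause is the predicate, and `any` / `all` stand in for ∃ and ∀. The sketch below uses a hypothetical three-row relation and a namedtuple `P` defined only for this example.

```python
from collections import namedtuple

# Hypothetical relation, invented for illustration only.
P = namedtuple("P", "name sex pclass survived town")
Titanic = {
    P("Ann", "female", 1, 1, "Cherbourg"),
    P("Bob", "male", 3, 0, "Southampton"),
    P("Cy", "male", 1, 1, "Southampton"),
}

# { t | t ∈ Titanic ∧ t.pclass = 1 ∧ t.survived = 1 }
q1 = {t for t in Titanic if t.pclass == 1 and t.survived == 1}

# ∃: passengers for whom another passenger from the same town exists.
q2 = {
    t.name
    for t in Titanic
    if any(u.town == t.town and u.name != t.name for u in Titanic)
}

# ∀: towns where every passenger from that town survived.
q3 = {
    t.town
    for t in Titanic
    if all(u.survived == 1 for u in Titanic if u.town == t.town)
}

print(sorted(t.name for t in q1), sorted(q2), sorted(q3))
```

None of the three queries says how to scan or index the data, which is the declarative character of the calculus.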

---

In [670]:
import sqlite3
import pandas as pd
import seaborn as sns

# Reload dataset and reconnect
titanic = sns.load_dataset("titanic").dropna()
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
titanic.to_sql("Titanic", conn, index=False, if_exists="replace")


182

### Query 1:

**TRC:**  
{ t | t ∈ Titanic ∧ t.pclass = 1 ∧ t.survived = 1 }


In [671]:
cursor.execute("SELECT * FROM Titanic WHERE pclass = 1 AND survived = 1")
print(cursor.fetchall())


[(1, 1, 'female', 38.0, 1, 0, 71.2833, 'C', 'First', 'woman', 0, 'C', 'Cherbourg', 'yes', 0), (1, 1, 'female', 35.0, 1, 0, 53.1, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0), (1, 1, 'female', 58.0, 0, 0, 26.55, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 1), (1, 1, 'male', 28.0, 0, 0, 35.5, 'S', 'First', 'man', 1, 'A', 'Southampton', 'yes', 1), (1, 1, 'female', 49.0, 1, 0, 76.7292, 'C', 'First', 'woman', 0, 'D', 'Cherbourg', 'yes', 0), (1, 1, 'female', 23.0, 3, 2, 263.0, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0), (1, 1, 'male', 23.0, 0, 1, 63.3583, 'C', 'First', 'man', 1, 'D', 'Cherbourg', 'yes', 0), (1, 1, 'female', 19.0, 0, 2, 26.2833, 'S', 'First', 'woman', 0, 'D', 'Southampton', 'yes', 0), (1, 1, 'female', 22.0, 1, 0, 66.6, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0), (1, 1, 'female', 44.0, 0, 0, 27.7208, 'C', 'First', 'woman', 0, 'B', 'Cherbourg', 'yes', 1), (1, 1, 'female', 58.0, 0, 0, 146.5208, 'C', 'First', 'woman', 0, 'B', 'Cherbou

### Query 2:

**TRC:**  
{ t | t ∈ Titanic ∧ t.sex = 'female' ∧ t.age < 30 }


In [672]:
cursor.execute("SELECT * FROM Titanic WHERE sex = 'female' AND age < 30")
print(cursor.fetchall())


[(1, 3, 'female', 4.0, 1, 1, 16.7, 'S', 'Third', 'child', 0, 'G', 'Southampton', 'yes', 0), (1, 2, 'female', 29.0, 0, 0, 10.5, 'S', 'Second', 'woman', 0, 'F', 'Southampton', 'yes', 1), (1, 1, 'female', 23.0, 3, 2, 263.0, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0), (1, 1, 'female', 19.0, 0, 2, 26.2833, 'S', 'First', 'woman', 0, 'D', 'Southampton', 'yes', 0), (1, 1, 'female', 22.0, 1, 0, 66.6, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0), (0, 3, 'female', 2.0, 0, 1, 10.4625, 'S', 'Third', 'child', 0, 'G', 'Southampton', 'no', 0), (0, 3, 'female', 29.0, 1, 1, 10.4625, 'S', 'Third', 'woman', 0, 'G', 'Southampton', 'no', 0), (1, 1, 'female', 19.0, 1, 0, 91.0792, 'C', 'First', 'woman', 0, 'B', 'Cherbourg', 'yes', 0), (0, 1, 'female', 2.0, 1, 2, 151.55, 'S', 'First', 'child', 0, 'C', 'Southampton', 'no', 0), (1, 1, 'female', 17.0, 1, 0, 108.9, 'C', 'First', 'woman', 0, 'C', 'Cherbourg', 'yes', 0), (1, 1, 'female', 24.0, 0, 0, 83.1583, 'C', 'First', 'woman', 0, 'C', 'Ch

### Query 3 (with ∃ Existential Quantifier):

**TRC:**  
{ t | t ∈ Titanic ∧ ∃u (u ∈ Titanic ∧ u.embark_town = 'Southampton' ∧ t.passenger_id = u.passenger_id) }  
(The dataset has no passenger_id column, so the SQL below simulates the query with a plain filter on embark_town.)


In [673]:
cursor.execute("SELECT * FROM Titanic WHERE embark_town = 'Southampton'")
print(cursor.fetchall())


[(1, 1, 'female', 35.0, 1, 0, 53.1, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0), (0, 1, 'male', 54.0, 0, 0, 51.8625, 'S', 'First', 'man', 1, 'E', 'Southampton', 'no', 1), (1, 3, 'female', 4.0, 1, 1, 16.7, 'S', 'Third', 'child', 0, 'G', 'Southampton', 'yes', 0), (1, 1, 'female', 58.0, 0, 0, 26.55, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 1), (1, 2, 'male', 34.0, 0, 0, 13.0, 'S', 'Second', 'man', 1, 'D', 'Southampton', 'yes', 1), (1, 1, 'male', 28.0, 0, 0, 35.5, 'S', 'First', 'man', 1, 'A', 'Southampton', 'yes', 1), (0, 1, 'male', 19.0, 3, 2, 263.0, 'S', 'First', 'man', 1, 'C', 'Southampton', 'no', 0), (0, 1, 'male', 45.0, 1, 0, 83.475, 'S', 'First', 'man', 1, 'C', 'Southampton', 'no', 0), (1, 2, 'female', 29.0, 0, 0, 10.5, 'S', 'Second', 'woman', 0, 'F', 'Southampton', 'yes', 1), (0, 3, 'male', 25.0, 0, 0, 7.65, 'S', 'Third', 'man', 1, 'F', 'Southampton', 'no', 1), (1, 1, 'female', 23.0, 3, 2, 263.0, 'S', 'First', 'woman', 0, 'C', 'Southampton', 'yes', 0), (0, 1,


## 6.6 Domain Relational Calculus (DRC)

Similar to TRC but uses domain variables (individual attribute values).

### Structure

- Defined as: a list of domain variables such that a predicate is true.

### Features

- Forms the basis of Query-by-Example (QBE); SQL's row-oriented syntax is closer to TRC.
- Requires explicit attribute ordering and naming.
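
The positional nature of DRC can be sketched with tuple unpacking in a Python comprehension: each unpacked name is a domain variable bound to one attribute position. The relation below is hypothetical and has only three attributes (sex, age, class) to keep the example short.

```python
# Hypothetical relation with attributes in the fixed order (sex, age, class).
Titanic = [
    ("female", 22.0, "First"),
    ("male", 40.0, "Third"),
    ("female", 35.0, "First"),
    ("female", 4.0, "Third"),
]

# { <s, a> | ∃c (<s, a, c> ∈ Titanic ∧ s = 'female' ∧ a < 30) }
# s, a and c are domain variables; c is bound but not returned.
result = {(s, a) for (s, a, c) in Titanic if s == "female" and a < 30}

print(sorted(result))
```

Swapping the order of `s` and `a` in the unpacking would silently bind them to the wrong attributes, which is why DRC depends on explicit attribute ordering.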

---

**DRC:**  
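{ <sex, age> | Titanic(..., sex, age, ...) ∧ sex = 'female' ∧ age < 30 }, where the remaining attributes of Titanic are existentially quantified  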


In [674]:
cursor.execute("SELECT sex, age FROM Titanic WHERE sex = 'female' AND age < 30")
print(cursor.fetchall())


[('female', 4.0), ('female', 29.0), ('female', 23.0), ('female', 19.0), ('female', 22.0), ('female', 2.0), ('female', 29.0), ('female', 19.0), ('female', 2.0), ('female', 17.0), ('female', 24.0), ('female', 18.0), ('female', 16.0), ('female', 24.0), ('female', 24.0), ('female', 22.0), ('female', 24.0), ('female', 23.0), ('female', 24.0), ('female', 14.0), ('female', 23.0), ('female', 25.0), ('female', 16.0), ('female', 22.0), ('female', 18.0), ('female', 4.0), ('female', 21.0), ('female', 24.0), ('female', 15.0), ('female', 18.0), ('female', 24.0), ('female', 27.0), ('female', 29.0), ('female', 21.0), ('female', 17.0), ('female', 27.0), ('female', 16.0), ('female', 19.0)]


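**DRC (informal notation; pure relational calculus has no grouping or aggregate functions, so this borrows them from extended relational algebra):**  
<class, COUNT(*)> | group by class  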

In [675]:
cursor.execute("SELECT class, COUNT(*) FROM Titanic GROUP BY class")
print(cursor.fetchall())


[('First', 157), ('Second', 15), ('Third', 10)]


**DRC (informal notation; AVG and grouping are extensions, not part of pure relational calculus):**  
<embark_town, AVG(fare)> | group by embark_town


In [676]:
cursor.execute("SELECT embark_town, AVG(fare) FROM Titanic GROUP BY embark_town")
print(cursor.fetchall())


[('Cherbourg', 103.3425030769231), ('Queenstown', 90.0), ('Southampton', 64.92286173913043)]



## 6.7 Summary of Key Concepts

| Concept              | Relational Algebra                      | Relational Calculus                   |
|----------------------|-----------------------------------------|----------------------------------------|
| **Approach**         | Procedural (how to get result)          | Declarative (what to retrieve)         |
| **Foundation**       | Set operations and algebraic operators  | First-order predicate logic            |
| **Variants**         | Basic + Extended (grouping, aggregates) | TRC and DRC                            |
| **Relation to SQL**  | Forms structural backbone               | Shapes declarative syntax              |
| **Expressive Power** | Equivalent to relational calculus       | Equivalent to relational algebra       |
| **Ideal Use Case**   | Query optimization, planning            | Logical reasoning, expressing queries  |


# Relational Algebra Operations from Set Theory (Applied to the Titanic Dataset)

Relational algebra includes operations that come from standard set theory. These operations—UNION, INTERSECTION, and SET DIFFERENCE (MINUS)—allow us to merge or compare rows (tuples) from two datasets (relations) in various ways. To apply them, both relations must have the same number and type of columns. This compatibility ensures that the combined rows are meaningful and properly aligned.

Let’s understand each operation through examples based on the Titanic dataset, which contains information about passengers on the RMS Titanic.

---

##  UNION Operation

This operation merges two sets of rows and removes duplicates. It returns all the distinct tuples that are present in either or both datasets.

**Titanic example:**  
Suppose we have two sets of passengers:

- One set includes passengers who embarked from Southampton  
- Another includes passengers who survived the sinking

If we apply the UNION operation, we get a list of unique passengers who either embarked from Southampton, survived, or both.

---


### UNION: Embarked from Southampton OR Survived

In [677]:
cursor.execute("""
SELECT DISTINCT sex, age, embark_town FROM Titanic WHERE embark_town = 'Southampton'
UNION
SELECT DISTINCT sex, age, embark_town FROM Titanic WHERE survived = 1
""")
print(cursor.fetchall())


[('female', 2.0, 'Southampton'), ('female', 4.0, 'Southampton'), ('female', 14.0, 'Southampton'), ('female', 15.0, 'Southampton'), ('female', 16.0, 'Cherbourg'), ('female', 16.0, 'Southampton'), ('female', 17.0, 'Cherbourg'), ('female', 17.0, 'Southampton'), ('female', 18.0, 'Cherbourg'), ('female', 18.0, 'Southampton'), ('female', 19.0, 'Cherbourg'), ('female', 19.0, 'Southampton'), ('female', 21.0, 'Cherbourg'), ('female', 21.0, 'Southampton'), ('female', 22.0, 'Cherbourg'), ('female', 22.0, 'Southampton'), ('female', 23.0, 'Cherbourg'), ('female', 23.0, 'Southampton'), ('female', 24.0, 'Cherbourg'), ('female', 24.0, 'Southampton'), ('female', 25.0, 'Southampton'), ('female', 27.0, 'Southampton'), ('female', 29.0, 'Southampton'), ('female', 30.0, 'Cherbourg'), ('female', 30.0, 'Southampton'), ('female', 31.0, 'Cherbourg'), ('female', 31.0, 'Southampton'), ('female', 32.0, 'Cherbourg'), ('female', 32.5, 'Southampton'), ('female', 33.0, 'Queenstown'), ('female', 33.0, 'Southampton'), (


## INTERSECTION Operation

This operation returns only those tuples that are common to both datasets.

**Titanic example:**  
Using the same sets as above, the INTERSECTION would return the passengers who both embarked from Southampton and survived. This helps narrow down the dataset to those who satisfy both criteria.

---

### INTERSECTION: Embarked from Southampton AND Survived

In [678]:
cursor.execute("""
SELECT DISTINCT sex, age, embark_town FROM Titanic WHERE embark_town = 'Southampton'
INTERSECT
SELECT DISTINCT sex, age, embark_town FROM Titanic WHERE survived = 1
""")
print(cursor.fetchall())


[('female', 4.0, 'Southampton'), ('female', 14.0, 'Southampton'), ('female', 15.0, 'Southampton'), ('female', 16.0, 'Southampton'), ('female', 17.0, 'Southampton'), ('female', 18.0, 'Southampton'), ('female', 19.0, 'Southampton'), ('female', 21.0, 'Southampton'), ('female', 22.0, 'Southampton'), ('female', 23.0, 'Southampton'), ('female', 24.0, 'Southampton'), ('female', 27.0, 'Southampton'), ('female', 29.0, 'Southampton'), ('female', 30.0, 'Southampton'), ('female', 31.0, 'Southampton'), ('female', 32.5, 'Southampton'), ('female', 33.0, 'Southampton'), ('female', 34.0, 'Southampton'), ('female', 35.0, 'Southampton'), ('female', 36.0, 'Southampton'), ('female', 39.0, 'Southampton'), ('female', 40.0, 'Southampton'), ('female', 43.0, 'Southampton'), ('female', 47.0, 'Southampton'), ('female', 48.0, 'Southampton'), ('female', 49.0, 'Southampton'), ('female', 51.0, 'Southampton'), ('female', 52.0, 'Southampton'), ('female', 53.0, 'Southampton'), ('female', 58.0, 'Southampton'), ('female',


## SET DIFFERENCE (MINUS) Operation

This operation gives the tuples that are present in one dataset but not in the other.

**Titanic example:**  
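If we take the list of all passengers who embarked from Southampton, and subtract from it the list of those who survived, the result will show Southampton passengers who did not survive. Note that the query below compares (sex, age, embark_town) tuples rather than individual passengers, so a non-survivor who shares all three values with a survivor is removed as well.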

The difference is directional—subtracting A from B is not the same as subtracting B from A. So in this case, if we instead subtract Southampton passengers from survivors, we’d get survivors who did not embark from Southampton.
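
The asymmetry is easy to see by running `EXCEPT` in both directions. This is a minimal sketch on two hypothetical one-column tables; the names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, invented for illustration only.
conn.executescript("""
CREATE TABLE Southampton (name TEXT);
CREATE TABLE Survivors (name TEXT);
INSERT INTO Southampton VALUES ('Ann'), ('Bob');
INSERT INTO Survivors VALUES ('Ann'), ('Cy');
""")

# Southampton − Survivors: embarked at Southampton but did not survive.
a_minus_b = conn.execute(
    "SELECT name FROM Southampton EXCEPT SELECT name FROM Survivors"
).fetchall()

# Survivors − Southampton: survived but embarked somewhere else.
b_minus_a = conn.execute(
    "SELECT name FROM Survivors EXCEPT SELECT name FROM Southampton"
).fetchall()

print(a_minus_b, b_minus_a)
conn.close()
```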

---

### SET DIFFERENCE: Southampton passengers minus survivors

In [679]:
cursor.execute("""
SELECT DISTINCT sex, age, embark_town FROM Titanic WHERE embark_town = 'Southampton'
EXCEPT
SELECT DISTINCT sex, age, embark_town FROM Titanic WHERE survived = 1
""")
print(cursor.fetchall())


[('female', 2.0, 'Southampton'), ('female', 25.0, 'Southampton'), ('female', 57.0, 'Southampton'), ('male', 19.0, 'Southampton'), ('male', 21.0, 'Southampton'), ('male', 25.0, 'Southampton'), ('male', 29.0, 'Southampton'), ('male', 33.0, 'Southampton'), ('male', 36.5, 'Southampton'), ('male', 39.0, 'Southampton'), ('male', 40.0, 'Southampton'), ('male', 45.0, 'Southampton'), ('male', 45.5, 'Southampton'), ('male', 46.0, 'Southampton'), ('male', 47.0, 'Southampton'), ('male', 50.0, 'Southampton'), ('male', 54.0, 'Southampton'), ('male', 55.0, 'Southampton'), ('male', 61.0, 'Southampton'), ('male', 62.0, 'Southampton'), ('male', 64.0, 'Southampton'), ('male', 65.0, 'Southampton'), ('male', 70.0, 'Southampton')]



##  Important Notes on Compatibility

For these operations to work:

- The two relations must have the same number of attributes (columns)  
- Corresponding attributes must come from compatible domains (types), for example age paired with age and town paired with town

This condition is called **union compatibility**. If it’s not met, the operations won’t produce a valid result.
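
How strictly this is enforced depends on the database. The sketch below, on two hypothetical tables `A` and `B`, suggests that SQLite rejects a mismatch in the number of columns but, being dynamically typed, does not check that the column types agree. Other database systems may be stricter about types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, invented for illustration only.
conn.executescript("""
CREATE TABLE A (age REAL, town TEXT);
CREATE TABLE B (age REAL);
INSERT INTO A VALUES (22.0, 'Southampton');
INSERT INTO B VALUES (35.0);
""")

# Different number of columns: SQLite refuses to run the query.
try:
    conn.execute("SELECT age, town FROM A UNION SELECT age FROM B").fetchall()
    arity_error = None
except sqlite3.Error as exc:
    arity_error = str(exc)
print("Arity mismatch:", arity_error)

# Same number of columns but different domains: SQLite accepts it,
# even though the result mixes ages with town names.
mixed = conn.execute("SELECT age FROM B UNION SELECT town FROM A").fetchall()
print("Mixed domains:", mixed)
conn.close()
```

The second query runs but its result is not meaningful, which is why the domain requirement has to be checked by the query author here.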

---

##  Cartesian Product (CROSS PRODUCT)

The Cartesian Product pairs every row from one table with every row from another. This operation can result in a very large number of combinations and is often only meaningful when followed by a filtering condition.

**Titanic example:**  
Suppose we have a list of adult female passengers, and another list of cabin records. If we pair every woman with every cabin, we get many meaningless combinations. However, if we then filter those combinations to keep only the ones where the cabin assignment matches the passenger, we arrive at valid pairings.

This shows that the Cartesian Product is rarely useful on its own, but becomes powerful when used together with filters that enforce logical relationships between data points.

---

### Cartesian Product Example (using class)

In [680]:
# Step 1: Create adult female passengers (18+)
cursor.execute("""
CREATE TEMP TABLE AdultFemales AS
SELECT sex, age, class FROM Titanic
WHERE sex = 'female' AND age >= 18
""")

# Step 2: Create distinct classes
cursor.execute("""
CREATE TEMP TABLE Classes AS
SELECT DISTINCT class FROM Titanic
""")

# Step 3: Cartesian Product (all combinations)
cursor.execute("""
SELECT f.class AS passenger_class, c.class AS matched_class
FROM AdultFemales f, Classes c
""")
print("Cartesian Product (no filter):")
print(cursor.fetchall())

# Step 4: Filtered Cartesian Product (matching class only)
cursor.execute("""
SELECT f.sex, f.age, f.class
FROM AdultFemales f, Classes c
WHERE f.class = c.class
""")
print("\nFiltered Cartesian Product (matching class only):")
print(cursor.fetchall())


Cartesian Product (no filter):
[('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('Second', 'First'), ('Second', 'Third'), ('Second', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('Second', 'First'), ('Second', 'Third'), ('Second', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('First', 'First'), ('First', 'Third'), ('First', 'Second'), ('Third', 'First'), ('Third', 'Third'), ('Third


## Summary of Set-Based Operations in the Titanic Dataset Context

| Operation         | What It Does                                 | Titanic Example                                                  |
|------------------|-----------------------------------------------|------------------------------------------------------------------|
| UNION            | Combines rows from both sets, removing duplicates | All passengers who either survived or embarked from Southampton |
| INTERSECTION     | Keeps only rows that appear in both sets      | Passengers who survived and also embarked from Southampton      |
| SET DIFFERENCE   | Removes rows of one set from another          | Southampton passengers who didn’t survive                       |
| CARTESIAN PRODUCT| Combines all row pairs from two tables        | All possible pairings of women and cabin records                |


# 6.8 Binary Relational Operations: JOIN and DIVISION (Using the Titanic Dataset)

In relational algebra, binary operations involve combining two datasets (relations) based on some relationship between their values. The two main binary operations are:

- **JOIN**: Combines related information across different datasets  
- **DIVISION**: Extracts tuples that match all values in another set  

Let’s explore these operations using the Titanic dataset.

---

## 🔹 JOIN Operation

The JOIN operation is one of the most powerful tools in relational algebra. It connects rows from two datasets when they share common attribute values.

**Titanic example (JOIN):**  
Imagine the Titanic data is split into two relations:

- One table has passenger information (name, age, ticket number, etc.)  
- Another table contains ticket details, like fare and cabin  

If both datasets contain a common attribute such as the ticket number, we can perform a JOIN to combine information from both. This would let us know not only the passenger's name and age but also how much they paid and where they stayed.

This operation is especially useful when datasets are normalized — meaning related data is separated into different tables to reduce redundancy. JOIN lets you put the pieces back together.

---

In [681]:
# Step 1: Create Passengers table
cursor.execute("""
CREATE TEMP TABLE Passengers AS
SELECT sex, age, fare FROM Titanic
WHERE fare IS NOT NULL
""")

# Step 2: Create Fares table with 'fare group' (simulating ticket detail)
cursor.execute("""
CREATE TEMP TABLE Fares AS
SELECT DISTINCT fare,
       CASE
         WHEN fare < 10 THEN 'Low'
         WHEN fare < 50 THEN 'Medium'
         ELSE 'High'
       END AS fare_group
FROM Titanic
WHERE fare IS NOT NULL
""")

# Step 3: JOIN on fare
cursor.execute("""
SELECT p.sex, p.age, p.fare, f.fare_group
FROM Passengers p
JOIN Fares f ON p.fare = f.fare
""")
print("JOIN Result (Passengers + Fares):")
cursor.fetchall()


JOIN Result (Passengers + Fares):


[('female', 38.0, 71.2833, 'High'),
 ('female', 35.0, 53.1, 'High'),
 ('male', 54.0, 51.8625, 'High'),
 ('female', 4.0, 16.7, 'Medium'),
 ('female', 58.0, 26.55, 'Medium'),
 ('male', 34.0, 13.0, 'Medium'),
 ('male', 28.0, 35.5, 'Medium'),
 ('male', 19.0, 263.0, 'High'),
 ('female', 49.0, 76.7292, 'High'),
 ('male', 65.0, 61.9792, 'High'),
 ('male', 45.0, 83.475, 'High'),
 ('female', 29.0, 10.5, 'Medium'),
 ('male', 25.0, 7.65, 'Low'),
 ('female', 23.0, 263.0, 'High'),
 ('male', 46.0, 61.175, 'High'),
 ('male', 71.0, 34.6542, 'Medium'),
 ('male', 23.0, 63.3583, 'High'),
 ('male', 21.0, 77.2875, 'High'),
 ('male', 47.0, 52.0, 'High'),
 ('male', 24.0, 247.5208, 'High'),
 ('female', 32.5, 13.0, 'Medium'),
 ('male', 54.0, 77.2875, 'High'),
 ('female', 19.0, 26.2833, 'Medium'),
 ('male', 37.0, 53.1, 'High'),
 ('male', 24.0, 79.2, 'High'),
 ('male', 36.5, 26.0, 'Medium'),
 ('female', 22.0, 66.6, 'High'),
 ('male', 61.0, 33.5, 'Medium'),
 ('male', 56.0, 30.6958, 'Medium'),
 ('female', 50.0, 28

## 🔸 Variations of the JOIN Operation

There are several forms of JOIN, depending on how strictly you want to match rows between datasets:

- **Inner Join**: Keeps only those rows where the values match in both datasets  
  → *e.g., passengers with known cabin numbers*

- **Left Outer Join (Left Join)**: Keeps all rows from the first dataset and adds matching values from the second where they exist; unmatched rows are padded with NULLs  
  → *e.g., all passengers, showing cabin details only if available*

- **Equijoin**: A JOIN where the matching condition is based solely on equality  
  → *e.g., matching ticket numbers exactly*

JOIN operations help us explore connections, such as which passengers paid similar fares, who shared cabins, or families that traveled together.
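
The difference between an inner and a left join only shows up when some row has no match. In the cells that follow, the lookup table is built from the Titanic table itself, so every row matches and the two joins return the same rows. The sketch below uses hypothetical `Passenger` and `Cabin` tables in which one passenger has no cabin record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, invented for illustration only; Bob has no cabin row.
conn.executescript("""
CREATE TABLE Passenger (name TEXT, ticket_no INTEGER);
CREATE TABLE Cabin (ticket_no INTEGER, cabin TEXT);
INSERT INTO Passenger VALUES ('Ann', 101), ('Bob', 102);
INSERT INTO Cabin VALUES (101, 'C85');
""")

# Inner join: only passengers with a matching cabin row.
inner = conn.execute("""
SELECT p.name, c.cabin
FROM Passenger p INNER JOIN Cabin c ON p.ticket_no = c.ticket_no
ORDER BY p.name
""").fetchall()

# Left join: every passenger; the cabin is NULL (None in Python) when missing.
left = conn.execute("""
SELECT p.name, c.cabin
FROM Passenger p LEFT JOIN Cabin c ON p.ticket_no = c.ticket_no
ORDER BY p.name
""").fetchall()

print(inner)
print(left)
conn.close()
```

Both queries are also equijoins, since the join condition uses equality only.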

---

### Inner Join – Passengers with Fare Group (only matching rows)

In [682]:
cursor.execute("""
CREATE TEMP TABLE FareGroups AS
SELECT DISTINCT fare,
       CASE
         WHEN fare < 10 THEN 'Low'
         WHEN fare < 50 THEN 'Medium'
         ELSE 'High'
       END AS fare_group
FROM Titanic
WHERE fare IS NOT NULL
""")

cursor.execute("""
SELECT t.sex, t.age, t.fare, f.fare_group
FROM Titanic t
INNER JOIN FareGroups f ON t.fare = f.fare
""")
print("Inner Join (only matching fares):")
print(cursor.fetchall())


Inner Join (only matching fares):
[('female', 38.0, 71.2833, 'High'), ('female', 35.0, 53.1, 'High'), ('male', 54.0, 51.8625, 'High'), ('female', 4.0, 16.7, 'Medium'), ('female', 58.0, 26.55, 'Medium'), ('male', 34.0, 13.0, 'Medium'), ('male', 28.0, 35.5, 'Medium'), ('male', 19.0, 263.0, 'High'), ('female', 49.0, 76.7292, 'High'), ('male', 65.0, 61.9792, 'High'), ('male', 45.0, 83.475, 'High'), ('female', 29.0, 10.5, 'Medium'), ('male', 25.0, 7.65, 'Low'), ('female', 23.0, 263.0, 'High'), ('male', 46.0, 61.175, 'High'), ('male', 71.0, 34.6542, 'Medium'), ('male', 23.0, 63.3583, 'High'), ('male', 21.0, 77.2875, 'High'), ('male', 47.0, 52.0, 'High'), ('male', 24.0, 247.5208, 'High'), ('female', 32.5, 13.0, 'Medium'), ('male', 54.0, 77.2875, 'High'), ('female', 19.0, 26.2833, 'Medium'), ('male', 37.0, 53.1, 'High'), ('male', 24.0, 79.2, 'High'), ('male', 36.5, 26.0, 'Medium'), ('female', 22.0, 66.6, 'High'), ('male', 61.0, 33.5, 'Medium'), ('male', 56.0, 30.6958, 'Medium'), ('female', 50.

### Left Join – All passengers, show fare group if available

In [683]:
cursor.execute("""
SELECT t.sex, t.age, t.fare, f.fare_group
FROM Titanic t
LEFT JOIN FareGroups f ON t.fare = f.fare
""")
print("\nLeft Join (all passengers, group if exists):")
print(cursor.fetchall())



Left Join (all passengers, group if exists):
[('female', 38.0, 71.2833, 'High'), ('female', 35.0, 53.1, 'High'), ('male', 54.0, 51.8625, 'High'), ('female', 4.0, 16.7, 'Medium'), ('female', 58.0, 26.55, 'Medium'), ('male', 34.0, 13.0, 'Medium'), ('male', 28.0, 35.5, 'Medium'), ('male', 19.0, 263.0, 'High'), ('female', 49.0, 76.7292, 'High'), ('male', 65.0, 61.9792, 'High'), ('male', 45.0, 83.475, 'High'), ('female', 29.0, 10.5, 'Medium'), ('male', 25.0, 7.65, 'Low'), ('female', 23.0, 263.0, 'High'), ('male', 46.0, 61.175, 'High'), ('male', 71.0, 34.6542, 'Medium'), ('male', 23.0, 63.3583, 'High'), ('male', 21.0, 77.2875, 'High'), ('male', 47.0, 52.0, 'High'), ('male', 24.0, 247.5208, 'High'), ('female', 32.5, 13.0, 'Medium'), ('male', 54.0, 77.2875, 'High'), ('female', 19.0, 26.2833, 'Medium'), ('male', 37.0, 53.1, 'High'), ('male', 24.0, 79.2, 'High'), ('male', 36.5, 26.0, 'Medium'), ('female', 22.0, 66.6, 'High'), ('male', 61.0, 33.5, 'Medium'), ('male', 56.0, 30.6958, 'Medium'), ('

### Equijoin – Same as inner join but emphasizes equality condition

In [684]:
cursor.execute("""
SELECT t.sex, t.age, t.fare, f.fare_group
FROM Titanic t
JOIN FareGroups f ON t.fare = f.fare  -- equality condition (equijoin)
""")
print("\nEquijoin (fare equality):")
print(cursor.fetchall())



Equijoin (fare equality):
[('female', 38.0, 71.2833, 'High'), ('female', 35.0, 53.1, 'High'), ('male', 54.0, 51.8625, 'High'), ('female', 4.0, 16.7, 'Medium'), ('female', 58.0, 26.55, 'Medium'), ('male', 34.0, 13.0, 'Medium'), ('male', 28.0, 35.5, 'Medium'), ('male', 19.0, 263.0, 'High'), ('female', 49.0, 76.7292, 'High'), ('male', 65.0, 61.9792, 'High'), ('male', 45.0, 83.475, 'High'), ('female', 29.0, 10.5, 'Medium'), ('male', 25.0, 7.65, 'Low'), ('female', 23.0, 263.0, 'High'), ('male', 46.0, 61.175, 'High'), ('male', 71.0, 34.6542, 'Medium'), ('male', 23.0, 63.3583, 'High'), ('male', 21.0, 77.2875, 'High'), ('male', 47.0, 52.0, 'High'), ('male', 24.0, 247.5208, 'High'), ('female', 32.5, 13.0, 'Medium'), ('male', 54.0, 77.2875, 'High'), ('female', 19.0, 26.2833, 'Medium'), ('male', 37.0, 53.1, 'High'), ('male', 24.0, 79.2, 'High'), ('male', 36.5, 26.0, 'Medium'), ('female', 22.0, 66.6, 'High'), ('male', 61.0, 33.5, 'Medium'), ('male', 56.0, 30.6958, 'Medium'), ('female', 50.0, 28.7



## 🔹 DIVISION Operation

The DIVISION operation is more complex and used in narrower scenarios. It’s often described as finding “those who match all values” in another set.

**Titanic example (DIVISION):**  
Suppose we want to find passengers who traveled with all their children. You could set up:

- One relation that lists all parent-child relationships  
- Another that lists all the children known to be on board  

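The DIVISION operation returns only those passengers who are paired with every child in the second relation. Because the divisor is a single fixed set, this corresponds to "parents whose full families were on the ship" only when the second relation holds one family's children.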

Another use could be identifying passengers who accessed every class section (first, second, and third class) — though in reality, Titanic passengers usually remained in one class. This illustrates that the DIVISION operator works best when analyzing complete relationships across categories.
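
SQL has no DIVISION keyword, so the operation is usually written with a double `NOT EXISTS`. The sketch below is a minimal illustration on hypothetical toy tables (`Visited`, `Sections`, and the passenger names are invented for the example and are not part of the Titanic data): it keeps a passenger only if no section exists that the passenger has not visited.

```python
import sqlite3

# Hypothetical toy data: which passenger visited which class section.
div_conn = sqlite3.connect(":memory:")
div_conn.executescript("""
CREATE TABLE Visited (passenger TEXT, section TEXT);
CREATE TABLE Sections (section TEXT);
INSERT INTO Sections VALUES ('First'), ('Second'), ('Third');
INSERT INTO Visited VALUES
    ('Ann', 'First'), ('Ann', 'Second'), ('Ann', 'Third'),
    ('Ben', 'First'), ('Ben', 'Third'),
    ('Cy',  'Second');
""")

# Visited ÷ Sections: keep a passenger only if there is NO section
# for which a matching Visited row is missing (double NOT EXISTS).
division_rows = div_conn.execute("""
SELECT DISTINCT v.passenger
FROM Visited v
WHERE NOT EXISTS (
    SELECT 1 FROM Sections s
    WHERE NOT EXISTS (
        SELECT 1 FROM Visited v2
        WHERE v2.passenger = v.passenger
          AND v2.section = s.section
    )
)
""").fetchall()

print(division_rows)  # [('Ann',)]
```

Only Ann is paired with all three sections, so she is the only row in the quotient.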

---

## 🔸 Summary

| Operation | Purpose                                   | Titanic Example                                                      |
|-----------|-------------------------------------------|----------------------------------------------------------------------|
| JOIN      | Combines related records from two datasets| Linking passenger details with ticket or cabin data                 |
| DIVISION  | Finds records matching all values in another set | Identifying passengers who were linked to all listed dependents or visited all sections |

JOIN is far more common and practical in most real-world scenarios. DIVISION is more abstract and used when completeness of relationship is required.


# 6.4 Additional Relational Operations (Using Titanic Dataset)

Modern relational databases often require queries that go beyond basic relational algebra. These enhanced operations add expressive power and are commonly used in analytics and reporting.

---

## 🔹 6.4.1 Generalized Projection

**Definition**: Extends projection by allowing arithmetic or computed expressions over attributes.

**Titanic Example**:

Suppose we want to calculate:

- Survival Score = fare / age  
- Adjusted Fare = fare + 15 (service fee)

Generalized projection allows us to include these derived columns in our result.

**Use case**: Ideal for reporting tasks that involve computed values like "discounted fare" or "fare per family member".

---


In [685]:
cursor.execute("""
SELECT
    survived,
    age,
    fare,
    ROUND(fare / age, 2) AS survival_score,
    ROUND(fare + 15, 2) AS adjusted_fare
FROM Titanic
WHERE age IS NOT NULL AND fare IS NOT NULL
LIMIT 10
""")
print(cursor.fetchall())


[(1, 38.0, 71.2833, 1.88, 86.28), (1, 35.0, 53.1, 1.52, 68.1), (0, 54.0, 51.8625, 0.96, 66.86), (1, 4.0, 16.7, 4.18, 31.7), (1, 58.0, 26.55, 0.46, 41.55), (1, 34.0, 13.0, 0.38, 28.0), (1, 28.0, 35.5, 1.27, 50.5), (0, 19.0, 263.0, 13.84, 278.0), (1, 49.0, 76.7292, 1.57, 91.73), (0, 65.0, 61.9792, 0.95, 76.98)]



## 🔹 6.4.2 Aggregate Functions & Grouping

**Definition**: Enables use of functions like SUM, COUNT, AVG, MIN, MAX on groups of tuples.

Grouping allows applying these functions per category (e.g., per class, gender).

**Titanic Example**:

- Get average fare per passenger class  
- Count number of survivors per gender

**Use case**: Statistical summaries and dashboards — e.g., survival rates grouped by class and gender.
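
The combined use case (survival rate by class and gender) can be sketched by grouping on two attributes at once. This is a minimal illustration on a hypothetical eight-row table (`MiniTitanic` and its rows are invented for the example); because `survived` is coded 0/1, `AVG(survived)` gives the survival rate of each group.

```python
import sqlite3

# Hypothetical miniature table with the same column names as the notebook's Titanic table.
agg_conn = sqlite3.connect(":memory:")
agg_conn.executescript("""
CREATE TABLE MiniTitanic (pclass INTEGER, sex TEXT, survived INTEGER);
INSERT INTO MiniTitanic VALUES
    (1, 'female', 1), (1, 'female', 1), (1, 'male', 0), (1, 'male', 1),
    (3, 'female', 1), (3, 'female', 0), (3, 'male', 0), (3, 'male', 0);
""")

# Group by two attributes; AVG over a 0/1 column is the share of survivors.
rates = agg_conn.execute("""
SELECT pclass, sex, COUNT(*) AS n, AVG(survived) AS survival_rate
FROM MiniTitanic
GROUP BY pclass, sex
ORDER BY pclass, sex
""").fetchall()

for row in rates:
    print(row)
# (1, 'female', 2, 1.0)
# (1, 'male', 2, 0.5)
# (3, 'female', 2, 0.5)
# (3, 'male', 2, 0.0)
```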

---


### Average fare per passenger class

In [686]:
cursor.execute("""
SELECT pclass, ROUND(AVG(fare), 2) AS avg_fare
FROM Titanic
WHERE fare IS NOT NULL
GROUP BY pclass
""")
print("Average Fare per Class:")
print(cursor.fetchall())


Average Fare per Class:
[(1, 89.02), (2, 18.44), (3, 11.03)]


### Number of survivors per gender

In [687]:
cursor.execute("""
SELECT sex, COUNT(*) AS survivor_count
FROM Titanic
WHERE survived = 1
GROUP BY sex
""")
print("\nSurvivors by Gender:")
print(cursor.fetchall())



Survivors by Gender:
[('female', 82), ('male', 41)]



## 🔹 6.4.3 Recursive Closure

**Definition**: Retrieves data involving recursive relationships (hierarchies).

**Challenge**: Regular JOINs only go one level deep; recursive closure can find multi-level connections.

**Titanic Example**:

Though Titanic doesn’t have hierarchical employee relationships like supervisor chains, imagine a situation where we model family relationships.

Recursive closure would help trace multi-generational family groups, e.g., parents → children → grandchildren on board.

---


In [688]:
# Create Family table and insert data
cursor.execute("DROP TABLE IF EXISTS Family")

cursor.execute("""
CREATE TABLE Family (
    parent TEXT,
    child TEXT
)
""")

cursor.executemany("""
INSERT INTO Family (parent, child) VALUES (?, ?)
""", [
    ('Anna', 'Beth'),
    ('Beth', 'Cara'),
    ('Cara', 'Dana')
])

# Recursive closure: get all descendants of Anna
cursor.execute("""
WITH RECURSIVE Descendants(person) AS (
    SELECT child FROM Family WHERE parent = 'Anna'
    UNION
    SELECT Family.child
    FROM Family
    JOIN Descendants ON Family.parent = Descendants.person
)
SELECT * FROM Descendants
""")

print("All descendants of Anna:")
print(cursor.fetchall())


All descendants of Anna:
[('Beth',), ('Cara',), ('Dana',)]
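
The closure query above returns names only. A possible variant, sketched here on its own in-memory table (`Kin` and `Lineage` are hypothetical names, and the rows repeat the Anna example), also records how many generations separate each descendant from the starting person. It assumes the data contains no cycles: once a depth counter is added, `UNION` can no longer stop a loop by discarding repeated rows.

```python
import sqlite3

# Separate in-memory database so the notebook's own tables are untouched.
rc_conn = sqlite3.connect(":memory:")
rc_conn.executescript("""
CREATE TABLE Kin (parent TEXT, child TEXT);
INSERT INTO Kin VALUES ('Anna', 'Beth'), ('Beth', 'Cara'), ('Cara', 'Dana');
""")

# Base case: Anna's children at depth 1.
# Recursive step: children of anyone already found, one level deeper.
# Assumption: Kin is acyclic, otherwise this recursion would not terminate.
generations = rc_conn.execute("""
WITH RECURSIVE Lineage(person, depth) AS (
    SELECT child, 1 FROM Kin WHERE parent = 'Anna'
    UNION ALL
    SELECT Kin.child, Lineage.depth + 1
    FROM Kin
    JOIN Lineage ON Kin.parent = Lineage.person
)
SELECT person, depth FROM Lineage ORDER BY depth
""").fetchall()

print(generations)  # [('Beth', 1), ('Cara', 2), ('Dana', 3)]
```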


## 🔹 6.4.4 Outer JOINs

**Definition**: Enhances JOIN by preserving unmatched rows from one or both tables using NULLs.

**Types**:

- LEFT OUTER JOIN: Keeps all records from the left table  
- RIGHT OUTER JOIN: Keeps all from the right  
- FULL OUTER JOIN: Keeps all from both

**Titanic Example**:

Suppose we join passenger info with cabin data.

Many passengers have no recorded cabin → LEFT JOIN ensures all passengers still appear, even if cabin is missing.

**Use case**: Essential in reporting missing or incomplete data (e.g., which passengers have unknown fare or cabin info).
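
A small sketch can make the NULL padding visible. The tables `Pax` and `CabinInfo` and their rows are hypothetical. SQLite releases before 3.39 are understood not to accept `RIGHT` or `FULL OUTER JOIN`, so the full outer join is emulated here with a left join plus the right-only rows; treat that as one possible workaround rather than the only approach.

```python
import sqlite3

oj_conn = sqlite3.connect(":memory:")
oj_conn.executescript("""
CREATE TABLE Pax (id INTEGER, sex TEXT);
CREATE TABLE CabinInfo (id INTEGER, deck TEXT);
INSERT INTO Pax VALUES (1, 'female'), (2, 'male'), (3, 'male');
INSERT INTO CabinInfo VALUES (1, 'C'), (4, 'E');
""")

# LEFT OUTER JOIN: every passenger appears; deck is NULL (None) when unmatched.
left_rows = oj_conn.execute("""
SELECT p.id, p.sex, c.deck
FROM Pax p
LEFT JOIN CabinInfo c ON p.id = c.id
ORDER BY p.id
""").fetchall()
print(left_rows)  # [(1, 'female', 'C'), (2, 'male', None), (3, 'male', None)]

# Emulated FULL OUTER JOIN: left join rows plus cabin rows with no passenger.
full_rows = oj_conn.execute("""
SELECT p.id AS pax_id, p.sex, c.id AS cabin_id, c.deck
FROM Pax p
LEFT JOIN CabinInfo c ON p.id = c.id
UNION ALL
SELECT NULL, NULL, c.id, c.deck
FROM CabinInfo c
WHERE NOT EXISTS (SELECT 1 FROM Pax p WHERE p.id = c.id)
""").fetchall()
print(len(full_rows))  # 4
```

Passengers 2 and 3 keep their rows with `None` for the deck, and cabin record 4 survives in the emulated full join even though no passenger matches it.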

---


In [689]:
# Split Titanic into two parts
cursor.execute("DROP TABLE IF EXISTS Passengers")
cursor.execute("DROP TABLE IF EXISTS Cabins")

cursor.execute("""
CREATE TABLE Passengers AS
SELECT rowid AS id, pclass, sex, age, fare, deck
FROM Titanic
""")

cursor.execute("""
CREATE TABLE Cabins AS
SELECT rowid AS id, deck
FROM Titanic
WHERE deck IS NOT NULL
""")

# LEFT OUTER JOIN: all passengers + any matching cabin (deck) info
cursor.execute("""
SELECT Passengers.id, Passengers.sex, Passengers.age, Cabins.deck
FROM Passengers
LEFT JOIN Cabins ON Passengers.id = Cabins.id
""")

print("LEFT OUTER JOIN — All passengers, with or without cabin info:")
print(cursor.fetchall())


LEFT OUTER JOIN — All passengers, with or without cabin info:
[(1, 'female', 38.0, 'C'), (2, 'female', 35.0, 'C'), (3, 'male', 54.0, 'E'), (4, 'female', 4.0, 'G'), (5, 'female', 58.0, 'C'), (6, 'male', 34.0, 'D'), (7, 'male', 28.0, 'A'), (8, 'male', 19.0, 'C'), (9, 'female', 49.0, 'D'), (10, 'male', 65.0, 'B'), (11, 'male', 45.0, 'C'), (12, 'female', 29.0, 'F'), (13, 'male', 25.0, 'F'), (14, 'female', 23.0, 'C'), (15, 'male', 46.0, 'E'), (16, 'male', 71.0, 'A'), (17, 'male', 23.0, 'D'), (18, 'male', 21.0, 'D'), (19, 'male', 47.0, 'C'), (20, 'male', 24.0, 'B'), (21, 'female', 32.5, 'E'), (22, 'male', 54.0, 'D'), (23, 'female', 19.0, 'D'), (24, 'male', 37.0, 'C'), (25, 'male', 24.0, 'B'), (26, 'male', 36.5, 'F'), (27, 'female', 22.0, 'C'), (28, 'male', 61.0, 'B'), (29, 'male', 56.0, 'A'), (30, 'female', 50.0, 'C'), (31, 'male', 1.0, 'F'), (32, 'male', 3.0, 'F'), (33, 'female', 44.0, 'B'), (34, 'female', 58.0, 'B'), (35, 'female', 2.0, 'G'), (36, 'male', 40.0, 'A'), (37, 'female', 31.0, '

## 🔹 6.4.5 OUTER UNION

**Definition**: Combines two relations with partially matching attributes, filling unmatched parts with NULL.

**Titanic Example**:

Suppose we had two separate tables:

- Passengers(Name, Age, Gender, Embarked)  
- Crew(Name, Age, Role)

OUTER UNION can combine both, keeping shared fields (Name, Age) and filling unmatched (Role, Embarked) with NULLs.

**Use case**: Used when we need a combined view of different but related entities, like all people aboard regardless of role.

---


In [690]:
# Create Passengers table
cursor.execute("DROP TABLE IF EXISTS Passengers")
cursor.execute("""
CREATE TABLE Passengers (
    name TEXT,
    age REAL,
    gender TEXT,
    embarked TEXT
)
""")

cursor.executemany("""
INSERT INTO Passengers (name, age, gender, embarked) VALUES (?, ?, ?, ?)
""", [
    ('Alice', 30, 'female', 'Southampton'),
    ('Bob', 25, 'male', 'Cherbourg')
])

# Create Crew table
cursor.execute("DROP TABLE IF EXISTS Crew")
cursor.execute("""
CREATE TABLE Crew (
    name TEXT,
    age REAL,
    role TEXT
)
""")

cursor.executemany("""
INSERT INTO Crew (name, age, role) VALUES (?, ?, ?)
""", [
    ('Charles', 40, 'Engineer'),
    ('Diana', 35, 'Stewardess')
])

# Simulate OUTER UNION using UNION ALL with matching columns and NULLs
cursor.execute("""
SELECT name, age, gender, embarked, NULL AS role
FROM Passengers
UNION ALL
SELECT name, age, NULL AS gender, NULL AS embarked, role
FROM Crew
""")

print("OUTER UNION (Passengers + Crew):")
cursor.fetchall()


OUTER UNION (Passengers + Crew):


[('Alice', 30.0, 'female', 'Southampton', None),
 ('Bob', 25.0, 'male', 'Cherbourg', None),
 ('Charles', 40.0, None, None, 'Engineer'),
 ('Diana', 35.0, None, None, 'Stewardess')]


## Summary Table

| Operation              | Purpose                                  | Titanic Example                                      |
|------------------------|------------------------------------------|------------------------------------------------------|
| Generalized Projection | Include computed/derived columns         | Fare per person, Age group categories               |
| Aggregates & Grouping  | Summarize using functions (e.g., AVG)    | Avg fare by class, Survival count by gender         |
| Recursive Closure      | Trace hierarchical/recursive relations   | (Hypothetical) Trace multi-level family relationships|
| Outer JOIN             | Include unmatched rows with NULL values  | Passengers without cabin info                        |
| OUTER UNION            | Merge different schemas with overlap     | Merge passengers and crew into one combined manifest |


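# 6.5 Example Queries in Relational Algebra (Using Titanic Dataset)

The algebra in this section is written against a hypothetical PASSENGERS relation with attributes such as Name, Ticket, Cabin, and PassengerId. The seaborn `titanic` table loaded in this notebook has none of those columns, so each SQL cell approximates the query with the columns that are available (for example `sex`, `fare`, `deck`, or `rowid`).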

🔹 **Query 1**  
Retrieve the names and ticket numbers of all passengers in first class.

$$
\text{FIRST_CLASS} \leftarrow \sigma_{\text{Pclass}=1}(\text{PASSENGERS}) \\
\text{RESULT} \leftarrow \pi_{\text{Name, Ticket}}(\text{FIRST_CLASS})
$$  

---


In [691]:
cursor.execute("""
SELECT sex, fare
FROM Titanic
WHERE pclass = 1
""")
print(cursor.fetchall())


[('female', 71.2833), ('female', 53.1), ('male', 51.8625), ('female', 26.55), ('male', 35.5), ('male', 263.0), ('female', 76.7292), ('male', 61.9792), ('male', 83.475), ('female', 263.0), ('male', 61.175), ('male', 34.6542), ('male', 63.3583), ('male', 77.2875), ('male', 52.0), ('male', 247.5208), ('male', 77.2875), ('female', 26.2833), ('male', 53.1), ('male', 79.2), ('female', 66.6), ('male', 33.5), ('male', 30.6958), ('female', 28.7125), ('female', 27.7208), ('female', 146.5208), ('male', 31.0), ('female', 113.275), ('female', 76.2917), ('male', 90.0), ('female', 83.475), ('male', 90.0), ('male', 52.5542), ('male', 26.55), ('female', 86.5), ('male', 79.65), ('male', 0.0), ('female', 153.4625), ('female', 135.6333), ('male', 29.7), ('female', 77.9583), ('female', 91.0792), ('female', 151.55), ('female', 247.5208), ('male', 151.55), ('female', 108.9), ('female', 56.9292), ('female', 83.1583), ('female', 262.375), ('female', 164.8667), ('female', 134.5), ('female', 135.6333), ('female'


🔹 **Query 2**  
For every passenger who embarked from ‘Cherbourg’, list their name, age, and fare.

$$
\text{CHERBOURG} \leftarrow \sigma_{\text{Embarked}='C'}(\text{PASSENGERS}) \\
\text{RESULT} \leftarrow \pi_{\text{Name, Age, Fare}}(\text{CHERBOURG})
$$  
This pulls only passengers who boarded at Cherbourg and lists selected attributes.

---


In [692]:
cursor.execute("""
SELECT sex, age, fare
FROM Titanic
WHERE embarked = 'C'
""")
print(cursor.fetchall())


[('female', 38.0, 71.2833), ('female', 49.0, 76.7292), ('male', 65.0, 61.9792), ('male', 71.0, 34.6542), ('male', 23.0, 63.3583), ('male', 24.0, 247.5208), ('male', 24.0, 79.2), ('male', 56.0, 30.6958), ('female', 50.0, 28.7125), ('female', 44.0, 27.7208), ('female', 58.0, 146.5208), ('male', 40.0, 31.0), ('female', 31.0, 113.275), ('female', 32.0, 76.2917), ('male', 37.0, 29.7), ('female', 19.0, 91.0792), ('male', 36.0, 12.875), ('female', 50.0, 247.5208), ('female', 17.0, 108.9), ('female', 30.0, 56.9292), ('female', 24.0, 83.1583), ('female', 18.0, 262.375), ('female', 40.0, 134.5), ('female', 36.0, 135.6333), ('female', 16.0, 57.9792), ('female', 41.0, 134.5), ('female', 60.0, 75.25), ('female', 24.0, 69.3), ('male', 25.0, 55.4417), ('male', 27.0, 211.5), ('female', 23.0, 113.275), ('male', 30.0, 27.75), ('male', 49.0, 89.1042), ('female', 23.0, 13.7917), ('male', 25.0, 91.0792), ('male', 58.0, 29.7), ('female', 54.0, 78.2667), ('male', 18.0, 108.9), ('female', 44.0, 57.9792), ('fe


🔹 **Query 3**  
Find the names of passengers who survived and paid more than 100 for their fare.

$$
\text{SURVIVORS} \leftarrow \sigma_{\text{Survived}=1}(\text{PASSENGERS}) \\
\text{HIGH_FARE} \leftarrow \sigma_{\text{Fare}>100}(\text{SURVIVORS}) \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{HIGH_FARE})
$$  
This filters by survival and then by high fare, retrieving only their names.
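
The two-step filter can be collapsed into one selection, which is why a single SQL `WHERE ... AND ...` clause is enough. A cascade of selections is equivalent to one selection over the conjunction of the conditions:

$$
\sigma_{\text{Fare}>100}(\sigma_{\text{Survived}=1}(\text{PASSENGERS})) = \sigma_{\text{Survived}=1 \land \text{Fare}>100}(\text{PASSENGERS})
$$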

---


In [693]:
cursor.execute("""
SELECT sex, age, fare
FROM Titanic
WHERE survived = 1 AND fare > 100
""")
print(cursor.fetchall())


[('female', 23.0, 263.0), ('female', 58.0, 146.5208), ('female', 31.0, 113.275), ('female', 58.0, 153.4625), ('female', 35.0, 135.6333), ('female', 50.0, 247.5208), ('male', 0.92, 151.55), ('female', 17.0, 108.9), ('female', 18.0, 262.375), ('female', 31.0, 164.8667), ('female', 40.0, 134.5), ('female', 36.0, 135.6333), ('female', 41.0, 134.5), ('female', 24.0, 263.0), ('male', 36.0, 120.0), ('female', 23.0, 113.275), ('female', 14.0, 120.0), ('male', 17.0, 110.8833), ('female', 39.0, 110.8833), ('female', 40.0, 153.4625), ('male', 36.0, 512.3292), ('female', 15.0, 211.3375), ('female', 18.0, 227.525), ('female', 38.0, 227.525), ('female', 29.0, 211.3375), ('male', 35.0, 512.3292), ('female', 21.0, 262.375), ('female', 36.0, 120.0), ('female', 43.0, 211.3375), ('male', 11.0, 120.0)]



🔹 **Query 4**  
List the passenger IDs of all children (age < 12) traveling with a parent (same family name and ticket).

$$
\text{CHILDREN} \leftarrow \sigma_{\text{Age}<12}(\text{PASSENGERS}) \\
\text{PARENTS} \leftarrow \sigma_{\text{Age} \geq 18}(\text{PASSENGERS}) \\
\text{FAMILY_GROUPS} \leftarrow \text{CHILDREN} \bowtie_{\text{CHILDREN.FamilyName}=\text{PARENTS.FamilyName} \,\land\, \text{CHILDREN.Ticket}=\text{PARENTS.Ticket}} \text{PARENTS} \\
\text{RESULT} \leftarrow \pi_{\text{CHILDREN.PassengerId}}(\text{FAMILY_GROUPS})
$$  
This matches child and adult passengers based on common family name and ticket number to infer parent-child pairs.
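
The join in the algebra is a self-join of the passenger relation. A minimal sketch, assuming a hypothetical table `PassengersDemo` with invented `family_name` and `ticket` columns (the notebook's Titanic table has neither), could look like this:

```python
import sqlite3

sj_conn = sqlite3.connect(":memory:")
sj_conn.executescript("""
CREATE TABLE PassengersDemo (
    passenger_id INTEGER, family_name TEXT, ticket TEXT, age REAL
);
INSERT INTO PassengersDemo VALUES
    (1, 'Gray', 'T100', 6),
    (2, 'Gray', 'T100', 40),
    (3, 'Lane', 'T200', 9),
    (4, 'Lane', 'T300', 35),
    (5, 'Reed', 'T400', 30);
""")

# Self-join: alias c plays CHILDREN, alias p plays PARENTS.
# A child qualifies when an adult shares both family name and ticket.
children_with_parent = sj_conn.execute("""
SELECT DISTINCT c.passenger_id
FROM PassengersDemo c
JOIN PassengersDemo p
  ON c.family_name = p.family_name
 AND c.ticket = p.ticket
WHERE c.age < 12 AND p.age >= 18
ORDER BY c.passenger_id
""").fetchall()

print(children_with_parent)  # [(1,)]
```

Passenger 3 is excluded because the adult with the same family name holds a different ticket.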

---


In [694]:
cursor.execute("""
SELECT age, sex, sibsp, parch
FROM Titanic
WHERE age < 12 AND (sibsp > 0 OR parch > 0)
""")
print(cursor.fetchall())


[(4.0, 'female', 1, 1), (1.0, 'male', 2, 1), (3.0, 'male', 1, 1), (2.0, 'female', 0, 1), (2.0, 'female', 1, 2), (0.92, 'male', 1, 2), (2.0, 'male', 1, 1), (4.0, 'male', 0, 2), (4.0, 'female', 2, 1), (6.0, 'male', 0, 1), (11.0, 'male', 1, 2)]



🔹 **Query 5**  
List the names of passengers who were in third class, male, and did not survive.

$$
\text{FILTERED} \leftarrow \sigma_{\text{Pclass}=3 \land \text{Sex}='male' \land \text{Survived}=0}(\text{PASSENGERS}) \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{FILTERED})
$$  
A compound selection filters passengers by all three conditions.

---


In [695]:
cursor.execute("""
SELECT sex, pclass, survived, fare, age
FROM Titanic
WHERE pclass = 3 AND sex = 'male' AND survived = 0
""")
print(cursor.fetchall())


[('male', 3, 0, 7.65, 25.0), ('male', 3, 0, 7.65, 42.0), ('male', 3, 0, 7.65, 19.0)]



🔹 **Query 6**  
Find passengers who had no cabin assigned.

$$
\text{ALL} \leftarrow \pi_{\text{PassengerId}}(\text{PASSENGERS}) \\
\text{WITH_CABIN} \leftarrow \pi_{\text{PassengerId}}(\sigma_{\text{Cabin IS NOT NULL}}(\text{PASSENGERS})) \\
\text{NO_CABIN} \leftarrow \text{ALL} - \text{WITH_CABIN} \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{NO_CABIN} \bowtie \text{PASSENGERS})
$$  
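This uses set difference to identify passengers without cabin info. Because the table in this notebook was loaded with `dropna()`, no row has a missing `deck`, so the SQL cell that follows returns an empty list.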
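
The set difference itself maps onto SQL's `EXCEPT`. The following sketch uses a hypothetical table `PaxCabin` with invented names and cabin values, so that some rows really do lack a cabin:

```python
import sqlite3

sd_conn = sqlite3.connect(":memory:")
sd_conn.executescript("""
CREATE TABLE PaxCabin (passenger_id INTEGER, name TEXT, cabin TEXT);
INSERT INTO PaxCabin VALUES
    (1, 'Alice', 'C85'), (2, 'Bob', NULL), (3, 'Cara', 'E46'), (4, 'Dev', NULL);
""")

# ALL - WITH_CABIN via EXCEPT, then join back to recover the names.
no_cabin = sd_conn.execute("""
WITH NoCabin AS (
    SELECT passenger_id FROM PaxCabin
    EXCEPT
    SELECT passenger_id FROM PaxCabin WHERE cabin IS NOT NULL
)
SELECT p.name
FROM NoCabin n
JOIN PaxCabin p ON p.passenger_id = n.passenger_id
ORDER BY p.name
""").fetchall()

print(no_cabin)  # [('Bob',), ('Dev',)]
```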

---


In [696]:
cursor.execute("""
SELECT rowid AS passenger_id, sex, age, pclass, fare
FROM Titanic
WHERE deck IS NULL
""")
print(cursor.fetchall())


[]



🔹 **Query 7**  
List names of survivors who were traveling with at least one sibling/spouse.

$$
\text{SURVIVORS} \leftarrow \sigma_{\text{Survived}=1}(\text{PASSENGERS}) \\
\text{WITH_RELATIVES} \leftarrow \sigma_{\text{SibSp}>0}(\text{SURVIVORS}) \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{WITH_RELATIVES})
$$  
Filters survivors who reported siblings/spouses aboard.

---


In [697]:
cursor.execute("""
SELECT rowid AS passenger_id, sex, age, pclass, fare
FROM Titanic
WHERE survived = 1 AND sibsp > 0
""")
print(cursor.fetchall())


[(1, 'female', 38.0, 1, 71.2833), (2, 'female', 35.0, 1, 53.1), (4, 'female', 4.0, 3, 16.7), (9, 'female', 49.0, 1, 76.7292), (14, 'female', 23.0, 1, 263.0), (27, 'female', 22.0, 1, 66.6), (31, 'male', 1.0, 2, 39.0), (32, 'male', 3.0, 2, 26.0), (37, 'female', 31.0, 1, 113.275), (39, 'male', 38.0, 1, 90.0), (40, 'female', 35.0, 1, 83.475), (42, 'male', 37.0, 1, 52.5542), (51, 'female', 63.0, 1, 77.9583), (52, 'female', 19.0, 1, 91.0792), (56, 'male', 0.92, 1, 151.55), (57, 'female', 17.0, 1, 108.9), (60, 'female', 18.0, 1, 262.375), (62, 'female', 40.0, 1, 134.5), (70, 'male', 2.0, 2, 26.0), (71, 'female', 24.0, 1, 263.0), (74, 'female', 60.0, 1, 75.25), (76, 'male', 25.0, 1, 55.4417), (78, 'male', 36.0, 1, 120.0), (79, 'female', 23.0, 1, 113.275), (81, 'female', 33.0, 1, 90.0), (85, 'female', 14.0, 1, 120.0), (90, 'male', 49.0, 1, 89.1042), (95, 'male', 25.0, 1, 91.0792), (96, 'female', 35.0, 1, 90.0), (99, 'female', 54.0, 1, 78.2667), (113, 'female', 48.0, 1, 39.6), (114, 'female', 39


🔹 **Query 8**  
List names of female passengers who survived and paid less than the average fare.

$$
\text{AVG_FARE} \leftarrow \gamma_{\text{AVG(Fare)}}(\text{PASSENGERS}) \\
\text{FILTERED} \leftarrow \sigma_{\text{Sex}='female' \land \text{Survived}=1 \land \text{Fare}<\text{AVG_FARE}}(\text{PASSENGERS}) \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{FILTERED})
$$  
Uses aggregate to compute the average fare and filters accordingly.

---


In [698]:
cursor.execute("""
SELECT rowid AS passenger_id, sex, age, fare
FROM Titanic
WHERE sex = 'female' AND survived = 1
  AND fare < (SELECT AVG(fare) FROM Titanic)
""")
print(cursor.fetchall())


[(1, 'female', 38.0, 71.2833), (2, 'female', 35.0, 53.1), (4, 'female', 4.0, 16.7), (5, 'female', 58.0, 26.55), (9, 'female', 49.0, 76.7292), (12, 'female', 29.0, 10.5), (21, 'female', 32.5, 13.0), (23, 'female', 19.0, 26.2833), (27, 'female', 22.0, 66.6), (33, 'female', 44.0, 27.7208), (38, 'female', 32.0, 76.2917), (51, 'female', 63.0, 77.9583), (58, 'female', 30.0, 56.9292), (64, 'female', 36.0, 13.0), (65, 'female', 16.0, 57.9792), (72, 'female', 24.0, 13.0), (73, 'female', 22.0, 55.0), (74, 'female', 60.0, 75.25), (75, 'female', 24.0, 69.3), (80, 'female', 24.0, 16.7), (94, 'female', 23.0, 13.7917), (99, 'female', 54.0, 78.2667), (105, 'female', 34.0, 10.5), (107, 'female', 44.0, 57.9792), (109, 'female', 22.0, 49.5), (110, 'female', 36.0, 71.0), (113, 'female', 48.0, 39.6), (115, 'female', 53.0, 51.4792), (117, 'female', 39.0, 55.9), (122, 'female', 52.0, 78.2667), (125, 'female', 4.0, 39.0), (128, 'female', 21.0, 77.9583), (131, 'female', 24.0, 69.3), (146, 'female', 24.0, 49.50


🔹 **Query 9**  
List names of passengers who share the same ticket number (possible traveling groups).

$$
\text{GROUPED} \leftarrow \gamma_{\text{Ticket}, \text{COUNT(PassengerId)}}(\text{PASSENGERS}) \\
\text{SHARED} \leftarrow \sigma_{\text{COUNT}>1}(\text{GROUPED}) \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{PASSENGERS} \bowtie \text{SHARED})
$$  
Identifies passengers with shared ticket numbers (likely traveling together).
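
With a real ticket column, the three algebra steps translate almost directly into `GROUP BY`, `HAVING`, and a join. The sketch below assumes a hypothetical table `TicketDemo` with invented names and ticket codes:

```python
import sqlite3

st_conn = sqlite3.connect(":memory:")
st_conn.executescript("""
CREATE TABLE TicketDemo (passenger_id INTEGER, name TEXT, ticket TEXT);
INSERT INTO TicketDemo VALUES
    (1, 'Alice', 'T1'), (2, 'Bob', 'T1'), (3, 'Cara', 'T2'),
    (4, 'Dev', 'T3'), (5, 'Eve', 'T3'), (6, 'Finn', 'T3');
""")

# GROUPED + SHARED: tickets held by more than one passenger.
# RESULT: join back to the passengers to list the group members.
shared = st_conn.execute("""
WITH SharedTickets AS (
    SELECT ticket
    FROM TicketDemo
    GROUP BY ticket
    HAVING COUNT(passenger_id) > 1
)
SELECT t.name, t.ticket
FROM TicketDemo t
JOIN SharedTickets s ON t.ticket = s.ticket
ORDER BY t.ticket, t.name
""").fetchall()

print(shared)
# [('Alice', 'T1'), ('Bob', 'T1'), ('Dev', 'T3'), ('Eve', 'T3'), ('Finn', 'T3')]
```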

---


In [699]:
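# Grouping by fare and family presence to simulate shared booking
# (the dataset has no ticket column); drop any earlier view so the cell can be re-run
cursor.execute("DROP VIEW IF EXISTS SharedGroup")
cursor.execute("""
CREATE TEMP VIEW SharedGroup AS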
SELECT fare
FROM Titanic
WHERE sibsp > 0
GROUP BY fare
HAVING COUNT(*) > 1
""")

# Passengers in such shared groups
cursor.execute("""
SELECT rowid AS passenger_id, sex, age, fare
FROM Titanic
WHERE fare IN (SELECT fare FROM SharedGroup)
""")

print(cursor.fetchall())


[(2, 'female', 35.0, 53.1), (8, 'male', 19.0, 263.0), (9, 'female', 49.0, 76.7292), (11, 'male', 45.0, 83.475), (14, 'female', 23.0, 263.0), (19, 'male', 47.0, 52.0), (24, 'male', 37.0, 53.1), (26, 'male', 36.5, 26.0), (27, 'female', 22.0, 66.6), (31, 'male', 1.0, 39.0), (32, 'male', 3.0, 26.0), (37, 'female', 31.0, 113.275), (39, 'male', 38.0, 90.0), (40, 'female', 35.0, 83.475), (41, 'male', 44.0, 90.0), (42, 'male', 37.0, 52.5542), (46, 'male', 52.0, 79.65), (51, 'female', 63.0, 77.9583), (52, 'female', 19.0, 91.0792), (54, 'female', 2.0, 151.55), (56, 'male', 0.92, 151.55), (57, 'female', 17.0, 108.9), (60, 'female', 18.0, 262.375), (68, 'male', 29.0, 66.6), (70, 'male', 2.0, 26.0), (71, 'female', 24.0, 263.0), (78, 'male', 36.0, 120.0), (79, 'female', 23.0, 113.275), (81, 'female', 33.0, 90.0), (84, 'male', 50.0, 55.9), (85, 'female', 14.0, 120.0), (86, 'male', 64.0, 263.0), (95, 'male', 25.0, 91.0792), (96, 'female', 35.0, 90.0), (99, 'female', 54.0, 78.2667), (100, 'female', 25.


🔹 **Query 10**  
Find the names of adult passengers who traveled alone (no siblings/spouses or parents/children).

$$
\text{ADULTS} \leftarrow \sigma_{\text{Age} \geq 18}(\text{PASSENGERS}) \\
\text{ALONE} \leftarrow \sigma_{\text{SibSp}=0 \land \text{Parch}=0}(\text{ADULTS}) \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{ALONE})
$$  
Combines age and relational info to find solo adult travelers.


In [700]:
cursor.execute("""
SELECT rowid AS passenger_id, sex, age, fare
FROM Titanic
WHERE age >= 18 AND sibsp = 0 AND parch = 0
""")
print(cursor.fetchall())


[(3, 'male', 54.0, 51.8625), (5, 'female', 58.0, 26.55), (6, 'male', 34.0, 13.0), (7, 'male', 28.0, 35.5), (12, 'female', 29.0, 10.5), (13, 'male', 25.0, 7.65), (16, 'male', 71.0, 34.6542), (19, 'male', 47.0, 52.0), (21, 'female', 32.5, 13.0), (25, 'male', 24.0, 79.2), (28, 'male', 61.0, 33.5), (29, 'male', 56.0, 30.6958), (30, 'female', 50.0, 28.7125), (33, 'female', 44.0, 27.7208), (34, 'female', 58.0, 146.5208), (36, 'male', 40.0, 31.0), (38, 'female', 32.0, 76.2917), (44, 'male', 62.0, 26.55), (45, 'female', 30.0, 86.5), (47, 'male', 40.0, 0.0), (49, 'female', 35.0, 135.6333), (53, 'male', 36.0, 12.875), (58, 'female', 30.0, 56.9292), (59, 'female', 24.0, 83.1583), (63, 'female', 36.0, 135.6333), (64, 'female', 36.0, 13.0), (66, 'male', 45.5, 28.5), (69, 'female', 41.0, 134.5), (72, 'female', 24.0, 13.0), (75, 'female', 24.0, 69.3), (82, 'male', 32.0, 8.05), (83, 'male', 28.0, 26.55), (88, 'male', 52.0, 30.5), (89, 'male', 30.0, 27.75), (91, 'male', 65.0, 26.55), (92, 'male', 48.0,

# Relational Algebra Example Queries — Applied to the Titanic Dataset

These examples demonstrate how to apply core relational algebra operations using the Titanic dataset instead of the company database. Each query shows a real-world interpretation of the data and how it could be expressed using relational algebra principles.

---

## 🔹 Query 1  
**Retrieve the names and ticket numbers of all passengers in First Class.**  
We identify passengers where the class is listed as “1” (indicating First Class), and return only their names and ticket numbers.

**Relational Algebra Steps**:
$$
\text{RESULT} \leftarrow \pi_{\text{Name, Ticket}}(\sigma_{\text{Pclass}=1}(\text{PASSENGERS}))
$$

---



In [701]:
cursor.execute("""
SELECT rowid AS passenger_id, class
FROM Titanic
WHERE class = 'First'
""")
print(cursor.fetchall())


[(1, 'First'), (2, 'First'), (3, 'First'), (5, 'First'), (7, 'First'), (8, 'First'), (9, 'First'), (10, 'First'), (11, 'First'), (14, 'First'), (15, 'First'), (16, 'First'), (17, 'First'), (18, 'First'), (19, 'First'), (20, 'First'), (22, 'First'), (23, 'First'), (24, 'First'), (25, 'First'), (27, 'First'), (28, 'First'), (29, 'First'), (30, 'First'), (33, 'First'), (34, 'First'), (36, 'First'), (37, 'First'), (38, 'First'), (39, 'First'), (40, 'First'), (41, 'First'), (42, 'First'), (44, 'First'), (45, 'First'), (46, 'First'), (47, 'First'), (48, 'First'), (49, 'First'), (50, 'First'), (51, 'First'), (52, 'First'), (54, 'First'), (55, 'First'), (56, 'First'), (57, 'First'), (58, 'First'), (59, 'First'), (60, 'First'), (61, 'First'), (62, 'First'), (63, 'First'), (65, 'First'), (66, 'First'), (67, 'First'), (68, 'First'), (69, 'First'), (71, 'First'), (73, 'First'), (74, 'First'), (75, 'First'), (76, 'First'), (77, 'First'), (78, 'First'), (79, 'First'), (81, 'First'), (83, 'First'), (


## 🔹 Query 2  
**For every passenger who embarked from Cherbourg, list their name, age, and fare.**  
This query filters passengers based on the port of embarkation (Embarked = 'C' for Cherbourg) and returns the relevant personal and financial details.

**Relational Algebra Steps**:
$$
\text{RESULT} \leftarrow \pi_{\text{Name, Age, Fare}}(\sigma_{\text{Embarked}='C'}(\text{PASSENGERS}))
$$

---


In [702]:
cursor.execute("""
SELECT rowid AS passenger_id, age, fare
FROM Titanic
WHERE embarked = 'C'
""")
print(cursor.fetchall())


[(1, 38.0, 71.2833), (9, 49.0, 76.7292), (10, 65.0, 61.9792), (16, 71.0, 34.6542), (17, 23.0, 63.3583), (20, 24.0, 247.5208), (25, 24.0, 79.2), (29, 56.0, 30.6958), (30, 50.0, 28.7125), (33, 44.0, 27.7208), (34, 58.0, 146.5208), (36, 40.0, 31.0), (37, 31.0, 113.275), (38, 32.0, 76.2917), (50, 37.0, 29.7), (52, 19.0, 91.0792), (53, 36.0, 12.875), (55, 50.0, 247.5208), (57, 17.0, 108.9), (58, 30.0, 56.9292), (59, 24.0, 83.1583), (60, 18.0, 262.375), (62, 40.0, 134.5), (63, 36.0, 135.6333), (65, 16.0, 57.9792), (69, 41.0, 134.5), (74, 60.0, 75.25), (75, 24.0, 69.3), (76, 25.0, 55.4417), (77, 27.0, 211.5), (79, 23.0, 113.275), (89, 30.0, 27.75), (90, 49.0, 89.1042), (94, 23.0, 13.7917), (95, 25.0, 91.0792), (97, 58.0, 29.7), (99, 54.0, 78.2667), (102, 18.0, 108.9), (107, 44.0, 57.9792), (109, 22.0, 49.5), (111, 50.0, 106.425), (112, 17.0, 110.8833), (113, 48.0, 39.6), (118, 39.0, 110.8833), (119, 36.0, 40.125), (121, 60.0, 79.2), (122, 52.0, 78.2667), (123, 49.0, 56.9292), (130, 32.0, 30.5


## 🔹 Query 3  
**Find the names of passengers who survived and paid more than 100 in fare.**  
Here we apply two conditions — survival and fare amount — and return only the names of passengers who meet both.

**Relational Algebra Steps**:
$$
\text{RESULT} \leftarrow \pi_{\text{Name}}(\sigma_{\text{Survived}=1 \land \text{Fare}>100}(\text{PASSENGERS}))
$$

---


In [703]:
cursor.execute("""
SELECT rowid AS passenger_id
FROM Titanic
WHERE survived = 1 AND fare > 100
""")
print(cursor.fetchall())


[(14,), (34,), (37,), (48,), (49,), (55,), (56,), (57,), (60,), (61,), (62,), (63,), (69,), (71,), (78,), (79,), (85,), (112,), (118,), (124,), (137,), (139,), (143,), (149,), (152,), (153,), (155,), (160,), (163,), (168,)]



## 🔹 Query 4  
**List the names of children (under 12 years old) who were traveling with someone else on the same ticket.**  
Passengers with the same ticket are likely in the same travel group. This query finds children traveling in such groups.

**Relational Algebra Steps**:
$$
\text{CHILDREN} \leftarrow \sigma_{\text{Age}<12}(\text{PASSENGERS}) \\
\text{GROUP_TICKETS} \leftarrow \pi_{\text{Ticket}}(\sigma_{\text{COUNT}>1}(\gamma_{\text{Ticket}, \text{COUNT(*)}}(\text{PASSENGERS}))) \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{CHILDREN} \bowtie \text{GROUP_TICKETS})
$$

---


In [704]:
# Children under 12 who are not alone (sibsp > 0 or parch > 0)
cursor.execute("""
SELECT rowid AS passenger_id, age, sibsp, parch
FROM Titanic
WHERE age < 12 AND (sibsp > 0 OR parch > 0)
""")
print(cursor.fetchall())


[(4, 4.0, 1, 1), (31, 1.0, 2, 1), (32, 3.0, 1, 1), (35, 2.0, 0, 1), (54, 2.0, 1, 2), (56, 0.92, 1, 2), (70, 2.0, 1, 1), (87, 4.0, 0, 2), (125, 4.0, 2, 1), (158, 6.0, 0, 1), (168, 11.0, 1, 2)]



## 🔹 Query 5  
**List the names of male passengers in third class who did not survive.**  
We combine three conditions: gender (Sex = 'male'), class (Pclass = 3), and survival (Survived = 0).

**Relational Algebra Steps**:
$$
\text{RESULT} \leftarrow \pi_{\text{Name}}(\sigma_{\text{Pclass}=3 \land \text{Sex}='male' \land \text{Survived}=0}(\text{PASSENGERS}))
$$

---


In [705]:
cursor.execute("""
SELECT rowid AS passenger_id, sex, age, fare
FROM Titanic
WHERE pclass = 3 AND sex = 'male' AND survived = 0
""")

results = cursor.fetchall()
for row in results:
    print(row)


(13, 'male', 25.0, 7.65)
(142, 'male', 42.0, 7.65)
(148, 'male', 19.0, 7.65)



## 🔹 Query 6  
**Retrieve the names of passengers who had no cabin assigned.**  
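This identifies passengers whose Cabin value is null or empty, likely those in lower classes or with less information recorded. In this notebook the table was loaded with `dropna()`, so no row has a NULL `deck` and the SQL cell for this query prints nothing.

**Relational Algebra Steps**: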
$$
\text{WITH_CABIN} \leftarrow \pi_{\text{PassengerId}}(\sigma_{\text{Cabin IS NOT NULL}}(\text{PASSENGERS})) \\
\text{ALL} \leftarrow \pi_{\text{PassengerId}}(\text{PASSENGERS}) \\
\text{NO_CABIN} \leftarrow \text{ALL} - \text{WITH_CABIN} \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{NO_CABIN} \bowtie \text{PASSENGERS})
$$

---


In [706]:
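# Cabin information is stored in the `deck` column of this table.
# The DataFrame was loaded with dropna(), so no NULL decks remain
# and this query returns no rows here.
cursor.execute("""
SELECT rowid AS passenger_id, sex, age, fare
FROM Titanic
WHERE deck IS NULL
""")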

results = cursor.fetchall()
for row in results:
    print(row)
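
The algebra for this query is written as a set difference followed by a join. A minimal sketch of that exact shape in SQL uses `EXCEPT`; the `passengers` table, its `cabin` column, and the four rows below are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical rows; the cabin column and its values are invented for illustration.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE passengers (passenger_id INTEGER, name TEXT, cabin TEXT)")
demo.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [(1, "Anna", "C85"), (2, "Bert", None), (3, "Cara", None), (4, "Dan", "E46")],
)

no_cabin_names = demo.execute("""
    SELECT p.name
    FROM passengers AS p
    JOIN (
        SELECT passenger_id FROM passengers                          -- ALL
        EXCEPT
        SELECT passenger_id FROM passengers WHERE cabin IS NOT NULL  -- WITH_CABIN
    ) AS n ON n.passenger_id = p.passenger_id                        -- NO_CABIN joined back
    ORDER BY p.name
""").fetchall()
print(no_cabin_names)
```

In practice a plain `WHERE cabin IS NULL` gives the same answer; the longer form is shown only because it mirrors the algebra step by step.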



## 🔹 Query 7  
**List the names of survivors who were traveling with at least one sibling or spouse.**  
This query finds survivors who indicated having relatives (siblings or spouse) aboard.

**Relational Algebra Steps**:
$$
\text{RESULT} \leftarrow \pi_{\text{Name}}(\sigma_{\text{Survived}=1 \land \text{SibSp}>0}(\text{PASSENGERS}))
$$

---


In [707]:
cursor.execute("""
SELECT rowid AS passenger_id, sex, age, fare
FROM Titanic
WHERE survived = 1 AND sibsp > 0
""")

results = cursor.fetchall()
for row in results:
    print(row)


(1, 'female', 38.0, 71.2833)
(2, 'female', 35.0, 53.1)
(4, 'female', 4.0, 16.7)
(9, 'female', 49.0, 76.7292)
(14, 'female', 23.0, 263.0)
(27, 'female', 22.0, 66.6)
(31, 'male', 1.0, 39.0)
(32, 'male', 3.0, 26.0)
(37, 'female', 31.0, 113.275)
(39, 'male', 38.0, 90.0)
(40, 'female', 35.0, 83.475)
(42, 'male', 37.0, 52.5542)
(51, 'female', 63.0, 77.9583)
(52, 'female', 19.0, 91.0792)
(56, 'male', 0.92, 151.55)
(57, 'female', 17.0, 108.9)
(60, 'female', 18.0, 262.375)
(62, 'female', 40.0, 134.5)
(70, 'male', 2.0, 26.0)
(71, 'female', 24.0, 263.0)
(74, 'female', 60.0, 75.25)
(76, 'male', 25.0, 55.4417)
(78, 'male', 36.0, 120.0)
(79, 'female', 23.0, 113.275)
(81, 'female', 33.0, 90.0)
(85, 'female', 14.0, 120.0)
(90, 'male', 49.0, 89.1042)
(95, 'male', 25.0, 91.0792)
(96, 'female', 35.0, 90.0)
(99, 'female', 54.0, 78.2667)
(113, 'female', 48.0, 39.6)
(114, 'female', 39.0, 79.65)
(115, 'female', 53.0, 51.4792)
(117, 'female', 39.0, 55.9)
(118, 'female', 39.0, 110.8833)
...


## 🔹 Query 8  
**List the names of female survivors who paid below the average fare.**  
This combines gender, survival status, and a computed average of the Fare column.

**Relational Algebra Steps**:
$$
\text{AVG_FARE} \leftarrow \gamma_{\text{AVG(Fare)}}(\text{PASSENGERS}) \\
\text{FILTERED} \leftarrow \sigma_{\text{Sex}='female' \land \text{Survived}=1 \land \text{Fare}<\text{AVG_FARE}}(\text{PASSENGERS}) \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{FILTERED})
$$

---


In [708]:
# Step 1: Get the average fare
cursor.execute("SELECT AVG(fare) FROM Titanic WHERE fare IS NOT NULL")
avg_fare = cursor.fetchone()[0]

# Step 2: Get female survivors who paid less than the average fare
cursor.execute("""
SELECT rowid AS passenger_id, sex, age, fare
FROM Titanic
WHERE sex = 'female'
  AND survived = 1
  AND fare < ?
""", (avg_fare,))

# Step 3: Fetch and display results
results = cursor.fetchall()
for row in results:
    print(row)


(1, 'female', 38.0, 71.2833)
(2, 'female', 35.0, 53.1)
(4, 'female', 4.0, 16.7)
(5, 'female', 58.0, 26.55)
(9, 'female', 49.0, 76.7292)
(12, 'female', 29.0, 10.5)
(21, 'female', 32.5, 13.0)
(23, 'female', 19.0, 26.2833)
(27, 'female', 22.0, 66.6)
(33, 'female', 44.0, 27.7208)
(38, 'female', 32.0, 76.2917)
(51, 'female', 63.0, 77.9583)
(58, 'female', 30.0, 56.9292)
(64, 'female', 36.0, 13.0)
(65, 'female', 16.0, 57.9792)
(72, 'female', 24.0, 13.0)
(73, 'female', 22.0, 55.0)
(74, 'female', 60.0, 75.25)
(75, 'female', 24.0, 69.3)
(80, 'female', 24.0, 16.7)
(94, 'female', 23.0, 13.7917)
(99, 'female', 54.0, 78.2667)
(105, 'female', 34.0, 10.5)
(107, 'female', 44.0, 57.9792)
(109, 'female', 22.0, 49.5)
(110, 'female', 36.0, 71.0)
(113, 'female', 48.0, 39.6)
(115, 'female', 53.0, 51.4792)
(117, 'female', 39.0, 55.9)
(122, 'female', 52.0, 78.2667)
(125, 'female', 4.0, 39.0)
(128, 'female', 21.0, 77.9583)
(131, 'female', 24.0, 69.3)
(146, 'female', 24.0, 49.5042)
(150, 'female', 27.0, 10.5)
...
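
The cell above computes the average in Python and passes it back as a parameter. A possible alternative, sketched here on a hypothetical five-row `passengers` table (all values invented), is to let SQL evaluate the aggregate in a scalar subquery so the whole query is one statement.

```python
import sqlite3

# Hypothetical rows chosen so that the average fare is exactly 40.0.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE passengers (sex TEXT, survived INTEGER, fare REAL)")
demo.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [
        ("female", 1, 10.0),
        ("female", 1, 90.0),
        ("female", 0, 20.0),
        ("male", 1, 30.0),
        ("male", 0, 50.0),
    ],
)

# The scalar subquery plays the role of AVG_FARE in the algebra steps.
below_avg = demo.execute("""
    SELECT sex, fare
    FROM passengers
    WHERE sex = 'female'
      AND survived = 1
      AND fare < (SELECT AVG(fare) FROM passengers)
""").fetchall()
print(below_avg)
```

Of the two female survivors, only the one who paid 10.0 falls below the average of 40.0.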


## 🔹 Query 9  
**Find the names of passengers traveling in a group (same ticket used by more than one person).**  
Useful for identifying family or group bookings.

**Relational Algebra Steps**:
$$
\text{DUPLICATE_TICKETS} \leftarrow \sigma_{\text{COUNT}>1}(\gamma_{\text{Ticket}, \text{COUNT(*)}}(\text{PASSENGERS})) \\
\text{RESULT} \leftarrow \pi_{\text{Name}}(\text{PASSENGERS} \bowtie \text{DUPLICATE_TICKETS})
$$

---


In [709]:
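# Step 1: Find (fare, pclass, embark_town) combinations shared by more than one passenger.
# The table has no ticket column, so this signature is an assumed proxy for a shared ticket.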
cursor.execute("""
    SELECT fare, pclass, embark_town
    FROM Titanic
    WHERE fare IS NOT NULL AND embark_town IS NOT NULL
    GROUP BY fare, pclass, embark_town
    HAVING COUNT(*) > 1
""")
groups = cursor.fetchall()

# Step 2: Get passengers with matching group signatures
grouped_passengers = []
for fare, pclass, town in groups:
    cursor.execute("""
        SELECT rowid AS passenger_id, sex, age, pclass, fare, embark_town
        FROM Titanic
        WHERE fare = ? AND pclass = ? AND embark_town = ?
    """, (fare, pclass, town))
    grouped_passengers.extend(cursor.fetchall())

# Step 3: Display results
for row in grouped_passengers:
    print(row)


(47, 'male', 40.0, 1, 0.0, 'Southampton')
(169, 'male', 39.0, 1, 0.0, 'Southampton')
(13, 'male', 25.0, 3, 7.65, 'Southampton')
(142, 'male', 42.0, 3, 7.65, 'Southampton')
(148, 'male', 19.0, 3, 7.65, 'Southampton')
(35, 'female', 2.0, 3, 10.4625, 'Southampton')
(43, 'female', 29.0, 3, 10.4625, 'Southampton')
(12, 'female', 29.0, 2, 10.5, 'Southampton')
(105, 'female', 34.0, 2, 10.5, 'Southampton')
(150, 'female', 27.0, 2, 10.5, 'Southampton')
(162, 'female', 57.0, 2, 10.5, 'Southampton')
(158, 'male', 6.0, 3, 12.475, 'Southampton')
(172, 'female', 27.0, 3, 12.475, 'Southampton')
(6, 'male', 34.0, 2, 13.0, 'Southampton')
(21, 'female', 32.5, 2, 13.0, 'Southampton')
(64, 'female', 36.0, 2, 13.0, 'Southampton')
(72, 'female', 24.0, 2, 13.0, 'Southampton')
(4, 'female', 4.0, 3, 16.7, 'Southampton')
(80, 'female', 24.0, 3, 16.7, 'Southampton')
(167, 'female', 49.0, 1, 25.9292, 'Southampton')
(176, 'female', 48.0, 1, 25.9292, 'Southampton')
(26, 'male', 36.5, 2, 26.0, 'Southampton')
...
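
The loop above issues one query per group. Assuming the same proxy signature (`fare`, `pclass`, `embark_town`), a join against the grouped subquery should return the same set of rows in a single statement, which is also closer to the join in the algebra. The sketch below runs on a hypothetical four-row table; the `pid` column and all values are invented.

```python
import sqlite3

# Hypothetical rows; pid and the values are invented for illustration.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE passengers (pid INTEGER, fare REAL, pclass INTEGER, embark_town TEXT)"
)
demo.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?, ?)",
    [
        (1, 7.65, 3, "Southampton"),
        (2, 7.65, 3, "Southampton"),
        (3, 7.65, 3, "Cherbourg"),    # same fare and class, different town
        (4, 26.0, 2, "Southampton"),
    ],
)

grouped = demo.execute("""
    SELECT p.pid, p.fare, p.pclass, p.embark_town
    FROM passengers AS p
    JOIN (
        SELECT fare, pclass, embark_town
        FROM passengers
        GROUP BY fare, pclass, embark_town
        HAVING COUNT(*) > 1
    ) AS g
      ON p.fare = g.fare
     AND p.pclass = g.pclass
     AND p.embark_town = g.embark_town
    ORDER BY p.pid
""").fetchall()
print(grouped)
```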


## 🔹 Query 10  
**List the names of adults (18 or older) who were traveling alone (no siblings/spouse or parents/children on board).**  
We define “alone” as having SibSp = 0 and Parch = 0.

**Relational Algebra Steps**:
$$
\text{RESULT} \leftarrow \pi_{\text{Name}}(\sigma_{\text{Age} \geq 18 \land \text{SibSp}=0 \land \text{Parch}=0}(\text{PASSENGERS}))
$$

In [710]:
cursor.execute("""
    SELECT rowid AS passenger_id, age, sibsp, parch
    FROM Titanic
    WHERE age >= 18 AND sibsp = 0 AND parch = 0
""")
results = cursor.fetchall()

for row in results:
    print(row)


(3, 54.0, 0, 0)
(5, 58.0, 0, 0)
(6, 34.0, 0, 0)
(7, 28.0, 0, 0)
(12, 29.0, 0, 0)
(13, 25.0, 0, 0)
(16, 71.0, 0, 0)
(19, 47.0, 0, 0)
(21, 32.5, 0, 0)
(25, 24.0, 0, 0)
(28, 61.0, 0, 0)
(29, 56.0, 0, 0)
(30, 50.0, 0, 0)
(33, 44.0, 0, 0)
(34, 58.0, 0, 0)
(36, 40.0, 0, 0)
(38, 32.0, 0, 0)
(44, 62.0, 0, 0)
(45, 30.0, 0, 0)
(47, 40.0, 0, 0)
(49, 35.0, 0, 0)
(53, 36.0, 0, 0)
(58, 30.0, 0, 0)
(59, 24.0, 0, 0)
(63, 36.0, 0, 0)
(64, 36.0, 0, 0)
(66, 45.5, 0, 0)
(69, 41.0, 0, 0)
(72, 24.0, 0, 0)
(75, 24.0, 0, 0)
(82, 32.0, 0, 0)
(83, 28.0, 0, 0)
(88, 52.0, 0, 0)
(89, 30.0, 0, 0)
(91, 65.0, 0, 0)
(92, 48.0, 0, 0)
(93, 47.0, 0, 0)
(94, 23.0, 0, 0)
(97, 58.0, 0, 0)
(98, 55.0, 0, 0)
(103, 36.0, 0, 0)
(104, 47.0, 0, 0)
(105, 34.0, 0, 0)
(106, 30.0, 0, 0)
(108, 45.0, 0, 0)
(116, 36.0, 0, 0)
(119, 36.0, 0, 0)
(124, 40.0, 0, 0)
(127, 61.0, 0, 0)
(128, 21.0, 0, 0)
(129, 80.0, 0, 0)
(130, 32.0, 0, 0)
(131, 24.0, 0, 0)
(133, 56.0, 0, 0)
(135, 47.0, 0, 0)
(138, 27.0, 0, 0)
(142, 42.0, 0, 0)
(144, 35.0, 0, 0)


# 6.7 Domain Relational Calculus (Titanic Dataset Version)

**Domain relational calculus (DRC)** is a type of declarative query language used in databases, focusing on *what* data to retrieve rather than *how* to retrieve it.  
Unlike tuple relational calculus, which uses variables that represent entire rows (tuples), DRC uses variables that stand for individual attribute values—like a single passenger's age or name.

---

## 🔸 Structure of a Domain Calculus Expression

The general form is:

$$
\{x_1, x_2, \ldots, x_n \mid \text{Condition}(x_1, x_2, \ldots, x_n)\}
$$

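- The $x$'s represent domain variables, each tied to one attribute such as `name`, `age`, `sex`, `fare`, or `class`.  
- The condition defines logical constraints that must be true for values to be included in the result.
- Variables that appear in the condition but not in the output list are understood to be existentially quantified ($\exists$); the examples in this section leave those quantifiers implicit.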

---

### 🔹 Variable Mapping Example (Titanic Dataset)
For example, when querying the Titanic dataset, each variable could represent:

- $x_1$: passenger name  
- $x_2$: sex  
- $x_3$: age  
- $x_4$: passenger class  
- $x_5$: fare  
- $x_6$: survival status  

---

### 🔹 Domain Relational Calculus (DRC)

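The expression below retrieves all six attributes of first-class female survivors who paid a fare above 50:

$$
\{x_1, x_2, x_3, x_4, x_5, x_6 \mid \text{Titanic}(x_1, x_2, x_3, x_4, x_5, x_6) \land x_2 = \text{'female'} \land x_4 = 1 \land x_6 = 1 \land x_5 > 50\}
$$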


In [711]:
# DRC-mapped SQL query in SQLite
query = """
SELECT rowid AS x1, sex AS x2, age AS x3, pclass AS x4, fare AS x5, survived AS x6
FROM Titanic
WHERE sex = 'female' AND pclass = 1 AND survived = 1 AND fare > 50
"""

# Execute and display
df_result = pd.read_sql_query(query, conn)
df_result.head()


Unnamed: 0,x1,x2,x3,x4,x5,x6
0,1,female,38.0,1,71.2833,1
1,2,female,35.0,1,53.1,1
2,9,female,49.0,1,76.7292,1
3,14,female,23.0,1,263.0,1
4,27,female,22.0,1,66.6,1



## 🔸 How Conditions Are Written

A condition is made up of *atoms*, which are simple logical statements. These can be:

- **Membership**:  
  Example:  
  $$
  \text{Titanic}(x_1, x_2, x_3, \ldots, x_n)
  $$  
  ⟶ This means the tuple of values $(x_1, \ldots, x_n)$ appears as a row in the Titanic table.

- **Comparison between variables**:  
  Example:  
  $$
  x_3 < x_5
  $$  
  (compares age and fare — hypothetical)

- **Comparison with constants**:  
  Example:  
  $$
  x_2 = \text{'female'}
  $$
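
One way to build intuition for the three kinds of atoms is to read a domain calculus expression as a set comprehension. The sketch below is only an analogy under stated assumptions: `titanic_rel` is a hypothetical relation of four invented tuples in the order (name, sex, age, class, fare, survived), and the age-versus-fare comparison is the same hypothetical one used above.

```python
# Hypothetical relation: each tuple is (name, sex, age, pclass, fare, survived).
titanic_rel = {
    ("Anna", "female", 38.0, 1, 71.28, 1),
    ("Bert", "male", 22.0, 3, 7.25, 0),
    ("Cara", "female", 26.0, 3, 7.93, 1),
    ("Dora", "female", 35.0, 1, 53.10, 1),
}

result = {
    (x1, x3)                                      # output variables
    for (x1, x2, x3, x4, x5, x6) in titanic_rel   # membership atom
    if x2 == "female"                             # comparison with a constant
    and x3 < x5                                   # comparison between variables
}
print(sorted(result))
```

Each domain variable binds to one position of the tuple, and the variables that are not in the output pair (`x2`, `x4`, `x5`, `x6`) are simply bound and discarded, which mirrors existential quantification.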

---






## Titanic-Themed Examples

### Example 1: List names and ages of all female passengers.
- Define variables for name, sex, and age.
- Apply filter:  
  $$
  x_2 = \text{'female'}
  $$



In [713]:
pd.read_sql_query("PRAGMA table_info(Titanic);", conn)

Unnamed: 0,cid,name,type,notnull,dflt_value,pk
0,0,survived,INTEGER,0,,0
1,1,pclass,INTEGER,0,,0
2,2,sex,TEXT,0,,0
3,3,age,REAL,0,,0
4,4,sibsp,INTEGER,0,,0
5,5,parch,INTEGER,0,,0
6,6,fare,REAL,0,,0
7,7,embarked,TEXT,0,,0
8,8,class,TEXT,0,,0
9,9,who,TEXT,0,,0


### 🔹 Domain Relational Calculus (DRC)

$$
\{x_1, x_3 \mid \text{Titanic}(x_1, x_2, x_3, x_4, x_5, x_6) \land x_2 = \text{'female'}\}
$$


In [714]:
query = """
SELECT rowid AS x1, age AS x3
FROM Titanic
WHERE sex = 'female'
"""
df_result = pd.read_sql_query(query, conn)
df_result.head()


Unnamed: 0,x1,x3
0,1,38.0
1,2,35.0
2,4,4.0
3,5,58.0
4,9,49.0


### Example 2: Find the class and fare of passengers who survived.
- Use variables for class, fare, and survived.
- Condition:  
  $$
  x_6 = 1
  $$


### 🔹 Domain Relational Calculus (DRC)

$$
\{x_4, x_5 \mid \text{Titanic}(x_1, x_2, x_3, x_4, x_5, x_6) \land x_6 = 1\}
$$


In [715]:
query = """
SELECT pclass AS x4, fare AS x5
FROM Titanic
WHERE survived = 1
"""
df_result = pd.read_sql_query(query, conn)
df_result.head()


Unnamed: 0,x4,x5
0,1,71.2833
1,1,53.1
2,3,16.7
3,1,26.55
4,2,13.0



### Example 3: Retrieve names of passengers under 12 years old.
- Variables: name, age  
- Condition:  
  $$
  x_3 < 12
  $$



### 🔹 Domain Relational Calculus (DRC)

$$
\{x_1, x_3 \mid \text{Titanic}(x_1, x_2, x_3, x_4, x_5, x_6) \land x_3 < 12\}
$$


In [716]:
query = """
SELECT rowid AS x1, age AS x3
FROM Titanic
WHERE age < 12
"""
df_result = pd.read_sql_query(query, conn)
df_result.head()


Unnamed: 0,x1,x3
0,4,4.0
1,31,1.0
2,32,3.0
3,35,2.0
4,54,2.0


### Example 4: List names of passengers who were not assigned a cabin.
- Condition (the seaborn table records cabin information in the `deck` column):  
  $$
  x_{\text{deck}} = \text{NULL} \quad \text{or} \quad x_{\text{deck}} = \text{''}
  $$



### 🔹 Domain Relational Calculus (DRC)

$$
\{x_1, x_{\text{deck}} \mid \text{Titanic}(x_1, x_2, x_3, x_4, x_5, x_6, \ldots, x_{\text{deck}}, \ldots) \land (x_{\text{deck}} = \text{NULL} \lor x_{\text{deck}} = \text{''})\}
$$


In [717]:
query = """
SELECT rowid AS x1, deck AS xdeck
FROM Titanic
WHERE deck IS NULL OR deck = ''
"""
df_result = pd.read_sql_query(query, conn)
df_result.head()


Unnamed: 0,x1,xdeck
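
The calculus condition above is written with `= NULL`, while the SQL cell uses `IS NULL`. The difference matters in SQL, where an equality test against NULL never evaluates to true. The sketch below illustrates this on a hypothetical three-row table (the `pid` column and values are invented); it is separate from the notebook's `Titanic` table, which has no NULL decks after `dropna()`.

```python
import sqlite3

# Hypothetical rows: one real deck, one NULL, one empty string.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE passengers (pid INTEGER, deck TEXT)")
demo.executemany(
    "INSERT INTO passengers VALUES (?, ?)",
    [(1, "C"), (2, None), (3, "")],
)

# Equality against NULL yields NULL (unknown), so no row passes the filter.
eq_null = demo.execute(
    "SELECT pid FROM passengers WHERE deck = NULL"
).fetchall()

# IS NULL is the predicate that actually tests for missing values.
is_null = demo.execute(
    "SELECT pid FROM passengers WHERE deck IS NULL OR deck = '' ORDER BY pid"
).fetchall()

print(eq_null, is_null)
```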


### Example 5: Identify all combinations of name and ticket number where the passenger paid more than 100.
- Variables: name, ticket, fare (the seaborn table has no ticket column, so only the row identifier and fare are returned here)  
- Condition:  
  $$
  x_5 > 100
  $$

---


### 🔹 Domain Relational Calculus (DRC)

$$
\{x_1, x_5 \mid \text{Titanic}(x_1, x_2, x_3, x_4, x_5, x_6, \ldots) \land x_5 > 100\}
$$


In [718]:
query = """
SELECT rowid AS x1, fare AS x5
FROM Titanic
WHERE fare > 100
"""
df_result = pd.read_sql_query(query, conn)
df_result.head()


Unnamed: 0,x1,x5
0,8,263.0
1,14,263.0
2,20,247.5208
3,34,146.5208
4,37,113.275



## QBE and Domain Calculus

**Query-By-Example (QBE)** is a visual query language developed at IBM Research that is closely related to domain relational calculus.

- In QBE, you don’t write logical expressions.
- Instead, you **fill in example values** in a visual, table-like form.
- It lets users query without writing SQL statements, functioning like a spreadsheet form for data filtering.

**DRC Expression:**

$$
\{x_1, x_3 \mid \text{Titanic}(x_1, x_2, x_3, \ldots) \land x_2 = \text{'female'}\}
$$


In [719]:
query1 = """
SELECT rowid AS id, sex, age
FROM Titanic
WHERE sex = 'female'
"""
df1 = pd.read_sql_query(query1, conn)
df1.head()


Unnamed: 0,id,sex,age
0,1,female,38.0
1,2,female,35.0
2,4,female,4.0
3,5,female,58.0
4,9,female,49.0


# 📘 6.8 Summary (Titanic Dataset Adaptation)

In this section, we explored two foundational languages used in the relational model: **relational algebra** and **relational calculus**. These are formal ways of expressing queries—questions we ask of a dataset like the Titanic passenger manifest—in order to generate meaningful answers.

---

##  Relational Algebra (Sections 6.1–6.4)

We began by discussing the core operations of relational algebra, which describe how to retrieve data step by step.

- **Selection** ($\sigma$) and **projection** ($\pi$) are used to filter and display specific rows or columns.  
  *Example:* You could select passengers who survived, or project only the `"Name"` and `"Age"` columns.

- **Renaming** allows for clarity in complex expressions by giving temporary names to results.

### 🔹 Set Operations
These treat relations like mathematical sets:

- **Union:** Combines data (e.g., merging passengers from two different Titanic classes).
- **Intersection** and **difference:** Identify shared or unique records (e.g., intersection finds passengers who were both in first class and survived; difference finds first-class passengers who did not survive).
- **Cartesian product:** Combines every row from one relation with every row from another. Often a step toward joins.
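
As a rough sketch of the set operations listed above, SQLite exposes union, intersection, and difference as `UNION`, `INTERSECT`, and `EXCEPT`. The `passengers` table, the `pid` column, and the four rows below are assumptions invented for illustration.

```python
import sqlite3

# Hypothetical rows: (pid, pclass, survived).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE passengers (pid INTEGER, pclass INTEGER, survived INTEGER)")
demo.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 0), (3, 2, 1), (4, 3, 0)],
)

first_class = "SELECT pid FROM passengers WHERE pclass = 1"
survivors = "SELECT pid FROM passengers WHERE survived = 1"

# Each compound query combines the two single-column relations as sets.
union = demo.execute(f"{first_class} UNION {survivors} ORDER BY pid").fetchall()
intersection = demo.execute(f"{first_class} INTERSECT {survivors} ORDER BY pid").fetchall()
difference = demo.execute(f"{first_class} EXCEPT {survivors} ORDER BY pid").fetchall()

print(union, intersection, difference)
```

Both operands project the same single column, which reflects the union-compatibility requirement of the algebraic operators.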

### 🔹 Joins
- **Theta Join**, **Equijoin**, and **Natural Join**: Used to connect data across tables.  
  *Example:* Join passenger data with survival rates by class and gender.

### 🔹 Query Trees
Visual and systematic representations of how queries are constructed and evaluated.

---

##  Enhanced Relational Operations

In practice, queries often need more power than what basic algebra provides:

- **Generalized projection:** Supports calculated values, e.g., fare per person if multiple passengers shared a ticket.
- **Aggregate functions:** `COUNT`, `AVG`, `SUM`, etc.  
  *Example:* Find average age of survivors or total fare from third-class passengers.
- **Recursive queries:** Useful for finding travel groups or family links across multiple rows.
- **Outer joins / outer unions:** Retain unmatched data in results.  
  *Example:* List all passengers even if cabin number is unknown.
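
Three of these enhanced operations map directly onto everyday SQL. The sketch below is illustrative only: the `passengers` and `cabins` tables, the `party_size` column, and all values are hypothetical and do not come from the dataset used in this notebook.

```python
import sqlite3

# Hypothetical tables and rows, invented for illustration.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE passengers (pid INTEGER, fare REAL, party_size INTEGER)")
demo.execute("CREATE TABLE cabins (pid INTEGER, cabin TEXT)")
demo.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [(1, 60.0, 2), (2, 30.0, 1), (3, 90.0, 3)],
)
demo.executemany("INSERT INTO cabins VALUES (?, ?)", [(1, "C85")])

# Generalized projection: the output column is computed from two attributes.
fare_per_person = demo.execute(
    "SELECT pid, fare / party_size FROM passengers ORDER BY pid"
).fetchall()

# Aggregate function over the whole relation.
avg_fare = demo.execute("SELECT AVG(fare) FROM passengers").fetchone()[0]

# Left outer join: passengers without a cabin row are kept, padded with NULL.
with_cabins = demo.execute("""
    SELECT p.pid, c.cabin
    FROM passengers AS p
    LEFT JOIN cabins AS c ON c.pid = p.pid
    ORDER BY p.pid
""").fetchall()

print(fare_per_person, avg_fare, with_cabins)
```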

---

##  Relational Calculus (Sections 6.6–6.7)

We then turned to **relational calculus**, a more **declarative** language. Instead of specifying *how* to retrieve data, it describes *what* data is desired.

###  Two Main Types:

#### 1. **Tuple Relational Calculus (TRC)**
- Variables represent rows of the Titanic table.
- *Example:* "There exists a passenger such that they are female and survived."  
- Great for logic-based conditions.

#### 2. **Domain Relational Calculus (DRC)**
- Variables range over individual attribute values, one variable per column.  
- *Example:* Define variables for `name`, `age`, and `survived`; then apply a condition like `age < 12`.

- Closely related to early visual query languages such as **Query-By-Example (QBE)**.

---

### ∃ and ∀ Quantifiers

- **Existential Quantifier** ($\exists$): "There exists"  
  *Used to find passengers who meet a condition.*

- **Universal Quantifier** ($\forall$): "For all"  
  *Used to express conditions like "all passengers under 10 were in third class."*

- Expressions that use these quantifiers should be **safe**, meaning they are guaranteed to return a finite number of tuples built from values that appear in the database.
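
SQL has no direct "for all" operator, so a universal condition is usually rewritten with the equivalence $\forall p\,(A \rightarrow B) \equiv \neg\exists p\,(A \land \neg B)$. The sketch below checks the statement "all passengers under 10 were in third class" this way; the table, the helper name `all_under_10_in_third`, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: (pid, age, pclass).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE passengers (pid INTEGER, age REAL, pclass INTEGER)")
demo.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [(1, 4.0, 3), (2, 8.0, 3), (3, 30.0, 1)],
)

def all_under_10_in_third(conn):
    """Return True if no passenger under 10 is outside third class."""
    # "for all p: age < 10 implies pclass = 3" is checked as
    # "there is no p with age < 10 and pclass <> 3".
    row = conn.execute("""
        SELECT NOT EXISTS (
            SELECT 1 FROM passengers WHERE age < 10 AND pclass <> 3
        )
    """).fetchone()
    return bool(row[0])

holds_before = all_under_10_in_third(demo)

# Adding a single counterexample (a 6-year-old in second class) falsifies the claim.
demo.execute("INSERT INTO passengers VALUES (4, 6.0, 2)")
holds_after = all_under_10_in_third(demo)

print(holds_before, holds_after)
```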

---

##  Final Thoughts

- **Relational algebra**: tells the system **how** to answer a query step by step.
- **Relational calculus**: describes **what** data is desired.

Both are foundational to query languages like **SQL**.

When applied to the Titanic dataset, these formal methods uncover meaningful insights—identifying survivors, comparing fares, grouping passengers by class or age, and linking travel companions or families.

