# RDBMS Part 1

## Topics
* Relational Model Overview (Recap)
* Relational Algebra
 * Operations
   * Set Theory (Recap)
   * Operations on Relations
---

# Relational Model Overview

Wikipedia definition of Relational Model:
> The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations.

The key concepts that form the basis of relational data modeling are:
* the terminology listed in the table below:

 | Formal Relational Term | Informal Equivalents |
 |:---|:---|
 | relation | table |
 | attribute | column or field |
 | tuple | row or record |
 | cardinality | number of rows |
 | degree | number of columns |
 | domain | pool of legal or atomic values |
 | key | unique identifier |

![Relational_model_concepts.png](attachment:13f976eb-05a0-48a5-9d58-32b6a388c619.png)

* **set theory** for data manipulation

![a_intersect_b.png](attachment:13e6cc3e-2f42-4ae3-8f84-ed10da28c572.png)

* **first order predicate logic** for defining integrity constraints

 **Example** of first order predicate logic<br>
 $ \exists x.(physical\_object(x) \wedge house(x)) $

 where the predicate translates to: "Some physical objects are houses" <br>
 The symbols represent:
 * $ \exists $ - existential quantification, meaning "for some", "there exists", "there is at least one", etc.
 * $ \wedge $ - logical symbol for "and" (conjunction); its set theory counterpart is intersection
 
 **Example** of integrity constraints<br>
 `<relation>.<attribute> >= 50`
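Both examples above can be checked mechanically over a finite collection of tuples. The following is a minimal sketch, assuming a made-up list of objects and a made-up `score` attribute (neither comes from the notes): `any` plays the role of $\exists$, and an integrity constraint behaves like a predicate that must hold for every tuple, which `all` expresses.

```python
# Hypothetical data, invented purely for illustration.
objects = [
    {"name": "cottage", "physical_object": True, "house": True},
    {"name": "bicycle", "physical_object": True, "house": False},
    {"name": "idea", "physical_object": False, "house": False},
]

# Exists x. (physical_object(x) AND house(x))
some_physical_objects_are_houses = any(
    x["physical_object"] and x["house"] for x in objects
)

# A constraint of the form <relation>.<attribute> >= 50 must hold for
# every tuple of the relation, so we test it with all().
scores = [{"sID": "s1", "score": 72.0}, {"sID": "s2", "score": 50.0}]
constraint_holds = all(row["score"] >= 50 for row in scores)

print(some_physical_objects_are_houses, constraint_holds)
```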

## Some Terminology
* **Data** or **Entity** - this is the operational data that needs to be represented in the database.
* **Relation** - a table with the following characteristics:
  * **row** or **tuple** represents a *relationship* among a set of values of the entity. The order of how the tuples appear in the table is **not important**
  * **column** is an **attribute** of the entity within a certain data domain
  * all values in an attribute have the same datatype
  * each attribute has a unique name
  * no 2 tuples can be identical
  
  ![table_props.PNG](attachment:893e6dae-c32b-48fa-b686-97b36ec0ad48.PNG)

* **Relational schema/schema** - the structure of a relation (table), which is defined as `R(A1:D1, A2:D2, …, An:Dn)` where<br>
 * `An` - name of the *nth* attribute
 * `Dn` - data domain (aka the set of values with the same datatype and within the scope of the attribute's domain) <br>eg: a *salary* attribute will contain all possible salary values of type integer or float
 
 **Pictorial Schema**<br>
 ![simple_schema.png](attachment:075c4d69-2e92-465d-aca9-8429d3470b0b.png)
 
 **Text-based Schema**
 
 ```
 Student(sID CHAR(8), sName CHAR(50), gender CHAR(1), age INT, 
         dID CHAR(2), grade CHAR(2))
 Dept(dID CHAR(2), dName CHAR(20), dean CHAR(50))
 Course(cID CHAR(3), cName CHAR(50), hours INT, credit INT, iID CHAR(3))
 Instructor(iID CHAR(3), iName CHAR(50), dID CHAR(2), workload FLOAT)
 RC(sID CHAR(8), cID CHAR(3), score FLOAT)
 ```


* **Atomic** or **First Normal Form (1NF)** - refers to the elements of the domain that are considered to be indivisible units. An indivisible unit means that the data in the column cannot be further broken down into more specific columns.

 **Example: The Telephone Number column violates the 1NF**<br>
 ![first_nf_violation.PNG](attachment:f25239ae-ed41-45c8-8de2-45c01f59eb2d.PNG)
 
 **2 Solutions**<br>
 ![first_nf_soln_1.PNG](attachment:6aa7df3e-f317-4c12-a832-9da02a8651e8.PNG)<br>
 ![first_nf_soln_2.PNG](attachment:17d8b104-2a81-4ae2-8e5c-cb932090c058.PNG)

* **Keys** - used to **uniquely identify** each tuple (row). 2 types: 
 * **Superkey** which is a combination of one or more attributes (columns) that, when taken collectively, uniquely identifies a tuple (row) in a table
 * **Candidate Key** which is a minimal superkey, i.e. a superkey from which no attribute can be removed without losing uniqueness. Typically they are the ID columns of the tables.
   * **Primary Key** is a candidate key that is chosen by the database designer as the principal means of identifying rows within a table. Because the DBMS uses primary keys to manage rows in a table, no 2 rows can have the same value on primary key attributes.
* **Foreign Key** - an attribute (or set of attributes) in one relation that refers to the primary key of another relation. <br>
 ![foreign_key.png](attachment:42e0ea39-400e-4534-8fa5-0444d2bddce9.png)


* **Data Integrity** - refers to the correctness and consistency of data stored in a database. 3 types:
 * **Entity integrity** - primary keys cannot contain `NULL` and no 2 rows can have the same primary key values (aka **primary key constraint**).
 * **Referential integrity** - the value that appears in one table for the foreign key must also appear for the primary key in another table (aka **foreign key constraint**).
 * **User defined integrity** - rules set by the user that restrict values to a valid range within the domain. eg: GPA only has a range between `0` and `4`.
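The definitions of keys and integrity constraints can be tried out on a small in-memory table. This is a rough sketch, not how a DBMS implements them: rows are modelled as Python dicts, `NULL` is modelled as `None`, and the sample `student` and `dept` rows are invented. Note that a key is a property of the schema, so checking one table instance can only show that a set of attributes is *not* a key; `candidate_keys` below reports the minimal superkeys of this particular instance only.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal superkeys of this instance, smallest first."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # A superset of a key already found is not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

def entity_integrity(rows, pk):
    """Primary key values are never None and never repeated."""
    no_nulls = all(row[a] is not None for row in rows for a in pk)
    return no_nulls and is_superkey(rows, pk)

def referential_integrity(child, fk, parent, pk):
    """Every non-None foreign key value appears as a primary key value."""
    parent_keys = {row[pk] for row in parent}
    return all(row[fk] in parent_keys for row in child if row[fk] is not None)

# Hypothetical sample data.
dept = [{"dID": "01", "dName": "Math"}, {"dID": "02", "dName": "Physics"}]
student = [
    {"sID": "s1", "sName": "Ann", "dID": "01"},
    {"sID": "s2", "sName": "Bob", "dID": "01"},
    {"sID": "s3", "sName": "Ann", "dID": "02"},
]

print(candidate_keys(student, ("sID", "sName", "dID")))
print(entity_integrity(student, ("sID",)))
print(referential_integrity(student, "dID", dept, "dID"))
```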

---
# Relational Algebra

**Why learn it?**
1. it provides a formal foundation for relational model operations.
2. it is used as a basis for implementing and optimizing queries in the query processing and optimization modules that are integral parts of relational database management systems (RDBMS)
3. some of its concepts are incorporated into the SQL standard query language for RDBMSs

**What does it consist of?**<br>
It consists of a set of operations that take one or two relations as input and produce a new relation as their result.

**Example: SQL vs Relational Algebra**<br>
SQL: <br>
`SELECT name, salary FROM employee WHERE salary < 50000` <br>

Relational Algebra: <br>
![rel_algebra_01.png](attachment:905a2929-3e55-4cd8-a4f8-0bd32a750ded.png)

## Operations
What kind of operations does relational algebra have? There are 2 types: operations borrowed from **Set Theory** and **operations that work specifically on relations**. The set operations are **binary** operations as they operate on pairs of relations, while operations such as *select*, *project* and *rename* are **unary** as they operate on one relation.

## Set Theory (Recap)
In Set Theory, we have learnt about operations such as **union**, **intersection**, **difference** and **cartesian product**. Note that **duplicates are not allowed** in sets!

### 1 Union
The definition of union (denoted by ∪) is the set of all objects that are a member of *A*, or *B*, or both where *A* and *B* are sets. <br>
![union.png](attachment:26987c1f-451b-4b1f-bec6-afac418b5133.png)

Applying that to tables, we need to take note of the following properties:
* Tables must have the same number of attributes
* Corresponding attributes have the same domains

Together, we call these properties **union-compatible**.

**Example: How many tuples are in $R \cup S$?**<br>
![union_q.png](attachment:edcfc3ff-be42-41b6-89c4-ae1a3d14aa83.png)
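As a small illustration of union on tables, the sketch below models a relation as a Python `set` of tuples plus a schema of `(name, type)` pairs. The representation, the helper names and the sample rows are assumptions made for this example and are not the tables in the figure.

```python
def union_compatible(r_schema, s_schema):
    """Same number of attributes and matching domains, position by position."""
    return len(r_schema) == len(s_schema) and all(
        r_dom == s_dom for (_, r_dom), (_, s_dom) in zip(r_schema, s_schema)
    )

def union(r_schema, r_rows, s_schema, s_rows):
    if not union_compatible(r_schema, s_schema):
        raise ValueError("relations are not union-compatible")
    # Set union removes the tuples that appear in both relations.
    return r_rows | s_rows

schema = (("A1", str), ("A2", int))
R = {("a", 1), ("b", 2), ("c", 3)}
S = {("b", 2), ("d", 4)}

result = union(schema, R, schema, S)
print(len(result))  # ("b", 2) is counted once
```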

### 2 Intersection
The definition of intersection (denoted by $\cap$) is the set containing all elements of *A* that also belong to *B* (or equivalently, all elements of *B* that also belong to *A*) where *A* and *B* are sets.<br>
![intersection.png](attachment:b1a7ff5a-38c0-4d70-bf1a-33407d56cc4d.png)

Applying that to tables, it returns a relation instance containing all tuples that occur in both tables. The tables must be union-compatible.

**Example: What is the intersection of $R$ $\cap$ $S$?**<br>
![union_q.png](attachment:48a3455b-c5a4-4fc3-ae88-fd3305893b94.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![table_intersection.png](attachment:756a5064-e8a9-45f9-beb8-62d6f6baf049.png)

### 3 Difference
The set difference *A* $-$ *B* is the set of elements that are in *A* but not in *B*, where *A* and *B* are sets. <br>
![difference.png](attachment:5d50ffd8-81fb-4131-a7f1-dad2e6ddbf02.png)

Applying that to tables, it returns a relation instance containing all tuples that occur in **R but not in S**. **Note** that R$-$S and S$-$R generally produce different results. The tables must be union-compatible.

**Example: Difference applied to R and S**<br>
![union_q.png](attachment:66ef19b3-3879-4087-adb0-47ad9f1447b9.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![table_diff.png](attachment:5ea27be0-2cbf-4dba-b5db-f1840724fac8.png)
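Intersection and difference map directly onto Python's set operators when a relation is modelled as a set of tuples. The rows below are made up for illustration and are assumed to be union-compatible; the sketch also shows that the order of the operands matters for difference.

```python
# Hypothetical union-compatible relations.
R = {("a", 1), ("b", 2), ("c", 3)}
S = {("b", 2), ("d", 4)}

common = R & S       # intersection: tuples in both R and S
r_minus_s = R - S    # difference: tuples in R but not in S
s_minus_r = S - R    # difference with the operands swapped

print(common, r_minus_s, s_minus_r)
```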

### 4 Cartesian Product or Cross Product
The definition of cartesian product (denoted by $\times$) is the set of all ordered pairs (a, b) where a is in *A* and b is in *B*. The number of tuples returned, i.e. the **cardinality** of the result, is the product of the cardinalities of *A* and *B*.<br>
![Cartesian_Product.png](attachment:a6abed08-621e-4a7c-8702-f7bb20caa3fd.png)

Applying that to tables, it returns a relation instance whose schema contains all the fields of R followed by all the fields of S.

**Example: Cartesian Product of R and S**<br>
![table_cp.png](attachment:286886a1-bce6-4b3d-8324-8a8c833b9505.png) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![table_cp_results.png](attachment:4603561b-d5ba-4942-8090-67eeb698632b.png)
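A quick way to see both the resulting schema and the cardinality of a cartesian product is the sketch below. It assumes relations are sets of tuples with a separate tuple of attribute names; the data is invented and is not the data in the figure.

```python
from itertools import product

def cartesian_product(r_schema, r_rows, s_schema, s_rows):
    """Fields of R followed by fields of S; every R tuple paired with every S tuple."""
    schema = r_schema + s_schema
    rows = {r + s for r, s in product(r_rows, s_rows)}
    return schema, rows

r_schema, R = ("A", "B"), {(1, "x"), (2, "y")}
s_schema, S = ("C",), {(10,), (20,), (30,)}

schema, rows = cartesian_product(r_schema, R, s_schema, S)
print(schema, len(rows))  # cardinality is len(R) * len(S)
```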

----
## Operations on Relations

### 1 Selection
Selects a subset of rows from a relation. It has the expression:<br>
![relation_select.png](attachment:f18d2f5d-fb0b-4f66-95b6-c76c6054ca5a.png)

**Example: Condition of A3 > 0**<br>
![relation_table.png](attachment:0f946fbc-fcbd-4a1b-8274-362aa975c5f4.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![select_01.png](attachment:d120392c-d246-4655-957c-094054970f40.png)

**How do we "build" a condition for selection?**<br>
The selection condition is a Boolean combination of terms that have the forms:
* `attribute` operator `constant`, where `constant` is a literal value from the attribute's domain
* `attribute1` operator `attribute2`

The operators can be looked up from the following table:

| Precedence | Operator |
|:---:|:---:|
| 1 | round brackets `()` |
| 2 | $\geq$, $\leq$, $>$, $<$, $=$, $\neq$ |
| 3 | not $¬$ |
| 4 | and $\wedge$ |
| 5 | or $\vee$ |

**Example: Selection Conditions**<br>
![table.png](attachment:e4fb2348-d1f7-4b3a-bf44-acbb8c20732e.png)

(emp_ID > "10,007" $\vee$ hire_date &lt; "1989-12-31") $\wedge$ gender = "F"
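The selection condition above can be read as a Boolean function applied to each tuple. The following sketch assumes rows are dicts, that `emp_ID` is stored as an integer and `hire_date` as a `date` (both assumptions), and uses made-up employee rows rather than the table in the figure.

```python
from datetime import date

def select(rows, condition):
    """sigma_condition(rows): keep the rows for which the condition is true."""
    return [row for row in rows if condition(row)]

# Hypothetical rows.
employees = [
    {"emp_ID": 10001, "gender": "M", "hire_date": date(1986, 6, 26)},
    {"emp_ID": 10005, "gender": "F", "hire_date": date(1990, 3, 1)},
    {"emp_ID": 10006, "gender": "F", "hire_date": date(1989, 6, 2)},
    {"emp_ID": 10008, "gender": "M", "hire_date": date(1994, 9, 15)},
    {"emp_ID": 10009, "gender": "F", "hire_date": date(1995, 2, 18)},
]

# (emp_ID > 10007 OR hire_date < 1989-12-31) AND gender = "F"
# The brackets are needed because "and" binds tighter than "or".
selected = select(
    employees,
    lambda r: (r["emp_ID"] > 10007 or r["hire_date"] < date(1989, 12, 31))
    and r["gender"] == "F",
)

print([r["emp_ID"] for r in selected])
```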

### 2 Projection
Projection deletes unwanted columns from a relation. It has the expression:<br>
![project_exp.png](attachment:2a2ae303-673a-4544-ab93-580f4b3ebc5b.png)

**Example: Projection of attribute A3**<br>
![relation_table_02.png](attachment:9925f334-a94b-4f46-af39-c68b8069061c.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![proj_01.png](attachment:bc574046-8368-4484-89e7-3f52fab4463f.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![proj_02.png](attachment:db38e2fa-54af-43dd-a0cb-1c8218df0631.png)<br>
**Note** that since a relation is a set, any duplicated rows are eliminated! However, a real DBMS usually skips this duplicate elimination unless it is explicitly requested (eg: `SELECT DISTINCT` in SQL).

Selection and Projection can also be used together.<br>
**Example**<br>
![proj_select_combi.png](attachment:fe53703c-2899-4ddd-acc0-c1d871212c25.png)
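Projection, and its combination with selection, can be sketched in a few lines. The example assumes rows are dicts and uses an invented `employee` table; the combined expression mirrors the earlier SQL query `SELECT name, salary FROM employee WHERE salary < 50000`. The `project` helper removes duplicates, following the set semantics of relational algebra rather than the default behaviour of SQL.

```python
def select(rows, condition):
    """sigma_condition(rows)."""
    return [row for row in rows if condition(row)]

def project(rows, attrs):
    """pi_attrs(rows): keep only attrs and drop duplicate rows."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

# Hypothetical rows.
employee = [
    {"name": "Ann", "dept": "IT", "salary": 42000},
    {"name": "Bob", "dept": "HR", "salary": 58000},
    {"name": "Cy", "dept": "IT", "salary": 42000},
]

depts = project(employee, ("dept",))  # "IT" appears once
low_paid = project(
    select(employee, lambda r: r["salary"] < 50000), ("name", "salary")
)

print(depts)
print(low_paid)
```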

### 3 Rename
Renames the relation. It has the expression:<br>
![relation_rename.png](attachment:4b54e852-0ea1-4b0d-a5ce-317d434b6e06.png)<br>
This is the basis of SQL "self-joins", where data from the same table needs to be compared.

**Example: Applying it to a Relation called student**<br>
![relation_rename_02.png](attachment:c53ac76a-ed78-406d-86f6-f11568e26fe1.png)

As a result, the relation `student` is now referred to as `stu`.
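To see why rename matters for self-joins, the sketch below makes two renamed copies of one table so that their attributes can be told apart. Modelling rename by prefixing every attribute with the new relation name is an assumption made for this example, as are the names `stu1`, `stu2` and the sample rows.

```python
def rename(rows, new_name):
    """rho_new_name(rows): qualify every attribute with the new relation name."""
    return [{f"{new_name}.{attr}": v for attr, v in row.items()} for row in rows]

# Hypothetical rows.
student = [
    {"sID": "s1", "sName": "Ann", "age": 20},
    {"sID": "s2", "sName": "Bob", "age": 22},
    {"sID": "s3", "sName": "Cy", "age": 20},
]

stu1 = rename(student, "stu1")
stu2 = rename(student, "stu2")

# Self-join: pairs of different students who have the same age.
# The sID comparison keeps each pair once and drops a student paired with itself.
same_age = [
    (a["stu1.sName"], b["stu2.sName"])
    for a in stu1
    for b in stu2
    if a["stu1.age"] == b["stu2.age"] and a["stu1.sID"] < b["stu2.sID"]
]

print(same_age)
```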

### 4 Natural Join
* Enforce equality on **all attributes with the same name**
* Eliminate **one copy of duplicate attributes**

It has the expression:<br>
![nat_join_01.png](attachment:a065ac5b-5d43-436e-aea9-dbef40d47237.png)<br>

**Example 1: $R \Join S$**<br>
![nat_join_02.png](attachment:a22efd74-a2e0-4da5-a007-44f60b90ae71.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![nat_join_03.png](attachment:c82d34b1-175f-479f-bc91-df3e6c80a5c2.png)


Written in terms of projection, selection and conditions, the natural join becomes<br>
![nat_join_04.png](attachment:3775df29-abac-42c0-9d13-e4eff1ccd6bc.png)

**Example 2**<br>
Let's say we have the following relations with the following textual schema and table structure (data in the table is not important for now):

College( <u>cName</u>, state, enrollment)<br>
Student( <u>sID</u>, sName, GPA, sizeHS)<br>
Apply( <u>sID</u>, <u>cName</u>, <u>major</u>, decision)

![sample_db.png](attachment:46344445-c117-4b8c-abdf-7e752a76ba65.png)

Find the $names$ and $GPAs$ of the students with $sizeHS>1000$ who applied to $CS$ at a college with $enrollment>20000$ and were rejected.

![nat_join_05.png](attachment:fc590eb4-4f1f-4447-b725-f0cb148dca6f.png)

Therefore we can deduce that the **natural join** has the general form:

![nat_join_06.png](attachment:487e32d8-a3bc-443d-9bc6-f48d7e2941b1.png)
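The two rules of the natural join (equality on all same-named attributes, one copy of each common attribute) can be sketched as follows. Rows are modelled as dicts, and the College, Student and Apply rows are invented for illustration; in particular the college names and the encoding of a rejection as `"N"` are assumptions.

```python
def natural_join(r_rows, s_rows):
    """Pair rows that agree on every attribute name they share."""
    result = []
    for r in r_rows:
        for s in s_rows:
            common = r.keys() & s.keys()
            if all(r[a] == s[a] for a in common):
                # Merging the dicts keeps a single copy of the common attributes.
                result.append({**r, **s})
    return result

# Hypothetical rows following the schemas of Example 2.
college = [
    {"cName": "CollegeA", "state": "CA", "enrollment": 25000},
    {"cName": "CollegeB", "state": "MA", "enrollment": 10000},
]
student = [
    {"sID": 1, "sName": "Amy", "GPA": 3.9, "sizeHS": 1200},
    {"sID": 2, "sName": "Bob", "GPA": 3.4, "sizeHS": 500},
]
apply = [
    {"sID": 1, "cName": "CollegeA", "major": "CS", "decision": "N"},
    {"sID": 2, "cName": "CollegeB", "major": "CS", "decision": "N"},
    {"sID": 1, "cName": "CollegeB", "major": "EE", "decision": "Y"},
]

joined = natural_join(natural_join(student, apply), college)
answer = [
    (r["sName"], r["GPA"])
    for r in joined
    if r["sizeHS"] > 1000
    and r["major"] == "CS"
    and r["enrollment"] > 20000
    and r["decision"] == "N"
]

print(answer)
```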

### 5 Join or Theta Join
Combines information from 2 or more relations based on some conditions. It has the expression:<br>
![relation_join.png](attachment:709e43f3-18a7-41d1-ad45-ea4d3d7c25d9.png)<br>
![join_08.png](attachment:1e608531-4422-43a8-ad23-b6cfeec66a35.png)<br>
where the condition denotes a selection.

A join consists of the **cartesian product and selection** between 2 relations, therefore the general form is:

![join_09.png](attachment:655f4811-7e90-4c7a-b2b8-ea07cb8708c2.png)

**Note** that this is the basic *join* operation **implemented in most** DBMSs, and when we use the term "join" we normally mean theta join.

**Example: $R \Join_{B\leq H} S$**<br>
![join_01.png](attachment:646fef2e-041d-43ef-8c4a-76a3656eb906.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![join_02.png](attachment:f4c88d05-edfa-402a-a945-623d6cf13ff3.png)
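Since a theta join is a cartesian product followed by a selection, it can be sketched as a nested loop with a filter. The helper below assumes rows are dicts and that the two relations have no attribute names in common (otherwise merging the dicts would overwrite values); the rows are made up and are not the tables in the figure.

```python
def theta_join(r_rows, s_rows, condition):
    """Cartesian product of R and S, filtered by condition(r, s)."""
    return [{**r, **s} for r in r_rows for s in s_rows if condition(r, s)]

# Hypothetical rows with disjoint attribute names.
R = [{"A": "a1", "B": 3}, {"A": "a2", "B": 7}]
S = [{"H": 5, "K": "k1"}, {"H": 2, "K": "k2"}, {"H": 7, "K": "k3"}]

# R join_{B <= H} S
joined = theta_join(R, S, lambda r, s: r["B"] <= s["H"])

print([(row["A"], row["K"]) for row in joined])
```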

**Let's do another example**<br>
Given the following table, construct the relational algebra expression to **return the course IDs that both students (4001 & 4005) have registered for**.<br>
![join_03.png](attachment:fb98f03c-9b13-43a8-a0dc-a7e00c3ebc4c.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![join_04.png](attachment:25c8f3ad-024d-407e-b615-67dab34288b2.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![join_05.png](attachment:50b1b6be-a5e7-4992-b073-3223662f6226.png)<br>
![join_06.png](attachment:ea303cd4-edbf-46f8-a25b-e13bb9320674.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![join_07.png](attachment:213e76ea-df95-439a-8f32-c152562a9baf.png)

**Differences between Natural Join and Conditional Join**<br>

| Natural Join | Conditional Join |
|:---|:---|
| Natural Join joins two relations on the attributes that share the same name and datatype. | Conditional Join joins two relations on the basis of the attributes explicitly specified in the condition. |
| The resulting relation will contain all the attributes of both the tables but keep only one copy of each common column. | The resulting relation will contain all the attributes of both the relations including duplicate columns. |
| No condition is specified; it returns the rows that have equal values on the common attributes. | It returns only those combinations of rows from the two relations that satisfy the condition. |

**Tips to Composing Larger Expressions**
* compose the expression recursively (aka from inside out)
* take note of parentheses and precedence rules, they govern the order of evaluation
* precedence rules from highest to lowest are as follows
 1. ( $\sigma$ ) selection, ( $\pi$ ) projection, ( $\rho$ ) rename
 2. ( $\times$ ) Cartesian Product, ( $\Join$ ) Join
 3. ( $\cap$ ) intersection
 4. ( $\cup$ ) union, ( $-$ ) difference
* when in doubt, use brackets to make the order of evaluation explicit

### 6 Outer Join
Joins return tuples formed by combining matching tuples of two or more relations, but sometimes we may also want the **unmatched tuples** in the result. That's where the **outer join** is used. The result of an outer join contains both matched and unmatched tuples of 2 or more relations, **"filling"** the missing cells of the unmatched tuples with `NULL`.

There are 3 types of outer join operators: **left outer join**, **right outer join** and **full outer join**.

#### 6a Left Outer Join
Left outer join results in the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in R that have no matching tuples in S. It has the expression:<br>
![loj_01.png](attachment:ff785da6-bb5e-4074-9b41-fcdc29dd9a28.png)

**Example: R left outer join S**<br>
![oj_table.png](attachment:9097f044-98d1-4191-b48d-a1977ac03247.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![loj_02.png](attachment:049d5495-f98f-470c-9809-4aebc2586e55.png)

#### 6b Right Outer Join
The right outer join behaves almost identically to the left outer join, but the roles of the tables are switched. It has the expression:<br>
![roj_01.png](attachment:8e838f31-a59b-41d8-b07f-430d13d0f3be.png)

**Example: R right outer join S**<br>
![oj_table.png](attachment:09cfb498-e10e-4e42-8af1-98026f8cb637.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![roj_02.png](attachment:7948b18d-d642-49a8-9dff-b6d17b1ce103.png)

#### 6c Full Outer Join
The outer join or full outer join in effect combines the results of the left and right outer joins. It has the expression:<br>
![foj_01.png](attachment:48d08d43-9c39-4cc1-84c4-e503c4bfff95.png)

**Example: R full outer join S**<br>
![oj_table.png](attachment:5fa8c1e9-e098-4f23-b872-5067101b07b0.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![foj_02.png](attachment:ce708cce-78a5-40b1-835e-e5645cdea453.png)
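The three outer joins can be sketched on top of one helper. The code below assumes rows are dicts, takes the attribute names of each relation explicitly, and models `NULL` as `None`; the rows are made up and are not the tables in the figures. The right outer join is written as a left outer join with the roles swapped, and the full outer join as the left outer join plus the unmatched rows of the right one.

```python
def left_outer_join(r_rows, r_attrs, s_rows, s_attrs):
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in r_attrs]
    result = []
    for r in r_rows:
        matches = [s for s in s_rows if all(r[a] == s[a] for a in common)]
        if matches:
            result.extend({**r, **s} for s in matches)
        else:
            # Unmatched tuple of R: fill the S-only attributes with None (NULL).
            result.append({**r, **{a: None for a in s_only}})
    return result

def right_outer_join(r_rows, r_attrs, s_rows, s_attrs):
    # Same operation with the roles of R and S switched.
    return left_outer_join(s_rows, s_attrs, r_rows, r_attrs)

def full_outer_join(r_rows, r_attrs, s_rows, s_attrs):
    result = left_outer_join(r_rows, r_attrs, s_rows, s_attrs)
    for row in right_outer_join(r_rows, r_attrs, s_rows, s_attrs):
        if row not in result:  # matched rows are already present
            result.append(row)
    return result

# Hypothetical rows; B is the common attribute.
r_attrs, R = ("A", "B"), [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
s_attrs, S = ("B", "C"), [{"B": "x", "C": 10}, {"B": "z", "C": 30}]

left = left_outer_join(R, r_attrs, S, s_attrs)
right = right_outer_join(R, r_attrs, S, s_attrs)
full = full_outer_join(R, r_attrs, S, s_attrs)

print(left)
print(right)
print(full)
```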

### 7 Division
Division is not implemented **directly** in SQL. For relations R and S, it can only be applied if the attributes of S are a proper subset of the attributes of R. The relation returned by the division operator:
* has attributes = (all attributes of R – all attributes of S)
* contains those tuples from R, restricted to these attributes, that are associated with every tuple of S

The expression is:<br>
![div_01.png](attachment:9933c54f-6787-45fc-a063-dbf0dd906f25.png)

**Example 1: $R \div S$**<br>
![div_02.png](attachment:283e9a7a-be9a-4f4f-a97b-deb2df4eb273.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![div_03.png](attachment:b3e2d015-a685-4a80-833e-690d91ce2a8d.png)

**Example 2: $R \div S'$**<br>
![div_04.png](attachment:ea090a12-e996-4f46-814c-4c4b6413e37a.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![div_05.png](attachment:ded74814-4365-4266-acc5-010c9c83331a.png)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
![div_06.png](attachment:59c994b5-bae4-457b-90ee-16cd45d46246.png)

The division operation can be broken down into several base relational algebra operations as well. Using example 1 from above:
1. $\pi_{A1} (R) \times S$, result placed in $T$. This gives us every possible combination of the `A1` values of R with the tuples of S, including those that don't exist in R.
2. $T - \pi_{A1,A2} (R)$, result placed in $U$. This results in a "what's missing" relation.
3. $\pi_{A1} (U)$, result placed in $V$. These are the `A1` values that are missing at least one combination.
4. $\pi_{A1} (R) - V$, desired result.

If we were to write it out as 1 line, it would be $\pi_{A1} (R) - \pi_{A1} ( (\pi_{A1} (R) \times S ) - \pi_{A1,A2} (R))$
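The decomposition of division can be traced step by step with Python sets. This is a sketch under stated assumptions: R has exactly the attributes (`A1`, `A2`), S has the single attribute `A2`, relations are sets of tuples, and the rows are invented rather than taken from the figures. The first step takes the product of $\pi_{A1}(R)$ with S so that $T$ has the same schema as $\pi_{A1,A2}(R)$ and the difference in the second step is between union-compatible relations.

```python
from itertools import product

# Hypothetical rows: R(A1, A2) and S(A2).
R = {("s1", "c1"), ("s1", "c2"), ("s2", "c1"), ("s3", "c1"), ("s3", "c2")}
S = {("c1",), ("c2",)}

a1_values = {(a1,) for a1, _ in R}             # pi_A1(R)
T = {r + s for r, s in product(a1_values, S)}  # step 1: every (A1, A2) combination
U = T - R                                      # step 2: combinations missing from R
V = {(a1,) for a1, _ in U}                     # step 3: A1 values with something missing
result = a1_values - V                         # step 4: A1 values paired with every S tuple

print(sorted(result))
```

Here `s2` is dropped because it is paired with `c1` only, while `s1` and `s3` are paired with every tuple of S.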

---
## Summary
* had a recap on the relational model, which uses set theory and first order predicate logic
* learnt that relational algebra uses a set of operations that take one or two relations as input and produce a new relation as their result
 * had set theory recap in relation to math and how it is applied to tables
 * learnt about operations that can be done on relations:
   * **selection** that has the expression $\sigma_{condition}(Relation)$
   * **projection** that has the expression $\pi_{attributes}(Relation)$
   * **rename** that has the expression $\rho_{Relation\_new}(Relation\_old)$
   * **join** that has the expression $Relation1\Join_{condition} Relation2$
   * **natural join** that has the expression $Relation1 \Join Relation2$
     * difference between join and natural join
   * **outer join** consisting of **left, right and full outer joins**
   * **division** that has the expression $Relation1 \div Relation2$
     * how it can be broken down into several base relational algebra operations