# Activity 45: Relational Algebra & Logical Equivalence

### Note: You may need to install `markdown`:

```
pip install markdown
```

Please ignore any deprecation message that the setup cell below may print.

In [None]:
%load_ext sql
%sql sqlite://

%load_ext autoreload
%autoreload 2

# To help render markdown
from IPython.display import display, HTML
from markdown import markdown
def render_markdown_raw(m):
    # Convert a markdown string to HTML and display it in the notebook output
    return display(HTML(markdown(m)))
def render_markdown(m):
    # Render a toolkit object through its toMD() markdown representation
    return render_markdown_raw(m.toMD())

# import the relational algebra operators
from relation_algebra import Select, Project, Union, NJoin, CrossProduct, BaseRelation
from relation_algebra import get_result, compare_results

from display_tools import side_by_side

import random

In [None]:
%%sql
drop table if exists R; create table R(A int, B int);
drop table if exists S; create table S(B int, C int);
drop table if exists T; create table T(C int, D int);
drop table if exists U; create table U(D int, E int);

In [None]:
for x in range(0,10,2):
    for y in range(0,10,3):
        %sql INSERT INTO R VALUES (:x, :y);
for x in range(0,20,4):
    for y in range(0,10,2):
        %sql INSERT INTO S VALUES (:x, :y);
for x in range(0,5,1):
    for y in range(0,10,2):
        %sql INSERT INTO T VALUES (:x, :y);

# Tutorial: Relational Algebra Python Toolkit

We'll use a Python toolkit made at Stanford to play around with RA. We'll start with a quick tutorial, but the syntax should be fairly intuitive (feel free to look at the source code too!).

#### BaseRelation class

Recall that in our RA operations we'll deal with sets; to get started, we need to take SQL output and turn it into a `BaseRelation` object, which we can optionally name:
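To make "relations are sets" concrete, here is a minimal plain-Python sketch, assuming we model a relation as a schema plus a Python `set` of tuples. The helper `make_relation` is hypothetical and is not the toolkit's `BaseRelation`; it only shows that duplicate rows in SQL output (bag semantics) collapse once they are put in a set.

```python
# Hypothetical sketch: a relation as (schema, set of tuples).
# This is NOT the toolkit's BaseRelation; it only illustrates set semantics.
def make_relation(schema, rows):
    """Return (schema, set_of_tuples); duplicate rows collapse automatically."""
    return (tuple(schema), {tuple(row) for row in rows})

# SQL output may contain duplicate rows...
sql_rows = [(0, 0), (0, 3), (0, 3), (2, 6)]
R = make_relation(("A", "B"), sql_rows)

# ...but the set keeps only one copy of (0, 3)
print(R[0], sorted(R[1]))
```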

In [None]:
r = %sql SELECT * FROM R;
R = BaseRelation(r, name="R")

s = %sql SELECT * FROM S;
S = BaseRelation(s, name="S")

t = %sql SELECT * FROM T;
T = BaseRelation(t, name="T")

For **all operators in the toolkit**, we can use `get_result` to see the set we have:

In [None]:
print(get_result(R))

And (again, **for all operators in our toolkit**) we can use `render_markdown(...)` to pretty-print an expression as formatted markdown:

**_NOTE: This function requires that you have installed the `markdown` python library.  It's just for this function / pretty printing, so if you weren't able to install this library, don't worry!_**

In [None]:
render_markdown(R)
render_markdown(S)
render_markdown(T)

#### Selection, Projection, NJoin (Natural Join) classes

In [None]:
s = Select("A", 2, R) # selection on A=2 for relation R
render_markdown(s)
print(get_result(s))
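As a rough mental model of what a selection computes, the sketch below filters a set of tuples on one attribute. It assumes the (schema, set of tuples) representation from plain Python; the function `select` is a hypothetical stand-in, not the toolkit's `Select` class. The instance mirrors the values the insert loop above puts into R (A in 0, 2, 4, 6, 8 and B in 0, 3, 6, 9).

```python
# Hypothetical plain-Python sketch of selection: keep rows where attr == value.
def select(attr, value, relation):
    schema, rows = relation
    i = schema.index(attr)  # position of the attribute inside each tuple
    return (schema, {row for row in rows if row[i] == value})

# Same values the loop above inserts into R(A, B)
R = (("A", "B"), {(x, y) for x in range(0, 10, 2) for y in range(0, 10, 3)})

s = select("A", 2, R)
print(sorted(s[1]))  # [(2, 0), (2, 3), (2, 6), (2, 9)]
```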

In [None]:
p = Project(["A"], R)
render_markdown(p)
print(get_result(p))
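Projection keeps only some attributes, and because the result is a set, duplicate projected tuples disappear. The hypothetical `project` helper below (not the toolkit's `Project`) sketches this on the same R instance: R has 20 rows but only 5 distinct values of A. This is why an RA projection corresponds to `SELECT DISTINCT` in SQL; a plain `SELECT R.A FROM R` would return all 20 rows.

```python
# Hypothetical plain-Python sketch of projection onto a list of attributes.
def project(attrs, relation):
    schema, rows = relation
    idx = [schema.index(a) for a in attrs]
    # Building a set drops duplicate projected tuples (set semantics)
    return (tuple(attrs), {tuple(row[i] for i in idx) for row in rows})

# Same values the loop above inserts into R(A, B): 20 rows
R = (("A", "B"), {(x, y) for x in range(0, 10, 2) for y in range(0, 10, 3)})

p = project(["A"], R)
print(len(R[1]), sorted(p[1]))  # 20 [(0,), (2,), (4,), (6,), (8,)]
```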

In [None]:
j = NJoin(R, S) # natural join of R(A, B) and S(B, C) on their shared attribute B
render_markdown(j)
print(get_result(j))
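A natural join pairs up rows that agree on every attribute the two relations share (here only B) and keeps each shared attribute once. The sketch below is a hypothetical nested-loop version over (schema, set of tuples) pairs, not the toolkit's `NJoin`. With the loop-generated instances, R's B values are 0, 3, 6, 9 and S's B values are 0, 4, 8, 12, 16, so only B = 0 matches: 5 rows of R times 5 rows of S gives 25 joined rows.

```python
# Hypothetical nested-loop sketch of a natural join on (schema, rows) pairs.
def natural_join(left, right):
    """Join two relations on all attributes they share."""
    lschema, lrows = left
    rschema, rrows = right
    shared = [a for a in lschema if a in rschema]
    extra = [a for a in rschema if a not in lschema]  # attributes only in right
    result = set()
    for l in lrows:
        for r in rrows:
            # Keep the pair only if it agrees on every shared attribute;
            # with no shared attributes this degenerates to a cross product
            if all(l[lschema.index(a)] == r[rschema.index(a)] for a in shared):
                result.add(l + tuple(r[rschema.index(a)] for a in extra))
    return (tuple(lschema) + tuple(extra), result)

# Same values the insert loops above put into R(A, B) and S(B, C)
R = (("A", "B"), {(x, y) for x in range(0, 10, 2) for y in range(0, 10, 3)})
S = (("B", "C"), {(x, y) for x in range(0, 20, 4) for y in range(0, 10, 2)})

j = natural_join(R, S)
print(j[0], len(j[1]))  # ('A', 'B', 'C') 25
```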

#### Compositionality

Most importantly, these operators are all compositional, so you can pass them in as inputs to each other (as we already did with passing `BaseRelation` into the operators above)!

In [None]:
ps = Project(["A"], Select("A", 2, R))
render_markdown(ps)
print(get_result(ps))

In [None]:
pj = Project(["A"], NJoin(R, S)) 
render_markdown(pj)
print(get_result(pj))
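Because the operators compose, different nestings can describe the same set, which is the idea behind logical equivalence. The sketch below uses hypothetical plain-Python helpers (not the toolkit classes) to check two standard rewrites on the loop-generated R and S: a selection on A commutes with a projection that keeps A, and a selection that only mentions R's attribute A can be pushed below the join.

```python
# Hypothetical plain-Python helpers over (schema, set of tuples) pairs.
def select(attr, value, rel):
    schema, rows = rel
    i = schema.index(attr)
    return (schema, {r for r in rows if r[i] == value})

def project(attrs, rel):
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]
    return (tuple(attrs), {tuple(r[i] for i in idx) for r in rows})

def natural_join(left, right):
    (ls, lrows), (rs, rrows) = left, right
    shared = [a for a in ls if a in rs]
    extra = [a for a in rs if a not in ls]
    rows = {l + tuple(r[rs.index(a)] for a in extra)
            for l in lrows for r in rrows
            if all(l[ls.index(a)] == r[rs.index(a)] for a in shared)}
    return (ls + tuple(extra), rows)

R = (("A", "B"), {(x, y) for x in range(0, 10, 2) for y in range(0, 10, 3)})
S = (("B", "C"), {(x, y) for x in range(0, 20, 4) for y in range(0, 10, 2)})

# Nesting mirrors Project(["A"], Select("A", 2, R)) in the toolkit
ps = project(["A"], select("A", 2, R))

# Rewrite 1: selection on A commutes with a projection that keeps A
assert ps == select("A", 2, project(["A"], R))
# Rewrite 2: a selection on R's attribute can be pushed below the join
assert select("A", 2, natural_join(R, S)) == natural_join(select("A", 2, R), S)
print(ps)  # (('A',), {(2,)})
```

Pushing selections down like this is one reason two RA expressions can look different yet return the same set.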

# The Exercise Really Starts Here

### Exercise 1: SQL -> RA

Let's go through some examples where we'll translate SQL to Relational Algebra. Note that you can use the tools to debug and test your answers!
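A general pattern for these translations: the tables in `FROM` become the inputs (joined together), the `WHERE` conditions become selections, and the `SELECT DISTINCT` list becomes a projection applied last. The sketch below checks this pattern on a query that is not one of the exercises, using Python's built-in `sqlite3` module on an in-memory copy of the S instance and plain Python set operations in place of the toolkit.

```python
import sqlite3

# In-memory copy of the S(B, C) instance built by the loops above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S(B int, C int)")
conn.executemany("INSERT INTO S VALUES (?, ?)",
                 [(x, y) for x in range(0, 20, 4) for y in range(0, 10, 2)])

# SQL: SELECT DISTINCT list, FROM list, WHERE condition
sql_result = set(conn.execute("SELECT DISTINCT S.C FROM S WHERE S.B = 4"))

# RA: project_C(select_{B=4}(S)); the WHERE becomes a selection and the
# SELECT DISTINCT list becomes a projection applied last
S_rows = set(conn.execute("SELECT B, C FROM S"))
ra_result = {(c,) for (b, c) in S_rows if b == 4}

print(sql_result == ra_result, sorted(ra_result))
```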

**NOTE: The instances we use are generated by the loops above. Feel free to modify and re-run them to get more useful test examples!**

In [None]:
r = %sql SELECT * FROM R;
s = %sql SELECT * FROM S;
side_by_side(r,s)

**For each of the below queries, translate them from SQL into RA using the python RA toolkit!**

### Exercise 1a

In [None]:
%%sql
SELECT DISTINCT R.B
FROM R
WHERE R.A = 2;

### Exercise 1b

In [None]:
%%sql
SELECT DISTINCT R.A, S.C
FROM R, S
WHERE R.B = S.B;

### Exercise 1c

In [None]:
%%sql
SELECT DISTINCT R.A, T.D
FROM R, S, T
WHERE R.B = S.B AND S.C = T.C AND R.A = 2;

### Exercise 2: RA -> SQL

Let's go through some examples where we'll translate Relational Algebra to SQL. Note that you can use the tools to debug and test your answers!
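Going the other way, a natural join typically becomes a list of tables in `FROM` plus an equality on each shared attribute in `WHERE`, and a projection becomes the `SELECT` list, which needs `DISTINCT` to match RA's set semantics. The sketch below illustrates this with Python's built-in `sqlite3` module for an expression that is not one of the exercises, project_E(T join U), using the values the later generator cell inserts into T(C, D) and U(D, E).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T(C int, D int)")
conn.execute("CREATE TABLE U(D int, E int)")
# Same values the generator cell inserts into T and U (25 rows each)
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [(x, y) for x in range(0, 5) for y in range(0, 10, 2)])
conn.executemany("INSERT INTO U VALUES (?, ?)",
                 [(x, y) for x in range(0, 10, 2) for y in range(0, 5)])

# RA: project_E(T join U), evaluated directly on sets of tuples
T_rows = set(conn.execute("SELECT C, D FROM T"))
U_rows = set(conn.execute("SELECT D, E FROM U"))
ra_result = {(e,) for (c, d1) in T_rows for (d2, e) in U_rows if d1 == d2}

# SQL: join -> both tables in FROM plus equality on the shared attribute D;
# projection -> SELECT list, with DISTINCT to get a set
with_distinct = conn.execute(
    "SELECT DISTINCT U.E FROM T, U WHERE T.D = U.D").fetchall()
without_distinct = conn.execute(
    "SELECT U.E FROM T, U WHERE T.D = U.D").fetchall()

print(set(with_distinct) == ra_result)           # True
print(len(with_distinct), len(without_distinct))  # 5 125
```

Without `DISTINCT`, SQL keeps one output row per matching (T, U) pair, so the two results differ in size even though they contain the same values.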

**NOTE: The instances we use are generated by the loops above. Feel free to modify and re-run them to get more useful test examples!**

### Exercise 2a

In [None]:
x = Select("B", 0, Project(["B"], BaseRelation(s, name="S")))
render_markdown(x)
print(get_result(x))

In [None]:
%%sql
--WRITE YOUR QUERY HERE!

### Exercise 2b

In [None]:
x = Project(["A","C"],
        NJoin(
            NJoin(Select("B", 0, BaseRelation(r, name="R")), BaseRelation(s, name="S")),
            Select("C", 0, BaseRelation(t, name="T"))))
render_markdown(x)
print(get_result(x))

### Exercise 2c

Turn the RA expression below into SQL! To help you verify your solution, we also give the equivalent expression in the Python toolkit. First, you will want to change the data instance so that you can check your solution against a meaningful (for example, non-empty) result set.

**Exercise 2c-1**: Create a data instance that lets you compare the result of your SQL query against the Relational Algebra version by modifying the following data generator.
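Before changing the generator, it may help to check whether the selections in this exercise's RA expression (T.D = 3 and U.E = 1) can match any generated rows at all. The sketch below rebuilds, in plain Python, the rows that the unmodified generator inserts into T(C, D) and U(D, E) and counts the matches; it only diagnoses the current instance and does not query the database.

```python
# Rows the unmodified generator inserts into T(C, D) and U(D, E)
T_rows = {(x, y) for x in range(0, 5, 1) for y in range(0, 10, 2)}
U_rows = {(x, y) for x in range(0, 10, 2) for y in range(0, 5, 1)}

# Selections used in the exercise's RA expression
t_hits = {row for row in T_rows if row[1] == 3}   # select_{D=3}(T)
u_hits = {row for row in U_rows if row[1] == 1}   # select_{E=1}(U)

# D values that survive select_{E=1}(U); the expression later joins on D
u_d_values = {d for (d, e) in u_hits}

print(len(t_hits), len(u_hits), 3 in u_d_values)  # 0 5 False
```

An empty intermediate result like this makes the final answer empty no matter how the query is written, so it cannot distinguish a correct SQL translation from an incorrect one.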

In [None]:
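# Clear existing rows first so re-running this cell does not insert duplicates
%sql DELETE FROM R;
%sql DELETE FROM S;
%sql DELETE FROM T;
%sql DELETE FROM U;
for x in range(0,10,2):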
    for y in range(0,10,3):
        %sql INSERT INTO R VALUES (:x, :y);
for x in range(0,20,4):
    for y in range(0,10,2):
        %sql INSERT INTO S VALUES (:x, :y);
for x in range(0,5,1):
    for y in range(0,10,2):
        %sql INSERT INTO T VALUES (:x, :y);
for x in range(0,10,2):
    for y in range(0,5,1):
        %sql INSERT INTO U VALUES (:x, :y);

In [None]:
r = %sql SELECT * FROM R;
R = BaseRelation(r, name="R")
s = %sql SELECT * FROM S;
S = BaseRelation(s, name="S")
t = %sql SELECT * FROM T;
T = BaseRelation(t, name="T")
u = %sql SELECT * FROM U;
U = BaseRelation(u, name="U")

<img src="files/Act-16-1.png">

In [None]:
# project_A( R join ( project_{B,D}(S join select_{D=3}(T)) join project_D(select_{E=1}(U)) ) )
x = Project(["A"],
    NJoin(
        BaseRelation(r, name="R"),
        NJoin(
            Project(["B", "D"], NJoin(
                BaseRelation(s, name="S"),
                Select("D", 3, BaseRelation(t, name="T")))),
            Project(["D"], Select("E", 1, BaseRelation(u, name="U"))))))
render_markdown(x)
print(get_result(x))

**Exercise 2c-2:** Write the SQL query.

After learning about query plans and what a database actually does, isn't declarative SQL so much easier?