# SQL set operations

In this Notebook, you will explore how the relational algebra set operations - *union*, *intersection* and *difference*, which operate on two relations to give a single resultant relation - can be applied to two tables using SQL.


This notebook contains several exercises, each followed by a space for you to try your own solution. You can see our solution by clicking on the small triangle next to the text "**our solution**", but you should attempt each question yourself before looking at it.



In [None]:
from loader import *
%load_ext sql
%sql postgresql://test:test@localhost:5432/tm351test

This Notebook uses only the `movie_actor` table from the *Movies* dataset.

`movie_actor (movie_id, actor_name, ranking)`

Each movie features one or more actors. Each row records one actor's appearance in one movie, and is identified by the composite primary key (PK) columns `movie_id` and `actor_name`.


column | description
------ | -----------
movie_id  (PK) | movie identifier
actor_name  (PK) | actor's name
ranking | position of actor on the movie's cast list

The following cell recreates and populates the desired table from the Movies dataset, to ensure that the rest of this notebook uses the correct data.

In [None]:
q='''
DROP TABLE IF EXISTS movie_actor;

CREATE TABLE movie_actor (
 movie_id INTEGER NOT NULL,
 actor_name VARCHAR(50) NOT NULL,
 ranking INTEGER NOT NULL,
 PRIMARY KEY (movie_id, actor_name)
);'''

postgres_table_create_and_load(q,'data/movie_actor.dat')

## Using SQL set operators


Using the SQL set operators, answer the following questions about the films *Shrek* (<tt>movie_id</tt> = 4306) and its sequel *Shrek 2* (<tt>movie_id</tt> = 8360).

Write your queries in the form: <br><br>

<tt>SELECT ...</tt><br>
<tt>FROM ...</tt><br>
<tt>WHERE ...</tt><br><br>
<tt><i>*SET_OPERATOR*</i></tt><br><br>
<tt>SELECT ...</tt><br>
<tt>FROM ...</tt><br>
<tt>WHERE ...</tt><br>
<tt>ORDER BY actor_name</tt>

where <tt><i>*SET_OPERATOR*</i></tt> is one of `UNION`, `INTERSECT` or `EXCEPT` (the SQL keywords for union, intersection and difference respectively).




### Exercise 1: Which actors appeared in either the original movie or the sequel?

In [None]:
%%sql

-- Enter your solution here


#### Our solution

To reveal our solution, click on the triangle symbol on the left-hand end of this cell.

This question calls for an (inclusive) *OR* operation: we want the actors who appeared in *either* movie or in *both* of them. Select the actors from each movie and then find the *UNION* of the results:


In [None]:
%%sql

SELECT actor_name 
FROM movie_actor 
WHERE movie_id = 4306

UNION 

SELECT actor_name
FROM movie_actor
WHERE movie_id = 8360

ORDER BY actor_name;
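
Because `UNION` is a set operation, an actor who appears in both movies is listed only once in the result, whereas `UNION ALL` keeps the duplicates. The following is a minimal sketch of that difference, not part of the module's solution: it assumes an in-memory SQLite database (Python's `sqlite3` module) instead of the notebook's PostgreSQL server, and uses hypothetical movie ids (1 and 2) and placeholder actor names rather than the real *Movies* data.

```python
import sqlite3

# Assumption: in-memory SQLite stand-in for the PostgreSQL movie_actor table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movie_actor (
        movie_id INTEGER NOT NULL,
        actor_name VARCHAR(50) NOT NULL,
        ranking INTEGER NOT NULL,
        PRIMARY KEY (movie_id, actor_name)
    )
""")
# Hypothetical rows: movie 1 and movie 2 share the actors Bob and Cy.
conn.executemany(
    "INSERT INTO movie_actor VALUES (?, ?, ?)",
    [(1, "Ann", 1), (1, "Bob", 2), (1, "Cy", 3),
     (2, "Bob", 1), (2, "Cy", 2), (2, "Dee", 3)],
)

def actors(set_operator):
    """Combine the two cast lists with the given operator and return the names."""
    query = f"""
        SELECT actor_name FROM movie_actor WHERE movie_id = 1
        {set_operator}
        SELECT actor_name FROM movie_actor WHERE movie_id = 2
        ORDER BY actor_name
    """
    return [name for (name,) in conn.execute(query)]

union_names = actors("UNION")          # duplicates removed
union_all_names = actors("UNION ALL")  # duplicates kept

print(union_names)
print(union_all_names)
```

With these placeholder rows, the shared actors appear once under `UNION` and twice under `UNION ALL`.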

### Exercise 2: Which actors appeared in both the original movie and in the sequel?

In [None]:
%%sql

-- Enter your solution here


#### Our solution

To reveal our solution, click on the triangle symbol on the left-hand end of this cell.

This takes the form of an *AND* operation. Select the actors from each movie and then find the *INTERSECT*ion of the results:
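
Following the same pattern as the Exercise 1 solution, the two `SELECT` statements are combined with `INTERSECT`. The sketch below is an illustration under stated assumptions: it runs against an in-memory SQLite database (Python's `sqlite3` module) rather than the notebook's PostgreSQL server, and it uses hypothetical movie ids (1 and 2) and placeholder actor names rather than the real *Movies* data. In the notebook itself, the same query with the `movie_id` values 4306 and 8360 would go in a `%%sql` cell.

```python
import sqlite3

# Assumption: in-memory SQLite stand-in for the PostgreSQL movie_actor table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movie_actor (
        movie_id INTEGER NOT NULL,
        actor_name VARCHAR(50) NOT NULL,
        ranking INTEGER NOT NULL,
        PRIMARY KEY (movie_id, actor_name)
    )
""")
# Hypothetical rows: movie 1 and movie 2 share the actors Bob and Cy.
conn.executemany(
    "INSERT INTO movie_actor VALUES (?, ?, ?)",
    [(1, "Ann", 1), (1, "Bob", 2), (1, "Cy", 3),
     (2, "Bob", 1), (2, "Cy", 2), (2, "Dee", 3)],
)

# Actors who appear in movie 1 AND in movie 2.
intersect_query = """
    SELECT actor_name
    FROM movie_actor
    WHERE movie_id = 1

    INTERSECT

    SELECT actor_name
    FROM movie_actor
    WHERE movie_id = 2

    ORDER BY actor_name
"""
in_both = [name for (name,) in conn.execute(intersect_query)]
print(in_both)
```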

### Exercise 3: Which actors appeared in the original movie but not in the sequel?

In [None]:
%%sql

-- Enter your solution here


#### Our solution

This is similar to a "minus" operation where we want to retain items from one set that do not appear in the other. Select the actors from the original movie *EXCEPT* for those in the sequel:

In [None]:
%%sql

SELECT actor_name 
FROM movie_actor 
WHERE movie_id = 4306

EXCEPT 

SELECT actor_name 
FROM movie_actor 
WHERE movie_id = 8360

ORDER BY actor_name;

### Exercise 4: Which actors appeared in the sequel but not in the original movie?

In [None]:
%%sql

-- Enter your solution here


#### Our solution

This is another "minus" style operation, but swaps the set order compared to the previous query. Select the actors from the sequel *EXCEPT* for those in the original movie:

In [None]:
%%sql 

SELECT actor_name 
FROM movie_actor 
WHERE movie_id = 8360

EXCEPT 

SELECT actor_name 
FROM movie_actor 
WHERE movie_id = 4306

ORDER BY actor_name;


### Exercise 5: Which actors appeared in either the original movie or the sequel, but not both?

In [None]:
%%sql

-- Enter your solution here


#### Our solution

This is essentially an *exclusive OR* operation, *(A AND NOT B) OR (B AND NOT A)*. For each movie, select the actors who do not appear in the other one, and then find the *UNION* of the two results:

In [None]:
%%sql

(SELECT actor_name 
 FROM movie_actor 
 WHERE movie_id = 4306

 EXCEPT 

 SELECT actor_name 
 FROM movie_actor 
 WHERE movie_id = 8360)

UNION 

(SELECT actor_name
 FROM movie_actor
 WHERE movie_id = 8360

 EXCEPT 
 
 SELECT actor_name 
 FROM movie_actor 
 WHERE movie_id = 4306)

ORDER BY actor_name;
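
The same *exclusive OR* can also be written as *(A OR B) AND NOT (A AND B)*, that is, the union of the two casts minus their intersection. The sketch below checks that both formulations give the same answer. It is an illustration only, with these assumptions: an in-memory SQLite database (Python's `sqlite3` module) stands in for PostgreSQL, the movie ids (1 and 2) and actor names are hypothetical placeholders, and each operand is wrapped in a derived table rather than relying on the bare parentheses used in the PostgreSQL cell above.

```python
import sqlite3

# Assumption: in-memory SQLite stand-in for the PostgreSQL movie_actor table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movie_actor (
        movie_id INTEGER NOT NULL,
        actor_name VARCHAR(50) NOT NULL,
        ranking INTEGER NOT NULL,
        PRIMARY KEY (movie_id, actor_name)
    )
""")
# Hypothetical rows: movie 1 and movie 2 share the actors Bob and Cy.
conn.executemany(
    "INSERT INTO movie_actor VALUES (?, ?, ?)",
    [(1, "Ann", 1), (1, "Bob", 2), (1, "Cy", 3),
     (2, "Bob", 1), (2, "Cy", 2), (2, "Dee", 3)],
)

# The two operand queries: cast of movie 1 (A) and cast of movie 2 (B).
A = "SELECT actor_name FROM movie_actor WHERE movie_id = 1"
B = "SELECT actor_name FROM movie_actor WHERE movie_id = 2"

# (A EXCEPT B) UNION (B EXCEPT A)
xor_by_differences = f"""
    SELECT actor_name FROM ({A} EXCEPT {B})
    UNION
    SELECT actor_name FROM ({B} EXCEPT {A})
    ORDER BY actor_name
"""

# (A UNION B) EXCEPT (A INTERSECT B)
xor_by_union_minus_intersection = f"""
    SELECT actor_name FROM ({A} UNION {B})
    EXCEPT
    SELECT actor_name FROM ({A} INTERSECT {B})
    ORDER BY actor_name
"""

first = [name for (name,) in conn.execute(xor_by_differences)]
second = [name for (name,) in conn.execute(xor_by_union_minus_intersection)]
print(first)
print(second)
```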

---

### Optional Exercise - Comparing the Performance of Set operations with Other Queries

Using the EXPLAIN technique for profiling queries introduced in notebook *10.4 Normalised v. unnormalised data*, write some queries equivalent to the set-based queries above and compare the performance of the different query styles.
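
As one possible starting point, `INTERSECT` can be rephrased with an `IN` subquery and `EXCEPT` with `NOT EXISTS`. The sketch below only checks that the rewritten queries return the same rows; it does not measure performance. It assumes an in-memory SQLite database (Python's `sqlite3` module) in place of PostgreSQL, with hypothetical movie ids (1 and 2) and placeholder actor names. The rewrites need no `DISTINCT` here only because the primary key guarantees that an actor appears at most once per movie.

```python
import sqlite3

# Assumption: in-memory SQLite stand-in for the PostgreSQL movie_actor table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movie_actor (
        movie_id INTEGER NOT NULL,
        actor_name VARCHAR(50) NOT NULL,
        ranking INTEGER NOT NULL,
        PRIMARY KEY (movie_id, actor_name)
    )
""")
# Hypothetical rows: movie 1 and movie 2 share the actors Bob and Cy.
conn.executemany(
    "INSERT INTO movie_actor VALUES (?, ?, ?)",
    [(1, "Ann", 1), (1, "Bob", 2), (1, "Cy", 3),
     (2, "Bob", 1), (2, "Cy", 2), (2, "Dee", 3)],
)

def names(query):
    """Run a single-column query and return the values as a list."""
    return [name for (name,) in conn.execute(query)]

A = "SELECT actor_name FROM movie_actor WHERE movie_id = 1"
B = "SELECT actor_name FROM movie_actor WHERE movie_id = 2"

# Set-based queries.
intersect_rows = names(f"{A} INTERSECT {B} ORDER BY actor_name")
except_rows = names(f"{A} EXCEPT {B} ORDER BY actor_name")

# INTERSECT rewritten with an IN subquery.
in_rows = names("""
    SELECT actor_name
    FROM movie_actor
    WHERE movie_id = 1
      AND actor_name IN (SELECT actor_name FROM movie_actor WHERE movie_id = 2)
    ORDER BY actor_name
""")

# EXCEPT rewritten with a correlated NOT EXISTS subquery.
not_exists_rows = names("""
    SELECT a.actor_name
    FROM movie_actor AS a
    WHERE a.movie_id = 1
      AND NOT EXISTS (
          SELECT 1
          FROM movie_actor AS b
          WHERE b.movie_id = 2
            AND b.actor_name = a.actor_name
      )
    ORDER BY a.actor_name
""")

print(intersect_rows, in_rows)
print(except_rows, not_exists_rows)
```

In the notebook, each style could then be run against PostgreSQL with the profiling technique mentioned above to compare the query plans.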


---

### Optional Exercise

If you have time, you might consider revisiting the *movies* dataset to see what sorts of questions you can now turn into queries using the additional SQL constructs reviewed in this notebook.

## Summary
In this Notebook, you have explored how relational algebra set operators can be used in formulating SQL queries.


## What next?
If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to `11.2 SQL subqueries`.