# SQL set operations

In this Notebook, you will explore how the relational algebra set operations *union*, *intersection* and *difference*, 
each of which combines two relations into a single resultant relation, can be applied to two tables using SQL.
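As a warm-up, the three operations can be sketched on toy data. The example below is a minimal illustration only: it uses Python's built-in `sqlite3` module with an in-memory database rather than the PostgreSQL engine used in the rest of this Notebook, and the tables `r` and `s` and their contents are hypothetical. It assumes the standard SQL operators `UNION`, `INTERSECT` and `EXCEPT`, which correspond to union, intersection and difference respectively.

```python
import sqlite3

# In-memory SQLite database, used here only for illustration;
# the Notebook itself works against PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (name TEXT);
    CREATE TABLE s (name TEXT);
    INSERT INTO r VALUES ('Ann'), ('Bob'), ('Cat');
    INSERT INTO s VALUES ('Bob'), ('Cat'), ('Dan');
""")

def names(query):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(query))

# Each operator combines the results of two SELECT statements
# that return the same number of columns.
union_names = names("SELECT name FROM r UNION SELECT name FROM s")
intersect_names = names("SELECT name FROM r INTERSECT SELECT name FROM s")
except_names = names("SELECT name FROM r EXCEPT SELECT name FROM s")

print(union_names)      # ['Ann', 'Bob', 'Cat', 'Dan']
print(intersect_names)  # ['Bob', 'Cat']
print(except_names)     # ['Ann']

conn.close()
```

Note that `UNION` removes duplicate rows, so 'Bob' and 'Cat' appear only once in the first result, and that `EXCEPT` is not symmetric: swapping `r` and `s` would return 'Dan' instead of 'Ann'.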

Enable access to the PostgreSQL database engine via [SQL Cell Magic](https://pypi.python.org/pypi/ipython-sql).

In [None]:
%load_ext sql
%sql postgresql://test:test@localhost:5432/tm351test

This Notebook uses only the `movie_actor` table from the *Movies dataset*.

`movie_actor (movie_id, actor_name, ranking)`

Each movie features one or more actors. Each row records one actor's appearance in one movie 
and is identified by the composite primary key (PK) made up of the `movie_id` and `actor_name` columns.


column | description
------ | -----------
movie_id  (PK) | movie identifier
actor_name  (PK) | actor's name
ranking | position of actor on the movie's cast list
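What the two-column primary key implies can be sketched as follows. This is an illustration under stated assumptions, not part of the Notebook's setup: it uses an in-memory SQLite database via Python's `sqlite3` module instead of PostgreSQL, and the rows ('Actor A', 'Actor B', movie ids 1 and 2) are hypothetical. The same actor may appear in several movies and a movie may have several actors, but a given (`movie_id`, `actor_name`) pair can be stored only once.

```python
import sqlite3

# In-memory SQLite database for illustration only (the Notebook uses PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movie_actor (
        movie_id INTEGER NOT NULL,
        actor_name VARCHAR(50) NOT NULL,
        ranking INTEGER NOT NULL,
        PRIMARY KEY (movie_id, actor_name)
    )
""")

# Hypothetical rows: none of these repeats a (movie_id, actor_name) pair.
conn.execute("INSERT INTO movie_actor VALUES (1, 'Actor A', 1)")
conn.execute("INSERT INTO movie_actor VALUES (2, 'Actor A', 3)")  # same actor, different movie
conn.execute("INSERT INTO movie_actor VALUES (1, 'Actor B', 2)")  # same movie, different actor

# Repeating an existing (movie_id, actor_name) pair violates the primary key.
try:
    conn.execute("INSERT INTO movie_actor VALUES (1, 'Actor A', 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM movie_actor").fetchone()[0]
print(duplicate_rejected, row_count)  # True 3

conn.close()
```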

In [None]:
%%sql
DROP TABLE IF EXISTS movie_actor;

CREATE TABLE movie_actor (
 movie_id INTEGER NOT NULL,
 actor_name VARCHAR(50) NOT NULL,
 ranking INTEGER NOT NULL,
 PRIMARY KEY (movie_id, actor_name)
);

Populate the table from the Movies dataset using [Psycopg](http://initd.org/psycopg/docs/index.html), 
a PostgreSQL database adapter for Python.

In [None]:
import psycopg2 as pg
import pandas as pd
import pandas.io.sql as psqlg

In [None]:
# open a connection to the PostgreSQL database tm351test
conn = pg.connect(dbname='tm351test', host='localhost', user='test', password='test', port=5432)
# create a cursor
c = conn.cursor()
# open movie_actor.dat; the with block closes the file automatically
with open('data/movie_actor.dat', 'r') as f:
    # execute the PostgreSQL copy command (copy_from expects tab-separated columns by default)
    c.copy_from(f, 'movie_actor')
# commit transaction
conn.commit()
# close cursor
c.close()
# close database connection
conn.close()

## Activity

Using the SQL set operators, answer the following questions about the films *Shrek* (`movie_id` = 4306) and its sequel *Shrek 2* (`movie_id` = 8360):
    
- Which actors appeared in either the original movie or the sequel (or both)?
- Which actors appeared in both the original movie and in the sequel?
- Which actors appeared in the original movie but not in the sequel?
- Which actors appeared in the sequel but not in the original movie?
- Which actors appeared in either the original movie or the sequel, but not both?
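The last question has no single dedicated operator among union, intersection and difference, so it needs a combination of them. As a hint rather than a solution, the sketch below uses Python's built-in `set` type on hypothetical toy data to check two equivalent ways of building an "either, but not both" result; translating one of them into SQL is left as part of the activity.

```python
# Hypothetical toy sets, unrelated to the Movies dataset.
a = {"Ann", "Bob", "Cat"}
b = {"Bob", "Cat", "Dan"}

# Option 1: everything in either set, minus what the two sets share.
via_union_minus_intersection = (a | b) - (a & b)

# Option 2: what is only in a, together with what is only in b.
via_two_differences = (a - b) | (b - a)

print(sorted(via_union_minus_intersection))  # ['Ann', 'Dan']

# Both constructions agree with Python's symmetric difference operator.
print(via_union_minus_intersection == via_two_differences == (a ^ b))  # True
```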

In [None]:
%%sql
-- try your code below


Solutions can be found in the `11.1.soln SQL set operations` Notebook, but please DO attempt the activity yourself 
before looking at these solutions.

## Summary
In this Notebook, you have explored how relational algebra set operators are implemented in SQL.

## What next?
If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to `11.2 SQL subqueries`.