# Set Operations

All of the set operations that we have discussed in class are implemented in Python 3; a concise summary is available [here](https://docs.python.org/2/library/sets.html#set-objects).

For example, suppose $A$ is the set of all positive multiples of 6 less than 100, and suppose $B$ is the set of all positive multiples of 8 less than 100.

- What is the set of numbers between 0 and 100 that are multiples of both?


In [8]:
A = set(range(6,100,6))
B = set(range(8,100,8))
print(A)
print(B)
A.intersection(B)

{96, 66, 36, 6, 72, 42, 12, 78, 48, 18, 84, 54, 24, 90, 60, 30}
{32, 64, 96, 8, 40, 72, 16, 48, 80, 24, 56, 88}


{24, 48, 72, 96}
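
As a quick sanity check on the result above, the common multiples of 6 and 8 should be exactly the multiples of their least common multiple, 24. The sketch below rebuilds `A` and `B`, uses the `&` operator (the operator form of `intersection`), and compares the result with the multiples of 24. The names `lcm` and `common` are introduced here for illustration only.

```python
from math import gcd

A = set(range(6, 100, 6))   # positive multiples of 6 below 100
B = set(range(8, 100, 8))   # positive multiples of 8 below 100

lcm = 6 * 8 // gcd(6, 8)    # least common multiple of 6 and 8, which is 24
common = A & B              # operator form of A.intersection(B)

print(sorted(common))                        # [24, 48, 72, 96]
print(common == set(range(lcm, 100, lcm)))   # True
```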

- What is the set of numbers that are multiples of 6 or 8?

In [9]:
print(A.union(B))

{64, 66, 6, 72, 8, 12, 78, 16, 80, 18, 84, 24, 88, 90, 30, 96, 32, 36, 40, 42, 48, 54, 56, 60}
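
One way to check the size of this union is the inclusion-exclusion rule, $|A \cup B| = |A| + |B| - |A \cap B|$. The following minimal sketch rebuilds the two sets, uses the `|` operator (the operator form of `union`), and confirms the count. The name `union` is just an illustrative variable.

```python
A = set(range(6, 100, 6))   # positive multiples of 6 below 100
B = set(range(8, 100, 8))   # positive multiples of 8 below 100

union = A | B               # operator form of A.union(B)

# sizes of A, B, their intersection, and their union
print(len(A), len(B), len(A & B), len(union))        # 16 12 4 24

# inclusion-exclusion: elements in both sets are counted only once
print(len(union) == len(A) + len(B) - len(A & B))    # True
```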


- What is the set of multiples of 8 that are not multiples of 6?

In [10]:
B.difference(A)

{8, 16, 32, 40, 56, 64, 80, 88}

In [12]:
B-A

{8, 16, 32, 40, 56, 64, 80, 88}

In fact, the arguments to these methods do not have to be sets; any iterable will do. The operator forms such as `B-A`, by contrast, require a set on both sides.

In [15]:
A.intersection(range(8,100,8))

{24, 48, 72, 96}

In [16]:
B.difference([24,48,72,96])

{8, 16, 32, 40, 56, 64, 80, 88}
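
The flexibility shown above belongs to the methods only. The sketch below illustrates the contrast: `difference` accepts a list, while the `-` operator raises a `TypeError` unless the right-hand side is converted to a set first. The variable names `multiples_of_24` and `raised` are illustrative.

```python
B = set(range(8, 100, 8))            # positive multiples of 8 below 100
multiples_of_24 = [24, 48, 72, 96]   # a plain list, not a set

# The method form accepts any iterable.
print(sorted(B.difference(multiples_of_24)))

# The operator form needs a set on both sides.
raised = False
try:
    B - multiples_of_24
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Converting the list to a set first makes the operator work.
print(sorted(B - set(multiples_of_24)))
```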

## Your Turn

The two files `top10gross.csv` and `top10adj.csv` contain the top 10 movies by gross revenue and by inflation-adjusted gross revenue, respectively (as of 2015). Use the function `readCSV()` below to read each `csv` file into a list of tuples, and then use set operations to
1. determine which movies appear on both lists, and
2. generate the set of distinct movies that appear in the top 10 by gross but not by adjusted gross.

In [3]:
import csv

def readCSV(file):
    contents = []                                        # create an empty list in which to store the data
    with open(file, 'r', newline='') as fh:              # open file for reading; it is closed automatically
        data = csv.reader(fh, delimiter=',', quotechar='"')  # create a csv reader.  this is needed because
                                                         # some titles contain commas
        next(data, None)                                 # skip the first line, which contains column headers
        for line in data:                                # process each remaining line of the file
            contents.append(tuple(line))                 # store each row as a tuple (for consistency with
                                                         # the previous example) in the contents list

    return contents
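
To see why a csv reader is used rather than splitting each line on commas, here is a small self-contained sketch. It assumes nothing about the real files: the sample rows, the column names, and the helper `read_rows` (a compact stand-in for `readCSV`) are all made up for illustration.

```python
import csv
import os
import tempfile

def read_rows(path):
    """Hypothetical stand-in for readCSV: skip the header, return rows as tuples."""
    with open(path, newline='') as fh:
        reader = csv.reader(fh, delimiter=',', quotechar='"')
        next(reader, None)                      # skip the header row
        return [tuple(row) for row in reader]

# Made-up sample data; the first title contains a comma and is therefore quoted.
sample = 'title,gross\n"Example, The Movie",100\nPlain Title,50\n'

# Write the sample to a temporary file so the reader has something to open.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)
    path = tmp.name

rows = read_rows(path)
os.remove(path)                                 # clean up the temporary file

print(rows)   # [('Example, The Movie', '100'), ('Plain Title', '50')]
```

Note that the quoted title stays in one field, and that every value comes back as a string, including the numbers.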


In [7]:
top10gross = readCSV('top10gross.csv')
top10adj   = readCSV('top10adj.csv')
print(len(top10gross))
print(len(top10adj))

# collect the titles (first column of each row) into sets
top10moviesG = {row[0] for row in top10gross}
top10moviesA = {row[0] for row in top10adj}

top10moviesG.intersection(top10moviesA)

10
10


{'Star Wars', 'Titanic'}

In [8]:
top10moviesG - top10moviesA

{'Avatar',
 'Avengers: Age of Ultron',
 'Jurassic World',
 "Marvel's The Avengers",
 'Star Wars: Episode I - The Phantom Menace',
 'Star Wars: The Force Awakens',
 'The Dark Knight',
 'The Dark Knight Rises'}

In [9]:
top10moviesA

{'Doctor Zhivago',
 'E.T.: The Extra-Terrestrial',
 'Gone with the Wind',
 'Jaws',
 'Snow White and the Seven Dwarfs',
 'Star Wars',
 'The Exorcist',
 'The Sound of Music',
 'The Ten Commandments',
 'Titanic'}
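
The outputs above can be cross-checked without the `csv` files. The sketch below types in the sets exactly as printed, rebuilds the gross list as the union of its two pieces, and then takes the difference in the other direction (adjusted but not gross). The names `both`, `gross_only`, and `adj_only` are illustrative.

```python
# Sets copied from the outputs shown above.
both = {'Star Wars', 'Titanic'}
gross_only = {'Avatar', 'Avengers: Age of Ultron', 'Jurassic World',
              "Marvel's The Avengers",
              'Star Wars: Episode I - The Phantom Menace',
              'Star Wars: The Force Awakens',
              'The Dark Knight', 'The Dark Knight Rises'}
top10moviesA = {'Doctor Zhivago', 'E.T.: The Extra-Terrestrial',
                'Gone with the Wind', 'Jaws',
                'Snow White and the Seven Dwarfs', 'Star Wars',
                'The Exorcist', 'The Sound of Music',
                'The Ten Commandments', 'Titanic'}

# The gross list splits into titles on both lists and titles on the gross list only.
top10moviesG = both | gross_only
print(len(top10moviesG))                          # 10

# Difference in the other direction: adjusted but not gross.
adj_only = top10moviesA - top10moviesG
print(len(adj_only))                              # 8
print(sorted(adj_only))

# The intersection recovers the two shared titles.
print(top10moviesG & top10moviesA == both)        # True
```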