# Sets 

So far in this class we have learned about sequences (specifically tuples and lists) as ways to store multiple values in a single object. These objects keep their values in a definite order, and since each entry can be any value, repetitions are allowed. But what if we only care about the distinct objects themselves, and not about how many times or in what order they appear?

## 1. The Basics

Python also has a `set` type that has the properties of a mathematical set: it is an unordered collection of distinct objects. Sets can be defined using the roster notation we learned in Section 2.1.

In [1]:
S = {1,2,3}
S

{1, 2, 3}

In [2]:
type(S)

set

Notice that repeated values only appear once in the set.

In [4]:
S = {1,1,1,2,2,3,1,3,3,2}
print(S)
S

{1, 2, 3}


{1, 2, 3}
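To see what "unordered" and "distinct" mean in practice, here is a small sketch. The names `first` and `second` are made up for this illustration and do not appear elsewhere in these notes.

```python
# Two sets are equal when they contain the same elements,
# no matter what order or how many repetitions were used to write them.
first = {1, 2, 3}
second = {3, 3, 2, 1, 1}

print(first == second)   # True
print(len(second))       # 3

# Membership is tested with the `in` operator.
print(2 in first)        # True
print(5 in first)        # False

# Because a set has no order, it has no positions to index.
try:
    first[0]
except TypeError as err:
    print("Sets cannot be indexed:", err)
```

Compare this with lists, where `[1, 2, 3] == [3, 2, 1]` is `False` because order matters.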

Values can be added to a set using the `add()` method. Of course, if the value is already in the set, it will not be added again.

In [6]:
S.add(4)
print(S)

{1, 2, 3, 4}


In [7]:
S.add(3)
print(S)

{1, 2, 3, 4}


Suppose that we already have a list and want to turn it into a set because we only care about the distinct elements. We can do that as follows.

In [8]:
A = [1,2,3,3,3,4,1,2,4]
S2 = set(A)
S2

{1, 2, 3, 4}
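One common use of this conversion is counting distinct values or checking whether a list contains repeats. The sketch below reuses the list `A` from above; the names `distinct` and `has_repeats` are made up for this illustration.

```python
A = [1, 2, 3, 3, 3, 4, 1, 2, 4]
distinct = set(A)

print(len(A))          # 9 entries in the list
print(len(distinct))   # 4 distinct values

# If the set is smaller than the list, some value was repeated.
has_repeats = len(distinct) < len(A)
print(has_repeats)     # True

# set() accepts other sequences as well, such as tuples and strings.
print(set((5, 5, 6)) == {5, 6})               # True
print(set("banana") == {'b', 'a', 'n'})       # True
```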

Be careful, though.  An empty set cannot be created using roster notation. `{}` will create an empty dictionary, not an empty set. We will discuss dictionaries later in the course.

In [9]:
D = {}
type(D)

dict

To create an empty set, you should use `set()`.

In [11]:
E = set()
len(E)

0

In [12]:
len(S)

4

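The `add` method will only add a single value at a time, however. Trying to add a list raises an error, because lists are mutable and therefore cannot be elements of a set. A tuple can be added, but it goes in as one element rather than as its individual entries. We will learn how to add multiple values next time.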

In [13]:
E.add(A)

TypeError: unhashable type: 'list'

In [14]:
E.add(tuple(A))
E

{(1, 2, 3, 3, 3, 4, 1, 2, 4)}

In [15]:
len(E)

1
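Until then, one way to put the individual values of a list into an existing set is to call `add` once per value in a loop. This is only a sketch using the tools we have seen so far; the name `numbers` is made up for this illustration.

```python
A = [1, 2, 3, 3, 3, 4, 1, 2, 4]
numbers = set()

# A list cannot be a set element, so this attempt fails.
try:
    numbers.add(A)
except TypeError as err:
    print("Could not add the list:", err)

# Adding the values one at a time puts the numbers themselves into the set.
for value in A:
    numbers.add(value)

print(numbers == {1, 2, 3, 4})   # True
print(len(numbers))              # 4
```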

## 2. An application

The file `more_ice_cream.csv` contains data about ice cream purchased in a particular area.  Each row contains the flavor, color, and price of the ice cream. The code below reads the data and stores it in a list called `iceCreamData`.

From the `iceCreamData` list, determine 

1. What distinct flavors appear in the list,
2. What distinct colors appear in the list, and
3. What distinct flavor/color combinations appear in the list.

Store each of these in a set and print your results.

In [16]:
# This part of the code reads the file and produces a list of tuples 
# containing the data.  Do not change it. Just run it.

file = open('more_ice_cream.csv','r') # open the file
data = file.read().split("\n")        # read the file as a string, split it into lines,
                                      # and store the data in a list of strings
    
header = data.pop(0)                  # remove the header line and print it
print("Entries:",header)
iceCreamData = []                     # initialize a list to store the data from each line
for line in data:                     
    lineData = tuple(line.split(',')) # convert the line into a tuple
    iceCreamData.append(lineData)     # add the tuple to the iceCreamData list.
iceCreamData
    


Entries: Flavor,Color,Price


[('Chocolate', 'brown', '4.01'),
 ('Cookie Dough', 'white', '4.69'),
 ('Bubble Gum', 'pink', '4.5'),
 ('Vanilla', 'white', '3.14'),
 ('Vanilla', 'white', '3.53'),
 ('Chocolate', 'brown', '3.59'),
 ('Vanilla', 'white', '3.34'),
 ('Chocolate', 'brown', '3.59'),
 ('Bubble Gum', 'blue', '4.6'),
 ('Vanilla', 'white', '3.76'),
 ('Cookie Dough', 'white', '4.24'),
 ('Bubble Gum', 'pink', '4.5'),
 ('Bubble Gum', 'blue', '4.6'),
 ('Vanilla', 'white', '3.48'),
 ('Bubble Gum', 'blue', '4.8'),
 ('Vanilla', 'white', '3.43'),
 ('Vanilla', 'white', '3.23'),
 ('Bubble Gum', 'pink', '4.85'),
 ('Strawberry', 'pink', '3.79'),
 ('Chocolate', 'brown', '3.51'),
 ('Cookie Dough', 'white', '4.34'),
 ('Vanilla', 'white', '3.92'),
 ('Vanilla', 'white', '3.15'),
 ('Vanilla', 'white', '3.42'),
 ('Chocolate', 'brown', '3.29'),
 ('Vanilla', 'white', '3.48'),
 ('Chocolate', 'brown', '3.57'),
 ('Vanilla', 'white', '3.34'),
 ('Strawberry', 'pink', '4.29'),
 ('Vanilla', 'white', '3.68'),
 ('Vanilla', 'white', '3.72'),
 

In [17]:
iceCreamData[0]

('Chocolate', 'brown', '4.01')
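Each entry of `iceCreamData` is a tuple of three strings, so its parts can be picked out by index or unpacked into separate names. The sketch below uses one row copied from the output above; the name `row` is made up for this illustration.

```python
# One row copied from the output above; in the notebook this is iceCreamData[0].
row = ('Chocolate', 'brown', '4.01')

flavor, color, price = row     # unpack the three entries
print(flavor)                  # Chocolate
print((flavor, color))         # ('Chocolate', 'brown')

# The price was read from the file as text, not as a number.
print(type(price))             # <class 'str'>
print(float(price))            # 4.01
```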

In [18]:
flavors = set()
colors = set()
combos = set()

for p in iceCreamData:
    f = p[0]
    c = p[1]
    flavors.add(f)
    colors.add(c)
    combos.add( (f,c) )

print("Flavors:")
print(flavors)
print("Colors:")
print(colors)
print("Combinations:")
print(combos)

Flavors:
{'Chocolate', 'Cookie Dough', 'Vanilla', 'Bubble Gum', 'Strawberry'}
Colors:
{'pink', 'white', 'brown', 'blue'}
Combinations:
{('Strawberry', 'pink'), ('Cookie Dough', 'white'), ('Bubble Gum', 'blue'), ('Vanilla', 'white'), ('Bubble Gum', 'pink'), ('Chocolate', 'brown')}
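The loop above is all we need, but Python also offers set comprehensions, which build a set in a single expression. The sketch below is an alternative, not a replacement for the approach above. It uses a few rows copied from the data, stored in a made-up list called `sample`, so that it runs on its own.

```python
# A few rows copied from the output above; in the notebook, use iceCreamData itself.
sample = [
    ('Chocolate', 'brown', '4.01'),
    ('Bubble Gum', 'pink', '4.5'),
    ('Vanilla', 'white', '3.14'),
    ('Bubble Gum', 'blue', '4.6'),
    ('Vanilla', 'white', '3.53'),
]

flavors = {row[0] for row in sample}            # distinct flavors
colors = {row[1] for row in sample}             # distinct colors
combos = {(row[0], row[1]) for row in sample}   # distinct flavor/color pairs

print(sorted(flavors))   # ['Bubble Gum', 'Chocolate', 'Vanilla']
print(sorted(colors))    # ['blue', 'brown', 'pink', 'white']
print(len(combos))       # 4
```

`sorted` is used here only to make the printed order predictable. A set has no order of its own, so printing a set of strings directly may show the elements in a different order from one run to the next.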


## 3. Your turn

The file `top_movies.csv` contains information about the 200 highest-grossing films (unadjusted). Each line contains a Title, Studio, Gross, Adjusted Gross, and Year. The data is read into a list called `topMovies`.

In [19]:
# This part of the code reads the file and produces a list of tuples 
# containing the data.  Do not change it. Just run it.
import csv   

file = open('top_movies.csv','r')                    # open file for reading
data = csv.reader(file,delimiter=',',quotechar='"')  # create a csv reader, which is needed because some titles
                                                     # contain commas
                                               
topMovies = []                                       # create an empty list in which to store the data 
count = 0
for line in data:                                    # process each line of the file
    count += 1
    if(count==1):                                    # skip the first line which contains column headers
        continue
    lineData = tuple(line)                           # store the data as a tuple (for consistency with the previous example)
    topMovies.append(lineData)                       # add the tuple of data to the topMovies list.
    
topMovies

[('Star Wars: The Force Awakens',
  'Buena Vista (Disney)',
  '906723418',
  '906723400',
  '2015'),
 ('Avatar', 'Fox', '760507625', '846120800', '2009'),
 ('Titanic', 'Paramount', '658672302', '1178627900', '1997'),
 ('Jurassic World', 'Universal', '652270625', '687728000', '2015'),
 ("Marvel's The Avengers",
  'Buena Vista (Disney)',
  '623357910',
  '668866600',
  '2012'),
 ('The Dark Knight', 'Warner Bros.', '534858444', '647761600', '2008'),
 ('Star Wars: Episode I - The Phantom Menace',
  'Fox',
  '474544677',
  '785715000',
  '1999'),
 ('Star Wars', 'Fox', '460998007', '1549640500', '1977'),
 ('Avengers: Age of Ultron',
  'Buena Vista (Disney)',
  '459005868',
  '465684200',
  '2015'),
 ('The Dark Knight Rises', 'Warner Bros.', '448139099', '500961700', '2012'),
 ('Shrek 2', 'Dreamworks', '441226247', '618143100', '2004'),
 ('E.T.: The Extra-Terrestrial',
  'Universal',
  '435110554',
  '1234132700',
  '1982'),
 ('The Hunger Games: Catching Fire',
  'Lionsgate',
  '424668047',
 

Determine which studios and which years appear in the list.  Store these in sets and print the results.

In [22]:
studios = set()
years = set()

for movie in topMovies:
    s = movie[1]
    y = movie[4]
    studios.add(s)
    years.add(y)

print(studios)
print()
print(years)
    

{'Lionsgate', 'New Line', 'MPC', 'Paramount', 'Dreamworks', 'Disney', 'TriS', 'Universal', 'Sum.', 'Buena Vista (Disney)', 'Warner Bros. (New Line)', 'RKO', 'MGM', 'Warner Bros.', 'Paramount/Dreamworks', 'Orion', 'NM', 'IFC', 'Selz.', 'UA', 'Sony', 'Columbia', 'AVCO', 'Fox'}

{'2010', '2008', '1984', '1994', '1999', '1997', '1973', '2003', '1970', '2001', '1972', '1921', '1941', '1983', '1963', '1992', '1986', '1977', '1996', '2006', '1982', '2004', '1998', '1995', '1974', '1969', '1962', '1965', '1950', '2012', '1976', '1946', '2009', '2005', '1968', '1955', '2002', '1964', '1952', '1961', '2013', '2014', '1978', '1988', '1967', '1987', '1956', '1991', '1939', '1937', '1981', '2000', '1989', '1945', '2015', '2007', '1940', '1953', '1993', '1975', '1979', '1942', '1954', '1960', '1959', '1957', '2011', '1985', '1980', '1990'}
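The years were read from the file as strings, and the set prints them in no particular order. If a sorted list of numbers is easier to read, one possible follow-up is sketched below. It uses a handful of values copied from the output above so that it runs on its own; the name `year_numbers` is made up for this illustration.

```python
# A few values copied from the output above; in the notebook, use the years set itself.
years = {'2010', '2008', '1984', '1921', '2015', '1937'}

year_numbers = set()
for y in years:
    year_numbers.add(int(y))     # convert each string to an integer

print(sorted(year_numbers))      # [1921, 1937, 1984, 2008, 2010, 2015]
print(min(year_numbers), max(year_numbers))   # 1921 2015
```

Note that `sorted` returns a list, since a set itself cannot hold its elements in a chosen order.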
