#### 7.1: Operations on sets

with other sets

In [2]:
# Intersection
{1, 2, 3, 4, 5}.intersection({3, 4, 5, 6}) # {3, 4, 5}
{1, 2, 3, 4, 5} & {3, 4, 5, 6} # {3, 4, 5}

{3, 4, 5}

In [3]:
# Union
{1, 2, 3, 4, 5}.union({3, 4, 5, 6}) # {1, 2, 3, 4, 5, 6}
{1, 2, 3, 4, 5} | {3, 4, 5, 6} # {1, 2, 3, 4, 5, 6}

{1, 2, 3, 4, 5, 6}

In [4]:
# Difference
{1, 2, 3, 4}.difference({2, 3, 5}) # {1, 4}
{1, 2, 3, 4} - {2, 3, 5} # {1, 4}

{1, 4}

In [5]:
# Symmetric difference
{1, 2, 3, 4}.symmetric_difference({2, 3, 5}) # {1, 4, 5}
{1, 2, 3, 4} ^ {2, 3, 5} # {1, 4, 5}

{1, 4, 5}

In [6]:
# Superset check
{1, 2}.issuperset({1, 2, 3}) # False
{1, 2} >= {1, 2, 3} # False

False

In [7]:
# Subset check
{1, 2}.issubset({1, 2, 3}) # True
{1, 2} <= {1, 2, 3} # True

True

In [8]:
# Disjoint check
{1, 2}.isdisjoint({3, 4}) # True
{1, 2}.isdisjoint({1, 4}) # False

False

with single elements

In [9]:
# Existence check
2 in {1,2,3} # True
4 in {1,2,3} # False
4 not in {1,2,3} # True

True

In [13]:
# Add and Remove
s = {1,2,3}
s.add(4) # s == {1,2,3,4}
s
s.discard(3) # s == {1,2,4}
s
s.discard(5) # s == {1,2,4}
s
s.remove(2) # s == {1,4}
s
# s.remove(2) # KeyError!

{1, 4}
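The difference between `discard` and `remove` only shows up when the element is absent. The following is a minimal sketch of that behaviour; the variable names `removed` and `popped` are illustrative only, and `pop` is included as a third way of removing an element.

```python
s = {1, 4}

# discard() silently does nothing when the element is missing
s.discard(2)

# remove() raises KeyError when the element is missing
try:
    s.remove(2)
    removed = True
except KeyError:
    removed = False

print(removed)  # False

# pop() removes and returns an arbitrary element of the set
popped = s.pop()
print(len(s))  # 1
```

Use `discard` when absence is not an error, and `remove` when a missing element indicates a bug that should surface.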

Set operations return new sets, but have corresponding in-place versions:<br>

<table style ='width:40%'> <tr> <th>Method</th> <th>In-place operation</th> <th>In-place method</th></tr>
<tr> <td>union</td> <td>s |= t</td> <td>update</td></tr>
<tr> <td>intersection</td> <td>s &= t</td> <td>intersection_update</td></tr>
<tr> <td>difference</td> <td>s -= t</td> <td>difference_update</td></tr>
<tr> <td>symmetric_difference</td> <td>s ^= t</td> <td>symmetric_difference_update</td></tr></table>

For example:

In [14]:
s = {1, 2}
s.update({3, 4}) # s == {1, 2, 3, 4}
s

{1, 2, 3, 4}
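The augmented operators behave the same way as the in-place methods: they modify the existing set object rather than creating a new one. A small sketch, where `alias` is a hypothetical second name used only to show that the object is mutated in place:

```python
s = {1, 2, 3, 4}
alias = s            # a second name for the same set object

s |= {5}             # like s.update({5})
s &= {1, 2, 3, 5}    # like s.intersection_update({1, 2, 3, 5})
s -= {1}             # like s.difference_update({1})
s ^= {2, 6}          # like s.symmetric_difference_update({2, 6})

print(s == {3, 5, 6})  # True
print(alias is s)      # True: no new set was created
```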

#### 7.2: Get the unique elements of a list


Let's say you've got a list of restaurants -- maybe you read it from a file. You care about the unique restaurants in<br>
the list. The best way to get the unique elements from a list is to turn it into a set:

In [25]:
restaurants = ["McDonald's", "Burger King", "McDonald's", "Chicken Chicken","Chicken Chicken"]
unique_restaurants = set(restaurants)
print(unique_restaurants)
# prints e.g. {'Chicken Chicken', "McDonald's", 'Burger King'} (order may vary)

{'Burger King', 'Chicken Chicken', "McDonald's"}


Note that the set is not in the same order as the original list; that is because sets are unordered. (By contrast, <font color = 'green'>dicts</font> preserve insertion order as of Python 3.7.)

This can easily be transformed back into a list with Python's built-in list function, giving another list that has the<br>
same elements as the original but without duplicates (and in no guaranteed order):

In [26]:
list(unique_restaurants)
# e.g. ['Chicken Chicken', "McDonald's", 'Burger King'] (order may vary)

['Burger King', 'Chicken Chicken', "McDonald's"]

It's also common to see this as one line:

In [27]:
# Removes all duplicates and returns another list
list(set(restaurants))

['Burger King', 'Chicken Chicken', "McDonald's"]

Now any operations that could be performed on the original list can be done again.
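Converting through a set discards the original order. If the order of first appearance matters, one common alternative is to go through a dict instead. This is a sketch that assumes Python 3.7 or later, where dicts preserve insertion order; `unique_in_order` is an illustrative name.

```python
restaurants = ["McDonald's", "Burger King", "McDonald's", "Chicken Chicken", "Chicken Chicken"]

# dict keys are unique and (from Python 3.7) keep insertion order,
# so each restaurant is kept at the position of its first occurrence
unique_in_order = list(dict.fromkeys(restaurants))
print(unique_in_order)
# ["McDonald's", 'Burger King', 'Chicken Chicken']
```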

#### 7.3: Set of Sets

In [28]:
{{1,2}, {3,4}}

leads to:

TypeError: unhashable type: 'set'

Set elements must be hashable, and a set is mutable and therefore unhashable. Instead, use the immutable frozenset:

In [29]:
{frozenset({1, 2}), frozenset({3, 4})}

{frozenset({3, 4}), frozenset({1, 2})}
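A frozenset supports the same read-only operations as a set but has no mutating methods, which is what makes it hashable. The sketch below illustrates this; the names `groups`, `member`, `merged` and `has_add` are illustrative only.

```python
groups = {frozenset({1, 2}), frozenset({3, 4})}

# Membership works by value; the order in which elements were written is irrelevant
member = frozenset({2, 1}) in groups
print(member)  # True

# Non-mutating operations return a new frozenset
merged = frozenset({1, 2}) | frozenset({3, 4})
print(merged == frozenset({1, 2, 3, 4}))  # True

# Mutating methods such as add() do not exist on frozenset
has_add = hasattr(frozenset(), "add")
print(has_add)  # False
```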

#### 7.4: Set Operations using Methods and Builtins

We define two sets, a and b (duplicate elements in the literals are discarded):

In [30]:
a = { 1,2,2,3,4}
b = {3,3,4,4,5}

NOTE: {<font color = 'orange'>1</font>} creates a set of one element, but {} creates an empty <font color = 'green'>dict</font>. The correct way to create an
empty set is <font color = 'green'>set</font>().
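The note above can be checked directly. A short sketch, with `empty_dict` and `empty_set` as illustrative names:

```python
empty_dict = {}      # braces with nothing inside create a dict
empty_set = set()    # the only way to write an empty set

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set

# Duplicates in a set literal collapse to a single element
a = {1, 2, 2, 3, 4}
print(len(a))  # 4
```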

**Intersection** <br>
a.intersection(b) returns a new set with elements present in both a and b.


In [31]:
a.intersection(b)

{3, 4}

**Union** <br>
a.union(b) returns a new set with elements present in either a or b (or both).

In [32]:
a.union(b)

{1, 2, 3, 4, 5}

**Difference** <br>
a.difference(b) returns a new set with elements present in a but not in b. Unlike union and intersection, the result depends on the order of the operands.

In [34]:
a.difference(b)

{1, 2}

In [35]:
b.difference(a)

{5}

**Symmetric Difference** <br>
a.symmetric_difference(b) returns a new set with elements present in either a or b but not in both (that is, the elements the two sets do not have in common).

In [36]:
a.symmetric_difference(b)

{1, 2, 5}

In [38]:
b.symmetric_difference(a)

{1, 2, 5}

**NOTE**: a.symmetric_difference(b) == b.symmetric_difference(a)
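The symmetric difference can also be expressed through the other operations, which may help when reasoning about it. A sketch using the same values as a and b above; `via_union` and `via_differences` are illustrative names.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

# Everything in either set, minus what they share
via_union = (a | b) - (a & b)

# What is only in a, together with what is only in b
via_differences = (a - b) | (b - a)

print(via_union == a.symmetric_difference(b))        # True
print(via_differences == a.symmetric_difference(b))  # True
```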

***Subset and superset***

c.issubset(a) tests whether each element of c is in a. <br>
a.issuperset(c) is the same test written from a's side: it also checks whether each element of c is in a.

In [39]:
c = {1,2}
c.issubset(a)

True

In [40]:
a.issuperset(c)

True
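Both checks also succeed when the two sets are equal. To require a proper (strict) subset or superset, Python offers the `<` and `>` operators, which have no method equivalent. A brief sketch with the same values as above; the result names are illustrative.

```python
a = {1, 2, 3, 4}
c = {1, 2}

subset = c <= a            # True: every element of c is in a
proper_subset = c < a      # True: c is a subset of a and c != a

self_subset = a <= a       # True: every set is a subset of itself
self_proper = a < a        # False: a set is never a proper subset of itself

print(subset, proper_subset, self_subset, self_proper)
# True True True False
```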

The operations above have equivalent operators, as shown below:
<table style ='width:20%'> <tr> <th>Method</th> <th>Operator</th></tr>
<tr> <td>a.intersection(b)</td> <td>a & b</td></tr>
<tr> <td>a.union(b)</td> <td>a | b</td></tr>
<tr> <td>a.difference(b)</td> <td>a - b</td></tr>
<tr> <td>a.symmetric_difference(b)</td> <td>a ^ b</td></tr>
<tr> <td>a.issubset(b)</td> <td>a <= b</td></tr>
<tr> <td>a.issuperset(b)</td> <td>a >= b</td></tr></table>
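Methods and operators are not fully interchangeable: the methods accept any iterable (and several of them at once), while the operators require both operands to be sets. A minimal sketch of the difference; the result names are illustrative.

```python
a = {1, 2, 3, 4}

# Methods accept arbitrary iterables such as lists and tuples
from_list = a.union([4, 5])
print(from_list == {1, 2, 3, 4, 5})  # True

# ... and more than one argument
from_many = a.intersection([1, 2, 9], (2, 3))
print(from_many == {2})  # True

# Operators reject non-set operands
try:
    a | [4, 5]
except TypeError:
    operator_rejected_list = True
else:
    operator_rejected_list = False
print(operator_rejected_list)  # True
```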


Sets a and d are disjoint if they have no element in common:

In [41]:
d = {5, 6}
a.isdisjoint(b) # 3 and 4 are in both sets

False

In [42]:
a.isdisjoint(d)

True

In [43]:
# This is an equivalent check, but less efficient: it builds the intersection set first
len(a & d) == 0

True

In [44]:
# This is even less efficient
a & d == set()


True

***Testing membership*** <br>

The built-in *in* operator tests whether an element occurs in the set:

In [45]:
1 in a

True

In [46]:
6 in a

False

***Length*** <br>

The built-in <font color = 'green'>len</font>() function returns the number of elements in the set. Duplicates in the literal are not counted:


In [47]:
len(a)

4

In [48]:
len(b)

3

#### 7.5: Sets versus multisets

Sets are unordered collections of distinct elements. But sometimes we want to work with unordered collections of <br>
elements that are not necessarily distinct, and to keep track of the elements' multiplicities.

Consider this example:

In [52]:
setA = {'a','b','b','c'}
setA

{'a', 'b', 'c'}

By saving the strings 'a', 'b', 'b', 'c' into a set data structure we've lost the information that 'b' <br>
occurs twice. Of course, saving the elements to a list would retain this information:

In [53]:
listA = ['a','b','b','c']
listA

['a', 'b', 'b', 'c']

but a list imposes an ordering we do not need, and finding out how often an element occurs requires scanning the whole list. <br>
For implementing multisets, Python provides the Counter class from the collections module:

In [55]:
from collections import Counter
counterA = Counter(['a','b','b','c'])
counterA

Counter({'a': 1, 'b': 2, 'c': 1})

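Counter is a dictionary subclass in which elements are stored as dictionary keys and their counts are stored as <br>
dictionary values. As with any dictionary, two Counters holding the same counts compare equal regardless of the order in which the elements were added (although, as of Python 3.7, the insertion order is remembered).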
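Counter also supports multiset versions of the set operations shown earlier in this chapter. The following sketch uses two hypothetical multisets, `x` and `y`, to show addition, subtraction, intersection (minimum of counts) and union (maximum of counts).

```python
from collections import Counter

x = Counter(['a', 'b', 'b', 'c'])
y = Counter(['b', 'c', 'c', 'd'])

total = x + y      # counts are added
remaining = x - y  # counts are subtracted; non-positive results are dropped
common = x & y     # minimum of the counts
largest = x | y    # maximum of the counts

print(total == Counter({'a': 1, 'b': 3, 'c': 3, 'd': 1}))    # True
print(remaining == Counter({'a': 1, 'b': 1}))                # True
print(common == Counter({'b': 1, 'c': 1}))                   # True
print(largest == Counter({'a': 1, 'b': 2, 'c': 2, 'd': 1}))  # True

# elements() expands the multiset back into its individual items
print(sorted(x.elements()))  # ['a', 'b', 'b', 'c']
```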