# Sets

## Sets

* A _set_ contains a collection of unique values and works like a mathematical set.
* No two elements can have the same value
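* Sets are unordered: elements have no index or position, and the order you see when printing or looping is not guaranteed
* A set may contain values of different data types, as long as each value is hashable (for example, numbers and strings, but not lists or dictionaries)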

## Creating a set

* Use the built-in `set()` function
* `myset = set()`
* You may also create a set from an iterable (usually a list, can be a string, tuple, or file)
* `myset = set([1, 2, 3])`
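
Python also has a literal syntax for non-empty sets. The short sketch below compares it with `set()` and shows two things that are easy to trip over: empty braces create a dictionary, not a set, and unhashable values such as lists cannot be set elements. The variable names are only for illustration.

```python
# Set literal syntax: curly braces around comma-separated elements
literal_set = {1, 2, 3}
print(literal_set == set([1, 2, 3]))

# Empty braces create a dictionary, so use set() for an empty set
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)
print(type(empty_set).__name__)

# Set elements must be hashable; a list is not, so this raises TypeError
try:
    bad_set = {[1, 2], 3}
    error_raised = False
except TypeError:
    error_raised = True
print(error_raised)
```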

In [5]:
# Set examples
print(set([1, 2, 3]))
print(set([1, 2, 3, 3]))
print(set('hello'))
print(set('hello world'))
print(set(['hello', 'world']))

{1, 2, 3}
{1, 2, 3}
{'e', 'o', 'l', 'h'}
{' ', 'h', 'l', 'e', 'd', 'r', 'o', 'w'}
{'hello', 'world'}


In [6]:
# Getting the number of elements in a set
myset = set([1, 2, 3, 3])
print(len(myset))

3


In [7]:
# The `add` method
#
# Add an item to a set
myset = set()
myset.add(1)
myset.add(2)
myset.add(3)
myset.add(3)
print(myset)

{1, 2, 3}


In [11]:
# The `update` method
#
# Add multiple items to a set
myset = set()
myset.update([1, 2, 3])
myset.update([2, 3, 4])
print(myset)

{1, 2, 3, 4}


In [12]:
# The `discard` and `remove` methods
#
# Both remove an item from a set. `discard` does nothing if the item is not in the set; `remove` raises a `KeyError`.
myset = set([1, 2, 3])
myset.discard(1)
myset.discard(5)
myset.remove(2)
myset.remove(5)

KeyError: 5
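
When `remove` is the right choice but the item might be missing, the error can be avoided or handled. This is a minimal sketch of two common approaches; the `removed` flag is a name introduced here for illustration.

```python
myset = set([1, 2, 3])

# Option 1: check membership before calling remove
if 5 in myset:
    myset.remove(5)

# Option 2: call remove and handle the KeyError it raises
try:
    myset.remove(5)
    removed = True
except KeyError:
    removed = False

print(removed)
print(myset)
```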

In [13]:
# The `clear` method
#
# Remove all items from the set
myset = set([1, 2, 3])
myset.clear()
print(myset)

set()


In [14]:
# Using a for loop on a set
myset = set([1, 2, 3, 4, 5])
for number in myset:
    print(number)

1
2
3
4
5


In [15]:
# Using the `in` operator
myset = set([1, 2, 3, 4, 5])
print(1 in myset)
print(6 in myset)
print(6 not in myset)

True
False
True


## Mathematical set functions between two sets

* Union - A set containing all of the elements of two separate sets
* Intersection - A set containing an element if it appears in both sets
* Difference - A set containing the elements in the first set that are _not_ in the second set
* Symmetric difference - A set containing the elements in either set, but not both sets

In [19]:
# The `union` method
#
# Creates a new set containing all of the elements in both sets
set1 = set([1, 2, 3, 4, 5])
set2 = set([5, 6, 7, 8, 9])
set3 = set1.union(set2)
print(set1)
print(set2)
print(set3)

# Alternate syntax
set3 = set1 | set2
print(set3)

{1, 2, 3, 4, 5}
{5, 6, 7, 8, 9}
{1, 2, 3, 4, 5, 6, 7, 8, 9}
{1, 2, 3, 4, 5, 6, 7, 8, 9}


In [20]:
# The `intersection` method
#
# Creates a new set containing the intersection of two sets
set1 = set([1, 2, 3, 4, 5])
set2 = set([5, 6, 7, 8, 9])
set3 = set1.intersection(set2)
print(set3)

# Alternate syntax
set3 = set1 & set2
print(set3)

{5}
{5}


In [21]:
# The `difference` method
#
# Creates a new set containing the elements of the first set that do not appear in the second set
set1 = set([1, 2, 3, 4, 5])
set2 = set([5, 6, 7, 8, 9])
set3 = set1.difference(set2)
print(set3)

# Alternate syntax
set3 = set1 - set2
print(set3)

{1, 2, 3, 4}
{1, 2, 3, 4}


In [22]:
# The `symmetric_difference` method
#
# Creates a new set containing the symmetric difference of two sets
set1 = set([1, 2, 3, 4, 5])
set2 = set([5, 6, 7, 8, 9])
set3 = set1.symmetric_difference(set2)
print(set3)

# Alternate syntax
set3 = set1 ^ set2
print(set3)

{1, 2, 3, 4, 6, 7, 8, 9}
{1, 2, 3, 4, 6, 7, 8, 9}


## Finding supersets and subsets

* Superset - All of the elements in set 2 are contained in set 1
* Subset - All of the elements in set 1 are contained in set 2

In [25]:
# `issuperset` method
#
# True if the set is a superset of the provided set, False otherwise
set1 = set([1, 2, 3, 4, 5])
set2 = set([1, 2, 3])
print(set1.issuperset(set2))
print(set2.issuperset(set1))

# Alternate syntax
print(set1 >= set2)

True
False
True


In [26]:
# `issubset` method
#
# True if the set is a subset of the provided set, False otherwise
set1 = set([1, 2, 3, 4, 5])
set2 = set([1, 2, 3])
print(set1.issubset(set2))
print(set2.issubset(set1))

# Alternate syntax
print(set2 <= set1)

False
True
True
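
Note that `<=` and `>=` also return `True` when the two sets are equal. Python additionally supports the strict forms `<` and `>`, which require the sets to differ. The sketch below contrasts the two; `set3` is an extra set added here for the comparison.

```python
set1 = set([1, 2, 3, 4, 5])
set2 = set([1, 2, 3])
set3 = set([1, 2, 3])

# Equal sets are subsets of each other, but not proper subsets
subset_of_equal = set2 <= set3
proper_subset_of_equal = set2 < set3
print(subset_of_equal)
print(proper_subset_of_equal)

# set2 is smaller than set1 and fully contained in it
proper_subset = set2 < set1
proper_superset = set1 > set2
print(proper_subset)
print(proper_superset)
```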


## Object serialization

* We can write simple values to a text file
* Sometimes we need to store something more complex (like a dictionary)
* We can _serialize_ the object to a file
* From the file, we can _deserialize_ it back to the original object

## The `pickle` module

* Python calls serialization "pickling"
* It is part of the standard library, but it must be imported (`import pickle`)
* The module provides several functions for serializing and deserializing objects

## Pickling an object

1. `import pickle`
2. Open a file for binary writing (`open('file.dat', 'wb')`).
3. Call the `pickle` module's `dump` function to pickle the object and write it to the file.
4. Close the file when done.

In [29]:
# Pickling objects
import pickle

phonebook = {'Greg': '555-1234', 'Katie': '555-2345'}
favorites = ['Greg']

output_file = open('pickled.dat', 'wb')
pickle.dump(phonebook, output_file)
pickle.dump(favorites, output_file)
output_file.close()

## Notes on pickling

* We are writing binary, so we use the `wb` flag.
* The extension on the file isn't important, but note that this is not a text file.
* The file is not human-readable because it contains binary data
* We can pickle multiple items and unpickle them in order
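
The pickling steps above close the file explicitly. A common alternative, sketched here, is a `with` statement, which closes the file automatically even if an error occurs while pickling. The file name and the use of the system temporary directory are assumptions made for this illustration.

```python
import os
import pickle
import tempfile

phonebook = {'Greg': '555-1234', 'Katie': '555-2345'}
favorites = ['Greg']

# Hypothetical location, chosen so the example does not overwrite other files
path = os.path.join(tempfile.gettempdir(), 'pickled_with.dat')

# The file is closed automatically when the with block ends
with open(path, 'wb') as output_file:
    pickle.dump(phonebook, output_file)
    pickle.dump(favorites, output_file)

print(output_file.closed)
```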

## Unpickling

1. `import pickle`
2. Open the file for binary reading (`open('file.dat', 'rb')`)
3. Call the `pickle` module's `load` function to deserialize an object from the file
4. Close the file when done

In [30]:
# Unpickling objects
import pickle

input_file = open('pickled.dat', 'rb')
phonebook = pickle.load(input_file)
favorites = pickle.load(input_file)
input_file.close()

print(phonebook)
print(favorites)

{'Greg': '555-1234', 'Katie': '555-2345'}
['Greg']


## Notes on unpickling

* We are reading binary, so we use the `rb` flag.
* We unpickle items in the same order they were pickled
* If you try to unpickle more items than the file contains, you get an `EOFError`
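
When the number of pickled items is not known in advance, the `EOFError` can serve as the stopping signal. The following is a minimal sketch of that idea; the file name, the temporary directory, and the `items` list are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical file used only for this example
path = os.path.join(tempfile.gettempdir(), 'pickled_items.dat')

# Write two objects so there is something to read back
with open(path, 'wb') as output_file:
    pickle.dump({'Greg': '555-1234'}, output_file)
    pickle.dump(['Greg'], output_file)

# Keep loading until pickle.load signals the end of the file
items = []
with open(path, 'rb') as input_file:
    while True:
        try:
            items.append(pickle.load(input_file))
        except EOFError:
            break

print(items)
```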

## Example: Saving and loading complex records

Create a program that records information about a person. This program should save its data to a file and then load it at startup (if the file exists).

In [31]:
# Walkthrough solution
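
One possible shape for such a program is sketched below. It is not the walkthrough solution itself. The function names `load_people` and `save_people`, the record fields, and the file location are all hypothetical. The records are hard-coded rather than read with `input()` so that the example runs on its own.

```python
import os
import pickle
import tempfile


def load_people(path):
    """Return the saved dictionary of people, or an empty one if no file exists."""
    if not os.path.exists(path):
        return {}
    with open(path, 'rb') as input_file:
        return pickle.load(input_file)


def save_people(people, path):
    """Pickle the whole dictionary of people to the given file."""
    with open(path, 'wb') as output_file:
        pickle.dump(people, output_file)


# Hypothetical data file; remove any old copy so the demo starts clean
path = os.path.join(tempfile.gettempdir(), 'people.dat')
if os.path.exists(path):
    os.remove(path)

# First run: no file yet, so we start with an empty dictionary
people = load_people(path)
people['Greg'] = {'phone': '555-1234', 'favorite': True}
save_people(people, path)

# Second run: the earlier record is loaded before a new one is added
people = load_people(path)
people['Katie'] = {'phone': '555-2345', 'favorite': False}
save_people(people, path)

print(load_people(path))
```

Storing every record in a single dictionary means one `dump` and one `load` are enough, so the order of pickled items does not matter.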