# Python Set

A **set** is an unordered collection data type that contains only unique values.

In [1]:
{'a', 'b', 'c', 'd'}  # this is a set with four unique values

{'a', 'b', 'c', 'd'}

## Creating a set

A set can be created with a literal (values inside curly braces) or with the set() constructor function.

In [5]:
aset = {'a', 'b', 'c', 'd', 'a', 'b'}  # create a set with a literal
bset = set(['a', 'b', 'c', 'd', 'a', 'b'])  # create a set with the set() constructor

print(aset)
print(bset)  # only unique values are kept, and the display order is arbitrary

{'a', 'c', 'd', 'b'}
{'c', 'd', 'b', 'a'}
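Because order and duplicates are ignored, two sets built from the same values compare equal even when the values were written in a different order. This also makes a set a quick way to remove duplicates from a list. Below is a minimal sketch; the variable `names` is illustrative. The result is sorted because a set's own order is arbitrary.

```python
# Two sets are equal when they hold the same items, regardless of
# the order they were written in or how many duplicates were given.
aset = {'a', 'b', 'c', 'd', 'a', 'b'}
bset = set(['d', 'c', 'b', 'a'])

print(aset == bset)  # True
print(len(aset))     # 4

# Remove duplicates from a list by passing it through a set.
# The set's order is arbitrary, so sort the result if order matters.
names = ['bob', 'alice', 'bob', 'carol', 'alice']
unique_names = sorted(set(names))
print(unique_names)  # ['alice', 'bob', 'carol']
```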


The set() constructor accepts any iterable: an object, such as a list, a string, or a tuple, that returns its items one at a time when looped over.

In [7]:
bset = set(['a', 'b', 'c', 'd', 'a', 'b'])  # a list in Python is an iterable
cset = set('abcdab')  # a string in Python is an iterable

print(bset)
print(cset)

{'c', 'd', 'b', 'a'}
{'c', 'd', 'b', 'a'}
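Lists and strings are not the only iterables. Here is a short sketch of a few other standard ones passed to set(), plus a set comprehension. Iterating over a dictionary yields its keys, so only the keys end up in the set. The results are compared with `==` rather than printed, since a set's display order is arbitrary.

```python
# Any iterable works with set().
from_tuple = set((1, 2, 2, 3))
from_range = set(range(5))
from_generator = set(x % 3 for x in range(10))
from_dict = set({'a': 1, 'b': 2})  # iterating a dict yields its keys

# A set comprehension builds a set directly with curly braces.
from_comprehension = {x % 3 for x in range(10)}

print(from_tuple == {1, 2, 3})               # True
print(from_range == {0, 1, 2, 3, 4})         # True
print(from_generator == {0, 1, 2})           # True
print(from_dict == {'a', 'b'})               # True
print(from_comprehension == from_generator)  # True
```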


Both sets and dictionaries use curly braces, and an empty pair {} creates an empty dictionary, not a set. To create an empty set, you must use the set() constructor.

In [30]:
my_empty_set = set()
print(my_empty_set)

set()
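The difference is easy to check with type(). Here is a small sketch; the variable names are illustrative.

```python
empty_braces = {}
my_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(my_set))        # <class 'set'>

# Values can then be added to the empty set one at a time.
my_set.add(1)
print(my_set)  # {1}
```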


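Values stored in a set must be hashable. Immutable built-in types such as int, float, str, and tuple (when all of its items are hashable) are hashable, while mutable built-in types such as list, dict, and set are not. Trying to create a set that contains an unhashable value raises _TypeError_.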

In [45]:
aset = {[23, 30, 45]}  # this won't work because a list is mutable.

TypeError: unhashable type: 'list'
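A common workaround is to store an immutable equivalent instead. A tuple of hashable values can go into a set, and frozenset is the built-in immutable, hashable counterpart of set. This is a minimal sketch; the variable names are illustrative.

```python
# A tuple of immutable values is hashable, so it can be stored in a set.
points = {(23, 30, 45), (1, 2, 3), (23, 30, 45)}
print(len(points))  # 2: the duplicate tuple is kept only once

# frozenset is immutable and hashable, so it can be nested inside a set.
groups = {frozenset({'a', 'b'}), frozenset({'b', 'a'})}
print(len(groups))  # 1: both frozensets hold the same items

# Converting a list to a tuple first avoids the TypeError.
values = [23, 30, 45]
safe_set = {tuple(values)}
print(safe_set)  # {(23, 30, 45)}

# A tuple that contains a list is still unhashable.
try:
    bad = {([1, 2], 3)}
except TypeError as err:
    print(err)  # unhashable type: 'list'
```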

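**Note** a hashable value has a hash value that never changes during its lifetime and can be compared with other values. A hash is a number computed from a value by a hash function. A set uses it to decide where to store each item, so it can find the item again without scanning the whole collection. Python's built-in hash() function computes the hash that sets use. The general idea is the same as in cryptographic hash functions such as MD5, which Python provides in the hashlib module: the same input always produces the same output, and a different input almost always produces a different one.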

In [52]:
import hashlib

message = 'this is a secret message I want to send to Thailand'

md5_value = hashlib.md5(message.encode('utf8'))
print(md5_value.hexdigest())  # prints the MD5 hash as 32 hexadecimal characters

# if the message changes, so does the hash value
md5_value = hashlib.md5(message.replace('Thailand', 'Japan').encode('utf8'))
print(md5_value.hexdigest())  # prints a completely different hash

75621aac7ddd7446837579c50ae2b168
397cc360bce06ec9538e5f111c8f300e
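Sets rely on the built-in hash() rather than on MD5. The sketch below shows the property that matters: values that compare equal have equal hashes. One consequence is that 1, 1.0, and True count as the same set item. Unlike an MD5 digest, the hash of a string is randomized per process by default (controlled by the PYTHONHASHSEED environment variable), so the printed number may differ between runs.

```python
# Equal values always produce equal hashes within one running program,
# which lets a set find a value without scanning every item.
print(hash('abc') == hash('abc'))  # True
print(hash(1) == hash(1.0))        # True, because 1 == 1.0

# 1, 1.0 and True are all equal, so the set keeps only the first one.
print({1, 1.0, True})  # {1}

# The actual number for a string can change from one run to the next.
print(hash('abc'))
```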


## Accessing values

A set cannot be indexed like a list because a set is unordered, so its items have no fixed position. Trying to get a value from a set using an index raises _TypeError_.

In [8]:
cset = set('abcdab')

cset[3]

TypeError: 'set' object is not subscriptable

However, you can use the **in** operator to check whether a value is in a set.

In [12]:
cset = set('abcdab')

print('a' in cset)
print('x' in cset)

True
False
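To get values out of a set without an index, you can loop over it, convert it to a sorted list, or take one arbitrary item with next(iter(...)). This is a short sketch using only built-in functions; the variable names are illustrative.

```python
cset = set('abcdab')

# Loop over the set to visit every item (in arbitrary order).
for value in cset:
    print(value)

# Convert to a sorted list when a stable order or indexing is needed.
ordered = sorted(cset)
print(ordered[3])  # d

# Take one arbitrary item without removing it from the set.
any_item = next(iter(cset))
print(any_item in cset)  # True
```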


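Checking whether a value is in a set is much faster than checking a list. A set uses hashing to jump almost directly to where the value would be stored, so on average a lookup takes about the same time no matter how large the set is. A list, by contrast, must be scanned item by item. If you need to check many values against a list, convert the list to a set once and then search with the **in** operator. The conversion itself scans the whole list, so it pays off for repeated lookups, not for a single check.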

In [43]:
alist = list(range(10000))  # creates a list of 10,000 integers
aset = set(alist)
%timeit 9999 in aset
%timeit 9999 in alist

47.9 ns ± 2.37 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
115 µs ± 485 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


**Note** in this run, the set lookup took about 48 ns, while the list lookup took about 115 µs, roughly 2,400 times longer. Use a set when you need to search for values repeatedly.
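%timeit only works inside IPython or Jupyter. The standard-library timeit module gives a rough equivalent in a plain Python script. The sketch below also shows the "convert once, look up many times" pattern. The `requests` list is illustrative, and the measured times will vary by machine.

```python
import timeit

alist = list(range(10000))
aset = set(alist)

# timeit.timeit runs the statement `number` times and returns the total in seconds.
list_time = timeit.timeit('9999 in alist', globals={'alist': alist}, number=1000)
set_time = timeit.timeit('9999 in aset', globals={'aset': aset}, number=1000)
print(f'list: {list_time:.6f} s, set: {set_time:.6f} s')

# Convert once, then reuse the set for many lookups.
requests = [5, 12000, 9999, -1]
accepted = [r for r in requests if r in aset]
print(accepted)  # [5, 9999]
```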

Like a list, the number of items in a set can be found with the len() function.

In [14]:
cset = set('abcdab')

print(len(cset))

4
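Combining len() with set() is a quick way to count distinct values. Here is a small sketch with an illustrative sentence.

```python
sentence = 'the cat and the dog and the bird'
words = sentence.split()

print(len(words))       # 8 words in total
print(len(set(words)))  # 5 distinct words: the, cat, and, dog, bird
```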


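## Built-in methods

The add() method inserts a single value into a set. Note that the string 'xyz' is added as one item; it is not split into characters.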

In [19]:
cset = set('abcdab')

cset.add('xyz')
print(cset)

{'c', 'd', 'a', 'xyz', 'b'}
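A related method, update(), iterates over its argument and adds each item separately. For a string, that means each character becomes its own item. This sketch contrasts the two methods; the variable names are illustrative.

```python
cset = set('abcd')
cset.add('xyz')      # adds the string 'xyz' as a single item
print(len(cset))     # 5

dset = set('abcd')
dset.update('xyz')   # iterates over 'xyz' and adds 'x', 'y', 'z' separately
print(len(dset))     # 7

# Adding a value that is already present changes nothing.
dset.add('a')
print(len(dset))     # 7
```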


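The pop() method removes and returns an arbitrary item from a set. You cannot choose which item is removed.

In [20]:
cset = set('abcdab')

while cset:  # keep going until the set is empty
    value = cset.pop()
    print(value)
print(cset)  # the set is now empty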

c
d
b
a
set()
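To remove a specific value instead of an arbitrary one, sets provide remove() and discard(). The difference is what happens when the value is missing: remove() raises KeyError, while discard() does nothing. Calling pop() on an empty set also raises KeyError. This is a minimal sketch.

```python
cset = set('abcd')

cset.remove('a')     # removes 'a'; raises KeyError if the value is missing
cset.discard('x')    # removes 'x' if present; does nothing otherwise
print(sorted(cset))  # ['b', 'c', 'd']

try:
    cset.remove('x')
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'x'

empty = set()
try:
    empty.pop()
except KeyError:
    print('pop on an empty set raises KeyError')
```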


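A set comes with methods for common set operations. union() returns the items that are in either set, intersection() returns the items that are in both sets, and difference() returns the items that are in the first set but not in the second.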

In [22]:
aset = set('abcdefg')
bset = set('defghi')

print(aset.union(bset))
print(bset.intersection(aset))
print(aset.difference(bset))

{'e', 'c', 'i', 'd', 'a', 'f', 'g', 'h', 'b'}
{'e', 'f', 'g', 'd'}
{'c', 'b', 'a'}
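The same operations are also available as operators, along with a few related checks. The sketch below compares each operator with its method. One practical difference: the methods accept any iterable (for example `aset.union('xyz')`), while the operators require both operands to be sets.

```python
aset = set('abcdefg')
bset = set('defghi')

# Operator equivalents of the methods above.
print((aset | bset) == aset.union(bset))         # True
print((aset & bset) == aset.intersection(bset))  # True
print((aset - bset) == aset.difference(bset))    # True

# Symmetric difference: items in exactly one of the two sets.
print(sorted(aset ^ bset))  # ['a', 'b', 'c', 'h', 'i']

# Subset and disjointness tests.
print(set('de') <= aset)            # True, same as set('de').issubset(aset)
print(aset.isdisjoint(set('xyz')))  # True, no items in common
```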


In [24]:
aset = set('abcdefg')
bset = set('defghi')
aset.update(bset)  # update() adds every item of bset to aset
print(aset)

{'e', 'c', 'i', 'd', 'a', 'f', 'g', 'h', 'b'}


In [26]:
aset = set('abcdefg')
bset = set('defghi')

aset.difference_update(bset)  # removes from aset every item that is also in bset
print(aset)

{'c', 'a', 'b'}


In [27]:
aset = set('abcdefg')
bset = set('defghi')

aset.intersection_update(bset)  # keeps only the items that are also in bset
print(aset)  # only the common values remain

{'e', 'f', 'g', 'd'}


**Note** all the update methods (update(), difference_update(), and intersection_update()) modify a set in place and return None instead of a new set.
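Because these methods return None, writing `aset = aset.intersection_update(bset)` would replace the set with None. The augmented assignment operators `|=`, `-=` and `&=` are in-place equivalents of the update methods. This is a short sketch; the variable names are illustrative.

```python
aset = set('abcdefg')
bset = set('defghi')

result = aset.intersection_update(bset)
print(result)        # None: the method changes aset instead of returning a new set
print(sorted(aset))  # ['d', 'e', 'f', 'g']

# The augmented assignment operators also modify the set in place.
xset = set('abc')
xset |= set('cd')    # like xset.update(set('cd'))
xset -= set('a')     # like xset.difference_update(set('a'))
xset &= set('bcdz')  # like xset.intersection_update(set('bcdz'))
print(sorted(xset))  # ['b', 'c', 'd']
```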

In [28]:
aset = set('abcdefg')

aset.clear()  # clear out the set

print(aset)

set()
