# Introduction to sets

## Objectives/Overview

* What are sets?
* Why use sets?
* Working with sets
  * Basic syntax
  * Manipulating sets
  * Set operations

# What are sets

In Python, a set is an object that is made up of an unordered collection of unique elements.

That means, similar to lists and dictionaries, sets are objects made up of other elements. But, unlike lists, elements in sets have no order. And, unlike dictionaries, elements in sets are not associated with any corresponding value. Furthermore, whereas a list can contain the same item several times and a dictionary can store the same value under several keys, a set cannot have more than one of the same element.

Individual elements of sets must be “hashable” (meaning each element must have a hash value that never changes during its lifetime). Thus, of the built-in types, only immutable objects (numbers, strings, and tuples whose items are all hashable, but not lists, dictionaries, or other sets) can be elements of sets.
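As a minimal sketch of the hashability rule, the snippet below tries to add several kinds of objects to a set. It is written to run under both Python 2.7 and Python 3 (hence `print(...)` with a single argument); the variable names are only illustrative.

```python
# Hashable objects can be added to a set.
elements = set()
elements.add(42)          # a number
elements.add("spam")      # a string
elements.add((1, 2))      # a tuple of hashable items

# A list is mutable and unhashable, so adding it raises TypeError.
try:
    elements.add([1, 2])
    list_error = None
except TypeError as error:
    list_error = error

# A tuple that contains a list is unhashable too.
try:
    elements.add((1, [2, 3]))
    tuple_error = None
except TypeError as error:
    tuple_error = error

print(len(elements))              # 3
print(list_error is not None)     # True
print(tuple_error is not None)    # True
```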

Python also provides “frozen sets”, which are immutable once created. Because they are immutable and hashable, frozen sets can actually be elements of other sets as well.

Sets are a unique datatype with inherent advantages and disadvantages. Which brings us to our next section…


# Why use sets

For one, speed! Unlike lists, sets are built on hash tables. Thus, checking whether a value is in a set takes roughly the same time no matter how large the set is, whereas checking a list means scanning its items one by one.
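A rough way to see the difference is to time the same membership test on a list and on a set with the standard `timeit` module. This is only a sketch: the data and the variable names are made up for illustration, and the measured times will vary from machine to machine.

```python
import timeit

# Hypothetical test data: the same 10,000 integers stored both ways.
numbers_list = list(range(10000))
numbers_set = set(numbers_list)
target = 9999  # worst case for the list: the last item

# Both containers give the same answer...
print(target in numbers_list)   # True
print(target in numbers_set)    # True

# ...but the list is scanned item by item, while the set uses a hash lookup.
list_seconds = timeit.timeit(lambda: target in numbers_list, number=200)
set_seconds = timeit.timeit(lambda: target in numbers_set, number=200)
print(list_seconds)
print(set_seconds)
```

On a typical machine the set lookup should come out far faster, and the gap grows with the size of the collection.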

For two, memory! Unlike dictionaries, sets do not store a value alongside each key. Thus, a set typically takes up less memory than a dictionary with the same keys.

So, sets are a useful datatype to use in situations that require checking for membership, don’t need to keep track of item order or duplicate items, and have hashable elements.

# Working with Sets: Basic Syntax

## The **set()** Function

Sets can be declared by passing an iterable (such as a list, string, or tuple) through the **set()** function.

The order in which the elements are entered is unimportant, and duplicate elements are discarded.

In [20]:
print set(["h", "e", "l", "l", "o"])
print set(["o", "l", "e", "h"])
print set(["o", "l", "e", "h"]) == set(["h", "e", "l", "l", "o"])

set(['h', 'e', 'l', 'o'])
set(['h', 'e', 'l', 'o'])
True


Passing a string through the **set()** function will create a set of its characters.

In [21]:
print set("hello")

set(['h', 'e', 'l', 'o'])


Frozen sets are declared using the **frozenset()** function. Although they cannot be modified like sets, a frozen set compares equal to a set that has the same elements.

In [22]:
print frozenset(["h", "e", "l", "l", "o"]) == set(["h", "e", "l", "l", "o"])

True
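The following sketch illustrates what immutability means for frozen sets in practice: they lack the mutating methods, and they can be stored inside other sets. It is written to run under both Python 2.7 and Python 3; the variable names are illustrative.

```python
letters = frozenset(["h", "e", "l", "l", "o"])

# Frozen sets have no mutating methods such as .add().
print(hasattr(letters, "add"))      # False

# Because they are hashable, frozen sets can be elements of a set...
nested = set([letters, frozenset("hi")])
print(letters in nested)            # True

# ...whereas an ordinary set cannot.
try:
    set([set("hello")])
    set_error = None
except TypeError as error:
    set_error = error
print(set_error is not None)        # True
```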


## Curly Brackets

In newer versions of Python (2.7 and after), sets can be declared using curly brackets.

In [23]:
print {"h", "e", "l", "l", "o"}

set(['h', 'e', 'l', 'o'])


This can be handy because it requires less typing. However, curly brackets cannot express empty sets, and will not transform strings into sets of their characters. So be careful!

In [24]:
print type(set())
print type({})

<type 'set'>
<type 'dict'>


In [25]:
a = "hello"
print set(a)
print {a}

set(['h', 'e', 'l', 'o'])
set(['hello'])


## Working with Sets: Manipulating (non-Frozen) Sets

Non-frozen sets can be manipulated with a number of methods for adding or removing elements:

### Add

The **.add()** method adds a single hashable element to the set. Adding an element that is already in the set has no effect.

In [26]:
a = set([1, 2, 3])
a.add(4)
a

{1, 2, 3, 4}

### Remove

The **.remove()** method removes the given element from the set. It returns **None**, and raises a **KeyError** when the element isn't found.

In [27]:
a = set([1, 2, 3, 4])
a.remove(4)
a

{1, 2, 3}

In [28]:
a = set([1, 2, 3])
a.remove(4)
a

KeyError: 4

### Discard

The **.discard()** method removes the given element from the set. Like **.remove()**, it returns **None**, but unlike **.remove()**, it does not raise a **KeyError** when the element isn't found.

In [29]:
a = set([1, 2, 3, 4])
a.discard(4)
a

{1, 2, 3}

In [30]:
a = set([1, 2, 3])
a.discard(4)
a

{1, 2, 3}

### Clear

The **.clear()** method empties the set.

In [31]:
a = set([1, 2, 3])
a.clear()
a

set()

### Update

The **.update()** method adds to the original set any elements of another iterable (such as a set, list, tuple, or string) that it does not already contain.

In [32]:
a = set([1, 2, 3])
a.update([1, 2, 3, 4])
a

{1, 2, 3, 4}
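As a small sketch, **.update()** also accepts several iterables in one call, and they need not be of the same type. The snippet runs under both Python 2.7 and Python 3.

```python
a = set([1, 2, 3])

# One call can take several iterables of different types.
a.update([3, 4], (5, 6), "ab")
print(a == set([1, 2, 3, 4, 5, 6, "a", "b"]))   # True

# .update() changes the set in place and returns None.
result = a.update([7])
print(result)                                    # None
print(7 in a)                                    # True
```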

### Pop

The **.pop()** method removes an arbitrary element from the set and returns it. It raises a **KeyError** when the set is empty.

In [33]:
a = set([1, 2, 3, 4])
print a.pop()
a

1


{2, 3, 4}
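The sketch below drains a set with **.pop()** and then shows what happens when the set is empty. Since the order in which elements come out is arbitrary, the results are sorted before printing. It runs under both Python 2.7 and Python 3.

```python
a = set([1, 2, 3])
popped = []

# Keep popping until the set is empty; the order is arbitrary.
while a:
    popped.append(a.pop())
print(sorted(popped))          # [1, 2, 3]
print(len(a))                  # 0

# Popping from an empty set raises KeyError.
try:
    a.pop()
    pop_error = None
except KeyError as error:
    pop_error = error
print(pop_error is not None)   # True
```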

## Working with Sets: Set Operations

Sets in Python support several operations for testing a value's membership in a set or for constructing new sets from given sets:

### In and Not In

The **in** and **not in** tests will test for an element's membership or non-membership.

In [34]:
a = set([1, 2, 3, 4])
print 1 in a
print 1 not in a

True
False


### Union

The union operation, performed with the **.union()** method or **|**, will create a new set containing every element that appears in any of the given sets.

In [35]:
a = set([1, 2, 3, 4])
b = set([5, 6, 7, 8])
print a.union(b)
print a|b

set([1, 2, 3, 4, 5, 6, 7, 8])
set([1, 2, 3, 4, 5, 6, 7, 8])


Union operations can be performed on more than two sets at a time:

In [36]:
a = set([1, 2, 3, 4])
b = set([5, 6, 7, 8])
c = set([9, 10, 11, 12])
print a.union(b, c)
print a|b|c

set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
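One practical difference between the two spellings is worth a quick sketch: the **.union()** method accepts any iterable, while the **|** operator requires both operands to be sets. The same holds for the other set operations and their operators. The snippet runs under both Python 2.7 and Python 3.

```python
a = set([1, 2, 3, 4])

# The method converts its argument for us.
print(a.union([4, 5, 6]) == set([1, 2, 3, 4, 5, 6]))   # True

# The operator insists that both operands are sets.
try:
    a | [4, 5, 6]
    operator_error = None
except TypeError as error:
    operator_error = error
print(operator_error is not None)                       # True

# The original set is left unchanged by either form.
print(a == set([1, 2, 3, 4]))                           # True
```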


### Intersection

The intersection operation, performed with the **.intersection()** method or **&**, will create a new set containing only the elements common to all of the given sets.

In [37]:
a = set([1, 2, 3, 4, 5])
b = set([2, 3, 4, 5, 6])
print a.intersection(b)
print a&b

set([2, 3, 4, 5])
set([2, 3, 4, 5])


Like the union operation, intersection operations can also be performed on more than two sets at a time:

In [38]:
a = set([1, 2, 3, 4, 5])
b = set([2, 3, 4, 5, 6])
c = set([3, 4, 5, 6, 7])
print a.intersection(b, c)
print a&b&c

set([3, 4, 5])
set([3, 4, 5])


### Difference

The difference operation, performed using **.difference()** or **-**, will create a new set with elements in the first set, but not in the second set.

In [39]:
a = set([1, 2, 3, 4, 5])
b = set([2, 3, 4, 5, 6])
print a.difference(b)
print a-b

set([1])
set([1])


Like unions and intersections, it can be performed on more than two sets at a time.

In [40]:
a = set([1, 2, 3, 4, 5])
b = set([2, 3, 4, 5, 6])
c = set([3, 4, 5, 6, 7])
print a.difference(b, c)
print a-b-c

set([1])
set([1])
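Unlike union and intersection, difference depends on the order of its operands. The short sketch below shows this, and also that subtracting several sets gives the same result as subtracting their union. It runs under both Python 2.7 and Python 3.

```python
a = set([1, 2, 3, 4, 5])
b = set([2, 3, 4, 5, 6])
c = set([3, 4, 5, 6, 7])

# Swapping the operands changes the result.
print(sorted(a - b))   # [1]
print(sorted(b - a))   # [6]

# Subtracting several sets is the same as subtracting their union.
print(a.difference(b, c) == a - (b | c))   # True
```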


### Symmetric Difference

The symmetric difference operation, performed using **.symmetric_difference()** or **^**, will create a new set of elements in one set or the other, but not in both. Unlike the other methods, **.symmetric_difference()** accepts only one argument, so it cannot be called on more than two sets at a time (the **^** operator, however, can be chained).

In [41]:
a = set([1, 2, 3, 4, 5])
b = set([3, 4, 5, 6, 7])
print a.symmetric_difference(b)
print a^b

set([1, 2, 6, 7])
set([1, 2, 6, 7])


### Other Operations

For a more complete list of set operations, read on here:
https://docs.python.org/2/library/stdtypes.html#set