# Sets

## Lesson Overview

The set is a data structure that is common in both computer science and mathematics. 

If you have encountered a [set in mathematics](https://en.wikipedia.org/wiki/Finite_set), a [set in computer science](https://en.wikipedia.org/wiki/Set_(abstract_data_type)) is the same concept.

### Definition

> A **set** is an unordered collection of unique objects.

Sets collect unique objects. A set has no native order and cannot contain duplicate objects.

For example, you could use a set to keep track of the players in a basketball team. A set is a good choice since there is no obvious ordering for the players, and a team cannot contain the same player more than once.

### Initialization

Sets are generally initialized in one of two ways.

In [None]:
# Option 1: Using curly brackets.
my_set = {0, 2, 4, 6, 8}
print(my_set)

# Option 2: Using the `set` constructor with an array input.
your_set = set([0, 2, 4, 6, 8])
print(your_set)

Be careful though: the *only* correct way to initialize an empty set is with `set()`. Assigning a variable to `{}` initializes an empty *map* (called a dictionary in Python), which is a different data structure that we will cover in a later lesson.

In [None]:
empty_set = set()
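
As a quick check of the point above, the following sketch uses Python's built-in `type` function to show that `{}` creates a dictionary while `set()` creates an empty set. The variable name `empty_braces` is made up for this illustration.

```python
# `{}` creates an empty dict (Python's map type), not a set.
empty_braces = {}
print(type(empty_braces))

# `set()` creates an empty set.
empty_set = set()
print(type(empty_set))
print(len(empty_set))
```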

Sets typically contain elements of only one type. Languages differ in how they implement sets, and some (including Python) allow different types within the same set. For this lesson, we will assume that a set can only contain one type.

### Properties

Let's look at some properties of sets. 

You can iterate through a set using `for`.

In [None]:
my_set = {0, 2, 4, 6, 8}
for i in my_set:
  print(i)

---

If you try to create a set from an array that has duplicate elements, the duplicate elements will be removed.

In [None]:
print({0, 0, 1, 1, 2, 2})
print(set([3, 4, 3, 4, 3, 4]))

---

Sets do not have an ordering.

In [None]:
ordered_set = {0, 1, 2, 3, 4}
jumbled_set = {1, 4, 3, 0, 2}

print(ordered_set == jumbled_set)

---

A consequence of sets not having an ordering is that elements cannot be accessed using indexing.

In [None]:
my_set = {0, 2, 4, 6, 8}
print(my_set[0]) # raises a TypeError
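
Indexing a set raises a `TypeError`. If you do need positional access, one option (a sketch, not the only approach) is to convert the set to a sorted list first. The name `as_list` is chosen only for this example.

```python
my_set = {0, 2, 4, 6, 8}

try:
  my_set[0]
except TypeError as error:
  # Sets do not support indexing, so Python raises a TypeError.
  print("TypeError:", error)

# sorted() returns a new list, which does support indexing.
as_list = sorted(my_set)
print(as_list[0])
```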

### Methods

Let's look at some of the common set methods. 

Sets come with an `add` method, which adds a new element to a set *if it does not already exist in the set*.

In [None]:
my_set = {0, 2, 4, 6, 8}

# Add a new element to the set.
my_set.add(10)
print(my_set)

# Try to re-add an existing element to the set.
my_set.add(10)
print(my_set)

---

Similarly, sets come with a `remove` method, which removes an element if it exists in the set.

In [None]:
my_set = {0, 2, 4, 6, 8}

my_set.remove(8)
print(my_set)

---

If you try to remove an element that does not exist, Python will raise a `KeyError`.

In [None]:
my_set = {0, 2, 4, 6, 8}

my_set.remove(10)
print(my_set)
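
There are a couple of common ways to avoid this error. The sketch below shows two of them: checking membership with `in` before calling `remove`, and using the built-in `discard` method, which removes an element if it is present and does nothing otherwise.

```python
my_set = {0, 2, 4, 6, 8}

# Option 1: check membership before removing.
if 10 in my_set:
  my_set.remove(10)

# Option 2: discard never raises an error for a missing element.
my_set.discard(10)
my_set.discard(8)
print(my_set)
```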

---

An extension of the `add` method is `update`, which allows you to add multiple items to a set.

In [None]:
my_set = {0, 2, 4, 6, 8}

my_set.update({10, 12})
print(my_set)

---

If you try to `add` to or `update` a set with elements that are already contained, only the new elements will be added.

In [None]:
my_set = {0, 2, 4, 6, 8}

my_set.update([6, 8, 10, 12])
print(my_set)

### Note about exercises

> NOTE: The exercises in this lesson can be solved more efficiently using built-in Python functions. However, we encourage you to attempt the exercise using a low-level solution, that is, with iteration only and no built-in Python methods. Once you have solved the problems with the low-level solution, you can also implement the high-level solution. The high-level solutions will be described in the Solution sections.

The reasoning behind this is that many lower-level languages (e.g., C, C++, Java) do not have some of the built-in functionality that Python provides. It is useful to know the lower-level implementation of basic code constructions.

## Question 1

Which *one* of the following best defines a set?

**a)** An ordered collection of unique objects

**b)** An ordered collection of objects

**c)** An unordered collection of unique objects

**d)** An unordered collection of objects

### Solution

The correct answer is **c)**.

**a)** A set has no ordering.

**b)** A set has no ordering; this is the definition of an array.

**d)** This is almost correct, but this definition could include duplicate objects.

## Question 2

Which of the following are correct ways to create a variable `s` equal to the set containing the elements 0, 1, and 2? There may be more than one correct response.

**a)** `s = {0, 1, 2}`

**b)** `s = [0, 1, 2]`

**c)** `s = {2, 0, 1}`

**d)** `s = set([0, 1, 2])`

**e)** `s = (0, 1, 2)`

**f)** `s = set([0, 2, 1, 2, 1, 0, 0, 2])`

### Solution

The correct answers are **a)**, **c)**, **d)**, and **f)**.

**b)** This notation creates an array, not a set.

**e)** This notation creates a tuple, not a set.

## Question 3

Which of the following statements about the differences between an array and a set are true? There may be more than one correct response.

**a)** An array has an ordering while a set does not.

**b)** Both arrays and sets have indexing by which elements and slices can be accessed.

**c)** Both arrays and sets can be iterated over using a `for` or a `while` loop.

**d)** Since sets do not have an ordering, it is impossible to calculate the length of a set.

**e)** The equivalent of `append` for an array is `add` for a set.

### Solution

The correct answers are **a)**, **c)**, and **e)**.

**b)** Since sets do not have an ordering, elements in a set cannot be accessed by indexing.

**d)** The length of a set is the number of elements in it.

## Question 4

In which of the following use cases would a set be the most appropriate data structure to use? There may be more than one correct response.

**a)** A collection of all the unique words used in a book

**b)** A record of the ages (in years) of all the students in a class

**c)** A lookup table mapping each student in a school to an emergency contact number

**d)** Listing all of the numbers between 1 and 100

### Solution

The correct answers are **a)** and **d)**.

**b)** This can have duplicate entries if two students have the same age, so an array is likely the best choice.

**c)** Another data structure called a *map*, which will be covered in a later lesson, is the most appropriate here.

## Question 5

Use iteration to write a `set_length` function that counts the number of items in a set.

In [None]:
def set_length(s):
  """Calculate the number of items in a set."""
  # TODO(you): Implement
  print("This function has not been implemented.")

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
my_set = {0, 1}
print(set_length(my_set))
# Should print: 2

my_set.add(2)
print(set_length(my_set))
# Should print: 3

### Solution

We iterate through the set, incrementing a counter by one at each iteration.

In [None]:
def set_length(s):
  """Calculate the number of items in a set."""
  count = 0
  for i in s:
    # Sets are already de-duplicated, so we don't need to check for duplicates.
    count += 1
  return count

We can also use Python's `len` function to check a set's length.

In [None]:
my_set = {0, 1}
len(my_set)

## Question 6

Use iteration to create a `set_contains` function, which checks whether a set contains an element.

In [None]:
def set_contains(s, element):
  """Checks if the set contains the given element."""
  # TODO(you): Implement
  print("This function has not been implemented.")

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
my_set = {0, 2, 4, 6, 8}

print(set_contains(my_set, 0))
# Should print: True

print(set_contains(my_set, 10))
# Should print: False

### Solution

We iterate through the set, and each time we check to see if the element matches the element we are looking for.

In [None]:
def set_contains(s, e):
  """Checks if the set s contains the given element e."""
  for i in s:
    if i == e:
      return True
  return False

In Python we can check if an element is in a set using `in`.

In [None]:
my_set = {0, 2, 4, 6, 8}
print(0 in my_set)
print(10 in my_set)

## Question 7

Use iteration to extend the `remove` method to a `remove_multiple` function, which removes all elements in a new set from an existing set. (This is the same extension as `update` is to `add`.)

In [None]:
def remove_multiple(original_set, set_to_remove):
  """Remove all elements in set_to_remove from original_set."""
  # TODO(you): Implement
  print("This function has not been implemented.")

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
set1 = {0, 1, 2, 3, 4}
set2 = {3, 4, 5, 6, 7}

print(remove_multiple(set1, set2))
# Should print: {0, 1, 2}

### Solution

We iterate through `set_to_remove` and remove each element from `original_set` *only if it exists* in `original_set`.

In [None]:
def remove_multiple(original_set, set_to_remove):
  """Remove all elements in set_to_remove from original_set."""
  for i in set_to_remove:
    # Since remove throws an error if the element is not in the set, we first
    # need to check if the element we want to remove exists in the set.
    if i in original_set:
      original_set.remove(i)
  return original_set

Again, Python has a built-in method to do this: `difference_update`. We will discuss "set difference" in more depth later in this lesson.

In [None]:
set1 = {0, 1, 2, 3, 4}
set2 = {3, 4, 5, 6, 7}
set1.difference_update(set2)
print(set1)
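
Note that both the iterative solution and `difference_update` modify the original set in place. If you want to leave the original set unchanged, one option is to work on a copy, as in the sketch below. The function name `remove_multiple_copy` is made up for this example.

```python
def remove_multiple_copy(original_set, set_to_remove):
  """Return a new set with the elements of set_to_remove removed."""
  output = original_set.copy()
  for i in set_to_remove:
    if i in output:
      output.remove(i)
  return output

set1 = {0, 1, 2, 3, 4}
set2 = {3, 4, 5, 6, 7}
result = remove_multiple_copy(set1, set2)
print(result)
print(set1)  # set1 still contains all five original elements.
```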

## Question 8

You and your roommate both have grocery shopping lists. Both of these shopping lists are stored in sets of strings. Instead of going shopping and having to look at both lists, you want to consolidate the two lists into one. You don't care about duplicates, so you decide to take the `union` of both sets.

[Union](https://en.wikipedia.org/wiki/Union_(set_theory)) is a common set operator that takes elements of multiple sets and returns one set that contains the elements that appear in *any* of the input sets.

The union of sets $A$ and $B$ is denoted by $A \cup B$. For example, $\{1, 2, 3\} \cup \{2, 6, 5, 1\} = \{1, 2, 3, 5, 6\}$. (Note the ordering of all of these sets does not matter, since sets are unordered.)

Union can be extended from two sets to three sets as follows:

$$ A \cup B \cup C = (A \cup B) \cup C$$

The union operation can be extended to an arbitrary number of sets. In fact, the way we group the sets when calculating a union doesn't matter; union is an [associative operation](https://en.wikipedia.org/wiki/Associative_property). Therefore, we can calculate the union of multiple sets by taking the union of any two sets, then union-ing the result with another set, then another set, etc., until we have unioned all sets.

Create a `union` function to take the union of two sets of strings. You may use any functions and methods in this lesson.

In [None]:
def union(set1, set2):
  """Return the set union of two sets."""
  # TODO(you): Implement
  print("This function has not been implemented.")

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
my_shopping_list = {
    "apples",
    "chocolate",
    "salt",
}

roommates_shopping_list = {
    "chocolate",
    "apples",
    "popcorn"
}

union(my_shopping_list, roommates_shopping_list)
# Should print: {"apples", "chocolate", "salt", "popcorn"}

### Solution

This is just rewriting a method as a function! We can use `update` to extend `set1` by `set2`. It is best practice to save this in a new set using `copy`, so we don't overwrite `set1` (in case we need to use `set1` later).

In [None]:
def union(set1, set2):
  """Return the set union of two sets."""
  output = set1.copy()
  output.update(set2)
  return output

Again, Python has built-in functionality for this, `set.union`.

In [None]:
my_shopping_list = {
    "apples",
    "chocolate",
    "salt",
}

roommates_shopping_list = {
    "chocolate",
    "apples",
    "popcorn"
}
set.union(my_shopping_list, roommates_shopping_list)

In fact, `set.union` allows us to calculate the union of as many sets as we need.

In [None]:
another_shopping_list = {
    "tequila",
    "salt",
    "lime",
}

set.union(my_shopping_list, roommates_shopping_list, another_shopping_list)
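
To check the associativity claim from the question, the sketch below computes the union of three sets with two different groupings and compares both to a single call. The short names `a`, `b`, and `c` are used here only for illustration.

```python
a = {"apples", "chocolate", "salt"}
b = {"chocolate", "apples", "popcorn"}
c = {"tequila", "salt", "lime"}

# (A ∪ B) ∪ C
left_first = set.union(set.union(a, b), c)
# A ∪ (B ∪ C)
right_first = set.union(a, set.union(b, c))
# All three at once.
all_at_once = set.union(a, b, c)

print(left_first == right_first == all_at_once)
```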

## Question 9

You and your friend are taking part in an eating competition. You are trying to eat at as many local restaurants as you can. You are each storing the restaurants where you've eaten in a set.

The two of you decide you want to see how many restaurants you have both been to. Use iteration to write a function `number_in_common` that takes two restaurant sets and returns the number of restaurants common to both sets.

In [None]:
def number_in_common(set1, set2):
  """Calculates the number of items in common between two sets."""
  # TODO(you): Implement
  print("This function has not been implemented.")

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
my_restaurants = {
    "Say Cheez",
    "Lots o' Pasta",
    "Curry in a Hurry",
}

friends_restaurants = {
    "Curry in a Hurry",
    "Steiner's Diner",
    "Lots o' Pasta",
    "Loxs of Bagels",
}

number_in_common(my_restaurants, friends_restaurants)
# Should print: 2

### Solution

As with previous methods, there is no need to worry about duplicates! We can just loop through each possible combination and increment by 1 if the pair is a match.

In [None]:
def number_in_common(set1, set2):
  """Calculates the number of items in common between two sets."""
  output = 0

  for i in set1:
    for j in set2:
      if i == j:
        output += 1
  
  return output
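
The nested loops compare every pair of elements. A possible variant, sketched below, keeps a single loop and uses `in` for the membership check instead. The name `number_in_common_single_loop` is hypothetical and used only to distinguish it from the solution above.

```python
def number_in_common_single_loop(set1, set2):
  """Calculates the number of items in common between two sets."""
  output = 0
  for i in set1:
    # `in` checks membership without an explicit inner loop.
    if i in set2:
      output += 1
  return output

my_restaurants = {"Say Cheez", "Lots o' Pasta", "Curry in a Hurry"}
friends_restaurants = {
    "Curry in a Hurry",
    "Steiner's Diner",
    "Lots o' Pasta",
    "Loxs of Bagels",
}
common_count = number_in_common_single_loop(my_restaurants, friends_restaurants)
print(common_count)
```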

## Question 10

Another fundamental set operator, along with union, is intersection. The [intersection](https://en.wikipedia.org/wiki/Intersection_(set_theory)) of multiple sets is the set of all the elements that are common to all of the sets.

The intersection of sets $A$ and $B$ is denoted by $A \cap B$. For example, $\{1, 2, 3\} \cap \{2, 6, 5, 1\} = \{1, 2\}$.

Intersection has the same properties as set union; it is [associative](https://en.wikipedia.org/wiki/Associative_property) and can be applied to any number of sets.

Whether you know it or not, you have already done most of the work you need in order to write an `intersection` function. Edit your `number_in_common` function from Question 9 to return a set of the restaurants in common, instead of just counting the restaurants.

In [None]:
def intersection(set1, set2):
  """Return the set intersection of two sets."""
  # TODO(you): Implement
  print("This function has not been implemented.")

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
my_restaurants = {
    "Say Cheez",
    "Lots o' Pasta",
    "Curry in a Hurry",
}

friends_restaurants = {
    "Curry in a Hurry",
    "Steiner's Diner",
    "Lots o' Pasta",
    "Loxs of Bagels",
}

intersection(my_restaurants, friends_restaurants)
# Should print: {"Lots o' Pasta", "Curry in a Hurry"}

### Solution

Instead of incrementing a count to indicate the number of elements the two sets have in common, we add each common element to an `output` set.

In [None]:
def intersection(set1, set2):
  """Return the set intersection of two sets."""
  output = set()

  for i in set1:
    for j in set2:
      if i == j:
        output.add(i)

  return output

Similar to `union`, Python has an `intersection` method. This also works for any number of sets.

In [None]:
my_restaurants = {
    "Say Cheez",
    "Lots o' Pasta",
    "Curry in a Hurry",
}

friends_restaurants = {
    "Curry in a Hurry",
    "Steiner's Diner",
    "Lots o' Pasta",
    "Loxs of Bagels",
}

set.intersection(my_restaurants, friends_restaurants)

In [None]:
more_restaurants = {
    "Joey's Hoagies",
    "Curry in a Hurry",
    "Sherry's Deli",
    "Steiner's Diner",
    "Lots and LOTS o' Pasta",
}

set.intersection(my_restaurants, friends_restaurants, more_restaurants)

The `number_in_common` function we wrote in an earlier exercise can be simplified using the `len` and `intersection` functions. Calculating the `len` of the `intersection` of multiple sets is equivalent to calculating the number of elements in common.

In [None]:
print(len(set.intersection(my_restaurants, friends_restaurants)))
print(len(set.intersection(
    my_restaurants, friends_restaurants, more_restaurants)))

## Question 11

The last core set operation is the [set difference](https://en.wikipedia.org/wiki/Complement_(set_theory)), also known as the relative complement. This can be used to see which elements in a set are *not* in another set.

The set difference between $A$ and $B$ is denoted by $A \backslash B$. For example, $\{1, 2, 3\} \backslash \{2, 6, 5, 1\} = \{3\}$. Unlike union and intersection, set difference is not commutative. That is, $A \backslash B$ is not necessarily the same as $B \backslash A$. In this case, $B \backslash A = \{6, 5\}$.

We can use set difference to see which restaurants you visited that your friend did not, and which restaurants your friend visited that you did not.

Write a function called `difference` that takes two restaurant sets and returns the set difference. Remember you can use any previously mentioned methods and functions. (However, for now do not use the built-in Python functionalities that were mentioned only in solutions.)

In [None]:
def difference(set1, set2):
  """Calculate the set difference of set1 and set2 (set1 minus set2)."""
  # TODO(you): Implement
  print("This function has not been implemented.")

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
my_restaurants = {
    "Say Cheez",
    "Lots o' Pasta",
    "Curry in a Hurry",
}

friends_restaurants = {
    "Curry in a Hurry",
    "Steiner's Diner",
    "Lots o' Pasta",
    "Loxs of Bagels",
}

print(difference(my_restaurants, friends_restaurants))
# Should print: {"Say Cheez"}
print(difference(friends_restaurants, my_restaurants))
# Should print: {"Steiner's Diner", "Loxs of Bagels"}

### Solution

This function can be written with the `remove` method. We need to remove all the elements in the second set from the first set, if they exist. Remember we should make a `copy` of `set1` before tampering with it, so we don't mess with the original.

In [None]:
def difference(set1, set2):
  """Calculate the set difference between set1 and set2."""
  output = set1.copy()

  for i in set2:
    if i in set1:
      output.remove(i)
  
  return output

Again, this step can be done using the built-in method `difference`.

In [None]:
my_restaurants = {
    "Say Cheez",
    "Lots o' Pasta",
    "Curry in a Hurry",
}

friends_restaurants = {
    "Curry in a Hurry",
    "Steiner's Diner",
    "Lots o' Pasta",
    "Loxs of Bagels",
}

print(my_restaurants.difference(friends_restaurants))
print(friends_restaurants.difference(my_restaurants))

## Question 12

Your friend is taking a computer science course at school. They are trying to write a function that checks whether a given set is a [subset](https://en.wikipedia.org/wiki/Subset) of another set. They explain to you that $A$ is a subset of $B$, denoted $A \subset B$, if all the elements of $A$ are elements of $B$. If $A$ is a subset of $B$, then $B$ is a *superset* of $A$, denoted $B \supset A$.

Your friend's function is meant to check whether `s` is a subset of `t`, but their code seems to be having some unexpected results. Can you find and fix the problems?

In [None]:
def is_subset(s, t):
  """Returns True if s is a subset of t."""
  for i in s:
    # Check if the element is also an element of t.
    if i in t:
      return True
  return False

Your friend finds that the function runs, but it returns `True` in cases where it should not.

### Solution

The problem with this method is that it returns too soon. As soon as it finds a single element of `s` that is also in `t`, it returns `True`, without looking at the other elements.

To fix this, instead of returning `True` as soon as we find an element of `s` in `t`, let's return `False` as soon as we find an element of `s` that is not in `t`. When we see this, we know `s` cannot be a subset of `t`. Once we have looped through all elements, if we haven't returned `False`, we can return `True`.

In [None]:
def is_subset(s, t):
  """Returns True if s is a subset of t."""
  for i in s:
    # Check if the element is not in t.
    if i not in t:
      return False
  return True

As usual, Python provides a built-in way to check if `s` is a subset of `t`: `s.issubset(t)`.
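
To see the fix in action, the sketch below runs the corrected function on a few inputs, including one where the original version wrongly returned `True` and the empty set, which is a subset of every set. The test inputs are made up for this illustration.

```python
def is_subset(s, t):
  """Returns True if s is a subset of t."""
  for i in s:
    if i not in t:
      return False
  return True

# Every element of {1, 2} is in {1, 2, 3}.
print(is_subset({1, 2}, {1, 2, 3}))  # True

# 4 is missing from the second set; the original version returned True here.
print(is_subset({1, 4}, {1, 2, 3}))  # False

# The loop body never runs for an empty set, so the result is True.
print(is_subset(set(), {1, 2, 3}))  # True

# The built-in method agrees.
print({1, 4}.issubset({1, 2, 3}))  # False
```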

## Question 13

Your friend is now covering a new concept: [mutual exclusivity](https://en.wikipedia.org/wiki/Mutual_exclusivity). Two sets are mutually exclusive, also called disjoint, if they have no elements in common.

Your friend is writing a function `is_disjoint` to check whether two sets of integers are disjoint. 

Their code is below, but it returns `False` whenever `s` is non-empty:

In [None]:
def is_disjoint(s, t):
  """Returns True if s and t are disjoint (mutually exclusive)."""
  for i in s:
    if i in s:
      return False
  return True

### Solution

This mistake is likely just a typo. The condition `if i in s` is always true because `i` comes from `s`; it should be `if i in t`.

In [None]:
def is_disjoint(s, t):
  """Returns True if s and t are disjoint (mutually exclusive)."""
  for i in s:
    if i in t:
      return False
  return True

This check can also be done using the built-in Python method `s.isdisjoint(t)`.
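
As a quick sanity check, the sketch below runs the corrected function on one disjoint pair and one overlapping pair, and compares the results with the built-in method. The input sets are made up for this illustration.

```python
def is_disjoint(s, t):
  """Returns True if s and t are disjoint (mutually exclusive)."""
  for i in s:
    if i in t:
      return False
  return True

# No elements in common.
print(is_disjoint({1, 2, 3}, {4, 5}))  # True

# 3 appears in both sets.
print(is_disjoint({1, 2, 3}, {3, 4}))  # False

# The built-in method gives the same answers.
print({1, 2, 3}.isdisjoint({4, 5}))  # True
print({1, 2, 3}.isdisjoint({3, 4}))  # False
```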

## Question 14

Can you think of a way to write `is_disjoint` that does not involve any loops (or any special built-in Python functionality), and instead uses code from this lesson?

In [None]:
def is_disjoint(s, t):
  """Returns True if s and t are disjoint (mutually exclusive)."""
  # TODO(you): Implement

### Solution

Of course we can! Two sets are disjoint if they have no elements in common.

In [None]:
def is_disjoint(s, t):
  return len(set.intersection(s, t)) == 0
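
If you would rather reuse your own code than the built-in `set.intersection`, one possible variation is to build on `number_in_common` from Question 9: two sets are disjoint exactly when they have zero elements in common. `number_in_common` is repeated here so the sketch runs on its own, and the name `is_disjoint_reuse` is hypothetical.

```python
def number_in_common(set1, set2):
  """Calculates the number of items in common between two sets."""
  output = 0
  for i in set1:
    for j in set2:
      if i == j:
        output += 1
  return output

def is_disjoint_reuse(s, t):
  """Returns True if s and t have no elements in common."""
  # The loops live inside number_in_common, not in this function.
  return number_in_common(s, t) == 0

print(is_disjoint_reuse({1, 2, 3}, {4, 5}))
print(is_disjoint_reuse({1, 2, 3}, {3, 4}))
```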