# Sets & Frozensets

## Programming Fundamentals (NB12)

### MIEIC/2019-20

#### João Correia Lopes

INESC TEC, FEUP

## Goals

By the end of this class, the student should be able to:

- Distinguish between mutable and immutable datatypes and describe aliasing
- Distinguish between ordered and unordered datatypes
- Use the main operations and methods available to work with sets
- Use immutable sets (frozensets)


## Bibliography

- Peter Wentworth, Jeffrey Elkner, Allen B. Downey, and Chris Meyers, *How to Think Like a Computer Scientist — Learning with Python 3* (Section 9.1, 9.2)

- Set and Frozensets: <https://www.python-course.eu/python3_sets_frozensets.php>


# Data types: Sets & frozensets

## 9.1 Mutable versus immutable and aliasing

### Mutable data types

- Some datatypes in Python are mutable
- This means their contents can be changed after they have been created
- Lists and dictionaries are good examples of mutable datatypes
```
>>> my_list = [2, 4, 5, 3, 6, 1]
>>> my_list[0] = 9
>>> my_list
[9, 4, 5, 3, 6, 1]
```

### Immutable data types

- Tuples and strings are examples of immutable datatypes
- Their contents cannot be changed after they have been created:

```
>>> my_tuple = (2, 5, 3, 1)
>>> my_tuple[0] = 9
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
```


### Aliasing

- Mutability is usually useful, but it may lead to something called **aliasing**
- In this case, two variables refer to the same object and mutating one will also change the other:

```
>>> list_one = [1, 2, 3, 4, 6]
>>> list_two = list_one
>>> id(list_one) == id(list_two)
True
```
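
To make the effect of the alias visible, the following minimal sketch mutates the list through one name and reads it back through the other. It reuses the variable names from the example above; the `is` operator is used here as a shorthand for comparing `id()` values.

```python
list_one = [1, 2, 3, 4, 6]
list_two = list_one          # no copy is made: both names refer to one list

list_two[-1] = 5             # mutate the list through the second name
print(list_one)              # [1, 2, 3, 4, 5]: the change shows through the first name
print(list_one is list_two)  # True: `is` tests identity, like comparing id() values
```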

### Cloning

You can escape this problem by making a copy of the list<sup>1</sup>
```
>>> list_one = [1, 2, 3, 4, 6]
>>> list_two = list_one[:]         # list_two = list_one.copy()
>>> id(list_one) == id(list_two)
False

>>> list_two[-1] = 5
>>> list_two
[1, 2, 3, 4, 5]
>>> list_one
[1, 2, 3, 4, 6]
```

<sup>1</sup> However, a slice or `copy()` only makes a shallow copy, so nested lists are still shared; the `copy` module provides `copy.deepcopy()` to solve this
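
The limitation with nested lists can be illustrated with a short sketch. The names `nested`, `shallow` and `deep` are chosen here for illustration only.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested[:]              # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)     # the inner lists are copied as well

nested[0][0] = 99                # mutate an inner list of the original
print(shallow[0])                # [99, 2]: the shallow copy sees the change
print(deep[0])                   # [1, 2]: the deep copy is unaffected
```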

## 9.2 Sets and frozensets

### Ordered & unordered datatypes (dict)

- Given that tuples and lists are ordered, and dictionaries were unordered before Python 3.7 (since then they preserve insertion order), we can construct the following table:

|           | Ordered  | Unordered |
| ---------:|:--------:|:---------:|
| Mutable   | list     | dict?     |
| Immutable | tuple    | ?         |

- This reveals an empty spot: we don’t know any immutable, unordered datatypes yet
- Additionally, you can argue that a dictionary doesn’t belong in this table, since it is a *mapping type*: a dictionary maps keys to values

### Ordered & unordered datatypes (set)

- A **set** is an unordered, mutable datatype<sup>2</sup>
- A **frozenset** is an unordered, immutable datatype

|           | Ordered  | Unordered |
| ---------:|:--------:|:---------:|
| Mutable   | list     | set       |
| Immutable | tuple    | frozenset |

- Since sets and frozensets are unordered, they share some properties with dictionaries: 
  - for example, their elements are **unique** (like the keys of a dictionary).

<sup>2</sup> **The set itself is mutable, but its elements must be immutable (more precisely, hashable)**.

### Sets

- A set contains an unordered collection of **unique** and **immutable** objects
- Unlike lists or tuples, sets cannot have multiple occurrences of the same element
- Any immutable data type can be an element of a set: 
  - a number, a string, a tuple
- Mutable (changeable) data types cannot be elements of the set
  - list cannot be an element of a set (but tuple can)
  - another set cannot be an element of a set
- The immutability requirement follows from the way sets are represented in memory: elements are stored in a hash table, so an element's hash must never change
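
The rules above can be checked with a small sketch. The names `points`, `bad` and `groups` are illustrative; the exact wording of the error message may vary between Python versions, so it is printed rather than assumed.

```python
points = {(0, 0), (1, 2)}        # tuples of numbers are hashable: allowed
print((1, 2) in points)          # True

try:
    bad = {[0, 0], [1, 2]}       # lists are mutable and unhashable
except TypeError as err:
    print("TypeError:", err)

# A set cannot contain another set, but it can contain a frozenset
groups = {frozenset({1, 2}), frozenset({3})}
print(frozenset({2, 1}) in groups)   # True: element order is irrelevant
```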

### Create a set using the `set` function

- We can call the built-in `set()` function with a sequence or another iterable object
```
>>> A = set('qwerty')
>>> print(A)
{'e', 'r', 'q', 't', 'y', 'w'}
```

### Create a set using the compact notation

- We can use the compact notation, with the elements written between curly braces
```
>>> A = {1, 2, 3}
>>> B = {3, 2, 3, 1}
>>> print(A == B)
True
>>> print(B)
{1, 2, 3}
```

$\Rightarrow$
<https://github.com/fpro-feup/public/tree/master/lectures/12/sets.py>

Try some examples:

In [0]:
x = set("A Python Tutorial")
print(x)
print(type(x))

In [0]:
x = set(["Perl", "Python", "Java"])
print(x)

In [0]:
cities = {"Paris", "Madrid", "London", "Berlin", "Paris", "London"}
print(cities)

Immutable objects

In [0]:
cities = set((("Python","Perl"), ("Paris", "Berlin", "London")))
print(cities)

### Frozensets

- Frozensets are like sets except that they cannot be changed, i.e. they are **immutable**:

```
>>> cities = frozenset(["Frankfurt", "Basel", "Freiburg"])
>>> cities.add("Strasbourg")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
```

- Frozensets are *hashable* and we can use them as dictionary keys
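
As a sketch of what hashability allows, a frozenset can serve as a dictionary key where a plain set cannot. The `travel_hours` dictionary and its values are made-up data for illustration only.

```python
# Hypothetical data: travel time in hours between pairs of cities
travel_hours = {
    frozenset(["Frankfurt", "Basel"]): 3,
    frozenset(["Basel", "Freiburg"]): 1,
}

# Element order is irrelevant, so either direction finds the same key
print(travel_hours[frozenset(["Basel", "Frankfurt"])])   # 3

try:
    {set(["Frankfurt", "Basel"]): 3}     # a plain set is unhashable
except TypeError as err:
    print("TypeError:", err)
```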

$\Rightarrow$
<https://github.com/fpro-feup/public/tree/master/lectures/12/frozensets.py>

Try the same with sets:

In [0]:
cities = set(["Frankfurt", "Basel","Freiburg"])
cities.add("Strasbourg")
print(cities)

Let's create two frozensets, X and Y:

In [0]:
X = frozenset([1, 2, 3, 4, 5, 6])
Y = frozenset([4, 5, 6, 7, 8, 9])

print(X)
print(Y)

## Accessing Set Elements 

### Using `for` and `in`

- Python does not provide us with a way of accessing an individual set item
- We can use a `for` loop to iterate through all the items of a set

```
>>> months = set(["Sep", "Oct", "Nov", "Dec", "Jan"])
>>> for m in months:
...     print(m)
... 
Jan
Nov
Dec
Oct
Sep
```
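
A short sketch of what "no access to an individual item" means in practice: indexing a set fails, while a sorted list built from the set can be indexed. The name `ordered` is illustrative.

```python
months = set(["Sep", "Oct", "Nov", "Dec", "Jan"])

try:
    months[0]                    # sets have no positions, so indexing fails
except TypeError as err:
    print("TypeError:", err)

ordered = sorted(months)         # a sorted list built from the set can be indexed
print(ordered[0])                # Dec (first in alphabetical order)
```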

Use `in` to check for the presence of an element:

In [0]:
months = set(["Sep", "Oct", "Nov", "Dec", "Jan"])
print("Jan" in months)
print("Fev" in months)

## Adding Items to a Set

### Using `add()`

- Python allows us to add new items to a set via the `add()` method

```
>>> N = {1, 2, 3}
>>> N.add(4)
>>> print(N)
{1, 2, 3, 4}
>>> 
>>> N.add(4)
>>> print(N)
{1, 2, 3, 4}
```

In [0]:
months = set(["Sep", "Oct", "Nov", "Dec", "Jan"])
months.add("Fev")
print("Fev" in months)

## Removing Items from a Set

### Using `discard()` and `remove()`

- Python allows us to remove an item from a set, but not using an index as **set elements are not indexed**
- The items can be removed using either the `discard()` or `remove()` methods

```
>>> N = {1, 2, 3, 4, 5, 6}
>>> N.discard(3)
>>> print(N)
{1, 2, 4, 5, 6}
```
```
>>> N.remove(2)
>>> print(N)
{1, 4, 5, 6}
```


So, what is the difference between `discard()` and `remove()`?

In [0]:
months = set(["Sep", "Oct", "Nov", "Dec", "Jan"])
months.discard("Fev")
print(months)

In [0]:
months.remove("Fev")
print(months)
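
The two cells above can be combined into one sketch that shows the difference directly: `discard()` ignores a missing element, while `remove()` raises `KeyError`.

```python
months = set(["Sep", "Oct", "Nov", "Dec", "Jan"])

months.discard("Fev")        # absent element: silently does nothing
print(len(months))           # 5: the set is unchanged

try:
    months.remove("Fev")     # absent element: raises KeyError
except KeyError as err:
    print("KeyError:", err)
```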

### Using `pop()`

- With the `pop()` method, we can remove and return an element
- Since the elements are unordered, we cannot tell or predict the item that will be removed

```
>>> N = {1, 2, 3, 4, 5, 6}
>>> print(N.pop())
1
>>> print(N)
{2, 3, 4, 5, 6}
```


For example:

In [0]:
months = set(["Sep", "Oct", "Nov", "Dec", "Jan"])
print(months.pop())
print(months.pop())
print(months.pop())

### Using `clear()`

- The `clear()` method removes all elements from a set

```
>>> N = {1, 2, 3, 4, 5, 6}
>>> N.clear()
>>> print(N)
set()
```

## Set alias and cloning

### Aliasing

- An assignment just creates an alias, i.e. another name, for the same data structure

```
>>> N1 = {1, 2, 3, 4, 5, 6}
>>> N2 = N1
>>> id(N1) == id(N2)
True

>>> N1.clear()
>>> print(N2)
set()
```



### Copy

- The `copy()` method creates and returns a *shallow copy* of the set

```
>>> N1 = {1, 2, 3, 4, 5, 6}
>>> N2 = N1.copy()
>>> id(N1) == id(N2)
False

>>> N1.clear()
>>> print(N2)
{1, 2, 3, 4, 5, 6}
```

## Set operations

### Set operations

- The set operations are well known from mathematics
  - Union, intersection, difference

### The joy of sets

- Suppose you have three lists of integers and want the list of the elements present in all three

```
# Elements in List1
list1 = [1, 5, 10, 20, 40, 80, 100]

# Elements in List2
list2 = [6, 7, 20, 80, 100]

# Elements in List3
list3 = [3, 4, 15, 20, 30, 70, 80, 120]
```

$\Rightarrow$
<https://github.com/fpro-feup/public/tree/master/lectures/12/intersect.py>
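
One possible approach, not necessarily the one used in the linked `intersect.py`, is to turn the first list into a set and intersect it with the other two. The names `common` and `result` are illustrative, and `sorted()` is used only to get a list back in a predictable order.

```python
list1 = [1, 5, 10, 20, 40, 80, 100]
list2 = [6, 7, 20, 80, 100]
list3 = [3, 4, 15, 20, 30, 70, 80, 120]

# intersection() accepts any number of iterables, so the lists can be passed directly
common = set(list1).intersection(list2, list3)
result = sorted(common)
print(result)                # [20, 80]
```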


Set union:

In [0]:
X = {"a","b","c","d","e"}
Y = {"c","d","e","f","g"}
print(X.union(Y))
print(X | Y)

Set intersection:

In [0]:
X = {"a","b","c","d","e"}
Y = {"c","d","e","f","g"}
print(X.intersection(Y))
print(X & Y)

Set difference:

In [0]:
X = {"a","b","c","d","e"}
Y = {"c","d","e","f","g"}
print(X.difference(Y))
print(X - Y)

### The joy of sets (2)

- Suppose you have two lists with integers and want to find the missing and additional elements when comparing the lists

```
list1 = [1, 2, 3, 4, 5, 6]
list2 = [4, 5, 6, 7, 8]
```

![difference](images/12/difference.png)

Source: geeksforgeeks

$\Rightarrow$
<https://github.com/fpro-feup/public/tree/master/lectures/12/difference.py>
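
A minimal sketch of this comparison using set difference, which may differ from the linked `difference.py`. It assumes that "missing" means the elements of `list1` that are absent from `list2`, and "additional" means the elements of `list2` that are absent from `list1`.

```python
list1 = [1, 2, 3, 4, 5, 6]
list2 = [4, 5, 6, 7, 8]

missing = sorted(set(list1) - set(list2))      # in list1 but not in list2
additional = sorted(set(list2) - set(list1))   # in list2 but not in list1
print(missing)       # [1, 2, 3]
print(additional)    # [7, 8]
```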

### Set boolean methods

The `isdisjoint()` method returns `True` if two sets have a null intersection

In [0]:
X = {"a","b","c","d","e"}
Y = {1,2,3}
print(X.isdisjoint(Y))

The `x.issubset(y)` method returns `True` if `x` is a subset of `y`. `<=` is the operator form of `issubset()` and `>=` is the operator form of `issuperset()`. `<` checks whether a set is a proper subset of another set.

In [0]:
X = {"a","b","c","d","e"}
Y = {"a","b","c"}
print(X.issubset(Y))
print(Y.issubset(X))
print(Y <= X)

`x.issuperset(y)` returns `True` if `x` is a superset of `y`. `>=` is the operator form of `issuperset()`. `>` checks whether a set is a proper superset of another set.

In [0]:
X = {"a","b","c","d","e"}
Y = {"c","d"}
print(X.issuperset(Y))
print(X > Y)
print(X >= Y)
print(X > X)
print(X >= X)

### The joy of sets (3)

- A **pangram** is a sentence containing every letter of the English alphabet

- Lowercase and uppercase letters are considered the same

- Given a string, check whether it is a pangram or not
  - source: https://www.geeksforgeeks.org/python-set-check-string-panagram/

- A conventional approach would be to build a frequency table and check whether every letter is present

- Instead, with `from string import ascii_lowercase` we get all the lowercase letters, which we put in one set, and we put all the characters of the string in another set

> In the function, two sets are formed:

- one for all lowercase letters and one for the letters in the string
- The set of string characters is subtracted from the set of letters and, if the result is an empty set, the string is a pangram

$\Rightarrow$
<https://github.com/fpro-feup/public/tree/master/lectures/12/is_pangram.py>
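
The idea can be sketched as follows. This is an illustrative version that may differ from the linked `is_pangram.py`; the function name and the test sentences are assumptions.

```python
from string import ascii_lowercase

def is_pangram(sentence):
    """Return True if `sentence` uses every letter a-z at least once."""
    # Letters of the alphabet that do not occur in the (lowercased) sentence
    letters_missing = set(ascii_lowercase) - set(sentence.lower())
    return len(letters_missing) == 0

print(is_pangram("The quick brown fox jumps over the lazy dog"))   # True
print(is_pangram("Sets are fun"))                                   # False
```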

# More datatypes

### Collections module

- More exotic data types (queues, stacks and ordered dictionaries) are provided in Python’s `collections` module. 
- You can find the documentation in the [Python Standard Library](https://docs.python.org/3.7/library/collections.html)
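
As a brief taste of the module, the sketch below uses `deque` as a queue and `OrderedDict` to reorder keys. The variable names and data are illustrative.

```python
from collections import OrderedDict, deque

queue = deque(["a", "b"])
queue.append("c")            # enqueue on the right
first = queue.popleft()      # dequeue from the left (first in, first out)
print(first, list(queue))    # a ['b', 'c']

od = OrderedDict(one=1, two=2)
od.move_to_end("one")        # reorder: "one" becomes the last key
print(list(od))              # ['two', 'one']
```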

# Mutable vs Immutable Objects in Python

> Everything in Python is an object and objects in Python can be either mutable or immutable

- Since everything in Python is an object, every variable refers to an object instance
- When an object is instantiated, it is assigned a unique object id
- Its type is defined at runtime and, once set, can never change; however, its state can be changed if it is mutable
- Simply put, a mutable object can be changed after it is created, and an immutable object can’t

> Objects of built-in types like (int, float, bool, str, tuple, unicode) are immutable. Objects of built-in types like (list, set, dict) are mutable

$\Rightarrow$
[towardsdatascience](https://towardsdatascience.com/https-towardsdatascience-com-python-basics-mutable-vs-immutable-objects-829a0cb1530a)

# Ticket to leave

## Moodle activity

[LE12: Dictionaries](https://moodle.up.pt/mod/quiz/view.php?id=39244)


$\Rightarrow$ 
[Go back to the Table of Contents](00-contents.ipynb)

$\Rightarrow$ 
[Read the Preface](00-preface.ipynb)