What is the dictionary data structure? A dictionary is a collection, but it is `not a sequence`. A dictionary is often referred to as a `map collection`, also sometimes as an associative array. You can think of a dictionary as a list of pairs, where the first element of each pair is the key and the second element of each pair is the value. A dictionary is designed so that a search for a key, and subsequently its associated value, is very efficient. The word map comes from this association: the key maps to the value.

All operations on dictionaries are built to work through keys. The data structure is called a dictionary because of its similarity to a real dictionary, such as Webster’s dictionary. Think of the keys as the words in Webster’s dictionary. It is easy to look up a word (the key) because of the organization of Webster’s dictionary (alphabetical). Once the key is found, the associated value, the definition, is clearly indicated. Note that the opposite search, a search for the value (i.e., the definition), is not easily done. Imagine trying to find a definition without the associated word (key) in Webster’s dictionary. It is only possible by examining every definition in the whole dictionary from beginning to end, which is not very efficient! Dictionaries, both the data structure and Webster’s, are optimized to work with keys. However, the dictionary data structure and Webster’s are different in one important way: Webster’s key organization is based on an alphabetical (sequential) ordering of the keys. In a Python dictionary, keys are arranged internally to make searching go quickly, not alphabetically, and that internal arrangement is hidden from the user. Since Python 3.7 a dictionary does preserve the order in which its keys were inserted, so printing shows the pairs in insertion order; in earlier versions you could not count on any particular order. In either case, a dictionary is never sorted by key for you. As new key-value pairs are added, the dictionary is modified to keep key searching efficient for Python.
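The asymmetry between searching by key and searching by value can be sketched as follows. This is only an illustration: the `webster` dictionary and the helper `find_keys_for_value` are hypothetical names made up for this example.

```python
# Hypothetical miniature "Webster's": word -> definition
webster = {
    'apple': 'a round fruit',
    'bread': 'a baked food',
    'cloud': 'a mass of water vapor',
}

# Forward search: the key leads directly to its value.
definition = webster['bread']

def find_keys_for_value(d, target):
    """Reverse search: every pair has to be examined, from beginning to end."""
    return [key for key, value in d.items() if value == target]

words = find_keys_for_value(webster, 'a round fruit')

print(definition)
print(words)
```

The forward search is a single indexing operation, while the reverse search has to walk through all the pairs and may find zero, one, or several matching keys.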

A list can be created by either the constructor `list` or a shortcut, the square brackets `[]`. Dictionaries also have a constructor, called `dict`, and a shortcut, curly braces: `{}`. As with lists, the shortcut is what is used most often, although there are applications for the `dict` constructor. When designating the contents of a dictionary, the key-value pairs are separated from each other by commas, and within each pair the key is separated from its value by a colon `:`. Values may be anything, but **keys can only be immutable objects** such as `integers, strings, or tuples`. In this session, we create a contacts dictionary, display it, get the number for `bill`, and then assign `bill` a new number. Note that the dictionary is displayed in the order in which the key-value pairs were created (guaranteed since Python 3.7); it is not sorted by key.

In [1]:
# create contacts
contacts = {'bill':'353-1234', 'rich':'269-1234', 'jane':'352-1234'}

contacts

{'bill': '353-1234', 'rich': '269-1234', 'jane': '352-1234'}

In [3]:
contacts['bill'] # get contact info for 'bill'

'353-1234'

In [4]:
contacts['bill'] = '271-1234'

In [5]:
contacts

{'bill': '271-1234', 'rich': '269-1234', 'jane': '352-1234'}
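As a minimal sketch of the points made before this session, the block below builds the same dictionary with the curly-brace shortcut and with two common forms of the `dict` constructor, and then shows what happens when a mutable object is used as a key. The variable names are made up for illustration.

```python
# Three equivalent ways to build the same dictionary.
literal = {'bill': '353-1234', 'rich': '269-1234'}
from_pairs = dict([('bill', '353-1234'), ('rich', '269-1234')])
from_keywords = dict(bill='353-1234', rich='269-1234')
same = literal == from_pairs == from_keywords

# A tuple is immutable, so it can serve as a key.
by_name = {('smith', 'joe'): '352-1234'}

# A list is mutable, so using it as a key raises TypeError.
try:
    bad = {['smith', 'joe']: '352-1234'}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(same, list_key_allowed)
```

The constructor form is handy when the pairs already exist as a sequence; the shortcut is usually easier to read when the contents are written out by hand.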

**Dictionary Indexing and Assignment**

Dictionaries are collections that respond to the index operator: `[]`. However, we do not use a sequence number (0 for the first element, -1 for the last, etc.) as we did with other collections. Instead, we use the key as the index value.

`Dictionaries are Mutable`

Because we can do index assignment in a dictionary, a dictionary is our second example of a mutable data structure (the first was a list). In addition to index assignment, dictionaries have a number of methods that change the dictionary in place. As with lists, there are potential consequences to passing a mutable object to functions. If a dictionary is modified in a function, it is modified in the calling program. You will see examples of mutable methods later in this chapter.
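The consequence of passing a mutable dictionary to a function can be sketched like this. The function `add_contact` is a hypothetical helper written only for this illustration.

```python
def add_contact(book, name, number):
    """Hypothetical helper: index assignment changes the dictionary in place."""
    book[name] = number   # no copy is made; 'book' is the caller's dictionary

contacts = {'bill': '353-1234'}
add_contact(contacts, 'barb', '271-1234')

# The change made inside the function is visible in the calling program.
print(contacts)
```

Nothing is returned from `add_contact`, yet `contacts` now has a second entry, because the function and the caller refer to the same dictionary object.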

`Dictionaries with different Key Types`

The power of dictionaries is a combination of their indexing speed and the fact that the value can be any data structure, even another dictionary. Furthermore, different types of keys can be used in the same dictionary as long as they are immutable types. In the next session, we create a dictionary with keys that are `ints, tuples, strs`. Of particular note is that the values are `lists, ints, and dicts`. We then show access using each key; the last access returns the dictionary that is stored as a value. An element of that inner dictionary can be reached with a chained expression such as `demo['x']['a']`. As with previous chained expressions, it proceeds from left to right.

```python
some_dict = {}
another_dict = dict()
```

In [6]:
demo = {2:['a', 'b', 'c'], (2,4):27, 'x':{1:2.5, 'a':3}}
demo

{2: ['a', 'b', 'c'], (2, 4): 27, 'x': {1: 2.5, 'a': 3}}

In [7]:
demo[2]

['a', 'b', 'c']

In [8]:
demo[(2,4)]

27

In [9]:
demo['x']

{1: 2.5, 'a': 3}
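A short sketch of chained indexing on the same `demo` dictionary: the first index selects a value from `demo`, and the second index is then applied to that value.

```python
demo = {2: ['a', 'b', 'c'], (2, 4): 27, 'x': {1: 2.5, 'a': 3}}

# demo['x'] is evaluated first and yields the inner dictionary,
# which is then indexed with the key 'a'.
inner_value = demo['x']['a']

# The same left-to-right rule applies when the value is a list.
list_element = demo[2][0]

print(inner_value, list_element)
```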

**Operators**

* `[]`: indexing using the key as the index value
* `len()`: the “length” is the number of key-value pairs in the dictionary
* `in`: Boolean test of membership; is the key in the dictionary (not the value)?
* `for`: iteration through keys in the dictionary

Things to note, especially given the focus of dictionary operations on keys:

* The length operator measures the number of key-value pairs (or, if you prefer, the number of keys).
* The membership operation is an operation on keys, not values.
* Iteration yields the keys in the dictionary.


In [10]:
my_dict = {'a':2, 3:['x', 'y'], 'joe':'smith'}
my_dict

{'a': 2, 3: ['x', 'y'], 'joe': 'smith'}

In [11]:
my_dict['a']

2

In [12]:
len(my_dict)

3

In [13]:
'a' in my_dict

True

In [14]:
2 in my_dict

False

In [15]:
for key in my_dict:
    print(key)

a
3
joe


In [16]:
for key in my_dict:
    print(key, my_dict[key])

a 2
3 ['x', 'y']
joe smith
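Because `in` tests keys only, checking whether something appears as a value takes a different expression. One possible way, sketched here, is to apply `in` to the result of the `values()` method.

```python
my_dict = {'a': 2, 3: ['x', 'y'], 'joe': 'smith'}

key_test = 2 in my_dict             # is 2 a key?  No.
value_test = 2 in my_dict.values()  # is 2 a value?  Yes.

print(key_test, value_test)
```

Keep in mind that the value test has to examine the values one by one, so it does not benefit from the fast key search that dictionaries are designed for.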


**Dictionary Methods**

Given the default behavior of dictionary iteration on keys, Python provides a number of other methods that allow the programmer to iterate through other dictionary elements:

* `items()`: all the key-value pairs, as a view of tuples
* `keys()`: all the keys, as a view
* `values()`: all the values, as a view


In [17]:
my_dict = {'a':2, 3:['x', 'y'], 'joe':'smith'}

for key, value in my_dict.items():
    print('Key: {:<7}, Value: {}'.format(key, value))

Key: a      , Value: 2
Key: 3      , Value: ['x', 'y']
Key: joe    , Value: smith


In [18]:
for key in my_dict.keys():
    print(key)

a
3
joe


In [19]:
dict_value_view = my_dict.values()
dict_value_view

dict_values([2, ['x', 'y'], 'smith'])

In [21]:
type(dict_value_view) # view type

dict_values

In [22]:
for val in dict_value_view:
    print(val)

2
['x', 'y']
smith


In [23]:
my_dict['new_key'] = 'new_value'

In [25]:
dict_value_view # view updated

dict_values([2, ['x', 'y'], 'smith', 'new_value'])

These three methods allow us to iterate through the dictionary, yielding key-value pairs (each as a separate item), only keys, or only values. As with ranges, if you type an invocation of one of those methods you get an “odd” type back. Not a list, but one of `dict_items`, `dict_keys`, or `dict_values`. Python calls these types view objects. View objects have a couple of interesting properties.

* A view object is iterable. Thus we can use them in a for loop.
* Views of keys and values correspond. That is, whatever the order of the key view is, the value view will have the same order (the elements of the key view and the value view match as found in the dictionary).
* View objects are dynamic. Once assigned, if the dictionary is updated in some way, the view object reflects that update.
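The correspondence between the key view and the value view can be checked with a small sketch. It pairs the two views element by element using the built-in `zip` and compares the result with `items()`.

```python
my_dict = {'a': 2, 3: ['x', 'y'], 'joe': 'smith'}

key_view = my_dict.keys()
value_view = my_dict.values()

# Pair the i-th key with the i-th value; the result matches items().
pairs = list(zip(key_view, value_view))
matches = pairs == list(my_dict.items())

# Views are dynamic: a later update shows up in a view created earlier.
my_dict['new_key'] = 'new_value'
count_after = len(key_view)

print(matches, count_after)
```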

Because iterating over the `items()` view yields tuples, we can assign two values for every iteration of the `for` loop: the first element is the key of the pair and the second is the value (similar to enumerate).

Dictionaries have a copy method. The copy method makes a shallow copy of the dictionary. This means that keys are copied properly (as they must be immutable), but if the values are mutable, the copy and the original share those values, and problems such as we have seen before can arise. See the following session.

In [26]:
my_dict = {'a':2, 3:['x', 'y'], 'joe':'smith'}

new_dict = my_dict.copy() # shallow copy
new_dict['a'] = 'new_value'
my_dict

{'a': 2, 3: ['x', 'y'], 'joe': 'smith'}

In [28]:
new_dict # only the copy changed; my_dict is unchanged

{'a': 'new_value', 3: ['x', 'y'], 'joe': 'smith'}

In [29]:
a_value = new_dict[3]
a_value

['x', 'y']

In [30]:
a_value[0] = 'new_element' # change the list that both dictionaries share
new_dict

{'a': 'new_value', 3: ['new_element', 'y'], 'joe': 'smith'}

In [31]:
my_dict # original changed

{'a': 2, 3: ['new_element', 'y'], 'joe': 'smith'}
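If fully independent copies are needed, one option not shown in the session above is `copy.deepcopy` from the standard library's `copy` module, which also copies the mutable values. A minimal sketch:

```python
import copy

my_dict = {'a': 2, 3: ['x', 'y'], 'joe': 'smith'}

# deepcopy duplicates the nested list as well as the dictionary itself.
deep_dict = copy.deepcopy(my_dict)
deep_dict[3][0] = 'new_element'

print(my_dict)    # the original list is untouched
print(deep_dict)
```

A deep copy costs more time and memory than a shallow one, so it is worth using only when the values really are mutable and must not be shared.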

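A common use of a dictionary is counting the frequency of words. The three versions below build the same `count_dict`: the first uses a membership test, the second uses exception handling, and the third uses the `get` method with a default value.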

```python

word_list = 'to be or not to be'.split(' ')

count_dict = {}
for word in word_list:
    if word in count_dict:
        count_dict[word] += 1
    else:
        count_dict[word] = 1
```

```python
count_dict = {}
for word in word_list:
    try:
        count_dict[word] += 1
    except KeyError:  # the word is not yet a key
        count_dict[word] = 1
```

```python
count_dict = {}
for word in word_list:
    count_dict[word] = count_dict.get(word, 0) + 1
```
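For reference, the sketch below runs the `get` version and prints the result. It also compares the result with `collections.Counter` from the standard library, which is a different, ready-made tool for the same job and is not used in the versions above.

```python
from collections import Counter

word_list = 'to be or not to be'.split(' ')

# The get() version from above: a missing word starts from a count of 0.
count_dict = {}
for word in word_list:
    count_dict[word] = count_dict.get(word, 0) + 1

# Counter performs the same counting in one call.
counter = Counter(word_list)

print(count_dict)
print(dict(counter) == count_dict)
```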

# `Sets`

**History**

Sets were invented by Georg Cantor, a German mathematician, in the late 1800s. Though not old by the standards of math, sets have become an integral part of mathematical theory and have revolutionized many aspects of mathematics. In spite of their power, the main concepts are so simple that they can be taught at an early age.

A `set` is a collection of objects, regardless of the objects’ types. These are the `elements` or `members` of the set. Only **one copy** of any element may exist in the set—a useful characteristic. There is no order to the elements in the set, thus it is, like a dictionary, not a sequence. A set with no elements is the “empty set,” also known as the “null set.” A set is an iterable, as are all the collections we have seen.

A set is created by calling the set constructor or using curly braces and commas. The use of curly braces as a way to construct a set can be a bit confusing, as it looks much like the way to construct a dictionary. How to tell them apart? When making a dictionary, the elements are of the form `key:value`, where the `colon(:)` separates the key from the value. In a set, there is only a list of comma-separated elements. That is how you may tell the two data structure constructors apart, by the form of their elements.

Furthermore, since empty curly braces are used to create an empty dictionary, you must use the set constructor to specify an empty set: `set()`. 

In [32]:
null_set = set() # set() creates the empty set
null_set

set()

In [33]:
a_set = {1, 2, 3, 4} # no colons means set
a_set

{1, 2, 3, 4}

In [34]:
b_set = {1, 1, 2, 2, 2} # duplicates are ignored
b_set

{1, 2}

In [35]:
c_set = {'a', 1, 2.5, (5, 6)} # different types is OK
c_set

{(5, 6), 1, 2.5, 'a'}

In [36]:
a_set = set("abcd") # set constructed from iterable
a_set

{'a', 'b', 'c', 'd'}
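Two of the points above can be checked with a short sketch: empty curly braces produce a dictionary rather than a set, and constructing a set from a list keeps only one copy of each element. The variable names are made up for illustration.

```python
empty_braces = {}
empty_set = set()

kind_of_braces = type(empty_braces).__name__   # name of the type, as a string
kind_of_set = type(empty_set).__name__

# Building a set from a list discards the duplicates.
words = ['to', 'be', 'or', 'not', 'to', 'be']
unique_words = set(words)

print(kind_of_braces, kind_of_set, len(unique_words))
```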

**Python Sets Are Mutable**


Like lists and dictionaries, sets are mutable data structures. Though index assignment is not possible on a set (it isn’t a sequence), various methods of the set (such as add or remove) change the elements of a set.

**Methods, Operators and Functions for Python Sets**

* `len()` Like all collections, you can determine the number of elements in a set using the len function.
* `in` Is an element in the set? The in operator tests membership and returns a Boolean True or False depending on whether the element is or is not a member of the set.
* `for` Like all collections, you can iterate through the elements of a set using the for statement. The order of the iteration through the objects is not known, as sets have no order.


In [37]:
my_set = {'a', 'c', 'b', 1, 4, 2}

len(my_set)

6

In [38]:
'a' in my_set

True

In [39]:
'z' in my_set

False

In [40]:
for element in my_set:
    print(element, end=' ')

1 2 4 a b c 

**Set Methods**

Python implements the typical mathematical set operations. There are two ways to call each of these operations: using a method or using a binary operator. The results of either approach are the same, though there is some difference in the way they are used. We note in the explanation of each operation that some are commutative and some are not; that is, the order of the operands may or may not matter. The binary operators for set operations are `&, |, -, ^, <=, >=`. Each binary operator takes two sets with an intervening operator, such as `a_set & b_set`. The methods available are `intersection, union, difference, symmetric_difference, issubset, issuperset`. For these methods, a set calls the method (using the dot notation) with another collection as the argument. One difference between the binary operators and the methods is that the methods allow the argument to be any iterable collection, whereas the binary operators require both operands to be sets. Readability is an issue as well. The methods approach makes it clear what operation is being performed, though the method names are rather long. The binary operator approach is short but can be difficult to read if you are not familiar with the meaning of the binary operator symbols. We will tend to use the methods approach because of the clarity of the method names.
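The difference in what the two approaches accept can be sketched as follows: the method takes a list directly, while the operator raises a `TypeError` unless the list is first converted to a set.

```python
a_set = {1, 2, 3, 4}

# The method accepts any iterable, here a list.
from_method = a_set.union([3, 4, 5])

# The binary operator requires a set on both sides.
try:
    a_set | [3, 4, 5]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# Converting the list to a set makes the operator form work.
from_operator = a_set | set([3, 4, 5])

print(from_method, operator_accepts_list, from_operator)
```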


`Intersection`

Intersection is done using the `&` operator or the intersection method. This operation creates a new set of the elements that are common to both sets.

In [41]:
a_set = {1, 2, 3, 4}
b_set = {2, 3, 5, 6}

In [43]:
a_set & b_set

{2, 3}

In [42]:
b_set & a_set

{2, 3}

In [45]:
a_set.intersection(b_set) # method approach

{2, 3}

`Union`

Union is done using the `|` operator or the union method. Union creates a new set that contains all the elements in both sets.

In [47]:
a_set | b_set # union of all elements

{1, 2, 3, 4, 5, 6}

In [48]:
b_set | a_set # commutative: order doesn't matter

{1, 2, 3, 4, 5, 6}

In [49]:
a_set.union(b_set) # method approach

{1, 2, 3, 4, 5, 6}

In [50]:
a_set.union([3, 4, 5, 1])

{1, 2, 3, 4, 5}

`Difference`

Difference is done using the `-` operator or the difference method. Difference creates a new set whose elements are in the first (calling) set and **not** in the second (argument) set. Unlike intersection and union, the difference operator is not `commutative`.

In [51]:
a_set = {'a', 'b', 'c', 'd'}
b_set = {'c', 'd', 'e', 'f'}

In [52]:
a_set - b_set # elements of a_set that are not in b_set

{'a', 'b'}

In [55]:
b_set - a_set # order matters!

{'e', 'f'}

In [56]:
a_set.difference(b_set) # method approach

{'a', 'b'}

In [57]:
a_set.difference('cdef') # string iterable as an argument

{'a', 'b'}

**Symmetric Difference**

The symmetric difference operation might be new to you. Essentially, symmetric difference is the opposite of intersection. It creates a new set of the elements that are in exactly one of the two sets, that is, in either set but not in both. The symmetric difference operator is `^` and the method is `symmetric_difference`. The order of the sets does not matter.

In [59]:
b_set ^ a_set # unique elements in the sets

{'a', 'b', 'e', 'f'}

In [61]:
a_set ^ b_set # order doesn't matter

{'a', 'b', 'e', 'f'}

In [62]:
a_set.symmetric_difference(b_set) # method approach

{'a', 'b', 'e', 'f'}
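One way to see what symmetric difference computes is to rebuild it from the other operations. The sketch below does so in two ways and compares each with the result of `^`.

```python
a_set = {'a', 'b', 'c', 'd'}
b_set = {'c', 'd', 'e', 'f'}

sym = a_set ^ b_set

# Everything in either set, minus what the two sets have in common.
via_union = (a_set | b_set) - (a_set & b_set)

# What is only in a_set, together with what is only in b_set.
via_differences = (a_set - b_set) | (b_set - a_set)

print(sym == via_union == via_differences)
```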

**Subset and Superset**

The concept of `subset` and `superset` should be familiar. A set is a subset of another set only if every element of the first set is an element of the second set. Superset is the reversed concept: set A is a superset of set B only if set B is a subset of set A. Clearly, the order of the sets matters in these operations; i.e., they are not commutative. A set is both a subset and a superset of itself. The subset operator is `<=` and the superset operator is `>=`. The method names are `issubset` and `issuperset`. All four of these operations return a `Boolean` value.

In [63]:
small_set = {'a', 'b', 'c'}
big_set = set('abcdef')

In [64]:
small_set <= big_set # subset

True

In [65]:
big_set >= small_set # superset

True

In [66]:
big_set <= big_set # a set is a subset of itself

True

In [67]:
small_set.issubset('abcdef') # string iterable as argument

True
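The session above shows only tests that succeed. As a small complement, this sketch swaps the operands to show that order matters for the subset test.

```python
small_set = {'a', 'b', 'c'}
big_set = set('abcdef')

forward = small_set <= big_set    # every element of small_set is in big_set
backward = big_set <= small_set   # 'd', 'e' and 'f' are missing from small_set

# A set is both a subset and a superset of itself.
reflexive = small_set <= small_set and small_set >= small_set

print(forward, backward, reflexive)
```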

**Other Set Methods**

* `add(element)` Adds the element to the set. There is no effect if the element is already in the set (remember, only one copy of an element in a set). It modifies the set, so there is no return value.
* `clear()` Removes all the elements of the set (making it empty).
* `remove(element)` and `discard(element)` Both methods remove the element if it exists. The difference is that remove will cause an error if the element being removed is not part of the set. In contrast, discard will not give an error even if the argument being removed does not exist in the set. There is no value returned.
* `copy()` Returns a shallow copy of the set.


In [77]:
my_set = set('haleluaa')
my_set

{'a', 'e', 'h', 'l', 'u'}

In [74]:
my_set.add('d')
my_set

{'a', 'd', 'e', 'h', 'l', 'u'}

In [78]:
my_set.remove('a')
my_set

{'d', 'e', 'h', 'l', 'u'}

In [79]:
copy_set = my_set.copy()
copy_set

{'d', 'e', 'h', 'l', 'u'}

In [80]:
my_set.clear()
my_set

set()
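The difference between `remove` and `discard` shows up only when the element is missing. A minimal sketch, using a made-up set:

```python
my_set = set('abc')

# discard is silent when the element is not in the set.
my_set.discard('z')

# remove raises KeyError in the same situation.
try:
    my_set.remove('z')
    remove_raised = False
except KeyError:
    remove_raised = True

# add has no effect when the element is already present.
my_set.add('a')

print(remove_raised, len(my_set))
```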

**Using `zip` to create dictionaries**

An interesting and very useful built-in function is `zip`, which creates pairs from two **parallel sequences**. `zip` works like a zipper, merging multiple sequences into a series of tuples (in Python 3 it returns an iterator rather than a list). It is not special to dictionaries, but when combined with the `dict` constructor, it provides a useful way to create dictionaries from sequences.

In [81]:
keys = ['red', 'white', 'blue']
values = [100, 300, 500]

d = dict(zip(keys, values))
d

{'red': 100, 'white': 300, 'blue': 500}
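To see what `dict` actually receives, the pairs produced by `zip` can be collected into a list. The sketch also shows that `zip` stops at the end of the shorter sequence, which is worth keeping in mind when the two sequences might not have the same length.

```python
keys = ['red', 'white', 'blue']
values = [100, 300, 500]

# zip yields one (key, value) tuple per position.
pairs = list(zip(keys, values))

# With unequal lengths, the extra key 'blue' is silently dropped.
short = dict(zip(keys, [1, 2]))

print(pairs)
print(short)
```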

**Dictionary and Set `comprehension`**

```python
{expression for-clause condition}
```

In [82]:
a_dict = {k:v for k,v in enumerate('abcdefg')}
a_dict

{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g'}

In [84]:
b_dict = {v:k for k,v in a_dict.items()} # reverse key-value pairs
b_dict

{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}

In [85]:
sorted(b_dict)

['a', 'b', 'c', 'd', 'e', 'f', 'g']

In [86]:
b_list = [(v,k) for v,k in b_dict.items()]
sorted(b_list)

[('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4), ('f', 5), ('g', 6)]

In [87]:
a_set = {ch for ch in 'to be or not to be'}
a_set 

{' ', 'b', 'e', 'n', 'o', 'r', 't'}

In [88]:
sorted(a_set)

[' ', 'b', 'e', 'n', 'o', 'r', 't']
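None of the comprehensions above uses the optional condition from the general form. The sketch below adds an `if` clause to a set comprehension and to a dictionary comprehension; the variable names are made up for illustration.

```python
sentence = 'to be or not to be'

# Set comprehension with a condition: skip the space character.
letters = {ch for ch in sentence if ch != ' '}

# Dictionary comprehension with a condition: keep only words longer than 2.
long_words = {word: len(word) for word in sentence.split() if len(word) > 2}

print(sorted(letters))
print(long_words)
```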