## Lecture 13

The objectives of this lecture are:

1. Creation and manipulation of collections (continued).
2. Using the `in` operator.
3. Multiple assignment using collections.

# Creation and manipulation of collections (continued)

In previous lectures you learned about two of the collection types that Python provides: strings and lists. In this lecture we will learn about three more built-in collections: sets, tuples, and dictionaries.

Before we learn how to create and manipulate them, let's learn their general collection attributes: mutability and ordering.

 <b><u>Features of Python Collections</u></b>
<table>
    <tr>
        <td><b>Collection</b></td>
        <td><b>Mutable?</b></td>
        <td><b>Ordered?</b></td>
        <td><b>Use When...</b></td>
    </tr>
    <tr>
        <td>str</td>
        <td>No</td>
        <td>Yes</td>
        <td>You want to keep track of text</td>
    </tr>
    <tr>
        <td>list</td>
        <td>Yes</td>
        <td>Yes</td>
        <td>You want to keep track of an ordered sequence that you want to update</td>
   </tr>
   <tr>
       <td>tuple</td>
       <td>No</td>
       <td>Yes</td>
       <td>You want to build an ordered sequence that you know won't change or that you want to use as a key in a dictionary or as a value in a set</td>
    </tr>
    <tr>
        <td>set</td>
        <td>Yes</td>
        <td>No</td>
        <td>You want to keep track of values, but order
doesn’t matter, and you don’t want to keep
duplicates. The values must be immutable.</td>
    </tr>
    <tr>
        <td>dictionary</td>
        <td>Yes</td>
        <td>No positional order (iteration follows insertion order since Python 3.7)</td>
        <td>You want to keep a mapping of keys to values.
The keys must be immutable.</td>
    </tr>
    </table>
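
The "Mutable?" column and the immutability requirements in the table can be checked directly. The following is a minimal sketch; the variable names (`numbers`, `point`, and the flags) are chosen here only for illustration.

```python
# Mutable collections can be changed in place.
numbers = [1, 2, 3]
numbers[0] = 10            # lists support item assignment

# Immutable collections reject the same operation.
point = (1, 2, 3)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Set members (and dictionary keys) must be immutable, so a list
# cannot be stored in a set.
try:
    bad = {[1, 2], [3, 4]}
    list_in_set = True
except TypeError:
    list_in_set = False

print(numbers, tuple_changed, list_in_set)
```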
   

Each collection provides different approaches to accessing and manipulating the objects stored within it. We go through examples of the creation and usage of each of the collections we have not already learned about: sets, tuples, and dictionaries.

### Sets -- an unordered collection of distinct objects

The `set` collection type very closely matches the attributes of a mathematical *set*. It is a mutable unordered collection of *distinct* objects. The syntax for creating a `set` is similar to that of a `list` but uses braces (`{ }`) instead of brackets (`[ ]`),

In [None]:
# create a set of strings
vowels = {'a', 'e', 'i', 'o', 'u'}

print(vowels)

Note that the order in which the `set` object is printed may differ from the order used to initialize it. A set is unordered, so neither the initialization order nor the display order carries any meaning.

What happens if we repeat the value of an object in the set initialization?

In [None]:
vowels = {'a', 'e', 'a', 'a', 'i', 'o', 'u', 'u'}

print(vowels)

The Python interpreter did some processing during the initialization of the `set` in order to maintain the constraint that the set objects are distinct. This was done in the initializer method of the `set` class, but because it is a built-in or base class Python provides special syntax for creating an object of this type.

If we use the initializer directly, we must pass a single argument that is a collection of objects. We only know about two other collections of objects (a string is a collection of characters), so let's try them out,

In [None]:
# use the initializer explicitly with a list of objects
vowels1 = set(['a', 'i', 'e', 'u', 'o'])

print(vowels1)

In [None]:
# use the initializer explicitly with another set of objects
vowels2 = set({'a', 'i', 'e', 'u', 'o'})

print(vowels2)

In [None]:
# we can also create a set from a `range`, just as we did with
# lists
integers = set(range(1,10))

print(integers)
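
Since a string is a collection of characters, it can also be passed to the initializer. A small sketch, using an arbitrary example word and a variable name (`letters`) introduced only for illustration:

```python
# each character of the string becomes one element; duplicates are dropped
letters = set('mississippi')

# sorted() returns a list, so the display order is predictable
print(sorted(letters))
```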

The variables refer to different `set` objects; they are not aliases,

In [None]:
print(id(vowels1), id(vowels2))

# create an alias of vowels1
vowels = vowels1
print(id(vowels1), id(vowels))

There is one minor difference between `set` and `list` syntax: in order to create an empty set one *must* use the initializer without an argument or with an empty collection,

In [None]:
# this will not initialize an empty set!
s = {}

type(s)

In [None]:
# equivalent expressions
s = set()
s = set([])
s = set(range(0))

print(type(s), s)

The first initialization statement does not evaluate to the empty set because the syntax for creating an empty `set` and an empty `dict` (dictionary) would be identical. Python evaluates `{}` to an empty dictionary: dictionaries used this syntax before sets were added to the language, and they are also the more commonly used type.

The `set` class has many methods that allow us to manipulate and compare sets.


<b><u>Set Operations</u></b>
<table>
    <tr>
        <td><b>Method</b></td>
        <td><b>Description</b></td>
    </tr>
    <tr>
        <td>S.add(v)</td>
        <td>Adds item v to a set S; this has no effect if v is already in S</td>
    </tr>
    <tr>
        <td>S.clear()</td>
        <td>Removes all items from set S</td>
    </tr>
    <tr>
        <td>S.difference(other)</td>
        <td>Returns a set with items that occur in set S but not
in set other</td>
   </tr>
   <tr>
       <td>S.intersection(other)</td>
       <td>Returns a set with items that occur both in sets S
and other</td>
   </tr>
   <tr>
       <td>S.issubset(other)</td>
       <td>Returns True if and only if all of set S's items are also
in set other</td>
   </tr>
   <tr>
       <td>S.issuperset(other)</td>
       <td>Returns True if and only if set S contains all of set
other's items</td>
   </tr>
   <tr>
       <td>S.remove(v)</td>
       <td>Removes item v from set S; raises a KeyError if v is not in S</td>
   </tr>
   <tr>
       <td>S.symmetric_difference(other)</td>
       <td>Returns a set with items that are in exactly one of
sets S and other; any items that are in both sets are
not included in the result.</td>
   </tr>
   <tr>
       <td>S.union(other)</td>
       <td>Returns a set with items that are either in set S or
other (or in both)</td>
   </tr>
 </table>

These methods closely represent mathematical operations on sets. Let's explore some of the methods that manipulate a set,

In [None]:
print(vowels)

# sometimes 'y' is considered a vowel
vowels.add('y')

print(vowels)

In [None]:
# set difference, intersection, and union all function as 
# defined mathematically
integers1 = {1, 2, 3, 4, 5}
integers2 = {4, 5, 6, 7, 8}

# equivalent expressions, method and operator
diff = integers1.difference(integers2)
diff = integers1 - integers2
print("Difference: ", diff)

# equivalent expressions, method and operator
inter = integers1.intersection(integers2)
inter = integers1 & integers2
print("Intersection: ", inter)

# equivalent expressions, method and operator
union = integers1.union(integers2)
union = integers1 | integers2
print("Union: ", union)

All set operators and their corresponding methods are,

<b><u>Set Operators</u></b>
<table>
    <tr>
        <td><b>Method Call</b></td>
        <td><b>Operator</b></td>
    </tr>
    <tr>
        <td>set1.difference(set2)</td>
        <td>set1 - set2</td>
    </tr>
    <tr>
        <td>set1.intersection(set2)</td>
        <td>set1 & set2</td>
    </tr>
    <tr>
        <td>set1.issubset(set2)</td>
        <td>set1 <= set2</td>
    </tr>
    <tr>
        <td>set1.issuperset(set2)</td>
        <td>set1 >= set2 </td>
    </tr>
    <tr>
        <td>set1.union(set2)</td>
        <td>set1 | set2</td>
    </tr>
    <tr>
        <td>set1.symmetric_difference(set2)</td>
        <td>set1 ^ set2</td>
    </tr>
 </table>
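
The comparison and symmetric-difference operators in the table were not used in the earlier example. The following sketch exercises them, along with `remove`; the sets `small`, `big`, and `other` are made up for illustration.

```python
small = {1, 2}
big = {1, 2, 3, 4}
other = {3, 4, 5}

# subset and superset tests, operator and method forms
is_subset = small <= big              # same as small.issubset(big)
is_superset = big.issuperset(small)   # same as big >= small

# items in exactly one of the two sets
sym_diff = big ^ other                # same as big.symmetric_difference(other)

# remove() mutates the set; removing a missing item raises KeyError
big.remove(4)
try:
    big.remove(99)
    removed_missing = True
except KeyError:
    removed_missing = False

print(is_subset, is_superset, sym_diff, big, removed_missing)
```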

Finally, all of the *itemwise* looping syntax we learned for lists also works for sets,

In [None]:
for ch in vowels:
    print(ch)
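
Itemwise looping works, but because a set is unordered it cannot be indexed. A minimal sketch of what happens if we try, and of one possible workaround (building a sorted list first); the names `vowels_demo` and `ordered_vowels` are used here only for illustration.

```python
vowels_demo = {'a', 'e', 'i', 'o', 'u'}

# sets do not support indexing, so this raises a TypeError
try:
    first = vowels_demo[0]
    supports_indexing = True
except TypeError:
    supports_indexing = False

# if positions are needed, copy the items into a sorted list
ordered_vowels = sorted(vowels_demo)

print(supports_indexing, ordered_vowels[0])
```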

### Tuples -- an ordered immutable collection of objects

Tuples are much like strings, but instead of being restricted to characters, tuples are collections of any objects. Like strings, they are immutable: once a tuple is created, the *references* to the objects in it cannot be changed. Otherwise they may be used just as strings and lists are -- indexed, sliced, and looped over.

A `tuple` object is initialized using either parentheses `( )` or the class initializer `tuple()` (which takes another collection). There is no issue with using the `()` syntax to create an empty `tuple`, but creating a `tuple` with a single object requires a trailing `,`.

In [None]:
# this will not create a tuple: parentheses are also used for grouping
# in arithmetic expressions, and that meaning takes precedence
tup = (1)

type(tup)

In [None]:
# instead we must add a comma at the end for tuples of 
# length one
tup = (1,)

type(tup)

Otherwise, tuples behave almost exactly the same as lists, except for them being immutable,

In [None]:
# creating a tuple of lists, a nested *collection*
life = (["Canada", 76.5], ['United States', 75.5], ['Mexico', 72.0])

# the first entry cannot be corrected by reassigning the object
# reference: a tuple does not support item assignment, so the
# next line raises a TypeError
life[0] = ["Canada", 76.5]

In [None]:
# in this case, since the items themselves are mutable,
# we can change the value of one of the sub-items
life[0][0] = "Norway"

print(life)

This example shows the main difference between a list and a tuple, but for a more exhaustive example see Section 11.2 in the textbook.

In [None]:
# another example of using a tuple: like a list, but unlike
# a set, we can loop over it both itemwise and by index

for country in life:
    print(country)
    
for i in range(len(life)):
    print(life[i])

Thus, when a tuple's items are not mutable, then both the references and the values of the tuple are not mutable (as opposed to the previous case),

In [None]:
integers = tuple(range(10))
print(integers)

integers[0] = 1
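
The table at the start of the lecture notes that a tuple of immutable items can serve as a dictionary key or as a value in a set. The sketch below illustrates this; the coordinates and the names `city_at` and `visited` are illustrative assumptions, not data from the course.

```python
# illustrative (latitude, longitude) tuples used as dictionary keys
city_at = {(51.05, -114.07): 'Calgary', (45.42, -75.70): 'Ottawa'}
print(city_at[(51.05, -114.07)])

# tuples can also be stored in a set; the duplicate is dropped
visited = {(0, 0), (0, 1), (0, 0)}
print(len(visited))

# a list is mutable, so it is rejected as a key with a TypeError
try:
    city_at[[53.55, -113.49]] = 'Edmonton'
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(list_key_ok)
```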

### Dictionaries -- unordered mutable collections of mappings

A dictionary is a mutable collection of *distinct* keys, each of which maps to an object. Its entries are looked up by key rather than by position. This type of collection is frequently used for reasons that will become obvious shortly.

The syntax for creating a `dict` object is similar to that of a `set` in that braces `{ }` are used. Instead of commas separating each object, as in all other collections, commas separate key/object pairs,

`{key1 : object1, key2 : object2, key3 : object3, ...}`

A *key* is an object, but it must be immutable unlike the object that it maps to. For example, let's use a dictionary to store data about the number of observations of different birds somewhere in Canada,

In [None]:
birds = {'canada goose' : 3, 'northern fulmar': 1}

print(birds)

The memory map after creating the dictionary looks like this,


<img src='files/./images/lecture13/pg218.jpg'>

Indexing a dictionary uses the same syntax as indexing lists and tuples, except that the indices (keys) are no longer restricted to integers,

In [None]:
# indexing using an existing key will return the object mapped to it
print(birds["canada goose"])
print(birds["northern fulmar"])

In [None]:
# indexing to a key that is not present results in an error
print(birds["baltimore oriole"])

Once a `dict` is created, adding and updating *entries* (key/object pairs) can be done through assignment statements,

In [None]:
# this key doesn't exist, but the assignment statement adds it along with the value resulting from the RHS expression
birds["baltimore oriole"] = 0
        
print(birds)

In [None]:
# if the key does exist, the object it maps to is reassigned
birds["canada goose"] = 2

print(birds)

In [None]:
# to remove a key (and the object it maps to) use the `del` statement
del birds["baltimore oriole"]

print(birds)

Looping over dictionaries is slightly different than all other collections,

In [None]:
observations = {'canada goose': 183, 'long-tailed jaeger': 71, 'snow goose': 63, 'northern fulmar': 1}

# the `for` loop syntax assigns each key as the items to iterate over
for key in observations:
    print(key, end=" : ")
    
    # one way to get access to the objects in the dictionary is to index
    # using the keys
    print(observations[key])
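
When the output must appear in a specific order, one option is to loop over a sorted list of the keys instead of the dictionary itself. A minimal sketch, reusing the same observation counts; the name `by_count` is introduced here for illustration.

```python
observations = {'canada goose': 183, 'long-tailed jaeger': 71,
                'snow goose': 63, 'northern fulmar': 1}

# sorted() returns a list of the keys in alphabetical order
for key in sorted(observations):
    print(key, ':', observations[key])

# keys ordered by the value they map to, largest first
by_count = sorted(observations, key=observations.get, reverse=True)
print(by_count)
```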

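Remember that a dictionary is accessed by key, not by position. Since Python 3.7 a `for` loop visits the keys in the order they were inserted, but older versions made no such guarantee, so code that must run on them should not rely on any particular order. The most commonly used methods of the `dict` class are,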


<b><u>Dictionary Methods</u></b>
<table>
    <tr>
        <td><b>Method</b></td>
        <td><b>Description</b></td>
    </tr>
    <tr>
        <td>D.clear()</td>
        <td>Removes all key/value pairs from dictionary D</td>
    </tr>
    <tr>
        <td>D.get(k)</td>
        <td>Returns the value associated with key k , or None if the key
isn’t present (Usually you’ll want to use D[k] instead.)</td>
    </tr>
    <tr>
        <td>D.get(k, v)</td>
        <td>Returns the value associated with key k , or a default value
v if the key isn’t present</td>
    </tr>
    <tr>
        <td>D.keys()</td>
        <td>Returns dictionary D ’s keys as a set-like object; entries are
guaranteed to be unique</td>
    </tr>
    <tr>
        <td>D.items()</td>
        <td>Returns dictionary D’s (key, value) pairs as a set-like view object</td>
    </tr>
    <tr>
        <td>D.pop(k)</td>
        <td>Removes key k from dictionary D and returns the value that
was associated with k; if k isn’t in D, a KeyError is raised.</td>
    </tr>
    <tr>
        <td>D.pop(k, v)</td>
        <td>Removes key k from dictionary D and returns the value that
was associated with k ; if k isn’t in D , returns v</td>
    </tr>
    <tr>
        <td>D.setdefault(k)</td>
        <td>Returns the value associated with key k in D; if k isn’t a key in D, adds the key k with the value None to D and returns None</td>
    </tr>
    <tr>
        <td>D.setdefault(k, v)</td>
        <td>Returns the value associated with key k in D; if k isn’t a key
in D , adds the key k with the value v to D and returns v</td>
    </tr>
    <tr>
        <td>D.values()</td>
        <td>Returns dictionary D’s values as a view object; unlike keys, the
values may or may not be unique.</td>
    </tr>
    <tr>
        <td>D.update(other)</td>
        <td>Updates dictionary D with the contents of dictionary other ;
for each key in other , if it is also a key in D , replaces that key
in D ’s value with the value from other ; for each key in other , if
that key isn’t in D , adds that key/value pair to D</td>
    </tr>
    
</table>

Some examples of the usage of these methods are,

In [None]:
# naming dictionaries this way is good for code readability
scientist_to_birthdate = {'Newton' : 1642, 'Darwin' : 1809, 'Turing' : 1912}

# return a view of the keys in the dictionary
scientist_to_birthdate.keys()

In [None]:
# return a view of the values/objects in the dictionary
scientist_to_birthdate.values()

In [None]:
# return a view of the key/value pairs, each as a tuple, in the dictionary
scientist_to_birthdate.items()

In [None]:
# add key/value pairs to the dictionary from another one
scientist_to_birthdate.update({'Curie' : 1867, 'Hopper' : 1906, 'Franklin' : 1920})

# remember, it mutates the `dict` object and thus returns None
print(scientist_to_birthdate)
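
The `get`, `pop`, and `setdefault` methods from the table handle missing keys without raising an error when a default is supplied. The sketch below works on a separate dictionary (named `birthdates` here for illustration) so that `scientist_to_birthdate` is left untouched.

```python
birthdates = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

# get() never raises an error for a missing key
unknown = birthdates.get('Lovelace')        # None
fallback = birthdates.get('Lovelace', 0)    # the supplied default

# pop() removes the key and returns the value it mapped to
removed = birthdates.pop('Darwin')
missing = birthdates.pop('Darwin', None)    # already gone, default returned

# setdefault() only adds the key if it is not already present
existing = birthdates.setdefault('Newton', 0)
added = birthdates.setdefault('Curie', 1867)

print(unknown, fallback, removed, missing, existing, added)
print(birthdates)
```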

The `dict.items()` method can be used for looping over the dictionary when both the key and object are needed,

In [None]:
for (scientist, birthdate) in scientist_to_birthdate.items():
    print(scientist, 'was born in', birthdate)

This involved syntax similar to what we saw with `enumerate()` -- this is called *multiple assignment* and we will learn about it thoroughly at the end of the lecture.


# Using the `in` operator with collections



Now that we have seen the breadth of Python built-in collection objects, the utility of the `in` operator should have new significance to you. We can use the `in` operator on all collections as follows:

1. `str` -- the `in` operator will evaluate to `True` if the LHS string is a sub-string of RHS string, else `False`.
2. `list`, `set`, `tuple` -- the `in` operator will evaluate to `True` if the LHS object is present in the RHS list, set, or tuple.
3. `dict` -- the `in` operator will evaluate to `True` if the LHS key (immutable object) is present in the RHS dictionary. The values in the dictionary *are not* considered.

We'll go through a few illustrative examples before moving on to the last section of the lecture.

In [None]:
odds = set([1, 3, 5, 7, 9])

9 in odds

In [None]:
8 in odds

In [None]:
odds_string = str(odds)

odds_string

In [None]:
'9' in odds_string

In [None]:
'8' in odds_string

In [None]:
bird_to_observations = {'canada goose': 183, 'long-tailed jaeger': 71, 'snow goose': 63, 'northern fulmar': 1}

# `in` searches for keys, not values
'snow goose' in bird_to_observations

In [None]:
# even though an `int` is a valid type to be a key, it is not present in the dictionary
# except for as a value, which is not considered by `in`
183 in bird_to_observations
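
To search the values instead, `in` can be applied to the result of `dict.values()`. The `in` operator is also a convenient guard before indexing with a key that may be absent. A short sketch; the names `value_present`, `bird`, and `count` are introduced here for illustration.

```python
bird_to_observations = {'canada goose': 183, 'long-tailed jaeger': 71,
                        'snow goose': 63, 'northern fulmar': 1}

# search the values rather than the keys
value_present = 183 in bird_to_observations.values()

# check for the key first to avoid an error when indexing
bird = 'baltimore oriole'
if bird in bird_to_observations:
    count = bird_to_observations[bird]
else:
    count = 0

print(value_present, count)
```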

# Multiple assignment using collections

This last section is a "catch-all" for a few convenient Python constructs. *Multiple assignment* is an assignment statement which uses collections on *both* sides of the assignment operator. Before I explain this, let's look at an example to clarify what I mean,

In [None]:
# multiple assignment is a compact form of an assignment statement which assigns values
# to multiple variables
(x, y) = (10, 20)

print(x, y)

In [None]:
# more complicated example; unpacking the set {10, 20} depends on its
# iteration order, which Python does not guarantee
[[w, x], [[y], z]] = [{10, 20}, [(30,), 40]]

print(w, x, y, z)

If you have a collection, including a nested collection, this is a convenient way to extract the objects it contains for use later in your program. Here is an example of a nested list of tuples,

In [None]:
# this is a simple nested list of tuples where the index in the top-level
# list corresponds to the dimension of a Cartesian domain and the tuple
# of that index contains the domain boundaries
boundaries = [(0.0, 1.0), (0.0, 2.0), (-0.5, 0.5)]

# while the `boundaries` object is easily passed as an argument to a function
# using the values it contains through indexing is not very readable, instead...
((xmin, xmax), (ymin, ymax), (zmin, zmax)) = boundaries

print(xmin, xmax)
print(ymin, ymax)
print(zmin, zmax)
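
Multiple assignment also works in the header of a `for` loop, as in the earlier `dict.items()` loop, and it gives a compact way to swap two variables. A minimal sketch; the names `lower`, `upper`, `lengths`, `a`, and `b` are chosen here for illustration.

```python
boundaries = [(0.0, 1.0), (0.0, 2.0), (-0.5, 0.5)]

# each tuple is unpacked into two variables on every iteration
lengths = []
for (lower, upper) in boundaries:
    lengths.append(upper - lower)

print(lengths)

# the RHS tuple is built first, then unpacked, so the values swap
(a, b) = (1, 2)
(a, b) = (b, a)

print(a, b)
```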

One related piece of syntax is that tuples need not include the `( )` parentheses: a sequence of objects separated by commas is assumed by Python to be a tuple,

In [None]:
x, y = 1, 2

print(x, y)

In [None]:
tup = 1,

type(tup)

My evaluation of this particular convenience syntax is that it actually decreases readability. Thus, I do not use it and recommend that you do not either, although I teach it so that you recognize and understand such syntax when it is used in others' code.

At this point you have a beginner-level understanding of programming (in Python). Writing programs is not as simple as understanding programming syntax, constructs, and concepts. One must have a story in mind before writing a book; the same is true for writing a program.

In the next lecture we will take a break from programming syntax and see examples of algorithm development. This is a key precondition to writing a program: given a problem, you must develop a procedure for solving it, *then* write a program implementing this procedure.

# Exercises

***1.*** Write a function called find_dups that takes a list of integers as its input
argument and returns a set of those integers that occur two or more times
in the list.

***2.*** Python’s set objects have a method called pop that removes and returns
an arbitrary element from the set. If the set gerbils contains five cuddly
little animals, for example, calling gerbils.pop() five times will return those animals one by one, leaving the set empty at the end. Use this to write a function called mating_pairs that takes two equal-sized sets called males and
females as input and returns a set of pairs; each pair must be a tuple containing one male and one female. (The elements of males and females may be strings containing gerbil names or gerbil ID numbers—your function must work with both.)

***3.*** The PDB file format is often used to store information about molecules.
A PDB file may contain zero or more lines that begin with the word AUTHOR
(which may be in uppercase, lowercase, or mixed case), followed by spaces
or tabs, followed by the name of the person who created the file. Write a
function that takes a list of filenames as an input argument and returns
the set of all author names found in those files.

***4.*** The keys in a dictionary are guaranteed to be unique, but the values are
not. Write a function called count_values that takes a single dictionary as
an argument and returns the number of distinct values it contains. Given
the input {'red': 1, 'green': 1, 'blue': 2} , for example, it should return 2 .

***5.*** After doing a series of experiments, you have compiled a dictionary
showing the probability of detecting certain kinds of subatomic particles.
The particles’ names are the dictionary’s keys, and the probabilities are
the values: {'neutron': 0.55, 'proton': 0.21, 'meson': 0.03, 'muon': 0.07, 'neutrino': 0.14} .
Write a function that takes a single dictionary of this kind as input and
returns the particle that is least likely to be observed. Given the dictionary
shown earlier, for example, the function would return 'meson' .

***6.*** Write a function called count_duplicates that takes a dictionary as an argument
and returns the number of values that appear two or more times.

***7.*** A balanced color is one whose red, green, and blue values add up to 1.0.
Write a function called is_balanced that takes a dictionary whose keys are
'R' , 'G' , and 'B' as input and returns True if they represent a balanced color.

***8.*** Write a function called dict_intersect that takes two dictionaries as arguments
and returns a dictionary that contains only the key/value pairs found in
both of the original dictionaries.

***9.*** Programmers sometimes use a dictionary of dictionaries as a simple
database. For example, to keep track of information about famous scientists,
you might have a dictionary where the keys are strings and the
values are dictionaries, like this:

In [None]:
{'jgoodall':{'surname':'Goodall',
             'forename':'Jane',
             'born':1934,
             'died':None,
             'notes':'Primate Researcher',
             'author':['In the shadows of man','The chimpanzees of Gombe']},
  'rfranklin':{'surname':'Franklin',
               'forename':'Rosalind',
               'born':1920,
               'died':1957,
               'notes':'contributed to the discovery of DNA'},
  'rcarson':{'surname':'Carson',
              'forename':'Rachel',
              'born':1907,
              'died':1964,
              'notes':'raised awareness about effects of DDT',
              'author':['Silent Spring']}}


Write a function called db_headings that returns the set of keys used in any
of the inner dictionaries. In this example, the function should return
{'author', 'forename', 'surname', 'notes', 'born', 'died'} .

***10.*** Write another function called db_consistent that takes a dictionary of
dictionaries in the format described in the previous question and returns True
if and only if every one of the inner dictionaries has exactly the same keys.
(This function would return False for the previous example, since Rosalind
Franklin’s entry doesn’t contain the 'author' key.)

***11.*** A sparse vector is a vector whose entries are almost all zero, like
[1, 0, 0, 0, 0, 0, 3, 0, 0, 0] . Storing all those zeros in a list wastes memory,
so programmers often use dictionaries instead to keep track of just the nonzero
entries. For example, the vector shown earlier would be represented as
{0:1, 6:3} , because the vector it is meant to represent has the value 1 at
index 0 and the value 3 at index 6.

a. The sum of two vectors is just the element-wise sum of their elements.
For example, the sum of [1, 2, 3] and [4, 5, 6] is [5, 7, 9] . Write a function
called sparse_add that takes two sparse vectors stored as dictionaries
and returns a new dictionary representing their sum.

b. The dot product of two vectors is the sum of the products of corresponding elements. For example, the dot product of [1, 2, 3] and [4, 5, 6] is 4+10+18 , or 32. Write another function called sparse_dot that calculates
the dot product of two sparse vectors.

c. Your boss has asked you to write a function called sparse_len that will
return the length of a sparse vector (just as Python’s len returns the
length of a list). What do you need to ask her before you can start
writing it?