# Collections and Iterators

2018-09-13  
Kevin Bonham, PhD

## Outline

- Collections of data
- Lists (ordered collections)
- Dictionaries (collections indexed by keys)
- Sets (unordered collection of unique values)
- Iterators and loops

## Learning Objectives

After completing this lesson, you should be able to:

1. Explain the differences between lists, dictionaries and sets.
2. Identify the best collection type for a particular application.
3. Iterate over a collection and modify the contents.
4. Write a function that modifies a list.


## Review of scalars

Integers and floats represent numbers.

In [None]:
integer  = 2 + 2

In [None]:
flt = 26.7 * 0.15

### Strings are sequences of characters

In [1]:
st = 'How' + ' ' + 'now' + ' ' + 'brown' + ' ' + 'cow' + '?'
st

'How now brown cow?'

Actually, strings can be thought of as a collection of characters...

## Collections are groups of related objects

- The contents of collections can be scalars of different types, or even other collections
- The best type of collection depends on the application
    - Does the content have an order?
    - How do you plan to access the content?
- "Data Structures" represent an entire CS course... there are many more collection types than the ones we'll discuss here


### Common features of collections

- they may contain 0 or more entries (they can be empty)
    - entries can be scalars or other collections
- they are iterable (`for` loops touch each entry exactly once)
- they are (often) mutable (you can add, remove, or modify entries in lists, dicts, and sets, but not in strings)
- they are (often) indexable (you can access individual entries with a unique "index")
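A minimal sketch of these shared features, using small collections made up for illustration. It also shows the main exception to mutability: a string can be looped over and indexed, but assigning to one of its positions raises a `TypeError`.

```python
# Hypothetical example collections, one of each type discussed in this lesson
example_list = [1, 2, 3]
example_dict = {"a": 1, "b": 2}
example_set = {1, 2, 3}
example_str = "abc"

for collection in [example_list, example_dict, example_set, example_str]:
    # every collection reports its size with len() and can be looped over
    print(type(collection).__name__, len(collection), list(collection))

# lists, dicts and sets can be changed in place...
example_list.append(4)
example_dict["c"] = 3
example_set.add(4)

# ...but strings cannot: assigning to a position raises a TypeError
try:
    example_str[0] = "z"
except TypeError as err:
    print("strings are immutable:", err)
```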


### Indexing a string

In [2]:
st

'How now brown cow?'

In [3]:
st[4]

'n'

In [4]:
st[4:13]

'now brown'

## Lists (vectors) are ordered collections

In [5]:
my_list = [1, 'z', 2.4, 42, 1]
my_list

[1, 'z', 2.4, 42, 1]

### Lists can have zero or more elements

In [13]:
my_list = [] # this is an "empty" list

Add new elements to the end of a list with `.append()`

In [14]:
my_list.append(10)
my_list

[10]

In [15]:
my_list.append(20)
my_list

[10, 20]

or with `.extend()`

In [16]:
my_list.extend([30, 40, 50]) # same effect as my_list += [30, 40, 50]
my_list

[10, 20, 30, 40, 50]
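The difference between `.append()`, `.extend()` and the `+` operator is easy to miss. A short comparison on throwaway lists:

```python
appended = [10, 20]
appended.append([30, 40])    # .append() adds its argument as ONE element
print(appended)              # [10, 20, [30, 40]]

extended = [10, 20]
extended.extend([30, 40])    # .extend() adds each element of its argument
print(extended)              # [10, 20, 30, 40]

original = [10, 20]
combined = original + [30, 40]   # + builds a NEW list and leaves original unchanged
print(original, combined)        # [10, 20] [10, 20, 30, 40]
```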

### "Order" for a list means insertion order

In [17]:
my_list.append(3)
my_list

[10, 20, 30, 40, 50, 3]

If you want the list sorted, use `.sort()` (`reverse=True` sorts from largest to smallest)

In [20]:
my_list.sort(reverse=True)
my_list

[50, 40, 30, 20, 10, 3]
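`.sort()` changes the list itself and returns `None`. If you need a sorted copy while keeping the original order, the built-in `sorted()` function returns a new list instead. A small comparison on a made-up list:

```python
numbers = [30, 3, 50, 10]

# sorted() returns a NEW sorted list and leaves the original alone
in_order = sorted(numbers)
print(in_order)   # [3, 10, 30, 50]
print(numbers)    # [30, 3, 50, 10]

# .sort() rearranges the list in place and returns None
result = numbers.sort()
print(numbers)    # [3, 10, 30, 50]
print(result)     # None
```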

### A quick review of loops

Before iterating through lists, let's review how `for` loops work.

In [25]:
import time

for i in range( 5 ):
    i2 = i ** 2
    print( "i  =", i )
    print( "i2 =", i2 )
    # time.sleep( 3 )  # uncomment to pause 3 seconds after each pass

i  = 0
i2 = 0
i  = 1
i2 = 1
i  = 2
i2 = 4
i  = 3
i2 = 9
i  = 4
i2 = 16


### Iterating through a list returns the elements of the list in order

In [27]:
for element in my_list:
    el2 = element ** 2
    print(el2)

2500
1600
900
400
100
9
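If you also need each element's position while looping, the built-in `enumerate()` function yields `(index, element)` pairs, counting from 0. A sketch that re-creates `my_list` with the values shown above so it runs on its own:

```python
my_list = [50, 40, 30, 20, 10, 3]

# enumerate() yields (index, element) pairs, counting from 0
for index, element in enumerate(my_list):
    print(index, element ** 2)
```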


### List "indexing"

We can also access items at particular locations in the list.

**WARNING**: in python, counting starts at `0`, not `1`.
In other words, to get the **third** element in `my_list`:

In [28]:
my_list

[50, 40, 30, 20, 10, 3]

In [29]:
my_list[2]

30

As with variables, we can perform operations on the items in a list.

In [30]:
my_list[2] * 4

120

### Figuring out how many elements are in a list

The `len()` function ("length") tells us how many items are in a list.

In [31]:
len(my_list)

6

You may be tempted to use `len()` to get the last item in the list...

In [32]:
list_length = len(my_list)
my_list[list_length]

IndexError: list index out of range

This error is telling us that the index (in this case `6`) is "out of range."
In other words, index `6` is "off the end" of the list.
This is because python counts from 0:
a list with `len(list) == 6` has indices `[0,1,2,3,4,5]`.
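Since the indices run from `0` to `len(my_list) - 1`, subtracting one from the length gives the index of the last element. A minimal check, again re-creating `my_list` with the values above:

```python
my_list = [50, 40, 30, 20, 10, 3]

last_index = len(my_list) - 1   # valid indices run from 0 to len(my_list) - 1
print(last_index)               # 5
print(my_list[last_index])      # 3
```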


### Getting the last element in a list

In [33]:
my_list[-1]

3

or the second to last:

In [34]:
my_list[-2] # etc

10

### To access a range of values, use a "slice"

In [36]:
len(my_list[1:3])

2

- You can think about a slice as `my_list[start:stop]`
    - **BUT**, note that the element at index `stop` is not included in the slice
    - another way to think about it is `my_list[start : start + slice_length]`
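The `start : start + slice_length` view can be checked directly. In this sketch, `start` and `slice_length` are illustrative variable names, and `my_list` is re-created with the values used above:

```python
my_list = [50, 40, 30, 20, 10, 3]

start, slice_length = 1, 3
piece = my_list[start : start + slice_length]   # indices 1, 2 and 3
print(piece)         # [40, 30, 20]
print(len(piece))    # 3, the same as slice_length
print(my_list[4])    # 10: the element at `stop` (index 4) is not in the slice
```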


### Slice from the beginning or to the end

In [37]:
my_list

[50, 40, 30, 20, 10, 3]

In [38]:
my_list[:3]

[50, 40, 30]

In [42]:
len(my_list[2:])

4

### Modifying lists

Lists are "mutable", meaning we can change them.

In [43]:
my_list[1] = 10
my_list

[50, 10, 30, 20, 10, 3]
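One of this lesson's objectives is writing a function that modifies a list. Because lists are mutable, a function that assigns to a list's positions changes the caller's list directly, with no need to return it. A minimal sketch; `double_all` is a hypothetical name:

```python
def double_all(values):
    """Double every element of `values` in place (hypothetical helper)."""
    for i in range(len(values)):
        values[i] = values[i] * 2

numbers = [1, 2, 3]
double_all(numbers)   # returns None, but changes `numbers` itself
print(numbers)        # [2, 4, 6]
```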

In [45]:
my_list[1:3] = 1, 2, 3
my_list

[50, 1, 2, 3, 20, 10, 3]
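Assigning to a slice replaces the whole slice with the new values, so the list can grow or shrink, as the two elements at indices 1 and 2 being replaced by three values shows. A small sketch on a fresh list:

```python
letters = ['a', 'b', 'c', 'd']
letters[1:3] = ['x', 'y', 'z']   # two elements replaced by three: the list grows
print(letters)                   # ['a', 'x', 'y', 'z', 'd']

letters[1:4] = []                # three elements replaced by none: the list shrinks
print(letters)                   # ['a', 'd']
```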

### A quick aside

Don't forget to google!

E.g., "How do I add something to a list in python?"

**Question 1**

Remove all of the strings from the list `my_list2`:

In [50]:
my_list2 = ['a', 20, 2.3, 4.9, 'b', 7, 'a']

In [49]:
my_list2.remove(my_list2[0])
my_list2.remove('b')

# use this cell to figure out the code, then copy and paste it
# into the answer box in the quiz

print(my_list2)

[20, 2.3, 4.9, 7, 'a']


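## Dictionaries are collections with (almost) arbitrary indices

The indices are called "keys",
and the objects referenced by keys are the "values".
Keys can be strings, numbers, or other immutable objects.
Since Python 3.7, dicts remember the order in which keys were added, but unlike lists they are not accessed by position.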

In [53]:
my_dict = {"Ada": 0, "Bodhidharma": 1, "César": 2, 4: 4}
my_dict

{'Ada': 0, 'Bodhidharma': 1, 'César': 2, 4: 4}

Entries (values) in a dict are accessed by their keys

In [54]:
my_dict[4]

4

### Dictionaries can be empty

In [55]:
fruit_colors = {} # equivalent to `fruit_colors = dict()`
fruit_colors

{}

New entries can be added by assigning a value to a key

In [56]:
fruit_colors["banana"] = "yellow"
fruit_colors["apple"] = "green"
fruit_colors

{'banana': 'yellow', 'apple': 'green'}

Note that if you reuse a key, the new value overwrites the previous entry:

In [57]:
fruit_colors["apple"] = "red"
fruit_colors

{'banana': 'yellow', 'apple': 'red'}
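Looking up a key that isn't there raises a `KeyError`. The `in` operator checks whether a key exists, and `.get()` returns a fallback value instead of raising. A sketch using the same fruit colors (the `"kiwi"` key is made up):

```python
fruit_colors = {"banana": "yellow", "apple": "red"}

print("apple" in fruit_colors)   # True: `in` checks the keys...
print("red" in fruit_colors)     # False: ...not the values

# fruit_colors["kiwi"] would raise a KeyError; .get() returns the default instead
print(fruit_colors.get("kiwi", "unknown"))   # unknown
```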

### Iterating through a dictionary provides keys

In [60]:
for k in my_dict:
    print(k)

Ada
Bodhidharma
César
4


Or you can iterate through values using keys:

In [61]:
for k in my_dict:
    print(my_dict[k])

0
1
2
4
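When you need both the key and its value, `.items()` yields `(key, value)` pairs, so no separate lookup is needed. A sketch re-creating `my_dict` from above:

```python
my_dict = {"Ada": 0, "Bodhidharma": 1, "César": 2, 4: 4}

# each pass unpacks one (key, value) pair
for k, v in my_dict.items():
    print(k, "->", v)
```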


### You can get the keys and values using the `.keys()` and `.values()` methods

These return "views" rather than lists; wrap them in `list()` if you need an actual list.

In [63]:
my_dict.keys()

dict_keys(['Ada', 'Bodhidharma', 'César', 4])

In [64]:
my_dict.values()

dict_values([0, 1, 2, 4])
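A view cannot be indexed, and it reflects later changes to its dict, whereas `list()` takes a snapshot that can be indexed. A sketch on a small dict; the `"Dorothy"` entry is made up for illustration:

```python
small_dict = {"Ada": 0, "Bodhidharma": 1}

keys_view = small_dict.keys()
keys_list = list(keys_view)    # an ordinary list: can be indexed
print(keys_list[0])            # Ada

small_dict["Dorothy"] = 2      # hypothetical new entry
print(len(keys_view))          # 3: the view sees the new key
print(len(keys_list))          # 2: the list snapshot does not
```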

### A quick note on formatting

- One can often take advantage of python's syntax to make code more readable
- For example, extra spaces and line breaks inside brackets are ignored

In [65]:
my_dict_formatted = {
    "Ada": 0,
    "Bodhidharma": 1,
    "César": 2,
    4: 4
    }

my_dict == my_dict_formatted

True

or

In [None]:
my_dict_formatted = {
    "Ada"         : 0,
    "Bodhidharma" : 1,
    "César"       : 2,
    4             : 4
    }

my_dict == my_dict_formatted

### The order of key/value pairs does not matter for equality

In [None]:
dict2 = {
    4             : 4,
    "Bodhidharma" : 1,
    "César"       : 2,
    "Ada"         : 0,
    }
dict2 == my_dict
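This is one place where dicts and lists differ: dicts compare equal when they hold the same key/value pairs in any order, while lists compare element by element, so order matters. A quick check on made-up values:

```python
dicts_equal = {"a": 1, "b": 2} == {"b": 2, "a": 1}   # same pairs, different order
lists_equal = [1, 2] == [2, 1]                        # same elements, different order
print(dicts_equal)   # True
print(lists_equal)   # False
```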

## Sets are unordered collections with unique entries

In [66]:
my_set = {1,2,1,2,1,2}
my_set

{1, 2}

In [67]:
my_set == {2, 1}

True
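Because duplicates collapse, converting a list to a set is a common way to find its distinct values. A sketch on a made-up list:

```python
with_repeats = [3, 1, 3, 2, 1]
unique_values = set(with_repeats)
print(unique_values)        # {1, 2, 3}
print(len(unique_values))   # 3 distinct values
```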

### Sets cannot be indexed

In [68]:
my_set[1]

TypeError: 'set' object does not support indexing

But you *can* iterate through them:

In [69]:
for entry in my_set:
    print(entry)

1
2


### Pay attention to the (lack of) ordering

In [75]:
for entry in {3,20,1,"1"}:
    print(type(entry), ":", entry)

<class 'int'> : 1
<class 'int'> : 3
<class 'int'> : 20
<class 'str'> : 1


### Sets have some useful functions for comparison

In [76]:
s1 = {'a','b','c','d'}
s2 = {'b','c','d','e'}

s1.intersection(s2) # or s1 & s2

{'b', 'c', 'd'}

In [77]:
s1.difference(s2) # or s1 - s2

{'a'}

In [78]:
s1.union(s2) # or s1 | s2

{'a', 'b', 'c', 'd', 'e'}
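The operators in the comments above give the same results as the methods. One practical difference: the methods accept any iterable (such as a list), while the operators require sets on both sides. A sketch re-creating `s1` and `s2`:

```python
s1 = {'a', 'b', 'c', 'd'}
s2 = {'b', 'c', 'd', 'e'}

# operators and methods agree
print((s1 & s2) == s1.intersection(s2))   # True
print((s1 - s2) == s1.difference(s2))     # True
print((s1 | s2) == s1.union(s2))          # True

# the method accepts a list...
print(s1.union(['x', 'y']))

# ...but the operator does not
try:
    s1 | ['x', 'y']
except TypeError as err:
    print("the | operator needs two sets:", err)
```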

NOTE: Set theory is [a whole thing][1] - useful but beyond the scope of this course.

[1]: https://en.wikipedia.org/wiki/Set_theory

### As with other collections, sets can be empty...

In [80]:
my_set = set()
my_set

set()

In [4]:
my_set.add('a')
my_set

{'a'}

In [81]:
my_set.add('b')
my_set

{'a', 'b'}
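One common trap: empty curly braces `{}` create an empty *dictionary*, not an empty set, which is why `set()` is used above. A quick check:

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>
print(type({'a'}))          # <class 'set'>: braces with values but no colons make a set
```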

### Exercise: other useful set functions

**What values of `my_other_set_1` and `my_other_set_2` will make the following expressions return `True`?**

In [None]:
my_first_set = {'Ada', 'Bodhidharma', 'César'}
my_other_set_1 = { } # put at least one string in this set to make the following return True

my_first_set.issuperset(my_other_set_1)

In [None]:
my_first_set = {'Ada', 'Bodhidharma', 'César'}
my_other_set_2 = { } # put at least one string in this set to make the following return True

my_first_set.issubset(my_other_set_2)

### Set operations can be used to compare other collections

In [None]:
one_more_list = ['a', 'b', 'c', 'd', 'a', 'b']
yet_another_list = ['c', 'd', 'b']

set(one_more_list).union( set(yet_another_list) )
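The other set operations work the same way on converted lists, for example to find the items two lists share or the items only one of them has. A sketch re-creating the two lists (results are passed through `sorted()` only so they print in a predictable order):

```python
one_more_list = ['a', 'b', 'c', 'd', 'a', 'b']
yet_another_list = ['c', 'd', 'b']

in_both = set(one_more_list) & set(yet_another_list)
only_in_first = set(one_more_list) - set(yet_another_list)
print(sorted(in_both))         # ['b', 'c', 'd']
print(sorted(only_in_first))   # ['a']
```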

### Exercise: using set operations

**What will the return value of the following expression be?**

In [None]:
a_dict = {"Emma"    : "TA",
          "Eric"    : "instructor",
          "Kevin"   : "instructor",
          "Marina"  : "TA",
          "Shirley" : "TA"}

roles = {"instructor", "TA", "student"}

roles.difference( set( a_dict.keys() ) )

## Strings are weird

Remember how I said strings can act like scalars or collections of characters?

In [82]:
set("banana")

{'a', 'b', 'n'}

In [83]:
for c in "What's happening?":
    print(c)

W
h
a
t
'
s
 
h
a
p
p
e
n
i
n
g
?


## X of Ys - nested collections

- So far, all of our collections have held scalars
- But collections can hold other collections too
- For example, you can have a list of lists, or a dict of sets
    - However, mutable collections (lists, sets, dicts) cannot be used as the `key`s of dicts

In [85]:
teaching_staff = {
    "Eric"  : {
        "position" : "instructor",
        "e-mail"   : "franzosa@hsph.harvard.edu",
        "lectures" : [1,2,5,6,8,11,13,14,15]
        },
    "Kevin" : {
        "position" : "instructor",
        "e-mail"   : "kbonham@broadinstitute.org",
        "lectures" : [3,4,7,9,10,12]
        }
    }

teaching_staff["Eric"]["e-mail"]

'franzosa@hsph.harvard.edu'
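Nested collections are often processed with a loop over the outer collection that indexes into each inner one. A sketch on a trimmed copy of `teaching_staff` (named `trimmed_staff` here, with the e-mail entries left out to keep it short):

```python
trimmed_staff = {
    "Eric":  {"position": "instructor", "lectures": [1, 2, 5, 6, 8, 11, 13, 14, 15]},
    "Kevin": {"position": "instructor", "lectures": [3, 4, 7, 9, 10, 12]},
}

total_lectures = 0
for name, info in trimmed_staff.items():   # outer dict: name -> inner dict
    n_lectures = len(info["lectures"])      # inner dict: field -> value
    print(name, "teaches", n_lectures, "lectures")
    total_lectures += n_lectures

print(total_lectures)   # 15
```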

## Hands-on practice

**Write a loop that displays the square of every number from 0 to 10**

_NOTE: the last number displayed should be `100`_

In [None]:
for ?? in ??: # replace the ??
    print()

The colors of the rainbow are often described as ROYGBIV
(red, orange, yellow, green, blue, indigo, violet)
corresponding to wavelengths between 650nm (red) and 400nm (violet).

**Make a dictionary where the names of the colors are the keys**
**and the wavelengths are the values**

Just make the wavelengths evenly spaced (how might you use math to let
python do this for you automatically?)

In [None]:
color_dict = {
    "red" : 650,
    # enter other keys and values here
    }

"Comprehensions" are syntax that let you generate collections using loops.
For example, compare the output of

In [None]:
for i in range(5):
    print(i / 2)

to

In [None]:
[i / 2 for i in range(5)]
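The comprehension builds the same list you would get by starting with an empty list and calling `.append()` inside a loop. A side-by-side check:

```python
# building the list with a loop and .append()
halves = []
for i in range(5):
    halves.append(i / 2)

print(halves)                                # [0.0, 0.5, 1.0, 1.5, 2.0]
print(halves == [i / 2 for i in range(5)])   # True
```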

**Use a list comprehension to make a list of the squares of every number**
**from 0 to 10**

_the output should be `[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]`_

In [None]:
squares = [i for ?? in ??]
squares

Comprehensions can also be used to make sets and dictionaries.
For example, here's a dictionary comprehension:

In [None]:
squares = {i : i ** 2 for i in range(5)}
squares
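A set comprehension looks the same but uses curly braces without a `key : value` pair, and, as with any set, duplicate results collapse. A small sketch:

```python
# remainders after dividing 0..9 by 3; duplicates collapse into one entry each
remainders = {i % 3 for i in range(10)}
print(remainders)   # {0, 1, 2}
```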