## Lecture 11

The objectives of this lecture are to:

1. Learn how Python stores collections of objects: lists, tuples, and dictionaries.
2. Create, modify, and manipulate `list` collections.
3. Explore collections of collections!

# Collections of objects

Python, like most programming languages, has built-in data types to store collections of objects. In Python these collections can be either ordered sequences whose items are accessed by position (`list`, `tuple`) or collections that are not accessed by position (`set`, `dict`). In this introduction to collections, we will focus on lists and cover tuples and dictionaries later.

Before we jump into creating and manipulating lists, we need to learn about the concept of *mutability*. All of the datatypes that we have used thus far (`int`, `float`, `str`, etc.) have been *immutable*. That is, the value of the object cannot be changed; instead, a new object is created. To illustrate this, remember that small integer values are unique objects (at least in the standard CPython interpreter), so two variables assigned the same value refer to the same object,

In [None]:
i = 1
j = 1

print(id(i), id(j))

If we reassign a variable that refers to an immutable object, the object itself is not changed; the variable simply refers to a different object,

In [None]:
j = 2

print(id(i), id(j))

Even strings are immutable in Python. Remember that in the whole lecture introducing them we never tried to change a single character in a string; attempting to do so raises an error,

In [None]:
string = "Hello World!"

string[0] = "h"  # raises TypeError: strings do not support item assignment

In [None]:
print(id(string))

string += "!!!"

print(id(string))

As will be illustrated in this and later lectures, collections in Python can be either immutable (tuples) or *mutable* (lists, dictionaries). This can cause significant issues for the beginner programmer, but hopefully careful study of this lecture (and the textbook) will help you avoid these issues.

# Creating and manipulating lists of data

Given a sample set of data, a table that shows the number of gray whales observed daily over a 14-day period:


<table>
    <tr>
        <td><b>Day</b></td>
        <td><b>Number of Whales</b></td>
    </tr>
        <tr>
            <td>1</td>
            <td>5</td>
        </tr>
        <tr>
            <td>2</td>
            <td>4</td>
        </tr>
        <tr>
            <td>3</td>
            <td>7</td>
        </tr>
        <tr>
            <td>4</td>
            <td>3</td>
        </tr>
        <tr>
            <td>5</td>
            <td>2</td>
        </tr>
        <tr>
            <td>6</td>
            <td>3</td>
        </tr>
        <tr>
            <td>7</td>
            <td>2</td>
        </tr>
        <tr>
            <td>8</td>
            <td>6</td>
        </tr>
        <tr>
            <td>9</td>
            <td>4</td>
        </tr>
        <tr>
            <td>10</td>
            <td>2</td>
        </tr>
        <tr>
            <td>11</td>
            <td>1</td>
        </tr>
        <tr>
            <td>12</td>
            <td>7</td>
        </tr>
        <tr>
            <td>13</td>
            <td>1</td>
        </tr>
        <tr>
            <td>14</td>
            <td>3</td>
        </tr>
        </table>
           
          
           

storing that information using our existing knowledge of Python would be tedious. We would need to create a separate `int` variable for each data point: `day1`, `day2`, ..., and `day14`. As the number of data points increases, this approach quickly becomes infeasible. This is exactly the situation in which we would use a `list` datatype, which is a built-in type in Python and thus has special syntax for its initialization,

`list_label = [item1, item2, ...]`

Lists are heterogeneous datatypes: each *item* in the list need not be the same type, although typically this is the case. Furthermore, an item can be an expression, which is evaluated before the list is created. From our table of data, we can form a list where the position of each item corresponds to the day on which the data was recorded,

In [None]:
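# two ways to create a list: the literal syntax and the list() constructor
# (list(whales) builds a new list object holding the same items)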
whales = [5, 4, 7, 3, 2, 3, 2, 6, 4, 2, 1, 7, 1, 3]
whales1 = list(whales)
print(id(whales), id(whales1))
print(whales, whales1)

Using our memory model, the state of memory after executing the code above is represented by,


<center>
<img src='files/./images/lecture11/pg135.jpg'>
</center>
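One detail worth checking is that `list(whales)` builds a new list object rather than a second name for the same list. The following is a minimal sketch of that difference; the names `whales_alias` and `whales_copy` are introduced only for this illustration.

```python
whales = [5, 4, 7, 3, 2, 3, 2, 6, 4, 2, 1, 7, 1, 3]

whales_alias = whales        # plain assignment: another name for the same list
whales_copy = list(whales)   # list(): a new list holding the same items

# the alias shares its id with the original; the copy does not
print(id(whales_alias) == id(whales))
print(id(whales_copy) == id(whales))

# the copy still compares equal, item by item, to the original
print(whales_copy == whales)
```

The distinction matters once lists are modified, since a change made through an alias is visible through every name that refers to the same list.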

So a list is essentially an object that allows you to create, assign, and manipulate an ordered sequence of items. The convenience of using a list is realized when we access its elements, which is achieved through *indexing* into the list,

In [None]:
whales[1]

The indexing syntax seems relatively straightforward, except for the fact that index $1$ returned the second list item! Python uses indexing syntax which is consistent with most programming languages (`C`, `C++`, `Java`) where the first item has index $0$, the second item has index $1$, and so on such that the last item has index $l-1$ where $l$ is the number of items in the list,

In [None]:
# first item
print(whales[0])

# last item
print(whales[13])

Intuitively, indexing into a list with an index $\ge l$ will result in an error (an `IndexError`),

In [None]:
whales[14]  # raises IndexError: the valid indices are 0 through 13

Unlike most programming languages, Python provides some useful indexing syntax involving negative indexing. This allows you to index starting from the *end* of a list without knowing how long it is,

In [None]:
# last item
print(whales[-1])

# third from the last item
print(whales[-3])

Once again, indexing outside of the list in the reverse direction has the same consequences,

In [None]:
# first item
print(whales[-14])

print(whales[-15])  # one position before the first item: raises IndexError
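The relationship between negative and positive indices can be checked directly: for a list of length $l$, index $-k$ refers to the same item as index $l-k$. The sketch below assumes the `whales` list from above and uses `len()` (covered later in this lecture) together with a `try`/`except` block, which is only one possible way to observe the error without stopping the program.

```python
whales = [5, 4, 7, 3, 2, 3, 2, 6, 4, 2, 1, 7, 1, 3]
n = len(whales)   # number of items in the list

# a negative index -k refers to the same item as the positive index n - k
print(whales[-1] == whales[n - 1])
print(whales[-3] == whales[n - 3])
print(whales[-n] == whales[0])

# valid indices run from -n to n - 1; anything outside raises IndexError
try:
    whales[-n - 1]
except IndexError as error:
    print("IndexError:", error)
```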

Since each list item is an object, we may assign it to a variable, which is frequently convenient,

In [None]:
day2 = whales[1]

print("The number of whales observed on day 2 was: ", day2)

But be careful: since the item type (`int`) is immutable, reassigning the variable has no effect on the value of the list item,

In [None]:
day2 = 10

whales[1]

We will learn how to change the value of a list item in the next section, but first let's see an example of a list with heterogeneous items,

In [None]:
numbers = [1, "one", 2, "Three", 4., 5. + 0*1j]

type(numbers)

A heterogeneous list has the same type as a homogeneous one; there is no difference between them. We may index into the list as before, use negative indices, etc. There are very few scenarios where using heterogeneous lists is justified, so in practice we avoid them. Heterogeneous collections are still very useful, but we will see that a dictionary is a better choice in that case.

In [None]:
print(numbers[0], numbers[-1])

We may also create a list that contains no items; this is called an *empty list*. This may seem like a strange thing to do, but it is actually quite convenient when you do not know *a priori* how many items you will need in a list,

In [None]:
l = []

type(l)

This implies that we can add items to a list, which we will learn about in the next section.
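As a brief preview, here is a minimal sketch of growing a list from empty using the list method `append`, which also appears in the exercises at the end of this lecture. The name `observations` and the values are illustrative only.

```python
observations = []          # start with an empty list
print(len(observations))   # an empty list has no items

# append() adds one item to the end of the list, modifying it in place
observations.append(5)
observations.append(4)
observations.append(7)

print(observations)
print(len(observations))
```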


### Manipulating lists

We have learned how to create lists and some basic concepts about mutability in Python. Now let's learn how to modify a list. Consider a list of strings containing the names of the noble gases,

In [None]:
# we purposefully make a mistake when typing "neon" and type "none" instead
nobles = ['helium', 'none', 'argon', 'krypton', 'xenon', 'radon']

After executing the code above, the memory model that represents the current state of memory is,


<center>
<img src='files/./images/lecture11/pg137.jpg'>
</center>

To correct our mistake we want to reassign the second item in the list. This is accomplished by indexing to the item and using the assignment operator,

In [None]:
# reassign the value of the second item
nobles[1] = "neon"

print(nobles)

Remember, the list object is mutable but strings are not. The string object with value "none" was not changed; instead, a new string object was created and the second item in the list now refers to it. The memory model corresponding to this is,


<center>
<img src='files/./images/lecture11/pg138.jpg'>
</center>
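The contrast between a mutable list and an immutable string can also be observed with `id()`. The following sketch is not part of the original example; the names `word`, `word_alias`, `items`, and `items_alias` are introduced only for illustration.

```python
# Strings are immutable: "+=" builds a new string object and rebinds the name.
word = "Hello"
word_alias = word          # a second name for the same string object
word += "!"                # creates a new object; word_alias is untouched

# Lists are mutable: item assignment changes the existing object in place.
items = [1, 2, 3]
items_alias = items        # a second name for the same list object
id_before = id(items)
items[0] = 100             # modifies the list that both names refer to
id_after = id(items)

print(word, word_alias)        # the two names now refer to different strings
print(items, items_alias)      # both names see the modified list
print(id_before == id_after)   # the list kept its identity
```

Because both `items` and `items_alias` refer to one list object, a change made through either name is visible through the other; this is the behavior that most often surprises beginners.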

We have now seen two examples of list indexing with the assignment operator where the indexed list is on the (i) right and (ii) left of the assignment operator.

In [None]:
# indexed list on the right side of assignment, create a variable which references the list item object
neon_string = nobles[1]

# indexed list on the left side of the assignment, reassign the value of a list item
nobles[1] = "none"

# can you predict what the output of this function call will be?
print(neon_string, nobles)

There are many other ways to manipulate a list and Python provides many built-in functions for this purpose,


<table>
<tr>
    <td><b>Function</b></td>
    <td><b>Description</b></td>
</tr>
<tr>
    <td>len(L)</td>
    <td>Returns the number of items in list L</td>
</tr>
<tr>
    <td>max(L)</td>
    <td>Returns the maximum value in list L</td>
</tr>
<tr>
    <td>min(L)</td>
    <td>Returns the minimum value in list L</td>
</tr>
<tr>
    <td>sum(L)</td>
    <td>Returns the sum of the values in list L</td>
</tr>
<tr>
    <td>sorted(L)</td>
    <td>Returns a copy of list L where the items are in order from smallest to largest (This does not mutate L)</td>
</tr>
</table>

Let's learn how these functions behave by example using a list of strings and of numbers (from above).

In [None]:
# `len()` returns the number of items in a list
len(whales), len(nobles)

In [None]:
# `max()`/`min()` return the maximum/minimum value in a list; this only makes
# sense for a homogeneous list (strings are compared alphabetically)
print(max(whales), max(nobles))
print(min(whales), min(nobles))

In [None]:
# `sum()` returns the sum of all of the list items in order of index
sum(whales)

In [None]:
# `sorted()` returns a *new* list with item references in order of least to greatest
whales_sorted = sorted(whales)
nobles_sorted = sorted(nobles)

print(whales, whales_sorted)
print(nobles, nobles_sorted)
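These built-in functions combine naturally into simple summary statistics. The sketch below, which assumes the `whales` list from earlier, computes the mean, the range, and the median of the daily counts; the variable names are illustrative.

```python
whales = [5, 4, 7, 3, 2, 3, 2, 6, 4, 2, 1, 7, 1, 3]

total = sum(whales)                  # total number of whales observed
days = len(whales)                   # number of observation days
mean_whales = total / days           # average number of whales per day
spread = max(whales) - min(whales)   # difference between best and worst day

# the median uses the sorted copy; with an even number of days it is the
# average of the two middle items
whales_sorted = sorted(whales)
middle = days // 2
median_whales = (whales_sorted[middle - 1] + whales_sorted[middle]) / 2

print(total, days, mean_whales)
print(spread, median_whales)
```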

The `+`, `*`, and `in` operators are also defined for lists and behave similarly to the same operations on strings,

In [None]:
# concatenate the two lists
whales_nobles = whales + nobles

print(whales_nobles)

In [None]:
# multiply the list to create a new one
twice_the_nobles = nobles * 2

print(twice_the_nobles)

In [None]:
# check if a string value is in the list
gas = "neon"

gas in nobles

In [None]:
gas = "krypton"

gas in nobles

The `in` operator checks, item by item, whether a value is equal to one of the list items. Unlike with strings, where it finds substrings, it will not identify sub-lists.

In [None]:
string = "Hello World!"
substring = "Hello"

l = [1, 2, 3]
sl = [1, 2]

print(substring in string, sl in l)
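The itemwise behavior becomes clearer with a nested list: `in` returns `True` for a list only when that list is itself one of the items. A minimal sketch, where `nested` is a name introduced for this illustration:

```python
l = [1, 2, 3]
sl = [1, 2]
nested = [[1, 2], 3]   # a list whose first item is itself a list

print(sl in l)         # False: no single item of l is equal to [1, 2]
print(sl in nested)    # True: the first item of nested is equal to [1, 2]
print(1 in nested)     # False: 1 is inside an inner list, not an item of nested
```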

### "Slicing" Lists

*Slicing* a list is indexing syntax which allows you to create a sub-list of items from an existing list. You will become very adept at this later in the course, since it is a frequently used operation in scientific computing. The format for slicing a list is,

`sublist = list[start:stop:increment]`

The expression on the right-hand side of the assignment creates a new list containing items from `list`, starting at index `start`, stepping by `increment`, and stopping before index `stop` is reached. Let's say we want a list with the number of whales observed every other day (days 2, 4, ..., 14) instead of every day,

In [None]:
whales_everyother = whales[1:15:2]

whales_everyother

If `start` or `stop` is omitted, the interpreter assumes `start=0` or `stop=len(list)`, respectively (for a positive `increment`),

In [None]:
whales_everyother = whales[1::2]

whales_everyother
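A few more slicing patterns are sketched below using the same `whales` data. The variable names are illustrative, and the comments describe which indices each slice selects.

```python
whales = [5, 4, 7, 3, 2, 3, 2, 6, 4, 2, 1, 7, 1, 3]

first_week = whales[:7]          # indices 0 through 6 (days 1 to 7)
second_week = whales[7:]         # indices 7 through 13 (days 8 to 14)
odd_days = whales[::2]           # indices 0, 2, 4, ... (days 1, 3, 5, ...)
reversed_whales = whales[::-1]   # a negative increment walks backwards
last_four = whales[10:100]       # a stop beyond the end is clipped, no error

print(first_week, second_week)
print(odd_days)
print(reversed_whales)
print(last_four)

# slicing always builds a new list; the original is left unchanged
print(first_week + second_week == whales)
```

Note that, unlike indexing with a single out-of-range index, a slice whose bounds extend past the end of the list does not raise an error.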

# Exercises


### Pragprog 

**1.** The variable `kingdoms` refers to the list `['Bacteria', 'Protozoa', 'Chromista', 'Plantae', 'Fungi', 'Animalia']`. Using `kingdoms` and either slicing or indexing with positive
indices, write expressions that produce the following:

In [None]:
kingdoms = ['Bacteria', 'Protozoa', 'Chromista', 'Plantae', 'Fungi', 'Animalia']

a. The first item of kingdoms

b. The last item of kingdoms

c. The list ['Bacteria', 'Protozoa', 'Chromista']

d. The list ['Chromista', 'Plantae', 'Fungi']

e. The list ['Fungi', 'Animalia']

f. The empty list

**2.** Repeat the previous question using negative indices.

**3.** The variable `appointments` refers to the list `['9:00', '10:30', '14:00', '15:00', '15:30']`.
An appointment is scheduled for 16:30, so `'16:30'` needs to be added to the
list.

In [None]:
appointments = ['9:00', '10:30', '14:00', '15:00', '15:30']

a. Using the list method `append`, add `'16:30'` to the end of the list that
`appointments` refers to.

b. Instead of using `append`, use the `+` operator to add `'16:30'` to the end of
the list that `appointments` refers to.

c. You used two approaches to add '16:30' to the list. Which approach
modified the list and which approach created a new list?

**4.** The variable `ids` refers to the list `[4353, 2314, 2956, 3382, 9362, 3900]`. Using list
methods, do the following:

In [None]:
ids=[4353,2314,2956,3382,9362,3900]

a. Remove 3382 from the list.

b. Get the index of 9362.

c. Insert 4499 in the list after 9362.

d. Extend the list by adding [5566, 1830] to it.

e. Reverse the list.

f. Sort the list.

**5.** a. Assign a list that contains the atomic numbers of the six alkaline
earth metals, namely beryllium (4), magnesium (12), calcium (20), strontium
(38), barium (56), and radium (88), to a variable called `alkaline_earth_metals`.

b. Which index contains radium's atomic number? Write the answer in
two ways, one using a positive index and one using a negative index.

c. Which function tells you how many items there are in `alkaline_earth_metals`?

d. Write code that returns the highest atomic number in `alkaline_earth_metals`.

**6.** a. Create a list of temperatures in degrees Celsius with the values 25.2,
16.8, 31.4, 23.9, 28, 22.5, and 19.6, and assign it to a variable called
`temps`.

b. Using one of the list methods, sort `temps` in ascending order.

c. Using slicing, create two new lists, `cool_temps` and `warm_temps`, which
contain the temperatures below and above 20 degrees Celsius,
respectively.

d. Using list arithmetic, recombine `cool_temps` and `warm_temps` into a new
list called `temps_in_celsius`.

**7.** Complete the examples in the docstring and then write the body of the
following function:

In [None]:
def same_first_last(L):
    """ (list) -> bool
    Precondition: len(L) >= 2
    Return True if and only if the first item of the list is the same as the
    last.
    >>> same_first_last([3, 4, 2, 8, 3])
    True
    >>> same_first_last(['apple', 'banana', 'pear'])
    >>> same_first_last([4.0, 4.5])
    """

**8.** Complete the examples in the docstring and then write the body of the
following function:

In [None]:
def is_longer(L1, L2):
    """ (list, list) -> bool
    Return True if and only if the length of L1 is longer than the length
    of L2.
    >>> is_longer([1, 2, 3], [4, 5])
    True
    >>> is_longer(['abcdef'], ['ab', 'cd', 'ef'])
    >>> is_longer(['a', 'b', 'c'], [1, 2, 3])
    """

**9.** Draw a memory model showing the effect of the following statements:

In [None]:
values = [0, 1, 2]
values[1] = values

**10.** The `units` variable refers to the nested list `[['km', 'miles', 'league'], ['kg', 'pound', 'stone']]`.

In [None]:
units=[['km','miles','league'],['kg','pound','stone']]

Using units and either slicing or indexing with positive indices,
write expressions that produce the following:

a. The first item of units (the first inner list)

b. The last item of units (the last inner list)

c. The string 'km'

d. The string 'kg'

e. The list ['miles', 'league']

f. The list ['kg', 'pound']

**11.** Repeat the previous question using negative indices.