# <div class = "alert alert-info"> <font color = purple> Chapter 04 - Collection Data Types 1: Python Lists

We have thus far looked at some operations used to handle the following basic data types:

- Integer `int`
- Floating point number `float`
- Boolean `bool`
- Character (a `str` of length 1; `chr()` is a function that returns one, not a separate data type)
- String `str`

In this chapter, we shall look at how data can be compounded and handled as a collection.

In Python, data can be compounded into the following collection data types:

- list `list` 
- tuple `tuple`
- dictionary `dict`
- set `set`

Each collection data type has its own properties and set of operations which may be similar to or different from another collection data type.
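
As a quick preview, the four collection types might be written as in the sketch below. The variable names and values are made up for illustration; only lists are covered in detail in this chapter.

```python
# Hypothetical example values, chosen only to show the literal syntax.
fruit_list = ['apple', 'banana', 'apple']       # list: ordered, can be modified
fruit_tuple = ('apple', 'banana', 'apple')      # tuple: ordered, cannot be modified
fruit_prices = {'apple': 0.5, 'banana': 0.25}   # dict: maps keys to values
fruit_set = {'apple', 'banana', 'apple'}        # set: keeps unique elements only

print(type(fruit_list), len(fruit_list))
print(type(fruit_tuple), len(fruit_tuple))
print(type(fruit_prices), len(fruit_prices))
print(type(fruit_set), len(fruit_set))          # the duplicate 'apple' is dropped
```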

## 4.1 Python lists

A Python list is an ordered collection of data that can be modified after it is created.

You can initialise a Python list from a collection of data by enclosing them in a pair of square brackets `[]` and separating each piece of data with a comma `,`.

Try the following lines of code one by one in the cell below:

1. `list_ = [1, 2, 3, 4, 5]`
2. `list_ = ['apple', 'blueberry', 'carrot',]`
3. `list_ = [1, '2', 3.0, True, 'apple']`
4. `list_ = []`

In [None]:
# A list can contain integers
list_ = [i for i in range(1, 6)]

In [39]:
# A list can contain strings too. It will also ignore one trailing comma at the end of the list.
list_ = ['apple', 'blueberry', 'carrot', ]

In [None]:
# A list can even contain a mixture of different data types.
# May not be meaningful
list_ = [1, '2', 3.0, True, 'apple']

In [2]:
# This is how you initialise an empty list
list_ = []
list_

[]

### 4.1.1 The `list()` Constructor

The `list()` constructor can be used to create a list from existing data that can be iterated, e.g. strings.

In [43]:
# Iterating through a string to create a list of characters

list('apple')

['a', 'p', 'p', 'l', 'e']

The `list()` constructor cannot be used on data that cannot be iterated.

In [44]:
# Integers and floats are examples of data that cannot be iterated, so list(300) raises a TypeError.
# Converting them to strings first makes them iterable.

list(str(300))

list(str(4.0))

['4', '.', '0']
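
The difference between iterable and non-iterable data can be sketched as follows. The `try`/`except` statement is used here only so that the cell keeps running after the error; it has not been introduced in this chapter, and the variable names are made up for illustration.

```python
# list() accepts any iterable, for example a string or a range.
letters = list('apple')
numbers = list(range(3))

# An integer is not iterable, so list(300) raises a TypeError.
try:
    list(300)
except TypeError as error:
    error_message = str(error)

# Converting the number to a string first gives a list of its characters.
digits = list(str(300))

print(letters)
print(numbers)
print(error_message)
print(digits)
```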

### 4.1.2 List indexing

List indexing works similarly to string indexing, because strings and lists are both sequence types. Unlike a list, however, a string cannot be modified after it is created.

Each item in a list is called an **element**. The elements of a list can be retrieved using their **index**.

The first element of a list has index `0`. Hence, for a list named `mylist`, `mylist[0]` returns the first element of the list; `mylist[1]` returns the second element, and so on.

Try the following lines of code one by one in the cell below:

`mylist = [1, 2, 3, 4, 5]`
1. `type(mylist)`
2. `mylist[0]` 
3. `mylist[5]` 
4. `mylist[1.0]` 
5. `mylist[]`
6. `mylist[-1]` 

In [45]:
mylist = [1, 2, 3, 4, 5] # Do not remove this line.

# A list is a type of object in Python
type(mylist)

list

In [50]:
# Try addressing mylist with other integers besides 0 in mylist[0]
print(mylist[0])
print(mylist[1])
print(mylist[2])
print(mylist[3])
print(mylist[4])

1
2
3
4
5


In [51]:
# The largest index in a list of length n is n-1 as indices start from 0. 
# Attempting to address a list with an index greater than n-1 produces an IndexError
print(mylist[5])

IndexError: list index out of range

In [52]:
# A list index must be an integer.
mylist[1.0]

TypeError: list indices must be integers or slices, not float

In [53]:
# You cannot address a list without an index. 
# To refer to the entire list as an object, use the list's variable name, i.e. mylist.
mylist[]

SyntaxError: invalid syntax (<ipython-input-53-9d908f1247aa>, line 3)

In [56]:
# Negative integer indices address the list backward. 
# Try indexing mylist with -1, -2,-3, etc. 
# What happens when you go below -5? IndexError, out of range
mylist[-5]

1
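
One way to read negative indices, sketched below: for a list of length `n`, the index `-k` refers to the same element as the index `n - k`. The variable names are made up for illustration.

```python
mylist = [1, 2, 3, 4, 5]
n = len(mylist)

last = mylist[-1]               # same element as mylist[n - 1]
also_last = mylist[n - 1]

first = mylist[-5]              # same element as mylist[n - 5], i.e. mylist[0]
also_first = mylist[n - 5]

print(last, also_last)
print(first, also_first)
```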

### 4.1.3 List slicing

List slicing allows you to return multiple elements from a list, just as string slicing allows you to return substrings of characters from a string.

Try the following lines of code one by one in the cell below:

`yourlist = ["Alice", "Bob", "Charlie", "Dawn", "Ernest"]`
1. `yourlist[0:0]`
2. `yourlist[0:1]` 
3. `yourlist[0:4]` 
4. `yourlist[0:6]` 
5. `yourlist[0:5:2]` 
6. `yourlist[0:4:2]`
7. `yourlist[0::2]`
8. `yourlist[::-1]`
9. `yourlist[0:4:-1]`
10. `yourlist[4:0:-1]`
11. `type(yourlist[0:0])`

In [57]:
yourlist = ["Alice", "Bob", "Charlie", "Dawn", "Ernest"] # Do not remove this line.

# This should give you an empty list.

yourlist[0:0]

[]

In [None]:
# How does yourlist[0:1] differ from yourlist[0]
print(yourlist[0:1])
print(yourlist[0])

In [58]:
# Why doesn't yourlist[0:4] return the whole list?
yourlist[0:4]
# Note: upper limit is excluded

['Alice', 'Bob', 'Charlie', 'Dawn']

In [None]:
# If the slice exceeds the largest index, you won't get an error but those indices will be ignored.
yourlist[0:100]

In [None]:
# What is the significance of the third number in a slicing operation? 
yourlist[0:5:2]

In [None]:
# Why is `yourlist[4]` not in the result yourlist[0:4:2]?
print(yourlist[4])
print(yourlist[0:4:2])

In [59]:
# What happens when you remove any of the numbers in the slicing operation? 
# Can you figure out the default values when they are not included in the slicing operation?
print(yourlist[:4:2]) # Start to specified end
print(yourlist[0::2]) # Specified start to end
print(yourlist[::0]) # A step of 0 is not allowed and raises a ValueError

['Alice', 'Charlie']
['Alice', 'Charlie', 'Ernest']


ValueError: slice step cannot be zero

In [None]:
# How do you reverse a list?
yourlist[::-1]

In [60]:
# Why does yourlist[0:4:-1] not work?
# With a negative step the start must come after the stop, as in the slice below.
yourlist[4:0:-1]


['Ernest', 'Dawn', 'Charlie', 'Bob']

In [None]:
# Is yourlist[4:0:-1] the same as yourlist[::-1]?

In [None]:
# A list indexed with a slice always returns a list, even if the result is only an empty list.
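
The last few questions can be explored with a sketch like the one below. The variable names on the left are made up for illustration.

```python
yourlist = ["Alice", "Bob", "Charlie", "Dawn", "Ernest"]

reversed_all = yourlist[::-1]       # the whole list, backward
reversed_part = yourlist[4:0:-1]    # stops before index 0, so "Alice" is left out
empty_slice = yourlist[0:4:-1]      # stepping backward from index 0 never reaches index 4
one_element = yourlist[0:1]         # a slice is a list, even with a single element

print(reversed_all)
print(reversed_part)
print(empty_slice, type(empty_slice))
print(one_element, yourlist[0])     # a one-element list versus the element itself
```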

### 4.1.4 List editing methods

A list has built-in methods that allow you to add or remove elements from it.

Try the expressions in the cell below, one by one:  
(Remember that the assignment (`=`) operator does not produce any output.)

`countries = ["America", "Brazil", "Cambodia", "Dominican Republic", "Ethiopia", "France", "Germany", "Hungary"]`  

1. `dir(countries)` 
2. `countries.append("India")` 
3. `countries.append(["India","Japan"])` 
4. `countries.extend(["India","Japan"])` 
5. `countries.extend("India","Japan")` 
6. `countries.extend("India")` 
7. `countries.insert(4,"England")`
8. `countries.remove("China")`
9. `countries.remove(0)` 
10. `countries.remove(countries[0])`
11. `item = countries.pop()` 
12. `item = countries.pop(1)` 
13. `countries.clear()` 
14. `del countries[0]` 
15. `del countries[0:3]` 

In [9]:
# Do not remove this line.
countries = ["America", "Brazil", "Cambodia", "Dominican Republic", "Ethiopia", "France", "Germany", "Hungary"]

# This gives you a list of attributes that the countries list object has. 
# We will cover attributes in a later lesson, in Object-Oriented Programming
dir(countries)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [10]:
# Adds a single item at the end of the list. 
# Calling an object method will not print its value; do you remember how to check an object's value?
countries.append('India')
countries

['America',
 'Brazil',
 'Cambodia',
 'Dominican Republic',
 'Ethiopia',
 'France',
 'Germany',
 'Hungary',
 'India']

In [11]:
# Did this produce the effect you expected?
countries.append(['India', 'Japan'])
countries

['America',
 'Brazil',
 'Cambodia',
 'Dominican Republic',
 'Ethiopia',
 'France',
 'Germany',
 'Hungary',
 'India',
 ['India', 'Japan']]

In [12]:
# Adds multiple elements, which must be in an iterable such as a list. How does this differ from append()?
countries.extend(['India', 'Japan'])

In [None]:
# The extend() method accepts exactly one argument.
# What exactly is one value then?
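
The difference between `append()` and `extend()` can be sketched on a shortened list. The three list names below are made up for illustration.

```python
appended = ['America', 'Brazil']
appended.append(['India', 'Japan'])    # the whole list becomes ONE new element

extended = ['America', 'Brazil']
extended.extend(['India', 'Japan'])    # each item of the argument is added separately

from_string = ['America', 'Brazil']
from_string.extend('India')            # a string is iterable, so each character is added

print(appended)
print(extended)
print(from_string)
```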

In [23]:
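# If you use the extend() method with a string instead of a list, what happens? Why?
# Note: the code below does not call extend(). It inserts 'England' at index 4 by slicing
# and concatenation, and each run of the cell inserts another copy (hence the repeats below).
countries = countries[:4] + ['England'] + countries[4:]
countries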

['America',
 'Brazil',
 'Cambodia',
 'Dominican Republic',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'Ethiopia',
 'France',
 'Germany',
 'Hungary',
 'India',
 ['India', 'Japan'],
 'India',
 'Japan']

In [13]:
# The insert(n,item) method lets you insert item at the nth index.
countries.insert(4, 'England')

In [25]:
# You can only remove elements that already exist in the list. 
# Try this with a country that is already in the list.
countries.remove('Cambodia')

In [26]:
# You can't remove the first item in a list this way. Remember that list indexes go in [square brackets].

['America',
 'Brazil',
 'Dominican Republic',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'England',
 'Ethiopia',
 'France',
 'Germany',
 'Hungary',
 'India',
 ['India', 'Japan'],
 'India',
 'Japan']

In [None]:
# Can you explain why this works?

In [None]:
# pop() removes the last element from the list and returns it. 
# You can assign it to a variable, in this case item.

In [None]:
# pop(int) removes the list element at the (int) index stated and returns it. 
# You can assign it to a variable.

In [None]:
# What does clear() do?

In [None]:
# To delete an element from a list, use the del keyword on the list element specified by index.

In [None]:
# You can use the del keyword to delete multiple elements by list slicing.
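
The removal methods described in the cells above can be compared in a sketch like this one, which uses a shortened list. The names `last_item`, `second_item`, `after_pops` and `after_del` are made up for illustration.

```python
countries = ['America', 'Brazil', 'Cambodia', 'Dominican Republic', 'Ethiopia']

countries.remove('Cambodia')     # removes the first element equal to 'Cambodia'
last_item = countries.pop()      # removes and returns the last element
second_item = countries.pop(1)   # removes and returns the element at index 1
after_pops = countries[:]        # a copy, kept to show the list at this point
print(after_pops, last_item, second_item)

del countries[0]                 # deletes the element at index 0
after_del = countries[:]
print(after_del)

countries.clear()                # empties the list in place
print(countries)
```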

### 4.1.5 List operators

The operators `+` and `*` work with lists. So does the `in` keyword.

Try the following expressions in the cell below:
(Notice that the list methods above modify the original list. The operators `+` and `*` below do not; they return a new list. The augmented forms `+=` and `*=` do modify the original list.)

`anewlist = ["America", "Brazil", "Cambodia", "Dominican Republic", "Ethiopia", "France", "Germany", "Hungary"]`
1. `anewlist + "Ireland"` (Doesn't work. The error provides a clue: lists can only be "added" to other lists, not to strings.)
2. `anewlist + ["Ireland"]` (This works. You have to convert the string into a list element by putting it in a list first. This is called **list concatenation**. A **new list** is returned containing the result.)
3. `anewlist + ["Ireland","Japan","Kenya"]` (You can add lists with multiple elements together this way.)
4. `anewlist += ["Ireland"]` (This gives the same result as `anewlist = anewlist + ["Ireland"]`, but it extends the original list in place instead of creating a new one.)
5. `anewlist[0] = "Australia"` (To reassign a list element to a different value, address the element using its index.)
6. `anewlist[0] = Australia` (Remember that strings need the quote marks `''` or `""` otherwise they get interpreted as variables.)
7. `anewlist*3` (`*` operator with an integer works on lists too.)
8. `anewlist *= 3` (This gives the same result as `anewlist = anewlist*3`, but it modifies the original list in place.)
9. `'America' in anewlist` (The `in` keyword checks for exact matches with list elements.)
10. `['America'] in anewlist` (Does this work? Why or why not?)
11. `'Amer' in anewlist` (Although `'Amer' in 'America'` returns `True`, this statement returns `False`. Why?)

In [None]:
anewlist = ["America", "Brazil", "Cambodia", "Dominican Republic", "Ethiopia", "France", "Germany", "Hungary"]

# Type your code below this line.
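
A few of the expressions above, sketched on a shortened list. The names `longer`, `repeated` and `alias` are made up for illustration.

```python
anewlist = ['America', 'Brazil']   # a shortened list, for illustration only

longer = anewlist + ['Ireland']    # concatenation returns a new list
repeated = anewlist * 2            # repetition also returns a new list
print(longer)
print(repeated)
print(anewlist)                    # the original list is unchanged so far

print('America' in anewlist)       # True: an element equals 'America'
print('Amer' in anewlist)          # False: no element equals 'Amer'
print(['America'] in anewlist)     # False: no element is itself the list ['America']

alias = anewlist                   # a second name for the same list object
anewlist += ['Ireland']            # extends the existing list in place
print(alias)                       # the change is visible through alias too
```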


### 4.1.6 Useful functions and methods for lists

We often need to know something about the list. Python has built-in functions to give us this information. You have already learnt the `type()` function, which tells us what type of object it is. 

Try the expressions in the cell below and indicate which functions and methods modify the original list.

`numberlist = [8, 7, 6, 5, 4, 3, 3, 2, 1]`

1. `len(numberlist)` (`len()` tells you how many elements a list has.)
2. `sorted(numberlist)` (`sorted()` gives you a list sorted in ascending order.)
3. `sum(numberlist)` (`sum()` returns the sum of all elements. Works on integers and floats only.)
4. `min(numberlist)` (`min()` returns the smallest element. Also works on strings.)
5. `max(numberlist)` (`max()` returns the largest element. Also works on strings.)
6. `numberlist.index(6)` (`.index(num)` gives you the index of the first occurrence of `num` in the list.)
7. `numberlist.count(3)` (`.count(num)` returns the number of occurrences of `num` in the list.)
8. `numberlist.reverse()`  -> Examine the value of `numberlist` again; what happened?
   (The `.reverse()` method reverses the order of elements in the list without returning any value. This is an alternative to the list-slicing method.)

In [None]:
numberlist = [8, 7, 6, 5, 4, 3, 3, 2, 1] # Do not remove this line.

# Type your code below this line.
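
To see which calls modify the original list, a sketch like the following may help. The names on the left of each assignment are made up for illustration.

```python
numberlist = [8, 7, 6, 5, 4, 3, 3, 2, 1]

length = len(numberlist)
total = sum(numberlist)
first_six = numberlist.index(6)     # index of the first occurrence of 6
threes = numberlist.count(3)        # number of occurrences of 3

ascending = sorted(numberlist)      # returns a new list
unchanged = numberlist[:]           # a copy, showing numberlist is still in its original order

result = numberlist.reverse()       # reverses in place and returns None

print(length, total, first_six, threes)
print(ascending)
print(unchanged)
print(result, numberlist)
```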


Try the expressions in the cell below to understand how these functions work for strings:

`countrylist = ["Hungary", "Germany", "France", "Ethiopia", "Dominican Republic", "Cambodia", "Brazil", "America"]`
1. `len(countrylist)`
2. `sorted(countrylist)`
3. `min(countrylist)`
4. `max(countrylist)`

Are there any differences in the way the functions work for integers and strings?

In [None]:
countrylist = ["Hungary", "Germany", "France", "Ethiopia", "Dominican Republic", "Cambodia", "Brazil", "America"]
# Do not remove the line above. Type your code below this line.
numlist = [i for i in range(1, 9)]

numlist.sort(reverse=True)    # sorts in place, in descending order
sorted(numlist)[::-1]         # sort() returns None, so use sorted() before slicing

### 4.1.7 A note on naming variables

We always write functions with the brackets e.g. `len()`. This is to avoid confusing them with variables. As much as possible, avoid naming your variables in a way that can confuse you.

**Negative example:** `len = len(numberlist)` (The first `len` is a variable; the second `len()` is a function)  
**Positive example:** `list_len = len(numberlist)` or `list_length = len(numberlist)`

### 4.1.8 Iterating over lists

Often, we need to perform more advanced instructions over each item in a collection, and the basic functions will not suffice. In such cases, we need to **iterate** over each item in the collection and carry out a procedure on each item. We can do that using **loops**.

### <u>Iterating with a `for` loop</u>

Run the following cell and observe the output:

In [None]:
positions = ['first','second','third','fourth','fifth']

for num in positions:
    print(f'value of positions: {positions}')
    print(f'value of num: {num}')

Notice how the `num` and `positions` variables are used.

`num` is a placeholder. When we start from the first element of `positions` (i.e. `positions[0]`), `num` temporarily holds the value of `positions[0]`. In each iteration, the value of `positions` remains the same, but the value of `num` changes.

**Task: Write a list**

Complete the code by replacing the underscores (`_____`) with appropriate variable names or strings.

In the code cell below, create a list containing the email addresses of your classmates, and print them out using a `for` loop.

Hint: Remember that strings need to be initialised within quotes (`''` or `""`)

In [None]:
emails = [_____,_____,_____,_____,_____]

for _____ in _____:
    # Type your code below
### BEGIN SOLUTION
### END SOLUTION

### <u>Exercise 1: Validate a list of phone numbers</u>

In the code cell below, complete the procedure by replacing the underscores (`_____`) with appropriate expressions to:

1. Validate each entry in the list `phone_numbers` to check that it is a valid phone number, i.e. obeys the following conditions:
   - has 8 digits  
   - starts with 6, 8, or 9  
2. Print out **only the invalid phone numbers**.

In [18]:
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5]
lst = []

for number in numbers:
    if number not in lst:
        lst.append(number)

lst


[1, 2, 3, 4, 5]

In [15]:
phone_numbers = [68476397,9448756,83674561,48697485] # Do not edit this line.

for number in phone_numbers:
    # Type your validation procedure below this line
    digits = str(number)
    is_valid = len(digits) == 8 and digits.startswith(('6', '8', '9'))
    if not is_valid:
        print(f'Invalid number! {number}')
### BEGIN SOLUTION
### END SOLUTION  

Invalid number! 9448756
Invalid number! 48697485


### <u>Exercise 2: Filtering lists with a `for` loop</u>

The `dir()` function returns all the attributes and methods associated with a Python object, in the form of a list. The object's **special methods** begin and end with a **d**ouble **under**score (`__`) and are also known as **dunder**s.

Complete the code in the code cell below using loops, string methods, and other relevant functions to write a procedure that prints out all the **non-dunder methods** associated with the `str` type.

In [16]:
str_methods = dir(str)

for method in str_methods:
    # Type your code below
    # Skip dunders, which both start and end with a double underscore
    if not (method.startswith('__') and method.endswith('__')):
        print(method)
### BEGIN SOLUTION
### END SOLUTION

capitalize
casefold
center
count
encode
endswith
expandtabs
find
format
format_map
index
isalnum
isalpha
isdecimal
isdigit
isidentifier
islower
isnumeric
isprintable
isspace
istitle
isupper
join
ljust
lower
lstrip
maketrans
partition
replace
rfind
rindex
rjust
rpartition
rsplit
rstrip
split
splitlines
startswith
strip
swapcase
title
translate
upper
zfill


### <u>Generating numbers for iteration: the `range()` function</u>

Run each of the following groups of code and observe the output:

1. `range(1,10)` (Hmm, this doesn't seem to do anything ...)
2. `list(range(1,10))` (`range()` **generates** a collection of numbers and it can be converted to a list! Notice that the end value is excluded; this is similar to slicing.)
3.  ```
    for n in range(1,10):
        print(n)
    ```
    (`range()` can be used in a `for` loop to generate numbers for iterating.)
4. `list(range(10))` (If only one value is given, this is assumed to be the end value. The start value is assumed to be 0.)
5. `list(range(1,10,2))` (Similar to slicing, if 3 values are given, the last value is the step size.)
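
The expressions above, collected into one sketch. The last line, with a negative step, is an extra illustration that is not in the list above.

```python
numbers = range(1, 10)
print(numbers)                   # shows range(1, 10), not the numbers themselves
print(list(numbers))             # 1 to 9: the end value is excluded
print(list(range(10)))           # the start value defaults to 0
print(list(range(1, 10, 2)))     # the third value is the step size
print(list(range(10, 0, -3)))    # a negative step counts down
```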

Which function in Python tells you how to use the `range()` function? Try it on the function and see what it tells you.

In [None]:
# Your code here

### <u>Iterating over a range of numbers with a `for` loop</u>

Suppose I have two lists:

```  
positions = ['first','second','third','fourth','fifth']
fruits = ['apple','banana','cherry','durian','elderberry']
```

How would I generate the following output?

  ```
  The first value is apple.
  The second value is banana.
  The third value is cherry.
  The fourth value is durian.
  ...
  ```

Can I do that with a `for` loop? Absolutely. But looping over `positions` alone gives us only the elements of `positions`, not the matching elements of `fruits`. Instead, we need to recognise that in the first iteration, we want the first elements from each list, and for the second iteration we need the second elements, and so on.

We need to have a way to generate indexes for each iteration. Python makes it easy to do that with the `range()` function.

Run the code cell below and observe the output:

In [11]:
positions = ['first','second','third','fourth','fifth']


print(f'{"Index":<15}{"Value":<15}')
print('_________________________________')
for position in positions:
    print(f'{positions.index(position):<15}{position:<15}')

Index          Value          
_________________________________
0              first          
1              second         
2              third          
3              fourth         
4              fifth          


In [None]:
positions = ['first','second','third','fourth','fifth']
fruits = ['apple','banana','cherry','durian','elderberry']

for i in range(len(positions)):
    ith = positions[i]
    name = fruits[i]
    print(f'The {ith} value is {name}.')
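
For comparison, the same output can also be produced without indexes by pairing the two lists with the built-in `zip()` function. This is only a sketch of an alternative; `zip()` has not been introduced in this chapter, and the `sentences` list is made up for illustration.

```python
positions = ['first', 'second', 'third', 'fourth', 'fifth']
fruits = ['apple', 'banana', 'cherry', 'durian', 'elderberry']

sentences = []
# zip() pairs the first elements, then the second elements, and so on.
for ith, name in zip(positions, fruits):
    sentences.append(f'The {ith} value is {name}.')

for sentence in sentences:
    print(sentence)
```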


### <u>Exercise 3: Predict the output</u>

What will the output look like with the following code?

  ```
  for i in range(1,len(positions),2):
      ith = positions[i]
      name = fruits[i]
      print(f'The {ith} value is {name}.')
  ```

What will the output look like with the following code? Why?

  ```
  for i in [0,1,2,3,4,5]:
      ith = positions[i]
      name = fruits[i]
      print(f'The {ith} value is {name}.')
  ```

What error will you get with the following code? Why?

  ```
  for i in [0,1,2,3,4,5]:
      ith = positions
      name = fruits
      print(f'The {ith} value is {name}.')
  ```

What will the output look like with the following code? Why?

  ```
  for i in range(len(fruits)):
      ith = positions(i)
      name = fruits(i)
      print(f'The {i} value is {name}.')
  ```

What will the output look like with the following code? Why?

  ```
  for i in range(0,len(positions)):
      print(f'The {positions[i]} value is {fruits[i]}.')
  ```

### Exercise 4: Make a menu

Write code in the code cell below to create a menu and ask the user for input.

Sample output:

```
    == Menu options ==
    1. Show the time
    2. Round a number to the nearest sf
    3. Round a number to the nearest dp
    4. Convert temperatures
    
    Choose an option (1-4): 
```

Your code should store the menu options in a list and generate the options in a `for` loop, so as to allow future developers to extend it easily.

In [None]:
"""
When creating long collections, Python allows you to break up the line of code 
for easier reading, so long as the line break happens after a comma (,) and
before the end of the collection object.

If you need to write multiline comments, the best way to do so is using three 
quote marks at the start and end, on a new line, like this comment.
"""

menu_options = ['Show the time',
                'Round a number to the nearest sf',
                'Round a number to the nearest dp',
                'Convert temperatures']

# Type your code below:
### BEGIN SOLUTION
print('== Menu options ==')
for n in range(len(menu_options)):
    print(f'{n + 1}. {menu_options[n]}')
print()
choice = input(f'Choose an option (1-{len(menu_options)}): ')
### END SOLUTION

### 4.1.9 Errors in Python: `IndexError`

When you try to obtain an element from a list using an index that is out of range, Python will halt and raise an `IndexError`. This happens when your index is equal to or larger than the length of the list, often because the wrong variable was used as the index. Slicing does not raise an `IndexError`: slice bounds that are out of range are simply ignored. Using a `str` instead of an `int` as an index raises a `TypeError`, not an `IndexError`.

In the code cell below, try to raise an `IndexError`.

In [12]:
fruits = ['apple','banana','cherry','durian','elderberry']

# Type your code below this line to raise an IndexError
fruits[5]

IndexError: list index out of range
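
The three situations can be told apart with a sketch like the one below. The `try`/`except` statement is used only so that the cell keeps running after each error; it has not been introduced in this chapter, and the `outcomes` list is made up for illustration.

```python
fruits = ['apple', 'banana', 'cherry', 'durian', 'elderberry']
outcomes = []

try:
    fruits[5]                      # valid indices are 0 to 4
except IndexError as error:
    outcomes.append(f'IndexError: {error}')

try:
    fruits['1']                    # a string is not a valid index
except TypeError as error:
    outcomes.append(f'TypeError: {error}')

outcomes.append(fruits[3:100])     # a slice past the end does not raise an error

for outcome in outcomes:
    print(outcome)
```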

### 4.1.10 List comparators

The comparison operators `<`, `>`, `<=`, `>=`, `==`, and `!=` work with lists as well.

Try the following expressions in the cell below:

1. `[1,2,3] == [1,2,3]`
2. `[1,2,3] == [1,2,4]`
3. `[1,2,3] <= [1,2,4]`
4. `[1,3,3] <= [1,2,4]`
5. `1 <= [1,2,4]`

How does the comparator `==` work for lists?

How does the `!=` comparator work for lists?

How do the `<` and `<=` comparators work for lists?

What does the `<` comparator do if the lists are of unequal length?
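
The questions above can be explored with a sketch like this one. Lists are compared element by element from the left, and the first pair of elements that differ decides the result. The variable names are made up for illustration, and `try`/`except` is used only so that the cell keeps running after the error.

```python
equal = [1, 2, 3] == [1, 2, 3]          # True: same length, same elements in order
different = [1, 2, 3] != [1, 2, 4]      # True: the last elements differ
smaller = [1, 2, 3] <= [1, 2, 4]        # True: the first difference is 3 versus 4
not_smaller = [1, 3, 3] <= [1, 2, 4]    # False: the first difference is 3 versus 2
prefix = [1, 2] < [1, 2, 0]             # True: a shorter list that matches so far is smaller

# An integer cannot be ordered against a list.
try:
    1 <= [1, 2, 4]
except TypeError as error:
    mismatch = str(error)

print(equal, different, smaller, not_smaller, prefix)
print(mismatch)
```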