# Lesson 4: Lists

## Recall from last week

### Errors
Error messages are thrown when something in the program makes it unable to run as intended.

Such messages contain some details about the nature of the error which can help us fix it.

### Strings
In some ways, strings are simply sequences of characters, which means that they largely behave like lists.

Strings are also objects with internal methods, which give us a wide range of functionality.
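As a quick refresher, here is a minimal sketch of how a string behaves like a sequence, and how its methods return a new string instead of changing the original. The variable names `word` and `shouted` are hypothetical and chosen only for illustration.

```python
word = 'checkers'  # a hypothetical example string

print(word[0])        # indexing works as with lists: c
print(word[-1])       # so do negative indices: s
print(word[2:5])      # and slices: eck
print('eck' in word)  # unlike with lists, 'in' also finds a run of characters: True

for character in word[:3]:  # strings are iterable too: prints c, h, e
    print(character)

shouted = word.upper()  # string methods return a NEW string...
print(shouted)          # CHECKERS
print(word)             # ...and the original is unchanged: checkers
```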

### Readable code
Code can be complete gibberish, or it can be clearly explained throughout. The latter helps both ourselves and our collaborators.

# Lists

## The basics

Lists are created with `[item1, item2, item3]`

Lists are **ordered**, meaning that items stay in the same order unless we move them around.

They can contain **anything**, even mixed data types!
`a_mixed_list = [4, 'a string', 9.2, False]`

Lists are **iterable**, meaning that we can loop over them:

```python
for item in some_list:
    print(item)
```

Last time you learned that strings are similar to lists. Now, let's take a closer look at lists themselves!

## List items have indices

We can use index numbers to retrieve an item or sequences of items in a list.

```python
# remember that we count from 0!
a_list = [3, 7, 34, 97, 54, 29]
print('the item with index [0]:', a_list[0])
print('the item with index [3]:', a_list[3])
```

```
the item with index [0]: 3
the item with index [3]: 97
```


```python
# we can change the item at a certain index
a_list[1] = 9
a_list
```

```
[3, 9, 34, 97, 54, 29]
```

```python
# we can also use negative (backwards) indices and slices;
# a slice includes the start index but stops just before the end index
print('the item with index [-2]:', a_list[-2])
print('the items in slice [3:5]:', a_list[3:5])
```

```
the item with index [-2]: 54
the items in slice [3:5]: [97, 54]
```
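The slicing rules can be a bit tricky at first. Here is a minimal sketch of the most common slice forms, assuming the list as it stands after the item at index 1 was changed to 9.

```python
a_list = [3, 9, 34, 97, 54, 29]

print(a_list[3:5])   # start is included, stop is excluded: [97, 54]
print(a_list[:2])    # no start means "from the beginning": [3, 9]
print(a_list[4:])    # no stop means "to the end": [54, 29]
print(a_list[-2:])   # the last two items: [54, 29]
print(a_list[::2])   # an optional third number is the step: [3, 34, 54]
print(a_list)        # slicing makes a new list; the original is unchanged
```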


## The _in_ operator

We can use the _in_ operator to check if a certain element is in a list.

```python
54 in a_list
```

```
True
```

```python
64 in a_list
```

```
False
```

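```python
# however, 'in' looks for a single item, not a sequence of items:
# this checks whether the list [3, 7] is itself one of the items in a_list
[3, 7] in a_list
```

```
False
```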
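If we actually want to know whether a sequence of items appears side by side in a list, we can slide a slice of the same length along the list and compare. Here is a minimal sketch that uses only loops, slices and `==`. The names `sequence` and `found` are hypothetical.

```python
a_list = [3, 9, 34, 97, 54, 29]
sequence = [34, 97]  # the run of items we are looking for

found = False
# try every start position where the whole sequence would still fit
for start in range(len(a_list) - len(sequence) + 1):
    if a_list[start:start + len(sequence)] == sequence:
        found = True
print(found)                   # True

# 'in' DOES find a list when that list is itself one of the items
print([3, 7] in [[3, 7], 5])   # True
```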

## The methods in a list

Lists also have internal methods.

```python
a_list
```

```
[3, 9, 34, 97, 54, 29]
```

```python
a_list.sort()
a_list
```

```
[3, 9, 29, 34, 54, 97]
```

```python
a_list.append(867)
a_list
```

```
[3, 9, 29, 34, 54, 97, 867]
```

```python
a_list.reverse()
a_list
```

```
[867, 97, 54, 34, 29, 9, 3]
```
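Lists have several other internal methods besides `.sort()`, `.append()` and `.reverse()`. Here is a short sketch of four more standard ones, starting from the list as it stands after `.reverse()`.

```python
a_list = [867, 97, 54, 34, 29, 9, 3]

a_list.insert(0, 1)      # put 1 at index 0, shifting the rest to the right
print(a_list)            # [1, 867, 97, 54, 34, 29, 9, 3]

a_list.remove(54)        # remove the first item equal to 54
print(a_list)            # [1, 867, 97, 34, 29, 9, 3]

last = a_list.pop()      # remove AND return the last item
print(last, a_list)      # 3 [1, 867, 97, 34, 29, 9]

print(a_list.index(97))  # the index of the first 97: 2
```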

## About mutability

Hang on a moment! Why don't we need to save the result in a (new) variable every time, as we do with strings?

We don't, because lists are **mutable**, meaning that we **can** change the internal state of a list.

Strings, on the other hand, are **immutable**, meaning that we **cannot** change their internal state; we have to create a **new instance** if we want to change them.

```python
a_string = 'a sample string'  # the string is initialized
a_string.upper()  # we call an internal method, which returns another version of the string
a_string  # the original string is unaltered
```

```
'a sample string'
```

```python
another_list = ['a', 'list', 'of', 'strings']  # the list is initialized
another_list.append('!')  # we call an internal method, which alters the list
another_list  # the original list is altered
```

```
['a', 'list', 'of', 'strings', '!']
```
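Mutability has two consequences that often surprise beginners, shown in this minimal sketch with hypothetical variable names. First, methods that change a list in place return `None`, so their result should not be saved. Second, two names can point to the very same list.

```python
numbers = [3, 1, 2]
result = numbers.sort()  # sorts the list in place...
print(result)            # None: ...and returns nothing useful
print(numbers)           # [1, 2, 3]

# because lists are mutable, two names can refer to the SAME list
same_list = numbers
same_list.append(4)
print(numbers)           # [1, 2, 3, 4]: the change shows up under both names

# strings are immutable, so the same pattern leaves the original alone
text = 'abc'
same_text = text
same_text = same_text.upper()  # creates a new string and rebinds only same_text
print(text)              # abc
```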

## Nested lists

Since lists can contain anything, they can also contain other lists (which, in turn, can contain lists, and so on, ad infinitum).

This is a very handy data structure, but too much nesting can quickly become hard to handle.

Let's see how it can work!

```python
# for readability, we can put each nested list on its own line
a_nested_list = [[1, 2, 3],
                 [2, 3, 4],
                 [3, 4, 5],
                 [4, 5, 6]]
a_nested_list
```

```
[[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```
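To reach a single number inside a nested list, we can chain two indices: the first picks the inner list, and the second picks an item within it. Here is a minimal sketch using the same nested list.

```python
a_nested_list = [[1, 2, 3],
                 [2, 3, 4],
                 [3, 4, 5],
                 [4, 5, 6]]

print(a_nested_list[2])       # the third inner list: [3, 4, 5]
print(a_nested_list[2][0])    # the first number of that inner list: 3
print(a_nested_list[-1][-1])  # the last number of the last inner list: 6

a_nested_list[0][1] = 20      # the inner lists are mutable too
print(a_nested_list[0])       # [1, 20, 3]
```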

```python
# when we loop over the nested list, we retrieve each of the inner lists one by one
for item in a_nested_list:
    print(item)
```

```
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]
```


```python
# therefore, we can make nested loops
for inner_list in a_nested_list:
    for number in inner_list:
        print(number, 'squared is', number ** 2)
```

```
1 squared is 1
2 squared is 4
3 squared is 9
2 squared is 4
3 squared is 9
4 squared is 16
3 squared is 9
4 squared is 16
5 squared is 25
4 squared is 16
5 squared is 25
6 squared is 36
```


```python
# we can do some things in the outer loop and other things in the inner loop
# what is executed when?

# calculate the mean of squares for each list and append it to another list
means = []
for inner_list in a_nested_list:
    total = 0  # initialize as 0 for each list
    for number in inner_list:
        number_squared = number ** 2
        total += number_squared
    mean_of_squares = total / len(inner_list)
    means.append(mean_of_squares)

means  # check that we've made a list
```

```
[4.666666666666667, 9.666666666666666, 16.666666666666668, 25.666666666666668]
```
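One way to answer "what is executed when?" is to print a message at each step. This sketch uses a shortened, hypothetical two-item version of the nested list to keep the printout small. The outer messages appear once per inner list. The inner messages appear once per number, and all of them come before the outer loop moves on.

```python
a_nested_list = [[1, 2, 3],
                 [2, 3, 4]]  # a shorter version, to keep the printout small

for inner_list in a_nested_list:
    print('outer loop: starting', inner_list)  # runs once per inner list
    total = 0                                   # reset before the inner loop
    for number in inner_list:
        total += number ** 2
        print('  inner loop:', number, 'running total', total)  # runs once per number
    print('outer loop: finished, total is', total)  # runs after the inner loop ends
```

The last line printed is `outer loop: finished, total is 29`, because 2² + 3² + 4² = 4 + 9 + 16 = 29.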

## An alternative list: the set

Sets are made with `{item1, item2, item3}`.

They are somewhat similar to lists, but there are two major differences:

### Lists are ordered; sets are not!

```python
a_list_of_strings = ['another', 'list', 'of', 'strings', 'ordered', 'after', 'their', 'original', 'order']
a_list_of_strings
```

```
['another',
 'list',
 'of',
 'strings',
 'ordered',
 'after',
 'their',
 'original',
 'order']
```

```python
a_set_of_strings = {'another', 'list', 'of', 'strings', 'ordered', 'after', 'their', 'original', 'order'}
a_set_of_strings  # the notebook display sorts the items when it can, but a set itself has no fixed order!
```

```
{'after',
 'another',
 'list',
 'of',
 'order',
 'ordered',
 'original',
 'strings',
 'their'}
```

This also means that sets cannot be indexed:

```python
a_set_of_strings[0]
```

```
TypeError: 'set' object does not support indexing
```
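Because sets have no order, the order in which items were written does not matter when comparing sets. To get at items by position, we can first turn the set into a list. The sketch below uses the built-in `sorted()`, which has not appeared in this lesson yet. It returns a new, sorted list, so the position of each item is predictable.

```python
print({1, 2, 3} == {3, 2, 1})   # True: order does not matter for sets
print([1, 2, 3] == [3, 2, 1])   # False: it does for lists

a_set_of_strings = {'another', 'list', 'of', 'strings'}
as_list = sorted(a_set_of_strings)  # a new list, in alphabetical order
print(as_list[0])                   # another
```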

### Lists can contain duplicates; sets cannot!

```python
a_list_with_duplicates = [1, 1, 1, 4, 6, 34, 1]
a_list_with_duplicates
```

```
[1, 1, 1, 4, 6, 34, 1]
```

```python
a_set_with_duplicates = {1, 1, 1, 4, 6, 34, 1}
a_set_with_duplicates
```

```
{1, 4, 6, 34}
```
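Since duplicates disappear, `set()` combined with `len()` gives a quick way to count the distinct items in a list. It also tells us whether the list had any duplicates at all. Here is a minimal sketch with a hypothetical variable name `unique_items`.

```python
a_list_with_duplicates = [1, 1, 1, 4, 6, 34, 1]

unique_items = set(a_list_with_duplicates)  # convert the list to a set
print(len(a_list_with_duplicates))          # 7 items in total
print(len(unique_items))                    # 4 distinct items
# if converting to a set shrinks the collection, there were duplicates
print(len(unique_items) < len(a_list_with_duplicates))  # True
```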

### Possible usage of sets: type-to-token ratio

```python
# suppose that we retrieve a list of words uttered by a child in a transcript
tokens = ['big', 'drum', 'horse', 'who', 'is', 'that', 'those', 'are',
          'checkers', 'two', 'checkers', 'yes', 'play', 'checkers',
          'big', 'horn', 'get', 'over', 'Mommy', 'shadow', 'I',
          'like', 'it']
types = set(tokens)  # convert the list to a set
print('TTR:', round(len(types) / len(tokens), 3))  # calculate and report TTR
```

```
TTR: 0.87
```
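Here there are 23 tokens but only 20 types ('big' occurs twice and 'checkers' three times), so the TTR is 20 / 23 ≈ 0.87. If we compute the TTR for many transcripts, it may help to wrap the calculation in a small function. In this sketch, the function name `type_token_ratio` is hypothetical. Returning 0.0 for an empty list is an assumed design choice that avoids dividing by zero.

```python
def type_token_ratio(tokens):
    """Return the number of distinct words (types) divided by the number of words (tokens)."""
    if len(tokens) == 0:
        return 0.0  # assumed convention: an empty transcript gets a TTR of 0
    return len(set(tokens)) / len(tokens)

tokens = ['big', 'drum', 'horse', 'who', 'is', 'that', 'those', 'are',
          'checkers', 'two', 'checkers', 'yes', 'play', 'checkers',
          'big', 'horn', 'get', 'over', 'Mommy', 'shadow', 'I',
          'like', 'it']
print(round(type_token_ratio(tokens), 3))           # 0.87

# note that case matters: 'Mommy' and 'mommy' count as two types
print(type_token_ratio(['Mommy', 'mommy']))          # 1.0
print(type_token_ratio(['Mommy'.lower(), 'mommy']))  # 0.5
```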


## Another alternative list: the tuple

They are made with `(item1, item2, item3)`

They are very similar to lists, but with one key difference: they are **immutable**!

That is, we cannot change the internal state of tuples.

```python
a_tuple = (1, 43, 29, 13)
a_tuple
```

```
(1, 43, 29, 13)
```

```python
# so, for instance, we cannot sort tuples with an internal method
a_tuple.sort()
```

```
AttributeError: 'tuple' object has no attribute 'sort'
```
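Tuples still support everything that only *reads* from them, such as indexing and `in`. To get a sorted version, we can use the built-in `sorted()`, which builds a new list and leaves the tuple untouched. The name `sorted_copy` in this sketch is hypothetical.

```python
a_tuple = (1, 43, 29, 13)

print(a_tuple[1])               # indexing works as with lists: 43
print(29 in a_tuple)            # so does the in operator: True

sorted_copy = sorted(a_tuple)   # a NEW list; the tuple is not changed
print(sorted_copy)              # [1, 13, 29, 43]
print(a_tuple)                  # (1, 43, 29, 13)

# a_tuple[0] = 5  would fail with:
# TypeError: 'tuple' object does not support item assignment
```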

What are they good for, you might ask?

In practice, they are used for different things:

Lists are used when we want an ordered data structure which we can change along the way.

Tuples are used when we want a fixed data structure which should stay the same from its initialization. Often, a tuple serves as a "multi-element" data type, e.g. coordinates `(34.2, 12.6)` or transcript lines `('CHI', 'see what bear ?')`.
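Such "multi-element" tuples are convenient because we can unpack them into separate variables in one step, even directly in a `for` loop. Here is a minimal sketch using the transcript-line example above. The names `line`, `speaker`, `utterance` and `lines` are hypothetical.

```python
line = ('CHI', 'see what bear ?')  # a transcript line as (SPEAKER, UTTERANCE)
speaker, utterance = line          # unpack the two items into two variables
print(speaker)                     # CHI
print(utterance)                   # see what bear ?

# unpacking also works directly in a for loop over a list of tuples
lines = [('CHI', 'big drum .'), ('MOT', 'big drum ?')]
for speaker, utterance in lines:
    print(speaker, 'said:', utterance)
```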

### Possible usage of tuples: transcript lines

```python
# load the transcript
transcript_path = input('Please input the file path to a transcript: ')
with open(transcript_path) as transcript_file:  # the file is closed automatically afterwards
    raw = transcript_file.read()
lines = raw.split('\n')

# retrieve lines with utterances, stored as (SPEAKER, UTTERANCE) tuples
utterances = []
for line in lines:
    if line.startswith('*'):  # utterance lines start with '*'; all other lines are metadata
        split_line = line.split(':\t', 1)  # split only at the first ':\t', between speaker and utterance
        speaker = split_line[0][1:]  # drop the leading '*' to get only the speaker name
        utterance = (speaker, split_line[1])  # store it as a tuple
        utterances.append(utterance)

utterances[:10]  # show the first 10
```

```
Please input the file path to a transcript: /home/kasper/python-projects/CLA_2019/CLA_2018/Data/Brown/Adam/adam01.cha
```

```
[('CHI', 'play checkers .'),
 ('CHI', 'big drum .'),
 ('MOT', 'big drum ?'),
 ('CHI', 'big drum .'),
 ('CHI', 'big drum .'),
 ('CHI', 'big drum .'),
 ('CHI', 'horse .'),
 ('MOT', 'horse .'),
 ('CHI', 'who (th)at ?'),
 ('MOT', 'who is that ?')]
```
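To try the same steps without a transcript file, we can run them on a small string. The miniature transcript below is made up for illustration. It is not real data: its utterance lines are copied from the output above, and it is wrapped in `@Begin`/`@End` header lines. The sketch then unpacks each tuple to count the child's utterances. The names `raw`, `utterances` and `child_count` are hypothetical.

```python
# a made-up miniature transcript in the same format (not real data)
raw = '\n'.join(['@Begin',
                 '*CHI:\tplay checkers .',
                 '*CHI:\tbig drum .',
                 '*MOT:\tbig drum ?',
                 '@End'])

utterances = []
for line in raw.split('\n'):
    if line.startswith('*'):  # keep only the utterance lines
        speaker_part, text = line.split(':\t', 1)
        utterances.append((speaker_part[1:], text))  # drop the '*' from the speaker
print(utterances)

# count the child's utterances by unpacking each (SPEAKER, UTTERANCE) tuple
child_count = 0
for speaker, text in utterances:
    if speaker == 'CHI':
        child_count += 1
print('CHI utterances:', child_count)  # 2
```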

# Exercise 1 in ch. 7

Compare your solutions to your neighbour(s) and discuss how the different manipulations can be done.

```python
user_list = eval(input())  # see page 57; note that eval() runs any code it is given, so only use it on input you trust
```

```
[7,12,3.4,65,False,2,87,6,23,5,5]
```
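A safer alternative to `eval()` is `ast.literal_eval()` from the standard library. It only accepts Python literals such as numbers, strings, booleans, lists and tuples, and raises an error for anything else. In this sketch, the typed text is hard-coded in the hypothetical variable `typed_text` so that it runs without user input. In the exercise, you would pass `input()` instead.

```python
import ast

typed_text = '[7,12,3.4,65,False,2,87,6,23,5,5]'  # what was typed in above
user_list = ast.literal_eval(typed_text)  # parses literals only; never runs code
print(user_list)        # [7, 12, 3.4, 65, False, 2, 87, 6, 23, 5, 5]
print(type(user_list))  # <class 'list'>
```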


```python
# many of these tasks can be done in smarter ways that we are not yet
# familiar with, so we will only use the tools that we already know.

# a) we get the number of items in the list with len()
print('The number of items in the list is', len(user_list))

# b) we get the last item with a negative (backwards) index
print('The last item in the list is', user_list[-1])

# c) we can use the internal method .reverse()
user_list.reverse()  # note that the actual order of user_list is changed now
print('The list reversed:', user_list)

# d) we use the 'in' operator to check if there is a 5 in the list
if 5 in user_list:
    print('Yes, there is a 5 in the list.')
else:
    print('No, there is not a 5 in the list.')

# e) we use the .count() method
print("The number of 5's in the list is", user_list.count(5))

# f) we use the 'del' statement to delete the first and last entries
del user_list[0]  # first entry
del user_list[-1]  # last entry
# then we sort it with the internal method .sort() and print it
user_list.sort()
print('The first and last entries are deleted, and the list is sorted:',
      user_list)

# We cannot simply use the .count() method here as in e), at least not
# with our current knowledge! So, we check if an item is an integer
# with type() and, if so, whether it's lower than 5.
# (type(False) is bool, not int, so False is not counted)
ints_lower_than_five = 0
for item in user_list:
    if type(item) == int and item < 5:
        ints_lower_than_five += 1
print('The number of integers lower than 5 is', ints_lower_than_five)
```

```
The number of items in the list is 11
The last item in the list is 5
The list reversed: [5, 5, 23, 6, 87, 2, False, 65, 3.4, 12, 7]
Yes, there is a 5 in the list.
The number of 5's in the list is 2
The first and last entries are deleted, and the list is sorted: [False, 2, 3.4, 5, 6, 12, 23, 65, 87]
The number of integers lower than 5 is 1
```


# Homework for next time

Review chapters 1-7 for next time by skimming through them and getting an overview of how much you know.

Re-read things that are unclear and gather questions that we can work with next time.

We will be revisiting most topics, but we can focus on the areas where you have problems.