# Lesson 4: Lists and reading/writing files

## Recall from last week

### Errors
Python raises an error when something in the program prevents it from running as intended.

The error message contains details about the nature of the error, which can help us fix it.

### Strings
In some ways, strings are simply lists of characters, and they largely behave as such.

Strings are also objects with internal methods, which give us a wide range of functionality.

### Readable code
Code can be complete gibberish or it can be clearly explained throughout. The latter helps both ourselves and our collaborators.

# Lists

## The basics

Lists are created with square brackets: `[item1, item2, item3]`.

Lists are **ordered**, meaning that the items stay in the same order unless we move them around.

They can contain **anything**, even mixed data types!
`a_mixed_list = [4, 'a string', 9.2, False]`

Lists are **iterable**, meaning that we can loop over them:

```python
for item in some_list:
    print(item)
```

Last time you learned that strings are similar to lists. Let's take a closer look!
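
As a minimal sketch of that similarity, the cell below puts the same characters in a string and in a list. The names `word` and `letters` are made up for illustration and are not part of the lesson data.

```python
# Illustrative values: the same five characters as a string and as a list
word = 'hello'
letters = ['h', 'e', 'l', 'l', 'o']

# Both have a length
print(len(word), len(letters))

# Both can be looped over in the same way
for character in word:
    print(character)

# We can convert between the two
print(list(word) == letters)      # a string can be turned into a list of characters
print(''.join(letters) == word)   # a list of characters can be joined into a string
```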

## List items have indices

We can use index numbers to retrieve a single item or a sequence of items (a slice) from a list.

In [29]:
# remember that we count from 0!
a_list = [3, 7, 34, 97, 54, 29]
print('the item with index [0]:', a_list[0])
print('the item with index [3]:', a_list[3])

the item with index [0]: 3
the item with index [3]: 97


In [30]:
# we can change the item at a certain index
a_list[1] = 9
a_list

[3, 9, 34, 97, 54, 29]

In [31]:
# we can also use negative indices, which count from the end, and slices
# the slice [3:5] includes index 3 and 4, but not index 5
print('the item with index [-2]:', a_list[-2])
print('the items in slice [3:5]', a_list[3:5])

the item with index [-2]: 54
the items in slice [3:5] [97, 54]
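
Slices can also leave out the start or the end, or take every n-th item. The cell below is a small sketch of those variants; the list `numbers` is a stand-in with the same values as `a_list` at this point, so the example does not change the lesson's own variable.

```python
# Stand-in list with illustrative values
numbers = [3, 9, 34, 97, 54, 29]

first_three = numbers[:3]       # from the start up to, but not including, index 3
from_index_four = numbers[4:]   # from index 4 to the end
last_two = numbers[-2:]         # the last two items
every_second = numbers[::2]     # every second item, starting at index 0

print(first_three)
print(from_index_four)
print(last_two)
print(every_second)

# Slicing returns a new list; the original is left unchanged
print(numbers)
```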


## The _in_ operator

We can use the _in_ operator to check if a certain element is in a list.

In [12]:
54 in a_list

True

In [13]:
64 in a_list

False

In [None]:
# this does not work for a sequence of items, though:
# `in` looks for one single item equal to [3, 7], so this is False
[3, 7] in a_list
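
If we want to know whether several items are all present, one option is to check them one at a time. This is only a sketch: `numbers`, `wanted` and `all_present` are illustrative names, not part of the lesson.

```python
numbers = [3, 7, 34, 97, 54, 29]

# `in` compares [3, 7] with each single item, and no item is itself the list [3, 7]
print([3, 7] in numbers)

# In a nested list, [3, 7] can be one of the items
print([3, 7] in [[3, 7], 34])

# To ask whether every wanted item is present, check each one in turn
wanted = [3, 7]
all_present = True
for item in wanted:
    if item not in numbers:
        all_present = False

print(all_present)
```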

## The methods in a list

Lists also have internal methods.

In [24]:
a_list = [3, 7, 34, 97, 54, 29]  # start again from the original list
a_list

[3, 7, 34, 97, 54, 29]

In [25]:
a_list.sort()
a_list

[3, 7, 29, 34, 54, 97]

In [26]:
a_list.append(867)
a_list

[3, 7, 29, 34, 54, 97, 867]

In [27]:
a_list.reverse()
a_list

[867, 97, 54, 34, 29, 7, 3]
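
One thing to watch out for: methods such as `.sort()` change the list itself and return `None`, so assigning their result to a variable loses the list. The built-in `sorted()` function returns a new list instead. The sketch below uses made-up names (`numbers`, `original`, `in_order`, `descending`).

```python
numbers = [34, 3, 97, 7]

result = numbers.sort()   # sorts the list in place...
print(result)             # ...and returns None
print(numbers)            # the list itself is now sorted

# sorted() leaves the original alone and returns a new, sorted list
original = [34, 3, 97, 7]
in_order = sorted(original)
descending = sorted(original, reverse=True)

print(original)
print(in_order)
print(descending)
```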

## About mutability

Hang on a moment! Why don't we need to save the result in a (new) variable whenever we call one of these methods?

We don't, because lists are **mutable**, meaning that we **can** change the internal state of a list.

Strings, on the contrary, are **immutable**, meaning that we **cannot** change their internal state. To change a string, we have to create a **new instance**.

In [40]:
a_string = 'a sample string'  # the string is initialized
a_string.upper()  # we call an internal method, which returns a new, upper-case string (not saved here)
a_string  # the original string is unaltered

'a sample string'

In [39]:
another_list = ['a', 'list', 'of', 'strings']  # the list is initialized
another_list.append('!')  # we call an internal method, which alters the list
another_list  # the original list is altered

['a', 'list', 'of', 'strings', '!']
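
Two practical consequences of this difference, sketched with illustrative names (`shouted`, `first`, `second`, `independent`): a changed string has to be saved in a variable, and two variables can point to the same mutable list.

```python
# Strings: the method returns a new string, so we save it to keep it
a_string = 'a sample string'
shouted = a_string.upper()
print(a_string)
print(shouted)

# Lists: assigning a list to a second name does NOT copy it
first = ['a', 'list']
second = first            # second is another name for the same list
second.append('!')
print(first)              # the change shows up under both names

# .copy() makes a separate list that can change on its own
independent = first.copy()
independent.append('?')
print(first)
print(independent)
```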

## Nested lists

Since lists can contain anything, they can also contain other lists (which, in turn, can contain lists, and so on, ad infinitum).

This is a very handy data structure, but too much nesting quickly becomes hard to handle.

Let's see how it can work!

In [41]:
# for readability, we can put each inner list on its own line
a_nested_list = [[1, 2, 3],
                 [2, 3, 4],
                 [3, 4, 5],
                 [4, 5, 6]
                ]
a_nested_list

[[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
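
Indices work on nested lists too: the first index picks an inner list, and a second index picks an item inside it. A small sketch, using a stand-in list called `grid` with the same values:

```python
grid = [[1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [4, 5, 6]]

second_row = grid[1]   # the inner list at index 1
item = grid[1][2]      # the item at index 2 of that inner list

print(second_row)
print(item)

# len() counts the items at one level only
print(len(grid))       # number of inner lists
print(len(grid[0]))    # number of items in the first inner list

# Inner lists are mutable as well
grid[0][0] = 10
print(grid[0])
```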

In [42]:
# when we loop over the nested list, we retrieve each of the inner lists one by one
for item in a_nested_list:
    print(item)

[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]


In [43]:
# therefore, we can make nested loops
for inner_list in a_nested_list:
    for number in inner_list:
        print(number)

1
2
3
2
3
4
3
4
5
4
5
6


In [46]:
# we can do some things in the outer loop and other things in the inner loop
# what is executed when?

# calculate the mean of squares for each list and append it to another list
means = []
for inner_list in a_nested_list:
    total = 0  # initialize as 0 for each list
    for number in inner_list:
        number_squared = number ** 2
        total += number_squared
    mean_of_squares = total / len(inner_list)
    means.append(mean_of_squares)

means  # check that we've made a list

[4.666666666666667, 9.666666666666666, 16.666666666666668, 25.666666666666668]
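
The same calculation can be written more compactly. The version below is only a sketch of an alternative: it relies on the built-in `sum()` and on a list comprehension, neither of which has been introduced in this lesson, and the names `nested` and `means_short` are made up.

```python
nested = [[1, 2, 3],
          [2, 3, 4],
          [3, 4, 5],
          [4, 5, 6]]

means_short = []
for inner_list in nested:
    # list comprehension: a one-line loop that builds a new list
    squares = [number ** 2 for number in inner_list]
    # sum() adds up all items, replacing the inner loop with `total`
    means_short.append(sum(squares) / len(inner_list))

print(means_short)
```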

## An alternative to the list: the set

Sets are made with curly braces: `{item1, item2, item3}`

They are somewhat similar to lists, but there are two major differences:

### Lists are ordered; sets are not!

In [49]:
a_list_of_strings = ['another', 'list', 'of', 'strings', 'ordered', 'after', 'their', 'original', 'order']
a_list_of_strings

['another',
 'list',
 'of',
 'strings',
 'ordered',
 'after',
 'their',
 'original',
 'order']

In [50]:
a_set_of_strings = {'another', 'list', 'of', 'strings', 'ordered', 'after', 'their', 'original', 'order'}
a_set_of_strings  # the notebook sorts the set for display when it can, but the set itself has no fixed order!

{'after',
 'another',
 'list',
 'of',
 'order',
 'ordered',
 'original',
 'strings',
 'their'}
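
Because a set has no order, it has no index numbers either. The sketch below (with an illustrative set called `words`) shows that indexing raises an error, while membership tests and looping still work. If we need a fixed order, `sorted()` gives us a list.

```python
words = {'after', 'another', 'list'}

# Indexing a set raises a TypeError
indexing_failed = False
try:
    words[0]
except TypeError as error:
    indexing_failed = True
    print('sets cannot be indexed:', error)

# The in operator works just as it does for lists
print('list' in words)

# sorted() returns a list, which does have an order
in_order = sorted(words)
print(in_order)
print(in_order[0])
```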

### Lists can contain duplicates; sets cannot!

In [51]:
a_list_with_duplicates = [1, 1, 1, 4, 6, 34, 1]
a_list_with_duplicates

[1, 1, 1, 4, 6, 34, 1]

In [53]:
a_set_with_duplicates = {1, 1, 1, 4, 6, 34, 1}
a_set_with_duplicates

{1, 4, 6, 34}
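
This makes sets a quick way to remove duplicates from a list, at the cost of losing the original order. The sketch below shows that, plus one possible way to drop duplicates while keeping the order. All names (`with_duplicates`, `unique`, `unique_in_order`) are illustrative.

```python
with_duplicates = [1, 1, 1, 4, 6, 34, 1]

# Converting to a set drops the duplicates
unique = set(with_duplicates)
print(len(with_duplicates), len(unique))

# Adding an item that is already there changes nothing
unique.add(4)
print(len(unique))

# To keep the original order, we can build a new list instead
unique_in_order = []
for number in with_duplicates:
    if number not in unique_in_order:
        unique_in_order.append(number)

print(unique_in_order)
```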

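## A possible use of sets: type-to-token ratio

The type-to-token ratio (TTR) is the number of distinct words (types) divided by the total number of words (tokens). Since a set drops duplicates, converting the list of tokens to a set gives us the types.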

In [65]:
# suppose that we retrieve a list of words uttered by a child in a transcript
tokens = ['big','drum', 'horse', 'who', 'is', 'that', 'those', 'are', 'checkers', 'two', 'checkers',
          'yes', 'play', 'checkers', 'big', 'horn', 'get', 'over', 'Mommy', 'shadow', 'I', 'like', 'it']
types = set(tokens)  # convert the list to a set
print('TTR:', round(len(types) / len(tokens), 3))  # calculate and report TTR

TTR: 0.87
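
If we need the TTR for several transcripts, the calculation could be wrapped in a function. The sketch below is one possible version, not the lesson's own method, and it makes two assumptions that are marked in the comments: words are lowercased before counting, and an empty list gets a TTR of 0. The function name `type_token_ratio` and the `sample` list are made up.

```python
def type_token_ratio(tokens):
    """Return the number of distinct words divided by the total number of words."""
    if len(tokens) == 0:
        return 0.0  # assumption: an empty transcript gets a TTR of 0
    lowered = []
    for token in tokens:
        # assumption: 'The' and 'the' should count as the same type
        lowered.append(token.lower())
    return len(set(lowered)) / len(lowered)


sample = ['The', 'dog', 'saw', 'the', 'other', 'dog']
print('TTR:', round(type_token_ratio(sample), 3))
```

Whether to lowercase depends on the analysis: for names or the pronoun "I", the original capitalization may carry information we want to keep.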
