# Python data types

Python has quite a number of built-in data types, i.e., data types that are part of the core language. In addition, the standard library implements many more data types, several of which can simplify your task as a developer. We will not cover each and every method defined for these data structures, but only the most commonly used ones. As always, it is a good idea to read through the [documentation](https://docs.python.org/3/library/index.html) for more information.

For this tutorial, we will use a very simple data file that allows us to illustrate use cases for Python's data types. The data file is a text file with three columns. It represents patients by listing their first name, their age, and their weight. The data types are `str`, `int`, and `float` respectively. Columns are separated by spaces.

In [176]:
!cat Data/patients.txt

name age weight
ivy 35 59.9
bob 33 72.4
gitta 27 56.3
carol 33 58.7
elly 27 61.3
alice 35 63.2
freddy 33 78.9
harry 41 65.3
darren 35 68.5
john 40 67.1

In order to ensure that all cells in this notebook can be evaluated without errors, we will use `try`-`except` to catch exceptions. To still show the actual errors, we print the traceback using `print_exc`, hence the following import.

In [10]:
from traceback import print_exc

## Tuples

It would be convenient to have a function that reads through the file, and returns the patients one at a time. But how to represent a patient so that it can be returned as a value from a function? We can represent a patient as a tuple with three fields. The first represents the patient's name, the second her age, the third her weight.

In [178]:
patient = ('suzan', 15, 48.9)

The type of this data structure is `tuple`, and we can access the values of its fields by 0-based index.  The number of fields can be determined using the `len` function.

In [179]:
type(patient)

<class 'tuple'>

In [180]:
patient[0]

'suzan'

In [181]:
len(patient)

3
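Since the fields of a tuple are ordered, a tuple can also be unpacked into separate variables in a single assignment. A minimal sketch, reusing the tuple for Suzan; the variable names are only illustrative:

```python
patient = ('suzan', 15, 48.9)

# unpack the three fields into three variables, in order
name, age, weight = patient
print(name, age, weight)  # suzan 15 48.9

# the number of variables must match the number of fields
try:
    name, age = patient
except ValueError as error:
    print(f'ValueError: {error}')
```

The same mechanism is at work whenever a loop iterates over pairs, e.g., `for key, value in ...`.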

Similar to a `str` in Python, a `tuple` is immutable, i.e., it cannot be modified after its creation. So no birthdays for our patients.

In [182]:
try:
    patient[1] = 16
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-182-a818907fd4b6>", line 2, in <module>
    patient[1] = 16
TypeError: 'tuple' object does not support item assignment


A `tuple` seems a reasonable data type to represent our patients, so let's proceed with the function to read them.

In [183]:
def read_patients(file_name):
    '''
    Generator function that yields the patients in a file.
        file_name: name of the data file
        returns tuples representing the patients
    '''
    with open(file_name, 'r') as patients:
        # read but ignore the column names
        _ = patients.readline()
        # iterate over all the patient data
        for line in patients:
            # strip the line endings, and split on whitespace
            data = line.rstrip().split()
            # turn this function into a generator by yielding a tuple
            # representing a patient; for convenience, do appropriate
            # type conversion
            yield (str(data[0]), int(data[1]), float(data[2]))

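The `yield` statement suspends the execution of the function, producing a value. Calling the function does not run its body, but returns a generator object. Each time the next value is requested from that generator, e.g., by a `for` loop, execution resumes right after the `yield` where it left off, with all local variables retained.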
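To see this resumption in action, we can drive a generator by hand using the built-in `next` function. The toy generator `count_down` below is hypothetical and not part of the patient example.

```python
def count_down(n):
    '''Hypothetical toy generator: yields n, n - 1, ..., 1.'''
    while n > 0:
        print(f'about to yield {n}')
        yield n
        # execution resumes here on the next call to next()
        n -= 1

gen = count_down(2)   # calling the function only creates the generator
first = next(gen)     # runs up to the first yield, prints 'about to yield 2'
second = next(gen)    # resumes after the yield, prints 'about to yield 1'
try:
    next(gen)         # the loop ends, so the generator is exhausted
except StopIteration:
    print('generator exhausted')
```

A `for` loop does exactly this for you: it calls `next` repeatedly and stops quietly when `StopIteration` is raised.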

Let's test the function by simply printing the results.

In [184]:
for patient in read_patients('Data/patients.txt'):
    print(patient)

('ivy', 35, 59.9)
('bob', 33, 72.4)
('gitta', 27, 56.3)
('carol', 33, 58.7)
('elly', 27, 61.3)
('alice', 35, 63.2)
('freddy', 33, 78.9)
('harry', 41, 65.3)
('darren', 35, 68.5)
('john', 40, 67.1)


That seems to work quite well.  Let's proceed with some data analytics.

## Named tuples

Although this implementation works, it has a drawback. We have to remember that `patient[0]` is the name, `patient[1]` the age, and `patient[2]` the weight of the patient. This is error-prone, and may give rise to subtle bugs. It would be much more convenient if we could access the `tuple`'s fields by name, rather than by index. Python's standard library implements a convenient way to define named tuples through `typing.NamedTuple`.

In [19]:
from typing import NamedTuple

class Patient(NamedTuple):
    name: str
    age: int
    weight: float

We will cover defining our own classes elsewhere, but this is quite straightforward. Note that the class syntax for `typing.NamedTuple` used here requires Python 3.6 or later; for earlier versions of Python, you can use the somewhat more cumbersome `collections.namedtuple`. A named tuple of type `Patient` has three fields, `name`, `age`, and `weight`, which we declare using the type hints `str`, `int`, and `float` respectively. Let's redefine the tuple representing Suzan.
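For comparison, here is a sketch of an equivalent definition using `collections.namedtuple`. The name `PatientRecord` is chosen here only to avoid clashing with `Patient`.

```python
from collections import namedtuple

# field names are given as a list of strings, without type hints
PatientRecord = namedtuple('PatientRecord', ['name', 'age', 'weight'])

record = PatientRecord('suzan', 15, 48.9)
print(record.name, record[1])  # suzan 15
print(record._fields)          # ('name', 'age', 'weight')
```

Both variants produce subclasses of `tuple`; the class syntax mainly adds the type hints and reads more naturally.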

In [186]:
patient = Patient('suzan', 15, 48.9)

The type of this data structure is `Patient`, and we can access the values of its fields by 0-based index, but also by name.  The number of fields can be determined using the `len` function.

In [187]:
type(patient)

<class '__main__.Patient'>

In [188]:
patient[0], patient.name

('suzan', 'suzan')

In [189]:
len(patient)

3

Similar to Python's built-in `tuple`, a named tuple is immutable, i.e., it cannot be modified after its creation.

In [190]:
try:
    patient.age = 16
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-190-6ab0ca5769fd>", line 2, in <module>
    patient.age = 16
AttributeError: can't set attribute
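Although a named tuple cannot be modified, its `_replace` method returns a new instance with some fields changed, which is one way to give our patients a birthday after all. The sketch below repeats the `Patient` definition from above so that it is self-contained.

```python
from typing import NamedTuple

class Patient(NamedTuple):
    name: str
    age: int
    weight: float

patient = Patient('suzan', 15, 48.9)
# _replace creates a new named tuple; the original is left untouched
older_patient = patient._replace(age=patient.age + 1)
print(patient.age, older_patient.age)  # 15 16
```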


Modifying the function `read_patients` to return named tuples is trivial.

In [20]:
def read_patients(file_name):
    '''
    Generator function that yields the patients in a file.
        file_name: name of the data file
        returns named tuples representing the patients
    '''
    with open(file_name, 'r') as patients:
        # read but ignore the column names
        _ = patients.readline()
        # iterate over all the patient data
        for line in patients:
            # strip the line endings, and split on whitespace
            data = line.rstrip().split()
            # turn this function into a generator by yielding a tuple
            # representing a patient; for convenience, do appropriate
            # type conversion
            yield Patient(str(data[0]), int(data[1]), float(data[2]))

To illustrate, let's read the patients, but only print their names.

In [192]:
for patient in read_patients('Data/patients.txt'):
    print(patient.name)

ivy
bob
gitta
carol
elly
alice
freddy
harry
darren
john


## Data classes

_Note:_ Data classes are a new feature of Python 3.7, and don't work with earlier versions.

Data classes offer more flexibility than named tuples since their attributes can be modified after an instance of the data class has been created. They can also serve as the basis for more sophisticated classes, a subject that is covered in [another notebook](oo_programming.ipynb).

For our running example, let's represent the patient information as an instance of a data class.

In [21]:
try:
    from dataclasses import dataclass

    @dataclass
    class Patient:
        name: str
        age: int
        weight: float

except:
    print_exc()
    import sys
    print(f'Error: you are using Python {sys.version}, version 3.7 or higher is required')

The difference between the implementation in the previous section using `NamedTuple` and this one is subtle. The previous version inherited its properties from the `NamedTuple` class, whereas this version is generated using the `dataclass` decorator.

Superficially, there are few differences when using this version of `Patient` compared to the previous one. We create a new patient exactly as we did before.

In [7]:
patient = Patient('suzan', 15, 48.9)

The type of this data structure is `Patient`, just as before.

In [8]:
type(patient)

__main__.Patient

Accessing an attribute by name works as expected.

In [16]:
patient.name

'suzan'

However, this is not a tuple, so accessing the attributes by index will not work.

In [13]:
try:
    patient[0]
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-13-de9025dc8d68>", line 2, in <module>
    patient[0]
TypeError: 'Patient' object does not support indexing


The `len` function will also fail, again since this `Patient` is not a tuple.

In [17]:
try:
    len(patient)
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-17-1fdf86191d06>", line 2, in <module>
    len(patient)
TypeError: object of type 'Patient' has no len()


As opposed to `NamedTuple` objects, an instance of a data class can be modified.

In [18]:
patient.age = 16
patient

Patient(name='suzan', age=16, weight=48.9)
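Data classes can be tuned through arguments to the decorator, and the `dataclasses` module offers some helper functions. As a sketch (Python 3.7 or later): `frozen=True` makes instances immutable again, `replace` returns a modified copy, and `asdict` converts an instance to a `dict`. The class name `FrozenPatient` is hypothetical, chosen to avoid overwriting `Patient`.

```python
from dataclasses import dataclass, asdict, replace, FrozenInstanceError

@dataclass(frozen=True)
class FrozenPatient:
    name: str
    age: int
    weight: float

suzan = FrozenPatient('suzan', 15, 48.9)

# a frozen data class rejects attribute assignment
try:
    suzan.age = 16
except FrozenInstanceError:
    print('cannot modify a frozen data class')

# replace() returns a new instance with the given fields changed
older_suzan = replace(suzan, age=16)

# asdict() converts an instance into a dict of its fields
print(asdict(older_suzan))  # {'name': 'suzan', 'age': 16, 'weight': 48.9}
```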

The `read_patients` function doesn't need to change: it looks up the name `Patient` each time it creates a patient, and that name now refers to the data class. To illustrate, let's read the patients, but only print their names.

In [22]:
for patient in read_patients('Data/patients.txt'):
    print(patient.name)

ivy
bob
gitta
carol
elly
alice
freddy
harry
darren
john


## Lists

A `list` is a very convenient data type, representing an ordered sequence.  We can for instance create a list of names of our patients.

In [193]:
patient_names = list()
for patient in read_patients('Data/patients.txt'):
    patient_names.append(patient.name)

We start off with an empty list, created using the `list()` function, and we append the name of each patient to it. The resulting list contains the patient names, in the order they have been read from the file, and added to the list `patient_names`.

In [194]:
patient_names

['ivy', 'bob', 'gitta', 'carol', 'elly', 'alice', 'freddy', 'harry', 'darren', 'john']

List elements can be accessed by 0-based index, and the length of a list can be obtained using the `len` function.

In [195]:
patient_names[1]

'bob'

In [196]:
len(patient_names)

10

Trying to access an element using an invalid index, e.g., one that is equal to or larger than the length of the `list`, raises an exception.

In [197]:
try:
    print(patient_names[len(patient_names)])
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-197-818684a89767>", line 2, in <module>
    print(patient_names[len(patient_names)])
IndexError: list index out of range


We can iterate over the elements of a `list` using a `for`-loop.

In [198]:
for patient_name in patient_names:
    print(patient_name.capitalize())

Ivy
Bob
Gitta
Carol
Elly
Alice
Freddy
Harry
Darren
John


In contrast to a `tuple`, the elements of a `list` can be modified. For instance, if we want to change `'bob'` to `'robert'`, we can simply assign a new value at that index.

In [199]:
patient_names[1] = 'robert'

Surreptitiously, we have already used a `list` when we called the `split` method on the lines of our data file. The inverse operation, combining a `list` of `str` elements into a single `str`, can be accomplished using the `join` method.

In [200]:
', '.join(patient_names)

'ivy, robert, gitta, carol, elly, alice, freddy, harry, darren, john'

Note that `join` is a method defined on `str` and is called on the separator, and that its argument must be an iterable over `str` values.

Indices can be negative: the element at index `-1` is the last element of the list, `-2` the second to last, and so on. Hence, if the list has $n$ elements, legal index values run from $-n$ to $n - 1$. Although that is a nice shortcut, it is also a good way to introduce subtle bugs in your code.

In [201]:
patient_names[-1] == patient_names[len(patient_names) - 1]

True

In [202]:
patient_names[0] == patient_names[-len(patient_names)]

True

### Modifying & querying lists

Whereas the number of fields of a `tuple` is always the same, elements can be added to or removed from a list at any time. Besides the `append` method we have already used to build the list, there is the `insert` method to add elements anywhere in the list, and the `pop` method to remove elements.

In [203]:
patient_names.insert(2, 'kathy')

In [204]:
patient_names.pop(5)

'elly'

Our `patient_names` list now has `'kathy'` as the third element, while `'elly'` is no longer an element.

In [205]:
', '.join(patient_names)

'ivy, robert, kathy, gitta, carol, alice, freddy, harry, darren, john'

Using the `pop()` method without an index will remove and return the list's last element, `'john'` in this case.

In [206]:
patient_names.pop()

'john'

Checking list membership is easy using the `in` operator, for instance, `'robert'` is in our list, while `'mary'` isn't.

In [207]:
'robert' in patient_names

True

In [208]:
'mary' in patient_names

False

The `index` method will return... the index at which an element first occurs in a list, and raises an exception otherwise.

In [209]:
patient_names.index('robert')

1

In [210]:
try:
    patient_names.index('mary')
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-210-f6efb2b26643>", line 2, in <module>
    patient_names.index('mary')
ValueError: 'mary' is not in list


Note that the same element can occur multiple times in a list, for instance, we can add another `'alice'` at the end of the list.

In [211]:
patient_names.append('alice')

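The `index` method only returns the index of the first occurrence. However, it can optionally be called with a start and an end index that restrict the search to that part of the list. A negative start index counts from the end, so `index('alice', -1)` only examines the last element, which happens to be the second `'alice'`.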

In [212]:
patient_names.index('alice')

5

In [213]:
patient_names.index('alice', -1)

9
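Since a negative start index merely restricts the search to the tail of the list, `index('alice', -1)` finds the last `'alice'` only because it happens to be the last element. A more general way to find the last occurrence is to search a reversed copy of the list. The helper `last_index` below is hypothetical, a sketch assuming the value does occur in the list (otherwise `index` raises `ValueError`).

```python
def last_index(items, value):
    '''Return the index of the last occurrence of value in items.

    Hypothetical helper; raises ValueError if value does not occur.
    '''
    # items[::-1] is a reversed copy, so index() finds the last occurrence
    return len(items) - 1 - items[::-1].index(value)

names = ['ivy', 'alice', 'bob', 'alice', 'carol']
print(last_index(names, 'alice'))  # 3
```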

The `pop` method removes an element at a given index, whereas the `remove` method removes the first occurrence of a given value. Hence, if we remove `'alice'` from the list, only the `'alice'` at the end of the list will remain. Removing a value that doesn't occur in the list raises a `ValueError`.

In [214]:
', '.join(patient_names)

'ivy, robert, kathy, gitta, carol, alice, freddy, harry, darren, alice'

In [215]:
patient_names.remove('alice')

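Unlike `index`, the `remove` method takes no start or end index: it always removes the first occurrence. To remove the last `'alice'` instead, you would have to remove it by index, e.g., using `pop(-1)` since it is the last element.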

In [216]:
', '.join(patient_names)

'ivy, robert, kathy, gitta, carol, freddy, harry, darren, alice'

### Aliasing versus copying

It may be a bit counter-intuitive, but assigning a list to a new variable doesn't copy the list. The new variable refers to the same list as the original one. We assign `patient_names` to another variable, remove the first element, and check the value of both variables.

In [41]:
other_patient_names = patient_names

In [42]:
other_patient_names.pop(0)

'ivy'

In [43]:
', '.join(other_patient_names)

'robert, kathy, gitta, carol, freddy, harry, darren, alice'

In [44]:
', '.join(patient_names)

'robert, kathy, gitta, carol, freddy, harry, darren, alice'

The `copy` method will create an actual copy of the original list.

In [45]:
other_patient_names = patient_names.copy()

In [46]:
other_patient_names.pop()

'alice'

In [47]:
', '.join(other_patient_names)

'robert, kathy, gitta, carol, freddy, harry, darren'

In [48]:
', '.join(patient_names)

'robert, kathy, gitta, carol, freddy, harry, darren, alice'
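Besides the `copy` method, the `list` function and a full slice `[:]` also create a new list. A quick check with the `is` operator, which tests whether two names refer to the same object, confirms this; the list below is just an example.

```python
names = ['robert', 'kathy', 'gitta']

# three equivalent ways to create a (shallow) copy
copy1 = names.copy()
copy2 = list(names)
copy3 = names[:]

# plain assignment only creates an alias to the same object
alias = names

print(alias is names)                                  # True
print(any(c is names for c in (copy1, copy2, copy3)))  # False
print(copy1 == copy2 == copy3 == names)                # True
```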

### Slicing

We can create sublists out of a list by "slicing", i.e., indexing by a start index, an end index, and, optionally, a stride. The element at the start index is included, the one at the end index is not. For instance, we could select the first three elements of the list, or the second to the sixth element, but only every other element.

In [49]:
patient_names[0:3]

['robert', 'kathy', 'gitta']

In [50]:
patient_names[1:6:2]

['kathy', 'carol', 'harry']

Both the start and the end index can be left out, meaning that the slice starts from the beginning, or runs up to the end of the list, respectively. Combined with negative indices, this is quite expressive. Getting the first or the last three elements of a list is trivial that way.

In [51]:
patient_names[:3]

['robert', 'kathy', 'gitta']

In [52]:
patient_names[-3:]

['harry', 'darren', 'alice']

Leaving out both the start and the end index, and using a stride of -1 is a neat trick to reverse a list.

In [53]:
patient_names[::-1]

['alice', 'darren', 'harry', 'freddy', 'carol', 'gitta', 'kathy', 'robert']

Note that slicing creates a new list, but that it is a shallow copy. This is important when list elements are mutable.

### Creating lists

The simplest way to construct a list is by a literal enumeration of its elements.

In [54]:
first_names = ['peter', 'sally', 'vaughan', 'sophie', 'patrick']

New lists can also be constructed from iterables by comprehension. For instance, a list of capitalized names can be constructed as follows.

In [55]:
[name.capitalize() for name in first_names]

['Peter', 'Sally', 'Vaughan', 'Sophie', 'Patrick']

It is also possible to select only certain elements for the new list by adding an `if` clause to the comprehension.

In [56]:
[name.capitalize() for name in first_names if name.startswith('p')]

['Peter', 'Patrick']

Returning to our running example, we can combine this to select the names of the patients that are older than 35.

In [57]:
[patient.name for patient in read_patients('Data/patients.txt') if patient.age > 35]

['harry', 'john']

Using the `list` function, we can also create a list of the patients in our data file using the generator.

In [58]:
patients = list(read_patients('Data/patients.txt'))

In [59]:
patients[0].age

35

Lists can be concatenated using the `+` operator, which creates a new list, or extended in place through the `extend` method.

In [173]:
['alice', 'bob'] + ['carol', 'dave']

['alice', 'bob', 'carol', 'dave']
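The `extend` method appends the elements of any iterable to an existing list in place, whereas `append` would add the iterable itself as a single element. A small sketch:

```python
names = ['alice', 'bob']
names.extend(['carol', 'dave'])   # modifies names in place
print(names)   # ['alice', 'bob', 'carol', 'dave']

nested = ['alice', 'bob']
nested.append(['carol', 'dave'])  # adds the list as one element
print(nested)  # ['alice', 'bob', ['carol', 'dave']]

# += on a list behaves like extend
names += ['eve']
print(len(names))  # 5
```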

## Sets

Which ages are represented in our group of patients? When we want to answer this question, we are actually asking for a mathematical set containing the ages, in which each element occurs only once. We could accomplish this using a `list` as well, but that would be pretty cumbersome.

In [60]:
age_list = list()
for patient in read_patients('Data/patients.txt'):
    if patient.age not in age_list:
        age_list.append(patient.age)

In [61]:
age_list

[35, 33, 27, 41, 40]

Using Python's built-in `set` type, this task is not only easier, but the data structure represents the mathematical concept we actually have in mind.

In [62]:
age_set = set()
for patient in read_patients('Data/patients.txt'):
    age_set.add(patient.age)

In [63]:
age_set

{27, 33, 35, 40, 41}

This is even simpler using a set comprehension, similar to the list comprehension we've encountered before.

In [64]:
age_set = {patient.age for patient in read_patients('Data/patients.txt')}

In [65]:
age_set

{27, 33, 35, 40, 41}

The number of elements of a `set` is obtained using the `len` function, and we can test membership using the `in` operator.

In [66]:
len(age_set)

5

In [67]:
40 in age_set

True

In [68]:
53 in age_set

False

As opposed to a `list`, a `set` is not a sequence, i.e., it is not ordered, it has no first, second, or last element. This is of course the same for a mathematical set, which is obviously no coincidence.

Iterating over the elements of a `set` is done using a `for`-loop.

In [69]:
for age in age_set:
    print(age)

33
35
40
41
27
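If you need the elements of a set in a particular order, the built-in `sorted` function returns a new, sorted `list`, leaving the set itself unchanged. A short sketch with the ages from our data file:

```python
age_set = {27, 33, 35, 40, 41}
ages_in_order = sorted(age_set)
print(ages_in_order)                  # [27, 33, 35, 40, 41]
print(sorted(age_set, reverse=True))  # [41, 40, 35, 33, 27]
```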


### Modifying sets

The `add` method adds an element to an existing set. To remove an arbitrary element and return it, we can use the `pop()` method.

In [70]:
age_set.pop()

33

Note that you can't tell in advance which element will be removed (it is implementation dependent, to be precise). To remove a specific element from the set, you can use either the `remove` or the `discard` method.

In [71]:
age_set.remove(41)

In [72]:
try:
    age_set.remove(41)
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-72-c2dc3b58f16a>", line 2, in <module>
    age_set.remove(41)
KeyError: 41


The `discard` method will not raise an exception when we try to remove an element that is not in the set. To remove all elements from a `set`, we can use the `clear` method.

In [73]:
age_set.clear()

In [74]:
len(age_set)

0
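A short sketch contrasting `remove` and `discard` on a small example set:

```python
ages = {27, 33, 35}

ages.discard(33)  # removes 33
ages.discard(33)  # 33 is no longer there, but no exception is raised

try:
    ages.remove(33)  # remove raises a KeyError for a missing element
except KeyError:
    print('33 is not in the set')

print(sorted(ages))  # [27, 35]
```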

### Set operations

All the mathematical operations you would expect on sets are indeed defined, e.g., union (`|`), intersection (`&`), difference (`-`), and symmetric difference (`^`).

In [75]:
set1 = {1, 2, 3, 4, 6, 12}
set2 = {1, 3, 5, 15}

In [76]:
set1 | set2

{1, 2, 3, 4, 5, 6, 12, 15}

In [77]:
set1 & set2

{1, 3}

In [78]:
set1 - set2

{2, 4, 6, 12}

In [79]:
set2 - set1

{5, 15}

In [80]:
set1 ^ set2

{2, 4, 5, 6, 12, 15}

These operators create a new set. For each of them, a method that performs the operation in place is implemented as well, mainly for performance reasons. For example, `difference_update` applied to `set1` modifies that set.

In [81]:
set1.difference_update(set2)

In [82]:
set1

{2, 4, 6, 12}
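The in-place operations also have operator forms, such as `|=`, `&=`, `-=`, and `^=`, which modify the set on the left-hand side. Note that these operators require both operands to be sets, whereas the methods accept any iterable. A sketch, recreating the two sets from above:

```python
set1 = {1, 2, 3, 4, 6, 12}
set2 = {1, 3, 5, 15}

set1 &= set2          # same effect as set1.intersection_update(set2)
print(sorted(set1))   # [1, 3]

set1.update([7, 8])   # methods accept any iterable, e.g., a list
print(sorted(set1))   # [1, 3, 7, 8]

try:
    set1 |= [9]       # operators require a set (or frozenset)
except TypeError:
    print('|= needs a set on the right-hand side')
```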

Three Boolean methods are available as well:
  * `s1.isdisjoint(s2)` will test whether the sets are disjoint,
  * `s1.issubset(s2)` will check whether `s1` is a subset of `s2`, and
  * `s1.issuperset(s2)` checks whether `s1` is a superset of `s2`.
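A minimal sketch of these three methods; for sets, the comparison operators `<=` and `>=` provide the same subset and superset tests:

```python
small = {1, 3}
large = {1, 2, 3, 4}
other = {5, 6}

print(small.issubset(large))           # True
print(large.issuperset(small))         # True
print(small.isdisjoint(other))         # True
print(small <= large, large >= small)  # True True
# < tests for a proper subset, i.e., a subset that is not equal
print(small < small)                   # False
```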

## Dictionaries

A `dict` is a mapping from a set of keys to a set of values: each key maps onto exactly one value, but multiple keys may map onto the same value. Mathematically, this corresponds to a function from the keys to the values, which need not be an injection. `dict`s are a very convenient data type to represent associations between objects.

Back to the running example. What if we want to know how many patients there are of each age? So with an age, we want to associate a count. We can represent this by a dictionary that has the patients' ages as keys, and the number of patients of that age as values.

In [83]:
age_distr = dict()
for patient in read_patients('Data/patients.txt'):
    if patient.age not in age_distr:
        age_distr[patient.age] = 0
    age_distr[patient.age] += 1

In [84]:
age_distr

{35: 3, 33: 3, 27: 2, 41: 1, 40: 1}

We start off with an empty `dict`, and iterate over the patients as usual. For each patient, we check whether her age is in the `dict` as a key using the `in` operator. If not, we associate the value 0 with that key. Next, we increment the value associated with the key.
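This count-by-key pattern is common enough that the standard library's `collections` module offers shortcuts: `defaultdict` creates a default value for missing keys, and `Counter` does the counting altogether. A sketch, using a plain list of ages rather than reading the data file (the ages are those in `Data/patients.txt`, in file order):

```python
from collections import Counter, defaultdict

ages = [35, 33, 27, 33, 27, 35, 33, 41, 35, 40]

# defaultdict calls int(), which returns 0, for keys that are missing
distr_default = defaultdict(int)
for age in ages:
    distr_default[age] += 1

# Counter counts the elements of an iterable in one go
distr_counter = Counter(ages)

print(dict(distr_default))           # {35: 3, 33: 3, 27: 2, 41: 1, 40: 1}
print(distr_counter.most_common(2))  # [(35, 3), (33, 3)]
```

Both are subclasses of `dict`, so everything discussed below applies to them as well.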

We can get the value associated with a key by using the latter as an index, and we can get the number of key/value pairs using the `len` function.

In [85]:
age_distr[33]

3

In [86]:
len(age_distr)

5

Trying to access a `dict` with a key not in the `dict` raises an exception.

In [87]:
try:
    print(age_distr[-5])
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-87-747f39179763>", line 2, in <module>
    print(age_distr[-5])
KeyError: -5
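To look up a key that may be missing without raising an exception, you can test with `in` first, or use the `get` method, which returns `None`, or a default you specify, for a missing key. A sketch, with the age distribution written out literally:

```python
age_distr = {35: 3, 33: 3, 27: 2, 41: 1, 40: 1}

print(age_distr.get(33))     # 3
print(age_distr.get(-5))     # None
print(age_distr.get(-5, 0))  # 0
print(-5 in age_distr)       # False
```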


We can iterate over the keys in a dictionary, and use that to access the corresponding value.

In [88]:
for age in age_distr:
    print(f'{age}: {age_distr[age]}')

35: 3
33: 3
27: 2
41: 1
40: 1


This pattern is so common that the `items` method is defined for `dict`s. It returns a view on the dictionary's items; iterating over it yields `tuple`s where the first field is the key, the second the value.

In [89]:
for age, count in age_distr.items():
    print(f'{age}: {count}')

35: 3
33: 3
27: 2
41: 1
40: 1


Note that starting from Python 3.7, dictionaries are guaranteed to return keys and items in insertion order (CPython 3.6 already did so as an implementation detail). For older versions of Python, the order cannot and should not be relied on.

### Modifying dictionaries

New items can be added to a `dict` at any time, or existing values can be updated by assignment.

In [90]:
age_distr[20] = 1

In [91]:
age_distr

{35: 3, 33: 3, 27: 2, 41: 1, 40: 1, 20: 1}

In [96]:
age_distr[20] = 4

In [93]:
age_distr

{35: 3, 33: 3, 27: 2, 41: 1, 40: 1, 20: 4}

An item can be removed from the `dict` using the `pop` method, which takes the key as an argument. The value associated with that key is returned. If we try to remove an item with a key that is not in the `dict`, an exception is raised.

In [97]:
age_distr.pop(20)

4

In [98]:
try:
    age_distr.pop(20)
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-98-97bd61c7637a>", line 2, in <module>
    age_distr.pop(20)
KeyError: 20
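The `pop` method also accepts an optional second argument that is returned, instead of raising a `KeyError`, when the key is missing. Alternatively, the `del` statement removes an item without returning its value. A sketch on a small example `dict`:

```python
example_distr = {35: 3, 33: 3, 20: 4}

print(example_distr.pop(20, None))  # 4
print(example_distr.pop(20, None))  # None, no exception this time

del example_distr[33]               # removes the item, KeyError if missing
print(example_distr)                # {35: 3}
```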


Just like sets and lists, dictionaries can also be constructed using comprehensions. We will create some people with random ages.

In [106]:
import random
from random import randint
random.seed(1234)

In [107]:
people = {name: randint(20, 40) for name in ['zoe', 'wolf', 'thomas']}
people

{'zoe': 34, 'wolf': 23, 'thomas': 20}

In [108]:
more_people = {name: randint(20, 40) for name in ['elsie', 'frank', 'thomas']}
more_people

{'elsie': 22, 'frank': 38, 'thomas': 21}

Two dictionaries can be merged using the `update` method, which modifies the `dict` it is applied to in place.

In [109]:
people.update(more_people)
people

{'zoe': 34, 'wolf': 23, 'thomas': 21, 'elsie': 22, 'frank': 38}

The items of `more_people` have been added to `people`, overwriting the original value when an item with that key already existed.
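To merge two dictionaries into a new one instead, leaving both originals untouched, you can unpack them into a dictionary literal, or, from Python 3.9 on, use the `|` operator. In both cases, the right-hand dictionary wins for duplicate keys. A sketch with fixed ages rather than random ones:

```python
people = {'zoe': 34, 'wolf': 23, 'thomas': 20}
more_people = {'elsie': 22, 'frank': 38, 'thomas': 21}

# dictionary unpacking works in Python 3.5 and later
merged = {**people, **more_people}

# the | operator requires Python 3.9 or later
merged_too = people | more_people

print(merged)  # {'zoe': 34, 'wolf': 23, 'thomas': 21, 'elsie': 22, 'frank': 38}
print(people)  # unchanged: {'zoe': 34, 'wolf': 23, 'thomas': 20}
```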

## Nesting data structures

With the exception of our list of patients, a `list` of `Patient` objects, all the examples we considered so far had elements of simple types like `str`, `int`, or `float`. However, it is possible, and often useful, to build more complicated data structures out of `list`, `set`, and `dict`.

For example, we could represent a matrix as a `list` of `list`, where each element of the outer `list` represents a row of the matrix. (This is just for illustration purposes; for mathematical applications, you would use numpy arrays, both for performance and features.)

In [127]:
rows = 3
cols = 4
matrix = list()
for i in range(rows):
    matrix.append(list())
    for j in range(cols):
        matrix[i].append(i*cols + j)

In [128]:
matrix

[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

In [129]:
matrix[1][2]

6

In [130]:
matrix[0][3] = -17

In [131]:
matrix

[[0, 1, 2, -17], [4, 5, 6, 7], [8, 9, 10, 11]]
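The same kind of matrix can be built more compactly with a nested list comprehension. A tempting shortcut, multiplying a list of lists, is a trap: it repeats references to one and the same row object, the aliasing discussed earlier. A sketch, using new names so `matrix` is left alone:

```python
rows, cols = 3, 4

# nested comprehension: a new inner list is created for every row
matrix_comp = [[i*cols + j for j in range(cols)] for i in range(rows)]
print(matrix_comp)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

# pitfall: the outer * repeats references to a single row object
zeros = [[0] * cols] * rows
zeros[0][0] = 1
print(zeros)        # [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
```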

### Shallow copying

Remember the somewhat cryptic remark on shallow copies? Let's explore that here.  We create a copy of our matrix, and modify it.

In [132]:
new_matrix = matrix.copy()

We can add an extra row to `new_matrix`, and `matrix` is unaffected since we created a copy.

In [133]:
new_matrix.append([3, 9, 7, 5])

In [134]:
matrix

[[0, 1, 2, -17], [4, 5, 6, 7], [8, 9, 10, 11]]

Now, we modify an element of `new_matrix`.

In [135]:
new_matrix[0][0] = 103

In [136]:
new_matrix

[[103, 1, 2, -17], [4, 5, 6, 7], [8, 9, 10, 11], [3, 9, 7, 5]]

This is all as expected, but inspecting `matrix` may yield a surprise.

In [137]:
matrix[0][0]

103

It seems that the elements of `matrix` and `new_matrix`, i.e., the first three rows, are the same `list` objects. However, `matrix` and `new_matrix` themselves are distinct objects. This is easy to verify using the `is` operator that tests object identity.

In [139]:
matrix[0] is new_matrix[0]

True

In [140]:
matrix is not new_matrix

True

Clearly, the `copy` method produces a shallow copy of the original data structure. Although we illustrated this for `list`, the same holds for `set` and `dict`.
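If you need a copy that shares no mutable elements with the original, the `deepcopy` function from the standard library's `copy` module copies nested data structures recursively. A sketch on a small example grid:

```python
from copy import deepcopy

grid = [[0, 1], [2, 3]]
grid_copy = deepcopy(grid)

grid_copy[0][0] = 103
print(grid[0][0])               # 0, the original is unaffected
print(grid[0] is grid_copy[0])  # False, the rows are distinct objects
```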

### Limitations

We can store elements of type `list`, `set` and `dict` in lists without issues. For instance, we can create a `list` of `set`.

In [144]:
list_of_sets = list()
for size in range(5):
    list_of_sets.append(set(range(size)))
list_of_sets

[set(), {0}, {0, 1}, {0, 1, 2}, {0, 1, 2, 3}]

However, not everything can be stored in a `set`, or used as a key in a `dict`. For instance, if we try to create a `set` of lists, an exception is raised.

In [146]:
try:
    set_of_lists = set()
    for size in range(5):
        set_of_lists.add(list(range(size)))
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-146-6790c7469f72>", line 4, in <module>
    set_of_lists.add(list(range(size)))
TypeError: unhashable type: 'list'


Some types in Python are 'hashable', while others are not. For instance, a `float` has a `__hash__` method, while for a `list`, the `__hash__` attribute is set to `None`, so calling it fails.

In [150]:
(17.3).__hash__()

691752902764109841

In [153]:
try:
    list_of_sets.__hash__()
except:
    print_exc()

Traceback (most recent call last):
  File "<ipython-input-153-a6e01c1e4470>", line 2, in <module>
    list_of_sets.__hash__()
TypeError: 'NoneType' object is not callable


This implies that only values of types that have a `__hash__` method implemented, and are, by definition, 'hashable' can be stored in a `set`, or used as *keys* in a `dict`. Any type can occur as *value* in a `dict` though, regardless of whether it is hashable or not.
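Immutable built-in types such as `tuple` and `frozenset` are hashable as long as their elements are hashable, so they can take the place of `list` and `set` when you need set elements or dictionary keys. A sketch:

```python
# tuples instead of lists as set elements
set_of_tuples = {tuple(range(size)) for size in range(5)}
print((0, 1) in set_of_tuples)  # True

# frozensets instead of sets as dictionary keys
pair_counts = {frozenset({'alice', 'bob'}): 2}
print(pair_counts[frozenset({'bob', 'alice'})])  # 2, element order is irrelevant

# a tuple that contains a list is not hashable
try:
    hash((1, [2, 3]))
except TypeError:
    print('unhashable: the tuple contains a list')
```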

### Smell

If you are using nested data structures, that might actually be a code smell, i.e., an indicator that your code design could stand improvement. Quite often, it means that it is time to consider introducing classes to represent the concepts you are trying to model.

## Choices, choices, choices...

How to pick the right data structure? Of course, there are quite a number of considerations. However, as a rule of thumb, if you pick a data structure that corresponds to the mathematical model you have in mind while programming, that is probably a good start.
  * Tuples can represent elements of Cartesian products of sets.
  * Lists represent ordered sequences of objects.
  * Sets represent... well, mathematical sets.
  * Dictionaries represent mathematical functions (mappings), or associations.

Sometimes, your code can be simplified by picking the right data type. For instance, if you find yourself representing something, say, the information on a patient, by a list, while the same information is available for each patient and you don't change it in your code, then you might want to replace the list by a tuple.

Similarly, when you represent that information by a dictionary with the same keys for each patient, you can replace it by a named tuple, or by a data class if the information has to be modified.

If you use a list, but add some item only when it isn't an element yet, you may want to use a set.

Another consideration is performance: some operations are fast on certain data structures, but slow on others. Take membership testing, for example: it is a lot faster on a set than on a list, since a set uses hashing to locate an element, while a list has to be scanned element by element. In fact, for a large number of elements, the difference can be huge.

In [154]:
s = set(range(1_000_000))
l = list(range(1_000_000))

In [155]:
%timeit 1_000_001 in s 

98.4 ns ± 18.9 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [156]:
%timeit 1_000_001 in l

37.9 ms ± 8.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


## Other standard library data structures

Python's standard library contains many additional data structures, some quite specialized. We will only discuss a few here that might prove useful. Again, it pays off to read through the documentation to familiarize yourself with what is available.

### Arrays

Although arrays are very useful data structures, the implementation in Python's standard library is fairly basic. However, arrays can be useful to get some extra performance if you cannot use numpy for some reason, or if you require interoperability with C. As opposed to lists, array elements all have the same type, and that type is specified when you create the array. Although `array` supports methods such as `append`, `insert` and `pop`, you would probably want to avoid those for performance reasons.

In [158]:
from array import array

In [160]:
weights = array('d', (patient.weight for patient in read_patients('Data/patients.txt')))
weights

array('d', [59.9, 72.4, 56.3, 58.7, 61.3, 63.2, 78.9, 65.3, 68.5, 67.1])
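The first argument to `array`, the type code, fixes the element type: `'d'` stands for a C `double`, `'i'` for a C `int`. As a sketch, values that don't fit the type code are rejected; the `counts` array is just an example.

```python
from array import array

counts = array('i', [3, 3, 2])
# prints the type code and the size of one element in bytes (platform dependent)
print(counts.typecode, counts.itemsize)

try:
    counts.append(1.5)  # a float does not fit an integer array
except TypeError:
    print('an array of type "i" only accepts integers')
```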

Elements can be accessed by index, and can be modified. The length of an `array` can be determined using the `len` function.

In [161]:
weights[1]

72.4

In [162]:
weights[0] = 59.3

In [163]:
len(weights)

10

Contrary to what you might expect, the `+` operator represents array concatenation, rather than element-wise addition. Hence, to do mathematics on arrays in Python, numpy is the way to go.

In [165]:
weights + weights

array('d', [59.3, 72.4, 56.3, 58.7, 61.3, 63.2, 78.9, 65.3, 68.5, 67.1, 59.3, 72.4, 56.3, 58.7, 61.3, 63.2, 78.9, 65.3, 68.5, 67.1])
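Without numpy, element-wise arithmetic on arrays has to be spelled out explicitly, for instance by pairing up elements with `zip` in a generator expression. A sketch using a few of the weights from the data file and some made-up offsets:

```python
from array import array

weights = array('d', [59.3, 72.4, 56.3])
offsets = array('d', [0.5, -1.0, 2.0])  # made-up values for illustration

# pair up elements with zip, and build a new array of the sums
totals = array('d', (w + o for w, o in zip(weights, offsets)))
print(totals)
```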