# Python training UGA 2017

**A training to acquire a strong basis in Python and to use it efficiently**

Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Christophe Picard (LJK), Loïc Huder (ISTerre)

# [Data structures](https://docs.python.org/3/tutorial/datastructures.html)

4 built-in containers: list, tuple, set and dict...

For more containers: see [collections](https://docs.python.org/3/library/collections.html)...

### list: mutable sequence

Lists are mutable, ordered sequences of objects that may be of different types. A list can be viewed as an array of references (nearly pointers) to objects.

In [1]:
# 2 equivalent ways to define an empty list
l0 = []
l1 = list()
assert l0 == l1

# not empty lists
l2 = ['a', 2]
l3 = list(range(3))
print(l2, l3, l2 + l3)
print(3 * l2)

['a', 2] [0, 1, 2] ['a', 2, 0, 1, 2]
['a', 2, 'a', 2, 'a', 2]
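
The "array of references" view has practical consequences. The following minimal sketch (the variable names are only illustrative) shows that assigning a list to a new name does not copy it, and that `3 * [obj]` repeats a reference to the same object:

```python
# A list stores references to objects, not copies of them.
a = [1, 2]
b = a            # b refers to the same list object as a
b.append(3)
print(a)         # [1, 2, 3]
print(a is b)    # True

c = a.copy()     # shallow copy: a new list holding the same references
c.append(4)
print(a, c)      # [1, 2, 3] [1, 2, 3, 4]

# Pitfall of multiplication: the inner list is referenced three times
grid = 3 * [[0]]
grid[0].append(1)
print(grid)      # [[0, 1], [0, 1], [0, 1]]
```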


The [`itertools`](https://docs.python.org/3/library/itertools.html) module provides other ways of iterating over lists or sets of lists (e.g. Cartesian product, permutations, filtering, ...).
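
As a short illustration, here is a sketch using three functions from `itertools` (`product`, `permutations` and `chain`); the input lists are arbitrary examples:

```python
from itertools import chain, permutations, product

letters = ['a', 'b']
numbers = [1, 2]

# Cartesian product: every (letter, number) combination
pairs = list(product(letters, numbers))
print(pairs)   # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]

# All possible orderings of a list
orders = list(permutations(numbers))
print(orders)  # [(1, 2), (2, 1)]

# Iterate over several lists as if they were one
flat = list(chain(letters, numbers))
print(flat)    # ['a', 'b', 1, 2]
```

These functions return lazy iterators, which is why the results are wrapped in `list(...)` before printing.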

### list: mutable sequence

The built-in function `dir` returns a list of the names of an object's attributes. For a list, these attributes are Python special attributes (with double underscores) and 11 public methods:


In [2]:
print(dir(l3))

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


In [3]:
l3.append(10)
print(l3)
l3.reverse()
print(l3)

[0, 1, 2, 10]
[10, 2, 1, 0]


In [4]:
# Built-in functions applied on lists
# return the lowest value
print(min(l3))
# return the highest value
print(max(l3))
# return a new sorted list
print(sorted([5, 2, 10, 0]))

0
10
[0, 2, 5, 10]
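
Note the difference between the built-in function `sorted` and the method `list.sort`. A small sketch with illustrative variable names:

```python
values = [5, 2, 10, 0]

# sorted() returns a new list and leaves the original untouched
new_list = sorted(values)
print(values, new_list)  # [5, 2, 10, 0] [0, 2, 5, 10]

# list.sort() sorts in place and returns None
result = values.sort()
print(result, values)    # None [0, 2, 5, 10]

# both accept the optional arguments `key` and `reverse`
words = ['bbb', 'a', 'cc']
by_length = sorted(words, key=len, reverse=True)
print(by_length)         # ['bbb', 'cc', 'a']
```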


In [5]:
# "pasting" two lists can be done using zip
l1 = [1, 2, 3]
s = 'abc'
print(list(zip(l1, s)))
print(list(zip('abc', 'defg')))

[(1, 'a'), (2, 'b'), (3, 'c')]
[('a', 'd'), ('b', 'e'), ('c', 'f')]
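
As the second output shows, `zip` stops at the end of the shortest input. If the extra elements matter, `itertools.zip_longest` is one option; `zip(*pairs)` performs the reverse operation. A minimal sketch (the fill value `'-'` is an arbitrary choice):

```python
from itertools import zip_longest

# pad the shorter input instead of truncating
padded = list(zip_longest('abc', 'defg', fillvalue='-'))
print(padded)  # [('a', 'd'), ('b', 'e'), ('c', 'f'), ('-', 'g')]

# "unzipping": the star operator passes each pair as one argument to zip
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
numbers, letters = zip(*pairs)
print(numbers, letters)  # (1, 2, 3) ('a', 'b', 'c')
```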


### `list`: list comprehension

Lists are iterable, so they are often used in loops. We have already seen how to use the keyword `for`, for example to build a new list (side note: `x**2` computes `x^2`):

In [6]:
l0 = [1, 4, 10]
l1 = []
for number in l0:
    l1.append(number**2)
    
print(l1)

[1, 16, 100]


There is a more readable (and slightly more efficient) method to do such things, the "list comprehension":

In [7]:
l1 = [number**2 for number in l0]
print(l1)

[1, 16, 100]


In [8]:
# list comprehension with a condition
[s for s in ['a', 'bbb', 'e'] if len(s) == 1]

['a', 'e']

In [9]:
# list comprehensions can contain several "for" clauses
[(x,y) for x in [1,2] for y in ['a','b'] ]

[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
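
The order of the `for` clauses in such a comprehension is the same as in the equivalent nested loops (the leftmost `for` is the outer loop). A sketch that checks the equivalence:

```python
# explicit nested loops
pairs_loop = []
for x in [1, 2]:
    for y in ['a', 'b']:
        pairs_loop.append((x, y))

# the same thing as a list comprehension
pairs_comp = [(x, y) for x in [1, 2] for y in ['a', 'b']]

print(pairs_loop == pairs_comp)  # True
```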

### `tuple`: immutable sequence

Tuples are very similar to lists, but they are immutable (they cannot be modified after creation).

In [10]:
# 2 equivalent notations to define an empty tuple (not very useful...)
t0 = ()
t1 = tuple()
assert t0 == t1

# not empty tuple
t2 = (1, 2, 'a')  # with the parentheses
t2 = 1, 2, 'a'    # it also works without parentheses
t3 = tuple(l3)  # from a list
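
To make "immutable" concrete, here is a small sketch: item assignment on a tuple raises a `TypeError`. Note that the immutability is shallow, so a mutable object stored inside a tuple can still be modified.

```python
t = (1, 2, 'a')
try:
    t[0] = 10  # not allowed: tuples do not support item assignment
except TypeError as error:
    print('TypeError:', error)

# the tuple itself cannot change, but the list it refers to can
t_mixed = (1, [2, 3])
t_mixed[1].append(4)
print(t_mixed)  # (1, [2, 3, 4])
```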

In [11]:
# tuples only have 2 public methods (with a list comprehension)
[name for name in dir(t3) if not name.startswith('__')]

['count', 'index']

In [12]:
# assignment of multiple variables in 1 line
a, b = 1, 2
print(a, b)
# exchange of values
b, a = a, b
print(a, b)

1 2
2 1


### `tuple`: immutable sequence

Tuples are used *a lot* with the keyword `return` in functions:

In [13]:
def myfunc():
    return 1, 2, 3

t = myfunc()
print(type(t), t)
# Directly unpacking the tuple
a, b, c = myfunc()
print(a, b, c)

<class 'tuple'> (1, 2, 3)
1 2 3


### `set`: a hashtable

Unordered collections of unique elements (a hashtable). Sets are mutable. The elements of a set must be [hashable](https://docs.python.org/3/glossary.html#term-hashable).

In [14]:
s0 = set()

In [15]:
{1, 1, 1, 3}

{1, 3}

In [16]:
set([1, 1, 1, 3])

{1, 3}
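
The hashability requirement can be checked directly. In this sketch, adding a list (mutable, hence unhashable) fails, while a tuple or a `frozenset` with the same content is accepted:

```python
s = {1, 'a', (2, 3)}

try:
    s.add([4, 5])  # a list is mutable and therefore not hashable
except TypeError as error:
    print('TypeError:', error)

s.add(tuple([4, 5]))      # the immutable equivalent of the list
s.add(frozenset({6, 7}))  # the immutable equivalent of a set
print(len(s))  # 5
```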

In [17]:
s1 = {1, 2}
s2 = {2, 3}
print(s1.intersection(s2))
print(s1.union(s2))

{2}
{1, 2, 3}
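
The same operations are also available as operators, which is often more compact. A short sketch:

```python
s1 = {1, 2}
s2 = {2, 3}

print(s1 & s2)    # intersection: {2}
print(s1 | s2)    # union: {1, 2, 3}
print(s1 - s2)    # difference: {1}
print(s1 ^ s2)    # symmetric difference: {1, 3}
print({2} <= s1)  # subset test: True
```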


### `set`: lookup

Hashtable lookup (for example `1 in s1`) is algorithmically efficient (average complexity O(1)), i.e. theoretically faster than a lookup in a list or a tuple (complexity O(n), where n is the number of elements).

In [18]:
print(1 in s1, 1 in s2)

True False
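
The difference can be measured with the standard `timeit` module. This is only a rough sketch: the container size and the number of repetitions are arbitrary choices, and the measured times depend on the machine.

```python
from timeit import timeit

size = 100_000
as_list = list(range(size))
as_set = set(as_list)
target = size - 1  # last element: worst case for the list scan

# `in` on a list scans the elements one by one, `in` on a set uses the hash
t_list = timeit(lambda: target in as_list, number=100)
t_set = timeit(lambda: target in as_set, number=100)
print(f'list: {t_list:.5f} s, set: {t_set:.5f} s')
```

On a typical machine the set lookup should be faster by several orders of magnitude for a container of this size.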


### What is a hashtable?

https://en.wikipedia.org/wiki/Hash_table
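
To give an idea of the principle, here is a deliberately simplified sketch of a hash set using separate chaining. The class `ToyHashSet` is hypothetical and written for illustration only; it is not how CPython implements `set` (which uses open addressing and resizes its table).

```python
class ToyHashSet:
    """Minimal hash set with a fixed number of buckets (illustration only)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, item):
        # hash() maps the item to an integer; the modulo selects a bucket
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        if item not in bucket:  # keep the elements unique
            bucket.append(item)

    def __contains__(self, item):
        # only one (short) bucket is scanned, not the whole container
        return item in self._bucket(item)


toy = ToyHashSet()
for word in ['a', 'bbb', 'e', 'a']:
    toy.add(word)

print('a' in toy, 'z' in toy)            # True False
print(sum(len(b) for b in toy.buckets))  # 3
```

As long as the buckets stay short, a lookup costs one hash computation plus a scan of a few elements, whatever the total number of elements. This is the origin of the O(1) average complexity.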

## DIY: back to the "find the removed element" problem

In [19]:
from random import shuffle, randint

n = 20
i = randint(0, n-1)
print('integer removed from the list:', i)
l = list(range(n))
l.remove(i)
shuffle(l)
print('shuffled list: ', l)

integer removed from the list: 8
shuffled list:  [2, 0, 3, 12, 14, 9, 11, 16, 5, 4, 18, 10, 6, 1, 19, 7, 17, 13, 15]


  - Could the problem be solved using a set?
  - What is the complexity of this solution?

## A possible solution:

In [20]:
full_set = set(range(n))
changed_set = set(l)
ns = full_set - changed_set
ns.pop()

8
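
Regarding complexity: building the two sets and computing their difference are each O(n) on average, so the whole solution is O(n) in time and in memory. The sketch below wraps the set solution in a function and compares it with an arithmetic alternative that is also O(n) in time but needs no extra container. The function names are hypothetical, chosen for this example.

```python
from random import randint, shuffle


def find_removed_with_set(numbers, n):
    """Return the integer of range(n) missing from numbers (set difference)."""
    return (set(range(n)) - set(numbers)).pop()


def find_removed_with_sum(numbers, n):
    """Same result, using the fact that sum(range(n)) == n * (n - 1) // 2."""
    return n * (n - 1) // 2 - sum(numbers)


n = 20
removed = randint(0, n - 1)
numbers = list(range(n))
numbers.remove(removed)
shuffle(numbers)

print(removed, find_removed_with_set(numbers, n), find_removed_with_sum(numbers, n))
```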

### `dict`: set of key: value pairs (with unique keys)

The dictionary (`dict`) is a very important data structure in Python. All namespaces are (nearly) dictionaries and "Namespaces are one honking great idea -- let's do more of those!" (The Zen of Python).

A dict is a hashtable (a set) + associated values.

In [21]:
d = {}
d['b'] = 2
d['a'] = 1
print(d)

{'b': 2, 'a': 1}


In [22]:
d = {'a': 1, 'b': 2, 0: False, 1: True}
print(d)

{'a': 1, 'b': 2, 0: False, 1: True}


### Tip: parallel between `dict` and `list`

As a first approximation, you can think of a `dict` as a super `list` which can be indexed with objects other than integers (in particular with `str`).

In [23]:
l = ["value0", "value1"]
l.append("value2")
print(l)

['value0', 'value1', 'value2']


In [24]:
l[1]

'value1'

In [25]:
d = {"key0": "value0", "key1": "value1"}
d["key2"] = "value2"
print(d)

{'key0': 'value0', 'key1': 'value1', 'key2': 'value2'}


In [26]:
d["key1"]

'value1'

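But beware: a `dict` is indexed by key, not by position. Since Python 3.7, iterating over a `dict` follows the insertion order; in older versions the order was arbitrary (a consequence of the hashtable), so code that must run on old versions should not rely on it.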
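
A short sketch of what this means in practice, assuming Python 3.7 or later (where the language guarantees that iteration follows insertion order): the keys are neither sorted nor accessible by position.

```python
d = {}
d['b'] = 2
d['a'] = 1

# iteration follows the insertion order, not the alphabetical order
print(list(d))    # ['b', 'a']

# there is no positional indexing: 0 is looked up as a key
try:
    d[0]
except KeyError as error:
    print('KeyError:', error)

# if an order is needed, request it explicitly
print(sorted(d))  # ['a', 'b']
```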

### `dict`: public methods

In [27]:
# dicts have 11 public methods (listed here with a list comprehension)
[name for name in dir(d) if not name.startswith('__')]

['clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']
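
A few of these methods are used very often. The following sketch, with arbitrary keys and values, shows `get`, `setdefault`, `update` and `pop`:

```python
d = {'key0': 'value0', 'key1': 'value1'}

# get: like d[key], but returns a default instead of raising KeyError
print(d.get('key0'))                # value0
print(d.get('missing'))             # None
print(d.get('missing', 'default'))  # default

# setdefault: insert the value only if the key is absent
d.setdefault('key2', 'value2')

# update: merge another dictionary into d
d.update({'key3': 'value3'})

# pop: remove a key and return its value
removed = d.pop('key0')
print(removed, d)
# value0 {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
```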

### `dict`: different ways to loop over a dictionary

In [28]:
# loop with items
for key, value in d.items():
    if isinstance(key, str): 
        print(key, value)

key0 value0
key1 value1
key2 value2


In [29]:
# loop with values
for value in d.values():
    print(value)

value0
value1
value2


In [30]:
# loop with keys
for key in d.keys():
    print(key)

key0
key1
key2


In [31]:
# dict comprehension (here for the "inversion" of the dictionary)
print(d)
d1 = {v: k for k, v in d.items()}

{'key0': 'value0', 'key1': 'value1', 'key2': 'value2'}
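
The inverted dictionary is not printed above. The sketch below shows the result and a caveat: the inversion only preserves all the information if the values are unique (and hashable). The dictionary `d_dup` is a hypothetical example with a duplicated value.

```python
d = {'key0': 'value0', 'key1': 'value1', 'key2': 'value2'}
d1 = {v: k for k, v in d.items()}
print(d1)  # {'value0': 'key0', 'value1': 'key1', 'value2': 'key2'}

# with duplicated values, later pairs overwrite earlier ones
d_dup = {'a': 1, 'b': 1}
d_dup_inverted = {v: k for k, v in d_dup.items()}
print(d_dup_inverted)  # {1: 'b'}
```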


## Do it yourself:

Write a function that returns a dictionary containing the number of occurrences of letters in a text.

In [32]:
text = 'abbbcc'

#### A possible solution:

In [33]:
def count_elem(sequence):
    d = {}

    for letter in sequence:
        if letter not in d:
            d[letter] = 1
        else:
            d[letter] += 1
    return d

print("text=", text, "counts=", count_elem(text))


text= abbbcc counts= {'a': 1, 'b': 3, 'c': 2}
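
Two more compact variants are sketched below: one uses `dict.get` with a default value of 0, the other uses `Counter` from the `collections` module of the standard library. The function name `count_elem_get` is only chosen for this example.

```python
from collections import Counter


def count_elem_get(sequence):
    """Count the occurrences using dict.get with a default of 0."""
    counts = {}
    for letter in sequence:
        counts[letter] = counts.get(letter, 0) + 1
    return counts


text = 'abbbcc'
print(count_elem_get(text))    # {'a': 1, 'b': 3, 'c': 2}

# Counter is a dict subclass designed for counting hashable objects
counts = Counter(text)
print(counts.most_common(1))   # [('b', 3)]
```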
