[Node 8: Tools für Listen und Dicts](http://www-static.etp.physik.uni-muenchen.de/kurs/Computing/python2/node8.html)


## Tools for lists and dicts
You often want to select the elements of a list that meet a certain criterion (filtering), transform a list into another list (mapping), or combine both steps. The built-in functions <font color=#0000e6> ``filter``</font> and <font color=#0000e6> ``map``</font> are available for this purpose. Alternatively, so-called list comprehensions can be used.

The list <font color=#0000e6> ``[ 1, 2, 3, 4, 5]``</font> serves as an example for three tasks:
* Apply a mathematical operation (here: the square root) to each list element.
* Select only the even elements.
* Select only the even elements and multiply them by 10.

First <font color=#0000e6> ``filter``</font> and <font color=#0000e6> ``map``</font> :

In [None]:
liste1 = [ 1, 2, 3, 4, 5 ]
import math
map(math.sqrt, liste1) # apply math.sqrt to each element

Unlike in Python 2, `map` in Python 3 returns an iterator rather than a list. To display the entries, convert the result to a list explicitly:

In [None]:
list(map(math.sqrt, liste1))
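
One consequence of getting an iterator back is that its values can be consumed only once. The following minimal sketch illustrates this; the names `roots`, `first_pass` and `second_pass` are chosen here for illustration only.

```python
import math

liste1 = [1, 2, 3, 4, 5]
roots = map(math.sqrt, liste1)   # lazy iterator, nothing is computed yet

first_pass = list(roots)         # consumes the iterator
second_pass = list(roots)        # the iterator is exhausted by now

print(first_pass)
print(second_pass)               # empty list
```

If the values are needed more than once, store the result of `list(...)` in a variable and reuse that list.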

In [None]:
list(map(lambda x: x**0.5, liste1)) # same with lambda function

In [None]:
list(filter(lambda x: x % 2 == 0, liste1))

In [None]:
list(map(lambda x: x*10, filter(lambda x: x % 2 == 0, liste1)))

### List comprehensions

Alternatively, you can use <font color=#0000ff> **list comprehensions**</font>.
The simplest case, the identity mapping, returns a copy of the list:

In [None]:
[element for element in liste1]

The following less trivial examples do the same as we achieved above with `map` and `filter`:

In [None]:
[element**0.5 for element in liste1]

In [None]:
[element for element in liste1 if element % 2 == 0]

In [None]:
[element*10 for element in liste1 if element % 2 == 0]

List comprehensions have the following general form:

```python
[ expr(element) for element in iterable if pred(element) ]
```

where
* <font color=#0000e6> ``expr(element)``</font> is any expression depending on <font color=#0000e6> ``element``</font>,
* <font color=#0000e6> ``iterable``</font> is any iterable object (a list, a range, ...) and
* <font color=#0000e6> ``pred(element)``</font> is an optional condition that depends on <font color=#0000e6> ``element``</font> and evaluates to <font color=#0000e6> ``True``</font> or <font color=#0000e6> ``False``</font>.
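
To make the general form concrete, here is a small sketch in which `expr` and `pred` are written as ordinary functions. Both functions are hypothetical examples chosen for illustration; the same pair can be passed to `map` and `filter` to obtain the same result.

```python
def expr(element):
    """Hypothetical transformation: multiply by 10."""
    return element * 10

def pred(element):
    """Hypothetical predicate: True for even numbers."""
    return element % 2 == 0

iterable = [1, 2, 3, 4, 5]

via_comprehension = [expr(element) for element in iterable if pred(element)]
via_map_filter = list(map(expr, filter(pred, iterable)))

print(via_comprehension)   # [20, 40]
print(via_map_filter)      # [20, 40]
```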

---

You can also combine multiple lists:

In [None]:
[(x,y) for x in range(5) for y in range(5) ]

and also add <font color=#008000>*if*</font> conditions:

In [None]:
[(x,y) for x in range(5) if x % 2 == 0 for y in range(5) if y % 2 == 1]

This gives all combinations of the even numbers from 0 to 4 with the odd numbers from 0 to 4. It is equivalent to a double <font color=#008000> *for*</font> loop:

In [None]:
result = []
for x in range(5):
   if x % 2 == 0:
      for y in range(5):
         if y % 2 == 1:
            result.append((x,y))
print(result)

The explicit `for` loops are clearer and easier to follow, but they take more lines to write and are somewhat slower to execute. [This post](https://stackoverflow.com/questions/30245397/why-is-a-list-comprehension-so-much-faster-than-appending-to-a-list) explains the difference as follows: "suspending and resuming a function's frame, or multiple functions in other cases, is slower than creating a list on demand". The two cells below time both variants:

In [None]:
%%timeit
# timing for the list comprehension
[(x,y) for x in range(1000) if x % 2 == 0 for y in range(5) if y % 2 == 1]

In [None]:
%%timeit
# timing for the explicit for loop
result = []
for x in range(1000):
   if x % 2 == 0:
      for y in range(5):
         if y % 2 == 1:
            result.append((x,y))
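
The `%%timeit` magic only works inside IPython or Jupyter. In a plain Python script, a comparable measurement can be sketched with the standard `timeit` module. The function names `with_comprehension` and `with_loops` and the repetition count are assumptions made for this example, and the measured times depend on the machine and the Python version.

```python
import timeit

def with_comprehension():
    """Build the list of pairs with a list comprehension."""
    return [(x, y) for x in range(1000) if x % 2 == 0
                   for y in range(5) if y % 2 == 1]

def with_loops():
    """Build the same list with explicit for loops."""
    result = []
    for x in range(1000):
        if x % 2 == 0:
            for y in range(5):
                if y % 2 == 1:
                    result.append((x, y))
    return result

# Both variants must produce the same list
same = with_comprehension() == with_loops()

# Total time in seconds for 200 calls of each function
t_comp = timeit.timeit(with_comprehension, number=200)
t_loop = timeit.timeit(with_loops, number=200)
print(f"comprehension: {t_comp:.4f} s, loops: {t_loop:.4f} s, same result: {same}")
```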

The easiest way to understand a list comprehension is to read its `for` and `if` clauses from left to right as nested `for` loops and `if` statements, as above; the expression at the front is what gets appended in the innermost block.

Typical use case: transformation of a list (input = list, output = list).

Similar to <font color=#ff0000> **list comprehensions**</font> for lists, there are <font color=#0000ff> **dict comprehensions**</font> for dictionaries:

In [None]:
sqdict = { i : i**2 for i in range(10) }
print(sqdict)
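
Dict comprehensions accept an `if` condition in the same way as list comprehensions, and they can iterate over the items of an existing dict. A short sketch, with the names `even_squares` and `roots` chosen for illustration:

```python
sqdict = {i: i**2 for i in range(10)}

# keep only the entries with an even key
even_squares = {k: v for k, v in sqdict.items() if k % 2 == 0}

# swap keys and values (safe here because all squares are distinct)
roots = {v: k for k, v in sqdict.items()}

print(even_squares)
print(roots[49])
```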

### Merge lists with zip

In [None]:
a = [ 1, 2, 3 ]
b = ['a', 'b', 'c']
list(zip(a,b))

<font color=#0000e6> ``zip``</font> pairs up the elements of both lists; converting the result to a list yields a list of <font color=#008000>tuples.</font>
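
Two properties of `zip` are worth knowing: it stops at the end of the shortest input, and `zip(*pairs)` reverses the pairing. The sketch below uses a list `b` that is deliberately one element longer than `a`; the names `pairs`, `numbers` and `letters` are illustrative.

```python
a = [1, 2, 3]
b = ['a', 'b', 'c', 'd']   # one element more than a

pairs = list(zip(a, b))    # 'd' has no partner and is dropped
print(pairs)

# zip(*pairs) undoes the pairing and yields one tuple per original list
numbers, letters = zip(*pairs)
print(numbers, letters)
```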
 

Practical application: the dot product (with [`sum`](https://docs.python.org/3/library/functions.html#sum) as a function that accepts any iterable):

In [None]:
a = [ 0.3, 1.8, -2.2 ] 
b = [ -2.5, 3.8, 0.4]
sp = sum(x*y for x, y in zip(a, b))  # generator expression, no temporary list needed
print(sp)

The same pattern extends easily to combining two lists into one dict:

In [None]:
d = { y : x for x, y in zip(a, b) }  # elements of b become keys, elements of a become values
print(d)

In [None]:
# Or more directly ...
d = dict(zip(b,a))
d


### defaultdict
 

In the <font color=#008000> *collections*</font> module there are useful helper classes for working with lists and dicts. Here's an example of how to determine the frequency of words in a text file:

In [None]:
# download Kant's text
import urllib.request
f = urllib.request.urlopen("https://goo.gl/rGqW4k")

# split into words and decode the bytes to text
words = []
for line in f:  # iterate over all lines
    line = line.decode("utf-8")  # decode the binary data to text
    words += line.split()  # append the words of this line to the list

print("Total number of words:", len(words))
# or more directly with a double list comprehension
# (use it instead of the loop above, since f can only be read once):
# words = [word for line in f for word in line.decode("utf-8").split()]

Now let's try different methods to count the frequency of individual words (using the word "Vernunft", i.e. "reason", as an example), initially without using any additional modules:

In [None]:
# count words v1 (if / else)
word_counts = {}
for word in words:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

print ("V1:", word_counts["Vernunft"])
# Cumbersome ...

In [None]:
# count words v2 (try / except)
word_counts = {}
for word in words:
    try:
        word_counts[word] += 1
    except KeyError:  # word not seen before
        word_counts[word] = 1

print ("V2:", word_counts["Vernunft"])
# Also cumbersome ...

In [None]:
# count words v3
from collections import defaultdict
# defaultdict(int) automatically initializes a missing entry to int() = 0 on first access
word_counts = defaultdict(int)
for word in words:
    word_counts[word] += 1

print ("V3:", word_counts["Vernunft"])

In [None]:
# or even simpler ...
from collections import Counter
word_counts = Counter(words)
# Counter returns a dict-like object whose values are the frequencies
print("V4:", word_counts["Vernunft"])

In [None]:
# and it offers further methods ...
print(word_counts.most_common(10)) # the 10 most frequent words
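
`defaultdict` is not limited to counting. With `list` as the default factory it can group items, for example words by their first letter. This is a minimal sketch; the short word list is made up for illustration so that the example runs without the download, and `by_initial` is a hypothetical name.

```python
from collections import defaultdict

# small made-up sample instead of the downloaded text
words = ["Vernunft", "Kritik", "reine", "Verstand", "Raum", "Kraft"]

# defaultdict(list) creates an empty list for each new key on first access
by_initial = defaultdict(list)
for word in words:
    by_initial[word[0]].append(word)

print(dict(by_initial))
```

Note that the grouping is case-sensitive here, so `"reine"` and `"Raum"` end up under different keys.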

### enumerate

Another common problem is that when iterating over a list, you want both the item and the index.

In [None]:
# find largest element in list,
# both index and value of this element
#
nums = [ 1,5,8,3,7,6,15,11 ] # list with some numbers -- largest element is 15 at position 6 (counting from 0)

In [None]:
# initialize
maxv = nums[0]
imax = 0

# classical method-1
for i in range(len(nums)): # index loop
    if nums[i]>maxv:
        maxv = nums[i]
        imax = i

print (maxv, imax)

In [None]:
# classical method-2
maxv=nums[0]
imax = 0
i = 0
for x in nums: # keep separate counter/index
    if x>maxv:
        maxv = x
        imax = i
    i += 1
print (maxv, imax)

In [None]:
# pythonic way: better use enumerate
#
maxv=nums[0]
imax = 0
for i,x in enumerate(nums): # provides index,value 
    if x>maxv:
        maxv = x
        imax = i
print (maxv, imax)

<font color=#0000e6> ``enumerate``</font> yields the index and the element together as a tuple:

In [None]:
list(enumerate(nums))
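
Since `enumerate` yields `(index, value)` tuples, the search for the largest element can also be sketched as a single call to `max` with a `key` function that compares by value. The second part shows the optional `start` argument of `enumerate`; the names `pair` and `numbered` are illustrative.

```python
nums = [1, 5, 8, 3, 7, 6, 15, 11]

# compare the (index, value) pairs by their value, i.e. pair[1]
imax, maxv = max(enumerate(nums), key=lambda pair: pair[1])
print(maxv, imax)   # 15 6

# the optional start argument shifts the counter, e.g. for 1-based numbering
numbered = list(enumerate(["a", "b", "c"], start=1))
print(numbered)
```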

### Extra options for standard functions

Python provides built-in functions for common tasks, such as `max`, `min` and `sorted`.

Basic use is straightforward, but optional arguments extend what these functions can do.

#### sort:

Specify the sort criterion with the `key` argument:

In [None]:
txtl = [ 'hello', 'Munich', 'terrible', 'Acronym']
# default sort
print(sorted(txtl))
#
# sort ignoring upper/lower case
print(sorted(txtl,key=lambda x: x.lower()))
#
# sort using length
print(sorted(txtl,key=lambda x: len(x)))
print(sorted(txtl,key=len))
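
Two further options, sketched here under the same example list: `reverse=True` inverts the order, and a `key` function that returns a tuple sorts by several criteria in turn. The variable names are illustrative.

```python
txtl = ['hello', 'Munich', 'terrible', 'Acronym']

# longest word first
by_length_desc = sorted(txtl, key=len, reverse=True)

# sort by length first, then alphabetically ignoring case
by_length_then_alpha = sorted(txtl, key=lambda x: (len(x), x.lower()))

print(by_length_desc)
print(by_length_then_alpha)
print(txtl)   # sorted() returns a new list and leaves the original unchanged
```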



#### max


In [None]:
# return max element
a = [1,2,7,3,88,17]
print (max(a))

# find most frequent element
li=[1,5,8,6,5,9,6,9,5,6,9,6,5,4,"a","a","b","b","a","a","a"]
print(max(set(li), key=li.count))
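
The `key=li.count` approach scans the whole list once for every distinct element. For long lists, `collections.Counter` may be a better fit because it counts all elements in one pass. A minimal sketch comparing both, with illustrative variable names:

```python
from collections import Counter

li = [1, 5, 8, 6, 5, 9, 6, 9, 5, 6, 9, 6, 5, 4, "a", "a", "b", "b", "a", "a", "a"]

# li.count is called once per distinct element and scans the list each time
via_max = max(set(li), key=li.count)

# Counter counts all elements in a single pass
element, frequency = Counter(li).most_common(1)[0]

print(via_max, element, frequency)
```

In this list `"a"` occurs five times and is the unique most frequent element, so both approaches agree. When several elements share the highest count, the two approaches may pick different ones.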
