# Findall and its pitfalls
Learning goals:
 - Understand how findall() behaves differently depending on whether the pattern contains capturing parentheses, i.e. the groups that capture submatches and can be used as backreferences in substitutions
 - Understand that alternatives in regexes are not commutative!
 - Understand how capturing groups work with findall

# `re.findall()`   
## Non-overlapping matches, groups and alternatives

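`re.findall()` returns all non-overlapping matches, scanning the string from left to right. Umlauts work as expected, because `\w` matches Unicode word characters in Python 3 string patterns.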

In [None]:
import re
text = 'Viele Köche verderben den Brei.'
pattern = r'\w+'
print(re.findall(pattern, text))
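
To see what "as expected" means, it can help to compare with the `re.ASCII` flag, which restricts `\w` to `[a-zA-Z0-9_]`. A minimal sketch; the variable names are chosen for illustration only:

```python
import re

text = 'Viele Köche verderben den Brei.'

# Default for str patterns: \w matches Unicode word characters, including 'ö'.
unicode_words = re.findall(r'\w+', text)

# With re.ASCII, 'ö' is no longer a word character and splits the word.
ascii_words = re.findall(r'\w+', text, flags=re.ASCII)

print(unicode_words)  # ['Viele', 'Köche', 'verderben', 'den', 'Brei']
print(ascii_words)    # ['Viele', 'K', 'che', 'verderben', 'den', 'Brei']
```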

## Alternatives are _not_ commutative
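The order of regex alternatives matters! Matching is eager: the engine tries the alternatives from left to right and takes the first one that matches, even if a later alternative would match a longer string.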

In [None]:
re.findall(r'a|aa', "Saal")

In [None]:
re.findall(r'aa|a', "Saal")
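
When the alternatives are plain strings, one common workaround is to sort them longest first before joining them with `|`. The following is only a sketch under that assumption; `alternatives` and `pattern` are hypothetical names:

```python
import re

alternatives = ['a', 'aa']  # hypothetical list of literal alternatives

# Put longer alternatives first so that a shorter prefix cannot win.
ordered = sorted(alternatives, key=len, reverse=True)
pattern = '|'.join(re.escape(alt) for alt in ordered)

print(pattern)                       # aa|a
print(re.findall(pattern, 'Saal'))   # ['aa']
print(re.findall(r'a|aa', 'Saal'))   # ['a', 'a']
```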

## (Non-)referenceable groups
Parentheses create capturing groups that can be referenced, for example as `\1` and `\2` in a substitution.

In [None]:
text = 'Blick-Leser, A-Post-Fans und andere Bindestrich-Komposita'
re.sub(r'(\w+-)+(\w+)', r'\2', text)
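
One detail worth checking by hand: a capturing group that is repeated with `+` keeps only the text of its last repetition. A small sketch on a single compound from the example text (the name `m` is just illustrative):

```python
import re

m = re.search(r'(\w+-)+(\w+)', 'A-Post-Fans')

print(m.group(0))  # A-Post-Fans  (the whole match)
print(m.group(1))  # Post-        (last repetition of the first group only)
print(m.group(2))  # Fans

text = 'Blick-Leser, A-Post-Fans und andere Bindestrich-Komposita'
result = re.sub(r'(\w+-)+(\w+)', r'\2', text)
print(result)      # Leser, Fans und andere Komposita
```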

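### Unreferenceable groups: (?:REGEX)
`(?:...)` groups without capturing: such a group cannot be referenced and is not counted in the group numbering, so the remaining capturing group becomes `\1`.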


In [None]:
re.sub(r'(?:\w+-)+(\w+)', r'\1', text)
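
The effect on group numbering can be inspected through the `groups` attribute of a compiled pattern. This is a minimal sketch; `capturing`, `non_capturing` and `error_message` are illustrative names:

```python
import re

text = 'Blick-Leser, A-Post-Fans und andere Bindestrich-Komposita'

capturing = re.compile(r'(\w+-)+(\w+)')
non_capturing = re.compile(r'(?:\w+-)+(\w+)')

print(capturing.groups)      # 2
print(non_capturing.groups)  # 1

# Referencing a group that does not exist raises re.error.
try:
    non_capturing.sub(r'\2', text)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)
```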

## Grouping changes the return value of `re.findall()`

* **Without capturing groups**: List of string matches

In [None]:
re.findall(r'ah|aa', "kahler Saal")

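* **With capturing groups**: With two or more groups, a list of tuples of strings, where the *i*th element contains the text matched by the *i*th pair of parentheses (or `''` if that group did not take part in the match). With exactly one group, a plain list of that group's strings.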

In [None]:
re.findall(r'a(h)|a(a)', "kahler Saal")

How is that to be interpreted?

     kahler Saal
       |      |
     ('h',    '')  # 1. match
     ('',    'a')  # 2. match
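
The tuple shape depends on the number of capturing groups. As a sketch, moving the alternation inside a single group yields a flat list of strings instead of tuples (the variable names are illustrative):

```python
import re

# Two capturing groups: one tuple per match, '' for the unused group.
two_groups = re.findall(r'a(h)|a(a)', 'kahler Saal')

# One capturing group: a flat list of that group's content.
one_group = re.findall(r'a(h|a)', 'kahler Saal')

print(two_groups)  # [('h', ''), ('', 'a')]
print(one_group)   # ['h', 'a']
```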

* **With non-capturing groups only**: List of string matches again, because `(?:...)` does not capture anything

In [None]:
re.findall(r'a(?:h)|a(?:a)', "kahler Saal")
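
If both the whole match and the group contents are needed, `re.finditer()` is an alternative to `re.findall()`, since it yields match objects. A minimal sketch; note that `groups()` reports unused groups as `None`, whereas `findall()` reports them as `''`:

```python
import re

rows = []
for m in re.finditer(r'a(h)|a(a)', 'kahler Saal'):
    # group(0) is the whole match, groups() the captured submatches.
    rows.append((m.group(0), m.groups(), m.span()))

for row in rows:
    print(row)
# ('ah', ('h', None), (1, 3))
# ('aa', (None, 'a'), (8, 10))
```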