[Node 11: Regular expressions](http://www-static.etp.physik.uni-muenchen.de/kurs/Computing/python2/node11.html)

## Regular expressions
Simple substrings can be searched for in strings, and replaced, with the string methods <font color=#0000e6> ``index, rindex, find, rfind, replace``</font> and the operator <font color=#0000e6> ``in``</font> . (`index` and `find` differ only when the substring is not found: `find` returns `-1`, whereas `index` raises a `ValueError`. `rindex` and `rfind` search from the right.)
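A short sketch of these string methods on the example sentence used throughout this section, showing in particular how `find` and `index` behave when the substring is missing:

```python
s = 'Franz jagt im komplett verwahrlosten Taxi quer durch Bayern'

print('Taxi' in s)          # True
print(s.find('Taxi'))       # 37: index of the first occurrence
print(s.find('Bus'))        # -1: not found

try:
    s.index('Bus')          # like find, but raises an exception if not found
except ValueError as err:
    print('ValueError:', err)

print(s.rfind('r'))         # 57: index of the last 'r' (search from the right)
print(s.replace('Taxi', 'Bus'))   # returns a new string; s itself is unchanged
```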

Regular expressions let you search strings for complex patterns and replace the matching parts. A detailed description of regular expressions, with examples, can be found here: <font color=#0000ff>[Regular Expressions](http://docs.python.org/howto/regex.html)</font>

The Python module <font color=#0000ff> **re**</font> provides numerous functions for using regular expressions.
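As a first orientation, here is a small sketch comparing three of the module's matching functions: `re.match` only tries the beginning of the string, `re.search` scans the whole string, and `re.fullmatch` requires the pattern to cover the entire string:

```python
import re

text = 'Taxi quer durch Bayern'

# re.match only tries to match at the beginning of the string
print(re.match(r'Taxi', text) is not None)        # True
print(re.match(r'quer', text) is not None)        # False
# re.search scans the whole string for the first match
print(re.search(r'quer', text) is not None)       # True
# re.fullmatch requires the pattern to match the entire string
print(re.fullmatch(r'Taxi', text) is not None)    # False
print(re.fullmatch(r'Taxi.*', text) is not None)  # True
```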

<font color=#0000e6> ``re.search``</font> versus <font color=#0000e6> ``in``</font> :

In [None]:
import re
input = 'Franz jagt im komplett verwahrlosten Taxi quer durch Bayern'
a=re.search(r'Taxi', input)

In [None]:
a.group(0)

In [None]:
'Taxi' in input

In [None]:
a=re.search(r'Bus', input) # not found, returns None
print(a)

In [None]:
'Bus' in input

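A string literal prefixed with <font color=#0000e6> ``r``</font> is called a 'raw' string. In a raw string, backslashes are not interpreted as escape characters, so they do not have to be doubled. In the following example you can therefore write either ``r'\bTaxi\b'`` or ``'\\bTaxi\\b'``; a plain ``'\bTaxi\b'`` would not work, because there ``\b`` denotes the backspace character.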

When searching for whole words, <font color=#0000e6> ``re.search``</font> has an advantage over ``in``: the special sequence ``\b`` matches a word boundary, i.e. the position between a word character and a non-word character, or the start or end of the string:

In [None]:
input1 = 'Franz jagt im komplett verwahrlosten Taxi quer durch Bayern'
input2 = 'Der Taxibus ist zu spaet'
'Taxi' in input1, 'Taxi' in input2

In [None]:
re.search(r'\bTaxi\b', input1), re.search(r'\bTaxi\b', input2)
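To look a little closer at raw strings and word boundaries, the following sketch shows that `'\b'` and `r'\b'` are different strings, and how the result changes when `\b` is placed on only one side of the word:

```python
import re

# In a normal string literal, '\b' is the backspace character (one character) ...
print(len('\b'))        # 1
# ... while in a raw string, r'\b' is a backslash followed by 'b' (two characters)
print(len(r'\b'))       # 2
# So these two spellings describe the same regex pattern
print(r'\bTaxi\b' == '\\bTaxi\\b')   # True

input2 = 'Der Taxibus ist zu spaet'
# Without r, the pattern contains backspace characters and matches nothing here
print(re.search('\bTaxi\b', input2))    # None
# \b only in front: 'Taxi' must start a word but may continue ('Taxibus')
print(re.search(r'\bTaxi', input2))     # match for the 'Taxi' in 'Taxibus'
# \b on both sides: 'Taxi' must be a complete word
print(re.search(r'\bTaxi\b', input2))   # None
```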

To replace, use <font color=#0000e6> ``re.sub``</font> :

In [None]:
input1 = 'Franz jagt im komplett verwahrlosten Taxi quer durch Bayern'
output = re.sub(r'Taxi', 'Bus', input1)  # replaces every occurrence of 'Taxi'
output
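By default `re.sub` replaces every match. A small sketch with a made-up sentence showing the optional `count` argument, which limits the number of replacements, and `re.subn`, which also reports how many replacements were made:

```python
import re

text = 'Taxi hier, Taxi dort, Taxi ueberall'   # illustrative example sentence

# count=0 (the default) replaces all matches
print(re.sub(r'\bTaxi\b', 'Bus', text))           # Bus hier, Bus dort, Bus ueberall
# count=1 replaces only the first match
print(re.sub(r'\bTaxi\b', 'Bus', text, count=1))  # Bus hier, Taxi dort, Taxi ueberall
# re.subn returns the new string together with the number of replacements
print(re.subn(r'\bTaxi\b', 'Bus', text))          # ('Bus hier, Bus dort, Bus ueberall', 3)
```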

The <font color=#0000e6> ``match``</font> object returned by ``re.search`` gives access to the parts of the string that were matched by the regular expression. The pattern
```python
r'(\b\w+\b)\s+\1'
```
searches for a word that occurs twice in a row:

In [None]:
input = 'Franz jagt im komplett verwahrlosten Taxi quer quer durch Bayern'
mo = re.search(r'(\b\w+\b)\s+\1',input)
mo

(Explanation: `\b\w+\b` matches a word: one or more (`+`) word characters (`\w`, i.e. letters, digits and the underscore) between word boundaries (`\b`). The parentheses around it define a group, so the matched word can be referred to later. `\s+` matches at least one (`+`) whitespace character (`\s`), and `\1` is a backreference: it matches exactly the text that the first parenthesized group matched.)
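A small sketch applying the same pattern to a few illustrative strings. It also shows one subtlety: because `\1` is not followed by `\b`, a word followed by a longer word that starts with it (e.g. "die dienen") also counts as doubled; adding `\b` after `\1` is one way to rule that out:

```python
import re

doubled = r'(\b\w+\b)\s+\1'
for text in ['das das Taxi', 'quer   quer', 'Taxi quer durch']:
    m = re.search(doubled, text)
    # group(1) is the word captured by the parentheses; \1 had to match it again
    print(repr(text), '->', m.group(1) if m else None)

# \1 has no trailing \b, so 'die' followed by 'dienen' is also reported
print(re.search(doubled, 'die dienen').group(0))      # die die
# With \b after \1 the repetition must be a complete word
print(re.search(r'(\b\w+\b)\s+\1\b', 'die dienen'))   # None
```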

In [None]:
input = 'Franz jagt im komplett verwahrlosten Taxi quer und quer durch Bayern'
mo = re.search(r'(\b\w+\b).+\1',input)
mo

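(Explanation: now `.+` matches at least one (`+`) arbitrary character (`.` matches any character except a newline), so the pattern finds the first word that occurs again anywhere later in the string, not only directly after itself. Because `+` is greedy, the match extends to the last later occurrence of that word.)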
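The effect of greedy `.+` becomes visible when the word occurs three times. This sketch uses a modified example sentence and compares `.+` with its non-greedy variant `.+?`, which stops at the next occurrence instead of the last one:

```python
import re

text = 'Franz jagt im komplett verwahrlosten Taxi quer und quer und quer durch Bayern'

# Greedy .+ consumes as much as possible: the match ends at the LAST later 'quer'
greedy = re.search(r'(\b\w+\b).+\1', text)
print(greedy.group(0))    # quer und quer und quer

# Non-greedy .+? consumes as little as possible: the match ends at the NEXT 'quer'
lazy = re.search(r'(\b\w+\b).+?\1', text)
print(lazy.group(0))      # quer und quer
```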

In [None]:
mo.group(0)

In [None]:
mo.group(1)

In [None]:
mo.start()

In [None]:
mo.span()

In [None]:
input[mo.start():mo.end()]  # slice given by mo.span(); same text as mo.group(0)
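Summarizing the match-object methods used above in one self-contained sketch (same sentence and pattern as in the cells above; the variable name `m` is chosen so as not to overwrite `mo`):

```python
import re

text = 'Franz jagt im komplett verwahrlosten Taxi quer und quer durch Bayern'
m = re.search(r'(\b\w+\b).+\1', text)

start, end = m.span()                 # same as (m.start(), m.end())
print(start, end)                     # 42 55
print(text[start:end] == m.group(0))  # True: the span slices out the match
print(m.groups())                     # ('quer',)  all groups as a tuple
print(m.span(1))                      # (42, 46)  position of group 1 only
```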

<font color=#0000e6> ``re.search``</font> returns only the <font color=#0000ff> **first**</font> occurrence of a search pattern. You can get all occurrences with <font color=#0000e6> ``re.findall``</font> or <font color=#0000e6> ``re.finditer``</font> .
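A short sketch of both functions: `re.findall` returns the matched strings (or, if the pattern contains one group, the text of that group), while `re.finditer` yields match objects, so positions are available as well:

```python
import re

text = 'Franz jagt im komplett verwahrlosten Taxi quer und quer durch Bayern'

# findall: list of all non-overlapping matches as strings
print(re.findall(r'\bqu\w*', text))      # ['quer', 'quer']

# With one group in the pattern, findall returns that group's text
print(re.findall(r'\b(\w)\w*', text))    # first letter of every word

# finditer: an iterator of match objects, one per occurrence
for m in re.finditer(r'\bquer\b', text):
    print(m.group(0), m.span())          # quer (42, 46), then quer (51, 55)
```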
 

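<font color=#0000e6> ``re.compile``</font> translates a search pattern once into a pattern object, which can then be reused. This is useful when the same pattern is applied many times, e.g. when reading and searching a file line by line; since the ``re`` module also caches recently used patterns internally, the speed gain is often modest. The pattern object provides the same methods as the module (``search``, ``sub``, ...):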

In [None]:
input3 = 'Franz jagt im komplett verwahrlosten Taxi quer durch Bayern'
input4 = 'Franz jagt im komplett verwahrlosten Taxi quer quer durch Bayern'
regdoub = re.compile(r'(\b\w+\b)\s+\1')
regdoub

In [None]:
print(regdoub.search(input3))  # None: input3 contains no doubled word
print(regdoub.search(input4))  # match object for 'quer quer'

In [None]:
regdoub.sub(r'\1',input3)

In [None]:
regdoub.sub(r'\1',input4)
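The line-by-line use case mentioned above could look like the following sketch. Here `io.StringIO` stands in for a real text file (an assumption for the sake of a self-contained example); a file opened with `open(...)` is iterated over in exactly the same way:

```python
import io
import re

# Compile the doubled-word pattern once ...
pattern = re.compile(r'(\b\w+\b)\s+\1')

# io.StringIO stands in for a text file; a real file would be opened
# with open(...) (file name up to you) and iterated the same way
lines = io.StringIO(
    'Franz jagt im komplett verwahrlosten Taxi quer durch Bayern\n'
    'Franz jagt im komplett verwahrlosten Taxi quer quer durch Bayern\n'
    'Der Taxibus ist zu spaet\n'
)

# ... and reuse it for every line, collecting (line number, doubled word)
hits = []
for lineno, line in enumerate(lines, start=1):
    m = pattern.search(line)
    if m:
        hits.append((lineno, m.group(1)))
print(hits)    # [(2, 'quer')]
```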

#### fnmatch

The Python module <font color=#0000e6> ``fnmatch``</font> matches <font color=#0000e6> ``strings``</font> against Unix shell-style wildcard patterns, i.e. the patterns used for filenames on the `bash` command line. These patterns are not regular expressions; the following rules apply:

* `*` matches any sequence of characters (including an empty one)
* `?` matches exactly one arbitrary character
* `[seq]` matches one character contained in seq
* `[!seq]` matches one character not contained in seq
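The four rules can be tried out on a hypothetical list of filenames. This sketch uses `fnmatch.fnmatchcase`, which always compares case-sensitively; `fnmatch.fnmatch` first normalizes case with `os.path.normcase`, so on Windows it ignores case:

```python
import fnmatch

# Hypothetical filenames, chosen only to illustrate the rules
names = ['data1.txt', 'data12.txt', 'datax.csv', 'notes.txt']

for pattern in ['*.txt', 'data?.txt', 'data[0-9]*', 'data[!0-9]*']:
    # keep the names that match the shell-style pattern
    matches = [name for name in names if fnmatch.fnmatchcase(name, pattern)]
    print(pattern, matches)
# *.txt        -> ['data1.txt', 'data12.txt', 'notes.txt']
# data?.txt    -> ['data1.txt']
# data[0-9]*   -> ['data1.txt', 'data12.txt']
# data[!0-9]*  -> ['datax.csv']
```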

The following example shows all filenames in the current directory with the extension <font color=#0000e6> ``.txt``</font> :

In [None]:
import fnmatch
import os
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
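Two related helpers from the same module, as a brief sketch: `fnmatch.filter` applies a pattern to a whole list at once, and `fnmatch.translate` converts a shell-style pattern into a regular expression that can be used with `re` (the exact regex text it produces differs between Python versions):

```python
import fnmatch
import os
import re

# fnmatch.filter returns the matching names from a list
txt_files = fnmatch.filter(os.listdir('.'), '*.txt')
print(txt_files)

# fnmatch.translate turns the wildcard pattern into a regular expression
regex = re.compile(fnmatch.translate('*.txt'))
print(regex.pattern)                        # exact form depends on the Python version
print(bool(regex.match('notes.txt')))       # True
print(bool(regex.match('notes.txt.bak')))   # False: the whole name must match
```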