**regular expression**

Python library: **Lib/re.py**

Regular expressions (also called “RegEx”) are a powerful tool and a standardized language to search for patterns in strings.

The built-in Python module "re" provides several functions for performing matches. The most significant are:
- ``match()``:$\qquad \text{Determine if the RE matches at the beginning of the string.}$<br>
- ``search()``:$\qquad \text{Scan through a string, looking for any location where this RE matches.}$<br>
- ``findall()``:$\qquad \text{Find all substrings where the RE matches and return them as a list.}$<br>
- ``finditer()``:$\qquad \text{Find all substrings where the RE matches and return them as an iterator.}$<br>
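
The difference between the four functions is easiest to see when they are applied to the same string. The following is a minimal sketch; the sample sentence and the variable names are chosen for illustration only.

```python
import re

text = "banana bread and banana split"

# match() only succeeds if the pattern matches at position 0
at_start = re.match("banana", text)
not_at_start = re.match("bread", text)   # None, "bread" is not at the start

# search() returns the first match anywhere in the string
first = re.search("bread", text)

# findall() returns all non-overlapping matches as a list of strings
all_strings = re.findall("banana", text)

# finditer() yields one match object per match
all_spans = [m.span() for m in re.finditer("banana", text)]

print(at_start.span())   # (0, 6)
print(not_at_start)      # None
print(first.span())      # (7, 12)
print(all_strings)       # ['banana', 'banana']
print(all_spans)         # [(0, 6), (17, 23)]
```

``findall()`` is convenient when only the matched text is needed, while ``finditer()`` keeps the positions available through the match objects.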

This notebook contains examples for the following kinds of metacharacters:
- [Anchors](#anchors)
- [Quantifiers](#quantifiers)
- [Disjunctions](#disjunctions)
- [Character Classes](#character_classes)

Python documentation:
- Regular Expression Syntax [library "re"](https://docs.python.org/3/library/re.html)
- How-to use re [re-how-to](https://docs.python.org/3/howto/regex.html#regex-howto)




In [None]:
# load resources for all following code cells
import re
#

<a id="anchors">**Anchors mark a position in the string**</a>

- $\text{^}$ : $\quad$ means the beginning of the string
- $\text{\$}$: $\quad$  means the end of the string

In [None]:
# define regular expression
#
# does the sentence "bananas are cheap" begin with "banana"?
p = re.search('^banana', "bananas are cheap")

# the function match() would return the same result
# p = re.match('^banana', "bananas are cheap")
#

#
# show results:
#
if p:
    # print the substring that matches the regular expression
    print(p.group())
    #
    # print the index where the match starts
    print(p.start())
    #
    # print the index just past the last character of the match
    print(p.end())
    #
    # print the start and end position as tuple:
    print(p.span())
else:
    print("no match")

banana
0
6
(0, 6)


In [None]:
# check if the sentence "I have a banana" ends with "banana"
#
p = re.search('banana$', "I have a banana")

# note:
# The match() function only checks if the RE matches at the beginning of the string while search()
# will scan forward through the string for a match. 
# It’s important to keep this distinction in mind. 
# Remember, match() will only report a successful match which will start at 0; 
# if the match wouldn’t start at zero, match() will not report it.

#
# show results:
#
if p:
    # print the substring that matches the regular expression
    print(p.group())
    #
    # print the index where the match starts
    print(p.start())
    #
    # print the index just past the last character of the match
    print(p.end())
    #
    # print the start and end position as tuple:
    print(p.span())
else:
    print("no match")

    
# By default, $ matches at the end of the string, or just before a newline
#   that is the last character of the string.
print('-------------------------------------------------')
#
print(re.search('}$', '{block}'))
print(re.search('}$', '{block} '))
print(re.search('}$', '{block}\n'))
#
print('-------------------------------------------------')
#
print(re.search('^banana$', 'bananas are cheap'))
print(re.search('^banana$', 'I have a banana'))
print(re.search('^banana$', 'banana'))

banana
9
15
(9, 15)
-------------------------------------------------
<_sre.SRE_Match object; span=(6, 7), match='}'>
None
<_sre.SRE_Match object; span=(6, 7), match='}'>
-------------------------------------------------
None
None
<_sre.SRE_Match object; span=(0, 6), match='banana'>
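
The outputs above use the default mode, in which `$` only looks at the end of the whole string. With the `re.MULTILINE` flag, `$` also matches before every newline inside the string. A minimal sketch, assuming a two-line sample string made up for this illustration:

```python
import re

text = "first line}\nsecond line}\n"

# Default mode: $ matches only at the very end of the string,
# or just before a newline that ends the string.
default_spans = [m.span() for m in re.finditer(r"}$", text)]

# re.MULTILINE: $ additionally matches before every newline.
multiline_spans = [m.span() for m in re.finditer(r"}$", text, re.MULTILINE)]

print(default_spans)    # [(23, 24)]
print(multiline_spans)  # [(10, 11), (23, 24)]
```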


<a id="quantifiers"> **Quantifiers indicate the number of repetitions of the preceding character or group** </a>

- $\text{*}$: $\quad$  means zero or more
- $\text{+}$: $\quad$ means one or more
- $\text{?}$: $\quad$ means zero or one repetitions
- If more precise quantifiers are needed, the number of repetitions can be written in curly brackets, for example ``{2,7}``
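
A quantifier applies only to the single preceding character unless parentheses group several characters together. The sketch below illustrates this with curly-bracket quantifiers. It uses `re.fullmatch()`, which is not used elsewhere in this notebook and requires the whole string to match; the variable names are illustrative.

```python
import re

# {2} applies only to the preceding "a": the pattern means b, a, n, a, a
single_ok = re.fullmatch("bana{2}", "banaa")
single_fail = re.fullmatch("bana{2}", "banana")

# Parentheses make the quantifier apply to the whole group "na"
group_ok = re.fullmatch("ba(na){2}", "banana")

# {2,} means at least two repetitions, with no upper limit
open_ok = re.fullmatch("ba(na){2,}", "bananana")
open_fail = re.fullmatch("ba(na){2,}", "bana")

print(single_ok is not None)   # True
print(single_fail)             # None
print(group_ok.group())        # banana
print(open_ok.group())         # bananana
print(open_fail)               # None
```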

In [None]:
print(re.search('ba?nana', 'banana'))
print(re.search('ba?nana', 'bnana'))
print(re.search('ba?nana', 'baaaaanana'))
#
print('-------------------------------------------------')
#
print(re.search('ba+nana', 'banana'))
print(re.search('ba+nana', 'bnana'))
print(re.search('ba+nana', 'baaaaanana'))
#
print('-------------------------------------------------')
#
print(re.search('ba*nana', 'banana'))
print(re.search('ba*nana', 'bnana'))
print(re.search('ba*nana', 'baaaaanana'))
#
print('-------------------------------------------------')
#
# a{2,7} means that the letter "a" must repeat at least 2 and at most 7 times in a row for the pattern to match
#
print(re.search('a{2,7}', 'banana'))
print(re.search('a{2,7}', 'bnana'))
print(re.search('a{2,7}', 'baaaaanana'))

<_sre.SRE_Match object; span=(0, 6), match='banana'>
<_sre.SRE_Match object; span=(0, 5), match='bnana'>
None
-------------------------------------------------
<_sre.SRE_Match object; span=(0, 6), match='banana'>
None
<_sre.SRE_Match object; span=(0, 10), match='baaaaanana'>
-------------------------------------------------
<_sre.SRE_Match object; span=(0, 6), match='banana'>
<_sre.SRE_Match object; span=(0, 5), match='bnana'>
<_sre.SRE_Match object; span=(0, 10), match='baaaaanana'>
-------------------------------------------------
None
None
<_sre.SRE_Match object; span=(1, 6), match='aaaaa'>


<a id=disjunctions>**Disjunctions represent a logical OR.**</a>

$\text{They are written in square brackets [] or separated by the pipe sign |.}$

In [None]:
print(re.search('b[aou]nana', 'banana'))
print(re.search('b[aou]nana', 'bonana'))
print(re.search('b[aou]nana', 'bunana'))
#
print('-------------------------------------------------')
#
print(re.search('b(a|o|u)nana', 'banana'))
print(re.search('b(a|o|u)nana', 'bonana'))
print(re.search('b(a|o|u)nana', 'bunana'))
#
print('-------------------------------------------------')
#
print(re.search('banana|bonana|bunana', 'banana'))
print(re.search('banana|bonana|bunana', 'bonana'))
print(re.search('banana|bonana|bunana', 'bunana'))
#
print('-------------------------------------------------')
#

<_sre.SRE_Match object; span=(0, 6), match='banana'>
<_sre.SRE_Match object; span=(0, 6), match='bonana'>
<_sre.SRE_Match object; span=(0, 6), match='bunana'>
-------------------------------------------------
<_sre.SRE_Match object; span=(0, 6), match='banana'>
<_sre.SRE_Match object; span=(0, 6), match='bonana'>
<_sre.SRE_Match object; span=(0, 6), match='bunana'>
-------------------------------------------------
<_sre.SRE_Match object; span=(0, 6), match='banana'>
<_sre.SRE_Match object; span=(0, 6), match='bonana'>
<_sre.SRE_Match object; span=(0, 6), match='bunana'>
-------------------------------------------------
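
One detail worth keeping in mind: the pipe sign splits the whole pattern into alternatives unless parentheses limit its scope. The following sketch, with variable names chosen for illustration, contrasts the two cases.

```python
import re

# Without parentheses, | separates the complete alternatives "ba" and "onana"
loose_a = re.search("ba|onana", "banana")
loose_o = re.search("ba|onana", "bonana")

# With parentheses, only the grouped part is an alternative
grouped = re.search("b(a|o)nana", "banana")

print(loose_a.group())   # ba
print(loose_o.group())   # onana
print(grouped.group())   # banana
```

For single characters, the square-bracket form `b[ao]nana` avoids this pitfall entirely, because the brackets already delimit the set.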


<a id=character_classes>**Character classes represent certain groups of characters.**</a>

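- $\text{\d or [0-9]}: \quad$ matches digits (for Unicode strings, ``\d`` also matches non-ASCII decimal digits unless ``re.ASCII`` is set)
- $\text{\w or [0-9A-Za-z_]}: \quad$ matches alphanumeric characters and underscores (for Unicode strings, ``\w`` also matches non-ASCII letters and digits unless ``re.ASCII`` is set)
- $\text{\s}: \qquad$ matches whitespace characters (space, tab, newline, ...)
- $\text{.}: \qquad$ matches any character except a newline (any character at all if ``re.DOTALL`` is set)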
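
The classes above can be tried out individually. The following is a minimal sketch; the sample strings and variable names are made up for illustration, and `re.split()` is used here although it is not introduced elsewhere in this notebook.

```python
import re

text = "Order 66: 3 bananas,\t12 apples"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\w+", text)         # runs of word characters
parts = re.split(r"\s+", "a  b\tc\nd")   # split on runs of whitespace

# "." does not match a newline unless re.DOTALL is passed
no_newline = re.search(r"a.b", "a\nb")
with_dotall = re.search(r"a.b", "a\nb", re.DOTALL)

# With re.ASCII, \w is restricted to [a-zA-Z0-9_]
unicode_words = re.findall(r"\w+", "na\u00efve")
ascii_words = re.findall(r"\w+", "na\u00efve", re.ASCII)

print(digits)         # ['66', '3', '12']
print(words)          # ['Order', '66', '3', 'bananas', '12', 'apples']
print(parts)          # ['a', 'b', 'c', 'd']
print(no_newline)     # None
print(with_dotall is not None)  # True
print(unicode_words)  # ['naïve']
print(ascii_words)    # ['na', 've']
```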


In [None]:
# a raw string (r'...') keeps the backslash in \w from being read as a string escape
print(re.search(r'^Hello \w+$', 'Hello World'))
print(re.search('^Hello.+$', 'Hello new world'))
#
print('-------------------------------------------------')

<_sre.SRE_Match object; span=(0, 11), match='Hello World'>
<_sre.SRE_Match object; span=(0, 15), match='Hello new world'>
-------------------------------------------------


Copyright © 2021 IUBH Internationale Hochschule