- Title: Hands on the re module in Python
- Slug: re-in-python
- Date: 2019-10-23
- Category: Programming
- Tags: programming, Python, re, regular expression
- Author: Ben Du

https://docs.python.org/3/library/re.html

1. `re.search` searches for the first match anywhere in the string.

2. `re.match` tries to match only at the beginning of the string.

3. `re.findall` finds all non-overlapping matches in the string and returns them as a list.

4. `re.finditer` finds all non-overlapping matches and returns an iterator of match objects.

5. Passing `re.DOTALL` to the `flags` option makes the dot match any character, including a newline.
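
The differences between these functions can be sketched with one small example. The sample text and variable names below are made up for illustration.

```python
import re

text = "id=12\nid=345"  # hypothetical sample input

# re.search scans the whole string and returns the first match
first = re.search(r"\d+", text)
# re.match only tries at position 0, where the text starts with "id", not a digit
at_start = re.match(r"\d+", text)
# re.findall returns the matched substrings as a list
all_numbers = re.findall(r"\d+", text)
# re.finditer yields match objects, so positions are available too
spans = [m.span() for m in re.finditer(r"\d+", text)]
# Without re.DOTALL the dot stops at the newline; with it the dot crosses lines
one_line = re.search(r"id.*", text).group()
both_lines = re.search(r"id.*", text, flags=re.DOTALL).group()
```

Here `at_start` is `None`, while `first` is a match object for `'12'`.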

## Tips

If a pattern is used many times,
it is suggested that you compile it once using `re.compile` and reuse the compiled object to improve performance.

Groups, i.e., the substrings matched by the parenthesized parts of a pattern, can be accessed with the `group` or `groups` method of a match object.
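
As a minimal sketch of both tips, the snippet below compiles a date pattern once, reuses it over several strings, and reads the captured groups. The pattern name and the sample lines are invented for illustration.

```python
import re

# Compile once, then reuse the pattern object for every line
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

lines = ["Released on 2018-05-01", "no date here", "Updated 2019-10-23"]  # made-up data
found = []
for line in lines:
    m = date_pattern.search(line)
    if m is not None:
        # group(0) is the whole match, group(1) the first group, groups() all groups
        found.append((m.group(0), m.group(1), m.groups()))
```

`found` ends up with one tuple per line that contains a date; the line without a date is skipped because `search` returns `None` for it.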

In [1]:
import re

## re.compile

The compiled object is of type `re.Pattern` (displayed as `_sre.SRE_Pattern` in Python 3.6 and earlier)
and has methods such as `search`, `match`, and `sub`.

In [9]:
p = re.compile(r'\d{4}-\d{2}-\d{2}$')

In [11]:
type(p)

re.Pattern

In [7]:
help(p.sub)

Help on built-in function sub:

sub(repl, string, count=0) method of re.Pattern instance
    Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.



In [10]:
p.sub('YYYY-mm-dd', 'Today is 2018-05-01')

'Today is YYYY-mm-dd'

## re.sub

In [4]:
re.sub(r"\s", "", "a b\tc")

'abc'

In [1]:
import re

s = '''this is 
/* BEGIN{NIMA}
what 
ever
END{NIMA} */
an example
'''

In [6]:
# (?s) is the inline form of re.DOTALL, so ".*" can span several lines
re.sub(r'(?s)/\* BEGIN{NIMA}.*END{NIMA} \*/', '', s)

'this is \n\nan example\n'
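
The pattern above uses a greedy `.*`, so if the text held several such blocks, everything from the first `BEGIN{NIMA}` to the last `END{NIMA}` would be removed. A sketch of the difference with a non-greedy `.*?`, using a made-up input with two blocks:

```python
import re

# Hypothetical input with two marked blocks and text to keep between them
text = "a /* BEGIN{NIMA} x END{NIMA} */ keep /* BEGIN{NIMA}\ny END{NIMA} */ b"

pattern_greedy = r"/\* BEGIN{NIMA}.*END{NIMA} \*/"
pattern_lazy = r"/\* BEGIN{NIMA}.*?END{NIMA} \*/"

# Greedy: one match running from the first BEGIN to the last END
greedy = re.sub(pattern_greedy, "", text, flags=re.DOTALL)
# Non-greedy: each block is matched and removed separately
lazy = re.sub(pattern_lazy, "", text, flags=re.DOTALL)
```

With the greedy pattern the word `keep` is lost; with the non-greedy one it survives.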

## re.match

In [6]:
import re

re.match(r'^\d{4}-\d{2}-\d{2}$', '2018-07-01')

<re.Match object; span=(0, 10), match='2018-07-01'>

In [10]:
import re

# No output: re.match returns None because the date is not at the start of the string.
re.match(r'\d{4}-\d{2}-\d{2}', 'Today is 2018-07-01.')
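
Since `re.match` returns `None` when the pattern does not match at the start of the string, the result usually needs a check before it is used. A small sketch of that check follows; `re.fullmatch` is shown as an alternative to anchoring with `^` and `$`, and the helper name `is_iso_date` is hypothetical.

```python
import re

def is_iso_date(text):
    """Return True if the whole string has the form YYYY-mm-dd (format only, no calendar check)."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) is not None

m = re.match(r"\d{4}-\d{2}-\d{2}", "Today is 2018-07-01.")
# Guard against None before calling methods on the match object
result = "no match at the start" if m is None else m.group()
```

One difference worth noting: `$` also matches just before a trailing newline, whereas `re.fullmatch` requires the entire string to match.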

## re.search

In [1]:
import re

re.search(r'^\d{4}-\d{2}-\d{2}$', '2018-07-01')

<re.Match object; span=(0, 10), match='2018-07-01'>

In [9]:
import re

re.search(r'\d{4}-\d{2}-\d{2}', 'Today is 2018-07-01.')

<re.Match object; span=(9, 19), match='2018-07-01'>

In [2]:
s = "ab,cd"
re.search(',', s)

<re.Match object; span=(2, 3), match=','>

In [3]:
s = "ab,cd"
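# A raw string is required: in a normal string literal '\b' is a backspace, not a word boundary.
re.search(r'\b,', s)

<re.Match object; span=(2, 3), match=','>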

In [4]:
s = "ab,cd"
# No output: there is a word boundary between "b" and ",", so \B fails.
re.search(r'\B,', s)

In [5]:
s = "ab ,cd"
re.search(',', s)

<re.Match object; span=(3, 4), match=','>

In [6]:
s = "ab ,cd"
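# No output: a space and a comma are both non-word characters, so there is no word boundary between them.
re.search(r'\b,', s)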

In [7]:
s = "ab ,cd"
re.search(r'\B,', s)

<re.Match object; span=(3, 4), match=','>
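
The `\b` examples depend on how the pattern string is written. The sketch below, with illustrative variable names, compares a normal string literal and a raw string literal for the same pattern.

```python
import re

backspace = "\b"    # normal literal: one character, the backspace U+0008
boundary = r"\b"    # raw literal: two characters, a backslash and "b"

s = "ab,cd"
# Looks for a backspace character followed by a comma
plain = re.search(backspace + ",", s)
# Looks for a comma that directly follows a word boundary
raw = re.search(boundary + ",", s)
```

`plain` is `None`, while `raw` matches the comma at span `(2, 3)`. Writing patterns as raw strings avoids this kind of silent mismatch.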

In [8]:
s = "ab,cd"
re.sub(', *', ', ', s)

'ab, cd'

## re.findall

Find all matching strings.

In [5]:
import re

s = 'It is "a" good "day" today.'
re.findall('".*?"', s)

['"a"', '"day"']

In [1]:
import re

sql = '''
    select ${cal_dt}, ${path} from some_table
    '''
re.findall(r'\$\{\w+\}', sql)

['${cal_dt}', '${path}']
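
When the pattern contains a capturing group, `re.findall` returns the group contents instead of the whole match. The sketch below extracts only the variable names from the same kind of SQL template; the `values` mapping and the substitution step are assumptions added for illustration.

```python
import re

sql = "select ${cal_dt}, ${path} from some_table"

# One capturing group: findall returns what the group matched
names = re.findall(r"\$\{(\w+)\}", sql)

# Hypothetical values to fill into the template
values = {"cal_dt": "'2019-10-23'", "path": "'/tmp/data'"}
# A function used as the replacement receives each match object
rendered = re.sub(r"\$\{(\w+)\}", lambda m: values[m.group(1)], sql)
```

`names` is `['cal_dt', 'path']`, and `rendered` is the SQL text with both placeholders replaced.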

## Escape & Non-escape

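`{` and `}` do not need to be escaped when they cannot be read as a quantifier such as `{3}` or `{2,5}`,
e.g., in `BEGIN{NIMA}`.
Escape them as `\{` and `\}` when a literal brace pair would otherwise look like a quantifier.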
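
A short sketch of when braces are literal and when they act as a quantifier. The sample strings are made up for illustration.

```python
import re

# Braces that do not form a quantifier are treated as literal characters
literal = re.search(r"BEGIN{NIMA}", "/* BEGIN{NIMA}").group()

# "{2}" is a quantifier here: the pattern means two "a" characters in a row
quantified = re.search(r"a{2}", "a{2} aa").group()

# Escaping by hand or with re.escape makes the braces literal again
escaped = re.search(r"a\{2\}", "a{2} aa").group()
auto = re.search(re.escape("a{2}"), "a{2} aa").group()
```

`quantified` is `'aa'`, while `escaped` and `auto` are both the literal text `'a{2}'`.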