# Regular expressions

## Overview/cheat sheet

See [https://github.com/zeeshanu/learn-regex](https://github.com/zeeshanu/learn-regex)

## Websites for testing Regex

Websites to write and debug regular expressions: [regexr.com](https://regexr.com) and [regex101.com](https://regex101.com)

## Main functions

Python's `re` module has three main functions: `match`, `search` and `findall` (which returns all matches).

The difference between `match` and `search` is that `match` only matches at the beginning of the string, while `search` finds a match anywhere in it.

In [3]:
import re
print(re.match("c", "abcdef"))   # None: the string does not start with "c"
print(re.search("c", "abcdef"))  # matches the "c" at index 2

None
<re.Match object; span=(2, 3), match='c'>

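Both functions return `None` when nothing matches and a match object otherwise, so the result is usually checked before it is used. A minimal sketch of reading a match object follows; the variable names are illustrative only.

```python
import re

m = re.search("c", "abcdef")

# A failed search returns None, so test the result before using it.
if m is not None:
    matched_text = m.group()  # the text that matched
    start, end = m.span()     # start and end offsets of the match
    print(matched_text, start, end)

# match() only succeeds at the start of the string, so this is None.
no_match = re.match("c", "abcdef")
print(no_match)
```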

In [2]:
text = "He was carefully disguised but captured quickly by police."
re.findall(r"\w+ly", text)

['carefully', 'quickly']

### Compile

```python
re.compile(pattern, flags = 0)
```
Compile a regular expression pattern into a regular expression object, which can be used for matching through its `match()`, `search()` and other methods.

The sequence:
```python
prog = re.compile(pattern)
result = prog.match(string)
```

is equivalent to

```python
result = re.match(pattern, string)
```

Using `re.compile()` and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.

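As a small illustration of that reuse, the sketch below compiles one pattern and applies it to several strings. The pattern and the sample strings are made up for this example.

```python
import re

# Compile once, reuse many times (pattern and data are illustrative).
room_pattern = re.compile(r"\d{3}")

lines = ["Don 337", "Mike 325", "no room here"]

rooms = []
for line in lines:
    m = room_pattern.search(line)
    if m:  # search() returns None when the line has no three-digit number
        rooms.append(m.group())

print(rooms)  # ['337', '325']
```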
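Compiling the pattern is also convenient while experimenting, because the same pattern object can be tried with several methods (`match()`, `search()`, `findall()`) without retyping the pattern.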

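The `findall()` function returns all non-overlapping matches of the pattern in the string as a list of strings. If the pattern contains one capturing group, the list holds the text of that group; with more than one group, it holds tuples with one item per group.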

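The return shape of `findall()` depends on how many capturing groups the pattern has. The sketch below runs three variants of the same pattern on an illustrative string to show the difference.

```python
import re

text = "Don 337 and Mike 325"

# No group: a list of whole matches.
no_group = re.findall(r"\w+\s\d{3}", text)

# One group: a list holding only the text captured by that group.
one_group = re.findall(r"\w+\s(\d{3})", text)

# Two groups: a list of tuples, one item per group.
two_groups = re.findall(r"(\w+)\s(\d{3})", text)

print(no_group)    # ['Don 337', 'Mike 325']
print(one_group)   # ['337', '325']
print(two_groups)  # [('Don', '337'), ('Mike', '325')]
```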
Check the [Module Contents](https://docs.python.org/3/library/re.html#module-contents) page for the various methods available. 

### Flags

See [http://xahlee.info/python/python_regex_flags.html](http://xahlee.info/python/python_regex_flags.html) for the different flags you can use.

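As a quick sketch of how flags are passed, the example below uses two common ones, `re.IGNORECASE` and `re.MULTILINE`, on a made-up two-line string. It is only an illustration and does not cover the full list of flags.

```python
import re

text = "Regex is fun.\nregex is powerful."

# re.IGNORECASE (re.I): letters match regardless of case.
case_insensitive = re.findall(r"regex", text, flags=re.IGNORECASE)

# Without flags, ^ only matches at the very start of the string.
first_only = re.findall(r"^\w+", text)

# re.MULTILINE (re.M): ^ also matches at the start of each line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Flags are combined with the bitwise OR operator.
both = re.findall(r"^REGEX", text, flags=re.IGNORECASE | re.MULTILINE)

print(case_insensitive)  # ['Regex', 'regex']
print(first_only)        # ['Regex']
print(line_starts)       # ['Regex', 'regex']
print(both)              # ['Regex', 'regex']
```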
## Raw string

In [6]:
print(r"As raw string \b shows up correctly with \ and b")
print("Without r: As raw string \b shows up correctly with \\ and b")  # not correct: \b is printed as a backspace character

As raw string \b shows up correctly with \ and b
Without r: As raw string  shows up correctly with \ and b

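The same difference matters for patterns: without the `r` prefix, Python turns `\b` into a backspace character before the regex engine ever sees it. The sketch below, using an illustrative string, compares the two spellings.

```python
import re

# "\b" is one character (backspace); r"\b" is two: a backslash and a 'b'.
print(len("\b"), len(r"\b"))  # 1 2

text = "cat concat"

# Raw string: the regex engine receives \b and treats it as a word boundary.
with_raw = re.findall(r"\bcat\b", text)

# Plain string: the pattern contains literal backspace characters, so nothing matches.
without_raw = re.findall("\bcat\b", text)

print(with_raw)     # ['cat']
print(without_raw)  # []
```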

In [14]:
# in regular expressions, \b matches at a word boundary
text = "He was carefully disguised but captured quickly by police."
# \b   word boundary
# .+?  one or more of any character (except newline); the ? makes it non-greedy (stop as soon as possible)
# \b   another word boundary
re.findall(r"\b.+?\b", text)

['He',
 ' ',
 'was',
 ' ',
 'carefully',
 ' ',
 'disguised',
 ' ',
 'but',
 ' ',
 'captured',
 ' ',
 'quickly',
 ' ',
 'by',
 ' ',
 'police']

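The result above also contains the spaces between the words, because the gap between two words has a word boundary on each side as well. If only the words are wanted, one option (an alternative pattern, not the one used above) is to match runs of word characters:

```python
import re

text = "He was carefully disguised but captured quickly by police."

# \w+ matches one or more word characters, so the spaces and the final period are skipped.
words = re.findall(r"\b\w+\b", text)
print(words)
```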
## Capturing groups (retrieving something)

In [18]:
result = re.findall(r"((\w+)\s(\d{3}))", "blahh ... office of Don 337 and Mike 325 ... blah")
print(result)

[('Don 337', 'Don', '337'), ('Mike 325', 'Mike', '325')]


In [20]:
for r in result:
    print("The person name is {} and the room number is {}".format(r[1], r[2]))

The person name is Don and the room number is 337
The person name is Mike and the room number is 325


In [21]:
# removed 'outer' group
result = re.findall(r"(\w+)\s(\d{3})", "blahh ... office of Don 337 and Mike 325 ... blah")
print(result)

[('Don', '337'), ('Mike', '325')]

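Groups can also be read from match objects instead of `findall()` tuples. The sketch below uses `re.finditer()` on the same string; the `people` list is just an illustrative way to collect the results.

```python
import re

text = "blahh ... office of Don 337 and Mike 325 ... blah"

people = []
for m in re.finditer(r"(\w+)\s(\d{3})", text):
    # group(0) is the whole match; group(1) and group(2) are the capturing groups.
    people.append((m.group(0), m.group(1), m.group(2)))

print(people)  # [('Don 337', 'Don', '337'), ('Mike 325', 'Mike', '325')]
```

With this approach the whole match stays available through `group(0)`, so the extra outer group from the first example is not needed.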