* https://regex101.com/ -> tool
* https://regexone.com/ -> intro website
* https://github.com/tesla809/intro-to-python-jupyter-notebooks/blob/master/47-Regular%20Expressions.ipynb
* https://docs.python.org/3/library/re.html

* The `r` prefix before a string literal marks it as a 'raw string', so Python does not treat the backslash (`\`) as an escape character
* **A regular expression (or RE) specifies a set of strings that matches it**
* If a pattern will be reused, compile it once with `re.compile`
* Regular expressions can be concatenated to form new regular expressions
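
The points above can be sketched in a few lines. This is only an illustration: the names `hours`, `minutes` and `time_re` are made up for the example and do not come from the linked notebook.

```python
import re

# In a raw string the backslash is kept literally: two characters, \ and d.
print(len(r'\d'))

# Concatenate two smaller patterns into a new one (illustrative names).
hours = r'\d{2}'
minutes = r'\d{2}'
time_re = re.compile(hours + ':' + minutes)  # compile once, reuse many times

found = time_re.search('meet at 09:45')
missing = time_re.search('no time here')
print(found.group())
print(missing)
```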

In [1]:
import re

## Simple RE
A very simple RE: check whether a string contains 'last'.

In [2]:
match = re.search(pattern='last', string='also lastly')
print(match)
print(match.span())
print(match.group())

<_sre.SRE_Match object; span=(5, 9), match='last'>
(5, 9)
last


The Match object returned by `search()` is more than a Boolean: it carries information about the match, including the original input string, the regular expression that was used, and the location of the match. If nothing matches, `re.search` returns `None` instead.

In [3]:
match = re.search(pattern='last', string='not here')
print(match)

None
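
Since a failed search gives `None`, it is usually safer to test the result before calling methods on it. A minimal sketch, using only standard attributes of the match object:

```python
import re

match = re.search('last', 'also lastly')
if match is not None:                   # search() returns None when nothing matches
    print(match.string)                 # the original input string
    print(match.re.pattern)             # the pattern that was used
    print(match.start(), match.end())   # where the match begins and ends
else:
    print('no match')
```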


## Special characters
Some characters, like '|' or '(', are special. Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted.

#### ^
(Caret.) Matches the start of the string. A regular expression beginning with '^' can be used with `search()` to restrict the match to the beginning of the string.
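
A small sketch of the caret with `search()`; the variable names are illustrative:

```python
import re

at_start = re.search(r'^last', 'lastly')        # 'last' begins the string
elsewhere = re.search(r'^last', 'also lastly')  # 'last' only occurs later

print(at_start.group())
print(elsewhere)
```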

#### $
Matches the end of the string or just before the newline at the end of the string
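
As a quick illustration (variable names are made up for the example), `$` accepts both the true end of the string and the position just before a trailing newline:

```python
import re

at_end = re.search(r'ly$', 'also lastly')
before_newline = re.search(r'ly$', 'also lastly\n')  # newline at the very end
not_at_end = re.search(r'ly$', 'lastly or not')      # 'ly' is followed by more text

print(at_end.span())
print(before_newline.span())
print(not_at_end)
```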

#### *
Causes the resulting RE to match 0 or more repetitions of the preceding RE
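
For instance, `ab*` reads as 'a' followed by zero or more 'b's. A short sketch:

```python
import re

# Collect what 'ab*' matches in each sample string.
results = [re.search(r'ab*', text).group() for text in ['a', 'ab', 'abbb']]
print(results)
```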

#### ?
Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either ‘a’ or ‘ab’.

#### |
A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way.
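
A minimal sketch of alternation with three branches; the sample words are arbitrary:

```python
import re

pattern = r'cat|dog|bird'
hit = re.search(pattern, 'I have a dog')
miss = re.search(pattern, 'I have a fish')

print(hit.group())
print(miss)
```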

#### (...)
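Matches whatever regular expression is inside the parentheses and marks the start and end of a group. The contents of a group can be retrieved after a match, e.g. with `match.group(1)`.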
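
A short sketch of reading groups back from a match; the input string is made up for the example:

```python
import re

match = re.search(r'(\d{4})-(\d{2})', 'due 2018-12')
print(match.group())    # the whole match
print(match.group(1))   # contents of the first pair of parentheses
print(match.groups())   # all groups as a tuple
```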

#### {m}
Specifies that exactly m copies of the previous RE should be matched
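
For example, `a{3}` needs three consecutive 'a's. A minimal sketch:

```python
import re

three = re.search(r'a{3}', 'caaat')
too_few = re.search(r'a{3}', 'caat')  # only two 'a's in a row

print(three.group())
print(too_few)
```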

#### []
Used to indicate a set of characters. E.g. `[0-9]` is the set of digits from 0 to 9
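
A small sketch of character sets, including a complemented set (`[^...]`), which matches any character not listed. The sample strings are arbitrary:

```python
import re

digit = re.search(r'[0-9]', 'room 42')       # first character in the set
vowel = re.search(r'[aeiou]', 'rhythm')      # no character from the set is present
non_digit = re.search(r'[^0-9]', '42nd')     # first character NOT in the set

print(digit.group())
print(vowel)
print(non_digit.group())
```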

#### +
Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not match just ‘a’.
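
The same `ab+` case as a runnable sketch:

```python
import re

several = re.search(r'ab+', 'abbb')
just_a = re.search(r'ab+', 'a')  # no 'b' at all, so no match

print(several.group())
print(just_a)
```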

In [4]:
match = re.match(r'^[0-9][0-9]:[0-9][0-9]$', '12:30')
print(match)
match.group()

<_sre.SRE_Match object; span=(0, 5), match='12:30'>


'12:30'

## Special escape codes
See -> https://render.githubusercontent.com/view/ipynb?commit=a1fd9ad64fc2d31643d33a7fb0ce330f7a1a54bb&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f7465736c613830392f696e74726f2d746f2d707974686f6e2d6a7570797465722d6e6f7465626f6f6b732f613166643961643634666332643331363433643333613766623063653333306637613161353462622f34372d526567756c617225323045787072657373696f6e732e6970796e62&nwo=tesla809%2Fintro-to-python-jupyter-notebooks&path=47-Regular+Expressions.ipynb&repository_id=111425841&repository_type=Repository#Escape-Codes

Escapes are indicated by prefixing the character with a backslash, e.g. `\d` matches any digit
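
A brief sketch of three common escape codes. It uses `re.findall`, which is part of the standard `re` module but is not otherwise covered in these notes, to list every match at once:

```python
import re

text = 'Room 42, floor 3'
digits = re.findall(r'\d', text)    # \d: any digit
words = re.findall(r'\w+', text)    # \w: letters, digits and underscore
spaces = re.findall(r'\s', text)    # \s: any whitespace character

print(digits)
print(words)
print(len(spaces))
```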

In [5]:
s = "I have a meeting on 2018-12-10 in New York"
match = re.search(r'\d{4}-\d{2}-\d{2}', s)
match

<_sre.SRE_Match object; span=(20, 30), match='2018-12-10'>

## re.search(pattern, string, flags=0)
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. 
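
To illustrate "first location": with several candidates in the string, `search` reports only the leftmost one. The sketch also contrasts it with `re.match`, which only tries at the start of the string. The input text is made up for the example.

```python
import re

text = 'id 12 and id 34'
first = re.search(r'\d+', text)        # scans forward, stops at the first hit
at_position_zero = re.match(r'\d+', text)  # the string starts with 'id', not a digit

print(first.group(), first.span())
print(at_position_zero)
```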

In [6]:
re.search(r'ab?', 'a')

<_sre.SRE_Match object; span=(0, 1), match='a'>

In [7]:
re.search(r'ab?', 'ab')

<_sre.SRE_Match object; span=(0, 2), match='ab'>

Search for two digits

In [8]:
re.search(r'\d{2}', '59')

<_sre.SRE_Match object; span=(0, 2), match='59'>

Get first digit

In [9]:
re.search('^[0-9]', '59')

<_sre.SRE_Match object; span=(0, 1), match='5'>

Get last digit

In [10]:
re.search('[0-9]$', '59')

<_sre.SRE_Match object; span=(1, 2), match='9'>

In [11]:
string = '/home/ubuntu/.fastai/data/oxford-iiit-pet/images/saint_bernard_188.jpg'

match = re.search(r'/([^/]+)_\d+\.jpg$', string)  # group 1 captures the name before the trailing _<digits>.jpg
print(match)
match.group()

<_sre.SRE_Match object; span=(48, 70), match='/saint_bernard_188.jpg'>


'/saint_bernard_188.jpg'

## re.split(pattern, string, maxsplit=0, flags=0)
Split string by the occurrences of pattern. 
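
A minimal sketch of `re.split`, assuming a made-up input where fields are separated by a comma or semicolon plus optional whitespace. The second call shows `maxsplit` limiting the number of splits.

```python
import re

line = 'a, b;c,  d'
parts = re.split(r'[,;]\s*', line)
first_only = re.split(r'[,;]\s*', line, maxsplit=1)  # split at most once

print(parts)
print(first_only)
```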

In [14]:
re.search(r'\d{1,2}.?\d{0,2}(AM|PM)', '4.10PM')

<_sre.SRE_Match object; span=(0, 6), match='4.10PM'>

In [15]:
re.search(r'\d{1,2}.?\d{0,2}(AM|PM)', '15')