# RegEx
- A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

- RegEx can be used to check if a string contains the specified search pattern.

## Functions
* The `re` module offers a set of functions that allow us to search a string for a match:

* `findall()`
    Returns a list containing all matches
* `search()`
    Returns a Match object if there is a match anywhere in the string
* `split()`
    Returns a list where the string has been split at each match
* `sub()`
    Replaces one or many matches with a string
* `subn()`
    Similar to `sub()`, except that it returns a tuple of two items: the new string and the number of substitutions made

## findall()

In [14]:
import re 


txt = "I love I jerusalem"

x = re.findall("I", txt)
x

['I', 'I']

## search()

In [10]:


txt = "I love jerusalem"

x = re.search("l", txt)

print('span:',x.span())
print('start:',x.start())
print('end:',x.end())
print('string:',x.string)
print('group:',x.group())
print(x)

span: (2, 3)
start: 2
end: 3
string: I love jerusalem
group: l
<re.Match object; span=(2, 3), match='l'>
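
`search()` returns `None` when the pattern is not found, so calling `.span()` on the result directly would raise an `AttributeError`. The sketch below shows one way to guard against that; the helper name `first_span` is hypothetical and only used for illustration.

```python
import re

def first_span(pattern, text):
    """Return the (start, end) span of the first match, or None if there is none."""
    m = re.search(pattern, text)
    # re.search returns None when nothing matches, so check before using the result
    return m.span() if m is not None else None

found = first_span("l", "I love jerusalem")    # same search as the cell above
missing = first_span("z", "I love jerusalem")  # "z" does not occur in the text
print(found, missing)
```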


## split()

In [19]:
txt = "In god we trust"
x = re.split("I", txt, maxsplit=1)
x

['', 'n god we trust']
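
As a small illustrative sketch (not part of the original cells), splitting the same text on whitespace shows the effect of the split limit more clearly. The variable names are arbitrary.

```python
import re

txt = "In god we trust"

# With no limit, the string is split at every whitespace character
all_parts = re.split(r"\s", txt)

# maxsplit=2 stops after two splits; the remainder stays in the last item
limited = re.split(r"\s", txt, maxsplit=2)

print(all_parts)
print(limited)
```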

## sub()

In [20]:
txt = 'Make america great again'

x = re.sub('a', '9', txt, count=2)
x

'M9ke 9merica great again'

## subn()

In [139]:
txt = 'Make america great again'

x = re.subn(r'\s', '9', txt, count=2)
x

('Make9america9great again', 2)
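
When no limit is given, both functions replace every match. The following sketch reuses the text from the cells above to show that, along with the count that `subn()` reports; the variable names are only for illustration.

```python
import re

txt = 'Make america great again'

# With no count, sub() replaces every match
all_replaced = re.sub('a', '9', txt)

# subn() returns the new string together with the number of replacements
new_txt, n = re.subn('a', '9', txt)

print(all_replaced)
print(n)
```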

# Metacharacters
* Metacharacters are characters with a special meaning:

- `.` Any character (except newline character)
- `^` Starts with
- `$` Ends with
- `*` Zero or more occurrences
- `+` One or more occurrences
- `?` Zero or one occurrence
- `{}` Exactly the specified number of occurrences
- `|` Either or

# Examples

In [140]:
txt = "hello world"

x = re.findall("world$", txt)
x

['world']

In [141]:
txt = "hello world"

x = re.findall("^world", txt)
x

[]

In [25]:
txt = "hello world"

x = re.findall("he.{5}o", txt)
x

['hello wo']

In [26]:
txt = "hello world"

x = re.findall("he.*o", txt)
x

['hello wo']

In [28]:
txt = "hello world"

x = re.findall("he.+o", txt)
x

['hello wo']

In [33]:
txt = "hello world"

x = re.findall("he..?o", txt)
x

['hello']
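
The cells above do not exercise `|` or an exact repeat count on a single character, so here is a short sketch on the same text. The variable names are arbitrary and the patterns are only illustrations.

```python
import re

txt = "hello world"

# | matches either alternative; findall returns matches in order of appearance
either = re.findall("world|hello", txt)

# {2} requires exactly two consecutive occurrences of the preceding character
double_l = re.findall("l{2}", txt)

# ? makes the preceding character optional (zero or one occurrence)
optional = re.findall("wo?r", txt)

print(either, double_l, optional)
```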

# Special Sequences
- A special sequence is a `\` followed by one of the characters in the list below, and has a special meaning:

- `\A` Matches only at the `beginning` of the string
- `\Z` Matches only at the `end` of the string
- `\b` Matches at a word boundary, that is, at the `start` or `end` of a word
- `\B` Matches where there is `no` word boundary (not at the start or end of a word)
- `\d` Matches any `digit`
- `\D` Matches any character that is `not` a `digit`
- `\s` Matches any `white space` character
- `\S` Matches any character that is `not` a `white space`
- `\w` Matches any word character (letters, digits and _)
- `\W` Matches any character that is `not` a word character


In [4]:
txt = "I like to play soccer"

x = re.findall(r"soccer\Z", txt)
x

['soccer']

In [3]:
txt = "I like to play so ccer"

x = re.findall(r"lay\b", txt)
x

['lay']

In [63]:
txt = "I like to play soccer 2 22"

x = re.findall(r"\d+", txt)
x

['2', '22']
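
The remaining sequences from the list can be tried the same way. This is a minimal sketch on the text used above; the variable names are illustrative only.

```python
import re

txt = "I like to play soccer 2 22"

# \A anchors the pattern at the very beginning of the string
starts = re.findall(r"\AI", txt)

# \B matches only where there is no word boundary, so "la" inside "play" qualifies
inside = re.findall(r"\Bla\B", txt)

# \w+ collects runs of word characters; \s matches single whitespace characters
words = re.findall(r"\w+", txt)
spaces = re.findall(r"\s", txt)

print(starts, inside, words, len(spaces))
```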

# Sets
* A set is a group of characters inside a pair of square brackets `[]`, with a special meaning:

- `[a-z]` Returns a match for any lower case character, alphabetically between a and z
- `[^b-m]` Returns a match for any character `EXCEPT` characters between b and m
- `[0-9]` Returns a match for any digit between 0 and 9
- `[0-2][0-8]` Returns a match for any two-digit sequence whose first digit is 0 to 2 and second digit is 0 to 8 (00 to 28, but not 09 or 19)
- `[a-zA-Z]` Returns a match for any character alphabetically between a and z, lower case OR upper case

### Examples

In [67]:
txt = "Today i got up at 07:23 "
x = re.findall("[apk]", txt)
x

['a', 'p', 'a']

In [69]:
txt = "Today i got up at 07:23 "
x = re.findall("[^apk]", txt)
x

['T',
 'o',
 'd',
 'y',
 ' ',
 'i',
 ' ',
 'g',
 'o',
 't',
 ' ',
 'u',
 ' ',
 't',
 ' ',
 '0',
 '7',
 ':',
 '2',
 '3',
 ' ']

In [86]:
txt = "Today i got up at 07:23 "
x = re.findall("[0-2][0-8]", txt)
x

['07', '23']

In [91]:
txt = "Today i got up at 0 7:  23 "
x = re.findall("[0-28]", txt)
x

['0', '2']

In [76]:
txt = "Today I got up at 07:23 "
x = re.findall("[a-zA-Z]", txt)
x

['T', 'o', 'd', 'a', 'y', 'I', 'g', 'o', 't', 'u', 'p', 'a', 't']
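
Sets become more useful when combined with literals and quantifiers. The sketch below, on the same text, pulls out the time and the whole words. The time pattern is an assumption for illustration only: it checks the shape of the digits and would also accept values such as `29:59`.

```python
import re

txt = "Today I got up at 07:23 "

# Two digits, a literal colon, two digits: hour tens 0-2, minute tens 0-5
times = re.findall("[0-2][0-9]:[0-5][0-9]", txt)

# A + after the set groups consecutive letters into whole words
words = re.findall("[a-zA-Z]+", txt)

print(times)
print(words)
```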

# `r` prefix 
* When `r` or `R` is placed before a string literal, it means a raw string: backslashes are kept as written instead of being treated as escape characters.
* For example, `'\n'` is a new line whereas `r'\n'` means two characters: a backslash `\` followed by `n`.
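
A quick way to see the difference is to compare string lengths, and then to try `\b` with and without the prefix. This is a small sketch; the sample text and variable names are chosen only for illustration.

```python
import re

# '\n' is one character (a newline); r'\n' is two characters (backslash + n)
plain_len = len('\n')
raw_len = len(r'\n')

# Without the r prefix, Python turns '\b' into a backspace character before
# the regex engine sees it, so the word-boundary meaning is lost.
txt = "I like to play soccer"
raw_hits = re.findall(r"\bplay\b", txt)
plain_hits = re.findall("\bplay\b", txt)

print(plain_len, raw_len)
print(raw_hits, plain_hits)
```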

  

In [94]:

txt = ' and are escape sequences.'

result = re.findall(r'[\s\r]', txt) 
result


[' ', ' ', ' ', ' ']

# re.error()
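* `re.error(msg, pattern=None, pos=None)` is the exception raised when a string passed to one of the `re` functions is not a valid regular expression, for example when it fails to compile. Its `msg` attribute holds the error message.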

In [113]:
try:
    re.match('[+*','template matching')
except re.error as err:
    print(f'Invalid Regular Expression : {err.msg}. Match operation failed.')

Invalid Regular Expression : unterminated character set. Match operation failed.
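
Besides `msg`, the exception also carries `pattern` and `pos` attributes. The sketch below uses them to check a pattern before it is applied; the helper name `check_pattern` is hypothetical and not part of the `re` module.

```python
import re

def check_pattern(pattern):
    """Return None if the pattern compiles, else a short description of the problem."""
    try:
        re.compile(pattern)
    except re.error as err:
        # err.msg is the unformatted message; err.pos is the index where compilation failed
        return f"{err.msg} (position {err.pos})"
    return None

ok = check_pattern("[a-z]+")  # valid pattern
bad = check_pattern("[+*")    # same broken pattern as in the cell above
print(ok)
print(bad)
```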
