## Regular Expressions:

Regular expressions describe text patterns and are commonly used to search, split, and replace strings based on those patterns. Python ships with a built-in module for them named `re`.

In [1]:
#Importing the regular expression module
import re

### 1. Match function

In [2]:
text1 = 'john williams'
pattern = '[Jj]ohn' #square brackets define a character set, so both upper and lower case j are accepted
print('looking in', text1, 'for the pattern',pattern)

output = re.match(pattern,text1) #re.match only looks for the pattern at the beginning of the string
if output: #a match object always has a boolean value of True
    print('Match has been found')
else: #re.match returns None when the pattern is absent, which the if-else block treats as False
    print('No match has been found')

looking in john williams for the pattern [Jj]ohn
Match has been found


#### Methods and attributes supported by a match object

In [3]:
print(type(output))
print(output.re)
print(output.string)
print(output.start())
print(output.end())
print(output.group())

<class 're.Match'>
re.compile('[Jj]ohn')
john williams
0
4
john
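
When the pattern contains capturing groups, a match object also exposes the captured pieces. The following is a minimal sketch rather than part of the original notebook; the pattern `name_pattern` and the variable `m` are made up for illustration.

```python
import re

# Hypothetical pattern with two capturing groups: first name and last name
name_pattern = r'([Jj]ohn) (\w+)'
m = re.match(name_pattern, 'john williams')

print(m.span())     # (start, end) of the whole match
print(m.group(0))   # the whole match, same as m.group()
print(m.group(1))   # text captured by the first group
print(m.groups())   # tuple with all captured groups
```

`group(0)` always refers to the whole match, while numbering of the captured groups starts at 1.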


### 2. Search function

- Works like the match function, but scans the whole string instead of only its beginning
- Accepts an optional flags argument (as does the match function), for example:
    - re.IGNORECASE
    - re.LOCALE
    - re.MULTILINE
    - re.DOTALL
    - re.UNICODE
    - re.VERBOSE
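
To illustrate how flags change the result, here is a small sketch that combines two of them with the `|` operator. The sample text and variable names are assumptions chosen for the example.

```python
import re

text = 'John Williams\njohn smith'

# Without flags, ^ only matches at the start of the string and the
# comparison is case-sensitive, so the lowercase pattern finds nothing
no_flag = re.findall(r'^john', text)

# re.IGNORECASE makes the match case-insensitive;
# re.MULTILINE lets ^ match at the start of every line
with_flags = re.findall(r'^john', text, flags=re.IGNORECASE | re.MULTILINE)

print(no_flag)
print(with_flags)
```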

### Difference between match() and search(): 
- match() checks for a match only at the beginning of the string
- search() checks for a match anywhere in the string
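
The difference can be seen by running both functions on the same string. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import re

text = 'john williams'

# 'williams' is not at the start, so match() fails while search() succeeds
m_start = re.match(r'williams', text)
s_anywhere = re.search(r'williams', text)
print(m_start)
print(s_anywhere)

# 'john' is at the start, so both calls find it
m_john = re.match(r'john', text)
s_john = re.search(r'john', text)
print(m_john.span(), s_john.span())
```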

#### Example 1 

In [5]:
line1 = 'The price is 23.55'
containsIntegers = r'\d+' 

#'r' before a string literal makes it a raw string, so backslashes are kept as-is (e.g. \n is not treated as a new line)
# \d matches a single digit (0-9)
# + matches one or more occurrences of the preceding pattern (i.e. \d)

if re.search(containsIntegers, line1):
    print('Line 1 contains an integer')
else:
    print('Line 1 does not contain an integer')

Line 1 contains an integer
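
The example above only tests whether digits are present. To pull the value out of the string, the match object's `group()` method can be used. The sketch below assumes a decimal pattern (`decimal_pattern`) that is not part of the original example.

```python
import re

line1 = 'The price is 23.55'

# \d+ alone stops at the decimal point
digits_only = re.search(r'\d+', line1).group()

# Hypothetical pattern: digits, optionally followed by a dot and more digits
decimal_pattern = r'\d+(?:\.\d+)?'
price = re.search(decimal_pattern, line1).group()

print(digits_only)
print(price)
print(float(price))
```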


#### Example 2

In [6]:
# Alternative words
music = r'Beatles|Adele|Gorillaz' #this string looks for any of the 3 artists if present in the text
request = 'Play some Adele'

if re.search(music, request):
    print('Set Fire to the Rain')
else:
    print('No Adele Available')

Set Fire to the Rain


### 3. Findall function

Returns a list containing all non-overlapping matches of pattern in string 

In [11]:
text = r'The rain in Spain stays mainly on the plain'

#find substring starting with 2 letters (upper or lower case) followed by 'ai' and a single character
results = re.findall('[a-zA-Z]{2}ai.', text) 

print(results)

['Spain', 'plain']
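
One detail worth knowing is that findall changes what it returns when the pattern contains capturing groups. A short sketch, reusing the sentence above with grouped variants of the pattern (the groupings are added here for illustration):

```python
import re

text = 'The rain in Spain stays mainly on the plain'

# One capturing group: findall returns only the captured part of each match
one_group = re.findall(r'([a-zA-Z]{2})ai.', text)

# Two capturing groups: findall returns one tuple per match
two_groups = re.findall(r'([a-zA-Z]{2})(ai.)', text)

print(one_group)
print(two_groups)
```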


### 4. Finditer function

Works like the findall function but returns an iterator yielding match objects instead of a list of strings

In [14]:
text = r'The rain in Spain stays mainly on the plain'

#find substring starting with 2 letters (upper or lower case) followed by 'ai' and a single character
results = re.finditer('[a-zA-Z]{2}ai.', text) 

print(results)

for i in results:
    print(i)

<callable_iterator object at 0x000002E42310E248>
<re.Match object; span=(12, 17), match='Spain'>
<re.Match object; span=(38, 43), match='plain'>
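
Because each item is a full match object, the matched text and its position can be collected together. A minimal sketch; the list name `hits` is an assumption for this example.

```python
import re

text = 'The rain in Spain stays mainly on the plain'

# Collect (matched text, start index, end index) for every match
hits = [(m.group(), m.start(), m.end())
        for m in re.finditer(r'[a-zA-Z]{2}ai.', text)]

print(hits)
```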


### 5. Split function

Returns a list where the string has been split at each match of the pattern. Similar to the str.split() function, except that the separator is a regular expression

In [16]:
text = 'It was a hot summer night'

#\s matches a single whitespace character (a raw string keeps the backslash intact)
x = re.split(r'\s', text)
print(x)

['It', 'was', 'a', 'hot', 'summer', 'night']
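
When the text contains runs of whitespace, the choice between `\s` and `\s+` matters, and the optional `maxsplit` argument limits the number of splits. The sketch below uses a modified sentence (with a double space and a tab) that is not part of the original example.

```python
import re

text = 'It was  a hot\tsummer night'   # note the double space and the tab

single = re.split(r'\s', text)              # empty string appears between the two spaces
runs = re.split(r'\s+', text)               # a run of whitespace counts as one separator
limited = re.split(r'\s+', text, maxsplit=2)  # stop after two splits

print(single)
print(runs)
print(limited)
```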


### 6. Sub function

It replaces each match of the pattern with a string of our choice, passed as the repl argument. All occurrences are replaced unless a count value is specified

*re.sub(pattern, repl, string, count=0, flags=0)*

In [18]:
pattern = '(England|Wales|Scotland)' #find either of the three terms

text = 'England for football, Wales for Rugby and Scotland for the Highland games'

print(re.sub(pattern, 'England', text))

England for football, England for Rugby and England for the Highland games


In [19]:
#a count parameter can be added to control the number of matches to be replaced

pattern = '(England|Wales|Scotland)'
text = 'England for football, Wales for Rugby and Scotland for the Highland games'

x = re.sub(pattern, 'Wales', text, count=2)
print(x)

Wales for football, Wales for Rugby and Scotland for the Highland games
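
The repl argument does not have to be a fixed string. It can refer back to captured groups, or it can be a function that receives the match object. A minimal sketch with a shortened sentence; the variable names `tagged` and `shouted` are made up here.

```python
import re

text = 'England for football, Wales for Rugby'

# \1 in the replacement refers to the text captured by the first group
tagged = re.sub(r'(England|Wales)', r'[\1]', text)

# repl may also be a function that receives the match object
shouted = re.sub(r'(England|Wales)', lambda m: m.group(1).upper(), text)

print(tagged)
print(shouted)
```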


### 7. Subn function

Here, along with replacing the matched strings, we also get the number of substitutions that were made. The result is a tuple of the form (new_string, number_of_substitutions)

In [21]:
pattern = '(England|Wales|Scotland)'
text = 'England for football, Wales for Rugby and Scotland for the Highland games'

print(re.subn(pattern,'Scotland',text))

('Scotland for football, Scotland for Rugby and Scotland for the Highland games', 3)
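
Since the result is a tuple, it can be unpacked directly into two variables. A short sketch; the names `new_text` and `n_subs` are chosen for illustration.

```python
import re

pattern = '(England|Wales|Scotland)'
text = 'England for football, Wales for Rugby and Scotland for the Highland games'

# Unpack the modified string and the substitution count in one step
new_text, n_subs = re.subn(pattern, 'Scotland', text)

print(new_text)
print(n_subs)
```

Note that replacing 'Scotland' with 'Scotland' still counts as a substitution, which is why the count is 3.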


### 8. Compile function

All of the re functions explained above are available both as module-level functions and as methods on a compiled regular expression object. Compiled regular expression objects support the following methods and attributes:
- Pattern.search(string[, pos[, endpos]])
- Pattern.match(string[, pos[, endpos]])
- Pattern.split(string, maxsplit=0)
- Pattern.findall(string[, pos[, endpos]])
- Pattern.finditer(string[, pos[, endpos]])
- Pattern.sub(repl, string, count=0)
- Pattern.subn(repl, string, count=0)
- Pattern.pattern

#### Example 1

In [22]:
line1 = 'The price is 23.55'
containsIntegers = r'\d+'
rePattern = re.compile(containsIntegers) #compiled pattern can be reused later on
matchLine1 = rePattern.search(line1)
if matchLine1:
    print('Line 1 contains a number')
else:
    print('Line 1 does not contain a number')

Line 1 contains a number


#### Example 2

In [23]:
p = re.compile(r'\W+')
s = '20 High Street'
print(p.split(s))

['20', 'High', 'Street']
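
The optional `pos` and `endpos` arguments restrict the part of the string a compiled pattern looks at. The following sketch uses a made-up sample string to show them together with the `pattern` attribute.

```python
import re

p = re.compile(r'\d+')
s = 'Room 20, floor 3, desk 117'

all_numbers = p.findall(s)          # every run of digits in the string
from_pos = p.search(s, 8).group()   # start searching at index 8, skipping '20'
up_to_end = p.findall(s, 0, 16)     # only look at s[0:16]

print(p.pattern)                    # the original pattern string
print(all_numbers)
print(from_pos)
print(up_to_end)
```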


### Online Resources:

Python-specific guides:
- https://docs.python.org/3/howto/regex.html the Standard Library regular expression how-to.
- https://pymotw.com/3/re/index.html the Python Module of the Week page for the re module.

Other online resources include:
- https://regexone.com an introduction to regular expressions.
- https://www.regular-expressions.info/tutorial.html a regular expressions tutorial.
- https://www.regular-expressions.info/quickstart.html a regular expressions quick start.
- https://pypi.org/project/regex a well-known third-party regular expression module that extends the functionality offered by the built-in re module.

**Reference Book** : Advanced Guide to Python 3 Programming, 2019