# Python Built-in Module for Regular Expressions

Python has a built-in module for working with regular expressions called `re`. Some commonly used functions from this module are:

- `re.match()`
- `re.search()`
- `re.findall()`

# re.match(pattern, string)

`re.match()` checks for a match only at the beginning of the string. If the pattern matches there, it returns a match object; otherwise, it returns `None`.

```python
import re

# Match the word 'VU' at the beginning of the string
result = re.match('VU', 'VU - Discover the New Way To Do Uni and find your success with the VU Block Model')
print(result)
```

```
<re.Match object; span=(0, 2), match='VU'>
```


```python
# group() returns the text that was matched
print(result.group())
```

```
VU
```


```python
# 'VU' appears in this string, but not at the very start, so re.match() returns None
result_2 = re.match('VU', 'Discover the New Way To Do Uni and find your success with the VU Block Model')
print(result_2)
```

```
None
```
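Because `re.match()` can return `None`, calling `.group()` on its result without checking first raises an `AttributeError`. A minimal sketch of the usual guard follows; the helper name `match_at_start` and the sample strings are made up for illustration.

```python
import re


def match_at_start(pattern, text):
    """Hypothetical helper: return the matched text if `pattern` matches at the
    start of `text`, otherwise None."""
    m = re.match(pattern, text)
    # re.match returns None when there is no match, and None has no .group()
    if m:
        return m.group()
    return None


print(match_at_start("VU", "VU Block Model"))      # VU
print(match_at_start("VU", "The VU Block Model"))  # None
```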


# re.search(pattern, string)

`re.search()` scans the entire string (not just the beginning) and returns a match object for the first occurrence of the pattern, or `None` if the pattern does not occur at all.

```python
# Search for the pattern 'VU' anywhere in the string
result = re.search('VU', 'Discover the New Way To Do Uni and find your success with the VU Block Model')
print(result.group())
```

```
VU
```
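To make the two differences concrete, here is a small sketch on a made-up sentence: `re.search()` reports only the first of several occurrences, and it finds a pattern that `re.match()` misses because it is not at the start.

```python
import re

text = "VU - the VU Block Model"

# re.search reports only the FIRST occurrence, even though 'VU' appears twice
first = re.search("VU", text)
print(first.span())  # (0, 2)

# Unlike re.match, re.search also finds a pattern that is not at the start
print(re.match("Block", text))          # None
print(re.search("Block", text).span())  # (12, 17)
```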


# re.findall(pattern, string)

`re.findall()` returns a list of all non-overlapping matches of the pattern in the given string.

```python
result = re.findall('VU', 'VU - Discover the New Way To Do Uni and find your success with the VU Block Model')
print(result)
```

```
['VU', 'VU']
```
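`re.findall()` returns only the matched strings. If you also need where each match occurs, `re.finditer()` (also part of `re`) yields one match object per occurrence. A short sketch on a made-up sentence:

```python
import re

text = "VU - the VU Block Model"

# re.finditer yields a match object per occurrence, so positions are available
for m in re.finditer("VU", text):
    print(m.group(), m.span())
# VU (0, 2)
# VU (9, 11)

spans = [m.span() for m in re.finditer("VU", text)]
```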


# Special Sequences in Regular Expressions

`\b` matches the empty string at a word boundary, so `ion\b` only matches `ion` at the end of a word (and `\bion` only at the start of one).

```python
text = 'The Master of Applied Information Technology (NMIT) is a flexible, accredited course'

# Check if there is any word that ends with "ion"
x = re.findall(r"ion\b", text)
print(x)
```

```
['ion']
```


```python
text = 'The Master of Applied Information Technology (NMIT) is a flexible, accredited course'

# Check if there is any word that ends with "tour" (there is none)
x = re.findall(r"tour\b", text)
print(x)
```

```
[]
```
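Placing `\b` on both sides of a word restricts the match to that whole word. The sketch below uses a made-up sentence; it also shows why the examples above use raw strings (`r"..."`): without the `r` prefix, `"\b"` is a backspace character, not a word boundary.

```python
import re

text = "This course and other courses are flexible"

# Without \b, 'course' also matches inside 'courses'
plain = re.findall(r"course", text)
print(plain)  # ['course', 'course']

# With \b on both sides, only the standalone word matches
whole = re.findall(r"\bcourse\b", text)
print(whole)  # ['course']

# Without the r prefix, "\b" is a backspace character, so nothing matches
print(re.findall("\bcourse\b", text))  # []
```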


`\d` matches a single digit character (for plain ASCII text, the characters 0-9).

```python
text = "From 2021, this course will be delivered in eight-week mode"

# Check if the string contains any digits (0-9)
x = re.findall(r"\d", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
['2', '0', '2', '1']
Yes, there is at least one match!
```
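`\d` matches one digit at a time, which is why the year above comes back as four separate characters. Adding `+` (covered under metacharacters below) matches whole numbers instead. This sketch uses a made-up variant of the sentence containing two numbers:

```python
import re

text = "From 2021, this course will be delivered in 8-week blocks"

digits = re.findall(r"\d", text)
print(digits)   # ['2', '0', '2', '1', '8']

numbers = re.findall(r"\d+", text)
print(numbers)  # ['2021', '8']

# findall returns strings; convert them if you need integers
as_ints = [int(n) for n in numbers]
print(as_ints)  # [2021, 8]
```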


`\D` matches any single character that is not a digit; it is the opposite of `\d`.

```python
text = "From 2021, this course will be delivered in eight-week mode."

# Find every single character that is not a digit (0-9)
x = re.findall(r"\D", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
['F', 'r', 'o', 'm', ' ', ',', ' ', 't', 'h', 'i', 's', ' ', 'c', 'o', 'u', 'r', 's', 'e', ' ', 'w', 'i', 'l', 'l', ' ', 'b', 'e', ' ', 'd', 'e', 'l', 'i', 'v', 'e', 'r', 'e', 'd', ' ', 'i', 'n', ' ', 'e', 'i', 'g', 'h', 't', '-', 'w', 'e', 'e', 'k', ' ', 'm', 'o', 'd', 'e', '.']
Yes, there is at least one match!
```

```python
# \D+ matches runs of consecutive non-digit characters instead of single characters
x = re.findall(r"\D+", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
['From ', ', this course will be delivered in eight-week mode.']
Yes, there is at least one match!
```
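A common practical use of `\D` is stripping everything except digits, for example to normalise a phone number. The sketch below swaps in `re.sub()` (also part of `re`), which replaces every match with a given string; the phone number is made up.

```python
import re

phone = "(03) 5555-0123"  # made-up example number

# Replace every non-digit character with an empty string
digits_only = re.sub(r"\D", "", phone)
print(digits_only)  # 0355550123
```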


`\w` matches a single word character. For ASCII text this means a letter (a-z or A-Z), a digit (0-9), or the underscore `_`; in Python 3, letters from other alphabets also count by default.

```python
text = "From 2021, this course will be delivered in eight-week mode!"

# \w+ matches runs of one or more word characters (letters, digits and "_"), i.e. whole words
x = re.findall(r"\w+", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
['From', '2021', 'this', 'course', 'will', 'be', 'delivered', 'in', 'eight', 'week', 'mode']
Yes, there is at least one match!
```
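Two details of `\w` are easy to miss: the underscore counts as a word character while the hyphen does not, and non-ASCII letters match unless the `re.ASCII` flag is passed. A sketch with made-up strings:

```python
import re

text = "snake_case, kebab-case and 2fast"

# '_' is a word character, '-' is not, so 'kebab-case' splits in two
words = re.findall(r"\w+", text)
print(words)  # ['snake_case', 'kebab', 'case', 'and', '2fast']

# By default \w also matches letters such as 'é'; re.ASCII restricts it to [a-zA-Z0-9_]
unicode_words = re.findall(r"\w+", "café")
ascii_words = re.findall(r"\w+", "café", re.ASCII)
print(unicode_words)  # ['café']
print(ascii_words)    # ['caf']
```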


`\W` matches any single character that is not a word character (for example a space or punctuation such as `!`, `,` or `-`); it is the opposite of `\w`.

```python
text = "From 2021, this course will be delivered in eight-week mode!"

# Find every NON-word character (anything that is not a letter, digit or "_", such as "!", "-" or a space)
x = re.findall(r"\W", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
[' ', ',', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '-', ' ', '!']
Yes, there is at least one match!
```
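Since `\W+` matches the gaps between words, it works well as a separator for splitting text into words. This sketch swaps in `re.split()` (also part of `re`) on the same sentence; note the trailing empty string produced because the text ends with `!`.

```python
import re

text = "From 2021, this course will be delivered in eight-week mode!"

# Split wherever one or more non-word characters occur
words = re.split(r"\W+", text)
print(words)
# ['From', '2021', 'this', 'course', 'will', 'be', 'delivered', 'in', 'eight', 'week', 'mode', '']

# Drop the empty string left by the trailing "!"
words_clean = [w for w in words if w]
```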


# Metacharacters in Regular Expressions

`.` matches any single character except a newline.

```python
text = "The Master of Applied Information Technology (NMIT) is a flexible, accredited course"

# Find "Ma" followed by exactly one character, then by exactly three characters
x = re.findall("Ma.", text)     # 'Ma' followed by any one character
x2 = re.findall("Ma...", text)  # 'Ma' followed by any three characters

print(x)
print(x2)
```

```
['Mas']
['Maste']
```
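Two consequences of the rule above, shown on made-up strings: `.` skips a newline unless the `re.DOTALL` flag is passed, and to match a literal dot you must escape it as `\.`.

```python
import re

text = "Ma\nster"

no_dotall = re.findall("Ma.", text)               # '.' does not match '\n'
with_dotall = re.findall("Ma.", text, re.DOTALL)  # with DOTALL it does
print(no_dotall)    # []
print(with_dotall)  # ['Ma\n']

versions = "Version 3.9 or 3x9"
unescaped = re.findall(r"\d.\d", versions)  # '.' matches any character
escaped = re.findall(r"\d\.\d", versions)   # '\.' matches only a dot
print(unescaped)  # ['3.9', '3x9']
print(escaped)    # ['3.9']
```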


`^` matches at the start of the string, so it checks whether the string starts with the given pattern.

```python
text = "Victoria University"

# Check if the string starts with 'Victoria'
x = re.findall("^Victoria", text)

if x:
    print("Yes, the string starts with 'Victoria'")
else:
    print("No match")
```

```
Yes, the string starts with 'Victoria'
```
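By default `^` only matches at the very start of the whole string. With the `re.MULTILINE` flag it also matches at the start of each line, as this sketch on a made-up two-line string shows:

```python
import re

text = "Victoria University\nMelbourne campus"

first_only = re.findall(r"^\w+", text)               # start of the string only
every_line = re.findall(r"^\w+", text, re.MULTILINE) # start of each line
print(first_only)  # ['Victoria']
print(every_line)  # ['Victoria', 'Melbourne']
```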


`$` matches at the end of the string, so it checks whether the string ends with the given pattern.

```python
text = "Victoria University"

# Check if the string ends with 'University'
x = re.findall("University$", text)

if x:
    print("Yes, the string ends with 'University'")
else:
    print("No match")
```

```
Yes, the string ends with 'University'
```
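Using `^` and `$` together forces the pattern to cover the whole string, which is handy for validation. In this sketch, `is_year` is a hypothetical helper and `{4}` ("exactly four") is a quantifier not covered above. One subtlety: `$` also matches just before a trailing newline, so `re.fullmatch()` is the stricter choice when that matters.

```python
import re


def is_year(s):
    """Hypothetical helper: True if `s` consists of exactly four digits."""
    # ^ anchors at the start, $ at the end, \d{4} means exactly four digits
    return re.search(r"^\d{4}$", s) is not None


print(is_year("2021"))       # True
print(is_year("2021a"))      # False
print(is_year("From 2021"))  # False

# $ also matches before a trailing newline; re.fullmatch does not allow it
print(is_year("2021\n"))                  # True
print(re.fullmatch(r"\d{4}", "2021\n"))   # None
```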


`*` matches zero or more occurrences of the character (or group) immediately to its left.

```python
text = "vict viccct vit vt"

# Check if the string contains "vi" followed by zero or more "c" characters and then "t"
x = re.findall("vic*t", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
['vict', 'viccct', 'vit']
Yes, there is at least one match!
```

`+` matches one or more occurrences of the character (or group) immediately to its left.

```python
# Check if the string contains "vi" followed by one or more "c" characters and then "t"
x = re.findall("vic+t", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
['vict', 'viccct']
Yes, there is at least one match!
```

`?` matches zero or one occurrence of the character (or group) immediately to its left.

```python
# Check if the string contains "vi", an optional single "c", and then "t"
x = re.findall("vic?t", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
['vict', 'vit']
Yes, there is at least one match!
```
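The same three quantifiers side by side on a different, made-up set of words; `?` in particular is a common way to accept both spellings of a word:

```python
import re

words = "ggle gogle google color colour"

star = re.findall(r"go*gle", words)      # zero or more 'o'
plus = re.findall(r"go+gle", words)      # one or more 'o'
optional = re.findall(r"colou?r", words) # zero or one 'u'

print(star)      # ['ggle', 'gogle', 'google']
print(plus)      # ['gogle', 'google']
print(optional)  # ['color', 'colour']
```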


`|` means "either/or": it matches the pattern on its left or the pattern on its right.

```python
text = "Boost your career prospects and become highly-employable with a postgraduate qualification in IT."

# Check if the string contains either "Boost" or "qualification"
x = re.findall("Boost|qualification", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
['Boost', 'qualification']
Yes, there is at least one match!
```


```python
text = "Boost your career prospects and become highly-employable with a postgraduate qualification in IT."

# Check if the string contains either "boosts" or "qualification".
# Matching is case-sensitive and "boosts" does not occur, so only "qualification" is found.
x = re.findall("boosts|qualification", text)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
```

```
['qualification']
Yes, there is at least one match!
```
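Because matching is case-sensitive by default, a lowercase alternative misses "Boost". Passing the `re.IGNORECASE` flag makes every alternative case-insensitive, as in this sketch on the same sentence:

```python
import re

text = "Boost your career prospects and become highly-employable with a postgraduate qualification in IT."

case_sensitive = re.findall("boost|qualification", text)
case_insensitive = re.findall("boost|qualification", text, re.IGNORECASE)

print(case_sensitive)    # ['qualification']
print(case_insensitive)  # ['Boost', 'qualification']
```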

