A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.
Python has a built-in package called re, which can be used to work with Regular Expressions.

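Common special sequences
The class equivalences below are exact for bytes patterns or when the re.ASCII flag is set; for ordinary str patterns, \d, \s and \w also match the corresponding Unicode characters.
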
\d
Matches any decimal digit; this is equivalent to the class [0-9].

\D
Matches any non-digit character; this is equivalent to the class [^0-9].

\s
Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].

\S
Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].

\w
Matches any word character (a letter, a digit or the underscore); this is equivalent to the class [a-zA-Z0-9_].

\W
Matches any non-word character; this is equivalent to the class [^a-zA-Z0-9_].

These sequences can be included inside a character class. For example, [\s,.] is a character class that will match any whitespace character, or ',' or '.'.
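
As a rough illustration of the sequences above, the sketch below uses \d on its own and then the [\s,.] class as a separator. The sample string and the variable names are made up for this example.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "room 12, floor 3. open 24 hours"

# \d+ matches each run of decimal digits
numbers = re.findall(r"\d+", text)
print(numbers)

# [\s,.]+ treats any run of whitespace, commas or periods as one separator
words = re.split(r"[\s,.]+", text)
print(words)
```

Adding + after the class makes a comma followed by a space count as a single separator, so no empty strings appear in the result.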

The final metacharacter in this section is the dot (.). It matches any character except a newline, and in the alternate mode re.DOTALL it matches a newline as well. The dot is often used where you want to match “any character”.
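
The difference between the default mode and re.DOTALL can be sketched as follows. The two-line sample string is an assumption made for this example.

```python
import re

# Hypothetical two-line input
text = "first line\nsecond line"

# By default "." stops at the newline, so the match ends with the first line
default_match = re.search(r"first.*", text)
print(repr(default_match.group()))

# With re.DOTALL "." also matches "\n", so the match runs to the end of the text
dotall_match = re.search(r"first.*", text, flags=re.DOTALL)
print(repr(dotall_match.group()))
```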

In [4]:
import re
#Check if the string starts with "The" and ends with "Spain":
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)

if x:
  print("YES! We have a match!")
else:
  print("No match")

YES! We have a match!


Function      Description
findall()     Returns a list containing all matches
finditer()    Returns an iterator that yields a Match object for each match
search()      Returns a Match object if there is a match anywhere in the string
split()       Returns a list where the string has been split at each match
sub()         Replaces one or many matches with a string

Match method  Description
group()       Returns the string matched by the RE
start()       Returns the starting position of the match
end()         Returns the ending position of the match
span()        Returns a tuple containing the (start, end) positions of the match
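
A minimal sketch of how finditer() combines with the Match methods listed above, reusing the sample sentence from this notebook. The variable name spans is only illustrative.

```python
import re

txt = "The rain in Spain"

# finditer() yields one Match object per non-overlapping match
for m in re.finditer(r"ai", txt):
    # group() is the matched text; start(), end() and span() give its position
    print(m.group(), m.start(), m.end(), m.span())

# Collect only the positions of every match
spans = [m.span() for m in re.finditer(r"ai", txt)]
print(spans)
```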

Character   Description                                                                  Example
[]          A set of characters                                                          "[a-m]"
\           Signals a special sequence (can also be used to escape special characters)   "\d"
.           Any character (except newline character)                                     "he..o"
^           Starts with                                                                  "^hello"
$           Ends with                                                                    "world$"
*           Zero or more occurrences                                                     "aix*"
+           One or more occurrences                                                      "aix+"
{}          Exactly the specified number of occurrences                                  "al{2}"
|           Either or                                                                    "falls|stays"
()          Capture and group                                                            "(ai)n"
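
A few of the metacharacters in the table can be tried out as below. This is only a sketch; the test strings are invented for the example.

```python
import re

# {2} requires exactly two of the preceding character
two_l = bool(re.search(r"al{2}", "tall"))
one_l = bool(re.search(r"al{2}", "tale"))
print(two_l, one_l)

# | matches either alternative
either = re.findall(r"falls|stays", "The rain falls mainly, then stays")
print(either)

# () captures parts of the match; groups() returns the captured pieces
m = re.search(r"(\w+) in (\w+)", "The rain in Spain")
print(m.groups())
```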

In [5]:
import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

['ai', 'ai']


In [6]:
import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)

[]


In [7]:
import re

txt = "The rain in Spain"
# Use a raw string so that \s reaches the regex engine unchanged
x = re.search(r"\s", txt)

print("The first white-space character is located in position:", x.start())

The first white-space character is located in position: 3


In [8]:
import re

txt = "The rain in Spain"
# Split the string at each white-space character
x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']


In [9]:
import re
#Replace every white-space character with the number 9:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

The9rain9in9Spain


In [None]:
import re
#You can control the number of replacements by specifying the count parameter:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)

A Match object contains information about the search and its result. If there is no match, search() returns None instead of a Match object.

In [10]:
import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) #this will print an object

<re.Match object; span=(5, 7), match='ai'>


The Match object has properties and methods used to retrieve information about the search and the result:

.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed into the function.
.group() returns the part of the string where there was a match.
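
The relationship between the three can be sketched as follows: slicing .string with the positions from .span() gives the same text as .group(). The names start and end are only illustrative.

```python
import re

txt = "The rain in Spain"
m = re.search(r"\bS\w+", txt)

# .string is the original input that was searched
print(m.string)

# Slicing the input with the span reproduces the matched text
start, end = m.span()
print(m.string[start:end] == m.group())
```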

In [11]:
import re

txt = "The rain in Spain"
# \b is a word boundary, so the pattern finds a word that starts with an upper-case "S"
x = re.search(r"\bS\w+", txt)
print(x.span())

(12, 17)


In [13]:
#Print the part of the string where there was a match.
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())

Spain


In [14]:
import re
# Compile the pattern once so the resulting object can be reused
p = re.compile('[a-z]+')
p

re.compile(r'[a-z]+', re.UNICODE)

In [15]:
# [a-z]+ needs at least one letter, so an empty string does not match
print(p.match(""))

None


In [16]:
m = p.match('tempo')
m

<re.Match object; span=(0, 5), match='tempo'>
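
One point the cells above do not show is that match() only tries the pattern at the beginning of the string, while search() scans the whole string. A small sketch of the difference, with an input string invented for the example:

```python
import re

p = re.compile(r"[a-z]+")

# match() only tries the pattern at position 0, where "1" is not a letter
at_start = p.match("123 tempo")
print(at_start)

# search() scans forward until the pattern matches somewhere
anywhere = p.search("123 tempo")
print(anywhere.group(), anywhere.span())
```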