# Python RegEx
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.

# RegEx Module
Python has a built-in module called `re`, which can be used to work with regular expressions.

# Import the re module:

```python
import re
```

# RegEx in Python
When you have imported the re module, you can start using regular expressions:

## Example
Search the string to see if it starts with "The" and ends with "Spain":

```python
import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
print(x)
```

```
<re.Match object; span=(0, 17), match='The rain in Spain'>
```
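Because a Match object is truthy and `None` is falsy, the result of `search()` can be used directly in an `if` statement. A minimal sketch; the second sample string is made up for illustration:

```python
import re

pattern = r"^The.*Spain$"
results = {}

for txt in ["The rain in Spain", "The rain in Portugal"]:
    # search() returns a Match object (truthy) or None (falsy)
    found = re.search(pattern, txt)
    results[txt] = found is not None
    if found:
        print(f"Match: {txt!r}")
    else:
        print(f"No match: {txt!r}")
```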


# RegEx Functions
The `re` module offers a set of functions that allow us to search a string for a match:

| Function | Description |
|---|---|
| `findall` | Returns a list containing all matches |
| `search` | Returns a Match object if there is a match anywhere in the string |
| `split` | Returns a list where the string has been split at each match |
| `sub` | Replaces one or many matches with a string |
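To see the four functions side by side, here is a minimal sketch that runs each one on the same sample string (the `-` replacement text is an arbitrary choice for illustration):

```python
import re

txt = "The rain in Spain"

# findall(): every non-overlapping match, as a list of strings
all_matches = re.findall(r"ai", txt)
print(all_matches)   # ['ai', 'ai']

# search(): a Match object for the first match only (or None)
first = re.search(r"ai", txt)
print(first.span())  # (5, 7)

# split(): the pieces of the string between the matches
parts = re.split(r"\s", txt)
print(parts)         # ['The', 'rain', 'in', 'Spain']

# sub(): a new string with every match replaced
replaced = re.sub(r"\s", "-", txt)
print(replaced)      # The-rain-in-Spain
```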

# Metacharacters
Metacharacters are characters with a special meaning:

| Character | Description | Example |
|---|---|---|
| `[]` | A set of characters | `"[a-m]"` |
| `\` | Signals a special sequence (can also be used to escape special characters) | `"\d"` |
| `.` | Any character (except newline character) | `"he..o"` |
| `^` | Starts with | `"^hello"` |
| `$` | Ends with | `"world$"` |
| `*` | Zero or more occurrences | `"aix*"` |
| `+` | One or more occurrences | `"aix+"` |
| `{}` | Exactly the specified number of occurrences | `"al{2}"` |
| `\|` | Either or | `"falls\|stays"` |
| `()` | Capture and group | |
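A minimal sketch that runs most of the example patterns from the table through `re.findall()`. The sample strings are made up for illustration, and `()` is left out because capture groups change what `findall()` returns:

```python
import re

# (pattern from the table, sample text)
examples = [
    ("[a-m]", "The rain in Spain"),
    (r"\d", "That will be 59 dollars"),
    ("he..o", "hello world"),
    ("^hello", "hello world"),
    ("world$", "hello world"),
    ("aix*", "aixx aix ai"),
    ("aix+", "aixx aix ai"),
    ("al{2}", "The rain in Spain falls mainly"),
    ("falls|stays", "The rain in Spain falls mainly"),
]

# findall() lists every non-overlapping match of the pattern in the text
results = {pattern: re.findall(pattern, text) for pattern, text in examples}

for pattern, matches in results.items():
    print(f"{pattern!r:15} {matches}")
```

Note how `"aix*"` also matches a bare `"ai"` (zero `x` characters), while `"aix+"` needs at least one `x`.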

# The findall() Function
The findall() function returns a list containing all matches.

```python
import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)
```

```
['ai', 'ai']
```


The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

## Example
Return an empty list if no match was found:

```python
import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)
```

```
[]
```


# The search() Function

The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:


## Example

Search for the first white-space character in the string:

```python
import re

txt = "The rain in Spain"
# Raw string, so the backslash reaches the regex engine unchanged
x = re.search(r"\s", txt)

print("The first white-space character is located in position:", x.start())
```

```
The first white-space character is located in position: 3
```


If no matches are found, the value None is returned:

## Example
Make a search that returns no match:

```python
import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
```

```
None
```
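Calling a method such as `.start()` on `None` raises an `AttributeError`, so it is worth checking the result before using it. A minimal sketch of that check:

```python
import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)

# Calling x.start() here would raise AttributeError, because x is None
if x is None:
    message = "No match found"
else:
    message = f"Match found at position {x.start()}"
print(message)  # No match found
```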


# The split() Function
The split() function returns a list where the string has been split at each match:

## Example
Split at each white-space character:

```python
import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)
```

```
['The', 'rain', 'in', 'Spain']
```


You can limit the number of splits with the `maxsplit` parameter:

## Example
Split the string only at the first occurrence:

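```python
import re

txt = "The rain in Spain"
# Pass maxsplit by keyword; passing it positionally is deprecated since Python 3.13
x = re.split(r"\s", txt, maxsplit=1)
print(x)
```

```
['The', 'rain in Spain']
```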
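Since the separator is a pattern, `split()` can break on several different delimiters at once. A sketch with a made-up input string, combining a character set `[]` with `+`:

```python
import re

mixed = "apples, pears;plums  cherries"

# Split on commas, semicolons or whitespace, treating a run of them as one separator
pieces = re.split(r"[,;\s]+", mixed)
print(pieces)   # ['apples', 'pears', 'plums', 'cherries']

# maxsplit=2 stops after two splits; the remainder stays in the last item
limited = re.split(r"[,;\s]+", mixed, maxsplit=2)
print(limited)  # ['apples', 'pears', 'plums  cherries']
```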


# The sub() Function
The sub() function replaces the matches with the text of your choice:

## Example
Replace every white-space character with the number 9:

```python
import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)
```

```
The9rain9in9Spain
```


You can control the number of replacements by specifying the count parameter:

## Example
Replace the first 2 occurrences:

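```python
import re

txt = "The rain in Spain"
# Pass count by keyword; passing it positionally is deprecated since Python 3.13
x = re.sub(r"\s", "9", txt, count=2)
print(x)
```

```
The9rain9in Spain
```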
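A common use of `sub()` is normalizing whitespace. A minimal sketch with a made-up messy string; `\s+` matches a whole run of white-space characters at once:

```python
import re

messy = "The   rain  in\tSpain"

# \s+ matches a whole run of whitespace, so each run becomes a single space
tidy = re.sub(r"\s+", " ", messy)
print(tidy)           # The rain in Spain

# count=1 stops after the first replacement; later runs are left alone
partly = re.sub(r"\s+", " ", messy, count=1)
print(repr(partly))   # 'The rain  in\tSpain'
```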


# Match Object
A Match Object is an object containing information about the search and the result.

Note: If there is no match, the value None is returned instead of a Match object.

## Example
Do a search that will return a Match Object:

```python
import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x)  # prints the Match object itself

```

```
<re.Match object; span=(5, 7), match='ai'>
```


The Match object has properties and methods used to retrieve information about the search and the result:

- `.span()` returns a tuple containing the start and end positions of the match (the end position is exclusive).
- `.string` returns the string passed into the function.
- `.group()` returns the part of the string where there was a match.
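A minimal sketch putting these together on a single match. It also uses `.start()` and `.end()`, which return the two halves of `.span()`:

```python
import re

txt = "The rain in Spain"
m = re.search(r"\bS\w+", txt)

print(m.span())    # (12, 17)
print(m.string)    # The rain in Spain
print(m.group())   # Spain

# Slicing the original string with start() and end() reproduces group()
print(m.start(), m.end())                   # 12 17
print(txt[m.start():m.end()] == m.group())  # True
```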

## Example
Print the position (start and end) of the first match.

The regular expression looks for any word that starts with an uppercase "S":

```python
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())
```

```
(12, 17)
```


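The `\b` special sequence matches a word boundary, so `findall()` returns only the standalone word "Spain", not the "Spain" inside "Spainer" or "abdSpain":

```python
import re

txt = "The rain in Spain Spainer abdSpain"
x = re.findall(r"\bSpain\b", txt)
print(x)
```

```
['Spain']
```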


## Example
Print the string passed into the function:

```python
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)
print(x)
```

```
The rain in Spain
<re.Match object; span=(12, 17), match='Spain'>
```


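Without the `r` prefix, each backslash in the pattern has to be doubled, so `"\\bS\\w+"` is the same pattern as `r"\bS\w+"`:

```python
import re

txt = "The rain in Spain"
# "\\b" and "\\w" in a normal string give the same characters as r"\b" and r"\w"
x = re.search("\\bS\\w+", txt)
print(x.group())
```

```
Spain
```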


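With the same pattern, `findall()` returns the matched text itself, as a list of strings, instead of a Match object:

```python
import re

txt = "The rain in Spain"
x = re.findall(r"\bS\w+", txt)
print(x)
```

```
['Spain']
```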


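A raw string keeps backslashes as literal characters. Without the `r` prefix, `'Hi\xHello'` would be a syntax error, because `\x` must be followed by two hexadecimal digits:

```python
s = r'Hi\xHello'
print(s)
```

```
Hi\xHello
```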


## Example
Print the part of the string where there was a match.

The regular expression looks for any word that starts with an uppercase "S":

```python
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())
print(x.string)
```

```
Spain
The rain in Spain
```
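When the pattern contains parentheses (the "capture and group" metacharacter), `.group()` also accepts a group number, and `.groups()` returns all captured groups as a tuple. A minimal sketch:

```python
import re

txt = "The rain in Spain"

# Two capture groups: the leading "S" and the rest of the word
m = re.search(r"\b(S)(\w+)", txt)

print(m.group())    # Spain  (the whole match, same as m.group(0))
print(m.group(1))   # S
print(m.group(2))   # pain
print(m.groups())   # ('S', 'pain')
```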



