### Python RegEx

- A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

- RegEx can be used to check if a string contains the specified search pattern.

### RegEx Module

- Python has a built-in package called re, which can be used to work with Regular Expressions.

In [1]:
import re

### RegEx in Python
- Once you have imported the re module, you can start using regular expressions.

In [7]:
# Search the string to see if it starts with "The" and ends with "Spain"

import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
x

<re.Match object; span=(0, 17), match='The rain in Spain'>

### RegEx Functions
- The re module offers a set of functions for searching, splitting, and replacing text that matches a pattern:

#### findall
 - Returns a list containing all matches

#### search
 - Returns a Match object if there is a match anywhere in the string

#### split
 - Returns a list where the string has been split at each match

#### sub
 - Replaces one or more matches with a string

### Metacharacters

- Metacharacters are characters with a special meaning

![image.png](attachment:image.png)
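The most common metacharacters can be tried out directly. The following is a minimal sketch, not an exhaustive list; the variable names and the extra sample strings ("3.14", "rain rains") are illustrative assumptions and not part of the original notebook.

```python
import re

txt = "The rain in Spain"

# .  matches any single character except a newline
any_two = re.findall(r"r..n", txt)                    # ['rain']

# ^  anchors the pattern at the start, $ anchors it at the end
starts_with_the = re.search(r"^The", txt) is not None
ends_with_spain = re.search(r"Spain$", txt) is not None

# +  one or more of the preceding item, * zero or more
one_or_more = re.findall(r"ai+n", txt)                # ['ain', 'ain']
zero_or_more = re.findall(r"Sp.*n", txt)              # ['Spain']

# ?  makes the preceding item optional (hypothetical sample string)
optional_s = re.findall(r"rains?", "rain rains")      # ['rain', 'rains']

# |  matches either alternative
either = re.findall(r"rain|Spain", txt)               # ['rain', 'Spain']

# \  escapes a metacharacter so it is matched literally
literal_dot = re.findall(r"\.", "3.14")               # ['.']

print(any_two, one_or_more, zero_or_more, optional_s, either, literal_dot)
```

Writing patterns as raw strings (the `r"..."` prefix) keeps backslashes from being interpreted by Python before they reach the regex engine.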

### Special Sequences
 - A special sequence is a \ followed by one of the characters in the list below, and has a special meaning

![image.png](attachment:image.png)
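A few of the special sequences are sketched below. The sample string and variable names are assumptions chosen for illustration; only standard `re` sequences (`\d`, `\D`, `\w`, `\s`, `\b`, `\A`) are used.

```python
import re

# Hypothetical sample text, chosen so that it contains digits and words
txt = "Room 42 opens at 9 am"

# \d matches a digit; \d+ matches a run of digits
digits = re.findall(r"\d+", txt)                 # ['42', '9']

# \w matches a word character (letter, digit, or underscore)
words = re.findall(r"\w+", txt)

# \s matches a white-space character
spaces = re.findall(r"\s", txt)

# \A anchors at the start of the string; \D matches a non-digit
leading_non_digits = re.search(r"\A\D+", txt).group()   # 'Room '

# \b matches at a word boundary, so only words that begin with "o" match
o_words = re.findall(r"\bo\w*", txt)             # ['opens']

print(digits, words, len(spaces), repr(leading_non_digits), o_words)
```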

### Sets

- A set is a group of characters inside a pair of square brackets []; it matches any single character from that group

![image.png](attachment:image.png)
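The behaviour of sets can be checked with a short experiment. This is a sketch under the assumption of a slightly extended sample string (", 2024" is added so that the digit set has something to match); the variable names are illustrative.

```python
import re

# Hypothetical sample text: the notebook's string with a year appended
txt = "The rain in Spain, 2024"

# [arn] matches any one of the characters a, r, or n
arn = re.findall(r"[arn]", txt)            # ['r', 'a', 'n', 'n', 'a', 'n']

# [a-h] matches any lower-case character from a to h
a_to_h = re.findall(r"[a-h]", txt)         # ['h', 'e', 'a', 'a']

# [A-Z] matches any upper-case letter
upper = re.findall(r"[A-Z]", txt)          # ['T', 'S']

# [0-9]+ matches a run of digits
number = re.findall(r"[0-9]+", txt)        # ['2024']

# [^...] negates the set: anything that is NOT a letter or a space
others = re.findall(r"[^a-zA-Z ]", txt)    # [',', '2', '0', '2', '4']

print(arn, a_to_h, upper, number, others)
```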

### The findall() Function


 - The findall() function returns a list containing all matches.


In [8]:
# Print a list of all matches:
    
import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)


['ai', 'ai']


In [37]:
# Return an empty list if no match was found:

import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)

[]
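findall() is not limited to literal text. The sketch below shows a pattern built from special sequences, and also what happens when the pattern contains groups, in which case findall() returns the groups rather than the whole match. The "a=1, b=22" string is a made-up example.

```python
import re

txt = "The rain in Spain"

# Find every whole word that ends in "ain"
ain_words = re.findall(r"\b\w*ain\b", txt)
print(ain_words)    # ['rain', 'Spain']

# With two groups in the pattern, each match becomes a tuple of the groups
# (hypothetical sample string)
pairs = re.findall(r"(\w+)=(\d+)", "a=1, b=22")
print(pairs)        # [('a', '1'), ('b', '22')]
```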


### The search() Function

 - The search() function searches the string for a match, and returns a Match object if there is a match.

 - If there is more than one match, only the first occurrence is returned.

In [9]:
# Search for the first white-space character in the string:

import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)  # raw string so that \s reaches the regex engine unchanged

print("The first white-space character is located in position:", x.start())

The first white-space character is located in position: 3


In [40]:
# If no matches are found, the value None is returned:

import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)

None
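Because search() can return None, calling a method such as .start() on the result without checking it first raises an AttributeError when nothing matches. One way to guard against that is sketched below; `first_match_span` is a hypothetical helper name, not part of the re module.

```python
import re

def first_match_span(pattern, text):
    """Return the (start, end) span of the first match, or None if there is no match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.span()

txt = "The rain in Spain"
print(first_match_span(r"\s", txt))        # (3, 4)
print(first_match_span("Portugal", txt))   # None
```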


### The split() Function
 -  The split() function returns a list where the string has been split at each match

In [41]:
# Split at each white-space character:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']


In [42]:
# You can control the number of occurrences by specifying the maxsplit parameter:
# Split the string only at the first occurrence:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)

['The', 'rain in Spain']
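Two details of split() are easy to miss: splitting on a single `\s` produces empty strings when there are consecutive spaces, and a capturing group in the pattern keeps the separators in the result. A small sketch, using a made-up string with irregular spacing:

```python
import re

# Hypothetical sample text with runs of several spaces
messy = "The  rain   in Spain"

# Splitting on a single white-space character leaves empty strings behind
single = re.split(r"\s", messy)
print(single)     # ['The', '', 'rain', '', '', 'in', 'Spain']

# Splitting on one or more white-space characters avoids that
collapsed = re.split(r"\s+", messy)
print(collapsed)  # ['The', 'rain', 'in', 'Spain']

# A capturing group keeps the separators in the returned list
kept = re.split(r"(\s)", "The rain")
print(kept)       # ['The', ' ', 'rain']
```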


### The sub() Function

 - The sub() function replaces the matches with the text of your choice

In [43]:
# Replace every white-space character with the number 9:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)


The9rain9in9Spain


In [44]:
# You can control the number of replacements by specifying the count parameter:
# Replace the first 2 occurrences:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)

The9rain9in Spain
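The replacement passed to sub() does not have to be a fixed string. It can refer back to groups in the pattern (`\1`, `\2`), or it can be a function that receives each Match object and returns the replacement text. The following is a minimal sketch of both forms; the variable names are illustrative.

```python
import re

txt = "The rain in Spain"

# Backreferences: swap the words on either side of " in "
swapped = re.sub(r"(\w+) in (\w+)", r"\2 in \1", txt)
print(swapped)      # The Spain in rain

# Function replacement: upper-case the first letter of every word
title_cased = re.sub(r"\b\w", lambda m: m.group().upper(), txt)
print(title_cased)  # The Rain In Spain
```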


### Match Object
 - A Match Object is an object containing information about the search and the result.
 - If there is no match, the value None will be returned, instead of the Match Object.


In [45]:
# Do a search that will return a Match Object:

import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x)  # prints the Match object

<re.Match object; span=(5, 7), match='ai'>


### The Match object has properties and methods for retrieving information about the search and its result:

 - .span() returns a tuple containing the start and end positions of the match.
 - .string returns the string passed into the function
 - .group() returns the part of the string where there was a match

In [46]:
# Print the position (start- and end-position) of the first match occurrence.

# The regular expression looks for a word that starts with an upper-case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())

(12, 17)


In [47]:
# Print the string passed into the function:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)

The rain in Spain


In [48]:
# Print the part of the string where there was a match.

# The regular expression looks for a word that starts with an upper-case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())

Spain
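When the pattern contains groups, the Match object also exposes each group individually. The sketch below assumes a two-group pattern that is not in the original notebook; .group(n), .groups(), .start(), .end(), and .span(n) are standard Match methods.

```python
import re

txt = "The rain in Spain"
m = re.search(r"(\w+) in (\w+)", txt)

whole = m.group(0)       # the entire match: 'rain in Spain'
first = m.group(1)       # first group: 'rain'
second = m.group(2)      # second group: 'Spain'
both = m.groups()        # all groups as a tuple: ('rain', 'Spain')
bounds = (m.start(), m.end())   # position of the whole match: (4, 17)
second_span = m.span(2)         # position of the second group: (12, 17)

print(whole, first, second, both, bounds, second_span)
```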
