# Module 10: Additional Python modules

## Part 6: Regular expressions (re module)

The re module in Python provides support for regular expressions, a compact notation for describing text patterns. With regular expressions you can search for, extract, replace, and split text according to a pattern. This makes them useful for tasks such as data validation, text parsing, and cleaning up strings. Let's explore the key features of the re module.

### 6.1. Introduction to regular expressions

Regular expressions are sequences of characters that define a search pattern. They provide a flexible and concise way to match and manipulate strings based on specific patterns. Regular expressions use a combination of ordinary characters and special characters called metacharacters to define patterns.

In [1]:
import re

# Search for a pattern in a string
pattern = r"apple"
text = "I have an apple"
match = re.search(pattern, text)
if match:
    print("Pattern found!")
else:
    print("Pattern not found.")

Pattern found!


In this code snippet, we import the re module and define the pattern "apple" as a raw string (the r prefix). In a raw string, backslashes are passed to the regex engine instead of being interpreted as Python escape sequences. The text to search is "I have an apple". The search() function scans the text for the first location where the pattern matches. It returns a Match object if it finds one, or None if there is no match. Because a Match object is always truthy, the if statement prints "Pattern found!" when a match exists and "Pattern not found." otherwise.
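The Match object returned by search() tells you more than whether the pattern was found. The sketch below reuses the same text to show a few of its methods. It also contrasts search() with re.match(), which only tries to match at the beginning of the string.

```python
import re

text = "I have an apple"

# search() scans the whole string and returns a Match object or None
match = re.search(r"apple", text)
if match:
    print(match.group())              # the matched text: apple
    print(match.start(), match.end())  # where the match begins and ends: 10 15
    print(match.span())               # the same indices as a tuple: (10, 15)

# match() only tries to match at the beginning of the string
at_start = re.match(r"apple", text)
print(at_start)  # None, because text does not start with "apple"
print(re.match(r"I have", text) is not None)  # True
```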

### 6.2. Basic pattern matching

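Regular expressions combine ordinary characters, which match themselves, with metacharacters and special sequences that describe matching rules. Commonly used metacharacters include:

- . (dot): matches any single character except a newline.
- * (asterisk): matches zero or more repetitions of the preceding item.
- + (plus): matches one or more repetitions of the preceding item.
- [ ] (brackets): define a character class that matches any one of the characters inside.

Special sequences such as \w (word character), \d (digit), and \s (whitespace) are shorthand for common character classes.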
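The short sketch below shows each of these metacharacters on its own. It uses re.findall(), which returns every match as a list. The sample strings are made up for illustration.

```python
import re

# . matches any single character except a newline
dot_matches = re.findall(r"c.t", "cat cot cut ct")
print(dot_matches)  # ['cat', 'cot', 'cut']

# * matches zero or more repetitions of the preceding item
star_matches = re.findall(r"ab*", "a ab abbb")
print(star_matches)  # ['a', 'ab', 'abbb']

# + matches one or more repetitions of the preceding item
plus_matches = re.findall(r"ab+", "a ab abbb")
print(plus_matches)  # ['ab', 'abbb']

# [ ] matches any one character from the set
class_matches = re.findall(r"[aeiou]", "regex")
print(class_matches)  # ['e', 'e']
```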

In [2]:
import re

# Match an email address pattern
pattern = r"\w+@\w+\.\w+"
text = "Contact us at info@example.com or support@example.com"
matches = re.findall(pattern, text)
print(matches)

['info@example.com', 'support@example.com']


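In this example, we import the re module and define the raw-string pattern \w+@\w+\.\w+, a simplified pattern for email addresses. Let's break down the pattern:
- \w+: Matches one or more word characters (letters, digits, and underscores). Here it matches the user name, such as "info".
- @: Matches the "@" symbol literally.
- \w+: Matches the domain name, such as "example".
- \.: Matches a literal period. It is escaped with a backslash because an unescaped period is a metacharacter that matches any character except a newline.
- \w+: Matches the top-level domain, such as "com".

This pattern is deliberately simple. Because \w does not include dots or hyphens, it will not fully match an address such as first.last@mail.example.org.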

The text string contains two email addresses. The findall() function returns a list of all non-overlapping matches of the pattern, in the order they appear in the text. That list is stored in the matches variable and printed.
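You can add capture groups if you need the parts of each address rather than the whole string. The sketch below reuses the example text. Splitting each address into a user name and a domain is our own illustration.

```python
import re

text = "Contact us at info@example.com or support@example.com"

# Parentheses create capture groups; with groups, findall() returns tuples
pattern = r"(\w+)@(\w+\.\w+)"
pairs = re.findall(pattern, text)
print(pairs)  # [('info', 'example.com'), ('support', 'example.com')]

# finditer() yields Match objects, giving access to each group and position
for m in re.finditer(pattern, text):
    print(m.group(1), m.group(2), m.span())
```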

### 6.3. Pattern substitution

The re module can replace every piece of text that matches a pattern with new text. This is useful when the text to be replaced is not a single fixed string but anything that fits a pattern, such as numbers, dates, or runs of whitespace.

In [3]:
import re

# Replace a pattern in a string
pattern = r"apple"
text = "I have an apple"
new_text = re.sub(pattern, "orange", text)
print(new_text)

I have an orange


In this code snippet, we import the re module, define the raw-string pattern "apple", and define the text "I have an apple". The sub() function replaces every non-overlapping occurrence of the pattern with the replacement string "orange" and returns the result as a new string. The original text is left unchanged because Python strings are immutable. The result is stored in the new_text variable and printed.
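sub() accepts more than a fixed replacement string. The sketch below uses made-up sample strings to show four things: the count argument, backreferences to captured groups, a replacement function, and the related subn() function.

```python
import re

text = "apple, apple, apple"

# count limits how many occurrences are replaced (0, the default, means all)
first_only = re.sub(r"apple", "orange", text, count=1)
print(first_only)  # orange, apple, apple

# Backreferences (\1, \2, ...) in the replacement reuse captured groups
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2024-01-15")
print(reordered)  # 15/01/2024

# The replacement can be a function that receives each Match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears

# subn() works like sub() but also returns the number of replacements made
with_count = re.subn(r"apple", "orange", text)
print(with_count)  # ('orange, orange, orange', 3)
```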

### 6.4. Splitting strings using patterns

The re module can split a string wherever a pattern matches, instead of only at a single fixed delimiter. This is useful when the separator varies, for example a run of spaces of unknown length or a mix of commas and semicolons.

In [4]:
import re

# Split a string using a pattern
pattern = r"\s+"
text = "Hello    World"
substrings = re.split(pattern, text)
print(substrings)

['Hello', 'World']


In this example, we import the re module and define the raw-string pattern \s+, which matches one or more whitespace characters. The text "Hello    World" contains several consecutive spaces. The split() function splits the string at every match of the pattern, so the whole run of spaces acts as a single separator. The resulting list of substrings is stored in the substrings variable and printed.
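The sketch below shows a few more split() behaviors on small made-up strings. It covers several delimiters at once, the maxsplit argument, keeping the separators, and the empty string produced by a leading separator.

```python
import re

# A character class lets several different delimiters act as separators
mixed = re.split(r"[,;\s]+", "a, b;c  d")
print(mixed)  # ['a', 'b', 'c', 'd']

# maxsplit limits the number of splits performed
limited = re.split(r"\s+", "one two three", maxsplit=1)
print(limited)  # ['one', 'two three']

# A capture group in the pattern keeps the separators in the result
kept = re.split(r"(\s+)", "Hello  World")
print(kept)  # ['Hello', '  ', 'World']

# A separator at the start of the string produces an empty first element
leading = re.split(r"\s+", "  Hello World")
print(leading)  # ['', 'Hello', 'World']
```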

### 6.5. Summary

The re module gives Python full support for regular expressions. With it you can:

- Find the first match with search().
- Collect all matches with findall().
- Replace matches with sub().
- Split strings at matches with split().

These functions cover many everyday tasks, such as data validation, text parsing, and cleaning up textual data.
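Here is a closing sketch of the data-validation use case mentioned above. It compiles a pattern once with re.compile() and checks whole strings with fullmatch(). The ZIP-code-style format (five digits, optionally followed by a hyphen and four digits) and the is_valid_zip helper are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical format: five digits, optionally followed by "-" and four digits
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

def is_valid_zip(value):
    """Hypothetical helper: return True if the entire string matches ZIP_PATTERN."""
    # fullmatch() requires the whole string to match, not just part of it
    return ZIP_PATTERN.fullmatch(value) is not None

for candidate in ["12345", "12345-6789", "1234", "12345-67", "abc12345"]:
    print(candidate, is_valid_zip(candidate))
```

Using search() here instead would accept "abc12345", because search() only needs the pattern to occur somewhere in the string.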