## Regular Expressions
### Basics of Regular Expressions
Regular expressions are a way of describing patterns in string data. They're a powerful tool for manipulating text. Python's re module provides support for regular expressions.

First, you'll need to import the re module:

In [1]:
import re

To create a regex object, use the re.compile() function:

In [2]:
# This is a regex object
regex = re.compile('abc')

This object represents the pattern 'abc'.

You can use the search() method to search for this pattern in a string:

In [3]:
# Search for the pattern
match = regex.search('abcdef')

This method returns a match object, which contains information about the match. If the pattern is not found, the method returns None.

You can use the group() method of the match object to get the actual matched text:

In [5]:
# Print the matched text
print(match.group())

abc
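Because search() returns None when nothing matches, calling group() on the result without a check raises an AttributeError. A minimal sketch of one way to guard against that; the helper name first_match is hypothetical, not part of the re module:

```python
import re

regex = re.compile('abc')

def first_match(pattern, text):
    """Return the matched text, or None when the pattern is absent."""
    match = pattern.search(text)
    # search() returns None on failure, so check before calling group()
    return match.group() if match is not None else None

found = first_match(regex, 'xxabcxx')
missing = first_match(regex, 'xyz')
print(found)    # abc
print(missing)  # None
```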


### Regex Groups and The Pipe Character
#### Regex Groups
You can define groups in a regex pattern using parentheses. For example, the pattern (abc) defines a group containing the pattern 'abc'.

In [4]:
# This regex has a group
regex = re.compile('(abc)')

The group() method of the match object can take an argument to return the text of a specific group. group(0) returns the entire match, group(1) returns the first group, and so on.
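As a small illustration of group numbering, the sketch below uses a pattern with two groups. The pattern and the sample string are made up for this example:

```python
import re

# Two groups: 'abc' is group 1, 'def' is group 2
regex = re.compile('(abc)(def)')
match = regex.search('xabcdefx')

whole = match.group(0)   # the entire match
first = match.group(1)   # text of the first group
second = match.group(2)  # text of the second group
both = match.groups()    # all groups as a tuple

print(whole, first, second)  # abcdef abc def
print(both)                  # ('abc', 'def')
```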

#### The Pipe Character

The pipe character | represents "or". It matches either the pattern before it or the pattern after it.

In [None]:
# This regex matches either 'abc' or 'def'
regex = re.compile('abc|def')

At each position in the string, the alternatives are tried from left to right, and the first one that succeeds is used.
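A short sketch of the pipe in action, with sample strings chosen for illustration. Note that search() reports the leftmost match in the string, whichever alternative produced it. Parentheses can limit the pipe to part of a pattern:

```python
import re

regex = re.compile('abc|def')

# 'def' appears first in the string, so it is the match that search() finds
leftmost = regex.search('xx def abc').group()
print(leftmost)  # def

# A group restricts the alternation to a single letter
gray_regex = re.compile('gr(a|e)y')
spelling = gray_regex.search('a grey cat')
print(spelling.group())   # grey
print(spelling.group(1))  # e
```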

### Repetition in Regex Patterns and Greedy/Nongreedy Matching
#### Repetition in Regex Patterns
There are several characters that can represent repetition:
* *: Zero or more times.
* +: One or more times.
* ?: Zero or one time.
* {n}: Exactly n times.
* {n,}: n or more times.
* {,m}: m or fewer times.
* {n,m}: Between n and m times.

Here's an example:

In [None]:
# This regex matches one or more 'a's
regex = re.compile('a+')
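The quantifiers listed above can be tried out with a few throwaway patterns. This sketch uses fullmatch(), which succeeds only when the whole string matches, to make the bounds visible. The variable names are illustrative:

```python
import re

exactly_three = re.compile('a{3}')
two_to_four = re.compile('a{2,4}')
optional_b = re.compile('ab?c')

# {3} requires exactly three repetitions
three_ok = exactly_three.fullmatch('aaa') is not None
two_fails = exactly_three.fullmatch('aa') is None

# {2,4} stops after four repetitions even when more are available
capped = two_to_four.search('aaaaaa').group()

# ? allows zero or one 'b', but not two
optional_results = [optional_b.fullmatch(s) is not None
                    for s in ('ac', 'abc', 'abbc')]

print(three_ok, two_fails)  # True True
print(capped)               # aaaa
print(optional_results)     # [True, True, False]
```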

#### Greedy and Nongreedy Matching

By default, quantifiers in Python's regexes are greedy: they match as much text as possible. You can make a quantifier nongreedy (matching as little text as possible) by following it with a ?.

In [None]:
# This regex matches as few 'a's as possible
regex = re.compile('a+?')
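To see the difference side by side, this sketch runs a greedy and a nongreedy pattern over the same text. The angle-bracket example is a common illustration and is not meant as a way to parse HTML:

```python
import re

greedy = re.compile('a+')
nongreedy = re.compile('a+?')

print(greedy.search('aaaa').group())     # aaaa
print(nongreedy.search('aaaa').group())  # a

# The same contrast with the dot: greedy runs to the last '>',
# nongreedy stops at the first one
tag_greedy = re.compile('<.+>')
tag_nongreedy = re.compile('<.+?>')
text = '<b>bold</b>'

print(tag_greedy.search(text).group())     # <b>bold</b>
print(tag_nongreedy.search(text).group())  # <b>
```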

### Regex Character Classes and the findall() Method
#### Regex Character Classes
A character class matches any character in the class. Here are some predefined character classes:

* \d: Any digit (0-9).
* \D: Any non-digit.
* \s: Any whitespace character.
* \S: Any non-whitespace character.
* \w: Any word character (letters, digits, and the underscore).
* \W: Any non-word character.

In [None]:
# This regex matches any digit; the raw string (r'...') avoids an invalid-escape warning
regex = re.compile(r'\d')
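The predefined classes combine with the repetition characters from the previous section. A brief sketch with made-up sample strings:

```python
import re

# A word, one whitespace character, then a number
word_then_number = re.compile(r'\w+\s\d+')
order = word_then_number.search('order 66 placed').group()
print(order)  # order 66

# The uppercase classes match the opposite of the lowercase ones
non_digits = re.compile(r'\D+').search('123abc').group()
non_space = re.compile(r'\S+').search('  hi there').group()
print(non_digits)  # abc
print(non_space)   # hi
```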

#### The findall() Method

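The findall() method returns all non-overlapping matches of a pattern in a string, as a list of strings in the order they were found. If the pattern contains groups, findall() returns the text of the groups instead (as tuples when there is more than one group).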

In [None]:
# Find all matches
matches = regex.findall('123 abc 456 def')
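The sketch below shows what findall() returns for the string used above, first without groups and then with two groups. The grouped pattern is an illustrative addition:

```python
import re

text = '123 abc 456 def'

# No groups: a list of matched strings
single_digits = re.compile(r'\d').findall(text)
digit_runs = re.compile(r'\d+').findall(text)
print(single_digits)  # ['1', '2', '3', '4', '5', '6']
print(digit_runs)     # ['123', '456']

# Two groups: a list of tuples, one tuple per match
pairs = re.compile(r'(\d+) (\w+)').findall(text)
print(pairs)  # [('123', 'abc'), ('456', 'def')]
```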

### Regex Dot-Star and The Caret/Dollar Characters
#### The Dot-Star
In regex, the dot . matches any character except a newline. The star * matches zero or more repetitions of the preceding element. Together, .* matches any run of characters that does not contain a newline, including the empty string.

In [None]:
# This regex matches any number of any characters
regex = re.compile('.*')
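Because the dot skips newlines, .* stops at the end of the first line. Passing re.DOTALL as the second argument to re.compile() lets the dot match newlines as well. A quick sketch with a two-line sample string:

```python
import re

text = 'first line\nsecond line'

# Default behavior: the match ends at the newline
one_line = re.compile('.*').search(text).group()
print(one_line)  # first line

# With re.DOTALL the dot also matches '\n', so the whole string matches
everything = re.compile('.*', re.DOTALL).search(text).group()
print(everything == text)  # True
```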

#### The Caret and Dollar Characters

The caret ^ matches at the start of the string, and the dollar $ matches at the end of the string (or just before a trailing newline).

In [None]:
# This regex matches a string that starts with 'abc' and ends with 'def'
regex = re.compile('^abc.*def$')
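With both anchors in place, search() succeeds only when the entire string fits the pattern. The sample strings in this sketch are chosen to show one success and two failures:

```python
import re

regex = re.compile('^abc.*def$')

samples = ['abc123def', 'xabc123def', 'abc123defx']
# Convert each result to True/False: a match object is truthy, None is falsy
results = [bool(regex.search(s)) for s in samples]
print(results)  # [True, False, False]
```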

### Regex sub() Method and Verbose Mode
#### The sub() Method
The sub() method replaces all matches of a pattern with a replacement string:

In [None]:
# Replace all matches
regex = re.compile(r'\d')
new_string = regex.sub('X', '123 abc 456 def')
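Here is the same replacement with its result shown, followed by two variations that the text above does not cover: the optional count argument, which limits the number of replacements, and backreferences such as \1 in the replacement string, which insert the text of a group:

```python
import re

text = '123 abc 456 def'
regex = re.compile(r'\d')

# Every digit is replaced
all_replaced = regex.sub('X', text)
print(all_replaced)  # XXX abc XXX def

# count=1 replaces only the first match
first_only = regex.sub('X', text, count=1)
print(first_only)  # X23 abc 456 def

# \1 and \2 refer to the first and second groups of each match
swapped = re.compile(r'(\d+) (\w+)').sub(r'\2 \1', text)
print(swapped)  # abc 123 def 456
```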

#### Verbose Mode

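Verbose mode allows you to write regex that's easier to read and understand. Whitespace in the pattern is ignored and a # starts a comment, so you can spread the pattern over several lines and annotate each part. To match a literal space or # in verbose mode, escape it or place it in a character class.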

In [None]:
# This regex is in verbose mode
regex = re.compile('''
    ^    # Start of the string
    abc  # 'abc'
    .*   # Any number of any characters
    def  # 'def'
    $    # End of the string
''', re.VERBOSE)
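A verbose pattern behaves the same as its compact form. This sketch checks that on a few sample strings and also shows how a literal space can be kept in verbose mode by placing it in a character class. The sample strings are illustrative:

```python
import re

verbose_regex = re.compile(r'''
    ^    # Start of the string
    abc  # 'abc'
    .*   # Any number of any characters
    def  # 'def'
    $    # End of the string
''', re.VERBOSE)
compact_regex = re.compile('^abc.*def$')

samples = ['abc123def', 'abcdef', 'xabcdef', 'abc def ']
verbose_results = [bool(verbose_regex.search(s)) for s in samples]
compact_results = [bool(compact_regex.search(s)) for s in samples]
print(verbose_results)  # [True, True, False, False]
print(compact_results)  # [True, True, False, False]

# Unescaped whitespace is ignored, so use [ ] for a literal space
spaced = re.compile(r'abc [ ] def', re.VERBOSE)
print(bool(spaced.search('abc def')))  # True
print(bool(spaced.search('abcdef')))   # False
```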