# Regular Expressions

### 1.1. Basic Pattern Matching

Regular expressions are used to search for specific patterns in text. You define a pattern using a combination of characters, metacharacters, and special sequences.

In [37]:
import re

In [49]:
string =  "The curious cat, known for its agility and independence, prowled the garden in search of adventure. With a swift leap, the cat pounced on a fallen leaf, a momentary victory in its playful escapade. Nearby, a second cat observed the scene, its eyes fixed on the first cat's antics. Suddenly, a third cat appeared, joining the playful duo, and together they formed a trio of frolicking felines. As the sun began to set, the cats' energy waned, and they settled down for a well-deserved catnap under the shade of a tall oak tree."
pattern = "The"
match = re.match(pattern, string)

print(f"Type of pattern: {type(pattern)}")
print(f"Pattern: {pattern}")
print(f"Type of match: {type(match)}")
print(f"Match: {match}")
print(match.start(), match.end(), match.group())

Type of pattern: <class 'str'>
Pattern: The
Type of match: <class 're.Match'>
Match: <re.Match object; span=(0, 3), match='The'>
0 3 The


In [50]:
# re.match() only checks for a match at the beginning of the string, while re.search() scans through the entire string

string = "The curious cat, known for its agility and independence, prowled the garden in search of adventure. With a swift leap, the cat pounced on a fallen leaf, a momentary victory in its playful escapade. Nearby, a second cat observed the scene, its eyes fixed on the first cat's antics. Suddenly, a third cat appeared, joining the playful duo, and together they formed a trio of frolicking felines. As the sun began to set, the cats' energy waned, and they settled down for a well-deserved catnap under the shade of a tall oak tree."
pattern = "cat"
match = re.match(pattern, string)
print(f"Match: {match}")
search = re.search(pattern, string)
print(f"Search: {search}")

Match: None
Search: <re.Match object; span=(12, 15), match='cat'>
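The difference between the matching functions is easier to see on a short string. The sketch below uses a made-up sentence (not the one above) and also includes `re.fullmatch()`, which only succeeds when the whole string matches the pattern.

```python
import re

text = "the cat sat"  # illustrative string, chosen for this sketch

# re.match() tries the pattern only at position 0.
at_start = re.match("cat", text)
# re.search() scans forward and returns the first match anywhere.
anywhere = re.search("cat", text)
# re.fullmatch() requires the entire string to match the pattern.
whole = re.fullmatch("cat", text)

print(at_start)         # None
print(anywhere.span())  # (4, 7)
print(whole)            # None
```

All three return either a match object or `None`, so the result can be tested directly in an `if` statement.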


In [52]:
# re.compile()

string = "The curious cat, known for its agility and independence, prowled the garden in search of adventure. With a swift leap, the cat pounced on a fallen leaf, a momentary victory in its playful escapade. Nearby, a second cat observed the scene, its eyes fixed on the first cat's antics. Suddenly, a third cat appeared, joining the playful duo, and together they formed a trio of frolicking felines. As the sun began to set, the cats' energy waned, and they settled down for a well-deserved catnap under the shade of a tall oak tree."
pattern = re.compile("The")
match = re.match(pattern, string)

print(f"Type of pattern: {type(pattern)}")
print(f"Pattern: {pattern}")
print(f"Type of match: {type(match)}")
print(f"Match: {match}")
print(match.start(), match.end(), match.group())

Type of pattern: <class 're.Pattern'>
Pattern: re.compile('The')
Type of match: <class 're.Match'>
Match: <re.Match object; span=(0, 3), match='The'>
0 3 The
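A compiled pattern can also be used through its own methods instead of being passed to the module-level functions. The following is a minimal sketch with an illustrative pattern and string; `word` and `text` are names chosen for this example.

```python
import re

word = re.compile(r"[a-z]+")  # one or more lowercase letters
text = "one two three"

first = word.match(text)          # equivalent to re.match(word, text)
all_words = word.findall(text)    # equivalent to re.findall(word, text)
# Pattern methods additionally accept a start position.
from_four = word.search(text, 4)

print(first.group())      # one
print(all_words)          # ['one', 'two', 'three']
print(from_four.group())  # two
```

Compiling once is mostly useful when the same pattern is applied many times, since the pattern object can be stored and reused.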


In [48]:
# re.findall()

string = "The curious cat, known for its agility and independence, prowled the garden in search of adventure. With a swift leap, the cat pounced on a fallen leaf, a momentary victory in its playful escapade. Nearby, a second cat observed the scene, its eyes fixed on the first cat's antics. Suddenly, a third cat appeared, joining the playful duo, and together they formed a trio of frolicking felines. As the sun began to set, the cats' energy waned, and they settled down for a well-deserved catnap under the shade of a tall oak tree."

pattern = re.compile("cat")
matches = re.findall(pattern, string)

if matches:
    print(f"Found: {matches}")
else:
    print("Pattern not found.")

Found: ['cat', 'cat', 'cat', 'cat', 'cat', 'cat', 'cat']
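Note that the plain pattern `cat` also matches inside longer words such as "cats" and "catnap". As a sketch on a shorter, made-up sentence, the word-boundary sequence `\b` restricts the match to the standalone word, and `re.finditer()` yields match objects when the positions are needed as well.

```python
import re

text = "A cat took a catnap while two cats watched."  # illustrative string

# Plain substring pattern: also matches inside "catnap" and "cats".
loose = re.findall("cat", text)
# \b marks a word boundary, so only the standalone word matches.
whole_words = re.findall(r"\bcat\b", text)
# finditer() returns match objects, which carry positions.
positions = [(m.start(), m.end()) for m in re.finditer("cat", text)]

print(loose)        # ['cat', 'cat', 'cat']
print(whole_words)  # ['cat']
print(positions)    # [(2, 5), (13, 16), (30, 33)]
```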


In [58]:
# split text based on pattern

string = "coconut, durian, papaya; lychee"
pattern = re.compile("[,;]")
texts = re.split(pattern, string)
texts

['coconut', ' durian', ' papaya', ' lychee']
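The items above keep the space that followed each delimiter. One possible way to get clean items, sketched below, is to let the pattern consume optional whitespace (`\s*`) around the delimiter; stripping each piece afterwards is an alternative.

```python
import re

string = "coconut, durian, papaya; lychee"

# Option 1: include surrounding whitespace in the delimiter pattern.
items = re.split(r"\s*[,;]\s*", string)
# Option 2: split on the delimiter only, then strip each piece.
stripped = [piece.strip() for piece in re.split(r"[,;]", string)]

print(items)     # ['coconut', 'durian', 'papaya', 'lychee']
print(stripped)  # ['coconut', 'durian', 'papaya', 'lychee']
```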

### 1.2. Metacharacters & Character classes

Metacharacters like `.`, `*`, `+`, `?`, `[]`, `()`, `|`, `^`, and `$` have special meanings in regular expressions and are used to specify the behavior of the pattern.
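As a quick illustration of several of these metacharacters, the sketch below runs a few small patterns over a made-up string of words. The string and variable names are chosen for this example only.

```python
import re

words = "ct cat caat dog dig colour color"  # illustrative string

any_char = re.findall(r"c.t", words)       # . matches exactly one character
zero_or_more = re.findall(r"ca*t", words)  # * means zero or more 'a'
one_or_more = re.findall(r"ca+t", words)   # + means one or more 'a'
optional = re.findall(r"colou?r", words)   # ? makes the 'u' optional
either = re.findall(r"dog|dig", words)     # | means one alternative or the other
# With a group, findall() returns the group's content, not the whole match.
grouped = re.findall(r"d(o|i)g", words)

print(any_char)      # ['cat']
print(zero_or_more)  # ['ct', 'cat', 'caat']
print(one_or_more)   # ['cat', 'caat']
print(optional)      # ['colour', 'color']
print(either)        # ['dog', 'dig']
print(grouped)       # ['o', 'i']
```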

- **Square brackets (`[]`)** define a character class, which matches any one character from a set. For example, `[aeiou]` matches any lowercase vowel.

In [82]:
pattern = "[aeiou]"
string = "The quick brown fox jumps over the lazy dog."
vowels = re.findall(pattern, string)
print(f"Vowels: {vowels}")

Vowels: ['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o']


In [96]:
# Matching a range of characters
string = "Call me by this number +79131234567"
pattern = r"[+0-9]"  # inside a character class, + is literal and needs no escaping
matches = re.findall(pattern, string)

if matches:
    print("".join(matches))

+79131234567


In [None]:
# EXERCISE
# Use the re library to count the number of vowels and the number of consonants in this string
string = "The quick brown fox jumps over the lazy dog."

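- **Anchors**: Anchors such as `^` and `$` specify where in the text a match must occur. By default, `^` matches at the start of the string and `$` matches at the end of the string (or just before a trailing newline). With the `re.MULTILINE` flag, they also match at the start and end of each line.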

In [98]:
import re

def is_valid_variable_name(variable_name):
    # Define the regular expression pattern
    pattern = re.compile("^[a-zA-Z_][a-zA-Z0-9_]*$")

    # re.match() starts at the beginning of the string; the ^ and $ anchors make the pattern cover the whole name
    match = re.match(pattern, variable_name)

    return match is not None

# Test the function with some variable names
variable_names = ["my_variable", "123invalid", "_underscore", "name_with$pecial_chars"]

for name in variable_names:
    if is_valid_variable_name(name):
        print(f"'{name}' is valid.")
    else:
        print(f"'{name}' is invalid.")


'my_variable' is valid.
'123invalid' is invalid.
'_underscore' is valid.
'name_with$pecial_chars' is invalid.
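One subtlety of the `$` anchor is that it also matches just before a newline at the very end of the string, so the pattern above accepts a name followed by `"\n"`. If that matters, `re.fullmatch()` is one possible stricter alternative. The sketch below compares the two; the `strict` pattern is a variant written for this example, not part of the function above.

```python
import re

anchored = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
strict = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")  # same pattern without anchors

with_newline = "my_variable\n"

# $ matches before the trailing newline, so this still counts as a match.
anchored_ok = anchored.match(with_newline) is not None
# fullmatch() must consume every character, including the newline.
strict_ok = strict.fullmatch(with_newline) is not None
plain_ok = strict.fullmatch("my_variable") is not None

print(anchored_ok)  # True
print(strict_ok)    # False
print(plain_ok)     # True
```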


- **Flags**: Flags modify how a pattern is matched. For example, `re.IGNORECASE` makes matching case-insensitive.

In [111]:
# Sample text containing words in different case variations
text = "The quick brown Fox jumped over the LAZY Dog."

# Pattern for the lowercase word "fox"; case-insensitivity comes from the flag passed below
pattern = r'fox'

# Use re.findall() without the flag
matches_without_flag = re.findall(pattern, text)
print("Without IGNORECASE flag:")
print(matches_without_flag)

# Use re.findall() with the IGNORECASE flag
matches_with_flag = re.findall(pattern, text, flags=re.IGNORECASE)
print("\nWith IGNORECASE flag:")
print(matches_with_flag)


Without IGNORECASE flag:
[]

With IGNORECASE flag:
['Fox']
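Flags can be combined with the `|` operator, and a flag can also be written inside the pattern itself. The sketch below uses a made-up three-line string to show `re.IGNORECASE` together with `re.MULTILINE`, and the inline form `(?i)`.

```python
import re

text = "Fox one\nfox two\nFOX three"  # illustrative multi-line string

# Without MULTILINE, ^ matches only at the start of the whole string.
start_only = re.findall(r"^fox", text, flags=re.IGNORECASE)
# Combining flags: ^ now matches at the start of every line.
every_line = re.findall(r"^fox", text, flags=re.IGNORECASE | re.MULTILINE)
# Inline flag: (?i) at the start of the pattern turns on IGNORECASE.
inline = re.findall(r"(?i)fox", text)

print(start_only)  # ['Fox']
print(every_line)  # ['Fox', 'fox', 'FOX']
print(inline)      # ['Fox', 'fox', 'FOX']
```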

