### **Regular Expressions**
Regular expressions (regex) are patterns for matching and manipulating text. Python supports them through the built-in `re` module. By combining literal characters with metacharacters (characters such as `.`, `*`, and `^` that carry special meaning), you can search for, extract, validate, split, and substitute text with a single compact expression. Regular expressions are a core tool for data validation, parsing, and text processing.

Common Regex Patterns and Rules:

Literal Characters: Match the exact characters in the pattern.

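Example: `pattern = r"abc"` matches the three-character sequence "abc" wherever it occurs, so `re.search` also finds it inside "xabcx". The `r` prefix creates a raw string, which passes backslashes to the `re` module unchanged.

Character Classes: Match any single character listed inside the square brackets.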

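- `[aeiou]` matches any single lowercase vowel.
- `[A-Z]` matches any uppercase ASCII letter (a hyphen defines a range).
- `[0-9]` matches any ASCII digit.
- `[^0-9]` matches any character except a digit (a `^` right after `[` negates the class).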
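The following minimal sketch runs each character class above through `re.findall` and `re.sub`; the sample strings are invented for illustration.

```python
import re

text = "Order #A7 shipped on day 3; order #b12 is pending."

# [aeiou] matches one lowercase vowel at a time (the class is case-sensitive)
vowels = re.findall(r"[aeiou]", "regex")

# [A-Z] matches one uppercase ASCII letter
capitals = re.findall(r"[A-Z]", text)

# [0-9]+ matches a run of one or more digits
numbers = re.findall(r"[0-9]+", text)

# [^0-9] matches any non-digit; replacing those with "" keeps only the digits
digits_only = re.sub(r"[^0-9]", "", "Tel: 555-0199")

print(vowels)       # ['e', 'e']
print(capitals)     # ['O', 'A']
print(numbers)      # ['7', '3', '12']
print(digits_only)  # 5550199
```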

Shorthand Character Classes:

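- `\d` matches any decimal digit (`[0-9]` for ASCII text).
- `\D` matches any character that is not a digit.
- `\w` matches any word character: letters, digits, and the underscore (`[a-zA-Z0-9_]` for ASCII text).
- `\W` matches any non-word character.
- `\s` matches any whitespace character (`[ \t\n\r\f\v]` for ASCII text).
- `\S` matches any non-whitespace character.

For `str` patterns these classes are Unicode-aware by default (for example, `\w` also matches `é`); pass the `re.ASCII` flag to restrict them to the ASCII sets shown in parentheses.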
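A short sketch of the shorthand classes, including how the `re.ASCII` flag changes `\w` for non-ASCII text (in Python 3, `str` patterns are Unicode-aware by default). The sample log line is made up.

```python
import re

# "\u00e9" is the accented letter 'é'
text = "user_42 logged in at 09:15\tfrom caf\u00e9"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of letters, digits, and underscores
spaces = re.findall(r"\s", text)    # each whitespace character (spaces and the tab)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so 'é' is no longer a word character
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(digits)           # ['42', '09', '15']
print(words[-1])        # café
print(ascii_words[-1])  # caf
print(len(spaces))      # 6
```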

Anchors:

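- `^` asserts the start of the string, or the start of every line when the `re.MULTILINE` flag is set.
- `$` asserts the end of the string (it also matches just before a final newline), or the end of every line with `re.MULTILINE`.
- `\b` asserts a word boundary: a position between a word character (`\w`) and a non-word character, or between a word character and the start or end of the string.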
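This sketch shows how `^` and `$` behave with and without `re.MULTILINE`, and why `re.fullmatch` is stricter than `^...$` for validation. The log text is hypothetical.

```python
import re

log = "start: ok\nerror: disk\nerror: net"

# Without flags, ^ anchors only to the very start of the string
no_flag = re.findall(r"^error", log)

# With re.MULTILINE, ^ and $ also anchor at the start and end of each line
multi = re.findall(r"^error: (\w+)$", log, flags=re.MULTILINE)

# $ also matches just before a trailing newline, so "123\n" passes ^\d+$ ...
trailing = re.search(r"^\d+$", "123\n") is not None
# ... while re.fullmatch() requires the pattern to cover the entire string
strict = re.fullmatch(r"\d+", "123\n") is not None

print(no_flag)   # []
print(multi)     # ['disk', 'net']
print(trailing)  # True
print(strict)    # False
```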

Quantifiers:

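- `*` matches zero or more occurrences of the preceding element.
- `+` matches one or more occurrences.
- `?` matches zero or one occurrence (the element is optional).
- `{n}` matches exactly n occurrences.
- `{n,}` matches n or more occurrences.
- `{n,m}` matches between n and m occurrences, inclusive.
- Quantifiers are greedy by default and match as much text as possible; appending `?` (`*?`, `+?`, `??`, `{n,m}?`) makes them lazy, so they match as little as possible.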
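A small sketch of the quantifiers, including the difference between greedy `.*` and lazy `.*?` (the lazy form is the one used in the HTML-stripping exercise later in this notebook). The inputs are illustrative only.

```python
import re

sample = "7 42 365 2024"

# {2,3}: two or three digits; \b keeps the match from landing inside a longer number
between = re.findall(r"\b\d{2,3}\b", sample)
# {4}: exactly four digits
exact = re.findall(r"\b\d{4}\b", sample)
# ?: the preceding 'u' is optional
optional = re.findall(r"colou?r", "color colour")

html = "<b>one</b><b>two</b>"
# Greedy .* runs on to the LAST "</b>" it can reach
greedy = re.findall(r"<b>.*</b>", html)
# Lazy .*? stops at the FIRST "</b>" that lets the match succeed
lazy = re.findall(r"<b>.*?</b>", html)

print(between)   # ['42', '365']
print(exact)     # ['2024']
print(optional)  # ['color', 'colour']
print(greedy)    # ['<b>one</b><b>two</b>']
print(lazy)      # ['<b>one</b>', '<b>two</b>']
```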

Grouping and Alternation:

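- Parentheses `()` group part of a pattern so that a quantifier applies to the whole group (e.g., `(ha)+`); they also capture the matched text, which you can retrieve with `match.group(1)`, `match.group(2)`, and so on.
- The pipe `|` denotes alternation (logical OR): `cat|dog` matches either word.

Escaping Special Characters:

- Use a backslash `\` to match a metacharacter literally. For example, `\.` matches a period, whereas an unescaped `.` matches any character except a newline. The same applies to `*`, `+`, `?`, `^`, `$`, `|`, `(`, `)`, `[`, `{`, and `\` itself.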
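The sketch below combines grouping, alternation, and escaping; the file names and price string are invented for the example. It also uses `re.escape`, a standard `re` function that escapes the metacharacters in a literal string.

```python
import re

# Grouping lets a quantifier repeat a whole sub-pattern: (ha)+ means "ha" one or more times
laugh_ok = re.fullmatch(r"(ha)+!", "hahaha!") is not None

# Alternation inside a group matches either extension and captures which one matched
m = re.search(r"report\.(pdf|docx)", "Send report.docx today")
whole, extension = m.group(0), m.group(1)

# Escaped \. requires a literal period, so "report-pdf" does not match ...
escaped = re.search(r"report\.(pdf|docx)", "report-pdf")
# ... but an unescaped . accepts any character, including '-'
unescaped_ok = re.search(r"report.(pdf|docx)", "report-pdf") is not None

# re.escape() backslash-escapes the metacharacters in a literal string for you
price_pattern = re.escape("$9.99 (USD)")
price_ok = re.search(price_pattern, "Total: $9.99 (USD)") is not None

print(laugh_ok)          # True
print(whole, extension)  # report.docx docx
print(escaped)           # None
print(unescaped_ok)      # True
print(price_ok)          # True
```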

In [None]:
"""
Objective: Use a regular expression to check if a string contains the word 'Python'.
"""
import re

text = "I am learning Python programming."

# Define the pattern to search for
pattern = r"Python"

# Search for the pattern in the text
match = re.search(pattern, text)

if match:
    print("Match found!")
else:
    print("No match found.")

# TODO: Modify the pattern to make the search case-insensitive.


In [None]:
"""
Objective: Use a regular expression to find all email addresses in a given text.
"""
import re

text = """
Please contact us at support@example.com for further information.
You can also reach out to sales@example.org or admin@example.net.
"""

# Define the email pattern
pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

# Find all matches in the text
emails = re.findall(pattern, text)

print("Extracted email addresses:", emails)

# TODO: Modify the pattern to exclude email addresses with certain domains (e.g., example.org).


In [None]:
"""
Objective: Write a regular expression to validate phone numbers in the format (123) 456-7890.
"""
import re

phone_numbers = [
    "(123) 456-7890",
    "(987) 654-3210",
    "123-456-7890",
    "(123)456-7890"
]

# Define the phone number pattern
pattern = r"\(\d{3}\) \d{3}-\d{4}"

for number in phone_numbers:
    # fullmatch() requires the whole string to match; match() only checks the start
    if re.fullmatch(pattern, number):
        print(f"{number} is a valid phone number.")
    else:
        print(f"{number} is not a valid phone number.")

# TODO: Extend the pattern to match phone numbers with an optional country code, e.g., +1 (123) 456-7890.


In [None]:
"""
Objective: Use regular expressions to split a string by commas, semicolons, or spaces.
"""
import re

text = "apple, banana; orange grape,pear;melon"

# Define the pattern for delimiters
pattern = r"[,\s;]+"

# Split the text based on the pattern
fruits = re.split(pattern, text)

print("List of fruits:", fruits)

# TODO: Modify the pattern to also split on colons (:) and periods (.).


In [None]:
"""
Objective: Use regular expressions to replace all occurrences of 'cat' with 'dog' in a given text.
"""
import re

text = "The cat sat on the mat. The cat is cute."

# Define the pattern to search for
pattern = r"cat"

# Replace 'cat' with 'dog'
new_text = re.sub(pattern, "dog", text)

print("Updated text:", new_text)

# TODO: Modify the pattern to replace 'cat' only when it appears as a whole word.


In [None]:
"""
Objective: Write a regular expression to extract dates in the format DD/MM/YYYY from a text.
"""
import re

text = """
John's birthday is on 12/05/1990.
The project deadline is 30/09/2025.
"""

# Define the date pattern
pattern = r"\b\d{2}/\d{2}/\d{4}\b"

# Find all dates in the text
dates = re.findall(pattern, text)

print("Extracted dates:", dates)

# TODO: Modify the pattern to extract dates in the format YYYY-MM-DD as well.


In [None]:
"""
Objective: Use regular expression groups to extract the area code and main number from phone numbers.
"""
import re

phone_number = "(123) 456-7890"

# Define the pattern with groups
pattern = r"\((\d{3})\) (\d{3}-\d{4})"

# Search for the pattern in the phone number
match = re.search(pattern, phone_number)

if match:
    area_code = match.group(1)
    main_number = match.group(2)
    print("Area Code:", area_code)
    print("Main Number:", main_number)
else:
    print("No match found.")

# TODO: Modify the pattern to handle phone numbers with or without parentheses around the area code.


In [None]:
"""
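Objective: Compile a regular expression pattern once and reuse it across several strings (the re module also caches recently used patterns, so the main gain is often readability rather than speed).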
"""
import re

texts = [
    "Error: File not found.",
    "Warning: Low disk space.",
    "Error: Access denied."
]

# Compile the pattern
pattern = re.compile(r"Error: (.+)")

for text in texts:
    match = pattern.search(text)
    if match:
        print("Error message:", match.group(1))

# TODO: Add a pattern to also capture 'Warning' messages.


In [None]:
"""
Objective: Use lookbehind and lookahead assertions to find numbers preceded or followed by specific characters.
"""
import re

text = "The price is $100. The discount is 20%."

# (?<=\$) is a lookbehind: it requires a '$' immediately before the digits
# but does not include the '$' in the match
pattern = r"(?<=\$)\d+"

# Find all matches in the text
prices = re.findall(pattern, text)

print("Prices found:", prices)

# TODO: Modify the pattern to also find percentages (numbers followed by '%').


In [None]:
"""
Objective: Write a regular expression to remove HTML tags from a string.
"""
import re

html = "<p>This is a <b>bold</b> paragraph.</p>"

# Define the pattern to match HTML tags; the lazy .*? stops each match at the first '>'
pattern = r"<.*?>"

# Remove HTML tags
clean_text = re.sub(pattern, "", html)

print("Cleaned text:", clean_text)

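# TODO: The lazy pattern already strips nested tags one at a time. Find an input where it
# fails (e.g., a '>' inside an attribute value) and consider why an HTML parser such as
# html.parser is more reliable.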


### **Reflection**
Reflect on how regular expressions can simplify complex string processing tasks. Consider these questions:

- How do regular expressions improve the efficiency of text searching and manipulation?
- What are the potential pitfalls of using overly complex regular expressions?
- How can you ensure that your regular expressions are both efficient and maintainable?

(answer here)

### **Exploration**
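For further exploration, research advanced regular expression features such as named groups, non-capturing groups, and recursive patterns (Python's built-in `re` module does not support recursive patterns, but the third-party `regex` package does). Additionally, explore techniques for optimizing regex performance.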
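To get started on the first two features, here is a minimal sketch using the `(?P<name>...)` named-group syntax and `(?:...)` non-capturing groups supported by the `re` module; the date and word samples are illustrative.

```python
import re

# Named groups: (?P<name>...) lets you read each capture by name instead of by number
date_pattern = re.compile(r"(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})")
date_match = date_pattern.search("The project deadline is 30/09/2025.")
year = date_match.group("year")
parts = date_match.groupdict()

# Non-capturing groups: (?:...) groups without capturing. Because findall()
# returns group contents whenever a pattern has capturing groups, this
# changes what it returns.
text = "catfish dogfish cat"
capturing = re.findall(r"(cat|dog)fish", text)
non_capturing = re.findall(r"(?:cat|dog)fish", text)

print(year)           # 2025
print(parts)          # {'day': '30', 'month': '09', 'year': '2025'}
print(capturing)      # ['cat', 'dog']
print(non_capturing)  # ['catfish', 'dogfish']
```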