# <font color="#418FDE" size="6.5" uppercase>**Using Python `re`**</font>

>Last update: 20251224.
    
By the end of this lecture, you will be able to:
- Apply `re.match`, `re.search`, `re.findall`, and `re.finditer` to solve basic text-search tasks. 
- Use compiled regex objects to improve clarity and reuse of patterns in Python code. 
- Control regex behavior with common flags such as `re.IGNORECASE` and `re.MULTILINE`. 


## **1. Core re Search Functions**

### **1.1. Match vs Search**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_C/image_01_01.jpg?v=1766631340" width="250">



>* Match checks pattern only at string start
>* Useful for validating inputs with strict starting format

>* Search scans forward for first matching pattern
>* Great for finding patterns appearing anywhere in text

>* Choose start-only or anywhere search carefully
>* Use start-only for validation, search for discovery



In [None]:
#@title Python Code - Match vs Search

# Demonstrate difference between match and search functions.
# Show how match checks only string beginnings.
# Show how search scans entire string content.

import re  # Import regular expression module for pattern matching.

text_example = "User: Alice, status: active"  # Example text string for demonstration.
pattern_status = r"status"  # Simple pattern that appears later inside text.

match_result = re.match(pattern_status, text_example)  # Try matching at string beginning.
search_result = re.search(pattern_status, text_example)  # Try searching anywhere inside string.

print("Text being checked:", text_example)  # Display the example text for clarity.
print("Pattern being used:", pattern_status)  # Display the pattern used for matching.

print("Result using re.match:", match_result)  # Prints None: the text does not start with "status".
print("Result using re.search:", bool(search_result))  # Show that search successfully finds pattern.

if search_result:  # Check if search_result is a successful match object.
    print("Search found at index:", search_result.start())  # Show starting index of found pattern.
    print("Matched substring from search:", search_result.group())  # Show exact matched substring.
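
The validation point above has one subtlety worth a short sketch: `re.match` only anchors the start, so trailing junk still passes. The username rule and the sample inputs below are hypothetical, chosen only to contrast `re.match`, `re.fullmatch`, and `re.search`.

```python
import re

# Hypothetical validation rule: a username is 3 to 12 lowercase letters.
username_pattern = r"[a-z]{3,12}"

candidates = ["alice", "alice!!", "  alice"]
results = {}

for candidate in candidates:
    starts_ok = re.match(username_pattern, candidate) is not None  # Start of string only.
    whole_ok = re.fullmatch(username_pattern, candidate) is not None  # Entire string must match.
    found_anywhere = re.search(username_pattern, candidate) is not None  # Anywhere in string.
    results[candidate] = (starts_ok, whole_ok, found_anywhere)
    print(f"{candidate!r}: match={starts_ok}, fullmatch={whole_ok}, search={found_anywhere}")
```

Here `"alice!!"` passes `re.match` but fails `re.fullmatch`, so `re.fullmatch` is usually the safer choice when the whole input must fit the format.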




### **1.2. Using findall vs finditer**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_C/image_01_02.jpg?v=1766631372" width="250">



>* One tool returns all matched text at once
>* Great for quick extraction when positions don't matter

>* Iterator returns match objects with detailed information
>* Supports context, large texts, and memory efficiency

>* Use list results for quick aggregate processing
>* Use match iterator for positions and flexibility



In [None]:
#@title Python Code - Using findall vs finditer

# Demonstrate difference between findall and finditer clearly.
# Show list of matched strings versus detailed match objects.
# Highlight positions useful for tasks like simple text highlighting.

import re  # Import regular expression module for pattern searching.

text = "Order AA-123 arrived, order BB-456 shipped, order CC-789 pending."  # Example text.
pattern = r"[A-Z]{2}-\d{3}"  # Pattern matching simple product style codes.

codes_list = re.findall(pattern, text)  # Use findall to get list of strings.
print("Using findall, codes list:", codes_list)  # Show collected codes list.

codes_iter = re.finditer(pattern, text)  # Use finditer to get match iterator.
print("\nUsing finditer, detailed matches:")  # Explain upcoming detailed output.

for match in codes_iter:  # Loop through each match object from iterator.
    code = match.group(0)  # Extract matched text from current match object.
    start = match.start()  # Get starting index of current match within text.
    end = match.end()  # Get ending index of current match within text.
    print(f"Code {code} found from index {start} to {end}.")  # Print details.
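
One behavior that often surprises beginners: when the pattern contains capture groups, `re.findall` returns the groups instead of the full match. The sketch below uses a shortened, illustrative version of the order text to compare that with `re.finditer`.

```python
import re

text = "Order AA-123 arrived, order BB-456 shipped."  # Shortened example text.
grouped_pattern = r"([A-Z]{2})-(\d{3})"  # Two capture groups: prefix and number.

# With two groups, findall returns a list of (prefix, number) tuples.
pairs = re.findall(grouped_pattern, text)
print("findall with groups:", pairs)

# finditer still gives access to the full match through group(0).
full_codes = [m.group(0) for m in re.finditer(grouped_pattern, text)]
print("finditer full matches:", full_codes)
```

If both the whole match and its parts are needed, iterating over match objects tends to be the more flexible option.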




### **1.3. Match Object Basics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_C/image_01_03.jpg?v=1766631402" width="250">



>* Match functions return detailed match objects, not strings
>* They store matched text, positions, and captured groups

>* Match objects store each regex group separately
>* Helps build structured data, like email domains

>* Match objects store span to preserve context
>* Enable precise edits, reducing errors in processing



In [None]:
#@title Python Code - Match Object Basics

# Demonstrate basic regex match objects with simple timestamp example.
# Show matched text, its span positions, and captured groups clearly.
# Help beginners see how match objects answer important search questions.

import re  # Import regular expression module for pattern matching.

log_line = "User logged in at 07:45 PM from New York."  # Example log line.
pattern = r"(\d{2}):(\d{2})\s?(AM|PM)"  # Pattern capturing hour, minute, and period.

match = re.search(pattern, log_line)  # Search for timestamp pattern within the log line.

if match is not None:  # Check whether the pattern matched successfully.
    print("Full match:", match.group(0))  # Print entire matched timestamp substring.
    print("Match span:", match.span(0))  # Print start and end index positions for full match.

    print("Hour group:", match.group(1))  # Print captured hour group from timestamp.
    print("Minute group:", match.group(2))  # Print captured minute group from timestamp.

    print("Period group:", match.group(3))  # Print captured AM or PM group clearly.
    before_text = log_line[: match.start(0)]  # Text before timestamp using match start index.

    after_text = log_line[match.end(0) :]  # Text after timestamp using match end index.
    print("Before timestamp:", before_text.strip())  # Show context before timestamp clearly.

    print("After timestamp:", after_text.strip())  # Show context after timestamp clearly.
else:
    print("No timestamp pattern was found in this log line.")  # Inform when no match occurs.
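
The bullets above mention email domains and precise edits. A minimal sketch of both follows, assuming a deliberately simple email pattern and a made-up contact line; `match.groups()` returns all captured groups at once, and `match.span(1)` gives the position of the first group only.

```python
import re

line = "Contact: alice@example.com (primary)"  # Hypothetical input line.
pattern = r"([\w.+-]+)@([\w.-]+)"  # Group 1: local part, group 2: domain.

redacted = line  # Fallback when nothing matches.
domain = None

match = re.search(pattern, line)
if match:
    local_part, domain = match.groups()  # Tuple of all captured groups.
    start, end = match.span(1)  # Span of the local part only.
    redacted = line[:start] + "***" + line[end:]  # Edit exactly that slice.

print("Domain:", domain)
print("Redacted line:", redacted)
```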




## **2. Using Compiled Regex**

### **2.1. Compiling Regex Patterns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_C/image_02_01.jpg?v=1766631424" width="250">



>* Compiled regex turns patterns into reusable tools
>* This improves code clarity, structure, and maintainability

>* Compile once, then reuse pattern methods everywhere
>* Improves readability, modularity, and reduces regex mistakes

>* Treat compiled patterns as named, reusable design elements
>* Centralized compilation reduces duplication and simplifies maintenance



In [None]:
#@title Python Code - Compiling Regex Patterns

# Demonstrate compiling regex patterns for reuse and clarity.
# Compare direct pattern usage with compiled pattern objects.
# Show how compiled patterns simplify repeated text checks.

import re  # Import regular expression module from Python standard library.

emails = ["alice@example.com", "BOB@EXAMPLE.COM", "not-an-email"]
# Define a simple pattern string that roughly matches email-like text.
pattern_text = r"[\w.+-]+@[\w.-]+"  # Raw string avoids escaping backslashes.

print("Using direct pattern with re.search each time:")
for address in emails:
    match = re.search(pattern_text, address)
    print("Address:", address, "->", "matched" if match else "no match")

print("\nUsing compiled pattern object for repeated checks:")
compiled_email_pattern = re.compile(pattern_text)  # Compile pattern once for reuse.
for address in emails:
    match = compiled_email_pattern.search(address)
    print("Address:", address, "->", "matched" if match else "no match")

print("\nCompiled pattern object representation:", compiled_email_pattern)
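
A compiled pattern object offers the same operations as the module-level functions, just as methods. The sketch below assumes a hypothetical constant name, `ORDER_CODE`, and a made-up sample string.

```python
import re

# Hypothetical named pattern; the name documents what the regex means.
ORDER_CODE = re.compile(r"[A-Z]{2}-\d{3}")

text = "AA-123 shipped, BB-456 pending"

source = ORDER_CODE.pattern  # Original pattern string is kept on the object.
first_match = ORDER_CODE.match(text)  # Same as re.match(pattern, text).
first_code = first_match.group() if first_match else None
all_codes = ORDER_CODE.findall(text)  # Same as re.findall(pattern, text).
single_is_valid = ORDER_CODE.fullmatch("AA-123") is not None
whole_is_valid = ORDER_CODE.fullmatch(text) is not None

print("Pattern source:", source)
print("First code:", first_code)
print("All codes:", all_codes)
print("Single code valid:", single_is_valid, "| whole text valid:", whole_is_valid)
```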



### **2.2. Reusing Compiled Patterns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_C/image_02_02.jpg?v=1766631455" width="250">



>* Compile a pattern once, reuse it everywhere
>* Saves processing time and makes code clearer

>* Define compiled patterns once, reuse across code
>* Centralized patterns simplify updates and maintenance

>* Reuse compiled patterns for speed and efficiency
>* Shared patterns improve testing, reliability, and consistency



In [None]:
#@title Python Code - Reusing Compiled Patterns

# Demonstrate compiling one pattern and reusing it many times.
# Show difference between raw pattern strings and compiled pattern objects.
# Keep example simple, clear, and beginner friendly.

import re  # Import regular expression module from Python standard library.

email_pattern_text = r"[\w.+-]+@[\w-]+\.[\w.-]+"  # Define email pattern string.
compiled_email_pattern = re.compile(email_pattern_text)  # Compile pattern once for reuse.

sample_emails = ["alice@example.com", "bob_at_example.com", "carol@test.org"]
print("Checking emails using compiled pattern object:")

for address in sample_emails:  # Loop through each email-like string.
    match = compiled_email_pattern.fullmatch(address)  # Reuse compiled pattern here.
    if match:  # Check whether compiled pattern matched entire string.
        print(f"Valid email found: {address}")
    else:
        print(f"Invalid email found: {address}")

print("\nSame check using raw pattern string repeatedly:")

for address in sample_emails:  # Loop again using direct re.fullmatch calls.
    match = re.fullmatch(email_pattern_text, address)  # Pattern string is looked up in re's internal cache on each call.
    result_text = "valid" if match else "invalid"
    print(f"Email {address} considered {result_text} here.")
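
One way to picture "define once, reuse across code" is a module-level constant shared by several small helpers. The names `EMAIL_RE`, `is_email`, `extract_emails`, and `mask_emails` are hypothetical and only illustrate the idea; the pattern is the same rough email pattern used above.

```python
import re

# Hypothetical module-level constant shared by the helpers below.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def is_email(value: str) -> bool:
    """Return True when the whole string looks like an email."""
    return EMAIL_RE.fullmatch(value) is not None


def extract_emails(text: str) -> list:
    """Return every email-like substring found in the text."""
    return EMAIL_RE.findall(text)


def mask_emails(text: str) -> str:
    """Replace every email-like substring with a placeholder."""
    return EMAIL_RE.sub("<email>", text)


sample = "Write to alice@example.com or carol@test.org today"
print(is_email("bob_at_example.com"))
print(extract_emails(sample))
print(mask_emails(sample))
```

If the pattern ever needs tightening, only the `EMAIL_RE` line changes and all three helpers pick up the update.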



### **2.3. Persisting Regex Patterns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_C/image_02_03.jpg?v=1766631469" width="250">



>* Treat regex patterns as reusable, persistent configuration
>* Design stable, documented patterns that simplify maintenance

>* Store regex patterns as external configuration data
>* Load, compile, and update patterns without changing code

>* Centralize and version control shared regex patterns
>* Ensure consistent, auditable, reusable text processing across systems



In [None]:
#@title Python Code - Persisting Regex Patterns

# Demonstrate persisting regex patterns using a simple configuration file.
# Load pattern strings from a file, then compile them for repeated use.
# Show how changing the file updates behavior without changing Python code.

import re
import json

config_filename = "regex_config.json"

config_data = {
    "patterns": {
        "email": {
            "description": "Simple email pattern for basic matching.",
            "pattern": r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}"
        },
        "zip_code": {
            "description": "US five digit postal code pattern.",
            "pattern": r"\b\d{5}\b"
        }
    }
}

with open(config_filename, "w", encoding="utf-8") as config_file:
    json.dump(config_data, config_file, indent=2)

with open(config_filename, "r", encoding="utf-8") as config_file:
    loaded_config = json.load(config_file)

compiled_patterns = {}

for name, info in loaded_config["patterns"].items():
    compiled_patterns[name] = re.compile(info["pattern"])

sample_text = "Contact us at help@example.com or visit ZIP 90210 today."

email_match = compiled_patterns["email"].search(sample_text)

zip_match = compiled_patterns["zip_code"].search(sample_text)

print("Loaded pattern names from configuration file:", list(compiled_patterns.keys()))

print("Found email match using persisted pattern:", email_match.group(0) if email_match else None)

print("Found ZIP code match using persisted pattern:", zip_match.group(0) if zip_match else None)

print("Configuration file can be edited without changing this Python script.")
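
Because an external file can be edited by anyone, a loaded pattern may be invalid. The sketch below assumes a hypothetical dictionary standing in for the loaded configuration and shows one way to catch `re.error` at compile time instead of failing later.

```python
import re

# Hypothetical configuration content, as it might look after json.load.
loaded_patterns = {
    "zip_code": r"\b\d{5}\b",
    "broken": r"(\d{5}",  # Unbalanced parenthesis, deliberately invalid.
}

compiled = {}
failed = {}

for name, pattern_text in loaded_patterns.items():
    try:
        compiled[name] = re.compile(pattern_text)
    except re.error as exc:  # Raised when the pattern string is not valid regex.
        failed[name] = str(exc)

print("Compiled patterns:", list(compiled))
print("Rejected patterns:", failed)

zip_match = compiled["zip_code"].search("Visit ZIP 90210 today.")
found_zip = zip_match.group(0) if zip_match else None
print("ZIP found:", found_zip)
```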



## **3. Python Regex Flags**

### **3.1. Case Insensitive Matching**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_C/image_03_01.jpg?v=1766631509" width="250">



>* Case insensitive regex handles any capitalization automatically
>* One pattern matches Error, ERROR, and error consistently

>* Pattern stays same; letter comparison becomes flexible
>* One pattern matches all capitalization styles across contexts

>* Use case insensitivity only when case does not change the meaning
>* Balance flexible matching with precise, accurate results



In [None]:
#@title Python Code - Case Insensitive Matching

# Demonstrate case insensitive regex matching using Python re module.
# Compare normal matching with case insensitive flag usage.
# Show how one pattern matches many capitalizations easily.

import re  # Import regular expression module for pattern matching.

text = "Error, ERROR, error, and eRrOr occurred in server log."  # Example text string.
pattern = r"error"  # Simple pattern that searches for word error.

matches_sensitive = re.findall(pattern, text)  # Case sensitive search without flags.
matches_insensitive = re.findall(pattern, text, re.IGNORECASE)  # Case insensitive search.

print("Case sensitive matches:", matches_sensitive)  # Finds only the all-lowercase "error".
print("Case insensitive matches:", matches_insensitive)  # Finds every capitalization variation.

print("Number of case insensitive matches:", len(matches_insensitive))  # Show total flexible matches.
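
The flag can also be attached when compiling, or written inside the pattern as the inline flag `(?i)`. The following sketch reuses the example text above and suggests that both forms behave the same here; which one to prefer is a style choice.

```python
import re

text = "Error, ERROR, error, and eRrOr occurred in server log."

with_flag = re.compile(r"error", re.IGNORECASE)  # Flag passed at compile time.
with_inline = re.compile(r"(?i)error")  # Same behavior via an inline flag.

flag_matches = with_flag.findall(text)
inline_matches = with_inline.findall(text)

print("Compiled with flag:", flag_matches)
print("Inline (?i) flag:", inline_matches)
```

Compiling with the flag keeps the case rule next to the pattern, so callers cannot forget to pass it.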



### **3.2. Multiline and Dotall Flags**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_C/image_03_02.jpg?v=1766631530" width="250">



>* Caret and dollar normally anchor whole strings
>* Multiline flag makes anchors work per line

>* Multiline lets anchors match each log line
>* Helps filter lines and align patterns to structure

>* Dotall lets the dot cross newline boundaries
>* Choose dotall for capturing multi-line text spans



In [None]:
#@title Python Code - Multiline and Dotall Flags

# Demonstrate multiline flag behavior with caret and dollar anchors.
# Demonstrate dotall flag behavior with dot matching newline characters.
# Compare outputs to understand multi line and dotall regex flags.

import re  # Import regular expression module for pattern matching.

text = """2024-01-01 INFO Start server
2024-01-01 ERROR Disk full
2024-01-02 INFO Shutdown complete"""  # Multi line log text.

pattern_start = r"^\d{4}-\d{2}-\d{2}"  # Pattern matching date at each line start.

print("Without MULTILINE flag, start matches:")  # Explain first printed result meaning.
print(re.findall(pattern_start, text))  # Show matches only from entire string start.

print("\nWith MULTILINE flag, start matches:")  # Explain second printed result meaning.
print(re.findall(pattern_start, text, flags=re.MULTILINE))  # Show matches from each line.

body_text = "Subject: Report\n\nLine one of body.\nLine two of body."  # Email style text.

pattern_body = r"Subject:.*body\."  # Pattern trying to capture full email body.

print("\nWithout DOTALL flag, body match:")  # Explain third printed result meaning.
print(re.search(pattern_body, body_text))  # Prints None: the dot cannot cross the newlines.

print("\nWith DOTALL flag, body match:")  # Explain fourth printed result meaning.
match = re.search(pattern_body, body_text, flags=re.DOTALL)  # Allow dot matching newlines.
print(match.group(0) if match else "No match found")  # Print full matched multi line body.
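
To illustrate the line-filtering use mentioned above, here is a small sketch that pulls out whole `ERROR` lines by anchoring both ends of each line. The pattern is an assumption tailored to this log layout (date, level, message).

```python
import re

log_text = """2024-01-01 INFO Start server
2024-01-01 ERROR Disk full
2024-01-02 INFO Shutdown complete"""

# ^ and $ anchor each line only when re.MULTILINE is set.
error_line_pattern = r"^\S+ ERROR .*$"

without_flag = re.findall(error_line_pattern, log_text)
with_flag = re.findall(error_line_pattern, log_text, flags=re.MULTILINE)

print("Without MULTILINE:", without_flag)
print("With MULTILINE:", with_flag)
```

Without the flag, `^` can only match at the very start of the text, where the first line is an `INFO` entry, so nothing is found.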



### **3.3. Combining Regex Flags**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_01/Lecture_C/image_03_03.jpg?v=1766631555" width="250">



>* Combine multiple flags to fine-tune patterns
>* Use one expression instead of many separate searches

>* Combine flags to search messy, inconsistent text
>* Match keywords across lines, ignoring capitalization reliably

>* Combine flags to describe behavior, not data
>* Creates flexible, reusable, and maintainable regex patterns



In [None]:
#@title Python Code - Combining Regex Flags

# Demonstrate combining regex flags for flexible text searching.
# Show differences between single flags and multiple combined flags.
# Help beginners see how flags change pattern behavior.

import re  # Import regular expression module from Python standard library.

text_block = "Error: user FAILED login.\nwarning: user failed logout.\nERROR: system failed reboot."

pattern = r"^error:.*failed"  # Pattern uses anchors and wildcard for matching.

print("Without flags, matches found:")
for match in re.finditer(pattern, text_block):
    print(" ", repr(match.group(0)))

print("\nWith IGNORECASE and MULTILINE flags combined:")
combined_flags = re.IGNORECASE | re.MULTILINE  # Combine flags using bitwise OR.
for match in re.finditer(pattern, text_block, flags=combined_flags):
    print(" ", repr(match.group(0)))

print("\nWith IGNORECASE, MULTILINE, and DOTALL flags combined:")
multiline_text = "Error: user FAILED login.\nextra details failed here."
flags_all = re.IGNORECASE | re.MULTILINE | re.DOTALL  # Stack three flags together.
for match in re.finditer(pattern, multiline_text, flags=flags_all):
    print(" ", repr(match.group(0)))
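
Combined flags can also be stored with a compiled pattern and inspected later through its `flags` attribute. The sketch below assumes hypothetical names (`LOG_FLAGS`, `failed_line`) and a slightly extended pattern that also accepts `warning` lines.

```python
import re

LOG_FLAGS = re.IGNORECASE | re.MULTILINE  # Hypothetical shared flag set.
failed_line = re.compile(r"^(error|warning):.*failed.*$", LOG_FLAGS)

text_block = "Error: user FAILED login.\nwarning: user failed logout.\nERROR: system failed reboot."

# Collect the severity word of each matching line, normalized to lowercase.
levels = [m.group(1).lower() for m in failed_line.finditer(text_block)]
print("Levels found:", levels)

# Bitwise AND checks whether a given flag is part of the compiled pattern.
has_ignorecase = bool(failed_line.flags & re.IGNORECASE)
has_dotall = bool(failed_line.flags & re.DOTALL)
print("IGNORECASE set:", has_ignorecase, "| DOTALL set:", has_dotall)
```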



# <font color="#418FDE" size="6.5" uppercase>**Using Python `re`**</font>


In this lecture, you learned to:
- Apply `re.match`, `re.search`, `re.findall`, and `re.finditer` to solve basic text-search tasks. 
- Use compiled regex objects to improve clarity and reuse of patterns in Python code. 
- Control regex behavior with common flags such as `re.IGNORECASE` and `re.MULTILINE`. 

In the next Module (Module 2), we will go over 'Intermediate Patterns'.