# <font color="#418FDE" size="6.5" uppercase>**Lookarounds and Context**</font>

>Last update: 20251224.
    
By the end of this lecture, you will be able to:
- Differentiate between positive and negative lookahead and lookbehind assertions in regex patterns. 
- Construct patterns that use lookarounds to enforce contextual rules without including the context in the match. 
- Apply lookaround-based patterns in Python to solve complex extraction problems on semi-structured text. 


## **1. Regex Lookahead Basics**

### **1.1. Positive Lookahead Essentials**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_03/Lecture_A/image_01_01.jpg?v=1766635423" width="250">



>* Positive lookahead checks upcoming text without consuming it
>* Enforces the following context while excluding it from matches

>* Match a word only if a required pattern follows it
>* The following context is required but is not part of the match

>* Layer multiple lookaheads to enforce complex context
>* Separate captured text from required surrounding conditions



In [None]:
#@title Python Code - Positive Lookahead Essentials

# Demonstrate positive lookahead matching in simple Python examples.
# Show how text can be required without being fully captured.
# Compare matches with and without positive lookahead usage.

import re  # Import regular expression module for pattern matching.

text = "speed 60mph, speed 70km, speed 55mph"  # Example text with mixed units.
pattern_no_lookahead = r"speed \d+"  # Match speed word and following digits only.
pattern_with_lookahead = r"speed \d+(?=mph)"  # Require mph after digits without capturing.

matches_no_lookahead = re.findall(pattern_no_lookahead, text)  # Find basic matches.
matches_with_lookahead = re.findall(pattern_with_lookahead, text)  # Find mph constrained matches.

print("Text:", text)  # Show the original example text.
print("Without positive lookahead:", matches_no_lookahead)  # Show all speed matches.
print("With positive lookahead for mph:", matches_with_lookahead)  # Show mph only matches.
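
The cell above uses a single lookahead. As a minimal sketch of layering several, the pattern below stacks two positive lookaheads at the start of the string, so each condition is tested from the same position before any characters are consumed. The ID rule (6 to 10 word characters with at least one digit and one uppercase letter) and the sample `candidates` list are illustrative assumptions, not part of the lecture data.

```python
import re

# Hypothetical rule: 6-10 word characters, at least one digit, at least one uppercase letter.
pattern_id = re.compile(r"^(?=.*\d)(?=.*[A-Z])\w{6,10}$")

# Hypothetical sample values chosen to violate one rule each, except the second.
candidates = ["abc123", "Abc123", "ABCDEF", "A1", "Abc123def456"]

# Each lookahead checks one condition; \w{6,10} then consumes the actual ID.
valid_ids = [c for c in candidates if pattern_id.search(c)]

print("Valid IDs:", valid_ids)
```

Because lookaheads are zero-width, their order does not change which strings pass; only `\w{6,10}` contributes to the matched text.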



### **1.2. Negative Lookahead Basics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_03/Lecture_A/image_01_02.jpg?v=1766635446" width="250">



>* Matches text only when a pattern doesn’t follow
>* Peeks ahead, blocks forbidden futures without capturing

>* Positive checks presence; negative checks absence ahead
>* Lookahead peeks without consuming, enforcing context rules

>* Use negative lookahead to exclude specific follow-ups
>* Combine it with patterns to avoid forbidden futures



In [None]:
#@title Python Code - Negative Lookahead Basics

# Demonstrate negative lookahead pattern using simple product codes.
# Show difference between allowed and blocked future suffixes.
# Print matches that avoid a forbidden suffix pattern.

import re  # Import regular expression module for pattern matching.

text = """ABC123-OK ABC123-TEST ABC123-OLD XYZ999-OK XYZ999-OLD"""

pattern = r"\b[A-Z]{3}\d{3}(?!-OLD)\b"  # Match codes not followed by -OLD.

matches = re.findall(pattern, text)  # Find all codes that satisfy negative lookahead.

print("All product codes in text:")
print(text)

print("\nCodes matched using negative lookahead (not followed by -OLD):")
print(matches)
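
The product-code pattern above uses a fixed-width `\d{3}` before the negative lookahead. When the preceding element is variable-length, the engine can backtrack to a shorter match that satisfies the lookahead. The sketch below illustrates this with hypothetical CSS-like values. The sample text and both patterns are assumptions made for illustration.

```python
import re

text = "10px 20em 30px 40em"  # Hypothetical sample values with two different units.

# Naive: \d+ can give back a digit, so "1" matches because "0p" is not "px".
naive = re.findall(r"\d+(?!px)", text)

# Guarded: also forbid a following digit, so the number cannot be shortened.
guarded = re.findall(r"\d+(?!\d|px)", text)

print("Naive matches:", naive)
print("Guarded matches:", guarded)
```

The guarded version returns only the numbers that are genuinely not followed by `px`.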



### **1.3. Practical Lookaround Patterns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_03/Lecture_A/image_01_03.jpg?v=1766635466" width="250">



>* Lookaheads enforce rules about following context only
>* They test future text without including it

>* Lookbehinds check context that comes before matches
>* Use positive or negative lookbehind to filter

>* Combine lookarounds to enforce rich contextual rules
>* Filter clean matches while excluding unwanted surrounding text



In [None]:
#@title Python Code - Practical Lookaround Patterns

# Demonstrate practical lookahead and lookbehind patterns with simple text examples.
# Show how context can filter matches without including surrounding words.
# Use log lines and emails to compare positive and negative lookarounds.

import re

log_text = """ERROR41 critical disk failure
ERROR42 minor disk warning
ERROR98 critical memory failure
ERROR99 minor memory warning"""

print("Log lines with ERROR codes and statuses.\n")
print(log_text)

pattern_critical = re.compile(r"ERROR\d+(?=.*critical)")
pattern_noncritical = re.compile(r"ERROR\d+(?!.*critical)")

critical_matches = pattern_critical.findall(log_text)
noncritical_matches = pattern_noncritical.findall(log_text)

print("\nError codes followed by critical somewhere later on line.")
print("Critical codes:", sorted(set(critical_matches)))

print("\nError codes not followed by critical anywhere later on line.")
print("Noncritical codes:", sorted(set(noncritical_matches)))

email_text = """Do not reply john@example.com Support Engineer
Contact jane@example.com Product Manager
No-reply bot@example.com Automation Service"""

print("\nEmail lines with possible signature addresses.\n")
print(email_text)

pattern_signature = re.compile(r"(?<!Do not reply )(?<!No-reply )\b[\w.]+@[\w.]+\b(?= [A-Z][a-z]+ )")

signature_emails = pattern_signature.findall(email_text)

print("\nSignature emails allowed by combined lookbehind and lookahead.")
print("Signature emails:", signature_emails)



## **2. Regex Lookbehind Essentials**

### **2.1. Positive Lookbehind Basics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_03/Lecture_A/image_02_01.jpg?v=1766635492" width="250">



>* Match text only when preceded by context
>* Use preceding markers without including them in matches

>* Lookbehind checks text just before a match
>* It requires context but excludes it from results

>* Use lookbehind to match data after cues
>* Patterns stay clean, avoiding extra filtering steps



In [None]:
#@title Python Code - Positive Lookbehind Basics

# Demonstrate positive lookbehind usage with simple price examples.
# Show how context controls matches without including context characters.
# Compare matches with and without positive lookbehind usage.

import re  # Import regular expression module for pattern matching.

text = "Total: $25, Tip: $5, Tax: £3, Fee: $7"  # Example mixed currency text.

pattern_any_dollars = r"\$\d+"  # Match dollar amounts including dollar symbol.

pattern_lookbehind = r"(?<=Tip: )\$\d+"  # Match dollar amount only after Tip label.

any_dollar_matches = re.findall(pattern_any_dollars, text)  # Find all dollar amounts.

lookbehind_matches = re.findall(pattern_lookbehind, text)  # Find only tip dollar amount.

print("All dollar amounts including symbol:", any_dollar_matches)  # Show all matches.

print("Dollar amount after 'Tip:' label only:", lookbehind_matches)  # Show filtered matches.

print("Original text for reference:", text)  # Display original example text again.
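
A lookbehind can also keep the currency symbol itself out of the result, which avoids a separate stripping step before arithmetic. The following is a small sketch on the same sample text; the variable names `dollar_digits` and `dollar_total` are introduced here for illustration.

```python
import re

text = "Total: $25, Tip: $5, Tax: £3, Fee: $7"  # Same sample text as the cell above.

# Require a dollar sign immediately before the digits, but keep it out of the match.
dollar_digits = re.findall(r"(?<=\$)\d+", text)

# The matches are bare digit strings, so they convert to int without stripping.
dollar_total = sum(int(amount) for amount in dollar_digits)

print("Dollar digits only:", dollar_digits)
print("Dollar total:", dollar_total)
```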



### **2.2. Negative Lookbehind Basics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_03/Lecture_A/image_02_02.jpg?v=1766635515" width="250">



>* Negative lookbehind blocks matches with forbidden prefixes
>* It checks prior context without consuming characters

>* Engine checks behind each position for forbidden patterns
>* Allows matches only when disallowed context is absent

>* Combine negative lookbehind with other patterns for precision
>* Carefully choose forbidden context to enforce rules



In [None]:
#@title Python Code - Negative Lookbehind Basics

# Demonstrate negative lookbehind usage with simple product codes.
# Show how forbidden prefixes block specific unwanted matches.
# Keep matched codes clean without including contextual markers.

import re  # Import regular expression module for pattern matching.

text = "VIP-123 REG-123 VIP-456 REG-456"  # Example product codes string.
pattern = r"(?<!VIP-)\b\d+"  # Match whole numbers not preceded by VIP-; \b stops matches starting mid-number.

matches = re.findall(pattern, text)  # Find all matching codes using pattern.

print("All product codes:", text)  # Show original text containing all codes.
print("Codes not after VIP-:", matches)  # Show codes allowed by negative lookbehind.



### **2.3. Python Lookaround Limits**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_03/Lecture_A/image_02_03.jpg?v=1766635534" width="250">



>* Python’s built-in `re` module requires fixed-width lookbehind patterns
>* Design context rules without variable-length lookbehind parts

>* Real-world context is often long and variable
>* Limit lookbehind length, shift complexity to other steps

>* Lookbehind can’t handle long, messy prior context
>* Use lookahead, groups, or multi-step matching instead



In [None]:
#@title Python Code - Python Lookaround Limits

# Demonstrate Python lookbehind fixed length rule clearly.
# Show failing variable length lookbehind pattern example.
# Show working fixed length lookbehind alternative pattern.

import re  # Import regular expression module for pattern matching.

text = "CAT-123, DOG-456, BIG-DOG-789"  # Example product style codes string.

pattern_bad = r"(?<=DOG-\d+-?)\d+"  # Invalid variable length lookbehind pattern.

try:
    re.findall(pattern_bad, text)  # This call should raise specific regex error.
except re.error as error:
    print("Variable length lookbehind failed:", error)  # Show error message.

pattern_good = r"(?<=DOG-)\d+"  # Valid fixed length lookbehind pattern.

matches = re.findall(pattern_good, text)  # Find all matching codes after DOG-.

print("Fixed length lookbehind matches:", matches)  # Display successful matches list.
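
When the required prefix really is variable-length, one common workaround is to consume the prefix normally and capture only the wanted part in a group. The sketch below assumes a hypothetical extra code, `DOG-12-345`, to show an optional middle segment; the pattern is one possible approach, not the only one.

```python
import re

# Same style of codes as above, plus a hypothetical code with an extra numeric segment.
text = "CAT-123, DOG-456, BIG-DOG-789, DOG-12-345"

# Consume "DOG-" and an optional "digits-" segment, then capture the final digits.
pattern_group = re.compile(r"DOG-(?:\d+-)?(\d+)\b")

# findall returns only the captured group, so the prefix never appears in the results.
trailing_numbers = pattern_group.findall(text)

print("Numbers after a variable-length DOG prefix:", trailing_numbers)
```

The prefix is consumed rather than asserted, so overlapping matches are not possible here, but the result list stays as clean as a lookbehind would give.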



## **3. Context Aware Matching**

### **3.1. Contextual Marker Matching**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_03/Lecture_A/image_03_01.jpg?v=1766635562" width="250">



>* Use lookarounds to match values near markers
>* Keep markers out of matches for clean extraction

>* Use lookbehind to match ticket digits that follow a marker
>* Use lookahead to match user IDs that precede success markers

>* Use markers to target the correct numbers
>* Lookarounds encode context, improving precision and robustness



In [None]:
#@title Python Code - Contextual Marker Matching

# Demonstrate contextual marker matching using Python regex lookarounds.
# Extract values that follow or precede specific textual markers only.
# Show clean matches without including the surrounding marker text.

import re  # Import regular expression module for pattern matching.

text_block = """User ID: 48291 logged in successfully
User ID: 77777 failed login attempt
Ticket number: 12345 closed by agent
Ticket number: 99999 still open"""  # Example semi structured text block.

pattern_ticket = r"(?<=Ticket number:\s)\d+"  # Lookbehind anchors digits after ticket marker.

pattern_success_user = r"\b\w+(?= logged in successfully)"  # Lookahead anchors user ID before success phrase.

tickets = re.findall(pattern_ticket, text_block)  # Find all ticket numbers using contextual marker.

successful_users = re.findall(pattern_success_user, text_block)  # Find user IDs only for successful logins.

print("Ticket numbers found using contextual marker:", tickets)  # Print extracted ticket numbers list.

print("User IDs found using contextual success marker:", successful_users)  # Print extracted user IDs list.
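
Both kinds of marker can be applied to the same value. As a brief sketch on the same text block, the pattern below requires the `User ID: ` label before the digits and a failure phrase after them; the names `pattern_failed_user` and `failed_user_ids` are introduced here for illustration.

```python
import re

text_block = """User ID: 48291 logged in successfully
User ID: 77777 failed login attempt
Ticket number: 12345 closed by agent
Ticket number: 99999 still open"""  # Same sample text as the cell above.

# Lookbehind requires the label; lookahead requires the failure phrase.
pattern_failed_user = r"(?<=User ID: )\d+(?= failed login)"

failed_user_ids = re.findall(pattern_failed_user, text_block)

print("User IDs with failed logins:", failed_user_ids)
```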




### **3.2. Context Based Exclusions**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_03/Lecture_A/image_03_02.jpg?v=1766635580" width="250">



>* Use lookarounds to exclude unwanted nearby context
>* Negative lookahead/lookbehind keep matches clean, focused

>* Use lookahead or lookbehind based on marker position
>* Filter real data from test, comments, placeholders

>* Use lookarounds to exclude unwanted contextual matches
>* Encode business rules directly, reducing post-processing work



In [None]:
#@title Python Code - Context Based Exclusions

# Demonstrate context based exclusions using Python regular expressions.
# Show how negative lookahead excludes unwanted tagged order numbers.
# Show how a line-start negative lookahead excludes order numbers inside quoted replies.

import re

# Example email text containing several order numbers with different contexts.
email_text = (
    'Order #12345 shipped yesterday.\n'
    'Order #67890 [DEPRECATED] will not ship.\n'
    '> Quoted line with order #11111 from previous email.\n'
    'Order #22222 is a valid active order.\n'
)

# Pattern excludes deprecated orders using negative lookahead after the number.
pattern_no_deprecated = re.compile(r"#(\d{5})(?!\s*\[DEPRECATED\])")

# Pattern excludes quoted lines. The ">" marker sits a variable distance before
# the hash, so a fixed-width lookbehind cannot see it; a negative lookahead at
# the line start rejects lines beginning with ">" instead.
# Note: this captures only the first order number on each line.
pattern_no_quoted = re.compile(r"^(?!>).*?#(\d{5})", re.MULTILINE)

# Pattern combines both exclusions for clean, relevant order numbers only.
pattern_combined = re.compile(r"^(?!>).*?#(\d{5})(?!\s*\[DEPRECATED\])", re.MULTILINE)

# Find all order numbers that are not deprecated anywhere in the email text.
not_deprecated = pattern_no_deprecated.findall(email_text)

# Find all order numbers that are not inside quoted reply lines.
not_quoted = pattern_no_quoted.findall(email_text)

# Find order numbers that are neither deprecated nor inside quoted lines.
clean_orders = pattern_combined.findall(email_text)

# Print results to show how context based exclusions change the matches.
print('All non deprecated order numbers:', not_deprecated)

print('All non quoted order numbers:', not_quoted)

print('Clean context filtered order numbers:', clean_orders)



### **3.3. Log and Markup Examples**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python Regex A-Z/Module_03/Lecture_A/image_03_03.jpg?v=1766635607" width="250">



>* Use lookarounds to filter URLs in logs
>* Python regex extracts segments based on contextual conditions

>* Use log markers to target error messages
>* Combine lookbehind and lookahead for robust extraction

>* Use lookarounds to target content inside markup
>* Extract specific text while skipping surrounding tags



In [None]:
#@title Python Code - Log and Markup Examples

# Demonstrate regex lookarounds on simple log and HTML markup examples.
# Extract error URLs from logs without including numeric status codes themselves.
# Extract highlighted HTML text only when inside a specific error container.

import re

log_text = """2024-12-25 10:00:00 10.0.0.1 GET /home 200 OK
2024-12-25 10:01:00 10.0.0.2 GET /login 500 ERROR
2024-12-25 10:02:00 10.0.0.3 GET /about 404 NOT_FOUND
2024-12-25 10:03:00 10.0.0.4 GET /contact 200 OK"""

pattern_error_urls = re.compile(r"GET\s+(\S+)(?=\s+(?:500|404)\b)")

error_urls = pattern_error_urls.findall(log_text)

print("Error URLs extracted using positive lookahead:")
print(error_urls)

html_text = """<div class="error-box"><span class="highlight">Disk full</span></div>
<div class="info-box"><span class="highlight">Backup running</span></div>"""

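# Python's re module rejects variable-length lookbehind, so the container prefix
# is consumed normally and only the highlighted message is captured.
pattern_error_highlight = re.compile(r'<div class="error-box">.*?<span class="highlight">(.*?)(?=</span>)')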

error_highlights = pattern_error_highlight.findall(html_text)

print("Highlighted messages inside error container only:")
print(error_highlights)
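
The bullets above also mention using log markers to target error messages with a lookbehind and lookahead pair. The sketch below assumes a hypothetical `[LEVEL] message (code=N)` log layout; the sample lines and names are illustrative and not taken from a real system.

```python
import re

# Hypothetical application log; the "[LEVEL] message (code=N)" layout is an assumption.
app_log = """[INFO] Service started (code=0)
[ERROR] Disk quota exceeded (code=28)
[WARN] Cache nearly full (code=3)
[ERROR] Connection refused (code=111)"""

# Fixed-width lookbehind for the level marker, lookahead for the trailing code block.
pattern_error_message = re.compile(r"(?<=\[ERROR\] ).+?(?= \(code=\d+\))")

error_messages = pattern_error_message.findall(app_log)

print("Error messages only:", error_messages)
```

The lookbehind works here because `[ERROR] ` has a fixed width; a marker of varying length would call for a capturing group instead.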



# <font color="#418FDE" size="6.5" uppercase>**Lookarounds and Context**</font>


In this lecture, you learned to:
- Differentiate between positive and negative lookahead and lookbehind assertions in regex patterns. 
- Construct patterns that use lookarounds to enforce contextual rules without including the context in the match. 
- Apply lookaround-based patterns in Python to solve complex extraction problems on semi-structured text. 

<font color='yellow'>Congratulations on completing this course!</font>