# Search, Split, and Substitute 
- `re.findall()` and `re.finditer()` let you retrieve every occurrence of a pattern.  
- `re.split()` handles complex delimiters beyond simple string splits.  
- `re.sub()` performs powerful search-and-replace operations, including reuse of captured groups.  

## Finding All Matches
- `re.findall(pattern, string)` returns a list of all **non-overlapping** matches:  
  - No groups → list of matched substrings.  
  - Exactly one group → list of the strings captured by that group.  
  - Two or more groups → list of tuples, one entry per group.  
- `re.finditer(pattern, string)` returns an iterator of match objects, giving access to `.group()`, positions, named groups, etc., and is more memory-efficient for large inputs.  

In [None]:
import re

text = "Errors found: 404, 500, 403, 500. User IDs: user123, admin99."
config = "timeout=60 retries=3 workers=5"

# Find every run of digits (the error codes and the digits inside the user IDs):
numbers = re.findall(r"\d+", text)
print(f"Numbers found: {numbers}")

# findall with groups:
pairs = re.findall(r"(\w+)=(\w+)", config)
print(f"Key-value pairs: {pairs}")

# finditer
for match in re.finditer(r"(\w+)=(\w+)", config):
    print(f"Whole match: {match.group(0)}; key: {match.group(1)}; value: {match.group(2)} - at {match.start()}-{match.end()}")


Numbers found: ['404', '500', '403', '500', '123', '99']
Key-value pairs: [('timeout', '60'), ('retries', '3'), ('workers', '5')]
Whole match: timeout=60; key: timeout; value: 60 - at 0-10
Whole match: retries=3; key: retries; value: 3 - at 11-20
Whole match: workers=5; key: workers; value: 5 - at 21-30
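
The shape of the list returned by `re.findall()` depends on how many groups the pattern contains, which is easy to overlook. The following is a minimal sketch of the three cases, plus `re.finditer()` with named groups; the sample string and the variable names (`log`, `whole`, `keys`, `pairs`, `settings`) are illustrative and not part of the cells above.

```python
import re

log = "timeout=60 retries=3 workers=5"

# No groups: each item is the whole matched substring.
whole = re.findall(r"\w+=\w+", log)

# Exactly one group: each item is that group's text (a plain string, not a tuple).
keys = re.findall(r"(\w+)=\w+", log)

# Two or more groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\w+)", log)

# finditer with named groups: index each match object by group name.
settings = {
    m["key"]: int(m["value"])
    for m in re.finditer(r"(?P<key>\w+)=(?P<value>\d+)", log)
}

print(whole)     # ['timeout=60', 'retries=3', 'workers=5']
print(keys)      # ['timeout', 'retries', 'workers']
print(pairs)     # [('timeout', '60'), ('retries', '3'), ('workers', '5')]
print(settings)  # {'timeout': 60, 'retries': 3, 'workers': 5}
```

Because `re.finditer()` yields one match object at a time, the dictionary comprehension never holds the full list of matches in memory.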


## Splitting Strings
- Use `re.split(pattern, string)` to break a string on a **regex pattern**, not just a fixed substring.  
- Always use a raw string literal so backslashes reach the regex engine.  
- **Simple single-character delimiters:** use a character class, which never captures, e.g. `r"\s*[,;]\s*"`.  
- **Complex delimiters** (alternation or multi-character): group with non-capturing parentheses, e.g. `r"\s*(?:foo|bar|baz)\s*"`, so they aren’t included in the result list.  
- **Including delimiters:** wrap your delimiter in a capturing group, e.g. `r"\s*([,;])\s*"`, to have the separators appear in the split output.  
- **Summary:**  
  - No parentheses or a non-capturing group → delimiters are **removed**.  
  - Capturing group → delimiters **appear** in the split list.  

In [36]:
import re

data = "item1 , item2; item3 ,item4 ;item5"

# 1. Split on comma and semi-colon
pattern1 = r"\s*[,;]\s*"
print(f"Character class split: {re.split(pattern1, data)}")

# 2. Capturing the delimiter
pattern2 = r"\s*([,;])\s*"
print(f"Capturing group split: {re.split(pattern2, data)}")

html = """
<p class='hello'>First paragraph.</p>
<b class='world'>Second paragraph.</b>
End.
"""

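# 3. Split on opening tags with class 'hello' or 'world' and on closing </p> or </b> tags;
#    the alternation is non-capturing, so no tag text lands in the result
pattern3 = r"<.*?class='(?:hello|world)'.*?>|</[pb]>"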
print(f"HTML non-capturing split: {re.split(pattern3, html)}")

Character class split: ['item1', 'item2', 'item3', 'item4', 'item5']
Capturing group split: ['item1', ',', 'item2', ';', 'item3', ',', 'item4', ';', 'item5']
HTML non-capturing split: ['\n', 'First paragraph.', '\n', 'Second paragraph.', '\nEnd.\n']
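
The multi-character alternation case from the list above can be sketched directly. This example assumes a hypothetical input in which the words `and` and `or` act as delimiters; the string and the names `query`, `dropped`, and `kept` are illustrative only.

```python
import re

# Hypothetical sample: the words "and" / "or" separate the items.
query = "red and green or blue and yellow"

# Non-capturing group: the delimiter words are dropped from the result.
dropped = re.split(r"\s+(?:and|or)\s+", query)

# Capturing group: each delimiter word is kept between the pieces it separated.
kept = re.split(r"\s+(and|or)\s+", query)

print(dropped)  # ['red', 'green', 'blue', 'yellow']
print(kept)     # ['red', 'and', 'green', 'or', 'blue', 'and', 'yellow']
```

Only the text inside the capturing parentheses is kept; the surrounding whitespace matched by `\s+` is still discarded.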


## Substituting Text
- `re.sub(pattern, replacement, string, count=0)` replaces every non-overlapping match of `pattern` with `replacement` and returns the new string.  
- `count` caps the number of replacements; the default, 0, replaces all matches.  
- Back-references (`\1`, `\g<name>`) let you reorder or reuse captured text in the replacement.  

In [92]:
import re

text = "User IDs: user123, user456, user123457689. Contact admin789 for help."

# Basic substitution
redacted = re.sub(r"user\d+", "[REDACTED_USER]", text)
print(f"Result of redacting: {redacted}")

# Back-reference for reusing information
redacted_partially = re.sub(r"(u)ser\d+(\d{2})", r"\1[REDACTED_USER]\2", text)
print(f"Result of redacting: {redacted_partially}")

# Limited count of substitutions
redacted_only_two = re.sub(r"(u)ser\d+(\d{2})", r"\1[REDACTED_USER]\2", text, count=2)
print(f"Result of redacting: {redacted_only_two}")

# Named groups for substitution
date_text = "Start: 2023-10-27, End: 2024-01-15"
# Current format YYYY-MM-DD
# Target format DD/MM/YYYY

date_pattern_named = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
replacement_format_named = r"\g<day>/\g<month>/\g<year>"
reformatted_date = re.sub(date_pattern_named, replacement_format_named, date_text)

print(f"Result of date transformation: {reformatted_date}")

Result of redacting: User IDs: [REDACTED_USER], [REDACTED_USER], [REDACTED_USER]. Contact admin789 for help.
Result of redacting: User IDs: u[REDACTED_USER]23, u[REDACTED_USER]56, u[REDACTED_USER]89. Contact admin789 for help.
Result of redacting: User IDs: u[REDACTED_USER]23, u[REDACTED_USER]56, user123457689. Contact admin789 for help.
Result of date transformation: Start: 27/10/2023, End: 15/01/2024
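
One detail of back-references is worth a small sketch: when a numbered group must be followed by a literal digit, `\g<1>` avoids ambiguity, because `\10` would be read as a reference to group 10. The sample text and the names `items`, `scaled`, and `first_only` below are hypothetical.

```python
import re

items = "item7 item42"

# \g<1>0 means "group 1, then a literal 0".
# Writing r"item\10" instead would be parsed as a reference to group 10.
scaled = re.sub(r"item(\d+)", r"item\g<1>0", items)

# count=1 stops after the first replacement.
first_only = re.sub(r"item(\d+)", r"item\g<1>0", items, count=1)

print(scaled)      # item70 item420
print(first_only)  # item70 item42
```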
