# Python RegEx: 10-Part Practical Course

Text processing using Python's `re` module with practical examples of meta-characters and functions.  
**10 videos • ~1h 33m (suggested pacing)**

> Run cells with **Shift+Enter**. Tweak patterns and inputs to see how results change.

## Course Outline
1. Introduction to RegEx and Module in Python  
2. RegEx Functions in Python  
3. RegEx Metacharacters: Part 1  
4. RegEx Metacharacters: Part 2  
5. Special Sequences in Python: Part 1  
6. Special Sequences in Python: Part 2  
7. RegEx Sets in Python: Part 1  
8. RegEx Sets in Python: Part 2  
9. RegEx Match Object in Python  
10. Manual Pattern Creation

## 1) Introduction to RegEx and Module in Python

**Regular Expressions (RegEx)** are patterns for matching text. Python supports them through its built-in **`re`** module.

Common tasks:
- Validate inputs (emails, dates, passwords)
- Search and extract data (IDs, hashtags, URLs)
- Replace or split text

In [None]:
import re

text = "Hello, world! 2025 is here."
pattern = r"\d+"  # one or more digits
print(re.findall(pattern, text))  # -> ['2025']

## 2) RegEx Functions in Python

- `re.compile(pat, flags=0)` → compiled, reusable pattern object
- `re.search` → first match anywhere in the string
- `re.match` → match only at the start of the string
- `re.fullmatch` → match only if the entire string matches
- `re.findall` → all non-overlapping matches as a list of strings (group contents instead, as strings or tuples, if the pattern has capturing groups)
- `re.finditer` → iterator of match objects
- `re.sub` / `re.subn` → replace; return the new text / a `(text, count)` tuple
- `re.split` → split by pattern
- Flags: `re.I`/`IGNORECASE`, `re.M`/`MULTILINE`, `re.S`/`DOTALL`, `re.X`/`VERBOSE`

In [None]:
import re

text = "Email me at ALEX@example.com or alex@sample.org"
pat = re.compile(r"[\w.-]+@[\w.-]+", flags=re.IGNORECASE)

print("search:", pat.search(text).group())
print("findall:", pat.findall(text))
print("sub:", pat.sub("[REDACTED]", text))

In [None]:
# finditer & split
import re
text = "id=42; id=7; id=108"
for m in re.finditer(r"id=(\d+)", text):
    print("found:", m.group(1), "at", m.span())

print("split:", re.split(r"\s*;\s*", text))
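The function list also mentions `re.match`, `re.fullmatch`, and `re.subn`, which the cells above do not exercise. The following is a minimal sketch of how they differ from `re.search` and `re.sub`; the sample string and variable names are illustrative only.

```python
import re

text = "id=42; id=7"

# search scans the whole string; match only tries position 0.
first_anywhere = re.search(r"\d+", text)           # finds "42"
at_start = re.match(r"\d+", text)                  # None: the string starts with "id"
whole = re.fullmatch(r"(?:id=\d+;?\s*)+", text)    # must consume every character

# subn returns the new string together with the number of replacements.
masked, count = re.subn(r"\d+", "#", text)

print(first_anywhere.group())  # 42
print(at_start)                # None
print(whole is not None)       # True
print(masked, count)           # id=#; id=# 2
```

Because `re.match` and `re.search` return `None` on failure, check the result before calling `.group()` on inputs you do not control.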

## 3) RegEx Metacharacters: Part 1

- `.` any char (except newline by default)  
- `^` start of string (or line with `re.M`)  
- `$` end of string (or line with `re.M`)  
- `*` 0+ repeats, `+` 1+ repeats, `?` 0/1 repeat  
- `{m}`, `{m,}`, `{m,n}` quantifiers

In [None]:
import re
s = "cat cot cut ct ccccct\ncatapult"
print(re.findall(r"c.t", s))           # dot wildcard
print(re.findall(r"^cat", s))          # start (first line only)
print(re.findall(r"ct$", s, re.MULTILINE))  # end of line with MULTILINE
print(re.findall(r"c{1,}t", s))        # one or more 'c' then 't'

In [None]:
# Greedy vs non-greedy (lazy) quantifiers
import re
html = "<p>one</p><p>two</p>"
print("greedy:", re.findall(r"<p>.*</p>", html))       # one big match
print("lazy  :", re.findall(r"<p>.*?</p>", html))      # two small matches
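Two points from the metacharacter list are easy to check directly: bounded quantifiers such as `{m,n}`, and the note that `.` skips newlines "by default". Here is a small sketch with made-up inputs, using `re.DOTALL` to change the default.

```python
import re

# {m,n} bounds the repeat count: here 2 to 3 digits, as whole words.
nums = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")

# Without re.DOTALL the dot stops at newlines; with it, a match can span lines.
block = "<p>one\ntwo</p>"
no_flag = re.findall(r"<p>.*?</p>", block)
dotall = re.findall(r"<p>.*?</p>", block, flags=re.DOTALL)

print(nums)     # ['42', '123']
print(no_flag)  # []
print(dotall)   # ['<p>one\ntwo</p>']
```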

## 4) RegEx Metacharacters: Part 2

- `[]` character class (sets in detail later)  
- `()` capturing groups, `(?P<name>...)` named groups  
- `|` alternation (OR)  
- `\` escapes special meaning  
- Backreferences: `\1`, `(?P=name)`  
- Lookarounds: `(?=...)` positive lookahead, `(?!...)` negative lookahead, `(?<=...)` positive lookbehind, `(?<!...)` negative lookbehind

In [None]:
import re
text = "color colour colr"
print(re.findall(r"colou?r", text))          # optional 'u'

phone = "(123) 456-7890"
m = re.search(r"\((\d{3})\)\s*(\d{3})-(\d{4})", phone)
print("groups:", m.groups())

dup = "word word test nope"
print(re.findall(r"\b(\w+)\s+\1\b", dup))  # backreference finds duplicated words

In [None]:
# Lookarounds
import re
s = "price: $25, discounted: $19"
print("numbers preceded by $:", re.findall(r"(?<=\$)\d+", s))    # positive lookbehind
print("numbers not preceded by $:", re.findall(r"(?<!\$)\b\d+\b", s))  # negative lookbehind; [] here, since every number follows a $
print("digits before comma:", re.findall(r"\d+(?=,)", s))         # positive lookahead
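Alternation, named groups, and the named backreference `(?P=name)` appear in the list for this part but not in the cells above. The sketch below shows one way they fit together; the group names `q` and `body` are chosen only for illustration.

```python
import re

# Alternation inside a non-capturing group: either spelling of the word.
spellings = re.findall(r"gr(?:a|e)y", "gray grey griy")
print(spellings)  # ['gray', 'grey']

# A named group plus a named backreference: the closing quote must be
# the same character as the opening one.
quoted = re.compile(r"(?P<q>['\"])(?P<body>.*?)(?P=q)")
m = quoted.search("say \"it's fine\" now")
print(m.group("body"))  # it's fine
```

The apostrophe inside the double-quoted text does not end the match, because `(?P=q)` only accepts the character that group `q` captured.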

## 5) Special Sequences in Python: Part 1

- `\d` digit, `\D` non-digit  
- `\w` word char (letters/digits/underscore), `\W` non-word  
- `\s` whitespace, `\S` non-whitespace

In [None]:
import re
txt = "A1_B2 C3-D4\tE5\nF6"
print("digits:", re.findall(r"\d", txt))
print("words :", re.findall(r"\w+", txt))
print("non-ws:", re.findall(r"\S+", txt))
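One detail worth knowing: for `str` patterns in Python 3, `\d` and `\w` are Unicode-aware by default, so they match more than ASCII digits and letters. The sketch below compares the default behavior with `re.ASCII` on an illustrative sample string.

```python
import re

# "café", the Arabic-Indic digit three (U+0663), and "x9"
sample = "caf\u00e9 \u0663 x9"

unicode_words = re.findall(r"\w+", sample)
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)
unicode_digits = re.findall(r"\d", sample)
ascii_digits = re.findall(r"\d", sample, flags=re.ASCII)

print(unicode_words)   # three tokens, including the accented word
print(ascii_words)     # ['caf', 'x9']
print(unicode_digits)  # two digits, including U+0663
print(ascii_digits)    # ['9']
```

If a pattern is meant to validate ASCII-only input (for example numeric IDs), passing `re.ASCII` or writing `[0-9]` explicitly avoids surprises.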

## 6) Special Sequences in Python: Part 2

- `\b` word boundary, `\B` non-boundary  
- `\A` start of string, `\Z` end of string (like `^` and `$`, but unaffected by `re.MULTILINE`; unlike `$`, `\Z` does not match before a trailing newline)

In [None]:
import re
s = "cat scatter catalog"
print("word boundary:", re.findall(r"\bcat\b", s))
print("non-boundary:", re.findall(r"\Bcat\B", s))

print("A/Z anchors:", bool(re.search(r"\AHello.*world\Z", "Hello wonderful world")))
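The difference between `^`/`$` and `\A`/`\Z` only shows up with newlines in the input. A short sketch with made-up strings:

```python
import re

line = "hello world\n"
dollar = bool(re.search(r"world$", line))    # $ also matches just before a final newline
strict = bool(re.search(r"world\Z", line))   # \Z matches only at the very end

two_lines = "one\ntwo"
caret = re.findall(r"^\w+", two_lines, flags=re.MULTILINE)     # every line start
anchor_a = re.findall(r"\A\w+", two_lines, flags=re.MULTILINE) # string start only

print(dollar, strict)  # True False
print(caret)           # ['one', 'two']
print(anchor_a)        # ['one']
```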

## 7) RegEx Sets in Python: Part 1

Character classes match exactly one character from a set: `[abc]` matches `a`, `b`, or `c`; `[a-z]` is a range; `[^...]` negates the set.

In [None]:
import re
s = "abc ABC 123 _-+"
print(re.findall(r"[a-c]", s))
print(re.findall(r"[A-Z]", s))
print(re.findall(r"[^\w\s]", s))  # anything not word or whitespace
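Inside a set, most metacharacters are treated literally, while `-`, `^`, and `]` depend on position or escaping. The following sketch illustrates the common cases on made-up inputs.

```python
import re

expr = "1+2*3.0 a-z"

# Inside a set, ., + and * are ordinary characters.
ops = re.findall(r"[.+*]", expr)

# A hyphen is literal when escaped, or when placed first or last in the set.
hyphen_escaped = re.findall(r"[a\-z]", expr)
hyphen_first = re.findall(r"[-az]", expr)

# A caret negates the set only when it is the first character.
caret = re.findall(r"[a^]", "a^b")

print(ops)             # ['+', '*', '.']
print(hyphen_escaped)  # ['a', '-', 'z']
print(hyphen_first)    # ['a', '-', 'z']
print(caret)           # ['a', '^']
```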

## 8) RegEx Sets in Python: Part 2

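- Combine ranges: `[A-Za-z0-9_]` equals `\w` under `re.ASCII`; by default `\w` also matches non-ASCII letters and digits  
- Predefined classes are shortcuts; custom sets give finer control  
- You can simulate set **intersections** using lookarounds, e.g. `(?=[a-z])[^aeiou]` matches lowercase consonants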

In [None]:
import re
pwd = "Xy9#pass"
# require at least one digit, one lowercase, one uppercase, one special
strong = (r"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[^\w\s]).{8,}$")
print("strong password?", bool(re.search(strong, pwd)))
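Python's `re` module has no built-in syntax for set intersection or subtraction, so a lookahead placed directly before a set is one way to approximate them. This is a minimal sketch: the lookahead tests the character against one set without consuming it, then the following set tests the same character again.

```python
import re

# Intersection: a lowercase letter AND not a vowel.
consonants = re.findall(r"(?=[a-z])[^aeiou]", "regex sets")

# Subtraction: any digit EXCEPT 0 and 1.
digits_2_to_9 = re.findall(r"(?![01])\d", "2025-10-16")

print(consonants)     # ['r', 'g', 'x', 's', 't', 's']
print(digits_2_to_9)  # ['2', '2', '5', '6']
```

For small sets it is often clearer to spell the result out, e.g. `[2-9]` instead of `(?![01])\d`.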

## 9) RegEx Match Object in Python

Match objects provide details:
- `.group()` / `.groups()` / `.groupdict()`
- `.start()` / `.end()` / `.span()`
- `.re` (pattern), `.string` (input)

In [None]:
import re
m = re.search(r"(?P<user>[\w.-]+)@(?P<host>[\w.-]+)", "email: bob.smith@example.com")
print("group(0):", m.group(0))
print("user    :", m.group('user'))
print("host    :", m.group('host'))
print("span    :", m.span())
print("groupdict:", m.groupdict())
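The position methods and the `.re` / `.string` attributes listed above also accept or expose per-group detail. A small sketch, with an illustrative `key=value` pattern that is not part of the course material:

```python
import re

pat = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")
source = "retry=3"
m = pat.search(source)

print(m.span("key"), m.span("value"))           # (0, 5) (6, 7)
print(m.start("value"), m.end())                # 6 7
print(m.re is pat, m.string is source)          # True True
print(source[m.start("value"):m.end("value")])  # 3

# search returns None when nothing matches, so guard before calling methods.
missing = pat.search("no pairs here")
result = missing.group() if missing else "no match"
print(result)  # no match
```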

## 10) Manual Pattern Creation

Let's craft some practical patterns. **These are pragmatic, not RFC-perfect.**

In [None]:
import re

samples = [
    "Contact: jane-doe_99@sub.example.co.uk",
    "Site: https://example.com/path?q=1#frag",
    "Phone: +1 (555) 123-4567",
    "Date: 2025-10-16",
]

EMAIL = r"[\w.-]+@[\w.-]+\.[A-Za-z]{2,}"
URL   = r"https?://[\w.-]+(?:/[\w\-./?%&=:#]*)?"
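PHONE = r"\+?\d[\d\s().-]{7,}\d"  # loose: also matches the digits of the Date sample
DATE  = r"\b\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\b"  # non-capturing groups so findall returns the whole date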

for s in samples:
    print("IN:", s)
    print("  email:", re.findall(EMAIL, s))
    print("  url  :", re.findall(URL, s))
    print("  phone:", re.findall(PHONE, s))
    print("  date :", re.findall(DATE, s))
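As the note above says, these patterns are pragmatic: a date regex can check the shape of `YYYY-MM-DD` but not whether the day exists (it accepts `2025-02-30`). One possible approach, sketched here with a hypothetical helper `is_real_date`, is to let the regex check the format and `datetime.date` check the calendar.

```python
import re
from datetime import date

# Same shape as the DATE pattern, with named groups for the three parts.
DATE_RE = re.compile(
    r"(?P<y>\d{4})-(?P<m>0[1-9]|1[0-2])-(?P<d>0[1-9]|[12]\d|3[01])"
)

def is_real_date(text):
    """Return True if text is a YYYY-MM-DD string naming an actual calendar day."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return False
    try:
        # date() raises ValueError for impossible days such as 30 February.
        date(int(m["y"]), int(m["m"]), int(m["d"]))
    except ValueError:
        return False
    return True

for candidate in ["2025-10-16", "2025-02-30", "2025-13-01", "2024-02-29"]:
    print(candidate, is_real_date(candidate))
```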

In [None]:
# Flags & VERBOSE for readability
import re

pattern = re.compile(r"""
    ^                # start
    (?P<name>[A-Za-z][A-Za-z\s'-]{1,30})  # name
    :\s*
    (?P<age>\d{1,3})                      # age
    $                # end
""", re.VERBOSE)

tests = ["Alice: 29", "O'Neil: 7", "x: 3000"]
for t in tests:
    m = pattern.fullmatch(t)  # match once and reuse the result
    print(t, "->", bool(m), m.groupdict() if m else None)

---

## 🧪 Practice Blocks

1. Extract all **hashtags** and **mentions** from a social post.  
2. Validate **IPv4** addresses (e.g., `192.168.0.1`).  
3. Convert **dates** from `DD/MM/YYYY` to `YYYY-MM-DD` with `re.sub`.  
4. Mask PII: redact emails/phones in a paragraph using `re.sub`.  
5. Parse CSV-like lines with quoted commas using `re.findall`.

In [None]:
# Your practice workspace
text = "Tweet by @user: Loving #Python and #regex on 16/10/2025! Email me at user@mail.com."
print("TODO: implement exercises here")
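If you want a starting point, here is one possible sketch for exercises 1 and 3 on the workspace text. The variable names are illustrative, and other patterns would work equally well.

```python
import re

post = "Tweet by @user: Loving #Python and #regex on 16/10/2025! Email me at user@mail.com."

# Exercise 1: hashtags and mentions.
hashtags = re.findall(r"#\w+", post)
# The negative lookbehind skips the "@mail" inside the email address.
mentions = re.findall(r"(?<!\w)@\w+", post)

# Exercise 3: reorder the captured groups, DD/MM/YYYY -> YYYY-MM-DD.
iso = re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\3-\2-\1", post)

print(hashtags)  # ['#Python', '#regex']
print(mentions)  # ['@user']
print(iso)
```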