# Regex (Regular Expression)
A regular expression is a pattern used to find / match / extract / replace text inside a string.

# One more example for args and kwargs

In [2]:
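def random(x, y, *args, **kwargs):
    print(x)       # first positional argument
    print(y)       # second positional argument
    print(args)    # extra positional arguments, collected into a tuple
    print(kwargs)  # keyword arguments, collected into a dict

random(1, 2, 3, 4, 9, par1='75', par2='75', par3='75', par4='75')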


1
2
(3, 4, 9)
{'par1': '75', 'par2': '75', 'par3': '75', 'par4': '75'}


# Regular Expressions (Regex)

In [4]:
# We need to import the "re" module
import re

## Regex Cheat Sheet (Python `re`)

**Regex = pattern** to **find / match / extract / replace** text.
**Tip:** In Python, prefer **raw strings**, `r"..."`, so backslashes reach the regex engine as written and don't need to be doubled (`r"\d"` instead of `"\\d"`).

---

### 1) Import + Most-used functions
- `re.search(pat, s)` → first match **anywhere**
- `re.match(pat, s)` → match only at **start**
- `re.fullmatch(pat, s)` → **entire string** must match
- `re.findall(pat, s)` → list of all non-overlapping matches (full-match strings with no groups, the group's text with one group, tuples with several groups)
- `re.finditer(pat, s)` → iterator of match objects
- `re.sub(pat, repl, s)` → replace matches
- `re.split(pat, s)` → split by pattern
- `re.compile(pat, flags=0)` → precompile (reuse)

Example:
- `import re`
- `m = re.search(r"\d+", "id=42")`  → matches `"42"`
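
The functions above can be compared side by side on one input. This is a minimal sketch; the sample string `text` and the variable names are made up for illustration:

```python
import re

text = "id=42, qty=7"  # hypothetical sample input

first = re.search(r"\d+", text)        # first match anywhere in the string
at_start = re.match(r"\d+", text)      # None: the string does not start with a digit
whole = re.fullmatch(r"\d+", "42")     # succeeds only if the entire string matches
all_nums = re.findall(r"\d+", text)    # list of matched strings
spans = [m.span() for m in re.finditer(r"\d+", text)]  # match objects give positions
masked = re.sub(r"\d+", "#", text)     # replace every match
parts = re.split(r",\s*", text)        # split on a comma plus optional whitespace

number = re.compile(r"\d+")            # compile once, reuse the pattern object
print(first.group(), at_start, whole.group())
print(all_nums, spans, masked, parts)
print(number.findall("a1 b22"))
```

`re.search`, `re.match`, and `re.fullmatch` return `None` when nothing matches, so check the result before calling `.group()`.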

---

### 2) Anchors (position)
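- `^` start of string (also start of each line with `re.M`)
- `$` end of string, or just before a trailing newline (also end of each line with `re.M`)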
- `\b` word boundary (edge between word and non-word)
- `\B` not a word boundary

Examples:
- `re.search(r"^Hi", "Hi there")`
- `re.search(r"there$", "Hi there")`
- `re.search(r"\bcat\b", "a cat!")` (whole word)
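
A short sketch of how the anchors behave; the sample strings are invented for illustration:

```python
import re

starts = bool(re.search(r"^Hi", "Hi there"))        # "Hi" is at the start
not_start = bool(re.search(r"^there", "Hi there"))  # "there" is not at the start
ends = bool(re.search(r"there$", "Hi there"))       # "there" is at the end
ends_before_newline = bool(re.search(r"there$", "Hi there\n"))  # $ also matches before a final newline

# \b keeps "cat" from matching inside "concat" or "cats"
whole_words = re.findall(r"\bcat\b", "cat concat cats a cat!")

# \B requires that there is NO boundary, so only the "cat" inside "concat" matches
inside_word = re.search(r"\Bcat", "concat cat").start()

print(starts, not_start, ends, ends_before_newline)
print(whole_words, inside_word)
```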

---

### 3) Character classes (one character)
- `.` any char except newline (by default)
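- `\d` digit (any Unicode decimal digit by default; exactly `[0-9]` with `re.ASCII`) | `\D` non-digit
- `\w` word char (Unicode letters, digits, and `_` by default; exactly `[A-Za-z0-9_]` with `re.ASCII`) | `\W` non-word
- `\s` whitespace (space/tab/newline) | `\S` non-whitespace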
- `[abc]` one of a/b/c
- `[a-z]` range
- `[^a-z]` NOT in range (negated class)
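
The classes above, applied to one made-up string (`s` is a hypothetical sample, not from any real data):

```python
import re

s = "Room 42_b,\tfloor 7"  # hypothetical sample containing a tab

digits = re.findall(r"\d", s)               # single digits
words = re.findall(r"\w+", s)               # runs of letters, digits, underscore
whitespace = re.findall(r"\s", s)           # space, tab, space
in_set = re.findall(r"[Rf7]", s)            # any one of R, f, 7
lower_runs = re.findall(r"[a-z]+", s)       # runs of lowercase ASCII letters
not_lower_or_space = re.findall(r"[^a-z\s]", s)  # negated class
dot = re.findall(r".", "a\nb")              # "." skips the newline by default

print(digits, words, whitespace)
print(in_set, lower_runs, not_lower_or_space, dot)
```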

---

### 4) Quantifiers (how many)
- `*` 0 or more
- `+` 1 or more
- `?` 0 or 1 (optional)
- `{n}` exactly n
- `{n,}` at least n
- `{n,m}` between n and m

**Greedy vs Lazy**
- Greedy: `.*` grabs as much as possible
- Lazy: `.*?` grabs as little as possible

Example:
- `re.search(r"<.*>", "<a> <b>")` → matches `"<a> <b>"`
- `re.search(r"<.*?>", "<a> <b>")` → matches `"<a>"`
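
A small sketch that runs each quantifier on invented inputs, plus the greedy/lazy pair with `re.findall` so every match is visible:

```python
import re

star = re.findall(r"ab*", "a ab abbb")            # "b" zero or more times
plus = re.findall(r"ab+", "a ab abbb")            # "b" one or more times
optional = re.findall(r"colou?r", "color colour") # "u" is optional
exactly3 = re.findall(r"\b\d{3}\b", "12 123 1234")
at_least3 = re.findall(r"\d{3,}", "12 123 1234")
two_to_three = re.findall(r"\b\d{2,3}\b", "1 12 123 1234")

greedy = re.findall(r"<.*>", "<a> <b>")   # one long match
lazy = re.findall(r"<.*?>", "<a> <b>")    # two short matches

print(star, plus, optional)
print(exactly3, at_least3, two_to_three)
print(greedy, lazy)
```

Without the `\b` anchors, `\d{3}` would also match the first three digits of `1234`.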

---

### 5) Groups + Alternation
- `( ... )` capture group
- `(?: ... )` non-capturing group
- `|` OR (alternation)

Examples:
- `re.findall(r"(ab)+", "abab")` → `["ab"]` (the group's last repetition, not the full match `"abab"`; use `(?:ab)+` to get `["abab"]`)
- `re.search(r"cat|dog", "a dog")`
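
A sketch of how capturing and non-capturing groups change what `re.findall` returns; the inputs are invented:

```python
import re

captured = re.findall(r"(ab)+", "abab")        # returns group 1 (last repetition)
non_captured = re.findall(r"(?:ab)+", "abab")  # no groups, so the full match is returned
pairs = re.findall(r"(\w+)=(\d+)", "x=1, y=22")  # two groups -> list of tuples

pet = re.search(r"cat|dog", "a dog").group()   # alternation picks whichever side matches
# A group limits how far "|" reaches: only the vowel alternates here
grays = re.findall(r"gr(?:a|e)y", "gray grey groy")

print(captured, non_captured, pairs, pet, grays)
```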

---

### 6) Match object quick methods
If `m = re.search(...)`:
- `m.group()` → full match
- `m.group(1)` → group 1
- `m.groups()` → all groups tuple
- `m.start(), m.end()` → indices
- `m.span()` → (start, end)
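
The methods above on a concrete match. The pattern and input are deliberately simplified illustrations, not a real email validator:

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "mail bob@example.com now")

if m:  # search returns None when nothing matches
    full = m.group()      # whole match
    user = m.group(1)     # first capture group
    groups = m.groups()   # all capture groups as a tuple
    start, end = m.start(), m.end()
    span = m.span()
    print(full, user, groups, start, end, span)

missing = re.search(r"\d+", "no digits here")
print(missing)  # None, so calling missing.group() would raise AttributeError
```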

---

### 7) Flags (very common)
Use `re.Xxx(..., flags=...)` or inline `(?i)`:
- `re.I` / `re.IGNORECASE` → case-insensitive
- `re.M` / `re.MULTILINE` → `^` and `$` work per line
- `re.S` / `re.DOTALL` → `.` matches newline too
- `re.A` / `re.ASCII` → make `\w \d \s` ASCII-only

Examples:
- `re.search(r"(?i)hello", "HeLLo")`
- `re.search(r"^a", "x\na", flags=re.M)`
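
A minimal sketch of each flag on invented inputs, including combining flags with `|`. The string `"3\u0663"` is an ASCII `3` followed by an Arabic-Indic digit three, chosen here only to show the effect of `re.A`:

```python
import re

ci_flag = re.search(r"hello", "HeLLo", flags=re.I).group()
ci_inline = re.search(r"(?i)hello", "HeLLo").group()

lines = "one\ntwo\nthree"
first_only = re.findall(r"^\w+", lines)              # ^ matches only at the very start
every_line = re.findall(r"^\w+", lines, flags=re.M)  # ^ matches at each line start

dot_default = re.search(r"a.b", "a\nb")              # "." does not match newline
dot_all = re.search(r"a.b", "a\nb", flags=re.S)      # now it does

both = re.findall(r"^x$", "X\nx", flags=re.I | re.M)  # flags combine with |

unicode_digits = re.findall(r"\d", "3\u0663")
ascii_digits = re.findall(r"\d", "3\u0663", flags=re.A)

print(ci_flag, ci_inline, first_only, every_line)
print(dot_default, dot_all is not None, both)
print(len(unicode_digits), len(ascii_digits))
```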

---

### 8) Escaping special characters
Special regex chars: `. ^ $ * + ? { } [ ] \ | ( )`
To match them literally, escape with `\`.

Examples:
- `re.search(r"\.", "a.b")` → matches `.`
- `re.search(r"\$", "$100")` → matches `$`
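
A sketch contrasting escaped and unescaped special characters. It also uses `re.escape`, which escapes a whole string for you; `user_text` is a hypothetical stand-in for text you did not write yourself:

```python
import re

unescaped = re.findall(r"a.b", "a.b axb")   # "." matches any character
escaped = re.findall(r"a\.b", "a.b axb")    # "\." matches a literal dot only

price = re.search(r"\$\d+", "cost: $100").group()

# Inside [...] most special characters are already literal
in_class = re.findall(r"[.$]", "a.b $1")

user_text = "1+1=2?"             # hypothetical input containing + and ?
safe = re.escape(user_text)      # builds a pattern that matches the text literally
found = re.search(safe, "is 1+1=2? yes").group()

print(unescaped, escaped, price, in_class, found)
```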

---

### 9) Lookarounds (advanced, but they show up on exams)
- `(?=...)` positive lookahead (next chars must match)
- `(?!...)` negative lookahead
- `(?<=...)` positive lookbehind
- `(?<!...)` negative lookbehind

Examples:
- `re.findall(r"\d+(?=kg)", "5kg 10lb 7kg")` → `["5","7"]`
- `re.findall(r"(?<=\$)\d+", "$50 and $20")` → `["50","20"]`
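
All four lookarounds on invented inputs. Lookarounds check the surrounding text without including it in the match, and in Python's `re` a lookbehind must have a fixed width:

```python
import re

weights = "5kg 10lb 7kg"
kg = re.findall(r"\d+(?=kg)", weights)           # digits followed by "kg"
# The extra \d alternative stops the engine from backtracking to a partial number
not_kg = re.findall(r"\d+(?!\d|kg)", weights)    # digits NOT followed by "kg"

dollars = re.findall(r"(?<=\$)\d+", "$50 and $20")      # digits preceded by "$"
no_dollar = re.findall(r"(?<!\$)\b\d+", "$50 and 20")   # digits NOT preceded by "$"

print(kg, not_kg, dollars, no_dollar)
```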

---

### 10) Common patterns (copy-paste)
Digits / integers:
- `r"\d+"`                 → one or more digits
- `r"[+-]?\d+"`            → optional sign + digits

Decimal number:
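- `r"[+-]?\d+(?:\.\d+)?"`  → integer or float like 12 or 12.34 (non-capturing group, so `re.findall` returns the full number instead of only the fractional part)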

Word (letters only):
- `r"[A-Za-z]+"`

Alphanumeric + underscore:
- `r"\w+"`

Simple email (basic):
- `r"^[\w\.-]+@[\w\.-]+\.\w+$"`

US phone (very basic):
- `r"^\d{3}-\d{3}-\d{4}$"`  → 123-456-7890

Date (dd-mm-yyyy or dd/mm/yyyy; checks the shape only, not whether the day or month is valid):
- `r"^\d{2}[-/]\d{2}[-/]\d{4}$"`
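
A sketch of how these patterns might be used for quick validation. The helper `is_valid` and the constant names are hypothetical, and the patterns only check the shape of the input:

```python
import re

EMAIL = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
PHONE = re.compile(r"^\d{3}-\d{3}-\d{4}$")
DATE = re.compile(r"^\d{2}[-/]\d{2}[-/]\d{4}$")
DECIMAL = re.compile(r"[+-]?\d+(?:\.\d+)?")

def is_valid(pattern, value):
    """Hypothetical helper: True if the compiled pattern matches the value."""
    return pattern.search(value) is not None

email_ok = is_valid(EMAIL, "a.b@example.com")
email_bad = is_valid(EMAIL, "not-an-email")
phone_ok = is_valid(PHONE, "123-456-7890")
phone_bad = is_valid(PHONE, "1234567890")
date_ok = is_valid(DATE, "31/12/2024")
date_nonsense = is_valid(DATE, "99-99-9999")   # passes: only the shape is checked
numbers = DECIMAL.findall("x=-3.5, y=12")

print(email_ok, email_bad, phone_ok, phone_bad, date_ok, date_nonsense, numbers)
```

For real dates, parsing with `datetime.strptime` after the shape check is a safer choice than a longer regex.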

---

### 11) Replace + Split examples
Replace multiple spaces with one:
- `re.sub(r"\s+", " ", s)`

Extract all numbers:
- `re.findall(r"\d+", s)`

Split on comma/semicolon/space:
- `re.split(r"[,\s;]+", s)`
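
The three recipes above with concrete inputs, plus two `re.sub` variations (a backreference and the `count` argument). All sample strings are invented:

```python
import re

collapsed = re.sub(r"\s+", " ", "a   b \n c")          # any whitespace run -> one space
numbers = re.findall(r"\d+", "order 12, qty 3")
tokens = re.split(r"[,\s;]+", "a, b;c  d")

# A leading delimiter produces an empty first element
leading = re.split(r"[,\s;]+", ",a,b")

swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")    # \1 and \2 refer to the groups
first_only = re.sub(r"\d", "#", "a1b2c3", count=1)     # replace only the first match

print(collapsed, numbers, tokens, leading, swapped, first_only)
```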

---

### 12) Mini mental model (fast)
1) Decide **where**: anchors `^ $ \b`
2) Decide **what**: classes `\d \w \s [A-Z]`
3) Decide **how many**: `+ * ? {n,m}`
4) Add **groups**: `( )` and OR `|`
5) Escape literals: `\.` `\?` `\$`
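
The five steps can be walked through on a small, made-up task: pulling version numbers such as `v1.2` out of a sentence. The task, pattern, and sample text are illustrative assumptions:

```python
import re

# 1) where:    \b ... \b        match whole tokens only
# 2) what:     v or V, digits
# 3) how many: \d+ for major, \d{1,2} for minor
# 4) groups:   (?:v|V) for the OR, ( ) to capture major and minor
# 5) escape:   \. for the literal dot
version = re.compile(r"\b(?:v|V)(\d+)\.(\d{1,2})\b")

text = "upgrade from v1.2 to V10.0, not v3x4"  # hypothetical sample
found = version.findall(text)
print(found)
```

`v3x4` is skipped because the escaped dot must match a literal `.`, not any character.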
