### Regex basics

A basics guide to **Regex (Regular Expressions)** in Python.

### ✅ **What is Regex?**
Regex is a tool for matching patterns in text. In Python, you typically use the `re` module.

```python
import re
```

---

### ✅ **Common Functions**

1. **re.search()** – Returns the first match found anywhere in the string, or `None`
2. **re.match()** – Matches only at the beginning of the string
3. **re.findall()** – Returns all non-overlapping matches as a list
4. **re.sub()** – Returns a new string with each match replaced
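
To see how the four functions differ, here is a minimal sketch that runs each one against the same sample string. The string and variable names are made up for illustration.

```python
import re

text = "cat bat cat"  # hypothetical sample string

# search scans the whole string and returns the first match
first = re.search(r"bat", text)

# match only succeeds if the pattern matches at position 0
at_start = re.match(r"bat", text)      # None: the string starts with "cat"
at_start_ok = re.match(r"cat", text)   # matches "cat" at position 0

# findall returns every non-overlapping match as a list of strings
all_cats = re.findall(r"cat", text)

# sub returns a new string with each match replaced
replaced = re.sub(r"cat", "dog", text)

print(first.span())         # (4, 7)
print(at_start)             # None
print(at_start_ok.group())  # cat
print(all_cats)             # ['cat', 'cat']
print(replaced)             # dog bat dog
```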

---

### ✅ **Basic Syntax**

| Symbol | Meaning                        | Example                  |
|--------|--------------------------------|--------------------------|
| `.`    | Any character except newline   | `a.c` matches `abc`, `a-c` |
| `^`    | Start of string                | `^Hello` matches `"Hello world"` |
| `$`    | End of string                  | `world$` matches `"Hello world"` |
| `*`    | 0 or more repetitions          | `ab*` matches `a`, `ab`, `abb` |
| `+`    | 1 or more repetitions          | `ab+` matches `ab`, `abb` |
| `?`    | 0 or 1 repetition (optional)   | `ab?` matches `a`, `ab` |
| `{n}`  | Exactly n repetitions          | `a{3}` matches `aaa` |
| `{n,}` | n or more repetitions          | `a{2,}` matches `aa`, `aaa`, etc. |
| `{n,m}`| Between n and m repetitions    | `a{2,4}` matches `aa`, `aaa`, `aaaa` |
| `[]`   | Character class (set)          | `[aeiou]` matches any single lowercase vowel |
| `\|`   | OR                             | `cat\|dog` matches `cat` or `dog` |
| `()`   | Grouping                       | `(abc)+` matches `abc`, `abcabc` |
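
One way to check rows of this table is to test whole strings with `re.fullmatch()`, which succeeds only when the entire string fits the pattern. The sketch below uses a hand-picked list of (pattern, candidate, expected) triples; the list and its names are illustrative, not exhaustive.

```python
import re

# (pattern, candidate string, expected result) based on the table rows
checks = [
    (r"a.c", "a-c", True),
    (r"ab*", "a", True),
    (r"ab+", "a", False),         # + needs at least one "b"
    (r"ab?", "abb", False),       # ? allows at most one "b"
    (r"a{2,4}", "aaaaa", False),  # five repetitions exceed the upper bound
    (r"[aeiou]", "e", True),
    (r"cat|dog", "dog", True),
    (r"(abc)+", "abcabc", True),
]

# fullmatch returns a match object or None, so compare against None
results = [re.fullmatch(p, s) is not None for p, s, _ in checks]

for (p, s, expected), got in zip(checks, results):
    print(f"{p!r:12} {s!r:10} expected={expected} got={got}")
```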

---

### ✅ **Special Sequences**

| Sequence | Matches                         |
|----------|----------------------------------|
| `\d`     | Digit (`0-9`; other Unicode digits too unless `re.ASCII` is set) |
| `\D`     | Non-digit                       |
| `\w`     | Word character (`a-zA-Z0-9_`; other Unicode letters and digits too unless `re.ASCII` is set) |
| `\W`     | Non-word character              |
| `\s`     | Whitespace (space, tab, newline)|
| `\S`     | Non-whitespace                  |
| `\b`     | Word boundary                   |
| `\B`     | Not a word boundary             |

---

### ✅ **Examples**

```python
import re

# Search for the first run of three digits
print(re.search(r"\d{3}", "Phone: 123-4567"))
# Output: <re.Match object; span=(7, 10), match='123'>

# Extract all email addresses (a simplified pattern, not a full validator)
text = "Contact: alice@mail.com and bob@mail.org"
emails = re.findall(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+", text)
print(emails)
# Output: ['alice@mail.com', 'bob@mail.org']

# Replace each digit with X
print(re.sub(r"\d", "X", "Account: 4567"))
# Output: Account: XXXX
```

---

### ✅ **Flags**

- `re.IGNORECASE` or `re.I` – case-insensitive matching
- `re.MULTILINE` or `re.M` – `^` and `$` match at the start and end of each line, not just of the whole string
- `re.DOTALL` or `re.S` – `.` also matches newline

```python
re.search(r"hello", "HeLLo World", re.I)
# Output: <re.Match object; span=(0, 5), match='HeLLo'>
```
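
The other two flags are easier to understand on a string that contains a newline. The following is a minimal sketch with an invented two-line string; it also shows that flags can be combined with `|`.

```python
import re

text = "first line\nsecond line"  # hypothetical two-line string

# Without re.M, ^ matches only at the very start of the string
no_flag = re.findall(r"^\w+", text)

# With re.M, ^ also matches right after each newline
multiline = re.findall(r"^\w+", text, re.M)

# Without re.S the dot stops at the newline; with it, the dot crosses lines
dot_default = re.search(r"first.*line", text).group()
dot_all = re.search(r"first.*line", text, re.S).group()

# Flags can be combined with |
combined = re.findall(r"^\w+ LINE$", text, re.I | re.M)

print(no_flag)        # ['first']
print(multiline)      # ['first', 'second']
print(dot_default)    # first line
print(repr(dot_all))  # 'first line\nsecond line'
print(combined)       # ['first line', 'second line']
```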

---

### ✅ **Pro tip: Always use raw strings**

Prefix regex patterns with `r` (as in `r"..."`) so that Python leaves backslashes alone and passes them to the regex engine unchanged.

```python
pattern = r"\d{3}-\d{4}"  # Instead of "\\d{3}-\\d{4}"
```
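
A case where the missing `r` silently changes the result is `\b`: in a normal Python string it is the backspace character, not a word boundary. A short sketch, using a made-up sample string:

```python
import re

# In a normal string "\b" is one character (backspace);
# in a raw string r"\b" is two characters (backslash and "b").
lengths = (len("\b"), len(r"\b"))
print(lengths)  # (1, 2)

text = "a cat sat"  # hypothetical sample string

plain = re.search("\bcat\b", text)  # looks for backspace + "cat" + backspace
raw = re.search(r"\bcat\b", text)   # looks for "cat" between word boundaries

print(plain)        # None
print(raw.group())  # cat

# The raw pattern and the doubled-backslash pattern are the same string
same = r"\d{3}-\d{4}" == "\\d{3}-\\d{4}"
print(same)  # True
```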

---



https://www.youtube.com/watch?v=wnuBwl2ekmo

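### ✅ **Compiled Patterns**

```python
import re

# One or more uppercase letters, anchored to the start and end of the string
pattern = re.compile(r"^[A-Z]+$")
```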

```python
# search() on the anchored pattern: only an all-uppercase string matches
print(pattern.search("Hello Word"))   # None
print(pattern.search("HellO Word"))   # None
print(pattern.search("HELLO WORLD"))  # None (the space is not in [A-Z])
print(pattern.search("HELLOWORLD"))   # <re.Match object; span=(0, 10), match='HELLOWORLD'>
```


```python
# match() gives the same results here because ^ already anchors the pattern
print(pattern.match("Hello Word"))   # None
print(pattern.match("HELLO WORLD"))  # None
print(pattern.match("HELLOWORLD"))   # <re.Match object; span=(0, 10), match='HELLOWORLD'>
```
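
`search()` and `match()` agree above only because the pattern carries its own `^` and `$`. The sketch below drops the anchors to show where the methods diverge, and uses `fullmatch()` as an alternative way to require that the whole string matches. The `bare` pattern and the test strings are assumptions added for illustration.

```python
import re

anchored = re.compile(r"^[A-Z]+$")  # same pattern as above
bare = re.compile(r"[A-Z]+")        # hypothetical unanchored variant

# Without anchors, match() accepts an uppercase prefix of a longer string
prefix_hit = bare.match("HELLO world")
print(prefix_hit.group())                    # HELLO

# fullmatch() requires the whole string to match, no anchors needed
print(bare.fullmatch("HELLO world"))         # None
print(bare.fullmatch("HELLOWORLD").group())  # HELLOWORLD

# $ also matches just before a trailing newline, so search() still succeeds
newline_search = anchored.search("HELLO\n")
print(newline_search is not None)            # True

# fullmatch() rejects the trailing newline
newline_full = bare.fullmatch("HELLO\n")
print(newline_full)                          # None
```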


```python
# Letters and whitespace only; the raw string keeps \s from being read as a string escape
pattern = re.compile(r"^[a-zA-Z\s]+$")
```


```python
print(pattern.search("Hello Word"))   # <re.Match object; span=(0, 10), match='Hello Word'>
print(pattern.search("HELLO WORLD"))  # <re.Match object; span=(0, 11), match='HELLO WORLD'>
print(pattern.search("HELLOWORLD"))   # <re.Match object; span=(0, 10), match='HELLOWORLD'>
```


```python
# Pattern for a string made of, in order:
#   3 lowercase letters
#   3-5 digits
#   one symbol (any character that is not a letter or digit)
#   up to two uppercase letters
pattern = re.compile(r"^[a-z]{3}[0-9]{3,5}[^a-zA-Z0-9][A-Z]{0,2}$")

print(pattern.search("ahd3234#AJ"))
# <re.Match object; span=(0, 10), match='ahd3234#AJ'>
```
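
The last requirement is where the choice of quantifier matters: `{2}` demands exactly two uppercase letters, while `{0,2}` accepts zero, one, or two. Here is a small sketch comparing the two on a few made-up inputs; the names `prefix`, `exactly_two`, `up_to_two`, and `samples` are illustrative.

```python
import re

# Shared start: 3 lowercase letters, 3-5 digits, one non-alphanumeric symbol
prefix = r"^[a-z]{3}[0-9]{3,5}[^a-zA-Z0-9]"

exactly_two = re.compile(prefix + r"[A-Z]{2}$")  # exactly two uppercase letters
up_to_two = re.compile(prefix + r"[A-Z]{0,2}$")  # zero, one, or two

samples = [
    "ahd3234#AJ",   # fits both patterns
    "ahd3234#A",    # one uppercase letter
    "ahd3234#",     # no uppercase letters
    "ahd3234#AJK",  # three uppercase letters
    "ah3234#AJ",    # only two lowercase letters
    "ahd32#AJ",     # only two digits
]

strict = [bool(exactly_two.search(s)) for s in samples]
loose = [bool(up_to_two.search(s)) for s in samples]

for s, a, b in zip(samples, strict, loose):
    print(f"{s!r:14} exactly_two={a} up_to_two={b}")
```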
