**Python RegEx**

**RegEx Module**

To use regular expressions in Python, import the built-in `re` module.

In [1]:
import re


**RegEx in Python**

A regular expression (RegEx) is a pattern that describes a set of strings. The `re` functions use such patterns to search, match, split, or replace text.

In [2]:
pattern = r"\d+"  # Matches one or more digits
text = "There are 3 cats and 5 dogs."
result = re.findall(pattern, text)
print(result)  # Output: ['3', '5']


['3', '5']


**RegEx Functions**


| Function    | Description                                                          | Example                                   |
| ----------- | -------------------------------------------------------------------- | ----------------------------------------- |
| `findall()` | Returns a list of all non-overlapping matches                        | `re.findall(r"\d+", "12 cats 9 dogs")`    |
| `search()`  | Returns a match object for the first match, or `None`                | `re.search(r"cat", "my cat is cute")`     |
| `match()`   | Matches only at the beginning of the string, or returns `None`       | `re.match(r"Hello", "Hello World")`       |
| `split()`   | Splits the string at each match of the pattern                       | `re.split(r"\s", "Split this sentence")`  |
| `sub()`     | Replaces each match of the pattern with a replacement string         | `re.sub(r"dog", "cat", "dog is barking")` |
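
The difference between `search()` and `match()` is easy to miss, so here is a minimal sketch that reuses the sample string from the table. The variable names are illustrative only.

```python
import re

text = "my cat is cute"

# search() scans the whole string and stops at the first match.
found = re.search(r"cat", text)
print(found.group(), found.span())  # cat (3, 6)

# match() only tries at position 0, so "cat" is not found here.
print(re.match(r"cat", text))  # None

# match() succeeds when the pattern matches at the very start.
print(re.match(r"my", text).group())  # my
```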


**Metacharacters**

Special characters that define RegEx rules:

| Metacharacter | Description                                 | Example                               |
| ------------- | ------------------------------------------- | ------------------------------------- |
| `.`           | Any character except newline                | `a.b` matches `acb`                   |
| `^`           | Start of the string                         | `^Hello` matches `Hello World`        |
| `$`           | End of the string                           | `end$` matches `the end`              |
| `*`           | 0 or more repetitions                       | `lo*l` matches `ll`, `lol`, `loool`   |
| `+`           | 1 or more repetitions                       | `lo+l` matches `lol`, `lool`          |
| `?`           | 0 or 1 repetition                           | `colou?r` matches `color` or `colour` |
| `[]`          | Set of characters                           | `[aeiou]` matches any vowel           |
| `{}`          | Exactly the specified number of repetitions | `\d{3}` matches 3 digits              |
| `\|`          | Either or                                   | `cat\|dog` matches `cat` or `dog`     |
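
The quantifiers and alternation are easier to compare side by side. The sketch below runs a few of the table's patterns against a made-up word list; `re.fullmatch()` is used so that each word must match the pattern in its entirety.

```python
import re

words = ["ll", "lol", "loool", "color", "colour", "colouur"]

# * allows zero or more "o"; + requires at least one.
star = [w for w in words if re.fullmatch(r"lo*l", w)]
plus = [w for w in words if re.fullmatch(r"lo+l", w)]
print(star)  # ['ll', 'lol', 'loool']
print(plus)  # ['lol', 'loool']

# ? makes the "u" optional, but does not allow two of them.
optional_u = [w for w in words if re.fullmatch(r"colou?r", w)]
print(optional_u)  # ['color', 'colour']

# | matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)  # ['cat', 'dog']

# {3} matches exactly three digits at a time.
triples = re.findall(r"\d{3}", "123-456-7890")
print(triples)  # ['123', '456', '789']
```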


**Flags**

Modify matching behavior:

In [3]:
re.findall(r"hello", "HELLO hello", flags=re.IGNORECASE)
# Output: ['HELLO', 'hello']


['HELLO', 'hello']

Common flags:

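* `re.IGNORECASE` or `re.I`: case-insensitive matching

* `re.MULTILINE` or `re.M`: `^` and `$` also match at the start and end of each line

* `re.DOTALL` or `re.S`: `.` also matches a newline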
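
To show what the other two flags change, here is a small sketch on a made-up two-line string. It compares each pattern with and without the flag.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ anchors only at the start of the whole string.
default_starts = re.findall(r"^\w+", text)
print(default_starts)  # ['first']

# With re.MULTILINE, ^ also anchors right after each newline.
multiline_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(multiline_starts)  # ['first', 'second']

# By default . stops at a newline, so this pattern cannot span both lines.
print(re.search(r"first.*second", text))  # None

# With re.DOTALL, . also matches the newline.
spanning = re.search(r"first.*second", text, flags=re.DOTALL)
print(repr(spanning.group()))  # 'first line\nsecond'
```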

**Special Sequences**

| Sequence | Description                                   | Example                                   |
| -------- | --------------------------------------------- | ----------------------------------------- |
| `\d`     | Digit (0–9)                                   | `\d+` matches `123`                       |
| `\D`     | Non-digit                                     | `\D+` matches `abc`                       |
| `\w`     | Word character (letter, digit, or underscore) | `\w+` matches `hello123`                  |
| `\W`     | Non-word character                            | `\W+` matches `@#$`                       |
| `\s`     | Whitespace                                    | `\s+` matches spaces or tabs              |
| `\S`     | Non-whitespace                                | `\S+` matches `hello`                     |
| `\b`     | Word boundary                                 | `\bword\b` matches `word` as a whole word |
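
The word boundary `\b` is the sequence that most often surprises people, so the following sketch contrasts it with a plain pattern. The sample strings are invented for illustration.

```python
import re

text = "cat concat cats cat."

# A plain pattern also matches inside longer words.
anywhere = re.findall(r"cat", text)
print(len(anywhere))  # 4

# \b restricts the match to "cat" as a standalone word.
whole_words = re.findall(r"\bcat\b", text)
print(whole_words)  # ['cat', 'cat']

# \w covers letters, digits, and the underscore.
tokens = re.findall(r"\w+", "user_1 @ home")
print(tokens)  # ['user_1', 'home']
```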


**Sets**

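A set, written in square brackets, matches any single character listed in it. Ranges such as `[a-z]` and negation with a leading `^`, as in `[^0-9]`, are also allowed.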
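
A short sketch of the common set forms, run against a made-up string:

```python
import re

text = "Room B12, Floor 3"

# An explicit list of characters: any lowercase vowel.
vowels = re.findall(r"[aeiou]", text)
print(vowels)  # ['o', 'o', 'o', 'o']

# A range: any uppercase ASCII letter.
capitals = re.findall(r"[A-Z]", text)
print(capitals)  # ['R', 'B', 'F']

# A range with a quantifier: runs of digits.
numbers = re.findall(r"[0-9]+", text)
print(numbers)  # ['12', '3']

# A negated set: anything that is not a letter, digit, or space.
others = re.findall(r"[^A-Za-z0-9 ]", text)
print(others)  # [',']
```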

**findall() Function**

Returns a list of all non-overlapping matches in the string.

In [4]:
text = "Emails: user1@mail.com, user2@mail.com"
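# r"\S+@\S+" would also capture the comma after the first address,
# so the pattern restricts which characters an address may contain.
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
print(emails)
# Output: ['user1@mail.com', 'user2@mail.com']


['user1@mail.com', 'user2@mail.com']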


**search() Function**

Scans the string and returns a match object for the first match, or `None` if there is no match.

In [5]:
text = "My phone is 123-456-7890"
match = re.search(r"\d{3}-\d{3}-\d{4}", text)
print(match.group())  # Output: 123-456-7890


123-456-7890


**split() Function**

Splits a string based on the pattern.

In [6]:
text = "apple,banana;grape orange"
fruits = re.split(r"[;, ]", text)
print(fruits)
# ['apple', 'banana', 'grape', 'orange']


['apple', 'banana', 'grape', 'orange']


**sub() Function**

Replaces every match of the pattern with a replacement string.

In [7]:
text = "The cat sat on the mat"
new_text = re.sub(r"cat|mat", "dog", text)
print(new_text)  # Output: The dog sat on the dog


The dog sat on the dog


**Match Object**

A match object holds the details of a successful `search()` or `match()` call, such as the matched text and its position in the string.

In [8]:
text = "Order #12345 placed"
match = re.search(r"#\d+", text)
if match:
    print(match.group())  # Output: #12345
    print(match.start())  # Output: 6 (start index)
    print(match.end())    # Output: 12 (end index, exclusive)


#12345
6
12


**10 Real-World Problem Statements Using Python RegEx**

**1. Email Validator**

Validate if input strings are valid email addresses.
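
One possible starting point is sketched below. The pattern and the name `is_valid_email` are assumptions made for illustration; the pattern is deliberately simplified and accepts far fewer forms than the full email address syntax allows.

```python
import re

# Simplified, hypothetical pattern: local part, "@", then a domain
# with at least one dot. Real address rules are much broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def is_valid_email(address):
    """Return True if the whole string looks like an email address."""
    return EMAIL_RE.fullmatch(address) is not None

print(is_valid_email("user1@mail.com"))  # True
print(is_valid_email("user@mail"))       # False
print(is_valid_email("a b@mail.com"))    # False
```

Using `fullmatch()` rather than `search()` means that stray characters before or after the address cause the check to fail.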

**2. Password Strength Checker**

Ensure passwords contain uppercase, lowercase, numbers, and special characters.
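
A minimal sketch of such a checker follows, running one `re.search()` per character class. The function name and the minimum length of 8 are assumptions; the problem statement itself does not specify a length.

```python
import re

def is_strong_password(password, min_length=8):
    """Check length plus the four character classes from the problem statement."""
    checks = [
        len(password) >= min_length,           # assumed minimum length
        re.search(r"[A-Z]", password),         # at least one uppercase letter
        re.search(r"[a-z]", password),         # at least one lowercase letter
        re.search(r"\d", password),            # at least one digit
        re.search(r"[^A-Za-z0-9]", password),  # at least one special character
    ]
    return all(checks)

print(is_strong_password("Passw0rd!"))   # True
print(is_strong_password("password"))    # False
print(is_strong_password("PASSWORD1!"))  # False
```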

**3. Phone Number Formatter**

Clean and format user-input phone numbers to a consistent style.
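
As a sketch, the formatter can strip everything that is not a digit and then regroup the digits. The 10-digit length and the `123-456-7890` target style are assumptions borrowed from the `search()` example earlier; `format_phone` is a hypothetical name.

```python
import re

def format_phone(raw):
    """Return the number as XXX-XXX-XXXX, or None if it does not have 10 digits."""
    digits = re.sub(r"\D", "", raw)  # remove every non-digit character
    if len(digits) != 10:
        return None
    # Regroup using backreferences to the three captured groups.
    return re.sub(r"(\d{3})(\d{3})(\d{4})", r"\1-\2-\3", digits)

print(format_phone("(123) 456 7890"))  # 123-456-7890
print(format_phone("123.456.7890"))    # 123-456-7890
print(format_phone("12345"))           # None
```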

**4. Log File Analyzer**

Extract timestamps, IPs, and error messages from logs using regex.
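
Named groups keep this kind of extraction readable. The sketch below assumes a made-up log layout (`timestamp ip LEVEL message`); the sample lines are invented, and a real log format would need its own pattern.

```python
import re

# Hypothetical line layout: "YYYY-MM-DD HH:MM:SS <ip> <LEVEL> <message>"
LOG_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)

# Invented sample data for illustration.
sample_log = """\
2024-01-15 10:32:01 192.168.0.10 INFO User logged in
2024-01-15 10:32:07 10.0.0.5 ERROR Database connection failed
"""

errors = []
for line in sample_log.splitlines():
    m = LOG_RE.fullmatch(line)
    if m and m.group("level") == "ERROR":
        errors.append(m.groupdict())  # dict keyed by group name

print(errors[0]["timestamp"])  # 2024-01-15 10:32:07
print(errors[0]["ip"])         # 10.0.0.5
print(errors[0]["message"])    # Database connection failed
```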

**5. Form Input Cleaner**

Remove extra whitespace, symbols, and unwanted characters from form fields.
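
A sketch using two `re.sub()` passes is shown below. Which characters count as "unwanted" depends on the field, so the allowed set here (word characters, whitespace, and `. , @ ' -`) is only an assumption, as is the name `clean_field`.

```python
import re

def clean_field(value):
    """Drop disallowed symbols, collapse whitespace runs, and trim the ends."""
    value = re.sub(r"[^\w\s.,@'-]", "", value)  # assumed allow-list of characters
    value = re.sub(r"\s+", " ", value)          # collapse tabs, newlines, repeated spaces
    return value.strip()

print(clean_field("  John   ##Doe!!  "))  # John Doe
print(clean_field("a\t\tb\n"))            # a b
```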

**6. Invoice Number Extractor**

Find and extract invoice numbers from text or emails.
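
Invoice numbering schemes vary by organization, so the sketch below assumes a hypothetical format: the prefix `INV-` followed by at least four digits. The sample text is invented.

```python
import re

# Assumed format: "INV-" plus four or more digits, as a whole word.
INVOICE_RE = re.compile(r"\bINV-\d{4,}\b")

text = "Please pay INV-20231 and INV-20245 by Friday. Ref: INV-12."
invoices = INVOICE_RE.findall(text)
print(invoices)  # ['INV-20231', 'INV-20245']
```

`INV-12` is skipped because it has fewer than four digits.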

**7. Chat Profanity Filter**

Replace or detect inappropriate words using regex patterns.
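
One way to sketch the filter is to build an alternation from a word list and mask each match. The list below contains mild placeholder words, and `censor` is a hypothetical helper name.

```python
import re

BLOCKED_WORDS = ["darn", "heck"]  # placeholder list for illustration

# re.escape() guards against special characters in the word list;
# \b keeps the filter from matching inside longer words.
BLOCKED_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, BLOCKED_WORDS)) + r")\b",
    flags=re.IGNORECASE,
)

def censor(message):
    """Replace each blocked word with asterisks of the same length."""
    return BLOCKED_RE.sub(lambda m: "*" * len(m.group()), message)

print(censor("Darn, what the heck!"))  # ****, what the ****!
print(censor("Do not heckle"))         # Do not heckle
```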

**8. URL Extractor**

Extract all valid URLs from a web page or string.
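
The following sketch only looks for `http://` and `https://` links and does not verify that a URL is well-formed or reachable, so it is a rough first pass rather than a validator. The name `extract_urls` and the sample text are assumptions.

```python
import re

# Assumption: a URL starts with http:// or https:// and runs until
# whitespace or an angle bracket.
URL_RE = re.compile(r"https?://[^\s<>]+")

def extract_urls(text):
    """Find URL-like substrings and trim trailing punctuation."""
    return [url.rstrip(".,;:!?)\"'") for url in URL_RE.findall(text)]

text = "Docs: https://docs.python.org/3/library/re.html, mirror at http://example.com."
urls = extract_urls(text)
print(urls)
# ['https://docs.python.org/3/library/re.html', 'http://example.com']
```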

**9. Date Format Converter**

Identify and convert dates in various formats (e.g., DD/MM/YYYY to YYYY-MM-DD).
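
For the DD/MM/YYYY to YYYY-MM-DD case mentioned above, capture groups and backreferences are enough. This sketch reorders the parts but does not check that the date is real (for example, it would accept `99/99/2024`); `to_iso` is a hypothetical name.

```python
import re

def to_iso(text):
    """Rewrite every DD/MM/YYYY date in the text as YYYY-MM-DD."""
    # Groups: 1 = day, 2 = month, 3 = year; the replacement reorders them.
    return re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\3-\2-\1", text)

print(to_iso("Due 25/12/2024 and 01/01/2025"))
# Due 2024-12-25 and 2025-01-01
```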

**10. Tag Parser**

Extract hashtags, mentions, or custom markup from social media posts or comments.
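
A sketch for hashtags and mentions is shown below. It assumes a tag is `#` or `@` followed by word characters, and uses a negative lookbehind so that the `@` inside an email address is not treated as a mention. The sample post is invented.

```python
import re

post = "Loving #Python3 and #regex with @guido! Email me@site.com"

# (?<!\w) requires that the symbol is not preceded by a word character.
hashtags = re.findall(r"(?<!\w)#(\w+)", post)
mentions = re.findall(r"(?<!\w)@(\w+)", post)

print(hashtags)  # ['Python3', 'regex']
print(mentions)  # ['guido']
```

Because each pattern has one capture group, `findall()` returns only the tag names without the leading symbol.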