# Title: Python Series – Day 28: Regular Expressions (re module) in Python

## 1. Introduction
**Regular Expressions (Regex)** are sequences of characters that define a search pattern. They are extremely powerful for text processing, validation, and searching.

**Why use Regex?**
- **Searching:** Find specific patterns in large text files.
- **Validation:** Ensure inputs like emails and passwords follow correct formats.
- **Scraping:** Extract specific data (links, dates) from web pages.

**Real-world uses:**
- Extracting emails from a document.
- Validating user registration forms.
- Cleaning up messy data.

## 2. Importing re Module
Python has a built-in module named `re` for working with regular expressions.

In [None]:
import re

## 3. Basic Regex Functions

### Common Functions:
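1. **`re.search(pattern, string)`**: Scans the whole string and returns a match object for the **first** location where the pattern matches, or `None` if nothing matches.
2. **`re.match(pattern, string)`**: Checks for a match **only at the beginning** of the string; returns a match object or `None`.
3. **`re.findall(pattern, string)`**: Returns **all** non-overlapping matches as a list of strings (or of tuples when the pattern has more than one capturing group).
4. **`re.finditer(pattern, string)`**: Returns an **iterator** yielding one match object per match.
5. **`re.sub(pattern, replacement, string)`**: Replaces occurrences of the pattern with the replacement and returns the new string.
6. **`re.fullmatch(pattern, string)`**: Returns a match object only if the **entire** string matches the pattern (used for validation later in this lesson).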

In [None]:
text = "Hello world, hello Python"

# 1. re.search - finds first match anywhere
match = re.search("hello", text)
print(f"Search: {match}") # Returns a match object

# 2. re.match - finds match only at start
match_start = re.match("Hello", text)
match_fail = re.match("world", text)
print(f"Match (Start): {match_start}")
print(f"Match (Fail): {match_fail}")

# 3. re.findall - finds all occurrences
all_matches = re.findall("hello", text, re.IGNORECASE)
print(f"Find All: {all_matches}")

# 4. re.sub - replace
new_text = re.sub("world", "Universe", text)
print(f"Substituted: {new_text}")
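`re.finditer` is listed among the common functions but not exercised in the cell above. A minimal sketch on the same sample sentence (the variable name `found` is only illustrative) shows how each yielded match object carries the matched text and its position:

```python
import re

text = "Hello world, hello Python"

# finditer yields one match object per occurrence, in order of appearance
found = [(m.group(), m.start(), m.end())
         for m in re.finditer("hello", text, re.IGNORECASE)]

for word, start, end in found:
    print(f"{word!r} found at positions {start}-{end}")
```

Compared with `findall`, this is handy when you need to know where each match sits, not just what it is.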

## 4. Special Characters & Metacharacters
Regex uses special characters to define patterns.

| Symbol | Description | Example |
|---|---|---|
| `\d` | Any digit (0-9) | `ID: \d+` |
| `\D` | Any non-digit | `\D` |
| `\w` | Word character (a-z, A-Z, 0-9, _; also other Unicode letters and digits in `str` patterns) | `\w+` |
| `\W` | Non-word character | `\W` |
| `\s` | Whitespace (space, tab, newline) | `Hello\sWorld` |
| `\S` | Non-whitespace | `\S+` |
| `.` | Any character (except newline) | `h.t` matches hat, hit |
| `^` | Start of the string | `^Hello` |
| `$` | End of the string | `World$` |
| `+` | One or more occurrences | `a+` |
| `*` | Zero or more occurrences | `a*` |
| `?` | Zero or one occurrence (optional) | `colors?` matches color, colors |
| `[]` | Character set | `[aeiou]` |
| `{n}` | Exactly n occurrences | `\d{4}` |

In [None]:
sample = "Order #123 was placed on 2024-01-25"

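# Find digits (pattern kept outside the f-string: backslashes inside
# f-string expressions are a SyntaxError before Python 3.12)
digits = re.findall(r"\d+", sample)
print(f"Digits: {digits}")

# Find non-whitespace blocks
words = re.findall(r"\S+", sample)
print(f"Words: {words}")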
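The cell above covers only `\d` and `\S`. As a rough sketch, a few more rows of the table can be tried on short made-up strings (all variable names here are illustrative):

```python
import re

# . matches any single character except a newline
dot_matches = re.findall(r"h.t", "hat hit hot h\nt")

# ? makes the preceding character optional
optional_s = re.findall(r"colors?", "color and colors")

# [] matches any one character from the set
vowels = re.findall(r"[aeiou]", "regex")

# {n} repeats the preceding item exactly n times
years = re.findall(r"\d{4}", "Order #123 was placed on 2024-01-25")

# ^ and $ anchor the pattern to the start and end of the string
starts = bool(re.search(r"^Hello", "Hello World"))
ends = bool(re.search(r"World$", "Hello World"))

print(dot_matches, optional_s, vowels, years, starts, ends)
```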

## 5. Grouping & Capturing
Use parentheses `()` to create groups. This allows you to extract specific parts of a match.

In [None]:
text = "Date: 25/01/2024"
pattern = r"(\d{2})/(\d{2})/(\d{4})"

match = re.search(pattern, text)
if match:
    print(f"Full Match: {match.group(0)}")
    print(f"Day: {match.group(1)}")
    print(f"Month: {match.group(2)}")
    print(f"Year: {match.group(3)}")
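Building on the date example, here is a small sketch of two related features of the `re` module: named groups written as `(?P<name>...)`, and group references inside the replacement string of `re.sub`. The group names `day`, `month`, and `year` are illustrative choices, not something the lesson prescribes.

```python
import re

text = "Date: 25/01/2024"

# Named groups let you refer to parts of the match by name instead of number
named = r"(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})"
match = re.search(named, text)
parts = match.groupdict() if match else {}
print(parts)

# In re.sub, \g<name> (or \1, \2, ...) inserts the text captured by a group
iso_text = re.sub(named, r"\g<year>-\g<month>-\g<day>", text)
print(iso_text)
```

Named groups tend to keep longer patterns readable, since `match.group("year")` says more than `match.group(3)`.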

## 6. Using Raw Strings
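Prefer raw strings (`r"pattern"`) for regex patterns in Python. In a raw string, Python keeps each backslash `\` as a literal character instead of treating it as the start of an escape sequence (such as `\n` for a newline), so the pattern reaches the `re` module unchanged.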
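To see why this matters, here is a small sketch using `\b`, which means "backspace" to Python but "word boundary" to the regex engine. The sample sentence is made up for illustration.

```python
import re

sentence = "a cat sat"

# Without r"", Python turns "\b" into a single backspace character
plain = "\bcat\b"
# With r"", the backslash reaches the regex engine, where \b is a word boundary
raw = r"\bcat\b"

print(len(plain), len(raw))          # the two strings differ in length
print(re.search(plain, sentence))    # looks for literal backspaces
print(re.search(raw, sentence))      # looks for the whole word "cat"
```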

## 7. Common Regex Patterns (Very Useful)

In [None]:
# 1. Email Validation
email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
print(f"Is Valid Email: {bool(re.fullmatch(email_pattern, 'test@example.com'))}")

# 2. Phone Number (10 digits)
phone_pattern = r"\d{10}"
print(f"Is Valid Phone: {bool(re.fullmatch(phone_pattern, '1234567890'))}")

# 3. URL Pattern (re.match only anchors the start, so any path after the domain is accepted)
url_pattern = r"https?://(www\.)?[\w-]+\.\w+"
print(f"Is Valid URL: {bool(re.match(url_pattern, 'https://google.com'))}")
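The cell above mixes `re.fullmatch` and `re.match`, and the two behave differently on input with trailing characters. A short sketch of the difference, reusing the phone pattern with a made-up input:

```python
import re

phone_pattern = r"\d{10}"
candidate = "1234567890abc"  # made-up input with trailing junk

# match succeeds because the first 10 characters are digits
prefix_ok = bool(re.match(phone_pattern, candidate))

# fullmatch fails because "abc" is left over
whole_ok = bool(re.fullmatch(phone_pattern, candidate))

print(prefix_ok, whole_ok)
```

For validation, `re.fullmatch` is usually the safer choice, because the whole input has to fit the pattern.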

## 8. Real-World Examples

In [None]:
text = """
Contact support@example.com or sales@example.org for assistance.
Call us at 9876543210 or 1234567890.
"""

# Extract all emails
emails = re.findall(r"[\w\.-]+@[\w\.-]+", text)
print(f"Emails found: {emails}")

# Extract all phone numbers
phones = re.findall(r"\d{10}", text)
print(f"Phones found: {phones}")

# Replace multiple spaces
messy_text = "Too    many    spaces   here."
clean_text = re.sub(r"\s+", " ", messy_text)
print(f"Cleaned: {clean_text}")

# Find words starting with a capital letter (\b stops matches inside words like "iPhone")
cap_words = re.findall(r"\b[A-Z]\w*", "Hello World this is Python")
print(f"Capitalized Words: {cap_words}")

## 9. Practice Exercises
1. Check whether an email address is valid.
2. Validate a strong password (min 8 chars, 1 upper, 1 lower, 1 digit, 1 special).
3. Extract all integer numbers from a text.
4. Replace all digits in a string with `*` (masking).
5. Check if a string starts with a capital letter.
6. Extract all dates (DD/MM/YYYY) from a paragraph.
7. Find all words with exactly 5 letters.
8. Validate a Pakistani phone number pattern (e.g., +923001234567 or 03001234567).

## 10. Mini Project – Form Validator
Create a function that validates a user registration attempt.

In [None]:
import re

def validate_form(name, email, phone, password):
    # 1. Name: Only letters and spaces
    if not re.fullmatch(r"[A-Za-z ]+", name):
        print("❌ Invalid Name (only letters and spaces allowed)")
        return False
        
    # 2. Email
    if not re.fullmatch(r"[\w\.-]+@[\w\.-]+\.\w+", email):
        print("❌ Invalid Email")
        return False
        
    # 3. Phone (Mobile 10-11 digits)
    if not re.fullmatch(r"\d{10,11}", phone):
        print("❌ Invalid Phone (must be 10-11 digits)")
        return False
        
    # 4. Password Strength
    # (?=.*[A-Z]) -> at least one uppercase
    # (?=.*\d) -> at least one digit
    # .{8,} -> min length 8
    if not re.fullmatch(r"(?=.*[A-Z])(?=.*\d).{8,}", password):
        print("❌ Weak Password (need 8+ chars, 1 uppercase, 1 digit)")
        return False
        
    print("✅ Validation Successful!")
    return True

# Test cases
validate_form("John Doe", "john@example.com", "03001234567", "Pass1234")
print("---")
validate_form("John123", "john@com", "123", "weak") 
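The password check relies on lookaheads such as `(?=.*[A-Z])`, which assert that something appears later in the string without consuming any characters. The sketch below tests each condition separately so that a failure can report which rule was broken. The helper `password_problems` and its messages are hypothetical, not part of the project above.

```python
import re

def password_problems(password):
    """Return the rules a password breaks (an empty list means it passes)."""
    rules = [
        (r".{8,}", "at least 8 characters"),
        (r"(?=.*[A-Z]).*", "at least one uppercase letter"),
        (r"(?=.*\d).*", "at least one digit"),
    ]
    # Checking each rule on its own gives the same verdict as the combined
    # pattern (?=.*[A-Z])(?=.*\d).{8,}, but tells us which rule failed
    return [message for pattern, message in rules
            if not re.fullmatch(pattern, password)]

print(password_problems("Pass1234"))
print(password_problems("weak"))
```

Returning a list rather than printing and stopping at the first failure is one possible design; it lets a form show every problem at once.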

## 11. Day 28 Summary
- Learned about **Regular Expressions (Regex)**.
- Used the **`re` module** functions: `search`, `match`, `fullmatch`, `findall`, `finditer`, `sub`.
- Understood **metacharacters** like `\d`, `\w`, `+`, `*`, `[]`.
- Learned **grouping** `()` and **raw strings** `r""`.
- Built a **Form Validator** mini project.

**Next topic: Day 29 – Python Comprehensions (List, Set, Dict)**