# 📘 P1.1.3.3 – String Manipulation
## Topic: Regular Expressions Basics

## 🎯 Learning Objectives
By the end of this notebook, you will:
- Understand what regular expressions (regex) are
- Use `re.search()` and `re.findall()` to match patterns
- Use `re.sub()` to replace patterns
- Understand common regex tokens (`.`, `\d`, `\w`, `*`, `+`, `?`, `^`, `$`)
- Apply regex for validation and text cleanup
- See a brief AI use case

## 🔍 What is Regex?
Regular expressions (regex) are a pattern language for searching, matching, and modifying text.
They are widely used in data cleaning, input validation, and NLP preprocessing.

In [2]:
import re
text = "Contact: user123@example.com"
match = re.search(r"\w+@\w+\.\w+", text)
print(match.group() if match else "No match")

user123@example.com


## 🧱 Common Regex Tokens
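- `.` any single character (except a newline by default)
- `\d` digit (0-9)
- `\w` word character (letters, digits, _)
- `*` zero or more of the preceding token
- `+` one or more of the preceding token
- `?` zero or one of the preceding token (makes it optional)
- `^` start of string
- `$` end of string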
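
The tokens above can be tried out on a few short strings. The following is a minimal sketch; the sample words and the `matching` helper are made up for illustration and are not part of the `re` module.

```python
import re

# Hypothetical sample words, chosen only to show how each token behaves
samples = ["cat", "cot", "ct", "coat", "cats"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

# "." matches exactly one character
print(matching(r"c.t"))    # ['cat', 'cot']

# "?" makes the preceding token optional
print(matching(r"cats?"))  # ['cat', 'cats']

# "*" allows zero or more, "+" requires at least one
print(matching(r"co*t"))   # ['cot', 'ct']
print(matching(r"co+t"))   # ['cot']

# "^" and "$" anchor the pattern to the start and end of the string
anchored_hit = bool(re.search(r"^cat$", "cat"))
anchored_miss = bool(re.search(r"^cat$", "concat"))
print(anchored_hit, anchored_miss)  # True False
```

Without the anchors, `re.search(r"cat", "concat")` would succeed, because `re.search()` looks for the pattern anywhere in the string.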

In [3]:
text = "Order ID: 4782, Qty: 3"
numbers = re.findall(r"\d+", text)
print(numbers)

['4782', '3']


## ✅ Validation Example
Check whether an input looks like a phone number: an optional `+`, a country code of 1 to 3 digits, a hyphen, then exactly 10 digits.

In [4]:
phone = "+91-9876543210"
pattern = r"^\+?\d{1,3}-\d{10}$"
is_valid = bool(re.match(pattern, phone))
print(is_valid)

True
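
One subtlety worth seeing in code: `$` also matches just before a single trailing newline, so `re.match()` with `^...$` accepts input that ends in `"\n"`, while `re.fullmatch()` does not. The sketch below reuses the phone pattern from the cell above; the extra candidate strings are invented for illustration.

```python
import re

pattern = r"^\+?\d{1,3}-\d{10}$"

# Hypothetical inputs: two valid shapes, one missing the country code,
# and one valid number followed by a stray newline
candidates = ["+91-9876543210", "91-9876543210", "9876543210", "+91-9876543210\n"]

results = {}
for value in candidates:
    matched = bool(re.match(pattern, value))    # "$" tolerates a trailing "\n"
    full = bool(re.fullmatch(pattern, value))   # the whole string must match
    results[value] = (matched, full)
    print(repr(value), matched, full)
```

For strict validation of user input, `re.fullmatch()` is usually the safer choice of the two.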


## 📧 Email Format Check
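Use a basic pattern to check whether a string has the general shape of an email address (`local@domain.tld`). This checks the format only, not whether the address exists.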


In [8]:
pattern = r"^[\w\.-]+@[\w\.-]+\.\w+$"

emails = [
    "aisha@example.com",
    "user.name@domain.co",
    "invalid-email",
    "user@domain",
]

for email in emails:
    is_valid = re.fullmatch(pattern, email) is not None
    print(email, "->", is_valid)


aisha@example.com -> True
user.name@domain.co -> True
invalid-email -> False
user@domain -> False
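
When the same pattern is applied many times, it can be compiled once with `re.compile()` and reused. The sketch below wraps the email pattern in a small helper; the names `EMAIL_RE` and `looks_like_email` are hypothetical, and the last input shows that this basic pattern still accepts some malformed addresses.

```python
import re

# Same pattern as above, without "^" and "$" because fullmatch() already
# requires the whole string to match; "." needs no escape inside [...]
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

def looks_like_email(value: str) -> bool:
    """Return True if value has the rough shape local@domain.tld."""
    return EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("aisha@example.com"))  # True
print(looks_like_email("user@domain"))        # False
print(looks_like_email("a@b..com"))           # True: consecutive dots slip through
```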


## 🔁 Replace with re.sub()
Use `re.sub(pattern, replacement, text)` to clean or normalize text by replacing every match of the pattern.

In [5]:
text = "Price: $45, Discount: $5"
clean = re.sub(r"\$\d+", "$X", text)
print(clean)

Price: $X, Discount: $X
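
Several `re.sub()` calls are often chained to normalize raw text, for example before tokenizing it for an NLP model. This is a minimal sketch of that idea; the `normalize` function and the sample string are assumptions made for illustration, not a standard preprocessing pipeline.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # anything that is not a word char or whitespace
    text = re.sub(r"\s+", " ", text)      # runs of spaces, tabs, newlines -> one space
    return text.strip()

raw = "  Hello,   World!!  Regex\tis\nFUN. "
cleaned = normalize(raw)
tokens = cleaned.split()

print(cleaned)  # hello world regex is fun
print(tokens)   # ['hello', 'world', 'regex', 'is', 'fun']
```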


## 🧠 Grouping and Extraction
Use parentheses to capture parts of a match; `match.groups()` returns the captured parts as a tuple of strings.

In [6]:
text = "Name: Aisha, Score: 91"
match = re.search(r"Name: (\w+), Score: (\d+)", text)
if match:
    name, score = match.groups()
    print(name, score)

Aisha 91
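
Groups can also be named with `(?P<name>...)`, which makes the extracted values easier to read, and `re.findall()` returns a list of tuples when the pattern contains more than one group. The sketch below extends the example above; the second record ("Ravi") is invented for illustration.

```python
import re

text = "Name: Aisha, Score: 91"

# Named groups: groupdict() maps each group name to the captured text
named = re.search(r"Name: (?P<name>\w+), Score: (?P<score>\d+)", text)
record = named.groupdict() if named else {}
print(record)  # {'name': 'Aisha', 'score': '91'}

# With several groups, findall() returns one tuple per match
log = "Name: Aisha, Score: 91\nName: Ravi, Score: 78"
pairs = re.findall(r"Name: (\w+), Score: (\d+)", log)
print(pairs)   # [('Aisha', '91'), ('Ravi', '78')]
```

Captured values are always strings, so a score needs `int(...)` before it can be used in arithmetic.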


### ✅ Key Takeaways
- Regex is a powerful pattern language for text
- `re.search()` finds the first match anywhere in the string
- `re.findall()` returns all non-overlapping matches as a list
- `re.sub()` replaces every match with a replacement string
- Tokens like `\d`, `\w`, `+`, `*` are essential
- Regex helps with validation and text preprocessing