### Regular Expressions (`re` module) in Python
Regular Expressions (regex) are patterns used to match, search, and manipulate text.

Python provides the built-in **`re` module** for working with regular expressions.

Common use cases:
- Validate inputs (like email, phone numbers)
- Search specific patterns in text
- Replace or extract information
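Here is a minimal preview of how these use cases map onto `re` calls. Several of the calls (`search()`, `findall()`) are explained in the sections below. The `ddd-dddd` phone format is a hypothetical example chosen for illustration, not a real-world validation rule.

```python
import re

# Hypothetical phone format for illustration: three digits, a dash, four digits
phone_pattern = r'\d{3}-\d{4}'

# Validate: fullmatch() succeeds only if the WHOLE string fits the pattern
is_valid = re.fullmatch(phone_pattern, '555-1234') is not None
is_invalid = re.fullmatch(phone_pattern, '555-12345') is not None

# Search: find the first phone-like substring anywhere in the text
note = 'Call 555-1234 or 555-9876 after 5pm'
first = re.search(phone_pattern, note)

# Replace / extract: mask every phone number, or collect them all
masked = re.sub(phone_pattern, 'XXX-XXXX', note)
all_numbers = re.findall(phone_pattern, note)

print(is_valid, is_invalid)   # True False
print(first.group())          # 555-1234
print(masked)                 # Call XXX-XXXX or XXX-XXXX after 5pm
print(all_numbers)            # ['555-1234', '555-9876']
```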

Let's begin with simple string handling before diving into regex functions.

In [None]:
print('Hello, World')

### String Literals and Escape Characters
Python string literals can contain escape sequences, which start with a backslash: `\n` inserts a newline, `\t` a tab, and `\'` a literal single quote inside a single-quoted string.

In [None]:
x = 'Hello\nWorld'
print(x)

x = 'Hello\tWorld'
print(x)

x = 'Hello \'World\''
print(x)

x = 'Hello "World"'
print(x)
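A quick check that each escape sequence becomes a single character in the resulting string. `repr()` is used here only to make the hidden newline visible.

```python
# Each escape sequence is ONE character in the resulting string
s = 'Hello\nWorld'
print(len(s))          # 11: 5 letters + 1 newline + 5 letters
print(repr(s))         # shows the \n escape instead of a real line break

backslash = '\\'       # an escaped backslash is a single '\' character
print(len(backslash))  # 1
print('\t' == chr(9))  # True: '\t' is the tab character (code point 9)
```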

### File Paths and Raw Strings
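In a normal string literal, a backslash (`\`) starts an escape sequence, so `'C:\new'` would contain a newline character. Either double every backslash (`\\`) or prefix the string with `r` to make it a **raw string**, in which backslashes are kept as-is.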

In [None]:
path = 'C:\\Users\\tmp\\new'
print(path)

# Using raw string
path = r'C:\Users\tmp\new'
print(path)
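Raw strings matter most for regex patterns. The sketch below shows that `'\b'` without the `r` prefix becomes a backspace character, so the pattern silently stops meaning "word boundary". The sample sentence is made up for illustration.

```python
import re

# A raw string keeps the backslash: r'\d' is two characters, '\' and 'd'
print(r'\d' == '\\d')   # True
print(len(r'\n'))       # 2, whereas len('\n') is 1

text = 'a cat sat on the cat mat'

# Without r, '\b' is the backspace character (\x08), so the regex looks for
# literal backspaces around 'cat' and finds nothing
without_raw = re.findall('\bcat\b', text)
print(without_raw)      # []

# With r, the regex engine receives \b, meaning "word boundary"
with_raw = re.findall(r'\bcat\b', text)
print(with_raw)         # ['cat', 'cat']
```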

### Basic Pattern Matching using `re.match()`
The `match()` function checks for a pattern only at the **beginning** of a string. It returns a `Match` object on success and `None` otherwise.

In [None]:
import re

text = 'Python is super super easy'
regex = r'Python'

match = re.match(regex, text)
print(match)           # a Match object showing the span and the matched text
print(match.span())    # (start, end) indices of the match

start, end = match.span()
print(text[start:end])
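Because `match()` returns `None` when the pattern is not at the start, calling `.span()` on a failed match raises an `AttributeError`. A minimal sketch of guarding against that, using the same sample text:

```python
import re

text = 'Python is super super easy'

# match() returns a Match object on success and None on failure,
# so test the result before calling .span() or .group()
results = {}
for pattern in (r'Python', r'easy'):
    m = re.match(pattern, text)
    if m is not None:
        results[pattern] = m.span()
        print(pattern, '->', m.group(), m.span())
    else:
        results[pattern] = None
        print(pattern, '-> no match at the start of the string')
```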

### Difference Between `match()` and `search()`
`match()` looks only at the beginning, whereas `search()` finds the first occurrence **anywhere** in the string.

In [None]:
regex = r'super'
match = re.match(regex, text)
print('Using match():', match)

match = re.search(regex, text)
print('Using search():', match)
print(text[match.start(): match.end()])
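To make the difference concrete: `search()` reports only the *first* occurrence, and `group()` is a shorter way to get the matched text than slicing with `start()` and `end()`.

```python
import re

text = 'Python is super super easy'

m_match = re.match(r'super', text)    # None: text does not START with 'super'
m_search = re.search(r'super', text)  # first 'super', wherever it occurs

print(m_match)            # None
print(m_search.span())    # (10, 15): only the first occurrence is reported
print(m_search.group())   # super, same as text[m_search.start():m_search.end()]
```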

### Finding All Occurrences using `findall()`
The `findall()` function returns a list of all matches found in the text.

In [None]:
regex = r'super'
matches = re.findall(regex, text)   # a list of strings, not a Match object
print(matches)
print(type(matches))
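`findall()` returns only the matched strings. When you also need each occurrence's position, `re.finditer()` yields one `Match` object per occurrence. A short sketch on the same text:

```python
import re

text = 'Python is super super easy'

# finditer() yields Match objects, so each occurrence keeps its position
spans = [m.span() for m in re.finditer(r'super', text)]
print(spans)         # [(10, 15), (16, 21)]

# findall() returns an empty list (not None) when nothing matches
none_found = re.findall(r'java', text)
print(none_found)    # []
```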

### Metacharacters
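Metacharacters have a special meaning inside a pattern instead of matching themselves literally.

Common metacharacters:
- `.` any single character except a newline
- `^` start of the string
- `$` end of the string
- `*` zero or more repetitions of the preceding item
- `+` one or more repetitions of the preceding item
- `?` zero or one repetition of the preceding item
- `|` OR (alternation): match either the left side or the right side

The example below shows the OR operator `|`.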

In [None]:
text = 'Python is super super easy'
regex = r'Python|super'   # no spaces: spaces around | would be matched literally
print(re.findall(regex, text))
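Two pitfalls follow from metacharacters having special meaning: `.` matches *any* character, so a literal dot must be escaped, and every other character in a pattern, including spaces, is matched literally. The sample strings below are made up for illustration.

```python
import re

sample = 'pi is 3.14, not 3x14 or 3-14'

# '.' matches any single character except a newline...
dot_any = re.findall(r'3.14', sample)
print(dot_any)        # ['3.14', '3x14', '3-14']

# ...so escape it with a backslash to match a literal dot
dot_literal = re.findall(r'3\.14', sample)
print(dot_literal)    # ['3.14']

# Spaces inside a pattern are literal characters, too
text = 'Python is super super easy'
no_spaces = re.findall(r'Python|super', text)
with_spaces = re.findall(r'Python | super', text)
print(no_spaces)      # ['Python', 'super', 'super']
print(with_spaces)    # ['Python ', ' super', ' super']
```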

### Quantifiers
Quantifiers specify how many times a character or pattern should repeat.

In [None]:
text = 'The pole, role, subrole, and rrole are part of company roles'
print(re.findall(r'r*ole', text))  # zero or more r's before ole
print(re.findall(r'r+ole', text))  # one or more r's before ole
print(re.findall(r'r?ole', text))  # zero or one r before ole
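A quantifier applies only to the single item immediately before it. To repeat a longer pattern, wrap it in a group. This sketch uses a non-capturing group `(?:...)`, so `group()` still returns the whole match; the sample string is illustrative.

```python
import re

laugh = 'hahaha!'

# 'ha+' means 'h' followed by one or more 'a', not "one or more 'ha'"
short = re.search(r'ha+', laugh).group()
print(short)      # ha

# Group the sequence to repeat it as a unit
grouped = re.search(r'(?:ha)+', laugh).group()
print(grouped)    # hahaha
```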

### Range Quantifiers `{m,n}`
`{m,n}` requires the preceding element to appear at least `m` and at most `n` times.
Example: match a 'g', followed by two to five 'o's, followed by 'gle'.

In [None]:
text = 'gogle google gooogle goooogle gooooogle goooooooogle goooooooogle'
regex = r'go{2,5}gle'
print(re.findall(regex, text))
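The bounds can also be fixed or left open: `{m}` means exactly `m`, `{m,}` means at least `m`, and `{,n}` (omitting `m`) means at most `n`. A sketch on a shorter version of the same text:

```python
import re

text = 'gogle google gooogle goooogle'

exact = re.findall(r'go{2}gle', text)      # exactly 2 o's
at_least = re.findall(r'go{3,}gle', text)  # 3 or more o's
at_most = re.findall(r'go{,1}gle', text)   # at most 1 o

print(exact)      # ['google']
print(at_least)   # ['gooogle', 'goooogle']
print(at_most)    # ['gogle']
```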

### Character Classes `[ ]`
Character classes match any one character within the brackets.

`[aeiou]` → matches any single lowercase vowel.

Use `re.IGNORECASE` or shorthand `re.I` for case-insensitive matching.

In [None]:
text = 'python java data science data engineering ai AI'
print(re.findall(r'[aeiou]', text))                 # lowercase vowels only
print(re.findall(r'[aeiou]', text, re.IGNORECASE))  # vowels in either case
print(re.findall(r'[aeiou][AEIOU]', text, re.I))    # two consecutive vowels, any case
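Character classes also accept ranges such as `[0-9]` or `[A-Z]`, and a `^` placed right after `[` negates the class ("any character *not* listed"). The sample strings are made up for illustration.

```python
import re

text = 'Room 42, floor 7, block B'

digits = re.findall(r'[0-9]+', text)   # a range plus '+': whole numbers
upper = re.findall(r'[A-Z]', text)     # uppercase letters only
print(digits)   # ['42', '7']
print(upper)    # ['R', 'B']

# ^ inside the brackets negates: anything except a vowel or a space
not_vowel = re.findall(r'[^aeiou ]', 'ai data')
print(not_vowel)   # ['d', 't']
```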

### Anchors `^` and `$`
- `^` matches at the start of the string, so `^python` succeeds only if the text begins with "python".
- `$` matches at the end of the string, so `python$` succeeds only if the text ends with "python".

In [None]:
text = 'python has no connection with snake python'
print(re.search(r'^python', text))  # pattern occurs at beginning
print(re.search(r'python$', text))  # pattern occurs at end
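By default the anchors refer to the whole string. With the `re.MULTILINE` flag (shorthand `re.M`), `^` and `$` also match at the start and end of every line, which is handy for multi-line text such as logs. The sample text is illustrative.

```python
import re

log = 'python 3.12\njava 21\npython 3.11'

# Default: ^ matches only at the very start of the whole string
default_hits = re.findall(r'^python', log)
print(default_hits)   # ['python']

# re.MULTILINE: ^ matches at the start of each line
multi_hits = re.findall(r'^python', log, re.MULTILINE)
print(multi_hits)     # ['python', 'python']

# $ with re.M: the digits at the end of each line
line_ends = re.findall(r'\d+$', log, re.M)
print(line_ends)      # ['12', '21', '11']
```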

### Shorthand Character Classes
These special codes make pattern matching simpler:

| Code | Description |
|------|--------------|
| `\d` | Any digit (0–9) |
| `\D` | Any non-digit |
| `\w` | Any word character (letter, digit, underscore) |
| `\W` | Any non-word character |
| `\s` | Any whitespace character (space, tab, newline, etc.) |
| `\S` | Any non-whitespace |

In [None]:
text = 'I am Arjun and my emp id is 123456 and department id is 23'
print(re.findall(r'\d', text))   # digits
print(re.findall(r'\d{4}', text)) # non-overlapping runs of exactly 4 digits
print(re.findall(r'\D', text))   # non-digits
print(re.findall(r'\w', text))   # word characters
print(re.findall(r'\W', text))   # non-word characters
print(re.findall(r'\s', text))   # whitespace characters
print(re.findall(r'\S', text))   # non-whitespace characters
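Shorthand classes become more useful when combined with quantifiers. On the same sentence, `\d+` extracts whole numbers and `\w+` extracts whole words. Note that `\d{4}` finds only one match, because `findall()` does not reuse characters that are already part of a match.

```python
import re

text = 'I am Arjun and my emp id is 123456 and department id is 23'

numbers = re.findall(r'\d+', text)   # whole numbers instead of single digits
print(numbers)       # ['123456', '23']

four = re.findall(r'\d{4}', text)    # '123456' yields '1234'; '23' is too short
print(four)          # ['1234']

first_words = re.findall(r'\w+', text)[:4]
print(first_words)   # ['I', 'am', 'Arjun', 'and']
```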

### Key Takeaways
- Use raw strings (`r''`) for regex patterns so backslashes reach the regex engine unchanged.
- `match()` → matches only at the start of the string.
- `search()` → finds the first occurrence anywhere in the string.
- `findall()` → returns a list of all non-overlapping occurrences.
- Metacharacters (`. ^ $ * + ? |`) and shorthand classes (`\d`, `\w`, etc.) help create powerful text patterns.
- Test patterns against both matching and non-matching sample inputs before relying on them in production.
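One lightweight way to test a pattern is to keep a small table of inputs that should and should not match. The sketch below assumes a deliberately simplified email-like pattern (real email validation is far more involved) and uses `re.compile()` so the pattern is built once and reused.

```python
import re

# Simplified email-like pattern: an assumption for this sketch only
EMAIL = re.compile(r'[\w.]+@\w+\.\w+')

cases = {
    'arjun@example.com': True,
    'first.last@mail.co': True,
    'arjun@example': False,       # missing the dot and suffix
    'arjun.example.com': False,   # missing '@'
}

failures = []
for candidate, expected in cases.items():
    ok = EMAIL.fullmatch(candidate) is not None
    if ok != expected:
        failures.append(candidate)
    print('PASS' if ok == expected else 'FAIL', repr(candidate), ok)

print('failures:', failures)
```

Adding a new case each time the pattern misbehaves turns this table into a small regression suite.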