# Regular Expressions or RegEx

A regular expression is a sequence of characters that defines a search pattern.

It can include literal characters, metacharacters (characters with a special meaning), and quantifiers.

In Python, regular expressions let you search, match, and rewrite strings using concise, flexible patterns. The standard library provides the `re` module for working with them.

### Metacharacters

- `.`: Any Character Except New Line
- `\d`: Digit (0-9)
- `\D`: Not a Digit (0-9)
- `\w`: Word Character (a-z, A-Z, 0-9, _)
- `\W`: Not a Word Character
- `\s`: Whitespace (space, tab, newline)
- `\S`: Not Whitespace (space, tab, newline)
- `\b`: Word Boundary
- `\B`: Not a Word Boundary
- `[]`: Matches Any One Character Listed in the Brackets
- `[^ ]`: Matches Any One Character NOT Listed in the Brackets
- `|`: Either Or (Alternation)
- `()`: Group
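
A minimal sketch of a few of these metacharacters in action. The sample strings and variable names are made up for illustration.

```python
import re

# Hypothetical sample text for illustration
sample = "Call 555-1234 or email bob_1@example.com today"

# \d matches one digit; findall returns every non-overlapping match
digits = re.findall(r"\d", sample)

# \b marks a word boundary, so only the standalone word "or" matches
whole_word = re.findall(r"\bor\b", sample)

# [aeiou] matches any one of the listed characters
vowels = re.findall(r"[aeiou]", "regex")

# | picks between alternatives and () groups them;
# with one group, findall returns the text captured by that group
titles = re.findall(r"(Mr|Ms|Mrs)\.", "Mr. T and Mrs. Robinson")

print(digits)      # ['5', '5', '5', '1', '2', '3', '4', '1']
print(whole_word)  # ['or']
print(vowels)      # ['e', 'e']
print(titles)      # ['Mr', 'Mrs']
```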

### Quantifiers
- `*`: Matches 0 or More Occurrences
- `+`: Matches 1 or More Occurrences
- `?`: Matches 0 or 1 Occurrence
- `{n}`: Matches Exactly n Occurrences
- `{n,}`: Matches n or More Occurrences
- `{n,m}`: Matches Between n and m Occurrences
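
To compare the quantifiers side by side, the sketch below applies each one to the same text. The sample text and the `words` helper are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical sample text: "a" followed by zero to four "b" characters
text = "a ab abb abbb abbbb"


def words(pattern):
    """Return every whole word in `text` that matches `pattern`."""
    return re.findall(rf"\b{pattern}\b", text)


zero_or_more = words("ab*")     # 'a' followed by any number of 'b'
one_or_more = words("ab+")      # 'a' followed by at least one 'b'
zero_or_one = words("ab?")      # 'a' followed by at most one 'b'
exactly_two = words("ab{2}")    # 'a' followed by exactly two 'b'
two_or_more = words("ab{2,}")   # 'a' followed by two or more 'b'
one_to_three = words("ab{1,3}") # 'a' followed by one to three 'b'

print(zero_or_more)  # ['a', 'ab', 'abb', 'abbb', 'abbbb']
print(one_or_more)   # ['ab', 'abb', 'abbb', 'abbbb']
print(zero_or_one)   # ['a', 'ab']
print(exactly_two)   # ['abb']
print(two_or_more)   # ['abb', 'abbb', 'abbbb']
print(one_to_three)  # ['ab', 'abb', 'abbb']
```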

### Anchors
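- `^`: Beginning of a String (or of Each Line with the `M` Flag)
- `$`: End of a String, or Just Before a Trailing Newline (End of Each Line with the `M` Flag)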
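
A short sketch of how the anchors restrict where a match may occur. The sample strings are made up for illustration.

```python
import re

# Hypothetical sample line for illustration
line = "start middle end"

# ^ only allows a match at the very beginning of the string
at_start = re.search(r"^start", line)
not_at_start = re.search(r"^middle", line)

# $ only allows a match at the very end of the string
at_end = re.search(r"end$", line)

# By default $ also matches just before a single trailing newline
before_newline = re.search(r"end$", "the end\n")

print(at_start is not None)        # True
print(not_at_start is not None)    # False
print(at_end is not None)          # True
print(before_newline is not None)  # True
```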

### Regular Expression Functions 

- `search()`: Scans a string for a regex match
- `match()`: Looks for a regex match at the beginning of a string
- `fullmatch()`: Looks for a regex match on an entire string
- `findall()`: Returns a list of all regex matches in a string
- `finditer()`: Returns an iterator that yields regex matches from a string
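
The differences between these functions are easiest to see on one input. The following is a minimal sketch using the module-level functions of `re`; the sample text and pattern are illustrative only.

```python
import re

# Hypothetical sample text and pattern for illustration
text = "cat bat rat"
pattern = r"[a-z]at"

# search() returns the first match found anywhere in the string
first = re.search(pattern, text)

# match() only succeeds if the pattern matches at position 0
at_start = re.match(r"bat", text)

# fullmatch() only succeeds if the pattern covers the whole string
whole_text = re.fullmatch(pattern, text)
whole_word = re.fullmatch(pattern, "cat")

# findall() returns the matched strings as a list
all_matches = re.findall(pattern, text)

# finditer() yields match objects, which also carry positions
spans = [m.span() for m in re.finditer(pattern, text)]

print(first.group())           # cat
print(at_start)                # None
print(whole_text)              # None
print(whole_word is not None)  # True
print(all_matches)             # ['cat', 'bat', 'rat']
print(spans)                   # [(0, 3), (4, 7), (8, 11)]
```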

### Substitution Functions 

- `sub()`: Scans the string for pattern matches, replaces the matching portions of the string with the replacement string
- `subn()`: Same as `sub()`, but returns a tuple of the new string and the number of substitutions made
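
A small sketch of both substitution functions. The date strings and the replacement text are made up for illustration.

```python
import re

# Hypothetical sample text for illustration
text = "2024-01-15 and 2023-12-31"

# sub() returns the rewritten string; \1, \2, \3 refer to the captured groups
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# subn() returns a (new_string, number_of_substitutions) tuple
masked, count = re.subn(r"\d{4}", "YYYY", text)

print(swapped)  # 15/01/2024 and 31/12/2023
print(masked)   # YYYY-01-15 and YYYY-12-31
print(count)    # 2
```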

### Regular Expression Flags 

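- `re.I` (`re.IGNORECASE`): Performs case-insensitive matching
- `re.S` (`re.DOTALL`): Makes `.` match any character, including a newline character
- `re.U` (`re.UNICODE`): Interprets character classes such as `\w` according to the Unicode character set (already the default for `str` patterns in Python 3, so the flag is redundant there)
- `re.M` (`re.MULTILINE`): Makes `^` match at the start of each line and `$` at the end of each line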
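
The sketch below shows the effect of each flag by running the same pattern with and without it. Flags are passed as the third argument and can be combined with `|`. The two-line sample text is made up for illustration.

```python
import re

# Hypothetical two-line sample text for illustration
text = "First line\nsecond LINE"

# re.I: ignore case
ignore_case = re.findall(r"line", text, re.I)

# Without re.S, '.' stops at the newline; with it, '.' crosses lines
without_dotall = re.search(r"First.*second", text)
with_dotall = re.search(r"First.*second", text, re.S)

# Without re.M, '^' matches only at the start of the string
first_words_default = re.findall(r"^\w+", text)
first_words_multiline = re.findall(r"^\w+", text, re.M)

# Flags combine with the bitwise OR operator
line_ends = re.findall(r"line$", text, re.I | re.M)

print(ignore_case)              # ['line', 'LINE']
print(without_dotall)           # None
print(with_dotall is not None)  # True
print(first_words_default)      # ['First']
print(first_words_multiline)    # ['First', 'second']
print(line_ends)                # ['line', 'LINE']
```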

### Sample Regular Expression

- `[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+`: Matches a simple email address (local part, `@`, domain name, `.`, rest of the domain)
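
One way to read this pattern piece by piece is to write it with the `re.VERBOSE` flag, which ignores whitespace in the pattern and allows comments. This flag is not listed above and the name `EMAIL` is hypothetical; treat the block as an annotated sketch of the same pattern, not a complete email validator.

```python
import re

# Same pattern as above, split into commented parts with re.VERBOSE
EMAIL = re.compile(r"""
    [a-zA-Z0-9_.+-]+    # local part: letters, digits, and _ . + -
    @                   # literal at sign
    [a-zA-Z0-9-]+       # domain name: letters, digits, hyphens
    \.                  # literal dot
    [a-zA-Z0-9-.]+      # rest of the domain: letters, digits, hyphens, dots
""", re.VERBOSE)

valid = EMAIL.fullmatch("test-123@my-work.net")
invalid = EMAIL.fullmatch("not-an-email")

print(valid is not None)  # True
print(invalid)            # None
```

The pattern is deliberately permissive: it also accepts some strings that are not valid addresses, such as `user@example..com`.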


In [None]:
# %%writefile example/simple.py
import re

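# Raw string, so the backslash at the end of the metacharacter line
# is kept literally instead of acting as a line continuation
text_to_search = r'''
abcdefghijklmnopqrstuvwxyz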
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890

Ha HaHa

MetaCharacters :
. ^ $ * + ? () {} [] | \

coreyms.com

321-555-4321
123.555.1234
123*555*1234
800-555-1234
900-555-1234

Mr. Jayaprakash
Mr Pavan
Ms Davis
Mrs. Robinson
Mr. T
'''

sentence = 'Start a sentence and then bring it to an end 0 9 8 '

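# re.I makes the search case-insensitive, so 'start' matches 'Start'
print(re.search(r'start', sentence, re.I))

# Compile the pattern once to reuse it; flags are passed at compile time
pattern = re.compile(r'start', re.I)
print(type(pattern))

# search() returns the first match object, or None if nothing matches
match = pattern.search(sentence)
print(match)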
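
The sample text above contains phone numbers and titled names that lend themselves to pattern matching. As a sketch, the block below runs two patterns against an excerpt of that text. The excerpt is repeated so the block runs on its own, and both patterns and the variable names are illustrative assumptions, not part of the original cell.

```python
import re

# Excerpt of the sample text, repeated so this block is self-contained
excerpt = '''
321-555-4321
123.555.1234
123*555*1234
800-555-1234

Mr. Jayaprakash
Mr Pavan
Ms Davis
Mrs. Robinson
Mr. T
'''

# Three digits, a dash or dot, three digits, a dash or dot, four digits
phone = re.compile(r'\d{3}[-.]\d{3}[-.]\d{4}')
phones = phone.findall(excerpt)

# Title (Mr, Ms, or Mrs), optional dot, one whitespace, capitalized name;
# (?:...) groups without capturing, so findall returns the whole match
name = re.compile(r'M(?:r|s|rs)\.?\s[A-Z]\w*')
names = name.findall(excerpt)

print(phones)  # ['321-555-4321', '123.555.1234', '800-555-1234']
print(names)   # ['Mr. Jayaprakash', 'Mr Pavan', 'Ms Davis', 'Mrs. Robinson', 'Mr. T']
```

The number `123*555*1234` is skipped because `*` is not in the `[-.]` character class.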


In [None]:
# %%writefile example/emails.py

import re

emails = '''
pavanshanbhag@gmail.com
jayaprakash@university.edu
test-123@my-work.net
dummy_email@my-work.net
'''

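# Same pattern as the sample regular expression shown earlier
pattern = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

# finditer() yields one match object per email address found
matches = pattern.finditer(emails)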

for match in matches:
    print(match)



In [None]:
# %%writefile example/url.py
import re

urls = '''
https://www.google.com
http://coreyms.com
https://youtube.com
https://www.nasa.gov
'''

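# Group 1: optional 'www.', group 2: domain name, group 3: top-level domain
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

# Keep only groups 2 and 3, e.g. 'https://www.google.com' becomes 'google.com'
subbed_urls = pattern.sub(r'\2\3', urls)
print(subbed_urls)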

matches = pattern.finditer(urls)

for match in matches:
    print(match)
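
Printing a match object shows its span and matched text, but the capture groups can also be read individually with `group()`. A minimal sketch, reusing the URL pattern with a shortened input; the `rows` variable is hypothetical.

```python
import re

# Shortened copy of the URL list so this block is self-contained
urls = '''
https://www.google.com
http://coreyms.com
'''

pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

# group(0) is the whole match; group(2) and group(3) are the
# domain name and top-level domain captured by the pattern
rows = [(m.group(0), m.group(2), m.group(3)) for m in pattern.finditer(urls)]

for full, domain, tld in rows:
    print(full, domain, tld)

# An optional group that did not take part in the match yields None
no_www = pattern.search('http://coreyms.com').group(1)
print(no_www)  # None
```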
