<h1 align="center" style="color: orange">Regular Expressions in Python</h1>

> [Regex Docs](https://docs.python.org/3/library/re.html#)

### Regular expression syntax

- `.` : Matches any character except a newline (unless the `re.DOTALL` flag is set).
- `*` : Matches zero or more occurrences of the preceding element (a character, class, or group).
- `+` : Matches one or more occurrences of the preceding element.
- `?` : Matches zero or one occurrence of the preceding element.
- `\d` : Matches any digit (0-9).
- `\D` : Matches any non-digit character.
- `\w` : Matches any word character (alphanumeric and underscore).
- `\W` : Matches any non-word character.
- `\s` : Matches any whitespace character.
- `\S` : Matches any non-whitespace character.
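
The tokens listed above can be tried on short strings. The following is a minimal sketch; the sample strings and variable names are made up for illustration and are not part of the notebook's data.

```python
import re

sample = "Order 66 shipped on 2024-01-15 to user_42"

# \d+ : one or more digits in a row
digits = re.findall(r'\d+', sample)

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r'\w+', sample)

# colou?r : the 'u' is optional (zero or one occurrence)
spellings = re.findall(r'colou?r', "color colour colouur")

# a.c : '.' matches any single character except a newline
dot_hits = re.findall(r'a.c', "abc a-c a\nc")

print(digits)     # ['66', '2024', '01', '15', '42']
print(spellings)  # ['color', 'colour']
print(dot_hits)   # ['abc', 'a-c']
```

Note that `colouur` is not matched because `?` allows at most one `u`, and `a\nc` is skipped because `.` does not cross a newline by default.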

**Anchors**
* `\b` - word boundary
* `\B` - not a word boundary
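* `^` - beginning of a string (or of each line with `re.MULTILINE`)
* `$` - end of a string, or just before a trailing newline (end of each line with `re.MULTILINE`)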
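
To see how the anchors behave, here is a small sketch on a hypothetical two-line string (the text and variable names are illustrative assumptions). It contrasts `\b` with `\B` and shows the effect of the `re.MULTILINE` flag on `^`.

```python
import re

text = "cat concat cats\ncat again"

# \bcat\b : 'cat' as a whole word only
whole_words = re.findall(r'\bcat\b', text)

# \Bcat : 'cat' that does NOT start at a word boundary (the one inside 'concat')
inner_starts = [m.start() for m in re.finditer(r'\Bcat', text)]

# ^cat : by default '^' anchors at the start of the whole string only
start_default = re.findall(r'^cat', text)

# with re.MULTILINE, '^' also matches right after each newline
start_multiline = re.findall(r'^cat', text, flags=re.MULTILINE)

print(len(whole_words))      # 2
print(inner_starts)          # [7]
print(len(start_default))    # 1
print(len(start_multiline))  # 2
```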

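**Character Classes and Groups**
* `[]` - matches any one character listed in the brackets
* `[^ ]` - matches any one character NOT listed in the brackets
* `|` - alternation: either the expression on the left or the one on the right
* `()` - group
* `[1-4]` - range of characters (here, the digits 1 through 4)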
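
A short sketch of these constructs, using a made-up sample string (an assumption for illustration only):

```python
import re

text = "bat cat hat rat 1 3 5 7"

# [bc]at : 'b' or 'c' followed by 'at'
in_set = re.findall(r'[bc]at', text)

# [^bc]at : any character except 'b' or 'c', followed by 'at'
not_in_set = re.findall(r'[^bc]at', text)

# [1-4] : a single digit in the range 1 through 4
in_range = re.findall(r'[1-4]', text)

# (h|r)at : a group with alternation; with one group, findall returns the group's text
grouped = re.findall(r'(h|r)at', text)

print(in_set)      # ['bat', 'cat']
print(not_in_set)  # ['hat', 'rat']
print(in_range)    # ['1', '3']
print(grouped)     # ['h', 'r']
```

When a pattern contains a capturing group, `re.findall` returns the captured text rather than the whole match, which is why the last result holds only `'h'` and `'r'`.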

In [2]:
import re

In [4]:
text_to_search = '''

abcdefghijklmnopqrstuvwxyz

ABCDEFGHIJKLMNOPQRSTUVWXYZ

1234567890

Ha HaHa Ha Hii Haa

MetaCharacters :
. ^ $ * + ? { } [ ] \ | ( )

Google.com
Amazon.com
Facebook.com

123-456-789
123.456.789
123*555*1234
123/456/789

877-500-1234
980-555-1234
930-234-3455

Mr. Peter Griffin
Mr Stewie Griffin
Ms Glen Quagmire
Mrs. Griffin
Ms. Meg Griffin
Mrs. U
Mr. Joe Swanson
'''

sentence = "Finally some more things to test the RegEx using Python re module"

In [13]:
# Extract the first phone number (re.search stops at the first match)
first_phone_number = re.search(r'\d{3}-\d{3}-\d{4}', text_to_search)

if first_phone_number:
    print("Phone number found : ", first_phone_number.group(0))

Phone number found :  877-500-1234
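
`re.search` returns only the first match, which is why a single number is printed. To collect every match, `findall` (or `finditer`) can be used instead. A minimal sketch on a hypothetical three-line sample, not the notebook's `text_to_search`:

```python
import re

# Hypothetical sample; the last line has too few trailing digits to match
sample = """
877-500-1234
980-555-1234
123-456-789
"""

pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

first = pattern.search(sample)         # first match only (or None)
all_numbers = pattern.findall(sample)  # every non-overlapping match, as strings

print(first.group(0))  # 877-500-1234
print(all_numbers)     # ['877-500-1234', '980-555-1234']
```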


In [None]:
pattern = re.compile(r'abcd')

hits = pattern.finditer(text_to_search)
for hit in hits:
    print(hit)

# span=(start, end) gives the index where a match begins and the index just past its last character
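
As a small illustration of what the span means, the sketch below (on a made-up string) slices the original text with the span and gets the matched text back:

```python
import re

haystack = "xxabcdxx"
match = re.search(r'abcd', haystack)

start, end = match.span()      # same values as match.start() and match.end()
recovered = haystack[start:end]

print(start, end)   # 2 6
print(recovered)    # abcd
```

The end index is exclusive, so it lines up with Python's slicing convention.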

In [None]:
# metacharacters such as '.' must be escaped with a backslash to match them literally
p2 = re.compile(r'\.')
m1 = p2.finditer(text_to_search)

for x in m1:
    print(x)

# a typical use case is matching the literal dot in a URL
p3 = re.compile(r'Google\.com')
m2 = p3.finditer(text_to_search)

for x in m2:
    print(x)

In [None]:
# matching phone numbers whose area code starts with 8 or 9, followed by 7 or 8
ph_num = re.compile(r'[89][78]\d[-.]\d{3}[-.]\d{4}')

valid_numbers = ph_num.finditer(text_to_search)

for num in valid_numbers:
    print(num)

In [8]:
# finding every character that is not a letter or a digit
p5 = re.compile(r'[^a-zA-Z\d]')

non_alnum = p5.finditer(text_to_search)

for x in non_alnum:
    print(x)

<re.Match object; span=(0, 1), match='\n'>
<re.Match object; span=(27, 28), match='\n'>
<re.Match object; span=(54, 55), match='\n'>
<re.Match object; span=(65, 66), match='\n'>
<re.Match object; span=(66, 67), match='\n'>
<re.Match object; span=(69, 70), match=' '>
<re.Match object; span=(74, 75), match=' '>
<re.Match object; span=(77, 78), match=' '>
<re.Match object; span=(81, 82), match=' '>
<re.Match object; span=(85, 86), match='\n'>
<re.Match object; span=(86, 87), match='\n'>
<re.Match object; span=(101, 102), match=' '>
<re.Match object; span=(102, 103), match=':'>
<re.Match object; span=(103, 104), match='\n'>
<re.Match object; span=(104, 105), match='.'>
<re.Match object; span=(105, 106), match=' '>
<re.Match object; span=(106, 107), match='^'>
<re.Match object; span=(107, 108), match=' '>
<re.Match object; span=(108, 109), match='$'>
<re.Match object; span=(109, 110), match=' '>
<re.Match object; span=(110, 111), match='*'>
<re.Match object; span=(111, 112), match=' '>
...

In [14]:
string2 = """
pun
bun
one
won
"""

# [^p]un : any single character except 'p', followed by 'un' ('bun' matches, 'pun' does not)
p6 = re.compile(r'[^p]un')

something_weird = p6.finditer(string2)

for weird in something_weird:
    print(weird)

<re.Match object; span=(5, 8), match='bun'>


In [17]:
# using quantifiers: {3} and {4} mean exactly three and four digits; '.' accepts any separator
ph_num = re.compile(r'\d{3}.\d{3}.\d{4}')

valid_numbers = ph_num.finditer(text_to_search)

for num in valid_numbers:
    print(num)

<re.Match object; span=(169, 181), match='123*555*1234'>
<re.Match object; span=(183, 195), match='877-500-1234'>
<re.Match object; span=(196, 208), match='980-555-1234'>
<re.Match object; span=(209, 221), match='930-234-3455'>
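
Because an unescaped `.` matches any character, the pattern above also accepts `*` as a separator. If only hyphens and dots should count, a character class is one option. A minimal sketch, using a hypothetical three-number sample:

```python
import re

numbers = "123*555*1234\n877-500-1234\n980.555.1234"

# '.' as the separator accepts anything, including '*'
loose = re.findall(r'\d{3}.\d{3}.\d{4}', numbers)

# [-.] restricts the separator to a hyphen or a literal dot
strict = re.findall(r'\d{3}[-.]\d{3}[-.]\d{4}', numbers)

print(loose)   # ['123*555*1234', '877-500-1234', '980.555.1234']
print(strict)  # ['877-500-1234', '980.555.1234']
```

Inside square brackets the dot loses its special meaning, so it does not need escaping there.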


In [6]:
# finding names preceded by a title (Mr, Mrs, or Ms, with an optional dot)
pattern = re.compile(r'(Mr|Mrs|Ms)\.?\s[A-Z]\w*')

result = pattern.finditer(text_to_search)

for r in result:
    print(r.group(0))

Mr. Peter
Mr Stewie
Ms Glen
Mrs. Griffin
Ms. Meg
Mrs. U
Mr. Joe
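
The parentheses in this pattern also capture the title, so the pieces of each match can be pulled apart. The sketch below uses named groups for readability; the group names `title` and `name` and the three-line sample are illustrative assumptions.

```python
import re

pattern = re.compile(r'(?P<title>Mr|Mrs|Ms)\.?\s(?P<name>[A-Z]\w*)')

people = "Mr. Peter Griffin\nMrs. Griffin\nMs Glen Quagmire"

# Each match exposes its groups by name
pairs = [(m.group('title'), m.group('name')) for m in pattern.finditer(people)]

print(pairs)  # [('Mr', 'Peter'), ('Mrs', 'Griffin'), ('Ms', 'Glen')]
```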


- Simple e-mail validator

In [10]:
def validate_email(email: str) -> bool:
    """Return True if `email` has the simple form local@domain.tld."""
    pattern = re.compile(r"""
                                ^                  # Start of the string
                                [a-zA-Z0-9_.+-]+   # One or more alphanumerics, underscores, dots, pluses, or hyphens
                                @                  # The "@" symbol
                                [a-zA-Z0-9-]+      # One or more alphanumerics or hyphens
                                \.                 # A literal dot (escaped with a backslash)
                                [a-zA-Z0-9.-]+     # One or more alphanumerics, dots, or hyphens
                                $                  # End of the string
                         """, re.VERBOSE)

    # search() returns a Match object or None; convert it to the declared bool
    return pattern.search(email) is not None


result = validate_email('asdf@one.two.in')

if result:
    print("success")
else:
    print("Enter a valid email..")

success
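
One detail worth knowing about the `^...$` style of validation: `$` also matches just before a trailing newline, so a string ending in `\n` can still pass. A possible alternative is `fullmatch`, which requires the whole string to match. The sketch below assumes the same simple pattern as the validator; the names `ANCHORED_RE`, `EMAIL_RE`, and `is_valid_email` are hypothetical.

```python
import re

# Same simple pattern, once with explicit anchors and once without
ANCHORED_RE = re.compile(r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$')
EMAIL_RE = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+')


def is_valid_email(email: str) -> bool:
    # fullmatch succeeds only if the entire string matches the pattern
    return EMAIL_RE.fullmatch(email) is not None


print(is_valid_email('asdf@one.two.in'))             # True
print(is_valid_email('no-at-sign.com'))              # False
print(ANCHORED_RE.search('a@b.co\n') is not None)    # True: '$' tolerates the trailing newline
print(is_valid_email('a@b.co\n'))                    # False
```

Neither version is a full address validator; both only check the rough shape of the string.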


- Masking Email address and Phone number

In [24]:
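def mask_email(email: str) -> str:
    if not validate_email(email):
        return "Invalid email address"
    # Keep the first and last characters of the local part; hide the rest
    name, domain = email.split('@')
    return f"Your Email address is {name[0]}#####{name[-1]}@{domain}"
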

print(mask_email("abcd@mit.edu.in"))

Your Email address is a#####d@mit.edu.in
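
The masking above always inserts five `#` characters regardless of the name's length. As an alternative sketch, `re.sub` with a replacement function can hide exactly the middle characters of the local part. The function name `mask_local_part` is a hypothetical helper, and the input is assumed to contain a single `@`.

```python
import re


def mask_local_part(email: str) -> str:
    # group 1: first character, group 2: the middle, group 3: last character plus '@'
    def hide_middle(m: re.Match) -> str:
        return m.group(1) + '#' * len(m.group(2)) + m.group(3)

    return re.sub(r'^(.)(.*)(.@)', hide_middle, email)


print(mask_local_part("abcd@mit.edu.in"))  # a##d@mit.edu.in
print(mask_local_part("ab@x.io"))          # ab@x.io (nothing to hide)
```

A one-character local part such as `a@x.io` does not match the pattern and is returned unchanged.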


In [32]:
def mask_phone(num: str) -> str:

    # Raw string so that \+, \s, \( and \) reach the regex engine unchanged
    pattern = re.compile(r"""
                    ^                       # Start of the string
                    (                       # Start of the first capturing group
                    \+?                     # Matches an optional "+" sign
                    [0-9]{1,3}              # Matches one to three digits for the country code
                    )                       # End of the first capturing group
                    [-\s.]?                 # Matches an optional hyphen, space, or dot as a separator
                    \(?                     # Matches an optional opening parenthesis for area codes
                    (                       # Start of the second capturing group
                    [0-9]{1,4}              # Matches one to four digits for the area code
                    )                       # End of the second capturing group
                    \)?                     # Matches an optional closing parenthesis for area codes
                    [-\s.]?                 # Matches an optional hyphen, space, or dot as a separator
                    (                       # Start of the third capturing group
                    [0-9]{1,4}              # Matches one to four digits for the first part of the phone number
                    )                       # End of the third capturing group
                    [-\s.]?                 # Matches an optional hyphen, space, or dot as a separator
                    (                       # Start of the fourth capturing group
                    [0-9]{1,9}              # Matches one to nine digits for the final part of the phone number
                    )                       # End of the fourth capturing group
                    $                       # End of the string
                    """, re.VERBOSE)

    matching = pattern.match(num)
    if matching:
        # Keep the country code and the final block of digits; hide everything in between
        country_code, last_digits = matching.group(1), matching.group(4)
        return f"{country_code}#####{last_digits}"
    return "Invalid phone number"

In [33]:
mask_phone("+1-123-456-7890")

'+1#####7890'