# Python RegEx

## What is a Regular Expression (RegEx)?
Regular Expressions (RegEx) are patterns used to search, match, and manipulate text. They are widely used in Python, SQL, log processing, ETL pipelines, and data validation.

## Basic RegEx Metacharacters & Examples

![image.png](attachment:image.png)
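
As a rough illustration of a few common metacharacters, the sketch below runs each one against a small sample string. The sample string and variable names are invented for this example and are not taken from the table above.

```python
import re

# Sample string invented for this illustration
sample = "cat bat rat 42 cart"

# .  matches any single character except a newline
dot_matches = re.findall(r"c.t", sample)

# [] matches any one character from the listed set
set_matches = re.findall(r"[bc]at", sample)

# \d matches a digit; + means "one or more of the preceding token"
digit_matches = re.findall(r"\d+", sample)

# ^ anchors the pattern to the start of the string
start_matches = re.findall(r"^cat", sample)

# ? makes the preceding token optional
optional_matches = re.findall(r"car?t", sample)

print(dot_matches)       # ['cat']
print(set_matches)       # ['cat', 'bat']
print(digit_matches)     # ['42']
print(start_matches)     # ['cat']
print(optional_matches)  # ['cat', 'cart']
```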

### RegEx Functions

![image.png](attachment:image.png)
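
Besides `re.findall`, `re.match`, `re.search`, and `re.fullmatch` (all used in the exercises below), the `re` module also provides `re.split`, `re.sub`, and `re.compile`. A minimal sketch of those three, using a log line made up for this example:

```python
import re

# Log line invented for this illustration
log_line = "2024-10-12  ERROR   disk usage at 91%"

# re.split: break the line apart on runs of whitespace
fields = re.split(r"\s+", log_line)

# re.sub: replace every digit with '#'
masked = re.sub(r"\d", "#", log_line)

# re.compile: build the pattern once so it can be reused across many lines
level_pattern = re.compile(r"\b(?:INFO|WARN|ERROR)\b")
level = level_pattern.search(log_line).group()

print(fields)  # ['2024-10-12', 'ERROR', 'disk', 'usage', 'at', '91%']
print(masked)  # ####-##-##  ERROR   disk usage at ##%
print(level)   # ERROR
```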

### 🔹 Real-World Use Cases - Let's Practice

#### 1. Extracting Email Addresses

In [15]:
import re

text = "My email is kannika+.m@company.com and alternate is kannika123@gmail.com OR test.email@sub.domain.co.uk.org"

# Write a regular expression that extracts only the email addresses in the text
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',text)

print(emails)


['kannika+.m@company.com', 'kannika123@gmail.com', 'test.email@sub.domain.co.uk.org']
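
If the user name and the domain are needed separately, the same pattern can be wrapped in named groups. This is a sketch under the assumption that splitting at `@` is all that is required; the group names `user` and `domain` and the sample text are choices made for this example.

```python
import re

# Sample text invented for this illustration
sample_text = "Contact kannika123@gmail.com or test.email@sub.domain.co.uk"

# Same character classes as the pattern above, split into two named groups
email_pattern = re.compile(
    r"(?P<user>[a-zA-Z0-9._%+-]+)@(?P<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"
)

# finditer yields match objects, so each named group can be read individually
parts = [(m.group("user"), m.group("domain")) for m in email_pattern.finditer(sample_text)]

print(parts)  # [('kannika123', 'gmail.com'), ('test.email', 'sub.domain.co.uk')]
```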


#### 2. Extracting Phone Numbers

In [20]:
text = "My number is +44 7890-123-456 or 020-3456-7890 or 9880025376"
# \+? allows at most one leading '+'
pattern1 = r'\+?\d{1,3}[-.\s]?\d{2,4}[-.\s]?\d{3,4}[-.\s]?\d{3,4}'
# \+* allows any number of leading '+' signs (including none)
pattern2 = r'\+*\d{1,3}[-.\s]?\d{2,4}[-.\s]?\d{3,4}[-.\s]?\d{3,4}'

matches1 = re.findall(pattern1, text)
print(matches1)


matches2 = re.findall(pattern2, text)
print(matches2)

['+44 7890-123-456', '020-3456-7890', '9880025376']
['+44 7890-123-456', '020-3456-7890', '9880025376']
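
Both patterns return the same result for this text because no number has more than one `+`. The difference between `\+?` and `\+*` only shows up on input with repeated plus signs, as in this small sketch (the input string is invented for the example):

```python
import re

# Input with a doubled '+' to expose the difference between the two quantifiers
odd_text = "Call ++44 7890-123-456"

optional_plus = r'\+?\d{1,3}[-.\s]?\d{2,4}[-.\s]?\d{3,4}[-.\s]?\d{3,4}'
repeated_plus = r'\+*\d{1,3}[-.\s]?\d{2,4}[-.\s]?\d{3,4}[-.\s]?\d{3,4}'

# \+? can absorb only one '+', so the match starts at the second '+'
with_optional = re.findall(optional_plus, odd_text)

# \+* absorbs both '+' signs
with_repeated = re.findall(repeated_plus, odd_text)

print(with_optional)  # ['+44 7890-123-456']
print(with_repeated)  # ['++44 7890-123-456']
```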


#### 3. Validate That a String Matches the Date Format YYYY-MM-DD

In [21]:
# Checks the shape only (4 digits, 2 digits, 2 digits); it does not check that the date exists
pattern = r'^\d{4}-\d{2}-\d{2}$'
date1 = "2024-10-12"
date2 = "10/12/2024"

print(bool(re.match(pattern, date1)))  # ✅ True
print(bool(re.match(pattern, date2)))  # ❌ False


True
False
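
The pattern above checks only the shape of the string, so a value such as `2024-13-45` would also pass. One possible way to reject impossible dates is to combine the regex with `datetime.strptime`. The helper name `is_valid_date` is hypothetical and introduced only for this sketch.

```python
import re
from datetime import datetime


def is_valid_date(value):
    """Return True only for real calendar dates written as YYYY-MM-DD."""
    # Step 1: enforce the exact shape (strptime alone would also accept "2024-1-5")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    # Step 2: let datetime reject impossible dates such as month 13 or February 30
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(is_valid_date("2024-10-12"))  # True
print(is_valid_date("2024-13-45"))  # False
print(is_valid_date("2024-02-30"))  # False
print(is_valid_date("10/12/2024"))  # False
```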


In [25]:
import re

text = "The total cost is $250.99 or INR 1999."
# (?:...) groups without capturing, so findall returns the whole match
pattern = r'(?:\$|INR)\s?\d+(?:\.\d{2})?'

matches = re.findall(pattern, text)
print(matches)


['$250.99', 'INR 1999']
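
The pattern above expects the digits to be written without separators. If amounts may contain thousands separators such as `$1,250.99`, one possible extension is to allow repeated `,ddd` groups after the leading digits. This is a sketch with sample text invented for the example, not a general currency parser.

```python
import re

# Sample text invented for this illustration
price_text = "Prices: $1,250.99, INR 1999 and $75"

# \d+ for the leading digits, then zero or more ",ddd" groups, then optional cents
price_pattern = r'(?:\$|INR)\s?\d+(?:,\d{3})*(?:\.\d{2})?'

prices = re.findall(price_pattern, price_text)
print(prices)  # ['$1,250.99', 'INR 1999', '$75']
```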


In [28]:
import re

text = "Mr. John, Ms. Alice _ "
pattern1 = r'(Mr|Ms)\. \w+'  # Capturing Group
pattern2 = r'(?:Mr|Ms)\. \w+'  # Non-Capturing Group

print(re.findall(pattern1, text))  # ['Mr', 'Ms']
print(re.findall(pattern2, text))  # ['Mr. John', 'Ms. Alice']


['Mr', 'Ms']
['Mr. John', 'Ms. Alice']
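
When both the whole match and the individual groups are needed, `re.finditer` is one option: each match object exposes the full text via `group(0)` and every capturing group by number. The sketch below also shows that `re.findall` returns tuples once a pattern has more than one capturing group. The second capturing group around `\w+` is added only for this example.

```python
import re

text = "Mr. John, Ms. Alice"

# Two capturing groups: the title and the name
pattern = r'(Mr|Ms)\. (\w+)'

# With several groups, findall returns one tuple per match
tuples = re.findall(pattern, text)

# finditer gives access to the full match as well as each group
details = [(m.group(0), m.group(1), m.group(2)) for m in re.finditer(pattern, text)]

print(tuples)   # [('Mr', 'John'), ('Ms', 'Alice')]
print(details)  # [('Mr. John', 'Mr', 'John'), ('Ms. Alice', 'Ms', 'Alice')]
```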


![image.png](attachment:image.png)

In [30]:
import re

text = "Hello 123, welcome_to Regex! 123 @#$"
pattern = r'\w+'

matches = re.findall(pattern, text)
print(matches)


['Hello', '123', 'welcome_to', 'Regex', '123']
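
Note that `\w` treats the underscore as a word character, which is why `welcome_to` stays in one piece. In Python 3 it also matches non-ASCII letters by default. The sketch below contrasts `\w+` with an explicit character class and with the `re.ASCII` flag; the accented sample word is invented for the example.

```python
import re

text = "Hello 123, welcome_to Regex! 123 @#$"

# \w includes the underscore, so 'welcome_to' is a single token
with_underscore = re.findall(r"\w+", text)

# An explicit class without '_' splits it into two tokens
without_underscore = re.findall(r"[A-Za-z0-9]+", text)

# "caf\u00e9" is 'café' written with an escape for the accented letter
accented = "caf\u00e9"
unicode_words = re.findall(r"\w+", accented)               # Unicode-aware by default
ascii_words = re.findall(r"\w+", accented, flags=re.ASCII)  # restrict \w to [a-zA-Z0-9_]

print(with_underscore)     # ['Hello', '123', 'welcome_to', 'Regex', '123']
print(without_underscore)  # ['Hello', '123', 'welcome', 'to', 'Regex', '123']
print(unicode_words)       # ['café']
print(ascii_words)         # ['caf']
```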


#### re.fullmatch

In [31]:
text = "Python3"
pattern = r"[A-Za-z]+\d"

match = re.fullmatch(pattern, text)
print("Matched!" if match else "No match")

print(match)

Matched!
<re.Match object; span=(0, 7), match='Python3'>


In [32]:
import re

text = "Python3"
pattern = r"^[A-Za-z]+\d$"  # ^ and $ are redundant here: re.fullmatch already requires the whole string to match

match = re.fullmatch(pattern, text)
print("Matched!" if match else "No match")


Matched!
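
To make the difference between `re.match` and `re.fullmatch` concrete, the sketch below uses inputs with extra trailing content (both inputs are invented for the example). It also shows a subtle case: `$` can match just before a trailing newline, whereas `re.fullmatch` requires the pattern to consume the entire string.

```python
import re

pattern = r"[A-Za-z]+\d"

# re.match only requires a match at the start of the string
prefix_match = re.match(pattern, "Python3 rocks")

# re.fullmatch requires the entire string to match
full_match = re.fullmatch(pattern, "Python3 rocks")

# '$' matches just before a trailing newline, so this still succeeds
newline_match = re.match(r"^[A-Za-z]+\d$", "Python3\n")

# fullmatch rejects the same input because the newline is left over
newline_full = re.fullmatch(pattern, "Python3\n")

print(prefix_match.group())       # Python3
print(full_match)                 # None
print(newline_match is not None)  # True
print(newline_full)               # None
```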


![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

In [33]:
import re

text = "ABC 123 XYZ 456 PQR 789"
pattern = r"\d+"  # Match digits

match_result = re.match(pattern, text)
search_result = re.search(pattern, text)
findall_result = re.findall(pattern, text)

print("Match result:", match_result.group() if match_result else "No match")  # ❌ No match
print("Search result:", search_result.group() if search_result else "No match")  # ✅ First match
print("Findall result:", findall_result)  # ✅ All matches


Match result: No match
Search result: 123
Findall result: ['123', '456', '789']
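
`re.findall` returns only the matched strings. When the position of each match is also needed, `re.finditer` is a possible alternative, since it yields match objects that carry `span()` information. A minimal sketch on the same text:

```python
import re

text = "ABC 123 XYZ 456 PQR 789"

# Each match object knows both the matched text and where it was found
positions = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(positions)  # [('123', (4, 7)), ('456', (12, 15)), ('789', (20, 23))]
```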


In [None]:
import os
import re
import pandas as pd


# Local directory where PDF files are stored
directory_path = r"C:\Users\manj373091\OneDrive - Endeavor\Documents\Sept 22\Endeavor Trainings\Python-Challenges\01_Regex\DataFiles"

# List every entry in the directory, one name per line
for filename in os.listdir(directory_path):
    print(filename)
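
Since the directory is described as holding PDF files, a regex could be used to keep only those names. The sketch below assumes that filtering by extension is the goal; the list `sample_names` is hypothetical and stands in for the result of `os.listdir(directory_path)` so that the example runs without access to that folder.

```python
import re

# Hypothetical file names standing in for the output of os.listdir(directory_path)
sample_names = ["report_2024.pdf", "notes.txt", "Invoice.PDF", "pdf_summary.docx"]

# Match names that end in ".pdf", ignoring case
pdf_pattern = re.compile(r"\.pdf$", re.IGNORECASE)

pdf_names = [name for name in sample_names if pdf_pattern.search(name)]

print(pdf_names)  # ['report_2024.pdf', 'Invoice.PDF']
```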
