# Regular Expressions in Python

Regular expressions, often abbreviated as regex or regexp, are sequences of characters that define a search pattern. They are widely used in text processing tasks for searching, extracting, and manipulating strings based on patterns.

In Python, the re module provides support for regular expressions. Here's an overview of some common functions and methods provided by the re module:

re.search(pattern, string): This function searches for the first occurrence of the pattern within the string and returns a match object if found. If not found, it returns None.

In [3]:
import re
var1 = re.search(r'apple', 'I love apples and oranges')
if var1:
    print('Found:', var1.group())


Found: apple


# re.match(pattern, string): This function checks if the pattern matches at the beginning of the string. It returns a match object if successful, otherwise None.

In [6]:
import re
var2 = re.match(r'apple', 'apple pie')
if var2:
    print('Found:', var2.group())


Found: apple


In [8]:
import re
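match = re.match(r'apple', 'orange apple  pie')
# re.match only tests the start of the string, so 'apple' in the middle
# is not matched: match is None and nothing is printed.
if match:
    print('Found:', match.group())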
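The empty result above comes from where each function is allowed to start matching. A minimal sketch contrasting re.match, re.search, and re.fullmatch (the variable names are illustrative only):

```python
import re

text = 'orange apple  pie'

# re.match tries the pattern only at position 0, so it fails here.
at_start = re.match(r'apple', text)

# re.search scans forward and returns the first occurrence anywhere.
anywhere = re.search(r'apple', text)

# re.fullmatch succeeds only if the pattern covers the entire string.
whole = re.fullmatch(r'apple', 'apple')

print(at_start)         # None
print(anywhere.span())  # (7, 12)
print(whole.group())    # apple
```

If the pattern may appear anywhere in the string, re.search is usually the function to reach for; re.match is for checking a prefix.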

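# re.findall(pattern, string): This function finds all non-overlapping occurrences of the pattern in the string and returns them as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead of the whole matches.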

In [10]:
import re
matches = re.findall(r'\d+', 'There are 10 apples and 20 oranges 50 banana')
print('Found:', matches)

Found: ['10', '20', '50']
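The shape of the list returned by re.findall depends on whether the pattern has capturing groups. A small sketch reusing the same sentence (the second pattern is an illustrative addition, not part of the original example):

```python
import re

text = 'There are 10 apples and 20 oranges 50 banana'

# No groups: each item is the whole match.
numbers = re.findall(r'\d+', text)

# Two groups: each item is a (count, word) tuple.
pairs = re.findall(r'(\d+) (\w+)', text)

print(numbers)  # ['10', '20', '50']
print(pairs)    # [('10', 'apples'), ('20', 'oranges'), ('50', 'banana')]
```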


# re.finditer(pattern, string): This function returns an iterator yielding match objects over all non-overlapping matches of the pattern in the string.

In [16]:
import re
matches = re.finditer(r'\d+', 'There are 10 apples and 20 oranges 50 banana')
# finditer returns a lazy iterator, so loop over it to get each match object
for m in matches:
    print('Found:', m.group())


Found: 10
Found: 20
Found: 50
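Because re.finditer yields match objects rather than plain strings, it also gives access to where each match sits in the text. A short sketch using the start() and end() methods of the match object:

```python
import re

text = 'There are 10 apples and 20 oranges 50 banana'

# Collect the matched text together with its start and end offsets.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', text)]

for value, start, end in positions:
    print(f'{value!r} at [{start}:{end}]')

# Slicing the original text with those offsets gives back the match.
assert all(text[start:end] == value for value, start, end in positions)
```

This is the main reason to prefer re.finditer over re.findall: the positions are needed for tasks such as highlighting or slicing around a match.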


# re.sub(pattern, repl, string): This function replaces every non-overlapping occurrence of the pattern in the string with the replacement string and returns the result as a new string. The original string is not modified.

In [17]:
import re
new_string = re.sub(r'\d+', 'XX', 'There are 10 apples and 20 oranges')
print('Modified:', new_string)

Modified: There are XX apples and XX oranges
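re.sub accepts more than a fixed replacement string. The sketch below shows three variations that are part of the standard re module: limiting the number of replacements with count, passing a function as the replacement, and using re.subn to learn how many substitutions happened.

```python
import re

text = 'There are 10 apples and 20 oranges'

# count=1 limits the substitution to the first match.
first_only = re.sub(r'\d+', 'XX', text, count=1)

# A callable receives each match object and returns the replacement text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)

# re.subn returns the new string and the number of replacements made.
masked, n = re.subn(r'\d+', 'XX', text)

print(first_only)  # There are XX apples and 20 oranges
print(doubled)     # There are 20 apples and 40 oranges
print(masked, n)   # There are XX apples and XX oranges 2
```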


# Pattern Syntax: Regular expression patterns are formed using a combination of characters and metacharacters. For example:

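\d: Matches any decimal digit. For str patterns this includes Unicode digits; pass re.ASCII to restrict it to 0-9.

\w: Matches any alphanumeric character or the underscore.

[...]: Matches any single character listed inside the brackets, for example [abc] or the range [a-z].

.: Matches any single character except a newline.

*, +, ?: Quantifiers that repeat the preceding item zero or more times, one or more times, and zero or one time respectively.

^, $: Anchors that match the beginning and the end of the string respectively. $ also matches just before a trailing newline.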
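Each of the metacharacters listed above can be tried on a tiny input. The strings below are made up for illustration:

```python
import re

digits = re.findall(r'\d', 'a1b22')              # one digit per match
words = re.findall(r'\w+', 'snake_case 42!')     # letters, digits, underscore
vowels = re.findall(r'[aeiou]', 'regex')         # any one listed character
dots = re.findall(r'a.c', 'abc a-c a\nc')        # . does not match the newline

star = re.findall(r'ab*', 'a ab abbb')           # b zero or more times
plus = re.findall(r'ab+', 'a ab abbb')           # b one or more times
optional = re.findall(r'ab?', 'a ab abbb')       # b zero or one time

exact = bool(re.search(r'^apple$', 'apple'))      # anchors at both ends
partial = bool(re.search(r'^apple$', 'apple pie'))

print(digits)    # ['1', '2', '2']
print(words)     # ['snake_case', '42']
print(vowels)    # ['e', 'e']
print(dots)      # ['abc', 'a-c']
print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['a', 'ab', 'ab']
print(exact, partial)  # True False
```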

# Case Study: Extracting Email Addresses from a Text File by using Python with regular expressions

# Let's create a simple case study using Python regex

# In this code:

We define a function extract_emails_from_text_file that takes the path to the text file as input.

Within this function, we read the contents of the file using open() and read().

We then define a regular expression pattern email_pattern to match email addresses. This pattern matches common email address formats.

Using re.findall(), we extract all the email addresses from the text.

Finally, we return the list of extracted email addresses.

Make sure to replace 'sample_text.txt' with the path to your actual text file.

In [18]:
import re

In [19]:
def extract_emails_from_text_file(file_path):
    with open(file_path, 'r') as file:
        text = file.read()

    # Regular expression pattern for matching email addresses
    # Note: inside [...] a | is a literal character, so the class is written [A-Za-z]
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

    # Using findall to extract all email addresses from the text
    emails = re.findall(email_pattern, text)

    return emails

In [20]:
# Example usage:
file_path = 'sample_text.txt'
emails = extract_emails_from_text_file(file_path)
print("Extracted Email Addresses:")
for email in emails:
    print(email)


Extracted Email Addresses:
user@example.com
business@company.com
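To try the case study without an existing file, the sketch below writes some made-up text to a temporary file and extracts the addresses from it. The names EMAIL_RE and extract_unique_emails, the sample text, and the duplicate removal are assumptions added for illustration. The pattern is a variant of the one used in the function, with the last character class written as [A-Za-z] so that a literal | is not accepted.

```python
import re
import tempfile
from pathlib import Path

# Compiled once so the pattern can be reused across calls.
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

def extract_unique_emails(file_path):
    """Return the email addresses in the file, in first-seen order, without repeats."""
    text = Path(file_path).read_text(encoding='utf-8')
    # dict.fromkeys keeps insertion order and drops duplicates.
    return list(dict.fromkeys(EMAIL_RE.findall(text)))

# Hypothetical sample content, written to a temporary file for the demo.
sample = ('Contact user@example.com or business@company.com.\n'
          'Reply to user@example.com today.')

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'sample_text.txt'
    path.write_text(sample, encoding='utf-8')
    found = extract_unique_emails(path)

print(found)  # ['user@example.com', 'business@company.com']
```

This pattern is a pragmatic approximation that covers common address shapes. It does not implement the full email address grammar, so unusual but valid addresses may be missed.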
