### Regular Expressions

A regular expression (regex or regexp) is a powerful tool used for pattern matching within strings. It provides a concise and flexible syntax for finding, matching, and manipulating text. Regular expressions are supported in many programming languages, text editors, and command-line tools.

In [1]:
import re

##### Key Functions:
<h6> 1. search(pattern, string): </h6>
Searches for the first occurrence of the pattern in the string.
Returns a match object if a match is found, None otherwise.

In [50]:
pattern = r"world"
string = "Hello world"
match = re.search(pattern, string)
if match:
    # Printing the match object shows its type, its span, and the matched text
    print(match)
    # .span() returns the start index and the end index (exclusive) of the match
    print(match.span())
    # .group() returns the full text of the match
    print(match.group())
    # .string is the searched string; .re.pattern is the pattern that was used
    print(match.string)
    print(match.re.pattern)

<re.Match object; span=(6, 11), match='world'>
(6, 11)
world
Hello world
world


<h6> 2. match(pattern, string): </h6>
Checks if the pattern matches at the beginning of the string.
Returns a match object if there is a match, None otherwise.

In [57]:
pattern1 = r"apples"
pattern2 = r"app"
string = "apples are delicious."
match1 = re.match(pattern1, string)
match2 = re.match(pattern2, string)
if match1:
    print("Matched:", match1.group())
if match2:
    print("Matched:", match2.group())

Matched: apples
Matched: app
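
The difference between `match` and `search` is easiest to see when the pattern occurs later in the string. The snippet below is a minimal sketch that reuses the sample sentence from the cell above; the variable names `match_result` and `search_result` are illustrative only.

```python
import re

string = "apples are delicious."

# re.match only tries the pattern at position 0, so a later occurrence is missed
match_result = re.match(r"delicious", string)
print(match_result)  # None

# re.search scans the whole string and finds the first occurrence
search_result = re.search(r"delicious", string)
print(search_result.span())  # (11, 20)
```

If the pattern must match at the very start, `match` is the natural choice; otherwise `search` is usually what is wanted.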


<h6> 3. findall(pattern, string): </h6>
Finds all occurrences of the pattern in the string.
Returns a list of matching substrings.

In [79]:
import re

pattern = r"apple"
string = "I have an apple, and she has an apple too."
matches = re.findall(pattern, string)
print("Matches:", matches)
# matches is a list, so you can access each element by index
print(matches[0])
# The \d+ pattern below (one or more digits) is explained in the Key Concepts section.
text_for_search = '''
phonebook:
user - phone number
a - 12345
b - 44444
c - 33333
'''
new_pattern = r"\d+"
matches = re.findall(new_pattern, text_for_search)
print(matches)
print(matches[0])

Matches: ['apple', 'apple']
apple
['12345', '44444', '33333']
12345


<h6> 4. sub(pattern, replacement, string):</h6>
Replaces all occurrences of the pattern with the specified replacement in the string. 

In [80]:
pattern = r"apple"
replacement = "orange"
string = "I like apples and apples are red."
new_string = re.sub(pattern, replacement, string)
print("New string:", new_string)

New string: I like oranges and oranges are red.
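
As a hedged extension of the example above, `re.sub` also accepts an optional `count` argument that limits the number of replacements, and the replacement string can refer to captured groups with `\1`, `\2`, and so on. The sample strings and variable names here are assumptions chosen for illustration.

```python
import re

string = "I like apples and apples are red."

# count=1 replaces only the first occurrence
first_only = re.sub(r"apple", "orange", string, count=1)
print(first_only)  # I like oranges and apples are red.

# \1 and \2 in the replacement refer to the first and second captured groups
swapped = re.sub(r"(\w+) and (\w+)", r"\2 and \1", "salt and pepper")
print(swapped)  # pepper and salt
```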


##### Key Concepts:

Regular expressions are built from metacharacters such as . (any character), * (zero or more occurrences), + (one or more occurrences), ^ (start of the string), and $ (end of the string). The sections below cover each group of them.

<h6>1. Metacharacters:</h6>
Metacharacters are characters in a regular expression that have a special meaning.<br>
Examples:<br>

. (dot): Matches any character except a newline.<br>
*: Matches 0 or more occurrences of the preceding character or group.<br>
+: Matches 1 or more occurrences of the preceding character or group.<br>
?: Matches 0 or 1 occurrence of the preceding character or group.<br>
^: Anchors the regex at the start of the string.<br>
$: Anchors the regex at the end of the string.<br>
[]: Specifies a character class.<br>
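
The metacharacters listed above can be tried out with a short sketch like the following. The sample strings and variable names are illustrative assumptions, not part of the original notebook.

```python
import re

# . matches any single character except a newline
dot_matches = re.findall(r"c.t", "cat cot c\nt cut")
print(dot_matches)  # ['cat', 'cot', 'cut']

# * allows zero or more of the preceding character
star_matches = re.findall(r"ab*", "a ab abbb")
print(star_matches)  # ['a', 'ab', 'abbb']

# + requires at least one of the preceding character
plus_matches = re.findall(r"ab+", "a ab abbb")
print(plus_matches)  # ['ab', 'abbb']

# ? makes the preceding character optional
optional_matches = re.findall(r"colou?r", "color colour")
print(optional_matches)  # ['color', 'colour']

# ^ and $ tie the pattern to the start and end of the string
starts_with_hello = bool(re.search(r"^Hello", "Hello world"))
ends_with_world = bool(re.search(r"world$", "Hello world"))
print(starts_with_hello, ends_with_world)  # True True
```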

<h6>2. Character Classes:</h6>
Character classes allow you to match one character out of a set of characters.<br>
Examples:<br>

[aeiou]: Matches any vowel.<br>
[0-9]: Matches any digit.<br>
[^a-z]: Matches any character that is not a lowercase letter.<br>
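
A small sketch of the three character classes above, run against one sample string. The string `"Regex 101: Fun!"` and the variable names are made up for illustration.

```python
import re

text = "Regex 101: Fun!"

# [aeiou] matches lowercase vowels only
vowels = re.findall(r"[aeiou]", text)
print(vowels)  # ['e', 'e', 'u']

# [0-9] matches any single digit
digits = re.findall(r"[0-9]", text)
print(digits)  # ['1', '0', '1']

# [^a-z] matches anything that is NOT a lowercase letter,
# including uppercase letters, digits, spaces, and punctuation
not_lowercase = re.findall(r"[^a-z]", text)
print(not_lowercase)  # ['R', ' ', '1', '0', '1', ':', ' ', 'F', '!']
```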

<h6>3. Quantifiers:</h6>
Quantifiers specify the number of occurrences of a character or group.<br>
Examples:<br>

{n}: Matches exactly n occurrences.<br>
{n,}: Matches n or more occurrences.<br>
{n,m}: Matches between n and m occurrences (write it without a space after the comma; Python treats {n, m} as literal text).<br>
*: Equivalent to {0,} (0 or more occurrences).<br>
+: Equivalent to {1,} (1 or more occurrences).<br>
?: Equivalent to {0,1} (0 or 1 occurrence).<br>
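
The counted quantifiers above can be compared side by side with a sketch like this one. The sample text and variable names are assumptions; `\b` is added on both sides so that each run of digits is treated as a whole word.

```python
import re

text = "1 22 333 4444 55555"

# {3} matches runs of exactly three digits
exactly_three = re.findall(r"\b\d{3}\b", text)
print(exactly_three)  # ['333']

# {3,} matches runs of three or more digits
three_or_more = re.findall(r"\b\d{3,}\b", text)
print(three_or_more)  # ['333', '4444', '55555']

# {2,4} matches runs of two to four digits
two_to_four = re.findall(r"\b\d{2,4}\b", text)
print(two_to_four)  # ['22', '333', '4444']
```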

<h6>4. Grouping:</h6>
Parentheses () are used for grouping and capturing parts of the pattern.<br>
Groups allow you to apply quantifiers to multiple characters as a single unit.<br>
Example:<br>

(ab)+: Matches one or more occurrences of the sequence "ab."<br>
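
The following sketch shows both roles of parentheses described above: applying a quantifier to a whole sequence, and capturing parts of a match. The sample strings and variable names are illustrative assumptions.

```python
import re

# The + applies to the whole group "ab", not just to "b"
repeated = re.search(r"(ab)+", "xxababab!")
whole_match = repeated.group(0)      # everything the pattern matched
last_repetition = repeated.group(1)  # a repeated group keeps only its last repetition
print(whole_match)      # ababab
print(last_repetition)  # ab

# Each pair of parentheses captures one part of the match
date = re.search(r"(\d{4})-(\d{2})", "Released 2023-08")
year, month = date.groups()
print(year, month)  # 2023 08
```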

<h6>5. Anchors:</h6>
Anchors are used to specify the position of a match in the string.<br>
Examples:<br>

^: Anchors the regex at the start of the string.<br>
$: Anchors the regex at the end of the string.<br>
\b: Word boundary, asserts the position at the beginning or end of a word.<br>
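
A minimal sketch of the three anchors, using made-up sample strings. The variable names are assumptions for illustration.

```python
import re

# ^ only matches at the start of the string
at_start = re.search(r"^cat", "cat nap")
not_at_start = re.search(r"^cat", "a cat")
print(at_start is not None, not_at_start is not None)  # True False

# $ only matches at the end of the string
at_end = re.search(r"nap$", "cat nap")
print(at_end is not None)  # True

# \b keeps "art" from matching inside "smart" or "artist"
whole_words = re.findall(r"\bart\b", "art smart artist art.")
print(whole_words)  # ['art', 'art']
```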

<h6>6. Wildcards:</h6>
The dot . is a wildcard character that matches any character except a newline.<br>
Example:<br>

a.b matches "aab," "axb," "a1b," etc.
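
The wildcard behaviour above, including the newline exception, can be checked with a short sketch. The sample text is an assumption; `re.DOTALL` is the standard flag that lets the dot match newlines as well.

```python
import re

text = "aab axb a1b a\nb ab"

# By default the dot matches any character except a newline,
# and it must match exactly one character, so "ab" is skipped
default_matches = re.findall(r"a.b", text)
print(default_matches)  # ['aab', 'axb', 'a1b']

# With re.DOTALL the dot also matches the newline in "a\nb"
dotall_matches = re.findall(r"a.b", text, re.DOTALL)
print(dotall_matches)  # ['aab', 'axb', 'a1b', 'a\nb']
```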

<h6>7. Escaping:</h6>
Backslash \ is used to escape metacharacters, allowing you to match them as literals.<br>
Example:<br>

\\. matches a literal dot, not any character.<br>
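
The effect of escaping can be sketched as follows. The file names and variable names are invented for illustration; `re.escape` is the standard-library helper that escapes metacharacters in a string for you.

```python
import re

text = "file.txt filextxt"

# Unescaped, the dot matches any character, so both names match
unescaped = re.findall(r"file.txt", text)
print(unescaped)  # ['file.txt', 'filextxt']

# Escaped, the dot matches only a literal "."
escaped = re.findall(r"file\.txt", text)
print(escaped)  # ['file.txt']

# re.escape builds a pattern that matches the given text literally
literal_plus = re.findall(re.escape("1+1"), "1+1 and 11")
print(literal_plus)  # ['1+1']

# Without escaping, + is a quantifier, so "1+1" means "one or more 1s, then a 1"
quantifier_plus = re.findall(r"1+1", "1+1 and 11")
print(quantifier_plus)  # ['11']
```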

### Some Examples:

In [9]:
# Use a raw string so the backslash in \d reaches the regex engine unchanged
pattern = r'\d+'
text = "There are apples and 456 bananas."

match = re.search(pattern, text)
print("Found:", match.group())

Found: 456


In [25]:
# Match the hyphens literally; an unescaped . would match any character
date_pattern = r'\d+-\d+-\d+'
text = "Today's date is 2023-08-23."
matches = re.findall(date_pattern, text)
print(matches)

['2023-08-23']


In [None]:
date_pattern = r'(\d{4})-(\d{2})-(\d{2})'
text = "Event date: 2023-08-23"
matches = re.findall(date_pattern, text)
for match in matches:
    print(f"Year: {match[0]}, Month: {match[1]}, Day: {match[2]}")

In [None]:
# Inside a character class, | is a literal pipe, so [A-Za-z] is used rather than [A-Z|a-z]
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b'
text = "Contact us at email@example.com or support@domain.co.uk"
matches = re.findall(email_pattern, text)
print(matches)

In [None]:
phone_pattern = r'\b\d{3}-\d{3}-\d{4}\b'
text = "Call us at 555-123-4567 or 987-654-3210"
matches = re.findall(phone_pattern, text)
print(matches)

In [None]:
prefix = r'\bpre\w+\b'
text = "I prefer to prepare for the presentation beforehand."
matches = re.findall(prefix, text, re.IGNORECASE)
print(matches)

In [13]:
html_pattern = r'<.*?>'
text = "<p>This is a <b>bold</b> statement.</p>"
matches = re.findall(html_pattern, text)
print(matches)

['<p>', '<b>', '</b>', '</p>']


In [None]:
length_pattern = r'\b\w{5,8}\b'
text = "The quick brown fox jumped over the lazy dogs."
matches = re.findall(length_pattern, text)
print(matches)

In [None]:
ip_pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
text = "Valid IPs are 192.168.1.1 and 10.0.0.2"
matches = re.findall(ip_pattern, text)
print(matches)

In [None]:
url_pattern = r'https?://(?:www\.)?[A-Za-z0-9.-]+(?:/[\w./?%&=-]*)?'
text = "Visit https://www.example.com or http://google.com/search"
matches = re.findall(url_pattern, text)
print(matches)

In [None]:
hex_color_pattern = r'#[A-Fa-f0-9]{6}'
text = "The color is #FFA500 for orange and #003366 for blue."
matches = re.findall(hex_color_pattern, text)
print(matches)

In [None]:
text = "Hello, my name is John. Call me JohnD."
new_text = re.sub(r'John', 'Alice', text)
print(new_text)

In [None]:
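# Use a non-capturing group (?:...) so findall returns the whole match ('hahaha');
# with a capturing group it would return only the last captured 'ha'
repeated_pattern = r'(?:ha)+'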
text = "The crowd laughed, hahaha, at the funny joke."
matches = re.findall(repeated_pattern, text)
print(matches)

In [None]:
word_boundary_pattern = r'\bcat\b'
text = "The cat is on the mat, but not a category."
matches = re.findall(word_boundary_pattern, text)
print(matches)