<p style="font-family:Verdana; font-size: 26px; color: magenta"> 1.0 - How to Use RegEx in Python?</p>

<p style="font-family:Verdana; font-size: 18px; color: darkorange"> A regular expression (regex) is a sequence of characters that defines a search pattern.
</p>

<p style="font-family:Verdana; font-size: 18px; color: orange"> 1. Start by understanding the special characters used in regex, such as ".", "*", "+", "?", and more.</p>
<p style="font-family:Verdana; font-size: 18px; color: orange"> 2. Choose a programming language or tool that supports regex, such as Python, Perl, or grep.</p>
<p style="font-family:Verdana; font-size: 18px; color: orange"> 3. Write your pattern using the special characters and literal characters.</p>
<p style="font-family:Verdana; font-size: 18px; color: orange"> 4. Use the appropriate function or method to search for the pattern in a string.</p>

In [1]:
import re

s = 'As the volume of data grows so does the demand for skilled data scientists. ' \
'The most common languages used for data science are Python and R'

match = re.search(r'science', s)

print('Start Index:', match.start())
print('End Index:', match.end())

Start Index: 116
End Index: 123


<p style="font-family:Verdana; font-size: 18px; color: magenta"> RegEx Functions</p>

<p style="font-family:Verdana; font-size: 18px; color: magenta"> 1. re.findall()</p>

In [6]:
string = """Hello my Number is 123456789 and
            my friend's number is 987654321"""
pattern = r'\d+'
match = re.findall(pattern, string)
match

['123456789', '987654321']
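When the pattern contains capturing groups, `re.findall()` returns the groups rather than the whole match. The sketch below uses a made-up `name=age` string (an assumption for illustration, not from the example above) to show both behaviours side by side.

```python
import re

text = "alice=30, bob=25"  # hypothetical sample data

# No groups: findall returns the full text of every match.
whole_matches = re.findall(r"\w+=\d+", text)
print(whole_matches)  # ['alice=30', 'bob=25']

# Two groups: findall returns one (name, age) tuple per match.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)  # [('alice', '30'), ('bob', '25')]
```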

<p style="font-family:Verdana; font-size: 18px; color: magenta"> 2. re.compile() </p>

> Example 1:

In [7]:
# to find and list all lowercase letters from 'a' to 'e' in the input string
pattern = re.compile('[a-e]')
match = re.findall(pattern, 'Aye, said Mr. Gibenson Stark')
match

['e', 'a', 'd', 'b', 'e', 'a']

> Example 2: 

In [9]:
# to find and list all single digits and sequences of digits in the given input strings
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))

p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))

['1', '1', '4', '1', '8', '8', '6']
['11', '4', '1886']


> Example 3:

In [12]:
# to find and list word characters, sequences of word characters, and non-word characters in input strings
p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))

p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he \
said *** in some_language."))

p = re.compile(r'\W') # and non-word characters
print(p.findall("he said ?** in some_language."))

['H', 'e', 's', 'a', 'i', 'd', 'i', 'n', 's', 'o', 'm', 'e', '_', 'l', 'a', 'n', 'g']
['I', 'went', 'to', 'him', 'at', '11', 'A', 'M', 'he', 'said', 'in', 'some_language']
[' ', ' ', '?', '*', '*', ' ', ' ', '.']


> Example 4:

In [13]:
# to find and list all occurrences of 'a' followed by zero or more 'b' characters in the input string
p = re.compile(r'ab*')
print(p.findall("ababbaabbb"))

['ab', 'abb', 'a', 'abbb']
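A compiled pattern object carries its own `search`, `findall`, and `sub` methods, so the same pattern can be reused without passing it to the module-level functions each time. A minimal sketch, with an invented sample sentence:

```python
import re

# Compile once, then reuse the pattern object's own methods.
digits = re.compile(r"\d+")
sentence = "Room 42, floor 7"  # hypothetical sample text

first = digits.search(sentence)        # first match only
all_numbers = digits.findall(sentence) # every match as a list of strings
masked = digits.sub("#", sentence)     # replace every match

print(first.group())  # 42
print(all_numbers)    # ['42', '7']
print(masked)         # Room #, floor #
```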


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 3. re.split() > re.split(pattern, string, maxsplit=0, flags=0)</p>

> Example 1:

In [15]:
from re import split

print(split(r'\W+', 'Words, words , Words'))
print(split(r'\W+', "Word's words Words"))
print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))
print(split(r'\d+', 'On 12th Jan 2016, at 11:02 AM'))

['Words', 'words', 'Words']
['Word', 's', 'words', 'Words']
['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
['On ', 'th Jan ', ', at ', ':', ' AM']


> Example 2:

In [22]:
print(re.split(r'\d+', 'On 12th Jan 2016, at 11:02 AM', maxsplit=1))
print(re.split(r'[a-f]+', 'Aey, Boy oh boy, come here', flags=re.IGNORECASE))
print(re.split(r'[a-f]+', 'Aey, Boy oh boy, come here'))

['On ', 'th Jan 2016, at 11:02 AM']
['', 'y, ', 'oy oh ', 'oy, ', 'om', ' h', 'r', '']
['A', 'y, Boy oh ', 'oy, ', 'om', ' h', 'r', '']
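If the split pattern is wrapped in a capturing group, `re.split()` also returns the text matched by the separator. The short sketch below compares the two forms on a shortened version of the date string used above.

```python
import re

text = "On 12th Jan 2016"

# Without a group, the digits are consumed as separators and dropped.
without_group = re.split(r"\d+", text)
print(without_group)  # ['On ', 'th Jan ', '']

# With a capturing group, the matched separators are kept in the result.
with_group = re.split(r"(\d+)", text)
print(with_group)  # ['On ', '12', 'th Jan ', '2016', '']
```

The trailing empty string appears because the text ends with a separator.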


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 4. re.sub() > re.sub(pattern, repl, string, count=0, flags=0)</p>

> Example 1:

In [23]:
# replaces all occurrences of 'ub' with '~*' (case-insensitive): 'S~*ject has ~*er booked already'
print(re.sub('ub', '~*', 'Subject has Uber booked already', 
             flags=re.IGNORECASE))
# replaces all occurrences of 'ub' with '~*' (case-sensitive): 'S~*ject has Uber booked already'
print(re.sub('ub', '~*', 'Subject has Uber booked already'))
# replaces the first occurrence of 'ub' with '~*' (case-insensitive): 'S~*ject has Uber booked already'
print(re.sub('ub', '~*', 'Subject has Uber booked already',
             count=1, flags=re.IGNORECASE))
# replaces 'And' surrounded by whitespace with ' & ' (case-insensitive): 'Baked Beans & Spam'
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam',
             flags=re.IGNORECASE))

S~*ject has ~*er booked already
S~*ject has Uber booked already
S~*ject has Uber booked already
Baked Beans & Spam
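The `repl` argument of `re.sub()` does not have to be a fixed string: it can refer back to captured groups, or it can be a function that builds the replacement from each match. The sketch below illustrates both; the helper name `double` and the sample strings are assumptions made for the example.

```python
import re

# Backreferences: \1 and \2 refer to the first and second captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello

# A function as repl: it receives each match object and returns the new text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```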


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 5. re.subn() > re.subn(pattern, repl, string, count=0, flags=0)</p>

> Example:

In [24]:
# replaces all occurrences of a pattern in a string and returns a tuple with the modified string 
# and the count of substitutions made
print(re.subn(r'ub', '~*', 'Subject has Uber booked already'))

t = re.subn(r'ub', '~*', 'Subject has Uber booked already',
            flags=re.IGNORECASE)
print(t)
print(len(t))
print(t[0])

('S~*ject has Uber booked already', 1)
('S~*ject has ~*er booked already', 2)
2
S~*ject has ~*er booked already


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 6. re.escape() > re.escape(string)</p>

> Example:

In [25]:
# re.escape() escapes the special characters in a string so that it can be used safely as a literal pattern
print(re.escape(r"This is Awesome even 1 AM"))
print(re.escape(r"I Asked what is this [a-9], he said \t ^WoW"))

This\ is\ Awesome\ even\ 1\ AM
I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \\t\ \^WoW
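A typical use of `re.escape()` is searching for text that comes from outside the program and may contain metacharacters. The sketch below assumes a hypothetical `user_text` value containing `+` and shows how the search behaves with and without escaping.

```python
import re

user_text = "1+1=2"  # hypothetical user input containing the metacharacter '+'
sentence = "Everyone knows that 1+1=2."

# Unescaped, '+' means "one or more of the previous character",
# so the pattern looks for '11=2' (or longer) and finds nothing.
unescaped = re.search(user_text, sentence)
print(unescaped)  # None

# Escaped, every character is matched literally.
escaped = re.search(re.escape(user_text), sentence)
print(escaped.group())  # 1+1=2
```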


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 7. re.search()</p>

In [26]:
# re.search() returns None if the pattern does not match; otherwise it returns a re.Match object describing the first match
pattern = r"([a-zA-Z]+) (\d+)"

match = re.search(pattern, "I was born on June 24")
if match is not None:
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match: %s" % (match.group(0)))
    print("Month: %s" % (match.group(1)))
    print("Day: %s" % (match.group(2)))
else:
    print("The regex pattern does not match.")

Match at index 14, 21
Full match: June 24
Month: June
Day: 24
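Groups can also be given names with the `(?P<name>...)` syntax, which makes the lookups easier to read than numeric indices. This is a sketch of the same month/day search; the group names `month` and `day` are chosen here for illustration.

```python
import re

# Same pattern as above, but each group has a name.
pattern = r"(?P<month>[a-zA-Z]+) (?P<day>\d+)"

match = re.search(pattern, "I was born on June 24")
if match is not None:
    print(match.group("month"))  # June
    print(match.group("day"))    # 24
    print(match.groupdict())     # {'month': 'June', 'day': '24'}
```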


<p style="font-family:Verdana; font-size: 18px; color: orange"> Meta-characters</p> 

<p style="font-family:Verdana; font-size: 18px; color: magenta"> 1. \ - Backslash</p>

In [38]:
s = 'geeks.forgeeks'

# without using \
# The first search (re.search(r'.', s)) matches any character, not just the period
match = re.search(r'.', s)
print(match)

# using \
# the second search (re.search(r'\.', s)) specifically looks for and matches the period character
match = re.search(r'\.', s)
print(match)

<re.Match object; span=(0, 1), match='g'>
<re.Match object; span=(5, 6), match='.'>


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 2. [] - Square Brackets</p>

> Example:

In [31]:
string = "The quick brown fox jumps over the lazy dog"
pattern = "[a-d]"
result = re.findall(pattern, string)

print(result)

['c', 'b', 'a', 'd']


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 3. ^ - Caret</p>

> Example:

In [33]:
# checks whether the string starts with the given character(s) or not
pattern = r'^The'
strings = ['The quick brown fox', 'The lazy dog', 'A quick brown fox']
for string in strings:
    if re.match(pattern, string):
        print(f'Matched: {string}')
    else:
        print(f'Not matched: {string}')

Matched: The quick brown fox
Matched: The lazy dog
Not matched: A quick brown fox
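By default `^` anchors only at the very start of the string. With the `re.MULTILINE` flag it also matches at the start of every line. A minimal sketch with an invented three-line string:

```python
import re

text = "The fox\nA dog\nThe end"  # hypothetical multi-line input

# Default: '^' matches only at the start of the whole string.
default_hits = re.findall(r"^The", text)
print(default_hits)  # ['The']

# re.MULTILINE: '^' also matches right after each newline.
multiline_hits = re.findall(r"^The", text, flags=re.MULTILINE)
print(multiline_hits)  # ['The', 'The']
```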


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 4. $ - Dollar</p>

> Example:

In [34]:
# checks whether the string ends with the given character(s) or not
string = "Hello World!"
pattern = r"World!$"

match = re.search(pattern, string)
if match:
    print("Match found!")
else:
    print("Match not found.")

Match found!


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 5. . - Dot</p>

> Example:

In [35]:
string = "The quick brown fox jumps over the lazy dog."
# The dot (.) in the pattern represents any character
pattern = r"brown.fox"

match = re.search(pattern, string)
if match:
    print("Match found!")
else:
    print("Match not found.")

Match found!
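One caveat: by default the dot matches any character except a newline. The sketch below, using a made-up string with a newline between the two words, shows the default behaviour and the effect of the `re.DOTALL` flag.

```python
import re

text = "brown\nfox"  # hypothetical input with a newline between the words

# By default, '.' matches any character except a newline.
default_match = re.search(r"brown.fox", text)
print(default_match)  # None

# With re.DOTALL, '.' also matches the newline.
dotall_match = re.search(r"brown.fox", text, flags=re.DOTALL)
print(dotall_match is not None)  # True
```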


<p style="font-family:Verdana; font-size: 18px; color: magenta"> 6. | - Or</p>
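> Example:

The `|` character matches either the expression on its left or the one on its right. A short sketch with invented sample text; the second call uses a non-capturing group `(?:...)` to limit the alternation to a single letter.

```python
import re

text = "I have a cat, a dog, and a bird."  # hypothetical sample text

# Match either 'cat' or 'dog'.
animals = re.findall(r"cat|dog", text)
print(animals)  # ['cat', 'dog']

# '|' has the lowest precedence, so group it to restrict its scope.
spellings = re.findall(r"gr(?:a|e)y", "gray grey groy")
print(spellings)  # ['gray', 'grey']
```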

<p style="font-family:Verdana; font-size: 18px; color: magenta"> 7. ? - Question Mark</p>
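> Example:

The `?` quantifier makes the preceding character (or group) optional: it matches zero or one occurrence. A minimal sketch with made-up input:

```python
import re

# 'u?' means the 'u' may appear once or not at all.
words = re.findall(r"colou?r", "color colour colouur")
print(words)  # ['color', 'colour']
```

The last word is not matched because it contains two `u` characters.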

<p style="font-family:Verdana; font-size: 18px; color: magenta"> 8. * - Star</p>
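> Example:

The `*` quantifier matches zero or more occurrences of the preceding character. It is greedy by default; appending `?` makes it match as little as possible. The sample strings below are invented for illustration.

```python
import re

# 'o*' allows no 'o' at all, one 'o', or many.
words = re.findall(r"go*d", "gd god good gold")
print(words)  # ['gd', 'god', 'good']

html = "<b>bold</b>"  # hypothetical snippet

# Greedy: '.*' grabs as much as it can.
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>bold</b>']

# Lazy: '.*?' stops at the first possible '>'.
lazy = re.findall(r"<.*?>", html)
print(lazy)  # ['<b>', '</b>']
```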

<p style="font-family:Verdana; font-size: 18px; color: magenta"> 9. + - Plus</p>
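> Example:

The `+` quantifier matches one or more occurrences of the preceding character, so at least one must be present. A minimal sketch with made-up input:

```python
import re

# 'o+' requires at least one 'o', so 'gd' is not matched.
words = re.findall(r"go+d", "gd god good")
print(words)  # ['god', 'good']
```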

<p style="font-family:Verdana; font-size: 18px; color: magenta"> 10. {m,n} - Braces</p>
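> Example:

Braces set how many times the preceding character may repeat: between `m` and `n` times, or exactly `m` times when only one number is given. In Python the quantifier is written without a space after the comma. The digit string below is invented for illustration.

```python
import re

text = "1 22 333 4444 55555"  # hypothetical sample text

# Between 2 and 4 digits; the five-digit run yields only its first four.
ranged = re.findall(r"\d{2,4}", text)
print(ranged)  # ['22', '333', '4444', '5555']

# Exactly 3 digits, bounded by word boundaries on both sides.
exact = re.findall(r"\b\d{3}\b", text)
print(exact)  # ['333']
```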

<p style="font-family:Verdana; font-size: 18px; color: magenta"> 11. (&lt;regex&gt;) - Group</p>
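> Example:

Parentheses group part of a pattern so that a quantifier applies to the whole group, and they capture the matched text for later retrieval. A sketch with invented inputs:

```python
import re

# The '+' applies to the whole group 'ab', not just to 'b'.
repeated = re.search(r"(ab)+", "xxababab")
print(repeated.group(0))  # ababab
print(repeated.group(1))  # ab (the last repetition of the group)

# Each pair of parentheses captures one part of the date.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2016-01-12")
print(date.groups())  # ('2016', '01', '12')
```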