### [Regular Expression](https://docs.python.org/3/library/re.html)

References:
- [Scaler Topics](https://www.scaler.com/topics/python/regular-expression-in-python/)
- [Ask Python](https://www.askpython.com/python/regular-expression-in-python)

In [79]:
import re

#### RegEx functions

##### 1. Searching for a pattern
- The `re.search()` function scans a string for the first location where the pattern matches.
- If the pattern occurs more than once, only the first match is returned; if it does not occur at all, the result is `None`.

In [110]:
# Searching for a pattern:

string = "at what time?"
pattern = r'at'

match = re.search(pattern, string)

if match:
    print("String found at: ", match.span())
else:
    print("String not found!")

String found at:  (0, 2)


In [126]:
# Pattern modifiers (flags):

# Flags change how a pattern is matched, for example by ignoring case
# (re.IGNORECASE) or by letting ^ and $ match at every line (re.MULTILINE).
# They are passed through the optional flags argument of the re functions:

text = "Kira, World"
pattern = r"world"
match = re.search(pattern, text, re.IGNORECASE)
print(match)

<re.Match object; span=(6, 11), match='World'>
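Flags can also be written inside the pattern and combined with each other. The following is a minimal sketch of both ideas; the sample `text` and the variable names are made up for illustration.

```python
import re

text = "first line\nSecond line\nthird LINE"

# Passing re.IGNORECASE as an argument and using the inline (?i) form
# give the same result.
by_argument = re.findall(r"line", text, flags=re.IGNORECASE)
by_inline = re.findall(r"(?i)line", text)
print(by_argument)  # ['line', 'line', 'LINE']
print(by_inline)    # ['line', 'line', 'LINE']

# re.MULTILINE makes ^ match at the start of every line, not only
# at the start of the whole string.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(line_starts)  # ['first', 'Second', 'third']

# Several flags are combined with the bitwise OR operator.
combined = re.findall(r"^second", text, flags=re.IGNORECASE | re.MULTILINE)
print(combined)     # ['Second']
```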


##### 2. Finding all occurrences
- The `re.findall()` function is used to find all occurrences of a pattern in a string and returns them as a list.

In [97]:
# Finding all occurrences:

string = "at what time?"
pattern = r"at"

matches = re.findall(pattern,string)

print(matches)

['at', 'at']
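`re.findall()` returns only the matched text. When the positions are needed as well, `re.finditer()` is a possible alternative, since it yields one match object per occurrence. The sketch below also shows how `re.findall()` behaves when the pattern contains capturing groups; the second pattern is an illustrative choice, not part of the example above.

```python
import re

string = "at what time?"

# finditer yields match objects, so both the text and the span are available.
positions = [(m.group(), m.span()) for m in re.finditer(r"at", string)]
print(positions)  # [('at', (0, 2)), ('at', (5, 7))]

# With capturing groups, findall returns the groups instead of the whole match.
pairs = re.findall(r"(\w+) (\w+)", string)
print(pairs)      # [('at', 'what')]
```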


##### 3. Splitting a string based on a pattern
- The `re.split()` function splits a string at every match of a pattern and returns the pieces as a list.
- A match at the very start (or end) of the string produces an empty string as the first (or last) element.

In [98]:
# Splitting a string based on a pattern:

string = "at what time?"
pattern = r"at"

matches = re.split(pattern,string)

print(matches)

['', ' wh', ' time?']


- You can limit the number of splits with the `maxsplit` parameter; the rest of the string is returned as the last element:

In [100]:

string = "at what time?"
pattern = r"at"

matches = re.split(pattern, string, maxsplit=1)

print(matches)

['', ' what time?']
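Two further behaviours of `re.split()` are worth a quick sketch: a capturing group in the pattern keeps the delimiters in the result, and a pattern such as `\s+` splits on runs of whitespace. The variable names here are illustrative.

```python
import re

string = "at what time?"

# A capturing group around the pattern keeps the delimiters in the output.
with_delims = re.split(r"(at)", string)
print(with_delims)  # ['', 'at', ' wh', 'at', ' time?']

# Splitting on whitespace, at most once.
head_and_rest = re.split(r"\s+", string, maxsplit=1)
print(head_and_rest)  # ['at', 'what time?']
```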


##### 4. Replacing patterns
- The `re.sub()` function is used to replace all occurrences of a pattern in a string with a specified replacement.

In [104]:
# Replacing patterns:

string = "at what time?"
pattern = r"\s"
replacement = r'!!!'

new_text = re.sub(pattern, replacement, string)

print(new_text)

at!!!what!!!time?
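`re.sub()` accepts more than a fixed replacement string. The sketch below, using illustrative variable names, shows the `count` parameter, backreferences to groups, a function as the replacement, and `re.subn()`, which also reports how many replacements were made.

```python
import re

string = "at what time?"

# count limits how many occurrences are replaced.
first_only = re.sub(r"\s", "!!!", string, count=1)
print(first_only)  # at!!!what time?

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", string)
print(swapped)     # what at time?

# The replacement can be a function that receives each match object.
shouted = re.sub(r"\w+", lambda m: m.group().upper(), string)
print(shouted)     # AT WHAT TIME?

# subn returns the new string together with the number of replacements.
result, n = re.subn(r"\s", "_", string)
print(result, n)   # at_what_time? 2
```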


##### 5. Matching patterns 
- The `re.match()` function is used to match a pattern at the beginning of a string.

In [120]:
# Matching patterns:

string = "at what time?"
pattern = r"what"

match = re.match(pattern, string)

if match:
    print("Pattern found!", match.span())
else:
    print("Pattern not found.")

Pattern not found.


Difference between `re.search()` and `re.match()`:

>`re.search()`:
>> - Searches for a pattern anywhere in the input string.
>> - Scans through the input string and stops at the first occurrence of the pattern.
>> - Returns a match object if a match is found; otherwise, returns None.
>> - Does not require the pattern to match at the beginning of the string.

>`re.match()`:
>> - Matches a pattern only at the beginning of the input string.
>> - Checks if the pattern matches the start of the string.
>> - Returns a match object if a match is found at the beginning; otherwise, returns None.
>> - Requires the pattern to match at the beginning of the string.
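The difference can be seen by running both functions on the same input. This is a small sketch that reuses the string from the earlier examples; `re.fullmatch()` is included for comparison because it requires the whole string to match.

```python
import re

string = "at what time?"

# match only looks at the beginning of the string.
at_start = re.match(r"what", string)
print(at_start)            # None

# search finds the pattern anywhere in the string.
anywhere = re.search(r"what", string)
print(anywhere.span())     # (3, 7)

# Anchoring a search with ^ makes it behave like match here.
anchored = re.search(r"^what", string)
print(anchored)            # None

# fullmatch succeeds only if the pattern covers the entire string.
partial = re.fullmatch(r"at", string)
whole = re.fullmatch(r"at.*", string)
print(partial)             # None
print(whole.group())       # at what time?
```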


Both `re.match()` and `re.search()` return an `re.Match` object when they succeed. Its most commonly used methods and attributes are:

- `match.group()`: Returns the part of the string that matched.
- `match.start()`: Returns the index at which the match starts.
- `match.end()`: Returns the index just past the last matched character.
- `match.span()`: Returns the tuple `(start, end)` of the match.
- `match.re`: The compiled pattern object that produced the match.
- `match.string`: The string that was passed in for matching.
- Using the `r` prefix before a regex: This makes the pattern a raw string literal, so Python does not process backslash escapes in it and every `\` reaches the regex engine unchanged. Regex metacharacters keep their meaning; only Python's own escape handling is switched off.

In [131]:
text = '''Alan Turing was a pioneer of theoretical computer science and artificial intelligence. He was born on 23 June 1912 in Maida Vale, London'''

pattern = 'computer'

# Searches the pattern in the string.
res = re.search(pattern, text)
print(f"Match object = {res}")
print("--"*30)
print("group method output = ",res.group())
print("--"*30)
print("start method output = ",res.start())
print("--"*30)
print("end method output = ",res.end())
print("--"*30)
print("span method output = ",res.span())
print("--"*30)
print("re attribute output = ",res.re)
print("--"*30)
print("string attribute output = ",res.string)
print("--"*30)

# Example of using r as prefix.
# The raw string below contains two literal backslashes.
text = r'search \\ in this string'
# The raw pattern r"\\" matches one literal backslash.
res = re.search(r"\\",text)
print("With r as prefix = ",res.start())

Match object = <re.Match object; span=(41, 49), match='computer'>
------------------------------------------------------------
group method output =  computer
------------------------------------------------------------
start method output =  41
------------------------------------------------------------
end method output =  49
------------------------------------------------------------
span method output =  (41, 49)
------------------------------------------------------------
re attribute output =  re.compile('computer')
------------------------------------------------------------
string attribute output =  Alan Turing was a pioneer of theoretical computer science and artificial intelligence. He was born on 23 June 1912 in Maida Vale, London
------------------------------------------------------------
With r as prefix =  7


#### Meta Characters

In [132]:
text = '''Alan Turing was born on 23 June 1912 in London.'''
# Example for \A
res = re.findall(r'\AAlan',text)
print(r"Result for \A = ", res)
print("-"*79)
# Example for \b
res = re.findall(r'\bLon',text)
print("Result for \\b = ", res)
print("-"*79)
# Example for \b at the end of a word
res = re.findall(r'ring\b',text)
print("Result for \\b = ", res)
print("-"*79)
# Example for \B
res = re.findall(r'\Bon',text)
print(r"Result for \B = ", res)
print("-"*79)
# Example for \d
res = re.findall(r'\d',text)
print(r"Result for \d = ", res)
print("-"*79)
# Example for \D
res = re.findall(r'\D',text)
print(r"Result for \D = ", res)
print("-"*79)
# Example for \s
res = re.findall(r'\s',text)
print(r"Result for \s = ", res)
print("-"*79)
# Example for \S
res = re.findall(r'\S',text)
print(r"Result for \S = ", res)
print("-"*79)
# Example for \w
res = re.findall(r'\w',text)
print(r"Result for \w = ", res)
print("-"*79)
# Example for \W
res = re.findall(r'\W',text)
print(r"Result for \W = ", res)
print("-"*79)
# Example for \Z
res = re.findall(r'London.\Z',text)
print(r"Result for \Z = ", res)


Result for \A =  ['Alan']
-------------------------------------------------------------------------------
Result for \b =  ['Lon']
-------------------------------------------------------------------------------
Result for \b =  ['ring']
-------------------------------------------------------------------------------
Result for \B =  ['on', 'on']
-------------------------------------------------------------------------------
Result for \d =  ['2', '3', '1', '9', '1', '2']
-------------------------------------------------------------------------------
Result for \D =  ['A', 'l', 'a', 'n', ' ', 'T', 'u', 'r', 'i', 'n', 'g', ' ', 'w', 'a', 's', ' ', 'b', 'o', 'r', 'n', ' ', 'o', 'n', ' ', ' ', 'J', 'u', 'n', 'e', ' ', ' ', 'i', 'n', ' ', 'L', 'o', 'n', 'd', 'o', 'n', '.']
-------------------------------------------------------------------------------
Result for \s =  [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
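-------------------------------------------------------------------------------
Result for \S =  ['A', 'l', 'a', 'n', 'T', 'u', 'r', 'i', 'n', 'g', 'w', 'a', 's', 'b', 'o', 'r', 'n', 'o', 'n', '2', '3', 'J', 'u', 'n', 'e', '1', '9', '1', '2', 'i', 'n', 'L', 'o', 'n', 'd', 'o', 'n', '.']
-------------------------------------------------------------------------------
Result for \w =  ['A', 'l', 'a', 'n', 'T', 'u', 'r', 'i', 'n', 'g', 'w', 'a', 's', 'b', 'o', 'r', 'n', 'o', 'n', '2', '3', 'J', 'u', 'n', 'e', '1', '9', '1', '2', 'i', 'n', 'L', 'o', 'n', 'd', 'o', 'n']
-------------------------------------------------------------------------------
Result for \W =  [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '.']
-------------------------------------------------------------------------------
Result for \Z =  ['London.']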

##### Difference Between `[0-9]` and `[0-9.]`
- `[0-9]` matches any single digit, while `[0-9.]` matches any single digit or a period (`.`).
- Put differently, `[0-9]` represents the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and `[0-9.]` represents the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, .}. Inside a character class the period is a literal character, not the "any character" wildcard.
- So "1" matches both `[0-9]` and `[0-9.]`, but "." matches only `[0-9.]`.
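A short sketch makes the difference visible. The `sample` sentence is invented for illustration; note that the trailing period of the sentence is also picked up by `[0-9.]+`.

```python
import re

sample = "Version 3.14 costs 42 units."

# [0-9]+ stops at the period, so 3.14 is split into two matches.
digits_only = re.findall(r"[0-9]+", sample)
print(digits_only)    # ['3', '14', '42']

# [0-9.]+ treats the period as part of the match.
digits_or_dot = re.findall(r"[0-9.]+", sample)
print(digits_or_dot)  # ['3.14', '42', '.']
```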

##### Why use `r` before string?

- The `r` prefix marks a raw string literal: Python leaves the backslashes in it untouched instead of interpreting escape sequences such as `\n` or `\t`.
- Regular expressions use backslashes too, either to escape metacharacters (`\.`) or to introduce special sequences (`\d`, `\b`). Without the `r` prefix, each of those backslashes has to be doubled.


In [None]:
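# Matches a literal backslash followed by one ASCII letter.
pattern = r"\\[a-zA-Z]"

# The same pattern without the `r` prefix: every backslash must be doubled.
pattern = "\\\\[a-zA-Z]"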
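To check that the two spellings really are the same pattern, they can be compared directly. The sketch below is illustrative; the `path` string is a made-up input containing backslashes.

```python
import re

raw = r"\\[a-zA-Z]"
escaped = "\\\\[a-zA-Z]"

# Both literals produce the same 10-character string.
print(raw == escaped)  # True
print(len(raw))        # 10

# The pattern finds each backslash that is followed by a letter.
path = r"C:\new\table"
found = re.findall(raw, path)
print(found)           # ['\\n', '\\t']
```

The printed list shows doubled backslashes only because that is how Python displays a backslash inside a string; each element is two characters long.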
