<h2>What is a Regular Expression and how is it used?</h2>

<b>A regular expression is a sequence of characters that defines a search pattern, mainly used to find and replace text in a string or file.</b><br>
<b>Regular expressions, also called regex, are a small language for searching, extracting and manipulating specific string patterns within a larger text. They are widely used in projects that involve text validation, NLP and text mining.</b>

<ul><li>Regular expressions are available in many programming languages, such as Java, R and Python.</li>
<li>In Python, regular expressions are provided by a module called <b>re</b>.</li>
<li>Import the re module before using regular expressions (import re).</li>
    <li>Regular expressions use two types of characters: metacharacters and literals.</li>
    <li>Metacharacters: as the name suggests, these characters have a special meaning, similar to * in a wildcard.</li>
    <li>Literals: ordinary characters that match themselves (like a, b, 1, 2…).</li>
    <li>Example: match = re.search(pat, str)</li>
</ul>

In [172]:
# Example
import re
pattern = '^v....o$'  # RegEx pattern
test_string = 'verzeo'
result = re.match(pattern, test_string)  # re.match() tries to match the pattern at the start of test_string
print(result)  # A match object if the match succeeds, otherwise None
if result:
  print("Search successful.")
else:
  print("Search unsuccessful.")

<re.Match object; span=(0, 6), match='verzeo'>
Search successful.


<b>Explanation:</b><br>

The above code defines a RegEx pattern. The pattern is: any six-character string starting with v and ending with o. Each . matches any single character, not only a letter.

A pattern defined using RegEx can be used to match against a string.


Ex: verkeo, vaboeo, vskoeo etc.

re.match() returns a match object if the pattern matches at the beginning of the string; otherwise it returns None.
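
The difference between matching at the start of a string and searching anywhere in it can be sketched as follows. This is a minimal illustration, not part of the original notebook; the sample text and the variable names are assumptions chosen for the demo.

```python
import re

text = "say verzeo"  # hypothetical sample text

# re.match() only tries to match at the very start of the string.
at_start = re.match(r"v....o", text)

# re.search() scans the string for the first position that matches.
anywhere = re.search(r"v....o", text)

# re.fullmatch() requires the entire string to match the pattern.
whole = re.fullmatch(r"v....o", "verzeo")

print(at_start)          # None, because the text starts with "say"
print(anywhere.group())  # verzeo
print(whole.group())     # verzeo
```

When the pattern may occur anywhere in the text, re.search() is usually the function to reach for; re.match() and re.fullmatch() are better suited to validation.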

<b>Common operations with regular expressions</b>

Search a string (search and match)

Find all occurrences of a pattern (findall)

Break a string into substrings (split)

Replace part of a string (sub)

<h2>How to write a regular expression</h2>

<b>MetaCharacters</b><br>

Metacharacters are characters that are interpreted in a special way by a RegEx engine. 

Here's a list of metacharacters:    [] . ^ $ * + ? {} () \ |

<h3>[] - Square brackets</h3>

In [23]:
# Square brackets specify a set of characters you wish to match.
# For example, [har] matches if the string contains any of h, a or r.
# You can also specify a range of characters using - inside square brackets.
# Examples:
# [a-e] is the same as [abcde].
# [1-4] is the same as [1234].
# [0-39] is the same as [01239].

# You can complement (invert) the character set by using the caret ^ symbol at the start of a square bracket.
# Examples: [^abc] means any character except a, b or c.
#           [^0-9] means any non-digit character.

<b>Function compile()</b><br>

Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.

In [25]:
import re 
  
# compile() builds a pattern object for the character class [a-e],
# which is equivalent to [abcde].
# The class matches any single character that is 'a', 'b', 'c', 'd' or 'e'.
p = re.compile('[a-e]')

# findall() returns a list of all non-overlapping matches in the string
print(p.findall("hi all!!!, how are you?"))

['a', 'a', 'e']


In [5]:
p = re.compile('[^a-e]')
print(re.findall(p,"hi all!!!, how are you?")) 

['h', 'i', ' ', 'l', 'l', '!', '!', '!', ',', ' ', 'h', 'o', 'w', ' ', 'r', ' ', 'y', 'o', 'u', '?']


<h3>. - Period</h3>

A period matches any single character (except newline '\n').


In [10]:
p = re.compile('...') 
print(p.findall("h"))
print(p.findall("hi")) 
print(p.findall("hi all")) 
print(re.search(p,"hi all!!!, how are you?")) 

[]
[]
['hi ']
<re.Match object; span=(0, 3), match='hi '>


In [17]:
p = re.compile('^..') 
print(p.findall("h"))
print(p.findall("hi")) 
print(p.findall("hi all")) 
print(re.search(p,"hi all!!!, how are you?")) 

[]
['hi']
['hi']
<re.Match object; span=(0, 2), match='hi'>
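
The newline exception mentioned for the period can be checked directly. The sketch below is an added illustration (the sample string and variable names are assumptions); it uses the re.DOTALL flag, which makes . match a newline as well.

```python
import re

text = "a\nb"  # hypothetical sample: 'a' and 'b' separated by a newline

# By default, '.' does not match the newline between 'a' and 'b'.
default_match = re.search(r"a.b", text)

# With re.DOTALL, '.' matches any character, including '\n'.
dotall_match = re.search(r"a.b", text, flags=re.DOTALL)

print(default_match)             # None
print(dotall_match is not None)  # True
```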


<h3>^ - Caret</h3>

The caret symbol ^ anchors the pattern to the start of the string, so it is used to check whether a string starts with a certain character or pattern.

In [46]:
#Example
p = re.compile('^h') 
print(p.findall("yahh")) 
print(p.findall("hi")) 
print(p.findall("all")) 
print(re.search(p,"hi all!!!, how are you?")) 

[]
['h']
[]
<re.Match object; span=(0, 1), match='h'>


<h3>$ - Dollar</h3>

The dollar symbol $ anchors the pattern to the end of the string, so it is used to check whether a string ends with a certain character or pattern.

In [47]:
#Example
p = re.compile('sh$')
print(p.findall("harsh"))
print(p.findall("hi"))
print(p.findall("smash"))
print(re.search(p,"hi all!!!, how are you?"))

['sh']
[]
['sh']
None
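
By default ^ and $ anchor to the start and end of the whole string. A small added sketch, assuming a two-line sample string, shows how the re.MULTILINE flag makes them anchor at every line instead. The variable names are illustrative only.

```python
import re

text = "hi all\nhow are you"  # hypothetical two-line sample

# Without flags, ^ anchors only at the start of the whole string
# and $ only at its end.
starts_default = re.findall(r"^h\w*", text)
ends_default = re.findall(r"\w+$", text)

# With re.MULTILINE, ^ and $ also anchor at the start and end of each line.
starts_multi = re.findall(r"^h\w*", text, flags=re.MULTILINE)
ends_multi = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(starts_default, ends_default)  # ['hi'] ['you']
print(starts_multi, ends_multi)      # ['hi', 'how'] ['all', 'you']
```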


<h3>* - Star</h3>

The star symbol * matches zero or more occurrences of the pattern to its left.

In [19]:
#Example
p = re.compile('ma*n')
print(p.findall("mn"))
print(p.findall("man"))
print(p.findall("maan"))
print(p.findall("main"))
print(p.findall("woman"))
print(re.search(p,"hi all!!!, how are you?"))

['mn']
['man']
['maan']
[]
['man']
None


In [21]:
#Example
p = re.compile('ha*r')
print(p.findall("hasha"))
print(p.findall("arsh"))
print(p.findall("hahr"))
print(p.findall("sha"))
print(p.findall("haarsha"))
print(re.search(p,"hi all!!!, how are you?"))

[]
[]
['hr']
[]
['haar']
None


<h3>+ - Plus</h3>

The plus symbol + matches one or more occurrences of the pattern to its left.

In [22]:
#Example
p = re.compile('ma+n')
print(p.findall("mn"))
print(p.findall("man"))
print(p.findall("maan"))
print(p.findall("main"))
print(p.findall("woman"))
print(re.search(p,"hi all!!!, how are you?"))

[]
['man']
['maan']
[]
['man']
None


In [25]:
#Example
p = re.compile('ha+r')
print(p.findall("hasha"))
print(p.findall("arsh"))
print(p.findall("hahr"))
print(p.findall("sha"))
print(p.findall("harsha"))
print(re.search(p,"hi all!!!, how are you?"))

[]
[]
[]
[]
['har']
None


<h3>? - Question Mark</h3>

The question mark symbol ? matches zero or one occurrence of the pattern to its left.

In [52]:
#Example
p = re.compile('ma?n')
print(p.findall("mn"))
print(p.findall("man"))
print(p.findall("maan"))
print(p.findall("main"))
print(p.findall("woman"))
print(re.search(p,"hi all!!!, how are you?"))

['mn']
['man']
[]
[]
['man']
None


<h3>{} - Braces</h3>

Consider the quantifier {n,m}. It means at least n and at most m repetitions of the pattern to its left.

Let's try one more example. The RegEx [0-9]{2,4} matches at least 2 but not more than 4 consecutive digits. Write the quantifier without a space after the comma; with a space, Python's re module treats the braces as literal characters rather than as a quantifier.

In [27]:
# Example: runs of 2 to 4 characters that are not one of the digits 0-5
p = re.compile('[^0-5]{2,4}')
print(p.findall("ab123csde"))
print(p.findall("12 and 345673"))
print(p.findall("1 and 2"))
print(p.findall("15 abc 16"))

['ab', 'csde']
[' and', '67']
[' and']
[' abc']
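
The cell above uses a negated class, so for comparison here is a short added sketch of the [0-9]{2,4} pattern described in the text, run on the same inputs. The variable names are assumptions for the demo.

```python
import re

digits = re.compile(r"[0-9]{2,4}")

# The quantifier is greedy: it takes up to 4 digits at a time.
run_1 = digits.findall("ab123csde")
run_2 = digits.findall("12 and 345673")
run_3 = digits.findall("1 and 2")  # single digits are too short to match

print(run_1)  # ['123']
print(run_2)  # ['12', '3456', '73']
print(run_3)  # []
```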


<h3>| - Alternation</h3>

Vertical bar | is used for alternation (the or operator).

Here, a|b matches any string that contains either a or b.

In [30]:
#Example
p = re.compile('a|h')
print(p.findall("ab123csde"))
print(p.findall("harsha"))
print(p.findall("vinod"))
print(p.findall("nnnnnnn"))

['a']
['h', 'a', 'h', 'a']
[]
[]


<h3>() - Group</h3>

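Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches any string that contains a, b or c followed by xz.

When a pattern contains one capturing group, findall() returns the text captured by the group rather than the whole match, which is why the outputs below show a single character.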

In [33]:
#Example
p = re.compile('(a|b|[^e-z])xz')
print(p.findall("ab xz"))
print(p.findall("abxz"))
print(p.findall("axz cabxz"))
print(p.findall("edxz"))

[' ']
['b']
['a', 'b']
['d']
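
Because findall() returns only the captured group, the outputs above show a single character rather than the full match. The following added sketch contrasts a capturing group with a non-capturing group (?:...); the sample text and variable names are assumptions.

```python
import re

text = "axz cabxz"

# One capturing group: findall() returns only the captured part.
captured = re.findall(r"(a|b|c)xz", text)

# Non-capturing group (?:...): findall() returns the whole match.
whole = re.findall(r"(?:a|b|c)xz", text)

# finditer() yields match objects, so both views are available.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r"(a|b|c)xz", text)]

print(captured)  # ['a', 'b']
print(whole)     # ['axz', 'bxz']
print(pairs)     # [('axz', 'a'), ('bxz', 'b')]
```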


<h3>\ - Backslash</h3>

Backslash \ is used to escape various characters, including all metacharacters. For example,

<p>\$a matches if a string contains $ followed by a. Here, $ is not interpreted by the RegEx engine in a special way.</p>

If you are unsure whether a punctuation character has a special meaning, you can put \ in front of it. This makes sure the character is not treated in a special way. Do not do this with letters or digits, because sequences such as \d or \b have meanings of their own.

In [36]:
# Example
p = re.compile(r'\$Python')  # \$ matches a literal dollar sign
print(p.findall("IlikeZPython"))
print(p.findall("I like Python"))
print(p.findall("I like $Python"))

[]
[]
['$Python']
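
When the text to search for comes from data rather than from a hand-written pattern, escaping can be left to re.escape(). This is an added sketch; the price string and variable names are made up for the illustration.

```python
import re

price_tag = "$5.00 (approx.)"  # hypothetical sample text

# re.escape() backslash-escapes characters that are special in a pattern.
pattern = re.escape("$5.00")
escaped_hits = re.findall(pattern, price_tag)

# Unescaped, '$' is an end-of-string anchor, so nothing can match here.
unescaped_hits = re.findall("$5.00", price_tag)

print(escaped_hits)    # ['$5.00']
print(unescaped_hits)  # []
```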


<h2>Special Sequences</h2>

Special sequences make commonly used patterns easier to write. Here's a list of special sequences:

\A - Matches if the specified characters are at the start of a string.

\b - Matches if the specified characters are at the beginning or end of a word.

\B - Opposite of \b. Matches if the specified characters are not at the beginning or end of a word.

\d - Matches any decimal digit. Equivalent to [0-9] for ASCII text.

\D - Matches any character that is not a decimal digit. Equivalent to [^0-9].

\s - Matches any whitespace character. Equivalent to [ \t\n\r\f\v].

\S - Matches any non-whitespace character. Equivalent to [^ \t\n\r\f\v].

\w - Matches any word character (letters, digits and the underscore _). Equivalent to [a-zA-Z0-9_] for ASCII text.

\W - Matches any non-word character. Equivalent to [^a-zA-Z0-9_].

With Python 3 string patterns, \d, \s and \w also match the corresponding Unicode characters unless the re.ASCII flag is used.
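
The special sequences listed above can be tried out in one short cell. This is an added sketch with made-up sample strings; the variable names are assumptions chosen for readability.

```python
import re

# \A anchors at the start of the string, so only the first "the" is found.
starts = re.findall(r"\Athe", "the sun, the moon")

# \b needs a word boundary before "foo"; \B needs the absence of one.
word_start = re.search(r"\bfoo", "football afoot").start()
inside_word = re.search(r"\Bfoo", "football afoot").start()

digits = re.findall(r"\d", "Room 42, gate 9")
non_digits = re.findall(r"\D+", "a1b22c")
spaces = re.findall(r"\s", "a b\tc\nd")
non_spaces = re.findall(r"\S+", "a b\tc\nd")
words = re.findall(r"\w+", "snake_case, 2 words!")
non_words = re.findall(r"\W", "a-b c!")

print(starts)                   # ['the']
print(word_start, inside_word)  # 0 10
print(digits, non_digits)       # ['4', '2', '9'] ['a', 'b', 'c']
print(spaces, non_spaces)       # [' ', '\t', '\n'] ['a', 'b', 'c', 'd']
print(words, non_words)         # ['snake_case', '2', 'words'] ['-', ' ', '!']
```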

<h2>re.findall()</h2>

The re.findall() method returns a list of strings containing all matches.

In [65]:
# Program to extract numbers from a string
import re
string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'
result = re.findall(pattern, string) 
print(result)

['12', '89', '34']


In [39]:
# Digits attached to letters (a45, b65) are extracted as well
import re
string = 'hello 12 hi 89. Howdy 34 a45 b65'
pattern = r'\d+'
result = re.findall(pattern, string) 
print(result)

['12', '89', '34', '45', '65']


<h2>re.split()</h2>

The re.split() function splits the string at each match and returns a list of the pieces between the matches.

In [66]:
import re
string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'
result = re.split(pattern, string) 
print(result)

['Twelve:', ' Eighty nine:', '.']
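
Two variations on re.split() are often useful: limiting the number of splits with maxsplit, and keeping the separators by wrapping the pattern in a capturing group. The added sketch below assumes a slightly longer sample string; the variable names are illustrative.

```python
import re

text = "Twelve:12 Eighty nine:89 Nine:9."  # hypothetical sample

# maxsplit=1 stops after the first split; the rest is returned unsplit.
first_split = re.split(r"\d+", text, maxsplit=1)

# A capturing group keeps the matched separators in the result list.
with_separators = re.split(r"(\d+)", "Twelve:12 Eighty nine:89.")

print(first_split)      # ['Twelve:', ' Eighty nine:89 Nine:9.']
print(with_separators)  # ['Twelve:', '12', ' Eighty nine:', '89', '.']
```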


<h2>re.sub()</h2>
The re.sub() function returns a string in which the matched occurrences are replaced with the content of the replace variable.

Syntax: re.sub(pattern, replace, string)

In [40]:
# Program to remove all digits
import re
# the backslash at the end of the first line continues the string literal
string = 'abc 12\
de 23 \n f45 6'
# matches one or more consecutive digits
pattern = r'\d+'
# empty string
replace = ''
new_string = re.sub(pattern, replace, string)
print(new_string)  # If the pattern is not found, re.sub() returns the original string.

abc de  
 f 
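
Beyond a fixed replacement string, re.sub() also accepts a count limit, backreferences to groups, and a function as the replacement. The following is an added sketch under assumed sample inputs; none of the variable names come from the original notebook.

```python
import re

text = "abc 12 de 23 f45 6"  # hypothetical sample

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "#", text, count=1)

# \1, \2, \3 in the replacement refer to the captured groups.
swapped = re.sub(r"(\d{2})-(\d{2})-(\d{4})", r"\3-\2-\1", "12-05-2007")

# The replacement can be a function that receives each match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")

print(first_only)  # abc # de 23 f45 6
print(swapped)     # 2007-05-12
print(doubled)     # 6 apples, 20 pears
```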


<h2>re.subn()</h2>

re.subn() is similar to re.sub() except that it returns a tuple of 2 items containing the new string and the number of substitutions made.

In [41]:
# Program to remove all whitespace
import re
# the backslash at the end of the first line continues the string literal
string = 'abc 12\
de 23 \n f45 6'
# matches one or more consecutive whitespace characters
pattern = r'\s+'
# empty string
replace = ''
result = re.subn(pattern, replace, string)
print(result)

('abc12de23f456', 4)


<h2>re.search()</h2>

The re.search() function takes two arguments: a pattern and a string. It looks for the first location where the RegEx pattern produces a match with the string.

If the search is successful, re.search() returns a match object; if not, it returns None.

Syntax: match = re.search(pattern, str)

In [69]:
import re
string = "Python is fun"
# check if 'Python' is at the beginning
match = re.search(r'\APython', string)
if match:
  print("pattern found inside the string")
else:
  print("pattern not found")  

pattern found inside the string


<h2>match.group()</h2>

The group() method returns the part of the string where there is a match.

In [42]:
import re
string = '39801 356, 2102 1111'
# Three digits followed by a space followed by two digits
pattern = r'(\d{3}) (\d{2})'
# match variable contains a Match object.
match = re.search(pattern, string) 
if match:
  print(match.group())
else:
  print("pattern not found")

801 35


Here, the match variable contains a match object.

Our pattern (\d{3}) (\d{2}) has two subgroups, (\d{3}) and (\d{2}). You can retrieve the part of the string matched by each parenthesized subgroup by passing its number to group().

In [71]:
print(match.group(1))
print(match.group(2))


801
35
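
A match object carries more than the matched text. The added sketch below reuses the same pattern and string to show groups(), start(), end() and span(); the variable names are assumptions for the demo.

```python
import re

match = re.search(r"(\d{3}) (\d{2})", "39801 356, 2102 1111")

all_groups = match.groups()              # every captured group as a tuple
whole_span = (match.start(), match.end())  # position of the whole match
second_span = match.span(2)              # position of the second group

print(all_groups)   # ('801', '35')
print(whole_span)   # (2, 8)
print(second_span)  # (6, 8)
```

The end index is exclusive, so string[2:8] gives back the matched text '801 35'.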


<h2>Using r prefix before RegEx</h2>

When the r or R prefix is used before a string literal, it creates a raw string. For example, '\n' is a newline character, whereas r'\n' is two characters: a backslash \ followed by n.

Backslash \ is used to escape various characters, including all metacharacters. With the r prefix, Python leaves each \ in the string as an ordinary character, so the backslash reaches the RegEx engine unchanged.

In [51]:
import re
string = '\n and \r are escape sequences.'
result = re.findall(R'[\n\r]', string) 
print(result)

['\n', '\r']
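
The raw prefix matters most for sequences that Python itself interprets, such as \b. In a normal string '\b' is a backspace character, while in a raw string it reaches the RegEx engine as a word boundary. This is an added sketch with an assumed sample string.

```python
import re

# A normal string turns \n into one character; a raw string keeps two.
lengths = (len('\n'), len(r'\n'))

# '\b' here is a backspace character, which does not occur in the text.
without_raw = re.findall('\bcat\b', 'cat concat')

# r'\b' is passed through to the RegEx engine as a word boundary.
with_raw = re.findall(r'\bcat\b', 'cat concat')

print(lengths)      # (1, 2)
print(without_raw)  # []
print(with_raw)     # ['cat']
```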


<h2>Some Examples of Regular Expressions</h2>

In [48]:
# Return the first word of a given string
result = re.findall(r'^\w+', 'Hello Everyone , How are you?')
print(result)

['Hello']


In [75]:
# Return the first two characters of each word
result = re.findall(r'\b\w\w', 'AV is largest Analytics community of India')
print(result)

['AV', 'is', 'la', 'An', 'co', 'of', 'In']


In [47]:
# Return the dates from a given string
result = re.findall(r'\d{2}-\d{2}-\d{4}', 'Amit 34-3456 12-05-2007, XYZ 56-4532 11-11-2011, ABC 67-8945 12-01-2009')
print(result)

['12-05-2007', '11-11-2011', '12-01-2009']


In [45]:
# Validate a phone number (it must have exactly 10 digits and start with 8 or 9)
import re
li = ['8999999999', '999999-999', '99999x9999']
for val in li:
    # fullmatch() requires the whole string to match, so no separate length check is needed
    if re.fullmatch(r'[89][0-9]{9}', val):
        print('yes')
    else:
        print('no')

yes
no
no


In the next example, we search a string that contains names and ages. The ages are integers 2 to 3 digits long. We could also expect single-digit ages for anyone under 10 years old. We probably won't see any ages 4 digits long, unless we're talking about biblical times or something. Names are matched as a capital letter followed by lowercase letters.

In [1]:
import re

exampleString = '''
Jessica is 15 years old, and Daniel is 27 years old.
Edward is 97 years old, and his grandfather, Oscar, is 102. 
'''
ages = re.findall(r'\d{1,3}',exampleString)
names = re.findall(r'[A-Z][a-z]*',exampleString)

print(ages)
print(names)


['15', '27', '97', '102']
['Jessica', 'Daniel', 'Edward', 'Oscar']


https://www.w3resource.com/python-exercises/re/
