# Regular Expressions in Python

#### What is a Regular Expression?

A RegEx, or regular expression, is a sequence of characters that defines a search pattern.
RegEx can be used to check whether a string contains the specified search pattern.
Regular expressions are not tied to any one programming language; the general rules are described on [this Wikipedia page.](https://en.wikipedia.org/wiki/Regular_expression)

> Regular expressions are used in search engines, search and replace dialogs of word processors and text editors, in text processing utilities such as sed and AWK and in lexical analysis. Many programming languages provide regex capabilities either built-in or via libraries.

Python's `re` module provides regular expression matching operations similar to those found in Perl.
[Here is the official documentation of the `re` module.](https://docs.python.org/3/library/re.html)
A gentle introduction to using regular expressions in Python is also available [here.](https://docs.python.org/3/howto/regex.html#regex-howto)


* References


https://github.com/python/cpython/blob/3.8/Lib/re.py

https://docs.python.org/3/library/re.html


In [None]:
import re

The `re` module exports the following functions:

    match     Match a regular expression pattern to the beginning of a string.
    fullmatch Match a regular expression pattern to all of a string.
    search    Search a string for the presence of a pattern.
    sub       Substitute occurrences of a pattern found in a string.
    subn      Same as sub, but also return the number of substitutions made.
    split     Split a string by the occurrences of a pattern.
    findall   Find all occurrences of a pattern in a string.
    finditer  Return an iterator yielding a Match object for each match.
    compile   Compile a pattern into a Pattern object.
    purge     Clear the regular expression cache.
    escape    Backslash all non-alphanumerics in a string.

In [None]:
''' 
re.match(pattern, string, flags=0)

If zero or more characters at the beginning of string match the regular expression pattern, 
return a corresponding match object. 
Return None if the string does not match the pattern; 
note that this is different from a zero-length match.

Note that even in MULTILINE mode, re.match() will only match at the 
beginning of the string and not at the beginning of each line.
'''
sample_string = "If you have submitted your work, call me."
print(re.match("If", sample_string))
print(re.match("work", sample_string))

<_sre.SRE_Match object; span=(0, 2), match='If'>
None
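
The note about MULTILINE above can be checked directly. The following is a minimal sketch, where `text` is a made-up two-line string used only for illustration; it contrasts `re.match()` with `re.search()` and shows the `group()` and `span()` accessors of the returned match object.

```python
import re

# Hypothetical two-line input used only for illustration.
text = "first line\nsecond line"

# match() is anchored at the start of the string, even with re.MULTILINE.
at_start = re.match("second", text, flags=re.MULTILINE)

# search() with ^ and re.MULTILINE can match at the start of any line.
at_line_start = re.search("^second", text, flags=re.MULTILINE)

print(at_start)                # None
print(at_line_start.group())   # second
print(at_line_start.span())    # (11, 17)
```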


In [None]:
'''
re.fullmatch(pattern, string, flags=0)

If the whole string matches the regular expression pattern, return a corresponding match object. 
Return None if the string does not match the pattern; 
note that this is different from a zero-length match.

'''
sample2 = "Hello World"
print(re.fullmatch("work", sample_string))
print(re.fullmatch("Hello World", sample2))

None
<_sre.SRE_Match object; span=(0, 11), match='Hello World'>
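
A common use of `re.fullmatch()` is checking that an entire input has a given shape. The sketch below uses a hypothetical four-digit pattern, `year_pattern`, chosen only for illustration, and compares the result with `re.match()`, which accepts any string that merely starts with a match.

```python
import re

# Hypothetical pattern: exactly four digits.
year_pattern = r"\d{4}"

whole_ok = re.fullmatch(year_pattern, "2024")
whole_bad = re.fullmatch(year_pattern, "2024-01")   # trailing text is not allowed
prefix_ok = re.match(year_pattern, "2024-01")       # only the beginning has to match

print(whole_ok is not None)    # True
print(whole_bad is not None)   # False
print(prefix_ok is not None)   # True
```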


In [None]:
'''
re.search(pattern, string, flags=0)

Scan through string looking for the first location 
where the regular expression pattern produces a match, and 
return a corresponding match object. 
Return None if no position in the string matches the pattern; 
note that this is different from finding a zero-length match at some point in the string.

'''

print(re.search("if", "If if if"))
print(re.search("you", sample_string))
print(re.search("BYE", "Bye BYE bye"))

<_sre.SRE_Match object; span=(3, 5), match='if'>
<_sre.SRE_Match object; span=(3, 6), match='you'>
<_sre.SRE_Match object; span=(4, 7), match='BYE'>
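
The outputs above show that `re.search()` is case-sensitive by default and returns only the first match. As a small sketch of how the `flags` argument changes that, the example below passes `re.IGNORECASE`; the variable names are illustrative only.

```python
import re

text = "Bye BYE bye"

# Default: only the exact lowercase spelling matches.
case_sensitive = re.search("bye", text)
# With re.IGNORECASE the first occurrence in any casing is returned.
case_insensitive = re.search("bye", text, flags=re.IGNORECASE)

print(case_sensitive.span())      # (8, 11)
print(case_insensitive.span())    # (0, 3)
print(re.search("hello", text))   # None
```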


In [None]:
r'''
re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping 
occurrences of pattern in string by the replacement repl. 
If the pattern isn’t found, string is returned unchanged. 
count is the maximum number of occurrences to replace; 
if it is omitted or zero, all occurrences are replaced. 
repl can be a string or a function; 
if it is a string, any backslash escapes in it are processed. 
That is, \n is converted to a single newline character. 
Unknown escapes of ASCII letters are reserved for future use and treated as errors. 
Other unknown escapes such as \& are left alone. 
Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. 
'''

print(re.sub("if", "my", "if else if IFl", count=1))
print(re.sub("call", "text", sample_string))
print(re.sub(",", "\n", sample_string))

my else if IFl
If you have submitted your work, text me.
If you have submitted your work
 call me.
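
The description above mentions backreferences and a function as `repl`, which the plain-string calls do not exercise. Here is a minimal sketch of both; the helper `double` and the input strings are hypothetical and exist only for this illustration.

```python
import re

# Backreferences: swap two words captured as groups 1 and 2.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)   # world hello


# repl as a function: it receives the match object and returns the replacement string.
def double(match):
    return str(int(match.group()) * 2)


doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)   # 6 apples and 20 pears
```

Raw strings (`r"..."`) are used so that `\2` and `\d` reach the `re` module unchanged instead of being interpreted as Python string escapes.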


In [None]:
'''
re.subn(pattern, repl, string, count=0, flags=0)

Perform the same operation as sub(), but return a tuple (new_string, number_of_subs_made).
'''

print(re.subn("if", "my", "if else if IFl", count=1))
print(re.subn("call", "text", sample_string))
print(re.subn(",", "!", sample_string))

('my else if IFl', 1)
('If you have submitted your work, text me.', 1)
('If you have submitted your work! call me.', 1)


In [None]:
'''
re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. 
If capturing parentheses are used in pattern, 
then the text of all groups in the pattern are also returned as part of the resulting list. 
If maxsplit is nonzero, at most maxsplit splits occur, 
and the remainder of the string is returned as the final element of the list.
'''

print(re.split("f", "if else if IFl", maxsplit=1))
print(re.split("y", sample_string, maxsplit=1))
print(re.split(" ", sample2, maxsplit=1))

['i', ' else if IFl']
['If ', 'ou have submitted your work, call me.']
['Hello', 'World']
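
The effect of capturing parentheses described above is easiest to see side by side. The following sketch splits a made-up string on commas and semicolons, once without and once with a capturing group.

```python
import re

# Hypothetical input with two different separators.
text = "a, b;c"

# Without a capturing group the separators are dropped.
plain = re.split(r"[,;]\s*", text)
# With a capturing group the matched separators are kept in the result.
with_separators = re.split(r"([,;])\s*", text)

print(plain)             # ['a', 'b', 'c']
print(with_separators)   # ['a', ',', 'b', ';', 'c']
```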


In [None]:
'''
re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. 
The string is scanned left-to-right, and matches are returned in the order found. 
If one or more groups are present in the pattern, return a list of groups; 
this will be a list of tuples if the pattern has more than one group. 
Empty matches are included in the result.
'''

print(re.findall("se","if else if and else if ok"))
print(re.findall("you", sample_string))
print(re.findall("ee", "See you soon, byee!"))

['se', 'se']
['you', 'you']
['ee', 'ee']
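
The patterns above contain no groups, so `re.findall()` returns whole matches. As a sketch of the group behaviour described in the docstring, the example below runs three variants of one pattern over a made-up string.

```python
import re

# Hypothetical input used only for illustration.
text = "x=1, y=22"

no_groups = re.findall(r"\w=\d+", text)        # whole matches
one_group = re.findall(r"\w=(\d+)", text)      # only the captured group
two_groups = re.findall(r"(\w)=(\d+)", text)   # one tuple per match

print(no_groups)    # ['x=1', 'y=22']
print(one_group)    # ['1', '22']
print(two_groups)   # [('x', '1'), ('y', '22')]
```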


In [None]:
'''
re.finditer(pattern, string, flags=0)

Return an iterator yielding match objects over all non-overlapping matches 
for the RE pattern in string. 
The string is scanned left-to-right, and matches are returned in the order found. 
'''
print(re.finditer("if", "if else if and else if ok"))
print(re.finditer("you", sample_string))
rex = re.finditer("if", "if else if and else if ok")
rex2 = re.finditer("you", sample_string)
for match in rex:
    print(match)
for match in rex2:
    print(match)

<callable_iterator object at 0x7f19cdb91d68>
<callable_iterator object at 0x7f19cdb91dd8>
<_sre.SRE_Match object; span=(0, 2), match='if'>
<_sre.SRE_Match object; span=(8, 10), match='if'>
<_sre.SRE_Match object; span=(20, 22), match='if'>
<_sre.SRE_Match object; span=(3, 6), match='you'>
<_sre.SRE_Match object; span=(22, 25), match='you'>
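
Because `re.finditer()` yields match objects rather than strings, positions are available for every match. The sketch below collects them into a list and also shows that the iterator can be consumed only once; `sample_string` is redefined here so the block runs on its own.

```python
import re

sample_string = "If you have submitted your work, call me."

# Collect (start, end, text) for every match.
positions = [(m.start(), m.end(), m.group())
             for m in re.finditer("you", sample_string)]
print(positions)   # [(3, 6, 'you'), (22, 25, 'you')]

# An iterator is exhausted after one pass.
matches = re.finditer("you", sample_string)
first_pass = len(list(matches))
second_pass = len(list(matches))
print(first_pass, second_pass)   # 2 0
```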


In [None]:
'''
re.compile(pattern, flags=0)

Compile a regular expression pattern into a regular expression object,
which can be used for matching using its match(), search() and other methods.

The expression’s behaviour can be modified by specifying a flags value. 
Values can be any of the re flag constants, combined using bitwise OR (the | operator).

Note: Using re.compile() and saving the resulting regular expression object for 
reuse is more efficient when the expression will be used several times in a single program.
'''
string = "a is good at a"
# Start with "a", followed by zero or more lowercase letters or spaces,
# and end with "a".
pattern = r"^a[a-z ]*a$"
patt = re.compile(pattern)
result = patt.match(string)
print(result)
result = re.match(pattern, string)
print(result)

string1 = "s is as bad as s"
pattern1 = "^s[a-z ]*s$"
patt = re.compile(pattern1)
result = patt.match(string1)
print(result)
result = re.match(pattern1, string1)
print(result)

<_sre.SRE_Match object; span=(0, 14), match='a is good at a'>
<_sre.SRE_Match object; span=(0, 14), match='a is good at a'>
<_sre.SRE_Match object; span=(0, 16), match='s is as bad as s'>
<_sre.SRE_Match object; span=(0, 16), match='s is as bad as s'>
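
The docstring above says flags can be combined with `|`, which the examples do not show. A minimal sketch, assuming a made-up three-line input and a hypothetical compiled pattern named `line_start_a` that is reused for two different calls:

```python
import re

# re.IGNORECASE and re.MULTILINE combined with bitwise OR.
line_start_a = re.compile(r"^a\w*", flags=re.IGNORECASE | re.MULTILINE)

# Hypothetical input used only for illustration.
text = "Apple pie\nbanana\navocado"

# The same compiled object is reused for different operations.
all_words = line_start_a.findall(text)
first_word = line_start_a.search(text).group()

print(all_words)    # ['Apple', 'avocado']
print(first_word)   # Apple
```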


In [None]:
'''
re.purge()
  Clear the regular expression cache.
'''


In [None]:
'''
re.escape(pattern)
Escape special characters in pattern. 
This is useful if you want to match an arbitrary literal string 
that may have regular expression metacharacters in it.
'''
print(re.escape("h.(h)"))
print(re.escape('http://www.python.org'))

h\.\(h\)
http\:\/\/www\.python\.org
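
A typical reason to call `re.escape()` is to search for user-supplied text literally. The sketch below uses a hypothetical `user_input` containing `+`, which is a quantifier when left unescaped. The exact set of characters that gets a backslash depends on the Python version (since Python 3.7 only characters with a special meaning in regular expressions are escaped), but the escaped pattern matches the input literally either way.

```python
import re

# Hypothetical user-supplied text that should be matched literally.
user_input = "1+1=2"
text = "We know that 1+1=2."

# Unescaped, "1+" means "one or more 1s", so the pattern needs "11=2" or longer.
unescaped = re.search(user_input, text)
# Escaped, every character of user_input is matched literally.
escaped = re.search(re.escape(user_input), text)

print(unescaped)         # None
print(escaped.group())   # 1+1=2
```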
