# Regular Expressions

*`A regular expression is a declarative way to describe a set of strings that follow a particular pattern, such as email addresses or phone numbers.`*

In [1]:
import re

### 1. compile()

`The re module provides the compile() function, which compiles a pattern string into a regex (Pattern) object`

In [2]:
pattern=re.compile("sg")

In [3]:
print(pattern,type(pattern),sep="\t")


re.compile('sg')	<class 're.Pattern'>
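
A compiled pattern is mainly useful when the same regex is applied to many strings. The sketch below is illustrative only: the `targets` list and the `hits` name are made up for this example and are not part of the original notebook.

```python
import re

# Compile the pattern once, then reuse the Pattern object.
pattern = re.compile("sg")

# Hypothetical sample strings, used only for illustration.
targets = ["sg@s3tv", "twi@s3tv", "s3tv@sg"]

# Pattern objects expose the same methods as the module-level functions.
hits = [t for t in targets if pattern.search(t) is not None]
print(hits)             # ['sg@s3tv', 's3tv@sg']
print(pattern.pattern)  # sg (the original pattern string)
```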


### 2. finditer()

`returns an iterator object which yields a Match object for every match`

In [4]:
#method-1
pattern=re.compile("s3tv")
matcher=pattern.finditer("sg@s3tv,twi@s3tv,var@s3tv,sam@s3tv,chu@s3tv")
for match in matcher:
    print(match.start(),match.end(),match.group(),sep="\t")

3	7	s3tv
12	16	s3tv
21	25	s3tv
30	34	s3tv
39	43	s3tv


In [5]:
#method-2
matcher=re.finditer("s3tv","sg@s3tv,twi@s3tv,var@s3tv,sam@s3tv,chu@s3tv")
for match in matcher:
    print(match.start(),match.end(),match.group(),sep="\t")

3	7	s3tv
12	16	s3tv
21	25	s3tv
30	34	s3tv
39	43	s3tv


## Character classes

1. [abc]===>either a or b or c
2. [^abc]===>any character except a, b and c
3. [a-z]===>any lower case alphabet symbol
4. [A-Z]===>any upper case alphabet symbol
5. [0-9]===>any digit from 0-9
6. [a-zA-Z0-9]===>any alphanumeric character (no special characters)
7. [^a-zA-Z0-9]===>any special character (anything that is not alphanumeric)

In [6]:
x=["[abc]","[^abc]","[a-z]","[A-Z]","[0-9]","[a-zA-Z0-9]","[^a-zA-Z0-9]"]
for p in x:
    matcher=re.finditer(p,"a7b@k9zAB")
    print(p)
    for match in matcher:
        print(match.start(),match.group(),sep="...")
    print()

[abc]
0...a
2...b

[^abc]
1...7
3...@
4...k
5...9
6...z
7...A
8...B

[a-z]
0...a
2...b
4...k
6...z

[A-Z]
7...A
8...B

[0-9]
1...7
5...9

[a-zA-Z0-9]
0...a
1...7
2...b
4...k
5...9
6...z
7...A
8...B

[^a-zA-Z0-9]
3...@



## Pre defined Character classes

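* \s-->Any whitespace character (space, tab, newline)
* \S-->Any character except whitespace
* \d-->Any digit
* \D-->Any character except a digit
* \w-->Any word character (letter, digit or underscore)
* \W-->Any character except a word character
* .-->Any character (except newline, by default)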

In [7]:
x=[r"\s",r"\S",r"\d",r"\D",r"\w",r"\W","."] #raw strings keep the backslashes intact
for p in x:
    matcher=re.finditer(p,"a7b @k9zAB")
    print(p)
    for match in matcher:
        print(match.start(),match.group(),sep="...")
    print()

\s
3... 

\S
0...a
1...7
2...b
4...@
5...k
6...9
7...z
8...A
9...B

\d
1...7
6...9

\D
0...a
2...b
3... 
4...@
5...k
7...z
8...A
9...B

\w
0...a
1...7
2...b
5...k
6...9
7...z
8...A
9...B

\W
3... 
4...@

.
0...a
1...7
2...b
3... 
4...@
5...k
6...9
7...z
8...A
9...B
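
Patterns that contain backslashes are usually written as raw strings (`r"..."`), so that Python does not interpret the backslash before the regex engine sees it. The sketch below illustrates the difference with `\b`, the word-boundary token; `\b` is not covered in the list above and is used here only because the ordinary-string form silently changes meaning.

```python
import re

# In an ordinary string "\b" is the backspace character (one character).
# In a raw string r"\b" is a backslash followed by "b" (two characters),
# which the regex engine reads as a word boundary.
print(len("\b"), len(r"\b"))  # 1 2

plain = re.findall("\bcat\b", "a cat sat")   # looks for backspace + cat + backspace
raw = re.findall(r"\bcat\b", "a cat sat")    # looks for the whole word cat
print(plain)  # []
print(raw)    # ['cat']
```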



## Quantifiers

* a ==> Exactly one 'a'
* a+ ==> At least one 'a'
* a* ==> Any no. of 'a', including zero
* a? ==> At most one 'a' (zero or one)
* a{m} ==> Exactly m no. of 'a'
* a{m,n} ==> At least m and at most n no. of 'a'

In [8]:
x=["a","a+","a*","a?","a{2}","a{2,4}"]
for p in x:
    matcher=re.finditer(p,"babaabaaabaaaabaaabaabab")
    print(p)
    for match in matcher:
        print(match.start(),match.group(),sep="...")
    print()

a
1...a
3...a
4...a
6...a
7...a
8...a
10...a
11...a
12...a
13...a
15...a
16...a
17...a
19...a
20...a
22...a

a+
1...a
3...aa
6...aaa
10...aaaa
15...aaa
19...aa
22...a

a*
0...
1...a
2...
3...aa
5...
6...aaa
9...
10...aaaa
14...
15...aaa
18...
19...aa
21...
22...a
23...
24...

a?
0...
1...a
2...
3...a
4...a
5...
6...a
7...a
8...a
9...
10...a
11...a
12...a
13...a
14...
15...a
16...a
17...a
18...
19...a
20...a
21...
22...a
23...
24...

a{2}
3...aa
6...aa
10...aa
12...aa
15...aa
19...aa

a{2,4}
3...aa
6...aaa
10...aaaa
15...aaa
19...aa



**Note**:
* ^x checks whether the target string starts with x
* x$ checks whether the target string ends with x

## Important Functions of re Module

### 1. match() 

`We can use the match function to check for a pattern at the beginning of the target string. If a match is available we get a Match object, otherwise we get None`

In [9]:
s=["sg@s3tv","sg@s3tv@sg","s3tv@sg"] #found only if the string starts with the pattern
sub="sg"

for target in s:
    m=re.match(sub,target)
    print(target,sub,sep="||")
    if m is not None:
        print(m.start(),m.end())
    else:
        print("Not Found")
    print()

sg@s3tv||sg
0 2

sg@s3tv@sg||sg
0 2

s3tv@sg||sg
Not Found



### 2. fullmatch()

`We can use the fullmatch function to match a pattern against the whole target string, i.e. the complete string must match the pattern`

In [10]:
sub=["ab","aabab","ababab"]
for p in sub:
    m=re.fullmatch(p,"ababab")
    print("ababab",p,sep="||")
    if m is not None:
        print("Full Matched")
    else:
        print("Not Full Matched")
    print()

ababab||ab
Not Full Matched

ababab||aabab
Not Full Matched

ababab||ababab
Full Matched



### 3. search()

`We can use the search function to find the first occurrence of a pattern anywhere in the target string. It returns a Match object, or None if there is no match`

In [11]:
s=["sg@s3tv","sg@s3tv@sg","s3tv@sg","tv@ss"] #found if the pattern occurs anywhere in the string
sub="sg"

for target in s:
    m=re.search(sub,target)
    print(target,sub,sep="||")
    if m is not None:
        print(m.start(),m.end())
    else:
        print("Not Found")
    print()

sg@s3tv||sg
0 2

sg@s3tv@sg||sg
0 2

s3tv@sg||sg
5 7

tv@ss||sg
Not Found
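
The difference between the three functions is easiest to see when they are run against the same target. The following is a small sketch; the helper name `compare` is hypothetical and introduced only for this comparison.

```python
import re

def compare(pattern, target):
    """Return (match, fullmatch, search) results as booleans."""
    return (
        re.match(pattern, target) is not None,      # pattern at the beginning
        re.fullmatch(pattern, target) is not None,  # pattern covers the whole string
        re.search(pattern, target) is not None,     # pattern anywhere
    )

target = "sg@s3tv@sg"
for pattern in ["sg", "s3tv", "sg@s3tv@sg"]:
    print(pattern, compare(pattern, target), sep="\t")

# sg          (True, False, True)
# s3tv        (False, False, True)
# sg@s3tv@sg  (True, True, True)
```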



### 4. findall()

`findall() finds all non-overlapping occurrences of a given pattern and returns them as a list`

In [12]:
l=re.findall(r"\d{3}","QWERTY12345") #only one full group of 3 digits; "45" is too short
print(l)

['123']
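
A few more cases may make the behaviour of `findall()` clearer. This is a minimal sketch; the input strings are made up for illustration.

```python
import re

# Matches do not overlap: scanning resumes where the previous match ended.
six = re.findall(r"\d{3}", "QWERTY123456")
print(six)      # ['123', '456']

# With + each run of digits is returned as one item.
runs = re.findall(r"\d+", "a1b22c333")
print(runs)     # ['1', '22', '333']

# No match gives an empty list, not None.
nothing = re.findall(r"\d{3}", "QWERTY")
print(nothing)  # []
```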


### 5. finditer()

`returns an iterator object which yields a Match object for every match`

In [13]:
matcher=re.finditer("s3tv","sg@s3tv,twi@s3tv,var@s3tv,sam@s3tv,chu@s3tv")
for match in matcher:
    print(match.start(),match.end(),match.group(),sep="\t")

3	7	s3tv
12	16	s3tv
21	25	s3tv
30	34	s3tv
39	43	s3tv


### 6. sub()

`sub means substitution or replacement
re.sub(regex,replacement,targetstring)`

In [14]:
s=re.sub("[a-z]","@","qwerty123QWERTY")
print(s)

@@@@@@123QWERTY
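
`re.sub()` also accepts a `count` argument to limit the number of replacements, and the replacement may be a function that receives each Match object. A short sketch using the same target string:

```python
import re

target = "qwerty123QWERTY"

# Replace only the first two lower case letters.
limited = re.sub("[a-z]", "@", target, count=2)
print(limited)  # @@erty123QWERTY

# Use a function to compute each replacement from the match.
upper = re.sub("[a-z]", lambda m: m.group().upper(), target)
print(upper)    # QWERTY123QWERTY
```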


### 7. subn()

`same as sub(), but it returns a tuple of the new string and the no. of replacements made`

In [15]:
s=re.subn("[a-z]","@","qwerty123QWERTY")
print(s)

('@@@@@@123QWERTY', 6)


### 8. split()

`If we want to split the given target string according to a particular pattern then use split()`

In [16]:
l=re.split(",","sg,twi,var,sam,chu")
print(l)

['sg', 'twi', 'var', 'sam', 'chu']


In [17]:
l=re.split(".","sg.twi@var.sam@chu") # . matches any character, so it splits on every character
print(l)

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']


In [18]:
l=re.split(r"\.","sg.twi@var.sam.chu") # . must be escaped to match a literal dot
print(l)

['sg', 'twi@var', 'sam', 'chu']
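
Some other ways to control `split()` are sketched below: a character class to split on several separators at once, `re.escape()` to escape special characters automatically, and `maxsplit` to limit the number of splits. The inputs reuse the strings from the cells above.

```python
import re

# Split on either a dot or an @ sign.
both = re.split(r"[.@]", "sg.twi@var.sam@chu")
print(both)     # ['sg', 'twi', 'var', 'sam', 'chu']

# re.escape() builds a pattern that matches the text literally.
escaped = re.split(re.escape("."), "sg.twi@var.sam.chu")
print(escaped)  # ['sg', 'twi@var', 'sam', 'chu']

# maxsplit stops after the given number of splits.
limited = re.split(",", "sg,twi,var,sam,chu", maxsplit=2)
print(limited)  # ['sg', 'twi', 'var,sam,chu']
```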


### ^ symbol

In [19]:
s="Learning Python is Very Easy"
res=re.search("^Learn",s)
if res is not None:
    print("Target String starts with Learn")
else:
    print("Target String does not start with Learn")

Target String starts with Learn


### $ symbol

In [20]:
s="Learning Python is Very Easy"
res=re.search("Easy$",s)
if res is not None:
    print("Target String ends with Easy")
else:
    print("Target String does not end with Easy")

Target String ends with Easy


In [21]:
s="Learning Python is Very Easy"
res=re.search("easy$",s,re.IGNORECASE) #re.IGNORECASE makes the match case-insensitive
if res is not None:
    print("Target String ends with Easy")
else:
    print("Target String does not end with Easy")

Target String ends with Easy


### Write a program to check whether an identifier follows these rules:

1. The allowed characters are a-z,A-Z,0-9,#
2. The first character should be a lower case alphabet symbol from a to k
3. The second character should be a digit divisible by 3 (0, 3, 6 or 9)
4. The length of the identifier should be at least 2.

In [22]:
s=["ak47","m416","q3z","k3","k3#1"]
sub="[a-k][0369][a-zA-Z0-9#]*"
for target in s:
    m=re.fullmatch(sub,target)
    print(target,end="\t")
    if m is not None:
        print("Yes")
    else:
        print("No")

ak47	No
m416	No
q3z	No
k3	Yes
k3#1	Yes
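
The same check can be wrapped in a reusable function. This is a sketch under the rules listed above; the names `IDENTIFIER` and `is_valid_identifier` are hypothetical and chosen only for this example.

```python
import re

# [a-k]          first character: lower case letter from a to k
# [0369]         second character: digit divisible by 3
# [a-zA-Z0-9#]*  any number of further allowed characters
IDENTIFIER = re.compile(r"[a-k][0369][a-zA-Z0-9#]*")

def is_valid_identifier(text):
    """Return True if the whole string satisfies the identifier rules."""
    return IDENTIFIER.fullmatch(text) is not None

for candidate in ["ak47", "m416", "q3z", "k3", "k3#1"]:
    print(candidate, is_valid_identifier(candidate), sep="\t")
```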


### Write a program to check whether a given no. is a valid phone no.

In [23]:
s=["94123456","1234567891","9876543210","+919557125328"]
sub=r"(0|\+91)?[7-9]\d{9}" #raw string so that \+ and \d reach the regex engine unchanged
for target in s:
    m=re.fullmatch(sub,target)
    print(target,end="\t")
    if m is not None:
        print("Yes")
    else:
        print("No")


94123456	No
1234567891	No
9876543210	Yes
+919557125328	Yes
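
The notebook does not state the phone number rules in words, so the sketch below reads them off the pattern itself and should be treated as an assumption: an optional prefix of `0` or `+91`, then a first digit from 7 to 9, then exactly nine more digits. The names `PHONE` and `is_valid_phone` are hypothetical.

```python
import re

# (0|\+91)?  optional prefix: 0 or +91 (the + is escaped to match literally)
# [7-9]      first digit of the number
# \d{9}      exactly nine more digits, ten digits in total
PHONE = re.compile(r"(0|\+91)?[7-9]\d{9}")

def is_valid_phone(text):
    """Return True if the whole string matches the assumed phone format."""
    return PHONE.fullmatch(text) is not None

for number in ["94123456", "1234567891", "9876543210", "+919557125328", "09876543210"]:
    print(number, is_valid_phone(number), sep="\t")
```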


### Write a program to check whether a given Gmail id is valid

In [25]:
s=input("Enter Mail id: ")
m=re.fullmatch(r"\w[a-zA-Z0-9_.]*@gmail[.]com",s) #[.] matches a literal dot
if m is not None:
    print("Valid Mail Id")
else:
    print("Invalid Mail id")

Enter Mail id: shubham9@gmail.com
Valid Mail Id
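
For testing without typing input each time, the same pattern can be placed in a function. This is a sketch only: the name `is_valid_gmail` and the sample addresses other than the one shown above are made up, and the pattern is a simplified check rather than a full email validator.

```python
import re

# \w               first character: a word character
# [a-zA-Z0-9_.]*   further letters, digits, underscores or dots
# @gmail[.]com     literal domain; [.] matches a real dot
GMAIL = re.compile(r"\w[a-zA-Z0-9_.]*@gmail[.]com")

def is_valid_gmail(text):
    """Return True if the whole string matches the simplified Gmail pattern."""
    return GMAIL.fullmatch(text) is not None

for address in ["shubham9@gmail.com", "@gmail.com", "user@gmailxcom", "user@gmail.org"]:
    print(address, is_valid_gmail(address), sep="\t")
```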
