# Regular Expressions in Python
### Table of contents:
1. **match()**
2. **search()**
3. **findall()**
4. **finditer()**
5. **sub()**
6. **split()**
7. **Groups**

In [1]:
import re

## 1. match()
Checks for a match only at the beginning of the string. It returns a match object on success and `None` otherwise.

In [2]:
# Defining a string
string = "Tiger is the national animal of India. Tiger lives in Forest."

# Defining the pattern
pattern = "Tiger"

# Running match() on a string
result = re.match(pattern, string)

# Printing the result
print(result)

<_sre.SRE_Match object; span=(0, 5), match='Tiger'>


In [3]:
# Defining a string
string = "Tiger is the national animal of India. Tiger lives in Forest."

# Defining the pattern
pattern = "Tiger"

# Extracting String from a match object
result = re.match(pattern, string).group()

# Printing the result
print(result)

Tiger


In [4]:
string = "The national animal of India is Tiger. Tiger lives in Forest."
pattern = "Tiger"

# Checking for match
result = re.match(pattern, string)
print(result)

None
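Because `re.match()` returns `None` when nothing matches, chaining `.group()` directly (as in cell 3) raises an `AttributeError` on a failed match. Below is a minimal sketch of a guarded version; the helper name `match_at_start` is hypothetical and used only for illustration.

```python
import re

def match_at_start(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match."""
    result = re.match(pattern, text)
    if result is None:
        # No match at the beginning of the string, so there is no match object
        return None
    return result.group()

print(match_at_start("Tiger", "Tiger is the national animal of India."))
print(match_at_start("Tiger", "The national animal of India is Tiger."))
```

The first call prints the matched text and the second prints `None` instead of raising an exception.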


## 2. search()
Locates the first sub-string matching the RegEx pattern anywhere in the string and returns it as a match object (or `None` if there is no match).

In [5]:
string = "The national animal of India is Tiger. Tiger lives in Forest."
pattern = "Tiger"

# Searching a substring using search()
result = re.search(pattern, string)
print(result)

<_sre.SRE_Match object; span=(32, 37), match='Tiger'>


In [6]:
string = "The national animal of India is Tiger. Tiger lives in Forest."
pattern = "Tiger"

# Extracting searched string
result = re.search(pattern, string).group()
print(result)

Tiger
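The match object returned by `search()` also carries the position of the match. As a small sketch on the same sentence, `span()`, `start()` and `end()` can be used to slice the original string; note that only the first occurrence is reported.

```python
import re

string = "The national animal of India is Tiger. Tiger lives in Forest."

result = re.search("Tiger", string)

# span() gives (start, end) of the first occurrence only
first_span = result.span()

# Slicing the original string with start()/end() recovers the matched text
sliced = string[result.start():result.end()]

print(first_span)  # (32, 37)
print(sliced)      # Tiger
```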


## 3. findall()
Finds all non-overlapping sub-strings matching the RegEx pattern and returns them as a list

In [7]:
string = "The national animal of India is Tiger. Tiger lives in Forest."
pattern = "Tiger"

# Using findall() on a string
result = re.findall(pattern, string)
print(result)

['Tiger', 'Tiger']


In [9]:
# Defining the string
text = "India got freedom on 15-08-1947, and it is celebrated as Independence Day.\
        Indian Constitution came into effect on 26-01-1950, and it is celebrated as Republic Day."

# Defining the pattern
date_pattern = r'\d{2}-\d{2}-\d{4}'

# Extracting dates using findall()
re.findall(date_pattern, text)

['15-08-1947', '26-01-1950']
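When the same pattern is applied to several strings, it can be compiled once with `re.compile()` and reused. The sketch below assumes the text has been split into two separate sentences (`text1` and `text2` are illustrative names, not part of the original notebook).

```python
import re

# Compile the date pattern once; the compiled object exposes findall(), search(), etc.
date_re = re.compile(r'\d{2}-\d{2}-\d{4}')

text1 = "India got freedom on 15-08-1947, and it is celebrated as Independence Day."
text2 = "Indian Constitution came into effect on 26-01-1950, and it is celebrated as Republic Day."

# Reuse the compiled pattern on each string
dates = [date_re.findall(t) for t in (text1, text2)]
print(dates)  # [['15-08-1947'], ['26-01-1950']]
```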

## 4. finditer()
Similar to findall(), but returns an iterator of match objects instead of a list of strings

In [10]:
string = "The national animal of India is Tiger. Tiger lives in Forest."
pattern = "Tiger"

# Using finditer() on a string
result = re.finditer(pattern, string)
print(result)

# Iterating over the iterator
for m in result:
    # Printing match object
    print(m)
    # Printing starting and ending index with matched substring
    print('Start:',m.start(),' End:',m.end(),' Sub-string:',m.group())

<callable_iterator object at 0x7f79f1704b00>
<_sre.SRE_Match object; span=(32, 37), match='Tiger'>
Start: 32  End: 37  Sub-string: Tiger
<_sre.SRE_Match object; span=(39, 44), match='Tiger'>
Start: 39  End: 44  Sub-string: Tiger
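One detail worth keeping in mind: the iterator returned by `finditer()` can be consumed only once. The sketch below collects the spans in a list comprehension and then shows that a second pass over the same iterator yields nothing.

```python
import re

string = "The national animal of India is Tiger. Tiger lives in Forest."

# Collect (start, end) pairs for every match in one pass
spans = [(m.start(), m.end()) for m in re.finditer("Tiger", string)]
print(spans)  # [(32, 37), (39, 44)]

# The iterator is exhausted after the first full pass
matches = re.finditer("Tiger", string)
first_pass = [m.group() for m in matches]
second_pass = [m.group() for m in matches]
print(first_pass)   # ['Tiger', 'Tiger']
print(second_pass)  # []
```

Wrapping the iterator in `list()` keeps the match objects available for repeated use.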


## 5. sub()
Searches for every sub-string matching the RegEx pattern and replaces it with another string

In [11]:
text="Analytics Vidhya is largest Analytics community of India."

# Replacing a substring using sub()
result=re.sub('India', 'the World',text)
print(result)

Analytics Vidhya is largest Analytics community of the World.
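By default `sub()` replaces every occurrence. As a brief sketch, the `count` keyword limits the number of replacements, and `re.subn()` additionally reports how many were made; the replacement word "Data" is an arbitrary choice for illustration.

```python
import re

text = "Analytics Vidhya is largest Analytics community of India."

# Replace only the first occurrence
first_only = re.sub("Analytics", "Data", text, count=1)
print(first_only)  # Data Vidhya is largest Analytics community of India.

# subn() returns the new string together with the number of replacements
replaced, n = re.subn("Analytics", "Data", text)
print(replaced)  # Data Vidhya is largest Data community of India.
print(n)         # 2
```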


## 6. split()
Splits the text at every match of the given RegEx pattern and returns the pieces as a list

In [12]:
line = "I have a big test tomorrow; I can't go out tonight."

# Splitting a string into multiple substrings
re.split(r'[;]', line)

['I have a big test tomorrow', " I can't go out tonight."]
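In the output above, the second piece keeps its leading space. One possible way to avoid this, sketched below, is to let the delimiter pattern absorb the surrounding whitespace.

```python
import re

line = "I have a big test tomorrow; I can't go out tonight."

# \s* on both sides of ';' swallows any whitespace around the delimiter
parts = re.split(r'\s*;\s*', line)
print(parts)  # ['I have a big test tomorrow', "I can't go out tonight."]
```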

## 7. Groups
Parentheses in a pattern create groups, so parts of each match can be extracted separately

In [13]:
# Running a simple pattern on some text
string="Ajay credited $500 to your account on 13-08-2020.\
      Anmol debited $1,700 from your account on 14-08-2020.\
      Alex debited $100 on 16-08-2020 from your account."

# Raw string so the backslashes reach the regex engine unchanged
pattern=r"[\w]+ [\w]+ \$[\d,]+ [a-zA-Z ]+ \d{2}-\d{2}-\d{4}"

result=re.findall(pattern,string)

print(result)

['Ajay credited $500 to your account on 13-08-2020', 'Anmol debited $1,700 from your account on 14-08-2020', 'Alex debited $100 on 16-08-2020']


In [14]:
string="Ajay credited $500 to your account on 13-08-2020.\
      Anmol debited $1,700 from your account on 14-08-2020.\
      Alex debited $100 on 16-08-2020 from your account."

# Creating groups in the previous pattern
pattern=r"([\w]+) ([\w]+) (\$[\d,]+) [a-zA-Z ]+ (\d{2}-\d{2}-\d{4})"

result=re.findall(pattern,string)

print(result)

[('Ajay', 'credited', '$500', '13-08-2020'), ('Anmol', 'debited', '$1,700', '14-08-2020'), ('Alex', 'debited', '$100', '16-08-2020')]


In [15]:
import pandas as pd

# Creating a dataframe
df=pd.DataFrame(result,columns=['Name','Type','Amount','Date'])
df

|   | Name | Type | Amount | Date |
|---|------|------|--------|------|
| 0 | Ajay | credited | $500 | 13-08-2020 |
| 1 | Anmol | debited | $1,700 | 14-08-2020 |
| 2 | Alex | debited | $100 | 16-08-2020 |


In [16]:
# Using finditer() for getting match objects
string="Ajay credited $500 to your account on 13-08-2020.\
      Anmol debited $1,700 from your account on 14-08-2020.\
      Alex debited $100 on 16-08-2020 from your account."

pattern=r"([\w]+) ([\w]+) (\$[\d,]+) [a-zA-Z ]+ (\d{2}-\d{2}-\d{4})"

result=re.finditer(pattern,string)

# Accessing groups separately
for i in result:
    print(i.group(0),'=>',i.group(1),'=>',i.group(2),'=>',i.group(3),'=>',i.group(4))

Ajay credited $500 to your account on 13-08-2020 => Ajay => credited => $500 => 13-08-2020
Anmol debited $1,700 from your account on 14-08-2020 => Anmol => debited => $1,700 => 14-08-2020
Alex debited $100 on 16-08-2020 => Alex => debited => $100 => 16-08-2020
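Instead of calling `group(1)`, `group(2)`, and so on one by one, a match object can return all captured groups at once through `groups()`. The sketch below applies the same grouped pattern to a single sentence with `re.search()`; the variable name `sentence` is illustrative.

```python
import re

sentence = "Ajay credited $500 to your account on 13-08-2020."
pattern = r"([\w]+) ([\w]+) (\$[\d,]+) [a-zA-Z ]+ (\d{2}-\d{2}-\d{4})"

m = re.search(pattern, sentence)

# group(0) is the whole match; groups() is a tuple of the captured groups 1..n
whole = m.group(0)
captured = m.groups()

print(whole)     # Ajay credited $500 to your account on 13-08-2020
print(captured)  # ('Ajay', 'credited', '$500', '13-08-2020')
```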


**Note:** Syntax for naming groups: `(?P<name>pattern)`, where `name` must be a valid Python identifier (no spaces)

In [18]:
string="Ajay credited $500 to your account on 13-08-2020.\
      Anmol debited $1,700 from your account on 14-08-2020.\
      Alex debited $100 on 16-08-2020 from your account."

# Naming Groups
pattern=r"(?P<Name>[\w]+) (?P<Type>[\w]+) (?P<Amount>\$[\d,]+) [a-zA-Z ]+ (?P<Date>\d{2}-\d{2}-\d{4})"

result=list(re.finditer(pattern,string))

In [19]:
# Accessing data by group names
for i in result:
    print(i.group('Name'),'=>',i.group('Amount'),'=>',i.group('Date'),'=>',i.group('Type'))

Ajay => $500 => 13-08-2020 => credited
Anmol => $1,700 => 14-08-2020 => debited
Alex => $100 => 16-08-2020 => debited


In [20]:
# Printing data with group names
for i in result:
    print(i.groupdict())

{'Name': 'Ajay', 'Type': 'credited', 'Amount': '$500', 'Date': '13-08-2020'}
{'Name': 'Anmol', 'Type': 'debited', 'Amount': '$1,700', 'Date': '14-08-2020'}
{'Name': 'Alex', 'Type': 'debited', 'Amount': '$100', 'Date': '16-08-2020'}
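The dictionaries from `groupdict()` are a convenient starting point for further cleaning. As a rough sketch, the loop below turns the `Amount` strings into integers by dropping the `$` sign and the thousands separator; the `records` list and the integer conversion are assumptions added for illustration, not part of the original notebook.

```python
import re

string = ("Ajay credited $500 to your account on 13-08-2020. "
          "Anmol debited $1,700 from your account on 14-08-2020. "
          "Alex debited $100 on 16-08-2020 from your account.")

pattern = r"(?P<Name>\w+) (?P<Type>\w+) (?P<Amount>\$[\d,]+) [a-zA-Z ]+ (?P<Date>\d{2}-\d{2}-\d{4})"

records = []
for m in re.finditer(pattern, string):
    row = m.groupdict()
    # Strip the currency sign and commas before converting to int (illustrative cleaning step)
    row["Amount"] = int(row["Amount"].lstrip("$").replace(",", ""))
    records.append(row)

for row in records:
    print(row["Name"], row["Type"], row["Amount"], row["Date"])
```

A list of dictionaries like `records` can also be passed directly to `pd.DataFrame(records)`.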
