# Regular Expressions

In Python, a regular expression is a sequence of characters, made up of ordinary characters and special metacharacters, that defines a search pattern. These patterns are used to "find" or "find and replace" text within strings.

* The term "regular expressions" is frequently shortened to "RegEx".


##### Regular expressions are a powerful tool for pattern matching and text manipulation. Python provides a built-in module called `re` for working with regular expressions.

In [1]:
# Import the built-in regular expression module, re
import re

Regular expressions are written as strings and combine ordinary characters with special symbols. For example, to match the word "cat" in a text, you can use the pattern `r"cat"`. The `r` prefix makes it a raw string, so any backslashes in a pattern reach the `re` module unchanged.
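A minimal sketch of the "cat" example. The sample sentence is made up for illustration, and the second pattern uses `\b` (a word boundary) to show how one special symbol changes what a plain-character pattern matches.

```python
import re

text = "The cat chased a bobcat"  # made-up sample sentence

# A plain pattern matches "cat" anywhere, including inside "bobcat"
all_matches = re.findall(r"cat", text)
print(all_matches)   # ['cat', 'cat']

# \b marks a word boundary, so only the standalone word "cat" matches
whole_word = re.findall(r"\bcat\b", text)
print(whole_word)    # ['cat']
```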

## The findall() Function

In [2]:

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x) #The list contains the matches in the order they are found.


['ai', 'ai']


In [3]:
x = re.findall("Portugal", txt)
print(x) #If no matches are found, an empty list is returned:


[]
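What `findall()` returns depends on whether the pattern contains capturing groups. The sketch below reuses the same `txt`; the patterns are illustrative additions, not part of the original notebook. It also shows `re.finditer()`, which yields Match objects instead of strings, so the position of each match is available.

```python
import re

txt = "The rain in Spain"

# No groups: findall() returns the full text of each match
full = re.findall(r"\w*ai\w*", txt)
print(full)        # ['rain', 'Spain']

# One group: findall() returns only the captured part of each match
one_group = re.findall(r"(\w+)ai", txt)
print(one_group)   # ['r', 'Sp']

# Two or more groups: findall() returns a tuple of groups per match
two_groups = re.findall(r"(\w+) in (\w+)", txt)
print(two_groups)  # [('rain', 'Spain')]

# finditer() yields Match objects, which also carry positions
positions = [m.start() for m in re.finditer(r"ai", txt)]
print(positions)   # [5, 14]
```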


## The search() Function

In [4]:
# Search for the first white-space character in the string:
txt1 = "The rain in Spain and no rain in Rome"
x = re.search(r"\s", txt1)

# x is a Match object; x.start() gives the index where the match begins
print("The first white-space character is located in position:", x.start())


The first white-space character is located in position: 3


In [5]:
# The first space comes right after "Therain", at index 7
txt = "Therain in Spain"
ax = re.search(r"\s", txt)
print(ax)

<re.Match object; span=(7, 8), match=' '>
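`search()` returns a Match object for the first match, or `None` if the pattern does not occur. A short sketch of the Match methods that report what was found, and of checking for `None` before using the result (the "Spain" and "Portugal" patterns are illustrative choices):

```python
import re

txt1 = "The rain in Spain and no rain in Rome"

match = re.search(r"Spain", txt1)
if match:  # search() returns None when nothing matches, so check first
    print(match.group())  # prints Spain, the matched text
    print(match.start())  # prints 12, the index where the match begins
    print(match.end())    # prints 17, the index just past the match
    print(match.span())   # prints (12, 17), start and end together

missing = re.search(r"Portugal", txt1)
print(missing)  # None; calling missing.start() would raise AttributeError
```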


## The split() Function

In [6]:
# Split at each occurrence of the letter "i":

txt = "The rain in Spain"
x = re.split("i", txt)
print(x)


['The ra', 'n ', 'n Spa', 'n']
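To split at white space instead, the pattern matters. This sketch uses a made-up string with uneven spacing to compare `\s` (one white-space character) with `\s+` (a run of them), and shows that a capturing group in the pattern keeps the separators in the result.

```python
import re

messy = "The  rain\tin   Spain"  # made-up text with uneven spacing

# \s splits at every single white-space character, leaving empty strings
single_ws = re.split(r"\s", messy)
print(single_ws)  # ['The', '', 'rain', 'in', '', '', 'Spain']

# \s+ treats a run of white space as one separator
ws_runs = re.split(r"\s+", messy)
print(ws_runs)    # ['The', 'rain', 'in', 'Spain']

# A capturing group keeps the separators in the returned list
with_seps = re.split(r"(\s)", "The rain in Spain")
print(with_seps)  # ['The', ' ', 'rain', ' ', 'in', ' ', 'Spain']
```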


In [7]:
x = "The rain in Spain and no rain in Rome"

In [8]:
# Control the number of splits with the maxsplit parameter:
x = re.split(r"\s", x, maxsplit=3)
print(x)


['The', 'rain', 'in', 'Spain and no rain in Rome']


## The sub() Function

In [9]:

# Replace the word "Spain" with "India":

txt = "The rain in Spain"
x = re.sub("Spain", "India", txt)
print(x)


The rain in India
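The replacement string can also refer back to parts of the match. In this sketch (the pattern is an illustrative choice), `\1` and `\2` insert the text captured by the first and second groups, which swaps the two words around "in".

```python
import re

txt = "The rain in Spain"

# \1 and \2 in the replacement refer to the first and second captured groups
swapped = re.sub(r"(\w+) in (\w+)", r"\2 in \1", txt)
print(swapped)  # The Spain in rain
```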


In [10]:

# Replace every white-space character with the string "Dani":
x = re.sub(r"\s", "Dani", txt)
print(x)


TheDanirainDaniinDaniSpain


In [11]:
# Replace only the first 2 occurrences (count=2):
x = re.sub(r"\s", "DJ", txt, count=2)
print(x)


TheDJrainDJin Spain
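Two related options, sketched with the same `txt`: `re.subn()` behaves like `sub()` but also reports how many replacements were made, and the replacement argument may be a function that receives each Match object and returns the text to insert. The "-" separator and the upper-casing function are illustrative choices.

```python
import re

txt = "The rain in Spain"

# subn() returns a (new_string, number_of_replacements) tuple
new_text, count = re.subn(r"\s", "-", txt)
print(new_text, count)  # The-rain-in-Spain 3

# The replacement can be a function that receives each Match object
shouted = re.sub(r"\w+", lambda m: m.group().upper(), txt)
print(shouted)  # THE RAIN IN SPAIN
```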


## The match() Function

In [13]:
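# match() only looks at the start of the string, so 'ani' is not found;
# search() scans the whole string and finds 'ani' at index 1
print(re.match('ani', 'Daniel'))
print(re.search('ani', 'Daniel'))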

None
<re.Match object; span=(1, 4), match='ani'>
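A related function, `re.fullmatch()`, is stricter than `match()`: the pattern must cover the entire string, not just its beginning. A short comparison using the same name (the patterns are illustrative):

```python
import re

# match() only needs the pattern to fit at the start of the string
prefix = re.match(r"Dan", "Daniel")
print(prefix)   # <re.Match object; span=(0, 3), match='Dan'>

# fullmatch() requires the pattern to cover the entire string
partial = re.fullmatch(r"Dan", "Daniel")
print(partial)  # None

whole = re.fullmatch(r"Dan\w*", "Daniel")
print(whole)    # <re.Match object; span=(0, 6), match='Daniel'>
```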


In [14]:
# The pattern (D\w+) matches a capital 'D' followed by one or more word characters.
# Because re.match() only checks the start of each string, items that do not begin
# with 'D' produce no match. A for loop applies the check to each element of the list.


names = ["Dani", "DJ Daniel Jadi", "Daniel Don", "ai", "art"]

for name in names:
    z = re.match(r"(D\w+)", name)
    if z:
        print(z.groups())



('Dani',)
('DJ',)
('Daniel',)


In [15]:
# re.match() checks for a match only at the beginning of the string,
# while re.search() checks for a match anywhere in the string.


line = "Cats are smarter than dogs"

matching = re.match('Ca',line)
print(matching)



<re.Match object; span=(0, 2), match='Ca'>


In [16]:
searching = re.search('are', line)
print(searching)


<re.Match object; span=(5, 8), match='are'>
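The same comparison can be sketched with a compiled pattern, which is convenient when one pattern is reused many times. The last two lines show that `search()` with a `^` anchor behaves like `match()` on a single-line string such as `line`.

```python
import re

line = "Cats are smarter than dogs"

# Compiling once gives a pattern object with its own match() and search()
pattern = re.compile(r"are")
at_start = pattern.match(line)
print(at_start)      # None, because "are" is not at position 0
found = pattern.search(line)
print(found.span())  # (5, 8)

# ^ anchors the pattern to the start of the string
anchored = re.search(r"^Ca", line)
print(anchored.span())  # (0, 2)
anchored_miss = re.search(r"^are", line)
print(anchored_miss)    # None
```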
