Essentially, a regular expression (regex) is a sequence of characters that defines a search pattern. Python can use such a pattern to “find” or “find and replace” text in strings. You have probably seen a similar find-and-replace feature in Microsoft Word.

## RegEx Module

Python has a built-in module called `re` for working with regular expressions.

Once you have imported the `re` module, you can start using regular expressions:

In [1]:
import re

## Functions

The `re` module offers a set of functions that let us search a string for a match:

- `match` : Returns a match object if the pattern matches at the start of the string, otherwise `None`.
- `search` : Returns a match object for the first match anywhere in the string, otherwise `None`.
- `findall` : Returns a list containing all matches.
- `sub` : Replaces one or more matches with a string.
- `split` : Returns a list where the string has been split at each match.
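To see what each of these functions hands back, here is a small sketch on a made-up string (`text` and the other variable names are illustrative only). The patterns are written as raw strings (`r"..."`), a common habit that keeps backslashes intact.

```python
import re

text = "cat bat rat"  # illustrative sample string

# match() only checks the very start of the string
m = re.match(r"bat", text)            # None: the string starts with "cat"
# search() scans the whole string and returns the first match object
s = re.search(r"bat", text)           # s.group() == "bat"
# findall() returns a plain list of every matching substring
words = re.findall(r"[cbr]at", text)  # ['cat', 'bat', 'rat']
# sub() returns a new string; the original string is unchanged
swapped = re.sub(r"at", "og", text)   # 'cog bog rog'
# split() returns a list of the pieces between matches
parts = re.split(r" ", text)          # ['cat', 'bat', 'rat']

print(m, s.group(), words, swapped, parts)
```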

#### Match

In [11]:
pattern = "@"
string = "hello@gmail.com"
print(re.match(pattern, string))

None


In [13]:
string1 = "@hellopython"
print(re.match(pattern, string1))

<re.Match object; span=(0, 1), match='@'>


#### Search

In [15]:
print(re.search(pattern, string))
print(re.search(pattern, string1))

<re.Match object; span=(5, 6), match='@'>
<re.Match object; span=(0, 1), match='@'>


#### Findall

In [16]:
string = "welcome to python class welcome to python class"
pattern = "class"

print(re.findall(pattern,string))

['class', 'class']


#### Sub

In [27]:
string = "welcome to python class welcome to python class "
pattern = "class"

re.sub(pattern, "CLASS", string)

'welcome to python CLASS welcome to python CLASS '
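The list above says `sub` replaces one or more matches; its optional `count` argument controls how many. A minimal sketch reusing the same sample string; `re.subn()` is the standard-library variant that also reports how many replacements were made.

```python
import re

string = "welcome to python class welcome to python class "

# count=1 limits sub() to the first (leftmost) match only
first_only = re.sub("class", "CLASS", string, count=1)

# subn() works like sub() but returns a (new_string, number_of_replacements) tuple
new_string, n = re.subn("class", "CLASS", string)

print(first_only)
print(new_string, n)
```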

#### Split

In [28]:
string = "welcome to python class"

print(re.split(" ", string))

['welcome', 'to', 'python', 'class']
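Splitting on a literal space breaks down when words are separated by more than one space. A sketch, assuming a sample string with irregular spacing, showing how the `\s+` pattern and the `maxsplit` argument of `re.split()` help:

```python
import re

string = "welcome  to   python class"  # illustrative: irregular spacing

# Splitting on a single literal space leaves empty strings behind
naive = re.split(" ", string)
# \s+ treats any run of whitespace as one separator
words = re.split(r"\s+", string)
# maxsplit=1 stops after the first split; the rest stays in one piece
head = re.split(r"\s+", string, maxsplit=1)

print(naive)
print(words)
print(head)
```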


## Meta Characters

Metacharacters are characters with a special meaning:

The basic metacharacters are:

`[ ]`, `^`, `$`, `.`, `*`, `+`, `?`, `{ }`, `|`, `( )` and `\`

- `[a-z]` : Any character in the range a to z (the dash marks a range).
- `[^ ]` : Any character that is *not* in the brackets; this is the inverse of `[ ]`.
- `^` : Matches at the beginning of the string. `^spam` means the string must begin with spam.
- `$` : Matches at the end of the string or line (the opposite end from `^`). `spam$` means the string must end with spam.
- `.` : Any single character except a newline.
- `*` : Zero or more occurrences.
- `+` : One or more occurrences.
- `?` : Zero or one occurrence.
- `{}` : A specific number of occurrences. `{n}` means exactly n occurrences, `{n,}` n or more, and `{,m}` zero to m.
- `|` : Either or.
- `( )` : Capture and group.
- `\` : Starts a special sequence or escapes a metacharacter.
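`.` and `?` appear in the list but are not used in the numbered examples that follow. A short sketch with made-up sample words:

```python
import re

# "." matches any single character except a newline
dots = re.findall(r"c.t", "cat cot cut ct")  # "ct" has nothing between c and t
newline = re.findall(r"a.b", "a\nb")          # "." does not match "\n"

# "?" makes the preceding character optional (zero or one occurrence)
spellings = re.findall(r"colou?r", "color colour colouur")

print(dots, newline, spellings)
```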

#### 1. [ ]

In [30]:
string = "hello1258@hai"

In [31]:
re.findall('[a-z]',string)

['h', 'e', 'l', 'l', 'o', 'h', 'a', 'i']

In [40]:
re.findall('[abcdef]',string)

['e', 'a']

In [65]:
txt = "The rain in Spain"

# Find all lowercase characters alphabetically between "a" and "m":

x = re.findall("[a-m]", txt)
print(x)


['h', 'e', 'a', 'i', 'i', 'a', 'i']
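A single set can also combine several ranges, for example lowercase letters together with digits, or upper- and lowercase letters. A small sketch (the second sample sentence is made up for illustration):

```python
import re

string = "hello1258@hai"

# Two ranges in one set: any lowercase letter OR any digit
alnum = re.findall("[a-z0-9]", string)

# Upper- and lowercase letters together
letters = re.findall("[A-Za-z]", "The Rain in Spain")

print(alnum)
print(letters)
```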


#### 2. [^ ]

In [41]:
string

'hello1258@hai'

In [33]:
re.findall('[^a-z]',string)

['1', '2', '5', '8', '@']

In [42]:
re.findall('[^12345]',string)

['h', 'e', 'l', 'l', 'o', '8', '@', 'h', 'a', 'i']

In [66]:
txt = "The rain in Spain"

# Find all characters that are NOT lowercase letters between "a" and "m":

x = re.findall("[^a-m]", txt)
print(x)

['T', ' ', 'r', 'n', ' ', 'n', ' ', 'S', 'p', 'n']
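The `^` only negates a set when it is the first character after `[`; anywhere else inside the brackets it is a literal caret. A quick sketch on illustrative strings:

```python
import re

string = "hello1258@hai"

# "^" right after "[" negates the set: everything except digits
not_digits = re.findall("[^0-9]", string)

# "^" later in the set is just an ordinary character
caret_or_digit = re.findall("[0-9^]", "2^10")

print(not_digits)
print(caret_or_digit)
```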


#### 3. ^

In [59]:
txt = "hello world"

# Check if the string starts with 'hello':

x = re.findall("^hello", txt)
if x:
    print("Yes, the string starts with 'hello'")
else:
    print("No match")


Yes, the string starts with 'hello'


#### 4. $

In [58]:
txt = "hello world"

# Check if the string ends with 'world':

x = re.findall("world$", txt)
if x:
    print("Yes, the string ends with 'world'")
else:
    print("No match")

Yes, the string ends with 'world'
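The description of `^` and `$` mentions lines as well as strings: by default they anchor to the start and end of the whole string, and the `re.MULTILINE` flag makes them anchor to every line. A sketch with a made-up two-line string:

```python
import re

text = "hello world\nhello python"  # illustrative two-line string

# Default: ^ and $ refer to the whole string, so no single line can satisfy both
default = re.findall(r"^hello \w+$", text)

# re.MULTILINE: ^ and $ also match at the start and end of each line
per_line = re.findall(r"^hello \w+$", text, flags=re.MULTILINE)

print(default)
print(per_line)
```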


#### 5. *

In [60]:
txt = "The rain in Spain falls mainly in the plain!"

# Check if the string contains "ai" followed by 0 or more "x" characters:

x = re.findall("aix*", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['ai', 'ai', 'ai', 'ai']
Yes, there is at least one match!


#### 6. +

In [63]:
txt = "The rain in Spain falls mainly in the plain!"

# Check if the string contains "ai" followed by 1 or more "x" characters:

x = re.findall("aix+", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


[]
No match
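Putting `*` and `+` side by side on one made-up string makes the difference clearer: `*` accepts zero `x` characters, while `+` requires at least one.

```python
import re

text = "ai aix aixx"  # illustrative sample string

star = re.findall("aix*", text)  # "x" may appear zero or more times
plus = re.findall("aix+", text)  # "x" must appear at least once

print(star)
print(plus)
```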


#### 7. |

In [64]:
txt = "The rain in Spain falls mainly in the plain!"

# Check if the string contains either "falls" or "stays":

x = re.findall("falls|stays", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['falls']
Yes, there is at least one match!
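One thing to watch with `|` is that it has the lowest precedence of the metacharacters, so an anchor such as `^` sticks to the alternative next to it rather than to the whole expression. A sketch with an illustrative sentence, using parentheses to group the alternatives:

```python
import re

sentence = "It stays here"  # illustrative sample sentence

# Read as (^falls) OR (stays): "stays" matches even though it is not at the start
loose = re.search(r"^falls|stays", sentence)

# Grouped: the string must START with either "falls" or "stays"
grouped = re.search(r"^(falls|stays)", sentence)

print(loose.group() if loose else None)
print(grouped)
```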


#### 8. { }

In [67]:
txt = "The rain in Spain falls mainly in the plain!"

# Check if the string contains "a" followed by exactly two "l" characters:

x = re.findall("al{2}", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['all']
Yes, there is at least one match!
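The example above only uses `{n}`. The other forms from the list can be compared in a small sketch; the word list and the helper `matching()` are hypothetical, and the `^...$` anchors force each word to match as a whole. The `{n,m}` form (between n and m occurrences) is included as well, since Python supports it alongside the forms listed earlier.

```python
import re

words = ["a", "al", "all", "alll", "allll"]  # illustrative sample words


def matching(pattern):
    """Hypothetical helper: keep the words that the anchored pattern matches."""
    return [w for w in words if re.search(pattern, w)]


exactly_two = matching(r"^al{2}$")     # {n}   exactly n
two_or_more = matching(r"^al{2,}$")    # {n,}  n or more
up_to_two = matching(r"^al{,2}$")      # {,m}  zero to m
two_to_three = matching(r"^al{2,3}$")  # {n,m} between n and m

print(exactly_two, two_or_more, up_to_two, two_to_three)
```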


## Special Sequences

A special sequence is a **`\`** followed by one of the characters in the list below; together they have a special meaning:

- \d : Returns a match for any decimal digit (0-9; for `str` patterns, other Unicode digits as well)
- \w : Returns a match for any word character: letters, digits 0-9, and the underscore _ character (for `str` patterns, Unicode letters count too)
- \s : Returns a match for any whitespace character (space, tab, newline, etc.)
- \D, \W, and \S : Return a match for anything except a digit, a word character, or a whitespace character, respectively.
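Special sequences are easy to write as plain strings such as `"\d"`. That happens to work because Python passes unknown escapes like `\d` through unchanged, but Python 3.12 and later emit a `SyntaxWarning` for them, and some escapes (for example `\b`) are converted into a different character before `re` ever sees them. A sketch of the two safe spellings, on a made-up string:

```python
import re

text = "Room 42"  # illustrative sample string

# In a normal string the backslash itself must be escaped ...
escaped = re.findall("\\d", text)

# ... whereas a raw string (r"...") hands the backslash to re unchanged
raw = re.findall(r"\d", text)

print(escaped, raw)
```

Both spellings produce the same pattern; raw strings are the usual convention because they stay readable.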

#### 1. \d

In [69]:
txt = "The rain in Spain"

# Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


[]
No match
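`\d` matches one digit at a time. To pull out whole numbers, add `+` so that each run of consecutive digits is matched as one piece. A sketch on an illustrative sentence:

```python
import re

text = "The rain in Spain at 25689 and 42"  # illustrative sample sentence

# \d alone returns single digits ...
single = re.findall(r"\d", "at 42")

# ... while \d+ returns each whole run of digits
numbers = re.findall(r"\d+", text)

# findall() returns strings, so convert them if numeric values are needed
as_ints = [int(n) for n in numbers]

print(single, numbers, as_ints)
```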


#### 2. \w

In [68]:
txt = "The rain in Spain"

# Return a match at every word character (letters, digits 0-9, and the underscore _ character):

x = re.findall(r"\w", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!


#### 3. \s

In [70]:
txt = "The rain in Spain"

# Return a match at every whitespace character:

x = re.findall(r"\s", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

[' ', ' ', ' ']
Yes, there is at least one match!


#### 4. \D

In [72]:
txt = "The rain in Spain at 25689"

# Return a match at every non-digit character:

x = re.findall(r"\D", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['T', 'h', 'e', ' ', 'r', 'a', 'i', 'n', ' ', 'i', 'n', ' ', 'S', 'p', 'a', 'i', 'n', ' ', 'a', 't', ' ']
Yes, there is at least one match!
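A common practical use of `\D` is stripping the formatting from a number by replacing every non-digit with an empty string. A sketch, assuming a made-up phone number format:

```python
import re

raw_number = "(415) 555-4242"  # illustrative formatted phone number

# Replace every non-digit character with nothing, keeping only the digits
digits_only = re.sub(r"\D", "", raw_number)

print(digits_only)
```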


#### 5. \W

In [75]:
txt = "The rain in Spain"

# Return a match at every NON-word character (anything other than letters, digits and _, e.g. "!", "?", whitespace):

x = re.findall(r"\W", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


[' ', ' ', ' ']
Yes, there is at least one match!
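The sample sentence above only contains spaces, so it does not show that `\W` also matches punctuation while treating `_` as a word character. A sketch on a made-up sentence, with `\w+` for comparison:

```python
import re

text = "Hi, it's snake_case!"  # illustrative sample sentence

# Every character that is not a letter, digit or underscore
non_word = re.findall(r"\W", text)

# Runs of word characters; the underscore keeps snake_case in one piece
words = re.findall(r"\w+", text)

print(non_word)
print(words)
```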


#### 6. \S

In [76]:
txt = "The rain in Spain"

# Return a match at every NON-whitespace character:

x = re.findall(r"\S", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!


## Matching Regex Objects

In [80]:
mobile_num = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = mobile_num.search('My number is 415-555-4242.')

if mo:
    print("Mobile number matched")
else:
    print("Mobile number not matched")

Mobile number matched


In [89]:
mo.group()

'415-555-4242'
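Adding parentheses to the same phone-number pattern splits the match into numbered groups that can be read back individually. A sketch reusing the sample sentence, with `\d\d\d` rewritten as the equivalent `\d{3}`:

```python
import re

# Each pair of parentheses creates a numbered group
mobile_num = re.compile(r"(\d{3})-(\d{3})-(\d{4})")
mo = mobile_num.search("My number is 415-555-4242.")

whole = mo.group()        # group() (same as group(0)) is the entire match
area_code = mo.group(1)   # text captured by the first group
parts = mo.groups()       # tuple of all captured groups
start, end = mo.span()    # position of the match in the original string

print(whole, area_code, parts, (start, end))
```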

#### Matching Multiple Groups with the Pipe

The **|** character is called a pipe. You can use it anywhere you want to match one of many expressions. For example, the regular expression r'Batman|Tina Fey' will match either 'Batman' or 'Tina Fey'. When both appear in the string, `search()` returns whichever occurs first.

In [94]:
hero = re.compile(r'Batman|Tina Fey')
mo1 = hero.search("Batman and Tina Fey")

if mo1:
    print("Hero found", mo1.group())
else:
    print("Hero Not Found")

Hero found Batman


In [95]:
hero = re.compile(r'Batman|Tina Fey')
mo2 = hero.search("Tina Fey and Batman")

if mo2:
    print("Hero found", mo2.group())
else:
    print("Hero Not Found")

Hero found Tina Fey


You can also combine the pipe with parentheses, `( )`, which capture and group part of a pattern, to match one of several alternatives inside a larger regex:

In [96]:
bat_regex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = bat_regex.search('Batmobile lost a wheel')
mo.group()

'Batmobile'
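With parentheses in the pattern, `group(1)` returns just the part the group captured. Also worth knowing: when a pattern contains a group, `findall()` returns the captured text rather than the whole match. A sketch (the second sample sentence is made up):

```python
import re

bat_regex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = bat_regex.search('Batmobile lost a wheel')

whole = mo.group()    # the entire match
suffix = mo.group(1)  # only the text captured by the parentheses

# With a group in the pattern, findall() returns the captured parts only
suffixes = bat_regex.findall('Batman drove the Batmobile')

print(whole, suffix, suffixes)
```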

#### Optional Matching with the Question Mark

The **?** character flags the group that precedes it as an optional part of the pattern.

In [98]:
bat_reg = re.compile(r'Bat(wo)?man')
mo = bat_reg.search('The Adventures of Batman')
mo.group()

'Batman'
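To check that the group really is optional, the same pattern can be tried against strings with and without `wo`; the sample titles are illustrative. A repeated `wowo` does not match, because `?` allows the group at most once.

```python
import re

bat_reg = re.compile(r'Bat(wo)?man')

results = []
for title in ['The Adventures of Batman', 'The Adventures of Batwoman', 'Batwowoman']:
    mo = bat_reg.search(title)
    # search() returns None when the pattern cannot match anywhere
    results.append(mo.group() if mo else None)

print(results)
```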