# **RegEx** | PYTHON REGULAR EXPRESSIONS

A regular expression, regex or regexp (sometimes called a rational expression) is a sequence of characters that defines a search
pattern. Such patterns are typically used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input
validation. The technique originated in theoretical computer science and formal language theory.

To learn more about regular expressions in Python, see the official Regular Expression HOWTO: https://docs.python.org/3/howto/regex.html

### 1. REGULAR EXPRESSIONS IN PYTHON

In [1]:
import re

# Let us start with an example
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
print(x.string == txt)

True
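The pattern above uses ^ to anchor the match at the start of the string and $ to anchor it at the end, with .* matching anything in between. Because re.search() returns None when nothing matches, it is worth checking the result before using it. A minimal sketch (the sample strings are made up for illustration):

```python
import re

# ^ anchors the start of the string, $ anchors the end, .* matches anything between
pattern = r"^The.*Spain$"

samples = ["The rain in Spain", "The rain in Spain falls", "Rain in Spain"]

# re.search returns a Match object on success and None otherwise,
# so "is not None" turns the result into a simple True/False
results = {text: re.search(pattern, text) is not None for text in samples}
print(results)
# {'The rain in Spain': True, 'The rain in Spain falls': False, 'Rain in Spain': False}
```

The second string fails because $ requires "Spain" to be the very end, and the third fails because ^The is case-sensitive.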


### 2. COMPILING REGULAR EXPRESSIONS

Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern
matches or performing string substitutions.

In [2]:
compiler = re.compile('[a-z]+')
compiler

re.compile(r'[a-z]+', re.UNICODE)

In [3]:
compiler = re.compile('[a-z]+', re.IGNORECASE)
compiler

re.compile(r'[a-z]+', re.IGNORECASE|re.UNICODE)

In [4]:
compiler.match('Schoolofai')

<_sre.SRE_Match object; span=(0, 10), match='Schoolofai'>
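Compiling is most useful when the same pattern is applied to many strings. The module-level functions such as re.search() also cache recently used patterns, so for a handful of patterns the speed difference is usually small. A short sketch of reusing one compiled Pattern object (the word variable and the sample texts are illustrative):

```python
import re

# Compile once, then reuse the same Pattern object for many strings
word = re.compile(r"[a-z]+", re.IGNORECASE)

texts = ["School of AI", "regex 101", "!!!"]
found = [word.findall(t) for t in texts]
print(found)  # [['School', 'of', 'AI'], ['regex'], []]

# The Pattern object also records how it was built
print(word.pattern)                      # [a-z]+
print(bool(word.flags & re.IGNORECASE))  # True
```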

### 3. THE MATCH FUNCTION

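Once you have an object representing a compiled regular expression, what do you do with it? Pattern objects have several methods
and attributes; only the most significant ones are covered here, so consult the re documentation for a complete listing. The re.match()
function tries to match the pattern only at the beginning of the string and returns a Match object on success, or None otherwise.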

In [5]:
line = "Lets teach the machines to learn"

# (.*) is greedy; (.*?) is non-greedy (lazy) and stops at the first possible point
matchObj = re.match(r'(.*) the (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

matchObj.group() :  Lets teach the machines to learn
matchObj.group(1) :  Lets teach
matchObj.group(2) :  machines
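In the pattern above, (.*) is greedy and consumes as much text as it can, while (.*?) is non-greedy and stops as early as possible. Making group 2 greedy shows the difference; the sketch below also uses groups() and span(), which are standard Match methods. The re.M and re.I flags are dropped here because they do not affect this particular match.

```python
import re

line = "Lets teach the machines to learn"

lazy = re.match(r"(.*) the (.*?) .*", line)   # same pattern as the example above
greedy = re.match(r"(.*) the (.*) .*", line)  # group 2 made greedy

print(lazy.group(2))    # machines
print(greedy.group(2))  # machines to

# groups() returns every captured group as a tuple
print(lazy.groups())    # ('Lets teach', 'machines')
# span(n) gives the (start, end) indices of group n in the original string
print(lazy.span(2))     # (15, 23)
```

The greedy group keeps expanding until only the final " .*" is left to match " learn", which is why it captures "machines to".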


### 4. THE SEARCH FUNCTION

The search() function scans the string for a match and returns a Match object for the first location where the pattern matches.
If there is more than one match, only the first occurrence is returned; if there is no match at all, it returns None.

In [6]:
string = "The School of AI"
x = re.search(r"\s", string)  # raw string avoids an invalid-escape warning for \s

print("The first white-space character is located in position:", x.start())

The first white-space character is located in position: 3
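Since re.search() returns None when the pattern does not occur, calling x.start() on a failed search raises an AttributeError. A sketch that guards against this (first_whitespace is a hypothetical helper, not part of the re module) and shows the related end() and span() methods:

```python
import re

def first_whitespace(text):
    """Return the index of the first whitespace character, or -1 if there is none.

    first_whitespace is a hypothetical helper used only for illustration.
    """
    m = re.search(r"\s", text)
    # None has no .start(), so check the result before using it
    return m.start() if m is not None else -1

print(first_whitespace("The School of AI"))  # 3
print(first_whitespace("SchoolofAI"))        # -1

# start(), end() and span() describe where the whole match lies
m = re.search(r"School", "The School of AI")
print(m.start(), m.end(), m.span())  # 4 10 (4, 10)
```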


### 5. MATCH vs SEARCH

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the
string, while search checks for a match anywhere in the string (this is what Perl does by default).

In [7]:
line = "Cats are smarter than dogs"
matchObj = re.match(r'dogs', line, re.M | re.I)

if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("match --> No match!!")

searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("search --> Nothing found!!")

match --> No match!!
search --> searchObj.group() :  dogs
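The re.M (MULTILINE) flag used above does not change where match() starts: it still only tries the beginning of the string, while a ^ inside a search() pattern matches at the start of every line when MULTILINE is set. For validating a whole input, re.fullmatch() requires the entire string to match. A small sketch with illustrative strings:

```python
import re

text = "A\nB\nX"

# match only tries index 0, even with MULTILINE
print(re.match(r"X", text, re.MULTILINE))           # None
# search with ^ and MULTILINE can match at the start of any line
print(re.search(r"^X", text, re.MULTILINE).span())  # (4, 5)

# fullmatch succeeds only if the entire string matches the pattern
print(re.fullmatch(r"\d{3}", "101") is not None)   # True
print(re.fullmatch(r"\d{3}", "1011") is not None)  # False
```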


### 6. THE SPLIT() FUNCTION

The split() function returns a list of the pieces of the string that lie between the matches.

In [8]:
string = "Let's make the machine learn"
words = re.split(r"\s", string)  # split at every whitespace character
print(words)

["Let's", 'make', 'the', 'machine', 'learn']
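Splitting on a single \s produces empty strings wherever whitespace characters are adjacent. The sketch below (the double-spaced sample string is made up) contrasts \s with \s+ and shows the maxsplit argument and the effect of a capturing group, which keeps the separators in the result:

```python
import re

text = "Let's  make the machine learn"  # note the double space after "Let's"

# A single \s splits at every whitespace character, so a run of spaces leaves an empty string
print(re.split(r"\s", text))               # ["Let's", '', 'make', 'the', 'machine', 'learn']
# \s+ treats a run of whitespace as one separator
print(re.split(r"\s+", text))              # ["Let's", 'make', 'the', 'machine', 'learn']
# maxsplit limits the number of splits; the remainder stays in the last item
print(re.split(r"\s+", text, maxsplit=2))  # ["Let's", 'make', 'the machine learn']
# A capturing group keeps the separators in the result
print(re.split(r"(,)", "a,b,c"))           # ['a', ',', 'b', ',', 'c']
```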


### 7. THE SUB() FUNCTION

The sub() function replaces each match with the text of your choice; by default, every match in the string is replaced.

In [9]:
string = "Let's make the machine learn"
substitute = re.sub(r"\s", "_", string)  # replace every whitespace character with "_"
print(substitute)

Let's_make_the_machine_learn
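Besides a fixed replacement string, sub() accepts a count limit, backreferences such as \1 in the replacement, or a function that receives each Match object; the related re.subn() also reports how many replacements were made. A sketch with illustrative inputs:

```python
import re

text = "Let's make the machine learn"

# count limits how many matches are replaced (the default 0 means all)
print(re.sub(r"\s", "_", text, count=2))  # Let's_make_the machine learn

# Backreferences like \2 and \1 reuse captured groups in the replacement
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "machine learn"))  # learn machine

# The replacement can be a function that receives each Match object
print(re.sub(r"\b\w", lambda m: m.group(0).upper(), "the school of ai"))  # The School Of Ai

# subn returns the new string together with the number of replacements
print(re.subn(r"\s", "_", text))  # ("Let's_make_the_machine_learn", 4)
```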


### 8. THE FINDALL() FUNCTION

The findall() function returns a list containing all non-overlapping matches.

In [10]:
string = "Let's make the machine learn, the school of ai"
all_occurrences = re.findall("the", string)
print(all_occurrences)

['the', 'the']
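Note that findall() returns plain strings, so match positions are lost, and a bare "the" also matches inside longer words. The sketch below (sample strings are illustrative) uses re.finditer() to recover positions, \b word boundaries to match whole words only, and shows that findall() returns tuples of groups when the pattern contains more than one capturing group:

```python
import re

text = "Let's make the machine learn, the school of ai"

# finditer yields Match objects, so each hit's position is available
print([m.start() for m in re.finditer(r"the", text)])  # [11, 30]

# Without word boundaries, "the" also matches inside other words
print(re.findall(r"the", "other theory"))      # ['the', 'the']
print(re.findall(r"\bthe\b", "other theory"))  # []

# With several capturing groups, findall returns a tuple of groups per match
print(re.findall(r"(\w+)@(\w+)\.com", "ann@ai.com, bob@ml.com"))  # [('ann', 'ai'), ('bob', 'ml')]
```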
