# <span style="color:brown">Regular Expressions</span>

##### A `regular expression` is a sequence of characters that defines a search `pattern`, mainly for use in `pattern matching` with strings (string matching), e.g. "find" or "find and replace" operations.

### Let's list a few expressions

- `.` --> any character except a newline
- `\d` --> a digit
- `\D` --> not a digit

- `\w` --> a word character (A-Z, a-z, 0-9, _)
- `\W` --> not a word character

- `\s` --> a whitespace character (space, tab, newline)
- `\S` --> not a whitespace character

- `\b` --> a word boundary
- `\B` --> not a word boundary

- `^` --> beginning of the string
- `$` --> end of the string
- `(x|y|z)` --> exactly one of x, y or z

- `[a-z0-9]` --> a set of characters can include a range
- `[aeiou]` --> matches a single character in the listed set
- `[^XYZ]` --> matches a single character not in the listed set
- `(` --> indicates where string extraction is to start
- `)` --> indicates where string extraction is to end

- `[...]` --> matches any one of the characters in the brackets
- `[^...]` --> matches any one character not in the brackets
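A minimal sketch that exercises several of the tokens listed above. The sample string and variable names are made up for illustration; only functions from Python's built-in `re` module are used.

```python
import re

# Illustrative sample string (hypothetical, chosen to exercise each token)
sample = "Record 42: call_me at 9am\tor later"

# \d matches a single digit; \d+ matches a run of one or more digits
digits = re.findall(r"\d", sample)
numbers = re.findall(r"\d+", sample)

# \w+ matches runs of word characters (letters, digits and underscore)
words = re.findall(r"\w+", sample)

# \s matches any whitespace character, including the tab
spaces = re.findall(r"\s", sample)

# Without \b, "or" is also found inside "Record"; with \b only the whole word matches
any_or = re.findall(r"or", sample)
whole_or = re.findall(r"\bor\b", sample)

# ^ and $ anchor the pattern to the start and end of the string
starts_ok = re.search(r"^Record", sample) is not None
ends_ok = re.search(r"later$", sample) is not None

# [aeiou] matches one listed character; [^aeiou] matches one character not listed
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

print(digits, numbers)     # ['4', '2', '9'] ['42', '9']
print(words)               # ['Record', '42', 'call_me', 'at', '9am', 'or', 'later']
print(any_or, whole_or)    # ['or', 'or'] ['or']
print(vowels, non_vowels)  # ['e', 'e'] ['r', 'g', 'x']
```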

`(x)` in general is a remembered (capturing) group. We can get the value of what matched by using the `groups()` method of the match object returned by `re.search`. <br>
`x?` matches an optional x character (in other words, it matches x zero or one times). <br>
`x*` matches x zero or more times. <br>
`x+` matches x one or more times. <br>
`x{m,n}` matches x at least m times, but not more than n times. <br>
`(?:x)` matches x but does not capture it. Non-capturing group. <br>
`(?=x)` matches only if x follows, without consuming or capturing x. Positive lookahead. <br>
`a(?=b)` will match the "a" in "ab", but not the "a" in "ac". <br>
In other words, `a(?=b)` matches an "a" that is followed by "b", without consuming the "b".<br>
`(?!x)` matches only if x does not follow. Negative lookahead.<br>
`a(?!b)` will match the "a" in "ac", but not the "a" in "ab".<br>
`(?<=x)` matches only if x precedes the current position. Positive lookbehind: `(?<=a)b` matches the "b" in "ab", but not the "b" in "cb". <br>
`(?<!x)` matches only if x does not precede the current position. Negative lookbehind: `(?<!a)b` matches the "b" in "cb", but not the "b" in "ab". In Python's `re`, x in a lookbehind must have a fixed width. <br>
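A short sketch of the groups, quantifiers and lookarounds described above, using made-up strings. It relies only on `re.search`, `re.findall`, `re.finditer`, `re.fullmatch` and the match-object methods `groups()`, `group()` and `start()`.

```python
import re

# (x): capturing groups; groups() returns what each group matched
m = re.search(r"(\d{2}):(\d{2}):(\d{2})", "logged at 12:45:07")
time_parts = m.groups()

# (?:x): group "ab" for repetition with +, without capturing it
m2 = re.search(r"(?:ab)+(c)", "xxababc")
captured = m2.groups()   # only the (c) group is reported
whole = m2.group(0)      # the full match

# x?, x*, x+ and x{m,n}
optional = re.findall(r"colou?r", "color colour")
star_ok = re.fullmatch(r"ab*", "a") is not None   # b* may match zero b's
plus_ok = re.fullmatch(r"ab+", "a") is not None   # b+ needs at least one b
bounded = re.findall(r"\d{2,3}", "1 12 123 1234")

# Lookahead: test what follows without consuming it
ahead = [mm.start() for mm in re.finditer(r"a(?=b)", "ab ac")]
not_ahead = [mm.start() for mm in re.finditer(r"a(?!b)", "ab ac")]

# Lookbehind: test what precedes without consuming it
behind = re.findall(r"(?<=\$)\d+", "cost $550 or 600")
# \b stops a match from starting in the middle of "550"
not_behind = re.findall(r"(?<!\$)\b\d+", "cost $550 or 600")

print(time_parts, captured, whole)  # ('12', '45', '07') ('c',) ababc
print(optional, star_ok, plus_ok)   # ['color', 'colour'] True False
print(bounded)                      # ['12', '123', '123']
print(ahead, not_ahead)             # [0] [3]
print(behind, not_behind)           # ['550'] ['600']
```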

`*`  Repeats the preceding item zero or more times (greedy) <br>
`\S`  Matches any non-whitespace character <br>
`*?`  Repeats the preceding item zero or more times (non-greedy) <br>
`+`  Repeats the preceding item one or more times (greedy) <br>
`+?`  Repeats the preceding item one or more times (non-greedy) <br>
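To see the difference between the greedy and non-greedy forms above, here is a minimal sketch on a made-up HTML-like string: the greedy version grabs as much as it can, the non-greedy version as little as it can.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy .* runs to the LAST '>' it can reach, so everything becomes one match
greedy = re.findall(r"<.*>", html)

# Non-greedy .*? stops at the FIRST '>' that lets the match succeed
lazy = re.findall(r"<.*?>", html)

# The same contrast with + and +?
greedy_plus = re.search(r"\d+", "12345").group()
lazy_plus = re.search(r"\d+?", "12345").group()

print(greedy)                  # ['<b>bold</b> and <i>italic</i>']
print(lazy)                    # ['<b>', '</b>', '<i>', '</i>']
print(greedy_plus, lazy_plus)  # 12345 1
```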

In [3]:
import re # Import regular expression library

#### Example-1

In [10]:
Test_string = 'The hackerrank team is on a mission to flatten the world by restructuring the DNA of every company on the planet. We rank programmers based on their coding skills, helping companies source great programmers and reduce the time to hire. As a result, we are revolutionizing the way companies discover and evaluate talented engineers. The hackerrank platform is the destination for the best engineers to hone their skills and companies to find top engineers.'
print(Test_string)

The hackerrank team is on a mission to flatten the world by restructuring the DNA of every company on the planet. We rank programmers based on their coding skills, helping companies source great programmers and reduce the time to hire. As a result, we are revolutionizing the way companies discover and evaluate talented engineers. The hackerrank platform is the destination for the best engineers to hone their skills and companies to find top engineers.


In [23]:
Regex_Pattern = r'hackerrank'
print(Regex_Pattern)

hackerrank


In [9]:
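import re
Test_String = input()  # paste the test string when prompted (the run below used the text from Test_string)
match = re.findall(Regex_Pattern, Test_String)  # list of all non-overlapping matches
print("Number of matches :", len(match))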

'The hackerrank team is on a mission to flatten the world by restructuring the DNA of every company on the planet. We rank programmers based on their coding skills, helping companies source great programmers and reduce the time to hire. As a result, we are revolutionizing the way companies discover and evaluate talented engineers. The hackerrank platform is the destination for the best engineers to hone their skills and companies to find top engineers.'
Number of matches : 2
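`re.findall` above only returns the matched text. As a sketch, `re.finditer` also exposes where each match sits, and matching is case-sensitive unless `re.IGNORECASE` is passed. The shortened text below is an illustrative stand-in for `Test_string`.

```python
import re

# Shortened, illustrative version of Test_string
text = "The hackerrank team is on a mission. The hackerrank platform is the destination."

# finditer yields match objects, so the position of each hit is available
positions = [m.span() for m in re.finditer(r"hackerrank", text)]

# Matching is case-sensitive unless re.IGNORECASE is passed
mixed = "HackerRank hackerrank HACKERRANK"
exact = re.findall(r"hackerrank", mixed)
any_case = re.findall(r"hackerrank", mixed, flags=re.IGNORECASE)

print(positions)                  # [(4, 14), (41, 51)]
print(len(exact), len(any_case))  # 1 3
```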


- `\s` in a regex matches a single whitespace character

In [11]:
Regex_Pattern_2 = r'coding\sskills' # \s indicates white space
re.findall(Regex_Pattern_2,Test_String)

['coding skills']
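`\s` is not limited to the space character, and it matches exactly one character. A small sketch with made-up variants of the phrase shows both points, and how `\s+` relaxes the second.

```python
import re

samples = ["coding skills", "coding\tskills", "coding\nskills", "codingskills", "coding  skills"]

# \s matches exactly one whitespace character of any kind
single = [re.search(r"coding\sskills", s) is not None for s in samples]

# \s+ allows one or more whitespace characters in a row
one_or_more = [re.search(r"coding\s+skills", s) is not None for s in samples]

print(single)       # [True, True, True, False, False]
print(one_or_more)  # [True, True, True, False, True]
```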

#### Example-2

In [None]:
regex_pattern = r"^...\....\....\....$" # Do not delete 'r'.

<mark style="background:yellow">The above pattern matches a string of the form "abc.def.xyz.ghj", where each unescaped `.` can be any character except a newline and each `\.` is a literal dot.</mark>
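A quick check of the Example-2 pattern against a few made-up candidates. The last two lines show a caveat: `$` also matches just before a trailing newline, so `re.fullmatch` is the stricter choice when validating whole strings.

```python
import re

regex_pattern = r"^...\....\....\....$"

candidates = [
    "abc.def.xyz.ghj",  # four blocks of three characters: match
    "123.456.abc.def",  # digits are fine, '.' accepts anything but a newline
    "abc.def.xyz",      # only three blocks: no match
    "abcd.ef.xyz.ghj",  # wrong block lengths: no match
]
results = [re.match(regex_pattern, s) is not None for s in candidates]
print(results)  # [True, True, False, False]

# '$' also matches just before a trailing newline; fullmatch rejects it
with_newline_match = re.match(regex_pattern, "abc.def.xyz.ghj\n") is not None
with_newline_full = re.fullmatch(regex_pattern, "abc.def.xyz.ghj\n") is not None
print(with_newline_match, with_newline_full)  # True False
```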

#### Example-3

In [26]:
# Search for a number in the given text
text = "This is session on 23 of this 68685 month" 
re.findall(r'\d',text) # a single digit

['2', '3', '6', '8', '6', '8', '5']

In [27]:
re.findall(r'\d+',text) # \d+ matches one or more consecutive digits, giving the full number

['23', '68685']

In [13]:
Regex_Pattern_3 = r"\d\d\D\d\d\D\d\d\d\d" # string of form "aaAaaAaaaa" where a->digit; A->non-digit

In [20]:
print(str(bool(re.search(Regex_Pattern_3, input()))).lower())

01-01-2022
true


In [22]:
print(str(re.findall(Regex_Pattern_3, input())))

Abdul Kalam is born on 15-10-1931
['15-10-1931']
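Because `\D` accepts any non-digit, the date pattern `Regex_Pattern_3` is not tied to the `-` separator. The sketch below checks a few made-up dates, then adds named groups (an illustrative extension, not part of the original pattern) so the day, month and year can be read directly.

```python
import re

Regex_Pattern_3 = r"\d\d\D\d\d\D\d\d\d\d"

# \D accepts any non-digit separator, not only '-'
checks = {s: re.search(Regex_Pattern_3, s) is not None
          for s in ["01-01-2022", "01/01/2022", "01.01.2022", "1-1-2022"]}
print(checks)
# {'01-01-2022': True, '01/01/2022': True, '01.01.2022': True, '1-1-2022': False}

# Named groups make the parts of the date directly accessible
date_pattern = r"(?P<day>\d\d)\D(?P<month>\d\d)\D(?P<year>\d{4})"
m = re.search(date_pattern, "Abdul Kalam is born on 15-10-1931")
parts = m.groupdict()
print(parts)  # {'day': '15', 'month': '10', 'year': '1931'}
```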


#### Example-4

In [33]:
## Searching for tokens containing '@' in the text
text = 'This is abstract from book %$#$%@#$#$"shifting org culture", sai@gmail.com sai@skillovilla.com 127.87.23.90 and 83.34.23.23 Author: Ashok  year 1987'
re.search(r'\S+@\S+',text) # \S matches any non-whitespace character; re.search returns only the first match

<re.Match object; span=(27, 46), match='%$#$%@#$#$"shifting'>

In [34]:
re.findall(r' \S+@\S+', text)

[' %$#$%@#$#$"shifting', ' sai@gmail.com', ' sai@skillovilla.com']

In [35]:
## searching for strings of numbers separated by a special character (in this case, IP addresses)
# \b matches a word boundary, (?:[0-9]{1,3}\.) is a non-capturing group repeated {3} times, [0-9]{1,3} is the last block without a '.'
s=  re.findall(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b',text)
s

['127.87.23.90', '83.34.23.23']
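The IP pattern only checks the shape (four blocks of 1 to 3 digits), so out-of-range lookalikes also match. One possible second step, sketched here with Python's standard `ipaddress` module, is to validate each candidate after extracting it; the sample text and the helper `is_valid_ipv4` are illustrative.

```python
import re
import ipaddress

ip_pattern = r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b"

# Hypothetical text with one real address and one out-of-range lookalike
text = "valid 127.87.23.90 but also 999.300.1.1"
candidates = re.findall(ip_pattern, text)

def is_valid_ipv4(s):
    """Hypothetical helper: True if s is a valid dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(s)
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False

valid = [c for c in candidates if is_valid_ipv4(c)]
print(candidates)  # ['127.87.23.90', '999.300.1.1']
print(valid)       # ['127.87.23.90']
```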

In [36]:
## replacing every IP address with a new one
new_ip = '127.0.0.1'
replaced = re.sub(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b', new_ip, text)
replaced

'This is abstract from book %$#$%@#$#$"shifting org culture", sai@gmail.com sai@skillovilla.com 127.0.0.1 and 127.0.0.1 Author: Ashok  year 1987'

In [13]:
# re.sub(pattern, repl, string) performs regular-expression-based string substitution; 'y$' matches a "y" only at the end of the string
re.sub('y$', 'ies', 'emergency')

'emergencies'
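`re.sub` can do more than plain replacement. The sketch below, on made-up strings, limits the number of replacements with `count`, reuses a captured group with `\1`, computes the replacement with a function, and uses `\b` so that every word-final "y" is replaced rather than only the one at the end of the string.

```python
import re

ip_pattern = r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b"
text = "servers 127.87.23.90 and 83.34.23.23"

# count=1 limits the number of replacements
first_only = re.sub(ip_pattern, "127.0.0.1", text, count=1)

# \1 in the replacement refers back to capturing group 1 (the first octet)
masked = re.sub(r"\b(\d{1,3})(?:\.\d{1,3}){3}\b", r"\1.x.x.x", text)

# A function as the replacement receives each match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")

# y\b matches a 'y' at the end of any word, not only at the end of the string
plural = re.sub(r"y\b", "ies", "emergency and party")

print(first_only)  # servers 127.0.0.1 and 83.34.23.23
print(masked)      # servers 127.x.x.x and 83.x.x.x
print(doubled)     # 6 apples and 20 pears
print(plural)      # emergencies and parties
```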

In [5]:
Regex_Pattern_4 = r"\S\S\s\S\S\s\S\S" # string of form "YYyYYyYY" where Y->non-whitespaces y->whitespace

In [8]:
print(str(bool(re.search(Regex_Pattern_4, input()))).lower())

welcome to skillovilla, have a great future ahead.
true


In [29]:
print(str(re.findall(Regex_Pattern_4, input())))

I don't know what they are working on. But I can assure that its one of the best based on their early work.
['ne of th', 'ed on th']


#### Example-5

In [10]:
Regex_Pattern_5 = r"\w\w\w\W\w\W\w\W" # string of form "bbbBbBbB" where b->word character, B->non-word character
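A quick check of `Regex_Pattern_5` against a few made-up strings: punctuation and spaces both count as non-word characters, while a run like "hac" in "www.hackerrank.com" breaks the required pattern.

```python
import re

Regex_Pattern_5 = r"\w\w\w\W\w\W\w\W"

samples = ["www.x.y.", "abc d e ", "ab.c.d.e", "www.hackerrank.com"]
results = [re.search(Regex_Pattern_5, s) is not None for s in samples]
print(results)  # [True, True, False, False]
```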

#### Example-6

In [11]:
Regex_Pattern_6 = r"^\d\w\w\w\w\.$" # string of form "gGGGG." where g->digit, G->word character
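A quick check of `Regex_Pattern_6` on made-up inputs. Since digits and underscores are word characters too, "12345." and "1ab_d." also match; the first character must be a digit and the string must end in a literal dot.

```python
import re

Regex_Pattern_6 = r"^\d\w\w\w\w\.$"

samples = ["1abcd.", "12345.", "1ab_d.", "abcde.", "1abcd", "1abc."]
results = [re.search(Regex_Pattern_6, s) is not None for s in samples]
print(results)  # [True, True, True, False, False, False]
```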

#### Example-7

In [14]:
## searching for a word in the text
text = 'The cyber security has become one of the most important aspect of the business. \
$550 million has been invested research. 12.45.65.78 is one the most spammed inject ips. \
ask@web.com. $600 million wasted. save safe.'
re.search('most',text)

<re.Match object; span=(41, 45), match='most'>

In [15]:
text[41:45]

'most'

In [16]:

re.findall('most',text)

['most', 'most']
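The three lookup functions differ in what they return. A minimal sketch on a made-up sentence: `re.search` gives the first match object (or `None`), `re.finditer` gives every match object, and `re.match` only tries at the start of the string.

```python
import re

text = "The most important aspect is the most spammed one."

first = re.search("most", text)  # first match only, or None
print(first.group(), first.span())  # most (4, 8)

all_spans = [m.span() for m in re.finditer("most", text)]
print(all_spans)  # [(4, 8), (33, 37)]

# re.match only tries at the very start of the string
at_start_most = re.match("most", text)
at_start_the = re.match("The", text)
print(at_start_most, at_start_the.group())  # None The

# re.search returns None when nothing matches, so check before calling .group()
missing = re.search("least", text)
print(missing)  # None
```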

#### Example-8

In [17]:
text = 'The cyber security has become one of the most important ascept of the business. \
$550 million has been invested research. 12.45.65.78 is one the most spammed inject ips. \
ask@web.com. $600 million wasted. save safe. sale. hi5@connect.com'

re.findall('sa.e',text)

['save', 'safe', 'sale']

In [18]:
re.findall('safe|saze',text)

['safe']
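A character class is often a compact alternative to `|` when only one character varies. The sketch below, on a made-up string, also shows a common surprise: when the pattern contains a capturing group, `re.findall` returns the group contents instead of the whole match, which a non-capturing group avoids.

```python
import re

text = "save safe sale same"

# A character class picks one character from a set at a single position
by_class = re.findall(r"sa[vfl]e", text)

# Alternation inside a capturing group: findall returns only the group
captured = re.findall(r"sa(v|f|l)e", text)

# A non-capturing group keeps the alternation but returns the whole match
whole = re.findall(r"sa(?:v|f|l)e", text)

print(by_class)  # ['save', 'safe', 'sale']
print(captured)  # ['v', 'f', 'l']
print(whole)     # ['save', 'safe', 'sale']
```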

In [19]:
re.findall(r'\$550',text) # \$ matches a literal dollar sign

['$550']

In [20]:
re.findall(r'\d+',text)

['550', '12', '45', '65', '78', '600', '5']

In [21]:
re.findall('[a-z]+',text)

['he',
 'cyber',
 'security',
 'has',
 'become',
 'one',
 'of',
 'the',
 'most',
 'important',
 'ascept',
 'of',
 'the',
 'business',
 'million',
 'has',
 'been',
 'invested',
 'research',
 'is',
 'one',
 'the',
 'most',
 'spammed',
 'inject',
 'ips',
 'ask',
 'web',
 'com',
 'million',
 'wasted',
 'save',
 'safe',
 'sale',
 'hi',
 'connect',
 'com']

In [22]:
re.findall(r'[a-z0-9]+@[a-z0-9]+\.com',text) # \. matches a literal dot

['ask@web.com', 'hi5@connect.com']
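In a regex, an unescaped `.` before `com` matches any character, not only a dot. A small sketch with a made-up malformed address shows what slips through without the escape.

```python
import re

text = "ask@web.com ask@webxcom"

# Unescaped '.' matches any character, so the malformed address slips through
loose = re.findall(r"[a-z0-9]+@[a-z0-9]+.com", text)

# '\.' matches only a literal dot
strict = re.findall(r"[a-z0-9]+@[a-z0-9]+\.com", text)

print(loose)   # ['ask@web.com', 'ask@webxcom']
print(strict)  # ['ask@web.com']
```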

#### Example-9

In [32]:
# Examples for Phone numbers
text = " Please call this number (098) (0445).(2016) in case of emergency"
Regex_Pattern_9 = r".?(\d{3}).*(\d{3}).*(\d{4})"
re.findall(Regex_Pattern_9,text)

[('098', '445', '2016')]
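The second group above is '445', not '0445', because the greedy `.*` pushes it as far right as possible. This sketch contrasts the original pattern with a non-greedy variant and with a stricter pattern that assumes the exact "(ddd) (dddd).(dddd)" layout of this one example; the stricter pattern is illustrative, not a general phone-number matcher.

```python
import re

text = " Please call this number (098) (0445).(2016) in case of emergency"

# Original pattern: greedy .* pushes the 2nd group as far right as possible
greedy = re.findall(r".?(\d{3}).*(\d{3}).*(\d{4})", text)

# Non-greedy .*? takes the earliest digits instead
lazy = re.findall(r".?(\d{3}).*?(\d{3}).*?(\d{4})", text)

# Stricter sketch assuming the "(ddd) (dddd).(dddd)" layout of this example
strict = re.findall(r"\((\d{3})\)\s*\((\d{4})\)\.\((\d{4})\)", text)

print(greedy)  # [('098', '445', '2016')]
print(lazy)    # [('098', '044', '2016')]
print(strict)  # [('098', '0445', '2016')]
```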

#### Example-10

In [4]:
## Searching email from the text data.
emailtext = 'ok take this to the next level from a google@skillovilla.com productivity standpoint zed each info@ernet.co.in time you send an infoi@gmail.com email'

In [5]:
re.findall(r'[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-z]+',emailtext) # square brackets match any one character from the listed ranges; \. is a literal dot

['google@skillovilla.com', 'info@ernet.co', 'infoi@gmail.com']

In [6]:
re.findall(r'\S+@\S+', emailtext) # \S matches any non-whitespace character

['google@skillovilla.com', 'info@ernet.co.in', 'infoi@gmail.com']
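`\S+@\S+` handles multi-part domains but also accepts junk and trailing punctuation. A middle-ground sketch is shown below; the pattern is an illustrative assumption (local part of word characters, `.`, `+`, `-`; domain of dot-separated labels), not a full RFC 5322 email validator, and the `noisy` string is made up.

```python
import re

emailtext = ('ok take this to the next level from a google@skillovilla.com productivity '
             'standpoint zed each info@ernet.co.in time you send an infoi@gmail.com email')

# Local part: word characters, '.', '+', '-'; domain: one or more dot-separated labels
email_pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
found = re.findall(email_pattern, emailtext)
print(found)  # ['google@skillovilla.com', 'info@ernet.co.in', 'infoi@gmail.com']

noisy = 'junk %$#@#$ and mail me at a.b+tag@mail.example.co.in.'
loose = re.findall(r"\S+@\S+", noisy)
tight = re.findall(email_pattern, noisy)
print(loose)  # ['%$#@#$', 'a.b+tag@mail.example.co.in.']
print(tight)  # ['a.b+tag@mail.example.co.in']
```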

# <mark style="background:lightgreen">Thank You</mark>