# Regular Expression

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.

The module re provides full support for Perl-like regular expressions in Python. The re module raises the exception re.error if an error occurs while compiling or using a regular expression.
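
As a minimal sketch of the re.error behaviour described above, the snippet below wraps re.compile() in a helper. The name try_compile is hypothetical and used only for illustration; it is not part of the re module.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid.

    try_compile is a hypothetical helper for this example only.
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        # re.error carries a message describing what is wrong with the pattern
        print(f"Invalid pattern {pattern!r}: {exc}")
        return None

ok = try_compile(r'\d+')     # valid pattern, returns a compiled object
bad = try_compile(r'(\d+')   # unbalanced parenthesis, returns None
```

Catching re.error this way is mostly useful when the pattern comes from user input or a configuration file rather than from a literal in the code.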

In [1]:
import re

In [2]:
# phoneNumberRegex holds a compiled regular expression object
phoneNumberRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
matches = phoneNumberRegex.findall('My number is 129-123-4545')

# findall() returns a list of every matching substring, not a match object.
print('Phone number found: ', matches)

Phone number found:  ['129-123-4545']


In [3]:
# If there are groups in the regular expression, then findall() will return a list of tuples. 
# Each tuple represents a found match, and its items are the matched strings for each group in the regex.
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
mo_findall_group = phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')

print(mo_findall_group)

[('415', '555', '9999'), ('212', '555', '0000')]
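
For comparison with findall(), here is a small sketch that uses search() on the same pattern. search() returns a match object for the first match only (or None), and the groups are read with group() and groups(). The variable names are chosen for this example.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
match = phone_regex.search('Cell: 415-555-9999 Work: 212-555-0000')

# search() returns None when nothing matches, so check before using it
if match is not None:
    whole = match.group(0)       # the entire matched text
    area_code = match.group(1)   # text captured by the first group
    parts = match.groups()       # all captured groups as a tuple
    print(whole, area_code, parts)
```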


- \   Drops the special meaning of the character following it (discussed below)
- []  Represents a character class
- ^   Matches the beginning of the string
- $   Matches the end of the string
- .   Matches any character except a newline
- ?   Matches zero or one occurrence of the preceding RE
- |   Means OR (matches any one of the alternatives separated by it)
- (*)   Matches any number of occurrences of the preceding RE (including zero)
- (+)   Matches one or more occurrences of the preceding RE
- {}  Indicates the number of occurrences of the preceding RE to match
- ()  Encloses a group of REs
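
The metacharacters listed above can be tried out one at a time. The following is an illustrative sketch with made-up sample strings; each call isolates a single metacharacter.

```python
import re

at_start = re.findall(r'^Hello', 'Hello world')       # ^ anchors at the start
at_end = re.findall(r'world$', 'Hello world')         # $ anchors at the end
optional = re.findall(r'colou?r', 'color colour')     # ? makes the 'u' optional
either = re.findall(r'cat|dog', 'cat, dog, bird')     # | picks one alternative
zero_or_more = re.findall(r'ab*', 'a ab abbb')        # * allows zero or more 'b'
one_or_more = re.findall(r'ab+', 'a ab abbb')         # + requires at least one 'b'
counted = re.findall(r'\d{2,3}', '1 22 333 4444')     # {2,3} means two or three digits
escaped = re.findall(r'3\.14', '3.14 3x14')           # \. matches a literal dot only

for result in (at_start, at_end, optional, either,
               zero_or_more, one_or_more, counted, escaped):
    print(result)
```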

In [4]:
# Find IPv4-style addresses: four dot-separated groups of 1 to 3 digits (octet ranges are not checked)
ipRegex = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
find_all_ips = ipRegex.findall('My local ip is 192.168.0.10 and public ip is 47.247.129.54')

print(find_all_ips)

['192.168.0.10', '47.247.129.54']
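
The pattern above also accepts strings such as 999.1.1.1, because \d{1,3} does not limit an octet to 0-255. One possible way to filter the candidates, sketched here under the assumption that the standard-library ipaddress module is acceptable, is to let IPv4Address do the range check. The function name find_valid_ipv4 is hypothetical.

```python
import ipaddress
import re

ip_candidate_regex = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def find_valid_ipv4(text):
    """Return the regex candidates that are also valid IPv4 addresses."""
    valid = []
    for candidate in ip_candidate_regex.findall(text):
        try:
            # Raises a ValueError subclass when an octet is out of range
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        valid.append(candidate)
    return valid

result = find_valid_ipv4('Good: 192.168.0.10, bad: 999.1.1.1')
print(result)
```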


In [6]:
# Match dates in yyyy-mm-dd format from 1900-01-01 through 2099-12-31,
# with a choice of four separators (hyphen, space, slash, or dot)
dateRegex = re.compile(r'((?:19|20)\d\d[- /.](?:0[1-9]|1[012])[- /.](?:0[1-9]|[12][0-9]|3[01]))')
find_date = dateRegex.findall('1989-02-05 and 1987-10-30')

print(find_date)

['1989-02-05', '1987-10-30']
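
The date pattern checks the shape of the string but not the calendar, so it would also accept 1987-02-30. A hedged sketch of one way to reject such dates follows: the pattern is regrouped so that year, month, and day are captured separately, and datetime.date does the calendar check. The regrouping and the name find_real_dates are assumptions made for this example.

```python
import re
from datetime import date

# Same alternatives as the pattern above, but with year, month and day
# captured as three separate groups
date_regex = re.compile(
    r'((?:19|20)\d\d)[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])'
)

def find_real_dates(text):
    """Return date objects for matches that exist on the calendar."""
    found = []
    for year, month, day in date_regex.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # e.g. February 30th: well-formed text, but not a real date
            pass
    return found

dates = find_real_dates('1989-02-05 and 1987-02-30')
print(dates)
```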


- \d  Any numeric digit from 0 to 9.
- \D  Any character that is not a numeric digit from 0 to 9.
- \w  Any letter, numeric digit, or the underscore character. (Think of this as matching “word” characters.)
- \W  Any character that is not a letter, numeric digit, or the underscore character.
- \s  Any space, tab, or newline character. (Think of this as matching “space” characters.)
- \S  Any character that is not a space, tab, or newline.
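
A short sketch of the shorthand classes above in action, using a made-up sample string that contains spaces and a tab:

```python
import re

text = 'Room 42, user_id=alice\t(ok)'

digits = re.findall(r'\d+', text)     # runs of digits
words = re.findall(r'\w+', text)      # runs of letters, digits, or underscores
spaces = re.findall(r'\s', text)      # individual whitespace characters
non_space = re.findall(r'\S+', text)  # runs of anything that is not whitespace

print(digits)
print(words)
print(spaces)
print(non_space)
```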


Character classes are nice for shortening regular expressions. The character class
[0-5] matches any single digit from 0 to 5; this is much shorter than typing
(0|1|2|3|4|5).
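
To illustrate the equivalence, the sketch below runs both forms on the same made-up string. The negated class on the last line, written with a leading ^ inside the brackets, is an extra not covered in the text above.

```python
import re

text = 'a0b3c5d7e9'

with_class = re.findall(r'[0-5]', text)
# (?:...) is a non-capturing group, so findall() returns the matched text
with_alternation = re.findall(r'(?:0|1|2|3|4|5)', text)
# A leading ^ inside the brackets negates the class
negated = re.findall(r'[^0-5a-z]', text)

print(with_class, with_alternation, negated)
```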

In [6]:
# Find runs of 6 to 18 allowed characters (lowercase letters, digits, underscore, hyphen)
passwordRegex = re.compile(r'[a-z0-9_-]{6,18}')
find_password = passwordRegex.findall('swapnil')

print(find_password)

['swapnil']
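
Note that findall() searches for the pattern anywhere in the string, so it can return a valid-looking fragment of an invalid password. For validating a whole string, one option is fullmatch(), sketched below; the helper name is_valid_password is hypothetical.

```python
import re

password_regex = re.compile(r'[a-z0-9_-]{6,18}')

def is_valid_password(candidate):
    """True only if the entire string consists of 6 to 18 allowed characters."""
    return password_regex.fullmatch(candidate) is not None

# findall() finds a matching fragment inside a string that is not valid overall
fragment = password_regex.findall('Swapnil!')

print(fragment)
print(is_valid_password('swapnil'))
print(is_valid_password('Swapnil!'))
```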


In [11]:
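# Find email addresses with a single-label domain and a suffix of three or more letters
# (sundar@google.co.in is skipped because "co" has only two letters)
emailRegex = re.compile(r'((?:[a-zA-Z0-9._-]+)@(?:[a-zA-Z0-9]+)\.(?:[a-zA-Z]{3,}))')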
find_email = emailRegex.findall('sundar@google.co.in and sachin@flipkart.com')

print(find_email)

['sachin@flipkart.com']


In [7]:
# Find email addresses with a two-part suffix such as .co.in (sundar@google.com is skipped)
emailRegex = re.compile(r'((?:[a-zA-Z0-9._-]+)@(?:[a-zA-Z0-9]+)\.(?:[a-zA-Z]{2,})\.[a-zA-Z]{2})')
find_email = emailRegex.findall('sundar@google.com and sachin@flipkart.co.in')

print(find_email)

['sachin@flipkart.co.in']


In [10]:
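# Find email addresses with the module-level re.findall(); re.I makes the match
# case-insensitive, so the character classes only need to list lowercase letters
data = 'sundar@google.co.in and Sachin@flipkart.com'
find_email = re.findall(r'[a-z0-9._-]+@[a-z0-9]+\.[a-z]{3,}', data, re.I)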

print(find_email)

['Sachin@flipkart.com']
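
Each of the three email patterns above handles only one domain shape. As a rough sketch, and not a full validator for real-world addresses, the pattern below allows any number of dot-separated domain labels followed by a final label of at least two letters. The pattern itself is an assumption made for this example.

```python
import re

# local part, '@', first domain label, optional extra labels, final label
email_regex = re.compile(
    r'[a-z0-9._-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}',
    re.I,
)

data = 'sundar@google.co.in and Sachin@flipkart.com'
found = email_regex.findall(data)
print(found)
```

Because the pattern has no capturing groups, findall() returns the full matched addresses as plain strings.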
