##### Regular Expressions (RegEx) in Python
A Regular Expression (RegEx) is a sequence of characters that describes a textual pattern.

In Python, the standard library's **re** module provides a set of functions and methods for performing RegEx operations.

##### Major functions available in re:

* compile - Returns a RegEx pattern object.
* search - Returns a Match object for the first match, or None if there is none.
* findall - Returns a list of all matches.
* sub - Replaces one or more matches within the text.
* split - Returns a list where the text has been split at every match.
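
As a quick orientation, the five functions listed above can be sketched on one short string. The string `sample` and the name `number_po` are made up for this illustration and are not part of the notebook's data.

```python
import re

# Hypothetical sample text, used only for this illustration.
sample = "Call 198 or 199 for help"

number_po = re.compile(r"\d+")        # compile: build a reusable pattern object

first = number_po.search(sample)      # search: first match, or None
print(first.group())                  # 198
print(number_po.findall(sample))      # ['198', '199']
print(number_po.sub("#", sample))     # Call # or # for help
print(number_po.split(sample))        # ['Call ', ' or ', ' for help']
```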

##### Create a Pattern object

In [1]:
# import the required package

import re

In [2]:
# A RegEx pattern object for a mobile number - '9944912345'

re.compile(r'\d\d\d\d\d\d\d\d\d\d')

re.compile(r'\d\d\d\d\d\d\d\d\d\d', re.UNICODE)

In [3]:
mobile_po = re.compile(r'\d\d\d\d\d\d\d\d\d\d')

In [4]:
type(mobile_po)

re.Pattern

##### Create a Match object

In [6]:
text1 = '''Support around JioFiber is now just a WhatsApp away.
Click Here to open the support channel on WhatsApp. Alternatively, you can send "Hi" to 7000570005 to get started.'''

In [7]:
text1

'Support around JioFiber is now just a WhatsApp away.\nClick Here to open the support channel on WhatsApp. Alternatively, you can send "Hi" to 7000570005 to get started.'

* search() returns a Match object for the first place the pattern is found; otherwise it returns None.

In [8]:
mobile_mo = mobile_po.search(text1)

In [9]:
mobile_mo

<re.Match object; span=(141, 151), match='7000570005'>

In [10]:
type(mobile_mo)

re.Match

In [11]:
print(mobile_mo)

<re.Match object; span=(141, 151), match='7000570005'>


* group() returns the matched text

In [12]:
print('Mobile number is: ', mobile_mo.group())

Mobile number is:  7000570005
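
Besides group(), a Match object also reports where the match was found. The sketch below uses a shortened, hypothetical `message` string rather than text1, and shows that slicing the original string with start() and end() gives back the matched text.

```python
import re

# Hypothetical one-line message, used only for this illustration.
message = 'Send "Hi" to 7000570005 to get started.'

mo = re.search(r"\d\d\d\d\d\d\d\d\d\d", message)

print(mo.group())                     # the matched text
print(mo.span())                      # (start, end) offsets into the string
print(message[mo.start():mo.end()])   # slicing with the offsets gives the same text
```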


##### Creating groups with parentheses ()

In [13]:
text2 = '''JioCare

Our experts are available for your assistance 24x7 (Monday - Sunday)

Interested in Jio? Talk to us on	1860-893-3333
For recharge plans, data balance, validity, recharge confirmation & offers	1991
For Queries	199
For Complaints	198
For Other numbers	1800-889-9999
Tele-verification to activate both HD voice & data services	1977
Tele-verification to activate data services only	1800-890-1977
For support on International Roaming
(accessible only when roaming abroad) +917018899999 (charges applicable)
Device Care Helpline (JioPhone, LYF Mobile & JioFi)

Our experts are available for your assistance on all Days, from 9am to 9pm

Helpline 1800-890-9999
Jio Enterprise Mobility & Business Solutions

Our experts are available for your assistance 24x7 (Monday - Sunday)

Enterprise Mobility Services 1800-889-9333
Enterprise Connectivity Services & Business Solutions	1800-889-9444
New Business Connection	1800-889-9555
Care Helpline for JioFiber Customers

Our experts are available for your assistance 24X7 (Monday – Sunday)

Helpline 1800-896-9999
Online shopping

For any online shopping related assistance, reach out to our experts between 9 am to 9 pm (Monday – Sunday)

Helpline 1800-893-3399'''

In [16]:
tollfree_po = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')
tollfree_mo = tollfree_po.search(text2)

In [17]:
tollfree_mo

<re.Match object; span=(112, 125), match='1860-893-3333'>

In [19]:
print(tollfree_mo.group())

1860-893-3333


In [20]:
print(tollfree_mo.group(0))

1860-893-3333


In [21]:
print(tollfree_mo.group(1))

IndexError: no such group

In [22]:
# create the pattern with groups using "()"

tollfree_po = re.compile(r'(\d\d\d\d)-(\d\d\d)-(\d\d\d\d)')
tollfree_mo = tollfree_po.search(text2)

In [23]:
tollfree_mo

<re.Match object; span=(112, 125), match='1860-893-3333'>

In [25]:
print(tollfree_mo.group())
print(tollfree_mo.group(0))
print(tollfree_mo.group(1))
print(tollfree_mo.group(2))
print(tollfree_mo.group(3))

1860-893-3333
1860-893-3333
1860
893
3333
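
All groups can also be read in one call. A minimal sketch, assuming a hypothetical single `line` in the same format as the helpline numbers above: groups() returns every group as a tuple, which can be unpacked, and group() accepts several indexes at once.

```python
import re

# Hypothetical line in the same format as the helpline numbers above.
line = "Helpline 1800-890-9999"

mo = re.search(r"(\d\d\d\d)-(\d\d\d)-(\d\d\d\d)", line)

print(mo.groups())                    # ('1800', '890', '9999')
prefix, middle, last = mo.groups()    # unpack the three groups
print(prefix, middle, last)           # 1800 890 9999
print(mo.group(1, 3))                 # ('1800', '9999')
```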


In [28]:
mobile_po = re.compile(r'\+\d\d\d\d\d\d\d\d\d\d\d\d')
mobile_mo = mobile_po.search(text2)
print(mobile_mo.group())
print(mobile_mo.group(0))
# print(mobile_mo.group(1))

+917018899999
+917018899999


In [31]:
mobile_po = re.compile(r'(\+\d\d)(\d\d\d\d\d\d\d\d\d\d)')
mobile_mo = mobile_po.search(text2)
print(mobile_mo.group())
print(mobile_mo.group(0))
print(mobile_mo.group(1))
print(mobile_mo.group(2))

+917018899999
+917018899999
+91
7018899999


##### Matching parentheses in text

In [41]:
text3 = '(accessible only when roaming abroad) (+91)7018899999 (charges applicable)'

In [43]:
mobile_po = re.compile(r'(\(\+\d\d\))(\d\d\d\d\d\d\d\d\d\d)')
mobile_mo = mobile_po.search(text3)
print(mobile_mo)
print(mobile_mo.group())
print(mobile_mo.group(0))
print(mobile_mo.group(1))
print(mobile_mo.group(2))

<re.Match object; span=(38, 53), match='(+91)7018899999'>
(+91)7018899999
(+91)7018899999
(+91)
7018899999
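
Escaping by hand works for short patterns. When the literal text comes from elsewhere, re.escape() can add the backslashes. The sketch below assumes a hypothetical `text` string and is only one way to build such a pattern.

```python
import re

# Hypothetical text; the literal "(+91)" contains three metacharacters.
text = "Dial (+91)7018899999 when abroad"

escaped = re.escape("(+91)")          # backslash-escape (, + and )
print(escaped)                        # \(\+91\)

mo = re.search(escaped + r"(\d+)", text)
print(mo.group())                     # (+91)7018899999
print(mo.group(1))                    # 7018899999
```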


##### Matching multiple groups with |

In [54]:
text2 = '''JioCare

Our experts are available for your assistance 24x7 (Monday - Sunday) 

Interested in Jio? Talk to us on	1860-893-3333
For recharge plans, data balance, validity, recharge confirmation & offers	1991
For Queries	199
For Complaints	198
For Other numbers	1800-889-9999
Tele-verification to activate both HD voice & data services	1977
Tele-verification to activate data services only	1800-890-1977
For support on International Roaming
(accessible only when roaming abroad) +917018899999 (charges applicable)
Device Care Helpline (JioPhone, LYF Mobile & JioFi)

Our experts are available for your assistance on all Days, from 9am to 9pm

Helpline 1800-890-9999
Jio Enterprise Mobility & Business Solutions

Our experts are available for your assistance 24x7 (Monday - Sunday)

Enterprise Mobility Services 1800-889-9333
Enterprise Connectivity Services & Business Solutions	1800-889-9444
New Business Connection	1800-889-9555
Care Helpline for JioFiber Customers

Our experts are available for your assistance 24X7 (Monday – Sunday)

Helpline 1800-896-9999
Online shopping

For any online shopping related assistance, reach out to our experts between 9 am to 9 pm (Monday – Sunday)

Helpline 1800-893-3399'''

In [55]:
tollfree_po = re.compile(r'(\d\d\d\d)|(\d\d\d)')

tollfree_mo = tollfree_po.search(text2)

In [56]:
print(tollfree_mo)

<re.Match object; span=(113, 117), match='1860'>


In [57]:
print(tollfree_mo.group())
print(tollfree_mo.groups())

1860
('1860', None)


In [59]:
# add a few more alternatives
tollfree_po = re.compile(r'(\d\d\d\d)|(\d\d\d)|(\+\d\d\d\d\d\d\d\d\d\d\d\d)')

tollfree_mo = tollfree_po.search(text2)

print(tollfree_mo.groups())

('1860', None, None)
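
The outputs above show only the first alternative being used, because alternatives are tried from left to right and the first one that matches wins. A small sketch on made-up strings illustrates why the order matters, and how lastindex reports which group matched.

```python
import re

digits = "1860"

# Alternatives are tried left to right; the first one that matches wins.
short_first = re.search(r"\d\d\d|\d\d\d\d", digits).group()
long_first = re.search(r"\d\d\d\d|\d\d\d", digits).group()
print(short_first)                    # 186
print(long_first)                     # 1860

# With one group per alternative, lastindex tells which alternative matched.
mo = re.search(r"(\d\d\d\d)|(\d\d\d)", "Queries 199")
print(mo.groups())                    # (None, '199')
print(mo.lastindex)                   # 2
```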


##### Optional matching with ?
* The ? character marks the group () or character that precedes it as optional: it may appear once or not at all.

In [60]:
text4 = '''Interested in Jio? Talk to us on	18608933333 
Tele-verification to activate data services only	1800-890-1977'''

In [62]:
tollfree_po = re.compile(r'\d\d\d\d(-)?\d\d\d(-)?\d\d\d\d')
tollfree_mo = tollfree_po.search(text4)
print(tollfree_mo.group())

18608933333
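
To see what an optional group captures, the same pattern can be run against a few hypothetical candidates. This is a sketch only; the third candidate is invented to show a mixed case.

```python
import re

po = re.compile(r"\d\d\d\d(-)?\d\d\d(-)?\d\d\d\d")

captured = {}
for candidate in ["18608933333", "1800-890-1977", "1800-8901977"]:
    mo = po.search(candidate)
    # An optional group that took no part in the match is reported as None.
    captured[candidate] = mo.groups()
    print(candidate, captured[candidate])

# 18608933333 (None, None)
# 1800-890-1977 ('-', '-')
# 1800-8901977 ('-', None)
```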


##### Matching zero or more with *
* The * character means the group () or character that precedes it may be absent or repeated any number of times.

In [64]:
text4 = '''Interested in Jio? Talk to us on	1860---893---3333 
Tele-verification to activate data services only	1800-890-1977'''

tollfree_po = re.compile(r'\d\d\d\d(-)*\d\d\d(-)*\d\d\d\d')
tollfree_mo = tollfree_po.search(text4)
print(tollfree_mo.group())

1860---893---3333


##### Matching one or more with +
* The + character means the group () or character that precedes it must be present at least once and may be repeated any number of times.

In [67]:
text4 = '''Interested in Jio? Talk to us on 18608933333 
Tele-verification to activate data services only 1800-890-1977'''

tollfree_po = re.compile(r'\d\d\d\d(-)+\d\d\d(-)+\d\d\d\d')
tollfree_mo = tollfree_po.search(text4)
print(tollfree_mo.group())

1800-890-1977
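
The three quantifiers ?, * and + can be compared side by side. The sketch below applies each one to the hyphen in the same pattern and checks three hypothetical strings with fullmatch(), which succeeds only when the entire string fits the pattern.

```python
import re

samples = ["18608933333", "1860-893-3333", "1860---893---3333"]
results = {}

for quantifier in ["?", "*", "+"]:
    po = re.compile(r"\d\d\d\d-" + quantifier + r"\d\d\d-" + quantifier + r"\d\d\d\d")
    # fullmatch() succeeds only if the whole string fits the pattern.
    results[quantifier] = [po.fullmatch(s) is not None for s in samples]
    print(quantifier, results[quantifier])

# ? [True, True, False]
# * [True, True, True]
# + [False, True, True]
```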


##### Matching specific repetitions with {}
* The {} braces mean the group () or character that precedes them must be repeated n times, where n is specified inside the {}.

In [72]:
text4 = '''Interested in Jio? Talk to us on 18608933333 
Tele-verification to activate data services only 1800-890-1977'''

tollfree_po = re.compile(r'\d\d\d\d(-){2}\d\d\d(-){1}\d\d\d\d')
tollfree_mo = tollfree_po.search(text4)
print(tollfree_mo.group())

AttributeError: 'NoneType' object has no attribute 'group'
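
The AttributeError above arises because search() found nothing and returned None. One way to avoid it, sketched here on a hypothetical `text`, is to check the result before calling group().

```python
import re

# Hypothetical text that contains no hyphenated number.
text = "Talk to us on 18608933333"

mo = re.search(r"\d\d\d\d-\d\d\d-\d\d\d\d", text)

# search() returned None here, so check before calling group().
if mo is None:
    result = "no match"
else:
    result = mo.group()
print(result)                         # no match
```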

In [73]:
text4 = '''Interested in Jio? Talk to us on	18608933333 
Tele-verification to activate data services only	1800-890-1977
Enterprise Mobility Services	1800----889--9333
Enterprise Connectivity Services & Business Solutions	1800--889--9444
New Business Connection	1800-889-9555'''

In [74]:
tollfree_po = re.compile(r'\d\d\d\d(-){2}\d\d\d(-){2}\d\d\d\d') # {2} --> exactly 2
tollfree_mo = tollfree_po.search(text4)
print(tollfree_mo.group())

1800--889--9444


In [75]:
tollfree_po = re.compile(r'\d\d\d\d(-){2,}\d\d\d(-){2,}\d\d\d\d') # {2,} --> 2 or more
tollfree_mo = tollfree_po.search(text4)
print(tollfree_mo.group())

1800----889--9333


In [76]:
tollfree_po = re.compile(r'\d\d\d\d(-){,2}\d\d\d(-){,2}\d\d\d\d') # {,2} --> 0 to 2
tollfree_mo = tollfree_po.search(text4)
print(tollfree_mo.group())

18608933333


In [77]:
tollfree_po = re.compile(r'\d\d\d\d(-){2,5}\d\d\d(-){2,5}\d\d\d\d') # {2,5} --> 2 to 5
tollfree_mo = tollfree_po.search(text4)
print(tollfree_mo.group())

1800----889--9333
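
Braces also apply to a character class such as \d, which shortens the repeated \d\d\d\d sequences used so far. A minimal sketch on a hypothetical `line`; the names `long_po` and `short_po` are illustrative.

```python
import re

long_po = re.compile(r"\d\d\d\d-\d\d\d-\d\d\d\d")
short_po = re.compile(r"\d{4}-\d{3}-\d{4}")   # same pattern, written with {}

# Hypothetical line in the same format as the helpline numbers above.
line = "New Business Connection 1800-889-9555"

print(long_po.search(line).group())   # 1800-889-9555
print(short_po.search(line).group())  # 1800-889-9555
```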


##### findall() method
* search() returns a Match object for the first match only, whereas findall() returns a list with the text of every match.

In [78]:
print(text2)

JioCare

Our experts are available for your assistance 24x7 (Monday - Sunday) 

Interested in Jio? Talk to us on	1860-893-3333
For recharge plans, data balance, validity, recharge confirmation & offers	1991
For Queries	199
For Complaints	198
For Other numbers	1800-889-9999
Tele-verification to activate both HD voice & data services	1977
Tele-verification to activate data services only	1800-890-1977
For support on International Roaming
(accessible only when roaming abroad) +917018899999 (charges applicable)
Device Care Helpline (JioPhone, LYF Mobile & JioFi)

Our experts are available for your assistance on all Days, from 9am to 9pm

Helpline 1800-890-9999
Jio Enterprise Mobility & Business Solutions

Our experts are available for your assistance 24x7 (Monday - Sunday)

Enterprise Mobility Services 1800-889-9333
Enterprise Connectivity Services & Business Solutions	1800-889-9444
New Business Connection	1800-889-9555
Care Helpline for JioFiber Customers

Our experts are available for you

In [85]:
tollfree_po = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

tollfree_mo = tollfree_po.search(text2)

In [86]:
print(tollfree_mo.groups())

()


In [87]:
tollfree_po.findall(text2)

['1860-893-3333',
 '1800-889-9999',
 '1800-890-1977',
 '1800-890-9999',
 '1800-889-9333',
 '1800-889-9444',
 '1800-889-9555',
 '1800-896-9999',
 '1800-893-3399']

In [88]:
tollfree_po = re.compile(r'\d{4}-\d{3}-\d{4}|\+\d{12}')
tollfree_po.findall(text2)

['1860-893-3333',
 '1800-889-9999',
 '1800-890-1977',
 '+917018899999',
 '1800-890-9999',
 '1800-889-9333',
 '1800-889-9444',
 '1800-889-9555',
 '1800-896-9999',
 '1800-893-3399']
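
The patterns above contain no groups, so findall() returns whole matches. When the pattern does contain groups, findall() returns one tuple of groups per match instead. The sketch below, on a hypothetical `text`, shows both cases, plus finditer(), which yields Match objects when positions are needed.

```python
import re

# Hypothetical text with two numbers, used only for this illustration.
text = "Queries 1800-889-9999, complaints 1800-890-1977"

# No groups: a list of whole matches.
whole = re.findall(r"\d{4}-\d{3}-\d{4}", text)
print(whole)      # ['1800-889-9999', '1800-890-1977']

# With groups: a list of tuples, one item per group.
parts = re.findall(r"(\d{4})-(\d{3})-(\d{4})", text)
print(parts)      # [('1800', '889', '9999'), ('1800', '890', '1977')]

# finditer() yields Match objects, so positions are available too.
spans = [mo.span() for mo in re.finditer(r"\d{4}-\d{3}-\d{4}", text)]
print(spans)
```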

##### Character classes
* \d - Any decimal digit, such as 0 to 9.
* \D - Any character that is not a decimal digit.
* \w - Any letter, digit, or underscore (_).
* \W - Any character that is not a letter, digit, or underscore (_).
* \s - Any whitespace character, such as a space, tab, or newline.
* \S - Any character that is not whitespace.
* \b - Word boundary. This matches the empty position at the beginning or end of a word.
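
Before running these classes on the full text, a short made-up string makes their behaviour easier to read. The string `sample` is hypothetical.

```python
import re

# Hypothetical sample containing an underscore, a tab, and digits.
sample = "Room_7 opens\tat 9am"

print(re.findall(r"\d", sample))      # ['7', '9']
print(re.findall(r"\w+", sample))     # ['Room_7', 'opens', 'at', '9am']
print(re.findall(r"\s", sample))      # [' ', '\t', ' ']
print(re.findall(r"\bat\b", sample))  # ['at']
```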

In [89]:
po = re.compile(r'\d')
print(po.findall(text2))

['2', '4', '7', '1', '8', '6', '0', '8', '9', '3', '3', '3', '3', '3', '1', '9', '9', '1', '1', '9', '9', '1', '9', '8', '1', '8', '0', '0', '8', '8', '9', '9', '9', '9', '9', '1', '9', '7', '7', '1', '8', '0', '0', '8', '9', '0', '1', '9', '7', '7', '9', '1', '7', '0', '1', '8', '8', '9', '9', '9', '9', '9', '9', '9', '1', '8', '0', '0', '8', '9', '0', '9', '9', '9', '9', '2', '4', '7', '1', '8', '0', '0', '8', '8', '9', '9', '3', '3', '3', '1', '8', '0', '0', '8', '8', '9', '9', '4', '4', '4', '1', '8', '0', '0', '8', '8', '9', '9', '5', '5', '5', '2', '4', '7', '1', '8', '0', '0', '8', '9', '6', '9', '9', '9', '9', '9', '9', '1', '8', '0', '0', '8', '9', '3', '3', '3', '9', '9']


In [90]:
po = re.compile(r'\d+')
print(po.findall(text2))

['24', '7', '1860', '893', '3333', '1991', '199', '198', '1800', '889', '9999', '1977', '1800', '890', '1977', '917018899999', '9', '9', '1800', '890', '9999', '24', '7', '1800', '889', '9333', '1800', '889', '9444', '1800', '889', '9555', '24', '7', '1800', '896', '9999', '9', '9', '1800', '893', '3399']


In [93]:
po = re.compile(r'\D+') # Any character that is not a decimal digit
print(po.findall(text2))

['JioCare\n\nOur experts are available for your assistance ', 'x', ' (Monday - Sunday) \n\nInterested in Jio? Talk to us on\t', '-', '-', '\nFor recharge plans, data balance, validity, recharge confirmation & offers\t', '\nFor Queries\t', '\nFor Complaints\t', '\nFor Other numbers\t', '-', '-', '\nTele-verification to activate both HD voice & data services\t', '\nTele-verification to activate data services only\t', '-', '-', '\nFor support on International Roaming\n(accessible only when roaming abroad) +', ' (charges applicable)\nDevice Care Helpline (JioPhone, LYF Mobile & JioFi)\n\nOur experts are available for your assistance on all Days, from ', 'am to ', 'pm\n\nHelpline ', '-', '-', '\nJio Enterprise Mobility & Business Solutions\n\nOur experts are available for your assistance ', 'x', ' (Monday - Sunday)\n\nEnterprise Mobility Services ', '-', '-', '\nEnterprise Connectivity Services & Business Solutions\t', '-', '-', '\nNew Business Connection\t', '-', '-', '\nCare Helpline for 

In [95]:
po = re.compile(r'\w+') # Any letter, digit, or underscore (_).
print(po.findall(text2))

['JioCare', 'Our', 'experts', 'are', 'available', 'for', 'your', 'assistance', '24x7', 'Monday', 'Sunday', 'Interested', 'in', 'Jio', 'Talk', 'to', 'us', 'on', '1860', '893', '3333', 'For', 'recharge', 'plans', 'data', 'balance', 'validity', 'recharge', 'confirmation', 'offers', '1991', 'For', 'Queries', '199', 'For', 'Complaints', '198', 'For', 'Other', 'numbers', '1800', '889', '9999', 'Tele', 'verification', 'to', 'activate', 'both', 'HD', 'voice', 'data', 'services', '1977', 'Tele', 'verification', 'to', 'activate', 'data', 'services', 'only', '1800', '890', '1977', 'For', 'support', 'on', 'International', 'Roaming', 'accessible', 'only', 'when', 'roaming', 'abroad', '917018899999', 'charges', 'applicable', 'Device', 'Care', 'Helpline', 'JioPhone', 'LYF', 'Mobile', 'JioFi', 'Our', 'experts', 'are', 'available', 'for', 'your', 'assistance', 'on', 'all', 'Days', 'from', '9am', 'to', '9pm', 'Helpline', '1800', '890', '9999', 'Jio', 'Enterprise', 'Mobility', 'Business', 'Solutions', 'O

In [97]:
po = re.compile(r'\W+') # Any character that is not a letter, digit, or underscore (_).
print(po.findall(text2))

['\n\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' (', ' - ', ') \n\n', ' ', ' ', '? ', ' ', ' ', ' ', '\t', '-', '-', '\n', ' ', ' ', ', ', ' ', ', ', ', ', ' ', ' & ', '\t', '\n', ' ', '\t', '\n', ' ', '\t', '\n', ' ', ' ', '\t', '-', '-', '\n', '-', ' ', ' ', ' ', ' ', ' ', ' & ', ' ', '\t', '\n', '-', ' ', ' ', ' ', ' ', ' ', '\t', '-', '-', '\n', ' ', ' ', ' ', ' ', '\n(', ' ', ' ', ' ', ' ', ') +', ' (', ' ', ')\n', ' ', ' ', ' (', ', ', ' ', ' & ', ')\n\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ', ', ' ', ' ', ' ', '\n\n', ' ', '-', '-', '\n', ' ', ' ', ' & ', ' ', '\n\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' (', ' - ', ')\n\n', ' ', ' ', ' ', '-', '-', '\n', ' ', ' ', ' & ', ' ', '\t', '-', '-', '\n', ' ', ' ', '\t', '-', '-', '\n', ' ', ' ', ' ', ' ', '\n\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' (', ' – ', ')\n\n', ' ', '-', '-', '\n', ' ', '\n\n', ' ', ' ', ' ', ' ', ' ', ', ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' (', ' – ', ')\n\n', ' ', '-', '-']


In [100]:
# \s - Any whitespace character, such as a space, tab, or newline.
po = re.compile(r'\s+')
print(po.findall(text2))

['\n\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' \n\n', ' ', ' ', ' ', ' ', ' ', ' ', '\t', '\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\t', '\n', ' ', '\t', '\n', ' ', '\t', '\n', ' ', ' ', '\t', '\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\t', '\n', ' ', ' ', ' ', ' ', ' ', '\t', '\n', ' ', ' ', ' ', ' ', '\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\n\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\n\n', ' ', '\n', ' ', ' ', ' ', ' ', ' ', '\n\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\n\n', ' ', ' ', ' ', '\n', ' ', ' ', ' ', ' ', ' ', '\t', '\n', ' ', ' ', '\t', '\n', ' ', ' ', ' ', ' ', '\n\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\n\n', ' ', '\n', ' ', '\n\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\n\n', ' ']


In [101]:
# \S - Any character that is not whitespace.
po = re.compile(r'\S+')
print(po.findall(text2))

['JioCare', 'Our', 'experts', 'are', 'available', 'for', 'your', 'assistance', '24x7', '(Monday', '-', 'Sunday)', 'Interested', 'in', 'Jio?', 'Talk', 'to', 'us', 'on', '1860-893-3333', 'For', 'recharge', 'plans,', 'data', 'balance,', 'validity,', 'recharge', 'confirmation', '&', 'offers', '1991', 'For', 'Queries', '199', 'For', 'Complaints', '198', 'For', 'Other', 'numbers', '1800-889-9999', 'Tele-verification', 'to', 'activate', 'both', 'HD', 'voice', '&', 'data', 'services', '1977', 'Tele-verification', 'to', 'activate', 'data', 'services', 'only', '1800-890-1977', 'For', 'support', 'on', 'International', 'Roaming', '(accessible', 'only', 'when', 'roaming', 'abroad)', '+917018899999', '(charges', 'applicable)', 'Device', 'Care', 'Helpline', '(JioPhone,', 'LYF', 'Mobile', '&', 'JioFi)', 'Our', 'experts', 'are', 'available', 'for', 'your', 'assistance', 'on', 'all', 'Days,', 'from', '9am', 'to', '9pm', 'Helpline', '1800-890-9999', 'Jio', 'Enterprise', 'Mobility', '&', 'Business', 'Solu

In [102]:
po = re.compile(r'\d+\s\w+')
po.findall(text2)

['3333\nFor',
 '1991\nFor',
 '199\nFor',
 '198\nFor',
 '9999\nTele',
 '1977\nTele',
 '1977\nFor',
 '9999\nJio',
 '9333\nEnterprise',
 '9444\nNew',
 '9555\nCare',
 '9999\nOnline',
 '9 am',
 '9 pm']

In [104]:
po = re.compile(r'\((\w+\D+)\)')
po.findall(text2)

['Monday - Sunday',
 'accessible only when roaming abroad',
 'charges applicable)\nDevice Care Helpline (JioPhone, LYF Mobile & JioFi',
 'Monday - Sunday',
 'Monday – Sunday',
 'Monday – Sunday']

##### Matching a set of characters using []

In [110]:
myregex = re.compile('[a-zA-Z0-9]+')
print(myregex.findall(text2))

['JioCare', 'Our', 'experts', 'are', 'available', 'for', 'your', 'assistance', '24x7', 'Monday', 'Sunday', 'Interested', 'in', 'Jio', 'Talk', 'to', 'us', 'on', '1860', '893', '3333', 'For', 'recharge', 'plans', 'data', 'balance', 'validity', 'recharge', 'confirmation', 'offers', '1991', 'For', 'Queries', '199', 'For', 'Complaints', '198', 'For', 'Other', 'numbers', '1800', '889', '9999', 'Tele', 'verification', 'to', 'activate', 'both', 'HD', 'voice', 'data', 'services', '1977', 'Tele', 'verification', 'to', 'activate', 'data', 'services', 'only', '1800', '890', '1977', 'For', 'support', 'on', 'International', 'Roaming', 'accessible', 'only', 'when', 'roaming', 'abroad', '917018899999', 'charges', 'applicable', 'Device', 'Care', 'Helpline', 'JioPhone', 'LYF', 'Mobile', 'JioFi', 'Our', 'experts', 'are', 'available', 'for', 'your', 'assistance', 'on', 'all', 'Days', 'from', '9am', 'to', '9pm', 'Helpline', '1800', '890', '9999', 'Jio', 'Enterprise', 'Mobility', 'Business', 'Solutions', 'O

In [111]:
myregex = re.compile('[a-zA-Z]{10}')
print(myregex.findall(text2))

['assistance', 'Interested', 'confirmati', 'Complaints', 'verificati', 'verificati', 'Internatio', 'accessible', 'applicable', 'assistance', 'Enterprise', 'assistance', 'Enterprise', 'Enterprise', 'Connectivi', 'Connection', 'assistance', 'assistance']


In [116]:
myregex = re.compile(r'\s[a-zA-Z]{10,}\s')
print(myregex.findall(text2))

[' assistance ', '\nInterested ', ' confirmation ', ' Complaints\t', ' International ', ' assistance ', ' Enterprise ', ' assistance ', '\nEnterprise ', '\nEnterprise ', ' Connection\t', ' assistance ']


In [117]:
myregex = re.compile(r'\b[a-zA-Z]{10,}\b')
print(myregex.findall(text2))

['assistance', 'Interested', 'confirmation', 'Complaints', 'verification', 'verification', 'International', 'accessible', 'applicable', 'assistance', 'Enterprise', 'assistance', 'Enterprise', 'Enterprise', 'Connectivity', 'Connection', 'assistance', 'assistance']
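
A set does not have to be a range. It can list individual characters, and a leading ^ inside the brackets negates it. A small sketch on made-up inputs:

```python
import re

word = "verification"

print(re.findall(r"[aeiou]", word))    # ['e', 'i', 'i', 'a', 'i', 'o']
print(re.findall(r"[^aeiou]+", word))  # ['v', 'r', 'f', 'c', 't', 'n']

# A hyphen placed last in the set is taken literally.
print(re.findall(r"[0-9-]+", "Helpline 1800-890-9999"))  # ['1800-890-9999']
```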


##### ^ (Beginning) and $ (End)

In [119]:
text2

'JioCare\n\nOur experts are available for your assistance 24x7 (Monday - Sunday) \n\nInterested in Jio? Talk to us on\t1860-893-3333\nFor recharge plans, data balance, validity, recharge confirmation & offers\t1991\nFor Queries\t199\nFor Complaints\t198\nFor Other numbers\t1800-889-9999\nTele-verification to activate both HD voice & data services\t1977\nTele-verification to activate data services only\t1800-890-1977\nFor support on International Roaming\n(accessible only when roaming abroad) +917018899999 (charges applicable)\nDevice Care Helpline (JioPhone, LYF Mobile & JioFi)\n\nOur experts are available for your assistance on all Days, from 9am to 9pm\n\nHelpline 1800-890-9999\nJio Enterprise Mobility & Business Solutions\n\nOur experts are available for your assistance 24x7 (Monday - Sunday)\n\nEnterprise Mobility Services 1800-889-9333\nEnterprise Connectivity Services & Business Solutions\t1800-889-9444\nNew Business Connection\t1800-889-9555\nCare Helpline for JioFiber Customers

In [118]:
myregex = re.compile(r'^\w+', re.I) # re.I --> IGNORECASE
myregex.findall(text2)

['JioCare']

In [120]:
myregex = re.compile(r'\w+$')
myregex.findall(text2)

['3399']
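
By default ^ and $ anchor to the start and end of the whole string, which is why only one item is returned in each case above. With the re.M (MULTILINE) flag they also match at the start and end of every line. The two-line string `lines` below is a hypothetical excerpt.

```python
import re

# Hypothetical two-line excerpt, used only for this illustration.
lines = "For Queries\t199\nFor Complaints\t198"

print(re.findall(r"^\w+", lines))        # ['For']
print(re.findall(r"\w+$", lines))        # ['198']
print(re.findall(r"^\w+", lines, re.M))  # ['For', 'For']
print(re.findall(r"\w+$", lines, re.M))  # ['199', '198']
```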

##### Wildcard (.)

In [121]:
myregex = re.compile('.at')
myregex.findall(text2)

['dat', 'mat', 'cat', 'vat', 'dat', 'cat', 'vat', 'dat', 'nat', 'lat']

In [123]:
myregex = re.compile('..at')
myregex.findall(text2)

[' dat',
 'rmat',
 'icat',
 'ivat',
 ' dat',
 'icat',
 'ivat',
 ' dat',
 'rnat',
 'elat']

In [124]:
myregex = re.compile('..io..')
myregex.findall(text2)

[' Jio? ',
 'ation ',
 'ation ',
 'ation ',
 'ationa',
 '(JioPh',
 ' JioFi',
 'utions',
 'utions',
 'ction\t',
 ' JioFi']
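
Note that 'ction\t' appears in the output above, so the wildcard matches a tab. It does not match a newline unless the re.DOTALL flag is given, and a literal dot has to be escaped. A sketch on made-up strings:

```python
import re

two_lines = "Jio\nFiber"

print(re.findall(r"o.F", two_lines))             # []
print(re.findall(r"o.F", two_lines, re.DOTALL))  # ['o\nF']

# An escaped dot matches only a literal '.'.
print(re.findall(r"o\.", "Jio. Jiox"))           # ['o.']
```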

##### sub() method

In [125]:
myregex = re.compile(r'\d+')
print(myregex.findall(text2))

['24', '7', '1860', '893', '3333', '1991', '199', '198', '1800', '889', '9999', '1977', '1800', '890', '1977', '917018899999', '9', '9', '1800', '890', '9999', '24', '7', '1800', '889', '9333', '1800', '889', '9444', '1800', '889', '9555', '24', '7', '1800', '896', '9999', '9', '9', '1800', '893', '3399']


In [128]:
print(text2)

JioCare

Our experts are available for your assistance 24x7 (Monday - Sunday) 

Interested in Jio? Talk to us on	1860-893-3333
For recharge plans, data balance, validity, recharge confirmation & offers	1991
For Queries	199
For Complaints	198
For Other numbers	1800-889-9999
Tele-verification to activate both HD voice & data services	1977
Tele-verification to activate data services only	1800-890-1977
For support on International Roaming
(accessible only when roaming abroad) +917018899999 (charges applicable)
Device Care Helpline (JioPhone, LYF Mobile & JioFi)

Our experts are available for your assistance on all Days, from 9am to 9pm

Helpline 1800-890-9999
Jio Enterprise Mobility & Business Solutions

Our experts are available for your assistance 24x7 (Monday - Sunday)

Enterprise Mobility Services 1800-889-9333
Enterprise Connectivity Services & Business Solutions	1800-889-9444
New Business Connection	1800-889-9555
Care Helpline for JioFiber Customers

Our experts are available for you

In [None]:
myregex.sub('---', text2)
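
Since the cell above has no recorded output, here is a small sketch of sub() on a hypothetical one-line string. It also shows the count argument, which limits the number of replacements, and backreferences such as \1 in the replacement text.

```python
import re

# Hypothetical line, used only for this illustration.
line = "For Queries 199, for Complaints 198"

digits_po = re.compile(r"\d+")
print(digits_po.sub("---", line))           # For Queries ---, for Complaints ---
print(digits_po.sub("---", line, count=1))  # For Queries ---, for Complaints 198

# Groups can be reused in the replacement with \1, \2, ...
spaced = re.sub(r"(\d{4})-(\d{3})-(\d{4})", r"\1 \2 \3", "1800-889-9555")
print(spaced)                               # 1800 889 9555
```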

##### split() method

In [130]:
myregex = re.compile(r'\s')

In [131]:
re.split(myregex, text2)

['JioCare',
 '',
 'Our',
 'experts',
 'are',
 'available',
 'for',
 'your',
 'assistance',
 '24x7',
 '(Monday',
 '-',
 'Sunday)',
 '',
 '',
 'Interested',
 'in',
 'Jio?',
 'Talk',
 'to',
 'us',
 'on',
 '1860-893-3333',
 'For',
 'recharge',
 'plans,',
 'data',
 'balance,',
 'validity,',
 'recharge',
 'confirmation',
 '&',
 'offers',
 '1991',
 'For',
 'Queries',
 '199',
 'For',
 'Complaints',
 '198',
 'For',
 'Other',
 'numbers',
 '1800-889-9999',
 'Tele-verification',
 'to',
 'activate',
 'both',
 'HD',
 'voice',
 '&',
 'data',
 'services',
 '1977',
 'Tele-verification',
 'to',
 'activate',
 'data',
 'services',
 'only',
 '1800-890-1977',
 'For',
 'support',
 'on',
 'International',
 'Roaming',
 '(accessible',
 'only',
 'when',
 'roaming',
 'abroad)',
 '+917018899999',
 '(charges',
 'applicable)',
 'Device',
 'Care',
 'Helpline',
 '(JioPhone,',
 'LYF',
 'Mobile',
 '&',
 'JioFi)',
 '',
 'Our',
 'experts',
 'are',
 'available',
 'for',
 'your',
 'assistance',
 'on',
 'all',
 'Days,',
 'fr

In [132]:
text4 = 'Our,experts are available for,your assistance'

In [133]:
text4.split(',')

['Our', 'experts are available for', 'your assistance']

In [134]:
myregex = re.compile(',')
re.split(myregex, text4)

['Our', 'experts are available for', 'your assistance']
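
The empty strings in the earlier split() output come from splitting on a single whitespace character where two appear in a row. Splitting on \s+ treats a run of whitespace as one separator, and maxsplit limits the number of splits. A sketch on a hypothetical `line`:

```python
import re

# Hypothetical line with repeated whitespace.
line = "Our  experts\tare\n\navailable"

print(re.split(r"\s", line))    # ['Our', '', 'experts', 'are', '', 'available']
print(re.split(r"\s+", line))   # ['Our', 'experts', 'are', 'available']
print(re.split(r"\s+", line, maxsplit=1))  # ['Our', 'experts\tare\n\navailable']
```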

##### A quick summary:
Major functions that are available in re:
* compile - Returns a RegEx pattern object.
* search - Returns a Match object for the first match, or None if there is none.
* findall - Returns a list of all matches.
* sub - Replaces one or more matches within the text.
* split - Returns a list where the text has been split at every match.
* Braces - (), {}, []
* Special characters - |, ?, *, +, ^, $, .
* Character classes:
* \d - Any decimal digit, such as 0 to 9.
* \D - Any character that is not a decimal digit.
* \w - Any letter, digit, or underscore (_).
* \W - Any character that is not a letter, digit, or underscore (_).
* \s - Any whitespace character, such as a space, tab, or newline.
* \S - Any character that is not whitespace.
* \b - Word boundary. This matches the empty position at the beginning or end of a word.

* Reference: https://regex101.com/

In [135]:
myregex = re.compile(r'[a-z0-9()\s-]+', re.I)
print(myregex.findall(text2))

['JioCare\n\nOur experts are available for your assistance 24x7 (Monday - Sunday) \n\nInterested in Jio', ' Talk to us on\t1860-893-3333\nFor recharge plans', ' data balance', ' validity', ' recharge confirmation ', ' offers\t1991\nFor Queries\t199\nFor Complaints\t198\nFor Other numbers\t1800-889-9999\nTele-verification to activate both HD voice ', ' data services\t1977\nTele-verification to activate data services only\t1800-890-1977\nFor support on International Roaming\n(accessible only when roaming abroad) ', '917018899999 (charges applicable)\nDevice Care Helpline (JioPhone', ' LYF Mobile ', ' JioFi)\n\nOur experts are available for your assistance on all Days', ' from 9am to 9pm\n\nHelpline 1800-890-9999\nJio Enterprise Mobility ', ' Business Solutions\n\nOur experts are available for your assistance 24x7 (Monday - Sunday)\n\nEnterprise Mobility Services 1800-889-9333\nEnterprise Connectivity Services ', ' Business Solutions\t1800-889-9444\nNew Business Connection\t1800-889-9555\

In [136]:
myregex = re.compile('.io')
myregex.findall(text2)

['Jio',
 'Jio',
 'tio',
 'tio',
 'tio',
 'tio',
 'Jio',
 'Jio',
 'Jio',
 'tio',
 'tio',
 'tio',
 'Jio']

In [137]:
myregex = re.compile(r'\b\w+tion\b')
myregex.findall(text2)

['confirmation', 'verification', 'verification', 'Connection']

In [138]:
myregex.sub('++++', text2)

'JioCare\n\nOur experts are available for your assistance 24x7 (Monday - Sunday) \n\nInterested in Jio? Talk to us on\t1860-893-3333\nFor recharge plans, data balance, validity, recharge ++++ & offers\t1991\nFor Queries\t199\nFor Complaints\t198\nFor Other numbers\t1800-889-9999\nTele-++++ to activate both HD voice & data services\t1977\nTele-++++ to activate data services only\t1800-890-1977\nFor support on International Roaming\n(accessible only when roaming abroad) +917018899999 (charges applicable)\nDevice Care Helpline (JioPhone, LYF Mobile & JioFi)\n\nOur experts are available for your assistance on all Days, from 9am to 9pm\n\nHelpline 1800-890-9999\nJio Enterprise Mobility & Business Solutions\n\nOur experts are available for your assistance 24x7 (Monday - Sunday)\n\nEnterprise Mobility Services 1800-889-9333\nEnterprise Connectivity Services & Business Solutions\t1800-889-9444\nNew Business ++++\t1800-889-9555\nCare Helpline for JioFiber Customers\n\nOur experts are available 
