# re (Regular Expression) Module
- A RegEx, or Regular Expression, is a sequence of characters that defines a search pattern. 
- It is a tool for pattern matching and string manipulation; Python's built-in `re` module uses regular expressions to search, match, split, and replace text.

__Properties of a RegEx:__
- A RegEx is a sequence of characters
- It forms a search pattern
- It is used to find a sequence of characters within a string
- It can extract and manipulate the matched text

__Return types: `findall >> list;` `search >> Match object or None;` `match >> Match object or None;` `sub >> str`__

__Functions:__
- __`re.findall():`__ Returns all non-overlapping matches of the pattern as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead (a tuple per match when there is more than one group).
- __`re.search():`__ Scans the string for the first location where the pattern matches and returns a match object, or None if there is no match. Only the first occurrence is returned.
- __`re.match():`__ It checks if the regular expression pattern matches at the beginning of the string. If there is a match, it returns a match object; otherwise, it returns None.
- __`re.sub():`__ It replaces occurrences of the regular expression pattern with a specified replacement and returns the modified string.
- __`re.split():`__ It splits the string based on occurrences of the regular expression pattern. The resulting substrings are returned as a list.
- __`re.compile():`__ Compiles a pattern into a regular expression (Pattern) object whose match(), search(), findall(), and other methods can then be called directly. This is convenient when the same pattern is used many times; the module-level functions also cache recently used patterns, so the speed-up is usually modest.
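A quick side-by-side of these functions on one made-up sentence (the sample text and patterns are illustrative assumptions, not from the notebook) shows what each one returns:

```python
import re

text = "order 12 shipped, order 7 pending"

all_numbers = re.findall(r"\d+", text)    # list of strings
first = re.search(r"\d+", text)           # Match object for the first hit
at_start = re.match(r"\d+", text)         # None: text does not start with a digit
replaced = re.sub(r"\d+", "#", text)      # a new string
parts = re.split(r",\s*", text)           # list of strings
order_re = re.compile(r"order (\d+)")     # reusable Pattern object
groups_only = order_re.findall(text)      # one group, so only its contents are returned

print(all_numbers)    # ['12', '7']
print(first.group())  # 12
print(at_start)       # None
print(replaced)       # order # shipped, order # pending
print(parts)          # ['order 12 shipped', 'order 7 pending']
print(groups_only)    # ['12', '7']
```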

__Brackets:__ 
- __`[]:`__ Square brackets define a set of characters (letters, digits, special characters, whitespace), and the set matches any single one of them at that position. Inside the brackets most special characters, such as `.`, `+`, and `*`, lose their special meaning; `^` as the first character negates the set, `-` between two characters forms a range, and `\` still escapes. ex. [0-9], [a-z], [a-zA-Z]
- __`{}:`__ Curly brackets specify how many times the preceding element repeats: {m} means exactly m times, and {m,n} means from m to n times (as many as possible). Either bound can be omitted: {,n} means 0 to n times, and {m,} means m or more times.
- __`():`__ Parentheses group part of a pattern so that a quantifier applies to the group as a whole. They also capture the matched text, which can be read back with `group(1)`, `group(2)`, and so on.
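A small sketch of the three bracket types on a made-up string (the sample text is an assumption chosen to exercise each rule):

```python
import re

s = "a+b 1.5 ababab 2024 99"

literals = re.findall(r"[+.]", s)    # inside [], '+' and '.' are plain characters
exact = re.findall(r"\d{4}", s)      # {m}: exactly 4 digits in a row
ranged = re.findall(r"\d{2,4}", s)   # {m,n}: 2 to 4 digits, as many as possible
grouped = re.search(r"(ab){3}", s)   # (): {3} repeats the whole group "ab"

print(literals)          # ['+', '.']
print(exact)             # ['2024']
print(ranged)            # ['2024', '99']
print(grouped.group())   # ababab
print(grouped.group(1))  # ab (a repeated group keeps its last repetition)
```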

__Special escape sequences:__

- __`\d:`__ Matches any single digit. __[\d == [0-9]]__
- __`\D:`__ Matches any single character that is not a digit. __[\D == [^0-9]]__
- __`\s:`__ Matches any whitespace character: space, tab (\t), newline (\n), carriage return (\r), form feed (\f), and vertical tab (\v). __[\s == [ \t\n\r\f\v]]__
- __`\S:`__ Matches any non-whitespace character. __[\S == [^ \t\n\r\f\v]]__
- __`\w:`__ Matches any word character, which includes letters (both uppercase and lowercase), digits, and the underscore. __[\w == [a-zA-Z0-9_]]__
- __`\W:`__ Matches any non-word character, i.e. anything other than a letter, digit, or underscore. __[\W == [^a-zA-Z0-9_]]__
- __`\b:`__ Matches a word boundary: the empty position between a word character and a non-word character, or at the start or end of the string. It does not consume any characters.

Note: for `str` patterns, Python makes `\d`, `\w`, and `\s` Unicode-aware, so the bracket equivalents above are exact only for ASCII text or when the `re.ASCII` flag is set.
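A short sketch of `\s`, the Unicode versus ASCII behaviour of `\w`, and `\b`, using made-up strings (the word "café" is an assumed example of a non-ASCII letter):

```python
import re

s = "tab\there\nnew line\r\nend"
spaces = re.findall(r"\s", s)                            # every whitespace character
unicode_words = re.findall(r"\w+", "café")               # default behaviour for str patterns
ascii_words = re.findall(r"\w+", "café", re.ASCII)       # \w limited to [a-zA-Z0-9_]
whole_words = re.findall(r"\bcat\b", "cat concat cat.")  # 'cat' inside 'concat' is skipped

print(spaces)         # ['\t', '\n', ' ', '\r', '\n']
print(unicode_words)  # ['café']
print(ascii_words)    # ['caf']
print(whole_words)    # ['cat', 'cat']
```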

__Special characters:__

- __`'+' (plus):`__ Matches one or more occurrences of the preceding regular expression.
- __`'*' (star):`__ Matches zero or more occurrences of the preceding regular expression.
- __`'.' (dot):`__ Matches any character except a newline. However, if the DOTALL flag is specified, it matches any character, including a newline.
- __`'?' (question mark):`__ Indicates that the preceding element is optional, matching either zero or one occurrence.
- __`'^' (caret/startswith):`__ Matches the start of the string. In MULTILINE mode, it also matches immediately after each newline.
- __`'$' (dollar/endswith):`__ Matches the end of the string, or just before a newline at the very end of the string. In MULTILINE mode, it also matches just before each newline.
- __`'|' (or):`__ Creates a RE that will match either the pattern before or after the pipe symbol.
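The MULTILINE and DOTALL flags mentioned above change how `^`, `$`, and `.` behave. A minimal sketch on an assumed two-line string:

```python
import re

text = "first line\nsecond line"

starts = re.findall(r"^\w+", text)                           # ^ only at the start of the string
starts_ml = re.findall(r"^\w+", text, re.MULTILINE)          # ^ also after each newline
ends_ml = re.findall(r"\w+$", text, re.MULTILINE)            # $ also before each newline
one_line = re.search(r"first.*line", text).group()           # '.' stops at '\n'
across = re.search(r"first.*line", text, re.DOTALL).group()  # '.' also matches '\n'

print(starts)          # ['first']
print(starts_ml)       # ['first', 'second']
print(ends_ml)         # ['line', 'line']
print(repr(one_line))  # 'first line'
print(repr(across))    # 'first line\nsecond line'
```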

In [1]:
import re

__1) re.findall(pattern, string)__

In [2]:
# Accessing numbers [0-9]

text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[0-9]{1}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['8', '9', '8', '3', '7', '4', '3', '9', '6', '8', '1', '1', '1', '0', '3', '0', '6', '2', '0', '2', '2']
21


In [3]:
text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[0-9]{2}',text1) # [condition] and {count you want}
print(result) # matches runs of exactly 2 consecutive digits, without overlapping
print(len(result))

['89', '83', '74', '39', '68', '11', '03', '06', '20', '22']
10


In [4]:
text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[0-9]{10}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['8983743968']
1


In [5]:
text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[0-5]{1}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['3', '4', '3', '1', '1', '1', '0', '3', '0', '2', '0', '2', '2']
13


In [6]:
text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[012345]{1}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['3', '4', '3', '1', '1', '1', '0', '3', '0', '2', '0', '2', '2']
13


In [7]:
# Accessing lower characters [a-z]

text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[a-z]{1}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['y', 'm', 'o', 'b', 'i', 'l', 'e', 'u', 'm', 'b', 'e', 'r', 'i', 's', 'y', 'e', 'm', 'a', 'i', 'l', 'i', 'd', 'i', 's', 'f', 'a', 'd', 't', 'a', 'r', 'e', 'o', 'm', 'k', 'a', 'r', 'g', 'm', 'a', 'i', 'l', 'c', 'o', 'm', 'o', 'd', 'a', 'y', 's', 'd', 'a', 't', 'e', 'i', 's']
55


In [8]:
text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[a-z]{3}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['mob', 'ile', 'umb', 'ema', 'fad', 'tar', 'eom', 'kar', 'gma', 'com', 'oda', 'dat']
12


In [9]:
text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[abcdefg]{2}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['be', 'fa', 'da', 'da']
4


In [10]:
text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[abcdefg]{3}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['fad']
1


In [11]:
# Accessing upper characters [A-Z]

text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[A-Z]{1}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['M', 'N', 'M', 'T']
4


In [12]:
# Accessing lower & upper characters [A-Za-z]

text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[A-Za-z]{1}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['M', 'y', 'm', 'o', 'b', 'i', 'l', 'e', 'N', 'u', 'm', 'b', 'e', 'r', 'i', 's', 'M', 'y', 'e', 'm', 'a', 'i', 'l', 'i', 'd', 'i', 's', 'f', 'a', 'd', 't', 'a', 'r', 'e', 'o', 'm', 'k', 'a', 'r', 'g', 'm', 'a', 'i', 'l', 'c', 'o', 'm', 'T', 'o', 'd', 'a', 'y', 's', 'd', 'a', 't', 'e', 'i', 's']
59


In [13]:
# Accessing lower, upper characters & numbers [A-Za-z0-9]

text1 = """ My mobile Number is 8983743968
            My email id : fadtareomkar111@gmail.com
            Today's date : 03/06/2022 """

result = re.findall('[A-Za-z0-9]{1}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['M', 'y', 'm', 'o', 'b', 'i', 'l', 'e', 'N', 'u', 'm', 'b', 'e', 'r', 'i', 's', '8', '9', '8', '3', '7', '4', '3', '9', '6', '8', 'M', 'y', 'e', 'm', 'a', 'i', 'l', 'i', 'd', 'f', 'a', 'd', 't', 'a', 'r', 'e', 'o', 'm', 'k', 'a', 'r', '1', '1', '1', 'g', 'm', 'a', 'i', 'l', 'c', 'o', 'm', 'T', 'o', 'd', 'a', 'y', 's', 'd', 'a', 't', 'e', '0', '3', '0', '6', '2', '0', '2', '2']
76


In [14]:
text1 = """ My mobile Number is 8983743968
            My email id : fadtareomkar111@gmail.com
            Today's date : 03/06/2022 """

result = re.findall('[a-z]{5}[0-9]{0,3}[@]{1}[a-z.]{4,10}',text1) # [condition] and {count you want}
print(result)
print(len(result))

['omkar111@gmail.com']
1


In [15]:
# See the difference from the example above: {,155} allows 0 to 155 letters, so the whole part before the digits is captured

text1 = """ My mobile Number is 8983743968
            My email id is: fadtareomkar111@gmail.com
            Today's date is: 03/06/2022 """

result = re.findall('[a-z]{,155}[0-9]{0,3}[@]{1}[a-z.]{4,10}',text1) 
print(result)
print(len(result))

['fadtareomkar111@gmail.com']
1


In [16]:
text = """ My email id is: viratkohli@infosys.com
                           viratkohli123@gmail.com
                           virat_kohli123@gmail.com
                           virat.kohli@gmail.com
                           virat_kohli_123@gov.co.in
                           virat_kohli_123@coep.edu.in
                           hr@capgemini.com """

# A simplified pattern that fits these samples; it is not a complete email validator
email_id = re.findall('[a-z_.]{2,20}[0-9]{0,4}[@]{1}[a-z.]{5,20}',text)
print(email_id)
print(len(email_id))

['viratkohli@infosys.com', 'viratkohli123@gmail.com', 'virat_kohli123@gmail.com', 'virat.kohli@gmail.com', 'virat_kohli_123@gov.co.in', 'virat_kohli_123@coep.edu.in', 'hr@capgemini.com']
7


In [17]:
# Accessing Aadhaar card numbers (12 digits in three groups of 4)

text = """ 1234 245 0978    
           1234 2345 0987    
           4567 2345 0987678    
           1234 2345 987 """

aadhar_num = re.findall('[0-9]{4}[ ][0-9]{4}[ ][0-9]{4}',text)
aadhar_num

['1234 2345 0987', '4567 2345 0987']
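The second result above, '4567 2345 0987', is cut out of the longer number 4567 2345 0987678. A sketch that adds `\b` at both ends, reusing the same sample numbers, keeps only numbers whose last group is exactly four digits (it checks the format only, not whether a number is valid):

```python
import re

text = """ 1234 245 0978
           1234 2345 0987
           4567 2345 0987678
           1234 2345 987 """

# Without \b, a 12-digit window can be cut out of a longer digit run
loose = re.findall(r"\d{4} \d{4} \d{4}", text)
# With \b on both ends, the digits must not continue past the last group
strict = re.findall(r"\b\d{4} \d{4} \d{4}\b", text)

print(loose)   # ['1234 2345 0987', '4567 2345 0987']
print(strict)  # ['1234 2345 0987']
```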

In [18]:
# Accessing PAN card numbers (5 uppercase letters, 4 digits, 1 uppercase letter)

text = """ ASDFB0987P
           RTYUI65780
           POIUY3456T """
pan_num = re.findall('[A-Z]{5}[0-9]{4}[A-Z]',text)
pan_num

['ASDFB0987P', 'POIUY3456T']

In [19]:
# Accessing dates 

text = """ ASDFB0987P
           RTYUI65780
           POIUY3456T
           
           Dates:
           01/03/2022
           28/02/2022
           08-02-2022
           25/05/2022 """
dates = re.findall('[0-9]{2}[/-][0-9]{2}[/-][0-9]{4}',text)
dates

['01/03/2022', '28/02/2022', '08-02-2022', '25/05/2022']

In [20]:
text = """ ASDFB0987P
           RTYUI65780
           POIUY3456T
        
           Dates:
           01/03/2022
           28/02/2022
           08-02-2022
           25/05/2022
           40-08-2021 """
dates = re.findall('[0-3][0-9][/-][0-9]{2}[/-][0-9]{4}',text)
dates

['01/03/2022', '28/02/2022', '08-02-2022', '25/05/2022']
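Wrapping parts of the date pattern in parentheses changes what `findall` returns: with several capturing groups, each match becomes a tuple. A minimal sketch on a shortened copy of the sample dates (the ISO reordering assumes day/month/year input):

```python
import re

text = "01/03/2022, 28/02/2022, 08-02-2022"

# Each capture group becomes one element of a tuple
parts = re.findall(r"(\d{2})[/-](\d{2})[/-](\d{4})", text)
# Reorder into YYYY-MM-DD, assuming the input is day/month/year
iso = [f"{y}-{m}-{d}" for d, m, y in parts]

print(parts)  # [('01', '03', '2022'), ('28', '02', '2022'), ('08', '02', '2022')]
print(iso)    # ['2022-03-01', '2022-02-28', '2022-02-08']
```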

__\d == [0-9]__

In [21]:
text1 = """ My mobile Number is 8983743968
            My email id : fadtareomkar111@gmail.com
            Today's date : 03/06/2022 """

result = re.findall(r'[\d]{10}',text1) # raw string r'...' passes backslash sequences like \d to the regex engine unchanged
print(result)
print(len(result))

['8983743968']
1


__\D == [^0-9]__

In [22]:
text1 = """ My mobile Number is 8983743968
            My email id : fadtareomkar111@gmail.com
            Today's date : 03/06/2022 """

result = re.findall(r'[\D]{2}',text1)
print(result)
print(len(result))

[' M', 'y ', 'mo', 'bi', 'le', ' N', 'um', 'be', 'r ', 'is', '\n ', '  ', '  ', '  ', '  ', '  ', ' M', 'y ', 'em', 'ai', 'l ', 'id', ' :', ' f', 'ad', 'ta', 're', 'om', 'ka', '@g', 'ma', 'il', '.c', 'om', '\n ', '  ', '  ', '  ', '  ', '  ', ' T', 'od', 'ay', "'s", ' d', 'at', 'e ', ': ']
48


__\s == [ \t\n\r\f\v]__

In [23]:
text = """ 1234 245 0978    
           1234 2345 0987    
           4567 2345 0987678    
           1234 2345 987 """
aadhar = re.findall(r'[\d]{4}[\s]{1}[\d]{4}[\s]{1}[\d]{4}',text)
aadhar

['1234 2345 0987', '4567 2345 0987']

In [24]:
text = """ 1234 245 0978    
           1234 2345 0987    
           4567 2345 0987678    
           1234 2345 987 """
aadhar = re.findall(r'\d{4}\s{1}\d{4}\s{1}\d{4}',text) # square brackets are not necessary around a single special sequence like \d, but they are needed to build a set such as [a-z]
aadhar

['1234 2345 0987', '4567 2345 0987']

__\S == [^ \t\n\r\f\v]__

In [25]:
text = """ ASDFB0987P
           RTYUI65780
           POIUY3456T
           1234 2345 0987 """
pan_num = re.findall(r'\S',text)
pan_num

['A', 'S', 'D', 'F', 'B', '0', '9', '8', '7', 'P', 'R', 'T', 'Y', 'U', 'I', '6', '5', '7', '8', '0', 'P', 'O', 'I', 'U', 'Y', '3', '4', '5', '6', 'T', '1', '2', '3', '4', '2', '3', '4', '5', '0', '9', '8', '7']

__\w == [a-zA-Z0-9_]__

In [26]:
text = """ Data science is the domain of study that deals with vast volumes of data using modern tools and 
           techniques to find unseen patterns, derive meaningful information, 
           and make business decisions. Data science uses complex machine learning algorithms to build predictive models.
           ASDFB0987P
           RTYUI65780
           POIUY3456T
           1234 2345 0987 """

result = re.findall(r'\w{2,9}',text)
print(result)

['Data', 'science', 'is', 'the', 'domain', 'of', 'study', 'that', 'deals', 'with', 'vast', 'volumes', 'of', 'data', 'using', 'modern', 'tools', 'and', 'technique', 'to', 'find', 'unseen', 'patterns', 'derive', 'meaningfu', 'informati', 'on', 'and', 'make', 'business', 'decisions', 'Data', 'science', 'uses', 'complex', 'machine', 'learning', 'algorithm', 'to', 'build', 'predictiv', 'models', 'ASDFB0987', 'RTYUI6578', 'POIUY3456', '1234', '2345', '0987']


__\W == [^a-zA-Z0-9_]__

In [27]:
text = """ Data science is the domain of study that deals with vast volumes of data
           RTYUI65780@
           POIUY3456T&
           1234 2345 0987* """

result = re.findall(r'\W',text)
print(result)

[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '@', '\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '&', '\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '*', ' ']


__\b__

In [28]:
text = """Bank IFSC Code is :
          HDFC0009873
          9736467483926
          9876543218
          ASDFG5678PA
          ASDSDVFHDBD5566POIUYY
          HGHSBMHSD4567XBCBXJHCXCMX"""

pan_num = re.findall(r'\b[A-Z]{5}\d{4}[A-Z]\b',text) # empty: 'ASDFG5678PA' has an extra trailing letter, so the final \b cannot match
pan_num

[]

In [29]:
text = """Bank IFSC Code is :
          HDFC0009873
          9736467483926
          9876543218
          ASDFG5678PA
          ASDSDVFHDBD5566POIUYY
          HGHSBMHSD4567XBCBXJHCXCMX"""

pan_num = re.findall(r'\b\d{10}\b',text)
pan_num

['9876543218']

In [30]:
text = "This is pune "
output = re.findall(r'ne\b',text)
output

['ne']

In [31]:
text = """Bank IFSC Code is :
          HDFC0009873
          9736467483926
          9876543218
          ASDFG5678P
          ASDFG56784
          ASDSDVFHDBD5566POIUYY
          HGHSBMHSD4567XBCBXJHCXCMX"""

mobile = re.findall(r'\b\d{10}\b',text)
mobile

['9876543218']

__2) re.search(pattern, string)__

In [32]:
text = """ PAN CARDs
           QWERT5678A
           QWERT3456P
           POIYU5678P
           9988776655
           1234567777
           QWEKL9876L
           3456 7890 1234 """

pan_num = re.search(r'\b[A-Z]{5}\d{4}[A-Z]',text)
print(pan_num) # prints the match object, which includes information about the match such as the matched string, start position, and end position
print(pan_num.group()) # prints the actual matched string
print(pan_num.start()) # prints the start position of the match in the original text
print(pan_num.end()) # prints the end position of the match in the original text

<re.Match object; span=(22, 32), match='QWERT5678A'>
QWERT5678A
22
32


In [33]:
text = """ PAN CARDs
           QWERT5678A
           QWERT3456P
           POIYU5678P
           9988776655
           1234567777
           QWEKL9876L
           3456 7890 1234 """

pan_num = re.search(r'\b[A-Z]{5}\d{4}[A-Z]',text)
print(pan_num)
if pan_num:
    print('Match found')
    print(pan_num.group())

<re.Match object; span=(22, 32), match='QWERT5678A'>
Match found
QWERT5678A


In [34]:
text = ' batwoman and batman'
result = re.search('bat(wo)?man', text)
result.group()

'batwoman'
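The `(wo)` part is a capturing group made optional by `?`. A sketch on two made-up strings shows how `group(1)` reports that group, returning `None` when it did not take part in the match, and how `span()` gives the start and end positions:

```python
import re

results = []
for text in ["batwoman and batman", "just batman"]:
    m = re.search(r"bat(wo)?man", text)
    # group() is the whole match; group(1) is the optional "wo" group
    results.append((m.group(), m.group(1), m.span()))

print(results)  # [('batwoman', 'wo', (0, 8)), ('batman', None, (5, 11))]
```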

__3) re.match(pattern, string)__

In [35]:
text = """ PAN CARDs
           QWERT5678A
           QWERT3456P
           POIYU5678P
           9988776655
           1234567777
           QWEKL9876L
           3456 7890 1234 """

result = re.match(r'[A-Z]{5}\d{4}[A-Z]',text) # match() only tries position 0, where the text begins with ' PAN CARDs'
if result:
    print('Match found')
    print(result.group())
    
else:
    print('No match found')

No match found


In [36]:
text = """ PAN CARDs
           QWERT5678A
           QWERT3456P
           POIYU5678P
           9988776655
           1234567777
           QWEKL9876L
           3456 7890 1234 """

result = re.match(r'[A-Z]{3}',text) # no match: the text starts with a space, not a letter
if result:
    print('Match found')
    print(result.group())
    
else:
    print('No match found')

No match found


In [37]:
text = """ python class
           QWERT5678A
           QWERT3456P
           POIYU5678P
           9988776655
           1234567777
           QWEKL9876L
           3456 7890 1234 """

result = re.match(r' python',text)
if result:
    print('Match found')
    print(result.group())
    
else:
    print('No match found')

Match found
 python
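`re.match()` anchors only at the start, `re.search()` scans the whole string, and `re.fullmatch()` requires the entire string to match. A small comparison on a shortened version of the sample text (the string is an assumption for illustration):

```python
import re

text = " python class"

at_start = re.match(r"python", text)          # None: position 0 is a space
anywhere = re.search(r"python", text)         # found at index 1
whole = re.fullmatch(r" python class", text)  # the entire string matches
partial = re.fullmatch(r" python", text)      # None: " class" is left over

print(at_start, anywhere.span(), whole is not None, partial)  # None (1, 7) True None
```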


__4) re.sub(pattern, replacement, string)__

In [38]:
string = """Data science is an interdisciplinary field uses 45,678 scientific methods"""
string = string.replace('Data science', 'Python') # plain str.replace (no regex) for comparison
string

'Python is an interdisciplinary field uses 45,678 scientific methods'

In [39]:
string = """Data science is an interdisciplinary field uses 45678 scientific methods"""
new_string = re.sub(r'\d','+', string)
print(new_string)

Data science is an interdisciplinary field uses +++++ scientific methods


In [40]:
string = """Data science is an @#$%^ interdisciplinary field uses 45678 scientific methods"""
new_string = re.sub('[^A-Za-z]','', string)
print(new_string)

Datascienceisaninterdisciplinaryfieldusesscientificmethods


In [41]:
string = """Data science is an @#$%^ interdisciplinary field uses 45678 scientific methods"""
new_string = re.sub('[^A-Za-z]',' ', string)
print(new_string)

Data science is an       interdisciplinary field uses       scientific methods
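`re.sub()` has a few more options than the examples above use: a `count` limit, a function as the replacement, and group backreferences such as `\1`. A minimal sketch on made-up strings, plus the related `re.subn()`, which also reports how many replacements were made:

```python
import re

s = "price 45678 and 12"

limited = re.sub(r"\d", "+", s, count=2)                        # only the first 2 digits
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), s)  # replacement computed per match
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")       # \1 and \2 refer to the groups
counted = re.subn(r"\d+", "N", s)                               # (new string, number of replacements)

print(limited)  # price ++678 and 12
print(doubled)  # price 91356 and 24
print(swapped)  # world hello
print(counted)  # ('price N and N', 2)
```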


__'+' (plus sign)__

In [42]:
text = """Data         science is the domain of       study that deals with     vast volumes of data using modern tools and 
techniques to find unseen patterns, derive meaningful     information, and     make business decisions.
Data science uses complex machine learning algorithms to build predictive models."""

result = re.sub(r'\s+', ' ',text)
result

'Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions. Data science uses complex machine learning algorithms to build predictive models.'

In [43]:
text = """Data @#$$%^  science is the domain of study that deals with vast volumes of data using modern tools and 
techniques to find unseen patterns, derive meaningful ##$%^&  information, and make business decisions.
Data science uses complex machine learning algorithms to build predictive models."""

result = re.sub(r'\W+', ' ',text)
result

'Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns derive meaningful information and make business decisions Data science uses complex machine learning algorithms to build predictive models '

In [44]:
text = """Data @#$$%^  science is the domain of study that deals with vast volumes of data using modern tools and 
techniques to find unseen patterns, derive meaningful ##$%^&  information, and make business decisions.
Data science uses complex machine learning algorithms to build predictive models."""

result = re.sub('[^a-zA-Z0-9]+', ' ',text)
result

'Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns derive meaningful information and make business decisions Data science uses complex machine learning algorithms to build predictive models '

In [45]:
text = 'python and data science pythonnnn pyttthon'
result = re.findall('python',text)
result

['python', 'python']

In [46]:
text = 'python and data science pythonnnn pyttthon'
result = re.findall('pyt+hon',text)
result

['python', 'python', 'pyttthon']

In [47]:
text = 'python and data science pythonnnn pyttthon pyttthonnnnn'
result = re.findall('pyt+hon',text)
result

['python', 'python', 'pyttthon', 'pyttthon']

__'\*' (star sign)__

In [48]:
text = 'python and data science pyhonnnn pyttthon pyttthooonnnnn'
result = re.findall('pyt*hon',text)
result

['python', 'pyhon', 'pyttthon']

In [49]:
text = 'python and data science pyhonnnn pyttthon pyttthnnnnn'
result = re.findall('pyt*ho*n',text)
result

['python', 'pyhon', 'pyttthon', 'pyttthn']

In [50]:
text = 'python and data science pyhon  pytthhhhon'
result = re.findall('pyt*h+on',text)
result

['python', 'pyhon', 'pytthhhhon']

__'.' (dot sign)__

In [51]:
text = 'python and data science pyhon pytthhhhon'
result = re.findall('py..on', text)
result

['python']

In [52]:
text = 'python and data science pyhon pytthhhhon pysuon'
result = re.findall('sci..ce', text)
result

['science']

In [53]:
text = 'python and data science pyhon pytthhhhon pysuon'
result = re.findall('p...on', text)
result

['python', 'pysuon']

In [54]:
string = '8812345656 8812343030 9812343030 7812343030 8898763030 8823456700'
result = re.findall('9.......30', string)
result

['9812343030']

__'?' (question mark sign)__

In [55]:
text  = 'python and data science pyhon pytthhhhon'
result = re.findall('pyt?hon',text)
result

['python', 'pyhon']

In [56]:
# Difference between '*', '+' and '?'

text  = 'python and data science pyhon pytthon'
result = re.findall('pyt*hon',text)
result

['python', 'pyhon', 'pytthon']

In [57]:
text  = 'python and data science pyhon pytthon'
result = re.findall('pyt?hon',text)
result

['python', 'pyhon']

In [58]:
text  = 'python and data science pyhon pytthon'
result = re.findall('pyt+hon',text)
result

['python', 'pytthon']

__'^' (caret sign): startswith__

In [59]:
string = '8812345656 8812343030 9812343030 7812343030 8898763030 8823456700'
result = re.findall('^8', string)
result

['8']

In [60]:
string = '7812345656 8812345656 8812343030 9812343030 7812343030 8898763030 8823456700'
result = re.findall(r'^\d', string)
result

['7']

In [61]:
string = '7812345656 8812345656 8812343030 9812343030 7812343030 8898763030 8823456700'
result = re.findall('^[987]', string) # caret outside the square brackets anchors the match: the string must start with 9, 8, or 7
result

['7']

In [62]:
string = '7812345656 8812345656 8812343030 9812343030 7812343030 8898763030 8823456700'
result = re.findall('[^987]', string) # caret as the first character inside the square brackets negates the set: any character except 9, 8, or 7
result

['1', '2', '3', '4', '5', '6', '5', '6', ' ', '1', '2', '3', '4', '5', '6', '5', '6', ' ', '1', '2', '3', '4', '3', '0', '3', '0', ' ', '1', '2', '3', '4', '3', '0', '3', '0', ' ', '1', '2', '3', '4', '3', '0', '3', '0', ' ', '6', '3', '0', '3', '0', ' ', '2', '3', '4', '5', '6', '0', '0']

In [63]:
string = 'Python and data science pyton pytthon'
result = re.findall('^[a-zA-Z]', string)
result

['P']

In [64]:
string = 'Python and data science pyton pytthon'
result = re.findall('^[a-zA-Z]{6}', string)
result

['Python']

__'$' (dollar sign): endswith__

In [65]:
string = 'Python and data science pyton pytthon'
result = re.findall('n$', string)
result

['n']

In [66]:
string = 'Python and data science pyton pytthon'
result = re.findall('[a-z]$', string)
result

['n']

In [67]:
string = '   Python and data science pyton pytthon 12345   '
result = re.findall('[a-z0-9]$', string) # no match: the string ends with whitespace characters
result

[]

In [68]:
string = '   Python and data science pyton pytthon 12345   '
string = string.strip() # use strip() to remove leading and trailing spaces
result = re.findall('[a-z0-9]$', string)
result

['5']

In [69]:
text = """Data science is the domain of study that deals with vast volumes of data using modern 
tools and techniques to find unseen patterns"""
result = re.findall('^D......', text)
result

['Data sc']

In [70]:
text = """Data science is the domain of study that deals with vast volumes of data using modern 
tools and techniques to find unseen patterns"""
result = re.findall('^D.*', text)
result

['Data science is the domain of study that deals with vast volumes of data using modern ']

In [71]:
text = """Data science is the domain of study that deals with vast volumes of data using modern 
tools and techniques to find unseen patterns"""
result = re.findall('^D.+', text)
result

['Data science is the domain of study that deals with vast volumes of data using modern ']

__'|' (or sign)__

In [72]:
text = """Data science is the domain of study that deals with vast volumes of data using modern 
tools and techniques to find unseen patterns"""
result = re.findall('Data|science', text) # each word occurs once
result

['Data', 'science']

In [73]:
text = """Data science is the domain of study that Data deals with vast volumes of science data using modern 
tools and techniques to find unseen patterns"""
result = re.findall('Data|science', text) # each word occurs twice
result

['Data', 'science', 'Data', 'science']

In [74]:
text = """Data science is the domain of study that Data deals with vast volumes of science data using modern 
tools and techniques to find unseen patterns 2345"""
result = re.findall('Data|science|2345', text)
result

['Data', 'science', 'Data', 'science', '2345']

In [75]:
text = """27-05-2022 and 27/05/2022"""
result = re.findall(r'\d{2}[-/]\d{2}[-/]\d{4}', text) # a character set covers both separators without needing '|'
result

['27-05-2022', '27/05/2022']

In [76]:
text = """27-05-2022 and 27/05/2022"""
result = re.findall(r'\b\d{2}[-]\d{2}[-]\d{4}\b|\d{2}[/]\d{2}[/]\d{4}', text)
result

['27-05-2022', '27/05/2022']

In [77]:
text = """27-05-2022 and 27/05/2022, '27 May 2022'"""
result = re.findall(r'\b\d{2}[-/]\d{2}[-/]\d{4}\b|\d{2}[ ][A-Za-z]{3}[ ]\d{4}', text)
result

['27-05-2022', '27/05/2022', '27 May 2022']

__5) re.split(pattern, string, maxsplit=0)__

In [78]:
text = """Data science is the domain of study that Data deals with vast volumes of science data using modern 
tools and techniques to find unseen patterns 2345 1234"""
list1 = text.split()
print(list1)

['Data', 'science', 'is', 'the', 'domain', 'of', 'study', 'that', 'Data', 'deals', 'with', 'vast', 'volumes', 'of', 'science', 'data', 'using', 'modern', 'tools', 'and', 'techniques', 'to', 'find', 'unseen', 'patterns', '2345', '1234']


In [79]:
text = """Data science 1 is the domain of study that Data deals with vast volumes of science data using modern 
tools and techniques to find unseen patterns 2345 1234"""
list1 = text.split('1')
print(list1)

['Data science ', ' is the domain of study that Data deals with vast volumes of science data using modern \ntools and techniques to find unseen patterns 2345 ', '234']


In [80]:
text = """python A and B data science"""
list1 = re.split('[A-Z]',text)
print(list1)

['python ', ' and ', ' data science']


In [81]:
text = """99,33,44,12-34-67-51"""
list1 = re.split('[,-]',text)
print(list1)

['99', '33', '44', '12', '34', '67', '51']


In [82]:
text = """99,33,44,12-34-67-51"""
list1 = re.split(',|-',text)
print(list1)

['99', '33', '44', '12', '34', '67', '51']
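The `maxsplit` parameter from the heading limits the number of splits, and a capturing group in the pattern keeps the separators in the result. A short sketch reusing the sample string above plus one made-up string:

```python
import re

text = "99,33,44,12-34-67-51"

first_two = re.split(r"[,-]", text, maxsplit=2)  # stop after 2 splits; the rest stays joined
with_seps = re.split(r"([,-])", "1,2-3")         # a capturing group keeps the separators

print(first_two)  # ['99', '33', '44,12-34-67-51']
print(with_seps)  # ['1', ',', '2', '-', '3']
```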


__6) re.compile(pattern)__

In [83]:
string5 = '2123-456-9876'
result = re.findall(r'\d{3}[-]\d{3}[-]\d{4}',string5)
result

['123-456-9876']

In [84]:
pattern = re.compile(r'(\d{3}[-])?\d{3}[-]\d{4}')
string1 = '123-456-9876'
result = pattern.search(string1)
result.group()

'123-456-9876'

In [85]:
string1 = '123-456-9876'
string2 = '456-9876'
string3 = '123-9876'
string4 = '567-456-9876'
string5 = '2123-456-9876'
string6 = '123-456-9876'

In [86]:
pattern = re.compile(r'(\d{3}[-])?\d{3}[-]\d{4}')

In [87]:
result1 = pattern.search(string1)
result2 = pattern.search(string2)
result3 = pattern.search(string3)
result4 = pattern.search(string4)
result5 = pattern.search(string5)
result6 = pattern.search(string6)

print(result1.group())
print(result2.group())
print(result3.group())
print(result4.group())
print(result5.group())
print(result6.group())

123-456-9876
456-9876
123-9876
567-456-9876
123-456-9876
123-456-9876


In [88]:
# For example

# prog = re.compile(pattern) # We can use this again and again
# result = prog.match(string)

# Which is equivalent to
# result = re.match(pattern, string) 
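A sketch of the compile-once, reuse-many-times idea, applied to some of the sample strings plus one made-up string without a number. It assumes a non-capturing group `(?:...)` instead of the capturing `(\d{3}[-])?` used above: with the capturing version, `findall` would return only the group text (such as '123-' or an empty string) rather than the whole number.

```python
import re

# Compile once, then reuse the Pattern object for every string
phone = re.compile(r"(?:\d{3}-)?\d{3}-\d{4}")
samples = ["123-456-9876", "456-9876", "2123-456-9876", "no number here"]

found = []
for s in samples:
    m = phone.search(s)
    found.append(m.group() if m else None)  # search() returns None when nothing matches

all_hits = phone.findall("call 123-456-9876 or 456-9876")

print(found)     # ['123-456-9876', '456-9876', '123-456-9876', None]
print(all_hits)  # ['123-456-9876', '456-9876']
```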