In [1]:
import re

<h2>re.search - Find the first match anywhere</h2>
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

In [2]:
pattern = r'\d+'
sent = '52 is my lucky number 12' # re.search returns only the first match, 52

match = re.search(pattern,sent)

if match:
    print('Match ',match.group(0),' at position ',match.start())
else:
    print('No match')

Match  52  at position  0
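
The difference between `None` and a zero-length match, mentioned in the description above, shows up with a pattern that is allowed to match nothing. A minimal sketch; the names `empty` and `missing` are illustrative only:

```python
import re

# r'\d*' may match zero digits, so it matches the empty string at position 0
empty = re.search(r'\d*', 'abc')
# r'\d+' needs at least one digit, so there is no match at all
missing = re.search(r'\d+', 'abc')

print(repr(empty.group(0)), empty.span())  # '' (0, 0)
print(missing)                             # None
```

A zero-length match is still a match object, so `if empty:` is true, while `if missing:` is false.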


In [3]:
def is_integer(text):
    pattern = r'^\d+$' # one or more digits from the start to the end of the string
    match = re.search(pattern,text) # ^ and $ anchor the search to the start and the end
    return bool(match)

def is_integer_test():
    pass_list = ['12','432','2','9878','12','532']
    fail_list = ['abc','a123','3  4','123a','hardik1']
    flag = 0
    
    for item in pass_list:
        if not is_integer(item):
            print('\tItem ', item, ' not identified as an integer')
            flag=1
            
    for item in fail_list:
        if is_integer(item):
            print('\tItem ', item, ' incorrectly identified as an integer')
            flag=1
            
    if flag == 0:
        print('Test is successful')
        
    else:
        print('\nTest is not successful')
            
            

In [4]:
is_integer_test()

Test is successful


Search is better than match in this case because all the logic lives in the pattern. With re.match the logic is split: match itself anchors the start of the string, while `$` in the pattern checks the end.
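
To make the comparison concrete, here is a small sketch that runs the same input through `re.match`, `re.search` and `re.fullmatch`. The input `'123a'` and the variable names are chosen for illustration:

```python
import re

text = '123a'

match_no_anchor = bool(re.match(r'\d+', text))      # start is anchored, end is not
match_with_end = bool(re.match(r'\d+$', text))      # $ in the pattern closes the gap
search_anchored = bool(re.search(r'^\d+$', text))   # both anchors live in the pattern
full = bool(re.fullmatch(r'\d+', text))             # whole string must match, no anchors

print(match_no_anchor, match_with_end, search_anchored, full)
# True False False False

# $ also matches just before a trailing newline; fullmatch does not allow that
print(bool(re.search(r'^\d+$', '12\n')), bool(re.fullmatch(r'\d+', '12\n')))
# True False
```

If a trailing newline should not count as an integer, `re.fullmatch` is probably the safer choice.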

## re.findall - Find all the matches
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

In [5]:
text = "My runs after three games are 42, 99 and 100"

pattern = r'\d+'
match = re.findall(pattern,text)


if match:
    print('matches are - ',match)
else:
    print('No match')

matches are -  ['42', '99', '100']
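
The description above says the return value changes shape once the pattern contains groups. A short sketch of the three cases on the same text; the grouped patterns are made up for illustration:

```python
import re

text = "My runs after three games are 42, 99 and 100"

no_group = re.findall(r'\d+', text)         # full matches
one_group = re.findall(r'(\d)\d*', text)    # only the text of the single group
two_groups = re.findall(r'(\d)(\d*)', text) # one tuple per match

print(no_group)    # ['42', '99', '100']
print(one_group)   # ['4', '9', '1']
print(two_groups)  # [('4', '2'), ('9', '9'), ('1', '00')]
```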


The drawback of re.findall is that it scans the entire text and builds the complete list of matches in memory before returning, which can be slow and memory-hungry on a large corpus.
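
One way around this is to consume matches lazily with `re.finditer` (covered in the next section) and stop early. A minimal sketch, assuming only the first two matches are needed; `first_two` is an illustrative name:

```python
import re
from itertools import islice

text = "My runs after three games are 42, 99 and 100"

# finditer yields match objects one at a time; islice stops after two of them,
# so the rest of the text is never scanned
first_two = [m.group(0) for m in islice(re.finditer(r'\d+', text), 2)]
print(first_two)  # ['42', '99']
```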

## re.finditer - Iterator
Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

In [6]:
text = "My runs after three games are 42, 99 and 100"

pattern = r'\d+'
matches = re.finditer(pattern,text)


# Note: an iterator object is always truthy, so this check passes even when there are no matches
if matches:
    print('matches are - ',matches)
else:
    print('No match')
    
for temp in matches:
    print(temp.group(0), 'at position - ',temp.start(0))

matches are -  <callable_iterator object at 0x7fd12842e520>
42 at position -  30
99 at position -  34
100 at position -  41
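
Because `re.finditer` always returns an iterator object, `if matches:` cannot tell whether anything was found. One possible way to detect the empty case is to track it inside the loop. This is a sketch; the `found` flag and the digit-free text are illustrative:

```python
import re

text = "My runs after three games are"   # no digits in this text

matches = re.finditer(r'\d+', text)
always_true = bool(matches)
print(always_true)   # True, even though nothing will be yielded

found = False
for m in matches:
    found = True
    print(m.group(0), 'at position - ', m.start())

if not found:
    print('No match')
```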


<h2>groups - find sub matches </h2>
group 0 refers to the entire text that matched the pattern<br>
groups 1..n refer to the sub-groups, numbered from left to right

In [25]:
pattern = r'(?P<date>\d{2})(?P<month>\d{2})(?P<year>\d{4})'
sent = "today's data is 06022021" # eight digits read as day, month, year

match = re.search(pattern,sent)

if match:
    print('Match ',match.group(0),' at position ',match.start())
    print(match.groupdict())
    print('Date - ',match.group(1))
    print('Month - ',match.group(2))
    print('Year - ',match.group(3))
    
    print(match.group('date')) # accessing groups through name
    print(match.group('month'))
    print(match.group('year'))
else:
    print('No match')

Match  06022021  at position  16
{'date': '06', 'month': '02', 'year': '2021'}
Date -  06
Month -  02
Year -  2021
06
02
2021
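
Named groups also make it easy to pass the pieces on to other code. The sketch below assumes the digits are laid out as DDMMYYYY, as the group names suggest, and converts them into a `datetime.date`; the names `parts` and `parsed` are illustrative:

```python
import re
from datetime import date

pattern = r'(?P<date>\d{2})(?P<month>\d{2})(?P<year>\d{4})'
match = re.search(pattern, "today's data is 06022021")

if match:
    # groupdict() returns strings, so convert each value to int by name
    parts = {name: int(value) for name, value in match.groupdict().items()}
    # date() raises ValueError for impossible values such as month 13
    parsed = date(parts['year'], parts['month'], parts['date'])
    print(parsed.isoformat())  # 2021-02-06
```

The regex only checks that there are eight digits; building the `date` object is what rejects values that are not a real calendar date.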
