## re.match() vs re.findall() vs re.search()

In [1]:
import re 
text = ''' 
"One cannot have enough socks with him", said Dora. 
"Another has come and gone and I didn't get a single pair. 
People will keep on insisting on giving me books." 
Christmas Quote 
''' 
regex = 'Christ.*' 
print(re.match(regex, text),",",re.findall(regex, text)) 

None , ['Christmas Quote ']


<h3>re.match()</h3>

`re.match()` checks for a match only at the beginning of the string. If the pattern matches there, it returns a match object. If the pattern occurs only later in the string (for example, on another line), it returns `None`. That is why `re.match('Christ.*', text)` in the cell above returns `None`: the text does not start with `Christ`.

**Example**
```
import re 
result = re.match(r'Hello', 'Hello Quora Hello') 
print(result.group(0))
```
**Output**
Hello  
Here it matched the first word ‘Hello’, because that word sits at the very start of the string.
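
As a small sketch of the "beginning of the string" rule, the same kind of call returns `None` as soon as the pattern is not at position 0. The variable names `at_start` and `not_at_start` are made up for this illustration:

```python
import re

# Pattern found at position 0: re.match() returns a match object.
at_start = re.match(r'Hello', 'Hello Quora Hello')

# 'Quora' occurs later in the string, so re.match() returns None.
not_at_start = re.match(r'Quora', 'Hello Quora Hello')

print(at_start.group(0))
print(not_at_start)
```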

<h3>re.search()</h3>

`re.search()` scans the whole string and returns a match object for the first location where the pattern matches. Unlike `re.match()`, the match may start anywhere in the string, including on a later line. It returns `None` if the pattern is not found.

**Example**
```
import re 
result = re.search(r'Quora', 'Hello Quora Hello') 
print(result.group(0))
```

**Output**
Quora  

<h3>re.findall()</h3>

`re.findall()` searches for all occurrences that match a given pattern. In contrast, `re.search()` returns only the first occurrence that matches the pattern. `re.findall()` scans the whole string from left to right and returns all non-overlapping matches of the pattern as a list, in a single step.

**Example**
```
import re 
results = re.findall(r'Hello', 'Hello Quora Hello') 
 
for result in results: 
    print(result)
```
**Output**
Hello 
Hello 

In [11]:
import re 
new_input = ['18:29', '23:55', '123', 'ab:de', '18:299', '99:99']
results = lambda x: re.match('[0-9]{2}:[0-9]{2}', x) is not None
for x in new_input:
  print(results(x),end=" ")

True True False False True True 
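
Note that `'18:299'` and `'99:99'` pass the check above, because `re.match()` only anchors the start of the string and the pattern accepts any two digits. If the intent is to validate 24-hour times (an assumption, since the cell does not state it), a stricter sketch could use `re.fullmatch()` with a tighter pattern. The names `TIME_PATTERN` and `is_valid_time` are hypothetical:

```python
import re

# Assumed goal: accept only HH:MM with HH in 00-23 and MM in 00-59.
TIME_PATTERN = re.compile(r'(?:[01][0-9]|2[0-3]):[0-5][0-9]')

def is_valid_time(value):
    # fullmatch() requires the entire string to match, not just a prefix.
    return TIME_PATTERN.fullmatch(value) is not None

new_input = ['18:29', '23:55', '123', 'ab:de', '18:299', '99:99']
flags = [is_valid_time(x) for x in new_input]
print(flags)
```

With this version only the first two inputs are accepted.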

In [12]:
import re
s1 = 'Artificial Intelligence consists of Machine Learning and Deep Learning'
pattern = 'Learning'
for match in re.finditer(pattern, s1):
  s = match.start()
  e = match.end()
  print('%d:%d'%(s, e),end=" ")

44:52 62:70 

In [22]:
import re
text = ''' Extract the domain from the urls www.scaler.com'''
pattern = r'(www\.([A-Za-z_0-9-]+)(\.\w+))'  # dots escaped so they match a literal '.'

find_iter_result = re.finditer(pattern, text)
for i in find_iter_result:
  print(i)     #output1

find_all_result = re.findall(pattern, text)
for i in find_all_result:
  print(i)          #output2

<re.Match object; span=(34, 48), match='www.scaler.com'>
('www.scaler.com', 'scaler', '.com')


`Explanation:`

- `re.findall()` returns a list of the matched strings. If the regex contains more than one capture group, the output is a list of tuples instead, and each tuple holds the portions matched by the capturing groups.
So the second output (`#output2`) is `('www.scaler.com', 'scaler', '.com')`.
Therefore, `i[1]` returns `'scaler'`.

- `re.finditer()` returns an iterator of match objects, which is why the first output (`#output1`) is the match object itself. Calling `group(i)` on a match object retrieves the i-th captured group. In this case `group(1)` retrieves `'www.scaler.com'`, because the outermost parentheses form the first group, and `group(2)` retrieves `'scaler'`.
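
To make the group numbering concrete, here is a minimal sketch that pulls the individual groups out of a single match object. It uses `re.search()` for brevity and escapes the dots in the pattern so they match a literal `.`. The variable names are chosen for this illustration only:

```python
import re

text = 'Extract the domain from the urls www.scaler.com'
pattern = r'(www\.([A-Za-z_0-9-]+)(\.\w+))'

match = re.search(pattern, text)
whole = match.group(0)        # the entire match
domain = match.group(2)       # second capture group: the name after 'www.'
all_groups = match.groups()   # every capture group, as a tuple

print(whole, domain, all_groups)
```

`match.groups()` gives the same tuple that `re.findall()` produces for this match.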

In [25]:
import re
target_string = '''Emma is a basketball player who was born on June 17, 1993. 
She played 112 matches with a scoring average of 26.12 points per game. Her weight is 51 kg.'''

result = re.finditer(r"\d{2}", target_string)
for match_obj in result:
    print(match_obj.group(),end=" ")

17 19 93 11 26 12 51 

- `re.finditer()` finds all the matches and returns an iterator that yields match objects. We can iterate over it and extract the value from each match object.

- Calling `group()` on a match object returns the matched text. Here, `re.finditer(r"\d{2}", target_string)` searches for two consecutive digits in `target_string`, so longer numbers are split into non-overlapping pairs: `1993` yields `19` and `93`, while `112` yields only `11`, because the leftover `2` is not a pair.
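
If only standalone two-digit numbers are wanted (an assumption about intent, not something the cell above asks for), word boundaries `\b` can be placed around the pattern. A minimal sketch:

```python
import re

target_string = '''Emma is a basketball player who was born on June 17, 1993.
She played 112 matches with a scoring average of 26.12 points per game. Her weight is 51 kg.'''

# \b asserts a boundary between a word character and a non-word character,
# so digit pairs inside longer runs such as 1993 or 112 are skipped.
# The '.' in 26.12 is a non-word character, so 26 and 12 still count separately.
standalone = [m.group() for m in re.finditer(r"\b\d{2}\b", target_string)]
print(standalone)
```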

## re.compile in RegEx 

In [34]:
import re
text1 = '**//DataScience// - 12. '
pattern = re.compile(r'[\W_]+')
print(pattern.sub('', text1))

DataScience12


`Explanation:`

- `\W` (non-word character) is the uppercase counterpart of `\w`. It matches any single character that `\w` does not match (for ASCII text, the same as `[^a-zA-Z0-9_]`).
- The class `[\W_]+` adds the underscore, so it matches one or more consecutive characters that are neither letters nor digits.
- Therefore `pattern.sub('', text1)` replaces every such run of characters in `text1` with the empty string.
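
The main benefit of `re.compile()` is that the pattern object can be reused across many calls. A minimal sketch, where `non_alnum` and the extra sample strings are made up for illustration:

```python
import re

# Compile once, then reuse the pattern object for several calls.
non_alnum = re.compile(r'[\W_]+')

# Hypothetical sample strings.
samples = ['**//DataScience// - 12. ', 'snake_case_name', 'a-b c!']
cleaned = [non_alnum.sub('', s) for s in samples]
print(cleaned)
```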

![image.png](attachment:d7896dc7-7199-474d-8cc6-411e00660c2b.png)

In [48]:
import re
paragraph = '''
An investment of  $1 in the year 1801,  would have given you $18087791.41 today.
This is a 7.967% return on investment.
But with an investment of only $0.25 in 1801, you would end up with $4521947.8525.
'''
result = [x[0] for x in re.findall(r'(\$[0-9]+(\.[0-9]*)?)', paragraph)]
print(result)

['$1', '$18087791.41', '$0.25', '$4521947.8525']


Explanation:

- The regex `(\$[0-9]+(\.[0-9]*)?)` first matches the dollar sign `$` (escaped, because an unescaped `$` is an end-of-string anchor). Next, it matches a number with at least one digit between 0 and 9. Finally, it matches an escaped dot `.` followed by an arbitrary number of digits. This last part is optional, as indicated by the zero-or-one quantifier `?`.
- Because the regex contains two capture groups, `re.findall()` returns a list of tuples. We use a list comprehension to keep only the first element of each of the four resulting tuples, which means we extract the whole matches and not the inner decimal group.
- The result is `['$1', '$18087791.41', '$0.25', '$4521947.8525']`. Every element starts with `$`, and values such as `1801` and `7.967%` are skipped because they have no leading `$`.
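
As an alternative sketch (not the approach used in the cell above), the list comprehension can be avoided by making the groups non-capturing with `(?:...)`, so that `re.findall()` returns the whole matches directly. The variable name `amounts` is chosen for this illustration:

```python
import re

paragraph = '''
An investment of  $1 in the year 1801,  would have given you $18087791.41 today.
This is a 7.967% return on investment.
But with an investment of only $0.25 in 1801, you would end up with $4521947.8525.
'''

# (?:...) groups without capturing, so findall() returns plain strings.
amounts = re.findall(r'\$[0-9]+(?:\.[0-9]*)?', paragraph)
print(amounts)
```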

![image.png](attachment:b7ba4f3d-e93b-4105-9d56-8e6b3467cc4e.png)

In [53]:
import re 
text = "aabab"
greedy = 'a.*b'
lazy = 'a.*?b' 
print(re.findall(greedy, text)) 
print(re.findall(lazy, text)) 

['aabab']
['aab', 'ab']


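For example, consider the pattern `a.*?b` and the string `"aabab"`. The lazy quantifier `.*?` matches as few characters as possible, so the first match is `"aab"`. `re.findall()` then continues after that match and finds `"ab"`. The greedy quantifier `.*` matches as many characters as possible, so it consumes the entire `"aabab"` string in a single match.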
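
A common place where this difference matters is text with repeated delimiters. The following sketch uses a made-up markup string to show the same effect:

```python
import re

# Hypothetical markup string used only for illustration.
html = '<b>bold</b> and <i>italic</i>'

greedy_tags = re.findall(r'<.*>', html)   # runs from the first '<' to the last '>'
lazy_tags = re.findall(r'<.*?>', html)    # stops at the nearest '>'

print(greedy_tags)
print(lazy_tags)
```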



## Q. Remove all vowels from the given comment

In [85]:
import re
def remove_vowels(comment):    
    
    pattern = '[aeiouAEIOU]'
    # Replace all occurrences of vowels with an empty string
    result = re.sub(pattern, '', comment)

    return result

comment = "adam is not a good person"
remove_vowels(comment)

'dm s nt  gd prsn'

## Q. Validate debit card

![image.png](attachment:c905fe35-e966-4a2f-b018-ca6477e42e76.png)

In [86]:
import re
def card(x):
    '''
    Input x : 16-digit number, passed as a string
    Output: Return whether it's a valid debit card number or not
    '''
    pattern = r"^(4|5|6)\d{15}$"
    # Use regex to match the pattern against the debit card number
    if re.match(pattern, x):
        return 'Valid'
    else:
        return "Invalid"
    
card("4245647893112578")

'Valid'

## Password Valid or not?

![image.png](attachment:addd58fb-d19a-4d9c-8467-9a04f779d9d5.png)

In [90]:
import re
def valid_or_not(password):
    ''' password is a string
        Output -> The function is expected to be returning a string one of:
         Valid Password / Invalid Password'''
         
    length_pattern = r".{8,}"  # At least 8 characters
    uppercase_pattern = r"[A-Z]"  # At least one uppercase character
    lowercase_pattern = r"[a-z]"  # At least one lowercase character
    digit_pattern = r"\d"  # At least one digit
    special_char_pattern = r"[$#@]"  # At least one special character ($, #, or @)
    space_pattern = r"\s"  # Whitespace, which must not appear anywhere
    
    combined_pattern = f"^(?={length_pattern})(?=.*{uppercase_pattern})(?=.*{lowercase_pattern})(?=.*{digit_pattern})(?=.*{special_char_pattern})(?!.*{space_pattern}).*$"

    if re.match(combined_pattern, password):
        return 'Valid Password'
    else:
        return 'Invalid Password'

valid_or_not('Nikhil2709$')

'Valid Password'

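In this version, we define the regular expressions for each condition separately.<br> 
We then combine them into a single pattern using positive lookahead assertions (`(?=...)`) for the required conditions and a negative lookahead (`(?!...)`) to reject whitespace.<br>
Lookaheads do not consume characters, so each condition is checked against the whole password starting from `^`. This ensures that all conditions are met simultaneously.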
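
When it is useful to know which rule failed, the same conditions can be checked one at a time instead of through one combined regex. This is only a sketch of that alternative. The helper `check_rules` and its dictionary keys are hypothetical, and the length rule uses `len()` rather than `.{8,}`:

```python
import re

def check_rules(password):
    # Hypothetical helper: reports each rule separately.
    return {
        'length': len(password) >= 8,
        'uppercase': re.search(r'[A-Z]', password) is not None,
        'lowercase': re.search(r'[a-z]', password) is not None,
        'digit': re.search(r'\d', password) is not None,
        'special': re.search(r'[$#@]', password) is not None,
        'no_whitespace': re.search(r'\s', password) is None,
    }

report = check_rules('Nikhil2709$')
print(all(report.values()))

# A made-up weak password, to show which rules fail.
weak = check_rules('nikhil 2709')
failed = [rule for rule, ok in weak.items() if not ok]
print(failed)
```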

## ‘R’ or ‘r’ in Python 
A Python raw string (a literal prefixed with `r` or `R`) treats the backslash character (\) as a literal character, so escape sequences such as `\n` are not interpreted.

In [66]:
print(r'good\nboy')
r'good\nboy'

good\nboy


'good\\nboy'

In [64]:
print('good\nboy')
'good\nboy'

good
boy


'good\nboy'

In [80]:
s = "\\examplehost\\digitalocean\\content\\"
print(s)
s = r"\\examplehost\\digitalocean\\content\\"
print(s)

\examplehost\digitalocean\content\
\\examplehost\\digitalocean\\content\\
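
Raw strings matter for regular expressions because several regex tokens start with a backslash. The following sketch, with a made-up sample sentence, shows how `\b` changes meaning without the `r` prefix:

```python
import re

sentence = 'a cat sat on the catalog'

# Without the r prefix, '\b' is a backspace character, not a word boundary.
plain = re.findall('\bcat\b', sentence)

# With the r prefix, the regex engine receives \b and treats it as a word boundary.
raw = re.findall(r'\bcat\b', sentence)

print(plain)
print(raw)
```

The non-raw pattern finds nothing, because the sentence contains no backspace characters, while the raw pattern finds the standalone word `cat`.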
