# Python Assignment 17

### Answer 1

* Greedy: By default, regular expression quantifiers such as `*`, `+`, `?` and `{m,n}` are greedy. They match as many characters as possible while still letting the rest of the pattern match.
* Non-greedy: A non-greedy (lazy) quantifier matches as few characters as possible. It is written by adding `?` after the quantifier: `*?`, `+?`, `??` or `{m,n}?`.

```python
import re

text = "emailHome:rice@ineuron.ai, emailOffice:onion@ineuron.ai, emailShop:random@ineuron.ai"

greedy = re.compile(r'email.+:\w+@ineuron\.ai')

# .+ first runs to the end of the string and then backtracks only as far as
# the last ':', so the whole string becomes a single match
greedy_res = greedy.findall(text)

non_greedy = re.compile(r'email.+?:\w+@ineuron\.ai')

# .+? stops at the first ':' that lets the rest of the pattern match,
# so the result is three separate emails
non_greedy_res = non_greedy.findall(text)

print('Result is whole string in greedy:\n', greedy_res)
print('Result is three strings in non greedy:\n', non_greedy_res)
```

```
Result is whole string in greedy:
 ['emailHome:rice@ineuron.ai, emailOffice:onion@ineuron.ai, emailShop:random@ineuron.ai']
Result is three strings in non greedy:
 ['emailHome:rice@ineuron.ai', 'emailOffice:onion@ineuron.ai', 'emailShop:random@ineuron.ai']
```


### Answer 2

A greedy quantifier matches as many characters as possible, while a non-greedy quantifier matches as few as possible. When a pattern written with a greedy quantifier should behave non-greedily, append the `?` modifier to that quantifier: `*` becomes `*?`, `+` becomes `+?`, `?` becomes `??` and `{m,n}` becomes `{m,n}?`.
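
As a sketch of the `?` modifier in practice (the sample strings are made up for illustration), the same pattern behaves differently once `?` is appended to its quantifier:

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy .* runs on to the last '>' in the string
greedy_tags = re.findall(r'<.*>', text)
# Non-greedy .*? stops at the first '>' that completes a match
lazy_tags = re.findall(r'<.*?>', text)

print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']

# The same '?' modifier works on the other quantifiers
digits = '12345'
plus, lazy_plus = re.match(r'\d+', digits).group(), re.match(r'\d+?', digits).group()
braces, lazy_braces = re.match(r'\d{2,4}', digits).group(), re.match(r'\d{2,4}?', digits).group()

print(plus, lazy_plus)      # 12345 1
print(braces, lazy_braces)  # 1234 12
```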

### Answer 3

In a simple match that looks for only one match and performs no replacement, using a non-tagged (non-capturing) group is unlikely to make any practical difference: the overall match returned by `group()` is the same either way. A non-capturing group, written `(?:...)`, only stops the group's text from being stored and numbered as a separate group.
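
A quick check of that claim, using an illustrative pattern and string: the whole match is identical, and only the recorded groups differ.

```python
import re

text = 'order: ababc'

capturing = re.search(r'(ab)+c', text)
non_capturing = re.search(r'(?:ab)+c', text)

# The overall match (group 0) is the same for both patterns
print(capturing.group(), non_capturing.group())    # ababc ababc
# Only the capturing version stores a group (its last repetition)
print(capturing.groups(), non_capturing.groups())  # ('ab',) ()
```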

### Answer 4

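Here is a scenario in which using a non-tagged group has a significant impact on the program's output. When a pattern contains more than one capturing group, `re.findall` returns a list of tuples (one item per group); when it contains exactly one, it returns a list of strings.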

```python
import re

urls = 'https://www.github.com/ganeshss0\nhttps://ineuron.ai\ngithub.com/iNeuronai'

# Two capturing groups, so findall returns a list of (prefix, address) tuples
tagged = re.compile(r'(https://w*\.?)?(\S+\.\S+)')

# The prefix group is non-capturing, so only one group is left and
# findall returns a list of plain strings
non_tagged = re.compile(r'(?:https://w*\.?)?(\S+\.\S+)')

print('Tagged Pattern:\n', tagged.findall(urls))
print('Non Tagged Pattern:\n', non_tagged.findall(urls))
```

```
Tagged Pattern:
 [('https://www.', 'github.com/ganeshss0'), ('https://', 'ineuron.ai'), ('', 'github.com/iNeuronai')]
Non Tagged Pattern:
 ['github.com/ganeshss0', 'ineuron.ai', 'github.com/iNeuronai']
```
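
Group numbering in replacements is another place where the choice matters. In this sketch (the text and the `Date|Due` label are illustrative), the replacement string `\1` is the same in both calls, but making the label group non-capturing shifts the year into group 1, so the output changes:

```python
import re

text = 'Due: 2024-05-17'

# The label group captures, so \1 is the label and the year is group 2
tagged = r'(Date|Due): (\d{4})-(\d{2})-(\d{2})'
# The label group does not capture, so \1 is now the year
non_tagged = r'(?:Date|Due): (\d{4})-(\d{2})-(\d{2})'

tagged_res = re.sub(tagged, r'\1', text)
non_tagged_res = re.sub(non_tagged, r'\1', text)

print(tagged_res)      # Due
print(non_tagged_res)  # 2024
```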


### Answer 5

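A look-ahead checks the characters that follow the current position without consuming them, so those characters are not part of the match. This matters, for example, when extracting dates from a particular year: if the year is placed inside a look-ahead, it still decides which dates match, but it is left out of the returned text, so the result differs from what we expect.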

```python
import re

dates = '12/01/2002, 13/02/2000, 26/11/2008, 26/11/2009, 11/09/2000, 21/10/2003'

# The year is consumed, so it is part of each match
only_2000_y = re.compile(r'\d+/\d+/2000')
# The year is only checked by the look-ahead and is not consumed
only_2000 = re.compile(r'\d+/\d+/(?=2000)')

print('Expected:\n', only_2000_y.findall(dates))

# The look-ahead selects the right dates, but the year is missing from each match
print('Result from look-ahead:\n', only_2000.findall(dates))
```

```
Expected:
 ['13/02/2000', '11/09/2000']
Result from look-ahead:
 ['13/02/', '11/09/']
```
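
Overlapping matches are another situation where non-consumption changes the result. A consuming pattern moves past the characters it matched, so `findall` cannot reuse them; a capturing group wrapped in a look-ahead is zero-width, so a match is attempted at every position. The digit string is an illustrative example.

```python
import re

digits = '12345'

# Consuming: each pair uses up its two digits, so matches cannot overlap
consuming = re.findall(r'\d\d', digits)
# Look-ahead: consumes nothing, while the group inside still records the pair
overlapping = re.findall(r'(?=(\d\d))', digits)

print(consuming)    # ['12', '34']
print(overlapping)  # ['12', '23', '34', '45']
```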


### Answer 6

* Positive look-ahead: The regex engine checks whether a particular element (a character, a sequence of characters or a group) comes immediately after the item matched. The match succeeds only if that element is present, and the element itself is not included in the match. For example, `r'rice(?= onion)'` matches `'rice'` only if it is followed by `' onion'`.
* Negative look-ahead: The engine performs the same check, but the match succeeds only if the element is absent. For example, `r'rice(?! onion)'` matches `'rice'` only if it is __not__ followed by `' onion'`.
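
Both look-aheads can be compared side by side with `re.sub`; the sample sentence is made up for illustration. Because the look-ahead does not consume `' onion'`, only `'rice'` is replaced:

```python
import re

text = 'rice onion, rice tomato, rice onion'

# Positive look-ahead: replace 'rice' only where ' onion' follows
positive = re.sub(r'rice(?= onion)', 'RICE', text)
# Negative look-ahead: replace 'rice' only where ' onion' does not follow
negative = re.sub(r'rice(?! onion)', 'RICE', text)

print(positive)  # RICE onion, rice tomato, RICE onion
print(negative)  # rice onion, RICE tomato, rice onion
```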

### Answer 7

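Referring to groups by name rather than by number is very useful when a regular expression contains several groups. A name such as `pass_id` documents what the group holds, and the reference keeps working even if other groups are later added before it, which would shift the group numbers.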

```python
import re

ids = 'A1243, C4321, B4613, A6132, P9798, I0124'

# Capture the four digits of IDs that start with A or B, using a named group
named = re.compile(r'[AB](?P<pass_id>[0-9]{4})')

# The same pattern with an unnamed group, which is referred to as group 1
without_named = re.compile(r'[AB]([0-9]{4})')

print('Referring by name:')
for match in named.finditer(ids):
    print(match.group('pass_id'))

print('Referring by number:')
for match in without_named.finditer(ids):
    print(match.group(1))
```

```
Referring by name:
1243
4613
6132
Referring by number:
1243
4613
6132
```
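
Names also help in replacements: `re.sub` accepts `\g<name>` in the replacement string, and `groupdict()` returns every named group of a match at once. The date text and the group names `year`, `month` and `day` are illustrative.

```python
import re

text = 'Issued 2024-05-17, expires 2025-05-17'

date = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

# \g<name> refers to a named group inside the replacement string
reordered = date.sub(r'\g<day>/\g<month>/\g<year>', text)
print(reordered)  # Issued 17/05/2024, expires 17/05/2025

# groupdict() maps each group name to the text it captured
first = date.search(text).groupdict()
print(first)  # {'year': '2024', 'month': '05', 'day': '17'}
```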


### Answer 8

Yes, we can find a repeated word in a string with a named group, by referring back to the group with the named backreference `(?P=name)`.

```python
import re

text = 'The cow jumped over the moon.'

# Match a whole word only if the same word (ignoring case) appears again later;
# (?P=repeated_word) refers back to the text captured by the named group
repeated = re.compile(r'(?P<repeated_word>\b\w+\b)(?=.*\b(?P=repeated_word)\b)',
                      flags=re.IGNORECASE)

for match in repeated.finditer(text):
    print(match.groupdict())
```

```
{'repeated_word': 'The'}
```
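
A related task is catching accidentally doubled words such as "the the". This sketch assumes only immediately repeated words separated by whitespace should count, and the sample sentence is illustrative: the named backreference must match exactly what the `word` group captured, and `re.sub` can then collapse each pair into a single word.

```python
import re

text = 'Paris in the the spring is is lovely.'

# (?P=word) must repeat exactly what the 'word' group captured
doubled = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b', flags=re.IGNORECASE)

repeats = [m.group('word') for m in doubled.finditer(text)]
cleaned = doubled.sub(r'\g<word>', text)

print(repeats)  # ['the', 'is']
print(cleaned)  # Paris in the spring is lovely.
```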


### Answer 9

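The `Scanner` class in the `re` module is very helpful for tokenizing a string. It pairs each pattern with an action, so a single call to `scan` both splits the string into tokens and labels each token according to the pattern that matched it. `findall` returns only the matched text, so it cannot tell us which pattern produced each token. Note that `Scanner` is not part of the documented `re` API.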

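```python
import re
from pprint import pprint

# Each entry pairs a pattern with an action; the action receives the scanner
# and the matched text, and None means "match but discard"
my_tokens = re.Scanner([
    (r'[0-9]+', lambda scan, token: ('Integer', token)),
    (r'[a-z]+', lambda scan, token: ('String', token)),
    (r'[,.]+', lambda scan, token: ('Punctuation', token)),
    (r'\s+', None),
])

# scan returns the list of tokens and any trailing text it could not match
res, rem = my_tokens.scan('rice 20, onion 10, \nflower 30')
pprint(res)
```

```
[('String', 'rice'),
 ('Integer', '20'),
 ('Punctuation', ','),
 ('String', 'onion'),
 ('Integer', '10'),
 ('Punctuation', ','),
 ('String', 'flower'),
 ('Integer', '30')]
```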
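
`scan` stops at the first character that no pattern matches and returns the unscanned rest as the second element of its result, which gives a simple way to detect unexpected input. This is a sketch with an illustrative lexicon; the `$` in the input is there only to trigger the stop, and since `Scanner` is undocumented its behaviour may differ between Python versions.

```python
import re

scanner = re.Scanner([
    (r'\d+', lambda scan, token: ('Integer', int(token))),
    (r'[a-z]+', lambda scan, token: ('String', token)),
    (r'\s+', None),  # match whitespace but emit no token
])

tokens, remainder = scanner.scan('rice 20 $ onion 10')

print(tokens)           # [('String', 'rice'), ('Integer', 20)]
print(repr(remainder))  # '$ onion 10' (scanning stopped at '$')
```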

### Answer 10

No, it is not mandatory to name the `Scanner` object `scanner`. It is an ordinary Python object, so it can be bound to any variable name; the previous answer uses `my_tokens`.