Q1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words
as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy
one? What character or characters can you introduce or change?

Answer- Greedy and non-greedy syntax in regular expressions are used to control the behavior of matching patterns.

1. Greedy syntax: The greedy quantifiers match as much as possible. They try to consume the maximum number of characters that satisfy the pattern.

2. Non-greedy syntax: The non-greedy or lazy quantifiers match as little as possible. They try to consume the minimum number of characters that satisfy the pattern.

To transform a greedy pattern into a non-greedy one, the minimum effort required is to add a ? character after the quantifier.

For example:

Greedy pattern: .*
Non-greedy pattern: .*?
In the above example, .* is a greedy pattern that matches any number of characters. Adding ? after .* changes it into a non-greedy pattern that matches as few characters as possible.

Using visual terms, we can think of the greedy syntax as a "greedy eater" that tries to consume as much as it can, while the non-greedy syntax is a "reluctant eater" that takes only what it needs.

By adding the ? character after the quantifier, we modify the behavior of the pattern from greedy to non-greedy, enabling it to match the minimum instead of the maximum.






In [4]:
import re
print(re.findall("v*", "vvvvvv")) # Greedy match syntax
print(re.findall("v*?", "vvvvvv")) # Non-greedy match syntax


['vvvvvv', '']
['', 'v', '', 'v', '', 'v', '', 'v', '', 'v', '', 'v', '']
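
A slightly more realistic illustration, offered as a minimal sketch: the sample string `html` and the variable names are made up for this example and are not part of the original exercise. It shows the greedy pattern running to the last `>` in the string, while the non-greedy one stops at the first `>` it can.

```python
import re

# Hypothetical sample text, chosen only to show the two behaviours.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as possible, so the match ends at the LAST '>'.
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? grabs as little as possible, so each match ends at the
# FIRST '>' that lets the pattern succeed.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```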


Q2. When exactly does greedy versus non-greedy make a difference?  What if you're looking for a
non-greedy match but the only one available is greedy?

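Answer- Greedy versus non-greedy only makes a difference when the quantified part of the pattern could stop at more than one place and still let the overall pattern succeed. The greedy form then takes the longest such repetition and the non-greedy form takes the shortest. If the text offers only one way to match, for example because the delimiter that follows the quantifier occurs just once, both forms return the same result: a non-greedy quantifier keeps extending until the rest of the pattern can match, so it settles on the "greedy" match when that is the only one available.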
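
The two cases can be compared side by side. This is a minimal sketch; the sample strings `one_close` and `two_close` are invented for illustration.

```python
import re

one_close = "<tag>"    # only one '>' to stop at
two_close = "<a><b>"   # two possible stopping points

# With a single '>', both quantifiers are forced to the same match.
same_greedy = re.search(r"<.+>", one_close).group()
same_lazy = re.search(r"<.+?>", one_close).group()
print(same_greedy, same_lazy)  # <tag> <tag>

# With two '>' characters, the choice of quantifier changes the result.
diff_greedy = re.search(r"<.+>", two_close).group()
diff_lazy = re.search(r"<.+?>", two_close).group()
print(diff_greedy, diff_lazy)  # <a><b> <a>
```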

Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is
the use of a nontagged group likely to make any practical difference?

Answer- In a simple match of a string where you are only looking for one match and not performing any replacements, the use of a non-tagged group will not make any practical difference in terms of the result of the match.

Non-tagged groups, also known as non-capturing groups, are used in regular expressions for grouping patterns without capturing the matched substring. They are typically denoted by (?:pattern), where pattern represents the regular expression pattern.

Non-tagged groups are useful when you want to group a pattern for purposes such as applying quantifiers or alternations, but you don't need to capture the matched substring as a separate group.

However, in a simple match scenario where you are not capturing or referencing the matched substring, using a non-tagged group or omitting the group altogether will have the same outcome. The overall match result will be the same, regardless of whether a non-tagged group is used or not.

So, in the context of a simple match without any capturing or referencing needs, the use of a non-tagged group will not have any practical difference.

In [5]:
import re
phoneNumRegex = re.compile(r'\d\d\d')
num = phoneNumRegex.search('My number is 234-567-8901.')
print(f'Phone number found -> {num.group()}') # group() returns the whole match
print(f'Phone number found -> {num.group(0)}') # group(0) is the same whole match


Phone number found -> 234
Phone number found -> 234
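
To compare tagged and non-tagged groups directly, the sketch below runs the same single search with both forms. The two patterns are illustrative choices, not taken from the original cell. The overall match is identical; only the stored sub-groups differ.

```python
import re

text = 'My number is 234-567-8901.'

# Same pattern twice: once with capturing groups, once with non-capturing.
tagged = re.search(r'(\d{3})-(\d{3})-(\d{4})', text)
untagged = re.search(r'(?:\d{3})-(?:\d{3})-(?:\d{4})', text)

# The whole match is the same either way.
print(tagged.group())    # 234-567-8901
print(untagged.group())  # 234-567-8901

# Only the captured sub-groups differ.
print(tagged.groups())    # ('234', '567', '8901')
print(untagged.groups())  # ()
```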


Q4. Describe a scenario in which using a nontagged category would have a significant impact on the
program's outcomes.

Answer- One scenario where using a non-tagged category has a significant impact on the program's outcomes is when a pattern contains several groups and you extract information through a call whose return value depends on the capturing groups, such as re.findall, re.split, or matobj.groups().

By default, every parenthesised group captures its matched substring and receives a number. re.findall then returns the captured groups instead of the whole match, re.split includes the captured text in its output, and each extra group shifts the numbers of the groups after it. Capturing groups you do not need also adds some bookkeeping to every match.

In such a scenario, using non-tagged groups makes a difference. Converting a group to (?:...) keeps it available for quantifiers and alternation but removes it from the results, so only the groups you care about are returned. In the example below, the separator is grouped without being captured, so groups() returns just the two numbers.

In [6]:
import re
text='135.456'
pattern=r'(\d+)(?:\.)(\d+)'  # escape the dot so it matches only a literal '.'
regobj=re.compile(pattern)
matobj=regobj.search(text)
matobj.groups()


('135', '456')
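
The effect is most visible with re.findall, which returns the captured groups rather than the whole match whenever the pattern contains any. The following is a minimal sketch; the sample sentence and variable names are assumptions made for illustration.

```python
import re

text = "Mr. Smith met Ms. Jones"

# Capturing group: findall returns only what the group captured.
capturing = re.findall(r"(Mr|Ms)\. \w+", text)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r"(?:Mr|Ms)\. \w+", text)

print(capturing)      # ['Mr', 'Ms']
print(non_capturing)  # ['Mr. Smith', 'Ms. Jones']
```

Here the group is needed only to scope the alternation, so leaving it as a capturing group changes what the program receives.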

Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it
examines. Describe a situation in which this could make a difference in the results of your
programme.

Answer- A situation where the non-consuming nature of a look-ahead condition in a regex pattern can make a difference in the results of a program is when you need to match a specific pattern that is followed by another pattern but you don't want to include the second pattern in the overall match.

For example, let's say you have a text document containing email addresses, and you want to extract all email addresses that are followed by the word "example" but without including "example" in the matched result.

Using a look-ahead condition in the regex pattern allows you to assert the presence of "example" after the email address without actually consuming or including it in the matched result. This can be achieved by using the syntax (?=pattern) for positive look-ahead.

Consider the following example regex pattern: (\w+@\w+\.\w+)(?=\s+example). In this pattern, we capture the email address using (\w+@\w+\.\w+) and then use the positive look-ahead (?=\s+example) to assert that the email address is followed by one or more whitespace characters and the word "example."
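
The pattern above can be tried out as follows. This is a sketch under stated assumptions: the sample text and addresses are invented, and the simple `\w+@\w+\.\w+` expression is the one from the answer, not a complete email validator.

```python
import re

# Hypothetical sample text with three made-up addresses.
text = "alice@mail.com example, bob@mail.com other, carol@mail.org example"

pattern = re.compile(r"(\w+@\w+\.\w+)(?=\s+example)")

# Only addresses followed by "example" match, and "example" itself
# is not part of the result.
matches = pattern.findall(text)
print(matches)  # ['alice@mail.com', 'carol@mail.org']

# The look-ahead did not consume anything: the match ends right after
# the address, so " example" is still unread text.
first = pattern.search(text)
print(repr(text[first.end():first.end() + 8]))  # ' example'
```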

Q6. In regular expressions, what is the difference between positive look-ahead and negative
look-ahead?

Answer- Positive look-ahead, written (?=...), is an assertion that succeeds only if the text immediately after the current position matches the given sub-pattern, e.g. pattern = r'abc(?=[A-Z])'. Here, after matching 'abc', the look-ahead checks that the next character is an uppercase letter between A and Z, but it does not consume that character, so the character is not part of the match. Example of positive look-ahead:

In [7]:
import re
pat=r'abc(?=[A-Z])'
text="abcABCEF"
regobj=re.compile(pat)
matobj=regobj.findall(text)
print("Positive lookahead:",matobj)


Positive lookahead: ['abc']


Negative look-ahead, written (?!...), is an assertion that succeeds only if the text immediately after the current position does not match the given sub-pattern, e.g. pattern = r'abc(?!abc)' matches an 'abc' that is not immediately followed by another 'abc'. Example of negative look-ahead:

In [8]:
import re
pat1=r'abc(?!abc)'
text1="aeiouabcabc"
regobj1=re.compile(pat1)
matobj1=regobj1.findall(text1)  # search text1, the string defined for this example
print("Negative look ahead:",matobj1) 


Negative look ahead: ['abc']


Q7. What is the benefit of referring to groups by name rather than by number in a regular
expression?

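Answer- Referring to groups by name rather than by number in a regular expression keeps the code clear and easy to understand: a name such as 'year' documents what the group holds, and it keeps working when groups are added, removed, or reordered, whereas numeric indexes would shift.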
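
As a minimal sketch of the idea, the date pattern and the group names `year`, `month`, and `day` below are assumptions chosen for illustration:

```python
import re

date_re = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = date_re.search('Released on 2023-07-15.')

# Access by name reads clearly and survives changes to the group order.
year = m.group('year')

# Access by number still works, but depends on the group's position.
year_by_number = m.group(1)

# All named groups can be fetched at once as a dictionary.
parts = m.groupdict()

print(year, year_by_number)  # 2023 2023
print(parts)                 # {'year': '2023', 'month': '07', 'day': '15'}
```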

Q8. Can you identify repeated items within a target string using named groups, as in "The cow
jumped over the moon"?

Answer- Yes, you can identify repeated items within a target string using named groups in Python's regular expressions. However, it's important to note that named groups alone cannot directly identify repeated items. They are used to capture and refer to specific parts of a pattern.

To identify repeated items, you can use backreferences along with named groups. Backreferences allow you to refer back to a previously captured group within the same regular expression pattern. By combining named groups with backreferences, you can identify repeated items in a target string.

In [10]:
import re
text = "The cow jumped over the moon"
regobj=re.compile(r'(?P<w1>The)',re.I)

regobj.findall(text)


['The', 'the']
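
The cell above finds both occurrences of a word that is already known. To let the pattern discover the repeated word itself, a named group can be combined with a named backreference, (?P=name). This is a sketch of one possible approach; the group name `word` is an arbitrary choice.

```python
import re

text = "The cow jumped over the moon"

# (?P<word>\w+) captures a word; (?P=word) requires the same text to
# appear again later. re.I lets "The" and "the" count as the same word.
repeat_re = re.compile(r'\b(?P<word>\w+)\b.*\b(?P=word)\b', re.I)

m = repeat_re.search(text)
print(m.group('word'))  # The
print(m.group())        # The cow jumped over the
```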

Q9. When parsing a string, what is at least one thing that the Scanner interface does for you that the
re.findall feature does not?

Answer- re.findall returns a flat list of the non-overlapping matches of a single pattern, scanned left to right, and silently skips any text that does not match. The Scanner interface (re.Scanner, an undocumented class in Python's re module) takes a list of (pattern, action) pairs instead. For each token it calls the action associated with the pattern that matched, so every token can be labelled or converted as it is found. Its scan() method also returns the unconsumed remainder of the string along with the tokens, which tells you where tokenizing stopped.
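
A side-by-side comparison, as a minimal sketch: the rules, the token labels 'INT' and 'WORD', and the sample string are assumptions made for illustration, and re.Scanner is undocumented, so its behaviour may vary between Python versions.

```python
import re

text = '12 apples ?? 7'

# Scanner: each rule pairs a pattern with an action. An action of None
# means "match this text but emit no token" (used here for whitespace).
scanner = re.Scanner([
    (r'\d+', lambda s, tok: ('INT', int(tok))),
    (r'[a-z]+', lambda s, tok: ('WORD', tok)),
    (r'\s+', None),
])
tokens, remainder = scanner.scan(text)

# findall: one pattern, plain strings, unmatched text silently skipped.
found = re.findall(r'\d+|[a-z]+', text)

print(tokens)     # [('INT', 12), ('WORD', 'apples')]
print(remainder)  # ?? 7
print(found)      # ['12', 'apples', '7']
```

The scanner stops at the first character no rule accepts and reports the rest, while findall carries on past it without any indication.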

Q10. Does a scanner object have to be named scanner?

Answer- No, a Scanner object does not have to be named "scanner." The choice of variable name is completely up to you as the programmer. You can assign any valid variable name to the Scanner object based on your preference and to make it more meaningful in the context of your code.

In [12]:
import re

my_scanner = re.Scanner([
    (r'\d+', lambda scanner, token: ('INTEGER', int(token))),
    (r'[a-zA-Z]+', lambda scanner, token: ('WORD', token)),
])

input_string = '123 foo bar 456 baz'
tokens, remainder = my_scanner.scan(input_string)

print(tokens)


[('INTEGER', 123)]


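In this example, a Scanner object is created and assigned to the variable my_scanner. The Scanner object is then used to tokenize the input_string based on the defined patterns and actions. Because no pattern matches whitespace, scanning stops at the first space: tokens contains only ('INTEGER', 123), and remainder holds the rest of the string, ' foo bar 456 baz'.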

You can choose a variable name that best represents the purpose or role of the Scanner object in your code. It's a good practice to use meaningful and descriptive variable names to enhance the readability and maintainability of your code.
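
The output above contains a single token because none of the rules accepts the space after 123. A possible variation, sketched here with the arbitrary variable name `tokenizer`, adds a whitespace rule whose action is None so that spaces are matched and discarded:

```python
import re

# Same rules as the example above, plus a rule that discards whitespace.
# An action of None makes the scanner match the text but emit no token.
tokenizer = re.Scanner([
    (r'\d+', lambda scanner, token: ('INTEGER', int(token))),
    (r'[a-zA-Z]+', lambda scanner, token: ('WORD', token)),
    (r'\s+', None),
])

tokens, remainder = tokenizer.scan('123 foo bar 456 baz')

print(tokens)
# [('INTEGER', 123), ('WORD', 'foo'), ('WORD', 'bar'), ('INTEGER', 456), ('WORD', 'baz')]
print(repr(remainder))  # ''
```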