Q1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy one? What character or characters can you introduce or change?
--->
Greedy syntax: Matches as much text as possible.

Non-greedy syntax: Matches as little text as possible.

Transformation: Add a ? after the quantifier to change from greedy to non-greedy.

This minimal change allows you to switch the behavior of the regular expression from greedy to non-greedy, altering the way it matches text in a pattern.
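
A minimal sketch of the one-character change. The sample string and variable names are illustrative assumptions, not part of the question:

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.*>", text).group()

# Non-greedy: the added ? makes .* stop at the first '>' that lets the match succeed.
lazy = re.search(r"<.*?>", text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```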

Q2. When exactly does greedy versus non-greedy make a difference?  What if you're looking for a non-greedy match but the only one available is greedy?
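---> The distinction between greedy and non-greedy matching makes a difference only when a quantified subexpression could match more than one length of text at the same position, for example when the closing delimiter appears several times in the string. The greedy form then returns the longest possible match and the non-greedy form the shortest.

If only one match is available, the non-greedy pattern simply returns it, exactly as the greedy pattern would. If the match you get is still longer than the one you want, there are a few possible approaches: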

Modify the Pattern: If you have control over the pattern, you can modify it to make it more specific. By making the pattern more precise, you can potentially avoid the need for non-greedy matching. This approach is preferred if you can accurately define the specific boundaries of the desired match.

Use Non-Greedy Quantifiers: If the pattern allows, you can apply non-greedy quantifiers (*?, +?, ??, {m,n}?) to the quantified subexpression. This tells the regex engine to match as little text as possible while still letting the rest of the pattern succeed.

Combine Greedy and Non-Greedy Matching: Sometimes, it may be necessary to combine both greedy and non-greedy matching in a single pattern. You can achieve this by using a combination of positive/negative lookaheads or lookbehinds, or by using alternations (|) to specify multiple possible matches.

Post-Process the Greedy Match: If none of the above options are suitable, and you are left with a greedy match but require a non-greedy one, you can post-process the greedy match to extract the desired portion using string manipulation functions or additional logic outside the regular expression.
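
The following sketch illustrates when the two forms agree, when they diverge, and how a more specific pattern can avoid the issue. The quoted sample strings are assumptions chosen for illustration:

```python
import re

# Only one possible match: greedy and non-greedy agree.
single = 'say "hi" now'
same_greedy = re.search(r'".*"', single).group()
same_lazy = re.search(r'".*?"', single).group()

# Several possible end points: the two quantifiers diverge.
double = 'say "hi" and "bye" now'
diff_greedy = re.search(r'".*"', double).group()
diff_lazy = re.search(r'".*?"', double).group()

# A more specific pattern (negated character class) needs no lazy quantifier.
specific = re.search(r'"[^"]*"', double).group()

print(same_greedy, same_lazy)
print(diff_greedy, "|", diff_lazy, "|", specific)
```

The negated character class is often the clearer choice when the boundary is a single known character, since the match length no longer depends on the quantifier style.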

Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is the use of a nontagged group likely to make any practical difference?
---> In a simple match where you are not capturing or extracting specific subgroups, using a non-tagged group will not significantly affect the result of the match. It primarily serves organizational and quantification purposes in the pattern.
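
As a small check of this claim, the sketch below runs the same search with a tagged and a non-tagged group. The sample text is an assumption:

```python
import re

text = "abcabc123"

tagged = re.search(r"(abc)+\d", text)
untagged = re.search(r"(?:abc)+\d", text)

# The overall match is identical; only the recorded groups differ.
print(tagged.group(0), untagged.group(0))
print(tagged.groups(), untagged.groups())
```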

Q4. Describe a scenario in which using a nontagged category would have a significant impact on the program's outcomes.
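---> One scenario where using a non-tagged category (non-capturing group) can have a significant impact on program outcomes is a call to re.findall with a pattern that needs grouping for alternation or repetition. When a pattern contains capturing groups, re.findall returns the captured groups instead of the whole match, so a capturing group used only for grouping changes the results. A non-capturing group (?:...) groups without adding anything to the output, which lets you extract specific parts of the match while ignoring others.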

In [2]:
#Example
import re

text = """Name: John Doe
Age: 25
Country: USA

Name: Jane Smith
Age: 30
Country: Canada"""

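# (?:...) groups without capturing, so findall returns (name, age) pairs only.
# re.DOTALL lets . cross the newline between the Name and Age lines;
# the lazy .*? stops at the first "Age:" so each record is matched separately.
pattern = r"(?:Name: (\w+).*?Age: (\d+))"
matches = re.findall(pattern, text, re.DOTALL)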

for match in matches:
    name, age = match
    print(f"Name: {name}, Age: {age}")
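
A further sketch of how the group type changes what re.findall returns when the group is needed only for alternation and repetition. The sample text is an assumption:

```python
import re

text = "ab1 cdab2 cd3"

# Capturing group: findall returns the last text captured by group 1 for each match.
with_capture = re.findall(r"(ab|cd)+\d", text)

# Non-capturing group: findall returns each whole match.
without_capture = re.findall(r"(?:ab|cd)+\d", text)

print(with_capture)     # ['ab', 'ab', 'cd']
print(without_capture)  # ['ab1', 'cdab2', 'cd3']
```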

Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it examines. Describe a situation in which this could make a difference in the results of your programme.
---> One situation where the non-consumable nature of a look-ahead condition in a regular expression could make a difference in the results of your program is when you need to check for the presence or absence of a specific pattern without including it in the final match.

Consider a scenario where you want to find email addresses at a specific domain but do not want the domain itself in the matched result. Say you are searching a large text document and want the part of each address that precedes "example.com".

Using a look-ahead condition in this case can be useful. The look-ahead assertion (?=...) allows you to check that a specific pattern exists without consuming the characters that make up that pattern, so the domain is tested but left out of the match.

In [3]:
#Example
import re

text = "Email addresses: john@example.com, jane@example.com, johndoe@gmail.com"

pattern = r"\w+@(?=example\.com)"  # escape the dot so it matches a literal "."
matches = re.findall(pattern, text)

for match in matches:
    print(match)


john@
jane@


Q6. In regular expressions, what is the difference between positive look-ahead and negative look-ahead?
---> In regular expressions, positive look-ahead and negative look-ahead are two types of look-ahead assertions that allow you to check for the presence or absence of a specific pattern ahead of the current position in the string, without including it in the actual match. The main difference between them lies in their conditions and purposes:

1. Positive Look-Ahead ((?=...)):

Positive look-ahead is denoted by (?=...) syntax.
It asserts that a specific pattern must be present ahead of the current position in the string.
It does not consume any characters during the match.
It is used to include matches where the specified pattern exists ahead.
Example: pattern1(?=pattern2) matches occurrences of pattern1 only if they are followed by pattern2.

2. Negative Look-Ahead ((?!...)):

Negative look-ahead is denoted by (?!...) syntax.
It asserts that a specific pattern must not be present ahead of the current position in the string.
It does not consume any characters during the match.
It is used to exclude matches where the specified pattern exists ahead.
Example: pattern1(?!pattern2) matches occurrences of pattern1 only if they are not followed by pattern2.
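
The two assertions can be compared side by side in a short sketch. The sample text is an assumption; start positions are collected because every match has the same text, "price":

```python
import re

text = "price100 priceless price200"

# Positive look-ahead: 'price' only where a digit follows.
followed = [m.start() for m in re.finditer(r"price(?=\d)", text)]

# Negative look-ahead: 'price' only where no digit follows.
not_followed = [m.start() for m in re.finditer(r"price(?!\d)", text)]

print(followed)      # [0, 19]
print(not_followed)  # [9]
```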

Q7. What is the benefit of referring to groups by name rather than by number in a regular expression?
---> Using named groups in regular expressions improves code readability, makes the code self-documenting, allows for flexible refactoring, and increases the robustness of your code. It is especially beneficial when dealing with complex regular expressions and when capturing multiple groups with specific meanings.
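
A brief sketch of the readability benefit, assuming an ISO-style date in a made-up sentence:

```python
import re

date = re.search(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    "Released on 2023-07-15",
)

by_number = date.group(1)     # depends on the group's position in the pattern
by_name = date.group("year")  # survives reordering or adding groups
fields = date.groupdict()     # all named groups as a dictionary

print(by_number, by_name, fields)
```

If another group is later inserted before the year, the numbered reference silently points at the wrong text, while the named reference keeps working.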

Q8. Can you identify repeated items within a target string using named groups, as in "The cow jumped over the moon"?
---> Yes. Named groups are primarily used for capturing and extracting specific portions of the target string, but combined with named backreferences they can also identify repeated items. A named group (?P<name>...) captures a specific pattern, and the backreference (?P=name) matches the same text again later in the pattern.

In "The cow jumped over the moon", the word "the" appears twice, differing only in case. A pattern that captures a word in a named group and then requires the backreference to that group later in the string, compiled with re.IGNORECASE, identifies it as a repeated item.
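
One possible way to do this is sketched below. The group name "word" and the use of a look-ahead to search the rest of the string are choices made for this illustration:

```python
import re

text = "The cow jumped over the moon"

# Capture a whole word as 'word', then look ahead for the same word later on.
# re.IGNORECASE lets the backreference match "the" against the captured "The".
pattern = r"\b(?P<word>\w+)\b(?=.*\b(?P=word)\b)"

repeated = [m.group("word") for m in re.finditer(pattern, text, re.IGNORECASE)]

print(repeated)  # ['The']
```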

Q9. When parsing a string, what is at least one thing that the Scanner interface does for you that the re.findall feature does not?
---> When parsing a string, one thing that the Scanner interface (re.Scanner, an undocumented class in Python's re module) does for you that the re.findall function does not is let you attach an action to each pattern. Each match is passed to the function registered for its pattern, so tokens can be converted, labelled, or discarded as they are found, whereas re.findall only returns the matched text.

A Scanner is built from a list of (pattern, action) pairs and tries the patterns in order at each position. Its scan method returns a tuple: the list of action results and the remainder of the string that no pattern matched. re.findall, by contrast, silently skips text that does not match.


In [4]:
import re

text = "The cow jumped over the moon"

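# Each lexicon entry pairs a pattern with an action; None discards the match.
scanner = re.Scanner([
    (r"\w+", lambda scanner, token: token),  # keep words as tokens
    (r"\s+", None),                          # skip whitespace
])

# scan() returns the token list and any unmatched remainder of the input.
tokens, remainder = scanner.scan(text)

print(tokens)
print(repr(remainder))


['The', 'cow', 'jumped', 'over', 'the', 'moon']
''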


Q10. Does a scanner object have to be named scanner?
---> No, a Scanner object does not have to be named "scanner". The name given to the Scanner object is arbitrary and can be chosen based on your preference or the context of your code.
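
A short sketch with different names, assuming the undocumented re.Scanner class is available as in CPython. The names lexer and sc are arbitrary:

```python
import re

# Any valid identifier works for the object and for the callback's first parameter.
lexer = re.Scanner([
    (r"\d+", lambda sc, token: int(token)),  # convert digit runs to integers
    (r"\s+", None),                          # skip whitespace
])

numbers, rest = lexer.scan("10 20 30")

print(numbers, repr(rest))
```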