#### Q1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy one? What character or characters can you introduce or change?
**Ans:** Greedy syntax grabs the longest match, while non-greedy syntax grabs the shortest. <br/>To transform greedy to non-greedy in Python, add `?` after the quantifier, like changing `*` to `*?` or `+` to `+?`.

In [18]:
import re
print(re.findall("a*", "aaaa")) # Greedy match syntax
print(re.findall("a*?", "aaaa")) # Non-greedy match syntax

['aaaa', '']
['', 'a', '', 'a', '', 'a', '', 'a', '']
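A more visual way to see the difference is a string with several closing delimiters. The sketch below uses a made-up HTML-like sample (the `html` string is an assumption for illustration, not part of the original exercise): the greedy pattern stretches to the last `>`, while the non-greedy one stops at the first.

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # hypothetical sample text

# Greedy: .* expands as far as it can, so the match runs to the last '>'.
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```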


#### Q2. When exactly does greedy versus non-greedy make a difference?  What if you're looking for a non-greedy match but the only one available is greedy?
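**Ans:** Greedy versus non-greedy only makes a difference when the quantified part could match more than one length and the rest of the pattern would still succeed. Greedy quantifiers match as much as they can while still allowing the overall pattern to match. <br/>
In Python's re module, the quantifiers `*`, `+`, `?` and `{m,n}` are greedy by default.

Non-greedy quantifiers match as little as possible while still allowing the overall pattern to match. <br/>They stop as soon as the rest of the pattern can succeed.<br/>
In Python's re module, we can make a quantifier non-greedy by adding a `?` after it, such as `*?`, `+?`, `??` or `{m,n}?`.

If the only match available is the greedy one, the non-greedy pattern still returns it: laziness changes the order in which lengths are tried, not whether a match is found.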

In [19]:
import re

text = "ababababab"

# Greedy matching
pattern = r"a.*b"
match = re.search(pattern, text)
print("Greedy match:", match.group())  # Output: "ababababab"

# Non-greedy (lazy) matching
pattern = r"a.*?b"
match = re.search(pattern, text)
print("Non-greedy match:", match.group())  # Output: "ab"


Greedy match: ababababab
Non-greedy match: ab
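The second half of the question can be checked with a small sketch. When the rest of the pattern leaves only one possible match (here forced by the `$` anchor, or by a string with a single `b`; both setups are illustrative assumptions), the greedy and non-greedy forms return the same text.

```python
import re

text = "ababababab"

# The $ anchor means the match must end at the last 'b',
# so the lazy quantifier has to expand until it gets there.
greedy_anchored = re.search(r"a.*b$", text).group()
lazy_anchored = re.search(r"a.*?b$", text).group()

# With only one 'b' in the string there is only one place to stop.
greedy_single = re.search(r"a.*b", "axxb").group()
lazy_single = re.search(r"a.*?b", "axxb").group()

print(greedy_anchored, lazy_anchored)
print(greedy_single, lazy_single)
```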


#### Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is the use of a nontagged group likely to make any practical difference?
**Ans:** In a simple match of a string, where we look for only one match and do not use the matched groups for any further processing,<br/> using a non-capturing (nontagged) group, written (?: ... ), is unlikely to make a practical difference in Python. The overall match is the same either way; only the numbered groups available on the match object differ.

In [38]:
#Using a capturing or tagged group
import re

text = "Hello World"
pattern = r"(World)"
match = re.search(pattern, text)
if match:
    print(match.group(1))  # Prints 'World'


World


In [49]:
#Using a non-capturing or nontagged group
import re

text = "Hello World"
pattern = r"(?:World)"
match = re.search(pattern, text)
if match:
    print(match.group())  # Prints 'World'


World


In both cases, we will achieve the same result. If we are not interested in capturing and using the matched group later in the code, we can use the non-capturing group for simplicity.

#### Q4. Describe a scenario in which using a nontagged category would have a significant impact on the program's outcomes ?
**Ans:** In the code snippet below, the decimal point **`.`** is matched but not tagged (captured). This is useful in scenarios where the separator between values in a string is of no interest and we need to capture only the values.

In [57]:
import re
text='135.456'
pattern=r'(\d+)(?:\.)(\d+)'  # escape the dot so it matches a literal '.', not any character
regobj=re.compile(pattern)
matobj=regobj.search(text)
matobj.groups()

('135', '456')
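The impact is larger with functions whose return value depends on whether the pattern contains capturing groups. The following sketch (sample strings are assumptions for illustration) shows `re.findall` and `re.split` returning different results for the same pattern with and without a non-capturing group.

```python
import re

text = "135.456 and 78.9"  # hypothetical sample text

# With a capturing group, findall returns only the group's text.
captured = re.findall(r"\d+(\.\d+)", text)

# With a non-capturing group, findall returns the whole match.
whole = re.findall(r"\d+(?:\.\d+)", text)

# re.split keeps captured separators in the result but drops uncaptured ones.
split_captured = re.split(r"(,)", "a,b")
split_plain = re.split(r"(?:,)", "a,b")

print(captured)        # ['.456', '.9']
print(whole)           # ['135.456', '78.9']
print(split_captured)  # ['a', ',', 'b']
print(split_plain)     # ['a', 'b']
```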

#### Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it examines. Describe a situation in which this could make a difference in the results of your programme ?
**Ans:**  
Look-ahead assertions in regular expressions are used to specify a condition that must be true for a match to occur, but they do not consume characters in the input string. This non-consumptive behavior can make a significant difference in the results of a program in various situations. Here's an example to illustrate this:

Suppose we have a list of email addresses and want to extract the user name (the part before "@") of every address that belongs to a specific domain (e.g., "@example.com"), without consuming the domain part.

Let's say we have the following list of email addresses:

1. john.doe@example.com<br/>
2. jane.smith@example.net<br/>
3. alice@example.com<br/>

We want to match only the addresses ending with "@example.com" and keep the domain itself out of the result. A look-ahead assertion achieves this without consuming the domain part.

The required Regular Expression is: `[\w.]+(?=@example\.com)`

In this regex pattern, `[\w.]+` matches the characters before the "@" symbol. The critical part is `(?=@example\.com)`, which is a positive look-ahead assertion. It asserts that the text immediately following the match must be "@example.com" but does not consume any of those characters, so they are not part of the result.

In [62]:
import re
print(re.findall(r'[\w.]+(?=@example\.com)','john.doe@example.com'))
print(re.findall(r'[\w.]+(?=@example\.com)','jane.smith@example.net'))
print(re.findall(r'[\w.]+(?=@example\.com)','alice@example.com'))

['john.doe']
[]
['alice']


Thus, the non-consumptive behavior of look-ahead conditions can be essential in situations where you need to find specific patterns in the input string without including them in the match results.
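Non-consumption also matters when matches need to overlap. In this sketch (the sample sentence is an assumption for illustration), a consuming pattern uses up the second word of each pair, while the look-ahead version only inspects it and can reuse it as the start of the next pair.

```python
import re

sentence = "one two three"  # hypothetical sample text

# Consuming version: the second word is used up, so it cannot start a new pair.
consumed_pairs = re.findall(r"(\w+) (\w+)", sentence)

# Look-ahead version: the second word is only inspected, so the search
# resumes right before it and it can start the next pair.
lookahead_pairs = re.findall(r"(\w+) (?=(\w+))", sentence)

print(consumed_pairs)   # [('one', 'two')]
print(lookahead_pairs)  # [('one', 'two'), ('two', 'three')]
```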

#### Q6. In regular expressions, what is the difference between positive look-ahead and negative look-ahead ?
**Ans:**

1. Positive Look-Ahead (?=...): <br/>
It specifies a pattern that must immediately follow the main pattern for the match to succeed, but the look-ahead pattern itself is not included in the match.

For example, the regular expression /a(?=b)/ will match the "a" only if it is followed by a "b", but it will not include the "b" in the match.<br/> So, it will match the "a" in "abc" but not in "axc".

2. Negative Look-Ahead (?!...):<br/>
It specifies a pattern that must not immediately follow the main pattern; if the look-ahead pattern is found at that position, the match fails.

For example, the regular expression /a(?!b)/ will match the "a" only if it is not followed by a "b". It will match the "a" in "axc" but not in "abc".
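The two examples above can be checked in Python with a minimal sketch. The `results` dictionary is just a helper introduced here to collect the outcomes.

```python
import re

results = {}
for sample in ("abc", "axc"):
    positive = re.search(r"a(?=b)", sample)  # 'a' only if followed by 'b'
    negative = re.search(r"a(?!b)", sample)  # 'a' only if NOT followed by 'b'
    results[sample] = (
        positive.group() if positive else None,
        negative.group() if negative else None,
    )

# In both successful cases the match is just 'a'; the 'b' or 'x' is never included.
print(results)  # {'abc': ('a', None), 'axc': (None, 'a')}
```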


#### Q7. What is the benefit of referring to groups by name rather than by number in a regular expression?
**Ans:** Referring to groups by name rather than by number keeps the code clear and easy to understand: a name such as "email" documents what the group holds, and it stays valid when groups are added or reordered, whereas numbers shift.

In [58]:
import re

text = "John's email is john@example.com, and Jane's email is jane@example.com."

# Using named groups to extract email addresses
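# The email group matches an address shape instead of \S+, so trailing
# punctuation such as "," or "." is not captured.
pattern = r"(?P<name>\w+)'s email is (?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
matches = re.finditer(pattern, text)

for match in matches:
    name = match.group("name")
    email = match.group("email")
    print(f"{name}: {email}")


John: john@example.com
Jane: jane@example.com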


In the example above, we define two named groups, "name" and "email", which makes it clear what each part of the pattern represents. Retrieving the captured text with match.group("name") and match.group("email") improves the code's readability and maintainability.
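A second benefit is robustness when the pattern changes. The sketch below uses a hypothetical date string and a hypothetical `label` group to show that adding a group in front shifts every number, while the names keep pointing at the same parts.

```python
import re

# Original pattern: month is group 2.
by_name = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2024-03-15")

# Extended pattern: a new leading group shifts all the numbers by one.
extended = re.search(
    r"(?P<label>\w+): (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    "due: 2024-03-15",
)

print(by_name.group(2))         # '03'   (the month)
print(extended.group(2))        # '2024' (now the year)
print(extended.group("month"))  # '03'   (the name is unaffected)
print(extended.groupdict())
```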

#### Q8. Can you identify repeated items within a target string using named groups, as in "The cow jumped over the moon"?

In [67]:
import re
text = "The cow jumped over the moon"
regobj=re.compile(r'(?P<w1>The)',re.I)
regobj.findall(text)

['The', 'the']
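One way to read this question is as asking for a back-reference: a named group captures a word, and `(?P=name)` requires the same text to appear again later. The following is a minimal sketch under that reading; the group name `word` and the variable names are assumptions introduced here.

```python
import re

text = "The cow jumped over the moon"

# (?P<word>...) names the first occurrence; (?P=word) requires the same
# text again. re.IGNORECASE lets "The" and "the" count as a repeat.
repeat_pattern = re.compile(r"\b(?P<word>\w+)\b.*\b(?P=word)\b", re.IGNORECASE)

match = repeat_pattern.search(text)
repeated = match.group("word") if match else None

print(repeated)  # 'The'
```

Unlike the findall call above, this pattern does not need to know the repeated word in advance.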

#### Q9. When parsing a string, what is at least one thing that the Scanner interface does for you that the re.findall feature does not ?
**Ans:** The **Scanner** interface (re.Scanner, an undocumented class in Python's re module) is typically used to tokenize a string into smaller, meaningful units called tokens. It lets us define a pattern and an action for each type of token in the input string and then scan the string to extract these tokens. This is especially useful for parsing structured data or programming languages where we need to break down the input into distinct components, like keywords, identifiers, literals, and operators.

**re.findall**, on the other hand, is primarily used for regular expression-based string matching and extraction. While it can extract substrings based on patterns, it doesn't provide a straightforward way to break down the input string into structured tokens.

In [15]:
#Example
import re

# Create a Scanner object. Rules are tried in order, so the float rule
# comes before the integer rule; otherwise "3.14" would stop at "3".
scanner = re.Scanner([
    (r"\d+\.\d*", lambda scanner, token: ('FLOAT', token)),
    (r'\d+', lambda scanner, token: ('INTEGER', token)),
    (r'[a-zA-Z_]\w*', lambda scanner, token: ('IDENTIFIER', token)),
    (r'=|\+|-|\*|/', lambda scanner, token: ('OPERATOR', token)),
    (r"\s+", lambda scanner, token: ('NONE', token)),
])

# Use the custom Scanner object
text = '42 +    var_name'
tokens, remainder = scanner.scan(text)

print(tokens)

[('INTEGER', '42'), ('NONE', ' '), ('OPERATOR', '+'), ('NONE', '    '), ('IDENTIFIER', 'var_name')]
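Another thing the Scanner does that re.findall does not is report where tokenizing stopped. In this sketch (the scanner name and sample text are assumptions, and since re.Scanner is undocumented the behavior described is what CPython's implementation does), `scan` returns the unmatched remainder, while `re.findall` silently skips the unexpected character.

```python
import re

# re.Scanner is undocumented; this is a sketch of its observed behavior.
number_scanner = re.Scanner([
    (r"\d+", lambda scanner, token: ("INTEGER", int(token))),
    (r"\s+", None),  # a None action drops the matched text (whitespace here)
])

text = "12 34 $ 56"  # hypothetical input containing an unexpected '$'

tokens, remainder = number_scanner.scan(text)
found = re.findall(r"\d+", text)

print(tokens)     # [('INTEGER', 12), ('INTEGER', 34)]
print(remainder)  # '$ 56'  (scanning stopped at the first unmatched character)
print(found)      # ['12', '34', '56']  (findall skips over the '$')
```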


#### Q10. Does a scanner object have to be named scanner?
**Ans:** No, a Scanner object in Python does not have to be named "scanner." We can use any valid variable name that adheres to Python's naming rules and conventions. 

In [14]:
#Example
import re

# Create a Scanner object with a custom name. Rules are tried in order,
# so the float rule comes before the integer rule.
my_custom_scanner = re.Scanner([
    (r"\d+\.\d*", lambda scanner, token: ('FLOAT', token)),
    (r'\d+', lambda scanner, token: ('INTEGER', token)),
    (r'[a-zA-Z_]\w*', lambda scanner, token: ('IDENTIFIER', token)),
    (r'=|\+|-|\*|/', lambda scanner, token: ('OPERATOR', token)),
    (r"\s+", lambda scanner, token: ('NONE', token)),
])

# Use the custom Scanner object
text = '42 +    var_name'
tokens, remainder = my_custom_scanner.scan(text)

print(tokens)

[('INTEGER', '42'), ('NONE', ' '), ('OPERATOR', '+'), ('NONE', '    '), ('IDENTIFIER', 'var_name')]
