# Q1. Explain the difference between greedy and non-greedy syntax in visual terms in as few words as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy one? What character or characters can you introduce or change?

**Ans:**

- **Greedy vs. Non-Greedy:** Greedy quantifiers in regular expressions match as much text as possible, while non-greedy (or lazy) quantifiers match as little as possible.

- **Transformation:** To change a greedy pattern into a non-greedy one, add a `?` after the quantifier. For example, change `*` to `*?`, or `+` to `+?`.

In [1]:
# with greedy quantifier
import re

text = "abcabcabc"
pattern = r"a.*c"
match = re.search(pattern, text)
print(match.group()) 


abcabcabc


In [2]:
# with non-greedy quantifier
import re

text = "abcabcabc"
pattern = r"a.*?c"
match = re.search(pattern, text)
print(match.group())  # Output will be "abc"

abc


In the first example with the greedy quantifier (`.*`), it matches from the first "a" to the last "c." In the second example with the non-greedy quantifier (`.*?`), it matches from the first "a" to the next "c," resulting in a shorter match.
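
The same `?` suffix applies to every quantifier, not only `*`. A minimal sketch, using a sample string chosen purely for illustration:

```python
import re

text = "aaaa"

# Each greedy quantifier next to its non-greedy (lazy) form
greedy_plus = re.search(r"a+", text).group()        # as many as possible
lazy_plus = re.search(r"a+?", text).group()         # as few as possible (at least one)

greedy_range = re.search(r"a{2,4}", text).group()   # takes up to four
lazy_range = re.search(r"a{2,4}?", text).group()    # stops at two

greedy_opt = re.search(r"a?", text).group()         # prefers one character
lazy_opt = re.search(r"a??", text).group()          # prefers zero characters

print(greedy_plus, lazy_plus)            # aaaa a
print(greedy_range, lazy_range)          # aaaa aa
print(repr(greedy_opt), repr(lazy_opt))  # 'a' ''
```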

# Q2. When exactly does greedy versus non-greedy make a difference?  What if you're looking for a non-greedy match but the only one available is greedy?

**Ans:**

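Greedy and non-greedy quantifiers give different results only when more than one match length is possible from the same starting position, typically because the text contains several occurrences of whatever follows the quantifier in the pattern. If only one match is available, both forms return it: a non-greedy quantifier still expands as far as it must for the overall pattern to succeed, so it ends up with the same text the greedy form would.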

- **Greedy**: Greedy quantifiers try to match as much text as possible while still allowing the overall pattern to match. They are appropriate when you want to find the longest possible match within the text.


- **Non-Greedy (or Lazy)**: Non-greedy quantifiers, denoted by adding a `?` to the quantifier, try to match as little text as possible while still allowing the overall pattern to match. They are useful when you want to find the shortest possible match within the text.


*For example*, let's say we have the text `"abracadabra"` and we are looking for text that starts with "a" and ends with "a":

- Greedy: `a.*a` matches the entire string `"abracadabra"`.
- Non-Greedy: `a.*?a` matches only `"abra"`.
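
Both situations can be checked side by side. The sketch below uses a small hypothetical helper, `both`, and a second sample string (`"abxxc"`) that is not from the text above; it contains only one possible end point, so the lazy quantifier has no shorter match to choose.

```python
import re

def both(greedy, lazy, text):
    """Return the (greedy, lazy) matches of two patterns on the same text."""
    return re.search(greedy, text).group(), re.search(lazy, text).group()

# Several possible end points: the two forms disagree
several = both(r"a.*a", r"a.*?a", "abracadabra")

# Only one possible end point: the lazy form is forced to match just as much
only_one = both(r"a.*c", r"a.*?c", "abxxc")

print(several)   # ('abracadabra', 'abra')
print(only_one)  # ('abxxc', 'abxxc')
```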

# Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is the use of a nontagged group likely to make any practical difference?

**Ans:**

In a simple match of a string where you're only looking for one match and not doing any replacement, the use of a non-capturing (nontagged) group, written `(?:...)`, versus a capturing group, written `(...)`, generally won't make a practical difference. The overall match is the same either way; only the numbered groups available afterwards differ.


Capturing groups are typically used when we want to extract specific portions of the matched text or when using backreferences within the regular expression. Non-capturing groups are useful when we want to create a group for logical grouping and applying quantifiers or modifiers but don't need to extract that specific part of the match.

In [3]:
# Example:

import re

text = "I have an apple5, another apple42, and one more apple123."

# Using a capturing group: findall returns only the captured digits
matches = re.findall(r'apple(\d+)', text)
print("Capturing group matches:", matches)

# Using a non-capturing group: findall returns each whole match
matches_non_capturing = re.findall(r'apple(?:\d+)', text)
print("Non-capturing group matches:", matches_non_capturing)

Capturing group matches: ['5', '42', '123']
Non-capturing group matches: ['apple5', 'apple42', 'apple123']
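
`re.findall` is a special case because it returns group contents whenever the pattern has a capturing group. For the single-match situation the question describes, a sketch with `re.search` on the same text shows that the overall match does not change:

```python
import re

text = "I have an apple5, another apple42, and one more apple123."

with_capture = re.search(r"apple(\d+)", text)
without_capture = re.search(r"apple(?:\d+)", text)

# The overall match (group 0) is identical
print(with_capture.group())     # apple5
print(without_capture.group())  # apple5

# Only the set of numbered groups differs
print(with_capture.groups())     # ('5',)
print(without_capture.groups())  # ()
```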


# Q4. Describe a scenario in which using a nontagged category would have a significant impact on the program's outcomes.

**Ans:**

Consider parsing a large log file containing records of various events, where we want to extract specific pieces of information from each log entry.

Let's say each log entry follows a pattern like this:

```
[INFO] User 'Alice' logged in from IP: 192.168.1.100
[ERROR] Connection failed for user 'Bob' with IP: 192.168.1.200
```

We want to extract the usernames and IP addresses, but we are not interested in the log levels ("INFO" or "ERROR"). In this case, using non-capturing groups can have a significant impact on the program's outcomes.

Using a capturing group for the log level:

In [4]:
import re

log = "[INFO] User 'Alice' logged in from IP: 192.168.1.100"
match = re.match(r'\[(\w+)\] User \'(\w+)\' logged in from IP: (\d+\.\d+\.\d+\.\d+)', log)

log_level = match.group(1)
username = match.group(2)
ip_address = match.group(3)

print(f"Log Level: {log_level}")
print(f"Username: {username}")
print(f"IP Address: {ip_address}")

Log Level: INFO
Username: Alice
IP Address: 192.168.1.100


The program's outcome includes the log level, which we're not interested in.

Now, let's use non-capturing groups to exclude the log level:

In [5]:
import re

log = "[INFO] User 'Alice' logged in from IP: 192.168.1.100"
match = re.match(r'\[(?:\w+)\] User \'(\w+)\' logged in from IP: (\d+\.\d+\.\d+\.\d+)', log)

username = match.group(1)
ip_address = match.group(2)

print(f"Username: {username}")
print(f"IP Address: {ip_address}")

Username: Alice
IP Address: 192.168.1.100


By using the non-capturing group `(?:\w+)` for the log level, we keep it out of the captured groups, so the username becomes group 1 and the IP address group 2. The results are cleaner and more focused, especially when processing a large number of log entries.
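
The effect is even more visible with functions whose return value depends on capturing groups. As a separate illustration (the sample string is an assumption, not part of the log example), `re.split` includes the separators in its result when the separator pattern is captured:

```python
import re

data = "a,b;c"

# Capturing group: the separators are kept in the result list
with_capture = re.split(r"(,|;)", data)

# Non-capturing group: only the fields are returned
without_capture = re.split(r"(?:,|;)", data)

print(with_capture)     # ['a', ',', 'b', ';', 'c']
print(without_capture)  # ['a', 'b', 'c']
```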

# Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it examines. Describe a situation in which this could make a difference in the results of your programme.

**Ans:**

Consider a scenario where you're analyzing a text document and need to find all occurrences of a specific word, but you want to exclude instances where the word is followed by another specific word. For example, you want to find all occurrences of "apple" but not when it's followed by "pie."

Using a lookahead condition makes a significant difference in the results because it allows you to match "apple" without consuming the characters that follow. 

Here's an example:

In [17]:
import re

text = "I love apple. My favorite dessert is apple pie."

# Match "apple" but not when followed by "pie"
pattern = r'apple(?! pie)'
matches = re.findall(pattern, text)

print(matches)


['apple']


In this example, the lookahead condition `(?! pie)` ensures that "apple" is matched only when it's not followed by "pie."
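
Non-consumption matters most when the examined text is needed again by the next match. A small sketch, assuming a comma-separated sample string, that finds every item followed by another item:

```python
import re

text = "a,b,c"

# Consuming pattern: "b" is used up by the first match, so it cannot start a second one
consuming = re.findall(r"\w,\w", text)

# Look-ahead: ",b" is only inspected, so "b" is still available for the next match
lookahead = re.findall(r"\w(?=,\w)", text)

print(consuming)  # ['a,b']
print(lookahead)  # ['a', 'b']
```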

# Q6. In standard expressions, what is the difference between positive look-ahead and negative look-ahead?

**Ans:**

Positive look-ahead, written `(?=...)`, succeeds only if the pattern is present immediately ahead of the current position, while negative look-ahead, written `(?!...)`, succeeds only if the pattern is absent there. Neither one consumes the text it examines.
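
A short sketch of the two forms on the same text; the currency string is made up for illustration. The `\b` anchors keep the negative form from backtracking into part of a number.

```python
import re

text = "100 USD, 200 EUR, 300 USD"

# Positive look-ahead: numbers that ARE followed by " USD"
usd_amounts = re.findall(r"\b\d+\b(?= USD)", text)

# Negative look-ahead: numbers that are NOT followed by " USD"
other_amounts = re.findall(r"\b\d+\b(?! USD)", text)

print(usd_amounts)    # ['100', '300']
print(other_amounts)  # ['200']
```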

# Q7. What is the benefit of referring to groups by name rather than by number in a standard expression?

**Ans:**

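Referring to groups by name, with `(?P<name>...)` in the pattern and `match.group('name')` in the code, makes a regular expression self-documenting and more robust: names stay valid when groups are added, removed, or reordered, whereas numbers shift. This improves the clarity and maintainability of the pattern and of the code that uses it.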
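
A minimal sketch of the difference, assuming a date string and an optional prefix that are invented for this example:

```python
import re

date = "2024-03-15"

by_number = re.match(r"(\d{4})-(\d{2})-(\d{2})", date)
by_name = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", date)

print(by_number.group(1))     # 2024
print(by_name.group("year"))  # 2024
print(by_name.groupdict())    # {'year': '2024', 'month': '03', 'day': '15'}

# Adding a group in front shifts the numbers, but the names still work
extended = re.match(
    r"(?P<prefix>[A-Z]+:)?(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    "ISO:2024-03-15",
)
print(extended.group(1))       # ISO:
print(extended.group("year"))  # 2024
```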

# Q8. Can you identify repeated items within a target string using named groups, as in "The cow jumped over the moon"?

**Ans:**

In [21]:
import re

text = "The cow jumped over the moon"

# Define a pattern with a named group and a named backreference to capture repeated words
pattern = r'\b(?P<word>\w+)\b.*\b(?P=word)\b'

# Search for repeated words in the text
matches = re.finditer(pattern, text, re.IGNORECASE)

# Extract and print the repeated words
for match in matches:
    repeated_word = match.group('word')
    print(f"Repeated word: {repeated_word}")

Repeated word: The


# Q9. When parsing a string, what is at least one thing that the Scanner interface does for you that the re.findall feature does not?

**Ans:**

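In a Python context, the scanner in question is most likely `re.Scanner`, an undocumented class in the `re` module; Java's `java.util.Scanner` is an unrelated class with a similar name. A scanner offers a built-in way to tokenize a string, breaking it down into meaningful units or tokens: you give it a list of patterns, each paired with an action, and it walks the string once, runs the action for whichever pattern matches at the current position, and returns the resulting tokens along with any unmatched remainder. This is useful for more complex parsing tasks where you need to process individual components of the input string, such as words, numbers, or symbols. `re.findall` can find patterns, but it does not run per-token actions and it silently skips text that matches nothing.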
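
A minimal sketch of tokenizing with Python's `re.Scanner`. The class is undocumented, so treat this as an illustration of its commonly used behavior rather than a guaranteed API; the token labels and the input string are made up for the example.

```python
import re

# Each entry pairs a pattern with an action; an action of None discards the match
tokenizer = re.Scanner([
    (r"\d+", lambda scanner, tok: ("NUMBER", int(tok))),
    (r"[a-zA-Z_]\w*", lambda scanner, tok: ("NAME", tok)),
    (r"[+\-*/=]", lambda scanner, tok: ("OP", tok)),
    (r"\s+", None),  # skip whitespace
])

# scan() returns the tokens plus whatever text it could not match
tokens, remainder = tokenizer.scan("x = 42 + y ?")

print(tokens)
print(repr(remainder))
```

Here `tokens` holds labeled, already converted values such as `("NUMBER", 42)`, and `remainder` is `"?"`, the point where scanning stopped. `re.findall` would return only strings and would give no signal that the `?` was left unprocessed.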

# Q10. Does a scanner object have to be named scanner?

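**Ans:**

No. `scanner` is only a conventional variable name; the object can be assigned to any valid identifier. In Java, for example, you create a Scanner object with the `new` keyword, specifying `System.in` in the parentheses to get input from the keyboard, and you may store it in a variable with any name you like.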