In [2]:
'''Q1 - Explain the difference between greedy and non-greedy syntax with visual terms in as few words
as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy
one? What character or characters can you introduce or change?'''
#Answer:
'''The difference between greedy and non-greedy syntax in regular expressions is related to how they match patterns:

- Greedy: Matches as much as possible, trying to match the longest possible substring that satisfies the pattern.

- Non-greedy: Matches as little as possible, trying to match the shortest possible substring that satisfies the pattern.

To transform a greedy pattern into a non-greedy one, the bare minimum is a single character: append ? to the quantifier.

Adding ? after a quantifier turns *, +, ? and {m,n} into their non-greedy forms *?, +?, ?? and {m,n}?. Removing that ? makes the quantifier greedy again.'''

#Exp:
import re

text = 'Hello, World!'

# Greedy pattern
greedy_pattern = r'H.*o'
greedy_match = re.search(greedy_pattern, text)
print('Greedy match:', greedy_match.group()) 

# Non-greedy pattern
non_greedy_pattern = r'H.*?o'
non_greedy_match = re.search(non_greedy_pattern, text)
print('Non-greedy match:', non_greedy_match.group())

Greedy match: Hello, Wo
Non-greedy match: Hello
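The same one-character change applies to every quantifier. A short sketch comparing each greedy quantifier with its non-greedy form; the sample strings are only illustrations:

```python
import re

# Each pair: a greedy quantifier, then the same quantifier made non-greedy with '?'
cases = [
    (r'<.*>',     '<b>bold</b>'),
    (r'<.*?>',    '<b>bold</b>'),
    (r'\d+',      '12345'),
    (r'\d+?',     '12345'),
    (r'\d{2,4}',  '12345'),
    (r'\d{2,4}?', '12345'),
    (r'ab?',      'abc'),
    (r'ab??',     'abc'),
]
for pattern, text in cases:
    print(f'{pattern:10} -> {re.search(pattern, text).group()}')
```

Greedy forms take the longest run they can (`<b>bold</b>`, `12345`, `1234`, `ab`); non-greedy forms take the shortest (`<b>`, `1`, `12`, `a`).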


In [4]:
#Q2 - When exactly does greedy versus non-greedy make a difference? What if you're looking for a non-greedy match but the only one available is greedy?
#Answer:
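'''Greedy versus non-greedy matching makes a difference only when a quantifier could stop at more than one place and the overall pattern
would still match, so that several candidate matches start at the same position. Greedy matching then picks the longest candidate, and
non-greedy matching picks the shortest.

If only one match is possible, a non-greedy pattern returns the same result as the greedy one: a non-greedy quantifier still expands as far
as it must for the rest of the pattern to succeed. The choice never decides whether a match is found, only which substring is matched.'''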

#Exp:
import re

text = 'Hello, World! This is a test.'

greedy_pattern = r'H.*o'
greedy_match = re.search(greedy_pattern, text)
print('Greedy match:', greedy_match.group())

non_greedy_pattern = r'H.*?o'
non_greedy_match = re.search(non_greedy_pattern, text)
print('Non-greedy match:', non_greedy_match.group())



Greedy match: Hello, Wo
Non-greedy match: Hello
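To illustrate the second part of the question, here is a sketch with a string (chosen only for illustration) in which just one 'o' follows the 'H', so the non-greedy quantifier has to expand all the way to it:

```python
import re

# Only one 'o' follows the 'H', so both quantifiers must stop at the same place.
text = 'Hi there, Bob'
print(re.search(r'H.*o', text).group())   # Hi there, Bo
print(re.search(r'H.*?o', text).group())  # Hi there, Bo

# When no match exists, neither version finds one.
print(re.search(r'H.*?z', text))  # None
```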


In [6]:
#Q3 - In a simple match of a string, which looks only for one match and does not do any replacement, is the use of a nontagged group likely to make any practical difference?
#Answer:
'''In a simple match of a string where you are only looking for one match and not performing any replacements, using a non-tagged
(non-capturing) group makes no practical difference to whether the pattern matches or to the matched text.

A non-tagged group, written (?:...), groups part of a pattern (for example, so that a quantifier or an alternation applies to the whole
group) without capturing the substring it matches. The visible difference is that it gets no group number, so it does not appear in the
match object's groups().'''

#Exp:
import re

text = 'Hello, World!'

# Pattern with non-tagged group
pattern_with_group = r'Hello(?:, World)!'
match_with_group = re.search(pattern_with_group, text)
print('Match with group:', match_with_group.group())

# Pattern without group
pattern_without_group = r'Hello, World!'
match_without_group = re.search(pattern_without_group, text)
print('Match without group:', match_without_group.group())


Match with group: Hello, World!
Match without group: Hello, World!
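A sketch comparing a tagged and a non-tagged version of the same group makes the point concrete: the match itself is identical, and only the recorded groups differ.

```python
import re

text = 'Hello, World!'
tagged = re.search(r'Hello(, World)!', text)
untagged = re.search(r'Hello(?:, World)!', text)

# The overall match (group 0) is identical either way.
print(tagged.group() == untagged.group())  # True

# Only the tagged version records a numbered group.
print(tagged.groups())    # (', World',)
print(untagged.groups())  # ()
```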


In [8]:
#Q4 - Describe a scenario in which using a nontagged category would have a significant impact on the program's outcomes.
#Answer:
'''One scenario where using a non-tagged category (a non-capturing group) in a regular expression has a significant impact on the program's
outcomes is calling re.findall().

re.findall() returns all non-overlapping matches of a pattern in a string as a list. If the pattern has no capturing groups, each item is
the full matching substring; if it has exactly one capturing group, each item is that group's text; with several groups, each item is a
tuple of the groups. Parentheses added only to group an alternation such as cats|dogs therefore change what findall() returns, unless they
are written as a non-tagged group (?:...).'''

#Exp:
import re

text = 'I have 3 cats and 5 dogs.'

# Pattern with tagged group
pattern_with_group = r'(\d+) (cats|dogs)'
matches_with_group = re.findall(pattern_with_group, text)
print('Matches with group:', matches_with_group)

# Pattern with non-tagged group
pattern_with_non_tagged_group = r'\d+ (?:cats|dogs)'
matches_with_non_tagged_group = re.findall(pattern_with_non_tagged_group, text)
print('Matches with non-tagged group:', matches_with_non_tagged_group)


Matches with group: [('3', 'cats'), ('5', 'dogs')]
Matches with non-tagged group: ['3 cats', '5 dogs']
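The effect is easy to miss when the pattern has a single group, because findall() then returns plain strings that look plausible. A sketch with the same text, plus re.split(), which also returns the text of capturing groups as part of its result:

```python
import re

text = 'I have 3 cats and 5 dogs.'

# One capturing group: findall returns only that group's text (the numbers are lost)
print(re.findall(r'\d+ (cats|dogs)', text))    # ['cats', 'dogs']
# Non-tagged group: findall returns the full matches
print(re.findall(r'\d+ (?:cats|dogs)', text))  # ['3 cats', '5 dogs']

# re.split: a captured separator is kept in the result list
print(re.split(r'(and)', 'cats and dogs'))     # ['cats ', 'and', ' dogs']
print(re.split(r'(?:and)', 'cats and dogs'))   # ['cats ', ' dogs']
```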


In [9]:
#Q5 - Unlike a normal regex pattern, a look-ahead condition does not consume the characters it examines. Describe a situation in which this could make a difference in the results of your programme.
#Answer:
'''The non-consuming nature of a look-ahead condition matters whenever you need to check what follows a match without making those
characters part of the match. The examined characters stay out of the matched text and remain available for later matches. Here's an
example scenario:

Suppose a text contains several lines, and you want to extract each line's text up to and including the word "apple", but only where "apple"
is followed by the word "pie", and without including "pie" in the result.'''

#Exp:
import re

text = '''
I love eating apple pie.
The apple pie is delicious.
I enjoy baking apple strudel.
'''

pattern = r'.*apple(?=\spie)'
matches = re.findall(pattern, text)
print(matches)


['I love eating apple', 'The apple']
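A second situation where non-consumption changes the results is counting overlapping occurrences. A consuming pattern resumes searching after the end of each match, while a look-ahead matches at a position without using up any characters, so the next attempt can reuse them. A minimal sketch; the string 'banana' is only an illustration:

```python
import re

text = 'banana'

# Consuming: the search resumes after each 'ana', so the overlapping one is missed.
print(re.findall(r'ana', text))        # ['ana']

# Look-ahead: a zero-width match at each position; the capturing group inside
# the look-ahead still records the text it examined.
print(re.findall(r'(?=(ana))', text))  # ['ana', 'ana']
```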


In [11]:
#Q6 - In regular expressions, what is the difference between positive look-ahead and negative look-ahead?
#Answer:
'''In regular expressions, positive look-ahead and negative look-ahead are two types of look-ahead assertions that allow you to specify conditions 
that must (positive) or must not (negative) be true following a specific pattern.

Here's the difference between positive look-ahead ((?=...)) and negative look-ahead ((?!...)) in regular expressions:

1. Positive Look-ahead ((?=...)):
Positive look-ahead is used to assert that a specific pattern must be followed by another pattern. It checks whether the subexpression within 
the look-ahead, represented by ..., matches the input immediately after the current position, without consuming the characters. If the 
look-ahead condition is met, the match proceeds.'''

#Exp:
import re

text = 'apple pie'

# Positive look-ahead
pattern = r'apple(?=\spie)'
match = re.search(pattern, text)
if match:
    print('Matched:', match.group())

'''2. Negative Look-ahead ((?!...)):
Negative look-ahead is used to assert that a specific pattern must not be followed by another pattern. It checks whether the subexpression 
within the look-ahead, represented by ..., does not match the input immediately after the current position. If the negative look-ahead 
condition is satisfied (the subexpression does not match), the match proceeds.'''

#Exp:
import re

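text = 'apple tart'

# Negative look-ahead: 'apple' matches only when it is NOT followed by ' pie'
pattern = r'apple(?!\spie)'
match = re.search(pattern, text)
if match:
    print('Matched:', match.group())

# The same pattern finds nothing in 'apple pie'
print('Match in "apple pie":', re.search(pattern, 'apple pie'))


Matched: apple
Matched: apple
Match in "apple pie": None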
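Positive and negative look-aheads can be stacked at the start of a pattern to impose several independent conditions on the same text, since none of them consumes characters. A minimal sketch; the password rule is a hypothetical example, not a recommended policy:

```python
import re

# Hypothetical rule: at least 8 characters, at least one digit (positive
# look-ahead), and no whitespace anywhere (negative look-ahead).
pattern = re.compile(r'^(?=.*\d)(?!.*\s).{8,}$')

for candidate in ('password1', 'password', 'pass word1', 'pw1'):
    print(candidate, '->', bool(pattern.match(candidate)))
```

Only 'password1' passes: 'password' fails the positive look-ahead, 'pass word1' fails the negative one, and 'pw1' is too short.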


In [13]:
#Q7 - What is the benefit of referring to groups by name rather than by number in a regular expression?
#Answer:
'''Referring to groups by name instead of by number in a regular expression offers several benefits:

1. Improved Readability: Using named groups makes the regular expression more readable and self-explanatory. Group names can provide 
meaningful labels or identifiers that describe the captured content, making it easier to understand the purpose of each group.

2. Enhanced Maintenance: Named groups can simplify maintenance and modifications of regular expressions. When the pattern needs to be 
updated or extended, using named groups allows you to refer to the captured content by their logical names, which reduces the risk of errors 
and makes it easier to understand the changes being made.

3. Avoidance of Group Order Dependency: When using numbered groups, the order of the groups matters. If the pattern is modified and the 
group positions change, it may require updating the referencing code accordingly. However, named groups eliminate this dependency on group 
positions, as the references are based on the group names instead.

4. Clearer Code Intention: Named groups make the code more expressive and convey the intention of the regular expression. By using named groups, 
it becomes evident which specific captured content is being accessed or manipulated, enhancing code clarity.

5. Simpler Code Adaptation: If the regular expression pattern is used in different contexts or scenarios, using named groups allows for easier 
adaptation and reuse of the pattern. By referencing groups by name, you can access the captured content consistently, regardless of its 
position within the pattern.'''

#Exp:
import re

text = 'John Doe, age 30'

# Using named groups
pattern_with_names = r'(?P<name>[A-Za-z ]+), age (?P<age>\d+)'
match_with_names = re.match(pattern_with_names, text)
print('Name:', match_with_names.group('name')) 
print('Age:', match_with_names.group('age'))

# Using numbered groups
pattern_with_numbers = r'([A-Za-z ]+), age (\d+)'
match_with_numbers = re.match(pattern_with_numbers, text)
print('Name:', match_with_numbers.group(1))  
print('Age:', match_with_numbers.group(2)) 


Name: John Doe
Age: 30
Name: John Doe
Age: 30
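Group names also work in replacement templates and in bulk lookups, which is where independence from group order (point 3) pays off. A sketch; the date format and sample string are illustrative assumptions:

```python
import re

# Named groups for a month/day/year date (format assumed for illustration)
date_pattern = r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})'

# \g<name> in the replacement refers to groups by name, not position.
print(re.sub(date_pattern, r'\g<year>-\g<month>-\g<day>', 'Due 07/04/2024'))
# Due 2024-07-04

# groupdict() returns every named group at once.
m = re.search(date_pattern, 'Due 07/04/2024')
print(m.groupdict())  # {'month': '07', 'day': '04', 'year': '2024'}
```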


In [19]:
#Q8 - Can you identify repeated items within a target string using named groups, as in "The cow jumped over the moon"?
#Answer:
'''Yes, you can identify repeated items within a target string using named groups in regular expressions. You can define a named group and 
then refer to it later within the pattern to check for repeated occurrences of the same content.'''

#Exp:
import re

text = "the cow jumped over the moon"

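# (?P=word) refers back to the named group, so the look-ahead checks that
# the same whole word appears again later in the string
pattern = r'(?P<word>\b\w+\b)(?=.*\b(?P=word)\b)'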
matches = re.findall(pattern, text)

print(matches)  # Output: ['the']


['the']
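A related use of a named back-reference is catching a word typed twice in a row. A sketch; the doubled "the the" in the sample text is added for illustration:

```python
import re

text = 'the cow jumped over the the moon'

# A word, whitespace, then the same word again via the named back-reference (?P=word)
doubled = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

for m in doubled.finditer(text):
    print('Doubled word:', m.group('word'), 'at index', m.start())
# Doubled word: the at index 20

# Collapse each doubled pair to a single copy
print(doubled.sub(r'\g<word>', text))  # the cow jumped over the moon
```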


In [None]:
#Q9 - When parsing a string, what is at least one thing that the Scanner interface does for you that the re.findall feature does not?
#Answer:
'''When parsing a string, Python's re.Scanner class offers something re.findall() does not: it works with different token types, not just
matched text.

Here are some things a Scanner does beyond what re.findall() provides:

1. Token typing: a Scanner is built from a lexicon, a list of (pattern, action) pairs. For each token it knows which pattern matched and
calls that pattern's action with the scanner and the token text, so each token can be labeled or converted according to its type (for
example, turning digit strings into ints). re.findall() returns only the matched text or its groups, with no indication of which
alternative matched.

2. Discarding tokens: an action of None makes the scanner drop that token (typically whitespace), so unwanted tokens never reach the
result list.

3. Error detection: the scanner matches tokens one after another from the start of the string and stops at the first character that no
pattern matches. scan() returns both the result list and the unscanned remainder, so malformed input is reported. re.findall() simply
skips text that does not match.'''
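In Python, the scanner in question is the re.Scanner class. A minimal tokenizer sketch; the token labels and input strings are hypothetical illustrations:

```python
import re

# Lexicon: (pattern, action) pairs. A callable action receives the scanner
# and the matched text; None discards the token.
scanner = re.Scanner([
    (r'\d+', lambda s, tok: ('NUMBER', int(tok))),
    (r'[A-Za-z_]\w*', lambda s, tok: ('NAME', tok)),
    (r'[+\-*/=]', lambda s, tok: ('OP', tok)),
    (r'\s+', None),  # skip whitespace
])

tokens, remainder = scanner.scan('total = price * 3 + 12')
print(tokens)
print(repr(remainder))  # ''

# Unrecognized input stops the scan; the unscanned rest is returned.
tokens, remainder = scanner.scan('x = 5 $ 2')
print(tokens)           # [('NAME', 'x'), ('OP', '='), ('NUMBER', 5)]
print(repr(remainder))  # '$ 2'

# findall, by contrast, silently skips the '$'.
print(re.findall(r'\d+|[A-Za-z_]\w*|[+\-*/=]', 'x = 5 $ 2'))  # ['x', '=', '5', '2']
```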

In [None]:
#Q10 - Does a scanner object have to be named scanner?
#Answer:
'''No. A Scanner object does not have to be named scanner. The name is just the variable that the object created by re.Scanner(...) is
assigned to, so any valid Python identifier, such as lexer or tokenizer, works equally well.'''