<b>Q1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy one? What character or characters can you introduce or change?</b>

Ans: 
Greediness refers to how many times the regex engine will try to match a given set of characters. Greediness is expressed through the quantifiers *, +, ? and {}.

Consider

str = "asdfasdfbbbb"
r1 = /b/
r2 = /(asdf)*/
r3 = /b{3}/
r4 = /.*/
Matching these regexes against str results in:

r1 matching "asdfasdf b bbb" (no quantifier, so greediness does not apply; matches b just once)
r2 matching "asdfasdf bbbb" (greedy, matches asdf as many times as possible)
r3 matching "asdfasdf bbb b" (exact count, so greediness does not apply; matches b exactly 3 times)
r4 matching "asdfasdfbbbb" (greedy, matches almost any character as many times as possible)

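Because a regex is a means of representing a specific text pattern, greediness is not a matter of style; it depends on what you need to match. Sometimes you need foo exactly three times (/(foo){3}/), and sometimes bar any number of times (/(bar)*/). To turn a greedy quantifier into a non-greedy one, append a ? to it, for example .* becomes .*?.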
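
The comparison above can be sketched in Python with the standard re module. This is a minimal illustration, not the original author's code: the patterns r1 to r4 are copied from the example, while the b+ and b+? pair is an assumption added here to show the one-character change from greedy to non-greedy.

```python
import re

text = "asdfasdfbbbb"

# Patterns from the example above, written as Python raw strings.
patterns = {
    "r1": r"b",        # no quantifier: a single b
    "r2": r"(asdf)*",  # greedy: as many repetitions of asdf as possible
    "r3": r"b{3}",     # exact count: exactly three b characters
    "r4": r".*",       # greedy: as many characters as possible
}

# re.search returns the leftmost match; group(0) is the matched text.
results = {name: re.search(p, text).group(0) for name, p in patterns.items()}

# Illustrative addition: appending ? makes a quantifier non-greedy.
greedy_bs = re.search(r"b+", text).group(0)  # takes every b it can
lazy_bs = re.search(r"b+?", text).group(0)   # stops after the first b

for name, matched in results.items():
    print(name, repr(matched))
print("b+ ", repr(greedy_bs))
print("b+?", repr(lazy_bs))
```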



<b>Q2. When exactly does greedy versus non-greedy make a difference?  What if you're looking for a non-greedy match but the only one available is greedy?
</b>

Ans: Consider this Perl example:

my $string = 'bcdabdcbabcd';

$string =~ m/^(.*)ab/;

print "$1\n"; # prints: bcdabdcb

The * is greedy; therefore, the .* portion of the regex will match as much as it can and still allow the remainder of the regex to match. In this case, it will match everything up to the last 'ab'. Actually, the .* will match right to the end of the string, and then start backing up until it can match an 'ab' (this is called backtracking).

To make the quantifier non-greedy you simply follow it with a '?' symbol:

my $string = 'bcdabdcbabcd';

$string =~ m/^(.*?)ab/;

print "$1\n"; # prints: bcd

In this case the .*? portion attempts to match the least amount of data while allowing the remainder of the regex to match. Here the regex engine will match the beginning of the string, then it will try to match zero of anything and check to see if the rest can match (that fails). Next, it will match the 'b' and then check again if the 'ab' can match (still fails). This continues until the .*? has matched the first 3 characters and then the following 'ab' is matched.
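
For comparison, here is a rough Python equivalent of the two Perl snippets above, using the same string and patterns. The second string (single_ab) is a hypothetical addition: it contains only one 'ab', so there is only one way to match and the greedy and non-greedy forms give the same result.

```python
import re

text = "bcdabdcbabcd"

greedy = re.match(r"^(.*)ab", text)  # .* grabs as much as possible
lazy = re.match(r"^(.*?)ab", text)   # .*? grabs as little as possible

greedy_prefix = greedy.group(1)
lazy_prefix = lazy.group(1)
print(greedy_prefix)
print(lazy_prefix)

# Hypothetical string with a single 'ab': both forms must stop at the same place.
single_ab = "bcdab"
same_greedy = re.match(r"^(.*)ab", single_ab).group(1)
same_lazy = re.match(r"^(.*?)ab", single_ab).group(1)
print(same_greedy == same_lazy)
```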

You can make any of the standard quantifiers that aren't exact non-greedy by appending a '?' symbol to them: *?, +?, ??, {n,m}?, and {n,}?.
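
As a quick check of that list, the sketch below runs each quantifier and its non-greedy form against the same input in Python. The test string "aaaaa" and the specific bounds {2,4} and {2,} are assumptions chosen only for illustration.

```python
import re

text = "aaaaa"

# Each pair is (greedy pattern, non-greedy pattern).
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,4}", r"a{2,4}?"),
    (r"a{2,}", r"a{2,}?"),
]

# Map every pattern to the text it matches at the start of the string.
matches = {
    pattern: re.match(pattern, text).group(0)
    for pair in pairs
    for pattern in pair
}

for greedy_pattern, lazy_pattern in pairs:
    print(greedy_pattern, repr(matches[greedy_pattern]),
          "|", lazy_pattern, repr(matches[lazy_pattern]))
```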

One thing to watch out for: given a pattern such as /^(.*?)%(.*?)%/ one could match and extract the first two fields of a line of %-separated data:

#!/usr/bin/perl -w

use strict;

$_ = 'Johnson%Andrew%AX321%37';

m/^(.*?)%(.*?)%/;

print "$2 $1\n";

And one can easily begin to think of each subexpression as meaning 'match up to the next % symbol', but that isn't exactly what it means. Let's say that the third field represents an ID tag and we want to extract only those names of people with ID tags starting with 'A'.
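
The pitfall can be sketched in Python as follows. The record below is hypothetical (its ID tag starts with 'B', and the last field was changed so that '%A' appears later in the line), and the trailing %A in both patterns is an assumed way of expressing "ID tag starts with A". A non-greedy group is allowed to run past a % if that is the only way for the whole pattern to match, whereas a negated character class such as [^%]* cannot.

```python
import re

# Hypothetical record: the ID tag (third field) starts with B, not A.
record = "Johnson%Andrew%BX321%A37"

# Non-greedy groups may still expand across % so the overall match succeeds.
loose = re.match(r"^(.*?)%(.*?)%A", record)

# A negated character class can never cross a % separator.
strict = re.match(r"^([^%]*)%([^%]*)%A", record)

loose_groups = loose.groups() if loose else None
strict_groups = strict.groups() if strict else None

print(loose_groups)   # second group has swallowed a field separator
print(strict_groups)  # no match, which is the desired outcome here
```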



<b>Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is the use of a nontagged group likely to make any practical difference?
</b>

<b>Q4. Describe a scenario in which using a nontagged group would have a significant impact on the program's outcomes.
</b>

<b>Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it examines. Describe a situation in which this could make a difference in the results of your programme.
</b>

<b>Q6. In regular expressions, what is the difference between positive look-ahead and negative look-ahead?</b>

Ans: Positive lookahead:

The syntax is X(?=Y), which means "look for X, but match only if it is followed by Y". X and Y can be any patterns.

For an integer number followed by €, the regexp will be \d+(?=€).
    
When we look for X(?=Y), the regular expression engine finds X and then checks if there’s Y immediately after it. If it’s not so, then the potential match is skipped, and the search continues.
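
A minimal Python sketch of this pattern is shown below. The sample sentence is an assumption made for illustration. It also shows that the lookahead does not consume the € sign: the matched text ends before it.

```python
import re

# Assumed sample text: one quantity and one price in euros.
text = "1 turkey costs 30€"

# Only numbers immediately followed by € are returned.
prices = re.findall(r"\d+(?=€)", text)

# The lookahead checks for € but leaves it out of the match.
m = re.search(r"\d+(?=€)", text)
matched_text = m.group(0)
rest = text[m.end():]  # whatever follows the match in the string

print(prices)
print(repr(matched_text), repr(rest))
```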

More complex tests are possible, e.g. X(?=Y)(?=Z) means:

1. Find X.
2. Check if Y is immediately after X (skip if it isn’t).
3. Check if Z is also immediately after X (skip if it isn’t).
4. If both tests pass, then X is a match; otherwise continue searching.

In other words, such a pattern means that we’re looking for X followed by Y and Z at the same time.

That’s only possible if patterns Y and Z aren’t mutually exclusive.

For example, \d+(?=\s)(?=.*30) looks for \d+ that is followed by a space (?=\s), and there’s 30 somewhere after it (?=.*30).
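
The two stacked lookaheads can be tried out in Python as sketched here. Both sample strings are assumptions; the second one contains no 30, so the second condition fails and nothing matches.

```python
import re

pattern = r"\d+(?=\s)(?=.*30)"

# Assumed sample texts for illustration.
with_30 = "1 turkey costs 30€"
without_30 = "1 turkey costs 25€"

# "1" is followed by a space and 30 appears later, so it matches.
hits = re.findall(pattern, with_30)

# No 30 anywhere after the number, so the second lookahead fails.
misses = re.findall(pattern, without_30)

print(hits)
print(misses)
```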
    
Negative lookahead:
    
Let’s say that we want a quantity rather than a price. That’s a number \d+ that is NOT followed by €.

For that, a negative lookahead can be applied.

The syntax is: X(?!Y), it means "search X, but only if not followed by Y".

let str = "2 turkeys cost 60€";

alert( str.match(/\d+\b(?!€)/g) ); // 2 (the price is not matched)
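
The same negative lookahead can be written in Python, as in the sketch below, which reuses the string from the JavaScript example. The variant without \b is an addition for illustration: it shows that, without the word boundary, the engine can backtrack to a shorter number ("6") that is not directly followed by €.

```python
import re

text = "2 turkeys cost 60€"

# \b forces the whole number to be taken before the lookahead is checked.
quantities = re.findall(r"\d+\b(?!€)", text)

# Without \b, "60" fails the lookahead but the shorter "6" passes it.
without_boundary = re.findall(r"\d+(?!€)", text)

print(quantities)
print(without_boundary)
```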


<b>Q7. What is the benefit of referring to groups by name rather than by number in a regular expression?
</b>

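Ans: In Python, the (?P<group_name>…) syntax allows one to refer to the matched string through its name. A name documents what the group captures, and it stays valid if groups are later added or reordered, whereas a number would shift:

import re
# Each named group captures one field of the input.
match = re.search(r'(?P<name>.*) (?P<phone>.*)', 'John 123456')
match.group('name')   # 'John'
match.group('phone')  # '123456'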


<b>Q8. Can you identify repeated items within a target string using named groups, as in "The cow jumped over the moon"?
</b>

Ans: To extract the uppercase word and the number from the target string, we must first write two regular expression patterns:

A pattern to match the uppercase word (The cow jumped over the moon).
A pattern to match the number (20).

The first group pattern, which searches for an uppercase word, is [A-Z]+.

[A-Z] is a character class. It matches any single letter from capital A to capital Z, uppercase only.
The + metacharacter then indicates one or more occurrences of an uppercase letter.
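
For the repeated-item part of the question, one possible approach (a sketch, not necessarily the intended answer) is to pair a named group with a named backreference, (?P=name), which matches the same text the group captured earlier. The group name word, the re.IGNORECASE flag and the second sample sentence are assumptions made here so that "The" and "the" count as the same item.

```python
import re

# (?P<word>...) captures a word; (?P=word) requires that same word again later.
repeated_word = re.compile(r"\b(?P<word>\w+)\b.*\b(?P=word)\b", re.IGNORECASE)

text = "The cow jumped over the moon"
found = repeated_word.search(text)
first_repeat = found.group("word") if found else None

# Hypothetical sentence with no repeated word, for contrast.
no_repeat = repeated_word.search("A cow jumped")

print(first_repeat)
print(no_repeat)
```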


<b>Q9. When parsing a string, what is at least one thing that the Scanner interface does for you that the re.findall function does not?
</b>

Ans: re.search() either returns None (if the pattern doesn’t match) or a re.Match object that contains information about the matching part of the string. It stops after the first match, so it is better suited to testing a regular expression than to extracting data.

re.findall() returns all non-overlapping matches of the pattern in the string, as a list of strings. The string is scanned left to right, and matches are returned in the order found.
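
The sketch below contrasts the two functions with re.Scanner. Note the assumptions: re.Scanner is an undocumented class in CPython's re module, so its behaviour is described here from its implementation rather than from official documentation, and the token names INT and WORD and the sample text are made up for illustration. Unlike re.findall, a scanner takes several patterns at once, calls a function on each token as it is found, and also returns whatever part of the string it could not match.

```python
import re

# Assumed sample text for illustration.
text = "3 cows, 12 moons"

first = re.search(r"\d+", text)         # Match object for the first hit, or None
all_numbers = re.findall(r"\d+", text)  # list of every non-overlapping hit

# Each entry is (pattern, action). The action receives the scanner and the
# token text; an action of None means the matched text is skipped.
scanner = re.Scanner([
    (r"\d+", lambda sc, tok: ("INT", int(tok))),
    (r"[a-z]+", lambda sc, tok: ("WORD", tok)),
    (r"[\s,]+", None),
])

# scan() returns the processed tokens and the unmatched remainder.
tokens, remainder = scanner.scan(text)

print(first.group(0))
print(all_numbers)
print(tokens, repr(remainder))
```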



<b>Q10. Does a scanner object have to be named scanner?</b>

Ans: No. A scanner object is an ordinary variable, so it can have any valid name. In Python, an object is created and bound to a name like this:

myvar = MyClass(name="example")

In Java, the syntax is very similar:

Scanner in = new Scanner(System.in);

And just like in Python, we can now call methods on it, such as:

double fahr = in.nextDouble();

This is just about identical to what the equivalent Python code would look like.
