1.What is the name of the feature responsible for generating Regex objects in Python?
Sol:
    The feature responsible for generating Regex objects in Python is the re.compile() function, which lives in the built-in "re" module. The "re" module provides support for regular expressions, and re.compile() builds a Regex (pattern) object from a regular expression pattern. You can then use the methods of that object, such as search(), match(), findall(), and sub(), to search, match, and manipulate text.
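
A minimal sketch of the above. The pattern and sample text are illustrative assumptions, not part of the original question, and re.Pattern is assumed to be available as a public name (Python 3.8 and later):

```python
import re

# Illustrative pattern: three digits, a hyphen, four digits
phone_regex = re.compile(r"\d{3}-\d{4}")

# re.compile() returns a compiled pattern (Regex) object
print(isinstance(phone_regex, re.Pattern))  # True

# The object's search() method looks for the pattern anywhere in the text
match = phone_regex.search("Call 555-1234 today")
print(match.group())  # 555-1234
```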
2.Why do raw strings often appear in Regex objects?

Sol:
Raw strings are often used in regular expressions because they allow you to avoid backslashes being interpreted as escape characters. In regular expressions, backslashes are commonly used to escape special characters, such as the dot (.) or the asterisk (*), or to indicate special sequences, such as \d for digits or \s for whitespace.

However, Python also uses backslashes as escape characters in strings. This means that if you use a regular string to define a regular expression pattern, you need to write a double backslash (\\) instead of a single backslash (\) for every backslash in the pattern.

To avoid this confusion and make the regular expression pattern more readable, you can use a raw string, which is a string that is prefixed with the letter "r". Raw strings in Python do not interpret backslashes as escape characters, so you can use a single backslash to escape special characters in the regular expression pattern.

For example, to match a literal dot you could write the pattern as the raw string r"\.". Without the "r" prefix, you would need to write the pattern as "\\.", which is less readable and more error-prone.
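
The equivalence can be checked directly. This is a small sketch with an illustrative pattern (a decimal number) and sample text chosen for the example:

```python
import re

plain = "\\d+\\.\\d+"   # regular string: every backslash is doubled
raw = r"\d+\.\d+"       # raw string: backslashes are written once

# Both literals produce exactly the same characters
print(plain == raw)  # True

decimal = re.search(raw, "pi is 3.14").group()
print(decimal)  # 3.14
```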
3. What is the return value of the search() method?

Sol:
The search() method in Python's regular expression module (re) returns a match object if the regular expression pattern matches any part of the string being searched. If the pattern is not found, the search() method returns None.

The match object returned by search() can be used to obtain information about the match, such as the starting and ending positions of the match (start(), end(), span()), the matched text (group()), and any captured groups in the pattern (groups()).
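
A short sketch of both outcomes, using sample strings made up for this illustration:

```python
import re

found = re.search(r"\d+", "Order 66 shipped")
missing = re.search(r"\d+", "No digits here")

print(found.group())  # 66
print(found.span())   # (6, 8)
print(missing)        # None
```

Because search() may return None, it is usually safest to test the result (for example with `if found:`) before calling methods on it.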

4.From a Match object, how do you get the actual strings that match the pattern?
Sol:
To get the actual strings that match the pattern from a Match object in Python, you can use the group() method.

If the regular expression pattern contains one or more capturing groups, you can pass an argument to group() specifying the index of the group you want to retrieve. Group 0 is always the entire match, and subsequent groups are numbered based on the order of their opening parentheses in the pattern.
5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover?
Group 2? Group 1?
Sol:
In the regular expression r'(\d\d\d)-(\d\d\d-\d\d\d\d)', group zero covers the entire match, group 1 covers the first capturing group, and group 2 covers the second capturing group.

Here's what each group covers:

Group 0: The entire match, which is the substring that matches the entire regular expression pattern.
Group 1: The first capturing group, which is the three digits before the hyphen in the input string.
Group 2: The second capturing group, which is everything after the first hyphen: three digits, a hyphen, and four digits.
Here's an example of how to use groups in Python to extract the captured text:

In [1]:
import re

text = "My phone number is 123-456-7890."
pattern = r'(\d\d\d)-(\d\d\d-\d\d\d\d)'

match = re.search(pattern, text)

if match:
    # Print the entire match
    print("Entire match:", match.group(0))
    
    # Print the first capturing group
    print("Group 1:", match.group(1))
    
    # Print the second capturing group
    print("Group 2:", match.group(2))
else:
    print("Match not found.")

Entire match: 123-456-7890
Group 1: 123
Group 2: 456-7890


6.In regular expression syntax, parentheses and periods have special meanings. How can you tell
a regex that you want it to match real parentheses and periods?
Sol:
In regular expressions, parentheses and periods (also known as dots) have special meanings, which means that if you want to match them as literal characters, you need to escape them with a backslash \.

To match real parentheses and periods using regular expressions in Python, you can use the backslash character \ to escape them. Here's an example that shows how to match a string containing parentheses and periods:

In [2]:
import re

text = "This is a (test) string with periods. And more periods..."

# Use backslashes to escape the parentheses and periods
pattern = r"\(test\) string with periods\. And more periods\.\.\."

match = re.search(pattern, text)

if match:
    print("Match found!")
else:
    print("Match not found.")


Match found!


7. The findall() method returns a string list or a list of string tuples. What causes it to return one of
the two options?
Sol:
    The findall() method in Python's re module returns either a list of strings or a list of string tuples depending on whether or not the regular expression pattern contains capturing groups.

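If the regular expression pattern contains no capturing groups (i.e., no parentheses), then findall() returns a list of strings, where each string in the list is a non-overlapping match for the pattern in the input string.

If the pattern contains exactly one capturing group, findall() still returns a list of strings, but each string is the text captured by that group. If the pattern contains two or more capturing groups, findall() returns a list of tuples, with one tuple per match and one string per group.

For example, the following pattern has no groups, so findall() returns a list of strings: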

In [3]:
import re

text = "The quick brown fox jumps over the lazy dog."
pattern = r"\w+"

matches = re.findall(pattern, text)

print(matches)

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
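
The tuple-returning case is not shown above. The following sketch contrasts the cases; the phone numbers are invented sample data:

```python
import re

text = "Cell: 415-555-9999 Work: 212-555-0000"

# No groups: a list of whole matches
no_groups = re.findall(r"\d{3}-\d{3}-\d{4}", text)
print(no_groups)   # ['415-555-9999', '212-555-0000']

# One group: a list of the strings captured by that group
one_group = re.findall(r"(\d{3})-\d{3}-\d{4}", text)
print(one_group)   # ['415', '212']

# Two groups: a list of tuples, one string per group
two_groups = re.findall(r"(\d{3})-(\d{3}-\d{4})", text)
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
```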


8.In regular expressions, what does the | character mean?
Sol:
In regular expressions, the | (vertical bar or pipe) character is used to specify alternation, which means "either this or that". It is a metacharacter that allows you to specify a choice between two or more patterns.

For example, the regular expression A|B matches either the character "A" or the character "B" in the input text. The vertical bar separates the two alternatives, and the regular expression engine will try to match the input text against each alternative in turn. If either alternative matches, the match succeeds.

Here's an example that shows how to use the | character in a regular expression:

In [4]:
import re

text = "The quick brown fox jumps over the lazy dog."

# Match either "fox" or "dog"
pattern = r"fox|dog"

matches = re.findall(pattern, text)

print(matches)


['fox', 'dog']


9. In regular expressions, what does the . character stand for?
Sol:
    In regular expressions, the dot or period (.) character is a metacharacter that matches any single character except for a newline character (\n).

For example, the regular expression a.b matches a string that contains an "a", followed by any single character, followed by a "b". This pattern would match strings like "axb", "a4b", "acb", but not "ab" (because there's no character between the "a" and "b").

Here's an example that shows how to use the dot character in a regular expression:

In [5]:
import re

text = "The quick brown fox jumps over the lazy dog."

# Match "q", then any three characters, then "k"
pattern = r"q...k"

matches = re.findall(pattern, text)

print(matches)


['quick']


10.In regular expressions, what is the difference between the + and * characters?
Sol:
    In regular expressions, the + and * characters are quantifiers that specify how many times the preceding character or group should be matched.

The + character matches one or more occurrences of the preceding character or group. For example, the regular expression a+ matches one or more "a" characters. It would match strings like "a", "aa", "aaa", and so on, but not an empty string.

The * character matches zero or more occurrences of the preceding character or group. For example, the regular expression a* matches zero or more "a" characters. Because it can match the empty string, it succeeds at every position of any input, even where there is no "a" at all.

Here's an example that shows the difference between the + and * characters in a regular expression:

In [6]:
import re

text = "The quick brown fox jumps over the lazy dog."

# Match any sequence of "o" characters
pattern1 = r"o+"

# Match any sequence of "o" characters, including zero
pattern2 = r"o*"

matches1 = re.findall(pattern1, text)
matches2 = re.findall(pattern2, text)

print(matches1)
print(matches2)


['o', 'o', 'o', 'o']
['', '', '', '', '', '', '', '', '', '', '', '', 'o', '', '', '', '', 'o', '', '', '', '', '', '', '', '', 'o', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'o', '', '', '']


11. What is the difference between {4} and {4,5} in regular expression?
Sol:
    In regular expressions, curly braces {} are used as quantifiers to specify the number of occurrences of a character or group.

{4} means that the preceding character or group should be matched exactly 4 times.

For example, the regular expression a{4} matches exactly 4 consecutive "a" characters. Tested against a whole string, it accepts "aaaa" but not "aa" or "aaaaa"; when searching inside "aaaaa", it matches only the first four characters.

{4,5} means that the preceding character or group should be matched at least 4 times and at most 5 times.

For example, the regular expression a{4,5} matches either 4 or 5 consecutive "a" characters. Tested against a whole string, it accepts "aaaa" or "aaaaa" but not "aa" or "aaaaaa"; because the quantifier is greedy, a search prefers 5 characters when they are available.

In summary, the difference between {4} and {4,5} is that the former matches exactly 4 occurrences, while the latter matches between 4 and 5 occurrences.
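
A brief sketch of the difference. It uses re.fullmatch() so that the whole string must fit the pattern; the test strings are illustrative:

```python
import re

exact_ok = re.fullmatch(r"a{4}", "aaaa") is not None
exact_too_long = re.fullmatch(r"a{4}", "aaaaa") is not None
range_ok = re.fullmatch(r"a{4,5}", "aaaaa") is not None
range_too_long = re.fullmatch(r"a{4,5}", "aaaaaa") is not None

print(exact_ok, exact_too_long)  # True False
print(range_ok, range_too_long)  # True False

# With search(), the greedy {4,5} takes five characters when it can
greedy_hit = re.search(r"a{4,5}", "aaaaaa").group()
print(greedy_hit)  # aaaaa
```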

12. What do the \d, \w, and \s shorthand character classes signify in regular
expressions?
Sol:
    In regular expressions, shorthand character classes are used to match specific types of characters. Here are what the \d, \w, and \s shorthand character classes signify:

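\d - This matches any digit character. For ASCII text it is equivalent to the character class [0-9]. With Python 3 string patterns it also matches digits from other scripts unless the re.ASCII flag is used.

\w - This matches any word character, including alphanumeric characters and underscores. For ASCII text it is equivalent to the character class [a-zA-Z0-9_]. With Python 3 string patterns it also matches letters and digits from other scripts unless the re.ASCII flag is used.

\s - This matches any whitespace character, including spaces, tabs, and newlines. For ASCII text it is equivalent to the character class [ \t\n\r\f\v] (note the space at the start of the class).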

These shorthand character classes are very useful when you need to match a large number of characters that belong to a specific category. For example, if you want to match a phone number, you can use the shorthand character class \d to match any digit character, instead of having to specify all possible digit characters (0-9) individually. Similarly, if you want to match an email address, you can use the shorthand character class \w to match any alphanumeric character or underscore, instead of having to specify all possible alphanumeric characters and underscore characters individually.
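
A small sketch of the three classes on a made-up sample string. The last two lines illustrate the effect of the re.ASCII flag on \w using the accented letter é (written as an escape so the example does not depend on file encoding):

```python
import re

text = "Room 42,\tfloor_3\n"

digits = re.findall(r"\d", text)
words = re.findall(r"\w+", text)
spaces = re.findall(r"\s", text)

print(digits)  # ['4', '2', '3']
print(words)   # ['Room', '42', 'floor_3']
print(spaces)  # [' ', '\t', '\n']

# By default \w is Unicode-aware; re.ASCII restricts it to [a-zA-Z0-9_]
unicode_hit = re.findall(r"\w", "\u00e9")
ascii_hit = re.findall(r"\w", "\u00e9", re.ASCII)
print(len(unicode_hit), len(ascii_hit))  # 1 0
```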

13.What do the \D, \W, and \S shorthand character classes signify in regular expressions?
Sol:
    In regular expressions, the uppercase versions of the shorthand character classes \d, \w, and \s are used to match characters that are not in those classes. Here is what the \D, \W, and \S shorthand character classes signify:

\D - This matches any character that is not a digit character. For ASCII text it is equivalent to the character class [^0-9].

\W - This matches any character that is not a word character. For ASCII text it is equivalent to the character class [^a-zA-Z0-9_].

\S - This matches any character that is not a whitespace character. For ASCII text it is equivalent to the character class [^ \t\n\r\f\v] (note the space after the caret).

These uppercase shorthand character classes are useful when you need to match any character that is not in a specific category. For example, if you want to match a string that does not contain any digit characters, you can use the \D shorthand character class to match any non-digit character. Similarly, if you want to match a string that does not contain any whitespace characters, you can use the \S shorthand character class to match any non-whitespace character.
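
The negated classes can be sketched the same way, again on an invented sample string:

```python
import re

text = "Room 42, floor_3"

non_digits = re.findall(r"\D+", text)
non_words = re.findall(r"\W+", text)
non_spaces = re.findall(r"\S+", text)

print(non_digits)  # ['Room ', ', floor_']
print(non_words)   # [' ', ', ']
print(non_spaces)  # ['Room', '42,', 'floor_3']
```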
14.What is the difference between .* and .*?
Sol:
The two expressions differ only in the trailing question mark, which switches the match from greedy to non-greedy.

In regular expressions, .* is a greedy match that matches zero or more occurrences of any character (except for newline) until it reaches the end of the line or the end of the input string. This means that it will match as many characters as possible, which can cause issues in certain situations.

On the other hand, .*? is a non-greedy match that matches zero or more occurrences of any character (except for newline) until it reaches the next character in the pattern. This means that it will match as few characters as possible, which can be useful in situations where you want to match the smallest possible substring.

For example, consider the string "abc123def456". The regular expression .*\d would match the entire string because the .* would match "abc123def45" and the \d would match the final digit "6". However, the regular expression .*?\d would match only "abc1" because the .*? would match only the characters up to the first digit it encounters, which is "1".

In summary, .* is a greedy match that matches as many characters as possible, while .*? is a non-greedy match that matches as few characters as possible.
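
The example string from the explanation can be run as follows. The second half is an additional illustration with an invented HTML-like string, where the difference is often easier to see:

```python
import re

text = "abc123def456"
greedy = re.search(r".*\d", text).group()
lazy = re.search(r".*?\d", text).group()
print(greedy)  # abc123def456
print(lazy)    # abc1

markup = "<b>bold</b>"
greedy_tag = re.search(r"<.*>", markup).group()
lazy_tag = re.search(r"<.*?>", markup).group()
print(greedy_tag)  # <b>bold</b>
print(lazy_tag)    # <b>
```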
15. What is the syntax for matching both numbers and lowercase letters with a character class?
Sol:
To match both numbers and lowercase letters with a character class in regular expressions, put both ranges inside one pair of square brackets: [0-9a-z] (or, equivalently, [a-z0-9]). The shorthand \d can also be used inside the brackets, as in [a-z\d].
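
A minimal sketch of such a character class, applied to a sample sentence made up for this illustration:

```python
import re

# One class covering digits and lowercase letters
tokens = re.findall(r"[0-9a-z]+", "User42 logged IN at 9am")
print(tokens)  # ['ser42', 'logged', 'at', '9am']
```

The uppercase "U" and "IN" are skipped because the class contains no uppercase range.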
16. What is the procedure for making a regular expression case-insensitive?
Sol:
To make a regular expression case-insensitive in Python, you can use the re.IGNORECASE or re.I flag when compiling the regular expression pattern.
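
A short sketch of the flag in use. The pattern name and sample strings are illustrative; the last line shows the inline (?i) form, which is an alternative to passing the flag:

```python
import re

robocop = re.compile(r"robocop", re.IGNORECASE)

mixed = robocop.search("RoboCop is part man, part machine").group()
upper = robocop.search("ROBOCOP protects the innocent").group()
print(mixed)  # RoboCop
print(upper)  # ROBOCOP

# Inline flag at the start of the pattern has the same effect
inline = re.search(r"(?i)robocop", "Why does RoboCop exist?").group()
print(inline)  # RoboCop
```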
17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd
argument in re.compile()?
Sol:
In regular expressions, the . character (dot) normally matches any character except a newline (\n). However, if the re.DOTALL (or re.S) flag is passed as the second argument to the re.compile() function, then the dot will match any character, including newlines.
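
A minimal sketch of the difference, using a two-line sample string:

```python
import re

text = "first line\nsecond line"

# Default: . stops at the newline
default_match = re.compile(r".*").search(text).group()
print(default_match)  # first line

# With re.DOTALL: . also matches the newline
dotall_match = re.compile(r".*", re.DOTALL).search(text).group()
print(repr(dotall_match))  # 'first line\nsecond line'
```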
18.If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4
hen') return?
Sol:
If numRegex = re.compile(r'\d+'), then numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') will return the string 'X drummers, X pipers, five rings, X hen'.

Here's how this works:

The numRegex regular expression pattern matches one or more digits (\d+).
The sub() method of the compiled regular expression object numRegex replaces all matches of the pattern with the replacement string 'X'.
The input string '11 drummers, 10 pipers, five rings, 4 hen' contains three substrings that match the \d+ pattern: '11', '10', and '4'. The substring 'five' does not match because it contains no digits.
The sub() method replaces each of the matching substrings with the replacement string 'X', resulting in the string 'X drummers, X pipers, five rings, X hen'.
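
The call can be run as written in the question. The subn() call at the end is an addition for illustration; it returns the new string together with the number of replacements made:

```python
import re

numRegex = re.compile(r"\d+")
source = "11 drummers, 10 pipers, five rings, 4 hen"

result = numRegex.sub("X", source)
print(result)  # X drummers, X pipers, five rings, X hen

# subn() also reports how many substitutions happened
replaced, count = numRegex.subn("X", source)
print(count)  # 3
```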
19.What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?
Sol:
Passing re.VERBOSE as the second argument to re.compile() in Python allows you to create a regular expression pattern with whitespace and comments.

Normally, in a regular expression pattern, whitespace characters (spaces, tabs, and newlines) are significant and are interpreted as part of the pattern. This can make complex patterns difficult to read and understand.

By using the re.VERBOSE flag, you can include whitespace and comments in your pattern to make it more readable. In this mode, whitespace characters in the pattern are ignored, except when they are escaped with a backslash, and comments starting with # are allowed. This makes it easier to write and maintain complex regular expressions.
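
As a sketch, here is a phone-number pattern spread over several commented lines. The pattern, its name, and the sample text are illustrative assumptions:

```python
import re

phone_regex = re.compile(r"""
    (\d{3})   # area code
    -         # separator
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
""", re.VERBOSE)

match = phone_regex.search("Call 415-555-9999 now")
print(match.groups())  # ('415', '555', '9999')
```

Without re.VERBOSE, the spaces, newlines, and # comments inside the triple-quoted string would be treated as part of the pattern and the search would fail.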
20.How would you write a regex that matches a number with a comma for every three digits? It must
match the following:
'42'
'1,234'
'6,368,745'

but not the following:
'12,34,567' (which has only two digits between the commas)
'1234' (which lacks commas)
Sol:

In [7]:
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')
strings = ['42', '1,234', '6,368,745', '12,34,567', '1234']

for s in strings:
    if pattern.match(s):
        print(f"{s} is a valid number with commas")
    else:
        print(f"{s} is not a valid number with commas")


42 is a valid number with commas
1,234 is a valid number with commas
6,368,745 is a valid number with commas
12,34,567 is not a valid number with commas
1234 is not a valid number with commas


21.How would you write a regex that matches the full name of someone whose last name is
Watanabe? You can assume that the first name that comes before it will always be one word that
begins with a capital letter. The regex must match the following:
'Haruto Watanabe'
'Alice Watanabe'
'RoboCop Watanabe'
but not the following:
'haruto Watanabe' (where the first name is not capitalized)
'Mr. Watanabe' (where the preceding word has a nonletter character)
'Watanabe' (which has no first name)
'Haruto watanabe' (where Watanabe is not capitalized)
Sol:
import re

# [a-zA-Z]* allows capitals inside the first name, so all of 'RoboCop' is matched
regex = re.compile(r'[A-Z][a-zA-Z]*\sWatanabe')
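
A quick check of a pattern of this shape against the cases listed in the question. This is a sketch that assumes the first name may contain uppercase letters after the first character (hence [a-zA-Z]*), and the variable names are illustrative:

```python
import re

name_regex = re.compile(r"[A-Z][a-zA-Z]*\sWatanabe")

should_match = ["Haruto Watanabe", "Alice Watanabe", "RoboCop Watanabe"]
should_not_match = ["haruto Watanabe", "Mr. Watanabe", "Watanabe", "Haruto watanabe"]

# Collect the matched text for the names that should match
matched = [name_regex.search(s).group() for s in should_match]
# Collect any unwanted matches among the names that should not match
unwanted = [s for s in should_not_match if name_regex.search(s)]

print(matched)   # ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe']
print(unwanted)  # []
```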


22.How would you write a regex that matches a sentence where the first word is either Alice, Bob,
or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs;
and the sentence ends with a period? This regex should be case-insensitive. It must match the
following:
'Alice eats apples.'
'Bob pets cats.'
'Carol throws baseballs.'
'Alice throws Apples.'
'BOB EATS CATS.'
but not the following:
'RoboCop eats apples.'
'ALICE THROWS FOOTBALLS.'
'Carol eats 7 cats.'
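Sol:
The original gives no answer for this question. One possible solution, sketched under the assumption that the words are separated by single whitespace characters, combines alternation with the re.IGNORECASE flag; the variable names are illustrative:

```python
import re

sentence_regex = re.compile(
    r"(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.",
    re.IGNORECASE,
)

should_match = [
    "Alice eats apples.",
    "Bob pets cats.",
    "Carol throws baseballs.",
    "Alice throws Apples.",
    "BOB EATS CATS.",
]
should_not_match = [
    "RoboCop eats apples.",
    "ALICE THROWS FOOTBALLS.",
    "Carol eats 7 cats.",
]

# Every sentence in the first list should match, none in the second
all_matched = all(sentence_regex.search(s) for s in should_match)
none_matched = not any(sentence_regex.search(s) for s in should_not_match)

print(all_matched)   # True
print(none_matched)  # True
```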