## 1. What is the name of the feature responsible for generating Regex objects?

In Python, the re.compile() function from the re module returns Regex (pattern) objects, which are used for pattern matching in strings.

## 2. Why do raw strings often appear in Regex objects?

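Raw strings are used so that backslashes do not have to be escaped a second time at the Python string level. For example, a regular expression that matches a backslash character followed by the letter 'n' can be written as r'\\n' instead of '\\\\n'.
This makes the regular expression easier to read and understand.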
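
As a quick check (a minimal sketch; the variable names and the sample text are illustrative), the raw and non-raw spellings produce the same pattern text, and that pattern matches a literal backslash followed by n:

```python
import re

raw_pattern = r'\\n'       # raw string: two backslashes, then n
plain_pattern = '\\\\n'    # normal string: every backslash has to be doubled

# Both literals hold the same three characters: \ \ n
print(raw_pattern == plain_pattern)   # True

text = 'path\\name'        # sample text containing one literal backslash before "name"
match = re.search(raw_pattern, text)
print(len(match.group()))  # 2 (the backslash and the letter n)
```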

## 3. What is the return value of the search() method?

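The search() method returns a Match object for the first location where the pattern matches, or None if the pattern is not found anywhere in the string.

In [2]: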
import re

pattern = r'cat'
text = 'The cat sat on the mat.'

match = re.search(pattern, text)

if match:
    print('Match found:', match.group())
else:
    print('No match found')

Match found: cat


## 4. From a Match item, how do you get the actual strings that match the pattern?

The group() method returns the substring of the searched string that matches the regular expression. If the regular expression contains one or more groups (defined using parentheses), then you can pass a group number or name to the group() method to retrieve the substring that matched that particular group.

In [4]:
import re

text = 'The cat sat on the mat.'
pattern = r'(\w+)\s(\w+)\s(\w+)\s(\w+)\s(\w+)\.'

match = re.search(pattern, text)

if match:
    print('Match found')
    print('Matched substring:', match.group())
    print('First word:', match.group(1))
    print('Second word:', match.group(2))
    print('Third word:', match.group(3))
    print('Fourth word:', match.group(4))
    print('Fifth word:', match.group(5))
else:
    print('No match found')

Match found
Matched substring: cat sat on the mat.
First word: cat
Second word: sat
Third word: on
Fourth word: the
Fifth word: mat


## 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1?

When this regular expression is applied to a string with the search() method,
group(0) (or group() with no argument) returns the entire matched substring,
group(1) returns the area code (the first three digits),
and group(2) returns the local phone number (the remaining seven digits, with their hyphen).
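
A short sketch of these group numbers in action, assuming a made-up phone number as the input:

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
match = phone_regex.search('My number is 415-555-4242.')  # made-up sample text

print(match.group(0))   # 415-555-4242  (the whole match)
print(match.group(1))   # 415           (first set of parentheses)
print(match.group(2))   # 555-4242      (second set of parentheses)
print(match.groups())   # ('415', '555-4242')
```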

## 6. In regular expression syntax, parentheses and periods have distinct meanings. How can you tell a regex that you want it to match real parentheses and periods?

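Escape each of them with a backslash: \( and \) match literal parentheses, and \. matches a literal period.

In [7]: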
import re

text = 'The (cat) sat on the mat. The dog barked.'

# match literal parentheses and period
pattern = r'The \(cat\) sat on the mat\.'
match = re.search(pattern, text)
if match:
    print('Match found:', match.group())
else:
    print('No match found')

# match any word followed by a period
pattern = r'\w+\.'
matches = re.findall(pattern, text)
if matches:
    print('Matches found:', matches)
else:
    print('No matches found')

Match found: The (cat) sat on the mat.
Matches found: ['mat.', 'barked.']


## 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

The findall() method in Python's re module returns a list of all non-overlapping matches of a regular expression pattern in a given string. The format of the returned list depends on whether the regular expression pattern contains capturing groups or not.

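If the regular expression pattern contains no capturing groups, then findall() returns a list of strings, one for each match. If the pattern contains exactly one capturing group, it returns a list of strings holding that group's text. If the pattern contains two or more capturing groups, it returns a list of tuples, with one string per group in each tuple.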

In [8]:
import re

text = 'The quick brown fox jumps over the lazy dog'
pattern = r'\w+'
matches = re.findall(pattern, text)
print(matches)

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
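
The cell above covers only a pattern without groups. A minimal sketch of all three cases, using made-up phone numbers as sample text:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'  # made-up sample text

no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)
one_group = re.findall(r'(\d{3})-\d{3}-\d{4}', text)
two_groups = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)

print(no_groups)    # ['415-555-9999', '212-555-0000']
print(one_group)    # ['415', '212']
print(two_groups)   # [('415', '555-9999'), ('212', '555-0000')]
```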


## 8. In regular expressions, what does the | character mean?

In regular expressions, the | character is known as the pipe or alternation operator. It is used to specify a choice between two or more expressions.

For example, the regular expression cat|dog would match either the string "cat" or the string "dog". The vertical bar character separates the two choices.
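
A small sketch of alternation with the cat|dog pattern; the sample sentence is made up for illustration:

```python
import re

pet_regex = re.compile(r'cat|dog')
sentence = 'I have a dog and a cat'   # made-up sample text

first = pet_regex.search(sentence).group()
every = pet_regex.findall(sentence)

print(first)   # dog  (search() returns the leftmost match, whichever alternative it is)
print(every)   # ['dog', 'cat']
```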

## 9. In regular expressions, what does the . character stand for?

The . (dot) character is a wildcard that matches any single character except a newline. Here are a few examples:

The regular expression c.t would match "cat", "cot", "cut", and any other three-character string that starts with "c" and ends with "t".
The regular expression a.b would match "aab", "abb", "acb", "adb", and any other three-character string that starts with "a" and ends with "b", with any character in the middle.
The regular expression 1.3 would match "103", "123", "153", and any other three-character string that starts with "1" and ends with "3", with any character in the middle.
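
The wildcard behaviour can be checked with a short sketch like the following (the sample strings are illustrative):

```python
import re

# The dot needs exactly one character between "c" and "t", so "ct" is skipped
words = re.findall(r'c.t', 'cat cot cut c-t ct')
print(words)   # ['cat', 'cot', 'cut', 'c-t']

# By default the dot does not match a newline
newline_match = re.search(r'c.t', 'c\nt')
print(newline_match)   # None
```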

## 10. In regular expressions, what is the difference between the + and * characters?

In regular expressions, the + and * characters are known as quantifiers, and they are used to specify how many times a pattern should match. The main difference between the two is:

The * character matches zero or more occurrences of the pattern.
The + character matches one or more occurrences of the pattern.
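
A minimal sketch of the difference, using the illustrative patterns ab*c and ab+c against whole strings:

```python
import re

# "ac" contains zero "b" characters
star_zero = re.fullmatch(r'ab*c', 'ac')      # * accepts zero occurrences
plus_zero = re.fullmatch(r'ab+c', 'ac')      # + needs at least one

# "abbbc" contains three "b" characters, which both quantifiers accept
star_many = re.fullmatch(r'ab*c', 'abbbc')
plus_many = re.fullmatch(r'ab+c', 'abbbc')

print(star_zero is not None)   # True
print(plus_zero is not None)   # False
print(star_many is not None)   # True
print(plus_many is not None)   # True
```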

## 11. What is the difference between {4} and {4,5} in regular expressions?

{4} matches exactly four repetitions of the preceding item, while {4,5} matches four or five repetitions. As a pattern for a whole string, a{4} matches "aaaa" but not "aaa" or "aaaaa", because it requires exactly four occurrences of the letter "a". The regular expression a{4,5} matches both "aaaa" and "aaaaa", because it allows between four and five occurrences of the letter "a".
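
A sketch of both quantifiers, using re.fullmatch() so that the whole string has to fit the pattern. Note that search() would still find four "a" characters inside a longer run:

```python
import re

four_ok = re.fullmatch(r'a{4}', 'aaaa')          # exactly four: match
four_too_long = re.fullmatch(r'a{4}', 'aaaaa')   # five is too many: None
range_five = re.fullmatch(r'a{4,5}', 'aaaaa')    # four or five: match
range_three = re.fullmatch(r'a{4,5}', 'aaa')     # three is too few: None

# search() looks for the pattern anywhere, so it finds 4 of the 5 letters
inside = re.search(r'a{4}', 'aaaaa').group()

print(four_ok is not None, four_too_long is not None)   # True False
print(range_five is not None, range_three is not None)  # True False
print(inside)                                            # aaaa
```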

## 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

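\d: Matches any digit character. For ASCII text this is equivalent to the character class [0-9] (in Python 3, \d also matches other Unicode digits unless the re.ASCII flag is set). For example, the regular expression \d{3} would match any run of three digits.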

\w: Matches any word character, which includes letters, digits, and underscores. This is equivalent to the character class [a-zA-Z0-9_]. For example, the regular expression \w+ would match one or more word characters.

\s: Matches any whitespace character, which includes spaces, tabs, and newlines. For example, the regular expression \s+ would match one or more whitespace characters.
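
A short sketch of the three classes applied to the same made-up string:

```python
import re

text = 'Order 66, item_7\tshipped'   # made-up sample text containing a tab

digits = re.findall(r'\d+', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s+', text)

print(digits)   # ['66', '7']
print(words)    # ['Order', '66', 'item_7', 'shipped']
print(spaces)   # [' ', ' ', '\t']
```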

## 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

\D: Matches any non-digit character. This is equivalent to the character class [^0-9]. For example, the regular expression \D+ would match one or more non-digit characters.

\W: Matches any non-word character, which includes any character that is not a letter, digit, or underscore. This is equivalent to the character class [^a-zA-Z0-9_]. For example, the regular expression \W+ would match one or more non-word characters.

\S: Matches any non-whitespace character. This is equivalent to the character class [^\s]. For example, the regular expression \S+ would match one or more non-whitespace characters.
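
The negated classes can be tried out the same way; the sample text here is made up:

```python
import re

text = 'Room 101: ok!'   # made-up sample text

non_digits = re.findall(r'\D+', text)
non_words = re.findall(r'\W+', text)
non_spaces = re.findall(r'\S+', text)

print(non_digits)   # ['Room ', ': ok!']
print(non_words)    # [' ', ': ', '!']
print(non_spaces)   # ['Room', '101:', 'ok!']
```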

## 14. What is the difference between .* and .*?

.* is greedy: it matches as much text as possible. .*? is non-greedy (lazy): it matches as little text as possible.

For example, the regular expression a.*?b would match the shortest possible string that starts with the letter "a" and ends with the letter "b". In the string "abcab", this would match the substring "ab", because the match stops at the first "b" it can reach.

The regular expression a.*b would match the longest possible string that starts with the letter "a" and ends with the letter "b". In the string "abcab", this would match the entire string "abcab", because the match extends to the last "b".
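
Running both patterns against "abcab" shows the actual results, as a minimal sketch:

```python
import re

text = 'abcab'

greedy = re.search(r'a.*b', text).group()    # runs to the last "b"
lazy = re.search(r'a.*?b', text).group()     # stops at the first "b"
lazy_all = re.findall(r'a.*?b', text)        # every shortest match in turn

print(greedy)     # abcab
print(lazy)       # ab
print(lazy_all)   # ['ab', 'ab']
```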

## 15. What is the syntax for matching both numbers and lowercase letters with a character class?

To match both numbers and lowercase letters with a character class, list both ranges inside the square brackets: [0-9a-z] (or, equivalently, [a-z0-9]).

The regular expression [0-9a-z]+ would match one or more characters that are either digits or lowercase letters. For example, it would fully match the string "abc123" but not the string "ABC123". The shorthand \w is not suitable here, because it also matches uppercase letters and the underscore.
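
A quick sketch comparing the class [0-9a-z] with [\d\w]. It uses re.fullmatch() so the whole string has to consist of allowed characters:

```python
import re

lower_ok = re.fullmatch(r'[0-9a-z]+', 'abc123')      # digits and lowercase: match
upper_rejected = re.fullmatch(r'[0-9a-z]+', 'ABC123')  # uppercase letters: None

# \w includes uppercase letters, so [\d\w]+ accepts "ABC123" as well
shorthand_upper = re.fullmatch(r'[\d\w]+', 'ABC123')

print(lower_ok is not None)          # True
print(upper_rejected is not None)    # False
print(shorthand_upper is not None)   # True
```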

## 16. What is the procedure for making a regular expression case-insensitive?

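To make a regular expression case-insensitive in Python, pass re.IGNORECASE (or its short form re.I) as the second argument to re.compile(). This flag tells the regex engine to ignore the case of the letters when matching the pattern. The inline flag (?i) at the start of the pattern has the same effect.

re.compile(r'apple', re.IGNORECASE)
re.compile(r'apple', re.I)
re.compile(r'(?i)apple')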
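
In Python's re module, a minimal sketch of case-insensitive matching looks like this (the pattern and sample strings are illustrative):

```python
import re

flag_regex = re.compile(r'apple', re.IGNORECASE)   # flag passed as second argument
inline_regex = re.compile(r'(?i)apple')            # inline flag at the start of the pattern

flag_hit = flag_regex.search('An APPLE a day').group()
inline_hit = inline_regex.search('Apple pie').group()

print(flag_hit)                     # APPLE
print(inline_hit)                   # Apple
print(re.I is re.IGNORECASE)        # True (re.I is the short alias)
```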


## 17. What does the . character normally match? What does it match if re.DOTALL is passed as the 2nd argument in re.compile()?

Normally, the dot character matches any single character except a newline. If the re.DOTALL flag is passed as the second argument to the re.compile() function, then the dot character will also match newline characters (\n), so it matches every character in the input string.
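
A small sketch of the difference, using a made-up two-line string:

```python
import re

text = 'line one\nline two'   # made-up sample text with a newline in the middle

default_hit = re.compile(r'.*').search(text).group()
dotall_hit = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(default_hit))   # 'line one'  (stops at the newline)
print(repr(dotall_hit))    # 'line one\nline two'
```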

## 18. If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

If numRegex = re.compile(r'\d+'), calling numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') replaces every run of one or more digits with the string 'X' and returns:

'X drummers, X pipers, five rings, X hen'

The regular expression r'\d+' matches one or more digits in the input string. The sub() method of the numRegex object is called with two arguments: the replacement string 'X' and the input string '11 drummers, 10 pipers, five rings, 4 hen'. The word "five" is left unchanged because it contains no digit characters.
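
The call can be run directly. The sketch below also uses the optional count argument of sub(), which is not part of the question, to limit the number of replacements:

```python
import re

numRegex = re.compile(r'\d+')
text = '11 drummers, 10 pipers, five rings, 4 hen'

replaced = numRegex.sub('X', text)
first_only = numRegex.sub('X', text, count=1)   # replace only the first match

print(replaced)     # X drummers, X pipers, five rings, X hen
print(first_only)   # X drummers, 10 pipers, five rings, 4 hen
```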

## 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?

When the re.VERBOSE flag is passed as the second argument to re.compile(), it allows you to write regular expressions that are more readable and easier to understand by adding whitespace and comments to the pattern.

In [2]:
import re

pattern = re.compile(r'''
    \d{3}  # match three digits
    -      # match a hyphen
    \d{2}  # match two digits
    -      # match a hyphen
    \d{4}  # match four digits
''', re.VERBOSE)
print(pattern)

re.compile('\n    \\d{3}  # match three digits\n    -      # match a hyphen\n    \\d{2}  # match two digits\n    -      # match a hyphen\n    \\d{4}  # match four digits\n', re.VERBOSE)
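
Printing the compiled object only shows the pattern text. A sketch that checks the verbose pattern behaves like its compact one-line form, using a made-up sample string:

```python
import re

verbose_regex = re.compile(r'''
    \d{3}  # three digits
    -      # a hyphen
    \d{2}  # two digits
    -      # a hyphen
    \d{4}  # four digits
''', re.VERBOSE)

# The same pattern with the whitespace and comments removed
compact_regex = re.compile(r'\d{3}-\d{2}-\d{4}')

sample = 'ID: 123-45-6789'   # made-up sample text
verbose_result = verbose_regex.search(sample).group()
compact_result = compact_regex.search(sample).group()

print(verbose_result)                     # 123-45-6789
print(verbose_result == compact_result)   # True
```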


## 20. How would you write a regex that matches a number with a comma for every three digits? It must match the following: '42', '1,234', '6,368,745' but not the following: '12,34,567' (which has only two digits between the commas), '1234' (which lacks commas)

The regex ^\d{1,3}(,\d{3})*$ meets these requirements. Let's break this expression down:

^ - matches the start of the string.
\d{1,3} - matches one to three digits at the beginning of the string.
(,\d{3})* - matches zero or more occurrences of a comma followed by three digits.
$ - matches the end of the string.

In [4]:
import re

regex = re.compile(r'^\d{1,3}(,\d{3})*$')

print(regex.match('42'))             # Match
print(regex.match('1,234'))          # Match
print(regex.match('6,368,745'))      # Match
print(regex.match('12,34,567'))      # No match
print(regex.match('1234'))           # No match


<re.Match object; span=(0, 2), match='42'>
<re.Match object; span=(0, 5), match='1,234'>
<re.Match object; span=(0, 9), match='6,368,745'>
None
None


## 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following: 'Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe' but not the following: 'haruto Watanabe' (where the first name is not capitalized), 'Mr. Watanabe' (where the preceding word has a nonletter character), 'Watanabe' (which has no first name), 'Haruto watanabe' (where Watanabe is not capitalized)

The regex ^[A-Z][a-zA-Z]*\sWatanabe$ meets these requirements:

^ - matches the start of the string.

[A-Z] - matches an uppercase letter at the beginning of the string.

[a-zA-Z]* - matches zero or more letters of either case after the first uppercase letter (the rest of the first name, so that 'RoboCop' is accepted).

\s - matches a whitespace character between the first and last names.

Watanabe - matches the last name.

$ - matches the end of the string.

In [5]:
import re

regex = re.compile(r'^[A-Z][a-zA-Z]*\sWatanabe$')

print(regex.match('Haruto Watanabe'))     # Match
print(regex.match('Alice Watanabe'))      # Match
print(regex.match('RoboCop Watanabe'))    # Match
print(regex.match('haruto Watanabe'))     # No match
print(regex.match('Mr. Watanabe'))        # No match
print(regex.match('Watanabe'))            # No match
print(regex.match('Haruto watanabe'))     # No match


<re.Match object; span=(0, 15), match='Haruto Watanabe'>
<re.Match object; span=(0, 14), match='Alice Watanabe'>
<re.Match object; span=(0, 16), match='RoboCop Watanabe'>
None
None
None
None


## 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following: 'Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.', 'Alice throws Apples.', 'BOB EATS CATS.' but not the following: 'RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.'

The following pattern, compiled with the re.IGNORECASE flag, meets these requirements:

^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.$

In [8]:
import re
regex = re.compile(r'^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.$', re.IGNORECASE)
print(regex)

re.compile('^(Alice|Bob|Carol)\\s+(eats|pets|throws)\\s+(apples|cats|baseballs)\\.$', re.IGNORECASE)
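
The cell above only prints the compiled pattern. A sketch that runs it against the sentences listed in the question might look like this (the list names are illustrative):

```python
import re

regex = re.compile(
    r'^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.$',
    re.IGNORECASE,
)

should_match = [
    'Alice eats apples.',
    'Bob pets cats.',
    'Carol throws baseballs.',
    'Alice throws Apples.',
    'BOB EATS CATS.',
]
should_not_match = [
    'RoboCop eats apples.',
    'ALICE THROWS FOOTBALLS.',
    'Carol eats 7 cats.',
]

# Record whether each sentence matches the pattern
match_results = [bool(regex.match(s)) for s in should_match]
reject_results = [bool(regex.match(s)) for s in should_not_match]

print(match_results)    # [True, True, True, True, True]
print(reject_results)   # [False, False, False]
```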
