# 1. What is the name of the feature responsible for generating Regex objects?

Answer: 

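The feature responsible for generating regular expression (Regex) objects in Python is the `re` module's `re.compile()` function. It takes a pattern string and returns a compiled Regex object (an instance of `re.Pattern`) that can then be used for matching and searching.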

```python
# Example

import re

# Create a Regex object to match a simple pattern (e.g., 'apple')
pattern = re.compile(r'apple')

# Use the Regex object for matching or searching in strings
match = pattern.match('apple pie')
if match:
    print('Match found:', match.group())
```

```
Match found: apple
```
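Because the example above uses `match()`, it may help to contrast it with `search()`: `match()` only looks for a match at the very start of the string, while `search()` scans the whole string. A minimal sketch (the sample sentences are made up for illustration):

```python
import re

pattern = re.compile(r'apple')

# match() only succeeds if the pattern matches at position 0
at_start = pattern.match('apple pie')
not_at_start = pattern.match('I like apple pie')
print(at_start is not None)      # True
print(not_at_start)              # None: 'apple' is not at the start

# search() scans forward and returns the first match anywhere in the string
found = pattern.search('I like apple pie')
print(found.group())             # apple
```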


# 2. Why do raw strings often appear in Regex objects?

Answer: 

Raw strings, denoted by the `r` prefix in Python (e.g., `r'pattern'`), are often used for regular expression (Regex) patterns for several reasons:

1. Avoiding Escape Sequences:
   - Regular expressions often contain many backslashes (`\`), both to escape special characters and to write classes such as `\d` (a digit).
   - Ordinary Python string literals also use backslashes for escape sequences (e.g., `\n` for a newline).
   - In a raw string (prefixed with `r`), backslashes are kept exactly as written, so you do not need to double-escape them. This makes regex patterns more readable and less error-prone.

2. Improved Readability:
   - Raw strings let you write the pattern exactly as the regex engine should see it.

3. Clarity:
   - Using a raw string signals that the string is intended as a regular expression pattern, which distinguishes it from ordinary strings.

```python
# Example:

# Without a raw string, you need to escape backslashes for regex and Python
pattern_without_raw = "\\d{3}-\\d{2}-\\d{4}"

# With a raw string, you can write the pattern more naturally
pattern_with_raw = r'\d{3}-\d{2}-\d{4}'

print(pattern_without_raw)
print(pattern_with_raw)
```

```
\d{3}-\d{2}-\d{4}
\d{3}-\d{2}-\d{4}
```
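A case where forgetting the `r` prefix silently changes the pattern is `\b`: in an ordinary string it is the backspace character (`\x08`), while the regex engine reads `\b` as a word boundary. A small sketch with a made-up sample sentence:

```python
import re

text = 'a cat sat'

# In an ordinary string, '\b' is the backspace character, not a word boundary
plain = '\bcat\b'
print(repr(plain))                   # '\x08cat\x08'
print(re.search(plain, text))        # None: the text contains no backspace characters

# In a raw string the backslash survives, so the regex engine sees \b (word boundary)
raw = r'\bcat\b'
print(re.search(raw, text).group())  # cat
```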


# 3. What is the return value of the search() method?

Answer: 

The `search()` method of a compiled regular expression (regex) object returns a Match object (`re.Match`) for the first location in the input string where the pattern matches. If no match is found anywhere in the string, it returns `None`.

```python
# Example:

import re

# Create a regex pattern
pattern = re.compile(r'orange')

# Search for the pattern in a string
match = pattern.search('I love apples and oranges.')

# Check if a match is found
if match:
    print('Match found:', match.group())
else:
    print('No match found')
```

```
Match found: orange
```
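Since `search()` can return `None`, it is worth checking the result before calling `.group()`. The Match object also records where the match occurred. A short sketch reusing the sentence above plus a made-up non-matching one:

```python
import re

pattern = re.compile(r'orange')

# No match: search() returns None
missing = pattern.search('I love apples.')
print(missing is None)             # True

# A match: the Match object knows the matched text and its position
match = pattern.search('I love apples and oranges.')
print(match.group())               # orange
print(match.start(), match.end())  # 18 24
print(match.span())                # (18, 24)
```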


# 4. From a Match item, how do you get the actual strings that match the pattern?

Answer: 

To get the actual string(s) that match the pattern from a Match object in Python's `re` module, you can use the `group()` method. The `group()` method returns the substring of the input string that matched the pattern.

```python
# Example:

import re

# Create a regex pattern for three blocks of four digits separated by hyphens
pattern = re.compile(r'\d{4}-\d{4}-\d{4}')

# Search for the pattern in a string
match = pattern.search('My Aadhaar number is 0011-2233-4455.')

# Check if a match is found
if match:
    matched_string = match.group()  # Get the matched string
    print('Matched string:', matched_string)
else:
    print('No match found')
```

```
Matched string: 0011-2233-4455
```


**Important Note:**

We can also use `group(0)` to achieve the same result since 0 is the default argument for `group()`. 

Additionally, if your regex pattern contains capturing groups, you can access each captured group using `group(1)`, `group(2)`, and so on, to extract specific parts of the matched string.
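The note above can be illustrated by giving each block of digits its own capturing group. This is a sketch using the same made-up number as before:

```python
import re

# Same number format, but each block of four digits is a capturing group
pattern = re.compile(r'(\d{4})-(\d{4})-(\d{4})')
match = pattern.search('My Aadhaar number is 0011-2233-4455.')

print(match.group() == match.group(0))  # True: group() defaults to group 0
print(match.group(1))                   # 0011
print(match.group(1, 3))                # ('0011', '4455'): several groups at once
print(match.groups())                   # ('0011', '2233', '4455'): all captured groups
```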

# 5. In the regex created from `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`, what does group 0 cover? Group 1? Group 2?

Answer: 

In the regex pattern `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`, which contains two sets of parentheses for capturing groups, here's what each group covers:

1. **Group 0**: Group 0 covers the entire matched string, including the entire phone number in the format `###-###-####`. It represents the entire match.

2. **Group 1**: Group 1 covers the first set of parentheses `(\d\d\d)`, which matches and captures three consecutive digits. This group represents the first three digits of the phone number.

3. **Group 2**: Group 2 covers the second set of parentheses `(\d\d\d-\d\d\d\d)`, which matches and captures the remaining seven digits in the format `###-####`. This group represents the remaining part of the phone number after the hyphen.

```python
# Example:

import re
# Create a regex pattern
pattern = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

# Search for the pattern in a string
match = pattern.search('My phone number is 555-123-4567.')

# Check if a match is found
if match:
    # Get the entire matched string (Group 0)
    entire_match = match.group(0)
    print('Entire match:', entire_match)

    # Get the first three digits (Group 1)
    first_three_digits = match.group(1)
    print('First three digits:', first_three_digits)

    # Get the remaining seven digits (Group 2)
    remaining_digits = match.group(2)
    print('Remaining digits:', remaining_digits)
else:
    print('No match found')
```

```
Entire match: 555-123-4567
First three digits: 555
Remaining digits: 123-4567
```


# 6. In regular expression syntax, periods and parentheses have special meanings. How can you tell a regex that you want it to match real parentheses and periods?

Answer: 

Periods and parentheses can be escaped with a backslash: `\.`, `\(`, and `\)`.

```python
# Example:

import re

# Create a regex pattern to match literal parentheses and a period
pattern = re.compile(r'\(\d+\.\d+\)')

# Search for the pattern in a string
match = pattern.search('The value is (3.14).')

# Check if a match is found
if match:
    # Get the entire matched string
    matched_string = match.group()
    print('Matched string:', matched_string)
else:
    print('No match found')
```

```
Matched string: (3.14)
```
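To see why the escapes matter, compare an unescaped pattern with an escaped one. The sketch below also uses `re.escape()`, which backslash-escapes special characters in literal text for you; the sample strings are made up:

```python
import re

text = 'The value is (3.14).'

# Unescaped: ( ) form a capturing group and . matches any character,
# so the parentheses are not part of the match and '3x14' also matches
print(re.search(r'(3.14)', text).group())          # 3.14
print(re.search(r'(3.14)', 'value 3x14').group())  # 3x14

# re.escape() turns literal text into a pattern that matches only that text
literal = re.escape('(3.14)')
print(re.search(literal, text).group())            # (3.14)
print(re.search(literal, 'value 3x14'))            # None
```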


# 7. The `findall()` method returns a string list or a list of string tuples. What causes it to return one of the two options?

Answer: 

- If our regex pattern does not contain any capturing groups (parentheses), the `findall()` method returns a list of strings. Each element in the list is a complete match of the pattern.

```python
# Example:

import re

pattern = re.compile(r'\d{3}-\d{2}-\d{4}')
text = 'My SSN is 123-45-6789 and yours is 987-65-4321.'
matches = pattern.findall(text)

print(matches)  # Returns a list of matched strings
```

```
['123-45-6789', '987-65-4321']
```


- If our regex pattern contains two or more capturing groups (parentheses), the `findall()` method returns a list of tuples. Each tuple represents one match of the pattern, and its elements are the texts captured by the groups. (With exactly one capturing group, `findall()` returns a list of strings holding only that group's text.)

```python
# Example: 

import re

pattern = re.compile(r'(\d{3})-(\d{2})-(\d{4})')
text = 'My SSN is 123-45-6789 and yours is 987-65-4321.'
matches = pattern.findall(text)

print(matches)  # Returns a list of tuples, each containing captured groups
```

```
[('123', '45', '6789'), ('987', '65', '4321')]
```
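The single-group case and non-capturing groups are easy to trip over, so here is a small sketch on the same sample text. A `(?:...)` group groups without capturing, so it does not affect what `findall()` returns:

```python
import re

text = 'My SSN is 123-45-6789 and yours is 987-65-4321.'

# Exactly one capturing group: a list of strings with just that group's text
one_group = re.findall(r'(\d{3})-\d{2}-\d{4}', text)
print(one_group)        # ['123', '987']

# A non-capturing group (?:...) does not count, so whole matches come back
non_capturing = re.findall(r'(?:\d{3})-\d{2}-\d{4}', text)
print(non_capturing)    # ['123-45-6789', '987-65-4321']
```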


# 8. In regular expressions, what does the `|` character mean?

Answer: 

- The `|` character is used to specify alternatives within a regex pattern. It allows us to match one pattern or another.

- In other words, it acts like a logical OR operator for regex patterns.

```python
# Example:

import re

pattern = re.compile(r'cat|dog') 
text = 'I have a cat and a dog.'
matches = pattern.findall(text)

print(matches)
```

```
['cat', 'dog']
```


The `|` character is useful when you want to search for several alternative patterns with a single regex.
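Two details of `|` are worth seeing in action: it has the lowest precedence, so it splits the whole pattern unless you group the alternatives, and at each position the alternatives are tried from left to right. A sketch with made-up strings:

```python
import re

# | splits the whole pattern: 'gra|ey' means 'gra' OR 'ey', not 'gr(a|e)y'
ungrouped = re.findall(r'gra|ey', 'gray grey')
print(ungrouped)    # ['gra', 'ey']

# Group the alternatives to limit what | applies to
grouped = re.findall(r'gr(?:a|e)y', 'gray grey')
print(grouped)      # ['gray', 'grey']

# The first alternative that fits at a position wins, even if a later one is longer
first = re.search(r'cat|category', 'category').group()
print(first)        # cat
```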

# 9. In regular expressions, what does the character stand for?

Answer: 

Note: the question does not name the character, so we assume it refers to the `.` (dot) character, because the dot has a special meaning in regular expressions.

- The dot `.` matches any single character except a newline (`\n`). It acts as a wildcard for exactly one character at that position in the string.

For example:

The pattern `a.b` matches an "a", followed by any single character, followed by a "b". It would match "aab", "axb", "a1b", and so on.

```python
# Example:

import re

pattern = re.compile(r'a.b')
text = 'I have a bag and a bat.'
matches = pattern.findall(text)

print(matches)
```

```
['a b', 'a b']
```
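The output above is `['a b', 'a b']` because the dot also matches the space in "a bag" and "a bat". The sketch below checks a few made-up strings with `fullmatch()`, which requires the whole string to match:

```python
import re

pattern = re.compile(r'a.b')

# Each of these is 'a', exactly one character, then 'b'
results = {s: pattern.fullmatch(s) is not None for s in ['aab', 'axb', 'a1b', 'a b']}
print(results)                              # every value is True

print(pattern.fullmatch('ab') is not None)  # False: the dot needs exactly one character
print(pattern.fullmatch('a\nb') is not None)  # False: no newline without re.DOTALL
```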


# 10. In regular expressions, what is the difference between the `+` and `*` characters?

Answer: 

1. The `+` quantifier matches one or more occurrences of the preceding character or group.
   - For example, `a+` matches one or more consecutive "a" characters. It would match "a," "aa," "aaa," and so on, but not an empty string or a string without "a."

2. The `*` quantifier matches zero or more occurrences of the preceding character or group.
   - For example, `a*` matches zero or more consecutive "a" characters. It would match an empty string (no "a" characters), "a," "aa," "aaa," and so on.
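The difference shows up most clearly when the repeated character is absent. A minimal sketch with a leading `b` (the strings are illustrative):

```python
import re

# 'ba+' needs at least one 'a' after the 'b'; 'ba*' also accepts zero
plus_empty = re.fullmatch(r'ba+', 'b')
star_empty = re.fullmatch(r'ba*', 'b')
print(plus_empty)                             # None
print(star_empty.group())                     # b

# With several 'a' characters, both quantifiers match
print(re.fullmatch(r'ba+', 'baaa').group())   # baaa
print(re.fullmatch(r'ba*', 'baaa').group())   # baaa
```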

# 11. What is the difference between `{4}` and `{4,5}` in regular expression?

Answer: 

- `{4}` matches exactly 4 occurrences of the preceding element.
  - For example, `\d{4}` matches exactly four consecutive digits.

- `{4,5}` matches between 4 and 5 occurrences (inclusive). It is greedy, so it takes 5 when it can.
  - For example, when the whole string must match (with `re.fullmatch()` or anchors `^...$`), `\d{4,5}` matches "1234" and "12345," but not "123" or "123456." Without anchoring, `re.search()` still finds "12345" inside "123456."
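A quick check of both quantifiers against a few digit strings, using `fullmatch()` for the whole-string comparison and `search()` to show the unanchored behaviour:

```python
import re

for s in ['123', '1234', '12345', '123456']:
    exact = re.fullmatch(r'\d{4}', s) is not None
    ranged = re.fullmatch(r'\d{4,5}', s) is not None
    print(s, exact, ranged)
# 123 False False
# 1234 True True
# 12345 False True
# 123456 False False

# Unanchored, search() finds a substring; {4,5} is greedy, so it takes 5 digits
longest = re.search(r'\d{4,5}', '123456').group()
print(longest)   # 12345
```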

# 12. What do the `\d`, `\w`, and `\s` shorthand character classes signify in regular expressions?

Answer: 

1. `\d`: This shorthand character class represents any digit. It matches any single digit from 0 to 9 (for `str` patterns, Python also accepts other Unicode decimal digits unless the `re.ASCII` flag is set).
- For example, `\d` would match the following characters: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

2. `\w`: This shorthand character class represents a word character. It matches alphanumeric characters (letters and digits) as well as underscores (`_`).
- For example, `\w` would match a-z, A-Z, 0-9, and `_`; for `str` patterns it also matches other Unicode letters and digits (such as `é`) unless `re.ASCII` is set.

3. `\s`: This shorthand character class represents whitespace characters. It matches spaces, tabs, newlines, carriage returns, and other whitespace characters.
- For example, `\s` would match spaces, tabs, and newline characters.
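A sketch of the three classes on a made-up string, plus one check of the Unicode behaviour mentioned above, using U+0663 (ARABIC-INDIC DIGIT THREE) as an illustrative non-ASCII digit:

```python
import re

text = 'Room 101, floor_2\tok'

digits = re.findall(r'\d+', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s', text)
print(digits)   # ['101', '2']
print(words)    # ['Room', '101', 'floor_2', 'ok']
print(spaces)   # [' ', ' ', '\t']

# For str patterns, \d also accepts non-ASCII decimal digits unless re.ASCII is set
arabic_three = '\u0663'
print(re.fullmatch(r'\d', arabic_three) is not None)            # True
print(re.fullmatch(r'\d', arabic_three, re.ASCII) is not None)  # False
```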

# 13. What do the `\D`, `\W`, and `\S` shorthand character classes signify in regular expressions?

Answer: 

In regular expressions (regex), the `\D`, `\W`, and `\S` shorthand character classes are negated versions of `\d`, `\w`, and `\s`, respectively. They represent sets of characters that are the opposite of their non-negated counterparts.

1. `\D`: This shorthand character class represents any character that is not a digit. It matches any character except for digits (0-9).
- For example, `\D` would match any character that is not a digit, such as letters, punctuation, and whitespace.

2. `\W`: This shorthand character class represents any character that is not a word character. It matches any character that is not alphanumeric (letters and digits) or an underscore (`_`).
- For example, `\W` would match any character that is not a word character, including spaces, punctuation, and special symbols.

3. `\S`: This shorthand character class represents any character that is not a whitespace character. It matches any character that is not a space, tab, newline, or other whitespace characters.
- For example, `\S` would match any character that is not a whitespace character, including letters, digits, and symbols.
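The negated classes are handy for cleaning up text. A few illustrative uses on made-up strings:

```python
import re

# \D: strip everything that is not a digit
digits_only = re.sub(r'\D', '', '(555) 123-4567')
print(digits_only)   # 5551234567

# \W: replace each non-word character with an underscore
safe_name = re.sub(r'\W', '_', 'file name-v2.txt')
print(safe_name)     # file_name_v2_txt

# \S+: runs of non-whitespace, i.e. the "words" between spaces and tabs
tokens = re.findall(r'\S+', 'split  on\twhitespace')
print(tokens)        # ['split', 'on', 'whitespace']
```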

# 14. What is the difference between `.*?` and `.*`?

Answer: 

1. **`.*` (Greedy Matching)**: `.*` is a greedy quantifier that matches as much text as possible while still allowing the remainder of the regex pattern to match successfully. It will match the longest possible sequence of characters.
   - For example, in the regex pattern `a.*b`, if the input text is "aabbab," `a.*b` will match the entire string "aabbab" because it's the longest sequence of characters between "a" and "b."

2. **`.*?` (Lazy Matching)**: `.*?` is a lazy quantifier that matches as little text as possible while still allowing the remainder of the regex pattern to match successfully. It will match the shortest possible sequence of characters.
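   - Using the same input "aabbab", `a.*?b` matches "aab". The search still starts at the leftmost "a" (index 0); from there, the lazy `.*?` stops at the first "b" that lets the pattern succeed. So the result is the shortest match starting at that position, not the shortest match anywhere ("ab" starting at index 1 is shorter).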

```python
# Example:

import re

text = "aabbab"

# Greedy matching with `.*`
pattern_greedy = re.compile(r'a.*b')
match_greedy = pattern_greedy.search(text)
if match_greedy:
    print('Greedy Match:', match_greedy.group())

# Lazy matching with `.*?`
pattern_lazy = re.compile(r'a.*?b')
match_lazy = pattern_lazy.search(text)
if match_lazy:
    print('Lazy Match:', match_lazy.group())
```

```
Greedy Match: aabbab
Lazy Match: aab
```
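A common place where the difference matters is pulling tags out of HTML-like text. A minimal sketch on a made-up snippet (for real HTML, a proper parser is usually the better tool):

```python
import re

html = '<b>bold</b>'

# Greedy: .* runs to the last '>' it can find
greedy_tags = re.findall(r'<.*>', html)
print(greedy_tags)   # ['<b>bold</b>']

# Lazy: .*? stops at the first '>' that completes a match
lazy_tags = re.findall(r'<.*?>', html)
print(lazy_tags)     # ['<b>', '</b>']
```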


# 15. What is the syntax for matching both numbers and lowercase letters with a character class?

Answer: 

The syntax is:

`[0-9a-z]`

- `0-9`: This range inside the character class matches any digit from 0 to 9.

- `a-z`: This range inside the character class matches any lowercase letter from 'a' to 'z'.

```python
# Example:
import re

text = "The quick brown fox jumps over 123 lazy dogs."

pattern = re.compile(r'[0-9a-z]+')
matches = pattern.findall(text)

print(matches)
```

```
['he', 'quick', 'brown', 'fox', 'jumps', 'over', '123', 'lazy', 'dogs']
```
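The first result is `'he'` rather than `'The'` because the class `[0-9a-z]` does not include the capital "T". The sketch below shows the effect of adding `A-Z`, and how a leading `^` negates a class (the string `'ab12CD-ef'` is made up):

```python
import re

text = "The quick brown fox jumps over 123 lazy dogs."

# [0-9a-z] skips the capital 'T'
print(re.findall(r'[0-9a-z]+', text)[0])      # he

# Adding A-Z to the class keeps whole words
with_upper = re.findall(r'[0-9a-zA-Z]+', text)
print(with_upper[0])                          # The

# A leading ^ negates the class: runs of anything that is NOT a digit or lowercase letter
negated = re.findall(r'[^0-9a-z]+', 'ab12CD-ef')
print(negated)                                # ['CD-']
```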


# 16. What is the procedure for making a regular expression case-insensitive?

Answer: 

- To make a regular expression (regex) case-insensitive in Python, pass the `re.IGNORECASE` flag (or its short form `re.I`) when compiling the regex pattern.

##### Procedure:

```python
# Step 1: Import the re module

import re
```

```python
# Step 2: Create your regex pattern and compile it with the re.IGNORECASE (or re.I) flag

pattern = re.compile(r'your_pattern_here', re.IGNORECASE)
```

##### Example

```python
import re

text = "The quick brown Fox jumped over the Lazy Dog."
# Case-insensitive pattern to match "fox"
pattern = re.compile(r'fox', re.IGNORECASE)
match = pattern.search(text)

if match:
    print('Match found:', match.group())
else:
    print('No match found')
```

```
Match found: Fox
```
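Compiling is not the only way to set the flag. The module-level functions accept a `flags` argument, and an inline `(?i)` at the start of the pattern has the same effect. A sketch on the same sentence:

```python
import re

text = "The quick brown Fox jumped over the Lazy Dog."

# Module-level functions accept the flag via the flags argument
via_flags = re.findall(r'fox|dog', text, flags=re.IGNORECASE)
print(via_flags)    # ['Fox', 'Dog']

# An inline (?i) at the very start of the pattern applies to the whole pattern
via_inline = re.findall(r'(?i)fox|dog', text)
print(via_inline)   # ['Fox', 'Dog']
```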


# 17. What does the `.` character normally match? What does it match if `re.DOTALL` is passed as 2nd argument in `re.compile()`?

Answer: 

- The `.` character in a regular expression normally matches any character except for a newline character (`\n`).

- If we pass `re.DOTALL` as the second argument when compiling a regex pattern with `re.compile()`, it changes the behavior of the `.` character: `re.DOTALL` (or `re.S`) makes `.` match any character, including the newline character (`\n`).

```python
# Example: 

import re

text = """Hello
World"""

# Without re.DOTALL, the dot does not match newline
pattern_no_dotall = re.compile(r'.+')
match_no_dotall = pattern_no_dotall.search(text)
print('Without re.DOTALL:', match_no_dotall.group() if match_no_dotall else 'No match')

# With re.DOTALL, the dot matches newline
pattern_dotall = re.compile(r'.+', re.DOTALL)
match_dotall = pattern_dotall.search(text)
print('With re.DOTALL:', match_dotall.group() if match_dotall else 'No match')
```

```
Without re.DOTALL: Hello
With re.DOTALL: Hello
World
```
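The flags are bit flags, so several of them can be passed together as the second argument by combining them with `|`. A sketch with a made-up two-line string:

```python
import re

text = """HELLO
world"""

# Combine flags with |: the dot may cross the newline AND case is ignored
both = re.compile(r'hello.world', re.DOTALL | re.IGNORECASE)
print(both.search(text).group() == text)   # True

# With only re.IGNORECASE, the dot still refuses to match the newline
only_case = re.compile(r'hello.world', re.IGNORECASE)
print(only_case.search(text))              # None
```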


# 18. If `numReg = re.compile(r'\d+')`, what will `numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')` return?

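Answer: It returns `'X drummers, X pipers, five rings, X hen'`. Each run of one or more digits is replaced by `X`, while the spelled-out word "five" is left alone.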

```python
import re

numReg = re.compile(r'\d+')
text = '11 drummers, 10 pipers, five rings, 4 hen'
result = numReg.sub('X', text)

print(result)
```

```
X drummers, X pipers, five rings, X hen
```
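`sub()` replaces every match by default. The optional `count` argument limits the number of replacements, and `subn()` also reports how many were made. A sketch on the same string:

```python
import re

numReg = re.compile(r'\d+')
text = '11 drummers, 10 pipers, five rings, 4 hen'

# count limits how many replacements are made, from left to right
first_only = numReg.sub('X', text, count=1)
print(first_only)   # X drummers, 10 pipers, five rings, 4 hen

# subn() returns the new string together with the number of replacements
with_count = numReg.subn('X', text)
print(with_count)   # ('X drummers, X pipers, five rings, X hen', 3)
```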


# 19. What does passing `re.VERBOSE` as the 2nd argument to `re.compile()` allow you to do?

Answer: 

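Passing `re.VERBOSE` as the second argument to `re.compile()` lets you write more readable, well-structured regular expressions: whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), and a `#` outside a character class starts a comment that runs to the end of the line.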

```python
# Without re.VERBOSE:

import re

pattern = re.compile(r'\d{3}-\d{2}-\d{4}|\(\d{3}\)\s\d{3}-\d{4}')
```


```python
# With re.VERBOSE:

pattern = re.compile(r'''
    \d{3}-\d{2}-\d{4}   # Match XXX-XX-XXXX (SSN)
    |                  # OR
    \(\d{3}\)\s\d{3}-\d{4}  # Match (XXX) XXX-XXXX (phone number)
''', re.VERBOSE)
```
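The verbose pattern behaves exactly like the one-line version. One side effect to keep in mind is that a literal space must then be escaped (or written as `\s` or `[ ]`). A sketch with a made-up sample string:

```python
import re

pattern = re.compile(r'''
    \d{3}-\d{2}-\d{4}        # Match XXX-XX-XXXX (SSN)
    |                        # OR
    \(\d{3}\)\s\d{3}-\d{4}   # Match (XXX) XXX-XXXX (phone number)
''', re.VERBOSE)

found = pattern.findall('SSN 123-45-6789, phone (555) 123-4567')
print(found)   # ['123-45-6789', '(555) 123-4567']

# In verbose mode an unescaped space is ignored, so 'a b' behaves like 'ab'
print(re.fullmatch(r'a b', 'a b', re.VERBOSE))           # None
print(re.fullmatch(r'a\ b', 'a b', re.VERBOSE).group())  # a b
```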


# 20. How would you write a regex that matches a number with a comma every three digits? It must match the following:

#### '42'

#### '1,234'

#### '6,368,745'

### but not the following:

#### '12,34,567' (which has only two digits between the commas)

#### '1234' (which lacks commas)

Answer: 

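`re.compile(r'^\d{1,3}(,\d{3})*$')` creates such a regex: `^\d{1,3}` allows one to three leading digits, and each repetition of `(,\d{3})` adds a comma followed by exactly three digits, up to the end of the string (`$`). Other pattern strings can produce an equivalent regex.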
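The pattern can be checked against the strings listed in the question. The variable name `commaNum` is just an illustrative choice:

```python
import re

commaNum = re.compile(r'^\d{1,3}(,\d{3})*$')

should_match = ['42', '1,234', '6,368,745']
should_not = ['12,34,567', '1234']

for s in should_match:
    print(s, commaNum.search(s) is not None)   # True for each
for s in should_not:
    print(s, commaNum.search(s) is not None)   # False for each
```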

# 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:

### 'Haruto Watanabe'
### 'Alice Watanabe'
### 'RoboCop Watanabe'
### but not the following:
### 'haruto Watanabe' (where the first name is not capitalized)
### 'Mr. Watanabe' (where the preceding word has a nonletter character)
### 'Watanabe' (which has no first name)
### 'Haruto watanabe' (where Watanabe is not capitalized)

Answer: 

To write a regex that matches the full name of someone whose last name is Watanabe with the given conditions, we can use the following pattern:

```python
import re

pattern = re.compile(r'[A-Z][a-zA-Z]*\sWatanabe')
```
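A quick check of the pattern with `search()` against the names from the question. Because `search()` is not anchored, the pattern would also find a name inside a longer sentence; add `^` and `$` (or use `fullmatch()`) if the whole string must be the name.

```python
import re

pattern = re.compile(r'[A-Z][a-zA-Z]*\sWatanabe')

should_match = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe']
should_not = ['haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

for name in should_match:
    print(name, pattern.search(name) is not None)   # True for each
for name in should_not:
    print(name, pattern.search(name) is not None)   # False for each
```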

# 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:

### 'Alice eats apples.'
### 'Bob pets cats.'
### 'Carol throws baseballs.'
### 'Alice throws Apples.'
### 'BOB EATS CATS.'
### but not the following:
### 'RoboCop eats apples.'
### 'ALICE THROWS FOOTBALLS.'
### 'Carol eats 7 cats.'

Answer: 

To write a regex that matches a sentence following the specified pattern while being case-insensitive, we can use the following pattern:

```python
import re

pattern = re.compile(r'^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.\s*$', re.IGNORECASE)
```
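The pattern can be checked against the sentences listed in the question. The `^` and `$` anchors are what reject a sentence that merely contains a valid phrase, and `re.IGNORECASE` lets "BOB EATS CATS." through:

```python
import re

pattern = re.compile(
    r'^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.\s*$',
    re.IGNORECASE,
)

should_match = ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.',
                'Alice throws Apples.', 'BOB EATS CATS.']
should_not = ['RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.']

for s in should_match:
    print(s, pattern.search(s) is not None)   # True for each
for s in should_not:
    print(s, pattern.search(s) is not None)   # False for each
```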