### 1. What is the name of the feature responsible for generating Regex objects?

**Answer:**

The feature responsible for generating Regex objects is the `re.compile()` function. Here **re** is the module name and **compile()** is a function of the re module.

`re.compile()` compiles a regular expression pattern into a Regex (pattern) object, which can then be reused for searching and matching.
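
As a minimal sketch of how this is used (the name `phone_regex` and the sample text are made up for illustration), the compiled object can be reused for searching:

```python
import re

# Compile the pattern once; the result is a reusable Regex (pattern) object.
phone_regex = re.compile(r"\d{3}-\d{4}")

# The compiled object exposes methods such as search(), findall() and sub().
match = phone_regex.search("Call 555-0199 today")
print(match.group())  # 555-0199
```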

### 2. Why do raw strings often appear in Regex objects?

**Answer:**

A regular expression is a sequence of characters that specifies a search pattern, commonly used for find-and-replace operations in text processors and search engines. Regular expressions frequently contain backslashes (`\`), which also have a special meaning in ordinary Python string literals: in a normal string, sequences such as `\n` and `\t` are converted to a newline and a tab before the regex engine ever sees them. Prefixing the string with **r" "** or **R" "** makes it a raw string, so every backslash is kept exactly as typed and reaches the regex engine unchanged.

Let's take an example to understand it more clearly,

In [20]:
import re

# w/o using the raw string
print("I\n\tLove\n\t\tCoding")

# with using the raw string
print(r"I\n\tLove\n\t\tCoding")

I
	Love
		Coding
I\n\tLove\n\t\tCoding
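
The same effect matters inside a regex. The sketch below (sample text and variable names are illustrative only) uses `\b`, which means "word boundary" to the regex engine but "backspace" in a normal Python string:

```python
import re

text = "a cat sat"

# In a normal string literal, "\b" is the backspace character (\x08),
# so the regex engine never sees a word-boundary token.
no_raw = re.search("\bcat\b", text)

# In a raw string the backslash survives, and \b means "word boundary".
with_raw = re.search(r"\bcat\b", text)

print(no_raw)            # None
print(with_raw.group())  # cat
```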


### 3. What is the return value of the search() method?

**Answer:**

`re.search(pattern, string)` returns a **match object** for the first location where the pattern matches anywhere in the string.

If the pattern does not match anywhere in the string, the return value is **None**.

In [22]:
string = "I want to become a Data Scientist"

x = re.search("Data", string) # return a match object if there is a match anywhere in the string

if x:
  print("YES! It is matched")
else:
  print("No match")


YES! It is matched


### 4. From a Match item, how do you get the actual strings that match the pattern?

**Answer:**

We can get the actual strings that match the pattern using the `group()` method of the Match object.

Example,

In [28]:
import re

string = "I want to become a Data Scientist"

x = re.search("D.*t", string)
print(x.group())

Data Scientist


### 5. In the regex which created from the r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1?

**Answer:**

`\d` matches a single digit (0-9).

Let's take an example to answer this question,

In [36]:
import re

pattern = r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
text = 'Please contact on this number 033-123-0987'

match = re.search(pattern, text)

if match:
    print(match.group(0))
    
    print(match.group(2))
    
    print(match.group(1))

033-123-0987
123-0987
033


As the example above shows, the groups cover the following:

* **group(0)**: The entire match, i.e. the whole substring that matched the regex pattern (`033-123-0987`).

* **group(1)**: The first capturing group, `(\d\d\d)`, which matches the first three digits of the phone number (`033`).

* **group(2)**: The second capturing group, `(\d\d\d-\d\d\d\d)`, which matches three digits, a hyphen, and four digits, i.e. the last seven digits of the phone number (`123-0987`).

### 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?


**Answer:**

In regular expression syntax, parentheses define groups and a period matches any character except a newline. To match literal parentheses or periods, put a backslash before each one: `\(`, `\)`, and `\.`.

Let's take an example,

In [42]:
import re

pattern_using_backslash = r"\(\d\d\d\)-(\d\d\d\d\d\d\d)"
pattern_without_using_backslash = r"(\d\d\d)-(\d\d\d\d\d\d\d)"

text = '(033)-1800972'

match1 = re.search(pattern_using_backslash, text)
match2 = re.search(pattern_without_using_backslash, text)

print("Using the backslash --->",match1.group())
print("With out using the backslash --->",match2)

Using the backslash ---> (033)-1800972
With out using the backslash ---> None
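
The same rule applies to periods. The following sketch (sample text and names are illustrative) compares an unescaped dot with an escaped one, and also shows `re.escape()`, which escapes every special character in a literal string:

```python
import re

text = "Version 3.11 (stable)"

# An unescaped dot matches any character, so "3x11" also matches.
loose = re.compile(r"3.11")
# An escaped dot matches only a literal period.
strict = re.compile(r"3\.11")

print(bool(loose.search("3x11")))   # True
print(bool(strict.search("3x11")))  # False
print(strict.search(text).group())  # 3.11

# re.escape() adds the backslashes for us.
literal = re.compile(re.escape("(stable)"))
print(literal.search(text).group())  # (stable)
```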


### 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

**Answer:**

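If the regex pattern has no groups, `findall()` returns a list of strings, one per match.

If the pattern has exactly one group, it still returns a list of strings, but each string is the text captured by that group.

If the pattern has two or more groups, it returns a list of tuples of strings: one tuple per match, with one string per group.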
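
A short sketch of how the number of groups changes the result (the phone numbers are made-up sample data):

```python
import re

text = "Cell: 415-555-9999 Work: 212-555-0000"

# No groups: a list of the full matches.
no_groups = re.findall(r"\d{3}-\d{3}-\d{4}", text)
# One group: a list of strings holding only that group's text.
one_group = re.findall(r"(\d{3})-\d{3}-\d{4}", text)
# Two groups: a list of tuples, one string per group.
two_groups = re.findall(r"(\d{3})-(\d{3}-\d{4})", text)

print(no_groups)   # ['415-555-9999', '212-555-0000']
print(one_group)   # ['415', '212']
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
```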

### 8. In regular expressions, what does the | character mean?

**Answer:**

In regular expressions, **|** is the OR (alternation) operator.

It matches any one of a set of alternatives, for example `cat|dog` matches either `cat` or `dog`.
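
A minimal sketch of alternation, with and without a group (the names and sample sentences are illustrative only):

```python
import re

# Either alternative may match; search() returns the leftmost match.
hero_regex = re.compile(r"Batman|Tina Fey")
first = hero_regex.search("Tina Fey and Batman")
print(first.group())  # Tina Fey

# Parentheses limit the alternation to one part of the pattern.
bat_regex = re.compile(r"Bat(man|mobile|copter)")
m = bat_regex.search("Batmobile lost a wheel")
print(m.group())   # Batmobile
print(m.group(1))  # mobile
```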

### 9. In regular expressions, what does the character stand for?


**Answer:**

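The question does not say which character it is asking about, so it cannot be answered precisely.

In general, certain characters in a regular expression (metacharacters such as `.`, `?`, `*` and `+`) carry a special meaning instead of matching themselves literally. They make it possible to describe the search patterns used in text processors and search engines for find-and-replace operations.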

### 10. In regular expressions, what is the difference between the + and * characters?

**Answer:**

* In regular expressions, `*` means zero or more occurrences of the preceding character or group.

* In regular expressions, `+` means one or more occurrences of the preceding character or group.
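
To see the difference, here is a small sketch (the words are sample data) where the optional part `wo` appears zero, one, or two times:

```python
import re

star = re.compile(r"Bat(wo)*man")  # "wo" may appear zero or more times
plus = re.compile(r"Bat(wo)+man")  # "wo" must appear at least once

results = {}
for word in ["Batman", "Batwoman", "Batwowoman"]:
    results[word] = (bool(star.fullmatch(word)), bool(plus.fullmatch(word)))
    print(word, results[word])

# Batman (True, False)
# Batwoman (True, True)
# Batwowoman (True, True)
```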

### 11. What is the difference between {4} and {4,5} in regular expression?

**Answer:**

In regular expressions, curly braces `{}` specify how many times the preceding character or group must repeat.

`{4}` means the preceding character or group must repeat exactly 4 times.

`{4,5}` means the preceding character or group must repeat at least 4 and at most 5 times (inclusive). Because the quantifier is greedy, it matches 5 repetitions when they are available.

Example,

In [4]:
import re

text = "I lived in Kolkata and my pincode is 12345. Please come to my place"

pattern1 = r"\d{4}"
pattern2 = r"\d{4,5}"

match1 = re.search(pattern1, text)
match2 = re.search(pattern2, text)

print(f"My first four digits of the pincode is {match1.group()}.")
print(f"My full five digit pincode is {match2.group()}.")


My first four digits of the pincode is 1234.
My full five digit pincode is 12345.


### 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

**Answer:**

* `\d`: This shorthand character class matches any digit from 0 to 9.

* `\w`: This shorthand character class matches any word character, which includes all uppercase and lowercase letters, digits, and underscores.

* `\s`: This shorthand character class matches any whitespace character, including spaces, tabs, and line breaks.

Example,

In [13]:
import re

text = """Hello everyone my name is John Doe and my emp id is 461324123.
          My official mail id is johndoe@ineuron.com.
          I       write this sentence                   just to check the \\s with multiple                    white spaces."""


pattern1 = r"\d{9}"
pattern2 = r"\w+@\w+\.com"  # escape the dot so it matches only a literal period
pattern3 = r"\s+"

match1 = re.search(pattern1, text)
match2 = re.search(pattern2, text)
match3 = re.findall(pattern3, text)

print(f"My employee number is: {match1.group()}")
print(f"My mail id is: {match2.group()}")
print(f"White spaces, line breaks are as mentioned below: \n{match3}")

My employee number is: 461324123
My mail id is: johndoe@ineuron.com
White spaces, line breaks are as mentioned below: 
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\n          ', ' ', ' ', ' ', ' ', ' ', '\n          ', '       ', ' ', ' ', '                   ', ' ', ' ', ' ', ' ', ' ', ' ', '                    ', ' ']


### 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

**Answer:**

* `\D`: Matches any character that is not a digit.

* `\W`: Matches any character that is not a word character (not a letter, digit, or underscore).

* `\S`: Matches any character that is not a whitespace character.

Each of these is therefore the exact opposite of the corresponding lowercase class from the previous question.

Example,

In [23]:
import re

text = """Hello everyone my name is John Doe and my emp id is 461324123.
          My official mail id is johndoe@ineuron.com.
          I       write this sentence                   just to check the \\s with multiple                    white spaces."""


pattern1 = r"\D+"
pattern2 = r"\W+"
pattern3 = r"\S+"

match1 = re.findall(pattern1, text)
match2 = re.findall(pattern2, text)
match3 = re.findall(pattern3, text)

# Double the backslashes so \D, \W and \S are not treated as (invalid) string escapes
print(f"Example of \\D:\n {match1}\n----------------------------\n--------------------------")
print(f"Example of \\W:\n {match2}\n----------------------------\n--------------------------")
print(f"Example of \\S:\n {match3}")

Example of \D:
 ['Hello everyone my name is John Doe and my emp id is ', '.\n          My official mail id is johndoe@ineuron.com.\n          I       write this sentence                   just to check the \\s with multiple                    white spaces.']
----------------------------
--------------------------
Example of \W:
 [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '.\n          ', ' ', ' ', ' ', ' ', ' ', '@', '.', '.\n          ', '       ', ' ', ' ', '                   ', ' ', ' ', ' ', ' \\', ' ', ' ', '                    ', ' ', '.']
----------------------------
--------------------------
Example of \S:
 ['Hello', 'everyone', 'my', 'name', 'is', 'John', 'Doe', 'and', 'my', 'emp', 'id', 'is', '461324123.', 'My', 'official', 'mail', 'id', 'is', 'johndoe@ineuron.com.', 'I', 'write', 'this', 'sentence', 'just', 'to', 'check', 'the', '\\s', 'with', 'multiple', 'white', 'spaces.']


### 14. What is the difference between ".\*?" and ".\*"?

**Answer:**

`.*?` is a non-greedy (lazy) quantifier. It matches as few characters as possible while still letting the overall pattern match.

`.*` is a greedy quantifier. It matches as many characters as possible while still letting the overall pattern match.

Example,

In [25]:
import re

text = "aabbbcdddd"

pattern1 = r"a.*d"
pattern2 = r"a.*?c"

match1 = re.findall(pattern1, text)
match2 = re.findall(pattern2, text)

print(match1)
print(match2)

['aabbbcdddd']
['aabbbc']
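
The contrast is easiest to see when the same pattern is run in both modes on text with several possible end points. A minimal sketch, using a made-up HTML-like string:

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string.
greedy = re.findall(r"<.*>", text)
# Non-greedy: .*? stops at the first ">" after each "<".
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```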


### 15. What is the syntax for matching both numbers and lowercase letters with a character class?

**Answer:**

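The syntax for matching both numbers and lowercase letters is the character class `[0-9a-z]` (equivalently `[a-z0-9]`). Adding `+` after it, as in
        `r"[0-9a-z]+"`, matches one or more such characters in a row.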
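
A quick sketch of this character class on a made-up sentence; note that uppercase letters are not part of the class, so they split the matches:

```python
import re

text = "Order id ab12cd was shipped to Bay 7"

# Each match is a run of digits and/or lowercase letters.
tokens = re.findall(r"[0-9a-z]+", text)
print(tokens)
# ['rder', 'id', 'ab12cd', 'was', 'shipped', 'to', 'ay', '7']
```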

### 16. What is the procedure for making a regular expression case-insensitive?

**Answer:**

We can pass **re.IGNORECASE** (or its short form **re.I**) as a flag to make a regular expression case-insensitive.
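
A minimal sketch of the flag in use (the pattern and sample strings are illustrative). The inline form `(?i)` at the start of the pattern has the same effect:

```python
import re

robo = re.compile(r"robocop", re.IGNORECASE)

mixed = robo.search("RoboCop is part man, part machine").group()
upper = robo.search("ROBOCOP protects the innocent").group()
print(mixed)  # RoboCop
print(upper)  # ROBOCOP

# Inline flag: same behaviour without passing a second argument.
inline = re.search(r"(?i)robocop", "RoBoCoP").group()
print(inline)  # RoBoCoP
```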

### 17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?

**Answer:**

The `.` character in a regular expression pattern normally matches any character except for a newline character.

If we pass the `re.DOTALL` flag as the second argument to `re.compile()`, then the `.` character will match any character, including a newline character.
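
A short sketch of the difference on a two-line string (sample text only):

```python
import re

text = "first line\nsecond line"

# Default: the dot stops at the newline.
default = re.compile(r".*").search(text).group()
# With re.DOTALL: the dot also matches the newline.
dotall = re.compile(r".*", re.DOTALL).search(text).group()

print(repr(default))  # 'first line'
print(repr(dotall))   # 'first line\nsecond line'
```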

### 18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

In [27]:
import re

numReg = re.compile(r'\d+')
# Replace every run of digits with 'X'; words such as "five" are left untouched
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

'X drummers, X pipers, five rings, X hen'

### 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?

**Answer:**

`re.VERBOSE` allows us to add whitespace and comments to the pattern string passed to `re.compile()`, which makes long patterns easier to read.
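
As a sketch of what this looks like in practice (the phone pattern and sample text are assumptions chosen for illustration), the pattern can be spread over several commented lines:

```python
import re

phone = re.compile(r"""
    (\d{3})     # area code
    [-\s]       # separator: hyphen or whitespace
    (\d{3})     # first three digits
    -           # literal hyphen
    (\d{4})     # last four digits
""", re.VERBOSE)

m = phone.search("Call 415-555-9999 now")
print(m.groups())  # ('415', '555', '9999')
```

Whitespace and `#` comments outside character classes are ignored by the regex engine in this mode, so they do not change what the pattern matches.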

### 20. How would you write a regex that matches a number with a comma for every three digits? It must match the following:
'42'\
'1,234'\
'6,368,745'\
but not the following:\
'12,34,567' (which has only two digits between the commas)\
'1234' (which lacks commas)


In [2]:
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')

strings = ['42', '1,234', '6,368,745', '12,34,567', '1234']

for s in strings:
    if pattern.match(s):
        print(s, 'match the condition')
    else:
        print(s, 'does not match the condition')

42 match the condition
1,234 match the condition
6,368,745 match the condition
12,34,567 does not match the condition
1234 does not match the condition


Here we used `pattern = re.compile(r'^\d{1,3}(,\d{3})*$')`.

Where,

`^`: Matches the start of the string.

`\d{1,3}`: Matches 1 to 3 digits.

`(,\d{3})*`: Matches zero or more occurrences of a comma followed by exactly 3 digits.

`$`: Matches the end of the string.

### 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
'Haruto Watanabe'\
'Alice Watanabe'\
'RoboCop Watanabe'\
but not the following:\
'haruto Watanabe' (where the first name is not capitalized)\
'Mr. Watanabe' (where the preceding word has a nonletter character)\
'Watanabe' (which has no first name)\
'Haruto watanabe' (where Watanabe is not capitalized)


In [4]:
import re
pattern = re.compile(r'^[A-Z][a-zA-Z]*\sWatanabe$')

strings = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe', 'haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

for s in strings:
    if pattern.match(s):
        print(s, 'match the condition')
    else:
        print(s, 'does not match the condition')

Haruto Watanabe match the condition
Alice Watanabe match the condition
RoboCop Watanabe match the condition
haruto Watanabe does not match the condition
Mr. Watanabe does not match the condition
Watanabe does not match the condition
Haruto watanabe does not match the condition


### 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
'Alice eats apples.'\
'Bob pets cats.'\
'Carol throws baseballs.'\
'Alice throws Apples.'\
'BOB EATS CATS.'\
*but not the following*:\
'RoboCop eats apples.'\
'ALICE THROWS FOOTBALLS.'\
'Carol eats 7 cats.'


In [8]:
import re

# \. requires the sentence to end with a literal period, as the question asks
condition = r'^(alice|bob|carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.$'
pattern = re.compile(condition, re.IGNORECASE)

strings = ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.', 'Alice throws Apples.',
           'BOB EATS CATS.', 'RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.']
for s in strings:
    if pattern.match(s):
        print(s, 'matches')
    else:
        print(s, 'does not match')

Alice eats apples. matches
Bob pets cats. matches
Carol throws baseballs. matches
Alice throws Apples. matches
BOB EATS CATS. matches
RoboCop eats apples. does not match
ALICE THROWS FOOTBALLS. does not match
Carol eats 7 cats. does not match
