### 1. What is the name of the feature responsible for generating Regex objects?

The **re.compile()** function returns a Regex (pattern) object, **whose match(), search(), and other methods can then be used for matching.**

In [2]:
import re

s = "The name of the feature responsible for generating Regex objects."
re.compile(s)

re.compile(r'The name of the feature responsible for generating Regex objects.',
re.UNICODE)

### 2. Why do raw strings often appear in Regex objects?

So that backslashes do not have to be escaped.

According to the Python docs, **raw string notation (r"text")** keeps regular expressions readable. Without it, every backslash ('\\') in a regular expression would have to be prefixed with another one to escape it.
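
A minimal sketch of the difference. The sample text and the variable names are illustrative assumptions, not part of the original notebook.

```python
import re

# Without a raw string, each backslash must itself be escaped.
escaped = "\\bcat\\b"
# With a raw string, the pattern is written exactly as the regex engine sees it.
raw = r"\bcat\b"

# Both spellings produce the same pattern text.
same_pattern = escaped == raw

# In a normal string, "\b" is a backspace character, not a word boundary,
# so this pattern silently fails to match.
wrong = re.findall("\bcat\b", "a cat sat")
right = re.findall(raw, "a cat sat")

print(same_pattern, wrong, right)  # True [] ['cat']
```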

### 3. What is the return value of the search() method?

The **search() method** returns a **Match object** for the first place in the string where the pattern matches, or **None** if there is no match anywhere in the string.
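
A short sketch of both outcomes. The sample strings and variable names are made up for illustration.

```python
import re

# Digits occur mid-string, so search() returns a Match object.
found = re.search(r"\d+", "Order 66 was issued.")
# No digits at all, so search() returns None.
missing = re.search(r"\d+", "no digits here")

# Always check for None before calling Match methods.
if found:
    print(found.group())  # 66
print(missing)            # None
```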

### 4. From a Match item, how do you get the actual strings that match the pattern?

The **Match** object has properties and methods used to retrieve information about the search, and the result, viz:

- **.span()** returns a tuple containing the start and end positions of the match.
- **.string** returns the string passed into the function.
- **.group()** returns the part of the string where there was a match.

In [12]:
s = "Returns a match object where the x doesn't contain digits."
x = re.search(r"\D", s)  # returns a Match object for the first non-digit character in s

print(x.string)
print(x.span())
print(x.group())

Returns a match object where the x doesn't contain digits.
(0, 1)
R


### 5. In the regex which is created from the r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1?

**Group 0 is the entire match, group 1 covers the first set of parentheses, and group 2 covers the second set of parentheses.**
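
As a sketch, the groups can be inspected like this. The phone number and the names `phone_regex` and `mo` are assumptions chosen for illustration.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242.')

print(mo.group(0))  # 415-555-4242  (the entire match)
print(mo.group(1))  # 415           (first set of parentheses)
print(mo.group(2))  # 555-4242      (second set of parentheses)
print(mo.groups())  # ('415', '555-4242')
```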

### 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?

By escaping them with the backslash character **'\\'**, i.e. **\\(**, **\\)** and **\\.**
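
A small sketch of escaped versus unescaped metacharacters. The sample text and variable names are illustrative assumptions.

```python
import re

text = "Call (415) 555-4242. Thanks."

# \( and \) match literal parentheses; \. matches a literal period.
area = re.search(r"\(\d{3}\)", text)
periods = re.findall(r"\.", text)

# Unescaped, the dot matches any character except a newline.
any_chars = re.findall(r".", "a.b")

print(area.group())  # (415)
print(periods)       # ['.', '.']
print(any_chars)     # ['a', '.', 'b']
```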

### 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

The result depends on the **number of capturing groups in the pattern**. 
- If there are no groups, return a list of strings matching the whole pattern. 
- If there is exactly one group, return a list of strings matching that group.
- If **multiple groups** are present, **return a list of tuples of strings matching the groups**. Non-capturing groups do not affect the form of the result.

In [8]:
re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest?')

['foot', 'fell', 'fastest']

In [12]:
re.findall(r'(\w+)=(\d+)', 'set width=20 and height=10')

[('width', '20'), ('height', '10')]
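
The two cells above cover the no-group and multiple-group cases. A sketch of the remaining cases, reusing the same sample string (the variable names are assumptions):

```python
import re

text = 'set width=20 and height=10'

# Exactly one capturing group: a list of strings for that group only.
one_group = re.findall(r'\w+=(\d+)', text)

# A non-capturing group (?:...) does not change the shape of the result.
non_capturing = re.findall(r'(?:\w+)=(\d+)', text)

print(one_group)      # ['20', '10']
print(non_capturing)  # ['20', '10']
```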

### 8. In regular expressions, what does the | character mean?

**OR**, i.e. the pattern matches either the expression on its left or the expression on its right.
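
A brief sketch of alternation. The sample sentences and names are made up for illustration.

```python
import re

pattern = re.compile(r"cat|dog")
sentence = "I have a dog and a cat."

# search() returns whichever alternative appears first in the text.
first = pattern.search(sentence)
all_pets = pattern.findall(sentence)

# Parentheses limit how far the alternation reaches.
prefixed = re.findall(r"Bat(?:man|mobile)", "Batman drives the Batmobile")

print(first.group())  # dog
print(all_pets)       # ['dog', 'cat']
print(prefixed)       # ['Batman', 'Batmobile']
```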

### 9. In regular expressions, what does the character stand for?

The question does not say **which** character is meant, so it cannot be answered as written.

### 10. In regular expressions, what is the difference between the + and * characters?

Both are **metacharacters** (quantifiers) that apply to the preceding element:

- **\*** : zero or more occurrences, e.g. "he.\*o" matches both "heo" and "hello".
- **\+** : one or more occurrences, e.g. "he.+o" matches "hello" but not "heo".

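The difference shows up only when the repeated element is absent. A minimal sketch using the two patterns above on a hypothetical word list:

```python
import re

words = ["heo", "hello", "he o"]

# * allows zero characters between "he" and "o", so "heo" matches.
star = [w for w in words if re.fullmatch(r"he.*o", w)]
# + requires at least one character between "he" and "o", so "heo" fails.
plus = [w for w in words if re.fullmatch(r"he.+o", w)]

print(star)  # ['heo', 'hello', 'he o']
print(plus)  # ['hello', 'he o']
```
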
### 11. What is the difference between {4} and {4,5} in regular expression?

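**{n}** matches exactly **n** repetitions of the preceding element, so **{4}** means exactly four. **{n,m}** matches between **n** and **m** repetitions (inclusive), so **{4,5}** means four or five; because matching is greedy, five are taken when available.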

In [75]:
x="Ain't no way I'ma let you cause mayhem."

#returns a list of matches that start and end with 'm' and have exactly 4 characters in between.
re.findall("m.{4}m", x)

['mayhem']

In [77]:
#returns a list of matches starting with 'm' followed by 4 to 5 characters (any character except newline).
re.findall("m.{4,5}", x)

['ma let', 'mayhem']

### 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

- **\\d : Matches any single digit character (0-9).**
- **\\w : Matches any single word character (letters a-z and A-Z, digits 0-9, and the underscore _ character).**
- **\\s : Matches any single whitespace character (space, tab, newline, and so on).**
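
A quick sketch of the three classes on one string. The sample text and variable names are assumptions.

```python
import re

sample = "Room 42_b is open"

digits = re.findall(r"\d", sample)       # single digits
word_runs = re.findall(r"\w+", sample)   # runs of word characters
spaces = re.findall(r"\s", sample)       # single whitespace characters

print(digits)     # ['4', '2']
print(word_runs)  # ['Room', '42_b', 'is', 'open']
print(spaces)     # [' ', ' ', ' ']
```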

### 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

- **\\D : Matches any single character that is NOT a digit.**
- **\\W : Matches any single character that is NOT a word character.**
- **\\S : Matches any single character that is NOT a whitespace character.**
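
A short sketch of the negated classes. The sample string is a made-up illustration.

```python
import re

sample = "a1 b2"

non_digits = re.findall(r"\D", sample)  # everything except digits
non_word = re.findall(r"\W", sample)    # everything except word characters
non_space = re.findall(r"\S", sample)   # everything except whitespace

print(non_digits)  # ['a', ' ', 'b']
print(non_word)    # [' ']
print(non_space)   # ['a', '1', 'b', '2']
```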

### 14. What is the difference between .\*? and .\*?

I'm assuming the question asks for the difference between **.?** and **.\*?**.

- **.?: Matches zero or one occurrence of any character (except newline).**
- **.\*?: Matches zero or more occurrences of any character (except newline), taking as few as possible. The trailing ? makes the \* non-greedy (lazy), whereas a plain .\* is greedy and takes as many characters as possible.**
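
A sketch contrasting greedy, lazy, and optional matching. The HTML-like sample string and the variable names are illustrative assumptions.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the LAST ">" in the string.
greedy = re.findall(r"<.*>", html)
# Lazy: .*? stops at the FIRST ">" it can.
lazy = re.findall(r"<.*?>", html)

# .? allows at most one extra character after "a".
one_or_none = [s for s in ["a", "ab", "abc"] if re.fullmatch(r"a.?", s)]

print(greedy)       # ['<b>bold</b> and <i>italic</i>']
print(lazy)         # ['<b>', '</b>', '<i>', '</i>']
print(one_or_none)  # ['a', 'ab']
```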

### 15. What is the syntax for matching both numbers and lowercase letters with a character class?

**[a-z0-9]** (equivalently **[0-9a-z]**) matches a single lowercase letter or digit. The shorthand **\w** is broader: it also matches uppercase letters and the underscore _ character, as the 'I' in the output below shows.

In [81]:
x = "I ain't go no $500, you gotta understand that!"

print(re.findall(r"\w", x))

['I', 'a', 'i', 'n', 't', 'g', 'o', 'n', 'o', '5', '0', '0', 'y', 'o', 'u', 'g', 'o', 't', 't', 'a', 'u', 'n', 'd', 'e', 'r', 's', 't', 'a', 'n', 'd', 't', 'h', 'a', 't']


### 16. What is the procedure for making a regular expression case-insensitive?

**re.IGNORECASE** (or its short form **re.I**): This flag makes matching case-insensitive, i.e. expressions like [A-Z] will match lowercase letters, too.<br>
**Generally, it's passed as the optional second argument to re.compile().**
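
A minimal sketch of the flag in use. The sample text and names are assumptions; the inline `(?i)` form is shown as an alternative way to set the same flag.

```python
import re

text = "RoboCop, ROBOCOP and robocop"

# Flag passed as the second argument to re.compile().
robocop = re.compile(r"robocop", re.IGNORECASE)
hits = robocop.findall(text)

# The same flag set inline at the start of the pattern.
inline_hits = re.findall(r"(?i)robocop", text)

print(hits)         # ['RoboCop', 'ROBOCOP', 'robocop']
print(inline_hits)  # ['RoboCop', 'ROBOCOP', 'robocop']
```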

### 17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?

Normally, the **.** character matches any single character except a newline (**\n**).<br>
<br>
The **DOTALL flag** extends this.<br>
**With the DOTALL flag, the ‘.’ character matches any character, including a newline.**
<br><br>
In real-life projects this is useful when a match has to span multi-line strings (lines separated by newline characters, ‘\n’). In such situations, we use re.DOTALL.

In [97]:
#without re.DOTALL
s="slknvfk;sngko;aen\nv;akegmnoiqm'\nmbpoqeopbnmk"
re.compile(s)

re.compile(r"slknvfk;sngko;aen\nv;akegmnoiqm'\nmbpoqeopbnmk", re.UNICODE)

In [98]:
#with re.DOTALL
s="slknvfk;sngko;aen\nv;akegmnoiqm'\nmbpoqeopbnmk"
re.compile(s, re.DOTALL)

re.compile(r"slknvfk;sngko;aen\nv;akegmnoiqm'\nmbpoqeopbnmk",
re.DOTALL|re.UNICODE)
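
The two cells above only compile a pattern, so the effect of the flag is not visible in their output. A sketch that shows it, using a hypothetical two-line string and the pattern `.*`:

```python
import re

text = "first line\nsecond line"

# Without DOTALL, . stops at the newline.
without_dotall = re.compile(r".*").search(text).group()
# With DOTALL, . also matches the newline, so .* spans both lines.
with_dotall = re.compile(r".*", re.DOTALL).search(text).group()

print(repr(without_dotall))  # 'first line'
print(repr(with_dotall))     # 'first line\nsecond line'
```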

### 18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4hen') return?

**So, basically this**

In [106]:
numReg = re.compile(r'\d+') #creating a regex object.
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4hen')

'X drummers, X pipers, five rings, Xhen'

**is the same as**

In [105]:
re.sub(r"\d+", 'X', '11 drummers, 10 pipers, five rings, 4hen')

'X drummers, X pipers, five rings, Xhen'

What happens here is that re.sub() replaces every run of **one or more consecutive digits** with the character **X** in the given string. The words "five" and "hen" are left alone because they contain no digits.

### 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?

**re.VERBOSE** allows us to write regular expressions that look nicer and are more readable by allowing us to visually separate logical sections of the pattern and add comments.

- **Whitespace inside the pattern is ignored,**
     - **except when it is in a character class, when it is preceded by an unescaped backslash, or when it is inside a token such as \*?, (?: or (?P<...>.**<br><br>
- **When a line contains a # that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.**
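
A sketch of these rules in practice. The phone-number pattern, the sample text, and the variable names are assumptions chosen for illustration.

```python
import re

# Whitespace and # comments inside the pattern are ignored under re.VERBOSE.
phone_regex = re.compile(r"""
    (\d{3})     # area code
    -           # separator
    (\d{3})     # first three digits
    -           # separator
    (\d{4})     # last four digits
    """, re.VERBOSE)
mo = phone_regex.search("Call 415-555-4242 today")

# A bare space is dropped from the pattern, so "a b" becomes "ab".
bare_space = re.search(r"a b", "a b", re.VERBOSE)
# A space inside a character class is kept.
class_space = re.search(r"a [ ] b", "a b", re.VERBOSE)

print(mo.groups())              # ('415', '555', '4242')
print(bare_space)               # None
print(class_space is not None)  # True
```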

### 20. How would you write a regex that matches a number with a comma for every three digits? It must match the following:
**'42'<br>
'1,234'<br>
'6,368,745'**<br>
<br>
### but not the following:
**'12,34,567' (which has only two digits between the commas)<br>
'1234' (which lacks commas)**

In [166]:
test_cases = ["42", "1,234", "6,368,745", "43,547,465,345", "12,34,567", '1234']

for i in test_cases:
    is_match = re.match(r"^\d{1,3}(,\d{3})*$", i)  # 1-3 digits, then zero or more groups of ",ddd"
    if is_match:
        print(is_match.group())
    else:
        print(f"For {i}, the match doesn't exist!")

42
1,234
6,368,745
43,547,465,345
For 12,34,567, the match doesn't exist!
For 1234, the match doesn't exist!


### 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
**'Haruto Watanabe'<br>
'Alice Watanabe'<br>
'RoboCop Watanabe'**<br>
### but not the following:
**'haruto Watanabe' (where the first name is not capitalized)<br>
'Mr. Watanabe' (where the preceding word has a nonletter character)<br>
'Watanabe' (which has no first name)<br>
'Haruto watanabe' (where Watanabe is not capitalized)**<br>

In [191]:
names = ["Haruto Watanabe", "Alice Watanabe", "RoboCop Watanabe", "haruto Watanabe", "Watanabe", "Haruto watanabe"]
for name in names:
    x = re.match(r"^[A-Z][A-Za-z]* Watanabe$", name)  # one capitalized word, a space, then Watanabe
    if x:
        print(x.group())
    else:
        print("The match doesn't exist!")

Haruto Watanabe
Alice Watanabe
RoboCop Watanabe
The match doesn't exist!
The match doesn't exist!
The match doesn't exist!


### 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
**'Alice eats apples.'<br>
'Bob pets cats.'<br>
'Carol throws baseballs.'<br>
'Alice throws Apples.'<br>
'BOB EATS CATS.'<br>**
### but not the following:
**'RoboCop eats apples.'<br>
'ALICE THROWS FOOTBALLS.'<br>
'Carol eats 7 cats.'<br>**

In [205]:
txts = ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.', 'Alice throws Apples.', 'BOB EATS CATS.',
      'RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.']

# Compile once, outside the loop; \. matches the literal period that ends the sentence.
pattern = r"^(Alice|Bob|Carol) (eats|throws|pets) (apples|cats|baseballs)\.$"
Reg_pat = re.compile(pattern, re.IGNORECASE)

for txt in txts:
    x = Reg_pat.match(txt)
    
    if x:
        print(x.group())
    else:
        print("There ain't no match!")

Alice eats apples.
Bob pets cats.
Carol throws baseballs.
Alice throws Apples.
BOB EATS CATS.
There ain't no match!
There ain't no match!
There ain't no match!
