### 1. What is the name of the function that creates Regex objects?

**Ans**

The re.compile() function returns a Regex (compiled pattern) object.

In [1]:
import re 

p = re.compile('[a-e]') 
print(p.findall("Hello World")) 

['e', 'd']


### 2. Why are raw strings often used when creating Regex objects?

**Ans**

Raw strings are used so that backslashes do not have to be escaped: r'\d' reaches the regex engine unchanged, whereas a normal string would need '\\d'.
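As a rough illustration of why this matters, the sketch below compares a normal string and a raw string that look the same in source code. The variable names `normal` and `raw` are made up for this example.

```python
import re

# In a normal string, '\b' is the backspace character (one character).
# In a raw string, r'\b' is a backslash followed by 'b' (two characters).
normal = '\bcat\b'
raw = r'\bcat\b'

print(len(normal), len(raw))              # 5 7

# Only the raw version reaches the regex engine as the word-boundary token.
print(re.findall(raw, 'cat concat'))      # ['cat']
print(re.findall(normal, 'cat concat'))   # []
```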

### 3. What is the return value of the search() method?

**Ans**

The search() method returns Match objects.

In [2]:
txt = "Hello World!"
x = re.search("World", txt)
x

<re.Match object; span=(6, 11), match='World'>
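When nothing matches, search() returns None instead of a Match object, so it is usually worth checking the result before using it. A minimal sketch, with `found` and `missing` as illustrative names:

```python
import re

txt = "Hello World!"

found = re.search("World", txt)
missing = re.search("Mars", txt)

print(found is not None)   # True
print(missing)             # None

# Guard before using the result, since None has no .group() method.
if missing is None:
    print("no match")
```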

### 4. From a Match object, how do you get the actual strings that match the pattern?

**Ans**

The group() method returns strings of the matched text.

In [3]:
string = '39801 356, 2102 1111'

pattern = r'(\d{3}) (\d{2})'  # raw string, so \d is not treated as a string escape
match = re.search(pattern, string) 

print(match.group())

801 35
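Besides group(), a Match object also has groups(), which returns every parenthesized group at once. The sketch below reuses the same input; `whole` and `parts` are names chosen for this example.

```python
import re

match = re.search(r'(\d{3}) (\d{2})', '39801 356, 2102 1111')

whole = match.group()     # the full matched text
parts = match.groups()    # one string per parenthesized group

print(whole)   # 801 35
print(parts)   # ('801', '35')
```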


### 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group 0 cover? Group 1? Group 2?

**Ans**

Group 0 is the entire match, group 1 covers the first set of parentheses, and group 2 covers the second set of parentheses.

In [4]:
string = '39801-356-2102'

pattern = r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
match = re.search(pattern, string) 

print(match.group(0))
print(match.group(1))
print(match.group(2))

801-356-2102
801
356-2102


### 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?

**Ans**

Periods and parentheses can be escaped with a backslash: \., \(, and \).

In [5]:
string = '39801.356.2102'

pattern = r'(\d\d\d)\.(\d\d\d\.\d\d\d\d)'
match = re.search(pattern, string) 

print(match.group(0))
print(match.group(1))
print(match.group(2))

801.356.2102
801
356.2102
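The cell above only escapes periods. The following sketch does the same for parentheses and also shows re.escape(), which adds the backslashes automatically. The sample phone string is invented for this example.

```python
import re

string = 'Call (801) 356.2102 today'

# \( and \) match literal parentheses; \. matches a literal period.
pattern = r'\((\d{3})\) (\d{3}\.\d{4})'
match = re.search(pattern, string)

print(match.group(0))   # (801) 356.2102
print(match.group(1))   # 801

# re.escape() escapes the special characters in a literal string.
escaped = re.escape('(801)')
print(escaped)          # \(801\)
```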


### 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

**Ans**

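If the regex has no groups, a list of strings (the full matches) is returned. If it has exactly one group, a list of strings containing only that group's text is returned. If it has two or more groups, a list of tuples of strings is returned, one tuple per match.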

In [6]:
txt = "Hello world"
x = re.findall("ello", txt)
print(x)

['ello']


In [7]:
string = '39801.356.2102.123.456.7890'

pattern = r'(\d\d\d)\.(\d\d\d\.\d\d\d\d)'

x = re.findall(pattern, string)
print(x)

[('801', '356.2102'), ('123', '456.7890')]
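The case of a single group is easy to overlook: findall() then returns only that group's text. A short sketch on the same input, using a non-capturing group `(?:...)` to get the full match back; the variable names are illustrative.

```python
import re

string = '39801.356.2102.123.456.7890'

# One capturing group: only the group's text is returned.
one_group = re.findall(r'(\d\d\d)\.\d\d\d\.\d\d\d\d', string)
print(one_group)        # ['801', '123']

# Non-capturing group: behaves as if there were no groups.
non_capturing = re.findall(r'(?:\d\d\d)\.\d\d\d\.\d\d\d\d', string)
print(non_capturing)    # ['801.356.2102', '123.456.7890']
```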


### 8. In regular expressions, what does the | character mean?

**Ans**

The | character signifies matching "either, or" between two alternative patterns.

In [8]:
txt = "Hello world!"
x = re.findall("Hello|bye", txt)

print(x)

['Hello']
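Two details are worth seeing in action: a group can limit the alternation to part of the pattern, and alternatives are tried from left to right. The strings below are made up for this sketch.

```python
import re

# The group limits the alternation to the part that varies.
bat = re.compile(r'Bat(man|mobile|copter)')
m = bat.search('The Batmobile lost a wheel')
print(m.group())    # Batmobile
print(m.group(1))   # mobile

# Alternatives are tried left to right; the first one that succeeds wins.
longer_first = re.findall(r'Hello|Hell', 'Hello')
shorter_first = re.findall(r'Hell|Hello', 'Hello')
print(longer_first)    # ['Hello']
print(shorter_first)   # ['Hell']
```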


### 9. In regular expressions, what does the ? character stand for?

**Ans**

The ? character can either mean "match zero or one of the preceding group" or be used to signify nongreedy matching.

In [9]:
txt = "a"
x = re.findall("ab?", txt)

print(x)

['a']
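Both roles of ? can be sketched side by side. The sample strings are assumptions chosen for illustration.

```python
import re

# Role 1: optional group, "wo" may appear zero or one time.
optional = re.findall(r'Bat(?:wo)?man', 'Batman and Batwoman')
print(optional)    # ['Batman', 'Batwoman']

# Role 2: nongreedy quantifier, {3,5}? stops at the minimum count.
greedy = re.findall(r'\d{3,5}', '1234567')
lazy = re.findall(r'\d{3,5}?', '1234567')
print(greedy)      # ['12345']
print(lazy)        # ['123', '456']
```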


### 10. In regular expressions, what is the difference between the + and * characters?

**Ans**

The + matches one or more. 

In [10]:
x = re.findall("ab+", "ab")
print(x)

x = re.findall("ab+", "a")
print(x)

['ab']
[]


The * matches zero or more.

In [11]:
x = re.findall("ab*", "ab")
print(x)

x = re.findall("ab*", "a")
print(x)

['ab']
['a']


### 11. What is the difference between {4} and {4,5} in a regular expression?

**Ans**

The {4} matches exactly four instances of the preceding group. 

In [12]:
p = re.compile('[0-9]{4}')
print(p.findall("1234567890")) 

['1234', '5678']


The {4,5} matches between four and five instances.

In [13]:
p = re.compile('[0-9]{4,5}')                              
print(p.findall("1234567890"))

['12345', '67890']


### 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

**Ans**

The \d, \w, and \s shorthand character classes match a single digit, word character (letter, digit, or underscore), or whitespace character, respectively.

In [14]:
string = 'abc123a bc'

pattern = r'\d\w\s'

x = re.findall(pattern, string)
print(x)

['3a ']
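Looking at each class on its own may make the definitions more concrete. In this sketch the input strings are invented; note that \w includes the underscore and \s includes tabs and newlines.

```python
import re

word_chars = re.findall(r'\w', 'a_1-!')
spaces = re.findall(r'\s', 'a b\tc\nd')
numbers = re.findall(r'\d+', 'room 42, floor 7')

print(word_chars)   # ['a', '_', '1']
print(spaces)       # [' ', '\t', '\n']
print(numbers)      # ['42', '7']
```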


### 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

**Ans**

The \D, \W, and \S shorthand character classes match a single character that is not a digit, not a word character, or not a whitespace character, respectively.

In [15]:
string = 'abc123a bc'

pattern = r'\D\W\S'

x = re.findall(pattern, string)
print(x)

['a b']


### 14. What is the difference between .\* and .\*?

**Ans**

The .\* performs a greedy match, and the .\*? performs a nongreedy match.

In [16]:
re.findall('a+', 'aaaa')

['aaaa']

In [17]:
re.findall('a+?', 'aaaa')

['a', 'a', 'a', 'a']

In [18]:
re.findall('a*', 'aaaa')

['aaaa', '']

In [19]:
re.findall('a*?', 'aaaa')

['', 'a', '', 'a', '', 'a', '', 'a', '']

In [20]:
re.findall('a?', 'aaaa')

['a', 'a', 'a', 'a', '']

In [21]:
re.findall('a??', 'aaaa')

['', 'a', '', 'a', '', 'a', '', 'a', '']
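The cells above show the greedy and nongreedy quantifiers on a run of one letter. A sketch with the dot itself, on a made-up string containing two closing angle brackets:

```python
import re

text = '<To serve man> for dinner.>'

# Greedy: .* grabs as much as possible, up to the last '>'.
greedy = re.search(r'<.*>', text).group()

# Nongreedy: .*? grabs as little as possible, up to the first '>'.
lazy = re.search(r'<.*?>', text).group()

print(greedy)   # <To serve man> for dinner.>
print(lazy)     # <To serve man>
```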

### 15. What is the syntax for matching both numbers and lowercase letters with a character class?

**Ans**

Either [0-9a-z] or [a-z0-9]

In [22]:
p = re.compile('[0-9a-z]|[a-z0-9]')
print(p.findall("Hello World 123 @#")) 

['e', 'l', 'l', 'o', 'o', 'r', 'l', 'd', '1', '2', '3']


### 16. How do you make a regular expression case-insensitive?

**Ans**

Passing re.I or re.IGNORECASE as the second argument to re.compile() will make the matching case insensitive.

In [23]:
p = re.compile('hello', re.I)
print(p.findall("Hello World")) 

['Hello']
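The flag can also be written inside the pattern as the inline flag `(?i)`, which should behave the same as passing re.IGNORECASE. A minimal sketch; `by_flag` and `inline` are illustrative names.

```python
import re

by_flag = re.compile('hello', flags=re.IGNORECASE)
inline = re.compile('(?i)hello')   # inline flag at the start of the pattern

text = 'Hello HELLO hello'
print(by_flag.findall(text))   # ['Hello', 'HELLO', 'hello']
print(inline.findall(text))    # ['Hello', 'HELLO', 'hello']
```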


### 17. What does the . character normally match? What does it match if re.DOTALL is passed as the second argument to re.compile()?

**Ans**

The . character normally matches any character except the newline character. If re.DOTALL is passed as the second argument to re.compile(), then the dot will also match newline characters.

In [24]:
p = re.compile('.')
print(p.findall("Hello\nWorld")) 

['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']


In [25]:
p = re.compile('.', re.DOTALL)
print(p.findall("Hello\nWorld")) 

['H', 'e', 'l', 'l', 'o', '\n', 'W', 'o', 'r', 'l', 'd']


### 18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

**Ans**

In [26]:
numReg = re.compile(r'\d+')

In [27]:
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') 

'X drummers, X pipers, five rings, X hen'
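Two related calls may be useful here: the count argument limits how many replacements sub() makes, and subn() also reports how many were made. A sketch on the same sentence, with `num_reg` as an illustrative name:

```python
import re

num_reg = re.compile(r'\d+')
text = '11 drummers, 10 pipers, five rings, 4 hen'

# count=1 replaces only the first match.
first_only = num_reg.sub('X', text, count=1)
print(first_only)   # X drummers, 10 pipers, five rings, 4 hen

# subn() returns the new string and the number of replacements.
result, n = num_reg.subn('X', text)
print(result, n)    # X drummers, X pipers, five rings, X hen 3
```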

### 19. What does passing re.VERBOSE as the second argument to re.compile() allow you to do?

**Ans**

The re.VERBOSE argument allows you to add whitespace and comments to the string passed to re.compile().

In [28]:
# Without using VERBOSE (no space inside {2,6}, or the braces are matched as literal text)
regex_email1 = re.compile(r'^([a-z0-9_\.-]+)@([0-9a-z\.-]+)\.([a-z]{2,6})$',
                         re.IGNORECASE)

# Using VERBOSE: whitespace in the pattern is ignored and # starts a comment
regex_email2 = re.compile(r"""
            ^([a-z0-9_\.-]+)             # local part
            @                            # single @ sign
            ([0-9a-z\.-]+)               # domain name
            \.                           # single dot
            ([a-z]{2,6})$                # top-level domain
            """, re.VERBOSE | re.IGNORECASE)

print(regex_email1.findall("example@mail.com"))
print(regex_email2.findall("example@mail.com"))

[('example', 'mail', 'com')]
[('example', 'mail', 'com')]


### 20. How would you write a regex that matches a number with a comma after every three digits? It must match the following:

'42'<br>
'1,234'<br>
'6,368,745'<br>

but not the following:<br>

'12,34,567' (which has only two digits between the commas)<br>
'1234' (which lacks commas)


**Ans**

In [29]:
# (?:...) is a non-capturing group, so findall() returns the whole match
p = re.compile(r'^\d{1,3}(?:,\d{3})*$')
print(p.findall("42"))
print(p.findall("1,234"))
print(p.findall("6,368,745"))

print(p.findall("12,34,567"))
print(p.findall("1234"))

['42']
['1,234']
['6,368,745']
[]
[]
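For a plain yes/no answer, fullmatch() may read more clearly than findall(), and it makes the ^ and $ anchors unnecessary. In this sketch `COMMA_NUMBER` and `is_comma_number` are hypothetical names introduced for illustration.

```python
import re

# No anchors needed: fullmatch() only succeeds if the whole string matches.
COMMA_NUMBER = re.compile(r'\d{1,3}(?:,\d{3})*')

def is_comma_number(text):
    """Return True if text is a number with a comma every three digits."""
    return COMMA_NUMBER.fullmatch(text) is not None

for s in ['42', '1,234', '6,368,745', '12,34,567', '1234']:
    print(s, is_comma_number(s))
```

The first three strings should print True and the last two False.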


### 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:

'Haruto Watanabe'<br>
'Alice Watanabe'<br>
'RoboCop Watanabe'<br>

but not the following:<br>

'haruto Watanabe' (where the first name is not capitalized)<br>
'Mr. Watanabe' (where the preceding word has a nonletter character)<br>
'Watanabe' (which has no first name)<br>
'Haruto watanabe' (where Watanabe is not capitalized)


**Ans**

In [30]:
p = re.compile(r'[A-Z][a-zA-Z]*\sWatanabe')

print(p.findall("Haruto Watanabe")) 
print(p.findall("Alice Watanabe")) 
print(p.findall("RoboCop Watanabe")) 

print(p.findall("haruto Watanabe")) 
print(p.findall("Mr. Watanabe")) 
print(p.findall("Watanabe")) 
print(p.findall("Haruto watanabe")) 

['Haruto Watanabe']
['Alice Watanabe']
['RoboCop Watanabe']
[]
[]
[]
[]


### 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:

'Alice eats apples.'<br>
'Bob pets cats.'<br>
'Carol throws baseballs.'<br>
'Alice throws Apples.'<br>
'BOB EATS CATS.'<br>

but not the following:

'RoboCop eats apples.'<br>
'ALICE THROWS FOOTBALLS.'<br>
'Carol eats 7 cats.'


**Ans**

In [31]:
p = re.compile(r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.', re.IGNORECASE)

print(p.findall("Alice eats apples.")) 
print(p.findall("Bob pets cats.")) 
print(p.findall("Carol throws baseballs.")) 
print(p.findall("Alice throws Apples."))
print(p.findall("BOB EATS CATS."))

print(p.findall("RoboCop eats apples.")) 
print(p.findall("ALICE THROWS FOOTBALLS.")) 
print(p.findall("Carol eats 7 cats."))

[('Alice', 'eats', 'apples')]
[('Bob', 'pets', 'cats')]
[('Carol', 'throws', 'baseballs')]
[('Alice', 'throws', 'Apples')]
[('BOB', 'EATS', 'CATS')]
[]
[]
[]
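If each string is meant to be exactly one sentence, the same pattern can be wrapped in a fullmatch() check. This is only a sketch: the names `sentence` and `is_valid` are made up here, and `\s+` is an assumption that allows more than one whitespace character between words.

```python
import re

sentence = re.compile(
    r'(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.',
    re.IGNORECASE,
)

def is_valid(text):
    """Return True if the whole string is one matching sentence."""
    return sentence.fullmatch(text) is not None

print(is_valid('Alice eats apples.'))     # True
print(is_valid('BOB EATS CATS.'))         # True
print(is_valid('RoboCop eats apples.'))   # False
print(is_valid('Carol eats 7 cats.'))     # False
```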
