# Assignment 7 Solutions

#### 1. What is the name of the feature responsible for generating Regex objects?
**Ans:** Regex objects are created by the `re.compile()` function in Python's `re` module, which is part of the standard library. `re.compile()` takes a pattern string and returns a `re.Pattern` object, whose methods (`search()`, `match()`, `findall()`, `sub()`, and others) are then used to search, match, and manipulate strings.

In [4]:
import re
x = re.compile("some_random_pattern")
print(type(x))
print(x)

<class 're.Pattern'>
re.compile('some_random_pattern')


#### 2. Why do raw strings often appear in Regex objects?
**Ans:** Regex patterns rely heavily on backslashes (`\d`, `\w`, `\b`, and so on), and the backslash is also Python's escape character in ordinary string literals. In a raw string (prefixed with `r`), Python leaves backslashes untouched, so the pattern reaches the regex engine exactly as written.

For example, consider the pattern `\d\d-\d\d-\d\d\d\d`, which matches a date in the format "dd-mm-yyyy". In a regular string, each backslash should be escaped: `'\\d\\d-\\d\\d-\\d\\d\\d\\d'`. With a raw string, the same pattern is simply `r'\d\d-\d\d-\d\d\d\d'`. Raw strings make patterns more readable and less error-prone because there is no double escaping to keep track of, which is why they are common practice when writing regular expressions in Python.
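
The difference matters most for sequences that Python itself interprets. The short sketch below (the sample text and variable names are only illustrative) uses `\b`, which is a word boundary to the regex engine but a backspace character inside a regular string literal:

```python
import re

text = "cat concat cat"

# Raw string: the regex engine receives \b and treats it as a word boundary.
raw_pattern = r"\bcat\b"

# Regular string: Python converts \b to a backspace character (\x08)
# before the regex engine ever sees the pattern.
plain_pattern = "\bcat\b"

raw_matches = re.findall(raw_pattern, text)
plain_matches = re.findall(plain_pattern, text)

print(len(raw_pattern), raw_matches)      # 7 characters, two whole-word matches
print(len(plain_pattern), plain_matches)  # 5 characters, no matches
```

The regular-string version silently searches for backspace characters, so it finds nothing in ordinary text.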

#### 3. What is the return value of the search() method?
**Ans:** `re.search(pattern, string)` returns a `Match` object for the first location where the pattern matches the string, or `None` if the pattern is not found.

In [5]:
# Example
import re

# Define a pattern
pattern = r'apple'

# Define a string to search within
text = 'I have an apple and a banana.'

# Search for the pattern within the text
match = re.search(pattern, text)

# Check if a match is found
if match:
    print("Match found:", match.group())
else:
    print("No match found.")


Match found: apple


#### 4. From a Match item, how do you get the actual strings that match the pattern?
**Ans:** To get the actual strings that match the pattern from a Match object, use the `group()` method. `group()` or `group(0)` returns the entire match, while `group(1)`, `group(2)`, etc. return the text of specific capturing groups within the pattern. `groups()` returns a tuple containing all capturing groups at once.

In [6]:
import re

# Define a pattern with capturing groups
pattern = r'(\d{2})-(\d{2})-(\d{4})'

# Define a string to search within
text = 'Today is 03-04-2024, tomorrow will be 03-05-2024.'

# Search for the pattern within the text
match = re.search(pattern, text)

# Check if a match is found
if match:
    # Get the entire match
    entire_match = match.group(0)
    print("Entire match:", entire_match)

    # Get the first capturing group (day)
    day = match.group(1)
    print("Day:", day)

    # Get the second capturing group (month)
    month = match.group(2)
    print("Month:", month)

    # Get the third capturing group (year)
    year = match.group(3)
    print("Year:", year)
else:
    print("No match found.")




Entire match: 03-04-2024
Day: 03
Month: 04
Year: 2024


#### 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1?
**Ans:** In the regex **`r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`**, group 0 covers the entire match, group 1 covers the first set of parentheses, **`(\d\d\d)`**, and group 2 covers the second set, **`(\d\d\d-\d\d\d\d)`**.

In [7]:
# Example
import re
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 111-222-3333.')
print(mo.groups()) # Prints all groups in a tuple format
print(mo.group()) # Always returns the fully matched string 
print(mo.group(1)) # Returns the first group
print(mo.group(2)) # Returns the second group

('111', '222-3333')
111-222-3333
111
222-3333


#### 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match real parentheses and periods?
**Ans:** Escape them with a backslash. **`\(`** and **`\)`** in the raw string passed to re.compile() match actual parenthesis characters, and **`\.`** matches an actual period.

In [8]:
# Example Program
import re
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is (111) 222-3333.')
print(mo.group())

(111) 222-3333
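
The same escaping applies to periods. The sketch below (the sample text is made up for illustration) compares an escaped and an unescaped dot, and also uses the standard `re.escape()` helper, which adds the backslashes for you when the text to match is a literal string:

```python
import re

text = "version 3.11 and build 3x11"

# Escaped dot: matches only a literal period between the digits.
escaped = re.findall(r"\d+\.\d+", text)

# Unescaped dot: matches any character (except a newline) between the digits.
unescaped = re.findall(r"\d+.\d+", text)

# re.escape() escapes regex metacharacters in a literal string.
literal = re.escape("(1.5)")
found = re.search(literal, "the value (1.5) was measured")

print(escaped)
print(unescaped)
print(found.group())
```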


#### 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?
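**Ans:** It depends on the capturing groups in the pattern. If the regex has no groups, `findall()` returns a list of strings, one per match. If the regex has exactly one group, it returns a list of the strings matched by that group. If the regex has two or more groups, it returns a list of tuples, where each tuple holds the strings for each group of one match. In every case an empty list is returned when nothing matches.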

In [9]:
# Example 1: pattern with two groups -> list of tuples
import re
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.findall('My phone number is (111) 222-3333.')
print(mo) # One tuple per match, one string per group

# Example 2: pattern without groups -> list of strings
phoneNumRegex = re.compile(r'\d{3}-\d{3}-\d{4}')
mo = phoneNumRegex.findall('My numbers are 111-222-3333 and 222-333-4444')
print(mo) # One string per match

[('(111)', '222-3333')]
['111-222-3333', '222-333-4444']
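
A pattern with exactly one capturing group is worth checking separately, because `findall()` then returns only that group's text rather than the whole match. The following sketch (sample numbers and variable names are illustrative) also shows a non-capturing group, `(?:...)`, which groups without changing the shape of the result:

```python
import re

text = "Call 415-555-1011 or 212-555-0000"

# Exactly one capturing group: findall() returns just that group's text.
area_codes = re.findall(r"(\d{3})-\d{3}-\d{4}", text)

# Two capturing groups: findall() returns one tuple per match.
pairs = re.findall(r"(\d{3})-(\d{3}-\d{4})", text)

# Non-capturing group: the full match is returned, as if there were no groups.
full_numbers = re.findall(r"(?:\d{3})-\d{3}-\d{4}", text)

print(area_codes)
print(pairs)
print(full_numbers)
```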


#### 8. In regular expressions, what does the | character mean?
**Ans:** In regular expressions, `|` is the alternation (`OR`) operator: it matches either the expression on its left or the expression on its right.
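
A minimal sketch of alternation follows; the names and sample strings are chosen only for illustration. When both alternatives occur in the text, `search()` returns whichever appears first in the string, and parentheses limit the alternation to part of the pattern:

```python
import re

hero = re.compile(r"Batman|Tina Fey")

# search() returns the first occurrence of either alternative.
first = hero.search("Batman and Tina Fey").group()
second = hero.search("Tina Fey and Batman").group()

# Parentheses restrict the alternation to the suffix after "Bat".
prefixed = re.search(r"Bat(man|mobile|copter)", "Batmobile lost a wheel")

print(first)
print(second)
print(prefixed.group(), prefixed.group(1))
```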

#### 9. In regular expressions, what does the `?` character stand for?
**Ans:** In regular expressions, the `?` character matches zero or one occurrence of the preceding group, which makes that group optional.

In [10]:
# Example
import re
match_1 = re.search(r"Bat(wo)?man", "Batman returns") # `wo` occurs zero times
print(match_1)
match_2 = re.search(r"Bat(wo)?man", "Batwoman returns") # `wo` occurs once
print(match_2)

<re.Match object; span=(0, 6), match='Batman'>
<re.Match object; span=(0, 8), match='Batwoman'>


#### 10. In regular expressions, what is the difference between the + and * characters?
**Ans:** In regular expressions, `*` matches `zero or more` occurrences of the preceding group, whereas `+` matches `one or more` occurrences of the preceding group.

In [11]:
# Example
import re
match_1 = re.search("Bat(wo)*man","Batman returns")
print(match_1)
match_2 = re.search("Bat(wo)+man","Batman returns")
print(match_2)

<re.Match object; span=(0, 6), match='Batman'>
None


#### 11. What is the difference between {4} and {4,5} in regular expression?
**Ans:** `{4}` means that the preceding group must repeat exactly 4 times, whereas `{4,5}` means that the preceding group must repeat at least 4 times and at most 5 times (inclusive).

In [12]:
# Example
import re
haRegex = re.compile(r'(Ha){3}')
mo1 = haRegex.search('HaHaHa') # Pattern HaHaHa is matched with HaHaHa string
mo2 = haRegex.search('Ha') # Pattern HaHaHa is not matched with Ha string
print(mo1.group())
print(mo2)

HaHaHa
None
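
To compare `{4}` and `{4,5}` directly, here is a small sketch that follows the same `Ha` idea (the variable names are illustrative). Because repetition is greedy by default, `{4,5}` takes five repetitions when they are available:

```python
import re

five = "Ha" * 5  # 'HaHaHaHaHa'

# {4} consumes exactly four repetitions, even though a fifth is available.
exact = re.search(r"(Ha){4}", five).group()

# {4,5} is greedy and consumes all five repetitions.
ranged = re.search(r"(Ha){4,5}", five).group()

# Three repetitions are too few for either quantifier.
too_short = re.search(r"(Ha){4,5}", "HaHaHa")

# fullmatch() requires the whole string to match, so {4} fails on five.
full_exact = re.fullmatch(r"(Ha){4}", five)

print(exact)
print(ranged)
print(too_short, full_exact)
```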


#### 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?
**Ans:** \d, \w and \s are shorthand character classes in Python regular expressions. For `str` patterns in Python 3 they are Unicode-aware by default; the ASCII equivalents below apply exactly when the `re.ASCII` flag is set.
1. **`\w`** – Matches a word character, equivalent to [a-zA-Z0-9_] in ASCII mode. Word characters are letters, digits, and the underscore.
2. **`\d`** – Matches a digit character, equivalent to [0-9] in ASCII mode
3. **`\s`** – Matches a whitespace character (space, tab, newline, etc.)
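
The three classes are often combined in one pattern. A short sketch, with sample text chosen for illustration:

```python
import re

text = "12 drummers, 11 pipers, five rings"

# One or more digits, a single whitespace character, then one or more word characters.
counted = re.findall(r"\d+\s\w+", text)

print(counted)
```

`five rings` is skipped because it does not start with a digit.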

#### 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?
**Ans:** \D, \W and \S are the negated counterparts of \d, \w and \s in Python regular expressions:
1. **`\W`** – Matches any non-word character, equivalent to [^a-zA-Z0-9_] in ASCII mode
2. **`\D`** – Matches any non-digit character, equivalent to [^0-9] in ASCII mode
3. **`\S`** – Matches any non-whitespace character
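
The negated classes are handy for stripping or splitting on unwanted characters. A minimal sketch with made-up sample strings:

```python
import re

# \D: remove everything that is not a digit.
digits_only = re.sub(r"\D", "", "(415) 555-1234")

# \W+: split on runs of non-word characters.
words = re.split(r"\W+", "hi, there! ok")

# \S+: collect runs of non-whitespace characters.
tokens = re.findall(r"\S+", " a  b\tc ")

print(digits_only)
print(words)
print(tokens)
```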

#### 14. What is the difference between `.*?` and `.*`?
**Ans:** **`.*`** is greedy: it matches as much text as possible while still allowing the rest of the pattern to match. **`.*?`** is non-greedy (lazy): it matches as little text as possible while still allowing the rest of the pattern to match.

In [13]:
# Example
import re

text = "Hello, world! Let's try this for our example"

pattern_1 = r'H.*?o'
match_1 = re.search(pattern_1, text)

if match_1:
    print("Match using .*?:", match_1.group())
else:
    print("No match found.")
    
pattern_2 = r'H.*o'
match_2 = re.search(pattern_2, text)

if match_2:
    print("Match using .*:", match_2.group())
else:
    print("No match found.")

Match using .*?: Hello
Match using .*: Hello, world! Let's try this for o


#### 15. What is the syntax for matching both numbers and lowercase letters with a character class?
**Ans:** The syntax is either **`[a-z0-9]`** or **`[0-9a-z]`**; the order of the ranges inside the brackets does not matter.

In [14]:
# Example
import re

text = "abc123xyz"
pattern = r'[0-9a-z]+'

matches = re.findall(pattern, text)
print(matches)  # Output: ['abc123xyz']


['abc123xyz']


#### 16. What is the procedure for making a regular expression case insensitive?
**Ans:** Pass **`re.IGNORECASE`** (or its short alias **`re.I`**) as a flag, for example as the second argument to re.compile(), to make a regular expression case insensitive.
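
A short sketch of the flag in use; the sample strings are illustrative. The inline form `(?i)` at the start of a pattern has the same effect as passing the flag:

```python
import re

robocop = re.compile(r"robocop", re.IGNORECASE)

samples = ("RoboCop is part man", "ROBOCOP protects", "robocop")
matches = [robocop.search(s).group() for s in samples]

# Inline flag: equivalent to passing re.IGNORECASE.
inline = re.search(r"(?i)robocop", "Meet RoboCop").group()

# Without the flag, the match is case sensitive and fails here.
case_sensitive = re.search(r"robocop", "Meet RoboCop")

print(matches)
print(inline)
print(case_sensitive)
```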

#### 17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?
**Ans:** 
In Python's re module, the `.` (dot) character normally matches any character except a newline (`\n`). If the re.DOTALL flag is passed as the second argument to re.compile(), the `.` character matches any character, including newlines.

In [15]:
# Example
import re

text = "Line 1\nLine 2"

# Normal behavior of .
pattern_normal = r'.+'
regex_normal = re.compile(pattern_normal)
match_normal = regex_normal.match(text)

print("Normal behavior of .:", match_normal.group())  # Output: Line 1

# Using re.DOTALL
pattern_dotall = r'.+'
regex_dotall = re.compile(pattern_dotall, re.DOTALL)
match_dotall = regex_dotall.match(text)

print("Using re.DOTALL:", match_dotall.group())  # Output: Line 1\nLine 2


Normal behavior of .: Line 1
Using re.DOTALL: Line 1
Line 2


#### 18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?
**Ans:** The output will be **`'X drummers, X pipers, five rings, X hen'`**

In [16]:
import re
numReg = re.compile(r'\d+')
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

'X drummers, X pipers, five rings, X hen'

#### 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?
**Ans:** Passing re.VERBOSE as the second argument to re.compile() allows for the use of verbose mode in regular expressions. In verbose mode, whitespace within the regular expression pattern is ignored (except when in a character class or escaped by a backslash), and comments are allowed.

In [17]:
import re

# Regular expression without VERBOSE mode
pattern_normal = r'\d{3}-\d{2}-\d{4}'

# Regular expression with VERBOSE mode
pattern_verbose = r'''
    \d{3}   # Match three digits
    -       # Match a hyphen
    \d{2}   # Match two digits
    -       # Match a hyphen
    \d{4}   # Match four digits
'''

text = "123-45-6789"

# Compile regular expressions
regex_normal = re.compile(pattern_normal)
regex_verbose = re.compile(pattern_verbose, re.VERBOSE)

# Match using both regular expressions
match_normal = regex_normal.match(text)
match_verbose = regex_verbose.match(text)

# Output the matched groups
print("Normal pattern match:", match_normal.group())  # Output: 123-45-6789
print("Verbose pattern match:", match_verbose.group())  # Output: 123-45-6789


Normal pattern match: 123-45-6789
Verbose pattern match: 123-45-6789


#### 20. How would you write a regex that matches a number with a comma for every three digits? It must match the following:
`'42'`, `'1,234'`, `'6,368,745'` but not the following: `'12,34,567'` (which has only two digits between the commas), `'1234'` (which lacks commas)

**Ans:** **`pattern = r'^\d{1,3}(,\d{3})*$'`**

In [18]:
import re
pattern = r'^\d{1,3}(,\d{3})*$' # starts with 1 to 3 digits, followed by zero or more groups of a comma and exactly 3 digits
pagex = re.compile(pattern)
for ele in ['42','1,234', '6,368,745','12,34,567','1234']:
    print('Output:',ele, '->', pagex.search(ele))

Output: 42 -> <re.Match object; span=(0, 2), match='42'>
Output: 1,234 -> <re.Match object; span=(0, 5), match='1,234'>
Output: 6,368,745 -> <re.Match object; span=(0, 9), match='6,368,745'>
Output: 12,34,567 -> None
Output: 1234 -> None


#### 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
`'Haruto Watanabe'`  
`'Alice Watanabe'`  
`'RoboCop Watanabe'`  

but not the following:

`'haruto Watanabe'` (where the first name is not capitalized)  
`'Mr. Watanabe'` (where the preceding word has a nonletter character)  
`'Watanabe'` (which has no first name)  
`'Haruto watanabe'` (where Watanabe is not capitalized)  

**Ans:** **`pattern = r'^[A-Z][A-Za-z]*\sWatanabe$'`**

The first name is one capital letter followed by any number of letters of either case, so that names such as `RoboCop` are matched in full. The `^` and `$` anchors keep `search()` from matching only part of the string.

In [19]:
import re
pattern = r'^[A-Z][A-Za-z]*\sWatanabe$'
namex = re.compile(pattern)
for name in ['Haruto Watanabe','Alice Watanabe','RoboCop Watanabe','haruto Watanabe','Mr. Watanabe','Watanabe','Haruto watanabe']:
    print('Output: ',name,'->',namex.search(name))

Output:  Haruto Watanabe -> <re.Match object; span=(0, 15), match='Haruto Watanabe'>
Output:  Alice Watanabe -> <re.Match object; span=(0, 14), match='Alice Watanabe'>
Output:  RoboCop Watanabe -> <re.Match object; span=(0, 16), match='RoboCop Watanabe'>
Output:  haruto Watanabe -> None
Output:  Mr. Watanabe -> None
Output:  Watanabe -> None
Output:  Haruto watanabe -> None


#### 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
`'Alice eats apples.'`  
`'Bob pets cats.'`  
`'Carol throws baseballs.'`  
`'Alice throws Apples.'`  
`'BOB EATS CATS.'`  

but not the following:  

`'RoboCop eats apples.'`  
`'ALICE THROWS FOOTBALLS.'`   
`'Carol eats 7 cats.'`  

**Ans:** pattern = **`r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.'`**

In [20]:
import re
pattern = r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.'
casex = re.compile(pattern,re.IGNORECASE)
for ele in ['Alice eats apples.','Bob pets cats.','Carol throws baseballs.','Alice throws Apples.','BOB EATS CATS.','RoboCop eats apples.'
,'ALICE THROWS FOOTBALLS.','Carol eats 7 cats.']:
    print('Output: ',ele,'->',casex.search(ele))

Output:  Alice eats apples. -> <re.Match object; span=(0, 18), match='Alice eats apples.'>
Output:  Bob pets cats. -> <re.Match object; span=(0, 14), match='Bob pets cats.'>
Output:  Carol throws baseballs. -> <re.Match object; span=(0, 23), match='Carol throws baseballs.'>
Output:  Alice throws Apples. -> <re.Match object; span=(0, 20), match='Alice throws Apples.'>
Output:  BOB EATS CATS. -> <re.Match object; span=(0, 14), match='BOB EATS CATS.'>
Output:  RoboCop eats apples. -> None
Output:  ALICE THROWS FOOTBALLS. -> None
Output:  Carol eats 7 cats. -> None
