# Python Basics | Assignment 7

# Q1.
### What is the name of the feature responsible for generating Regex objects?

The **re.compile()** function from Python's built-in **re** module generates regular expression (Regex) objects. It takes a pattern string and returns a compiled pattern object whose methods, such as **search()**, **findall()**, and **sub()**, can then be called repeatedly.
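A minimal sketch (the phone-number pattern and sample text are only illustrative) showing that `re.compile()` returns a reusable `re.Pattern` object:

```python
import re

# re.compile() turns the pattern string into a reusable Pattern object
phone_regex = re.compile(r"\d{3}-\d{3}-\d{4}")
print(isinstance(phone_regex, re.Pattern))  # True

# The compiled object exposes the regex methods directly
match = phone_regex.search("Call 415-555-1011 today")
print(match.group())  # 415-555-1011
```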

# Q2.
### Why do raw strings often appear in Regex objects?

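Raw strings are commonly used in regular expressions (Regex) because regex syntax relies heavily on backslashes (`\d`, `\w`, `\b`, `\.`, and so on). A raw string literal is denoted by the **r** prefix before the opening quote.
In a raw string, Python treats backslashes (`\`) as literal characters and does not interpret them as escape characters, so the pattern reaches the regex engine exactly as written instead of needing every backslash doubled (`"\\d"`).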
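A short illustration: in an ordinary string, `"\b"` is the backspace character, so a word-boundary pattern silently stops matching unless the string is raw.

```python
import re

# A raw string keeps the backslash: r"\d" is the same two characters as "\\d"
print(r"\d" == "\\d")  # True

text = "a cat here"
# Without r, "\b" becomes a backspace character, so nothing matches
print(re.search("\bcat\b", text))  # None
# With r, \b reaches the regex engine as a word boundary
print(re.search(r"\bcat\b", text).group())  # cat
```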

# Q3.
### What is the return value of the search() method?

The **search()** method from the **re** module in Python returns a **Match** object if the pattern is found anywhere in the string, or **None** if no match is found. The Match object holds information about the first match, such as the matched substring and its starting and ending positions, which can be read with methods like **group()**, **start()**, **end()**, and **span()**.
###### Below is the **Example**

```python
import re

text = "Hello, Python is awesome!"
pattern = r"Python"

match = re.search(pattern, text)

if match:
    print("Match found:", match.group())
    print("Start position:", match.start())
    print("End position:", match.end())
    print("Start and end positions:", match.span())
else:
    print("No match found.")
```

Output:

```
Match found: Python
Start position: 7
End position: 13
Start and end positions: (7, 13)
```
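Because `search()` returns `None` when nothing matches, it is worth checking the result before calling Match methods on it. A small sketch comparing both outcomes:

```python
import re

text = "Hello, Python is awesome!"
found = re.search(r"Python", text)
missing = re.search(r"Java", text)

print(found)    # <re.Match object; span=(7, 13), match='Python'>
print(missing)  # None
# Calling missing.group() here would raise AttributeError
```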


# Q4.
### From a Match item, how do you get the actual strings that match the pattern?

To retrieve the actual strings that match the pattern from a match object, we use the **group()** method. The **group()** method without any arguments returns the entire matched substring.
###### Below is the **Example**

```python
import re

text = "Hello, Python is awesome!"
pattern = r"Python"

match = re.search(pattern, text)

if match:
    print("Match found:", match.group())
else:
    print("No match found.")
```

Output:

```
Match found: Python
```


# Q5.
### In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1?

In the regular expression r'(\d\d\d)-(\d\d\d-\d\d\d\d)', the groups are defined by the parentheses. 
- **Group Zero** covers the entire matched substring, including all the characters that match the pattern.
- **Group One** refers to the portion of the match enclosed in the first set of parentheses (\d\d\d). It captures and represents a three-digit sequence.
- **Group Two** refers to the part of the match enclosed in the second set of parentheses (\d\d\d-\d\d\d\d). It captures an eight-character sequence: three digits, a hyphen, and four digits.
###### Below is the **Example**

```python
import re

text = "Phone numbers: 123-456-7890, 987-654-3210"
pattern = r'(\d\d\d)-(\d\d\d-\d\d\d\d)'

match = re.search(pattern, text)

if match:
    print("Group 0 (entire match):", match.group(0))
    print("Group 1:", match.group(1))
    print("Group 2:", match.group(2))
else:
    print("No match found.")
```

Output:

```
Group 0 (entire match): 123-456-7890
Group 1: 123
Group 2: 456-7890
```


# Q6.
### In regular expression syntax, parentheses and periods have distinct meanings. How can you tell a regex that you want it to match real parentheses and periods?

We can use the backslash `\` as an escape character. Preceding a metacharacter such as `(`, `)`, or `.` with a backslash tells the regex engine to treat it as a literal character instead of giving it its special meaning.
###### Below is the **Example**

```python
import re

text = "I have (some) parentheses and a period."
pattern = r'\(some\) parentheses and a period\.'

match = re.search(pattern, text)

if match:
    print("Match found:", match.group())
else:
    print("No match found.")
```

Output:

```
Match found: (some) parentheses and a period.
```
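When the literal text comes from a variable, the standard-library function `re.escape()` can add these backslashes automatically. A minimal sketch (the sample strings are only illustrative):

```python
import re

# Unescaped, "(" and ")" form a group and "." matches any character
print(re.search(r"(1.5)", "135").group())  # 135

# re.escape() backslash-escapes the special characters
escaped = re.escape("(1.5)")
print(escaped)  # \(1\.5\)
print(re.search(escaped, "135"))  # None
print(re.search(escaped, "x = (1.5)").group())  # (1.5)
```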


# Q7.
### The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

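The **findall()** method from the re module in Python returns a list of strings when the pattern contains no capturing groups **(parentheses)**; each string is a full match.
If the pattern contains exactly one capturing group, it still returns a list of strings, but each string is the text captured by that group. If the pattern contains two or more capturing groups, **findall()** returns a list of tuples, with one string per group in each tuple.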
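A small sketch showing all three cases on the same text (the phone numbers are illustrative):

```python
import re

text = "Call 415-555-1011 or 415-555-9999."

# No groups: a list of whole matches
no_groups = re.findall(r"\d\d\d-\d\d\d-\d\d\d\d", text)
print(no_groups)   # ['415-555-1011', '415-555-9999']

# One group: a list of the text captured by that group
one_group = re.findall(r"(\d\d\d)-\d\d\d-\d\d\d\d", text)
print(one_group)   # ['415', '415']

# Two groups: a list of tuples, one string per group
two_groups = re.findall(r"(\d\d\d)-(\d\d\d-\d\d\d\d)", text)
print(two_groups)  # [('415', '555-1011'), ('415', '555-9999')]
```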

# Q8.
### In standard expressions, what does the | character mean?

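In regular expressions, the vertical bar **|** is the alternation operator: a pattern such as `cats|dogs` matches either `cats` or `dogs` (covered in more detail in Q9). In ordinary Python expressions, outside a regex pattern, **|** is the bitwise OR operator: it performs a bitwise OR on the binary representations of two integers, as in the example below.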

```python
a = 5    # binary: 0101
b = 3    # binary: 0011
result = a | b    # binary: 0111 (decimal: 7)
print(result)    # Output: 7
```

Output:

```
7
```


# Q9.
### In regular expressions, what does the character stand for?

In regular expressions, the **|** character, known as the vertical bar, represents the logical OR operator. It allows you to specify multiple alternatives within a regular expression pattern. The **|** character acts as a separator between the alternatives, indicating that the pattern should match any of the alternatives.
###### Below is the **Example**

```python
import re

text = "I love cats and dogs."

pattern = r"cats|dogs"  # Matches either "cats" or "dogs"

matches = re.findall(pattern, text)

print("Matches:", matches)
```

Output:

```
Matches: ['cats', 'dogs']
```
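Two details that often matter in practice, shown in a small sketch with illustrative strings: parentheses limit the alternation to part of a pattern, and when several alternatives could match at the same position, Python tries them left to right and keeps the first one that succeeds.

```python
import re

# Parentheses restrict | to the suffix after "Bat"
bat = re.search(r"Bat(man|mobile|copter)", "Batmobile lost a wheel")
print(bat.group())   # Batmobile
print(bat.group(1))  # mobile

# Alternatives are tried left to right, so "cat" wins over "category"
print(re.search(r"cat|category", "category").group())  # cat
```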


# Q10.
### In regular expressions, what is the difference between the + and * characters?

In regular expressions, both the + and * characters are used to quantify the preceding element in the pattern, but they have different meanings:

* `*` (asterisk): means "zero or more occurrences" of the preceding element. The element may appear any number of times, including zero. For example, the pattern a* matches "", "a", "aa", "aaa", and so on.
* `+` (plus): means "one or more occurrences" of the preceding element. The element must appear at least once. For example, the pattern a+ matches "a", "aa", "aaa", but not an empty string.
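A quick check using `fullmatch()`, which requires the whole string to fit the pattern (the sample strings are illustrative):

```python
import re

for s in ["", "a", "aaa", "b"]:
    # a* accepts zero occurrences; a+ needs at least one
    star = re.fullmatch(r"a*", s) is not None
    plus = re.fullmatch(r"a+", s) is not None
    print(repr(s), "a*:", star, "a+:", plus)
```

This prints `True`/`False` for `''`, `True`/`True` for `'a'` and `'aaa'`, and `False`/`False` for `'b'`.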

# Q11.
### What is the difference between {4} and {4,5} in regular expression?

{4}:

This specifies that the preceding element should occur exactly 4 times.
For example, the pattern [0-9]{4} would match any sequence of exactly four digits.

{4,5}:

This specifies that the preceding element should occur between 4 and 5 times, inclusive.
For example, the pattern [0-9]{4,5} would match sequences of either four or five digits.
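A quick check with `fullmatch()`, which requires the entire string to fit the pattern. Note also that `{4,5}` is greedy, so `search()` takes five digits when five are available:

```python
import re

for s in ["123", "1234", "12345", "123456"]:
    exact = re.fullmatch(r"[0-9]{4}", s) is not None
    ranged = re.fullmatch(r"[0-9]{4,5}", s) is not None
    print(s, "{4}:", exact, "{4,5}:", ranged)

# Greedy: search() prefers the longer of the allowed lengths
print(re.search(r"[0-9]{4,5}", "123456").group())  # 12345
```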

# Q12.
### What do the \d, \w, and \s shorthand character classes signify in regular expressions?

1. \d:
This shorthand character class represents any digit character. For ASCII text it is equivalent to the character class [0-9]; with Python 3 `str` patterns it also matches other Unicode decimal digits unless the `re.ASCII` flag is used. So, \d matches any single digit.
For example, the pattern \d{3} would match any sequence of three consecutive digits.
2. \w:
This shorthand character class represents any word character: an alphanumeric character (a-z, A-Z, 0-9) or an underscore (_). For ASCII text it is equivalent to the character class [a-zA-Z0-9_]; with Unicode `str` patterns it also matches letters and digits from other alphabets.
For example, the pattern \w+ would match one or more consecutive word characters.
3. \s:
This shorthand character class represents any whitespace character, including spaces, tabs, and newline characters. For ASCII text it is equivalent to the character class [ \t\n\r\f\v].
For example, the pattern \s+ would match one or more consecutive whitespace characters.
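A small sketch on an illustrative string, plus one check of the Unicode behaviour mentioned for \d (`"\u0663"` is ARABIC-INDIC DIGIT THREE):

```python
import re

text = "Order 66 shipped_on\t2024-05-01"

print(re.findall(r"\d+", text))  # ['66', '2024', '05', '01']
print(re.findall(r"\w+", text))  # ['Order', '66', 'shipped_on', '2024', '05', '01']
print(re.findall(r"\s+", text))  # [' ', ' ', '\t']

# \d matches a non-ASCII decimal digit unless re.ASCII is set
print(re.fullmatch(r"\d", "\u0663") is not None)            # True
print(re.fullmatch(r"\d", "\u0663", re.ASCII) is not None)  # False
```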

# Q13.
### What do the \D, \W, and \S shorthand character classes signify in regular expressions?

1. \D:
This shorthand character class represents any character that is not a digit. It is the opposite of \d. It matches any character except the digits 0 to 9.
For example, the pattern \D{3} would match any sequence of three consecutive non-digit characters.
2. \W:
This shorthand character class represents any character that is not a word character. It is the opposite of \w. It matches any character that is not an alphanumeric character (a-z, A-Z, 0-9) or an underscore (_).
For example, the pattern \W+ would match one or more consecutive non-word characters.
3. \S:
This shorthand character class represents any character that is not a whitespace character. It is the opposite of \s. It matches any character that is not a space, tab, newline, carriage return, form feed, or vertical tab.
For example, the pattern \S+ would match one or more consecutive non-whitespace characters.
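Applied to the same illustrative string as Q12, the uppercase classes pick out exactly the pieces the lowercase ones skip:

```python
import re

text = "Order 66 shipped_on\t2024-05-01"

print(re.findall(r"\D+", text))  # ['Order ', ' shipped_on\t', '-', '-']
print(re.findall(r"\W+", text))  # [' ', ' ', '\t', '-', '-']
print(re.findall(r"\S+", text))  # ['Order', '66', 'shipped_on', '2024-05-01']
```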

# Q14.
### What is the difference between .*? and .?

* .*?:
.*? is a non-greedy or lazy quantifier. It matches any sequence of characters (including none) but in a non-greedy way. This means it will try to match as few characters as possible while still allowing the rest of the pattern to match.
For example, in the pattern a.*?b, if you apply it to the string "aabcab", it would match "aab" instead of the entire string "aabcab".
* .?:
.? applies the ? quantifier to . (any character except a newline), so it matches zero or one character. In other words, it makes a single character optional.
For example, the pattern a.?b matches "ab" (no character between "a" and "b") as well as "acb" (exactly one character between them), but not "accb".
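A short sketch comparing the lazy `.*?`, the greedy `.*`, and the optional `.?` on illustrative strings:

```python
import re

text = "aabcab"
print(re.search(r"a.*?b", text).group())  # aab (lazy: as few characters as possible)
print(re.search(r"a.*b", text).group())   # aabcab (greedy: as many as possible)

# .? allows at most one character between "a" and "b"
print(re.findall(r"a.?b", "ab acb accb"))  # ['ab', 'acb']
```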


# Q15.
### What is the syntax for matching both numbers and lowercase letters with a character class?

To match both numbers and lowercase letters using a character class in a regular expression, you can combine the character ranges for numbers (0-9) and lowercase letters (a-z). The syntax for this is:
`[0-9a-z]`
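As a sketch on an illustrative string, `[0-9a-z]+` picks out runs of digits and lowercase letters and skips uppercase letters, spaces, and punctuation:

```python
import re

print(re.findall(r"[0-9a-z]+", "Room 42b, Floor B7"))  # ['oom', '42b', 'loor', '7']
```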

# Q16.
### What is the procedure for making a regular expression case-insensitive?

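To make a regular expression case-insensitive in Python, pass the **re.IGNORECASE** flag (or its short alias **re.I**) to **re.compile()** or to module-level functions such as **re.search()** and **re.findall()**. Other languages and regex libraries provide equivalent flags or modifiers.

###### Below is the **Example**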

```python
import re

pattern = r"example"
text = "Example text"
matches = re.findall(pattern, text, re.IGNORECASE)
print(matches)
```

Output:

```
['Example']
```
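Two other common ways to apply the same setting, sketched on illustrative text: give the flag once to `re.compile()`, or put an inline `(?i)` at the start of the pattern.

```python
import re

# The flag can be given once when compiling the pattern
word = re.compile(r"example", re.I)
print(word.findall("Example, EXAMPLE, example"))  # ['Example', 'EXAMPLE', 'example']

# An inline (?i) at the start of the pattern has the same effect
print(re.findall(r"(?i)example", "Example, EXAMPLE"))  # ['Example', 'EXAMPLE']
```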


# Q17.
### What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?

In a regular expression, the . character normally matches any character except a newline (\n). It's often used as a wildcard to represent any character within a pattern.

If we pass the re.DOTALL flag (or re.S as an alias) as the second argument when compiling a regular expression using the re.compile() function, it changes the behavior of the . character to match any character, including newline characters.

###### Below is the **Example**

```python
import re

text = "Hello\nWorld"

# Without DOTALL
pattern1 = re.compile(r".+")
result1 = pattern1.findall(text)
print(result1)

# With DOTALL
pattern2 = re.compile(r".+", re.DOTALL)
result2 = pattern2.findall(text)
print(result2)
```

Output:

```
['Hello', 'World']
['Hello\nWorld']
```


# Q18.
### If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?
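Running the call answers the question. `sub()` replaces every match of `\d+` (each run of one or more digits) with `'X'`, while the spelled-out word "five" contains no digits and is left alone:

```python
import re

numRegex = re.compile(r'\d+')
result = numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')
print(result)  # X drummers, X pipers, five rings, X hen
```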

# Q19.
### What does passing re.VERBOSE as the 2nd argument to re.compile() allow to do?
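Passing **re.VERBOSE** tells the regex engine to ignore whitespace and `#` comments inside the pattern string (except inside a character class or when escaped), so a long pattern can be split over several lines and annotated. A sketch, using the same phone-number format as Q5 as an illustrative pattern:

```python
import re

phone = re.compile(r"""
    (\d{3})         # area code
    -               # separator
    (\d{3}-\d{4})   # local number
""", re.VERBOSE)

print(phone.search("Call 415-555-1011 today").groups())  # ('415', '555-1011')
```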

# Q20.
### How would you write a regex that matches a number with a comma separating every group of three digits? It must match the following:
### '42'
### '1,234'
### '6,368,745'
### but not the following:
### '12,34,567' (which has only two digits between the commas)
### '1234' (which lacks commas)
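One possible answer is `^\d{1,3}(,\d{3})*$`: one to three leading digits, followed by any number of groups made of a comma and exactly three digits. The sketch below uses `fullmatch()`, which anchors the pattern to the whole string, so the `^` and `$` anchors are dropped:

```python
import re

number_regex = re.compile(r"\d{1,3}(,\d{3})*")

should_match = ["42", "1,234", "6,368,745"]
should_not_match = ["12,34,567", "1234"]

for s in should_match + should_not_match:
    print(repr(s), number_regex.fullmatch(s) is not None)
```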

# Q21.
### How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
### 'Haruto Watanabe'
### 'Alice Watanabe'
### 'RoboCop Watanabe'
### but not the following:
### 'haruto Watanabe' (where the first name is not capitalized)
### 'Mr. Watanabe' (where the preceding word has a nonletter character)
### 'Watanabe' (which has no first name)
### 'Haruto watanabe' (where Watanabe is not capitalized)
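One possible answer is `[A-Z][a-zA-Z]* Watanabe`, checked here with `fullmatch()` so that the whole string must be a capitalized one-word first name, a single space, and "Watanabe". `[a-zA-Z]*` is used instead of `[a-z]*` so that "RoboCop" still counts as one word:

```python
import re

name_regex = re.compile(r"[A-Z][a-zA-Z]* Watanabe")

should_match = ["Haruto Watanabe", "Alice Watanabe", "RoboCop Watanabe"]
should_not_match = ["haruto Watanabe", "Mr. Watanabe", "Watanabe", "Haruto watanabe"]

for s in should_match + should_not_match:
    print(repr(s), name_regex.fullmatch(s) is not None)
```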

# Q22.
### How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
### 'Alice eats apples.'
### 'Bob pets cats.'
### 'Carol throws baseballs.'
### 'Alice throws Apples.'
### 'BOB EATS CATS.'
### but not the following:
### 'RoboCop eats apples.'
### 'ALICE THROWS FOOTBALLS.'
### 'Carol eats 7 cats.'
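One possible answer combines three alternation groups, the **re.IGNORECASE** flag, and an escaped final period. `fullmatch()` is used again so that the whole sentence must fit the pattern:

```python
import re

sentence_regex = re.compile(
    r"(Alice|Bob|Carol) (eats|pets|throws) (apples|cats|baseballs)\.",
    re.IGNORECASE,
)

should_match = [
    "Alice eats apples.",
    "Bob pets cats.",
    "Carol throws baseballs.",
    "Alice throws Apples.",
    "BOB EATS CATS.",
]
should_not_match = [
    "RoboCop eats apples.",
    "ALICE THROWS FOOTBALLS.",
    "Carol eats 7 cats.",
]

for s in should_match + should_not_match:
    print(repr(s), sentence_regex.fullmatch(s) is not None)
```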