# <center>PythonBasics: Assignment_07</center>

### Question 01:
What is the name of the feature responsible for generating Regex objects?

**<span style='color:blue'>Answer</span>**

Regex objects in Python are created by the `re.compile()` function of the `re` module. It takes a pattern string and returns a compiled pattern object (a `re.Pattern`), whose methods such as `search()`, `findall()`, and `sub()` perform the matching.

### Question 02:
Why do raw strings often appear in Regex objects?

**<span style='color:blue'>Answer</span>**


Raw strings are used so that backslashes reach the regex engine unchanged. In an ordinary Python string literal the backslash (\) is itself an escape character, so `'\n'` is a newline and a regex such as `\d+\.\d+` would have to be written as `'\\d+\\.\\d+'`.

Prefixing the literal with `r` makes Python treat every backslash as a literal backslash, so the same pattern can be written as `r'\d+\.\d+'`. Because regular expressions rely heavily on backslashes for escapes and special sequences, raw strings keep patterns shorter and easier to read.
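
A minimal sketch of the difference between ordinary and raw string literals. The sample text and variable names are made up for illustration:

```python
import re

# In an ordinary literal, "\n" is one character (a newline).
# In a raw literal, r"\n" is two characters: a backslash and the letter n.
normal_len = len("\n")
raw_len = len(r"\n")

# Without a raw string, every regex backslash has to be doubled.
escaped_pattern = "\\d+\\.\\d+"
raw_pattern = r"\d+\.\d+"
same_pattern = escaped_pattern == raw_pattern  # both spell the same regex

price = re.search(raw_pattern, "Total: 19.99 USD").group()
print(normal_len, raw_len, same_pattern, price)  # 1 2 True 19.99
```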

### Question 03:
What is the return value of the search() method?

**<span style='color:blue'>Answer</span>**


The search() method of a regular expression object returns a Match object for the first location where the pattern matches, or None if the pattern does not match anywhere in the string.

A Match object holds information about the match, such as the matched text, its start and end positions, and any captured groups, and provides methods such as group(), groups(), start(), end(), and span() to read them.

Because the result may be None, check it (for example with `if match:`) before calling any of these methods.
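
A short sketch of both outcomes of search(). The pattern and sample strings are illustrative assumptions:

```python
import re

digits = re.compile(r"\d+")

found = digits.search("Order 66 shipped")    # a Match object
missing = digits.search("no numbers here")   # None

if found is not None:
    # Matched text plus its start and end offsets in the string
    print(found.group(), found.start(), found.end())  # 66 6 8
print(missing)  # None
```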

### Question 04:
From a Match item, how do you get the actual strings that match the pattern?

**<span style='color:blue'>Answer</span>**


To get the actual strings that match a pattern from a Match object in Python, you can use the group() method. The group() method returns the substring that was matched by the pattern.

By default, group(0) returns the entire matched substring. You can also use group(1), group(2), and so on to access captured groups if your pattern contains capturing groups defined with parentheses.

In [17]:
import re

pattern = r'(\d+)-(\d+)-(\d+)'  # Pattern to match a date in the format "YYYY-MM-DD"
text = 'Date: 2023-06-08'

match = re.search(pattern, text)
if match:
    full_match = match.group(0)
    year = match.group(1)
    month = match.group(2)
    day = match.group(3)

    print("Full match:", full_match)
    print("Year:", year)
    print("Month:", month)
    print("Day:", day)
else:
    print("No match found.")


Full match: 2023-06-08
Year: 2023
Month: 06
Day: 08


### Question 05:
In the regex created from `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`, what does group 0 cover? Group 1? Group 2?

**<span style='color:blue'>Answer</span>**

Group 0: The entire matched substring.

Group 1: The first group, (\d\d\d), which matches three digits.

Group 2: The second group, (\d\d\d-\d\d\d\d), which matches a sequence of three digits followed by a hyphen and then four more digits.
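
The three groups can be checked with a quick sketch. The phone number below is a made-up sample:

```python
import re

phone_regex = re.compile(r"(\d\d\d)-(\d\d\d-\d\d\d\d)")
match = phone_regex.search("My number is 415-555-4242.")

whole = match.group(0)        # entire match
area_code = match.group(1)    # first parenthesized group
main_number = match.group(2)  # second parenthesized group
both = match.groups()         # all captured groups as a tuple

print(whole)        # 415-555-4242
print(area_code)    # 415
print(main_number)  # 555-4242
print(both)         # ('415', '555-4242')
```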

### Question 06:
In regular expression syntax, parentheses and periods have special meanings. How can you tell
a regex that you want it to match literal parentheses and periods?

**<span style='color:blue'>Answer</span>**

In regular expression syntax, parentheses and periods have special meanings, so if you want to match them as literal characters, you can use backslashes to escape them. By adding a backslash before parentheses or periods, you tell the regex engine to interpret them as literal characters rather than special regex symbols.

In [18]:
import re

text = "(hello)."
pattern = r"\(hello\)\."

match = re.search(pattern, text)
if match:
    print("Match found.")
else:
    print("No match found.")


Match found.


### Question 07:
The findall() method returns a string list or a list of string tuples. What causes it to return one of
the two options?

**<span style='color:blue'>Answer</span>**


The findall() method in regular expressions returns a list of all non-overlapping matches of the pattern in the input string. The return value depends on the presence of capturing groups (parentheses) in the pattern.

If the pattern contains two or more capturing groups, findall() returns a list of tuples of strings. Each tuple corresponds to one match and holds only the captured groups, in order; the entire match is not included. If the pattern contains exactly one capturing group, findall() returns a list of strings, one per match, each holding just that group's text.

If the pattern does not contain any capturing groups, findall() returns a list of strings. Each string in the list represents a complete match.
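
A minimal sketch showing what findall() returns with zero, one, and two capturing groups. The sample text is an assumption; note that the tuples hold only the captured groups:

```python
import re

text = "Cell: 415-555-9999 Work: 212-555-0000"

# No groups: a list of full matches
no_groups = re.findall(r"\d{3}-\d{3}-\d{4}", text)
# One group: a list of that group's text
one_group = re.findall(r"(\d{3})-\d{3}-\d{4}", text)
# Two groups: a list of tuples, one element per group
two_groups = re.findall(r"(\d{3})-(\d{3}-\d{4})", text)

print(no_groups)   # ['415-555-9999', '212-555-0000']
print(one_group)   # ['415', '212']
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
```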

### Question 08:
In regular expressions, what does the | character mean?

**<span style='color:blue'>Answer</span>**


In regular expressions, the `|` character is the alternation (logical OR) operator. It separates alternative patterns, and the expression matches if any one of them matches.

The regex engine tries the alternatives from left to right at each position and uses the first one that lets the overall match succeed; later alternatives are tried only if an earlier one fails.
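
A small sketch of alternation, using made-up sample strings. It also shows that the order of alternatives matters and that parentheses limit how far the `|` reaches:

```python
import re

hero = re.compile(r"Batman|Tina Fey")
first = hero.search("Batman and Tina Fey").group()
second = hero.search("Tina Fey and Batman").group()

# Alternatives are tried left to right, so the shorter "cat" wins here.
ordered = re.search(r"cat|category", "category").group()

# Parentheses restrict the alternation to part of the pattern.
prefixed = re.search(r"Bat(man|mobile|copter)", "Batmobile lost a wheel")

print(first)              # Batman
print(second)             # Tina Fey
print(ordered)            # cat
print(prefixed.group())   # Batmobile
print(prefixed.group(1))  # mobile
```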

### Question 09:
In regular expressions, what does the `.` character stand for?

**<span style='color:blue'>Answer</span>**



In regular expressions, the `.` (dot) character is a special metacharacter that matches any single character except a newline character. It can be used to represent any character in a pattern.

For example:

The pattern `b.t` will match strings like "bat", "bit", "but", "b#t", etc., where the `.` matches any single character.

The pattern `a.b` will match strings like "axb", "aab", "azb", etc., where the `.` matches any single character.

The pattern `...` will match any three consecutive characters, as long as none of them is a newline.
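
The examples above can be sketched as follows. The sample strings are illustrative assumptions:

```python
import re

# "bt" and "boot" do not fit: the dot must consume exactly one character.
three_letter = re.findall(r"b.t", "bat bit b#t bt boot")

# By default the dot does not match a newline.
across_newline = re.search(r"a.b", "a\nb")

print(three_letter)    # ['bat', 'bit', 'b#t']
print(across_newline)  # None
```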

### Question 10:
In regular expressions, what is the difference between the + and * characters?

**<span style='color:blue'>Answer</span>**


In regular expressions, the `+` and `*` characters are quantifiers used to specify the repetition of a preceding element or group in a pattern. The main difference between them is as follows:

1. `+` (Plus): It matches one or more occurrences of the preceding element. The preceding element must appear at least once for a match to occur. For example, the pattern `a+` matches one or more consecutive occurrences of the letter "a" (e.g., "a", "aa", "aaa", etc.).

2. `*` (Asterisk): It matches zero or more occurrences of the preceding element. The preceding element can appear zero times or multiple times. It allows for matches even if the preceding element is absent. For example, the pattern `a*` matches zero or more consecutive occurrences of the letter "a" (e.g., "", "a", "aa", "aaa", etc.).
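
A minimal sketch contrasting the two quantifiers. The patterns `ba+` and `ba*` are made up for illustration:

```python
import re

plus = re.compile(r"ba+")  # "b" followed by one or more "a"
star = re.compile(r"ba*")  # "b" followed by zero or more "a"

plus_on_b = plus.search("b")               # no "a" present, so no match
star_on_b = star.search("b").group()       # zero "a" characters is allowed
plus_on_baaa = plus.search("baaa").group()
star_on_baaa = star.search("baaa").group()

print(plus_on_b)     # None
print(star_on_b)     # b
print(plus_on_baaa)  # baaa
print(star_on_baaa)  # baaa
```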



### Question 11:
What is the difference between {4} and {4,5} in regular expression?

**<span style='color:blue'>Answer</span>**


In regular expressions, curly braces `{}` specify how many times the preceding element or group must repeat. The difference between `{4}` and `{4,5}` is as follows:

1. `{4}`: It specifies that the preceding element must occur exactly four times. For example, the pattern `a{4}` matches only when the letter "a" appears exactly four times consecutively (e.g., "aaaa").

2. `{4,5}`: It specifies that the preceding element must occur between four and five times, inclusive. It allows for a range of possible occurrences. For example, the pattern `a{4,5}` matches when the letter "a" appears either four or five times consecutively (e.g., "aaaa", "aaaaa").

Thus, `{4}` sets an exact count of occurrences, while `{4,5}` specifies a range of possible occurrences.
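
A short sketch of the difference, using fullmatch() so that the whole string has to fit the pattern. With search(), either pattern can still match inside a longer run of "a" characters:

```python
import re

exact = re.compile(r"a{4}")
ranged = re.compile(r"a{4,5}")

exact_four = exact.fullmatch("aaaa") is not None
exact_five = exact.fullmatch("aaaaa") is not None
ranged_five = ranged.fullmatch("aaaaa") is not None
ranged_six = ranged.fullmatch("aaaaaa") is not None

# search() finds the pattern inside a longer run; {4,5} is greedy and takes five.
inside_exact = exact.search("aaaaa").group()
inside_ranged = ranged.search("aaaaaa").group()

print(exact_four, exact_five, ranged_five, ranged_six)  # True False True False
print(inside_exact, inside_ranged)                      # aaaa aaaaa
```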

### Question 12:
What do the `\d`, `\w`, and `\s` shorthand character classes signify in regular
expressions?

**<span style='color:blue'>Answer</span>**

In regular expressions, the shorthand character classes `\d`, `\w`, and `\s` have specific meanings:

1. `\d`: It represents any digit character. For example, the pattern `\d` matches any single digit from 0 to 9. With the `re.ASCII` flag (or in bytes patterns) it is equivalent to the character class `[0-9]`; for ordinary `str` patterns in Python 3 it also matches other Unicode decimal digits.

2. `\w`: It represents any word character: letters, digits, and the underscore. With the `re.ASCII` flag it is equivalent to the character class `[a-zA-Z0-9_]`; for ordinary `str` patterns in Python 3 it also matches letters and digits from other scripts, such as "é".

3. `\s`: It represents any whitespace character. It matches spaces, tabs, newlines, and other whitespace characters. For example, the pattern `\s` matches a single whitespace character.
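
A minimal sketch of the three classes on a made-up sample string. The last two lines illustrate how `\d` treats a non-ASCII digit (U+0663, ARABIC-INDIC DIGIT THREE) with and without the `re.ASCII` flag:

```python
import re

sample = "Room 42_b\tis open"

digits = re.findall(r"\d", sample)   # individual digit characters
words = re.findall(r"\w+", sample)   # runs of letters, digits, underscores
spaces = re.findall(r"\s", sample)   # individual whitespace characters

print(digits)  # ['4', '2']
print(words)   # ['Room', '42_b', 'is', 'open']
print(spaces)  # [' ', '\t', ' ']

unicode_digit = re.findall(r"\d", "\u0663")          # matched in a str pattern
ascii_only = re.findall(r"\d", "\u0663", re.ASCII)   # not matched with re.ASCII
```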

### Question 13:
What do the `\D`, `\W`, and `\S` shorthand character classes signify in regular expressions?

**<span style='color:blue'>Answer</span>**


In regular expressions, the shorthand character classes `\D`, `\W`, and `\S` have specific meanings:

1. `\D`: It represents any non-digit character, the inverse of `\d`. For example, `\D` matches "a", "-", or a space, but not "7".

2. `\W`: It represents any non-word character, the inverse of `\w`. It matches any single character that is not a letter, digit, or underscore, such as "!", "-", or a space.

3. `\S`: It represents any non-whitespace character, the inverse of `\s`. It matches any single character that is not a space, tab, newline, or other whitespace.
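
A short sketch of the negated classes in use. The sample string is an assumption:

```python
import re

sample = "Tel: 555-1234"

only_digits = re.sub(r"\D", "", sample)    # drop every non-digit
non_word = re.findall(r"\W", sample)       # punctuation and spaces
non_space = re.findall(r"\S+", sample)     # runs of non-whitespace

print(only_digits)  # 5551234
print(non_word)     # [':', ' ', '-']
print(non_space)    # ['Tel:', '555-1234']
```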

### Question 14:
What is the difference between `.*` and `.*?`?

**<span style='color:blue'>Answer</span>**


In regular expressions, the `.` (dot) character matches any character except a newline, and the `*` quantifier repeats the preceding element zero or more times. Both `.*` and `.*?` therefore match zero or more characters; they differ in how much text they consume.

`.*` is greedy: it matches as many characters as possible and gives characters back only if the rest of the pattern would otherwise fail.

`.*?` is non-greedy (lazy): the `?` after the quantifier makes it match as few characters as possible, expanding only as far as needed for the rest of the pattern to match.

In summary:
- `.*` matches the longest possible run of characters.
- `.*?` matches the shortest possible run of characters.

For example, against the text `<a><b>`, the pattern `<.*>` matches `<a><b>`, while `<.*?>` matches only `<a>`.
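
For the `.*` and `.*?` pair named in the question, a minimal sketch of greedy versus non-greedy matching. The HTML-like sample string is made up for illustration:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: runs from the first "<" to the last ">" in the string.
greedy = re.search(r"<.*>", html).group()
# Non-greedy: stops at the first ">" that completes the match.
lazy = re.search(r"<.*?>", html).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```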

### Question 15:
What is the syntax for matching both numbers and lowercase letters with a character class?

**<span style='color:blue'>Answer</span>**

To match both numbers and lowercase letters using a character class in a regular expression, you can use the following syntax:

`[0-9a-z]`

In this syntax, `0-9` matches any digit from 0 to 9, and `a-z` matches any lowercase letter from a to z. Placing both ranges inside one pair of square brackets `[ ]` creates a character class that matches a single character that is either a digit or a lowercase letter.
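
A quick sketch of the character class in use. The sample strings are illustrative assumptions:

```python
import re

lower_alnum = re.compile(r"[0-9a-z]+")

# Uppercase letters, spaces, and underscores are outside the class.
runs = lower_alnum.findall("abc123 XYZ x9_y")

valid = lower_alnum.fullmatch("user42") is not None
invalid = lower_alnum.fullmatch("User42") is not None

print(runs)            # ['abc123', 'x9', 'y']
print(valid, invalid)  # True False
```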

### Question 16:
What is the procedure for making a regular expression case-insensitive?

**<span style='color:blue'>Answer</span>**

To make a regular expression case-insensitive in Python, pass the `re.IGNORECASE` flag (or its short form, `re.I`).

In [19]:
import re
pattern = r"pattern"


Method 1: Compile the pattern with `re.compile()`, passing `re.IGNORECASE` (or `re.I`) as the second argument, and then call methods such as `search()` on the compiled object.

In [20]:
regex = re.compile(pattern, re.IGNORECASE)  # or re.compile(pattern, re.I)


In [21]:
text = "text"
result = regex.search(text)


In [22]:
# Method 2: pass the flag directly to a module-level function

re.search(pattern, text, re.IGNORECASE)

In [23]:
re.findall(pattern, text, re.I)

[]

### Question 17:
What does the . character normally match? What does it match if re.DOTALL is passed as 2nd
argument in re.compile()?

**<span style='color:blue'>Answer</span>**

In regular expressions, the `.` (dot) character normally matches any character except a newline character (`\n`). It matches any single character in the text.

However, if the `re.DOTALL` flag (or `re.S` flag) is passed as the second argument to the `re.compile()` function, it changes the behavior of the `.` (dot) character. With the `re.DOTALL` flag enabled, the `.` (dot) character matches any character, including newline characters (`\n`).
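
A minimal sketch of the effect of `re.DOTALL`, using a made-up two-line string:

```python
import re

text = "first line\nsecond line"

# Default: ".*" stops at the newline.
default = re.compile(r".*").search(text).group()
# With re.DOTALL: "." also matches the newline, so ".*" spans both lines.
dotall = re.compile(r".*", re.DOTALL).search(text).group()

print(repr(default))  # 'first line'
print(repr(dotall))   # 'first line\nsecond line'
```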

### Question 18:
If `numReg = re.compile(r'\d+')`, what will `numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')` return?

**<span style='color:blue'>Answer</span>**

In [24]:
numReg = re.compile(r'\d+')
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

'X drummers, X pipers, five rings, X hen'

`sub()` replaces every non-overlapping match of `\d+` with 'X'. The word "five" is left unchanged because it contains no digits.

### Question 19:
What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?

**<span style='color:blue'>Answer</span>**


Passing `re.VERBOSE` as the second argument to `re.compile()` allows you to write more readable and organized regular expressions by ignoring whitespace and adding comments.

When `re.VERBOSE` is used, whitespace characters within the regular expression pattern are ignored, except when they are within a character class or escaped with a backslash. This allows you to add extra spaces, line breaks, and indentation to the regular expression to improve readability without affecting its functionality.

Additionally, you can include comments in the pattern by using the `#` symbol. Any text after `#` until the end of the line is treated as a comment and ignored by the regular expression engine. This helps in documenting and explaining complex patterns.
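
A short sketch of a verbose pattern next to its compact equivalent. The phone-number pattern and the sample text are assumptions chosen for illustration:

```python
import re

# Whitespace and "#" comments inside the pattern are ignored with re.VERBOSE.
phone = re.compile(r"""
    (\d{3})   # area code
    -         # separator
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
""", re.VERBOSE)

# The same pattern written on one line, without re.VERBOSE.
compact = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

sample = "Call 415-555-4242 today"
verbose_groups = phone.search(sample).groups()
compact_groups = compact.search(sample).groups()

print(verbose_groups)                    # ('415', '555', '4242')
print(verbose_groups == compact_groups)  # True
```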

### Question 20:
How would you write a regex that matches a number with a comma after every three digits? It must
match the following:
- '42'
- '1,234'
- '6,368,745'

but not the following:
- '12,34,567' (which has only two digits between the commas)
- '1234' (which lacks commas)

In [25]:
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')

####################################################

strings = ['42', '1,234', '6,368,745', '12,34,567', '1234']

for string in strings:
    if pattern.match(string):
        print(f'Matched: {string}')
    else:
        print(f'Not matched: {string}')


Matched: 42
Matched: 1,234
Matched: 6,368,745
Not matched: 12,34,567
Not matched: 1234


`^` asserts the start of the string.

`\d{1,3}` matches one to three digits.

`(,\d{3})*` matches zero or more occurrences of a comma followed by exactly three digits.

`$` asserts the end of the string.

### Question 21:
How would you write a regex that matches the full name of someone whose last name is
Watanabe? You can assume that the first name that comes before it will always be one word that
begins with a capital letter. The regex must match the following:
- 'Haruto Watanabe'
- 'Alice Watanabe'
- 'RoboCop Watanabe'

but not the following:
- 'haruto Watanabe' (where the first name is not capitalized)
- 'Mr. Watanabe' (where the preceding word has a nonletter character)
- 'Watanabe' (which has no first name)
- 'Haruto watanabe' (where Watanabe is not capitalized)

In [26]:
import re

pattern = re.compile(r'^[A-Z][a-zA-Z]*\sWatanabe$')

#########################

names = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe', 'haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

for name in names:
    if pattern.match(name):
        print(f'Matched: {name}')
    else:
        print(f'Not matched: {name}')


Matched: Haruto Watanabe
Matched: Alice Watanabe
Matched: RoboCop Watanabe
Not matched: haruto Watanabe
Not matched: Mr. Watanabe
Not matched: Watanabe
Not matched: Haruto watanabe


`^` asserts the start of the string.

`[A-Z]` matches an uppercase letter (the first letter of the first name).

`[a-zA-Z]*` matches zero or more lowercase or uppercase letters (the remaining letters of the first name).

`\s` matches a whitespace character (space).

`Watanabe` matches the last name exactly.

`$` asserts the end of the string.

### Question 22:
How would you write a regex that matches a sentence where the first word is either Alice, Bob,
or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs;
and the sentence ends with a period? This regex should be case-insensitive. It must match the
following:
- 'Alice eats apples.'
- 'Bob pets cats.'
- 'Carol throws baseballs.'
- 'Alice throws Apples.'
- 'BOB EATS CATS.'

but not the following:
- 'RoboCop eats apples.'
- 'ALICE THROWS FOOTBALLS.'
- 'Carol eats 7 cats.'

In [27]:
import re

pattern = re.compile(r'^(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.$', re.IGNORECASE)

############################################

sentences = [
    'Alice eats apples.',
    'Bob pets cats.',
    'Carol throws baseballs.',
    'Alice throws Apples.',
    'BOB EATS CATS.',
    'RoboCop eats apples.',
    'ALICE THROWS FOOTBALLS.',
    'Carol eats 7 cats.'
]

for sentence in sentences:
    if pattern.match(sentence):
        print(f'Matched: {sentence}')
    else:
        print(f'Not matched: {sentence}')



Matched: Alice eats apples.
Matched: Bob pets cats.
Matched: Carol throws baseballs.
Matched: Alice throws Apples.
Matched: BOB EATS CATS.
Not matched: RoboCop eats apples.
Not matched: ALICE THROWS FOOTBALLS.
Not matched: Carol eats 7 cats.


`^` asserts the start of the string.

`(Alice|Bob|Carol)` matches either "Alice", "Bob", or "Carol" as the first word.

`\s` matches a whitespace character (space) between words.

`(eats|pets|throws)` matches either "eats", "pets", or "throws" as the second word.

`(apples|cats|baseballs)` matches either "apples", "cats", or "baseballs" as the third word.

`\.` matches a period at the end of the sentence.

`$` asserts the end of the string.

`re.IGNORECASE` enables case-insensitive matching.