## Python Assignment 7

### 1. What is the name of the feature responsible for generating Regex objects?

A regex object, short for regular expression object, is a compiled regular expression pattern. In Python it is an instance of `re.Pattern`, and it provides matching methods such as `search()`, `match()`, `findall()`, and `sub()`.

###### The `re.compile()` function is used to generate a regex object in Python's `re` module.

It takes a regular expression pattern as an argument and returns a compiled regex object.
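A minimal sketch of compiling a pattern once and reusing the resulting object. The phone-number pattern and sample sentence are made up for illustration; the printed type name assumes a recent Python 3 version.

```python
import re

# Compile the pattern once; the result is a reusable regex (Pattern) object.
phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
print(type(phone_regex))  # <class 're.Pattern'>

# The compiled object exposes matching methods such as search() directly.
match = phone_regex.search('My number is 415-555-4242.')
print(match.group())  # 415-555-4242
```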

### 2. Why do raw strings often appear in Regex objects?

Raw strings are often used in regex patterns because they simplify the handling of backslashes (`\`).

In an ordinary Python string literal, the backslash starts an escape sequence such as `\n` (newline) or `\t` (tab). In regular expressions, backslashes also have special meaning: they escape metacharacters or introduce shorthand classes such as `\d`. Without a raw string, you therefore need a double backslash (`\\`) in the literal to pass a single backslash to the regex engine, for example `'\\d'` instead of `r'\d'`.

To avoid excessive backslash escaping and improve readability, raw strings are commonly used when defining regular expression patterns.
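The sketch below compares a normal literal with a raw literal for the same pattern, and shows the extra escaping needed to match one literal backslash. The sample path string is hypothetical.

```python
import re

# Normal literal: '\\d+' is needed so that the regex engine receives \d+.
escaped = '\\d+'
# Raw literal: the backslash is kept as-is.
raw = r'\d+'
print(escaped == raw)  # True

# Matching ONE literal backslash needs the regex \\ ,
# written '\\\\' as a normal literal but r'\\' as a raw string.
path = 'C:\\Users'  # the 8-character text C:\Users
print(re.findall('\\\\', path) == re.findall(r'\\', path) == ['\\'])  # True
```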

### 3. What is the return value of the search() method?

The `search()` method of a regex object scans the string and returns a Match object for the first location where the pattern matches, or `None` if no match is found.

In general, the match object provides information about the matched substring, its position in the original string, and additional methods for accessing and manipulating the match.


```python
import re

pattern = r'pen'
text = 'I have a pen and a book'

regex_obj = re.compile(pattern)
match = regex_obj.search(text)

if match:
    print('Match found!')
    print('Matched substring:', match.group())
    print('Start position:', match.start())
    print('End position:', match.end())
    print('Tuple of start and end positions:', match.span())
else:
    print('No match found.')
```

Output:

```text
Match found!
Matched substring: pen
Start position: 9
End position: 12
Tuple of start and end positions: (9, 12)
```


### 4. From a Match item, how do you get the actual strings that match the pattern?

To retrieve the actual strings that match the pattern from a match object, you can use the group() method. 

The group() method returns the matched substring that corresponds to the entire match or a specific capturing group within the match.
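A short sketch of the different ways to call `group()`, plus the related `groups()` method that returns every capturing group at once. The input sentence is made up for illustration.

```python
import re

match = re.search(r'(\d\d\d)-(\d\d\d-\d\d\d\d)', 'Call 415-555-4242 now')

whole = match.group()        # entire match, same as match.group(0)
first = match.group(1)       # first capturing group
both = match.group(1, 2)     # several groups at once, as a tuple
all_groups = match.groups()  # every capturing group, as a tuple

print(whole)       # 415-555-4242
print(first)       # 415
print(both)        # ('415', '555-4242')
print(all_groups)  # ('415', '555-4242')
```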

### 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1?

In the regex `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`, group zero (`group(0)`) covers the entire match: the text of both capturing groups plus the hyphen between them. Group 1 (`group(1)`) covers the first capturing group `(\d\d\d)`, and group 2 (`group(2)`) covers the second capturing group `(\d\d\d-\d\d\d\d)`.

```python
import re

pattern = r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
text = 'Phone number: 123-456-7890'

regex_obj = re.compile(pattern)
match = regex_obj.search(text)

if match:
    full_match = match.group(0)
    group1 = match.group(1)
    group2 = match.group(2)

    print('Full match:', full_match)  # Output: Full match: 123-456-7890
    print('Group 1:', group1)  # Output: Group 1: 123
    print('Group 2:', group2)  # Output: Group 2: 456-7890
else:
    print('No match found.')
```

Output:

```text
Full match: 123-456-7890
Group 1: 123
Group 2: 456-7890
```


The group(0) returns the entire match "123-456-7890", which includes both capturing groups. group(1) returns the match of the first capturing group "123", and group(2) returns the match of the second capturing group "456-7890".

Keep in mind that the argument 0 is optional: calling `group()` with no argument returns the entire match, exactly like `group(0)`.

### 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?

To tell a regular expression (regex) that you want to match literal parentheses and periods without any special interpretation, use the backslash (`\`) as an escape character.

By placing a backslash before the parentheses or periods, you can indicate that they should be treated as literal characters rather than having their special meanings in regular expressions.

1. Matching parentheses:
To match literal parentheses, use `\(` to match an opening parenthesis and `\)` to match a closing parenthesis.

For example, the regex pattern `\(example\)` will match the string "(example)".

```python
import re

pattern = r'\(example\)'
text = 'This is an (example) string.'

matches = re.findall(pattern, text)
print(matches)  # Output: ['(example)']
```

Output:

```text
['(example)']
```


2. Matching periods:
To match a literal period (dot), use `\.`. For example, the regex pattern `example\.com` will match the string "example.com".

```python
import re

pattern = r'example\.com'
text = 'Visit example.com for more information.'

matches = re.findall(pattern, text)
print(matches)  # Output: ['example.com']
```

Output:

```text
['example.com']
```


In both cases, the backslash (`\`) escapes the special meaning of the parentheses or period, so the regex treats them as normal characters to be matched.
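To see why the escape matters, this sketch runs each pattern with and without backslashes. The sample strings are made up; `'exampleXcom'` is chosen because an unescaped dot accepts any character there.

```python
import re

text = 'exampleXcom and example.com'

# Unescaped '.' is a wildcard, so 'exampleXcom' matches too.
loose = re.findall(r'example.com', text)
# Escaped '\.' matches only a literal period.
strict = re.findall(r'example\.com', text)
print(loose)   # ['exampleXcom', 'example.com']
print(strict)  # ['example.com']

# Unescaped parentheses form a capturing group and do not match '(' or ')'.
grouped = re.findall(r'(example)', '(example)')
literal = re.findall(r'\(example\)', '(example)')
print(grouped)  # ['example']
print(literal)  # ['(example)']
```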

### 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

The findall() method in regular expressions returns a list of strings or a list of string tuples depending on the presence of capturing groups in the regular expression pattern.

When the regular expression pattern has no capturing groups, findall() returns a list of strings. Each element in the list represents a separate match of the pattern in the input text.

If the pattern contains exactly one capturing group, `findall()` still returns a list of strings, but each string is the text captured by that group rather than the whole match. If the pattern contains two or more capturing groups, `findall()` returns a list of string tuples: each tuple represents one match, and each element within the tuple corresponds to one capturing group.

In summary, `findall()` returns a list of string tuples only when the pattern has two or more capturing groups; otherwise it returns a list of strings.
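A sketch of the three cases (no groups, one group, several groups) against the same made-up text:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No groups: list of full-match strings.
no_groups = re.findall(r'\d\d\d-\d\d\d-\d\d\d\d', text)
# One group: list of strings holding just that group's text.
one_group = re.findall(r'(\d\d\d)-\d\d\d-\d\d\d\d', text)
# Two or more groups: list of tuples, one string per group.
many_groups = re.findall(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', text)

print(no_groups)    # ['415-555-9999', '212-555-0000']
print(one_group)    # ['415', '212']
print(many_groups)  # [('415', '555', '9999'), ('212', '555', '0000')]
```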

### 8. In regular expressions, what does the | character mean?

In regular expressions, the | character is used to denote an alternation, also known as a logical OR. It allows you to specify multiple patterns, and if any of those patterns match, the expression as a whole is considered a match.
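A minimal sketch of alternation, including the common trick of using parentheses to limit the `|` to part of the pattern. The sample sentences are made up.

```python
import re

# Alternation: either 'cat' or 'dog' may match.
alternatives = re.findall(r'cat|dog', 'cat and dog')
print(alternatives)  # ['cat', 'dog']

# Parentheses restrict the alternation to the suffix after 'Bat'.
m = re.search(r'Bat(man|mobile)', 'Batmobile lost a wheel')
print(m.group())   # Batmobile
print(m.group(1))  # mobile
```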

### 9. In regular expressions, what does the ? character stand for?

In regular expressions, `?` is a metacharacter: unlike a literal character such as `a`, which simply matches the same character in the input text, it has a special meaning in the pattern. It has two common uses:

`?` after an element:
It makes that element optional, matching zero or one occurrence of it. For example, `colou?r` matches both "color" and "colour".

`?` after another quantifier (`*?`, `+?`, `??`, `{m,n}?`):
It makes that quantifier non-greedy, so it matches as little text as possible.

Other commonly used metacharacters are `.`, `^`, `$`, `*`, `+`, `{ }`, `[ ]`, `\`, `|`, and `( )`.
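A short sketch of the `?` metacharacter in Python's `re` module, showing both its optional-quantifier role and its non-greedy role. The sample strings are made up.

```python
import re

# '?' as a quantifier: the preceding 'u' is optional.
spellings = re.findall(r'colou?r', 'color or colour')
print(spellings)  # ['color', 'colour']

# '?' after another quantifier makes it non-greedy (as short as possible).
greedy = re.search(r'\d+', '12345').group()
lazy = re.search(r'\d+?', '12345').group()
print(greedy, lazy)  # 12345 1
```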

### 10. In regular expressions, what is the difference between the + and * characters?

`+` (plus):
The `+` quantifier matches one or more occurrences of the preceding element. It requires that the preceding element must appear at least once in order for a match to occur. If there are multiple occurrences, it matches as many as possible.

`*` (asterisk):
The `*` quantifier matches zero or more occurrences of the preceding element. It allows the preceding element to be absent or appear any number of times. It matches as many occurrences as possible, including zero occurrences.
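The difference shows up when the repeated element is absent. In this sketch (made-up input), a lone `a` is only matched by the `*` version:

```python
import re

text = 'a ab abbb'

plus = re.findall(r'ab+', text)  # 'b' must appear at least once
star = re.findall(r'ab*', text)  # 'b' may be absent

print(plus)  # ['ab', 'abbb']
print(star)  # ['a', 'ab', 'abbb']
```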

### 11. What is the difference between {4} and {4,5} in regular expression?

{4}:
The {4} quantifier specifies an exact number of occurrences of the preceding element. It matches exactly four occurrences of the preceding element.

{4,5}:
The {4,5} quantifier specifies a range of occurrences of the preceding element. It matches at least four occurrences and up to five occurrences of the preceding element.
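A sketch comparing the two quantifiers with `re.fullmatch()`, which requires the whole string to match. The digit strings are arbitrary examples.

```python
import re

# {4}: exactly four digits.
exact_4 = bool(re.fullmatch(r'\d{4}', '1234'))
exact_5 = bool(re.fullmatch(r'\d{4}', '12345'))
# {4,5}: four or five digits.
range_5 = bool(re.fullmatch(r'\d{4,5}', '12345'))
range_6 = bool(re.fullmatch(r'\d{4,5}', '123456'))
print(exact_4, exact_5, range_5, range_6)  # True False True False

# The range is greedy: search() takes five digits when they are available.
longest = re.search(r'\d{4,5}', '123456').group()
print(longest)  # 12345
```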

### 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

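`\d`: Matches any digit character.
For ASCII text it is equivalent to the character class `[0-9]`, so it matches any single digit from 0 to 9. For `str` patterns, Python also matches other Unicode decimal digits unless the `re.ASCII` flag is set.

`\w`: Matches any word character.
It matches any alphanumeric character (letters and digits) as well as the underscore character. For ASCII text it is equivalent to the character class `[a-zA-Z0-9_]`; for `str` patterns it also matches Unicode letters and digits unless `re.ASCII` is set.

`\s`: Matches any whitespace character.
It matches spaces, tabs, newlines, and other whitespace characters (for ASCII text, `[ \t\n\r\f\v]`). It can be useful for matching and extracting whitespace in text or for splitting text based on whitespace.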
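A sketch of the three classes on a made-up string, plus the `re.ASCII` flag restricting `\d` to `[0-9]`. The Unicode example assumes a `str` pattern in Python 3.

```python
import re

text = 'Order #42 ships\tto user_7'

digits = re.findall(r'\d', text)   # single digit characters
words = re.findall(r'\w+', text)   # runs of letters, digits, underscores
spaces = re.findall(r'\s', text)   # spaces and the tab

print(digits)  # ['4', '2', '7']
print(words)   # ['Order', '42', 'ships', 'to', 'user_7']
print(spaces)  # [' ', ' ', '\t', ' ']

# For str patterns \d is Unicode-aware; re.ASCII restricts it to [0-9].
arabic_three = '\u0663'  # ARABIC-INDIC DIGIT THREE
print(len(re.findall(r'\d', arabic_three)))            # 1
print(len(re.findall(r'\d', arabic_three, re.ASCII)))  # 0
```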

### 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

`\D`: Matches any non-digit character.
Equivalent to the character class `[^0-9]` for ASCII text. It matches any character that is not a digit.

`\W`: Matches any non-word character.
Equivalent to the character class `[^a-zA-Z0-9_]` for ASCII text. It matches any character that is not an alphanumeric character or an underscore.

`\S`: Matches any non-whitespace character.
It matches any character that is not a space, tab, newline, or any other whitespace character.
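Each uppercase class is the complement of its lowercase counterpart. A sketch on a made-up string:

```python
import re

text = 'user_7: hello,  world!'

non_digits = re.findall(r'\D+', text)  # runs without any digit
non_word = re.findall(r'\W', text)     # punctuation and spaces
tokens = re.findall(r'\S+', text)      # runs without whitespace

print(non_digits)  # ['user_', ': hello,  world!']
print(non_word)    # [':', ' ', ',', ' ', ' ', '!']
print(tokens)      # ['user_7:', 'hello,', 'world!']
```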


### 14. What is the difference between `.*` and `.*?`?

`.*`:
The `.` metacharacter matches any single character except a newline character, and the `*` quantifier that follows it matches zero or more occurrences of it. By default this is greedy: it matches as much text as possible.

`.*?`:
Adding `?` after the `*` makes the quantifier non-greedy (lazy): it still matches zero or more characters, but as few as possible while letting the rest of the pattern match.
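A sketch of greedy versus non-greedy matching on a made-up string that contains two `>` characters:

```python
import re

text = '<To serve man> for dinner.>'

greedy = re.search(r'<.*>', text).group()    # runs to the LAST '>'
lazy = re.search(r'<.*?>', text).group()     # stops at the FIRST '>'

print(greedy)  # <To serve man> for dinner.>
print(lazy)    # <To serve man>
```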

### 15. What is the syntax for matching both numbers and lowercase letters with a character class?

To match both numbers and lowercase letters using a character class in regular expressions, you can combine the ranges for numbers (0-9) and lowercase letters (a-z).
#### The syntax for the character class that matches both is [0-9a-z].
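A one-line check of the class on a made-up string; uppercase letters and spaces break the runs:

```python
import re

# [0-9a-z] accepts one digit or lowercase letter; '+' allows a run of them.
runs = re.findall(r'[0-9a-z]+', 'Abc123 XYZ9')
print(runs)  # ['bc123', '9']
```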

### 16. What is the procedure for making a regular expression case-insensitive?

To make a regular expression case-insensitive, use the engine's "case-insensitive" flag or modifier. The exact syntax varies between programming languages and regex implementations; many use an `i` flag. In Python, pass `re.IGNORECASE` (or its short alias `re.I`) as the flags argument to `re.compile()`, `re.search()`, `re.findall()`, and similar functions, or put the inline modifier `(?i)` at the very start of the pattern.

```python
import re

pattern = r'apple'
text = 'I have an Apple'

matches = re.findall(pattern, text, re.IGNORECASE)
print(matches)  # Output: ['Apple']
```

Output:

```text
['Apple']
```
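Two other ways to get the same behavior: passing the short alias `re.I` to `re.compile()`, and the inline `(?i)` modifier at the start of the pattern. The sample strings are made up.

```python
import re

# Flag passed to re.compile(); re.I is a short alias for re.IGNORECASE.
robocop = re.compile(r'robocop', re.I)
found = robocop.search('RoboCop is part man, part machine').group()
print(found)  # RoboCop

# Inline form: (?i) must appear at the very start of the pattern.
apples = re.findall(r'(?i)apple', 'Apple APPLE apple')
print(apples)  # ['Apple', 'APPLE', 'apple']
```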


### 17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?

In regular expressions, the . (dot) metacharacter normally matches any character except a newline character (\n). It represents a wildcard that can match any single character.

For example, if you have the pattern a.b, it would match strings like "aab", "axb", "a7b", where the dot matches any character between the "a" and "b" (except a newline).

However, if you pass `re.DOTALL` as the second argument to `re.compile()` (or as the flags argument of functions such as `re.search()`), the behavior of the dot changes.

With `re.DOTALL`, the dot (`.`) matches any character, including newline characters (`\n`).
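A sketch comparing `.*` with and without `re.DOTALL` on a made-up two-line string:

```python
import re

text = 'first line\nsecond line'

# Default: '.' stops at the newline.
default = re.search(r'.*', text).group()
# re.DOTALL: '.' also matches '\n', so the whole text is consumed.
dotall = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(default))  # 'first line'
print(repr(dotall))   # 'first line\nsecond line'
```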

### 18. If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4hen') return?


If numRegex is defined as re.compile(r'\d+'), and you call numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4hen'), it will return the string with all the matches of the regular expression pattern \d+ replaced with the letter 'X'. Here's the result:

Input: '11 drummers, 10 pipers, five rings, 4hen'

Output: 'X drummers, X pipers, five rings, Xhen'

Explanation:

The regular expression pattern `\d+` matches one or more consecutive digits, so in the input string it matches "11", "10", and "4". The `sub()` method replaces every match with the replacement string, which is 'X' in this case, giving 'X drummers, X pipers, five rings, Xhen'. The word "five" is left unchanged because it is spelled with letters, not digits.

Therefore, calling `numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4hen')` returns the modified string where all the numeric sequences are replaced with the letter 'X'.
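The result can be confirmed directly:

```python
import re

numRegex = re.compile(r'\d+')
result = numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4hen')
print(result)  # X drummers, X pipers, five rings, Xhen
```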

### 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow to do?

Passing re.VERBOSE as the second argument to re.compile() allows you to write regular expressions in a more readable and organized manner by ignoring whitespace and adding comments.

When using re.VERBOSE, the whitespace within the regular expression pattern is ignored, except when it is escaped or within a character class ([]). This means you can freely add spaces, line breaks, and indentation to make the regular expression more visually appealing and easier to understand.

Additionally, you can include comments in the regular expression pattern by using the # symbol. Comments can provide explanations, document the pattern, or make it easier for others to understand your regular expression.
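A sketch of the phone pattern from question 5 rewritten in verbose mode, with whitespace and `#` comments that the engine ignores. The second pattern shows that flags can be combined with `|`; the sample strings are made up.

```python
import re

# Same pattern as r'(\d{3})-(\d{3}-\d{4})', laid out over several lines.
phone_regex = re.compile(r'''
    (\d{3})          # area code
    -                # separator
    (\d{3}-\d{4})    # local number
''', re.VERBOSE)

m = phone_regex.search('Phone number: 123-456-7890')
print(m.groups())  # ('123', '456-7890')

# Flags can be combined with the bitwise OR operator.
combined = re.compile(r'''watanabe  # surname''', re.IGNORECASE | re.VERBOSE)
print(combined.search('Haruto Watanabe').group())  # Watanabe
```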

### 20. How would you write a regex that matches a number with commas every three digits? It must match the following:

'42'

'1,234'

'6,368,745'

but not the following:

'12,34,567' (which has only two digits between the commas)

'1234' (which lacks commas)

#### A regex that matches a number with commas every three digits: `r'^\d{1,3}(,\d{3})*$'`

```python
import re

pattern = r'^\d{1,3}(,\d{3})*$'

numbers = ['42', '1,234', '6,368,745', '12,34,567', '1234']

for number in numbers:
    match = re.match(pattern, number)
    if match:
        print(f"Matched: {number}")
    else:
        print(f"Not matched: {number}")
```

Output:

```text
Matched: 42
Matched: 1,234
Matched: 6,368,745
Not matched: 12,34,567
Not matched: 1234
```


### 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:

'Haruto Watanabe'

'Alice Watanabe'

'RoboCop Watanabe'

but not the following:

'haruto Watanabe' (where the first name is not capitalized)

'Mr. Watanabe' (where the preceding word has a nonletter character)

'Watanabe' (which has no first name)

'Haruto watanabe' (where Watanabe is not capitalized)

#### A regex that matches the full name of someone whose last name is Watanabe: `r'^[A-Z][a-zA-Z]* Watanabe$'`


```python
import re

pattern = r'^[A-Z][a-zA-Z]* Watanabe$'

names = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe', 'haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

for name in names:
    match = re.match(pattern, name)
    if match:
        print(f"Matched: {name}")
    else:
        print(f"Not matched: {name}")
```

Output:

```text
Matched: Haruto Watanabe
Matched: Alice Watanabe
Matched: RoboCop Watanabe
Not matched: haruto Watanabe
Not matched: Mr. Watanabe
Not matched: Watanabe
Not matched: Haruto watanabe
```


### 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:

'Alice eats apples.'

'Bob pets cats.'

'Carol throws baseballs.'

'Alice throws Apples.'

'BOB EATS CATS.'

but not the following:

'RoboCop eats apples.'

'ALICE THROWS FOOTBALLS.'

'Carol eats 7 cats.'

A case-insensitive regex that matches a sentence where the first word is Alice, Bob, or Carol; the second word is eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period:
#### `r'(?i)^(Alice|Bob|Carol) (eats|pets|throws) (apples|cats|baseballs)\.$'`

```python
import re

# (?i) makes the whole pattern case-insensitive. Global inline flags must come
# first in the pattern: Python 3.11+ raises re.error if (?i) appears after ^.
pattern = r'(?i)^(Alice|Bob|Carol) (eats|pets|throws) (apples|cats|baseballs)\.$'

sentences = ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.', 'Alice throws Apples.', 'BOB EATS CATS.', 'RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.']

for sentence in sentences:
    match = re.match(pattern, sentence)
    if match:
        print(f"Matched: {sentence}")
    else:
        print(f"Not matched: {sentence}")
```

Output:

```text
Matched: Alice eats apples.
Matched: Bob pets cats.
Matched: Carol throws baseballs.
Matched: Alice throws Apples.
Matched: BOB EATS CATS.
Not matched: RoboCop eats apples.
Not matched: ALICE THROWS FOOTBALLS.
Not matched: Carol eats 7 cats.
```
