<a href="https://colab.research.google.com/github/kilos11/PYTHON-_AUTOMATION-/blob/main/Pattern_Matchingwith_RegularExpression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Finding Patterns of Text Without Regular Expressions**

Say you want to find an American phone number in a string. You know the pattern if you’re American: three numbers, a hyphen, three numbers, a hyphen, and four numbers. Here’s an example: 415-555-4242.

Let’s use a function named isPhoneNumber() to check whether a string matches this pattern, returning either True or False.

In [None]:
def isPhoneNumber(text):
    if len(text) != 12:                  # 415-555-4242 is exactly 12 characters
        return False
    for i in range(0, 3):                # area code: three digits
        if not text[i].isdecimal():
            return False
    if text[3] != '-':                   # first hyphen
        return False
    for i in range(4, 7):                # next three digits
        if not text[i].isdecimal():
            return False
    if text[7] != '-':                   # second hyphen
        return False
    for i in range(8, 12):               # last four digits
        if not text[i].isdecimal():
            return False
    return True

print('Is 415-555-4242 a phone number?')
print(isPhoneNumber('415-555-4242'))
print('Is Moshi moshi a phone number?')
print(isPhoneNumber('Moshi moshi'))


#**Finding Patterns of Text with Regular Expressions**

The previous phone number–finding program works, but it uses a lot of code to do something limited: the isPhoneNumber() function is 17 lines but can find only one pattern of phone numbers. What about a phone number formatted like 415.555.4242 or (415) 555-4242? What if the phone number had an extension, like 415-555-4242 x99? The isPhoneNumber() function would fail to validate them. You could add yet more code for these additional patterns, but there is an easier way.

Regular expressions, called regexes for short, are descriptions for a pattern of text. For example, a \d in a regex stands for a digit character—that is, any single numeral from 0 to 9. The regex \d\d\d-\d\d\d-\d\d\d\d is used by Python to match the same text pattern the previous isPhoneNumber() function did: a string of three numbers, a hyphen, three more numbers, another hyphen, and four numbers. Any other string would not match the \d\d\d-\d\d\d-\d\d\d\d regex.

But regular expressions can be much more sophisticated. For example, adding a 3 in braces ({3}) after a pattern is like saying, “Match this pattern three times.” So the slightly shorter regex \d{3}-\d{3}-\d{4} also matches the correct phone number format.
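
As a quick check of that claim, the sketch below compiles both spellings of the pattern and confirms that they agree on a few sample strings. The names long_form, short_form, and samples are illustrative and do not appear elsewhere in this notebook.

```python
import re

# Two spellings of the same phone number pattern.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

samples = ['415-555-4242', '415-555-424', 'Moshi moshi', 'Call 212-555-0000 now']

def matched_text(regex, text):
    """Return the matched text, or None when the regex does not match."""
    mo = regex.search(text)
    return mo.group() if mo else None

for text in samples:
    same = matched_text(long_form, text) == matched_text(short_form, text)
    print(f'{text!r}: same result = {same}')
```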

##**Creating Regex Objects**

All the regex functions in Python are in the re module.
Passing a string value representing your regular expression to re.compile() returns a Regex pattern object (or simply, a Regex object).

To create a Regex object that matches the phone number pattern, enter the following into the interactive shell. (Remember that \d means “a digit character” and \d\d\d-\d\d\d-\d\d\d\d is the regular expression for a phone number pattern.)
Now the phoneNumRegex variable contains a Regex object.

In [None]:
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

##**Matching Regex Objects**

A Regex object’s search() method searches the string it is passed for any matches to the regex. The search() method returns None if the regex pattern is not found in the string. If the pattern is found, the search() method returns a Match object, which has a group() method that returns the actual matched text from the searched string.
Here, we pass our desired pattern to re.compile() and store the resulting Regex object in phoneNumRegex. Then we call search() on phoneNumRegex and pass search() the string we want to match for during the search. The result of the search gets stored in the variable mo. In this example, we know that our pattern will be found in the string, so we know that a Match object will be returned. Knowing that mo contains a Match object and not the null value None, we can call group() on mo to return the match. Writing mo.group() inside our print() function call displays the whole match, 415-555-4242.

In [None]:
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())

Phone number found: 415-555-4242
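
When you cannot be sure the pattern is present, calling group() directly on the result of search() raises an AttributeError if the result is None. A minimal sketch of a guard, assuming a hypothetical helper named findPhoneNumber():

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def findPhoneNumber(text):
    """Return the first phone number in text, or None if there is none."""
    mo = phoneNumRegex.search(text)
    if mo is None:          # search() found nothing, so there is no group()
        return None
    return mo.group()

print(findPhoneNumber('My number is 415-555-4242.'))
print(findPhoneNumber('No number here.'))
```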


##**Grouping with Parentheses**

Say you want to separate the area code from the rest of the phone number. Adding parentheses will create groups in the regex: (\d\d\d)-(\d\d\d-\d\d\d\d). Then you can use the group() match object method to grab the matching text from just one group.

The first set of parentheses in a regex string will be group 1. The second set will be group 2. By passing the integer 1 or 2 to the group() match object method, you can grab different parts of the matched text. Passing 0 or nothing to the group() method will return the entire matched text.

In [None]:
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print(mo.group(1))
print(mo.group(2))
print(mo.group(0))
print(mo.group())

#If you would like to retrieve all the groups at once, use the groups() method—note the plural form for the name.
print(mo.groups())
areaCode, mainNumber = mo.groups()
print(areaCode)
print(mainNumber)

415
555-4242
415-555-4242
415-555-4242
('415', '555-4242')
415
555-4242


Parentheses have a special meaning in regular expressions, but what do you do if you need to match a parenthesis in your text? For instance, maybe the phone numbers you are trying to match have the area code set in parentheses. In this case, you need to escape the ( and ) characters with a backslash.


The \( and \) escape characters in the raw string passed to re.compile() will match actual parenthesis characters. In regular expressions, the following characters have special meanings:

.  ^  $  *  +  ?  {  }  [  ]  \  |  (  )

If you want to detect these characters as part of your text pattern, you need to escape them with a backslash:

\.  \^  \$  \*  \+  \?  \{  \}  \[  \]  \\  \|  \(  \)

In [None]:
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
print(mo.group(1))
print(mo.group(2))

(415)
555-4242
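
If the literal text you want to match is stored in a variable, the standard library's re.escape() function can add the backslashes for you. This is a small sketch; the names literal and escapedRegex are illustrative.

```python
import re

literal = '(415) 555-4242'

# re.escape() returns a copy of the string with regex special
# characters (such as the parentheses) backslash-escaped.
escapedRegex = re.compile(re.escape(literal))

mo = escapedRegex.search('My phone number is (415) 555-4242.')
print(mo.group())
```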


#**Matching Multiple Groups with the Pipe**#
The | character is called a pipe. You can use it anywhere you want to match one of many expressions. For example, the regular expression r'Batman|Tina Fey' will match either 'Batman' or 'Tina Fey'.

When both Batman and Tina Fey occur in the searched string, the first occurrence of matching text will be returned as the Match object.

In [None]:
heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey')
print(mo1.group())

mo2 = heroRegex.search('Tina Fey and Batman')
print(mo2.group())

Batman
Tina Fey


You can also use the pipe to match one of several patterns as part of your regex. For example, say you wanted to match any of the strings 'Batman', 'Batmobile', 'Batcopter', and 'Batbat'. Since all these strings start with Bat, it would be nice if you could specify that prefix only once. This can be done with parentheses:

In [None]:
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')
print(mo.group(0))
print(mo.group(1))

Batmobile
mobile


#**Optional Matching with the Question Mark**#

Sometimes there is a pattern that you want to match only optionally. That is, the regex should find a match regardless of whether that bit of text is there. The ? character flags the group that precedes it as an optional part of the pattern.

In [None]:
batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())

mo2 = batRegex.search('The Adventures of Batwoman')
print(mo2.group())

Batman
Batwoman


The (wo)? part of the regular expression means that the pattern wo is an optional group. The regex will match text that has zero instances or one instance of wo in it. This is why the regex matches both 'Batwoman' and 'Batman'.

Using the earlier phone number example, you can make the regex look for phone numbers that do or do not have an area code.

You can think of the ? as saying, “Match zero or one of the group preceding this question mark.”

If you need to match an actual question mark character, escape it with \?.

In [None]:
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
print(mo1.group())

mo2 = phoneRegex.search('My number is 555-4242')
print(mo2.group())

415-555-4242
555-4242


#**Matching Zero or More with the Star**#

The * (called the star or asterisk) means “match zero or more”—the group that precedes the star can occur any number of times in the text. It can be completely absent or repeated over and over again.

In [None]:
batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())

mo2 = batRegex.search('The Adventures of Batwoman')
print(mo2.group())

mo3 = batRegex.search('The Adventures of Batwowowowoman')
print(mo3.group())

#**Matching One or More with the Plus**#

While * means “match zero or more,” the + (or plus) means “match one or more.” Unlike the star, which does not require its group to appear in the matched string, the group preceding a plus must appear at least once. It is not optional.

In [None]:
batRegex = re.compile(r'Bat(wo)+man')

mo1 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())


mo2 = batRegex.search('The Adventures of Batwowowowoman')
print(mo2.group())

mo3 = batRegex.search('The Adventures of Batman')
mo3 == None
#The regex Bat(wo)+man will not match the string 'The Adventures of Batman',
#because at least one wo is required by the plus sign.


Batwoman
Batwowowowoman


True

#**Matching Specific Repetitions with Braces**#

If you have a group that you want to repeat a specific number of times, follow the group in your regex with a number in braces. For example, the regex (Ha){3} will match the string 'HaHaHa', but it will not match 'HaHa', since the latter has only two repeats of the (Ha) group.

Instead of one number, you can specify a range by writing a minimum, a comma, and a maximum in between the braces. For example, the regex (Ha){3,5} will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'.

You can also leave out the first or second number in the braces to leave the minimum or maximum unbounded. For example, (Ha){3,} will match three or more instances of the (Ha) group, while (Ha){,5} will match zero to five instances. Braces can help make your regular expressions shorter. These two regular expressions match identical patterns:

In [None]:
#(Ha){3}
#(Ha)(Ha)(Ha)

#And these two regular expressions also match identical patterns:
#(Ha){3,5}
#((Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha)(Ha))

haRegex = re.compile(r'(Ha){3}')

mo1 = haRegex.search('HaHaHa')
print(mo1.group())

mo2 = haRegex.search('Ha')
mo2 == None
#Here, (Ha){3} matches 'HaHaHa' but not 'Ha'.
#Since it doesn’t match 'Ha', search() returns None.

HaHaHa
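
The open-ended and ranged forms described above can be tried side by side. This sketch uses a six-repeat test string and illustrative variable names; each quantifier takes as many repeats as it is allowed to.

```python
import re

text = 'HaHaHaHaHaHa'  # six repeats of 'Ha'

exactly_three = re.search(r'(Ha){3}', text).group()    # exactly three repeats
three_to_five = re.search(r'(Ha){3,5}', text).group()  # between three and five
three_or_more = re.search(r'(Ha){3,}', text).group()   # no upper bound
up_to_two = re.search(r'(Ha){,2}', text).group()       # zero to two

print(exactly_three)   # HaHaHa
print(three_to_five)   # HaHaHaHaHa
print(three_or_more)   # HaHaHaHaHaHa
print(up_to_two)       # HaHa
```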


#**Greedy and Non-greedy Matching**#

Since (Ha){3,5} can match three, four, or five instances of Ha in the string 'HaHaHaHaHa', you may wonder why the Match object’s call to group() in the example below returns 'HaHaHaHaHa' instead of the shorter possibilities. After all, 'HaHaHa' and 'HaHaHaHa' are also valid matches of the regular expression (Ha){3,5}.

Python’s regular expressions are greedy by default, which means that in ambiguous situations they will match the longest string possible. The non-greedy (also called lazy) version of the braces, which matches the shortest string possible, has the closing brace followed by a question mark.

In [None]:
greedyHaRegex = re.compile(r'(Ha){3,5}')

mo1 = greedyHaRegex.search('HaHaHaHaHa')
print(mo1.group())

nongreedyHaRegex = re.compile(r'(Ha){3,5}?')

mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
print(mo2.group())
#Note that the question mark can have two meanings in regular expressions:
#declaring a non-greedy match or flagging an optional group.
#These meanings are entirely unrelated.

HaHaHaHaHa
HaHaHa


#**The findall() Method**#

In addition to the search() method, Regex objects also have a findall() method. While search() will return a Match object of the first matched text in the searched string, the findall() method will return the strings of every match in the searched string.

To summarize what the findall() method returns, remember the following:

When called on a regex with no groups, such as \d\d\d-\d\d\d-\d\d\d\d, the method findall() returns a list of string matches, such as ['415-555-9999', '212-555-0000'].
When called on a regex that has groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), the method findall() returns a list of tuples of strings (one string for each group), such as [('415', '555', '9999'), ('212', '555', '0000')].

In [None]:
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

mo = phoneNumRegex.search('Cell: 415-555-9999 Work: 212-555-0000')
print(mo.group())

#On the other hand, findall() will not return a Match object but a list of
#strings—as long as there are no groups in the regular expression.
#Each string in the list is a piece of the searched text that matched the
#regular expression.
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') # has no groups
phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')

#If there are groups in the regular expression, then findall() will return a list of tuples.
#Each tuple represents a found match,
#and its items are the matched strings for each group in the regex.
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)') # has groups
phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')

415-555-9999


[('415', '555', '9999'), ('212', '555', '0000')]
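
The two return shapes of findall() can be compared directly. The sketch below uses illustrative names (no_groups, with_groups) and rebuilds the full numbers from the grouped result to show that both calls found the same text.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
with_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

plain = no_groups.findall(text)      # list of strings
tuples = with_groups.findall(text)   # list of tuples, one item per group

print(plain)
print(tuples)

# Rebuild the full numbers from the grouped result.
rebuilt = ['-'.join(parts) for parts in tuples]
print(rebuilt == plain)
```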

#**Character Classes**#
In the earlier phone number regex example, you learned that \d could stand for any numeric digit. That is, \d is shorthand for the regular expression (0|1|2|3|4|5|6|7|8|9).

| Shorthand character class | Represents |
|---|---|
| \d | Any numeric digit from 0 to 9. |
| \D | Any character that is not a numeric digit from 0 to 9. |
| \w | Any letter, numeric digit, or the underscore character. (Think of this as matching “word” characters.) |
| \W | Any character that is not a letter, numeric digit, or the underscore character. |
| \s | Any space, tab, or newline character. (Think of this as matching “space” characters.) |
| \S | Any character that is not a space, tab, or newline. |

Character classes are nice for shortening regular expressions. The character class [0-5] will match only the numbers 0 to 5; this is much shorter than typing (0|1|2|3|4|5). Note that while \d matches digits and \w matches digits, letters, and the underscore, there is no shorthand character class that matches only letters. (Though you can use the [a-zA-Z] character class, as explained next.)

In [None]:
import re

xmasRegex = re.compile(r'\d+\s\w+', re.IGNORECASE)

xmasRegex.findall('12 drummers, 11 pipers, 10 lords, 9 ladies, 8 maids, \
7 swans, 6 geese, 5 rings, 4 birds, 3 hens, 2 doves, 1 partridge')
#The regular expression \d+\s\w+ will match text that has one or more numeric digits (\d+),
#followed by a whitespace character (\s), followed by one or more letter/digit/underscore characters (\w+).
#The findall() method returns all matching strings of the regex pattern in a list.

['12 drummers',
 '11 pipers',
 '10 lords',
 '9 ladies',
 '8 maids',
 '7 swans',
 '6 geese',
 '5 rings',
 '4 birds',
 '3 hens',
 '2 doves',
 '1 partridge']
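
To see several of the shorthand classes at work on one input, here is a short sketch. The sample string and variable names are made up for illustration.

```python
import re

text = 'Room 101, Floor_3!'

digits = re.findall(r'\d', text)     # single digit characters
non_word = re.findall(r'\W', text)   # anything that is not a letter, digit, or underscore
words = re.findall(r'\w+', text)     # runs of word characters
chunks = re.findall(r'\S+', text)    # runs of non-space characters

print(digits)    # ['1', '0', '1', '3']
print(non_word)  # [' ', ',', ' ', '!']
print(words)     # ['Room', '101', 'Floor_3']
print(chunks)    # ['Room', '101,', 'Floor_3!']
```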

#**Making Your Own Character Classes**#

There are times when you want to match a set of characters but the shorthand character classes (\d, \w, \s, and so on) are too broad. You can define your own character class using square brackets. For example, the character class [aeiouAEIOU] will match any vowel, both lowercase and uppercase.

In [None]:
vowelRegex = re.compile(r'[aeiouAEIOU]')
vowelRegex.findall('RoboCop eats baby food. BABY FOOD.')

['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']

You can also include ranges of letters or numbers by using a hyphen. For example, the character class [a-zA-Z0-9] will match all lowercase letters, uppercase letters, and numbers.

Note that inside the square brackets, the normal regular expression symbols are not interpreted as such. This means you do not need to escape the ., *, ?, or () characters with a preceding backslash. For example, the character class [0-5.] will match digits 0 to 5 and a period. You do not need to write it as [0-5\.].

By placing a caret character (^) just after the character class’s opening bracket, you can make a negative character class. A negative character class will match all the characters that are not in the character class.

In [None]:
consonantRegex = re.compile(r'[^aeiouAEIOU]')
consonantRegex.findall('RoboCop eats baby food. BABY FOOD.')

['R',
 'b',
 'C',
 'p',
 ' ',
 't',
 's',
 ' ',
 'b',
 'b',
 'y',
 ' ',
 'f',
 'd',
 '.',
 ' ',
 'B',
 'B',
 'Y',
 ' ',
 'F',
 'D',
 '.']
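
The range and unescaped-period behavior described above can be checked with a short sketch. The sample sentence is invented for illustration.

```python
import re

text = 'Version 3.5 scored 6.25 out of 9'

# Runs of letters and digits; the period and spaces split the runs.
alnum_runs = re.findall(r'[a-zA-Z0-9]+', text)

# Digits 0 to 5 and a literal period; no backslash is needed inside the brackets.
low_digits = re.findall(r'[0-5.]+', text)

print(alnum_runs)  # ['Version', '3', '5', 'scored', '6', '25', 'out', 'of', '9']
print(low_digits)  # ['3.5', '.25']
```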

#**The Caret and Dollar Sign Characters**#

You can also use the caret symbol (^) at the start of a regex to indicate that a match must occur at the beginning of the searched text. Likewise, you can put a dollar sign ($) at the end of the regex to indicate the string must end with this regex pattern. And you can use the ^ and $ together to indicate that the entire string must match the regex—that is, it’s not enough for a match to be made on some subset of the string.

For example, the r'^Hello' regular expression string matches strings that begin with 'Hello'.

In [None]:
beginsWithHello = re.compile(r'^Hello')

begin1 = beginsWithHello.search('Hello, world!')
beginsWithHello.search('He said hello.') == None
print(begin1)


#The r'\d$' regular expression string matches strings that end with
#a numeric character from 0 to 9.
endsWithNumber = re.compile(r'\d$')

end1 = endsWithNumber.search('Your number is 42')
endsWithNumber.search('Your number is forty two.') == None
print(end1)

#The r'^\d+$' regular expression string matches strings that both begin
#and end with one or more numeric characters.
wholeStringIsNum = re.compile(r'^\d+$')

whole = wholeStringIsNum.search('1234567890')
print(whole)
wholeStringIsNum.search('12345xyz67890') == None
wholeStringIsNum.search('12  34567890') == None

#The last two search() calls in the previous interactive shell example demonstrate
#how the entire string must match the regex if ^ and $ are used.

#I always confuse the meanings of these two symbols,
#so I use the mnemonic “Carrots cost dollars” to remind myself that the caret
#comes first and the dollar sign comes last.


<re.Match object; span=(0, 5), match='Hello'>
<re.Match object; span=(16, 17), match='2'>
<re.Match object; span=(0, 10), match='1234567890'>


True
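
As an alternative to writing ^ and $ yourself, Regex objects also have a fullmatch() method that succeeds only when the whole string matches. The sketch below compares the two approaches; it also shows one difference worth knowing: $ will match just before a trailing newline, while fullmatch() will not. Variable names are illustrative.

```python
import re

anchored = re.compile(r'^\d+$')
digitsOnly = re.compile(r'\d+')

# Both approaches accept a string made only of digits.
print(anchored.search('1234567890') is not None)      # True
print(digitsOnly.fullmatch('1234567890') is not None)  # True

# Both reject a string with other characters in the middle.
print(anchored.search('12345xyz67890') is None)        # True
print(digitsOnly.fullmatch('12345xyz67890') is None)   # True

# They differ on a trailing newline.
print(anchored.search('123\n') is not None)            # True
print(digitsOnly.fullmatch('123\n') is None)           # True
```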

#**The Wildcard Character**#
The . (or dot) character in a regular expression is called a wildcard and will match any character except for a newline.

Remember that the dot character will match just one character, which is why the match for the text flat in the example below is only lat.

In [None]:
atRegex = re.compile(r'.at')

atRegex.findall('The cat in the hat sat on the flat mat.')

['cat', 'hat', 'sat', 'lat', 'mat']

#**Matching Everything with Dot-Star**#
Sometimes you will want to match everything and anything. For example, say you want to match the string 'First Name:', followed by any and all text, followed by 'Last Name:', and then followed by anything again. You can use the dot-star (.*) to stand in for that “anything.” Remember that the dot character means “any single character except the newline,” and the star character means “zero or more of the preceding character.”

In [None]:
import re

nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo = nameRegex.search('First Name: Al Last Name: Sweigart')
print(mo.group(1))
print(mo.group(2))

#The dot-star uses greedy mode: It will always try to match as much text as possible.
#To match any and all text in a non-greedy fashion, use the dot, star, and question mark (.*?).
#Like with braces, the question mark tells Python to match in a non-greedy way.
nongreedyRegex = re.compile(r'<.*?>')
moNongreedy = nongreedyRegex.search('<To serve man> for dinner.>')
print(moNongreedy.group())


greedyRegex = re.compile(r'<.*>')
mo = greedyRegex.search('<To serve man> for dinner.>')
print(mo.group())


Al
Sweigart
<To serve man>
<To serve man> for dinner.>


#**Matching Newlines with the Dot Character**#
The dot-star will match everything except a newline. By passing re.DOTALL as the second argument to re.compile(), you can make the dot character match all characters, including the newline character.

In [None]:
noNewlineRegex = re.compile(r'.*')
noNewlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
newlineRegex = re.compile('.*', re.DOTALL)
print(newlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group())


Serve the public trust.
Protect the innocent.
Uphold the law.
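
To make the difference visible, this sketch stores both results and prints them with repr() so the newline characters show up. The variable names are illustrative.

```python
import re

text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

# Without re.DOTALL, the dot stops at the first newline.
firstLine = re.compile(r'.*').search(text).group()

# With re.DOTALL, the dot also matches newlines, so .* spans the whole string.
everything = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(firstLine))
print(repr(everything))
```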


#**Review of Regex Symbols**#
This chapter covered a lot of notation, so here’s a quick review of what you learned about basic regular expression syntax:

The ? matches zero or one of the preceding group.

The * matches zero or more of the preceding group.

The + matches one or more of the preceding group.

The {n} matches exactly n of the preceding group.

The {n,} matches n or more of the preceding group.

The {,m} matches 0 to m of the preceding group.

The {n,m} matches at least n and at most m of the preceding group.

{n,m}? or *? or +? performs a non-greedy match of the preceding group.

^spam means the string must begin with spam.

spam$ means the string must end with spam.

The . matches any character, except newline characters.

\d, \w, and \s match a digit, word, or space character, respectively.

\D, \W, and \S match anything except a digit, word, or space character, respectively.

[abc] matches any character between the brackets (such as a, b, or c).

[^abc] matches any character that isn’t between the brackets.

#**Case-Insensitive Matching**#

Normally, regular expressions match text with the exact casing you specify.

But sometimes you care only about matching the letters without worrying whether they’re uppercase or lowercase. To make your regex case-insensitive, you can pass re.IGNORECASE or re.I as a second argument to re.compile().

In [None]:
regex1 = re.compile('RoboCop')
regex2 = re.compile('ROBOCOP')
regex3 = re.compile('robOcop')
regex4 = re.compile('RobocOp')

robocop = re.compile(r'robocop', re.I)
print(robocop.search('RoboCop is part man, part machine, all cop.').group())

print(robocop.search('ROBOCOP protects the innocent.').group())

print(robocop.search('Al, why does your programming book talk about robocop so much?').group())

RoboCop
ROBOCOP
robocop


#**Substituting Strings with the sub() Method**#
Regular expressions can not only find text patterns but can also substitute new text in place of those patterns. The sub() method for Regex objects is passed two arguments. The first argument is a string to replace any matches. The second is the string to search, in which the substitutions are made. The sub() method returns a string with the substitutions applied.


Sometimes you may need to use the matched text itself as part of the substitution. In the first argument to sub(), you can type \1, \2, \3, and so on, to mean “Enter the text of group 1, 2, 3, and so on, in the substitution.”

For example, say you want to censor the names of the secret agents by showing just the first letters of their names. To do this, you could use the regex Agent (\w)\w* and pass r'\1****' as the first argument to sub(). The \1 in that string will be replaced by whatever text was matched by group 1—that is, the (\w) group of the regular expression.

In [None]:
namesRegex = re.compile(r'Agent \w+')
namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')

import re

agentNamesRegex = re.compile(r'Agent (\w)\w*')
print(agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.'))

A**** told C**** that E**** knew B**** was a double agent.
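
Backreferences are also handy for reformatting text rather than hiding it. The sketch below, with illustrative names, rewrites hyphenated phone numbers using \1, \2, and \3, and also prints the result of a plain replacement.

```python
import re

# Plain replacement: every match becomes the same string.
namesRegex = re.compile(r'Agent \w+')
censored = namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
print(censored)

# Replacement built from the groups of each match.
phoneRegex = re.compile(r'(\d{3})-(\d{3})-(\d{4})')
reformatted = phoneRegex.sub(r'(\1) \2-\3', 'Cell: 415-555-9999 Work: 212-555-0000')
print(reformatted)
```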


#**Managing Complex Regexes**#
Regular expressions are fine if the text pattern you need to match is simple. But matching complicated text patterns might require long, convoluted regular expressions. You can mitigate this by telling the re.compile() function to ignore whitespace and comments inside the regular expression string. This “verbose mode” can be enabled by passing the variable re.VERBOSE as the second argument to re.compile().

In [None]:
#Now instead of a hard-to-read regular expression like this:

phoneRegex = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext\.)\s*\d{2,5})?)')

#you can spread the regular expression over multiple lines with comments like this:
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?             # area code
    (\s|-|\.)?                     # separator
    \d{3}                          # first 3 digits
    (\s|-|\.)                      # separator
    \d{4}                          # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})?  # extension
    )''', re.VERBOSE)


#**Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE**#

What if you want to use re.VERBOSE to write comments in your regular expression but also want to use re.IGNORECASE to ignore capitalization? Unfortunately, the re.compile() function takes only a single value as its second argument. You can get around this limitation by combining the re.IGNORECASE, re.DOTALL, and re.VERBOSE variables using the pipe character (|), which in this context is known as the bitwise or operator.

So if you want a regular expression that’s case-insensitive, lets the dot character match newlines, and allows whitespace and comments inside the pattern, you would form your re.compile() call like this:

In [None]:
someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)
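
Here is a sketch of a pattern that actually depends on all three flags. The pattern and the test string are invented for illustration.

```python
import re

labelRegex = re.compile(r'''
    first:   # the label, in any casing thanks to re.IGNORECASE
    \s*      # optional whitespace after the colon
    (.*)     # everything that follows, across newlines thanks to re.DOTALL
    ''', re.IGNORECASE | re.DOTALL | re.VERBOSE)

mo = labelRegex.search('FIRST: line one\nline two')
print(repr(mo.group(1)))
```

Dropping re.VERBOSE would make the whitespace and comments part of the pattern, so the search would no longer match this input.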

#**Project: Phone Number and Email Address Extractor**#

##*Step 1: Create a Regex for Phone Numbers*##

In [None]:
import pyperclip, re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

# TODO: Create email regex.

# TODO: Find matches in clipboard text.

# TODO: Copy results to the clipboard.


##*Step 2: Create a Regex for Email Addresses*##

In [None]:
# Create email regex.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)


##*Step 3: Find All Matches in the Clipboard Text*##

Now that you have specified the regular expressions for phone numbers and email addresses, you can let Python’s re module do the hard work of finding all the matches on the clipboard. The pyperclip.paste() function will get a string value of the text on the clipboard, and the findall() regex method will return a list of tuples.

In [None]:
!apt install PyQt4

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package PyQt4


In [None]:
# Find matches in the clipboard text
text = str(pyperclip.paste())
matches = []

# Iterate through all phone number matches
for groups in phoneRegex.findall(text):
    # Format the phone number from the area code, first 3 digits, and last 4 digits
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])

    # Add extension if present
    if groups[8] != '':
        phoneNum += ' x' + groups[8]

    # Append the formatted phone number to the matches list
    matches.append(phoneNum)

# Find email addresses in the text
for groups in emailRegex.findall(text):
    # Append the email address (the whole match) to the matches list
    matches.append(groups[0])



##*Step 4: Join the Matches into a String for the Clipboard*##
Now that you have the email addresses and phone numbers as a list of strings in matches, you want to put them on the clipboard. The pyperclip.copy() function takes only a single string value, not a list of strings, so you call the join() method on matches.

In [None]:
# Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
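
Where no clipboard is available (a hosted notebook, for example), the same extraction logic can be tried on an ordinary string. The sketch below is one way to do that: extractContacts() is a hypothetical helper, the sample text is invented, and the two regexes repeat the ones from Steps 1 and 2 with the dot in ext\. escaped.

```python
import re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)

def extractContacts(text):
    """Return formatted phone numbers, then email addresses, found in text."""
    matches = []
    for groups in phoneRegex.findall(text):
        phoneNum = '-'.join([groups[1], groups[3], groups[5]])
        if groups[8] != '':
            phoneNum += ' x' + groups[8]
        matches.append(phoneNum)
    for groups in emailRegex.findall(text):
        matches.append(groups[0])
    return matches

sample = 'Call 415-555-4242 x99 or 212.555.0000, or write to al@example.com.'
print('\n'.join(extractContacts(sample)))
```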


##*Date Detection*##
Write a regular expression that can detect dates in the DD/MM/YYYY format. Assume that the days range from 01 to 31, the months range from 01 to 12, and the years range from 1000 to 2999. Note that if the day or month is a single digit, it’ll have a leading zero.

The regular expression doesn’t have to detect correct days for each month or for leap years; it will accept nonexistent dates like 31/02/2020 or 31/04/2021. Then store these strings into variables named month, day, and year, and write additional code that can detect if it is a valid date. April, June, September, and November have 30 days, February has 28 days, and the rest of the months have 31 days. February has 29 days in leap years. Leap years are every year evenly divisible by 4, except for years evenly divisible by 100, unless the year is also evenly divisible by 400. Note how this calculation makes it impossible to make a reasonably sized regular expression that can detect a valid date.

In [None]:
import re

# The trailing $ rejects strings with extra characters after the year, such as '31/12/20225'.
date_pattern = r'^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/([12][0-9]{3})$'

def is_valid_date(day, month, year):
    # Check for leap year
    is_leap_year = (year % 4 == 0 and year % 100 != 0) or year % 400 == 0

    # Check for valid month and day
    max_days = [31, 28 + is_leap_year, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    if month < 1 or month > 12 or day < 1 or day > max_days[month - 1]:
        return False

    return True

# Example usage
date_str = "31/12/2022"
match = re.match(date_pattern, date_str)

if match:
    day, month, year = map(int, match.groups())
    if is_valid_date(day, month, year):
        print(f"Valid date: {date_str}")
    else:
        print(f"Invalid date: {date_str}")
else:
    print(f"Invalid date format: {date_str}")

Valid date: 31/12/2022
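
The leap-year rule is the easiest part to get wrong, so it helps to run a few edge cases. The sketch below wraps the same logic in a hypothetical checkDate() helper that returns a short label; the test dates are chosen to exercise the divisible-by-4, by-100, and by-400 rules.

```python
import re

dateRegex = re.compile(r'^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/([12][0-9]{3})$')

def checkDate(dateStr):
    """Return 'bad format', 'invalid', or 'valid' for a DD/MM/YYYY string."""
    mo = dateRegex.match(dateStr)
    if mo is None:
        return 'bad format'
    day, month, year = map(int, mo.groups())
    isLeap = (year % 4 == 0 and year % 100 != 0) or year % 400 == 0
    maxDays = [31, 29 if isLeap else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return 'valid' if day <= maxDays[month - 1] else 'invalid'

for s in ['29/02/2024', '29/02/2023', '29/02/1900', '29/02/2000', '31/04/2021', '1/1/2020']:
    print(s, '->', checkDate(s))
```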


##*Strong Password Detection*##
Write a function that uses regular expressions to make sure the password string it is passed is strong. A strong password is defined as one that is at least eight characters long, contains both uppercase and lowercase characters, and has at least one digit. You may need to test the string against multiple regex patterns to validate its strength.

In [6]:
import re

def is_strong_password(password):
    # Check length (at least 8 characters)
    if len(password) < 8:
        return False

    # Check for at least one uppercase letter
    if not re.search(r'[A-Z]', password):
        return False

    # Check for at least one lowercase letter
    if not re.search(r'[a-z]', password):
        return False

    # Check for at least one digit
    if not re.search(r'\d', password):
        return False

    # If all conditions are met, the password is strong
    return True

password = "mystrngpassword"
if is_strong_password(password):
    print("The password is strong.")
else:
    print("The password is not strong.")

The password is not strong.
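
A variation on the same idea reports which requirements a password fails instead of returning only True or False. This is a sketch, not the exercise's required solution: the REQUIREMENTS table and missingRequirements() are illustrative names, and the sample passwords are made up.

```python
import re

# One regex per requirement from the exercise.
REQUIREMENTS = {
    'at least 8 characters': re.compile(r'.{8,}', re.DOTALL),
    'an uppercase letter': re.compile(r'[A-Z]'),
    'a lowercase letter': re.compile(r'[a-z]'),
    'a digit': re.compile(r'\d'),
}

def missingRequirements(password):
    """Return the names of the requirements the password does not meet."""
    return [name for name, regex in REQUIREMENTS.items()
            if not regex.search(password)]

for pw in ['mystrngpassword', 'Sh0rt', 'Str0ngEnough']:
    print(pw, '->', missingRequirements(pw) or 'strong')
```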
