## Learn Python Regular Expressions by Example


### Example # 1

In [1]:
import re
 

phrase = """There's a lotta hear to learn there's a whole lotta here to learn 
and there's a whole lotta hear to learnnnnnnnnnnnn! But I yearn I gotta learn!
"""
regex = re.compile('lot')
regex.search(phrase)


<re.Match object; span=(10, 13), match='lot'>

## What's Exactly Happening

- The re module is imported. The name re is short for "regular expression"
- A variable phrase is created that holds some sample text
- The pattern 'lot' is compiled into a regex object with re.compile
- The compiled pattern's search method is called on the phrase. It finds the first match at span (10, 13): the match starts at index 10 and ends just before index 13
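
The match object returned by search can be inspected directly. Here is a minimal sketch that reuses a shortened version of the phrase from Example 1; the variable name match is only illustrative.

```python
import re

phrase = "There's a lotta hear to learn"
regex = re.compile('lot')      # compile once, reuse as often as needed
match = regex.search(phrase)   # first match, or None when nothing matches

if match is not None:
    print(match.group())       # the matched text
    print(match.start())       # index of the first matched character
    print(match.end())         # index just past the last matched character
    print(match.span())        # (start, end) as a tuple
    # slicing the original string with the span gives back the matched text
    print(phrase[match.start():match.end()])
```

Checking for None before using the result matters, because search returns None rather than raising an error when the pattern is not found.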

## Common regex functions in Python

- **search**(pattern, string, flags=0) -> scans through the string and returns the first match found anywhere in it
- **match**(pattern, string, flags=0) -> returns a match only if the pattern matches at the BEGINNING of the string
- **fullmatch**(pattern, string, flags=0) -> returns a match only if the WHOLE string matches the pattern
- **split**(pattern, string, maxsplit=0, flags=0) -> splits the string at each occurrence of the pattern and returns a list
- **findall**(pattern, string, flags=0) -> returns a list of all non-overlapping matches of the pattern in the string
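
The maxsplit and flags parameters, and the "non-overlapping" behaviour of findall, are easier to see on small inputs. The strings in this sketch are made up for illustration.

```python
import re

csv_line = 'red,green,blue,yellow'

# maxsplit=0 (the default) means "no limit on the number of splits"
all_parts = re.split(',', csv_line)
# maxsplit=1 stops after the first split and leaves the rest untouched
first_and_rest = re.split(',', csv_line, maxsplit=1)

# findall scans left to right and never reuses a character,
# so 'aa' is found twice in 'aaaa', not three times
pairs = re.findall('aa', 'aaaa')

# flags adjust how the pattern is matched; re.IGNORECASE ignores letter case
greens = re.findall('green', 'Green GREEN green', flags=re.IGNORECASE)

print(all_parts)
print(first_and_rest)
print(pairs)
print(greens)
```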

### Example # 2

In [2]:
the_eagle_poem = """
He clasps the crag with crooked hands;
Close to the sun in lonely lands,
Ring'd with the azure world, he stands.

The wrinkled sea beneath him crawls;
He watches from his mountain walls,
And like a thunderbolt he falls.
"""

find_word = re.search('azure', the_eagle_poem)  # <re.Match object; span=(90, 95), match='azure'>
print(find_word)

print(re.match('walls', the_eagle_poem))        # None: 'walls' is not at the start of the string
print(re.match('', the_eagle_poem))             # <re.Match object; span=(0, 0), match=''>
# the poem works as its own pattern only because its '.' characters also match a literal '.'
print(re.fullmatch(the_eagle_poem, the_eagle_poem))  # <re.Match object; span=(0, 221) ...>
print(re.split(';',
               the_eagle_poem))                  # ['\nHe clasps the crag with crooked hands', "\nClose to the sun in lonely lands...]
print(re.findall('the', the_eagle_poem))         # ['the', 'the', 'the']


<re.Match object; span=(90, 95), match='azure'>
None
<re.Match object; span=(0, 0), match=''>
<re.Match object; span=(0, 221), match="\nHe clasps the crag with crooked hands;\nClose t>
['\nHe clasps the crag with crooked hands', "\nClose to the sun in lonely lands,\nRing'd with the azure world, he stands.\n\nThe wrinkled sea beneath him crawls", '\nHe watches from his mountain walls,\nAnd like a thunderbolt he falls.\n']
['the', 'the', 'the']


## What’s Happening Here

- Notice that match returned None for 'walls' even though that word appears in the string. The match function only looks at the beginning of the string, and 'walls' is not there
- The findall() function returns a list containing every non-overlapping match of the pattern
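
The difference between match and search shows up clearly on a single line of the poem. This is a small sketch; the variable names are illustrative.

```python
import re

line = 'He watches from his mountain walls,'

at_start = re.match('He', line)          # 'He' sits at index 0, so this matches
not_at_start = re.match('walls', line)   # 'walls' is present, but not at index 0
anywhere = re.search('walls', line)      # search tries every position in turn

# findall returns plain strings rather than match objects
wa_hits = re.findall('wa', line)

print(at_start)
print(not_at_start)
print(anywhere.span())
print(wa_hits)
```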

### Example # 3

In [9]:
phrase = 'abcdefghijklmnopqrstuvwxyz'
capital_letters = 'ABCDEFGHABCDEFGHABCDEFGH'
print(re.search('h', phrase))  # <re.Match object; span=(7, 8), match='h'>
print(re.search('b', capital_letters))  # None
print(re.search('b', capital_letters, re.IGNORECASE))  # <re.Match object; span=(1, 2), match='B'>
print(re.findall('C', capital_letters))  # ['C', 'C', 'C']

<re.Match object; span=(7, 8), match='h'>
None
<re.Match object; span=(1, 2), match='B'>
['C', 'C', 'C']


## What’s Happening Here

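- The only new feature here is the re.IGNORECASE flag, which makes matching case-insensitive: the pattern 'b' finds nothing in the all-capital string on its own, but matches 'B' once the flag is passed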
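
The flag can be passed per call or attached to a compiled pattern. This sketch reuses the capital_letters string from Example 3; re.I is the standard short alias for re.IGNORECASE.

```python
import re

capital_letters = 'ABCDEFGHABCDEFGHABCDEFGH'

# without the flag, lowercase 'b' never matches uppercase 'B'
case_sensitive = re.findall('b', capital_letters)
# with the flag, every 'B' is found
case_insensitive = re.findall('b', capital_letters, re.IGNORECASE)

# the flag can also be baked into a compiled pattern
gh_pattern = re.compile('gh', re.I)
gh_hits = gh_pattern.findall(capital_letters)

print(case_sensitive)
print(case_insensitive)
print(gh_hits)
```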

### Example # 4

In [5]:

alpha_pattern = '0123456789abcdefgh'

print(re.search('5', alpha_pattern))                    # <re.Match object; span=(5, 6), match='5'>
print(re.search('a', alpha_pattern))                    # <re.Match object; span=(10, 11), match='a'>

alpha_pattern_1 = '[0-9][a-z]'                          # any digit followed by a lowercase letter
print(re.match(alpha_pattern_1, '0h'))                   # <re.Match object; span=(0, 2), match='0h'>

print(re.search('[a-z]', '9920202022j22929'))       # <re.Match object; span=(10, 11), match='j'>
print(re.search('[A-Z]', 'abcdefghijklmn83nOjsksZ'))  # <re.Match object; span=(17, 18), match='O'>
# raw string; -, [, ] and \ are escaped so they stay literal members of one character class
print(re.search(r'[!@#$%^&*()\-+{}\[\]|\\;"<>?]',
                'zvgsggs272292hkOwuyeg%ss'))  # <re.Match object; span=(21, 22), match='%'>
print(re.search('[a-zA-Z]', '22838828289020932;/asksk'))  # <re.Match object; span=(19, 20), match='a'>


<re.Match object; span=(5, 6), match='5'>
<re.Match object; span=(10, 11), match='a'>
<re.Match object; span=(0, 2), match='0h'>
<re.Match object; span=(10, 11), match='j'>
<re.Match object; span=(17, 18), match='O'>
<re.Match object; span=(21, 22), match='%'>
<re.Match object; span=(19, 20), match='a'>


## What’s Happening Here

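- This example introduces character classes: [0-9] matches any single digit, [a-z] any lowercase letter, and [A-Z] any uppercase letter
- Classes can be merged, as in [a-zA-Z], or placed one after another, as in [0-9][a-z] (a digit followed by a lowercase letter)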
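
To see which strings a pair of character classes accepts, it helps to run one pattern against several candidates. The candidate strings in this sketch are made up for illustration.

```python
import re

digit_then_lower = '[0-9][a-z]'   # one digit followed by one lowercase letter

candidates = ['0h', 'h0', '7Q', '42', '3x!']

# match only checks the beginning of each string,
# so trailing characters such as the '!' in '3x!' are ignored
results = {text: re.match(digit_then_lower, text) is not None
           for text in candidates}

for text, matched in results.items():
    print(text, matched)
```

Only '0h' and '3x!' start with a digit followed by a lowercase letter, so only those two report True.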

### Example # 5

In [7]:

print(re.search('[a-z]{11}',
                '2517abcd17171179Abs20abracadabra2298'))  # <re.Match object; span=(21, 32), match='abracadabra'>
print(re.search('[a-z]{1,5}', '2517abcde17171179'))       # <re.Match object; span=(4, 9), match='abcde'>
print(re.search('[a-z]{1,5}[0-9]{3}', 'c999'))  # <re.Match object; span=(0, 4), match='c999'>
print(re.search('[a-z]{1,5}[0-9]{3}', 'd17'))  # None. Why?
print()


<re.Match object; span=(21, 32), match='abracadabra'>
<re.Match object; span=(4, 9), match='abcde'>
<re.Match object; span=(0, 4), match='c999'>
None



## What’s Happening Here

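- Curly braces {} control how many times the preceding element repeats: {11} means exactly 11 times, and {1,5} means between 1 and 5 times
- The search on 'd17' returns None because [0-9]{3} requires exactly three digits and 'd17' has only two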
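
The effect of the repetition counts can be checked by running the same pattern over a few inputs. This sketch uses the pattern from Example 5; the extra test strings 'd175' and 'abcdef123' are assumptions added for illustration.

```python
import re

pattern = '[a-z]{1,5}[0-9]{3}'   # 1 to 5 lowercase letters, then exactly 3 digits

matched = {}
for text in ['c999', 'd17', 'd175', 'abcdef123']:
    found = re.search(pattern, text)
    # store the matched text, or None when the pattern does not fit
    matched[text] = found.group() if found else None

for text, result in matched.items():
    print(text, '->', result)
```

In 'abcdef123' there are six letters before the digits, so the match cannot start at 'a'; search moves on and matches 'bcdef123' starting at the second character.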


### Common Regex Character Classes

- \d -> any digit
- \D -> any non-digit character
- \w -> any word character (a letter, a digit, or the underscore)
- \W -> any non-word character
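
Running findall with each class on one short string shows how they partition the characters. The sample string is made up for illustration, and the patterns are written as raw strings (r'...') so that Python passes the backslash through to the regex engine unchanged.

```python
import re

sample = 'id_42: ok!'

digits = re.findall(r'\d', sample)          # digits only
non_digits = re.findall(r'\D', sample)      # everything that is not a digit
word_chars = re.findall(r'\w', sample)      # letters, digits and the underscore
non_word_chars = re.findall(r'\W', sample)  # punctuation and whitespace here

print(digits)
print(non_digits)
print(word_chars)
print(non_word_chars)
```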


### Example # 6

In [24]:
# raw strings (r'...') keep the backslash intact for the regex engine
print(re.search(r'\d', 'jsjjsk273829BHAja'))  # <re.Match object; span=(6, 7), match='2'>
print(re.search(r'\d\d', '2j8k2c8c34m3ma1'))  # <re.Match object; span=(8, 10), match='34'>
print(re.search(r'\d{3}-\d{3}-\d{4}', '230-392-9327'))  # <re.Match object; span=(0, 12), match='230-392-9327'>

# \w matches any alphanumeric character and the underscore
print(re.search(r'\w', ';\';,.,.,.283jsns9'))  # <re.Match object; span=(9, 10), match='2'>
print(re.search(r'\w\w\w', '9d1'))  # <re.Match object; span=(0, 3), match='9d1'>

<re.Match object; span=(6, 7), match='2'>
<re.Match object; span=(8, 10), match='34'>
<re.Match object; span=(0, 12), match='230-392-9327'>
<re.Match object; span=(9, 10), match='2'>
<re.Match object; span=(0, 3), match='9d1'>


## What’s Happening Here

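- This example uses two shorthand character classes: \d for digits and \w for word characters (letters, digits, and the underscore)
- The pattern \d{3}-\d{3}-\d{4} combines \d with curly-brace repetition to match a phone number shaped like 230-392-9327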
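
The phone-number pattern can be wrapped in a small validator. This is only a sketch: the helper name looks_like_phone and the NNN-NNN-NNNN format are assumptions for illustration, not a general phone-number check.

```python
import re

# hypothetical format: three digits, three digits, four digits, dash separated
PHONE = re.compile(r'\d{3}-\d{3}-\d{4}')

def looks_like_phone(text):
    """Return True when the whole string has the NNN-NNN-NNNN shape."""
    # fullmatch rejects strings with anything before or after the number
    return PHONE.fullmatch(text) is not None

print(looks_like_phone('230-392-9327'))
print(looks_like_phone('230-392-932'))
print(looks_like_phone('call 230-392-9327'))
```

Using fullmatch instead of search is a deliberate choice here: search would accept 'call 230-392-9327' because the pattern appears somewhere inside it.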


### Common Regex Symbols

| Symbol | Meaning |
|--|--|
| \* | 0 or more repetitions of the preceding element |
| + | 1 or more repetitions of the preceding element |
| ? | 0 or 1 repetitions of the preceding element |
| . | matches any character except a newline |
| $ | matches the end of the string, or just before a newline at the end of the string |
| ^ | matches the start of the string |

In [30]:
# the * operator matches 0 or more repetitions of the preceding element
print(re.search(r'\d*', '28392202'))  # <re.Match object; span=(0, 8), match='28392202'>

# the + operator matches 1 or more repetitions of the preceding element
print(re.search('aba+', 'abaaaaaa'))  # <re.Match object; span=(0, 8), match='abaaaaaa'>

# the ? causes the regex to match 0 or 1 repetitions of the preceding element
print(re.search('abc?', 'ab'))  # matches 'ab'
print(re.search('abc?', 'abc')) # matches 'abc'. What about ac or bc?

# the . matches any character except a newline
print(re.search('.{3}', '/,@'))  # <re.Match object; span=(0, 3), match='/,@'>

# ^ matches the start of the string; $ matches the end, or just before a trailing newline
print(re.search(r'^\d\d\d$', '278'))  # <re.Match object; span=(0, 3), match='278'>


<re.Match object; span=(0, 8), match='28392202'>
<re.Match object; span=(0, 8), match='abaaaaaa'>
<re.Match object; span=(0, 2), match='ab'>
<re.Match object; span=(0, 3), match='abc'>
<re.Match object; span=(0, 3), match='/,@'>
<re.Match object; span=(0, 3), match='278'>
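
The question about 'ac' and 'bc' can be settled by running the pattern, along with a few edge cases for the other symbols. The extra inputs in this sketch are assumptions added for illustration.

```python
import re

# 'abc?' requires 'ab'; only the trailing 'c' is optional
optional_c = {text: re.search('abc?', text) is not None
              for text in ['ab', 'abc', 'ac', 'bc']}
print(optional_c)

# ^ and $ together pin the pattern to the whole string
exactly_three = r'^\d\d\d$'
three_ok = re.search(exactly_three, '278')
four_fails = re.search(exactly_three, '2789')
# $ also matches just before a single trailing newline
with_newline = re.search(exactly_three, '278\n')
print(three_ok, four_fails, with_newline)

# * allows zero repetitions, so it can succeed with an empty match
empty = re.search(r'\d*', 'abc')
print(empty.span(), repr(empty.group()))
```

Neither 'ac' nor 'bc' matches, because the ? applies only to the 'c' and the 'ab' prefix is still mandatory.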


In [None]:
# Regex Lab
