### Simple Patterns

### Matching Characters

. ^ $ * + ? { } [ ] \ | ( )

* . - matches anything except for a newline
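* ^ - inside a character class, complements the set (a.k.a. "except") when it is the first character, e.g. [^0-9]. outside a class, it matches at the start of the string
* $ - matches at the end of the string (or just before a trailing newline)
* \* - repeats the previous item zero or more times
* \+ - another repeating metacharacter, which matches the previous item one or more times
* ? - another repeating metacharacter, which matches the previous item either once or zero times
* { } - {m,n} repeats the previous item at least m and at most n times; {m} repeats it exactly m times
* [ ] - used to specify a set of characters to match, i.e. a character class. most metacharacters are not active inside classes; the exceptions are the backslash, ^ (when first), - (between two characters) and ]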
* \ - perhaps the most important metacharacter. it can be followed by various characters to signal various special sequences. it is also used to escape all the metacharacters. 
 * \d matches any decimal digit - same as [0-9]
 * \D matches any non-digit character - same as [^0-9]
 * \s matches any whitespace character - same as [ \t\n\r\f\v]
 * \S matches any non-whitespace character - same as [^ \t\n\r\f\v]
 * \w matches any alphanumeric character - same as [a-zA-Z0-9_]
 * \W matches any non-alphanumeric character - same as [^a-zA-Z0-9_].
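* | - alternation, a.k.a. "or". A|B matches either A or B
* ( ) - groups part of a pattern. a group can be repeated as a whole, and the text it matches is captured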
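
A quick way to check the list above is to run each metacharacter against a small string. This is only a sketch: the patterns and sample strings below are made up for illustration and do not come from the notes.

```python
import re

# Each pair is (pattern, sample text); both are hypothetical examples.
examples = [
    (r'c.t', 'cat cot c\nt'),         # . matches any character except a newline
    (r'[^0-9]+', 'abc123def'),        # ^ as first character in a class complements the set
    (r'^abc', 'abcabc'),              # ^ outside a class anchors at the start
    (r'abc$', 'abcabc'),              # $ anchors at the end
    (r'ab*', 'a ab abbb'),            # * repeats zero or more times
    (r'ab+', 'a ab abbb'),            # + repeats one or more times
    (r'ab?', 'a ab abbb'),            # ? repeats zero or one time
    (r'b{2,3}', 'b bb bbbb'),         # {m,n} repeats between m and n times
    (r'cat|dog', 'cat dog bird'),     # | alternation
    (r'(\d+)-(\d+)', '10-20 30-40'),  # ( ) groups; findall returns the captured groups
]

results = {pattern: re.findall(pattern, text) for pattern, text in examples}
for pattern, found in results.items():
    print(pattern, found)
```

Note that once a pattern contains capturing groups, findall returns the groups (as tuples when there is more than one) instead of the whole match.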




A step-by-step example will make this more obvious. Let’s consider the expression a[bcd]*b. This matches the letter 'a', zero or more letters from the class [bcd], and finally ends with a 'b'. Now imagine matching this RE against the string 'abcbd'.

![sample](assets/image_1.png)
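
The same walk-through can be reproduced in code. The loop below is a rough hand emulation of the backtracking, not how the re engine is actually implemented, and the names run, steps and candidate are made up for this sketch.

```python
import re

pattern = re.compile(r'a[bcd]*b')
text = 'abcbd'

# Greedy step: 'a' followed by the longest possible run of [bcd].
run = re.match(r'a([bcd]*)', text).group(1)

# Backtracking step: give characters back one at a time until
# the final 'b' of the pattern can match right after the run.
steps = []
for keep in range(len(run), -1, -1):
    candidate = 'a' + run[:keep]
    fits = text.startswith(candidate + 'b')
    steps.append((candidate, fits))
    if fits:
        break

for candidate, fits in steps:
    print(f'{candidate!r:8} then b -> {"match" if fits else "fail"}')

# The real engine ends up at the same place.
m = pattern.match(text)
print(m.group(), m.span())
```

The run first swallows 'bcbd', which leaves nothing for the final 'b'. Only after it shrinks to 'bc' does the 'b' fit, so the overall match is 'abcb'.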



In [1]:
import re
p1=re.compile('ab*')
p2=re.compile('[a-z]+')

p1.match('abc')

<re.Match object; span=(0, 2), match='ab'>

![sample2](assets/image_2.png)

In [2]:
m1=p1.match('tempo')
print(m1)
m2=p2.match('tempo')
print(m2)

None
<re.Match object; span=(0, 5), match='tempo'>


In [3]:
p=re.compile(r'\d+')  # raw string, so \d is not treated as a string escape
p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')

['12', '11', '10']

![sample3](assets/image_3.png)

In [8]:
p=re.compile(r'\d+')  # raw string, so \d is not treated as a string escape
p.search('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')

<re.Match object; span=(0, 2), match='12'>

In [18]:
# r'' (raw), not f'': in an f-string \b is a backspace and {2} is formatted as the text 2
re.findall(r'\b\w[aeiou]{2}\w\b', 'testfdsafdsa')

[]

In [39]:
pattern=re.compile(r'\w+[aeiou]{2}\w+')
pattern.findall('testfdsafdsa fdsaarwn fdsakll fdsakd abc abdee')

['fdsaarwn']

In [27]:
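# pattern here came from an earlier run of the notebook; r'\w[aeiou]\w' is a guess that reproduces the output below
re.findall(r'\w[aeiou]\w', 'testfdsafdsa fdsaarwn fdsakll')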

['tes', 'saf', 'saa', 'sak']

In [42]:
pattern=re.compile(r'\w*\b')
pattern.findall('testfdsafdsa fdsaarwn fdsakll fdsakd abc abdee')

['testfdsafdsa',
 '',
 'fdsaarwn',
 '',
 'fdsakll',
 '',
 'fdsakd',
 '',
 'abc',
 '',
 'abdee',
 '']

In [49]:
pattern=re.compile(r'\w*\w')
pattern.findall('testfdsafdsa fdsaarwn fdsakll fdsakd abc abdee')

['testfdsafdsa', 'fdsaarwn', 'fdsakll', 'fdsakd', 'abc', 'abdee']

In [78]:
# pattern=re.compile(r'[aeiou]*')

pattern=re.compile(r'[a-z]{2,10}')
pattern.findall('testfdsafdsa fFDsaarwn fdsakll fdsakd abc abdee')

['testfdsafd', 'sa', 'saarwn', 'fdsakll', 'fdsakd', 'abc', 'abdee']

In [81]:
pattern=re.compile(r'a?b')
# pattern.findall('testfdsafdsa fFDsaarwn fdsakll fdsakd abc abdee')
pattern.findall('b ab aab')

['b', 'ab', 'ab']

In [100]:
pattern=re.compile(r'.*\b')
# pattern.findall('testfdsafdsa fFDsaarwn fdsakll fdsakd abc abdee')


['b ab', 'aab']

In [104]:
pattern=re.compile(r'.*pattern')
pattern.findall('string_with_pattern')

['string_with_pattern']

In [108]:
pattern=re.compile(r't')
pattern.findall('string_with_pattern')

['t', 't', 't', 't']

In [107]:
pattern=re.compile(r'str')
pattern.findall('string_with_pattern')

['str']

In [390]:
re.searchall(r'[0-9]', 'abc123')

AttributeError: module 're' has no attribute 'searchall'

In [116]:
re.findall(r'[0-9]', 'abc123 def456')

['1', '2', '3', '4', '5', '6']

In [623]:
re.findall(r'\b[0-9]+.*', 'abc123 def456 789ghi')

['789ghi']

In [631]:
re.findall(r'\b[0-9]+.*\b', 'abc123 def456 789ghi')

['789ghi']

In [392]:
re.findall(r'[1-9][0-9]*', 'abc0123 def456 789ghi')

['123', '456', '789']

In [149]:
re.findall(r'\w+\.(gif|png|jpg|jpeg)', 'the file name is image_1.png')

['png']

In [157]:
re.findall(r'\.(gif|png|jpg|jpeg)', 'file names: image_1.png and image_2.jpeg')

['png', 'jpeg']

In [194]:
re.findall(r"\S+\.(?:png|jpeg)", 'file names: image_1.png and image_2.jpeg')

['image_1.png', 'image_2.jpeg']

In [210]:
# escape the dot: a bare . would match any character, not just a literal '.'
re.findall(r"\w+\.(?:png|jpeg)", 'file names: image_1.png and image_2.jpeg')

['image_1.png', 'image_2.jpeg']

In [206]:
re.findall(r'[1-9](?:[0-9]*)', 'abc0123 def456 789ghi')

['123', '456', '789']

In [325]:
re.findall(r'https://\S*/', 'addresses: https://www.google.com/ and https://www.bad address.com/')

['https://www.google.com/']

In [595]:
re.findall(r'\w+(?:[.-]?\w+)*@\w+(?:\.[A-Za-z]{2,3})+(?![A-Za-z])', 'email addresses: joe@yahoo.com.uk john.smith@gmail.com jane.a.smith@company.net jeff@hotmail.orgg')

['joe@yahoo.com.uk', 'john.smith@gmail.com', 'jane.a.smith@company.net']

In [405]:
re.findall(r'(\w+)\1\1', 'hahaha')

['ha']

In [471]:
re.findall(r'(\w+)\1\1', 'hahahahehehe hahaha hehehe aaa')

['ha', 'he', 'ha', 'he', 'a']

In [587]:
[groups[0] for groups in re.findall(r'((.+)\2\2)', 'hahahahehehe hahaha hehehe aaa')]

['hahaha', 'hehehe', 'hahaha', 'hehehe', 'aaa']

In [474]:
re.findall(r'(?:(\w+)\1\1)', 'hahaha')

['ha']

In [586]:
re.findall(r'(?:(.+)\1\1)', 'hahahahehehe hahaha hehehe aaa')

['ha', 'he', 'ha', 'he', 'a']

In [585]:
re.findall(r'((.+)\2\2)', 'hahaha')

[('hahaha', 'ha')]

In [481]:
re.findall(r'[a-zA-Z]*', 'Password2020!')

['Password', '', '', '', '', '', '']

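In [567]:
# findall on the whole string: each lookahead scans to the end of the string, so a
# later word can satisfy a requirement that the current word does not meet
groups=re.findall(r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}", 'Password2020! Pwd20! password2020! Password2020 PASSWORD2020!')
groups

['Password2020!', 'password2020!', 'Password2020']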

In [577]:
passwords=['Password2020!', 
           'Pwd20!', 
           'password2020!', 
           'Password2020', 
           'PASSWORD2020!']
validation_pattern=re.compile(r'(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}')
for each_password in passwords: 
    print(re.match(validation_pattern, each_password))

<re.Match object; span=(0, 13), match='Password2020!'>
None
None
None
None


In [575]:
for each_group in groups: 
    print(each_group)

Password2020!
password2020!
Password2020


In [591]:
hands=['a594k', 
       'jkq61', 
       '7a79t', 
       'aj9ta', 
       '2376q']
validation_pattern=re.compile(r'[a2-9tjqk]*([a2-9tjqk])[a2-9tjqk]*\1')
for each_hand in hands: 
    print(re.match(validation_pattern, each_hand))

None
None
<re.Match object; span=(0, 3), match='7a7'>
<re.Match object; span=(0, 5), match='aj9ta'>
None


In [618]:
re.findall(r'\b[^0-9]+\b', '123abc def456 7g8h9i jkl')

[' ', ' ', ' jkl']

In [619]:
re.findall(r'\b[A-Za-z]+\b', '123abc def456 7g8h9i jkl')

['jkl']