This project explores Python regular expressions through the re library.

**Agenda**:
* Regular expressions in Python
* Basic characters: ordinary characters
* Wild card characters: special characters
* Repetitions
* Groups and grouping using regular expressions
* Greedy vs non-greedy matching
* re Python library: search() vs match()

**Regular expressions in Python**

In [1]:
import re

**Basic characters: ordinary characters**

In [2]:
pattern = r"Cookie"
#the r prefix marks a raw string literal, so backslashes are passed to re unchanged
sequence = "Cookie"
if re.match(pattern, sequence):
    print("match!")
else:
    print("not match!")

match!


**Wild card characters: special characters**

".": matches any single character except newline character

In [4]:
re.search(r'Co.k.e', 'Cookie').group()
#group() returns the string matched by re

'Cookie'

"\w": matches any single letter, digit or underscore

In [5]:
re.search(r'Co\wk\we','Cookie').group()

'Cookie'

"\W": matches any character not part of \w

In [6]:
re.search(r'C\Wke','C@ke').group()

'C@ke'

"\s": matches a single whitespace character like space, newline, tab, return

In [7]:
re.search(r'Eat\scake', 'Eat cake').group()

'Eat cake'

"\S": matches any character not part of \s

In [8]:
re.search(r'Cook\Se', 'Cookie').group()

'Cookie'

"\t": matches tab

"\n": matches newline

"\r": matches return

"\d": matches decimal digit 0-9

In [11]:
re.search(r'c\d\dkie','c00kie').group()

'c00kie'

"^": matches a pattern at the start of the string

In [12]:
re.search(r'^Eat','Eat cake').group()

'Eat'

"$": matches a pattern at the end of string

In [13]:
re.search(r'cake$','Eat cake').group()

'cake'

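"[...]": matches any single character in the set; a hyphen gives a range such as 0-6, and a leading "^" negates the set

In [14]: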
re.search(r'num: [0-6]','num: 5').group()

'num: 5'

In [15]:
#[^5] matches any single character except 5
re.search(r'num: [^5]','num: 0').group()

'num: 0'

"\A": matches only at the start of the string

In [16]:
re.search(r'\A[A-E]ookie','Cookie').group()

'Cookie'
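On a single-line string "^" and "\A" behave the same. The difference shows up with the re.MULTILINE flag. The sketch below uses a hypothetical two-line string to illustrate it:

```python
import re

text = "Eat cake\nCookie jar"  # hypothetical two-line sample

# With re.MULTILINE, ^ also matches right after each newline...
caret = re.search(r'^Cookie', text, re.MULTILINE)
# ...while \A still matches only at the very start of the string.
anchor = re.search(r'\ACookie', text, re.MULTILINE)

print(caret.group())
print(anchor)
```

Here the "^" search finds 'Cookie' at the start of the second line, while the "\A" search returns None.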

**Repetitions**

"+": checks for one or more characters to its left

"*": checks for zero or more characters to its left

"?": checks for zero or one character to its left

"{x}": repeat x number of times

"x,": repeat at least x times or more

"x,y": repeat at least x times but no more than y times

**Groups and grouping using regular expressions**

In [24]:
email_address = 'Please contact us at: support@datacamp.com'
match = re.search(r'([\w\.-]+)@([\w\.-]+)', email_address)
if match:
    print(match.group())
    print(match.group(1))
    print(match.group(2))

support@datacamp.com
support
datacamp.com
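As a possible variation on the cell above, all captured groups can be fetched at once rather than one index at a time. The sketch reuses the same pattern and string; the names user and domain are introduced here for illustration only:

```python
import re

email_address = 'Please contact us at: support@datacamp.com'
match = re.search(r'([\w\.-]+)@([\w\.-]+)', email_address)
if match:
    # groups() returns a tuple with every captured group, in order
    user, domain = match.groups()
    print(user)
    print(domain)
    # group() also accepts several indices and then returns a tuple
    print(match.group(1, 2))
```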


**Greedy vs non-greedy matching**

greedy match: a quantifier such as "*" or "+" matches as much of the search sequence as possible

In [25]:
heading = r'<h1>TITLE</h1>'
re.match(r'<.*>', heading).group()

'<h1>TITLE</h1>'

append "?" to the quantifier to perform a non-greedy match, which matches as little as possible

In [26]:
re.match(r'<.*?>', heading).group()

'<h1>'

**re Python library**

match(): checks for a match only at the beginning of the string

search(): checks for a match anywhere in the string
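The difference between the two functions is easiest to see when the pattern occurs somewhere other than the start of the string. A minimal sketch, with a sample string chosen for illustration:

```python
import re

sequence = "cake and cookie"  # illustrative sample string

at_start = re.match(r'cookie', sequence)   # None: the string does not start with 'cookie'
anywhere = re.search(r'cookie', sequence)  # finds 'cookie' later in the string

print(at_start)
print(anywhere.group())
print(anywhere.span())  # (start, end) indices of the match
```

Because match() returns None here, calling .group() on its result directly would raise an AttributeError, so checking the result first is a reasonable habit.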

findall(): finds all non-overlapping matches and returns them as a list of strings (a list of tuples if the pattern contains more than one group)

In [27]:
email_address = "Please contact us at: support@datacamp.com, xyz@datacamp.com"
addresses = re.findall(r'[\w\.-]+@[\w\.-]+', email_address)
for address in addresses:
    print(address)

support@datacamp.com
xyz@datacamp.com


sub(pattern, repl, string): replaces every match of pattern in string with repl and returns the new string

In [29]:
email_address = "Please contact us at: xyz@datacamp.com"
new_email_address = re.sub(r'[\w\.-]+@[\w\.-]+', r'support@datacamp.com', email_address)
print(new_email_address)

Please contact us at: support@datacamp.com


compile(): compiles a pattern into a pattern object that can be saved and reused

In [33]:
pattern = re.compile(r"cookie")
sequence = "cake and cookie"
pattern.search(sequence).group()

#equivalent call: re.search(pattern, sequence).group()

'cookie'
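A compiled pattern pays off when the same expression is applied to many strings. The following sketch assumes a small hypothetical list of inputs:

```python
import re

pattern = re.compile(r"cookie")

# Hypothetical inputs; only the second one lacks the word 'cookie'.
sequences = ["cake and cookie", "cake and pie", "cookie jar"]

# The compiled object exposes search() directly, so the pattern is written once
hits = [s for s in sequences if pattern.search(s)]
print(hits)
```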