# Introduction to Regex

A regular expression (regex) is a string that defines a pattern. The pattern can then be used to match other strings,
for example to check whether an email address or a phone number is valid.

Python's standard library includes the `re` module, which provides regex functionality.

In [1]:
import re

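* Pattern: a compiled regular expression object, created with `re.compile()`
* match(): checks for a match at the beginning of a string
* search(): checks for a match anywhere in a string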

## Matching

### Exercise 1

Check if a string starts with 7 digits.

In [22]:
pattern = re.compile('[0-9]{7}')
pattern

re.compile(r'[0-9]{7}', re.UNICODE)

In [25]:
m = pattern.match('abc1234567')
if m:
    print(m.group())
else:
    print("String doesn't start with 7 digits")

String doesn't start with 7 digits
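
Because `match()` only anchors at the start of the string, the pattern above also accepts longer inputs such as `'12345678'`. If the goal is for the whole string to consist of exactly 7 digits, one option is `fullmatch()`. A minimal sketch, where the helper name `is_seven_digits` is hypothetical and not part of the original notebook:

```python
import re

seven_digits = re.compile(r'[0-9]{7}')

def is_seven_digits(text):
    """Return True only if the entire string is exactly 7 ASCII digits."""
    return seven_digits.fullmatch(text) is not None

print(is_seven_digits('1234567'))     # True
print(is_seven_digits('12345678'))    # False: one digit too many
print(is_seven_digits('abc1234567'))  # False: does not start with a digit

# match() alone accepts the 8-digit string, because its first 7 characters fit.
print(seven_digits.match('12345678') is not None)  # True
```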


### Exercise 2

Check if a string starts with at least 5 letters.

In [28]:
pattern = re.compile('[a-zA-Z]{5,}')

In [30]:
result = pattern.match('abc1234')
print(result)

None


In [34]:
result = pattern.match('abcdefg')
print(result.group())

abcdefg


### Exercise 3

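A Singapore phone number starts with 6, 8 or 9 and has 7 or 8 digits. The final pattern below accepts 7 digits for numbers starting with 6 and 8 digits for numbers starting with 8 or 9.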

In [50]:
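# Earlier attempts: [0-9]{7,8} after the first digit allows 8 or 9 digits in total,
# and without ^ and $ any trailing characters are accepted.
# pattern = re.compile('[689][0-9]{7,8}')
# pattern = re.compile('^[689][0-9]{7,8}$')
# Final pattern: 6 followed by 6 digits, or 8/9 followed by 7 digits.
pattern = re.compile(r'^(6[0-9]{6}|[89][0-9]{7})$')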

In [51]:
pattern.match('61234567')

In [52]:
pattern.match('81234567')

<re.Match object; span=(0, 8), match='81234567'>

In [53]:
pattern.match('51234567')

In [54]:
pattern.match('8123456789')
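
A notebook cell prints nothing when `match()` returns `None`, so the outcomes above are easy to misread. The sketch below collects the results for several inputs in one place. The list of candidate numbers, including `'6123456'`, is an assumption added for illustration:

```python
import re

pattern = re.compile(r'^(6[0-9]{6}|[89][0-9]{7})$')

# Illustrative inputs; '6123456' is not in the original cells.
candidates = ['6123456', '61234567', '81234567', '51234567', '8123456789']

# bool() turns a Match object into True and None into False.
results = {number: bool(pattern.match(number)) for number in candidates}

for number, ok in results.items():
    print(f"{number:>10}  {'valid' if ok else 'invalid'}")
```

Only `'6123456'` and `'81234567'` are reported as valid: the first is 7 digits starting with 6, the second is 8 digits starting with 8.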

### Exercise 4

Validate email addresses.

In [55]:
s = r'^\w[\w\.]*@\w[\w\.]+\w'
pattern = re.compile(s)

In [58]:
pattern.match('abc@gmail.com')

<re.Match object; span=(0, 13), match='abc@gmail.com'>
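
Note that this pattern has no `$` at the end, so `match()` accepts any string that merely begins with something email-shaped. A small sketch of the difference, using `fullmatch()` as one possible way to require that the whole string fits. It is still a loose check rather than a complete email validator, and the test string `'abc@gmail.com!!!'` is made up for illustration:

```python
import re

# Same pattern as in the cell above, written as a raw string.
email = re.compile(r'^\w[\w\.]*@\w[\w\.]+\w')

trailing_junk = 'abc@gmail.com!!!'

# match() succeeds because the prefix 'abc@gmail.com' fits the pattern.
print(email.match(trailing_junk))

# fullmatch() requires the entire string to fit, so it returns None here.
print(email.fullmatch(trailing_junk))

# A clean address passes both checks.
print(email.fullmatch('abc@gmail.com'))
```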

## Searching versus Matching

The `match()` method checks for a match only at the beginning of the string, while `search()` checks for a match anywhere in the string.

In [72]:
string = "\n  dreamer dreamers"
result = re.search(r"dreamer\w+", string) 
print(result.group())

dreamers
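
To see the two methods side by side, the sketch below runs both on the same string. The simpler pattern `dreamer` and the variant with a leading `\s*` are chosen here for illustration:

```python
import re

text = "\n  dreamer dreamers"

# match() fails: the string starts with a newline and spaces, not 'dreamer'.
at_start = re.match(r"dreamer", text)
print(at_start)            # None

# search() scans forward and finds the first occurrence.
anywhere = re.search(r"dreamer", text)
print(anywhere.span())     # (3, 10)

# match() succeeds once the pattern itself allows leading whitespace.
with_space = re.match(r"\s*dreamer", text)
print(with_space.span())   # (0, 10)
```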


## Splitting

In [89]:
s = '0 Sunday 1 Monday 2 Tuesday'

pattern = re.compile(r'\s*\d\s*')
pattern.split(s)

['', 'Sunday', 'Monday', 'Tuesday']
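
The empty first element appears because the string begins with a delimiter (`0 `), so there is nothing in front of the first split point. A short sketch of two common follow-ups, dropping empty pieces and limiting the number of splits with `maxsplit`:

```python
import re

s = '0 Sunday 1 Monday 2 Tuesday'
pattern = re.compile(r'\s*\d\s*')

# Filter out empty strings produced by a delimiter at the start of the input.
days = [part for part in pattern.split(s) if part]
print(days)        # ['Sunday', 'Monday', 'Tuesday']

# maxsplit=1 stops after the first delimiter and leaves the rest untouched.
first_only = pattern.split(s, maxsplit=1)
print(first_only)  # ['', 'Sunday 1 Monday 2 Tuesday']
```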

## Finding All

In [81]:
s = "Good morning, Singapore"
result_list = re.findall(r"\w+", s)
print(result_list)

['Good', 'morning', 'Singapore']


In [83]:
s = "Income 1000, expenses 100, net amount 900"
result_list = re.findall(r"\d+", s)
print(result_list)

['1000', '100', '900']
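
When the pattern contains capturing groups, `findall()` returns the groups instead of the whole match, as a list of tuples if there is more than one group. A minimal sketch on the same sentence, where the word-plus-number pattern is an illustrative choice:

```python
import re

s = "Income 1000, expenses 100, net amount 900"

# Two groups: the word directly before a number, and the number itself.
pairs = re.findall(r"(\w+) (\d+)", s)
print(pairs)   # [('Income', '1000'), ('expenses', '100'), ('amount', '900')]

# findall() always yields strings; convert explicitly when numbers are needed.
amounts = [int(number) for _, number in pairs]
print(amounts)  # [1000, 100, 900]
```

Note that `net amount 900` yields `'amount'`, not `'net amount'`, because `\w+` cannot span the space.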


## Look Ahead

In [77]:
pattern = re.compile(r'\w+(?=\sfox)')
result = pattern.search("The quick brown fox")
print(result.group())

brown
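
A lookahead `(?=...)` checks what follows without including it in the match. The sketch below contrasts it with an ordinary pattern that consumes the text, and with the negative form `(?!...)`. The two extra patterns are illustrative additions:

```python
import re

text = "The quick brown fox"

# Lookahead: ' fox' must follow, but is not part of the match.
with_lookahead = re.search(r'\w+(?=\sfox)', text).group()
print(with_lookahead)      # brown

# Without the lookahead, ' fox' is consumed and returned as well.
without_lookahead = re.search(r'\w+\sfox', text).group()
print(without_lookahead)   # brown fox

# Negative lookahead: whole words that are NOT followed by ' fox'.
not_before_fox = re.findall(r'\b\w+\b(?!\sfox)', text)
print(not_before_fox)      # ['The', 'quick', 'fox']
```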


## Substitution

In [86]:
pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
s = "From 2020-01-01 to 2020-03-03"
result = pattern.sub("__", s)
print(result)

From __ to __
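
The replacement string can also refer back to captured groups with `\1`, `\2` and so on, which makes `sub()` useful for reformatting rather than just masking. A minimal sketch that rewrites the dates as day/month/year. The target format is an arbitrary choice for illustration:

```python
import re

# Groups capture year, month and day separately.
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
s = "From 2020-01-01 to 2020-03-03"

# \3/\2/\1 reorders the captured groups into day/month/year.
reordered = pattern.sub(r"\3/\2/\1", s)
print(reordered)   # From 01/01/2020 to 03/03/2020

# count=1 replaces only the first occurrence.
first_masked = pattern.sub("__", s, count=1)
print(first_masked)  # From __ to 2020-03-03
```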


## References

* https://regexone.com/lesson/letters_and_digits
* https://scotch.io/tutorials/an-introduction-to-regex-in-python