## Example Objectives

This example uses Python's `re` module to search strings with regular expressions. It covers the topics introduced in today's lesson:

- Literal Match
- Character Classes
- Alternations
- Meta Characters
- Quantifiers
- Group
- Lookarounds
- Anchor



In [1]:
# Import the regular expression module
import re

## Literal Match

In [2]:
text = 'Good Morning, Good Morning'

# Find 'Good' in text
pattern = r'Good'

# Use re.findall() to extract every substring of text that matches pattern
re.findall(pattern, text)

['Good', 'Good']

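Note:

*   The `r` prefix in `r'Good'` marks the string as a raw string, in which a backslash is kept as a literal character instead of starting an escape sequence. For example, `r'\n'` is a two-character string (a backslash followed by `n`), whereas `'\n'` is a single character (a newline).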
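The difference matters as soon as a pattern contains a backslash. The sketch below is an illustration added here, not part of the original notebook; the variable names and the `\b` example are assumptions chosen for the demonstration. It compares string lengths and then shows how `\b` changes meaning without the `r` prefix.

```python
import re

raw = r'\n'     # two characters: a backslash followed by 'n'
plain = '\n'    # one character: a newline
print(len(raw), len(plain))  # 2 1

text = 'Good Morning'
with_raw = re.findall(r'\bGood\b', text)     # \b reaches re as a word boundary
without_raw = re.findall('\bGood\b', text)   # '\b' is a backspace character here
print(with_raw)      # ['Good']
print(without_raw)   # []
```

Writing every pattern as a raw string avoids this class of surprise, which is why the rest of the examples use the `r` prefix throughout.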



## Character Classes

In [10]:
text = "'Gray' vs. 'grey': What is the Difference?'"

# Find 'Gray' and 'grey' in text
pattern = r'[Gg]r[ae]y'

# Use re.findall() to extract every substring of text that matches pattern
re.findall(pattern, text)

['Gray', 'grey']
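Each bracketed class matches exactly one character from the listed set. As a small sketch beyond the cell above (the sample text and variable names are made up for illustration), a class can also be negated with `^` or written as a range such as `A-Z`:

```python
import re

text = "Gray, grey, gruy, GREY"

# One character from each set: G or g, then r, then a or e, then y
either = re.findall(r'[Gg]r[ae]y', text)
print(either)      # ['Gray', 'grey']

# [^ae] matches any single character except 'a' or 'e'
negated = re.findall(r'gr[^ae]y', text)
print(negated)     # ['gruy']

# [A-Z] is a range covering the uppercase ASCII letters
uppercase = re.findall(r'[A-Z]+', text)
print(uppercase)   # ['G', 'GREY']
```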

## Alternations

In [11]:
text = "Which do you want, apple or orange ?"

# Find 'apple' and 'orange' in text
pattern = r'apple|orange'

# Use re.findall() to extract every substring of text that matches pattern
re.findall(pattern, text)

['apple', 'orange']
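One thing to watch for is that `|` has the lowest precedence in a pattern, so it splits the whole expression unless it is wrapped in parentheses. The following sketch uses a made-up sample string and a non-capturing group `(?:...)` to limit the alternation to a single position:

```python
import re

text = "gray grey gra ey"

# Without parentheses the alternatives are the whole 'gra' and the whole 'ey'
unscoped = re.findall(r'gra|ey', text)
print(unscoped)   # ['gra', 'ey', 'gra', 'ey']

# (?:a|e) restricts the choice to one character between 'gr' and 'y'
scoped = re.findall(r'gr(?:a|e)y', text)
print(scoped)     # ['gray', 'grey']
```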

## Meta Characters

In [18]:
text1 = "My phone number is 0800-000-123 and 0800-000-456"
text2 = "My email is example1@gamil.com and example2@gamil.com"

In [19]:
# Match every character in text1 ('.' matches any character except a newline)
pattern = r'.'
re.findall(pattern, text1)

['M',
 'y',
 ' ',
 'p',
 'h',
 'o',
 'n',
 'e',
 ' ',
 'n',
 'u',
 'm',
 'b',
 'e',
 'r',
 ' ',
 'i',
 's',
 ' ',
 '0',
 '8',
 '0',
 '0',
 '-',
 '0',
 '0',
 '0',
 '-',
 '1',
 '2',
 '3',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 '0',
 '8',
 '0',
 '0',
 '-',
 '0',
 '0',
 '0',
 '-',
 '4',
 '5',
 '6']

In [20]:
# Match the phone numbers in text1 (\d matches one digit)
pattern = r'\d\d\d\d-\d\d\d-\d\d\d'
re.findall(pattern, text1)

['0800-000-123', '0800-000-456']

In [23]:
# Match the email addresses in text2 (\w matches one word character, \. a literal dot)
pattern = r'\w\w\w\w\w\w\w\w@\w\w\w\w\w\.\w\w\w'
re.findall(pattern, text2)

['example1@gamil.com', 'example2@gamil.com']
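The email pattern escapes the dot as `\.` because an unescaped `.` is itself a metacharacter. A minimal sketch of the difference, using a made-up sample string:

```python
import re

text = "3.14 3x14"

# Unescaped '.' matches any character, so 'x' is accepted too
any_char = re.findall(r'\d.\d\d', text)
print(any_char)      # ['3.14', '3x14']

# Escaped '\.' matches only a literal dot
literal_dot = re.findall(r'\d\.\d\d', text)
print(literal_dot)   # ['3.14']
```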

## Quantifiers

In [24]:
text1 = "My phone number : 0800-000-123 and 0800-000-456"
text2 = "My email : example1@gamil.com and example2@gamil.com"

In [25]:
# Match the phone numbers in text1 ({n} repeats the preceding token exactly n times)
pattern = r'\d{4}-\d{3}-\d{3}'
re.findall(pattern, text1)

['0800-000-123', '0800-000-456']

In [28]:
# Match the email addresses in text2 (+ repeats the preceding token one or more times)
pattern = r'\w+@[a-z.]+'
re.findall(pattern, text2)

['example1@gamil.com', 'example2@gamil.com']
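Because `[a-z.]+` accepts any run of lowercase letters and dots, it will also swallow a sentence-ending period. The sketch below, with a made-up sample sentence, shows that behaviour and one possible tighter pattern that applies `+` to a whole group; it is an illustration only, not a complete email validator.

```python
import re

text = "Mail me at example1@gamil.com."

# The class [a-z.] also accepts the trailing full stop
loose = re.findall(r'\w+@[a-z.]+', text)
print(loose)   # ['example1@gamil.com.']

# Require at least one letter after every dot: (?:\.[a-z]+)+
tight = re.findall(r'\w+@[a-z]+(?:\.[a-z]+)+', text)
print(tight)   # ['example1@gamil.com']
```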

## Group

In [29]:
text1 = "My phone number : 0800-000-123 and 0800-000-456 and 0800-000-789 and 0800-000-999"

In [33]:
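# Match the phone numbers in text1 whose last three digits are 123 or 456
# (with capturing groups, re.findall() returns one tuple of groups per match)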
pattern = r'(\d{4}-\d{3}-(123|456))'
re.findall(pattern, text1)

[('0800-000-123', '123'), ('0800-000-456', '456')]
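The tuples in the output come from how `re.findall()` handles capturing groups: when the pattern contains more than one group, each result is a tuple with one entry per group. As a sketch (the shortened sample text and variable names are assumptions), a non-capturing group returns only the full match, and `re.finditer()` gives access to individual groups by number:

```python
import re

text = "0800-000-123 and 0800-000-456 and 0800-000-789"

# (?:...) groups without capturing, so findall returns the whole match
whole = re.findall(r'\d{4}-\d{3}-(?:123|456)', text)
print(whole)   # ['0800-000-123', '0800-000-456']

# finditer yields match objects; group(0) is the full match
pattern = re.compile(r'(\d{4})-(\d{3})-(123|456)')
parts = [(m.group(0), m.group(1), m.group(3)) for m in pattern.finditer(text)]
print(parts)   # [('0800-000-123', '0800', '123'), ('0800-000-456', '0800', '456')]
```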

## Lookarounds

In [36]:
text1 = "My phone number : 0937-000-123 and 0800-000-456 and 0965-000-789 and 0832-000-999 and 0800-001-123 and 0800-005-123 and 0895-005-123"

In [37]:
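# Match the phone numbers in text1 whose last three digits are 123 (lookahead)
# The lookahead only checks what follows, so '123' is not part of the result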
pattern = r'\d{4}-\d{3}-(?=123)'
re.findall(pattern, text1)

['0937-000-', '0800-001-', '0800-005-', '0895-005-']

In [40]:
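# Match the phone numbers in text1 that start with 0800 (lookbehind)
# The lookbehind only checks what precedes, so '0800' is not part of the result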
pattern = r'(?<=0800)-\d{3}-\d{3}'
re.findall(pattern, text1)

['-000-456', '-001-123', '-005-123']
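Lookarounds are zero-width: they assert what surrounds a position without consuming it, which is why the prefix or suffix is missing from the results above. The sketch below reuses the same sample text and combines both kinds; the variable names are chosen for illustration.

```python
import re

text = ("My phone number : 0937-000-123 and 0800-000-456 and 0965-000-789 "
        "and 0832-000-999 and 0800-001-123 and 0800-005-123 and 0895-005-123")

# Middle three digits of numbers that start with 0800 and end with 123
middle = re.findall(r'(?<=0800-)\d{3}(?=-123)', text)
print(middle)   # ['001', '005']

# A lookahead at the start checks for 0800, then \d{4} consumes those digits,
# so the full number is returned
full = re.findall(r'(?=0800)\d{4}-\d{3}-\d{3}', text)
print(full)     # ['0800-000-456', '0800-001-123', '0800-005-123']
```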

## Anchor

In [52]:
text = 'apple orange watermelon Apricots Banana Blackberries Cantaloupe'

In [53]:
# Match the fruit at the start of the line
pattern = r'^\w+'
re.findall(pattern, text)

['apple']

In [54]:
# Match the fruit at the end of the line
pattern = r'\w+$'
re.findall(pattern, text)

['Cantaloupe']
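By default `^` and `$` anchor to the start and end of the whole string, which coincides with the line here only because the text has a single line. As a sketch with a made-up three-line string, passing `re.MULTILINE` makes the anchors apply to every line:

```python
import re

text = "apple orange\nBanana Blackberries\nCantaloupe watermelon"

# Default: ^ matches only at the very start of the string
first_default = re.findall(r'^\w+', text)
print(first_default)   # ['apple']

# re.MULTILINE: ^ and $ also match at the start and end of each line
first_words = re.findall(r'^\w+', text, flags=re.MULTILINE)
last_words = re.findall(r'\w+$', text, flags=re.MULTILINE)
print(first_words)     # ['apple', 'Banana', 'Cantaloupe']
print(last_words)      # ['orange', 'Blackberries', 'watermelon']
```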