## Regular expressions
A handy tool for testing regular expressions interactively: https://regex101.com/

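Basic shorthand classes
- \d: one digit
- \D: any one character that is not a digit
- \s: one whitespace character (space, tab, or newline)
- \w: one word character (a letter, a digit, or an underscore; in Python 3 this includes non-ASCII letters unless re.ASCII is set)
- \W: any one character that is not a word character
- [\u4e00-\u9fa5]: one Chinese character (a code point in the common CJK ideograph range)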

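Some special symbols
- .: any one character (except a newline, by default)
- *: (must follow another symbol) zero or more of that symbol
- +: (must follow another symbol) one or more of that symbol
- {3,4}: (must follow another symbol) three to four of that symbol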
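
The quantifiers listed above can be checked with a short sketch. The sample strings are made up for illustration, and `re.fullmatch` is used so that a token only counts when the whole of it matches.

```python
import re

# Hypothetical sample string, chosen only for illustration.
sample = "a aa aaa aaaa aaaaa"

# {3,4} accepts three or four repetitions; fullmatch requires the whole
# token to match, so 'aaaaa' (five repetitions) is rejected.
tokens = sample.split()
three_or_four = [t for t in tokens if re.fullmatch(r"a{3,4}", t)]
print(three_or_four)  # ['aaa', 'aaaa']

# * allows zero repetitions, + requires at least one.
star_ok = bool(re.fullmatch(r"ab*", "a"))
plus_ok = bool(re.fullmatch(r"ab+", "a"))
print(star_ok, plus_ok)  # True False

# . matches any single character except a newline (by default).
dot_matches = re.findall(r"c.t", "cat cot c\nt")
print(dot_matches)  # ['cat', 'cot']
```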

In [1]:
text = """
中央流行疫情指揮中心今(29)日公布國內新增11例COVID-19確定病例，分別為1例本土及10例境外移入；另確診個案中無新增死亡。

指揮中心表示，今日新增1例本土個案(案16326)，為印尼籍30多歲女性，今(2021)年9月27日出現頭痛症狀，9月28日就醫採檢，於今日確診。衛生單位已匡列接觸者7人，均列居家隔離，其餘接觸者匡列中。

指揮中心指出，今日新增10例境外移入個案，為9例男性、1例女性，年齡介於20多歲至40多歲，入境日介於9月3日至9月28日，分別自美國(案16316)、哈薩克(2例，案16317、案16318)、巴基斯坦(案16319)、柬埔寨(16320)、俄羅斯(案16324)及菲律賓(案16325)入境，餘3例 (案16321、案16322、案16323)的旅遊國家調查中；詳如新聞稿附件。

指揮中心統計，截至目前國內累計3,358,228例新型冠狀病毒肺炎相關通報(含3,341,439例排除)，其中16,216例確診，分別為1,581例境外移入，14,581例本土病例，36例敦睦艦隊、3例航空器感染、1例不明及14例調查中；另累計110例移除為空號。2020年起累計842例COVID-19死亡病例，其中830例本土，個案居住縣市分布為新北市412例、臺北市318例、基隆市28例、桃園市26例、彰化縣15例、新竹縣13例、臺中市5例、苗栗縣3例、宜蘭縣及花蓮縣各2例，臺東縣、雲林縣、臺南市、南投縣、高雄市及屏東縣各1例；另12例為境外移入。

指揮中心再次呼籲，民眾應落實手部衛生、咳嗽禮節及佩戴口罩等個人防護措施，減少不必要移動、活動或集會，避免出入人多擁擠的場所，或高感染傳播風險場域，並主動積極配合各項防疫措施，共同嚴守社區防線。
"""

In [2]:
import re
print(re.search(r"新增\d+例本土", text)) # re.search finds the first match and returns an re.Match object
print(re.findall(r"新增\d+例境外", text)) # re.findall finds every match and returns a list

<re.Match object; span=(78, 84), match='新增1例本土'>
['新增10例境外']


In [3]:
print(re.search(r"新增(\d+)例本土", text).group(1)) # group(n) returns the content of the n-th parenthesized group
print(re.findall(r"新增(\d+)例境外", text)) # with a group in the pattern, findall returns only the group contents

1
['10']


In [4]:
print(re.findall("..[縣市]", text)) # [] matches any one of the characters inside it

['居住縣', '新北市', '臺北市', '基隆市', '桃園市', '彰化縣', '新竹縣', '臺中市', '苗栗縣', '宜蘭縣', '花蓮縣', '臺東縣', '雲林縣', '臺南市', '南投縣', '高雄市', '屏東縣']


## Search

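- re.match only matches at the beginning of the string; if the start of the string does not fit the pattern, the match fails and the function returns None.
- re.search scans the whole string and stops at the first match it finds.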
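
The difference between the two functions can be seen in a minimal sketch. The sample string is arbitrary, and `re.fullmatch` is included only for comparison; it is not discussed in the notes above.

```python
import re

s = "woman"  # arbitrary sample string

at_start = re.match(r"man", s)       # None: "man" does not start at index 0
anywhere = re.search(r"man", s)      # found later in the string
whole = re.fullmatch(r"wo?man", s)   # the pattern must cover the entire string

print(at_start)         # None
print(anywhere.span())  # (2, 5)
print(whole.group())    # woman
```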


In [5]:
import re
s="ac"
re.search('[abc]',s)

<re.Match object; span=(0, 1), match='a'>

In [6]:
s="ac"
re.search('.',s)

<re.Match object; span=(0, 1), match='a'>

In [7]:
s='mumbai'
x = re.search('^m',s)
print(x)

<re.Match object; span=(0, 1), match='m'>


In [8]:
a = re.search('a$','Delhi')
print(a)
b = re.search('a$','Odissa')
print(b)

None
<re.Match object; span=(5, 6), match='a'>


In [9]:
a = re.search('ma*','man')
print(a)
b = re.search('ma*','woman')
print(b)

<re.Match object; span=(0, 2), match='ma'>
<re.Match object; span=(2, 4), match='ma'>


In [10]:
a = re.search('ma+','man')
print(a)
b = re.search('ma+','woman')
print(b)

<re.Match object; span=(0, 2), match='ma'>
<re.Match object; span=(2, 4), match='ma'>


In [11]:
a = re.search('a{2}', 'abc')
print(a)
b = re.search('a{2}', 'aab')
print(b)

None
<re.Match object; span=(0, 2), match='aa'>


In [12]:
a = re.search('a|b', 'mango')
print(a)
b = re.search('a|b', 'Pune')
print(b)

<re.Match object; span=(1, 2), match='a'>
None


In [13]:
a = re.search('(a)m', 'amm')
print(a)
b = re.search('(a)m', 'abm')
print(b)

<re.Match object; span=(0, 2), match='am'>
None


In [14]:
a=re.search('\$a', '$abc')
print(a)

<re.Match object; span=(0, 2), match='$a'>


In [15]:
if re.search("ape", "This is ape"):
    print("ape")

ape


In [16]:
#Searching string to see if it starts with "The" and ends with "Germany"
import re

newtext = "The match in Germany"
x = re.search("^The.*Germany$", newtext)
# print(x)
if x:
    print("Yes! We have a match")
else:
    print("No match")

Yes! We have a match


In [17]:
# Searching for the first white-space using search()
import re

newtext = "The rain in Germany"
searching = re.search("\s", newtext)

print("The first white space is located in position:", searching.start())


The first white space is located in position: 3


In [18]:
# If no matches are found, the value None is returned
import re

newtext = "The rain in Germany"
matching = re.search("Brazil", newtext)

print(matching)

None


In [19]:
st = "India is my country"
x=re.search('Delhi',st)
print(x)

None


## Sub
- re.sub replaces every match in a string.
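
Two options of `re.sub` that the cells in this section do not use are sketched here: the `count` keyword, which limits the number of replacements, and group references such as `\1` in the replacement text. The date string is a made-up sample.

```python
import re

s = "I love watching movies"

# count limits how many matches are replaced (here only the first one).
first_only = re.sub(r"\s", "_", s, count=1)
print(first_only)  # I_love watching movies

# \1, \2, \3 in the replacement refer to the captured groups.
date = "2021-09-29"  # hypothetical sample value
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)
print(swapped)  # 29/09/2021
```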

In [20]:
s="I love watching movies"
x = re.sub('\s','2',s)
print(x)

I2love2watching2movies


In [21]:
import re

owlFood = "rat cat mat pet"

regex = re.compile("[cr]at")
owlFood = regex.sub("owl",owlFood)

print(owlFood)

owl owl mat pet


In [22]:
# replacing the matches with the text of your choice
import re

newtext = "Python is a cross-platform language"
matching = re.sub("\s", "|", newtext)

print(matching)

Python|is|a|cross-platform|language


In [23]:
randStr = """This is a long
string that goes
on for many lines
"""

regex = re.compile('\n')
randStr = regex.sub(" ", randStr)

print(randStr)

This is a long string that goes on for many lines 


In [24]:
# strip sub
import re
def strip(text):
    stripStartRegex = re.compile(r'(^\s*)')
    stripEndRegex = re.compile(r'(\s*$)')

    textStartStripped = stripStartRegex.sub('', text)
    textStripped = stripEndRegex.sub('', textStartStripped)

    return textStripped

if __name__ == "__main__":
    text = ' test ffs   '
    print(strip(text))

test ffs


## Split

In [25]:
s="I love watching movies"
x = re.split('\s',s)
print(x)

['I', 'love', 'watching', 'movies']


In [26]:
# returning a list where the string has been split at each match
import re

newtext = "Python is a cross-platform language"
# newmatch = re.split("\s", newtext)

# Split string at first occurrence
newmatch = re.split(r"\s", newtext, maxsplit=1)

print(newmatch)

['Python', 'is a cross-platform language']


## Findall

- Finds every substring that matches the pattern and returns them as a list.
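
What the returned list contains depends on the capture groups in the pattern. The sketch below shows the three cases; the `addresses` string is just sample data.

```python
import re

addresses = "guido@python.org, guido@google.com"  # sample data

# No groups: each item is the whole match.
no_groups = re.findall(r"\w+@\w+\.\w+", addresses)
# One group: each item is the content of that group.
one_group = re.findall(r"\w+@(\w+)\.\w+", addresses)
# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\w+)@(\w+)\.\w+", addresses)

print(no_groups)   # ['guido@python.org', 'guido@google.com']
print(one_group)   # ['python', 'google']
print(two_groups)  # [('guido', 'python'), ('guido', 'google')]
```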

In [27]:
import re
st = "India is my country"
x=re.findall('nd',st)
print(x)

['nd']


In [28]:
import re

s="I love watching movies"
x=re.findall("\AThe",s)
x

[]

In [29]:
s="I love watching movies"
x = re.findall('\D',s)
print(x)

['I', ' ', 'l', 'o', 'v', 'e', ' ', 'w', 'a', 't', 'c', 'h', 'i', 'n', 'g', ' ', 'm', 'o', 'v', 'i', 'e', 's']


In [30]:
allApes = re.findall("ape","ape... together...strong... apes")
for i in allApes:
    print(i)

ape
ape


In [31]:
theStr = "ape... together...strong... apes"

for i in re.finditer("ape",theStr):
    locTuple = i.span()
    print(locTuple)
    print(theStr[locTuple[0]:locTuple[1]])

(0, 3)
ape
(28, 31)
ape


In [32]:
theStr = "Ape Bpe Cpe Dpe"
allape = re.findall("[ABCD]pe",theStr)

for i in allape:
    print(i)


Ape
Bpe
Cpe
Dpe


In [33]:
theStr = "Ape Bpe Cpe Dpe"
allape = re.findall("[A-Z]pe",theStr)

for i in allape:
    print(i)

Ape
Bpe
Cpe
Dpe


In [34]:
theStr = "Ape Bpe Cpe Dpe"
allape = re.findall("[^C]pe",theStr)

for i in allape:
    print(i)

Ape
Bpe
Dpe


In [35]:
# Returning a list containing all matches
import re

newtext = "The rain rain rain in Germany and Spain"
x = re.findall("Brazil", newtext)

# check whether anything matched (an empty list is falsy)
if x:
    print("Match")
else:
    print("No match")

# displaying
print(x)

No match
[]


## Sets for Regular Expressions

In [36]:
import re
s= "I love watching movies"
x = re.findall('[arn]',s)
print(x)

['a', 'n']


In [37]:
s="I love watching movies"
x = re.findall('[a-n]',s)
print(x)

['l', 'e', 'a', 'c', 'h', 'i', 'n', 'g', 'm', 'i', 'e']


In [38]:
s="I love watching movies"
x = re.findall('[^arn]',s)
print(x)

['I', ' ', 'l', 'o', 'v', 'e', ' ', 'w', 't', 'c', 'h', 'i', 'g', ' ', 'm', 'o', 'v', 'i', 'e', 's']


In [39]:
s="I love watching movies"
x = re.findall('[a-zA-Z]',s)
print(x)

['I', 'l', 'o', 'v', 'e', 'w', 'a', 't', 'c', 'h', 'i', 'n', 'g', 'm', 'o', 'v', 'i', 'e', 's']


## Project

In [40]:
randStr = "Here is \\stuff"
print("Find \\stuff:", re.search("\\stuff", randStr))
print("Find \\stuff:", re.search("\\\\stuff", randStr))
print("Find \\stuff:", re.search(r"\\stuff", randStr))

Find \stuff: None
Find \stuff: <re.Match object; span=(8, 14), match='\\stuff'>
Find \stuff: <re.Match object; span=(8, 14), match='\\stuff'>


In [41]:
randStr = "F.B.I. I.R.S. CIA"
print("Matches:", len(re.findall(".\..\..",randStr)))
print("Matches:", re.findall(".\..\..",randStr))

Matches: 2
Matches: ['F.B.I', 'I.R.S']


In [42]:
randStr = "12345"
print("Matches:", len(re.findall("\d",randStr)))

Matches: 5


## Regex checker: pythex
- https://pythex.org

In [43]:
import re

if re.search("\d{5}","12345"):
    print("It is a zip code")

numStr = "123 12345 123456 1234567"
print("Matches:", len(re.findall("\d{5,7}",numStr)))

It is a zip code
Matches: 3


In [52]:
phoneNum = input()

if re.search(r"\d{3}-\d{3}-\d{4}",phoneNum):  # \d (digits only); \w would also accept letters
    print("It is a phoneNum")
else:
    print("It is not TWphoneNum")

090900123
It is not TWphoneNum


In [53]:
import re

Name = "Toshio Mauramatsu"

if re.search("\w{2,20}\s\w{2,20}",Name):
    print("It is a full name")

It is a full name


In [54]:
import re

mail = "macbook@gmail.com"


# The dot before the top-level domain is escaped (\.) so that it matches a literal "."
print("Email Matches:",len(re.findall(r"[\w._%+-]{1,20}@[\w.-]{2,20}\.[A-Za-z]{2,3}",mail)))
print("Email Matches:",re.findall(r"[\w._%+-]{1,20}@[\w.-]{2,20}\.[A-Za-z]{2,3}",mail))


Email Matches: 1
Email Matches: ['macbook@gmail.com']


In [55]:
import re 
  
passwd = input('Enter a valid password: ')
reg = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!#%*?&]{6,20}$"
pat = re.compile(reg)    
mat = re.search(pat, passwd) 
       
if mat: 
    print("Password is valid.") 
else: 
    print("Password invalid !!") 

Enter a valid password: 
Password invalid !!


In [56]:
email = re.compile(r'\w+@\w+\.[a-z]{3}')
text = "To email Guido, try guido@python.org or the older address guido@google.com."
email.findall(text)

['guido@python.org', 'guido@google.com']

In [57]:
email.sub('--@--.--', text)

'To email Guido, try --@--.-- or the older address --@--.--.'

In [58]:
email.findall('barack.obama@whitehouse.gov')

['obama@whitehouse.gov']

## Simple strings are matched directly

In [59]:
regex = re.compile('ion')
regex.findall('Great Expectations')

['ion']

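### Some characters have special meanings

Simple letters and digits match themselves, but a handful of characters have special meanings in regular expressions. They are: . ^ $ * + ? { } [ ] \ | ( )

We will discuss the meaning of some of these shortly. In the meantime, note that to match any of these characters literally, you can escape it with a backslash: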
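
When the literal text contains several special characters, escaping each one by hand is error-prone. A minimal sketch using `re.escape` from the standard library, which adds the backslashes for you; the `literal` string is a made-up example.

```python
import re

literal = "$20 (approx.)"  # hypothetical text containing special characters
pattern = re.escape(literal)  # every special character is backslash-escaped

found = re.search(pattern, "the cost is $20 (approx.) today")
print(found.group())  # $20 (approx.)

# Without escaping, $ means "end of string", so nothing can follow it.
unescaped = re.search(r"$20", "the cost is $20")
print(unescaped)  # None
```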

In [60]:
regex = re.compile(r'\$')
regex.findall("the cost is $20")

['$']

In [61]:
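# In an ordinary (non-raw) string, "\t" is a tab character; the r prefix in r'\$' above keeps the backslash literal
print('a\tb\tc')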

a	b	c


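| Character | Description |
|-----------|-------------|
| ``"\d"`` | Match any digit |
| ``"\D"`` | Match any non-digit |
| ``"\s"`` | Match any whitespace |
| ``"\S"`` | Match any non-whitespace |
| ``"\w"`` | Match any alphanumeric character or underscore |
| ``"\W"`` | Match any character that is not alphanumeric or underscore |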
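
A small sketch of what `\w` and `\W` actually accept in Python 3. The sample string is arbitrary; the `re.ASCII` flag is a standard-library option that restricts `\w` to `[a-zA-Z0-9_]`.

```python
import re

sample = "a_1 好"  # arbitrary sample: letter, underscore, digit, space, CJK character

word_chars = re.findall(r"\w", sample)            # Unicode-aware by default
ascii_only = re.findall(r"\w", sample, re.ASCII)  # restricted to [a-zA-Z0-9_]
non_word = re.findall(r"\W", sample)              # the complement of \w

print(word_chars)  # ['a', '_', '1', '好']
print(ascii_only)  # ['a', '_', '1']
print(non_word)    # [' ']
```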

In [62]:
regex = re.compile(r'\w\s\w')
regex.findall('the fox is 9 years old')

['e f', 'x i', 's 9', 's o']

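### Square brackets match custom character groups
If the built-in character groups are not specific enough for you, you can use square brackets to specify any set of characters you are interested in.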

In [63]:
regex = re.compile('[aeiou]')
regex.split('consequential')

['c', 'ns', 'q', '', 'nt', '', 'l']

In [64]:
regex = re.compile('[A-Z][0-9]')
regex.findall('1043879, G2, H6')

['G2', 'H6']

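### Braces match repeated characters

If you want to match a string with, say, three alphanumeric characters in a row, you could write ``"\w\w\w"``.
Because this is such a common need, there is a dedicated syntax for matching repetitions: curly braces with a number: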

In [65]:
regex = re.compile(r'\w{3}')
regex.findall('The quick brown fox')

['The', 'qui', 'bro', 'fox']

In [66]:
regex = re.compile(r'\w+')
regex.findall('The quick brown fox')

['The', 'quick', 'brown', 'fox']


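The `email` pattern compiled earlier, ``"\w+@\w+\.[a-z]{3}"``, reads as follows: one or more alphanumeric characters (``"\w+"``), followed by the *at sign* (``"@"``), followed by one or more alphanumeric characters (``"\w+"``), followed by a period (``"\."``; note the backslash escape), followed by exactly three lowercase letters.

If we now want to modify it so that the Obama email address matches, we can use the square-bracket notation: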

In [67]:
email2 = re.compile(r'[\w.]+@\w+\.[a-z]{3}')
email2.findall('barack.obama@whitehouse.gov')

['barack.obama@whitehouse.gov']

In [68]:
email3 = re.compile(r'([\w.]+)@(\w+)\.([a-z]{3})')
text = "To email Guido, try guido@python.org or the older address guido@google.com."
email3.findall(text)

[('guido', 'python', 'org'), ('guido', 'google', 'com')]

In [69]:
email4 = re.compile(r'(?P<user>[\w.]+)@(?P<domain>\w+)\.(?P<suffix>[a-z]{3})')
match = email4.match('guido@python.org')
match.groupdict()

{'user': 'guido', 'domain': 'python', 'suffix': 'org'}

The following is a table of the repetition markers available for use in regular expressions:

| Character | Description | Example |
|-----------|-------------|---------|
| ``?`` | Match zero or one repetitions of preceding  | ``"ab?"`` matches ``"a"`` or ``"ab"`` |
| ``*`` | Match zero or more repetitions of preceding | ``"ab*"`` matches ``"a"``, ``"ab"``, ``"abb"``, ``"abbb"``... |
| ``+`` | Match one or more repetitions of preceding  | ``"ab+"`` matches ``"ab"``, ``"abb"``, ``"abbb"``... but not ``"a"`` |
| ``{n}`` | Match ``n`` repetitions of preceding | ``"ab{2}"`` matches ``"abb"`` |
| ``{m,n}`` | Match between ``m`` and ``n`` repetitions of preceding | ``"ab{2,3}"`` matches ``"abb"`` or ``"abbb"`` |
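
The rows of the table can be checked mechanically. This is only a sketch: the `cases` list restates the table's examples plus a few counter-examples chosen for illustration, and `re.fullmatch` is used so that each string must match in full.

```python
import re

# (pattern, strings that should match, strings that should not)
cases = [
    (r"ab?", ["a", "ab"], ["abb"]),
    (r"ab*", ["a", "ab", "abbb"], ["b"]),
    (r"ab+", ["ab", "abb"], ["a"]),
    (r"ab{2}", ["abb"], ["ab", "abbb"]),
    (r"ab{2,3}", ["abb", "abbb"], ["ab", "abbbb"]),
]

results = {}
for pattern, should_match, should_fail in cases:
    ok = all(re.fullmatch(pattern, s) for s in should_match)
    ok = ok and not any(re.fullmatch(pattern, s) for s in should_fail)
    results[pattern] = ok
    print(pattern, ok)  # every line is expected to end with True
```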

## Split

In [70]:
line = 'the quick brown fox jumped over a lazy dog'

In [71]:
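# `regex` is assumed to still be re.compile(r'\w+') from In [66]; match() only succeeds at the start of the string
for s in ["     ", "abc  ", "  abc"]: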
    if regex.match(s):
        print(repr(s), "matches")
    else:
        print(repr(s), "does not match")

'     ' does not match
'abc  ' matches
'  abc' does not match


In [72]:
line.index('fox')

16

In [73]:
regex = re.compile('fox')
match = regex.search(line)
match.start()

16

In [74]:
line.replace('fox', 'BEAR')

'the quick brown BEAR jumped over a lazy dog'

In [75]:
regex.sub('BEAR', line)

'the quick brown BEAR jumped over a lazy dog'

## Pattern Matching with Regular Expressions

In [76]:
# Regular expressions
# A regular expression works like an enhanced version of Ctrl+F.

# 1. Import the regular expression module.
import re

# 2. Call re.compile() with the pattern to create a regular expression object.
phoneNumberRegex = re.compile(r'\d{3}-\d{3}-\d{4}')

# 3. Pass the text to the object's search() method to look for the pattern.
mo = phoneNumberRegex.search('My number is 415-555-4242.')

# 4. search() returns a match object (or None if nothing matched); store it in a variable
#    and call group() on it to get the matched text.
print('phone number found: '+mo.group())

phone number found: 415-555-4242
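
`search()` returns None when nothing matches, and calling `group()` on None raises an AttributeError. A minimal sketch of guarding against that; `find_phone` is a hypothetical helper name, not part of the `re` module.

```python
import re

phone_regex = re.compile(r"\d{3}-\d{3}-\d{4}")

def find_phone(message):
    """Return the first phone number in message, or None if there is none."""
    mo = phone_regex.search(message)
    # Only call group() when a match object was actually returned.
    return mo.group() if mo else None

print(find_phone("My number is 415-555-4242."))  # 415-555-4242
print(find_phone("No number here"))              # None
```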


In [77]:
import re
batRegex = re.compile(r'Bat(wo)?man')
mo = batRegex.search('The Adventures of Batman')
print(mo.group())
mo = batRegex.search('The Adventures of Batwoman')
print(mo.group())

Batman
Batwoman


In [78]:
import re
batRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = batRegex.search('Call 415-555-1011')
print(mo.group())
mo = batRegex.search('415-555-9999 my office')
print(mo.group())

415-555-1011
415-555-9999


In [79]:
import re
batRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = batRegex.search('Call 415-555-1011')
print(mo.group(0))
print(mo.group(1))
print(mo.group(2))

415-555-1011
415
555-1011


In [80]:
import re
batRegex = re.compile(r'(Ha){3}')
mo = batRegex.search('He said "HaHaHa"')
print(mo.group())

HaHaHa


In [81]:
import re
batRegex = re.compile(r'(\d){3,5}')
mo = batRegex.search('1234567890')
print(mo.group())

12345


In [82]:
import re
batRegex = re.compile(r'(\d){3,5}?')
mo = batRegex.search('1234567890')
print(mo.group())

123


In [83]:
# Displaying start and end position of the first match occurrence
import re

newtext = "Python is a Cross-platform Language"
searching = re.search(r"\bL\w+", newtext)
# print(searching)
# print(searching.span())
# print(searching.string)
print(searching.group())

Language


In [86]:
!pip install pyperclip



In [87]:
#Finds phone numbers and email addresses on the clipboard.

import pyperclip, re
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))? # area code
    (\s|-|\.)?         # separator
    (\d{3})              # first 3 digits
    (\s|-|\.)          # separator
    (\d{4})              # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # extension
    )''', re.VERBOSE)

# Create email regex.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4}){1,2} # dot-something
    )''', re.VERBOSE)

# Find matches in clipboard text.
text = str(pyperclip.paste())

matches = []
for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])

# Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')


No phone numbers or email addresses found.


In [88]:
import re
lyrics = '12 Drummers Drumming 11 Pipers Piping 10 Lords a Leaping 9 Ladies Dancing 8 Maids a Milking \
7 Swans a Swimming 6 Geese a Laying 5 Golden Rings 4 Calling Birds 3 French Hens 2 Turtle Doves and a Partridge in a Pear Tree'

xmasRegex = re.compile(r'\d+\s\w+')
xmasRegex.findall(lyrics)

['12 Drummers',
 '11 Pipers',
 '10 Lords',
 '9 Ladies',
 '8 Maids',
 '7 Swans',
 '6 Geese',
 '5 Golden',
 '4 Calling',
 '3 French',
 '2 Turtle']

In [89]:
vowelRegex = re.compile(r'[aeiouAEIOU]')   # matches any vowel
#vowelRegex = re.compile(r'[^aeiouAEIOU]') # negated class: matches any non-vowel
vowelRegex.findall(lyrics)

['u',
 'e',
 'u',
 'i',
 'i',
 'e',
 'i',
 'i',
 'o',
 'a',
 'e',
 'a',
 'i',
 'a',
 'i',
 'e',
 'a',
 'i',
 'a',
 'i',
 'a',
 'i',
 'i',
 'a',
 'a',
 'i',
 'i',
 'e',
 'e',
 'e',
 'a',
 'a',
 'i',
 'o',
 'e',
 'i',
 'a',
 'i',
 'i',
 'e',
 'e',
 'u',
 'e',
 'o',
 'e',
 'a',
 'a',
 'a',
 'i',
 'e',
 'i',
 'a',
 'e',
 'a',
 'e',
 'e']

In [90]:
import re
cat = 'The cat : in the hat sat on the flat mat.'
xmasRegex = re.compile(r'.at')
xmasRegex.findall(cat)

['cat', 'hat', 'sat', 'lat', 'mat']

In [46]:
xmasRegex = re.compile(r'.{1,2}at')
xmasRegex.findall(cat)

[' cat', ' hat', ' sat', 'flat', ' mat']

In [47]:
import re
name = 'First Name: Al Last Name: Seight'
nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
nameRegex.findall(name)

[('Al', 'Seight')]

In [48]:
import re
serve = '<To serve humans> for dinner.>'
nogreedy = re.compile(r'<(.*?)>')  # non-greedy: the ? makes .* match as little as possible
nogreedy.findall(serve)

['To serve humans']

In [49]:
greedy = re.compile(r'<(.*)>')    # greedy: without the ?, .* matches as much as possible
greedy.findall(serve)

['To serve humans> for dinner.']

In [50]:
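#23 A closer look at regular expressions
# Without regular expressions (verbose)

def isPhoneNumber(text):
    if len(text) != 12:  # must be exactly 12 characters long
        return False
    for i in range(0,3):
        if not text[i].isdecimal():  # area code must be digits
            return False
    if text[3] != '-':   # first separator must be a hyphen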
        return False
    for i in range(4,7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    for i in range(8,12):
        if not text[i].isdecimal():
            return False
    return True
    # If none of the checks above found a problem, the string is treated as a phone number

print('415-555-4242 is a phone number:')
print(isPhoneNumber('415-555-4242'))
print('Moshi moshi is a phone number:')
print(isPhoneNumber('Moshi moshi'))
print('----------')

message = 'Call 415-555-1011 office'
for i in range(len(message)):
    chunk = message[i:i+12]
    print(chunk)

415-555-4242 is a phone number:
True
Moshi moshi is a phone number:
False
----------
Call 415-555
all 415-555-
ll 415-555-1
l 415-555-10
 415-555-101
415-555-1011
15-555-1011 
5-555-1011 o
-555-1011 of
555-1011 off
55-1011 offi
5-1011 offic
-1011 office
1011 office
011 office
11 office
1 office
 office
office
ffice
fice
ice
ce
e


## Strong password detection

In [51]:
import re

def testPasswordStrength(password):
    eightCharsLongRegex = re.compile(r'[\w\d\s\W\D\S]{8,}')
    upperCaseRegex = re.compile(r'[A-Z]+')
    lowerCaseRegex = re.compile(r'[a-z]+')
    oneOrMoreDigitRegex = re.compile(r'\d+')
    
    if not eightCharsLongRegex.search(password):
        return False
    elif not upperCaseRegex.search(password):
        return False
    elif not lowerCaseRegex.search(password):
        return False
    elif not oneOrMoreDigitRegex.search(password):
        return False
    return True
    

if __name__ == "__main__":
    password = 'A&dsas9$_'
    print(testPasswordStrength(password))

True


In [None]:
# import the regular expressions module
# import re

# Search the string to see if it starts with "You" and ends with "Python"
# newTxt = "You can start learning Python now"
# x = re.search("^You.*Python$", newTxt) 

# if x:
#     print("Yes, we have a match!")
# else:
#     print("No match!")
# Use findall and print a list of all matches
# import re

# newTxt = "Python is very very easy to understand"
# x = re.findall("place", newTxt)
# print(x)
# Search for the first white-space
# import re

# newTxt = "Python is very easy"
# x = re.search("\s", newTxt)

# print("First white-space position:", x.start())
# Making a search that returns no match
# import re

# newTxt = "Python is very fun"
# x = re.search("rain", newTxt)
# print(x)
# Split at white-space in the string (at most 2 splits)
# import re

# newTxt = "Try to code with Python"
# x = re.split("\s", newTxt, 2)

# print(x)

# Replace only the first white-space with $ (count is 1)
# import re

# newTxt = "Python is very easy"
# x = re.sub("\s", "$", newTxt, 1)
# print(x)

# search and return a match object
# import re

# newTxt = "Python is very very easy"
# x = re.search("very", newTxt)

# print(x)  # print an object
# import re

# # Searching for upper case C in the beginning of a word and print its position
# newTxt = "Code with Python now"
# x = re.search(r"\bC\w+", newTxt)

# print(x.span())

import re

# string property to return the string passed into the function

newStr = "Python is so good for beginners"
x = re.search(r"\bP\w+", newStr)

# print(x.string)

# Print the word that contains upper case P in the beginning
print(x.group())

## A closer look at number systems

In [1]:
# A closer look at number systems
# Decimal
a = 10
# Binary
a = 0b1010
# Octal
a = 0o12
# Hexadecimal
a = 0xA
print(f'a: {a}')

a: 10


In [2]:
# Convert a string to an integer, specifying the base
# Decimal base
a = int('23', 10)
# Binary base
a = int('10111', 2)
# Octal base
a = int('27', 8)
# Hexadecimal base
a = int('17', 16)
# Base 5
a = int('344', 5)
print(f'a: {a}')

a: 99
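
The conversion also works in the other direction. A minimal sketch using the built-ins `bin`, `oct`, `hex` and `format`; the value 23 is the one used in the first four conversions above.

```python
a = 23

# Built-ins that render an int in another base, with a prefix.
binary_text = bin(a)
octal_text = oct(a)
hex_text = hex(a)
print(binary_text, octal_text, hex_text)  # 0b10111 0o27 0x17

# format() gives the same digits without the prefix.
plain = (format(a, "b"), format(a, "o"), format(a, "x"))
print(plain)  # ('10111', '27', '17')

# int() accepts the prefixed form when the base matches.
round_trip = int(binary_text, 2)
print(round_trip)  # 23
```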


In [None]:
# A closer look at the str type

# Unicode characters
print('Hola\u0020Mundo')
print('Short notation:', '\u0041')
print('Extended notation:','\U00000041')
print('Hexadecimal notation:','\x41')
print('Heart:','\u2665')
print('Smiling face:','\U0001f600')
print('Snake:','\U0001F40D')

# ASCII characters
caracter = chr(65)
print('Uppercase A:', caracter)
caracter = chr(64)
print('@ symbol:', caracter)
caracter = chr(97)
print('Lowercase a:', caracter)

In [None]:
# A closer look at the str type

# Byte strings
caracteres_en_bytes = b'Hola Mundo'
print(caracteres_en_bytes)

mensaje = b'Universidad Python'
print(mensaje[1])       # indexing a bytes object returns an int (the byte value)
print(chr(mensaje[1]))  # chr() turns that value back into a character

lista_caracteres = mensaje.split()
print(lista_caracteres)

# Convert str to bytes
string = 'Programación con Python'
print('original string:', string)
bytes = string.encode('UTF-8')  # note: this name shadows the built-in bytes type
print('encoded bytes:', bytes)
# Convert bytes to str
string2 = bytes.decode('UTF-8')
print('decoded string:', string2)
print(string == string2)

In [None]:
# Unpacking with *
numeros = [1,2,3]
print(numeros)
print(*numeros)
print(*numeros, sep=' - ')

# Unpacking when passing arguments to a function
def sumar(a, b, c):
    print(f'Sum result: {a + b + c}')

sumar(*numeros)

# Extract parts of a list
mi_lista = [1,2,3,4,5,6]
a, *b, c, d = mi_lista
print(a,b,c,d)

# Join lists
lista1 = [1,2,3]
lista2 = [4,5,6]
lista3 = [lista1, lista2]
print(f'List of lists: {lista3}')
lista3 = [*lista1, *lista2]
print(f'Joined lists: {lista3}')

# Merge dictionaries
dic1 = {'A':1, 'B':2, 'C':3}
dic2 = {'D':4, 'E':5}
dic3 = {**dic1, **dic2}
print(f'Merged dictionaries: {dic3}')

# Build a list from a str
lista = [*'HolaMundo']
print(lista)
print(*lista, sep='')