## 1 - match

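--- The `match` method tries to match the regular expression starting at the beginning of the string. If the match succeeds, it returns a match object; otherwise it returns `None`.

--- Because `match` always starts at the beginning, the pattern has to account for whatever the string starts with, which is inconvenient for extraction. It is better suited to checking whether a string conforms to a given pattern.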
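
--- Since `re.match` returns `None` on failure, calling `.group()` on the result without a check raises `AttributeError`. A minimal sketch of guarding against that; the helper name `starts_with_greeting` and the sample strings are made up for illustration.

```python
import re

def starts_with_greeting(text):
    """Return True if text begins with 'Hello', one whitespace character and digits."""
    # re.match only tries position 0, so an explicit '^' is optional here
    return re.match(r'Hello\s\d+', text) is not None

print(starts_with_greeting('Hello 123 World'))      # 'Hello' is at position 0
print(starts_with_greeting('Say Hello 123 World'))  # 'Hello' is not at position 0

result = re.match(r'Hello\s\d+', 'Say Hello 123 World')
if result is None:
    print('no match')
else:
    print(result.group())
```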

In [1]:
import re

In [2]:
content = 'Hello 123 4567 World_This is a Regex Demo'
result = re.match(r'^Hello\s\d\d\d\s\d{4}\s\w{10}', content)
print(result)
print(result.group())
print(result.span())

<re.Match object; span=(0, 25), match='Hello 123 4567 World_This'>
Hello 123 4567 World_This
(0, 25)


### Match targets

In [3]:
content = 'Hello 1234567 World_This is a Regex Demo'
result = re.match(r'^Hello\s(\d+)\sWorld', content)
print(result)
print(result.group())
print(result.group(1))
print(result.span())

<re.Match object; span=(0, 19), match='Hello 1234567 World'>
Hello 1234567 World
1234567
(0, 19)
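
--- When a pattern contains several groups, `groups()` returns all of them as a tuple, and named groups can be read back with `groupdict()`. A short sketch on the same string; the group names `number` and `word` are chosen only for this example.

```python
import re

content = 'Hello 1234567 World_This is a Regex Demo'
result = re.match(r'^Hello\s(?P<number>\d+)\s(?P<word>\w+)', content)

if result:
    print(result.groups())         # every captured group, in order
    print(result.group('number'))  # same text as result.group(1)
    print(result.groupdict())      # group name -> captured text
    print(result.span(1))          # start and end index of group 1
```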


### General matching

In [4]:
result = re.match('^Hello(.*?)Demo$', content)
print(result)
print(result.group())
print(result.group(1))
print(result.span())

<re.Match object; span=(0, 40), match='Hello 1234567 World_This is a Regex Demo'>
Hello 1234567 World_This is a Regex Demo
 1234567 World_This is a Regex 
(0, 40)


### Greedy vs. non-greedy

In [5]:
result = re.match(r'^He.*(\d+).*Demo$', content)
print(result)
print(result.group())
print(result.group(1))

<re.Match object; span=(0, 40), match='Hello 1234567 World_This is a Regex Demo'>
Hello 1234567 World_This is a Regex Demo
7


In [6]:
result = re.match(r'^He.*?(\d+).*Demo$', content)
print(result)
print(result.group())
print(result.group(1))

<re.Match object; span=(0, 40), match='Hello 1234567 World_This is a Regex Demo'>
Hello 1234567 World_This is a Regex Demo
1234567


In [7]:
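# Prefer non-greedy matching in the middle of a string and greedy matching at the end:
# a trailing (.*?) matches as few characters as possible, which means nothing at all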
content = 'http://weibo.com/comment/kEraCN'
result1 = re.match('http.*?comment(.*?)', content)
result2 = re.match('http.*?comment(.*)', content)
print('result1 ',result1.group(1))
print('result2 ',result2.group(1))

result1  
result2  /kEraCN
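
--- The three cells above come down to how much `.*` and `.*?` consume before the rest of the pattern gets a chance to match. A compact sketch on a shorter, made-up string (`text` is not from the original examples):

```python
import re

text = 'abc 2024 xyz'

# Greedy: .* takes as much as it can and gives back only what \d+ needs (one digit)
greedy = re.match(r'^.*(\d+).*$', text)
# Non-greedy: .*? takes as little as it can, so \d+ gets the whole number
lazy = re.match(r'^.*?(\d+).*$', text)
print(greedy.group(1))
print(lazy.group(1))

# At the end of a pattern a non-greedy group has nothing forcing it to grow
tail_lazy = re.match(r'abc (.*?)', text)
tail_greedy = re.match(r'abc (.*)', text)
print(repr(tail_lazy.group(1)))
print(repr(tail_greedy.group(1)))
```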


### Modifiers

In [8]:
content = '''Hello 1234567 World_This
is a Regex Demo
'''
result = re.match(r'^He.*?(\d+).*Demo$', content, re.S)
print(result.group(1))

1234567
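
--- By default `.` does not match a newline, so the same pattern fails on multi-line input unless `re.S` is passed. The sketch below shows both cases and how flags can be combined with `|`; the lowercase variant of the pattern is added only to illustrate `re.I`.

```python
import re

content = '''Hello 1234567 World_This
is a Regex Demo
'''
pattern = r'^He.*?(\d+).*Demo$'

# Without re.S the '.' stops at the newline, so 'Demo' is never reached
without_flag = re.match(pattern, content)
print(without_flag)

# With re.S the '.' also matches newlines
with_flag = re.match(pattern, content, re.S)
print(with_flag.group(1))

# Flags combine with |: re.I additionally ignores case
combined = re.match(r'^he.*?(\d+).*demo$', content, re.S | re.I)
print(combined.group(1))
```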


### Escaped matching

In [9]:
content = '(百度) www.baidu.com'
result = re.match(r'\(百度\)(.*)\.(.*)\.(.*)', content)
print(result)
print(result.group(1))
print(result.group(2))
print(result.group(3))

<re.Match object; span=(0, 18), match='(百度) www.baidu.com'>
 www
baidu
com
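
--- When the literal text comes from a variable, escaping it by hand is error-prone. `re.escape` adds the backslashes for you. A small sketch using the same string; the variable name `literal` is introduced only for this example.

```python
import re

content = '(百度) www.baidu.com'
literal = '(百度)'  # text that should be matched literally

# Unescaped, the parentheses are read as a group, so position 0 does not match
unescaped = re.match(literal, content)
print(unescaped)

# re.escape backslash-escapes the special characters in the literal
pattern = re.escape(literal) + r'\s(.*)'
escaped = re.match(pattern, content)
print(escaped.group(1))
```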


## 2 - search

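--- As noted above, `match` only matches from the beginning of the string, so any leading content has to be covered by the pattern. `search` instead scans the whole string and returns the first successful match, which makes it more convenient for extraction.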

In [10]:
content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
result = re.search(r'Hello.*?(\d+).*Demo', content)
print(result)
print(result.group(1))

<re.Match object; span=(13, 53), match='Hello 1234567 World_This is a Regex Demo'>
1234567
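
--- Running the same pattern through both functions makes the difference concrete. A minimal comparison on the string from the cell above:

```python
import re

content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
pattern = r'Hello.*?(\d+).*Demo'

# match only tries position 0, where the string starts with 'Extra'
matched = re.match(pattern, content)
print(matched)

# search tries successive positions until the pattern fits
found = re.search(pattern, content)
print(found.span())
print(content[found.start():found.end()])
```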


In [21]:
html = '''
<div id="songs-list">
  <h2 class="title">经典老歌</h2>
  <p class="introduction">经典老歌列表</p>
  <ul id="list" class="list-group">
    <li data-view="2">一路上有你</li>
    <li data-view="7">
      <a href="/2.mp3" singer="任贤齐">沧海一声笑</a>
    </li>
    <li data-view="4" class="active">
      <a href="/3.mp3" singer="齐秦">往事随风</a>
    </li>
    <li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
    <li data-view="5"><a href="/5.mp3" singer="陈慧琳">记事本</a></li>
    <li data-view="5">
      <a href="/6.mp3" singer="邓丽君">但愿人长久</a>
    </li>
  </ul>
</div>
'''

In [43]:
pattern = r'<li.*?active.*?href="(.*?)"\ssinger="(.*?)">(.*?)</a>'
result = re.search(pattern, html, re.S)
# print(result)
print(result.group(0))
print(result.group(1))
print(result.group(2))
print(result.group(3))

<li data-view="2">一路上有你</li>
    <li data-view="7">
      <a href="/2.mp3" singer="任贤齐">沧海一声笑</a>
    </li>
    <li data-view="4" class="active">
      <a href="/3.mp3" singer="齐秦">往事随风</a>
/3.mp3
齐秦
往事随风
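
--- Note that `group(0)` above starts at the first `<li`, not at the active one: `search` returns the leftmost match, and with `re.S` the `.*?` after `<li` is free to run across tags and lines until it reaches `active`. The captured groups are still correct. If the full match matters, one possible tightening is to replace `.*?` with `[^>]*` so the pattern cannot leave the opening tag. The sketch below uses a shortened copy of the HTML, and the names `loose` and `tight` are introduced only here.

```python
import re

html = '''
<ul id="list" class="list-group">
  <li data-view="2">一路上有你</li>
  <li data-view="4" class="active">
    <a href="/3.mp3" singer="齐秦">往事随风</a>
  </li>
</ul>
'''

loose = r'<li.*?active.*?href="(.*?)"\ssinger="(.*?)">(.*?)</a>'
# [^>]* cannot cross the '>' that closes a tag, so 'active' must be inside this <li ...>
tight = r'<li[^>]*active[^>]*>\s*<a href="(.*?)"\ssinger="(.*?)">(.*?)</a>'

loose_match = re.search(loose, html, re.S)
tight_match = re.search(tight, html)

print(loose_match.group(0).count('<li'))  # number of <li tags swallowed by the full match
print(tight_match.group(0).count('<li'))
print(tight_match.groups())
```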


## 3 - findall

In [47]:
results = re.findall(r'<li.*?href="(.*?)"\ssinger="(.*?)">(.*?)</a>', html, re.S)
# print(results)
for result in results:
    print(result)
    print(result[0], result[1], result[2])

('/2.mp3', '任贤齐', '沧海一声笑')
/2.mp3 任贤齐 沧海一声笑
('/3.mp3', '齐秦', '往事随风')
/3.mp3 齐秦 往事随风
('/4.mp3', 'beyond', '光辉岁月')
/4.mp3 beyond 光辉岁月
('/5.mp3', '陈慧琳', '记事本')
/5.mp3 陈慧琳 记事本
('/6.mp3', '邓丽君', '但愿人长久')
/6.mp3 邓丽君 但愿人长久
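
--- `findall` returns a list of tuples when the pattern has several groups and a list of plain strings when it has exactly one. If match objects are more convenient (for named groups or positions), `re.finditer` yields them one at a time. A sketch on a two-item excerpt of the HTML; the group names `href`, `singer` and `title` are chosen for this example.

```python
import re

html = '''
<li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
<li data-view="5"><a href="/5.mp3" singer="陈慧琳">记事本</a></li>
'''

# One group: findall returns a list of strings
hrefs = re.findall(r'href="(.*?)"', html)
print(hrefs)

# finditer yields match objects, so named groups are available
pattern = r'<li.*?href="(?P<href>.*?)"\ssinger="(?P<singer>.*?)">(?P<title>.*?)</a>'
songs = [m.groupdict() for m in re.finditer(pattern, html, re.S)]
for song in songs:
    print(song['href'], song['singer'], song['title'])
```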


## 4 - sub

In [48]:
content = '54aK54yr5oiR54ix5L2g'
content = re.sub(r'\d+', '', content)
print(content)

aKyroiRixLg
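
--- The replacement does not have to be an empty string. It can refer back to captured groups with `\1`, `\2`, and so on, and the `count` argument limits how many replacements are made. A brief sketch; the date string is an assumed sample, not part of the cell above.

```python
import re

content = '2019-12-15 12:00'

# \1, \2, \3 in the replacement refer to the captured year, month and day
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', content)
print(swapped)

# count=2 replaces only the first two runs of digits
limited = re.sub(r'\d+', '#', '54aK54yr5oiR', count=2)
print(limited)
```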


In [51]:
content1 = '2019-12-15 12:00'
content2 = '2019-12-17 12:55'
content3 = '2019-12-22 13:21'
pattern = re.compile(r'\d{2}:\d{2}')
result1 = re.sub(pattern, '', content1)
result2 = re.sub(pattern, '', content2)
result3 = re.sub(pattern, '', content3)
print(result1, result2, result3)

2019-12-15  2019-12-17  2019-12-22 
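
--- The output above keeps a trailing space after each date because only the time itself is removed. A compiled pattern also exposes `sub`, `search` and the other functions as methods, so it does not have to be passed back to `re.sub`. A small variation that folds the leading whitespace into the pattern; the names `time_pattern` and `contents` are introduced for this example.

```python
import re

# \s* also removes the space that separates the date from the time
time_pattern = re.compile(r'\s*\d{2}:\d{2}')

contents = ['2019-12-15 12:00', '2019-12-17 12:55', '2019-12-22 13:21']
# Call sub directly on the compiled pattern object
dates = [time_pattern.sub('', c) for c in contents]
print(dates)
```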
