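A regular expression (often abbreviated in code as RegEx, RegExp, or RE) is a concept from computer science. Regular expressions are typically used to search for, and replace, text that matches a given pattern (rule).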

The concept was first popularized by Unix utilities such as sed and grep.

What regular expressions are used for:

- Matching
- Replacing
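
As a minimal sketch of these two uses with Python's `re` module, the snippet below runs one search and one substitution. The sample sentence and the variable names are illustrative only.

```python
import re

sentence = "The quick brown fox"

# Matching: re.search returns a match object, or None when nothing matches.
match = re.search(r"qu\w+", sentence)
matched_word = match.group() if match else None
print(matched_word)

# Replacing: re.sub returns a new string with every match replaced.
replaced = re.sub(r"brown", "red", sentence)
print(replaced)
```

`re.sub` does not modify `sentence` in place; strings are immutable, so the result has to be assigned to a name.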

Bookmarked reference:
[Regular expression syntax](https://blog.csdn.net/JankinChan/article/details/88780054)

## Regular expression syntax symbols
~~~
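^      start of the string (or line)
$      end of the string (or line)
.      any single character except a newline
*      match the preceding element 0 or more times
+      match the preceding element 1 or more times
?      match the preceding element 0 or 1 times; after a quantifier (*?, +?) it makes the match non-greedy
\s     whitespace character
\S     non-whitespace character
[abc]  match one character in the specified set
[^abc] match one character not in the specified set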
~~~
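
The two roles of `?` are easy to confuse, so here is a small sketch that contrasts them. The sample strings are made up for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: `.+` grabs as much as possible, so the match runs to the last `>`.
greedy = re.findall(r"<.+>", html)

# Non-greedy: `.+?` stops at the first `>` that lets the pattern succeed.
lazy = re.findall(r"<.+?>", html)

# As a plain quantifier, `?` makes the preceding element optional.
optional = re.findall(r"colou?r", "color colour colr")

print(greedy)
print(lazy)
print(optional)
```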

### Several regex searches on one string

In [1]:
import re
my_string='15 send an email from this@email.com to test@user.com 3466 times.'

result0 = re.findall(r'[abcd]', my_string)  # every single a, b, c, or d
print('result0:\n',result0,end='\n\n')

result1 = re.findall(r'[0-9]+', my_string)  # runs of one or more digits
print('result1:\n',result1,end='\n\n')

result2 = re.findall(r'[0-9].', my_string)  # a digit followed by any one character
print('result2:\n',result2,end='\n\n')

result3 = re.findall(r'[^0-9].', my_string)  # a non-digit followed by any one character
print('result3:\n',result3,end='\n\n')

result4 = re.findall(r'[^a-z]+', my_string)  # runs of anything except lowercase letters
print('result4:\n',result4,end='\n\n')

result5 = re.findall(r'[a-z]+', my_string)  # runs of lowercase letters
print('result5:\n',result5,end='\n\n')

result6 = re.findall(r'\s+', my_string)  # runs of whitespace
print('result6:\n',result6,end='\n\n')

result7 = re.findall(r'\S+', my_string)  # runs of non-whitespace
print('result7:\n',result7,end='\n\n')

result8 = re.findall(r'\S+@\S+', my_string)  # non-whitespace on both sides of an @
print('result8:\n',result8,end='\n\n')

result0:
 ['d', 'a', 'a', 'a', 'c', 'c']

result1:
 ['15', '3466']

result2:
 ['15', '34', '66']

result3:
 [' s', 'en', 'd ', 'an', ' e', 'ma', 'il', ' f', 'ro', 'm ', 'th', 'is', '@e', 'ma', 'il', '.c', 'om', ' t', 'o ', 'te', 'st', '@u', 'se', 'r.', 'co', 'm ', ' t', 'im', 'es']

result4:
 ['15 ', ' ', ' ', ' ', ' ', '@', '.', ' ', ' ', '@', '.', ' 3466 ', '.']

result5:
 ['send', 'an', 'email', 'from', 'this', 'email', 'com', 'to', 'test', 'user', 'com', 'times']

result6:
 [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']

result7:
 ['15', 'send', 'an', 'email', 'from', 'this@email.com', 'to', 'test@user.com', '3466', 'times.']

result8:
 ['this@email.com', 'test@user.com']



### Using re.compile on a string

In [86]:
import re
sentence = 'The quick brown fox jumps over the lazy dog'
pttn = re.compile(r'\wo\w')  # an "o" with a word character on each side
pttn.findall(sentence)

['row', 'fox', 'dog']
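
A compiled pattern is an object with its own `findall`, `search`, and `sub` methods, so it can be reused without passing the pattern string each time. The sketch below reuses the same pattern three ways; the variable names are illustrative.

```python
import re

word_with_o = re.compile(r"\wo\w")
sentence = "The quick brown fox jumps over the lazy dog"

# All non-overlapping matches.
all_matches = word_with_o.findall(sentence)

# First match only, with its position in the string.
first = word_with_o.search(sentence)
first_span = first.span()

# Replace every match with a placeholder.
masked = word_with_o.sub("___", sentence)

print(all_matches)
print(first.group(), first_span)
print(masked)
```

Compiling is mainly a readability and reuse convenience when the same pattern is applied many times.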

| Rank | Atom and operator precedence (highest to lowest) | Syntax |
|---|-----------------------------------|------------------------|
| 1 | Escaping symbol | `\` |
| 2 | Grouping or capturing | `(...)` `(?:...)` `(?=...)` `(?!...)` `(?<=...)` `(?<!...)` |
| 3 | Quantifiers | `a*` `a+` `a?` `a{n,m}` |
| 4 | Sequence and anchor | `abc` `^` `$` `\b` `\B` |
| 5 | Alternation | <code>a&#124;b&#124;c</code> |
| 6 | Atoms | `a` `[^abc]` `\t` `\r` `\n` `\d` `\D` `\s` `\S` `\w` `\W` `.` |
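
The effect of this ordering can be checked with a few `re.fullmatch` calls. This is a small sketch with made-up test strings: it shows that a quantifier applies only to the element right before it, and that alternation splits the whole sequence unless a group limits it.

```python
import re

# Quantifiers bind tighter than sequence: in `ab*` the star applies to `b` only.
star_on_b = re.fullmatch(r"ab*", "abbb") is not None
star_not_on_ab = re.fullmatch(r"ab*", "abab") is not None
star_on_group = re.fullmatch(r"(ab)*", "abab") is not None

# Sequence binds tighter than alternation: `ab|cd` means `ab` or `cd`.
either_pair = [s for s in ("ab", "cd", "acd") if re.fullmatch(r"ab|cd", s)]

# Grouping limits the alternation to the middle character.
grouped = [s for s in ("ab", "abd", "acd") if re.fullmatch(r"a(b|c)d", s)]

print(star_on_b, star_not_on_ab, star_on_group)
print(either_pair, grouped)
```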


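### Fixed prefix with a variable middle character
A set atom is written with square brackets `[]`. `[abc]` means "`a` or `b` or `c`", that is, any single one of the characters `a`, `b`, and `c`.

For example, [`beg[iau]n`](https://regexper.com#beg[iau]n) matches `begin`, `began`, and `begun`.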

In [2]:
import re

text = 'begin began begun bigins begining begsn'
pttn = r'beg[iaus]n'  # the set here also contains s, so begsn matches as well
re.findall(pttn, text)

['begin', 'began', 'begun', 'begin', 'begsn']

## Applying regular expressions to a Shakespeare text file


### Open the file and find lines that start with A

In [77]:
with open('shakespeare.txt') as text:
    for line in text:
        line = line.rstrip()
        # `.+` matches one or more characters, so at least one character must follow the A
        if re.search(r'^A.+$', line):
            print(line)

Act 2, Scene 2
Arise, fair sun, and kill the envious moon,
And none but fools do wear it; cast it off.
As daylight doth a lamp; her eyes in heaven
Ay me!
As glorious to this night, being o'er my head
As is a winged messenger of heaven
And sails upon the bosom of the air.
And I'll no longer be a Capulet.
And for that name which is no part of thee
Art thou not Romeo and a Montague?
And the place death, considering who thou art,
And what love can do that dares love attempt;
Alack, there lies more peril in thine eye
And I am proof against their enmity.
And but thou love me, let them find me here:
As that vast shore wash'd with the farthest sea,
And I will take thy word: yet if thou swear'st,
And therefore thou mayst think my 'havior light:
And not impute this yielding to light love,
And I'll believe thee.
And yet I would it were to give again.
And yet I wish but for the thing I have:
Anon, good nurse! Sweet Montague, be true.
And all my fortunes at thy foot I'll lay
And follow thee my lord
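
The same filter can be tried without the file by using `re.MULTILINE`, which makes `^` and `$` match at the start and end of every line in one string. This is a sketch on a short hand-written sample, not the contents of `shakespeare.txt`.

```python
import re

# Hand-written sample; the last line is a lone "A" on purpose.
sample = """ROMEO
Arise, fair sun, and kill the envious moon,
But soft! what light through yonder window breaks?
Ay me!
A
"""

# With re.MULTILINE, ^ and $ anchor at each line, not only at the whole string.
a_lines = re.findall(r"^A.+$", sample, flags=re.MULTILINE)

for line in a_lines:
    print(line)
```

The lone `A` line is skipped because `.+` needs at least one more character and `.` does not match a newline.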

### Open the file and find lines that start with M, contain an o, and end with a comma

In [59]:
with open('shakespeare.txt') as text:
    for line in text:
        line = line.rstrip()
        # starts with M, has an o somewhere in the middle, and ends with a comma
        if re.search(r'^M.+o.+,$', line):
            print(line)

My name, dear saint, is hateful to myself,
My true love's passion: therefore pardon me,
My bounty is as boundless as the sea,
My love as deep; the more I give to thee,


### Open the file and find lines that start with A and contain "again"

In [60]:
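with open('shakespeare.txt') as text:
    for line in text:
        line = line.rstrip()
        # "again" must be followed by at least one more character; words such as "against" also match
        if re.search(r'^A.+again.+$', line):
            print(line)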

And I am proof against their enmity.
And yet I would it were to give again.
And with a silk thread plucks it back again,
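
The first line of this output matches because `again` is found inside `against`. If only the whole word is wanted, one option is the word-boundary anchor `\b`. The sketch below applies it to the three output lines above, copied into a list so it runs without the file.

```python
import re

lines = [
    "And I am proof against their enmity.",
    "And yet I would it were to give again.",
    "And with a silk thread plucks it back again,",
]

# \b matches between a word character and a non-word character,
# so "again" inside "against" is rejected.
pattern = re.compile(r"^A.*\bagain\b")
whole_word = [line for line in lines if pattern.search(line)]

for line in whole_word:
    print(line)
```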


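### Open the file and find lines made up of 3 to 8 letters from A-Z
#### Written this way, only all-uppercase lines are found; lowercase letters are not matched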

In [105]:
with open('shakespeare.txt') as text:
    for line in text:
        line = line.rstrip()
        # the whole line must consist of 3 to 8 uppercase letters
        if re.search(r'^[A-Z]{3,8}$', line):
            print(line)

ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
JULIET
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
EXIT
AAA
ABC
BCD
CCC
FSDDF
GGDGDG


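#### Written this way, uppercase letters, lowercase letters, and digits are all found
The allowed characters are listed inside the square brackets at the start of the pattern.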

In [109]:
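with open('shakespeare.txt') as text:
    for line in text:
        line = line.rstrip()
        # ranges in a character class are written back to back; a comma inside the brackets would match a literal comma
        if re.search(r'^[A-Za-z0-9]{1,8}$', line):
            print(line)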

99
3242
T800
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
Nurse
JULIET
Nurse
JULIET
ROMEO
JULIET
ROMEO
Retiring
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
JULIET
ROMEO
Exit
Exit
EXIT
AAA
ABC
Abc
BCD
CCC
FSDDF
GGDGDG
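
One detail worth checking when several ranges are listed in one character class: a comma between ranges is not a separator, it is matched as a literal comma. The sketch below compares a class written with commas against one written without them, on a few made-up candidate lines.

```python
import re

with_commas = re.compile(r"^[A-Z,a-z,0-9]{1,8}$")
without_commas = re.compile(r"^[A-Za-z0-9]{1,8}$")

# Made-up candidates; two of them contain commas.
candidates = ["ROMEO", "T800", "Ay,me", ",,,", "Ay me!"]

accepted_with = [c for c in candidates if with_commas.search(c)]
accepted_without = [c for c in candidates if without_commas.search(c)]

print(accepted_with)
print(accepted_without)
```

On text where no short line contains a comma, both classes give the same output, which is why the difference is easy to miss.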
