### Section 74.1: Matching the beginning of a string

In [2]:
import re
pattern = r"123"
string = "123zzb"
re.match(pattern, string)

<_sre.SRE_Match object; span=(0, 3), match='123'>

In [3]:
match = re.match(pattern, string)
match.group()

'123'

1. You may notice that the pattern variable is a string prefixed with r , which marks it as a raw string literal.
2. A raw string literal has slightly different syntax from a normal string literal: a backslash \ in a raw string literal
means "just a backslash", so there is no need to double up backslashes to keep them from starting escape sequences such as
newlines ( \n ), tabs ( \t ), backspaces ( \b ), form feeds ( \f ), and so on. In normal string literals, each backslash must be
doubled up to avoid being taken as the start of an escape sequence.
3. Hence, r"\n" is a string of 2 characters: \ and n . Regex patterns also use backslashes, e.g. \d matches any digit
character. We can avoid having to double-escape our strings ( "\\d" ) by using raw strings ( r"\d" ).
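
The points above can be checked directly. This is a minimal sketch; the variable names `normal` and `raw` are illustrative and do not appear in the cells that follow.

```python
# Compare a normal string literal with a raw string literal.
normal = "\n"   # one character: a newline
raw = r"\n"     # two characters: a backslash and the letter n

print(len(normal))      # 1
print(len(raw))         # 2
print(raw == "\\n")     # True: doubling the backslash gives the same string
print(r"\d" == "\\d")   # True: both spell the regex \d
```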

In [4]:
string = "\\t123zzb" # here the backslash is escaped, so there's no tab, just '\' and 't'
pattern = "\\t123" # the regex engine receives \t123, which matches a tab character followed by 123
re.match(pattern, string).group() # no match: re.match returns None, so .group() raises

AttributeError: 'NoneType' object has no attribute 'group'

In [5]:
re.match(pattern, "\t123zzb").group()

'\t123'

In [6]:
pattern = r"\\t123"
re.match(pattern, string).group()

'\\t123'

In [7]:
match = re.match(r"(123)", "a123zzb")
match is None

True

In [8]:
match = re.search(r"(123)", "a123zzb")

In [9]:
match.group()

'123'

### Section 74.2: Searching

In [10]:
pattern = r"(your base)"
sentence = "All your base are belong to us."
match = re.search(pattern, sentence)
match.group(1)

'your base'

In [11]:
match = re.search(r"(belong.*)", sentence)
match.group(1)

'belong to us.'

In [12]:
match = re.search(r"^123", "123zzb")
match.group(0)

'123'

In [13]:
match = re.search(r"^123", "a123zzb")
match is None

True

In [14]:
match = re.search(r"123$", "zzb123")
match.group(0)

'123'

In [15]:
match = re.search(r"123$", "123zzb")
match is None

True

In [16]:
match = re.search(r"^123$", "123")
match.group(0)

'123'

### Section 74.3: Precompiled patterns

In [17]:
import re
precompiled_pattern = re.compile(r"(\d+)")
matches = precompiled_pattern.search("The answer is 41!")
matches.group(1)

'41'

In [18]:
matches

<_sre.SRE_Match object; span=(14, 16), match='41'>

In [24]:
matches.group(0)

'41'

In [25]:
matches = precompiled_pattern.search("Or was it 42?")
matches.group(1)

'42'

In [26]:
import re
precompiled_pattern = re.compile(r"(.*\d+)")
matches = precompiled_pattern.match("The answer is 41!")
print(matches.group(1))
# Out: The answer is 41
matches = precompiled_pattern.match("Or was it 42?")
print(matches.group(1))

The answer is 41
Or was it 42


### Section 74.4: Flags

In [27]:
m = re.search("b", "ABC")
m is None

True

In [28]:
m = re.search("b", "ABC", flags=re.IGNORECASE)
m.group()

'B'

In [29]:
m = re.search("a.b", "A\nBC", flags=re.IGNORECASE)
m is None

True

In [30]:
m = re.search("a.b", "A\nBC", flags=re.IGNORECASE|re.DOTALL)
m.group()

'A\nB'

#### Inline flags
From the docs:
> (?iLmsux) (One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.)

> The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale
> dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the
> entire regular expression. This is useful if you wish to include the flags as part of the regular expression,
> instead of passing a flag argument to the re.compile() function.
> Note that the (?x) flag changes how the expression is parsed. It should be used first in the expression
> string, or after one or more whitespace characters. If there are non-whitespace characters before the flag,
> the results are undefined.
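
As a rough illustration of the quoted passage, the flag examples from this section can be rewritten with inline flags instead of the `flags` argument. This is a sketch only; the names `m` and `m2` are illustrative.

```python
import re

# (?i) inside the pattern has the same effect as flags=re.IGNORECASE.
m = re.search(r"(?i)b", "ABC")
print(m.group())         # B

# (?is) combines IGNORECASE and DOTALL, so '.' also matches the newline.
m2 = re.search(r"(?is)a.b", "A\nBC")
print(repr(m2.group()))  # 'A\nB'
```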

### Section 74.5: Replacing

In [31]:
re.sub(r"t[0-9][0-9]", "foo", "my name t13 is t44 what t99 ever t44")

'my name foo is foo what foo ever foo'

In [32]:
re.sub(r"t([0-9])([0-9])", r"t\2\1", "t13 t19 t81 t25")

't31 t91 t18 t52'

In [33]:
re.sub(r"t([0-9])([0-9])", r"t\g<2>\g<1>", "t13 t19 t81 t25")

't31 t91 t18 t52'

In [34]:
items = ["zero", "one", "two"]
re.sub(r"a\[([0-3])\]", lambda match: items[int(match.group(1))], "Items: a[0], a[1], something,a[2]")

'Items: zero, one, something,two'

### Section 74.6: Find All Non-Overlapping Matches

In [35]:
re.findall(r"[0-9]{2,3}", "some 1 text 12 is 945 here 4445588899")

['12', '945', '444', '558', '889']

In [36]:
results = re.finditer(r"([0-9]{2,3})", "some 1 text 12 is 945 here 4445588899")
print(results)

<callable_iterator object at 0x000001BE14189080>


In [37]:
for result in results:
    print(result.group(0))

12
945
444
558
889


### Section 74.7: Checking for allowed characters

In [38]:
import re
def is_allowed(string):
    # Search for any character outside the allowed set: letters, digits and '.'
    character_regex = re.compile(r'[^a-zA-Z0-9.]')
    disallowed = character_regex.search(string)
    return not bool(disallowed)
print(is_allowed("abyzABYZ0099"))

True


In [40]:
print(is_allowed("#*@#$%^123"))

False


### Section 74.8: Splitting a string using regular expressions

In [41]:
import re
data = re.split(r'\s+', 'James 94 Samantha 417 Scarlett 74')
print( data )

['James', '94', 'Samantha', '417', 'Scarlett', '74']


### Section 74.9: Grouping

Grouping is done with parentheses. Calling group() with no argument returns the entire match; passing a group number
or name returns the text matched by that parenthesized subgroup.

In [43]:
match.group() # Group without argument returns the entire match found
# Out: '123'

'123'

In [44]:
match.group(0) # Specifying 0 gives the same result as specifying no argument
# Out: '123'

'123'

In [47]:
sentence = "This is a phone number 672-123-456-9910"
pattern = r".*(phone).*?([\d-]+)"
match = re.match(pattern, sentence)
match.groups()

('phone', '672-123-456-9910')

In [49]:
match.group()

'This is a phone number 672-123-456-9910'

In [50]:
match.group(0)

'This is a phone number 672-123-456-9910'

In [51]:
match.group(1)

'phone'

In [52]:
match.group(2)

'672-123-456-9910'

In [54]:
match.group(1, 2)

('phone', '672-123-456-9910')

#### Named groups

In [55]:
match = re.search(r'My name is (?P<name>[A-Za-z ]+)', 'My name is John Smith')
match.group('name')

'John Smith'

In [56]:
match.group(1)

'John Smith'

#### Non-capturing groups
Using (?:) creates a group, but the group isn't captured. This means you can use it as a group, but it won't pollute
your "group space".

In [57]:
re.match(r'(\d+)(\+(\d+))?', '11+22').groups()

('11', '+22', '22')

In [58]:
re.match(r'(\d+)(?:\+(\d+))?', '11+22').groups()

('11', '22')

### Section 74.10: Escaping Special Characters

In [59]:
match = re.search(r'[b]', 'a[b]c')
match.group()

'b'

In [60]:
match = re.search(r'\[b\]', 'a[b]c')
match.group()

'[b]'

In [61]:
re.escape('a[b]c')

'a\\[b\\]c'

In [62]:
match = re.search(re.escape('a[b]c'), 'a[b]c')
match.group()

'a[b]c'

In [63]:
username = 'A.C.' # suppose this came from the user
re.findall(r'Hi {}!'.format(username), 'Hi A.C.! Hi ABCD!')

['Hi A.C.!', 'Hi ABCD!']

In [64]:
re.findall(r'Hi {}!'.format(re.escape(username)), 'Hi A.C.! Hi ABCD!')

['Hi A.C.!']

### Section 74.11: Match an expression only in specific locations

Often you want to match an expression only in specific places, leaving it untouched elsewhere. Consider
the following sentence:

An apple a day keeps the doctor away (I eat an apple everyday).

Here "apple" occurs twice, but only the occurrence outside the parentheses should match. This can be solved with
so-called backtracking control verbs, which are supported by **the newer regex module**. The idea is:

forget_this | or this | and this as well | (but keep this)

With our apple example, this would be:

In [73]:
import regex as re
string = "An apple a day keeps the doctor away (I eat an apple everyday)."
rx = re.compile(r'''
    \([^()]*\) (*SKIP)(*FAIL) # match anything in parentheses and "throw it away"
    |                         # or
    apple                     # match an apple
    ''', re.VERBOSE)
apples = rx.findall(string)
print(apples)

ModuleNotFoundError: No module named 'regex'

In [None]:
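The new regex module was not found. It is a third-party package and has to be installed separately (for example with pip install regex).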
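
When the regex package is unavailable, as in the ModuleNotFoundError above, the same "discard this | (but keep this)" idea can be approximated with the standard library re module. The sketch below is an alternative technique, not the backtracking-verb approach: the unwanted parenthesized span is consumed by an uncaptured alternative, and only the captured alternative is kept.

```python
import re  # standard library module, not the third-party regex package

string = "An apple a day keeps the doctor away (I eat an apple everyday)."

# First alternative consumes a parenthesized span without capturing it;
# second alternative captures "apple" in group 1.
rx = re.compile(r"\([^()]*\)|(apple)")

# findall returns group 1 for each match, which is '' whenever the
# first alternative matched, so empty strings are filtered out.
apples = [m for m in rx.findall(string) if m]
print(apples)  # ['apple']
```

Unlike (*SKIP)(*FAIL), this still produces a match object for every discarded span, so the filtering step is needed.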

### Section 74.12: Iterating over matches using `re.finditer`

In [71]:
import re
text = 'You can try to find an ant in this string'
pattern = r'an?\w' # match 'a', then an optional 'n', then one word character
for match in re.finditer(pattern, text):
    # Start index of match (integer)
    sStart = match.start()
    # End index of match (integer, exclusive)
    sEnd = match.end()
    # Complete match (string)
    sGroup = match.group()
    # Print match
    print('Match "{}" found at: [{},{}]'.format(sGroup, sStart,sEnd))

Match "an" found at: [5,7]
Match "an" found at: [20,22]
Match "ant" found at: [23,26]
