In [1]:
import re

**What's a matching group?**

Just as you use parentheses to structure mathematical expressions, **(2 + 2) * 2** versus **2 + (2 * 2)**, you use parentheses to structure regular expressions.

An example regex that does this is **'a(b|c)'**.

The whole content enclosed in the opening and closing parentheses is called a matching group (or capture group). You can have multiple matching groups in a single regex. You can even nest matching groups, for example **'a(b|(cd))'**.

One big advantage of a matching group is that it captures the matched substring. You can retrieve it in other parts of the regular expression, or in your code after the whole regex has matched.
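To make the nesting concrete, here is a minimal sketch that runs the nested pattern **'a(b|(cd))'** against two short strings. The input strings and variable names are made up for illustration. Groups are numbered by the position of their opening parenthesis, and a group that takes no part in the match yields None.

```python
import re

pattern = r'a(b|(cd))'

# 'ab' takes the left branch: the inner group (cd) never participates.
m1 = re.search(pattern, 'ab')
outer1 = m1.group(1)   # 'b'
inner1 = m1.group(2)   # None

# 'acd' takes the right branch: both groups capture 'cd'.
m2 = re.search(pattern, 'acd')
outer2 = m2.group(1)   # 'cd'
inner2 = m2.group(2)   # 'cd'

print(outer1, inner1, outer2, inner2)
```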

Let's look at a short example of the most basic use of a matching group: structuring the regex.

Say you create the regex **'b?(a.)*'** with the matching group **(a.)**. It matches zero or one occurrence of the character 'b', followed by an arbitrary number of two-character sequences that each start with the character 'a'.

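Hence, the regex finds a match in each of the strings **'bacacaca', 'aaaa', ''** (the empty string), and **'Xababababab'**. Note that every part of the pattern is optional, so the regex also matches the empty string. That is exactly what **re.search** finds at the start of **'Xababababab'**, because the leading 'X' fits neither 'b' nor 'a'.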
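If you want to check what the pattern actually consumes, a small sketch like the following may help. It compares **re.search**, which returns the leftmost match (possibly empty), with **re.fullmatch**, which requires the whole string to fit the pattern. The dictionary names are chosen only for this illustration.

```python
import re

pattern = r'b?(a.)*'
candidates = ['bacacaca', 'aaaa', '', 'Xababababab']

# Leftmost match found by re.search; it can be the empty string.
found = {s: re.search(pattern, s).group(0) for s in candidates}

# True only if the entire string fits the pattern.
fits_fully = {s: re.fullmatch(pattern, s) is not None for s in candidates}

for s in candidates:
    print(repr(s), repr(found[s]), fits_fully[s])
```

In this sketch, 'bacacaca' is matched only up to 'bacacac': the final 'a' has no partner character, so it cannot form another two-character sequence.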

The use of the parentheses for structuring the regular expression is intuitive and should come naturally to you because the same rules apply as for arithmetic operations. However, there's a more advanced use of regex groups: **retrieval**.

You can retrieve the matched content of each matching group. So the next question naturally arises:

**How to Get the First Matching Group?**

There are two scenarios when you want to access the content of your matching groups:

 1. Access the matching group in the regex pattern to reuse partially matched text from one group somewhere else.
 2. Access the matching group after the whole match operation to analyze the matched text in your Python code.

In the first case, you refer to a matching group with the **\number** special sequence. For example, to get the first matching group, you'd use the special sequence **\1**.

In [2]:
re.search(r'(j.n) is \1','jon is jon')

<re.Match object; span=(0, 10), match='jon is jon'>

You'll use this feature a lot because it gives you much more expressive power: for example, you can search for a name in a text and then look for exactly this name in the rest of the text (and not for all other names that would also fit the pattern).
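The difference between reusing a group and simply repeating its pattern can be sketched as follows. The pattern names and the second test string 'jon is jan' are assumptions made up for this illustration.

```python
import re

# \1 must repeat the exact text captured by group 1.
with_backref = r'(j.n) is \1'
# Two independent groups: each may match a different name.
two_groups = r'(j.n) is (j.n)'

same_name = re.search(with_backref, 'jon is jon')
other_name = re.search(with_backref, 'jon is jan')
loose = re.search(two_groups, 'jon is jan')

print(same_name.group(0))   # 'jon is jon'
print(other_name)           # None: 'jan' is not the captured 'jon'
print(loose.group(0))       # 'jon is jan'
```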

Note that the numbering of the groups starts with 1 and not with 0, a rare exception to the common programming convention of counting from 0.

In the second case, you want to know the contents of the first group after the whole match. How do you do that?

The answer is also simple: call the m.group(1) method on the match object m. Here's an example:

In [3]:
m = re.search(r'(j.n)','jon is jon')

In [4]:
m.group(1)

'jon'

The numbering works consistently with the previously introduced regex group numbering: start with identifier 1 to access the contents of the first group.

**How to Get All Other Matching Groups?**

Again, there are two different intentions when asking this question:

 1. Access the matching group in the regex pattern to reuse partially matched text from one group somewhere else.
 2. Access the matching group after the whole match operation to analyze the matched text in your Python code.

In the first case, you use the special sequence **\2** to access the second matching group, **\3** to access the third matching group, and **\99** to access the ninety-ninth matching group.

Here's an example:


In [5]:
re.search(r'(j..) (j..)\s+\2', 'jon jim jim')

<re.Match object; span=(0, 11), match='jon jim jim'>

In [6]:
re.search(r'(j..) (j..)\s+\2', 'jon jim jon')

As you can see, the special sequence **\2** refers to the matching contents of the second group **'jim'**. The second search returns None, so the notebook shows no output: the text after the whitespace is 'jon', not the captured 'jim'.

In the second case, you likewise increase the identifier to access the other matching groups in your Python code:

In [7]:
m = re.search(r'(j..) (j..)\s+', 'jon jim jim')

In [8]:
m.group(0)

'jon jim '

In [9]:
m.group(1)

'jon'

In [10]:
m.group(2)

'jim'

This code also shows an interesting feature: if you pass the identifier 0 to the **m.group()** method, the regex module gives you the contents of the whole match. You can think of the whole match as group number 0, with the parenthesized groups numbered from 1.
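As a small extension of the calls above, the following sketch shows a few equivalent ways to read groups from a match object. It reuses the same pattern and string, and the variable names are chosen only for illustration.

```python
import re

m = re.search(r'(j..) (j..)\s+', 'jon jim jim')

# Calling group() without arguments is the same as group(0).
whole = m.group()

# Passing several identifiers returns a tuple of those groups.
pair = m.group(1, 2)

# groups() returns all parenthesized groups, starting with group 1.
all_groups = m.groups()

print(repr(whole), pair, all_groups)
```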