# <center>RegEx in Python</center>

![](images/memes/meme23.jpg)

# Grouping

> Frequently you need to obtain more information than just whether the regex pattern matched or not.

By placing part of a regular expression inside round brackets or parentheses `(`, `)`, you can **group that part** of the regex pattern together.

### Applications of grouping:

#### 1. Apply a quantifier to the entire group

For example, `(ab)+` will match one or more repetitions of `ab`.
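
The difference can also be checked with the standard `re` module alone. This is a minimal sketch; `spans` is a hypothetical helper name, and the sample string is the one used in the cells below.

```python
import re

txt = "abbbbbabbbb"

def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]

# Without a group, + applies only to the preceding character b.
print(spans(r"ab+", txt))    # [(0, 6), (6, 11)]

# With a group, + applies to the whole sequence ab.
print(spans(r"(ab)+", txt))  # [(0, 2), (6, 8)]
```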

In [1]:
import re
from utils import highlight_regex_matches

In [2]:
txt = "abbbbbabbbb"

In [3]:
pattern1 = re.compile("ab+")
pattern2 = re.compile("(ab)+")

In [4]:
highlight_regex_matches(pattern1, txt)

[43m[1mabbbbb[0m[43m[1mabbbb[0m


In [5]:
highlight_regex_matches(pattern2, txt)

[43m[1mab[0mbbbb[43m[1mab[0mbbb


#### 2. Restrict alternation to part of the regex

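For example, `my name is ram|sam` matches either `my name is ram` or just `sam`, because `|` has the lowest precedence and splits the whole pattern into two alternatives. With a group, `my name is (ram|sam)` matches `my name is ram` and `my name is sam`.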
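
A minimal sketch of the same comparison using only the standard `re` module. `finditer()` with `group(0)` is used so that the full matched text is shown; the variable names are illustrative.

```python
import re

txt = "my name is ram\nmy name is sam\n"

# Ungrouped: the two alternatives are "my name is ram" and "sam".
ungrouped = [m.group(0) for m in re.finditer(r"my name is ram|sam", txt)]
print(ungrouped)  # ['my name is ram', 'sam']

# Grouped: the alternation is limited to "ram" or "sam".
grouped = [m.group(0) for m in re.finditer(r"my name is (ram|sam)", txt)]
print(grouped)    # ['my name is ram', 'my name is sam']
```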

In [6]:
txt = """
my name is ram
my name is sam
"""

In [7]:
pattern1 = re.compile("my name is ram|sam")
pattern2 = re.compile("my name is (ram|sam)")

In [8]:
highlight_regex_matches(pattern1, txt)


[43m[1mmy name is ram[0m
my name is [43m[1msam[0m



In [9]:
highlight_regex_matches(pattern2, txt)


[43m[1mmy name is ram[0m
[43m[1mmy name is sam[0m



#### 3. Capture the text matched by the group

- Groups indicated with `(`, `)` capture the text they match, along with the **starting** and **ending** index of that text.

- A group is retrieved by passing its number to the `group()`, `start()`, `end()`, or `span()` method of the `Match` object.

- Groups are numbered starting with `0`; the groups written in the pattern are numbered from `1`, left to right by their opening parenthesis.

- Group `0` is always present; it is the text matched by the whole regex pattern, and all of these `Match` methods use group `0` as their default argument.

Consider an example where we want to parse a date and determine day, month and year.

In [10]:
txt = "12/02/2019" 

In [11]:
pattern = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

In [12]:
match = pattern.match(txt)

In [13]:
# group 0: text matched by the entire regex pattern
match.group(0)

'12/02/2019'

In [14]:
# group 1: text matched by the 1st group (day)
match.group(1)

'12'

In [15]:
# group 2: text matched by the 2nd group (month)
match.group(2)

'02'

In [16]:
# group 3: text matched by the 3rd group (year)
match.group(3)

'2019'

In [17]:
# groups() returns all captured groups (1, 2, 3, ...) as a tuple
day, month, year = match.groups()

In [18]:
day, month, year

('12', '02', '2019')
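
The same `Match` object also exposes the position of each group. The brief sketch below reuses the date example with a raw-string pattern; passing a group number to `start()`, `end()`, or `span()` returns indices into the original string.

```python
import re

txt = "12/02/2019"
match = re.match(r"(\d{2})/(\d{2})/(\d{4})", txt)

# match() returns None when the pattern does not match, so check first.
if match is not None:
    print(match.span())    # (0, 10)  whole match (group 0 is the default)
    print(match.span(1))   # (0, 2)   day
    print(match.span(2))   # (3, 5)   month
    print(match.start(3))  # 6        index where the year starts
    print(match.end(3))    # 10       index just past the end of the year
    # Slicing the text with a group's span gives back the captured text.
    start, end = match.span(3)
    print(txt[start:end])  # 2019
```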

Let's try one more example of group capturing. 

In the text below, find every line of the form `Name: <some-name>` and extract `<some-name>`.

In [20]:
txt = """
Name: Nikhil
Age: 0
Roll No.: 15
Grade: S

Name: Ravi
Age: -1
Roll No.: 123
Grade: K

Name: Ram
Age: N/A
Roll No.: 1
Grade: G
"""

In [21]:
pattern = re.compile(r"Name: (.+)\n")

In [22]:
pattern.findall(txt)

['Nikhil', 'Ravi', 'Ram']
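
`findall()` returns only the names here because the pattern contains exactly one group; with several groups it returns a list of tuples, and with no groups it returns the full matches. A minimal sketch, assuming a shortened version of the record text above and a hypothetical two-group pattern:

```python
import re

txt = """
Name: Nikhil
Age: 0

Name: Ravi
Age: -1
"""

# One group: findall() returns a list of strings (the group's text).
print(re.findall(r"Name: (.+)", txt))  # ['Nikhil', 'Ravi']

# Two groups: findall() returns a list of tuples, one item per group.
print(re.findall(r"Name: (.+)\nAge: (.+)", txt))
# [('Nikhil', '0'), ('Ravi', '-1')]

# No groups: findall() returns the full text of each match.
print(re.findall(r"Name: .+", txt))  # ['Name: Nikhil', 'Name: Ravi']
```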

> Parentheses cannot be used inside character classes, at least not as metacharacters. When you put a parenthesis in a character class, it is treated as a literal character. So the regex `[(a)b]` matches `a`, `b`, `(`, and `)`.
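
A quick check of this behaviour with the standard `re` module (the sample string is an arbitrary illustration):

```python
import re

# Inside [...] the parentheses are ordinary characters, not a group.
pattern = re.compile(r"[(a)b]")

print(pattern.findall("(a)bc"))  # ['(', 'a', ')', 'b']
print(pattern.groups)            # 0, the pattern defines no capturing groups
```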

![](images/memes/meme24.jpg)