In [2]:
import re

### What's the Python Re + Quantifier?

Say, you have any regular expression A. The regular expression (regex) A+ then matches one or more occurrences of A.

I call the + symbol the **at-least-once quantifier** because it requires at least one occurrence of the preceding regex.

For example, the regular expression 'yes+' matches the strings 'yes', 'yess', and 'yesssssss'. But it matches neither the string 'ye' nor the empty string '', because the plus quantifier + applies only to the immediately preceding regex 's', not to the whole regex 'yes'.
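To check this claim yourself, here is a minimal sketch that tests each of the strings above against 'yes+'. It uses re.fullmatch() so that the whole string has to match. The last part, which wraps 'yes' in a non-capturing group, is an addition for illustration only and shows one way to make + apply to the whole word.

```python
import re

# '+' binds only to the 's', so one or more 's' must follow 'ye'.
pattern = re.compile(r'yes+')

candidates = ['yes', 'yess', 'yesssssss', 'ye', '']
results = {text: pattern.fullmatch(text) is not None for text in candidates}

for text, matched in results.items():
    print(repr(text), '->', matched)

# Illustration only: a non-capturing group (?:...) makes '+' apply to all of 'yes'.
whole_word = re.compile(r'(?:yes)+')
repeats_whole_word = whole_word.fullmatch('yesyes') is not None
plain_on_repeat = pattern.fullmatch('yesyes') is not None
print(repeats_whole_word, plain_on_repeat)
```

The first three strings should report a match and the last two should not, while 'yesyes' is only accepted by the grouped variant.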

Let's study some examples to help you gain a deeper understanding.


### Greedy Plus (+) Quantifiers

In [3]:
re.findall('a+b', 'aaaaaab')

['aaaaaab']

You use the re.findall() function, which finds all non-overlapping occurrences of the pattern in the string and returns them as a list of matching substrings.

The first argument is the regular expression pattern 'a+b' and the second argument is the string to be searched. In plain English, you want to find all substrings that start with at least one, but possibly many, characters 'a', followed by the character 'b'.

The findall() method returns the matching substring: 'aaaaaab'.

**The plus quantifier + is greedy**. This means that it tries to match as many occurrences of the preceding regex as possible.

So in our case, it matches as many 'a' characters as possible while still leaving the final 'b' for the rest of the pattern. Therefore, the regex engine "consumes" the whole string.


In [4]:
re.findall('ab+', 'aaaaaabb') 

['abb']

You search for the character 'a' followed by at least one character 'b'. As the plus (+) quantifier is greedy, it matches as many 'b's as it can lay its hands on.
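If you want to see exactly which part of the string the greedy match covers, a small sketch with re.search() can help. The variable names here are chosen for illustration. The match object reports the matched text together with its start and end positions.

```python
import re

text = 'aaaaaabb'

# re.search returns the first (leftmost) match as a match object.
match = re.search(r'ab+', text)

matched_text = match.group()   # the substring that matched
start, end = match.span()      # slice positions of the match in text

print(matched_text, start, end)
print(text[start:end] == matched_text)
```

The match begins at the last 'a' (the only 'a' that is directly followed by a 'b') and extends to the end of the string, because the greedy + takes every available 'b'.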

### Non-Greedy Plus (+) Quantifiers

But what if you want to match at least one occurrence of a regex in a non-greedy manner? That is, you don't want the regex engine to consume as much as it can, but to stop as soon as the pattern is satisfied. To see the difference, first look at the greedy behavior once more:

In [5]:
re.findall('ab+', 'aaaaaabbbbb')

['abbbbb']

The regex engine starts with the first character 'a' and finds that it's a partial match. It then looks at the second character, which is another 'a'. This violates the pattern 'ab+', which allows only a single character 'a' before the first 'b', so the attempt that started at the first character fails.

So the engine retries from the second character, then from the third, and so on, until it reaches the last character 'a' in the string 'aaaaaabbbbb'. It's a partial match, so it moves on to the first occurrence of the character 'b'. It realizes that the 'b' character can be matched by the part of the regex 'b+'. Thus, the engine starts matching 'b's. And it greedily matches 'b's until it cannot match any further character.

At this point, the whole pattern has been matched, so the engine returns the substring 'abbbbb' as the result of the operation.
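The scan described above can be imitated from the outside with a short loop. This is only a sketch of the observable behavior, not the engine's actual internal implementation: it calls the compiled pattern's match() method with a start position, which tries to match exactly at that position, and records each attempt until one succeeds.

```python
import re

pattern = re.compile(r'ab+')
text = 'aaaaaabbbbb'

# Try each start position in turn, the way the walkthrough describes.
attempts = []
for start in range(len(text)):
    m = pattern.match(text, start)   # match must begin exactly at 'start'
    attempts.append((start, m.group() if m else None))
    if m:
        break                        # first successful start position wins

for start, result in attempts:
    print(start, result)
```

Every attempt that starts at one of the earlier 'a' characters fails, and only the attempt at the last 'a' succeeds, with the greedy b+ taking all remaining 'b's.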

However, it could have stopped far earlier, right after matching the first character 'b', to produce a non-greedy match. Here's an example of the non-greedy quantifier '+?' (the two symbols together form a single quantifier).

In [6]:
re.findall('ab+?', 'aaaaaabbbbb')

['ab']

In [9]:
re.findall('ab+?', 'aaaaaabcccabbbb')

['ab', 'ab']

Now, the regex engine does not greedily "consume" as many 'b' characters as possible. Instead, it stops as soon as the pattern is matched (non-greedy).
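To make the contrast concrete, here is a small side-by-side sketch that runs the greedy and the non-greedy pattern on the same string. The variable names are chosen for illustration.

```python
import re

text = 'aaaaaabcccabbbb'

# Greedy: 'b+' takes every consecutive 'b' it can.
greedy = re.findall(r'ab+', text)

# Non-greedy: 'b+?' stops after the first 'b', the minimum that '+' requires.
non_greedy = re.findall(r'ab+?', text)

print(greedy)
print(non_greedy)
```

Both patterns find two matches at the same starting positions. They differ only in how many 'b' characters each match includes.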