# <center>RegEx in Python</center>

![](images/memes/meme31.jpg)

# Zero-width assertions

- Characters which indicate positions rather than actual content are called **zero-width assertions**.


- For instance, the caret symbol (`^`) represents the beginning of a line and the dollar sign (`$`) represents the end of a line.


- They assert a condition without consuming any characters; they simply report whether the match succeeds or fails at that position.


- A more powerful kind of **zero-width assertion** is **look around**, a mechanism that checks whether a pattern matches just before (**look behind**) or just after (**look ahead**) the current position.
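A minimal sketch of what "zero-width" means in practice, using only the standard `re` module (the sample string is an assumption chosen for illustration): a match on an anchor alone is empty, so its span starts and ends at the same index.

```python
import re

txt = "regex is fun"  # illustrative sample text

# `^` asserts the start of the string; nothing is consumed, so start == end.
start = re.search(r"^", txt)
print(start.span())  # (0, 0)

# `$` asserts the end of the string; again the match is empty.
end = re.search(r"$", txt)
print(end.span())  # (12, 12)

# Combined with real content, only the content contributes to the span.
word = re.search(r"^regex", txt)
print(word.span(), word.group())  # (0, 5) regex
```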


# Look around


**Look around** is a mechanism which, during the matching process, looks ahead of the current position (or behind it, depending on the type of look around used) to see whether **some** pattern matches before continuing with the actual match.

The most important thing to understand here is that the **look around** mechanism consists of 2 parts:
- **actual expression**: an expression whose match constitutes the final **result**.
- **non-consuming expression**: an expression that is only tested at the current position to see whether it can match. It is **not actually consumed** by the regex engine and does not become part of the result.
    - If the non-consuming match **succeeds**, the regex engine discards whatever the non-consuming expression matched and continues matching from the position where the check was made.
    - If the non-consuming match **does not succeed**, the attempt at this position fails. Once any remaining backtracking options are exhausted, the engine moves to the next character of the given text and repeats the whole matching process.
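Both outcomes can be sketched with a look ahead, whose syntax is covered in the "Look ahead" section. The sample strings and the pattern are assumptions chosen purely for illustration.

```python
import re

# Case 1: the non-consuming expression (?=\d) succeeds right after "regex".
txt = "regex101"
m = re.search(r"[a-z]+(?=\d)", txt)
print(m.group())      # regex
print(m.end())        # 5 -> the digit that was checked is not consumed
print(txt[m.end():])  # 101 -> still available for further matching

# Case 2: the check fails for every attempt starting inside "abc",
# so the engine keeps moving forward until it reaches "def".
txt2 = "abc def1"
m2 = re.search(r"[a-z]+(?=\d)", txt2)
print(m2.group(), m2.span())  # def (4, 7)
```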

There are 2 main categories of **look around**, each of which has 2 sub-categories.

![](images/lookaround.png)

Let's explore each one of them one by one.

# Look ahead

The **look ahead** mechanism checks whether a non-consuming expression matches **ahead** of (i.e. immediately after) a given pattern.


## Positive look ahead

- **Positive look ahead** will succeed if the passed non-consuming expression **does match** against the forthcoming input.

- The syntax is `A(?=B)` where `A` is the **actual expression** and `B` is the **non-consuming expression**. 


Let's check out an example to understand the concept. Let's assume that we want to find a match for `love` in the given text only if it is followed by `regex`.

In [1]:
import re
from utils import highlight_regex_matches

In [2]:
txt = "i love python, i love regex"

In [3]:
pattern = re.compile('love regex')

In [4]:
match = pattern.search(txt)

In [5]:
match.span()

(17, 27)

In [6]:
pattern.findall(txt)

['love regex']

In [7]:
highlight_regex_matches(pattern, txt)

i love python, i **love regex**


As we can see, the match consumes a total of 10 characters (index 17 to 27), i.e. `love regex`.

Now consider the regex pattern `love(?=\sregex)`.

In [8]:
pattern = re.compile(r"love(?=\sregex)")

In [9]:
match = pattern.search(txt)

In [10]:
match.span()

(17, 21)

In [11]:
highlight_regex_matches(pattern, txt)

i love python, i **love** regex


Now, using the **positive look ahead** mechanism, only 4 characters (index 17 to 21) are consumed for the match.
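One practical consequence of consuming fewer characters shows up when substituting. The sketch below uses `re.sub` with a made-up replacement word (`adore`, an assumption for illustration): the consuming pattern replaces `love regex` as a whole, while the look ahead version replaces only `love` and leaves ` regex` in place.

```python
import re

txt = "i love python, i love regex"

# Consuming pattern: the whole "love regex" is part of the match and gets replaced.
consuming = re.sub(r"love\sregex", "adore", txt)
print(consuming)  # i love python, i adore

# Positive look ahead: " regex" is only checked, so it survives the substitution.
lookahead = re.sub(r"love(?=\sregex)", "adore", txt)
print(lookahead)  # i love python, i adore regex
```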

Let us check out another example: finding all words in the given text which are followed by `.` or `,`.

In [12]:
txt = "My favorite colors are red, green, and blue."

In [13]:
pattern = re.compile(r"\w+(?=,|\.)")

In [14]:
pattern.findall(txt)

['red', 'green', 'blue']

In [15]:
highlight_regex_matches(pattern, txt)

My favorite colors are **red**, **green**, and **blue**.
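For comparison, here is a small sketch of the same search with and without the look ahead. The character class `[,.]` is an equivalent way of writing `,|\.` and is used here as a stylistic assumption, not as the notebook's original pattern.

```python
import re

txt = "My favorite colors are red, green, and blue."

# Without look ahead, the punctuation is consumed and ends up in the result.
with_punct = re.findall(r"\w+[,.]", txt)
print(with_punct)  # ['red,', 'green,', 'blue.']

# With positive look ahead, the punctuation is checked but left out.
words_only = re.findall(r"\w+(?=[,.])", txt)
print(words_only)  # ['red', 'green', 'blue']
```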


## Negative look ahead

- **Negative look ahead** will succeed if the passed non-consuming expression **does not match** against the forthcoming input.

- The syntax is `A(?!B)` where `A` is the **actual expression** and `B` is the **non-consuming expression**. 


Let's assume that we want to find a match for `love` in the given text only if it is NOT followed by `regex`.

In [16]:
txt = "i love python, i love regex"

In [17]:
pattern = re.compile(r"love(?!\sregex)")

In [18]:
highlight_regex_matches(pattern, txt)

i **love** python, i love regex
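The sketch below confirms the span of that match and illustrates a pitfall that is easy to hit with negative look ahead: a greedy quantifier can backtrack to a shorter match that satisfies the assertion. The `colors` string and the `\b`-based fix are assumptions added for illustration.

```python
import re

txt = "i love python, i love regex"
pattern = re.compile(r"love(?!\sregex)")

# Only the first "love" qualifies; the second one is followed by " regex".
spans = [m.span() for m in pattern.finditer(txt)]
print(spans)  # [(2, 6)]

# Pitfall: "red" is followed by ",", so \w+ backtracks to "re",
# which is followed by "d" and therefore passes the negative check.
colors = "red, green"
naive = re.findall(r"\w+(?!,)", colors)
print(naive)  # ['re', 'green']

# Word boundaries force whole words, so "red" is rejected entirely.
whole_words = re.findall(r"\b\w+\b(?!,)", colors)
print(whole_words)  # ['green']
```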


![](images/memes/meme32.jpg)