# Boundaries

With regular expressions, we can use **boundaries** to specify where in a text a match is allowed to occur. For example, with a word boundary we can match a pattern only when it appears as a whole word, rather than as part of a longer word.

```python
import re
```

## Word boundary

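The `\b` metacharacter matches a **word boundary**. It is a zero-width assertion: it does not consume any characters, and it only succeeds at a position where a word starts or ends. Placing `\b` on both sides of a pattern therefore ensures that the pattern is matched only as a whole word, not as part of a longer word.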

```python
print(re.search("apple", "I have an apple"))
print(re.search("apple", "I have a pineapple"))
print(re.search("apple", "I have some apples"))
```

```
<re.Match object; span=(10, 15), match='apple'>
<re.Match object; span=(13, 18), match='apple'>
<re.Match object; span=(12, 17), match='apple'>
```


```python
print(re.search(r"\bapple\b", "I have an apple"))
print(re.search(r"\bapple\b", "I have a pineapple"))
print(re.search(r"\bapple\b", "I have some apples"))
```

```
<re.Match object; span=(10, 15), match='apple'>
None
None
```
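
The patterns above are written as raw strings (`r"..."`). This matters for `\b` in particular. In an ordinary Python string literal, `\b` is the backspace escape, so the regex engine would receive a backspace character instead of a word-boundary assertion. The short sketch below illustrates this pitfall. It also uses `\b` with `re.findall` to count only whole-word occurrences in a made-up sample sentence.

```python
import re

# In an ordinary (non-raw) string literal, "\b" is the backspace character.
print("\b" == "\x08")  # True
# So this pattern looks for literal backspaces around "apple" and finds nothing.
print(re.search("\bapple\b", "I have an apple"))  # None
# The raw string passes backslash + "b" to the regex engine: a word boundary.
print(re.search(r"\bapple\b", "I have an apple"))

# Counting whole-word occurrences only (hypothetical sample text).
text = "An apple, a pineapple, and two apples. Apple pie needs one apple."
print(len(re.findall(r"\bapple\b", text)))                 # 2 ("pineapple", "apples" and "Apple" are skipped)
print(len(re.findall(r"\bapple\b", text, re.IGNORECASE)))  # 3 ("Apple" now counts too)
```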


To be more precise, `\b` matches at any position that lies between a **word character** and a **non-word character**, in either order. The start and end of the string also count as boundaries when the adjacent character is a word character. If the position where `\b` appears in the pattern is such a boundary, the match can proceed; otherwise it fails. Word characters are those in the class `\w` (letters, digits, and the underscore), and non-word characters are those in `\W`. The examples below show which separators count as non-word characters and which do not.

```python
print(re.search(r"\bpython", "i-love-python"))
print(re.search(r"\bpython", "i love python"))
print(re.search(r"\bpython", "i.love.python"))
```

```
<re.Match object; span=(7, 13), match='python'>
<re.Match object; span=(7, 13), match='python'>
<re.Match object; span=(7, 13), match='python'>
```


```python
print(re.search(r"\bpython", "i1love1python"))
print(re.search(r"\bpython", "i_love_python"))
```

```
None
None
```
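
Because `\b` is zero-width, it can be useful to list every position where it matches. The sketch below defines a small hypothetical helper, `boundary_positions`, that does this with `re.finditer`. Each match is empty, so its `start()` is simply the position of the boundary. The output makes visible that hyphens and spaces create boundaries, while digits and underscores do not. It also shows that the start and end of the string count as boundaries next to a word character.

```python
import re

def boundary_positions(s: str) -> list[int]:
    # \b matches the empty string, so m.start() is the position of each boundary.
    return [m.start() for m in re.finditer(r"\b", s)]

for s in ["i-love-python", "i love python", "i_love_python", "i1love1python"]:
    print(s, boundary_positions(s))
# i-love-python [0, 1, 2, 6, 7, 13]
# i love python [0, 1, 2, 6, 7, 13]
# i_love_python [0, 13]
# i1love1python [0, 13]
```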


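`\B` is the opposite of `\b`: it matches only at a position that is *not* a word boundary. In the examples below, `\Bcity` matches `city` only when it is preceded by another word character, that is, when it appears inside a longer word.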

```python
print(re.search(r"\Bcity", "electricity"))
print(re.search(r"\Bcity", "velocity"))
print(re.search(r"\Bcity", "city"))
```

```
<re.Match object; span=(7, 11), match='city'>
<re.Match object; span=(4, 8), match='city'>
None
```
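
`\b` and `\B` can be combined in one pattern. As a sketch on a made-up sentence, the pattern below finds whole words that *end* in "city" but are not just "city" itself. `\B` forbids a boundary before "city", and `\b` requires the word to end right after it. The second line lists where `\B` matches inside a single word.

```python
import re

text = "electricity velocity city capacity"
# \w* grabs the start of the word, \B requires "city" to continue that word,
# and \b requires the word to end right after "city".
print(re.findall(r"\w*\Bcity\b", text))  # ['electricity', 'velocity', 'capacity']

# Inside "city", \B matches only between two word characters.
print([m.start() for m in re.finditer(r"\B", "city")])  # [1, 2, 3]
```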


## Line boundary

The characters `^` and `$` match the **start** and **end** of the string, respectively. If the `re.MULTILINE` flag is set, they also match at the start and end of each line.

For example, `^ABC` matches `ABC` only at the start of the string, and `ABC$` matches it only at the end. Combining the two, `^ABC$` matches only if `ABC` makes up the entire string.

```python
print(re.search("^Hello world", " Hello world"))
print(re.search("Hello world$", "Hello world "))
print(re.search("^Hello world$", " Hello world "))

print(re.search("^Hello world$", "Hello world"))
```

```
None
None
None
<re.Match object; span=(0, 11), match='Hello world'>
```
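
Two details of `^` and `$` in Python are easy to miss. First, without flags, `$` also matches just before a single newline at the very end of the string. The `\Z` assertion matches only at the absolute end. Second, the `re.MULTILINE` flag makes `^` and `$` match at every line start and end. The sketch below demonstrates both on a made-up three-line string.

```python
import re

text = "first line\nsecond line\nthird line"

# By default, ^ only matches at the very start of the string...
print(re.findall(r"^\w+", text))                # ['first']
# ...but with re.MULTILINE it also matches right after each newline.
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['first', 'second', 'third']
print(re.findall(r"\w+$", text, re.MULTILINE))  # ['line', 'line', 'line']

# Without MULTILINE, $ still matches just before one trailing newline...
print(re.search(r"^Hello world$", "Hello world\n") is not None)   # True
# ...whereas \Z only matches at the absolute end of the string.
print(re.search(r"^Hello world\Z", "Hello world\n") is not None)  # False
```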


```python
pattern = re.compile(r"^\d+$")
print(pattern.search("123478435O927462"))  # contains a capital letter O, not a zero
print(pattern.search("1234784350927462"))
```

```
None
<re.Match object; span=(0, 16), match='1234784350927462'>
```
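
To check that a *whole* string fits a pattern, Python's `re.fullmatch` is an alternative to wrapping the pattern in `^...$`. The helper `is_all_digits` below is a hypothetical sketch of this approach. It compares the two forms on a trailing newline, which `$` tolerates but `fullmatch` does not.

```python
import re

def is_all_digits(s: str) -> bool:
    # fullmatch must cover the entire string, so no ^ or $ is needed,
    # and a trailing newline is not silently accepted.
    return re.fullmatch(r"\d+", s) is not None

pattern = re.compile(r"^\d+$")
print(is_all_digits("1234784350927462"))    # True
print(is_all_digits("123478435O927462"))    # False (contains the letter O)
print(is_all_digits("123\n"))               # False
print(pattern.search("123\n") is not None)  # True: $ matched before the final newline
```

Note also that in `str` patterns, `\d` matches any Unicode decimal digit (for example, Arabic-Indic digits), not only `0` through `9`. Passing the `re.ASCII` flag restricts it to ASCII digits.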


## Summary

In this lesson, you learned how to use `\b`, `\B`, `^`, and `$` to express positional requirements for regex matches: word boundaries, positions that are not word boundaries, and the start and end of a string.