# Word Boundaries

We will now learn about another special sequence that you can create using the backslash:

* `\b`

This special sequence doesn't match any characters itself. Instead, it matches a position in the string: a word boundary. A word in this context is a sequence of word characters, meaning letters, digits, and the underscore (the same characters that `\w` matches). A boundary is the position between a word character and a non-word character, such as a white space or a punctuation mark, or between a word character and the beginning or end of the string. We can have boundaries either before or after a word. Let's see how this works with an example.

In the code below, our `sample_text` string contains the following sentence:

```
The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.
```

As we can see, the word `class` appears in three different positions:

1. As a stand-alone word: The word `class` has white spaces both before and after it.

2. At the beginning of a word: The word `class` in `classroom` has a white space before it.

3. At the end of a word: The word `class` in `subclass` has a white space after it.

If we use `class` as our regular expression, we will match the word `class` in all three positions as shown in the code below:

In [1]:
```python
# Import re module
import re

# Sample text
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

# Create a regular expression object with the regular expression 'class'
regex = re.compile(r'class')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(12, 17), match='class'>
<_sre.SRE_Match object; span=(47, 52), match='class'>
<_sre.SRE_Match object; span=(85, 90), match='class'>
```


We can see that we have three matches, corresponding to all the instances of the word `class` in our `sample_text` string.

Now, let's use word boundaries to only find the word `class` when it appears in particular positions. Let’s start by using `\b` to only find the word `class` when it appears at the beginning of a word. We can do this by adding `\b` before the word `class` in our regular expression as shown below:

In [2]:
```python
# Import re module
import re

# Sample text
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

# Create a regular expression object with the regular expression '\bclass'
regex = re.compile(r'\bclass')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(12, 17), match='class'>
<_sre.SRE_Match object; span=(47, 52), match='class'>
```


We now have only two matches: the regular expression matches the stand-alone word `class` and the `class` in `classroom`, since both have a word boundary (in this case a white space) directly before them. It does not match the `class` in `subclass` because there is no word boundary directly before it.
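
Note that the pattern is written as a raw string (the `r` prefix). As a small sketch of why this matters: in an ordinary Python string literal, `'\b'` is the backspace character, so the `re` module never sees a word-boundary sequence at all. The variable names `plain_pattern` and `raw_pattern` are only illustrative and are not used elsewhere in this notebook.

```python
import re

sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

# Without the r prefix, Python turns '\b' into a single backspace character
plain_pattern = '\bclass'

# With the r prefix, the backslash and the b reach the re module unchanged
raw_pattern = r'\bclass'

# The two patterns do not even have the same length
print(len(plain_pattern), len(raw_pattern))

# The first pattern looks for a literal backspace followed by 'class'
plain_matches = re.findall(plain_pattern, sample_text)

# The second pattern looks for 'class' preceded by a word boundary
raw_matches = re.findall(raw_pattern, sample_text)

print(plain_matches)
print(raw_matches)
```

The non-raw pattern finds nothing because `sample_text` contains no backspace character, while the raw pattern finds the same two matches as the cell above.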

Now, let's use `\b` to only find the word `class` when it appears at the end of a word. We can do this by adding `\b` after the word `class` in our regular expression as shown below:

In [3]:
```python
# Import re module
import re

# Sample text
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

# Create a regular expression object with the regular expression 'class\b'
regex = re.compile(r'class\b')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(12, 17), match='class'>
<_sre.SRE_Match object; span=(85, 90), match='class'>
```


We again have two matches: the regular expression matches the stand-alone word `class` and the `class` in `subclass`, since both have a word boundary (in this case a white space) directly after them. It does not match the `class` in `classroom` because there is no word boundary directly after it.
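
Because every match above is just the text `class`, the span is the only clue to which word it came from. The sketch below is one possible way to make this easier to read: it extends each pattern with `\w*` (zero or more word characters) so that the whole surrounding word is returned. The `*` quantifier has not been introduced in this section, so treat this as a preview rather than part of the lesson. The names `starts_with_class` and `ends_with_class` are made up for this example.

```python
import re

sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

# 'class' at the beginning of a word, followed by the rest of that word
starts_with_class = re.findall(r'\bclass\w*', sample_text)

# 'class' at the end of a word, preceded by the rest of that word
ends_with_class = re.findall(r'\w*class\b', sample_text)

print(starts_with_class)
print(ends_with_class)
```

This should print `['class', 'classroom']` for the first pattern and `['class', 'subclass']` for the second, which lines up with the spans reported in the two cells above.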

Now, let's use `\b` to only find the word `class` when it appears as a stand-alone word. We can do this by adding `\b` both before and after the word `class` in our regular expression as shown below:

In [4]:
```python
# Import re module
import re

# Sample text
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

# Create a regular expression object with the regular expression '\bclass\b'
regex = re.compile(r'\bclass\b')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(12, 17), match='class'>
```


We can see that now we only have one match because the stand-alone word, `class`, is the only one that has a word boundary (in this case a white space) directly before and after it.
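
In the sentence above, every boundary happens to be a white space, but punctuation and the beginning or end of the string work the same way, while an underscore does not because it counts as a word character. The following sketch uses a made-up `sample_text` (not the sentence from this lesson) to illustrate this.

```python
import re

# Hypothetical sample text chosen to show different kinds of boundaries
sample_text = 'class, (class) class_room class'

# Match 'class' only when it has a word boundary on both sides
regex = re.compile(r'\bclass\b')

# Collect the (start, end) span of every match
spans = [match.span() for match in regex.finditer(sample_text)]

print(spans)
```

This should print `[(0, 5), (8, 13), (26, 31)]`. The first match is bounded by the beginning of the string and a comma, the second by parentheses, and the last by a white space and the end of the string. The `class` in `class_room` is skipped because the underscore after it is a word character, so there is no boundary there.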

# TODO: Find All 3-Letter Words

In the cell below, write a regular expression that can match all 3-letter words in the `sample_text` string. As usual, save the regular expression object in a variable called `regex`. Then use the `.finditer()` method to search the `sample_text` string for the given regular expression. Finally, write a loop to print all the `matches` found by the `.finditer()` method.

In [None]:
```python
# Import re module


# Sample text
sample_text = 'John went to the store in his car, but forgot to buy bread.'

# Create a regular expression object with the regular expression
regex = 

# Search the sample_text for the regular expression
matches = 

# Print all the matches

```

# Not A Word Boundary

As with the other special sequences that we saw before, we also have the uppercase version of `\b`, namely:

* `\B`

As with the other special sequences, `\B` indicates the opposite of `\b`. So if `\b` is used to indicate a word boundary, `\B` is used to indicate **not** a word boundary. Let's see how this works:

Let's use `\B` to only find the word `class` when it **doesn't** have a word boundary directly before it. We can do this by adding `\B` before the word `class` in our regular expression as shown below:

In [5]:
```python
# Import re module
import re

# Sample text
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

# Create a regular expression object with the regular expression '\Bclass'
regex = re.compile(r'\Bclass')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(85, 90), match='class'>
```


We can see that we only get one match because the `class` in `subclass` is the only one that **doesn't** have a word boundary directly before it. 

Now, let's use `\B` to only find the word `class` when it **doesn't** have a word boundary directly after it. We can do this by adding `\B` after the word `class` in our regular expression as shown below:

In [6]:
```python
# Import re module
import re

# Sample text
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

# Create a regular expression object with the regular expression 'class\B'
regex = re.compile(r'class\B')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(47, 52), match='class'>
```


We can see that again we only have one match because the `class` in `classroom` is the only one that **doesn't** have a word boundary directly after it.

Finally, let's use `\B` to only find the word `class` when it **doesn't** have a word boundary directly before or after it. We can do this by adding `\B` both before and after the word `class` in our regular expression as shown below:

In [7]:
```python
# Import re module
import re

# Sample text
sample_text = 'The biology class will meet in the first floor classroom to learn about Theria, a subclass of mammals.'

# Create a regular expression object with the regular expression '\Bclass\B'
regex = re.compile(r'\Bclass\B')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

In this case, we can see that we get no matches. This is because every instance of the word `class` in our `sample_text` string has a word boundary before it, after it, or both. To get a match in this case, `class` has to appear in the middle of a word, such as in the word `declassified`. Let's see an example:

In [8]:
```python
# Import re module
import re

# Sample text
sample_text = 'declassified'

# Create a regular expression object with the regular expression '\Bclass\B'
regex = re.compile(r'\Bclass\B')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(2, 7), match='class'>
```
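
One way to picture the relationship between `\b` and `\B` is that each position in a non-empty string is either a word boundary or not a word boundary, never both. The sketch below checks every position of a short made-up string, using the optional `pos` argument of the compiled pattern's `.match()` method to test one position at a time. The string and the variable names are assumptions made for this illustration.

```python
import re

# Hypothetical short sample text
sample_text = 'a subclass'

boundary = re.compile(r'\b')
not_boundary = re.compile(r'\B')

# Positions run from 0 (before the first character) to len(sample_text) (after the last)
positions = range(len(sample_text) + 1)

# .match(string, pos) only succeeds if the pattern matches exactly at index pos
boundary_positions = [i for i in positions if boundary.match(sample_text, i)]
non_boundary_positions = [i for i in positions if not_boundary.match(sample_text, i)]

print(boundary_positions)
print(non_boundary_positions)
```

This should print `[0, 1, 2, 10]` and `[3, 4, 5, 6, 7, 8, 9]`. The boundaries sit around the word `a` and at the two ends of `subclass`, while every position inside `subclass` is a non-boundary. That is why `\Bclass` can match the `class` in `subclass`.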


# TODO: Finding Last Digits

In the cell below, our `sample_text` string contains some numbers separated by whitespace characters.

Write code that uses a regular expression to count how many numbers greater than 3 have 3 as their last digit. For example, 93 is greater than 3 and its last digit is 3, so your code should count this number as a match. However, the number 3 by itself should not be counted as a match.

As usual, save the regular expression object in a variable called `regex`. Then use the `.finditer()` method to search the `sample_text` string for the given regular expression. Then, write a loop to print all the `matches` found by the `.finditer()` method. Finally, print the total number of matches.

In [None]:
```python
# Import re module

# Sample text
sample_text = '203 3 403 687 283 234 983 345 23 3 74 978'

# Create a regular expression object with the regular expression
regex = 

# Search the sample_text for the regular expression
matches = 


# Print all the matches


# Print the total number of matches

```


If you wrote your code correctly, you should get a total of 5 matches.

# Solution

[Solution notebook](word_boundaries_solution.ipynb)