# Simple Metacharacters

As we indicated in a previous lesson, regular expressions use metacharacters to give special instructions. Here again is the complete list of metacharacters used in regular expressions:

```python
. ^ $ * + ? { } [ ] \ | ( )
```
We already learned how to use one of these metacharacters, the backslash (`\`), to create special sequences. In the following lessons we will learn how to use the remaining metacharacters to create more complicated regular expressions. 

In this notebook, we will take a look at the following metacharacters:

```python
. ^ $
```

Let’s start by looking at the dot (`.`) metacharacter.

### The Dot (`.`)

As we saw in a previous lesson, the dot (`.`) matches any character except for newline (`\n`) characters. In the code below, we will use `.` as our regular expression to find all the characters in our multi-line `sample_text` string:

```python
# Import re module
import re

# Sample text
sample_text = '''
\tAlice lives in:\f
1230 First St.\r
Ocean City, MD 156789.\v
'''

# Create a regular expression object with the regular expression '.'
regex = re.compile(r'.')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(1, 2), match='\t'>
<_sre.SRE_Match object; span=(2, 3), match='A'>
<_sre.SRE_Match object; span=(3, 4), match='l'>
<_sre.SRE_Match object; span=(4, 5), match='i'>
<_sre.SRE_Match object; span=(5, 6), match='c'>
<_sre.SRE_Match object; span=(6, 7), match='e'>
<_sre.SRE_Match object; span=(7, 8), match=' '>
<_sre.SRE_Match object; span=(8, 9), match='l'>
<_sre.SRE_Match object; span=(9, 10), match='i'>
<_sre.SRE_Match object; span=(10, 11), match='v'>
<_sre.SRE_Match object; span=(11, 12), match='e'>
<_sre.SRE_Match object; span=(12, 13), match='s'>
<_sre.SRE_Match object; span=(13, 14), match=' '>
<_sre.SRE_Match object; span=(14, 15), match='i'>
<_sre.SRE_Match object; span=(15, 16), match='n'>
<_sre.SRE_Match object; span=(16, 17), match=':'>
<_sre.SRE_Match object; span=(17, 18), match='\x0c'>
<_sre.SRE_Match object; span=(19, 20), match='1'>
<_sre.SRE_Match object; span=(20, 21), match='2'>
<_sre.SRE_Match object; span=(21, 22), match='3'>
...
```

As we can see, we were able to match all the characters in our `sample_text` string, except for newline characters.
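To see more precisely what the dot does, here is a small extra example that uses a made-up string (it is not part of the original lesson). The dot stands for exactly one character, so `c.t` matches `cat`, `cot`, and `c-t`. It does not match `ct`, because there is no character in between, or `c\nt`, because the character in between is a newline.

```python
import re

# Made-up text for illustration only.
text = 'cat cot c-t ct c\nt'

# 'c.t' means: a 'c', then exactly one character that is not a newline, then a 't'.
matches = re.findall(r'c.t', text)
print(matches)  # ['cat', 'cot', 'c-t']
```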

### The Caret (`^`)

The caret (`^`) anchors a match to the beginning of a string. A pattern that starts with `^` only matches a sequence of characters that appears at the very start of the string. Let's take a look at an example.

In the code below, our `sample_text` string has the word `this` written twice:

```
this watch belongs in this box.
```

As we can see, the first instance of the word `this` occurs at the beginning of the string, while the second instance occurs towards the end of the string.

If we use `this` as our regular expression, we will match both instances of the word as shown in the code below:

```python
# Import re module
import re

# Sample text
sample_text = 'this watch belongs in this box.'

# Create a regular expression object with the regular expression 'this'
regex = re.compile(r'this')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(0, 4), match='this'>
<_sre.SRE_Match object; span=(22, 26), match='this'>
```


We can clearly see that we get two matches that correspond to both instances of the word `this` in our `sample_text` string.

Now, let's use the caret to only find the word `this` that appears at the beginning of the string. We can do this by adding the caret (`^`) before the word `this` in our regular expression as shown below:

```python
# Import re module
import re

# Sample text
sample_text = 'this watch belongs in this box.'

# Create a regular expression object with the regular expression '^this'
regex = re.compile(r'^this')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(0, 4), match='this'>
```


We can see that we now get only one match, corresponding to the word `this` that appears at the beginning of the string. The second instance of the word `this` was not matched because it isn't at the beginning of our `sample_text` string.
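Our sample strings are often multi-line, so it helps to know how `^` behaves there. The sketch below uses a made-up two-line string and the `re.MULTILINE` flag, which goes a little beyond this lesson. By default, `^` only matches at the start of the whole string. With `re.MULTILINE`, it also matches right after each newline.

```python
import re

# A made-up two-line string used only for illustration.
text = 'this watch\nthis box'

# By default, '^' anchors only at the very start of the whole string,
# so only the first 'this' is found.
default_matches = re.findall(r'^this', text)

# With re.MULTILINE, '^' also anchors right after each newline,
# so the 'this' at the start of the second line is found too.
multiline_matches = re.findall(r'^this', text, re.MULTILINE)

print(default_matches)    # ['this']
print(multiline_matches)  # ['this', 'this']
```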

### The Dollar Sign (`$`)

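The dollar sign (`$`) anchors a match to the end of a string. A pattern that ends with `$` only matches a sequence of characters that appears at the very end of the string (or just before a newline at the very end). Let's take a look at an example.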

In the code below, our `sample_text` string has the word `watch` written twice:

```
this watch is better than this watch
```

As we can see, the first instance of the word `watch` occurs towards the beginning of the string, while the second instance occurs at the end of the string.

If we use `watch` as our regular expression, we will match both instances of the word as shown in the code below:

```python
# Import re module
import re

# Sample text
sample_text = 'this watch is better than this watch'

# Create a regular expression object with the regular expression 'watch'
regex = re.compile(r'watch')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(5, 10), match='watch'>
<_sre.SRE_Match object; span=(31, 36), match='watch'>
```


We can clearly see that we get two matches that correspond to both instances of the word `watch` in our `sample_text` string.

Now, let's use the dollar sign to only find the word `watch` that appears at the end of the string. We can do this by adding the dollar sign (`$`) after the word `watch` in our regular expression as shown below:

```python
# Import re module
import re

# Sample text
sample_text = 'this watch is better than this watch'

# Create a regular expression object with the regular expression 'watch$'
regex = re.compile(r'watch$')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<_sre.SRE_Match object; span=(31, 36), match='watch'>
```


We can see that we now get only one match, corresponding to the word `watch` that appears at the end of the string. The first instance of the word `watch` was not matched because it isn't at the end of our `sample_text` string.
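Two details about `$` are worth checking with a quick sketch. The strings here are made up for illustration. First, `$` also matches just before a newline at the very end of the string. Second, combining `^` and `$` forces a pattern to match the entire string.

```python
import re

# '$' also matches just before a single newline at the very end of the string,
# so the trailing '\n' does not prevent the match.
trailing = re.search(r'watch$', 'this watch\n')
print(trailing.span())  # (5, 10)

# Combining '^' and '$' requires the pattern to cover the whole string.
whole = re.findall(r'^watch$', 'watch')
partial = re.findall(r'^watch$', 'this watch')
print(whole)    # ['watch']
print(partial)  # []
```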

# Character Sets

In this lesson, we will continue to look at metacharacters. In particular, we will learn how to look for phone numbers by employing the following metacharacters:

```python
{} []
```

### Finding Phone Numbers

In the code below, our `sample_text` consists of a multi-line string that mimics a phone book:

```
Mr. Brown: 555-123-4567
Mrs. Smith: 455 555 4549
Mr. Jackson: 655-777-7346
Ms. Wilson: (555)999-8464
```

Notice that even though the phone numbers have different digits, they all follow the same pattern: 3 digits, followed by a single character, followed by 3 more digits, followed by another single character, followed by 4 digits. We will take advantage of this pattern to create a regular expression that can match all these phone numbers. To do this, we will use the special sequence `\d` and the dot (`.`) in our regular expression, as shown in the code below:

```python
# Import re module
import re

# Sample text
sample_text = '''
Mr. Brown: 555-123-4567
Mrs. Smith: 455 555 4549
Mr. Jackson: 655-777-7346
Ms. Wilson: (555)999-8464
'''

# Create a regular expression object with a regular expression that can match all the
# phone numbers in our sample_text
regex = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<re.Match object; span=(12, 24), match='555-123-4567'>
<re.Match object; span=(37, 49), match='455 555 4549'>
<re.Match object; span=(63, 75), match='655-777-7346'>
<re.Match object; span=(89, 101), match='555)999-8464'>
```


We can see that we managed to find all the phone numbers in our multi-line string, even though they have different digits and different characters between the groups of numbers. Notice that the dot matched the dash (`-`), the white space (` `), and the closing parenthesis `)` separating the groups of numbers. By using the dot, we avoid having to write three different regular expressions for the three possible separators.
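The dot's flexibility cuts both ways. In this sketch, which uses made-up test strings, the same pattern also accepts a letter or a period as a separator. The match for Ms. Wilson's number starts at the first digit, so the opening parenthesis `(` is not part of the matched text.

```python
import re

regex = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')

# The dot accepts *any* separator, including letters and periods,
# not just dashes, spaces, and parentheses.
candidates = ['555-123-4567', '555x123y4567', '555.123.4567']
for candidate in candidates:
    print(candidate, bool(regex.fullmatch(candidate)))  # True for all three

# The match starts at the first digit, so the opening '(' is left out.
print(regex.search('(555)999-8464').group())  # 555)999-8464
```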

The `{}` metacharacters let us write the above regular expression more compactly. The sequence `{m}` specifies that exactly `m` copies of the previous regular expression should be matched. For example, the sequence `\d{3}` specifies that exactly `3` copies of the `\d` regular expression should be matched. Therefore, the sequence `\d{3}` is equivalent to the sequence `\d\d\d`.
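A quick way to check what `{m}` does is `re.fullmatch()`, which succeeds only if the entire string matches the pattern. The sketch below also shows that the braces repeat only the single item immediately before them.

```python
import re

# '\d{3}' means exactly three digits, no more and no fewer.
print(re.fullmatch(r'\d{3}', '123') is not None)   # True
print(re.fullmatch(r'\d{3}', '12') is not None)    # False: too few digits
print(re.fullmatch(r'\d{3}', '1234') is not None)  # False: too many digits

# In 'ab{2}', the '{2}' applies to 'b' alone, so it means 'abb', not 'abab'.
print(re.fullmatch(r'ab{2}', 'abb') is not None)   # True
print(re.fullmatch(r'ab{2}', 'abab') is not None)  # False
```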

Consequently, we can use `\d{3}` and `\d{4}` to write the previous code in a more compact form, as shown below:

```python
# Import re module
import re

# Sample text
sample_text = '''
Mr. Brown: 555-123-4567
Mrs. Smith: 455 555 4549
Mr. Jackson: 655-777-7346
Ms. Wilson: (555)999-8464
'''

# Create a regular expression object with a regular expression that can match all the
# phone numbers in our sample_text using the {} metacharacters
regex = re.compile(r'\d{3}.\d{3}.\d{4}')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<re.Match object; span=(12, 24), match='555-123-4567'>
<re.Match object; span=(37, 49), match='455 555 4549'>
<re.Match object; span=(63, 75), match='655-777-7346'>
<re.Match object; span=(89, 101), match='555)999-8464'>
```


As we can see, we get the same result as before.

### Finding Phone Numbers With Specific Separators

Now let's suppose we only wanted to find phone numbers in which the groups of digits were separated by either a dash (`-`) or a white space (` `). In this case we can use what is known as a **Character Set**. Character sets are specified using the `[]` metacharacters and are used to indicate a set of characters that you wish to match. Let’s see an example.

In the code below, we use the character set `[- ]` (notice that there is a white space after the dash) in our regular expression to only match phone numbers whose groups of numbers are separated by either a dash (`-`) or a white space (` `):

```python
# Import re module
import re

# Sample text
sample_text = '''
Mr. Brown: 555-123-4567
Mrs. Smith: 455 555 4549
Mr. Jackson: 655-777-7346
Ms. Wilson: (555)999-8464
'''

# Create a regular expression object with a regular expression that can match all the
# phone numbers that have either a dash or a white space as a separator
regex = re.compile(r'\d{3}[- ]\d{3}[- ]\d{4}')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

```
<re.Match object; span=(12, 24), match='555-123-4567'>
<re.Match object; span=(37, 49), match='455 555 4549'>
<re.Match object; span=(63, 75), match='655-777-7346'>
```


We can clearly see that we now only match the phone numbers that have either a dash (`-`) or a white space (` `) as a separator. Notice that the last phone number is not matched. Even though its last two groups of numbers are separated by a dash (`-`), its first two groups are separated by a closing parenthesis `)`, which is not in our character set.
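Two details of the character set `[- ]` can be checked with a short sketch that uses made-up numbers. Each `[- ]` is evaluated independently, so a number that mixes a dash and a space still matches, while any other separator is rejected. Also, the dash is written first inside the brackets. In that position, Python's `re` module treats it as a literal dash rather than as part of a range.

```python
import re

regex = re.compile(r'\d{3}[- ]\d{3}[- ]\d{4}')

# Each [- ] is checked on its own, so mixing a dash and a space still matches.
print(bool(regex.fullmatch('555-123 4567')))  # True

# Separators outside the set, such as a period or ')', are rejected.
print(bool(regex.fullmatch('555.123.4567')))  # False
print(bool(regex.fullmatch('555)999-8464')))  # False
```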

It is important to note that even though a character set can contain many characters, it matches only one of those characters at a time. For example, suppose we added a white space after each dash in Mr. Brown's phone number, as shown below:

```python
# Import re module
import re

# Sample text
sample_text = '''
Mr. Brown: 555- 123- 4567
'''

# Create a regular expression object with a regular expression that can match all the
# phone numbers that have either a dash or a white space as a separator
regex = re.compile(r'\d{3}[- ]\d{3}[- ]\d{4}')

# Search the sample_text for the regular expression
matches = regex.finditer(sample_text)

# Print all the matches
for match in matches:
    print(match)
```

We can see that we now get no matches. This is because each character set `[- ]` in our regular expression matches exactly one character. In other words, to get a match, the groups of numbers must be separated by a single dash **or** a single white space, but **not** both.
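The sketch below makes this one-character-at-a-time behavior visible on the same string. Searching for `[- ]` by itself returns the dash and the space as separate one-character matches. Putting two character sets in a row is one way to allow both characters between the groups, and it is shown here only for illustration.

```python
import re

text = '555- 123- 4567'

# Each match of the set is exactly one character: the dashes and spaces
# come back as separate matches.
separators = re.findall(r'[- ]', text)
print(separators)  # ['-', ' ', '-', ' ']

# Two sets in a row consume two separator characters per gap.
# Note: this would also accept '--' or two spaces.
doubled = re.findall(r'\d{3}[- ][- ]\d{3}[- ][- ]\d{4}', text)
print(doubled)  # ['555- 123- 4567']
```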

### Finding Phone Numbers With Specific Separators and Area Codes

Let's see another example of a character set. Suppose we only wanted to find phone numbers in which the groups of digits are separated by either a dash or a white space, and that have area code `455` or `655`. Notice that all the area codes in our `sample_text` end in `55`:

```
Mr. Brown: 555-123-4567
Mrs. Smith: 455 555 4549
Mr. Jackson: 655-777-7346
Ms. Wilson: (555)999-8464
```

Therefore, to find the phone numbers that have area code `455` or `655`, we only need to specify that the first digit of the area code must be either a `4` or a `6`.

To do this, we can use the character set `[46]` in our regular expression to indicate that the first digit must be either a `4` or a `6`, as shown in the code below: