## **Why use regular expressions?**
### **Introduction**

This follow-along reading is organized to match the content in the video that follows. It contains the same code shown in the next video. These code blocks will provide you with the opportunity to see how the code is written, allow you to practice running it, and can be used as a reference to refer back to. 

You can follow along in the reading as the instructor discusses the code or review the code after watching the video.

```python
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
```

```python
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
index = log.index("[")
print(log[index+1:index+6])
```

```python
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[(\d+)\]"
result = re.search(regex, log)
print(result[1])
```

### **About this code**

Here, the `re` module provides the `search()` function, which finds matches for a regular expression inside a string. The regular expression is defined as:

```python
r"\[(\d+)\]"
```

This regular expression matches one or more digits enclosed in square brackets. The brackets are escaped (`\[` and `\]`) because unescaped square brackets have a special meaning in a regex, and the parentheses capture the digits as a group. The code then uses the `re.search()` function to search the string `log` for a match to the regular expression. The `re.search()` function returns a Match object if a match is found, or `None` if no match is found. Here it returns a Match object because `log` contains `[12345]`. The captured groups can be read with the Match object's `group()` method or by indexing: `result[1]` (equivalent to `result.group(1)`) returns the first captured group, which in this case is the number.
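As a minimal sketch of the behavior described above, the snippet below reads the whole match and the captured group, and then shows what happens when nothing matches. The second input string is a made-up example, not taken from the video.

```python
import re

log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[(\d+)\]"

# A successful search returns a Match object.
result = re.search(regex, log)
if result is not None:
    print(result[0])        # whole match: [12345]
    print(result[1])        # first captured group: 12345
    print(result.group(1))  # same value as result[1]

# Hypothetical input with no bracketed digits: re.search() returns None.
no_match = re.search(regex, "no process id here")
print(no_match)             # None
```

Checking for `None` before indexing avoids an error when the pattern is not found.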

## **Simple Matching**


```python
import re
result = re.search(r"aza", "plaza")
print(result)
```
**`<_sre.SRE_Match object; span=(2, 5), match='aza'>`**


```python
import re
result = re.search(r"aza", "bazaar")
print(result)
```
**`<_sre.SRE_Match object; span=(1, 4), match='aza'>`**


```python
import re
result = re.search(r"aza", "maze")
print(result)
print(re.search(r"^x", "xenon"))
```
**`None`**<br>
**`<_sre.SRE_Match object; span=(0, 1), match='x'>`**


```python
import re
print(re.search(r"p.ng", "penguin"))
```
**`<_sre.SRE_Match object; span=(0, 4), match='peng'>`**


```python
import re
print(re.search(r"p.ng", "clapping"))
print(re.search(r"p.ng", "sponge"))
```
**`<_sre.SRE_Match object; span=(4, 8), match='ping'>`**<br>
**`<_sre.SRE_Match object; span=(1, 5), match='pong'>`**


```python
import re
print(re.search(r"p.ng", "Pangaea", re.IGNORECASE))
```
**`<_sre.SRE_Match object; span=(0, 4), match='Pang'>`**

## **Wildcards and character classes**


```python
import re
print(re.search(r"[Pp]ython", "Python"))
```
**`<_sre.SRE_Match object; span=(0, 6), match='Python'>`**


```python
import re
print(re.search(r"[a-z]way", "The end of the highway"))
print(re.search(r"[a-z]way", "What a way to go"))
print(re.search("cloud[a-zA-Z0-9]", "cloudy"))
print(re.search("cloud[a-zA-Z0-9]", "cloud9"))
```
**`<_sre.SRE_Match object; span=(18, 22), match='hway'>`**<br>
**`None`**<br>
**`<_sre.SRE_Match object; span=(0, 6), match='cloudy'>`**<br>
**`<_sre.SRE_Match object; span=(0, 6), match='cloud9'>`**


```python
import re
print(re.search(r"[^a-zA-Z]", "This is a sentence with spaces."))
print(re.search(r"[^a-zA-Z ]", "This is a sentence with spaces."))

print(re.search(r"cat|dog", "I like cats."))
print(re.search(r"cat|dog", "I love dogs!"))
print(re.search(r"cat|dog", "I like both dogs and cats."))

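print(re.search(r"cat|dog", "I like cats."))
print(re.search(r"cat|dog", "I love dogs!"))
print(re.search(r"cat|dog", "I like both dogs and cats."))
# findall returns every match, not just the first one
print(re.findall(r"cat|dog", "I like both dogs and cats."))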
```

**`<_sre.SRE_Match object; span=(4, 5), match=' '>`**<br>
**`<_sre.SRE_Match object; span=(30, 31), match='.'>`**<br>
**`<_sre.SRE_Match object; span=(7, 10), match='cat'>`**<br>
**`<_sre.SRE_Match object; span=(7, 10), match='dog'>`**<br>
**`<_sre.SRE_Match object; span=(12, 15), match='dog'>`**<br>
**`<_sre.SRE_Match object; span=(7, 10), match='cat'>`**<br>
**`<_sre.SRE_Match object; span=(7, 10), match='dog'>`**<br>
**`<_sre.SRE_Match object; span=(12, 15), match='dog'>`**<br>
**`['dog', 'cat']`**

## **Repetition qualifiers**


```python
import re
print(re.search(r"Py.*n", "Pygmalion"))
print(re.search(r"Py.*n", "Python Programming"))
print(re.search(r"Py[a-z]*n", "Python Programming"))
print(re.search(r"Py[a-z]*n", "Pyn"))
```
**`<_sre.SRE_Match object; span=(0, 9), match='Pygmalion'>`**<br>
**`<_sre.SRE_Match object; span=(0, 17), match='Python Programmin'>`**<br>
**`<_sre.SRE_Match object; span=(0, 6), match='Python'>`**<br>
**`<_sre.SRE_Match object; span=(0, 3), match='Pyn'>`**


```python
import re
print(re.search(r"o+l+", "goldfish"))
print(re.search(r"o+l+", "woolly"))
print(re.search(r"o+l+", "boil"))
```
**`<_sre.SRE_Match object; span=(1, 3), match='ol'>`**<br>
**`<_sre.SRE_Match object; span=(1, 5), match='ooll'>`**<br>
**`None`**


```python
import re
print(re.search(r"p?each", "To each their own"))
print(re.search(r"p?each", "I like peaches"))
```
**`<_sre.SRE_Match object; span=(3, 7), match='each'>`**<br>
**`<_sre.SRE_Match object; span=(7, 12), match='peach'>`**

## **Escaping characters**


```python
import re
print(re.search(r".com", "welcome"))
print(re.search(r"\.com", "welcome"))
print(re.search(r"\.com", "mydomain.com"))
```
**`<_sre.SRE_Match object; span=(2, 6), match='lcom'>`**<br>
**`None`**<br>
**`<_sre.SRE_Match object; span=(8, 12), match='.com'>`**<br>

```python
import re
print(re.search(r"\w*", "This is an example"))
print(re.search(r"\w*", "And_this_is_another"))
```
**`<_sre.SRE_Match object; span=(0, 4), match='This'>`**<br>
**`<_sre.SRE_Match object; span=(0, 19), match='And_this_is_another'>`**<br>

## **Regular expressions in action**


```python
import re
print(re.search(r"A.*a", "Argentina"))
print(re.search(r"A.*a", "Azerbaijan"))
print(re.search(r"^A.*a$", "Australia"))
```
**`<_sre.SRE_Match object; span=(0, 9), match='Argentina'>`**<br>
**`<_sre.SRE_Match object; span=(0, 9), match='Azerbaija'>`**<br>
**`<_sre.SRE_Match object; span=(0, 9), match='Australia'>`**

```python
import re
pattern = r"^[a-zA-Z_][a-zA-Z0-9_]*$"
print(re.search(pattern, "_this_is_a_valid_variable_name"))
print(re.search(pattern, "this isn't a valid variable"))
print(re.search(pattern, "my_variable1"))
print(re.search(pattern, "2my_variable1"))
```
**`<_sre.SRE_Match object; span=(0, 30), match='_this_is_a_valid_variable_name'>`**<br>
**`None`**<br>
**`<_sre.SRE_Match object; span=(0, 12), match='my_variable1'>`**<br>
**`None`**

## **Regular expressions**
A regular expression (sometimes called a regex) is a string of characters that specifies a pattern to match against some text. In addition to matching patterns, regexes can extract specific parts of text and validate input data, and they are supported by many code editors and integrated development environments (IDEs). In this reading, you will look at some examples of common regexes used in coding.

### **Regex examples**
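`r"\d{3}-\d{3}-\d{4}"`  This pattern matches U.S. phone numbers in the format 111-222-3333.


`r"^-?\d*(\.\d+)?$"`  This pattern matches a positive or negative number, with or without decimal places. Because every part of it is optional, it also matches an empty string.


`r"^(.+)\/([^\/]+)\/"` This pattern matches a path up to its last slash. The second group captures the last component that is followed by a slash, and the first group captures everything before it.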
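The three patterns above can be tried out with a short sketch like the following. The variable names and sample strings are assumptions chosen for illustration, and the patterns are written with straight quotes so that they run as Python code.

```python
import re

phone = r"\d{3}-\d{3}-\d{4}"
number = r"^-?\d*(\.\d+)?$"
path = r"^(.+)\/([^\/]+)\/"

# The phone pattern can match anywhere in the string.
print(re.search(phone, "Call 111-222-3333 today"))

# The number pattern is anchored, so the whole string must be a number.
print(re.search(number, "-12.5"))
print(re.search(number, "12a"))    # None: the trailing letter breaks the match

# The path pattern captures two groups around the final pair of slashes.
result = re.search(path, "/home/user/docs/")
print(result.groups())             # ('/home/user', 'docs')
```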


### **Helpful tool**
Sometimes regexes can be complex and difficult to read and understand—even for experienced programmers! There are tools available to help break down the regex and explain what each part of the expression does. A common tool that you can use to help with understanding each stage of a regular expression is:

https://regex101.com/

### **Key takeaways**
Regular expressions offer powerful capabilities to programmers but, at times, can be complex and difficult to understand. The more you code with regular expressions, the more comfortable you will be using and understanding them. For more information on regex, check out the following links:

https://docs.python.org/3/howto/regex.html

https://docs.python.org/3/library/re.html

https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

## **Capturing groups**


```python
import re
result = re.search(r"^(\w*), (\w*)$", "Lovelace, Ada")
print(result)
print(result.groups())
print(result[0])
print(result[1])
print(result[2])
print("{} {}".format(result[2], result[1]))
```
**`<_sre.SRE_Match object; span=(0, 13), match='Lovelace, Ada'>`**<br>
**`('Lovelace', 'Ada')`**<br>
**`Lovelace, Ada`**<br>
**`Lovelace`**<br>
**`Ada`**<br>
**`Ada Lovelace`**

```python
import re
def rearrange_name(name):
    result = re.search(r"^(\w*), (\w*)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])
print(rearrange_name("Lovelace, Ada"))
```
**`Ada Lovelace`**

```python
import re
def rearrange_name(name):
    result = re.search(r"^(\w*), (\w*)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])
print(rearrange_name("Ritchie, Dennis"))
```
**`Dennis Ritchie`**

```python
import re
def rearrange_name(name):
    result = re.search(r"^([\w \.-]*), ([\w \.-]*)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])
print(rearrange_name("Hopper, Grace M."))
```
**`Grace M. Hopper`**


## **More on repetition qualifiers**



```python
import re
print(re.search(r"[a-zA-Z]{5}", "a ghost"))
```
**`<_sre.SRE_Match object; span=(2, 7), match='ghost'>`**

```python
import re
print(re.search(r"[a-zA-Z]{5}", "a scary ghost appeared"))
```
**`<_sre.SRE_Match object; span=(2, 7), match='scary'>`**

```python
import re
print(re.findall(r"[a-zA-Z]{5}", "a scary ghost appeared"))
```
**`['scary', 'ghost', 'appea']`**

```python
import re
print(re.findall(r"\b[a-zA-Z]{5}\b", "A scary ghost appeared"))
```
**`['scary', 'ghost']`**

```python
import re
print(re.findall(r"\w{5,10}", "I really like strawberries"))
```
**`['really', 'strawberri']`**

```python
import re
print(re.findall(r"\w{5,}", "I really like strawberries"))
```
**`['really', 'strawberries']`**

```python
import re
print(re.search(r"s\w{,20}", "I really like strawberries"))
```
**`<_sre.SRE_Match object; span=(14, 26), match='strawberries'>`**


## **Extracting a PID using regexes in Python**


```python
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[(\d+)\]"
result = re.search(regex, log)
print(result[1])
```
**`12345`**

```python
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[(\d+)\]"
result = re.search(regex, log)
result = re.search(regex, "A completely different string that also has numbers [34567]")
print(result[1])
```
**`34567`**

```python
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[(\d+)\]"
result = re.search(regex, log)
result = re.search(regex, "A completely different string that also has numbers [34567]")
result = re.search(regex, "99 elephants in a [cage]")
print(result[1])
```
Note that this `print` call results in an error, as shown in the video. Because `[cage]` contains no digits, the last `re.search()` call returns `None`:<br>
**`Error on line 7:`**<br>
**`print(result[1])`**<br>
**`TypeError: 'NoneType' object is not subscriptable`**

```python
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[(\d+)\]"
result = re.search(regex, log)
result = re.search(regex, "A completely different string that also has numbers [34567]")
result = re.search(regex, "99 elephants in a [cage]")
def extract_pid(log_line):
    regex = r"\[(\d+)\]"
    result = re.search(regex, log_line)
    if result is None:
        return ""
    return result[1]
print(extract_pid(log))
```
**`12345`**

```python
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[(\d+)\]"
result = re.search(regex, log)
result = re.search(regex, "A completely different string that also has numbers [34567]")
result = re.search(regex, "99 elephants in a [cage]")
def extract_pid(log_line):
    regex = r"\[(\d+)\]"
    result = re.search(regex, log_line)
    if result is None:
        return ""
    return result[1]
print(extract_pid(log))
print(extract_pid("99 elephants in a [cage]"))
```
**`12345`**

The second call returns an empty string, so it prints a blank line.

## **Splitting and replacing**


```python
import re
print(re.split(r"[.?!]", "One sentence. Another one? And the last one!"))
```
**`['One sentence', ' Another one', ' And the last one', '']`**

```python
import re
print(re.split(r"([.?!])", "One sentence. Another one? And the last one!"))
```
**`['One sentence', '.', ' Another one', '?', ' And the last one', '!', '']`**

```python
import re
print(re.sub(r"[\w.%+-]+@[\w.-]+", "[REDACTED]", "Received an email for go_nuts95@my.example.com"))
```
**`Received an email for [REDACTED]`**

```python
import re
print(re.sub(r"^([\w .-]*), ([\w .-]*)$", r"\2 \1", "Lovelace, Ada"))
```
**`Ada Lovelace`**

## **Advanced regular expressions**
Advanced regular expressions—commonly referred to as advanced regexes—are used by developers to execute complicated pattern matching against strings. In this reading, you will learn about some of the common examples of advanced regular expressions.

### **Alternations**
An alternation matches any one of the alternatives separated by the pipe symbol (`|`). Let’s look at an example:

**`r"location.*(London|Berlin|Madrid)"`** 

This line of code will match the text string **`location is London`**, **`location is Berlin`**, or **`location is Madrid`**.
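A minimal sketch of this pattern in use follows. The string `location is Paris` is an assumption added here to show a case where none of the alternatives match.

```python
import re

pattern = r"location.*(London|Berlin|Madrid)"

for text in ["location is London", "location is Berlin", "location is Paris"]:
    result = re.search(pattern, text)
    # result[1] holds whichever alternative matched inside the group.
    print(text, "->", result[1] if result else None)
```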

### **Matching only at the beginning or end**
If you use the circumflex symbol (also known as a caret symbol) ^ as the first character of your regex, it will match only if the pattern occurs at the start of the string. Alternatively, if you use the dollar sign symbol $ at the end of a regex, it will match only if the pattern occurs at the end. Let’s look at an example:

**`r"^My name is (\w+)"`**

This line of code will match **`My name is Asha`** but not **`Hello. My name is Asha.`**
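The two cases above can be checked with a short sketch. The `Asha$` pattern is a hypothetical addition that illustrates the end-of-string anchor in the same way.

```python
import re

pattern = r"^My name is (\w+)"

# ^ anchors the pattern to the start of the string.
print(re.search(pattern, "My name is Asha"))
print(re.search(pattern, "Hello. My name is Asha."))   # None

# Hypothetical second pattern: $ anchors the match to the end of the string.
print(re.search(r"Asha$", "My name is Asha"))
print(re.search(r"Asha$", "My name is Asha."))         # None: a period follows
```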

### **Character ranges**
Character ranges can be used to match a single character against a set of possibilities. Let’s look at a couple of examples:

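**`r"[A-Z]"`** This will match a single uppercase letter.

**`r"[0-9$,.-]"`** This will match any of the digits zero through nine, or the dollar sign, comma, period, or hyphen. The hyphen is placed last so that it is treated as a literal character rather than as part of a range.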

The two examples above are often combined with the repetition qualifiers. Let’s look at one more example:

**`r"([0-9]{3}-[0-9]{3}-[0-9]{4})"`**

This line of code will match a U.S. phone number such as **`888-123-7612`**.
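The following sketch exercises character ranges, including one detail that is easy to miss: inside square brackets, a hyphen between two characters defines a range, so a literal hyphen is usually placed last (or escaped). The sample strings are assumptions chosen for illustration.

```python
import re

# A single uppercase letter.
print(re.search(r"[A-Z]", "hello World")[0])      # W

# "$-," inside brackets means "every character from $ through ,",
# which does not include the hyphen itself.
print(re.search(r"[0-9$-,.]", "-"))               # None
# Placing the hyphen last makes it a literal hyphen.
print(re.search(r"[0-9$,.-]", "-")[0])            # -

# Ranges combined with repetition: a U.S. phone number.
print(re.search(r"([0-9]{3}-[0-9]{3}-[0-9]{4})", "Call 888-123-7612")[1])
```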

### **Backreferences**
A backreference refers to the value of a capture group. In the replacement string passed to `re.sub()`, `\1`, `\2`, and so on insert the text matched by the corresponding group into the output. Let’s look at an example:

**`re.sub(r"([A-Z])\.\s+(\w+)", r"Ms. \2", "A. Weber and B. Bellmas have joined the team.")`**

This line of code will produce **`Ms. Weber and Ms. Bellmas have joined the team.`**
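Here is the same substitution as a runnable sketch. The second call is a hypothetical variation, added to show that `\1` can be reused in the replacement as well.

```python
import re

text = "A. Weber and B. Bellmas have joined the team."

# \2 in the replacement refers to the second capture group (the surname).
result = re.sub(r"([A-Z])\.\s+(\w+)", r"Ms. \2", text)
print(result)

# Hypothetical variation: \1 reuses the first group (the initial) too.
print(re.sub(r"([A-Z])\.\s+(\w+)", r"\2 (\1.)", text))
```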

### **Lookahead**
A lookahead matches a pattern only if it’s followed by another pattern. Let’s look at an example:

If the regex was `r"(Test\d)-(?=Passed)"` and the string was `"Test1-Passed, Test2-Passed, Test3-Failed, Test4-Passed, Test5-Failed"`, the captured groups from the matches would be:

**`Test1, Test2, Test4`**
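One way to obtain this result is with `re.findall()`, as in the sketch below. The reading does not say which function produces the output, so the choice of `re.findall()` and the variable names are assumptions.

```python
import re

results = "Test1-Passed, Test2-Passed, Test3-Failed, Test4-Passed, Test5-Failed"

# (?=Passed) checks that "Passed" follows but does not consume it;
# re.findall() returns the contents of the single capture group.
passed = re.findall(r"(Test\d)-(?=Passed)", results)
print(passed)   # ['Test1', 'Test2', 'Test4']
```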

### **Key takeaways**
The advanced regular expressions explained in this reading are just some of the ones that developers commonly use. They are useful for pattern matching, text manipulation, and data validation. For more information, check out the following link:

https://regexcrossword.com/

## **Practice Quiz: Advanced Regular Expressions**

1. You're working with a CSV file that contains employee information. Each record has a name field, followed by a phone number field, and a role field. The phone number field contains U.S. phone numbers and needs to be modified to the international format, with "+1-" in front of the phone number. The rest of the phone number should not change. Fill in the regular expression, using groups, so that the transform_record function does that.

```python
import re
def transform_record(record):
  new_record = re.sub(r',([\d\-]+),',r',+1-\1,',record)
  return new_record

print(transform_record("Sabrina Green,802-867-5309,System Administrator")) 
# Sabrina Green,+1-802-867-5309,System Administrator

print(transform_record("Eli Jones,684-3481127,IT specialist")) 
# Eli Jones,+1-684-3481127,IT specialist

print(transform_record("Melody Daniels,846-687-7436,Programmer")) 
# Melody Daniels,+1-846-687-7436,Programmer

print(transform_record("Charlie Rivera,698-746-3357,Web Developer")) 
# Charlie Rivera,+1-698-746-3357,Web Developer

```
2. The multi_vowel_words function returns all words with 3 or more consecutive vowels (a, e, i, o, u). Fill in the regular expression to do that.

```python
import re
def multi_vowel_words(text):
  pattern = r'\w*[aeiou]{3,}\w*'
  result = re.findall(pattern, text)
  return result

print(multi_vowel_words("Life is beautiful")) 
# ['beautiful']

print(multi_vowel_words("Obviously, the queen is courageous and gracious.")) 
# ['Obviously', 'queen', 'courageous', 'gracious']

print(multi_vowel_words("The rambunctious children had to sit quietly and await their delicious dinner.")) 
# ['rambunctious', 'quietly', 'delicious']

print(multi_vowel_words("The order of a data queue is First In First Out (FIFO)")) 
# ['queue']

print(multi_vowel_words("Hello world!")) 
# []

```

3. When capturing regex groups, what datatype does the groups method return?

    **A tuple**

4. The transform_comments function converts comments in a Python script into those usable by a C compiler. This means looking for text that begins with a hash mark (#) and replacing it with double slashes (//), which is the C single-line comment indicator. For the purpose of this exercise, we'll ignore the possibility of a hash mark embedded inside of a Python command, and assume that it's only used to indicate a comment. We also want to treat repetitive hash marks (##), (###), etc., as a single comment indicator, to be replaced with just (//) and not (#//) or (//#). Fill in the parameters of the substitution method to complete this function: 

```python
import re
def transform_comments(line_of_code):
  result = re.sub(r'(#+)','//', line_of_code)
  return result

print(transform_comments("### Start of program")) 
# Should be "// Start of program"
print(transform_comments("  number = 0   ## Initialize the variable")) 
# Should be "  number = 0   // Initialize the variable"
print(transform_comments("  number += 1   # Increment the variable")) 
# Should be "  number += 1   // Increment the variable"
print(transform_comments("  return(number)")) 
# Should be "  return(number)"

```

5. The convert_phone_number function checks for a U.S. phone number format: XXX-XXX-XXXX (3 digits followed by a dash, 3 more digits followed by a dash, and 4 digits), and converts it to a more formal format that looks like this: (XXX) XXX-XXXX. Fill in the regular expression to complete this function.

```python
import re
def convert_phone_number(phone):
  result = re.sub(r'(\b\d{3})-(\d{3}-\d{4})\b',r'(\1) \2', phone)
  return result

print(convert_phone_number("My number is 212-345-9999.")) # My number is (212) 345-9999.
print(convert_phone_number("Please call 888-555-1234")) # Please call (888) 555-1234
print(convert_phone_number("123-123-12345")) # 123-123-12345
print(convert_phone_number("Phone number of Buckingham Palace is +44 303 123 7300")) # Phone number of Buckingham Palace is +44 303 123 7300

```