# Why Use Regular Expressions?

### Before starting with regular expressions (RegEx), let's look at an example that doesn't use them.
* Let's say we have log entries in a typical log line format, and we want to extract the process identifier from a line: the number between the square brackets, here '12345'.
* The log line contains a lot of extra text that we don't need, like the date, the computer name and other info. We could extract the process ID by using the index() method to find the first square bracket in the string.
* Why not search for the digit '1' directly? Because '1' also appears elsewhere in the line (in the date and time, for example). The opening square bracket is the only character that appears exactly once, right before the process ID.

In [3]:
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
log.index("[")

39

**A brittle way to extract the number using the index() method**
* Assign the index of the opening square bracket to a variable, then use slicing to extract the number '12345' that follows it.

In [15]:
index = log.index("[")
print(log[index+1 : index+6])

12345


**Although we get the text that we wanted, we might hit a few bumps down the road. One problem is that we don't know for sure how long the process ID string will be in all cases. In this example, we can see that it's 5 characters long, but that may change in the future if the computer restarts or the number of processes increases. The code could also break if, for any reason, the line includes another square bracket before the process ID. So it's a solution, but a very brittle one.**
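To make both failure modes concrete, here is a minimal sketch that wraps the slicing approach in a helper and runs it on two other log lines. The helper name `slice_pid` and both log lines are made up for illustration; they are not part of the original example.

```python
# Hypothetical helper wrapping the slicing approach shown above:
# it assumes exactly 5 characters follow the first "[" in the line.
def slice_pid(line):
    index = line.index("[")
    return line[index + 1 : index + 6]

# Made-up log lines for illustration only.
short_pid = "July 31 07:51:48 mycomputer bad_process[123]: ERROR Performing package upgrade"
early_bracket = "July 31 07:51:48 mycomputer [cron] bad_process[12345]: ERROR Performing package upgrade"

print(slice_pid(short_pid))      # prints 123]:  (the slice runs past the 3-digit ID)
print(slice_pid(early_bracket))  # prints cron]  (the first "[" is not the one we want)
```

In both cases the code runs without raising an error, so the wrong value would pass silently into whatever uses it next.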
* Instead, we could use a RegEx to extract the process ID in a more robust fashion. For that, we import the re module, which lets us use the search() function to search a text for strings matching a specific pattern.

In [25]:
import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"

regex = r"\[(\d+)\]"
result = re.search(regex, log)
print(result[1])

12345
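The pattern reads left to right: `\[` matches a literal opening bracket, `(\d+)` captures one or more digits as group 1, and `\]` matches a literal closing bracket. Because the number of digits is not fixed and a bracket pair that doesn't contain only digits cannot match, the pattern should cope with the cases that trip up slicing. The sketch below checks this on a few made-up log lines. The helper name `extract_pid` is hypothetical, and the `None` check is there because `re.search()` returns `None` when nothing matches.

```python
import re

# Hypothetical helper: returns the process ID as a string,
# or None when the line has no "[digits]" group.
def extract_pid(line):
    result = re.search(r"\[(\d+)\]", line)
    if result is None:
        return None
    return result[1]  # group 1 holds the digits between the brackets

# Made-up log lines for illustration only.
lines = [
    "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade",
    "July 31 07:51:48 mycomputer bad_process[123]: ERROR Performing package upgrade",
    "July 31 07:51:48 mycomputer [cron] bad_process[12345]: ERROR Performing package upgrade",
    "July 31 07:51:48 mycomputer bad_process: ERROR Performing package upgrade",
]

for line in lines:
    print(extract_pid(line))
```

Note that the result is still a string; wrap it in `int()` if a number is needed.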
