# Regular Expressions #

Also known as "regex" or "regexp". Regular expressions provide a concise and flexible means of matching strings of text, such as particular characters, words, or patterns of characters. A regex is written in a language that is interpreted by a regular expression processor.

- It is programming with characters.
- It is fun once you understand it.
- Powerful but very cryptic.

### Quick Guide ###

| Characters | Description |
| --- | --- |
| `^` | Matches the beginning of a line |
| `$` | Matches the end of a line |
| `.` | Matches any character (except a newline, by default) |
| `\s` | Matches any whitespace character |
| `\S` | Matches any non-whitespace character |
| `*` | Repeats the preceding character zero or more times |
| `*?` | Repeats the preceding character zero or more times (non-greedy) |
| `+` | Repeats the preceding character one or more times |
| `+?` | Repeats the preceding character one or more times (non-greedy) |
| `[aeiou]` | Matches a single character in the listed set |
| `[^XYZ]` | Matches a single character not in the listed set |
| `[a-z0-9]` | The set of characters can include a range |
| `(` | Indicates where string extraction starts |
| `)` | Indicates where string extraction ends |

**Regular expressions are not built into the core Python language**: they live in the standard-library `re` module, so you need an `import re` statement before using them.
Then you can use `re.search()` to check whether a string matches a regular expression and `re.findall()` to extract the portions of a string that match it.
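
A minimal sketch of both functions on a made-up sample line (the string and the names `emails`, `numbers`, and `domains` are illustrative only). The patterns are written as raw strings (`r'...'`) so that backslash sequences such as `\S` reach the regex engine unchanged, and several of them use entries from the quick guide above.

```python
import re

line = 'Contact alice@example.com or bob@example.org before 2024-05-01'

# re.search() returns a Match object (truthy) or None, so it works as a yes/no test
if re.search(r'^Contact', line):
    print('Line starts with "Contact"')

emails = re.findall(r'\S+@\S+', line)      # non-whitespace on both sides of an @
numbers = re.findall(r'[0-9]+', line)      # runs of digits, using a character range
domains = re.findall(r'@([a-z.]+)', line)  # ( ) extract only the part after the @
print(emails)   # ['alice@example.com', 'bob@example.org']
print(numbers)  # ['2024', '05', '01']
print(domains)  # ['example.com', 'example.org']
```

When a pattern contains one pair of parentheses, `re.findall()` returns only the text captured inside them, which is why `domains` holds no `@` characters.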

Example:

Applying the pattern `^X-\S+` to the following strings:  <br><br>
`X-Sieve: CMU Sieve 2.3` <br>
`X-DSPAM-Result: Innocent` <br>
`X-Plane is behind: two weeks` <br>

returns the following matches:<br><br>
`X-Sieve:` <br>
`X-DSPAM-Result:` <br>
`X-Plane` <br>
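
The same example can be reproduced with `re.findall()`, applying the pattern to one line at a time. The fourth line is a hypothetical addition, included to show that `^` makes an `X-` in the middle of a line fail to match.

```python
import re

lines = [
    'X-Sieve: CMU Sieve 2.3',
    'X-DSPAM-Result: Innocent',
    'X-Plane is behind: two weeks',
    'Subject: X-rays ready',  # hypothetical: X- is not at the start, so no match
]

matches = []
for line in lines:
    # ^ anchors at the start of the line; \S+ stops at the first whitespace
    found = re.findall(r'^X-\S+', line)
    print(found)
    matches.extend(found)
```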

### Matching and Extracting ###

Both `re.match()` and `re.search()` look for a regular expression pattern and return only a single occurrence, wrapped in a `Match` object; if the pattern is not found, they return `None`. They do not, however, look in the same places, as the `None` returned by `re.match()` in the example below shows. To get all the occurrences in the string as a list, use `re.findall()` instead.
___

```python
import re
str_ex = 'Address 0xF43E contains value 0x982'
print("String: " + str_ex)
lst0 = re.match(r'0x\S+', str_ex)    # a single Match object, or None
print(r"Match '0x\S+':", lst0)
lst1 = re.search(r'0x\S+', str_ex)   # a single Match object, or None
print(r"Search '0x\S+':", lst1)
lst2 = re.findall(r'0x\S+', str_ex)  # a list with every match
print(r"Find all '0x\S+':", lst2)

lst3 = re.match(r'Ad\w+', str_ex)
print(r"Match 'Ad\w+':", lst3)
```

```
String: Address 0xF43E contains value 0x982
Match '0x\S+': None
Search '0x\S+': <re.Match object; span=(8, 14), match='0xF43E'>
Find all '0x\S+': ['0xF43E', '0x982']
Match 'Ad\w+': <re.Match object; span=(0, 7), match='Address'>
```


____

What is the difference between `match` and `search`?
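
One way to explore the question is to run both functions on two strings, one that starts with the pattern and one that does not. This is a small sketch; the first string is made up by trimming the example above.

```python
import re

pattern = r'0x\S+'
texts = [
    '0xF43E contains value 0x982',          # pattern at the very start
    'Address 0xF43E contains value 0x982',  # pattern later in the string
]

results = {}
for text in texts:
    m = re.match(pattern, text)   # tries to match starting at index 0 only
    s = re.search(pattern, text)  # scans the string for the first match
    results[text] = (m.group() if m else None, s.group() if s else None)
    print(f'{text!r}: match={results[text][0]}, search={results[text][1]}')
```

In this run, `re.match()` only succeeds when the pattern sits at index 0, while `re.search()` finds `0xF43E` in both strings. For a single-line string, `re.match(p, s)` behaves much like `re.search('^' + p, s)`.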

### Substituting ###

`re.sub(pattern, replacement, string)` searches a string for the pattern, replaces every match with the replacement value, and returns the new string:

For example:
____

```python
import re
email = "lcorrales@gmail.com"
# Replace the user name (lowercase letters followed by @) with "abc@"
print(re.sub("[a-z]*@", "abc@", email))
```

```
abc@gmail.com
```
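
By default `re.sub()` replaces every non-overlapping match; its optional `count` argument limits how many replacements are made. A short sketch on a made-up phone-number string:

```python
import re

text = 'Call 555-1234 or 555-9876'

masked = re.sub(r'\d', '#', text)                # every digit is replaced
first_only = re.sub(r'\d+', 'N', text, count=1)  # only the first run of digits
print(masked)      # Call ###-#### or ###-####
print(first_only)  # Call N-1234 or 555-9876
```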


____
### Greedy vs Non-greedy ###

A greedy quantifier (`*`, `+`) tries to match as much text as possible, while a non-greedy quantifier (`*?`, `+?`) matches as little as possible, stopping as soon as the rest of the pattern is satisfied.
____

```python
import re
line = 'From: using colon :'
txt = re.findall('^F.+:', line)   # greedy: .+ extends to the last ':'
print("Greedy:", txt)
txt = re.findall('^F.+?:', line)  # non-greedy: .+? stops at the first ':'
print("Non-greedy:", txt)
```

```
Greedy: ['From: using colon :']
Non-greedy: ['From:']
```
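
The difference matters most when the closing character occurs more than once. A sketch on a made-up HTML-like string, where the two patterns differ only in the `?` after `+`:

```python
import re

html = '<b>bold</b> and <i>italic</i>'

greedy = re.findall(r'<.+>', html)       # runs on to the LAST '>' in the string
non_greedy = re.findall(r'<.+?>', html)  # stops at the first '>' each time
print(greedy)      # ['<b>bold</b> and <i>italic</i>']
print(non_greedy)  # ['<b>', '</b>', '<i>', '</i>']
```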


____
### Exercise ###

Repeat the exercise from the String section: write a script that parses `text_file.txt` and prints the domain name of the sender's email address, this time using a regular expression.
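
One possible approach, sketched under the assumption that each sender line starts with `From ` followed by the address; the actual layout of `text_file.txt` may differ, so adjust the pattern to match it. The sample lines and the helper name `sender_domains` are hypothetical.

```python
import re

def sender_domains(lines):
    """Yield the sender's domain from every line that starts with 'From '.

    Assumes (hypothetically) lines such as 'From alice@example.com Sat Jan  5 2008'.
    """
    for line in lines:
        # ^From anchors on the sender line; \S+@ skips the user name;
        # the parentheses extract the non-whitespace run after the @
        found = re.findall(r'^From \S+@(\S+)', line)
        if found:
            yield found[0]

sample = [
    'From alice@example.com Sat Jan  5 09:14:16 2008',
    'Subject: hello',
    'From bob@mail.example.org Fri Jan  4 18:10:48 2008',
]
domains = list(sender_domains(sample))
print(domains)  # ['example.com', 'mail.example.org']

# To run it on the real file instead:
# with open('text_file.txt') as handle:
#     for domain in sender_domains(handle):
#         print(domain)
```

Because `\S` never matches a newline, the trailing `\n` of each line read from a file is not included in the extracted domain.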
____
## Equivalent Perl Code ##

#### Matching and Extracting ####

&emsp;Python:
``` python
import re
str_ex = 'Address 0xF43E contains value 0x982'
lst0 = re.match(r'0x\S+', str_ex)
print(r"Match '0x\S+':", lst0)
lst1 = re.search(r'0x\S+', str_ex)
print(r"Search '0x\S+':", lst1)
lst2 = re.findall(r'0x\S+', str_ex)
print(r"Find all '0x\S+':", lst2)

lst3 = re.match(r'Ad\w+', str_ex)
print(r"Match 'Ad\w+':", lst3)
```

&emsp;Perl:
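``` perl
use strict;
use warnings;

my $str_ex = 'Address 0xF43E contains value 0x982';
# Anchored with ^ to behave like Python's re.match(); prints "no match" here
print q{Match '0x\S+': }, ($str_ex =~ /^(0x\S+)/ ? $1 : 'no match'), "\n";
# Unanchored, like re.search(): $1 holds the first occurrence
print q{Search '0x\S+': }, "$1\n" if $str_ex =~ /(0x\S+)/;
# /g in list context returns every match, like re.findall()
my @all_captured = $str_ex =~ /0x\S+/g;
print q{Find all '0x\S+': }, "@all_captured\n";
```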

#### Substituting ####

&emsp;Python:
``` python
import re
email = "lcorrales@gmail.com"
print(re.sub("[a-z]*@", "abc@", email))   # re.sub() replaces every match, so @ is part of the pattern to limit it to one match
```

&emsp;Perl:
``` perl
my $email = 'lcorrales@gmail.com';     # Single quotes, so Perl takes @ literally instead of interpolating an array
print $email =~ s/[a-z]*/abc/r, "\n";  # No /g: only the first match is replaced; /r returns the new string (Perl 5.14+)
```

#### Greedy vs Non-greedy ####

&emsp;Python:
``` python
import re
line = 'From: using colon :'
txt = re.findall('^F.+:', line)
print("Greedy:", txt)
txt = re.findall('^F.+?:', line)
print("Non-greedy:", txt)
```

&emsp;Perl:
``` perl
my $line = 'From: using colon :';
print "Greedy: $1\n"     if $line =~ /(^F.+:)/;   # .+ grabs as much as possible
print "Non-greedy: $1\n" if $line =~ /(^F.+?:)/;  # .+? stops at the first ':'
```

- For more information about regular expressions in Perl, see https://perldoc.perl.org/perlre

___
### References ###
 - https://pythex.org/
 - https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/regular-expressions