# RegEx Module (String Handling)
- A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

- RegEx can be used to check if a string contains the specified search pattern.

In [1]:
import re

In [12]:
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
print(x)

<re.Match object; span=(0, 17), match='The rain in Spain'>


# RegEx Functions

<table style="width:100%">
  <tr>
    <th>Function</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>findall()</td>
    <td>Returns a list containing all matches</td>
  </tr>
  <tr>
    <td>search()</td>
    <td>Returns a Match object if there is a match anywhere in the string</td>
  </tr>
  <tr>
    <td>split()</td>
    <td>Returns a list where the string has been split at each match</td>
  </tr>
  <tr>
    <td>sub()</td>
    <td>Replaces one or many matches with a string</td>
  </tr>
</table>


In [13]:
x = re.findall("ai", txt)
print(x)
x = re.findall("Portugal", txt)
print(x)

['ai', 'ai']
[]


In [14]:
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())

x = re.search("Portugal", txt)
print(x)

The first white-space character is located in position: 3
None


In [17]:
x = re.split(r"\s", txt)
print(x)

x = re.split(r"\s", txt, maxsplit=1)
print(x)

['The', 'rain', 'in', 'Spain']
['The', 'rain in Spain']


In [20]:
x = re.sub(r"\s", "9", txt)
print(x)

x = re.sub(r"\s", "9", txt, count=2)
print(x)

The9rain9in9Spain
The9rain9in Spain


In [24]:
x = re.search(r"\bS\w+", txt)
print(x.span())    # tuple with the start and end positions of the match
# print(x.string)  # the string passed into the function (an attribute, not a method)
print(x.group())   # the part of the string that matched

(12, 17)
Spain
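
Because `search()` returns `None` when nothing matches, calling `.span()` or `.start()` on the result directly can raise an `AttributeError`. The sketch below shows one way to guard against that; the helper name `first_match_span` is hypothetical and not part of the `re` module.

```python
import re

txt = "The rain in Spain"

def first_match_span(pattern, text):
    """Return (start, end) of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.span()

found = first_match_span(r"\bS\w+", txt)
missing = first_match_span(r"Portugal", txt)
print(found)    # (12, 17)
print(missing)  # None
```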


# Metacharacters
- Metacharacters are characters with a special meaning

<table style="width:100%">
  <tr>
    <th>Character</th>
    <th>Description</th>
    <th>Example</th>
  </tr>
  <tr>
    <td>[]</td>
    <td>A set of characters</td>
    <td>"[a-m]"</td>
  </tr>
  <tr>
    <td>\</td>
    <td>Signals a special sequence (can also be used to escape special characters)</td>
    <td>"\d"</td>
  </tr>
  <tr>
    <td>.</td>
    <td>Any character (except newline character)</td>
    <td>"he..o"</td>
  </tr>
  <tr>
    <td>^</td>
    <td>Starts with</td>
    <td>"^hello"</td>
  </tr>
  
  <tr>
    <td>$</td>
    <td>Ends with</td>
    <td>"planet$"</td>
  </tr>
  
  <tr>
    <td>*</td>
    <td>Zero or more occurrences</td>
    <td>"he.*o"</td>
  </tr>
  <tr>
    <td>+</td>
    <td>One or more occurrences</td>
    <td>"he.+o"</td>
  </tr>
  <tr>
    <td>?</td>
    <td>Zero or one occurrences</td>
    <td>"he.?o"</td>
  </tr>
  <tr>
    <td>{}</td>
    <td>Exactly the specified number of occurrences</td>
    <td>"he.{2}o"</td>
  </tr>
  <tr>
    <td>|</td>
    <td>Either or</td>
    <td>"falls|stays"</td>
  </tr>
  <tr>
    <td>()</td>
    <td>Capture and group</td>
    <td></td>
  </tr>
</table>
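
The metacharacters in the table above can be tried out with a short sketch. The sample words and the helper `full_matches` are assumptions made up for illustration; only the `re` calls come from the standard library.

```python
import re

# Hypothetical sample words, chosen only to show how the quantifiers differ.
words = ["heo", "helo", "hello", "helllo"]

def full_matches(pattern):
    """Return the sample words that the pattern matches in their entirety."""
    return [w for w in words if re.fullmatch(pattern, w)]

zero_or_more = full_matches(r"he.*o")    # all four words
one_or_more = full_matches(r"he.+o")     # everything except "heo"
zero_or_one = full_matches(r"he.?o")     # "heo" and "helo"
exactly_two = full_matches(r"he.{2}o")   # only "hello"

# "^" anchors the pattern at the start of the string, "$" at the end.
starts = bool(re.search(r"^hello", "hello planet"))
ends = bool(re.search(r"planet$", "hello planet"))

# "|" matches either alternative.
either = re.findall(r"falls|stays", "The rain falls; the cloud stays")

# "()" captures the parts of the match inside the parentheses.
groups = re.search(r"(\w+) in (\w+)", "The rain in Spain").groups()

print(zero_or_more, one_or_more, zero_or_one, exactly_two)
print(starts, ends, either, groups)
```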


# Special Sequences
- A special sequence is a \ followed by one of the characters in the list below, and has a special meaning

<table style="width:100%">
  <tr>
    <th>Character</th>
    <th>Description</th>
    <th>Example</th>
  </tr>
  <tr>
    <td>\A</td>
    <td>Returns a match if the specified characters are at the beginning of the string</td>
    <td>"\AThe"</td>
  </tr>
  <tr>
    <td>\b</td>
    <td>Returns a match where the specified characters are at the beginning or at the end of a word</td>
    <td>r"\bain" <br> r"ain\b"</td>
  </tr>
  <tr>
    <td>\B</td>
    <td>Returns a match where the specified characters are present, but NOT at the beginning or at the end of a word</td>
    <td>r"\Bain" <br> r"ain\B"</td>
  </tr>
  <tr>
    <td>\d</td>
    <td>Returns a match where the string contains digits (numbers from 0-9)</td>
    <td>"\d"</td>
  </tr>
  <tr>
    <td>\D</td>
    <td>Returns a match for any character that is NOT a digit</td>
    <td>"\D"</td>
  </tr>
  <tr>
    <td>\s</td>
    <td>Returns a match where the string contains a white space character</td>
    <td>"\s"</td>
  </tr>
  <tr>
    <td>\S</td>
    <td>Returns a match for any character that is NOT a white space character</td>
    <td>"\S"</td>
  </tr>
  <tr>
    <td>\w</td>
    <td>Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)</td>
    <td>"\w"</td>
  </tr>
  <tr>
    <td>\W</td>
    <td>Returns a match for any character that is NOT a word character</td>
    <td>"\W"</td>
  </tr>
  <tr>
    <td>\Z</td>
    <td>Returns a match if the specified characters are at the end of the string</td>
    <td>"Spain\Z"</td>
  </tr>
</table>
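
As a rough illustration of the special sequences above, the sketch below applies several of them to the same `txt` string used earlier. The variable names are made up for this example.

```python
import re

txt = "The rain in Spain"

# \A and \Z anchor the pattern to the very start and end of the string.
at_start = re.search(r"\AThe", txt) is not None
at_end = re.search(r"Spain\Z", txt) is not None

# \b is a word boundary; \B is any position that is not a word boundary.
word_start = re.findall(r"\bain", txt)   # no word begins with "ain"
word_end = re.findall(r"ain\b", txt)     # "rain" and "Spain" end with "ain"
inside = re.findall(r"\Bain", txt)       # "ain" preceded by a word character

# \d, \s and \w match digits, white space and word characters;
# the upper-case forms \D, \S and \W match everything else.
digits = re.findall(r"\d", txt)
spaces = re.findall(r"\s", txt)
word_chars = re.findall(r"\w", txt)
non_word = re.findall(r"\W", txt)

print(at_start, at_end)
print(word_start, word_end, inside)
print(digits, len(spaces), len(word_chars), len(non_word))
```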


# Sets
- A set is a set of characters inside a pair of square brackets [] with a special meaning

<table style="width:100%">
  <tr>
    <th>Set</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>[arn]</td>
    <td>Returns a match where one of the specified characters (a, r, or n) is present</td>
  </tr>
  <tr>
    <td>[a-n]</td>
    <td>Returns a match for any lower case character, alphabetically between a and n</td>
  </tr>
  <tr>
    <td>[^arn]</td>
    <td>Returns a match for any character EXCEPT a, r, and n</td>
  </tr>
  <tr>
    <td>[0123]</td>
    <td>Returns a match where any of the specified digits (0, 1, 2, or 3) are present</td>
  </tr>
  <tr>
    <td>[0-9]</td>
    <td>Returns a match for any digit between 0 and 9</td>
  </tr>
  <tr>
    <td>[0-5][0-9]</td>
    <td>Returns a match for any two-digit number from 00 to 59</td>
  </tr>
  <tr>
    <td>[a-zA-Z]</td>
    <td>Returns a match for any character alphabetically between a and z, lower case OR upper case</td>
  </tr>
  <tr>
    <td>[+]</td>
    <td>In sets, +, *, ., |, (), $, and {} have no special meaning, so [+] means: return a match for any + character in the string</td>
  </tr>
</table>
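
The sets above can be checked with `findall()`. This is a minimal sketch; apart from `txt`, the sample strings are assumptions chosen for illustration.

```python
import re

txt = "The rain in Spain"

# [arn] matches any single a, r, or n.
any_arn = re.findall(r"[arn]", txt)

# [a-n] matches one lower-case letter in the range a to n.
a_to_n = re.findall(r"[a-n]", txt)

# [^arn] matches any character except a, r, and n (spaces included).
not_arn = re.findall(r"[^arn]", txt)

# [0-5][0-9] matches a two-digit number from 00 to 59.
two_digits = re.findall(r"[0-5][0-9]", "8 times before 11:45 AM")

# [a-zA-Z] matches one letter, lower case or upper case.
letters = re.findall(r"[a-zA-Z]", "R2-D2")

# Inside a set, + loses its special meaning and matches a literal plus sign.
plus_signs = re.findall(r"[+]", "1 + 2 + 3")

print(any_arn)
print(a_to_n)
print(len(not_arn), two_digits, letters, plus_signs)
```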
