## Python Regular Expressions

* Regular expressions are a powerful language for matching text patterns.
* The Python "re" module provides regular expression support.

In [None]:
match = re.search(pat, str)

The re.search() function takes a regular expression pattern and a string and searches for that pattern within the string. If the search succeeds, search() returns a match object; otherwise it returns None.

In [37]:
# Example
import re

str = 'an example word:catt!!'
match = re.search(r'word:\w\w\w', str)
# If-statement after search() tests if it succeeded
if match:
  print('found', match.group()) ## 'found word:cat'
else:
  print('did not find')


found word:cat


### Explanation
* match = re.search(pat, str) stores the search result in a variable named "match"
* The if-statement tests the match: if it is truthy, the search succeeded and match.group() is the matching text (e.g. 'word:cat'). Otherwise the match is None, meaning the search did not succeed and there is no matching text.

* The 'r' at the start of the pattern string designates a Python "raw" string, which passes backslashes through unchanged.
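
The effect of the 'r' prefix can be checked with a small sketch. The variable names (`plain`, `raw`, `text`) are illustrative only, and `\b` is used here just because it means something different to Python string literals (backspace) than to the regex engine (word boundary).

```python
import re

# In a regular string literal, '\b' is a single backspace character.
plain = '\bcat'
# In a raw string literal the backslash is kept, so the regex engine
# receives \b, which it reads as a word boundary.
raw = r'\bcat'

print(len(plain))  # 4
print(len(raw))    # 5

text = 'a cat sat'
# The non-raw pattern looks for a backspace followed by 'cat', so it fails.
no_match = re.search(plain, text)
print(no_match)    # None

# The raw pattern finds 'cat' at a word boundary.
m = re.search(raw, text)
print(m.group())   # cat
```

Writing patterns as raw strings by default avoids having to remember which backslash sequences Python would otherwise interpret first.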

In [41]:
# Other Examples
## All of the pattern must match, but it may appear anywhere.
match = re.search(r'iii', 'piiiig') # found, match.group() == "iii"
print(match)

<re.Match object; span=(1, 4), match='iii'>


In [43]:
match = re.search(r'igs', 'piiig') # not found, match == None
print(match)

None


In [46]:
## . = any char but \n
match = re.search(r'..g', 'piiig') # found, match.group() == "iig"
match

<re.Match object; span=(2, 5), match='iig'>

In [49]:
## \d = digit char, \w = word char
match = re.search(r'\d\d', 'p123g') # found, match.group() == "12"
match

<re.Match object; span=(1, 3), match='12'>

In [52]:
#\w = word char
match = re.search(r'\w+', '@@abcdfff!!') # found, match.group() == "abcdfff"
match

<re.Match object; span=(2, 9), match='abcdfff'>

In [14]:
str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'\w+@\w+', str)
if match:
    print(match.group())  ## 'b@google'


b@google


The search does not get the whole email address in this case because the \w does not match the '-' or '.' in the address. We'll fix this using the regular expression features below.
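
As a quick check of why the match stops where it does, the sketch below tests a few single characters against \w. The names `chars` and `is_word_char` are illustrative only.

```python
import re

# \w covers letters, digits, and underscore; it does not cover '-' or '.'.
chars = ['a', '7', '_', '-', '.']
is_word_char = {ch: re.fullmatch(r'\w', ch) is not None for ch in chars}

for ch, ok in is_word_char.items():
    print(repr(ch), ok)
```

The first three characters report True and the last two report False, which is why `\w+@\w+` cannot extend across the '-' in 'alice-b' or the '.' in 'google.com'.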



Square brackets indicate a set of chars, so [abc] matches 'a' or 'b' or 'c'. The codes \w, \s etc. work inside square brackets too, with the one exception that dot (.) just means a literal dot. For the email problem, square brackets are an easy way to add '.' and '-' to the set of chars that can appear around the @, using the pattern r'[\w.-]+@[\w.-]+' to get the whole email address:

In [55]:
str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'[\w.-]+@[\w.-]+', str)
if match:
    print(match.group())  ## 'alice-b@google.com'

alice-b@google.com
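
The point that a dot inside square brackets is a literal dot can be illustrated with a minimal sketch. The variable names are illustrative only.

```python
import re

# Outside brackets, '.' matches any character except a newline.
any_char = re.search(r'a.c', 'abc')
print(any_char)       # matches 'abc'

# Inside brackets, '.' only matches a literal dot.
literal_miss = re.search(r'a[.]c', 'abc')
print(literal_miss)   # None

literal_hit = re.search(r'a[.]c', 'a.c')
print(literal_hit)    # matches 'a.c'
```

In the email pattern the '-' is placed last inside the brackets, where it is treated as a literal hyphen rather than as a range marker such as the one in [a-z].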
