**Participation. 1/23**

**Name: `Your Name`**

In [2]:
import re

# Python Regular Expressions

The Python `re` module provides many functions for regular expression support.  Here you will learn more about the different functions and complete exercises to practice their use.

## `re.match`

The `match()` function checks a pattern against some text. It only looks for the pattern at the beginning of the text: if the text does not start with a match, `match()` returns `None`, even when the pattern appears later on.

`re.match` Documentation:  https://docs.python.org/3.7/library/re.html#re.match


*Reminder:* the `r` at the start of a pattern marks it as a "raw" string, in which backslashes are passed through as-is instead of being treated as escape characters (handy for regular expressions).
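To see what the `r` prefix actually changes, compare string lengths: in a normal string `\n` is a single newline character, while in a raw string it stays as two characters, a backslash followed by `n`. A short sketch:

```python
import re

# In a normal string literal, "\n" is one character: a newline.
normal = '\n'
# In a raw string the backslash is kept, giving two characters: '\\' and 'n'.
raw = r'\n'
print(len(normal), len(raw))  # 1 2

# r'\d+' and '\\d+' are the same string; the raw form is just easier to read.
print(r'\d+' == '\\d+')  # True

# re receives the backslash intact and interprets \d as "a digit".
print(re.search(r'\d+', 'room 214').group())  # 214
```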

### Example

In [3]:
tmpStr1 = 'Regular expressions are great'
tmpStr2 = 'It is fun learning about regular expressions'
match = re.match(r'[Rr]egular', tmpStr1)
if match:
    print('found ', match.group())
else:
    print("did not find")

match = re.match(r'[Rr]egular', tmpStr2)
if match:
    print('found ', match.group())
else:
    print("did not find")

found  Regular
did not find


## `re.search`

The `re.search(pat, str)` function takes two main arguments: `pat`, a regular expression pattern, and `str`, the string to search. It scans the string for the first occurrence of the pattern anywhere within it, not just at the beginning. If successful, `search()` returns a match object; otherwise it returns `None`.
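Instead of writing `[Rr]` to accept either case, `re.search` also takes an optional third argument, `flags`; `re.IGNORECASE` is one such flag. A minimal sketch comparing the two approaches on the second test string:

```python
import re

tmpStr2 = 'It is fun learning about regular expressions'

# Character-class approach used in the examples: explicitly allow 'R' or 'r'.
m1 = re.search(r'[Rr]egular', tmpStr2)

# Flag approach: re.IGNORECASE makes the whole pattern case-insensitive.
m2 = re.search(r'regular', tmpStr2, re.IGNORECASE)

print(m1.group(), m2.group())  # regular regular

# search() returns None when the pattern does not occur anywhere in the string.
print(re.search(r'python', tmpStr2))  # None
```

The flag is convenient when a pattern has many letters; the character class is more precise when only some letters should ignore case.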

`re.search()` Documentation: https://docs.python.org/3.7/library/re.html#re.search

### Example

In [4]:
tmpStr1 = 'Regular expressions are great'
tmpStr2 = 'It is fun learning about regular expressions'
match = re.search(r'[Rr]egular', tmpStr1)
if match:
    print('found ', match.group())
else:
    print("did not find")

match = re.search(r'[Rr]egular', tmpStr2)
if match:
    print('found ', match.group())
else:
    print("did not find")

found  Regular
found  regular


### Example

In [5]:
tmpStr1 = 'I have a cat, Fido'
tmpStr2 = 'I have a cat, Felix'
tmpStr3 = 'I have a cat, It'
match = re.search(r'cat,\s\w\w\w\w', tmpStr1)
if match:
    print('found ', match.group())
else:
    print("did not find")

found  cat, Fido


Try running the expression above on the three test strings.
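Running the same pattern over all three strings shows that `\w\w\w\w` requires exactly four word characters: "Felix" is cut short and "It" does not match at all. Swapping in `\w+` (one or more word characters) is one way to accept names of any length; this is a sketch, not the only possible fix.

```python
import re

tmpStrs = ['I have a cat, Fido', 'I have a cat, Felix', 'I have a cat, It']

# \w\w\w\w matches exactly four word characters after "cat, ".
for s in tmpStrs:
    match = re.search(r'cat,\s\w\w\w\w', s)
    print(match.group() if match else 'did not find')
# cat, Fido
# cat, Feli
# did not find

# \w+ matches one or more word characters, so names of any length are captured.
results = [re.search(r'cat,\s\w+', s).group() for s in tmpStrs]
print(results)  # ['cat, Fido', 'cat, Felix', 'cat, It']
```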


### Exercise 1 - Properties of search

Examine the following uses of the search function.

In [6]:
tmpStr1 = 'baa baaa black sheep'
match = re.search(r'ba+', tmpStr1)
if match:
    print('found: ', match.group())
else:
    print("did not find")

tmpStr2 = 'baa2 baaaa4 baaa3'
match = re.search(r'ba+\d', tmpStr2)
if match:
    print('found: ', match.group())
else:
    print("did not find")

found:  baa
found:  baa2


**Q** Which of the "baa" words is returned in tmpStr2? Will the function return the leftmost or rightmost occurrence in a string?

**ANS**
`baa2`. The leftmost occurrence, since `search()` scans the string from left to right and returns the first match it finds.
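Two related properties are worth checking: `search()` reports only the leftmost match (`re.findall` collects all of them), and quantifiers such as `+` are greedy, taking as many characters as they can. A short sketch:

```python
import re

tmpStr2 = 'baa2 baaaa4 baaa3'

# search() stops at the leftmost match...
print(re.search(r'ba+\d', tmpStr2).group())  # baa2
# ...while findall() returns every non-overlapping match, left to right.
print(re.findall(r'ba+\d', tmpStr2))  # ['baa2', 'baaaa4', 'baaa3']

# Quantifiers are greedy: a+ grabs as many a's as it can.
print(re.search(r'ba+', 'baaaa4').group())   # baaaa
# Adding ? makes the quantifier lazy: it takes as few as possible (here, one).
print(re.search(r'ba+?', 'baaaa4').group())  # ba
```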

### Example - Anchors

The exception to your answer above is when the pattern uses anchors to require a match at the beginning (`^`) or end (`$`) of the string.

In [7]:
tmpStr1 = 'foobar1 foobar2 foobar3'
match = re.search(r'^f\w+\d', tmpStr1)
if match:
    print('found: ', match.group())
else:
    print("did not find")

match = re.search(r'f\w+\d$', tmpStr1)
if match:
    print('found: ', match.group())
else:
    print("did not find")

found:  foobar1
found:  foobar3
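By default, `^` and `$` anchor to the start and end of the whole string. When the text spans several lines, the `re.MULTILINE` flag makes them match at each line boundary as well. A brief sketch using a newline-separated variant of the string above:

```python
import re

lines = 'foobar1\nfoobar2\nfoobar3'

# Without the flag, ^ and $ refer to the whole string, and \w cannot match
# the newlines in between, so a single match from ^ to $ is impossible.
print(re.findall(r'^f\w+\d$', lines))  # []

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
print(re.findall(r'^f\w+\d$', lines, re.MULTILINE))  # ['foobar1', 'foobar2', 'foobar3']
```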


### Exercise 2 - Create a pattern

Create a regular expression pattern that matches all the positive examples below but none of the negative examples. You cannot simply list the positive strings OR-ed together.

| Positive | Negative |
|----------|----------|
| pit      | pt       |
| spot     | Pot      |
| spate    | peat     |
| slap two | part     |
| respite  |          |

In [8]:
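positives = ['pit', 'spot', 'spate', 'slap two', 'respite']
negatives = ['pt', 'Pot', 'peat', 'part']
# Print each group under its own header instead of special-casing 'pt' inside the loop
for header, cases in (('Positive Cases: \n', positives), ('\nNegative Cases: \n', negatives)):
    print(header)
    for ex in cases:
        match = re.search(r's\w|pit', ex)
        if match:
            print("%9s: found" % ex)
        else:
            print("%9s: not found" % ex)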


Positive Cases: 

      pit: found
     spot: found
    spate: found
 slap two: found
  respite: found

Negative Cases: 

       pt: not found
      Pot: not found
     peat: not found
     part: not found
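The answer above works, but its `pit` branch effectively lists one of the positive cases. One alternative worth comparing (a sketch, not an official solution) is `p.t`: every positive contains a lowercase `p` and a `t` separated by exactly one character (in "slap two" that character is the space), while none of the negatives do.

```python
import re

positives = ['pit', 'spot', 'spate', 'slap two', 'respite']
negatives = ['pt', 'Pot', 'peat', 'part']

# 'p', then any single character, then 't'. Matching is case-sensitive, so 'Pot' fails.
pattern = r'p.t'

for ex in positives + negatives:
    print("%9s: %s" % (ex, 'found' if re.search(pattern, ex) else 'not found'))
```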


### Example - Group Extraction

The "group" feature of regular expressions lets you select out part of the matching text. Say we want to extract an email address from a string and, in addition to finding the whole address, extract the username and host separately, e.g., to pull out an MTU ISO login.

The parentheses in the pattern identify the "groups" inside the text.

In [11]:
tempStr = 'send an email to John, jdoe@mtu.edu, by tomorrow'
# Raw string avoids an invalid-escape warning for \w; group 1 = username, group 2 = hostname
match = re.search(r'(\w+)@([\w.]+)', tempStr)
if match:
    print("Email:    ", match.group())
    print("username: ", match.group(1))
    print("hostname: ", match.group(2))
else:
    print("no match")

Email:     jdoe@mtu.edu
username:  jdoe
hostname:  mtu.edu
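Groups can also be given names with the `(?P<name>...)` syntax, which makes longer patterns easier to read. A sketch of the same extraction with named groups (the names `username` and `hostname` are our own choice):

```python
import re

tempStr = 'send an email to John, jdoe@mtu.edu, by tomorrow'

# (?P<name>...) names a group; it is still numbered as well.
match = re.search(r'(?P<username>\w+)@(?P<hostname>[\w.]+)', tempStr)
if match:
    print(match.group('username'))  # jdoe
    print(match.group(2))           # mtu.edu
    print(match.groups())           # ('jdoe', 'mtu.edu')
    print(match.groupdict())        # {'username': 'jdoe', 'hostname': 'mtu.edu'}
```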


### Exercise 3 - Groups

There is ongoing debate about the best regular expression for matching email addresses (e.g., to validate emails entered in forms). Here, let's think about how to extend the pattern above to handle the following cases:

* usernames can contain letters, numbers, and underscores, but will not start with a number, e.g., jdoe15@mtu.edu, sherlock24@gmail.com, tom_brady@gmail.com
* an email may be a task-specific address (Gmail, for example, allows this), where you add extra identifiers after your username with a `+`, e.g., harrypotter+news@gmail.com or jonstark+dragons@gmail.com. Make sure you can separate the username from the task identifier.



In [30]:
cases = ['jdoe@gmail.com', 'sherlock24@gmail.com', 'tom_brady@gmail.com',
         'harrypotter+news@gmail.com', 'jonstark+dragons@gmail.com',
         'juliet_capulet+poison@gmail.com']
# Group 1: username; \b stops the match from starting mid-word and [A-Za-z_] rejects a leading digit
# Group 2: optional task identifier after '+'; (?:...) is non-capturing, so the '+' is not its own group
# Group 3: hostname
pattern = r'\b([A-Za-z_]\w*)(?:\+(\w+))?@([\w.]+)'
for ex in cases:
    match = re.search(pattern, ex)
    if match:
        print("Email: ", match.group(), end='')
        print(" username: ", match.group(1), end='')
        print(" identifiers: ", match.group(2), end='')
        print(" hostname: ", match.group(3))
    else:
        print("no match")

Email:  jdoe@gmail.com username:  jdoe identifiers:  None hostname:  gmail.com
Email:  sherlock24@gmail.com username:  sherlock24 identifiers:  None hostname:  gmail.com
Email:  tom_brady@gmail.com username:  tom_brady identifiers:  None hostname:  gmail.com
Email:  harrypotter+news@gmail.com username:  harrypotter identifiers:  news hostname:  gmail.com
Email:  jonstark+dragons@gmail.com username:  jonstark identifiers:  dragons hostname:  gmail.com
Email:  juliet_capulet+poison@gmail.com username:  juliet_capulet identifiers:  poison hostname:  gmail.com
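For validating a form field, rather than extracting an address from surrounding text, `re.fullmatch` (Python 3.4+) is one option: it succeeds only if the entire string matches, so stray text or a leading digit is rejected. A minimal sketch, with a hypothetical helper `is_valid_email` and a deliberately simple pattern built from the username rule in this exercise; it is far from a complete email validator.

```python
import re

# Username: letter or underscore, then letters/digits/underscores.
# Optional '+identifier', then '@' and a hostname of word characters and dots.
EMAIL = r'([A-Za-z_]\w*)(?:\+(\w+))?@([\w.]+)'

def is_valid_email(addr):
    """Return True if the whole string fits the simple EMAIL pattern above."""
    return re.fullmatch(EMAIL, addr) is not None

for addr in ['jdoe15@mtu.edu', 'harrypotter+news@gmail.com',
             '15jdoe@mtu.edu', 'email jdoe@mtu.edu']:
    print('%28s: %s' % (addr, is_valid_email(addr)))
```

Because `fullmatch` anchors both ends, no `^`, `$`, or `\b` is needed in the pattern itself.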
