In [1]:
import re

### Emails example guidance

- `name` matches the part before the `@` or `(at)`
- `at` matches `@` or `(at)` in any letter case: `(AT)`, `(aT)`, `(At)`, etc.
- `domain` matches `cs.wisc.edu`, `wisc.edu`, `something.com`, `something1.something2.org`, etc.
- `full_regex` combines the three parts above into the complete regex
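
A minimal sketch of how the three parts might be combined. The specific patterns for `name`, `at`, and `domain`, the optional whitespace around the `at` part, and the sample text are assumptions for illustration, not the patterns from the original exercise.

```python
import re

# Assumed patterns for each part (hypothetical, for illustration only)
name = r"\w+"                  # characters before the @ / (at)
at = r"@|\([aA][tT]\)"         # "@" or "(at)" in any letter case
domain = r"\w+(?:\.\w+)+"      # e.g. wisc.edu, cs.wisc.edu

# Named groups keep the name and domain separate in each match.
# Allowing whitespace around the `at` part is an extra assumption.
full_regex = rf"(?P<name>{name})\s*(?:{at})\s*(?P<domain>{domain})"

text = "Contact alice@cs.wisc.edu or bob (AT) something.com"
matches = [(m.group("name"), m.group("domain"))
           for m in re.finditer(full_regex, text)]
print(matches)
```

Building the regex from named pieces makes each part easy to test on its own before combining them.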

### Self-practice

### Q1: Which regex will NOT match "123"?
1. r"\d\d\d"
2. r"\d{3}"
3. r"\D\D\D"
4. r"..."

A: 3. `\D\D\D`

Explanation: `\D` matches any single character that is not a digit, so `\D\D\D` cannot match three digits.

In [2]:
print(re.findall(r"\d\d\d", "123"))
print(re.findall(r"\d{3}", "123"))
print(re.findall(r"\D\D\D", "123"))
print(re.findall(r"...", "123"))

['123']
['123']
[]
['123']


### Q2: What will r"^A" match?
1. "A"
2. "^A"
3. "BA"
4. "B"
5. "BB"

A: 1. `"A"`

Explanation: `^` is an anchor that requires the match to start at the beginning of the string, so `^A` matches only strings that begin with `A`. Option 1 is the only option that begins with `A`.

Option 2 contains a literal `^`. To match it, the regex must escape the special meaning of `^` with a backslash: `r"\^A"`.

In [3]:
print(re.findall(r"^A", "A"))
print(re.findall(r"^A", "^A"))
print(re.findall(r"^A", "BA"))
print(re.findall(r"^A", "B"))
print(re.findall(r"^A", "BB"))
# To match option 2, escape the special meaning of ^ with a backslash
print(re.findall(r"\^A", "^A"))

['A']
[]
[]
[]
[]
['^A']


### Q3: Which one can match "HH"?
1. r"HA+H"
2. r"HA+?H"
3. r"H(A+)?H"

A: 3. `r"H(A+)?H"`

Explanation: 

Option 1 uses `A+`, which requires one or more `A`'s between the two `H`'s, so it cannot match `HH`. It matches strings like "HAH", "HAAH", "HAAAH", and so on.

Option 2 uses `A+?`, the non-greedy version of one or more `A`'s. It still requires at least one `A` between the two `H`'s, so it cannot match `HH` either.

Option 3 wraps `A+` in a group and makes the group optional: `(A+)?`. Here the `?` follows a group rather than a quantifier, so it acts as the 0-or-1 operator instead of switching `+` to non-greedy.

In [4]:
print(re.findall(r"HA+H", "HH"))
print(re.findall(r"HA+?H", "HH"))
print(re.findall(r"H(A+)?H", "HH"))
# findall returns only the captured group, so option 3 prints [''].
# One way to see the whole match is to wrap the entire regex in a group:
print(re.findall(r"(H(A+)?H)", "HH"))

[]
[]
['']
[('HH', '')]
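
Wrapping the whole regex in a group is one option. As a sketch of two alternatives, `re.search` exposes the whole match through `group(0)`, and a non-capturing group `(?:...)` makes `re.findall` return whole matches. The second input string `"HH HAAH"` is an assumed example.

```python
import re

# re.search returns a match object; group(0) is the whole match
m = re.search(r"H(A+)?H", "HH")
print(m.group(0))
print(m.group(1))  # None: the optional group did not take part in the match

# With no capture groups, findall returns the whole matches
whole = re.findall(r"H(?:A+)?H", "HH HAAH")
print(whole)
```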


In [5]:
# Option 1: strings that can be matched
print(re.findall(r"HA+H", "HAH"))
print(re.findall(r"HA+H", "HAAH"))
print(re.findall(r"HA+H", "HAAAH"))
print(re.findall(r"HA+H", "HAHAAH"))
# Option 2: same set of matches but non-greedy 
# (which doesn't make a difference for these examples)

['HAH']
['HAAH']
['HAAAH']
['HAH']
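
Greedy and non-greedy matching only differ when the quantifier is free to choose where the match ends. A small sketch, using assumed patterns without the closing `H`:

```python
import re

# Without a closing H in the pattern, the quantifier decides where to stop
greedy = re.findall(r"HA+", "HAAAH")   # takes as many A's as possible
lazy = re.findall(r"HA+?", "HAAAH")    # takes as few A's as possible
print(greedy)
print(lazy)
```

In `HA+H` and `HA+?H`, the trailing `H` forces both versions to consume every `A` up to the next `H`, which is why the results are identical there.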


### Q4: Which string(s) will match r"^(ha)*$"?
1. ""
2. "hahah"
3. "that"
4. "HAHA"

A: 1. `""`

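Explanation: `^` and `$` anchor the pattern to the start and end of the string, and `(ha)*` matches zero or more consecutive copies of `ha`. The whole string must therefore consist only of repeated `ha`, and zero repetitions means the empty string matches. `"hahah"` has a trailing `h`, `"that"` contains other characters around `ha`, and `"HAHA"` is uppercase (matching is case-sensitive by default).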

In [6]:
print(re.findall(r"^(ha)*$", ""))
print(re.findall(r"^(ha)*$", "hahah"))
print(re.findall(r"^(ha)*$", "that"))
print(re.findall(r"^(ha)*$", "HAHA"))

['']
[]
[]
[]


In [7]:
# Strings that match r"^(ha)*$". findall prints only the group's
# last repetition, which is 'ha' in every case.
print(re.findall(r"^(ha)*$", "ha"))
print(re.findall(r"^(ha)*$", "haha"))
print(re.findall(r"^(ha)*$", "hahaha"))
# and so on

['ha']
['ha']
['ha']
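
Because `findall` reports only the captured group, the outputs above do not show whether the whole string matched. A sketch of a more direct check, assuming `re.fullmatch` with a non-capturing group is an acceptable substitute for the `^...$` anchors on these inputs (none of them ends in a newline):

```python
import re

candidates = ["", "ha", "haha", "hahah", "that", "HAHA"]
# fullmatch succeeds only if the entire string fits the pattern
results = {s: bool(re.fullmatch(r"(?:ha)*", s)) for s in candidates}
for s, matched in results.items():
    print(repr(s), matched)
```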


### Q5: What is the type of the following?

`re.findall(r"(\d) (\w+)", some_str)[0]`

1. list
2. tuple
3. string

A: 2. `tuple`

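Explanation: the regex contains two capture groups, one per pair of parentheses. With more than one group, `re.findall` returns a `list` of `tuple`s, one tuple per match. Indexing with `[0]` therefore gives a `tuple` (assuming `some_str` contains at least one match; otherwise the list is empty and `[0]` raises `IndexError`).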
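
A quick check of the types, using a hypothetical value for `some_str`:

```python
import re

some_str = "3 apples and 5 oranges"  # hypothetical input for illustration
result = re.findall(r"(\d) (\w+)", some_str)
print(result)
print(type(result), type(result[0]))
```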

### Q6: What will it do?
```python
re.sub(r"(\d{3})-(\d{3}-\d{4})",
       r"(\g<1>) \g<2>",
       "608-123-4567")
```

A: It converts the phone number format 608-123-4567 to (608) 123-4567.

Explanation: 

Regex parts: 
1. `(\d{3})`: group 1 captures three digits
2. `-`: matches a literal hyphen
3. `(\d{3}-\d{4})`: group 2 captures three digits, a hyphen, and four digits

Replacement string parts (the replacement is also a raw string):
1. `(\g<1>)`: group 1 inside literal parentheses
2. a space
3. `\g<2>`: group 2, unchanged

In [8]:
re.sub(r"(\d{3})-(\d{3}-\d{4})",
       r"(\g<1>) \g<2>",
       "608-123-4567")

'(608) 123-4567'
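
As a variation, the same substitution can be sketched with named groups, which some readers find easier to follow than numbered ones. The group names `area` and `rest` and the sample sentence are assumptions for illustration.

```python
import re

text = "Call 608-123-4567 or 608-765-4321."  # hypothetical input
pattern = r"(?P<area>\d{3})-(?P<rest>\d{3}-\d{4})"
# \g<name> refers to a named group in the replacement string
formatted = re.sub(pattern, r"(\g<area>) \g<rest>", text)
print(formatted)
```

`re.sub` replaces every non-overlapping match, so both phone numbers in the sentence are reformatted.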