# Regular Expressions


## Table of Contents

* [Turn 1](#Turn-1)

* [References](#References)



## Task

Our task is to develop a chatbot that can talk about smartphones.
Here is an example dialogue (`S`: system, `U`: user):

```
S0: are you using a smartphone?
U0: yes, i have an iphone.
S1: how long have you been using iphone?
U1: about 2 years.
S2: oh, are you using iphone 10?
U2: no, it's actually 7s.
S3: that model came out over 3 years ago. have you thought about changing your phone?
U3: no, I am good for now.
```

## Turn 1

Given the following question initiated by the system:

> S0: are you using a smartphone?

We expect either `Yes`, `No`, or `None` as the user response.

### Yes Response

The following code defines a regular expression with one group, `(yes|yeah)`, that matches either the literal `yes` or `yeah`:

```python
import re

re_yn = re.compile(r'(yes|yeah)')
m = re_yn.match('yeah, i am')
print(m)
```

```
<re.Match object; span=(0, 4), match='yeah'>
```


If there is a match, we can retrieve the matched literal with `group()`:

```python
if m:
    yes = m.group()
    print(yes)
```

```
yeah
```


If no match is found, `match` returns `None`:

```python
m = re_yn.match('sure, i am')
print(m)
```

```
None
```


### Issue 1

`re_yn` can overmatch: it accepts `yes` even when it is only the prefix of a longer word such as `yesterday`:

```python
m = re_yn.match('yesterday was my birthday')
if m: print(m.group())
```

```
yes
```


Match only if the literals are followed by a space (`\s`), a comma (`,`), a period (`\.`), or the end of the string (`$`):

```python
re_yn = re.compile(r'(yes|yeah)(\s|,|\.|$)')

m = re_yn.match('yesterday was my birthday')
print(m)

m = re_yn.match('yes, i am')
print(m.groups())
# group(0) is the whole match; groups 1..n are the capturing groups
for i in range(len(m.groups()) + 1): print(i, m.group(i))
```

```
None
('yes', ',')
0 yes,
1 yes
2 ,
```


Make the second group non-capturing with `(?:...)`, so that only `(yes|yeah)` is captured:

```python
re_yn = re.compile(r'(yes|yeah)(?:\s|,|\.|$)')

m = re_yn.match('yes, i am')
# group(0) is the whole match; group(1) is the only capturing group left
for i in range(len(m.groups()) + 1): print(i, m.group(i))
```

```
0 yes,
1 yes
```
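A commonly used alternative, not used in the rest of this section, is the word-boundary anchor `\b`, which matches between a word character and a non-word character or at either end of the string. The sketch below (the name `re_yn_b` is introduced here) performs the same prefix check with `\b`. It is slightly broader than the explicit `(?:\s|,|\.|$)` list, because it also accepts `!`, `?`, and other punctuation after the literal:

```python
import re

# \b asserts a word boundary, so "yes"/"yeah" must not be followed by a word character
re_yn_b = re.compile(r'(yes|yeah)\b')

for text in ['yes, i am', 'yeah!', 'yesterday was my birthday']:
    m = re_yn_b.match(text)
    print(repr(text), '->', m.group(1) if m else None)
```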


### Issue 2

`re_yn.match` only tries to match at the beginning of the string, so a `yes` later in the input is missed:

```python
m = re_yn.match('well, yes I am')
print(m)
```

```
None
```


Allow any characters before the literal, as long as the literal itself is preceded by a space (`\s`), a comma (`,`), a period (`\.`), or the beginning of the string (`^`):

```python
re_yn = re.compile(r'(?:.*)(?:\s|,|\.|^)(yes|yeah)(?:\s|,|\.|$)')

m = re_yn.match('yeah, I am')
if m: print(m.group(1))

m = re_yn.match('well, yes I am')
if m: print(m.group(1))
```

```
yeah
yes
```
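Python's `re` module also provides `search`, which tries every starting position and returns the first match, so the leading `(?:.*)` becomes unnecessary. There is one behavioral difference: the greedy `(?:.*)` combined with `match` selects the *last* qualifying literal, whereas `search` selects the first. The sketch below compares the two approaches (the variable names are introduced here for illustration):

```python
import re

# Same delimiters as above, but without the leading (?:.*):
# search() scans start positions left to right until the pattern matches.
re_yn_search = re.compile(r'(?:\s|,|\.|^)(yes|yeah)(?:\s|,|\.|$)')
# The match()-based pattern from above, for comparison.
re_yn_match = re.compile(r'(?:.*)(?:\s|,|\.|^)(yes|yeah)(?:\s|,|\.|$)')

for text in ['yeah, I am', 'well, yes I am', 'yes, well, yeah']:
    s = re_yn_search.search(text)
    m = re_yn_match.match(text)
    print(repr(text), 'search:', s.group(1), 'match:', m.group(1))
```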


### No Response

Add another group, `(no|not really)`, to the same regular expression to match the negative literals:

```python
re_yn = re.compile(r'(?:.*)(?:\s|,|\.|^)(yes|yeah)|(no|not really)(?:\s|,|\.|$)')

m = re_yn.match('yes, I am')
if m: print(m.groups())

m = re_yn.match('no I am not')
if m: print(m.groups())
```

```
('yes', None)
(None, 'no')
```
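Note that `|` has the lowest precedence of any regular-expression operator. The pattern above is therefore really two alternatives: `(?:.*)(?:\s|,|\.|^)(yes|yeah)` and `(no|not really)(?:\s|,|\.|$)`. This means the delimiter after `yes|yeah` and the prefix before `no|not really` are not enforced: `yesterday` overmatches again, and `well, no` is missed by `match`. The same applies to the later patterns built this way. The sketch below shows one way to fix it by wrapping both groups in a non-capturing group (`re_grouped` is a name introduced here; the rest of this section keeps the original patterns):

```python
import re

# Original: `|` splits the WHOLE pattern, so each delimiter check applies to one side only.
re_ungrouped = re.compile(r'(?:.*)(?:\s|,|\.|^)(yes|yeah)|(no|not really)(?:\s|,|\.|$)')
# Grouped: the prefix and suffix checks now apply to yes/yeah AND no/not really.
re_grouped = re.compile(r'(?:.*)(?:\s|,|\.|^)(?:(yes|yeah)|(no|not really))(?:\s|,|\.|$)')

for text in ['yesterday was my birthday', 'well, no']:
    a = re_ungrouped.match(text)
    b = re_grouped.match(text)
    print(repr(text), a.groups() if a else None, b.groups() if b else None)
```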


### Issue

`match` returns only a single match, so even when the input contains both `yes` and `no`, only one of them is captured:

```python
m = re_yn.match('yes or no')
if m: print(m.groups())
```

```
('yes', None)
```


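Use the `findall` method instead of `match`; it returns every non-overlapping match as a tuple of groups. Because `findall` scans the whole string, the leading `(?:.*)` is no longer needed: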

```python
re_yn = re.compile(r'(?:\s|,|\.|^)(yes|yeah)|(no|not really)(?:\s|,|\.|$)')
ts = re_yn.findall('yes or no')
for t in ts: print(t)
```

```
('yes', '')
('', 'no')
```
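`findall` fills a group that did not take part in a particular match with an empty string. If you also need match positions, or prefer `None` for unmatched groups, `finditer` yields `Match` objects instead. The sketch below uses the same pattern:

```python
import re

re_yn = re.compile(r'(?:\s|,|\.|^)(yes|yeah)|(no|not really)(?:\s|,|\.|$)')

# finditer yields Match objects; groups() uses None (not '') for groups
# that did not participate in that particular match.
for m in re_yn.finditer('yes or no'):
    print(m.span(), m.groups())
```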


### Function

Write a function that returns, for each group in the regular expression, the first literal it captured in the input string (`None` for a group that never matched, or `None` overall if nothing matched):

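```python
import re
from typing import List, Optional

def regex_matcher(regex: re.Pattern, text: str) -> Optional[List[Optional[str]]]:
    """For each capturing group in `regex`, return the first non-empty literal
    it captured in `text`; return None if `regex` does not match at all."""
    ts = None
    # finditer + groups() yields one tuple per match, even for single-group
    # patterns (where findall would return plain strings instead of tuples)
    for m in regex.finditer(text):
        if ts is None:
            ts = [None] * len(m.groups())
        for i, literal in enumerate(m.groups()):
            if ts[i] is None and literal:
                ts[i] = literal
    return ts
```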

```python
print(regex_matcher(re_yn, 'yes, I am'))
print(regex_matcher(re_yn, 'no I am not'))
print(regex_matcher(re_yn, 'yes or no'))
```

```
['yes', None]
[None, 'no']
['yes', 'no']
```
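Building on `findall` has one caveat: when a pattern has exactly one capturing group, `findall` returns a list of strings rather than tuples, so code that iterates over each result with `enumerate` would walk over individual characters. The sketch below demonstrates this and adds a variant that reads the groups from `finditer` instead (`groups_per_match` is a helper name introduced here for illustration):

```python
import re

one_group = re.compile(r'(yes|yeah)')
two_groups = re.compile(r'(yes|yeah)(,)?')

print(one_group.findall('yes, yeah'))   # strings, not tuples
print(two_groups.findall('yes, yeah'))  # one tuple per match

def groups_per_match(regex, text):
    """Return one tuple of groups per match, regardless of the group count."""
    return [m.groups() for m in regex.finditer(text)]

print(groups_per_match(one_group, 'yes, yeah'))
```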


## Response: Phone Model

A yes/no response may be followed by the phone model, so define a second regular expression with one group for the brand and one for the product line:

```python
re_phone = re.compile(r'(?:\s|,|\.|^)(apple|google|samsung)|(iphone|pixel|galaxy|android)(?:\s|,|\.|$)')

print(regex_matcher(re_phone, 'yes I have an iphone'))
print(regex_matcher(re_phone, 'yes I have google pixel'))
print(regex_matcher(re_phone, 'yes I have a galaxy phone'))
```

```
[None, 'iphone']
['google', 'pixel']
[None, 'galaxy']
```
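The patterns so far are case-sensitive, so an input such as `I have an iPhone` would not match `iphone`. Assuming the bot should ignore case, one option is the `re.IGNORECASE` flag, applied below to a copy of the pattern named `re_phone_ci`. Lowercasing the input before matching would work as well:

```python
import re

# Same pattern as re_phone, compiled with re.IGNORECASE
re_phone_ci = re.compile(
    r'(?:\s|,|\.|^)(apple|google|samsung)|(iphone|pixel|galaxy|android)(?:\s|,|\.|$)',
    re.IGNORECASE,
)

for text in ['I have an iPhone', 'Google Pixel, actually']:
    print([m.groups() for m in re_phone_ci.finditer(text)])
```

The captured literal keeps the user's casing (`iPhone`), so lowercase it before reusing it in a reply if consistent wording matters.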


## Put Together

```python
TAB = '    '

s0 = 'are you using a smartphone?'
u0 = input(s0 + TAB)
```

```
are you using a smartphone?    i have an iphone
```


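```python
yn = regex_matcher(re_yn, u0)
phone = regex_matcher(re_phone, u0)

if phone:
    # prefer the product line (e.g., iphone) over the brand (e.g., apple)
    p = phone[1] if phone[1] else phone[0]
    s1 = 'how long have you been using {}?'.format(p)
elif yn:
    if yn[0]:
        s1 = 'what kind of smartphone do you have?'
    else:
        s1 = 'really? you should consider getting one.'
else:
    # assumed fallback (not part of the original flow): neither pattern matched,
    # so re-ask instead of leaving s1 undefined
    s1 = 'sorry, are you using a smartphone?'

print(s1)
```

```
how long have you been using iphone?
```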
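To exercise this logic without typing into `input()`, the two cells can be wrapped in a function. The sketch below assumes `re_yn` and `re_phone` are the `findall`-style patterns defined above (copied here as `RE_YN` and `RE_PHONE`) and restates `regex_matcher` so the block runs on its own. `respond_to_u0` is a hypothetical name, and returning `None` when neither pattern matches is a choice made here so that the caller can decide how to re-prompt:

```python
import re
from typing import List, Optional

RE_YN = re.compile(r'(?:\s|,|\.|^)(yes|yeah)|(no|not really)(?:\s|,|\.|$)')
RE_PHONE = re.compile(r'(?:\s|,|\.|^)(apple|google|samsung)|(iphone|pixel|galaxy|android)(?:\s|,|\.|$)')

def regex_matcher(regex: re.Pattern, text: str) -> Optional[List[Optional[str]]]:
    # First non-empty literal per capturing group, or None if nothing matched.
    ts = None
    for m in regex.finditer(text):
        if ts is None:
            ts = [None] * len(m.groups())
        for i, literal in enumerate(m.groups()):
            if ts[i] is None and literal:
                ts[i] = literal
    return ts

def respond_to_u0(u0: str) -> Optional[str]:
    """Hypothetical helper: choose S1 for a given U0, or None if no rule fires."""
    phone = regex_matcher(RE_PHONE, u0)
    if phone:
        # Prefer the product line (iphone, pixel, ...) over the brand.
        return 'how long have you been using {}?'.format(phone[1] or phone[0])
    yn = regex_matcher(RE_YN, u0)
    if yn:
        if yn[0]:
            return 'what kind of smartphone do you have?'
        return 'really? you should consider getting one.'
    return None

for u0 in ['i have an iphone', 'yes, i do', 'no, not really', 'maybe']:
    print(repr(u0), '->', respond_to_u0(u0))
```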


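## Response: Timeline

For the next turn (`S1: how long have you been using iphone?`), capture a number followed by a time unit: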

```python
re_duration = re.compile(r'(\d+)\s(day|month|year)')
print(regex_matcher(re_duration, 'about 2 years'))
print(regex_matcher(re_duration, 'about 15 days'))
```

```
['2', 'year']
['15', 'day']
```
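When a pattern has more than one group, named groups (`(?P<name>...)`) make the captured values easier to read where they are used. The sketch below applies this to the duration pattern, with hypothetical group names `num` and `unit`, and converts the number to an `int`:

```python
import re

# Same pattern as re_duration, with named groups
re_duration_named = re.compile(r'(?P<num>\d+)\s(?P<unit>day|month|year)')

m = re_duration_named.search('about 2 years')
if m:
    print(int(m.group('num')), m.group('unit'))
    print(m.groupdict())
```

Named groups are still numbered in order, so `findall` keeps returning plain tuples and `regex_matcher` works with this pattern unchanged.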


## References

* https://www.regular-expressions.info/tutorial.html
* https://regex101.com