# Regular Expressions

>*"Let’s say you have a problem, and you decide to solve it with regular expressions. Well, now you have two problems"*

## Regular Expressions

Regular expressions (a.k.a. **REs**, **regexes**, **regexps**, **regex patterns**) are a tiny, specialized **programming language** used to parse and manipulate text.

Regular expressions are available in:


- applications involving text processing, e.g. text editors and IDEs (such as emacs, vi, Notepad++);


- the Unix command line (see `sed`, `grep`, `awk`);


- those programming languages that natively support them (e.g. Perl, Ruby, Tcl);


- those programming languages that support them through extension libraries (e.g. C, C++, Python).

Basically, a regex is **a pattern describing a portion of text**, that is typically used to:


- check **if** a string has a given form


- find **a** text portion that satisfies a given search criterion


- find **all** the text portions that satisfy a given search criterion


The ultimate goal is either to inspect a portion of text or to modify it in some way (e.g. string replacement, tokenization, etc.).

Usually, search patterns can be paraphrased into natural languages as something like:

- *"non-digit sequences where the second character is upper-case"*


- *"everything preceding a question mark"*


- *"all those words that start with a digit"*
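
As a rough sketch, the paraphrases above could be written as follows with Python's `re` module (introduced in the next section). The sample strings and variable names are made up for illustration, and each pattern is only one of several possible translations.

```python
import re

# "non-digit sequences where the second character is upper-case"
non_digits = re.findall(r"\D[A-Z]\D*", "12aBcd34ef")

# "everything preceding a question mark"
before_qmark = re.search(r"([^?]*)\?", "Is this a regex? Yes").group(1)

# "all those words that start with a digit"
digit_words = re.findall(r"\b\d\w*", "3 apples, 2nd place and 4x4 cars")

print(non_digits)     # ['aBcd']
print(before_qmark)   # Is this a regex
print(digit_words)    # ['3', '2nd', '4x4']
```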

## Regular Expressions in Python

>### Some Literature:
>
> - **[Sections 3.4 - 3.8](http://www.nltk.org/book/ch03.html#sec-regular-expressions-word-patterns)** of: S. Bird, E. Klein & E. Loper (2009). Natural Language Processing with Python, O'Reilly
>
>
> - [**Regular Expression HOWTO**](https://docs.python.org/2/howto/regex.html): a gentle tutorial
>
> 
> - [**the official documentation**](https://docs.python.org/2/library/re.html) for the `re` module
>
>
> - [**PyMOTW post**](https://pymotw.com/2/re/index.html) on the `re` module

In [1]:
## Notebook settings 

# multiple lines of output per cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

There are multiple regex implementations, each sharing a core syntax but with different extensions or advanced features, i.e. each has a different **flavor**. 

Python provides support for regex in the `re` module, whose syntax is based on the one used in Perl regex, plus some Python-specific enhancements.

In [2]:
import re

In the most simplistic form, the use of regular expressions in Python is a three step process:


- a pattern is defined through a regex in string format;


- the regex is compiled into a pattern object by means of the `re.compile()` function (note that this step is performed implicitly by most of the top-level `re` functions we'll use);


- the appropriate pattern object methods are used in accordance with our purposes (e.g. do we want to look for all the matches or only one? do we want the match to start from the first character? do we want to modify the matching string?)

For instance, `re.findall()` is a top-level function that finds all the substrings where the regex matches and returns them as a list. When used according to the following syntax:


```python
re.findall("pattern_string", target_string, flags=0)
```

it creates a pattern object by calling the `re.compile()` function on the `pattern_string` and looks for all the matches in the `target_string`.

In [3]:
# let's look for all the occurrences of the bigram "is" in the string: "this is a string"
re.findall(r"is", "this is a string")

['is', 'is']

**NOTE**. The following alternative syntax can be used for most top-level methods:

```python
pattern_re = re.compile("pattern_string")
pattern_re.method(target_string)
```

(This may be useful if you don't want to compile the same regex each time you're applying a given method, or when you want to apply several methods to the same pattern object.)
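
A minimal sketch of this reuse, with made-up sample strings: the pattern is compiled once and the resulting object is then used with two different methods.

```python
import re

# compile once...
is_re = re.compile(r"is")

# ...then reuse the same pattern object with several methods
sentences = ["this is a string", "no match here", "Isis"]
counts = [len(is_re.findall(s)) for s in sentences]
first = is_re.search("this is a string")

print(counts)         # [2, 0, 1]
print(first.span())   # (2, 4)
```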

In [4]:
# the same as the previous code cell
pattern_re = re.compile(r"is")
print(pattern_re.findall("this is a string"))

['is', 'is']


> **Suggestion**
>
> An educational tool that can be used to consolidate your understanding of regular expressions is the nltk **`re_show(pattern, string)`** function, which annotates the `string` in every place where the `pattern` is matched. Try it:

In [5]:
import nltk
nltk.re_show(r"is", "this is a string")

th{is} {is} a string


## Regex: Syntax & Semantics

A pattern consists of:

- **Atoms**: units specifying what we're looking for and where


- **Operators** combining atoms into complex expressions

### Atoms

- Single Characters


- Dot


- Class


- Anchor

**Ordinary Characters** simply match themselves exactly. 

In [6]:
re.findall(r"t", "This is a string")

['t']

By default, regexes are **case-sensitive**, but you can use the `re.I` (short for `re.IGNORECASE`) compilation flag to make the matching case-insensitive:

In [7]:
re.findall(r"t", "This is a string", re.I)

['T', 't']

A **dot** matches any single character, except the new line character "`\n`"

In [8]:
print(re.findall(r".", "this\nis\na\nstring"))

['t', 'h', 'i', 's', 'i', 's', 'a', 's', 't', 'r', 'i', 'n', 'g']


... but if you want the dot to match also the new line character, use the `re.S` (short for `re.DOTALL`) flag 

In [9]:
print(re.findall(r".", "this\nis\na\nstring", re.S))

['t', 'h', 'i', 's', '\n', 'i', 's', '\n', 'a', '\n', 's', 't', 'r', 'i', 'n', 'g']


**Classes** define sets of characters, **any one of which** may match **one of the characters** in our string. 

The set of characters of interest is enclosed in `[`square brackets`]`:

In [10]:
# let's look for "i" OR "s"
print(re.findall(r"[is]", "this is a string"))

['i', 's', 'i', 's', 's', 'i']


A **range** is indicated by a dash `-`:

In [11]:
# let's look for any of the following letters: "a", "b", "c", "d", "e"
print(re.findall(r"[a-e]", "this is definitively a string"))

['d', 'e', 'e', 'a']


When used in the definition of a class, `^` is an **exclusion operator**. It means "all but this set of characters":

In [12]:
# let's look for any character other than a vowel
print(re.findall(r"[^aeiou]", "this is definitively a string"))

['t', 'h', 's', ' ', 's', ' ', 'd', 'f', 'n', 't', 'v', 'l', 'y', ' ', ' ', 's', 't', 'r', 'n', 'g']


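Some classes are associated with a special notation (the equivalences below hold for ASCII text; in Python 3, `str` patterns also match the corresponding Unicode characters unless the `re.ASCII` flag is set):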

- `\d`: any decimal digit; this is equivalent to the class `[0-9]`


- `\D`: any non-digit character; this is equivalent to the class `[^0-9]`


- `\s`: any whitespace character; this is equivalent to the class `[ \t\n\r\f\v]`


- `\S`: any non-whitespace character; this is equivalent to the class `[^ \t\n\r\f\v]`


- `\w`: any alphanumeric character; this is equivalent to the class `[a-zA-Z0-9_]`


- `\W`: any non-alphanumeric character; this is equivalent to the class `[^a-zA-Z0-9_]`

In [13]:
# let's look for any character but the whitespaces
print(re.findall(r"\S", "this is definitively a string"))

['t', 'h', 'i', 's', 'i', 's', 'd', 'e', 'f', 'i', 'n', 'i', 't', 'i', 'v', 'e', 'l', 'y', 'a', 's', 't', 'r', 'i', 'n', 'g']
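
The other shorthand classes work the same way. The following sketch uses a made-up sample string to show `\d`, `\W` and `\w` in action; note that the underscore counts as a word character.

```python
import re

text = "Room 42, floor 3_b"

digits = re.findall(r"\d", text)        # decimal digits
non_word = re.findall(r"\W", text)      # neither alphanumeric nor underscore
word_chars = re.findall(r"\w", "3_b")   # letters, digits and the underscore

print(digits)       # ['4', '2', '3']
print(non_word)     # [' ', ',', ' ', ' ']
print(word_chars)   # ['3', '_', 'b']
```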


**Tabs**, **newlines** and **carriage returns** can also be matched by using the special characters `\t`, `\n` and `\r`:

In [14]:
print(re.findall(r"\n", "this\nis\na\nstring", re.S))

['\n', '\n', '\n']


**A note about the escape character "`\`"**: the escape character is used to mark:


- a special use of a normal character (e.g. the `\d` class above)


- the literal use of meta-characters (i.e. `".", "^", "$", "*", "+", "?", "{", "}", "[", "]", "\", "|", "(", ")"`)...

In [15]:
# let's look for any dot in our string
print(re.findall(r"\.", "this.is.a.string"))

['.', '.', '.']


In [16]:
# let's look for square brackets in our string
print(re.findall(r"[\[\]]", "[this] is a string"))

['[', ']']


The use of the backslash as an escape character may [conflict](https://docs.python.org/2/howto/regex.html#the-backslash-plague) with Python's own use of the same character for the same purpose in string literals.

The solution is to use the so-called **raw string notation** for regular expressions: by prepending `r` to our string (as we're doing in these notes) we tell the Python interpreter that backslashes should not be handled in any special way:

In [17]:
# let's look for the sequence "\s" by using the REGULAR string notation
# (each literal backslash must be doubled for Python and again for the regex)
print(re.findall("\\\\s", "\\section")[0])

\s


In [18]:
# let's look for the sequence "\s" by using the RAW string notation
print(re.findall(r"\\s", r"\section")[0])

\s
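
To make the difference concrete, the sketch below (variable names are illustrative) checks that the two notations produce the very same three-character pattern: the raw string simply saves one level of escaping.

```python
import re

regular = "\\\\s"   # regular string: every pair of backslashes becomes one
raw = r"\\s"        # raw string: backslashes are kept exactly as typed

print(regular == raw)   # True
print(len(raw))         # 3: backslash, backslash, "s"

# both patterns match a literal backslash followed by "s"
found = re.findall(raw, r"\section")
print(found == ["\\s"])  # True
```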


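**Anchors** are special characters that specify where (in which position of the target string) a given pattern should appear. As such, they **do not consume** any character of the text:

- `^`: beginning of the string (and, with the `re.M` / `re.MULTILINE` flag, of each line); it matches the pattern that follows only when it appears in that position


- `$`: end of the string, or just before a trailing newline (and, with the `re.M` flag, the end of each line); it matches the preceding pattern only when it appears in that position


- `\b`: word boundary, i.e. the empty position between a word character (`\w`) and a non-word character (`\W`), or between a word character and the start or end of the string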

In [19]:
# let's look for the character "s" at the beginning of a line
print(re.findall(r"^s", "satisfaction"))

['s']


In [20]:
# let's look for the character "a" at the end of a line
print(re.findall(r"a$", "gianluca"))

['a']


In [21]:
# let's look for the character "s" when at the end of a word
print(re.findall(r"s\b", "this isn't a matter of strings"))

['s', 's']
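
On a multi-line string, `^` and `$` anchor by default only to the start and end of the whole string; the `re.M` (short for `re.MULTILINE`) flag makes them apply to every line. A small sketch on a made-up three-line string:

```python
import re

lines = "sun\nsea\nsand"

start_default = re.findall(r"^s", lines)         # start of the string only
start_multi = re.findall(r"^s", lines, re.M)     # start of every line
end_default = re.findall(r"\w$", lines)          # end of the string only
end_multi = re.findall(r"\w$", lines, re.M)      # end of every line

print(start_default)   # ['s']
print(start_multi)     # ['s', 's', 's']
print(end_default)     # ['d']
print(end_multi)       # ['n', 'a', 'd']
```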


**A note about flags**: multiple flags can be combined by using the bitwise OR operator "`|`", in the following way:

In [22]:
print(re.findall(r"t.", "This\nis\na\nstring", re.S|re.I))

['Th', 'tr']


See the [official tutorial](https://docs.python.org/2/howto/regex.html#compilation-flags) for more info on the available compilation flags.

### Operators

- Sequence


- Alternation


- Repetition


- Groups

When a regex is composed of a **sequence** of atoms, these are implicitly connected by a sequence (concatenation) operator, which has no symbol of its own.

In [23]:
# matches the pattern "th"
re.findall(r"th", "this is a string")

['th']

In [24]:
# matches the pattern "t" + any character + "i"
re.findall(r"t.i", "this is a string")

['thi', 'tri']

In [25]:
# matches any number between 30 and 49
re.findall(r"[34][0-9]", "Nick is 29, Jason is 38 and Nick is 45")

['38', '45']

The **alternation** operator "`|`" is used to separate two or more alternative patterns, any one of which may match:

In [26]:
# matches the patterns "thi" or "tri"
re.findall(r"thi|tri", "this is a string")

['thi', 'tri']

**Repetition** operators are used to specify that the atom or expression immediately before may be repeated.

| Operator      |    Behavior  |
|:-------------:|:-------------|
| \*	        | Zero or more of previous atom |
| +	            | One or more of previous atom |
| ?             | Zero or one of the previous atom (i.e. optional) |
| {n}           | Exactly *n* repeats where *n* is a non-negative integer |
| {n,}          | At least *n* repeats |
| {,n}          | No more than *n* repeats |
| {m,n}         | At least *m* and no more than *n* repeats |

In [27]:
# matches a "b" followed by 0 or 1 "o"s
re.findall(r"bo?", "bob is the abboooot")

['bo', 'b', 'b', 'bo']

In [28]:
# matches a "b" followed by 1 or more "o"s
re.findall(r"bo+", "bob is the abboooot")

['bo', 'boooo']

In [29]:
# matches a "b" followed by 0 or more "o"s
re.findall(r"bo*", "bob is the abboooot")

['bo', 'b', 'b', 'boooo']

In [30]:
# matches a "b" followed by 1 to 4 "o"s
re.findall(r"bo{1,4}", "bob is the abboooot")

['bo', 'boooo']

In [31]:
# matches a "b" followed by 2 "o"s
re.findall(r"bo{2}", "bob is the abboooot")

['boo']

In [32]:
# matches a "b" followed by at least 2 "o"s
re.findall(r"bo{2,}", "bob is the abboooot")

['boooo']

##### Multipliers in Python are greedy: they match the LONGEST possible string.

By appending a question mark to the multipliers (i.e. using the multipliers `*?`, `+?`, `??`, `{m,n}?`) we can force the multipliers to match **as LITTLE text as possible** (i.e. they become **lazy**).

In [33]:
print ("greedy:\t" + str(re.findall(r"bo?", "bob is the abboooot")))
print ("lazy:\t" + str(re.findall(r"bo??", "bob is the abboooot")))

greedy:	['bo', 'b', 'b', 'bo']
lazy:	['b', 'b', 'b', 'b']


In [34]:
print ("greedy:\t" + str(re.findall(r"bo+", "bob is the abboooot")))
print ("lazy:\t" + str(re.findall(r"bo+?", "bob is the abboooot")))

greedy:	['bo', 'boooo']
lazy:	['bo', 'bo']


In [35]:
print ("greedy:\t" + str(re.findall(r"bo*", "bob is the abboooot")))
print ("lazy:\t" + str(re.findall(r"bo*?", "bob is the abboooot")))

greedy:	['bo', 'b', 'b', 'boooo']
lazy:	['b', 'b', 'b', 'b']


In [36]:
print ("greedy:\t" + str(re.findall(r"bo{1,4}", "bob is the abboooot")))
print ("lazy:\t" + str(re.findall(r"bo{1,4}?", "bob is the abboooot")))

greedy:	['bo', 'boooo']
lazy:	['bo', 'bo']
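
A common situation where the distinction matters is matching delimited chunks of text. In the sketch below (the HTML-like sample string is made up for illustration), the greedy pattern runs from the first `<` to the last `>`, while the lazy one stops at the first `>` it meets.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```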


Operators apply also to **sequences of atoms**, and their scope is delimited by `(` round brackets `)` 

**NOTE**: due to its implementation, the `re.findall()` function may return counterintuitive results when dealing with groups. 

(TEST IT: try the following regexes `(a|b)+` and `((?:ab)+)\1`, and compare the results with the examples below)
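
As a hint of what to expect: when the pattern contains a capturing group, `re.findall()` returns the text captured by the group rather than the whole match. The sketch below compares the first regex above with a variant that uses the non-capturing `(?: ... )` syntax described further down.

```python
import re

text = "bob is the ababboooot"

# with a capturing group: only the LAST string captured by the group is returned
capturing = re.findall(r"(a|b)+", text)

# with a non-capturing group: the whole matches are returned
non_capturing = re.findall(r"(?:a|b)+", text)

print(capturing)       # ['b', 'b', 'b']
print(non_capturing)   # ['b', 'b', 'ababb']
```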

We will resort to **`re.finditer()`**, whose syntax is identical to `re.findall()` but that returns *a sequence of match object instances* as an iterator.

**Match object instances** are the result of applying a regex to a string (if no match is found, `None` is returned). They have several methods and attributes, the most important of which are:

| Method / Attribute |    Purpose  |
|:------------------:|:-------------|
| group()	         | Return the string matched by the regex |
| start()	         | Return the starting position of the match |
| end()              | Return the ending position of the match |
| span()             | Return a tuple containing the (start, end) of the match |
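
A short sketch of the four methods in the table, collecting their values for each match (the variable name `info` is just illustrative):

```python
import re

# one tuple per match: (matched text, start, end, (start, end))
info = [(m.group(), m.start(), m.end(), m.span())
        for m in re.finditer(r"ab+", "bob is the ababboooot")]

for row in info:
    print(row)
# ('ab', 11, 13, (11, 13))
# ('abb', 13, 16, (13, 16))
```

Note that `end()` is the index of the first character *after* the match, so `string[m.start():m.end()]` gives back `m.group()`.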

In [37]:
# matches "a" followed by 1 or more "b"s
matches = re.finditer(r"ab+", "bob is the ababboooot")
for m in matches:
    print (m.group())

ab
abb


In [38]:
# matches one or more "ab" sequences
matches = re.finditer(r"(ab)+", "bob is the ababboooot")
for m in matches:
    print (m.group())

abab


In [39]:
# matches any sequence composed by at least 1 of the following characters "a" or "b"
matches = re.finditer(r"(a|b)+", "bob is the ababboooot")
for m in matches:
    print (m.group())

b
b
ababb


Depending on whether the enclosed sequence of characters should be **memorized** or not, groups can be divided into:


- **Capturing** groups save the enclosed string into a temporary variable whose value can be referred to using the notation `\NUMBER`, where `NUMBER` is the position of the group in the regex. For instance, `\1` denotes the string captured by the leftmost group, `\2` the one captured by the group that follows it, and so forth. Capturing groups are marked by `(` round brackets `)`, that is by the notation we've adopted so far.


- **Non capturing** groups do not memorize the enclosed sequence of atoms. These groups are marked by the `(?: ... )` syntax.


(Capturing groups can also be **named**, a [topic](https://docs.python.org/2/howto/regex.html#non-capturing-and-named-groups) that we leave to the interested student)
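
For the curious, here is a minimal sketch of named groups using the `(?P<name>...)` syntax; the group names `first` and `last` and the sample string are invented for illustration.

```python
import re

m = re.search(r"(?P<first>\w+) (?P<last>\w+)", "Jane Doe")

# named groups can be retrieved by name as well as by number
print(m.group("first"))   # Jane
print(m.group(2))         # Doe
print(m.groupdict() == {"first": "Jane", "last": "Doe"})   # True
```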

In [40]:
# matches sequences composed by: one or more "ab" sequences followed by a "c" plus one "ab" sequence
matches = re.finditer(r"(ab)+c\1", "abcab ababcab ababcabab")
for m in matches:
    print (m.group())

abcab
ababcab
ababcab


In [41]:
# matches sequences composed by: one or more "ab" sequences followed by a "c" plus the same "ab" sequence preceding "c"
matches = re.finditer(r"((?:ab)+)c\1", "abcab ababcab ababcabab")
for m in matches:
    print (m.group())

abcab
abcab
ababcabab


In [42]:
# matches one or more "a"s followed by one or more "b"s  plus the repetition of the preceding pattern reversed
matches = re.finditer(r"(a+)(b+)\2\1", "abba abbaa abbba abbbba aabbaa")
for m in matches:
    print (m.group())

abba
abba
abbbba
aabbaa


Groups info (i.e. start and ending position, matching subgroup) can be retrieved also by passing their index to the `group()`, `start()`, `end()` and `span()` methods, by adopting the following convention:


- **Group 0** is always present; it’s the whole match (i.e. the default argument of any match object method):

In [43]:
matches = re.finditer(r"(a+)(b+)\2\1", "abba abbaa abbba abbbba aabbaa")
for m in matches:
    print (m.group(), m.group(0))

abba abba
abba abba
abbbba abbbba
aabbaa aabbaa


- **Subgroups** are numbered from 1 upward: to determine the number, just count the opening parenthesis characters, going **from left to right**.

In [44]:
matches = re.finditer(r"(a+)(b+)\2\1", "abba abbaa abbba abbbba aabbaa")
for m in matches:
    print (m.group(0, 1, 2))

('abba', 'a', 'b')
('abba', 'a', 'b')
('abbbba', 'a', 'bb')
('aabbaa', 'aa', 'b')


In [45]:
# nested groups: group 1 includes group 2
matches = re.finditer(r"(a+(b+))(c+)\2\1\3", "abcbabc abbcbbabbc aabbcbbaabbc")
for m in matches:
    print (m.group(0, 1, 2, 3))

('abcbabc', 'ab', 'b', 'c')
('abbcbbabbc', 'abb', 'bb', 'c')
('aabbcbbaabbc', 'aabb', 'bb', 'c')


The `.groups()` method returns a tuple containing the strings for all the subgroups, **from 1 upwards** (i.e. there's no 0 position).

In [46]:
matches = re.finditer(r"(ab)+c\1", "abcab ababcab ababcabab")
for m in matches:
    print (m.groups())

('ab',)
('ab',)
('ab',)


## Main uses for Python Regexes:

In [47]:
# the data we'll work with

taxi_services = """Amsterdam (North Holland): 020 677 7777
The Hague (South Holland): 070 383 0830
Rotterdam (South Holland): 010 462 6333
Utrecht (Utrecht): 030 230 0400"""

pattern = r'([\w ]+) \(([\w ]+)\): ([\d ]+)'
regex = re.compile(pattern)

### 1. Match / Find (just one match)

- `re.search(pattern, string)`: look for the first location where the regex pattern produces a match, and return a MatchObject instance (return `None` if there is no match)

- `re.match(pattern, string)`: return a MatchObject instance only if the regex pattern produces a match at the beginning of the string (return `None` otherwise)
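
The difference between the two functions can be seen on a short string; this is only a sketch, with illustrative variable names.

```python
import re

text = "this is a string"

found_anywhere = re.search(r"is", text)   # scans the whole string
found_at_start = re.match(r"is", text)    # only tries position 0
th_at_start = re.match(r"th", text)

print(found_anywhere.span())   # (2, 4)
print(found_at_start)          # None
print(th_at_start.span())      # (0, 2)
```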

In [48]:
# Task 1: Is there a match?
print("*** Is there a Match? ***")
if regex.search(taxi_services):
    print ("Yes")
else:
    print ("No")

*** Is there a Match? ***
Yes


In [49]:
# Task 2: What is the first match?
print("*** First Match ***")
match = regex.search(taxi_services)
if match:
    print("Overall match: ", match.group(0))
    print("Group 1 : ", match.group(1))
    print("Group 2 : ", match.group(2))
    print("Group 3 : ", match.group(3))

*** First Match ***
Overall match:  Amsterdam (North Holland): 020 677 7777
Group 1 :  Amsterdam
Group 2 :  North Holland
Group 3 :  020 677 7777


### 2. Match / Find (multiple matches)

- `re.findall(pattern,string)`: return all non-overlapping matches of pattern in string, as a list of strings (or as a list of tuples, if the pattern contains more than one group)

- `re.finditer(pattern,string)`: return an iterator yielding MatchObject instances

In [50]:
# Task 3: How many matches are there?
print("*** Number of Matches ***")
matches = regex.findall(taxi_services)
print(len(matches))

*** Number of Matches ***
4


In [51]:
# Task 4: What are all the matches?
print("*** All Matches ***\n")
print("------ Method 1: finditer ------\n")
for match in regex.finditer(taxi_services):
    print ("--- Start of Match ---")
    print("Overall match: ", match.group(0))
    print("Group 1 : ", match.group(1))
    print("Group 2 : ", match.group(2))
    print("Group 3 : ", match.group(3))
    print ("--- End of Match---\n")

*** All Matches ***

------ Method 1: finditer ------

--- Start of Match ---
Overall match:  Amsterdam (North Holland): 020 677 7777
Group 1 :  Amsterdam
Group 2 :  North Holland
Group 3 :  020 677 7777
--- End of Match---

--- Start of Match ---
Overall match:  The Hague (South Holland): 070 383 0830
Group 1 :  The Hague
Group 2 :  South Holland
Group 3 :  070 383 0830
--- End of Match---

--- Start of Match ---
Overall match:  Rotterdam (South Holland): 010 462 6333
Group 1 :  Rotterdam
Group 2 :  South Holland
Group 3 :  010 462 6333
--- End of Match---

--- Start of Match ---
Overall match:  Utrecht (Utrecht): 030 230 0400
Group 1 :  Utrecht
Group 2 :  Utrecht
Group 3 :  030 230 0400
--- End of Match---



In [52]:
print("------ Method 2: findall ------\n")
# if there are capture groups, findall doesn't return the overall match
# a simple workaround is to wrap the whole pattern in a group

wrappedregex = re.compile("".join(["(", pattern, ")"]))

for match in wrappedregex.findall(taxi_services):
    print ("--- Start of Match ---")
    print ("Overall Match: ",match[0])
    print ("Group 1: ",match[1])
    print ("Group 2: ",match[2])
    print ("Group 3: ",match[3])
    print ("--- End of Match---\n")

------ Method 2: findall ------

--- Start of Match ---
Overall Match:  Amsterdam (North Holland): 020 677 7777
Group 1:  Amsterdam
Group 2:  North Holland
Group 3:  020 677 7777
--- End of Match---

--- Start of Match ---
Overall Match:  The Hague (South Holland): 070 383 0830
Group 1:  The Hague
Group 2:  South Holland
Group 3:  070 383 0830
--- End of Match---

--- Start of Match ---
Overall Match:  Rotterdam (South Holland): 010 462 6333
Group 1:  Rotterdam
Group 2:  South Holland
Group 3:  010 462 6333
--- End of Match---

--- Start of Match ---
Overall Match:  Utrecht (Utrecht): 030 230 0400
Group 1:  Utrecht
Group 2:  Utrecht
Group 3:  030 230 0400
--- End of Match---



### 3. Search and Replace

- `re.sub(pattern, repl, string)`: return the string obtained by replacing the leftmost non-overlapping occurrences of `pattern` in `string` by the replacement `repl` (If the pattern isn’t found, `string` is returned unchanged)

    * Note that you can use backreferences in `repl`: e.g. `\6` is replaced with the substring matched by group 6 in the pattern

In [53]:
# Task 5: Replace the matches

print("*** Replacements ***")
print("Let's reverse the groups")
print (regex.sub(r'\3: \1 (\2)', taxi_services))

*** Replacements ***
Let's reverse the groups
020 677 7777: Amsterdam (North Holland)
070 383 0830: The Hague (South Holland)
010 462 6333: Rotterdam (South Holland)
030 230 0400: Utrecht (Utrecht)


### 4. Split

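- `re.split(pattern, string, maxsplit=0)`: split `string` at the occurrences of `pattern`. If `maxsplit` is nonzero, at most `maxsplit` splits are performed and the remainder of the string is returned as the final element of the list.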

    * If capturing parentheses are used in `pattern`, then the text of all the groups in the pattern is also returned as part of the resulting list.
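
Both behaviours can be sketched on a toy string (made up for illustration) before moving on to the taxi data:

```python
import re

s = "a1b22c333d"

plain = re.split(r"\d+", s)                # separators are discarded
with_group = re.split(r"(\d+)", s)         # captured separators are kept
limited = re.split(r"\d+", s, maxsplit=2)  # at most two splits

print(plain)        # ['a', 'b', 'c', 'd']
print(with_group)   # ['a', '1', 'b', '22', 'c', '333', 'd']
print(limited)      # ['a', 'b', 'c333d']
```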

In [54]:
# Task 6: Split
# Let's split at colons or newline characters

print("*** Splits ***")

for split in re.split(r"[:\n]", taxi_services):
    print (split)

*** Splits ***
Amsterdam (North Holland)
 020 677 7777
The Hague (South Holland)
 070 383 0830
Rotterdam (South Holland)
 010 462 6333
Utrecht (Utrecht)
 030 230 0400


### Quiz. 

How would you remove the whitespaces at the beginning of each phone number string?

In [55]:
# your code here

---

## Your Turn

### Exercise 1.

Solve the following exercise (from [NLTK CH 3.12](http://www.nltk.org/book/ch03.html#exercises): Exercise 6)

Describe the class of strings matched by the following regular expressions and test them by using any appropriate function from the `re` module:

- `[a-zA-Z]+`


- `[A-Z][a-z]*`


- `p[aeiou]{,2}t`


- `\d+(\.\d+)?`


- `([^aeiou][aeiou][^aeiou])*`


- `\w+|[^\w\s]+`

In [56]:
# your code here

### Exercise 2 (advanced, but helps with assignment 1).

Implement the tokenizer described in [NLTK CH 3.7](http://www.nltk.org/book/ch03.html#regular-expressions-for-tokenizing-text) by using the appropriate method from the `re` module.


- Test it on the raw text of the Wall Street Journal corpus, available through the `nltk.corpus.treebank_raw.raw()` method.


- Explain how this tokenizer works by decomposing the regex pattern into smaller units (e.g. `([A-Z]\.)+`) and reporting WSJ examples of character sequences caught by each subpattern.


- List at least two linguistic phenomena that this tokenizer cannot handle properly.

In [57]:
# your code here

### Exercise 3 (advanced, see J&M, ch. 2).

Let me introduce you to [ELIZA](https://en.wikipedia.org/wiki/ELIZA), a (simulation of a) Rogerian psychotherapist.

ELIZA is implemented by using a pattern matching and substitution methodology that gives users an illusion of understanding. It is a simple program that recognizes phrases like "You are X" and outputs questions or requests of the form "What makes you think I am X?".

Can you write a program that uses the `raw_input()` function (`input()` in Python 3) and regular expressions to allow a user to recreate the following psychotherapy session?

```
User: Men are all alike.

ELIZA: IN WHAT WAY

User: They're always bugging us about something or other.

ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE

User: Well, my boyfriend made me come here.

ELIZA: YOUR BOYFRIEND MADE YOU COME HERE

User: He says I'm depressed much of the time.

ELIZA: I AM SORRY TO HEAR YOU ARE DEPRESSED

User: You are like my father in some ways.

ELIZA: WHAT RESEMBLANCE DO YOU SEE

User: You are not very aggressive but I think you don't want me to notice that.

ELIZA: WHAT MAKES YOU THINK I AM NOT VERY AGGRESSIVE

User: You don't argue with me.

ELIZA: WHY DO YOU THINK I DON'T ARGUE WITH YOU

User: You are afraid of me.

ELIZA: DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU
```

In [58]:
# your code here

---