# ELIZA-like chatbot

This chatbot is based on **Riccardo Di Maio**'s implementation of the ELIZA chatbot. <br>
See: https://github.com/rdimaio/eliza-py

ELIZA itself was developed by Joseph Weizenbaum from 1964 to 1966.
The original paper can be downloaded from:
https://dl.acm.org/doi/10.1145/365153.365168

In the introduction Weizenbaum writes:

*«It is said that to explain is to explain away. This maxim is nowhere so well fulfilled as in the area of computer programming, especially in what is called heuristic programming and artificial intelligence. For in those realms machines are made to behave in wondrous ways, often sufficient to dazzle even the most experienced observer. But once a particular program is unmasked, once its inner workings are explained in language sufficiently plain to induce understanding, its magic crumbles away; it stands revealed as a mere collection of procedures, each quite comprehensible. The observer says to himself "I could have written that". With that thought he moves the program in question from the shelf marked "intelligent", to that reserved for curios, fit to be discussed only with people less enlightened than he.»* 

**By writing your own chatbot, you can best understand this statement. The script-based implementation of a chatbot shows that the dialog is based on tricks that fake a conversation.**

One of the fundamental questions surrounding chatbots is at what point the fake tips over and you believe you are communicating with a real person. With ELIZA the fake is obvious, but what about deep learning solutions based on GPT-3? Is there still a point in knowing how the code works?

For example, the 2013 film "Her" addresses this question of how deep the relationship we build with a piece of software can be.

https://www.google.com/search?q=youtube+her+film&rlz=1C5CHFA_enES893ES893&oq=youtube+her+film&aqs=chrome..69i57j0i22i30l9.6654j0j7&sourceid=chrome&ie=UTF-8#fpstate=ive&vld=cid:903bf2d1,vid:3fJd4DGjLBs


# How does ELIZA work?

*"ELIZA performs best when its human correspondent is initially instructed to "talk" to it, via the typewriter of course, just as one would to a psychiatrist. This mode of conversation was chosen because the psychiatric interview is one of the few examples of categorized dyadic natural language communication in which one of the participating pair is free to assume the pose of knowing almost nothing of the real world. If, for example, one were to tell a psychiatrist "I went for a long boat ride" and he responded "Tell me about boats", one would not assume that he knew nothing about boats, but that he had some purpose in so directing the subsequent conversation."* (Weizenbaum 1966, 42)

## Rule-based Chatbot
The user's input is decomposed and then reassembled. The decomposition starts from a list of ranked keywords. The keyword with the highest rank controls the decomposition and reassembly process.

<span style="font-family:Courier;">

**function** ELIZA(user sentence) **returns** response 
- Read in the scripts
- Parse User input   
- **while** user_input is not in exit_inputs
    - Find the word w in sentence that has the highest keyword rank
    - **if** w exists <br>
        - Choose the highest ranked rule r for w that matches sentence
        - response = Apply the transform in r to sentence
        - **if** w = ‘my’ <br>
            future = apply a transformation from the ‘memory’ rule list to sentence.
            Push future onto memory queue.
    - **else** (no keyword applies) 
        - **either** 
            response: Pop the oldest response from the memory queue
        - **or** 
            response: Apply the transform for the NONE keyword to sentence 
- **return**(*response*)

</span>
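
The pseudocode above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used further down: the `RULES` table, its ranks, patterns and templates, and the `respond` function are all made up for this example, and the memory queue is left out.

```python
import re

# Hypothetical mini-script: keyword -> (rank, decomposition regex, reassembly template)
RULES = {
    "remember": (5, r"(.*)\bremember\b\s*(.*)", "Why do you remember {1} now?"),
    "my": (2, r"(.*)\bmy\b\s*(.*)", "Your {1}?"),
}
FALLBACK = "Please go on."  # stands in for the NONE keyword


def respond(sentence):
    words = re.findall(r"[a-z']+", sentence.lower())
    # Keywords found in the sentence, highest rank first
    candidates = sorted(
        (w for w in set(words) if w in RULES),
        key=lambda w: RULES[w][0],
        reverse=True,
    )
    for keyword in candidates:
        _, pattern, template = RULES[keyword]
        m = re.match(pattern, sentence, re.IGNORECASE)
        if m:
            # Decompose into components, then reassemble using the template
            parts = [g.strip() for g in m.groups()]
            return template.format(*parts)
    # No keyword applies
    return FALLBACK


print(respond("I remember the sea"))
print(respond("my dog is sick"))
print(respond("hello there"))
```

The sketch uses `str.format` placeholders for brevity; the notebook code below instead uses 1-indexed numbers in the reassembly rule, following Weizenbaum's notation.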


## Regular expressions
Regular expression matching operations are very powerful. <br>
Have a look at regular-expression.pdf.
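
As a small warm-up, the following sketch shows the three regex features that the code below relies on: word boundaries (`\b`), alternation (`x|y`) and capturing groups. The example sentences are invented for illustration.

```python
import re

# \b marks a word boundary, so "am" is not found inside "family"
inside = re.search(r"\bam\b", "my family")
whole = re.search(r"\bam\b", "i am tired")

# Alternation matches any one of several whole words
tag = re.search(r"\b(mother|father)\b", "my father called")

# Capturing groups split the sentence into the parts around a keyword
m = re.match(r"(.*)\bi am\b\s*(.*)", "sometimes i am tired")
before, after = m.groups()

print(inside, whole.group(0), tag.group(1), repr(before), repr(after))
```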


In [1]:
# Some string manipulation to turn the ELIZA notation into regular expressions
import re

def processDecompRules(script, tags):
    # Cycle through each dict in the JSON script
    for d in script:
        # Cycle through all the rules in each dict
        for rule in d['rules']:
            # Convert decomposition rule from Weizenbaum notation to regex
            rule['decomp'] = decompToRegex(rule['decomp'], tags) 
    return script

def decompToRegex(in_str, tags):
    out_str = ''

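    # Strip the parentheses of the Weizenbaum notation
    in_str = re.sub('[()]', '', in_str)

    # Split into whitespace-separated tokens; iterating over the raw
    # string would process it one character at a time
    for w in in_str.split():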
        w = regexify(w, tags)
        # Parentheses are needed to properly divide sentence into components
        # \s* matches zero or more whitespace characters 
        out_str += '(' + w + r')\s*' 
    return out_str

def regexify(w, tags):
    # 0 means "an indefinite number of words"
    if w == '0': 
        w = '.*'
    # A positive non-zero integer means "this specific amount of words"
    elif w.isnumeric() and int(w) > 0:
        w = r'(?:\b\w+\b[\s\r\n]*){' + w + '}'
    # A word starting with @ signifies a tag
    elif w[0] == "@":
        # Get tag name
        tag_name = w[1:].lower()
        if tag_name in tags:
            # Make a regex separating each option with OR operator (e.g. x|y|z)
            w = r'\b(' + '|'.join(tags[tag_name]) + r')\b'
    else:
        # Add word boundaries to match on a whole word basis
        w = r'\b' + w + r'\b'
    return w
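
To see what such a conversion produces, here is a compact, self-contained restatement of the same idea. It is a sketch under assumptions: the decomposition string is taken to be a sequence of whitespace-separated tokens, and the function name `to_regex`, the `family` tag list and the example sentence are hypothetical.

```python
import re


def to_regex(decomp, tags):
    """Compact restatement of the conversion, assuming whitespace-separated tokens."""
    parts = []
    for token in re.sub(r"[()]", "", decomp).split():
        if token == "0":
            # 0 means "any number of words"
            body = ".*"
        elif token.isnumeric() and int(token) > 0:
            # n means "exactly n words"
            body = r"(?:\b\w+\b[\s\r\n]*){" + token + "}"
        elif token.startswith("@") and token[1:].lower() in tags:
            # @tag expands to an alternation of the tagged words
            body = r"\b(" + "|".join(tags[token[1:].lower()]) + r")\b"
        else:
            body = r"\b" + token + r"\b"
        parts.append("(" + body + r")\s*")
    return "".join(parts)


tags = {"family": ["mother", "father", "sister"]}  # hypothetical tag list
regex = to_regex("(0 my 0 @family 0)", tags)
m = re.match(regex, "perhaps my dear mother knows", re.IGNORECASE)
print(regex)
print(m.groups())
```

Note that the tag alternation is itself a capturing group nested inside the component's group, so the matched tag word appears twice in `m.groups()`.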


In [2]:
# Utility functions for the main loop in the next cell
import re

# Replace words in the sentence according to the substitutions dictionary (from general.json)
def substitute(sentence, substitutions):
    result = ''
    # Go through all words of the sentence
    for word in sentence.split():
        # If substitutions specifies a replacement for this word, use it
        if word in substitutions:
            result += substitutions[word] + ' '
        # Otherwise leave the word as it is
        else:
            result += word + ' '
    return result

def determineRanks(keywords, num_keywords, script):
    ranks = []
    
    # calculate list of ranks
    for keyword in keywords:
        for d in script:
            if d['keyword'] == keyword:
                ranks.append(d['rank'])
                num_keywords += 1
                break
        else:
            ranks.append(0)
    return ranks, num_keywords

def findMostImportantSentenceAndItsKeywords(sentences, substitutions, script):
    keywords = []
    ranks = []
    maxima = []
    all_keywords = []
    all_ranks = []
    sentences = re.split(r'[.,!?](?!$)', sentences)
    num_keywords = 0
    
    kept_sentences = []

    for i in range(0, len(sentences)):
        sentences[i] = re.sub(r'[#$%&()*+,-./:;<=>?@[\]^_{|}~]', '', sentences[i])
        sentences[i] = substitute(sentences[i], substitutions)
        if sentences[i]:
            # Track non-empty sentences separately so that their positions
            # stay aligned with maxima, all_keywords and all_ranks
            kept_sentences.append(sentences[i])
            keywords = sentences[i].split()
            all_keywords.append(keywords)
            ranks, num_keywords = determineRanks(keywords, num_keywords, script)
            maxima.append(max(ranks))
            all_ranks.append(ranks)
    # Return earliest sentence with highest keyword rank
    max_rank = max(maxima)
    max_index = maxima.index(max_rank)
    keywords = all_keywords[max_index]
    ranks = all_ranks[max_index]

    # Sort list of keywords according to list of ranks
    sorted_keywords = [x for _,x in sorted(zip(ranks, keywords), reverse=True)]
    return kept_sentences[max_index], sorted_keywords

def decomposeSentence(keyword, sentence, script):
    comps = []
    reassembly_rule = ''
    
    # Cycle through elements in script
    for d in script: 
        if d['keyword'] == keyword:
            # Cycle through decomp rules for that keyword
            for rule in d['rules']:
                m = re.match(rule['decomp'], sentence, re.IGNORECASE)
                # If decomposition rule matches
                if m:
                    # Decompose string according to decomposition rule
                    comps = list(m.groups())
                    reassembly_rule = rule['reassembly'][rule['last_used_reassembly_rule']]
                    # Update last used reassembly rule ID
                    next_id = rule['last_used_reassembly_rule']+1
                    # If all reassembly rules have been used, start over
                    if next_id >= len(rule['reassembly']):
                        next_id = 0
                    rule['last_used_reassembly_rule'] = next_id
                    break
            break
    return comps, reassembly_rule

def reassembleResponse(components, reassembly_rule):
    response = ''
    reassembly_rule = reassembly_rule.split() 
    
    for comp in reassembly_rule:
        # If comp is a number, then place the component at that index
        if comp.isnumeric():
            # int(comp)-1 due to the fact that 
            # reassembly rules in Weizenbaum notation are 1-indexed
            response += components[int(comp)-1] + ' '
        # Otherwise, place the word itself
        else:
            response += comp + ' '

    # Remove trailing space
    response = response[:-1]
    return response

def generateMemoryResponse(sentence, script, memory_stack):
    # '^' is the memory stack keyword
    mem_comps, mem_reassembly_rule = decomposeSentence('^', sentence, script)
    mem_response = reassembleResponse(mem_comps, mem_reassembly_rule)
    memory_stack.append(mem_response)
    
def generateGenericResponse(script): 
    comps, reassembly_rule = decomposeSentence('$', '$', script)
    return reassembleResponse(comps, reassembly_rule)
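
The functions above read a few specific keys from each script entry: `keyword`, `rank`, `rules`, and per rule `decomp`, `reassembly` and `last_used_reassembly_rule`. The sketch below shows what such an entry might look like and how the reassembly rules are cycled. The concrete entry and the helper `next_reassembly` are hypothetical and only inferred from the keys used in the code, not copied from `doctor.json`.

```python
# Hypothetical script entry with the keys the functions above read
entry = {
    "keyword": "sorry",
    "rank": 1,
    "rules": [
        {
            # Decomposition rule, already converted to a regex
            "decomp": r"(.*)\s*(\bsorry\b)\s*(.*)\s*",
            "reassembly": [
                "Please don't apologise.",
                "Apologies are not necessary.",
            ],
            "last_used_reassembly_rule": 0,
        }
    ],
}


def next_reassembly(rule):
    """Return the current reassembly template and advance the round-robin index."""
    index = rule["last_used_reassembly_rule"]
    template = rule["reassembly"][index]
    # Wrap around once all reassembly rules have been used
    rule["last_used_reassembly_rule"] = (index + 1) % len(rule["reassembly"])
    return template


rule = entry["rules"][0]
replies = [next_reassembly(rule) for _ in range(3)]
print(replies)
```

Cycling through the templates keeps the bot from giving the same answer every time the same keyword comes up.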


In [4]:
import json
import random
memory_stack = []

# read the scripts, that define the course of the conversation
f1 = 'Skripte/general.json'
f2 = 'Skripte/doctor.json'

# Context managers make sure the files are closed after reading
with open(f1, 'r') as file1:
    general_script = json.load(file1)
#print(type(general_script))

with open(f2, 'r') as file2:
    dialog_script = json.load(file2)
#print(type(dialog_script))

# process decomposition rules in custom script
dialog_script = processDecompRules(dialog_script, general_script['tags'])

exit_inputs = general_script['exit_inputs']
memory_inputs = general_script['memory_inputs']
substitutions = general_script['substitutions']
tags = general_script['tags']

# read and clean user input
user_input = str(input("Eliza: Welcome.\nYou: "))
user_input = user_input.lower()
while not (any(c.isalpha() for c in user_input)):
    user_input = str(input("Please use letters and write a sentence.")).lower()

# The main loop only breaks when the user types one of the exit_inputs.
# The input is split into sentences at '.', ',', '!' or '?'.
# Only the sentence with the highest ranked keyword is taken and processed.

while user_input not in exit_inputs:
    sentence, keywords = findMostImportantSentenceAndItsKeywords(user_input, substitutions, dialog_script) 
    for keyword in keywords:
        comps, reassembly_rule = decomposeSentence(keyword, sentence, dialog_script)
        # Break if matching decomposition rule has been found
        if comps:
            response = reassembleResponse(comps, reassembly_rule)
            # For certain keywords, generate an additional response to push onto memory stack
            if keyword in memory_inputs:
                generateMemoryResponse(sentence, dialog_script, memory_stack)
            break
    # We only enter the else condition if no decomposition rule has been found
    else:
        # Try to pop an answer from memory stack
        if memory_stack:
            response = memory_stack.pop()
        # Otherwise, respond with a generic answer
        else:
            response = generateGenericResponse(dialog_script)

    # Remove extra whitespaces
    response = ' '.join(str(response).split())
    # Remove whitespaces before punctuation
    response = re.sub(r'\s([?.!"](?:\s|$))', r'\1', response)
    response = 'Eliza: '+ response    
    user_input = str(input(response+"\nYou: "))
    user_input = user_input.lower()
    while not (any(c.isalpha() for c in user_input)):  
        user_input = str(input("Please use letters and write a sentence.")).lower()

print("Eliza: Goodbye.\n")


Eliza: Welcome.
You:  hi
Eliza: Please go on.
You:  how are you?
Eliza: Why do you ask?
You:  because I feel a bit lost today.
Eliza: You say because you feel a bit lost today.
You:  bye


Eliza: Goodbye.

