Slackbot with bAbI implementation
==================

###### In class we built a model that scored well on the bAbI single-supporting-fact task, but we had no way to use it interactively
---
We wanted to test our model on input it had never seen, **and see its actual output**

### Our vision:
+ Use our model with our own human input
+ See the actual predictions of the model, instead of just its score
+ Create a bot that correctly predicts the answer to our questions

### Challenges:
+ Complexity in saving and loading models and their weights
+ Understanding how the input varies when using a chatbot
+ Successfully getting the bot to output the answer back into the chat

![Where is Nick](http://imgur.com/uqbn3Oy)

![Robot](http://imgur.com/2YqeIwq)

### Approach
+ Finalize the model and save it
+ Load the model into another Jupyter notebook
+ Design an input method for the user to write stories
+ Redesign the parsing functions to take chat input and split it into stories / questions
+ Have the model predict the answer and post it in Slack

## Methods:
+ Data: Facebook bAbI EN-10k dataset (used for training)
+ Model: Recurrent Neural Network
+ End-to-end pipeline: Train model -> Load model in Slackbot -> User types story and question into chat -> Bot posts answer back into chat

## How we measure success:
+ Test score against testing data
+ Successfully setting up a chatbot in Slack
+ Successfully having the chatbot give the right answer to our stories / questions

### How did we do?
+ Test score: 100%
+ Slackbot set up: @best_friend, @hopeful_friend
+ In the end = **@no_friends**

![SlackBot](http://imgur.com/a/gXftR)

In [1]:
#Imports
import os
from slackclient import SlackClient
import re
import tarfile
import functools
import numpy as np
from keras.utils.data_utils import get_file
from keras.layers.embeddings import Embedding
from keras import layers
from keras.layers import recurrent
from keras.models import Model
from keras.models import load_model
from keras.preprocessing.sequence import pad_sequences
import pickle
import time

Using TensorFlow backend.


In [2]:
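#Establishing bot name and API token
BOT_NAME = 'hopeful_friend'
#Read the token from the environment rather than hard-coding a secret in the notebook
#(the variable name SLACK_BOT_TOKEN is our choice; export it before starting Jupyter)
slack_client = SlackClient(os.environ.get('SLACK_BOT_TOKEN'))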

#Bot's ID
BOT_ID = 'U5WB3LDLY'
AT_BOT = "<@" + BOT_ID + ">"

In [3]:
#Connect and find our bot
api_call = slack_client.api_call("users.list")
if api_call.get('ok'):
    users = api_call.get('members')
    #loop through to find our bot
    for user in users:
        if 'name' in user and user.get('name') == BOT_NAME:
            print("Bot ID for '" + user['name']
                  + "' is " + user.get('id'))
            break
    else:
        #for-else: runs only when the loop ends without finding the bot
        print("could not find bot user with the name " + BOT_NAME)

In [46]:
#Loading our trained bAbI model
def _load_db():
    bAbI_model = load_model("TrainedModel2.h5")
    #vocab, story_maxlen and q_maxlen were pickled together as one tuple
    with open("vocab_save.p", "rb") as f:
        saved = pickle.load(f)
    return bAbI_model, saved

model, saved = _load_db()
vocab = saved[0]
story_maxlen = saved[1]
q_maxlen = saved[2]
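
The loading cell above expects two files written at training time. Below is a minimal sketch of the vocabulary half, assuming the training notebook holds `vocab`, `story_maxlen` and `q_maxlen` under those names. `save_vocab` and `load_vocab` are hypothetical helpers, not functions from our notebooks. The model itself would be written separately, for example with Keras' `model.save("TrainedModel2.h5")`.

```python
import os
import pickle
import tempfile


def save_vocab(path, vocab, story_maxlen, q_maxlen):
    """Pickle the three values the bot needs as one tuple (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump((vocab, story_maxlen, q_maxlen), f)


def load_vocab(path):
    """Read the tuple back in the same order it was saved."""
    with open(path, "rb") as f:
        vocab, story_maxlen, q_maxlen = pickle.load(f)
    return vocab, story_maxlen, q_maxlen


if __name__ == "__main__":
    # Illustrative values only; the real ones come from the training notebook.
    demo_vocab = ['.', '?', 'Mary', 'Where', 'is', 'kitchen']
    path = os.path.join(tempfile.mkdtemp(), "vocab_save.p")
    save_vocab(path, demo_vocab, 12, 4)
    print(load_vocab(path))
```

Saving all three values in one tuple keeps the vocabulary and the padding lengths in sync with the weights they were trained with.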

In [5]:
def tokenize(sent):
    '''Return the tokens of a sentence including punctuation.
    >>> tokenize('Bob dropped the apple. Where is the apple?')
    ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']
    '''
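    # r'(\W+)' keeps the punctuation as tokens; the old '(\W+)?' pattern can match
    # the empty string, which triggers a warning and breaks on newer Python versions
    return [x.strip() for x in re.split(r'(\W+)', sent) if x.strip()]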

In [29]:
#need to read from a single line of Slack input, not from a file
#input: sentences separated by '//', with the question last
# "Mary in kitchen.//Bob in garden.//Where Mary?".split('//') -> ["Mary in kitchen.", "Bob in garden.", "Where Mary?"]
# parse_stories(...) -> [(["Mary", "in", "kitchen", ".", "Bob", "in", "garden", "."], ["Where", "Mary", "?"])]

def parse_stories(lines, only_supporting=False):
    '''Parse one '//'-separated chat message into a (story, question) pair.
    Every segment except the last is a story sentence; the last is the question.
    only_supporting is kept from the original bAbI loader and is not used here.
    '''
    story = []

    story_lines = lines.split('//')
    #all segments but the last form the story, however many sentences there are
    for x in story_lines[:-1]:
        story.extend(tokenize(x))

    question = tokenize(story_lines[-1])

    return [(story, question)]
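
A message typed into Slack can easily be malformed (no `//`, no question at the end). The sketch below shows one way the input could be validated before it reaches the model. `split_message` is a hypothetical helper and the error messages are our own wording; it assumes the same `//` separator and question-last convention described in the comments above.

```python
import re


def tokenize(sent):
    """Split a sentence into word and punctuation tokens."""
    return [x.strip() for x in re.split(r'(\W+)', sent) if x.strip()]


def split_message(message, sep='//'):
    """Return (story_tokens, question_tokens) from one chat message.

    Hypothetical helper: every non-empty segment but the last is story text,
    the last one is the question. Raises ValueError on malformed input.
    """
    segments = [s.strip() for s in message.split(sep) if s.strip()]
    if len(segments) < 2:
        raise ValueError("need at least one story sentence and a question")
    if not segments[-1].endswith('?'):
        raise ValueError("the last segment should be a question ending in '?'")
    story = []
    for sentence in segments[:-1]:
        story.extend(tokenize(sentence))
    return story, tokenize(segments[-1])


if __name__ == "__main__":
    print(split_message("Mary went to the kitchen. // Where is Mary?"))
```

Raising a `ValueError` gives the bot something concrete to report back to the user instead of failing with an `IndexError` deep inside the parser.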

In [30]:
def get_stories(f, only_supporting=False, max_length=None):
    '''Given one chat message, parse it into a (story, question) pair.
    If max_length is supplied,
    any story with max_length tokens or more will be discarded.
    '''
    data = parse_stories(f, only_supporting=only_supporting)
    #story is already a flat list of tokens here, so no flatten step is needed
    data = [(story, q) for story, q in data if not max_length or len(story) < max_length]
    return data

In [31]:
def vectorize_stories(data, word_idx, story_maxlen, query_maxlen):
    xs = []
    xqs = []
    for story, query in data:
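        for w in story:
            print(w)  # debug: echo each story token
        # word_idx[w] raises KeyError for any word that was not in the training vocab
        x = [word_idx[w] for w in story]
        xq = [word_idx[w] for w in query]
        # index 0 is reserved for padding, so real word indices start at 1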
        xs.append(x)
        xqs.append(xq)
    return pad_sequences(xs, maxlen=story_maxlen), pad_sequences(xqs, maxlen=query_maxlen)
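
Because the vocabulary is tiny, human input will often contain words the model has never seen, and the dictionary lookup in `vectorize_stories` then fails with a `KeyError`. The sketch below shows one possible guard plus a pure-Python imitation of what `pad_sequences` does with its default settings (pad and truncate at the front). `find_unknown_words` and `encode` are hypothetical helpers and the small `word_idx` is made up for illustration.

```python
def find_unknown_words(tokens, word_idx):
    """Return the tokens missing from the vocabulary, in order, without repeats."""
    unknown = []
    for w in tokens:
        if w not in word_idx and w not in unknown:
            unknown.append(w)
    return unknown


def encode(tokens, word_idx, maxlen):
    """Map tokens to indices and left-pad with 0 up to maxlen (maxlen > 0).

    Imitates the default behavior of pad_sequences for a single sequence.
    """
    ids = [word_idx[w] for w in tokens]
    ids = ids[-maxlen:]  # if too long, keep the last maxlen tokens
    return [0] * (maxlen - len(ids)) + ids


if __name__ == "__main__":
    # Made-up vocabulary for illustration only.
    word_idx = {'.': 1, '?': 2, 'Mary': 3, 'Where': 4, 'is': 5}
    print(find_unknown_words(['where', 'is', 'mary', '?'], word_idx))
    print(encode(['Where', 'is', 'Mary', '?'], word_idx, 6))
```

Note that the lookup is case-sensitive: `'mary'` and `'Mary'` are different keys, so lowercased chat text will not match a vocabulary with capitalized names.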

In [32]:
# Reserve 0 for masking via pad_sequences
word_idx = dict((c, i + 1) for i, c in enumerate(vocab))
print(type(word_idx))

<class 'dict'>
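
Since `word_idx` starts at 1 and `vocab` is a zero-based list, turning a prediction back into a word needs an off-by-one correction (the notebook does this with `np.argmax(...) - 1`). As a sketch of the same idea without NumPy, an inverse dictionary makes the mapping explicit. `build_indexes` and `decode_answer` are hypothetical helpers and the scores are invented.

```python
def build_indexes(vocab):
    """Return (word -> index, index -> word); index 0 stays free for padding."""
    word_idx = {w: i + 1 for i, w in enumerate(vocab)}
    idx_word = {i: w for w, i in word_idx.items()}
    return word_idx, idx_word


def decode_answer(scores, idx_word):
    """Return the word whose slot has the highest score.

    scores has len(vocab) + 1 entries because slot 0 is the padding index;
    None is returned if the padding slot wins.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    return idx_word.get(best)


if __name__ == "__main__":
    vocab = ['.', '?', 'kitchen', 'garden']
    word_idx, idx_word = build_indexes(vocab)
    # Invented scores: slot 3 is the highest, which maps to 'kitchen'.
    print(decode_answer([0.0, 0.01, 0.01, 0.9, 0.08], idx_word))
```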


In [47]:
def handle_command(x, xq):
    """
    Runs the model on a vectorized story and question and
    returns the predicted answer word.
    """
    response = model.predict([x, xq])
    #word_idx starts at 1 (0 is the padding index), so shift back by one to index vocab
    response = np.argmax(response) - 1
    response = vocab[response]
    print(vocab)  #debug: show the vocabulary
    return response

#Not wired up yet: posting the answer back to the channel
#slack_client.api_call("chat.postMessage", channel=channel,
#                      text=response, as_user=True)

def parse_slack_output(slack_rtm_output):
    """
        The Slack Real Time Messaging API is an events firehose.
        this parsing function returns None unless a message is
        directed at the Bot, based on its ID.
    """
    output_list = slack_rtm_output
    if output_list and len(output_list) > 0:
        for output in output_list:
            if output and 'text' in output and AT_BOT in output['text']:
                # return text after the @ mention, whitespace removed
                # (case is kept: the vocab has capitalized words like 'Mary' and 'Where')
                return output['text'].split(AT_BOT)[1].strip(), \
                       output['channel']
    return None, None
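
To see what this parsing step does without a live Slack connection, the same logic can be run on hand-written event dictionaries. This is only a sketch: `extract_mention` is a hypothetical stand-in for `parse_slack_output` that takes the mention string as a parameter, and the events and channel ID below are invented, shaped like the `text` / `channel` fields the function reads.

```python
def extract_mention(events, at_bot):
    """Return (text_after_mention, channel) for the first event mentioning the bot."""
    for event in events or []:
        text = event.get('text') if event else None
        if text and at_bot in text:
            return text.split(at_bot)[1].strip(), event.get('channel')
    return None, None


if __name__ == "__main__":
    at_bot = "<@U5WB3LDLY>"
    # Invented events for illustration; real ones come from rtm_read().
    events = [
        {'type': 'presence_change', 'user': 'U1'},
        {'type': 'message', 'channel': 'C123',
         'text': at_bot + ' Mary went to the kitchen. // Where is Mary?'},
    ]
    print(extract_mention(events, at_bot))
```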


In [48]:
READ_WEBSOCKET_DELAY = 1
if slack_client.rtm_connect():
    print("StarterBot connected and running!")
    while True:
        command, channel = parse_slack_output(slack_client.rtm_read())
        if command and channel:
            
            command = get_stories(command)
            #q_maxlen is the name loaded from the pickle (query_maxlen was never defined)
            x, xq = vectorize_stories(command, word_idx, story_maxlen, q_maxlen)

            answer = handle_command(x, xq)
            print(answer)
            
        time.sleep(READ_WEBSOCKET_DELAY)
else:
    print("Connection failed. Invalid slack token or bot ID")

Connection failed. Invalid slack token or bot ID
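
The step we did not finish is sending the predicted word back to the channel. A minimal sketch of how that could look, using the same `chat.postMessage` call that is commented out in the notebook. `post_answer` is a hypothetical helper, and `FakeSlackClient` is an invented stand-in so the sketch runs without a token or network access; with a working connection, the real `slack_client` would be passed in instead.

```python
class FakeSlackClient:
    """Stand-in for SlackClient that records calls instead of using the network."""

    def __init__(self):
        self.calls = []

    def api_call(self, method, **kwargs):
        self.calls.append((method, kwargs))
        return {'ok': True}


def post_answer(client, channel, answer):
    """Send the predicted word back to the channel the question came from."""
    return client.api_call("chat.postMessage", channel=channel,
                           text=answer, as_user=True)


if __name__ == "__main__":
    client = FakeSlackClient()
    post_answer(client, 'C123', 'kitchen')  # 'C123' is an invented channel ID
    print(client.calls)
```

In the main loop this would be called with the `channel` returned by `parse_slack_output` and the word returned by `handle_command`.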


In [51]:
test_str = "John went to the hallway. // Sandra journeyed to the kitchen. // Where is Sandra?"
test_str = get_stories(test_str)
print(test_str)

x, xq = vectorize_stories(test_str, word_idx, story_maxlen, q_maxlen)
ans = handle_command(x, xq)
print("-------")
print(ans)

[(['John', 'went', 'to', 'the', 'hallway', '.', 'Sandra', 'journeyed', 'to', 'the', 'kitchen', '.'], ['Where', 'is', 'Sandra', '?'])]
John
went
to
the
hallway
.
Sandra
journeyed
to
the
kitchen
.
['.', '?', 'Daniel', 'John', 'Mary', 'Sandra', 'Where', 'back', 'bathroom', 'bedroom', 'garden', 'hallway', 'is', 'journeyed', 'kitchen', 'moved', 'office', 'the', 'to', 'travelled', 'went']
-------
kitchen



## Conclusion:
#### What we learned:
+ How to train a model using labeled datasets
+ What features of the model influence how well it does against unseen data
+ How to get comfortable working with different APIs
+ How to create a bot on Slack using its API
+ How to integrate a trained model into a chatbot
+ A solid understanding of the functions used in creating a bAbI NLP model

#### Room for improvement:
+ **LOTS!**
+ Get the chatbot to actually reply to the Slack user
+ Get comfortable running the bot and keeping it active
+ Build specialized bots that would be more useful in Slack channels

# A big thank you to Raymond & Henry for a great class and a great 4 weeks in Cape Town! :)