# JamesBot

**Language understanding module (LU):**
- inputs:
    - sequence of token embeddings
- outputs:
    - final encoder state
    - sequence of token representations (bi-rnn encoded)

**Informable slots recognizer:**
- inputs:
    - current utterance LU outputs
- outputs:
    - for each token, a probability distribution over the slots it may constrain.
    - for each slot, the probability that the utterance puts a constraint on it.
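
A minimal NumPy sketch of how the two outputs can relate. It assumes, as the model code later in this notebook does, that the token-level output is a softmax over slots and that the utterance-level indicator is the sum of those per-token distributions. The logits, the three-slot inventory, treating slot 0 as "no slot", and the `softmax` helper are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical logits for one utterance of 4 tokens over 3 slots
# (slot 0 is assumed to mean "no slot").
token_logits = np.array([
    [4.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0],
])

token_slot_p = softmax(token_logits)            # [n_tokens, n_slots], rows sum to 1
utterance_slot_score = token_slot_p.sum(axis=0)  # [n_slots], an unnormalised score

print(token_slot_p.argmax(axis=1))  # most likely slot per token
print(utterance_slot_score.round(2))
```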
    
**Requestable slots recognizer:**
- inputs:
    - current utterance LU outputs
- outputs:
    - probability distribution that a user requests information about a slot.
    
**Frame tracker:**
- inputs:
    - LU states
    - informable slots constraints for every utterance
    - requestable slots for every utterance
- outputs:
    - intent vector (compare, inform, request, offer, confirm, book ...)
    - slot constraints vector
    - frame reference vector
    
- for every act, the probability that it references a given utterance
    
**Belief tracker:**
- inputs:
    - current utterance LU outputs
    - previous utterance LU outputs
    - informable slots constraints p for current utterance
    - requestable slots p for current utterance
- outputs:
    - intent vector (inform, request, offer, confirm, book ...)
    - updated slot constraints vector (probability that the user has expressed a constraint on the slot during the dialogue)
- state:
    - slot constraints vector

**Query engine:**
- inputs:
    - slot constraints vector
    - informable slots recognizer p
- outputs:
    - updated KB results pointer
- state:
    - KB results pointer (probability distribution over results) -> argmax is the current one

**Agent policy:**
- inputs:
    - current utterance bi-rnn outputs/state
    - intent vector
    - requestable slots recognizer p
    - slot constraints vector
    - KB results pointer
- outputs:
    - dialog state vector
    
**Language generation module (LG):**
- inputs:
    - LU outputs/state
    - dialog state vector
- outputs:
    - sequence of probability distributions over tokens/placeholders in dictionary

## Preprocessing

In [1]:
import pandas as pd
import numpy as np
import nltk
import json
import time

In [1471]:
with open('../data/processed/frames/v3/turns_train.json') as f:
    turns_train = json.load(f)
with open('../data/processed/frames/v3/turns_test.json') as f:
    turns_test = json.load(f)
with open('../data/processed/frames/v3/glove_vectors.json') as f:
    embeddings = np.array(json.load(f))
with open('../data/processed/frames/v3/glove_dictionary.json') as f:
    glove_dictionary = json.load(f)
with open('../data/processed/frames/v3/slots_dictionary.json') as f:
    slots_dictionary = json.load(f)
with open('../data/processed/frames/v3/agent_actions_dictionary.json') as f:
    agent_actions_dictionary = json.load(f)
with open('../data/processed/frames/v3/agent_sub_actions_dictionary.json') as f:
    agent_sub_actions_dictionary = json.load(f)

In [1346]:
slots_index = {val: key for (key, val) in slots_dictionary.items()}
glove_index = {val: key for (key, val) in glove_dictionary.items()}
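
The inverted dictionaries map ids back to their keys, which is handy for turning model output into readable tokens. A minimal sketch with a made-up four-token vocabulary: `toy_dictionary`, its token names, and the `decode` helper are illustrative assumptions, not part of the dataset files. It also assumes the dictionary values are unique ids; otherwise the inversion silently drops entries.

```python
# Hypothetical token -> id dictionary, shaped like glove_dictionary.
toy_dictionary = {'<pad>': 0, '<eos>': 1, 'hello': 2, 'world': 3}

# Invert it the same way as above: id -> token.
toy_index = {idx: token for (token, idx) in toy_dictionary.items()}

def decode(ids, index, unknown='<unk>'):
    # Map each id back to its token; ids missing from the index become `unknown`.
    return [index.get(i, unknown) for i in ids]

print(decode([2, 3, 1, 0], toy_index))  # ['hello', 'world', '<eos>', '<pad>']
```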

## Samples padding/iteration

In [1512]:
import random

def pad_sequences(sequences, max_len, pad_value=0):
    '''
    :param sequences: A list of lists of different lengths.
    :returns: An int32 matrix of shape [len(sequences), max_len]; shorter rows are
              right-padded with pad_value, longer rows are truncated.
    '''
    result = []
    for sequence in sequences:
        if len(sequence) < max_len:
            result.append(sequence + [pad_value]*(max_len - len(sequence)))
        else:
            result.append(sequence[:max_len])
    return np.array(result, dtype=np.int32)

def pad_complex(struct, max_keys_len, max_values_len, pad_value=0, shuffle=True):
    '''
    :param struct: Dictionary mapping a slot index to an array of value token indices.
    :returns: Vector of slot indices [max_keys_len] and matrix of value indices
              [max_keys_len, max_values_len].
    '''
    if len(struct) == 0:
        return (
            np.zeros(shape=(max_keys_len), dtype=np.int32),
            np.zeros(shape=(max_keys_len, max_values_len), dtype=np.int32)
        )
    
    struct_items = list(struct.items())
    if shuffle:
        random.shuffle(struct_items)
        
    keys, values = [], []
    for (key, value) in struct_items:
        keys.append(key)
        values.append(value)
    if len(keys) < max_keys_len:
        keys += [pad_value]*(max_keys_len-len(keys))
        values += [[pad_value]]*(max_keys_len-len(values))
    
    return (
        np.array(keys).astype(np.int32),
        pad_sequences(values, max_values_len, pad_value)
    )

from collections import OrderedDict

def pad_array_of_complex(structs, max_keys_count, max_values_length):
    '''
    :param structs: An array of structs
    :returns: Padded keys, values, struct sizes and value lengths
    '''
    if len(structs) == 0:
        return (
            np.zeros(shape=(1,max_keys_count), dtype=np.int32),
            np.zeros(shape=(1,max_keys_count,max_values_length), dtype=np.int32),
            np.zeros(shape=(1,), dtype=np.int32),
            np.zeros(shape=(1,max_keys_count), dtype=np.int32)
        )
    key_counts = [len(struct) for struct in structs]
    
    keys_padded, values_padded, value_lengths = [], [], []
    for struct in structs:
        # Shuffle here (not inside pad_complex) so that value_lengths is recorded
        # in the same slot order as the padded keys and values.
        items = list(struct.items())
        random.shuffle(items)
        keys, values = pad_complex(OrderedDict(items), max_keys_count, max_values_length, shuffle=False)
        keys_padded.append(keys)
        values_padded.append(values)
        lengths = [len(value) for (_, value) in items]
        value_lengths.append(lengths + [0]*(max_keys_count-len(lengths)))
    
    keys_padded = np.array(keys_padded, dtype=np.int32)
    values_padded = np.array(values_padded, dtype=np.int32)
            
    return (
        keys_padded,
        values_padded,
        np.array(key_counts, dtype=np.int32),
        np.array(value_lengths, dtype=np.int32)
    )

def pad_nested_array_of_complex(frames_batch):
    frames_counts, frames_slot_counts, frames_value_lengths = [], [], []
    for frames in frames_batch:
        frames_counts.append(len(frames))
        for frame in frames:
            frames_slot_counts.append(len(frame))
            frames_value_lengths.extend([len(value) for (_, value) in frame.items()])
    max_frames_count = max(frames_counts)
    max_slots_count = max(frames_slot_counts)
    max_values_length = max(frames_value_lengths)
    
    frames_batch_slots, frames_batch_values, frames_batch_key_counts, frames_batch_value_lengths = [], [], [], []
    for frames in frames_batch:
        slots, values, key_counts, value_lengths = pad_array_of_complex(frames, max_slots_count, max_values_length)
        if len(slots) < max_frames_count:
            slots_pad = np.zeros(shape=(max_frames_count-len(slots),max_slots_count), dtype=np.int32)
            slots = np.append(slots, slots_pad, axis=0)
        if len(values) < max_frames_count:
            values_pad = np.zeros(shape=(max_frames_count-len(values),max_slots_count,max_values_length), dtype=np.int32)
            values = np.append(values, values_pad, axis=0)
        if len(key_counts) < max_frames_count:
            key_counts_pad = np.zeros(shape=(max_frames_count-len(key_counts)), dtype=np.int32)
            key_counts = np.append(key_counts, key_counts_pad, axis=0)
        if len(value_lengths) < max_frames_count:
            value_lengths_pad = np.zeros(shape=(max_frames_count-len(value_lengths),max_slots_count), dtype=np.int32)
            value_lengths = np.append(value_lengths, value_lengths_pad, axis=0)
        frames_batch_slots.append(slots)
        frames_batch_values.append(values)
        frames_batch_key_counts.append(key_counts)
        frames_batch_value_lengths.append(value_lengths)
        
    return (
        np.array(frames_batch_slots, dtype=np.int32),
        np.array(frames_batch_values, dtype=np.int32),
        np.array(frames_batch_key_counts, dtype=np.int32),
        np.array(frames_batch_value_lengths, dtype=np.int32)
    )

def pad_complex_categorical(structs, n_categories):
    result = np.zeros([len(structs), n_categories], dtype=np.int32)
    for idx, struct in enumerate(structs):
        if len(struct) > 0:
            for (category_idx, value) in struct.items():
                result[idx, int(category_idx)] = int(value)
    return result

def pad_turns(turns, n_slots=51):
    user_inputs_batch = [turn['user_input_embedding_ids'] for turn in turns]
    user_inputs_lengths = list(map(len, user_inputs_batch))
    previous_responses_batch = [turn['previous_agent_embedding_ids'] for turn in turns]
    previous_responses_lengths = list(map(len, previous_responses_batch))
    next_responses_batch = [turn['next_agent_delexicalized_embedding_ids'] + [1] for turn in turns]
    next_responses_lengths = list(map(len, next_responses_batch))
    
    user_informed_value_slots_batch = [turn['user_informed_value_slot_ids'] for turn in turns]
    user_informed_slot_states_batch = [turn['user_informed_slot_state_ids'] for turn in turns]

    frames_counts = [len(turn['input_frames_embedded']) for turn in turns]
    frames_authors_batch = [turn['input_frame_authors'] for turn in turns]
    frames_recently_created_batch = [turn['input_frame_recently_created'] for turn in turns]
    frames_slots, frames_values, frames_slot_counts, frames_value_lengths = pad_nested_array_of_complex([turn['input_frames_embedded'] for turn in turns])
    db_result_counts = [turn['database_results_count'] for turn in turns]
    
    referenced_frames_batch = [turn['referenced_frame_embedded'] for turn in turns]
    referenced_frame_slots, referenced_frame_values, referenced_frame_slot_counts, referenced_frame_value_lengths = pad_array_of_complex(
        referenced_frames_batch,
        max(map(len, referenced_frames_batch)),
        max([len(value) for frame in referenced_frames_batch for (_, value) in frame.items()])
    )
    
    previous_active_frames_batch = [turn['previous_active_frame_id'] for turn in turns]
    current_active_frames_batch = [turn['active_frame_id'] for turn in turns]
    
    agent_actions_batch = [turn['agent_action_ids'] for turn in turns]
    agent_sub_actions_batch = [turn['agent_sub_action_ids'] for turn in turns]
    
    return {
        'user_inputs': pad_sequences(user_inputs_batch, max(user_inputs_lengths)),
        'user_input_lengths': np.array(user_inputs_lengths, dtype=np.int32),
        'previous_responses': pad_sequences(previous_responses_batch, max(previous_responses_lengths)),
        'previous_response_lengths': np.array(previous_responses_lengths, dtype=np.int32),
        'next_responses': pad_sequences(next_responses_batch, max(next_responses_lengths)),
        'next_response_lengths': np.array(next_responses_lengths, dtype=np.int32),
        'user_informed_value_slots': pad_sequences(user_informed_value_slots_batch, max(user_inputs_lengths)),
        'user_informed_slot_states': pad_complex_categorical(user_informed_slot_states_batch, n_slots),
        'frames_counts': np.array(frames_counts, dtype=np.int32),
        'frames_slots': frames_slots,
        'frames_values': frames_values,
        'frames_slot_counts': frames_slot_counts,
        'frames_value_lengths': frames_value_lengths,
        'frames_authors': pad_sequences(frames_authors_batch, max(map(len, frames_authors_batch)), pad_value=-1),
        'frames_recently_created': pad_sequences(frames_recently_created_batch, max(map(len, frames_recently_created_batch)), pad_value=-1),
        'referenced_frame_slots': referenced_frame_slots,
        'referenced_frame_values': referenced_frame_values,
        'referenced_frame_slot_counts': referenced_frame_slot_counts,
        'referenced_frame_value_lengths': referenced_frame_value_lengths,
        'previous_active_frames': np.array(previous_active_frames_batch, dtype=np.int32),
        'current_active_frames': np.array(current_active_frames_batch, dtype=np.int32),
        'db_result_counts': np.array(db_result_counts, dtype=np.int32),
        'agent_actions': pad_sequences(agent_actions_batch, max(map(len, agent_actions_batch)), pad_value=-1),
        'agent_sub_actions': pad_sequences(agent_sub_actions_batch, max(map(len, agent_sub_actions_batch)), pad_value=-1),
    }

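def samples_iterator(data, batch_size=64, max_len=50):
    # Yields full batches only; a trailing partial batch is dropped.
    # NOTE: max_len is currently unused, every batch is padded to its own longest row.
    for i in range(len(data) // batch_size):
        rows = data[i*batch_size:(i+1)*batch_size]
        
        yield pad_turns(rows)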
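
The padding helpers above always pair a padded matrix with a vector of true lengths. A small self-contained sketch of why the lengths matter: they recover exactly which entries are padding, even if the pad value 0 also occurs as a real id. `pad_to_matrix` is a hypothetical stand-in for `pad_sequences` (it pads to the longest row and never truncates), and the toy batch is made up.

```python
import numpy as np

def pad_to_matrix(sequences, pad_value=0):
    # Hypothetical helper: right-pad every sequence to the longest one in the batch.
    max_len = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), max_len), pad_value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    return padded

batch = [[5, 6, 7], [8], [9, 10]]
lengths = np.array([len(seq) for seq in batch], dtype=np.int32)
padded = pad_to_matrix(batch)

# Boolean mask: True on real tokens, False on padding.
mask = np.arange(padded.shape[1])[None, :] < lengths[:, None]

print(padded)
print(mask.astype(int))
```

The mask here plays the same role as the `sequence_length` arguments that the model passes to its RNNs.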

In [1513]:
import time

start_at = time.time()

for i, batch in enumerate(samples_iterator(turns_train)):
    print('INPUTS:')
    print('User inputs:', batch['user_inputs'].shape, 'lengths:', batch['user_input_lengths'].shape)
    print('Previous responses:', batch['previous_responses'].shape, 'lengths:', batch['previous_response_lengths'].shape)
    print('Next responses:', batch['next_responses'].shape, 'lengths:', batch['next_response_lengths'].shape)
    print('Frames counts:', batch['frames_counts'].shape)
    print('Frames slots:', batch['frames_slots'].shape)
    print('Frames values:', batch['frames_values'].shape)
    print('Frames slot counts:', batch['frames_slot_counts'].shape)
    print('Frames value lengths:', batch['frames_value_lengths'].shape)
    print('Frames authors:', batch['frames_authors'].shape)
    print('Frames recently created:', batch['frames_recently_created'].shape)
    print('Referenced frame slots:', batch['referenced_frame_slots'].shape)
    print('Referenced frame values:', batch['referenced_frame_values'].shape)
    print('Referenced frame slot counts:', batch['referenced_frame_slot_counts'].shape)
    print('Referenced frame value lengths:', batch['referenced_frame_value_lengths'].shape)
    print('DB result counts:', batch['db_result_counts'].shape)
    print("TARGETS:")
    print('Informed value slots:', batch['user_informed_value_slots'].shape)
    print('Informed slot states:', batch['user_informed_slot_states'].shape)
    print('Previous active frame:', batch['previous_active_frames'].shape)
    print('Current active frame:', batch['current_active_frames'].shape)
    print('Agent actions:', batch['agent_actions'].shape)
    print('Agent sub actions:', batch['agent_sub_actions'].shape)
    
    if i >= 0:
        break
    
print((time.time() - start_at)*1000, 'ms')

INPUTS:
User inputs: (64, 31) lengths: (64,)
Previous responses: (64, 52) lengths: (64,)
Next responses: (64, 53) lengths: (64,)
Frames counts: (64,)
Frames slots: (64, 9, 13)
Frames values: (64, 9, 13, 4)
Frames slot counts: (64, 9)
Frames value lengths: (64, 9, 13)
Frames authors: (64, 9)
Frames recently created: (64, 3)
Referenced frame slots: (64, 13)
Referenced frame values: (64, 13, 4)
Referenced frame slot counts: (64,)
Referenced frame value lengths: (64, 13)
DB result counts: (64,)
TARGETS:
Informed value slots: (64, 31)
Informed slot states: (64, 51)
Previous active frame: (64,)
Current active frame: (64,)
Agent actions: (64, 3)
Agent sub actions: (64, 8)
22.220849990844727 ms


In [1415]:
print('Actions:', len(agent_actions_dictionary))
print('Sub actions:', len(agent_sub_actions_dictionary))

Actions: 17
Sub actions: 139
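
`pad_turns` pads the agent action ids with -1, and the model scores every action with an independent sigmoid, so training needs one multi-hot target row per turn. How the loss builds those targets is not shown in this part of the notebook, so the following is only a sketch under that assumption. `multi_hot` and the toy batch are hypothetical; 17 is the action count printed above.

```python
import numpy as np

def multi_hot(padded_ids, n_classes):
    # Turn a [batch, max_ids] matrix of class ids (padded with -1)
    # into a [batch, n_classes] 0/1 matrix.
    targets = np.zeros((padded_ids.shape[0], n_classes), dtype=np.float32)
    rows, cols = np.nonzero(padded_ids >= 0)
    targets[rows, padded_ids[rows, cols]] = 1.0
    return targets

# Hypothetical batch of two turns with up to three actions each.
agent_actions = np.array([[3, 11, -1], [0, -1, -1]], dtype=np.int32)
targets = multi_hot(agent_actions, n_classes=17)

print(targets.shape)        # (2, 17)
print(targets.sum(axis=1))  # [2. 1.]
```

Padding with -1 rather than 0 keeps action id 0 usable as a real class.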


## Model

In [1]:
# Requires TensorFlow 1.x: tf.contrib was removed in TensorFlow 2.0.
import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell, MultiRNNCell, ResidualWrapper, DropoutWrapper
from tensorflow.contrib import seq2seq
from tensorflow.python.layers.core import Dense
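
The model defined in the next cell passes `_greedy_p` as the `sampling_probability` of `ScheduledEmbeddingTrainingHelper`, that is, the chance that the decoder is fed its own sampled token instead of the ground-truth one. The schedule is a shifted sigmoid of the epoch: it starts at 0 for epoch 0 and approaches 0.25 as training goes on. A plain-Python sketch of the same expression, assuming `epoch` is a zero-based counter (the function name `sampling_probability` is illustrative):

```python
import math

def sampling_probability(epoch):
    # Same expression as `_greedy_p` in the model, evaluated in plain Python.
    return 0.5 / (1.0 + math.exp(-0.2 * epoch)) - 0.25

# Print the schedule for a few epochs to see how quickly it saturates.
for epoch in (0, 5, 10, 20, 50):
    print(epoch, round(sampling_probability(epoch), 4))
```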

In [3]:
def top_k_weighted(values, weights, k=2):
    row_ids = tf.tile(tf.expand_dims(tf.range(tf.shape(values)[0]), -1), [1, k])
    
    selected_weights, selected_value_ids = tf.nn.top_k(weights, k=k)
    
    gather_indices = tf.concat([
        tf.reshape(row_ids, [-1, 1]),
        tf.reshape(selected_value_ids, [-1, 1])
    ], 1)
    
    gathered_values = tf.gather_nd(values, gather_indices)
    
    return tf.multiply(
        tf.expand_dims(selected_weights, -1),
        tf.reshape(gathered_values, [-1, k, tf.shape(values)[-1]])
    )

class JamesBotModel():
    
    def __init__(self, is_training,
                 inputs, input_lengths, previous_responses, previous_response_lengths,
                 frames_counts, frames_slots, frames_values, frames_slot_counts, frames_value_lengths,
                 frames_authors, frames_recently_created,
                 referenced_frame_slots, referenced_frame_values,
                 referenced_frame_slot_counts, referenced_frame_value_lengths,
                 previous_active_frame_id, database_results_count,
                 decoder_targets=None, decoder_target_lengths=None,
                 word_embeddings_shape=None, n_slots=None, n_actions=None, n_sub_actions=None,
                 train=True, epoch=None, rnn_dropout_keep_prob=1.0):
        assert word_embeddings_shape is not None
        assert n_slots is not None
        assert n_actions is not None
        assert n_sub_actions is not None
        if train:
            assert decoder_targets is not None
            assert decoder_target_lengths is not None
            assert epoch is not None
        
        # Inputs
        self._is_training = is_training
        # Current/Previous Text Input
        self._inputs = inputs
        self._input_lengths = input_lengths
        self._previous_responses = previous_responses
        self._previous_response_lengths = previous_response_lengths
        # Frames
        self._frames_counts = frames_counts
        self._frames_slots = frames_slots
        self._frames_values = frames_values
        self._frames_slot_counts = frames_slot_counts
        self._frames_value_lengths = frames_value_lengths
        self._frames_authors = frames_authors
        self._frames_recently_created = frames_recently_created
        # Referenced frame
        self._referenced_frame_slots = referenced_frame_slots
        self._referenced_frame_values = referenced_frame_values
        self._referenced_frame_slot_counts = referenced_frame_slot_counts
        self._referenced_frame_value_lengths = referenced_frame_value_lengths
        # Indicators
        self._previous_active_frame_id = previous_active_frame_id
        self._database_results_count = database_results_count
        
        # Decoder targets (only for training)
        self._decoder_targets = decoder_targets
        self._decoder_target_lengths = decoder_target_lengths
        
        # Conf
        self._word_embeddings_shape = word_embeddings_shape
        self._n_slots = int(n_slots)
        self._n_actions = int(n_actions)
        self._n_sub_actions = int(n_sub_actions)
        self._train = bool(train)
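        # Scheduled-sampling probability. epoch is only required when train=True,
        # so skip the computation when it is None (inference mode).
        self._greedy_p = (0.5 / (1 + tf.exp(-0.2*epoch)) - 0.25) if epoch is not None else None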
        self._rnn_dropout_keep_prob = rnn_dropout_keep_prob
        
        self._le_cell_output_size = 300
        self._slot_embeddings_size = 300
        self._n_slot_states = 5
    
    def build(self):
        self._embeddings_module()
        self._language_understanding_module()
        self._frames_encoder_module()
        self._informable_slots_module()
        self._frame_references_module()
        self._context_module()
        self._dialog_policy_module()
        self._response_generation_module()
    
    def embeddings_initializer(self):
        placeholder = tf.placeholder(tf.float32)
        init_op = self._word_embeddings.assign(placeholder)
        return placeholder, init_op
    
    def _embeddings_module(self):
        with tf.name_scope('embeddings_module'):
            # The (-.3, .3) arguments read as a range, so draw uniformly from it;
            # tf.random_normal would interpret them as mean=-.3, stddev=.3.
            self._word_embeddings = tf.Variable(
                tf.random_uniform(self._word_embeddings_shape, -.3, .3),
                trainable = True,
                name = 'word_embeddings'
            )
            self._slot_embeddings = tf.Variable(
                tf.random_uniform([self._n_slots, self._slot_embeddings_size], -.3, .3),
                trainable = True,
                name = 'slot_embeddings'
            )
            
            self._inputs_embedded = tf.nn.embedding_lookup(self._word_embeddings, self._inputs)
            self._previous_responses_embedded = tf.nn.embedding_lookup(self._word_embeddings, self._previous_responses)
            
            self._frames_slots_embedded = tf.nn.embedding_lookup(self._slot_embeddings, self._frames_slots)
            self._frames_values_embedded = tf.nn.embedding_lookup(self._word_embeddings, self._frames_values)
            
            self._referenced_frame_slots_embedded = tf.nn.embedding_lookup(self._slot_embeddings, self._referenced_frame_slots)
            self._referenced_frame_values_embedded = tf.nn.embedding_lookup(self._word_embeddings, self._referenced_frame_values)
        
    def _text_encoder(self, inputs, sequence_lengths, name='text_encoder', reuse=False):
        with tf.variable_scope(name, reuse=reuse):
            fw_cell = GRUCell(self._le_cell_output_size, activation=tf.nn.tanh)
            bw_cell = GRUCell(self._le_cell_output_size, activation=tf.nn.tanh)

            outputs, state = tf.nn.bidirectional_dynamic_rnn(
                fw_cell, bw_cell,
                inputs = inputs,
                sequence_length = sequence_lengths,
                dtype = tf.float32,
            )
            
            return tf.concat(outputs, 2, name='le_outputs'), tf.concat(state, 1, name='le_state')
    
    def _structs_encoder(self, slots, values, slot_counts, value_lengths, name='structs_encoder', reuse=False):
        batch_size, n_structs, n_slots, n_tokens, _ = tf.unstack(tf.shape(values))

        _, le_state = self._text_encoder(
            tf.reshape(values, [-1, n_tokens, self._word_embeddings_shape[1]]),
            tf.reshape(value_lengths, [-1]),
            reuse=True
        )
        with tf.variable_scope(name, reuse=reuse):
            structs_slot_value = tf.concat([
                tf.reshape(slots, [batch_size*n_structs, n_slots, self._slot_embeddings_size]),
                tf.reshape(le_state, [batch_size*n_structs, n_slots, 2*self._le_cell_output_size])
            ], 2)
            
            _, structs_encoder_state = tf.nn.dynamic_rnn(
                GRUCell(2*self._le_cell_output_size, activation=tf.nn.tanh),
                inputs = structs_slot_value,
                sequence_length = tf.reshape(slot_counts, [-1]),
                dtype = tf.float32
            )
            
            return tf.reshape(structs_encoder_state, [batch_size, n_structs, 2*self._le_cell_output_size])
    
    def _language_understanding_module(self):
        with tf.name_scope('language_understanding_module'):
            self._inputs_encoder_outputs, self._inputs_encoder_state = self._text_encoder(
                self._inputs_embedded,
                self._input_lengths
            )
            
            self._previous_responses_encoder_outputs, self._previous_responses_encoder_state = self._text_encoder(
                self._previous_responses_embedded,
                self._previous_response_lengths,
                reuse=True
            )
    
    def _frames_encoder_module(self):
        with tf.name_scope('frames_encoder_module'):
            # Input frames
            encoder_output = self._structs_encoder(
                self._frames_slots_embedded,
                self._frames_values_embedded,
                self._frames_slot_counts,
                self._frames_value_lengths
            )
            mask = tf.sequence_mask(self._frames_counts, tf.reduce_max(self._frames_counts), dtype=tf.float32)
                
            self._frames_encoded = tf.multiply(
                tf.expand_dims(mask, -1),
                encoder_output
            )
            
            # Referenced frame
            referenced_frame_encoder_output = self._structs_encoder(
                tf.expand_dims(self._referenced_frame_slots_embedded, 1),
                tf.expand_dims(self._referenced_frame_values_embedded, 1),
                self._referenced_frame_slot_counts,
                self._referenced_frame_value_lengths,
                reuse = True
            )
            self._referenced_frame_encoded = tf.squeeze(referenced_frame_encoder_output, 1)

            # Referenced frame slot indicators [batch_size,n_slots]
            self._referenced_frame_slot_indicators = tf.multiply(
                tf.reduce_sum(tf.one_hot(self._referenced_frame_slots, self._n_slots), 1),
                tf.tile(1 - tf.sequence_mask([1], self._n_slots, dtype=tf.float32), [tf.shape(self._referenced_frame_slots)[0], 1])
            )
    
    def _informable_slots_module(self):
        with tf.variable_scope('informable_slots_module'):
            # Informable slots
            e_inputs = tf.layers.dense(
                self._inputs_encoder_outputs,
                2*self._le_cell_output_size,
                activation=tf.nn.tanh,
                name='weights_projection'
            )
            e_previous_responses = tf.layers.dense(
                self._previous_responses_encoder_outputs,
                2*self._le_cell_output_size,
                activation=tf.nn.tanh,
                name='weights_projection',
                reuse=True
            )
            
            e = tf.matmul(e_inputs, e_previous_responses, transpose_b=True, name='e')
            beta = tf.matmul(tf.nn.softmax(e), self._previous_responses_encoder_outputs)
            
            inputs_compared = tf.layers.dense(
                tf.concat([self._inputs_encoder_outputs, beta], 2),
                2*self._le_cell_output_size,
                activation=tf.nn.tanh,
                name='comparison'
            )
            
            self._informable_slots_logits = tf.layers.dense(
                inputs_compared,
                self._n_slots,
            )
            self.informable_slots_p = tf.nn.softmax(self._informable_slots_logits)
            self.informable_slots_indicators = tf.reduce_sum(self.informable_slots_p, 1)
            
            # Slot states
            slot_states_input = tf.concat([
                self._previous_responses_encoder_state,
                self._inputs_encoder_state
            ], 1)
            
            slot_states_output = tf.layers.dense(slot_states_input, self._n_slots*self._n_slot_states)
            self._informable_slot_states_logits = tf.reshape(slot_states_output, [tf.shape(self._inputs)[0], self._n_slots, self._n_slot_states])
            self.informable_slot_states_p = tf.nn.sigmoid(self._informable_slot_states_logits)
    
    def _frame_references_module(self):
        with tf.name_scope('frame_references_module'):
            batch_size, n_frames, _ = tf.unstack(tf.shape(self._frames_encoded))
            
            self._frames_authors_onehot = tf.one_hot(self._frames_authors, 2)
            self._frames_recently_created_onehot = tf.reduce_sum(tf.one_hot(self._frames_recently_created, n_frames), 1)
            self._previous_active_frame_onehot = tf.one_hot(self._previous_active_frame_id, n_frames)
            
            frames_with_indicators = tf.concat([
                self._frames_encoded, # [batch_size,n_frames,2*cell_size]
                self._frames_authors_onehot, # [batch_size,n_frames,2]
                tf.expand_dims(self._frames_recently_created_onehot, -1), # [batch_size,n_frames,1]
                tf.expand_dims(self._previous_active_frame_onehot, -1) # [batch_size,n_frames,1]
            ], 2)
            
            self._available_frames = tf.concat([
                frames_with_indicators,
                tf.zeros([batch_size, 1, 2*self._le_cell_output_size+4])
            ], 1, name='available_frames')
            
            # Project frames
            frames_sim_proj = tf.layers.dense(
                tf.layers.dropout(
                    self._available_frames,
                    rate = 0.3,
                    training = self._is_training
                ),
                int(self._le_cell_output_size/2),
                activation=tf.nn.tanh
            )
            
            # Project inputs
            inputs_sim_proj = tf.layers.dense(
                tf.layers.dropout(
                    tf.concat([
                        self._inputs_encoder_state,
                        tf.reshape(self.informable_slot_states_p, [batch_size, self._n_slots*self._n_slot_states])
                    ], 1),
                    rate = 0.3,
                    training = self._is_training
                ),
                int(self._le_cell_output_size/2),
                activation=tf.nn.tanh
            )
            
            # Compute input-frame similarity
            sim = tf.matmul(
                tf.expand_dims(inputs_sim_proj, 1),
                frames_sim_proj,
                transpose_b = True
            )
            
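            # Mask padded frames: push their logits towards -inf so that the softmax
            # gives them (almost) zero probability. Multiplying by the mask would only
            # set the logits to 0, which still receives probability mass.
            frames_mask = tf.sequence_mask(self._frames_counts+1, tf.reduce_max(self._frames_counts)+1, dtype=tf.float32)
            self._frame_references_logits = tf.squeeze(sim, 1) + (1.0 - frames_mask) * -1e9
            self.frame_references_p = tf.nn.softmax(self._frame_references_logits)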
    
    def _context_module(self):
        with tf.variable_scope('context_module'):
            batch_size, n_frames, _ = tf.unstack(tf.shape(self._available_frames))
            self._database_results_count_onehot = tf.one_hot(tf.clip_by_value(self._database_results_count, 0, 5), 6)
            
            context_nn_input = tf.concat([
                self._previous_responses_encoder_state,
                self._inputs_encoder_state,
                self._referenced_frame_encoded,
                self._referenced_frame_slot_indicators,
                self._database_results_count_onehot
            ], 1)
            
            context_L1 = tf.layers.dense(
                tf.layers.dropout(context_nn_input, rate=0.3, training=self._is_training),
                2*self._le_cell_output_size,
                activation=tf.nn.tanh
            )
            
            self._context = tf.layers.dense(
                tf.layers.dropout(context_L1, rate=0.3, training=self._is_training),
                2*self._le_cell_output_size,
                activation=tf.nn.tanh
            )
            
    def _dialog_policy_module(self):
        with tf.name_scope('dialog_policy_module'):
            self._actions_logits = tf.layers.dense(self._context, self._n_actions)
            self.actions_p = tf.nn.sigmoid(self._actions_logits)
            
            self._sub_actions_logits = tf.layers.dense(self._context, self._n_sub_actions)
            self.sub_actions_p = tf.nn.sigmoid(self._sub_actions_logits)
        
    def _prepare_training_decoder(self):
        batch_size, _ = tf.unstack(tf.shape(self._decoder_targets))
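        # Match the dtype of the targets so that tf.concat works for int32 and int64 ids.
        pad = tf.zeros([batch_size, 1], dtype=self._decoder_targets.dtype)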
        
        self._decoder_inputs = tf.concat([pad, self._decoder_targets], 1)
        self._decoder_inputs_embedded = tf.nn.embedding_lookup(self._word_embeddings, self._decoder_inputs)
        self._decoder_targets_padded = tf.concat([self._decoder_targets, pad], 1)
            
    def _response_generation_module(self):
        with tf.variable_scope('response_generation_module'):
            batch_size, n_tokens = tf.unstack(tf.shape(self._inputs))
            if self._train:
                print('Training decoder')
                self._prepare_training_decoder()

                helper = seq2seq.ScheduledEmbeddingTrainingHelper(
                    inputs = self._decoder_inputs_embedded,
                    sequence_length = self._decoder_target_lengths + 1,
                    embedding = self._word_embeddings,
                    sampling_probability = self._greedy_p,
                )
            else:
                print('Inference decoder')
                helper = seq2seq.GreedyEmbeddingHelper(
                    embedding = self._word_embeddings,
                    start_tokens = tf.tile([0], [batch_size]),
                    end_token = 1
                )
                
            decoder_cell, initial_state = self._decoder_cell()
            
            print(initial_state)
            
            decoder = seq2seq.BasicDecoder(
                cell = decoder_cell,
                helper = helper,
                initial_state = initial_state,
                output_layer=Dense(self._word_embeddings_shape[0])
            )
            decoder_outputs, _, _ = seq2seq.dynamic_decode(decoder, impute_finished=True)
            
            self._decoder_logits = decoder_outputs.rnn_output
            self.decoder_embedding_ids = tf.argmax(self._decoder_logits, -1)
    
    def _decoder_cell(self):
        def _base_cell():
            return DropoutWrapper(
                GRUCell(2*self._le_cell_output_size),
                output_keep_prob=self._rnn_dropout_keep_prob
            )
        
        batch_size, n_tokens = tf.unstack(tf.shape(self._inputs))
        
        attention_memory = tf.concat([
            self._inputs_encoder_outputs,
            tf.tile(tf.expand_dims(self._referenced_frame_encoded, 1), [1,n_tokens,1]),
            tf.tile(tf.expand_dims(self._referenced_frame_slot_indicators, 1), [1,n_tokens,1]),
            tf.tile(tf.expand_dims(self._database_results_count_onehot, 1), [1,n_tokens,1])
        ], 2)

        attention_mechanism = seq2seq.BahdanauAttention(
            num_units = 2*self._le_cell_output_size,
            memory = attention_memory,
            memory_sequence_length = self._input_lengths
        )

        attentive_cell = seq2seq.AttentionWrapper(
            cell = _base_cell(),
            attention_mechanism = attention_mechanism,
            attention_layer_size = 2*self._le_cell_output_size,
            initial_cell_state = self._context
        )
        
        initial_state = tuple([
            self._context,
            attentive_cell.zero_state(batch_size, dtype=tf.float32),
        ])
        
        return MultiRNNCell([
            _base_cell(),
            attentive_cell,
        ], state_is_tuple=True), initial_state
           
    def losses(self, informable_slots_targets, informable_slot_states_targets, frame_references_targets, actions_targets, sub_actions_targets):
        # Informable slots
        informable_slots_stepwise_ce = tf.nn.softmax_cross_entropy_with_logits(
            labels = tf.one_hot(informable_slots_targets, self._n_slots),
            logits = self._informable_slots_logits
        )
        informable_slots_loss = tf.reduce_mean(informable_slots_stepwise_ce)
        informable_slots_accuracy = tf.reduce_mean(tf.cast(tf.equal(informable_slots_targets, tf.argmax(self.informable_slots_p, -1)), tf.float32))
        
        informable_slot_statewise_ce = tf.nn.sigmoid_cross_entropy_with_logits(
            labels = tf.one_hot(informable_slot_states_targets, self._n_slot_states),
            logits = self._informable_slot_states_logits
        )
        informable_slot_states_loss = tf.reduce_mean(informable_slot_statewise_ce)
        informable_slot_states_accuracy = tf.reduce_mean(tf.cast(tf.equal(informable_slot_states_targets, tf.argmax(self.informable_slot_states_p, -1)), tf.float32))
        
        # Frames
        frame_references_framewise_ce = tf.nn.softmax_cross_entropy_with_logits(
            labels = tf.one_hot(frame_references_targets, (tf.shape(self._frames_slots)[1]+1)),
            logits = self._frame_references_logits
        )
        frame_references_loss = tf.reduce_mean(frame_references_framewise_ce)
        frame_references_accuracy = tf.reduce_mean(tf.cast(tf.equal(frame_references_targets, tf.argmax(self.frame_references_p, -1)), tf.float32))
        
        # Actions
        actions_targets_encoded = tf.reduce_sum(tf.one_hot(actions_targets, self._n_actions), 1)
        actions_ce = tf.nn.sigmoid_cross_entropy_with_logits(
            labels = actions_targets_encoded,
            logits = self._actions_logits
        )
        actions_loss = tf.reduce_mean(actions_ce)
        actions_accuracy = tf.reduce_mean(tf.cast(tf.equal(actions_targets_encoded, tf.round(self.actions_p)), tf.float32))
        
        sub_actions_targets_encoded = tf.reduce_sum(tf.one_hot(sub_actions_targets, self._n_sub_actions), 1)
        sub_actions_ce = tf.nn.sigmoid_cross_entropy_with_logits(
            labels = sub_actions_targets_encoded,
            logits = self._sub_actions_logits
        )
        sub_actions_loss = tf.reduce_mean(sub_actions_ce)
        sub_actions_accuracy = tf.reduce_mean(tf.cast(tf.equal(sub_actions_targets_encoded, tf.round(self.sub_actions_p)), tf.float32))
        
        # Decoder
        decoder_stepwise_ce = tf.nn.softmax_cross_entropy_with_logits(
            labels = tf.one_hot(self._decoder_targets_padded, self._word_embeddings_shape[0]),
            logits = self._decoder_logits
        )
        decoder_loss = tf.reduce_mean(decoder_stepwise_ce)
        decoder_accuracy = tf.reduce_mean(tf.cast(tf.equal(self._decoder_targets_padded, self.decoder_embedding_ids), tf.float32))
        
        total_loss = tf.reduce_sum([
            informable_slots_loss,
            informable_slot_states_loss,
            frame_references_loss,
            actions_loss,
            sub_actions_loss,
            decoder_loss,
        ])

        tf.summary.histogram('actions_targets_encoded', actions_targets_encoded)
        tf.summary.histogram('sub_actions_targets_encoded', sub_actions_targets_encoded)
        tf.summary.scalar('total_loss', total_loss)
        tf.summary.scalar('informable_slots_loss', informable_slots_loss)
        tf.summary.scalar('informable_slots_accuracy', informable_slots_accuracy)
        tf.summary.scalar('informable_slot_states_loss', informable_slot_states_loss)
        tf.summary.scalar('informable_slot_states_accuracy', informable_slot_states_accuracy)
        tf.summary.scalar('frame_references_loss', frame_references_loss)
        tf.summary.scalar('frame_references_accuracy', frame_references_accuracy)
        tf.summary.scalar('actions_loss', actions_loss)
        tf.summary.scalar('actions_accuracy', actions_accuracy)
        tf.summary.scalar('sub_actions_loss', sub_actions_loss)
        tf.summary.scalar('sub_actions_accuracy', sub_actions_accuracy)
        tf.summary.scalar('decoder_loss', decoder_loss)
        tf.summary.scalar('decoder_accuracy', decoder_accuracy)
        tf.summary.scalar('decoder_sampling_p', self._greedy_p)
        
        return total_loss

with tf.Graph().as_default():
    training_ph = tf.placeholder(tf.bool)
    # Text
    inputs_ph = tf.placeholder(tf.int64, [64, 10])
    input_lengths_ph = tf.placeholder(tf.int32, [64])
    previous_responses_ph = tf.placeholder(tf.int64, [64,7])
    previous_response_lengths_ph = tf.placeholder(tf.int32, [64])
    # Frames
    frames_counts_ph = tf.placeholder(tf.int32, [64])
    frames_slots_ph = tf.placeholder(tf.int64, [64,3,5])
    frames_values_ph = tf.placeholder(tf.int64, [64,3,5,10])
    frames_slot_counts_ph = tf.placeholder(tf.int64, [64,3])
    frames_value_lengths_ph = tf.placeholder(tf.int64, [64,3,5])
    frames_authors_ph = tf.placeholder(tf.int32, [64,3])
    frames_recently_created_ph = tf.placeholder(tf.int32, [64,2])
    # Referenced frame
    referenced_frame_slots_ph = tf.placeholder(tf.int64, [64,7])
    referenced_frame_values_ph = tf.placeholder(tf.int64, [64,7,3])
    referenced_frame_slot_counts_ph = tf.placeholder(tf.int64, [64])
    referenced_frame_value_lengths_ph = tf.placeholder(tf.int64, [64,7])
    # Indicators
    previous_active_frame_ph = tf.placeholder(tf.int32, [64])
    database_results_count_ph = tf.placeholder(tf.int32, [64])
    
    decoder_targets_ph = tf.placeholder(tf.int64, [64, 12])
    decoder_target_lengths_ph = tf.placeholder(tf.int32, [64])
    
    informable_slots_targets_ph = tf.placeholder(tf.int64, [64,10])
    informable_slots_states_targets_ph = tf.placeholder(tf.int64, [64,12])
    frame_references_targets_ph = tf.placeholder(tf.int64, [64])
    
    actions_targets_ph = tf.placeholder(tf.int64, [64, 5])
    sub_actions_targets_ph = tf.placeholder(tf.int64, [64, 8])
    
    epoch_ph = tf.placeholder(tf.float32, [])
    rnn_dropout_keep_prob_ph = tf.placeholder(tf.float32)
    
    model = JamesBotModel(
        training_ph,
        inputs_ph, input_lengths_ph,
        previous_responses_ph, previous_response_lengths_ph,
        frames_counts_ph, frames_slots_ph, frames_values_ph, frames_slot_counts_ph, frames_value_lengths_ph, frames_authors_ph, frames_recently_created_ph,
        referenced_frame_slots_ph, referenced_frame_values_ph, referenced_frame_slot_counts_ph, referenced_frame_value_lengths_ph,
        previous_active_frame_ph, database_results_count_ph,
        decoder_targets=decoder_targets_ph, decoder_target_lengths=decoder_target_lengths_ph,
        word_embeddings_shape=[9600,300], n_slots=12, n_actions=9, n_sub_actions=21,
        epoch=epoch_ph, rnn_dropout_keep_prob=rnn_dropout_keep_prob_ph
    )
    model.build()
    
    model.losses(informable_slots_targets_ph, informable_slots_states_targets_ph, frame_references_targets_ph, actions_targets_ph, sub_actions_targets_ph) 

Training decoder
(<tf.Tensor 'context_module/dense_2/Tanh:0' shape=(64, 600) dtype=float32>, AttentionWrapperState(cell_state=<tf.Tensor 'response_generation_module/AttentionWrapperZeroState/checked_cell_state:0' shape=(64, 600) dtype=float32>, attention=<tf.Tensor 'response_generation_module/AttentionWrapperZeroState/zeros_1:0' shape=(?, 600) dtype=float32>, time=<tf.Tensor 'response_generation_module/AttentionWrapperZeroState/zeros:0' shape=() dtype=int32>, alignments=<tf.Tensor 'response_generation_module/AttentionWrapperZeroState/zeros_2:0' shape=(?, 10) dtype=float32>, alignment_history=()))
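
The shift performed in `_prepare_training_decoder` can be illustrated without TensorFlow. Below is a minimal sketch in plain Python, assuming id 0 is the `<PAD>`/start token as in the decoder code above. The function name `shift_for_teacher_forcing` and the sample ids are hypothetical.

```python
def shift_for_teacher_forcing(targets, pad_id=0):
    """Mirror the two tf.concat calls in _prepare_training_decoder.

    targets: list of equal-length lists of token ids (already padded).
    Returns (decoder_inputs, targets_padded), both one step longer.
    """
    # Prepend the start token: the decoder sees token t-1 when predicting token t.
    decoder_inputs = [[pad_id] + row for row in targets]
    # Append one pad column so targets line up with the longer inputs.
    targets_padded = [row + [pad_id] for row in targets]
    return decoder_inputs, targets_padded


# Hypothetical batch of two responses; ids are made up, 1 stands in for <EOS>.
batch = [[7, 8, 1], [9, 1, 0]]
decoder_inputs, targets_padded = shift_for_teacher_forcing(batch)
print(decoder_inputs)   # [[0, 7, 8, 1], [0, 9, 1, 0]]
print(targets_padded)   # [[7, 8, 1, 0], [9, 1, 0, 0]]
```

At step `t` the decoder is fed `decoder_inputs[t]` and scored against `targets_padded[t]`.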


In [1431]:
sess = tf.InteractiveSession()

In [1532]:
graph = tf.Graph()

with graph.as_default():
    is_training_ph = tf.placeholder(tf.bool)
    # Text
    inputs_ph = tf.placeholder(tf.int64, [None, None])
    input_lengths_ph = tf.placeholder(tf.int32, [None])
    previous_responses_ph = tf.placeholder(tf.int64, [None, None])
    previous_response_lengths_ph = tf.placeholder(tf.int32, [None])
    # Frames
    frames_counts_ph = tf.placeholder(tf.int32, [None])
    frames_slots_ph = tf.placeholder(tf.int64, [None, None, None])
    frames_values_ph = tf.placeholder(tf.int64, [None, None, None, None])
    frames_slot_counts_ph = tf.placeholder(tf.int64, [None, None])
    frames_value_lengths_ph = tf.placeholder(tf.int64, [None, None, None])
    frames_authors_ph = tf.placeholder(tf.int32, [None, None])
    frames_recently_created_ph = tf.placeholder(tf.int32, [None, None])
    # Referenced frame
    referenced_frame_slots_ph = tf.placeholder(tf.int64, [None, None])
    referenced_frame_values_ph = tf.placeholder(tf.int64, [None, None, None])
    referenced_frame_slot_counts_ph = tf.placeholder(tf.int64, [None])
    referenced_frame_value_lengths_ph = tf.placeholder(tf.int64, [None, None])
    # Indicators
    previous_active_frame_ph = tf.placeholder(tf.int32, [None])
    database_results_count_ph = tf.placeholder(tf.int32, [None])
    
    decoder_targets_ph = tf.placeholder(tf.int64, [None, None])
    decoder_target_lengths_ph = tf.placeholder(tf.int32, [None])
    
    informable_slots_targets_ph = tf.placeholder(tf.int64, [None, None])
    informable_slot_states_targets_ph = tf.placeholder(tf.int64, [None, None])
    frame_references_targets_ph = tf.placeholder(tf.int64, [None])
    
    actions_targets_ph = tf.placeholder(tf.int64, [None, None])
    sub_actions_targets_ph = tf.placeholder(tf.int64, [None, None])
    
    epoch_ph = tf.placeholder(tf.float32, [])
    rnn_dropout_keep_prob_ph = tf.placeholder(tf.float32)
    
    model = JamesBotModel(
        is_training_ph,
        inputs_ph, input_lengths_ph,
        previous_responses_ph, previous_response_lengths_ph,
        frames_counts_ph, frames_slots_ph, frames_values_ph, frames_slot_counts_ph, frames_value_lengths_ph, frames_authors_ph, frames_recently_created_ph,
        referenced_frame_slots_ph, referenced_frame_values_ph, referenced_frame_slot_counts_ph, referenced_frame_value_lengths_ph,
        previous_active_frame_ph, database_results_count_ph,
        decoder_targets=decoder_targets_ph, decoder_target_lengths=decoder_target_lengths_ph,
        word_embeddings_shape=embeddings.shape, n_slots=len(slots_dictionary),
        n_actions=len(agent_actions_dictionary), n_sub_actions=len(agent_sub_actions_dictionary),
        epoch=epoch_ph, train=True, rnn_dropout_keep_prob=rnn_dropout_keep_prob_ph
    )
    model.build()
    embeddings_ph, embeddings_init_op = model.embeddings_initializer()
    
    total_loss = model.losses(informable_slots_targets_ph, informable_slot_states_targets_ph, frame_references_targets_ph, actions_targets_ph, sub_actions_targets_ph)
    
    train_op = tf.train.AdamOptimizer().minimize(total_loss)
    saver = tf.train.Saver(max_to_keep=None)
    
    for var in tf.trainable_variables():
        print(var.name)
    
    print('Trainable parameters:', np.sum([np.prod(var.get_shape().as_list()) for var in tf.trainable_variables()]))

Training decoder
embeddings_module/word_embeddings:0
embeddings_module/slot_embeddings:0
text_encoder/bidirectional_rnn/fw/gru_cell/gates/kernel:0
text_encoder/bidirectional_rnn/fw/gru_cell/gates/bias:0
text_encoder/bidirectional_rnn/fw/gru_cell/candidate/kernel:0
text_encoder/bidirectional_rnn/fw/gru_cell/candidate/bias:0
text_encoder/bidirectional_rnn/bw/gru_cell/gates/kernel:0
text_encoder/bidirectional_rnn/bw/gru_cell/gates/bias:0
text_encoder/bidirectional_rnn/bw/gru_cell/candidate/kernel:0
text_encoder/bidirectional_rnn/bw/gru_cell/candidate/bias:0
structs_encoder/rnn/gru_cell/gates/kernel:0
structs_encoder/rnn/gru_cell/gates/bias:0
structs_encoder/rnn/gru_cell/candidate/kernel:0
structs_encoder/rnn/gru_cell/candidate/bias:0
informable_slots_module/weights_projection/kernel:0
informable_slots_module/weights_projection/bias:0
informable_slots_module/comparison/kernel:0
informable_slots_module/comparison/bias:0
informable_slots_module/dense/kernel:0
informable_slots_module/dense/bi

In [1467]:
len(slots_dictionary)

139

In [1524]:
# sess = tf.InteractiveSession(graph=graph)

with tf.Session(graph=graph) as sess:

    sess.run(tf.global_variables_initializer())
    sess.run(embeddings_init_op, feed_dict={embeddings_ph: embeddings})

    for i, batch in enumerate(samples_iterator(turns_train)):
        
        fd = {
            is_training_ph: True,
            inputs_ph: batch['user_inputs'],
            input_lengths_ph: batch['user_input_lengths'],
            previous_responses_ph: batch['previous_responses'],
            previous_response_lengths_ph: batch['previous_response_lengths'],
            # Frames
            frames_counts_ph: batch['frames_counts'],
            frames_slots_ph: batch['frames_slots'],
            frames_values_ph: batch['frames_values'],
            frames_slot_counts_ph: batch['frames_slot_counts'],
            frames_value_lengths_ph: batch['frames_value_lengths'],
            frames_authors_ph: batch['frames_authors'],
            frames_recently_created_ph: batch['frames_recently_created'],
            # Referenced frame
            referenced_frame_slots_ph: batch['referenced_frame_slots'],
            referenced_frame_values_ph: batch['referenced_frame_values'],
            referenced_frame_slot_counts_ph: batch['referenced_frame_slot_counts'],
            referenced_frame_value_lengths_ph: batch['referenced_frame_value_lengths'],
            # Indicators
            previous_active_frame_ph: batch['previous_active_frames'],
            database_results_count_ph: batch['db_result_counts'],
            # Targets
            informable_slots_targets_ph: batch['user_informed_value_slots'],
            informable_slot_states_targets_ph: batch['user_informed_slot_states'],
            frame_references_targets_ph: batch['current_active_frames'],
            decoder_targets_ph: batch['next_responses'],
            decoder_target_lengths_ph: batch['next_response_lengths'],
            actions_targets_ph: batch['agent_actions'],
            sub_actions_targets_ph: batch['agent_sub_actions'],
            # Conf
            epoch_ph: float(i),
            rnn_dropout_keep_prob_ph: 0.7
        }

        _, loss_val, sampling_p = sess.run([train_op, total_loss, model._greedy_p], feed_dict = fd)
        print('Sampling p:', sampling_p)
#         if i % 5 == 0:
        print(i, 'Total loss:', loss_val)
    
    

Sampling p: 0.0
0 Total loss: 15.4035
Sampling p: 0.024917
1 Total loss: 13.1524
Sampling p: 0.0493438
2 Total loss: 12.0684
Sampling p: 0.0728281
3 Total loss: 11.0698


KeyboardInterrupt: 
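
The `Sampling p` values printed above are passed to `ScheduledEmbeddingTrainingHelper` as `sampling_probability`. The idea behind scheduled sampling can be sketched in plain Python: at each decoding step the next input is the ground-truth token, except that with probability `p` it is replaced by a token drawn from the model's own output distribution. The names `choose_next_input` and `model_probs` are hypothetical and the probabilities are made up; this illustrates the idea and is not the helper's actual implementation.

```python
import random


def choose_next_input(ground_truth_id, model_probs, p, rng):
    """Return the token id to feed at the next decoder step.

    With probability p, sample an id from model_probs (the model's own
    output distribution); otherwise keep the ground-truth id.
    """
    if rng.random() < p:
        ids = list(range(len(model_probs)))
        return rng.choices(ids, weights=model_probs, k=1)[0]
    return ground_truth_id


rng = random.Random(0)
probs = [0.1, 0.2, 0.7]  # made-up distribution over a 3-token vocabulary

# p = 0.0 (as in the first training step above): always the ground truth.
always_truth = [choose_next_input(1, probs, 0.0, rng) for _ in range(5)]
# p = 1.0: always a sample from the model's distribution.
always_model = [choose_next_input(1, probs, 1.0, rng) for _ in range(5)]
print(always_truth)
print(always_model)
```

Raising `p` gradually, as the log above does, exposes the decoder to its own predictions a little more at every step.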

In [1304]:
sess = tf.InteractiveSession()

In [1332]:
n_slots = 51

present_slots = tf.constant([
    [1,17,4,45,0,0],
    [1,45,9,21,11,13]
], tf.int32)

# Multi-hot indicator of the slots present in each row; column 0 (padding) is masked out.
slot_indicators = tf.multiply(
    tf.reduce_sum(tf.one_hot(present_slots, n_slots), 1),
    tf.tile(1 - tf.sequence_mask([1], n_slots, dtype=tf.float32), [2, 1])
)

sess.run(slot_indicators)

array([[ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.,
         1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)
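
The same indicator matrix can be reproduced in plain Python, which makes the masking of the padding id easier to follow. This is a sketch only: the function name `slot_indicators_py` is hypothetical, and unlike the summed one-hot version above it writes 1.0 even when a non-padding slot id is repeated in a row.

```python
def slot_indicators_py(present_slots, n_slots, pad_id=0):
    """One multi-hot row per example: 1.0 where a slot id occurs, pad_id ignored."""
    rows = []
    for slots in present_slots:
        row = [0.0] * n_slots
        for slot_id in slots:
            if slot_id != pad_id:
                row[slot_id] = 1.0
        rows.append(row)
    return rows


# Same ids as in the cell above.
present_slots = [
    [1, 17, 4, 45, 0, 0],
    [1, 45, 9, 21, 11, 13],
]
indicators = slot_indicators_py(present_slots, n_slots=51)
active = [[i for i, v in enumerate(row) if v] for row in indicators]
print(active)  # [[1, 4, 17, 45], [1, 9, 11, 13, 21, 45]]
```

The active columns match the positions of the ones in the array printed above.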

In [1092]:
def embed_sentence(sentence):
    return np.array([glove_dictionary.get(token, 2) for token in nltk.word_tokenize(str(sentence).lower())])

def embed_frame(frame):
    embedded = {}
    for (key, value) in frame.items():
        embedded[slots_dictionary.get(key)] = embed_sentence(value)
    return embedded
        

def parse_slots(sentence, slot_ids):
    tokens = nltk.word_tokenize(str(sentence).lower())
    assert len(tokens) == len(slot_ids)
    
    resolved = []
    for idx, slot_id in enumerate(slot_ids):
        if slot_id > 0:
            if len(resolved) == 0:
                resolved.append([slot_id, tokens[idx]])
            else:
                if resolved[-1][0] == slot_id:
                    resolved[-1].append(tokens[idx])
                else:
                    resolved.append([slot_id, tokens[idx]])
    
    return {slots_index.get(val[0]): ' '.join(val[1:]) for val in resolved}

frames = [
    {'or_city': 'UNK', 'dst_city': 'UNK'},
#     {'or_city': 'Frankfurt', 'dst_city': 'Toronto', 'price': 2432.32},
#     {'or_city': 'Frankfurt', 'dst_city': 'Caprica'},
#     {'or_city': 'Frankfurt', 'dst_city': 'Vancouver'}
]
embedded_frames = [embed_frame(frame) for frame in frames]

frames_slots, frames_values, frames_slot_counts, frames_value_lengths = pad_array_of_complex(embedded_frames, 3, 1)

previous_active_frame_id = 0
previous_response = '_'
current_input = 'Hey! I would like to book a flight to Frankfurt'

previous_response_embedding_ids = embed_sentence(previous_response)
current_input_embedding_ids = embed_sentence(current_input)

# frames_slot_counts, frames_value_lengths, frames_slots, frames_values = pad_frames([frames_embedded], len(frames_embedded))

with tf.Session(graph=graph) as sess:
    saver.restore(sess, '../checkpoints/gcloud_responses_20/checkpoints.ckpt-9')
    fd = {
        is_training_ph: False,
        inputs_ph: [current_input_embedding_ids],
        input_lengths_ph: [len(current_input_embedding_ids)],
        previous_responses_ph: [previous_response_embedding_ids],
        previous_response_lengths_ph: [len(previous_response_embedding_ids)],
        frames_slots_ph: [frames_slots],
        frames_values_ph: [frames_values],
        frames_slot_counts_ph: [frames_slot_counts],
        frames_value_lengths_ph: [frames_value_lengths],
        previous_active_frame_ph: [previous_active_frame_id],
        database_results_count_ph: [1]
    }
    result, frame_references, decoded_ids = sess.run([model.informable_slots_p, model.frame_references_p, model.decoder_embedding_ids], feed_dict = fd)
    
    print('Frames:')
    for frame in frames:
        print(frame)
    
    print('Agent response:', previous_response)
    print('User input:', current_input)
    print('Agent response:', ' '.join([glove_index.get(word_idx) for word_idx in decoded_ids.reshape(-1).tolist()]))
    
    # Batch of one: take the scalar index of the most probable frame reference.
    referred_frame_idx = int(np.argmax(frame_references, -1)[0])
    parsed_slots = parse_slots(current_input, np.argmax(result, axis=2).reshape(-1))
    
    if referred_frame_idx == 0:
        print('Previous frame')
    elif len(frame_references[0])-1 == referred_frame_idx:
        print('New frame:', dict(frames[previous_active_frame_id], **parsed_slots))
    else:
        print('Referred frame:', frames[referred_frame_idx-1])
    print()

        
    print(frame_references)
    
#     print(np.argmax(fr_p, axis=-1))
    print(parsed_slots)

INFO:tensorflow:Restoring parameters from ../checkpoints/gcloud_responses_9/checkpoints.ckpt-23
Frames:
{'or_city': 'UNK', 'dst_city': 'UNK'}
Agent response: _
User input: Hey! I would like to book a flight to Frankfurt
Agent response: okay , i can help you with that ! where are you coming from ? <EOS>
Previous frame

[[ 0.48071796  0.48071796  0.03856414]]
{'dst_city': 'frankfurt'}
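
The grouping done by `parse_slots` can be tried in isolation. The sketch below follows the same logic but is self-contained: it splits on whitespace instead of calling `nltk.word_tokenize`, and the function name `group_slot_tokens`, the slot ids and the `slot_names` mapping are hypothetical stand-ins for `slots_index`.

```python
def group_slot_tokens(tokens, slot_ids, slot_names):
    """Merge tokens that carry the same slot id into one value per slot."""
    assert len(tokens) == len(slot_ids)

    resolved = []
    for token, slot_id in zip(tokens, slot_ids):
        if slot_id <= 0:
            continue  # ids <= 0 are skipped, as in parse_slots
        if resolved and resolved[-1][0] == slot_id:
            # Same slot as the last recorded group: extend its value.
            resolved[-1].append(token)
        else:
            resolved.append([slot_id, token])

    return {slot_names.get(val[0]): ' '.join(val[1:]) for val in resolved}


slot_names = {3: 'or_city', 5: 'dst_city'}  # hypothetical id -> name mapping
tokens = 'i want to fly from new york to frankfurt'.split()
slot_ids = [0, 0, 0, 0, 0, 3, 3, 0, 5]
parsed = group_slot_tokens(tokens, slot_ids, slot_names)
print(parsed)  # {'or_city': 'new york', 'dst_city': 'frankfurt'}
```

Because the comparison is made against the last recorded group, two runs of the same slot id separated only by unlabeled tokens are merged into one value, and a slot that reappears later overwrites its earlier value in the returned dict.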


In [936]:
glove_index

{0: '<PAD>',
 1: '<EOS>',
 2: '<UNK>',
 3: '<VAL.true>',
 4: '<VAL.false>',
 5: '<VAL.any>',
 6: '<SLOT.<NO_SLOT>>',
 7: '<SLOT.price>',
 8: '<SLOT.min_duration>',
 9: '<SLOT.str_date>',
 10: '<SLOT.count_name>',
 11: '<SLOT.dst_city>',
 12: '<SLOT.category>',
 13: '<SLOT.wifi>',
 14: '<SLOT.dep_time_dst>',
 15: '<SLOT.breakfast>',
 16: '<SLOT.count>',
 17: '<SLOT.university>',
 18: '<SLOT.gym>',
 19: '<SLOT.budget_ok>',
 20: '<SLOT.park>',
 21: '<SLOT.max_duration>',
 22: '<SLOT.count_dst_city>',
 23: '<SLOT.dep_time_or>',
 24: '<SLOT.downtown>',
 25: '<SLOT.vicinity>',
 26: '<SLOT.count_amenities>',
 27: '<SLOT.arr_time_or>',
 28: '<SLOT.beach>',
 29: '<SLOT.amenities>',
 30: '<SLOT.count_category>',
 31: '<SLOT.end_date>',
 32: '<SLOT.name>',
 33: '<SLOT.end_date_ok>',
 34: '<SLOT.dst_city_ok>',
 35: '<SLOT.arr_time_dst>',
 36: '<SLOT.or_city>',
 37: '<SLOT.spa>',
 38: '<SLOT.n_children>',
 39: '<SLOT.palace>',
 40: '<SLOT.duration>',
 41: '<SLOT.seat_ok>',
 42: '<SLOT.flex>',
 43: 

In [214]:

class ConversationalModel():
    
    def __init__(self, inputs, input_lengths, last_responses, last_response_lengths, frame_slots, frame_values, frame_slot_counts, frame_value_lengths, embeddings_shape=None, trainable_embeddings=False, n_slots=5):
        '''
        :param inputs: Current user input [batch_size x n_tokens]
        :param input_lengths: Number of tokens per input utterance [batch_size]
        :param last_responses: Last agent responses [batch_size x n_tokens]
        :param last_response_lengths: Number of tokens per response [batch_size]
        :param frame_slots: Input frame slots [batch_size x n_frames x n_slots]
        :param frame_values: Input frame values [batch_size x n_frames x n_slots x n_tokens]
        :param frame_slot_counts: Number of slots per input frame [batch_size x n_frames]
        :param frame_value_lengths: Input frame value lengths [batch_size x n_frames x n_slots]
        '''
        self._inputs = inputs
        self._input_lenghts = input_lengths
        self._last_responses = last_responses
        self._last_response_lengths = last_response_lengths
        self._frame_slots = frame_slots
        self._frame_values = frame_values
        self._frame_slot_counts = frame_slot_counts
        self._frame_value_lengths = frame_value_lengths
        
        self._embeddings_shape = embeddings_shape
        self._trainable_embeddings = trainable_embeddings
        self._n_slots = n_slots
        
        self._le_cell_output_size = 300
        self._attention_output_size = 600
        self._slot_embeddings_size = 300

    def build(self):
        self._embeddings_module()
        self._language_encoder_module()
        self._informable_slots_parser()
#         self._requestable_slots_parser()
        self._frames_encoder_module()
        self._frame_references_parser()

    def embeddings_initializer(self):
        _placeholder = tf.placeholder(tf.float32)
        _op = self._word_embeddings.assign(_placeholder)
        return _placeholder, _op
        
    def _embeddings_module(self):
        with tf.name_scope('embeddings_module'):
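            # NOTE: tf.random_normal takes (shape, mean, stddev), so both initializers
            # below draw from a normal with mean -0.3 and stddev 0.3, not from [-0.3, 0.3].
            self._word_embeddings = tf.Variable(
                tf.random_normal(self._embeddings_shape, -.3, .3),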
                trainable = self._trainable_embeddings,
                name = 'word_embeddings'
            )
            self._slot_embeddings = tf.Variable(
                tf.random_normal([self._n_slots, self._slot_embeddings_size], -.3, .3),
                trainable = True,
                name = 'slot_embeddings'
            )
            
            self._inputs_embedded = tf.nn.embedding_lookup(self._word_embeddings, self._inputs)
            self._last_responses_embedded = tf.nn.embedding_lookup(self._word_embeddings, self._last_responses)
            
            self._frame_slots_embedded = tf.nn.embedding_lookup(self._slot_embeddings, self._frame_slots)
            self._frame_values_embedded = tf.nn.embedding_lookup(self._word_embeddings, self._frame_values)
        
    def _language_encoder(self, inputs, sequence_lengths, name='language_encoder', reuse=False):
        with tf.variable_scope(name, reuse=reuse):
            zero_pad = tf.zeros([tf.shape(inputs)[0], 1, self._embeddings_shape[1]])
            inputs_padded = tf.concat([zero_pad, inputs, zero_pad], 1)
            
            # NOTE: inputs_convolved is not consumed below; the bi-RNN reads `inputs` directly.
            inputs_convolved = tf.layers.conv1d(
                inputs_padded,
                filters = self._embeddings_shape[1],
                kernel_size = 3,
                use_bias = False,
                padding = 'valid',
                name='outputs_conv'
            )
            
            fw_cell = GRUCell(self._le_cell_output_size, activation=tf.nn.tanh)
            bw_cell = GRUCell(self._le_cell_output_size, activation=tf.nn.tanh)

            outputs, state = tf.nn.bidirectional_dynamic_rnn(
                fw_cell, bw_cell,
                inputs = inputs,
                sequence_length = sequence_lengths,
                dtype = tf.float32,
            )
            
            return tf.concat(outputs, 2, name='le_outputs'), tf.concat(state, 1, name='le_state')
    
    def _language_encoder_module(self):
        with tf.name_scope('language_encoder_module'):
            self._inputs_encoded_outputs, self._inputs_encoded_state = self._language_encoder(
                inputs = self._inputs_embedded,
                sequence_lengths = self._input_lenghts
            )
            self._last_responses_encoded_outputs, self._last_responses_encoded_state = self._language_encoder(
                inputs = self._last_responses_embedded,
                sequence_lengths = self._last_response_lengths,
                reuse = True
            )
    
    def _informable_slots_parser(self):
        with tf.name_scope('informable_slots_parser_module'):
            e_inputs = tf.layers.dense(
                self._inputs_encoded_outputs,
                self._attention_output_size,
                activation=tf.nn.tanh,
                name='weights_projection'
            )
            e_last_responses = tf.layers.dense(
                self._last_responses_encoded_outputs,
                self._attention_output_size,
                activation=tf.nn.tanh,
                name='weights_projection',
                reuse=True
            )
            
            e = tf.matmul(e_inputs, e_last_responses, transpose_b=True, name='e')
            beta = tf.matmul(tf.nn.softmax(e), self._last_responses_encoded_outputs)
            
            inputs_compared = tf.layers.dense(
                tf.concat([self._inputs_encoded_outputs, beta], 2),
                self._attention_output_size,
                activation=tf.nn.tanh,
                name='comparison'
            )
            
            self._informable_slots_logits = tf.layers.dense(
                inputs_compared,
                self._n_slots,
            )
            self.informable_slots_p = tf.nn.softmax(self._informable_slots_logits)
    
    def _requestable_slots_parser(self):
        pass
    
    def _frames_encoder_module(self):
        with tf.name_scope('frames_encoder_module'):
            batch_size, n_frames, n_slots, n_tokens, _ = tf.unstack(tf.shape(self._frame_values_embedded))
            _, le_state = self._language_encoder(
                tf.reshape(self._frame_values_embedded, [-1, n_tokens, self._embeddings_shape[1]]),
                tf.reshape(self._frame_value_lengths, [-1]),
                reuse=True
            )
            frames_slot_value = tf.concat([
                tf.reshape(self._frame_slots_embedded, [batch_size*n_frames, n_slots, self._slot_embeddings_size]),
                tf.reshape(le_state, [batch_size*n_frames, n_slots, 2*self._le_cell_output_size]),
            ], 2)
            
            _, frames_encoder_state = tf.nn.dynamic_rnn(
                GRUCell(2*self._le_cell_output_size, activation=tf.nn.tanh),
                inputs = frames_slot_value,
                sequence_length = tf.reshape(self._frame_slot_counts, [-1]),
                dtype=tf.float32
            )

            self._frames_encoded = tf.reshape(frames_encoder_state, [batch_size, n_frames, 2*self._le_cell_output_size])
            
    def _frame_references_parser(self):
        batch_size, n_tokens = tf.unstack(tf.shape(self._inputs))
        token_slots = tf.matmul(
            tf.reshape(self.informable_slots_p, [-1, self._n_slots]),
            self._slot_embeddings
        )
        token_slot_value = tf.concat([
            tf.reshape(token_slots, [batch_size, n_tokens, self._slot_embeddings_size]),
            self._inputs_encoded_outputs,
        ], 2)
        token_slot_projected = tf.layers.dense(
            token_slot_value,
            2*self._le_cell_output_size,
            activation = tf.nn.tanh
        )
        
        # Projection
        e_tokens = tf.layers.dense(
            token_slot_projected,
            self._attention_output_size,
            activation = tf.nn.tanh,
            name='frames_attention_weights_projection',
        )
        e_frames = tf.layers.dense(
            self._frames_encoded,
            self._attention_output_size,
            activation = tf.nn.tanh,
            name='frames_attention_weights_projection',
            reuse=True
        )
        
        self._frame_references_logits = tf.matmul(e_tokens, e_frames, transpose_b=True, name='e')
        self.frame_references_p = tf.nn.softmax(self._frame_references_logits)
        
    def losses(self, informable_slots_targets, frame_references_targets):
        informable_slots_stepwise_ce = tf.nn.softmax_cross_entropy_with_logits(
            labels = tf.one_hot(informable_slots_targets, self._n_slots),
            logits = self._informable_slots_logits
        )    
        informable_slots_loss = tf.reduce_mean(informable_slots_stepwise_ce)
        informable_slots_accuracy = tf.reduce_mean(tf.cast(tf.equal(informable_slots_targets, tf.argmax(self.informable_slots_p, -1)), tf.float32))
        
        frame_references_framewise_ce = tf.nn.softmax_cross_entropy_with_logits(
            labels = tf.one_hot(frame_references_targets, tf.shape(self._frame_slots)[1]+1),
            logits = self._frame_references_logits,
        )
        
        frame_references_loss = tf.reduce_mean(frame_references_framewise_ce)
        frame_references_accuracy = tf.reduce_mean(tf.cast(tf.equal(frame_references_targets, tf.argmax(self.frame_references_p, -1)), tf.float32))
        
        total_loss = tf.reduce_sum([informable_slots_loss, frame_references_loss])

        tf.summary.scalar('total_loss', total_loss)
        tf.summary.scalar('informable_slots_loss', informable_slots_loss)
        tf.summary.scalar('informable_slots_accuracy', informable_slots_accuracy)
        tf.summary.scalar('frame_references_loss', frame_references_loss)
        tf.summary.scalar('frame_references_accuracy', frame_references_accuracy)
        
        return (
            total_loss,
            informable_slots_loss,
            informable_slots_accuracy,
            frame_references_loss,
            frame_references_accuracy,
        )
    
    
with tf.Graph().as_default():
    inputs_ph = tf.placeholder(tf.int64, [64, 10])
    input_lengths_ph = tf.placeholder(tf.int64, [64])
    last_responses_ph = tf.placeholder(tf.int64, [64,7])
    last_response_lengths_ph = tf.placeholder(tf.int64, [64])
    frame_slots_ph = tf.placeholder(tf.int64, [64,3,5])
    frame_values_ph = tf.placeholder(tf.int64, [64,3,5,10])
    frame_slot_counts_ph = tf.placeholder(tf.int64, [64,3])
    frame_value_lengths_ph = tf.placeholder(tf.int64, [64,3,5])
    
    informable_slots_targets_ph = tf.placeholder(tf.int64, [64,10])
    frame_references_targets_ph = tf.placeholder(tf.int64, [64,10])
    
    model = ConversationalModel(inputs_ph, input_lengths_ph, last_responses_ph, last_response_lengths_ph, frame_slots_ph, frame_values_ph, frame_slot_counts_ph, frame_value_lengths_ph, embeddings_shape=[9600,300])
    model.build()
    
    print(model.losses(informable_slots_targets_ph, frame_references_targets_ph))
    
    

(<tf.Tensor 'Sum:0' shape=() dtype=float32>, <tf.Tensor 'Mean:0' shape=() dtype=float32>, <tf.Tensor 'Mean_1:0' shape=() dtype=float32>, <tf.Tensor 'Mean_2:0' shape=() dtype=float32>, <tf.Tensor 'Mean_3:0' shape=() dtype=float32>)
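
The frame reference scoring at the end of `build()` can be easier to follow outside the graph. Below is a minimal NumPy sketch of the same idea, not the model's actual code: one shared `tanh` projection is applied to both the token and the frame representations (which is what giving both `tf.layers.dense` calls the same `name` with `reuse=True` amounts to), and a batched dot product yields one logit per (token, frame) pair. The function name, the random inputs and all dimension sizes are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def frame_reference_probabilities(tokens, frames, kernel, bias):
    """tokens: [batch, n_tokens, d], frames: [batch, n_frames, d],
    kernel: [d, k], bias: [k]. Returns [batch, n_tokens, n_frames]."""
    # The same kernel and bias project both inputs (shared weights).
    e_tokens = np.tanh(tokens @ kernel + bias)
    e_frames = np.tanh(frames @ kernel + bias)
    # Batched matmul against the transposed frames: one score per (token, frame) pair.
    logits = e_tokens @ np.transpose(e_frames, (0, 2, 1))
    return softmax(logits, axis=-1)

# Illustrative sizes only: 2 dialogues, 6 tokens, 4 candidate frames.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 6, 8))
frames = rng.normal(size=(2, 4, 8))
kernel = rng.normal(size=(8, 5))
bias = np.zeros(5)

p = frame_reference_probabilities(tokens, frames, kernel, bias)
print(p.shape)  # (2, 6, 4)
```

Each row of `p` sums to 1 over the frame axis, so `np.argmax(p, axis=-1)` picks one frame per token, which is how `frame_references_p` is read in the inference cell.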


## Training

In [215]:
graph = tf.Graph()

with graph.as_default():
    inputs_ph = tf.placeholder(tf.int64, [None, None])
    input_lengths_ph = tf.placeholder(tf.int64, [None])
    last_responses_ph = tf.placeholder(tf.int64, [None, None])
    last_response_lengths_ph = tf.placeholder(tf.int64, [None])
    frame_slots_ph = tf.placeholder(tf.int64, [None, None, None])
    frame_values_ph = tf.placeholder(tf.int64, [None, None, None, None])
    frame_slot_counts_ph = tf.placeholder(tf.int64, [None, None])
    frame_value_lengths_ph = tf.placeholder(tf.int64, [None, None, None])
    
    informable_slots_targets_ph = tf.placeholder(tf.int64, [None, None])
    frame_references_targets_ph = tf.placeholder(tf.int64, [None, None])
    
    model = ConversationalModel(inputs_ph, input_lengths_ph, last_responses_ph, last_response_lengths_ph, frame_slots_ph, frame_values_ph, frame_slot_counts_ph, frame_value_lengths_ph, embeddings_shape=[len(embeddings),300], n_slots=len(slots_dictionary))
    model.build()
    
    embeddings_ph, embeddings_init_op = model.embeddings_initializer()
    
    total_loss, informable_slots_loss, informable_slots_accuracy, frame_references_loss, frame_references_accuracy = model.losses(informable_slots_targets_ph, frame_references_targets_ph)
    train_op = tf.train.AdamOptimizer().minimize(total_loss)
    
    saver = tf.train.Saver(max_to_keep=None)

In [216]:
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(embeddings_init_op, feed_dict={embeddings_ph: embeddings})
    
    try:
        for e in range(30):
            i = 0
            for (inputs, input_lengths, previous_responses, previous_response_lengths, frames_slot_counts, frames_value_lengths, frames_slots, frames_values, informable_slots_targets, frame_references_targets) in samples_iterator(train_data):
                i += 1
                fd = {
                    inputs_ph: inputs,
                    input_lengths_ph: input_lengths,
                    last_responses_ph: previous_responses,
                    last_response_lengths_ph: previous_response_lengths,
                    frame_slots_ph: frames_slots,
                    frame_values_ph: frames_values,
                    frame_slot_counts_ph: frames_slot_counts,
                    frame_value_lengths_ph: frames_value_lengths,
                    informable_slots_targets_ph: informable_slots_targets,
                    frame_references_targets_ph: frame_references_targets
                }
                
                _, total_loss_val, informable_slots_loss_val, informable_slots_accuracy_val, frame_references_loss_val, frame_references_accuracy_val = sess.run([train_op, total_loss, informable_slots_loss, informable_slots_accuracy, frame_references_loss, frame_references_accuracy], feed_dict = fd)
                print('Total loss:', total_loss_val, 'IS loss:', informable_slots_loss_val, 'IS acc:', informable_slots_accuracy_val, 'FR loss:', frame_references_loss_val, 'FR acc:', frame_references_accuracy_val)
#                 if i % 20 == 0:
#                     print('Epoch:', e, 'Step:', i, 'Loss:', total_loss_val, 'IS acc:', informable_slots_accuracy_val, 'FR acc:', frame_references_accuracy_val)

            saver.save(sess, '../checkpoints/conversational/intent_frame_ref_1/model', global_step=e)

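            # Evaluate on the held-out set. The iterator is assumed to yield the same
            # 10-tuple for test_data as for train_data; all placeholders are fed so that
            # any op in the graph can be evaluated.
            losses, accuracies = [], []
            for (inputs, input_lengths, previous_responses, previous_response_lengths, frames_slot_counts, frames_value_lengths, frames_slots, frames_values, informable_slots_targets, frame_references_targets) in samples_iterator(test_data):
                fd = {
                    inputs_ph: inputs,
                    input_lengths_ph: input_lengths,
                    last_responses_ph: previous_responses,
                    last_response_lengths_ph: previous_response_lengths,
                    frame_slots_ph: frames_slots,
                    frame_values_ph: frames_values,
                    frame_slot_counts_ph: frames_slot_counts,
                    frame_value_lengths_ph: frames_value_lengths,
                    informable_slots_targets_ph: informable_slots_targets,
                    frame_references_targets_ph: frame_references_targets
                }

                loss_val, accuracy_val = sess.run([informable_slots_loss, informable_slots_accuracy], feed_dict = fd)
                losses.append(loss_val)
                accuracies.append(accuracy_val)

            print('Epoch:', e, 'Mean loss:', np.mean(losses), 'Mean accuracy:', np.mean(accuracies))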
    except KeyboardInterrupt:
        print('Interrupted')

Total loss: 6.75288 IS loss: 3.99973 IS acc: 0.0388105 FR loss: 2.75315 FR acc: 0.165827
Total loss: 3.5116 IS loss: 2.26261 IS acc: 0.942857 FR loss: 1.24899 FR acc: 0.984821
Total loss: 3.56339 IS loss: 1.73722 IS acc: 0.946484 FR loss: 1.82617 FR acc: 0.982031
Total loss: 2.28559 IS loss: 1.13627 IS acc: 0.962402 FR loss: 1.14931 FR acc: 0.982422
Interrupted
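
Both accuracies above are averaged over every position in the batch, because `losses()` applies `tf.reduce_mean` without a mask. If padded positions carry target id 0 (an assumption about `samples_iterator`, not something shown here), they count as easy hits and can inflate the numbers. The NumPy sketch below shows one way to restrict the cross-entropy and accuracy to real tokens; the function name and the toy data are illustrative only. In the TF graph an analogous mask could probably be built with `tf.sequence_mask` from the length placeholders.

```python
import numpy as np

def masked_ce_and_accuracy(logits, targets, lengths):
    """logits: [batch, time, n_classes], targets: [batch, time] int,
    lengths: [batch] int. Averages only over positions t < lengths[b]."""
    time = logits.shape[1]
    # mask[b, t] is 1.0 for real tokens and 0.0 for padding.
    mask = (np.arange(time)[None, :] < np.asarray(lengths)[:, None]).astype(np.float64)
    # Log-softmax, computed stably.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the target class at every position.
    target_log_p = np.take_along_axis(log_p, targets[..., None], axis=-1)[..., 0]
    n_real = mask.sum()
    loss = -(target_log_p * mask).sum() / n_real
    correct = (log_p.argmax(axis=-1) == targets).astype(np.float64)
    accuracy = (correct * mask).sum() / n_real
    return loss, accuracy

# Two sequences of length 3 and 1, padded to 3 steps, with 2 classes.
logits = np.array([
    [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]],
    [[0.0, 2.0], [2.0, 0.0], [2.0, 0.0]],
])
targets = np.array([[0, 1, 1], [0, 0, 0]])

loss, accuracy = masked_ce_and_accuracy(logits, targets, lengths=[3, 1])
unmasked_accuracy = np.mean(logits.argmax(axis=-1) == targets)
print(accuracy)           # 0.5: 2 of the 4 real tokens are correct
print(unmasked_accuracy)  # 4 of 6 positions: the 2 padded ones count as hits
```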


In [198]:
def embed_sentence(sentence):
    return np.array([glove_dictionary.get(token, 2) for token in nltk.word_tokenize(str(sentence).lower())])

def parse_slots(sentence, slot_ids):
    tokens = nltk.word_tokenize(str(sentence).lower())
    assert len(tokens) == len(slot_ids)
    
    resolved = []
    for idx, slot_id in enumerate(slot_ids):
        if slot_id > 0:
            if resolved and resolved[-1][0] == slot_id:
                # Same slot as the previous tagged token: extend that value.
                resolved[-1].append(tokens[idx])
            else:
                resolved.append([slot_id, tokens[idx]])
    
    return {slots_index.get(val[0]): ' '.join(val[1:]) for val in resolved}

turn = frames_processed[12][1]

previous_active_frame_id = 0
frames = turn['frames']
previous_response = turn['previous_response']
current_input = turn['user_input']

previous_response_embedding_ids = embed_sentence(previous_response)
current_input_embedding_ids = embed_sentence(current_input)
frames_embedded = embed_frames(frames)

frames_slot_counts, frames_value_lengths, frames_slots, frames_values = pad_frames([frames_embedded], len(frames_embedded))

with tf.Session(graph=graph) as sess:
    saver.restore(sess, '../checkpoints/gcloud_intent_frames_4/checkpoints.ckpt-21')
    fd = {
        inputs_ph: [current_input_embedding_ids],
        input_lengths_ph: [len(current_input_embedding_ids)],
        last_responses_ph: [previous_response_embedding_ids],
        last_response_lengths_ph: [len(previous_response_embedding_ids)],
        frame_slots_ph: frames_slots,
        frame_values_ph: frames_values,
        frame_slot_counts_ph: frames_slot_counts,
        frame_value_lengths_ph: frames_value_lengths,
    }
    result, fr_p = sess.run([model.informable_slots_p, model.frame_references_p], feed_dict = fd)
    
    print(fr_p)
    print(np.argmax(fr_p, axis=-1))
    print(parse_slots(
        current_input,
        np.argmax(result, axis=2).reshape(-1)
    ))

INFO:tensorflow:Restoring parameters from ../checkpoints/gcloud_intent_frames_4/checkpoints.ckpt-21
[[[  9.99997973e-01   1.89331308e-06   3.18753521e-08   1.17094636e-07]
  [  9.99651313e-01   2.84858339e-04   4.80335711e-06   5.90107593e-05]
  [  9.99999762e-01   1.82410517e-07   8.70987310e-11   1.22497373e-10]
  [  9.99763310e-01   2.25951866e-04   1.68528231e-06   9.11747793e-06]
  [  9.99996781e-01   3.19054561e-06   6.29734265e-09   1.18549073e-08]
  [  9.99998212e-01   1.84210762e-06   9.81873680e-11   1.29314712e-10]]]
[[0 0 0 0 0 0]]
{}
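
The empty dictionary on the last line is what `parse_slots` returns when every token is tagged with slot id 0. To make the grouping behaviour concrete, here is a small self-contained variant that takes pre-tokenized input instead of calling `nltk`. The function name `group_slot_tokens`, the slot names and the slot ids are hypothetical and chosen only for illustration.

```python
def group_slot_tokens(tokens, slot_ids, slots_index):
    """Merge consecutive tagged tokens that share a slot id into one value."""
    assert len(tokens) == len(slot_ids)
    resolved = []
    for token, slot_id in zip(tokens, slot_ids):
        if slot_id <= 0:
            continue  # id 0 is treated as "no slot", as in parse_slots
        if resolved and resolved[-1][0] == slot_id:
            resolved[-1].append(token)
        else:
            resolved.append([slot_id, token])
    return {slots_index.get(run[0]): ' '.join(run[1:]) for run in resolved}

# Hypothetical slot ids and names, for illustration only.
slots_index = {1: 'dst_city', 2: 'budget'}
tokens = ['fly', 'to', 'new', 'york', 'under', '500']

tagged = group_slot_tokens(tokens, [0, 0, 1, 1, 0, 2], slots_index)
untagged = group_slot_tokens(tokens, [0, 0, 0, 0, 0, 0], slots_index)
print(tagged)    # {'dst_city': 'new york', 'budget': '500'}
print(untagged)  # {}
```

Untagged tokens do not break a run: two tokens with the same slot id separated only by id-0 tokens end up in the same value, and a slot that appears in two separate runs keeps only the last one because the result is a dictionary.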
