# Hands On

Now let's get our hands dirty. We walk through the whole training pipeline and then look at how to write a custom Keras layer for the SC-LSTM cell. You can download the whole codebase here: [Cool Code Here](https://github.com/jderiu/e2e_nlg) 

## Preprocessing
The first step in every machine learning pipeline is the preprocessing, which consists of the following steps:
- Delexicalizing the data: replacing attribute values such as the name of the restaurant with placeholders.
- Vectorizing the data: translating the meaning representations into binarized vectors and transforming the utterances into lists of indices, where each character is represented by its index in the vocabulary (i.e. a char2idx mapping).
- Extracting the syntactic information: getting the first word of the utterance and of the follow-up sentences and encoding those into a binary vector.
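
To make the delexicalization step concrete, here is a minimal sketch in plain Python. It is not the repository's `_delex_nlg_data` function; the helper name `delexicalize` and the example utterance are assumptions made for illustration, and only the placeholder tokens are taken from this post.

```python
def delexicalize(text, mr, placeholders):
    """Replace attribute values that occur in `text` with their placeholder tokens."""
    for attribute, token in placeholders.items():
        value = mr.get(attribute)
        if value:  # the attribute may be missing from this meaning representation
            text = text.replace(value, token)
    return text


# Hypothetical example data, modelled on the restaurant domain used in this post
mr = {'name': 'The Golden Palace', 'eatType': 'coffee shop', 'food': 'Japanese'}
placeholders = {'name': 'XNAMEX', 'near': 'XNESRX', 'food': 'XFOODX'}
utterance = 'The Golden Palace is a coffee shop serving Japanese food.'

print(delexicalize(utterance, mr, placeholders))
```

A plain string replacement like this only works when the value appears verbatim in the utterance, which is why the real preprocessing code is more involved.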

In [1]:
import logging
import os, sys
import pprint

pp = pprint.PrettyPrinter(width=41, compact=True)
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

#makes sure that the modules can be loaded
nb_dir = os.path.split(os.getcwd())[0]
nb_dir = nb_dir.replace('\\src', '')
sys.path.append(nb_dir)
logging.info('Base directory:' + nb_dir)
## should output: /some-path/e2e_nlg

from src.data_processing.delexicalise_data import _delex_nlg_data, _retrieve_mr_ontology, _load_attributes
data_path = os.path.join(nb_dir, 'data/e2e_nlg')
# List of what attributes we want to replace with a placeholder
delex_attributes = ["name", "near", "food"]
# File name for the table of attribute-placeholder pairs
attribute_fname = 'ontology/attribute_tags.txt'
attribute_tokens = _load_attributes(os.path.join(data_path, attribute_fname))

train_delex = _delex_nlg_data('trainset.csv', data_path, delex_attributes, attribute_tokens)
valid_delex = _delex_nlg_data('devset.csv', data_path, delex_attributes, attribute_tokens)
test_delex = _delex_nlg_data('testset.csv', data_path, delex_attributes, attribute_tokens)

print('Lengths of Trainset: {} Validationset: {} Testset: {}'.format(
    len(train_delex['mr_raw']), 
    len(valid_delex['mr_raw']), 
    len(test_delex['mr_raw'])))

#Print an example
idx = 110 #(change me)
print('Parsed MR:')
pp.pprint(train_delex['parsed_mrs'][idx])
print('Original Output: ', train_delex['outputs_raw'][idx])
print('Delexicalized Output: ', train_delex['delexicalised_texts'][idx])
pp.pprint(attribute_tokens)


2019-02-13 21:38:45,166 : INFO : Base directory:D:\GitRepos\e2e_nlg


Lengths of Trainset: 42061 Validationset: 4672 Testset: 4693
Parsed MR:
{'area': 'riverside',
 'customer rating': '5 out of 5',
 'eatType': 'coffee shop',
 'food': 'Japanese',
 'name': 'The Golden Palace',
 'priceRange': 'more than £30'}
Original Output:  The coffee shop The Golden Palace is north of the city centre. It serves expensive food and has a 5 star rating.
Delexicalized Output:  The coffee shop XNAMEX is north of the city centre. It serves expensive food and has a 5 star rating.
{'area': 'XAREAX',
 'customer rating': 'XCUSTX',
 'eatType': 'XEATX',
 'familyFriendly': 'XFAMX',
 'food': 'XFOODX',
 'name': 'XNAMEX',
 'near': 'XNESRX',
 'priceRange': 'XPRICX'}


Next we need to extract the data ontology, which we need to vectorize the data later. The ontology is a dictionary that maps each attribute to its value2idx vocabulary. 

In [2]:
full_mr_list = train_delex['parsed_mrs'] + valid_delex['parsed_mrs'] + test_delex['parsed_mrs']
mr_data_ontology = _retrieve_mr_ontology(full_mr_list)
print('List of attributes:')
pp.pprint(list(mr_data_ontology.keys()))
print('Value2Idx Vocabulary for priceRange:')
pp.pprint(mr_data_ontology['priceRange'])

List of attributes:
['name', 'eatType', 'priceRange',
 'customer rating', 'near', 'food',
 'area', 'familyFriendly']
Value2Idx Vocabulary for priceRange:
{'cheap': 0,
 'high': 1,
 'less than £20': 2,
 'moderate': 3,
 'more than £30': 4,
 '£20-25': 5}
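
For intuition, an ontology of this shape could be built roughly as follows. This is a sketch, not the code behind `_retrieve_mr_ontology`: the function name `build_ontology` is hypothetical, and assigning indices in sorted order is an assumption (it happens to agree with the `priceRange` output above).

```python
def build_ontology(parsed_mrs):
    """Collect all values per attribute and map each value to an index."""
    values_per_attribute = {}
    for mr in parsed_mrs:
        for attribute, value in mr.items():
            values_per_attribute.setdefault(attribute, set()).add(value)
    # Assumption: indices follow the sorted order of the values
    return {
        attribute: {value: idx for idx, value in enumerate(sorted(values))}
        for attribute, values in values_per_attribute.items()
    }


# Hypothetical toy data
toy_mrs = [
    {'name': 'The Golden Palace', 'priceRange': 'more than £30'},
    {'name': 'The Phoenix', 'priceRange': 'cheap'},
    {'name': 'The Phoenix', 'priceRange': 'moderate'},
]
toy_ontology = build_ontology(toy_mrs)
print(toy_ontology['priceRange'])
```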


# Vectorization
Next, we transform the preprocessed data into vectors, which can be interpreted by our neural network. This requires the following steps:
- Transform the meaning representations into a binary representation. For this, we rely on the ontology we extracted in the cell above.
- Transform the utterances into lists of indices, which are then given as input to the neural network. Each index corresponds to an alphanumeric character.  

## Vectorize Meaning Representations
For each attribute, we create a one-hot encoded vector, which indicates which value is present in the utterance. We add an extra dimension to the vectors for those cases where the attribute is missing. Note that the vectors of the delexicalized attributes only have a length of two. They just indicate whether the attribute is present or not, since the value is replaced by a placeholder.

We first take the processing of one MR apart and then we process the whole dataset.

In [3]:
from src.data_processing.vectorize_data import _compute_vector_length, _vectorize_single_mr

# First compute the length of the one-hot encoded vectors:
vector_lengts = _compute_vector_length(mr_data_ontology, delex_attributes)
pp.pprint(vector_lengts)

{'area': 3,
 'customer rating': 7,
 'eatType': 4,
 'familyFriendly': 3,
 'food': 2,
 'name': 2,
 'near': 2,
 'priceRange': 7}


In [4]:
#Process one meaning representation:
mr = train_delex['parsed_mrs'][idx]
vec = _vectorize_single_mr(mr, mr_data_ontology, vector_lengts, delex_attributes)
pp.pprint(train_delex['parsed_mrs'][idx]['priceRange'])
pp.pprint(vec['priceRange'])

'more than £30'
array([[0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.]])
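
The encoding of a single attribute can be sketched in a few lines of plain Python. This is an illustration rather than the repository's `_vectorize_single_mr`: the function name `one_hot_attribute` is hypothetical, and the position of the "missing" slot (last) as well as the present/absent order for delexicalized attributes are assumptions.

```python
def one_hot_attribute(mr, attribute, value2idx, delexicalized=False):
    """One-hot encode one attribute of a meaning representation."""
    if delexicalized:
        # Assumption: index 0 means "present", index 1 means "absent"
        vector = [0.0, 0.0]
        vector[0 if attribute in mr else 1] = 1.0
        return vector
    # One slot per known value plus one extra slot for a missing attribute
    vector = [0.0] * (len(value2idx) + 1)
    if attribute in mr:
        vector[value2idx[mr[attribute]]] = 1.0
    else:
        vector[-1] = 1.0  # assumption: the extra slot is the last one
    return vector


price_vocab = {'cheap': 0, 'high': 1, 'less than £20': 2,
               'moderate': 3, 'more than £30': 4, '£20-25': 5}
example_mr = {'name': 'The Golden Palace', 'priceRange': 'more than £30'}

print(one_hot_attribute(example_mr, 'priceRange', price_vocab))
print(one_hot_attribute(example_mr, 'near', {}, delexicalized=True))
```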


Now we run the meaning representation vectorization over the whole dataset. We store the result in a dictionary of attribute name to vectors. Each row corresponds to one datapoint. 

In [5]:
# Vectorize meaning representations
from src.data_processing.vectorize_data import _vectorize_mrs

train_mr_vecs = _vectorize_mrs(train_delex['parsed_mrs'], mr_data_ontology, delex_attributes)
valid_mr_vecs = _vectorize_mrs(valid_delex['parsed_mrs'], mr_data_ontology, delex_attributes)
test_mr_vecs = _vectorize_mrs(test_delex['parsed_mrs'], mr_data_ontology, delex_attributes)
print('Dimensions: {}'.format(train_mr_vecs['priceRange'].shape))

Dimensions: (42061, 7)


## Vectorize Utterances
Now we just need to create the representation for the utterances. For this, we load the vocabulary and then just apply the transformation. One important detail: since Keras works with fixed-length sequences, we need to pad the texts (or cut them off) so that all the vectors have the same length.

In [6]:
#Step 1 Load Vocabulary
import json
from src.data_processing.utils import convert2indices 

with open(os.path.join(data_path, 'vocabulary.json'), 'rt', encoding='utf-8') as char_file:
    char_vocab = json.load(char_file)
print('Vocab Len: {}'.format(len(char_vocab)))
print('Idx of character "a": {}'.format(char_vocab['a']))

#Always use a dummy character for padding and a unk character for unknown tokens (or characters in this case)
dummy_char = max(char_vocab.values()) + 1
unk_char = max(char_vocab.values()) + 2

print('Dummy Idx: {} Unknown Idx: {}'.format(dummy_char, unk_char))

#Step 2 Convert to Indices
max_sentence_len = 256

train_idx_data = convert2indices(train_delex['delexicalised_texts'], char_vocab, dummy_char, unk_char, max_sentence_len)
valid_idx_data = convert2indices(valid_delex['delexicalised_texts'], char_vocab, dummy_char, unk_char, max_sentence_len)
test_idx_data = convert2indices(test_delex['delexicalised_texts'], char_vocab, dummy_char, unk_char, max_sentence_len)

#The shape is number of datapoints x sentence length
print('Input Data Shape: {}'.format(train_idx_data.shape))

Vocab Len: 68
Idx of character "a": 5
Dummy Idx: 68 Unknown Idx: 69
Input Data Shape: (42061, 256)
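
The padding and truncation logic can be illustrated with a small stand-alone function. It is a sketch and not the implementation of `convert2indices`: the name `to_padded_indices` is hypothetical, and padding on the right as well as cutting off the end of long texts are assumptions.

```python
def to_padded_indices(text, char2idx, pad_idx, unk_idx, max_len):
    """Map characters to indices, then pad or cut to exactly `max_len` entries."""
    indices = [char2idx.get(ch, unk_idx) for ch in text[:max_len]]
    return indices + [pad_idx] * (max_len - len(indices))


# Hypothetical toy vocabulary
toy_vocab = {'a': 0, 'b': 1, 'c': 2}
pad_idx = max(toy_vocab.values()) + 1  # dummy character used for padding
unk_idx = max(toy_vocab.values()) + 2  # index for characters outside the vocabulary

print(to_padded_indices('abz', toy_vocab, pad_idx, unk_idx, max_len=5))
print(to_padded_indices('abcabc', toy_vocab, pad_idx, unk_idx, max_len=5))
```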


# First Word Features
Next we prepare the syntactic features, which we use to add more variety to the generated utterances. For the sake of brevity, we only show the extraction of the first word features. The other two manipulations are present in the full code version on GitHub. 

The extraction of the first word is done in the following steps:
- Word tokenize all delexicalized utterances.
- Extract the first word of each utterance. 
- Create a vocabulary of first words, i.e. a first word-to-idx mapping. We only keep first words that appear at least 100 times. Otherwise the neural network has difficulty learning the correlation between the first word and the utterance.
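
The steps above can be sketched with the standard library alone. The helper names `first_word_vocab` and `first_word_one_hot` are hypothetical and do not mirror the repository's functions; mapping rare first words to one extra slot at the end of the vector is an assumption.

```python
from collections import Counter


def first_word_vocab(tokenized_utterances, min_freq):
    """Map each sufficiently frequent first word to an index."""
    counts = Counter(utt[0][0] for utt in tokenized_utterances if utt and utt[0])
    frequent = [word for word, count in counts.most_common() if count >= min_freq]
    return {word: idx for idx, word in enumerate(frequent)}


def first_word_one_hot(utterance, vocab):
    """One-hot vector with one extra slot for first words outside the vocabulary."""
    vector = [0.0] * (len(vocab) + 1)
    vector[vocab.get(utterance[0][0], len(vocab))] = 1.0
    return vector


# Each utterance is a list of sentences, each sentence a list of tokens
toy_utterances = [
    [['The', 'pub', 'is', 'cheap', '.']],
    [['The', 'cafe', 'is', 'nice', '.'], ['It', 'is', 'cheap', '.']],
    [['Located', 'near', 'the', 'river', '.']],
]
toy_fw_vocab = first_word_vocab(toy_utterances, min_freq=2)
print(toy_fw_vocab)
print(first_word_one_hot(toy_utterances[2], toy_fw_vocab))
```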

In [7]:
#Step 1: Tokenize the Utterances
from src.data_processing.surface_feature_vectors import _sentence_tok, _utterance_first_word_vocab, _utt_fw_features

train_tok = _sentence_tok(train_delex['delexicalised_texts'])
valid_tok = _sentence_tok(valid_delex['delexicalised_texts'])
test_tok = _sentence_tok(test_delex['delexicalised_texts'])

#Print an example: Note that we tokenize on both the sentence and the word level. The sentence-level tokenization can be used for other manipulations.
print(train_tok[idx])

#Step 2: Generate fw2idx mapping
utt_fw_vocab = _utterance_first_word_vocab(train_tok + valid_tok + test_tok, min_freq=100)
inverse_utt_fw_vocab = {v: k for k, v in utt_fw_vocab.items()}
print('Mapping from First Word 2 Index:')
pp.pprint(list(utt_fw_vocab.items()))

#Step 3 Create Surface Level Features
train_utt_fw = _utt_fw_features(train_tok, utt_fw_vocab)
valid_utt_fw = _utt_fw_features(valid_tok, utt_fw_vocab)
test_utt_fw = _utt_fw_features(test_tok, utt_fw_vocab)

utt_fw_input_dimension = train_utt_fw.shape[1]
print('Shape of First words: {}'.format(train_utt_fw.shape))
print('The word "{}" corresponds to : {}'.format(train_tok[idx][0][0], train_utt_fw[idx]))


[['The', 'coffee', 'shop', 'XNAMEX', 'is', 'north', 'of', 'the', 'city', 'centre', '.'], ['It', 'serves', 'expensive', 'food', 'and', 'has', 'a', '5', 'star', 'rating', '.']]
Mapping from First Word 2 Index:
[('XNAMEX', 0), ('Located', 1),
 ('For', 2), ('In', 3), ('A', 4),
 ('XNESRX', 5), ('An', 6), ('Near', 7),
 ('There', 8), ('On', 9), ('XFOODX', 10),
 ('The', 11), ('With', 12),
 ('Serving', 13), ('If', 14), ('At', 15),
 ('Riverside', 16), ('By', 17),
 ('You', 18), ('Family', 19)]
Shape of First words: (42061, 21)
The word "The" corresponds to : [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


# WAIT A MOMENT: THAT'S CHEATING !!!!
Of course, we do not have access to the correct first word at test time. This is indeed a major drawback of this approach. The solution is to sample n different first words for each meaning representation at test time, which yields n different utterances. We then have to rank those utterances according to their semantic correctness, as there are conflicting combinations of meaning representations and first words, for instance when no location is mentioned but the first word is "Located". 

So let's sample 10 different first words for each meaning representation in the test set. We have an extra test set which contains only the meaning representations (i.e. no reference utterances given).

In [8]:
#Sample first-word features for the MR-only test set
import numpy as np
import random
from src.data_processing.generate_evaluation_data import _read_data, _parse_raw_mr
test_mr_only = os.path.join(data_path, 'test_mr_only.csv')

#Read the MR only test set 
test_mr_only_raw = _read_data(test_mr_only)
test_process_mr_only = _parse_raw_mr(test_mr_only_raw)
test_vectorised_mrs_only = _vectorize_mrs(test_process_mr_only, mr_data_ontology, delex_attributes)
test_mr_only_dummy_idx = np.zeros(shape=(len(test_mr_only_raw), max_sentence_len)) #dummy output idx

def sample_utt_fw_for_mr(nsamples):
    utt_fw_samples = random.sample(list(utt_fw_vocab.values()), k=nsamples)
    dummy_idx = max(utt_fw_vocab.values()) + 1
    utt_fw_vec = []
    for fidx in utt_fw_samples:
        v = np.zeros(shape=(dummy_idx + 1, ))
        v[fidx] = 1.0
        utt_fw_vec.append(v)
    return utt_fw_vec
    
first_word_features = []    
for mr in test_process_mr_only:
    utt_fw_vec = sample_utt_fw_for_mr(10)
    first_word_features.append(utt_fw_vec)

print('Len of Test Set: {}'.format(len(first_word_features)))
print(test_vectorised_mrs_only['name'].shape)

Len of Test Set: 630
(630, 2)
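
The ranking step is not shown in this post. As a much simpler stand-in, one could discard first words that obviously conflict with the meaning representation before generating anything. The sketch below does exactly that; it is a pre-filter and not the ranking described above, and the mapping `REQUIRED_ATTRIBUTE` as well as both function names are assumptions made for illustration.

```python
# Hypothetical mapping from a first word to the attribute it presupposes
REQUIRED_ATTRIBUTE = {'Located': 'area', 'Near': 'near', 'XNESRX': 'near'}


def is_compatible(first_word, mr):
    """A first word is compatible if the attribute it presupposes is in the MR."""
    required = REQUIRED_ATTRIBUTE.get(first_word)
    return required is None or required in mr


def filter_first_words(first_words, mr):
    """Keep only the sampled first words that do not conflict with the MR."""
    return [word for word in first_words if is_compatible(word, mr)]


example_mr = {'name': 'The Cricketers', 'eatType': 'coffee shop', 'near': 'Ranch'}
print(filter_first_words(['Located', 'Near', 'The'], example_mr))
```

Such a filter only catches conflicts that are listed by hand, so it complements rather than replaces a ranking of the generated utterances.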


# Finally some Deep Learning

The preprocessing is finally done, so now we can focus on training the semantically conditioned LSTM. For this, we build the architecture and then start the training procedure. 

The architecture is rather simple: 
- Inputs
    - We define inputs for each of the eight possible attributes from the domain ontology. 
    - We define inputs for the syntactic features (in this case, we only look at the first word of the utterance).
    - We define the input for the expected output utterance, which is used to perform teacher forcing and to compute the reconstruction loss.
- Embeddings. Since we work on the character level, we just use a one-hot representation of the characters. This means that we represent each character by a vector in which the entry at the character index is set to 1.
- Generator. We use a custom-made SC-LSTM layer (more on this in the next post), which has two inputs and three outputs:
    - Input 1: The meaning representation + syntactic representation input.
    - Input 2: The previously generated token. This is important as the generator needs to learn to produce the next character given the current cell state and the previously generated token. For this we just shift the output utterance by one to the right.
    - Outputs: The generated utterance, the last state of the extra cell (the meaning representation cell) and the history of all meaning representation states. These outputs are used to compute the loss. $$ F(\theta) = \sum_t p_t^T \log(y_t) + \left\| d_T \right\| + \sum_{t=0}^{T-1} \eta \xi^{\left\| d_{t+1} - d_t \right\|} $$ The loss is a combination of the reconstruction loss, the norm of the last meaning representation cell state (which should be 0) and a penalty on the difference between two consecutive cell states (to make sure that the cell does not get consumed too quickly).  
- Models. Finally two models are defined. 
    - The training model, which outputs the three losses.
    - The test model, which outputs the indices of the characters in the generated utterance. 
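
To see how the three loss terms behave, here is a small numeric sketch in plain Python. It follows the formula rather than the exact Keras code of this post, it writes the reconstruction term as the usual cross-entropy (negative log-likelihood), and the function name `sc_lstm_loss_terms` is hypothetical. The defaults for `eta` and `xi` are the values `n` and `zeta` used in the model code.

```python
import math


def l2_norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def sc_lstm_loss_terms(targets, predictions, da_history, eta=1e-4, xi=100.0):
    """Return (reconstruction, final-state norm, history penalty) for one utterance."""
    # Cross-entropy between one-hot targets and predicted distributions
    reconstruction = -sum(
        p * math.log(y)
        for p_t, y_t in zip(targets, predictions)
        for p, y in zip(p_t, y_t)
        if p > 0.0
    )
    # The meaning representation should be fully consumed at the end
    final_norm = l2_norm(da_history[-1])
    # Penalize large jumps between consecutive meaning representation states
    history = sum(
        eta * xi ** l2_norm([a - b for a, b in zip(d_next, d_prev)])
        for d_prev, d_next in zip(da_history[:-1], da_history[1:])
    )
    return reconstruction, final_norm, history


# Hypothetical toy values: two time steps, two classes, two MR dimensions
targets = [[1.0, 0.0], [0.0, 1.0]]
predictions = [[0.5, 0.5], [0.5, 0.5]]
da_history = [[1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]

print(sc_lstm_loss_terms(targets, predictions, da_history))
```

In this toy case the meaning representation is consumed one attribute per step and ends at zero, so the second term vanishes and the history penalty stays small.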


In [17]:
import keras.backend as K
import numpy as np
from keras.layers import Lambda, Embedding, Input, concatenate, ZeroPadding1D
from src.architectures.custom_layers.sem_recurrent import SC_LSTM
from keras.optimizers import Adadelta
from keras.models import Model

class SCLSTM(object):
    def __init__(self, syntactic_manipulation, vocabulary, sample_out_size):
        self.optimizer = Adadelta(lr=1, epsilon=1e-8, rho=0.95, decay=0.0001, clipnorm=10)
        self.vocabuary = vocabulary
        self.sample_out_size = sample_out_size
        
        self.build_model(syntactic_manipulation)
    
    def build_model(self, syntactic_manipulation):
        #max_sentence_len
        #we need 3 extra classes for the dummy-char, unk-char and one for start-char
        nclasses = len(self.vocabuary) + 3
        lstm_units = 1024
        dropout_word_idx = max(self.vocabuary.values()) + 1
        
        # == == == == == #
        # Define Inputs  #
        # == == == == == #
        
        semantic_inputs = []
        for attribute in sorted(list(vector_lengts.keys())):
            vec_len = vector_lengts[attribute]
            attr_idx = Input(batch_shape=(None, vec_len), dtype='float32', name='{}_idx'.format(attribute.replace(' ', '_')))
            semantic_inputs.append(attr_idx)
        
        if syntactic_manipulation:
            attr_idx = Input(batch_shape=(None, utt_fw_input_dimension), dtype='float32', name='{}_idx'.format('utt_fw'))
            inputs = semantic_inputs + [attr_idx]
        else:
            inputs = semantic_inputs
        
        meaning_representation = concatenate(inputs=inputs)
        
        # == == == == == #
        # Define Outputs #
        # == == == == == #
        
        output_idx = Input(batch_shape=(None, self.sample_out_size), dtype='int32', name='character_output')
        
        #we just represent characters as one-hot encoded vectors. This makes the computation of the reconstruction loss easier. 
        one_hot_weights = np.identity(nclasses)

        one_hot_out_embeddings = Embedding(
            input_length=self.sample_out_size,
            input_dim=nclasses,
            output_dim=nclasses,
            weights=[one_hot_weights],
            trainable=False,
            name='one_hot_out_embeddings'
        )
        
        #dimensions: batch_size x max_sentence_len x nclasses
        output_one_hot_embeddings = one_hot_out_embeddings(output_idx)
        
        # == == == == == ==#
        # Define Generator #
        # == == == == == ==#
        
        #Step 1: Prepend a start-vector to the inputs, since the generation of the next character is conditioned on the previous.
        #Thus, we need to shift the inputs by one. w_i ~ P(w_i | w_(i-1), ... , w_0, d_(i-1))
        padding = ZeroPadding1D(padding=(1, 0))(output_one_hot_embeddings)
        previous_char_slice = Lambda(lambda x: x[:, :-1,:], output_shape=(self.sample_out_size, nclasses))(padding)
        
        #Step 2: Define the recurrent layer. 
        lstm = SC_LSTM(
            lstm_units,
            nclasses,
            softmax_temperature=None,
            return_da=True,
            return_state=False,
            use_bias=True,
            return_sequences=True,
            implementation=2,
            dropout=0.2,
            recurrent_dropout=0.2,
            sc_dropout=0.2
        )
        
        generated_output, da_t, da_history = lstm([previous_char_slice, meaning_representation])
        
        # == == == == ==#
        # Define Losses #
        # == == == == ==#
        
        #Reconstruction Loss
        def vae_cross_ent_loss(args):
            x_truth, x_decoded_final = args
            x_truth_flatten = K.reshape(x_truth, shape=(-1, K.shape(x_truth)[-1]))
            x_decoded_flat = K.reshape(x_decoded_final, shape=(-1, K.shape(x_decoded_final)[-1]))
            cross_ent = K.categorical_crossentropy(x_truth_flatten, x_decoded_flat)
            cross_ent = K.reshape(cross_ent, shape=(-1, K.shape(x_truth)[1]))
            sum_over_sentences = K.sum(cross_ent, axis=1)
            return sum_over_sentences
        
        #Make sure the MR vector converges to 0, i.e. all the attributes have been consumed
        def da_loss_fun(args):
            da = args[0]
            return K.l2_normalize(da, axis=1)
        
        #Make sure the changes in the MR vector are not too large
        def da_history_loss_fun(args):
            da_t = args[0]
            zeta = 100
            n = 1e-4
            # shape: batch_size, sample_size
            norm_of_difference = K.sum(n*zeta**K.l2_normalize(da_t[:, 1:, :] - da_t[:, :-1, :], axis=1), axis=2)
            return K.sum(norm_of_difference, axis=1)
        
        main_loss = Lambda(vae_cross_ent_loss, output_shape=(1,), name='main')([output_one_hot_embeddings, generated_output])
        da_loss = Lambda(da_loss_fun, output_shape=(1,), name='dialogue_act')([da_t])
        da_history_loss = Lambda(da_history_loss_fun, output_shape=(1,), name='dialogue_history')([da_history])
        
        self.train_model = Model(inputs=inputs + [output_idx], outputs=[main_loss, da_loss, da_history_loss])
        
        # == == == == == == #
        # Define Test Model #
        # == == == == == == #
        
        argmax = Lambda(lambda x: K.argmax(x, axis=2), output_shape=(self.sample_out_size,))(generated_output)
        self.test_model = Model(inputs=inputs + [output_idx], outputs=argmax)
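
The shift used for teacher forcing (the `ZeroPadding1D` layer followed by the slicing `Lambda`) can be reproduced on plain lists. This is only an illustration of the idea for a single sequence; the function name `shift_right` is hypothetical.

```python
def shift_right(one_hot_sequence):
    """Prepend an all-zero start vector and drop the last time step."""
    n_classes = len(one_hot_sequence[0])
    start_vector = [0.0] * n_classes
    return [start_vector] + one_hot_sequence[:-1]


# Hypothetical one-hot encoded target with three time steps and three classes
target = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
decoder_input = shift_right(target)

for step, (previous, current) in enumerate(zip(decoder_input, target)):
    print(step, 'input:', previous, 'target:', current)
```

At every step the generator sees the character from the previous step as input and has to predict the current one, while the sequence length stays unchanged.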
        
        

Before we start the training procedure, we need to make sure that we can generate some outputs during the training phase. This is important because the losses alone are not enough to get a good grasp of the performance. Thus, we exploit a nice feature of Keras: custom callbacks. 
Callbacks are functions that can be passed to the training procedure. You can decide when a particular function should be executed: at the beginning or end of an epoch, of a batch, or of the whole training. We want to output the predictions for the test set at the end of each epoch.

In [18]:
import logging
from keras.callbacks import Callback


class OutputCallback(Callback):
    def __init__(self, test_model, model_input, da_acts, lex_dict, delex_vocab, char_vocab, sampled_features=-1, delimiter='', fname='../../logging/test_output'):
        self.model_input = model_input
        self.char_vocab = char_vocab
        self.test_model = test_model
        self.delimiter = delimiter
        self.fname = fname
        self.lex_dict = lex_dict
        self.da_acts = da_acts
        self.delex_vocab = delex_vocab
        self.sampled_features = sampled_features

        super(OutputCallback, self).__init__()

    def on_epoch_end(self, epoch, logs=None):
        ofile = open('{}_{}.txt'.format(self.fname, str(epoch)), 'wt', encoding='utf-8')
        inverse_vocab = {v: k for (k, v) in self.char_vocab.items()} # idx -> char mapping
        
        #output is a matrix of size: number of test samples x sentence length -> each row is one generated utterance
        predictions = self.test_model.predict(self.model_input, batch_size=1024, verbose=1)
        
        sen_dict = []
        idx = -1
        for i, sentence in enumerate(predictions):
            if i % self.sampled_features == 0 or self.sampled_features == -1:
                idx += 1
            
            #translate each utterance from a list of indices to characters
            list_txt_idx = [int(x) for x in sentence.tolist()]
            txt_list = [inverse_vocab.get(int(x), '') for x in list_txt_idx]
            oline = self.delimiter.join(txt_list)
            #replace the placeholders with their respective values (e.g. XNAME -> Blue Spice)
            for lex_key, val in self.lex_dict.items():
                original_token = self.delex_vocab[lex_key][idx]
                oline = oline.replace(val, original_token)
            sen_dict.append(oline)
            
        
        #write file   
        for sentence in sen_dict:
            ofile.write('{}\n'.format(sentence))
        ofile.write('\n')
        ofile.close()
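
The core of the callback, turning predicted indices back into text and putting the original values back in place of the placeholders, can be isolated in a few lines. This is a simplified sketch for a single utterance; the function name `decode_and_relexicalize` and the toy vocabulary are assumptions.

```python
def decode_and_relexicalize(indices, idx2char, placeholder_values):
    """Map indices to characters, then replace placeholders with real values."""
    # Indices without a character (e.g. padding) are dropped
    text = ''.join(idx2char.get(i, '') for i in indices)
    for placeholder, value in placeholder_values.items():
        text = text.replace(placeholder, value)
    return text


# Hypothetical toy vocabulary; index 99 stands in for the padding index
idx2char = {0: 'X', 1: 'N', 2: 'A', 3: 'M', 4: 'E', 5: ' ', 6: 'i', 7: 's'}
predicted = [0, 1, 2, 3, 4, 0, 5, 6, 7, 99, 99]

print(decode_and_relexicalize(predicted, idx2char, {'XNAMEX': 'Blue Spice'}))
```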


Finally, we need to prepare the input to the neural network. The Keras `fit` function needs two types of inputs:
- The inputs for the neural network, which in our case are the attribute vectors, the vector for the syntactic manipulation and the reference utterance.
- The outputs of the neural network, which are usually used to compute the loss. However, we compute the loss directly in our neural network, so we just use three dummy vectors, one for each loss component.
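
Why dummy targets are enough becomes clear when the loss function simply passes the model output through. The sketch below mimics this in plain Python, assuming the overall loss is the sum of the per-output batch means with equal weights; the function names and the numbers are hypothetical.

```python
def identity_loss(y_true, y_pred):
    """The model output already is the loss, so the targets are ignored."""
    return y_pred


def total_loss(model_outputs, dummy_targets):
    """Sum of the batch means of all loss outputs (equal weights assumed)."""
    return sum(
        sum(identity_loss(target, output)) / len(output)
        for target, output in zip(dummy_targets, model_outputs)
    )


# Hypothetical per-sample values for a batch of two:
# reconstruction loss, MR norm loss and MR history loss
model_outputs = [[420.0, 418.0], [0.0, 0.0], [1.5, 1.0]]
dummy_targets = [[1.0, 1.0]] * 3

print(total_loss(model_outputs, dummy_targets))
```

Changing the dummy targets has no effect on the result, which is why a vector of ones per loss component is sufficient.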


In [19]:
from src.data_processing.delexicalise_data import _get_delex_fields
from collections import defaultdict
model_object = SCLSTM(False, char_vocab, max_sentence_len)

def _prepare_input(mr_vecs, idx_data, surface_features):
    sem_input = sorted(list(mr_vecs.items()), key=lambda x: x[0])
    sem_input = [x[1] for x in sem_input]
    
    if surface_features is not None:
        input_data = sem_input + [surface_features]
    else:
        input_data = sem_input
    
    input_data += [idx_data]
    output_data = [np.ones(len(input_data[0]))] * 3
    
    return input_data, output_data

def _get_lexicalize_dict(parsed_mrs, delex_fields):
    """
    Helper function, which creates a mapping for each delexicalized attribute to the correct value. 
    This is done to replace the placeholders by the correct value at the end.
    """
    delex_vocabulary = defaultdict(lambda: [])
    for attribute, replacement_token in delex_fields.items():
        values = [x.get(attribute, '') for x in parsed_mrs]
        delex_vocabulary[attribute] = values
    return delex_vocabulary

#Let's first train without the syntactic manipulation
train_input, train_output =  _prepare_input(train_mr_vecs, train_idx_data, None)
valid_input, valid_output =  _prepare_input(valid_mr_vecs, valid_idx_data, None)
test_input, _ = _prepare_input(test_vectorised_mrs_only, test_mr_only_dummy_idx, None)

delex_fields = _get_delex_fields(attribute_tokens, delex_attributes)
print('The delexicalized fileds:')
pp.pprint(delex_fields)

test_delex_vocab = _get_lexicalize_dict(test_process_mr_only, delex_fields)
print('Example of Delex Vocabulary: ')
print('\t- Name:', test_delex_vocab['name'][idx])
print('\t- Near:', test_delex_vocab['near'][idx])
print('\t- Food:', test_delex_vocab['food'][idx])


output_callback = OutputCallback(
    model_object.test_model, 
    test_input, 
    test_process_mr_only, 
    delex_fields, 
    test_delex_vocab, 
    char_vocab
)



The delexicalized fileds:
{'food': 'XFOODX',
 'name': 'XNAMEX',
 'near': 'XNESRX'}
Example of Delex Vocabulary: 
	- Name: The Cricketers
	- Near: Avalon
	- Food: 


# Let's Train (finally)
Prepare your GPUs, because now we train our model. We use the Adadelta optimizer, choose a batch size of 256, and train for about 30 epochs. The OutputCallback will store the outputs in a file located in the "e2e_nlg/logging" folder (which you need to create in case you haven't). In case your GPU does not have enough memory, try to reduce the batch size first, then reduce the size of the LSTM. 

First, we just train the vanilla SC-LSTM without the syntactic conditioning. 

In [20]:
model_object.train_model.compile(optimizer=model_object.optimizer, loss=lambda y_true, y_pred: y_pred)
model_object.train_model.fit(
    x=train_input,
    y=train_output,
    epochs=30,
    batch_size=256,
    validation_data=(valid_input, valid_output),
    callbacks=[output_callback]
)
output_callback.on_epoch_end('final')

Train on 42061 samples, validate on 4672 samples


Now let's look at a few examples. We load the outputs of the last epoch and print a few utterances at random. 

We see that the neural network relies on the most common sentence structure. After reading 20 of these utterances it becomes clear why they need a bit of variety. 

In [13]:
import random

ifile = open('{}_{}.txt'.format(output_callback.fname, str('11')), 'rt', encoding='utf-8')
output = ifile.readlines()
for mr, utt in random.sample(list(zip(test_mr_only_raw, output)), k=30):
    print(mr)
    print(utt)

name[The Wrestlers], eatType[pub], food[Italian], priceRange[moderate], area[city centre], familyFriendly[no], near[Raja Indian Cuisine]
The Wrestlers is a pub that serves Italian food in the moderate price range. It is located in the city centre near Raja Indian Cuisine. It is not kid friendly.

name[The Phoenix], eatType[pub], food[French], priceRange[cheap], customer rating[5 out of 5], area[riverside], familyFriendly[no], near[Crowne Plaza Hotel]
The Phoenix is a cheap pub that serves French food and is located in the riverside area near Crowne Plaza Hotel. It is not family-friendly.

name[The Phoenix], eatType[restaurant], food[Fast food], priceRange[more than £30], area[riverside], familyFriendly[yes], near[Raja Indian Cuisine]
The Phoenix is a restaurant that serves Fast food food in the moderate price range. It is located in the riverside area near Raja Indian Cuisine.

name[The Cricketers], eatType[coffee shop], customer rating[low], familyFriendly[no], near[Ranch]
The Cricket

# Bring the Variety 

In [21]:
#Recreate the train data but this time include the first word vectors.
model_object_var = SCLSTM(True, char_vocab, max_sentence_len)

def _prepare_test_input(mr_vecs, idx_data, surface_features):
    sem_input = sorted(list(mr_vecs.items()), key=lambda x: x[0])
    sem_input = [x[1] for x in sem_input]
    
    input_samples = []
    for *isem, isamples in zip(*sem_input, surface_features):
        for sample in isamples:
            input_sample = isem + [sample] + [np.zeros(shape=(max_sentence_len, 1))]
            input_samples.append(input_sample)
    input_samples = list(map(list, zip(*input_samples)))
    input_samples = [np.squeeze(np.array(x), axis=-1) if x[0].shape[-1] == 1 else np.array(x) for x in input_samples]     
    
    return input_samples


train_input, train_output =  _prepare_input(train_mr_vecs, train_idx_data, train_utt_fw)
valid_input, valid_output =  _prepare_input(valid_mr_vecs, valid_idx_data, valid_utt_fw)
test_input= _prepare_test_input(test_vectorised_mrs_only, test_mr_only_dummy_idx, first_word_features)
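
What `_prepare_test_input` does conceptually is repeat every meaning representation once per sampled first word. The sketch below shows this expansion on plain lists together with the mapping from an output row back to its meaning representation; the function name `expand_for_samples` and the toy vectors are assumptions.

```python
def expand_for_samples(mr_vectors, sampled_first_words):
    """Pair every MR vector with each of its sampled first-word vectors."""
    rows = []
    for mr_vector, samples in zip(mr_vectors, sampled_first_words):
        for first_word_vector in samples:
            rows.append((mr_vector, first_word_vector))
    return rows


# Hypothetical toy data: two MRs, two sampled first words each
mr_vectors = [[1.0, 0.0], [0.0, 1.0]]
sampled_first_words = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
]
n_samples = 2

rows = expand_for_samples(mr_vectors, sampled_first_words)
# Row i of the model output belongs to the MR with index i // n_samples
mr_index = [i // n_samples for i in range(len(rows))]
print(len(rows), mr_index)
```

The callback needs this mapping to look up the right name, near and food values when it replaces the placeholders, which is what its `sampled_features` argument is for.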

In [22]:
output_callback_var = OutputCallback(
    model_object_var.test_model, 
    test_input, 
    test_process_mr_only, 
    delex_fields, 
    test_delex_vocab, 
    char_vocab,
    sampled_features=10
    
)

model_object_var.train_model.compile(optimizer=model_object_var.optimizer, loss=lambda y_true, y_pred: y_pred)
model_object_var.train_model.fit(
    x=train_input,
    y=train_output,
    epochs=30,
    batch_size=256,
    validation_data=(valid_input, valid_output),
    callbacks=[output_callback_var]
)
output_callback_var.on_epoch_end('final')


Train on 42061 samples, validate on 4672 samples
Epoch 1/30
Epoch 2/30
 1024/42061 [..............................] - ETA: 7:13 - loss: 420.6974 - main_loss: 419.4000 - dialogue_act_loss: 0.0000e+00 - dialogue_history_loss: 1.2975

KeyboardInterrupt: 

In [16]:
import random

ifile = open('{}_{}.txt'.format(output_callback.fname, str('11')), 'rt', encoding='utf-8')
output = ifile.readlines()
for mr, utt in random.sample(list(zip(test_mr_only_raw, output)), k=30):
    print(mr)
    print(utt)

name[The Punter], eatType[restaurant], food[Chinese], priceRange[moderate], area[city centre], familyFriendly[no], near[Raja Indian Cuisine]
The Punter is a moderately priced Chinese restaurant located in the city centre. It is not kid friendly.

name[The Cricketers], eatType[restaurant], customer rating[3 out of 5], familyFriendly[yes], near[Avalon]
The Cricketers is a kid friendly restaurant with a customer rating of 3 out of 5.

name[The Phoenix], eatType[pub], food[French], priceRange[cheap], customer rating[5 out of 5], area[city centre], familyFriendly[no], near[Crowne Plaza Hotel]
The Phoenix is a cheap pub that serves French food in the city centre near Crowne Plaza Hotel. It has a customer rating of 5 out of 5.

name[The Phoenix], eatType[restaurant], food[Fast food], priceRange[less than £20], area[city centre], familyFriendly[no], near[Raja Indian Cuisine]
The Phoenix is a Fast food restaurant located in the city centre near Raja Indian Cuisine. It is not family-friendly and