[URGENT] Fewer inputs received while giving the exact number of inputs required #508

Closed
Saboteur20 opened this issue Jul 22, 2022 · 5 comments

Saboteur20 commented Jul 22, 2022

I am testing an encoder-decoder LSTM model.
The training phase went well, but the testing phase, after building the testing model, raises an error when I try to feed inputs to the decoder layer. Here is the error:

ValueError: Layer "model_decoder_testing" expects 3 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor: shape=(1, 61, 300), dtype=float64, numpy=
array([[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]])>]
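
The message says the functional model declares three inputs while only the single (1, 61, 300) target-sequence tensor arrived. As a quick sanity check (a sketch, assuming the decoder_model built later in this script), the declared inputs can be listed:

# Sanity check (assumes the decoder_model defined further down):
# the model declares three inputs, yet the error shows that only the
# (1, 61, 300) target sequence reached it.
for t in decoder_model.inputs:
    print(t.name, t.shape)
# roughly:
# decoder_inputs    (None, 61, 300)
# encoder_state_h   (None, 50)
# encoder_state_c   (None, 50)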

My script is:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input, LSTM
from tensorflow.keras.utils import plot_model

np.random.seed(1)

from tokenizers import Tokenizer
from math import floor  # floor() is used below when splitting the data
import json
import os

devices = tf.config.experimental.list_physical_devices('GPU')
if devices:  # guard against running on a machine without a GPU
    tf.config.experimental.set_memory_growth(devices[0], True)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

def one_hot_decode(encoded_seq):
    return [np.argmax(vector) for vector in encoded_seq]

# Even though we define the encoder and decoder models, we still need to provide
# the decoder_input_data dynamically, as follows:
#
# - it begins with a special start symbol
# - it continues with the input created by the decoder at the previous time step
# - in other words, the decoder's output at time step t is used as the decoder's
#   input at time step t+1


def decode_sequence(batch_input, tokenizer, id2token, token2id):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(batch_input, verbose = 0)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, max_deco_input_len, embedding_weights.shape[1])) # one batch, sent length, word embedding dim
    # Populate the first token of target sequence with the start token.
    target_seq[:, 0, :] = embedding_weights[token2id['<START>'],:]

    decoded_seq = list()
    stop_condition = False
    
    # Sampling loop for the length of the sequence
    for j in range(len(batch_input[0])):
        if stop_condition:
            break
        # Decode the current target sequence plus states into a token prediction
        # and the updated (h, c) states for the next step's context vector.
        output_tokens, h, c =  decoder_model.predict_step([target_seq] + states_value)
        print(output_tokens.shape)

        # convert the token/output prediction to a token/output
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_token = id2token[sampled_token_index]
        # add the predicted token/output to output sequence
        decoded_seq.append(sampled_token)
        

        # Exit condition: either hit max length
        # or find <END> token.
        if sampled_token == '<END>':
            stop_condition = True

        # Update the input target sequence with the predicted token/output,
        # guarding against writing past the last decoder position
        if j + 1 < max_deco_input_len:
            target_seq[:, j + 1, :] = embedding_weights[sampled_token_index, :]
        
        states_value = [h, c]
        
    return decoded_seq


def decode(decoded_seq):
    '''
    Joins a list of predicted tokens back into a single string.

    Arguments:
    decoded_seq -- a list containing the tokens

    Returns:
    decoding -- a string assembled from all tokens, with '</w>'
                end-of-word markers rendered as spaces.
    '''
    decoding = ''
    for token in decoded_seq:
        if token == '</w>':
            decoding = decoding + ' '
        else:
            decoding = decoding + token
    return decoding

def calculating_combined_embeddings(embedding_weights, batch_input, tokenizer, token2id):
    '''
    Creates a tensor of shape (batch size, sent length, word embedding dim) as input
    to the encoder by averaging the subword tokens' embeddings into a single
    whole-word embedding.

    Arguments:
    embedding_weights -- a 2D array of embeddings
    batch_input -- a list of lists of words for the input sentences
    tokenizer -- the BPE tokenizer used to tokenize the batch input
    token2id -- a dictionary that translates tokens to their respective ids

    Returns:
    batch_input_tensor -- a tensor of shape (batch size, sent length, word embedding dim).
    '''
    batch_input_tensor = np.ndarray((len(batch_input), len(batch_input[0]), embedding_weights.shape[1])) # batch size, sent length, word embedding dim
    sent_embedding = np.zeros((len(batch_input[0]), embedding_weights.shape[1]))
    word_embedding = 0
    
    for i, sent in enumerate(batch_input):
        sent_embedding.fill(0)
        for j, word in enumerate(sent):
            word_encoding = tokenizer.encode(word).tokens
            word_embedding = 0
            for k in range(len(word_encoding)):
                try:
                    word_embedding = word_embedding + embedding_weights[token2id[word_encoding[k]]][:]
                except KeyError:
                    # a token missing from the vocabulary contributes nothing
                    pass
            word_embedding = word_embedding / len(word_encoding)
            word_embedding.resize((1,embedding_weights.shape[1]))
            np.put_along_axis(arr=sent_embedding, indices=np.full((1,embedding_weights.shape[1]),j), values=word_embedding, axis=0)
        np.put_along_axis(arr=batch_input_tensor, indices=np.full((1,len(batch_input[0]),embedding_weights.shape[1]),i), values=sent_embedding, axis=0)
    return batch_input_tensor

def calculating_combined_tokens(batch_input, token2id):
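    # Builds a one-hot target tensor of shape (batch size, sent length, num of tokens)
    # by placing a 1 at each token's id for every position of every sentence.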
    batch_input_tensor = np.zeros((len(batch_input), len(batch_input[0]), len(token2id))) # batch size, sent length, num of tokens
    sent_embedding = np.zeros((len(batch_input[0]), len(token2id)))
    
    for i, sent in enumerate(batch_input):
        sent_embedding.fill(0)
        for j, word in enumerate(sent):
            temp = np.zeros((1,len(token2id)))
            temp[:,token2id[word]] = 1
            np.put_along_axis(arr=sent_embedding, indices=np.full((1,len(token2id)),j), values=temp, axis=0)
        np.put_along_axis(arr=batch_input_tensor, indices=np.full((1,len(batch_input[0]),len(token2id)),i), values=sent_embedding, axis=0)
    return batch_input_tensor


with open('D:\\Project Files\\encoder_input_data.json') as f:
    encoder_input_data = json.load(f)

with open('D:\\Project Files\\decoder_input_data.json') as f:
    decoder_input_data = json.load(f)

with open('D:\\Project Files\\decoder_output_data.json') as f:
    decoder_output_data = json.load(f)

tokenizer = Tokenizer.from_file("D:\\Project Files\\tokenizer_trained.json")

embedding_weights = np.load("D:\\Project Files\\embedding_weights.npy")

token2id = tokenizer.get_vocab()
id2token = {v:k for k, v in token2id.items()}
unique_tokens = len(tokenizer.get_vocab())
data_size = len(encoder_input_data)
max_enco_input_len = len(encoder_input_data[0])
max_deco_input_len = len(decoder_input_data[0])
max_deco_output_len = len(decoder_output_data[0])
embedding_dim = embedding_weights.shape[1]
batch_size = 32
cut_percentage = 0.8
cut = floor(data_size*cut_percentage)
n_features = 50

# encoder_input_data = encoder_input_data + [ ['<PAD>'] * max_enco_input_len for _ in range(data_size-cut%32)]

# decoder_input_data = decoder_input_data + [ ['<PAD>'] * max_deco_input_len for _ in range(data_size-cut%32)]

# decoder_predicted_data = decoder_output_data + [ ['<PAD>'] * max_deco_output_len for _ in range(data_size-cut%32)]

X_Train_enco = encoder_input_data[:cut] + [ ['<PAD>'] * max_enco_input_len for _ in range(batch_size-cut%32)]
X_Test_enco = encoder_input_data[cut:] + [ ['<PAD>'] * max_enco_input_len for _ in range(batch_size-(data_size - cut)%32)]

X_Train_deco = decoder_input_data[:cut] + [ ['<PAD>'] * max_deco_input_len for _ in range(batch_size-cut%32)]
X_Test_deco = decoder_input_data[cut:] + [ ['<PAD>'] * max_deco_input_len for _ in range(batch_size-(data_size - cut)%32)]

Y_Train_deco = decoder_output_data[:cut] + [ ['<PAD>'] * max_deco_output_len for _ in range(batch_size-cut%32)]
Y_Test_deco = decoder_output_data[cut:] + [ ['<PAD>'] * max_deco_output_len for _ in range(batch_size-(data_size - cut)%32)]

# TRAINING WITH TEACHER FORCING
encoder_inputs= Input(shape=(max_enco_input_len,embedding_dim), name='encoder_inputs')
encoder_lstm=LSTM(units=50, return_state=True, name='encoder_lstm')
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_inputs)

# We discard `LSTM_outputs` and only keep the other states.
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(max_deco_input_len, embedding_dim), name='decoder_inputs')
decoder_lstm = LSTM(units=50, return_sequences=True, return_state=True, name='decoder_lstm')

# Set up the decoder, using `context vector` as initial state.
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)

# Complete the decoder model by adding a Dense layer with a softmax activation
# for prediction of the next output.
# The Dense layer will output a one-hot-encoded representation.
decoder_dense = Dense(unique_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)

# put together
model_encoder_training = Model([encoder_inputs, decoder_inputs], decoder_outputs, name='model_encoder_training')

opt = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, clipnorm=5.0)
loss = tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)

model_encoder_training.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
model_encoder_training.summary()

# tf.keras.utils.plot_model(
#     model_encoder_training,
#     to_file="training_model.png",
#     show_shapes=True,
#     show_dtype=False,
#     show_layer_names=True,
#     rankdir="TB",
#     expand_nested=True,
#     dpi=96,
#     layer_range=None,
#     show_layer_activations=True
# )

# optimization loop
for epoch in range(1, 200):
    loss = 0
    acc = 0
    for batch in range(int(len(X_Train_enco)/batch_size)):
        # take one full batch per step
        start, end = batch*batch_size, (batch+1)*batch_size
        X_enco = calculating_combined_embeddings(embedding_weights, X_Train_enco[start:end], tokenizer, token2id)
        X_deco = calculating_combined_embeddings(embedding_weights, X_Train_deco[start:end], tokenizer, token2id)
        # Y_deco = calculating_combined_embeddings(embedding_weights, Y_Train_deco[start:end], tokenizer, token2id)
        Y_deco = calculating_combined_tokens(Y_Train_deco[start:end], token2id)
        
        temp = model_encoder_training.train_on_batch([X_enco,X_deco], Y_deco)
        loss = loss + temp[0]
        acc = acc + temp[1]
    
    loss = loss / int(len(X_Train_enco)/batch_size)
    acc = acc / int(len(X_Train_enco)/batch_size)
    print('Epoch:', epoch, 'Loss:', loss, 'Accuracy:', acc)

# TESTING WITHOUT TEACHER FORCING
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(50,), name='encoder_state_h')
decoder_state_input_c = Input(shape=(50,), name='encoder_state_c')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs,[decoder_outputs] + decoder_states, name='model_decoder_testing')
decoder_model.summary()

tf.keras.utils.plot_model(
    decoder_model,
    to_file="testing_model.png",
    show_shapes=True,
    show_dtype=False,
    show_layer_names=True,
    rankdir="TB",
    expand_nested=True,
    dpi=96,
    layer_range=None,
    show_layer_activations=True
)

# print('Input \t\t\t  Expected  \t   Predicted \t\tT/F')
# correct = 0 
# sampleNo = 10 #len(X_Test_enco)
# for sample in range(0,sampleNo):
#     X_enco = calculating_combined_embeddings(embedding_weights, [X_Test_enco[sample]], tokenizer, token2id)
#     Y_deco = calculating_combined_tokens([Y_Test_deco[sample]], token2id)
#     predicted = decode_sequence(X_enco, tokenizer=tokenizer, id2token=id2token,token2id=token2id)
#     if (one_hot_decode(Y_Test_deco[sample]) == predicted):
#         correct+=1
#     print( one_hot_decode(X_Test_enco[sample]), '\t\t', one_hot_decode(Y_Test_deco[sample]),'\t', predicted,
#           '\t\t',one_hot_decode(Y_Test_deco[sample])== predicted)
    
# print('Accuracy: ', correct/sampleNo)

correct = 0 
sampleNo = 10 #len(X_Test_enco)
for sample in range(0,sampleNo):
    X_enco = calculating_combined_embeddings(embedding_weights, [X_Test_enco[sample]], tokenizer, token2id)
    Y_deco = calculating_combined_tokens([Y_Test_deco[sample]], token2id)
    predicted = decode_sequence(X_enco, tokenizer=tokenizer, id2token=id2token,token2id=token2id)
    print(len(predicted[0]))

The exact issue is in this line:
output_tokens, h, c = decoder_model.predict_step([target_seq] + states_value)
which the script reaches while executing this section:

sampleNo = 10 #len(X_Test_enco)
for sample in range(0,sampleNo):
    X_enco = calculating_combined_embeddings(embedding_weights, [X_Test_enco[sample]], tokenizer, token2id)
    Y_deco = calculating_combined_tokens([Y_Test_deco[sample]], token2id)
    predicted = decode_sequence(X_enco, tokenizer=tokenizer, id2token=id2token,token2id=token2id)
    print(len(predicted[0]))

I have tried fixing the issue by changing the way I feed the inputs multiple times, but nothing changed.
I am sorry for posting so much code, but I was afraid I would omit a key line that would help with the issue.
Please be kind enough to respond quickly, because I have to submit my full project in less than two weeks.
I have uploaded the required files in case someone wants to reproduce the issue.
Project Files.zip
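
A likely explanation and workaround (an editorial sketch, not part of the original thread): Model.predict_step is an internal per-batch hook that unpacks its argument as an (x, y, sample_weight) pack, so a three-element list is split apart and only target_seq reaches the model as x. Model.predict, by contrast, maps a list of arrays onto the model's multiple inputs:

# Sketch of a possible fix: let predict() route all three arrays to the
# model's three declared inputs instead of calling predict_step() directly.
output_tokens, h, c = decoder_model.predict([target_seq] + states_value, verbose=0)

# Alternatively, call the model itself to avoid predict()'s per-call
# overhead inside the sampling loop:
# output_tokens, h, c = decoder_model([target_seq] + states_value, training=False)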

@Saboteur20 Saboteur20 changed the title Fewer inputs received while giving the exact number of inputs required [URGENT] Fewer inputs received while giving the exact number of inputs required Jul 22, 2022
tilakrayal (Collaborator) commented Jul 25, 2022

@Saboteur20,
The code provided is fairly complex, hence it would be difficult for us to pinpoint the issue. Could you please get the example down to the simplest possible repro? That will allow us to determine the source of the issue easily.

When a dataset with two inputs is passed, the first one is fed to the actual network and the second one is left out and treated as the target value. Thank you!
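
That splitting can be seen directly with tf.keras.utils.unpack_x_y_sample_weight, which mirrors the unpacking predict_step performs internally (a small illustrative sketch; the shapes are taken from the error message above):

import numpy as np
import tensorflow as tf

target_seq = np.zeros((1, 61, 300))
h = np.zeros((1, 50))
c = np.zeros((1, 50))

# A three-element pack is interpreted as (x, y, sample_weight), so only
# the first element survives as the model input.
x, y, sw = tf.keras.utils.unpack_x_y_sample_weight([target_seq, h, c])
print(x.shape)  # (1, 61, 300) -- the lone tensor from the error message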

Saboteur20 (Author) commented Jul 25, 2022

@tilakrayal Of course. The actual issue occurs after creating the testing model with the weights of the training model:

# TESTING WITHOUT TEACHER FORCING
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(50,), name='encoder_state_h')
decoder_state_input_c = Input(shape=(50,), name='encoder_state_c')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs,[decoder_outputs] + decoder_states, name='model_decoder_testing')

When I want to feed the testing pairs in the loop below:

sampleNo = 10 #len(X_Test_enco)
for sample in range(0,sampleNo):
    X_enco = calculating_combined_embeddings(embedding_weights, [X_Test_enco[sample]], tokenizer, token2id)
    Y_deco = calculating_combined_tokens([Y_Test_deco[sample]], token2id)
    predicted = decode_sequence(X_enco, tokenizer=tokenizer, id2token=id2token,token2id=token2id)
    print(len(predicted[0]))

the error stated above is raised when I call the function decode_sequence (also shown above), once execution reaches this line:
output_tokens, h, c = decoder_model.predict_step([target_seq] + states_value)

tilakrayal (Collaborator) commented
@Saboteur20,
Could you share a Colab gist with the reported issue, or simple standalone, properly indented code with all dependencies, so that we can replicate the reported issue? Thank you!
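
For reference, a minimal standalone repro along the requested lines (an editorial sketch with hypothetical layer sizes and no external data files; the last line should raise the same ValueError as in this issue):

import numpy as np
from tensorflow.keras.layers import Dense, Input, LSTM
from tensorflow.keras.models import Model

# Rebuild just the inference-time decoder with the shapes from the issue.
dec_in = Input(shape=(61, 300), name='decoder_inputs')
h_in = Input(shape=(50,), name='encoder_state_h')
c_in = Input(shape=(50,), name='encoder_state_c')
out, h, c = LSTM(50, return_sequences=True, return_state=True)(
    dec_in, initial_state=[h_in, c_in])
out = Dense(10, activation='softmax')(out)  # 10 = hypothetical vocab size
decoder_model = Model([dec_in, h_in, c_in], [out, h, c],
                      name='model_decoder_testing')

data = [np.zeros((1, 61, 300)), np.zeros((1, 50)), np.zeros((1, 50))]
decoder_model.predict(data)       # OK: the list maps onto the three inputs
decoder_model.predict_step(data)  # ValueError: expects 3 input(s), received 1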
