
Merge/concatenate confusion plus multiple layers in different models #13021

Closed
amjass12 opened this issue Jun 27, 2019 · 38 comments

Comments

@amjass12

Hi all,

I am writing as I have some fundamental confusion about the merge/concatenate layers. I have not found an answer to my question on Stack Overflow or other sites, so any help would be appreciated.

Context: I have built two sequential models. The two models take different data types as input, although both lead to the same classifications. What I would like to do is merge layers between the two models in order to share information and learn new features, based on both models, that contribute to the final classifications.

My models are as follows (please note they are identical for this post; however, one will stay the same while the other will very likely have another layer added, plus more neurons in each layer, when more data becomes available):

**model1**
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=64, input_dim=5078, activation="relu"))
model.add(Dense(units=32, activation="relu"))
model.add(Dense(units=100, activation="relu"))
model.add(Dense(units=24, activation="sigmoid"))

**model2**
modelSC = Sequential()
modelSC.add(Dense(units=64, input_dim=5078, activation="relu"))
modelSC.add(Dense(units=32, activation="relu"))
modelSC.add(Dense(units=100, activation="relu"))
modelSC.add(Dense(units=24, activation="sigmoid"))

I would like the penultimate layer in each model (Dense 100) to merge before the output.

My first question: both models are composed of multiple layers. The Keras documentation states the following for layer merging:

left_branch = Sequential()
left_branch.add(Dense(units=64, input_dim=5078, activation="relu"))
left_branch.add(Dense(units=32, activation="relu"))
left_branch.add(Dense(units=100, activation="relu"))

right_branch = Sequential()
right_branch.add(Dense(units=64, input_dim=5078, activation="relu"))
right_branch.add(Dense(units=32, activation="relu"))
right_branch.add(Dense(units=100, activation="relu"))

merged = Concatenate([left_branch, right_branch])

final_model.add(Dense(24, activation='sigmoid'))

final_model.compile(optimizer='adam', loss='binary_crossentropy')
final_model.fit([X_train, X_trainSC], trainingtarget1)

Regarding the model.fit call: I don't understand how to pass in the training and test data, i.e. each tensor that comes from the different models. The one-hot data is also different in each one (number of samples etc.), and model.fit doesn't appear to support adding all of these elements.

Can anybody offer any advice on how to resolve this or further clarify where I have made a mistake? The goal of what I am trying to do is to have 2 models (or more in the future, with different data sources that are not the same as each other), with all the data shared at one layer in order to further inform the final classification.

Also, if the two models contain different numbers of layers, is there a default by which Keras defines which layers become merged? Is it the last layer added to right_branch or left_branch?

Thank you for your time!

@dabasajay

dabasajay commented Jul 1, 2019

The behavior you want can be achieved using the Keras functional API.
Here is a short example to give you a hint of what to do; add more layers or change parameters according to your use case.

from keras.models import Model
from keras.layers import Input, Dense, concatenate
from keras.utils import plot_model
left_branch_input = Input(shape=(2,), name='Left_input')
left_branch_output = Dense(5, activation='relu')(left_branch_input)

right_branch_input = Input(shape=(2,), name='Right_input')
right_branch_output = Dense(5, activation='relu')(right_branch_input)

concat = concatenate([left_branch_output, right_branch_output], name='Concatenate')
final_model_output = Dense(3, activation='sigmoid')(concat)
final_model = Model(inputs=[left_branch_input, right_branch_input], outputs=final_model_output,
                    name='Final_output')
final_model.compile(optimizer='adam', loss='binary_crossentropy')
# To train
final_model.fit([Left_data,Right_data], labels, epochs=10, batch_size=32)
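# (Aside, not in the original reply: plot_model is imported above but never
# called; assuming pydot and graphviz are installed, the diagram below can be
# regenerated with something like:)
# plot_model(final_model, to_file='final_model.png', show_shapes=True)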

Here's what the model looks like:
[model diagram]

@amjass12
Author

amjass12 commented Jul 1, 2019

Thank you so much @dabasajay !!

So I have compiled the model, and this does indeed seem to make perfect sense! Thank you so much for your assistance. Attaching the model with the added layers etc...

There is one thing I am still completely confused by, and that is:

final_model.fit([Left_data,Right_data], **labels**, epochs=10, batch_size=32)

I don't quite understand how the labels should work, as they are indeed the same 24 categories; however, the one-hot encoding will be different for the two sources of data. For example, the left arm (at the moment) has 180 samples while the right arm contains 300. The one-hot encoding is going to be entirely different, and in fact in some instances not all of the labels will be present in the right arm (although those that are present will be the same labels as in the final 24-category output)...

So it would then be:

final_model.fit([Left_dataTRAIN, Right_dataTRAIN], labels???, epochs=10, batch_size=32)

I hope this is making sense... this is one thing that was also confusing to me and I still don't quite understand... Thanks!

[model diagram attached]

@dabasajay

dabasajay commented Jul 2, 2019

Concatenation requires all dimensions to be the same shape except for the concatenation axis. So in the Concatenate layer, when concatenating (None, 100) and (None, 100), that None is the batch_size, and it has to be the same for both the left and right part.
Think about it this way:
You can concatenate a [4,3] matrix with a [4,5] matrix along axis=1, resulting in a matrix of shape [4,8], but how would you concatenate two matrices of shapes [6,3] and [4,3] along axis=1? It doesn't make sense, right?
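
To make the shape rule concrete, here is a quick NumPy check (an illustration added for this writeup, not part of the original reply):

import numpy as np

a = np.zeros((4, 3))
b = np.zeros((4, 5))
print(np.concatenate([a, b], axis=1).shape)  # (4, 8) -- works

c = np.zeros((6, 3))
d = np.zeros((4, 3))
# np.concatenate([c, d], axis=1)  # ValueError: non-concatenation dims must match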

If you want to train this model, then the batch_size has to be the same for the left and right parts, and the label also has to be the same for a given (left, right) pair of data samples.
For example, consider a hypothetical problem of predicting sentiment from a given (image, caption) pair. The sentiment label is the same for a given image+caption input, so you can feed the image to the left part and the caption to the right part and have the model predict the sentiment for a given (left, right) pair.

In your case, as you said, left and right contain different numbers of samples and the label is not always the same for a (left, right) pair, so I think it's best to train a separate model for each, since these are really two different classification problems instead of one.

@amjass12
Author

amjass12 commented Jul 2, 2019

Hi @dabasajay

Thank you! Yes indeed, everything you said makes perfect sense! I guess one thing that I did not expect is that this would be a requirement. I am using genomic data, so no two pairs of data will ever have the same number of samples; the reason for using two networks (or more) is also the type of data. So I wouldn't be able to train the model as one network because of the nature of the different types of data. Instead I wanted to leverage concatenation in order to see how the different types of data contribute to the classification.

Even though it is a different classification, the end goal is common, so they will always fall into the same 24 classes. Is there a way to overcome the sample size difference and train solely on the label itself, to see how layer concatenation deals with the classification? Or to merge the networks together, as opposed to layers, to see what features are shared between the two networks? This would improve classification but also give us the relevant features in the data.

@dabasajay

Well, batch_size isn't the main problem; it can easily be fixed by duplicating data. The main issue is the grouping of labels and the relationship between input pairs.
For example, consider the sentiment analysis example I gave in my last comment. Say I have 5 images (data type 1) and 10 captions (data type 2). First I'll group them together under the same labels. Assume that 3 images and 4 captions have label 0 (negative sentiment), and 2 images and 6 captions have label 1 (positive sentiment). Now I can just duplicate images (from each group) for each label to get the same batch size, ending up with (4 images, 4 captions) and (6 images, 6 captions), and then training is feasible. However, this may or may not work, because there is no strong relationship between input pairs, i.e. we don't have the relationship that for a given image there is this particular caption which describes it and we should predict the label for the pair. Since we duplicated the data, we just know that the input image and caption have a common label, and we hope the model will capture this weak relationship by concatenating features (a minimal sketch of the duplication step follows).
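
A minimal sketch of that duplication step, with invented shapes purely for illustration:

import numpy as np

images = np.random.rand(4, 8)    # 4 image feature vectors in one label group
captions = np.random.rand(6, 8)  # 6 caption feature vectors in the same group

# Upsample the smaller side with replacement so the counts match
idx = np.random.choice(len(images), size=len(captions), replace=True)
images_matched = images[idx]     # now (6, 8), pairing up with the 6 captions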

That's why I suggest you train different models if inputs don't have a strong relationship among them.

@amjass12
Author

amjass12 commented Jul 2, 2019

hi @dabasajay

Many thanks! This all makes perfect sense! To your last point, because this is important:

'However, this may or may not work because there is no strong relationship between input pairs '

We do know that there is a strong relationship. I built model 1 (model) and used the data for modelSC as validation of the first model, and the predictions are phenomenal. Even though it is a different data type, it could predict age perfectly etc., hence why I would like to make this shared layer, in order to understand and further refine the model's predictions to features that are truly representative of certain classes. It is also a way of finding common relationships between the different data types and the features that are truly important!

I will attempt to do this; my only concern is that by duplicating samples I am creating more samples which are not actually different... am I correct in saying this? Does the one-hot data have to match exactly? Or could it simply be labelled as one of the subclasses so the model knows what class it should be?

Thanks!

@dabasajay

dabasajay commented Jul 2, 2019

You duplicate the data from either the left part or the right part at a time, so the overall input to the model is different each time. As for the one-hot encoding of labels, just the final shape has to match, i.e. a 24-dimensional vector. If it's a multiclass classification problem, the one-hot encoding has to match exactly for both left and right, but if it's a multilabel classification problem, you can just add both y vectors from left and right element-wise to get multiple labels, and then train.
Example:

| Classification | y_left | y_right | y_final |
| --- | --- | --- | --- |
| multiclass | [0,0,1,0] | [0,0,1,0] | [0,0,1,0] |
| multilabel | [0,0,1,0] | [0,1,0,0] | [0,1,1,0] |

Note: I have never tried this approach before, so I'm not 100% sure about what I'm suggesting. Please try it yourself and let me know too.

@amjass12
Author

amjass12 commented Jul 4, 2019

Hi @dabasajay ,

So I ran the network by duplicating the data on the left to match the sample number on the right: the duplication simply adds another 40 samples, plus the same for the one-hot encoded array.

I then did the following:

Addition of the one-hot encoded data:

sum_vector = y_train + y_trainSC

However, this produces values of 2 where there are overlaps (attached), and I think that as a result I am getting negative values for the training loss... is there a way to correct this, or is this OK?

Train the model:

historyMerged=final_model.fit([X_train, X_trainSC], sum_vector,
          epochs=200,
          verbose=1,
          validation_split=0.15,
          callbacks=[EarlyStopping(monitor='val_loss', patience=5)], shuffle=True)

Accuracy and loss at the final epoch:

val_acc=0.96
acc=0.96

val_loss=-0.56
loss=0.6

When I call model.predict (trainpredmerge = final_model.predict([X_train, X_trainSC]))
I get the attached screenshot, which actually makes sense: the high accuracy goes to the right sample and right label for both datasets... but they are overlapping (so 180 samples only); screenshot attached.

This leads me to a couple of questions:

You said that for a binary problem the one-hot vectors can simply be added, but do the sample rows have to match in both datasets? For example:

Does row one for model 1 have to be organ x at age x, and row one for model 2 also have to be organ x at age x? And so on...

As I mentioned in the original question, the input data for model 2, although different, may contain for example only one age and 5 organs instead; however, we still aim to capture the rich relationships in the data for things like age.

As a multi-label problem, I am having trouble interpreting the predict output, as there are only as many rows as there are samples in the left branch, so one row represents two samples. Attached is another screenshot: as all samples in model 2 are of one age, they are correctly assigned to column 5, but as you can see there are also very high prediction scores for column 3, column 0, etc. at the same time; these are the correct predictions for the samples in column 1. Is there a way to pick apart these two datasets individually, to visualise predictions for just one training set?
[screenshot attached]

And finally: I am guessing that where there is overlap in the data (i.e. where there are two samples of the same age in both datasets, left and right arm), these relationships are being captured to further improve classification for those ages? Does the model know these are the same in the one-hot encoding? In other words, does the merge layer treat matching one-hot encodings in both datasets as relationships to be found, and treat those that are not present in both datasets as individual samples?

Sorry for the long post... thank you for all your help in advance!

[screenshots attached]

@dabasajay

Hey! I'm sorry, I can't help you interpret the results here, since I have limited domain knowledge of genomic data and of the problem you're trying to solve. My part was only to help you solve the issue with Keras, enabling you to define the kind of model you wanted and train it, which I guess I did. I'm sorry you had to write that long post and I couldn't be of any use.

@amjass12
Author

amjass12 commented Jul 6, 2019

Hi @dabasajay

Thank you so, so much! You have been incredibly helpful and got me on the right path to merging the models together! I will continue to dig through the predict function and see how the model is performing!!

Would you be able to comment on whether sum_vector = y_train + y_trainSC is correct for combining both one-hot encoded arrays?

And finally, can you comment on the order of the samples? Does the order of training set 1 (left arm) have to be the same as training set 2 (right arm)? As it's a binary multi-label problem, you hinted above that the samples do not need to match on the left and right?

Thanks for all your help!

@dabasajay

Assuming y_train and y_trainSC are NumPy arrays, do this:
sum_vector = np.array(y_train.astype(bool) + y_trainSC.astype(bool), dtype=int)
The reason being: if, say, p=[0,1,0,1] and q=[0,1,0,1], then p+q=[0,2,0,2], which is not what we want, so we convert to boolean logic first; since True+True=True, we get 1+1=1 instead of 2, and p+q will be [0,1,0,1].
As for ordering: taking a pair (left, right), you can shuffle the whole set of input pairs along with y, but don't shuffle the left and right individually, since that would change the input pairs for the labels y. So yes, I guess ordering matters and has to be the same. I'm not 100% sure though.
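
For illustration (not part of the original reply), a quick runnable check of the boolean trick with overlapping labels:

import numpy as np

p = np.array([0, 1, 0, 1])
q = np.array([0, 1, 1, 0])
print(p + q)  # [0 2 1 1] -- plain addition double-counts shared labels
sum_vector = np.array(p.astype(bool) + q.astype(bool), dtype=int)
print(sum_vector)  # [0 1 1 1] -- boolean OR keeps the encoding binary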

@amjass12
Author

amjass12 commented Jul 6, 2019

Thank you so much @dabasajay, you have been incredibly helpful... the results are really interesting! Attaching a screenshot for your information in case you are interested!! No need to think about the genomic data specifically, but notice the actual predictions:

For each row and each sample, you get two predictions per class (as they are different samples in the two training sets) (green arrows for 0-5, which are age groups)... notice it has a confidence score at column 3 and column 5: column 3 is the y_train data and column 5 is the y_trainSC data. So even though the merge layer is present, it can distinguish between the different datasets... columns 6-18 are organs, and the same is observed.

What is super cool is if you look at the red arrows: this is where a sample is in both datasets, same age or same organ etc... you only get one prediction, and that's a confidence score at column 5, as they are the same age. But what's really cool is that they are different organs (so they are actually both different samples, just the same age), and you see a confidence score for organ in column 6 and column 9 (which, by the way, are correct classifications)...

I am guessing that for the overlapping data (column 5, red arrows), as they are the same age (the same one-hot encoded position), the merge layer will have come into play here? I will explore further and see whether the merge layer has found features at this position that are correlated in both y_train and y_trainSC to form this age classification...

Thanks! :)

[screenshot attached]

@dabasajay

The results you've shown here, are they obtained on the test set/validation set? If not, please do that, because the model may be overfitting here.

@amjass12
Author

amjass12 commented Jul 6, 2019

@dabasajay ha! Yes, it was! The one-hot encoding for two of the classes was all 1's!! So it must have been memorising a common feature in all the data for y_trainSC... this has now been fixed so the only class with all 1's is the age. It now performs well and without overfitting (acc ~90%), but the trend is still the same!

@rebeen

rebeen commented Apr 2, 2020

Could you please let me know how you finally solved this problem?

When I try to fit the model, I get the error below:
model.fit([X_train, X_train_a], y_train, batch_size=128, epochs=100, verbose=True)
loss, acc = model.evaluate([X_test,X_test_a], y_test,batch_size=128,verbose=1)

ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(24424, 15, 12), (16325, 15, 12)]

@amjass12
Author

amjass12 commented Apr 3, 2020

@rebeen

There was a Stack Overflow page that offered an alternative solution using automatic differentiation (tf.GradientTape). Link attached:

https://stackoverflow.com/questions/57216216/keras-multitask-learning-with-two-different-input-sample-size

It is not mentioned, but I would assume that up to the point where the samples are equal, each sample type has to match!

@rebeen

rebeen commented Apr 3, 2020

@amjass12 Thank you very much,

But I think if we use this, we cannot call fit on the model?

@amjass12
Author

amjass12 commented Apr 4, 2020

@rebeen

With tf.GradientTape, you don't call .fit on the model as you would normally after compiling. The GradientTape approach runs the training loop manually, one batch at a time (wrap it in a for loop to run over multiple epochs), and the model's weights get updated at each step...

So here is an example of a training loop for my data (which is unequal in length); GradientTape provides overall flexibility for model building, not just for this use case:

#starting from training split
X_train, X_test, y_train, y_test = train_test_split(data, trainingtarget, train_size = 0.8)

def random_batch(X,y, batch_size=32):
    idx= np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

##Further split train data to training set and validation set

X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.15, random_state=1)

##Run autodiff on model

n_epochs=100
batch_size=32
n_steps=len(X_train)//batch_size

optimizer=tf.keras.optimizers.Adam(lr=0.001)
loss=tf.keras.losses.BinaryCrossentropy()

metricLoss=tf.keras.metrics.BinaryCrossentropy()
metricsAcc=tf.keras.metrics.BinaryAccuracy()

val_acc_metric=tf.keras.metrics.BinaryAccuracy()
val_acc_loss=tf.keras.metrics.BinaryCrossentropy()


train_loss_results = []
train_accuracy_results = []

validation_loss_results = []
validation_accuracy_results = []

# for loop iterate over epochs
for epoch in range(n_epochs):

    print("Epoch {}/{}".format(epoch + 1, n_epochs))

    # for loop iterate over batches
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(X_train.values, y_train)

        # gradientTape autodiff
        with tf.GradientTape() as tape:
            y_pred = model(X_batch, training=True)
            loss_values = loss(y_batch, y_pred)
        gradients = tape.gradient(loss_values, model.trainable_weights)
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))

        # Update running metrics for this epoch
        metricLoss.update_state(y_batch, y_pred)
        metricsAcc.update_state(y_batch, y_pred)

        # Loss and accuracy
        train_loss_results.append(loss_values)
        train_accuracy_results.append(metricsAcc.result())

        # Read out training results
        readout = 'Epoch {}, Training loss: {}, Training accuracy: {}'
        print(readout.format(epoch + 1, loss_values,
                             metricsAcc.result() * 100))

    # Reset training metrics at the end of each epoch
    # (the original called metricsAcc.reset_states without parentheses, which is a no-op)
    metricsAcc.reset_states()
    metricLoss.reset_states()

    # Run a validation loop at the end of each epoch
    val_logits = model(X_val.values, training=False)

    # Update val metrics (a single update_state call per epoch; the original
    # updated the metric twice, double-counting the validation data)
    val_acc_metric.update_state(y_val, val_logits)
    val_acc = val_acc_metric.result()

    val_loss = val_acc_loss(y_val, val_logits)

    validation_loss_results.append(val_loss)
    validation_accuracy_results.append(val_acc_metric.result())

    # Read out validation results
    print('Validation loss: ', float(val_loss), 'Validation acc: %s' % (float(val_acc * 100),))

    val_acc_metric.reset_states()
    val_acc_loss.reset_states()

There is an issue with this at the moment that others are experiencing, where the model accuracy does not reach the same range of accuracy as the standard model.fit method in Keras, so keep that in mind... I imagine there will be a fix? (Or we are doing something really wrong, ha!)
tensorflow/tensorflow#35585

Hope this helps! The GradientTape method overcomes the unequal sample size problem, but I have yet to test it with the real data to see how it works, as I am currently waiting for the data (I was just setting this up to establish the pipeline); it works when I split random data from the input into unequal tensors.

My model is multi-input (2 datasets of unequal length; the features are the same though), with a merged layer for information sharing, which then splits off to two independent output nodes (one for each dataset).

@rebeen

rebeen commented Apr 4, 2020

@amjass12 Currently, I am trying to use two LSTMs for datasets whose lengths differ. I used concatenation in Keras and followed this example. For this example the number of samples should be the same, but if I use GradientTape for this example, I get an error:

import keras
from keras.layers import *
from keras.utils import plot_model

from keras.models import Model
import numpy as np
np.random.seed(0) # Set a random seed for reproducibility
main_input = Input(shape=(15,12), dtype='float32', name='main_input')
lstm_out = LSTM(20)(main_input)
auxiliary_output = Dense(10, activation='sigmoid', name='aux_output')(lstm_out)
auxiliary_input = Input(shape=(5,), name='aux_input')

x =concatenate([lstm_out, auxiliary_input])
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
main_output = Dense(11, activation='sigmoid', name='main_output')(x)

model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])

headline_data = np.round(np.abs(np.random.rand(12, 180) * 100)).reshape(12,15,12)

additional_data = np.random.randn(12, 5)#.reshape(12,15,12)
headline_labels = np.random.randn(12, 11)
additional_labels = np.random.randn(12, 10)
model.fit([headline_data, additional_data], [headline_labels, additional_labels], epochs=50, batch_size=32)
plot_model(model, to_file='fusion_model.png',show_shapes=True)

@amjass12
Author

amjass12 commented Apr 6, 2020

@rebeen

I am not too familiar with LSTM model building or behaviour, as I am not working with LSTMs -- but I apologise, as I posted an example of a single model with a SINGLE input (this was practice for a single model)... hopefully the below can help!

Starting from splitting the data into x and y train for both datasets:


# data 1:
X_train, X_test, y_train, y_test = train_test_split(
    normCounts1, trainingtarget1, train_size=0.8)

# data 2:
X_trainSC, X_testSC, y_trainSC, y_testSC = train_test_split(
    SCscale1, NormcountsOneHot1, train_size=0.8)

# Build shared model
left_branch_input = tf.keras.Input(shape=(5078,), name='Left_input')
left_branch_outputInt = Dense(
    64, activation='relu', name="secondLayerleft")(left_branch_input)
left_branch_outputInt = Dense(
    32, activation="relu", name="thirdLayeleft")(left_branch_outputInt)

right_branch_input = tf.keras.Input(shape=(5078,), name='right_input')
right_branch_outputInt = Dense(
    64, activation='relu', name="secondLayerright")(right_branch_input)
right_branch_outputInt = Dense(
    32, activation="relu", name="thirdLayerright")(right_branch_outputInt)

# The same Dense layer object is applied to both branches below, so its weights
# are shared (input_dim is not needed when the layer is called on a tensor)
sharedLayer = tf.keras.layers.Dense(100, activation="relu")


bulkConcat = sharedLayer(left_branch_outputInt)
scConcat = sharedLayer(right_branch_outputInt)


finalLeftbulk = tf.keras.layers.Dense(
    24, activation='sigmoid', name='output_1Bulk')(bulkConcat)

finalRightSC = tf.keras.layers.Dense(
    24, activation='sigmoid', name='output_2SC')(scConcat)


model = Model(inputs=[left_branch_input, right_branch_input],
              outputs=[finalLeftbulk, finalRightSC])
n_epochs = 50
batch_size = 32
n_steps = len(X_train)//batch_size

optimizer = tf.keras.optimizers.Adam(lr=0.001)
loss = tf.keras.losses.BinaryCrossentropy()

metricLoss = tf.keras.metrics.BinaryCrossentropy()
metricsAcc = tf.keras.metrics.BinaryAccuracy()

train_loss_results = []
train_accuracy_results = []

val_acc_metric = tf.keras.metrics.BinaryAccuracy()
val_loss = tf.keras.metrics.BinaryCrossentropy()

val_logits_loss_results = []
val_logits_accuracy_results = []


def random_batchMerge(X1, X2, y1, y2, batch_size=32):
    idx = np.random.randint(len(X1), size=batch_size)
    idx1 = np.random.randint(len(X2), size=batch_size)
    return X1[idx], X2[idx1], y1[idx], y2[idx1]

# Further split train data to training set and val_logits set


X_trainBulk, X_valbulk, y_trainBulk, y_valBulk = train_test_split(
    X_train, y_train, test_size=0.15, random_state=1)

X_trainSingle, X_valSC, y_trainSingle, y_valSC = train_test_split(
    X_trainSC, y_trainSC, test_size=0.15, random_state=1)

for epoch in range(n_epochs):

    print("Epoch {}/{}".format(epoch + 1, n_epochs))
    for step in range(1, n_steps + 1):
        X_batch1, x_batch2, y_batch1, y_batch2 = random_batchMerge(X_trainBulk.values, X_trainSingle.values,
                                                                   y_trainBulk, y_trainSingle)

        with tf.GradientTape() as tape:
            y_pred = model([X_batch1, x_batch2], training=True)
            loss_values = loss(y_batch1, y_pred[0]) + loss(y_batch2, y_pred[1])
        gradients = tape.gradient(loss_values, model.trainable_weights)
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))

        # Update running metrics with both outputs (one update_state call per output)
        metricLoss.update_state(y_batch1, y_pred[0])
        metricLoss.update_state(y_batch2, y_pred[1])

        metricsAcc.update_state(y_batch1, y_pred[0])
        metricsAcc.update_state(y_batch2, y_pred[1])
        acc = metricsAcc.result()

        # Loss and accuracy
        train_loss_results.append(loss_values)
        train_accuracy_results.append(metricsAcc.result())

        readout = 'Epoch {}, Loss: {}, Accuracy: {}'
        print(readout.format(epoch + 1, loss_values,
                             acc * 100))

    # Reset accuracy state at the end of each epoch
    # (reset_states needs parentheses to actually run)
    metricsAcc.reset_states()
    metricLoss.reset_states()

    # Run a validation loop at the end of each epoch
    val_logits = model([X_valbulk.values, X_valSC.values], training=False)

    # Update val metrics (single update_state call per output)
    val_acc_metric.update_state(y_valBulk, val_logits[0])
    val_acc_metric.update_state(y_valSC, val_logits[1])
    val_accuracy = val_acc_metric.result()

    val_loss.update_state(y_valBulk, val_logits[0])
    val_loss.update_state(y_valSC, val_logits[1])
    val_logits_loss = val_loss.result()

    val_logits_loss_results.append(val_logits_loss)
    val_logits_accuracy_results.append(val_acc_metric.result())

    # Read out validation results
    print('val_logits loss: ', float(val_logits_loss),
          'val_logits acc: %s' % (float(val_accuracy * 100),))

    val_acc_metric.reset_states()
    val_loss.reset_states()

This should get you going! But let me know if you need further help (I am also learning this now, so I am still getting to grips with it). I am happy to talk by email!

@rebeen

rebeen commented Apr 8, 2020

@amjass12 rebeencs@gmail.com is my email; I am happy to talk as well to solve this problem.

Thank you and best regards
Rebeen

@SrikarNamburu

@amjass12 @rebeen
Hello,
I have two different datasets: images and audio.
I want to build a multi-input model with input 1 as images and input 2 as audio. I have attached the model architecture below. The input shapes are different for the two datasets.
[model architecture diagram attached]

When I try to fit the model, I receive the following error; any help would be appreciated.

train_y = np.concatenate([visual_y_train,audio_y_train])
val_y =  np.concatenate([visual_y_val, audio_y_val])

model.fit([visual_x_train, audio_x_train], train_y, validation_data=([visual_x_val, audio_x_val], val_y),epochs=40, batch_size=16)

ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(2994, 256, 256, 3), (997, 40, 1)]

@rebeen

rebeen commented Apr 15, 2020

I think you have two problems: first, the number of samples is different, and second, the shape of the samples is also different. Here we are trying to solve the first problem.

@rebeen

rebeen commented Apr 20, 2020

@SrikarNamburu take a look this code

from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPooling2D
from keras.layers.merge import concatenate

visible1 = Input(shape=(64,64,1))
conv11 = Conv2D(32, kernel_size=4, activation='relu')(visible1)
pool11 = MaxPooling2D(pool_size=(2, 2))(conv11)
conv12 = Conv2D(16, kernel_size=4, activation='relu')(pool11)
pool12 = MaxPooling2D(pool_size=(2, 2))(conv12)
flat1 = Flatten()(pool12)

visible2 = Input(shape=(32,32,3))
conv21 = Conv2D(32, kernel_size=4, activation='relu')(visible2)
pool21 = MaxPooling2D(pool_size=(2, 2))(conv21)
conv22 = Conv2D(16, kernel_size=4, activation='relu')(pool21)
pool22 = MaxPooling2D(pool_size=(2, 2))(conv22)
flat2 = Flatten()(pool22)

merge = concatenate([flat1, flat2])

hidden1 = Dense(10, activation='relu')(merge)
hidden2 = Dense(10, activation='relu')(hidden1)
output = Dense(1, activation='sigmoid')(hidden2)
model = Model(inputs=[visible1, visible2], outputs=output)

print(model.summary())
plot_model(model, to_file='multiple_inputs.png')
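
As a usage note (dummy shapes invented for illustration): to train this model with .fit, both inputs still need the same number of samples, e.g.

import numpy as np

model.compile(optimizer='adam', loss='binary_crossentropy')

X1 = np.random.rand(8, 64, 64, 1)         # input for visible1
X2 = np.random.rand(8, 32, 32, 3)         # input for visible2
y = np.random.randint(0, 2, size=(8, 1))  # one binary label per paired sample
model.fit([X1, X2], y, epochs=2, batch_size=4)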

@rebeen

rebeen commented Apr 20, 2020

[Quoting @amjass12's dual-input GradientTape example in full; see the comment above.]

@amjass12 So is this your example for when we have equal samples?

@amjass12
Author

@rebeen ,

The one with both datasets is for unequal samples: one dataset has shape (x, 5078 features) and the other (y, 5078 features), where x and y are unequal numbers of samples.

Hope that clears it up!

@rebeen

rebeen commented Apr 28, 2020

Thank you very much @amjass12

@Dnorious

Dnorious commented Apr 29, 2020

@amjass12 ,

I'm creating a multi-input model where I concatenate a CNN model and an LSTM model. The LSTM model contains the last 5 events and the CNN contains a picture of the last event. Both are organized so that each element k in the numpy arrays matches the 5 events and the corresponding picture, as do the output labels, which give the 'next' event that should be predicted by the model.

inputShape = (1,25088)
chanDim = -1
inputs = Input(shape=inputShape)
x = inputs
x = Dense(128)(x)
x = Activation("relu")(x)
x = BatchNormalization(axis=chanDim)(x)
x = Dropout(0.3)(x)
x = Flatten()(x)

x = Dense(64)(x)
x = Activation("relu")(x)
x = BatchNormalization(axis=chanDim)(x)
x = Dropout(0.1)(x)

x = Dense(10)(x)
x = Activation("relu")(x)
model_cnn = Model(inputs, x)

This creates the CNN model, and the following code represents the LSTM model

visible = Input(shape=(1,5))
hidden1 = LSTM(128)(visible)
hidden2 = Dense(64, activation='relu')(hidden1)
output = Dense(10, activation='relu')(hidden2)
model_lstm = Model(inputs=visible, outputs=output)

Now, when I combine these models and extend them with a simple dense layer to make a multiclass prediction over 14 classes, all the inputs match and I can concatenate the (None, 10) and (None, 10) into a (None, 20) for the MLP:

combinedInput = concatenate([model_lstm.output, model_cnn.output])
x = Dense(14, activation="softmax")(combinedInput)
model_mlp = Model(inputs=[model_lstm.input, model_cnn.input], outputs=x)

This all works fine until I try to compile the model, when it gives me an error concerning the input of the last dense layer of the MLP model:

ValueError: Error when checking target: expected dense_121 to have shape (14,) but got array with shape (1,)

Do you know how this is possible? If you need more information I'm happy to provide that

@rebeen

rebeen commented May 1, 2020

@amjass12 Hi Amir, I think we cannot have a different number of classes when we use your code, in addition to different dataset sizes? Have you tested whether it works when we have different numbers of classes?
Thank you

@amjass12
Author

amjass12 commented May 1, 2020

@rebeen

Hey, do you mean with a common output or 2 nodes (one output per dataset)?

The classes I have in both datasets are identical, but because they are inherently from different data sources, I have 2 output nodes (one for each dataset)... however, the merge layer is critical, because I need to understand which salient features are shared between the two datasets, as I want to understand what is concordant across both datasets....

Does this make sense?

@rebeen

rebeen commented May 1, 2020

Sorry, let us say we have two datasets: the first dataset has 4 classes and the second dataset has 5 classes, and the number of samples is also different; only the number of features is equal.

@amjass12
Author

amjass12 commented May 1, 2020

I haven't tried this, but it should be fine; I don't see why the number of classes should affect the initiation of training. I will add a dummy class to one of my datasets and test, as I haven't tried it before. Will let you know!

Apart from the classes, the training conditions for me are the same as yours: same number of features, different sample sizes...

@rebeen

rebeen commented May 1, 2020

Thank you, I am also investigating this and will keep you updated.
Please let me know when you check the different number of classes.

@amjass12
Author

amjass12 commented May 2, 2020

Hey @rebeen

I can confirm that different class sizes work fine: please see the attached architecture and metrics when run for 50 epochs :)

Let me know if you'd like to share code, but it's literally identical to the above.

As you can see, the final node on the left contains 20 classes while the one on the right contains 15 (y_train just clipped [0:15]). This is just dummy, meaningless data, but it works. Similarly, the left network contains e.g. 100 samples and the network on the right contains e.g. 80 samples.

So: different sample sizes (but the same feature number, as you can see) AND different class sizes at the end of the network (a minimal sketch of the two output heads follows).
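
For reference, relative to the shared-layer code posted earlier, the only change needed is the size of each output head; a minimal sketch (hypothetical class counts, reusing bulkConcat/scConcat from the code above):

# 20 classes on the left head, 15 on the right; everything upstream unchanged
finalLeftbulk = tf.keras.layers.Dense(
    20, activation='sigmoid', name='output_1Bulk')(bulkConcat)
finalRightSC = tf.keras.layers.Dense(
    15, activation='sigmoid', name='output_2SC')(scConcat)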

[screenshot and model diagram attached]

@rebeen

rebeen commented May 2, 2020

Thank you very much, yes, it works for me as well.

thanks for your effort

@19944180

I am trying to concatenate two sequential models, and I get the following error when I try to fit the model with two different datasets.

fit_history = final_model.fit(
    [train_generator, train_generator1],
    epochs=num_epochs,
    validation_data=[validation_generator, validation_generator1],
    verbose=1,
)
error: ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'tensorflow.python.keras.preprocessing.image.DirectoryIterator'>"}), <class 'NoneType'>

@rasheed790

rasheed790 commented May 19, 2021

Hi amjass12,
Thank you for providing the architecture diagram. Can you please advise on how to produce a classification_report or confusion_matrix with this architecture? I have built a similar architecture, but classification_report fails.

Thanks,
Rasheed

[Quoting @amjass12's comment above on different class sizes.]

@mohammad69h94

[Quoting @dabasajay's functional-API example from the first reply above.]

Hi, despite concatenating the two branches, I also need to obtain classification results for each branch separately during training of the concatenated model. Is this possible?
