
How to implement a conv layer with different filter sizes (Zhang & Wallace 2015)? #6547

Closed
ben0it8 opened this issue May 8, 2017 · 13 comments

ben0it8 commented May 8, 2017

Hello,

I'm trying to reproduce the CNN architecture proposed in this paper: a single convolutional layer with several filter widths (two filters of each width in the figure), followed by global max pooling and dropout:
[screenshot: architecture diagram from the paper]

Is there a way to implement this architecture in Keras?

Best,
ben0it8

kgrm commented May 9, 2017

Apply different convolutional layers on the same input and merge their outputs?
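
A minimal sketch of this idea in the Keras functional API; the sequence length, embedding size, and filter count below are placeholders, not values from the paper:

from keras.models import Model
from keras.layers import Input, Conv1D, GlobalMaxPooling1D, concatenate

seq_len, emb_dim, n_filters = 100, 300, 64   # placeholder sizes
inp = Input(shape=(seq_len, emb_dim))        # already-embedded sequences
pooled = []
for kw in (3, 4, 5):                         # one Conv1D per filter width
    conv = Conv1D(n_filters, kw, activation='relu')(inp)
    pooled.append(GlobalMaxPooling1D()(conv))
merged = concatenate(pooled)                 # shape: (None, 3 * n_filters)
model = Model(inputs=inp, outputs=merged)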

fmailhot commented May 22, 2017

I'm in the middle of figuring this out myself. Here's what I think is necessary.

  1. You'll need to replicate your inputs across each of the input "channels" (i.e. for each filter width).
  2. You then do a "concatenate" merge after the GlobalMaxPooling1D on the Conv1D layer outputs (the diagram appears to show two "merges", but I don't believe the second one is necessary).

Have a look at the following for inspiration:
https://gist.github.com/ameasure/944439a04546f4c02cb9
https://statcompute.wordpress.com/2017/01/08/an-example-of-merge-layer-in-keras/

Let me know if you've made any progress, and I'll do the same.

@fmailhot

Here's what I ended up doing. It appears to do the right thing, but I'm still new enough to Keras that I haven't figured out how to introspect it properly to make sure...

# NOTE: this relies on the legacy Merge layer, which only exists in older
# Keras (e.g. 2.1.x; see the later comments in this thread about its removal).
from keras.models import Sequential
from keras.layers import (Embedding, Conv1D, GlobalMaxPooling1D,
                          Dense, Dropout, Activation, Merge)

submodels = []
for kw in (3, 4, 5):    # kernel sizes
    submodel = Sequential()
    # frozen, pre-trained word embeddings
    submodel.add(Embedding(len(word_index) + 1,
                           EMBEDDING_DIM,
                           weights=[embedding_matrix],
                           input_length=MAX_SEQUENCE_LENGTH,
                           trainable=False))
    submodel.add(Conv1D(FILTERS,
                        kw,
                        padding='valid',
                        activation='relu',
                        strides=1))
    submodel.add(GlobalMaxPooling1D())
    submodels.append(submodel)
big_model = Sequential()
big_model.add(Merge(submodels, mode="concat"))  # concatenate pooled features
big_model.add(Dense(HIDDEN_DIMS))
big_model.add(Dropout(P_DROPOUT))
big_model.add(Activation('relu'))
big_model.add(Dense(1))
big_model.add(Activation('sigmoid'))
print('Compiling model')
big_model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
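
On the introspection point: summary() prints layer-by-layer output shapes and parameter counts, and plot_model() can draw the graph. A quick check, assuming pydot and graphviz are installed for the second call:

big_model.summary()

from keras.utils import plot_model
plot_model(big_model, to_file='big_model.png', show_shapes=True)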

ben0it8 commented May 22, 2017

I was trying to fit your implementation but got:
ValueError: The model expects 3 input arrays, but only received one array. Found: array with shape (48943, 300)

Any idea?

@fmailhot

Yes, this is what I meant about "replicating the inputs"...sorry, I should have included the fit() call to clarify.

hist = big_model.fit([x_train, x_train, x_train],
                     y_train,
                     batch_size=BATCH_SIZE,
                     epochs=EPOCHS,
                     validation_data=([x_val, x_val, x_val], y_val),
                     callbacks=callbacks)

As you can see, I pass x_train and x_val as my training/validation inputs. Because I'm using three different filter sizes, the net expects three separate input streams; feeding a list containing the input NUM_KERNEL_SIZES times handles that.
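
If the repetition bothers you, the input list can be built programmatically so it stays in sync with the number of kernel sizes; an equivalent call:

kernel_sizes = (3, 4, 5)
hist = big_model.fit([x_train] * len(kernel_sizes),
                     y_train,
                     batch_size=BATCH_SIZE,
                     epochs=EPOCHS,
                     validation_data=([x_val] * len(kernel_sizes), y_val),
                     callbacks=callbacks)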

ben0it8 commented May 23, 2017

Thank you for sharing that!

stale bot commented Aug 22, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

nikicc commented Sep 11, 2017

The same problem seems to be addressed and solved in this issue using the Graph model.

@wt-huang

Closing as this is resolved.

@Yash-099

> Here's what I ended up doing, which appears to be doing the right thing... [fmailhot's code snippet, quoted above]

Regarding the code @fmailhot posted above: when I tried to run it, I got an error saying there is no layer named Merge().

@hazemAmir

I had the same problem with the Merge() layer. I solved it by downgrading Keras:
pip uninstall keras
pip install keras==2.1.2

@gamertrue

> Yes, this is what I meant about "replicating the inputs"... [fmailhot's fit() call, quoted above]

I tried this, but I got the error "list indices must be integers or slices, not ListWrapper". I used Concatenate instead of Merge... Does anyone have a solution to this issue?
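
In Keras versions where Merge is gone, the usual rewrite is the functional API with a Concatenate layer and a single shared Input, which also removes the need to replicate x_train. A sketch, reusing the names from fmailhot's snippet (EMBEDDING_DIM, word_index, etc. assumed defined as above):

from keras.models import Model
from keras.layers import (Input, Embedding, Conv1D, GlobalMaxPooling1D,
                          Concatenate, Dense, Dropout, Activation)

inp = Input(shape=(MAX_SEQUENCE_LENGTH,))
emb = Embedding(len(word_index) + 1,
                EMBEDDING_DIM,
                weights=[embedding_matrix],
                input_length=MAX_SEQUENCE_LENGTH,
                trainable=False)(inp)        # one shared, frozen embedding
pooled = []
for kw in (3, 4, 5):                         # kernel sizes
    conv = Conv1D(FILTERS, kw, padding='valid',
                  activation='relu', strides=1)(emb)
    pooled.append(GlobalMaxPooling1D()(conv))
x = Concatenate()(pooled)
x = Dense(HIDDEN_DIMS)(x)
x = Dropout(P_DROPOUT)(x)
x = Activation('relu')(x)
out = Dense(1, activation='sigmoid')(x)

big_model = Model(inputs=inp, outputs=out)
big_model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

# Single input, so no list replication:
# big_model.fit(x_train, y_train, ...)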
