
Is this Reshape step redundant? #3

Closed

hamelsmu opened this issue Jun 16, 2017 · 9 comments

Comments
@hamelsmu

See this line of code: https://github.com/philipperemy/keras-attention-mechanism/blob/master/attention_lstm.py#L19

Isn't this redundant? The Permute layer right before it already reshapes the tensor.

Let me know if I'm missing something. I am trying to understand attention, and so far your write-up is helping.

@hamelsmu
Author

@philipperemy

Also you don't need this line of code:
https://github.com/philipperemy/keras-attention-mechanism/blob/master/attention_lstm.py#L25

you can pass name='...' directly to any layer.

@philipperemy
Owner

philipperemy commented Jun 17, 2017

@hamelsmu Yes, the Reshape layer is redundant and does not add any value to the model (everything is already done by the Permute layer).

It's more to enforce the correct shape: the output of the Permute layer is reported as (?, ?), and by adding this Reshape layer we make the real shapes explicit (they are static and known at compile time). I wanted to reflect this idea of static shapes (vs. dynamic shapes).
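
For reference, a minimal sketch (not the repo code, just the two layers in question with the thread's TIME_STEPS=20 and input_dim=2) showing that the Reshape only restates the static shape that Permute already produces:

    from keras import backend as K
    from keras.layers import Input, Permute, Reshape

    TIME_STEPS, input_dim = 20, 2
    inputs = Input(shape=(TIME_STEPS, input_dim))   # static shape (None, 20, 2)
    a = Permute((2, 1))(inputs)                     # swap the last two axes -> (None, 2, 20)
    print(K.int_shape(a))
    a = Reshape((input_dim, TIME_STEPS))(a)         # same (None, 2, 20); only restates the static shape
    print(K.int_shape(a))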

@philipperemy
Owner

Thanks for your feedback! Highly appreciated!

a = Dense(TIME_STEPS, activation='softmax', name='attention_vec')(a)
if SINGLE_ATTENTION_VECTOR:
    a = Lambda(lambda x: K.mean(x, axis=1), name='attention_vec')(a)  # this is the attention vector!
    a = RepeatVector(input_dim)(a)

Is this what you meant? Removing the else clause and adding name='attention_vec' before the if?

@hamelsmu
Author

Yeah, that's right.

@philipperemy
Owner

philipperemy commented Jun 18, 2017

It would not work here because we would end up defining two different layers with the same name:

RuntimeError: The name "attention_vec" is used 2 times in the model. All layer names should be unique.
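
A minimal sketch (hypothetical layers, not the repo code) of how that error is triggered when two layers share the same name:

    from keras.layers import Input, Dense
    from keras.models import Model

    x = Input(shape=(20,))
    h = Dense(10, name='attention_vec')(x)
    y = Dense(1, name='attention_vec')(h)  # second layer reuses the same name
    m = Model(inputs=x, outputs=y)          # raises the RuntimeError quoted above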

@hamelsmu
Author

hamelsmu commented Jun 19, 2017

@philipperemy Right. However, I suppose you can say that the attention layer is a_probs, because that is the layer that gets multiplied with the inputs. So you can refactor it to look like this:

    a = Dense(TIME_STEPS, activation='softmax')(a)
    if SINGLE_ATTENTION_VECTOR:
        a = Lambda(lambda x: K.mean(x, axis=1), name='dim_reduction')(a) 
        a = RepeatVector(input_dim, name='time_repeat')(a)

    a_probs = Permute((2, 1), name='attention_vec')(a)
    output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')

@philipperemy changed the title from "Is this Reshpae step redundant?" to "Is this Reshape step redundant?" on Jun 19, 2017
@philipperemy
Owner

philipperemy commented Jun 19, 2017

Ok, seems good to me! The only thing is that attention_vec.shape will change from (1, 2, 20) to (1, 20, 2), where 20 is the number of time steps and 2 is the number of input dims. So we have to change the axis we aggregate on from 1 to 2, simply because we want to display the vector along the time axis.

attention_vector = np.mean(
    get_activations(
        m,
        testing_inputs_1,
        print_shape_only=True,
        layer_name='attention_vec')[0],
    axis=2).squeeze()

@philipperemy
Owner

philipperemy commented Jun 19, 2017

Let me know if it looks good to you:

PR: #4

@hamelsmu
Author

Thanks!
