Implementing simple neural attention model (for padded inputs) #2612

Closed
anayebi opened this Issue May 4, 2016 · 5 comments


anayebi commented May 4, 2016

I have a variable-length input sequence (so I used the Keras pad_sequences function to zero-pad it), and I want to average the outputs of my RNN over the timesteps and concatenate that average with the RNN's output at the last timestep, using the concatenation as the basis for the model's prediction.

Here is my code to better visualize what I mean (I'm using the Keras 1.0 Model class):

from keras.layers import Input, Embedding, LSTM, TimeDistributedDense, Lambda, merge

input_sentences = Input(shape=(max_sen_length,), dtype='int32')
x = Embedding(input_dim=vocab_size, output_dim=vocab_dim, input_length=max_sen_length, mask_zero=True)(input_sentences)
lstm_out = LSTM(150, return_sequences=True)(x)
attention = TimeDistributedDense(32, activation='softmax')(lstm_out)
context = Lambda(avg_timesteps, output_shape=avg_timesteps_output_shape)(attention)  # average over timesteps
last_lstm_out = Lambda(last_timestep, output_shape=last_timestep_output_shape)(lstm_out)  # output at the final timestep
concat = merge([last_lstm_out, context], mode='concat')  # lowercase merge() for tensors in the functional API

Note that avg_timesteps and last_timestep were custom functions I wrote to work with the Lambda layers (I won't include them here for brevity).

The issue I am running into is that Lambda layers do not support masking, so I was not able to get the above code to work. Is there a way to implement my context layer (the average over timesteps of the TimeDistributedDense outputs) in a mask-aware way?

I am aware of the various Keras threads about implementing neural attention, but this is a relatively simple version of it: since my model's overall prediction is based on the last timestep, I just want to average the outputs up to that point and concatenate the average with the RNN's last output, last_lstm_out, following Section 3.1 of this paper. I figure this is probably doable, so I'd be happy to hear anyone's thoughts. Thanks!
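To make it concrete, this is roughly the masked average I have in mind, written as a custom layer (an untested sketch against the Keras 1.0 Layer/backend API; MaskedMean is just a placeholder name I'm using here):

from keras import backend as K
from keras.engine.topology import Layer

class MaskedMean(Layer):
    # averages a (batch, timesteps, features) tensor over timesteps,
    # ignoring the zero-padded (masked) positions
    def __init__(self, **kwargs):
        self.supports_masking = True
        super(MaskedMean, self).__init__(**kwargs)

    def compute_mask(self, x, mask=None):
        # the time dimension is collapsed, so no mask is passed downstream
        return None

    def get_output_shape_for(self, input_shape):
        return (input_shape[0], input_shape[2])

    def call(self, x, mask=None):
        if mask is None:
            return K.mean(x, axis=1)
        mask = K.expand_dims(K.cast(mask, K.floatx()))  # (batch, timesteps, 1)
        return K.sum(x * mask, axis=1) / (K.sum(mask, axis=1) + K.epsilon())

In the code above, context would then become MaskedMean()(attention) instead of the Lambda call.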

anayebi changed the title from "Averaging output of RNN and then concatenating it with final prediction (for padded inputs)" to "Implementing simple neural attention model (for padded inputs)" on May 4, 2016

braingineer commented May 5, 2016

I have one here that supports masking:

https://gist.github.com/braingineer/27c6f26755794f6544d83dec2dd27bbb

Note that after attention there's no need for a mask (the masked dimension has been collapsed).

braingineer commented May 5, 2016

If you need access to both the probability distribution AND the attended output, you can build the ProbabilityTensor from that gist and then use it with:

from keras import backend as K
from keras.engine.topology import Layer

class EZAttend(Layer):
    def __init__(self, p_tensor, *args, **kwargs):
        # p_tensor is the attention distribution over timesteps, shape (batch, timesteps)
        self.supports_masking = True
        self.p_tensor = p_tensor
        super(EZAttend, self).__init__(*args, **kwargs)

    def compute_mask(self, x, mask=None):
        # the timestep dimension is summed out, so no mask is propagated
        return None

    def get_output_shape_for(self, input_shape):
        # drop the timestep dimension from the input shape
        last_dim = K.ndim(self.p_tensor)
        output_shape = list(input_shape)
        output_shape.pop(last_dim - 1)
        return tuple(output_shape)

    def call(self, target_tensor, mask=None):
        # broadcast the (batch, timesteps) weights across the feature dimension,
        # then take the weighted sum over timesteps
        last_dim = K.ndim(self.p_tensor)
        expanded_p = K.repeat_elements(K.expand_dims(self.p_tensor, last_dim),
                                       K.shape(target_tensor)[last_dim],
                                       axis=last_dim)
        return K.sum(expanded_p * target_tensor, axis=last_dim - 1)
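A quick sanity check of what this computes (assuming the class above is in scope; note that repeat_elements with a symbolic repeat count may only work on the Theano backend of that era):

import numpy as np
from keras import backend as K

# one batch entry, three timesteps, two features
target = K.variable(np.arange(6, dtype='float32').reshape(1, 3, 2))
# put all of the attention mass on timestep 0
p = K.variable(np.array([[1.0, 0.0, 0.0]], dtype='float32'))

print(K.eval(EZAttend(p).call(target)))  # [[0. 1.]] -- the features of timestep 0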
braingineer commented May 5, 2016

Also, I have a Lambda layer with masking that I haven't had time to open a PR for (I added masking to a lot of the core.py layers because I do a lot of masked sequence work). This one comes with no guarantees:

https://gist.github.com/braingineer/b64ca35223c7782667984d34ddb7a7fa
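The core of the idea is roughly the following (just an illustrative sketch, not the gist's actual code; MaskedLambda and mask_function are placeholder names):

from keras.layers import Lambda

class MaskedLambda(Lambda):
    # a Lambda that declares mask support and forwards (or transforms)
    # the incoming mask instead of refusing masked input
    def __init__(self, function, mask_function=None, **kwargs):
        super(MaskedLambda, self).__init__(function, **kwargs)
        self.supports_masking = True
        self.mask_function = mask_function

    def compute_mask(self, x, mask=None):
        if self.mask_function is None:
            return mask  # pass the mask through unchanged
        return self.mask_function(x, mask)

See the gist for the full version.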

anayebi commented May 5, 2016

Thanks, @braingineer, I will check these out!

anayebi closed this on May 5, 2016

anayebi commented Jul 25, 2016

@braingineer thanks again, your implementation was very helpful, and I was able to use it (citing your code, of course) in a course project to implement a new recurrent attention model, found here: https://github.com/anayebi/CS-224U-Project

The file softattention.py in that repo uses the code you posted and adds further custom layers for neural attention (in particular, recurrent/LSTM-like attentional layers).
