
A question #8

Closed
Melika-Ayoughi opened this issue Jun 12, 2019 · 7 comments

@Melika-Ayoughi

I have a question regarding your implementation:
As I understood, the original convolutional LSTM formulation is as follows:

$$
\begin{aligned}
i_t &= \sigma(W_{xi} * x_t + W_{hi} * h_{t-1} + W_{ci} \circ c_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} * x_t + W_{hf} * h_{t-1} + W_{cf} \circ c_{t-1} + b_f) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh(W_{xc} * x_t + W_{hc} * h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} * x_t + W_{ho} * h_{t-1} + W_{co} \circ c_t + b_o) \\
h_t &= o_t \circ \tanh(c_t)
\end{aligned}
$$

where $*$ denotes convolution and $\circ$ the Hadamard product.

But in your implementation, you used only one convolution layer. I don't understand how these two correspond to each other, because in the formulation c is only used in Hadamard products and never in convolutions, but here c and h are both used in convolutions.
In fact, all weights are shared across all 4 formulas, although there are 11 weights in the original formulation.

@DavideA
Collaborator

DavideA commented Jun 12, 2019

Hi @Melika-Ayoughi,

and thank you for your interest.

I'm not sure I understood your concern.
Is it possibly a duplicate of this issue?

Please forgive me if it's not.

D

@Melika-Ayoughi
Author

No, it's not. I don't have a problem with the order.
What you are doing is:
i_t = sigmoid(W * (x_t, h_{t-1}, c_{t-1}))
f_t = sigmoid(W * (x_t, h_{t-1}, c_{t-1}))

So first of all, W is the same for all lines, so instead of having W_xi, W_xf, W_hi, W_hf, ... you have one W.
Second of all, the Hadamard products are not modelled (for instance $W_{ci} \circ c_{t-1}$).

@DavideA
Collaborator

DavideA commented Jun 12, 2019

Ok, I think I see your point now.

This repo implements the following dynamic:

$$
\begin{aligned}
i_t &= \sigma(W_{xi} * x_t + W_{hi} * h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} * x_t + W_{hf} * h_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} * x_t + W_{ho} * h_{t-1} + b_o) \\
g_t &= \tanh(W_{xg} * x_t + W_{hg} * h_{t-1} + b_g) \\
c_t &= f_t \circ c_{t-1} + i_t \circ g_t \\
h_t &= o_t \circ \tanh(c_t)
\end{aligned}
$$

Which is significantly different from the one you reported. Thank you for pointing it out; we should mention it in the README.

D

@Melika-Ayoughi
Author

Great! I still don't understand how your implementation has all these weights and not just one weight. Can you explain that? Because you only convolve x_t and W once, so I assumed there would be only one W, not 8!

@DavideA
Copy link
Collaborator

DavideA commented Jun 12, 2019

Sure!

It's a common trick when dealing with the LSTM family.

  1. As convolution is a linear operation, you can merge the two convolutions for each gate (horizontally, reading the equations) by concatenating x_t and h_{t-1} and providing the concatenated tensor as input to a single convolution. This leaves us with 4 convolutions.
  2. Now, since each output channel is computed independently (and each of the 4 convolutions acts on the same input tensor, i.e. the concatenated tensor of point 1), you can actually do a single convolution and split the channels afterwards. This mechanic becomes clearer by looking at the output channels of the conv (they are `4 * self.hidden_dim`) and the line in which they are split:

`cc_i, cc_f, cc_o, cc_g = torch.split(combined_conv, self.hidden_dim, dim=1)`
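
To make the two steps concrete, here is a minimal, self-contained sketch of such a merged-convolution cell (hypothetical names, not necessarily this repo's exact code):

```python
import torch
import torch.nn as nn

class MergedConvLSTMCell(nn.Module):
    """Sketch: all four gate convolutions merged into a single Conv2d."""

    def __init__(self, input_dim, hidden_dim, kernel_size=3):
        super().__init__()
        self.hidden_dim = hidden_dim
        # Point 1: the conv reads the concatenation of x_t and h_{t-1},
        # so its weight holds W_x* and W_h* stacked along the input channels.
        # Point 2: it emits 4 * hidden_dim output channels, one slice per gate.
        self.conv = nn.Conv2d(input_dim + hidden_dim, 4 * hidden_dim,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h_prev, c_prev = state
        combined = torch.cat([x, h_prev], dim=1)      # merge the two inputs
        combined_conv = self.conv(combined)           # one conv for all gates
        cc_i, cc_f, cc_o, cc_g = torch.split(
            combined_conv, self.hidden_dim, dim=1)    # split channels per gate
        i = torch.sigmoid(cc_i)
        f = torch.sigmoid(cc_f)
        o = torch.sigmoid(cc_o)
        g = torch.tanh(cc_g)
        c_next = f * c_prev + i * g
        h_next = o * torch.tanh(c_next)
        return h_next, c_next
```

This works because convolution is linear in its input channels: convolving the concatenation $[x_t; h_{t-1}]$ with a kernel $[W_x, W_h]$ gives exactly $W_x * x_t + W_h * h_{t-1}$, and the four channel slices of the output behave exactly as if each gate had its own convolution.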

Does this make sense?

D

@NickLucche

> I have a question regarding your implementation:
> As I understood, the original convolutional LSTM formulation is as follows: [screenshot of the equations quoted above]
>
> But in your implementation, you used only one convolution layer. I don't understand how these two correspond to each other, because in the formulation c is only used in Hadamard products and never in convolutions, but here c and h are both used in convolutions. In fact, all weights are shared across all 4 formulas, although there are 11 weights in the original formulation.

The reported formulation of the ConvLSTM was first introduced in this paper (I believe): https://arxiv.org/pdf/1506.04214.pdf.
The real difference between this implementation and the reported formulation lies in the fact that the paper 'extends' a variant of the LSTM where each gate gets a contribution from the past cell state.
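
For concreteness, a minimal sketch (hypothetical class, not this repo's code) of how the paper's peephole terms could be added on top of the merged convolution. Since they enter as Hadamard products rather than convolutions, they cannot be folded into the single conv and need their own parameters:

```python
import torch
import torch.nn as nn

class PeepholeConvLSTMCell(nn.Module):
    """Sketch: merged-conv ConvLSTM cell plus the paper's peephole terms."""

    def __init__(self, input_dim, hidden_dim, height, width, kernel_size=3):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.conv = nn.Conv2d(input_dim + hidden_dim, 4 * hidden_dim,
                              kernel_size, padding=kernel_size // 2)
        # Peephole weights with the same shape as the cell state, as in the paper.
        self.w_ci = nn.Parameter(torch.zeros(1, hidden_dim, height, width))
        self.w_cf = nn.Parameter(torch.zeros(1, hidden_dim, height, width))
        self.w_co = nn.Parameter(torch.zeros(1, hidden_dim, height, width))

    def forward(self, x, state):
        h_prev, c_prev = state
        cc_i, cc_f, cc_o, cc_g = torch.split(
            self.conv(torch.cat([x, h_prev], dim=1)), self.hidden_dim, dim=1)
        i = torch.sigmoid(cc_i + self.w_ci * c_prev)  # peephole on c_{t-1}
        f = torch.sigmoid(cc_f + self.w_cf * c_prev)  # peephole on c_{t-1}
        c_next = f * c_prev + i * torch.tanh(cc_g)
        o = torch.sigmoid(cc_o + self.w_co * c_next)  # the paper uses c_t here
        h_next = o * torch.tanh(c_next)
        return h_next, c_next
```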

@Weiming-Hu

Hi, I'm curious: are there any particular reasons for implementing this variant of the LSTM model rather than the original one from the paper https://arxiv.org/pdf/1506.04214.pdf? I'm specifically asking about using $c_{t-1}$ for $o_t$, rather than $c_t$ as in the paper.

Thank you!
