Why b_IJ is shared between single batch examples. #21

Closed
pkubik opened this issue Nov 7, 2017 · 8 comments

pkubik commented Nov 7, 2017

Forgive me if I got this wrong, but it seems like b_IJ is shared between all examples within a single batch (see the reduce_sum and the shape).

I didn't see any mention of batches in the paper, so I assumed that there is a separate set of b_IJ weights for every example. Why do you think it's better to share those variables?

Edit:
I've corrected the statement:

b_IJ are shared between all batches

to:

b_IJ are shared between all examples within a single batch

which is what I originally meant.
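(For context, a minimal sketch of the routing update being discussed, assuming the TF 1.x-era API the repo uses and the MNIST shapes from the paper: 1152 primary capsules routed to 10 digit capsules with 16-D outputs. The names u_hat, squash, and iter_routing are simplifications for illustration, not the repo's exact code.)

```python
import tensorflow as tf

def squash(s, axis=-2, eps=1e-9):
    # Standard capsule squashing nonlinearity along the vector axis.
    sq_norm = tf.reduce_sum(tf.square(s), axis=axis, keep_dims=True)
    return (sq_norm / (1. + sq_norm)) * s / tf.sqrt(sq_norm + eps)

# u_hat: [batch_size, 1152, 10, 16, 1]  -- prediction vectors, one set per example
# b_IJ:  [1, 1152, 10, 1, 1]            -- a single set of logits for the whole batch
for _ in range(iter_routing):
    c_IJ = tf.nn.softmax(b_IJ, dim=2)                          # coupling coefficients
    s_J = tf.reduce_sum(c_IJ * u_hat, axis=1, keep_dims=True)  # weighted sum over inputs
    v_J = squash(s_J)                                          # [batch_size, 1, 10, 16, 1]
    u_produce_v = tf.reduce_sum(u_hat * v_J, axis=3, keep_dims=True)
    # The reduce_sum over axis=0 collapses the batch dimension, pooling every
    # example's agreement into the same shared logits -- the behavior in question:
    b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
```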

AlexHex7 commented Nov 9, 2017

It seems that b_IJ is re-initialized to 0 for each batch; each time we invoke the call function, b_IJ is created as zeros. I'm not sure whether I'm right, as I haven't used TensorFlow in a long time.

b_IJ = tf.constant(np.zeros([1, input.shape[1].value, self.num_outputs, 1, 1], dtype=np.float32))

naturomics (Owner) commented Nov 9, 2017

@AlexHex7 @pkubik AlexHex7 is right: b_IJ is re-initialized to 0 at each batch; it's not shared between batches. Someone (not me) has done experiments on this problem and told me it does work that way.

pkubik (Author) commented Nov 9, 2017

Oh, sorry @AlexHex7 @naturomics, I didn't formulate it correctly. I meant that b_IJ is shared between all examples in a single batch. So I'm not sure whether it is correct to share b_IJ between different examples in the same batch.

@pkubik pkubik changed the title Why b_IJ is shared between batches. Why b_IJ is shared between single batch examples. Nov 9, 2017
pkubik (Author) commented Nov 9, 2017

To be more specific, what I suggest is to change the initialization of b_IJ to:

b_IJ = tf.constant(np.zeros([cfg.batch_size, input.shape[1].value, self.num_outputs, 1, 1], dtype=np.float32))

and remove the reduction from the last line of the routing inner loop:

b_IJ += u_produce_v
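(A sketch of that proposed per-example variant, under the same simplified shapes as above; cfg.batch_size, input.shape, and self.num_outputs come from the repo's code, while u_hat, squash, and iter_routing remain assumptions for illustration.)

```python
import numpy as np
import tensorflow as tf

# Leading dimension is batch_size instead of 1, so each example
# carries its own routing logits.
b_IJ = tf.constant(np.zeros([cfg.batch_size, input.shape[1].value,
                             self.num_outputs, 1, 1], dtype=np.float32))

for _ in range(iter_routing):
    c_IJ = tf.nn.softmax(b_IJ, dim=2)
    s_J = tf.reduce_sum(c_IJ * u_hat, axis=1, keep_dims=True)
    v_J = squash(s_J)
    u_produce_v = tf.reduce_sum(u_hat * v_J, axis=3, keep_dims=True)
    b_IJ += u_produce_v  # no reduce_sum over axis 0: logits stay per-example
```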

naturomics (Owner) commented

@pkubik I'm doing an experiment on this problem; please wait for the result.

naturomics (Owner) commented

@pkubik Now I agree with you, though it makes the number of b_IJ parameters batch_size-dependent, and the experiment shows it doesn't make much difference. Here is a related discussion of this problem; it might help us understand why.

Queequeg92 commented

Different samples, different objects, different entities, so different b_IJ. I'm wondering how much difference it makes, @naturomics? Did you run the experiment on MNIST?

naturomics (Owner) commented

@Queequeg92 Yeah, I did some experiments on MNIST. It doesn't seem to make much difference in terms of classification accuracy, so I didn't release the corresponding results. Maybe trying it on the Fashion-MNIST dataset mentioned in issue #20 will show a difference; I will try it soon.
