Why b_IJ is shared between single batch examples. #21

Closed
pkubik opened this issue Nov 7, 2017 · 8 comments

pkubik commented Nov 7, 2017

Forgive me if I got this wrong, but it seems like b_IJ is shared between all examples within a single batch (see the reduce_sum and the shape).

I didn't see any mention of batches in the paper, so I assumed that there is a separate set of b_IJ weights for every example. Why do you think it's better to share those variables?

Edit:
I've corrected the statement:

b_IJ are shared between all batches

to:

b_IJ are shared between all examples within a single batch

which is what I originally meant.
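(For context, a minimal sketch of the routing update being discussed, assuming the TF 1.x-era API the repo uses and the MNIST shapes from the paper: 1152 primary capsules routed to 10 digit capsules with 16-D outputs. The names u_hat, squash, and iter_routing are simplifications for illustration, not the repo's exact code.)

```python
import tensorflow as tf

def squash(s, axis=-2, eps=1e-9):
    # Standard capsule squashing nonlinearity along the vector axis.
    sq_norm = tf.reduce_sum(tf.square(s), axis=axis, keep_dims=True)
    return (sq_norm / (1. + sq_norm)) * s / tf.sqrt(sq_norm + eps)

# u_hat: [batch_size, 1152, 10, 16, 1]  -- prediction vectors, one set per example
# b_IJ:  [1, 1152, 10, 1, 1]            -- a single set of logits for the whole batch
for _ in range(iter_routing):
    c_IJ = tf.nn.softmax(b_IJ, dim=2)                          # coupling coefficients
    s_J = tf.reduce_sum(c_IJ * u_hat, axis=1, keep_dims=True)  # weighted sum over inputs
    v_J = squash(s_J)                                          # [batch_size, 1, 10, 16, 1]
    u_produce_v = tf.reduce_sum(u_hat * v_J, axis=3, keep_dims=True)
    # The reduce_sum over axis=0 collapses the batch dimension, pooling every
    # example's agreement into the same shared logits -- the behavior in question:
    b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
```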

AlexHex7 commented Nov 9, 2017

It seems that b_IJ is re-initialized to 0 for each batch; each time we invoke the call function, b_IJ is created as zeros. I'm not sure whether I'm right, as I haven't used TensorFlow in a long time.

b_IJ = tf.constant(np.zeros([1, input.shape[1].value, self.num_outputs, 1, 1], dtype=np.float32))

naturomics (Owner) commented Nov 9, 2017

@AlexHex7 @pkubik AlexHex7 is right: b_IJ is re-initialized to 0 at each batch; it's not shared between batches. Someone (not me) has done experiments on this problem and told me it does work that way.

pkubik (Author) commented Nov 9, 2017

Oh, sorry @AlexHex7 @naturomics, I didn't formulate it correctly. I meant that b_IJ is shared between all examples in a single batch. So I'm not sure whether it is correct to share b_IJ between different examples in the same batch.

@pkubik pkubik changed the title Why b_IJ is shared between batches. Why b_IJ is shared between single batch examples. Nov 9, 2017
pkubik (Author) commented Nov 9, 2017

To be more specific, what I suggest is to change the initialization of b_IJ to:

b_IJ = tf.constant(np.zeros([cfg.batch_size, input.shape[1].value, self.num_outputs, 1, 1], dtype=np.float32))

and remove the reduction from the last line of the routing inner loop:

b_IJ += u_produce_v
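(A sketch of that proposed per-example variant, under the same simplified shapes as above; cfg.batch_size, input.shape, and self.num_outputs come from the repo's code, while u_hat, squash, and iter_routing remain assumptions for illustration.)

```python
import numpy as np
import tensorflow as tf

# Leading dimension is batch_size instead of 1, so each example
# carries its own routing logits.
b_IJ = tf.constant(np.zeros([cfg.batch_size, input.shape[1].value,
                             self.num_outputs, 1, 1], dtype=np.float32))

for _ in range(iter_routing):
    c_IJ = tf.nn.softmax(b_IJ, dim=2)
    s_J = tf.reduce_sum(c_IJ * u_hat, axis=1, keep_dims=True)
    v_J = squash(s_J)
    u_produce_v = tf.reduce_sum(u_hat * v_J, axis=3, keep_dims=True)
    b_IJ += u_produce_v  # no reduce_sum over axis 0: logits stay per-example
```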

naturomics (Owner) commented

@pkubik I'm doing an experiment on this problem; please wait for the result.

naturomics (Owner) commented

@pkubik Now I agree with you, though it makes the number of b_IJ parameters batch_size-dependent, and the experiment shows it doesn't make much difference. Here is a related discussion of this problem; it might help us understand why.

Queequeg92 commented

Different samples, different objects, different entities, so different b_IJ. I'm wondering how much difference it makes, @naturomics? Did you run the experiment on MNIST?

naturomics (Owner) commented

@Queequeg92 Yeah, I did some experiments on MNIST. It doesn't seem to make much difference in terms of classification accuracy, so I didn't release the corresponding results. Maybe trying it on the Fashion-MNIST dataset mentioned in issue #20 will show a difference; I will try it soon.
