
Convergence is really slow with copy task when sequence length is smaller #13

Closed
wangshaonan opened this issue Jul 6, 2018 · 2 comments


@wangshaonan

Hi,

I ran the copy task with the default parameters (controller_size=100, controller_layers=1, num_heads=1, sequence_width=8, sequence_min_len=1, sequence_max_len=20, memory_n=128, memory_m=20, batch_size=1), and the result is similar to the one in the notebook. However, when I reduced the sequence length (sequence_min_len=1, sequence_max_len=5), convergence is really slow (see the figure below), which is unexpected since shorter sequences should be easier to learn. Do you have any idea why this happens and how to train on shorter sequences properly? Any suggestions are welcome.

[figure_1: training curve showing the slow convergence with sequence_max_len=5]
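For concreteness, the copy task described above presents the network with a random binary sequence (width 8, here with lengths drawn from 1 to 5) followed by a delimiter, and asks it to reproduce that sequence. The sketch below is a minimal, hypothetical illustration of one such sample, not the repo's actual data pipeline:

import torch

def copy_task_sample(seq_width=8, min_len=1, max_len=5):
    # Hypothetical illustration of a single copy-task sample, not the
    # repo's actual dataloader.
    seq_len = torch.randint(min_len, max_len + 1, (1,)).item()
    # Random binary sequence the network has to reproduce.
    seq = torch.bernoulli(torch.full((seq_len, seq_width), 0.5))
    # Inputs carry one extra channel for an end-of-sequence delimiter bit.
    inp = torch.zeros(seq_len + 1, seq_width + 1)
    inp[:seq_len, :seq_width] = seq
    inp[seq_len, seq_width] = 1.0  # delimiter marks the end of the input
    return inp, seq  # the target is the original sequence

inp, target = copy_task_sample()
print(inp.shape, target.shape)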

@loudinthecloud
Owner

Interesting question. IMHO there are a few things that may affect this behavior.

First of all, the NTM is essentially trying to learn a "for loop", and showing it only short examples makes it harder for it to generalize.

Second, the network's parameters are used more frequently with larger sequence lengths, yielding more stable gradients and making it harder for the network to converge to a local minimum that basically memorizes the patterns.

Third, the capacity of the network (the number of parameters) plays a role as well: in the extreme, with a sequence length of 1 and a large capacity, the easiest thing for the network to do is to memorize the inputs instead of learning the rule.

I increased the batch size and decreased the NTM's memory size, and the network with sequence lengths of 1 to 5 converged in less than 30K training samples. There are still fluctuations since the gradients are not stable enough.

./train.py -psequence_max_len=5 -pbatch_size=2 -pmemory_n=20
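A rough back-of-envelope (my own assumption about how the training signal scales, not code from the repo) illustrates why the combination of short sequences and batch_size=1 gives each gradient update so little to work with, and how a larger batch compensates:

def steps_per_update(batch_size, min_len, max_len):
    # Average sequence length; the copy task runs the network over each
    # sequence roughly twice (once to read it, once to reproduce it).
    avg_len = (min_len + max_len) / 2
    return batch_size * 2 * avg_len

print(steps_per_update(1, 1, 20))  # defaults: ~21 steps per update
print(steps_per_update(1, 1, 5))   # short sequences: ~6 steps per update
print(steps_per_update(2, 1, 5))   # with batch_size=2: ~12 steps per update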

@wangshaonan
Author

Thank you for your reply.
