
Convergence is really slow with copy task when sequence length is smaller #13

Closed
wangshaonan opened this issue Jul 6, 2018 · 2 comments


@wangshaonan

Hi,

I ran the copy task with the default parameters (controller_size=100, controller_layers=1, num_heads=1, sequence_width=8, sequence_min_len=1, sequence_max_len=20, memory_n=128, memory_m=20, batch_size=1), and the result is similar to the one in the notebook. However, when I reduced the sequence length (sequence_min_len=1, sequence_max_len=5), convergence is really slow (see the figure below), which is unexpected since shorter sequences should be easier to learn. Do you have any idea why this happens and how to train on shorter sequences properly? Any suggestions are welcome.

[figure_1: training curve showing the slow convergence with sequence_max_len=5]
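For concreteness, the copy task described above presents the network with a random binary sequence (width 8, here with lengths drawn from 1 to 5) followed by a delimiter, and asks it to reproduce that sequence. The sketch below is a minimal, hypothetical illustration of one such sample, not the repo's actual data pipeline:

import torch

def copy_task_sample(seq_width=8, min_len=1, max_len=5):
    # Hypothetical illustration of a single copy-task sample, not the
    # repo's actual dataloader.
    seq_len = torch.randint(min_len, max_len + 1, (1,)).item()
    # Random binary sequence the network has to reproduce.
    seq = torch.bernoulli(torch.full((seq_len, seq_width), 0.5))
    # Inputs carry one extra channel for an end-of-sequence delimiter bit.
    inp = torch.zeros(seq_len + 1, seq_width + 1)
    inp[:seq_len, :seq_width] = seq
    inp[seq_len, seq_width] = 1.0  # delimiter marks the end of the input
    return inp, seq  # the target is the original sequence

inp, target = copy_task_sample()
print(inp.shape, target.shape)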

@loudinthecloud
Owner

Interesting question. IMHO there are a few things that may affect this behavior.

First of all, the NTM is essentially trying to learn a "for loop", and showing it only short examples makes it harder for it to generalize.

Second, the network's parameters are used more frequently with larger sequence lengths, yielding more stable gradients and making it harder for the network to converge to a local minimum that basically memorizes the patterns.

Third, the capacity of the network (the number of parameters) plays a role as well: in the extreme, with a sequence length of 1 and a large capacity, the easiest thing for the network to do is to memorize the inputs instead of learning the rule.

I increased the batch size and decreased the NTM's memory size, and the network with sequence lengths of 1 to 5 converged in less than 30K training samples. There are still fluctuations since the gradients are not stable enough.

./train.py -psequence_max_len=5 -pbatch_size=2 -pmemory_n=20
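A rough back-of-envelope (my own assumption about how the training signal scales, not code from the repo) illustrates why the combination of short sequences and batch_size=1 gives each gradient update so little to work with, and how a larger batch compensates:

def steps_per_update(batch_size, min_len, max_len):
    # Average sequence length; the copy task runs the network over each
    # sequence roughly twice (once to read it, once to reproduce it).
    avg_len = (min_len + max_len) / 2
    return batch_size * 2 * avg_len

print(steps_per_update(1, 1, 20))  # defaults: ~21 steps per update
print(steps_per_update(1, 1, 5))   # short sequences: ~6 steps per update
print(steps_per_update(2, 1, 5))   # with batch_size=2: ~12 steps per update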

@wangshaonan
Author

Thank you for your reply.
