Implementation of Neural Turing Machine. #1072
After a million tries to fix the gradients, here is the implementation that passed the gradient checks. :)
This implementation introduces the concept of memory for the first time in our framework, so some new 'Forward' and 'Backward' signatures are introduced (backward-compatible) to deal with this extra memory parameter; a sketch is given below.
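A hedged sketch of what the extended signatures could look like, following the move-semantics style mlpack layers use elsewhere; the class name and exact parameter order are illustrative, not the PR's final API:

```cpp
#include <armadillo>

// Sketch: the standard layer signatures extended with one extra
// matrix for the memory content and its gradient.
class MemoryAwareLayer  // hypothetical name
{
 public:
  template<typename eT>
  void Forward(const arma::Mat<eT>&& input,
               const arma::Mat<eT>&& memory,
               arma::Mat<eT>&& output);

  template<typename eT>
  void Backward(const arma::Mat<eT>&& input,
                const arma::Mat<eT>&& memory,
                const arma::Mat<eT>&& gy,    // gradient from above
                arma::Mat<eT>&& g,           // gradient w.r.t. input
                arma::Mat<eT>&& gMemory);    // gradient w.r.t. memory
};
```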
I found out that layers that deal with memory are harder to debug, because the gradient comes back in two directions: input and memory. To deal with this, I have added a new layer called 'MemoryTest' which fakes memory content with a linear layer and checks the gradient w.r.t. both the input and the memory. Thus all the layers that accept memory can be checked individually before adding them into a bigger structure such as NTM, which simulates memory with a combination of these layers.
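To illustrate the kind of check this enables, here is a minimal central-finite-difference sketch, not the PR's actual test code; `f` stands in for a scalar loss computed through a layer's forward pass, and all names are illustrative:

```cpp
#include <armadillo>
#include <functional>

// Numerical gradient of f w.r.t. one entry of 'input', holding the
// faked memory fixed; compare this against the layer's analytic
// gradient to validate the backward pass.
double NumericalGradient(
    const std::function<double(const arma::mat&, const arma::mat&)>& f,
    arma::mat& input,
    const arma::mat& memory,
    const size_t i,
    const double eps = 1e-6)
{
  const double old = input(i);
  input(i) = old + eps;
  const double fPlus = f(input, memory);
  input(i) = old - eps;
  const double fMinus = f(input, memory);
  input(i) = old;
  return (fPlus - fMinus) / (2 * eps);
}
// Swapping the roles of 'input' and 'memory' checks the second
// gradient direction mentioned above.
```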
The NTM implementation is broken down into four main modules.
This modularization will allow us to implement networks more complicated than NTM in the future, for example an NTM with multiple read and write heads. The best example of this is the Differentiable Neural Computer.
One major thing that is missing is a design for different initializations of the memory. Currently the memory is initialized as constant ones, but according to the paper it should be initialized with a bias layer so that the network also learns the correct initialization. The framework for that is actually similar to the one used in the MemoryTest layer, but some design decisions are needed; a rough sketch of the idea is below.
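Roughly, the difference between the two schemes could look like this; a hedged sketch of the idea only, with made-up names, not the PR's API:

```cpp
#include <armadillo>

// Current behaviour: the memory starts as constant ones.
arma::mat ConstantMemory(const size_t memSize, const size_t memDim)
{
  return arma::ones(memSize, memDim);
}

// Paper's behaviour: the initial memory is itself a trainable
// parameter (a bias), so the gradient that reaches the memory at the
// first time step updates it during training.
struct LearnableMemoryInit
{
  arma::mat initialMemory;  // registered with the network's weights

  // Copied into the working memory at the start of each sequence; in
  // the backward pass, the gradient w.r.t. the memory at t = 0 is
  // accumulated into the gradient of initialMemory.
  arma::mat Reset() const { return initialMemory; }
};
```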
I still need to go through the implementation once more to try to optimize wherever possible, because until now I have been paying more attention to a correct implementation rather than the fastest one. This PR would help with that as well.
Found another bug in the gradients of the Memory Head while testing the Copy Task. Maybe that was the reason for the failing tests on Travis, although the gradient tests are passing on my system; I don't know the reason for that. Will update as soon as it is resolved...
The gradient tests for NTM are failing online, but on my machine they are passing. I checked the gradients of NTM again but could not find anything. The tests I am running with the copy task are performing correctly: I am getting above 70% accuracy with a sequence length of 10. But the network could be learning to compensate for the wrong gradients. @zoq Can you take a quick look at my gradients in the NeuralTuringMachine class?
Looks like this one already has a lot of neat stuff implemented!
It would be really neat, however, if one could also use MSE error (e.g.,
I'm really excited to see this PR start to come together. It seems like maybe some tests (like the gradient tests for NTM) are not checked in; maybe I overlooked something?
I made some other comments; probably each of them will need to be discussed a bit. Sorry if that takes a long time but I think it will be helpful in the end. :)
@partobs-mdp In the case of NTM, the parameters passed to the FFN object used as the controller are dummy parameters; the parameters passed to the base RNN class are used instead to initialize the controller network. So in my design, NTM is just a layer which is added to an RNN. The user can then add multiple layers on top of that and use the appropriate error in the RNN class, as sketched below.
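A hypothetical usage sketch of that design; the `NTM<>` constructor arguments, the memory settings, and the choice of `MeanSquaredError<>` as the output layer are all assumptions, not the PR's final API:

```cpp
#include <mlpack/methods/ann/rnn.hpp>

using namespace mlpack::ann;

const size_t rho = 10;  // BPTT sequence length
const size_t inSize = 8, controllerSize = 64, outSize = 8;

RNN<MeanSquaredError<> > model(rho);

// NTM as a plain layer; the RNN's parameters initialize its internal
// controller, so any parameters given to the controller FFN itself
// are dummies.
model.Add<NTM<> >(inSize, controllerSize /*, memory settings */);
model.Add<Linear<> >(controllerSize, outSize);
model.Add<SigmoidLayer<> >();
```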
I am not sure about the design that you are using. Can you tell me the details of your design? Maybe I can help you.
@zoq I like the idea of creating a visual representation of the recursive grammar. It would make an amazing blog post. I also need to write those scripts for models to get the minimum cell state size required versus the average recursive depth. I am so looking forward to that. Although this M.Sc. dissertation submission is keeping me away from these pleasures. 4 more days and I will be back with some fun experiments :)
@sumedhghaisas: HAM mostly follows the same design (using FFNs for running some sub-steps), but is more complicated because it uses five different FFNs: one for embedding sequence vectors into memory, one for traversing the memory (which is tree-based, the noteworthy part of HAM), one for replacing memory contents, one for evaluating inner memory nodes from their children, and one for emitting the output sequence.
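For illustration, the five controllers described above might be held together roughly like this; a sketch under assumed names, not the actual design being discussed:

```cpp
#include <mlpack/methods/ann/ffn.hpp>

using namespace mlpack::ann;

// Illustrative container for HAM's five sub-networks.
template<typename OutputLayerType, typename InitializationRuleType>
struct HAMUnit
{
  FFN<OutputLayerType, InitializationRuleType> embed;   // sequence vector -> memory leaf
  FFN<OutputLayerType, InitializationRuleType> search;  // traverse the binary memory tree
  FFN<OutputLayerType, InitializationRuleType> write;   // replace a leaf's contents
  FFN<OutputLayerType, InitializationRuleType> join;    // evaluate an inner node from its children
  FFN<OutputLayerType, InitializationRuleType> output;  // emit an output vector
};
```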
@rcurtin Before (unsuccessfully) trying to migrate to
As a follow-up: can I use
I think both strategies have their advantages. As pointed out by @partobs-mdp, the class definition is much cleaner, but on the other hand we are more restricted, since we have to add each type to the layer types. Then again, we could provide reasonable default values for
About the visitor: if we know the type, we don't have to use any visitor; we can call the function directly.
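A toy illustration of that point, in the style of mlpack's boost::variant-based LayerTypes; the layer types and visitor here are made up for the example:

```cpp
#include <boost/variant.hpp>

struct Linear  { void Forward() { /* ... */ } };
struct Sigmoid { void Forward() { /* ... */ } };

// Type-erased storage, as in mlpack's LayerTypes.
using LayerTypes = boost::variant<Linear*, Sigmoid*>;

struct ForwardVisitor : public boost::static_visitor<void>
{
  template<typename LayerType>
  void operator()(LayerType* layer) const { layer->Forward(); }
};

void Run(LayerTypes anyLayer, Linear* knownLayer)
{
  // Type unknown at compile time: dispatch through a visitor.
  boost::apply_visitor(ForwardVisitor(), anyLayer);

  // Type known at compile time: call the function directly.
  knownLayer->Forward();
}
```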