
Add readout_activation param to models #199

Closed

Conversation

gibrown commented Jun 30, 2017

This makes it possible to avoid getting stuck in a NaN loss hole during training. This workaround lets the user fix #189

Example usage:

model = Seq2Seq(input_dim=in_dim, input_length=MAXLENGTH, hidden_dim=HIDDEN_SIZE,
                output_length=MAXLENGTH, output_dim=out_dim, depth=LAYERS,
                peek=True, readout_activation='softmax')

I do not fully understand the implications of using softmax as the output activation, but in my own project (https://github.com/Automattic/wp-translate) setting it with this code does seem to have gotten me past getting stuck with NaN during training.

- setting the readout_activation to something like softmax can avoid NaN losses during training
- see farizrahman4u#189
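
For context, here is roughly how this fits into a full training run in my project. This is a minimal sketch: the sizes, optimizer, and toy data are placeholders, not part of this patch.

import numpy as np
from seq2seq.models import Seq2Seq

# Placeholder sizes, not part of this patch
MAXLENGTH, HIDDEN_SIZE, LAYERS = 20, 128, 2
in_dim, out_dim = 50, 50  # one-hot vocabulary sizes

model = Seq2Seq(input_dim=in_dim, input_length=MAXLENGTH, hidden_dim=HIDDEN_SIZE,
                output_length=MAXLENGTH, output_dim=out_dim, depth=LAYERS,
                peek=True, readout_activation='softmax')

# categorical_crossentropy expects probability-like outputs, which is why a
# softmax readout seems to help with the NaN losses reported in #189
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Toy one-hot data, just to show the expected 3D shapes
X = np.zeros((32, MAXLENGTH, in_dim)); X[:, :, 0] = 1
Y = np.zeros((32, MAXLENGTH, out_dim)); Y[:, :, 0] = 1
model.fit(X, Y, epochs=1)
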
@ChristopherLu

Does this change include model.add(TimeDistributed(Dense(output_dim)))?


gibrown commented Jul 3, 2017

@ChristopherLu no, it only lets you change the activation function of the decoder output.
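
If it helps clarify the difference: the plain-Keras equivalent of adding that readout yourself would look roughly like the sketch below. This is just illustrative, not what this patch does, and the sizes are placeholders.

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

MAXLENGTH, HIDDEN_SIZE, in_dim, out_dim = 20, 128, 50, 50  # placeholders

model = Sequential()
model.add(LSTM(HIDDEN_SIZE, input_shape=(MAXLENGTH, in_dim)))     # encoder
model.add(RepeatVector(MAXLENGTH))                                 # repeat encoding for each output step
model.add(LSTM(HIDDEN_SIZE, return_sequences=True))                # decoder
model.add(TimeDistributed(Dense(out_dim, activation='softmax')))   # explicit softmax readout
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
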

BTW, I have since found that this change did not completely solve my problem. Training ran for longer, but I still eventually hit NaN losses. I am not sure yet whether the problem is in the data I am feeding the model or in how the network itself fits together.

@ChristopherLu

@gibrown Exactly. I eventually hit the NaN problem as the number of training epochs went up. I suspect it's a vanishing-gradient problem, and it also depends on the learning rate you set.

So can we say that the softmax activation only alleviates the NaN problem, rather than solving it at the root?
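
One thing that might push the NaN further out (untested on my side; the learning rate and clip value below are arbitrary) is clipping gradients and lowering the learning rate, e.g.:

from keras.optimizers import RMSprop
from seq2seq.models import Seq2Seq

# Untested idea, arbitrary values: clip gradients and use a smaller learning
# rate so a single bad batch can't push the weights to NaN.
model = Seq2Seq(input_dim=50, input_length=20, hidden_dim=128, output_length=20,
                output_dim=50, depth=2, peek=True, readout_activation='softmax')
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=1e-4, clipnorm=1.0))
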


gibrown commented Jul 7, 2017

@ChristopherLu yes, that is my conclusion. Based on this thread and the common problems across applications, I am guessing that something inherent in how the pieces of the model fit together makes it easy to end up in this condition with some data.

The other possibility is that I have a bug in generating my training data, but I've been looking at that for a while and haven't found it. My next plan (when I get back to this, probably in a few weeks) is to try the TensorFlow seq2seq directly: https://www.tensorflow.org/tutorials/seq2seq

I had tried that method in the past (about a year ago) but was unable to get it to work. I think it has been significantly updated since then, and that tutorial looks improved. Maybe that model could be ported into this lib if it works.

@ChristopherLu

@gibrown Thanks. I'm about to try the TF seq2seq as well.


gibrown commented Jul 11, 2017

@ChristopherLu I've reworked my application to use https://github.com/google/seq2seq and that seems to be working well so far.

@ChristopherLu

@gibrown Thanks for the recommendation, I will give it a try.


cpury commented Aug 7, 2017

This did not make my training converge :(


gibrown commented Aug 9, 2017

Yeah, I don't think this is a real solution to the original problem, so I'm just going to close this PR.

gibrown closed this Aug 9, 2017
Successfully merging this pull request may close these issues.

using categorical_crossentropy get loss result is nan