
Cannot reach GroundHog performance #71

Closed
critias opened this issue Jan 7, 2016 · 7 comments
critias commented Jan 7, 2016

Hi,
we are having a hard time reproducing with Blocks the results we got with GroundHog.
Given the exact same training data, vocabulary, test set, and settings, we are 3 BLEU points behind GroundHog on a German-to-English translation task. We tried many different setups and numbers of iterations, but we can't close the gap.
The GroundHog translation costs also seem to discriminate better between good and bad sentences than the Blocks costs do. For example:

"vielen Dank ." translated to "thank you ."
a perfect translation of a common phrase, which should have a low cost.
GroundHog cost: 0.000250929
Blocks cost: 0.357417

"fliegende Katze ." is translated to "fly away , cat ." not wrong but kind of a strange/unusual sentence.
GroundHog: 0.280177
Blocks cost: 0.267061
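
Assuming the cost each system reports is the sentence-level negative log-likelihood, cost = -log p(translation | source), the numbers convert directly to the probability each model assigns to the output, which makes the gap concrete (a sketch under that assumption, not something either toolkit documents here):

```python
import math

# If cost = -log p(target | source), then p = exp(-cost).
for name, cost in [("GroundHog", 0.000250929), ("Blocks", 0.357417)]:
    print("%-9s cost %.6f -> p('thank you .') = %.4f"
          % (name, cost, math.exp(-cost)))
# GroundHog assigns ~0.9997 probability to "thank you .", Blocks only ~0.70.
```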

Blocks gives "thank you ." a higher cost than "fly away , cat .", which seems strange to me. I take this as a hint that the problem is mainly in the model and not in the search. The last comment in
kyunghyuncho/NMT#21
seems to describe the same issue. Has there been any progress on this?
Any tips on where the Blocks computation graph differs from the GroundHog graph (it's too large to just look at it and spot a difference)? Or other hints as to what the problem could be?

Thanks,

orhanf (Contributor) commented Jan 7, 2016

We were getting comparable scores for cs-en when the initial PR was made, around August, so the issues in the NMT repo might be outdated. IIRC there were fixes to the beam search, which uses the generation computational graph (the same one we use to generate samples).

Have you checked whether the two cost computational graphs produce the same cost (using the same batch and the same initial parameters)?
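
A minimal sketch of such a check, assuming each implementation can hand you a compiled cost function (e.g. a theano.function mapping a (source, source_mask, target, target_mask) batch to a scalar cost); blocks_cost and groundhog_cost below are placeholder names, not actual API of either codebase:

```python
import numpy as np

def costs_agree(cost_fn_a, cost_fn_b, batch, atol=1e-6):
    """Feed the identical batch to both compiled cost functions and check
    that the scalar costs match within tolerance. Both models must have
    been initialized with the same parameter values beforehand."""
    cost_a = float(cost_fn_a(*batch))
    cost_b = float(cost_fn_b(*batch))
    print("A=%.8f  B=%.8f  |diff|=%.2e" % (cost_a, cost_b, abs(cost_a - cost_b)))
    return np.isclose(cost_a, cost_b, atol=atol)

# e.g. costs_agree(blocks_cost, groundhog_cost, (src, src_mask, tgt, tgt_mask))
```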

critias (Author) commented Jan 8, 2016

Thanks for your fast response.
We didn't try that yet; it's next on the list of things to try. Right now we are looking into something else. I'll let you know if we find something.

orhanf (Contributor) commented Jan 8, 2016

Thanks, keep us posted

rizar (Contributor) commented Jan 8, 2016

Henry Choi told me that he was able to reproduce the English-to-French results with this implementation.


YilunLiu commented

@critias Hi, I am wondering whether you managed to reach the GroundHog performance. If you did, how? I am trying the example as well and cannot reach it.

critias (Author) commented Jan 16, 2016

Hi,
yes and no. We got roughly equal results on the validation set during training, but not after reloading the saved model. Since we changed the code base a little to reload the model and translate with it, I guess the error is on our side. It's still somewhat unclear and we have to look into it in more detail, but we were busy with other things last week.
Besides that, we are also trying orhanf's fork to see if his translation code works better for us.

critias (Author) commented Jan 21, 2016

It turned out the problem was on our side. We had changed some minor parts of the code, which caused a mismatch between the encoding used to create the vocabulary (plain bytes) and the encoding used during training/translation (Unicode).
We are now able to reproduce the GroundHog results and even slightly surpass them (by 0.4 BLEU).
I'll close the issue. Thanks for your help, and keep up the good work.
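
For anyone hitting the same class of bug, here is a minimal self-contained sketch of the mismatch described above (illustrative only, not the project's actual code):

```python
# -*- coding: utf-8 -*-
# Vocabulary keyed by UTF-8 byte strings, as produced when the
# vocabulary file is read in binary mode.
UNK_ID = 0
vocab = {u"vielen".encode("utf-8"): 1,
         u"Dank".encode("utf-8"): 2,
         u"schön".encode("utf-8"): 3}

def lookup(word):
    return vocab.get(word, UNK_ID)

# Training/translation decodes the corpus to unicode first, so lookups
# use unicode keys and miss the byte-string entries for non-ASCII words:
print(lookup(u"schön"))                  # 0 -> silently mapped to UNK
print(lookup(u"schön".encode("utf-8")))  # 3 -> what was intended
```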

critias closed this as completed Jan 21, 2016