
Cannot reach GroundHog performance #71

Closed
critias opened this issue Jan 7, 2016 · 7 comments
critias commented Jan 7, 2016

Hi,
we are having a hard time reproducing with Blocks the results we got with GroundHog.
Given the exact same training data, vocabulary, test set, and settings, we are 3 BLEU points behind GroundHog on a German-to-English translation task. We tried many different setups and numbers of iterations, but we can't close the gap.
The GroundHog translation costs also seem to discriminate better between good and bad sentences than the Blocks costs do. For example:

"vielen Dank ." translated to "thank you ."
a perfect translation of a common phrase, which should have a low cost.
GroundHog cost: 0.000250929
Blocks cost: 0.357417

"fliegende Katze ." is translated to "fly away , cat ." not wrong but kind of a strange/unusual sentence.
GroundHog: 0.280177
Blocks cost: 0.267061
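
Assuming the cost each system reports is the sentence-level negative log-likelihood, cost = -log p(translation | source), the numbers convert directly to the probability each model assigns to the output, which makes the gap concrete (a sketch under that assumption, not something either toolkit documents here):

```python
import math

# If cost = -log p(target | source), then p = exp(-cost).
for name, cost in [("GroundHog", 0.000250929), ("Blocks", 0.357417)]:
    print("%-9s cost %.6f -> p('thank you .') = %.4f"
          % (name, cost, math.exp(-cost)))
# GroundHog assigns ~0.9997 probability to "thank you .", Blocks only ~0.70.
```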

Blocks gives "thank you ." a higher cost than "fly away , cat .", which seems strange to me. I take this as a hint that the problem is mainly in the model and not in the search. The last comment in
kyunghyuncho/NMT#21
seems to describe the same issue. Has there been any progress on this?
Any tips on where the Blocks computation graph differs from the GroundHog graph (it's too large to just look at it and spot a difference)? Or other hints as to what the problem could be?

Thanks,

orhanf (Contributor) commented Jan 7, 2016

We were getting comparable scores for cs-en when the initial PR was made, around August, so the issues in the NMT repo might be outdated. IIRC there were fixes to the beam search, which uses the generation computational graph (the same one we use to generate samples).

Have you checked whether the two cost computational graphs produce the same cost (using the same batch and the same initial parameters)?
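
A minimal sketch of such a check, assuming each implementation can hand you a compiled cost function (e.g. a theano.function mapping a (source, source_mask, target, target_mask) batch to a scalar cost); blocks_cost and groundhog_cost below are placeholder names, not actual API of either codebase:

```python
import numpy as np

def costs_agree(cost_fn_a, cost_fn_b, batch, atol=1e-6):
    """Feed the identical batch to both compiled cost functions and check
    that the scalar costs match within tolerance. Both models must have
    been initialized with the same parameter values beforehand."""
    cost_a = float(cost_fn_a(*batch))
    cost_b = float(cost_fn_b(*batch))
    print("A=%.8f  B=%.8f  |diff|=%.2e" % (cost_a, cost_b, abs(cost_a - cost_b)))
    return np.isclose(cost_a, cost_b, atol=atol)

# e.g. costs_agree(blocks_cost, groundhog_cost, (src, src_mask, tgt, tgt_mask))
```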

critias (Author) commented Jan 8, 2016

Thanks for your fast response.
We didn't try that yet; it's next on the list of things to try. Right now we are looking into something else. I'll let you know if we find something.

orhanf (Contributor) commented Jan 8, 2016

Thanks, keep us posted

rizar (Contributor) commented Jan 8, 2016

Henry Choi told me that he was able to reproduce the English-to-French results with this implementation.


YilunLiu commented

@critias Hi, I am wondering whether you managed to reach the GroundHog performance. If you did, how? I am trying the example as well and cannot reach it.

critias (Author) commented Jan 16, 2016

Hi,
yes and no. We got roughly equal results on the validation set during training, but not after reloading the saved model. Since we changed the code base a little to reload the model and translate with it, I guess the error is on our side. It's still somewhat unclear and we have to look into it in more detail, but we were busy with other things last week.
Besides that, we are also trying orhanf's fork to see if his translation code works better for us.

critias (Author) commented Jan 21, 2016

It turned out the problem was on our side. We had changed some minor parts of the code, which caused a mismatch between the encoding used to create the vocabulary (plain bytes) and the encoding used during training/translation (Unicode).
We are now able to reproduce the GroundHog results and even slightly surpass them (by 0.4 BLEU).
I'll close the issue. Thanks for your help, and keep up the good work.
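
For anyone hitting the same class of bug, here is a minimal self-contained sketch of the mismatch described above (illustrative only, not the project's actual code):

```python
# -*- coding: utf-8 -*-
# Vocabulary keyed by UTF-8 byte strings, as produced when the
# vocabulary file is read in binary mode.
UNK_ID = 0
vocab = {u"vielen".encode("utf-8"): 1,
         u"Dank".encode("utf-8"): 2,
         u"schön".encode("utf-8"): 3}

def lookup(word):
    return vocab.get(word, UNK_ID)

# Training/translation decodes the corpus to unicode first, so lookups
# use unicode keys and miss the byte-string entries for non-ASCII words:
print(lookup(u"schön"))                  # 0 -> silently mapped to UNK
print(lookup(u"schön".encode("utf-8")))  # 3 -> what was intended
```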

critias closed this as completed Jan 21, 2016