Hi,
I am trying to retrain your model as a baseline. So far, SWDA reproduced the results from the paper (actually slightly better). But for the DailyDialog dataset, even after multiple runs, the best we got is the following (row 1 is without validation, row 2 is on the test set):
(A, E, and G are the sim_bow metrics.)

| BLEU-R | BLEU-P | F1 | A | E | G |
|--------|--------|-------|-------|-------|-------|
| 0.305 | 0.170 | 0.218 | 0.940 | 0.609 | 0.857 |
| 0.298 | 0.163 | 0.211 | 0.940 | 0.605 | 0.857 |
whereas the paper reports the best results as:
Were there any changes made to the code with respect to the configuration described in the paper? I couldn't find any discrepancy. Can you point me to what might be the issue?
Thanks for pointing this out.
There does seem to be a large deviation from the original results recently.
Someone also reported better results than those in the paper for the DailyDialog dataset.
We are not sure whether this is due to some environment change beyond what is pinned in "requirements.txt". We are looking into it and will let you know.
OK, thanks. However, we were using an environment matching requirements.txt exactly. Also, as you said, we noticed quite a bit of variance between different runs, even when the seed is passed as an argument.
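For what it's worth, run-to-run variance despite a fixed seed often means some source of randomness is not covered by the seed argument (e.g. framework-level or cuDNN nondeterminism). A minimal sketch of what seeding the Python-side generators looks like; the `set_seed` helper is hypothetical, and I'm assuming NumPy here since I don't know the repo's actual seeding code:

```python
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Seed the stdlib and NumPy generators.

    NOTE (assumption): if the model is PyTorch-based, a full setup would
    also need torch.manual_seed(seed), torch.cuda.manual_seed_all(seed),
    and deterministic cuDNN flags -- otherwise GPU runs can still diverge.
    """
    random.seed(seed)
    np.random.seed(seed)


# Re-seeding reproduces the same draws from the covered generators.
set_seed(42)
a = np.random.rand(3)
set_seed(42)
b = np.random.rand(3)
```

If the two runs still differ after covering every generator, the variance is coming from somewhere the seed never reaches (data-loader workers, nondeterministic GPU kernels, etc.).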
Also, in #2341 it was shown that the NLTK library had an issue related to SmoothingFunction(), which has since been fixed in an update.
Hence, it is no longer possible to achieve exactly the same results.
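To illustrate why a change to SmoothingFunction() shifts scores: sentence-level BLEU depends heavily on how zero n-gram counts are smoothed, so any fix to the smoothing logic moves the reported numbers. A minimal sketch with NLTK's public API (illustrative tokens, not the repo's actual evaluation code):

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Toy example: the hypothesis shares no 4-gram with the reference,
# so the unsmoothed 4-gram precision is zero.
reference = [["the", "cat", "sat", "on", "the", "mat"]]
hypothesis = ["the", "cat", "is", "on", "the", "mat"]

smooth = SmoothingFunction()

# Default (method0): zero counts collapse the score to a near-zero value
# and NLTK emits a warning.
score_plain = sentence_bleu(reference, hypothesis)

# method1 adds a small epsilon to zero counts, giving a usable score --
# so the exact smoothing implementation directly determines the number
# that ends up in a results table.
score_smoothed = sentence_bleu(
    reference, hypothesis, smoothing_function=smooth.method1
)
```

This is why results computed before and after the NLTK fix are not directly comparable: the same model outputs yield different BLEU values.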