The file unilm/src/gigaword/eval.py follows the Gigaword data preprocessing used in https://github.com/harvardnlp/sent-summary, which replaces every digit with #. However, our BPE tokenizer uses # to mark subwords. So we map the special token # to 1 before tokenization, and then map it back to # after prediction. Note that the unilm/src/cnndm/eval.py script does not apply this preprocessing.
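To make the round trip concrete, here is a minimal sketch of the idea, not the actual code in eval.py. The function names `hash_to_one` and `one_to_hash` are hypothetical, and the sketch assumes the text is already Gigaword-preprocessed, so it contains no digits other than the `1` placeholders introduced here.

```python
# Hypothetical helpers illustrating the # <-> 1 swap described above.
# They are an illustration only and are not taken from unilm.

def hash_to_one(text: str) -> str:
    """Before tokenization: hide '#' from a tokenizer that uses it for subwords."""
    return text.replace("#", "1")


def one_to_hash(text: str) -> str:
    """After prediction: restore the Gigaword digit placeholder '#'."""
    return text.replace("1", "#")


source = "shares rose #.# percent in ####"
model_input = hash_to_one(source)      # "shares rose 1.1 percent in 1111"
restored = one_to_hash(model_input)    # back to the original Gigaword form
print(model_input)
print(restored)
```

The swap is only reversible because Gigaword text has no real digits left after preprocessing; on text that still contains a genuine `1`, the second step would turn it into `#` as well.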
In this line (code for evaluation of CNNDM):

unilm/src/gigaword/eval.py, line 239 in d22a233

`1` is replaced by `#`. I don't understand it. Can someone explain the reason for this post-processing?