Hi xtreme team,
Thank you for your work on the benchmark and leaderboard. However, the evaluation metric reported for UDPOS seems inconsistent with the current code release. According to the Table 20 POS accuracy results in the paper https://arxiv.org/pdf/2003.11080.pdf, the evaluation metric for POS is accuracy, and the average result for XLM-R is 73.8. However, third_party/run_tag.py only imports F1-related measurements from seqeval, so the default evaluation for UDPOS is actually F1. I reproduced the UDPOS experiment and computed both metrics on the test set (sorry, I used the leaked test set on my local machine for quicker evaluation). With the default script and XLM-R large, I get an average F1 of 74.2, which is in line with the reported 73.8; for English, the F1 is 96.15. However, if I evaluate with accuracy, I get 96.7 for English and 78.23 on average. Hence I suspect the scores on the leaderboard and in the paper for UDPOS are actually F1. Could you help address this issue? My reproduced results are here: https://docs.google.com/spreadsheets/d/16Cv0IIdZGOyx6xUawcKScb38Cl3ofy0tHJSdWrt07LI/edit?usp=sharing
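For reference, here is a minimal sketch (not the exact XTREME evaluation code) of the comparison I ran, assuming seqeval is installed. The toy tag sequences are made up, and exact F1 values depend on how seqeval chunks plain (non-BIO) POS tags; the point is only that the two metrics can diverge on the same predictions.

```python
# Minimal sketch contrasting seqeval F1 (the default in third_party/run_tag.py)
# with plain token-level accuracy on toy UDPOS-style data.
from seqeval.metrics import f1_score

gold = [["DET", "NOUN", "VERB", "ADJ", "PUNCT"],
        ["PRON", "VERB", "DET", "NOUN", "PUNCT"]]
pred = [["DET", "NOUN", "VERB", "ADV", "PUNCT"],
        ["PRON", "VERB", "DET", "NOUN", "PUNCT"]]

# seqeval F1, as computed by the released evaluation script
print("seqeval F1:", f1_score(gold, pred))

# plain token-level accuracy, as (apparently) described in Table 20 of the paper
correct = sum(g == p for gs, ps in zip(gold, pred) for g, p in zip(gs, ps))
total = sum(len(gs) for gs in gold)
print("token accuracy:", correct / total)
```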
@sebastianruder Thanks for the prompt reply. I noticed that in the main results (Table 2) the metric for POS is F1, so maybe it's just a typo in Table 20. Out of curiosity: previous work mostly reports accuracy for POS tagging. Is there a particular intuition behind using F1 instead?