
How to reproduce the results on the SIGHAN15 dataset? #26

Open

HaoyuanPeng opened this issue Mar 21, 2022 · 3 comments

Comments

@HaoyuanPeng

HaoyuanPeng commented Mar 21, 2022

My reproduction steps were as follows:

  1. Download https://drive.google.com/file/d/1gX9YYcGpR44BsUgoJDtbWqV-4PsaWpq1/view?usp=sharing and unzip it into the model/bert directory
  2. Change dname in train.sh to SIGHAN15
  3. Run sh train.sh to start training

The data-loading log looks normal:
16541 Model construction finished. training number is 1800, dev number is 1100, test_num is 1100 Maximum train sequence length: 96, dev sequence length: 110, test sequence length: 110 data is ready

After training for 1000 epochs as configured, the log shows:
gBatch 19000, lBatch 19, loss 0.13127, loss_crf 0.07497, loss_ft 0.00051 At epoch 999, official dev f1 : 0.950363, precision : 0.950363, recall : 0.950363 At this run, the maximum dev f1:0.950363, dev precision:0.950363, dev recall:0.950363

Looking at the code, this metric appears to be character-level accuracy rather than a correction metric. The predictions in ckpt/SIGHAN15_0.5/dev_pred.txt also contain many errors, which is inconsistent with the ~80% sentence-level correction F1 reported in the paper.
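For reference, sentence-level correction metrics of the kind the paper reports are usually computed along these lines. This is a minimal sketch, not this repository's actual evaluation code; the exact counting convention varies between SIGHAN evaluation scripts, and the function name is illustrative:

```python
def sentence_metrics(srcs, golds, preds):
    """Sentence-level correction precision/recall/F1 (illustrative sketch).

    srcs:  original (possibly erroneous) sentences
    golds: reference corrections
    preds: model outputs
    """
    tp = fp = fn = 0
    for src, gold, pred in zip(srcs, golds, preds):
        changed = pred != src        # model proposed some correction
        has_error = gold != src      # sentence actually contains errors
        if changed and pred == gold:
            tp += 1                  # fully and correctly corrected
        elif changed:
            fp += 1                  # changed the sentence, but wrongly
        if has_error and pred != gold:
            fn += 1                  # erroneous sentence not fully fixed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

srcs  = ["他明天要去北经", "我喜欢吃苹果", "今天天汽很好"]
golds = ["他明天要去北京", "我喜欢吃苹果", "今天天气很好"]
preds = ["他明天要去北京", "我喜欢吃平果", "今天天汽很好"]
print(sentence_metrics(srcs, golds, preds))  # → (0.5, 0.5, 0.5)
```

Character-level accuracy, by contrast, would count every unchanged correct character as a hit, which is why it can sit above 0.95 while sentence-level correction quality is far lower.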

Could you explain how to reproduce the paper's results, ideally including both the training and evaluation steps? Thanks!

@Amber921463001

When you run prediction on sentences, do the digits and letters get changed into something else?

@Skywalker-Harrison

I ran into the same problem: Arabic numerals get "corrected" into Chinese characters.

@LucasSpider

I don't understand why the F1, precision, and recall values are all identical.
