Demo script cannot reproduce the SOTA result #1773
Comments
Repository owner locked as spam and limited conversation to collaborators on Aug 11, 2022.
Repository owner unlocked this conversation on Aug 11, 2022.
First response: this script was written for an earlier release, so install that release.
Successfully reproduced; it is indeed related to the versions of the third-party libraries. The following versions need to be installed:
Then, after installing them, running the script reproduces the result. The log:
You can also reproduce this experiment on Colab: https://colab.research.google.com/drive/12w6qmHg0xyrvnRHOE7oTehRRD_5ZCBlI?usp=sharing
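Since the exact pinned versions are not preserved in this copy of the thread, a minimal sketch of how to record the environment that matters here (so a report can state exactly which versions were used):

# A minimal sketch, not from the thread: print the versions of the
# third-party libraries the maintainer says the result depends on.
# The specific versions to pin are not quoted in this copy of the thread.
import hanlp
import torch
import transformers
print('hanlp:', hanlp.__version__)
print('torch:', torch.__version__)
print('transformers:', transformers.__version__)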
hankcs added a commit that referenced this issue on Aug 11, 2022.
Describe the bug
Running the SOTA training script does not reproduce the reported result.
Code to reproduce the issue
Describe the current behavior
1. Download the SOTA script HanLP/plugins/hanlp_demo/hanlp_demo/zh/train_sota_bert_pku.py.
2. Open a Python shell and execute the Python code from the script, reproduced below:
from hanlp.common.dataset import SortingSamplerBuilder
from hanlp.components.tokenizers.transformer import TransformerTaggingTokenizer
from hanlp.datasets.tokenization.sighan2005.pku import SIGHAN2005_PKU_TRAIN_ALL, SIGHAN2005_PKU_TEST
from tests import cdroot

cdroot()
tokenizer = TransformerTaggingTokenizer()
save_dir = 'data/model/cws/sighan2005_pku_bert_base_96.66'
tokenizer.fit(
    SIGHAN2005_PKU_TRAIN_ALL,
    SIGHAN2005_PKU_TEST,  # Conventionally, no devset is used. See Tian et al. (2020).
    save_dir,
    'bert-base-chinese',
    max_seq_len=300,
    char_level=True,
    hard_constraint=True,
    sampler_builder=SortingSamplerBuilder(batch_size=32),
    epochs=10,
    adam_epsilon=1e-6,
    warmup_steps=0.1,
    weight_decay=0.01,
    word_dropout=0.1,
    seed=1609422632,
)
tokenizer.evaluate(SIGHAN2005_PKU_TEST, save_dir)
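For completeness, a minimal sketch of reloading the trained model from save_dir for a quick sanity check; this follows the pattern of other HanLP demo scripts rather than anything quoted in this issue, and the sample sentence is illustrative only:

# A sketch, not part of the original report: reload the model saved by
# fit() above and segment a sample sentence to verify it produces words.
tokenizer.load(save_dir)
print(tokenizer('商品和服务'))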
Expected behavior
The training run should reproduce the SOTA score implied by the save_dir name, i.e. F1 ≈ 96.66 on the SIGHAN 2005 PKU test set.
System information
Other info / logs
Epoch 1 / 10:
623/623 loss: 1416.3794 P: 59.23% R: 65.39% F1: 62.16% ET: 2 m 6 s
63/63 loss: 451.0509 P: 89.72% R: 89.54% F1: 89.63% ET: 4 s
2 m 9 s / 21 m 33 s ETA: 19 m 24 s (saved)
Epoch 2 / 10:
623/623 loss: 286.6173 P: 31.52% R: 63.34% F1: 42.09% ET: 2 m 8 s
63/63 loss: 486.3771 P: 0.37% R: 8.27% F1: 0.71% ET: 4 s
4 m 22 s / 21 m 50 s ETA: 17 m 28 s (1)
Epoch 3 / 10:
623/623 loss: 211.5965 P: 21.55% R: 61.44% F1: 31.91% ET: 2 m 8 s
63/63 loss: 470.1510 P: 0.39% R: 8.66% F1: 0.75% ET: 4 s
6 m 34 s / 21 m 53 s ETA: 15 m 19 s (2)
Epoch 4 / 10:
623/623 loss: 173.5003 P: 16.42% R: 59.68% F1: 25.75% ET: 2 m 9 s
63/63 loss: 469.0070 P: 0.37% R: 8.32% F1: 0.71% ET: 4 s
8 m 47 s / 21 m 57 s ETA: 13 m 10 s (3)
Epoch 5 / 10:
623/623 loss: 149.7656 P: 13.30% R: 58.04% F1: 21.63% ET: 2 m 9 s
63/63 loss: 482.1117 P: 0.38% R: 8.61% F1: 0.73% ET: 4 s
10 m 59 s / 21 m 59 s ETA: 10 m 59 s (4)
Epoch 6 / 10:
623/623 loss: 131.2736 P: 11.19% R: 56.51% F1: 18.69% ET: 2 m 9 s
63/63 loss: 530.2560 P: 0.36% R: 8.13% F1: 0.69% ET: 4 s
13 m 12 s / 22 m 0 s ETA: 8 m 48 s (5) early stop
Max score of dev is P: 89.72% R: 89.54% F1: 89.63% at epoch 1
Average time of each epoch is 2 m 12 s
13 m 12 s elapsed
P: 89.72% R: 89.54% F1: 89.63%