
Reproduce NER results for ACE05 #6

Closed
YeDeming opened this issue May 4, 2021 · 2 comments

YeDeming commented May 4, 2021

Hi Zexuan,

I tried to reproduce the NER results for ACE05 (88.7 F1 for single sentence, 90.1 F1 for cross sentence).

My commands are as follows:

single sentence:

python run_entity.py \
    --do_train --do_eval --eval_test \
    --learning_rate=1e-5 --task_learning_rate=5e-4 \
    --train_batch_size=16 \
    --context_window 0 \
    --task ace05 \
    --data_dir data/ace05 \
    --model ../bert_models/bert-base-uncased \
    --output_dir models/bsz16_seed0_ctx0

And I get test F1=83.475

cross sentence:

python run_entity.py \
    --do_train --do_eval --eval_test \
    --learning_rate=1e-5 --task_learning_rate=5e-4 \
    --train_batch_size=16 \
    --context_window 300 \
    --task ace05 \
    --data_dir data/ace05 \
    --model ../bert_models/bert-base-uncased \
    --output_dir models/bsz16_seed42_ctx300 \
    --seed 42

And I get test F1=84.96

The numbers are considerably lower than those you reported. Is anything wrong?

Best,
Deming

a3616001 (Member) commented May 4, 2021

Hi Deming,

I just downloaded the code and ran it with this command (single sentence):

python run_entity.py \
    --do_train --do_eval --eval_test \
    --learning_rate=1e-5 --task_learning_rate=5e-4 \
    --train_batch_size=16 \
    --context_window 0 \
    --task ace05 \
    --data_dir $DATA \
    --model bert-base-uncased \
    --output_dir models/bsz16_seed0_ctx0

Here is the result I got:

05/04/2021 14:22:16 - INFO - root - Accuracy: 0.995173
05/04/2021 14:22:16 - INFO - root - Cor: 4843, Pred TOT: 5495, Gold TOT: 5476
05/04/2021 14:22:16 - INFO - root - P: 0.88135, R: 0.88440, F1: 0.88287
05/04/2021 14:22:16 - INFO - root - Used time: 11.560116
05/04/2021 14:22:27 - INFO - root - Total pred entities: 5495
05/04/2021 14:22:28 - INFO - root - Output predictions to models/bsz16_seed0_ctx0/ent_pred_test.json..
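For reference, the P/R/F1 in that log follow directly from the Cor / Pred TOT / Gold TOT counts; a quick check in Python (not part of the repo's code):

# precision/recall/F1 recomputed from the counts in the log above
cor, pred_tot, gold_tot = 4843, 5495, 5476
p = cor / pred_tot          # 0.88135
r = cor / gold_tot          # 0.88440
f1 = 2 * p * r / (p + r)    # 0.88287
print(f"P: {p:.5f}, R: {r:.5f}, F1: {f1:.5f}")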

Sanity check: would you be able to reproduce the numbers using the pre-trained models we released?

Best,
Zexuan

YeDeming (Author) commented May 5, 2021

Thanks for your quick reply!

I found my mistake: I forgot to lowercase the input text when I changed the BertTokenizer in Transformers.
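For anyone hitting the same issue, here is a minimal sketch (not the repo's code; the example sentence is made up) of why lowercasing matters for an uncased checkpoint:

from transformers import BertTokenizer

# bert-base-uncased ships with a lowercase vocabulary; do_lower_case=True is
# the default, but a custom tokenizer setup or pre-tokenized input can bypass it.
tok_lower = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
tok_no_lower = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=False)

text = "Barack Obama visited Paris"
print(tok_lower.tokenize(text))     # subwords from the uncased vocab, as intended
print(tok_no_lower.tokenize(text))  # capitalized words largely fall outside the
                                    # uncased vocab, so they become odd subword
                                    # splits or [UNK], which degrades F1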

Thanks!

Best,
Deming
