
About 'X' label #1

Closed · kugwzk opened this issue Mar 19, 2019 · 22 comments
Labels: good first issue (Good for newcomers)

Comments

kugwzk commented Mar 19, 2019

In my opinion, you should remove the 'X' label's signal from evaluation. Because you add more labels than the standard dataset has, I can't tell how much of the F1-score increase comes from the extra 'X' labels. I think the 'X' label is not equivalent to the 'O' label in the standard dataset and the BERT paper, but in your code they may be treated the same.

kamalkraj (Owner) commented Mar 19, 2019

The 'X' label is not considered in the F1 metrics:

if m and label_map[label_ids[i][j]] != "X":

Label 'X' is not equal to label 'O'.
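
For reference, a minimal sketch of how that filtering might look inside the evaluation loop (variable names like input_mask and pred_ids are assumptions, not necessarily this repo's):

y_true, y_pred = [], []
for i, mask_row in enumerate(input_mask):   # input_mask: attention mask per sentence
    true_seq, pred_seq = [], []
    for j, m in enumerate(mask_row):
        if m and label_map[label_ids[i][j]] != "X":
            true_seq.append(label_map[label_ids[i][j]])   # gold label
            pred_seq.append(label_map[pred_ids[i][j]])    # predicted label
    y_true.append(true_seq)
    y_pred.append(pred_seq)
# y_true / y_pred can then be scored with e.g. seqeval.metrics.f1_score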

kugwzk (Author) commented Mar 19, 2019

But did you use it during training?

kamalkraj (Owner) commented Mar 19, 2019

For training I used "X".
During inference and F1 computation, only the first output label of each token is considered, the same as in the BERT paper.

kugwzk (Author) commented Mar 19, 2019

I think adding extra labels to the standard CoNLL-2003 NER dataset makes the results hard to compare with previous work. Could you remove the 'X' label during training and still get a similar result?

kamalkraj (Owner):

If you remove "X" during training, or replace "X" with "O", model performance drops to ~89 F1.

kugwzk (Author) commented Mar 19, 2019

That's my point: using the 'X' label inflates the F1-score, which isn't a fair comparison. I get a similar result, about 91.3 F1, without it. And I believe the original BERT paper also removes the 'X' label; they reach a higher F1-score because they use document context. In short, the 'X' label shouldn't carry any signal.

kamalkraj (Owner):

91.3 without using "X" ??

kugwzk (Author) commented Mar 19, 2019

Yes. I take the WordPiece outputs from the BERT model and keep only each word's first sub-token vector, so I get the same number of vectors as there are words in the standard dataset. Then a softmax classification layer produces the final result. But I only use BertModel from pytorch_pretrained_bert.

kamalkraj (Owner):

For example, after extracting features for the sentence below:

Jim Hen ##son was a puppet ##eer

are you giving only the hidden states of [Jim, Hen, was, a, puppet] to the linear classification layer?

kugwzk (Author) commented Mar 19, 2019

Yes, because I think fine-tuned BERT can learn this pattern.
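
To make the exchange above concrete, here is a minimal runnable sketch of that first-sub-token selection (shapes and names are illustrative assumptions, not code from this repo):

import torch
import torch.nn as nn

hidden_dim, num_labels = 768, 9  # BERT-base hidden size, CoNLL-2003 tag set
tokens = ["Jim", "Hen", "##son", "was", "a", "puppet", "##eer"]
hidden_states = torch.randn(len(tokens), hidden_dim)  # stand-in for BERT output

# keep only the hidden state of each word's first sub-token
keep = [i for i, t in enumerate(tokens) if not t.startswith("##")]
word_states = hidden_states[keep]        # states for [Jim, Hen, was, a, puppet]
logits = nn.Linear(hidden_dim, num_labels)(word_states)  # one tag per word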

kamalkraj (Owner):

I will try it this way and let you know.

kamalkraj (Owner) commented Mar 19, 2019

@kugwzk
Can you share your code?
How are you handling padding after extracting the first sub-token hidden states from BERT?

kugwzk (Author) commented Mar 19, 2019

I just record the original word positions using a dict in Python. For example, [Jim Hen ##son was a puppet ##eer] maps to [0, 1, 3, 4, 5], and then I pad the original word sequence again in the classifier layer. It may be slow :).
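
A rough sketch of that bookkeeping (hypothetical names; zero-padding is one choice among several):

import torch

tokens = ["Jim", "Hen", "##son", "was", "a", "puppet", "##eer"]
word_pos = [i for i, t in enumerate(tokens) if not t.startswith("##")]
# word_pos == [0, 1, 3, 4, 5]

def gather_and_pad(hidden, word_pos, max_words):
    # hidden: (seq_len, hidden_dim); gather the first-sub-token states,
    # then zero-pad back to a fixed word count for batched classification
    picked = hidden[word_pos]
    pad = hidden.new_zeros(max_words - len(word_pos), hidden.size(1))
    return torch.cat([picked, pad], dim=0)   # (max_words, hidden_dim)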

alphanlp:

I think we can apply a mask to the 'X' label during training.
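
One common way to implement that (a sketch under assumptions; X_LABEL_ID stands for whatever id the label map assigns to "X"):

import torch.nn as nn

X_LABEL_ID = 0  # hypothetical id of the "X" label in the label map

def masked_ce_loss(logits, labels):
    # logits: (batch, seq_len, num_labels); labels: (batch, seq_len)
    # ignore_index drops "X" positions from the loss, so sub-word
    # continuations contribute no gradient during training
    loss_fct = nn.CrossEntropyLoss(ignore_index=X_LABEL_ID)
    return loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))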

ereday commented May 2, 2019

I agree with @kugwzk about the misuse of the 'X' label. @tkukurin In the latest version, are you still using 'X' during training (or evaluation), or have you already removed it as suggested?

kamalkraj (Owner):

@ereday
The latest version still uses 'X'. I have code that works without the 'X' label; I need to clean it up a bit. I will try to push the code by Monday.

Nic-Ma commented May 6, 2019

Hi @kamalkraj,

I am Nic from NVIDIA; thanks for your contribution to this project!
I tried replacing [CLS] and X with O directly, and removed [SEP].
I think you were supposed to release the new code today; have you finished?
Thanks.

kamalkraj (Owner) commented May 6, 2019

@toxic2m
Check out the experiment branch.

kamalkraj added the 'good first issue' label on May 6, 2019
Nic-Ma commented May 8, 2019

Hi @kamalkraj,

Actually, I have already done this part locally, and I suggest mapping [CLS] and [SEP] to O directly.
Then your FC layer only outputs the real number of classes, which can give better performance.
Thanks.
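
A tiny sketch of that mapping (the tag list is the standard CoNLL-2003 set; the helper name is illustrative):

REAL_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
               "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

def collapse_label(label):
    # [CLS], [SEP], and X all fold into "O", so the FC layer only
    # has to score the real tag set
    return label if label in REAL_LABELS else "O"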

sbmaruf commented May 16, 2019

Has anyone been able to reproduce the BERT paper's NER results (92.4 F1 for BERT-Base)?
The experiment on the master branch supports the result given in the BERT paper, but @kugwzk has pointed out the problem with it. Has anyone been able to reproduce the results without the inferred 'X' tags?

kugwzk (Author) commented May 21, 2019

@sbmaruf The CoNLL-2003 NER result reported in the original BERT paper used document context, which is different from the standard sentence-based evaluation. You can read more about that here: allenai/allennlp#2067 (comment)

sbmaruf commented May 24, 2019

@kugwzk Thanks for your reply, but how do you add document-level context for NER?
Any idea or code repo?
