
Is there any sample code for fine-tuning BERT on sequence labeling tasks, e.g., NER on CoNLL-2003? #1216

Closed
tuvuumass opened this issue Sep 6, 2019 · 10 comments

@tuvuumass
Contributor

tuvuumass commented Sep 6, 2019

❓ Questions & Help

Is there any sample code for fine-tuning BERT on sequence labeling tasks, e.g., NER on CoNLL-2003, using BertForTokenClassification?
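
For context, here is a minimal sketch (assuming the pytorch-transformers 1.x API) of what a single fine-tuning step with BertForTokenClassification could look like; the label list, example sentence, and the naive copying of each word's tag to every sub-token are illustrative placeholders, not an official example:

```python
import torch
from pytorch_transformers import AdamW, BertForTokenClassification, BertTokenizer

# Illustrative CoNLL-2003 label set (placeholder ordering).
label_list = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "B-MISC", "I-MISC"]

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("bert-base-cased", num_labels=len(label_list))
optimizer = AdamW(model.parameters(), lr=5e-5)

# One toy training example: word-level tags naively copied to every sub-token.
words = ["John", "lives", "in", "New", "York"]
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC"]

tokens, label_ids = ["[CLS]"], [label_list.index("O")]
for word, tag in zip(words, tags):
    sub_tokens = tokenizer.tokenize(word)
    tokens.extend(sub_tokens)
    label_ids.extend([label_list.index(tag)] * len(sub_tokens))
tokens.append("[SEP]")
label_ids.append(label_list.index("O"))

input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
labels = torch.tensor([label_ids])

model.train()
loss, scores = model(input_ids, labels=labels)[:2]  # loss is computed over all sub-tokens here
loss.backward()
optimizer.step()
optimizer.zero_grad()
```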

@stefan-it
Collaborator

Hi @tuvuumass,

Issue #64 is a good start for sequence labeling tasks. It also points to some repositories that show how to fine-tune BERT with PyTorch-Transformers (with focus on NER).

Nevertheless, it would be awesome to get some kind of fine-tuning examples (reference implementation) integrated into this outstanding PyTorch-Transformers library 🤗 Maybe run_glue.py could be a good start 🤔

@tuvuumass
Contributor Author

tuvuumass commented Sep 7, 2019

Thanks, @stefan-it. I found #64 too, but it seems that none of the repositories linked there could replicate BERT's reported results (i.e., 96.6 dev F1 and 92.8 test F1 for BERT large, 96.4 dev F1 and 92.4 test F1 for BERT base). Yes, I agree that it would be great to have a fine-tuning example for sequence labeling tasks.

@thomwolf
Member

thomwolf commented Sep 8, 2019

Yes, I think it would be nice to have a clean example showing how the model can be trained and used on a token classification task like NER.

We won’t have the bandwidth/use case to do that internally, but if someone in the community has a (preferably self-contained) script they can share, we'd be happy to welcome a PR and include it in the repo.

Maybe you have something Stefan?

@stefan-it
Collaborator

Update on that:

I used the data preprocessing functions and forward implementation from @kamalkraj's BERT-NER, ported it from pytorch-pretrained-bert to pytorch-transformers, and integrated it into a copy of run_glue 😅

Fine-tuning is working - evaluation on the dev set (using a BERT base, cased model):

           precision    recall  f1-score   support

      PER     0.9713    0.9745    0.9729      1842
     MISC     0.8993    0.9197    0.9094       922
      LOC     0.9769    0.9679    0.9724      1837
      ORG     0.9218    0.9403    0.9310      1341

micro avg     0.9503    0.9562    0.9533      5942
macro avg     0.9507    0.9562    0.9534      5942

Evaluation on test set:

09/09/2019 23:20:02 - INFO - __main__ -   
           precision    recall  f1-score   support

      LOC     0.9309    0.9287    0.9298      1668
     MISC     0.7937    0.8276    0.8103       702
      PER     0.9614    0.9549    0.9581      1617
      ORG     0.8806    0.9145    0.8972      1661

micro avg     0.9066    0.9194    0.9130      5648
macro avg     0.9078    0.9194    0.9135      5648

I trained for 5 epochs using the default parameters from run_glue. Each epoch took ~5 minutes on an RTX 2080 Ti.

However, it's an early implementation and maybe (with a little help from @kamalkraj) we can integrate it here 🤗
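
For reference, per-entity tables like the ones above are what seqeval's classification_report prints (seqeval is, as far as I can tell, the metrics package used by the BERT-NER repos linked in this thread); a tiny, self-contained example with placeholder tag sequences:

```python
from seqeval.metrics import classification_report, f1_score

# Toy gold/predicted tag sequences standing in for real dev-set output.
y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O", "B-MISC"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O", "O"]]

print(classification_report(y_true, y_pred, digits=4))
print("micro F1: {:.4f}".format(f1_score(y_true, y_pred)))
```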

@olix20

olix20 commented Sep 10, 2019

@stefan-it could you please share your fork? Thanks :)

@stefan-it
Collaborator

@olix20 Here's the first draft of an implementation:

https://gist.github.com/stefan-it/feb6c35bde049b2c19d8dda06fa0a465

(Just a gist at the moment) :)

@stecklin

After working with BERT-NER for a few days now, I tried to come up with a script that could be integrated here.
Compared to that repo and @stefan-it's gist, I tried to do the following:

  • Use the default BertForTokenClassification class instead of modifying the forward pass in a subclass. For that to work, I changed the way label ids are stored: the first sub-token of each word gets the real label id and the remaining sub-tokens get padding ids. The padding ids are then ignored by the cross-entropy loss function, instead of picking out only the desired tokens in a for loop before the loss computation (see the sketch after this list).
  • Log metrics to tensorboard.
  • Remove unnecessary parts copied over from glue (e.g. DataProcessor class).
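
A minimal sketch of that label-alignment idea, under my own naming (IGNORE_ID, label_map, and align_labels are illustrative, not taken from the script described above): the first sub-token of each word keeps the real label id, and the remaining sub-tokens get an id that CrossEntropyLoss skips via ignore_index.

```python
import torch
from pytorch_transformers import BertTokenizer

IGNORE_ID = -100  # the default ignore_index of torch.nn.CrossEntropyLoss
label_map = {"O": 0, "B-PER": 1, "I-PER": 2, "B-LOC": 3, "I-LOC": 4}

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

def align_labels(words, tags):
    """Real label id on the first sub-token of each word, IGNORE_ID on the rest."""
    tokens, label_ids = [], []
    for word, tag in zip(words, tags):
        sub_tokens = tokenizer.tokenize(word)
        tokens.extend(sub_tokens)
        label_ids.extend([label_map[tag]] + [IGNORE_ID] * (len(sub_tokens) - 1))
    return tokens, label_ids

tokens, label_ids = align_labels(["Johanson", "visited", "Heidelberg"], ["B-PER", "O", "B-LOC"])
print(list(zip(tokens, label_ids)))

# Positions carrying IGNORE_ID are then simply skipped by the loss:
loss_fct = torch.nn.CrossEntropyLoss(ignore_index=IGNORE_ID)
```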

@kamalkraj
Contributor

BERT-NER using TensorFlow 2.0
https://github.com/kamalkraj/BERT-NER-TF

@stale

stale bot commented Dec 21, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Dec 21, 2019
stale bot closed this as completed Dec 28, 2019
@Aj-232425

Similarly, can we use CoNLL-format data to fine-tune BERT for relation extraction?
