
TextClassifier: reported metrics after training always report precision=1.0 #1748

Closed
GuillaumeDD opened this issue Jul 9, 2020 · 7 comments · Fixed by #1749
Labels
bug (Something isn't working)

Comments

@GuillaumeDD

GuillaumeDD commented Jul 9, 2020

Describe the bug
The reported metrics after training always report precision=1.0.

To Reproduce

Training code:

from torch.optim.adam import Adam

from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. create the label dictionary
label_dict = corpus.make_label_dictionary()

# 3. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True)

# 4. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict)

# 5. initialize the text classifier trainer with Adam optimizer
trainer = ModelTrainer(classifier, corpus, optimizer=Adam)

# 6. start the training
trainer.train('/tmp/taggers/trec',
              learning_rate=3e-5, # use very small learning rate
              mini_batch_size=16,
              mini_batch_chunk_size=4, # optionally set this if transformer is too much for your machine
              max_epochs=5, # terminate after 5 epochs
              )

Example of the produced report:

2020-07-09 09:50:21,395 Testing using best model ...
2020-07-09 09:50:21,395 loading file /tmp/taggers/trec/best-model.pt
2020-07-09 09:50:27,486         0.964
2020-07-09 09:50:27,487 
Results:
- F-score (micro) 0.9823
- F-score (macro) 0.9745
- Accuracy 0.964

By class:
              precision    recall  f1-score   support

        DESC     1.0000    0.9931    0.9965       145
        ENTY     1.0000    0.8750    0.9333        96
        ABBR     1.0000    0.8889    0.9412         9
         HUM     1.0000    0.9851    0.9925        67
         NUM     1.0000    0.9915    0.9957       117
         LOC     1.0000    0.9762    0.9880        84

   micro avg     1.0000    0.9653    0.9823       518
   macro avg     1.0000    0.9516    0.9745       518
weighted avg     1.0000    0.9653    0.9818       518
 samples avg     1.0000    0.9820    0.9880       518

2020-07-09 09:50:27,487 ----------------------------------------------------------------------------------------------------

Expected behavior
Reports correct metrics.

Screenshots
N/A

Environment (please complete the following information):

  • OS: CentOS
  • Versions: flair==0.5.1, scikit-learn==0.23.1

Additional context
Same problem with other datasets.

GuillaumeDD added the bug label on Jul 9, 2020
@alanakbik
Collaborator

alanakbik commented Jul 9, 2020

Thanks for reporting this! This seems to be an error in the evaluation routine that occurs when no label_type is passed to the model. Can you run the above code with:

# 4. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type='question_type')

I will put in a PR shortly that fixes this for the case when no label type is passed.

(Edit: label_type instead of label_name)
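
A note on the symptom, for anyone hitting a similar report: a precision column pinned at 1.0 while recall stays below 1.0 means the evaluation counted no false positives at all, which is what sklearn's classification_report produces whenever every predicted label is also a gold label. A minimal sketch with made-up indicator matrices (hypothetical data, not Flair's actual evaluation code):

import numpy as np
from sklearn.metrics import classification_report

# Multilabel indicator matrices: rows are samples, columns are classes.
# Every predicted label is also a gold label (predictions are a subset
# of the gold set), so there are no false positives anywhere.
y_true = np.array([[1, 1], [0, 1], [1, 0], [1, 1]])
y_pred = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

# Prints precision = 1.0000 for every class and every average,
# with recall < 1.0 -- the same pattern as the reports in this issue.
print(classification_report(y_true, y_pred, digits=4))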

@GuillaumeDD
Author

This does not seem to solve the problem.

Here is what I tested, following the suggested code:

from torch.optim.adam import Adam

from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. create the label dictionary
label_dict = corpus.make_label_dictionary()

# 3. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True)

# 4. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type='question_type')

# 5. initialize the text classifier trainer with Adam optimizer
trainer = ModelTrainer(classifier, corpus, optimizer=Adam)

# 6. start the training
trainer.train('/tmp/taggers/trec',
              learning_rate=3e-5, # use very small learning rate
              mini_batch_size=16,
              mini_batch_chunk_size=4, # optionally set this if transformer is too much for your machine
              max_epochs=5, # terminate after 5 epochs
              )

Results:

2020-07-09 11:45:16,849 ----------------------------------------------------------------------------------------------------
2020-07-09 11:45:16,850 Testing using best model ...
2020-07-09 11:45:16,850 loading file /tmp/taggers/trec/best-model.pt
2020-07-09 11:45:22,217         0.964
2020-07-09 11:45:22,218 
Results:
- F-score (micro) 0.9823
- F-score (macro) 0.9845
- Accuracy 0.964

By class:
              precision    recall  f1-score   support

        ENTY     1.0000    0.8947    0.9444        95
        DESC     1.0000    0.9653    0.9823       144
        ABBR     1.0000    1.0000    1.0000        10
         HUM     1.0000    0.9851    0.9925        67
         NUM     1.0000    1.0000    1.0000       120
         LOC     1.0000    0.9756    0.9877        82

   micro avg     1.0000    0.9653    0.9823       518
   macro avg     1.0000    0.9701    0.9845       518
weighted avg     1.0000    0.9653    0.9820       518
 samples avg     1.0000    0.9820    0.9880       518

2020-07-09 11:45:22,218 ----------------------------------------------------------------------------------------------------

alanakbik added a commit that referenced this issue Jul 9, 2020
GH-1748: fix TextClassifier evaluation if no label_type is passed
@alanakbik
Collaborator

Argh, you're right. I just pushed a PR that I believe fixes this. Could you try installing from master?

pip install --upgrade git+https://github.com/flairNLP/flair.git 
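
To check that the development install took effect, one option (assuming flair exposes __version__, as its releases do):

python -c "import flair; print(flair.__version__)"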

@GuillaumeDD
Author

Installing from master, the results look better:

2020-07-09 12:46:46,615 ----------------------------------------------------------------------------------------------------
2020-07-09 12:46:46,615 Testing using best model ...
2020-07-09 12:46:46,616 loading file /tmp/taggers/trec/best-model.pt
2020-07-09 12:46:51,939         0.97
2020-07-09 12:46:51,939 
Results:
- F-score (micro) 0.97
- F-score (macro) 0.9665
- Accuracy 0.97

By class:
              precision    recall  f1-score   support

        DESC     0.9384    0.9928    0.9648       138
        ENTY     0.9882    0.8936    0.9385        94
        ABBR     1.0000    0.8889    0.9412         9
         HUM     0.9846    0.9846    0.9846        65
         NUM     0.9739    0.9912    0.9825       113
         LOC     0.9877    0.9877    0.9877        81

   micro avg     0.9700    0.9700    0.9700       500
   macro avg     0.9788    0.9564    0.9665       500
weighted avg     0.9709    0.9700    0.9697       500
 samples avg     0.9700    0.9700    0.9700       500
2020-07-09 12:46:51,939 ----------------------------------------------------------------------------------------------------

@ForLittleBeauty

I have also encountered this problem.

By class:
              precision    recall  f1-score   support

           4     1.0000    0.4470    0.6179      1322
           1     1.0000    0.5561    0.7148      3064
           3     1.0000    0.5726    0.7282      2513
           2     1.0000    0.3269    0.4927      1661
           5     1.0000    0.7080    0.8290      2325
           0     1.0000    0.8719    0.9316      4677

   micro avg     1.0000    0.6427    0.7825     15562
   macro avg     1.0000    0.5804    0.7190     15562
weighted avg     1.0000    0.6427    0.7672     15562
 samples avg     1.0000    0.7220    0.8147     15562

I have also tried installing from master:

pip install --upgrade git+https://github.com/flairNLP/flair.git

but this error comes out:

ERROR: Command errored out with exit status 128: git clone -q https://github.com/flairNLP/flair.git /tmp/pip-req-build-f1vp5jsw Check the logs for full command output.

Can you help me, @alanakbik?

@alanakbik
Collaborator

Try doing a fresh pip install flair. We just released a new flair version so you don't have to install from master.
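
For reference, the fresh install suggested here is just (standard pip usage; --upgrade replaces the existing 0.5.1 install with the new release):

pip install --upgrade flair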

@ForLittleBeauty

Try doing a fresh pip install flair. We just released a new flair version so you don't have to install from master.

Okay, that works, thank you!
