
TextClassifier: reported metrics after training always report precision=1.0 #1748

Closed
GuillaumeDD opened this issue Jul 9, 2020 · 7 comments · Fixed by #1749
Labels
bug (Something isn't working)

Comments

@GuillaumeDD

GuillaumeDD commented Jul 9, 2020

Describe the bug
The reported metrics after training always report precision=1.0.

To Reproduce

Training code:

from torch.optim.adam import Adam

from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. create the label dictionary
label_dict = corpus.make_label_dictionary()

# 3. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True)

# 4. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict)

# 5. initialize the text classifier trainer with Adam optimizer
trainer = ModelTrainer(classifier, corpus, optimizer=Adam)

# 6. start the training
trainer.train('/tmp/taggers/trec',
              learning_rate=3e-5, # use very small learning rate
              mini_batch_size=16,
              mini_batch_chunk_size=4, # optionally set this if transformer is too much for your machine
              max_epochs=5, # terminate after 5 epochs
              )

Example of the produced report:

2020-07-09 09:50:21,395 Testing using best model ...
2020-07-09 09:50:21,395 loading file /tmp/taggers/trec/best-model.pt
2020-07-09 09:50:27,486         0.964
2020-07-09 09:50:27,487 
Results:
- F-score (micro) 0.9823
- F-score (macro) 0.9745
- Accuracy 0.964

By class:
              precision    recall  f1-score   support

        DESC     1.0000    0.9931    0.9965       145
        ENTY     1.0000    0.8750    0.9333        96
        ABBR     1.0000    0.8889    0.9412         9
         HUM     1.0000    0.9851    0.9925        67
         NUM     1.0000    0.9915    0.9957       117
         LOC     1.0000    0.9762    0.9880        84

   micro avg     1.0000    0.9653    0.9823       518
   macro avg     1.0000    0.9516    0.9745       518
weighted avg     1.0000    0.9653    0.9818       518
 samples avg     1.0000    0.9820    0.9880       518

2020-07-09 09:50:27,487 ----------------------------------------------------------------------------------------------------

Expected behavior
Reports correct metrics.

Screenshots
N/A

Environment (please complete the following information):

  • OS: CentOS
  • Versions: flair==0.5.1, scikit-learn==0.23.1

Additional context
Same problem with other datasets.

GuillaumeDD added the bug label on Jul 9, 2020
@alanakbik
Collaborator

alanakbik commented Jul 9, 2020

Thanks for reporting this! This seems to be an error in the evaluation routine that occurs when no label_type is passed to the model. Can you run the above code with:

# 4. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type='question_type')

I will put in a PR shortly that fixes this for the case when no label type is passed.

(Edit: label_type instead of label_name)
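
A note on the symptom, for anyone hitting a similar report: a precision column pinned at 1.0 while recall stays below 1.0 means the evaluation counted no false positives at all, which is what sklearn's classification_report produces whenever every predicted label is also a gold label. A minimal sketch with made-up indicator matrices (hypothetical data, not Flair's actual evaluation code):

import numpy as np
from sklearn.metrics import classification_report

# Multilabel indicator matrices: rows are samples, columns are classes.
# Every predicted label is also a gold label (predictions are a subset
# of the gold set), so there are no false positives anywhere.
y_true = np.array([[1, 1], [0, 1], [1, 0], [1, 1]])
y_pred = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

# Prints precision = 1.0000 for every class and every average,
# with recall < 1.0 -- the same pattern as the reports in this issue.
print(classification_report(y_true, y_pred, digits=4))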

@GuillaumeDD
Author

This does not seem to solve the problem.

Here is what I tested, following the suggested code:

from torch.optim.adam import Adam

from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. create the label dictionary
label_dict = corpus.make_label_dictionary()

# 3. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True)

# 4. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type='question_type')

# 5. initialize the text classifier trainer with Adam optimizer
trainer = ModelTrainer(classifier, corpus, optimizer=Adam)

# 6. start the training
trainer.train('/tmp/taggers/trec',
              learning_rate=3e-5, # use very small learning rate
              mini_batch_size=16,
              mini_batch_chunk_size=4, # optionally set this if transformer is too much for your machine
              max_epochs=5, # terminate after 5 epochs
              )

Results:

2020-07-09 11:45:16,849 ----------------------------------------------------------------------------------------------------
2020-07-09 11:45:16,850 Testing using best model ...
2020-07-09 11:45:16,850 loading file /tmp/taggers/trec/best-model.pt
2020-07-09 11:45:22,217         0.964
2020-07-09 11:45:22,218 
Results:
- F-score (micro) 0.9823
- F-score (macro) 0.9845
- Accuracy 0.964

By class:
              precision    recall  f1-score   support

        ENTY     1.0000    0.8947    0.9444        95
        DESC     1.0000    0.9653    0.9823       144
        ABBR     1.0000    1.0000    1.0000        10
         HUM     1.0000    0.9851    0.9925        67
         NUM     1.0000    1.0000    1.0000       120
         LOC     1.0000    0.9756    0.9877        82

   micro avg     1.0000    0.9653    0.9823       518
   macro avg     1.0000    0.9701    0.9845       518
weighted avg     1.0000    0.9653    0.9820       518
 samples avg     1.0000    0.9820    0.9880       518

2020-07-09 11:45:22,218 ----------------------------------------------------------------------------------------------------

alanakbik added a commit that referenced this issue Jul 9, 2020
GH-1748: fix TextClassifier evaluation if no label_type is passed
@alanakbik
Collaborator

Argh, you're right. I just pushed a PR that I believe fixes this. Could you try installing from master?

pip install --upgrade git+https://github.com/flairNLP/flair.git 
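
To check that the development install took effect, one option (assuming flair exposes __version__, as its releases do):

python -c "import flair; print(flair.__version__)"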

@GuillaumeDD
Author

Installing from master, the results look better:

2020-07-09 12:46:46,615 ----------------------------------------------------------------------------------------------------
2020-07-09 12:46:46,615 Testing using best model ...
2020-07-09 12:46:46,616 loading file /tmp/taggers/trec/best-model.pt
2020-07-09 12:46:51,939         0.97
2020-07-09 12:46:51,939 
Results:
- F-score (micro) 0.97
- F-score (macro) 0.9665
- Accuracy 0.97

By class:
              precision    recall  f1-score   support

        DESC     0.9384    0.9928    0.9648       138
        ENTY     0.9882    0.8936    0.9385        94
        ABBR     1.0000    0.8889    0.9412         9
         HUM     0.9846    0.9846    0.9846        65
         NUM     0.9739    0.9912    0.9825       113
         LOC     0.9877    0.9877    0.9877        81

   micro avg     0.9700    0.9700    0.9700       500
   macro avg     0.9788    0.9564    0.9665       500
weighted avg     0.9709    0.9700    0.9697       500
 samples avg     0.9700    0.9700    0.9700       500
2020-07-09 12:46:51,939 ----------------------------------------------------------------------------------------------------

@ForLittleBeauty

I have also encountered this problem.

By class:
              precision    recall  f1-score   support

           4     1.0000    0.4470    0.6179      1322
           1     1.0000    0.5561    0.7148      3064
           3     1.0000    0.5726    0.7282      2513
           2     1.0000    0.3269    0.4927      1661
           5     1.0000    0.7080    0.8290      2325
           0     1.0000    0.8719    0.9316      4677

   micro avg     1.0000    0.6427    0.7825     15562
   macro avg     1.0000    0.5804    0.7190     15562
weighted avg     1.0000    0.6427    0.7672     15562
 samples avg     1.0000    0.7220    0.8147     15562

I have also tried installing from master:

pip install --upgrade git+https://github.com/flairNLP/flair.git

but this error comes out:

ERROR: Command errored out with exit status 128: git clone -q https://github.com/flairNLP/flair.git /tmp/pip-req-build-f1vp5jsw Check the logs for full command output.

Can you help me, @alanakbik?

@alanakbik
Collaborator

Try doing a fresh pip install flair. We just released a new flair version so you don't have to install from master.
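
For reference, the fresh install suggested here is just (standard pip usage; --upgrade replaces the existing 0.5.1 install with the new release):

pip install --upgrade flair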

@ForLittleBeauty

Try doing a fresh pip install flair. We just released a new flair version so you don't have to install from master.

Okay, that works, thank you!
