Add option to load text pairs with CSVClassificationCorpus #2149

Merged
merged 4 commits into from Mar 13, 2021

Conversation

alanakbik
Collaborator

@alanakbik alanakbik commented Mar 13, 2021

You can now load text pairs (data pair objects) with the CSVClassificationCorpus, in case you want to train a classifier for sentence pairs.

Example:

import torch

from flair.data import Corpus
from flair.datasets import CSVClassificationCorpus
from flair.embeddings import TransformerDocumentEmbeddings

# 1. get your text pair corpus
corpus = CSVClassificationCorpus("path/to/your/dataset",
                                 train_file="what.csv",
                                 column_name_map={0: "text", 1: "pair", 2: "label_entailment"},
                                 skip_header=True,
                                 in_memory=False,
                                 max_chars_per_doc=50,
                                 )

# 2. make the label dictionary from the corpus
label_dictionary = corpus.make_label_dictionary()

# 3. initialize text pair tagger
from flair.models import TextPairClassifier

tagger = TextPairClassifier(
    document_embeddings=TransformerDocumentEmbeddings(),
    label_dictionary=label_dictionary,
)

# 4. initialize trainer with AdamW
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus, optimizer=torch.optim.AdamW)

# 5. run training
trainer.train('resources/taggers/text-pair-classifier',
              learning_rate=2e-5,
              mini_batch_size=4,
              mini_batch_chunk_size=1, # this can be removed if you have a big GPU
              train_with_dev=True,
              monitor_test=True,
              max_epochs=10)
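
For reference, here is a minimal sketch of an input file that would line up with the `column_name_map` in the example: column 0 holds the first text, column 1 the paired text, and column 2 the label. The helper name `write_sample_pairs`, the header names, the sentences, and the label values are made up for illustration, and the sketch assumes the corpus reads the file with the default comma delimiter.

```python
import csv
from pathlib import Path


def write_sample_pairs(path):
    """Write a tiny text-pair CSV: one header row, then (text, pair, label) rows."""
    rows = [
        # column 0 -> "text", column 1 -> "pair", column 2 -> "label_entailment"
        ("A man is playing a guitar.", "A person plays an instrument.", "entailment"),
        ("A man is playing a guitar.", "The man is asleep.", "contradiction"),
    ]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # The example passes skip_header=True, so this first row is not read as data.
        writer.writerow(["text", "pair", "label"])
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    # Hypothetical location matching the example's folder and train_file arguments.
    write_sample_pairs("path/to/your/dataset/what.csv")
```

If your file has no header row, drop the `writerow` call here and set `skip_header=False` when building the corpus.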

@alanakbik alanakbik merged commit 51c1b5d into master Mar 13, 2021
@alanakbik alanakbik deleted the text-pair-dataset branch April 22, 2021 13:56