v2.0.1 - spacy_component, trainer

@shon-otmazgin shon-otmazgin released this 25 Oct 11:26
· 64 commits to main since this release

Adding the following features:

  1. spaCy component (thanks to @mlostar)
from fastcoref import spacy_component
import spacy


texts = ['Alice goes down the rabbit hole. Where she would discover a new reality beyond her expectations.']

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("fastcoref")

# nlp() accepts a single string; use nlp.pipe() to process a list of texts
docs = list(nlp.pipe(texts))
docs[0]._.coref_clusters
> [[(0, 5), (39, 42), (79, 82)]]
  2. Trainer
from fastcoref import TrainingArgs, CorefTrainer

args = TrainingArgs(
    output_dir='test-trainer',
    overwrite_output_dir=True,
    model_name_or_path='distilroberta-base',
    device='cuda:2',
    epochs=129,
    logging_steps=100,
    eval_steps=100
)   # other arguments, such as the learning rate of the head, can also be set here

trainer = CorefTrainer(
    args=args,
    train_file='train_file_with_clusters.jsonlines', 
    dev_file='path-to-dev-file',    # optional
    test_file='path-to-test-file'   # optional
)
trainer.train()
trainer.evaluate(test=True)

trainer.push_to_hub('your-fast-coref-model-path')
  3. predict now supports an output file:
from fastcoref import LingMessCoref

model = LingMessCoref()
preds = model.predict(texts=texts, output_file='train_file_with_clusters.jsonlines')
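The clusters shown in the spaCy example above are lists of `(start, end)` character offsets into the input text. As a minimal sketch of how to read them, the snippet below slices the text with those offsets to recover the mention strings. It uses plain Python and does not call fastcoref; `cluster_strings` is a hypothetical helper name, and the offsets are copied from the example output above, assuming they are end-exclusive.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def cluster_strings(text: str, clusters: List[List[Span]]) -> List[List[str]]:
    """Map each (start, end) character span to the substring it covers.

    Hypothetical helper for illustration; assumes end offsets are exclusive.
    """
    return [[text[start:end] for start, end in cluster] for cluster in clusters]


text = (
    "Alice goes down the rabbit hole. "
    "Where she would discover a new reality beyond her expectations."
)
# Offsets copied from the release-note example output.
clusters = [[(0, 5), (39, 42), (79, 82)]]

mentions = cluster_strings(text, clusters)
print(mentions)  # [['Alice', 'she', 'her']]
```

Each inner list is one coreference cluster, so all three mentions here refer to the same entity.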