docs: spacy DocBin cookbook #1642
Hi @ignacioct, and thanks for your work! It might be nice to use a dataset and show a "real" workflow, for example:

import spacy
import rubrix as rb
from datasets import load_dataset

# Load the CoNLL-2003 training split and convert it to a Rubrix dataset
ds = load_dataset("conll2003", split="train")
rds = rb.DatasetForTokenClassification.from_datasets(ds, tags="ner_tags")

nlp = spacy.blank("en")  # A blank nlp pipeline works faster
db = rds.prepare_for_training(framework="spacy", lang=nlp)
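
As a possible next step for the cookbook, here is a minimal sketch of what can be done with the result, assuming `db` is a `spacy.tokens.DocBin`. The hand-built `doc` below is a hypothetical stand-in so the example runs without Rubrix or the conll2003 download; it is not the output of `prepare_for_training`.

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")

# Hypothetical stand-in for the DocBin produced in the snippet above:
# one doc with a single entity span set by character offsets.
doc = nlp("The meeting is in Madrid")
doc.ents = [doc.char_span(18, 24, label="LOC")]
db = DocBin(docs=[doc])

# Serialize and restore the DocBin to check that the annotations survive.
data = db.to_bytes()
restored = DocBin().from_bytes(data)

docs = list(restored.get_docs(nlp.vocab))
entities = [(ent.text, ent.label_) for ent in docs[0].ents]
print(entities)
```

To keep the data for training, `db.to_disk(...)` writes the same content to a file instead of returning bytes.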
Okay. Using that dataset, I believe there's one unsupported row. Is this a problem for the example, or can we go ahead?
No, it's okay. It's just a warning. You can go ahead. Thanks!
That is already implemented, so we can go ahead and merge then :) @frascuchon
Great!
(cherry picked from commit 625d153)
Closes #420
Following our work in #420 and #1635, I'm adding a small example to the Cookbook. Is this enough, or do we need something closer to a real sentence or dataset?