feat: Implement 'prepare_for_training' for text classification datasets #1209
Codecov Report
@@ Coverage Diff @@
## master #1209 +/- ##
==========================================
+ Coverage 95.23% 95.28% +0.04%
==========================================
Files 124 124
Lines 5143 5155 +12
==========================================
+ Hits 4898 4912 +14
+ Misses 245 243 -2
I agree we shouldn't rely on spaCy. Let me think of alternatives.
The server part computes tags at token level. You could use for this purpose. If I remember correctly, the computed tags are in the IOB format. It could be a starting point.
Definitely, that's more than enough!
Nice, I totally forgot about this. Hm, should I copy the logic to the client part? Shall we require the user to upload/load the records before calling
Not sure about the user flow here. We can discuss it in a call tomorrow, @dcfidalgo.
This PR implements the prepare_for_training method for the DatasetForTextClassification. I modified the first tutorial to show the usage: https://rubrix.readthedocs.io/en/feat-prepare_for_training/tutorials/01-labeling-finetuning.html

I don't think I will manage to implement prepare_for_training for all tasks before the release.

@dvsrepo @frascuchon For the TokenClassification task, should we rely on spaCy utilities to convert spans to NER tags, or should we provide our own utilities? To me it would seem a bit strange to require spaCy to be installed just to prepare the dataset for training with transformers ... @dvsrepo Do you know of any other library that provides utilities for converting spans to NER tags?
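
For reference, the span-to-tag conversion discussed here can be done without spaCy. The snippet below is only a minimal sketch of the idea, not the logic used by the server or by this PR: the names `tokenize_with_offsets` and `spans_to_iob` are hypothetical, the regex tokenizer is a stand-in for whatever tokenization the records actually carry, and spans are assumed to be `(label, char_start, char_end)` tuples with an exclusive end offset.

```python
import re
from typing import List, Tuple

# Assumed span layout: (label, char_start, char_end), end offset exclusive.
Span = Tuple[str, int, int]


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Hypothetical stand-in tokenizer: words and single punctuation marks,
    each returned with its character start and end offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def spans_to_iob(text: str, spans: List[Span]) -> Tuple[List[str], List[str]]:
    """Convert character-level spans into token-level IOB tags.

    A token is tagged only if it lies completely inside a span; the first
    such token gets a B- prefix and the following ones an I- prefix.
    """
    tokens = tokenize_with_offsets(text)
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        inside = False  # becomes True once the first token of the span is tagged
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start >= start and tok_end <= end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return [tok for tok, _, _ in tokens], tags


tokens, tags = spans_to_iob("New York is big", [("LOC", 0, 8)])
print(list(zip(tokens, tags)))
```

Spans that do not align with token boundaries are silently left as "O" in this sketch; a real utility would probably want to raise or warn in that case.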