Converting from conllu to json via cli vs. biluo_tags_from_offsets #5740
Comments
The two conversions are doing very different things. The first one converts tags and dependency parses from conllu to json (note that the converter finds no NER info; it's likely the spacy 2.1.9 converter doesn't handle your particular conllu+NER format). The second one uses automatically tagged and parsed docs with entities added from your annotation. Which model(s) are you trying to train? Just an NER model? Also a tagger and a parser? Can you show a sample of your conllu data (at least one full sentence, anonymized if needed)?
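To see what NER information a conllu file actually carries before running any converter, the MISC column can be tallied directly. The following is a minimal sketch, not part of spaCy: it assumes the standard 10-column tab-separated CoNLL-U layout and that the entity label sits in the MISC (10th) column as a `key=value` item such as `Tag=B-LOC`. The function name `count_misc_values` and the sample sentence are hypothetical.

```python
from collections import Counter


def count_misc_values(conllu_text, key="Tag"):
    """Count the values stored under `key` in the MISC (10th) column.

    Assumption: MISC holds `|`-separated `key=value` items, e.g. `Tag=B-LOC`.
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        line = line.strip()
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        misc = cols[9]
        if misc == "_":
            continue  # empty MISC field
        for item in misc.split("|"):
            name, sep, value = item.partition("=")
            if sep and name == key:
                counts[value] += 1
    return counts


# Hypothetical sample sentence, made up for illustration only.
sample = (
    "# text = Moscow sleeps.\n"
    "1\tMoscow\tMoscow\tPROPN\t_\t_\t2\tnsubj\t_\tTag=B-LOC\n"
    "2\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\tTag=O|SpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\tTag=O\n"
)

print(count_misc_values(sample))
```

If the tally is non-empty but the converted json has no entities, the labels are present in the source and are being dropped somewhere in the conversion step.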
An excerpt from the source conllu file (the misbehaving sentence). Note that NER tag info (e.g. Tag=B-LOC) is present in the source file:
The corresponding generated json part (without ner tag and causing cycle):
The training error message:
Actually, neither spacy 2.3.1 nor 2.1.9 is able to convert to json in a way that lets a model train on the resulting files. I'm training just
I tried a different Russian corpus from here but still get error messages while training:
Am I missing something, or is there a bug?
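Since the generated json is described above as causing a cycle, one quick sanity check is whether the HEAD column of the source sentence already contains one, which would separate a data problem from a converter problem. This is a minimal sketch under assumptions, not spaCy code: it expects fully annotated 10-column CoNLL-U token lines with integer HEAD values, and the helper names `read_heads` and `find_cycle` are hypothetical.

```python
def read_heads(sentence_lines):
    """Collect the integer HEAD (7th column) of each regular token line.

    Assumption: HEAD is annotated (an integer, with 0 marking the root).
    """
    heads = []
    for line in sentence_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Skip multiword ranges such as "1-2" and empty nodes such as "1.1".
        if not cols[0].isdigit():
            continue
        heads.append(int(cols[6]))
    return heads


def find_cycle(heads):
    """Return the token ids that form a cycle, or None if there is none.

    heads[i] is the 1-based head of token i + 1; 0 marks the root.
    """
    for start in range(1, len(heads) + 1):
        seen = []
        node = start
        # Walk up towards the root; revisiting a node means a cycle.
        while node != 0:
            if node in seen:
                return seen[seen.index(node):]
            seen.append(node)
            node = heads[node - 1]
    return None


# Hypothetical sentences: the first is a valid tree, the second has
# tokens 1 and 2 pointing at each other.
good = [
    "1\tMoscow\tMoscow\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
]
bad = [
    "1\tMoscow\tMoscow\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tsleeps\tsleep\tVERB\t_\t_\t1\tdep\t_\t_",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
]

print(find_cycle(read_heads(good)))
print(find_cycle(read_heads(bad)))
```

If the source heads are cycle-free but the converted json is not, the cycle is being introduced during conversion rather than coming from the corpus.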
I'll close this issue since I think the underlying question is addressed in #5753.
How to reproduce the behaviour
I'm trying to convert from conllu format to json to use json formatted data in cli training.
The following conversion scenario leads to an error message while training:
The excerpt from resulting json:
The error message:
However when I try conversion from spacy's "simple" format to json via:
everything, including training, goes fine.
The resulting json seems to be different (note the raw tag and ner), but working:
Question:
Is my conversion through the cli wrong, and can it be fixed?
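For context on the second route, the idea behind turning "simple" character-offset annotations into per-token BILUO tags can be sketched in plain Python. This is a simplified illustration and not spaCy's `biluo_tags_from_offsets` implementation: it assumes whitespace tokenization, marks tokens that overlap a misaligned entity with `-`, and the function names and example sentences are hypothetical.

```python
def tokens_with_offsets(text):
    """Whitespace-split `text` into (token, start, end) triples."""
    out, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        out.append((tok, start, end))
        pos = end
    return out


def offsets_to_biluo(text, entities):
    """Map (start, end, label) character spans to one BILUO tag per token.

    Assumption: an entity must start and end exactly on token boundaries;
    otherwise every token it overlaps is tagged "-" in this sketch.
    """
    toks = tokens_with_offsets(text)
    tags = ["O"] * len(toks)
    for ent_start, ent_end, label in entities:
        covered = [
            i for i, (_, s, e) in enumerate(toks)
            if s >= ent_start and e <= ent_end
        ]
        aligned = (
            bool(covered)
            and toks[covered[0]][1] == ent_start
            and toks[covered[-1]][2] == ent_end
        )
        if not aligned:
            for i, (_, s, e) in enumerate(toks):
                if s < ent_end and e > ent_start:
                    tags[i] = "-"
            continue
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label  # single-token entity
        else:
            tags[covered[0]] = "B-" + label  # first token
            for i in covered[1:-1]:
                tags[i] = "I-" + label       # inner tokens
            tags[covered[-1]] = "L-" + label  # last token
    return tags


text = "I flew to San Francisco yesterday"
print(offsets_to_biluo(text, [(10, 23, "LOC")]))
```

In this route the entity tags come from the offsets, while the tags and parses come from whatever pipeline processed the text, which is why the resulting json differs from the output of the conllu conversion.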
Your Environment