I was wondering whether there is a way for me to use predict.py with my corpus data (.conllu), which is not annotated for dependencies but is annotated for POS. My goal is not to calculate evaluation metrics at the moment, but rather to have my pretrained model give me dependency predictions, to hopefully get a head start on dependency annotation. I am working on an underdocumented language and would like a first pass of dependency predictions that I could then go back to, verify, and update to create the gold standard for my language.
Is there a reason my input file has to conform to the CoNLL-U format other than for the evaluation metrics? My issue seems to be that my "head" and "deprel" columns are not integers but simply "_", because they are empty. I would prefer to keep the .conllu format for my input file, as it already contains POS information, which could give better predictions.
Thank you for the research, it's super helpful, especially for underdocumented languages.
Here is my error message:
I suppose this is a while ago, but do you remember what your solution was? I am also trying to parse a file without annotated dependencies and facing this issue.
Maybe the solution is just to fill each head field with the index of the token minus 1? This is what I did and it worked. The format looked a bit like this:
# sent_id = 1
# text = This is a sentence
1 This _ _ _ _ 0 _ _ _
2 is _ _ _ _ 1 _ _ _
3 a _ _ _ _ 2 _ _ _
4 sentence _ _ _ _ 3 _ _ _
# sent_id = 2
# text = This is another sentence
1 This _ _ _ _ 0 _ _ _
2 is _ _ _ _ 1 _ _ _
3 another _ _ _ _ 2 _ _ _
4 sentence _ _ _ _ 3 _ _ _
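The workaround above is easy to script. Here is a minimal sketch (my own function name and "dep" placeholder, assuming a standard 10-column, tab-separated CoNLL-U file) that fills every empty HEAD with the previous token's index, exactly like the example above, and also gives DEPREL a dummy label in case the loader validates that column too:

```python
def fill_placeholder_heads(conllu_text: str) -> str:
    """Fill empty HEAD/DEPREL fields with a dummy left-branching tree.

    Token 1 gets head 0 (root); every other token attaches to the
    preceding token. Comment lines, blank lines, multiword-token
    ranges (e.g. "1-2"), and empty nodes (e.g. "1.1") are left as-is,
    since those take no head in CoNLL-U.
    """
    out_lines = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or not line.strip() or len(cols) != 10:
            out_lines.append(line)
            continue
        if cols[0].isdigit():          # plain token line
            if cols[6] == "_":
                cols[6] = str(int(cols[0]) - 1)  # HEAD: previous token
            if cols[7] == "_":
                cols[7] = "dep"                  # DEPREL: generic placeholder
        out_lines.append("\t".join(cols))
    return "\n".join(out_lines)
```

Note that CoNLL-U requires tab-separated columns; the spaces in the example above are just display formatting. The dummy heads only need to be parseable, since predict.py overwrites them with the model's predictions.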