predict.py to work with .conllu files NOT annotated for dependencies? #26

lmompela · 2021-10-05T12:29:21Z

Hi there,

I was wondering whether there is a way to use predict.py with my corpus data (.conllu), which is annotated for POS but not for dependencies. My goal at the moment is not to calculate evaluation metrics, but to have my pretrained model predict dependencies so that I get a head start on dependency annotation. I am working on an underdocumented language and would like a first pass of dependency predictions that I can then go back to, verify, and correct to create the gold standard for my language.

Is there a reason my input file has to conform to the conllu format other than for evaluation metrics? My issue seems to be that my "head" and "deprel" columns are not integers but simply "_" because they're empty. I would prefer to keep the .conllu format of my input file, as it already contains POS information, which could give me better predictions.

Thank you for the research, it's super helpful, especially for underdocumented languages.

Here is my error message:

lmompela · 2021-10-12T20:59:00Z

Nevermind, found a way around! Thanks

andidyer · 2024-02-28T16:53:43Z

> Nevermind, found a way around! Thanks

I know this was a while ago, but do you remember what your solution was? I am also trying to parse a file without annotated dependencies and am facing this issue.

Maybe the solution is just to fill each head field with the index of the token minus 1? This is what I did and it worked. The format looked a bit like this:

# sent_id = 1
# text = "This is a sentence"
1    This    _    _    _    _    0    _    _    _
2    is    _    _    _    _    1    _    _    _
3    a    _    _    _    _    2    _    _    _
4    sentence    _    _    _    _    3    _    _    _

# sent_id = 2
# text = "This is another sentence"
1    This    _    _    _    _    0    _    _    _
2    is    _    _    _    _    1    _    _    _
3    another    _    _    _    _    2    _    _    _
4    sentence    _    _    _    _    3    _    _    _
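
In case it helps anyone else, the placeholder trick above can be scripted. This is only a minimal sketch, not part of the repository: the function name `fill_placeholder_heads` is hypothetical, and it assumes the input is tab-separated CoNLL-U with ten columns per token line. It writes "token index minus 1" into every empty HEAD field and leaves DEPREL as "_", matching the example above. I have not checked whether predict.py needs anything more than that.

```python
def fill_placeholder_heads(conllu_text: str) -> str:
    """Replace empty HEAD fields ("_") with the token index minus 1.

    Hypothetical helper: the heads are placeholders only, meant to be
    overwritten by the parser's predictions.
    """
    out_lines = []
    for line in conllu_text.splitlines():
        # Comment lines and blank sentence separators pass through unchanged.
        if not line.strip() or line.startswith("#"):
            out_lines.append(line)
            continue
        cols = line.split("\t")
        # Only touch regular tokens: ten columns and a plain integer ID.
        # Multiword ranges ("1-2") and empty nodes ("1.1") are left alone.
        if len(cols) == 10 and cols[0].isdigit() and cols[6] == "_":
            cols[6] = str(int(cols[0]) - 1)
        out_lines.append("\t".join(cols))
    return "\n".join(out_lines) + ("\n" if out_lines else "")


if __name__ == "__main__":
    sample = (
        "# sent_id = 1\n"
        "1\tThis\t_\tPRON\t_\t_\t_\t_\t_\t_\n"
        "2\tis\t_\tAUX\t_\t_\t_\t_\t_\t_\n"
        "\n"
    )
    print(fill_placeholder_heads(sample), end="")
```

Existing POS columns are untouched, so the file keeps the information it already has. Heads that are already filled in are also left as they are.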

lmompela closed this as completed Oct 12, 2021
