Token annotation error for XML output with non-standard rules #82

Closed
marijnschraagen opened this issue Sep 12, 2019 · 3 comments

@marijnschraagen

Maybe related to #80?

When using XML output with non-standard rules there is a token-annotation error. Command:
frog -t myfile.txt -X myresult.xml --language=nld-vnn

Output:

frog 0.19 (c) CLTS, ILK 1998 - 2019
CLST  - Centre for Language and Speech Technology,Radboud University
ILK   - Induction of Linguistic Knowledge Research Group,Tilburg University
based on [ucto 0.19, libfolia 2.4, timbl 6.4.14, ticcutils 0.23, mbt 3.5]
removing old debug files using: 'find frog.*.debug -mtime +1 -exec rm {} \;'
frog-:config read from: /usr/local/share/frog/nld-vnn/frog.cfg
frog-:Missing [[mbma]] section in config file.
frog-:Disabled the Morhological analyzer.
frog-:Missing [[IOB]] section in config file.
frog-:Disabled the IOB Chunker.
frog-:Missing [[NER]] section in config file.
frog-:Disabled the NER.
frog-:Missing [[mwu]] section in config file.
frog-:Disabled the Multi Word Unit.
frog-:Also disabled the parser.
frog-mblem-:Initiating lemmmmatizer...
ucto: textcat configured from: /usr/local/share/ucto/textcat.cfg
frog-tok-:Language List =[nld-vnn]
ucto: No useful settingsfile(s) could be found (initiating from language list: [nld-vnn])
frog-tagger-tagger-:reading subsets from /usr/local/share/frog/nld-vnn//babsub.cgn
frog-tagger-tagger-:reading constraints from /usr/local/share/frog/nld-vnn//babconstraints.cgn
frog-:Thu Sep 12 19:09:35 2019 Initialization done.
frog-:Thu Sep 12 19:09:35 2019 Frogging myfile.txt
[first sentence processed ok, removed here]

Word(class='WORD-COMPOUND',generate_id='myfile.txt.p.1.s.1',
set='tokconfig-nld-vnn',space='no') creation failed: DeclarationError:
Set 'tokconfig-nld-vnn' is used but has no declaration for token-annotation

The regular column-based output works without any problems.

@proycon
Member

proycon commented Sep 15, 2019

I can indeed replicate this. It seems related to LanguageMachines/ucto#72.

@kosloot
Collaborator

kosloot commented Sep 16, 2019

Well... The problem here is that frog uses the 'language' nld-vnn, which refers to the configuration in /usr/local/share/frog/nld-vnn/.
Ucto is then initialized from /usr/local/share/frog/nld-vnn/frog.cfg using:

[[tokenizer]]
rulesFile=tokconfig-nld-historical

So, for ucto, the language is nld-historical.

This is confusing for us as well as for the software.
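The mismatch can be made visible with a small check. The following is only a sketch, not part of frog or ucto: the helper names `read_frog_sections` and `tokenizer_set_mismatch` are hypothetical, `#` comment handling is an assumption about the config syntax, and the expected set name `tokconfig-<language>` is inferred from the error message above rather than from the frog sources.

```python
import re


def read_frog_sections(text):
    """Parse config text with [[section]] headers into {section: {key: value}}.

    Hypothetical helper: only handles 'key=value' lines and assumes '#' comments.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        header = re.fullmatch(r"\[\[(.+)\]\]", line)
        if header:
            current = sections.setdefault(header.group(1), {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def tokenizer_set_mismatch(config_text, language):
    """Return (rulesFile, assumed set name, whether the two differ)."""
    rules = read_frog_sections(config_text).get("tokenizer", {}).get("rulesFile")
    expected = f"tokconfig-{language}"  # assumption, taken from the error message
    return rules, expected, rules is not None and rules != expected


# The fragment quoted above, checked against --language=nld-vnn.
sample = """[[tokenizer]]
rulesFile=tokconfig-nld-historical
"""
print(tokenizer_set_mismatch(sample, "nld-vnn"))
# -> ('tokconfig-nld-historical', 'tokconfig-nld-vnn', True)
```

A `True` in the last position flags a configuration where the name passed to --language and the rulesFile disagree, which is the situation in this report.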

When I run Frog like this:
frog -c /usr/local/share/frog/nld-vnn/frog.cfg -X uit.xml -t txt
all seems well.
So that might be a quick workaround.
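For scripted runs, the workaround can be wrapped so that the config path is passed with -c instead of relying on --language. This is a sketch only: `workaround_command` is a hypothetical helper, and the default share directory is the one from this report, which may differ on other installations. It only builds the argument list and does not run frog.

```python
import shlex
from pathlib import PurePosixPath


def workaround_command(language, input_file, xml_out,
                       share_dir="/usr/local/share/frog"):
    """Build the 'frog -c <config> -X <xml> -t <input>' call used as a workaround.

    share_dir is an assumption; adjust it to the local frog installation.
    """
    config = PurePosixPath(share_dir) / language / "frog.cfg"
    return ["frog", "-c", str(config), "-X", xml_out, "-t", input_file]


cmd = workaround_command("nld-vnn", "myfile.txt", "myresult.xml")
print(shlex.join(cmd))
# -> frog -c /usr/local/share/frog/nld-vnn/frog.cfg -X myresult.xml -t myfile.txt
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the path has been verified.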

As a matter of fact, I am inclined to think that this is an abuse of the --language parameter.
It is meant to give frog a hint about the languages to detect, and NOT to tell which configuration to use.

When using --language, frog should ignore the rulesFile information from the frog config file.
This was the case until @proycon "fixed" it in #80.
That was probably putting the cart before the horse.

We need to rethink this.

@kosloot
Collaborator

kosloot commented Jun 26, 2020

Fixed according to #80.

@kosloot kosloot closed this as completed Jun 26, 2020