missing set in annotation declaration #54
Comments
Agreed that this is too lax and we should probably remove this behaviour. It is used in practice though, so it would be a change from FoLiA v1.6 onward, as we can't demand this from older versions due to backward compatibility. On a related note: I think we should also be strict in demanding declarations for structural elements (token, paragraph, sentence). These are now optional, but if you really want the declarations to be meaningful they had better be strict too. For the lazy users we can provide a tool that automatically generates some ad-hoc declarations.
yes, let's do this:
I am still a bit reluctant to make declarations mandatory, but in the long run it might be needed, so let's start requiring this too.
Just for clarity: of course the libraries should still remain capable of parsing pre-1.6 documents (with the missing set names and all). The upgrade script is not a replacement for that, just an additional tool.
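To make the idea of an upgrade step concrete, here is a minimal sketch of what such a tool might do, assuming it simply writes the implicit default set ("undefined", as the documentation footnote describes) into every declaration that lacks one. The function name `make_default_set_explicit` and the rule that declarations are elements whose name ends in `-annotation` are assumptions for illustration, not the actual upgrade script.

```python
import xml.etree.ElementTree as ET


def make_default_set_explicit(xml_text, default_set="undefined"):
    """Add set=default_set to every declaration that has no set attribute.

    Assumption: a declaration is any element whose local name ends in
    "-annotation". Returns the rewritten XML and the number of changes.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for elem in root.iter():
        # Drop a namespace prefix such as "{http://...}" if present.
        local = elem.tag.rsplit("}", 1)[-1]
        if local.endswith("-annotation") and elem.get("set") is None:
            elem.set("set", default_set)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Simplified, namespace-free fragment for illustration only.
fragment = (
    '<annotations>'
    '<token-annotation annotator="ilktok" annotatortype="auto" />'
    '</annotations>'
)
upgraded, count = make_default_set_explicit(fragment)
print(count)
print(upgraded)
```

A blanket rewrite like this would not be appropriate for annotation types where the set is legitimately optional, so a real tool would need a per-type rule.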
We do have something extra to consider: for certain annotation types the set is optional (this applies to a lot of structure elements), or in rare cases perhaps not present at all. In such cases a declaration without [...] I also want to enforce that if there is a set, then there must be a class on the annotations (and obviously, if there is no set, there can't be a class on the annotations).
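The set/class rule described above could be sketched as a small check. This is only an illustration under stated assumptions: `set_class_violation` is a hypothetical helper, and it takes the declared set and the annotation's class as plain optional strings instead of reading them from a real FoLiA document.

```python
from typing import Optional


def set_class_violation(declared_set: Optional[str],
                        cls: Optional[str]) -> Optional[str]:
    """Return a message if the set/class pairing breaks the proposed rule.

    Proposed rule: a set implies a class, and no set implies no class.
    Returns None when the pairing is consistent.
    """
    if declared_set is not None and cls is None:
        return "a set is declared but the annotation has no class"
    if declared_set is None and cls is not None:
        return "the annotation has a class but no set is declared"
    return None


# The set and class values below are made up for illustration.
print(set_class_violation("hypothetical-pos-set", "N"))   # consistent
print(set_class_violation("hypothetical-pos-set", None))  # violation
print(set_class_violation(None, "N"))                     # violation
print(set_class_violation(None, None))                    # consistent
```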
In summary, for FoLiA 2.0:
Both the C++ and the Python implementations seem to accept annotation declarations without a set.
From example.xml:
<token-annotation annotator="ilktok" annotatortype="auto" />
The documentation states:
The set attribute is mandatory
with a footnote:
Technically, it can be omitted, but then the set defaults to “undefined”. This is allowed for flexibility and less explicit usage of FoLiA in limited settings, but not recommended!
I think this is too lax, and set names should be mandatory unconditionally.
For instance, we run into trouble when a module wants to add another token-annotation.
By definition there is no longer a default set then, but it is rather complicated or even impossible to assign a set to the already existing tokens to distinguish them from the newly added ones.
AFAIK these set-less declarations are quite rare, probably only occurring in test files?
We could investigate this, but NOT allowing this is important.
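One way to investigate how common set-less declarations are would be a small scan over existing documents. The following is a rough sketch using only the Python standard library, not the FoLiA libraries themselves. The function name `declarations_without_set`, the sample document, and the rule that declarations are elements whose name ends in `-annotation` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def declarations_without_set(xml_text):
    """Return the names of annotation declarations lacking a set attribute.

    Assumption: a declaration is any element whose local name ends in
    "-annotation", regardless of namespace.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        # Drop a namespace prefix such as "{http://...}" if present.
        local = elem.tag.rsplit("}", 1)[-1]
        if local.endswith("-annotation") and not elem.get("set"):
            missing.append(local)
    return missing


# Simplified sample; the pos-annotation set name is made up.
sample = """<FoLiA>
  <metadata>
    <annotations>
      <token-annotation annotator="ilktok" annotatortype="auto" />
      <pos-annotation set="hypothetical-pos-set" annotator="x" annotatortype="auto" />
    </annotations>
  </metadata>
</FoLiA>"""

print(declarations_without_set(sample))
```

Running this over a corpus and counting the non-empty results would give a first impression of whether such declarations occur outside test files.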