missing set in annotation declaration #54

Closed
kosloot opened this issue Sep 4, 2018 · 5 comments
Labels
enhancement ready Implemented but not released yet

@kosloot
Collaborator

kosloot commented Sep 4, 2018

Both the C++ and the Python implementation seem to accept annotation declarations without a set,
from example.xml:
<token-annotation annotator="ilktok" annotatortype="auto" />

The documentation states:
The set attribute is mandatory

with a footnote:
Technically, it can be omitted, but then the set defaults to “undefined”. This is allowed for flexibility and less explicit usage of FoLiA in limited settings, but not recommended!

I think this is too lax, and set names should be mandatory unconditionally.
For instance: we run into trouble when a module wants to add another token-annotation.
By definition there is no default set anymore at that point, but it is rather complicated or impossible to assign a set to the already existing tokens in order to distinguish them from the newly added ones.

AFAIK, these nameless declarations are quite rare, probably only in test files?
We could investigate this, but NOT allowing them is important.
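
To get a feel for how common set-less declarations are, something along these lines could be run over existing files. This is only a rough sketch using the Python standard library, not the FoLiA libraries themselves; the `<annotations>` fragment, the `example-pos-set` name and the helper `setless_declarations` are made up for illustration, and a real document would also carry the FoLiA XML namespace (which the `endswith` test tolerates).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the declaration quoted above;
# the second declaration and its set name are invented for contrast.
ANNOTATIONS = """
<annotations>
  <token-annotation annotator="ilktok" annotatortype="auto" />
  <pos-annotation set="example-pos-set" annotator="tagger" annotatortype="auto" />
</annotations>
"""

def setless_declarations(xml_text):
    """Return the tag names of declarations that have no set attribute."""
    root = ET.fromstring(xml_text)
    return [
        decl.tag
        for decl in root
        if decl.tag.endswith("-annotation") and "set" not in decl.attrib
    ]

print(setless_declarations(ANNOTATIONS))
```

Here only the `token-annotation` declaration is reported, since it is the one without a `set`.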

@proycon
Owner

proycon commented Sep 4, 2018

Agreed that this is too lax and we should probably remove this behaviour. It is used in practice though, so it would be a change from FoLiA v1.6 onward, as we can't demand this from older versions due to backward compatibility. On a related note: I think we should also be strict in demanding declarations for structural elements (token, paragraph, sentence). These are now optional, but if you really want the declarations to be meaningful they'd better be strict too.

For the lazy users we can provide a tool that automatically generates some ad-hoc declarations.

@proycon proycon added this to the v1.6 milestone Sep 4, 2018
@kosloot
Collaborator Author

kosloot commented Sep 4, 2018

yes, let's do this:

  • don't allow missing set names for >= 1.6
  • provide a simple upgrade script to 1.6 (might also include test fixing and some other goodies...)

I am still a bit reluctant about making declarations mandatory. But in the long run it might be needed, so let's start requiring this too.
A conversion script adding some default declarations might be more difficult though.

@proycon
Owner

proycon commented Sep 5, 2018

Just for clarity: of course the libraries should still remain capable of parsing pre-1.6 documents (with the missing setnames and all). The upgrade script is not a replacement for that but just an additional tool.

@proycon proycon added to do staged to be worked on and removed bug labels Sep 5, 2018
@proycon
Owner

proycon commented Sep 13, 2018

We do have something extra to consider: for certain annotation types the set is optional (this applies to a lot of structure elements), or in rare cases perhaps not present at all. In such cases a declaration without a set is permitted.

I also want to enforce that if there is a set, then there must be a class on the annotations (and obviously if there is no set, there can't be a class on annotations).

@proycon proycon added in progress and removed to do staged to be worked on labels Sep 14, 2018
@proycon
Owner

proycon commented Feb 13, 2019

In summary, for FoLiA 2.0:

  • Certain declarations may be set-less
    • this is indicated in the new documentation
    • this is determined by whether class is a required property for an annotation type or not; if it is, then the declaration can never be set-less.
  • Annotations that assign classes must always have a set
  • There is no "undefined" set anymore that may get assigned automatically, but
  • All annotation types (including structural ones and text itself) need to be declared (the FoLiApy library can do this automatically to a certain extent)
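
The rules in this summary can be sketched as a small check. This is a simplified illustration and not how the C++ or FoLiApy libraries implement it: the `CLASS_REQUIRED` table, its values, the tuple layout and the function `check_document` are all assumptions made up for the example.

```python
# Hypothetical table: is class a required property for this annotation type?
# The values are illustrative only and not taken from the FoLiA specification.
CLASS_REQUIRED = {"pos": True, "token": False}

def check_document(declarations, annotations):
    """Return a list of rule violations as readable strings.

    declarations: list of (annotation_type, set_name); set_name is None if set-less
    annotations:  list of (annotation_type, set_name, class_name)
    """
    errors = []
    for atype, set_name in declarations:
        # A type that requires a class can never have a set-less declaration.
        # Unknown types are treated as class-requiring (assumption).
        if set_name is None and CLASS_REQUIRED.get(atype, True):
            errors.append(f"{atype}: declaration must have a set")
    for atype, set_name, cls in annotations:
        # Every annotation type (and set) in use must be declared.
        if (atype, set_name) not in declarations:
            errors.append(f"{atype}: not declared (set={set_name!r})")
        # An annotation that assigns a class must always have a set.
        if cls is not None and set_name is None:
            errors.append(f"{atype}: class {cls!r} assigned without a set")
    return errors

declarations = [("token", None), ("pos", None)]
annotations = [
    ("token", None, None),             # fine: set-less and class-less
    ("pos", "example-pos-set", "N"),   # uses a set that was never declared
    ("token", None, "word"),           # assigns a class without a set
]
for problem in check_document(declarations, annotations):
    print(problem)
```

With this sample input the check reports three problems: the set-less `pos` declaration, the undeclared `pos` set, and the token that carries a class without a set.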

2 participants