Resolving annotation conflicts #27

Open
hiroshinoji opened this issue Feb 16, 2016 · 3 comments

Comments

@hiroshinoji
Contributor

Currently, if we apply two annotators that annotate the same element, both annotations are added to the result. Stanford CoreNLP instead overrides the old annotation. Following this, I implemented a method that checks whether the same element already exists when adding XML elements. Such duplicates occur, e.g., when a joint POS-tagging and parsing model is run after a POS tagger.
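A minimal sketch of this replace-on-conflict check follows. It uses Python's standard `xml.etree.ElementTree` rather than the project's own code. The helper name `add_or_replace` is hypothetical. The `annotators` attribute, the name proposed later in this thread, serves here only to tell the two elements apart. Before a new child is appended, every existing child with the same tag is removed, so the later annotator overrides the earlier one.

```python
import xml.etree.ElementTree as ET


def add_or_replace(parent: ET.Element, new_child: ET.Element) -> None:
    """Append new_child to parent after removing existing children with the same tag.

    Hypothetical helper illustrating the "override the old annotation" policy.
    """
    # findall returns a list, so removing while looping over it is safe.
    for old in parent.findall(new_child.tag):
        parent.remove(old)
    parent.append(new_child)


sentence = ET.Element("sentence")
ET.SubElement(sentence, "tokens", {"annotators": "pos-tagger"})

# A joint POS-tagging/parsing model later produces its own <tokens>.
add_or_replace(sentence, ET.Element("tokens", {"annotators": "joint-parser"}))

print(ET.tostring(sentence, encoding="unicode"))
# <sentence><tokens annotators="joint-parser" /></sentence>
```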

I plan to push this modification, but I am also wondering whether overriding is the best way to resolve conflicts. It might be better to also output some warnings, but that can be future work.
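The warning idea could look roughly like the sketch below. It is a hypothetical variant of the override check, not code from the project. It emits a standard Python `UserWarning` for every annotation it drops and returns how many were replaced. The function name and the `annotators` attribute are assumptions made for illustration.

```python
import warnings
import xml.etree.ElementTree as ET


def add_or_replace_with_warning(parent: ET.Element, new_child: ET.Element) -> int:
    """Replace same-tag children of parent with new_child, warning on each override.

    Hypothetical helper; returns the number of annotations that were replaced.
    """
    old_children = parent.findall(new_child.tag)
    for old in old_children:
        warnings.warn(
            f"overriding existing <{old.tag}> annotation "
            f"(annotators={old.get('annotators')!r})",
            stacklevel=2,
        )
        parent.remove(old)
    parent.append(new_child)
    return len(old_children)


sentence = ET.Element("sentence")
ET.SubElement(sentence, "tokens", {"annotators": "pos-tagger"})

# Record warnings instead of printing them, so the example is easy to inspect.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n = add_or_replace_with_warning(
        sentence, ET.Element("tokens", {"annotators": "joint-parser"})
    )

print(n, len(caught))  # 1 1
```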

@hiroshinoji
Contributor Author

Now CabochaAnnotator replaces the old annotations (chunks and dependencies) if they exist.
323a3b0#diff-9b2b4b9eb3146599a3ce60c12afa4ddeR46

@hiroshinoji
Contributor Author

Other options:

  • add an option to keep the old annotations;
  • distinguish different annotations with the same tag using an attribute.

In any case, each annotation should have an attribute recording which annotator produced it, e.g.:

<tokens annotators="juman">...</tokens>
<tokens annotators="knp">...</tokens>
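Keeping both annotations and telling them apart by this attribute could be sketched as below. The sketch uses Python's `xml.etree.ElementTree`, whose limited XPath supports `[@attr='value']` predicates. The helper names are hypothetical. The sketch assumes the attribute holds a single annotator name, as in the example above.

```python
import xml.etree.ElementTree as ET


def add_annotation(parent: ET.Element, tag: str, annotator: str) -> ET.Element:
    """Append a new <tag> element recording which annotator produced it (hypothetical)."""
    return ET.SubElement(parent, tag, {"annotators": annotator})


def find_annotation(parent: ET.Element, tag: str, annotator: str):
    """Return the <tag> child produced by the given annotator, or None (hypothetical)."""
    return parent.find(f"{tag}[@annotators='{annotator}']")


sentence = ET.Element("sentence")
add_annotation(sentence, "tokens", "juman")
add_annotation(sentence, "tokens", "knp")

print(len(sentence.findall("tokens")))                                # 2
print(find_annotation(sentence, "tokens", "knp").get("annotators"))  # knp
```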

@hiroshinoji
Contributor Author

I've changed this behavior of cabocha in d17b751 to keep the old annotation, because the annotator name (cabocha) is now recorded on every element.

It may be better to support an option that decides whether to keep or replace the old annotation, as -knp.replaceJumanTokens does.

Generally, keeping annotations of the same type from different annotators seems to make lower-level processing a bit complicated, so replacing the old annotation might be the better default.
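A keep-or-replace switch that defaults to replacing could be sketched as below. The `replace` parameter is a hypothetical stand-in, similar in spirit to `-knp.replaceJumanTokens`, but it is not that option's actual implementation. Python's `xml.etree.ElementTree` is used for illustration.

```python
import xml.etree.ElementTree as ET


def add_annotation(parent: ET.Element, new_child: ET.Element, replace: bool = True) -> None:
    """Add new_child under parent (hypothetical helper).

    replace=True (default): drop existing children with the same tag first.
    replace=False: keep them; the annotators attribute tells them apart.
    """
    if replace:
        for old in parent.findall(new_child.tag):
            parent.remove(old)
    parent.append(new_child)


def tokens_from(annotator: str) -> ET.Element:
    """Build an empty <tokens> element tagged with its annotator."""
    return ET.Element("tokens", {"annotators": annotator})


kept = ET.Element("sentence")
add_annotation(kept, tokens_from("juman"))
add_annotation(kept, tokens_from("knp"), replace=False)

replaced = ET.Element("sentence")
add_annotation(replaced, tokens_from("juman"))
add_annotation(replaced, tokens_from("knp"))  # default: replace

print([t.get("annotators") for t in kept.findall("tokens")])      # ['juman', 'knp']
print([t.get("annotators") for t in replaced.findall("tokens")])  # ['knp']
```

With replacing as the default, later processing sees at most one `<tokens>` per sentence unless the caller explicitly asks to keep both.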
