Curation does not automatically take stacked annotations #1893
Why does one of your annotators create multiple annotations of the same type with the same labels at the same location?
In our domain, it is reasonable for a sentence to carry more than one annotation on the custom layer. For example, the layer could be "Sentence_Topic" and the string tagset "Topic_Tagset" could be defined as "Technology, Medicine, Space, Physics, Geology". The example sentence "Scientists are using computational modeling techniques to investigate molecular interactions for medical research" would be annotated with both "Technology" and "Medicine". In these cases, the curator must click on each of these annotations in the annotator's view during curation. They are not automatically put on top, even if there was agreement, and even if the project has only one annotator. There is also no indication via the red markers on the left that something in this sentence requires attention during curation, which makes curation error-prone in this case.
The merge code calculates a "Diff" between the various annotators. This "Diff" consists of "ConfigurationSets" representing the annotations at a certain logical position. Within each "ConfigurationSet", there are a number of "Configurations", each essentially representing a particular feature value (or a combination of feature values if the annotation has more than one feature). Currently, if a user makes multiple annotations at the same logical position, the system interprets this as the annotator being unsure (not agreeing with him/herself) and therefore discards that position completely. The "Configuration" even goes so far as to retain information about only a single feature value combination from each annotator. I believe what you are asking for would entail retaining all feature value combinations and, if a particular combination has been provided by all annotators, treating it as agreement and pre-merging it. I think we tried something like this way back, but at least back then we abandoned it because it ended up being quite complicated (webanno/webanno#21)... I also believe things may become tricky if the spans are involved in relations or act as slot fillers. Might be worth giving it another try, though... There are also two alternatives that could be considered:
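To make the difference between the current and the proposed pre-merge behaviour concrete, here is a minimal Python sketch. Everything in it (`Annotation`, `diff`, `premerge_current`, `premerge_stacked`) is a hypothetical simplification, not INCEpTION's actual `Diff`/`ConfigurationSet`/`Configuration` classes, and it ignores relations and slot fillers entirely.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple

# Hypothetical data model for illustration only (not INCEpTION's classes).
@dataclass(frozen=True)
class Annotation:
    annotator: str
    position: Tuple[int, int]  # logical position, e.g. sentence begin/end offsets
    label: Tuple[str, ...]     # feature-value combination

def diff(annotations):
    """Group annotations as position -> feature values -> annotators who chose them."""
    config_sets = defaultdict(lambda: defaultdict(set))
    for a in annotations:
        config_sets[a.position][a.label].add(a.annotator)
    return config_sets

def premerge_current(annotations, annotators):
    """Approximates the current rule: a position where any annotator stacked
    different values is skipped; otherwise merge only if all agree on one value."""
    merged = []
    for pos, configs in diff(annotations).items():
        values_per_annotator = defaultdict(int)
        for users in configs.values():
            for user in users:
                values_per_annotator[user] += 1
        if any(n > 1 for n in values_per_annotator.values()):
            continue  # annotator "disagrees with him/herself": position discarded
        if len(configs) == 1:
            (label, users), = configs.items()
            if users == set(annotators):
                merged.append((pos, label))
    return merged

def premerge_stacked(annotations, annotators):
    """Proposed rule: keep every value combination and pre-merge each one
    that was provided by all annotators."""
    return [
        (pos, label)
        for pos, configs in diff(annotations).items()
        for label, users in configs.items()
        if users == set(annotators)
    ]

# Single annotator, one sentence tagged with two topics (as in the bug report).
stacked = [
    Annotation("alice", (0, 120), ("Technology",)),
    Annotation("alice", (0, 120), ("Medicine",)),
]
print(premerge_current(stacked, ["alice"]))  # []
print(premerge_stacked(stacked, ["alice"]))  # [((0, 120), ('Technology',)), ((0, 120), ('Medicine',))]
```

With a single annotator, the current rule merges nothing, while the proposed rule merges both topics. With several annotators, only the combinations that every annotator provided would be pre-merged.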
Thanks, your suggestion of defining a distinct layer for each single annotation makes sense, even if it is less convenient for rapid annotation, because the annotators would need to switch layers all the time. For a new project this might be the way to go for us, but for the current one it is too late to change the setup. If the merge code is too hard to adjust, I think it would already help greatly if, during curation, the program indicated via the red markers on the left that an annotator made multiple annotations at the same logical position. That would cue the curator to look at this particular place. As it is now, it is too easy for a curator to miss annotations completely: stacked annotations at the same logical position are disregarded, as you wrote, and there is no indication that this has happened unless the curator scrolls through every sentence and checks, which is cumbersome for a larger corpus. Such an update alone would already help.
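The cue suggested above does not require changing the merge logic: it is enough to find the positions where some annotator has more than one annotation. A minimal sketch, assuming annotations are plain `(annotator, position, label)` tuples with `position` being a sentence index (a hypothetical model, not INCEpTION's API):

```python
from collections import Counter

def stacked_positions(annotations):
    """Return sorted positions where at least one annotator made more than one
    annotation, i.e. places that deserve a curator's attention.

    `annotations` is an iterable of (annotator, position, label) tuples
    (a hypothetical, simplified data model).
    """
    per_annotator = Counter((annotator, position) for annotator, position, _ in annotations)
    return sorted({position for (_, position), count in per_annotator.items() if count > 1})

anns = [
    ("alice", 3, "Technology"),
    ("alice", 3, "Medicine"),  # stacked: sentence 3 gets flagged
    ("bob", 3, "Technology"),
    ("bob", 7, "Space"),
]
print(stacked_positions(anns))  # [3]
```

The returned positions are what a UI could mark in the sidebar, in the same way it already marks disagreements.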
Thank you so much for the notice on this! I used the threshold merge, but I must not be setting things up correctly? Attached: stacked_tags_proj15278849064284370577.zip
Thank you so much for the guidance! I am going to try more complex cases next and will report any findings here within three days.
I have written a bit of documentation here: https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/view/INCEpTION/job/INCEpTION%20main/de.tudarmstadt.ukp.inception.app$inception-app-webapp/doclinks/1/#_merge_incomplete_agreeing_non_stacked_annotations. Note in particular the current section on the threshold-based strategy:
The tricky thing, I think, is dealing with ties. If you say "top results 3", does it mean you get only 3 different labels merged? What if there is a tie between the top 4 or 5 labels? Do we include them or exclude them? What if we say "top results 3" and there is a tie between the first two labels? Do we exclude them but include the third one, which is not tied? In the previous implementations, where only one label could be merged, ties were never merged. I have tried to preserve that when top-votes is set to 1, but to handle ties more laxly when top-votes is 2 or greater.
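The comment does not spell out the laxer tie rule, so the policy below is an assumption made for illustration: with `top_votes == 1` a tie for first place merges nothing (the old behaviour), and with `top_votes >= 2` every label whose vote count reaches that of the `top_votes`-th ranked label is merged, so ties at the cut-off are included. The function name `select_labels` and the `min_votes` threshold parameter are hypothetical.

```python
from collections import Counter

def select_labels(votes, top_votes, min_votes=1):
    """Pick the labels to merge at one position from per-annotator votes.

    Hypothetical tie policy (an assumption, not INCEpTION's documented rule):
    - top_votes == 1: merge the best label only if it is not tied.
    - top_votes >= 2: merge every label whose count reaches the count of the
      top_votes-th ranked label, so ties at the cut-off are included.
    """
    if top_votes < 1:
        raise ValueError("top_votes must be at least 1")
    counts = Counter(votes)
    ranked = [(label, n) for label, n in counts.most_common() if n >= min_votes]
    if not ranked:
        return []
    if top_votes == 1:
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return []  # tie for first place: do not merge anything
        return [ranked[0][0]]
    cutoff = ranked[min(top_votes, len(ranked)) - 1][1]
    return sorted(label for label, n in ranked if n >= cutoff)

votes = ["A", "A", "A", "B", "B", "C", "C", "D"]
print(select_labels(votes, top_votes=1))  # ['A']
print(select_labels(votes, top_votes=2))  # ['A', 'B', 'C'] (B and C tie for 2nd)
print(select_labels(["A", "A", "B", "B"], top_votes=1))  # [] (tie for 1st)
```

Under this policy, "top results 3" with a tie between the first two labels merges both of them plus the third; the alternative of excluding tied labels would need a different cut-off rule.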
Well, first, let's consider that there is no link: how can you tell which "lorem" tag is which? The algorithm sees that both users have assigned "lorem" at least once. Since it cannot see any difference between these spans, it conflates them and merges them. As a second step, it looks at the slots: it finds the merged "lorem" tag and the targets linked to it. That said, I'm surprised that one of the "lorem" tags in each user's annotations is black. I assume they are all from the same layer? Black is normally not a color that should be assigned, so something fishy seems to be happening here...
Stacking the same tag is really complex. Perhaps, as long as a triple of parent tag, link/relation, and child tag matches among annotators, it could be merged into the curation; however, this can be ignored for now. I've also attached the exported project here in case you are curious to see what else might be happening.
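The two ideas above can be contrasted in a small sketch: keying spans only by position and label conflates identical stacked tags, while comparing (parent, relation, child) triples across annotators keeps the relation structure comparable. The data model (`spans` as `{span_id: (position, label)}`, `relations` as `(source_id, relation_label, target_id)` tuples) and all function names are hypothetical, not INCEpTION's API.

```python
# Hypothetical model: each annotator contributes
#   spans:     {span_id: (position, label)}
#   relations: [(source_span_id, relation_label, target_span_id)]

def position_label_keys(spans):
    """Keying spans only by (position, label) collapses identical stacked tags."""
    return set(spans.values())

def relation_triples(spans, relations):
    """Describe each relation as (parent span, relation label, child span),
    where a span is identified only by its position and label."""
    return {(spans[src], rel, spans[dst]) for src, rel, dst in relations}

def agreed_triples(per_annotator):
    """Triples produced by every annotator: candidates for pre-merging."""
    triple_sets = [relation_triples(spans, rels) for spans, rels in per_annotator.values()]
    return set.intersection(*triple_sets) if triple_sets else set()

user1 = ({1: (5, "lorem"), 2: (5, "lorem"), 3: (9, "ipsum"), 4: (12, "dolor")},
         [(1, "arg", 3), (2, "arg", 4)])
user2 = ({7: (5, "lorem"), 8: (5, "lorem"), 9: (9, "ipsum"), 10: (12, "dolor")},
         [(8, "arg", 9), (7, "arg", 10)])

print(len(position_label_keys(user1[0])))  # 3: the two "lorem" spans collapse into one
print(agreed_triples({"user1": user1, "user2": user2}))
```

Triples alone still cannot tell which of two identical "lorem" spans a relation starts from; they only make the relation structure comparable across annotators, which is what the merge decision needs.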
Describe the bug
I created a custom Layer with sentence level granularity and overlap "Stacking only". I also created a string Tagset and added it as Feature to this custom Layer.
During curation, sentences annotated with a single annotation are automatically put into the topmost curation area as suggestions if there is agreement, which is nice.
However, if a sentence is annotated with two or more annotations, none of them is automatically put into the top curation area, even if there is no disagreement between annotators.
I tested this with a project that has a single annotator, and the behavior described above occurs even in that case.
Could this be addressed so that stacked annotations that the annotators agree on are automatically put into the curation area above?
Expected behavior
Stacked annotations that the annotators agree on should be automatically put into the curation area above.
Please complete the following information:
I tested this on version 0.17.2