Curation does not automatically take stacked annotations #1893

toltoxgh · 2020-12-16T01:55:09Z

Describe the bug
I created a custom Layer with sentence level granularity and overlap "Stacking only". I also created a string Tagset and added it as Feature to this custom Layer.

During curation, sentences annotated with a single annotation get automatically put into the top most curation space as suggestions if there is agreement, which is nice.

However, if a sentence is annotated with two or more annotations, none of them is automatically placed in the top curation area, even if there is no disagreement between annotators.

I tested this with a project that has only a single annotator, and the behavior described above occurs even in that case.

Could this be addressed, so that the stacked annotations, that the annotators agree on, will be automatically put to the curation space above?

Expected behavior
Stacked annotations, that the annotators agree on, should be automatically put to the curation space above.

Please complete the following information:
I tested this on version 0.17.2

reckart · 2020-12-16T07:39:21Z

Why does one of your annotators create multiple annotations of the same type with the same labels at the same location?

toltoxgh · 2020-12-17T01:00:28Z

In our domain, it is reasonable that a sentence can have more than one annotation with the custom Layer.

For example, the Layer could be "Sentence_Topic" and the String Tagset "Topic_Tagset" could be defined as "Technology, Medicine, Space, Physics, Geology".

The example sentence "Scientists are using computational modeling techniques to investigate molecular interactions for medical research" would be annotated with "Technology" and "Medicine".

In these cases, the curator must click on each of the annotator's annotations during curation. They are not automatically placed on top, even if there was agreement, or even if there was only one annotator for the project. There is also no indication via the red markers on the left that something in this sentence requires attention during curation, which makes curation error-prone in this case.

reckart · 2020-12-17T21:31:17Z

The merge code calculates a "Diff" between the various annotators. This "Diff" consists of "ConfigurationSets" representing the annotations at a certain logical position. Within the "ConfigurationSet", we have a number of "Configurations", each essentially representing a particular feature value (or combination of feature values if the annotation has more than one feature).

Currently, if a user makes multiple annotations at the same logical position, the system interprets that as the annotator being unsure (not agreeing with him/herself) and thus completely discards that position from its considerations. The "Configuration" even goes so far as to only retain information about a single feature value combination from each annotator.

I believe what you'd be asking for would entail that we retain all the feature value combinations and in case a particular feature combination has been provided by all annotators, then consider it agreement and pre-merge it.
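The behavior asked for here (retain every feature value combination and pre-merge the ones that all annotators provided) could look roughly like the following. This is a minimal Python sketch under assumptions, not the actual merge code: the function name `agreed_stacked_labels`, the `(begin, end)` position tuples, and the single-label-per-annotation input are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def agreed_stacked_labels(annotations_by_user):
    """Return {position: labels that every annotator assigned at that position}.

    annotations_by_user is a hypothetical structure:
    {user: [(position, label), ...]} with position = (begin, end).
    """
    labels_at = defaultdict(dict)  # position -> {user -> set of labels}
    for user, annotations in annotations_by_user.items():
        for position, label in annotations:
            labels_at[position].setdefault(user, set()).add(label)

    all_users = set(annotations_by_user)
    agreed = {}
    for position, per_user in labels_at.items():
        # Skip positions that not every annotator has annotated.
        if set(per_user) != all_users:
            continue
        # Keep only the labels shared by all annotators at this position.
        common = set.intersection(*per_user.values())
        if common:
            agreed[position] = common
    return agreed


# Illustrative data: offsets are made up.
example = {
    "user1": [((0, 50), "Technology"), ((0, 50), "Medicine")],
    "user2": [((0, 50), "Medicine"), ((0, 50), "Technology"), ((0, 50), "Physics")],
}
result = agreed_stacked_labels(example)
```

In this example, "Technology" and "Medicine" would be pre-merged, while "Physics" (assigned by only one annotator) would be left for the curator. The sketch ignores relations and slot fillers, which are the complications mentioned in this thread.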

I think we had tried something like this way back, but at least back then abandoned it because it ended up being quite complicated (webanno/webanno#21)...

Also, I believe things may become tricky if the spans are involved in relations or act as slot fillers.

Might be worth giving it another try though...

There are also two alternatives that could be considered:

- implementing multi-valued features: instead of having multiple annotations with different labels, have one annotation with multiple labels - also not something to implement in an afternoon though...
- having a different layer type for each kind, e.g. a Technology layer and a Medicine layer - that is a bit more inconvenient on the annotation page because it means often switching between layers - but it should work out of the box as desired in terms of curation.

toltoxgh · 2021-01-04T23:13:35Z

Thanks, your suggestion of defining a distinct layer for each single annotation makes sense, even if it is not as convenient for rapid annotations, because the annotators would need to switch layers all the time.

For a new project, this might be the way to go for us. For the current one, it is too late to change the setup.

If the merge code is too hard to adjust, I think it would already help greatly if, during curation, the program indicated via the red markers on the left that an annotator created multiple annotations at the same logical position.

That would give a cue to the curator to look at this particular place.

As it is now, it is too easy for a curator to completely miss annotations, because stacked annotations at the same logical position are disregarded, as you wrote. There is no indication that this has happened unless the curator scrolls through every sentence and checks for it, which is cumbersome for a larger corpus. Such an update would already help.
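The check requested here amounts to finding positions where a single annotator has more than one annotation. A minimal Python sketch of that idea follows. The function name `stacked_positions` and the input structure are hypothetical and do not reflect how the curation page computes its markers.

```python
from collections import Counter


def stacked_positions(annotations_by_user):
    """Return {position: [users with more than one annotation at that position]}.

    annotations_by_user is a hypothetical structure:
    {user: [(position, label), ...]} with position = (begin, end).
    """
    flagged = {}
    for user, annotations in annotations_by_user.items():
        # Count how many annotations this user placed at each position.
        counts = Counter(position for position, _label in annotations)
        for position, count in counts.items():
            if count > 1:
                flagged.setdefault(position, []).append(user)
    return flagged


# Illustrative data: one annotator, one stacked position, one plain position.
single_annotator = {
    "user1": [((0, 50), "Technology"), ((0, 50), "Medicine"), ((51, 80), "Space")],
}
flagged = stacked_positions(single_annotator)
```

Every position returned by such a check could be marked for the curator's attention, even when the project has only one annotator.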


GiantEnemyCrab · 2022-04-19T00:40:18Z

@reckart

Thank you so much for the notice on this!
I have tested this and need help making it work: when I "re-merge", I still do not get the agreed stacked tags from user1 and user2 in the simple exported project:

stacked_tags_proj15278849064284370577.zip

I used threshold merge but I must not be setting things correctly?

reckart · 2022-04-19T09:50:24Z

What settings fail for you?

These seem to work for me on your project:

GiantEnemyCrab · 2022-04-20T04:46:22Z

Thank you so much for the guidance!
I was confused about Top-voted and had set it to 1.
After setting that to 2, it worked!

I am going to try more complex cases next and will come back here to post a comment again to report any findings within three days.

reckart · 2022-04-21T08:13:07Z

I have written a bit of documentation here: https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/view/INCEpTION/job/INCEpTION%20main/de.tudarmstadt.ukp.inception.app$inception-app-webapp/doclinks/1/#_merge_incomplete_agreeing_non_stacked_annotations

Note in particular the current section on the threshold based strategy:

Top-voted: when set to 1, only the single best label is pre-merged. If there is a tie on the best label, then nothing is merged. When set to 2 or higher, the respective n best labels are pre-merged. If there is any tie within the n best labels, then all labels that still meet the lowest score of the tie are merged as well. For example, if set to 2 and three annotators voted for label X and another two annotators voted for Y and Z respectively, then Y and Z have a tie at the second rank, so both of them are merged. Note that this setting only affects annotations on layers that allow stacking annotations. For other layers, an implicit setting of 1 is used here.

The tricky thing I think is dealing with the ties. If you say "top results 3", does it mean you get only 3 different labels merged? What if there is a tie between the top 4 or 5 labels? Do we include them or exclude them? What if we say top-results 3 and there is a tie between the first two labels? Do we exclude them but include the third one which does not have a tie? In the previous implementations where only one label was allowed to be merged, ties were never merged. I have tried preserving that when top-voted is set to 1, but went for a laxer handling of ties when top-voted is 2 or greater.
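The tie handling in the documentation excerpt quoted above can be read as the following rule. This is a minimal Python sketch of one reading of that text, not the actual merge code: the function name `top_voted` and the flat list of votes are assumptions made for illustration, and the other settings of the threshold-based strategy are left out.

```python
from collections import Counter


def top_voted(votes, n):
    """Return the set of labels to pre-merge at one position.

    votes: list of labels, one entry per annotator vote (hypothetical input).
    n: the "Top-voted" setting.
    """
    ranked = Counter(votes).most_common()  # [(label, count), ...], best first
    if not ranked:
        return set()
    if n <= 1:
        # Strict mode: a tie on the best label means nothing is merged.
        best_label, best_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == best_count
        return set() if tied else {best_label}
    # Lax mode: take the n best labels, plus any label whose score
    # still matches the lowest score among those n.
    cutoff = ranked[min(n, len(ranked)) - 1][1]
    return {label for label, count in ranked if count >= cutoff}


# The example from the quoted documentation: X has 3 votes, Y and Z have 1 each.
votes = ["X", "X", "X", "Y", "Z"]
merged_top2 = top_voted(votes, 2)
merged_top1 = top_voted(votes, 1)
```

With the setting at 2, the sketch merges X, Y and Z because Y and Z tie at the second rank. With the setting at 1, only X is merged, and a tie for first place merges nothing.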

GiantEnemyCrab · 2022-04-22T03:44:25Z

I think both approaches are fine and it depends on the use case. It might be yet another config option, but something like a checkbox for "allow top N ties to be merged" could work. If it is not checked, ties are not allowed and only the specified N labels are merged, with ties broken either randomly or by picking the label name that comes first alphabetically.

By the way, what do you think of a case like the screenshot below:

Two lorem tags are stacked on the same span, with the same slot links for both user1 and user2. But in curation, they are merged into a single lorem tag. This might be a complex case.

reckart · 2022-04-22T08:43:22Z

Well, first, let's consider that there is no link - how can you tell which lorem tag is which? The algorithm sees that both users have assigned lorem at least once - it conflates these since no difference between these spans can be seen and merges them up.

As a second step, it looks at the slots. It finds the merged lorem tag and the targets to it.
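The conflation described above can be illustrated with plain label lists. This is a minimal Python sketch under assumptions: both function names are hypothetical, and the multiset variant is shown only as a contrast, not as something the merge code does.

```python
from collections import Counter


def merge_as_set(*label_lists):
    """Agreement by label identity: identical stacked labels collapse into one."""
    return set.intersection(*(set(labels) for labels in label_lists))


def merge_as_multiset(*label_lists):
    """Agreement by label count: keeps the minimum count seen across annotators."""
    merged = Counter(label_lists[0])
    for labels in label_lists[1:]:
        merged &= Counter(labels)  # Counter intersection takes the minimum count
    return merged


# Both users stacked the same label twice on the same span.
user1 = ["lorem", "lorem"]
user2 = ["lorem", "lorem"]
as_set = merge_as_set(user1, user2)
as_multiset = merge_as_multiset(user1, user2)
```

The set-based variant yields a single lorem, which matches the behavior described in this comment. A count-based variant would keep two, but it still could not tell which of the two identical annotations a given slot link belongs to.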

That said, I'm surprised that one of the "lorem" tags in each of the users is black? I assume they are all from the same layer? Black is normally not a color that should be assigned, so something fishy seems to be happening here...

GiantEnemyCrab · 2022-04-22T13:36:49Z

Stacking the same tag is really complex. Perhaps, as long as a triple of parent tag, link/relation and child tag matches among annotators, it could be merged into curation. However, this can be ignored for now.

I've attached the exported project here in case you are curious to look further into what might be happening.
same_tag_stacked_example2715221680151684197.zip

reckart self-assigned this Jan 25, 2021

reckart added Module: Curation 🐛Bug Something isn't working labels Jan 25, 2021

reckart added this to the Bug backlog milestone Jan 25, 2021

reckart added this to To do in Kanban via automation Jan 25, 2021

reckart added a commit that referenced this issue Mar 12, 2021

#1893 - Deleting a span to which relations connect leaves dangling re…

67831ff

…lations - Fix count of deleted annotations

reckart added a commit that referenced this issue Mar 12, 2021

Merge pull request #1894 from webanno/bugfix/1893-Deleting-a-span-to-…

faab92f

…which-relations-connect-leaves-dangling-relations #1893 - Deleting a span to which relations connect leaves dangling relations

reckart modified the milestones: 🦟 Bug backlog, 0.19.1 Mar 14, 2021

reckart modified the milestones: 0.19.1, 0.19.2 Mar 23, 2021

reckart modified the milestones: 0.19.2, 0.19.3 Apr 6, 2021

reckart modified the milestones: 0.19.3, 0.20.1 Apr 18, 2021

reckart modified the milestones: 0.19.4, 0.19.5 May 23, 2021

reckart modified the milestones: 0.19.5, 0.19.6 Jun 5, 2021

reckart moved this from 🔖 To do to 📥 Inbox in Kanban Jun 5, 2021

reckart modified the milestones: 0.19.6, 0.19.7, 0.19.8 Jun 8, 2021

reckart closed this as completed Apr 17, 2022

Kanban automation moved this from 🔖 To do to 🍹 Done Apr 17, 2022

reckart reopened this Apr 19, 2022

Kanban automation moved this from 🍹 Done to 🏃‍♀️ In progress Apr 19, 2022

reckart modified the milestones: 24.0, 24.1 Jun 20, 2022

reckart moved this from 🏃‍♀️ In progress to 🔖 To do in Kanban Jul 9, 2022

reckart modified the milestones: 24.1, 24.2, 24.3 Aug 22, 2022

reckart modified the milestones: 24.3, 24.4 Sep 10, 2022

reckart modified the milestones: 24.4, v25.1 Sep 30, 2022

reckart modified the milestones: 25.1, 25.2 Oct 11, 2022

reckart modified the milestones: 25.2, 25.3 Oct 18, 2022

reckart modified the milestones: 25.3, 25.4, 🦟 Bug backlog Nov 1, 2022
