Curation does not automatically take stacked annotations #1893

Open
toltoxgh opened this issue Dec 16, 2020 · 24 comments
Labels: 🐛Bug (Something isn't working), Module: Curation

@toltoxgh

Describe the bug
I created a custom Layer with sentence level granularity and overlap "Stacking only". I also created a string Tagset and added it as Feature to this custom Layer.

During curation, sentences annotated with a single annotation are automatically put into the topmost curation space as suggestions if there is agreement, which is nice.

However, if a sentence is annotated with two or more annotations, no annotation is automatically put into the top curation space, even if there is no disagreement between annotators.

I tested this with a project and a single annotator, and even in this case, the behavior above can be seen.

Could this be addressed, so that the stacked annotations that the annotators agree on are automatically put into the curation space above?

Expected behavior
Stacked annotations that the annotators agree on should be automatically put into the curation space above.

Please complete the following information:
I tested this on version 0.17.2

@reckart
Member

reckart commented Dec 16, 2020

Why does one of your annotators create multiple annotations of the same type with the same labels at the same location?

@toltoxgh
Author

In our domain, it is reasonable that a sentence can have more than one annotation with the custom Layer.

For example, the Layer could be "Sentence_Topic" and the String Tagset "Topic_Tagset" could be defined as "Technology, Medicine, Space, Physics, Geology".

The example sentence "Scientists are using computational modeling techniques to investigate molecular interactions for medical research" would be annotated with "Technology" and "Medicine".

In these cases, the curator must click on each of the annotator's annotations during curation; they are not automatically put on top, even if there was agreement, or even if there was only one annotator for the project. There is also no indication via the red markers on the left that something in this sentence requires attention, which could make curation error-prone.

@reckart
Member

reckart commented Dec 17, 2020

The merge code calculates a "Diff" between the various annotators. This "Diff" consists of "ConfigurationSets" representing the annotations at a certain logical position. Within the "ConfigurationSet", we have a number of "Configurations", each essentially representing a particular feature value (or combination of feature values if the annotation has more than one feature).

Currently, if a user makes multiple annotations at the same logical position, then the system interprets that as the annotator being unsure (not agreeing with him/herself) and thus completely discards that position from its considerations. The "Configuration" even goes so far as to only retain information about a single feature value combination from each annotator.

I believe what you'd be asking for would entail that we retain all the feature value combinations and in case a particular feature combination has been provided by all annotators, then consider it agreement and pre-merge it.
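
To make the difference concrete, here is a minimal Python sketch of the two behaviours described above. It is not the actual merge code: the `Annotation` tuple and the functions `premerge_current` and `premerge_retaining_stacks` are hypothetical names made up for illustration, each annotation is assumed to carry a single feature value, and relations and slots are ignored entirely.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    # Hypothetical, simplified stand-in for a span annotation with one feature.
    annotator: str
    begin: int
    end: int
    label: str


def group_by_position(annotations):
    """Map each (begin, end) position to {annotator: [labels at that position]}."""
    positions = defaultdict(lambda: defaultdict(list))
    for a in annotations:
        positions[(a.begin, a.end)][a.annotator].append(a.label)
    return positions


def premerge_current(annotations, annotators):
    """Assumed model of the current behaviour: stacked positions are discarded."""
    merged = {}
    for pos, by_user in group_by_position(annotations).items():
        if any(len(labels) > 1 for labels in by_user.values()):
            continue  # stacking is read as the annotator being unsure
        labels = {by_user[user][0] for user in by_user}
        if set(by_user) == set(annotators) and len(labels) == 1:
            merged[pos] = labels
    return merged


def premerge_retaining_stacks(annotations, annotators):
    """Assumed model of the request: keep all labels, merge those everyone gave."""
    merged = {}
    for pos, by_user in group_by_position(annotations).items():
        if set(by_user) != set(annotators):
            continue  # not every annotator annotated this position
        agreed = set.intersection(*(set(by_user[user]) for user in annotators))
        if agreed:
            merged[pos] = agreed
    return merged


annotations = [
    Annotation("user1", 0, 110, "Technology"),
    Annotation("user1", 0, 110, "Medicine"),
    Annotation("user2", 0, 110, "Technology"),
    Annotation("user2", 0, 110, "Medicine"),
    Annotation("user1", 111, 150, "Space"),
    Annotation("user2", 111, 150, "Space"),
]
annotators = ["user1", "user2"]

current = premerge_current(annotations, annotators)
proposed = premerge_retaining_stacks(annotations, annotators)
# current  == {(111, 150): {"Space"}}
# proposed == {(0, 110): {"Technology", "Medicine"}, (111, 150): {"Space"}}
```

In this toy model the stacked sentence simply disappears from the current result, which matches the report that nothing is pre-merged for it.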

I think we had tried something like this way back, but at least back then abandoned it because it ended up being quite complicated (webanno/webanno#21)...

Also, I believe things may become tricky if the spans are involved in relations or act as slot fillers.

Might be worth giving it another try though...

There are also two alternatives that could be considered:

  • implementing multi-valued features: instead of having multiple annotations with different labels, have one annotation with multiple labels - also not something to implement in an afternoon though...
  • having a different layer type for each kind, e.g. a Technology layer and a Medicine layer - that is a bit more inconvenient on the annotation page because it means often switching between layers - but it should work out of the box as desired in terms of curation.

@toltoxgh
Author

toltoxgh commented Jan 4, 2021

Thanks, your suggestion of defining a distinct layer for each single annotation makes sense, even if it is not as convenient for rapid annotations, because the annotators would need to switch layers all the time.

For a new project, this might be the way to go for us; for the current one, it is too late to change the setup.

If the merge code is too hard to adjust, I think it would already greatly help if during curation, the program would indicate, via the red markers on the left, if an annotator annotated multiple annotations at the same logical position.

That would give a cue to the curator to look at this particular place.

As it is now, it is too easy for a curator to completely miss annotations, because stacked annotations at the same logical position are disregarded as you wrote, and there is no indication that this has happened unless the curator scrolls through every sentence and checks for this, which is cumbersome for a larger corpus. Such an update would help already.
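
As a rough illustration of the kind of check meant here, the following Python sketch flags positions where any single annotator has more than one annotation. The function name `stacked_positions` and the `(annotator, begin, end, label)` tuple layout are assumptions for this example, not existing INCEpTION code.

```python
from collections import Counter


def stacked_positions(annotations):
    """Return sorted (begin, end) positions where some annotator stacked annotations."""
    # Count how many annotations each annotator made at each position.
    counts = Counter(
        (annotator, begin, end) for annotator, begin, end, _label in annotations
    )
    return sorted({(begin, end) for (_, begin, end), n in counts.items() if n > 1})


annotations = [
    ("user1", 0, 110, "Technology"),
    ("user1", 0, 110, "Medicine"),
    ("user1", 111, 150, "Space"),
]
flagged = stacked_positions(annotations)
# flagged == [(0, 110)]
```

A curation view could then mark exactly these positions for the curator instead of requiring a scroll through every sentence.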

@reckart reckart self-assigned this Jan 25, 2021
@reckart reckart added Module: Curation 🐛Bug Something isn't working labels Jan 25, 2021
@reckart reckart added this to the Bug backlog milestone Jan 25, 2021
@reckart reckart added this to To do in Kanban via automation Jan 25, 2021
reckart added a commit that referenced this issue Mar 12, 2021
…lations

- Delegate deletion of attached relations properly to the respective adapter
- Added / expanded checks and repairs to test for this kind of dangling relations
reckart added a commit that referenced this issue Mar 12, 2021
…lations

- Fix count of deleted annotations
reckart added a commit that referenced this issue Mar 12, 2021
…which-relations-connect-leaves-dangling-relations

#1893 - Deleting a span to which relations connect leaves dangling relations
reckart added a commit that referenced this issue Mar 12, 2021
…-broken

* master:
  #2077 - Unable to merge via curation siderbar if username contains "."
  #1895 - Merging relations and slots is broken
  #2077 - Unable to merge via curation siderbar if username contains "."
  No issue. Depend on snapshot version of WebAnno.
  #1893 - Deleting a span to which relations connect leaves dangling relations
  #2075 - Deleting a span to which relations connect leaves dangling relations
  #1893 - Deleting a span to which relations connect leaves dangling relations
  #2075 - Deleting a span to which relations connect leaves dangling relations
@reckart reckart modified the milestones: 🦟 Bug backlog, 0.19.1 Mar 14, 2021
@reckart reckart modified the milestones: 0.19.1, 0.19.2 Mar 23, 2021
@reckart reckart modified the milestones: 0.19.2, 0.19.3 Apr 6, 2021
@reckart reckart modified the milestones: 0.19.3, 0.20.1 Apr 18, 2021
@reckart reckart modified the milestones: 0.19.4, 0.19.5 May 23, 2021
@reckart reckart modified the milestones: 0.19.5, 0.19.6 Jun 5, 2021
@reckart reckart moved this from 🔖 To do to 📥 Inbox in Kanban Jun 5, 2021
@reckart reckart modified the milestones: 0.19.6, 0.19.7, 0.19.8 Jun 8, 2021
@reckart reckart closed this as completed Apr 17, 2022
Kanban automation moved this from 🔖 To do to 🍹 Done Apr 17, 2022
@GiantEnemyCrab
Contributor

@reckart

Thank you so much for the notice on this!
I have tested and I need help on how to make this work, because when I "re-merge", I am still not getting the agreed stacked tags from user1 and user2 in the simple project that is exported:

stacked_tags_proj15278849064284370577.zip

I used threshold merge but I must not be setting things correctly?

@reckart reckart reopened this Apr 19, 2022
Kanban automation moved this from 🍹 Done to 🏃‍♀️ In progress Apr 19, 2022
@reckart
Member

reckart commented Apr 19, 2022

What settings fail for you?

These seem to work for me on your project:

Screenshot 2022-04-19 at 11 48 44

Screenshot 2022-04-19 at 11 49 08

@GiantEnemyCrab
Contributor

Thank you so much for the guidance!
I was confused about Top-voted and I was setting that to 1.
After setting that to 2, it worked!

I am going to try more complex cases next and will come back here to post a comment again to report any findings within three days.

@reckart
Member

reckart commented Apr 21, 2022

I have written a bit of documentation here: https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/view/INCEpTION/job/INCEpTION%20main/de.tudarmstadt.ukp.inception.app$inception-app-webapp/doclinks/1/#_merge_incomplete_agreeing_non_stacked_annotations

Note in particular the current section on the threshold based strategy:

Top-voted: when set to 1, only the single best label is pre-merged. If there is a tie on the best label, then nothing is merged. When set to 2 or higher, the respective n best labels are pre-merged. If there is any tie within the n best labels, then all labels that still meet the lowest score of the tie are merged as well. For example, if set to 2 and three annotators voted for label X and another two annotators voted for Y and Z respectively, then Y and Z have a tie at the second rank, so both of them are merged. Note that this setting only affects annotations on layers that allow stacking annotations. For other layers, an implicit setting of 1 is used here.
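
Read literally, the Top-voted rule can be sketched in a few lines of Python. This is only an interpretation of the paragraph above, not the actual implementation: `top_voted` is a hypothetical function, votes are modelled as a flat list of labels for one position, and the threshold part of the strategy is not modelled at all.

```python
from collections import Counter


def top_voted(votes, n):
    """Pick the labels to pre-merge at one position from a list of label votes."""
    counts = Counter(votes)
    scores = sorted(counts.values(), reverse=True)
    if not scores:
        return set()
    if n == 1:
        # Strict handling: a tie on the best label means nothing is merged.
        if len(scores) > 1 and scores[0] == scores[1]:
            return set()
        return {label for label, c in counts.items() if c == scores[0]}
    # Lax handling: the score at rank n is the cutoff, and every label that
    # still meets this score is merged, so ties at the boundary are included.
    cutoff = scores[min(n, len(scores)) - 1]
    return {label for label, c in counts.items() if c >= cutoff}


votes = ["X", "X", "X", "Y", "Z"]
two_best = top_voted(votes, 2)   # {"X", "Y", "Z"}: Y and Z tie at rank 2
one_best = top_voted(votes, 1)   # {"X"}
tied_best = top_voted(["X", "Y"], 1)  # set(): tie on the best label
```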

The tricky thing I think is dealing with the ties. If you say "top results 3", does it mean you get only 3 different labels merged? What if there is a tie between the top 4 or 5 labels? Do we include them or exclude them? What if we say top-results 3 and there is a tie between the first two labels? Do we exclude them but include the third one which does not have a tie? In the previous implementations where only one label was allowed to be merged, ties were never merged. I have tried to preserve that when top-voted is set to 1, but to go for a laxer handling of ties when top-voted is 2 or greater.

@GiantEnemyCrab
Contributor

I think both approaches are fine and it depends on the use case. It might be yet another config option, but something like a checkbox for "allow top N ties to be merged" could work. If it is not checked, ties are not allowed and only the specified N labels are merged, with a tie broken either randomly or by picking the label name that comes first alphabetically.
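
A small Python sketch of what such a switch could mean, assuming the alphabetical tie-break rather than the random one. `top_n_labels` and its `allow_ties` parameter are hypothetical names for this suggestion and do not correspond to an existing setting.

```python
from collections import Counter


def top_n_labels(votes, n, allow_ties):
    """Return up to n labels by vote count; optionally extend by boundary ties."""
    counts = Counter(votes)
    # Most votes first; alphabetical order breaks ties deterministically.
    ranked = sorted(counts, key=lambda label: (-counts[label], label))
    selected = ranked[:n]
    if allow_ties and selected:
        # Also keep every label that matches the score of the last selected one.
        cutoff = counts[selected[-1]]
        selected = [label for label in ranked if counts[label] >= cutoff]
    return selected


votes = ["X", "X", "X", "Z", "Y"]
without_ties = top_n_labels(votes, 2, allow_ties=False)  # ["X", "Y"]
with_ties = top_n_labels(votes, 2, allow_ties=True)      # ["X", "Y", "Z"]
```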

By the way, what do you think of a case like the screenshot below:
image

Two lorem tags are stacked on the same span with same slot links between user1 and user2. But in curation, it is merged as a single lorem tag. This might be a complex case.

@reckart
Member

reckart commented Apr 22, 2022

Well, first, let's consider that there is no link - how can you tell which lorem tag is which? The algorithm sees that both users have assigned lorem at least once - it conflates these since no difference between these spans can be seen and merges them up.

As a second step, it looks at the slots. It finds the merged lorem tag and the targets to it.
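
The conflation in the first step can be pictured with a toy Python example. It assumes an annotation is identified only by its position and label, which is a simplification; `distinct_configurations` is a hypothetical helper and links are left out.

```python
def distinct_configurations(annotations):
    """Collapse annotations that share position and label into one entry."""
    return {(begin, end, label) for begin, end, label in annotations}


# Both users stacked two "lorem" tags on the same span.
user1 = [(0, 5, "lorem"), (0, 5, "lorem")]
user2 = [(0, 5, "lorem"), (0, 5, "lorem")]

# Nothing distinguishes the two tags, so each user contributes one entry
# and the agreed result contains a single "lorem".
merged = distinct_configurations(user1) & distinct_configurations(user2)
# merged == {(0, 5, "lorem")}
```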

That said, I'm surprised that one of the "lorem" tags in each of the users is black? I assume they are all from the same layer? Black is normally not a color that should be assigned, so something fishy seems to be happening here...

@GiantEnemyCrab
Contributor

Stacking the same tag is really complex. Perhaps, as long as a triple of parent tag, link/relation, and child tag matches among annotators, it could be merged into curation. However, this can be ignored for now.

I've attached the exported project here in case you are curious to see what might be happening:
same_tag_stacked_example2715221680151684197.zip

@reckart reckart modified the milestones: 24.0, 24.1 Jun 20, 2022
@reckart reckart moved this from 🏃‍♀️ In progress to 🔖 To do in Kanban Jul 9, 2022
@reckart reckart modified the milestones: 24.1, 24.2, 24.3 Aug 22, 2022
@reckart reckart modified the milestones: 24.3, 24.4 Sep 10, 2022
@reckart reckart modified the milestones: 24.4, v25.1 Sep 30, 2022
@reckart reckart modified the milestones: 25.1, 25.2 Oct 11, 2022
@reckart reckart modified the milestones: 25.2, 25.3 Oct 18, 2022
@reckart reckart modified the milestones: 25.3, 25.4, 🦟 Bug backlog Nov 1, 2022