Offsets incorrectly displaying in annotation #1008

Closed
Lolologist opened this issue Jun 2, 2021 · 22 comments
Labels
editor Label Studio Frontend ner problem bug or something isn't working

Comments

@Lolologist

image

image

In the attached pictures you can see the proper text for "Ecuador" as an entity, and the annotation tool showing it incorrectly.
My first guess is that it is related to the emoji being used, but if so, it isn't all emoji:
image
As you can see in the third image, offsets continue correctly in that particular case after being exposed to some emoji.

@Lolologist
Author

the text before the incorrect Ecuador:
"Network data from the NetBlocks internet observatory show that mobile network operator Claro has suffered severe outages beginning 20:00 UTC (3 p.m. local time), Saturday 12 October 2019, lasting for several hours. Confirmed: Severe disruption to #Claro mobile internet service across #Quito from ~20:00 UTC (3 p.m. local time) as #Ecuador crisis escalates 📉📰 https://t.co/2ExbFY54zq pic.twitter.com/6P13WjAkuW— NetBlocks (@NetBlocks) October 13, 2019The outages have been widely reported by customers and come as the Army has been deployed in "
I checked manually in the source for the task and the offsets are still correct there. I expect this is an issue with the visualization part only.

@Lolologist
Author

Manually removing the incorrect offset and adding what appears in the annotator as "Ecuador" (see picture) results in incorrect offsets, now off by 2 (as might be expected):
image

@Lolologist
Author

image
The offset for "crisis escalates" is correct, but "https" should start at 360, not 362.
It seems these specific emoji are being counted as two characters long.
Both of the two emoji here were introduced as part of Unicode 6.0 and documented as part of Emoji 1.0, so they aren't recent.
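
A minimal sketch of why these two emoji could each count as two characters, assuming (the thread does not confirm this) that the front end measures text in UTF-16 code units, as JavaScript strings do, while the source offsets count Unicode code points. The helper name `utf16_length` is hypothetical and is not part of Label Studio.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the unit JavaScript's String.length reports."""
    # Every UTF-16 code unit is two bytes in the little-endian encoding (no BOM).
    return len(text.encode("utf-16-le")) // 2


# U+1F4C9 (chart with downwards trend) and U+1F4F0 (newspaper), the two emoji in the tweet.
emoji = "\U0001F4C9\U0001F4F0"

code_points = len(emoji)            # Python counts code points
utf16_units = utf16_length(emoji)   # a UTF-16 based counter sees surrogate pairs
shift = utf16_units - code_points   # how far later offsets would drift

print(code_points, utf16_units, shift)
```

Under that assumption, every offset after the two emoji would be shifted by 2, which matches the 360 versus 362 discrepancy reported above.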

@makseq
Member

makseq commented Jun 2, 2021

Yes, these problems usually happen because of emoji and special characters.

@lluissalord

Is there a fix in progress to count emoji and special characters correctly, so that the offsets no longer shift?

@makseq
Member

makseq commented Sep 20, 2021

@lluissalord what LS version do you have now?

@lluissalord

@makseq I am using version 1.3

An example of what is happening to me is below, where "Jr" should be labelled as SENIORITY. The current result for this case is:

{
  "value": {
    "labels": [
      "SENIORITY"
    ],
    "start": 21,
    "end": 23,
    "text": "Jr"
  },
  "id": "45",
  "from_name": "label",
  "to_name": "text",
  "type": "labels"
}

image

So the value in the result is correct, but the visualization in Label Studio is not.

Besides, if I label it correctly in the UI, the "start" and "end" no longer match what you get by counting the characters:

{
  "value": {
    "start": 23,
    "end": 25,
    "text": "Jr",
    "labels": [
      "SENIORITY"
    ]
  },
  "id": "9StoKHXy__",
  "from_name": "label",
  "to_name": "text",
  "type": "labels"
}

@nicholasrq
Member

@lluissalord could you please share a sample text, your labeling config and a full result that's displaying incorrectly? it'd be super helpful for debugging the issue

@lluissalord

Sample text: "👋🏽 Hola, soy Roberto Jr"

Labelling config:

<View>
  <Labels name="label" toName="text">
    <Label value="DATE"/>
    <Label value="LANGUAGE"/>
    <Label value="SENIORITY"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>

Result:

"result": [
        {
          "value": {
            "labels": [
              "SENIORITY"
            ],
            "start": 21,
            "end": 23,
            "text": "Jr"
          },
          "id": "0",
          "from_name": "label",
          "to_name": "text",
          "type": "labels"
        }
      ],
      "was_cancelled": false,
      "ground_truth": false,
      "created_at": "2021-09-21T10:27:33.810925Z",
      "updated_at": "2021-09-21T10:27:33.810959Z",
      "lead_time": null,
      "task": 20075
    }
  ]
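
One way to check a result like this outside the UI is to slice the task text with the stored offsets and compare against the stored "text" field. This is only a sketch: `span_matches` is a hypothetical helper, and it assumes offsets are counted in Unicode code points, which is how Python indexes strings.

```python
def span_matches(text: str, value: dict) -> bool:
    """Return True if text[start:end] equals the span text stored in the result."""
    return text[value["start"]:value["end"]] == value["text"]


# Sample text from the thread; the emoji is U+1F44B (waving hand) + U+1F3FD (skin tone modifier).
text = "\U0001F44B\U0001F3FD Hola, soy Roberto Jr"

stored = {"labels": ["SENIORITY"], "start": 21, "end": 23, "text": "Jr"}
relabelled_in_ui = {"labels": ["SENIORITY"], "start": 23, "end": 25, "text": "Jr"}

print(span_matches(text, stored))            # offsets from the imported result
print(span_matches(text, relabelled_in_ui))  # offsets produced by labelling in the UI
```

With code-point indexing, the imported offsets (21, 23) select "Jr", while the offsets produced in the UI (23, 25) fall past the end of the string.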

@nicholasrq
Member

thank you! i'll get back with further investigation

@nicholasrq
Member

@lluissalord after checking your example we found that this is an emoji issue. we have trouble calculating the length of composite emoji that contain more than one Unicode character. we're currently working on a fix that will be released during the LS 1.3 lifecycle
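
For anyone who needs a workaround before a fix ships, here is a hedged sketch of converting an offset counted in UTF-16 code units into a code-point offset. It assumes the UI-produced offsets are UTF-16 based, which this thread does not confirm; `utf16_to_codepoint_offset` is a hypothetical helper, not the fix from any PR.

```python
def utf16_to_codepoint_offset(text: str, utf16_offset: int) -> int:
    """Map an offset in UTF-16 code units to an offset in Unicode code points."""
    units = 0
    for index, char in enumerate(text):
        if units >= utf16_offset:
            return index
        # Code points above U+FFFF need a surrogate pair, i.e. two UTF-16 units.
        units += 2 if ord(char) > 0xFFFF else 1
    # The offset points at (or past) the end of the text.
    return len(text)


# Sample text from the thread: U+1F44B + U+1F3FD, then plain ASCII.
text = "\U0001F44B\U0001F3FD Hola, soy Roberto Jr"

start = utf16_to_codepoint_offset(text, 23)
end = utf16_to_codepoint_offset(text, 25)
print(start, end, text[start:end])
```

On the sample text this maps the UI offsets (23, 25) back to (21, 23). An offset that lands in the middle of a surrogate pair is rounded up to the next code point.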

@hlomzik
Collaborator

hlomzik commented Oct 3, 2021

Hi all! @Lolologist @lluissalord
I fixed this some time ago, but forgot to update LS with this fix. Could you please test this PR — #1559? My manual and automatic tests are fine on it, so just want to check if it fixes your problems.
Thanks!

@lluissalord

Hi @hlomzik I can confirm that the PR fixed my use case. Thank you!!

@makseq makseq closed this as completed Oct 8, 2021
@makseq makseq added problem bug or something isn't working editor Label Studio Frontend ner labels Oct 8, 2021
@lluissalord

Hi @nicholasrq @makseq Do we know when the PR from @hlomzik (#1559) will be included in master?

@makseq
Member

makseq commented Oct 22, 2021

@lluissalord Hey, I think it's already in the master branch of LS.

@lluissalord

I tested on the new version 1.3.0post1 and it did not work, so I assumed it was not in master.

@makseq
Member

makseq commented Oct 26, 2021

@lluissalord Sorry, I was in a hurry. Could you check it again from the master?

@lluissalord

lluissalord commented Oct 26, 2021

@makseq It is working as expected on master. Thank you! 😄

@JulesBelveze

Currently having the same problem with v1.4, any idea when this fix is expected to be released?

@makseq
Member

makseq commented Jan 18, 2022

@JulesBelveze Are you on master branch from LS github repository?

@JulesBelveze

Nope, I'm on v1.4.0. I was just wondering if a patch release is expected soon.

@makseq
Member

makseq commented Jan 20, 2022

Yes, we are going to release 1.4.1 and include this patch too.

6 participants