Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge highlighted text when neighbors #2767

Merged
merged 13 commits into from
Dec 10, 2022

Conversation

payoto
Copy link
Contributor

@payoto payoto commented Dec 7, 2022

In the previous implementation dictionary inputs were not correctly
processed meaning: a None token was added on the empty string between
each entity. That None entry is removed letting neighbouring entities
be merged.

Description

Please include:

  • relevant motivation
  • a summary of the change
  • which issue is fixed.
  • any additional dependencies that are required for this change.

Closes: # (issue)

Checklist:

  • I have performed a self-review of my own code
  • I have added a short summary of my change to the CHANGELOG.md
  • My code follows the style guidelines of this project
  • I have commented my code in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

A note about the CHANGELOG

Hello 👋 and thank you for contributing to Gradio!

All pull requests must update the change log located in CHANGELOG.md, unless the pull request is labeled with the "no-changelog-update" label.

Please add a brief summary of the change to the Upcoming Release > Full Changelog section of the CHANGELOG.md file and include
a link to the PR (formatted in markdown) and a link to your github profile (if you like). For example, "* Added a cool new feature by [@myusername](link-to-your-github-profile) in [PR 11111](https://github.com/gradio-app/gradio/pull/11111)".

If you would like to elaborate on your change further, feel free to include a longer explanation in the other sections.
If you would like an image/gif/video showcasing your feature, it may be best to edit the CHANGELOG file using the
GitHub web UI since that lets you upload files directly via drag-and-drop.

In the previous implementation dictionary inputs were not correctly
processed meaning: a None token was added on the empty string between
each entity. That None entry is removed letting neighbouring entities
be merged.
@payoto payoto marked this pull request as ready for review December 7, 2022 00:48
@@ -1449,6 +1449,16 @@ def test_postprocess(self):
result_ = component.postprocess({"text": text, "entities": entities})
assert result == result_

# Test split entity is merged
text = "Wolfgang lives in Berlin"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @payoto !

This test is no longer passing with your change. Can you fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I can, FYI the fix I can see would be to edit the reference result which includes some of the ('', None), is that an ok fix?

Entries are still only merged when the user passes the specific argument.

This change does include a change in the behaviour of the postprocess method. An alternative way to fix this bug which would not change the default behavior would be to ignore entries which are ('', None) when merging neighbors. Which solution do you prefer?

Personally I think the first one is the way to go but you certainly have more context as to how these features might be used than I do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok in the end I went with the change that creates least changes of behaviour, so I do the merge by not considering the entries which say ('', None) in the merge strategy

@@ -3257,7 +3257,8 @@ def postprocess(
index = 0
entities = sorted(entities, key=lambda x: x["start"])
for entity in entities:
list_format.append((text[index : entity["start"]], None))
if index != entity["start"]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear - users would still need to set combine_adjacent=True right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct

@abidlabs
Copy link
Member

abidlabs commented Dec 9, 2022

Thanks so much for the PR @payoto! So that we can test it out, can you provide an example of an example that was being processed incorrectly previously?

@payoto
Copy link
Contributor Author

payoto commented Dec 9, 2022

Merging on dictionary inputs would simply not work:

from transformers import pipeline
import gradio as gr

nlp = pipeline("ner", model="dslim/bert-base-NER")
example = "My name is Sylvain and I live in Berlin"

ner_results = nlp(example)
print(ner_results)

gr.Interface(
    lambda p: dict(text=p, entities=nlp(p)),
    inputs=gr.Textbox(),
    outputs=gr.HighlightedText(combine_adjacent=True,),
    examples=[example],
).launch()

Raw input

[{'entity': 'B-PER', 'score': 0.9987564, 'index': 4, 'word': 'S', 'start': 11, 'end': 12}, {'entity': 'B-PER', 'score': 0.93149716, 'index': 5, 'word': '##yl', 'start': 12, 'end': 14}, {'entity': 'B-PER', 'score': 0.79625636, 'index': 6, 'word': '##va', 'start': 14, 'end': 16}, {'entity': 'B-PER', 'score': 0.8565875, 'index': 7, 'word': '##in', 'start': 16, 'end': 18}, {'entity': 'B-LOC', 'score': 0.9995517, 'index': 12, 'word': 'Berlin', 'start': 33, 'end': 39}]

Before the patch:

image

After the patch:

image

test/test_components.py Outdated Show resolved Hide resolved
test/test_components.py Outdated Show resolved Hide resolved
gradio/components.py Outdated Show resolved Hide resolved
@abidlabs
Copy link
Member

abidlabs commented Dec 9, 2022

Thanks for making the PR @payoto! Was thinking through this PR and while this does fix the bug, it treats the None label differently than the labels for the other entities. It would be better if we could treat all of the entities the same, and I think we can do that by changing the condition from elif text == "" and category is None: to elif text == "" (or the more Pythonic elif not text).

Let me know what you think of the suggestion. I also added some suggestions for the tests.

If this looks good, please go ahead and make the suggestions, and happy to review again. cc @freddyaboulton

payoto and others added 2 commits December 9, 2022 09:53
Co-authored-by: Abubakar Abid <abubakar@huggingface.co>
Copy link
Collaborator

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me @payoto ! Thank you for addressing the changes and I can confirm it now properly merges adjacent entries. Please fix the linting bash scripts/format_backend.sh and I will merge 🚀

This branch

image

Main

image

@abidlabs
Copy link
Member

Merging, thank you so much @payoto for this fix.

@abidlabs abidlabs merged commit c70f8fe into gradio-app:main Dec 10, 2022
@payoto payoto deleted the fix-highlighted-text-merging branch December 10, 2022 11:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants