Merge highlighted text when neighbors #2767
In the previous implementation, dictionary inputs were not processed correctly: a None token was added for the empty string between each pair of adjacent entities. This PR removes that None entry, letting neighbouring entities be merged.
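To make the problem concrete, here is a minimal standalone sketch of the old dictionary-to-list conversion, reconstructed from the diff hunk quoted later in this conversation. It does not import gradio; the function name `dict_to_pairs_old`, the `"entity"` label key, and the way "Wolfgang" is split into two entities are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, Optional[str]]


def dict_to_pairs_old(text: str, entities: List[Dict]) -> List[Pair]:
    """Hypothetical re-creation of the pre-patch conversion (not gradio's API)."""
    pairs: List[Pair] = []
    index = 0
    for entity in sorted(entities, key=lambda e: e["start"]):
        # Always emits the gap before the entity, even when the gap is empty.
        pairs.append((text[index:entity["start"]], None))
        pairs.append((text[entity["start"]:entity["end"]], entity["entity"]))
        index = entity["end"]
    pairs.append((text[index:], None))
    return pairs


text = "Wolfgang lives in Berlin"
# Assumed split: a word-piece tokenizer might return one name as two entities.
entities = [
    {"entity": "PER", "start": 0, "end": 4},
    {"entity": "PER", "start": 4, "end": 8},
    {"entity": "LOC", "start": 18, "end": 24},
]
pairs = dict_to_pairs_old(text, entities)
for pair in pairs:
    print(pair)
```

The `('', None)` pair between `('Wolf', 'PER')` and `('gang', 'PER')` means the two halves are no longer direct neighbours in the list, so a merge step that only looks at consecutive entries never joins them.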
@@ -1449,6 +1449,16 @@ def test_postprocess(self):
result_ = component.postprocess({"text": text, "entities": entities})
assert result == result_

# Test split entity is merged
text = "Wolfgang lives in Berlin"
Thanks for the PR @payoto !
This test is no longer passing with your change. Can you fix?
Yes I can. FYI, the fix I can see would be to edit the reference result, which includes some of the ('', None) entries. Is that an OK fix?
Entries are still only merged when the user passes the specific argument.
This change does alter the behaviour of the postprocess method. An alternative way to fix this bug, which would not change the default behaviour, would be to ignore entries which are ('', None) when merging neighbours. Which solution do you prefer?
Personally I think the first one is the way to go but you certainly have more context as to how these features might be used than I do.
OK, in the end I went with the option that changes behaviour the least: the merge strategy simply does not consider entries that are ('', None).
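A rough sketch of that strategy, assuming the list format is a sequence of (text, label) tuples: empty unlabeled pairs are skipped, and consecutive pairs with the same label are concatenated. The helper name `merge_adjacent` is hypothetical, and this is not the code merged into gradio.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str]]


def merge_adjacent(pairs: List[Pair]) -> List[Pair]:
    """Merge neighbouring pairs that share a label, ignoring ('', None) entries."""
    merged: List[Pair] = []
    running_text: Optional[str] = None
    running_label: Optional[str] = None
    for text, label in pairs:
        if text == "" and label is None:
            continue  # empty filler between entities: not a real neighbour
        if running_text is None:
            running_text, running_label = text, label
        elif label == running_label:
            running_text += text  # same label as the previous pair: extend it
        else:
            merged.append((running_text, running_label))
            running_text, running_label = text, label
    if running_text is not None:
        merged.append((running_text, running_label))
    return merged


# Illustrative input: a name split in two with an empty filler in between.
split_pairs = [
    ("", None),
    ("Wolf", "PER"),
    ("", None),
    ("gang", "PER"),
    (" lives in ", None),
    ("Berlin", "LOC"),
    ("", None),
]
merged_pairs = merge_adjacent(split_pairs)
print(merged_pairs)
# [('Wolfgang', 'PER'), (' lives in ', None), ('Berlin', 'LOC')]
```

Because the filler entries are only skipped inside the merge, output produced without merging keeps its existing shape.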
gradio/components.py
Outdated
@@ -3257,7 +3257,8 @@ def postprocess(
index = 0
entities = sorted(entities, key=lambda x: x["start"])
for entity in entities:
list_format.append((text[index : entity["start"]], None))
if index != entity["start"]:
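For comparison, the guard in this hunk can be sketched as a standalone function. Only the `if index != entity["start"]` check comes from the diff; the rest of the loop, the `"entity"` label key, the unguarded trailing append, and the name `dict_to_pairs_guarded` are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, Optional[str]]


def dict_to_pairs_guarded(text: str, entities: List[Dict]) -> List[Pair]:
    """Hypothetical conversion that skips empty gaps between entities."""
    pairs: List[Pair] = []
    index = 0
    for entity in sorted(entities, key=lambda e: e["start"]):
        if index != entity["start"]:
            # Only emit the unlabeled gap when it actually contains text.
            pairs.append((text[index:entity["start"]], None))
        pairs.append((text[entity["start"]:entity["end"]], entity["entity"]))
        index = entity["end"]
    pairs.append((text[index:], None))  # trailing text is left unguarded here
    return pairs


guarded_pairs = dict_to_pairs_guarded(
    "Wolfgang lives in Berlin",
    [
        {"entity": "PER", "start": 0, "end": 4},
        {"entity": "PER", "start": 4, "end": 8},
        {"entity": "LOC", "start": 18, "end": 24},
    ],
)
print(guarded_pairs)
```

With the guard, the two "PER" pieces become direct neighbours, but the returned list changes shape even when no merging is requested, which is the behaviour change discussed in this thread.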
Just to be clear: users would still need to set combine_adjacent=True, right?
Yes, that's correct
Thanks so much for the PR @payoto! So that we can test it out, can you provide an example of an input that was previously being processed incorrectly?
Merging on dictionary inputs would simply not work:

from transformers import pipeline
import gradio as gr

nlp = pipeline("ner", model="dslim/bert-base-NER")
example = "My name is Sylvain and I live in Berlin"
ner_results = nlp(example)
print(ner_results)

gr.Interface(
    lambda p: dict(text=p, entities=nlp(p)),
    inputs=gr.Textbox(),
    outputs=gr.HighlightedText(combine_adjacent=True),
    examples=[example],
).launch()

[Screenshots: raw input; output before the patch; output after the patch]
Thanks for making the PR @payoto! Was thinking through this PR, and while this does fix the bug, it treats the [...] Let me know what you think of the suggestion. I also added some suggestions for the tests. If this looks good, please go ahead and make the suggestions, and happy to review again. cc @freddyaboulton
Co-authored-by: Abubakar Abid <abubakar@huggingface.co>
This looks good to me @payoto! Thank you for addressing the changes, and I can confirm it now properly merges adjacent entries. Please fix the linting (bash scripts/format_backend.sh) and I will merge 🚀
Merging, thank you so much @payoto for this fix.