Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling tag with no prefix for aggregation_strategy in TokenClassificationPipeline #13325

Closed
jbpolle opened this issue Aug 29, 2021 · 9 comments · Fixed by #13493
Closed

Handling tag with no prefix for aggregation_strategy in TokenClassificationPipeline #13325

jbpolle opened this issue Aug 29, 2021 · 9 comments · Fixed by #13493

Comments

@jbpolle
Copy link

jbpolle commented Aug 29, 2021

🚀 Feature request

Previously the parameter grouped_entities would handle entity with no prefix (like "PER" instead of "B-PER") and would correctly group similar entities next to each others. With the new parameter aggregation_strategy, this is not the case anymore.

Motivation

In some simple models, the prefix add some complexity that is not always required. Because of this we are forced to add a prefix to make aggregation works even if not required by the model.

Your contribution

@LysandreJik
Copy link
Member

cc @Narsil

@Narsil
Copy link
Contributor

Narsil commented Aug 30, 2021

Hi @jbpolle what do you mean correctly ? We should not have changed behavior there, but indeed it's not part of the testing right now, so there might be some issues.

Could you provide a small script on an older transformers version that displays the intended behavior ?

@jbpolle
Copy link
Author

jbpolle commented Aug 30, 2021 via email

@jbpolle
Copy link
Author

jbpolle commented Aug 30, 2021

Adding missing screenshot in previous message:

PastedGraphic-1

@Narsil
Copy link
Contributor

Narsil commented Aug 31, 2021

I went back to 4.3.3 and I can see that the splitting was exactly the same. (no grouping when tags didn't include B-, I- ).

The fact that the cache wasn't probably cleaned on the widget is still an issue, clearing it.

@jbpolle
Copy link
Author

jbpolle commented Sep 3, 2021

I was working on 4.3.2 and here is how this was working:
image

But now in 4.9:
image

And even when playing with new aggregation_strategy parameters, I can't get previous results.
Anyway it's fixed in my case by adding the prefix so don't hesitate to close the ticket.

Thank you,

@Narsil
Copy link
Contributor

Narsil commented Sep 4, 2021

Ok, I must have tested it wrong before. I can confirm. This is indeed because the default for tags wasn't really explicited, but did behave as I-

Code was:

entity["entity"].split("-")[0] != "B"

Which would resolve to "PER" != "B" whereas now the default tag was explicitely set as B-:
https://github.com/huggingface/transformers/blob/master/src/transformers/pipelines/token_classification.py#L413

The fix would be easy but I am unsure about reverting this now that was merged 6th June.
Tagging a core maintainer for advice for how to handle this. @LysandreJik

We would need to run some numbers on the hub too, to get an idea of amount of affected repos.

@LysandreJik
Copy link
Member

I would fix it to behave the same as it was in v4.3.2 as this is the expected behavior when using grouped_entities

@Narsil
Copy link
Contributor

Narsil commented Sep 9, 2021

PR opened. #13493

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants