Adding support for truncation parameter on feature-extraction pipeline. #14193
Can you also include |
Padding for pipelines is something I would like to keep orthogonal to business logic (see #13724). It's more about batching than padding, but I imagine you pad mostly for the batch.
Only slightly, I think. Right now you get embeddings of varying size depending on the length of your input sequence. If you want to use these embeddings in a downstream task, it's weird for them to vary in size. In my case, I think I'm going to use only the embedding corresponding to the CLS token, so I'm good.
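To make the varying-size point concrete, here is a minimal sketch in plain Python. It does not use the real pipeline: `toy_tokenize`, `toy_embed`, and `HIDDEN_SIZE` are hypothetical stand-ins invented for illustration. It only shows that per-token embeddings grow with input length, that taking the first (CLS) vector gives a fixed-size result, and that truncation caps the number of token vectors.

```python
from typing import List, Optional

HIDDEN_SIZE = 4  # hypothetical embedding width, kept small for readability


def toy_tokenize(text: str, max_length: Optional[int] = None) -> List[str]:
    """Hypothetical whitespace tokenizer that prepends a [CLS] marker.

    If max_length is given, the token list is cut to that many tokens,
    which is the effect a truncation option is meant to have.
    """
    tokens = ["[CLS]"] + text.split()
    if max_length is not None:
        tokens = tokens[:max_length]
    return tokens


def toy_embed(tokens: List[str]) -> List[List[float]]:
    """Stand-in for a model: returns one HIDDEN_SIZE-wide vector per token."""
    return [[float(len(tok))] * HIDDEN_SIZE for tok in tokens]


short = toy_embed(toy_tokenize("hello world"))
long = toy_embed(toy_tokenize("a much longer input sentence than the first"))

# The number of token vectors follows the input length.
print(len(short), len(long))

# Keeping only the first ([CLS]) vector gives a fixed-size embedding.
cls_short, cls_long = short[0], long[0]
print(len(cls_short), len(cls_long))

# Truncation bounds the number of token vectors regardless of input length.
truncated = toy_embed(
    toy_tokenize("a much longer input sentence than the first", max_length=5)
)
print(len(truncated))
```

With a real model the same idea applies: the sequence dimension varies with the input, while the hidden dimension does not.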
@Narsil Note that padding is supported in other pipelines. As a user, I find it maddening to have varied behavior depending on which pipeline you use. I personally think that the lack of consistency across pipelines is problematic. Take a look at this pipeline code; note there is logic around padding:
You're 100% correct; that's part of the reason for the large rewrite that is happening. For instance, the rewrite enables you to write either [...]
If anything, dropping padding from the code you're quoting would be the way to go (at least deprecating it first, since we have to maintain compatibility as much as possible). This code is currently legacy and should be rewritten sometime in the future.
The thing is, there are a couple of directions to be considered for padding, which is like batching: support for it was very uneven across pipelines. We're closing the gap, but it takes time, and backward compatibility is important.
The core idea is to get orthogonal behavior wherever possible. So, as much as possible, individual pipelines should NOT handle padding or batching; all of this logic should be enabled in the parent class. Not all models are even capable of padding ([...]). Truncation, for instance, is not orthogonal, since [...] That's also the reason why adding new parameters is something we try to think about before jumping into it.
Hope this clears up a bit of what's going on.
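The "parent class handles padding" idea can be sketched roughly as follows. This is not the library's actual class hierarchy: `BasePipeline`, `WordLengthPipeline`, `pad_batch`, and `PAD_ID` are hypothetical names used only to illustrate keeping padding and batching out of the individual pipelines.

```python
from typing import List

PAD_ID = 0  # hypothetical padding token id


class BasePipeline:
    """Hypothetical parent class that owns batching and padding."""

    def encode(self, text: str) -> List[int]:
        raise NotImplementedError

    def postprocess(self, ids: List[int]) -> int:
        raise NotImplementedError

    def pad_batch(self, batch: List[List[int]]) -> List[List[int]]:
        # Pad every row on the right to the length of the longest row.
        width = max(len(ids) for ids in batch)
        return [ids + [PAD_ID] * (width - len(ids)) for ids in batch]

    def __call__(self, texts: List[str]) -> List[int]:
        encoded = [self.encode(t) for t in texts]
        lengths = [len(ids) for ids in encoded]
        padded = self.pad_batch(encoded)  # rectangular batch for a model
        # Strip the padding again so subclasses never see it.
        return [self.postprocess(row[:n]) for row, n in zip(padded, lengths)]


class WordLengthPipeline(BasePipeline):
    """Toy subclass: only business logic, no padding or batching code."""

    def encode(self, text: str) -> List[int]:
        return [len(word) for word in text.split()]

    def postprocess(self, ids: List[int]) -> int:
        return sum(ids)


pipe = WordLengthPipeline()
print(pipe.pad_batch([[5, 5], [1]]))
print(pipe(["hello world", "a"]))
```

The subclass never mentions padding, so changing the padding strategy would touch only the parent class. Truncation is harder to factor out this way because it changes which tokens the subclass actually receives.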
Looks good to me, thank you for taking care of it @Narsil, and thank you for the discussion @ioana-blue!
1a04b27 to 3da7ba3
What does this PR do?
Fixes #14183
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.