Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OnlyFirst and OnlySecond truncation strategies #108

Closed
Pierrci opened this issue Jan 28, 2020 · 1 comment · Fixed by #112
Closed

OnlyFirst and OnlySecond truncation strategies #108

Pierrci opened this issue Jan 28, 2020 · 1 comment · Fixed by #112
Labels
bug Something isn't working

Comments

@Pierrci
Copy link
Member

Pierrci commented Jan 28, 2020

The current behavior for OnlyFirst and OnlySecond truncation strategies is not the one I would expect, and diverges from the current behavior in transformers:

TruncationStrategy::OnlyFirst | TruncationStrategy::OnlySecond => {
let target = if params.strategy == TruncationStrategy::OnlyFirst {
Ok(&mut encoding)
} else if let Some(encoding) = pair_encoding.as_mut() {
Ok(encoding)
} else {
Err(Box::new(Error::SecondSequenceNotProvided))
}?;
if target.get_ids().len() > params.max_length {
target.truncate(params.max_length, params.stride);
}
}

It currently takes only the first encoding (OnlyFirst) or the second one (OnlySecond), and then truncates it to make its length less than the desired max_length.

But this doesn't guarantee that the combined encodings have a length inferior to max_length, which is the behavior I was expecting: those strategies should take into account the combined encodings length when truncating only the first or second one.

What do you think @n1t0 @mfuntowicz?

@Pierrci Pierrci added the bug Something isn't working label Jan 28, 2020
@mfuntowicz
Copy link
Member

I agree that I would also expect, in both case, the final - combined - length should be <= max_len, otherwise it sounds weird.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants