Hello,
I saw this and wondered: wouldn't this also cut complete words? Assume no token falls within the AlignAtt threshold and the hypothesis is:
['▁U', 'ser', '▁Inter', 'ac', 'tion', '.']
Then the whole word "Interaction" is cut, even though it was not within the frame_threshold:
    selected_tokens = self._strip_incomplete_words(selected_tokens)
If this is not intended, I would put it like this:
    # Truncate tokens up to the first invalid alignment (if any)
    if len(invalid_tok_ids) > 0:
        selected_tokens = selected_tokens[:invalid_tok_ids[0]]
        # Only strip a trailing partial word when something was actually truncated
        if self.word_level_postprocess:
            selected_tokens = self._strip_incomplete_words(selected_tokens)

But maybe that's intended for models that output partial words.
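To make the difference concrete, here is a small standalone sketch of the two variants. It is not the project's code: `strip_incomplete_words` is a hypothetical stand-in that I assume drops everything from the last `▁` word-start token onward, and `select_always_strip` assumes the current code applies the stripping unconditionally, which is how I read it.

```python
from typing import List

WORD_START = "\u2581"  # SentencePiece word-boundary marker '▁'


def strip_incomplete_words(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for _strip_incomplete_words (assumed behavior):
    drop every token from the last word-start token onward."""
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i].startswith(WORD_START):
            return tokens[:i]
    return []


def select_always_strip(tokens: List[str], invalid_tok_ids: List[int],
                        word_level_postprocess: bool = True) -> List[str]:
    """Assumed current behavior: strip even when nothing was truncated."""
    if len(invalid_tok_ids) > 0:
        tokens = tokens[:invalid_tok_ids[0]]
    if word_level_postprocess:
        tokens = strip_incomplete_words(tokens)
    return tokens


def select_strip_after_truncation(tokens: List[str], invalid_tok_ids: List[int],
                                  word_level_postprocess: bool = True) -> List[str]:
    """Proposed behavior: strip only if an invalid alignment truncated the list."""
    if len(invalid_tok_ids) > 0:
        tokens = tokens[:invalid_tok_ids[0]]
        if word_level_postprocess:
            tokens = strip_incomplete_words(tokens)
    return tokens


hypothesis = ["▁U", "ser", "▁Inter", "ac", "tion", "."]

# No invalid alignments: the unconditional variant still drops "Interaction."
print(select_always_strip(hypothesis, []))
# The proposed variant keeps the full hypothesis
print(select_strip_after_truncation(hypothesis, []))
# Truncation inside "Interaction" (index 4 is hypothetical): both drop the partial word
print(select_strip_after_truncation(hypothesis, [4]))
```

With no invalid token ids, the first variant returns only the tokens for "User", while the second returns the hypothesis unchanged. Once a truncation lands in the middle of a word, both behave the same.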