_strip_incomplete_words cuts complete words #32

@azziko

Hello,

I saw this and wondered: wouldn't this cut complete words too? Assume no token falls within the AlignAtt threshold, and the hypothesis is:

['▁U', 'ser', '▁Inter', 'ac', 'tion', '.']

Then the whole word Interaction is cut, even though it was not within the frame_threshold:

selected_tokens = self._strip_incomplete_words(selected_tokens)

If this is not intended, I would put it like this:

        # Truncate tokens up to the first invalid alignment (if any)
        if len(invalid_tok_ids) > 0:
            selected_tokens = selected_tokens[:invalid_tok_ids[0]]
            if self.word_level_postprocess:
                selected_tokens = self._strip_incomplete_words(selected_tokens)

But maybe that's intended for models that output partial words.
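
To make the concern concrete, here is a minimal standalone sketch. It assumes that `_strip_incomplete_words` drops everything from the last token starting with `▁` onward; I have not verified that this matches the actual implementation. The functions `strip_incomplete_words` and `select_tokens` below are hypothetical stand-ins, not the repository's code.

```python
WORD_START = "▁"  # SentencePiece-style word-boundary marker


def strip_incomplete_words(tokens):
    """ASSUMED behaviour: drop the last word, i.e. everything from the
    last token that starts with the word-boundary marker onward."""
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i].startswith(WORD_START):
            return tokens[:i]
    return []  # no word boundary found, nothing can be kept safely


def select_tokens(tokens, invalid_tok_ids, word_level_postprocess=True):
    """Proposed flow: strip only when the hypothesis was actually truncated."""
    if len(invalid_tok_ids) > 0:
        tokens = tokens[:invalid_tok_ids[0]]
        if word_level_postprocess:
            tokens = strip_incomplete_words(tokens)
    return tokens


hyp = ['▁U', 'ser', '▁Inter', 'ac', 'tion', '.']

# Unconditional stripping removes the complete word "Interaction."
unconditional = strip_incomplete_words(hyp)

# Guarded stripping keeps the full hypothesis when nothing was truncated...
guarded_full = select_tokens(hyp, invalid_tok_ids=[])

# ...and still removes the partial word when truncation cuts mid-word.
guarded_cut = select_tokens(hyp, invalid_tok_ids=[4])
```

Under that assumption, the unconditional call returns only the tokens for "User", while the guarded version keeps the full hypothesis when there is no invalid alignment and strips the partial "Interac" only when truncation happens at index 4.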
