fix: prevent IndexError in Whisper word timestamp decode by guoyangzhen · Pull Request #44885 · huggingface/transformers

guoyangzhen · 2026-03-20T13:03:54Z

Problem

In _split_tokens_on_unicode(), when the decoded token stream ends with a dangling Unicode replacement character (U+FFFD), the computed index can equal len(decoded_full), causing IndexError: string index out of range.

The failing line:

decoded_full[unicode_offset + decoded.index(replacement_char)] == replacement_char

When unicode_offset + decoded.index(replacement_char) >= len(decoded_full), this crashes.
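The failure condition can be reproduced in isolation. The sketch below does not call into transformers. The string values and the offset are hypothetical, chosen only so that the computed index lands exactly at len(decoded_full).

```python
REPLACEMENT_CHAR = "\ufffd"

# Hypothetical values (not taken from the PR), picked to hit the edge case.
decoded_full = "hello"      # full decode of the token stream, length 5
decoded = "\ufffd"          # partial decode that ends in a dangling U+FFFD
unicode_offset = 5          # offset already sits at the end of decoded_full

# Same arithmetic as the failing line: 5 + 0 = 5, which equals len(decoded_full).
index = unicode_offset + decoded.index(REPLACEMENT_CHAR)

try:
    decoded_full[index]     # unguarded lookup, as in the original condition
    raised = False
except IndexError:
    raised = True

print(index, len(decoded_full), raised)
```

Any index at or past the end of the string fails the same way, which is why the guard has to come before the lookup rather than after it.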

Fix

Add a bounds check before indexing into decoded_full:

if (
    replacement_char not in decoded
    or (unicode_offset + decoded.index(replacement_char) < len(decoded_full)
        and decoded_full[unicode_offset + decoded.index(replacement_char)] == replacement_char)
):

This ensures we only access decoded_full when the index is valid.
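As a rough illustration of the guarded condition, the same logic can be written as a standalone predicate. The helper name is_complete_chunk is hypothetical and does not appear in the PR or in transformers. It computes the index once and relies on short-circuit evaluation so the lookup only runs when the index is in range.

```python
REPLACEMENT_CHAR = "\ufffd"


def is_complete_chunk(decoded: str, decoded_full: str, unicode_offset: int) -> bool:
    """Hypothetical helper mirroring the guarded condition from the PR."""
    if REPLACEMENT_CHAR not in decoded:
        # No replacement character: nothing to compare against decoded_full.
        return True
    index = unicode_offset + decoded.index(REPLACEMENT_CHAR)
    # `and` short-circuits, so decoded_full[index] is only evaluated
    # when index is a valid position.
    return index < len(decoded_full) and decoded_full[index] == REPLACEMENT_CHAR


# Out-of-range index: returns False instead of raising IndexError.
print(is_complete_chunk("\ufffd", "hello", 5))
# Replacement character genuinely present at the same position in decoded_full.
print(is_complete_chunk("a\ufffd", "a\ufffdb", 0))
```

With this shape, an out-of-range index is treated the same as a mismatch: the condition is simply false, which matches the behavior of the condition shown in the PR description.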


github-actions · 2026-03-20T13:04:56Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: whisper

github-actions · 2026-03-20T13:17:17Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44885&sha=a326a1

Rocketknight1 · 2026-03-23T12:01:13Z

Your bot has created a duplicate PR at #44902 for the same issue. As we mentioned in CONTRIBUTING, we're being flooded with code agent PRs right now, so temporarily blocking you to cut down the notification spam

fix: prevent IndexError in Whisper word timestamp decode (huggingface…

a326a1c

…#44869)

Rocketknight1 closed this Mar 23, 2026

Rocketknight1 added the Code agent slop label Mar 23, 2026

guoyangzhen wants to merge 1 commit into huggingface:main from guoyangzhen:fix/whisper-timestamp-indexerror
