🚨🚨[Whisper Tok] Update integration test by sanchit-gandhi · Pull Request #29368 · huggingface/transformers

sanchit-gandhi · 2024-02-29T11:04:58Z

What does this PR do?

The merges for the Whisper tokenizers were updated on the Hub in this PR. While this is a breaking change, it is a required fix to ensure we have parity with the original OpenAI repo.

This PR updates the integration tests for the Whisper tokenizer to reflect the merge changes.
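
For readers who want to check parity on their own inputs, here is a minimal sketch of a comparison helper. It is not part of this PR: `find_mismatches` and the two stub encoders are hypothetical names. The stubs only replay the token IDs shown in the diff for this PR, so the block runs without either library installed. In practice you would pass the real `encode` callables instead.

```python
from typing import Callable, Iterable, List, Tuple

# An encoder is any callable that maps text to a list of token IDs.
Encoder = Callable[[str], List[int]]


def find_mismatches(
    reference: Encoder, candidate: Encoder, samples: Iterable[str]
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, reference_ids, candidate_ids) for every sample that differs."""
    mismatches = []
    for text in samples:
        expected = reference(text)
        actual = candidate(text)
        if expected != actual:
            mismatches.append((text, expected, actual))
    return mismatches


# Hypothetical stand-ins: they replay the IDs from the diff instead of
# loading real tokenizers.
def reference_stub(text: str) -> List[int]:
    return {"This is a test": [5723, 307, 257, 1500]}[text]


def outdated_stub(text: str) -> List[int]:
    return {"This is a test": [5723, 307, 257, 220, 31636]}[text]


samples = ["This is a test"]
print(find_mismatches(reference_stub, outdated_stub, samples))
print(find_mismatches(reference_stub, reference_stub, samples))  # prints []
```

An empty list means the two encoders agree on every sample.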

sanchit-gandhi · 2024-02-29T11:06:05Z

tests/models/whisper/test_tokenization_whisper.py

        self.assertListEqual(
            tokenizer.convert_tokens_to_ids(tokens),
-            [5723, 307, 257, 220, 31636],
+            [5723, 307, 257, 1500],


This now gives equivalent results to the original:

    from whisper.tokenizer import get_tokenizer

    tokenizer = get_tokenizer(True)
    tokens = tokenizer.encode("This is a test")
    print(tokens)

Print Output:

[5723, 307, 257, 1500]
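
To illustrate why a change to the merges can turn two tokens into one, here is a toy sketch of rank-based BPE merging. Everything in it is an assumption made for illustration: the merge tables are made up and far smaller than the real Whisper merges, `apply_merges` is a hypothetical helper, and `Ġ` stands for a leading space as in byte-level BPE vocabularies. It is not the actual tokenizer implementation.

```python
from typing import List, Tuple


def apply_merges(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Greedily apply BPE merges to one word, lowest-rank (earliest) merge first."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda pair: ranks.get(pair, float("inf")))
        if best not in ranks:
            break  # no known merge applies any more
        merged = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Made-up merge tables: the second one adds a merge for the space-prefixed word.
OLD_MERGES = [("t", "e"), ("s", "t"), ("te", "st")]
NEW_MERGES = OLD_MERGES + [("Ġ", "test")]

print(apply_merges("Ġtest", OLD_MERGES))  # ['Ġ', 'test']
print(apply_merges("Ġtest", NEW_MERGES))  # ['Ġtest']
```

With the shorter table the leading space stays a separate symbol, mirroring the two-ID tail in the old expected list. With the extra merge the word collapses to a single symbol, mirroring the single ID in the new one.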

sanchit-gandhi · 2024-02-29T11:06:36Z

tests/models/whisper/test_tokenization_whisper.py

        self.assertEqual(output, [])
-
-    @require_jinja
-    def test_tokenization_for_chat(self):


Chat template doesn't make sense for Whisper (a speech recognition model) - have removed the test to keep the CI lightweight (cc @Rocketknight1)

Rocketknight1 · 2024-02-29

Fine with me!

sanchit-gandhi · 2024-02-29T11:25:45Z

Also cc @ydshieh as this PR will prevent a red CI on main

HuggingFaceDocBuilderDev · 2024-02-29T11:35:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Thanks for the prompt fix. It's breaking, so I'll probably update the PR title with ⚠️

sanchit-gandhi · 2024-03-01T09:22:28Z

The GH PR itself is not strictly breaking (there's no change to the code), but rather it's the Hub PR which is breaking. Fine for me to leave the 🚨 in the title though to book-log this!

sanchit-gandhi added 2 commits February 29, 2024 11:02

[Whisper Tok] Update integration test

f82c997

make style

196f072

sanchit-gandhi requested a review from ArthurZucker February 29, 2024 11:07

ArthurZucker approved these changes Mar 1, 2024

ArthurZucker changed the title ~~[Whisper Tok] Update integration test~~ 🚨🚨[Whisper Tok] Update integration test Mar 1, 2024

sanchit-gandhi merged commit 0a0a279 into huggingface:main Mar 1, 2024

sanchit-gandhi deleted the whisper-tokenizer branch March 1, 2024 09:22
