Add new parameter forced_decoder_ids to OpenAIWhisperParserLocal + small bug fix #8793

idcore · 2023-08-05T11:37:44Z

Description: Adds a new parameter, forced_decoder_ids, to OpenAIWhisperParserLocal to force the input language and optionally enable translate mode. Usage example:
from transformers import WhisperProcessor
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader
from langchain.document_loaders.parsers.audio import OpenAIWhisperParserLocal
processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe")
# forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="translate")  # alternative: translate the French audio into English
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParserLocal(lang_model="openai/whisper-medium", forced_decoder_ids=forced_decoder_ids))
Issue: Add option to directly set input language for OpenAIWhisperParserLocal #8792
Tag maintainer: @rlancemartin, @eyurtsev

hwchase17 · 2023-08-06T00:22:50Z

libs/langchain/langchain/document_loaders/parsers/audio.py

@@ -136,10 +156,19 @@ def __init__(self, device: str = "0", lang_model: Optional[str] = None):
        # load model for inference
        self.pipe = pipeline(
            "automatic-speech-recognition",
-            model="openai/whisper-medium",
+            model=self.lang_model,  # fix to use model name that was evaluated earlier


i would remove this comment, it loses meaning outside of this pr

Thank you. Fixed.

hwchase17 · 2023-08-06T00:23:49Z

libs/langchain/langchain/document_loaders/parsers/audio.py

            chunk_length_s=30,
            device=self.device,
        )
+        try:
+            if forced_decoder_ids is not None:


nit: i feel like its slightly nicer to do:

if ...:
    try:
        ...
    except:
        ...

dont care super strongly tho

Thank you. Fixed.
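
For readers following the nit above, here is a minimal sketch of the suggested shape: check the condition first, then wrap only the statement that can fail. The helper name apply_forced_decoder_ids and the SimpleNamespace stand-in config are hypothetical and used only for illustration; this is not the code merged in this PR.

```python
import logging
from types import SimpleNamespace
from typing import Any, List, Optional, Tuple

logger = logging.getLogger(__name__)


def apply_forced_decoder_ids(
    config: Any, forced_decoder_ids: Optional[List[Tuple[int, int]]]
) -> bool:
    """Hypothetical helper: set forced_decoder_ids on a config object.

    Returns True if the value was applied, False otherwise.
    """
    # Check the condition first, so the try block only wraps the
    # single statement that can actually fail.
    if forced_decoder_ids is not None:
        try:
            config.forced_decoder_ids = forced_decoder_ids
        except Exception as exc:
            # Keep the default decoder behaviour instead of failing.
            logger.info("Unable to set forced_decoder_ids: %s", exc)
            return False
        return True
    return False


# Stand-in for a model config (assumption: not a real transformers object).
config = SimpleNamespace(forced_decoder_ids=None)
applied = apply_forced_decoder_ids(config, [(1, 100), (2, 200)])
skipped = apply_forced_decoder_ids(config, None)
```

With the condition outside the try, a None value never enters the error-handling path, and the except clause can only be triggered by the assignment itself.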

hwchase17

lgtm! thanks

idcore · 2023-08-07T07:46:09Z

Minor fix: shortened one line because of a failed lint check. Will look into why the local lint didn't flag it for me.

idcore added 2 commits August 5, 2023 12:32

Added forced_decoder_ids parameter to WhisperLocalParser for specifiyng tasks (translate/transcribe)

aa14625

Fix formatting in OpenAIWhisperParserLocal

73ec173

dosubot bot added the Ɑ: doc loader and 🤖:improvement labels Aug 5, 2023

hwchase17 reviewed Aug 6, 2023

idcore and others added 2 commits August 6, 2023 14:08

Minor adjustments in OpenaiWhisperParserLocal

873b197

Merge branch 'langchain-ai:master' into add_forced_decoder_ids_to_whisper_local

33f783d

vercel bot deployed to Preview – langchain August 6, 2023 11:19 View deployment

hwchase17 approved these changes Aug 6, 2023

hwchase17 added the lgtm label Aug 6, 2023

idcore and others added 3 commits August 7, 2023 10:23

minor lint fix in OpenAIWhisperParserLocal

e2e8500

Merge of document_loaders/parsers/audio.py

d5a7cf9

Merge branch 'langchain-ai:master' into add_forced_decoder_ids_to_whisper_local

eb90210

vercel bot deployed to Preview – langchain August 7, 2023 07:52 View deployment

baskaryan merged commit fe78aff into langchain-ai:master Aug 7, 2023
22 checks passed

idcore deleted the add_forced_decoder_ids_to_whisper_local branch August 12, 2023 11:15
