add always_use_initial_prompt #1040

Open · wants to merge 2 commits into main
Conversation

mercury233

always_use_initial_prompt: bool
    If True, the initial_prompt will be used for all windows, and condition_on_previous_text
    will be ignored. Enabling this may make the text more consistent if the audio is long
    and the initial_prompt is set properly.
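A hedged usage sketch, assuming this PR is applied (always_use_initial_prompt exists only in this branch, not in upstream whisper; the file name and prompt text are illustrative):

    import whisper

    model = whisper.load_model("small")

    # Feed the same initial prompt to every 30-second window instead of the
    # rolling previous-text context (what this PR's option does).
    result = model.transcribe(
        "long_audio.mp3",
        initial_prompt="Glossary: Kubernetes, etcd, kubelet.",
        always_use_initial_prompt=True,
    )
    print(result["text"])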

@mercury233 force-pushed the patch-always-use-initial-prompt branch from 2f5b957 to bd54b68 on Mar 7, 2023
@ryanheise
Contributor

ryanheise commented Mar 7, 2023

I think some variation on this idea might help it to remember your prompting in long audio, but when a window boundary occurs mid-sentence, I think it's also important to have the previous text as the prompt.

As a compromise, have you thought about truncating the previous text at a sentence boundary and then prepending the initial prompt before that? It might be the best of both worlds.
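A rough sketch of that compromise (the helper name build_window_prompt is hypothetical; tokenizer.encode/decode mirror whisper's tokenizer interface):

    SENTENCE_ENDINGS = ".。!!??"

    def build_window_prompt(initial_prompt_tokens, previous_tokens, tokenizer):
        # Decode the previous window's output, keep only the text after the
        # last sentence boundary, and prepend the initial prompt to it.
        text = tokenizer.decode(previous_tokens)
        cut = max(text.rfind(c) for c in SENTENCE_ENDINGS)
        tail = text[cut + 1:] if cut >= 0 else text
        return initial_prompt_tokens + tokenizer.encode(tail)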

@mercury233
Author

> I think some variation on this idea might help it to remember your prompting in long audio, but when a window boundary occurs mid-sentence, I think it's also important to have the previous text as the prompt.
>
> As a compromise, have you thought about truncating the previous text at a sentence boundary and then prepending the initial prompt before that? It might be the best of both worlds.

I agree, but I don't know how to do that.

@ryanheise
Contributor

A really cheap modification might be to add a check here:

            if not condition_on_previous_text or result.temperature > 0.5:
                # do not feed the prompt tokens if a high temperature was used
                prompt_reset_since = len(all_tokens)

so that you also check whether your option is enabled and whether the latest token ends with one of the characters ".。!!??", effectively resetting the prompt after every sentence boundary. Then, when feeding the prompt:

            decode_options["prompt"] = all_tokens[prompt_reset_since:]

If your option is enabled, you could prepend the initial prompt here.
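A hedged sketch of both changes together, following the variable names in whisper/transcribe.py (the sentence-boundary check and the always_use_initial_prompt wiring are assumptions, not upstream code):

    # Decode just the newest token to see whether it ends a sentence.
    ends_sentence = tokenizer.decode(all_tokens[-1:]).strip().endswith(
        tuple(".。!!??")
    )

    # Reset the rolling context on high temperature, disabled conditioning,
    # or (with the new option) at every sentence boundary.
    if (
        not condition_on_previous_text
        or result.temperature > 0.5
        or (always_use_initial_prompt and ends_sentence)
    ):
        prompt_reset_since = len(all_tokens)

    # Feed the rolling context, with the initial prompt prepended.
    prompt = all_tokens[prompt_reset_since:]
    if always_use_initial_prompt:
        prompt = initial_prompt_tokens + prompt
    decode_options["prompt"] = prompt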

BUT, I think it might be more useful to parameterise how many previous sentences to include in the prompt. For that, the code would be a bit more complicated, but you could keep a FIFO buffer: e.g. to remember the last 3 sentences, keep a FIFO of size 3 containing the last 3 sentence-boundary positions, which you push under the same condition as in the first block of code above. The oldest sentence boundary gets popped out, so you never have more than the last 3 in there.
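A minimal sketch of that FIFO using collections.deque (the integration point in the transcription loop is assumed):

    from collections import deque

    MAX_SENTENCES = 3
    # Positions in all_tokens just after each sentence-ending token; the
    # deque's maxlen pops the oldest boundary automatically.
    sentence_boundaries = deque(maxlen=MAX_SENTENCES)

    # Inside the per-window loop, after extending all_tokens:
    if ends_sentence:  # same boundary check as in the sketch above
        sentence_boundaries.append(len(all_tokens))

    # Feed everything since the oldest remembered boundary, i.e. roughly
    # the last MAX_SENTENCES sentences of context.
    if sentence_boundaries:
        prompt_reset_since = max(prompt_reset_since, sentence_boundaries[0])
    decode_options["prompt"] = all_tokens[prompt_reset_since:]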

@umar009ali

I use Whisper for audio transcription and translation, but since February 25 it has not been transcribing and translating clearly. I would be happy if anybody could help me, please.

@radurevutchi

radurevutchi commented Mar 17, 2023

@ryanheise note that this code in decoding.py:594 truncates the list of all prompt tokens from the beginning, not the end. That means simply prepending, without checking the prompt window length, will not always work. The truncation size depends on the model config.

    tokens = (
        [self.tokenizer.sot_prev]
        + prompt_tokens[-(self.n_ctx // 2 - 1) :]
        + tokens
    )
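A quick illustration of that slice, assuming the models' text context of n_ctx = 448 (so at most n_ctx // 2 - 1 = 223 prompt tokens survive):

    n_ctx = 448
    prompt_tokens = list(range(1000))         # stand-in prompt, 1000 tokens
    kept = prompt_tokens[-(n_ctx // 2 - 1):]  # only the last 223 remain
    assert len(kept) == 223
    # A prepended initial prompt sits at the front of the combined prompt,
    # so it is the first thing dropped once the total exceeds 223 tokens.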

@Paxosman Paxosman left a comment

Check your writing on 22.

@FurkanGozukara

Testing this right now.

@FurkanGozukara

The output after this is just amazing.

I don't get why this is still not implemented.

@FurkanGozukara

@mercury233 It is hallucinating significantly after this change. Is there any way to prevent it? Other than that, it works great. Did you find a solution for the hallucination? I can use a very big beam size and best_of, but they didn't help.

@FurkanGozukara

> not condition_on_previous_text or result.temperature > 0.5

Can you share a modified file like this? I would like to test it; currently it is having problems.

@FurkanGozukara

Yes, with this approach it sometimes skips 30-second blocks. We need optimization. @mercury233 @ryanheise @radurevutchi

@mercury233
Author

> @mercury233 It is hallucinating significantly after this change. Is there any way to prevent it?

Sorry, I didn't find one.

@jonathanjfshaw

I have used the same basic idea of applying the initial prompt to every window to supply a dictionary of obscure words that might be in the transcript. It's very effective at boosting recognition of some words. However, I don't see it as being in opposition to condition_on_previous_text; the basic idea of using context from the end of the previous window to influence the understanding of the beginning of the next window is still valuable.
