add always_use_initial_prompt #1040

Open · wants to merge 2 commits into main
Conversation

mercury233

always_use_initial_prompt: bool
    If True, the initial_prompt will be used for all windows, and condition_on_previous_text
    will be ignored. Enabling this may make the text more consistent if the audio is long
    and the initial_prompt is set properly.
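A hedged usage sketch, assuming this PR is applied (always_use_initial_prompt exists only in this branch, not in upstream whisper; the file name and prompt text are illustrative):

    import whisper

    model = whisper.load_model("small")

    # Feed the same initial prompt to every 30-second window instead of the
    # rolling previous-text context (what this PR's option does).
    result = model.transcribe(
        "long_audio.mp3",
        initial_prompt="Glossary: Kubernetes, etcd, kubelet.",
        always_use_initial_prompt=True,
    )
    print(result["text"])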

@mercury233 force-pushed the patch-always-use-initial-prompt branch from 2f5b957 to bd54b68 on Mar 7, 2023
@ryanheise
Contributor

ryanheise commented Mar 7, 2023

I think some variation on this idea might help it to remember your prompting in long audio, but when a window boundary occurs mid-sentence, I think it's also important to have the previous text as the prompt.

As a compromise, have you thought about truncating the previous text at a sentence boundary and then prepending the initial prompt before that? It might be the best of both worlds.
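A rough sketch of that compromise (the helper name build_window_prompt is hypothetical; tokenizer.encode/decode mirror whisper's tokenizer interface):

    SENTENCE_ENDINGS = ".。!!??"

    def build_window_prompt(initial_prompt_tokens, previous_tokens, tokenizer):
        # Decode the previous window's output, keep only the text after the
        # last sentence boundary, and prepend the initial prompt to it.
        text = tokenizer.decode(previous_tokens)
        cut = max(text.rfind(c) for c in SENTENCE_ENDINGS)
        tail = text[cut + 1:] if cut >= 0 else text
        return initial_prompt_tokens + tokenizer.encode(tail)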

@mercury233
Author

> I think some variation on this idea might help it to remember your prompting in long audio, but when a window boundary occurs mid-sentence, I think it's also important to have the previous text as the prompt.
>
> As a compromise, have you thought about truncating the previous text at a sentence boundary and then prepending the initial prompt before that? It might be the best of both worlds.

I agree, but I don't know how to do that.

@ryanheise
Contributor

A really cheap modification might be to add a check here:

            if not condition_on_previous_text or result.temperature > 0.5:
                # do not feed the prompt tokens if a high temperature was used
                prompt_reset_since = len(all_tokens)

so that you also check whether your option is enabled and whether the latest token ends with one of the characters ".。!!??", effectively resetting the prompt after every sentence boundary. Then, when feeding the prompt:

            decode_options["prompt"] = all_tokens[prompt_reset_since:]

If your option is enabled, you could prepend the initial prompt here.
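A hedged sketch of both changes together, following the variable names in whisper/transcribe.py (the sentence-boundary check and the always_use_initial_prompt wiring are assumptions, not upstream code):

    # Decode just the newest token to see whether it ends a sentence.
    ends_sentence = tokenizer.decode(all_tokens[-1:]).strip().endswith(
        tuple(".。!!??")
    )

    # Reset the rolling context on high temperature, disabled conditioning,
    # or (with the new option) at every sentence boundary.
    if (
        not condition_on_previous_text
        or result.temperature > 0.5
        or (always_use_initial_prompt and ends_sentence)
    ):
        prompt_reset_since = len(all_tokens)

    # Feed the rolling context, with the initial prompt prepended.
    prompt = all_tokens[prompt_reset_since:]
    if always_use_initial_prompt:
        prompt = initial_prompt_tokens + prompt
    decode_options["prompt"] = prompt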

BUT, I think it might be more useful to parameterise how many previous sentences to include in the prompt. For that, the code would be a bit more complicated, but you could keep a FIFO buffer: e.g. to remember the last 3 sentences, keep a FIFO of size 3 containing the last 3 sentence-boundary positions, which you push under the same condition as in the first block of code above. The oldest sentence boundary gets popped out, so you never have more than the last 3 in there.
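A minimal sketch of that FIFO using collections.deque (the integration point in the transcription loop is assumed):

    from collections import deque

    MAX_SENTENCES = 3
    # Positions in all_tokens just after each sentence-ending token; the
    # deque's maxlen pops the oldest boundary automatically.
    sentence_boundaries = deque(maxlen=MAX_SENTENCES)

    # Inside the per-window loop, after extending all_tokens:
    if ends_sentence:  # same boundary check as in the sketch above
        sentence_boundaries.append(len(all_tokens))

    # Feed everything since the oldest remembered boundary, i.e. roughly
    # the last MAX_SENTENCES sentences of context.
    if sentence_boundaries:
        prompt_reset_since = max(prompt_reset_since, sentence_boundaries[0])
    decode_options["prompt"] = all_tokens[prompt_reset_since:]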

@umar009ali

I use Whisper for audio transcription and translation, but since February 25 it has not been transcribing and translating clearly. I would be happy if anybody could help me, please.

@radurevutchi

radurevutchi commented Mar 17, 2023

@ryanheise note that this code in decoding.py:594 truncates the list of all prompt tokens from the beginning, not the end. That means simply prepending, without checking the prompt window length, will not always work. The truncation size depends on the model config.

    tokens = (
        [self.tokenizer.sot_prev]
        + prompt_tokens[-(self.n_ctx // 2 - 1) :]
        + tokens
    )
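A quick illustration of that slice, assuming the models' text context of n_ctx = 448 (so at most n_ctx // 2 - 1 = 223 prompt tokens survive):

    n_ctx = 448
    prompt_tokens = list(range(1000))         # stand-in prompt, 1000 tokens
    kept = prompt_tokens[-(n_ctx // 2 - 1):]  # only the last 223 remain
    assert len(kept) == 223
    # A prepended initial prompt sits at the front of the combined prompt,
    # so it is the first thing dropped once the total exceeds 223 tokens.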

@Paxosman Paxosman left a comment

Check your writing on 22.

@FurkanGozukara

Testing this right now.

@FurkanGozukara

The output after this is just amazing.

I don't get why this is still not implemented.

@FurkanGozukara

@mercury233 It is hallucinating significantly after this change. Is there any way to prevent it? Other than that, it works great. Did you find a solution for the hallucination? I can use a very big beam size and best_of, but they didn't help.

@FurkanGozukara

> not condition_on_previous_text or result.temperature > 0.5

Can you share a modified file like this? I would like to test it; currently it is having problems.

@FurkanGozukara

Yes, with this approach it sometimes skips 30-second blocks. We need optimization. @mercury233 @ryanheise @radurevutchi

@mercury233
Author

> @mercury233 It is hallucinating significantly after this change. Is there any way to prevent it?

Sorry, I didn't find one.

@jonathanjfshaw

I have used the same basic idea of applying the initial prompt to every window to supply a dictionary of obscure words that might be in the transcript. It's very effective at boosting recognition of some words. However, I don't see it as being in opposition to condition_on_previous_text; the basic idea of using context from the end of the previous window to influence the understanding of the beginning of the next window is still valuable.
