Speculative Decoding Snippet Not Working #29869

Closed

hieunguyenquoc opened this issue Mar 26, 2024 · 4 comments

hieunguyenquoc commented Mar 26, 2024

System Info

transformers==4.39.1
python==3.8.17
torch==2.0.1+cpu

Who can help?

@sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import torch
from operator import itemgetter
import time

print(torch.__version__)


class PhoWhisper_Finetune_Model:
    def __init__(self) -> None:
        self.device = "cuda:0" if torch.cuda.is_available() else "cpu"
        self.torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
        self.MODEL_ID = "phowhisper_medium_finetuned"
        self.model = AutoModelForSpeechSeq2Seq.from_pretrained(
            self.MODEL_ID, torch_dtype=self.torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
        )
        self.model.to(self.device)
        self.processor = AutoProcessor.from_pretrained("vinai/PhoWhisper-medium")

    def infer(self, audiopath: str) -> str:
        # Smaller draft model used as the assistant for speculative decoding
        assistant_model_id = "vinai/PhoWhisper-tiny"
        assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
            assistant_model_id,
            torch_dtype=self.torch_dtype,
            low_cpu_mem_usage=True,
        )
        assistant_model.to(self.device)

        pipe = pipeline(
            "automatic-speech-recognition",
            model=self.model,
            tokenizer=self.processor.tokenizer,
            feature_extractor=self.processor.feature_extractor,
            chunk_length_s=20,
            batch_size=16,
            return_timestamps=True,
            torch_dtype=self.torch_dtype,
            device=self.device,
            generate_kwargs={"task": "transcribe", "language": "vi", "assistant_model": assistant_model},
        )
        prediction = pipe(audiopath)
        result_string = " ".join(map(itemgetter("text"), prediction["chunks"]))
        return result_string


if __name__ == "__main__":
    whisper = PhoWhisper_Finetune_Model()
    start = time.time()
    result = whisper.infer("-184354569133200865_104448_105080.wav")
    print(result)
    print("Time :", time.time() - start)

Expected behavior

I have tried speculative decoding on two versions of PhoWhisper (a fine-tuned Whisper), following this post: https://huggingface.co/blog/whisper-speculative-decoding. I get this error:
ValueError: Whisper expects the mel input features to be of length 3000, but found 1500. Make sure to pad the input mel features to 3000.
Could you help me? Thank you @sanchit-gandhi

@amyeroberts
Collaborator

Gentle ping @sanchit-gandhi


jdvin commented Apr 29, 2024

I believe the problem lies here:

if "assistant_encoder_outputs" in model_kwargs:
assistant_kwargs["encoder_outputs"] = model_kwargs["assistant_encoder_outputs"]
elif assistant_model.config.is_encoder_decoder:
inputs_tensor, model_input_name, assistant_kwargs = assistant_model._prepare_model_inputs(
inputs_tensor, assistant_model.generation_config.bos_token_id, assistant_kwargs
)
assistant_kwargs = assistant_model._prepare_encoder_decoder_kwargs_for_generation(
inputs_tensor, assistant_kwargs, model_input_name
)
elif "encoder_outputs" in model_kwargs:
assistant_kwargs["encoder_outputs"] = model_kwargs["encoder_outputs"]

The check for "encoder_outputs" in model_kwargs (the final elif) should come before the check for assistant_model.config.is_encoder_decoder, because otherwise the outputs of the main model's encoder are fed in as inputs to the encoder of the assistant model, when they should just be used as inputs for the assistant decoder. That also matches the error above: Whisper's encoder downsamples the 3000 mel frames to 1500 hidden states, which is exactly the unexpected length the assistant encoder is complaining about.
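
For concreteness, here is a sketch of that reordering, reusing the variable names from the quoted snippet; it illustrates the idea only and is not necessarily the patch that ultimately landed:

# Sketch of the proposed ordering: forward precomputed encoder outputs to the
# assistant's decoder before falling back to running the assistant's encoder.
if "assistant_encoder_outputs" in model_kwargs:
    assistant_kwargs["encoder_outputs"] = model_kwargs["assistant_encoder_outputs"]
elif "encoder_outputs" in model_kwargs:
    # The main model has already encoded the audio; hand its encoder outputs
    # to the assistant decoder instead of re-encoding them as mel features.
    assistant_kwargs["encoder_outputs"] = model_kwargs["encoder_outputs"]
elif assistant_model.config.is_encoder_decoder:
    inputs_tensor, model_input_name, assistant_kwargs = assistant_model._prepare_model_inputs(
        inputs_tensor, assistant_model.generation_config.bos_token_id, assistant_kwargs
    )
    assistant_kwargs = assistant_model._prepare_encoder_decoder_kwargs_for_generation(
        inputs_tensor, assistant_kwargs, model_input_name
    )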

Happy to submit a PR for this.

@sanchit-gandhi
Contributor

Thanks for reporting, @hieunguyenquoc! A PR would be most welcome @jdvin if you have the bandwidth; otherwise cc @kamilakesbi if you could take a look.

@kamilakesbi
Contributor

This issue has been solved with PR #30637 :)
