Speculative Decoding Snippet Not Working #29869

Closed

hieunguyenquoc opened this issue Mar 26, 2024 · 4 comments

hieunguyenquoc commented Mar 26, 2024

System Info

transformers==4.39.1
python==3.8.17
torch==2.0.1+cpu

Who can help?

@sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import torch
from operator import itemgetter
import time

print(torch.__version__)


class PhoWhisper_Finetune_Model:
    def __init__(self) -> None:
        self.device = "cuda:0" if torch.cuda.is_available() else "cpu"
        self.torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
        self.MODEL_ID = "phowhisper_medium_finetuned"
        self.model = AutoModelForSpeechSeq2Seq.from_pretrained(
            self.MODEL_ID, torch_dtype=self.torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
        )
        self.model.to(self.device)
        self.processor = AutoProcessor.from_pretrained("vinai/PhoWhisper-medium")

    def infer(self, audiopath: str) -> str:
        # Smaller draft model used as the assistant for speculative decoding
        assistant_model_id = "vinai/PhoWhisper-tiny"
        assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
            assistant_model_id,
            torch_dtype=self.torch_dtype,
            low_cpu_mem_usage=True,
        )
        assistant_model.to(self.device)

        pipe = pipeline(
            "automatic-speech-recognition",
            model=self.model,
            tokenizer=self.processor.tokenizer,
            feature_extractor=self.processor.feature_extractor,
            chunk_length_s=20,
            batch_size=16,
            return_timestamps=True,
            torch_dtype=self.torch_dtype,
            device=self.device,
            generate_kwargs={"task": "transcribe", "language": "vi", "assistant_model": assistant_model},
        )
        prediction = pipe(audiopath)
        result_string = " ".join(map(itemgetter("text"), prediction["chunks"]))
        return result_string


if __name__ == "__main__":
    whisper = PhoWhisper_Finetune_Model()
    start = time.time()
    result = whisper.infer("-184354569133200865_104448_105080.wav")
    print(result)
    print("Time :", time.time() - start)

Expected behavior

I have tried speculative decoding on two versions of PhoWhisper (a fine-tuned Whisper), following this post: https://huggingface.co/blog/whisper-speculative-decoding. I get this error:
ValueError: Whisper expects the mel input features to be of length 3000, but found 1500. Make sure to pad the input mel features to 3000.
Could you help me? Thank you @sanchit-gandhi

@amyeroberts
Collaborator

Gentle ping @sanchit-gandhi


jdvin commented Apr 29, 2024

I believe the problem lies here:

if "assistant_encoder_outputs" in model_kwargs:
assistant_kwargs["encoder_outputs"] = model_kwargs["assistant_encoder_outputs"]
elif assistant_model.config.is_encoder_decoder:
inputs_tensor, model_input_name, assistant_kwargs = assistant_model._prepare_model_inputs(
inputs_tensor, assistant_model.generation_config.bos_token_id, assistant_kwargs
)
assistant_kwargs = assistant_model._prepare_encoder_decoder_kwargs_for_generation(
inputs_tensor, assistant_kwargs, model_input_name
)
elif "encoder_outputs" in model_kwargs:
assistant_kwargs["encoder_outputs"] = model_kwargs["encoder_outputs"]

The check for "encoder_outputs" in model_kwargs (the final elif) should come before the check for assistant_model.config.is_encoder_decoder, because otherwise the outputs of the main model's encoder are fed in as inputs to the encoder of the assistant model, when they should just be used as inputs for the assistant decoder. That also matches the error above: Whisper's encoder downsamples the 3000 mel frames to 1500 hidden states, which is exactly the unexpected length the assistant encoder is complaining about.
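
For concreteness, here is a sketch of that reordering, reusing the variable names from the quoted snippet; it illustrates the idea only and is not necessarily the patch that ultimately landed:

# Sketch of the proposed ordering: forward precomputed encoder outputs to the
# assistant's decoder before falling back to running the assistant's encoder.
if "assistant_encoder_outputs" in model_kwargs:
    assistant_kwargs["encoder_outputs"] = model_kwargs["assistant_encoder_outputs"]
elif "encoder_outputs" in model_kwargs:
    # The main model has already encoded the audio; hand its encoder outputs
    # to the assistant decoder instead of re-encoding them as mel features.
    assistant_kwargs["encoder_outputs"] = model_kwargs["encoder_outputs"]
elif assistant_model.config.is_encoder_decoder:
    inputs_tensor, model_input_name, assistant_kwargs = assistant_model._prepare_model_inputs(
        inputs_tensor, assistant_model.generation_config.bos_token_id, assistant_kwargs
    )
    assistant_kwargs = assistant_model._prepare_encoder_decoder_kwargs_for_generation(
        inputs_tensor, assistant_kwargs, model_input_name
    )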

Happy to submit a PR for this.

@sanchit-gandhi
Contributor

Thanks for reporting, @hieunguyenquoc! A PR would be most welcome @jdvin if you have the bandwidth; otherwise cc @kamilakesbi if you could take a look.

@kamilakesbi
Contributor

This issue has been solved with PR #30637 :)
