
Speech T5 on XPU on Intel Arc GPU 770 taking 8 seconds and for CPU it takes 3 seconds ?? #10942

Open
shailesh837 opened this issue May 7, 2024 · 2 comments

@shailesh837

I am following an older ipex-llm issue about TTS (text to speech).
I have two issues:
a) Why does TTS take 8 seconds on XPU versus 3 seconds on CPU?
b) Every time I run the code below, it converts the model to sym_int4 again. Why can't it convert once and save the result locally? I tried saving it myself, but it fails:
2024-05-06 23:56:05,898 - INFO - intel_extension_for_pytorch auto imported
2024-05-06 23:56:05,898 - INFO - Converting the current model to sym_int4 format.....

conda create -n speecht5-test python=3.9
conda activate speecht5-test

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install datasets soundfile
Runtime configuration: set up following the linked guide.

Code:

import torch
from transformers import SpeechT5Processor, SpeechT5HifiGan, SpeechT5ForTextToSpeech
from datasets import load_dataset
import soundfile as sf
import time

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

from bigdl.llm import optimize_model
model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                      "speech_decoder_postnet.prob_out"]) 
model = model.to('xpu')
vocoder = vocoder.to('xpu')

text = "Alright, listen up. Tyres are still a bit cold, but they're getting there. Keep the pace steady and focus on getting them up to temp. We need those pressures closer to 30 psi, so keep an eye on that. Once the tyres are ready, we'll be good to go. Now get out there and give it everything you've got."
inputs = processor(text=text, return_tensors="pt").to('xpu')

# load xvector containing speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors",
                                  split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0).to('xpu')

with torch.inference_mode():
  # warmup; synchronize so the timing covers the actual XPU work
  st = time.perf_counter()
  speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
  torch.xpu.synchronize()
  print(f'Warmup time: {time.perf_counter() - st}')

  st1 = time.perf_counter()
  speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
  torch.xpu.synchronize()
  st2 = time.perf_counter()
  print(f"Inference time: {st2 - st1}")

sf.write("speech_bigdl_llm.wav", speech.to('cpu').numpy(), samplerate=16000)

Saving to a local folder didn't work either; I got an error:

import os

# Check if the optimized model is already saved on disk
optimized_model_path = "speecht5_tts_optimized"
if os.path.exists(optimized_model_path):
    # Load the optimized model directly
    model = SpeechT5ForTextToSpeech.from_pretrained(optimized_model_path, ignore_mismatched_sizes=True)
else:
    # Load and optimize the model, then save it
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    # from bigdl.llm import optimize_model
    model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                          "speech_decoder_postnet.prob_out"])
    # Save the optimized model to disk
    model.save_pretrained(optimized_model_path)

Error while using saving code:

2024-05-07 00:11:10,686 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
File "/home/spandey2/tts/speechT5.py", line 20, in <module>
    model = SpeechT5ForTextToSpeech.from_pretrained(optimized_model_path)
File "/home/spandey2/miniconda3/envs/speecht5-test/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
File "/home/spandey2/miniconda3/envs/speecht5-test/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3310, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for SpeechT5ForTextToSpeech:
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.k_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.v_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.q_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.out_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.feed_forward.intermediate_dense.weight: copying a param with shape torch.Size([1253376]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
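The flattened shapes in the traceback look consistent with int4-packed byte buffers rather than fp32 matrices, which would explain why a plain from_pretrained cannot map them back onto the original layers. A back-of-envelope check (assuming sym_int4 packs two 4-bit weights per byte plus one fp16 scale per group of 64 weights; the group size here is an assumption, not something the log confirms):

```python
# Assumed sym_int4 layout: 4 bits per weight (two weights per byte) plus one
# fp16 scale (2 bytes) per group of 64 weights. Group size 64 is an assumption.
def packed_int4_bytes(n_weights, group_size=64):
    payload = n_weights // 2                 # packed 4-bit weights
    scales = (n_weights // group_size) * 2   # fp16 scale per group
    return payload + scales

print(packed_int4_bytes(768 * 768))   # 313344, matching the k/q/v/out_proj shapes
print(packed_int4_bytes(3072 * 768))  # 1253376, matching intermediate_dense
```

Both numbers match the checkpoint shapes in the error, which suggests save_pretrained wrote out the already-quantized buffers.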
@sgwhat sgwhat self-assigned this May 7, 2024
sgwhat (Contributor) commented May 7, 2024

Hi @shailesh837, we are working on reproducing your issue.

sgwhat (Contributor) commented May 7, 2024

  1. For your first question, please set the following environment variables for optimal performance (before running your program):

    export USE_XETLA=OFF
    export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
    export SYCL_CACHE_PERSISTENT=1
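    The same variables can be set from Python instead of the shell; a minimal sketch (variable names copied verbatim from the list above) that just has to run before torch / intel_extension_for_pytorch are first imported, since the SYCL runtime reads them at initialization:

    ```python
    import os

    # Set the XPU tuning variables before torch / intel_extension_for_pytorch
    # are imported; the SYCL runtime reads them once at initialization.
    xpu_env = {
        "USE_XETLA": "OFF",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
    }
    os.environ.update(xpu_env)
    ```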
  2. For your second issue, please modify your code to save and load the optimized model as below:

    # Load and optimize the model, then save it
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    from ipex_llm import optimize_model
    model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                          "speech_decoder_postnet.prob_out"])
    # Save the optimized (low-bit) model to disk
    optimized_model_path = "speecht5_tts_optimized"
    model.save_low_bit(optimized_model_path)

    Then load the optimized model as below:

    from ipex_llm.optimize import load_low_bit
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    model = load_low_bit(model, optimized_model_path)

Note: We have already migrated bigdl-llm into ipex-llm, so please use ipex-llm instead; see the ipex-llm installation guide for more details.
