
Speech T5 on XPU on Intel Arc GPU 770 taking 8 seconds and for CPU it takes 3 seconds ?? #10942

Open
shailesh837 opened this issue May 7, 2024 · 2 comments

@shailesh837

I am following an older ipex-llm issue about TTS (text to speech).
I have two issues:
a) Why does TTS take 8 seconds on XPU versus 3 seconds on CPU?
b) Every time I run the code below, it converts the model to sym_int4 again. Why can't it convert once and save the result locally? I tried saving it myself, but it fails:
2024-05-06 23:56:05,898 - INFO - intel_extension_for_pytorch auto imported
2024-05-06 23:56:05,898 - INFO - Converting the current model to sym_int4 format.....

conda create -n speecht5-test python=3.9
conda activate speecht5-test

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install datasets soundfile
Runtime configuration: set up following the linked guide.

Code:

import torch
from transformers import SpeechT5Processor, SpeechT5HifiGan, SpeechT5ForTextToSpeech
from datasets import load_dataset
import soundfile as sf
import time

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

from bigdl.llm import optimize_model
model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                      "speech_decoder_postnet.prob_out"]) 
model = model.to('xpu')
vocoder = vocoder.to('xpu')

text = "Alright, listen up. Tyres are still a bit cold, but they're getting there. Keep the pace steady and focus on getting them up to temp. We need those pressures closer to 30 psi, so keep an eye on that. Once the tyres are ready, we'll be good to go. Now get out there and give it everything you've got."
inputs = processor(text=text, return_tensors="pt").to('xpu')

# load xvector containing speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors",
                                  split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0).to('xpu')

with torch.inference_mode():
  # warmup; synchronize so the timing covers the actual XPU work
  st = time.perf_counter()
  speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
  torch.xpu.synchronize()
  print(f'Warmup time: {time.perf_counter() - st}')

  st1 = time.perf_counter()
  speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
  torch.xpu.synchronize()
  st2 = time.perf_counter()
  print(f"Inference time: {st2 - st1}")

sf.write("speech_bigdl_llm.wav", speech.to('cpu').numpy(), samplerate=16000)

Saving to a local folder didn't work either; I got an error:

import os

# Check if the optimized model is already saved on disk
optimized_model_path = "speecht5_tts_optimized"
if os.path.exists(optimized_model_path):
    # Load the optimized model directly
    model = SpeechT5ForTextToSpeech.from_pretrained(optimized_model_path, ignore_mismatched_sizes=True)
else:
    # Load and optimize the model, then save it
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    # from bigdl.llm import optimize_model
    model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                          "speech_decoder_postnet.prob_out"])
    # Save the optimized model to disk
    model.save_pretrained(optimized_model_path)

Error while using saving code:

2024-05-07 00:11:10,686 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
File "/home/spandey2/tts/speechT5.py", line 20, in <module>
    model = SpeechT5ForTextToSpeech.from_pretrained(optimized_model_path)
File "/home/spandey2/miniconda3/envs/speecht5-test/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
File "/home/spandey2/miniconda3/envs/speecht5-test/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3310, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for SpeechT5ForTextToSpeech:
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.k_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.v_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.q_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.attention.out_proj.weight: copying a param with shape torch.Size([313344]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for speecht5.encoder.wrapped_encoder.layers.0.feed_forward.intermediate_dense.weight: copying a param with shape torch.Size([1253376]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
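The flattened shapes in the traceback look consistent with int4-packed byte buffers rather than fp32 matrices, which would explain why a plain from_pretrained cannot map them back onto the original layers. A back-of-envelope check (assuming sym_int4 packs two 4-bit weights per byte plus one fp16 scale per group of 64 weights; the group size here is an assumption, not something the log confirms):

```python
# Assumed sym_int4 layout: 4 bits per weight (two weights per byte) plus one
# fp16 scale (2 bytes) per group of 64 weights. Group size 64 is an assumption.
def packed_int4_bytes(n_weights, group_size=64):
    payload = n_weights // 2                 # packed 4-bit weights
    scales = (n_weights // group_size) * 2   # fp16 scale per group
    return payload + scales

print(packed_int4_bytes(768 * 768))   # 313344, matching the k/q/v/out_proj shapes
print(packed_int4_bytes(3072 * 768))  # 1253376, matching intermediate_dense
```

Both numbers match the checkpoint shapes in the error, which suggests save_pretrained wrote out the already-quantized buffers.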
@sgwhat sgwhat self-assigned this May 7, 2024
sgwhat (Contributor) commented May 7, 2024

Hi @shailesh837, we are working on reproducing your issue.

sgwhat (Contributor) commented May 7, 2024

  1. For your first question, please set the following environment variables for optimal performance (before running your program):

    export USE_XETLA=OFF
    export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
    export SYCL_CACHE_PERSISTENT=1
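    The same variables can be set from Python instead of the shell; a minimal sketch (variable names copied verbatim from the list above) that just has to run before torch / intel_extension_for_pytorch are first imported, since the SYCL runtime reads them at initialization:

    ```python
    import os

    # Set the XPU tuning variables before torch / intel_extension_for_pytorch
    # are imported; the SYCL runtime reads them once at initialization.
    xpu_env = {
        "USE_XETLA": "OFF",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
    }
    os.environ.update(xpu_env)
    ```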
  2. For your second issue, please modify your code to save and load the optimized model as below:

    # Load and optimize the model, then save it
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    from ipex_llm import optimize_model
    model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                          "speech_decoder_postnet.prob_out"])
    # Save the optimized (low-bit) model to disk
    optimized_model_path = "speecht5_tts_optimized"
    model.save_low_bit(optimized_model_path)

    Then load the optimized model as below:

    from ipex_llm.optimize import load_low_bit
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    model = load_low_bit(model, optimized_model_path)

Note: We have already migrated bigdl-llm into ipex-llm, so please use ipex-llm instead; see the ipex-llm installation guide for more details.
