Hi folks,
I'm evaluating Play HT for potential use. The docs say SSML is supported (link below), but I can't figure out how to use it through the Python SDK (`pyht`). Is it simply not implemented yet, or is there a flag of some sort that I'm missing?
https://docs.play.ht/reference/api-convert-tts-ssml-standard-premium-voices
Here's the simple test case that leads to my confusion:
```python
# import the playht SDK
from pyht import Client, TTSOptions, Format
import io
import pyaudio
from pydub import AudioSegment

def play_audio_stream(byte_iterator):
    # Combine the bytes from the iterator into a single bytes object
    mp3_data = b"".join(byte_iterator)
    # Load the mp3 data into an AudioSegment
    audio = AudioSegment.from_file(io.BytesIO(mp3_data), format="mp3")
    # Convert the AudioSegment to raw audio data
    raw_data = audio.raw_data
    sample_rate = audio.frame_rate
    num_channels = audio.channels
    sample_width = audio.sample_width
    # Initialize PyAudio
    p = pyaudio.PyAudio()
    # Open a stream
    stream = p.open(format=p.get_format_from_width(sample_width),
                    channels=num_channels,
                    rate=sample_rate,
                    output=True)
    # Play the audio by writing to the stream
    stream.write(raw_data)
    # Stop and close the stream
    stream.stop_stream()
    stream.close()
    # Terminate PyAudio
    p.terminate()

# Initialize PlayHT API with your credentials
client = Client("<id>", "<key>")

# configure your stream
options = TTSOptions(
    # this voice id can be one of our prebuilt voices or your own voice clone id, refer to the `listVoices()` method for a list of supported voices.
    # voice="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
    voice="s3://voice-cloning-zero-shot/a59cb96d-bba8-4e24-81f2-e60b888a0275/charlottenarrativesaad/manifest.json",
    # you can pass any value between 8000 and 48000, 24000 is default
    sample_rate=44_100,
    # the generated audio encoding, supports 'raw' | 'mp3' | 'wav' | 'ogg' | 'flac' | 'mulaw'
    format=Format.FORMAT_MP3,
    # playback rate of generated speech
    speed=1,
)

# start streaming!
text = '<speak><p>This is the beginning of a beautiful <break time="1.0s"/> friendship</p></speak>'
# must use turbo voice engine for the best latency
audio_stream = client.tts(text=text, voice_engine="PlayHT2.0-turbo", options=options)
play_audio_stream(iter(audio_stream))
```
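One thing worth ruling out is malformed markup on my side. Here is a minimal, stdlib-only sketch (it assumes nothing about the `pyht` SDK, and `is_well_formed_ssml` is just a hypothetical helper name) that checks the SSML string from the test case parses as XML with a `<speak>` root. It does not validate against the SSML schema and says nothing about whether a given voice engine honors the tags.

```python
import xml.etree.ElementTree as ET

def is_well_formed_ssml(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as XML and the root is <speak>.

    Only checks XML well-formedness and the root tag name; it does not
    validate against the SSML schema or any TTS engine's supported tags.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well-formed XML (e.g. plain text or mismatched tags)
        return False
    return root.tag == "speak"

text = '<speak><p>This is the beginning of a beautiful <break time="1.0s"/> friendship</p></speak>'
print(is_well_formed_ssml(text))
print(is_well_formed_ssml("<speak><p>unclosed</speak>"))
```

If this prints `True` for the test string, the markup itself is fine and the question is purely whether the SDK or engine passes SSML through.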