Support for SSML in python interface? #50

@EllenOrange

Hi folks,

I'm evaluating PlayHT for potential use. While SSML is apparently supported by the API (see the link below), I can't figure out how to access it through the Python interface. Is it simply not implemented yet, or is there a flag of some sort that I'm missing?

https://docs.play.ht/reference/api-convert-tts-ssml-standard-premium-voices

Here's the simple test case that leads to my confusion:

# import the playht SDK
from pyht import Client, TTSOptions, Format

import io
import pyaudio
from pydub import AudioSegment

def play_audio_stream(byte_iterator):
    # Combine the bytes from the iterator into a single bytes object
    mp3_data = b"".join(byte_iterator)

    # Load the mp3 data into an AudioSegment
    audio = AudioSegment.from_file(io.BytesIO(mp3_data), format="mp3")

    # Convert the AudioSegment to raw audio data
    raw_data = audio.raw_data
    sample_rate = audio.frame_rate
    num_channels = audio.channels
    sample_width = audio.sample_width

    # Initialize PyAudio
    p = pyaudio.PyAudio()

    # Open a stream
    stream = p.open(format=p.get_format_from_width(sample_width),
                    channels=num_channels,
                    rate=sample_rate,
                    output=True)

    # Play the audio by writing to the stream
    stream.write(raw_data)

    # Stop and close the stream
    stream.stop_stream()
    stream.close()

    # Terminate PyAudio
    p.terminate()


# Initialize PlayHT API with your credentials
client = Client("<id>", "<key>")

# configure your stream
options = TTSOptions(
    # this voice id can be one of our prebuilt voices or your own voice clone id; refer to the `listVoices()` method for a list of supported voices.
    # voice="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
    voice="s3://voice-cloning-zero-shot/a59cb96d-bba8-4e24-81f2-e60b888a0275/charlottenarrativesaad/manifest.json",

    # you can pass any value between 8000 and 48000, 24000 is default
    sample_rate=44_100,
  
    # the generated audio encoding, supports 'raw' | 'mp3' | 'wav' | 'ogg' | 'flac' | 'mulaw'
    format=Format.FORMAT_MP3,

    # playback rate of generated speech
    speed=1,
)

# start streaming!
text = '<speak><p>This is the beginning of a beautiful <break time="1.0s"/> friendship</p></speak>'

# must use turbo voice engine for the best latency
audio_stream = client.tts(text=text, voice_engine="PlayHT2.0-turbo", options=options)

play_audio_stream(iter(audio_stream))
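
If it turns out the Python interface does not interpret SSML, one possible workaround is to split the markup on the client side and handle pauses separately. The following is only a minimal sketch under that assumption, not part of the pyht SDK: `split_ssml` and `parse_duration_ms` are hypothetical helper names, only `<break time="..."/>` is handled, and a break without a `time` attribute is treated as zero-length for simplicity.

```python
import re
import xml.etree.ElementTree as ET


def parse_duration_ms(value):
    """Convert an SSML time value such as '1.0s' or '250ms' to milliseconds."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)(ms|s)\s*", value)
    if match is None:
        raise ValueError(f"unsupported break time: {value!r}")
    number, unit = match.groups()
    return int(round(float(number) * (1000 if unit == "s" else 1)))


def split_ssml(ssml):
    """Return ('text', str) and ('break', milliseconds) tuples in document order."""
    segments = []

    def add_text(text):
        # Skip empty or whitespace-only fragments and normalize inner whitespace.
        if text and text.strip():
            segments.append(("text", " ".join(text.split())))

    def walk(element):
        if element.tag == "break":
            # Simplification: a missing time attribute is treated as no pause.
            segments.append(("break", parse_duration_ms(element.get("time", "0s"))))
            return
        add_text(element.text)
        for child in element:
            walk(child)
            # Text that follows a child element lives in that child's tail.
            add_text(child.tail)

    walk(ET.fromstring(ssml))
    return segments


example = '<speak><p>This is the beginning of a beautiful <break time="1.0s"/> friendship</p></speak>'
for kind, value in split_ssml(example):
    print(kind, value)
```

Each `text` segment could then be sent to `client.tts` as plain text, with the pauses filled in by something like pydub's `AudioSegment.silent(duration=...)` before playback. Whether that sounds natural across segment boundaries is untested here.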
