# Text to Speech avatar

## From text to speech with a video avatar provided by Azure Speech Services
Custom text to speech avatar lets you create a one-of-a-kind synthetic talking avatar for your application. By providing video recordings of your selected actors, you can build a unique, natural-looking avatar for your product or brand. If you also create a custom neural voice for the same actor and use it as the avatar's voice, the avatar is even more realistic. This notebook uses a prebuilt avatar (`"customized": False`) through the batch synthesis REST API.

<img src="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/media/custom-avatar-workflow.png#lightbox">

Batch synthesis properties reference: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar-properties

In [16]:
import datetime
import json
import requests
import sys
import time

from datetime import date
from ipywidgets import Video
from IPython.display import FileLink

In [17]:
sys.version

'3.10.11 (main, May 16 2023, 00:28:57) [GCC 11.2.0]'

In [18]:
dt = datetime.datetime.today().strftime('%d-%b-%Y %H:%M:%S')
print(f"Today is {dt}")

Today is 09-Oct-2024 12:44:15


In [19]:
# Azure Speech services
azure_speech_key = "tobereplaced"
azure_speech_region = "tobereplaced"
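
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key and region are exported as environment variables. The variable names `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` and the helper `load_speech_settings` are illustrative choices for this sketch, not names required by the service:

```python
import os


def load_speech_settings(env=None):
    """Return (key, region) read from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("AZURE_SPEECH_KEY", "").strip()
    region = env.get("AZURE_SPEECH_REGION", "").strip()
    if not key or not region:
        raise RuntimeError(
            "Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION before running the notebook")
    return key, region
```

With both variables set, `azure_speech_key, azure_speech_region = load_speech_settings()` could replace the two assignments above.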

## 1 Settings

Supported prebuilt avatar characters, styles, and gestures: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/avatar-gestures-with-ssml#supported-pre-built-avatar-characters-styles-and-gestures

Supported text to speech languages and voices: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts

In [20]:
avatar_name = "Harry"
avatar_style = "business"
avatar_voice = "'en-US-RyanMultilingualNeural'"

#avatar_name = "Lisa"
#avatar_style = "graceful-sitting"
#avatar_voice = "'en-US-AvaMultilingualNeural'"

In [21]:
avatar_language = "'en-US'"
avatar_video_file = f"avatar_video_{avatar_name}_{avatar_style}.mp4"

### Text to use

In [22]:
today = date.today()

#avatar_text = f"""
#Je m'appelle {avatar_name}. Nous sommes le {today}. Je suis un avatar Azure. Ceci est une démonstration des avatars Azure.\
# A bientôt.
#"""

avatar_text = f"""
My name is {avatar_name}. Today is {today}. I am an Azure Speech Services avatar. See you soon.
"""

print(f"Text to use:\n{avatar_text}")

Text to use:

My name is Harry. Today is 2024-10-09. I am an Azure Speech Services avatar. See you soon.



## 2 Creating the avatar

In [23]:
job_id = f"jobid_{datetime.datetime.today().strftime('%d%b%Y%H%M%S')}"
url = f"https://{azure_speech_region}.api.cognitive.microsoft.com/avatar/batchsyntheses/{job_id}?api-version=2024-08-01"
headers = {
    "Ocp-Apim-Subscription-Key": azure_speech_key,
    "Content-Type": "application/json"
}

# Define the JSON payload (named `payload` so it does not shadow the imported json module)
payload = {
    "inputKind": "SSML",
    "inputs": [{
        "content":
        f"<speak version='1.0' xml:lang={avatar_language}><voice name={avatar_voice}>{avatar_text}</voice></speak>"
    }],
    "avatarConfig": {
        "customized": False, # set to True if you want to use customized avatar
        "talkingAvatarCharacter": avatar_name,  # Avatar name
        "talkingAvatarStyle": avatar_style,  # Avatar style
        "videoFormat": "mp4",  # mp4 or webm, webm is required for transparent background
        "videoCodec": "h264",  # hevc, h264 or vp9, vp9 is required for transparent background; default is hevc
        "subtitleType": "soft_embedded",
        #"backgroundColor": "#FFFFFFFF",
        "backgroundImage": "https://media.cntraveler.com/photos/63e6b44a71cc5230e7788d4f/16:9/w_1920%2Cc_limit/Paris_GettyImages-601762971.jpg",
    }
}

# Make the PUT request
response = requests.put(url, headers=headers, json=payload)

# Print the response
print(response.json())

{'id': 'jobid_09Oct2024124415', 'status': 'NotStarted', 'createdDateTime': '2024-10-09T12:44:16.3347787Z', 'lastActionDateTime': '2024-10-09T12:44:16.3347959Z', 'inputKind': 'SSML', 'customVoices': {}, 'properties': {'timeToLiveInHours': 744}, 'avatarConfig': {'talkingAvatarCharacter': 'Harry', 'talkingAvatarStyle': 'business', 'videoFormat': 'Mp4', 'videoCodec': 'h264', 'subtitleType': 'soft_embedded', 'backgroundImage': 'https://media.cntraveler.com/photos/63e6b44a71cc5230e7788d4f/16:9/w_1920%2Cc_limit/Paris_GettyImages-601762971.jpg', 'bitrateKbps': 2000, 'customized': False}}
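
The SSML in the request is assembled by plain string interpolation, so a `&` or `<` in the spoken text would produce invalid XML. A minimal sketch of a builder that escapes its inputs with the standard library. The helper name `build_ssml` is hypothetical, and it expects plain values such as `en-US-RyanMultilingualNeural`, without the embedded quotes used in the settings cell:

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, language="en-US"):
    """Build a single-voice SSML document, escaping the text and attribute values."""
    return (
        f"<speak version='1.0' xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{escape(text.strip())}</voice>"
        "</speak>"
    )
```

The result could be passed as the `"content"` value of the `inputs` entry in place of the inline f-string.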


## 3 Get status

In [24]:
url = f"https://{azure_speech_region}.api.cognitive.microsoft.com/avatar/batchsyntheses/{job_id}?api-version=2024-08-01"
headers = {"Ocp-Apim-Subscription-Key": azure_speech_key}


def check_job_status(url, headers):
    """Poll the job every 5 seconds until it succeeds, fails, or a request error occurs."""
    while True:
        try:
            response = requests.get(url, headers=headers)
            response.raise_for_status()  # Raises an HTTPError for bad responses
            status = response.json().get("status")

            # 'Running' is a normal intermediate state, not an unexpected one
            if status in ('NotStarted', 'Running'):
                print("Job is still running: please wait.")
            
            elif status == 'Succeeded':
                print('Done.\n')
                print(response.json())  # Reuse the response already fetched
                break
            
            elif status == 'Failed':
                print("Job failed!\n")
                print(response.json())
                break
            
            else:
                print(f"Unexpected status: {status}\n")
                print(response.json())

        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
            break  # Exit on error

        time.sleep(5)  # Wait before polling again


start = time.time()
check_job_status(url, headers)
elapsed = time.time() - start
print(f"\nCompleted in {elapsed:.1f} seconds")

Job is still running: please wait.
Job is still running: please wait.
Done.

{'id': 'jobid_09Oct2024124415', 'status': 'Succeeded', 'createdDateTime': '2024-10-09T12:44:16.3347787Z', 'lastActionDateTime': '2024-10-09T12:44:28.5189198', 'inputKind': 'SSML', 'customVoices': {}, 'properties': {'timeToLiveInHours': 744, 'sizeInBytes': 1508320, 'durationInMilliseconds': 7370, 'succeededCount': 1, 'failedCount': 0, 'billingDetails': {'neuralCharacters': 90, 'talkingAvatarDurationSeconds': 7}}, 'avatarConfig': {'talkingAvatarCharacter': 'Harry', 'talkingAvatarStyle': 'business', 'videoFormat': 'Mp4', 'videoCodec': 'h264', 'subtitleType': 'soft_embedded', 'backgroundImage': 'https://media.cntraveler.com/photos/63e6b44a71cc5230e7788d4f/16:9/w_1920%2Cc_limit/Paris_GettyImages-601762971.jpg', 'bitrateKbps': 2000, 'customized': False}, 'outputs': {'result': 'https://stttssvcproduse2.blob.core.windows.net/batchsynthesis-output/5360d8b7261445ada93c3f12bce4c66e/jobid_09Oct2024124415/0001.mp4?skoid=0e
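
The polling loop above has no upper bound, so a job that never reaches a final state would keep the cell running indefinitely. A minimal sketch of a bounded variant, assuming `Succeeded` and `Failed` are the only terminal states. The helper `wait_for_job` and its parameters are hypothetical, and the status lookup is passed in as a callable so the logic can be exercised without a network call:

```python
import time


def wait_for_job(fetch_status, poll_seconds=5, timeout_seconds=600, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal status or the timeout expires."""
    terminal = {"Succeeded", "Failed"}  # assumed terminal states
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

In this notebook it might be called as `wait_for_job(lambda: requests.get(url, headers=headers).json()["status"])`.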

## 4 List all batch synthesis jobs

In [25]:
# Define the URL and headers; maxpagesize=2 limits the response to two jobs per page
url2 = f"https://{azure_speech_region}.api.cognitive.microsoft.com/avatar/batchsyntheses?skip=0&maxpagesize=2&api-version=2024-08-01"
headers2 = {"Ocp-Apim-Subscription-Key": azure_speech_key}
# Make the GET request
response2 = requests.get(url2, headers=headers2)

print(response2.json())

{'value': [{'id': 'jobid_09Oct2024124415', 'status': 'Succeeded', 'createdDateTime': '2024-10-09T12:44:16.3347787Z', 'lastActionDateTime': '2024-10-09T12:44:28.5189198', 'inputKind': 'SSML', 'customVoices': {}, 'properties': {'timeToLiveInHours': 744, 'sizeInBytes': 1508320, 'durationInMilliseconds': 7370, 'succeededCount': 1, 'failedCount': 0, 'billingDetails': {'neuralCharacters': 90, 'talkingAvatarDurationSeconds': 7}}, 'avatarConfig': {'talkingAvatarCharacter': 'Harry', 'talkingAvatarStyle': 'business', 'videoFormat': 'Mp4', 'videoCodec': 'h264', 'subtitleType': 'soft_embedded', 'backgroundImage': 'https://media.cntraveler.com/photos/63e6b44a71cc5230e7788d4f/16:9/w_1920%2Cc_limit/Paris_GettyImages-601762971.jpg', 'bitrateKbps': 2000, 'customized': False}, 'outputs': {'result': 'https://stttssvcproduse2.blob.core.windows.net/batchsynthesis-output/5360d8b7261445ada93c3f12bce4c66e/jobid_09Oct2024124415/0001.mp4?skoid=0e90ea1b-e7d5-446c-a409-5088e95a73d5&sktid=33e01921-4d64-4f8c-a055-5
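
The raw listing is hard to scan. A minimal sketch that reduces it to one row per job, using only fields visible in the response above (`value`, `id`, `status`, and `properties.durationInMilliseconds`). The helper name `summarize_jobs` is hypothetical:

```python
def summarize_jobs(listing):
    """Return one (id, status, duration in seconds) tuple per job in a list response."""
    rows = []
    for job in listing.get("value", []):
        duration_ms = job.get("properties", {}).get("durationInMilliseconds")
        # Jobs that have not finished yet carry no duration
        seconds = None if duration_ms is None else duration_ms / 1000
        rows.append((job.get("id"), job.get("status"), seconds))
    return rows
```

For example, `for row in summarize_jobs(response2.json()): print(row)` prints a compact line per job.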

## 5 Downloading the avatar video file

### Checking status

In [26]:
url = f"https://{azure_speech_region}.api.cognitive.microsoft.com/avatar/batchsyntheses/{job_id}?api-version=2024-08-01"
headers = {"Ocp-Apim-Subscription-Key": azure_speech_key}

response = requests.get(url, headers=headers)
response.json()["status"]

'Succeeded'

In [27]:
avatar_video_url = response.json()['outputs']["result"]
print(f"Video url file to download: {avatar_video_url}")

Video url file to download: https://stttssvcproduse2.blob.core.windows.net/batchsynthesis-output/5360d8b7261445ada93c3f12bce4c66e/jobid_09Oct2024124415/0001.mp4?skoid=0e90ea1b-e7d5-446c-a409-5088e95a73d5&sktid=33e01921-4d64-4f8c-a055-5bdaffd5e33d&skt=2024-10-09T12%3A37%3A30Z&ske=2024-10-15T12%3A42%3A30Z&sks=b&skv=2023-11-03&sv=2023-11-03&st=2024-10-09T12%3A39%3A30Z&se=2024-10-12T12%3A44%3A30Z&sr=b&sp=rl&sig=X8jq5df%2B6hoR1CLlKh6EfrCMxOhRFXyseyuj2eKzdR8%3D
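
The result URL is a shared access signature (SAS) link, and its `se` query parameter carries the time at which the link stops working. A minimal sketch that extracts that expiry, assuming the timestamp uses the `YYYY-MM-DDTHH:MM:SSZ` form seen in the URL above. The helper name `sas_expiry` is hypothetical:

```python
import datetime
from urllib.parse import parse_qs, urlparse


def sas_expiry(url):
    """Return the expiry time from the `se` query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    parsed = datetime.datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=datetime.timezone.utc)
```

Calling `sas_expiry(avatar_video_url)` before downloading shows how long the link remains usable.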


In [28]:
print("Downloading the video file...")
# The result URL is pre-signed (SAS), so the subscription key header is not sent
response = requests.get(avatar_video_url, stream=True)

# Check that the download succeeded before writing anything to disk
if response.status_code == 200:
    with open(avatar_video_file, 'wb') as file:
        # Write in chunks to avoid memory overload
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                file.write(chunk)
    print(f"[OK] File downloaded successfully: {avatar_video_file}\n")
    !ls $avatar_video_file -lh
else:
    print(
        f"[ERROR] Cannot download the file. Status code: {response.status_code}")

Downloading the video file...
[OK] File downloaded successfully: avatar_video_Harry_business.mp4

-rwxrwxrwx 1 root root 1.5M Oct  9 12:44 avatar_video_Harry_business.mp4
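
A successful status code does not guarantee that the saved bytes are a video. A minimal sketch of a sanity check, relying on the fact that MP4 (ISO base media) files carry the marker `ftyp` in bytes 4 to 8. The helper name `looks_like_mp4` is hypothetical, and the check does not apply to `webm` output:

```python
def looks_like_mp4(path):
    """Return True if bytes 4-8 of the file spell 'ftyp', as in an MP4 header."""
    with open(path, "rb") as f:
        header = f.read(12)
    return len(header) >= 8 and header[4:8] == b"ftyp"
```

For example, `looks_like_mp4(avatar_video_file)` can be evaluated before playing the file.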


## 6 Playing the avatar video file

In [29]:
Video.from_file(avatar_video_file, loop=False)

Video(value=b'\x00\x00\x00 ftypisom\x00\x00\x02\x00isomiso2avc1mp41\x00\x00/\xcduuid...', loop='False')

In [30]:
videolink = FileLink(path=avatar_video_file)
print("Click to download the video file:")
videolink

Click to download the video file:
