# Text to Speech avatar

## From text to speech with a video avatar provided by Azure Speech Services
Custom text to speech avatar allows you to create a customized, one-of-a-kind synthetic talking avatar for your application. With custom text to speech avatar, you can build a unique and natural-looking avatar for your product or brand by providing video recording data of your selected actors. If you also create a custom neural voice for the same actor and use it as the avatar's voice, the avatar will be even more realistic.

<img src="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/media/custom-avatar-workflow.png#lightbox">

> https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/what-is-custom-text-to-speech-avatar 
> https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-speech-announces-public-preview-of-text-to-speech/ba-p/3981448

In [6]:
import datetime
import json
import logging
import requests
import sys
import time

from ipywidgets import Video
from IPython.display import display
from moviepy.editor import VideoFileClip
from pathlib import Path

In [7]:
sys.version

'3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]'

In [17]:
dt = datetime.datetime.today().strftime('%d-%b-%Y %H:%M:%S')
print(f"Today is {dt}")

Today is 16-Nov-2023 19:32:38


In [9]:
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="[%(asctime)s] %(message)s",
    datefmt="%m/%d/%Y %I:%M:%S %p %Z",
)
logger = logging.getLogger(__name__)

In [10]:
# Azure Speech service credentials (resource key and region)
azure_speech_key = "tobecompleted"
azure_speech_region = "tobecompleted"
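
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the credentials could instead be read from environment variables. The helper name `load_speech_credentials` and the variable names `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` are assumptions made for this example, not names defined by Azure or by this notebook.

```python
import os


def load_speech_credentials(env=os.environ):
    """Return (key, region) read from environment variables.

    The variable names below are an assumption made for this sketch;
    use whatever names your own setup defines.
    """
    key = env.get("AZURE_SPEECH_KEY")
    region = env.get("AZURE_SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION first")
    return key, region
```

With both variables exported, the cell above could become `azure_speech_key, azure_speech_region = load_speech_credentials()`.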

In [11]:
service_host = "customvoice.api.speech.microsoft.com"  # Do not change

## Functions

In [12]:
def submit_synthesis(prompt):
    url = f"https://{azure_speech_region}.{service_host}/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar"

    header = {
        "Ocp-Apim-Subscription-Key": azure_speech_key,
        "Content-Type": "application/json",
    }

    payload = {
        "displayName": "Simple avatar synthesis",
        "description": "Simple avatar synthesis description",
        "textType": "PlainText",
        "synthesisConfig": {
            "voice": "en-US-JennyNeural",
        },
        "customVoices": {
            # "YOUR_CUSTOM_VOICE_NAME": "YOUR_CUSTOM_VOICE_ID"
        },
        "inputs": [
            {
                "text": prompt,
            },
        ],
        "properties": {
            "customized": False,  # set to True if you want to use customized avatar
            "talkingAvatarCharacter": "lisa",  # talking avatar character
            "talkingAvatarStyle": "graceful-sitting",  # talking avatar style, required for prebuilt avatar, optional for custom avatar
            "videoFormat": "webm",  # mp4 or webm, webm is required for transparent background
            "videoCodec": "vp9",  # hevc, h264 or vp9, vp9 is required for transparent background; default is hevc
            "subtitleType": "soft_embedded",
            "backgroundColor": "transparent",
        },
    }

    response = requests.post(url, json.dumps(payload), headers=header)

    if response.status_code < 400:
        logger.info("Batch avatar synthesis job submitted successfully")
        logger.info(f'Job ID: {response.json()["id"]}')
        return response.json()["id"]

    else:
        logger.error(f"Failed to submit batch avatar synthesis job: {response.text}")
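
The payload in `submit_synthesis` fixes the voice, character, and video settings inline. One possible refactoring, sketched below, moves the body into a small builder so these values can vary per call. The function `build_avatar_payload` and its parameter names are hypothetical; the field names and the rule that a transparent background needs `webm` with `vp9` are taken from the comments in the cell above.

```python
def build_avatar_payload(text, voice="en-US-JennyNeural", character="lisa",
                         style="graceful-sitting", video_format="webm",
                         video_codec="vp9", background_color="transparent"):
    """Build the JSON body for a talking-avatar batch synthesis request."""
    # Per the comments in submit_synthesis, transparency requires webm + vp9.
    if background_color == "transparent" and (video_format, video_codec) != ("webm", "vp9"):
        raise ValueError("A transparent background requires videoFormat=webm and videoCodec=vp9")

    return {
        "displayName": "Simple avatar synthesis",
        "description": "Simple avatar synthesis description",
        "textType": "PlainText",
        "synthesisConfig": {"voice": voice},
        "customVoices": {},
        "inputs": [{"text": text}],
        "properties": {
            "customized": False,  # prebuilt avatar
            "talkingAvatarCharacter": character,
            "talkingAvatarStyle": style,
            "videoFormat": video_format,
            "videoCodec": video_codec,
            "subtitleType": "soft_embedded",
            "backgroundColor": background_color,
        },
    }
```

Inside `submit_synthesis`, the literal dictionary could then be replaced by `payload = build_avatar_payload(prompt)`, and an invalid format combination fails locally before any request is sent.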

In [13]:
def get_synthesis(job_id):
    global avatar_url
    url = f"https://{azure_speech_region}.{service_host}/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar/{job_id}"

    header = {"Ocp-Apim-Subscription-Key": azure_speech_key}

    response = requests.get(url, headers=header)

    if response.status_code < 400:
        logger.debug("Get batch synthesis job successfully")
        logger.debug(response.json())

        status = response.json()["status"]

        if status == "Succeeded":
            avatar_url = response.json()["outputs"]["result"]
            logger.info(f"Batch synthesis job succeeded, download URL: {avatar_url}")

        return status
    else:
        logger.error(f"Failed to get batch synthesis job: {response.text}")
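
`get_synthesis` hands the download URL back through the global `avatar_url`, which stays undefined if the job never succeeds. A possible alternative, sketched here, is to return the status and the URL together. The helper `extract_job_result` is hypothetical; it only relies on the `status` and `outputs.result` fields that the function above already reads.

```python
def extract_job_result(job):
    """Return (status, download_url) from a decoded job response.

    download_url is None until the job status is "Succeeded".
    """
    status = job["status"]
    download_url = job["outputs"]["result"] if status == "Succeeded" else None
    return status, download_url
```

A caller could then write `status, avatar_url = extract_job_result(response.json())` and check `avatar_url is not None` before trying to download anything.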

In [14]:
def list_synthesis_jobs(skip: int = 0, top: int = 100):
    """List all batch synthesis jobs in the subscription"""

    url = f"https://{azure_speech_region}.{service_host}/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar?skip={skip}&top={top}"

    header = {"Ocp-Apim-Subscription-Key": azure_speech_key}

    response = requests.get(url, headers=header)

    if response.status_code < 400:
        logger.info(
            f'List batch synthesis jobs successfully, got {len(response.json()["values"])} jobs'
        )
        logger.info(response.json())
    else:
        logger.error(f"Failed to list batch synthesis jobs: {response.text}")

## Test

In [20]:
prompt = f"""
I am Lisa, your avatar powered by Azure Speech Services.
Today is {dt}.

Let me explain you what is Azure Open AI service.

Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-4, GPT-3.5-Turbo, and Embeddings model series. In addition, the new GPT-4 and GPT-3.5-Turbo model series have now reached general availability. These models can be easily adapted to your specific task including but not limited to content generation, summarization, semantic search, and natural language to code translation. Users can access the service through REST APIs, Python SDK, or our web-based interface in the Azure OpenAI Studio.

At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Generative models such as the ones available in Azure OpenAI have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmful content. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases, incorporating Microsoft’s principles for responsible AI use, building content filters to support customers, and providing responsible AI implementation guidance to onboarded customers.

To learn more go to https://azure.microsoft.com/en-us/products/ai-services/ai-speech

Thank you and have a good day.
"""

In [21]:
print(prompt)


I am Lisa, your avatar powered by Azure Speech Services.
Today is 16-Nov-2023 19:32:38.

Let me explain you what is Azure Open AI service.

Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-4, GPT-3.5-Turbo, and Embeddings model series. In addition, the new GPT-4 and GPT-3.5-Turbo model series have now reached general availability. These models can be easily adapted to your specific task including but not limited to content generation, summarization, semantic search, and natural language to code translation. Users can access the service through REST APIs, Python SDK, or our web-based interface in the Azure OpenAI Studio.

At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Generative models such as the ones available in Azure OpenAI have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmf

## Avatar batch generation

In [22]:
start = time.time()

job_id = submit_synthesis(prompt)

if job_id is not None:
    while True:
        status = get_synthesis(job_id)
        if status == "Succeeded":
            logger.info("Done! Azure batch avatar synthesis job succeeded.")
            elapsed = time.time() - start
            # timedelta renders the duration as H:MM:SS.ffffff
            print(f"Elapsed time: {datetime.timedelta(seconds=elapsed)}")

            break
        elif status == "Failed":
            logger.error("Failed")
            break
        else:
            logger.info(f"Please wait. Status: [{status}]")
            time.sleep(30)

[11/16/2023 07:33:33 PM UTC] Batch avatar synthesis job submitted successfully
[11/16/2023 07:33:33 PM UTC] Job ID: 85a1d905-48bc-4ade-bda3-0e69357032de
[11/16/2023 07:33:33 PM UTC] Please wait. Status: [NotStarted]
[11/16/2023 07:34:03 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:34:33 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:35:03 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:35:33 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:36:04 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:36:34 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:37:04 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:37:34 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:38:04 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:38:34 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:39:04 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:39:34 PM UTC] Please wait. Status: [Running]
[11/16/2023 07:40:04 PM UTC] Please wait. Status: [Running]
[11/
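
The polling loop above runs until the job reports `Succeeded` or `Failed`, so it never ends if the status request keeps failing (in which case `get_synthesis` returns `None`). The sketch below adds an overall timeout. The function `wait_for_job` and its parameters are hypothetical, and the 30-minute default is an arbitrary choice for illustration, not a service limit.

```python
import time


def wait_for_job(get_status, job_id, interval=30.0, timeout=1800.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(job_id) until the job finishes or the timeout expires.

    Returns "Succeeded" or "Failed"; raises TimeoutError otherwise.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        if status in ("Succeeded", "Failed"):
            return status
        # Any other value (NotStarted, Running, or None on an HTTP error)
        # means "try again" until the deadline passes.
        if clock() >= deadline:
            raise TimeoutError(f"Job {job_id} still in status {status!r} after {timeout} s")
        sleep(interval)
```

In this notebook it could be called as `wait_for_job(get_synthesis, job_id)`, keeping the 30-second interval used above.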

## Avatar video file

In [24]:
print(f"\033[1;31;34mThis is the prompt to speak:\n {prompt}")

This is the prompt to speak:
 
I am Lisa, your avatar powered by Azure Speech Services.
Today is 16-Nov-2023 19:32:38.

Let me explain you what is Azure Open AI service.

Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-4, GPT-3.5-Turbo, and Embeddings model series. In addition, the new GPT-4 and GPT-3.5-Turbo model series have now reached general availability. These models can be easily adapted to your specific task including but not limited to content generation, summarization, semantic search, and natural language to code translation. Users can access the service through REST APIs, Python SDK, or our web-based interface in the Azure OpenAI Studio.

At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Generative models such as the ones available in Azure OpenAI have significant potential benefits, but without careful design and thoughtful mitigations, such models have the poten

In [25]:
# Save the avatar video locally.
# moviepy reads the result URL and re-encodes the stream into an MP4 file.

avatar_file = (
    "azure_avatar_" + datetime.datetime.today().strftime("%d%b%Y_%H%M%S") + ".mp4"
)
VideoFileClip(avatar_url).write_videofile(avatar_file, verbose=False, logger=None)
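
The job was submitted with `webm`/`vp9` and a transparent background, but writing the result through moviepy as `.mp4` re-encodes it, and an H.264 MP4 cannot carry an alpha channel. If the original file is wanted unchanged, one option is to download the bytes directly. This is a minimal sketch using only the standard library (`urllib` rather than `requests`, so it runs without extra packages); the helper name `download_video` is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(url, destination):
    """Stream the content at url into destination and return the path."""
    destination = Path(destination)
    # copyfileobj copies in chunks, so the whole video is never held in memory.
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

For example, `download_video(avatar_url, "azure_avatar.webm")` would keep the file exactly as the service produced it.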




In [26]:
# Playing the avatar video

Video.from_file(avatar_file)

Video(value=b'\x00\x00\x00 ftypisom\x00\x00\x02\x00isomiso2avc1mp41\x00\x00\x00\x08free...')