# Solution Notebook - Key Features of Speech AI

*Focus on **Video Translator** key functions: generate an SRT file, manage silences during translation, superimpose audio on video*

## Instructions

### 1 - Generate subtitles

- Import dependencies and load environment variables

In [1]:
import os
import requests
from dotenv import load_dotenv

# access the environment variables from the .env file
load_dotenv('/workspace/.env')

True
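
A `True` result from `load_dotenv` does not guarantee that every variable the notebook needs is defined. The sketch below checks the two variables read later by `transcribe` (`ASR_EN_US_ENDPOINT` and `OVH_AI_ENDPOINTS_ACCESS_TOKEN`). The helper name `missing_env_vars` is hypothetical and not part of the workshop code.

```python
import os

# Variable names read later in this notebook by `transcribe`
REQUIRED_VARS = ("ASR_EN_US_ENDPOINT", "OVH_AI_ENDPOINTS_ACCESS_TOKEN")

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this right after `load_dotenv` surfaces a missing token before the first request is sent, rather than as an HTTP error from the endpoint.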

- Transcribe audio into text using **ASR**

In [2]:
# ASR function
def transcribe(audio_input):
    
    # get response from endpoint, sending the audio file as multipart input
    # (the context manager closes the file once the request is done)
    with open(audio_input, 'rb') as audio:
        response = requests.post(
            os.environ.get('ASR_EN_US_ENDPOINT'), 
            files=[('audio', audio)], 
            headers={
                'accept': 'application/json',
                "Authorization": f"Bearer {os.environ.get('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
            }
        )
    
    # return complete transcription as [transcript, start, end] entries
    output_asr = []
    if response.status_code == 200:
        
        for resp in response.json():
            
            result = resp['alternatives'][0]
            words = result['words']
            
            # skip results without word timings (no start or end time available)
            if not words:
                continue
            
            # the sentence starts with its first word and ends with its last word
            start_sentence = words[0]['start_time']
            end_sentence = words[-1]['end_time']
            
            # add start time and stop time of the sentence
            output_asr.append([result['transcript'], start_sentence, end_sentence])
    else:
        print("Error:", response.status_code)

    return output_asr
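
To see how the sentence timings are derived without calling the endpoint, the sketch below applies the same extraction to a hand-written payload. The payload is hypothetical: it only mirrors the fields that `transcribe` reads (`alternatives`, `transcript`, `words`, `start_time`, `end_time`), and a real response may carry more. The helper name `sentence_timings` is also an assumption.

```python
# Hypothetical payload, shaped like the fields that `transcribe` reads
sample_response = [
    {
        "alternatives": [
            {
                "transcript": "Hello world. ",
                "words": [
                    {"start_time": 1080, "end_time": 1400},
                    {"start_time": 1500, "end_time": 2040},
                ],
            }
        ]
    },
]

def sentence_timings(payload):
    """Return [transcript, start, end] for each result that has word timings."""
    sentences = []
    for item in payload:
        best = item["alternatives"][0]   # first alternative, as in `transcribe`
        words = best["words"]
        if not words:
            continue                     # no timings to extract
        sentences.append(
            [best["transcript"], words[0]["start_time"], words[-1]["end_time"]]
        )
    return sentences

print(sentence_timings(sample_response))
```

The start of a sentence is the start of its first word and the end is the end of its last word, so only `words[0]` and `words[-1]` are needed.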

- Convert milliseconds into an SRT timecode (`HH:MM:SS,mmm`)

In [3]:
# convert ms into timecode (HH:MM:SS,mmm)
def ms_to_timecode(x):
     
    hour, x = divmod(x, 3600000)
    minute, x = divmod(x, 60000)
    second, millisecond = divmod(x, 1000)

    return '%.2d:%.2d:%.2d,%.3d' % (hour, minute, second, millisecond)
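
As a worked example of the conversion, 3,723,456 ms splits into 1 hour (3,600,000 ms), 2 minutes (120,000 ms), 3 seconds (3,000 ms) and 456 ms. The sketch below restates the arithmetic and adds the inverse conversion, which is handy for checking a timecode by hand. Both helper names (`timecode_from_ms`, `timecode_to_ms`) are hypothetical and not part of the workshop code.

```python
def timecode_from_ms(ms):
    """Format a duration in milliseconds as HH:MM:SS,mmm."""
    hours, rest = divmod(int(ms), 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def timecode_to_ms(timecode):
    """Parse HH:MM:SS,mmm back into milliseconds."""
    clock, millis = timecode.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)

print(timecode_from_ms(3_723_456))     # 01:02:03,456
print(timecode_to_ms("00:00:12,320"))  # 12320
```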

- Create SRT file for video subtitles

In [4]:
# create SRT file with subtitles
def generate_str_file(output_asr):
    
    lines = []
    # each entry is [transcript, start time, end time]
    for t, (text, start, end) in enumerate(output_asr):
        lines.append("%d" % t)
        lines.append("%s --> %s" % (ms_to_timecode(start), ms_to_timecode(end)))
        lines.append(text)
        lines.append('')
    
    return '\n'.join(lines)
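
SRT cues are conventionally numbered from 1, whereas the function above starts at 0. The variant below is a minimal sketch of 1-based numbering, assuming the same `[transcript, start, end]` entries. The names `build_srt` and `_timecode` are hypothetical, and stripping the trailing space from each transcript is an extra choice made here, not something the notebook does.

```python
def _timecode(ms):
    """Format milliseconds as HH:MM:SS,mmm (same layout as ms_to_timecode)."""
    hours, rest = divmod(int(ms), 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def build_srt(entries, first_index=1):
    """Build SRT text from [transcript, start_ms, end_ms] entries."""
    blocks = []
    for index, (text, start, end) in enumerate(entries, start=first_index):
        blocks.append(
            f"{index}\n{_timecode(start)} --> {_timecode(end)}\n{text.strip()}\n"
        )
    # a blank line separates consecutive cues
    return "\n".join(blocks)

print(build_srt([["Is it done? Perfect. Let's start. ", 18040, 20200]]))
```

Passing `first_index=0` reproduces the numbering used in the rest of this notebook.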

- Play audio sample

In [5]:
from IPython.display import Audio

audio_input = "audio_ovhcloud_en_1.wav"
Audio(f"/workspace/workshop-mastering-speech-ai/samples/audio_samples/{audio_input}")

- Get results from **ASR**

In [6]:
# audio transcription
output_asr = transcribe(f"/workspace/workshop-mastering-speech-ai/samples/audio_samples/{audio_input}")
print("Transcription output - ASR:\n\n", output_asr)

Transcription output - ASR:

 [["Is your domain name currently with another registrar and you'd like to transfer it to Vh Cloud? You can do this via a transfer procedure. ", 1080, 8040], ['Before you begin, check out the list of prerequisites available in the description of this video. ', 12320, 17240], ["Is it done? Perfect. Let's start. ", 18040, 20200], ["1st, it's important to ensure that the information associated with the domain name is up to date. If not, please contact the current domain name registrar and correct them. ", 21320, 31080], ['The 2nd step is to unlock the domain name. This operation is carried out through the current domain name registrar. ', 33280, 39960], ['If you are not sure how to perform this stepp, please contact their customer support team for assistance in completing this step. ', 40800, 46680], ['Once unlocked, you will receive a transfer code. ', 49040, 51600], ['This code is sometimes referred to as different names such as transfer code, Auth code, Aut
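
The timings above leave gaps between sentences, for example between 8040 ms and 12320 ms. Since the notebook's focus includes managing silences, the sketch below lists those gaps from entries shaped like `output_asr`. The helper name `find_silences` and the `min_gap_ms` threshold are assumptions, and the transcripts in the sample are shortened placeholders; only the timings come from the output above.

```python
def find_silences(entries, min_gap_ms=0):
    """Return (gap_start, gap_end, duration) between consecutive sentences."""
    silences = []
    for previous, current in zip(entries, entries[1:]):
        gap_start = previous[2]   # end time of the previous sentence
        gap_end = current[1]      # start time of the next sentence
        duration = gap_end - gap_start
        if duration > min_gap_ms:
            silences.append((gap_start, gap_end, duration))
    return silences

# Timings taken from the first three sentences of the transcription output
sample = [
    ["sentence 1", 1080, 8040],
    ["sentence 2", 12320, 17240],
    ["sentence 3", 18040, 20200],
]

print(find_silences(sample))                   # [(8040, 12320, 4280), (17240, 18040, 800)]
print(find_silences(sample, min_gap_ms=1000))  # [(8040, 12320, 4280)]
```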

- Generate **SRT file**

In [7]:
# subtitles generation (build the SRT content once, then write and display it)
srt_content = generate_str_file(output_asr)
with open(f"/workspace/subtitles_{audio_input[:-4]}.srt", 'w') as f:
    f.write(srt_content)

print("Generated subtitles - SRT file:\n\n", srt_content)

Generated subtitles - SRT file:

 0
00:00:01,080 --> 00:00:08,040
Is your domain name currently with another registrar and you'd like to transfer it to Vh Cloud? You can do this via a transfer procedure. 

1
00:00:12,320 --> 00:00:17,240
Before you begin, check out the list of prerequisites available in the description of this video. 

2
00:00:18,040 --> 00:00:20,200
Is it done? Perfect. Let's start. 

3
00:00:21,320 --> 00:00:31,080
1st, it's important to ensure that the information associated with the domain name is up to date. If not, please contact the current domain name registrar and correct them. 

4
00:00:33,280 --> 00:00:39,960
The 2nd step is to unlock the domain name. This operation is carried out through the current domain name registrar. 

5
00:00:40,800 --> 00:00:46,680
If you are not sure how to perform this stepp, please contact their customer support team for assistance in completing this step. 

6
00:00:49,040 --> 00:00:51,600
Once unlocked, you will receive a transfe