# About

This Python notebook demonstrates reading a sample HTML table containing French video captions and then synthesizing audio for the captions using Watson Text to Speech.

This notebook is a sample to support a paper presentation at CASCONxEVOKE 2021.

See:
- [CASCONxEVOKE 2021](https://pheedloop.com/casconevoke2021/site/home)
- [Presentation](https://pheedloop.com/casconevoke2021/site/sessions/?id=SESPZ87C5K5VZKT28)
- [Samples GitHub repo](https://github.com/spackows/CASCON-2021_Processing_video)

# Step 1: Download translated transcript text

A translated transcript for a sample product video is available here: [product_video_French.html](https://raw.githubusercontent.com/spackows/CASCON-2021_Processing_video/main/sample-product-video/product_video_French.html)

In this step, download that transcript file to the notebook working directory.

In [1]:
# Download the file
import urllib.request
transcript_url = "https://raw.githubusercontent.com/spackows/CASCON-2021_Processing_video/main/sample-product-video/product_video_French.html"
transcript_filename = "product_video_French.html"
urllib.request.urlretrieve( transcript_url, transcript_filename )

('product_video_French.html', <http.client.HTTPMessage at 0x7fce3c4cd340>)

In [2]:
# View the contents of the working directory
!ls

product_video_French.html


# Step 2: Read the HTML and extract the French captions

Each row in the transcript table contains two things:
- The timestamp of the caption in the English video
- One sentence translated into French by professional translators

In [24]:
with open( transcript_filename, encoding="utf8" ) as file:
    html = file.read()
print( html )

<html>
<body>
<table>
<tr>
<td>00:00</td>
<td>Cette vidéo vous explique comment essayer gratuitement IBM Cloud Pak for Data as a Service.</td>
</tr>
<tr>
<td>00:06</td>
<td>IBM Cloud Pak for Data as a Service permet d'accéder à Watson Studio, à Watson Knowledge Catalog, à Data Refinery, à des fonctions d'apprentissage automatique et d'apprentissage en profondeur, à des modèles de reconnaissance visuelle et à des tableaux de bord.</td>
</tr>
<tr>
<td>00:22</td>
<td>Vous pouvez vous inscrire sur <a href="https://dataplatform.cloud.ibm.com">dataplatform.cloud.ibm.com</a> pour essayer le service gratuitement.</td>
</tr>
<tr>
<td>00:28</td>
<td>Lorsque vous vous inscrivez pour obtenir un compte Cloud Pak for Data, vous êtes automatiquement inscrit pour recevoir un compte IBM Cloud gratuit.</td>
</tr>
</table>
</body>
</html>



In [68]:
from bs4 import BeautifulSoup

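def getCaptions( html ):
    # Name the parser explicitly so BeautifulSoup does not have to guess one
    soup = BeautifulSoup( html, "html.parser" )
    tr_arr = soup.find("table").find_all("tr")
    captions_arr = []
    for tr in tr_arr:
        # First cell is the timestamp, second cell is the French caption
        td_arr = tr.find_all("td")
        captions_arr.append( { "ts" : td_arr[0].text, "caption" : td_arr[1].text } )
    return captions_arr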

In [69]:
import json
captions_arr = getCaptions( html )
print( json.dumps( captions_arr, indent=2, ensure_ascii=False ) )

[
  {
    "ts": "00:00",
    "caption": "Cette vidéo vous explique comment essayer gratuitement IBM Cloud Pak for Data as a Service."
  },
  {
    "ts": "00:06",
    "caption": "IBM Cloud Pak for Data as a Service permet d'accéder à Watson Studio, à Watson Knowledge Catalog, à Data Refinery, à des fonctions d'apprentissage automatique et d'apprentissage en profondeur, à des modèles de reconnaissance visuelle et à des tableaux de bord."
  },
  {
    "ts": "00:22",
    "caption": "Vous pouvez vous inscrire sur dataplatform.cloud.ibm.com pour essayer le service gratuitement."
  },
  {
    "ts": "00:28",
    "caption": "Lorsque vous vous inscrivez pour obtenir un compte Cloud Pak for Data, vous êtes automatiquement inscrit pour recevoir un compte IBM Cloud gratuit."
  }
]
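
The `ts` values above are strings, so any later timing arithmetic needs them as numbers. The following is a minimal sketch of that conversion, not part of the original notebook. The helper name `timestamp_to_seconds` and the `sample_captions` list are hypothetical; the sample values are copied from the output above.

```python
def timestamp_to_seconds(ts):
    """Convert a "MM:SS" (or "HH:MM:SS") caption timestamp to whole seconds."""
    seconds = 0
    # Each part is worth 60 times the part that follows it
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# Timestamps copied from the captions printed above (captions omitted for brevity)
sample_captions = [{"ts": "00:00"}, {"ts": "00:06"}, {"ts": "00:22"}, {"ts": "00:28"}]
start_times = [timestamp_to_seconds(c["ts"]) for c in sample_captions]
print(start_times)  # [0, 6, 22, 28]
```

In the notebook itself, the same helper could be applied to `captions_arr` directly.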


# Step 3: Get Credentials for Watson Text to Speech

1. Create a free (Lite plan) instance of the Watson Text to Speech service in the IBM Cloud catalog: [Watson Text to Speech](https://cloud.ibm.com/catalog/services/text-to-speech)
2. On the Service credentials tab of your Watson Text to Speech service instance, generate new credentials and then copy the apikey and the url[1] into the code cell below.

[1] Note, the url should be of the form: `https://api.<region>.text-to-speech.watson.cloud.ibm.com/instances/<unique-instance-ID>`

In [70]:
tts_apikey = ""
tts_url = ""
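
Before calling the service, it can help to check that both values were actually pasted in. The sketch below is an illustration only: the helper `check_tts_credentials` and the pattern `TTS_URL_PATTERN` are hypothetical, and the pattern is an assumption derived solely from the URL form shown in note [1], so a valid URL of a different shape would be flagged incorrectly.

```python
import re

# Assumed pattern, based only on the URL form given in note [1]
TTS_URL_PATTERN = re.compile(
    r"^https://api\.[a-z0-9-]+\.text-to-speech\.watson\.cloud\.ibm\.com"
    r"/instances/[A-Za-z0-9-]+$"
)

def check_tts_credentials(apikey, url):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = []
    if not apikey.strip():
        problems.append("apikey is empty")
    if not TTS_URL_PATTERN.match(url.strip()):
        problems.append("url does not match the expected form")
    return problems

# Placeholder values only, not real credentials
print(check_tts_credentials("", ""))
print(check_tts_credentials(
    "my-apikey",
    "https://api.us-south.text-to-speech.watson.cloud.ibm.com/instances/abc-123"
))
```

The first call reports both problems and the second returns an empty list. This check only looks at the shape of the values; it does not confirm that the credentials are accepted by the service.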

# Step 4: Test sending French text to Watson Text to Speech

In this step, send one French caption to Watson Text to Speech and save the synthesized audio to an MP3 file.

See:
- [Watson Text to Speech API documentation](https://cloud.ibm.com/apidocs/text-to-speech?code=python)
- [Authentication](https://cloud.ibm.com/apidocs/text-to-speech?code=python#authentication)
- [`synthesize`](https://cloud.ibm.com/apidocs/text-to-speech?code=python#synthesize)

In [None]:
# Install the library
!pip install --upgrade "ibm-watson>=5.3.0"

In [72]:
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator( tts_apikey )
text_to_speech = TextToSpeechV1( authenticator=authenticator )
text_to_speech.set_service_url( tts_url )

In [76]:
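# Synthesize one caption
# Name the audio file after the caption timestamp, e.g. "00:00" -> "00_00.mp3"
audio_filename_0 = captions_arr[0]["ts"].replace( ":", "_" ) + ".mp3"
caption_0 = captions_arr[0]["caption"]
# synthesize() returns the audio as binary content, so write the file in binary mode
with open( audio_filename_0, 'wb') as audio_file:
    audio_file.write( text_to_speech.synthesize( caption_0, voice="fr-CA_LouiseV3Voice", accept="audio/mp3" ).get_result().content )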

In [77]:
# View the contents of the working directory
!ls

00_00.mp3  product_video_French.html


In [78]:
# Play the audio
import IPython
IPython.display.Audio( audio_filename_0 )

# Step 5: Generate translated audio

To generate translated audio for our English-produced video, we follow these steps:
1. Read the captions from the translated transcript table returned by the professional (human) translators
2. Synthesize audio for each translated caption and save it in a short audio file
3. Recombine the short audio files, adjusting timestamps if necessary (e.g. if the translated audio is longer than the corresponding English audio)
4. Adjust the video length, if necessary (e.g. if one or more sections of video need to be longer because the corresponding audio is longer)

Step 2 is demonstrated below.

In [82]:
# Now, use a loop to synthesize all captions
audio_filenames_arr = []
for caption_obj in captions_arr:
    # Name each audio file after its caption timestamp, e.g. "00:06" -> "00_06.mp3"
    audio_filename = caption_obj["ts"].replace( ":", "_" ) + ".mp3"
    audio_filenames_arr.append( audio_filename )
    caption = caption_obj["caption"]
    with open( audio_filename, 'wb') as audio_file:
        audio_file.write( text_to_speech.synthesize( caption, voice="fr-CA_LouiseV3Voice", accept="audio/mp3" ).get_result().content )

In [83]:
# View the contents of the working directory
!ls

00_00.mp3  00_06.mp3  00_22.mp3  00_28.mp3  product_video_French.html


In [84]:
IPython.display.Audio( audio_filenames_arr[0] )

In [85]:
IPython.display.Audio( audio_filenames_arr[1] )

In [86]:
IPython.display.Audio( audio_filenames_arr[2] )

In [87]:
IPython.display.Audio( audio_filenames_arr[3] )
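
The timestamp adjustment mentioned in step 3 of the list at the start of Step 5 is not demonstrated in this notebook. As a rough sketch of one possible approach, the function below pushes a caption's start time later whenever the previous French clip would still be playing. The function name `adjust_start_times` is hypothetical, and the `durations` values are invented for illustration; measuring the real length of each MP3 file would need an audio library that this notebook does not use.

```python
def adjust_start_times(original_starts, durations):
    """Shift caption start times so that no clip overlaps the previous one.

    original_starts: start of each caption in the English video, in seconds
    durations: length of each synthesized French clip, in seconds
    """
    adjusted = []
    earliest = 0.0  # earliest moment the next clip is allowed to start
    for start, duration in zip(original_starts, durations):
        # Keep the original start unless the previous clip is still playing
        new_start = max(float(start), earliest)
        adjusted.append(new_start)
        earliest = new_start + duration
    return adjusted

original_starts = [0, 6, 22, 28]    # from the transcript timestamps
durations = [5.0, 18.5, 6.0, 9.0]   # hypothetical clip lengths, in seconds
print(adjust_start_times(original_starts, durations))  # [0.0, 6.0, 24.5, 30.5]
```

With these made-up durations, the second clip runs past 00:22, so the third and fourth captions start 2.5 seconds later than in the English video. Those shifts indicate where the video itself would need to be lengthened (step 4).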