<a href="https://colab.research.google.com/github/saad688/hydrogen-template/blob/main/WhisperX.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Transcription and Translation

AdvancedCI.

For technical assistance, contact beining@chineseaci.com.

## Step 1: Install

Execute all steps.

In [1]:
#@title Step 1.1: GPU Model
!nvidia-smi

Fri Apr 25 15:13:54 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   46C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

You need a GPU with at least 11 GiB of VRAM. If no GPU is attached, enable one under "Runtime → Change runtime type".

In [18]:
#@title Step 1.2 Install packages - takes ~2 mins
%pip install srt requests tqdm googletrans==4.0.0rc1 git+https://github.com/cnbeining/whisperX-silero.git httpx aiometer --quiet
!wget -q https://chineseaci.com/tools/megadl -O ./megadl
!chmod +x ./megadl

  Preparing metadata (setup.py) ... done


In [17]:
!pip uninstall langsmith

Found existing installation: langsmith 0.3.11
Uninstalling langsmith-0.3.11:
  Would remove:
    /usr/local/lib/python3.11/dist-packages/langsmith-0.3.11.dist-info/*
    /usr/local/lib/python3.11/dist-packages/langsmith/*
Proceed (Y/n)? y
  Successfully uninstalled langsmith-0.3.11


In [19]:
# https://github.com/m-bain/whisperX/issues/118
!pip install pyannote-audio -U --quiet

import locale
# Workaround for Colab reporting a non-UTF-8 locale, which breaks later shell/pip calls
locale.getpreferredencoding = lambda: "UTF-8"

## Step 2: Setup WhisperX

WhisperX offers two ways to transcribe:

1. Run the whole file through Whisper, as the original repo does;
2. Run Voice Activity Detection (VAD) first, and only run Whisper on sections that contain human voice.

Approach 2 is the intended default; it is enabled by setting `vad_filter` to True in Step 2.1.
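Roughly speaking, approach 2 turns the VAD output into a list of speech intervals and packs neighbouring intervals into chunks of at most 30 seconds (the `chunk_length = 30` passed to `VADSegmentPipeline` in Step 2.2) before handing them to Whisper. The sketch below illustrates that packing step in plain Python; `pack_speech_intervals` is a hypothetical helper written for this guide, not WhisperX's actual code.

```python
def pack_speech_intervals(intervals, max_chunk=30.0):
    """Greedily pack (start, end) speech intervals, in seconds, into chunks
    no longer than max_chunk. Hypothetical illustration of VAD-based cutting."""
    chunks = []
    chunk_start = chunk_end = None
    for start, end in sorted(intervals):
        if chunk_start is None:
            chunk_start, chunk_end = start, end
        elif end - chunk_start <= max_chunk:
            # Extend the current chunk; short silences between intervals stay inside it
            chunk_end = max(chunk_end, end)
        else:
            chunks.append((chunk_start, chunk_end))
            chunk_start, chunk_end = start, end
    if chunk_start is not None:
        chunks.append((chunk_start, chunk_end))
    # Note: a single interval longer than max_chunk is kept whole here;
    # a real pipeline would also have to split it.
    return chunks


speech = [(0.7, 4.7), (6.0, 14.0), (15.0, 26.0), (26.0, 44.0)]
print(pack_speech_intervals(speech))  # [(0.7, 26.0), (26.0, 44.0)]
```

Each resulting chunk is an independent piece of work, which is why `parallel_bs` can run several of them through Whisper at once.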

### Variables

See the [original repo](https://github.com/m-bain/whisperX/blob/main/whisperx/transcribe.py#L574-L616) for variables not covered here; they are left at their default values.

- `vad_filter`: Run Voice Activity Detection before Whisper speech-to-text; the WhisperX authors [reported improved performance](https://arxiv.org/abs/2303.00747).
- `hf_token`: VAD or diarization (not used in this notebook) requires a valid Hugging Face token from an account that has accepted the EULA for the models. Follow the guidance in the [original repo](https://github.com/m-bain/whisperX#voice-activity-detection-filtering--diarization) and fill in `hf_token` with the generated Hugging Face API key.
- `parallel_bs`: Number of Whisper tasks to execute in parallel. Only takes effect when `vad_filter` is True: VAD cuts the audio into multiple smaller segments, and this variable controls how many segments are processed at the same time.
- `transcription_cutoff_char`: Maximum number of characters per subtitle line.
- `transcription_sentence_interval`: Short sentences are merged into longer ones when they are close together, so subtitles do not flash on screen too briefly; sentences are kept separate when no voice activity is detected for too long. This sets the maximum gap, in seconds, across which sentences may be merged.
- `translation_thread`: Maximum number of translation requests started per second.
- `translation_lines_per_request`: Number of lines sent to the translation engine in a single API call. Sending several lines together gives the engine some context and improves quality, but some engines have much lower input length limits.
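To illustrate `translation_lines_per_request`: the translation step (7.1.4) joins each group of lines with a `---` separator line, sends one request per group, and splits the reply on `---` again. Below is a minimal sketch of that round trip, with a stand-in `fake_translate` function (hypothetical) in place of a real engine, plus a length check that catches engines which drop or merge separators. It assumes no source line itself contains `---`.

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


def fake_translate(text):
    # Stand-in for a real translation engine: upper-cases the text
    # and leaves the "---" separators untouched.
    return text.upper()


def translate_lines(lines, lines_per_request, translate=fake_translate):
    """Translate lines in groups of lines_per_request, one call per group."""
    translated = []
    for chunk in chunks(lines, lines_per_request):
        reply = translate('\n---\n'.join(chunk))
        translated.extend(part.strip() for part in reply.split('---'))
    if len(translated) != len(lines):
        raise ValueError(f"expected {len(lines)} lines, got {len(translated)}")
    return translated


print(translate_lines(["or high above the Himalayas.", "And then,", "Two!"], 2))
# ['OR HIGH ABOVE THE HIMALAYAS.', 'AND THEN,', 'TWO!']
```

Larger groups mean fewer requests and more context per request, at the cost of hitting engine input limits sooner.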

In [24]:
#@title Step 2.1: Import model and setup parameters
# setup model
import torch

vad_filter = False #@param {type:"boolean"}
hf_token = 'hf_' #@param {type:"string"}
parallel_bs = 4 #@param {type:"integer"}
temperature = 0 #@param {type:"integer"}
temperature_increment_on_fallback = 0.2 #@param {type:"number"}
interpolate_method = 'nearest' #@param ["nearest", "linear", "ignore"]

align_extend = 2 #@param {type:"integer"}
align_from_prev = True #@param {type:"boolean"}


transcription_cutoff_char = 120 #@param {type:"integer"}
transcription_sentence_interval = 1.5 #@param {type:"number"}

translation_thread = 8 #@param {type:"integer"}
translation_lines_per_request = 10 #@param {type:"integer"}
device = "cuda"
#device = "cuda" if torch.cuda.is_available() else "cpu"

vad_model = 'silero' #@param ["silero", "pya"]

import whisperx

In [25]:
#@title Step 2.2: Select and load Model.
import whisperx
if vad_filter:
    from whisperx.vad import VADSegmentPipeline
    vad_pipeline = VADSegmentPipeline(model_name=vad_model,
                                      device=device,
                                      hf_token=hf_token,
                                      chunk_length=30)


# Whisper is on large-v2 by default
model_name = 'large-v2' #@param ["tiny", "small", "medium", "large-v2", "tiny.en", "small.en", "medium.en"]

model = whisperx.load_model(model_name)

100%|█████████████████████████████████████| 2.87G/2.87G [00:55<00:00, 55.7MiB/s]


Colab should have enough VRAM for any model on *any* GPU it provides, including `large`; expect ~13 GB of VRAM usage for the `large` model. Prefer `medium` over `medium.en`, per the [original paper](https://cdn.openai.com/papers/whisper.pdf).

If you change the model in the middle of execution (only do this if you know what you are doing), disconnect and reconnect to avoid running out of VRAM. All progress and uploaded/generated files will be lost.

Select the desired model and run the cell above. Downloading the `medium` model takes ~3 mins; move on to Step 3 while waiting.

## Step 3: Prepare audio for transcription

### Step 3.1: Convert video to audio

**Audio file has to be in WAV format until https://github.com/m-bain/whisperX/issues/118 is fixed.**

While you are waiting (the download takes ~2 mins), convert the original video to WAV with `FFmpeg` by running:

`ffmpeg -i Air_Crash_Investigation_S22E071.mp4 -vn -ar 16000 -c:a pcm_s16le aci.wav`

or use GUI tools like

- `Maruko Toolbox` (Windows only): https://maruko.appinn.me/
- `Handbrake` for all platforms: https://handbrake.fr/downloads.php .

_ACICFG has a sponsorship relationship with `Maruko Toolbox`._

#### Alternatively:

FFmpeg is installed on Colab, so you can also convert on the Colab instance by running this in a code cell:
`!ffmpeg -i aci.mp4 -vn -ar 16000 -c:a pcm_s16le aci.wav`
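To confirm the conversion worked before transcribing, a quick check with Python's standard `wave` module reads the sample rate, channel count and duration. `check_wav` is a hypothetical helper written for this guide, not part of WhisperX; `wave` only reads uncompressed PCM WAV files, which is FFmpeg's default WAV codec (`pcm_s16le`).

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return (sample_rate, channels, duration_seconds) of a PCM WAV file,
    printing a warning if the sample rate differs from expected_rate."""
    with wave.open(path, 'rb') as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    if rate != expected_rate:
        print(f"Warning: {path} is {rate} Hz, expected {expected_rate} Hz")
    return rate, channels, duration


# Usage after conversion, e.g.: check_wav('aci.wav')
```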


### Step 3.2: Upload audio file to Colab

Rename the file to something simple, without spaces or special characters.
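If you prefer to rename programmatically, a hypothetical helper like the one below replaces every run of characters other than ASCII letters, digits, `_` and `-` with an underscore and lower-cases the extension. After uploading, you could pass its result to `os.rename` in a cell; it is only a convenience sketch.

```python
import re


def benign_name(filename):
    """Replace spaces and special characters with underscores, keeping the extension."""
    stem, dot, ext = filename.rpartition('.')
    if not dot:
        stem, ext = filename, ''
    # Collapse each run of disallowed characters into a single underscore
    stem = re.sub(r'[^A-Za-z0-9_-]+', '_', stem).strip('_') or 'audio'
    return f"{stem}.{ext.lower()}" if ext else stem


print(benign_name('Air Crash Investigation S22E07 (1).WAV'))
# Air_Crash_Investigation_S22E07_1.wav
```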

Click the "file" icon on the left, then the "upload to session storage" button to upload the audio file. Maximum upload speed is ~1.1 MB/s.


**NOTE: All uploaded and generated files exist only for this session and are deleted when you disconnect from the instance. They cannot be recovered!**

Type the exact name of the uploaded audio file below (it should end with `.wav`) and select the main language of the audio.

In [29]:
audio_file_name = '002-111.mp3' #@param {type:"string"}
audio_file_language = 'arabic' #@param ['english', 'chinese', 'german', 'spanish', 'russian', 'korean', 'french', 'japanese', 'portuguese', 'turkish', 'polish', 'catalan', 'dutch', 'arabic', 'swedish', 'italian', 'indonesian', 'hindi', 'finnish', 'vietnamese', 'hebrew', 'ukrainian', 'greek', 'malay', 'czech', 'romanian', 'danish', 'hungarian', 'tamil', 'norwegian', 'thai', 'urdu', 'croatian', 'bulgarian', 'lithuanian', 'latin', 'maori', 'malayalam', 'welsh', 'slovak', 'telugu', 'persian', 'latvian', 'bengali', 'serbian', 'azerbaijani', 'slovenian', 'kannada', 'estonian', 'macedonian', 'breton', 'basque', 'icelandic', 'armenian', 'nepali', 'mongolian', 'bosnian', 'kazakh', 'albanian', 'swahili', 'galician', 'marathi', 'punjabi', 'sinhala', 'khmer', 'shona', 'yoruba', 'somali', 'afrikaans', 'occitan', 'georgian', 'belarusian', 'tajik', 'sindhi', 'gujarati', 'amharic', 'yiddish', 'lao', 'uzbek', 'faroese', 'haitian creole', 'pashto', 'turkmen', 'nynorsk', 'maltese', 'sanskrit', 'luxembourgish', 'myanmar', 'tibetan', 'tagalog', 'malagasy', 'assamese', 'tatar', 'hawaiian', 'lingala', 'hausa', 'bashkir', 'javanese', 'sundanese']

## Step 4: Transcribe

Execute the steps below.

Expected speed for the `medium.en` model is roughly 5× real time on a T4, so a 45-minute episode takes about 9 minutes. Larger models take longer.
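The estimate is simple arithmetic: processing time ≈ audio duration ÷ speed-up factor. A tiny planning helper (hypothetical; the default 5× factor is the figure quoted above for `medium.en` on a T4, and other models and GPUs differ):

```python
def estimated_processing_minutes(audio_minutes, speedup=5.0):
    """Processing time in minutes for audio_minutes of audio at a given
    real-time speed-up factor (5.0 means 5x faster than real time)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_minutes / speedup


print(estimated_processing_minutes(45))  # 9.0
```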

Wait until the process finishes.

If you have enough VRAM, you can keep the Whisper model loaded on the GPU; if you have enough RAM, use `model = model.cpu()` to offload it to the CPU so you can come back to it later.

In [30]:
#@title Step 4.1 Transcribe the audio file
# Do the work
# Speed for medium.en is 5X on T4 - aka 45 min episode should take ~8 mins
#result = model.transcribe(audio_file_name, verbose=True, language=audio_file_language)

audio_path = './' + audio_file_name


if vad_filter:
    if parallel_bs > 1:
        print("Performing VAD and parallel transcribing ...")
        result = whisperx.transcribe_with_vad_parallel(model, audio_path, vad_pipeline, temperature=temperature, batch_size=parallel_bs, language=audio_file_language, task='transcribe', verbose=True)
    else:
        print("Performing VAD...")
        result = whisperx.transcribe_with_vad(model, audio_path, vad_pipeline, temperature=temperature, verbose=True, language=audio_file_language)
else:
    print("Performing transcription...")
    result = whisperx.transcribe(model, audio_path, temperature=temperature, verbose=True, language=audio_file_language)



Performing transcription...
[00:00.000 --> 00:05.000]  بسم الله الرحمن الرحيم
[00:06.000 --> 00:14.000]  الف لا مين
[00:15.000 --> 00:18.000]  ذلك الكتاب لا ريب فيه
[00:18.000 --> 00:21.000]  هدى للمتقين
[00:21.000 --> 00:26.000]  الذين يؤمنون بالغيب ويقيمون الصلاة
[00:26.000 --> 00:32.000]  ومما رزقناهم ينفقون
[00:32.000 --> 00:44.000]  والذين يؤمنون بما أنزل إليك وما أنزل من قبلك وبالآخرة هم يوقنون
[00:44.000 --> 00:56.000]  109 أولئك على هدى من ربهم وأولئك هم المفلحون
[00:56.000 --> 01:09.000]  110 إن الذين كفروا سواء عليهم أأنذرتهم أم لم تنذرهم لا يؤمنون
[01:09.000 --> 01:21.000]  143. ختم الله على قلوبهم وعلى سمعهم وعلى أبصارهم غشاوة ولهم عذاب عظيم
[01:21.000 --> 01:32.000]  144. ومن الناس من يقول آمنا بالله وباليوم الآخر وما هم بمؤمنين
[01:32.000 --> 01:43.800]  109 يخادعون الله والذين آمنوا وما يخدعون إلا أنفسهم وما يشعرون
[01:43.800 --> 01:56.000]  110 في قلوبهم مرض فزادهم الله مرضا ولهم عذاب أليم بما كانوا يكذبون
[01:56.000 --> 02:05.000]  109 وإذا قيل لهم لا تفسدوا في الأرض ق

In [31]:
#@title Step 4.2 (Optional) Offload Whisper model to free up GPU memory
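# Offload the model to free up GPU memory: a T4 on Colab has ~15 GiB of VRAM, which should be enough for the medium model and VAD
# model = model.cpu()  # alternative: keep the model in system RAM for later use
import gc
del model
gc.collect()
torch.cuda.empty_cache()  # release cached blocks so the alignment model can use them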

## Step 5: Forced Alignment

With the transcription and its _better_ timestamps in hand, we now use forced alignment to obtain per-word (and per-character) timestamps, then redo sentence segmentation ourselves.

Execute the steps below.
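Forced alignment cannot always place every word (words containing characters missing from the alignment model's dictionary, such as digits, are a common case), which leaves gaps in the timestamps; `interpolate_method` in Step 2.1 controls how such gaps are filled. The sketch below shows what a "nearest" fill means on a plain list of word start times, where `None` marks an unaligned word. `fill_nearest` is a hypothetical illustration, not WhisperX's own implementation.

```python
def fill_nearest(values):
    """Fill None entries with the value of the nearest known neighbour
    (by list position; ties go to the earlier neighbour)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
        else:
            # Sort key (distance, index) prefers the earlier neighbour on ties
            nearest = min(known, key=lambda k: (abs(k - i), k))
            filled.append(values[nearest])
    return filled


starts = [0.74, 1.14, None, None, 3.05]
print(fill_nearest(starts))  # [0.74, 1.14, 1.14, 3.05, 3.05]
```

A "linear" fill would instead place the missing words evenly between their known neighbours.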

In [32]:
#@title Step 5.1 Load the alignment model
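# result["language"] is the name chosen in Step 3.2 (e.g. "arabic"); map it to a code such as "ar",
# or use it as-is if it is already a code
language_code = whisperx.tokenizer.TO_LANGUAGE_CODE.get(result["language"].lower(), result["language"])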
model_alignment, metadata_alignment = whisperx.alignment.load_align_model(language_code=language_code, device=device)

Error while fetching `HF_TOKEN` secret value from your vault: 'Requesting secret HF_TOKEN timed out. Secrets can only be fetched when running from the Colab UI.'.
You are not authenticated with the Hugging Face Hub in this notebook.
If the error persists, please let us know by opening an issue on GitHub (https://github.com/huggingface/huggingface_hub/issues/new).



In [35]:
#@title Step 5.2 Conduct forced alignment

result_aligned = whisperx.alignment.align(result["segments"], model_alignment, metadata_alignment, audio_path, device,
                        extend_duration=align_extend, start_from_previous=align_from_prev, interpolate_method=interpolate_method)



In [36]:
print(result_aligned)

{'segments': [{'start': 0.7421203438395416, 'end': 4.73352435530086, 'text': ' بسم الله الرحمن الرحيم', 'word-segments':                    start       end     score  segment-text-start  \
subsegment-idx                                                     
0               0.742120  1.103152  0.625964                   1   
0               1.143266  1.845272  0.753358                   5   
0               1.885387  3.008596  0.784563                  10   
0               3.048711  4.733524  0.786936                  17   

                segment-text-end  word-idx  
subsegment-idx                              
0                              4         0  
0                              9         1  
0                             16         2  
0                             23         3  , 'char-segments':     subsegment-idx  word-idx char     start       end     score  \
0                0         0            NaN       NaN       NaN   
1                0         0    ب  0.742120  0.8

In [None]:
#@title Step 5.3 Set up timecode reformatting

import copy

# A word ending in one of these characters closes a sentence
END_OF_SENTENCE_SYMBOLS = ('.', '!', '?', ',', ';', ':')


def word_segment_to_sentence(segments):
    """
    Convert word segments to sentences.
    :param segments: [{"text": "Hello", "start": 1.1, "end": 2.2}, {"text": "World!", "start": 3.3, "end": 4.4}]
    :type segments: list of dicts
    :return: Segments, but with sentences instead of words.
    :rtype: list of dicts  [{"text": "Hello World!", "start": 1.1, "end": 4.4}]
    """
    sentence_results = []
    current_sentence = {"text": "", "start": 0, "end": 0}

    for segment in segments:
        word = segment["text"].strip()
        if not word:
            continue  # skip empty words instead of failing on word[-1]
        if current_sentence["text"] == "":
            current_sentence["start"] = segment["start"]
            current_sentence["text"] = word
        else:
            current_sentence["text"] += " " + word
        current_sentence["end"] = segment["end"]
        if word[-1] in END_OF_SENTENCE_SYMBOLS:
            sentence_results.append(current_sentence)
            current_sentence = {"text": "", "start": 0, "end": 0}

    # Keep trailing words that are not followed by punctuation
    if current_sentence["text"]:
        sentence_results.append(current_sentence)
    return sentence_results


def sentence_segments_merger(segments, max_text_len=80, max_segment_interval=2):
    """
    Merge consecutive sentence segments while the merged text stays shorter than max_text_len
    and the gap between them is shorter than max_segment_interval seconds.
    :param segments: [{"text": "Hello, World!", "start": 1.1, "end": 4.4}, {"text": "Hello, World!", "start": 4.5, "end": 6.0}]
    :type segments: list of dicts
    :param max_text_len: Max length of the text
    :type max_text_len: int
    :param max_segment_interval: Max silence (seconds) between two sentences that may be merged
    :type max_segment_interval: float
    :return: Segments, but with merged sentences.
    :rtype: list of dicts  [{"text": "Hello, World! Hello, World!", "start": 1.1, "end": 6.0}]
    """
    merged_segments = []
    current_segment = None

    for segment in segments:
        if current_segment is None:
            current_segment = copy.deepcopy(segment)
            continue
        merged_text = current_segment["text"] + " " + segment["text"]
        if segment["start"] - current_segment["end"] < max_segment_interval and \
                len(merged_text) < max_text_len:
            current_segment["text"] = merged_text
            current_segment["end"] = segment["end"]
        else:
            merged_segments.append(current_segment)
            current_segment = copy.deepcopy(segment)

    # Flush the last segment
    if current_segment is not None:
        merged_segments.append(current_segment)
    return merged_segments


# Build subtitle lines from the aligned words.
# Assumes align() returned a "word_segments" list of {"text", "start", "end"} dicts.
result_merged = sentence_segments_merger(word_segment_to_sentence(result_aligned["word_segments"]),
                                         max_text_len=transcription_cutoff_char,
                                         max_segment_interval=transcription_sentence_interval)


## Step 6: Collect results

### Step 6.1: Convert transcription to SRT

Execute the 3 cells below to preview the result.
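An SRT file is a sequence of numbered blocks: index, a `start --> end` timecode line in `HH:MM:SS,mmm`, the text, and a blank line. The `srt` package handles this in the cells below; for reference, converting the float seconds produced by alignment amounts to the following hand-written sketch (not the `srt` package's code).

```python
def srt_timecode(seconds):
    """Format a non-negative number of seconds as an SRT timecode HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_block(index, start, end, text):
    """One SRT entry: index, timecode line, text, trailing blank line."""
    return f"{index}\n{srt_timecode(start)} --> {srt_timecode(end)}\n{text}\n\n"


print(srt_block(1, 1.3, 3.06, "or high above the Himalayas."), end="")
```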

In [None]:
#@title Import packages
import srt
from datetime import timedelta

In [None]:
#@title Create SRT with transcription
result_srt_list = []
for i, v in enumerate(result_merged):
    result_srt_list.append(srt.Subtitle(index=i, start=timedelta(seconds=v['start']), end=timedelta(seconds=v['end']), content=v['text'].strip()))

composed_transcription = srt.compose(result_srt_list)

In [None]:
#@title Optional: Peek the transcription SRT file
print(composed_transcription)

1
00:00:01,300 --> 00:00:03,060
or high above the Himalayas.

2
00:00:06,421 --> 00:00:13,423
The windshield of Sichuan Airlines flight 8633 cracks. It's on the inside. That's not good.

3
00:00:16,584 --> 00:00:16,944
And then,

4
00:00:25,795 --> 00:00:28,296
It's almost like having a bomb explode right beside you.

5
00:00:28,716 --> 00:00:33,177
The decompression forces the first officer halfway out of the plane. Two!

6
00:00:34,438 --> 00:00:40,879
In freezing temperatures and rapidly running out of oxygen, the captain needs to find a way to get his plane to safety.

7
00:00:41,319 --> 00:00:48,321
Mutable consciousness is probably about 40 seconds. The lives of everyone on board now rest in the hands of one man.

8
00:01:16,959 --> 00:01:23,263
High above central China, Szechuan Airlines Flight 8633 reaches cruising altitude.

9
00:01:26,265 --> 00:01:26,665
Level at 3, 2, 1.

10
00:01:37,132 --> 00:01:43,373
45-year-old Captain Liu Chuanxian is a highly experienced former milit

### Step 6.2: Generate and download transcribed srt

Enter the desired name for the transcribed SRT file below, then execute the 2 cells below.

In [None]:
#@title Step 6.2 Name of the transcribed SRT to generate; should end with `.srt`

transcribed_srt_name = 'transcribed.srt' #@param {type:"string"}


In [None]:
#@title Write the SRT
with open(transcribed_srt_name, 'w', encoding="utf-8") as f:
    f.write(composed_transcription)

You should see an `.srt` file with the chosen name in the file browser on the left: right-click it and download it.

## Step 7: Translate

### Step 7.1: Execute translation

By default we use DeepL's undocumented API (the `deepl_gmx` engine) for translation; other engines can be selected in 7.1.2.

Execute the 3 cells below.
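Step 7.1.4 fires the requests through `aiometer.run_all`, which limits how many calls start per second and returns results in input order. A similar idea can be sketched with only the standard library, capping the number of in-flight requests with an `asyncio.Semaphore`; `fake_engine` and `translate_all` are hypothetical names, and the fake engine stands in for a real HTTP call.

```python
import asyncio


async def fake_engine(text):
    # Hypothetical stand-in for an HTTP call to a translation API
    await asyncio.sleep(0.01)
    return f"[translated] {text}"


async def translate_all(texts, engine, max_at_once=8):
    """Translate texts concurrently with at most max_at_once requests in flight.
    asyncio.gather returns results in the same order as texts."""
    semaphore = asyncio.Semaphore(max_at_once)

    async def worker(text):
        async with semaphore:
            return await engine(text)

    return await asyncio.gather(*(worker(t) for t in texts))


results = asyncio.run(translate_all(["line one", "line two"], fake_engine))
print(results)
```

Inside Colab an event loop is already running, so there you would write `results = await translate_all(...)` instead of calling `asyncio.run`.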

In [None]:
#@title 7.1.1 Import packages


#import requests
import random
from functools import partial
from hashlib import md5
from tqdm.notebook import tqdm
from tqdm.contrib.concurrent import process_map  # or thread_map
from googletrans import Translator
# from joblib import Parallel, delayed
import aiometer
import httpx
from time import time

%autoawait asyncio

In [None]:
#@title 7.1.2 Setup Variables: Thread Number, Source Language, Target Language

result_list_translated = []
result_list_assembled = []
#s = requests.Session()
session_async = httpx.AsyncClient()

google_translator = Translator()


chunk_size = "8" #@param [1, 2, 4, 8, 16, 24, 32, 64, 128]
thread_num = 16 #@param [1, 2, 4, 6, 8, 12, 16]
source_lang = "EN" #@param ["auto", "BG", "CS", "DA", "DE", "EL", "EN", "EN-GB", "EN-US", "ES", "ET", "FI", "FR", "HU", "ID", "IT", "JA", "LT", "LV", "NL", "PL", "PT", "PT-BR", "PT-PT", "RO", "RU", "SK", "SL", "SV", "TR", "UK", "ZH"]
target_lang = "ZH" #@param ["BG", "CS", "DA", "DE", "EL", "EN", "EN-GB", "EN-US", "ES", "ET", "FI", "FR", "HU", "ID", "IT", "JA", "LT", "LV", "NL", "PL", "PT", "PT-BR", "PT-PT", "RO", "RU", "SK", "SL", "SV", "TR", "UK", "ZH"]
translation_engine = "deepl_gmx" #@param ["deepl_gmx", "py-googletrans", "deepl_backup", "baidu-api"]

baidu_app_id = '' #@param {type:"string"}
baidu_secret_key = '' #@param {type:"string"}

# py-googletrans expects "zh-cn" rather than "ZH" for Simplified Chinese
if target_lang == "ZH" and translation_engine == "py-googletrans":
    target_lang = "zh-cn"


- `source_lang`, `target_lang`: Language codes; see [ISO_639-1 codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes).
- `translation_engine`:
  - `deepl_gmx`: Powered by GMX and DeepL. Fast, but quality is a bit lower than the original DeepL. Data governance: Germany.
  - `py-googletrans`: Powered by the unofficial Google Translate AJAX API. Data governance: US.
  - `deepl_backup`: Powered by ACICFG with DeepL. Same quality as the original DeepL; contact us if you want to use this API. Data governance: Canada, Germany, and other unspecified locations; no logs are kept on ACICFG's server.
  - `baidu-api`: Powered by [Baidu Fanyi](http://api.fanyi.baidu.com/). Data governance: Mainland China.
    - `baidu_app_id` and `baidu_secret_key` are optional; they are only required when you use `baidu-api`.
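For `baidu-api`, every request carries a signature: the MD5 hex digest of `appid + q + salt + secret_key`, as computed in `translate_via_baidu` in 7.1.3. A standalone sketch of building the request parameters; `baidu_sign_params` is a hypothetical helper, and `my_app_id` / `my_secret` are placeholders, not real keys.

```python
import random
from hashlib import md5


def baidu_sign_params(app_id, secret_key, query, source='auto', target='zh', salt=None):
    """Build the query parameters for Baidu Fanyi's translate endpoint,
    mirroring the signing logic of translate_via_baidu in Step 7.1.3."""
    if salt is None:
        salt = random.randint(32768, 65536)
    # Signature: MD5 over appid + query + salt + secret key, UTF-8 encoded
    sign = md5((app_id + query + str(salt) + secret_key).encode('utf-8')).hexdigest()
    return {'appid': app_id, 'q': query, 'from': source, 'to': target,
            'salt': salt, 'sign': sign}


params = baidu_sign_params('my_app_id', 'my_secret', 'Hello', target='zh', salt=40000)
print(params['sign'])
```

The secret key itself never travels with the request; only the digest does.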

In [None]:
#@title 7.1.3 Setup Translation Engine
import asyncio


def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


async def translate_via_googletrans(content):
    # googletrans 4.0.0rc1 is synchronous: run it in a worker thread so aiometer can await it
    try:
        resp = await asyncio.to_thread(google_translator.translate, content,
                                       src=source_lang.lower(), dest=target_lang.lower())
        return resp.text
    except Exception as e:
        print(content)
        print(e)
        return ''


async def translate_via_gmx(content):
    try:
        resp = await session_async.get('https://search.gmx.com/translate', params={"q": content, "source": source_lang.lower(), "target": target_lang.lower(), "lang": 'en', "reload": "true"})
        return resp.json()['Translation']
    except Exception as e:
        print(content)
        print(e)
        return ''


async def translate_via_deepl_backup(content):
    resp_json = {}
    try:
        resp = await session_async.post('https://deepl.cnbeining.com/translate', json={"text": content, "source_lang": source_lang, "target_lang": target_lang})
        resp_json = resp.json()
        return resp_json['result']['texts'][0]['text']
    except Exception as e:
        print(content)
        print(e)
        # Report the API's own error message when the JSON body carries one
        if isinstance(resp_json, dict) and resp_json.get('code') not in (None, 200):
            print('Error calling API: ' + str(resp_json.get('msg')))
        return ''


async def translate_via_baidu(content):
    # Credentials come from the baidu_app_id / baidu_secret_key fields in 7.1.2
    salt = random.randint(32768, 65536)
    sign = md5((baidu_app_id + content + str(salt) + baidu_secret_key).encode('utf-8')).hexdigest()
    payload = {'appid': baidu_app_id, 'q': content, 'from': source_lang.lower(), 'to': target_lang.lower(), 'salt': salt, 'sign': sign}
    try:
        resp = await session_async.post('http://api.fanyi.baidu.com/api/trans/vip/translate', params=payload)
        # Baidu returns one trans_result entry per input line: join them back together
        return '\n'.join(item['dst'] for item in resp.json()['trans_result'])
    except Exception as e:
        print(content)
        print(e)
        return ''


# Use the engine selected in 7.1.2
if translation_engine == "deepl_gmx":
    translation_function = translate_via_gmx
elif translation_engine == "deepl_backup":
    translation_function = translate_via_deepl_backup
elif translation_engine == "py-googletrans":
    translation_function = translate_via_googletrans
elif translation_engine == "baidu-api":
    translation_function = translate_via_baidu
else:
    raise ValueError(f"Unknown translation engine: {translation_engine}")

In [None]:
#@title 7.1.4 Call API for translation: ~1.2 x thread number lines/sec when single threaded
# preprocess source texts
source_texts = [line['text'].strip() for line in result_merged]
chunk_size = translation_lines_per_request
source_text_chunks = list(chunks(source_texts, int(chunk_size)))
source_text_chunks_merged = ['\n---\n'.join(chunk) for chunk in source_text_chunks]

result_list_translated = []
result_api_call = await aiometer.run_all([partial(translation_function, i) for i in source_text_chunks_merged],
        max_per_second=translation_thread,  # max number of requests started per second
    )
for chunk in result_api_call:
    chunk = [i.strip() for i in chunk.split('---')]  # in case the translator messes up the line breaks
    result_list_translated.extend(chunk)

print(len(result_list_translated))
if len(result_list_translated) != len(source_texts):
    print(f"Warning: {len(source_texts)} source lines but {len(result_list_translated)} translated lines; subtitles will not line up.")

299


### 7.1.5 Assemble results

In [None]:
#@title Create versions of SRT

result_list_assembled = []  # reset so that re-running this cell does not duplicate lines
for i, j in zip(source_texts, result_list_translated):
    result_list_assembled.append(f"{j}\n{i}")

result_srt_list_translated = []

for i, v in enumerate(result_merged):
    result_srt_list_translated.append(srt.Subtitle(index=i, start=timedelta(seconds=v['start']), end=timedelta(seconds=v['end']), content=result_list_translated[i]))

result_srt_list_assembled = []

for i, v in enumerate(result_merged):
    result_srt_list_assembled.append(srt.Subtitle(index=i, start=timedelta(seconds=v['start']), end=timedelta(seconds=v['end']), content=result_list_assembled[i]))

composed_transcription_translated = srt.compose(result_srt_list_translated)
composed_transcription_assembled = srt.compose(result_srt_list_assembled)

In [None]:
#@title Optional: Remove special characters according to ACICFG's standard
composed_transcription_translated = composed_transcription_translated.replace("。", " ").replace("，", " ").replace("、", " ").replace("！", "! ")
composed_transcription_assembled = composed_transcription_assembled.replace("。", " ").replace("，", " ").replace("、", " ").replace("！", "! ")
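The chained `.replace()` calls above map Chinese full-width punctuation to spaces (and `！` to `! `). An equivalent single-pass version uses `str.maketrans`; the mapping is copied from the cell above, while `ACICFG_PUNCTUATION` and `clean_punctuation` are names chosen for this sketch.

```python
# Mapping copied from the cell above: full-width punctuation -> replacement
ACICFG_PUNCTUATION = str.maketrans({"。": " ", "，": " ", "、": " ", "！": "! "})


def clean_punctuation(text):
    """Apply the ACICFG punctuation replacements in one pass."""
    return text.translate(ACICFG_PUNCTUATION)


print(clean_punctuation("然后，二！"))
```

Keeping the mapping in one dictionary makes it easier to add further replacements later.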

In [None]:
#@title Optional: Peek at the assembled results.
print(composed_transcription_assembled)

1
00:00:01,300 --> 00:00:03,060
或在喜马拉雅山的高处 
or high above the Himalayas.

2
00:00:06,421 --> 00:00:13,423
四川航空8633航班的挡风玻璃出现裂缝 它在里面 这不是好事 
The windshield of Sichuan Airlines flight 8633 cracks. It's on the inside. That's not good.

3
00:00:16,584 --> 00:00:16,944
然后 
And then,

4
00:00:25,795 --> 00:00:28,296
这几乎就像有一个炸弹在你身边爆炸 
It's almost like having a bomb explode right beside you.

5
00:00:28,716 --> 00:00:33,177
减压迫使大副半边身子离开飞机 二! 
The decompression forces the first officer halfway out of the plane. Two!

6
00:00:34,438 --> 00:00:40,879
在冰冷的温度和迅速耗尽的氧气中 机长需要找到一种方法让他的飞机到达安全地带 
In freezing temperatures and rapidly running out of oxygen, the captain needs to find a way to get his plane to safety.

7
00:00:41,319 --> 00:00:48,321
可变的意识可能是40秒左右 机上所有人的生命现在都掌握在一个人的手中 
Mutable consciousness is probably about 40 seconds. The lives of everyone on board now rest in the hands of one man.

8
00:01:16,959 --> 00:01:23,263
在中国中部的高空 四川航空8633航班达到巡航高度 
High above central China, Szechuan Airlines Flight 

### Step 7.2: Collect translated results

Toggle the selection below to generate an assembled SRT (translation followed by transcription) rather than a translated SRT (translation only), and change the desired filename.

Execute the 2 cells below and collect the generated SRT file from the file browser on the left.

In [None]:
#@title Generation Settings

translated_result_filename = 'translated.srt' #@param {type:"string"}
is_generate_assembled_srt = True #@param {type:"boolean"}

In [None]:
#@title Generate SRT
with open(translated_result_filename, 'w', encoding="utf-8") as f:
    if is_generate_assembled_srt:
        f.write(composed_transcription_assembled)
    else:
        f.write(composed_transcription_translated)

## Debug with xterm

In [None]:
!pip install colab-xterm
%load_ext colabxterm
%xterm

## Recycle

Recycle bin for code snippets: None of them should be necessary for ordinary users.

In [None]:
#@title Unused: Single Threaded version
import requests

s = requests.Session()
result_list_translated = []
result_list_assembled = []

with tqdm(total=len(result['segments'])) as pbar:
    for line in result['segments']:
        content = line['text'].strip()
        resp = None
        try:
            resp = s.post('https://deepl.cnbeining.com/translate', json={"text": content, "source_lang": "auto", "target_lang": "ZH"}).json()
            translated = resp['data']
        except Exception as e:
            print(line)
            print(e)
            if isinstance(resp, dict) and resp.get('code') != 200:
                print('Error calling API: ' + str(resp.get('msg')))
            # Keep the untranslated line so the subtitle count stays aligned
            result_list_translated.append(content)
            result_list_assembled.append(content)
        else:
            result_list_translated.append(translated)
            result_list_assembled.append(f"{translated}\n{content}")
        pbar.update(1)
