🎬 1. Extract Audio from Video

In [1]:
%pip install moviepy speechrecognition pydub

Note: you may need to restart the kernel to use updated packages.





In [2]:
from moviepy.editor import VideoFileClip

video = VideoFileClip("Subtitle-Video/video/Demo-2.mp4")
video.audio.write_audiofile("Subtitle-Video/temp_audio/temp_audio.wav")

MoviePy - Writing audio in Subtitle-Video/temp_audio/temp_audio.wav


                                                                    

MoviePy - Done.
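The cell above assumes that the Subtitle-Video/temp_audio folder already exists, and the last cell of the notebook makes the same assumption about Subtitle-Video/output. A minimal sketch for creating both up front is shown here; the helper name `ensure_dirs` is hypothetical and not part of MoviePy.

```python
from pathlib import Path


def ensure_dirs(base, names=("temp_audio", "output")):
    """Create the working sub-folders under `base` if they are missing.

    `ensure_dirs` is a hypothetical helper; the folder names mirror the
    paths used elsewhere in this notebook.
    """
    created = []
    for name in names:
        folder = Path(base) / name
        # parents=True also creates `base`; exist_ok=True makes re-runs safe
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `ensure_dirs("Subtitle-Video")` before the first `write_audiofile` call should make the notebook safe to re-run from a fresh checkout.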




🧠 2. Transcribe Audio to Text (Speech Recognition)

In [3]:
%pip install haystack-ai

Note: you may need to restart the kernel to use updated packages.





In [4]:
from haystack.components.audio import LocalWhisperTranscriber
from pydub import AudioSegment
from pydub.silence import split_on_silence

In [5]:
# Initialize the Whisper transcriber
transcriber = LocalWhisperTranscriber()
transcriber.warm_up()

In [6]:
# Transcribe the WAV file written in step 1
transcription = transcriber.run(sources=["./Subtitle-Video/temp_audio/temp_audio.wav"])

In [7]:
print(transcription["documents"][0].content)

 Sir Keir Starmer has hit the ground running on his first full day as Prime Minister, holding his first Cabinet meeting and then a press conference laying out his plans for the immediate future. He'll start a tour of the UK's nations tomorrow and warned tough decisions will be made early. He said both prisons and the NHS in England are broken, but that work to fix the NHS has already begun. And he confirmed the previous government's Rwanda migrants plan has been scrapped. Here's our political correspondent, Shihab Khan, on Sir Keir's first full day in charge.


In [8]:
transcript_text = transcription["documents"][0].content

In [9]:
transcript_text

" Sir Keir Starmer has hit the ground running on his first full day as Prime Minister, holding his first Cabinet meeting and then a press conference laying out his plans for the immediate future. He'll start a tour of the UK's nations tomorrow and warned tough decisions will be made early. He said both prisons and the NHS in England are broken, but that work to fix the NHS has already begun. And he confirmed the previous government's Rwanda migrants plan has been scrapped. Here's our political correspondent, Shihab Khan, on Sir Keir's first full day in charge."

In [10]:
import os

In [11]:
sound = AudioSegment.from_wav("Subtitle-Video/temp_audio/temp_audio.wav")

In [12]:
chunks = split_on_silence(
    sound,
    min_silence_len=300,                # Detect shorter pauses (was 700)
    silence_thresh=sound.dBFS - 16,     # Be more sensitive to soft speech
    keep_silence=300                    # Add 300ms silence before/after each chunk
)

In [13]:
# Subtitle entries collected as (start_seconds, end_seconds, text) tuples
subtitles = []

In [14]:
from pydub.silence import detect_nonsilent

nonsilent_ranges = detect_nonsilent(
    sound,
    min_silence_len=500,
    silence_thresh=sound.dBFS - 16
)

# Extend each detected range by 300ms (within bounds)
buffer_ms = 300
nonsilent_ranges = [
    (max(start - buffer_ms, 0), min(end + buffer_ms, len(sound)))
    for start, end in nonsilent_ranges
]
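Padding every range by 300 ms on both sides can make neighbouring ranges overlap whenever the silence between them is shorter than 600 ms. The run further down shows this: one range ends at 10683 ms while the next starts at 10628 ms. One possible way to handle it, sketched below, is to split each overlap at its midpoint. The helper name `trim_overlaps` is hypothetical, and the sketch assumes the ranges are sorted by start time.

```python
def trim_overlaps(ranges):
    """Split overlaps between consecutive (start_ms, end_ms) ranges at the midpoint.

    Hypothetical helper; assumes `ranges` is sorted by start time.
    """
    trimmed = [list(r) for r in ranges]
    for prev, cur in zip(trimmed, trimmed[1:]):
        if cur[0] < prev[1]:
            # Both ranges claim this stretch of audio; meet in the middle
            midpoint = (prev[1] + cur[0]) // 2
            prev[1] = midpoint
            cur[0] = midpoint
    return [tuple(r) for r in trimmed]


# Padded ranges as printed later in this notebook
padded = [(0, 10683), (10628, 23337), (23321, 28383), (28548, 33733)]
print(trim_overlaps(padded))
# → [(0, 10655), (10655, 23329), (23329, 28383), (28548, 33733)]
```

Splitting at the midpoint keeps one subtitle on screen at a time; merging the two ranges into one would be an alternative if longer subtitle blocks are acceptable.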

In [15]:
for i, (start_ms, end_ms) in enumerate(nonsilent_ranges):
    chunk = sound[start_ms:end_ms]
    chunk_filename = os.path.abspath(f"Subtitle-Video/temp_audio/chunk{i}.wav")
    chunk.export(chunk_filename, format="wav")

In [16]:
for i, (start_ms, end_ms) in enumerate(nonsilent_ranges):
    print(f"Start : {start_ms}, type {type(start_ms)}")
    print(f"End: {end_ms}, type: {type(end_ms)}")
    chunk = sound[start_ms:end_ms]
    chunk_filename = f"./Subtitle-Video/temp_audio/chunk{i}.mp3"
    chunk.export(chunk_filename, format="mp3")

    try:
        # transcribe() returns a list of Documents, one per source
        transcription_result = transcriber.transcribe(sources=[chunk_filename])
        text = transcription_result[0].content

        # Skip empty or whitespace-only results
        if text.strip():
            print(f"Transcribed text: {text}")
            # Convert milliseconds to seconds for MoviePy
            start_time = start_ms / 1000.0
            end_time = end_ms / 1000.0
            subtitles.append((start_time, end_time, text))
    except Exception as e:
        print(f"Error during transcription: {e}")
    finally:
        # Remove the temporary audio file even if transcription failed
        os.remove(chunk_filename)

Start : 0, type <class 'int'>
End: 10683, type: <class 'int'>
Transcribed text:  Sir Keir Starmer has hit the ground running on his first full day as Prime Minister, holding his first Cabinet meeting and then a press conference laying out his plans for the immediate future.
Start : 10628, type <class 'int'>
End: 23337, type: <class 'int'>
Transcribed text:  He'll start a tour of the UK's nations tomorrow and warned tough decisions will be made early. He said both prisons and the NHS in England are broken, but that work to fix the NHS has already begun.
Start : 23321, type <class 'int'>
End: 28383, type: <class 'int'>
Transcribed text:  and he confirmed the previous government's Rwanda migrants plan has been scrapped.
Start : 28548, type <class 'int'>
End: 33733, type: <class 'int'>
Transcribed text:  Here's our political correspondent, Shihab Khan, on Sir Keir's first full day in charge.
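The `subtitles` list now holds `(start_seconds, end_seconds, text)` tuples. As an optional side step before burning them into the video, the same list could be written out as a SubRip (.srt) file. The sketch below uses only the standard library; the helper names `format_srt_time` and `to_srt` are hypothetical.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(subtitles):
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(subtitles, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive entries
    return "\n".join(blocks)
```

For example, `to_srt(subtitles)` could be written to a file with `open(path, "w", encoding="utf-8")`, where `path` is whatever output location suits the project.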


💬 3. Burn Subtitles into Video

In [17]:
from moviepy.editor import TextClip, CompositeVideoClip

In [None]:
# import moviepy.config as mpy_config
# mpy_config.change_settings({"IMAGEMAGICK_BINARY": r"C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\convert.exe"})

In [18]:
from moviepy.config import change_settings
# In a raw string, backslashes are literal, so use single backslashes
change_settings({"IMAGEMAGICK_BINARY": r"C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\magick.exe"})

In [19]:
# Smoke test: render one text frame to confirm MoviePy can find ImageMagick.
# Fix for the "ImageMagick binary not detected" error on Windows:
# https://stackoverflow.com/questions/51928807/moviepy-cant-detect-imagemagick-binary-on-windows

clip = TextClip("Hello, world!", fontsize=70, color='white', bg_color='black')
clip.save_frame("test_output.png")

In [20]:
import textwrap

In [21]:
def split_text(text, max_words=8):
    """Split a sentence into smaller chunks with up to `max_words` each."""
    words = text.split()
    return [' '.join(words[i:i+max_words]) for i in range(0, len(words), max_words)]

In [22]:
subtitle_clips = []

for start, end, text in subtitles:
    duration = end - start
    chunks = split_text(text, max_words=8)

    print(f"Text chunk: {chunks}")

    word_counts = [len(chunk.split()) for chunk in chunks]
    total_words = sum(word_counts)

    # Compute proportional durations for each chunk
    chunk_durations = [(wc / total_words) * duration for wc in word_counts]

    current_time = start
    for chunk, chunk_duration in zip(chunks, chunk_durations):
        chunk_end = current_time + chunk_duration

        wrapped_text = textwrap.fill(chunk, width=50)

        txt_clip = TextClip(
            wrapped_text,
            fontsize=64,
            color='white',
            method='caption',
            size=(int(video.w * 0.9), None)
        )

        txt_clip = txt_clip.on_color(
            size=txt_clip.size,
            color=(0, 0, 0),
            col_opacity=0.6
        ).set_position(("center", "bottom"))

        txt_clip = txt_clip.set_start(current_time).set_duration(chunk_duration)
        subtitle_clips.append(txt_clip)

        current_time = chunk_end

Text chunk: ['Sir Keir Starmer has hit the ground running', 'on his first full day as Prime Minister,', 'holding his first Cabinet meeting and then a', 'press conference laying out his plans for the', 'immediate future.']
Text chunk: ["He'll start a tour of the UK's nations", 'tomorrow and warned tough decisions will be made', 'early. He said both prisons and the NHS', 'in England are broken, but that work to', 'fix the NHS has already begun.']
Text chunk: ["and he confirmed the previous government's Rwanda migrants", 'plan has been scrapped.']
Text chunk: ["Here's our political correspondent, Shihab Khan, on Sir", "Keir's first full day in charge."]
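The loop above gives each chunk a share of the segment's duration proportional to its word count. The timing arithmetic can be isolated from MoviePy and checked on its own, as in the sketch below. `chunk_timings` is a hypothetical helper; `split_text` is repeated from the earlier cell so the block runs standalone.

```python
def split_text(text, max_words=8):
    """Split a sentence into smaller chunks with up to `max_words` each."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def chunk_timings(start, end, text, max_words=8):
    """Return (chunk_start, chunk_end, chunk_text) tuples for one subtitle segment.

    Hypothetical helper: each chunk's duration is proportional to its word count.
    """
    chunks = split_text(text, max_words)
    total_words = sum(len(chunk.split()) for chunk in chunks)
    if total_words == 0:
        # Nothing to display; also avoids dividing by zero
        return []

    timings = []
    current = start
    for chunk in chunks:
        duration = (len(chunk.split()) / total_words) * (end - start)
        timings.append((current, current + duration, chunk))
        current += duration
    return timings
```

With a 10-second segment of ten words and `max_words=8`, the first chunk (eight words) would get roughly 8 seconds and the second (two words) the remaining 2 seconds.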


In [23]:
final_video = CompositeVideoClip([video] + subtitle_clips, size=video.size)
final_video.write_videofile("Subtitle-Video/output/Output-Demo2.mp4", codec="libx264", fps=video.fps)

Moviepy - Building video Subtitle-Video/output/Output-Demo2.mp4.
MoviePy - Writing audio in Output-Demo2TEMP_MPY_wvf_snd.mp3


                                                                    

MoviePy - Done.
Moviepy - Writing video Subtitle-Video/output/Output-Demo2.mp4



                                                                

Moviepy - Done !
Moviepy - video ready Subtitle-Video/output/Output-Demo2.mp4
