# Convert Video Previews

This notebook converts video previews from mp4 to mp3 with the Python package `pydub`, which relies on ffmpeg to decode and encode the files.

Import the necessary packages and set the path to the input and output folders.

In [2]:
# Import the necessary modules for the conversion process
import os
from pydub import AudioSegment

# Set the path to the folder containing video files
video_folder = "/Users/msheleg/Public/af_marketing"

# Set the path for the converted audio files
audio_folder = "/Users/eugkabanov/Downloads/af_marketing_audio"
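
Before running the conversion, it can help to confirm that the output folder exists and that ffmpeg is installed, since `export` fails if the target directory is missing. The sketch below is one possible check, not part of the original notebook: the helper names `prepare_output_folder` and `ffmpeg_available` are hypothetical, and the demonstration uses a temporary directory rather than the real paths.

```python
import os
import shutil
import tempfile


def prepare_output_folder(path):
    """Create the output folder if it is missing and return its path."""
    os.makedirs(path, exist_ok=True)
    return path


def ffmpeg_available():
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


# Demonstration on a temporary directory (hypothetical stand-in for audio_folder)
demo_root = tempfile.mkdtemp()
demo_audio_folder = prepare_output_folder(os.path.join(demo_root, "af_marketing_audio"))

print("Output folder ready:", os.path.isdir(demo_audio_folder))
print("ffmpeg found:", ffmpeg_available())
```

In the notebook itself, the same call would presumably be `prepare_output_folder(audio_folder)` right after the paths are set.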


The next cell loops over the files in the video folder and selects those with the `.mp4` extension. For each one, it builds the input and output paths with the `os` module, loads the audio track with `AudioSegment.from_file`, and calls `export` to save it as an mp3 file with the same base name as the input video.

In [3]:
# Loop through each file in the video folder
for filename in os.listdir(video_folder):

    # Check if the file is a video file
    if filename.endswith(".mp4"):

        # Set the paths for the input video file and the output audio file
        video_path = os.path.join(video_folder, filename)
        audio_path = os.path.join(audio_folder, os.path.splitext(filename)[0] + ".mp3")

        # Load the video clip using pydub
        video_clip = AudioSegment.from_file(video_path, format="mp4")

        # Extract the audio from the video clip and save it in mp3 format
        audio_clip = video_clip.export(audio_path, format="mp3")
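
The path handling in this loop can be separated from the conversion itself, which makes it easy to check which files would be converted before any audio is decoded. The following is a minimal sketch under that assumption: `plan_conversions` is a hypothetical helper, it uses `pathlib` instead of `os.path`, and unlike the cell above it matches the extension case-insensitively. The demonstration creates empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path


def plan_conversions(video_folder, audio_folder, extension=".mp4"):
    """Return (video_path, audio_path) pairs for every file with the given extension."""
    pairs = []
    for entry in sorted(Path(video_folder).iterdir()):
        # Compare the suffix in lower case so that ".MP4" is also accepted
        if entry.is_file() and entry.suffix.lower() == extension:
            pairs.append((entry, Path(audio_folder) / (entry.stem + ".mp3")))
    return pairs


# Demonstration with empty placeholder files (no real video data)
demo_videos = Path(tempfile.mkdtemp())
for name in ("a.mp4", "B.MP4", "notes.txt"):
    (demo_videos / name).touch()

pairs = plan_conversions(demo_videos, "converted_audio")
for video_path, audio_path in pairs:
    print(video_path.name, "->", audio_path)
```

Each pair could then be passed to `AudioSegment.from_file(...)` and `export(...)` in the same way as in the loop above.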

Next, we find every mp3 file in `audio_folder` whose size exceeds a threshold just under 25 MB and split it into chunks of at most 1,500,000 ms (25 minutes) each. The split is needed because the Whisper API does not accept files larger than 25 MB.
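
The link between a chunk's duration and its file size depends on the mp3 bitrate. As a rough sketch, assuming constant-bitrate encoding and ignoring headers and metadata, the size in bytes is the duration in milliseconds times the bitrate in kbps, divided by 8. The 128 kbps figure below is an assumption chosen for illustration, not a value taken from this notebook, and both function names are hypothetical.

```python
def estimated_mp3_bytes(duration_ms, bitrate_kbps):
    """Approximate size of a constant-bitrate mp3: ms * kbit/s / 8 = bytes."""
    return duration_ms * bitrate_kbps // 8


def max_chunk_ms(limit_bytes, bitrate_kbps):
    """Longest duration in ms whose estimated size stays within limit_bytes."""
    return limit_bytes * 8 // bitrate_kbps


assumed_bitrate_kbps = 128  # assumption for illustration only

# Estimated size of a 25-minute chunk at the assumed bitrate
print(estimated_mp3_bytes(1_500_000, assumed_bitrate_kbps))  # 24000000

# Longest chunk that fits in 25 * 1024 * 1024 bytes at the assumed bitrate
print(max_chunk_ms(25 * 1024 * 1024, assumed_bitrate_kbps))  # 1638400
```

Under these assumptions a 25-minute chunk comes to about 24 million bytes, which stays below the limit. At a higher bitrate the chunk duration would need to be shortened accordingly.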

In [17]:
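# File-size threshold above which an mp3 file is split (24 MiB, a safety margin below the 25 MB limit)
max_size_bytes = 24 * 1024 * 1024
# Maximum chunk duration in milliseconds (1,500,000 ms = 25 minutes)
max_size = 1500000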

# Iterate through each file in the folder
for file_name in os.listdir(audio_folder):
    file_path = os.path.join(audio_folder, file_name)

    # Check if the file is an mp3 file and its size exceeds the maximum limit
    if file_name.endswith(".mp3") and os.path.getsize(file_path) > max_size_bytes:
        # Load the mp3 file using pydub
        audio = AudioSegment.from_file(file_path, format="mp3")

        # Total duration in milliseconds (len() of an AudioSegment returns milliseconds)
        duration_ms = len(audio)

        # Number of chunks required (ceiling division avoids an empty trailing chunk)
        num_chunks = -(-duration_ms // max_size)

        # Split the file into chunks and save them to the same folder
        for i in range(num_chunks):
            start = i * max_size

            # Clamp the end position to the total duration for the last chunk
            end = min((i + 1) * max_size, duration_ms)

            # Report the chunk boundaries in milliseconds
            print(f"Chunk {i+1}: {start} - {end}")

            # Extract the chunk of audio and save it to a new file with a suffix indicating the chunk number
            chunk = audio[start:end]
            chunk.export(f"{os.path.splitext(file_path)[0]}_chunk{i+1}.mp3", format="mp3")

        # Remove the original file
        os.remove(file_path)


Chunk 1: 0 - 1500000
Chunk 2: 1500000 - 3000000
Chunk 3: 3000000 - 3067472
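
The boundaries printed above follow from simple arithmetic on the total duration, and they can be reproduced without loading any audio. The sketch below uses a hypothetical helper, `chunk_ranges`, that steps through the duration in fixed increments. A count computed as floor division plus one would add an empty chunk whenever the duration is an exact multiple of the chunk length; stepping with `range` avoids that case.

```python
def chunk_ranges(duration_ms, chunk_ms):
    """Return (start, end) pairs in milliseconds covering 0..duration_ms."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


# Reproduce the boundaries for a 3,067,472 ms file split into 25-minute chunks
for number, (start, end) in enumerate(chunk_ranges(3067472, 1500000), start=1):
    print(f"Chunk {number}: {start} - {end}")
```

Each `(start, end)` pair can be used directly as a slice on an `AudioSegment`, as in `audio[start:end]`.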


That's it! All the videos are now converted to mp3, and every oversized file is split into chunks of at most 25 minutes.
