# Setup 
Here we install the required packages for this application. We also remove a single line from the ImageMagick security policy that would otherwise stop MoviePy from running, create our experiments directory, and restart the kernel.

> Note: We need to restart the kernel because of odd behavior in MoviePy that stops it from working in the same session as the install.

> After running the install cell below, just move on to the next section of this notebook.

> Be especially wary of this when using 'Run All': the install cell ends the kernel process, so execution will stop there.

In [None]:
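import os
!pip install -r requirements.txt
!pip install git+https://github.com/openai/whisper.git
!pip install yt-dlp
# MoviePy 2.x removed the moviepy.editor module used below, so pin the 1.x release
!pip install moviepy==1.0.3
!apt-get update
!apt-get install imagemagick -y
# Delete line 88 of /etc/ImageMagick-6/policy.xml, the policy rule that otherwise blocks MoviePy's TextClip.
# The line number can differ between ImageMagick builds, so check the file before running this.
!sed -i '88d' /etc/ImageMagick-6/policy.xml
!mkdir -p experiments
# Hard-exit the Python process so the kernel restarts with the new packages
os._exit(0)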

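
Deleting line 88 by number assumes every ImageMagick build ships an identical `policy.xml`. A minimal sketch of a pattern-based alternative, assuming the blocking rule is the `@*` path policy `<policy domain="path" rights="none" pattern="@*"/>` (an assumption worth confirming by opening the file first). The helpers `relax_policy` and `patch_policy_file` are hypothetical names, not part of any library.

```python
import re

# Assumption: this is the rule that stops MoviePy's TextClip from calling ImageMagick.
BLOCKING_RULE = re.compile(
    r'<policy\s+domain="path"\s+rights="none"\s+pattern="@\*"\s*/>'
)


def relax_policy(xml_text: str) -> str:
    """Return the policy XML with the blocking rule dropped; every other line is kept."""
    lines = xml_text.splitlines(keepends=True)
    return "".join(line for line in lines if not BLOCKING_RULE.search(line))


def patch_policy_file(path: str = "/etc/ImageMagick-6/policy.xml") -> bool:
    """Rewrite the policy file in place; return True if a rule was removed."""
    with open(path) as f:
        original = f.read()
    patched = relax_policy(original)
    if patched != original:
        with open(path, "w") as f:
            f.write(patched)
    return patched != original
```

Calling `patch_policy_file()` (with write access to `/etc`) is safe to repeat: once the rule is gone, a second call finds nothing to remove and leaves the file untouched.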

# The function

The `subtitle_video` function does all the work for us: it transcribes the audio with Whisper, translates the speech into English, and overlays the resulting captions on the video at the matching timestamps.

This works both for YouTube links and for videos uploaded directly to this notebook, and the captions automatically scale with the video's frame size.
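
The automatic scaling comes from two ratios in the caption generator inside `subtitle_video`: the font size is the frame width divided by 50, and the caption box spans the full frame width and a quarter of the frame height. A small sketch that isolates that arithmetic, so the effect of changing a ratio can be checked before rendering anything (the helper name `caption_layout` is hypothetical):

```python
from typing import Tuple


def caption_layout(width: int, height: int) -> Tuple[float, Tuple[int, int]]:
    """Return (fontsize, (box_width, box_height)) for a frame of the given size."""
    fontsize = width / 50              # text grows with the frame width
    box = (width, int(height * 0.25))  # full-width box, bottom quarter of the frame
    return fontsize, box


for w, h in [(1280, 720), (1920, 1080)]:
    fs, box = caption_layout(w, h)
    print(f"{w}x{h}: fontsize={fs:g}, caption box={box}")
```

Tying the font size to width rather than height keeps line lengths roughly constant in characters across resolutions, which suits the `method='caption'` word wrapping.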

See the parameter notes at the start of the function, and the function call below, for details on each argument.

In [23]:
## Imports
import os

import pandas as pd
import whisper
from IPython.display import Video, display  # for viewing results inline
from moviepy.editor import CompositeVideoClip, TextClip, VideoFileClip
from moviepy.video.tools.subtitles import SubtitlesClip
from yt_dlp import YoutubeDL


def subtitle_video(download, url, aud_opts, vid_opts, model_type, name, audio_file, input_file, output, uploaded_vid=None):
    # ------------------------------------------------------------------------------------------------------------------------------
    #     Params:
    # ------------------------------------------------------------------------------------------------------------------------------
    #     download:      bool, whether to download the video from YouTube
    #     url:           str, the URL of the YouTube video to download if download is True
    #     aud_opts:      dict, yt-dlp options for the audio download
    #     vid_opts:      dict, yt-dlp options for the video download
    #     model_type:    str, which pretrained Whisper model to download. Options are:
    #                    ['tiny', 'base', 'small', 'medium', 'large', 'tiny.en', 'base.en', 'small.en', 'medium.en']
    #                    More details about the model types can be found in the table in the original repo:
    #                    https://github.com/openai/whisper#Available-models-and-languages
    #     name:          str, name of the directory inside experiments/ to store files in
    #     audio_file:    str, file name of the extracted audio file for Whisper
    #     input_file:    str, file name of the video file for MoviePy to caption
    #     output:        str, file name of the final captioned video
    #     uploaded_vid:  str, path to an uploaded video file if download is False
    #
    # ------------------------------------------------------------------------------------------------------------------------------
    #     Returns:       str, path to the video with English (translated) captions, saved to experiments/name/output
    # ------------------------------------------------------------------------------------------------------------------------------
    run_dir = os.path.join('experiments', name)

    ## First, check whether the experiment name is taken. If not, create the directory;
    ## otherwise, we are prompted to retry with a new name.
    try:
        os.makedirs(run_dir)
    except FileExistsError:
        print('Choose another folder name! This one already has files in it.')
        return None
    print('Starting AutoCaptioning...')
    print(f'Results will be stored in {run_dir}')

    video_path = os.path.join(run_dir, input_file)
    audio_path = os.path.join(run_dir, audio_file)

    if download:
        # Download audio and video separately with yt-dlp, writing into the run folder.
        # Copies are used so the caller's option dicts are not modified.
        aud_opts = {**aud_opts, 'outtmpl': audio_path}
        vid_opts = {**vid_opts, 'outtmpl': video_path}
        with YoutubeDL(aud_opts) as ydl:
            ydl.download([url])
        with YoutubeDL(vid_opts) as ydl:
            ydl.download([url])
        source = video_path
    else:
        # Use the local clip: save a copy and extract its audio track for Whisper
        my_clip = VideoFileClip(uploaded_vid)
        my_clip.write_videofile(video_path)
        my_clip.audio.write_audiofile(audio_path)
        source = uploaded_vid

    # Instantiate the Whisper model named by model_type
    model = whisper.load_model(model_type)

    # Transcribe the speech and translate it into English
    result = model.transcribe(audio_path, task='translate')

    # Build the subtitle table and save it; float seconds keep captions in sync
    segments = result['segments']
    df = pd.DataFrame({
        'start': [float(seg['start']) for seg in segments],
        'end': [float(seg['end']) for seg in segments],
        'text': [seg['text'].strip() for seg in segments],
    })
    df.to_csv(os.path.join(run_dir, 'subs.csv'))

    # Read the frame size so the captions scale with the video
    video = VideoFileClip(source)
    width, height = video.size

    # Render each caption with ImageMagick through TextClip, sized relative to the frame
    def generator(txt):
        return TextClip(txt, font='P052-Bold', fontsize=width / 50, stroke_width=0.7,
                        color='white', stroke_color='black',
                        size=(width, int(height * 0.25)), method='caption')
    # Alternative style:
    # generator = lambda txt: TextClip(txt, color='white', fontsize=20, font='Georgia-Regular', stroke_width=3, method='caption', align='south', size=video.size)

    # SubtitlesClip expects ((start, end), text) pairs
    subs = list(zip(zip(df['start'], df['end']), df['text']))
    subtitles = SubtitlesClip(subs, generator)

    # Overlay the captions at the bottom centre of the video and write the result
    final = CompositeVideoClip([video, subtitles.set_position(('center', 'bottom'))])
    output_path = os.path.join(run_dir, output)
    final.write_videofile(output_path, fps=video.fps, remove_temp=True, codec='libx264', audio_codec='aac')
    return output_path
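
Whisper returns each segment as a dict with `start` and `end` times in seconds plus the recognised `text`, while `SubtitlesClip` expects a sequence of `((start, end), text)` pairs. A minimal sketch of that conversion, with hypothetical sample segments standing in for `result['segments']`. It keeps the times as floats, because truncating them with `int()` would move each caption up to a second earlier than the speech. The helper name `segments_to_subs` is hypothetical.

```python
from typing import Dict, List, Tuple, Union

Segment = Dict[str, Union[float, str]]
Subtitle = Tuple[Tuple[float, float], str]


def segments_to_subs(segments: List[Segment]) -> List[Subtitle]:
    """Turn Whisper-style segments into ((start, end), text) pairs for SubtitlesClip."""
    subs: List[Subtitle] = []
    for seg in segments:
        # Keep fractional seconds so captions stay aligned with the speech
        span = (float(seg["start"]), float(seg["end"]))
        subs.append((span, str(seg["text"]).strip()))
    return subs


# Hypothetical segments shaped like result["segments"] (real ones carry more keys)
sample_segments = [
    {"start": 0.0, "end": 2.48, "text": " First caption."},
    {"start": 2.48, "end": 5.9, "text": " Second caption."},
]
for (start, end), text in segments_to_subs(sample_segments):
    print(f"{start:6.2f} -> {end:6.2f}  {text}")
```

The resulting list can be passed directly as the first argument of `SubtitlesClip(subs, generator)`.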




## Declare relevant variables

In [24]:
# Options for the YouTube audio download, chosen to get a high-quality audio file.
# This is key: in our experiments, extracting the audio from the video download instead seemed to significantly worsen the captions' word error rate.
# Only modify these if needed; lower audio quality may increase the transcription's word error rate.
opts_aud = {
    'format': 'mp3/bestaudio/best',
    'keepvideo': True}

# Options for the YouTube video download, used for the final output
opts_vid = {'format': 'mp4/bestvideo/best'}

# YouTube URL (uncomment the first line, and comment out the second, to use the other sample)
# URL = 'https://www.youtube.com/watch?v=lHcIfogQQ60' # The Hobbit: Smaug in many languages
URL = 'https://youtu.be/3DKDt693p8Y' # Steamed Hams in many languages, sample link
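
`subtitle_video` points each download at the run folder through yt-dlp's `outtmpl` option. A sketch of setting that option on a copy, so a base options dict such as `opts_aud` stays unchanged when the function is called several times with different `name` values. `outtmpl` is a real yt-dlp option; the helper `with_outtmpl` is hypothetical.

```python
import copy
import os
from typing import Any, Dict


def with_outtmpl(base_opts: Dict[str, Any], name: str, filename: str,
                 root: str = "experiments") -> Dict[str, Any]:
    """Return a copy of yt-dlp options whose 'outtmpl' points into root/name/filename."""
    opts = copy.deepcopy(base_opts)  # never mutate the caller's dict
    opts["outtmpl"] = os.path.join(root, name, filename)
    return opts


base_audio_opts = {"format": "mp3/bestaudio/best"}
run_audio_opts = with_outtmpl(base_audio_opts, "run1", "audio.mp3")
print(run_audio_opts)
print(base_audio_opts)  # still has no 'outtmpl'
```

The copied dict can then be used as `with YoutubeDL(run_audio_opts) as ydl: ydl.download([URL])`.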


# Generate subtitles

To autocaption our video, we simply fill in the fields below with the relevant values.

The only required change is to the URL value, if we would like to caption a different video from the sample.

> Note: If we run into an error, we can try restarting the kernel and running these three code cells again. It is unclear why this happens, but MoviePy occasionally seems to require a kernel restart.

In [25]:
!ls

Dockerfile    data		      outputs		tests
LICENSE       experiments	      requirements.txt	whisper
MANIFEST.in   inputs		      results		whisper-caption.ipynb
README.md     language-breakdown.svg  setup.py
app.py	      model-card.md	      spec.yaml
approach.png  notebooks		      templates


In [29]:
subtitle_video(
    download=True,
    uploaded_vid=None,      # path to a local file (used when download=False)
    url=URL,
    name='run1',
    aud_opts=opts_aud,      # audio download settings
    vid_opts=opts_vid,      # video download settings
    model_type='medium',    # change to 'large' for more accurate results,
                            # to 'medium.en' (or another '.en' model) for English-only tasks,
                            # and to 'small' or 'base' for faster inference
    audio_file='audio.mp3',
    input_file='video.mp4',
    output='output.mp4')


Starting AutoCaptioning...
Results will be stored in experiments/run9




# Display your video output in markdown
<video controls src="experiments/run1/output.mp4"></video>
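
The `src` of the video tag has to match the `name` and `output` arguments passed to `subtitle_video`. A small sketch that builds the tag from those two values, so the displayed path cannot drift from the file that was written (the helper `video_tag` is hypothetical):

```python
import html
import os


def video_tag(name: str, output: str, root: str = "experiments") -> str:
    """Build an HTML5 <video> element pointing at root/name/output."""
    src = os.path.join(root, name, output)
    return f'<video controls src="{html.escape(src, quote=True)}"></video>'


tag = video_tag("run1", "output.mp4")
print(tag)
```

In a notebook cell, `display(HTML(tag))` with `HTML` and `display` from `IPython.display` renders the player; `Video('experiments/run1/output.mp4')` from the same module is another option once the file exists.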
