# OpenAI Whisper Transcription and Subtitle Generation 
https://github.com/openai/whisper

## Prerequisites for using CUDA to accelerate Whisper (optional)

1. **Install Anaconda**
2. **Install CUDA**, if your machine has a CUDA-enabled GPU.
3. **Windows Build Requirements**: If you want to build on Windows, you'll need Visual Studio with MSVC toolset, and NVTX. Find the exact requirements for those dependencies [here](#).
4. **PyTorch Installation**: Follow the steps described [here](https://github.com/pytorch/pytorch#from-source).


In [None]:
# Install the openai-whisper and python-docx dependencies
!pip install -U openai-whisper
!pip install python-docx


### Recommended: confirm that PyTorch can use CUDA to accelerate transcription



In [None]:
# Only run if the installed PyTorch lacks CUDA support and you have all the compatible dependencies
#!pip uninstall torch
#!pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

In [1]:
import torch

if torch.cuda.is_available():
    print("PyTorch is using CUDA")
else:
    print("PyTorch is not using CUDA; transcription will be slow")



PyTorch is using CUDA


In [1]:
import whisper
import os
from glob import glob
from docx import Document
from datetime import timedelta



In [9]:
whisper.available_models()

['tiny.en',
 'tiny',
 'base.en',
 'base',
 'small.en',
 'small',
 'medium.en',
 'medium',
 'large-v1',
 'large-v2',
 'large']

In [2]:
# Select and load a model from the list above that fits in your system's memory
model = whisper.load_model("large-v2")
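
As a rough guide for choosing a model, the sketch below picks the largest model whose approximate VRAM requirement fits a given memory budget. The figures are the approximate values listed in the Whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, 10 GB for large) and are assumed to apply to `large-v2` as well. The `pick_model` helper and its table are hypothetical and not part of the Whisper API.

```python
# Approximate VRAM requirements in GB, as listed in the Whisper README.
# Assumption: these figures are still current and "large" also covers large-v2.
APPROX_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v2", 10),
]

def pick_model(available_gb):
    """Return the largest model whose approximate VRAM need fits available_gb."""
    chosen = None
    for name, needed_gb in APPROX_VRAM_GB:  # ordered from smallest to largest
        if needed_gb <= available_gb:
            chosen = name
    return chosen  # None means even the smallest model may not fit

print(pick_model(6))
```

On a CUDA machine, the budget could be taken from `torch.cuda.get_device_properties(0).total_memory / 1024**3`, keeping in mind that other processes may already occupy part of that memory.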

In [4]:
def parse_segments(segments):
    # Join the text of every segment, one segment per line
    return ''.join(segment['text'].lstrip() + '\n' for segment in segments)

def format_timestamp(seconds):
    # Convert seconds (float) to the SRT timestamp format HH:MM:SS,mmm
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def save_srt(result, file_name):
    outfile, _ = os.path.splitext(file_name)
    # Open once in 'w' mode so re-running the cell overwrites the file
    # instead of appending duplicate cues
    with open(f"{outfile}.srt", 'w', encoding='utf-8') as srt_file:
        for segment in result['segments']:
            start_time = format_timestamp(segment['start'])
            end_time = format_timestamp(segment['end'])
            text = segment['text'].strip()
            # SRT cue numbers start at 1; Whisper segment ids start at 0
            srt_file.write(f"{segment['id'] + 1}\n{start_time} --> {end_time}\n{text}\n\n")


### The following step transcribes every .mp3 file in the videos folder located next to this notebook

In [7]:
for file_name in glob(r'videos/*.mp3'):
    print(file_name)
    outfile, _ = os.path.splitext(file_name)
    # Transcribe the audio and write the subtitles next to it as .srt
    result = model.transcribe(file_name)
    save_srt(result, file_name)

    # Save the plain transcript as a Word document
    document = Document()
    document.add_paragraph(parse_segments(result['segments']))
    document.save(outfile + '.docx')


videos\A 10 Close Air Support.mp3
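
Because transcription with a large model can be slow, it may be worth skipping files that were already processed when the loop is re-run. The following is a minimal sketch, assuming that an `.srt` file next to the audio file means the file is done; the `pending_audio_files` helper is hypothetical and not part of this notebook.

```python
from pathlib import Path

def pending_audio_files(folder, pattern="*.mp3"):
    """Return audio files in folder that do not yet have a matching .srt file."""
    folder = Path(folder)
    return sorted(
        path for path in folder.glob(pattern)
        if not path.with_suffix(".srt").exists()  # assumption: an .srt means "already transcribed"
    )

# Possible use in the loop above (file names passed on as strings):
# for path in pending_audio_files("videos"):
#     result = model.transcribe(str(path))
#     save_srt(result, str(path))
```

Deleting the `.srt` file forces a file to be transcribed again.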


In [8]:
result

{'text': " Roger, be advised we're east of the riverbed. We're taking fire from the west. Roger, stand by, we're popping smoke. Look for the smoke, it's going to be a west smoke. Roger, be advised, we're east of the riverbed. Okay, got the red smoke. Roger, sun runs north and south, west of the smoke, west of the smoke. Okay, copy, west of the smoke. I'm looking at danger close now. Roger, keep your fires west of the smoke. Commander's initials, Bravo Golf, keep the fucking fire west of the smoke. We have three smokes popped. Okay, copy that. I am visual the smoke. I'm going to keep my fires west of the smoke and west of the road. Roger, I see you. Come on, weatherman, give it to me, I'm in. You're clear to hot. Copy, clear to hot. Copy, visual the smoke. I'm visualing it, copy, to the west. Long way and straight. You got me, right? That's it. Okay, make sure you got me. Go ahead, get up to me, over. Two, two, I need you, come on, two. Two's in hot, another one. Two, same spot, look fo

# MP4 to MP3

In [None]:
!pip install moviepy

In [7]:
# MoviePy 1.x import path; in MoviePy 2.x use "from moviepy import VideoFileClip"
from moviepy.editor import VideoFileClip
video = VideoFileClip('c:\\Users\\linoa\\Documents\\Code\\whisper_subtitles\\./videos/ATC_audio.mp4')
video.audio.write_audiofile('c:\\Users\\linoa\\Documents\\Code\\whisper_subtitles\\./videos/ATC_audio.mp3')

MoviePy - Writing audio in c:\Users\linoa\Documents\Code\whisper_subtitles\./videos/ATC_audio.mp3


                                                                      

MoviePy - Done.
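
The hard-coded absolute paths above only work on one machine. The sketch below derives the `.mp3` path from the video path instead, so the same cell could be reused for any file. The helper names `mp3_path_for` and `extract_audio` are hypothetical, and the import assumes MoviePy 1.x, as used elsewhere in this notebook.

```python
from pathlib import Path

def mp3_path_for(video_path):
    """Return the path of the .mp3 file that sits next to video_path."""
    return Path(video_path).with_suffix(".mp3")

def extract_audio(video_path):
    """Write the audio track of video_path to an .mp3 next to it and return that path."""
    # Imported here so the path helper above works without MoviePy installed.
    from moviepy.editor import VideoFileClip  # MoviePy 1.x import path (assumption)

    audio_path = mp3_path_for(video_path)
    video = VideoFileClip(str(video_path))
    try:
        video.audio.write_audiofile(str(audio_path))
    finally:
        video.close()  # release the underlying ffmpeg reader
    return audio_path

print(mp3_path_for("videos/ATC_audio.mp4"))
```

Calling `extract_audio("videos/ATC_audio.mp4")` from the notebook folder would then write `videos/ATC_audio.mp3`, which the transcription loop picks up.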




# Add subtitles to mp4

In [None]:
from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip
import textwrap

# Load the video
video = VideoFileClip("videos/A 10 Close Air Support.mp4")

# Reuse the segments from the most recent transcription result
transcript = result['segments']

# Create one text clip per segment
# (TextClip in MoviePy 1.x needs ImageMagick and the named font installed)
txt_clips = []
for segment in transcript:
    # Strip the leading space Whisper adds, then wrap long lines at 50 characters
    subtitle = "\n".join(textwrap.wrap(segment['text'].strip(), 50))
    txt_clip = TextClip(subtitle, fontsize=12, font="Amiri-Bold", kerning=1, bg_color='black', color='white')
    txt_clip = txt_clip.set_start(segment['start'])
    txt_clip = txt_clip.set_position((0.2, 0.8), relative=True).set_duration(segment['end'] - segment['start'])
    txt_clips.append(txt_clip)

# Overlay the subtitle clips on the video
final_video = CompositeVideoClip([video] + txt_clips)

# Save the final video
final_video.write_videofile("videos/output.mp4")

# Download videos from YouTube for testing

In [None]:
!pip install pytube


In [None]:
from pytube import YouTube

link = "https://www.youtube.com/watch?v=jdBzyAURWEI"
yt = YouTube(link)

# Download the highest-resolution progressive (combined audio and video) MP4 stream
video = yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').desc().first()
video.download('./videos/')