## **Silent Parts Removal from Video**

First, let's get all the libraries and modules ready, then download a YouTube video and get a transcription of it.

In [7]:
! pip install imageio==2.4.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [8]:
! pip install imageio-ffmpeg

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


## **Download Videos from YouTube for Testing**

In [13]:
!pip install pytube

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pytube
  Downloading pytube-12.1.0-py3-none-any.whl (56 kB)
Installing collected packages: pytube
Successfully installed pytube-12.1.0


In [33]:
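from pytube import YouTube


def download_youtube(url, output_dir="test_video", resolution="360p"):
  """Download the progressive (audio + video) MP4 stream of a YouTube video."""
  yt = YouTube(url)
  stream = yt.streams.filter(file_extension="mp4").get_by_resolution(resolution)
  # get_by_resolution() returns None when no stream matches
  if stream is None:
    raise ValueError(f"No {resolution} MP4 stream available for {url}")
  # download() returns the path of the saved file
  return stream.download(output_path=output_dir)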

In [34]:
download_youtube("https://www.youtube.com/watch?v=4WEQtgnBu0I")

**Downloaded a test video with 30 seconds of silence, to check whether we can remove the silent part of the video.**

**Remove Silent Parts of the video**

In [24]:
from moviepy.editor import AudioClip, VideoFileClip, concatenate_videoclips

In [None]:
import math

from moviepy.editor import VideoFileClip, concatenate_videoclips

# Iterate over audio to find the non-silent parts. Outputs a list of
# [speaking_start, speaking_end] intervals in seconds.
# Args:
#  window_size: (in seconds) hunt for silence in windows of this size
#  volume_threshold: volume below this threshold is considered to be silence
#  ease_in: (in seconds) add this much silence around speaking intervals

def find_speaking(audio_clip, window_size=0.1, volume_threshold=0.01, ease_in=0.25):
    # First, iterate over audio to find all silent windows.
    num_windows = math.floor(audio_clip.end / window_size)
    window_is_silent = []
    for i in range(num_windows):
        s = audio_clip.subclip(i * window_size, (i + 1) * window_size)
        v = s.max_volume()
        window_is_silent.append(v < volume_threshold)

    speaking_intervals = []

    def add_interval(start, end):
        # Pad by ease_in and clamp to the clip: subclip() treats a negative
        # start as an offset from the end of the clip.
        new_interval = [max(0, start - ease_in), min(audio_clip.end, end + ease_in)]
        # With tiny windows, padding can overlap the previous interval, so merge.
        if speaking_intervals and speaking_intervals[-1][1] >= new_interval[0]:
            speaking_intervals[-1][1] = new_interval[1]
        else:
            speaking_intervals.append(new_interval)

    # Find speaking intervals.
    speaking_start = 0
    for i in range(1, len(window_is_silent)):
        e1 = window_is_silent[i - 1]
        e2 = window_is_silent[i]
        # silence -> speaking
        if e1 and not e2:
            speaking_start = i * window_size
        # speaking -> silence, now have a speaking interval
        if not e1 and e2:
            add_interval(speaking_start, i * window_size)

    # If the audio ends while someone is still speaking, keep that last interval too.
    if window_is_silent and not window_is_silent[-1]:
        add_interval(speaking_start, audio_clip.end)

    return speaking_intervals

def main():
    # Input and output file paths
    file_in = "/content/test_video/30 Second Elevator Pitch.mp4"
    file_out = "/content/test_video/remove_silence.mp4"
    vid = VideoFileClip(file_in)
    intervals_to_keep = find_speaking(vid.audio)

    print("Keeping intervals: " + str(intervals_to_keep))

    keep_clips = [vid.subclip(start, end) for start, end in intervals_to_keep]

    if keep_clips:
        edited_video = concatenate_videoclips(keep_clips)
        # Write the result before closing the source: the subclips still read from it.
        edited_video.write_videofile(file_out)
    else:
        print("Nothing to keep: the entire audio is silent.")
    vid.close()

if __name__ == '__main__':
    main()
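The interval logic can be checked without MoviePy or a video file. The sketch below is a hypothetical pure-Python helper (`speaking_intervals_from_volumes` is not a MoviePy function) that assumes the per-window maximum volumes have already been computed. It applies the threshold, padding and merging rules described in the comments of `find_speaking`; it also keeps a speaking run that lasts until the end of the audio and clamps padded intervals to the clip.

```python
def speaking_intervals_from_volumes(volumes, window_size=0.1, volume_threshold=0.01,
                                    ease_in=0.25, duration=None):
    """Turn per-window max volumes into padded, merged [start, end] speaking intervals.

    Hypothetical helper: volumes[i] is assumed to be the max volume of the
    window [i * window_size, (i + 1) * window_size).
    """
    if duration is None:
        duration = len(volumes) * window_size
    intervals = []
    start = None  # start time of the current speaking run, or None while silent

    def close_run(run_start, run_end):
        # Pad by ease_in, clamp to [0, duration] and merge with the previous
        # interval if the padding makes them overlap.
        lo = max(0.0, run_start - ease_in)
        hi = min(duration, run_end + ease_in)
        if intervals and intervals[-1][1] >= lo:
            intervals[-1][1] = hi
        else:
            intervals.append([lo, hi])

    for i, v in enumerate(volumes):
        silent = v < volume_threshold
        if not silent and start is None:
            start = i * window_size             # silence -> speaking
        elif silent and start is not None:
            close_run(start, i * window_size)   # speaking -> silence
            start = None
    if start is not None:                       # still speaking at the end
        close_run(start, duration)
    return intervals


# Usage: 1 s of speech, 2 s of silence, then 0.5 s of speech (0.1 s windows)
vols = [0.5] * 10 + [0.0] * 20 + [0.4] * 5
print(speaking_intervals_from_volumes(vols))
```

Padding with `ease_in` is what makes the merge step necessary: two speaking runs separated by less than `2 * ease_in` seconds of silence end up as one interval.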

**WRITE VIDEOS TO FILE**

In [45]:
def write_video(path, clip):
  """Render `clip` and save it to `path` (the codec is inferred from the file extension)."""
  clip.write_videofile(path)

**CROP**

In [48]:
import moviepy.editor as mpy

from moviepy.video.fx.all import crop

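def crop_video(path, width=320, height=320):
  """Return a width x height crop taken from the centre of the video."""
  clip = mpy.VideoFileClip(path)
  (w, h) = clip.size
  # crop()'s first positional arguments are x1 and y1, so pass the size as keywords
  cropped_clip = crop(clip, width=width, height=height, x_center=w/2, y_center=h/2)
  return cropped_clip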
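A centred crop of `width` x `height` reduces to simple box arithmetic. The sketch below is a hypothetical helper (`center_crop_box` is not a MoviePy function); unlike the cell above, it also shrinks the box when the frame is smaller than the requested crop, which is an assumption about the desired behaviour.

```python
def center_crop_box(frame_w, frame_h, crop_w=320, crop_h=320):
    """Return (x1, y1, x2, y2) for a crop of at most crop_w x crop_h centred in the frame."""
    # Never ask for a box larger than the frame itself.
    crop_w = min(crop_w, frame_w)
    crop_h = min(crop_h, frame_h)
    # Split the leftover space evenly on both sides.
    x1 = (frame_w - crop_w) / 2
    y1 = (frame_h - crop_h) / 2
    return (x1, y1, x1 + crop_w, y1 + crop_h)


# A 640x360 frame cropped to 320x320 around its centre:
print(center_crop_box(640, 360))  # (160.0, 20.0, 480.0, 340.0)
```

The resulting corners could be passed to MoviePy's `crop(clip, x1=..., y1=..., x2=..., y2=...)`.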

In [47]:
cropped_clip = crop_video("/content/test_video/30 Second Elevator Pitch.mp4")

# Write to file
write_video("/content/test_video/cropped.mp4", cropped_clip)

[MoviePy] >>>> Building video /content/test_video/cropped.mp4
[MoviePy] Writing audio in croppedTEMP_MPY_wvf_snd.mp3
100%|██████████| 896/896 [00:01<00:00, 842.99it/s]
[MoviePy] Done.
[MoviePy] Writing video /content/test_video/cropped.mp4
100%|██████████| 579/579 [00:06<00:00, 91.77it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: /content/test_video/cropped.mp4



**TRIM**

In [54]:
path = "/content/test_video/30 Second Elevator Pitch.mp4"

def trim(path, start=0, end=10):
  """Remove the segment between `start` and `end` (in seconds) and return the rest of the clip."""
  clip = VideoFileClip(path)
  return clip.cutout(start, end)

trim(path).ipython_display(width=360)

100%|██████████| 455/455 [00:00<00:00, 789.50it/s]
100%|██████████| 437/437 [00:01<00:00, 242.75it/s]
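`cutout(start, end)` removes the span between `start` and `end`, whereas `subclip(start, end)` keeps only that span. The hypothetical helper below (plain Python, no MoviePy; `kept_ranges` is not a library function) lists which parts of a clip survive each operation, a quick way to check the expected output length before rendering. It clamps times to the clip and ignores MoviePy's treatment of negative times as offsets from the end.

```python
def kept_ranges(duration, start, end, mode="cutout"):
    """Return the (t0, t1) ranges, in seconds, that remain after the edit.

    mode="cutout" mirrors clip.cutout(start, end): everything except [start, end].
    mode="subclip" mirrors clip.subclip(start, end): only [start, end].
    """
    # Clamp the requested span to the clip (negative times are not supported here).
    start = max(0, min(start, duration))
    end = max(start, min(end, duration))
    if mode == "subclip":
        return [(start, end)] if end > start else []
    if mode != "cutout":
        raise ValueError(f"unknown mode: {mode}")
    ranges = [(0, start), (end, duration)]
    # Drop empty pieces, e.g. when the cut starts at 0 or runs to the end.
    return [(a, b) for a, b in ranges if b > a]


# For a hypothetical 40-second clip:
print(kept_ranges(40, 0, 10))            # [(10, 40)]
print(kept_ranges(40, 0, 5, "subclip"))  # [(0, 5)]
```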


**SUBCLIP**

In [53]:
# Import what is needed to cut the clip
from moviepy.editor import VideoFileClip

path = "/content/test_video/30 Second Elevator Pitch.mp4"
def sub_clip(path, start=0, end=5):
  """Keep only the part of the video between `start` and `end` (in seconds)."""
  clip = VideoFileClip(path)
  return clip.subclip(start, end)

sub_clip(path).ipython_display(width=360)

100%|██████████| 111/111 [00:00<00:00, 913.37it/s]
100%|██████████| 72/72 [00:00<00:00, 264.69it/s]


**ADD WATERMARK or TEXT to VIDEO**

In [61]:
!apt install imagemagick

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  fonts-droid-fallback fonts-noto-mono ghostscript gsfonts
  imagemagick-6-common imagemagick-6.q16 libcupsfilters1 libcupsimage2
  libdjvulibre-text libdjvulibre21 libgs9 libgs9-common libijs-0.35
  libjbig2dec0 liblqr-1-0 libmagickcore-6.q16-3 libmagickcore-6.q16-3-extra
  libmagickwand-6.q16-3 libnetpbm10 libwmf0.2-7 netpbm poppler-data
Suggested packages:
  fonts-noto ghostscript-x imagemagick-doc autotrace cups-bsd | lpr | lprng
  enscript gimp gnuplot grads hp2xx html2ps libwmf-bin mplayer povray radiance
  sane-utils texlive-base-bin transfig ufraw-batch inkscape libjxr-tools
  libwmf0.2-7-gtk poppler-utils fonts-japanese-mincho | fonts-ipafont-mincho
  fonts-japanese-gothic | fonts-ipafont-gothic fo

In [63]:
!sed -i 's/none/read,write/g' /etc/ImageMagick-6/policy.xml
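Redirecting a pipeline's output into the same file it reads (as in `cat policy.xml | sed ... > policy.xml`) is unreliable: the shell truncates the file when it sets up the redirection, so `cat` may read an already empty file. Reading the whole file first and only then writing it back avoids this. A Python sketch of that edit follows; `relax_imagemagick_policy` is a hypothetical helper, and it applies the same blanket substitution as the `sed` expression, so every occurrence of `none` in the file is replaced, not only the `rights` attributes.

```python
from pathlib import Path


def relax_imagemagick_policy(path="/etc/ImageMagick-6/policy.xml"):
    """Replace every "none" in the ImageMagick policy file with "read,write".

    Hypothetical helper; writing to /etc needs root, as in Colab.
    """
    policy = Path(path)
    text = policy.read_text()  # read everything before touching the file
    policy.write_text(text.replace("none", "read,write"))
    return text.count("none")  # number of substitutions made
```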

In [None]:
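# Known issue: TextClip depends on ImageMagick, which is still causing problems here.
# To be fixed later.

path = "/content/test_video/30 Second Elevator Pitch.mp4"

from moviepy.editor import CompositeVideoClip, TextClip, VideoFileClip

def watermark(path, text="AI video Editor", pos="center", color="black", font_size=10):
  """Overlay `text` on the video at `pos` for the video's whole duration."""
  clip = VideoFileClip(path)
  # Generate a text clip with the requested font size and colour
  txt_clip = TextClip(text, fontsize=font_size, color=color)
  # set_pos()/set_duration() return a new clip, so keep the result
  txt_clip = txt_clip.set_pos(pos).set_duration(clip.duration)
  video = CompositeVideoClip([clip, txt_clip])
  return video

watermark(path)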

**Translate Audio/Video and DOWNLOAD SRT**

In [None]:
! pip install git+https://github.com/openai/whisper.git
! sudo apt update && sudo apt install ffmpeg
! pip install setuptools-rust
! pip install imageio==2.4.1

In [66]:
import whisper

In [73]:
#download youtube video
download_youtube("https://www.youtube.com/watch?v=aNxd01VzJsw")

In [79]:
# Load a Whisper model
def load_model(name="base"):
  # "base" is small and fast; larger models such as "medium" are more accurate but slower
  model = whisper.load_model(name)
  return model

In [80]:
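# It takes around 5-10 minutes to run
def translate(path, task="transcribe"):
  # task="transcribe" keeps the spoken language;
  # task="translate" asks Whisper to output English text instead
  model = load_model()
  result = model.transcribe(path, task=task, fp16=False)
  return result

result = translate("/content/test_video/Dunki  Title Announcement  Shah Rukh Khan  Taapsee Pannu  Rajkumar Hirani  22 Dec 23.mp4")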

100%|███████████████████████████████████████| 139M/139M [00:02<00:00, 63.3MiB/s]


In [81]:
# Transcribed text. This is not an English translation: transcribe() ran with its default task="transcribe"
result["text"]

' तो ए हु hitting BAR त\x00ureu बोडना भा at ए हրुपन्रूर काfloo उजा भा 6 ये हु console Bevölkerension मूना बाinese बीई in S वाल् स नहा बाल्सरč दागं डे मूने नम कम सके भख़ िесь कि शुब कि ख़ पाव ँा होiksafLaughs really staat P got ya botheredışf raman c года sur u yo br nee ng donื่ k n donavam. Danki ni Sharu, Danki I am Sambya Danki, Danki Danki In and As Means Sir Douth Patan, what are you making? Jobi, Lelu, Lelu'

# TODO: study LibreTranslate (https://github.com/LibreTranslate/LibreTranslate) for translation

In [None]:
#DOWNLOAD SRT
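The result returned by Whisper's `transcribe()` contains a `segments` list whose entries carry `start` and `end` (in seconds) and `text`. A minimal sketch of turning those segments into SRT subtitles follows; `format_timestamp` and `segments_to_srt` are hypothetical helpers written here, not part of Whisper's API.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from Whisper-style segments (dicts with start, end, text)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


# In the notebook (assumes `result` from the translate() cell above):
# with open("/content/test_video/subtitles.srt", "w", encoding="utf-8") as f:
#     f.write(segments_to_srt(result["segments"]))
print(segments_to_srt([{"start": 0.0, "end": 2.5, "text": " Hello"},
                       {"start": 2.5, "end": 3661.042, "text": " world"}]))
```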

**SPEAKER LABEL IDENTIFICATION**

**Reference:** https://colab.research.google.com/drive/12W6bR-C6NIEjAML19JubtzHPIlVxdaUq?usp=sharing

Need to work on speaker identification.

Need to integrate it with OpenAI Whisper.
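One simple way to combine speaker labels with Whisper output is to give each transcript segment the speaker whose turns overlap it the most. The sketch below assumes a separate diarization step (not shown, and not part of Whisper) has already produced `(start, end, speaker)` turns in seconds; `assign_speakers` is a hypothetical helper.

```python
def assign_speakers(segments, turns):
    """Label each Whisper-style segment with the speaker of maximum time overlap.

    segments: dicts with "start", "end", "text" (seconds), e.g. result["segments"].
    turns: (start, end, speaker) tuples from a separate diarization step (assumed).
    Returns new dicts with an added "speaker" key (None when nothing overlaps).
    """
    labelled = []
    for seg in segments:
        overlap_by_speaker = {}
        for t_start, t_end, speaker in turns:
            # Length of the intersection of [seg start, seg end] and [t_start, t_end]
            overlap = min(seg["end"], t_end) - max(seg["start"], t_start)
            if overlap > 0:
                overlap_by_speaker[speaker] = overlap_by_speaker.get(speaker, 0) + overlap
        best = max(overlap_by_speaker, key=overlap_by_speaker.get) if overlap_by_speaker else None
        labelled.append({**seg, "speaker": best})
    return labelled


segments = [{"start": 0.0, "end": 4.0, "text": "Hi there."},
            {"start": 4.0, "end": 7.0, "text": "Hello!"}]
turns = [(0.0, 3.5, "SPEAKER_00"), (3.5, 7.0, "SPEAKER_01")]
for seg in assign_speakers(segments, turns):
    print(seg["speaker"], seg["text"])
```

Summing overlap per speaker (rather than picking the single longest turn) means a segment split across several short turns of the same speaker still gets that speaker.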