## <p style="text-align:center" color="red"><span style="color:red">Youtube Video Transcriptor</span></p>



<table align="center">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/labrijisaad/Youtube-video-transcriptor"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

- **`This script is meant to be run in Google Colaboratory!`**
- The script sometimes fails to recognize speech correctly, mainly because of background noise or the way people speak in the video (too fast or too slow).
- The general idea summary model is a community model available on [Hugging Face](https://huggingface.co/). It can sometimes get the general idea wrong, especially when there is little text to work with.


> 🙌 Notebook made by [@labriji_saad](https://github.com/labrijisaad)

### Installing the requirements

In [1]:
%%capture 

!pip install --upgrade yt_dlp
!pip install pydub SpeechRecognition ffmpeg ffprobe googletrans==3.1.0a0 transformers

In [2]:
import yt_dlp
import time
import re
import os
from pydub import AudioSegment
import speech_recognition as sr
import math
from tqdm import tqdm
from googletrans import Translator

### Downloading the audio (`url = video_link`)
> - Specify here the link of the video you want to transcribe.

In [3]:
url = "https://www.youtube.com/watch?v=QGpmnA2JJ4U&ab_channel=VOALearningEnglish"

In [4]:
ydl_opts = {}

# First pass: fetch the metadata only, to get the video title.
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info_dict = ydl.extract_info(url, download=False)
video_title = info_dict['title']
# Remove characters that are not allowed in file names.
video_name = re.sub('[\\\\/*?:"<>|]', '', video_title)
name = video_name

# Second pass: download the audio track and convert it to WAV.
ydl_opts = {
    'format': 'm4a/bestaudio/best',
    'noplaylist': True,
    'continuedl': True,
    'outtmpl': f'./{name}.wav',
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'wav',
        'preferredquality': '192',
    }],
    'geo_bypass': True,
    'ffmpeg_location': '/usr/bin/ffmpeg',
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    error_code = ydl.download([url])

[youtube] QGpmnA2JJ4U: Downloading webpage
[youtube] QGpmnA2JJ4U: Downloading android player API JSON
[youtube] QGpmnA2JJ4U: Downloading webpage
[youtube] QGpmnA2JJ4U: Downloading android player API JSON
[info] QGpmnA2JJ4U: Downloading 1 format(s): 140
[download] Destination: ./Donald Trump Victory Speech.wav
[download] 100% of 4.70MiB in 00:00                   
[FixupM4a] Correcting container of "./Donald Trump Victory Speech.wav"
[ExtractAudio] Destination: ./Donald Trump Victory Speech.wav
Deleting original file ./Donald Trump Victory Speech.orig.wav (pass -k to keep)
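
The download cell strips a few characters from the video title before using it as a file name. As a minimal sketch of that step in isolation, the helper below applies the same character class; the name `sanitize_title` and the sample title are made up for illustration and are not part of the notebook.

```python
import re

# Same set of characters as the notebook's regular expression:
# \ / * ? : " < > |  (all invalid in Windows file names).
FORBIDDEN = r'[\\/*?:"<>|]'


def sanitize_title(title: str) -> str:
    """Remove characters that are unsafe in file names (hypothetical helper)."""
    return re.sub(FORBIDDEN, '', title)


print(sanitize_title('Q&A: "What is AI?" | Part 1/2'))
```

Titles without any of these characters, such as the one used in this notebook, pass through unchanged.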


### Splitting the audio (`min_per_split = 1`)
> - The free version of Google's transcriber (**`speech_recognition`**) limits the length of the audio that can be sent in one request (around 5 minutes). To work around this limit, the following script splits the audio into one-minute segments and writes them to a directory named **`splitted files for: Video_Name.wav`**
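
To make the splitting arithmetic concrete, here is a small sketch that only computes the segment boundaries in milliseconds (the unit pydub uses for slicing) without touching any audio. The function name `chunk_bounds_ms` is an assumption introduced for this example.

```python
import math


def chunk_bounds_ms(duration_seconds: float, min_per_split: int = 1):
    """Return (start_ms, end_ms) pairs that cover the whole recording."""
    total_ms = round(duration_seconds * 1000)
    step_ms = min_per_split * 60 * 1000
    n_chunks = math.ceil(total_ms / step_ms)
    # The last chunk is clipped to the end of the recording.
    return [(i * step_ms, min((i + 1) * step_ms, total_ms)) for i in range(n_chunks)]


bounds = chunk_bounds_ms(304.74)
print(len(bounds))   # number of one-minute segments
print(bounds[-1])    # the final, shorter segment
```

For a recording of about 305 seconds this gives six segments, five full minutes plus a short remainder.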

In [5]:
class SplitWavAudioMubin():
    """Split a WAV file into fixed-length chunks (lengths given in minutes)."""
    def __init__(self, folder, filename):
        self.folder = folder
        self.filename = filename
        self.filepath = os.path.join(folder, filename)
        self.audio = AudioSegment.from_wav(self.filepath)

    def get_duration(self):
        # Duration of the whole recording, in seconds.
        return self.audio.duration_seconds

    def single_split(self, from_min, to_min, split_filename):
        # pydub slices audio by milliseconds.
        t1 = from_min * 60 * 1000
        t2 = to_min * 60 * 1000
        split_audio = self.audio[t1:t2]
        split_audio.export(split_filename, format="wav")

    def multiple_split(self, min_per_split):
        total_mins = math.ceil(self.get_duration() / 60)
        for i in range(0, total_mins, min_per_split):
            # Prefix each chunk with its starting minute, e.g. "0_<name>.wav".
            split_fn = str(i) + '_' + self.filename
            self.single_split(i, i+min_per_split, split_fn)
        # Report success once, whatever the value of min_per_split.
        print('All splited successfully')
        print('>>> Video duration: ' + str(self.get_duration()))

def split_audio(file_name):
    # Write the chunks into their own directory and make it the working directory.
    directory = "splitted files for: " + file_name
    os.makedirs(directory, exist_ok=True)
    os.chdir(directory)
    split_wav = SplitWavAudioMubin("../", file_name)
    split_wav.multiple_split(min_per_split=1)

In [6]:
file_name = "{}.wav".format(video_name)
split_audio(file_name)

All splited successfully
>>> Video duration: 304.7386848072562


### Recognizing the text (`language = "en-US"`)
> - To perform speech recognition, we must first specify the language spoken in the video. Look up the code for that language in the [list of supported languages](https://cloud.google.com/speech-to-text/docs/languages). (In our case the language is **`English`**, so the code is **`en-US`**.)

In [7]:
search_dir = "./"
files = filter(os.path.isfile, os.listdir(search_dir))
files = [os.path.join(search_dir, f) for f in files] # add path to each file
files.sort(key=lambda x: os.path.getmtime(x))
files

['./0_Donald Trump Victory Speech.wav',
 './1_Donald Trump Victory Speech.wav',
 './2_Donald Trump Victory Speech.wav',
 './3_Donald Trump Victory Speech.wav',
 './4_Donald Trump Victory Speech.wav',
 './5_Donald Trump Victory Speech.wav']
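
The cell above orders the chunks by modification time, which works here because the files were written in order. As an alternative sketch, the chunks can be ordered by the numeric prefix that the splitting step puts in each file name; a plain alphabetical sort would not do, since it places `10_` before `2_`. The helper name `chunk_index` and the sample file names are assumptions for illustration.

```python
import os
import re


def chunk_index(path: str) -> int:
    """Extract the leading number from names like './10_Title.wav'."""
    match = re.match(r'(\d+)_', os.path.basename(path))
    if match is None:
        raise ValueError(f'no chunk index in {path!r}')
    return int(match.group(1))


names = ['./10_talk.wav', './2_talk.wav', './0_talk.wav']
print(sorted(names, key=chunk_index))
```

With this key, the order no longer depends on file timestamps, which can change if the chunks are copied or regenerated.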

In [8]:
texts = []
recognizer = sr.Recognizer()

for file in tqdm(files):
    with sr.AudioFile(file) as source:
        # record() reads the whole file; listen() can stop at the first long pause.
        recorded_audio = recognizer.record(source)
    try:
        text = recognizer.recognize_google(
                recorded_audio, 
                language="en-US" ## Replace with language keyword
            )
        texts.append(text)
    except Exception as ex:
        # A failed chunk is reported and skipped.
        print(ex)

100%|██████████| 6/6 [00:22<00:00,  3.74s/it]
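
The loop above skips a chunk when recognition fails, so the final transcript gives no hint that a minute is missing. One possible variation, sketched below, keeps a placeholder for each failed chunk and returns the list of failures. The recognizer is passed in as a plain function so the sketch runs without audio; `transcribe_chunks`, `fake_recognize`, and the placeholder text are all assumptions made for this example.

```python
from typing import Callable, List, Tuple


def transcribe_chunks(
    paths: List[str],
    recognize: Callable[[str], str],
    placeholder: str = '[unrecognized]',
) -> Tuple[List[str], List[str]]:
    """Transcribe each path in order; keep a placeholder for chunks that fail."""
    chunk_texts, failed = [], []
    for path in paths:
        try:
            chunk_texts.append(recognize(path))
        except Exception:
            chunk_texts.append(placeholder)
            failed.append(path)
    return chunk_texts, failed


# Stand-in recognizer for illustration only: it fails on one chunk.
def fake_recognize(path: str) -> str:
    if path.startswith('1_'):
        raise ValueError('could not understand audio')
    return f'text of {path}'


demo_texts, demo_failed = transcribe_chunks(['0_a.wav', '1_a.wav', '2_a.wav'], fake_recognize)
print(demo_texts)
print(demo_failed)
```

In the notebook, the function passed as `recognize` would wrap the `sr.AudioFile` and `recognize_google` calls from the cell above.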







In [9]:
texts

["it is my high honor and distinct privilege to introduce to you the president-elect of the United States of America Donald Trump thank you very much everything I just received a call from secretary Clinton she congratulated us it's about us and I congratulated her and her family on a very very hard for campaign I mean she she fought very hard",
 "Hillary has worked very long and very hard over a long. Of time and we owe her a major debt of gratitude for her service to our country I mean that very soon soon now it's time for America to bind the wounds of division have to get together do all Republicans and Democrats and independents across this nation I say it is time for us to come together as one United people time I pledge that I will be president for all Americans and this is so important to me",
 "for those who have chosen not to support me in the past of which there were a few people I'm reaching out to you for your guidance and your help so that we can work together and unify ou

### Saving the recognized text
> - After speech recognition, we save the resulting text in a file named **`Transcription_Video_Name.txt`**

In [10]:
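# Join the per-chunk transcripts into a single string.
result = " ".join(texts)

# Go back to the parent directory and write the transcript there.
os.chdir("../")
with open("Transcription_" + file_name[:-4] + ".txt", "w", encoding="utf-8") as text_file:
    text_file.write(result)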

### Translating the recognized text (`dest='fr'`)
> - Here we also translate the transcript into French, using the **`googletrans`** library.
> - To translate into another language, replace the value of `dest` with the code of the target language (in our case, **`dest='fr'`**).
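
The translation cell sends each one-minute transcript as a single request. If a piece of text were ever too long for one request, it could first be cut on word boundaries, as in the sketch below. The notebook does not state any size limit, so the default of 4000 characters is an illustrative assumption, and `split_by_chars` is a hypothetical helper.

```python
def split_by_chars(text: str, max_chars: int = 4000):
    """Split text on whitespace into pieces of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own piece.
    """
    pieces, current = [], ''
    for word in text.split():
        candidate = word if not current else current + ' ' + word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


print(split_by_chars('aa bb cc dd', max_chars=5))
```

Each piece can then be translated separately and the results joined with spaces, in the same way the notebook joins the per-chunk translations.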

In [11]:
translator = Translator()

translate_text = ""
for text in texts:
    translate_text += " " + translator.translate(text, dest='fr').text
print(translate_text)

 c'est pour moi un grand honneur et un privilège distinct de vous présenter le président élu des États-Unis d'Amérique Donald Trump merci beaucoup tout ce que je viens de recevoir un appel de la secrétaire Clinton elle nous a félicités c'est à propos de nous et je l'ai félicitée ainsi que sa famille sur une campagne très très dure, je veux dire qu'elle s'est battue très fort Hillary a travaillé très longtemps et très dur pendant longtemps. Du temps et nous lui devons une grande dette de gratitude pour son service envers notre pays, je veux dire que très bientôt, il est temps pour l'Amérique de panser les blessures de la division, il faut que tous les républicains, démocrates et indépendants de ce pays se réunissent, je dis il est temps pour nous de nous rassembler en un seul peuple uni, je promets que je serai président pour tous les Américains et c'est si important pour moi pour ceux qui ont choisi de ne pas me soutenir dans le passé dont il y avait quelques personnes je vous tends la

### Saving the translated text
> - We save the translated text in a file named **`Transcription_translated_Video_Name.txt`**

In [12]:
# Write as UTF-8 so that accented characters are stored correctly.
with open("Transcription_translated_" + file_name[:-4] + ".txt", "w", encoding="utf-8") as text_file:
    text_file.write(translate_text)

### General Idea summarization
> - Finally, we can use the transcript to produce a summary of the general idea discussed in the video.
> - Here it is necessary to specify **`max_length`** and **`min_length`**. By default, **`min_length`** is set to 10% of the number of words in the transcript, and **`max_length`** is fixed at 100.
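
For a long transcript, 10% of the word count can exceed the fixed `max_length`, which leaves the two bounds inconsistent. The sketch below shows one way to compute the pair so that `min_length` always stays below `max_length`. The helper name `summary_length_bounds` and its `divisor` parameter are assumptions for this example. Note also that the summarization pipeline measures these lengths in tokens, so a word count is only an approximation.

```python
def summary_length_bounds(text: str, divisor: int = 10, max_length: int = 100):
    """Return (min_length, max_length) with min_length kept below max_length."""
    n_words = len(text.split())
    # One tenth of the word count by default, but at least 1.
    min_length = max(1, n_words // divisor)
    # Clamp so that the lower bound never reaches the upper bound.
    min_length = min(min_length, max_length - 1)
    return min_length, max_length


print(summary_length_bounds('word ' * 500))
print(summary_length_bounds('word ' * 2000))
```

The returned pair can be passed to the summarizer as `min_length` and `max_length`.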

In [13]:
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

In [14]:
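ARTICLE = result
max_length = 100
# Keep min_length (10% of the word count) below max_length for long transcripts.
min_length = min(int(len(result.split(" ")) / 10), max_length - 1)
summary_text = summarizer(ARTICLE, max_length=max_length, min_length=min_length, do_sample=False)[0]["summary_text"]
print(summary_text)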

Donald Trump: It's time for America to bind the wounds of division have to get together do all Republicans and Democrats and independents across this nation. I pledge that I will be president for all Americans.


> - 🙌 Notebook made by [@labriji_saad](https://github.com/labrijisaad)
> - 🔗 LinkedIn [@labriji_saad](https://www.linkedin.com/in/labrijisaad/)