# Converting the file to MP3

Before processing, I decided to convert the video file to audio, since only the audio track matters for the later speech-to-text step.

For this I am using the `moviepy` library, which was already used in the other script and handles this conversion well.

In [3]:
!pip install moviepy

Defaulting to user installation because normal site-packages is not writeable
Collecting moviepy
  Downloading moviepy-1.0.3.tar.gz (388 kB)
  Preparing metadata (setup.py) ... done
Collecting proglog<=1.0.0
  Downloading proglog-0.1.10-py3-none-any.whl (6.1 kB)
Collecting imageio<3.0,>=2.5
  Downloading imageio-2.35.0-py3-none-any.whl (315 kB)
Collecting imageio_ffmpeg>=0.2.0
  Downloading imageio_ffmpeg-0.5.1-py3-none-manylinux2010_x86_64.whl (26.9 MB)
Building wheels for collected packages:

In [4]:
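import os
from moviepy.editor import VideoFileClip

def convert_video_to_audio(video_file, output_file, extension):
    # Make sure the output folder exists before ffmpeg tries to write into it.
    os.makedirs(os.path.dirname(output_file) or '.', exist_ok=True)

    video = VideoFileClip(video_file)
    audio = video.audio
    try:
        # video.audio is None when the file has no audio track.
        if audio is None:
            raise ValueError(f"No audio track found in {video_file}")

        # The extension doubles as the codec name handed to ffmpeg. This works
        # for 'mp3'; other formats may need an explicit codec name instead.
        audio.write_audiofile(output_file, codec=extension)
    finally:
        # Release the ffmpeg readers even if the export fails.
        if audio is not None:
            audio.close()
        video.close()

extension = 'mp3'
video_file = 'data/big-o.mp4'
audio_file = 'audio/big-o.{}'.format(extension)

convert_video_to_audio(video_file, audio_file, extension)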

MoviePy - Writing audio in audio/big-o.mp3
MoviePy - Done.
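
The output path in the cell above is written by hand. As a small sketch, a hypothetical helper (`audio_path_for` and its `output_folder` default are illustrative assumptions, not part of the original notebook) could derive it from the video file name, assuming the audio should keep the same base name:

```python
from pathlib import Path

def audio_path_for(video_file, extension, output_folder="audio"):
    """Build '<output_folder>/<video base name>.<extension>' (hypothetical helper)."""
    base_name = Path(video_file).stem  # 'data/big-o.mp4' -> 'big-o'
    # as_posix() keeps forward slashes regardless of the operating system.
    return (Path(output_folder) / f"{base_name}.{extension}").as_posix()

print(audio_path_for("data/big-o.mp4", "mp3"))  # audio/big-o.mp3
```

This keeps the video name and the audio name in sync if the input file changes.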


## Libraries for creating the audio chunks

To split the audio file into several files, I decided to use the Python library `pydub`, since it makes this process easier.

In [1]:
!pip install pydub

Defaulting to user installation because normal site-packages is not writeable
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1


## Splitting the audio

In this step I load the audio file and slice it into chunks. The chunk length is expressed in milliseconds (`ms`) and is derived from the constant `CHUNK_MINUTES`.

`pydub` relies on the `ffmpeg` program to read and write compressed formats such as MP3, so it has to be installed on the system. Here is a brief installation guide.

* On Ubuntu/Debian:
```bash
sudo apt install ffmpeg
```

* On macOS (using Homebrew):
```bash
brew install ffmpeg
```

* On Windows:
Download `ffmpeg` [here](https://ffmpeg.org/download.html).
Extract the files and add the path to the `ffmpeg` `bin` folder to the `PATH` environment variable.

* Verify the installation:
After installing, you can check that `ffmpeg` is set up correctly by running:
```bash
ffmpeg -version
```
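
The same check can also be done from inside the notebook. The sketch below only looks for an `ffmpeg` executable on `PATH` using the standard library; the helper name `ffmpeg_available` is an assumption introduced for illustration:

```python
import shutil

def ffmpeg_available():
    """Return True when an 'ffmpeg' executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None

if not ffmpeg_available():
    print("ffmpeg was not found on PATH; install it before running the next cell.")
```

Failing early here gives a clearer message than an error raised in the middle of the audio export.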

In [10]:
import os
from pydub import AudioSegment

CHUNK_MINUTES = 3

def split_audio(file_path, output_folder):
    # pydub measures audio in milliseconds.
    chunk_length = CHUNK_MINUTES * 60 * 1000

    file_name, file_extension = os.path.splitext(file_path)
    output_format = file_extension.lstrip('.').lower()

    audio = AudioSegment.from_file(file_path)
    duration = len(audio)  # total length in ms

    # Ceiling division: a trailing partial chunk still counts as one chunk.
    chunks_number = duration // chunk_length + (1 if duration % chunk_length else 0)

    # Create the output folder once, before exporting anything.
    os.makedirs(output_folder, exist_ok=True)

    for i in range(chunks_number):
        start_time = i * chunk_length
        end_time = min((i + 1) * chunk_length, duration)

        chunk = audio[start_time:end_time]
        chunk_name = f"part_{i + 1}.{output_format}"

        chunk.export(os.path.join(output_folder, chunk_name), format=output_format)
        print(f"Generated {chunk_name} from {file_name}")


file_path = "audio/big-o.mp3"
output_folder = "chunks"

split_audio(file_path, output_folder)

Generated part_1.mp3 from audio/big-o
Generated part_2.mp3 from audio/big-o
Generated part_3.mp3 from audio/big-o
Generated part_4.mp3 from audio/big-o
Generated part_5.mp3 from audio/big-o
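
To make the slicing arithmetic concrete, here is a small sketch that computes only the chunk boundaries, without touching any audio. The helper `chunk_bounds` and the 7.5-minute duration are illustrative assumptions, not values taken from the real recording:

```python
def chunk_bounds(duration_ms, chunk_length_ms):
    """Return (start, end) pairs in milliseconds covering the whole duration."""
    bounds = []
    for start in range(0, duration_ms, chunk_length_ms):
        # The last chunk is clipped so it never runs past the end of the audio.
        bounds.append((start, min(start + chunk_length_ms, duration_ms)))
    return bounds

# Hypothetical 7.5-minute recording split into 3-minute chunks.
three_minutes = 3 * 60 * 1000
print(chunk_bounds(450_000, three_minutes))
# [(0, 180000), (180000, 360000), (360000, 450000)]
```

The final pair is shorter than the others, which is why the chunk count has to be rounded up rather than down.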


# Converting each audio chunk to text

After splitting the audio file into several pieces, each piece is transcribed to text. I list every file in the `chunks` folder and use a thread pool to transcribe up to 5 files at a time; the results are then written to a single file containing all the text extracted from the audio.
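
The idea of a bounded thread pool whose results stay in file order can be sketched as follows. `fake_transcribe` is a stand-in that only sleeps, so this is an illustration of the scheduling pattern under that assumption, not the real transcription step:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_transcribe(name):
    # Stand-in for a real transcription call; later files finish first here.
    delay = {"part_1.mp3": 0.03, "part_2.mp3": 0.02, "part_3.mp3": 0.01}[name]
    time.sleep(delay)
    return f"text of {name}"

files = ["part_1.mp3", "part_2.mp3", "part_3.mp3"]

# At most 5 worker threads run at once; map() yields results in input order.
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fake_transcribe, files))

print(results)
# ['text of part_1.mp3', 'text of part_2.mp3', 'text of part_3.mp3']
```

Even though the third file finishes first, the results come back in the order the files were listed, which is what keeps the final transcript readable.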

In [12]:
!pip install speechrecognition

Defaulting to user installation because normal site-packages is not writeable
Collecting speechrecognition
  Downloading SpeechRecognition-3.10.4-py2.py3-none-any.whl (32.8 MB)
Installing collected packages: speechrecognition
Successfully installed speechrecognition-3.10.4


In [4]:
import os
import uuid
import speech_recognition as sr
from moviepy.editor import AudioFileClip
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor

ALLOWED_EXTENSIONS = {'.mp3', '.wav', '.flac'}
BATCH_SIZE = 5  # maximum number of files transcribed at the same time

def generate_temp_filename():
    # Timestamp plus a random id keeps names unique across threads.
    timestamp = datetime.now().strftime('%Y%m%d%H%M%S')
    unique_id = uuid.uuid4().hex
    return f"temp_{timestamp}_{unique_id}.wav"

def transcribe_audio(audio_path):
    recognizer = sr.Recognizer()
    temp_wav = generate_temp_filename()

    try:
        # sr.AudioFile does not read MP3, so convert the chunk to 16-bit PCM WAV first.
        audio_clip = AudioFileClip(audio_path)
        try:
            audio_clip.write_audiofile(temp_wav, codec='pcm_s16le')
        finally:
            audio_clip.close()

        with sr.AudioFile(temp_wav) as source:
            audio_data = recognizer.record(source)

        return recognizer.recognize_google(audio_data, language="pt-BR")
    except sr.UnknownValueError:
        # No intelligible speech in this chunk; the marker stays in Portuguese
        # to match the pt-BR transcript.
        return "[Inaudível]"
    except sr.RequestError as e:
        # Network or API failure: report it instead of failing silently.
        print(f"Request failed for {audio_path}: {e}")
        return ""
    finally:
        # Remove the temporary WAV only after it has been closed.
        if os.path.exists(temp_wav):
            os.remove(temp_wav)

def process_file(file_name, folder_path):
    file_path = os.path.join(folder_path, file_name)
    if os.path.isfile(file_path):
        print(f"Transcribing {file_name}...")
        transcription = transcribe_audio(file_path)
        return transcription
    return None

def transcribe_folder(folder_path):
    # Keep only regular files with a supported audio extension, sorted by name.
    files = [
        f for f in sorted(os.listdir(folder_path))
        if os.path.isfile(os.path.join(folder_path, f))
        and os.path.splitext(f)[1].lower() in ALLOWED_EXTENSIONS
    ]

    all_transcriptions = []
    with ThreadPoolExecutor(max_workers=BATCH_SIZE) as executor:
        futures = [executor.submit(process_file, file, folder_path) for file in files]
        # Reading the futures in submission order keeps the text in file order.
        for future in futures:
            transcription = future.result()
            all_transcriptions.append(transcription)

    return all_transcriptions

def save_transcriptions(transcriptions, output_folder):
    os.makedirs(output_folder, exist_ok=True)

    output_file = os.path.join(output_folder, "transcriptions.txt")
    with open(output_file, 'w', encoding='utf-8') as f:
        for transcription in transcriptions:
            f.write(f"\n{transcription}\n")

input_folder = "chunks"
output_folder = "data"
transcriptions = transcribe_folder(input_folder)
save_transcriptions(transcriptions, output_folder)

print(f"Transcriptions saved to {os.path.join(output_folder, 'transcriptions.txt')}")


Transcribing part_1.mp3...
Transcribing part_2.mp3...
Transcribing part_3.mp3...
Transcribing part_4.mp3...
Transcribing part_5.mp3...
MoviePy - Writing audio in temp_20240816121519_a786c727ac5043cdb7ba1a711341a64d.wav
MoviePy - Writing audio in temp_20240816121519_a52f81c802b940b1b84eff1f2cfde0a0.wav
MoviePy - Writing audio in temp_20240816121519_453d698979ed42fa9ed8ed52b0436235.wav
MoviePy - Writing audio in temp_20240816121519_e0692f99dcb84f61b3f43c347ccd4f5f.wav
MoviePy - Writing audio in temp_20240816121519_1fa1d5a721914530b2464022e7db356d.wav
MoviePy - Done.
MoviePy - Done.
MoviePy - Done.
MoviePy - Done.
MoviePy - Done.
Transcriptions saved to data/transcriptions.txt
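
One caveat about file order: with the 5 chunks here, `sorted(os.listdir(...))` returns the files in the right sequence, but plain string sorting would place `part_10.mp3` before `part_2.mp3` once there are ten or more chunks. A possible fix, sketched below with a hypothetical `natural_key` helper (not part of the original notebook), is to sort on the numeric part of the name:

```python
import re

def natural_key(file_name):
    """Split a name into text and integer parts so 'part_10' sorts after 'part_2'."""
    return [int(piece) if piece.isdigit() else piece.lower()
            for piece in re.split(r'(\d+)', file_name)]

names = ["part_10.mp3", "part_2.mp3", "part_1.mp3"]
print(sorted(names))                   # ['part_1.mp3', 'part_10.mp3', 'part_2.mp3']
print(sorted(names, key=natural_key))  # ['part_1.mp3', 'part_2.mp3', 'part_10.mp3']
```

Passing `key=natural_key` to the `sorted` call in `transcribe_folder` would keep longer recordings in order; zero-padding the chunk numbers when exporting would be an alternative.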
