## Voice processor PoC

### Goal

Free up Lucas's time by automatically processing his voice notes and turning them into feedback for kart drivers.

### Estimated process

Inputs:

- Voice-recorded feedback, split by (1) race and (2) turn
- PDF with race results to identify drivers by their kart number

Processing:

- Create the driver/kart number/position record in memory
- Transform "raw" voice feedback into text and store it in the database
- Split the feedback by driver and turn

Outputs:

- Generate polished text feedback to be sent via WhatsApp
- Generate polished audio feedback to be sent via WhatsApp
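
As a rough illustration of the text output, the sketch below formats one driver's notes into a single message. The function name `format_feedback`, the note fields (`turn`, `text`) and the message layout are assumptions made for this example, not part of the PoC. The asterisks rely on WhatsApp rendering `*text*` as bold.

```python
def format_feedback(driver_name, race_name, notes):
    """Build one plain-text message from a driver's notes (hypothetical layout)."""
    lines = [f"*{race_name}* - feedback for {driver_name}"]
    # List the notes in turn order, one bullet per note
    for note in sorted(notes, key=lambda n: n["turn"]):
        lines.append(f"- Turn {note['turn']}: {note['text']}")
    return "\n".join(lines)


sample_notes = [{"turn": 1, "text": "está perdendo o ápice da curva"}]
message = format_feedback("Lucas Prates", "Tabajara 1", sample_notes)
print(message)
```

Keeping the formatting in one small function would make it easy to change the wording later without touching the transcription or parsing steps.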


## Step 1

Process the race results PDF to build the list of drivers

In [27]:
from dataclasses import dataclass

@dataclass
class Driver:
    position: int
    kart_number: int
    name: str


In [40]:
from pathlib import Path
import PyPDF2

# Define race name
race_name = "Tabajara 1"
drivers = []

# Find PDF file in data directory
race_dir = Path(f"data/input/{race_name}")
pdf_files = list(race_dir.glob("*.pdf"))

if not pdf_files:
    print(f"No PDF files found in {race_dir}")
else:
    pdf_path = pdf_files[0]

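    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        first_page = reader.pages[0]
        # Keep only the text after the first "UF" (presumably the last column header)
        text = first_page.extract_text().split("UF")[1]

        # Rows are delimited by "MG" (presumably the state code ending each row);
        # this breaks if a row has another state or a name contains "MG"
        for line in text.split('MG'):
            if line.strip():
                parts = line.split(" ")

                # Stop at the line starting with "Melhor", which follows the last driver row
                if parts[0] == "\nMelhor":
                    break

                position = int(parts[0])
                kart_number = int(parts[1])
                # The name sits between the kart number and the last 9 fields
                driver_name = " ".join(parts[2:len(parts)-9]).replace("\n", "")
                driver = Driver(position=position, kart_number=kart_number, name=driver_name)
                drivers.append(driver)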

print(drivers)


[Driver(position=1, kart_number=137, name='EMILIO DE SOUZA AMADEI BERIN...'), Driver(position=2, kart_number=144, name='JULIO RIBEIRO'), Driver(position=3, kart_number=164, name='THIAGO OLIVEIRA'), Driver(position=4, kart_number=107, name='Luis Caceres'), Driver(position=5, kart_number=143, name='EMILIO S A BERINGHS PAI'), Driver(position=6, kart_number=147, name='Guiga Guilherme Rodrigues'), Driver(position=7, kart_number=140, name='RODRIGO FERNANDES FARIA'), Driver(position=8, kart_number=126, name='Lucas Prates'), Driver(position=9, kart_number=156, name='Carlos Sampaio'), Driver(position=10, kart_number=142, name='ANDRÉ LA ROCCA'), Driver(position=11, kart_number=136, name='GERALDO MAGELA JACINTO'), Driver(position=12, kart_number=116, name='CLAUDEMIR SILVEIRA MAGNO'), Driver(position=13, kart_number=167, name='Newton Angelini')]
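
Since the voice notes refer to drivers by kart number, a lookup keyed on that number is a natural next step. The sketch below is one possible way to build it. The helper name `index_by_kart` and the duplicate check are assumptions for illustration, and the sample list reuses two drivers from the output above.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    position: int
    kart_number: int
    name: str


def index_by_kart(drivers):
    """Map kart number -> Driver, rejecting duplicate kart numbers."""
    index = {}
    for driver in drivers:
        if driver.kart_number in index:
            raise ValueError(f"Duplicate kart number: {driver.kart_number}")
        index[driver.kart_number] = driver
    return index


sample = [
    Driver(position=8, kart_number=126, name="Lucas Prates"),
    Driver(position=13, kart_number=167, name="Newton Angelini"),
]
drivers_by_kart = index_by_kart(sample)
print(drivers_by_kart[126].name)
```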


## Step 2

Speech to text. First, convert the .opus files to .ogg files.

In [42]:
import ffmpeg

# Find opus files in race directory
opus_files = list(race_dir.glob("*.opus"))

if not opus_files:
    print(f"No opus files found in {race_dir}")
else:
    for opus_file in opus_files:
        output_file = opus_file.with_suffix('.ogg')
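        # overwrite_output=True passes -y so re-running the cell does not stall on an existing .ogg
        ffmpeg.input(str(opus_file)).output(str(output_file)).run(overwrite_output=True)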


ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
  built with Apple clang version 17.0.0 (clang-1700.0.13.3)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/7.1.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex
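
A possible alternative to re-encoding is to copy the audio stream into the new container. The sketch below assumes the .opus files already hold an Opus stream in an Ogg container, which is common for WhatsApp voice notes but has not been verified for these files. The helper name `build_remux_command` and the sample path are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_remux_command(opus_file: Path) -> list[str]:
    """Build an ffmpeg command that copies the audio stream into an .ogg file."""
    output_file = opus_file.with_suffix(".ogg")
    # -y overwrites an existing output; -c:a copy keeps the audio stream as-is
    return ["ffmpeg", "-y", "-i", str(opus_file), "-c:a", "copy", str(output_file)]


command = build_remux_command(Path("data/input/Tabajara 1/note.opus"))
print(command)

# Only run the conversion when ffmpeg is installed and the (hypothetical) file exists
if shutil.which("ffmpeg") and Path(command[3]).exists():
    subprocess.run(command, check=True)
```

If the assumption holds, copying the stream avoids a lossy re-encode and is faster. If it does not, ffmpeg reports an error and the re-encoding approach above remains the fallback.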

Send each .ogg file to OpenAI for transcription

In [43]:
import openai
from dotenv import load_dotenv

load_dotenv()

# Find ogg files in race directory
ogg_files = list(race_dir.glob("*.ogg"))
transcriptions = []

if not ogg_files:
    print(f"No ogg files found in {race_dir}")
else:
    for ogg_file in ogg_files:
        print(f"Transcribing {ogg_file.name}...")
        
        # Open and read the audio file
        with open(ogg_file, "rb") as audio_file:
            # Call OpenAI API for speech-to-text
            transcript = openai.audio.transcriptions.create(
                model="gpt-4o-transcribe",
                file=audio_file,
                language="pt"
            )
            
            transcriptions.append({
                "file": ogg_file.name,
                "text": transcript.text
            })

print("\nTranscriptions:")
for t in transcriptions:
    print(f"\n{t['file']}:")
    print(t['text'])


Transcribing WhatsApp Audio 2025-07-21 at 14.06.49.ogg...

Transcriptions:

WhatsApp Audio 2025-07-21 at 14.06.49.ogg:
Curva um, kart 167 está freando muito tarde na curva. Kart 147 está freando cedo demais e saindo devagar. 126 está perdendo o ápice da curva.
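
The remaining processing step, splitting the feedback by driver and turn, could start from a simple rule-based pass over the transcription. The sketch below is only one possible approach, under several assumptions: each sentence concerns a single kart, a turn is announced at the start of a sentence as "Curva <number>", and a number counts as a kart only if it appears in the race results. The names `split_feedback`, `TURN_WORDS`, `TURN_RE` and `KART_RE` are hypothetical.

```python
import re

# Portuguese number words that may follow "Curva" (extend as needed)
TURN_WORDS = {"um": 1, "uma": 1, "dois": 2, "duas": 2, "três": 3, "tres": 3,
              "quatro": 4, "cinco": 5, "seis": 6, "sete": 7, "oito": 8,
              "nove": 9, "dez": 10}

TURN_RE = re.compile(r"^\s*curva\s+(\w+)[\s,:-]*", re.IGNORECASE)
KART_RE = re.compile(r"\b\d+\b")


def split_feedback(text, known_karts):
    """Return a list of {turn, kart_number, text} notes, one per sentence."""
    notes = []
    turn = None  # last turn announced; carried over to following sentences
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        match = TURN_RE.match(sentence)
        if match:
            word = match.group(1).lower()
            parsed = int(word) if word.isdigit() else TURN_WORDS.get(word)
            if parsed is not None:
                turn = parsed
                sentence = sentence[match.end():]  # drop the "Curva um, " prefix
        # First number in the sentence that is a known kart number, if any
        kart = next((int(n) for n in KART_RE.findall(sentence)
                     if int(n) in known_karts), None)
        if kart is not None:
            notes.append({"turn": turn, "kart_number": kart,
                          "text": sentence.strip()})
    return notes


transcript = ("Curva um, kart 167 está freando muito tarde na curva. "
              "Kart 147 está freando cedo demais e saindo devagar. "
              "126 está perdendo o ápice da curva.")
notes = split_feedback(transcript, known_karts={167, 147, 126})
for note in notes:
    print(note)
```

Rules like these are brittle against free-form speech, so they are best seen as a baseline to compare against before trying a language-model-based split.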
