## Voice processor PoC

### Goal

Give Lucas back some free time by automatically processing his voice notes and turning them into feedback for kart drivers.

### Estimated process

Inputs:

- Voice-recorded feedback, split by (1) race and (2) turn
- A PDF with the race results, used to identify drivers by their kart number

Processing:

- Build an in-memory record of each driver's name, kart number, and finishing position
- Transcribe the raw voice feedback into text and store it in memory
- Split the feedback by driver and turn

Outputs:

- Generate a friendly text feedback message to send via WhatsApp
- Generate a friendly recorded audio feedback message to send via WhatsApp
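The processing steps above boil down to an in-memory index from kart to turn to remarks. A minimal sketch, assuming each transcribed remark has already been tagged with its turn and kart number; `Remark` and `group_by_kart_and_turn` are hypothetical names, not part of the notebook:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Remark:
    # Hypothetical record: one transcribed remark about one kart in one turn
    turn: int
    kart_number: int
    text: str


def group_by_kart_and_turn(remarks):
    """Return {kart_number: {turn: [remark texts]}}, keeping the input order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for remark in remarks:
        grouped[remark.kart_number][remark.turn].append(remark.text)
    # Convert the nested defaultdicts into plain dicts for printing and comparison
    return {kart: dict(turns) for kart, turns in grouped.items()}
```

For example, `group_by_kart_and_turn([Remark(1, 126, "perdendo o ápice"), Remark(8, 126, "perdendo o ápice")])` gives kart 126 one entry per turn, which is the shape the per-driver messages need.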


## Step 1

Process the race-results PDF

In [3]:
from dataclasses import dataclass

@dataclass
class Driver:
    position: int
    kart_number: int
    name: str


In [4]:
from pathlib import Path
import PyPDF2

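# Define race name
race_name = "Tabajara 1"
drivers = []

# Find the race-results PDF in the race's input directory
race_dir = Path(f"data/input/{race_name}")
pdf_files = list(race_dir.glob("*.pdf"))

if not pdf_files:
    print(f"No PDF files found in {race_dir}")
else:
    pdf_path = pdf_files[0]

    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        first_page = reader.pages[0]
        # Keep only the text after the first "UF" (assumed to be the state column header),
        # i.e. the rows of the results table
        text = first_page.extract_text().split("UF")[1]

        # Each row is assumed to end with the driver's state ("MG"), so it is used as the row separator.
        # Fragile: this breaks if a driver is from another state or a name contains "MG".
        for line in text.split('MG'):
            if line.strip():
                parts = line.split(" ")

                # The line starting with "Melhor" follows the table and marks the end of the results
                if parts[0] == "\nMelhor":
                    break

                position = int(parts[0])
                kart_number = int(parts[1])
                # The name sits between the kart number and the last 9 tokens of the row
                driver_name = " ".join(parts[2:-9]).replace("\n", "")
                driver = Driver(position=position, kart_number=kart_number, name=driver_name)
                drivers.append(driver)

print(drivers)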


[Driver(position=1, kart_number=137, name='EMILIO DE SOUZA AMADEI BERIN...'), Driver(position=2, kart_number=144, name='JULIO RIBEIRO'), Driver(position=3, kart_number=164, name='THIAGO OLIVEIRA'), Driver(position=4, kart_number=107, name='Luis Caceres'), Driver(position=5, kart_number=143, name='EMILIO S A BERINGHS PAI'), Driver(position=6, kart_number=147, name='Guiga Guilherme Rodrigues'), Driver(position=7, kart_number=140, name='RODRIGO FERNANDES FARIA'), Driver(position=8, kart_number=126, name='Lucas Prates'), Driver(position=9, kart_number=156, name='Carlos Sampaio'), Driver(position=10, kart_number=142, name='ANDRÉ LA ROCCA'), Driver(position=11, kart_number=136, name='GERALDO MAGELA JACINTO'), Driver(position=12, kart_number=116, name='CLAUDEMIR SILVEIRA MAGNO'), Driver(position=13, kart_number=167, name='Newton Angelini')]
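Splitting on `"MG"` and stopping at `"\nMelhor"` ties the parser to this particular PDF. A slightly more defensive sketch moves the per-row logic into a function that skips anything whose first two tokens aren't integers, while keeping the notebook's assumption that the name is followed by exactly 9 trailing tokens. `parse_row` and `TRAILING_TOKENS` are hypothetical names, and the sample row below is made up:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Driver:
    position: int
    kart_number: int
    name: str


# Same assumption as the slice parts[2:-9] in the cell above
TRAILING_TOKENS = 9


def parse_row(row: str) -> Optional[Driver]:
    """Parse one results row split on "MG", or return None if it isn't a driver row."""
    parts = row.split(" ")
    if len(parts) <= 2 + TRAILING_TOKENS:
        return None  # too short to hold position, kart, name and the trailing columns
    try:
        # int() tolerates surrounding whitespace such as a leading "\n"
        position, kart_number = int(parts[0]), int(parts[1])
    except ValueError:
        return None  # header/footer text, e.g. the "Melhor ..." line
    name = " ".join(parts[2:-TRAILING_TOKENS]).replace("\n", "")
    return Driver(position=position, kart_number=kart_number, name=name)
```

Inside the loop, `driver = parse_row(line)` followed by `if driver: drivers.append(driver)` would replace the body. The `"MG"` row separator is still an assumption and would need the same check.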


## Step 2

Speech to text. First, convert the .opus files to .ogg, a format the OpenAI transcription endpoint accepts.

In [5]:
import ffmpeg

# Find opus files in race directory
opus_files = list(race_dir.glob("*.opus"))

if not opus_files:
    print(f"No opus files found in {race_dir}")
else:
    for opus_file in opus_files:
        output_file = opus_file.with_suffix('.ogg')
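        # overwrite_output=True adds ffmpeg's -y flag, so re-running the cell doesn't stall on the "overwrite?" prompt
        ffmpeg.input(str(opus_file)).output(str(output_file)).run(overwrite_output=True)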



Then send each .ogg file to the OpenAI transcription API.

In [6]:
import openai
from dotenv import load_dotenv

load_dotenv()

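# Find ogg files in race directory, sorted by name so the notes keep their recording order
# (WhatsApp file names embed the date and time)
ogg_files = sorted(race_dir.glob("*.ogg"))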
transcriptions = []

if not ogg_files:
    print(f"No ogg files found in {race_dir}")
else:
    for ogg_file in ogg_files:
        print(f"Transcribing {ogg_file.name}...")
        
        # Open and read the audio file
        with open(ogg_file, "rb") as audio_file:
            # Call OpenAI API for speech-to-text
            transcript = openai.audio.transcriptions.create(
                model="gpt-4o-transcribe",
                file=audio_file,
                language="pt"
            )
            
            transcriptions.append({
                "file": ogg_file.name,
                "text": transcript.text
            })

print("\nTranscriptions:")
for t in transcriptions:
    print(f"\n{t['file']}:")
    print(t['text'])


Transcribing WhatsApp Audio 2025-07-21 at 14.06.49.ogg...
Transcribing WhatsApp Audio 2025-07-21 at 14.35.47.ogg...
Transcribing WhatsApp Audio 2025-07-21 at 14.36.12.ogg...

Transcriptions:

WhatsApp Audio 2025-07-21 at 14.06.49.ogg:
Curva 1, kart 167 tá freando muito tarde na curva. Kart 147 tá freando cedo demais e saindo devagar. 126 tá perdendo o ápice da curva.

WhatsApp Audio 2025-07-21 at 14.35.47.ogg:
Curva 2: kart 67 tá fazendo direitinho. Kart 147 tá freando no lugar correto, mas saindo devagar. Kart 126 tá fazendo a curva muito lento.

WhatsApp Audio 2025-07-21 at 14.36.12.ogg:
Curva oito. Kart 126 tá perdendo o ápice da curva. Kart 147 tá contornando muito longe do ápice. Kart 167 tá freando em cima da lombada, devia frear um pouco depois.
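The second note says "kart 67", but the results PDF lists no kart 67 (it does list 167), so that feedback later ends up addressed to "Piloto não identificado". A cheap guard is to check every number mentioned in a transcript against the kart numbers parsed in Step 1 and suggest the closest known one. A sketch, assuming kart numbers have 2 or 3 digits while turns are said with a single digit or as a word, which holds for these notes but not for every track; `check_kart_numbers` is a hypothetical helper:

```python
import re
from difflib import get_close_matches

# Assumption: kart numbers have 2-3 digits; single-digit numbers are turns
KART_PATTERN = re.compile(r"\b\d{2,3}\b")


def check_kart_numbers(text, known_karts):
    """Return {mentioned_but_unknown_kart: [closest known kart number, if any]}."""
    known = [str(k) for k in known_karts]
    report = {}
    for number in KART_PATTERN.findall(text):
        if number not in known:
            # difflib ranks candidates by string similarity ("67" vs "167" scores 0.8)
            suggestions = get_close_matches(number, known, n=1, cutoff=0.6)
            report[int(number)] = [int(s) for s in suggestions]
    return report
```

Running `check_kart_numbers(t["text"], [d.kart_number for d in drivers])` for each transcription before Step 3 would surface the suspicious kart 67 together with 167 as a likely correction, leaving the final call to Lucas.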


## Step 3

Summarize the feedback by driver and turn
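The summarization below leaves it to the model to read the turn number from each note, and the notes say it both as a digit ("Curva 1", "Curva 2:") and as a word ("Curva oito"). If a deterministic turn label is wanted, for example to group remarks by turn before prompting, a small parser can handle both forms. A sketch; `extract_turn` and its word table (turns up to 12 only) are hypothetical:

```python
import re
from typing import Optional

# Portuguese number words for turns 1-12 (hypothetical upper bound; extend for longer tracks)
PT_NUMBER_WORDS = {
    "um": 1, "uma": 1, "dois": 2, "duas": 2, "três": 3, "tres": 3,
    "quatro": 4, "cinco": 5, "seis": 6, "sete": 7, "oito": 8,
    "nove": 9, "dez": 10, "onze": 11, "doze": 12,
}

# "curva" followed by whitespace and then either digits or a (possibly accented) word
TURN_PATTERN = re.compile(r"\bcurva\s+(\d+|[a-záéíóúâêôãõç]+)", re.IGNORECASE)


def extract_turn(text: str) -> Optional[int]:
    """Return the first turn number mentioned as "curva <n>", or None if there is none."""
    for match in TURN_PATTERN.finditer(text):
        token = match.group(1).lower()
        if token.isdigit():
            return int(token)
        if token in PT_NUMBER_WORDS:
            return PT_NUMBER_WORDS[token]
        # Otherwise "curva" was followed by an ordinary word ("a curva muito lento"); keep looking
    return None
```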

In [11]:
# Create a prompt with all transcriptions
prompt = """
Please analyze these race feedbacks and:
1. Organize them by kart number
2. Make a summary in Brazilian Portuguese

Feedbacks:
"""

for t in transcriptions:
    prompt += f"\n{t['file']}:\n{t['text']}\n"

# Get summary from OpenAI
response = openai.chat.completions.create(
    model="o4-mini-2025-04-16",
    messages=[
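        # JSON mode returns a JSON object (see the output below), so ask for the list under an explicit "result" key
        {"role": "system", "content": "You will analyze race feedback and return a JSON object with a single key 'result' whose value is an array where each item has 'kart' (the kart number) and 'feedback' (a summary of the feedback for that kart in Brazilian Portuguese). Format: {\"result\": [{\"kart\": \"number\", \"feedback\": \"text\"}]}"},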
        {"role": "user", "content": prompt}
    ],
    response_format={ "type": "json_object" }
)

print("\nResumo organizado por kart:")
feedback_json = response.choices[0].message.content
print(feedback_json)


Resumo organizado por kart:

{"result":[
  {"kart":"67","feedback":"Executa corretamente a curva 2, sem observações relevantes."},
  {"kart":"126","feedback":"Perde o ápice nas curvas 1 e 8 e faz a curva 2 muito lentamente; precisa melhorar a linha de corrida e aumentar a velocidade."},
  {"kart":"147","feedback":"Frena cedo demais na curva 1 e, embora o ponto de frenagem na curva 2 esteja correto, sai devagar; contorna longe do ápice na curva 8; deve otimizar frenagem, saída e aproximação ao ápice."},
  {"kart":"167","feedback":"Frena muito tarde na curva 1 e em cima da lombada na curva 8; deve ajustar e antecipar levemente o ponto de frenagem após a lombada."}
]}
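The model wrapped the array in a `"result"` key and returned kart numbers as strings, and Step 4 depends on both details. JSON mode (`response_format={"type": "json_object"}`) guarantees valid JSON but, as far as the notebook shows, not a specific shape, so a small normalizing step can catch surprises before any message is generated. A sketch; `parse_feedback` is a hypothetical helper:

```python
import json


def parse_feedback(raw: str) -> list:
    """Normalize the model's JSON into [{"kart": int, "feedback": str}, ...]."""
    data = json.loads(raw)
    # Accept either a bare array or an object wrapping exactly one array (e.g. under "result")
    if isinstance(data, dict):
        lists = [value for value in data.values() if isinstance(value, list)]
        if len(lists) != 1:
            raise ValueError(f"Expected exactly one list in the response, got keys {list(data)}")
        data = lists[0]
    items = []
    for item in data:
        if not isinstance(item, dict) or not all(key in item for key in ("kart", "feedback")):
            raise ValueError(f"Missing 'kart' or 'feedback' in {item!r}")
        # Kart numbers arrive as strings ("67"); convert so they compare with Driver.kart_number
        items.append({"kart": int(item["kart"]), "feedback": str(item["feedback"]).strip()})
    return items
```

With this in place, Step 4 could loop over `parse_feedback(feedback_json)` instead of indexing `feedbacks["result"]` directly.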


## Step 4

Craft a friendly feedback message for each driver

In [26]:
import json

@dataclass
class FinalFeedback:
    kart_number: int
    driver_name: str
    message: str

# Parse the JSON string into a Python object
feedbacks = json.loads(feedback_json)

final_feedbacks = []

# Create WhatsApp messages for each kart
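for feedback in feedbacks["result"]:
    # The model returns kart numbers as strings; convert once so they match Driver.kart_number
    kart_number = int(feedback['kart'])

    # Find the driver name for this kart, falling back to a placeholder when the kart isn't in the results PDF
    driver_name = next(
        (driver.name for driver in drivers if driver.kart_number == kart_number),
        "Piloto não identificado",
    )
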
    # Create personalized message using OpenAI
    response = openai.chat.completions.create(
        model="o4-mini-2025-04-16",
        messages=[
            {"role": "system", "content": "You are a friendly racing coach writing a WhatsApp message to a kart driver about their race feedback. Write in Brazilian Portuguese in a motivating and constructive way."},
            {"role": "user", "content": f"Driver name: {driver_name}\nKart number: {kart_number}\nFeedback: {feedback['feedback']}\n\nWrite a friendly WhatsApp message incorporating this feedback."}
        ]
    )
    
    message = response.choices[0].message.content

    print("\n" + "="*50) 
    print(f"Mensagem para kart {kart_number} - {driver_name}:")
    print(message)
    
    final_feedbacks.append(FinalFeedback(kart_number, driver_name, message))



Mensagem para kart 67 - Piloto não identificado:
Fala, Piloto não identificado! 👋  
Tudo certo no #67? Recebi seu feedback da última prova e… olha só: você mandou muito bem na curva 2, executou direitinho e sem nenhuma observação! 🎉

Parabéns por essa consistência! Agora é manter o foco e buscar esse mesmo desempenho nas demais partes do traçado. Cada detalhe faz diferença, e você já mostrou que tem ritmo e técnica.

Qualquer dúvida ou desafio, me chama aqui. Vamos juntos afiar ainda mais seu feeling e buscar o pódio na próxima! 💪🏁  

Abraço e bons treinos!  
Seu Coach

Mensagem para kart 126 - Lucas Prates:
Fala, Lucas! Tudo bem? Aqui é o seu coach falando sobre o treino com o kart 126. Gostei da sua dedicação, mas percebi alguns pontos que podemos ajustar para baixar seu tempo por volta:

1) Curvas 1 e 8: você está perdendo o ápice e saindo um pouco largo. Tenta antecipar um tiquinho o ponto de frenagem e “fechar” mais a curva, buscando aquele toque ideal no meio do traçado.  
2) Cu
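In the output above, the kart 67 message opens with "Fala, Piloto não identificado!" because the placeholder name was passed to the model as if it were the driver's name. One option is to tell the model explicitly when the name is unknown. A sketch that rebuilds the same user message as the cell above; `build_user_prompt` and the "unknown" wording are hypothetical:

```python
from typing import Optional


def build_user_prompt(kart_number: int, feedback: str, driver_name: Optional[str] = None) -> str:
    """Build the user message for the coach prompt; avoid passing a placeholder as a name."""
    lines = []
    if driver_name:
        lines.append(f"Driver name: {driver_name}")
    else:
        # Hypothetical instruction so the model doesn't greet a placeholder such as "Piloto não identificado"
        lines.append("Driver name: unknown (greet the driver without using a name)")
    lines.append(f"Kart number: {kart_number}")
    lines.append(f"Feedback: {feedback}")
    lines.append("")
    lines.append("Write a friendly WhatsApp message incorporating this feedback.")
    return "\n".join(lines)
```

In the loop, this would be called as `build_user_prompt(kart_number, feedback["feedback"], None if driver_name == "Piloto não identificado" else driver_name)`. Holding such messages for manual review is an alternative, since an unknown kart usually means a transcription error.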

## Step 5

Convert each crafted message into speech

In [28]:
import os
from openai import OpenAI

# Create output directory if it doesn't exist
os.makedirs(f"data/output/{race_name}", exist_ok=True)

# Initialize OpenAI client
client = OpenAI()

# Convert each message to speech
for feedback in final_feedbacks:
    print(f"\nGenerating audio for kart {feedback.driver_name}...")
    
    # Generate speech using OpenAI
    response = client.audio.speech.create(
        model="tts-1",
        voice="shimmer",
        input=feedback.message
    )
    
    # Save the audio file
    output_file = f"data/output/{race_name}/kart_{feedback.kart_number}_{feedback.driver_name}.mp3"
    with open(output_file, "wb") as f:
        f.write(response.content)
    
    print(f"Audio saved to: {output_file}")

print("\nAll audios were generated successfully!")



Generating audio for kart Piloto não identificado...
Audio saved to: data/output/Tabajara 1/kart_67_Piloto não identificado.mp3

Generating audio for kart Lucas Prates...
Audio saved to: data/output/Tabajara 1/kart_126_Lucas Prates.mp3

Generating audio for kart Guiga Guilherme Rodrigues...
Audio saved to: data/output/Tabajara 1/kart_147_Guiga Guilherme Rodrigues.mp3

Generating audio for kart Newton Angelini...
Audio saved to: data/output/Tabajara 1/kart_167_Newton Angelini.mp3

All audios were generated successfully!
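The generated file names, such as `kart_67_Piloto não identificado.mp3`, contain spaces and accented characters, and a truncated PDF name like "EMILIO DE SOUZA AMADEI BERIN..." would produce a name ending in dots. If those cause trouble when sharing or scripting, a sketch that builds ASCII-only names; `safe_audio_filename` is a hypothetical helper:

```python
import re
import unicodedata


def safe_audio_filename(kart_number: int, driver_name: str) -> str:
    """Build an ASCII-only, underscore-separated .mp3 file name for a driver's audio."""
    # Decompose accented characters ("ã" -> "a" + combining tilde) and drop the non-ASCII marks
    ascii_name = unicodedata.normalize("NFKD", driver_name).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters (spaces, dots, ...) into one underscore
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_name).strip("_")
    return f"kart_{kart_number}_{slug}.mp3"
```

In the Step 5 loop, `output_file = f"data/output/{race_name}/" + safe_audio_filename(feedback.kart_number, feedback.driver_name)` would replace the current f-string.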
