# Smart Meeting Assistant - WAV Transcript Processing
This notebook transcribes a meeting from a WAV file, performs speaker diarization, generates a summary, translates the transcript, and extracts action items.

In [None]:
import sys
import os

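# Add the project root (the parent of the notebook's working directory) to sys.path.
# __file__ is not defined inside a notebook, so resolve the path from the current directory.
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))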

import whisper
from pyannote.audio import Pipeline
from dotenv import load_dotenv
from modules.summarizer import generate_summary
from modules.translator import translate_text
from modules.ds_action_items import extract_action_items_with_deepseek

load_dotenv()

# Load models
whisper_model = whisper.load_model("small")
hf_token = os.getenv("HF_TOKEN")
diarization_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization", use_auth_token=hf_token)

In [3]:
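def transcribe_with_diarization(wav_path):
    """Transcribe wav_path with Whisper and label each segment with a speaker.

    Returns a list of strings of the form "[Speaker N] text".
    """
    print(f"🎧 Transcribing {wav_path}...")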

    result = whisper_model.transcribe(wav_path)
    segments = result.get("segments", [])

    diarization = diarization_pipeline(wav_path)
    speaker_turns = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        speaker_turns.append({
            "speaker": speaker,
            "start": turn.start,
            "end": turn.end
        })

    # Debug output: show the raw Whisper segments and diarization turns before matching
    print("\n--- Whisper Segments ---")
    for seg in segments:
        print(f"Whisper: {seg['start']:.2f}s - {seg['end']:.2f}s → {seg['text']}")

    print("\n--- PyAnnote Diarization ---")
    for turn in speaker_turns:
        print(f"PyAnnote: {turn['start']:.2f}s - {turn['end']:.2f}s → {turn['speaker']}")

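    # Speaker matching: assign each Whisper segment to the diarization turn
    # with the largest time overlap; segments with no overlap are "Unknown"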
    speaker_map = {}
    speaker_counter = 1
    labeled_lines = []

    for seg in segments:
        start, end = seg['start'], seg['end']
        best_match = None
        max_overlap = 0.0

        for turn in speaker_turns:
            overlap_start = max(start, turn["start"])
            overlap_end = min(end, turn["end"])
            overlap = max(0.0, overlap_end - overlap_start)

            if overlap > max_overlap:
                best_match = turn["speaker"]
                max_overlap = overlap

        matched_speaker = best_match or "Unknown"

        if matched_speaker not in speaker_map:
            speaker_map[matched_speaker] = f"Speaker {speaker_counter}"
            speaker_counter += 1

        readable_speaker = speaker_map[matched_speaker]
        labeled_lines.append(f"[{readable_speaker}] {seg['text'].strip()}")

    return labeled_lines
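
The matching step above can be checked in isolation, without loading Whisper or pyannote. The following is a minimal sketch of the same largest-overlap rule; the helper names `interval_overlap` and `best_speaker` and the sample turns are hypothetical and are not part of the notebook's modules.

```python
from typing import Optional


def interval_overlap(a_start, a_end, b_start, b_end) -> float:
    """Length in seconds of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def best_speaker(seg_start, seg_end, turns) -> Optional[str]:
    """Return the speaker whose turn overlaps the segment most, or None if none overlap."""
    best, best_overlap = None, 0.0
    for turn in turns:
        overlap = interval_overlap(seg_start, seg_end, turn["start"], turn["end"])
        if overlap > best_overlap:
            best, best_overlap = turn["speaker"], overlap
    return best


# Made-up diarization turns, for illustration only
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 9.5},
    {"speaker": "SPEAKER_01", "start": 10.8, "end": 26.3},
]

print(best_speaker(4.4, 9.04, turns))   # segment lies fully inside the first turn
print(best_speaker(9.2, 12.0, turns))   # straddles both turns; the longer overlap wins
print(best_speaker(30.0, 31.0, turns))  # no turn overlaps, so None
```

A segment that straddles a speaker change is attributed entirely to one speaker under this rule, which may explain labels that switch mid-sentence in the transcript.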


In [None]:
# Path to WAV file
wav_file = "assets/sample_meeting.wav"

# Transcribe and diarize
transcript_lines = transcribe_with_diarization(wav_file)

# Show transcript
print("=== Transcript ===")
print("\n".join(transcript_lines))




🎧 Transcribing assets/sample_meeting2.wav...

--- Whisper Segments ---
Whisper: 0.00s - 4.40s →  Good morning, everyone. Thanks for joining today's planning session.
Whisper: 4.40s - 9.04s →  Let's start with a quick check-in. Sarah, could you give us an update on the web dashboard?
Whisper: 11.20s - 17.04s →  Sure. We completed the new user analytics section last night. The date's pipeline is fully
Whisper: 17.04s - 21.60s →  integrated now, and we've added co-hard filters based on user segments.
Whisper: 21.60s - 26.24s →  QA has started testing, and I expect we can deploy to staging by Thursday.
Whisper: 26.40s - 34.64s →  That's great. On my side, I've been working with the design team on the mobile app layout.
Whisper: 34.64s - 41.44s →  We've finalized the wireframes for onboarding and user profile pages. Development will begin tomorrow.
Whisper: 43.60s - 47.36s →  Excellent. And have the localization issues on Android been resolved?
Whisper: 49.52s - 53.04s →  Not completely. We

In [5]:
# Generate Summary
summary = generate_summary("\n".join(transcript_lines))
print("=== Summary ===")
print(summary)

=== Summary ===
Speaker 1 will send out the meeting notes and action item shortly. Sarah gives an update on the web dashboard. QA has started testing and she expects it to be ready for staging by Thursday. On speaker 2's side, he's been working on the design of the mobile app layout. Speaker 3 has finalized the wireframes for onboarding and user profile pages. The localization issues on Android haven't been resolved yet. The DevOps pipeline is flaky again. The product demo is scheduled for next Monday.


In [6]:
# Translate to French
translation = translate_text("\n".join(transcript_lines), src_lang="en", tgt_lang="fr")
print("=== Translation (French) ===")
print(translation)



=== Translation (French) ===
[Speaker 1] Merci d'avoir rejoint la session de planification d'aujourd'hui. [Speaker 1] Commençons par un check-in rapide. Sarah, pourriez-vous nous donner une mise à jour sur le tableau de bord web? [Speaker 2] Bien sûr. Nous avons terminé la nouvelle section d'analyse d'utilisateur hier soir. [Speaker 3] Le pipeline de la date est entièrement intégré [Speaker 2], et nous avons ajouté des filtres co-dur basés sur des segments d'utilisateur. [Speaker 3] QA a commencé à tester, et je m'attends à ce que nous puissions déployer un nouveau produit avant jeudi. [Speaker 3] C'est génial. De mon côté, j'ai travaillé avec l'équipe de conception sur la mise en page de l'application mobile. [Speaker 3] Nous avons finalisé les images filaires pour les pages d'affichage et de profil utilisateur.


In [7]:
# Extract Action Items
action_items = extract_action_items_with_deepseek(transcript_lines)
print("=== Action Items ===")
print(action_items)




=== Action Items ===
Speaker 1:
- Send out meeting notes and action items
- Follow up on new pricing tiers

Speaker 2:
- Deploy web dashboard to staging by Thursday
- Integrate locale detection with user preferences

Speaker 3:
- Prepare product demo slides by Thursday
- Develop mobile app layout beginning tomorrow

Speaker 4:
- Submit product demo slides by Friday
- Consider auto-detecting locale settings on login
