# Smart Meeting Assistant - WAV Transcript Processing
This notebook transcribes a meeting from a WAV file, performs speaker diarization, generates a summary, translates the transcript, and extracts action items.

In [7]:
import sys
import os

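# Add the project root to sys.path. Notebooks have no __file__, so resolve the
# root as the parent of the current working directory (the notebook's folder).
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))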

import whisper
from pyannote.audio import Pipeline
from dotenv import load_dotenv
from modules.summarizer import generate_summary
from modules.translator import translate_text
from modules.ds_action_items import extract_action_items_with_deepseek

load_dotenv()

# Load models
whisper_model = whisper.load_model("small")
hf_token = os.getenv("HF_TOKEN")
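# This checkpoint was trained with pyannote.audio 0.x and torch 1.x, which triggers the
# version warnings below; pyannote/speaker-diarization-3.1 targets pyannote.audio 3.x.
diarization_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization", use_auth_token=hf_token)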

Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.5.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\Oscar\.cache\torch\pyannote\models--pyannote--segmentation\snapshots\c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b\pytorch_model.bin`

Model was trained with pyannote.audio 0.0.1, yours is 3.3.2. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.6.0+cpu. Bad things might happen unless you revert torch to 1.x.
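
`os.getenv("HF_TOKEN")` returns `None` when the variable is missing from both the environment and `.env`, and the problem then tends to surface later as an authentication error while the pyannote pipeline is downloaded. A small guard makes the cause explicit. This is a minimal sketch; `require_env` is a hypothetical helper, not part of the project's `modules` package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing fast if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your environment or .env file before loading models."
        )
    return value


# Usage in the model-loading cell, after load_dotenv():
# hf_token = require_env("HF_TOKEN")
```

Failing at load time keeps the error message next to its cause instead of inside the model download.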

In [8]:
def transcribe_with_diarization(wav_path):
    print(f"🎧 Transcribing {wav_path}...")

    result = whisper_model.transcribe(wav_path)
    segments = result.get("segments", [])

    diarization = diarization_pipeline(wav_path)
    speaker_turns = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        speaker_turns.append({
            "speaker": speaker,
            "start": turn.start,
            "end": turn.end
        })

    # Debug print
    print("\n--- Whisper Segments ---")
    for seg in segments:
        print(f"Whisper: {seg['start']:.2f}s - {seg['end']:.2f}s → {seg['text']}")

    print("\n--- PyAnnote Diarization ---")
    for turn in speaker_turns:
        print(f"PyAnnote: {turn['start']:.2f}s - {turn['end']:.2f}s → {turn['speaker']}")

    # Assign each Whisper segment to the diarization speaker with the largest time overlap
    speaker_map = {}
    speaker_counter = 1
    labeled_lines = []

    for seg in segments:
        start, end = seg['start'], seg['end']
        best_match = None
        max_overlap = 0.0

        for turn in speaker_turns:
            overlap_start = max(start, turn["start"])
            overlap_end = min(end, turn["end"])
            overlap = max(0.0, overlap_end - overlap_start)

            if overlap > max_overlap:
                best_match = turn["speaker"]
                max_overlap = overlap

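        if best_match is None:
            # No diarization turn overlaps this segment: label it explicitly
            # instead of numbering it as a new speaker.
            readable_speaker = "Unknown"
        else:
            # Map pyannote labels (SPEAKER_00, ...) to "Speaker N" in order of first appearance
            if best_match not in speaker_map:
                speaker_map[best_match] = f"Speaker {speaker_counter}"
                speaker_counter += 1
            readable_speaker = speaker_map[best_match]

        labeled_lines.append(f"[{readable_speaker}] {seg['text'].strip()}")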

    return labeled_lines
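
The matching step inside `transcribe_with_diarization` does not depend on Whisper or pyannote, so it can be pulled out and checked on plain tuples. The sketch below is a standalone version of the same largest-overlap rule; `overlap` and `assign_speakers` are hypothetical helper names, and unlike the original cell it labels segments that overlap no turn as `Unknown` rather than giving them a speaker number. The data are the first six segments and turns printed in the output further down, plus one made-up segment at 40 to 42 s.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[float, float]    # (start, end) of a Whisper segment, in seconds
Turn = Tuple[float, float, str]  # (start, end, label) of a pyannote speaker turn


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if they do not overlap)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[str]:
    """Label each segment with the turn it overlaps most; renumber labels by first appearance."""
    readable: Dict[str, str] = {}
    labels: List[str] = []
    for seg_start, seg_end in segments:
        best: Optional[str] = None
        best_overlap = 0.0
        for turn_start, turn_end, label in turns:
            ov = overlap(seg_start, seg_end, turn_start, turn_end)
            if ov > best_overlap:
                best, best_overlap = label, ov
        if best is None:
            labels.append("Unknown")  # no turn overlaps this segment
            continue
        if best not in readable:
            readable[best] = f"Speaker {len(readable) + 1}"
        labels.append(readable[best])
    return labels


segments = [(0.00, 3.60), (4.24, 7.76), (8.44, 11.04), (11.76, 14.88),
            (15.48, 18.44), (19.24, 21.68), (40.0, 42.0)]
turns = [(0.03, 3.61, "SPEAKER_02"), (4.32, 7.47, "SPEAKER_01"), (7.47, 11.20, "SPEAKER_00"),
         (11.83, 15.02, "SPEAKER_03"), (15.57, 18.53, "SPEAKER_01"), (18.53, 21.77, "SPEAKER_00")]
print(assign_speakers(segments, turns))
# → ['Speaker 1', 'Speaker 2', 'Speaker 3', 'Speaker 4', 'Speaker 2', 'Speaker 3', 'Unknown']
```

The segment at 4.24 to 7.76 s overlaps two turns (3.15 s with SPEAKER_01, 0.29 s with SPEAKER_00), which is exactly the case the largest-overlap rule resolves. The nested loop is O(segments × turns), which is fine for a meeting; for long recordings, sorting both lists and sweeping them together would avoid the quadratic scan.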


In [None]:
# Path to WAV file
wav_file = "assets/sample_meeting_1.wav"

# Transcribe and diarize
transcript_lines = transcribe_with_diarization(wav_file)

# Show transcript
print("=== Transcript ===")
print("\n".join(transcript_lines))


🎧 Transcribing assets/sample_meeting.wav...

--- Whisper Segments ---
Whisper: 0.00s - 3.60s →  Good morning, everyone. Let's get started with today's meeting.
Whisper: 4.24s - 7.76s →  Sure. The mobile team has completed the login feature.
Whisper: 8.44s - 11.04s →  I'll test it and provide feedback by Wednesday.
Whisper: 11.76s - 14.88s →  Great. We also need to finalize the budget proposal.
Whisper: 15.48s - 18.44s →  I can coordinate with finance and draft the document.
Whisper: 19.24s - 21.68s →  Please include the updated cost projections.
Whisper: 22.44s - 25.04s →  Noted. I'll send a draft by Friday.
Whisper: 25.84s - 29.16s →  Also, don't forget to schedule the stakeholder presentation.
Whisper: 29.16s - 31.72s →  I'll book a slot for next Monday.

--- PyAnnote Diarization ---
PyAnnote: 0.03s - 3.61s → SPEAKER_02
PyAnnote: 4.32s - 7.47s → SPEAKER_01
PyAnnote: 7.47s - 11.20s → SPEAKER_00
PyAnnote: 11.83s - 15.02s → SPEAKER_03
PyAnnote: 15.57s - 18.53s → SPEAKER_01
PyAnnote: 18.53s - 21.77s → SPEAKER_00
PyAnnote: 22.49s -

In [10]:
# Generate Summary
summary = generate_summary("\n".join(transcript_lines))
print("=== Summary ===")
print(summary)

=== Summary ===
The mobile team has completed the login feature. Speaker 3 will test it and provide feedback by Wednesday. Speaker 2 will draft the budget proposal and send it by Friday. Speaker 1 will schedule the stakeholder presentation for next Monday.   


In [11]:
# Translate to French
translation = translate_text("\n".join(transcript_lines), src_lang="en", tgt_lang="fr")
print("=== Translation (French) ===")
print(translation)

=== Translation (French) ===
Bonjour, tout le monde. Commençons par la réunion d'aujourd'hui. [Speaker 2] Bien sûr. L'équipe mobile a terminé la fonction de connexion. [Speaker 3] Je vais la tester et fournir des commentaires d'ici mercredi. [Speaker 4] Super. Nous devons également finaliser le projet de budget. [Speaker 2] Je peux coordonner avec la finance et rédiger le document. [Speaker 3] Veuillez inclure les projections de coûts mises à jour. [Speaker 2] Noté. Je vais envoyer un avant-projet d'ici vendredi. [Speaker 1] Aussi, n'oubliez pas de programmer la présentation des intervenants. [Speaker 3] Je vais réserver un créneau pour lundi prochain.
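
In the output above, the translated transcript comes back as a single paragraph, and the `[Speaker 1]` tag that begins the first transcript line is missing. One way to keep the tags stable is to translate only the spoken text of each line and reattach the tag afterwards. This is a minimal sketch that assumes nothing about the translator beyond it being a function from string to string; `translate_lines` is a hypothetical helper, and in the notebook the callable would wrap `translate_text(text, src_lang="en", tgt_lang="fr")` from `modules.translator`.

```python
import re
from typing import Callable, List

# Captures a leading "[...]" tag such as "[Speaker 1]" or "[Unknown]" and the text after it.
TAG_PATTERN = re.compile(r"^(\[[^\]]+\])\s*(.*)$")


def translate_lines(lines: List[str], translate: Callable[[str], str]) -> List[str]:
    """Translate the spoken text of each transcript line, keeping its speaker tag unchanged."""
    out: List[str] = []
    for line in lines:
        match = TAG_PATTERN.match(line)
        if match:
            tag, text = match.groups()
            out.append(f"{tag} {translate(text)}")
        else:
            out.append(translate(line))  # untagged line: translate it whole
    return out


def fake_translate(text: str) -> str:
    """Stand-in translator for illustration only."""
    return text.upper()


lines = ["[Speaker 1] Good morning, everyone.", "[Speaker 2] Sure."]
print("\n".join(translate_lines(lines, fake_translate)))
# [Speaker 1] GOOD MORNING, EVERYONE.
# [Speaker 2] SURE.
```

The trade-off is one translator call per line and less surrounding context for each call; batching lines while keeping the tags out of the text is a possible middle ground.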


In [12]:
# Extract Action Items
action_items = extract_action_items_with_deepseek(transcript_lines)
print("=== Action Items ===")
print(action_items)

=== Action Items ===
Speaker 1:
- Schedule the stakeholder presentation

Speaker 2:
- Test the mobile team's login feature
- Draft the budget proposal
- Send a draft by Friday

Speaker 3:
- Test the mobile team's login feature
- Provide feedback by Wednesday
- Schedule the stakeholder presentation
- Book a slot for next Monday

Action items are grouped by speaker. Each speaker has their tasks listed, along with any due dates mentioned in the transcript.
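
If `extract_action_items_with_deepseek` returns the plain text printed above (an assumption; the module is not shown here), the grouped layout can be turned into a dictionary for further processing, such as exporting tasks or checking which speakers have none. A minimal sketch: `parse_action_items` is a hypothetical helper that expects `Speaker N:` or `Unknown:` headers followed by `- task` bullets and ignores everything else, including trailing prose from the model.

```python
import re
from typing import Dict, List, Optional

HEADER = re.compile(r"(Speaker \d+|Unknown):")


def parse_action_items(text: str) -> Dict[str, List[str]]:
    """Parse 'Speaker N:' headers followed by '- task' bullets into {speaker: [tasks]}."""
    items: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.fullmatch(line)
        if header:
            current = header.group(1)
            items.setdefault(current, [])
        elif line.startswith("- ") and current is not None:
            items[current].append(line[2:].strip())
        # Blank lines and any other text are ignored.
    return items


sample = """Speaker 1:
- Schedule the stakeholder presentation

Speaker 2:
- Test the mobile team's login feature
- Draft the budget proposal
- Send a draft by Friday

Action items are grouped by speaker."""
print(parse_action_items(sample))
# → {'Speaker 1': ['Schedule the stakeholder presentation'], 'Speaker 2': ["Test the mobile team's login feature", 'Draft the budget proposal', 'Send a draft by Friday']}
```

Because the parser depends on the model keeping this exact layout, asking the model for JSON and validating it would be the sturdier option if the format drifts.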
