# Smart Meeting Assistant - WAV Transcript Processing
This notebook transcribes a meeting from a WAV file, performs speaker diarization, generates a summary, translates the transcript into French, and extracts action items.

In [2]:
import sys
import os

# Add the project root (the parent of the notebook's working directory) to sys.path.
# __file__ is not defined inside a notebook, so resolve ".." against the current directory.
sys.path.append(os.path.abspath(".."))

import whisper
from pyannote.audio import Pipeline
from dotenv import load_dotenv
from modules.summarizer import generate_summary
from modules.translator import translate_text
from modules.ds_action_items import extract_action_items_with_deepseek

load_dotenv()

# Load models
whisper_model = whisper.load_model("small")
hf_token = os.getenv("HF_TOKEN")
diarization_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization", use_auth_token=hf_token)

🔁 Loading DeepSeek model...


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

DEBUG:speechbrain.utils.checkpoints:Registered checkpoint save hook for _speechbrain_save
DEBUG:speechbrain.utils.checkpoints:Registered checkpoint load hook for _speechbrain_load
DEBUG:speechbrain.utils.checkpoints:Registered checkpoint save hook for save
DEBUG:speechbrain.utils.checkpoints:Registered checkpoint load hook for load
DEBUG:speechbrain.utils.checkpoints:Registered checkpoint save hook for _save
DEBUG:speechbrain.utils.checkpoints:Registered checkpoint load hook for _recover
  if ismodule(module) and hasattr(module, '__file__'):
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.5.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\Oscar\.cache\torch\pyannote\models--pyannote--segmentation\snapshots\c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b\pytorch_model.bin`
INFO:speechbrain.utils.fetching:Fetch hyperparams.yaml: Using symlink found at 'C:\Users\Oscar\.cache\torch\pyannote\speechbr

Model was trained with pyannote.audio 0.0.1, yours is 3.3.2. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.6.0+cpu. Bad things might happen unless you revert torch to 1.x.


DEBUG:speechbrain.utils.checkpoints:Registered checkpoint save hook for _save
DEBUG:speechbrain.utils.checkpoints:Registered checkpoint load hook for _load
DEBUG:speechbrain.utils.checkpoints:Registered parameter transfer hook for _load
  wrapped_fwd = torch.cuda.amp.custom_fwd(fwd, cast_inputs=cast_inputs)
DEBUG:speechbrain.utils.checkpoints:Registered checkpoint save hook for save
DEBUG:speechbrain.utils.checkpoints:Registered checkpoint load hook for load_if_possible
DEBUG:speechbrain.utils.parameter_transfer:Collecting files (or symlinks) for pretraining in C:\Users\Oscar\.cache\torch\pyannote\speechbrain.
INFO:speechbrain.utils.fetching:Fetch embedding_model.ckpt: Using symlink found at 'C:\Users\Oscar\.cache\torch\pyannote\speechbrain\embedding_model.ckpt'
DEBUG:speechbrain.utils.parameter_transfer:Set local path in self.paths["embedding_model"] = C:\Users\Oscar\.cache\torch\pyannote\speechbrain\embedding_model.ckpt
INFO:speechbrain.utils.fetching:Fetch mean_var_norm_emb.ckpt: Us
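
Most of the output above is DEBUG and INFO chatter from third-party loggers. As a minimal sketch, and assuming it runs before the model-loading cell, the standard `logging` module can raise the threshold for those loggers. The logger names below are taken from the log prefixes above (`speechbrain`) or are assumed (`pytorch_lightning`); the two version-mismatch warnings may come through a different channel and may still appear.

```python
import logging

# Logger names: "speechbrain" appears in the log prefixes above;
# "pytorch_lightning" is an assumption about where the Lightning message originates.
NOISY_LOGGERS = ("speechbrain", "pytorch_lightning")

for name in NOISY_LOGGERS:
    # Child loggers such as "speechbrain.utils.checkpoints" inherit this level
    # unless they have been given an explicit level of their own.
    logging.getLogger(name).setLevel(logging.WARNING)
```

Setting the level on the parent logger is usually enough because `logging` resolves a child's effective level by walking up the dotted name hierarchy.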

In [3]:
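def transcribe_with_diarization(wav_path):
    """Transcribe wav_path with Whisper and label each segment with a diarized speaker.

    Returns a list of strings of the form "[Speaker N] text".
    """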
    print(f"🎧 Transcribing {wav_path}...")

    result = whisper_model.transcribe(wav_path)
    segments = result.get("segments", [])

    diarization = diarization_pipeline(wav_path)
    speaker_turns = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        speaker_turns.append({
            "speaker": speaker,
            "start": turn.start,
            "end": turn.end
        })

    # Debug output: print both timelines so alignment problems are easy to spot
    print("\n--- Whisper Segments ---")
    for seg in segments:
        print(f"Whisper: {seg['start']:.2f}s - {seg['end']:.2f}s → {seg['text']}")

    print("\n--- PyAnnote Diarization ---")
    for turn in speaker_turns:
        print(f"PyAnnote: {turn['start']:.2f}s - {turn['end']:.2f}s → {turn['speaker']}")

    # Assign each Whisper segment to the diarization turn with the largest time overlap
    speaker_map = {}
    speaker_counter = 1
    labeled_lines = []

    for seg in segments:
        start, end = seg['start'], seg['end']
        best_match = None
        max_overlap = 0.0

        for turn in speaker_turns:
            overlap_start = max(start, turn["start"])
            overlap_end = min(end, turn["end"])
            overlap = max(0.0, overlap_end - overlap_start)

            if overlap > max_overlap:
                best_match = turn["speaker"]
                max_overlap = overlap

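        if best_match is None:
            # No diarization turn overlaps this segment; keep it visibly unlabeled
            # instead of numbering "Unknown" as if it were a real speaker.
            readable_speaker = "Unknown"
        else:
            if best_match not in speaker_map:
                speaker_map[best_match] = f"Speaker {speaker_counter}"
                speaker_counter += 1
            readable_speaker = speaker_map[best_match]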
        labeled_lines.append(f"[{readable_speaker}] {seg['text'].strip()}")

    return labeled_lines
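
The matching step in the function above can be tried without loading any model. The following is a minimal sketch of the same maximum-overlap rule on a toy timeline. The helper names `overlap` and `assign_speakers` and all timestamps are hypothetical and exist only for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Return one speaker label per segment, or "Unknown" when nothing overlaps."""
    labels = []
    for seg in segments:
        best, best_overlap = "Unknown", 0.0
        for turn in turns:
            ov = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if ov > best_overlap:
                best, best_overlap = turn["speaker"], ov
        labels.append(best)
    return labels


# Toy timeline (hypothetical values, not taken from the meeting recording).
segments = [
    {"start": 0.0, "end": 5.0},
    {"start": 5.0, "end": 11.0},
    {"start": 20.0, "end": 22.0},
]
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 5.5},
    {"speaker": "SPEAKER_01", "start": 5.5, "end": 12.0},
]

labels = assign_speakers(segments, turns)
print(labels)  # ['SPEAKER_00', 'SPEAKER_01', 'Unknown']
```

The second segment overlaps both turns (0.5 s and 5.5 s), so the longer overlap wins. The third segment overlaps neither turn and stays "Unknown".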


In [4]:
# Path to WAV file
wav_file = "assets/sample_meeting_4.wav"

# Transcribe and diarize
transcript_lines = transcribe_with_diarization(wav_file)

# Show transcript
print("=== Transcript ===")
print("\n".join(transcript_lines))




🎧 Transcribing assets/sample_meeting_4.wav...

--- Whisper Segments ---
Whisper: 0.00s - 5.00s →  Thanks everyone for joining the quarterly review. Let's start with metrics.
Whisper: 5.00s - 11.00s →  User retention improved by 12%, churn dropped by 3% in Q2.
Whisper: 11.00s - 18.00s →  Marketing spent was 10% under budget. Highest ROI from the LinkedIn campaign.
Whisper: 18.00s - 23.00s →  Engineering velocity increased after switching to two weeks' brints.
Whisper: 23.00s - 27.00s →  That's promising. Let's replicate that velocity across teams.
Whisper: 27.00s - 31.00s →  One blocker, test automation is still manual.
Whisper: 31.00s - 35.00s →  We can pilot playwright tests for the web stack.
Whisper: 35.00s - 39.00s →  I'll lead the automation pilot with Q18 next week.
Whisper: 39.00s - 43.00s →  For strategy, we plan to expand into the education sector.
Whisper: 43.00s - 49.00s →  Agreed. I'll analyze market entry strategies and share by next Monday.
Whisper: 49.00s - 52.00s →  We'

In [5]:
# Generate Summary
summary = generate_summary("\n".join(transcript_lines))
print("=== Summary ===")
print(summary)

=== Summary ===
In the quarterly review, speakers discuss the performance of the business, strategy and technical matters. They also discuss how to improve the quality of life of users and increase the efficiency of the marketing campaign. They plan to expand into the education sector and prepare a landing page mock-up.


In [6]:
# Translate to French
translation = translate_text("\n".join(transcript_lines), src_lang="en", tgt_lang="fr")
print("=== Translation (French) ===")
print(translation)



=== Translation (French) ===
[Speaker 1] Merci à tous d'avoir participé à l'examen trimestriel. Commençons par les mesures. [Speaker 2] La rétention de l'utilisateur s'est améliorée de 12 %, le churn a chuté de 3 % au Q2. [Speaker 3] Le marketing a été dépensé de 10 % sous budget. Le plus haut ROI de la campagne LinkedIn. [Speaker 4] La vitesse de l'ingénierie a augmenté après le passage à deux semaines de brins. [Speaker 1] C'est prometteur. Répliquons cette vitesse entre les équipes. [Speaker 3] Un bloqueur, l'automatisation des tests est toujours manuelle. [Speaker 2] Nous pouvons piloter des tests de dramaturge pour la pile web. [Speaker 4] Je dirigerai le pilote d'automatisation avec Q18 la semaine prochaine. [Speaker 1] Pour la stratégie, nous prévoyons de nous étendre au secteur de l'éducation. [Speaker 2] Accepté. Je vais analyser les stratégies d'entrée sur le marché et partager d'ici lundi. [Speaker 3] Nous allons rejoindre une maquette de page d'atterrissage pour ce segment.
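
The translated text comes back as a single line, with the speaker tags as the only remaining structure. As a rough sketch, assuming the `[Speaker N]` tags survive translation unchanged (as they do in the output above), the line can be split back into one utterance per speaker turn. The helper `split_by_speaker` is hypothetical and not part of the project's `modules` package.

```python
import re

# A "[Speaker N]" tag followed by everything up to the next tag or the end of the text.
TURN_PATTERN = re.compile(
    r"\[(Speaker \d+)\]\s*(.*?)(?=\s*\[Speaker \d+\]|$)", re.DOTALL
)


def split_by_speaker(text):
    """Split a single-line transcript into (speaker, utterance) pairs."""
    return [(m.group(1), m.group(2).strip()) for m in TURN_PATTERN.finditer(text)]


# Excerpt copied from the translation output above.
sample = (
    "[Speaker 1] C'est prometteur. Répliquons cette vitesse entre les équipes. "
    "[Speaker 3] Un bloqueur, l'automatisation des tests est toujours manuelle."
)

pairs = split_by_speaker(sample)
for speaker, utterance in pairs:
    print(f"[{speaker}] {utterance}")
```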

In [7]:
# Extract Action Items
action_items = extract_action_items_with_deepseek(transcript_lines)
print("=== Action Items ===")
print(action_items)




=== Action Items ===
Speaker 1:
- Task: Ensure teams replicate engineering velocity
- Task: Coordinate strategy expansion into education sector

Speaker 2:
- Task: Analyze market entry strategies for education sector
- Task: Share findings by next Monday

Speaker 3:
- Task: Mention blocker of manual test automation
- Task: Coordinate pilot of playwright tests with Q18

Speaker 4:
- Task: Coordinate with design for education sector landing page mock-ups
- Task: Attend next meeting

Note: Due dates are not mentioned in the transcript, so they are not included in the action items.
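
For downstream use it can help to turn this text into a dictionary. The sketch below assumes that `extract_action_items_with_deepseek` returns a plain string laid out as in the output above (a `Speaker N:` header followed by `- Task:` lines). The function's real return type is not shown in this notebook, and the helper `parse_action_items` is hypothetical.

```python
def parse_action_items(text):
    """Group "- Task: ..." lines under the most recent "Speaker N:" header."""
    items = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Speaker ") and line.endswith(":"):
            current = line[:-1]  # drop the trailing colon
            items.setdefault(current, [])
        elif line.startswith("- Task:") and current is not None:
            items[current].append(line[len("- Task:"):].strip())
        # Any other line (blank lines, the trailing "Note: ...") is ignored.
    return items


# Excerpt copied from the action-item output above.
sample = """Speaker 2:
- Task: Analyze market entry strategies for education sector
- Task: Share findings by next Monday

Speaker 4:
- Task: Attend next meeting

Note: Due dates are not mentioned in the transcript, so they are not included in the action items.
"""

parsed = parse_action_items(sample)
for speaker, tasks in parsed.items():
    print(speaker, tasks)
```

Because the model's output format is not guaranteed, a parser like this should tolerate unexpected lines, which is why unrecognized lines are skipped rather than raising an error.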
