# 🧠 WhisperX + PyAnnote Diarization (Colab Edition)
This notebook transcribes audio with WhisperX, aligns every word to a timestamp, and then labels who is speaking with `pyannote.audio` 3.x speaker diarization. The result is a CSV with one row per word and its speaker label. Built for qualitative researchers.


## 🔧 Step 1: Install Required Packages

In [None]:
!pip install -q git+https://github.com/m-bain/whisperx.git
!pip install -q "pyannote.audio>=3.1,<4"  # Step 5 uses the pyannote.audio 3.x API
!pip install -q torch torchvision torchaudio
!pip install -q pandas ipywidgets plotly
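
Because Step 5 is written against the `pyannote.audio` 3.x API, it can help to confirm which major version pip actually installed before going further. This is a minimal sketch using only the standard library; `installed_major` is a hypothetical helper name, not part of any package.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_major(package):
    """Return the installed major version of `package` as an int, or None if it is not installed."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None

major = installed_major("pyannote.audio")
if major is None:
    print("pyannote.audio is not installed")
elif major != 3:
    print(f"Warning: pyannote.audio {major}.x found; Step 5 assumes 3.x")
else:
    print("pyannote.audio 3.x detected")
```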

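## 🔐 Step 2: Enter Hugging Face Token (for diarization)

The pyannote models are gated on Hugging Face: create an access token at https://huggingface.co/settings/tokens, then, logged in with the same account, accept the user conditions on the model pages for the diarization pipeline and its segmentation model.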

In [None]:
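import os
from getpass import getpass

# getpass hides the token while you paste it, so it is not left visible in the saved notebook output
HF_TOKEN = getpass("Paste your Hugging Face token here: ").strip()
os.environ['HF_TOKEN'] = HF_TOKEN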
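
A mistyped or partially pasted token only fails later, when Step 5 downloads the model. A quick format check can catch obvious paste errors early. This sketch assumes current Hugging Face user access tokens begin with `hf_`; `looks_like_hf_token` is a hypothetical helper, and passing it does not prove the token is valid or has access to the gated models.

```python
def looks_like_hf_token(token):
    """Heuristic check only: assumes Hugging Face user access tokens start with 'hf_' and contain no spaces."""
    token = token.strip()
    return token.startswith("hf_") and len(token) > 3 and not any(c.isspace() for c in token)

# Usage in the notebook (after the cell above):
# if not looks_like_hf_token(HF_TOKEN):
#     print("That does not look like a Hugging Face access token; please paste it again.")
```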

## 📁 Step 3: Upload Audio File

In [None]:
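from google.colab import files

# files.upload() saves the chosen file(s) to the working directory and returns a dict keyed by file name
uploaded = files.upload()
audio_file_path = next(iter(uploaded))  # use the first uploaded file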
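
WhisperX decodes audio through ffmpeg, but the diarization step reads the file through its own audio backend, and some compressed formats (for example `.m4a`) may not load everywhere. An optional safeguard is converting the upload to a 16 kHz mono WAV first. This is a sketch assuming the `ffmpeg` command-line tool is available (it is on standard Colab runtimes); `ffmpeg_to_wav16k_cmd` and `convert_to_wav16k` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

def ffmpeg_to_wav16k_cmd(src):
    """Build an ffmpeg command that writes a mono (-ac 1), 16 kHz (-ar 16000) WAV next to `src`."""
    dst = str(Path(src).with_suffix("")) + "_16k.wav"
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]

def convert_to_wav16k(src):
    """Run the conversion and return the path of the new WAV file (raises if ffmpeg fails)."""
    cmd = ffmpeg_to_wav16k_cmd(src)
    subprocess.run(cmd, check=True, capture_output=True)
    return cmd[-1]

# Usage in the notebook:
# audio_file_path = convert_to_wav16k(audio_file_path)
```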

## 📝 Step 4: Transcribe Audio Using WhisperX

In [None]:
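import whisperx
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# float16 requires a GPU; int8 lets the same code run on a CPU runtime
compute_type = 'float16' if device == 'cuda' else 'int8'

# Decode the audio once (WhisperX resamples it to 16 kHz mono) and reuse it for alignment
audio = whisperx.load_audio(audio_file_path)
model = whisperx.load_model('large-v2', device, compute_type=compute_type)
transcription = model.transcribe(audio, batch_size=16)  # lower batch_size if the GPU runs out of memory

# Refine the segment timestamps to word level with a language-specific alignment model
align_model, metadata = whisperx.load_align_model(
    language_code=transcription['language'], device=device)
aligned_result = whisperx.align(transcription['segments'], align_model, metadata, audio, device)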
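
Alignment can leave some words (often numerals such as "2024") without `start`/`end` times, and those words cannot be matched to a speaker by timestamp. One option is to borrow times from neighbouring words. This is a sketch assuming each entry of `aligned_result['word_segments']` is a dict with a `word` key and, when aligned, `start`/`end` in seconds; `fill_missing_times` is a hypothetical helper, and the borrowed times are approximations, not real alignments.

```python
def fill_missing_times(words):
    """Return copies of `words` where a missing 'start'/'end' is borrowed from neighbouring words."""
    filled = [dict(w) for w in words]
    for i, w in enumerate(filled):
        if "start" not in w:
            # Start where the previous word ended (0.0 for the very first word)
            w["start"] = filled[i - 1]["end"] if i > 0 else 0.0
        if "end" not in w:
            # End where the next timed word starts; never before this word's own start
            nxt = next((x["start"] for x in words[i + 1:] if "start" in x), None)
            w["end"] = max(nxt, w["start"]) if nxt is not None else w["start"]
    return filled

# Usage in the notebook:
# aligned_result['word_segments'] = fill_missing_times(aligned_result['word_segments'])
```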


## 🗣️ Step 5: Perform Speaker Diarization with pyannote.audio 3.x

In [None]:
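from pyannote.audio import Pipeline
import torch

# Pin the 3.x pipeline explicitly; the model is gated, so HF_TOKEN must belong to an account that accepted its conditions
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-3.1', use_auth_token=HF_TOKEN)
pipeline.to(torch.device(device))  # pipelines run on CPU unless moved to the GPU
diarization = pipeline(audio_file_path)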
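
If you already know how many people are in the recording, the pyannote 3.x pipeline also accepts hints such as `pipeline(audio_file_path, num_speakers=2)` or `min_speakers=`/`max_speakers=`. To sanity-check the result before assigning speakers to words, a per-speaker total of speaking time is often enough. This sketch works on plain `(start, end, speaker)` tuples so it runs without pyannote; `speaking_time` is a hypothetical helper, and the toy turns in the test are made up.

```python
from collections import defaultdict

def speaking_time(turns):
    """Sum turn durations in seconds per speaker, largest first.
    Overlapping speech is counted once for each speaker involved."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

# Usage in the notebook:
# turns = [(seg.start, seg.end, spk) for seg, _, spk in diarization.itertracks(yield_label=True)]
# print(speaking_time(turns))
```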


## 🔗 Step 6: Assign Speakers to Words by Timestamp Overlap

In [None]:
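import numpy as np
import pandas as pd

# Convert word segments to a DataFrame (words WhisperX could not align get NaN start/end)
words = aligned_result['word_segments']
word_df = pd.DataFrame(words)

# Give each word the speaker whose turn overlaps it the most; words that overlap no turn stay UNKNOWN
word_df['speaker'] = 'UNKNOWN'
best_overlap = np.zeros(len(word_df))
for segment, _, speaker in diarization.itertracks(yield_label=True):
    overlap = (np.minimum(word_df['end'], segment.end)
               - np.maximum(word_df['start'], segment.start)).to_numpy()
    better = overlap > best_overlap  # comparisons with NaN are False, so unaligned words are skipped
    word_df.loc[better, 'speaker'] = speaker
    best_overlap = np.where(better, overlap, best_overlap)

# Save CSV
word_df.to_csv("transcript_with_speakers.csv", index=False)
word_df[['start', 'end', 'word', 'speaker']].head()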
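
A word-per-row CSV is precise but hard to read when coding an interview. Merging consecutive words from the same speaker into utterances gives a more familiar transcript layout. This is a sketch assuming word dicts with `word`, `start`, `end` and `speaker` keys (as produced by `word_df.to_dict('records')`) and no missing times; `group_utterances` and the `max_gap` threshold of 1 second are hypothetical choices to tune for your data.

```python
def group_utterances(words, max_gap=1.0):
    """Merge consecutive same-speaker words into utterance dicts (speaker, start, end, text).
    A pause longer than `max_gap` seconds also starts a new utterance."""
    utterances = []
    for w in words:
        last = utterances[-1] if utterances else None
        if last and last["speaker"] == w["speaker"] and w["start"] - last["end"] <= max_gap:
            last["end"] = w["end"]
            last["text"] += " " + w["word"]
        else:
            utterances.append({"speaker": w["speaker"], "start": w["start"],
                               "end": w["end"], "text": w["word"]})
    return utterances

# Usage in the notebook:
# utterances = group_utterances(word_df.dropna(subset=['start', 'end']).to_dict('records'))
```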

## ✅ Step 7: Download Your Output

In [None]:
from google.colab import files
files.download("transcript_with_speakers.csv")
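
Alongside the CSV, a plain-text transcript with `[HH:MM:SS] SPEAKER: text` lines is convenient for reading or importing into qualitative-analysis tools. This sketch assumes utterance dicts with `speaker`, `start` (seconds) and `text` keys; `format_timestamp`, `write_transcript` and the output file name are hypothetical.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS, dropping fractions of a second."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def write_transcript(utterances, path="transcript_with_speakers.txt"):
    """Write one '[HH:MM:SS] SPEAKER: text' line per utterance and return the file path."""
    with open(path, "w", encoding="utf-8") as f:
        for u in utterances:
            f.write(f"[{format_timestamp(u['start'])}] {u['speaker']}: {u['text']}\n")
    return path

# Usage in the notebook (with a list of utterance dicts):
# files.download(write_transcript(utterances))
```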