# 🧠 WhisperX Transcription + Diarization for Qual Researchers (Colab Edition)
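This notebook transcribes audio with WhisperX and labels speakers (diarization) with pyannote. It's designed for **education researchers** and others analyzing interview or classroom audio. Everything runs inside this notebook; the only external requirement is a Hugging Face access token for the diarization step (Step 2).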

## 🔧 Step 1: Install Required Packages

In [None]:
!pip install -q git+https://github.com/m-bain/whisperx.git
!pip install -q speechbrain torch torchvision torchaudio
!pip install -q pandas nltk webvtt-py srt python-dotenv ipywidgets plotly

## 🔐 Step 2: Enter Hugging Face Token (Required for Diarization)

In [None]:
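import os
from getpass import getpass
# getpass hides the token as you type, so it is not stored in the cell output
HF_TOKEN = getpass("Paste your Hugging Face token here: ").strip()
os.environ['HF_TOKEN'] = HF_TOKEN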

## 📁 Step 3: Upload Your Audio and (Optional) Pseudonym CSV

In [None]:
from google.colab import files
uploaded = files.upload()
audio_file_path = next(iter(uploaded))
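
Because `files.upload()` returns a dictionary keyed by filename, taking the first key only works when the audio file happens to come first. The sketch below shows one way to separate the audio file from an optional CSV by file extension. The helper name `split_uploads` and the extension list are assumptions made for illustration; neither comes from Colab or WhisperX.

```python
import os

# Assumed set of audio extensions; extend it to match your recordings.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}

def split_uploads(uploaded):
    """Return (audio_filename, csv_filename_or_None) from an upload dict.

    `uploaded` maps filenames to file contents, as returned by files.upload().
    """
    audio_files = []
    csv_files = []
    for name in uploaded:
        ext = os.path.splitext(name)[1].lower()
        if ext in AUDIO_EXTENSIONS:
            audio_files.append(name)
        elif ext == ".csv":
            csv_files.append(name)
    if len(audio_files) != 1:
        raise ValueError(f"Expected exactly one audio file, found {len(audio_files)}")
    csv_name = csv_files[0] if csv_files else None
    return audio_files[0], csv_name
```

With this helper, `audio_file_path, csv_name = split_uploads(uploaded)` could stand in for the `next(iter(uploaded))` line.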

In [None]:
import whisperx
import torch
import pandas as pd

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
compute_type = "float16" if device == "cuda" else "float32"

# Load and transcribe
model = whisperx.load_model("large-v2", device, compute_type=compute_type)
transcription = model.transcribe(audio_file_path)

# Align with WhisperX
model_a, metadata = whisperx.load_align_model(
    language_code=transcription["language"], device=device)
result_aligned = whisperx.align(transcription["segments"], model_a, metadata, audio_file_path, device)

# Apply diarization
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
segments = diarize_model(audio_file_path)

# Assign speaker labels (pass the diarization output, not the pipeline object)
result_with_speakers = whisperx.assign_word_speakers(
    segments, result_aligned)

# Convert to DataFrame
df = pd.DataFrame(result_with_speakers["segments"])
df.to_csv("transcript_output.csv", index=False)
df[["start", "end", "text", "speaker"]].head()
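
Step 3 mentions an optional pseudonym CSV, and diarization labels speakers generically (for example `SPEAKER_00`). One way to apply pseudonyms is sketched below. It assumes a CSV with header columns `speaker` and `pseudonym`, which is a hypothetical layout rather than anything WhisperX defines; the helper names `load_pseudonyms` and `apply_pseudonyms` are likewise illustrative.

```python
import csv
import io

def load_pseudonyms(csv_text):
    """Parse CSV text with assumed columns `speaker` and `pseudonym` into a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["speaker"].strip(): row["pseudonym"].strip() for row in reader}

def apply_pseudonyms(segments, mapping):
    """Return copies of the segments with speaker labels replaced where mapped.

    Segments without a speaker, or with an unmapped label, are left unchanged.
    """
    renamed = []
    for seg in segments:
        seg = dict(seg)  # shallow copy so the original result is not modified
        speaker = seg.get("speaker")
        if speaker in mapping:
            seg["speaker"] = mapping[speaker]
        renamed.append(seg)
    return renamed

# Toy data standing in for result_with_speakers["segments"]
example_segments = [
    {"start": 0.0, "end": 2.1, "text": "Tell me about your class.", "speaker": "SPEAKER_00"},
    {"start": 2.4, "end": 5.0, "text": "We started with fractions.", "speaker": "SPEAKER_01"},
    {"start": 5.2, "end": 6.0, "text": "(inaudible)"},
]
example_csv = "speaker,pseudonym\nSPEAKER_00,Interviewer\nSPEAKER_01,Teacher A\n"

renamed_segments = apply_pseudonyms(example_segments, load_pseudonyms(example_csv))
```

In the notebook, the CSV text could come from decoding the uploaded file's bytes as UTF-8, and `pd.DataFrame(renamed_segments)` could then take the place of the DataFrame built directly from `result_with_speakers["segments"]`.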

## ✅ Step 4: Download Your Transcription

In [None]:
from google.colab import files
files.download("transcript_output.csv")