# Caption Podcast Episodes

After downloading the podcast episodes into an organized directory in [lesson 1](./1-download-podcasts.ipynb), I use OpenAI's Whisper model to transcribe each episode and save the text next to its audio file.

> 🌧 Quick note! I use the "base" Whisper model because it runs on my local machine. The larger models (such as "small", "medium", and "large") are generally more accurate, but they are slower and need more memory. See [this list of Whisper models](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages) and pick what works best for you.

In [None]:
# Install the latest version from GitHub
# Whisper also needs the ffmpeg command-line tool installed to decode audio
!pip install git+https://github.com/openai/whisper.git

In [None]:
CONFIG = {
    "podcast": {
        "rss_url": "https://anchor.fm/s/74aab30/podcast/rss",
        "summary_regex": r"<p>(?P<speaker>[\w\s]+)\s-\s(?P<reference>.*)<\/p>",
    },
    "output_dir": "media",
}
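
The `summary_regex` entry is not used by the transcription cells in this notebook, but it is easier to read with an example. Here is a minimal sketch of what the pattern captures, assuming an episode summary shaped like `<p>Speaker - Reference</p>`. The sample string is made up for illustration and does not come from the real feed.

```python
import re

# Same pattern as CONFIG["podcast"]["summary_regex"]
SUMMARY_REGEX = r"<p>(?P<speaker>[\w\s]+)\s-\s(?P<reference>.*)<\/p>"

# Hypothetical summary text, invented for this example
sample_summary = "<p>Jane Doe - Building Better Tools</p>"

match = re.search(SUMMARY_REGEX, sample_summary)
if match:
    # Named groups split the summary into its two parts
    speaker = match.group("speaker")
    reference = match.group("reference")
    print(f"speaker={speaker!r}, reference={reference!r}")
```

Summaries that do not follow this shape will not match, so any code that relies on the pattern should handle a `None` result.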

In [2]:
import whisper
from pathlib import Path

model = whisper.load_model("base")
media_dir = Path(CONFIG["output_dir"])

In [None]:
# Walk every episode directory and transcribe any MP3 that has no transcript yet
for mp3_path in media_dir.rglob("*.mp3"):
    episode_dir = mp3_path.parent
    transcript_path = episode_dir / "transcript.txt"
    # Skip episodes that were transcribed on a previous run
    if transcript_path.exists():
        continue
    print(f":: {mp3_path}")
    print("   - Begin transcription")
    # transcribe() returns a dict; the "text" key holds the full transcript
    transcript = model.transcribe(mp3_path.as_posix())
    print("   - End transcription")
    transcript_path.write_text(transcript["text"], encoding="utf-8")
    print(f"   - Transcription written to {transcript_path}")
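
The loop above keeps only the plain text, but the result of `model.transcribe` also carries a `"segments"` list whose entries have `"start"`, `"end"` (both in seconds), and `"text"`. If you want actual captions rather than a flat transcript, one option is to turn those segments into an SRT file. The sketch below is one way to do that; `srt_timestamp` and `segments_to_srt` are helper names I made up, not part of the Whisper package, and the sample segments are invented.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts that have "start", "end", and "text" keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue is: counter, time range, caption text, blank line
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Invented segments in the same shape as transcript["segments"]
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 4.0, "text": " Welcome back."},
]
srt_text = segments_to_srt(sample_segments)
print(srt_text)
```

Inside the transcription loop, you could pass `transcript["segments"]` to `segments_to_srt` and write the result to a file such as `episode_dir / "captions.srt"` (a filename chosen here only as an example).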