<a href="https://colab.research.google.com/github/mnemnosyne/PodcastAnalysis/blob/main/pod_trans.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transcribing a Podcast

Transcribing a podcast by hand is tedious but worthwhile: a transcript makes the episode accessible to a wider audience, improves search engine indexing, and makes the content easier to check and quote. This notebook is an attempt to automate that process.

Automatic transcription involves the following steps:

1. Diarization: pyannote.audio identifies the different speakers and when each one is speaking.
2. Segmentation: pydub splits the audio file into chunks at the timestamps produced by the diarization, so that each block of dialogue can be transcribed separately.
3. Transcription: each audio chunk is transcribed.
4. Assembly: the transcribed chunks are combined into a single transcript.
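
The last three steps can be sketched in plain Python before any audio library is involved. The following is a minimal sketch, not the notebook's implementation: `merge_turns`, `build_transcript`, the `max_gap` threshold, and the `transcribe` callable are all hypothetical names introduced for illustration, and the speech-to-text model behind `transcribe` is left open.

```python
from typing import Callable, List, Sequence, Tuple

# (start_seconds, end_seconds, speaker_label), as produced by a diarization step
Turn = Tuple[float, float, str]


def merge_turns(turns: Sequence[Turn], max_gap: float = 0.5) -> List[Turn]:
    """Merge consecutive turns by the same speaker into one dialogue block.

    max_gap is an assumed threshold (in seconds): a pause longer than this
    starts a new block even if the speaker has not changed.
    """
    blocks: List[Turn] = []
    for start, end, speaker in sorted(turns):
        same_speaker = bool(blocks) and blocks[-1][2] == speaker
        if same_speaker and start - blocks[-1][1] <= max_gap:
            prev_start, prev_end, _ = blocks[-1]
            blocks[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            blocks.append((start, end, speaker))
    return blocks


def build_transcript(
    blocks: Sequence[Turn],
    transcribe: Callable[[float, float], str],
) -> str:
    """Transcribe each block and join the results, one line per block."""
    lines = []
    for start, end, speaker in blocks:
        text = transcribe(start, end)  # hypothetical speech-to-text call
        lines.append(f"[{start:.1f}s-{end:.1f}s] {speaker}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    example_turns = [(0.0, 2.0, "A"), (2.2, 4.0, "A"), (4.5, 6.0, "B")]
    # A placeholder transcriber stands in for a real model here.
    print(build_transcript(merge_turns(example_turns), lambda s, e: "..."))
```

Passing the transcriber in as a callable keeps the merging and assembly logic independent of whichever speech-to-text model is eventually chosen.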

In [None]:
# pyannote.audio implementation heavily inspired by https://colab.research.google.com/github/pyannote/pyannote-audio/blob/develop/tutorials/intro.ipynb
# Install the packages
# for speechbrain
!pip install -qq torch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 torchtext==0.12.0
!pip install -qq speechbrain==0.5.12

# pyannote.audio
!pip install -qq pyannote.audio

# for visualization purposes
!pip install -qq ipython==7.34.0

# for editing the audio file 
!pip install -qq pydub

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m750.6/750.6 MB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m21.0/21.0 MB[0m [31m39.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.9/2.9 MB[0m [31m32.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.4/10.4 MB[0m [31m44.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m496.8/496.8 KB[0m [31m9.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m33.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m182.4/182.4 KB[0m [31m16.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m109.5/109.5 KB[0m [31m521.4 kB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━

In [None]:
import pydub
from pyannote.audio import Audio
# aliased so it does not clash with pyannote's Audio class
from IPython.display import Audio as IPythonAudio

# download the episode and load it as a pydub AudioSegment
!wget -q https://chrt.fm/track/89ED1D/pdst.fm/e/2.gum.fm/rss.art19.com/episodes/9ffc3a6b-4798-4db5-b5de-c4e399ad857f.mp3
sound_file = pydub.AudioSegment.from_mp3("9ffc3a6b-4798-4db5-b5de-c4e399ad857f.mp3")


In [None]:
# keep only the first 5 minutes (pydub slices are indexed in milliseconds)
test_file = sound_file[0:300000]
test_file.export('podcast.wav', format='wav')
PODCAST_FILE = {'audio': 'podcast.wav'}

# load audio waveform and play it
waveform, sample_rate = Audio()(PODCAST_FILE)
IPythonAudio(data=waveform.squeeze(), rate=sample_rate, autoplay=True)

In [None]:
from huggingface_hub import notebook_login
notebook_login()

Token is valid.
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.huggingface/token
Login successful


In [None]:
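from pyannote.audio import Pipeline
# use_auth_token=True reuses the Hugging Face token saved by notebook_login() above
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization', use_auth_token=True)
# run speaker diarization on the 5-minute excerpt
diarization = pipeline(PODCAST_FILE)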


In [None]:
diarization

In [None]:
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
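
The turns printed above are in seconds, while pydub slices an `AudioSegment` in milliseconds, so the chunking step needs a unit conversion. Below is a minimal sketch of that step, assuming the turns have first been collected into plain `(start, end, speaker)` tuples; `seconds_to_ms` and `slice_chunks` are hypothetical helper names, not part of pydub or pyannote.audio.

```python
from typing import Any, List, Sequence, Tuple


def seconds_to_ms(seconds: float) -> int:
    """Convert a diarization timestamp (seconds) to a pydub index (milliseconds)."""
    return int(round(seconds * 1000))


def slice_chunks(
    sound: Any,
    turns: Sequence[Tuple[float, float, str]],
) -> List[Tuple[str, Any]]:
    """Cut one chunk per turn out of `sound`.

    `sound` only needs to support slicing by millisecond index, which a
    pydub AudioSegment does; any other sliceable sequence works for testing.
    """
    chunks = []
    for start_s, end_s, speaker in turns:
        start_ms, end_ms = seconds_to_ms(start_s), seconds_to_ms(end_s)
        chunks.append((speaker, sound[start_ms:end_ms]))
    return chunks


# Intended use in this notebook (requires the objects defined in earlier cells):
#   turns = [(turn.start, turn.end, speaker)
#            for turn, _, speaker in diarization.itertracks(yield_label=True)]
#   chunks = slice_chunks(test_file, turns)
#   for i, (speaker, chunk) in enumerate(chunks):
#       chunk.export(f"chunk_{i:03d}.wav", format="wav")
```

Each exported chunk can then be passed to whichever transcription model is chosen for the next step.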