Speaker Prediction using Whisper - Lex Podcasts #624

sidhantls · 2022-12-01T15:05:36Z

I performed speaker prediction on Lex Fridman Podcast captions using hidden states from Whisper, which led to reasonable results (F1-score of 93%). I explored the hidden states of various encoder blocks in the Whisper transformer and trained a classifier on them. The hidden states of some encoder blocks led to better results than others. I summarized my findings in this repo: https://github.com/sidhantls/lexpod-speaker-prediction and in a blog post.

This work is motivated by Andrej Karpathy’s Lexicap project, which he shares in this Twitter thread. He transcribed all of the Lex Fridman Podcast episodes using Whisper. He also mentions that it would be interesting to use Whisper for speaker prediction.

Despite the difference between Whisper's original objective (speech recognition, which might require the model to focus more on the words than on the speaker) and the task at hand (speaker identification, which focuses more on the speaker than on the words), the Whisper hidden states seem to provide reasonably rich embeddings for speaker prediction. Using the inner encoder layers helps.
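
To make the idea concrete, here is a minimal sketch of the "pick one encoder block, pool it, classify" step. It is not the code from the repo. The post does not say how frames are pooled or which classifier is used, so mean pooling over time and a nearest-centroid classifier are stand-in assumptions. The input layout (one list of per-frame vectors per encoder block) and the labels `host` and `guest` are hypothetical too. Plain Python lists are used so the sketch runs without Whisper installed.

```python
from math import dist
from statistics import fmean


def pool_layer(hidden_states, layer):
    """Mean-pool one encoder block's output over time.

    hidden_states: one entry per encoder block, each a list of
    per-frame vectors (frames x dim). This layout is an assumption.
    """
    return [fmean(column) for column in zip(*hidden_states[layer])]


def fit_centroids(features, labels):
    """Average the pooled vectors of each speaker label."""
    grouped = {}
    for vector, label in zip(features, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(vector, centroids):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(vector, centroids[label]))
```

Changing the `layer` argument is how one would compare encoder blocks: build the pooled features once per block, fit the classifier on each, and compare scores on held-out segments.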
