Speaker Prediction using Whisper - Lex Podcasts #624
sidhantls
started this conversation in
Show and tell
I performed speaker prediction on Lex Fridman Podcast captions using hidden states from Whisper, which gave reasonable results (an F1-score of 93%). I extracted the hidden states of various encoder blocks in the Whisper transformer and trained a classifier on them. The hidden states of some encoder blocks led to better results than others. I summarized my findings in this repo, https://github.com/sidhantls/lexpod-speaker-prediction, and in a blog post.
This work is motivated by Andrej Karpathy’s Lexicap project, which he shares in this Twitter thread. He transcribed all of the Lex Fridman Podcast episodes using Whisper, and he also mentions that it would be interesting to use Whisper for speaker prediction.
Whisper's original objective is speech recognition, which might require the model to focus more on the words spoken than on the speaker. The task here is speaker identification, which needs the opposite focus. Despite this difference, the Whisper hidden states seem to provide reasonably rich embeddings for speaker prediction, and using the inner encoder layers helps.
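To make the idea concrete, here is a minimal sketch of the pipeline shape: take the per-frame hidden states of one encoder layer for each audio segment, mean-pool them into a single vector, and fit a classifier on those vectors. This is not the code from the repo. The hidden states are synthetic random numbers standing in for real Whisper outputs, the classifier is a simple nearest-centroid rule swapped in to keep the example dependency-free, and the speaker labels, the helper names, and the choice that only layer 2 carries speaker information are all assumptions made for illustration.

```python
import random
from statistics import fmean


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    return [fmean(column) for column in zip(*vectors)]


def segment_embedding(hidden_states, layer):
    """hidden_states[layer] holds the per-frame vectors of one audio segment."""
    return mean_pool(hidden_states[layer])


def fit_centroids(embeddings, labels):
    """Nearest-centroid 'training': one mean embedding per speaker label."""
    grouped = {}
    for emb, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(emb)
    return {label: mean_pool(embs) for label, embs in grouped.items()}


def predict(centroids, emb):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], emb))


def fake_segment(rng, speaker_offset, n_layers=4, n_frames=20, dim=8):
    """Synthetic stand-in for encoder outputs, indexed [layer][frame][dim].

    Assumption for illustration only: layer 2 alone carries a
    speaker-dependent offset; the other layers are pure noise.
    """
    states = []
    for layer in range(n_layers):
        shift = speaker_offset if layer == 2 else 0.0
        states.append(
            [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n_frames)]
        )
    return states


rng = random.Random(0)
speakers = {"lex": -1.0, "guest": 1.0}  # hypothetical labels and offsets
train = [(fake_segment(rng, off), name) for name, off in speakers.items() for _ in range(20)]
test = [(fake_segment(rng, off), name) for name, off in speakers.items() for _ in range(20)]


def accuracy(layer):
    """Fit on train and score on test, using embeddings from a single layer."""
    centroids = fit_centroids(
        [segment_embedding(h, layer) for h, _ in train],
        [label for _, label in train],
    )
    hits = sum(predict(centroids, segment_embedding(h, layer)) == label for h, label in test)
    return hits / len(test)


for layer in range(4):
    print(f"layer {layer}: accuracy {accuracy(layer):.2f}")
```

With real data, `fake_segment` would be replaced by whatever returns the encoder hidden states for a segment, and the loop over layers is the part that mirrors the comparison across encoder blocks described above.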