Trainspodder mobile/desktop app displays Whisper transcription of BBC broadcast or YouTube video #233
So the video and picture above are the result of a weekend of hacking to get Whisper working with the existing Soundcloud Comments and Annotation GUI in Trainspodder ( https://www.youtube.com/watch?v=kEjU5nsaHdg ). That GUI was meant for a much sparser set of comments than a transcription produces, hence the clutter of "comment" icons in the overview display.

I've now updated the GUI specifically for Whisper, so that the details view shows each transcript line anchored to both its start and end times. The overview just shows a "bar" representing whether voice activity occurs at a given time (VAD). The updated Trainspodder/Whisper interface, overlaying a video under analysis, looks like: ... Shortly, I'll upload a new video of Trainspodder transcribing this video. ...

What's interesting here is that the original video ( https://www.youtube.com/watch?v=RoqgWN-Z3iw ) is about "Qt", which is pronounced "cute". Whisper nevertheless seems to have gleaned from the context that "cute" refers to "Qt" and not the word "cute", and that correct transcription of the technical acronym holds throughout the transcript! Very impressive!!
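For readers who want to try the details/overview idea themselves, here is a minimal Python sketch. It is not Trainspodder's code (the app is Qt-based), and the names `Segment`, `activity_bar`, `segment_at`, and `render_bar` are hypothetical. It assumes only that each transcript segment carries a start time, an end time, and text, which is the shape of the `segments` entries in Whisper's transcription result.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One transcript line with its time span in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


def activity_bar(segments: List[Segment], duration: float, bins: int) -> List[bool]:
    """Mark each of `bins` equal time slots True if any segment overlaps it."""
    if duration <= 0 or bins <= 0:
        raise ValueError("duration and bins must be positive")
    bar = [False] * bins
    bin_width = duration / bins
    for seg in segments:
        if seg.end <= seg.start:
            continue  # skip empty or inverted spans
        first = max(0, int(seg.start // bin_width))
        # Nudge the end back slightly so a segment ending exactly on a
        # bin boundary does not light up the following bin.
        last = min(bins - 1, int((seg.end - 1e-9) // bin_width))
        for i in range(first, last + 1):
            bar[i] = True
    return bar


def segment_at(segments: List[Segment], t: float) -> Optional[Segment]:
    """Details view: return the segment whose span contains time t, if any."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg
    return None


def render_bar(bar: List[bool]) -> str:
    """Text rendering of the overview: '#' for speech, '.' for silence."""
    return "".join("#" if active else "." for active in bar)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "first line"), Segment(7.0, 8.0, "second line")]
    print(render_bar(activity_bar(demo, duration=10.0, bins=10)))  # ###....#..
```

The overview here is derived purely from the transcript's own time spans; a real VAD pass over the audio could give a different (and likely more precise) activity bar.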
https://rumble.com/v1n7cx8-trainspodder-and-whisper-transcribes-radio-w-good-proper-noun-spelling-infe.html
Showing Trainspodder's updated GUI for displaying Whisper transcriptions as an additional analysis source, this video demonstrates some of the features, both good and bad, of using Whisper for free-form transcription of podcasts and broadcasts within Trainspodder. Trainspodder's own display of segmentation, speech/music segments, beat events, BPMs, etc. also provides good, consistent timing markers against the transcribed text, potentially including speaker detection for multi-party conversations, especially if the voices are tonally different.
Good and/or Interesting to Note:
Bad:
FYI, I'm using the following whisper call to process these media files:
The original broadcast being played is at
@NielsMayer this is amazing work
Using a trivial extension to Whisper ( #228 ), I extended my still-under-development, Qt-based, multi-platform app, Trainspodder, to display the Whisper transcription of a BBC 6 broadcast. This demonstrates the timings and accuracy of Whisper for both radio disc-jockey banter and song lyrics, alongside an animated display of other audio features extracted from an online broadcast/podcast.
This video was made directly with the screen-recording feature of my Android phone running the Qt6 version of Trainspodder.
https://www.youtube.com/watch?v=37KqNeBurzA