This repository has been archived by the owner on Oct 7, 2021. It is now read-only.
Vincent Schramer edited this page Apr 10, 2014 · 2 revisions

Welcome to the podcastquotes wiki!

Podcast Transcription Research

This feature would give users a rough outline of a podcast that they can tidy up into a full transcript. One use case: a listener remembers hearing the broadcaster talk about fried ice cream. A rough transcription may catch some of these words, allowing the listener to search for the terms and find the part of the episode where the broadcaster uttered "fried", "ice", or "cream". A rough time-stamped script of a podcast may also make the manual transcription process easier.
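As a minimal sketch of the search use case, assume a recognizer has already produced a list of words with start times. The `TimedWord` type, the `find_mentions` function, and the 10-second grouping window are all hypothetical names and values chosen for illustration; nothing here is tied to a particular speech recognition engine.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    start: float  # seconds from the start of the episode (assumed unit)
    word: str     # one recognized word from the rough transcript


def find_mentions(transcript, query, window=10.0):
    """Return (first_hit, last_hit, hit_count) regions, best match first.

    A hit is any transcript word equal to one of the query terms.
    Hits closer together than `window` seconds are grouped into one region.
    """
    terms = {term.lower() for term in query.split()}
    hits = sorted(w.start for w in transcript if w.word.lower() in terms)

    regions = []
    for t in hits:
        if regions and t - regions[-1][1] <= window:
            # Close to the previous hit: extend the current region.
            regions[-1][1] = t
            regions[-1][2] += 1
        else:
            # Far from the previous hit: start a new region.
            regions.append([t, t, 1])

    # Regions with more matched words are more likely to be the right spot.
    regions.sort(key=lambda region: -region[2])
    return [tuple(region) for region in regions]


# Hypothetical rough transcript: "ice" was misheard as "rice" at 754.9s.
transcript = [
    TimedWord(12.0, "welcome"),
    TimedWord(754.2, "fried"),
    TimedWord(754.9, "rice"),
    TimedWord(755.3, "cream"),
    TimedWord(1800.0, "ice"),
]
mentions = find_mentions(transcript, "fried ice cream")
```

Even though the recognizer misheard one word, the region around 754 seconds still ranks first because two of the three terms were caught there, which is the kind of tolerance a rough transcription would need.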

When exploring this feature, we should push some podcasts through a few speech recognition engines with various acoustic models to see how feasible this would be.
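One way to compare engines and acoustic models is to hand-transcribe a short clip and measure each engine's word error rate (WER) against it. WER is a standard speech recognition metric, but using it here is an assumption of this sketch, not something this page has settled on. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # word missing from hypothesis
                cur[j - 1] + 1,                        # extra word in hypothesis
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
wer = word_error_rate(
    "she ordered fried ice cream",
    "she ordered fried rice cream",
)
```

A lower value means the engine's output is closer to the manual transcript. For the search use case, a fairly high WER may still be acceptable as long as distinctive words are recognized.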

Some open source speech recognition engines:

  • CMU Sphinx
    • The CMU Sphinx speech tutorial explains that speech processing is split up into "utterances." This suggests that somewhere in the recognition pipeline it may be possible to get time-stamps for where people begin speaking.
    • Appears to have some premade acoustic and language models, such as the "US English Broadcast News Acoustic Model".
    • There is a notable project called Gaupol that uses CMU Sphinx to assist in movie transcription. This is very close to what this podcastquotes (PQ) feature would try to do.
  • Julius