Speech-to-text for podcasting prototypes
OpenEdge ABL JavaScript Python Shell
#2 Compare This branch is 118 commits ahead, 2 commits behind sgrau:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
keywords
speechToText
web
.gitignore
README.md

README.md

Introduction

Welcome to the SPINDLE project [blog] [website] code repository. This repository contains three main folders:

  • keywords: this directory contains keywords.py, a python script developed during the SPINDLE project that generates keywords from a text. It has been used during the project to [generate keywords from automatic transcriptions] (http://blogs.oucs.ox.ac.uk/openspires/2012/09/12/spindle-automatic-keyword-generation-step-by-step/).

  • speechToText: this directory contains instructions to set up CMU Sphinx4 in Large Vocabulary Continuous Speech Recognition mode. It also contains a config.xml file for the Transcribe.java application.

  • web: A prototype web interface for the above two systems, which can also do other useful things relating to podcasts, transcription and speech to text:

    • import audio and video podcasts from RSS, matching up audio and video recordings of the same material where possible
    • import transcripts from formats including SRT/WebVTT, Adobe Premiere XMP files, and others
    • edit imported transcripts in sync with the audio or video, or start with a blank transcript and create one from scratch
    • queue items for automated speech-to-text transcription, either using the freely available CMU Sphinx4 system or interfacing with the Koemei commercial ASR service (http://koemei.com)
    • automatically extract keywords for items with associated transcript text
    • export edited captions and transcripts in SRT, plain text, and HTML form
    • publish exported captions and transcripts to a directory of static files, with control over which items are published and in what form
    • publish a copy of the incoming RSS feed, including automatically extracted keywords as <category> tags

Some of these features are closer to completion than others. The web interface is written in Python with the Django framework. See web/README.md for more details on how to install and test it.

Tags

#spindle #openspires #ukoer #oerri