Skip to content
Using Google Speech to Text API Provide a Simple Interface to Convert Audio Files
Find file
Pull request Compare This branch is 15 commits behind taf2:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.
bin
lib
samples
test
.gitignore
Gemfile
Gemfile.lock
README.rdoc
Rakefile
speech2text.gemspec

README.rdoc

Speech2Text

Using the power of ffmpeg/flac/Google and ruby here is a simple interface to play with to convert speech to text.

Using a new undocumentd speech API from Google with the help of this article: mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/

We're able to provide a very simple API in Ruby to decode simple audio to text.

The API from Google is not yet public and so may change. It also seems to be very fragile as more times than not it will return a 500, so the library has retry code built in - for larger audio files 10+ failures may return before a successful result is retrieved…

It also appears that the API only likes smaller audio files so there is a built in chunker that allows us to split the audio up into smaller chunks.

Example

audio = Speech::AudioToText.new("i-like-pickles.wav")
puts audio.to_text.inspect
=> {"captured_json"=>[["I like pickles", 0.92731786]], "confidence"=>0.92731786}

Command Line

speech2text i-like-pickles.wav
cat i-like-pickles.json
{"captured_json"=>[["I like pickles", 0.92731786]], "confidence"=>0.92731786}
Something went wrong with that request. Please try again.