IBM Watson Speech Services for Web Browsers

Allows you to easily add voice recognition and synthesis to any web app with minimal code.

Built for Browsers

This library is primarily intended for use in web browsers. Check out watson-developer-cloud to use Watson services (speech and others) from Node.js.

However, a server-side component is required to generate auth tokens. The examples/ folder includes example Node.js and Python servers. SDKs are available for Node.js, Java, and Python, and there is also a REST API.

Installation - standalone

Pre-compiled bundles are available from GitHub Releases. Just download the file and drop it into your website: https://github.com/watson-developer-cloud/speech-javascript-sdk/releases

Installation - bower

bower install --save watson-speech

Installation - npm with Browserify or Webpack

This library can be bundled with Browserify or Webpack and easily included in larger projects:

npm install --save watson-speech

This method enables a smaller bundle by only including the desired components, for example:

var recognizeMic = require('watson-speech/speech-to-text/recognize-microphone');

Breaking change for v0.22.0

The format of objects emitted in objectMode has changed from {alternatives: [...], index: 1} to {results: [{alternatives: [...]}], result_index: 1}.

There is a new ResultExtractor class that restores the old behavior; recognizeMicrophone() and recognizeFile() both accept a new extract_results option to enable it.

This was done to enable the new speaker_labels feature. The format now exactly matches what the Watson Speech to Text service returns and shouldn't change again unless the Watson service changes.
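To make the two shapes concrete, the sketch below flattens a new-style message into old-style objects. It only illustrates the formats described above and is not the library's own implementation: `flattenResults` is a hypothetical name, and numbering each item as `result_index` plus its position in `results` is an assumption.

```javascript
// Hypothetical helper: converts one new-style message into old-style objects.
function flattenResults(message) {
  const results = message.results || [];
  return results.map(function (result, i) {
    // Copy the result and attach an index (assumption: result_index + position).
    return Object.assign({}, result, { index: message.result_index + i });
  });
}

// New format: {results: [{alternatives: [...]}], result_index: 1}
const newStyle = {
  results: [{ alternatives: [{ transcript: 'hello world' }] }],
  result_index: 1
};

// Old format: one {alternatives: [...], index: 1} object per result
const oldStyle = flattenResults(newStyle);
console.log(JSON.stringify(oldStyle));
```

In practice, the extract_results option mentioned above is likely the simpler route; the helper is only meant to show how the shapes relate.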

API & Examples

The basic API is outlined below; see the complete API docs at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/

See several basic examples at http://watson-speech.mybluemix.net/ (source)

See a more advanced example at https://speech-to-text-demo.mybluemix.net/

All API methods require an auth token that must be generated server-side. (See https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/ for a couple of basic examples in Node.js and Python.)
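On the browser side, fetching the token might look like the sketch below. The endpoint path `/api/speech-to-text/token` and the plain-text response body are assumptions about your own server, and `fetchToken` is a hypothetical helper name. The `fetchImpl` parameter exists only so the helper can be exercised outside a browser.

```javascript
// Hypothetical helper: requests an auth token from your own server.
// Assumes the server replies with the token as plain text.
function fetchToken(url, fetchImpl) {
  const doFetch = fetchImpl || fetch; // default to the browser's fetch()
  return doFetch(url).then(function (response) {
    if (!response.ok) {
      throw new Error('Token request failed with status ' + response.status);
    }
    return response.text();
  });
}

// Usage in a browser (the endpoint path is an assumption):
// fetchToken('/api/speech-to-text/token').then(function (token) {
//   // pass the token to a WatsonSpeech method
// });
```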

`WatsonSpeech.TextToSpeech`

`.synthesize({text, token})` -> `<audio>`

Speaks the supplied text through an automatically-created <audio> element. Currently limited to text that can fit within a GET URL (this is particularly an issue on Internet Explorer before Windows 10, where the max length is around 1000 characters after the token is accounted for).

Options:

text - the text to speak
voice - the desired playback voice's name - see .getVoices(). Note that the voices are language-specific.
autoPlay - set to false to prevent the audio from automatically playing

Relies on browser audio support: should work reliably in Chrome and Firefox on desktop and Android. Edge works with a little help. Safari and all iOS browsers do not seem to work yet.
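A minimal sketch of calling .synthesize() with the options above follows. The helper names `buildSynthesizeOptions` and `speak` are hypothetical, and the 1000-character default limit is only an assumed safety margin based on the Internet Explorer figure mentioned earlier, not a value enforced by the library.

```javascript
// Hypothetical helper: builds the options object for TextToSpeech.synthesize().
// maxLength is an assumed safety margin; tune it for the browsers you support.
function buildSynthesizeOptions(text, token, voice, maxLength) {
  const limit = maxLength || 1000;
  if (encodeURIComponent(text).length > limit) {
    throw new Error('Text is probably too long to fit in a GET URL');
  }
  const options = { text: text, token: token, autoPlay: false };
  if (voice) {
    options.voice = voice; // a voice name; see .getVoices()
  }
  return options;
}

// speechLib is the WatsonSpeech global (or the required module).
function speak(speechLib, token, text, voice) {
  const audio = speechLib.TextToSpeech.synthesize(
    buildSynthesizeOptions(text, token, voice)
  );
  audio.play(); // autoPlay was disabled, so start playback explicitly
  return audio;
}

// Usage in a browser, once a token has been fetched from your server:
// speak(WatsonSpeech, token, 'Hello from Watson');
```

Setting autoPlay to false and calling play() yourself is optional; it is shown here because it gives the page a chance to attach event listeners to the <audio> element before playback begins.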

`WatsonSpeech.SpeechToText`

The recognizeMicrophone() and recognizeFile() helper methods are recommended for most use-cases. They set up the streams in the appropriate order and enable common options. These two methods are documented below.

The core of the library is the RecognizeStream that performs the actual transcription, and a collection of other Node.js-style streams that manipulate the data in various ways. For less common use-cases, the core components may be used directly with the helper methods serving as optional templates to follow. The full library is documented at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text.html

`.recognizeMicrophone({token})` -> Stream

Options:

keepMic: if true, preserves the MicrophoneStream for subsequent calls, preventing additional permissions requests in Firefox
mediaStream: Optionally pass in an existing media stream rather than prompting the user for microphone access.
Other options passed to RecognizeStream
Other options passed to SpeakerStream if options.resultsBySpeaker is set to true
Other options passed to FormatStream if options.format is not set to false
Other options passed to WritableElementStream if options.outputElement is set

Requires the getUserMedia API, so browser compatibility is limited (see http://caniuse.com/#search=getusermedia). Also note that Chrome requires https (with a few exceptions for localhost and such); see https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features

No more data will be sent after .stop() is called on the returned stream, but additional results may be received for already-sent data.
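A minimal sketch of starting and stopping microphone recognition follows. `startListening` is a hypothetical wrapper. Calling setEncoding('utf8') and listening for 'data' events assumes the returned stream behaves like a standard Node.js-style readable stream that emits formatted text.

```javascript
// Hypothetical wrapper: starts transcribing from the microphone and
// returns a function that stops it.
// speechLib is the WatsonSpeech global (or the required module).
function startListening(speechLib, token, onText) {
  const stream = speechLib.SpeechToText.recognizeMicrophone({
    token: token,
    keepMic: true // reuse the microphone on subsequent calls
  });
  stream.setEncoding('utf8'); // assumption: receive strings rather than Buffers
  stream.on('data', onText);
  stream.on('error', function (err) {
    console.error(err);
  });
  return function stop() {
    // Results for already-sent audio may still arrive after this call.
    stream.stop();
  };
}

// Usage in a browser, once a token has been fetched from your server:
// const stop = startListening(WatsonSpeech, token, function (text) {
//   console.log(text);
// });
// ...later, for example from a button click handler:
// stop();
```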

`.recognizeFile({file, token})` -> Stream

Can recognize and optionally attempt to play a URL, File, or Blob (such as from an <input type="file"/> or from an ajax request).

Options:

file: a String URL or a Blob or File instance. Note that CORS restrictions apply to URLs.
play: (optional, default=false) Attempt to also play the file locally while uploading it for transcription
Other options passed to RecognizeStream
Other options passed to TimingStream if options.realtime is true, or unset and options.play is true
Other options passed to SpeakerStream if options.resultsBySpeaker is set to true
Other options passed to FormatStream if options.format is not set to false
Other options passed to WritableElementStream if options.outputElement is set

play requires that the browser support the format (most browsers support wav and ogg/opus, but not flac). An UNSUPPORTED_FORMAT error is emitted on the RecognizeStream if playback fails. This error is special in that it does not stop the streaming of results.

Playback will automatically stop when .stop() is called on the returned stream.

For Mobile Safari compatibility, a URL must be provided, and recognizeFile() must be called in direct response to a user interaction (so the token must be pre-loaded).
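A minimal sketch of transcribing a file while playing it, and of telling playback failures apart from other errors, follows. `isPlaybackError` and `transcribeFile` are hypothetical names. The sketch assumes that errors surface on the returned stream and that setEncoding('utf8') yields text chunks, as with a standard Node.js-style readable stream.

```javascript
// True when an error only means local playback failed; results keep streaming.
function isPlaybackError(err) {
  return Boolean(err) && err.name === 'UNSUPPORTED_FORMAT';
}

// Hypothetical wrapper around recognizeFile().
// speechLib is the WatsonSpeech global (or the required module).
function transcribeFile(speechLib, token, file, onText) {
  const stream = speechLib.SpeechToText.recognizeFile({
    token: token,
    file: file, // a String URL, Blob, or File
    play: true  // also attempt local playback
  });
  stream.setEncoding('utf8'); // assumption: receive strings rather than Buffers
  stream.on('data', onText);
  stream.on('error', function (err) {
    if (isPlaybackError(err)) {
      console.warn('Playback is not supported for this format; still transcribing.');
    } else {
      console.error(err);
    }
  });
  return stream;
}

// Usage in a browser, in direct response to a user interaction and with a
// token that was loaded beforehand:
// fileInput.addEventListener('change', function () {
//   transcribeFile(WatsonSpeech, token, fileInput.files[0], console.log);
// });
```

Calling .stop() on the stream returned by the wrapper ends both the upload and the playback.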

Changes

There have been a few breaking changes in recent releases:

Removed SpeechToText.recognizeElement() due to quality issues. The code is available in an (unsupported) example if you wish to use it with current releases of the SDK.
Renamed recognizeBlob() to recognizeFile() to make the primary usage more apparent.
Changed the playFile option of recognizeBlob() to just play, and corrected its default.
Changed the format of objects emitted in objectMode to exactly match what the service sends. Added ResultStream class and extract_results option to enable the older behavior.
Changed the playback-error event to just error when recognizing and playing a file. Check for error.name == 'UNSUPPORTED_FORMAT' to identify playback errors. This error is special in that it does not stop the streaming of results.
Renamed recognizeFile()'s data option to file because it now may be a URL. Using a URL enables faster playback and Mobile Safari support.

See CHANGELOG.md for a complete list of changes.

todo

Further solidify API
break components into standalone npm modules where it makes sense
run integration tests on travis (fall back to offline server for pull requests)
add even more tests
better cross-browser testing (IE, Safari, mobile browsers - maybe saucelabs?)
update node-sdk to use current version of this lib's RecognizeStream (and also provide the FormatStream + anything else that might be handy)
move result and results events to node wrapper (along with the deprecation notice)
improve docs
consider a wrapper to match https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
support a "hard" stop that prevents any further data events, even for already uploaded audio, ensure timing stream also implements this.
look for bug where single-word final results may omit word confidence (possibly due to FormatStream?)
fix bug where TimingStream shows words slightly before they're spoken

