From 2deaa597a7de2975047726b7178d897dd7d12168 Mon Sep 17 00:00:00 2001
From: Paul Sastrasinh
Date: Thu, 21 Jul 2016 00:20:27 -0400
Subject: [PATCH] fix(converter): minor changes to formatting

- line height is a little bigger by default
- nav headers are a little bigger
- code text no longer overlaps on consecutive lines
- removed hide uri parameters option
---
 examples/default.html       | 24 ++++++++++++------------
 examples/flatly_triple.html | 24 ++++++++++++------------
 examples/slate_triple.html  | 24 ++++++++++++------------
 examples/slate_wide.html    | 24 ++++++++++++------------
 examples/streak_triple.html | 24 ++++++++++++------------
 examples/streak_wide.html   | 24 ++++++++++++------------
 templates/index.jade        | 10 ++++++++++
 templates/mixins.jade       |  5 +----
 templates/triple.jade       | 10 ++++++++++
 9 files changed, 93 insertions(+), 76 deletions(-)

diff --git a/examples/default.html b/examples/default.html
index 214a37f..7e6b8aa 100644
--- a/examples/default.html
+++ b/examples/default.html
@@ -1,4 +1,4 @@
-Speech to Text Back to top

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods with which a client can maintain a long, multi-turn exchange, or session, with the service, or establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.
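
As a minimal illustration of the one-shot delivery and the X-Watson-Learning-Opt-Out header described above, the following sketch assumes the Python requests library and a placeholder host and credentials; it posts a single FLAC file to the sessionless recognize method.

# Hedged sketch: one-shot, sessionless transcription with request logging disabled.
# BASE and AUTH are placeholders for your own service instance and credentials.
import requests

BASE = "https://example.com/speech-to-text/api"
AUTH = ("username", "password")

with open("audio.flac", "rb") as audio:
    response = requests.post(
        BASE + "/v1/recognize",
        data=audio,                                # one-shot delivery: the whole file as the body
        headers={
            "Content-Type": "audio/flac",
            "X-Watson-Learning-Opt-Out": "true",   # do not log this request for service improvement
        },
        auth=AUTH,
    )

print(response.status_code)
print(response.json())                             # final transcription results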

models

Retrieves the models available for the service
GET/speech-to-text/api/v1/models

Returns a list of all models available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.
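
A short sketch of how this method and the companion GET /v1/models/{model_id} method might be called, assuming the Python requests library and placeholder host and credentials; the models, name, and rate fields come from the response schema below.

# Hedged sketch: list the available models, then fetch one by its name.
import requests

BASE = "https://example.com/speech-to-text/api"    # placeholder host
AUTH = ("username", "password")                    # placeholder credentials

models = requests.get(BASE + "/v1/models", auth=AUTH).json()["models"]
for model in models:
    print(model["name"], model["rate"])            # model name and minimum sampling rate in Hertz

# Retrieve information about a single model using its name from the list above
detail = requests.get(BASE + "/v1/models/" + models[0]["name"], auth=AUTH).json()
print(detail)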

Example URI

GET /speech-to-text/api/v1/models
Response  200
HideShow

OK.

Schema
{
+Speech to Text Back to top

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods with which a client can maintain a long, multi-turn exchange, or session, with the service, or establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

Retrieves the models available for the service
GET/speech-to-text/api/v1/models

Returns a list of all models available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models
Response  200
HideShow

OK.

Schema
{
   "description": "Information about the available models.",
   "required": [
     "models"
@@ -86,7 +86,7 @@
       "type": "string"
     }
   }
-}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models/model_id
URI Parameters
HideShow
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.

Response  200
HideShow

OK.

Schema
{
+}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models/model_id
URI Parameters
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.

Response  200
HideShow

OK.

Schema
{
   "required": [
     "name",
     "rate",
@@ -184,7 +184,7 @@
       "type": "string"
     }
   }
-}

sessions

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.
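
The sketch below shows one way to create a session, carry its cookie on later requests, and delete it when done. It assumes the Python requests library and placeholder host and credentials; the session_id field of the 201 response is an assumption, since the full response schema is elided in this excerpt.

# Hedged sketch: create a session, reuse its cookie, keep it alive, delete it.
import requests

BASE = "https://example.com/speech-to-text/api"
AUTH = ("username", "password")

http = requests.Session()                          # carries the set-cookie value on every later call

# Pick a model name from GET /v1/models and create the session with it
model_name = http.get(BASE + "/v1/models", auth=AUTH).json()["models"][0]["name"]
created = http.post(BASE + "/v1/sessions", params={"model": model_name}, auth=AUTH)
session_id = created.json()["session_id"]          # assumed field name in the 201 Created body

# ... issue recognize / observe_result requests with the same "http" object
# so each request presents the session cookie ...

# A GET request against the session (here, the recognize status check) keeps it
# from expiring after 30 seconds of inactivity
http.get(BASE + "/v1/sessions/" + session_id + "/recognize", auth=AUTH)

# Delete the session when finished (204 No Content)
http.delete(BASE + "/v1/sessions/" + session_id, auth=AUTH)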

Example URI

POST /speech-to-text/api/v1/sessions
URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).

Request
HideShow
Schema
{
+}

sessions

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.

Example URI

POST /speech-to-text/api/v1/sessions
URI Parameters
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).

Request
HideShow
Schema
{
   "type": "string"
 }
Response  201
HideShow

Created.

Schema
{
   "required": [
@@ -285,7 +285,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

Example URI

DELETE /speech-to-text/api/v1/sessions/session_id
URI Parameters
HideShow
session_id
string (required) 

The ID of the session to be deleted.

Response  204
HideShow

No Content.

Response  400
HideShow

Bad Request. Cookie must be set.

Schema
{
+}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

Example URI

DELETE /speech-to-text/api/v1/sessions/session_id
URI Parameters
session_id
string (required) 

The ID of the session to be deleted.

Response  204
HideShow

No Content.

Response  400
HideShow

Bad Request. Cookie must be set.

Schema
{
   "required": [
     "error",
     "code",
@@ -348,7 +348,7 @@
       "type": "string"
     }
   }
-}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.
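
A rough sketch of polling this method with interim results, assuming the Python requests library; BASE, AUTH, and session_id are placeholders, the request must carry the session cookie (reuse the requests.Session that created the session), and the assumption that each streamed JSON object arrives on its own line is not stated in the schema.

# Hedged sketch: observe interim results for a recognition task in a session.
import json
import requests

BASE = "https://example.com/speech-to-text/api"
AUTH = ("username", "password")
session_id = "..."                                  # from POST /v1/sessions

http = requests.Session()                           # must hold the session cookie

observed = http.get(
    BASE + "/v1/sessions/" + session_id + "/observe_result",
    params={"sequence_id": 1, "interim_results": "true"},  # match the POST recognize sequence_id
    auth=AUTH,
    stream=True,                                    # read the stream of JSON objects as they arrive
)
for line in observed.iter_lines():
    if line:
        print(json.loads(line))                     # each object represents a SpeechRecognitionEvent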

Example URI

GET /speech-to-text/api/v1/sessions/session_id/observe_result
URI Parameters
HideShow
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.

Response  200
HideShow

OK.

Schema
{
+}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/observe_result
URI Parameters
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.

Response  200
HideShow

OK.

Schema
{
   "required": [
     "results",
     "result_index"
@@ -675,7 +675,7 @@
       "type": "boolean"
     }
   }
-}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized to indicate that you can send another recognition request with the POST recognize method.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/recognize
URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.

Response  200
HideShow

OK.

Schema
{
+}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized to indicate that you can send another recognition request with the POST recognize method.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/recognize
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

Response  200
HideShow

OK.

Schema
{
   "required": [
     "session"
   ],
@@ -776,7 +776,7 @@
       "type": "string"
     }
   }
-}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{"part_content_type":"audio/flac","data_parts_count":1,"continuous":true,"inactivity_timeout":-1}"
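
The following sketch sends a multipart request of this shape against an existing session. It assumes the Python requests library; BASE, AUTH, and session_id are placeholders, and the request must carry the session cookie. The metadata and upload form-part names follow the request body shown later in this method.

# Hedged sketch: multipart session-based recognition (metadata part plus one audio part).
import json
import requests

BASE = "https://example.com/speech-to-text/api"
AUTH = ("username", "password")
session_id = "..."                                  # from POST /v1/sessions

http = requests.Session()                           # must hold the session cookie

metadata = {
    "part_content_type": "audio/flac",
    "data_parts_count": 1,
    "continuous": True,
    "inactivity_timeout": -1,
}
parts = [
    ("metadata", (None, json.dumps(metadata), "application/json")),
    ("upload", ("audio-file1.flac", open("audio-file1.flac", "rb"), "audio/flac")),
]
result = http.post(
    BASE + "/v1/sessions/" + session_id + "/recognize",
    files=parts,                                    # requests builds the multipart/form-data body
    auth=AUTH,
)
print(result.json())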

Example URI

POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
+}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{"part_content_type":"audio/flac","data_parts_count":1,"continuous":true,"inactivity_timeout":-1}"

Example URI

POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1205,7 +1205,7 @@
       "type": "string"
     }
   }
-}

sessionless

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.
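
A minimal sketch of streaming-mode delivery for this sessionless method, assuming the Python requests library (which sends Transfer-Encoding: chunked when given a generator as the body); BASE and AUTH are placeholders.

# Hedged sketch: stream audio in chunks so transcription starts before the file ends.
import requests

BASE = "https://example.com/speech-to-text/api"
AUTH = ("username", "password")

def audio_chunks(path, chunk_size=8192):
    # Yield the audio file in small chunks to simulate live capture.
    with open(path, "rb") as audio:
        while True:
            chunk = audio.read(chunk_size)
            if not chunk:
                return
            yield chunk

response = requests.post(
    BASE + "/v1/recognize",
    data=audio_chunks("audio.flac"),                # generator body => chunked transfer encoding
    headers={"Content-Type": "audio/flac"},
    params={"inactivity_timeout": 60},              # override the 30-second silence default
    auth=AUTH,
)
print(response.json())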

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{"part_content_type":"audio/flac","data_parts_count":1,"continuous":true,"inactivity_timeout":-1}"

Example URI

POST /speech-to-text/api/v1/recognize
URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
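
A short sketch of a non-multipart sessionless request that exercises several of the query parameters above, assuming the Python requests library; BASE and AUTH are placeholders, and encoding the keywords array as a comma-separated string on the query string is an assumption.

# Hedged sketch: keyword spotting, alternatives, and smart formatting on /v1/recognize.
import requests

BASE = "https://example.com/speech-to-text/api"
AUTH = ("username", "password")

with open("audio.flac", "rb") as audio:
    response = requests.post(
        BASE + "/v1/recognize",
        data=audio,
        headers={"Content-Type": "audio/flac"},
        params={
            "keywords": "colorado,tornado,tornadoes",  # spot these words in the final hypothesis
            "keywords_threshold": 0.5,                 # minimum confidence for a keyword match
            "max_alternatives": 3,
            "smart_formatting": "true",
        },
        auth=AUTH,
    )
print(response.json())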

Request
HideShow
Body
{
+}

sessionless

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{"part_content_type":"audio/flac","data_parts_count":1,"continuous":true,"inactivity_timeout":-1}"

Example URI

POST /speech-to-text/api/v1/recognize
URI Parameters
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1584,7 +1584,7 @@
       "type": "string"
     }
   }
-}

asynchronous

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.
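
The sketch below shows what a callback endpoint might do with the user secret during verification and on later notifications, using Python's hmac module; how the HMAC-SHA1 signature is encoded in the X-Callback-Signature header (base64 here) is an assumption.

# Hedged sketch: verifying the X-Callback-Signature header on the callback side.
import base64
import hashlib
import hmac

USER_SECRET = b"my-user-secret"                     # the value sent with POST register_callback

def expected_signature(payload: bytes) -> str:
    # HMAC-SHA1 over the payload, keyed by the user secret (base64 encoding assumed).
    digest = hmac.new(USER_SECRET, payload, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

def handle_verification(challenge_string: bytes, header_value: str) -> bytes:
    # During white-listing: check the signature, then echo the challenge string
    # back with status 200 within 5 seconds so the URL is registered.
    assert hmac.compare_digest(expected_signature(challenge_string), header_value)
    return challenge_string

def handle_notification(body: bytes, header_value: str) -> bool:
    # For each callback notification: verify the integrity and origin of the payload.
    return hmac.compare_digest(expected_signature(body), header_value)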

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/register_callback
URI Parameters
HideShow
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

Request
HideShow
Schema
{
+}

asynchronous

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/register_callback
URI Parameters
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

Request
HideShow
Schema
{
   "type": "string"
 }
Response  200
HideShow

OK. The callback was already registered (white-listed). The status included in the response is already created.

Schema
{
   "required": [
@@ -1719,7 +1719,7 @@
       "type": "string"
     }
   }
-}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
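
A rough sketch of the polling approach, assuming the Python requests library with a placeholder host and credentials; the id and status field names and the completed/failed status values are assumptions, since the job schema is elided in this excerpt.

# Hedged sketch: create an asynchronous job without a callback URL and poll for results.
import time
import requests

BASE = "https://example.com/speech-to-text/api"
AUTH = ("username", "password")

with open("audio.flac", "rb") as audio:
    job = requests.post(
        BASE + "/v1/recognitions",
        data=audio,
        headers={"Content-Type": "audio/flac"},
        params={"results_ttl": 60},                 # keep results available for 60 minutes
        auth=AUTH,
    ).json()

# No callback_url was given, so poll GET /v1/recognitions/{id} until the job finishes
while True:
    status = requests.get(BASE + "/v1/recognitions/" + job["id"], auth=AUTH).json()
    if status.get("status") in ("completed", "failed"):    # assumed status values
        break
    time.sleep(5)
print(status)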

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/recognitions?events=&user_token=&results_ttl=&model=&continuous=&inactivity_timeout=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
HideShow
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Schema
{
+}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/recognitions?events=&user_token=&results_ttl=&model=&continuous=&inactivity_timeout=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
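
A minimal sketch of creating a job (assuming Python with the requests library; the host, credentials, audio file, callback URL, and user token below are placeholders, not values defined by this document):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host; use your instance URL
AUTH = ("username", "password")                                # placeholder service credentials

with open("audio.flac", "rb") as f:                            # illustrative audio file
    audio = f.read()

# Callback-style job; drop callback_url, events, and user_token to use the polling approach instead.
resp = requests.post(
    BASE + "/v1/recognitions",
    auth=AUTH,
    headers={"Content-Type": "audio/flac"},
    params={
        "callback_url": "https://example.com/stt-results",     # must already be white-listed
        "events": "recognitions.completed_with_results",
        "user_token": "job-42",
        "timestamps": "true",
    },
    data=audio,
)
job = resp.json()
print(job)   # the response carries a job id and status for use with GET /v1/recognitions/{id}
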
URI Parameters
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Schema
{
   "type": "array",
   "items": {
     "type": "string",
@@ -1802,7 +1802,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

Example URI

DELETE /speech-to-text/api/v1/recognitions/id
URI Parameters
HideShow
id
string (required) 

The ID of the job that is to be deleted.

Response  204
HideShow

No Content. The job was successfully deleted.

Response  404
HideShow

Not Found. The specified job ID was not found.

Schema
{
+}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

Example URI

DELETE /speech-to-text/api/v1/recognitions/id
URI Parameters
id
string (required) 

The ID of the job that is to be deleted.

Response  204
HideShow

No Content. The job was successfully deleted.

Response  404
HideShow

Not Found. The specified job ID was not found.

Schema
{
   "required": [
     "error",
     "code",
@@ -1844,7 +1844,7 @@
       "type": "string"
     }
   }
-}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

Example URI

GET /speech-to-text/api/v1/recognitions/id
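
A polling sketch against this method (Python with requests; the host, credentials, and job id are placeholders, and the completed/failed status values follow the event names described for job creation):

import time
import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host
AUTH = ("username", "password")                                # placeholder credentials
job_id = "your-job-id"                                         # illustrative job id from job creation

while True:
    job = requests.get(BASE + "/v1/recognitions/" + job_id, auth=AUTH).json()
    if job["status"] == "completed":
        print(job.get("results"))      # results are included once the job is complete
        break
    if job["status"] == "failed":
        raise RuntimeError("recognition job failed")
    time.sleep(10)                     # poll at a modest interval
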
URI Parameters
HideShow
id
string (required) 

The ID of the job whose status is to be checked.

Response  200
HideShow

OK.

Schema
{
+}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

Example URI

GET /speech-to-text/api/v1/recognitions/id
URI Parameters
id
string (required) 

The ID of the job whose status is to be checked.

Response  200
HideShow

OK.

Schema
{
   "required": [
     "status"
   ],
diff --git a/examples/flatly_triple.html b/examples/flatly_triple.html
index 2b713c2..0c25bb7 100644
--- a/examples/flatly_triple.html
+++ b/examples/flatly_triple.html
@@ -1,4 +1,4 @@
-Speech to Text

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.
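
As a sketch, an opted-out request for the model list might look like this (Python with requests; the host and credentials are placeholders):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host
AUTH = ("username", "password")                                # placeholder credentials

models = requests.get(
    BASE + "/v1/models",
    auth=AUTH,
    headers={"X-Watson-Learning-Opt-Out": "true"},  # prevents the request from being logged
).json()
print([m["name"] for m in models["models"]])        # "models"/"name" fields assumed from the schema below
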

models

GET /speech-to-text/api/v1/models
Responses 200, 406, 415

OK.

Schema
{
+Speech to Text

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

GET /speech-to-text/api/v1/models
Responses 200, 406, 415

OK.

Schema
{
   "description": "Information about the available models.",
   "required": [
     "models"
@@ -184,7 +184,7 @@
       "type": "string"
     }
   }
-}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

URI Parameters
HideShow
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.


sessions

POST /speech-to-text/api/v1/sessions
Requests
Schema
{
+}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

URI Parameters
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.


sessions

POST /speech-to-text/api/v1/sessions
Requests
Schema
{
   "type": "string"
 }
Responses 201, 406, 415, 503

Created.

Schema
{
   "required": [
@@ -279,7 +279,7 @@
       "type": "string"
     }
   }
-}

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.
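
A sketch of session reuse (Python with requests; the host and credentials are placeholders, and the session_id response field is assumed from the schema above). A requests.Session object re-sends the cookie returned at creation, which keeps later calls on the same engine:

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host
http = requests.Session()
http.auth = ("username", "password")                           # placeholder credentials

created = http.post(BASE + "/v1/sessions",
                    params={"model": "en-US-BroadbandModel"})   # model name as listed by GET /v1/models
created.raise_for_status()
session_id = created.json()["session_id"]                       # field name assumed from the session schema

# The cookie set by the service is stored on `http` and re-sent automatically,
# so follow-up recognize/observe_result calls use the same session.
state = http.get(BASE + "/v1/sessions/" + session_id + "/recognize").json()
print(state)
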

URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).


DELETE /speech-to-text/api/v1/sessions/session_id
Responses 204, 400, 404, 406

No Content.

Bad Request. Cookie must be set.

Schema
{
+}

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.

URI Parameters
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).


DELETE /speech-to-text/api/v1/sessions/session_id
Responses 204, 400, 404, 406

No Content.

Bad Request. Cookie must be set.

Schema
{
   "required": [
     "error",
     "code",
@@ -342,7 +342,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

URI Parameters
HideShow
session_id
string (required) 

The ID of the session to be deleted.


GET /speech-to-text/api/v1/sessions/session_id/observe_result
Responses 200, 400, 404, 406, 408, 413, 415, 500

OK.

Schema
{
+}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

URI Parameters
session_id
string (required) 

The ID of the session to be deleted.


GET /speech-to-text/api/v1/sessions/session_id/observe_result
Responses 200, 400, 404, 406, 408, 413, 415, 500

OK.

Schema
{
   "required": [
     "results",
     "result_index"
@@ -669,7 +669,7 @@
       "type": "boolean"
     }
   }
-}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.
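
A sketch of pairing the two calls by sequence ID (Python with requests; the host, credentials, session id, cookie name, and audio file are placeholders). The observe_result request runs in a background thread so it can stream interim results while the POST recognize request is still sending audio:

import threading
import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host
AUTH = ("username", "password")                                # placeholder credentials
SESSION_ID = "your-session-id"                                 # from POST /v1/sessions
COOKIES = {"SESSIONID": "cookie-from-session-creation"}        # illustrative; send the cookie set by POST /v1/sessions
SEQ = "7"                                                      # user-chosen sequence id

def observe():
    # Streams interim results for the recognition task with the matching sequence id.
    r = requests.get(BASE + "/v1/sessions/" + SESSION_ID + "/observe_result",
                     auth=AUTH, cookies=COOKIES,
                     params={"sequence_id": SEQ, "interim_results": "true"},
                     stream=True)
    for line in r.iter_lines():
        if line:
            print(line.decode("utf-8"))

watcher = threading.Thread(target=observe)
watcher.start()

with open("audio.flac", "rb") as f:                            # illustrative audio file
    requests.post(BASE + "/v1/sessions/" + SESSION_ID + "/recognize",
                  auth=AUTH, cookies=COOKIES,
                  headers={"Content-Type": "audio/flac"},
                  params={"sequence_id": SEQ},
                  data=f)
watcher.join()
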

URI Parameters
HideShow
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.


GET /speech-to-text/api/v1/sessions/session_id/recognize
Responses 200, 404, 406, 415

OK.

Schema
{
+}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.

URI Parameters
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.


GET /speech-to-text/api/v1/sessions/session_id/recognize
Responses 200, 404, 406, 415

OK.

Schema
{
   "required": [
     "session"
   ],
@@ -770,7 +770,7 @@
       "type": "string"
     }
   }
-}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized to indicate that you can send another recognition request with the POST recognize method.

URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.


POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
Requests
Body
{
+}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized to indicate that you can send another recognition request with the POST recognize method.

URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.


POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
Requests
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1127,7 +1127,7 @@
       "type": "string"
     }
   }
-}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”:-1}"
URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


sessionless

POST /speech-to-text/api/v1/recognize
Requests
Body
{
+}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”:-1}"
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


sessionless

POST /speech-to-text/api/v1/recognize
Requests
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1443,7 +1443,7 @@
       "type": "string"
     }
   }
-}

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”=-1}"
URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


asynchronous

POST /speech-to-text/api/v1/register_callback
Requests
Schema
{
+}

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”=-1}"
URI Parameters
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


asynchronous

POST /speech-to-text/api/v1/register_callback
Requests
Schema
{
   "type": "string"
 }
Responses 200, 201, 400, 503

OK. The callback was already registered (white-listed). The status included in the response is 'already created'.

Schema
{
   "required": [
@@ -1517,7 +1517,7 @@
       "type": "string"
     }
   }
-}

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.
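
A sketch of verifying that signature on the receiving side (Python standard library only; the base64 encoding of the digest is an assumption, since the exact encoding is not spelled out here):

import base64
import hashlib
import hmac

def signature_is_valid(user_secret, payload, header_value):
    # `payload` is the raw notification body (bytes), or the challenge string (bytes)
    # during URL verification; `header_value` is the X-Callback-Signature header.
    digest = hmac.new(user_secret.encode("utf-8"), payload, hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode("ascii")   # encoding of the digest assumed
    return hmac.compare_digest(expected, header_value)
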

Note: This method is currently a beta release that supports US English only.

URI Parameters
HideShow
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.


GET /speech-to-text/api/v1/recognitions
Responses 200, 503

OK.

Schema
{
+}

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

Note: This method is currently a beta release that supports US English only.

URI Parameters
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.


GET /speech-to-text/api/v1/recognitions
Responses 200, 503

OK.

Schema
{
   "required": [
     "recognitions"
   ],
@@ -1643,7 +1643,7 @@
       "type": "string"
     }
   }
-}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Note: This method is currently a beta release that supports US English only.

URI Parameters
HideShow
callback_url
string (required)

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the `POST register_callback` method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the `user_token` query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


DELETE /speech-to-text/api/v1/recognitions/id
Responses: 204, 404, 503

No Content. The job was successfully deleted.

Not Found. The specified job ID was not found.

Schema
{
+}

Creates a job for an asynchronous recognition request
POST /speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Note: This method is currently a beta release that supports US English only.

URI Parameters
callback_url
string (required)

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the `POST register_callback` method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the `user_token` query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


DELETE /speech-to-text/api/v1/recognitions/id
Responses: 204, 404, 503

No Content. The job was successfully deleted.

Not Found. The specified job ID was not found.

Schema
{
   "required": [
     "error",
     "code",
@@ -1685,7 +1685,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified asynchronous job
DELETE /speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.
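
A minimal sketch of such a request with the Python requests library; the endpoint, credentials, and job ID are placeholders.

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed service endpoint
response = requests.delete(BASE + "/v1/recognitions/JOB_ID", auth=("USERNAME", "PASSWORD"))
print(response.status_code)  # 204 indicates that the job was deleted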

Note: This method is currently a beta release that supports US English only.

URI Parameters
HideShow
id
string (required) 

The ID of the job that is to be deleted.


GET /speech-to-text/api/v1/recognitions/id
Responses: 200, 404, 503

OK.

Schema
{
+}

Deletes the specified asynchronous job
DELETE /speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

URI Parameters
id
string (required) 

The ID of the job that is to be deleted.


GET /speech-to-text/api/v1/recognitions/id
Responses: 200, 404, 503

OK.

Schema
{
   "required": [
     "status"
   ],
@@ -1906,4 +1906,4 @@
       "type": "string"
     }
   }
-}

Checks the status of the specified asynchronous job
GET /speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.
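
As a sketch of the polling approach with the Python requests library, assuming the response's status field takes the values suggested by the notification events above (for example, completed or failed) and that a completed response carries a results field; the endpoint and credentials are placeholders.

import time
import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed service endpoint
AUTH = ("USERNAME", "PASSWORD")                                 # placeholder service credentials

def wait_for_results(job_id, poll_seconds=10):
    # Poll the job until it completes, then return its results.
    while True:
        job = requests.get(BASE + "/v1/recognitions/" + job_id, auth=AUTH).json()
        if job["status"] == "completed":
            return job["results"]
        if job["status"] == "failed":
            raise RuntimeError("recognition job failed")
        time.sleep(poll_seconds)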

Note: This method is currently a beta release that supports US English only.

URI Parameters
HideShow
id
string (required) 

The ID of the job whose status is to be checked.


\ No newline at end of file +}

Checks the status of the specified asynchronous job
GET /speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

URI Parameters
id
string (required) 

The ID of the job whose status is to be checked.


\ No newline at end of file diff --git a/examples/slate_triple.html b/examples/slate_triple.html index 8c4fd27..5b78c4d 100644 --- a/examples/slate_triple.html +++ b/examples/slate_triple.html @@ -1,4 +1,4 @@ -Speech to Text

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.
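
For example, a request that opts out of data collection might set the header as in the following sketch with the Python requests library; the endpoint and credentials are placeholders.

import requests

response = requests.get(
    "https://stream.watsonplatform.net/speech-to-text/api/v1/models",  # assumed service endpoint
    auth=("USERNAME", "PASSWORD"),
    headers={"X-Watson-Learning-Opt-Out": "true"},  # do not log this request for service improvement
)
print(response.json())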

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

GET /speech-to-text/api/v1/models
Responses: 200, 406, 415

OK.

Schema
{
+Speech to Text

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

GET /speech-to-text/api/v1/models
Responses: 200, 406, 415

OK.

Schema
{
   "description": "Information about the available models.",
   "required": [
     "models"
@@ -184,7 +184,7 @@
       "type": "string"
     }
   }
-}

Retrieves information about the model
GET /speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.
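
A sketch of the request with the Python requests library, looking up the model identifier from GET /v1/models first; the endpoint and credentials are placeholders, and the field access shown is an assumption for illustration.

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed service endpoint
AUTH = ("USERNAME", "PASSWORD")                                 # placeholder service credentials

models = requests.get(BASE + "/v1/models", auth=AUTH).json()["models"]
model_id = models[0]["name"]  # model identifiers come from the output of GET /v1/models
model = requests.get(BASE + "/v1/models/" + model_id, auth=AUTH).json()
print(model)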

URI Parameters
HideShow
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.


sessions

POST /speech-to-text/api/v1/sessions
Requests
Schema
{
+}

Retrieves information about the model
GET /speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

URI Parameters
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.


sessions

POST /speech-to-text/api/v1/sessions
Requests
Schema
{
   "type": "string"
 }
Responses: 201, 406, 415, 503

Created.

Schema
{
   "required": [
@@ -279,7 +279,7 @@
       "type": "string"
     }
   }
-}

Creates a session
POST /speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.
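
A sketch with the Python requests library: a requests.Session object stores the cookie returned by the service so that later calls reuse it automatically. The endpoint and credentials are placeholders, and the session_id field name is an assumption based on the path parameter used by the session methods below.

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed service endpoint
http = requests.Session()                                       # keeps the service cookie between calls
http.auth = ("USERNAME", "PASSWORD")                            # placeholder service credentials

created = http.post(BASE + "/v1/sessions", params={"model": "en-US-BroadbandModel"}).json()
session_id = created["session_id"]  # assumed field name

# Reusing the same cookie keeps later requests on the same engine and refreshes the 30-second timer.
state = http.get(BASE + "/v1/sessions/" + session_id + "/recognize").json()
print(state)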

URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).


DELETE /speech-to-text/api/v1/sessions/session_id
Responses: 204, 400, 404, 406

No Content.

Bad Request. Cookie must be set.

Schema
{
+}

Creates a session
POST /speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.

URI Parameters
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).


DELETE /speech-to-text/api/v1/sessions/session_id
Responses: 204, 400, 404, 406

No Content.

Bad Request. Cookie must be set.

Schema
{
   "required": [
     "error",
     "code",
@@ -342,7 +342,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified session
DELETE /speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

URI Parameters
HideShow
session_id
string (required) 

The ID of the session to be deleted.


GET /speech-to-text/api/v1/sessions/session_id/observe_result
Responses: 200, 400, 404, 406, 408, 413, 415, 500

OK.

Schema
{
+}

Deletes the specified session
DELETE /speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

URI Parameters
session_id
string (required) 

The ID of the session to be deleted.


GET /speech-to-text/api/v1/sessions/session_id/observe_result
Responses: 200, 400, 404, 406, 408, 413, 415, 500

OK.

Schema
{
   "required": [
     "results",
     "result_index"
@@ -669,7 +669,7 @@
       "type": "boolean"
     }
   }
-}

Observes results for a recognition task within a session
GET /speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.
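
A sketch of observing interim results with the Python requests library; in practice this request runs in parallel with the POST recognize request that carries the matching sequence ID. The endpoint, credentials, and session ID are placeholders.

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed service endpoint
http = requests.Session()                                       # reuses the cookie from POST /v1/sessions
http.auth = ("USERNAME", "PASSWORD")                            # placeholder service credentials

with http.get(
    BASE + "/v1/sessions/SESSION_ID/observe_result",
    params={"sequence_id": 1, "interim_results": "true"},
    stream=True,                                                # read the stream of JSON objects as they arrive
) as response:
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))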

URI Parameters
HideShow
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.


GET /speech-to-text/api/v1/sessions/session_id/recognize
Responses: 200, 404, 406, 415

OK.

Schema
{
+}

Observes results for a recognition task within a session
GET /speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.

URI Parameters
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.


GET /speech-to-text/api/v1/sessions/session_id/recognize
Responses: 200, 404, 406, 415

OK.

Schema
{
   "required": [
     "session"
   ],
@@ -770,7 +770,7 @@
       "type": "string"
     }
   }
-}

Checks whether a session is ready to accept a new recognition task
GET /speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be `initialized`, which indicates that you can send another recognition request with the POST recognize method.

URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.


POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
Requests
Body
{
+}

Checks whether a session is ready to accept a new recognition task
GET /speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be `initialized`, which indicates that you can send another recognition request with the POST recognize method.

URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.


POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
Requests
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1127,7 +1127,7 @@
       "type": "string"
     }
   }
-}

Sends audio for speech recognition within a session
POST /speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”:-1}"
URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required)

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with `session_closed` set to `true`. Useful for stopping audio submission from a live microphone when a user simply walks away. Use `-1` for infinity. See also the `continuous` parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


sessionless

POST /speech-to-text/api/v1/recognize
Requests
Body
{
+}

Sends audio for speech recognition within a session
POST /speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”:-1}"
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required)

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with `session_closed` set to `true`. Useful for stopping audio submission from a live microphone when a user simply walks away. Use `-1` for infinity. See also the `continuous` parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


sessionless

POST /speech-to-text/api/v1/recognize
Requests
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1443,7 +1443,7 @@
       "type": "string"
     }
   }
-}

Sends audio for speech recognition in sessionless mode
POST /speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”=-1}"
URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


asynchronous

POST /speech-to-text/api/v1/register_callback
Requests
Schema
{
+}

Sends audio for speech recognition in sessionless mode
POST /speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”=-1}"
URI Parameters
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


asynchronous

POST /speech-to-text/api/v1/register_callback
Requests
Schema
{
   "type": "string"
 }
Responses: 200, 201, 400, 503

OK. The callback URL was already registered (white-listed). The status included in the response is `already created`.

Schema
{
   "required": [
@@ -1517,7 +1517,7 @@
       "type": "string"
     }
   }
-}

Registers a callback URL for use with the asynchronous interface
POST /speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

Note: This method is currently a beta release that supports US English only.

URI Parameters
HideShow
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.


GET /speech-to-text/api/v1/recognitions
Responses  200  503

OK.

Schema
{
+}

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

Note: This method is currently a beta release that supports US English only.

URI Parameters
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.
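
The signature scheme described above can be checked on the receiving end. The following is a minimal, hypothetical sketch (not part of the original reference) of how a callback endpoint might verify the X-Callback-Signature header against the user_secret it registered; the Base64 encoding of the digest is an assumption.

import base64
import hashlib
import hmac

USER_SECRET = b"my-user-secret"  # hypothetical: the secret sent with register_callback

def signature_matches(payload: bytes, header_value: str) -> bool:
    # Recompute the HMAC-SHA1 signature over the notification payload and compare it
    # with the value received in the X-Callback-Signature header.
    digest = hmac.new(USER_SECRET, payload, hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode("ascii")  # assumed encoding of the signature
    return hmac.compare_digest(expected, header_value)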


GET /speech-to-text/api/v1/recognitions
Responses  200  503

OK.

Schema
{
   "required": [
     "recognitions"
   ],
@@ -1643,7 +1643,7 @@
       "type": "string"
     }
   }
-}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Note: This method is currently a beta release that supports US English only.

URI Parameters
HideShow
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


DELETE /speech-to-text/api/v1/recognitions/id
Responses  204  404  503

No Content. The job was successfully deleted.

Not Found. The specified job ID was not found.

Schema
{
+}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Note: This method is currently a beta release that supports US English only.

URI Parameters
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
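
For orientation, here is a minimal, hypothetical sketch (not part of the original reference) of creating a job in polling mode with Python and the requests library; the base URL, credentials, audio file name, and basic authentication with the service credentials are all assumptions.

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # placeholder base URL
AUTH = ("service-username", "service-password")                # placeholder credentials

with open("audio.flac", "rb") as audio:                        # hypothetical audio file
    resp = requests.post(
        BASE + "/v1/recognitions",
        params={"timestamps": "true", "results_ttl": "60"},    # optional query parameters
        headers={"Content-Type": "audio/flac"},
        data=audio,
        auth=AUTH,
    )
resp.raise_for_status()
job_id = resp.json()["id"]                                     # assumed "id" field of the new job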


DELETE /speech-to-text/api/v1/recognitions/id
Responses  204  404  503

No Content. The job was successfully deleted.

Not Found. The specified job ID was not found.

Schema
{
   "required": [
     "error",
     "code",
@@ -1685,7 +1685,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

URI Parameters
HideShow
id
string (required) 

The ID of the job that is to be deleted.


GET /speech-to-text/api/v1/recognitions/id
Responses  200  404  503

OK.

Schema
{
+}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

URI Parameters
id
string (required) 

The ID of the job that is to be deleted.


GET /speech-to-text/api/v1/recognitions/id
Responses  200  404  503

OK.

Schema
{
   "required": [
     "status"
   ],
@@ -1906,4 +1906,4 @@
       "type": "string"
     }
   }
-}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

URI Parameters
HideShow
id
string (required) 

The ID of the job whose status is to be checked.


\ No newline at end of file +}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

URI Parameters
id
string (required) 

The ID of the job whose status is to be checked.
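
A correspondingly minimal polling loop, again a hypothetical sketch that continues the placeholder names from the job-creation example, might look like this:

import time
import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # placeholder base URL
AUTH = ("service-username", "service-password")                # placeholder credentials
job_id = "4bd734c0-e575-21f3-de03-f932aa0468a0"                # hypothetical job ID

while True:
    job = requests.get(BASE + "/v1/recognitions/" + job_id, auth=AUTH).json()
    if job["status"] in ("completed", "failed"):               # assumed status values
        break
    time.sleep(10)                                             # poll at a modest interval
print(job)                                                     # includes results when completed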


\ No newline at end of file diff --git a/examples/slate_wide.html b/examples/slate_wide.html index 8051dde..a1bff83 100644 --- a/examples/slate_wide.html +++ b/examples/slate_wide.html @@ -1,4 +1,4 @@ -Speech to Text Back to top

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

Retrieves the models available for the service
GET/speech-to-text/api/v1/models

Returns a list of all models available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models
Response  200
HideShow

OK.

Schema
{
+Speech to Text Back to top

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

Retrieves the models available for the service
GET/speech-to-text/api/v1/models

Returns a list of all models available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.
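
As a quick illustration, a hypothetical sketch using Python and the requests library (placeholder base URL and credentials) retrieves and inspects the model list as follows:

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # placeholder base URL
AUTH = ("service-username", "service-password")                # placeholder credentials

models = requests.get(BASE + "/v1/models", auth=AUTH).json()["models"]
for model in models:
    # "name" and "rate" are assumed from the model schema later in this document
    print(model["name"], model["rate"])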

Example URI

GET /speech-to-text/api/v1/models
Response  200
HideShow

OK.

Schema
{
   "description": "Information about the available models.",
   "required": [
     "models"
@@ -86,7 +86,7 @@
       "type": "string"
     }
   }
-}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models/model_id
URI Parameters
HideShow
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.

Response  200
HideShow

OK.

Schema
{
+}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models/model_id
URI Parameters
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.

Response  200
HideShow

OK.

Schema
{
   "required": [
     "name",
     "rate",
@@ -184,7 +184,7 @@
       "type": "string"
     }
   }
-}

sessions

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.

Example URI

POST /speech-to-text/api/v1/sessions
URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).

Request
HideShow
Schema
{
+}

sessions

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.
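
To keep the cookie handling straightforward, a client might rely on a cookie-aware HTTP session; the following hypothetical sketch (placeholder base URL and credentials, model name taken from this reference) illustrates the idea:

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # placeholder base URL
http = requests.Session()                                      # stores the set-cookie value automatically
http.auth = ("service-username", "service-password")           # placeholder credentials

created = http.post(BASE + "/v1/sessions", params={"model": "en-US-BroadbandModel"})
created.raise_for_status()
session = created.json()                                        # details of the newly created session
# Subsequent recognize and observe_result calls made through `http` reuse the same cookie,
# so they are routed to the same Speech to Text engine.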

Example URI

POST /speech-to-text/api/v1/sessions
URI Parameters
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).

Request
HideShow
Schema
{
   "type": "string"
 }
Response  201
HideShow

Created.

Schema
{
   "required": [
@@ -285,7 +285,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

Example URI

DELETE /speech-to-text/api/v1/sessions/session_id
URI Parameters
HideShow
session_id
string (required) 

The ID of the session to be deleted.

Response  204
HideShow

No Content.

Response  400
HideShow

Bad Request. Cookie must be set.

Schema
{
+}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

Example URI

DELETE /speech-to-text/api/v1/sessions/session_id
URI Parameters
session_id
string (required) 

The ID of the session to be deleted.

Response  204
HideShow

No Content.

Response  400
HideShow

Bad Request. Cookie must be set.

Schema
{
   "required": [
     "error",
     "code",
@@ -348,7 +348,7 @@
       "type": "string"
     }
   }
-}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/observe_result
URI Parameters
HideShow
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.

Response  200
HideShow

OK.

Schema
{
+}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.
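
Because interim results arrive as a stream of JSON objects, a client typically reads the observe_result response incrementally. A hypothetical sketch (placeholder base URL, credentials, session ID, and sequence ID) follows:

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # placeholder base URL
session_id = "0b5a1b3f"                                        # hypothetical session ID

resp = requests.get(
    BASE + "/v1/sessions/" + session_id + "/observe_result",
    auth=("service-username", "service-password"),             # placeholder credentials
    params={"sequence_id": "1", "interim_results": "true"},
    stream=True,                                                # read the response as it arrives
)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))                            # prints the streamed JSON incrementally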

Example URI

GET /speech-to-text/api/v1/sessions/session_id/observe_result
URI Parameters
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.

Response  200
HideShow

OK.

Schema
{
   "required": [
     "results",
     "result_index"
@@ -675,7 +675,7 @@
       "type": "boolean"
     }
   }
-}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized to indicate that you can send another recognition request with the POST recognize method.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/recognize
URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.

Response  200
HideShow

OK.

Schema
{
+}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized to indicate that you can send another recognition request with the POST recognize method.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/recognize
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

Response  200
HideShow

OK.

Schema
{
   "required": [
     "session"
   ],
@@ -776,7 +776,7 @@
       "type": "string"
     }
   }
-}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{"part_content_type":"audio/flac","data_parts_count":1,"continuous":true,"inactivity_timeout":-1}"

Example URI

POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
+}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{"part_content_type":"audio/flac","data_parts_count":1,"continuous":true,"inactivity_timeout":-1}"
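
To show how that metadata travels with the audio, here is a hypothetical multipart sketch in Python with the requests library (placeholder base URL, credentials, session ID, and file name); the part names metadata and upload follow the request body shown further below.

import json
import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # placeholder base URL
session_id = "0b5a1b3f"                                        # hypothetical session ID
metadata = {
    "part_content_type": "audio/flac",                         # required for multipart requests
    "data_parts_count": 1,
    "continuous": True,
    "inactivity_timeout": -1,
}

with open("audio.flac", "rb") as audio:                        # hypothetical audio file
    resp = requests.post(
        BASE + "/v1/sessions/" + session_id + "/recognize",
        auth=("service-username", "service-password"),         # placeholder credentials
        files={
            "metadata": (None, json.dumps(metadata), "application/json"),
            "upload": ("audio.flac", audio, "audio/flac"),
        },
    )
print(resp.json())                                             # final transcription results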

Example URI

POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1205,7 +1205,7 @@
       "type": "string"
     }
   }
-}

sessionless

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{"part_content_type":"audio/flac","data_parts_count":1,"continuous":true,"inactivity_timeout":-1}"

Example URI

POST /speech-to-text/api/v1/recognize
URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
+}

sessionless

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.
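
For the streaming mode described above, one possible sketch follows (assumptions: Python's requests library sends the body with Transfer-Encoding: chunked when it is given a generator; the host, credentials, chunk size, and file name are placeholders):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host

def audio_chunks(path, chunk_size=8192):
    """Yield the audio file in chunks so the body is sent with Transfer-Encoding: chunked."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

response = requests.post(
    BASE + "/v1/recognize",
    auth=("{username}", "{password}"),        # placeholder credentials
    headers={"Content-Type": "audio/flac"},
    params={"inactivity_timeout": 60},        # override the 30-second default if needed
    data=audio_chunks("live-audio.flac"),     # generator body -> chunked streaming
)
print(response.json())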

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”=-1}"

Example URI

POST /speech-to-text/api/v1/recognize
URI Parameters
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1584,7 +1584,7 @@
       "type": "string"
     }
   }
-}

asynchronous

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.
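
As a non-authoritative illustration of the signature check described above, a callback receiver might verify the header like this (the exact encoding of X-Callback-Signature is not stated here; a base64-encoded HMAC-SHA1 digest is assumed):

import base64
import hashlib
import hmac

def signature_matches(user_secret, payload_bytes, header_value):
    """Recompute the HMAC-SHA1 of the received payload (or challenge string) and
    compare it with the value carried in the X-Callback-Signature header."""
    digest = hmac.new(user_secret.encode("utf-8"), payload_bytes, hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode("utf-8")
    return hmac.compare_digest(expected, header_value)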

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/register_callback
URI Parameters
HideShow
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

Request
HideShow
Schema
{
+}

asynchronous

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.
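
A minimal Python sketch of the registration request itself (assuming your callback endpoint is already reachable and echoes the challenge string; the host, callback URL, credentials, and secret are placeholders):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host

response = requests.post(
    BASE + "/v1/register_callback",
    auth=("{username}", "{password}"),                       # placeholder credentials
    params={
        "callback_url": "https://example.com/stt-results",   # must echo the challenge string
        "user_secret": "my-shared-secret",                   # optional; enables signed notifications
    },
)
# 201 = newly white-listed, 200 = already white-listed, 400 = verification failed
print(response.status_code, response.json())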

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/register_callback
URI Parameters
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

Request
HideShow
Schema
{
   "type": "string"
 }
Response  200
HideShow

OK. The callback was already registered (white-listed). The status included in the response is already created.

Schema
{
   "required": [
@@ -1719,7 +1719,7 @@
       "type": "string"
     }
   }
-}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
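
A hedged sketch of a job creation request that uses callback notifications (the callback URL must already be white-listed; the host, credentials, and file name are placeholders):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host

with open("audio.flac", "rb") as audio:
    response = requests.post(
        BASE + "/v1/recognitions",
        auth=("{username}", "{password}"),                   # placeholder credentials
        headers={"Content-Type": "audio/flac"},
        params={
            "callback_url": "https://example.com/stt-results",
            "events": "recognitions.completed_with_results",
            "user_token": "job-42",                          # echoed back in each notification
            "results_ttl": 60,                               # keep results for 60 minutes
        },
        data=audio,
    )
job = response.json()
print(job)   # includes the job ID and its initial status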

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/recognitions?events=&user_token=&results_ttl=&model=&continuous=&inactivity_timeout=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
HideShow
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the `POST register_callback` method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the `user_token` query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Schema
{
+}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
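
A hedged sketch of the polling alternative described above: create the job without a callback URL, then check GET /v1/recognitions/{id} until it finishes (the host, credentials, file name, response field names, and polling interval are placeholders or assumptions):

import time
import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host
AUTH = ("{username}", "{password}")                            # placeholder credentials

with open("audio.flac", "rb") as audio:
    job = requests.post(
        BASE + "/v1/recognitions",
        auth=AUTH,
        headers={"Content-Type": "audio/flac"},
        data=audio,
    ).json()

# Poll until the job finishes; a completed response includes the recognition results.
while True:
    status = requests.get(BASE + "/v1/recognitions/" + job["id"], auth=AUTH).json()  # "id" field assumed
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)
print(status)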

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/recognitions?events=&user_token=&results_ttl=&model=&continuous=&inactivity_timeout=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the `POST register_callback` method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the `user_token` query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Schema
{
   "type": "array",
   "items": {
     "type": "string",
@@ -1802,7 +1802,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.
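
A one-line sketch of the corresponding request (the host, credentials, and job ID are placeholders):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host
job_id = "4bd734c0-e575-21f3-de03-f932aa0468a0"                # placeholder job ID

# 204 = job deleted, 404 = job ID not found
resp = requests.delete(BASE + "/v1/recognitions/" + job_id,
                       auth=("{username}", "{password}"))
print(resp.status_code)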

Note: This method is currently a beta release that supports US English only.

Example URI

DELETE /speech-to-text/api/v1/recognitions/id
URI Parameters
HideShow
id
string (required) 

The ID of the job that is to be deleted.

Response  204
HideShow

No Content. The job was successfully deleted.

Response  404
HideShow

Not Found. The specified job ID was not found.

Schema
{
+}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

Example URI

DELETE /speech-to-text/api/v1/recognitions/id
URI Parameters
id
string (required) 

The ID of the job that is to be deleted.

Response  204
HideShow

No Content. The job was successfully deleted.

Response  404
HideShow

Not Found. The specified job ID was not found.

Schema
{
   "required": [
     "error",
     "code",
@@ -1844,7 +1844,7 @@
       "type": "string"
     }
   }
-}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

Example URI

GET /speech-to-text/api/v1/recognitions/id
URI Parameters
HideShow
id
string (required) 

The ID of the job whose status is to be checked.

Response  200
HideShow

OK.

Schema
{
+}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

Example URI

GET /speech-to-text/api/v1/recognitions/id
URI Parameters
id
string (required) 

The ID of the job whose status is to be checked.

Response  200
HideShow

OK.

Schema
{
   "required": [
     "status"
   ],
diff --git a/examples/streak_triple.html b/examples/streak_triple.html
index 92bf642..6858040 100644
--- a/examples/streak_triple.html
+++ b/examples/streak_triple.html
@@ -1,4 +1,4 @@
-Speech to Text

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.
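
For the logging opt-out mentioned above, a hedged Python sketch of setting the header on a single request (the host and credentials are placeholders; the header must be repeated on every request you want excluded):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host

# Opt out of request logging for this call only.
resp = requests.get(
    BASE + "/v1/models",
    auth=("{username}", "{password}"),                 # placeholder credentials
    headers={"X-Watson-Learning-Opt-Out": "true"},
)
print(resp.json())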

models

GET /speech-to-text/api/v1/models
Responses  200 406 415

OK.

Schema
{
+Speech to Text

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

GET /speech-to-text/api/v1/models
Responses  200 406 415

OK.

Schema
{
   "description": "Information about the available models.",
   "required": [
     "models"
@@ -184,7 +184,7 @@
       "type": "string"
     }
   }
-}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

URI Parameters
HideShow
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.
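
A brief, hedged sketch of retrieving the information for one model (the host and credentials are placeholders; the model name comes from the output of GET /v1/models):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host

model = requests.get(
    BASE + "/v1/models/en-US-BroadbandModel",   # model name from GET /v1/models
    auth=("{username}", "{password}"),          # placeholder credentials
).json()
print(model)   # includes the model name and its minimum sampling rate, among other fields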


sessions

POST /speech-to-text/api/v1/sessions
Requests
Schema
{
+}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

URI Parameters
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.


sessions

POST /speech-to-text/api/v1/sessions
Requests
Schema
{
   "type": "string"
 }
Responses  201 406 415 503

Created.

Schema
{
   "required": [
@@ -279,7 +279,7 @@
       "type": "string"
     }
   }
-}

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.
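
A hedged sketch of creating a session and carrying its cookie on later calls, using a requests.Session object so the set-cookie value is reused automatically (the host, credentials, and response field name are assumptions):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host

http = requests.Session()                  # persists the session cookie across calls
http.auth = ("{username}", "{password}")   # placeholder credentials

created = http.post(BASE + "/v1/sessions",
                    params={"model": "en-US-BroadbandModel"}).json()
session_id = created["session_id"]         # assumed field name in the creation response
print(created)

# Touch the session before the 30-second inactivity timeout to keep it alive.
state = http.get(BASE + "/v1/sessions/" + session_id + "/recognize").json()
print(state)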

URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).


DELETE /speech-to-text/api/v1/sessions/session_id
Responses  204 400 404 406

No Content.

Bad Request. Cookie must be set.

Schema
{
+}

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.

URI Parameters
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).


DELETE /speech-to-text/api/v1/sessions/session_id
Responses  204 400 404 406

No Content.

Bad Request. Cookie must be set.

Schema
{
   "required": [
     "error",
     "code",
@@ -342,7 +342,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

URI Parameters
HideShow
session_id
string (required) 

The ID of the session to be deleted.


GET /speech-to-text/api/v1/sessions/session_id/observe_result
Responses  200 400 404 406 408 413 415 500

OK.

Schema
{
+}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

URI Parameters
session_id
string (required) 

The ID of the session to be deleted.


GET /speech-to-text/api/v1/sessions/session_id/observe_result
Responses  200 400 404 406 408 413 415 500

OK.

Schema
{
   "required": [
     "results",
     "result_index"
@@ -669,7 +669,7 @@
       "type": "boolean"
     }
   }
-}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.
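
A hedged sketch of observing interim results for a recognition task started with sequence_id=1 (assumptions: the request reuses the Session object that created the session so its cookie is sent, and the streamed JSON objects can be read line by line; the host, credentials, and session ID are placeholders):

import json
import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"  # assumed host

http = requests.Session()                  # reuse the object that issued POST /v1/sessions (cookie)
http.auth = ("{username}", "{password}")   # placeholder credentials
session_id = "{session_id}"                # from POST /v1/sessions

# Issue this GET before (or while) the matching POST recognize request runs.
resp = http.get(
    BASE + "/v1/sessions/" + session_id + "/observe_result",
    params={"sequence_id": 1, "interim_results": "true"},
    stream=True,                           # read the stream of JSON objects as they arrive
)
for line in resp.iter_lines():             # line-delimited parsing is an assumption
    if line:
        print(json.loads(line))            # each object is one SpeechRecognitionEvent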

URI Parameters
HideShow
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.


GET /speech-to-text/api/v1/sessions/session_id/recognize
Responses  200 404 406 415

OK.

Schema
{
+}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.

URI Parameters
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.


GET /speech-to-text/api/v1/sessions/session_id/recognize
Responses  200 404 406 415

OK.

Schema
{
   "required": [
     "session"
   ],
@@ -770,7 +770,7 @@
       "type": "string"
     }
   }
-}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be `initialized` to indicate that you can send another recognition request with the POST recognize method.

URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.


POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
Requests
Body
{
+}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be `initialized` to indicate that you can send another recognition request with the POST recognize method.

URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.


POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
Requests
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1127,7 +1127,7 @@
       "type": "string"
     }
   }
-}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”:-1}"
URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with `session_closed` set to `true`. Useful for stopping audio submission from a live microphone when a user simply walks away. Use `-1` for infinity. See also the `continuous` parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


sessionless

POST /speech-to-text/api/v1/recognize
Requests
Body
{
+}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”:-1}"
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


sessionless

POST /speech-to-text/api/v1/recognize
Requests
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1443,7 +1443,7 @@
       "type": "string"
     }
   }
-}

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.
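
For illustration, a minimal sketch of streaming audio to this method in chunks with Python and the requests library follows; passing a generator as the request body causes requests to send the data with Transfer-Encoding: chunked. The host URL, credentials, and file name are placeholders.

import requests

def audio_chunks(path, chunk_size=8192):
    # Yield the audio a piece at a time, for example as it is captured or read.
    with open(path, "rb") as audio_file:
        while True:
            chunk = audio_file.read(chunk_size)
            if not chunk:
                break
            yield chunk

response = requests.post(
    "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize",
    auth=("username", "password"),            # service credentials
    headers={"Content-Type": "audio/flac"},   # format of the streamed audio
    data=audio_chunks("audio.flac"),          # generator body => chunked streaming
)
print(response.json())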

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”=-1}"
URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


asynchronous

POST /speech-to-text/api/v1/register_callback
Requests
Schema
{
+}

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”=-1}"
URI Parameters
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


asynchronous

POST /speech-to-text/api/v1/register_callback
Requests
Schema
{
   "type": "string"
 }
Responses  200  201  400  503

OK. The callback URL was already registered (white-listed). The status included in the response is “already created”.

Schema
{
   "required": [
@@ -1517,7 +1517,7 @@
       "type": "string"
     }
   }
-}

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.
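
For illustration, a minimal sketch of verifying such a signature on an incoming notification follows, assuming the signature is the base64-encoded HMAC-SHA1 of the request body computed with the user secret; all names and values are placeholders.

import base64
import hashlib
import hmac

def signature_matches(payload, header_value, user_secret):
    # Recompute the HMAC-SHA1 of the notification payload with the user secret
    # and compare it to the X-Callback-Signature header value.
    digest = hmac.new(user_secret.encode("utf-8"), payload, hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode("ascii")
    return hmac.compare_digest(expected, header_value)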

Note: This method is currently a beta release that supports US English only.

URI Parameters
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.


GET /speech-to-text/api/v1/recognitions
Responses  200  503

OK.

Schema
{
+}

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

Note: This method is currently a beta release that supports US English only.

URI Parameters
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.


GET /speech-to-text/api/v1/recognitions
Responses  200  503

OK.

Schema
{
   "required": [
     "recognitions"
   ],
@@ -1643,7 +1643,7 @@
       "type": "string"
     }
   }
-}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Note: This method is currently a beta release that supports US English only.
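
For illustration, a minimal sketch of creating a job that reports to a previously registered callback URL follows, using Python and the requests library; the host URL, credentials, callback URL, token, and file name are placeholders.

import requests

with open("audio.flac", "rb") as audio_file:
    response = requests.post(
        "https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions",
        auth=("username", "password"),
        headers={"Content-Type": "audio/flac"},
        params={
            "callback_url": "https://example.com/stt-results",  # must already be white-listed
            "events": "recognitions.completed_with_results",    # include results in the notification
            "user_token": "job-42",                              # echoed back with each notification
            "results_ttl": "120",                                # keep results for two hours
        },
        data=audio_file,
    )
print(response.json())  # contains the job id and its initial status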

URI Parameters
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are:

  • recognitions.started generates a callback notification when the service begins to process the job.

  • recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted.

  • recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request.

  • recognitions.failed generates a callback notification if the service experiences an error while processing the job.

Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US_BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


DELETE /speech-to-text/api/v1/recognitions/id
Responses  204  404  503

No Content. The job was successfully deleted.

Not Found. The specified job ID was not found.

Schema
{
+}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Note: This method is currently a beta release that supports US English only.

URI Parameters
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are:

  • recognitions.started generates a callback notification when the service begins to process the job.

  • recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted.

  • recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request.

  • recognitions.failed generates a callback notification if the service experiences an error while processing the job.

Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US_BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.


DELETE /speech-to-text/api/v1/recognitions/id
Responses  204  404  503

No Content. The job was successfully deleted.

Not Found. The specified job ID was not found.

Schema
{
   "required": [
     "error",
     "code",
@@ -1685,7 +1685,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

URI Parameters
id
string (required) 

The ID of the job that is to be deleted.


GET /speech-to-text/api/v1/recognitions/id
Responses  200  404  503

OK.

Schema
{
+}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

URI Parameters
id
string (required) 

The ID of the job that is to be deleted.


GET /speech-to-text/api/v1/recognitions/id
Responses  200  404  503

OK.

Schema
{
   "required": [
     "status"
   ],
@@ -1906,4 +1906,4 @@
       "type": "string"
     }
   }
-}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.
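
For illustration, a minimal sketch of polling a job until it finishes follows, using Python and the requests library; the host URL, credentials, and job ID are placeholders, and the terminal status values are assumed to be completed and failed.

import time
import requests

job_url = ("https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions/"
           + "YOUR_JOB_ID")
while True:
    job = requests.get(job_url, auth=("username", "password")).json()
    if job["status"] in ("completed", "failed"):
        break
    time.sleep(10)  # wait before checking the status again
print(job)  # a completed job includes the recognition results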

URI Parameters
id
string (required) 

The ID of the job whose status is to be checked.


\ No newline at end of file +}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

URI Parameters
id
string (required) 

The ID of the job whose status is to be checked.


\ No newline at end of file diff --git a/examples/streak_wide.html b/examples/streak_wide.html index d3571ab..3042640 100644 --- a/examples/streak_wide.html +++ b/examples/streak_wide.html @@ -1,4 +1,4 @@ -Speech to Text Back to top

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

Retrieves the models available for the service
GET/speech-to-text/api/v1/models

Returns a list of all models available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models
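
For illustration, a minimal sketch of listing the models with Python and the requests library follows; the host URL and credentials are placeholders.

import requests

response = requests.get(
    "https://stream.watsonplatform.net/speech-to-text/api/v1/models",
    auth=("username", "password"),
)
for model in response.json()["models"]:
    print(model["name"], model["rate"])  # model identifier and minimum sampling rate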
Response  200

OK.

Schema
{
+Speech to Text Back to top

Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that provide a mechanism for a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

Retrieves the models available for the service
GET/speech-to-text/api/v1/models

Returns a list of all models available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models
Response  200

OK.

Schema
{
   "description": "Information about the available models.",
   "required": [
     "models"
@@ -86,7 +86,7 @@
       "type": "string"
     }
   }
-}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models/model_id
URI Parameters
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.

Response  200

OK.

Schema
{
+}

Retrieves information about the model
GET/speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models/model_id
URI Parameters
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.

Response  200

OK.

Schema
{
   "required": [
     "name",
     "rate",
@@ -184,7 +184,7 @@
       "type": "string"
     }
   }
-}

sessions

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.

Example URI

POST /speech-to-text/api/v1/sessions
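
For illustration, a minimal sketch of creating a session and reusing its cookie on later requests follows, using Python and the requests library; the host URL, credentials, and model identifier are placeholders.

import requests

service = requests.Session()             # keeps the set-cookie value for later requests
service.auth = ("username", "password")  # service credentials

created = service.post(
    "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions",
    params={"model": "en-US_BroadbandModel"},  # an identifier from GET /v1/models
)
print(created.json())  # includes the session_id used by the session methods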
URI Parameters
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).

Request
Schema
{
+}

sessions

Creates a session
POST/speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. Use the cookie that is returned from this operation in the set-cookie header for each request that uses this session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.

Example URI

POST /speech-to-text/api/v1/sessions
URI Parameters
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).

Request
Schema
{
   "type": "string"
 }
Response  201

Created.

Schema
{
   "required": [
@@ -285,7 +285,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

Example URI

DELETE /speech-to-text/api/v1/sessions/session_id
URI Parameters
session_id
string (required) 

The ID of the session to be deleted.

Response  204

No Content.

Response  400

Bad Request. Cookie must be set.

Schema
{
+}

Deletes the specified session
DELETE/speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

Example URI

DELETE /speech-to-text/api/v1/sessions/session_id
URI Parameters
session_id
string (required) 

The ID of the session to be deleted.

Response  204

No Content.

Response  400

Bad Request. Cookie must be set.

Schema
{
   "required": [
     "error",
     "code",
@@ -348,7 +348,7 @@
       "type": "string"
     }
   }
-}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.
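
For illustration, a minimal sketch of pairing an observe_result request with a recognition request follows, using Python and the requests library; the observe_result GET is started on a separate thread with a matching sequence_id before the POST recognize call completes. The host URL, credentials, session ID, and file name are placeholders.

import threading
import requests

base = ("https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/"
        + "YOUR_SESSION_ID")
auth = ("username", "password")

def observe():
    # Stream interim results for the recognition task with sequence_id=1.
    with requests.get(
        base + "/observe_result",
        auth=auth,
        params={"sequence_id": "1", "interim_results": "true"},
        stream=True,
    ) as results:
        for line in results.iter_lines():
            if line:
                print(line.decode("utf-8"))

watcher = threading.Thread(target=observe)
watcher.start()

with open("audio.flac", "rb") as audio_file:
    requests.post(
        base + "/recognize",
        auth=auth,
        headers={"Content-Type": "audio/flac"},
        params={"sequence_id": "1"},
        data=audio_file,
    )
watcher.join()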

Example URI

GET /speech-to-text/api/v1/sessions/session_id/observe_result
URI Parameters
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.

Response  200

OK.

Schema
{
+}

Observes results for a recognition task within a session
GET/speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/observe_result
URI Parameters
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.

Response  200

OK.

Schema
{
   "required": [
     "results",
     "result_index"
@@ -675,7 +675,7 @@
       "type": "boolean"
     }
   }
-}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized to indicate that you can send another recognition request with the POST recognize method.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/recognize
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

Response  200

OK.

Schema
{
+}

Checks whether a session is ready to accept a new recognition task
GET/speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized to indicate that you can send another recognition request with the POST recognize method.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/recognize
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

Response  200

OK.

Schema
{
   "required": [
     "session"
   ],
@@ -776,7 +776,7 @@
       "type": "string"
     }
   }
-}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”:-1}"

Example URI

POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
HideShow
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
+}

Sends audio for speech recognition within a session
POST/speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”:-1}"
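As a hedged illustration (not something this patch adds), a multipart session-based request could be assembled roughly as follows in Python; the host, session ID, credentials, and audio file name are placeholders, and the metadata and upload field names follow the request body example shown below.

# Sketch only: multipart POST to the session-based recognize method.
import json
import requests

HOST = "https://<service-host>"          # placeholder service host
SESSION_ID = "<session_id>"              # placeholder session ID
url = HOST + "/speech-to-text/api/v1/sessions/" + SESSION_ID + "/recognize"

# JSON metadata for the first part; only part_content_type is required.
metadata = {
    "part_content_type": "audio/flac",
    "data_parts_count": 1,
    "continuous": True,
    "inactivity_timeout": -1,
}

# The metadata part is followed by one audio file, matching data_parts_count.
files = [
    ("metadata", (None, json.dumps(metadata), "application/json")),
    ("upload", ("audio-file1.flac", open("audio-file1.flac", "rb"), "audio/flac")),
]

resp = requests.post(url, files=files, auth=("<username>", "<password>"))
print(resp.json())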

Example URI

POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1205,7 +1205,7 @@
       "type": "string"
     }
   }
-}

sessionless

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”=-1}"

Example URI

POST /speech-to-text/api/v1/recognize
URI Parameters
HideShow
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
+}

sessionless

Sends audio for speech recognition in sessionless mode
POST/speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.
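As a rough sketch under assumptions (not part of this patch), live or multi-chunk audio could be streamed by passing a generator as the request body, which makes the HTTP client send the body with Transfer-Encoding: chunked; the host, credentials, and file name are placeholders.

# Sketch only: chunked sessionless recognition request.
import requests

HOST = "https://<service-host>"          # placeholder service host
url = HOST + "/speech-to-text/api/v1/recognize"

def audio_chunks(path, chunk_size=8192):
    # Yield the audio in chunks; a generator body is sent chunked.
    with open(path, "rb") as audio:
        while True:
            chunk = audio.read(chunk_size)
            if not chunk:
                break
            yield chunk

resp = requests.post(
    url,
    data=audio_chunks("audio-file1.flac"),
    headers={"Content-Type": "audio/flac"},
    auth=("<username>", "<password>"),
)
print(resp.json())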

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”=-1}"

Example URI

POST /speech-to-text/api/v1/recognize
URI Parameters
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Body
{
   "metadata": "Hello, world!",
   "upload": "Hello, world!"
 }
Schema
{
@@ -1584,7 +1584,7 @@
       "type": "string"
     }
   }
-}

asynchronous

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/register_callback
URI Parameters
HideShow
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

Request
HideShow
Schema
{
+}

asynchronous

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.
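A callback receiver could verify that signature along these lines; this is a hedged sketch, not part of the patch, and it assumes the X-Callback-Signature header carries a base64-encoded HMAC-SHA1 of the raw notification body keyed with the user secret.

# Sketch only: verify a callback notification signature.
import base64
import hashlib
import hmac

def signature_matches(user_secret, payload_bytes, signature_header):
    # Recompute the HMAC-SHA1 over the raw payload and compare it with the
    # header value (assumed to be base64-encoded).
    digest = hmac.new(user_secret.encode("utf-8"), payload_bytes, hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode("ascii")
    return hmac.compare_digest(expected, signature_header)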

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/register_callback
URI Parameters
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

Request
HideShow
Schema
{
   "type": "string"
 }
Response  200
HideShow

OK. The callback was already registered (white-listed). The status included in the response is already created.

Schema
{
   "required": [
@@ -1719,7 +1719,7 @@
       "type": "string"
     }
   }
-}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/recognitions?events=&user_token=&results_ttl=&model=&continuous=&inactivity_timeout=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
HideShow
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Schema
{
+}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Note: This method is currently a beta release that supports US English only.
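As a hedged end-to-end sketch (not introduced by this patch), a job could be created without a callback URL and then polled until completion; the host, credentials, audio file, and the id and results field names below are assumptions about the response payload.

# Sketch only: create an asynchronous recognition job and poll for its results.
import time
import requests

HOST = "https://<service-host>"          # placeholder service host
AUTH = ("<username>", "<password>")      # placeholder credentials

with open("audio-file1.flac", "rb") as audio:
    create = requests.post(
        HOST + "/speech-to-text/api/v1/recognitions",
        data=audio,
        headers={"Content-Type": "audio/flac"},
        auth=AUTH,
    )
job_id = create.json()["id"]             # assumed field name in the creation response

# Poll GET recognitions/{id}; when the status is "completed" the same
# response is assumed to carry the recognition results.
while True:
    body = requests.get(HOST + "/speech-to-text/api/v1/recognitions/" + job_id, auth=AUTH).json()
    if body.get("status") == "completed":
        print(body.get("results"))
        break
    time.sleep(10)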

Example URI

POST /speech-to-text/api/v1/recognitions?events=&user_token=&results_ttl=&model=&continuous=&inactivity_timeout=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.

Request
HideShow
Schema
{
   "type": "array",
   "items": {
     "type": "string",
@@ -1802,7 +1802,7 @@
       "type": "string"
     }
   }
-}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

Example URI

DELETE /speech-to-text/api/v1/recognitions/id
URI Parameters
HideShow
id
string (required) 

The ID of the job that is to be deleted.

Response  204
HideShow

No Content. The job was successfully deleted.

Response  404
HideShow

Not Found. The specified job ID was not found.

Schema
{
+}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Note: This method is currently a beta release that supports US English only.

Example URI

DELETE /speech-to-text/api/v1/recognitions/id
URI Parameters
id
string (required) 

The ID of the job that is to be deleted.

Response  204
HideShow

No Content. The job was successfully deleted.

Response  404
HideShow

Not Found. The specified job ID was not found.

Schema
{
   "required": [
     "error",
     "code",
@@ -1844,7 +1844,7 @@
       "type": "string"
     }
   }
-}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

Example URI

GET /speech-to-text/api/v1/recognitions/id
URI Parameters
HideShow
id
string (required) 

The ID of the job whose status is to be checked.

Response  200
HideShow

OK.

Schema
{
+}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.

Note: This method is currently a beta release that supports US English only.

Example URI

GET /speech-to-text/api/v1/recognitions/id
URI Parameters
id
string (required) 

The ID of the job whose status is to be checked.

Response  200
HideShow

OK.

Schema
{
   "required": [
     "status"
   ],
diff --git a/templates/index.jade b/templates/index.jade
index 38ccb06..6d280db 100644
--- a/templates/index.jade
+++ b/templates/index.jade
@@ -9,12 +9,22 @@ html
         link(rel="stylesheet", href="https://maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css")
         style!= self.css
         style.
+          .heading a {
+            font-size: 1.2em;
+          }
           p img {
             width: 100%;
           }
           dd p {
             margin: 0.5em 0em;
           }
+          p {
+            line-height: 150%;
+          }
+          code {
+            padding: 0em 0.25em;
+            margin: 0em;
+          }
     body.preload
         a.text-muted.back-to-top(href='#top')
             i.fa.fa-toggle-up
diff --git a/templates/mixins.jade b/templates/mixins.jade
index 23432a8..7cd2aaf 100644
--- a/templates/mixins.jade
+++ b/templates/mixins.jade
@@ -67,10 +67,7 @@ mixin Parameters(params)
     //- examples and descriptions.
     .title
         strong URI Parameters
-        .collapse-button.show
-            span.close Hide
-            span.open Show
-    .collapse-content(style="padding-bottom: 1em")
+    div
         dl.inner: each param in params || []
             dt(style="padding: 0.2em 0em")= self.urldec(param.name)
             dd(style="padding: 0.2em 0em")
diff --git a/templates/triple.jade b/templates/triple.jade
index 66085bc..f6d497e 100644
--- a/templates/triple.jade
+++ b/templates/triple.jade
@@ -9,12 +9,22 @@ html
         link(rel="stylesheet", href="https://maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css")
         style!= self.css
         style.
+          .heading a {
+            font-size: 1.2em;
+          }
           p img {
             width: 100%;
           }
           dd p {
             margin: 0.5em 0em;
           }
+          p {
+            line-height: 150%;
+          }
+          code {
+            padding: 0em 0.25em;
+            margin: 0em;
+          }
     body.preload
         #nav-background
         div.container-fluid.triple