IBM Watson Speech-to-Text Open Microservice

Transcribe an audio file of human speech to text.

Introduction

This project is an example implementation of the Open Microservice Specification, a standard originally created at Storyscript for building highly-portable "microservices" that expose the events, actions, and APIs inside containerized software.

Getting Started

The oms command-line interface allows you to interact with Open Microservices. If you're interested in creating an Open Microservice, the CLI also helps you validate, test, and debug your oms.yml implementation!

See the oms-cli project to learn more!

Installation

npm install -g @microservices/oms
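After a global npm install, it can help to confirm that the executable is actually reachable from your shell. The helper below is a minimal sketch, not part of the oms-cli; the function name check_oms_installed is made up for illustration, and it only assumes the package installs an executable named oms.

```shell
#!/bin/sh
# Hypothetical helper: report whether an `oms` executable is on PATH.
check_oms_installed() {
    if command -v oms >/dev/null 2>&1; then
        echo "oms found at $(command -v oms)"
    else
        echo "oms not found; check that npm's global bin directory is on PATH" >&2
        return 1
    fi
}
```

Call check_oms_installed after the install step; a non-zero exit status means the shell cannot find the CLI.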

Usage

Open Microservices CLI Usage

Once you have the oms-cli installed, you can run any of the following commands from within this project's root directory:

Actions

transcribe

Transcribe speech in an audio file to text.

Action Arguments

Argument Name	Type	Required	Default	Description
url	`string`	`true`	None	A URL to an audio file to transcribe.
contentType	`string`	`false`	None	The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the IBM docs: https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-audio-formats#audio-formats
model	`enum`	`false`	None	The identifier of the model that is to be used for the recognition request.
speakerLabels	`boolean`	`false`	None	If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speakerLabels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
profanityFilter	`boolean`	`false`	None	If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.
smartFormatting	`boolean`	`false`	None	If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting. Note: Applies to US English, Japanese, and Spanish transcription only.
timestamps	`boolean`	`false`	None	If true, the service returns time alignment for each word. By default, no timestamps are returned.
audioMetrics	`boolean`	`false`	None	If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.
redaction	`boolean`	`false`	None	If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction. When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1). Note: Applies to US English, Japanese, and Korean transcription only.
API_KEY	`string`	`true`	None	An IBM Cloud API key. To obtain one, go to the Speech to Text page in the IBM Cloud Catalog: https://cloud.ibm.com/catalog/services/speech-to-text
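To make the redaction rule concrete, here is a small local illustration of "replace each digit with an X in any run of three or more consecutive digits". It is only a sketch of the rule as stated in the table, not the service's implementation; the function name redact_digits is hypothetical, and the real service also applies smart formatting before redacting.

```shell
#!/bin/sh
# Hypothetical local illustration of the stated redaction rule.
# Reads text on stdin and masks every run of 3+ digits with X characters.
redact_digits() {
    awk '{
        out = ""
        # Find each run of three or more digits, left to right.
        while (match($0, /[0-9][0-9][0-9]+/)) {
            run = substr($0, RSTART, RLENGTH)
            gsub(/[0-9]/, "X", run)            # mask every digit in the run
            out = out substr($0, 1, RSTART - 1) run
            $0 = substr($0, RSTART + RLENGTH)  # continue after the run
        }
        print out $0
    }'
}
```

For example, piping the text "card 4111 ext 42" through redact_digits masks the four-digit number and leaves the two-digit one unchanged.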

oms run transcribe \
    -a url='*****' \
    -a contentType='*****' \
    -a model='*****' \
    -a speakerLabels='*****' \
    -a profanityFilter='*****' \
    -a smartFormatting='*****' \
    -a timestamps='*****' \
    -a audioMetrics='*****' \
    -a redaction='*****' \
    -e API_KEY="$API_KEY"
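As a more concrete sketch, the wrapper below fills in sample values and checks that API_KEY is set before calling the CLI. Several things here are assumptions rather than documented behavior of this service: the URL is a placeholder, audio/flac and en-US_BroadbandModel are values taken from the IBM Speech to Text documentation and may not match the model enum in this project's oms.yml, and it is assumed that arguments marked as not required can simply be left out. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: fail early with a clear message if API_KEY is missing.
require_api_key() {
    if [ -z "${API_KEY:-}" ]; then
        echo "API_KEY is not set; export it before running oms" >&2
        return 1
    fi
}

# Hypothetical wrapper: run the transcribe action with the required url
# argument plus a few optional ones. The values are placeholders or
# assumptions (see the paragraph above); omitted optional arguments are
# assumed to fall back to the service defaults.
transcribe_sample() {
    require_api_key || return 1
    oms run transcribe \
        -a url='https://example.com/sample.flac' \
        -a contentType='audio/flac' \
        -a model='en-US_BroadbandModel' \
        -a timestamps='true' \
        -e API_KEY="$API_KEY"
}
```

With API_KEY exported, call transcribe_sample from this project's root directory. Nothing runs until the function is called, so the block is safe to source and review first.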

Contributing

All suggestions on how to improve the specification and this guide are very welcome. Feel free to share your thoughts in the Issue tracker, or even better, fork the repository to implement your own ideas and submit a pull request.

This project is guided by the Contributor Covenant. Please read our full Contribution Guidelines.

Additional Resources

Install the CLI - The OMS CLI helps developers create, test, validate, and build microservices.
Example OMS Services - Examples of OMS-compliant services written in a variety of languages.
Example Language Implementations - Find tooling & language implementations in Node, Python, Scala, Java, Clojure.
Storyscript Hub - A public registry of OMS services.
Community Chat - Have ideas? Questions? Join us on Spectrum.

