An Open Microservice for IBM's Watson Speech-to-Text API.

omsable/watson-speech

IBM Watson Speech-to-Text Open Microservice

Transcribe an audio file of human speech to text.

Introduction

This project is an example implementation of the Open Microservice Specification, a standard originally created at Storyscript for building highly-portable "microservices" that expose the events, actions, and APIs inside containerized software.

Getting Started

The oms command-line interface allows you to interact with Open Microservices. If you're interested in creating an Open Microservice, the CLI also helps you validate, test, and debug your oms.yml implementation.

See the oms-cli project to learn more!

Installation

npm install -g @microservices/oms

Usage

Once you have the oms-cli installed, you can run any of the following commands from within this project's root directory:

Actions

transcribe

Transcribe speech in an audio file to text.

Action Arguments

| Argument Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | true | None | A URL to an audio file to transcribe. |
| contentType | string | false | None | The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the IBM docs: https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-audio-formats#audio-formats |
| model | enum | false | None | The identifier of the model that is to be used for the recognition request. |
| speakerLabels | boolean | false | None | If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter. |
| profanityFilter | boolean | false | None | If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only. |
| smartFormatting | boolean | false | None | If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting. Note: Applies to US English, Japanese, and Spanish transcription only. |
| timestamps | boolean | false | None | If true, the service returns time alignment for each word. By default, no timestamps are returned. |
| audioMetrics | boolean | false | None | If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics. |
| redaction | boolean | false | None | If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction. When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1). Note: Applies to US English, Japanese, and Korean transcription only. |
| API_KEY | string | true | None | An IBM Cloud API key. To obtain one, go to the Speech to Text page in the IBM Cloud Catalog: https://cloud.ibm.com/catalog/services/speech-to-text |

oms run transcribe \
    -a url='*****' \
    -a contentType='*****' \
    -a model='*****' \
    -a speakerLabels='*****' \
    -a profanityFilter='*****' \
    -a smartFormatting='*****' \
    -a timestamps='*****' \
    -a audioMetrics='*****' \
    -a redaction='*****' \
    -e API_KEY=$API_KEY
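
Only url and API_KEY are required, so most invocations will pass just a few of the arguments above. The sketch below shows one way to assemble such a command. The helper build_transcribe_cmd is hypothetical and not part of oms-cli, and the example.com URL and audio/flac content type are placeholder values chosen for illustration. It only prints the command, so you can inspect it before running it.

```shell
# Hypothetical helper (not part of oms-cli): prints an `oms run transcribe`
# command line that contains only the arguments actually supplied.
# Usage: build_transcribe_cmd URL [name=value ...]
build_transcribe_cmd() {
    url=$1
    shift
    cmd="oms run transcribe -a url='$url'"
    for pair in "$@"; do
        # Split each name=value pair at the first '=' and single-quote the value.
        cmd="$cmd -a ${pair%%=*}='${pair#*=}'"
    done
    # API_KEY is passed as an environment variable (-e), not as an argument (-a).
    # The escaped dollar sign keeps the key itself out of the printed command.
    printf '%s\n' "$cmd -e API_KEY=\$API_KEY"
}

# Placeholder values: substitute the URL and format of your own audio file.
build_transcribe_cmd 'https://example.com/sample.flac' \
    contentType=audio/flac timestamps=true
```

Because the key is referenced as $API_KEY rather than expanded, the printed command can be pasted into a shell where API_KEY is already exported without exposing the key in logs or shell history.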

Contributing

All suggestions on how to improve the specification and this guide are very welcome. Feel free to share your thoughts in the issue tracker or, even better, fork the repository to implement your own ideas and submit a pull request.

This project is guided by the Contributor Covenant. Please read our full Contribution Guidelines.
