From 4f0da447c3eb2810404e02ac8abb3965d629a7d3 Mon Sep 17 00:00:00 2001
From: jrobble
Date: Wed, 20 Dec 2023 20:45:22 -0500
Subject: [PATCH] Update Whisper and Argos component README.

---
 python/ArgosTranslation/README.md       |  2 +-
 python/WhisperSpeechDetection/README.md | 28 ++++++++++++++-----------
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/python/ArgosTranslation/README.md b/python/ArgosTranslation/README.md
index 6d52839f..080f6d3f 100644
--- a/python/ArgosTranslation/README.md
+++ b/python/ArgosTranslation/README.md
@@ -1,6 +1,6 @@
 # Overview
 
-This repository contains source code for the OpenMPF Argos Translation Component.
+This repository contains source code for the OpenMPF Argos Translation Component. This component is based on [Argos Translate](https://github.com/argosopentech/argos-translate).
 
 This component translates the input text from a given source language to English. The source language can be provided as a job property, or be indicated in the detection properties from a feed-forward track.
 
diff --git a/python/WhisperSpeechDetection/README.md b/python/WhisperSpeechDetection/README.md
index 08fa5891..c42d7cdc 100644
--- a/python/WhisperSpeechDetection/README.md
+++ b/python/WhisperSpeechDetection/README.md
@@ -1,21 +1,24 @@
 # Overview
 
-This repository contains source code and model data for the OpenMPF Whisper Speech Detection component.
-This component uses the OpenAI Whisper model.
+This repository contains source code and model data for the OpenMPF Whisper Speech Detection component. This component
+is based on [OpenAI Whisper](https://github.com/openai/whisper).
 
 # Introduction
 
-This component identifies the language spoken in audio and video clips.
+This component identifies the language spoken in audio and video clips, can perform speech-to-text on the audio, and can
+perform translation directly from the audio.
 
 # Input Properties
-- `WHISPER_MODEL_SIZE`: Size of the Whisper model. Whisper has `tiny`, `base`, `small`, `medium`, and `large` models available for multilingual models. English-only models are available in `tiny`, `base`, `small`, and `medium`.
-- `WHISPER_MODEL_LANG`: Whisper has English-only models and multilingual models. Set to `en` for English-only models and `multi` for multilingual models.
-- `WHISPER_MODE`: Determines whether Whisper will perform language detection, speech-to-text
-  transcription, or speech translation. If multiple languages are spoken in a single piece of media,
-  language detection will detect only one of them. English-only models can only transcribe English
-  audio. Set to `LANGUAGE_DETECTION` for spoken language detection, `TRANSCRIPTION` for
-  speech-to-text transcription, and `SPEECH_TRANSLATION` for speech translation.
-- `AUDIO_LANGUAGE`: Optional property that indicates the language to use for audio translation or transcription. If left as an empty string, Whisper will automatically detect a single language from the first 30 seconds of audio.
+- `WHISPER_MODEL_SIZE`: Size of the Whisper model. Whisper has `tiny`, `base`, `small`, `medium`, and `large` models
+  available for multilingual models. English-only models are available in `tiny`, `base`, `small`, and `medium`.
+- `WHISPER_MODEL_LANG`: Whisper has English-only models and multilingual models. Set to `en` for English-only models and
+  `multi` for multilingual models.
+- `WHISPER_MODE`: Determines whether Whisper will perform language detection, speech-to-text transcription, or speech
+  translation. If multiple languages are spoken in a single piece of media, language detection will detect only one of
+  them. English-only models can only transcribe English audio. Set to `LANGUAGE_DETECTION` for spoken language
+  detection, `TRANSCRIPTION` for speech-to-text transcription, and `SPEECH_TRANSLATION` for speech translation.
+- `AUDIO_LANGUAGE`: Optional property that indicates the language to use for audio translation or transcription. If left
+  as an empty string, Whisper will automatically detect a single language from the first 30 seconds of audio.
 
 # Output Properties
 - `DETECTED_LANGUAGE`: Language with the highest confidence value.
@@ -49,7 +52,8 @@ large | English | Correctly translated | Mostly skipped
 See [whisper_behavior_notes.md](whisper_behavior_notes.md) for more details.
 
 # Language Identifiers
-The following are the ISO 639-1 codes, the ISO 639-3 codes, and their corresponding languages which Whisper can translate to English.
+The following are the ISO 639-1 codes, the ISO 639-3 codes, and their corresponding languages which Whisper can
+translate to English.
 All translations are to English.