This repository has been archived by the owner on May 6, 2022. It is now read-only.

Merge pull request #36 from spokestack/jz-profiles
Feature: SpeechPipeline.Builder profiles
space-pope committed Jan 21, 2020
2 parents fb1510a + 37447ea commit 893bd17
Showing 11 changed files with 447 additions and 41 deletions.
53 changes: 12 additions & 41 deletions README.md
@@ -22,73 +22,44 @@ than is in this brief introduction.

```java
SpeechPipeline pipeline = new SpeechPipeline.Builder()
    .useProfile("io.spokestack.spokestack.profile.VADTriggerGoogleASR")
.setProperty("google-credentials", "<google-credentials>")
.setProperty("locale", "en-US")
.build();
```

This example uses a pre-built profile to create a VAD-triggered pipeline that
performs speech recognition via the Google Speech API. The `google-credentials` parameter should
be the contents of a Google Cloud service account credentials file, in JSON
format. For more information, see the [documentation](https://cloud.google.com/speech/docs/streaming-recognize).
See the [javadoc](https://www.javadoc.io/doc/io.spokestack/spokestack-android) for
other component-specific configuration parameters.

### Wakeword Detection
```java
SpeechPipeline pipeline = new SpeechPipeline.Builder()
    .useProfile("io.spokestack.spokestack.profile.TFWakewordGoogleASR")
.setProperty("wake-filter-path", "<tensorflow-lite-filter-path>")
.setProperty("wake-encode-path", "<tensorflow-lite-encode-path>")
.setProperty("wake-detect-path", "<tensorflow-lite-detect-path>")
.setProperty("wake-smooth-length", 50)
.setProperty("wake-threshold", 0.85)
.setProperty("google-credentials", "<google-credentials>")
.setProperty("locale", "en-US")
.build();
```

This example creates a wakeword-triggered pipeline with the Google Speech
recognizer. The wakeword trigger uses three trained
[TensorFlow Lite](https://www.tensorflow.org/lite/) models: a *filter* model
for spectrum preprocessing, an autoregressive encoder *encode* model, and a
*detect* decoder model for keyword classification. For more information on
the wakeword detector and its configuration parameters, click
[here](https://github.com/spokestack/spokestack-android/wiki/wakeword).

The "wake-threshold" property is set by the `TFWakewordGoogleASR` profile, but it is
overridden here to emphasize that properties set after a profile is applied (either directly
in the builder or by another profile) supersede those set by that profile.
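This precedence rule can be sketched as a standalone program. The types below are illustrative stand-ins, not the real spokestack classes: a profile is modeled as a canned series of builder calls against a shared property map, so the last write to a key is the one the pipeline sees.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative stand-ins for the builder/profile pattern; not the real
// spokestack classes.
public class OverrideDemo {
    public static class Builder {
        public final Map<String, Object> config = new LinkedHashMap<>();

        public Builder setProperty(String key, Object value) {
            config.put(key, value); // last write wins
            return this;
        }

        // A profile behaves like a canned series of setProperty calls.
        public Builder useProfile(UnaryOperator<Builder> profile) {
            return profile.apply(this);
        }
    }

    public static void main(String[] args) {
        Builder b = new Builder()
            // the profile sets its tuned default...
            .useProfile(builder -> builder.setProperty("wake-threshold", 0.9))
            // ...and a later direct call supersedes it
            .setProperty("wake-threshold", 0.85);
        System.out.println(b.config.get("wake-threshold")); // prints 0.85
    }
}
```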

To use the demo "Spokestack" wakeword, download the TensorFlow Lite models: [detect](https://d3dmqd7cy685il.cloudfront.net/model/wake/spokestack/detect.lite) | [encode](https://d3dmqd7cy685il.cloudfront.net/model/wake/spokestack/encode.lite) | [filter](https://d3dmqd7cy685il.cloudfront.net/model/wake/spokestack/filter.lite)

## Development
@@ -143,7 +114,7 @@ For additional information about releasing see http://maven.apache.org/maven-rel

## License

Copyright 2020 Spokestack, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
26 changes: 26 additions & 0 deletions src/main/java/io/spokestack/spokestack/PipelineProfile.java
@@ -0,0 +1,26 @@
package io.spokestack.spokestack;

/**
* A pipeline profile encapsulates a series of configuration values tuned for
* a specific task to make building a {@link SpeechPipeline} more convenient.
*
* <p>
* Profiles are not authoritative; they act just like calling a series of
* methods on a {@link SpeechPipeline.Builder}, and any configuration
* properties they set can be overridden by subsequent calls.
* </p>
*
* <p>
* Pipeline profiles must not require arguments in their constructors.
* </p>
*/
public interface PipelineProfile {

/**
* Apply this profile to the pipeline builder.
*
* @param builder The builder to which the profile should be applied.
* @return The modified pipeline builder.
*/
SpeechPipeline.Builder apply(SpeechPipeline.Builder builder);
}
27 changes: 27 additions & 0 deletions src/main/java/io/spokestack/spokestack/SpeechPipeline.java
@@ -351,6 +351,33 @@ public Builder setProperty(String key, Object value) {
return this;
}

/**
* applies configuration from a {@link PipelineProfile} to the current
* builder, returning the modified builder. subsequent calls to {@code
* useProfile} or {@code setProperty} can override configuration set by
* a profile.
*
* @param profileClass class name of the profile to apply.
* @return an updated builder
* @throws IllegalArgumentException if the specified profile does not
* exist
*/
public Builder useProfile(String profileClass)
throws IllegalArgumentException {
PipelineProfile profile;
try {
profile = (PipelineProfile) Class
.forName(profileClass)
.getConstructor()
.newInstance();
} catch (Exception e) {
throw new IllegalArgumentException(
profileClass + " pipeline profile is invalid!");
}

return profile.apply(this);
}

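The reflective loading inside `useProfile` can be sketched as a self-contained program. The `Profile` interface and `Quiet` class below are hypothetical stand-ins, not spokestack types; the sketch shows why profiles need a public no-arg constructor and how every reflective failure collapses into a single `IllegalArgumentException` for the caller.

```java
// Minimal standalone sketch of reflective profile loading; the Profile
// interface and Quiet class are hypothetical stand-ins.
public class ProfileLoader {
    public interface Profile {
        String name();
    }

    // Profiles must expose a public no-arg constructor so that
    // getConstructor().newInstance() can instantiate them.
    public static class Quiet implements Profile {
        public String name() { return "quiet"; }
    }

    public static Profile load(String profileClass) {
        try {
            return (Profile) Class
                .forName(profileClass)
                .getConstructor()
                .newInstance();
        } catch (Exception e) {
            // ClassNotFoundException, NoSuchMethodException, cast failures,
            // etc. all surface as one caller-facing error
            throw new IllegalArgumentException(
                profileClass + " pipeline profile is invalid!", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("ProfileLoader$Quiet").name()); // prints quiet
    }
}
```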
src/main/java/io/spokestack/spokestack/profile/PushToTalkAndroidASR.java
@@ -0,0 +1,36 @@
package io.spokestack.spokestack.profile;

import io.spokestack.spokestack.PipelineProfile;
import io.spokestack.spokestack.SpeechPipeline;

/**
* A speech pipeline profile that relies on manual pipeline activation,
* using Android's {@code SpeechRecognizer} API for ASR.
*
* <p>
* Using Android's built-in ASR requires that an Android {@code Context} object
* be attached to the speech pipeline using it. This must be done separately
* from profile application, using
* {@link SpeechPipeline.Builder#setAndroidContext(android.content.Context)}.
* </p>
*
* @see io.spokestack.spokestack.android.AndroidSpeechRecognizer
*/
public class PushToTalkAndroidASR implements PipelineProfile {
@Override
public SpeechPipeline.Builder apply(SpeechPipeline.Builder builder) {
return builder
.setInputClass(
"io.spokestack.spokestack.android.MicrophoneInput")
.addStageClass(
"io.spokestack.spokestack.webrtc.AcousticNoiseSuppressor")
.addStageClass(
"io.spokestack.spokestack.webrtc.AutomaticGainControl")
.setProperty("agc-compression-gain-db", 15)
.addStageClass(
"io.spokestack.spokestack.webrtc.VoiceActivityDetector")
.addStageClass("io.spokestack.spokestack.ActivationTimeout")
.addStageClass(
"io.spokestack.spokestack.android.AndroidSpeechRecognizer");
}
}
src/main/java/io/spokestack/spokestack/profile/PushToTalkGoogleASR.java
@@ -0,0 +1,44 @@
package io.spokestack.spokestack.profile;

import io.spokestack.spokestack.PipelineProfile;
import io.spokestack.spokestack.SpeechPipeline;

/**
* A speech pipeline profile that relies on manual pipeline activation,
* using Google Speech for ASR.
*
* <p>
* Google Speech requires extra configuration, which must be added to the
* pipeline build process separately from this profile:
* </p>
*
* <ul>
* <li>
* <b>google-credentials</b> (string): json-stringified google service
* account credentials, used to authenticate with the speech API
* </li>
* <li>
* <b>locale</b> (string): language code for speech recognition
* </li>
* </ul>
*
* @see io.spokestack.spokestack.google.GoogleSpeechRecognizer
*/
public class PushToTalkGoogleASR implements PipelineProfile {
@Override
public SpeechPipeline.Builder apply(SpeechPipeline.Builder builder) {
return builder
.setInputClass(
"io.spokestack.spokestack.android.MicrophoneInput")
.addStageClass(
"io.spokestack.spokestack.webrtc.AcousticNoiseSuppressor")
.addStageClass(
"io.spokestack.spokestack.webrtc.AutomaticGainControl")
.setProperty("agc-compression-gain-db", 15)
.addStageClass(
"io.spokestack.spokestack.webrtc.VoiceActivityDetector")
.addStageClass("io.spokestack.spokestack.ActivationTimeout")
.addStageClass(
"io.spokestack.spokestack.google.GoogleSpeechRecognizer");
}
}
src/main/java/io/spokestack/spokestack/profile/TFWakewordAndroidASR.java
@@ -0,0 +1,73 @@
package io.spokestack.spokestack.profile;

import io.spokestack.spokestack.PipelineProfile;
import io.spokestack.spokestack.SpeechPipeline;

/**
* A speech pipeline profile that uses TensorFlow Lite for wakeword detection
* and Android's {@code SpeechRecognizer} API for ASR. Properties related to
* signal processing are tuned for the "Spokestack" wakeword.
*
* <p>
* Wakeword detection requires configuration to locate the models used for
* classification; these properties must be set elsewhere:
* </p>
*
* <ul>
* <li>
* <b>wake-filter-path</b> (string, required): file system path to the
* "filter" Tensorflow-Lite model, which is used to calculate a mel
* spectrogram frame from the linear STFT; its inputs should be shaped
* [fft-width], and its outputs [mel-width]
* </li>
* <li>
* <b>wake-encode-path</b> (string, required): file system path to the
* "encode" Tensorflow-Lite model, which is used to perform each
* autoregressive step over the mel frames; its inputs should be shaped
* [mel-length, mel-width], and its outputs [encode-width], with an
* additional state input/output shaped [state-width]
* </li>
* <li>
* <b>wake-detect-path</b> (string, required): file system path to the
 * "detect" Tensorflow-Lite model; its inputs should be shaped
* [encode-length, encode-width], and its outputs [1]
* </li>
* </ul>
*
* <p>
* Using Android's built-in ASR requires that an Android {@code Context} object
* be attached to the speech pipeline using it. This must be done separately
* from profile application, using
* {@link SpeechPipeline.Builder#setAndroidContext(android.content.Context)}.
* </p>
*
* @see io.spokestack.spokestack.android.AndroidSpeechRecognizer
* @see io.spokestack.spokestack.wakeword.WakewordTrigger
*/
public class TFWakewordAndroidASR implements PipelineProfile {
@Override
public SpeechPipeline.Builder apply(SpeechPipeline.Builder builder) {
return builder
.setInputClass(
"io.spokestack.spokestack.android.MicrophoneInput")
.addStageClass(
"io.spokestack.spokestack.webrtc.AcousticNoiseSuppressor")
.setProperty("ans-policy", "aggressive")
.addStageClass(
"io.spokestack.spokestack.webrtc.AutomaticGainControl")
.setProperty("agc-target-level-dbfs", 3)
.setProperty("agc-compression-gain-db", 15)
.addStageClass(
"io.spokestack.spokestack.webrtc.VoiceActivityDetector")
.setProperty("vad-mode", "very-aggressive")
.setProperty("vad-fall-delay", 800)
.addStageClass(
"io.spokestack.spokestack.wakeword.WakewordTrigger")
.setProperty("wake-threshold", 0.9)
.setProperty("pre-emphasis", 0.97)
.addStageClass("io.spokestack.spokestack.ActivationTimeout")
.setProperty("active-min", 2000)
.addStageClass(
"io.spokestack.spokestack.android.AndroidSpeechRecognizer");
}
}
src/main/java/io/spokestack/spokestack/profile/TFWakewordGoogleASR.java
@@ -0,0 +1,80 @@
package io.spokestack.spokestack.profile;

import io.spokestack.spokestack.PipelineProfile;
import io.spokestack.spokestack.SpeechPipeline;

/**
* A speech pipeline profile that uses TensorFlow Lite for wakeword detection
* and Google Speech for ASR. Properties related to signal processing are tuned
* for the "Spokestack" wakeword.
*
* <p>
* Wakeword detection requires configuration to locate the models used for
* classification; these properties must be set separately from this profile:
* </p>
*
* <ul>
* <li>
* <b>wake-filter-path</b> (string, required): file system path to the
* "filter" Tensorflow-Lite model, which is used to calculate a mel
* spectrogram frame from the linear STFT; its inputs should be shaped
* [fft-width], and its outputs [mel-width]
* </li>
* <li>
* <b>wake-encode-path</b> (string, required): file system path to the
* "encode" Tensorflow-Lite model, which is used to perform each
* autoregressive step over the mel frames; its inputs should be shaped
* [mel-length, mel-width], and its outputs [encode-width], with an
* additional state input/output shaped [state-width]
* </li>
* <li>
* <b>wake-detect-path</b> (string, required): file system path to the
 * "detect" Tensorflow-Lite model; its inputs should be shaped
* [encode-length, encode-width], and its outputs [1]
* </li>
* </ul>
*
* <p>
* Google Speech also requires configuration:
* </p>
*
* <ul>
* <li>
* <b>google-credentials</b> (string): json-stringified google service
* account credentials, used to authenticate with the speech API
* </li>
* <li>
* <b>locale</b> (string): language code for speech recognition
* </li>
* </ul>
*
* @see io.spokestack.spokestack.wakeword.WakewordTrigger
* @see io.spokestack.spokestack.google.GoogleSpeechRecognizer
*/
public class TFWakewordGoogleASR implements PipelineProfile {
@Override
public SpeechPipeline.Builder apply(SpeechPipeline.Builder builder) {
return builder
.setInputClass(
"io.spokestack.spokestack.android.MicrophoneInput")
.addStageClass(
"io.spokestack.spokestack.webrtc.AcousticNoiseSuppressor")
.setProperty("ans-policy", "aggressive")
.addStageClass(
"io.spokestack.spokestack.webrtc.AutomaticGainControl")
.setProperty("agc-target-level-dbfs", 3)
.setProperty("agc-compression-gain-db", 15)
.addStageClass(
"io.spokestack.spokestack.webrtc.VoiceActivityDetector")
.setProperty("vad-mode", "very-aggressive")
.setProperty("vad-fall-delay", 800)
.addStageClass(
"io.spokestack.spokestack.wakeword.WakewordTrigger")
.setProperty("wake-threshold", 0.9)
.setProperty("pre-emphasis", 0.97)
.addStageClass("io.spokestack.spokestack.ActivationTimeout")
.setProperty("active-min", 2000)
.addStageClass(
"io.spokestack.spokestack.google.GoogleSpeechRecognizer");
}
}