This is a sample project that demonstrates how to communicate with
the LumenVox API over gRPC using the published .proto
definition
files.
The sample code is designed to work with Python 3.10.
Creating and using a virtual environment for your Python project will help greatly, and allow you to utilize the project dependencies file with ease.
Virtual Environments (or venv) are too deep a topic to cover here, so please review online references of how best to create these. There are several good guides available, such as this one: https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/
We recommend you create a virtual environment for your project within the project root using something like the following:
python -m venv venv
This will create a venv/ folder within your project, which will hold the modules the project depends on.
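Activating the environment differs by platform; for example:

# Linux / macOS
source venv/bin/activate

# Windows
venv\Scripts\activate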
Once you have created and activated your venv, you should initialize the environment using the provided requirements.txt file, which describes the module dependencies. To do this, run:
pip install -r requirements.txt
To learn how to use protocol buffers or protobuf, please follow the excellent tutorials provided by Google at:
https://developers.google.com/protocol-buffers/docs/tutorials
or review their code samples in the examples directory.
The .proto files are available to LumenVox customers and allow access to the LumenVox API set of functionality.
Before the .proto (protobuf definition) files can be used, they need to be compiled into a format compatible with the language being used, in this case Python.
See this page for details about how to use protoc to generate the API stubs.
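If you prefer to invoke the compiler directly rather than use the helper script below, a grpcio-tools invocation along these lines should work (the include path and output locations are illustrative and assume the .proto files live under a protobufs/ directory):

python -m grpc_tools.protoc -I protobufs --python_out=. --grpc_python_out=. protobufs/lumenvox/api/*.proto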
There is a helper script you can run to easily generate these files:
python make_stubs.py
This should generate files in the /lumenvox/api/
and /google/
directories below your project root.
These files can be used by Python applications to talk to the LumenVox API using the gRPC protocol. This is needed in order to run the sample applications described here.
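Once generated, the stubs are imported like any other Python module. For example, the audio format messages used later in this document come from:

import lumenvox.api.audio_formats_pb2 as audio_formats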
User connectivity data can be viewed or modified in lumenvox_api_user_connection_data.py.
These samples assume TLS connectivity is used between the Python client
and your LumenVox API server. This is controlled by the ENABLE_TLS
flag.
In order to make a valid TLS connection, you must create a certificate for
the server and assign it to the ingress. The public certificate (.crt) file
should be copied into the Python sample code project folder and named
server.crt. This file will be used to validate the server certificate
when using TLS connectivity. Without this, you may encounter TLS or SSL
errors when you try to connect.
Note that this configuration also works with self-signed certificates, with the caveat that you should always use appropriate certificates in production, and understand the implications of using self-signed versus trusted CA-signed certificates.
Connecting without TLS is also possible if your server is configured to support it; however, the use of TLS is recommended.
Also, TLS connections are often on different ports than non-TLS, so if you
are switching between the two, you should be aware of this and assign your
LUMENVOX_API_SERVICE
port value accordingly.
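As a rough sketch (the host and port shown are illustrative; the samples take these values from lumenvox_api_user_connection_data.py), a TLS-enabled gRPC channel can be created from the server.crt file as follows:

import grpc

# Load the public certificate used to validate the server when TLS is enabled.
with open('server.crt', 'rb') as cert_file:
    channel_credentials = grpc.ssl_channel_credentials(root_certificates=cert_file.read())

# TLS connection (illustrative host and port).
channel = grpc.secure_channel('lumenvox-api.testmachine.com:443', channel_credentials)

# Non-TLS connection, if the server supports it (typically a different port):
# channel = grpc.insecure_channel('lumenvox-api.testmachine.com:80')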
When connecting to a Kubernetes ingress, you often need to specify the domain name rather than the IP address so that the ingress can correctly route your requests. The example included in the code is lumenvox-api.testmachine.com; however, your Kubernetes configuration will likely differ, so please use the correct setting. Depending on your environment, you may also need to update your hosts file or DNS so that this domain name resolves to the Kubernetes ingress IP address.
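For example, a hosts file entry mapping the sample domain name to an illustrative ingress IP address might look like this:

10.11.12.13    lumenvox-api.testmachine.com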
The bulk of the code that communicates with the LumenVox API can be found in lumenvox_api_handler.py. The majority of functions in this file are built to wrap over the RPCs and protocol buffer messages used to interact with the API. Additionally, the code in lumenvox_api_helper_common.py provides helper functions to ease the use of common messages, such as grammars and settings. Both of these files are used throughout the sample code.
Upon gaining familiarity with the API, a user can examine the operations these functions perform and, if preferred, build more specialized solutions for working with the API.
Before running any tests, please be sure to specify the address of your target LumenVox server by updating the following settings in lumenvox_api_user_connection_data.py:
- LUMENVOX_API_SERVICE_CONNECTION: the address of your server
- ENABLE_TLS: whether the server certificate should be validated (CERT_FILE should be modified accordingly)
- deployment_id: your assigned deployment ID on the server
- operator_id: the operator ID you wish to use (identifies the API user)
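As an illustration only, the settings might look something like the following; the values shown are placeholders, so check the comments in lumenvox_api_user_connection_data.py for the exact expected formats:

# Placeholder values; replace with the settings for your own deployment.
LUMENVOX_API_SERVICE_CONNECTION = 'lumenvox-api.testmachine.com:443'
ENABLE_TLS = True
CERT_FILE = 'server.crt'
deployment_id = 'your-deployment-id'  # placeholder; use your assigned deployment ID
operator_id = 'your-operator-id'      # placeholder; identifies the API user in logs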
Note that if you do not know your assigned deployment_id
, you may
try using the default installed with the system, which is the value
included in the file. If this works, you can practice with this, but
at some point you should remove this temporary startup deployment ID
and use a more permanent one.
Similarly, for the operator_id
, if you do not have some identifier
that you wish to use for tracking who is making API requests, then
you can use the sample one included for now. In production, it is
best to use your own operator ID values to understand how or what is
making API calls, which can be seen in the logs.
Callbacks form part of the LumenVox API. These are used to communicate events and notifications from the speech system to the API client.
Such callback messages are described towards the middle of the
included session.proto
file in the protobufs/lumenvox/api
directory. These include:
- PartialResult
- FinalResult
- VadEvent
- SessionEvent
These are defined in the SessionResponse
message type. The
most important callback message is the SessionEvent
, which is also
the first message sent back to the API client after SessionCreate
is called. The session_id value in this message is used as a parameter for other API calls.
Since callbacks can be received at any time, it is generally practical
to use a worker thread to listen for these notifications and process
them when they arrive. In this sample code, the asyncio library is used to simulate threading; functions such as those that read from the API are split off into tasks or coroutines that run concurrently.
Callback messages received by the API client code (such as these samples) are handled by tasks that read from the stream (task_read_session_streams in lumenvox_api_handler.py, for example). A task like this places incoming callback messages into the appropriate queues defined at the top of lumenvox_api_handler.py.
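The following is a minimal sketch of that pattern rather than the actual handler code; the stream and queue names are placeholders:

import asyncio

async def read_session_stream(session_stream, callback_queue: asyncio.Queue):
    # Route every callback message received on the response stream into a queue.
    async for session_response in session_stream:  # placeholder async iterator of SessionResponse messages
        await callback_queue.put(session_response)

async def wait_for_callback(callback_queue: asyncio.Queue, timeout_seconds: float = 10.0):
    # Block the calling coroutine until a callback message arrives, or time out.
    return await asyncio.wait_for(callback_queue.get(), timeout_seconds)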
Note that the configuration in lumenvox_api_handler.py should allow multiple concurrent session operations to be performed; however, these samples only demonstrate single-session use.
For production code, it is assumed that a more structured approach will be taken for processing these callback messages. The aim of this sample code is simplicity, using Python scripting that many people will be familiar with.
It should be noted that the use of asyncio involves heavy use of async/await syntax. For more information on asyncio, please refer to the official Python documentation: https://docs.python.org/3/library/asyncio.html
Several sample files make use of audio streaming. To help achieve this,
an AudioHandler
class in audio_handler.py was created to facilitate the
audio streaming process. This process makes use of asyncio
tasks to simulate
threading, sending audio concurrently while the main task waits for the result.
This approach was provided as a way to both send audio and track the state of the
process in the sample code. Other approaches, however, may be used in production
environments.
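The sketch below illustrates the general idea rather than the actual AudioHandler implementation; send_audio_chunk stands in for whatever call pushes audio into the interaction:

import asyncio

async def stream_audio(audio_data: bytes, send_audio_chunk, chunk_size: int = 4000):
    # Send audio in chunks from a separate task while the main task awaits results.
    for offset in range(0, len(audio_data), chunk_size):
        await send_audio_chunk(audio_data[offset:offset + chunk_size])
        await asyncio.sleep(0.1)  # pace the chunks roughly like a real-time stream

# Typical usage from the main coroutine (names are placeholders):
# audio_task = asyncio.create_task(stream_audio(audio_data, send_audio_chunk))
# result = await wait_for_callback(result_queue)  # main task waits while audio streams concurrently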
Throughout the included examples, the en-US language code is used, along with the included grammars and other referenced sample files; however, if your system uses a different language, you should modify the samples accordingly.
Please note that the sample code purposefully contains no error or exception checking to make the code more easily read and understood.
It is assumed that application developers will implement their own robust handling. It is also assumed that in production applications, threading model and behavior would likely be handled in a more robust and/or scalable way than shown in these examples.
In other words, please don't simply copy these examples and use them in production - this would not likely be optimal. These are designed to be very simple examples.
This sample code is written using Python, which was selected for its simplicity to clearly show interactions with the API. Since gRPC is used, many programming languages are automatically supported, so your choice of these should be driven by your business needs, not these simplistic examples.
See the gRPC documentation for details about supported languages and how to utilize protocol buffers with those languages. The steps described here for Python are very similar for other languages supported by gRPC.
If you are using another programming language, you will likely need to create your own handler functions, or something similar to the included LumenVox Speech Helper. Converting these functions to other languages should be relatively straightforward by following the comments.
Note that some sample audio used for the transcription example is courtesy of the Open Speech Repository. Please visit their website for details and conditions of use.
Audio format types need to be defined in the code in order to be used in interactions.
This can be done by referencing the audio_formats.proto
file; the file contains an AudioFormat
protocol buffer
message. Inside the AudioFormat message, an enum defines the standard audio format types, the standard_audio_format field is set to one of those enum values, and an optional sample_rate_hertz field is required for certain formats.
If one were to define an audio format for ULAW 8kHz in the code, it would look like the following:
# audio_formats.proto messages
import lumenvox.api.audio_formats_pb2 as audio_formats
# Import optional_int32 helper definition.
from helpers.common_helper import optional_int32
# Audio format variable for ULAW 8kHz.
AUDIO_FORMAT_ULAW_8KHZ = audio_formats.AudioFormat(
    sample_rate_hertz=optional_int32(value=8000),
    standard_audio_format=audio_formats.AudioFormat.StandardAudioFormat.STANDARD_AUDIO_FORMAT_ULAW)
For the standard formats where the sample rate is not required to be specified (like WAV), it would look like this:
# audio_formats.proto messages
import lumenvox.api.audio_formats_pb2 as audio_formats
# Audio format variable for WAV.
AUDIO_FORMAT_WAV = audio_formats.AudioFormat(
    standard_audio_format=audio_formats.AudioFormat.StandardAudioFormat.STANDARD_AUDIO_FORMAT_WAV)
The helpers/audio_helper.py
file already provides some formats defined in a similar fashion to what was described
above. They can be imported in the sample scripts like this:
# Import an AudioFormat variable defined in helpers/audio_helper.py
from helpers.audio_helper import AUDIO_FORMAT_ULAW_8KHZ
See the asr_batch_sample.py
script for an example of how to
perform a batch-mode ASR decode using the Speech API.
An ASR interaction utilizing batch processing has its audio sent all at once. All the audio sent before the interaction is created is then processed.
See the asr_streaming_sample.py
script for an example of how to
perform a streaming-mode ASR decode using the Speech API.
A streaming-mode decode uses Voice Activity Detection (VAD), and streams chunks of audio into the system, relying on VAD to determine start and end of speech to trigger processing.
See the transcription_sample.py
script for an example of
how to perform a streaming transcription using the Speech API.
Transcription is very similar to a grammar-based ASR decode, but it uses a special grammar whose only purpose is to trigger transcription mode rather than a grammar-based decode.
Transcription can be performed in real time using this streaming example, or it can be used in batch-mode operations similar to the Batch Mode ASR sample. This batch mode, sometimes called offline transcription, may be slightly more efficient and faster than real-time streaming, but it requires all the audio to be sent at once, so it may or may not be suitable for your use case. Additionally, transcription with batch processing can be performed by using a batch ASR interaction with a transcription grammar.
Partial results can be enabled using the enable_partial_results
field in RecognitionSettings. They are turned off by default, but
by setting enable_partial_results.value
to True
, partial
results can be received.
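A hedged sketch of doing this is shown below; the settings module path is an assumption based on the pb2 naming pattern used elsewhere in these samples, so verify it against your generated stubs:

# Assumed module name, following the audio_formats_pb2 naming pattern.
import lumenvox.api.settings_pb2 as settings_msg

recognition_settings = settings_msg.RecognitionSettings()
recognition_settings.enable_partial_results.value = True  # partial results are off by default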
The transcription_dialect_example.py
script uses the streaming transcription
code to demonstrate the differences between results based on dialect (e.g., 'en-us' vs. 'en-gb').
The transcription_continuous.py
script uses the streaming transcription
code to demonstrate continuous transcription, where partial results are
returned.
See the enhanced_transcription_sample.py
script for an example of
how to perform an enhanced transcription using the Speech API. This is
based on the code used for the streaming transcription example.
Enhanced transcription is performed by including additional grammars in the transcription interaction. Semantic interpretations will also be included in the results should the content of the audio match any of the specified grammars.
See the transcription_normalization_sample.py
script for an example of
how to perform a transcription with normalization using the Speech API. This is
based on the code used for the streaming transcription example.
Normalized transcription is performed by including normalization settings when the interaction is created. This produces additional, normalization-specific output on top of the transcript received from basic transcription.
See the alias_lexicon_transcription_sample.py
script for an example of
how to perform a transcription interaction with aliases. This is
based on the code used for the batch ASR example, as this requires a grammar.
The grammar for aliases must include a URI reference to a lexicon XML file, the contents of which will be visible in the results should the transcription include words defined as aliases under a lexeme in the lexicon file.
The tts_sample.py script demonstrates a simple TTS synthesis.
In the example, an SSML file is loaded, which contains SSML marks
to show some moderately complex functionality and the level of
details that can be returned from the synthesis result.
You can optionally request the synthesized audio be saved to disk so that you can listen to it if desired.
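For instance, once the synthesized audio bytes have been pulled from the result, writing them out is an ordinary file operation (audio_data here is a placeholder for those bytes):

# Write the synthesized audio bytes to disk for later playback.
with open('tts_output_audio.raw', 'wb') as audio_file:
    audio_file.write(audio_data)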
See the grammar_parse_sample.py script for an example of how to perform a grammar parse interaction using the LumenVox API.
Grammar parse interactions will also accept builtin grammars or URL-referenced grammars if one is not locally available.
See the amd_sample.py
or cpa_sample.py
scripts for examples of how to perform streaming-mode Call Progress Analysis (CPA) and Tone Detection (AMD) decodes using the LumenVox API.
A streaming-mode decode uses Voice Activity Detection (VAD), and streams chunks of audio into the system, relying on VAD to determine start and end of speech to trigger processing.
See the normalize_text_sample.py
script for an example of how to
perform a "normalize text" interaction using the Speech API.
Normalize text interactions require a text transcript and normalization settings to run.
Note: If a sample function fails to run with an error about a pb2 file or anything similar:
- Ensure that the protocol buffer files have been updated accordingly and that their respective Python files have been generated with make_stubs.py.
- Ensure that all the requirements have been installed in the virtual environment, and that the virtual environment is activated when running the samples.