Add Support for Google Cloud Speech-To-Text V2 library in mod_google_transcribe #23
@ajgolledge can you review our contributor rules and, if you are OK to proceed, add the sign-off on this PR or commit?
Signed-off-by: Andrew Golledge <andreas.golledge@gmail.com>
Is this OK or do I need to actually modify the previous commit? Thanks for the email btw.
please just make another commit with the `-s` flag and push that
I thought I'd done that. There are now two commits.
sorry, you are right. Thanks!
BTW have you tested this PR yourself?
Yes, we have been successfully running this PR on our development servers for about a month now. The way we build it is based on the way the drachtio code was built, so that might differ from the way you build jambonz now.
@ajgolledge I am having some issues in my testing of this PR, and I wonder if you can provide some insight. I am using the Google credentials that work fine for V1, and I am using this as the recognizer parent:
However, when I connect I immediately get "operation canceled from google"
Any idea what might be causing this? Have you successfully tested using a recognizer created on the fly like this?
@davehorton Yes, this has been tested successfully, using recognizers on the fly. Just to be clear, in case you hadn't already, you should set the
The
At a guess I'd say that it's
If you're still having trouble with the recognizer, I can try to reproduce the problem. Do you have any log output which you can share? Do you see this in the log output, for example:
or does it not get this far?
Actually the problem seems to be in testing setting the speech start and end timeouts, and on a branch I added this code
Seems proper, but intermittently when I set these timers I get the immediate cancel error. This is on the "feat/new_params_google_v2" branch if you want to try to recreate
I also tried this branch and I always get the error if I set either the
Yes, if neither is set then it does work for me. So I guess it's just a matter of figuring out how to properly use those two parameters.
This PR addresses #149 from the earlier drachtio incarnation of this repository. I noticed that the `mod_google_transcribe` directories in this repository and in the drachtio repository are identical, so I took the PR from the old repository and grafted it onto this one; hope that's OK.

This PR offers support for the
`v2` version of the Speech-To-Text library whilst still supporting `v1` simultaneously. The default behaviour is to use the `v1` version of the library, where everything works identically to the way it did in the previous version. In order to use `v2`, the FreeSWITCH variable `GOOGLE_SPEECH_CLOUD_SERVICES_VERSION` must be set to the value "v2". Setting it to "v1" or not setting it at all results in the default behaviour.

If the variable is set to "v2" then it is essential to provide a so-called recognizer parent path in the
`GOOGLE_SPEECH_RECOGNIZER_PARENT` FreeSWITCH variable. Failure to do so will result in a failure to construct the `GStreamer` class. Recognizers allow commonly used streaming recognition parameters to be stored in the cloud. These stored values can be overridden with parameters passed at runtime, but it is essential to provide a recognizer to `v2` streaming recognition invocations. If you happen to have already created a recognizer in your Google Cloud account, its id can be passed using the `GOOGLE_SPEECH_RECOGNIZER_ID` variable. If this is not set then `mod_google_transcribe` will just use the so-called wildcard recognizer id (the "_" character) and a recognizer will be created on the fly and not stored for future use. Note that even if a persistent recognizer is not required, it is always necessary to provide at least the parent id of the recognizer in `GOOGLE_SPEECH_RECOGNIZER_PARENT`, otherwise even the wildcard recognizer cannot be created. This parent id is a path string which consists of the Google Cloud project id which was used to create the Google credentials file used, and a geographical location. For more details about recognizers, see https://cloud.google.com/speech-to-text/v2/docs/recognizers

As long as
`GOOGLE_SPEECH_CLOUD_SERVICES_VERSION` is set to "v2" and `GOOGLE_SPEECH_RECOGNIZER_PARENT` is also set to a valid recognizer parent id, the "v2" library will be used. Calls to `uuid_google_transcribe` should function as they did previously, and any configuration parameters provided at runtime will override anything already defined in a predefined recognizer.

Differences between v1 and v2
`v2`. That is to say that it is no longer required to specify this as a parameter. Instead it is taken to be implicit from the model selected. If single utterance behaviour is required then this is supported by the `short` model, for example. For more details on models see https://cloud.google.com/speech-to-text/v2/docs/streaming-recognize.

`mod_google_transcribe` for `v2` but I didn't manage to stumble across a combination of model, language and location which supports this. See https://stackoverflow.com/questions/76779418/speaker-diarization-is-disabled-even-for-supported-languages-in-google-speech-to

There are sure to be many more differences but these are the main things I found so far.
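The variable handling described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `SpeechApiVersion`, `selectVersion` and `recognizerName` are hypothetical names, and the assumption is that the recognizer passed to Google is the parent path followed by `/recognizers/` and the recognizer id, which matches the v2 resource name format `projects/{project}/locations/{location}/recognizers/{recognizer}`. It needs C++17 for `std::optional`.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical type for illustration; not taken from the PR.
enum class SpeechApiVersion { V1, V2 };

// versionVar models GOOGLE_SPEECH_CLOUD_SERVICES_VERSION.
// Only the exact value "v2" selects the v2 library; "v1" or an unset
// variable keeps the default v1 behaviour.
SpeechApiVersion selectVersion(const std::optional<std::string>& versionVar) {
  return (versionVar && *versionVar == "v2") ? SpeechApiVersion::V2
                                             : SpeechApiVersion::V1;
}

// parent models GOOGLE_SPEECH_RECOGNIZER_PARENT, for example
// "projects/<project-id>/locations/<location>".
// id models GOOGLE_SPEECH_RECOGNIZER_ID; when it is unset or empty, the
// wildcard id "_" is used so that a recognizer is created on the fly.
// Assumption: the full name is simply <parent>/recognizers/<id>.
std::string recognizerName(const std::optional<std::string>& parent,
                           const std::optional<std::string>& id) {
  if (!parent || parent->empty()) {
    // Mirrors the rule that v2 cannot work without a recognizer parent.
    throw std::invalid_argument(
        "GOOGLE_SPEECH_RECOGNIZER_PARENT must be set when using v2");
  }
  const std::string recognizerId = (id && !id->empty()) ? *id : "_";
  return *parent + "/recognizers/" + recognizerId;
}
```

With a parent of `projects/my-project/locations/global` (a made-up value) and no recognizer id, `recognizerName` yields `projects/my-project/locations/global/recognizers/_`. Throwing on a missing parent is a design choice of this sketch, standing in for the construction failure of the `GStreamer` class mentioned above.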
Some Notes on the Code and Building
To avoid code duplication we placed `v1`-specific code in `google_glue_v1.cpp` and the `v2`-specific stuff in `google_glue_v2.cpp`. Generic code used by both libraries now resides in `generic_google_glue.h`. We use our own Docker image to build the FreeSWITCH modules but our makefile is based on this one: https://github.com/drachtio/docker-drachtio-freeswitch-base/blob/main/files/Makefile.am.extra

In order to compile and link the `v2` stuff we had to add the following lines to the `nodist_libfreeswitch_libgoogleapis_la_SOURCES` assignment:

If you don't do this, you'll most likely get some problems linking.
Signed-off-by: Andrew Golledge <andreas.golledge@gmail.com>