Speech: v1p1beta1 with diarization, 429 Received message larger than max #5819

jdupl123 · 2018-08-18T01:03:46Z

Api

speech_v1p1beta1

os

macOS 10.13.5

python version

Python 3.6.3 :: Anaconda, Inc.

api version

Name: google-cloud-speech
Version: 0.35.0
Summary: Google Cloud Speech API client library

stack trace

_Rendezvous: <_Rendezvous of RPC that terminated with:
        status = StatusCode.RESOURCE_EXHAUSTED
        details = "Received message larger than max (5999457 vs. 4194304)"
        debug_error_string = "{"created":"@1534549673.588268000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1095,"grpc_message":"Received message larger than max (5999457 vs. 4194304)","grpc_status":8}"
>

The above exception was the direct cause of the following exception:

ResourceExhausted                         Traceback (most recent call last)
<ipython-input-10-dc5a16b3f3a4> in <module>()
----> 1 res=operation.result()

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
    113                 the timeout is reached before the operation completes.
    114         """
--> 115         self._blocking_poll(timeout=timeout)
    116
    117         if self._exception is not None:

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py in _blocking_poll(self, timeout)
     92
     93         try:
---> 94             retry_(self._done_or_raise)()
     95         except exceptions.RetryError:
     96             raise concurrent.futures.TimeoutError(

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
    258                 sleep_generator,
    259                 self._deadline,
--> 260                 on_error=on_error,
    261             )
    262

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
    175     for sleep in sleep_generator:
    176         try:
--> 177             return target()
    178
    179         # pylint: disable=broad-except

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py in _done_or_raise(self)
     71     def _done_or_raise(self):
     72         """Check if the future is done and raise if it's not."""
---> 73         if not self.done():
     74             raise _OperationNotComplete()
     75

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py in done(self)
    138             bool: True if the operation is complete, False otherwise.
    139         """
--> 140         self._refresh_and_update()
    141         return self._operation.done
    142

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py in _refresh_and_update(self)
    129         # RPC as it will not change once done.
    130         if not self._operation.done:
--> 131             self._operation = self._refresh()
    132             self._set_result_from_operation()
    133

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operations_v1/operations_client.py in get_operation(self, name, retry, timeout)
    125         """
    126         request = operations_pb2.GetOperationRequest(name=name)
--> 127         return self._get_operation(request, retry=retry, timeout=timeout)
    128
    129     def list_operations(

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py in __call__(self, *args, **kwargs)
    137             kwargs['metadata'] = metadata
    138
--> 139         return wrapped_func(*args, **kwargs)
    140
    141

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
    258                 sleep_generator,
    259                 self._deadline,
--> 260                 on_error=on_error,
    261             )
    262

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
    175     for sleep in sleep_generator:
    176         try:
--> 177             return target()
    178
    179         # pylint: disable=broad-except

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/timeout.py in func_with_timeout(*args, **kwargs)
    204             """Wrapped function that adds timeout."""
    205             kwargs['timeout'] = next(timeouts)
--> 206             return func(*args, **kwargs)
    207
    208         return func_with_timeout

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     54             return callable_(*args, **kwargs)
     55         except grpc.RpcError as exc:
---> 56             six.raise_from(exceptions.from_grpc_error(exc), exc)
     57
     58     return error_remapped_callable

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/six.py in raise_from(value, from_value)

ResourceExhausted: 429 Received message larger than max (5999457 vs. 4194304)

Code example

def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech_v1p1beta1 as speech

    client = speech.SpeechClient()

    audio = speech.types.RecognitionAudio(uri=gcs_uri)
    config = speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
        enable_speaker_diarization=True,
        diarization_speaker_count=2,
        sample_rate_hertz=16000,
        language_code='en-AU')

    operation = client.long_running_recognize(config, audio)
    return operation


url='gs://some_gcs_path/somefile.wav'
operation = transcribe_gcs(url)

print('Waiting for operation to complete...')
response = operation.result()

tseaver · 2018-08-20T15:42:52Z

The documented content limits are expressed in approximate minutes of speech: ~180 minutes for asynchronous requests via a GCS URI.

In the reported error, the limit appears to be 4 megabytes (which is much smaller).

edave · 2018-08-22T00:45:15Z

I can confirm also encountering this error/limit with the same stack trace while trying to do a transcription.

OS type and version: MacOS 10.13.6 (High Sierra)
Python version and virtual environment information: Python 3.7.0
google-cloud-python: Core: 0.28.1, Speech: 0.35.0

Indirectly, this is caused by enabling diarization and how transcriptions are divided into several results which represent discrete chunks of the transcribed dialogue. With diarization enabled, the words array in each result's transcription alternative does not encapsulate just the words for that portion of the transcription but also includes all previous words recognized in the transcription up to that point.

Put more succinctly: if a transcription has N words, then with diarization enabled the result payload contains roughly N^2 / 2 words instead of just N (simplifying to the worst case as an upper bound).

I've found attempting to transcribe any audio file over ~20 minutes will result in this error (exact length varies due to the number of transcribed words returned).
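The growth described above can be checked with a little arithmetic. The following is only a sketch under stated assumptions: the function name and the per-result word counts are hypothetical, and it models nothing about the real response beyond "each result repeats every word recognized so far".

```python
def payload_word_counts(words_per_result):
    """Compare payload sizes (in words) with and without cumulative results.

    words_per_result: number of newly recognized words in each result chunk
    (hypothetical input, chosen only for illustration).
    Returns (plain_total, cumulative_total), where cumulative_total assumes
    every result carries all words recognized up to that point.
    """
    plain_total = sum(words_per_result)
    running = 0
    cumulative_total = 0
    for count in words_per_result:
        running += count             # words recognized so far
        cumulative_total += running  # this result repeats all of them
    return plain_total, cumulative_total


# Assumed example: 100 results of 30 new words each.
plain, cumulative = payload_word_counts([30] * 100)
print(plain, cumulative)
```

With one word per result, the cumulative total is N(N+1)/2, which matches the N^2 / 2 estimate for large N.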

tseaver · 2018-08-22T18:24:51Z

@edave Thanks for the research!

@theacodes Is there a Speech API PoC we can loop in?

tseaver · 2018-08-28T16:52:38Z

@theacodes, @crwilcox Ping for a PoC?

tseaver · 2018-10-18T15:13:03Z

/cc @beccasaurus

sduskis · 2019-06-05T19:50:58Z

@tseaver, is this a grpc issue (message max size) that we could potentially fix via synth.py?

tseaver · 2019-06-05T20:56:23Z

@sduskis I don't think so: I believe those limits are enforced client side, whereas this report shows a 429 response from the server.

sduskis · 2019-06-13T15:13:20Z

"Received message larger than max (x vs 4194304)" shows up a lot in Google searches, and the general solution is grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxMsgSize)) with something larger than 4MB.

@plamut, can you please configure synth.py with 256MB for Speech? After that, we can close this issue.

plamut · 2019-06-14T14:20:42Z

@sduskis I presume 256 MiB was meant?

I opened a PR. I believe the same setting in Python is called grpc.max_receive_message_length.
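For readers hitting the same error, here is a minimal sketch of what raising that limit looks like as a gRPC channel option. The helper name and constants are hypothetical, and this is not necessarily how the PR wires the option into the Speech client; it only builds the option list for the setting named above.

```python
MIB = 1024 * 1024

# 4 MiB is the limit seen in the reported error (4194304 bytes).
DEFAULT_MAX_RECEIVE_BYTES = 4 * MIB


def receive_size_options(max_receive_bytes):
    """Build a gRPC channel option list that raises the receive limit.

    Hypothetical helper: it only assembles the (key, value) pairs that
    gRPC channels accept through their `options` argument.
    """
    return [("grpc.max_receive_message_length", max_receive_bytes)]


# 256 MiB, the value requested in this thread.
options = receive_size_options(256 * MIB)
print(options)

# With grpcio installed, such options are passed when the channel is created,
# for example:
#   channel = grpc.secure_channel(target, credentials, options=options)
# where `target` and `credentials` are placeholders for your own values.
```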

JustinBeckwith added the "triage me" (I really want to be triaged.) label Aug 18, 2018

tseaver changed the title ~~Speech: speech_v1p1beta1 with diarization, 429 Received message larger than max (5999457 vs. 4194304)~~ Speech: v1p1beta1 with diarization, 429 Received message larger than max Aug 20, 2018

tseaver added the "type: question" (Request for information or clarification. Not an issue.) and "api: speech" (Issues related to the Speech-to-Text API.) labels and removed the "triage me" (I really want to be triaged.) label Aug 20, 2018

tseaver added the backend label Aug 20, 2018

sduskis added backend and removed backend labels Jun 12, 2019

sduskis assigned plamut Jun 13, 2019

sduskis removed the backend label Jun 13, 2019

plamut mentioned this issue Jun 14, 2019

Speech: Increase speech max received msg size to 256 MiB #8338

Merged

plamut closed this as completed in #8338 Jun 17, 2019

kpurdon mentioned this issue Jan 31, 2020

PubSub: google.api_core.exceptions.ResourceExhausted: 429 Received message larger than max (28100227 vs. 4194304) googleapis/python-pubsub#3

Closed

vinaysraghavan mentioned this issue Feb 11, 2020

Texttospeech, 429 Received message larger than max googleapis/python-texttospeech#5

Closed
