Speech: v1p1beta1 with diarization, 429 Received message larger than max #5819

Closed
jdupl123 opened this issue Aug 18, 2018 · 9 comments · Fixed by #8338
Labels: api: speech, type: question

Comments

@jdupl123

jdupl123 commented Aug 18, 2018

Api

speech_v1p1beta1

os

macOS 10.13.5

python version

Python 3.6.3 :: Anaconda, Inc.

api version

Name: google-cloud-speech
Version: 0.35.0
Summary: Google Cloud Speech API client library

stack trace

_Rendezvous: <_Rendezvous of RPC that terminated with:
        status = StatusCode.RESOURCE_EXHAUSTED
        details = "Received message larger than max (5999457 vs. 4194304)"
        debug_error_string = "{"created":"@1534549673.588268000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1095,"grpc_message":"Received message larger than max (5999457 vs. 4194304)","grpc_status":8}"
>

The above exception was the direct cause of the following exception:

ResourceExhausted                         Traceback (most recent call last)
<ipython-input-10-dc5a16b3f3a4> in <module>()
----> 1 res=operation.result()

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
    113                 the timeout is reached before the operation completes.
    114         """
--> 115         self._blocking_poll(timeout=timeout)
    116
    117         if self._exception is not None:

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py in _blocking_poll(self, timeout)
     92
     93         try:
---> 94             retry_(self._done_or_raise)()
     95         except exceptions.RetryError:
     96             raise concurrent.futures.TimeoutError(

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
    258                 sleep_generator,
    259                 self._deadline,
--> 260                 on_error=on_error,
    261             )
    262

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
    175     for sleep in sleep_generator:
    176         try:
--> 177             return target()
    178
    179         # pylint: disable=broad-except

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py in _done_or_raise(self)
     71     def _done_or_raise(self):
     72         """Check if the future is done and raise if it's not."""
---> 73         if not self.done():
     74             raise _OperationNotComplete()
     75

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py in done(self)
    138             bool: True if the operation is complete, False otherwise.
    139         """
--> 140         self._refresh_and_update()
    141         return self._operation.done
    142

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py in _refresh_and_update(self)
    129         # RPC as it will not change once done.
    130         if not self._operation.done:
--> 131             self._operation = self._refresh()
    132             self._set_result_from_operation()
    133

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operations_v1/operations_client.py in get_operation(self, name, retry, timeout)
    125         """
    126         request = operations_pb2.GetOperationRequest(name=name)
--> 127         return self._get_operation(request, retry=retry, timeout=timeout)
    128
    129     def list_operations(

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py in __call__(self, *args, **kwargs)
    137             kwargs['metadata'] = metadata
    138
--> 139         return wrapped_func(*args, **kwargs)
    140
    141

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
    258                 sleep_generator,
    259                 self._deadline,
--> 260                 on_error=on_error,
    261             )
    262

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
    175     for sleep in sleep_generator:
    176         try:
--> 177             return target()
    178
    179         # pylint: disable=broad-except

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/timeout.py in func_with_timeout(*args, **kwargs)
    204             """Wrapped function that adds timeout."""
    205             kwargs['timeout'] = next(timeouts)
--> 206             return func(*args, **kwargs)
    207
    208         return func_with_timeout

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     54             return callable_(*args, **kwargs)
     55         except grpc.RpcError as exc:
---> 56             six.raise_from(exceptions.from_grpc_error(exc), exc)
     57
     58     return error_remapped_callable

~/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/six.py in raise_from(value, from_value)

ResourceExhausted: 429 Received message larger than max (5999457 vs. 4194304)

Code example

def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech_v1p1beta1 as speech

    client = speech.SpeechClient()

    audio = speech.types.RecognitionAudio(uri=gcs_uri)
    config = speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
        enable_speaker_diarization=True,
        diarization_speaker_count=2,
        sample_rate_hertz=16000,
        language_code='en-AU')

    operation = client.long_running_recognize(config, audio)
    return operation


url = 'gs://some_gcs_path/somefile.wav'
operation = transcribe_gcs(url)

print('Waiting for operation to complete...')
response = operation.result()

JustinBeckwith added the "triage me" label Aug 18, 2018
tseaver changed the title from "Speech: speech_v1p1beta1 with diarization, 429 Received message larger than max (5999457 vs. 4194304)" to "Speech: v1p1beta1 with diarization, 429 Received message larger than max" Aug 20, 2018
tseaver added the "type: question" and "api: speech" labels and removed the "triage me" label Aug 20, 2018

@tseaver
Contributor

tseaver commented Aug 20, 2018

The documented content limits are expressed in approximate minutes of speech: ~180 minutes for asynchronous requests via a GCS URI.

In the reported error, the limit appears to be 4 megabytes (which is much smaller).
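
For reference, the two numbers in the error message can be put side by side. This is a minimal sketch that assumes the message format shown in the stack trace above; `parse_size_error` is a hypothetical helper, not part of any client library.

```python
import re

# Message text copied from the reported stack trace.
MESSAGE = "Received message larger than max (5999457 vs. 4194304)"


def parse_size_error(message):
    """Return (received_bytes, limit_bytes) from the error text, or None."""
    match = re.search(r"\((\d+) vs\. (\d+)\)", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


received, limit = parse_size_error(MESSAGE)
print(limit == 4 * 1024 * 1024)            # True: the limit is exactly 4 MiB
print(round(received / (1024 * 1024), 2))  # 5.72: size of the response in MiB
```

So the response that tripped the limit was roughly 5.7 MiB against a 4 MiB cap.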

@edave

edave commented Aug 22, 2018

I can confirm also encountering this error/limit with the same stack trace while trying to do a transcription.

  1. OS type and version: MacOS 10.13.6 (High Sierra)
  2. Python version and virtual environment information: Python 3.7.0
  3. google-cloud-python: Core: 0.28.1, Speech: 0.35.0

The indirect cause is how diarization interacts with the way a transcription is divided into several results, each representing a discrete chunk of the transcribed dialogue. With diarization enabled, the words array in each result's transcription alternative contains not just the words for that portion of the transcription but also all previous words recognized up to that point.

Put more succinctly: for a transcription of N words, the result payload with diarization enabled contains roughly N^2 / 2 words instead of just N (simplifying to the worst case as an upper bound).

I've found attempting to transcribe any audio file over ~20 minutes will result in this error (exact length varies due to the number of transcribed words returned).
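
The N^2 / 2 estimate above can be checked with a small simulation. This is a sketch under stated assumptions: it models only the behaviour described in this comment (each result repeats every earlier word), the chunk sizes are invented for illustration, and `cumulative_payload_words` is a hypothetical helper rather than anything in the Speech client.

```python
def cumulative_payload_words(words_per_result):
    """Total words in the payload when each result repeats all earlier words."""
    total = 0
    seen = 0
    for count in words_per_result:
        seen += count   # words recognized up to and including this result
        total += seen   # this result carries all of them
    return total


# Worst case: every result adds a single new word.
n = 3000
worst = cumulative_payload_words([1] * n)
print(worst)      # 4501500, which is n * (n + 1) / 2
print(worst / n)  # 1500.5 times as many words as the transcript itself

# Coarser chunks reduce the overhead, but growth stays quadratic.
print(cumulative_payload_words([100] * 30))  # 46500
```

With the same 3000 words split into 30 results, the payload still carries 46500 words, about 15 times the transcript length.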

@tseaver
Contributor

tseaver commented Aug 22, 2018

@edave Thanks for the research!

@theacodes Is there a Speech API PoC we can loop in?

@tseaver
Contributor

tseaver commented Aug 28, 2018

@theacodes, @crwilcox Ping for a PoC?

@tseaver
Contributor

tseaver commented Oct 18, 2018

/cc @beccasaurus

@sduskis
Contributor

sduskis commented Jun 5, 2019

@tseaver, is this a grpc issue (message max size) that we could potentially fix via synth.py?

@tseaver
Contributor

tseaver commented Jun 5, 2019

@sduskis I don't think so: I believe those limits are enforced client side, whereas this report shows a 429 response from the server.

sduskis added and then removed the "backend" label Jun 12, 2019
@sduskis
Contributor

sduskis commented Jun 13, 2019

"Received message larger than max (x vs 4194304)" shows up a lot in Google searches, and the general solution is grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxMsgSize)) with something larger than 4 MB.

@plamut, can you please configure synth.py with 256MB for Speech? After that, we can close this issue.

@plamut
Contributor

plamut commented Jun 14, 2019

@sduskis I presume 256 MiB was meant?

I opened a PR. The same setting in Python is called grpc.max_receive_message_length, I believe.
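
As a rough illustration of the setting named above, the sketch below builds a gRPC channel option list with a 256 MiB receive limit. It is not the content of the PR: `channel_options` and `make_channel` are hypothetical helpers, the target is whatever endpoint the caller supplies, and the channel carries plain TLS credentials only, so wiring it into an authenticated Speech client is left out.

```python
# 256 MiB, the value proposed in this thread.
MAX_RECEIVE_BYTES = 256 * 1024 * 1024


def channel_options(max_receive_bytes=MAX_RECEIVE_BYTES):
    """Build gRPC channel options that raise the receive-size limit."""
    return [("grpc.max_receive_message_length", max_receive_bytes)]


def make_channel(target):
    """Open a TLS channel to `target` with the larger limit (needs grpcio)."""
    import grpc  # imported lazily so the options helper works without grpcio

    credentials = grpc.ssl_channel_credentials()
    return grpc.secure_channel(target, credentials, options=channel_options())


print(channel_options())  # [('grpc.max_receive_message_length', 268435456)]
```

The option is passed when the channel is created, so it has to be set before the client issues the call that returns the oversized response.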

6 participants