Creating a retry policy using the service config causes `assertion failed: prior > 0` #25003

Open

Emil-Hansen opened this issue Dec 15, 2020 · 4 comments

Emil-Hansen commented Dec 15, 2020

What version of gRPC and what language are you using?

grpcio 1.34.0, Python (also tried grpcio 1.33.2 and 1.32.0)

What operating system (Linux, Windows,...) and version?

Tested on Ubuntu 20.04 (Docker), Debian Buster (Docker), and macOS 11.1; the results were the same in all cases.

What runtime / compiler are you using (e.g., Python version or version of gcc)?

Python 3.8

What did you do?

Created a server that returns RESOURCE_EXHAUSTED to the client when heavily loaded, and had the client use a service config to retry.

Steps to reproduce:

  1. Installed grpcio and grpcio-tools with version 1.34.0
  2. Ran the server from the file attached
  3. Ran the client from the file attached

Server:

from concurrent import futures
import logging

import grpc

# Generate the proto/stub modules at runtime (requires grpcio-tools).
protos, services = grpc.protos_and_services("test.proto")


class Handler(services.HandlerServicer):

    def Call(self, request, context):
        # Simulate a heavily loaded server: always reject the call.
        context.abort(code=grpc.StatusCode.RESOURCE_EXHAUSTED,
                      details="Server is busy, come back soon")


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    services.add_HandlerServicer_to_server(Handler(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()


if __name__ == '__main__':
    logging.basicConfig()
    serve()

Client:

import json

import grpc

protos, services = grpc.protos_and_services("test.proto")

SERVICE_CONFIG = json.dumps(
    {
        "methodConfig": [
            {
                "name": [
                    {"service": "test.Handler"},
                ],
                "retryPolicy": {
                    "maxAttempts": 5,
                    "initialBackoff": "0.3s",
                    "maxBackoff": "10s",
                    "backoffMultiplier": 3,
                    "retryableStatusCodes": ["UNAVAILABLE", "RESOURCE_EXHAUSTED"],
                },
            }
        ]
    }
)
options = [
    ("grpc.service_config", SERVICE_CONFIG),
]

channel = grpc.insecure_channel('localhost:50051', options=options)
stub = services.HandlerStub(channel)


# Drain the response stream; the RESOURCE_EXHAUSTED error surfaces here.
for response in stub.Call(protos.Request()):
    pass

Proto file:

syntax = "proto3";

package test;


service Handler {
  rpc Call  (Request) returns (stream Response) {}
}

message Request {
}

message Response {}

requirements.txt

grpcio==1.34
grpcio-tools==1.34

What did you expect to see?

Expected the client to retry 4 times and then raise a RESOURCE_EXHAUSTED error.

What did you see instead?

An error in an underlying library caused the call to crash and bring down the client process the call was run from, without giving Python a chance to handle the error.

With GRPC_VERBOSITY=debug, the following stack trace was generated:

D1215 18:44:02.599314007      24 ev_posix.cc:173]            Using polling engine: epollex
D1215 18:44:02.599417707      24 lb_policy_registry.cc:42]   registering LB policy factory for "grpclb"
D1215 18:44:02.599426554      24 lb_policy_registry.cc:42]   registering LB policy factory for "priority_experimental"
D1215 18:44:02.599434638      24 lb_policy_registry.cc:42]   registering LB policy factory for "weighted_target_experimental"
D1215 18:44:02.599438372      24 lb_policy_registry.cc:42]   registering LB policy factory for "pick_first"
D1215 18:44:02.599441453      24 lb_policy_registry.cc:42]   registering LB policy factory for "round_robin"
D1215 18:44:02.599445377      24 dns_resolver_ares.cc:507]   Using ares dns resolver
D1215 18:44:02.599460957      24 lb_policy_registry.cc:42]   registering LB policy factory for "cds_experimental"
D1215 18:44:02.599468822      24 lb_policy_registry.cc:42]   registering LB policy factory for "eds_experimental"
D1215 18:44:02.599503917      24 lb_policy_registry.cc:42]   registering LB policy factory for "eds_drop_experimental"
D1215 18:44:02.599507699      24 lb_policy_registry.cc:42]   registering LB policy factory for "xds_cluster_manager_experimental"
I1215 18:44:02.610086424      24 socket_utils_common_posix.cc:427] Disabling AF_INET6 sockets because ::1 is not available.
I1215 18:44:02.611445714      28 subchannel.cc:1126]         New connected subchannel at 0x7f2364002000 for subchannel 0x55812330b950
E1215 18:44:02.612751183      28 sync.cc:105]                assertion failed: prior > 0

Anything else we should know about your project / environment?

@alexgenco

I am also seeing this error using the Ruby client. It seems like the only option is to avoid retry policies for streaming endpoints, because if you receive a retryable error like this before any messages have been received, your entire application process will be aborted.

Probably worth noting that if you receive an error AFTER the client has received any streaming messages, it correctly cancels the stream and throws the GRPC status error without retrying.
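
If retry policies are avoided for streaming endpoints, retries have to happen at the application level instead. A minimal sketch of what that could look like for the Python client above, assuming the stub and protos objects from the reproduction (the helper name, attempt count, and backoff values are illustrative); it only retries while nothing has been received yet, matching the behaviour described above:

import time

import grpc

def call_with_client_side_retries(stub, request, max_attempts=5, backoff=0.3):
    # Retry the server-streaming Call at the application level, but only as
    # long as no messages have been received yet; retrying after partial
    # results would duplicate messages.
    retryable = (grpc.StatusCode.UNAVAILABLE, grpc.StatusCode.RESOURCE_EXHAUSTED)
    for attempt in range(max_attempts):
        received_any = False
        try:
            for response in stub.Call(request):
                received_any = True
                yield response
            return
        except grpc.RpcError as rpc_error:
            out_of_attempts = attempt == max_attempts - 1
            if received_any or out_of_attempts or rpc_error.code() not in retryable:
                raise
            time.sleep(backoff * (3 ** attempt))

for response in call_with_client_side_retries(stub, protos.Request()):
    pass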


spacether commented Apr 26, 2021

When setting the channel options, I think that one also needs to add:

('grpc.enable_retries', True),
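
For context, a sketch of how that could fit into the client from the report (the service config is the same as above; whether "grpc.enable_retries" is needed may depend on the grpcio version):

import json

import grpc

# Same retry policy as in the original client, plus the channel argument
# mentioned above to explicitly enable retries.
SERVICE_CONFIG = json.dumps({
    "methodConfig": [{
        "name": [{"service": "test.Handler"}],
        "retryPolicy": {
            "maxAttempts": 5,
            "initialBackoff": "0.3s",
            "maxBackoff": "10s",
            "backoffMultiplier": 3,
            "retryableStatusCodes": ["UNAVAILABLE", "RESOURCE_EXHAUSTED"],
        },
    }]
})

options = [
    ("grpc.enable_retries", 1),
    ("grpc.service_config", SERVICE_CONFIG),
]
channel = grpc.insecure_channel("localhost:50051", options=options)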


nanonyme commented Jun 21, 2021

It's essentially this assert which is being hit, right? https://github.com/grpc/grpc/blob/master/src/core/lib/gpr/sync.cc#L105. What behaviour is that assert guarding against? Shouldn't that be handled through return values instead in release builds?

@nanonyme

@yashykt are you sure the Python label is correct? If the failure is also happening in other client bindings and is caused by an assert in core, isn't this more likely a core issue?

lidizheng removed their assignment Jun 3, 2022