
gRPC RESOURCE_EXHAUSTED w. "Received message larger than max" with bumped max_{response, request}_size config. #1219

Closed
hampusrosvall opened this issue Aug 27, 2021 · 9 comments
Labels
bug Something isn't working

@hampusrosvall

hampusrosvall commented Aug 27, 2021

Context

  • torchserve version: torchserve:0.4.1-gpu
  • torch-model-archiver version: 0.4.2
  • torch version: 1.9.0
  • torchvision version [if any]: N/A
  • torchtext version [if any]: N/A
  • torchaudio version [if any]: N/A
  • java version: see docker image
  • Operating System and version: see docker image

Your Environment

  • Installed using source? [yes/no]: no
  • Are you planning to deploy it using docker container? [yes/no]: yes
  • Is it a CPU or GPU environment?: GPU
  • Using a default/custom handler? [If possible upload/share custom handler/model]: subclassing VisionHandler (more details below)
  • What kind of model is it e.g. vision, text, audio?: vision
  • Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.?
    [If public url then provide link.]: local
  • Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs: enable_envvars_config=true in config.properties
  • Link to your project [if any]: N/A

Expected Behavior

By setting the max_response_size config parameter, I should be able to send messages over gRPC that are smaller than or equal to that value.

Current Behavior

I am querying torchserve using gRPC and on the client side I am getting the following error:

<_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (5649472 vs. 4194304)"
debug_error_string = "{"created":"@1630054530.266403986","description":"Received message larger than max (5649472 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":206,"grpc_status":8}"

I am using the torchserve:0.4.1-gpu docker image.
Shouldn't I be able to bump the max_response_size config parameter and hence send messages smaller than or equal to that size? Any idea how I can solve this?

Some additional information on my settings:
I am using the following environment variables for the server

environment:
      - TS_CONFIG_FILE=/config/config.properties
      - TS_MAX_RESPONSE_SIZE=10000000
      - TS_MAX_REQUEST_SIZE=10000000

and the config.properties has the enable_envvars_config flag set to true

enable_envvars_config=true

According to the official documentation the default size should be 6553500; however, the error message above suggests that the effective limit is 4194304.
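
For reference, the same limits can also be set directly in config.properties instead of through env vars; a minimal sketch (the values mirror the TS_MAX_* env vars above, and this thread does not confirm how the two mechanisms interact when both are set):

max_response_size=10000000
max_request_size=10000000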

Here are the logs upon startup of the torchserve service

torchserve_1  | Torchserve version: 0.4.1
torchserve_1  | TS Home: /usr/local/lib/python3.6/dist-packages
torchserve_1  | Current directory: /home/model-server
torchserve_1  | Temp directory: /home/model-server/tmp
torchserve_1  | Number of GPUs: 1
torchserve_1  | Number of CPUs: 16
torchserve_1  | Max heap size: 7990 M
torchserve_1  | Python executable: /usr/bin/python3
torchserve_1  | Config file: /config/config.properties
torchserve_1  | Inference address: http://127.0.0.1:8080
torchserve_1  | Management address: http://127.0.0.1:8081
torchserve_1  | Metrics address: http://127.0.0.1:8082
torchserve_1  | Model Store: /model-store
torchserve_1  | Initial Models: model.mar
torchserve_1  | Log dir: /home/model-server/logs
torchserve_1  | Metrics dir: /home/model-server/logs
torchserve_1  | Netty threads: 0
torchserve_1  | Netty client threads: 0
torchserve_1  | Default workers per model: 1
torchserve_1  | Blacklist Regex: N/A
torchserve_1  | Maximum Response Size: 10000000
torchserve_1  | Maximum Request Size: 10000000
torchserve_1  | Prefer direct buffer: false
torchserve_1  | Allowed Urls: [file://.*|http(s)?://.*]
torchserve_1  | Custom python dependency for model allowed: false
torchserve_1  | Metrics report format: prometheus
torchserve_1  | Enable metrics API: true
torchserve_1  | Workflow Store: /model-store
torchserve_1  | Model config: N/A
@jagadeeshi2i
Collaborator

@hampusrosvall The configuration looks fine, and from the torchserve logs it seems the request size and response size are set to 10000000. Can you share the steps so I can try to reproduce the issue?

@marvelous-melanie

marvelous-melanie commented Oct 5, 2021

Hi, I am experiencing the same issue -- my grpc client-side channel is set up like so:

def get_inference_stub():
    channel = grpc.insecure_channel(f'{INFERENCE_HOST}:{INFERENCE_PORT}',
                                    options=[('grpc.max_send_message_length', 41943040),
                                            ('grpc.max_recieve_message_length', 41943040)])
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
    return stub

and my config.properties contains the following lines:

max_response_size=65535000
max_request_size=65535000

and yet when I try to run inference I get the response:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "Received message larger than max (23830833 vs. 4194304)"
	debug_error_string = "{"created":"@1633462595.988650123","description":"Received message larger than max (23830833 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":206,"grpc_status":8}"

It would appear that the size gets set on the netty server but not the actual grpc server.
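
One client-side detail that may matter here (not confirmed in this thread): the option key in the snippet above is spelled 'grpc.max_recieve_message_length', and gRPC does not reject unrecognized channel options, so the client's receive limit would stay at the default 4194304 bytes (4 MiB) shown in the error. A corrected sketch of the same channel setup, reusing the INFERENCE_HOST/INFERENCE_PORT names from that snippet:

def get_inference_stub():
    # note the spelling of "receive"; both limits raised to 40 MiB
    channel = grpc.insecure_channel(
        f'{INFERENCE_HOST}:{INFERENCE_PORT}',
        options=[('grpc.max_send_message_length', 41943040),
                 ('grpc.max_receive_message_length', 41943040)])
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
    return stub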

@msaroufim msaroufim added the bug Something isn't working label Oct 5, 2021
@jagadeeshi2i
Collaborator

If you are using the TS_CONFIG_FILE env var, there is an open issue about reading this env var: #1257

@marvelous-melanie

@jagadeeshi2i I don't have that env var set -- I set ts config from the command line and nowhere else:

torchserve --start --ts-config config.properties --model-store ./models/ --models model.mar --foreground

@jagadeeshi2i
Collaborator

Check the torchserve startup logs to see whether max_request_size is read from config.properties.

@marvelous-melanie

marvelous-melanie commented Oct 6, 2021

@jagadeeshi2i Yes; the max_request_size and max_response_size are indeed set --

Maximum Response Size: 65535000
Maximum Request Size: 65535000

I tested this: the settings are recognized and honored for HTTP payloads.

The issue is that these changes do not seem to be propagated down to the gRPC server.

@jagadeeshi2i
Collaborator

jagadeeshi2i commented Oct 6, 2021

    private Server startGRPCServer(ConnectorType connectorType) throws IOException {

        ServerBuilder<?> s =
                NettyServerBuilder.forPort(configManager.getGRPCPort(connectorType))
                        .maxInboundMessageSize(configManager.getMaxRequestSize())
                        .addService(
                                ServerInterceptors.intercept(
                                        GRPCServiceFactory.getgRPCService(connectorType),
                                        new GRPCInterceptor()));

The request size is set on the server from config.properties (maxInboundMessageSize above); the limit in the error is enforced by the client, so try changing the client code.

@lxning
Collaborator

lxning commented Oct 18, 2021

The root cause of the error "Received message larger than max" is on the client side. Please check Stack Overflow.

@lxning lxning closed this as completed Oct 18, 2021
@anishchhaparwal

anishchhaparwal commented Oct 18, 2022

Making the following changes in torchserve_grpc_client.py solved the issue:

import grpc
import inference_pb2_grpc  # stubs generated from TorchServe's inference.proto

def get_inference_stub():
    # raise the client-side receive limit (gRPC's default is 4 MiB) to 200 MiB
    options = [('grpc.max_receive_message_length', 200 * 1024 * 1024)]
    channel = grpc.insecure_channel("localhost:7070", options=options)
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
    return stub
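
If large request payloads are also sent from the client, 'grpc.max_send_message_length' can be raised in the same options list, as in the earlier snippet.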
