Custom handler request data #529

Closed
bigswede74 opened this issue Jul 13, 2020 · 10 comments

@bigswede74

Is there a mechanism to pass more than just the HTTP request body to a custom handler? Currently the data is a dictionary with a single element containing the request body. I need to access more request data; the query string would suffice for my purposes.

A potential solution would be to pass the full request to the custom handler.

harshbafna self-assigned this on Jul 14, 2020
@harshbafna (Contributor)

@bigswede74: Multiple request inputs are already supplied to the custom handler as key-value pairs in the data object.

Consider the following example, which simply dumps the raw binary data of the inputs sent in the request (a sketch of reading the individual keys follows the examples):

  • Dummy handler
import logging

def handle(data, ctx):
    # data is a list with one entry per request in the batch;
    # each entry is a dict keyed by the form-field / body-part names
    if data:
        logging.info(data)
        return ["read files"]
    else:
        # called without request data; nothing to process
        logging.info("initialized")
  • Sample inference request sending two files in the request
curl -X POST "http://localhost:8080/predictions/multiple_data" -F 'file1=@/Users/harsh_bafna/test_images/kitten.jpg' -F 'file2=@/Users/harsh_bafna/test_images/kitten.jpg'
  • Backend log generated by the handler
[{'file1': bytearray(b'<__snip__binary_data_dump__>'), 'file2': bytearray(b'<__snip__binary_data_dump__>')}]
  • Sample request to send data in JSON format
curl -H "Content-Type: application/json" -X POST "http://localhost:8080/predictions/multiple_data" -d '{"key1":"val1", "key2":"val2"}'
  • Backend log generated by the handler
[{'body': {'key1': 'val1', 'key2': 'val2'}}]
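  • Sketch of reading the individual keys in the handler (illustrative only; it assumes the same key names file1, file2 and body as the sample requests above, and that a JSON body may arrive either already parsed or as raw bytes depending on the TorchServe version)
import json
import logging

def handle(data, ctx):
    if not data:
        logging.info("initialized")
        return None

    results = []
    for row in data:  # one dict per request in the batch
        if "body" in row:
            # JSON request: the payload arrives under the "body" key
            body = row["body"]
            if isinstance(body, (bytes, bytearray)):
                body = json.loads(body)
            results.append({"json_keys": sorted(body.keys())})
        else:
            # multipart request: each form field (file1, file2, ...) is its own key
            results.append({name: len(blob) for name, blob in row.items()})
    return results  # one response element per request in the batch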

harshbafna added the triaged_wait label on Jul 14, 2020
@faustomilletari

Eisen Deploy might have what you need. You can pass and receive arbitrary data using TorchServe. Moreover, you can get some metadata from the model, giving you information about what the model expects in terms of input keys, types, and shapes.

Of course, it all uses TorchServe with a custom handler.

http://docs.eisen.ai/eisen/deploy.html

@buqing2009

> @bigswede74: The multiple request data is already supplied to the custom handler as a key-value pair in the data object. [handler example and sample requests quoted from the reply above]

I am using curl to POST two images and run the handle function in TorchServe. I found that the POST request accounts for a large part of the total time. The input image size is 640*400 with RGB channels. The model handler in the server takes about 600 ms, but uploading the POST form takes about 4 s. I am not sure whether this is a bug in TorchServe or not.

@buqing2009

Let me paste some details.
The whole server-side running time is 5.64 s:
[screenshot omitted]
But in handle I print each stage's time cost (unit: ms):
2020-07-15 18:54:40,170 [INFO ] W-9000-****_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - preprocess time :12.755870819091797
2020-07-15 18:54:40,304 [INFO ] W-9000-****_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - inference time :133.61120223999023
2020-07-15 18:54:40,749 [INFO ] W-9000-****_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - postprocess time :445.0702667236328

The handler stages add up to about 590 ms, so before reaching the handle function roughly 5 s are spent elsewhere.
How can I reduce the running time on the server?

@harshbafna (Contributor)

@buqing2009: Could you please share the output of your curl command prefixed with time?

e.g.

time curl -X POST "http://127.0.0.1:8080/predictions/modelname" -T image.jpg 

Also, share the access_log.log file from the logs directory.

@buqing2009

> @harshbafna: Could you please share the output of your curl command prefixed with time, along with the access_log.log file from the logs directory.

The output of time curl:

Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
curl -X POST http://10.244.12.30:8080/predictions/test -F  -F   0.01s user 0.00s system 0% cpu 5.979 total

The access_log:
2020-07-15 20:05:15,987 - /10.244.12.30:44492 "POST /models?model_name=test&url=test.mar&batch_size=2&max_batch_delay=5000&initial_workers=1&synchronous=true HTTP/1.1" 200 2490
2020-07-15 20:05:23,927 - /10.244.12.30:34062 "POST /predictions/test HTTP/1.1" 200 5958

@buqing2009

I removed batch_size=2&max_batch_delay=5000 and it works now.
The running time is 600 ms, thanks.

@harshbafna (Contributor)

/models?model_name=test&url=test.mar&batch_size=2&max_batch_delay=5000&initial_workers=1&synchronous=true

The delay is because of the max_batch_delay parameter, which was set to 5000 ms at the time of model registration. TorchServe waits for this configured amount of time, or until the batch is full, before it forwards the requests to the backend worker for inference.

In your case, it will wait for either 2 inference requests (batch_size) or 5 seconds (max_batch_delay), whichever comes first.
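For illustration, a minimal sketch of re-registering the model with a much smaller max_batch_delay through the management API, assuming the default management address http://127.0.0.1:8081 and reusing the registration parameters shown in the access log above:
import requests

MANAGEMENT_API = "http://127.0.0.1:8081"  # assumed default management address

# Remove the existing registration first.
requests.delete(f"{MANAGEMENT_API}/models/test")

# Re-register with a much smaller max_batch_delay so a single request is not
# held back waiting for the batch to fill.
params = {
    "model_name": "test",
    "url": "test.mar",
    "batch_size": 2,
    "max_batch_delay": 50,  # milliseconds
    "initial_workers": 1,
    "synchronous": "true",
}
response = requests.post(f"{MANAGEMENT_API}/models", params=params)
print(response.status_code, response.text)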

@harshbafna (Contributor)

> I removed batch_size=2&max_batch_delay=5000, it works now. The running time is 600 ms, thanks.

Cheers.

@xingener

@harshbafna Hi, Bafna. I have read the comments above and the issues that reference this one.
It seems that the batch size configured in TorchServe does not control the batch size seen by the model when I put multiple texts in one request. Here is an example:
After modifying the handler as suggested above, I can send 30 texts per request. The TorchServe batch size is 8, and assume the max delay is 10 s. Each worker seems to wait until there are 8 requests or the max delay passes, so the actual batch size seen by the model is 8 * 30.
I would like the model's batch size to stay constant rather than vary with the number of texts in a request. Could you please give some advice for this situation? Thanks.
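One possible way to keep the model's batch size fixed, sketched here as an assumption rather than an established TorchServe feature, is to flatten all texts from the TorchServe batch inside the handler and run the model in fixed-size chunks; run_model, MODEL_BATCH_SIZE, and the body key name are illustrative placeholders for a JSON body carrying a list of texts.
MODEL_BATCH_SIZE = 8  # illustrative: the batch size the model should actually see

def run_model(texts):
    # Placeholder for the real model call; returns one prediction per text.
    return ["prediction for: %s" % t for t in texts]

def handle(data, ctx):
    if not data:
        return None

    # Flatten: each TorchServe request may carry several texts in its body.
    texts, counts = [], []
    for row in data:
        body = row.get("body") if isinstance(row, dict) else row
        items = body if isinstance(body, list) else [body]
        counts.append(len(items))
        texts.extend(items)

    # Run the model in fixed-size chunks, independent of how many texts arrived.
    predictions = []
    for i in range(0, len(texts), MODEL_BATCH_SIZE):
        predictions.extend(run_model(texts[i:i + MODEL_BATCH_SIZE]))

    # Regroup: TorchServe expects exactly one response element per request.
    responses, offset = [], 0
    for n in counts:
        responses.append(predictions[offset:offset + n])
        offset += n
    return responses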
