Custom handler request data #529

Closed
bigswede74 opened this issue Jul 13, 2020 · 10 comments

@bigswede74

Is there a mechanism to pass more than just the HTTP request body to a custom handler? Currently the data is a dictionary with a single element containing the request body. I need to access more request data; the query string would suffice for my purposes.

A potential solution would be to pass the full request to the custom handler.

harshbafna self-assigned this on Jul 14, 2020
@harshbafna (Contributor)

@bigswede74: Multiple request inputs are already supplied to the custom handler as key-value pairs in the data object.

Consider the following example, which simply dumps the raw binary data of the inputs sent in the request (a sketch of reading the individual keys follows the examples):

  • Dummy handler
import logging

def handle(data, ctx):
    # data is a list with one entry per request in the batch;
    # each entry is a dict keyed by the form-field / body-part names
    if data:
        logging.info(data)
        return ["read files"]
    else:
        # called without request data; nothing to process
        logging.info("initialized")
  • Sample inference request sending two files in the request
curl -X POST "http://localhost:8080/predictions/multiple_data" -F 'file1=@/Users/harsh_bafna/test_images/kitten.jpg' -F 'file2=@/Users/harsh_bafna/test_images/kitten.jpg'
  • Backend log generated by the handler
[{'file1': bytearray(b'<__snip__binary_data_dump__>'), 'file2': bytearray(b'<__snip__binary_data_dump__>')}]
  • Sample request to send data in JSON format
curl -H "Content-Type: application/json" -X POST "http://localhost:8080/predictions/multiple_data" -d '{"key1":"val1", "key2":"val2"}'
  • Backend log generated by the handler
[{'body': {'key1': 'val1', 'key2': 'val2'}}]
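  • Sketch of reading the individual keys in the handler (illustrative only; it assumes the same key names file1, file2 and body as the sample requests above, and that a JSON body may arrive either already parsed or as raw bytes depending on the TorchServe version)
import json
import logging

def handle(data, ctx):
    if not data:
        logging.info("initialized")
        return None

    results = []
    for row in data:  # one dict per request in the batch
        if "body" in row:
            # JSON request: the payload arrives under the "body" key
            body = row["body"]
            if isinstance(body, (bytes, bytearray)):
                body = json.loads(body)
            results.append({"json_keys": sorted(body.keys())})
        else:
            # multipart request: each form field (file1, file2, ...) is its own key
            results.append({name: len(blob) for name, blob in row.items()})
    return results  # one response element per request in the batch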

harshbafna added the triaged_wait label on Jul 14, 2020
@faustomilletari

Eisen Deploy might have what you need. You can pass and receive arbitrary data using TorchServe. Moreover, you can get some metadata from the model, giving you information about what the model expects in terms of input keys, types, and shapes.

Of course, it all uses TorchServe with a custom handler.

http://docs.eisen.ai/eisen/deploy.html

@buqing2009

> @bigswede74: The multiple request data is already supplied to the custom handler as a key-value pair in the data object. [handler example and sample requests quoted from the reply above]

I am using curl to POST two images and run the handle function in TorchServe. I found that the POST request accounts for a large part of the total time. The input image size is 640*400 with RGB channels. The model handler in the server takes about 600 ms, but uploading the POST form takes about 4 s. I am not sure whether this is a bug in TorchServe or not.

@buqing2009

Let me paste some details.
The whole server-side running time is 5.64 s:
[screenshot omitted]
But in handle I print each stage's time cost (unit: ms):
2020-07-15 18:54:40,170 [INFO ] W-9000-****_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - preprocess time :12.755870819091797
2020-07-15 18:54:40,304 [INFO ] W-9000-****_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - inference time :133.61120223999023
2020-07-15 18:54:40,749 [INFO ] W-9000-****_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - postprocess time :445.0702667236328

The handler stages add up to about 590 ms, so before reaching the handle function roughly 5 s are spent elsewhere.
How can I reduce the running time on the server?

@harshbafna (Contributor)

@buqing2009: Could you please share the output of your curl command prefixed with time?

e.g.

time curl -X POST "http://127.0.0.1:8080/predictions/modelname" -T image.jpg 

Also, share the access_log.log file from the logs directory.

@buqing2009

> @harshbafna: Could you please share the output of your curl command prefixed with time, along with the access_log.log file from the logs directory.

The output of time curl:

Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
curl -X POST http://10.244.12.30:8080/predictions/test -F  -F   0.01s user 0.00s system 0% cpu 5.979 total

The access_log:
2020-07-15 20:05:15,987 - /10.244.12.30:44492 "POST /models?model_name=test&url=test.mar&batch_size=2&max_batch_delay=5000&initial_workers=1&synchronous=true HTTP/1.1" 200 2490
2020-07-15 20:05:23,927 - /10.244.12.30:34062 "POST /predictions/test HTTP/1.1" 200 5958

@buqing2009

I removed batch_size=2&max_batch_delay=5000 and it works now.
The running time is 600 ms, thanks.

@harshbafna (Contributor)

/models?model_name=test&url=test.mar&batch_size=2&max_batch_delay=5000&initial_workers=1&synchronous=true

The delay is because of the max_batch_delay parameter, which was set to 5000 ms at the time of model registration. TorchServe waits for this configured amount of time, or until the batch is full, before it forwards the requests to the backend worker for inference.

In your case, it will wait for either 2 inference requests (batch_size) or 5 seconds (max_batch_delay), whichever comes first.
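For illustration, a minimal sketch of re-registering the model with a much smaller max_batch_delay through the management API, assuming the default management address http://127.0.0.1:8081 and reusing the registration parameters shown in the access log above:
import requests

MANAGEMENT_API = "http://127.0.0.1:8081"  # assumed default management address

# Remove the existing registration first.
requests.delete(f"{MANAGEMENT_API}/models/test")

# Re-register with a much smaller max_batch_delay so a single request is not
# held back waiting for the batch to fill.
params = {
    "model_name": "test",
    "url": "test.mar",
    "batch_size": 2,
    "max_batch_delay": 50,  # milliseconds
    "initial_workers": 1,
    "synchronous": "true",
}
response = requests.post(f"{MANAGEMENT_API}/models", params=params)
print(response.status_code, response.text)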

@harshbafna (Contributor)

> I removed batch_size=2&max_batch_delay=5000, it works now. The running time is 600 ms, thanks.

Cheers.

@xingener

@harshbafna Hi, Bafna. I have read the comments above and the issues that reference this one.
It seems that the batch size configured in TorchServe does not control the batch size seen by the model when I put multiple texts in one request. Here is an example:
After modifying the handler as suggested above, I can send 30 texts per request. The TorchServe batch size is 8, and assume the max delay is 10 s. Each worker seems to wait until there are 8 requests or the max delay passes, so the actual batch size seen by the model is 8 * 30.
I would like the model's batch size to stay constant rather than vary with the number of texts in a request. Could you please give some advice for this situation? Thanks.
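One possible way to keep the model's batch size fixed, sketched here as an assumption rather than an established TorchServe feature, is to flatten all texts from the TorchServe batch inside the handler and run the model in fixed-size chunks; run_model, MODEL_BATCH_SIZE, and the body key name are illustrative placeholders for a JSON body carrying a list of texts.
MODEL_BATCH_SIZE = 8  # illustrative: the batch size the model should actually see

def run_model(texts):
    # Placeholder for the real model call; returns one prediction per text.
    return ["prediction for: %s" % t for t in texts]

def handle(data, ctx):
    if not data:
        return None

    # Flatten: each TorchServe request may carry several texts in its body.
    texts, counts = [], []
    for row in data:
        body = row.get("body") if isinstance(row, dict) else row
        items = body if isinstance(body, list) else [body]
        counts.append(len(items))
        texts.extend(items)

    # Run the model in fixed-size chunks, independent of how many texts arrived.
    predictions = []
    for i in range(0, len(texts), MODEL_BATCH_SIZE):
        predictions.extend(run_model(texts[i:i + MODEL_BATCH_SIZE]))

    # Regroup: TorchServe expects exactly one response element per request.
    responses, offset = [], 0
    for n in counts:
        responses.append(predictions[offset:offset + n])
        offset += n
    return responses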
