Update error message when attempting inference on model with 0 workers #29
Users can provide initial_workers while registering models through the management API, as documented in the management API doc:
The asynchronous call will return before trying to create workers, with HTTP code 202:
curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=false&url=https://<s3_path>/squeezenet_v1.1.mar"
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: 29cde8a4-898e-48df-afef-f1a827a3cbc2
< content-length: 33
< connection: keep-alive
<
{
"status": "Worker updated"
}

The synchronous call will return after all workers have been adjusted, with HTTP code 200:
curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=https://<s3_path>/squeezenet_v1.1.mar"
< HTTP/1.1 200 OK
< content-type: application/json
< x-request-id: c4b2804e-42b1-4d6f-9e8f-1e8901fc2c6c
< content-length: 32
< connection: keep-alive
<
{
"status": "Worker scaled"
}

We are planning to update the error message accordingly. Please let us know your thoughts.
Changing the error message sounds like a reasonable approach. Should we set the default to 1 worker so clients can run inference right away?
Leaving the default workers at 0 is the safest option, and the user still has the option of specifying initial_workers. The two measures I'd recommend:
With this added communication to mitigate user surprise, I think we'd be in good shape.
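As an aside for anyone who already registered with the default of 0 workers: re-registering should not be necessary. A minimal sketch using the scale-workers call from the management API doc, assuming the management API listens on localhost:8081 as in the examples above and that the archive registered there is exposed under the model name squeezenet_v1.1 (that name is an assumption; GET /models lists the actual names):

# Bring the already-registered model up to one worker and block until it is ready
# (model name squeezenet_v1.1 is assumed, not confirmed in this thread).
curl -v -X PUT "http://localhost:8081/models/squeezenet_v1.1?min_worker=1&synchronous=true"

Because synchronous=true returns only after workers have been adjusted, a prediction sent immediately after this call should no longer hit the 503.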
I ran into a similar issue when following an example here:
I plugged in my own model. Is that example outdated?
On adding a model via the management API, the default min/max workers for the model are set to 0. As a result, running a prediction against the model right after registering it returns a 503 error with the detail 'No worker is available to serve request: densenet161'. This is confusing for users who register a model and then call the inference API.
Register model using:
curl -X POST "http://:/models?url=https://<s3_path>/densenet161.mar"
{
"status": "Model densenet161 registered"
}
Inference using:
curl -X POST http://<host>:<port>/predictions/densenet161 -T cutekit.jpeg
{
"code": 503,
"type": "ServiceUnavailableException",
"message": "No worker is available to serve request: densenet161"
}
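This follows directly from registering without initial_workers. A minimal sketch of the remedy, using the register-with-workers variant from the comment above and the same hypothetical <host>:<port> and <s3_path> placeholders:

# Register and start one worker in a single synchronous call
# (assumes densenet161 is not yet registered; scale workers instead if it is).
curl -X POST "http://<host>:<port>/models?initial_workers=1&synchronous=true&url=https://<s3_path>/densenet161.mar"
# Retry the prediction once the call returns "Worker scaled".
curl -X POST http://<host>:<port>/predictions/densenet161 -T cutekit.jpeg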
Model details:
curl -X GET http://<host>:<port>/models/densenet161
[
{
"modelName": "densenet161",
"modelVersion": "1.0",
"modelUrl": "https:///densenet161.mar",
"runtime": "python",
"minWorkers": 0,
"maxWorkers": 0,
"batchSize": 1,
"maxBatchDelay": 100,
"loadedAtStartup": false,
"workers": []
}
]
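Until the message is improved, the describe output above already exposes the problem state: minWorkers is 0 and workers is empty. A minimal client-side pre-flight check, assuming jq is installed and using the same hypothetical <host>:<port> placeholder:

# Count attached workers and refuse to predict while there are none.
workers=$(curl -s http://<host>:<port>/models/densenet161 | jq '.[0].workers | length')
if [ "$workers" -eq 0 ]; then
  echo "densenet161 has no workers; scale up via the management API first" >&2
  exit 1
fi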