Description
Request ID
The design has to accommodate multiple concurrent inference requests to the server, so a mechanism to tie a specific request to a model output is required. This could be inferred from the processing order, which is strictly FIFO, but adding an ID token to each request provides additional context for development and debugging. The token has no semantics and is passed through the C++ untouched. The Anomaly Detector flushID is the prior art here.
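As an illustration of the pass-through ID, here is a minimal client-side sketch in Python. The `submit`/`on_result` helpers and the UUID-based IDs are hypothetical, not part of the code base; the point is only that because the server echoes `request_id` untouched, the client can correlate results with outstanding requests without relying on FIFO ordering.

```python
import json
import uuid

# Outstanding requests keyed by their opaque request ID.
pending = {}

def submit(token_ids):
    """Build a request document and remember it until its result arrives."""
    request_id = str(uuid.uuid4())  # any unique string works; the server never inspects it
    pending[request_id] = token_ids
    return json.dumps({"request_id": request_id, "token_ids": token_ids})

def on_result(raw):
    """Tie a result back to the request that produced it via the echoed ID."""
    result = json.loads(raw)
    original_tokens = pending.pop(result["request_id"])
    return original_tokens, result["predictions"]
```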
Payload
The inference payload is a series of numeric tokens. An individual inference request consists of the request ID, the payload tokens and a marker to delineate each request.
Anomaly Detection uses a concise length-encoded binary protocol because of the high volume of data sent across pipes. The input here is small by comparison, so a more verbose input format can be used, which has the advantage of being self-describing.
Input Format
A JSON document:

```json
{
  "request_id": "string",
  "token_ids": [int, int, ...],
  "attention_mask": [int, int, ...],
  "token_type_ids": [int, int, ...],
  "position_ids": [int, int, ...]
}
```
`token_ids` and `attention_mask` are required for all uses; `token_type_ids` and `position_ids` are optional depending on the model type.
Output Format
A JSON document, chosen for flexibility, containing the request ID token, the result tensor and, depending on the model type, optionally the predicted tokens:
```json
{
  "request_id": "string",
  "predictions": [float, float, ...],
  "tokens": [int, int, ...]
}
```