[ML] [PyTorch] Communication between ES and the native process #1700

@davidkyle

Description

Request ID

The design has to accommodate multiple concurrent inference requests to the server, so a mechanism is required to tie a specific request to a model output. This could be inferred from the processing order, which is strictly FIFO, but adding an ID token to each request provides additional context for development and debugging. The token has no semantics and is passed through the C++ untouched. The Anomaly Detector flushID is the prior art here.
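A minimal sketch of this correlation scheme (names are hypothetical, not from the issue): because processing is strictly FIFO, a queue of in-flight request IDs is enough to match each response to its request, and the pass-through ID serves as a consistency check for debugging.

```python
import json
from collections import deque

# Queue of request IDs awaiting a response, in send order.
pending = deque()

def send(request_id, payload):
    """Enqueue the ID and return the wire-format JSON for the request."""
    pending.append(request_id)
    return json.dumps({"request_id": request_id, **payload})

def receive(raw_response):
    """Parse a response and verify it matches the oldest pending request.

    The ID has no semantics; a mismatch indicates a protocol bug.
    """
    response = json.loads(raw_response)
    expected = pending.popleft()
    assert response["request_id"] == expected, "out-of-order response"
    return response
```

The assertion would never fire in correct operation; it exists purely to surface ordering bugs during development.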

Payload

The inference payload is a series of numeric tokens. An individual inference request consists of the request ID, the payload tokens and a marker to delineate each request.

Anomaly Detection uses a concise, length-encoded binary protocol because of the high volume of data sent across its pipes. Compared with Anomaly Detection the input here is small, so a more verbose input format can be used, which has the advantage of being self-describing.

Input Format

A JSON document:

{
  "request_id" : "string",
  "token_ids" : [int, int, ...],
  "attention_mask" : [int, int, ...],
  "token_type_ids" : [int, int, ...],
  "position_ids" : [int, int, ...]
}

token_ids and attention_mask are required for all uses; token_type_ids and position_ids are optional depending on the model type.
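A sketch of building such a request document (the helper name is hypothetical): the two required fields are always present, and the optional fields are emitted only when the model type needs them.

```python
import json

def build_inference_request(request_id, token_ids, attention_mask,
                            token_type_ids=None, position_ids=None):
    """Serialize one inference request to the JSON wire format.

    token_ids and attention_mask are required for all model types;
    token_type_ids and position_ids are included only when supplied.
    """
    doc = {
        "request_id": request_id,
        "token_ids": token_ids,
        "attention_mask": attention_mask,
    }
    if token_type_ids is not None:
        doc["token_type_ids"] = token_type_ids
    if position_ids is not None:
        doc["position_ids"] = position_ids
    return json.dumps(doc)
```

Omitting the optional keys entirely, rather than sending empty arrays, keeps the document self-describing: the reader can tell from the keys alone which inputs the model was given.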

Output Format

A JSON document, chosen for flexibility, containing the request ID token, the result tensor and, optionally, the predicted tokens depending on the model type:

{
  "request_id" : "string",
  "predictions" : [float, float, ...],
  "tokens" : [int, int, ...]
}
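Reading this back on the ES side could look like the following sketch (function name is an assumption): predictions are always present, while tokens may be absent for model types that do not predict them.

```python
import json

def parse_inference_response(raw):
    """Split a response document into its three parts.

    Returns (request_id, predictions, tokens); tokens is None when the
    model type does not emit predicted tokens.
    """
    doc = json.loads(raw)
    return doc["request_id"], doc["predictions"], doc.get("tokens")
```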
