From b035209048515aa4fd49e4b071c131088c0f1987 Mon Sep 17 00:00:00 2001 From: sgolebiewski-intel Date: Thu, 11 Apr 2024 15:17:08 +0200 Subject: [PATCH] Fixing broken links --- .../python/README.md | 14 +-- docs/binary_input.md | 6 +- docs/mediapipe.md | 2 +- docs/model_server_rest_api_tfs.md | 110 +++++++++--------- docs/ovms_quickstart.md | 19 +-- docs/parameters.md | 6 +- docs/stateful_models.md | 34 +++--- 7 files changed, 96 insertions(+), 95 deletions(-) diff --git a/demos/real_time_stream_analysis/python/README.md b/demos/real_time_stream_analysis/python/README.md index 4f675855df..de11ebe802 100644 --- a/demos/real_time_stream_analysis/python/README.md +++ b/demos/real_time_stream_analysis/python/README.md @@ -2,10 +2,10 @@ ## Overview This demo demonstrates how to write an application running AI analysis using OpenVINO Model Server. -In the video analysis we can deal with various form of the source content. Here, you will see how to +In the video analysis we can deal with various form of the source content. Here, you will see how to take the source of the video from a local USB camera, saved encoded video file and an encoded video stream. -The client application is expected to read the video source and send for the analysis every frame to the OpenVINO Model Server via gRPC connection. The analysis can be fully delegated to the model server endpoint with the +The client application is expected to read the video source and send for the analysis every frame to the OpenVINO Model Server via gRPC connection. The analysis can be fully delegated to the model server endpoint with the complete processing pipeline arranged via a [MediaPipe graph](../../../docs/mediapipe.md) or [DAG](../../../docs/dag_scheduler.md). The remote analysis can be also reduced just to inference execution but in such case the video frame preprocessing and the postprocessing of the results must be implemented on the client side. In this demo, reading the video content from a local USB camera and encoded video file is straightforward using OpenCV library. The use case with encoded network stream might require more explanation. @@ -29,12 +29,12 @@ In the demo will be used two gRPC communication patterns which might be advantag ## gRPC streaming with MediaPipe graphs -gRPC stream connection is allowed for served [MediaPipe graphs](). It allows sending asynchronous calls to the endpoint all linked in a single session context. Responses are sent back via a stream and processed in the callback function. -The helper class [StreamClient](../../common/stream_client/stream_client.py) provides a mechanism for flow control and tracking the sequence of the requests and responses. In the StreamClient initialization the streaming mode is set via the parameter `streaming_api=True`. +gRPC stream connection is allowed for served [MediaPipe graphs](../../../docs/mediapipe.md). It allows sending asynchronous calls to the endpoint all linked in a single session context. Responses are sent back via a stream and processed in the callback function. +The helper class [StreamClient](https://github.com/openvinotoolkit/model_server/blob/releases/2024/0/demos/common/stream_client/stream_client.py) provides a mechanism for flow control and tracking the sequence of the requests and responses. In the StreamClient initialization the streaming mode is set via the parameter `streaming_api=True`. 
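As a minimal sketch of the capture loop described above (not the demo's actual `client.py`), the snippet below reads frames from a video source with OpenCV and hands each one to a placeholder `send_frame` callback; how a frame is packed into a gRPC request and pushed through `StreamClient` is intentionally left out, since those details are specific to the demo code.

```python
import cv2

def send_frame(frame):
    # Placeholder: in the demo, the frame would be serialized into a gRPC request
    # and pushed into the stream handled by StreamClient (streaming_api=True).
    pass

# The source can be a camera index (0), a saved video file, or an RTSP URL.
cap = cv2.VideoCapture("rtsp://localhost:8080/channel1")
try:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break  # end of stream or read error
        send_frame(frame)
finally:
    cap.release()
```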
Using the streaming API has the following advantages: - good performance thanks to asynchronous calls and sharing the graph execution for multiple calls -- support for stateful pipelines like object tracking when the response is dependent on the sequence of requests +- support for stateful pipelines like object tracking when the response is dependent on the sequence of requests ### Preparing the model server for gRPC streaming with a Holistic graph @@ -66,7 +66,7 @@ For the use case with RTSP client, install also FFMPEG component on the host. Alternatively build a docker image with the client with the following command: ```bash docker build ../../common/stream_client/ -t rtsp_client -``` +``` Client parameters: ```bash @@ -136,7 +136,7 @@ ffmpeg -stream_loop -1 -i ./video.mp4 -f rtsp -rtsp_transport tcp rtsp://localho ffmpeg -f dshow -i video="HP HD Camera" -f rtsp -rtsp_transport tcp rtsp://localhost:8080/channel1 ``` -While the RTSP stream is active, run the client to read it and send the output stream +While the RTSP stream is active, run the client to read it and send the output stream ```bash python3 client.py --grpc_address localhost:9000 --input_stream 'rtsp://localhost:8080/channel1' --output_stream 'rtsp://localhost:8080/channel2' ``` diff --git a/docs/binary_input.md b/docs/binary_input.md index 82f657a100..c4aa732984 100644 --- a/docs/binary_input.md +++ b/docs/binary_input.md @@ -12,7 +12,7 @@ ovms_docs_binary_input_kfs ovms_docs_demo_tensorflow_conversion ``` -For images, to reduce data size and lower bandwidth usage you can send them in binary-encoded instead of array-like format. How you can do it, depends on the kind of servable. +For images, to reduce data size and lower bandwidth usage you can send them in binary-encoded instead of array-like format. How you can do it, depends on the kind of servable. **Single Models and DAG Pipelines**: @@ -23,8 +23,8 @@ automatically from JPEG/PNG to OpenVINO friendly format using built-in [OpenCV]( - [TensorFlow Serving API](./binary_input_tfs.md) - [KServe API](./binary_input_kfs.md) -It's worth noting that with KServe API, you can also send raw data with or without image encoding via REST API. This makes KServe REST API more performant choice comparing to json format in TFS API. The guide linked above explains how to work with both regular data in binary format as well as JPEG/PNG encoded images. +It's worth noting that with KServe API, you can also send raw data with or without image encoding via REST API. This makes KServe REST API more performant choice comparing to json format in TFS API. The guide linked above explains how to work with both regular data in binary format as well as JPEG/PNG encoded images. **MediaPipe Graphs**: -When serving MediaPipe Graph it is possible to configure it to accept binary encoded images. You can either create your own calculator that would implement image decoding and use it in the graph or use `PythonExecutorCalculator` and implement decoding in Python [execute function](./python_support/reference.md#ovmspythonmodel-class). \ No newline at end of file +When serving MediaPipe Graph it is possible to configure it to accept binary encoded images. You can either create your own calculator that would implement image decoding and use it in the graph or use `PythonExecutorCalculator` and implement decoding in Python [execute function](./python_support/reference.md#ovmspythonmodel-class). 
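As a rough illustration of the KServe binary input path, the sketch below sends a JPEG file as a one-element `BYTES` tensor over the KServe REST API using the `tritonclient` package. The model name `resnet` and the input/output names `0` and `1463` are assumptions made for this example only; use the names reported by your model's metadata endpoint.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Read the encoded image as-is; no resizing or array conversion on the client side.
with open("image.jpg", "rb") as f:
    image_bytes = f.read()

# One-element BYTES tensor carrying the JPEG payload (hypothetical input name "0").
infer_input = httpclient.InferInput("0", [1], "BYTES")
infer_input.set_data_from_numpy(np.array([image_bytes], dtype=np.object_), binary_data=True)

response = client.infer(model_name="resnet", inputs=[infer_input])
print(response.as_numpy("1463"))  # hypothetical output name
```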
diff --git a/docs/mediapipe.md b/docs/mediapipe.md index 8fada3f69e..2fd673e5ba 100644 --- a/docs/mediapipe.md +++ b/docs/mediapipe.md @@ -200,7 +200,7 @@ Currently the graph tracing on the model server side is not supported. If you wo ### Benchmarking While you implemented and deployed the graph you have several options to test the performance. -To validate the throughput for unary requests you can use the [benchmark client](../demos/benchmark/python#mediapipe-benchmarking). +To validate the throughput for unary requests you can use the [benchmark client](../demos/benchmark/python/README.md#mediapipe-benchmarking). For streaming gRPC connections, there is available [rtps_client](../demos/mediapipe/holistic_tracking#rtsp-client). It can generate the load to gRPC stream and the mediapipe graph based on the content from RTSP video stream, MPG4 file or from the local camera. diff --git a/docs/model_server_rest_api_tfs.md b/docs/model_server_rest_api_tfs.md index a2cf1383e4..f96f533f6b 100644 --- a/docs/model_server_rest_api_tfs.md +++ b/docs/model_server_rest_api_tfs.md @@ -55,10 +55,10 @@ $ curl http://localhost:8001/v1/models/person-detection/versions/1 { 'model_version_status':[ { - 'version': '1', - 'state': 'AVAILABLE', + 'version': '1', + 'state': 'AVAILABLE', 'status': { - 'error_code': 'OK', + 'error_code': 'OK', 'error_message': '' } } @@ -172,7 +172,7 @@ POST http://${REST_URL}:${REST_PORT}/v1/models/${MODEL_NAME}/versions/${MODEL_VE "instances": |<(nested)list>| "inputs": |<(nested)list>| } -``` +``` Read [How to specify input tensors in row format](https://www.tensorflow.org/tfx/serving/api_rest#specifying_input_tensors_in_row_format) and [How to specify input tensors in column format](https://www.tensorflow.org/tfx/serving/api_rest#specifying_input_tensors_in_column_format) for more details. @@ -220,7 +220,7 @@ Read more about [Predict API usage](https://github.com/openvinotoolkit/model_ser Sends requests via RESTful API to trigger config reloading and gets models and [DAGs](./dag_scheduler.md) statuses as a response. This endpoint can be used with disabled automatic config reload to ensure configuration changes are applied in a specific time and also to get confirmation about reload operation status. Typically this option is to be used when OVMS is started with a parameter `--file_system_poll_wait_seconds 0`. Reload operation does not pass new configuration to OVMS server. The configuration file changes need to be applied by the OVMS administrator. The REST API call just initiate applying the configuration file which is already present. -**URL** +**URL** ``` POST http://${REST_URL}:${REST_PORT}/v1/config/reload ``` @@ -243,37 +243,37 @@ curl --request POST http://${REST_URL}:${REST_PORT}/v1/config/reload **Response** -In case of config reload success, the response contains JSON with aggregation of getModelStatus responses for all models and DAGs after reload is finished, along with operation status: +In case of config reload success, the response contains JSON with aggregation of getModelStatus responses for all models and DAGs after reload is finished, along with operation status: ```JSON -{ -"": -{ - "model_version_status": [ - { +{ +"": +{ + "model_version_status": [ + { "version": |, - "state": |, + "state": |, "status": - { - "error_code": |, - "error_message": | - } - }, - ... - ] -}, -... -} -``` - -In case of any failure during execution: - + { + "error_code": |, + "error_message": | + } + }, + ... + ] +}, +... 
+} +``` + +In case of any failure during execution: + ```JSON -{ - "error": | -} +{ + "error": | +} ``` When an operation succeeds HTTP response status code is - - `201` when config(config file or model version) was reloaded + - `201` when config(config file or model version) was reloaded - `200` when reload was not required, already applied or OVMS was started in single model mode When an operation fails another status code is returned. @@ -313,54 +313,54 @@ Possible messages returned on error: } ``` -Even if one of models reload failed other may be working properly. To check state of loaded models use [Config Status API](#config-status-api). To detect exact cause of errors described above analyzing sever logs may be necessary. +Even if one of models reload failed other may be working properly. To check state of loaded models use [Config Status API](#config-status). To detect exact cause of errors described above analyzing sever logs may be necessary. ## Config Status API **Description** Sends requests via RESTful API to get a response that contains an aggregation of getModelStatus responses for all models and [DAGs](./dag_scheduler.md). -**URL** +**URL** ``` GET http://${REST_URL}:${REST_PORT}/v1/config ``` -**Request** +**Request** To trigger this API HTTP GET request should be sent on a given URL. Example `curl` command: ``` curl --request GET http://${REST_URL}:${REST_PORT}/v1/config ``` -**Response** -In case of success, the response contains JSON with aggregation of getModelStatus responses for all models and DAGs, along with operation status: +**Response** +In case of success, the response contains JSON with aggregation of getModelStatus responses for all models and DAGs, along with operation status: ```JSON -{ -"": -{ - "model_version_status": [ - { +{ +"": +{ + "model_version_status": [ + { "version": |, - "state": |, + "state": |, "status": - { - "error_code": |, - "error_message": | - } - }, - ... - ] -}, -... -} + { + "error_code": |, + "error_message": | + } + }, + ... + ] +}, +... +} ``` In case of any failure during execution: - + ```JSON -{ - "error": | -} +{ + "error": | +} ``` When operation succeeded HTTP response status code is 200, otherwise, another code is returned. Possible messages returned on error: diff --git a/docs/ovms_quickstart.md b/docs/ovms_quickstart.md index 40a476ec8e..4b05fa9870 100644 --- a/docs/ovms_quickstart.md +++ b/docs/ovms_quickstart.md @@ -1,6 +1,6 @@ # Quickstart Guide {#ovms_docs_quick_start_guide} -OpenVINO Model Server can perform inference using pre-trained models in either [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format/operation-sets.html) +OpenVINO Model Server can perform inference using pre-trained models in either [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format/operation-sets.html) , [ONNX](https://onnx.ai/), [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) or [TensorFlow](https://www.tensorflow.org/) format. You can get them by: - downloading models from [Open Model Zoo](https://storage.openvinotoolkit.org/repositories/open_model_zoo/) @@ -24,12 +24,12 @@ To quickly start using OpenVINO™ Model Server follow these steps: ### Step 1: Prepare Docker -[Install Docker Engine](https://docs.docker.com/engine/install/), including its [post-installation steps](https://docs.docker.com/engine/install/linux-postinstall/), on your development system. 
+[Install Docker Engine](https://docs.docker.com/engine/install/), including its [post-installation steps](https://docs.docker.com/engine/install/linux-postinstall/), on your development system. To verify installation, test it using the following command. If it displays a test image and a message, it is ready. ``` bash $ docker run hello-world -``` +``` ### Step 2: Download the Model Server @@ -49,7 +49,7 @@ wget https://storage.googleapis.com/tfhub-modules/tensorflow/faster_rcnn/resnet5 tar xzf 1.tar.gz -C model/1 ``` -OpenVINO Model Server expects a particular folder structure for models - in this case `model` directory has the following content: +OpenVINO Model Server expects a particular folder structure for models - in this case `model` directory has the following content: ```bash model └── 1 @@ -59,11 +59,11 @@ model └── variables.index ``` -Sub-folder `1` indicates the version of the model. If you want to upgrade the model, other versions can be added in separate subfolders (2,3...). +Sub-folder `1` indicates the version of the model. If you want to upgrade the model, other versions can be added in separate subfolders (2,3...). For more information about the directory structure and how to deploy multiple models at a time, check out the following sections: - [Preparing models](models_repository.md) - [Serving models](starting_server.md) -- [Serving multiple model versions](model_version_policy.md) +- [Serving multiple model versions](model_version_policy.md) ### Step 4: Start the Model Server Container @@ -107,12 +107,13 @@ python3 object_detection.py --image coco_bike.jpg --output output.jpg --service_ ### Step 8: Review the Results -In the current folder, you can find files containing inference results. +In the current folder, you can find files containing inference results. In our case, it will be a modified input image with bounding boxes indicating detected objects and their labels. ![Inference results](quickstart_result.jpeg) -> **Note**: Similar steps can be performed with other model formats. Check the [ONNX use case example](../demos/using_onnx_model/python/README.md), -[TensorFlow classification model demo](../demos/image_classification_using_tf_model/python/README.md ) or [PaddlePaddle model demo](../demos/segmentation_using_paddlepaddle_model/python/README.md). +> **Note**: Similar steps can be performed with other model formats. Check the [ONNX use case example](../demos/using_onnx_model/python/README.md), +[TensorFlow classification model demo](../demos/image_classification_using_tf_model/python/README.md) +or [PaddlePaddle model demo](../demos/segmentation_using_paddlepaddle_model/python/README.md). Congratulations, you have completed the QuickStart guide. Try other Model Server [demos](../demos/README.md) or explore more [features](features.md) to create your application. diff --git a/docs/parameters.md b/docs/parameters.md index 57af687a1f..585bcfaebe 100644 --- a/docs/parameters.md +++ b/docs/parameters.md @@ -9,7 +9,7 @@ | `"model_path"/"base_path"` | `string` | If using a Google Cloud Storage, Azure Storage or S3 path, see [cloud storage guide](./using_cloud_storage.md). The path may look as follows:
`"/opt/ml/models/model"`
`"gs://bucket/models/model"`
`"s3://bucket/models/model"`
`"azure://bucket/models/model"`
The path can also be relative to the config.json location
(use `model_path` in command line, `base_path` in json config) | | `"shape"` | `tuple/json/"auto"` | `shape` is optional and takes precedence over `batch_size`. The `shape` argument changes the model that is enabled in the model server to fit the parameters. `shape` accepts three forms of the values: * `auto` - The model server reloads the model with the shape that matches the input data matrix. * a tuple, such as `(1,3,224,224)` - The tuple defines the shape to use for all incoming requests for models with a single input. * A dictionary of shapes, such as `{"input1":"(1,3,224,224)","input2":"(1,3,50,50)", "input3":"auto"}` - This option defines the shape of every included input in the model.Some models don't support the reshape operation.If the model can't be reshaped, it remains in the original parameters and all requests with incompatible input format result in an error. See the logs for more information about specific errors.Learn more about supported model graph layers including all limitations at [Shape Inference Document](https://docs.openvino.ai/2024/openvino-workflow/running-inference/changing-input-shape.html). | | `"batch_size"` | `integer/"auto"` | Optional. By default, the batch size is derived from the model, defined through the OpenVINO Model Optimizer. `batch_size` is useful for sequential inference requests of the same batch size.Some models, such as object detection, don't work correctly with the `batch_size` parameter. With these models, the output's first dimension doesn't represent the batch size. You can set the batch size for these models by using network reshaping and setting the `shape` parameter appropriately.The default option of using the Model Optimizer to determine the batch size uses the size of the first dimension in the first input for the size. For example, if the input shape is `(1, 3, 225, 225)`, the batch size is set to `1`. If you set `batch_size` to a numerical value, the model batch size is changed when the service starts.`batch_size` also accepts a value of `auto`. If you use `auto`, then the served model batch size is set according to the incoming data at run time. The model is reloaded each time the input data changes the batch size. You might see a delayed response upon the first request. | -| `"layout" `| `json/string` | `layout` is optional argument which allows to define or change the layout of model input and output tensors. To change the layout (add the transposition step), specify `:`. Example: `NHWC:NCHW` means that user will send input data in `NHWC` layout while the model is in `NCHW` layout.

When specified without colon separator, it doesn't add a transposition but can determine the batch dimension. E.g. `--layout CN` makes prediction service treat second dimension as batch size.

When the model has multiple inputs or the output layout has to be changed, use a json format. Set the mapping, such as: `{"input1":"NHWC:NCHW","input2":"HWN:NHW","output1":"CN:NC"}`.

If not specified, layout is inherited from model.

[Read more](shape_batch_size_and_layout.md#changing-model-inputoutput-layout) | +| `"layout" `| `json/string` | `layout` is an optional argument which allows you to define or change the layout of model input and output tensors. To change the layout (add the transposition step), specify `<target layout>:<source layout>`. Example: `NHWC:NCHW` means that the user will send input data in `NHWC` layout while the model is in `NCHW` layout.

When specified without a colon separator, it doesn't add a transposition but can determine the batch dimension. E.g. `--layout CN` makes the prediction service treat the second dimension as the batch size.

When the model has multiple inputs or the output layout has to be changed, use a json format. Set the mapping, such as: `{"input1":"NHWC:NCHW","input2":"HWN:NHW","output1":"CN:NC"}`.

If not specified, the layout is inherited from the model.

[Read more](shape_batch_size_and_layout.md#changing-model-input-output-layout) | | `"model_version_policy"` | `json/string` | Optional. The model version policy lets you decide which versions of a model that the OpenVINO Model Server is to serve. By default, the server serves the latest version. One reason to use this argument is to control the server memory consumption.The accepted format is in json or string. Examples:
`{"latest": { "num_versions":2 }`
`{"specific": { "versions":[1, 3] } }`
`{"all": {} }` | | `"plugin_config"` | `json/string` | List of device plugin parameters. For full list refer to [OpenVINO documentation](https://docs.openvino.ai/2024/about-openvino/compatibility-and-support/supported-devices.html) and [performance tuning guide](./performance_tuning.md). Example:
`{"PERFORMANCE_HINT": "LATENCY"}` | | `"nireq"` | `integer` | The size of internal request queue. When set to 0 or no value is set value is calculated automatically based on available resources.| @@ -18,7 +18,7 @@ | `"idle_sequence_cleanup"` | `bool` | If set to true, model will be subject to periodic sequence cleaner scans. See [idle sequence cleanup](stateful_models.md). | | `"max_sequence_number"` | `uint32` | Determines how many sequences can be handled concurrently by a model instance. | | `"low_latency_transformation"` | `bool` | If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2024/openvino-workflow/running-inference/stateful-models/obtaining-stateful-openvino-model.html#lowlatency2-transformation) on model load. | -| `"metrics_enable"` | `bool` | Flag enabling [metrics](https://docs.openvino.ai/2024/ovms_docs_metrics.html) endpoint on rest_port. | +| `"metrics_enable"` | `bool` | Flag enabling [metrics](https://docs.openvino.ai/2024/ovms_docs_metrics.html) endpoint on rest_port. | | `"metrics_list"` | `string` | Comma separated list of [metrics](https://docs.openvino.ai/2024/ovms_docs_metrics.html). If unset, only default metrics will be enabled.| @@ -31,7 +31,7 @@ ## Server configuration options -Configuration options for the server are defined only via command-line options and determine configuration common for all served models. +Configuration options for the server are defined only via command-line options and determine configuration common for all served models. | Option | Value format | Description | |---|---|---| diff --git a/docs/stateful_models.md b/docs/stateful_models.md index 820321e4dc..a91d14a84d 100644 --- a/docs/stateful_models.md +++ b/docs/stateful_models.md @@ -12,7 +12,7 @@ A stateful model recognizes dependencies between consecutive inference requests. --- -**Note** that in the context of the Model Server, a model is considered stateful if it maintains state between **inference requests**. +**Note** that in the context of the Model Server, a model is considered stateful if it maintains state between **inference requests**. Some models might take the whole sequence of data as an input and iterate over the elements of that sequence internally, keeping the state between iterations. Such models are considered stateless since they perform inference on the whole sequence **in just one inference request**. @@ -60,8 +60,8 @@ docker run -d -u $(id -u):$(id -g) -v $(pwd)/rm_lstm4f:/models/stateful_model -v --port 9000 --config_path /models/config.json ``` - Optionally, you can also set additional parameters specific for stateful models. - + Optionally, you can also set additional parameters specific for stateful models. + ### Configuration Options for Stateful Models **Model configuration**: @@ -69,7 +69,7 @@ docker run -d -u $(id -u):$(id -g) -v $(pwd)/rm_lstm4f:/models/stateful_model -v | Option | Value format | Description | Default value | |---|---|---|---| | `stateful` | `bool` | If set to true, model is loaded as stateful. | false | -| `idle_sequence_cleanup` | `bool` | If set to true, model will be subject to periodic sequence cleaner scans.
See [idle sequence cleanup](#stateful_cleanup). | true | +| `idle_sequence_cleanup` | `bool` | If set to true, the model will be subject to periodic sequence cleaner scans.
See [idle sequence cleanup](#idle-sequence-cleanup). | true | | `max_sequence_number` | `uint32` | Determines how many sequences can be handled concurrently by a model instance. | 500 | | `low_latency_transformation` | `bool` | If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2024/openvino-workflow/running-inference/stateful-models.html) on model load. | false | @@ -79,7 +79,7 @@ docker run -d -u $(id -u):$(id -g) -v $(pwd)/rm_lstm4f:/models/stateful_model -v | Option | Value format | Description | Default value | |---|---|---|---| -| `sequence_cleaner_poll_wait_minutes` | `uint32` | Time interval (in minutes) between next sequence cleaner scans. Sequences of the models that are subjects to idle sequence cleanup that have been inactive since the last scan are removed. Zero value disables sequence cleaner.
See [idle sequence cleanup](#stateful_cleanup). | 5 | +| `sequence_cleaner_poll_wait_minutes` | `uint32` | Time interval (in minutes) between consecutive sequence cleaner scans. Sequences of models that are subject to idle sequence cleanup and have been inactive since the last scan are removed. A zero value disables the sequence cleaner.
See [idle sequence cleanup](#idle-sequence-cleanup). | 5 | See also [all server and model configuration options](parameters.md) to have a complete setup. @@ -91,14 +91,14 @@ Stateful model works on consecutive inference requests that are associated with Requests to stateful models must contain additional inputs besides the data for prediction: - `sequence_id` - which is a 64-bit unsigned integer identifying the sequence (unique in the scope of the model instance). Value 0 is equivalent to not providing this input at all. -- `sequence_control_input` - which is 32-bit unsigned integer indicating sequence start and end. Accepted values are: +- `sequence_control_input` - which is 32-bit unsigned integer indicating sequence start and end. Accepted values are: - 0 - no control input (has no effect - equivalent to not providing this input at all) - 1 - indicates the beginning of the sequence - 2 - indicates the end of the sequence **Note**: Model server also appends `sequence_id` to every response - the name and format of `sequence_id` output is the same as in `sequence_id` input. -**Both `sequence_id` and `sequence_control_input` shall be provided as tensors with 1 element array (shape:[1]) and appropriate precision.** +**Both `sequence_id` and `sequence_control_input` shall be provided as tensors with 1 element array (shape:[1]) and appropriate precision.** _See examples for gRPC and HTTP below_. In order to successfully infer the sequence, perform these actions: @@ -106,7 +106,7 @@ In order to successfully infer the sequence, perform these actions: To start the sequence you need to add `sequence_control_input` with the value of 1 to your request's inputs. You can also: - add `sequence_id` with the value of your choice or - - add `sequence_id` with 0 or do not add `sequence_id` at all - in this case, the Model Server will provide a unique id for the sequence and since it will be appended to the outputs, you will be able to read it and use with the next requests. + - add `sequence_id` with 0 or do not add `sequence_id` at all - in this case, the Model Server will provide a unique id for the sequence and since it will be appended to the outputs, you will be able to read it and use with the next requests. If the provided `sequence_id` is already occupied, OVMS will return an [error](#error-codes) to avoid conflicts. @@ -157,7 +157,7 @@ stub = prediction_service_pb2_grpc.PredictionServiceStub(channel) request = predict_pb2.PredictRequest() request.model_spec.name = "stateful_model" -""" +""" Add inputs with data to infer """ @@ -206,7 +206,7 @@ sequence_id = response.outputs['sequence_id'].uint64_val[0] Inference on stateful models via HTTP is very similar to inference on stateless models (_see [REST API](model_server_rest_api_tfs.md) for reference_). The difference is that requests to stateful models must contain additional inputs with information necessary for proper sequence handling. -`sequence_id` and `sequence_control_input` must be added to HTTP request by adding new `key:value` pair in `inputs` field of JSON body. +`sequence_id` and `sequence_control_input` must be added to HTTP request by adding new `key:value` pair in `inputs` field of JSON body. For both inputs, the value must be a single number in a 1-dimensional array. 
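For example, a minimal SEQUENCE START request body could be sketched as below; the data input name `input`, the REST port, and the example values are assumptions for illustration, while `sequence_id` and `sequence_control_input` follow the format described above:

```python
import json
import requests

# Hypothetical payload: "input" stands in for the model's real data input name.
body = {
    "signature_name": "serving_default",
    "inputs": {
        "input": [[0.0, 0.0, 0.0]],     # prediction data for this step
        "sequence_id": [10],            # 1-D array with a single value
        "sequence_control_input": [1],  # 1 = SEQUENCE START
    },
}

response = requests.post(
    "http://localhost:8000/v1/models/stateful_model:predict",
    data=json.dumps(body),
)
print(json.loads(response.text)["outputs"]["sequence_id"])
```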
@@ -225,7 +225,7 @@ sequence_id = 10 inputs = {} -""" +""" Add inputs with data to infer """ @@ -270,7 +270,7 @@ response_body = json.loads(response.text) sequence_id = response_body["outputs"]["sequence_id"] ``` -### Error Codes +### Error Codes When a request is invalid or could not be processed, you can expect following errors specific to inference on stateful models: @@ -279,25 +279,25 @@ When a request is invalid or could not be processed, you can expect following er | Sequence with a provided ID does not exist. | NOT_FOUND | 404 NOT FOUND | | Sequence with a provided ID already exists. | ALREADY_EXISTS | 409 CONFLICT | | Server received SEQUENCE START request with ID of the sequence that is set for termination, but the last request of that sequence is still being processed. | FAILED_PRECONDITION | 412 PRECONDITION FAILED | -| Max sequence number has been reached. Could not create a new sequence. | UNAVAILABLE | 503 SERVICE UNAVAILABLE | +| Max sequence number has been reached. Could not create a new sequence. | UNAVAILABLE | 503 SERVICE UNAVAILABLE | | Sequence ID has not been provided in request inputs. | INVALID_ARGUMENT | 400 BAD REQUEST | | Unexpected value of sequence control input. | INVALID_ARGUMENT | 400 BAD REQUEST | | Could not find sequence id in expected tensor proto field uint64_val. | INVALID_ARGUMENT | N/A | | Could not find sequence control input in expected tensor proto field uint32_val. | INVALID_ARGUMENT | N/A | | Special input proto does not contain tensor shape information. | INVALID_ARGUMENT | N/A | -## Idle Sequence Cleanup +## Idle Sequence Cleanup Once started sequence might get dropped for some reason like lost connection etc. In this case model server will not receive SEQUENCE_END signal and will not free sequence resources. To prevent keeping idle sequences indefinitely, the Model Server launches a sequence cleaner thread that periodically scans stateful models and checks if their sequences received any valid inference request recently. If not, such sequences are removed, their resources are freed and their ids can be reused. -Two parameters regulate sequence cleanup. -One is `sequence_cleaner_poll_wait_minutes` which holds the value of the time interval between the next scans. If there has been not a single valid request with a particular sequence id between two consecutive checks, the sequence is considered idle and gets deleted. +Two parameters regulate sequence cleanup. +One is `sequence_cleaner_poll_wait_minutes` which holds the value of the time interval between the next scans. If there has been not a single valid request with a particular sequence id between two consecutive checks, the sequence is considered idle and gets deleted. `sequence_cleaner_poll_wait_minutes` is a server parameter and is common for all models. By default, the time between two consecutive cleaner scans is set to 5 minutes. Setting this value to 0 disables sequence cleaner. Stateful models can either be subject to idle sequence cleanup or not. -You can set this **per model** with `idle_sequence_cleanup` parameter. +You can set this **per model** with `idle_sequence_cleanup` parameter. If set to `true` sequence cleaner will check that model. Otherwise, sequence cleaner will skip that model, and its inactive sequences will not get removed. By default, this value is set to `true`. ## Known Limitations