Skip to content

OpenVINO Model Server 2026.2.1

Latest

Choose a tag to compare

@dtrawins dtrawins released this 19 Jun 14:31
· 3 commits to releases/2026/2 since this release
1122f03

This is a minor release focused on bug fixes and improvements related to memory management.

The following changes are included:

  • updated OpenVINO Runtime to version 2026.2.1
  • updated GPU driver in the docker image with ubuntu24 base image
  • updated NPU driver in the docker image with ubuntu24 base image
  • improvements in input validation for KServer and TFS API
  • added new configuration parameters for LLM and VLM models --cache_interval_multiplier

New parameter cache_interval_multiplier is relevant only for model models with linear attention like Qwen3.6-35B-A3B. It defines how prefix caching algorithm is managing KV cache allocations. The default value 8 is optimized for short context. While using long prompts like over 20k tokens, it is recommended to increase the value to reduce memory consumption. Here is and example for deploying Qwen3.6-35B-A3B model

ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3.6-35B-A3B-int4-ov ^
--task text_generation --target_device GPU --tool_parser qwen3coder --reasoning_parser qwen3 ^
--rest_port 8000 --cache_interval_multiplier 64

You can use OpenVINO Model Server public Docker images based on Ubuntu via the following command:

  • docker pull openvino/model_server:2026.2.1 - CPU device support with image based on Ubuntu 24.04
  • docker pull openvino/model_server:2026.2.1-gpu - GPU, NPU and CPU device support with image based on Ubuntu 24.04

or use provided binary packages. Only packages with suffix _python_on have support for python.

There is also additional distribution channel via https://storage.openvinotoolkit.org/repositories/openvino_model_server/packages/2026.2.1/

Check the instructions how to install the binary package. The prebuilt image is available also on RedHat Ecosystem Catalog