diff --git a/docs/changelog/index.md b/docs/changelog/index.md
index 32520555a..c35d18b81 100644
--- a/docs/changelog/index.md
+++ b/docs/changelog/index.md
@@ -4,3 +4,15 @@ CLIP-as-service follows semantic versioning. However, before the project reach 1
 
 This chapter only tracks the most important breaking changes and explain the rationale behind them.
 
+## 0.2.0: improve the service scalability with replicas
+
+This change is mainly intended to improve the inference performance with replicas.
+
+Here is a short benchmark summary of the improvement (`replicas=4`):
+
+| batch_size  | before | after   |
+|-------------|--------|---------|
+| 1           | 23.74  | 18.89   |
+| 8           | 58.88  | 30.38   |
+| 16          | 14.96  | 91.86   |
+| 32          | 14.78  | 101.75  |
diff --git a/docs/user-guides/server.md b/docs/user-guides/server.md
index 2b8c9e8a6..49220f346 100644
--- a/docs/user-guides/server.md
+++ b/docs/user-guides/server.md
@@ -184,9 +184,9 @@ There are also runtime-specific parameters listed below:
 
 ````{tab} ONNX
 
-| Parameter | Description |
-|-----------|---------------------------------------------------------------------------------------------------|
-| `providers` | [ONNX runtime provides](https://onnxruntime.ai/docs/execution-providers/), default is auto-detect |
+| Parameter | Description                                                                                                                    |
+|-----------|--------------------------------------------------------------------------------------------------------------------------------|
+| `device`  | `cuda` or `cpu`. Default is `None`, which means auto-detect.                                                                   |
 
 ````
 