2 changes: 1 addition & 1 deletion docs/docs/features/load-unload.md
@@ -70,8 +70,8 @@ In case you got error while loading models. Please check for the correct model p
 | `ctx_len` | Integer | The context length for the model operations. |
 | `embedding` | Boolean | Whether to use embedding in the model. |
 | `n_parallel` | Integer | The number of parallel operations.|
-|`cpu_threads`|Integer|The number of threads for CPU inference.|
 | `cont_batching` | Boolean | Whether to use continuous batching. |
+|`cpu_threads`|Integer|The number of threads for CPU inference.|
 | `user_prompt` | String | The prompt to use for the user. |
 | `ai_prompt` | String | The prompt to use for the AI assistant. |
 | `system_prompt` | String | The prompt for system rules. |
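For context, a minimal sketch of a load-model call using the parameters documented in the table above. The endpoint path, port, model path, and all values are assumptions for illustration only; they are not part of this PR.

```python
# Sketch of a load-model request built from the documented parameters.
# Endpoint path, port, and model path are assumptions, not confirmed here.
import requests

payload = {
    "llama_model_path": "/models/model.gguf",  # hypothetical model path
    "ctx_len": 2048,        # context length for model operations
    "embedding": False,     # whether to use embedding in the model
    "n_parallel": 2,        # number of parallel operations
    "cont_batching": True,  # whether to use continuous batching
    "cpu_threads": 4,       # number of threads for CPU inference
    "user_prompt": "USER: ",
    "ai_prompt": "ASSISTANT: ",
    "system_prompt": "SYSTEM: ",
}

resp = requests.post(
    "http://localhost:3928/inferences/llamacpp/loadmodel",  # assumed endpoint
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```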
8 changes: 0 additions & 8 deletions docs/openapi/NitroAPI.yaml
@@ -441,10 +441,6 @@ components:
           default: true
           nullable: true
           description: Determines if output generation is in a streaming manner.
-        cache_prompt:
-          type: boolean
-          default: true
-          description: Optimize performance in repeated or similar requests.
         temp:
           type: number
           default: 0.7
@@ -585,10 +581,6 @@ components:
           min: 0
           max: 1
           description: Set probability threshold for more relevant outputs
-        cache_prompt:
-          type: boolean
-          default: true
-          description: Optimize performance in repeated or similar requests.
     ChatCompletionResponse:
       type: object
       description: Description of the response structure
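With `cache_prompt` removed from both request schemas, a chat-completion request simply stops sending that key. A minimal sketch follows; the endpoint path, the `top_p` field name (inferred from the min/max/description context above), and all values are assumptions for illustration only.

```python
# Sketch of a chat-completion request after the cache_prompt removal.
# Endpoint path and field values are assumptions; the point is that
# no cache_prompt key is sent anymore.
import requests

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,  # streaming off so the full body comes back at once
    "temp": 0.7,      # default sampling temperature per the spec
    "top_p": 0.95,    # probability threshold for more relevant outputs
    # "cache_prompt": True,  # no longer part of the schema after this PR
}

resp = requests.post(
    "http://localhost:3928/v1/chat/completions",  # assumed endpoint
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```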