Merge pull request #901 from mikkelhegn/ai-api
Document inferencing option defaults
mikkelhegn committed Sep 22, 2023
2 parents 1495e63 + c77f016 commit d965286
Showing 1 changed file with 1 addition and 1 deletion.
content/spin/serverless-ai-api-guide.md — 2 changes: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ The set of operations is common across all supporting language SDKs:
| Operation | Parameters | Returns | Behavior |
|:-----|:----------------|:-------|:----------------|
| `infer` | model`string`<br /> prompt`string`| `string` | The `infer` operation is performed on a specific model.<br /> <br />The name of the model is the first parameter provided (e.g. `llama2-chat`, `codellama-instruct`, or other; passed in as a `string`).<br /> <br />The second parameter is a prompt, passed in as a `string`.<br />|
- | `infer_with_options` | model`string`<br /> prompt`string`<br /> params`list` | `string` | The `infer_with_options` operation is performed on a specific model.<br /> <br />The name of the model is the first parameter provided (e.g. `llama2-chat`, `codellama-instruct`, or other; passed in as a `string`).<br /><br /> The second parameter is a prompt, passed in as a `string`.<br /><br /> The third parameter is a mix of floats and unsigned integers relating to inferencing parameters, in this order: <br />- `max-tokens` (unsigned 32-bit integer) Note: the backing implementation may return fewer tokens. <br /> - `repeat-penalty` (32-bit float) The amount the model should avoid repeating tokens. <br /> - `repeat-penalty-last-n-token-count` (unsigned 32-bit integer) The number of tokens the model should apply the repeat penalty to. <br /> - `temperature` (32-bit float) The randomness with which the next token is selected. <br /> - `top-k` (unsigned 32-bit integer) The number of possible next tokens the model will choose from. <br /> - `top-p` (32-bit float) The probability total of next tokens the model will choose from. <br /><br /> The result from `infer_with_options` is a `string` |
+ | `infer_with_options` | model`string`<br /> prompt`string`<br /> params`list` | `string` | The `infer_with_options` operation is performed on a specific model.<br /> <br />The name of the model is the first parameter provided (e.g. `llama2-chat`, `codellama-instruct`, or other; passed in as a `string`).<br /><br /> The second parameter is a prompt, passed in as a `string`.<br /><br /> The third parameter is a mix of floats and unsigned integers relating to inferencing parameters, in this order: <br /><br />- `max-tokens` (unsigned 32-bit integer) Note: the backing implementation may return fewer tokens. <br /> Default is 100.<br /><br /> - `repeat-penalty` (32-bit float) The amount the model should avoid repeating tokens. <br /> Default is 1.1.<br /><br /> - `repeat-penalty-last-n-token-count` (unsigned 32-bit integer) The number of tokens the model should apply the repeat penalty to. <br /> Default is 64.<br /><br /> - `temperature` (32-bit float) The randomness with which the next token is selected. <br /> Default is 0.8.<br /><br /> - `top-k` (unsigned 32-bit integer) The number of possible next tokens the model will choose from. <br /> Default is 40.<br /><br /> - `top-p` (32-bit float) The probability total of next tokens the model will choose from. <br /> Default is 0.9.<br /><br /> The result from `infer_with_options` is a `string` |
| `generate-embeddings` | model`string`<br /> prompt`list<string>`| `list<list<float32>>` | The `generate-embeddings` operation is performed on a specific model.<br /> <br />The name of the model is the first parameter provided (e.g. `all-minilm-l6-v2`; passed in as a `string`).<br /> <br />The second parameter is a prompt, passed in as a `list` of `string`s.<br /><br /> The result from `generate-embeddings` is a two-dimensional array containing only `float32` values |
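
For illustration, here is a minimal sketch of passing these options explicitly from a Rust component. It assumes the Spin Rust SDK's `llm` module; the exact type and field names (`InferencingParams`, `InferencingModel`, the `text` field on the result) are assumptions and may differ between SDK versions. Setting every field to its documented default makes the call behave like a plain `infer`:

```rust
// Sketch only: assumes the Spin Rust SDK (`spin_sdk::llm`); names may vary by version.
use spin_sdk::llm::{infer_with_options, InferencingModel, InferencingParams};

fn run_inference(prompt: &str) -> anyhow::Result<String> {
    // Each value below mirrors the documented default, so this call behaves
    // like a plain `infer`; change any field to override that default.
    let options = InferencingParams {
        max_tokens: 100,                       // backing implementation may return fewer
        repeat_penalty: 1.1,                   // how strongly to avoid repeating tokens
        repeat_penalty_last_n_token_count: 64, // window the penalty applies to
        temperature: 0.8,                      // randomness of next-token selection
        top_k: 40,                             // number of candidate next tokens
        top_p: 0.9,                            // cumulative probability cutoff
    };
    let result = infer_with_options(InferencingModel::Llama2Chat, prompt, options)?;
    Ok(result.text)
}
```

A common pattern is to override only `temperature` or `top-p` and leave the remaining fields at their defaults, trading determinism for variety without touching the repeat-penalty settings.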

The exact detail of calling these operations from your application depends on your language:
