Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add agent framework/throttling/hidden model/OS assistant and update conversational search documentation #6354

Merged
merged 41 commits into from
Feb 20, 2024

Conversation

kolchfa-aws
Copy link
Collaborator

@kolchfa-aws kolchfa-aws commented Feb 5, 2024

Closes #5379
Closes #5839
Closes #6268
Closes #6292

Agent Framework and OS Assistant Toolkit are experimental.

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws self-assigned this Feb 5, 2024
@hdhalter hdhalter added release-notes PR: Include this PR in the automated release notes v2.12.0 labels Feb 6, 2024
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@hdhalter hdhalter added 3 - Tech Review PR: Tech review in progress experimental and removed experimental labels Feb 7, 2024
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
**Introduced 2.12**
{: .label .label-purple }

Retrieves message information for [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/).
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Retrieves message information for [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/).
Use this API to retrieve message information for [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/).

**Introduced 2.12**
{: .label .label-purple }

Retrieves a conversational memory for [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). Use this command to search for memories.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Retrieves a conversational memory for [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). Use this command to search for memories.
This API retrieves a conversational memory for [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). Use this command to search for memories.

- [Get model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/get-model/)
- [Deploy model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/deploy-model/)
- [Undeploy model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/undeploy-model/)
- [Delete model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/delete-model/)
- [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/predict/): Invokes a model
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/predict/): Invokes a model
- [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/predict/) (invokes a model)


| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `deploy` | Boolean | Whether to deploy the model after registering by calling the [Deploy Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/deploy-model/). Default is `false`. |
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `deploy` | Boolean | Whether to deploy the model after registering by calling the [Deploy Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/deploy-model/). Default is `false`. |
| `deploy` | Boolean | Whether to deploy the model after registering it. The deploy operation is performed by calling the [Deploy Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/deploy-model/). Default is `false`. |


Use this command to search for models you've already created.

The response will contain only those model versions to which you have access. For example, if you send a match all query, model versions for the following model group types will be returned:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The response will contain only those model versions to which you have access. For example, if you send a match all query, model versions for the following model group types will be returned:
The response will contain only those model versions to which you have access. For example, if you send a `match_all` query, model versions for the following model group types will be returned:


#### Example request: Rate limiting inference calls for a model

The following request limits the number of times you can call the Predict API on the model to four Predict API calls per minute:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The following request limits the number of times you can call the Predict API on the model to four Predict API calls per minute:
The following request limits the number of times you can call the Predict API on the model to 4 Predict API calls per minute:


The OpenSearch Assistant Toolkit helps you create AI-powered assistants for OpenSearch Dashboards. The toolkit includes the following parts:

- [**Agents and tools**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/index/): _Agents_ interface with a large language model (LLM) and execute high-level tasks, such as summarization or generating PPL from natural language. The agent's high-level tasks consist of low-level tasks called _tools_, which can be reused by multiple agents.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [**Agents and tools**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/index/): _Agents_ interface with a large language model (LLM) and execute high-level tasks, such as summarization or generating PPL from natural language. The agent's high-level tasks consist of low-level tasks called _tools_, which can be reused by multiple agents.
- [**Agents and tools**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/index/): _Agents_ interface with a large language model (LLM) and execute high-level tasks, such as summarization or generating Piped Processing Language (PPL) from natural language. The agent's high-level tasks consist of low-level tasks called _tools_, which can be reused by multiple agents.


- [**Agents and tools**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/index/): _Agents_ interface with a large language model (LLM) and execute high-level tasks, such as summarization or generating PPL from natural language. The agent's high-level tasks consist of low-level tasks called _tools_, which can be reused by multiple agents.
- [**Workflow automation**]({{site.url}}{{site.baseurl}}/automating-workflows/index/): Uses templates to set up infrastructure for artificial intelligence and machine learning (AI/ML) applications. For example, you can automate configuring agents to be used for chat or generating PPL queries from natural language.
- **OpenSearch Assistant**: The UI for the AI-powered assistant. The assistant's workflow is configured with various agents and tools.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- **OpenSearch Assistant**: The UI for the AI-powered assistant. The assistant's workflow is configured with various agents and tools.
- **OpenSearch Assistant**: This is the OpenSearch Dashboards UI for the AI-powered assistant. The assistant's workflow is configured with various agents and tools.


To enable OpenSearch Assistant, perform the following steps:

- Enable the agent framework and retrieval-augmented generation by configuring the following settings:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Enable the agent framework and retrieval-augmented generation by configuring the following settings:
- Enable the agent framework and retrieval-augmented generation (RAG) by configuring the following settings:

:--- | :--- | :---
`llm_question` | Yes | The question the LLM must answer.
`llm_model` | No | Overrides the original model set in the connection in cases where you want to use a different model (for example, GPT 4 instead of GPT 3.5). This option is required if a default model is not set during pipeline creation.
`memory_id` | No | If you provide a `memory_id`, the pipeline retrieves the 10 most recent messages to the LLM prompt. If you don't specify a `memory_id`, the prior context is not added to the LLM prompt.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`memory_id` | No | If you provide a `memory_id`, the pipeline retrieves the 10 most recent messages to the LLM prompt. If you don't specify a `memory_id`, the prior context is not added to the LLM prompt.
`memory_id` | No | If you provide a `memory_id`, the pipeline retrieves the 10 most recent messages in the specified memory and adds them to the LLM prompt. If you don't specify a `memory_id`, the prior context is not added to the LLM prompt.

`llm_model` | No | Overrides the original model set in the connection in cases where you want to use a different model (for example, GPT 4 instead of GPT 3.5). This option is required if a default model is not set during pipeline creation.
`memory_id` | No | If you provide a `memory_id`, the pipeline retrieves the 10 most recent messages to the LLM prompt. If you don't specify a `memory_id`, the prior context is not added to the LLM prompt.
`context_size` | No | The number of search results sent to the LLM. This is typically needed in order to meet the token size limit, which can vary by model. Alternatively, you can use the `size` parameter in the Search API to control the number of search results sent to the LLM.
`message_size` | No | The number of messages sent to the LLM. Similarly to the number of search results, this affects the total number of tokens seen by the LLM. When not set, the pipeline uses the default message size of `10`.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`message_size` | No | The number of messages sent to the LLM. Similarly to the number of search results, this affects the total number of tokens seen by the LLM. When not set, the pipeline uses the default message size of `10`.
`message_size` | No | The number of messages sent to the LLM. Similarly to the number of search results, this affects the total number of tokens received by the LLM. When not set, the pipeline uses the default message size of `10`.


Use the following options when setting up a RAG pipeline under the `retrieval_augmented_generation` argument.
To verify that both messages were added to the memory, provide the `memory_ID` to the Get Message API:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To verify that both messages were added to the memory, provide the `memory_ID` to the Get Message API:
To verify that both messages were added to the memory, provide the `memory_ID` to the Get Messages API:

kolchfa-aws and others added 3 commits February 15, 2024 15:18
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Copy link
Collaborator

@natebower natebower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolchfa-aws Done with the additional content. Thanks!

## Parameters
## Register parameters

The following table lists all available parameters when registering the tool.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The following table lists all parameters that are available when registering the tool."


The following table lists all available parameters.
The following table lists all available parameters when running the tool.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The following table lists all parameters that are available when running the tool." Please apply this structure to all instances.

`enable_Content_Generation` | Boolean | Optional | If `true`, returns results generated by an LLM. If `false`, returns results directly, without LLM-assisted content generation. Default is `true`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the comma after "directly".

`startIndex`| Integer | The paginated index of the alert to start from. Default is 0.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "0" be in code font?

:--- | :--- | :--- | :--- | :---
`name`| String | Required | All | The agent name. |
`type` | String | Required | All | The agent type. Valid values are `flow` and `conversational`. For more information, see [Agents]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/index/). |
`description` | String | Optional| All | The agent description. |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"A description of the agent."

`app_type` | String | Optional | All | Specifies an optional agent category. You can then perform operations on all agents in the category. For example, you can delete all messages for RAG agents.
`memory.type` | String | Optional | `conversational_flow`, `conversational` | Specifies where to store the conversational memory. Currently, the only supported type is `conversation_index` (store the memory in a conversational system index).
`llm.model_id` | String | Required | `conversational` | The model ID of the LLM to which to send questions.
`llm.parameters.response_filter` | String | Required | `conversational` | The pattern for parsing the LLM response. For each LLM, you need to provide the field where the response is located. For example, for the Anthropic Claude model, the response is in the `completion` field so the pattern is `$.completion`. For OpenAI models, the pattern is `$.choices[0].message.content`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in which the response is located." "the response is located in the completion field, so" (add "located" and a comma after "field").

`memory.type` | String | Optional | `conversational_flow`, `conversational` | Specifies where to store the conversational memory. Currently, the only supported type is `conversation_index` (store the memory in a conversational system index).
`llm.model_id` | String | Required | `conversational` | The model ID of the LLM to which to send questions.
`llm.parameters.response_filter` | String | Required | `conversational` | The pattern for parsing the LLM response. For each LLM, you need to provide the field where the response is located. For example, for the Anthropic Claude model, the response is in the `completion` field so the pattern is `$.completion`. For OpenAI models, the pattern is `$.choices[0].message.content`.
`llm.parameters.max_iteration` | Integer | Optional | `conversational` | The maximum number of messages to send to the LLM. Default is 3.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "3" be in code font?

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws added 6 - Done but waiting to merge PR: The work is done and ready to merge and removed 4 - Doc Review PR: Doc review in progress labels Feb 19, 2024
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Model-level rate limiting applies to all users of the model. If you specify both a model-level rate limit and a user-level rate limit, the overall rate limit is set to the more restrictive of the two. For example, if the model-level limit is 2 requests per minute and the user-level limit is 4 requests per minute, the overall limit will be set to 2 requests per minute.

To set the rate limit, you must provide two inputs: the maximum number of requests and the time frame. OpenSearch uses these inputs to calculate the rate limit as the maximum number of requests divided by the time frame. For example, if you set the limit to be 4 requests per minute, the rate limit is `4 requests / 1 minute`, which is `1 request / 0.25 minutes`, or `1 request / 15 seconds`. OpenSearch processes predict requests sequentially, in a first-come-first-served manner, and will limit those requests to 1 request per 15 seconds. Imagine two users, Alice and Bob, calling the Predict API for the same model, which has a rate limit of 1 request per 15 seconds. If Alice calls the Predict API and immediately after that Bob calls the Predict API, OpenSearch processes Alice's predict request first and stores Bob's request in a queue. Once 15 seconds has passed since Alice's request, OpenSearch processes Bob's request.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Fanit, sorry for the confusion, but rate limiter cannot store the request. Instead, it will refuse the request directly. So Bob can only make his request after 15 seconds in this scenario.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification; I've updated the docs.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Copy link
Collaborator

@natebower natebower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolchfa-aws kolchfa-aws merged commit 3f7468b into main Feb 20, 2024
5 checks passed
oeyh pushed a commit to oeyh/documentation-website that referenced this pull request Mar 14, 2024
…onversational search documentation (opensearch-project#6354)

* Add agent framework documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add hidden model and API updates

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Vale error

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Updated field names

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add updating credentials

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Added tools table

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add OpenSearch forum thread for OS Assistant

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add tech review for conv search

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Fix links

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add tools

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add links to tools

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* More info about tools

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Tool parameters

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Update cat-index-tool.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Parameter clarification

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Tech review feedback

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Typo fix

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* More tech review feedback: RAG tool

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Tech review feedback: memory APis

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Update _ml-commons-plugin/agents-tools/index.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _ml-commons-plugin/opensearch-assistant.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _ml-commons-plugin/agents-tools/tools/ppl-tool.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Separated search and get APIs and add conversational flow agent

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* More parameters for PPL tool

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Added more parameters

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Tech review feedback: PPL tool

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Rename to automating configurations

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Editorial comments on the new text

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add parameter to PPl tool

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Changed link to configurations

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Rate limiter feedback and added warning

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
@kolchfa-aws kolchfa-aws deleted the agent-framework branch March 28, 2024 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6 - Done but waiting to merge PR: The work is done and ready to merge experimental release-notes PR: Include this PR in the automated release notes v2.12.0
Projects
None yet