
[Security Assistant] Adds Dataset Management API and 'Add to Dataset' Message Action #181348

Closed
spong wants to merge 5 commits

Conversation


@spong spong commented Apr 22, 2024

Summary

This PR continues to address the evaluation usability enhancements outlined in https://github.com/elastic/security-team/issues/8167

As with all evaluation features, you must set the below feature flag to enable:

```
xpack.securitySolution.enableExperimental: ['assistantModelEvaluation']
```

This PR adds two simple REST APIs for dataset management: a `GET` datasets route that returns an array of the DataSetIds available for use, and a `POST` datasets route that takes a DataSetId and a DatasetItem to persist to the dataset.

Note

Currently datasets are managed via LangSmith (and so your LangSmith API key must be configured), but this is expected to be expanded to manage datasets locally within the cluster.
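
For illustration, here is a rough sketch of how a client might call the two routes. The route path, headers, and payload/response field names below are assumptions for the example, not the exact contract introduced in this PR:

```
// Hypothetical client-side calls to the new dataset routes.
// NOTE: the path '/internal/elastic_assistant/evaluate/datasets' and the
// body/response field names are assumptions for illustration only.
interface DatasetItem {
  input: string;      // e.g. the message content to persist
  reference?: string; // optional expected output
}

async function getDatasetIds(): Promise<string[]> {
  const res = await fetch('/internal/elastic_assistant/evaluate/datasets', {
    headers: { 'kbn-xsrf': 'true' },
  });
  const body = await res.json();
  return body.datasets; // array of DataSetIds
}

async function addToDataset(datasetId: string, item: DatasetItem): Promise<void> {
  await fetch('/internal/elastic_assistant/evaluate/datasets', {
    method: 'POST',
    headers: { 'kbn-xsrf': 'true', 'content-type': 'application/json' },
    body: JSON.stringify({ datasetId, item }),
  });
}
```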

Additionally, some UI elements have been updated to leverage the above endpoints:

  • Dataset selection within the Evaluation UI now populates all available datasets
  • A new 'Add to Dataset' message action was added, enabling a message to be added to a dataset directly from within the assistant

Checklist

Delete any items that are not applicable to this PR.

@spong spong added release_note:skip Skip the PR/issue when compiling release notes Team:Security Generative AI Security Generative AI v8.15.0 labels Apr 22, 2024
@spong spong self-assigned this Apr 22, 2024

spong commented Apr 22, 2024

/ci

@kibana-ci

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 5451 | 5458 | +7 |

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/elastic-assistant | 142 | 143 | +1 |
| @kbn/elastic-assistant-common | 217 | 231 | +14 |
| total | | | +15 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 14.6MB | 14.7MB | +15.3KB |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/elastic-assistant | 168 | 169 | +1 |
| @kbn/elastic-assistant-common | 232 | 246 | +14 |
| total | | | +15 |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @spong

spong added a commit that referenced this pull request Aug 20, 2024
## Summary

This PR updates the existing evaluation framework to support LangGraph.
Since the evaluation code was the last reference to the old agent
executors, we were able to finally remove those as well.

The evaluation feature remains behind a feature flag, and can be enabled
with the following configuration:

```
xpack.securitySolution.enableExperimental:
  - 'assistantModelEvaluation'
```

Once enabled, the `Evaluation` tab will become visible in settings:

<p align="center">
<img width="800"
src="https://github.com/user-attachments/assets/8a0b8691-73a3-43b7-996b-8cc408edd5ab"
/>
</p> 


Notes:
* We no longer write evaluation results to a local ES index. We can
still do this, but most of the value comes from viewing the results in
LangSmith, so I didn't re-plumb this functionality after switching over
to the new LangSmith `evaluator` function.
* Need to add back support for custom datasets if we find this useful.
Currently only LangSmith datasets are supported. Ended up porting over
the `GET datasets` API from
#181348 to make this more useful.
The `GET evaluate` route now returns `datasets`, an array of dataset
names from LangSmith (see the sketch after these notes).
* Some additional fields still need to be ported over to the POST
evaluation API, like `size` and `alertsIndexPattern`. Update: Ported to
API, just need presence in UI.
* `Project name` was removed from the eval UI as we no longer need to
tag runs to a specific project with the new LangSmith `evaluator` since
they automatically show up under the `Experiments` section.
* The 'Evaluation (Optional)' section currently isn't used, so it's been
removed. We can re-enable this when there is a need to run local evals on
predictions outside of LangSmith.
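
As a reference for the dataset enumeration mentioned above, a minimal sketch of listing LangSmith dataset names with the `langsmith` JS client (not necessarily the exact code used by the route):

```
import { Client } from 'langsmith';

// Minimal sketch: enumerate available LangSmith dataset names.
// Assumes a LangSmith API key is configured in the environment.
const client = new Client({ apiKey: process.env.LANGCHAIN_API_KEY });

export async function getDatasetNames(): Promise<string[]> {
  const names: string[] = [];
  for await (const dataset of client.listDatasets()) {
    names.push(dataset.name);
  }
  return names;
}
```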


To test, set a `Run name`, input a Dataset from LangSmith e.g.
`LangGraph Eval Testing`, select a few connectors and the
`DefaultAssistantGraph`, then click `Perform evaluation...`. Results
will show up in LangSmith under `Datasets & Testing`.

Note: It's easy to run into rate limiting errors with Gemini, so keep
that in mind when running larger datasets. The new LangSmith
`evaluator` function has an option for `maxConcurrency` to control the
maximum number of concurrent evaluations to run, so we can tweak that as
needed (see the sketch below).
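
For reference, a minimal sketch of running the LangSmith `evaluate` helper with `maxConcurrency`; the target function and dataset name are placeholders:

```
import { evaluate } from 'langsmith/evaluation';

// Minimal sketch: run an experiment against a LangSmith dataset while
// capping the number of concurrent evaluations to avoid rate limits.
await evaluate(
  async (inputs: Record<string, unknown>) => {
    // Placeholder target: call the assistant / graph under test here.
    return { output: `echo: ${JSON.stringify(inputs)}` };
  },
  {
    data: 'LangGraph Eval Testing', // LangSmith dataset name
    experimentPrefix: 'assistant-eval',
    maxConcurrency: 3, // tweak to stay under provider rate limits
  }
);
```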

Once complete, you can compare all results side-by-side in LangSmith
:tada:



<img width="2312" alt="image"
src="https://github.com/user-attachments/assets/7ca31722-7400-4717-9735-d6c1c97b6e49">

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
spong added a commit to spong/kibana that referenced this pull request Aug 26, 2024
…190574)

(cherry picked from commit c276638)

# Conflicts:
#	x-pack/packages/kbn-elastic-assistant-common/constants.ts
#	x-pack/plugins/translations/translations/fr-FR.json
#	x-pack/plugins/translations/translations/ja-JP.json
#	x-pack/plugins/translations/translations/zh-CN.json

spong commented Aug 26, 2024

Closing, as the important pieces of dataset management from this PR were adopted in #190574.

@spong spong closed this Aug 26, 2024
spong added a commit that referenced this pull request Aug 26, 2024
…190574) (#191287)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Security Assistant] Adds support for LangGraph evaluations
(#190574)](#190574)


### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)
