[Security Assistant] Adds Dataset Management API and 'Add to Dataset' Message Action #181348
Conversation
/ci

💚 Build Succeeded

Metrics [docs]: Module Count, Public APIs missing comments, Async chunks, Unknown metric groups (API count)

To update your PR or re-run it, just comment with: cc @spong
## Summary

This PR updates the existing evaluation framework to support LangGraph. Since the evaluation code was the last reference to the old agent executors, we were able to finally remove those as well.

The evaluation feature remains behind a feature flag, and can be enabled with the following configuration:

```
xpack.securitySolution.enableExperimental:
  - 'assistantModelEvaluation'
```

Once enabled, the `Evaluation` tab will become visible in settings:

<p align="center">
<img width="800" src="https://github.com/user-attachments/assets/8a0b8691-73a3-43b7-996b-8cc408edd5ab" />
</p>

Notes:
* We no longer write evaluation results to a local ES index. We can still do this, but most of the value comes from viewing the results in LangSmith, so I didn't re-plumb this functionality after switching over to the new LangSmith `evaluator` function.
* Need to add back support for custom datasets if we find this useful. Currently only LangSmith datasets are supported. Ended up porting over the `GET datasets` API from #181348 to make this more useful. The `GET evaluate` route now returns `datasets`, an array of dataset names from LangSmith.
* Some additional fields still need to be ported over to the POST evaluation API, like `size` and `alertsIndexPattern`. Update: ported to the API, just needs presence in the UI.
* `Project name` was removed from the eval UI: with the new LangSmith `evaluator` we no longer need to tag runs to a specific project, since they automatically show up under the `Experiments` section.
* The 'Evaluation (Optional)' section currently isn't used, so it's been removed. We can re-enable this when there is a need to run local evals on predictions outside of LangSmith.

To test, set a `Run name`, input a Dataset from LangSmith, e.g. `LangGraph Eval Testing`, select a few connectors and the `DefaultAssistantGraph`, then click `Perform evaluation...`. Results will show up in LangSmith under `Datasets & Testing`.

Note: It's easy to run into rate limiting errors with Gemini, so be aware of that when running larger datasets. The new LangSmith `evaluator` function has a `maxConcurrency` option to control the maximum number of concurrent evaluations, so we can tweak that as needed; a short sketch follows at the end of this description.

Once complete, you can compare all results side-by-side in LangSmith :tada:

<img width="2312" alt="image" src="https://github.com/user-attachments/assets/7ca31722-7400-4717-9735-d6c1c97b6e49">

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
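For reference, here is a minimal sketch of driving an evaluation through the LangSmith JS SDK's `evaluate` function with `maxConcurrency`, as described above. The `runGraph` target and its input/output shapes are hypothetical stand-ins for invoking the `DefaultAssistantGraph`; the dataset and run names mirror the testing steps.

```
// Sketch only: the target function and example shapes are illustrative
// assumptions, not this PR's actual wiring.
import { evaluate } from "langsmith/evaluation";

// Hypothetical target: receives one dataset example's inputs and returns
// the outputs that LangSmith will record in the experiment.
const runGraph = async (inputs: { input: string }) => {
  // ...invoke the assistant graph with inputs.input here...
  return { output: `(graph answer for: ${inputs.input})` };
};

await evaluate(runGraph, {
  data: "LangGraph Eval Testing", // LangSmith dataset name
  experimentPrefix: "my-run-name", // appears under the `Experiments` section
  maxConcurrency: 2, // keep low to avoid rate limits (e.g. with Gemini)
});
```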
[Security Assistant] Adds support for LangGraph evaluations (#190574). Same description as above; cherry picked from commit c276638.

Conflicts:
- x-pack/packages/kbn-elastic-assistant-common/constants.ts
- x-pack/plugins/translations/translations/fr-FR.json
- x-pack/plugins/translations/translations/ja-JP.json
- x-pack/plugins/translations/translations/zh-CN.json
Closing, as the important pieces of dataset management from this PR were adopted in #190574.
[Security Assistant] Adds support for LangGraph evaluations (#190574) (#191287)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Security Assistant] Adds support for LangGraph evaluations (#190574)](#190574)

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
## Summary
This PR continues to address the evaluation usability enhancements outlined in https://github.com/elastic/security-team/issues/8167
As with all evaluation features, you must set the feature flag below to enable it:
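```
xpack.securitySolution.enableExperimental:
  - 'assistantModelEvaluation'
```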
This PR adds two simple REST APIs for dataset management: a `GET datasets` route, which returns an array of the `DataSetId`'s that are available for use, and a `POST datasets` route, which takes a `DataSetId` and a `DatasetItem` to persist to the dataset.

Note: Currently datasets are managed via LangSmith (and so your LangSmith API key must be configured), but this is expected to be expanded to manage datasets locally within the cluster. A sketch of how these operations map onto the LangSmith client follows below.
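Since datasets are managed via LangSmith for now, here is a minimal sketch of how the two operations could be backed by the LangSmith JS client. The handler names and the `DatasetItem` shape are illustrative assumptions, not this PR's actual code.

```
// Sketch only: names and shapes below are assumed for illustration.
import { Client } from "langsmith";

const client = new Client(); // reads the LangSmith API key from the environment

// GET datasets: list the dataset names (`DataSetId`'s) available for use.
async function getDatasetIds(): Promise<string[]> {
  const ids: string[] = [];
  for await (const dataset of client.listDatasets()) {
    ids.push(dataset.name);
  }
  return ids;
}

// POST datasets: persist a `DatasetItem` (an input/output example pair)
// to the dataset identified by `datasetId`.
async function addDatasetItem(
  datasetId: string,
  item: { input: Record<string, unknown>; output: Record<string, unknown> }
): Promise<void> {
  await client.createExample(item.input, item.output, {
    datasetName: datasetId,
  });
}
```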
Additionally, some UI elements have been updated to leverage the above endpoints: