[Security Assistant] Adds Dataset Management API and 'Add to Dataset' Message Action #181348
Conversation
/ci

💚 Build Succeeded

Metrics [docs]: Module Count, Public APIs missing comments, Async chunks, Unknown metric groups (API count)

To update your PR or re-run it, just comment with: cc @spong
## Summary

This PR updates the existing evaluation framework to support LangGraph. Since the evaluation code was the last reference to the old agent executors, we were able to finally remove those as well.

The evaluation feature remains behind a feature flag, and can be enabled with the following configuration:

```
xpack.securitySolution.enableExperimental:
  - 'assistantModelEvaluation'
```

Once enabled, the `Evaluation` tab will become visible in settings:

<p align="center">
<img width="800" src="https://github.com/user-attachments/assets/8a0b8691-73a3-43b7-996b-8cc408edd5ab" />
</p>

Notes:
* We no longer write evaluation results to a local ES index. We can still do this, but most of the value comes from viewing the results in LangSmith, so I didn't re-plumb this functionality after switching over to the new LangSmith `evaluator` function.
* Need to add back support for custom datasets if we find this useful. Currently only LangSmith datasets are supported. Ended up porting over the `GET datasets` API from #181348 to make this more useful. The `GET evaluate` route now returns `datasets`, an array of dataset names from LangSmith.
* Some additional fields still need to be ported over to the POST evaluation API, like `size` and `alertsIndexPattern`. Update: ported to the API, just needs presence in the UI.
* `Project name` was removed from the eval UI: with the new LangSmith `evaluator` we no longer need to tag runs to a specific project, since they automatically show up under the `Experiments` section.
* The 'Evaluation (Optional)' section currently isn't used, so it's been removed. We can re-enable this when there is a need to run local evals on predictions outside of LangSmith.

To test, set a `Run name`, input a Dataset from LangSmith, e.g. `LangGraph Eval Testing`, select a few connectors and the `DefaultAssistantGraph`, then click `Perform evaluation...`. Results will show up in LangSmith under `Datasets & Testing`.

Note: It's easy to run into rate limiting errors with Gemini, so be aware of that when running larger datasets. The new LangSmith `evaluator` function has a `maxConcurrency` option to control the maximum number of concurrent evaluations, so we can tweak that as needed; a short sketch follows at the end of this description.

Once complete, you can compare all results side-by-side in LangSmith :tada:

<img width="2312" alt="image" src="https://github.com/user-attachments/assets/7ca31722-7400-4717-9735-d6c1c97b6e49">

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
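For reference, here is a minimal sketch of driving an evaluation through the LangSmith JS SDK's `evaluate` function with `maxConcurrency`, as described above. The `runGraph` target and its input/output shapes are hypothetical stand-ins for invoking the `DefaultAssistantGraph`; the dataset and run names mirror the testing steps.

```
// Sketch only: the target function and example shapes are illustrative
// assumptions, not this PR's actual wiring.
import { evaluate } from "langsmith/evaluation";

// Hypothetical target: receives one dataset example's inputs and returns
// the outputs that LangSmith will record in the experiment.
const runGraph = async (inputs: { input: string }) => {
  // ...invoke the assistant graph with inputs.input here...
  return { output: `(graph answer for: ${inputs.input})` };
};

await evaluate(runGraph, {
  data: "LangGraph Eval Testing", // LangSmith dataset name
  experimentPrefix: "my-run-name", // appears under the `Experiments` section
  maxConcurrency: 2, // keep low to avoid rate limits (e.g. with Gemini)
});
```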
[Security Assistant] Adds support for LangGraph evaluations (#190574). Same description as above; cherry picked from commit c276638.

Conflicts:
- x-pack/packages/kbn-elastic-assistant-common/constants.ts
- x-pack/plugins/translations/translations/fr-FR.json
- x-pack/plugins/translations/translations/ja-JP.json
- x-pack/plugins/translations/translations/zh-CN.json
Closing, as the important pieces of dataset management from this PR were adopted in #190574.
[Security Assistant] Adds support for LangGraph evaluations (#190574) (#191287)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Security Assistant] Adds support for LangGraph evaluations (#190574)](#190574)

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
## Summary
This PR continues to address the evaluation usability enhancements outlined in https://github.com/elastic/security-team/issues/8167
As with all evaluation features, you must set the feature flag below to enable it:
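```
xpack.securitySolution.enableExperimental:
  - 'assistantModelEvaluation'
```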
This PR adds two simple REST APIs for dataset management: a `GET datasets` route, which returns an array of the `DataSetId`'s that are available for use, and a `POST datasets` route, which takes a `DataSetId` and a `DatasetItem` to persist to the dataset.

Note: Currently datasets are managed via LangSmith (and so your LangSmith API key must be configured), but this is expected to be expanded to manage datasets locally within the cluster. A sketch of how these operations map onto the LangSmith client follows below.
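Since datasets are managed via LangSmith for now, here is a minimal sketch of how the two operations could be backed by the LangSmith JS client. The handler names and the `DatasetItem` shape are illustrative assumptions, not this PR's actual code.

```
// Sketch only: names and shapes below are assumed for illustration.
import { Client } from "langsmith";

const client = new Client(); // reads the LangSmith API key from the environment

// GET datasets: list the dataset names (`DataSetId`'s) available for use.
async function getDatasetIds(): Promise<string[]> {
  const ids: string[] = [];
  for await (const dataset of client.listDatasets()) {
    ids.push(dataset.name);
  }
  return ids;
}

// POST datasets: persist a `DatasetItem` (an input/output example pair)
// to the dataset identified by `datasetId`.
async function addDatasetItem(
  datasetId: string,
  item: { input: Record<string, unknown>; output: Record<string, unknown> }
): Promise<void> {
  await client.createExample(item.input, item.output, {
    datasetName: datasetId,
  });
}
```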
Additionally, some UI elements have been updated to leverage the above endpoints: