failsOnMKI: serverless security UI Security ML Trained models list page navigation renders trained models list x-pack/test_serverless/functional/test_suites/security/ml/trained_models_list.ts #180481

Closed
wayneseymour opened this issue Apr 10, 2024 · 9 comments
Assignees
Labels
bug Fixes for quality problems that affect the customer experience :ml skipped-test Team:ML Team label for ML (also use :ml) v8.15.0

Comments

@wayneseymour
Member

wayneseymour commented Apr 10, 2024

Against MKI, `await ml.trainedModels.assertStats(1);` passed, but on non-MKI only `await ml.trainedModels.assertStats(0);` passes.

Related to: #175358

Failure on non-mki

1)    serverless security UI
       Security ML
         Trained models list
           page navigation
             renders trained models list:
      Error: retry.tryForTime reached timeout 5000 ms
 Error: expected 'Total trained models: 1' to sort of equal 'Total trained models: 0'
     at Assertion.assert (expect.js:100:11)
     at Assertion.eql (expect.js:244:8)
     at trained_models.ts:83:32
     at processTicksAndRejections (node:internal/process/task_queues:95:5)
     at runAttempt (retry_for_success.ts:29:15)
     at retryForSuccess (retry_for_success.ts:98:21)
     at RetryService.tryForTime (retry.ts:37:12)
     at Object.assertStats (trained_models.ts:81:7)
     at Context.<anonymous> (trained_models_list.ts:33:9)
     at Object.apply (wrap_function.js:73:16)
       at onFailure (retry_for_success.ts:17:9)
       at retryForSuccess (retry_for_success.ts:84:7)
       at RetryService.tryForTime (retry.ts:37:12)
       at Object.assertStats (trained_models.ts:81:7)
       at Context.<anonymous> (trained_models_list.ts:33:9)
       at Object.apply (wrap_function.js:73:16)
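
To illustrate why the trace above shows a timeout wrapping the assertion error: the check is retried until the time budget runs out, and the last failure is then reported. Below is a minimal sketch of that pattern. `tryForTime` and `assertStatsText` here are hypothetical, simplified stand-ins written for this example; they are not the actual Kibana FTR implementations.

```typescript
// Hypothetical, simplified retry helper (an assumption, not Kibana's real code).
// Re-runs `block` until it succeeds or `timeoutMs` has elapsed.
async function tryForTime<T>(
  timeoutMs: number,
  block: () => T | Promise<T>,
  delayMs = 50
): Promise<T> {
  const start = Date.now();
  let lastError: unknown;
  do {
    try {
      return await block();
    } catch (err) {
      lastError = err; // remember the most recent failure for the final report
    }
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  } while (Date.now() - start < timeoutMs);
  const reason = lastError instanceof Error ? lastError.message : String(lastError);
  throw new Error(`tryForTime reached timeout ${timeoutMs} ms\n${reason}`);
}

// Hypothetical assertion mirroring the "Total trained models: N" text in the trace.
function assertStatsText(actualText: string, expectedCount: number): void {
  const expectedText = `Total trained models: ${expectedCount}`;
  if (actualText !== expectedText) {
    throw new Error(`expected '${actualText}' to equal '${expectedText}'`);
  }
}

// The page reports one model while the test expects zero, so every attempt
// fails and the helper eventually gives up with a timeout error.
async function demo(): Promise<string> {
  try {
    await tryForTime(200, () => assertStatsText('Total trained models: 1', 0));
    return 'passed';
  } catch (err) {
    return (err as Error).message;
  }
}
```

In this sketch the timeout error carries the last assertion message, which is roughly the shape of the two `Error:` lines in the failure output.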
@wayneseymour wayneseymour added bug Fixes for quality problems that affect the customer experience Team:ML Team label for ML (also use :ml) labels Apr 10, 2024
@wayneseymour wayneseymour changed the title from "failsOnMKI: serverless security UI Security ML Trained models list page navigation renders trained models list" to "failsOnMKI: serverless security UI Security ML Trained models list page navigation renders trained models list x-pack/test_serverless/functional/test_suites/security/ml/trained_models_list.ts" Apr 10, 2024
pheyos pushed a commit that referenced this issue Apr 10, 2024
## Summary

Skip "Trained models list" suite on MKI

Details about the failure in
#180481
@spong
Member

spong commented Apr 10, 2024

To be un-skipped in https://github.com/elastic/kibana/pull/175358/files#r1560015069. It was mentioned (on internal Slack) that enabling xpack.ml.nlp.enabled caused other failures, so I will run these changes against MKI CI and try to include those fixes in this PR as well.

@spong
Member

spong commented Apr 12, 2024

#175358 has just been merged, which fixes and unskips this test. We've confirmed functionality when running on MKI, but I will wait for this change to be included and pass in the appex-qa/serverless/kibana-ftr-tests pipeline before closing this issue.

@spong
Member

spong commented Apr 12, 2024

Alrighty, latest MKI build was successful: https://buildkite.com/elastic/appex-qa-serverless-kibana-ftr-tests/builds/1430

I've also confirmed that it included the changes from #175358: the security project Kibana commit for this build was 79096beea5a63d994ea69fe98dfc4f6103817ec6, which is well after the commit for #175358 (b29b830e8d8b07f10106af53e07de502ca67d228). Going to go ahead and close this issue.

@spong spong closed this as completed Apr 12, 2024
@mistic mistic reopened this May 17, 2024
mistic added a commit that referenced this issue May 17, 2024
@mistic
Member

mistic commented May 17, 2024

Skipped this as the MKI environment started failing again on https://buildkite.com/elastic/appex-qa-serverless-kibana-ftr-tests/builds/1721

main: e4af221

@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@jgowdyelastic
Member

This failure is slightly different from before. Previously we were expecting no trained models because NLP was disabled in security.
We now expect one trained model, the built-in lang_ident_model_1 model. However, the screenshot of the failure shows that ELSER was downloaded while the tests were running.
[image: screenshot of the failure]

Perhaps a new test has been added somewhere which downloads ELSER and then doesn't clean up after itself?
We've discussed this before, and it was decided that we shouldn't download the real ELSER models while running tests. Instead, we added a tiny ELSER model which can be used.

@spong are you aware of any recently added security tests which trigger a download of ELSER?
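
As a rough illustration of the "clean up after itself" point, here is a minimal sketch of a test helper that installs a model only for the duration of a test body and always removes it afterwards, even when the body throws. Everything in it is an assumption made for the example: `FakeModelStore` is an in-memory stand-in rather than a real ML API client, and `hypothetical_tiny_elser` is a made-up model ID.

```typescript
// Hypothetical in-memory stand-in for a trained-models store (illustrative only).
// It starts with the built-in lang_ident_model_1 model mentioned above.
class FakeModelStore {
  private models = new Set<string>(['lang_ident_model_1']);

  install(id: string): void {
    this.models.add(id);
  }

  remove(id: string): void {
    this.models.delete(id);
  }

  count(): number {
    return this.models.size;
  }
}

// Installs `modelId`, runs `body`, and removes the model in `finally`
// so later suites see the same model count as before.
function withTemporaryModel<T>(store: FakeModelStore, modelId: string, body: () => T): T {
  store.install(modelId);
  try {
    return body();
  } finally {
    store.remove(modelId);
  }
}

const store = new FakeModelStore();
const countDuringTest = withTemporaryModel(store, 'hypothetical_tiny_elser', () => store.count());
const countAfterTest = store.count();
```

With this shape, a later assertion on the total number of trained models is not affected by whichever suite ran before it.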

@spong
Member

spong commented May 17, 2024

So ~3 weeks ago I un-skipped the NLP Cleanup Task tests I had created when implementing our ELSER in serverless functionality. In those tests I use the tiny ELSER model you mention, and I also clean up ML resources afterwards in case anything is left lingering.

That said, I'm not aware of any other new tests that would be downloading ELSER, at least on the security assistant side of things, and I don't currently know anyone else in security leveraging ELSER at the moment... 🤔

I did a quick search in our security_solution_api_integration tests for anyone calling trained models APIs or trying to deploy via installElasticModel(), but didn't see anything.

So it looks like the model was created at 12:22:42. I looked through the build logs a bit to see whether that correlated with any specific test times, but I wasn't seeing granular timestamps on the tests, only that the serverless security UI test suite that failed had started at ~2024-05-17 12:39:28,991.
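
For reference, the gap between those two timestamps can be worked out with a small sketch like the one below. It assumes both times are from the same day and the same time zone, which the logs quoted here do not confirm. `timeOfDayMs` and `formatGap` are hypothetical helpers written for this example.

```typescript
// Parses "HH:MM:SS" or "HH:MM:SS,mmm" into milliseconds since midnight.
function timeOfDayMs(text: string): number {
  const match = /^(\d{2}):(\d{2}):(\d{2})(?:[,.](\d{3}))?$/.exec(text);
  if (!match) {
    throw new Error(`unrecognized time: ${text}`);
  }
  const [, h, m, s, ms = '0'] = match;
  return ((Number(h) * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
}

// Formats a millisecond duration as minutes and seconds.
function formatGap(gapMs: number): string {
  const minutes = Math.floor(gapMs / 60000);
  const seconds = ((gapMs % 60000) / 1000).toFixed(3);
  return `${minutes}m ${seconds}s`;
}

// Assumption: same day, same time zone for both values.
const modelCreatedMs = timeOfDayMs('12:22:42');
const suiteStartedMs = timeOfDayMs('12:39:28,991');
const gapMs = suiteStartedMs - modelCreatedMs;

console.log(formatGap(gapMs)); // prints "16m 46.991s"
```

Under that assumption the model was created roughly 17 minutes before the failing suite started, which would point at something that ran earlier in the build.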

I recently noticed that the ML Notifications page lists when models are created/deployed. Do these logs have any more details about who may've triggered the model download? Or perhaps there's a more detailed/comprehensive buildkite log we can download to compare the created timestamp with?

@jgowdyelastic
Member

This issue was caused by an API test which was downloading ELSER and not deleting it afterwards. That test has been skipped.

Re-enabling this test in #183820.

@peteharverson
Contributor

Closing, fixed by #183820.


6 participants