[CI] UpgradeClusterClientYamlTestSuiteIT test {p0=upgraded_cluster/30_ml_jobs_crud/Test open old jobs} failing #79636
Comments
Pinging @elastic/ml-core (Team:ML)
The server side logs suggest it did exist:
I think the problem is possibly that all the shards of the
The test that asserts the alias exists opens two jobs before that. It seems that we are not waiting for |
It should have been checked by this: Lines 528 to 533 in d383538
There was a giant refactor of how index metadata is accessed (changing how aliases and indices are gathered internally): #79080. This is something to keep in mind if other things start failing. But since that change is so encompassing, I would expect something other than ML to fail as well :/. Still digging.
We definitely are not. We specifically wait for I think if we were to wait for something for |
In tests and in actual usage, it is possible that one job creates .ml-state-write and another job starts immediately afterwards, sees that the index has been created, and moves on. This means the second job could blast past the check and then start, stop, etc., all with the .ml-state-write alias pointing to an index that is not even readable. This commit waits for the index to be yellow before continuing to open the job. closes: #79636
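To illustrate the "wait for yellow" idea from outside the server, here is a minimal sketch that blocks on the cluster health REST API until the target of .ml-state-write is at least yellow. It is not the code from the commit, which lives in the Java server code. The helper names, the base URL and the 30s timeout are assumptions made for this example.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def health_wait_url(base_url, index_or_alias, status="yellow", timeout="30s"):
    """Build a cluster health URL that blocks until the target reaches `status`."""
    # wait_for_status and timeout are query parameters of the cluster health API.
    query = urlencode({"wait_for_status": status, "timeout": timeout})
    target = quote(index_or_alias, safe="")
    return f"{base_url.rstrip('/')}/_cluster/health/{target}?{query}"


def wait_until_readable(base_url, index_or_alias, status="yellow", timeout="30s"):
    """Return True if the target reached `status` before the timeout expired."""
    url = health_wait_url(base_url, index_or_alias, status, timeout)
    try:
        with urlopen(url) as response:
            body = json.load(response)
    except HTTPError as err:
        # Some versions answer a health wait that timed out with HTTP 408.
        if err.code == 408:
            return False
        raise
    # Other versions answer 200 and set "timed_out": true in the body.
    return not body.get("timed_out", True)
```

A caller would run something like wait_until_readable("http://localhost:9200", ".ml-state-write") and only open the job once it returns True. The base URL here is hypothetical and assumes an unsecured local node.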
In this failure the .ml-state-write alias did not exist after upgrading a 7.3.2 cluster to 7.16.0. This is a bit worrying - have we done anything recently that would mean we don't reliably set up the .ml-state-write alias when upgrading from a version that didn't have it?
Build scan:
https://gradle-enterprise.elastic.co/s/2uxstqtvl3e2c/tests/:x-pack:qa:rolling-upgrade:v7.3.2%23upgradedClusterTest/org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT/test%20%7Bp0=upgraded_cluster%2F30_ml_jobs_crud%2FTest%20open%20old%20jobs%7D
Reproduction line:
./gradlew ':x-pack:qa:rolling-upgrade:v7.3.2#upgradedClusterTest' -Dtests.class="org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=upgraded_cluster/30_ml_jobs_crud/Test open old jobs}" -Dtests.seed=BD58859DCE7F0251 -Dtests.bwc=true -Dtests.locale=fi -Dtests.timezone=America/Guatemala -Druntime.java=8
Applicable branches:
7.16
Reproduces locally?:
No
Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT&tests.test=test%20%7Bp0%3Dupgraded_cluster/30_ml_jobs_crud/Test%20open%20old%20jobs%7D
Failure excerpt: