
check operand StatefulSet is ready for realm import Job to run #24534

Merged
merged 5 commits into from
Nov 27, 2023


pgodowski

Check whether the Keycloak operand StatefulSet has at least one ready replica before submitting the RealmImport Job.
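A minimal sketch of the precondition described above. It assumes simplified stand-in types: the nested `StatefulSetStatus` and `StatefulSet` records are hypothetical and only mirror the shape of the Kubernetes object (where `readyReplicas` may be unset), not the fabric8 model classes the operator actually uses. `RealmImportGate`, `hasReadyReplica`, and `submitIfOperandReady` are illustrative names, not the PR's code.

```java
import java.util.Optional;

final class RealmImportGate {
    private RealmImportGate() {}

    /** Hypothetical stand-in for the StatefulSet status; readyReplicas is null when unset. */
    record StatefulSetStatus(Integer readyReplicas) {}

    /** Hypothetical stand-in for the operand StatefulSet; status may be null. */
    record StatefulSet(String name, StatefulSetStatus status) {}

    /**
     * True only when the StatefulSet exists and reports at least one ready replica.
     * A missing StatefulSet, a missing status, or an unset readyReplicas all count as "not ready".
     */
    static boolean hasReadyReplica(StatefulSet sts) {
        return Optional.ofNullable(sts)
                .map(StatefulSet::status)
                .map(StatefulSetStatus::readyReplicas)
                .map(ready -> ready >= 1)
                .orElse(false);
    }

    /** Runs submitJob only if the operand is ready; returns whether the Job was submitted. */
    static boolean submitIfOperandReady(StatefulSet sts, Runnable submitJob) {
        if (!hasReadyReplica(sts)) {
            return false; // leave the RealmImport pending so a later pass can retry
        }
        submitJob.run();
        return true;
    }
}
```

Treating every absent field as "not ready" means the gate fails closed: the import Job is never created against a Keycloak that has no running replica.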

Closes #24526

cc @shawkins

@pgodowski pgodowski requested review from a team as code owners November 3, 2023 16:41
@ghost ghost added team/cloud-native labels Nov 3, 2023
@pgodowski pgodowski changed the title check operand StatefulSet is ready check operand StatefulSet is ready for realm import Job to run Nov 3, 2023
Contributor

@shawkins shawkins left a comment


Thank you @pgodowski, this seems good: polling is simpler than reacting to Keycloak CR events.
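To illustrate why polling is the simpler option, here is a sketch of a reconcile pass that re-reads the operand's ready-replica count and asks to be re-run until it is at least one. `Decision`, `SubmitJob`, `Requeue`, `reconcile`, and the 10-second `POLL_INTERVAL` are all hypothetical; the real operator would presumably express "run again later" through its reconciler framework's rescheduling mechanism, which this sketch does not model.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

final class RealmImportPolling {
    private RealmImportPolling() {}

    /** Hypothetical result of one reconcile pass for a RealmImport. */
    interface Decision {}

    /** The operand is ready: go ahead and create the import Job. */
    record SubmitJob() implements Decision {}

    /** The operand is not ready yet: run this reconcile pass again after the given delay. */
    record Requeue(Duration after) implements Decision {}

    // Assumed interval for illustration only; not taken from the PR.
    static final Duration POLL_INTERVAL = Duration.ofSeconds(10);

    /**
     * One reconcile pass. No event subscription is needed: each pass reads the current
     * ready-replica count and requeues itself until at least one replica is ready.
     */
    static Decision reconcile(IntSupplier readyReplicas) {
        if (readyReplicas.getAsInt() < 1) {
            return new Requeue(POLL_INTERVAL);
        }
        return new SubmitJob();
    }
}
```

The trade-off is a bounded delay (up to one poll interval) after the first pod becomes ready, in exchange for not having to wire a second resource's events into the RealmImport reconciler.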

It should have a test though.

Also, I know a status should always be added when a StatefulSet is created, but the code checks for null in one place and not the other; it should at least be consistent.
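One way to get the consistency asked for here is to route every read of the ready-replica count through a single null-safe accessor, so the null handling cannot drift between call sites. A sketch with hypothetical stand-in types; `ConsistentStatusAccess`, `readyReplicas`, `canRunImport`, and the `waitingMessage` text are illustrative, not taken from the PR.

```java
final class ConsistentStatusAccess {
    private ConsistentStatusAccess() {}

    /** Hypothetical stand-in for the StatefulSet status; readyReplicas is null when unset. */
    record Status(Integer readyReplicas) {}

    /** Hypothetical stand-in for the operand StatefulSet; status may be null. */
    record StatefulSet(String name, Status status) {}

    /** The single null-safe accessor: a missing StatefulSet, status, or field all read as 0. */
    static int readyReplicas(StatefulSet sts) {
        if (sts == null || sts.status() == null || sts.status().readyReplicas() == null) {
            return 0;
        }
        return sts.status().readyReplicas();
    }

    /** Call site 1: gate the import Job. */
    static boolean canRunImport(StatefulSet sts) {
        return readyReplicas(sts) >= 1;
    }

    /** Call site 2: a hypothetical status message; it cannot throw NPE because it uses the same accessor. */
    static String waitingMessage(StatefulSet sts) {
        return "Waiting for Keycloak: " + readyReplicas(sts) + " ready replica(s)";
    }
}
```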

@pgodowski
Author

Thanks for the review.

Also, I know a status should always be added when a StatefulSet is created, but the code checks for null in one place and not the other; it should at least be consistent.

Fixed this part.

It should have a test though.

I will work on adding a test case to RealmImportTest.java.

Contributor

@vmuzikar vmuzikar left a comment


@pgodowski Thanks for the PR!

I believe this affects main too, where we don't check the number of ready instances either. We should create a parallel fix for it too before we consider this fully fixed.

@pgodowski
Copy link
Author

I believe this affects main too, where we don't check the number of ready instances either. We should create a parallel fix for it too before we consider this fully fixed.

Yes, I will work on the fix in main too, but I'm super motivated to start with the 22.0 branch, for obvious reasons :)

@pgodowski
Author

pgodowski commented Nov 7, 2023

Added a unit test that checks the realm import fails when the Keycloak operand is not available, still in the release/22.0 branch.

@shawkins
Contributor

shawkins commented Nov 9, 2023

@pgodowski looks good. Please rebase to a single commit and add your DCO sign-off; Keycloak recently adopted this requirement.

Copy link
Contributor

@vmuzikar vmuzikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (once comment from @shawkins is resolved).

@pgodowski pgodowski force-pushed the fix/realm-import-prereq-check branch from a1dda82 to 1aecef1 Compare November 9, 2023 15:36
Signed-off-by: Piotr Godowski <piotr.godowski@pl.ibm.com>
@pgodowski pgodowski force-pushed the fix/realm-import-prereq-check branch from 1aecef1 to 6dd0531 Compare November 9, 2023 17:26
@pgodowski
Author

Please rebase to a single commit and add your DCO sign-off; Keycloak recently adopted this requirement.

@shawkins done - thanks!

Contributor

@shawkins shawkins left a comment


LGTM. One small nit: could there be a single crSelector.get() call rather than 3 calls per poll?
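The nit above amounts to reading the resource once per poll and deriving every check from that one snapshot. A sketch, assuming `crSelector` can be modeled as a `Supplier` and that a missing resource comes back as `null` (as with the fabric8 client's `get()`); `Snapshot`, `PollResult`, and `poll` are hypothetical names, and which three values the PR actually reads is not reproduced here.

```java
import java.util.function.Supplier;

final class SingleFetchPoll {
    private SingleFetchPoll() {}

    /** Hypothetical snapshot returned by one crSelector.get() call; readyReplicas may be unset. */
    record Snapshot(Integer readyReplicas) {}

    /** Hypothetical set of values a poll derives from the resource. */
    record PollResult(boolean exists, int readyReplicas, boolean ready) {}

    static PollResult poll(Supplier<Snapshot> crSelector) {
        // Exactly one read per poll; every derived value below comes from this same snapshot.
        Snapshot snap = crSelector.get();
        if (snap == null) {
            return new PollResult(false, 0, false); // resource not found
        }
        int ready = snap.readyReplicas() == null ? 0 : snap.readyReplicas();
        return new PollResult(true, ready, ready >= 1);
    }
}
```

Besides saving two API round trips per poll, a single read keeps the derived values mutually consistent, since the resource cannot change between separate `get()` calls.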

@pgodowski
Author

Do you want me to add any changes here?

@ghost

ghost commented Nov 10, 2023

Unreported flaky test detected

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.ui.account2.WelcomeScreenTest#accountSecurityTest

Keycloak CI - Account Console IT (firefox)

org.openqa.selenium.TimeoutException: 
Expected condition failed: waiting for wrapped: element to be clickable: GrapheneElement -> [[FirefoxDriver: firefox on LINUX (da195fcc-5609-4839-bf04-466ff6e0e49f)] -> xpath: //*[@id='landing-device-activity']/a] (tried for 5 second(s) with 500 milliseconds interval)
Build info: version: '3.14.0', revision: 'aacccce0', time: '2018-08-02T20:19:58.91Z'
System info: host: 'fv-az984-520', ip: '10.1.0.77', os.name: 'Linux', os.arch: 'amd64', os.version: '6.2.0-1015-azure', java.version: '17.0.9'
Driver info: org.openqa.selenium.firefox.FirefoxDriver$$EnhancerByGraphene$$e748af4e
...



@ghost ghost left a comment


Unreported flaky test detected, please review

@shawkins
Contributor

Do you want me to add any changes here?

It's optional. It can be cleaned up later if needed.

@shawkins shawkins enabled auto-merge (squash) November 10, 2023 17:06
Signed-off-by: Piotr Godowski <piotr.godowski@pl.ibm.com>
kc.getSpec().setInstances(0);

// don't wait for Keycloak being available, since it has no instances
deployKeycloak(k8sclient, getDefaultKeycloakDeployment(), false);
Contributor


This should reference the kc instance created above.
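Why this matters: `getDefaultKeycloakDeployment()` is called a second time, so the CR that gets deployed is a fresh default one, and the `setInstances(0)` change on `kc` is silently discarded. The sketch below reproduces that with hypothetical stand-ins (a simplified `Keycloak` class, a default of 1 instance, and a `List<Integer>` "cluster" in place of `k8sclient`); only the shape of the calls follows the snippet under review.

```java
import java.util.List;

final class ReviewFixSketch {
    private ReviewFixSketch() {}

    /** Hypothetical, heavily simplified Keycloak CR: only the instance count matters here. */
    static final class Keycloak {
        private int instances;

        Keycloak(int instances) { this.instances = instances; }

        int getInstances() { return instances; }

        void setInstances(int instances) { this.instances = instances; }
    }

    /** Hypothetical helper: each call builds a NEW default CR (assumed to have 1 instance). */
    static Keycloak getDefaultKeycloakDeployment() {
        return new Keycloak(1);
    }

    /**
     * Hypothetical helper: "deploys" kc by recording its instance count in the fake cluster.
     * waitForReady only mirrors the shape of the call under review; it is not modeled.
     */
    static void deployKeycloak(List<Integer> cluster, Keycloak kc, boolean waitForReady) {
        cluster.add(kc.getInstances());
    }

    /** As first submitted: kc is configured with 0 instances but never deployed. */
    static void asSubmitted(List<Integer> cluster) {
        Keycloak kc = getDefaultKeycloakDeployment();
        kc.setInstances(0);
        deployKeycloak(cluster, getDefaultKeycloakDeployment(), false); // deploys a fresh 1-instance CR
    }

    /** As requested in review: deploy the kc instance configured above. */
    static void asFixed(List<Integer> cluster) {
        Keycloak kc = getDefaultKeycloakDeployment();
        kc.setInstances(0);
        // don't wait for Keycloak being available, since it has no instances
        deployKeycloak(cluster, kc, false);
    }
}
```

With the original call the test would deploy a Keycloak with one instance, so it would not actually exercise the "operand not available" path it was written for.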

Author


OMG, good catch @shawkins - fixed!

@shawkins
Contributor

@pgodowski I hope you don't mind, I took the liberty of opening a forward port: #24720

There's still another small change needed here.

@pgodowski
Author

Thanks for the forward port; let me know if you need any input on that. I updated the test case, but for some reason I now cannot squash the commits. I hope that's fine?

Contributor

@shawkins shawkins left a comment


LGTM, don't worry about squashing, it will be done as part of the merge.


@ghost ghost left a comment


Unreported flaky test detected, please review

@ghost

ghost commented Nov 14, 2023

Unreported flaky test detected

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.x509.X509BrowserCRLTest#loginSuccessWithCRLSignedWithIntermediateCA3FromTruststore

Keycloak CI - FIPS IT (non-strict)

java.lang.RuntimeException: Could not create statement
	at org.jboss.arquillian.junit.Arquillian.methodBlock(Arquillian.java:313)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
...


org.keycloak.testsuite.x509.X509BrowserCRLTest#loginFailedWithIntermediateRevocationListFromFile

Keycloak CI - FIPS IT (non-strict)

java.lang.RuntimeException: Could not create statement
	at org.jboss.arquillian.junit.Arquillian.methodBlock(Arquillian.java:313)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
...


org.keycloak.testsuite.x509.X509BrowserCRLTest#loginFailedWithIntermediateRevocationListFromHttp

Keycloak CI - FIPS IT (non-strict)

java.lang.RuntimeException: Could not create statement
	at org.jboss.arquillian.junit.Arquillian.methodBlock(Arquillian.java:313)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
...


org.keycloak.testsuite.x509.X509BrowserCRLTest#loginFailedWithInvalidSignatureCRL

Keycloak CI - FIPS IT (non-strict)

java.lang.RuntimeException: Could not create statement
	at org.jboss.arquillian.junit.Arquillian.methodBlock(Arquillian.java:313)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
...


org.keycloak.testsuite.x509.X509BrowserCRLTest#loginWithMultipleRevocationLists

Keycloak CI - FIPS IT (non-strict)

java.lang.RuntimeException: Could not create statement
	at org.jboss.arquillian.junit.Arquillian.methodBlock(Arquillian.java:313)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
...


org.keycloak.testsuite.x509.X509BrowserCRLTest#loginSuccessWithEmptyRevocationListFromFile

Keycloak CI - FIPS IT (non-strict)

java.lang.RuntimeException: Could not create statement
	at org.jboss.arquillian.junit.Arquillian.methodBlock(Arquillian.java:313)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
...


@vmuzikar vmuzikar enabled auto-merge (squash) November 15, 2023 07:57
@vmuzikar
Contributor

Enabled auto-merge, but the 22 branch is frozen until 22.0.6 is released.

@Jamstah
Contributor

Jamstah commented Nov 23, 2023

As an FYI (because I'm expecting this one in 22.0.7 anyway), this is actually more dangerous than we thought.

It's not just some failed pods: there is a timing issue where, I think, the realm import Job can trigger database initialisation at the same time as the first Keycloak pod, and the realm status can say it's ready even though the realm is not in the database. The cause is conjecture because I don't have logs from the first Keycloak pod's run, but on this cluster the realm import Job completed successfully and yet the realm was not present.

Log from a realm import pod:

Updating the configuration and installing your custom providers, if any. Please wait.
2023-11-23 09:27:46,955 INFO [io.quarkus.deployment.QuarkusAugmentor] (main) Quarkus augmentation completed in 24969ms
Server configuration updated and persisted. Run the following command to review the configuration:
kc.sh show-config
2023-11-23 09:27:50,348 INFO [org.keycloak.quarkus.runtime.hostname.DefaultHostnameProvider] (main) Hostname settings: Base URL: <unset>, Hostname: keycloak-cp4i.kepler-roks1-ec111ed5d7db435e1c5eeeb4400d693f-0000.eu-gb.containers.appdomain.cloud, Strict HTTPS: false, Path: <request>, Strict BackChannel: false, Admin URL: <unset>, Admin: <request>, Port: -1, Proxied: true
2023-11-23 09:27:53,946 WARN [io.quarkus.agroal.runtime.DataSources] (main) Datasource <default> enables XA but transaction recovery is not enabled. Please enable transaction recovery by setting quarkus.transaction-manager.enable-recovery=true, otherwise data may be lost if the application is terminated abruptly
2023-11-23 09:27:55,351 WARN [org.infinispan.PERSISTENCE] (keycloak-cache-init) ISPN000554: jboss-marshalling is deprecated and planned for removal
2023-11-23 09:27:55,554 WARN [org.infinispan.CONFIG] (keycloak-cache-init) ISPN000569: Unable to persist Infinispan internal caches as no global state enabled
2023-11-23 09:27:55,763 INFO [org.infinispan.CONTAINER] (keycloak-cache-init) ISPN000556: Starting user marshaller 'org.infinispan.jboss.marshalling.core.JBossUserMarshaller'
2023-11-23 09:27:58,180 INFO [org.keycloak.broker.provider.AbstractIdentityProviderMapper] (main) Registering class org.keycloak.broker.provider.mappersync.ConfigSyncEventListener
2023-11-23 09:27:58,263 INFO [org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory] (main) Node name: node_311387, Site name: null
2023-11-23 09:28:07,869 INFO [org.keycloak.quarkus.runtime.storage.legacy.liquibase.QuarkusJpaUpdaterProvider] (main) Initializing database schema. Using changelog META-INF/jpa-changelog-master.xml
UPDATE SUMMARY
Run: 115
Previously run: 0
Filtered out: 0
-------------------------------
Total change sets: 115
2023-11-23 09:28:19,544 INFO [org.keycloak.services] (main) KC-SERVICES0050: Initializing master realm
2023-11-23 09:28:24,240 INFO [org.keycloak.exportimport.singlefile.SingleFileImportProvider] (main) Full importing from file /mnt/realm-import/cloudpak-realm.json
2023-11-23 09:28:28,364 INFO [org.keycloak.exportimport.util.ImportUtils] (main) Realm 'cloudpak' imported
2023-11-23 09:28:28,673 INFO [io.quarkus] (main) Keycloak 22.0.6.redhat-00002 on JVM (powered by Quarkus 3.2.6.Final-redhat-00002) started in 41.495s. Listening on:
2023-11-23 09:28:28,674 INFO [io.quarkus] (main) Profile import_export activated.
2023-11-23 09:28:28,674 INFO [io.quarkus] (main) Installed features: [agroal, cdi, hibernate-orm, jdbc-h2, jdbc-mariadb, jdbc-mssql, jdbc-mysql, jdbc-oracle, jdbc-postgresql, keycloak, micrometer, narayana-jta, reactive-routes, resteasy, resteasy-jackson, smallrye-context-propagation, smallrye-health, vertx]
2023-11-23 09:28:28,774 INFO [io.quarkus] (main) Keycloak stopped in 0.096s

@pgodowski
Author

As an FYI (because I'm expecting this one in 22.0.7 anyway), this is actually more dangerous than we thought.

Are you suggesting we should check for something other than the operand StatefulSet in the proposed fix, or is this just a notice that the impact is larger than we initially thought?

@Jamstah
Contributor

Jamstah commented Nov 23, 2023

As an FYI (because I'm expecting this one in 22.0.7 anyway), this is actually more dangerous than we thought.

Are you suggesting we should check for something other than the operand StatefulSet in the proposed fix, or is this just a notice that the impact is larger than we initially thought?

Just that the impact is more than just a UX snag - it can leave an installation in a broken state.

I don't think we need more than a single pod ready check.

@ahus1 ahus1 disabled auto-merge November 27, 2023 19:16
@ahus1 ahus1 merged commit 002f1a7 into keycloak:release/22.0 Nov 27, 2023
62 checks passed

cypress bot commented Nov 27, 2023

1 failed and 2 flaky tests on run #9989 ↗︎

1 526 48 0 Flakiness 2

Details:

check operand StatefulSet is ready for realm import Job to run (#24534)
Project: Keycloak Admin UI Commit: 002f1a7ce6
Status: Failed Duration: 10:00 💡
Started: Nov 27, 2023 7:24 PM Ended: Nov 27, 2023 7:34 PM
Failed  cypress/e2e/sessions_test.spec.ts • 1 failed test • chrome


Test Artifacts
Sessions test > revocation > Check if notBefore saved Screenshots Video
Flakiness  clients_test.spec.ts • 1 flaky test • chrome


Test Artifacts
Clients test > Accessibility tests for clients > Check a11y violations on client registration/ authenticated access policies tab Screenshots
Flakiness  authentication_test.spec.ts • 1 flaky test • chrome


Test Artifacts
Authentication test > should add a condition Screenshots



Successfully merging this pull request may close these issues.

Keycloak operator 22.0.z created RealmImport Job without checking Keycloak operand health first
5 participants