[wko-nightly] Fix the failure from ItMiiClusterResource/testSharedClusterResource to delete shared-cluster #3891

Merged: 2 commits, Feb 1, 2023

hzhao-github
Contributor

Nightly failure: https://build.weblogick8s.org:8443/job/wko-kind-nightly-parallel/1212/testReport/oracle.weblogic.kubernetes/ItMiiClusterResource/testSharedClusterResource/

error:
message: 'Domain domain7 failed due to ''Domain validation error'': Cannot reference
cluster resource ''shared-cluster'' because it is used by ''domain8''. Update
the domain resource to correct the validation error.'
reason: Failed

We do verify that the domain and cluster are deleted, using withStandardRetryPolicy. I can't see any other issue that would cause the cluster deletion to fail. We can try increasing the wait time by switching from withStandardRetryPolicy to withLongRetryPolicy, to make sure we have enough time in case the environment is slow.
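
To illustrate why a longer retry policy can matter here, the following is a minimal, self-contained sketch of a polling wait with a bounded number of attempts. It is not the test framework's actual withStandardRetryPolicy or withLongRetryPolicy; the class name `RetrySketch`, the method `waitUntil`, and the attempt counts are all hypothetical and chosen only for illustration.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class RetrySketch {
  // Hypothetical attempt budgets standing in for a "standard" and a "long" policy.
  public static final int STANDARD_ATTEMPTS = 5;
  public static final int LONG_ATTEMPTS = 20;

  /**
   * Polls the condition up to maxAttempts times, sleeping for the given
   * interval between attempts. Returns true as soon as the condition holds,
   * or false if the attempts run out or the thread is interrupted.
   */
  public static boolean waitUntil(BooleanSupplier condition, int maxAttempts, Duration interval) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(interval.toMillis());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt flag
          return false;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Simulated slow environment: the cluster only disappears on the 8th poll.
    AtomicInteger polls = new AtomicInteger();
    BooleanSupplier clusterGone = () -> polls.incrementAndGet() >= 8;

    boolean standard = waitUntil(clusterGone, STANDARD_ATTEMPTS, Duration.ofMillis(10));
    polls.set(0);
    boolean longer = waitUntil(clusterGone, LONG_ATTEMPTS, Duration.ofMillis(10));
    System.out.println("standard budget succeeded: " + standard);
    System.out.println("long budget succeeded: " + longer);
  }
}
```

In this sketch the smaller budget gives up before the simulated deletion completes, while the larger one waits long enough. If the next test step starts right after a timed-out wait, it may still see the old shared-cluster resource.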

Jenkins:
https://build.weblogick8s.org:8443/job/weblogic-kubernetes-operator-kind-new/16001/

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jan 25, 2023
Member

@anpanigr anpanigr left a comment

The issue found in ItMiiClusterResource/testSharedClusterResource is related to the fact that a "Domain validation error" is not generated when the same cluster resource is associated with two domain resources.

How will the current change (deleting the cluster resource) resolve the issue?

https://build.weblogick8s.org:8443/job/wko-kind-nightly-parallel/1212/artifact/logdir/jenkins-wko-kind-nightly-parallel-1212/wl_k8s_test_results/diagnostics/ItMiiClusterResource/ItMiiClusterResource.out/*view*/

@hzhao-github
Contributor Author

The real cause is in https://build.weblogick8s.org:8443/job/wko-kind-nightly-parallel/1212/artifact/logdir/jenkins-wko-kind-nightly-parallel-1212/wl_k8s_test_results/diagnostics/ItMiiClusterResource/testSharedClusterResource/ns-abqknd.list.events.log. IMO, the first error below caused cascading failures. That's why I started by fixing the cluster deletion. If that doesn't work down the line, we can look into something else.

message: 'Domain domain7 failed due to ''Domain validation error'': Cannot reference
cluster resource ''shared-cluster'' because it is used by ''domain8''. Update
the domain resource to correct the validation error.'
reason: Failed

message: 'Domain domain7 failed due to ''Internal error'': Cannot invoke "oracle.kubernetes.weblogic.domain.model.DomainResource.isShuttingDown()"
because "domain" is null. Cannot invoke "oracle.kubernetes.weblogic.domain.model.DomainResource.isShuttingDown()"
because "domain" is null. Will retry next at 2023-01-24T08:15:16.734741100Z and
approximately every 120 seconds afterward until 2023-01-25T08:13:16.734741100Z
if the failure is not resolved.. Will retry.'
reason: Failed

message: |-
Exec lifecycle hook ([/weblogic-operator/scripts/stopServer.sh]) for Container "weblogic-server" in Pod "domain5-managed-server1_ns-abqknd(78056307-fff4-49b1-8116-9de88feacbda)" failed - error: command '/weblogic-operator/scripts/stopServer.sh' exited with 1: /weblogic-operator/scripts/stopServer.sh: line 146: /proc/1/fd/1: Permission denied
[ , message: "@[2023-01-24T08:08:06.196224411Z][utils_base.sh:210]FINE] SERVER_NAME='managed-server1'\n/weblogic-operator/scripts/stopServer.sh: line 146: /proc/1/fd/1: Permission denied\n"
reason: FailedPreStopHook

message: 'Cluster domain5-cluster-5 is incomplete for one or more of the following
reasons: there are failures detected, there are pending server shutdowns, or not
all servers expected to be running are ready and at their target image, auxiliary
images, restart version, and introspect version.'
reason: ClusterIncomplete

message: 'Domain domain4 failed due to ''Domain validation error'': Cluster resource
''domain4-cluster-2'' not found in namespace ''ns-abqknd''. Update the domain
resource to correct the validation error.'
reason: Failed

message: 'Cluster domain9-cluster-2 is incomplete for one or more of the following
reasons: there are failures detected, there are pending server shutdowns, or not
all servers expected to be running are ready and at their target image, auxiliary
images, restart version, and introspect version.'

clusterDoesNotExist(clusterName, CLUSTER_VERSION, namespace),
getLogger(),
"cluster {0} to be created in namespace {1}",
getLogger(), "cluster {0} to be created in namespace {1}",
Member

Doesn't this method wait for the cluster to be deleted?

Contributor Author

No Sankar, it doesn't wait:

public static boolean doesClusterExist(String clusterResName, String clusterVersion, String namespace) {
  Object clusterObject = null;
  try {
    clusterObject = customObjectsApi.getNamespacedCustomObject(
        "weblogic.oracle", clusterVersion, namespace, "clusters", clusterResName);
  } catch (ApiException apex) {
    // Any API error is only logged; clusterObject stays null, so the
    // method reports that the cluster does not exist.
    getLogger().info(apex.getMessage());
  }
  boolean cluster = (clusterObject != null);
  getLogger().info("Cluster Object exists : " + cluster);
  return cluster;
}

Contributor Author

Sorry, Sankar, I misunderstood you. Yes, the log message is wrong. I just changed it.

Member

@sankarpn sankarpn left a comment

some minor clarification

Member

@anpanigr anpanigr left a comment

Is there a way we can surface the actual issue on the top-level triage page, which just says
"expected: but was:"?

https://build.weblogick8s.org:8443/job/wko-kind-nightly-parallel/1212/testReport/oracle.weblogic.kubernetes/ItMiiClusterResource/testSharedClusterResource/

@hzhao-github
Contributor Author

Hi Pani, unfortunately we can't: testUntil throws a timeout error, and doesClusterExist catches the exception itself.
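
For context, the sketch below shows one general pattern for keeping the underlying error visible when a polling wait times out: remember the last exception from the probe and attach it to the timeout failure. This is not what this PR does and not how testUntil or doesClusterExist behave. The class `LastErrorSketch` and the method `waitUntilGone` are hypothetical, and unlike doesClusterExist above, the sketch treats a probe exception as "unknown" rather than as "does not exist".

```java
import java.util.concurrent.Callable;

public class LastErrorSketch {
  /**
   * Calls existsProbe up to maxAttempts times and returns as soon as it
   * reports false ("resource is gone"). If the attempts run out, throws an
   * IllegalStateException whose message includes the last probe error, if any.
   */
  public static void waitUntilGone(Callable<Boolean> existsProbe, int maxAttempts) {
    Exception lastError = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        if (!existsProbe.call()) {
          return; // resource no longer exists
        }
        lastError = null; // probe worked; the resource is simply still there
      } catch (Exception e) {
        lastError = e; // keep the error instead of swallowing it
      }
    }
    String detail = (lastError == null)
        ? "resource still exists"
        : "last error: " + lastError.getMessage();
    throw new IllegalStateException(
        "Timed out after " + maxAttempts + " attempts; " + detail, lastError);
  }

  public static void main(String[] args) {
    try {
      // Simulated probe that always fails with an API-style error.
      waitUntilGone(() -> {
        throw new java.io.IOException("connection refused");
      }, 3);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a pattern like this, the assertion failure shown on a report page would carry the probe's last error text instead of only a boolean mismatch. Whether that fits the existing test utilities is a separate question.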

Member

@sankarpn sankarpn left a comment

lgtm

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No coverage information
0.0% Duplication

@rjeberhard rjeberhard merged commit a51a877 into main Feb 1, 2023
@rjeberhard rjeberhard deleted the clusterres-nightly branch February 1, 2023 23:23
rjeberhard pushed a commit to rjeberhard/weblogic-kubernetes-operator that referenced this pull request Apr 14, 2023
…sterResource to delete shared-cluster (oracle#3891)

* [wko-nightly] Fix the failure from ItMiiClusterResource/testSharedClusterResource to delete shared-cluster
robertpatrick pushed a commit that referenced this pull request Apr 26, 2023
…sterResource to delete shared-cluster (#3891)

* [wko-nightly] Fix the failure from ItMiiClusterResource/testSharedClusterResource to delete shared-cluster