Issue 41 #131

Merged 11 commits into master on Mar 8, 2018

Conversation

@tbarnes-us commented on Mar 2, 2018

Status:

Dev complete and all acceptance tests pass

Changes:

(1) Add the weblogic.domainUID label to the various k8s resources that weren't already labeled.

(2) Add a new weblogic.createdByOperator label for operator-created resources, and modify the operator's label searches to look for weblogic.createdByOperator in addition to a specific weblogic.domainUID. This prevents the operator from changing, deleting, or watching resources that it doesn't own.
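
For example, with both labels in place, the operator-owned resources for a single domain can be listed with one label-selector query. A sketch only; the 'true' value for weblogic.createdByOperator is an assumption here, not necessarily the literal value the operator sets:

  kubectl get pods,services --all-namespaces \
    -l weblogic.domainUID=domain1,weblogic.createdByOperator=true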

(3) Add a new kubernetes/delete-domain.sh script that takes advantage of the labels to delete everything associated with the domain-uid(s) supplied on the command line. Usage:

[adc01jjm weblogic-kubernetes-operator]$ kubernetes/delete-domain.sh       
  Usage:

    delete-domain.sh -d domain-uid,domain-uid,... [-s max-seconds] [-t]
    delete-domain.sh -d all [-s max-seconds] [-t]
    delete-domain.sh -h

  Perform a best-effort delete of the kubernetes resources for
  the given domain(s), and retry until either max-seconds is reached
  or all resources were deleted (default 120 seconds).

  The domains can be specified as a comma-separated list of 
  domain-uids (no spaces), or the keyword 'all'.  The domains can be
  located in any kubernetes namespace.

  Specify '-t' to run the script in a test mode which will
  show kubernetes commands but not actually perform them.

  The script runs in three phases:  

    Phase 1:  Set the startupControl of each domain to NONE if
              it's not already NONE.  This should cause each
              domain's operator to initiate a controlled shutdown
              of the domain.  Immediately proceed to phase 2.

    Phase 2:  Wait up to half of max-seconds for WebLogic
              Server pods to exit normally, and then proceed
              to phase 3.

    Phase 3:  Periodically delete all remaining kubernetes resources
              for the specified domains, including any pods
              leftover from phase 2.  Exit and fail if max-seconds
              is exceeded and there are any leftover kubernetes
              resources.

  This script exits with a zero status on success, and a 
  non-zero status on failure.
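
For reference, the phase 1 step is roughly what the following kubectl command would do by hand (the domain1-ns namespace is illustrative; the script locates each domain by its labels):

  kubectl patch domain domain1 -n domain1-ns \
    --type merge -p '{"spec":{"startupControl":"NONE"}}'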

Sample run:

[adc01jjm weblogic-kubernetes-operator]$ kubernetes/delete-domain.sh -d all
@@ Deleting kubernetes resources with label weblogic.domainUID 'all'.
@@ 24 resources remaining after 2 seconds, including 4 WebLogic Server pods. Max wait is 120 seconds.
@@ Setting startupControl to NONE on each domain (this should cause operator(s) to initiate a controlled shutdown of the domain's pods.)
domain "domain1" patched
@@ Waiting for operator to shutdown pods (will wait for no more than half of max wait seconds before directly deleting them).
@@ 24 resources remaining after 7 seconds, including 4 WebLogic Server pods. Max wait is 120 seconds.
@@ Waiting for operator to shutdown pods (will wait for no more than half of max wait seconds before directly deleting them).
@@ 24 resources remaining after 13 seconds, including 4 WebLogic Server pods. Max wait is 120 seconds.
@@ Waiting for operator to shutdown pods (will wait for no more than half of max wait seconds before directly deleting them).
@@ 24 resources remaining after 19 seconds, including 4 WebLogic Server pods. Max wait is 120 seconds.
@@ Waiting for operator to shutdown pods (will wait for no more than half of max wait seconds before directly deleting them).
@@ 24 resources remaining after 25 seconds, including 4 WebLogic Server pods. Max wait is 120 seconds.
@@ Waiting for operator to shutdown pods (will wait for no more than half of max wait seconds before directly deleting them).
@@ 24 resources remaining after 31 seconds, including 4 WebLogic Server pods. Max wait is 120 seconds.
@@ Waiting for operator to shutdown pods (will wait for no more than half of max wait seconds before directly deleting them).
@@ 24 resources remaining after 35 seconds, including 4 WebLogic Server pods. Max wait is 120 seconds.
@@ Waiting for operator to shutdown pods (will wait for no more than half of max wait seconds before directly deleting them).
@@ 23 resources remaining after 40 seconds, including 3 WebLogic Server pods. Max wait is 120 seconds.
@@ Waiting for operator to shutdown pods (will wait for no more than half of max wait seconds before directly deleting them).
@@ 23 resources remaining after 44 seconds, including 3 WebLogic Server pods. Max wait is 120 seconds.
@@ Waiting for operator to shutdown pods (will wait for no more than half of max wait seconds before directly deleting them).
@@ 20 resources remaining after 48 seconds, including 0 WebLogic Server pods. Max wait is 120 seconds.
@@ All pods shutdown, about to directly delete remaining resources.
domain "domain1" deleted
pod "domain1-cluster-1-traefik-59998fb86d-gzt5j" deleted
job "domain-domain1-job" deleted
deployment "domain1-cluster-1-traefik" deleted
persistentvolumeclaim "domain1-pv-claim" deleted
configmap "domain-domain1-scripts" deleted
configmap "domain1-cluster-1-traefik" deleted
serviceaccount "domain1-cluster-1-traefik" deleted
secret "domain1-weblogic-credentials" deleted
persistentvolume "domain1-pv" deleted
clusterrole "domain1-cluster-1-traefik" deleted
clusterrolebinding "domain1-cluster-1-traefik" deleted
@@ 0 resources remaining after 63 seconds, including 0 WebLogic Server pods. Max wait is 120 seconds.
@@ Success.
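
The phase 3 cleanup amounts to deleting by label. A rough hand-written equivalent, using the resource kinds seen in the run above (the domain1-ns namespace is illustrative; the script itself handles multiple domains and namespaces automatically):

  # namespaced resources for domain1
  kubectl delete domain,pod,job,deploy,pvc,cm,sa,secret -n domain1-ns \
    -l weblogic.domainUID=domain1
  # cluster-scoped resources for domain1
  kubectl delete pv,clusterrole,clusterrolebinding \
    -l weblogic.domainUID=domain1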

Note:

BTW, it turns out Mark had started on a change for this issue via WIP branch issue-41. I discovered this only after I'd written the script and tried to push. So I'm using "issue--41" for this branch instead of "issue-41".

@rjeberhard (Member)

The script needs to handle the case where the operator is running. Once you delete the domain resource (assuming the operator is running), the operator will begin shutting down servers and specifically removing the pods, services, and Ingress entries.

One option is to work with the operator by editing the domain to set domain.spec.startupControl = "NONE". If the operator is running, then it will shut down all of the pods gracefully. You can watch the domain.status.conditions array for Progressing and Available conditions. For startupControl = "NONE", the operator will set a condition of type = "Available" and reason = "AllServersStopped".

After this, the script could safely delete the domain and other resources.
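
A sketch of how such a check might look from the command line (field names taken from the comment above; the exact reason string and polling logic used by the script may differ):

  kubectl get domain domain1 -n domain1-ns \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].reason}'
  # expect: AllServersStopped once the operator has stopped all servers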

Tom Barnes added 2 commits March 5, 2018 11:48
…cts, set startupControl on each domain to NONE and wait up to half of max wait seconds for operator to shutdown its WLS pods normally. (2) Increase default max wait seconds to 120 seconds.

@rjeberhard (Member)

This looks really good, with maybe one readability comment: getDomain surprised me by getting all of the objects associated with the domain. When it fails intermittently, what happens?

Tom Barnes added 5 commits March 6, 2018 10:21
…edByOperator label to operator-owned domain resources, and modify its selectors to look for this label). Plus modify run.sh domain lifecycle test to verify webapp is still OK after a cycling.
@tbarnes-us changed the title from "WIP: Issue 41" to "Issue 41" on Mar 7, 2018

@rjeberhard (Member)

Resolves issue #41

@rjeberhard merged commit c5d5f60 into master on Mar 8, 2018
@rjeberhard deleted the issue--41 branch on March 30, 2018 17:42