Adds event information when waiting for a pod #2627

cdrage · 2020-02-24T20:11:18Z

What type of PR is this?
/kind feature

What does this PR do / why we need it:

This PR:

- Adds more information when deploying via "WaitAndGetPod". We parse the events as they happen and look for events that have occurred more than 5 times. We then output this information with the spinner to help the user determine what's taking so long to deploy.
- Collects event information and outputs it in a table when the deployment fails.
- Changes how we output errors.
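
The event filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Event` is a hypothetical, trimmed-down stand-in for `corev1.Event`, and `frequentWarnings` is an invented helper name. The description says "more than 5 times", but the example table below lists an event with a count of exactly 5, so the sketch assumes an inclusive threshold.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a hypothetical, trimmed-down stand-in for corev1.Event.
type Event struct {
	Name    string
	Type    string // "Normal" or "Warning"
	Reason  string
	Message string
	Count   int32
}

// frequentWarnings returns the Warning events whose Count is at least
// threshold (inclusive, which is an assumption), sorted by descending
// count with ties broken by name.
func frequentWarnings(events map[string]Event, threshold int32) []Event {
	var out []Event
	for _, e := range events {
		if e.Type == "Warning" && e.Count >= threshold {
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	// Illustrative data only; the names are made up.
	events := map[string]Event{
		"pod-a.1": {Name: "pod-a.1", Type: "Warning", Reason: "FailedScheduling", Message: "persistentvolumeclaim not found", Count: 15},
		"pod-b.1": {Name: "pod-b.1", Type: "Warning", Reason: "FailedScheduling", Message: "persistentvolumeclaim not found", Count: 5},
		"pod-c.1": {Name: "pod-c.1", Type: "Normal", Reason: "Scheduled", Message: "assigned", Count: 1},
	}
	for _, e := range frequentWarnings(events, 5) {
		fmt.Printf("WARNING x%d: %s (%s)\n", e.Count, e.Reason, e.Name)
	}
}
```

Keying the map by event name means a repeated event overwrites its earlier entry, so only the latest count is reported.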

Example outputs:

▶ odo push -f
Validation
 ✓  Checking component [122ms]

Configuration changes
 ✓  Initializing component
 ✓  Creating component [733ms]

Pushing to component nodejs-nodejs-ex-dorv of type local
 ✗  Waiting for component to start [5s]

ERROR:
waited 5s but was unable to find a running pod matching selector: 'deploymentconfig=nodejs-nodejs-ex-dorv-app'

~/openshift/nodejs-ex  master ✗  28d ⚑ ◒
▶ odo push -f
Validation
 ✓  Checking component [134ms]

Configuration changes
 ✓  Initializing component
 ✓  Creating component [417ms]

Pushing to component nodejs-nodejs-ex-dorv of type local
 ✗  Waiting for component to start [5s] [WARNING x5: FailedScheduling, use `-v` for more information.]

ERROR:
waited 5s but was unable to find a running pod matching selector: 'deploymentconfig=nodejs-nodejs-ex-dorv-app'
For more information to help determine the cause of the error, re-run with '-v'.
See below for a list of failed events that occured more than 5 times during deployment:
+----------------------------------------------------+-------+------------------+-------------------------------------+
|                        NAME                        | COUNT |      REASON      |               MESSAGE               |
+----------------------------------------------------+-------+------------------+-------------------------------------+
| nodejs-nodejs-ex-dorv-app-1-r46cg.15f66f7d73693e5c |    15 | FailedScheduling | persistentvolumeclaim               |
|                                                    |       |                  | "nodejs-nodejs-ex-dorv-app-s2idata" |
|                                                    |       |                  | not found                           |
| nodejs-nodejs-ex-dorv-app-1-vn8vd.15f66f8482ee88f6 |     5 | FailedScheduling | persistentvolumeclaim               |
|                                                    |       |                  | "nodejs-nodejs-ex-dorv-app-s2idata" |
|                                                    |       |                  | not found                           |
+----------------------------------------------------+-------+------------------+-------------------------------------+

Which issue(s) this PR fixes:

Fixes #2244

How to test changes / Special notes to the reviewer:

Run a test application like so:

git clone https://github.com/openshift/nodejs-ex
odo create nodejs
odo preference set PushTimeout
watch -n 1 oc delete pvc --all
odo push -f

cdrage · 2020-02-24T20:13:15Z

Ready for review, unit test has been written 👍

If the integration tests fail, I'll work on it. I just have to wait for them to finish completely to determine what needs to be changed.

Otherwise, here's an asciinema for a preview of the changes:

cdrage · 2020-02-25T14:16:38Z

Hunting a race condition, otherwise, this is still up for review 👍

codecov · 2020-02-26T16:04:47Z

Codecov Report

Merging #2627 into master will increase coverage by 0.07%.
The diff coverage is 52.83%.

@@            Coverage Diff             @@
##           master    #2627      +/-   ##
==========================================
+ Coverage   43.54%   43.61%   +0.07%     
==========================================
  Files          78       78              
  Lines        7289     7334      +45     
==========================================
+ Hits         3174     3199      +25     
- Misses       3809     3827      +18     
- Partials      306      308       +2

Impacted Files	Coverage Δ
pkg/odo/util/cmdutils.go	`1.85% <ø> (+0.01%)`	⬆️
pkg/occlient/occlient.go	`52.65% <52.83%> (-0.08%)`	⬇️
pkg/component/watch.go	`74% <0%> (+1.33%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4c26e14...3a1d78c.

kanchwala-yusuf · 2020-02-26T11:05:35Z

pkg/occlient/occlient.go

 		return val, nil
 	case err := <-watchErrorChannel:
 		return nil, err
+	case err := <-watchEventChannel:


Does it make sense to rename:

- watchEventChannel to podEventErrorChannel
- watchErrorChannel to podStatusErrorChannel

?
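
For context, the `select` in the diff above waits on several channels at once, which is why descriptive names matter. A rough sketch of that pattern, using the names proposed in this comment, might look like the following. `waitForPod`, its parameters, and the timeout case are assumptions made for illustration; this is not the actual `WaitAndGetPod` implementation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForPod is a hypothetical sketch. It returns the first pod name received,
// or the first error from either error channel, or a timeout error.
func waitForPod(podChannel <-chan string, podStatusErrorChannel, podEventErrorChannel <-chan error, timeout time.Duration) (string, error) {
	select {
	case pod := <-podChannel:
		return pod, nil
	case err := <-podStatusErrorChannel:
		return "", fmt.Errorf("pod status: %w", err)
	case err := <-podEventErrorChannel:
		return "", fmt.Errorf("pod events: %w", err)
	case <-time.After(timeout):
		return "", fmt.Errorf("waited %s but was unable to find a running pod", timeout)
	}
}

func main() {
	podChannel := make(chan string)
	statusErrs := make(chan error)
	eventErrs := make(chan error, 1)

	// Simulate the event watcher reporting a failure before any pod is ready.
	eventErrs <- errors.New("FailedScheduling")

	_, err := waitForPod(podChannel, statusErrs, eventErrs, 50*time.Millisecond)
	fmt.Println(err)
}
```

With names like these, each `case` states which watcher produced the error without the reader having to trace where the channel is written.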

kanchwala-yusuf · 2020-02-26T11:13:25Z

pkg/occlient/occlient_test.go

 		},
+		/*(


Can we uncomment these tests?

cdrage · 2020-02-28T16:07:00Z

flake

/retest

cdrage · 2020-03-03T12:52:43Z

/retest


cdrage · 2020-03-06T15:07:20Z

Ping @kanchwala-yusuf @girishramnani @dharmit Please try it out :)

kadel · 2020-03-10T12:48:16Z

/approve
/retest

openshift-ci-robot · 2020-03-10T12:49:00Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kadel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [kadel]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

dharmit · 2020-03-17T10:42:07Z

I have set TimeOut to 5 seconds but odo push still keeps waiting for the component to start. It's a bug, right?

$ odo version
odo v1.1.0 (3a1d78ce7)

Server: https://api.crc.testing:6443
Kubernetes: v1.16.2

$ git log --oneline -n1
3a1d78ce7 (HEAD, upstream/pr/2627) Adds event information when waiting for a pod

$ odo push
Validation
 ✓  Checking component [11ms]

Configuration changes
 ✓  Initializing component
 ✓  Creating component [54ms]

Pushing to component nodejs-nodejs-ex-njnk of type local
 ✓  Checking files for pushing [592055ns]
 ✓  Waiting for component to start [20s]         <<---------- this
 ✓  Syncing files to the component [108ms]
 ✓  Building component [10s]
 ✓  Changes successfully pushed to component

$ odo preference view
PARAMETER             CURRENT_VALUE
UpdateNotification    
NamePrefix            
Timeout               5
PushTimeout           
Experimental

dharmit

Mostly questions and one small suggestion for the code comment.

dharmit · 2020-03-11T10:05:22Z

pkg/occlient/occlient.go

@@ -67,6 +70,10 @@ var (
 	DEPLOYMENT_CONFIG_NOT_FOUND           error  = fmt.Errorf("Requested deployment config does not exist")
 )

+// We use a mutex here in order to make 100% sure that functions such as CollectEvents
+// so that there are no race conditions


Slipped writing something here? Or is it just me who can't completely understand this comment?
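
Since the comment in the diff is incomplete, what the mutex is meant to guard can only be guessed. One plausible reading is that it protects the shared events map, which a collector goroutine writes while the caller reads. A minimal sketch under that assumption follows; `eventStore` and its methods are hypothetical names, not code from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// eventStore is a hypothetical wrapper: the mutex guards the map so that a
// collector goroutine can write while the caller reads.
type eventStore struct {
	mu     sync.Mutex
	events map[string]int32 // event name -> occurrence count
}

func newEventStore() *eventStore {
	return &eventStore{events: make(map[string]int32)}
}

// record stores the latest count for an event.
func (s *eventStore) record(name string, count int32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.events[name] = count
}

// snapshot returns a copy so callers never iterate over the live map.
func (s *eventStore) snapshot() map[string]int32 {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[string]int32, len(s.events))
	for k, v := range s.events {
		out[k] = v
	}
	return out
}

func main() {
	store := newEventStore()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			store.record(fmt.Sprintf("event-%d", i), int32(i))
		}(i)
	}
	wg.Wait()
	fmt.Println(len(store.snapshot()), "events recorded")
}
```

Running a program like this with `go run -race` is one way to check that concurrent writes to the map are serialized.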

dharmit · 2020-03-17T10:57:43Z

pkg/occlient/occlient.go

@@ -1785,6 +1793,54 @@ func (c *Client) WaitAndGetDC(name string, desiredRevision int64, timeout time.D
 	}
 }

+// CollectEvents collects events in a Goroutine by manipulating a spinner.
+// We don't care about the error (it's usually ran in a go routine), so erroring out is not needed.
+func (c *Client) CollectEvents(selector string, events map[string]corev1.Event, spinner *log.Status, quit <-chan int) {


Likely a silly question: is this function responsible for printing the information that we see when we do odo push? The actual operations happening during push are handled elsewhere? Under what circumstances would this function fail?
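
As background for the question, the signature above takes a receive-only `quit` channel, which suggests the usual pattern of a background loop that runs until the caller signals it to stop. A rough sketch of that pattern is below; `collect` and the `poll` callback are hypothetical, and the polling interval is an assumption, not the behaviour of `CollectEvents` itself.

```go
package main

import (
	"fmt"
	"time"
)

// collect is a hypothetical stand-in for a CollectEvents-style loop: it calls
// poll on every tick and returns once quit is closed or receives a value.
func collect(poll func(), interval time.Duration, quit <-chan int) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-quit:
			return
		case <-ticker.C:
			poll()
		}
	}
}

func main() {
	quit := make(chan int)
	done := make(chan struct{})
	polls := 0

	go func() {
		defer close(done)
		collect(func() { polls++ }, 10*time.Millisecond, quit)
	}()

	time.Sleep(55 * time.Millisecond) // stand-in for the real push work
	close(quit)                       // tell the collector to stop
	<-done                            // polls is safe to read once the goroutine has exited

	fmt.Println("polled", polls, "times")
}
```

In this shape the loop has no return value, so any failure inside `poll` has to be handled or logged there rather than propagated to the caller.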

kanchwala-yusuf · 2020-03-17T15:06:43Z

Tried out the PR, works fine for me. The PR looks good to me, otherwise.
/lgtm

openshift-bot · 2020-03-17T15:08:50Z

/retest