Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds event information when waiting for a pod #2627

Conversation

cdrage
Copy link
Member

@cdrage cdrage commented Feb 24, 2020

What type of PR is this?
/kind feature

What does does this PR do / why we need it:

This PR:

  • Adds more information when deploying via "WaitAndGetPod". We parse
    the current events which are happening and then look for event counts
    which have occured more than 5 times. We then output this information
    with the spinner in order to help the user determine what's taking so
    long to deploy
  • Collect event information that's outputted when it fails with a
    table.
  • Changes how we output errors

Example outputs:

▶ odo push -f
Validation
 ✓  Checking component [122ms]

Configuration changes
 ✓  Initializing component
 ✓  Creating component [733ms]

Pushing to component nodejs-nodejs-ex-dorv of type local
 ✗  Waiting for component to start [5s]

ERROR:
waited 5s but was unable to find a running pod matching selector: 'deploymentconfig=nodejs-nodejs-ex-dorv-app'
~/openshift/nodejs-ex  master ✗                                                                                                                                                                                                                                                                                                                                    28d ⚑ ◒
▶ odo push -f
Validation
 ✓  Checking component [134ms]

Configuration changes
 ✓  Initializing component
 ✓  Creating component [417ms]

Pushing to component nodejs-nodejs-ex-dorv of type local
 ✗  Waiting for component to start [5s] [WARNING x5: FailedScheduling, use `-v` for more information.]

ERROR:
waited 5s but was unable to find a running pod matching selector: 'deploymentconfig=nodejs-nodejs-ex-dorv-app'
For more information to help determine the cause of the error, re-run with '-v'.
See below for a list of failed events that occured more than 5 times during deployment:
+----------------------------------------------------+-------+------------------+-------------------------------------+
|                        NAME                        | COUNT |      REASON      |               MESSAGE               |
+----------------------------------------------------+-------+------------------+-------------------------------------+
| nodejs-nodejs-ex-dorv-app-1-r46cg.15f66f7d73693e5c |    15 | FailedScheduling | persistentvolumeclaim               |
|                                                    |       |                  | "nodejs-nodejs-ex-dorv-app-s2idata" |
|                                                    |       |                  | not found                           |
| nodejs-nodejs-ex-dorv-app-1-vn8vd.15f66f8482ee88f6 |     5 | FailedScheduling | persistentvolumeclaim               |
|                                                    |       |                  | "nodejs-nodejs-ex-dorv-app-s2idata" |
|                                                    |       |                  | not found                           |
+----------------------------------------------------+-------+------------------+-------------------------------------+

Which issue(s) this PR fixes:

Fixes #2244

How to test changes / Special notes to the reviewer:

Run a test application like so:

git clone https://github.com/openshift/nodejs-ex
odo create node js
odo preference set PushTimeout
watch -n 1 oc delete pvc --all
odo push -f

@openshift-ci-robot openshift-ci-robot added kind/feature Categorizes issue as a feature request. For PRs, that means that the PR is the implementation size/L labels Feb 24, 2020
@cdrage
Copy link
Member Author

cdrage commented Feb 24, 2020

Ready for review, unit test has been written 👍

If tests fail (integration). I'll work on it. Just have to wait for integration tests to finish completely to help determine what needs to be changed.

Otherwise, here's an asciinema for a preview of the changes:

asciicast

@cdrage
Copy link
Member Author

cdrage commented Feb 25, 2020

Hunting a race condition, otherwise, this is still up for review 👍

@cdrage cdrage force-pushed the add-more-verbose-information branch 6 times, most recently from 8791ccc to 15a0784 Compare February 26, 2020 16:04
@codecov
Copy link

codecov bot commented Feb 26, 2020

Codecov Report

Merging #2627 into master will increase coverage by 0.07%.
The diff coverage is 52.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2627      +/-   ##
==========================================
+ Coverage   43.54%   43.61%   +0.07%     
==========================================
  Files          78       78              
  Lines        7289     7334      +45     
==========================================
+ Hits         3174     3199      +25     
- Misses       3809     3827      +18     
- Partials      306      308       +2
Impacted Files Coverage Δ
pkg/odo/util/cmdutils.go 1.85% <ø> (+0.01%) ⬆️
pkg/occlient/occlient.go 52.65% <52.83%> (-0.08%) ⬇️
pkg/component/watch.go 74% <0%> (+1.33%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c26e14...3a1d78c. Read the comment docs.

@cdrage cdrage force-pushed the add-more-verbose-information branch from 15a0784 to 646d4f0 Compare February 26, 2020 20:42
return val, nil
case err := <-watchErrorChannel:
return nil, err
case err := <-watchEventChannel:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to change:

  • watchEventChannel to podEventErrorChannel
  • watchErrorChannel to podStatusErrorChannel
    ?

},
/*(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we uncomment these tests?

@cdrage cdrage force-pushed the add-more-verbose-information branch 2 times, most recently from f7940f6 to ffbe95f Compare February 27, 2020 15:11
@cdrage
Copy link
Member Author

cdrage commented Feb 28, 2020

flake

/retest

@cdrage cdrage force-pushed the add-more-verbose-information branch from ffbe95f to 24bd797 Compare March 2, 2020 19:24
@cdrage
Copy link
Member Author

cdrage commented Mar 3, 2020

/retest

**What type of PR is this?**
/kind feature

**What does does this PR do / why we need it**:

This PR:
  - Adds more information when deploying via "WaitAndGetPod". We parse
  the current events which are happening and then look for event counts
  which have occured more than 5 times. We then output this information
  with the spinner in order to help the user determine what's taking so
  long to deploy
  - Collect event information that's outputted when it fails with a
  table.
  - Changes how we output errors

Example outputs:

```sh
▶ odo push -f
Validation
 ✓  Checking component [122ms]

Configuration changes
 ✓  Initializing component
 ✓  Creating component [733ms]

Pushing to component nodejs-nodejs-ex-dorv of type local
 ✗  Waiting for component to start [5s]

ERROR:
waited 5s but was unable to find a running pod matching selector: 'deploymentconfig=nodejs-nodejs-ex-dorv-app'
```

```sh
~/openshift/nodejs-ex  master ✗                                                                                                                                                                                                                                                                                                                                    28d ⚑ ◒
▶ odo push -f
Validation
 ✓  Checking component [134ms]

Configuration changes
 ✓  Initializing component
 ✓  Creating component [417ms]

Pushing to component nodejs-nodejs-ex-dorv of type local
 ✗  Waiting for component to start [5s] [WARNING x5: FailedScheduling, use `-v` for more information.]

ERROR:
waited 5s but was unable to find a running pod matching selector: 'deploymentconfig=nodejs-nodejs-ex-dorv-app'
For more information to help determine the cause of the error, re-run with '-v'.
See below for a list of failed events that occured more than 5 times during deployment:
+----------------------------------------------------+-------+------------------+-------------------------------------+
|                        NAME                        | COUNT |      REASON      |               MESSAGE               |
+----------------------------------------------------+-------+------------------+-------------------------------------+
| nodejs-nodejs-ex-dorv-app-1-r46cg.15f66f7d73693e5c |    15 | FailedScheduling | persistentvolumeclaim               |
|                                                    |       |                  | "nodejs-nodejs-ex-dorv-app-s2idata" |
|                                                    |       |                  | not found                           |
| nodejs-nodejs-ex-dorv-app-1-vn8vd.15f66f8482ee88f6 |     5 | FailedScheduling | persistentvolumeclaim               |
|                                                    |       |                  | "nodejs-nodejs-ex-dorv-app-s2idata" |
|                                                    |       |                  | not found                           |
+----------------------------------------------------+-------+------------------+-------------------------------------+
```

**Which issue(s) this PR fixes**:

Fixes redhat-developer#2244

**How to test changes / Special notes to the reviewer**:

Run a test application like so:

```sh
git clone https://github.com/openshift/nodejs-ex
odo create node js
odo preference set PushTimeout
watch -n 1 oc delete pvc --all
odo push -f
```
@cdrage cdrage force-pushed the add-more-verbose-information branch from 24bd797 to 3a1d78c Compare March 3, 2020 13:35
@cdrage
Copy link
Member Author

cdrage commented Mar 6, 2020

Ping @kanchwala-yusuf @girishramnani @dharmit Please try it out :)

@kadel
Copy link
Member

kadel commented Mar 10, 2020

/approve
/retest

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kadel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. Required by Prow. label Mar 10, 2020
@dharmit
Copy link
Member

dharmit commented Mar 17, 2020

I have set TimeOut to 5 seconds but odo push still keeps waiting for the component to start. It's a bug, right?

$ odo version
odo v1.1.0 (3a1d78ce7)

Server: https://api.crc.testing:6443
Kubernetes: v1.16.2

$ git log --oneline -n1
3a1d78ce7 (HEAD, upstream/pr/2627) Adds event information when waiting for a pod

$ odo push
Validation
 ✓  Checking component [11ms]

Configuration changes
 ✓  Initializing component
 ✓  Creating component [54ms]

Pushing to component nodejs-nodejs-ex-njnk of type local
 ✓  Checking files for pushing [592055ns]
 ✓  Waiting for component to start [20s]         <<---------- this
 ✓  Syncing files to the component [108ms]
 ✓  Building component [10s]
 ✓  Changes successfully pushed to component

$ odo preference view
PARAMETER             CURRENT_VALUE
UpdateNotification    
NamePrefix            
Timeout               5
PushTimeout           
Experimental  

Copy link
Member

@dharmit dharmit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly questions and one small suggestion for the code comment.

@@ -67,6 +70,10 @@ var (
DEPLOYMENT_CONFIG_NOT_FOUND error = fmt.Errorf("Requested deployment config does not exist")
)

// We use a mutex here in order to make 100% sure that functions such as CollectEvents
// so that there are no race conditions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slipped writing something here? Or is it just me who can't completely understand this comment?

@@ -1785,6 +1793,54 @@ func (c *Client) WaitAndGetDC(name string, desiredRevision int64, timeout time.D
}
}

// CollectEvents collects events in a Goroutine by manipulating a spinner.
// We don't care about the error (it's usually ran in a go routine), so erroring out is not needed.
func (c *Client) CollectEvents(selector string, events map[string]corev1.Event, spinner *log.Status, quit <-chan int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely a silly question: is this function responsible for printing the information that we see when we do odo push? The actual operations happening during push are handled elsewhere? Under what circumstances would this function fail?

@kanchwala-yusuf
Copy link
Contributor

Tried out the PR, works fine for me. The PR looks good to me, otherwise.
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. Required by Prow. label Mar 17, 2020
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

16 similar comments
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 6112ed2 into redhat-developer:master Mar 17, 2020
@cdrage cdrage deleted the add-more-verbose-information branch January 14, 2022 14:50
@rm3l rm3l added estimated-size/L (20-40) Rough sizing for Epics. About 2 sprints of work for a person. and removed size/L labels Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. Required by Prow. estimated-size/L (20-40) Rough sizing for Epics. About 2 sprints of work for a person. kind/feature Categorizes issue as a feature request. For PRs, that means that the PR is the implementation lgtm Indicates that a PR is ready to be merged. Required by Prow.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FLAKE] tests failing on " waited 4m0s but couldn't find running pod matching selector"
8 participants