ignore wait4 errs in observe max-err-count #18992

Merged


juanvallejo
Contributor

@juanvallejo juanvallejo commented Mar 15, 2018

Fixes #17743

These errors should not count against the --maximum-errors count,
because the process has already run and exited cleanly before the wait4 syscall is made.

cc @smarterclayton @soltysh

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 15, 2018
@@ -716,6 +718,13 @@ func measureCommandDuration(m *prometheus.SummaryVec, fn func() error, labels ..
        statusCode = -1
    }
    m.WithLabelValues(append(labels, strconv.Itoa(statusCode))...).Observe(float64(duration / time.Millisecond))

    if err != nil && err.Error() == errNoChildProc {
Member

I'd rather not hardcode the value like that. See if you can get some reasonable information from the error by type-asserting it to *exec.ExitError. From there you should be able to call ProcessState.Sys, which "returns system-dependent exit information about the process. Convert it to the appropriate underlying type, such as syscall.WaitStatus on Unix, to access its contents." That should give you a system-wide constant to compare against.

Contributor

agree, this isn't a reasonable fix

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 15, 2018
Member

@soltysh soltysh left a comment

One nit and you're good to go.

@@ -716,9 +716,28 @@ func measureCommandDuration(m *prometheus.SummaryVec, fn func() error, labels ..
        statusCode = -1
    }
    m.WithLabelValues(append(labels, strconv.Itoa(statusCode))...).Observe(float64(duration / time.Millisecond))

    errno := errnoError(err)
    if errno == syscall.ECHILD {
Member

Since you've already created that function, just call it inline:

if errnoError(err) == syscall.ECHILD {
...

@juanvallejo
Contributor Author

@soltysh thanks, comment addressed

Member

@soltysh soltysh left a comment

One more nit.

    return err
}

func errnoError(err error) syscall.Errno {
    if err == nil {
Member

How about:

if se, ok := err.(*os.SyscallError); ok {
    if errno, ok := se.Err.(syscall.Errno); ok {
        return errno
    }
}

return 0

Contributor Author

thanks, done

@juanvallejo
Contributor Author

/test gcp

@openshift-ci-robot

openshift-ci-robot commented Mar 16, 2018

@juanvallejo: The following test failed; say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
ci/openshift-jenkins/extended_networking_minimal | 7011f02 | link | /test extended_networking_minimal

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Member

@soltysh soltysh left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 19, 2018
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juanvallejo, soltysh


@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2018
@soltysh
Member

soltysh commented Mar 19, 2018

/retest

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-merge-robot
Contributor

Automatic merge from submit-queue (batch tested with PRs 18953, 18992).

@openshift-merge-robot openshift-merge-robot merged commit 0706074 into openshift:master Mar 19, 2018
@juanvallejo juanvallejo deleted the jvallejo/cli-observe-1 branch March 19, 2018 20:36
bobross419 pushed a commit to rackerlabs/s2i-oc-observe that referenced this pull request Jul 27, 2018
5 participants