Bug 1512042 - Allowing error messages to make it from apb to user. #607
Changes Unknown when pulling 17b3518 on shawn-hurley:bug-1512042 into ** on openshift:master**.
* Make it so that job state holds errors from apb package
* Make it so de/provision/update/_job can get the error and handle correctly.
* Make sure that the broker/handler handles the error and returns
* Warn about permissions for specs that may not be pullable.
17b3518
to
a88a8a3
This all looks good to me. Once I understand the reason for the conditional in the job error handling I'll approve.
    ser, err := SpecToService(spec)
    if err != nil {
        log.Errorf("not adding spec %v to list of services due to error transforming to service - %v", spec.FQName, err)
    } else {
        services[i] = ser
        services = append(services, ser)
👍
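For readers following the diff above, here is a minimal sketch of why appending is the safer pattern when some specs are skipped. It assumes the slice was previously sized up front and filled by index, which would leave zero-valued entries for skipped specs. The `spec`, `service`, `specToService`, and `toServices` names are hypothetical stand-ins, not the broker's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// spec and service are hypothetical stand-ins for the broker's types.
type spec struct{ FQName string }
type service struct{ Name string }

// specToService is a hypothetical converter that rejects specs without a name.
func specToService(s spec) (service, error) {
	if s.FQName == "" {
		return service{}, errors.New("spec has no name")
	}
	return service{Name: s.FQName}, nil
}

// toServices skips specs that fail conversion and appends the rest,
// so the result never contains zero-valued placeholder entries.
func toServices(specs []spec) []service {
	services := make([]service, 0, len(specs))
	for _, s := range specs {
		ser, err := specToService(s)
		if err != nil {
			fmt.Printf("not adding spec %q to list of services: %v\n", s.FQName, err)
			continue
		}
		services = append(services, ser)
	}
	return services
}

func main() {
	out := toServices([]spec{{FQName: "a"}, {}, {FQName: "b"}})
	fmt.Println(len(out), out)
}
```

Starting from a zero-length slice with reserved capacity keeps the allocation benefit of pre-sizing without exposing empty entries to callers.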
pkg/apb/watch_pod.go
Outdated
    @@ -47,6 +52,9 @@ func watchPod(podName string, namespace string) error {

        switch podStatus.Phase {
        case apiv1.PodFailed:
            if errorPullingImage(podStatus.Conditions) {
👍
    @@ -55,8 +55,26 @@ func (p *DeprovisionJob) Run(token string, msgBuffer chan<- JobMsg) {
        if err != nil {
            log.Error("broker::Deprovision error occurred.")
            log.Errorf("%s", err.Error())
            msgBuffer <- JobMsg{InstanceUUID: p.serviceInstance.ID.String(), PodName: podName,
                JobToken: token, SpecID: p.serviceInstance.Spec.ID, Error: err.Error()}
            // Because we know the error we should return that error.
I'm not certain I understand why we are returning the specific error message only if apb.ErrorPodPullErr. What's the reason for adding this logic?
I want to give a specific error about not being able to pull the image.
If some other error occurs, I don't want to display that message to the user, which is why I do this check here.
Thinking more about this: why not just log the error and return a generic error message in watch_pod, like:
log.Errorf("Pod [ %s ] failed - %v", podName, podStatus.Message)
return fmt.Errorf("Error occurred during APB execution. Please contact administrator if it persists.")
Because it is not just watch_pod that we are worried about, it is the error that apb.(provision/update/deprovision) returns that we are dealing with here.
Let's hope that is the biggest oversight I have this year.
    // logging to warn users about the potential bug if
    // the svc-acct does not have access to the namespace.
    if ns != "openshift" {
        r.Log.Warningf("You may not be able to load provision images from the namespace: %v.\n"+
👍
pkg/apb/watch_pod.go
Outdated
    func errorPullingImage(conds []apiv1.PodCondition) bool {
        for _, cond := range conds {
            if cond.Reason == "ErrImgPull" {
I didn't find "ErrImgPull" anywhere in vendor, only "ErrImagePull". Is "ErrImgPull" the error we want or is it coming from something we don't have vendored? https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/images/types.go#L33
We are using the pod's status field, and the conditions slice is where I am looking for this value. The reason appears to be a string set to ErrImgPull, which you can see when you get the pod as YAML while it cannot pull the image.
Is there a better way to get this? I would prefer a better way.
I think we can do something like https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L1114-L1147
I will re-do this section based on that. It looks much more correct!
Changes Unknown when pulling a88a8a3 on shawn-hurley:bug-1512042 into ** on openshift:master**.
    // https://github.com/kubernetes/kubernetes/blob/886e04f1fffbb04faf8a9f9ee141143b2684ae68/pkg/kubelet/images/types.go#L27
    status := conds[0].State.Waiting

    if status.Reason == "ErrImagePull" {
Might as well use images.ErrImagePull.Error() and images.ErrImagePullBackOff.Error() since it's where the strings come from.
I would prefer not to use something that is not in the kubernetes client-go or kubernetes api.
The image errors referenced here live in the core kubernetes package, and I think that eventually we could remove this dependency.
wfm
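As a rough sketch of the check discussed above, the following walks container statuses and compares the waiting reason against literal strings, so no kubelet package is needed. The struct types are trimmed, hypothetical stand-ins for the Kubernetes API types, not the real definitions. The literals "ErrImagePull" and "ImagePullBackOff" match the messages of `images.ErrImagePull` and `images.ErrImagePullBackOff`. Unlike the snippet in the diff, this version also guards against an empty slice and a nil waiting state, which is an addition for illustration.

```go
package main

import "fmt"

// Hypothetical, trimmed stand-ins for the Kubernetes container status types.
type containerStateWaiting struct{ Reason string }
type containerState struct{ Waiting *containerStateWaiting }
type containerStatus struct {
	Name  string
	State containerState
}

// Reason strings reported by the kubelet for image pull failures, copied as
// literals to avoid depending on the core kubernetes package.
const (
	reasonErrImagePull     = "ErrImagePull"
	reasonImagePullBackOff = "ImagePullBackOff"
)

// errorPullingImage reports whether any container is waiting because its
// image could not be pulled. It is safe for empty input and for containers
// that are not in the waiting state.
func errorPullingImage(statuses []containerStatus) bool {
	for _, st := range statuses {
		w := st.State.Waiting
		if w == nil {
			continue
		}
		if w.Reason == reasonErrImagePull || w.Reason == reasonImagePullBackOff {
			return true
		}
	}
	return false
}

func main() {
	statuses := []containerStatus{
		{Name: "apb", State: containerState{Waiting: &containerStateWaiting{Reason: reasonErrImagePull}}},
	}
	fmt.Println(errorPullingImage(statuses))
}
```

Keeping the reasons as local constants trades a compile-time link to upstream for a smaller dependency surface, which is the preference stated in the thread.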
Changes Unknown when pulling 68b8b23 on shawn-hurley:bug-1512042 into ** on openshift:master**.
ACK
…penshift#607)
* Bug 1512042 - Allowing error messages to make it from apb to user.
* Make it so that job state holds errors from apb package
* Make it so de/provision/update/_job can get the error and handle correctly.
* Make sure that the broker/handler handles the error and returns
* Warn about permissions for specs that may not be pullable.
* Addressing issues with determining whether an error is an image pull error.
Describe what this PR does and why we need it:
Fixes bug 1512042 by adding more explicit warning logs and handing errors back to the user.
Which issue this PR fixes (This will close that issue when PR gets merged)
Bug 1512042