Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1512042 - Allowing error messages to make it from apb to user. #607

Merged
merged 2 commits into from Jan 2, 2018

Conversation

shawn-hurley
Copy link
Contributor

  • Make it so that job state holds errors from apb package
  • Make it so de/provision/update/_job can get the error and handle correctly.
  • make sure that the broker/handler handles the error and returns
  • Warn when about permissions for specs that may not be pullable.

Describe what this PR does and why we need it:
Fixes bug 1512042 by adding more explicit warning logs and handing errors back to the user.

Which issue this PR fixes (This will close that issue when PR gets merged)
Bug 1512042

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 2, 2018
@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 17b3518 on shawn-hurley:bug-1512042 into ** on openshift:master**.

* Make it so that job state holds errors from apb package
* Make it so de/provision/update/_job can get the error and handle correctly.
* make sure that he broker/handler handles the error and returns
* Warn when about permissions for specs that may not be pullable.
Copy link
Member

@djzager djzager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me. Once I understand the reason for the conditional in the job error handling I'll approve.

ser, err := SpecToService(spec)
if err != nil {
log.Errorf("not adding spec %v to list of services due to error transforming to service - %v", spec.FQName, err)
} else {
services[i] = ser
services = append(services, ser)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -47,6 +52,9 @@ func watchPod(podName string, namespace string) error {

switch podStatus.Phase {
case apiv1.PodFailed:
if errorPullingImage(podStatus.Conditions) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -55,8 +55,26 @@ func (p *DeprovisionJob) Run(token string, msgBuffer chan<- JobMsg) {
if err != nil {
log.Error("broker::Deprovision error occurred.")
log.Errorf("%s", err.Error())
msgBuffer <- JobMsg{InstanceUUID: p.serviceInstance.ID.String(), PodName: podName,
JobToken: token, SpecID: p.serviceInstance.Spec.ID, Error: err.Error()}
// Because we know the error we should return that error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain I understand why we are returning the specific error message only if apb.ErrorPodPullErr. What's the reason for adding this logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to give a specific error about not being able to get the image.

If some other error occurs, I don't want to display that message to the user, that is why I want to do this check here.

Copy link
Member

@djzager djzager Jan 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this. Why not just log the error and return a generic error message in watch_pod like:

log.Errorf("Pod [ %s ] failed - %v", podName, podStatus.Message)
return fmt.Error("Error occurred during APB execution. Please contact administrator if it presists.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is not just watch_pod that we are worried about, it is the error that apb.(provision/update/deprovision) returns that we are dealing with here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hope that is the biggest oversight I have this year.

// logging to warn users about the potential bug if
// the svc-acct does not have access to the namespace.
if ns != "openshift" {
r.Log.Warningf("You may not be able to load provision images from the namespace: %v.\n"+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


func errorPullingImage(conds []apiv1.PodCondition) bool {
for _, cond := range conds {
if cond.Reason == "ErrImgPull" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find "ErrImgPull" anywhere in vendor, only "ErrImagePull". Is "ErrImgPull" the error we want or is it coming from something we don't have vendored? https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/images/types.go#L33

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using the pod's status field. The conditions slice is where I am looking for this value. It appears that the reason is a string and this is set to ErrImgPull which you can see when you get the pod w/ yaml when it can not pull the image.

Is there a better way to get this? I would prefer a better way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will re-do this section based on that. It looks much more correct!

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling a88a8a3 on shawn-hurley:bug-1512042 into ** on openshift:master**.

// https://github.com/kubernetes/kubernetes/blob/886e04f1fffbb04faf8a9f9ee141143b2684ae68/pkg/kubelet/images/types.go#L27
status := conds[0].State.Waiting

if status.Reason == "ErrImagePull" {
Copy link
Contributor

@rthallisey rthallisey Jan 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well use images.ErrImagePull.Error() and images.ErrImagePullBackOff.Error() since it's where the strings come from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to use something that is not in the kubernetes client-go or kubernetes api.

The images errors that are referenced in the core kuberentes package, I think that eventually, we could remove this dependency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wfm

// https://github.com/kubernetes/kubernetes/blob/886e04f1fffbb04faf8a9f9ee141143b2684ae68/pkg/kubelet/images/types.go#L27
status := conds[0].State.Waiting

if status.Reason == "ErrImagePull" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wfm

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 68b8b23 on shawn-hurley:bug-1512042 into ** on openshift:master**.

Copy link
Member

@djzager djzager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

@@ -55,8 +55,26 @@ func (p *DeprovisionJob) Run(token string, msgBuffer chan<- JobMsg) {
if err != nil {
log.Error("broker::Deprovision error occurred.")
log.Errorf("%s", err.Error())
msgBuffer <- JobMsg{InstanceUUID: p.serviceInstance.ID.String(), PodName: podName,
JobToken: token, SpecID: p.serviceInstance.Spec.ID, Error: err.Error()}
// Because we know the error we should return that error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hope that is the biggest oversight I have this year.

@rthallisey rthallisey merged commit 0bade96 into openshift:master Jan 2, 2018
@shawn-hurley shawn-hurley mentioned this pull request Jan 11, 2018
jianzhangbjz pushed a commit to jianzhangbjz/ansible-service-broker that referenced this pull request May 17, 2018
…penshift#607)

* Bug 1512042 - Allowing error messages to make it from apb to user.

* Make it so that job state holds errors from apb package
* Make it so de/provision/update/_job can get the error and handle correctly.
* make sure that he broker/handler handles the error and returns
* Warn when about permissions for specs that may not be pullable.

* Addressing issues with determining if image pull error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants